List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:May 28 2009 3:17pm
Subject:Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)
View as plain text  
Hello Marc,

Marc Alff a écrit, Le 26.05.2009 23:53:
> Hi Guilhem
> 
> Guilhem Bichot wrote:
>> Bonjour Marc,
>>
>> Marc Alff a écrit, Le 21.05.2009 21:42:
>>> #At file:///home/malff/BZR-TREE/mysql-6.0-perfschema/ based on
>>> revid:marc.alff@stripped
>>>
>>>  3150 Marc Alff    2009-05-21
>>>       Implemented code review comments, continued.
>>>             Reworked all the thread instrumentation:
>>>       - reverted the THD::store_global(bool) change implemented earlier,
>>>       - removed the init_and_bind_psi_thread calls
>>>             Instead, instrumented threads by intercepting the
>>> pthread_create()
>>>       call with an instrumented mysql_thread_create().

>>> === modified file 'include/mysql/psi/psi.h'
>>> --- a/include/mysql/psi/psi.h    2009-05-13 16:44:00 +0000
>>> +++ b/include/mysql/psi/psi.h    2009-05-21 19:42:38 +0000

>>>  /**
>>>    Create instrumentation for a thread.
>>>    @param key the registered key
>>> -  @param identity the address of the rwlock itself
>>> +  @param identity an address typical of the thread
>> Have you considered using, as "identity", what's returned by
>> pthread_self() (it's in THD::real_id)?
> 
> With thread pools, the instrumented thread is created without a real
> thread, so pthread_self() can not be used.

Ah, indeed.

> Note that the identity pointer is not provided by the caller of
> mysql_thread_create(), so the code writer (who instruments something)
> does not have to worry about that.

Well, let's look at this example from scheduler.cc:
   for (i= 0; i < thread_pool_size; i++)
   {
     pthread_t thread;
     int error;
     if ((error= mysql_thread_create(key_thread_libevent,
                                     &thread, &connection_attrib,
                                     libevent_thread_proc, 0)))
Here, spawn_thread_v1 is going to use &thread as "identity" for the
create_thread() call. "thread" is a local variable in the loop, it has
chances to have the same address in all iterations of the loop (it is
the case on my Linux build). So we'll have several threads with the same
identity, which defeats the purpose of "identity" if I understood correctly.
If the code writer wants to avoid that, she/he has to provide something
else than 0 as the last parameter of mysql_thread_create(), so has to
care about this.
How about just putting in the comment of mysql_thread_create() something
like this:
"P4 (or if it's NULL, P1) will be (later?) used as identity of the
thread in Performance Schema tables".

  > We might have pthread_self() somewhere in the
performance_schema.THREADS table, later.

Ok, later, maybe.

>>> === modified file 'sql/event_scheduler.cc'
>>> --- a/sql/event_scheduler.cc    2009-05-12 03:15:02 +0000
>>> +++ b/sql/event_scheduler.cc    2009-05-21 19:42:38 +0000
>>> @@ -228,10 +228,7 @@ event_scheduler_thread(void *arg)
>>>  
>>>    thd->thread_stack= (char *)&thd;              // remember where
> our
>>> stack is
>>>  
>>> -#ifdef HAVE_PSI_INTERFACE
>>> -  thd->m_psi=
>>> init_and_bind_psi_thread_with_id(key_thread_event_scheduler,
>>> -                                               thd, thd->thread_id);
>>> -#endif
>>> +  mysql_thread_set_id(thd->thread_id);
>> Why not move the call to mysql_thread_set_id() into store_globals()?
>> Currently those two calls, mysql_thread_set_id() and store_globals(),
>> happen very close to each other; I have the impression that by doing the
>> move, all mysql_thread_set_id() calls (all: in sql/, sql/backup,
>> storage/myisam, storage/maria) would be gone. It also feels natural;
>> store_globals() is expected to do whatever is needed to get a functional
>> "just works"-THD. It would reduce the number of calls (everything would
>> happen in the existing store_globals() calls); there are already so many
>> "boilerplate" operations that we copy-paste every time we want to write
>> code to create a new utility thread (MyISAM backup driver, online backup
>> kernel, NDB binlog, replication, event scheduler...). Last, it would
>> reduce the presence of P_S calls in non-P_S code (like of events,
>> backup, engines), which generally makes things simpler to understand
>> (when one is reading code of feeature X it's better if she/he does not
>> meet code bits of feature Y, makes less to parse).
>>
> 
> Every thread needs to be instrumented, and having the instrumentation in
> the start_routine function for this thread seems natural to me.

But adding a P_S function which must be called in every thread function,
seems like a burden to me.
If mysql_thread_set_id(thd->thread_id) was done by THD::store_globals():
- every thread having a THD wouldn't need an explicit hard-coded call to
mysql_thread_set_id()
- those few threads which don't have a THD would still need such
hard-coded call in their thread function, yes (there are no such threads
in the patch right now).

mysql_thread_set_id(thd->thread_id) is about setting some
posix-thread-specific data by using some data from THD.
This existing call: "my_pthread_setspecific_ptr(THR_THD,  this)"
is also about setting some posix-thread-specific data (THR_THD key) by
using some data from THD ("this" is the THD pointer here), and it
happens in... THD::store_globals().

> Instrumented threads do not have to have an associated THD.

I was unclear, I meant the calls to
mysql_thread_set_id(*thd->thread_id*) only.
Those without a THD need an explicit call for sure.

>>> === modified file 'sql/ha_ndbcluster.cc'
>>> --- a/sql/ha_ndbcluster.cc    2009-05-12 03:15:02 +0000
>>> +++ b/sql/ha_ndbcluster.cc    2009-05-21 19:42:38 +0000
>>> @@ -10166,10 +10167,7 @@ pthread_handler_t ndb_util_thread_func(v
>>>    }
>>>  
>>>  /* FIXME: waiting for Bug#40899 */
>>> -#ifdef HAVE_PSI_INTERFACE
>>> -  thd->m_psi= init_and_bind_psi_thread_with_id(key_thread_ndb_util,
>>> -                                               thd, thd->thread_id);
>>> -#endif
>>> +  mysql_thread_set_id(thd->thread_id);
>> Actually, thd->thread_id is 0 here so the call does nothing (m_thread_id
>> was already 0 before it).

How about this one?

>>> === modified file 'sql/mysql_priv.h'
>>> --- a/sql/mysql_priv.h    2009-05-18 15:36:08 +0000
>>> +++ b/sql/mysql_priv.h    2009-05-21 19:42:38 +0000
>>> @@ -2248,36 +2248,6 @@ extern PSI_file_key key_file_global_ddl_
>>>    Initialise all the performance schema instrumentation points used
>>> by the server.
>>>  */
>>>  void init_server_psi_keys();
>>> -
>>> -/**
>>> -  Create an instrumented thread.
>>> -  @param key the instrumentation thread key
>>> -  @param identity the thread identity
>>> -  @param id the thread identifier
>>> -  @return an instrumented thread, or NULL.
>>> -*/
>>> -PSI_thread* init_psi_thread_with_id(PSI_thread_key key,
>>> -                                    void* identity,
>>> -                                    ulong id);
>> By the way I like the new way much better than the previous one using
>> store_globals(real_thread). Thanks!
> 
> Not to mention, now the parent/child relationship between instrumented
> threads can be properly captured, for performance_schema.THREADS later.

Very very good.

>>> === modified file 'sql/scheduler.cc'
>>> --- a/sql/scheduler.cc    2009-05-16 00:41:08 +0000
>>> +++ b/sql/scheduler.cc    2009-05-21 19:42:38 +0000
>>> @@ -513,11 +516,7 @@ static void libevent_add_connection(THD
>>>      DBUG_VOID_RETURN;
>>>    }
>>>  
>>> -#ifdef HAVE_PSI_INTERFACE
>>> -  thd->m_psi= init_psi_thread_with_id(key_thread_one_connection, thd,
>>> thd->thread_id);
>>> -#else
>>> -  thd->m_psi= NULL;
>>> -#endif
>>> +  mysql_thread_prepare(key_thread_one_connection, thd,
>>> thd->thread_id, & thd->m_psi);
>> All calls to this function are:
>> mysql_thread_prepare(x, y, z, & thd->m_psi);
>> So, if P_S is not compiled in, they expand to
>> thd->m_psi= 0;
>> - I still don't understand why there has to be a m_psi in THD if P_S is
>> not compiled in (maybe I'll know when I read your previous replies)
> 
> See the comments in thd_scheduler::m_psi

The comment says that this is "because doing so will change the binary
layout of THD, which is exposed to plugin code that may be compiled
differently.".
That is, it says that THD is part of the plugin API.
However Serg wrote, in a previous email:
http://lists.mysql.com/internals/36526
"Marc is right, no ifdefs in the interface structures.
THD is not part of the interface, though.
handler is."
So I don't understand well.
However, it's not a very important issue to me.

>>> === modified file 'sql/sql_parse.cc'
>>> --- a/sql/sql_parse.cc    2009-05-18 15:36:08 +0000
>>> +++ b/sql/sql_parse.cc    2009-05-21 19:42:38 +0000
>>> @@ -461,10 +461,7 @@ pthread_handler_t handle_bootstrap(void
>>>  {
>>>    THD *thd= (THD*) arg;
>>>  
>>> -#ifdef HAVE_PSI_INTERFACE
>>> -  thd->m_psi= init_and_bind_psi_thread_with_id(key_thread_bootstrap,
>>> -                                               thd, thd->thread_id);
>>> -#endif
>>> +  mysql_thread_set_id(thd->thread_id);
>> So here it feels like a misleading (too vague) function name. The line
>> above suggests that this sets the thread's id. There are several thread
>> ids (THD::thread_id, the one of P_S), the function's name does not
>> suggest that this is about the P_S one. By default, when reading
>> sql_parse, one may even guess wrongly (being more used to thd->thread_id
>> than to P_S').
> 
> Well, all the instrumentation API have been named:
> - in lowercase
> - mysql_<object>_<operation>
> as for example mysql_mutex_lock, mysql_file_delete, etc.
> 
> This naming scheme is consistent, and has been changed already to
> address previous comments.

Yes.

> As naming for other things are not always consistent, there will always
> be 1 exception somewhere where 1 instrumentation API has a name close to
> another API, no matter what naming convention is used.
> 
> No change.

Ah.
So, mysql_thread_set_id(), in sql/ code, means "set the P_S-specific
thread id", and not set "THD::thread_id" or "THD::real_id" as the
function's name could suggest to a user more versed in sql/, where
THD::thread_id is all around, than in P_S.

I understand the desire to keep a uniform naming convention, it's good;
however in case of conflict like here, it's not good to introduce
confusion for the sake of uniformity. Just like when there was a
conflict between mysql_file.h::mysql_close() and the client
mysql_close(), the former was rather named mysql_file_close().

Here is another, more "fundamental" (big word for a naming issue) reason
why the name is misleading. mysql_thread_*(), to any reader, is a
wrapper around the OS threads API; it manipulates OS threads.
mysql_thread_create() creates an OS thread (and does some additional
P_S-specific stuff). mysql_thread_set_id() will be understood as "set
the ID of an OS thread". The ID of an OS thread is by definition
pthread_self() ("DESCRIPTION The pthread_self() function shall return
the thread ID of the calling thread." say the man pages). So
mysql_thread_set_id() is a misleading name because it does not set the
OS thread's ID (which is pthread_self()), as users of threads would
expect; it rather sets a P_S-specific ID. It usurps its name because
this name should be reserved to setting what is the OS thread ID
(pthread_self()). Just like, by the way, THD::thread_id usurps its name.

That's why I would name it mysql_thread_set_psi_id(); the ID which is
set is not the thread's ID (pthread_self()) it's a P_S-specific ID which
thus be mentioned in the name. I don't find that this violates the
naming convention at all. And I don't find that the problem is only due
to a bad naming convention in existing code either.

If you still disagree, let's keep this as a "disagreement item" for
later processing.

>>>  
>>>    do_handle_bootstrap(thd);
>>>    return 0;
>>> === modified file 'storage/perfschema/pfs.cc'
>>> --- a/storage/perfschema/pfs.cc    2009-05-19 15:56:07 +0000
>>> +++ b/storage/perfschema/pfs.cc    2009-05-21 19:42:38 +0000
>>> @@ -977,15 +977,65 @@ static void create_file_v1(PSI_file_key
>>>      file_lost++;
>>>  }
>>>  
>>> +struct PFS_spawn_thread_arg
>>> +{
>>> +  PFS_thread *m_parent_thread;
>>> +  PSI_thread_key m_child_key;
>>> +  const void *m_child_identity;
>>> +  void *(*m_user_start_routine)(void*);
>> Use type pthread_handler_t here.
> 
> I don't see pthread_handler_t on my platform, nor a pthread_handler.
> This comes from the prototype of pthread_create().

pthread_handler_t is defined in my_pthread.h. It's used like here:
pthread_handler_t handle_slave_io(void *arg)
{

>>> +  if (psi_arg == NULL)
>>> +    return EAGAIN;
>> As this is very unlikely, I'd use unlikely().

How about this one?

  > Added comments, and moved m_psi in thd_scheduler.
> 
> Thanks for the analysis, I agree thd_scheduler::m_psi is much better.

Eh, thanks for having done the changes (and so fast).

-- 
Mr. Guilhem Bichot <guilhem@stripped>
Sun Microsystems / MySQL, Lead Software Engineer
Bordeaux, France
www.sun.com / www.mysql.com



Thread
bzr commit into mysql-6.0-perfschema branch (marc.alff:3150) Marc Alff21 May
  • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Guilhem Bichot26 May
    • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Marc Alff26 May
      • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Guilhem Bichot28 May
        • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Marc Alff1 Jul
          • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Guilhem Bichot10 Jul
            • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Marc Alff10 Jul
              • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Guilhem Bichot27 Jul
                • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Marc Alff28 Jul
                  • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Guilhem Bichot28 Jul