MySQL Lists are EOL. Please join:

List:Commits« Previous MessageNext Message »
From:Marc Alff Date:July 1 2009 5:38pm
Subject:Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)
View as plain text  
Hi Guilhem

Guilhem Bichot wrote:
> Hello Marc,
> 
> Marc Alff a écrit, Le 26.05.2009 23:53:
>> Hi Guilhem
>>
>> Guilhem Bichot wrote:
>>> Bonjour Marc,
>>>
>>> Marc Alff a écrit, Le 21.05.2009 21:42:
>>>> #At file:///home/malff/BZR-TREE/mysql-6.0-perfschema/ based on
>>>> revid:marc.alff@stripped
>>>>
>>>>  3150 Marc Alff    2009-05-21
>>>>       Implemented code review comments, continued.
>>>>             Reworked all the thread instrumentation:
>>>>       - reverted the THD::store_global(bool) change implemented
>>>> earlier,
>>>>       - removed the init_and_bind_psi_thread calls
>>>>             Instead, instrumented threads by intercepting the
>>>> pthread_create()
>>>>       call with an instrumented mysql_thread_create().
> 
>>>> === modified file 'include/mysql/psi/psi.h'
>>>> --- a/include/mysql/psi/psi.h    2009-05-13 16:44:00 +0000
>>>> +++ b/include/mysql/psi/psi.h    2009-05-21 19:42:38 +0000
> 
>>>>  /**
>>>>    Create instrumentation for a thread.
>>>>    @param key the registered key
>>>> -  @param identity the address of the rwlock itself
>>>> +  @param identity an address typical of the thread
>>> Have you considered using, as "identity", what's returned by
>>> pthread_self() (it's in THD::real_id)?
>>
>> With thread pools, the instrumented thread is created without a real
>> thread, so pthread_self() can not be used.
> 
> Ah, indeed.
> 
>> Note that the identity pointer is not provided by the caller of
>> mysql_thread_create(), so the code writer (who instruments something)
>> does not have to worry about that.
> 
> Well, let's look at this example from scheduler.cc:
>   for (i= 0; i < thread_pool_size; i++)
>   {
>     pthread_t thread;
>     int error;
>     if ((error= mysql_thread_create(key_thread_libevent,
>                                     &thread, &connection_attrib,
>                                     libevent_thread_proc, 0)))
> Here, spawn_thread_v1 is going to use &thread as "identity" for the
> create_thread() call. "thread" is a local variable in the loop, it has
> chances to have the same address in all iterations of the loop (it is
> the case on my Linux build). So we'll have several threads with the same
> identity, which defeats the purpose of "identity" if I understood
> correctly.
> If the code writer wants to avoid that, she/he has to provide something
> else than 0 as the last parameter of mysql_thread_create(), so has to
> care about this.
> How about just putting in the comment of mysql_thread_create() something
> like this:
> "P4 (or if it's NULL, P1) will be (later?) used as identity of the
> thread in Performance Schema tables".
> 

Added comments in mysql_thread_create().


>  > We might have pthread_self() somewhere in the
> performance_schema.THREADS table, later.
> 
> Ok, later, maybe.
> 
>>>> === modified file 'sql/event_scheduler.cc'
>>>> --- a/sql/event_scheduler.cc    2009-05-12 03:15:02 +0000
>>>> +++ b/sql/event_scheduler.cc    2009-05-21 19:42:38 +0000
>>>> @@ -228,10 +228,7 @@ event_scheduler_thread(void *arg)
>>>>  
>>>>    thd->thread_stack= (char *)&thd;              // remember where
> our
>>>> stack is
>>>>  
>>>> -#ifdef HAVE_PSI_INTERFACE
>>>> -  thd->m_psi=
>>>> init_and_bind_psi_thread_with_id(key_thread_event_scheduler,
>>>> -                                               thd, thd->thread_id);
>>>> -#endif
>>>> +  mysql_thread_set_id(thd->thread_id);
>>> Why not move the call to mysql_thread_set_id() into store_globals()?
>>> Currently those two calls, mysql_thread_set_id() and store_globals(),
>>> happen very close to each other; I have the impression that by doing the
>>> move, all mysql_thread_set_id() calls (all: in sql/, sql/backup,
>>> storage/myisam, storage/maria) would be gone. It also feels natural;
>>> store_globals() is expected to do whatever is needed to get a functional
>>> "just works"-THD. It would reduce the number of calls (everything would
>>> happen in the existing store_globals() calls); there are already so many
>>> "boilerplate" operations that we copy-paste every time we want to write
>>> code to create a new utility thread (MyISAM backup driver, online backup
>>> kernel, NDB binlog, replication, event scheduler...). Last, it would
>>> reduce the presence of P_S calls in non-P_S code (like of events,
>>> backup, engines), which generally makes things simpler to understand
>>> (when one is reading code of feeature X it's better if she/he does not
>>> meet code bits of feature Y, makes less to parse).
>>>
>>
>> Every thread needs to be instrumented, and having the instrumentation in
>> the start_routine function for this thread seems natural to me.
> 
> But adding a P_S function which must be called in every thread function,
> seems like a burden to me.
> If mysql_thread_set_id(thd->thread_id) was done by THD::store_globals():
> - every thread having a THD wouldn't need an explicit hard-coded call to
> mysql_thread_set_id()
> - those few threads which don't have a THD would still need such
> hard-coded call in their thread function, yes (there are no such threads
> in the patch right now).
> 
> mysql_thread_set_id(thd->thread_id) is about setting some
> posix-thread-specific data by using some data from THD.
> This existing call: "my_pthread_setspecific_ptr(THR_THD,  this)"
> is also about setting some posix-thread-specific data (THR_THD key) by
> using some data from THD ("this" is the THD pointer here), and it
> happens in... THD::store_globals().
> 

I think you don't understand some parts here.

THD::store_global() is to bind a physical thread running (pthread) to a
logical thread (THD), using thread local storage.
This is required because the server code does not pass a THD object to
every code that needs it, so the code relies on _current_thd to get the
THD back.

This association has to be made again every time a thread from the
thread pool picks up a THD and execute it, so to simplify at every query.

mysql_thread_set_id(thd->thread_id) is to give an identifier to the
instrumentation, for the PSI_thread associated with the connection.
A THD is given an identifier only once, and the associated PSI_thread is
given the same identifier only once, when a connection is created,
typically.

So:
- THD::store_global() needs to be done for every query
- mysql_thread_set_id(thd->thread_id) needs to be done once for a thread

and as a result, it makes sense that these two lines of code are
actually in different places. Having the 2 calls in the same place would
be a design/code bug.

Calling mysql_thread_set_id(thd->thread_id) in the code that spawn a new
thread, in the start_routine code, is not a burden: it is exactly how
things should be done.

No change.

>> Instrumented threads do not have to have an associated THD.
> 
> I was unclear, I meant the calls to
> mysql_thread_set_id(*thd->thread_id*) only.
> Those without a THD need an explicit call for sure.
> 
>>>> === modified file 'sql/ha_ndbcluster.cc'
>>>> --- a/sql/ha_ndbcluster.cc    2009-05-12 03:15:02 +0000
>>>> +++ b/sql/ha_ndbcluster.cc    2009-05-21 19:42:38 +0000
>>>> @@ -10166,10 +10167,7 @@ pthread_handler_t ndb_util_thread_func(v
>>>>    }
>>>>  
>>>>  /* FIXME: waiting for Bug#40899 */
>>>> -#ifdef HAVE_PSI_INTERFACE
>>>> -  thd->m_psi= init_and_bind_psi_thread_with_id(key_thread_ndb_util,
>>>> -                                               thd, thd->thread_id);
>>>> -#endif
>>>> +  mysql_thread_set_id(thd->thread_id);
>>> Actually, thd->thread_id is 0 here so the call does nothing (m_thread_id
>>> was already 0 before it).
> 
> How about this one?
> 

Removed then.

>>>> === modified file 'sql/mysql_priv.h'
>>>> --- a/sql/mysql_priv.h    2009-05-18 15:36:08 +0000
>>>> +++ b/sql/mysql_priv.h    2009-05-21 19:42:38 +0000
>>>> @@ -2248,36 +2248,6 @@ extern PSI_file_key key_file_global_ddl_
>>>>    Initialise all the performance schema instrumentation points used
>>>> by the server.
>>>>  */
>>>>  void init_server_psi_keys();
>>>> -
>>>> -/**
>>>> -  Create an instrumented thread.
>>>> -  @param key the instrumentation thread key
>>>> -  @param identity the thread identity
>>>> -  @param id the thread identifier
>>>> -  @return an instrumented thread, or NULL.
>>>> -*/
>>>> -PSI_thread* init_psi_thread_with_id(PSI_thread_key key,
>>>> -                                    void* identity,
>>>> -                                    ulong id);
>>> By the way I like the new way much better than the previous one using
>>> store_globals(real_thread). Thanks!
>>
>> Not to mention, now the parent/child relationship between instrumented
>> threads can be properly captured, for performance_schema.THREADS later.
> 
> Very very good.
> 
>>>> === modified file 'sql/scheduler.cc'
>>>> --- a/sql/scheduler.cc    2009-05-16 00:41:08 +0000
>>>> +++ b/sql/scheduler.cc    2009-05-21 19:42:38 +0000
>>>> @@ -513,11 +516,7 @@ static void libevent_add_connection(THD
>>>>      DBUG_VOID_RETURN;
>>>>    }
>>>>  
>>>> -#ifdef HAVE_PSI_INTERFACE
>>>> -  thd->m_psi= init_psi_thread_with_id(key_thread_one_connection,
> thd,
>>>> thd->thread_id);
>>>> -#else
>>>> -  thd->m_psi= NULL;
>>>> -#endif
>>>> +  mysql_thread_prepare(key_thread_one_connection, thd,
>>>> thd->thread_id, & thd->m_psi);
>>> All calls to this function are:
>>> mysql_thread_prepare(x, y, z, & thd->m_psi);
>>> So, if P_S is not compiled in, they expand to
>>> thd->m_psi= 0;
>>> - I still don't understand why there has to be a m_psi in THD if P_S is
>>> not compiled in (maybe I'll know when I read your previous replies)
>>
>> See the comments in thd_scheduler::m_psi
> 
> The comment says that this is "because doing so will change the binary
> layout of THD, which is exposed to plugin code that may be compiled
> differently.".
> That is, it says that THD is part of the plugin API.
> However Serg wrote, in a previous email:
> http://lists.mysql.com/internals/36526
> "Marc is right, no ifdefs in the interface structures.
> THD is not part of the interface, though.
> handler is."
> So I don't understand well.
> However, it's not a very important issue to me.
> 

THD is used by some storage engines, therefore it is an interface.
Whether this was by design or not is a different topic.
I consider this point closed then.

>>>> === modified file 'sql/sql_parse.cc'
>>>> --- a/sql/sql_parse.cc    2009-05-18 15:36:08 +0000
>>>> +++ b/sql/sql_parse.cc    2009-05-21 19:42:38 +0000
>>>> @@ -461,10 +461,7 @@ pthread_handler_t handle_bootstrap(void
>>>>  {
>>>>    THD *thd= (THD*) arg;
>>>>  
>>>> -#ifdef HAVE_PSI_INTERFACE
>>>> -  thd->m_psi= init_and_bind_psi_thread_with_id(key_thread_bootstrap,
>>>> -                                               thd, thd->thread_id);
>>>> -#endif
>>>> +  mysql_thread_set_id(thd->thread_id);
>>> So here it feels like a misleading (too vague) function name. The line
>>> above suggests that this sets the thread's id. There are several thread
>>> ids (THD::thread_id, the one of P_S), the function's name does not
>>> suggest that this is about the P_S one. By default, when reading
>>> sql_parse, one may even guess wrongly (being more used to thd->thread_id
>>> than to P_S').
>>
>> Well, all the instrumentation API have been named:
>> - in lowercase
>> - mysql_<object>_<operation>
>> as for example mysql_mutex_lock, mysql_file_delete, etc.
>>
>> This naming scheme is consistent, and has been changed already to
>> address previous comments.
> 
> Yes.
> 
>> As naming for other things are not always consistent, there will always
>> be 1 exception somewhere where 1 instrumentation API has a name close to
>> another API, no matter what naming convention is used.
>>
>> No change.
> 
> Ah.
> So, mysql_thread_set_id(), in sql/ code, means "set the P_S-specific
> thread id", and not set "THD::thread_id" or "THD::real_id" as the
> function's name could suggest to a user more versed in sql/, where
> THD::thread_id is all around, than in P_S.
> 
> I understand the desire to keep a uniform naming convention, it's good;
> however in case of conflict like here, it's not good to introduce
> confusion for the sake of uniformity. Just like when there was a
> conflict between mysql_file.h::mysql_close() and the client
> mysql_close(), the former was rather named mysql_file_close().
> 
> Here is another, more "fundamental" (big word for a naming issue) reason
> why the name is misleading. mysql_thread_*(), to any reader, is a
> wrapper around the OS threads API; it manipulates OS threads.
> mysql_thread_create() creates an OS thread (and does some additional
> P_S-specific stuff). mysql_thread_set_id() will be understood as "set
> the ID of an OS thread". The ID of an OS thread is by definition
> pthread_self() ("DESCRIPTION The pthread_self() function shall return
> the thread ID of the calling thread." say the man pages). So
> mysql_thread_set_id() is a misleading name because it does not set the
> OS thread's ID (which is pthread_self()), as users of threads would
> expect; it rather sets a P_S-specific ID. It usurps its name because
> this name should be reserved to setting what is the OS thread ID
> (pthread_self()). Just like, by the way, THD::thread_id usurps its name.
> 
> That's why I would name it mysql_thread_set_psi_id(); the ID which is
> set is not the thread's ID (pthread_self()) it's a P_S-specific ID which
> thus be mentioned in the name. I don't find that this violates the
> naming convention at all. And I don't find that the problem is only due
> to a bad naming convention in existing code either.
> 
> If you still disagree, let's keep this as a "disagreement item" for
> later processing.

Renamed to mysql_thread_set_psi_id().


> 
>>>>  
>>>>    do_handle_bootstrap(thd);
>>>>    return 0;
>>>> === modified file 'storage/perfschema/pfs.cc'
>>>> --- a/storage/perfschema/pfs.cc    2009-05-19 15:56:07 +0000
>>>> +++ b/storage/perfschema/pfs.cc    2009-05-21 19:42:38 +0000
>>>> @@ -977,15 +977,65 @@ static void create_file_v1(PSI_file_key
>>>>      file_lost++;
>>>>  }
>>>>  
>>>> +struct PFS_spawn_thread_arg
>>>> +{
>>>> +  PFS_thread *m_parent_thread;
>>>> +  PSI_thread_key m_child_key;
>>>> +  const void *m_child_identity;
>>>> +  void *(*m_user_start_routine)(void*);
>>> Use type pthread_handler_t here.
>>
>> I don't see pthread_handler_t on my platform, nor a pthread_handler.
>> This comes from the prototype of pthread_create().
> 
> pthread_handler_t is defined in my_pthread.h. It's used like here:
> pthread_handler_t handle_slave_io(void *arg)
> {
> 

Ok, so using pthread_handler_t means that the performance schema
instrumentation headers now depends on the server implementation.
In turn, this means that a storage engine that only uses plain,
unmodified pthread, would be polluted with the macros from my_pthread.h

Implementors of storage engines using plain pthread, for example Falcon,
should be able to instrument their code.

No change.

>>>> +  if (psi_arg == NULL)
>>>> +    return EAGAIN;
>>> As this is very unlikely, I'd use unlikely().
> 
> How about this one?
> 

This has been implemented already.

>  > Added comments, and moved m_psi in thd_scheduler.
>>
>> Thanks for the analysis, I agree thd_scheduler::m_psi is much better.
> 
> Eh, thanks for having done the changes (and so fast).
> 
Thread
bzr commit into mysql-6.0-perfschema branch (marc.alff:3150) Marc Alff21 May
  • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Guilhem Bichot26 May
    • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Marc Alff26 May
      • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Guilhem Bichot28 May
        • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Marc Alff1 Jul
          • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Guilhem Bichot10 Jul
            • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Marc Alff10 Jul
              • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Guilhem Bichot27 Jul
                • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Marc Alff28 Jul
                  • Re: bzr commit into mysql-6.0-perfschema branch (marc.alff:3150)Guilhem Bichot28 Jul