Hi Guilhem
Guilhem Bichot wrote:
> Hi Marc,
>
> 1) I have the impression that all m_flags class members are unused
> except in some DBUG_ASSERT(), is that true? If yes, they just add
> complexity for little value. Can they be removed? PSI_FLAG_GLOBAL is
> unused?
>
So far this flag is unused.
The intent was to use the knowledge that an instrument is a singleton to
shortcut the aggregation chain "by instance", "by class (=event name)"
to have only an aggregation bucket per class, and share it.
Given that the server has a lot of singletons with mutexes, it could be
worth it.
Summary table per instance then would display the per class data,
transparently.
This would improve the code efficiency during instrumentation, at the
cost of some minor extra complexity when an instrument is instantiated.
At some point, we will need to see if this optimization really has some
benefit and should be implemented, or if the flag can be removed and the
code simplified.
I'll keep that as an unresolved item that will need to be fixed.
> 2) Looking at new files in mysql-test/, I see only tests which verify
> that P_S table exist, that forbidden operations are really forbidden
> (like deletions in certain P_S tables), but nothing which tests that P_S
> gives correct statistical numbers. How about testing the instrumentation
> of key objects, for example:
> - create a MyISAM table and verify that P_S reports that
> THR_LOCK_myisam, LOCK_mdl, etc were taken the expected number of times
> - same with conditions, tables, threads...
> ?
Keep in mind that all the tests have not been written yet, especially
functional tests.
I agree these tests will have a great value, but I am also concerned
about their maintenance: every time the server *implementation* changes,
the tests will be broken.
I think this pattern can be used to cover some instruments, for which
the behavior of the code is well known and fixed, but could be
problematic for other more volatile things (like, instrumenting malloc)
I will look into helper scripts to write this kind of tests and see how
it goes.
>
> 3) About your recent fix with HA_FAST_KEY_READ. As you wrote, P_S is
> fast to read keys in random order, so in all logic it should have this
> flag, but for other reasons (consistency) we rather have to rely on a
> trick (don't use HA_FAST_KEY_READ and rows will be cached). As the
> behaviour isn't guaranteed to stay (I don't think Optimizer folks will
> remember that they MUST keep caching on for the sake of P_S), could you
> please write a .test, so that if the behaviour is later broken this is
> caught?
> I understand that this involves concurrency, but this is testable with
> "debug sync points" (they exist in 6.0 and are used to test Online
> Backup); here's how: I think you need a second thread to do something
> while a first thread is between rnd_next() and rnd_pos(); then you would
> put a debug sync point between those two calls; the test would make the
> first thread start its SELECT statement (which reads P_S tables if I
> understood correctly), pause at the sync point, then the test would let
> the second thread run which overwrites data in P_S tables, and the
> trouble should come when the test does "reap" in the first thread. See
> example in suite/backup/t/backup_lock_myisam.test. Test should fail when
> HA_FAST_KEY_READ is used, blah blah you know the thing. If you have
> trouble with setting up your first sync points, just ask.
> It's ok to not do it right now, but I hope you could put this in your
> short-term todo (before push to main).
> In Maria, sometimes for certain fixes of concurrency bugs I haven't
> written a .test, but that was only because I knew that those bugs are
> automatically always tested by the Random Query Generator in pushbuild2
> for every push.
>
Thanks, I will look into the debug sync points.
Do you know if there is a specific flag to use to tell the optimizer to
always cache records, and if not should this flag be implemented ?
> 4) In what user-visible view do the *_lock_stat counters
> (read_lock_stat, write_lock_stat etc) end up?
Right now this data is maintained in the instrumentation but not exposed.
The first set of tables for the performance schema basically exposes
"WAIT" data.
Another set of tables with "LOCK" or "STATE" data will be implemented
once the specifications for this are available.
In case of how long a mutex is held (the lock stats), we already know
that this data will be displayed eventually, even if the table name and
structure to display it is not yet defined, which is why I implemented
this part in the instrumentation already.
The motivation was to make sure there will not be a road block later, to
make sure the instrumentation events works.
>
> 5) I'm looking at end_mutex_wait_v1(). There seems to be a nice logic
> with statistics rolling up to their aggregates rolling up to their
> aggregates (aggregate_single_stat_chain()) but close to this, a
> hard-coded rolling up:
> if (flag_events_waits_summary_by_thread_by_event_name).
> Calls to aggregate_single_stat() (not the _chain() version) are always
> tied to flag_events_waits_summary_by_thread_by_event_name. Is that
> summary table "breaking your nice model" somehow?
No, the model is fine.
An event is basically a data point (x1, x2, x3, ..., xN) in a
N-dimensional space.
Each aggregation step consist in moving along one axis in a given dimension
An aggregation chain is moving several steps in the same dimension.
For example,
(raw data) --> (raw by instrument instances) --> (raw by instrument class)
is one chain.
Another chain can be:
(raw data) --> (raw by thread instance) --> (raw by user@host) --> (raw
by connected hostname)
Since these 2 chains are orthogonal, the instrumentation point for the
raw event starts 2 chains, instead of having 1 chain which is the
concatenation (which would not work so well).
A chain can be arbitrary long, depending on how much / how little we
aggregate at each step (for example to illustrate, IP can be grouped by
255.255.255.0 mask, then by 255.255.0.0 mask, then by 255.0.0.0 mask,
etc -- the possibilities are endless --).
In a N-dimensional cube, with the data point at one corner, and the
EVENT_WAITS_SUMMARY_GLOBAL table (it has 1 row) at the opposite corner
at (0, 0, ..., 0), each chain can be represented by a different path
that leaves from the data point corner and progresses in direction or
the zero corner.
Note that, the SUM/MIN/MAX being associative, 2 chains do never converge
on the same point (otherwise, the same event would be counted twice).
Also, many chains will fall short and never reach the (0, 0, ..., 0)
point, but that's another topic called "higher level aggregates" for
when we will come to them (the aggregate is not maintained on real time,
but computed on demand during SELECT * from P_S.xxx).
None of these tables are implemented yet.
That's for the model itself.
Now, in case of EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME, since
there is no aggregate for all the threads created by a given user@host
(yet) or all the traffic originating from a given hostname (yet), the
chain length is currently 1.
For chains of length 1, the code is inlined.
I guess i need to write more doxygen pages then :)
Regards,
-- Marc