Vladislav Vaintroub wrote:
> I keep getting errors in index recovery (typically a lost parent page of an
> index page: it is not on disk, and I cannot find any information about it
> anywhere in the log).
> I have an idea on how to log splits to make recovery work better. Please
> give me your feedback.
> The basic idea is that we no longer log single pages but a bunch of
> pages (every page that was changed during the split). And we do it
> atomically, in a single serial log record. And we do not release them until
> they are logged.
> Our index page has links to other pages (next on the same level, prior
> on the same level and parent), and when splitting, some or all of them are
> modified. Also, a new page is always created (and I believe even 2
> That means a new record type that includes several pages would be somewhat
> heavier than the individual pages we used to log. On the other hand, a split should
> be considered a relatively rare operation, most page updates do not
> But the benefits are obvious (to me at least):
> next/prior chain is consistent, parent does not lose the child, child does
> not lose the parent and we do not need to think about the order when
> logging. We log an atomic operation (split) and there is no way that
> recovery gets an inconsistent index because the server stopped while doing a
> split and while
> Please share your thoughts.
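
For reference, the proposal quoted above could look roughly like the sketch
below. It is only an illustration in Python, not code from the actual server:
IndexPage, log_split and replay_split are hypothetical names, and the log is
modelled as a plain list of serialized records. The point it shows is that
every page touched by a split travels in one record, so recovery either sees
all of them or none of them.

```python
import json
from dataclasses import dataclass, field


@dataclass
class IndexPage:
    # Hypothetical page layout: only the links mentioned in the proposal.
    page_no: int
    parent: int = 0
    prior: int = 0
    next: int = 0
    keys: list = field(default_factory=list)


def log_split(log, pages):
    """Append ONE record carrying every page changed by the split."""
    record = {"type": "index_split",
              "pages": [dict(vars(p)) for p in pages]}
    log.append(json.dumps(record))  # single serial log record


def replay_split(record_text, page_store):
    """Recovery side: reinstall all pages of a split record together."""
    record = json.loads(record_text)
    for image in record["pages"]:
        page_store[image["page_no"]] = IndexPage(**image)


# Example: a split that touches the parent and both children.
log = []
parent = IndexPage(1, keys=[50])
left = IndexPage(2, parent=1, next=3, keys=[10, 20])
right = IndexPage(3, parent=1, prior=2, keys=[50, 60])
log_split(log, [parent, left, right])

recovered = {}
replay_split(log[0], recovered)
```

The sketch leaves out the part of the proposal about holding the pages until
the record is written; in a real server that would be a latch or lock held
across the log_split call.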
I think we need to find out what is happening. Pages shouldn't get
lost. If there's a bug, it should be fixed, not papered over.
The parent and both children are all in use while all three are logged.
I don't understand how they could get out of sync.
Is there any chance that an attempted optimization has short-circuited
the interlock between log pruning and the flush of the page cache? We
do depend on a page having been written when computing the recovery
point. If the page cache, in fact, has not been completely written, all
hell will break loose during recovery.
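
To make the dependency concrete, here is a minimal sketch in Python of the
kind of interlock I mean. The names (PageCache, mark_dirty, recovery_point)
are made up for illustration and are not the server's actual interfaces; the
assumption is that each dirty page remembers the log offset of its oldest
unwritten change, and that the log may only be pruned below the smallest such
offset.

```python
class PageCache:
    """Hypothetical cache that tracks the oldest unwritten change per page."""

    def __init__(self):
        self.dirty = {}  # page_no -> log offset of oldest unflushed change

    def mark_dirty(self, page_no, log_offset):
        # Keep the FIRST offset: later changes do not move it forward.
        self.dirty.setdefault(page_no, log_offset)

    def flush(self, page_no):
        # Called only after the page has really been written to disk.
        self.dirty.pop(page_no, None)


def recovery_point(cache, log_end):
    """Log records below this offset are safe to prune.

    If nothing is dirty, the whole log up to log_end can go.
    """
    return min(cache.dirty.values(), default=log_end)


cache = PageCache()
cache.mark_dirty(7, 100)
cache.mark_dirty(9, 140)
cache.mark_dirty(7, 160)   # page 7 still needs the log from offset 100
safe = recovery_point(cache, log_end=200)
```

If anything removes a page from the dirty set before the write has actually
completed, recovery_point moves past records that are still needed, and the
page is then neither on disk nor in the log.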
You seem to have an anomaly that shouldn't happen if the other
mechanisms are working properly. I strongly suggest you find what the
actual failure is rather than working around the failure.
President, NimbusDB, Inc.