List:Commits« Previous MessageNext Message »
From:Kevin Lewis Date:June 25 2008 10:59pm
Subject:RE: bzr commit into mysql-6.0-falcon branch (vvaintroub:2717) WL#4173
View as plain text  
Vlad,

A faster algorithm in this case is very prudent.  I'm not just making this
up either.  At Pervasive, we had the exact same question of using a faster
algorithm for checksums on pages.  It made a difference then, and I suspect
it will make a difference now.  In fact, I'm sure of it.

Kevin

>-----Original Message-----
>From: Vladislav Vaintroub [mailto:vaintroub@stripped]
>Sent: Wednesday, June 25, 2008 5:36 PM
>To: 'Kevin Lewis'; 'Vladislav Vaintroub'; commits@stripped
>Subject: RE: bzr commit into mysql-6.0-falcon branch (vvaintroub:2717)
>WL#4173
>
>> +static const uint16 NON_ZERO_CHECKSUM_MAGIC = 0xAFFE;
>It should be non-zero , so I took the first hex constant I could think of.
>Affe is monkey in German, so it gives software some personal touch ;)
>AFAIK this constant it is not widely used, unlike  0xdead, 0xbeef, 0xf00d,
>0xbaad, 0xbabe, 0xcafe or 0xabba ;)
>
>>  Why not do two loops?  One from data to skipStart and the other
>> from
>> skipEnd to the en of the page.
>Valid point. Maybe it is worth doing. Maybe not. It is possible that a
smart
>compiler would do it for me and maybe it can optimize even better than
that,
>unroll loops and do some arithmetic after figuring at compile time  that
>skipStart and skipEnd to are only resolved to data+2 and data+4.
>I think we should take this possible optimization into account. But I think
>also it is better for now to measure the total cost of checksum calculation
>and then discuss the potential gain of saving a branch instruction on the
>cost of readability? If checksum costs say 0.2% and we'll be able to speed
>up it to 10%, the net gain will be 0.02%, and I'd argue, in this case
>readability is more important.
>
>
>
>
>> -----Original Message-----
>> From: Kevin Lewis [mailto:klewis@stripped]
>> Sent: Wednesday, June 25, 2008 11:51 PM
>> To: 'Vladislav Vaintroub'; commits@stripped
>> Subject: RE: bzr commit into mysql-6.0-falcon branch (vvaintroub:2717)
>> WL#4173
>>
>> Vlad,
>>
>> The checksum function needs to be as fast as possible.  You have added
>> a
>> check for;
>>
>> if (!(data >= skipStart && data < skipEnd))
>>
>> in order to skip the checksum bytes.  But it does this on every byte in
>> the
>> page.  Why not do two loops?  One from data to skipStart and the other
>> from
>> skipEnd to the en of the page.
>>
>> Question,  where did you get this number?
>>
>> +static const uint16 NON_ZERO_CHECKSUM_MAGIC = 0xAFFE;
>>
>> >-----Original Message-----
>> >From: Vladislav Vaintroub [mailto:vvaintroub@stripped]
>> >Sent: Tuesday, June 24, 2008 7:21 PM
>> >To: commits@stripped
>> >Subject: bzr commit into mysql-6.0-falcon branch (vvaintroub:2717)
>> WL#4173
>> >
>> >#At bzr+ssh://bk-internal.mysql.com/bzrroot/server/mysql-6.0-falcon/
>> >
>> > 2717 Vladislav Vaintroub	2008-06-25
>> >      WL#4173 - Falcon checksums
>> >
>> >      - New parameter falcon_checksums (default ON) to calculate and
>> verify
>> >page checksums.
>> >      - If parameter is set, verify checksums as pages are read from
>> disk.
>> >        On error, stop with abort()/ DebugBreak() after message
>> including
>> >filename,
>> >        file offset and page size has been written to the error log.
>> >      - Update page->checksum as pages are written to the disk. If
>> >falcon_checksums is not
>> >        set page->checksum is set to magic value 0 (believed to be
>> always
>> >correct by validation
>> >        routine)
>> >      - Checksum algorithm is slightly modified Fletchers 8 bit (never
>> >produces 0 checksum)
>> >modified:
>> >  mysql-test/suite/falcon/r/falcon_options.result
>> >  mysql-test/suite/falcon/r/falcon_options2.result
>> >  storage/falcon/IO.cpp
>> >  storage/falcon/IOx.h
>> >  storage/falcon/Page.h
>> >  storage/falcon/SectorBuffer.cpp
>> >  storage/falcon/StorageParameters.h
>> >
>> >per-file messages:
>> >  mysql-test/suite/falcon/r/falcon_options.result
>> >    New parameter falcon_checksums
>> >  mysql-test/suite/falcon/r/falcon_options2.result
>> >    New parameter falcon_checksums
>> >  storage/falcon/IO.cpp
>> >    implement page checksum calculation and validation.
>> >    - use modified fletcher checksum algorithm
>> >    - update checksums, before pages are written to disk.
>> >      set page->checksum to 0 if falcon_checksums is not set
>> >    - verify checksums, as pages are read from disk.
>> >      Skip verification is falcon_checksums is not set or page-
>> >checksum is
>> >0
>> >  storage/falcon/IOx.h
>> >    New methods :
>> >    -  readPages( pread() plus checksum verification)
>> >    -  checksum (modified Fletchers algorithm)
>> >    -  pageChecksum()
>> >    -  validateChecksum()
>> >  storage/falcon/Page.h
>> >    checksum is made *unsigned* short
>> >    (avoids couple of cast, checksum algorithms internally work
>> >    with and produce unsigned values)
>> >  storage/falcon/SectorBuffer.cpp
>> >    replace pread with readPages - later does checksum verification in
>> >addition to pread()
>> >  storage/falcon/StorageParameters.h
>> >    New parameter falcon_checksums
>> >=== modified file 'mysql-test/suite/falcon/r/falcon_options.result'
>> >--- a/mysql-test/suite/falcon/r/falcon_options.result	2008-06-23
>> >10:22:43 +0000
>> >+++ b/mysql-test/suite/falcon/r/falcon_options.result	2008-06-25
>> >00:21:05 +0000
>> >@@ -1,6 +1,7 @@
>> > SHOW VARIABLES LIKE 'falcon_%';
>> > Variable_name	Value
>> > falcon_checkpoint_schedule	7 * * * * *
>> >+falcon_checksums	ON
>> > falcon_consistent_read	ON
>> > falcon_debug_mask	0
>> > falcon_debug_server	OFF
>> >@@ -90,6 +91,7 @@ SELECT * FROM INFORMATION_SCHEMA.global_
>> > WHERE variable_name LIKE 'falcon%';
>> > VARIABLE_NAME	VARIABLE_VALUE
>> > FALCON_CHECKPOINT_SCHEDULE	7 * * * * *
>> >+FALCON_CHECKSUMS	ON
>> > FALCON_CONSISTENT_READ	ON
>> > FALCON_DEBUG_MASK	4096
>> > FALCON_DEBUG_SERVER	OFF
>> >
>> >=== modified file 'mysql-test/suite/falcon/r/falcon_options2.result'
>> >--- a/mysql-test/suite/falcon/r/falcon_options2.result	2008-06-23
>> >10:22:43 +0000
>> >+++ b/mysql-test/suite/falcon/r/falcon_options2.result	2008-06-25
>> >00:21:05 +0000
>> >@@ -2,6 +2,7 @@ SELECT * FROM INFORMATION_SCHEMA.global_
>> > WHERE variable_name LiKE 'falcon_%';
>> > VARIABLE_NAME	VARIABLE_VALUE
>> > FALCON_CHECKPOINT_SCHEDULE	7 * * * * *
>> >+FALCON_CHECKSUMS	ON
>> > FALCON_CONSISTENT_READ	ON
>> > FALCON_DEBUG_MASK	0
>> > FALCON_DEBUG_SERVER	OFF
>> >
>> >=== modified file 'storage/falcon/IO.cpp'
>> >--- a/storage/falcon/IO.cpp	2008-04-29 08:39:19 +0000
>> >+++ b/storage/falcon/IO.cpp	2008-06-25 00:21:05 +0000
>> >@@ -100,10 +100,14 @@ static  int getLinuxVersion();
>> > //#define SIMULATE_DISK_FULL
>1000000;
>> >
>> > extern uint		falcon_direct_io;
>> >+extern char		falcon_checksums;
>> >
>> > static const int TRACE_SYNC_START	= -1;
>> > static const int TRACE_SYNC_END		= -2;
>> >
>> >+static const uint16 NO_CHECKSUM_MAGIC = 0;
>> >+static const uint16 NON_ZERO_CHECKSUM_MAGIC = 0xAFFE;
>> >+
>> > #ifdef SIMULATE_DISK_FULL
>> > static int simulateDiskFull = SIMULATE_DISK_FULL;
>> > #endif
>> >@@ -243,7 +247,7 @@ void IO::readPage(Bdb * bdb)
>> > 		FATAL ("can't continue after fatal error");
>> >
>> > 	SEEK_OFFSET offset = (int64) bdb->pageNumber * pageSize;
>> >-	int length = pread (offset, pageSize, (UCHAR *)bdb->buffer);
>> >+	int length = readPages (offset, pageSize, (UCHAR *)bdb->buffer);
>> >
>> > 	if (length != pageSize)
>> > 		{
>> >@@ -255,6 +259,39 @@ void IO::readPage(Bdb * bdb)
>> > 	++reads;
>> > }
>> >
>> >+/*
>> >+  Read from tablespace file and perform and checksum verification
>> >+*/
>> >+int IO::readPages(int64 offset, int length, UCHAR *buffer)
>> >+{
>> >+	int result = pread (offset, length , buffer);
>> >+
>> >+	if(!falcon_checksums || result != length)
>> >+		return result;
>> >+
>> >+	// Verify checksums
>> >+	Page *page = (Page *)buffer;
>> >+
>> >+	for(int i = 0; i < length/pageSize ; i++ , page = (Page *)((UCHAR
>> >*)page + pageSize))
>> >+		validateChecksum(page, pageSize, offset + i*pageSize);
>> >+
>> >+	return result;
>> >+}
>> >+
>> >+void IO::validateChecksum(Page *page, size_t pageSize, int64
>> fileOffset)
>> >+{
>> >+	if (page->checksum == NO_CHECKSUM_MAGIC)
>> >+		return;
>> >+
>> >+	uint16 chksum = pageChecksum(page, pageSize);
>> >+
>> >+	if (page->checksum != chksum)
>> >+		FATAL("Checksum error (expected %d, got %d) at page %d"
>> >+			"file '%s', offset " I64FORMAT ", pagesize %d",
>> >+			(int)chksum, (int)page->checksum,
>> >(int)(fileOffset/pageSize),
>> >+			fileName, fileOffset, pageSize);
>> >+}
>> >+
>> > bool IO::trialRead(Bdb *bdb)
>> > {
>> > 	Sync sync (&syncObject, "IO::trialRead");
>> >@@ -278,23 +315,7 @@ void IO::writePage(Bdb * bdb, int type)
>> >
>> > 	ASSERT(bdb->pageNumber != HEADER_PAGE || ((Page*)(bdb->buffer))-
>> >>pageType == PAGE_header);
>> > 	tracePage(bdb);
>> >-	SEEK_OFFSET offset = (int64) bdb->pageNumber * pageSize;
>> >-	int length = pwrite (offset, pageSize, (UCHAR *)bdb->buffer);
>> >-
>> >-	if (length != pageSize)
>> >-		{
>> >-		if (errno == ENOSPC)
>> >-			throw SQLError(DEVICE_FULL, "device full error on
>> %s,
>> >page %d\n", (const char*) fileName, bdb->pageNumber);
>> >-
>> >-		declareFatalError();
>> >-		FATAL ("write error on page %d (%d/%d/%d) of \"%s\": %s
>> (%d)",
>> >-				bdb->pageNumber, length, pageSize, fileId,
>> >-				(const char*) fileName, strerror (errno),
>> errno);
>> >-		}
>> >-
>> >-	++writes;
>> >-	++writesSinceSync;
>> >-	++writeTypes[type];
>> >+	writePages(bdb->pageNumber, pageSize, (UCHAR *)bdb->buffer,
>> type);
>> > }
>> >
>> > void IO::writePages(int32 pageNumber, int length, const UCHAR* data,
>> int
>> >type)
>> >@@ -308,7 +329,17 @@ void IO::writePages(int32 pageNumber, in
>> > 	if (simulateDiskFull && offset + length > simulateDiskFull)
>> > 		throw SQLError(DEVICE_FULL, "device full error on %s, page
>> >%d\n", (const char*) fileName, pageNumber);
>> > #endif
>> >-
>> >+
>> >+	Page *page = (Page *)data;
>> >+
>> >+	for (int i = 0;i < length/pageSize; i++ ,page = (Page *)((UCHAR
>> *)page
>> >+ pageSize))
>> >+		{
>> >+		if (falcon_checksums)
>> >+			page->checksum = pageChecksum(page, pageSize);
>> >+ 		else
>> >+			page->checksum = NO_CHECKSUM_MAGIC;
>> >+		}
>> >+
>> > 	int ret = pwrite (offset, length, data);
>> >
>> > 	if (ret != length)
>> >@@ -330,15 +361,19 @@ void IO::writePages(int32 pageNumber, in
>> >
>> > void IO::readHeader(Hdr * header)
>> > {
>> >-	static const int sectorSize = 4096;
>> >-	UCHAR temp[sectorSize * 2];
>> >-	UCHAR *buffer = (UCHAR*) (((UIPTR) temp + sectorSize - 1) /
>> sectorSize
>> >* sectorSize);
>> >+	static const int bufSize = 32*1024;
>> >+	UCHAR temp[bufSize + 4096];
>> >+	UCHAR *buffer = ALIGN(temp,4096);
>> > 	int n = lseek (fileId, 0, SEEK_SET);
>> >-	n = ::read (fileId, buffer, sectorSize);
>> >+	n = ::read (fileId, buffer, bufSize);
>> >
>> > 	if (n < (int) sizeof (Hdr))
>> > 		FATAL ("read error on database header");
>> >-
>> >+
>> >+	Hdr* hdr = (Hdr*) buffer;
>> >+	if (falcon_checksums && hdr->pageSize <= n)
>> >+		validateChecksum((Page*) buffer, hdr->pageSize, 0);
>> >+
>> > 	memcpy(header, buffer, sizeof(Hdr));
>> > }
>> >
>> >@@ -679,3 +714,51 @@ static int getLinuxVersion()
>> > 	return vers = KERNEL_VERSION(major,minor,release);
>> > }
>> > #endif
>> >+
>> >+/*
>> >+	Fletcher's checksum, with tweaks
>> >+	Tweak 1 : never produces 0 checksum.
>> >+	Tweak 2 : allows to skip some bytes in buffer
>> >+*/
>> >+
>> >+uint16 IO::checksum(UCHAR *data, size_t len, UCHAR *skipStart, UCHAR
>> >*skipEnd)
>> >+{
>> >+	uint16 sum1 = 0xff, sum2 = 0xff;
>> >+
>> >+	while (len)
>> >+		{
>> >+		size_t tlen = len > 21 ? 21 : len;
>> >+		len -= tlen;
>> >+		do
>> >+			{
>> >+			if (!(data >= skipStart && data < skipEnd))
>> >+				{
>> >+				sum1 += *data;
>> >+				sum2 += sum1;
>> >+				}
>> >+			data++;
>> >+			}
>> >+		while (--tlen);
>> >+		sum1 = (sum1 & 0xff) + (sum1 >> 8);
>> >+		sum2 = (sum2 & 0xff) + (sum2 >> 8);
>> >+		}
>> >+	/* Second reduction step to reduce sums to 8 bits */
>> >+	sum1 = (sum1 & 0xff) + (sum1 >> 8);
>> >+	sum2 = (sum2 & 0xff) + (sum2 >> 8);
>> >+	uint16 result = ((sum2 & 0xff) << 8) | (sum1 & 0xff);
>> >+	if(result == 0)
>> >+		result = NON_ZERO_CHECKSUM_MAGIC;
>> >+	return result;
>> >+}
>> >+
>> >+/*
>> >+  Calculate checksum for a page.
>> >+  Skips 2 bytes checksum field in the page header
>> >+*/
>> >+uint16 IO::pageChecksum(Page *page, size_t size)
>> >+{
>> >+	UCHAR *data = (UCHAR*)page;
>> >+	return checksum(data, size , data + OFFSET(Page*,checksum),
>> >+		data + OFFSET(Page*,checksum) + sizeof(page->checksum));
>> >+}
>> >+
>> >
>> >=== modified file 'storage/falcon/IOx.h'
>> >--- a/storage/falcon/IOx.h	2008-03-27 06:09:29 +0000
>> >+++ b/storage/falcon/IOx.h	2008-06-25 00:21:05 +0000
>> >@@ -39,6 +39,7 @@ static const int WRITE_TYPE_MAX			= 6;
>> > class Bdb;
>> > class Hdr;
>> > class Dbb;
>> >+class Page;
>> >
>> > class IO
>> > {
>> >@@ -60,6 +61,7 @@ public:
>> > 	void	writePage (Bdb *buffer, int type);
>> > 	void	writePages(int32 pageNumber, int length, const UCHAR* data,
>> int
>> >type);
>> > 	void	readPage (Bdb *page);
>> >+	int		readPages(int64 offset, int length, UCHAR* data);
>> > 	bool	createFile (const char *name, uint64 initialAllocation);
>> > 	bool	openFile (const char *name, bool readOnly);
>> > 	void	longSeek(int64 offset);
>> >@@ -75,7 +77,10 @@ public:
>> > 	static void		trace(int fd, int pageNumber, int pageType,
>> int
>> >pageId);
>> > 	static void		traceOpen(void);
>> > 	static void		traceClose(void);
>> >-
>> >+	uint16	checksum(UCHAR *data, size_t len, UCHAR *skipStart,
>> UCHAR
>> >*skipEnd);
>> >+	uint16	pageChecksum(Page *page, size_t pageSize);
>> >+	void	validateChecksum(Page *page, size_t pageSize, int64
>> fileOffset);
>> >+
>> > 	static void		createPath (const char *fileName);
>> > 	static const char *baseName(const char *path);
>> > 	static void		expandFileName(const char *fileName, int
>> length,
>> >char *buffer, const char **baseFileName = NULL);
>> >
>> >=== modified file 'storage/falcon/Page.h'
>> >--- a/storage/falcon/Page.h	2008-06-17 17:41:54 +0000
>> >+++ b/storage/falcon/Page.h	2008-06-25 00:21:05 +0000
>> >@@ -58,8 +58,8 @@ class EncodedDataStream;
>> > class Page
>> > {
>> > public:
>> >-	short	pageType;
>> >-	short	checksum;
>> >+	int16	pageType;
>> >+	uint16	checksum;
>> >
>> > #ifdef HAVE_PAGE_NUMBER
>> > 	int32	pageNumber;
>> >
>> >=== modified file 'storage/falcon/SectorBuffer.cpp'
>> >--- a/storage/falcon/SectorBuffer.cpp	2008-06-17 21:00:45 +0000
>> >+++ b/storage/falcon/SectorBuffer.cpp	2008-06-25 00:21:05 +0000
>> >@@ -40,7 +40,7 @@ void SectorBuffer::readPage(Bdb* bdb)
>> > void SectorBuffer::readSector()
>> > {
>> > 	uint64 offset = sectorNumber * cache->pagesPerSector * cache-
>> >>pageSize;
>> >-	activeLength = dbb->pread(offset, SECTOR_BUFFER_SIZE, buffer);
>> >+	activeLength = dbb->readPages(offset, SECTOR_BUFFER_SIZE,
>> buffer);
>> > }
>> >
>> > void SectorBuffer::setSector(Dbb* db, int sector)
>> >
>> >=== modified file 'storage/falcon/StorageParameters.h'
>> >--- a/storage/falcon/StorageParameters.h	2008-03-11 16:15:47
>> +0000
>> >+++ b/storage/falcon/StorageParameters.h	2008-06-25 00:21:05
>> +0000
>> >@@ -11,7 +11,7 @@
>> > // #define PLUGIN_VAR_OPCMDARG     0x2000 = Argument optional for cmd
>> line
>> > // #define PLUGIN_VAR_MEMALLOC     0x8000 = String needs memory
>> allocated
>> >
>> >-
>> >+PARAMETER_BOOL(checksums, "Enable Falcon checksum validation", 1,
>> 0x200,
>> >NULL)
>> > PARAMETER_UINT(debug_mask, "Falcon message type mask for logged
>> messages.",
>> >0, 0, INT_MAX, 0, StorageInterface::updateDebugMask)
>> > PARAMETER_BOOL(debug_server, "Enable Falcon debug code.", 0, 0x0200,
>> NULL)
>> > PARAMETER_UINT(debug_trace, "Falcon debug trace trigger.", 0, 0,
>> INT_MAX,
>> >0, NULL)
>> >
>> >
>> >--
>> >MySQL Code Commits Mailing List
>> >For list archives: http://lists.mysql.com/commits
>> >To unsubscribe:
>> http://lists.mysql.com/commits?unsub=1


Thread
bzr commit into mysql-6.0-falcon branch (vvaintroub:2717) WL#4173Vladislav Vaintroub25 Jun
  • RE: bzr commit into mysql-6.0-falcon branch (vvaintroub:2717) WL#4173Kevin Lewis25 Jun
    • RE: bzr commit into mysql-6.0-falcon branch (vvaintroub:2717) WL#4173Vladislav Vaintroub26 Jun
      • RE: bzr commit into mysql-6.0-falcon branch (vvaintroub:2717) WL#4173Kevin Lewis26 Jun
  • Re: bzr commit into mysql-6.0-falcon branch (vvaintroub:2717) WL#4173Olav Sandstaa2 Jul
    • RE: bzr commit into mysql-6.0-falcon branch (vvaintroub:2717) WL#4173Vladislav Vaintroub5 Jul