Ingo Strüwing wrote:
> Hi Rafal,
> Rafal Somla, 22.10.2009 13:04:
>> Ingo Strüwing wrote:
>> However, the ultimate solution is to do
>> proper consistency checking using checksums or the like. One could argue
>> that it is better to rely on a complete solution rather than have a
>> false safety feeling based on some simplistic and partial solution.
> Checksums don't help. With a wrong version number, we might expect the
> checksum at a wrong place. How could we tell the user if the image is
> corrupted or the storage module confused the version number? The
> difference is relevant. In the latter case the user might recover his
> data by fixing the mess in the storage.
I understand and agree that having a wrong version number screws things up.
However, I don't understand how you want to use that as an argument for
storing the version number in the stream as opposed to storing it
out-of-band. In either case we must rely on the storage module to give us
the correct version number (or the correct bytes which store it).
I could even argue that with the solution I propose, the chances of a
corrupted version number are smaller. Thinking about XBSA storage, storing
the version number in an XBSA object attribute is quite probably safer than
storing it in the stream, which will go to an external physical storage
device such as a magnetic tape; the probability of damaging information on
tape is perhaps higher than that of object attributes getting corrupted
(I know it is a long shot, but anyway... :)).
>>> If we define "magic bytes" + "version number" as part of the image, we
>>> would immediately detect a failing storage module.
>> ... unless it fails after giving us correct magic bytes. E.g., if
>> version number is corrupted we are screwed anyway. So even in this
>> variant we must trust the BSM to a big extent.
> If the backup storage module is not able to return the blob unmodified,
> no solution can work.
Agree. That was my point when I was challenging your argument that
in-stream storage is a bit safer because it requires less trust in the
correct functioning of the storage module. Not so much: in my solution we
must trust that the storage module gives us the correct version number - in
your solution we must trust that it correctly retrieves the two bytes which
contain the version number. To me this is roughly equivalent.
> Checksums are on the roadmap. There is no point in thinking about
> possible problems by not having them. With the current format we cannot
> reliably detect image corruption. But with the next format we will.
> If the storage module returns a modified blob, we have image corruption.
> Nothing else. If the mark is included in the image, we reduced two
> possible failures to one.
OK, I buy this argument - in my solution there are two possible failure
points: wrong version number handling and data corruption. In your solution
only the second failure is relevant.
>>>>> 6. "acknowledgement": The services look like they all return a
>> I think that for "set compression algorithm" service, reporting error
>> and reporting that compression is not supported are two different
>> things. In the latter case the service completes successfully, informing
>> the user that compression is not supported. This is different from the
>> situation where service fails because of some reason.
>> So, we have three possibilities here:
>> 1. Service fails with error.
>> 2. Service completes successfully and reports that compression is on.
>> 3. Service completes successfully and reports that compression is not
>> supported.
> Possibility 3 is not what I would accept as a user. I want compression.
> The system tells me: Backup successful, not compressed. Eh?
The addressee of these responses is not the final user (DBA) but the user
of the storage module (the backup kernel in our case). The backup kernel
decides what the system tells the final user, and this should be something
sensible and meaningful for a DBA. But to give sensible and meaningful
messages, the system must have good information about what is happening.
In this scenario, the backup kernel asks the storage module to set up
compression. If the module does not support compression, nothing really bad
happens. One can consider this a valid reply to such a request
(possibility 3). This is different from the situation where the module
supports compression but failed to initialize it for some reason. In that
case it would report an error (possibility 1). Without this distinction,
the backup kernel would not be able to give good feedback to the final user.
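The three possibilities could look like this in code. All names and
messages are illustrative sketches of the idea, not taken from the HLS.

```c
#include <string.h>

/* Hypothetical return values for a "set compression algorithm"
   service, matching possibilities 1-3 above. */
enum set_compr_result {
    COMPR_ERROR,         /* possibility 1: service failed with error   */
    COMPR_ENABLED,       /* possibility 2: compression is now on       */
    COMPR_NOT_SUPPORTED  /* possibility 3: success, but no compression */
};

/* With three distinct outcomes the backup kernel can give the DBA
   accurate feedback instead of collapsing "not supported" into a
   generic failure. */
const char *compr_feedback(enum set_compr_result r)
{
    switch (r) {
    case COMPR_ENABLED:
        return "backup will be compressed";
    case COMPR_NOT_SUPPORTED:
        return "storage module does not support compression; "
               "backup continues uncompressed";
    default:
        return "error: compression could not be initialized";
    }
}
```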
> And we do not have extra notifiers for other things that could be seen
> as success, e.g. empty image: Restore successful, nothing restored;
Well yes. If by "empty image" you mean an image which has the correct
format but contains no data, then this is correct behaviour.
> wrong location: Open of location successful, not a backup image. If I
Yes, a good example. I think it is a design decision what is considered an
error and what is not, and it is good to realize these choices. For the
"open location for reading" request, it is considered an error if the
location does not contain a backup image. In that case we specify that the
service request should fail.
But we could consider an alternative design, where the service succeeds in
that case and passes additional information that the location does not
contain a backup image. Then a following "read bytes from location" request
should fail, unless we decide that it also makes sense to read bytes from
locations which do not contain backup images.
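That alternative design could be sketched as follows; the structure and
field names are my own illustration, not part of the specification.

```c
/* Alternative sketch: "open location for reading" succeeds even when
   the location holds no backup image, and reports that fact as a
   separate flag instead of failing. */
struct bsm_open_result {
    int ok;         /* the open service itself completed          */
    int has_image;  /* the location contains a valid backup image */
};

/* Under this design a subsequent "read bytes from location" request
   would only be allowed when both flags are set. */
int may_read(struct bsm_open_result r)
{
    return r.ok && r.has_image;
}
```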
> spend more time on it, I can probably come up with more ridiculous examples.
> Sorry to become emotional here. Maybe I disqualify myself as a voter for
> this work.
All these are design choices and we are now discussing design. I think
there is no need to be emotional about that. What is described in the HLS
currently reflects what I consider the most sensible choices.
>>> "Amount of data that has been written"
>> From this perspective, it is very easy to imagine that a BSM will talk
>> to some more or less exotic devices such as tapes, where it can even
>> require physical loading of the tape before any bytes can be written.
>> The POSIX write() interface allows the application to resume control and
>> do something in such situations, while your simplified solution would
>> mean that the backup kernel (or one of its threads) will be blocked
>> waiting for the device to be ready.
> So you suggest asynchronous operation again as a use case.
Yes, because, as I wrote, I think it is quite possible that a storage
module will drive some unusual physical storage device. Is this not a valid
argument?
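To make the POSIX write() analogy concrete, here is a sketch of what the
caller side could look like when the module may accept fewer bytes than
offered (including zero while the device is not ready). Every name here,
including the in-memory demo "device", is an assumption for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of a POSIX-write-like "write bytes to location" service: the
   module reports how many bytes it accepted, and a zero count means
   "device not ready yet" (e.g. a tape still being loaded), so the
   backup kernel keeps control instead of blocking in the module. */
typedef long (*bsm_write_fn)(void *stream, const void *buf, size_t len);

/* Caller-side loop: resubmit the unwritten tail; on a zero count the
   kernel could yield or do other work before retrying. */
size_t write_all(bsm_write_fn wr, void *stream, const uint8_t *buf,
                 size_t len)
{
    size_t done = 0;
    while (done < len) {
        long n = wr(stream, buf + done, len - done);
        if (n < 0)
            break;              /* real error reported by the module */
        done += (size_t) n;     /* n == 0: device not ready, retry   */
    }
    return done;
}

/* Illustrative in-memory "device" that accepts at most 3 bytes per
   call, mimicking a medium that fills in small bursts. */
static uint8_t sink[64];
static size_t  sink_used;
static long demo_write(void *stream, const void *buf, size_t len)
{
    (void) stream;
    size_t n = len > 3 ? 3 : len;
    memcpy(sink + sink_used, buf, n);
    sink_used += n;
    return (long) n;
}
```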
> You exclude management functions from the interface. You say this shall
> be implemented outside of the server.
Yes, because I want to keep the interface minimal (as explained in several
places). That is, design and specify the interface only for the
functionality which is needed to implement the current BACKUP/RESTORE
commands.
> You exclude progress information from the interface. As a user I would
> desire this a lot.
I was not thinking about that. If you have good ideas, please propose them -
now is a good time for doing so.
> But you include an option for asynchronous operation. What could the
> backup kernel do with it to serve the user?
It will give the user the possibility to use a wider range of storage
modules.
> I want to say that I see the selection of features as arbitrary. At
> least it doesn't follow my taste.
I think design decisions are always arbitrary to some extent. It is also
impossible to satisfy everyone's tastes. I'm afraid we have to live with
that...
>>> "Information about end of stream"
>>> If you see it this way, why don't you think it could be good that during
>>> a stream write the storage module gets some idea about when the stream
>>> is at its end?
>>> You probably will answer that the SM can assume end of stream when a
>>> "close" is requested. And the I will ask, why the backup kernel cannot
>>> assume end of stream when the SM does not deliver any data any more
>>> (zero length read)?
>> Sorry, I don't fully understand your concern(s). For the last sentence,
>> the SM may deliver no data for reasons other than end of stream.
>> Perhaps it knows that it will have to wait very long for next byte from
>> the stream and thus it returns control to the caller informing that no
>> bytes have arrived yet.
> So we are back at asynchronous operation. But for true asynchronous
> operation, we need a service, which allows the storage module to wake
> the kernel when there is more data available.
OK, I hope I have managed to explain already why I consider "amount of data
that has been written" and "information about end of stream" to be two
independent pieces of information. It should be understood that this is
just how the specification is written - it says nothing about how it is to
be implemented.
>>> In my eyes the "information about end of stream" is an inconsistency in
>>> level of detail with other service specifications. I suggest to get rid
>>> of it. It is obvious that end of stream needs to be signaled somehow.
>> Maybe it is obvious, but in the HLS I try to list all information which
>> goes in and out of each service.
> I don't believe that. I guess there is a lot of information implicit
> because it is so obvious that we don't think about it.
To clarify: the specifications specify the information that a service
should report. It makes no sense to request that a service report
information that is already known otherwise. Likewise, it makes no sense to
request that a service report information which it does not have access to.
Good that you verify the specification from that perspective. However, with
the "read bytes from location" service, the information whether the stream
has more data or not cannot be inferred from other sources. Only the
storage module could possibly know about that, and this is why the
specification requires it to report that information. I hope that you agree
this is not redundant.
> For example, why don't the services tell about begin of stream? Or that
> the open service tells if read services can now be used to get data?
Yes, these are implicit pieces of information which can be inferred by the
kernel. But end of stream is not such information.
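The independence of the byte count and the end-of-stream flag can be shown
with a small sketch; the structure and enum names are my illustration, not
the HLS wording.

```c
#include <stddef.h>

/* Hypothetical result of one "read bytes from location" call.  The
   byte count and the end-of-stream flag are independent: zero bytes
   with eos == 0 means "no data available yet", which only the storage
   module can tell apart from the true end of the stream. */
struct bsm_read_result {
    size_t bytes;  /* bytes placed in the caller's buffer  */
    int    eos;    /* non-zero once the stream is finished */
};

enum read_action { READ_CONSUME, READ_WAIT, READ_DONE };

/* How the backup kernel could react to each combination. */
enum read_action classify_read(struct bsm_read_result r)
{
    if (r.bytes > 0)
        return READ_CONSUME;  /* process the data                */
    if (r.eos)
        return READ_DONE;     /* stream really ended             */
    return READ_WAIT;         /* zero bytes but not EOS: retry   */
}
```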
> And what if the user intentionally removed a tape during the operation?
> From a standpoint one could see this as successful abort of the
> operation. But there is no "aborted" information.
I imagine in that case "read bytes from location" would fail with an error.
It is true that the current specification does not give storage modules a
possibility to actively abort a session. Only the backup kernel can do
that. Do you think there is a need for such a possibility?
> Etc, etc.
Well, I'm still not convinced that my specification, as it is now, is
inconsistent.
I don't know what your impression is, but mine is that although this
discussion is lengthy, we are still clarifying many things and
understanding each other's positions better. You have also given some
interesting suggestions I had not thought about, like the possibility for a
storage module to abort an operation, or an interface for getting progress
information from storage modules.