Can I measure a TableBatchOperation's size? - azure-storage

The .Net SDK documentation for TableBatchOperation says that
A batch operation may contain up to 100 individual table operations, with the requirement that each operation entity must have same partition key. A batch with a retrieve operation cannot contain any other operations. Note that the total payload of a batch operation is limited to 4MB.
It's easy to ensure that I don't add more than 100 individual table operations to the batch: in the worst case, I can check the Count property. But is there any way to check the payload size other than manually serialising the operations (at which point I've lost most of the benefit of using the SDK)?

As you add entities, you can track the size of the property names plus the data. Assuming you're using a newer library where the default payload format is JSON, the additional characters added should be relatively small (compared to the data, if you're close to 4 MB) and estimable. This isn't a perfect route, but it would get you close.
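For illustration, here is a minimal sketch of that tracking idea. The per-entity overhead constant and the safety margin are rough guesses rather than exact wire sizes, and "table" (a CloudTable) and "entities" (a sequence of DynamicTableEntity) are assumed to exist:

    // Rough payload estimator: flush the batch once the estimated size nears the 4 MB limit.
    // The 128-byte per-entity overhead and the 256 KB safety margin are guesses, not exact JSON sizes.
    const long PayloadLimit = 4 * 1024 * 1024;
    const long SafetyMargin = PayloadLimit - 256 * 1024;

    long estimated = 0;
    var batch = new TableBatchOperation();

    foreach (DynamicTableEntity entity in entities)
    {
        long entitySize = entity.PartitionKey.Length + entity.RowKey.Length + 128;
        foreach (var prop in entity.Properties)
        {
            object value = prop.Value.PropertyAsObject;
            entitySize += prop.Key.Length
                        + (value is string s ? s.Length
                         : value is byte[] b ? b.Length
                         : 16); // crude size for numeric/date/guid values
        }

        if (batch.Count == 100 || (batch.Count > 0 && estimated + entitySize > SafetyMargin))
        {
            table.ExecuteBatch(batch);
            batch = new TableBatchOperation();
            estimated = 0;
        }

        batch.Add(TableOperation.InsertOrReplace(entity));
        estimated += entitySize;
    }

    if (batch.Count > 0) table.ExecuteBatch(batch);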
Serializing as you go, especially if you're frequently getting close to the 100-entity limit or the 4 MB limit, is going to cost you a lot of performance, aside from any convenience lost. Rather than trying to track as you go, either by estimating sizes or by serializing, you might be best off sending the batch request as-is; if you get a 413 indicating the request body is too large, catch the error, divide the batch in two, and continue.
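A minimal sketch of that optimistic split-on-413 approach, assuming the incoming list already respects the 100-operation limit (CloudTable, TableBatchOperation and StorageException come from the Microsoft.WindowsAzure.Storage client library; the helper name is made up):

    // Execute optimistically; on RequestEntityTooLarge (413), bisect the batch and retry each half.
    static void ExecuteWithSplit(CloudTable table, List<TableOperation> ops)
    {
        if (ops.Count == 0) return;

        var batch = new TableBatchOperation();
        foreach (var op in ops) batch.Add(op);

        try
        {
            table.ExecuteBatch(batch);
        }
        catch (StorageException se) when (se.RequestInformation.HttpStatusCode == 413 && ops.Count > 1)
        {
            int half = ops.Count / 2;
            ExecuteWithSplit(table, ops.GetRange(0, half));
            ExecuteWithSplit(table, ops.GetRange(half, ops.Count - half));
        }
    }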

I followed Emily Gerner's suggestion of optimistic inserts and error handling, but used StorageException.RequestInformation.EgressBytes to estimate the number of operations which fit within the limit. Unless the size of the operations varies wildly, this should be more efficient. There is a case to be made for not raising len back up every time, but here's an implementation which goes back to being optimistic for each batch.
int off = 0;
while (off < ops.Count)
{
    // Batch size.
    int len = Math.Min(100, ops.Count - off);
    while (true)
    {
        var batch = new TableBatchOperation();
        for (int i = 0; i < len; i++) batch.Add(ops[off + i]);
        try
        {
            _Tbl.ExecuteBatch(batch);
            break;
        }
        catch (Microsoft.WindowsAzure.Storage.StorageException se)
        {
            var we = se.InnerException as WebException;
            var resp = we != null ? (we.Response as HttpWebResponse) : null;
            if (resp != null && resp.StatusCode == HttpStatusCode.RequestEntityTooLarge)
            {
                // Assume roughly equal sizes, and base the updated length on the size of the previous request.
                // We assume that no individual operation is too big!
                len = len * 4000000 / (int)se.RequestInformation.EgressBytes;
            }
            else throw;
        }
    }
    off += len;
}

Related

Ragel: avoid redundant call of "when" clause function

I'm writing a Ragel machine for a rather simple binary protocol, and what I present here is an even more simplified version, without any error recovery whatsoever, just to demonstrate the problem I'm trying to solve.
So, the message to be parsed here looks like this:
<1 byte: length> <$length bytes: user data> <1 byte: checksum>
The machine looks as follows:
%%{
    machine my_machine;
    write data;
    alphtype unsigned char;
}%%

%%{
    action message_reset {
        /* TODO */
        data_received = 0;
    }

    action got_len {
        len = fc;
    }

    action got_data_byte {
        /* TODO */
    }

    action message_received {
        /* TODO */
    }

    action is_waiting_for_data {
        (data_received++ < len);
    }

    action is_checksum_correct {
        1 /* TODO */
    }

    len              = (any);
    fmt_separate_len = (0x80 any);
    data             = (any);
    checksum         = (any);

    message =
        (
            # first byte: length of the data
            (len #got_len)
            # user data
            (data when is_waiting_for_data #got_data_byte)*
            # place higher priority on the previous machine (i.e. data)
            <:
            # last byte: checksum
            (checksum when is_checksum_correct #message_received)
        ) >to(message_reset)
        ;

    main := (msg_start: message)*;

    # Initialize and execute.
    write init;
    write exec;
}%%
As you see, first we receive 1 byte that represents the length; then we receive data bytes until we have received the needed number of bytes (the check is done by is_waiting_for_data), and when we receive the next (extra) byte, we check whether it is a correct checksum (via is_checksum_correct). If it is, the machine goes back to waiting for the next message; otherwise, this particular machine stalls (I haven't included any error recovery here on purpose, in order to simplify the diagram).
The diagram of it looks like this:
$ ragel -Vp ./msg.rl | dot -Tpng -o msg.png
Click to see image
As you see, in state 1, while we are receiving user data, the conditions are as follows:
0..255(is_waiting_for_data, !is_checksum_correct),
0..255(is_waiting_for_data, is_checksum_correct)
So on every data byte it redundantly calls is_checksum_correct, although the result doesn't matter at all.
The condition should simply be: 0..255(is_waiting_for_data)
How to achieve that?
How is is_checksum_correct supposed to work? The when condition happens before the checksum is read, according to what you posted. My suggestion would be to check the checksum inside message_received and handle any error there. That way, you can get rid of the second when and the problem would no longer exist.
It looks like semantic conditions are a relatively new feature in Ragel, and while they look really useful, maybe they're not quite mature enough yet if you want optimal code.

How to get NSStream total length?

I want to know if there is an easy way to get the total length in bytes of an NSStream object. For example, in C# I can get the Stream.Length property, and that's the answer. In Objective-C, so far, I haven't found anything like that. The simplest solution I could imagine would be to "read bytes into a buffer and count their number":
uint8_t buffer[BUFFER_SIZE];
NSInteger result;
long totalLength = 0;
while ((result = [sInput read:buffer maxLength:BUFFER_SIZE]) != 0) {
    if (result > 0) {
        totalLength += result;
    }
}
As stated in the docs, the return value of the read method is:
A positive number indicates the number of bytes read;
0 indicates that the end of the buffer was reached;
A negative number means that the operation failed.
Finally, totalLength would contain the total length.
Is this a correct way to solve the issue, or is there a simpler way? Btw, is my code correct? (I'm not confident in my obj-c skills yet)

Does this code fill the CPU cache?

I have two ways to program the same functionality.
Method 1:
void doTheWork(int action)
{
    for (int i = 0; i < 1000000000; ++i)
    {
        doAction(action);
    }
}
Method 2:
void doTheWork(int action)
{
    switch (action)
    {
    case 1:
        for (int i = 0; i < 1000000000; ++i)
        {
            doAction<1>();
        }
        break;
    case 2:
        for (int i = 0; i < 1000000000; ++i)
        {
            doAction<2>();
        }
        break;
    //-----------------------------------------------
    // ... (there are 1000000 cases here)
    //-----------------------------------------------
    case 1000000:
        for (int i = 0; i < 1000000000; ++i)
        {
            doAction<1000000>();
        }
        break;
    }
}
Let's assume that the function doAction(int action) and the function template<int Action> doAction() consist of about 10 lines of code that will get inlined at compile time. Calling doAction(#) is equivalent to doAction<#>() in functionality, but the non-templated doAction(int value) is somewhat slower than template<int Value> doAction(), since some nice optimizations can be done in the code when the argument value is known at compile time.
So my question is, do all the millions of lines of code fill the CPU L1 cache (and more) in the case of the templated function (and thus degrade performance considerably), or do only the lines of doAction<#>() inside the loop currently being run get cached?
It depends on the actual code size - 10 lines of code can be little or much - and of course on the actual machine.
However, Method 2 violently violates this decades-old rule of thumb: instructions are cheap, memory access is not.
Scalability limit
Your optimizations are usually linear - you might shave off 10, 20, maybe even 30% of execution time. Hitting a cache limit is highly nonlinear - as in "running into a brick wall" nonlinear.
As soon as your code size significantly exceeds the 2nd/3rd level cache's size, Method 2 will lose big time, as the following estimation of a high end consumer system shows:
DDR3-1333 with 10667MB/s peak memory bandwidth,
Intel Core i7 Extreme with ~75000 MIPS
gives you 10667MB / 75000M = 0.14 bytes per instruction for break even - anything larger, and main memory can't keep up with the CPU.
Typical x86 instruction sizes are 2..3 bytes executing in 1..2 cycles (now, granted, this isn't necessarily the same instructions, as x86 instructions are split up. Still...)
Typical x64 instruction lengths are even larger.
How much does your cache help?
I found the following number (different source, so it's hard to compare):
i7 Nehalem L2 cache (256K, >200GB/s bandwidth) which could almost keep up with x86 instructions, but probably not with x64.
In addition, your L2 cache will kick in completely only if
you have perfect prediction of the next instructions, or you don't have a first-run penalty and it fits in the cache completely
there's no significant amount of data being processed
there's no significant other code in your "inner loop"
there's no other thread executing on this core
Given that, you can lose much earlier, especially on a CPU/board with smaller caches.
The L1 instruction cache will only contain instructions which were fetched recently or in anticipation of near future execution. As such, the second method cannot fill the L1 cache simply because the code is there. Your execution path will cause it to load the template instantiated version that represents the current loop being run. As you move to the next loop, it will generally invalidate the least recently used (LRU) cache line and replace it with what you are executing next.
In other words, due to the looping nature of both your methods, the L1 cache will perform admirably in both cases and won't be the bottleneck.

Guess a buffer size and re-call an API function, or always call it twice to get the exact buffer size?

I want to retrieve the version information of MSI package(s).
Which is the better way?
First: guess a buffer that is large enough, and re-call if it doesn't fit (ERROR_MORE_DATA).
1 function call vs. 3 function calls, and the buffer can be bigger than needed.
Second: call the API function to get the buffer size, then re-call it to get the string with a (perfectly) matching buffer size.
2 function calls every time, with a perfect buffer size.
It's about (1 or 3) function call(s) vs. 2 function calls every time.
Is there any best practice for this "problem"?
I hope to get a generalized answer (assume the function call is really time-consuming and/or the buffer size can vary widely (10 bytes to 200 megabytes)) for further code writing. :-)
pseudo code:
First:
StringBuffer = 10; // (bytes) guessing the returned string will fit in 10 bytes
result = MsiGetProductInfoW(
    product,
    INSTALLPROPERTY_VERSIONSTRING,
    VersionString,
    StringBuffer
); // maybe it fits in 10
if result = ERROR_MORE_DATA then // doesn't fit in 10, so re-call to get the correct buffer size
begin
    MsiGetProductInfoW(
        product,
        INSTALLPROPERTY_VERSIONSTRING,
        nil,
        StringBuffer
    );
    Inc(StringBuffer); // because of the null terminator
    // re-call it with a matching buffer size
    MsiGetProductInfoW(
        product,
        INSTALLPROPERTY_VERSIONSTRING,
        VersionString,
        StringBuffer
    );
end;

Second:

StringBuffer = 0;
// get the buffer size
MsiGetProductInfoW(
    product,
    INSTALLPROPERTY_VERSIONSTRING,
    nil,
    StringBuffer
);
Inc(StringBuffer); // because of the null terminator
// use it with the correct buffer size
MsiGetProductInfoW(
    product,
    INSTALLPROPERTY_VERSIONSTRING,
    VersionString,
    StringBuffer
);
Thank you!
In your First option, you can skip the second call, because even on the failing first call, the needed size should be stored in StringBuffer.
This makes the choice (1 or 2) vs. (always 2). That should be clear enough. Further, it shouldn't be hard to come up with a reasonably sized buffer that will pass 90+% of the time.
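For illustration only, here is a minimal C# P/Invoke sketch of that pattern (the GetVersionString helper and the 32-character first guess are made up; ERROR_MORE_DATA is 234, and the size the API writes back on that error excludes the terminating null):

    using System;
    using System.Runtime.InteropServices;
    using System.Text;

    static class MsiVersion
    {
        const uint ERROR_SUCCESS = 0;
        const uint ERROR_MORE_DATA = 234;

        [DllImport("msi.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
        static extern uint MsiGetProductInfoW(string szProduct, string szProperty,
                                              StringBuilder lpValueBuf, ref uint pcchValueBuf);

        // Guess a buffer first; grow it only when the API reports ERROR_MORE_DATA,
        // reusing the required size it already wrote back into 'size'.
        public static string GetVersionString(string productCode)
        {
            uint size = 32; // generous first guess for a version string
            var buf = new StringBuilder((int)size);
            uint rc = MsiGetProductInfoW(productCode, "VersionString", buf, ref size);
            if (rc == ERROR_MORE_DATA)
            {
                size++; // the returned count excludes the terminating null
                buf.Capacity = (int)size;
                rc = MsiGetProductInfoW(productCode, "VersionString", buf, ref size);
            }
            if (rc != ERROR_SUCCESS)
                throw new InvalidOperationException("MsiGetProductInfoW failed with code " + rc);
            return buf.ToString();
        }
    }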

Hacky Sql Compact Workaround

So, I'm trying to use ADO.NET to stream file data stored in an image column in a SQL Compact database.
To do this, I wrote a DataReaderStream class that takes a data reader, opened for sequential access, and represents it as a stream, redirecting calls to Read(...) on the stream to IDataReader.GetBytes(...).
One "weird" aspect of IDataReader.GetBytes(...), when compared to the Stream class, is that GetBytes requires the client to increment an offset and pass that in each time it's called. It does this even though access is sequential, and it's not possible to read "backwards" in the data reader stream.
The SqlCeDataReader implementation of IDataReader enforces this by incrementing an internal counter that identifies the total number of bytes it has returned. If you pass in a number either less than or greater than that number, the method will throw an InvalidOperationException.
The problem with this, however, is that there is a bug in the SqlCeDataReader implementation that causes it to set the internal counter to the wrong value. This results in subsequent calls to Read on my stream throwing exceptions when they shouldn't be.
I found some information about the bug on this MSDN thread.
I was able to come up with a disgusting, horribly hacky workaround, that basically uses reflection to update the field in the class to the correct value.
The code looks like this:
public override int Read(byte[] buffer, int offset, int count)
{
    m_length = m_length ?? m_dr.GetBytes(0, 0, null, offset, count);
    if (m_fieldOffSet < m_length)
    {
        var bytesRead = m_dr.GetBytes(0, m_fieldOffSet, buffer, offset, count);
        m_fieldOffSet += bytesRead;
        if (m_dr is SqlCeDataReader)
        {
            //BEGIN HACK
            //This is a horrible HACK.
            m_field = m_field ?? typeof(SqlCeDataReader).GetField("sequentialUnitsRead", BindingFlags.NonPublic | BindingFlags.Instance);
            var length = (long)(m_field.GetValue(m_dr));
            if (length != m_fieldOffSet)
            {
                m_field.SetValue(m_dr, m_fieldOffSet);
            }
            //END HACK
        }
        return (int)bytesRead;
    }
    else
    {
        return 0;
    }
}
For obvious reasons, I would prefer to not use this.
However, I do not want to buffer the entire contents of the blob in memory either.
Does any one know of a way I can get streaming data out of a SQL Compact database without having to resort to such horrible code?
I contacted Microsoft (through the SQL Compact Blog) and they confirmed the bug, and suggested I use OLEDB as a workaround. So, I'll try that and see if that works for me.
Actually, I decided to fix the problem by just not storing blobs in the database to begin with.
This eliminates the problem (I can stream data from a file), and also fixes some issues I might have run into with Sql Compact's 4 GB size limit.