So, I'm trying to use ADO.NET to stream file data stored in an image column in a SQL Compact database.
To do this, I wrote a DataReaderStream class that takes a data reader, opened for sequential access, and represents it as a stream, redirecting calls to Read(...) on the stream to IDataReader.GetBytes(...).
One "weird" aspect of IDataReader.GetBytes(...), when compared to the Stream class, is that GetBytes requires the client to increment an offset and pass that in each time it's called. It does this even though access is sequential, and it's not possible to read "backwards" in the data reader stream.
The SqlCeDataReader implementation of IDataReader enforces this by incrementing an internal counter that identifies the total number of bytes it has returned. If you pass in a number either less than or greater than that number, the method will throw an InvalidOperationException.
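To make that contract concrete, here is a minimal sketch of the usual sequential-read loop (the reader, column index, and destination stream are assumed; this is just an illustration, not my actual class):
// Minimal sketch of the sequential GetBytes contract: the caller owns the running
// field offset and has to pass the accumulated total back in on every call.
static void CopyBlobToStream(System.Data.IDataReader reader, int column, System.IO.Stream destination)
{
    var chunk = new byte[4096];
    long fieldOffset = 0;
    long bytesRead;
    while ((bytesRead = reader.GetBytes(column, fieldOffset, chunk, 0, chunk.Length)) > 0)
    {
        destination.Write(chunk, 0, (int)bytesRead);
        fieldOffset += bytesRead; // skip this and SqlCeDataReader throws InvalidOperationException
    }
}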
The problem, however, is that there is a bug in the SqlCeDataReader implementation that causes it to set the internal counter to the wrong value. This results in subsequent calls to Read on my stream throwing exceptions when they shouldn't.
I found some information about the bug on this MSDN thread.
I was able to come up with a disgusting, horribly hacky workaround that basically uses reflection to set the field in the class back to the correct value.
The code looks like this:
public override int Read(byte[] buffer, int offset, int count)
{
m_length = m_length ?? m_dr.GetBytes(0, 0, null, offset, count);
if (m_fieldOffSet < m_length)
{
var bytesRead = m_dr.GetBytes(0, m_fieldOffSet, buffer, offset, count);
m_fieldOffSet += bytesRead;
if (m_dr is SqlCeDataReader)
{
//BEGIN HACK
//This is a horrible HACK.
m_field = m_field ?? typeof (SqlCeDataReader).GetField("sequentialUnitsRead", BindingFlags.NonPublic | BindingFlags.Instance);
var length = (long)(m_field.GetValue(m_dr));
if (length != m_fieldOffSet)
{
m_field.SetValue(m_dr, m_fieldOffSet);
}
//END HACK
}
return (int) bytesRead;
}
else
{
return 0;
}
}
For obvious reasons, I would prefer to not use this.
However, I do not want to buffer the entire contents of the blob in memory either.
Does anyone know of a way I can get streaming data out of a SQL Compact database without having to resort to such horrible code?
I contacted Microsoft (through the SQL Compact Blog) and they confirmed the bug, and suggested I use OLEDB as a workaround. So, I'll try that and see if that works for me.
Actually, I decided to fix the problem by just not storing blobs in the database to begin with.
This eliminates the problem (I can stream data from a file), and also fixes some issues I might have run into with Sql Compact's 4 GB size limit.
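For anyone curious, the replacement approach is trivial by comparison: the database keeps only a file name, and the bytes are streamed straight from disk. A minimal sketch, with hypothetical names:
using System.IO;

// The database row stores only a file name; the blob itself lives under storageRoot.
// FileStream already gives true streaming reads, so no DataReaderStream is needed.
static Stream OpenStoredDocument(string storageRoot, string fileNameFromDb)
{
    string path = Path.Combine(storageRoot, fileNameFromDb);
    return new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
}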
Related
I have an array of objects that each needs to load itself from binary file data. I create an array of these objects and then call an AsyncAction for each of them that starts it reading in its data. Trouble is, they are not loading entirely - they tend to get only part of the data from the files. How can I make sure that the whole thing is read? Here is an outline of the code: first I enumerate the folder contents to get a StorageFile for each file it contains. Then, in a for loop, each receiving object is created and passed the next StorageFile, and it creates its own Buffer and DataReader to handle the read. m_file_bytes is a std::vector.
m_buffer = co_await FileIO::ReadBufferAsync(nextFile);
m_data_reader = winrt::Windows::Storage::Streams::DataReader::FromBuffer(m_buffer);
m_file_bytes.resize(m_buffer.Length());
m_data_reader.ReadBytes(m_file_bytes);
My thought was that since the buffer and reader are class members of the object they would not go out of scope and could finish their work uninterrupted as the next objects were asked to load themselves in separate AsyncActions. But the DataReader only gets maybe half of the file data or less. What can be done to make sure it completes? Thanks for any insights.
[Update] Perhaps what is going on is that the file system can handle only one read task at a time, and by starting all these async reads each one interrupts the previous one? But there must be a way to progressively read a folder full of files.
[Update] I think I have it working by adopting the principle of concentric loops - the idea is not to proceed to the next load until the previous one has completed. I think - someone can correct me if I'm wrong - that the file system cannot do simultaneous reads. If there is an accepted and secure example of how to do this I would still love to hear about it, so I'm not answering my own question.
#include <wrl.h>
#include <robuffer.h>
uint8_t* GetBufferData(winrt::Windows::Storage::Streams::IBuffer& buffer)
{
::IUnknown* unknown = winrt::get_unknown(buffer);
::Microsoft::WRL::ComPtr<::Windows::Storage::Streams::IBufferByteAccess> bufferByteAccess;
HRESULT hr = unknown->QueryInterface(__uuidof(::Windows::Storage::Streams::IBufferByteAccess), &bufferByteAccess);
if (FAILED(hr))
return nullptr;
byte* bytes = nullptr;
bufferByteAccess->Buffer(&bytes);
return bytes;
}
https://learn.microsoft.com/en-us/cpp/cppcx/obtaining-pointers-to-data-buffers-c-cx?view=vs-2017
https://learn.microsoft.com/en-us/windows/uwp/xbox-live/storage-platform/connected-storage/connected-storage-using-buffers
A 2012 answer at StackOverflow (“How do I read a binary file in a Windows Store app”) suggests this method of reading byte data from a StorageFile in a Windows Store app:
IBuffer buffer = await FileIO.ReadBufferAsync(theStorageFile);
byte[] bytes = buffer.ToArray();
That looks simple enough. As I am working in cppwinrt I have translated that to the following, within the same IAsyncAction that produced a vector of StorageFiles. First I obtain a StorageFile from the VectorView using theFilesVector.GetAt(index);
//Then this line compiles without error:
IBuffer buffer = co_await FileIO::ReadBufferAsync(theStorageFile);
//But I can’t find a way to make the buffer call work.
byte[] bytes = buffer.ToArray();
“byte[]” can’t work, to begin with, so I change that to byte*, but then the error is “class ‘winrt::Windows::Storage::Streams::IBuffer’ has no member ‘ToArray’”.
And indeed Intellisense lists no such member for IBuffer. Yet IBuffer was specified as the return type for ReadBufferAsync. It appears the above sample code cannot function as it stands.
In the documentation for FileIO I find it recommended to use DataReader to read from the buffer, which in cppwinrt should look like
DataReader dataReader = DataReader::FromBuffer(buffer);
That compiles. It should then be possible to read bytes with the following DataReader method, which is fortunately supplied in the UWP docs in cppwinrt form:
void ReadBytes(Byte[] value) const;
However, that does not compile because the type Byte is not recognized in cppwinrt. If I create a byte array instead:
byte* fileBytes = new byte(buffer.Length());
that is not accepted. The error is
‘No suitable constructor exists to convert from “byte*” to “winrt::array_view<uint8_t>”’
uint8_t is of course a byte, so let’s try
uint8_t fileBytes = new uint8_t(buffer.Length());
That is wrong - clearly we really need to create a winrt::array_view. Yet a 2015 Reddit post says that array_view “died” and I’m not sure how to declare one, or if it will help. That original one-line method for reading bytes from a buffer is looking so beautiful in retrospect. This is a long post, but can anyone suggest the best current method for simply reading raw bytes from a StorageFile reference in cppwinrt? It would be so fine if there were simply GetFileBytes() and GetFileBytesAsync() methods on StorageFile.
---Update: here's a step forward. I found a comment from Kenny Kerr last year explaining that array_view should not be declared directly, but that std::vector or std::array can be used instead. And that is accepted as an argument for the ReadBytes method of DataReader:
std::vector<unsigned char>fileBytes;
dataReader.ReadBytes(fileBytes);
Only trouble now is that the std::vector is receiving no bytes, even though the size of the referenced file is correctly returned in buffer.Length() as 167,513 bytes. That seems to suggest the buffer is good, so I'm not sure why the ReadBytes method applied to that buffer would produce no data.
Update #2: Kenny suggests reserving space in the vector, which is something I had tried, this way:
m_file_bytes.reserve(buffer.Length());
But it didn't make a difference. Here is a sample of the code as it now stands, using DataReader.
buffer = co_await FileIO::ReadBufferAsync(nextFile);
dataReader = DataReader::FromBuffer(buffer);
//The following line may be needed, but crashes
//co_await dataReader.LoadAsync(buffer.Length());
if (buffer.Length())
{
m_file_bytes.reserve(buffer.Length());
dataReader.ReadBytes(m_file_bytes);
}
The crash, btw, is
throw hresult_error(result, hresult_error::from_abi);
Is it confirmed, then, that the original 2012 solution quoted above cannot work in today's world? But of course there must be some way to read bytes from a file, so I'm just missing something that may be obvious to another.
Final (I think) update: Kenny's suggestion that the vector needs a size has hit the mark. If the vector is first prepared with m_file_bytes.assign(buffer.Length(),0) then it does get filled with file data. Now my only worry is that I don't really understand the way IAsyncAction is working and maybe could have trouble looping this asynchronously, but we'll see.
The array_view bridges the gap between Windows APIs and C++ array types. In this example, the ReadBytes method expects the caller to provide some array that it can copy bytes into. The array_view forwards a pointer to the caller's array as well as its size. In this case, you're passing an empty vector. Try resizing the vector before calling ReadBytes.
When you know how many bytes to expect (in this case 2 bytes), this worked for me:
#include <iostream>
#include <vector>
#include <winrt/Windows.Storage.Streams.h>
using namespace winrt::Windows::Storage::Streams;

std::vector<unsigned char> fileBytes;
fileBytes.resize(2);
DataReader reader = DataReader::FromBuffer(buffer);
reader.ReadBytes(fileBytes);
// Cast to int so the numeric byte values are printed rather than raw characters.
std::cout << static_cast<int>(fileBytes[0]) << std::endl;
std::cout << static_cast<int>(fileBytes[1]) << std::endl;
The .NET SDK documentation for TableBatchOperation says that
A batch operation may contain up to 100 individual table operations, with the requirement that each operation entity must have same partition key. A batch with a retrieve operation cannot contain any other operations. Note that the total payload of a batch operation is limited to 4MB.
It's easy to ensure that I don't add more than 100 individual table operations to the batch: in the worst case, I can check the Count property. But is there any way to check the payload size other than manually serialising the operations (at which point I've lost most of the benefit of using the SDK)?
As you add entities, you can track the size of the names plus the data. Assuming you're using a newer library where the default is JSON, the additional characters added should be relatively small (compared to the data, if you're close to 4 MB) and easy to estimate. This isn't a perfect route, but it would get you close.
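As a rough illustration of that idea (a sketch only - the per-type sizes and the per-entity overhead constant are guesses, not exact numbers from the wire format):
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Very rough per-entity size estimate: keys plus property names plus a guessed size
// per value, plus a fudge factor for the request envelope. Good enough to stay well
// clear of the 4 MB limit, not an exact measurement.
static int EstimateEntitySize(ITableEntity entity)
{
    int size = 200; // guessed fixed overhead per entity (keys, timestamp, envelope)
    size += entity.PartitionKey.Length + entity.RowKey.Length;
    foreach (KeyValuePair<string, EntityProperty> p in entity.WriteEntity(new OperationContext()))
    {
        size += p.Key.Length;
        switch (p.Value.PropertyType)
        {
            case EdmType.String: size += 2 * (p.Value.StringValue ?? "").Length; break;
            case EdmType.Binary: size += (p.Value.BinaryValue ?? new byte[0]).Length * 4 / 3; break;
            default: size += 32; break; // numbers, guids, dates: small fixed guess
        }
    }
    return size;
}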
Serializing as you go, especially if you're frequently getting close to the 100-entity or 4 MB limits, is going to cost you a lot of perf, aside from any convenience lost. Rather than trying to track the size as you go, either by estimating or serializing, you might be best off sending the batch request as-is and, if you get a 413 indicating the request body is too large, catching the error, dividing the batch in two, and continuing.
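A minimal sketch of that split-and-retry idea (the next answer shows a more refined version); it assumes the batch is already within the 100-operation limit, that it only ever fails because of size, and that no single operation is over the limit by itself:
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Send the whole batch optimistically; on 413 (request body too large), split it in
// half and retry each half. Recursion bottoms out because single operations rethrow.
static void ExecuteBatchOptimistically(CloudTable table, IList<TableOperation> ops)
{
    if (ops.Count == 0) return;

    var batch = new TableBatchOperation();
    foreach (var op in ops) batch.Add(op);

    try
    {
        table.ExecuteBatch(batch);
    }
    catch (StorageException se) when (se.RequestInformation.HttpStatusCode == 413 && ops.Count > 1)
    {
        int half = ops.Count / 2;
        ExecuteBatchOptimistically(table, ops.Take(half).ToList());
        ExecuteBatchOptimistically(table, ops.Skip(half).ToList());
    }
}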
I followed Emily Gerner's suggestion of optimistic inserts with error handling, but used StorageException.RequestInformation.EgressBytes to estimate the number of operations that fit within the limit. Unless the sizes of the operations vary wildly, this should be more efficient. There is a case to be made for not raising len back up every time, but here's an implementation which goes back to being optimistic on each pass.
int off = 0;
while (off < ops.Count)
{
// Batch size.
int len = Math.Min(100, ops.Count - off);
while (true)
{
var batch = new TableBatchOperation();
for (int i = 0; i < len; i++) batch.Add(ops[off + i]);
try
{
_Tbl.ExecuteBatch(batch);
break;
}
catch (Microsoft.WindowsAzure.Storage.StorageException se)
{
var we = se.InnerException as WebException;
var resp = we != null ? (we.Response as HttpWebResponse) : null;
if (resp != null && resp.StatusCode == HttpStatusCode.RequestEntityTooLarge)
{
// Assume roughly equal sizes, and base updated length on the size of the previous request.
// We assume that no individual operation is too big!
len = len * 4000000 / (int)se.RequestInformation.EgressBytes;
}
else throw;
}
}
off += len;
}
How can I easily construct a raw byte-by-byte InputRange/ForwardRange/RandomAccessRange from a file?
file.byChunk(4096).joiner
This reads a file in 4096-byte chunks and lazily joins the chunks together into a single ubyte input range.
joiner is from std.algorithm, so you'll have to import it first.
The easiest way to make a raw byte range from a file is to just read it all right into memory:
import std.file;
auto data = cast(ubyte[]) read("filename");
// data is a full-featured random access range of the contents
If the file is too large for that to be reasonable, you could try a memory-mapped file http://dlang.org/phobos/std_mmfile.html and use the opSlice to get an array off it. Since it is an array, you get full range features, but since it is memory mapped by the operating system, you get lazy reading as you touch the file.
For a simple InputRange, there's LockingTextReader (undocumented) in Phobos, or you could construct one yourself over byChunk or even fgetc, the C function. fgetc would be the easiest to write:
import core.stdc.stdio : FILE, fgetc, feof;

struct FileByByte {
ubyte front;
void popFront() { front = cast(ubyte) fgetc(fp); }
bool empty() { return feof(fp) != 0; }
FILE* fp;
this(FILE* fp) { this.fp = fp; popFront(); /* prime it */ }
}
I haven't actually tested that, but I'm pretty sure it'd work. (BTW the file open and close is separate from this because ranges are supposed to be just views into data, not managed containers. You wouldn't want the file closed just because you passed this range into a function.)
This is neither a forward nor a random access range, though. Those are trickier to do on streams without a lot of buffering code, and I think that'd be a mistake to try to write - generally, ranges should be cheap, not emulating features the underlying container doesn't natively support.
EDIT: The other answer has a non-buffering way! https://stackoverflow.com/a/30278933/1457000 That's awesome.
I'd like to know if I can add SignalR messages directly to the SignalR SQL Backplane (from SQL) so I don't have to use a SignalR client to do so.
My situation is that I have an activated stored procedure for a SQL Service Broker queue, and when it fires, I'd like to post a message to SignalR clients. Currently, I have to receive the message from SQL Service Broker in a separate process and then immediately re-send the message with a SignalR hub.
I'd like my activated stored procedure to basically move the message directly onto the SignalR SQL Backplane.
Yes and no. I set up a small experiment on my localhost to determine if it's possible - and it is, if the message is formatted properly.
Now, a word on the [SignalR] schema. It generates three tables:
[SignalR].[Messages_0]
--this holds a list of all messages with the columns of
--[PayloadId], [Payload], and [InsertedOn]
[SignalR].[Messages_0_Id]
--this holds one record of one field - the last Id value in the [Messages_0] table
[SignalR].[Schema]
--No idea what this is for; it's a 1 column (SchemaVersion) 1 record (value of 1) table
Right, so, I duplicated the last record, except I incremented the PayloadId (both for the new record and in [Messages_0_Id]) and put GETDATE() in as the value for InsertedOn. Immediately after adding the record, a new message came into the connected client. Note that PayloadId is not an identity column, so you must increment it manually, and you must copy that incremented value into the only record in [Messages_0_Id]; otherwise your SignalR clients will be unable to connect due to SignalR SQL errors.
Now, the trick is populating the [Payload] column properly. A quick look at the table shows that it's binary serialized. I'm no expert at SQL, but I'm pretty sure reproducing that binary serialization in pure SQL is up there in difficulty. If I'm right, this is the source code for the serialization, located inside Microsoft.AspNet.SignalR.Messaging.ScaleoutMessage:
public byte[] ToBytes()
{
using (MemoryStream memoryStream = new MemoryStream())
{
BinaryWriter binaryWriter = new BinaryWriter((Stream) memoryStream);
binaryWriter.Write(this.Messages.Count);
for (int index = 0; index < this.Messages.Count; ++index)
this.Messages[index].WriteTo((Stream) memoryStream);
binaryWriter.Write(this.ServerCreationTime.Ticks);
return memoryStream.ToArray();
}
}
With WriteTo:
public void WriteTo(Stream stream)
{
BinaryWriter binaryWriter = new BinaryWriter(stream);
string source = this.Source;
binaryWriter.Write(source);
string key = this.Key;
binaryWriter.Write(key);
int count1 = this.Value.Count;
binaryWriter.Write(count1);
ArraySegment<byte> arraySegment = this.Value;
byte[] array = arraySegment.Array;
arraySegment = this.Value;
int offset = arraySegment.Offset;
arraySegment = this.Value;
int count2 = arraySegment.Count;
binaryWriter.Write(array, offset, count2);
string str1 = this.CommandId ?? string.Empty;
binaryWriter.Write(str1);
int num1 = this.WaitForAck ? 1 : 0;
binaryWriter.Write(num1 != 0);
int num2 = this.IsAck ? 1 : 0;
binaryWriter.Write(num2 != 0);
string str2 = this.Filter ?? string.Empty;
binaryWriter.Write(str2);
}
So, re-implementing that in a stored procedure with pure SQL will be nearly impossible. If you need to do it on the SQL Server side, I suggest using a SQL CLR function. One thing to mention, though - it's easy enough to use a class library, but if you want to reduce hassle over the long term, I'd suggest creating a SQL Server project in Visual Studio. That will let you deploy CLR functions with much more ease than manually re-copying the latest class library to the SQL Server. This page talks more about how to do that.
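For what it's worth, here is roughly what such a CLR function could look like for a single message, following the ToBytes/WriteTo layout shown above (a sketch only - the class and method names are made up, and producing the inner Value bytes in the exact JSON envelope SignalR clients expect is a separate problem):
using System;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

public static class SignalRPayload
{
    // Builds a [Payload] value containing exactly one message, mirroring the
    // ToBytes/WriteTo code above: message count, then per-message fields, then ticks.
    [SqlFunction]
    public static SqlBytes BuildSingleMessagePayload(SqlString source, SqlString key, SqlBytes value)
    {
        using (var ms = new MemoryStream())
        {
            var writer = new BinaryWriter(ms);
            writer.Write(1);                          // Messages.Count
            writer.Write(source.Value);               // Source (length-prefixed string)
            writer.Write(key.Value);                  // Key
            byte[] body = value.Value;
            writer.Write(body.Length);                // Value length as Int32
            writer.Write(body, 0, body.Length);       // Value bytes
            writer.Write(string.Empty);               // CommandId
            writer.Write(false);                      // WaitForAck
            writer.Write(false);                      // IsAck
            writer.Write(string.Empty);               // Filter
            writer.Write(DateTime.UtcNow.Ticks);      // ServerCreationTime.Ticks
            return new SqlBytes(ms.ToArray());
        }
    }
}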
I was inspired by this post and wrote up a version in an SQL stored proc. Works really slick, and wasn't hard.
I hadn't done much work with varbinary before, but SQL Server makes it pretty easy to work with, and you can just concatenate the sections together. The format given by James Haug above is accurate. Most of the strings are written as "length as a byte, then the string content" (the content being just convert(varbinary, string)). The exception is the payload string, which is instead "length as int32, then the string content". Numbers are written out least significant byte first. I'm not sure whether you can do a conversion like that natively - I found it easy enough to write myself as a recursive function (something like numToBinary(val, bytesRemaining), returning varbinary).
If you take this route, I'd still write a parser first (in .NET or another non-SQL language) and test it on some packets generated by SignalR itself. That gives you a better place to work out the kinks in your SQL and to learn the right formatting of the payload package and whatnot.