I have an array of objects that each needs to load itself from binary file data. I create an array of these objects and then call an AsyncAction for each of them that starts it reading in its data. Trouble is, they are not loading entirely - they tend to get only part of the data from the files. How can I make sure that the whole thing is read? Here is an outline of the code: first I enumerate the folder contents to get a StorageFile for each file it contains. Then, in a for loop, each receiving object is created and passed the next StorageFile, and it creates its own Buffer and DataReader to handle the read. m_file_bytes is a std::vector.
m_buffer = co_await FileIO::ReadBufferAsync(nextFile);
m_data_reader = winrt::Windows::Storage::Streams::DataReader::FromBuffer(m_buffer);
m_file_bytes.resize(m_buffer.Length());
m_data_reader.ReadBytes(m_file_bytes);
My thought was that since the buffer and reader are class members of the object they would not go out of scope and could finish their work uninterrupted as the next objects were asked to load themselves in separate AsyncActions. But the DataReader only gets maybe half of the file data or less. What can be done to make sure it completes? Thanks for any insights.
[Update] Perhaps what is going on is that the file system can handle only one read task at a time, and by starting all these async reads each one interrupts the previous one? But there must be a way to progressively read a folder full of files.
[Update] I think I have it working by adopting the principle of concentric loops: the idea is not to proceed to the next load until the previous one has completed. I think (someone can correct me if I'm wrong) that the file system cannot do simultaneous reads. If there is an accepted and secure example of how to do this I would still love to hear about it, so I'm not answering my own question.
#include <wrl.h>
#include <robuffer.h>
uint8_t* GetBufferData(winrt::Windows::Storage::Streams::IBuffer& buffer)
{
    ::IUnknown* unknown = winrt::get_unknown(buffer);
    ::Microsoft::WRL::ComPtr<::Windows::Storage::Streams::IBufferByteAccess> bufferByteAccess;
    HRESULT hr = unknown->QueryInterface(_uuidof(::Windows::Storage::Streams::IBufferByteAccess), &bufferByteAccess);
    if (FAILED(hr))
        return nullptr;
    byte* bytes = nullptr;
    bufferByteAccess->Buffer(&bytes);
    return bytes;
}
https://learn.microsoft.com/en-us/cpp/cppcx/obtaining-pointers-to-data-buffers-c-cx?view=vs-2017
https://learn.microsoft.com/en-us/windows/uwp/xbox-live/storage-platform/connected-storage/connected-storage-using-buffers
A 2012 answer at StackOverflow (“How do I read a binary file in a Windows Store app”) suggests this method of reading byte data from a StorageFile in a Windows Store app:
IBuffer buffer = await FileIO.ReadBufferAsync(theStorageFile);
byte[] bytes = buffer.ToArray();
That looks simple enough. As I am working in cppwinrt I have translated that to the following, within the same IAsyncAction that produced a vector of StorageFiles. First I obtain a StorageFile from the VectorView using theFilesVector.GetAt(index);
//Then this line compiles without error:
IBuffer buffer = co_await FileIO::ReadBufferAsync(theStorageFile);
//But I can’t find a way to make the buffer call work.
byte[] bytes = buffer.ToArray();
“byte[]” can’t work, to begin with, so I change that to byte*, but then
the error is “class ‘winrt::Windows::Storage::Streams::IBuffer’ has no member ‘ToArray’”
And indeed Intellisense lists no such member for IBuffer. Yet IBuffer was specified as the return type for ReadBufferAsync. It appears the above sample code cannot function as it stands.
In the documentation for FileIO I find it recommended to use DataReader to read from the buffer, which in cppwinrt should look like
DataReader dataReader = DataReader::FromBuffer(buffer);
That compiles. It should then be possible to read bytes with the following DataReader method, which is fortunately supplied in the UWP docs in cppwinrt form:
void ReadBytes(Byte[] value) const;
However, that does not compile because the type Byte is not recognized in cppwinrt. If I create a byte array instead:
byte* fileBytes = new byte(buffer.Length());
that is not accepted. The error is
‘No suitable constructor exists to convert from “byte*” to “winrt::arrayView::<uint8_t>”’
uint8_t is of course a byte, so let’s try
uint8_t fileBytes = new uint8_t(buffer.Length());
That is wrong - clearly we really need to create a winrt::array_view. Yet a 2015 Reddit post says that array_view “died” and I’m not sure how to declare one, or if it will help. That original one-line method for reading bytes from a buffer is looking so beautiful in retrospect. This is a long post, but can anyone suggest the best current method for simply reading raw bytes from a StorageFile reference in cppwinrt? It would be so fine if there were simply GetFileBytes() and GetFileBytesAsync() methods on StorageFile.
---Update: here's a step forward. I found a comment from Kenny Kerr last year explaining that array_view should not be declared directly, but that std::vector or std::array can be used instead. And that is accepted as an argument for the ReadBytes method of DataReader:
std::vector<unsigned char>fileBytes;
dataReader.ReadBytes(fileBytes);
Only trouble now is that the std::vector is receiving no bytes, even though the size of the referenced file is correctly returned in buffer.Length() as 167,513 bytes. That seems to suggest the buffer is good, so I'm not sure why the ReadBytes method applied to that buffer would produce no data.
Update #2: Kenny suggests reserving space in the vector, which is something I had tried, this way:
m_file_bytes.reserve(buffer.Length());
But it didn't make a difference. Here is a sample of the code as it now stands, using DataReader.
buffer = co_await FileIO::ReadBufferAsync(nextFile);
dataReader = DataReader::FromBuffer(buffer);
//The following line may be needed, but crashes
//co_await dataReader.LoadAsync(buffer.Length());
if (buffer.Length())
{
m_file_bytes.reserve(buffer.Length());
dataReader.ReadBytes(m_file_bytes);
}
The crash, btw, is
throw hresult_error(result, hresult_error::from_abi);
Is it confirmed, then, that the original 2012 solution quoted above cannot work in today's world? But of course there must be some way to read bytes from a file, so I'm just missing something that may be obvious to another.
Final (I think) update: Kenny's suggestion that the vector needs a size has hit the mark. If the vector is first prepared with m_file_bytes.assign(buffer.Length(),0) then it does get filled with file data. Now my only worry is that I don't really understand the way IAsyncAction is working and maybe could have trouble looping this asynchronously, but we'll see.
The array_view bridges the gap between Windows APIs and C++ array types. In this example, the ReadBytes method expects the caller to provide some array that it can copy bytes into. The array_view forwards a pointer to the caller's array as well as its size. In this case, you're passing an empty vector. Try resizing the vector before calling ReadBytes.
When you know how many bytes to expect (in this case 2 bytes), this worked for me:
std::vector<unsigned char>fileBytes;
fileBytes.resize(2);
DataReader reader = DataReader::FromBuffer(buffer);
reader.ReadBytes(fileBytes);
cout<< fileBytes[0] << endl;
cout<< fileBytes[1] << endl;
What do EMACS Lisp programmers do, when they want to write something roughly the equivalent of...
for line in open("foo.txt", "r", encoding="utf-8").readlines():
...(split on ws and call a fn, or whatever)...
..?
When I look in the EMACS lisp help, I see functions about opening files into text editing buffers -- not exactly what I was intending. I suppose I could write functions to visit the lines of the file, but if I did that, I wouldn't want the user to see it, and besides, it doesn't seem very efficient from a text-processing standpoint.
I think a more direct translation of the original Python code is as follows:
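(with-temp-buffer
  (insert-file-contents "foo.txt")
  ;; Stop at end of buffer: there the regexp would match the empty string
  ;; without moving point, and the loop would never terminate.
  (while (and (not (eobp))
              (search-forward-regexp "\\(.*\\)\n?" nil t))
    ;; do something with this line in (match-string 1)
    ))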
I think with-temp-buffer/insert-file-contents is generally preferable to with-current-buffer/find-file-noselect, because the former guarantees that you're working with a fresh copy of the entire file contents. With the latter construction, if you happen to already have a buffer visiting the target file, then that buffer is returned by find-file-noselect, so if that buffer has been narrowed, you'll only see that part of the file when you process it.
Keep in mind that it may very well be more convenient not to process the file line-by-line. For example, this is an expression that returns a list of all sequences of consecutive digits in the file:
(with-temp-buffer
  (insert-file-contents "foo.txt")
  (cl-loop while (search-forward-regexp "[0-9]+" nil t)
           collect (match-string 0)))
(require 'cl-lib) first to bring in the cl-loop macro. (Older code used loop from the now-deprecated cl package.)
Yes, that is what you want to do: visit the file in a buffer, and operate on the text in that buffer.
You do not have to display the buffer, i.e., the user need not see it.
And as for efficiency: manipulating text in a buffer is typically the most efficient way to manipulate text.
You can visit a file in a buffer in several ways. You might want to use an existing file buffer for this, depending on the use case. That is, if the file is already "open" in Emacs then you might want to use its buffer.
Or you might want to disregard any existing file buffer for an already "open" file, and read the file anew into a new buffer. For that, as @Sean mentions, you can use insert-file-contents with a buffer that you create. You can create the buffer using with-temp-buffer or generate-new-buffer, depending, again, on what you want/need to do with it.
If you do want to reuse a buffer that is already visiting the file, you can test whether it has been modified in memory, whether it is narrowed, etc., and do whatever is appropriate for your use case. You can check whether there is already a buffer visiting the file (using any path/file name) using function find-buffer-visiting.
To visit the file, taking advantage of any existing buffer that is visiting it, you can use find-file-noselect. That function returns the buffer that visits the file, so you can pass that buffer as the first argument to with-current-buffer. Here is a simple example.
(with-current-buffer (let ((enable-local-variables ()))
                       (find-file-noselect file))
  ;; Do some stuff with the text in the buffer.
  ;; Optionally save the buffer back to the file.
  )
(The binding of enable-local-variables to nil is a minor optimization, for the common case where you don't need to bother with file-local variables.)
How can I easily construct a raw byte-by-byte InputRange/ForwardRange/RandomAccessRange from a file?
file.byChunk(4096).joiner
This reads a file in 4096-byte chunks and lazily joins the chunks together into a single ubyte input range.
joiner is from std.algorithm, so you'll have to import it first.
The easiest way to make a raw byte range from a file is to just read it all right into memory:
import std.file;
auto data = cast(ubyte[]) read("filename");
// data is a full-featured random access range of the contents
If the file is too large for that to be reasonable, you could try a memory-mapped file http://dlang.org/phobos/std_mmfile.html and use the opSlice to get an array off it. Since it is an array, you get full range features, but since it is memory mapped by the operating system, you get lazy reading as you touch the file.
For a simple InputRange, there's LockingTextReader (undocumented) in Phobos, or you could construct one yourself over byChunk or even fgetc, the C function. fgetc would be the easiest to write:
import core.stdc.stdio; // FILE, fgetc, feof

struct FileByByte {
    ubyte front;
    void popFront() { front = cast(ubyte) fgetc(fp); }
    bool empty() { return feof(fp) != 0; } // feof returns an int, not a bool
    FILE* fp;
    this(FILE* fp) { this.fp = fp; popFront(); /* prime it */ }
}
I haven't actually tested that but I'm pretty sure it'd work. (BTW the file open and close is separate from this because ranges are supposed to be just views into data, not managed containers. You wouldn't want the file closed just because you passed this range into a function.)
This is not a forward nor random access range though. Those are trickier to do on streams without a lot of buffering code and I think that'd be a mistake to try to write - generally, ranges should be cheap, not emulating features the underlying container doesn't natively support.
EDIT: The other answer has a non-buffering way! https://stackoverflow.com/a/30278933/1457000 That's awesome.
Recently I've been creating checksums for files in Go. My code works with both small and big files. I tried two methods: the first uses ioutil.ReadFile("filename") and the second uses os.Open("filename").
Examples:
The first function uses io/ioutil and works for small files. When I try to copy a big file my RAM gets blasted: for a 1.5GB iso it uses 3GB of RAM.
func byteCopy(fileToCopy string) {
	file, err := ioutil.ReadFile(fileToCopy) // 1.5GB file
	omg(err)                                 // error handling function
	ioutil.WriteFile("2.iso", file, 0777)
	os.Remove("2.iso")
}
Even worse when I want to create a checksum with crypto/sha512 and io/ioutil.
It will never finish and abort because it runs out of memory.
func ioutilHash() {
	file, _ := ioutil.ReadFile(iso)
	h := sha512.New()
	fmt.Printf("%x", h.Sum(file))
}
When using the function below everything works fine.
func ioHash() {
	f, err := os.Open(iso) // iso is a big ~1.5GB file
	omg(err)               // error handling function
	defer f.Close()
	h := sha512.New()
	io.Copy(h, f)
	fmt.Printf("%x", h.Sum(nil))
}
My Question:
Why is the ioutil.ReadFile() function not working right? The 1.5GB file should not fill my 16GB of ram. I don't know where to look right now.
Could somebody explain the differences between the methods? I don't get it from reading the Go docs and examples.
Having usable code is nice, but understanding why it's working is way above that.
Thanks in advance!
The following code doesn't do what you think it does.
func ioutilHash() {
	file, _ := ioutil.ReadFile(iso)
	h := sha512.New()
	fmt.Printf("%x", h.Sum(file))
}
This first reads your 1.5GB iso. As jnml pointed out, it continuously makes bigger and bigger buffers to hold it. In the end, the total buffer size is no less than 1.5GB and no greater than 1.875GB (by the current implementation).
However, after that you then make another buffer! h.Sum(file) doesn't hash file. It appends the current hash to file! This may or may not cause yet another allocation.
The real problem is that you are taking that file, now appended with the hash, and printing it with %x. fmt builds its output using the same kind of growing buffer that jnml pointed out ioutil.ReadAll uses, so it constantly allocates bigger and bigger buffers to store the hex of your file. Since each hex character encodes only 4 bits, the output is twice the size of the input: no less than a 3GB buffer and no greater than 3.75GB.
This means your active buffers may be as big as 5.625GB. Combine that with the GC not being perfect and not removing all the intermediate buffers, and it could very easily fill your space.
The correct way to write that code would have been:
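To make the two points above concrete, here is a minimal sketch (the helper names sumAppends and hexLen are made up for illustration, not taken from the question). It checks that Sum(b) returns b followed by the digest of whatever was written to the hash, and that %x produces two characters per byte.

```go
package main

import (
	"bytes"
	"crypto/sha512"
	"fmt"
)

// sumAppends reports whether h.Sum(prefix) returns prefix followed by the
// digest of the data written to h (here: nothing), rather than a hash of prefix.
// Hypothetical helper, for illustration only.
func sumAppends(prefix []byte) bool {
	h := sha512.New()
	out := h.Sum(prefix) // nothing has been written to h
	emptyDigest := sha512.Sum512(nil)
	return len(out) == len(prefix)+sha512.Size &&
		bytes.HasPrefix(out, prefix) &&
		bytes.Equal(out[len(prefix):], emptyDigest[:])
}

// hexLen returns how many characters %x produces for n bytes.
func hexLen(n int) int {
	return len(fmt.Sprintf("%x", make([]byte, n)))
}

func main() {
	fmt.Println(sumAppends([]byte("file contents"))) // true
	fmt.Println(hexLen(10))                          // 20
}
```

So with a 1.5GB slice passed to Sum, the result is the whole file plus 64 bytes of digest, and formatting it with %x doubles that again.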
func ioutilHash() {
	file, _ := ioutil.ReadFile(iso)
	h := sha512.New()
	h.Write(file)
	fmt.Printf("%x", h.Sum(nil))
}
This does far fewer allocations.
The bottom line is that ReadFile is rarely what you want to use. IO streaming (using readers and writers) is always the best way when it is an option. Not only do you allocate much less when you use io.Copy, you also hash and read the disk concurrently. In your ReadFile example, the two resources are used one after the other even though they don't depend on each other.
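As a rough sketch of the streaming approach, the helper below hashes anything that implements io.Reader, so an *os.File from os.Open can be passed straight in. The names hashStream, hashString and hashBytes are made up for this example; hashBytes is the load-everything variant, included only to show that both give the same digest.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// hashStream feeds r through SHA-512 via io.Copy, so the data is never
// held in memory all at once.
func hashStream(r io.Reader) (string, error) {
	h := sha512.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashString streams an in-memory string; it stands in for a file here.
func hashString(s string) (string, error) {
	return hashStream(strings.NewReader(s))
}

// hashBytes hashes a buffer that is already fully loaded.
func hashBytes(b []byte) string {
	sum := sha512.Sum512(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	streamed, err := hashString("hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(streamed == hashBytes([]byte("hello"))) // true
}
```

For a real file, the call would look like hashStream(f) after f, err := os.Open(path), with defer f.Close().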
ioutil.ReadFile is working right. It's your fault to abuse the system resources by using that function for things you know are huge.
ioutil.ReadFile is a handy helper for files you're pretty sure in advance are going to be small, like configuration files, most source code files, etc. (Actually it's optimizing things for files <= 1e9 bytes, but that's an implementation detail and not part of the API contract. Your 1.5GB file forces it to use slice growing and thus allocating more than one big buffer for your data in the process of reading the file.)
Even your other approach using os.File is not okay. You definitely should be using the "bufio" package for sequential processing of large files, see bufio.NewReader.
I am trying to build a resource file for a website, basically jamming all the images into one compressed file that is then unpacked into the output buffers sent to the client.
My question is: in VB 2005, can a FileStream be multi-threaded if you know the size of the converted file? That is, like BitTorrent, can threads work on pieces of the FileStream (the individual files in this case) and add them to the resource FileStream when they are done, instead of one at a time?
If you need something similar to the torrent way of writing to a file, this is how I would implement it:
1. Open a FileStream on thread T1, and create a queue "monitor" for step 2.
2. Create a queue that will be read from T1 but written to by multiple network reader threads. Each queue entry would look like this: (position where to write, size of data buffer, data buffer).
3. Fire up the threads.
:)
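The steps above can be sketched as follows. This is written in Go rather than VB, purely to illustrate the single-writer queue pattern; the names chunk, assemble and roundTrip are hypothetical, and the goroutines stand in for the network reader threads.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// chunk is one queue entry: where to write, and the data to write there.
type chunk struct {
	offset int64
	data   []byte
}

// assemble lets many producers enqueue chunks while a single consumer
// (this goroutine) is the only one that ever touches the file.
func assemble(path string, parts [][]byte) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}

	// Buffered so producers never block, even if the writer stops early.
	queue := make(chan chunk, len(parts))
	var wg sync.WaitGroup
	var offset int64
	for _, p := range parts {
		wg.Add(1)
		go func(off int64, data []byte) { // stands in for a worker thread
			defer wg.Done()
			queue <- chunk{offset: off, data: data}
		}(offset, p)
		offset += int64(len(p))
	}
	go func() { wg.Wait(); close(queue) }()

	// Chunks arrive in any order; WriteAt places each at its own position.
	for c := range queue {
		if _, err := f.WriteAt(c.data, c.offset); err != nil {
			f.Close()
			return err
		}
	}
	return f.Close()
}

// roundTrip assembles parts into a temporary file and reads the result back.
func roundTrip(parts [][]byte) (string, error) {
	dir, err := os.MkdirTemp("", "assemble")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "out.bin")
	if err := assemble(path, parts); err != nil {
		return "", err
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	out, err := roundTrip([][]byte{[]byte("abc"), []byte("de"), []byte("fgh")})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // abcdefgh
}
```

Because only one goroutine writes, no lock around the file is needed; the queue does the serializing.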
Anyway, from your comments, your problem seems to be another one..
I have found something in, but I'm not sure if it works:
If you want to write data to a file, two parallel methods are available, WriteByte() and Write(). WriteByte() writes a single byte to the stream:
byte NextByte = 100;
fs.WriteByte(NextByte);
Write(), on the other hand, writes out an array of bytes. For instance, if you initialized the ByteArray mentioned before with some values, you could use the following code to write out the first nBytes of the array:
fs.Write(ByteArray, 0, nBytes);
Citation from:
Nagel, Christian, Bill Evjen, Jay Glynn, Morgan Skinner, and Karli Watson. "Chapter 24 - Manipulating Files and the Registry". Professional C# 2005 with .NET 3.0. Wrox Press. © 2007. Books24x7. http://common.books24x7.com/book/id_20568/book.asp (accessed July 22, 2009)
I'm not sure if you're asking if a System.IO.FileStream object can be read from or written to in a multi-threaded fashion. But the answer in both cases is no. This is not a supported scenario. You will need to add some form of locking to ensure serialized access to the resource.
The documentation calls out multi-threaded access to the object as an unsupported scenario:
http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx