In Lua, how should I read a file into an array of bytes? - file-io

To read a file into an array of bytes a, I have been using the following code:
file = io.open(fileName, "rb")
str = file:read("*a")
a = {str:byte(1, #str)}
Although this works for smaller files, str:byte fails for a 1 MB file, giving a stack overflow ("string slice too long").
Is there an alternative method that will successfully read these larger files?
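One common workaround (a sketch, not from the original post) is to keep the single read but unpack the string in bounded slices, so no single str:byte call pushes too many results onto the stack:

local file = assert(io.open(fileName, "rb"))
local str = file:read("*a")
file:close()

local a = {}
local SLICE = 4096 -- small enough that str:byte never overflows the stack
for i = 1, #str, SLICE do
    local slice = {str:byte(i, math.min(i + SLICE - 1, #str))}
    for j = 1, #slice do
        a[#a + 1] = slice[j]
    end
end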

Read the file in 4 KB chunks, so no single string operation has to handle the whole file at once:

local fileName = 'C:\\Program Files\\Microsoft Office\\Office12\\excel.exe'
local file = assert(io.open(fileName, 'rb'))
local t = {}
repeat
    local str = file:read(4*1024)
    for c in (str or ''):gmatch'.' do
        t[#t+1] = c:byte()
    end
until not str
file:close()
print(#t) --> 18330984

If you are using LuaJIT, another approach is to read a chunk of bytes and store it in a C array. When reading the whole file in one shot, the buffer must be allocated with enough memory to hold it (filesize bytes). Alternatively, it is possible to read the file in chunks and reuse the buffer for each chunk.
The advantage of using a C buffer is that it is more memory-efficient than converting the data to a Lua string or a Lua table. The disadvantage is that the FFI is only supported in LuaJIT.
local ffi = require("ffi")

-- Helper function to calculate file size.
local function filesize (fd)
    local current = fd:seek()
    local size = fd:seek("end")
    fd:seek("set", current)
    return size
end

local filename = "example.bin"

-- Open file in binary mode.
local fd, err = io.open(filename, "rb")
if not fd then error(err) end

-- Get size of file and allocate a buffer for the whole file.
local size = filesize(fd)
local buffer = ffi.new("uint8_t[?]", size)

-- Read whole file and store it as a C buffer.
ffi.copy(buffer, fd:read(size), size)
fd:close()

-- Iterate through buffer to print out contents.
for i = 0, size - 1 do
    io.write(buffer[i], " ")
end
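The chunked variant mentioned above could look like the following sketch (the 4 KB chunk size and the byte-printing loop are illustrative assumptions, not part of the original answer):

local ffi = require("ffi")

local CHUNK = 4096
local buffer = ffi.new("uint8_t[?]", CHUNK) -- one buffer, reused for every chunk

local fd = assert(io.open("example.bin", "rb"))
while true do
    local str = fd:read(CHUNK)
    if not str then break end
    ffi.copy(buffer, str, #str) -- copy this chunk into the reusable buffer
    for i = 0, #str - 1 do
        io.write(buffer[i], " ") -- process each byte of the chunk
    end
end
fd:close()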

This reads the file file.txt in blocks of one byte at a time and stores each byte's numeric value in the table bytes:
local bytes = {}
local file = assert(io.open("file.txt", "rb"))
local block = 1 -- blocks of 1 byte
while true do
    local byte = file:read(block)
    if byte == nil then
        break
    else
        bytes[#bytes+1] = string.byte(byte)
    end
end
file:close()

Related

How to save GTiff data in memory with C++

I want to create a dataset to store the image data in GTiff format, and then pass the memory address of the whole GTiff file in memory to another function that publishes this TIFF as a WMS service.
I have found that the function VSIGetMemFileBuffer returns binData as 0x0 and binDataLength as a very large number. Why? How can I achieve my purpose? Thanks a lot!
The following is my code snippet:
GDALDriver *hMemDriver = GetGDALDriverManager()->GetDriverByName("MEM");
char** papszOptions = NULL;
GDALDataset *hMemDS = hMemDriver->Create("/vsimem/geotiffnameout", NCOLS, NROWS, 0, GDT_Float32, NULL);
char szTmp[64];
memset(szTmp, 0, sizeof(szTmp));
CPLPrintPointer(szTmp, dataBuff, sizeof(szTmp)); //dataBuff is my image data buffer
papszOptions = CSLSetNameValue(papszOptions, "DATAPOINTER", szTmp);
hMemDS->AddBand(GDT_Float32, papszOptions);
vsi_l_offset binDataLength;
int bUnlinkAndSeize = FALSE;
GByte* binData = VSIGetMemFileBuffer("/vsimem/geotiffnameout", &binDataLength, bUnlinkAndSeize); //Here, I get that binData is 0x0 and binDataLength is a very large number. Why?
I want to store the processed image data in GTiff format in memory!
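One likely cause: the MEM driver keeps raster data in RAM but never writes a /vsimem file, so VSIGetMemFileBuffer finds nothing under that name and returns NULL without filling in the length, which would explain both symptoms. A hedged sketch of the usual approach (untested; it reuses hMemDS from the snippet above): copy the in-memory dataset to /vsimem with the GTiff driver, then fetch the serialized bytes.

GDALDriver *hTiffDriver = GetGDALDriverManager()->GetDriverByName("GTiff");
// Serialize the MEM dataset into an in-memory GTiff file under /vsimem.
GDALDataset *hTiffDS = hTiffDriver->CreateCopy("/vsimem/geotiffnameout", hMemDS,
                                               FALSE, NULL, NULL, NULL);
GDALClose(hTiffDS); // flush and close so the /vsimem file is complete
vsi_l_offset binDataLength = 0;
GByte *binData = VSIGetMemFileBuffer("/vsimem/geotiffnameout", &binDataLength, FALSE);
// binData now points at binDataLength bytes of GTiff data held in memory.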

How can I read \x1a from a file? [duplicate]

I am attempting to write a BitTorrent client. In order to parse the torrent file, I need to read it into memory. I have noticed that fread is not reading the entire file into my buffer. After further investigation, it appears that whenever the symbol shown below is encountered in the file, fread stops reading. Calling feof on the FILE* pointer returns 16, indicating that the end of file has been reached. This occurs no matter where the symbol is placed. Can somebody explain why this happens, and suggest any solutions that may work?
The symbol (the 0x1A substitute character, shown as an image in the original post) is highlighted below:
Here is the code that does the read operation:
char *read_file(const char *file, long long *len){
    struct stat st;
    char *ret = NULL;
    FILE *fp;
    //store the size/length of the file
    if(stat(file, &st)){
        return ret;
    }
    *len = st.st_size;
    //open a stream to the specified file
    fp = fopen(file, "r");
    if(!fp){
        return ret;
    }
    //allocate space in the buffer for the file
    ret = (char*)malloc(*len);
    if(!ret){
        return NULL;
    }
    //Break down the call to fread into smaller chunks
    //to account for a known bug which causes fread to
    //behave strangely with large files
    //Read the file into the buffer
    //fread(ret, 1, *len, fp);
    if(*len > 10000){
        char *retTemp = NULL;
        retTemp = ret;
        int remaining = *len;
        int read = 0, error = 0;
        while(remaining > 1000){
            read = fread(retTemp, 1, 1000, fp);
            if(read < 1000){
                error = feof(fp);
                if(error != 0){
                    printf("Error: %d\n", error);
                }
            }
            retTemp += 1000;
            remaining -= 1000;
        }
        fread(retTemp, 1, remaining, fp);
    } else {
        fread(ret, 1, *len, fp);
    }
    //cleanup by closing the file stream
    fclose(fp);
    return ret;
}
Thank you for your time :)
Your question is oddly relevant as I recently ran into this problem in an application here at work last week!
The ASCII value of this character is decimal 26 (0x1A, \SUB, SUBSTITUTE). This is used to represent the CTRL+Z key sequence or an End-of-File marker.
To get around this on Windows, change your fopen mode; per the documentation, "In [Text] mode, CTRL+Z is interpreted as an end-of-file character on input."
fp = fopen(file, "rb"); /* b for 'binary', disables Text-mode translations */
You should open the file in binary mode. Some platforms, in text (default) mode, interpret some bytes as being physical end of file markers.
You're opening the file in text rather than raw/binary mode - the arrow is ASCII for EOF. Specify "rb" rather than just "r" for your fopen call.
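A tiny demonstration of the difference (a hypothetical, Windows-specific demo; the file name is made up): write a few bytes including 0x1A, then read the file back in both modes and compare how many bytes fread returns.

#include <stdio.h>

int main(void){
    /* Write three bytes, the middle one being 0x1A (SUB). */
    FILE *out = fopen("demo.bin", "wb");
    fputc('A', out); fputc(0x1A, out); fputc('B', out);
    fclose(out);

    char buf[16];

    /* Text mode: on Windows the read stops at the 0x1A marker. */
    FILE *txt = fopen("demo.bin", "r");
    size_t n_text = fread(buf, 1, sizeof buf, txt);
    fclose(txt);

    /* Binary mode: all three bytes come back. */
    FILE *bin = fopen("demo.bin", "rb");
    size_t n_bin = fread(buf, 1, sizeof buf, bin);
    fclose(bin);

    printf("text mode: %zu bytes, binary mode: %zu bytes\n", n_text, n_bin);
    return 0;
}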

redis bitset -- how to upload an existing bitset array

I have a huge bitset stored in a DB. I want to upload it to a Redis bitset so I can perform bit operations on it. Is there a way to upload this data from either redis-cli or JavaScript code? I am using the bitset.js npm module to load the bitset into my program from the DB.
One obvious way is to iterate over my bitset array within my JavaScript code and call redis.setbit(...) repeatedly. Is there a way to upload all of the bits at once? If so, how?
A bitset in Redis is actually just a string, so you can assign to it directly all at once. The bits in the string are the bits of the bitfield, set in left-to-right order. I.e. setting bit number 0 to 1 yields the binary number 10000000, or a single byte with the value 128. This looks like "\x80" when Redis prints it, which you can see for yourself by running setbit foo 0 1 and then get foo in Redis.
So to construct the right string to send to Redis, we just need to read the bits out of your BitSet and construct a buffer, one byte at a time, with the appropriate bits set.
Below is code that uses bitset.js and the redis npm module to transfer a BitSet in JavaScript into a Redis key. Note that this code assumes that the bitfield fits comfortably in memory.
let redis = require('redis'),
    BitSet = require('./bitset');

let client = redis.createClient();

// create some data
let bs = new BitSet;
bs.set(0, 1);
bs.set(31, 1);

// calculate how many bytes we'll need: msb() is the index of the highest
// set bit, so a bitset whose msb is 31 needs floor(31/8) + 1 = 4 bytes
let numBytes = Math.floor(bs.msb() / 8) + 1;

// construct a zero-filled buffer with that much space
let buffer = Buffer.alloc(numBytes);

// for each byte
for (let i = 0; i < numBytes; i++) {
    let byte = 0;
    // iterate over each bit
    for (let j = 0; j < 8; j++) {
        // slide previous bits to the left
        byte <<= 1;
        // and set the rightmost bit
        byte |= bs.get(i * 8 + j);
    }
    // put this byte in the buffer
    buffer[i] = byte;
}

// now we have a complete buffer to use as our value in Redis
client.set('bitset', buffer, function (err, result) {
    client.getbit('bitset', 31, function (err, result) {
        console.log('Bit 31 = ' + result);
        client.del('bitset', function () {
            client.quit();
        });
    });
});

How to decompress pbzip2 data in memory buffer by using libbz2 library in C++

I have a working version of decompressing bzip2 data where I call the BZ2_bzDecompress API. It goes something like this:
while (bytes_input < len) {
    isDone = false;
    // Initialize the input buffer and its length
    size_t in_buffer_size = len - bytes_input;
    the_bz2_stream.avail_in = in_buffer_size;
    the_bz2_stream.next_in = (char*)data + bytes_input;
    size_t out_buffer_size = output_size - bytes_uncompressed; // size of output buffer
    if (out_buffer_size == 0) { // out of space in the output buffer
        break;
    }
    the_bz2_stream.avail_out = out_buffer_size;
    the_bz2_stream.next_out = (char*)output + bytes_uncompressed; // output buffer
    ret = BZ2_bzDecompress(&the_bz2_stream);
    if (ret != BZ_OK && ret != BZ_STREAM_END) {
        throw Bzip2Exception("Bzip2 failed. ", ret);
    }
    bytes_input += in_buffer_size - the_bz2_stream.avail_in;
    bytes_uncompressed += out_buffer_size - the_bz2_stream.avail_out;
    *data_consumed = bytes_input;
    if (ret == BZ_STREAM_END) {
        ret = BZ2_bzDecompressEnd(&the_bz2_stream);
        if (ret != BZ_OK) {
            throw Bzip2Exception("Bzip2 fail. ", ret);
        }
        isDone = true;
    }
}
This works great for native bzip2 compressed files, but for pbzip2 (Parallel Bzip2) and "Splittable" bzip2 data, it throws a "BZ_PARAM_ERROR".
I see that the pbzip2 documentation says this:
Data compressed with pbzip2 is broken into multiple streams and each
stream is bzip2 compressed looking like this:
[-----|-----|-----|-----|-----|-----|-----|-----|-----]
If you are writing software with libbzip2 to decompress data created
with pbzip2, you must take into account that the data contains
multiple bzip2 streams so you will encounter end-of-stream markers
from libbzip2 after each stream and must look-ahead to see if there
are any more streams to process before quitting. The bzip2 program
itself will automatically handle this condition.
Source: http://compression.ca/pbzip2/
Can someone please tell me how to handle this? Should I be using some other libbz2 API?
Also, pbzip2 files are compatible with the normal bunzip2 command. How is it that bzip2 handles this gracefully while my code throws a BZ_PARAM_ERROR?
Thanks.
After your BZ2_bzDecompressEnd() you need to call BZ2_bzDecompressInit() again (you must have called it once before the loop) if there is still data left to decompress, i.e. bytes_input < len.
To decompress each of the |-----| blocks, you need to do an init, some number of decompress calls, and an end. So if you still have input left, then you need to do another init, n*decompress, end.
Make sure that you do a final end, in order to avoid a big memory leak.
You're getting a BZ_PARAM_ERROR because you are trying to use an uninitialized bz_stream to decompress. Once you do BZ2_bzDecompressEnd(), you can't use that bz_stream any more, unless you do a BZ2_bzDecompressInit() on it.
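Putting that together, a minimal sketch of the init / decompress / end cycle over concatenated streams (hypothetical buffer names, error handling trimmed; assumes the whole compressed input is already in memory):

#include <bzlib.h>
#include <string.h>

/* Decompress concatenated bzip2 streams (as produced by pbzip2).
   Returns 0 on success, a bzlib error code otherwise. */
int decompress_all(const char *in, size_t in_len,
                   char *out, size_t out_cap, size_t *out_len)
{
    bz_stream strm;
    *out_len = 0;
    while (in_len > 0) {
        memset(&strm, 0, sizeof(strm)); /* bzalloc/bzfree/opaque = NULL */
        int ret = BZ2_bzDecompressInit(&strm, 0, 0);
        if (ret != BZ_OK) return ret;

        strm.next_in   = (char *)in;
        strm.avail_in  = (unsigned int)in_len;
        strm.next_out  = out + *out_len;
        strm.avail_out = (unsigned int)(out_cap - *out_len);

        /* Run until this stream's end-of-stream marker. */
        while (ret != BZ_STREAM_END) {
            ret = BZ2_bzDecompress(&strm);
            if (ret != BZ_OK && ret != BZ_STREAM_END) {
                BZ2_bzDecompressEnd(&strm);
                return ret;
            }
            if (strm.avail_out == 0 && ret != BZ_STREAM_END) {
                BZ2_bzDecompressEnd(&strm); /* out of output space */
                return BZ_OUTBUFF_FULL;
            }
        }

        /* Account for what this stream consumed and produced, then
           loop around to re-init if any input remains. */
        in      += in_len - strm.avail_in;
        in_len   = strm.avail_in;
        *out_len = out_cap - strm.avail_out;
        BZ2_bzDecompressEnd(&strm);
    }
    return 0;
}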

Using fread to read and want to use structures from a binary file in C

I have a binary file and I use this code to read it:
FILE * File;
long Size;
char * buffer;
size_t result;

File = fopen("STUD.bin", "rb");
fseek(File, 0, SEEK_END);
Size = ftell(File);
rewind(File);
buffer = (char*) malloc(sizeof(char)*Size);
result = fread(buffer, 1, Size, File);
I want to use the structures stored in the binary file. What code should I use for that?
Your question seems a little unclear, but if you have a structure stored in the binary file and want to read it, then do the following:
struct example abc;
fread(&abc, sizeof(abc), 1, File);
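A slightly fuller sketch (the student record layout here is invented for illustration; the real struct must match whatever program wrote STUD.bin, including padding and byte order):

#include <stdio.h>

/* Hypothetical record layout -- must mirror the writer of STUD.bin. */
struct student {
    char  name[32];
    int   id;
    float gpa;
};

int main(void){
    FILE *f = fopen("STUD.bin", "rb");
    if (!f) { perror("fopen"); return 1; }

    struct student s;
    /* Read one record per fread until a whole struct can no longer be filled. */
    while (fread(&s, sizeof s, 1, f) == 1) {
        s.name[sizeof s.name - 1] = '\0'; /* guard against a missing terminator */
        printf("%s (id %d, gpa %.2f)\n", s.name, s.id, s.gpa);
    }
    fclose(f);
    return 0;
}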