Decompress gzip file that contains multiple blocks - gzip

I have a gzip file that has multiple blocks. Every block starts with
1F 8B 08
and ends with
00 00 FF FF
I tried to decompress the file using 7-Zip and the gzip tool on Linux, but I always get an error saying that the file is invalid.
So I wrote this Python script:
import zlib

CHUNKSIZE = 1
f = open("file.gz", "rb")
buffer = f.read(CHUNKSIZE)
data = ""
r = CHUNKSIZE
d = zlib.decompressobj(16 + zlib.MAX_WBITS)
while buffer:
    outstr = d.decompress(buffer)
    print(r)
    buffer = f.read(CHUNKSIZE)
    r = r + CHUNKSIZE
outstr = d.flush()
I have noticed that when it reaches the header of the second block,
00 00 00 FF FF 1F 8B 08
at the point between FF and 1F, the script raises
zlib.error: Error -3 while decompressing data: invalid block type
I made the chunk size 1 so that I would know exactly where the problem is.
I know that the problem is not in the file because I have multiple files constructed the same way and they show exactly the same error.

I know that the problem is not in the file because I have multiple
files constructed the same way and they show exactly the same error.
The conclusion is not that the problem is not in the file, but rather that the problem is in all of your files. Someone either inadvertently or deliberately constructed invalid gzip files. It looks like they did that by using Z_SYNC_FLUSH or Z_FULL_FLUSH instead of Z_FINISH to end each stream before starting another faux gzip stream. A gzip stream ends with a last block followed by an eight-byte gzip trailer containing two check values on the integrity of the uncompressed data.
You can nevertheless continue with decompression, though without the comfort of any integrity checking of the data, by simply picking up with a new instance of decompressobj when you get an error and see a new gzip header, 1f 8b 08.
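For example, a minimal sketch of that approach (feeding one byte at a time, as in the question's script, so the error lands exactly on the header byte; as noted, no integrity checking is possible without the gzip trailers):

import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"

def decompress_multi(path):
    # Sketch: decompress a file whose streams were ended with a sync flush
    # (00 00 FF FF) instead of Z_FINISH. When the decompressor chokes on
    # the next raw gzip header, restart with a fresh decompressobj there.
    with open(path, "rb") as f:
        data = f.read()
    out = bytearray()
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pos = 0
    while pos < len(data):
        try:
            out += d.decompress(data[pos:pos + 1])
            pos += 1
        except zlib.error:
            nxt = data.find(GZIP_MAGIC, pos)
            if nxt == -1:
                break  # no further gzip header: give up
            d = zlib.decompressobj(16 + zlib.MAX_WBITS)
            pos = nxt
    out += d.flush()
    return bytes(out)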
More importantly, you should locate and contact the source of these files and say "Hey, WTF?"

Related

S3 not returning an image even while I uploaded a Buffer

I have been effortlessly uploading files to S3 as buffers, but the source was a file upload from the client side, either from Postman or a frontend app. This is how it usually looks:
{type: buffer, data: buffer} or {type: buffer, data: [buffer]}
Anyway, I am able to get the desired buffer to pass to S3, and the result is an S3 URL pointing to an actual, downloadable image.
New issue: I got an SQL file that contains a column for images stored as MEDIUMBLOB.
This is how it looks in the table below.
I am not able to upload it as a buffer to S3.
I have fetched the content of the table via Node.js, and I got the data for each row as:
RowDataPacket {
employee_id: 000,
data: <Buffer ef bf bd 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00 00 ef bf bd 00 00 00 ef bf bd 08 02 00 00 00 ef bf bd 38 58 ef bf bd 00 00 ef bf bd 00 49 ... 111181 more bytes>
}
I have this piece of code that just sends that buffer:
const buff = result.data; // data is the buffer shown in the snippet above
s3.upload({
    Bucket: bucketName,
    Body: buff,
    Key: key
});
The result is an S3 URL, as I expect, but on downloading it, it shows the same broken image as in the screenshot above instead of a real image.
I have done some research to try to find a solution, but so far I have not found one that works for me.
I have, however, tried uploading a file via Postman, getting a buffer, and converting it to base64, and I was able to get the equivalent image. So I am not really sure whether the image in the MySQL table is corrupted, and hence whether the buffer I am getting cannot be transformed into a proper image.
The closest to my question is this similar issue I had.
The difference between that one and mine is that the buffer there comes in the format I am used to, and I have no issue passing that buffer to S3 to get the desired result. But mine comes from MySQL.
I am really stuck on this and not sure how to find my way around it.
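One hint worth checking, offered as a hedged aside: the dump above starts with ef bf bd 50 4e 47, and EF BF BD is the UTF-8 replacement character sitting where a PNG's first signature byte 0x89 should be, which suggests the blob was mangled by a character-set conversion before it ever reached S3. A minimal sketch (Python, purely illustrative) to sanity-check a fetched blob:

# Compare the first bytes of the blob against the PNG signature.
# A blob beginning EF BF BD 50 4E 47 instead of 89 50 4E 47 has likely
# been through a lossy character-set conversion.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(blob: bytes) -> bool:
    return blob.startswith(PNG_SIGNATURE)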

STM32 Atollic TrueSTUDIO - Graphical view of the Memory

I'm using Atollic TrueSTUDIO for STM32 as an Eclipse-based IDE to perform digital signal processing on audio signals. I'm looking for a way to plot an array (16-bit audio samples) from RAM. For the moment I'm using:
The Memory view
The SWV real-time data timeline
Neither of these tools is powerful enough to analyse a signal stored in an array, and it doesn't have to be in real time: just plotting an array after reaching a breakpoint.
Is there an Eclipse plugin or some other way to do this?
I'm considering exporting the RAM to a file and plotting it in Matlab, but that seems really inappropriate for such a simple task.
Thanks for any advice.
In Atollic, you can easily attach gdb commands to breakpoints. Doing this allows you to automatically dump any variables. Additionally, you can execute an external program afterwards to plot the content of the dumped variable.
To do so, go to the breakpoint properties and create a new action. Select "Debugger Command Action" and use dump binary value x.bin x to dump variable x to the file x.bin.
You can also start a Python script from the breakpoint to plot the data.
Use an additional "External Tool Action" and select your Python location. Make sure to select your current working directory. Use the arguments to pass the full path of the Python file. The following file will import a float array and plot it.
import struct
import numpy as np
import matplotlib.pyplot as plt

def readBinaryDump(filename):
    result = []
    N = 8  # read 8 floats (32 bytes) per chunk
    with open(filename, 'rb') as f:
        while True:
            b = f.read(4 * N)
            if len(b) == 0:
                break
            # unpack only as many complete floats as this chunk contains
            n = len(b) // 4
            result.extend(struct.unpack("f" * n, b[:4 * n]))
    return np.array(result)

plt.plot(readBinaryDump("x.bin"))
plt.show()
Don't forget to add the actions to the current breakpoint. Now once the breakpoint is reached, the variable should be dumped and plotted automatically.
While it's surprising that nothing can be embedded in Atollic/Eclipse, I followed the idea of writing a specific application. Here are the steps I used:
Dump memory:
Debug your software
Stop on a breakpoint
View > Memory > Export Button > Format : "Plain Text"
The file representing a sine wave looks like this:
00 00 3E 00 7D 00 BC 00 FB 00 39 01 78 01
B7 01 F6 01 34 02 73 02 B2 02 F0 02 2F 03
You should read these int16 samples like this:
1. 0x0000
2. 0x003E
3. 0x007D
4. etc...
Write this Matlab script:
fileID = fopen('your_file','r');
samples = textscan(fileID,'%s');
fclose(fileID);
samples = samples{1};
% bytes are little-endian, so put the second byte first to form each hex word
words = strcat(samples(2:2:end,1), samples(1:2:end,1));
values = typecast(uint16(hex2dec(words)),'int16');
plot(values);
The sine wave plotted in Matlab
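If Matlab isn't at hand, here is a minimal Python equivalent (a sketch assuming the same "Plain Text" export format of whitespace-separated hex bytes, little-endian int16 samples):

import numpy as np
import matplotlib.pyplot as plt

# Parse the "Plain Text" memory export: whitespace-separated hex bytes,
# two bytes per little-endian int16 sample.
with open("your_file") as f:
    hex_bytes = f.read().split()

raw = bytes(int(b, 16) for b in hex_bytes)
values = np.frombuffer(raw, dtype="<i2")  # "<i2" = little-endian int16
plt.plot(values)
plt.show()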
While I'm not personally aware of any Eclipse plugin that would do what you're asking for, there's STM Studio, whose main purpose is displaying variables in real time. It parses your ELF file to get the available variables, so the effort to at least give it a try should be minimal.
It's available here: https://www.st.com/en/development-tools/stm-studio-stm32.html
ST-Link is required to run it.
Write a simple app in C#. Use semihosting to dump the memory to a text file, then open it and display it.
Recently I had a problem with MEMS sensors, and this was written in less than one hour. IMO it is easier to write a program which visualizes the data than to waste hours or days searching for a ready-made one.

Weird pcap header of byte sequence 0a 0d 0d 0a created on Mac?

I have a PCAP file that was created on a Mac with mergecap. It can be parsed on a Mac with Apple's libpcap but cannot be parsed on a Linux system. The combined file has an extra 16-byte header that contains 0a 0d 0d 0a 78 00 00 00 before the 4d 3c 2b 1a intro that's common in pcap files. Here is a hex dump:
0000000: 0a0d 0d0a 7800 0000 4d3c 2b1a 0100 0000 ....x...M<+.....
0000010: ffff ffff ffff ffff 0100 4700 4669 6c65 ..........G.File
0000020: 2063 7265 6174 6564 2062 7920 6d65 7267 created by merg
0000030: 696e 673a 200a 4669 6c65 313a 2037 2e70 ing: .File1: 7.p
0000040: 6361 7020 0a46 696c 6532 3a20 362e 7063 cap .File2: 6.pc
0000050: 6170 200a 4669 6c65 333a 2034 2e70 6361 ap .File3: 4.pca
0000060: 7020 0a00 0400 0800 6d65 7267 6563 6170 p ......mergecap
Does anybody know what this is, or how I can read it on a Linux system with libpcap?
I have a PCAP file
No, you don't. You have a pcap-ng file.
that can be parsed on a Mac with Apple's libpcap
libpcap 1.1.0 and later can also read some pcap-ng files (the pcap API only allows a file to have one link-layer header type, one snapshot length, and one byte order, so only pcap-ng files where all sections have the same byte order and all interfaces have the same link-layer header type and snapshot length are supported), and OS X Snow Leopard and later have a libpcap based on 1.1.x, so they can read those files.
(OS X Mountain Lion and later have tweaked libpcap to allow it to write pcap-ng files as well; the -P flag makes tcpdump write out pcap-ng files, with text comments attached to some outgoing packets indicating the process ID and process name of the process that sent them - pcap-ng allows text comments to be attached to packets.)
but cannot be parsed on a Linux system
Your Linux system probably has an older libpcap version. (Note: do not be confused by Debian and Debian derivatives calling the libpcap package "libpcap0.8" - they're not still using libpcap 0.8.)
combined file has an extra 16-byte header that contains 0a 0d 0d 0a 78 00 00 00
A pcap-ng file is a sequence of "blocks" that start with a 4-byte block type and a 4-byte length, both in the byte order of the host that wrote them.
They're divided into "sections", each one beginning with a "Section Header Block" (SHB); the block type for the SHB is 0x0a0d0d0a, which is byte-order-independent (so that you don't have to know the byte order to read the SHB) and contains carriage returns and line feeds (so that if the file is, for example, transferred between a UN*X system and a Windows system by a tool that thinks it's transferring a text file and that "helpfully" tries to fix line endings, the SHB magic number will be damaged and it will be obvious that the file was corrupted by being transferred in that fashion; think of it as the equivalent of a shock indicator).
The 0x78000000 is the length; what follows it is the "byte-order magic" field, which is 0x1A2B3C4D (which is not the same as the 0xA1B2C3D4 magic number for pcap files), and which serves the same purposes as the pcap magic number, namely:
it lets code identify that the file is a pcap-ng file
it lets code determine the byte order of the section.
(No, you don't need to know the length before looking for the pcap magic number; once you've found the magic number, you then check the length to make sure it's at least 28, and if it's less than 28, you reject the block as not being valid.)
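A minimal sketch of that detection logic (Python, illustrative only):

import struct

def shb_byte_order(path):
    # Peek at a pcap-ng Section Header Block: verify the 0a 0d 0d 0a block
    # type, read the block length, and use the byte-order magic 0x1A2B3C4D
    # to determine the section's endianness.
    with open(path, "rb") as f:
        if f.read(4) != b"\x0a\x0d\x0d\x0a":
            raise ValueError("not a pcap-ng Section Header Block")
        length_raw = f.read(4)
        bom = f.read(4)
    if bom == b"\x4d\x3c\x2b\x1a":    # 0x1A2B3C4D stored little-endian
        endian = "<"
    elif bom == b"\x1a\x2b\x3c\x4d":  # 0x1A2B3C4D stored big-endian
        endian = ">"
    else:
        raise ValueError("bad byte-order magic")
    (length,) = struct.unpack(endian + "I", length_raw)
    if length < 28:
        raise ValueError("SHB too short to be valid")
    return endian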
Does anybody know what this is?
A (little-endian) pcap-ng file.
or how I can read it on a Linux system with libpcap?
Either read it on a Linux system with a newer version of libpcap (which may mean a newer version of whatever distribution you're using, or may just mean doing an update if that will get you a 1.1.0 or later version of libpcap), read it with Wireshark or TShark (which have their own library for reading capture files, which supports the native pcap and pcap-ng formats, as well as a lot of other formats), or download a newer version of libpcap from tcpdump.org, build it, install it, and then build whatever other tools need to read pcap-ng files with that version of libpcap rather than the one that comes with the system.
Newer versions of Wireshark write pcap-ng files by default, including in tools such as mergecap; you can get them to write pcap files with a flag argument of -F pcap.
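For example, a hypothetical invocation that would merge the three files named in the header comment back into classic pcap format:
mergecap -F pcap -w combined.pcap 7.pcap 6.pcap 4.pcap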

Apache mod_speling falsely "correcting" URLs?

I've been tasked with moving an old dynamic website from a Windows server to Linux. The site was initially written with no regard to character case. Some filenames were all upper-case, some lower-case, and some mixed. This was never a problem in Windows, of course, but now we're moving to a case-sensitive file system.
A quick find/rename command (thanks to another tutorial) got the filenames to all lowercase.
However, many of the URL references in the code still point to these mixed-case filenames, so I enabled mod_speling to overcome this issue. It seems to work OK for the most part, with the exception of one page: I have a file named haematobium.html, and every time a link points to .../haematobium.html, it gets rewritten as .../hæmatobium.html in the browser.
I don't know how this strange character made its way into the filename in the first place, but I've corrected the code in the HTML document to now link to haematobium.html, then renamed the haematobium.html file itself to match.
When requesting .../haematobium.html in Chrome, it "corrects" to .../hæmatobium.html in the address bar, and shows an error saying "The requested URL .../hæmatobium.html was not found on this server."
In IE9, I'm prompted for the login (this is a .htaccess-protected page), I enter it, and then it forwards the URL to .../h%C3%A6matobium.html, which again doesn't load.
In my frustration I even copied haematobium.html to both hæmatobium.html and hæmatobium.html; still, none of the three pages actually loads.
So my question: I read somewhere that mod_speling tries to "learn" misspelled URLs. Does it actually rename files (is that where the odd character might have come from)? Does it keep a cache of what's been called for, and what it was forwarded to (a cache I could clear)?
PS. there are also many mixed-case references to MySQL database tables and fields, but that's a whole 'nother nightmare.
[Cannot comment yet, therefore answering.]
Your question doesn't make it entirely clear which of the two names (two characters ae [ASCII], or one ligature character æ [Unicode]) for haematobium.html actually exists in your Apache's file system.
Try the following in your shell:
$ echo -n h*matobium.html | hd
The output should be either one of the following two alternatives. This is ASCII, with 61 and 65 for a and e, respectively:
00000000 68 61 65 6d 61 74 6f 62 69 75 6d 2e 68 74 6d 6c |haematobium.html|
00000010
And this is Unicode, with c3 a6 for the single character æ:
00000000 68 c3 a6 6d 61 74 6f 62 69 75 6d 2e 68 74 6d 6c |h..matobium.html|
00000010
I would recommend using the ASCII version, it makes life considerably easier.
Now to your actual question. mod_speling neither "learns" nor renames nor caches its data. The caching is done either by your browsers, or by proxies between your browsers and the server.
It's actually best practice to test these cases with command line tools like wget or curl, which should be already available or easily installable on any Linux.
Use wget -S or curl -i to actually see the response headers sent by your web server.
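If you'd rather script that check, here is a minimal Python sketch in the same spirit as curl -i (the URL is a placeholder):

import urllib.request, urllib.error

url = "http://example.com/haematobium.html"  # placeholder URL
try:
    resp = urllib.request.urlopen(url)
    print(resp.status, resp.reason)
    print(resp.headers)
except urllib.error.HTTPError as err:
    # a 404 from mod_speling still carries useful headers
    print(err.code, err.reason)
    print(err.headers)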

How to read text files transferred as binary

My code copies files from FTP (using text transfer mode) to the local disk and then tries to process them.
All files contain only text, and values are separated by newlines. Sometimes files are moved to this FTP server using binary transfer mode, and it looks like this messes up the line endings.
Using a hex editor, I compared the line endings depending on the transfer mode used to send the files:
using text mode: line endings are 0D 0A
using binary mode: line endings are 0D 0D 0A
Is it possible to modify my code so it could read files in both cases?
Code from the job that illustrates my problem and shows how I'm reading the file
(here I use the same file, which contains 14 rows of data):
int i;
container con;
container files = ["c:\\temp\\axa_keio\\ascii.txt", "c:\\temp\\axa_keio\\binary.txt"];
boolean purchLineFirstRow;
IO inFile;
;
for (i = 1; i <= conlen(files); i++)
{
    inFile = new AsciiIO(conpeek(files, i), "R");
    inFile.inFieldDelimiter('\n');
    con = inFile.read();
    info(int2str(conlen(con)));
}
Files come from a Unix system to a Windows system.
Not sure, but maybe the question could be: "Which inFieldDelimiter value should I use to read both Unix and Windows line endings?"
Use inRecordDelimiter:
inFile.inRecordDelimiter('\n');
instead of:
inFile.inFieldDelimiter('\n');
There may still be a dangling CR on the last field, which you may wish to remove:
strRem(conpeek(con, conlen(con)), '\r')
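For comparison, a minimal sketch outside X++ (Python, purely illustrative) of tolerating both endings: split on LF and strip any trailing CRs, which handles 0D 0A and 0D 0D 0A alike.

def read_lines(path):
    # Split on LF and strip any number of trailing CRs from each line,
    # so both CR LF (0D 0A) and CR CR LF (0D 0D 0A) endings are handled.
    with open(path, "rb") as f:
        raw = f.read()
    return [line.rstrip(b"\r").decode("ascii") for line in raw.split(b"\n")]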
See also: http://en.wikipedia.org/wiki/Line_endings