Calculating MD5 hashes of files in compressed zip archive - vb.net

I would like to know if there is a way to calculate MD5 hashes of files contained in a zip archive.
For example, I have a zip file that contains three files: Prizes.dat, Promotions.dat and OutOfDate.dat, and I would like to calculate the MD5 of each of the three files to compare it with a given string. Since I need to do this on a very large number of zip archives, I'm wondering if there's a way to do this directly, without decompressing the files first.
Thanks in advance!
superPanda

Stumbled upon this need and discovered a way to check the hash of a file contained in a tarball without writing the uncompressed data to disk (unzipping).
BSD example below, hence the md5:
tar xOfz archive.tgz foo.txt | md5
tar xOfj archive.bz2 foo.txt | md5
Or, for Linux, use:
tar xOfz archive.tgz foo.txt | md5sum
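The same streaming idea carries over to zip archives. The question asks about VB.NET, so treat this as a minimal Python sketch of the approach, assuming only the standard zipfile and hashlib modules (archive.zip is a placeholder name). Note the member data still gets decompressed to compute the hash; it just never touches the disk:

import hashlib
import zipfile

def md5_of_member(archive_path, member_name, chunk_size=65536):
    # Stream one member of the archive through MD5 without writing it to disk.
    md5 = hashlib.md5()
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open(member_name) as member:
            for chunk in iter(lambda: member.read(chunk_size), b""):
                md5.update(chunk)
    return md5.hexdigest()

print(md5_of_member("archive.zip", "Prizes.dat"))

In VB.NET the equivalent building blocks should be System.IO.Compression.ZipFile for the entry stream and System.Security.Cryptography.MD5 for the hash.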

I think the simplest solution is to calculate the MD5 hash of a zipped file and store it in the zip archive alongside the file. If you are generating these files yourself, you can just hash the file before you zip it. If you are receiving the ZIP files from somewhere else, then write a script that will automate going through all the files and adding hashes. Then whenever you need to check the hash in the program, you can just pull the precomputed hash from the ZIP file.
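A hedged sketch of that precompute-and-embed idea, again in Python with the standard zipfile and hashlib modules (the <name>.md5 sidecar convention is my assumption, not something the answer specifies):

import hashlib
import zipfile

def add_md5_sidecars(archive_path):
    # For each member, append a "<name>.md5" entry holding its MD5 hex digest.
    with zipfile.ZipFile(archive_path, "a") as archive:
        names = set(archive.namelist())
        for name in sorted(names):
            if name.endswith(".md5") or name + ".md5" in names:
                continue  # it is a sidecar, or already has one
            digest = hashlib.md5(archive.read(name)).hexdigest()
            archive.writestr(name + ".md5", digest)

Checking a file later is then just a comparison of the stored digest against the given string, with no hashing of the data at all.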

Related

OpenVMS: Extracting RMS indexed file to Windows as a sequential flat file

I haven't used OpenVMS for 20+ years. It was my first OS. I've been asked if it is possible to copy the data from RMS files on an OpenVMS server to Windows as text files, so that they're readable.
No one here has experience or knowledge of the record structures etc.
The files are xyz.DAT and are relative files. I'm hoping the .DAT files are fixed-length.
My first attempt was to use Datatrieve (DTR), but I get an error that the image isn't loaded.
I thought it might be as easy as using CONVERT /FDL=nnnn.FDL and changing the organization from Relative to Sequential, but the file still seems to be unreadable.
Is there an easy way to stream an RMS indexed file to a flat ASCII file?
I used to use COBOL and C to access the data in the past, but had lots of libraries to help....
I've noticed some solutions may use ODBC to connect, but I'm not sure what I can or cannot install on the server.
I can FTP to the server using FileZilla....
Another plan is writing a C application to read a file and output it as strings..... or DCL too..... it doesn't have to be quick...
Any ideas?
As mentioned before, the simple solution MIGHT be to just use: $ TYPE/OUT=test.TXT test.DAT
This will handle Relative and Indexed files alike.
It is much the same as: $ CONVERT /FDL=NL: test.DAT test.TXT
Both will just read records from the source and transfer the bytes, byte for byte, to the records in a sequential file.
FTP in ASCII mode will transfer that nicely to windows.
You can also use an 'inline' FDL file to generate a 'unix' LF file like:
$ conv /fdl="record; format stream_lf" test.DAT test.TXT
Or CR-LF file using:
$ conv /fdl="record; format stream" test.DAT test.TXT
Both can be transferred in Binary or ASCII mode with FTP.
MOSTLY - because this really only works well for TEXT-ONLY source .DAT files.
There should be no CR, LF, FF or NUL characters in the source or things will break.
As 'habo' points out, use DUMP /RECORD=COUNT=3 to see how 'readable' the source data is.
If you spot 'binary' data using DUMP then you will need to find a record definition somewhere which maps bytes to Integers or Floating points or Dates as needed.
These definitions can be COBOL LIB files or BASIC MAPs, and are often stored IN the CDD (Common Data Dictionary) or indeed in DATATRIEVE .DIC DICTIONARIES.
To use such a definition you likely need a program that just reads, follows the 'map', and writes/prints as text. Normally that's not too hard - notably not when you can find an example program on the server to tweak.
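To make that concrete, here is a minimal Python sketch under an entirely HYPOTHETICAL fixed-length layout (a 20-byte text field, a 4-byte integer, an 8-byte float); the real layout must come from the COBOL LIB / BASIC MAP / CDD definition, and note that native VMS floating types (F_float and friends) are not IEEE, so they would need explicit conversion:

import struct

# Hypothetical 32-byte record: 20-byte text, little-endian int32, float64.
RECORD = struct.Struct("<20sid")

def dump_records(path):
    with open(path, "rb") as f:
        while True:
            raw = f.read(RECORD.size)
            if len(raw) < RECORD.size:
                break  # end of file (or a trailing partial record)
            text, number, value = RECORD.unpack(raw)
            print(text.decode("ascii", "replace").rstrip(), number, value)

dump_records("test.DAT")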
If it is just one or two 'suspect' byte ranges, then you can create a DCL loop to read and write and use F$EXTRACT to select the chunks you like.
If you want further help, kindly describe in words what kind of data is expected and perhaps provide the output from DUMP for 3 or 5 rows.
Good luck!
Hein.

Extract from tar.gz by file name

I've got a large number of files inside a .tar.gz archive.
Is it (programmatically) possible to extract a file by its filename, without the overhead of decompressing other files?
I'll split the reply into two parts
Is it (programmatically) possible to extract a file by its filename
Yes, it is possible to extract a file by its filename:
tar xzf tarfile.tar.gz filename
without the overhead of decompressing other files?
In order to extract a file from a compressed tar file, the tar program has to find the file you want. If that is the first file in the tarfile, then it only has to uncompress that. If the file isn't the first in the tarfile, the tar program needs to scan through the tarfile until it finds the file you want. To do that it MUST uncompress the preceding files in the tarfile. That doesn't mean it has to extract them to disk or buffer those files in memory. It will stream the decompression so that the memory overhead isn't significant.
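Since the question says "programmatically", here is a minimal Python sketch using the standard tarfile module (the archive and member names are placeholders); the same caveat applies - preceding entries are still decompressed, just as a stream:

import tarfile

with tarfile.open("tarfile.tar.gz", "r:gz") as archive:
    member = archive.extractfile("filename")  # file-like object, no disk write
    if member is not None:  # None for directories and other non-regular entries
        data = member.read()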

I want selected textfiles to act as binary files when merging

For a particular reason I want to prevent a text file from ever being auto-merged. Is it possible to treat a text file as a binary file in this case? It is important that no one does this merging by mistake.
Using .gitattributes, you should be able to mark a file as binary:
path/to/my/file binary
But that would also prevent you from even seeing a diff, since binary is a pre-defined macro (shorthand for -diff -merge -text).
So use instead:
path/to/my/file -merge -text
That will only work for branches created after adding that .gitattributes file.

Delete Files Which MD5's listed In Text File - VB.net

I have a list of MD5 hashes of files stored in a text file, and I want to delete every file found on the system (or under a path) whose hash is in the list. But I'm having trouble coding it: what I tried only scans for one file from the listed MD5s, so it's not what I need. Is there any way to find and delete the files under a path whose MD5 hashes are listed? Thanks.
pidgin pseudocode:
put md5s in array
cycle through a filesystem
for each file, read it into a variable, compute the md5 hash of the variable
if md5hash is in array, delete file
maybe you should skip swap files and system folders.
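A minimal Python sketch of that pseudocode (the question asks about VB.NET; the list file name and scan root here are placeholders, and files are hashed in chunks to spare memory):

import hashlib
import os

with open("md5list.txt") as f:
    wanted = {line.strip().lower() for line in f if line.strip()}

for root, dirs, files in os.walk(r"C:\scan"):
    for name in files:
        path = os.path.join(root, name)
        md5 = hashlib.md5()
        try:
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(65536), b""):
                    md5.update(chunk)
        except OSError:
            continue  # skip locked swap/system files, as suggested above
        if md5.hexdigest() in wanted:
            os.remove(path)

In VB.NET the same shape should map onto Directory.EnumerateFiles and System.Security.Cryptography.MD5.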

Using Python to analyze PDF files

I wrote code that detects malicious PDF files.
What I need to do is the following:
for every PDF I scan, I want to save its hash value in a hash database, and the output will be saved in an output container;
so if I have another PDF file to scan, I will check whether its hash value exists in the hash database, and if it does, I will print the output from the output container;
but if the hash value doesn't exist, it is added to the hash database and the output is added to the output container.
How could I do that, and what is the way to link the hash value with the output in the output container?
What kind of malicious documents are you worried about? Corrupted files, or PDFs with a virus in them?
To work with PDF in Python you can use pyPdf.
Then you can open the file like:
from pyPdf import PdfFileReader
my_doc = PdfFileReader(open("myfile.pdf", "rb"))
This way you will check if it is a valid file.
As for linking the hash value to the output, couldn't that be done in the database itself?
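One hedged way to make that link concrete: key a plain mapping by the file's MD5 and persist it, here as JSON (the file names and the analyze callable are assumptions standing in for your existing detector):

import hashlib
import json
import os

DB_PATH = "scan_cache.json"

def file_md5(path):
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
    return md5.hexdigest()

def scan_with_cache(path, analyze):
    # analyze(path) is your existing detector; results are cached by hash.
    cache = {}
    if os.path.exists(DB_PATH):
        with open(DB_PATH) as f:
            cache = json.load(f)
    digest = file_md5(path)
    if digest not in cache:
        cache[digest] = analyze(path)  # only scan files not seen before
        with open(DB_PATH, "w") as f:
            json.dump(cache, f)
    return cache[digest]

The dict is the "hash database" and its values are the "output container"; swapping JSON for sqlite3 would give the same linkage with real lookups.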