Determining if two rar files are part of the same set - rar

Let's say I have two files, (name).n.rar and (name).n+1.rar, which appear to be part of the same set (same size, etc). Is there any easy way to tell if they're actually part of the same set, without first downloading the full set? Currently the only way I can tell is by downloading an instance of every file and and then seeing if WinRAR gives me an error when I try to unwrap them.
(And on a related note, assuming there is such a method, can I do the same without having adjacent parts?)
Ideally there's an existing program that can do this, but I can code my own if necessary.
Further notes: These are two sets of archives of the same file. They appear identical to obvious checks: filenames are subsequent, contents are sane, sizes are identical, same number of parts. I then receive a full set of files. If they're not from the same set, I can't unrar them - though it seems that WinRAR will proceed to 100% before giving me the CRC error (file corrupt.)

New Answer
All tests were made using WinRAR 5.01 32-bit. Since the algorythm should remain the same, the following statements should be valid for any other previous version. Feel free to comment if you know that's not true.
I'll give a short briefing about the chat. I tried to pack a file larger than 1GB several times; Then I mixed up the files and tried to extract the archives: it worked. The problem was not the size of the file indeed.
I thought about three possible solutions to the problem:
Architecture was influent in the packaging process: so different people tried to pack the files, and mixing up them would result in an error;
Different people tried to pack the files, giving a slightly different size file (for example 250 MB and 250000 KB). This would have been noticed in the file properties, though;
Files were corrupted during the download: re-downloading them would confirm this hypothesis.
I was most curious about the first one: could architecture be influent in the packaging process?
I found out the answer is yes, it is. Here are the passages to repeat the experiment:
Pack your files in an archive, giving a precise part size, in computer A;
Pack the same exact files, giving the same exact part size, in computer B (TODO: Check if this experiment is still valid with similar architecture, e.g. Intel i7 with Intel i5) with a different architecture (e.g. Intel processor with AMD processor);
Transfer one (or more, if you wish, but of course not all of them!) parts from computer B to computer A. Remember to delete those files from computer A before the transfer;
Place all the files in the same directory, check if they all have the same name (e.g. "AAA part1", "AAA part2"...);
Extract them;
Enjoy your CRC Error!.
Tests were made using an Intel i7-3632QM and an AMD FX 6300.
I have some suspects about the fact that the compressed files are the same, but the CRC code is different.
Old Answer
There is a way indeed. During my Computer Science academic studies, we had a Computer Forensics class. I learned that every file has a static beginning (an header, we could say), that makes a program recognize its type and the way to decrypt it. To see it, you just have to open it with a text editor (Notepad++ is the best so far, I guess)
For example, jpeg images begin with ÿØÿá.
I tried to store a video in some splitted .rar files, and knowing if they are part of the same archive was simpler than I thought.
Every rar file begins with Rar!. On the second or third line, it should appear the name of the file stored in the archive: in my case, myVideo.mp4. If all your archives contain that filename, they're probably part of the same archive.
Things are getting worse if there are several files in the archive and you don't know their names. In fact, if there is more than one file, the RAR files structure is as follows:
File 1:
Rar!
NUL NUL NUL //Random things here
NUL NUL NUL NUL NUL myVideo.mp4 NUL NUL NUL NUL
//Random things here. If the dimensions of the file exceed the archive,
//the next file will begin with the same name.
//Let's assume that this is happening.
EOF
File 2:
Rar!
NUL NUL NUL //Random things here
NUL NUL myVideo.mp4 NUL NUL NUL
//This time the file is complete. Since there is still space in the archive,
//it will add another file
NUL NUL NUL NUL mySecondVideo.mp4 NUL NUL NUL NUL
EOF
Let's assume that at the end of the second archive, mySecondVideo hasn't been fully compressed yet.
File 3:
Rar!
NUL NUL NUL
NUL NUL NUL NUL mySecondVideo.mp4 NUL
NUL NUL NUL
NUL myTextFile.txt
NUL NUL NUL mySecondTextFile.txt NUL
EOF
If mySecondTextFile.txt isn't yet fully compressed, my fourth file will begin with its name.
I hope it's clear, I tried to keep it as simple as possible. In the case of more files, I would start from the last archive. I'd write down the first filename found on that file and I'd search it in the previous one. If I found that name, I'd repeat the sequence until the first archive.

I'm not familiar with RAR-format that much, but in case you decide to write your program in Java I can recommend using 7-Zip-JBinding.
http://sevenzipjbind.sourceforge.net/
http://sevenzipjbind.sourceforge.net/basic_snippets.html#open-multipart-rar-archives
You can download first n+1 parts of the archive and then call extract() method ignoring output data only caring for
IArchiveExtractCallback.setOperationResult(ExtractOperationResult)
calls (checking that CRC was ok) and monitoring files getting opened trough
IArchiveOpenVolumeCallback.getStream(java.lang.String)
If volume n+2 get requested, you can conclude that volume n+1 was the right one.
(I'm not 100% sure about this conclusion, but I would give it a try)

Related

Openvms: Extracting RMS Indexed file t to Windows as a sequential flat file

I haven't used openvms for 20+ years. It was my 1st OS. I've been asked if it possible to copy the data from RMS files from openvms server to windows as a text file - so that it's readable.
No-one has experience or knowledge of the record structures etc.
The files are xyz.DAT and are relative files. I'm hoping the dat files are fixed length.
My 1st attempt would be to try and use Datatrieve (DTR) but get an error that the image isn't loaded.
Thought it might be as easy using CONVERT/FDL = nnnn.FDL - by changing the Relative to Sequential. The file seems still to be unreadable.
Is there an easy way to stream an RMS index file to a flat ASCII file?
I use to use COBOL and C to access the data in the past but had lots of libraries to help....
I've notice some solution may use odbc to connect but not sure what I can or cannot install on the server.
I can FTP using Filezilla to the server....
Another plan writing C application to read a file and output out as string.....or DCL too.....doesn't have to be quick...
Any ideas
Has mentioned before
The simple solution MIGHT be to to just use: $ TYPE/OUT=test.TXT test.DAT.
This will handle Relatie and Indexed files alike.
It is much the same as $ CONVERT / FDL=NL: test.DAT test.TXT
Both will just read records from the source and transfer the bytes, byte for byte, to the records in a sequential file.
FTP in ASCII mode will transfer that nicely to windows.
You can also use an 'inline' FDL file to generate a 'unix' LF file like:
$ conv /fdl="record; format stream_lf" test.DAT test.TXT
Or CR-LF file using:
$ conv /fdl="record; format stream" test.DAT test.TXT
Both can be transferring in Binary or Ascii with FTP.
MOSTLY - because this really only works well for TEXT ONLY source .DAT file.
There should be no CR, LF, FF or NUL characters in the source or things will break.
As 'habo' points out, use DUMP /RECORD=COUNT=3 to see how 'readable' the source data is.
If you spot 'binary' data using DUMP then you will need to find a record defintion somewhere which maps byte to Integers or Floating points or Dates as needed.
These defintions can be COBOL LIB files, or BASIC MAPS and are often stores IN the CDD (Common Data Dictionary) or indeed in DATATRIEVE .DIC DICTIONARIES
To use such definition you likely need a program to just read following the 'map' and write/print as text. Normally that's not too hard - notably not when you can find an example program on the server to tweak.
If it is just one or two 'suspect' byte ranges, then you can create a DCL loop to read and write and use F$EXTRACT to select the chunks you like.
If you want further help, kindly describe in words what kind of data is expected and perhaps provide the output from DUMP for 3 or 5 rows.
Good luck!
Hein.

How to do an incremental read of binary files

TL;DR: can I do an incremental read of binary files with Red or Rebol?
I would like to use Red to process some large (13MB to 2GB) structured binary files (Kurzweil synthesizer files). I've used other languages (C, Go, Tcl, Ruby, Dart) to walk through these files, and now I'd like to do the same with Red or Rebol.
Is there a way to incrementally read binary files, byte by byte? All I see is read/binary which seems to slurp the entire file at once (or a part of a file).
I'll need to jump around a little bit, too (either peek at the next byte, or skip to the end of a section, or skip past variable length strings to the start of data).
(Yes, I could make some helpers that tracked the position and used read/part/seek.)
I would like to make a call to the low level OS read/seek if that is possible - something new to learn.
This is on macos, but a portable solution would be great.
Thanks!
PS: "open/read %abc" gives an error "*** Script Error: open does not allow file! for its port argument", even though the help message say the port argument is "port [port! file! url! block!]"
Rebol has ports for that, which are planned for 0.7.0 release in Red. So, current I/O is very basic and buffer-only, and open is a preliminary stub.
I would like to make a call to the low level OS read/seek if that is possible - something new to learn.
You can leverage Rebol or Red/System FFI as a learning excercise.
Here is how you would do it in Rebol:
>> file: open/direct/binary %file.dat
>> until [none? probe copy/part file 20]
>> close file
#{732F7072696E74657253657474696E6773312E62}
#{696E504B01022D00140006000800000021006149}
#{0910890100001103000010000000000000000000}
...
#{000000006A290000646F6350726F70732F617070}
#{2E786D6C504B0506000000000D000D0068030000}
#{292C00000000}
none
first file or pick file 1 will return the next byte value (integer!)
This even works with text files: open/lines/direct, in that case copy/part file 20 will return 20 lines, or you can use pick file 1 or first file to get the next line.
Soon this will be available on Red too.

finding a corrupted part from the parts of a split archive

I have 7 files with extensions like xyz.rar.001 - xyz.rar.007 clearly they are parts of a single file. I have all the 7 parts. I join them using a file joiner into a single file xyz.rar and try to unrar them with WINRAR , it says that archive is corrupted It is clear that 1 or 2 parts are corrupted. IS THERE ANY WAY TO FIND THEM ? Please help I don't want to re download all of them NOTE- winrar can detect a corrupt part if the parts were splitted using winrar (with extensions like part1.rar , part2.rar etc. ) but not if they are named as rar.001
Parts .001 - .006 should have the same size. Check if there is a file with a different byte size.
Are there multiple files in the RAR or just the one? With multiple you could run a Test and see which is the first file to fail.
I think it's strange that there is a second tool used to split the RAR archive up. (e.g. HJSplit) This lets me think that .002 could be a RAR archive too. Try opening xyz.rar.001 with WinRAR and test/exctract. It happens more that RAR archives have the extension .001 instead of .rar. An example.
Naming your archives in WinRAR like this can be accomplished by putting "xyz.rar.001" as Archive name on the General tab and checking "Old style volume names" on the Advanced tab.
If I then join the files with HJSplit, I get one .rar file (that is corrupt). When I Test it, it says "Next volume is required". In the diagnostic messages I can see "The required volume is absent" and "CRC failed in X. The file is corrupt"
If there is one file stored inside the RAR and the RAR is indeed just chopped up into 7 pieces, there is no way of telling without additional files such as .sfv or .par2. (unless the RAR does not use compression: you can parse the underlying file for errors and calculate the part where it goes wrong)

Inconsistent Behavior In A Batch File's For Statement

I've done very little with batch files but I'm trying to track down a strange bug I've been encountering on a legacy system.
I have a number of .exe files in particular folder. This script is supposed to duplicate them with a different file name.
Code From Batch File
for %%i in (*.exe) do copy \\networkpath\folder\%%i \\networkpath\folder\%%i.backup.exe
(Note: The source and destination folders are THE SAME)
Example Of Desired Behavior:
File1.exe --> Becomes --> File1.exe.backup.exe
File2.exe --> Becomes --> File2.exe.backup.exe
Now first, let me say that this is not the approach I would take. I know there are other (potentially more straight forward) ways to do this. I also know that you might wonder WHY on earth we care about creating a FileX.exe.backup.exe. But this script has been running for years and I'm told the problem only started recently. I'm trying to pinpoint the problem, not rewrite the code (even if it would be trivial).
Example Buggy Output:
File1.exe.backup.exe
File1.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
etc...
File2.exe.backup.exe
File2.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
Not knowing anything about batch files, I looked at this and figured that the condition of the for statement was being re-evaluated after each iteration - creating a (near) infinite loop of copying (I can see that, eventually, the copy will fail when the names get too long).
This would explain the behaviour I'm seeing. And when cleaned the directory in question so that it had only the original File1.exe file and ran the script it produced the bug code. The problem is that I CANNOT replicate the behaviour anywhere else!?!
When I create a folder locally with a few .exe files and run the script - I get the expected output. And yes, if I run it again, I get one instance of 'File1.exe.backup.exe.backup.exe' (and each time I run it again, it increases in length by one). But I cannot get it to enter the near-infinite loop case.
It's been driving me crazy.
The bug is occurring on a networked location - so I've tried to recreate it on one - but again, no success. Because it's a shared network location, I wondered if it could have something to do with other people accessing or modifying files in the folder and even introduced delays and wrote a tiny program to perform actions in the same folder - but without any success.
The documentation I can find on the 'for' statement doesn't really help, but all of the tests I've run seem to suggest that the in (*.exe) section is only evaluated once at the beginning of execution.
Does anyone have any suggestions for what might be going on here?
I agree with Andriy M's comment - it looks to be related to Windows 7 Batch Script 'For' Command Error/Bug
The following change should fix the problem:
for /f "eol=: delims=" %%i in ('dir /b *.exe') do copy \\networkpath\folder\%%i \\networkpath\folder\%%i.backup.exe
Any file that starts with a semicolon (highly unlikely, but it can happen) would be skipped with the default EOL of semicolon. To be safe you should set EOL to some character that could never start a file name (or any path). That is why I chose the colon - it cannot appear in a folder or file name, and can only appear after a drive letter. So it should always be safe.
Copy supports wildcard characters also in target path. You can use
copy \\networkpath\folder\*.exe \\networkpath\folder\*.backup.exe

How to copy file to a new location having path length more than 260 characters in vista?

in my previous question as follows:
"How to create a copy of a file having length more than 260 characters."
I have asked almost same qustion and I got a solution to rename the directory name and then copy to a new location. But as per the client's requirement we can't rename the directory in any case.
So now my problem is when we try to copy a file having path length (including file name) more than 260 characters, (say 267 characters), it allows us to copy manually but throws an exception programmetically in vista OS.
Please let me know if any one has a solution.
Prepend \\?\ to your path to enable path lengths up to 32767 characters long. E.g:
copy \\?\C:\big\dir\hierarchy\myfile.txt \\?\C:\tohere.txt
This page has more details.
I've only tested this with DIR in WinXP, but that seems to work.
If the problem is just with the length of the file (not the preceding path), then you can use the short name version of the file (probabaly Outloo~1.xls ).
However, the only real viable solution is to use network redirection or SUBST command to shorten the path. I can imagine that you'd have to be keeping track of the path length in your program, and spawn a SUBST drive letter when the length is exceeded... dropping the drive letter when it's no longer needed. Ugly programming, but no hope.
Or
I know some unicode version of windows api functions (copyfileex..? http://msdn.microsoft.com/en-us/library/aa363852(VS.85).aspx) can handle to 32,767 chars.
Try to name it with a \\?\ prefix, for example, \\?\D:\. I'm not sure if CLR uses this kind of naming style, but you can give it a try.
You could check if compression tools like 7zip are limited by the length of the path in Vista.
If not, that means a "copy by program" would by:
compressed the tree of directories/file you want to copy
copy the .zip/.rar to the destination
uncompress