Why does extracting an archive in Flutter show files not in the archive that are prefixed with ._? - gzip

I have a tar + gzipped file I download and decompress/extract in a Flutter app. The extraction code looks like this:
final gzDecoder = GZipDecoder();
final tar = await gzDecoder.decodeBytes(file.readAsBytesSync());
final tarDecoder = TarDecoder();
final archive = tarDecoder.decodeBytes(tar);
for (final file in archive) {
print(file);
...
When I print out all the files in the archive like above, I see things like:
./question_7815.mp3
./._question_7814.mp3
where the original archive only has ./question_7815.mp3 (not the file prefixed with ._).
Furthermore, when printing the file size (print(file.size)) I see that the files prefixed with ._ are not the same size, so they do in fact appear to be different files, and they are much smaller.
Anyone know why this happens and potentially how to prevent it?

That's the AppleDouble format, so the tar file almost certainly came from a Mac. Each ._ file holds the extended attribute information for the file it is named after, which is why it is a separate, much smaller entry. You don't necessarily need to prevent it: you can ignore those files or exclude them during extraction. It is also possible to leave them out when tarring on the Mac side with the --no-mac-metadata option to tar.
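The "exclude them during extraction" idea can be sketched as a name filter. The example below uses Python's standard tarfile module rather than the Dart archive package from the question; the helper names (is_appledouble, real_members, build_demo_archive) and the demo payloads are made up for illustration. The same basename test could be applied inside the Dart loop.

```python
import io
import posixpath
import tarfile


def is_appledouble(name: str) -> bool:
    """True for AppleDouble sidecar entries such as './._question_7814.mp3'."""
    return posixpath.basename(name).startswith("._")


def real_members(tar: tarfile.TarFile) -> list:
    """Return the archive members, minus the AppleDouble '._' entries."""
    return [m for m in tar.getmembers() if not is_appledouble(m.name)]


def build_demo_archive() -> bytes:
    """Build a small in-memory .tar.gz that mimics an archive made on macOS."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in [("./question_7815.mp3", b"audio bytes"),
                           ("./._question_7815.mp3", b"xattr")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    archive = io.BytesIO(build_demo_archive())
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in real_members(tar):
            print(member.name, member.size)
```

Filtering on the basename rather than the full path matters because the prefix sits after any directory component, as in ./._question_7814.mp3.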

Related

How to extract .sql file that seems to be a .zip

I have received a file from a customer. The file is said to be
SQL code (application/sql)
However, this turned out to be wrong: nothing could open it. It turns out it was secretly a .zip file. By renaming it to '.zip' and manually extracting it, I was able to get the files contained in it. I would like to do a similar process in Python.
So far I've renamed the file:
file_name_zip = file_name.replace('.sql', '.zip')
os.rename(file_name, file_name_zip)
And I've tried extracting it:
zip_ref = zipfile.ZipFile(file_name_zip, 'r')
zip_ref.extractall(extracted_file)
However, this failed because
zipfile.BadZipFile: File is not a zip file
I've googled, and apparently this can sometimes be fixed using:
file_name_zip_2 = file_name_zip.replace('.zip', '2.zip')
os.system(f'zip -FF {file_name_zip} --out {file_name_zip_2}')
This required me to put in a bunch of settings, which I wasn't able to figure out. There must be a better way to go about this.
Does anybody know how to parse such an .sql file?
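One cautious first step, sketched here under the assumption that neither the extension nor the reported MIME type can be trusted, is to look at the file's leading bytes. Python's zipfile module does not care about the extension, so the rename is not needed for this check. The helper names sniff_format and describe are made up for illustration, and the signature table only covers a few common formats.

```python
import zipfile

# Leading "magic" bytes of a few common container/compression formats.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"PK\x05\x06", "zip (empty archive)"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07", "rar"),
    (b"\xfd7zXZ\x00", "xz"),
]


def sniff_format(header: bytes) -> str:
    """Guess the format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def describe(path: str) -> tuple:
    """Return (guessed format, whether zipfile recognizes the file as a ZIP)."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    return sniff_format(header), zipfile.is_zipfile(path)
```

If describe reports something other than zip, that would explain the BadZipFile error, and the matching module (gzip, bz2, lzma, or an external tool) would be the next thing to try.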

How does numpy handle mmap's over npz files?

I have a case where I would like to open a compressed numpy file using mmap mode, but can't seem to find any documentation about how it will work under the covers. For example, will it decompress the archive in memory and then mmap it? Will it decompress on the fly?
The documentation is absent for that configuration.
The short answer, based on looking at the code, is that archiving and compression, whether using np.savez or gzip, is not compatible with accessing files in mmap_mode. It's not just a matter of how it is done, but whether it can be done at all.
Relevant bits in the np.load function
elif isinstance(file, gzip.GzipFile):
    fid = seek_gzip_factory(file)
...
if magic.startswith(_ZIP_PREFIX):
    # zip-file (assume .npz)
    # Transfer file ownership to NpzFile
    tmp = own_fid
    own_fid = False
    return NpzFile(fid, own_fid=tmp)
...
if mmap_mode:
    return format.open_memmap(file, mode=mmap_mode)
Look at np.lib.npyio.NpzFile. An npz file is a ZIP archive of .npy files. np.load returns a dictionary-like object and only loads the individual variables (arrays) when you access them (e.g. obj[key]). There's no provision in its code for opening those individual files in mmap_mode.
It's pretty obvious that a file created with np.savez cannot be accessed as mmap. The ZIP archiving and compression is not the same as the gzip compression addressed earlier in the np.load.
But what of a single array saved with np.save and then gzipped? Note that format.open_memmap is called with file, not fid (which might be a gzip file).
More details on open_memmap in np.lib.npyio.format. Its first test is that file must be a string, not an existing file fid. It ends up delegating the work to np.memmap. I don't see any provision in that function for gzip.
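The underlying obstacle can be shown without NumPy. The sketch below is an illustration, not NumPy's code: it writes the same payload into a ZIP once stored and once deflated, memory-maps each file, and looks for the payload verbatim. The member name arr_0.npy and the payload are stand-ins for a saved array, and the helper names are hypothetical.

```python
import mmap
import os
import tempfile
import zipfile


def payload_found_on_disk(zip_path: str, payload: bytes) -> bool:
    """Memory-map the ZIP file and look for the payload's bytes verbatim."""
    with open(zip_path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(payload) != -1


def write_zip(zip_path: str, payload: bytes, compression: int) -> None:
    """Write the payload as a single member, either stored or deflated."""
    with zipfile.ZipFile(zip_path, "w", compression=compression) as zf:
        zf.writestr("arr_0.npy", payload)


def demo() -> dict:
    payload = bytes(range(256)) * 64  # stand-in for an array's raw bytes
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, method in [("stored", zipfile.ZIP_STORED),
                              ("deflated", zipfile.ZIP_DEFLATED)]:
            path = os.path.join(tmp, label + ".zip")
            write_zip(path, payload, method)
            results[label] = payload_found_on_disk(path, payload)
    return results


if __name__ == "__main__":
    print(demo())
```

A memory map can only expose bytes that exist in the file, so a deflated member has nothing to map. A stored member's bytes are present at some offset, but as noted above, NpzFile has no provision for mapping them.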

finding a corrupted part from the parts of a split archive

I have 7 files named xyz.rar.001 through xyz.rar.007; they are clearly parts of a single file, and I have all 7 parts. I joined them with a file joiner into a single file, xyz.rar, and tried to extract it with WinRAR, which says the archive is corrupted. Evidently one or two parts are corrupted. Is there any way to find out which ones? I don't want to download all of them again.
Note: WinRAR can detect a corrupt part if the parts were split using WinRAR itself (with names like part1.rar, part2.rar, etc.), but not if they are named rar.001.
Parts .001 - .006 should have the same size. Check if there is a file with a different byte size.
Are there multiple files in the RAR or just the one? With multiple you could run a Test and see which is the first file to fail.
I think it's strange that a second tool (e.g. HJSplit) would have been used to split the RAR archive up. That makes me think the .00x files could themselves be RAR volumes. Try opening xyz.rar.001 with WinRAR and test/extract it. It is fairly common for RAR volumes to have the extension .001 instead of .rar.
Naming your archives in WinRAR like this can be accomplished by putting "xyz.rar.001" as Archive name on the General tab and checking "Old style volume names" on the Advanced tab.
If I then join the files with HJSplit, I get one .rar file (that is corrupt). When I Test it, it says "Next volume is required". In the diagnostic messages I can see "The required volume is absent" and "CRC failed in X. The file is corrupt"
If there is one file stored inside the RAR and the RAR is indeed just chopped up into 7 pieces, there is no way of telling without additional files such as .sfv or .par2. (unless the RAR does not use compression: you can parse the underlying file for errors and calculate the part where it goes wrong)
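The size check, and the comparison against an .sfv file if one is available, can be scripted. This is a minimal sketch with hypothetical helper names; it assumes the parts sort correctly by name (.001 to .007) and that an .sfv file, if present, lists one CRC32 per part as 8 hex digits.

```python
import os
import zlib
from collections import Counter


def crc32_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """CRC32 of a file as 8 uppercase hex digits, the form used in .sfv files."""
    crc = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"


def odd_sized_parts(paths: list) -> list:
    """Return the parts, excluding the last, whose size differs from the rest."""
    body = sorted(paths)[:-1]  # the final part is allowed to be shorter
    if not body:
        return []
    sizes = {p: os.path.getsize(p) for p in body}
    common_size, _ = Counter(sizes.values()).most_common(1)[0]
    return [p for p, size in sizes.items() if size != common_size]
```

A part with the right size can still be corrupt, so the size check only catches truncated downloads; without reference checksums the CRC32 values have nothing to be compared against.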

RAR a folder without persisting the full path

1) I have a folder called CCBuilds containing a couple of files at this path: E:\Testing\Builds\CCBuilds.
2) I have written C# code (Process.Start) to Rar this folder and save it in E:\Testing\Builds\CCBuilds.rar using the following command
"C:\program files\winrar\rar.exe a E:\Testing\Builds\CCBuilds.rar E:\Testing\Builds\CCBuilds"
3) The problem is that, though the rar file gets created properly, when I unrar the file to CCBuilds2 folder (both through code using rar.exe x command or using Extract in context menu), the unrared folder contains the full path, ie. extracting E:\Testing\Builds\CCBuilds.rar ->
E:\Testing\Builds\CCBuilds2\Testing\Builds\CCBuilds\<<my files>>
Whereas I want it to be something like this: E:\Testing\Builds\CCBuilds2\CCBuilds\<<my files>>
How can I avoid persisting the full path when adding to the RAR file or extracting from it? Any help is appreciated.
Use the -ep1 switch.
More info:
-ep = Files are added to an archive without including the path information. Could result in multiple files existing in the archive with the same name.
-ep1 = Do not store the path entered at the command line in archive. Exclude base folder from names.
-ep2 = Expand paths to full. Store full file paths (except drive letter and leading backslash) when archiving.
(source: http://www.qa.downappz.com/questions/winrar-command-line-to-add-files-with-relative-path-only.html)
Just in case this helps: I am currently working on an MS Access Database project (customer relations management for a small company), and one of the tasks there is to zip docx-files to be sent to customers, with a certain password encryption used.
In the VBA procedure that triggers the zip-packaging of the docx-files, I call WinRAR as follows:
c:\Programme\WinRAR\winrar.exe a -afzip -ep -pThisIsThePassword "OutputFullName" "InputFullName"
-afzip says: "Create a zip file (as opposed to a rar file)"
-ep says: "Do not include the paths of the source file", i.e. put the file directly into the zip folder
A full list of such switches is available in the WinRAR Help, section "Command line".
x extracts it as E:\Testing\Builds\CCBuilds2\Testing\Builds\CCBuilds\, because you're using full path when declaring the source. Either use -ep1 or set the default working dir to E:\Testing\Builds.
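The question launches rar.exe from C# with Process.Start; the same call can be sketched in Python with subprocess. The function names are hypothetical, the path to rar.exe is taken from the question, and only the a command and the -ep1 switch mentioned in this thread are used.

```python
import subprocess


def build_rar_add_command(rar_exe: str, archive: str, source: str,
                          exclude_base_path: bool = True) -> list:
    """Build the argument list for 'rar a [-ep1] <archive> <source>'."""
    cmd = [rar_exe, "a"]
    if exclude_base_path:
        cmd.append("-ep1")  # do not store the path entered on the command line
    cmd += [archive, source]
    return cmd


def create_archive(rar_exe: str, archive: str, source: str):
    """Run rar and raise CalledProcessError if it exits with an error code."""
    return subprocess.run(build_rar_add_command(rar_exe, archive, source),
                          check=True)


if __name__ == "__main__":
    print(build_rar_add_command(r"C:\program files\winrar\rar.exe",
                                r"E:\Testing\Builds\CCBuilds.rar",
                                r"E:\Testing\Builds\CCBuilds"))
```

Passing the arguments as a list avoids quoting problems with the space in "program files". For the working-directory alternative, subprocess.run also accepts a cwd argument, with the source then given as a relative path.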
Use of -ep1 is needed but it's a bit tricky.
If you use:
Winrar.exe a output.rar inputpath
Winrar.exe a E:\Testing\Builds\CCBuilds.rar E:\Testing\Builds\CCBuilds
it will include the input path declared:
E:\Testing\Builds\CCBuilds -> E:\Testing\Builds\CCBuilds.rar:
Testing\Builds\CCBuilds\file1
Testing\Builds\CCBuilds\file2
Testing\Builds\CCBuilds\folder1\file3
...
which will end up unpacked as you've mentioned:
E:\Testing\Builds\CCBuilds2\Testing\Builds\CCBuilds\
There are two ways of using -ep1.
If you want the simple path:
E:\Testing\Builds\CCBuilds\
to be extracted as:
E:\Testing\Builds\CCBuilds2\CCBuilds\file1
E:\Testing\Builds\CCBuilds2\CCBuilds\file2
E:\Testing\Builds\CCBuilds2\CCBuilds\folder1\file3
...
use
Winrar.exe a -ep1 E:\Testing\Builds\CCBuilds.rar E:\Testing\Builds\CCBuilds
the files inside the archive will look like:
CCBuilds\file1
CCBuilds\file2
CCBuilds\folder1\file3
...
Alternatively, you can use -ep1 to add just the files and folder structure, without the base folder, by recursing with -r and giving the contents of the folder (CCBuilds\*) as the source:
Winrar.exe a -ep1 -r E:\Testing\Builds\CCBuilds.rar E:\Testing\Builds\CCBuilds\*
The files:
E:\Testing\Builds\CCBuilds\file1
E:\Testing\Builds\CCBuilds\file2
E:\Testing\Builds\CCBuilds\folder1\file3
...
inside the archive will look like:
file1
file2
folder1\file3
...
when extracted will look like:
E:\Testing\Builds\CCBuilds2\file1
E:\Testing\Builds\CCBuilds2\file2
E:\Testing\Builds\CCBuilds2\folder1\file3
...
Anyway, these are two ways -ep1 can be used to exclude base path with or without the folder containing the files (the base folder / or base path).
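The naming rules walked through above can be summarized as a small model. This is only an illustration of the behaviour described in this thread, not RAR's own logic; stored_name is a hypothetical helper, and it assumes -ep1 strips everything up to the last component of the source argument.

```python
from pathlib import PureWindowsPath


def stored_name(file_path: str, source_arg: str, switch: str = "") -> str:
    """Model the name a file gets inside the archive for a given switch."""
    entry = PureWindowsPath(file_path)
    if switch == "-ep":
        # No path information at all: only the file name is kept.
        return entry.name
    if switch == "-ep1":
        # The path typed on the command line is dropped. For a folder source
        # that is its parent; for 'folder\*' it is the folder itself.
        base = PureWindowsPath(source_arg).parent
        return str(entry.relative_to(base))
    # No switch: the path is kept, minus drive letter and leading backslash.
    return str(PureWindowsPath(*entry.parts[1:]))


if __name__ == "__main__":
    f = r"E:\Testing\Builds\CCBuilds\folder1\file3"
    print(stored_name(f, r"E:\Testing\Builds\CCBuilds"))
    print(stored_name(f, r"E:\Testing\Builds\CCBuilds", "-ep1"))
    print(stored_name(f, r"E:\Testing\Builds\CCBuilds\*", "-ep1"))
```

The model also shows why -ep can produce name clashes: files from different folders collapse to the same bare name.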

Why .RAR file contains different files with the same name

I got a .RAR file which contains different files with the same name.
For example,
index.txt 40 Text Document 04/01/2010 4:40PM
index.txt 22 Text Document 04/01/2010 4:42PM
index.txt 10 Text Document 04/01/2010 4:45PM
index.txt 13 Text Document 04/01/2010 4:50PM
Why?
As another answer says, the files could be in separate paths, but as I'll show further down, this isn't always the case.
If you use WinRAR to list the file contents and your options are set as the following, then it only appears you have files with the same name, but they are in different paths.
Options -> File list -> Flat folders view (ctrl+h)
Options -> File list -> Details
After the column CRC32, there is one called Path. If this is different, extraction shouldn't be a problem if:
Extract -> Extraction path and options -> Advanced -> Extract relative paths is set.
If it is Do not extract paths, WinRAR will need to ask you to rename them because of file system limitations.
I assume command line unrar won't be a problem in this case because you need to specify additional parameters to change its default behavior.
It is possible for a RAR archive to have multiple files with the same name in the same directory. If you use Windows, use "C:\Program Files\WinRAR\Rar.exe" instead of rar on the command line in the following examples.
Create a new file and add it to a RAR archive. You can also check the changes by listing its contents.
rar a rarfile.rar testfile.txt
rar l rarfile.rar
rar a rarfile.rar testfile.txt
If you try to re-add this file, rar will replace the already added file with the same name.
Updating archive rarfile.rar
Updating testfile.txt OK
Done
Create another file, or rename the first one, and add it to the RAR file.
move testfile.txt second.txt (new file)
rar a rarfile.rar second.txt (add it)
rar lb rarfile.rar (list archive, bare info)
Rename the second file to the first one's name.
rar rn rarfile.rar second.txt testfile.txt
This is how you create a RAR file with multiple files of the same name in the same path. These steps will be similar in WinRAR. If you try to rename the file again, the file name of all files in that directory will change too.
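To check whether an archive really holds several entries with an identical path, the bare listing can be scanned for repeats. The sketch below assumes the output of rar lb has been captured as text with one entry name per line; duplicate_names is a hypothetical helper.

```python
from collections import Counter


def duplicate_names(bare_listing: str) -> dict:
    """Map each entry name that occurs more than once to its count."""
    names = [line.strip() for line in bare_listing.splitlines() if line.strip()]
    return {name: count for name, count in Counter(names).items() if count > 1}


if __name__ == "__main__":
    listing = "testfile.txt\ntestfile.txt\nother.txt\n"
    print(duplicate_names(listing))
```

Entries that differ only in their folder do not count as duplicates here, which separates the "different paths" case from a genuine repeat of the same full name.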
Why would someone want to do this?
The only explanation I can think of is that the person that created this archive wanted to imitate a version control/backup system. But if you want to extract only one specific version and it isn't the first one, WinRAR extracts the wrong file. It seems I've found a very obscure WinRAR bug :-)
Edit: seems a bad explanation after finding this in the RAR documentation:
-ver[n] File version control
Forces RAR to keep previous file versions when updating
files in the already existing archive. Old versions are
renamed to 'filename;n', where 'n' is the version number.
By default, when unpacking an archive without the switch
-ver, RAR extracts only the last added file version, the name
of which does not include a numeric suffix. But if you specify
a file name exactly, including a version, it will be also
unpacked. For example, 'rar x arcname' will unpack only
last versions, when 'rar x arcname file.txt;5' will unpack
'file.txt;5', if it is present in the archive.
If you specify -ver switch without a parameter when unpacking,
RAR will extract all versions of all files that match
the entered file mask. In this case a version number is
not removed from unpacked file names. You may also extract
a concrete file version specifying its number as -ver parameter.
It will tell RAR to unpack only this version and remove
a version number from file names. For example,
'rar x -ver5 arcname' will unpack only 5th file versions.
If you specify 'n' parameter when archiving, it will limit
the maximum number of file versions stored in the archive.
Old file versions exceeding this threshold will be removed.
They are most likely in different paths.
Try outputting the full path, or see what happens when you extract them.
You'll probably see something like:
index.txt
path1/index.txt
path2/index.txt
etc.