pyunpack Archive.extractall hangs on password protected .rar file - rar

I'm writing a python script to process some old archives that include many .rar files. I can successfully extract data from them using Archive.extractall from pyunpack, but haven't been able to find much documentation. The documentation link at https://pypi.org/project/pyunpack/0.1.2/ is broken. Everything was going well until the program hung on a specific .rar file. I looked at the file with 7zip, and it said it was password protected. I have no idea what the password is, so I would like the script to simply skip it. I already have extractall in a try block, but I guess there is no exception. How can I test if the file is password protected and not try to extract it if so? Here's the relevant portion of my code.
import os
import rarfile
from pyunpack import Archive
for subdir,dirs,files in os.walk(directoryToExtract):
for file in files:
if rarfile.is_rarfile(os.path.join(subdir, file)):
print("Found rarfile " + os.path.join(subdir, file))
try:
Archive(os.path.join(subdir, file)).extractall(os.path.join(subdir, file[0:len(file)-4]),auto_create_dir=True)
except Exception as e:
print("ERROR: Couldn't unrar " + os.path.join(subdir, file))
print(str(e))

Related

How to extract .sql file that seems to be a .zip

I have received a file from a customer. The file is said to be
SQL code (application/sql)
However, this has turned out to be wrong: nothing could open it. It turns out it was secretely a .zip file. By renaming it to '.zip' and manually extracting it I was able to get the files contained in it. I would like to do a similar process in python.
So far I've renamed the file:
file_name_zip = file_name.replace('.sql', '.zip')
os.rename(file_name, file_name_zip)
And I've tried extracting it:
zip_ref = zipfile.ZipFile(file_name_zip, 'r')
zip_ref.extractall(extracted_file)
However, this failed because
zipfile.BadZipFile: File is not a zip file
I've googled, and apparently this can sometimes be fixed using:
zip_file_name_2 = zip_file_name.replace('.zip', '2.zip')
os.system(f'zip -FF {zip_file_name} --out {zip_file_name_2}')
This required me to put in a bunch of settings, which I wasn't able to figure out. There must be a better way to go about this.
Does anybody know how to parse such an .sql file?

How to open and read a .gz file in Nim (preferably line by line)

I just sat down to write my first Nim script to parse a .vcf (Variant Call Format) file. This file format stores genetic mutations from sequencing data.
For scripting languages, I 'grew up' on Perl and later migrated to Python, but I would love to use a language with the speed that Nim offers. I realize Nim is still young, but I couldn't even find a clear example for how to open and read a .gz (gzip) file (preferably line by line).
Can anyone provide a simple example to open and read a gzip file using Nim, line by line?
In Python, I'm accustomed to the following (uber-simple) code:
import gzip
my_file = gzip.open('my_file.vcf.gz', 'w')
for line in my_file:
# do something
my_file.close()
I have seen related questions, but they're not clear. The posts are also relatively old and I hope/suspect something better has come about. Here's what I've found:
Read gzip-compressed file line by line
File, FileStream, and GZFileStream
Reading files from tar.gz archive in Nim
Really appreciate it.
P.S. I also think it would be useful if someone created a Nim tag in StackOverflow. I do not have the reputation to create tags.
Just in case you need to handle VCF rather than .gz, there's a nice wrapper for htslib written by Brent Pedersen:
https://github.com/brentp/hts-nim
You need to install the htslib in your system, and then require the library in your .nimble file with requires "hts", or install the library with nimble install hts. If you are going to do NGS analysis in Nim you'll need it.
The code you need:
import hts
var v:VCF
doAssert open(v, "myfile.vcf.gz")
# Here you have the VCF file loaded in v, and can access the headers through
# v.header property
for record in v:
# Here you get a Record object per line, e.g. extract the Ref and Alts:
echo v.REF, " ", v.ALT
v.close()
Be sure to follow the docs, because some things differ from python, specially when getting the INFO and FORMAT fields.
Checkout the whole Brent repo. It has plenty of wrappers, code samples and utilities to handle NGS problems (e.g. an ultrafast coverage tool utility called Mosdepth).
Per suggestion from Maurice Meyer, I looked at the tests for the Nim zip package. It turned out to be quite simple. This is my first Nim script, so my apologies if I didn't follow convention, etc.
import zip/gzipfiles # Import zip package
block:
let vcf = newGzFileStream("my_file.vcf.gz") # Open gzip file
defer: outFile.close() # Close file (like a 'final' statement in 'try' block)
var line: string # Declare line variable
# Loop over each line in the file
while not vcf.atEnd():
line = vcf.readLine()
# Cure disease with my VCF file
To install the zip package, I simply ran because it is already in the Nim package library:
> nimble refresh
> nimble install zip
I tried to use Nim some time ago to parse a fastq or fastq.gz file.
The code should be available here:
https://gitlab.pasteur.fr/bli/qaf_demux/blob/master/Nim/src/qaf_demux.nim
I don't remember exactly how this works, but apparently, I did an import zip/gzipfiles and used newGZFileStream on the input file name to obtain a Stream from which lines can be read using .readLine() in this piece of code:
proc fastqParser(stream: Stream): iterator(): Fastq =
result = iterator(): Fastq =
var
nameLine: string
nucLine: string
quaLine: string
while not stream.atEnd():
nameLine = stream.readLine()
nucLine = stream.readLine()
discard stream.readLine()
quaLine = stream.readLine()
yield [nameLine, nucLine, quaLine]
It is used in something that amounts to this piece of code:
let inputFqs = fastqParser(newGZFileStream($inFastqFilename))
Hopefully you can adapt this to your case.
My .nimble file has a requires "zip#head". I suppose this triggers the installation of zip/gzipfiles.

How to read a Bunch of files in a directory in lua

I have a path (as a string) to a directory. In that directory, are a bunch of text files. I want to go to that directory open it, and go to each text file and read the data.
I've tried
f = io.open(path)
f:read("*a")
I get the error "nil Is a directory"
I've tried:
f = io.popen(path)
I get the error: "Permission denied"
Is it just me, but it seems to be a lot harder than it should be to do basic file io in lua?
A directory isn't a file. You can't just open it.
And yes, lua itself has (intentionally) limited functionality.
You can use luafilesystem or luaposix and similar modules to get more features in this area.
You can also use the following script to list the names of the files in a given directory (assuming Unix/Posix):
dirname = '.'
f = io.popen('ls ' .. dirname)
for name in f:lines() do print(name) end

Preventing overwriting when using numpy.savetxt

Is there built error handling for prevent overwriting a file when using numpy.savetxt?
If 'my_file' already exists, and I run
numpy.savetxt("my_file", my_array)
I want an error to be generated telling me the file already exists, or asking if the user is sure they want to write to the file.
You can check if the file already exists before you write your data:
import os
if not os.path.exists('my_file'): numpy.savetxt('my_file', my_array)
You can pass instead of a filename a file handle to np.savetxt(), e.g.,
import numpy as np
a = np.random.rand(10)
with open("/tmp/tst.txt", 'w') as f:
np.savetxt(f,a)
So you could write a helper for opening the file.
Not in Numpy. I suggest writing to a namedTemporaryFile and checking if the destination file exists. If not, rename the file to a concrete file on the system. Else, raise an error.
Not an error handler, but it's possible to create a new version in the form of:
file
filev2
filev2v3
filev2v3v4
so that no file ever gets overwritten.
n=2
while os.path.exists(f'{file}.txt'):
file = file + f'v{n}'
n+=1

Error trying rename a file

Hi community!
I have an application in VB.Net, in the user's computer is located in program files.
The users run always the program as an Administrators.
But in some cases; when the program try to rename a file in the program files the program throws the following exception:
The given path's format is not supported.
SOURCE = System.Security.Util.StringExpressionSet.CanonicalizePath
Also, happens when I try to copy a file.
The application does the rename or copy automatically and it's the same name for all the users
Example:
Rename(vOld, vNew)
FileCopy(vOld, vNew)
This exception only happen in Win7.
Somebody have an idea what is the reason to some users appear this exception?
This will happen when the user provides an invalid file name, for example one that includes colons.
You should validate that the user-entered file name does not contain any of the values in System.IO.Path.GetInvalidPathChars.
All it's my fault!
-_-'
I'm trying to rename this path:
C:\_MyFile.xlsx
To:
C:\MyFile.xlsx
In my computer all works fine because I have the both files (The users only has the file with the underscore).
When the program try to validate it try to rename the file "_C:\MyFile.xlsx" to "C:\MyFile.xlsx"
The exception don't give much information about my error...