Read contents of .gz file with python - gzip

I'm new to Python and am running into issues reading the contents of a .gz file:
I've got a folder full of .gz files that I've retrieved programmatically using a private API. The contents of each .gz file is a .xml file, so I need to iterate over the directory and extract them.
The problem appears when I programmatically extract these .gz files into their respective .xml versions. The files are created without error, and when I open one (using TextWrangler) it looks like a regular .xml file, but NOT when I view it in a hex editor. Also, when I open the .xml file programmatically and print its contents, it shows up as a bunch of (binary?) jumbled text.
With the above in mind, if I manually extract one of the files (i.e. using OS X, not Python), the file is viewable in a hex editor as I'd expect it to be.
Here is my code snippet (appropriate imports not shown, but they are glob and gzip):
searchpattern = siteid + "_" + resource + "_*.gz"
for infile in glob.glob(workingDir + searchpattern):
    print infile
    #read the zipped contents (https://docs.python.org/2/library/gzip.html)
    f = gzip.open(infile, 'rb')
    file_content = f.read()
    file_content = str(file_content) #This was an attempt to fix
    print file_content # This shows a bunch of mumbo jumbo
    #write the contents we just read to a new file (uncompressed)
    newfilename = infile[0:-3] # the filename without the ".gz"
    newfilename = newfilename + ".xml"
    fnew = open(newfilename, 'w+b')
    fnew.write(str(file_content))
    fnew.close()
    #delete the .gz version of the file
    #os.remove(infile)

If I run this against XML I don't get any issues with the program.
If I compress an XML file, extract it with this program, and diff the original against the output, I get no differences.
This program does add an extra ".xml" extension.
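For anyone on Python 3, here is a minimal sketch of the same extraction loop. The function name gunzip_all and its parameters are my own, not from the question, and it assumes every matching file ends in ".gz". It strips only the ".gz" suffix, so no extra ".xml" gets appended, and it streams the data rather than reading each file fully into memory.

```python
import glob
import gzip
import os
import shutil


def gunzip_all(directory, pattern="*.gz"):
    """Decompress every matching .gz file next to itself; return the new paths."""
    written = []
    for gz_path in sorted(glob.glob(os.path.join(directory, pattern))):
        out_path = gz_path[:-3]  # drop the trailing ".gz"
        # Copy the decompressed bytes across in chunks.
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written
```

Calling gunzip_all(workingDir, siteid + "_" + resource + "_*.gz") would roughly mirror the loop in the question.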

So this turned out to be a silly mistake on my part, but I'll post this as a follow-up for anybody else who makes the same mistake I did.
The problem was that I was gzipping data that had already been gzipped earlier in my program. So the code snippet in this thread didn't have anything wrong with it, and (technically) neither did the code I used to create the .gz file. As you can see below, opening the file normally earlier in the program, instead of with the gzip library, did the trick.
#Download and write the contents of each response to a .gz file
if limitCounter < limit or int(limit) == 0:
    print _name + " " + scopeStartDate + " through " + scopeEndDate + " at " + href
    file = api.get(href)
    gz_file_content = file.content
    #gz_file = gzip.open(workingDir + _name, "wb") # This breaks the program later
    gz_file = open(workingDir + _name, 'wb') # This works.
    gz_file.write(gz_file_content)
    gz_file.close()
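If you suspect the same double-compression problem, one way to check is to look at the first two bytes of the data: a gzip stream starts with the magic bytes 0x1f 0x8b. The sketch below (Python 3; gunzip_fully and max_layers are hypothetical names of mine) keeps decompressing while the data still looks gzipped and reports how many layers it removed.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def gunzip_fully(data, max_layers=5):
    """Strip nested gzip layers from a bytes object.

    Returns (payload, layers_removed). max_layers is a safety limit.
    """
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers
```

A result of two or more layers would suggest the data was compressed again somewhere upstream, which is what happened in my case.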

Related

Youtube-dl 'outtmpl' dynamic output

I'm trying to create a Python program that lets you dynamically choose the folder where the download is saved.
'outtmpl' lets you specify the output as a simple path, so that
'e:/python/downloadedsongs/%(title)s.%(ext)s'
is a valid path and will save to the downloadedsongs folder.
But with the function that defines the choose-folder button
def openLocation():
    global Folder_Name
    Folder_Name = filedialog.askdirectory()
    print(Folder_Name)
why does
f'{Folder_name}/%(title)s.%(ext)s'
not return a valid path? The program outright ignores the variable. At first I thought I was having an f-string issue, because %s is Python's old string-formatting syntax, but I have also tried these variants with no success:
'outtmpl': Folder_Name + '/' + '%(title)s.%(ext)s', (no errors but cannot find output)
'outtmpl': '%(Folder_Name)s%(title)s.%(ext)s',
%()s is the old style of string formatting, not an f-string feature; I was conflating the two.
The answer ended up looking like 'outtmpl': Folder + '/%(title)s.%(ext)s'
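To illustrate why the f-string itself is not the problem: an f-string only substitutes the {...} parts and leaves %(title)s and %(ext)s untouched, so it produces the same text as plain concatenation. Below is a small sketch; build_outtmpl is a hypothetical helper of mine, and the folder value is just the example path from the question. Note also that Python names are case-sensitive, so Folder_name and Folder_Name are different variables.

```python
import os


def build_outtmpl(folder):
    """Join a chosen folder with the title/extension placeholders."""
    return os.path.join(folder, "%(title)s.%(ext)s")


folder = "e:/python/downloadedsongs"  # e.g. the value returned by askdirectory()

# Both spellings give the same string; the %(...)s parts are left as-is.
via_fstring = f"{folder}/%(title)s.%(ext)s"
via_concat = folder + "/%(title)s.%(ext)s"

template = build_outtmpl(folder)
```

The resulting string can then be used as the value of 'outtmpl' in the options dictionary.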

How to rename multiple files from the multiple text files?

My goal is to do following:
I am using Win 10 and I have files like so:
folder
2020-04-23_19-30-52_UTC.mp4
2020-04-23_19-30-52_UTC.txt which contains string "This video is me at a wedding"
2020-05-25_19-30-52_UTC.mp4
2020-05-25_19-30-52_UTC.txt which contains string "This video is dogwalk at the sunset"
where each .txt contains the name for the .mp4 from the same date, and I want to end up with the following:
folder
This video is me at a wedding.mp4
2020-04-23_19-30-52_UTC.txt
This video is dogwalk at the sunset.mp4
2020-05-25_19-30-52_UTC.txt
There are a few ways to achieve this, but I am not that good with coding. My only priority is to get it done, and for now I am not limited to any particular tool or programming language.
Thanks
I'd tackle this problem with Python.
import os
dir = '[path to original folder]'
files = os.listdir(dir)
# Iterate through all the files in the folder
for path in files:
    filetype = path[-4:] # Grabs last 4 characters of the filepath
    # Checks if it's a textfile
    if (filetype == '.txt'):
        f = open(os.path.join(dir, path), "r") # open the textfile
        new_name = f.read().strip() # grab the description, without surrounding whitespace or newlines
        f.close() # close the textfile
        new_name = new_name + '.mp4' # Add proper filetype
        path = path[:-4] # Throws away the last 4 characters of the filepath
        path = path + '.mp4' # Add proper filetype
        os.rename(os.path.join(dir, path), os.path.join(dir, new_name)) # Rename
If any more issues arise please let me know so I can help.
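A slightly more defensive variant of the same idea, as a sketch using pathlib. The function name, the UTF-8 encoding, and the decision to skip rather than overwrite are my assumptions, not requirements from the question. It also drops characters that Windows does not allow in file names, in case a description contains one.

```python
from pathlib import Path

INVALID_CHARS = '\\/:*?"<>|'  # characters Windows does not allow in file names


def rename_videos_from_text(folder):
    """Rename each X.mp4 to the description stored in X.txt; return the new names."""
    renamed = []
    for txt in sorted(Path(folder).glob("*.txt")):
        video = txt.with_suffix(".mp4")
        if not video.exists():
            continue  # no matching video for this text file
        # Assumption: the text files are UTF-8 encoded.
        title = txt.read_text(encoding="utf-8").strip()
        title = "".join(c for c in title if c not in INVALID_CHARS)
        if not title:
            continue  # nothing usable to rename to
        target = video.with_name(title + ".mp4")
        if target.exists():
            continue  # never overwrite an existing file
        video.rename(target)
        renamed.append(target.name)
    return renamed
```

Try it on a copy of the folder first, since renames are not easy to undo.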

Write in memory object to S3 via boto3

I am attempting to write files directly to S3 without creating a local file which is then uploaded.
I am using cStringIO to generate a file in memory, but I am having trouble figuring out the proper way to upload it in boto3.
def writetos3(sourcedata, filename, folderpath):
    s3 = boto3.resource('s3')
    data = open(sourcedata, 'rb')
    s3.Bucket('bucketname').put_object(Key= folderpath + "/" + filename, Body=data)
Above is the standard boto3 method that I was using previously with the local file. It does not work without a local file; I get the following error: coercing to Unicode: need string or buffer, cStringIO.StringO found.
Because the in-memory file (I believe) is already considered open, I tried changing it to the code below, but it still does not work. No error is given; the script simply hangs on the last line of the method.
def writetos3(sourcedata, filename, folderpath):
    s3 = boto3.resource('s3')
    s3.Bucket('bucketname').put_object(Key= folderpath + "/" + filename, Body=sourcedata)
Just for more info, the value I am attempting to write looks like this
(cStringIO.StringO object at 0x045DC540)
Does anyone have an idea of what I am doing wrong here?
It looks like you want this:
data = open(sourcedata, 'rb').read().decode()
It defaults to utf8. Also, I encourage you to run your code under python3, and to use appropriate language tags for your question.
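Another option that may be worth trying, sketched here for Python 3 with io.BytesIO / io.StringIO standing in for cStringIO: pull the whole buffer out with getvalue() and pass plain bytes as Body, since put_object accepts bytes or a seekable file-like object. The helper names and the UTF-8 assumption for text buffers are mine; the bucket name is the placeholder from the question.

```python
import io


def buffer_to_bytes(buf):
    """Return the full contents of an in-memory buffer as bytes."""
    data = buf.getvalue()  # whole buffer, regardless of the current position
    if isinstance(data, str):
        data = data.encode("utf-8")  # assumption: text buffers are UTF-8
    return data


def write_to_s3(buf, filename, folderpath, bucket="bucketname"):
    """Upload an in-memory buffer to S3 without creating a local file."""
    import boto3  # imported here so buffer_to_bytes works without boto3 installed

    s3 = boto3.resource("s3")
    s3.Bucket(bucket).put_object(
        Key=folderpath + "/" + filename,
        Body=buffer_to_bytes(buf),
    )
```

Because getvalue() ignores the current read/write position, this avoids having to remember to seek(0) after writing to the buffer.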

Is it possible to get the name of the file when opened with filedialog.askopenfile

I am trying to open a file with file = filedialog.askopenfile(initialdir='./'), but I need also to know the name of the file opened for other purposes. I know that if the user selects a file, file is not None, otherwise it's something like this:
<_io.TextIOWrapper name='/Users/u/Desktop/e/config.py' mode='r' encoding='US-ASCII'>
but _io.TextIOWrapper objects are not subscriptable.
Following a suggestion, I discovered that there is another function similar to filedialog.askopenfile, namely askopenfilename, which returns just the name of the file instead of opening it directly. If we also want to open the file, we can open and read it manually:
file_name = filedialog.askopenfilename(initialdir='./')
if file_name != '':
    with open(file_name, 'r') as file:
        string = ''
        for line in file:
            string += str(line)
        print(string)
Even though this works well, I still think tkinter should have provided this functionality with filedialog.askopenfile directly.
Navigating through the Python installation directory we can find the filedialog.py file, which contains the definitions of both functions, which are very similar:
def askopenfilename(**options):
    "Ask for a filename to open"
    return Open(**options).show()
askopenfile
def askopenfile(mode = "r", **options):
    "Ask for a filename to open, and returned the opened file"
    filename = Open(**options).show()
    if filename:
        return open(filename, mode)
    return None
As we can see, the first returns the result of the call to the show function, whereas the second returns an opened file.
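Since askopenfile simply returns the result of open(filename, mode), the returned object is an ordinary file object, and the name='...' shown in its repr is available as the .name attribute. A small sketch (opened_file_info is a hypothetical helper of mine) that works on any file object created by open() with a path:

```python
import os


def opened_file_info(file):
    """Return (full_path, base_name) for a file object created by open(path, ...)."""
    full_path = file.name  # the path that was passed to open()
    return full_path, os.path.basename(full_path)
```

So after file = filedialog.askopenfile(initialdir='./'), checking that file is not None and then reading file.name should give the selected path without a second dialog call.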

No such file or directory when opening file in memory with ZipRuby Zip::File

I'm consuming an api that replies with a zip file in the contents of the body of the http response. I'm able to unzip the file and write each file to disk using the example at the zip-ruby wiki (https://bitbucket.org/winebarrel/zip-ruby/wiki/Home):
Zip::Archive.open('filename.zip') do |ar| # except I'm opening from a string in memory
  ar.each do |zf|
    if zf.directory?
      FileUtils.mkdir_p(zf.name)
    else
      dirname = File.dirname(zf.name)
      FileUtils.mkdir_p(dirname) unless File.exist?(dirname)
      open(zf.name, 'wb') do |f|
        f << zf.read
      end
    end
  end
end
However, I don't want to write the files to disk. Instead, I want to create an Active Record object and set a Paperclip file attachment:
asset = Asset.create(:name => zf.name)
asset.file = open(zf.name, 'r')
asset.save
What's odd is the open statement in the first example that writes the file to disk works consistently. However, when I want to just open the zf (Zip::File) as a generic File in memory, I will sometimes get:
*** Errno::ENOENT Exception: No such file or directory - assets/somefilename.png
How can I assign the Zip::File zipruby creates to the paperclip file without getting this error?