My goal is the following:
I am using Windows 10 and I have files like this:
folder
2020-04-23_19-30-52_UTC.mp4
2020-04-23_19-30-52_UTC.txt which contains string "This video is me at a wedding"
2020-05-25_19-30-52_UTC.mp4
2020-05-25_19-30-52_UTC.txt which contains string "This video is dogwalk at the sunset"
where each .txt file contains the desired name for the .mp4 from the same date. I want to end up with this:
folder
This video is me at a wedding.mp4
2020-04-23_19-30-52_UTC.txt
This video is dogwalk at the sunset.mp4
2020-05-25_19-30-52_UTC.txt
There are a few ways to achieve this, but I am not that good at coding. My only priority is to get it done, and for now I am not limited to any particular tool or programming language.
Thanks
I'd tackle this problem with Python.
import os

folder = '[path to original folder]'
files = os.listdir(folder)

# Iterate through all the files in the folder
for path in files:
    filetype = path[-4:]  # Grabs the last 4 characters of the filename
    # Checks if it's a text file
    if filetype == '.txt':
        # Open the text file and grab the description
        with open(os.path.join(folder, path), "r") as f:
            new_name = f.read().strip()  # strip() drops a trailing newline
        new_name = new_name + '.mp4'  # Add proper filetype
        video = path[:-4] + '.mp4'  # Swap the .txt extension for .mp4
        os.rename(os.path.join(folder, video), os.path.join(folder, new_name))  # Rename
If any more issues arise please let me know so I can help.
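One thing to watch for on Windows: the description text may contain characters that are not allowed in file names (such as ? or :), or line breaks. Here is a minimal sketch of the same approach with pathlib, assuming each .txt holds one short description saved as UTF-8. The function name rename_videos and the underscore replacement are my own choices, not something from the question.

```python
import re
from pathlib import Path

# Characters Windows does not allow in file names, plus line breaks.
INVALID_CHARS = r'[<>:"/\\|?*\r\n]'


def rename_videos(folder):
    """Rename each .mp4 in folder after the text in the matching .txt file."""
    renamed = []
    for txt in sorted(Path(folder).glob("*.txt")):
        video = txt.with_suffix(".mp4")
        if not video.exists():
            continue  # no video with the same timestamp, nothing to rename
        description = txt.read_text(encoding="utf-8").strip()
        # Replace characters that would make the rename fail on Windows.
        safe_name = re.sub(INVALID_CHARS, "_", description)
        target = video.with_name(safe_name + ".mp4")
        if not safe_name or target.exists():
            continue  # skip empty descriptions and never overwrite a file
        video.rename(target)
        renamed.append(target.name)
    return renamed
```

Calling rename_videos(r'[path to original folder]') returns the list of new file names, so you can check what was changed.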
I want to extract text from several images.
I want to do it in colab.
I know how to do it with one image: https://github.com/bhadreshpsavani/ExploringOCR/blob/master/OCRusingTesseract.ipynb
But how do I do it in a loop, since I have more than a hundred pictures?
Thanks in advance!
I uploaded my images to the root directory in Colab and solved this with the following code:
import os

import pytesseract
from PIL import Image

image_ext = ['.jpg', '.png', '.jpeg']
directory = '/'
for file in os.listdir(directory):
    ext = os.path.splitext(file)[-1].lower()
    if ext not in image_ext:
        continue  # skip anything that is not an image
    filename = os.path.join(directory, file)
    extracted_information = pytesseract.image_to_string(Image.open(filename))
    print(extracted_information)
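If you want to keep the text per image instead of only printing it, the loop can collect the results in a dictionary. This is a rough sketch: ocr_images and its extract parameter are names I made up, and the OCR call is passed in as a function so that the loop itself does not depend on how Tesseract is set up.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def ocr_images(directory, extract):
    """Return {filename: text} for every image file in directory.

    extract is any function that takes an image path and returns its text.
    """
    results = {}
    for name in sorted(os.listdir(directory)):
        ext = os.path.splitext(name)[1].lower()
        if ext not in IMAGE_EXTENSIONS:
            continue  # not an image, skip it
        path = os.path.join(directory, name)
        try:
            results[name] = extract(path)
        except Exception as error:  # one bad image should not stop the batch
            results[name] = None
            print("Could not read {}: {}".format(name, error))
    return results
```

With the imports from the code above, the call would be something like ocr_images('/', lambda p: pytesseract.image_to_string(Image.open(p))).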
I am doing some processing on wave audio files using Tensorflow
and saving them using the tf.print with output_stream option.
pcm = contrib_audio.encode_wav(processed_audio, 16000)
tf.print(pcm, output_stream="file:///tmp/test.wav", summarize=-1)
The problem is that I am not able to change the value of /tmp/test.wav dynamically
so that multiple wave files are stored.
Please refer to the code below.
# Using a counter
for i in range(1, 10):
    fname = "test_" + str(i) + ".wav"  # filename
    path = "/content/sample_data/"  # path to save
    fname = "file://{path}{fname}".format(fname=fname, path=path)
    tf.print(pcm, output_stream=fname, summarize=-1)
You can build the filename string dynamically so that each file gets a unique name.
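The same idea can be written with an f-string and zero-padded counters, so that the files sort in order. This is only a sketch of the string building: the helper name output_streams is hypothetical, and the directory is the Colab sample_data folder used above.

```python
def output_streams(directory, count, prefix="test_"):
    """Build one 'file://' output_stream string per clip, numbered from 1."""
    directory = directory.rstrip("/")
    width = len(str(count))  # pad so that test_01 sorts before test_10
    return [
        f"file://{directory}/{prefix}{i:0{width}d}.wav"
        for i in range(1, count + 1)
    ]


streams = output_streams("/content/sample_data/", 10)
print(streams[0])   # file:///content/sample_data/test_01.wav
print(streams[-1])  # file:///content/sample_data/test_10.wav
```

Each string can then be passed as the output_stream argument inside the loop.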
I have a folder called "before_manipulation".
It contains 3 CSV files named File_A.CSV, File_B.CSV, and File_C.CSV.
Current_path: c:/users/before_manipulation [file_A.CSV, File_B.CSV, File_C.CSV]
I need to do a data manipulation on each of the files and, after the manipulation, save them with the same file names in another directory.
Targeted_path: C:/users/after_manipulation [file_A.CSV, File_B.CSV, File_C.CSV]
I have the logic to do the data manipulation with a Pandas dataframe when there is only a single file. When I have multiple files, how do I read each file and its name and pass them to my logic?
Pseudocode of how I would work if there were one file:
import pandas as pd
df = pd.read_csv('c:/users/before_manipulation/file_A.csv')
# ... do logic/manipulation
df.to_csv('c:/users/after_manipulation/file_A.csv')
Any help is appreciated.
You can use os.listdir(<path>) to return a list of the files contained within a directory. If you do not pass an argument for <path>, it lists the current working directory.
You can then iterate over the list from os.listdir, passing each filename to the function you already have for data manipulation. When saving, use the same filename to write to your desired directory.
In summary, the code would look something like this:
import os
import pandas as pd

in_dir = r'c:/users/before_manipulation/'
out_dir = r'c:/users/after_manipulation/'
files_to_run = os.listdir(in_dir)

for file in files_to_run:
    print('Running {}'.format(in_dir + file))
    df = pd.read_csv(in_dir + file)
    # ... do your logic here to return the changed df you want to save
    # ...
    df.to_csv(out_dir + file)
For this to work, every file in the directory needs to have the same structure, and you need to apply the same logic to each one.
If that is not the case, you will need something like a dictionary that maps each file name to the manipulation it needs, and call the matching one when appropriate.
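Here is a minimal sketch of that dictionary idea. To keep it free of dependencies it uses the standard csv module and passes rows around as lists of dicts; with pandas, the handlers would take and return a DataFrame instead. The handler functions and the column names (amount, name) are made up for illustration.

```python
import csv
from pathlib import Path


def double_amount(rows):
    """Hypothetical manipulation for file_a.csv."""
    return [{**row, "amount": str(float(row["amount"]) * 2)} for row in rows]


def uppercase_name(rows):
    """Hypothetical manipulation for file_b.csv."""
    return [{**row, "name": row["name"].upper()} for row in rows]


# Map each (lower-cased) file name to the manipulation it needs.
HANDLERS = {
    "file_a.csv": double_amount,
    "file_b.csv": uppercase_name,
}


def process_folder(in_dir, out_dir, handlers=HANDLERS):
    """Run the registered handler on each file and save it under the same name."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).iterdir()):
        handler = handlers.get(src.name.lower())
        if handler is None:
            continue  # no manipulation registered for this file
        with src.open(newline="") as f:
            rows = list(csv.DictReader(f))
        rows = handler(rows)
        if not rows:
            continue  # nothing to write
        with (out_path / src.name).open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
```

Files without an entry in the dictionary are skipped here; you could just as well fall back to a default handler.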
Assuming you have some logic that works for one file, I'd just put that logic into a function and run it in a for loop.
You'd end up with something like this:
directory = r'c:/users/before_manipulation'
files = ['file_A.CSV', 'File_B.CSV', 'File_C.CSV']
for file in files:
    somefunction(directory + '/' + file)
If you need more info on functions I'd check this out: https://www.w3schools.com/python/python_functions.asp
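Using pathlib:
from pathlib import Path
import pandas as pd

your_dir = '\\your_input_path'  # placeholder: folder containing the input CSV files
new_dir = '\\your_path'  # placeholder: folder to save the results in
files = list(Path(your_dir).glob('*.csv'))
for file in files:
    df = pd.read_csv(file)
    # .. your logic
    df.to_csv(f'{new_dir}\\{file.name}', index=False)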
I'm new to Python and am running into issues reading the contents of a .gz file:
I've got a folder full of .gz files that I've downloaded programmatically using a private API. The content of each .gz file is a .xml file, so I need to iterate over the directory and extract them.
The problem comes when I programmatically extract these .gz files into their respective .xml versions. The files are created without error, and when I open one (using TextWrangler) it looks like a regular .xml file, but NOT when I view it in a hex editor. Also, when I open the .xml file programmatically and print its contents, it shows up as a bunch of (binary?) jumbled text.
With the above in mind, if I manually extract one of the files (i.e. using OS X rather than Python), the file is viewable in a hex editor as I'd expect it to be.
Here is my code snippet (appropriate imports not shown, but they are glob and gzip):
searchpattern = siteid + "_" + resource + "_*.gz"
for infile in glob.glob(workingDir + searchpattern):
    print infile
    # read the zipped contents (https://docs.python.org/2/library/gzip.html)
    f = gzip.open(infile, 'rb')
    file_content = f.read()
    file_content = str(file_content)  # This was an attempt to fix
    print file_content  # This shows a bunch of mumbo jumbo
    # write the contents we just read to a new file (uncompressed)
    newfilename = infile[0:-3]  # the filename without the ".gz"
    newfilename = newfilename + ".xml"
    fnew = open(newfilename, 'w+b')
    fnew.write(str(file_content))
    fnew.close()
    # delete the .gz version of the file
    # os.remove(infile)
If I run this against a gzipped XML file, the program works without issues.
If I compress an XML file, extract it with this program, and diff the original against the program's output, I get no differences.
The program does add an extra ".xml" extension, though.
So this turned out to be a silly mistake on my part, but I'll post this as a follow-up for anybody else who makes the same mistake I did.
The problem was that I was gzipping data that had already been gzipped earlier in my program. With that in mind, the code snippet in this thread didn't have anything wrong with it, and neither (technically) did the code that created the .gz file. As you can see below, opening the file normally earlier in the program, instead of with the gzip library, did the trick.
# Download and write the contents of each response to a .gz file
if limitCounter < limit or int(limit) == 0:
    print _name + " " + scopeStartDate + " through " + scopeEndDate + " at " + href
    file = api.get(href)
    gz_file_content = file.content
    # gz_file = gzip.open(workingDir + _name, "wb")  # This breaks the program later
    gz_file = open(workingDir + _name, 'wb')  # This works.
    gz_file.write(gz_file_content)
    gz_file.close()
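A quick way to catch this kind of double compression is to look at the first two bytes of the data: gzip streams start with 0x1f 0x8b. Below is a small sketch in Python 3 (unlike the Python 2 snippets above). The function names are mine, and the loop assumes the real payload never starts with those two bytes, which holds for XML text.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def is_gzipped(data):
    """Return True if the bytes look like a gzip stream."""
    return data[:2] == GZIP_MAGIC


def gunzip_fully(data, max_layers=5):
    """Decompress repeatedly until the payload is no longer gzip data."""
    layers = 0
    while is_gzipped(data):
        if layers == max_layers:
            raise ValueError("too many nested gzip layers")
        data = gzip.decompress(data)
        layers += 1
    return data


xml = b"<?xml version='1.0'?><root/>"
twice = gzip.compress(gzip.compress(xml))
print(is_gzipped(gzip.decompress(twice)))  # True: one layer is still left
print(gunzip_fully(twice) == xml)          # True
```

Running is_gzipped on the API response before writing it would have shown that the content was already compressed.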
I am trying to open a file with file = filedialog.askopenfile(initialdir='./'), but I also need to know the name of the opened file for other purposes. I know that if the user cancels, file is None; otherwise it's something like this:
<_io.TextIOWrapper name='/Users/u/Desktop/e/config.py' mode='r' encoding='US-ASCII'>
but _io.TextIOWrapper objects are not subscriptable.
Following a suggestion, I discovered that there is another function similar to filedialog.askopenfile, namely askopenfilename, which returns just the name of the file instead of opening it directly. If we also want to open the file, we can open and read it manually:
file_name = filedialog.askopenfilename(initialdir='./')
if file_name != '':
    with open(file_name, 'r') as file:
        string = ''
        for line in file:
            string += str(line)
        print(string)
Even though this works well, I still think tkinter should have provided this functionality with filedialog.askopenfile directly.
Navigating through the Python directory, we can find the filedialog.py file, which contains the definitions of both functions. They are very similar:
askopenfilename
def askopenfilename(**options):
    "Ask for a filename to open"
    return Open(**options).show()
askopenfile
def askopenfile(mode = "r", **options):
    "Ask for a filename to open, and returned the opened file"
    filename = Open(**options).show()
    if filename:
        return open(filename, mode)
    return None
As we can see, the first returns the result of the call to the show function, whereas the second returns an opened file.
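Since askopenfile simply returns the result of open(filename, mode), the returned file object already carries the selected path in its name attribute. That is the name='...' shown in the repr earlier. A small sketch follows, with read_with_name and pick_file as made-up helper names:

```python
def read_with_name(file):
    """Return (path, contents) for an open file object, or None if no file."""
    if file is None:  # the user cancelled the dialog
        return None
    with file:  # closes the file when done
        return file.name, file.read()


def pick_file():
    """Show the dialog and return (path, contents); needs a display to run."""
    from tkinter import filedialog  # imported here so the helper above stays GUI-free

    return read_with_name(filedialog.askopenfile(initialdir="./"))
```

So with askopenfile you can read file.name directly, and askopenfilename is only needed when you do not want the file opened for you.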