How to read multiple .xlsx files from multiple subfolders using Python and pandas

I have one folder that contains 10-12 subfolders, and from each subfolder I need to read a specific .xlsx file. I am stuck: I can find all the .xlsx files I want using os.walk, but I don't know how to proceed further.
import os
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith("abc.xlsx"):
            pass  # this is where I am stuck

If you would like to use os.walk, this is how.
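import os
reqfiles = []
for root, dirs, files in os.walk(path):
    # keep the full path and accumulate matches across all subfolders
    reqfiles += [os.path.join(root, i) for i in files if i.endswith("abc.xlsx")]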
If all the files sit directly in one folder, os.listdir alone is enough (it does not descend into subfolders).
reqfiles = [i for i in os.listdir(path) if i.endswith("abc.xlsx")]
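To go from the list of matching paths to actual DataFrames, something like the following might work. It is only a sketch: the function names find_files and read_workbooks are made up for illustration, it assumes each subfolder has a unique name, and pd.read_excel needs an Excel engine such as openpyxl installed to read .xlsx files.

```python
import os


def find_files(top, suffix):
    """Return sorted full paths of files under `top` whose names end with `suffix`."""
    matches = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.endswith(suffix):
                matches.append(os.path.join(root, name))
    return sorted(matches)


def read_workbooks(top, suffix="abc.xlsx"):
    """Read every matching workbook into a dict keyed by its subfolder name."""
    import pandas as pd  # imported here so find_files works without pandas

    frames = {}
    for file_path in find_files(top, suffix):
        # assumption: subfolder names are unique, otherwise later files overwrite earlier ones
        subfolder = os.path.basename(os.path.dirname(file_path))
        frames[subfolder] = pd.read_excel(file_path)
    return frames
```

Calling read_workbooks(path) would then give one DataFrame per subfolder, which can be combined with pd.concat(frames) if the sheets share the same columns.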

Related

Pandas to_csv with ZIP compresses whole directory

df.to_csv("/path/to/destination.zip", compression="zip")
The above line will generate a file called destination.zip in the directory /path/to/.
Decompressing the ZIP file results in the directory structure path/to/destination.zip, where destination.zip is the CSV file.
Why is the path/to/ folder structure included in the compressed file? Is there any way to avoid this?
I was blown away by this. I am currently writing the ZIP locally (destination.zip) and using os.rename to move it to the desired location. Is this a bug?
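One possible way around this, without the rename step, is to build the archive with the standard zipfile module and choose the entry name explicitly. The sketch below assumes the CSV text comes from df.to_csv() called without a path (which returns a string); write_csv_zip and the entry name destination.csv are hypothetical. The pandas documentation also describes passing compression as a dict with "method" and "archive_name" keys, which may be simpler if your pandas version supports it.

```python
import zipfile


def write_csv_zip(csv_text, zip_path, member_name):
    """Write `csv_text` into `zip_path` as a single entry called `member_name`."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # writestr stores the entry under exactly this name, with no folder prefix
        archive.writestr(member_name, csv_text)


# Hypothetical usage with a DataFrame `df`:
# write_csv_zip(df.to_csv(index=False), "/path/to/destination.zip", "destination.csv")
```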

While loop files in folder

I'm relatively new to NetLogo and already struggling ;)
I have the following problem: I want my program to open a folder, check a file in that folder and afterwards remove that file from that folder. I figured the best way to do this is via a while loop, but I'm struggling to find the right syntax. Hope you all can help!
The command 'file-open' opens a file at the path given as the string after it, e.g. file-open "C:\Documents\model-out.txt" opens a file named model-out.txt in the Documents folder on the C drive.
You can then use 'file-read' or 'file-write' to read or write to the file respectively.
The command 'file-close' will close the file, which then can be deleted with 'file-delete'.
You can also check if a file exists in a folder using the command if file-exists? "C:\Documents\model-out.txt", and if true, the file can be deleted using file-delete.
Also check the command 'set-current-directory'.

pandas.read_csv of a gzip file within a zipped directory

I would like to use pandas.read_csv to open a gzip file (.asc.gz) within a zipped directory (.zip). Is there an easy way to do this?
This code doesn't work:
csv = pd.read_csv(r'C:\folder.zip\file.asc.gz')  # can't find the file
This code does work (however, it requires me to unzip the folder, which I want to avoid because my dataset currently contains thousands of zipped folders):
csv = pd.read_csv(r'C:\folder\file.asc.gz')
Is there an easy way to do this? I have tried using a combination of zipfile.ZipFile and read_csv, but have been unsuccessful (I think partly because this is an ASCII file as well).
Maybe the following might help.
df = pd.read_csv('filename.gz', compression='gzip')
OR
import gzip
with gzip.open('filename.gz', 'rb') as f:
    content = f.read()
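Neither snippet above reaches inside the .zip itself. A rough sketch of combining the two layers with the standard library follows: open the archive with zipfile, open the member as a file object, and let gzip decompress that stream. The function name read_gz_member is made up, and the member name must match the path stored inside the archive (ZipFile.namelist() shows the stored names).

```python
import gzip
import zipfile


def read_gz_member(zip_path, member):
    """Return the decompressed bytes of a gzip file stored inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # gzip.open accepts an already-open binary file object
            with gzip.open(raw, "rb") as gz:
                return gz.read()


# Hypothetical usage with pandas, nothing is extracted to disk:
# import io
# import pandas as pd
# df = pd.read_csv(io.BytesIO(read_gz_member(r"C:\folder.zip", "file.asc.gz")))
```

Reading the whole member into memory keeps the sketch short; for very large files, passing the gzip stream to read_csv while the archive is still open would avoid the extra copy.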

Importing a *random* csv file from a folder into pandas

I have a folder with several csv files, with file names between 100 and 400 (e.g. 142.csv, 278.csv, etc.). Not all the numbers between 100 and 400 are associated with a file; for example, there is no 143.csv. I want to write a loop that imports 5 random files into separate dataframes in pandas instead of manually searching and typing out the file names over and over. Any ideas to get me started with this?
You can use glob and read all the csv files in the directory.
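import glob
import numpy as np
import pandas as pd

files = glob.glob('*.csv')
# replace=False so the same file is not picked twice
random_files = np.random.choice(files, 5, replace=False)
dataframes = []
for fp in random_files:
    dataframes.append(pd.read_csv(fp))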
From this you can choose 5 random files from the directory and then read them separately.
Hope this answers your question.
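If you would rather not depend on numpy for the selection, a variant using only the standard library might look like this. It is a sketch: pick_random_csvs, folder, and seed are illustrative names, and random.sample always draws distinct items.

```python
import glob
import os
import random


def pick_random_csvs(folder, k=5, seed=None):
    """Return `k` distinct CSV paths chosen at random from `folder`."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    rng = random.Random(seed)  # pass a seed for a reproducible choice
    return rng.sample(paths, k)  # raises ValueError if fewer than k files exist


# Hypothetical usage, one DataFrame per file, keyed by file name:
# import pandas as pd
# frames = {os.path.basename(p): pd.read_csv(p) for p in pick_random_csvs(".")}
```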

How to extract .sql file that seems to be a .zip

I have received a file from a customer. The file is said to be
SQL code (application/sql)
However, this has turned out to be wrong: nothing could open it. It turns out it was secretly a .zip file. By renaming it to '.zip' and manually extracting it I was able to get the files contained in it. I would like to do a similar process in Python.
So far I've renamed the file:
file_name_zip = file_name.replace('.sql', '.zip')
os.rename(file_name, file_name_zip)
And I've tried extracting it:
zip_ref = zipfile.ZipFile(file_name_zip, 'r')
zip_ref.extractall(extracted_file)
However, this failed because
zipfile.BadZipFile: File is not a zip file
I've googled, and apparently this can sometimes be fixed using:
file_name_zip_2 = file_name_zip.replace('.zip', '2.zip')
os.system(f'zip -FF {file_name_zip} --out {file_name_zip_2}')
This required me to put in a bunch of settings, which I wasn't able to figure out. There must be a better way to go about this.
Does anybody know how to parse such an .sql file?
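One thing that may help narrow this down is checking what the file really is before extracting it. zipfile.ZipFile does not care about the extension, so the rename is not needed, and a BadZipFile error suggests the container might be another format that a desktop tool happened to handle. The sketch below guesses the format from the leading bytes; sniff_format and the MAGIC table are illustrative, cover only a few common formats, and the real format of the file in question is unknown.

```python
import zipfile

# Leading "magic" bytes of a few common archive formats
MAGIC = {
    b"PK\x03\x04": "zip",
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"7z\xbc\xaf\x27\x1c": "7z",
}


def sniff_format(path):
    """Guess a file's container format from its first bytes, ignoring its extension."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    # fall back to the standard library's own zip check
    return "zip" if zipfile.is_zipfile(path) else "unknown"
```

If the result is "gzip", for example, gzip.open(file_name) would be the thing to try instead of zipfile; if it is "zip" and ZipFile still fails, the archive may be damaged or truncated.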