Save a file to a different folder using Python pandas

I have a pandas dataframe and I would like to save it as a text file to another folder. Here is what I tried so far:
import pandas as pd
df.to_csv(path = './output/filename.txt')
This does not save the file and gives me an error. How do I save the dataframe (df) into the folder called output?

The first argument of to_csv() is named path_or_buf, so either use that keyword or just pass the path positionally:
df.to_csv('./output/filename.txt')
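Note that to_csv will not create the output folder for you; if ./output does not exist yet you may need to create it first. A minimal sketch (the dataframe contents here are made up):

```python
import os
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})  # stand-in dataframe

out_dir = "./output"
os.makedirs(out_dir, exist_ok=True)  # create the folder if it is missing
df.to_csv(os.path.join(out_dir, "filename.txt"), index=False)
```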

Related

combine all csv in various subfolders in pandas or in powershell/terminal and create a pandas dataframe

I have individual CSV files within nested subfolders: year folders contain month folders, month folders contain day folders, and each day folder holds the individual CSV files. I would like to combine all of the individual CSVs into one and create a pandas df.
In the tree diagram, it looks like this:
I tried this approach below but nothing was created:
import pandas as pd
import glob
path = r'~/root/up/to/the/folder/2022'
alldata = glob.glob(path + "each*.csv")
alldata.head()
I initially had it just looking for "each*.csv" files, but realized something is missing in between in order to reach the individual CSVs inside each folder. Maybe a for loop would work, like looping through each folder within each subfolder, but that is where I am stuck right now.
The answer to this: Combining separate daily CSVs in pandas shows files that are in the same folder.
I tried to make sense of this answer: batch file to concatenate all csv files in a subfolder for all subfolders, but it just doesn't click for me.
I also tried the following as suggested in Python importing csv files within subfolders
import os
import pandas as pd
path = '<Insert Path>'
file_extension = '.csv'
csv_file_list = []
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith(file_extension):
            file_path = os.path.join(root, name)
            csv_file_list.append(file_path)
dfs = [pd.read_csv(f) for f in csv_file_list]
but nothing is showing; I think there is something wrong with the path, given the tree shown above.
Or maybe there is a further step I need to do, because when I ran dfs.head() it said AttributeError: 'list' object has no attribute 'head'.
The following should work:
from pathlib import Path
import pandas as pd
csv_folder = Path('.') # path to your folder, e.g. to `2022`
df = pd.concat(pd.read_csv(p) for p in csv_folder.glob('**/*.csv'))
Alternatively, if you prefer, you can use glob.glob('**/*.csv', recursive=True) instead of the Path.glob method.
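To see why '**/*.csv' reaches files at any depth, here is a self-contained sketch; the month/day folder names and the each_ file prefix are invented for the demo:

```python
import tempfile
from pathlib import Path
import pandas as pd

# build a small year/month/day tree with one csv per day folder
root = Path(tempfile.mkdtemp()) / "2022"
for month, day in [("01", "01"), ("01", "02"), ("02", "01")]:
    folder = root / month / day
    folder.mkdir(parents=True)
    pd.DataFrame({"value": [1, 2]}).to_csv(folder / f"each_{month}_{day}.csv", index=False)

# '**/*.csv' matches csv files at any depth below root
df = pd.concat(
    (pd.read_csv(p) for p in root.glob("**/*.csv")),
    ignore_index=True,
)
print(len(df))  # 6: two rows from each of the three files
```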

How to read data from multiple folders from ADLS into a Databricks dataframe

The file path format is data/year/weeknumber/day/data_hour.parquet:
data/2022/05/01/00/data_00.parquet
data/2022/05/01/01/data_01.parquet
data/2022/05/01/02/data_02.parquet
data/2022/05/01/03/data_03.parquet
data/2022/05/01/04/data_04.parquet
data/2022/05/01/05/data_05.parquet
data/2022/05/01/06/data_06.parquet
data/2022/05/01/07/data_07.parquet
How do I read all these files one by one in a Databricks notebook and store them in a dataframe?
import pandas as pd

# Get all the files under the folder
data = dbutils.fs.ls(file)
df = pd.DataFrame(data)

# Create the list of file paths
paths = df.path.tolist()

for i in paths:
    df = spark.read.load(path=f'{i}*', format='parquet')
I am only able to read the last file; the other files are skipped.
The last line of your code cannot load data incrementally. Instead, it overwrites the df variable with the data from each path on every iteration.
Removing the for loop and trying the code below will give you an idea of how file masking with asterisks works. Note that the path should be a full path (I'm not sure whether the data folder is your root folder or not).
df = spark.read.load(path='/data/2022/05/*/*/*.parquet',format='parquet')
This is what I have applied from the same answer I shared with you in the comment.
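Outside of Spark, you can watch the same wildcard masking expand with plain Python globbing on a throwaway tree (the folder layout below mirrors the question; the files are empty placeholders):

```python
import glob
import os
import tempfile

root = tempfile.mkdtemp()
for day in ("01", "02"):
    for hour in ("00", "01"):
        folder = os.path.join(root, "data", "2022", "05", day, hour)
        os.makedirs(folder)
        # empty placeholder standing in for real parquet data
        open(os.path.join(folder, f"data_{hour}.parquet"), "w").close()

# one asterisk per directory level, then a file mask
matches = glob.glob(os.path.join(root, "data", "2022", "05", "*", "*", "*.parquet"))
print(len(matches))  # 4: both hours under both day folders
```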

Trying to combine all csv files in directory into one csv file

My directory structure is as follows:
>Pandas-Data-Science
  >Demo
  >SalesAnalysis
    >Sales_Data
      >Sales_April_2019.csv
      >Sales_August_2019.csv
      ....
      >Sales_December_2019.csv
So Demo is a new python file I made and I want to take all the csv files from Sales_Data and create one csv file in Demo.
I was able to read any particular csv file from Sales_Data into a dataframe:
df = pd.read_csv('./SalesAnalysis/Sales_Data/Sales_August_2019.csv')
So I figured if I just get the file names and iterate through them, I can concatenate everything into an empty csv file:
import os
import pandas as pd
df = pd.DataFrame(list())
df.to_csv('one_file.csv')
files = [f for f in os.listdir('./SalesAnalysis/Sales_Data')]
for f in files:
    current = pd.read_csv("./SalesAnalysis/Sales_Data/" + f)
So my thinking was that current would read each csv file, since f prints out the exact string required, i.e. Sales_August_2019.csv.
However I get an error with current that says: No such file or directory: './SalesAnalysis/Sales_Data/Sales_April_2019.csv'
when clearly I was able to make a csv file with the exact same string. So why does my code not work?
Probably a problem with your current working directory not being what you expect. I prefer to do these operations with absolute paths, which makes debugging easier:
from pathlib import Path
path = Path('./SalesAnalysis/Sales_Data').resolve()
current = [pd.read_csv(file) for file in path.glob('*.csv')]
demo = pd.concat(current)
You can set a breakpoint to find out what path actually is.
try this:
import os
import glob
import pandas as pd
os.chdir("/mydir")  # change into the folder that holds the csv files
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
#combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
#export to csv
combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig')
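A self-contained version of the same idea, using a temporary folder and invented sales files so it can run anywhere:

```python
import glob
import os
import tempfile
import pandas as pd

workdir = tempfile.mkdtemp()
pd.DataFrame({"Order": [1, 2]}).to_csv(os.path.join(workdir, "Sales_April_2019.csv"), index=False)
pd.DataFrame({"Order": [3]}).to_csv(os.path.join(workdir, "Sales_August_2019.csv"), index=False)

# collect every csv in the folder and stack them into one dataframe
all_filenames = glob.glob(os.path.join(workdir, "*.csv"))
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames], ignore_index=True)
combined_csv.to_csv(os.path.join(workdir, "combined_csv.csv"), index=False)
print(len(combined_csv))  # 3 rows collected from the two input files
```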

How to iterate over files, extract the file name, and pass it to pandas logic

I have a folder called "before_manipulation".
It contains 3 CSV files named File_A.CSV, File_B.CSV, and File_C.CSV.
Current_path : c:/users/before_manipulation [file_A.CSV, File_B.CSV,File_C.CSV]
I have a data manipulation that I need to do on each of the files, and after the manipulation I need to save them with the same file names in another directory.
Targeted_path : C:/users/after_manipulation [file_A.CSV, File_B.CSV,File_C.CSV]
I have the logic working with a pandas dataframe when there is only a single file. With multiple files, how do I read each file along with its name and pass it to my logic?
Pseudo code of how I work when there is only one file:
import pandas as pd
df = pd.read_csv('c:/users/before_manipulation/file_A.csv')
... do logic/manipulation
df.to_csv('c:/users/after_manipulation/file_A.csv')
any help is appreciated.
You can use os.listdir(<path>) to return a list of the files contained within a directory. If you do not pass a path, it returns the listing of the current working directory.
With the list from os.listdir you can iterate over it, passing the captured filename to the function you already have for the data manipulation. Then on the save you can use the captured filename to save into your desired directory.
In summary the code would look something like this.
import os
import pandas as pd

in_dir = r'c:/users/before_manipulation/'
out_dir = r'c:/users/after_manipulation/'
files_to_run = os.listdir(in_dir)

for file in files_to_run:
    print('Running {}'.format(in_dir + file))
    df = pd.read_csv(in_dir + file)
    # ...do your logic here to return the changed df you want to save
    df.to_csv(out_dir + file)
For this to work, every file in the directory needs to have the same shape, and you need to want to apply the same logic to each file.
If that is not the case you will need something like a dictionary to save the different manipulations you need to do based on the file name and call those when appropriate.
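That dictionary idea could be sketched like this; the file names match the question, but the two lambda manipulations and the temporary folders are made up for the demo:

```python
import os
import tempfile
import pandas as pd

in_dir = tempfile.mkdtemp()   # stands in for c:/users/before_manipulation
out_dir = tempfile.mkdtemp()  # stands in for c:/users/after_manipulation

pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(in_dir, "File_A.CSV"), index=False)
pd.DataFrame({"x": [3, 4]}).to_csv(os.path.join(in_dir, "File_B.CSV"), index=False)

# map each file name to its own manipulation function
manipulations = {
    "File_A.CSV": lambda df: df.assign(x=df["x"] * 10),   # made-up logic
    "File_B.CSV": lambda df: df.assign(x=df["x"] + 100),  # made-up logic
}

for name in os.listdir(in_dir):
    df = pd.read_csv(os.path.join(in_dir, name))
    df = manipulations[name](df)  # pick the logic for this file name
    df.to_csv(os.path.join(out_dir, name), index=False)

print(sorted(os.listdir(out_dir)))  # ['File_A.CSV', 'File_B.CSV']
```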
Assuming you have some logic that works for one file, I'd just put that logic into a function and run it in a for loop.
You'd end up with something like this:
directory = r'c:/users/before_manipulation'
files = ['file_A.CSV', 'File_B.CSV', 'File_C.CSV']

for file in files:
    somefunction(directory + '/' + file)
If you need more info on functions I'd check this out: https://www.w3schools.com/python/python_functions.asp
Using pathlib:
import pandas as pd
from pathlib import Path

your_dir = r'\your_source_path'  # folder with the original csv files
new_dir = r'\your_target_path'   # folder to save the manipulated files into

for file in Path(your_dir).glob('*.csv'):
    df = pd.read_csv(file)
    # .. your logic
    df.to_csv(f'{new_dir}\\{file.name}', index=False)

Cannot open a csv file

I have a CSV file that I need to work on in my Jupyter notebook. Even though I am able to view the contents of the file using the code in the picture, when I try to convert the data into a dataframe I get a "No columns to parse from file" error.
I have no headers. My CSV file looks like this, and I have also saved it in UTF-8 format.
Try using pandas to read the CSV file:
import pandas as pd
df = pd.read_csv("BON3_NC_CUISINES.csv")
print(df)
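Since the question mentions the file has no headers, it may also help to pass header=None so pandas does not treat the first data row as column names. A small sketch with a made-up headerless file (the names= labels are invented):

```python
import io
import pandas as pd

# stand-in for a headerless csv like the one described
raw = io.StringIO("cuisine_a,1\ncuisine_b,2\n")

# header=None stops pandas from consuming the first row as column names;
# names= supplies labels of your choosing
df = pd.read_csv(raw, header=None, names=["cuisine", "count"])
print(df.shape)  # (2, 2)
```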