I want to load a CSV file into a DataFrame. I've tried many options, but none of them work:
df = pd.read_csv(r"C:\Users\DJ_PRATIK28\Downloads\titanic.xlsx","r", encoding="utf-8")
The file is an Excel workbook (.xlsx), not a CSV, so use pd.read_excel:
df = pd.read_excel(r"C:\Users\DJ_PRATIK28\Downloads\titanic.xlsx")
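If the folder mixes CSV and Excel files, one option is to pick the reader from the file extension. This is only a minimal sketch: `load_table` is a hypothetical helper name, not a pandas function, and reading .xlsx files assumes an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def load_table(path):
    """Hypothetical helper: choose a pandas reader from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in (".xlsx", ".xls"):
        # Excel workbooks are not plain text, so read_csv cannot parse them
        return pd.read_excel(path)
    if suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

With this, `load_table(r"C:\Users\DJ_PRATIK28\Downloads\titanic.xlsx")` would go through pd.read_excel.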
I have a DataFrame like this:
original data
and I would like to get a new DataFrame like the one below:
new data
How can I write code for this transformation?
You need to consolidate the data from the first series and create a new DataFrame.
Some imports:
import pandas as pd
import numpy as np
Here we create a DataFrame from the data you provided:
df = pd.DataFrame({
"a" : [
'A2C02158300', 'D REC/BAS16-03W 100V 250mA SOD323 0s SMD', 'D201,D206,D218,D219,D222,D302,D308,D408', 'D409,D501,D502,D505,D506,D507,D508',
'A2C02250500', 'T BIP/PUMD3,SOT363,SMD SOLDERING', 'T209,T501,T502'
]
})
df.head(10)
Output:
Then we prepare a DataFrame with the first two columns:
s1 = df.iloc[::4, :].reset_index(drop=True)
s2 = df.iloc[1::4, :].reset_index(drop=True)
# Use a new name so the original df is still available for the third column
out = pd.DataFrame({
    'a': s1['a'],
    'b': s2['a']
})
After that, prepare and add the third column:
# Every 4th row starting at index 2 of the original df
# Note: with a stride of 4, the continuation row at index 3 is not picked up
s3 = df.iloc[2::4, :].reset_index(drop=True)
# Split the comma-separated values and put each one on its own row
s3 = s3['a'].str.split(',').explode()
s3.name = 'c'
df = out.join(s3)
df.reset_index(drop=True, inplace=True)
df
Output:
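The fixed stride of 4 only works when every record occupies the same number of rows. As an alternative sketch, the rows can be grouped by where each record starts. The rule that a record starts at a row consisting of "A2C" followed by digits is an assumption based on the sample data, and the names `raw`, `group_id` and `result` are illustrative only.

```python
import pandas as pd

raw = pd.Series([
    'A2C02158300', 'D REC/BAS16-03W 100V 250mA SOD323 0s SMD',
    'D201,D206,D218,D219,D222,D302,D308,D408',
    'D409,D501,D502,D505,D506,D507,D508',
    'A2C02250500', 'T BIP/PUMD3,SOT363,SMD SOLDERING', 'T209,T501,T502',
])

# Assumption: a row made only of "A2C" plus digits starts a new record
starts = raw.str.fullmatch(r'A2C\d+')
group_id = starts.cumsum()

records = []
for _, rows in raw.groupby(group_id):
    values = rows.tolist()
    part, description, ref_rows = values[0], values[1], values[2:]
    # Merge all remaining rows of the record, then split into single items
    refs = ','.join(ref_rows).split(',')
    records.append({'a': part, 'b': description, 'c': refs})

# One output row per item in 'c'; 'a' and 'b' are repeated alongside it
result = pd.DataFrame(records).explode('c', ignore_index=True)
print(result)
```

Unlike the stride-based version, this also keeps the second line of comma-separated values that belongs to the first record.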
import pandas as pd
df=pd.Series(['12', '-$10', '$10,000'])
df.replace(to_replace='$', value=None ,method='bfill')
You can try this:
df = pd.Series(['12', '-$10', '$10,000'])
df = df.to_frame()
df[0] = df[0].str.replace('$', '1', regex=False)
print(df)
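The question does not say what the target values should be. If the aim is to turn the currency strings into numbers, which is an assumption, a sketch could strip the literal "$" and the thousands separator and then convert. The names `s`, `cleaned` and `numbers` are illustrative.

```python
import pandas as pd

s = pd.Series(['12', '-$10', '$10,000'])

# regex=False makes '$' a literal character rather than a regex anchor
cleaned = s.str.replace('$', '', regex=False)

# Assumption: ',' is only ever a thousands separator in this data
cleaned = cleaned.str.replace(',', '', regex=False)

numbers = pd.to_numeric(cleaned)
print(numbers)
```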
I have a few thousand files of text and would like to analyze them for trends/word patterns, etc. I am familiar with both Pandas and SQL but am not sure how to "load" all these files into a table/system such that I can run code on them. Any advice?
If all the text files have the same columns, you can use something like this:
import pandas as pd
import glob
path = r'C:/location_rawdata_files'  # use the path where you stored all the .txt files
all_files = glob.glob(path + "/*.txt")
lst = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None)
    lst.append(df)
df = pd.concat(lst, axis=0, ignore_index=True)
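If the files hold free-form text rather than columns, read_csv may not be a good fit. One possible sketch keeps each file as a single row with its full text, so that word-level analysis can run on one column. `load_text_files` is a hypothetical helper, and the UTF-8 encoding with `errors="replace"` is an assumption about the files.

```python
from pathlib import Path

import pandas as pd


def load_text_files(folder):
    """Hypothetical helper: one row per .txt file, holding its full text."""
    rows = []
    for file_path in sorted(Path(folder).glob("*.txt")):
        # Assumption: files are UTF-8; undecodable bytes are replaced
        text = file_path.read_text(encoding="utf-8", errors="replace")
        rows.append({"filename": file_path.name, "text": text})
    return pd.DataFrame(rows, columns=["filename", "text"])
```

For example, after `docs = load_text_files(r'C:/location_rawdata_files')`, a word count per file could be added with `docs["text"].str.split().str.len()`.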
I have a bunch of CSVs with names '<3-letter-string> YYYY.csv'. There are four different versions of <3-letter-string>, and I want to sort the csvs into four xlsxs, each identified by that three letter string.
My code:
import pandas as pd
import os
full_df = pd.DataFrame()
for filename in os.listdir('C:/Users/XXXXXX/ZZZZZZ'):
    if filename.endswith(".csv"):
        print(filename)
        df = pd.read_csv(filename, skiprows=1, names=['ID','Units Sold','Retail Dollars'])
        df['Year'] = filename[-8:-4]
        full_df = pd.concat([full_df, df])
        full_df.to_excel(filename[0:3] + '.xlsx', index=False)
This makes four different xlsxs, which is what I want, but they're all a mixture of the different csvs.
How do I tell pandas to group them into four separate xlsxs according to the filename? My initial thought is to include filename slicing in the penultimate line and create four different concatenated full_df dataframes to write separately, but I'm not sure how.
import pandas as pd
import os
def Get_Yo_Fantasy_Hennnnnyyyyy():
    full_df = pd.DataFrame()
    for filename in os.listdir("path"):
        if filename.endswith(".csv"):
            print(filename)
            df = pd.read_csv(
                filename,
                skiprows=1,
                names=["ID", "Units Sold", "Retail Dollars"])
            df["Year"] = filename[-8:-4]
            df["Type"] = filename[0:3]
            full_df = pd.concat([full_df, df])
    for i in list(full_df.Type.unique()):
        full_df[full_df.Type.str.contains(i)].to_excel(
            "{}".format(i) + ".xlsx", index=False)
Get_Yo_Fantasy_Hennnnnyyyyy()
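A variation on the same idea is to collect the frames per prefix in a dict and concatenate each group once at the end. This is a sketch only: `collect_by_prefix` and `write_groups` are hypothetical helper names, the prefix is taken from the first three characters of the file name as in the question, and writing .xlsx assumes an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def collect_by_prefix(folder):
    """Hypothetical helper: map each 3-letter prefix to one combined DataFrame."""
    frames = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(
            csv_path,  # full path, so the current working directory does not matter
            skiprows=1,
            names=["ID", "Units Sold", "Retail Dollars"],
        )
        df["Year"] = csv_path.stem[-4:]  # the YYYY part of '<prefix> YYYY.csv'
        frames.setdefault(csv_path.stem[:3], []).append(df)
    return {
        prefix: pd.concat(parts, ignore_index=True)
        for prefix, parts in frames.items()
    }


def write_groups(groups, out_dir="."):
    """Write one workbook per prefix (needs an Excel engine such as openpyxl)."""
    for prefix, df in groups.items():
        df.to_excel(Path(out_dir) / f"{prefix}.xlsx", index=False)
```

Calling `write_groups(collect_by_prefix('C:/Users/XXXXXX/ZZZZZZ'))` would then produce one file per three-letter string, each containing only the CSVs with that prefix.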
import pandas as pd
data = {'x':['011','012','013'],'y':['022','033','041']}
Df = pd.DataFrame(data = data,dtype = str)
Df.to_csv("path/to/save.csv")
The result I obtained looks like this:
To get that result, it is easier to export directly to an xlsx file, even without setting the dtype of the DataFrame.
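import pandas as pd
data = {'x':['011','012','013'],'y':['022','033','041']}
Df = pd.DataFrame(data = data)
# The context manager saves and closes the workbook on exit
# (writer.save() is no longer available in recent pandas versions)
with pd.ExcelWriter('path/to/save.xlsx') as writer:
    Df.to_excel(writer, sheet_name="Sheet1")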
I've also tried some other methods, such as prepending an apostrophe or quoting all fields with ", but they had no effect.
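It may help to check where the zeros are actually lost. In the sketch below the CSV text itself still contains them, and it is the reader's type inference that drops them. This is shown for pandas reading the data back; that the same happens in the program used to view the file is an assumption. The buffer and variable names are illustrative.

```python
import io

import pandas as pd

Df = pd.DataFrame({'x': ['011', '012', '013'], 'y': ['022', '033', '041']})

# Write the CSV to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
Df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()  # the leading zeros are present in the raw text

# Default type inference parses the columns as integers and drops the zeros
inferred = pd.read_csv(io.StringIO(csv_text))

# dtype=str keeps every field as text, so the zeros survive the round trip
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(csv_text)
print(inferred)
print(as_text)
```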