df = pd.read_csv("C:\Users\DJ_PRATIK28\Downloads\titanic.xlsx") - pandas

I want to load the CSV file into a DataFrame. I've tried many options, but none of them work:
df = pd.read_csv(r"C:\Users\DJ_PRATIK28\Downloads\titanic.xlsx","r", encoding="utf-8")

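The file is an Excel workbook (.xlsx), not a CSV, so read_csv can't parse it. Use pd.read_excel instead (reading .xlsx files requires the openpyxl package). Keep the r prefix on the path: without it, the \U in "C:\Users" starts a Unicode escape and Python raises a SyntaxError.
import pandas as pd
df = pd.read_excel(r"C:\Users\DJ_PRATIK28\Downloads\titanic.xlsx")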
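If you don't always know which kind of file you'll get, you can pick the reader from the extension. This is a small sketch with a hypothetical helper, load_table (not a pandas function). It also checks that a raw-string path and a path with doubled backslashes are the same string:

```python
from pathlib import Path

import pandas as pd


def load_table(path):
    """Hypothetical helper: choose a pandas reader from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in (".xlsx", ".xls"):
        # Excel workbooks need read_excel (.xlsx uses the openpyxl engine)
        return pd.read_excel(path)
    if suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")


# A raw string keeps the backslashes literal; doubling them is equivalent.
raw_path = r"C:\Users\DJ_PRATIK28\Downloads\titanic.xlsx"
escaped_path = "C:\\Users\\DJ_PRATIK28\\Downloads\\titanic.xlsx"
assert raw_path == escaped_path
assert Path(raw_path).suffix == ".xlsx"
```

With this helper, load_table(raw_path) would dispatch to pd.read_excel for the titanic file.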


How to consolidate Series data into a new DataFrame in pandas?

I've got a DataFrame like this:
[image: original data]
and I'd like to turn it into a new DataFrame like the one below:
[image: new data]
How can I write code for this transformation? It needs to consolidate the data from the first Series and create a new DataFrame.
Some imports:
import pandas as pd
import numpy as np
Here we create a DataFrame from the data you provided:
df = pd.DataFrame({
"a" : [
'A2C02158300', 'D REC/BAS16-03W 100V 250mA SOD323 0s SMD', 'D201,D206,D218,D219,D222,D302,D308,D408', 'D409,D501,D502,D505,D506,D507,D508',
'A2C02250500', 'T BIP/PUMD3,SOT363,SMD SOLDERING', 'T209,T501,T502'
]
})
df.head(10)
Output:
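Then we build a DataFrame from the first two rows of each record. This assumes every record spans exactly four rows (part number, description, two designator rows); if the number of rows varies, the stride slicing misaligns, as it would for the second part in the sample above, which has only three rows. Keep the original df intact, because it is sliced again for the third column:
s1 = df['a'].iloc[::4].reset_index(drop=True)   # part numbers
s2 = df['a'].iloc[1::4].reset_index(drop=True)  # descriptions
new_df = pd.DataFrame({
    'a': s1,
    'b': s2
})
After that, prepare and add the third column by splitting the third row of each record on commas, so each designator gets its own row (the fourth row of each record is not used here):
s3 = df['a'].iloc[2::4].reset_index(drop=True)
s3 = s3.str.split(',').explode()  # one row per designator; the record index repeats
s3.name = 'c'
new_df = new_df.join(s3)          # left join on the index repeats 'a' and 'b' per designator
new_df.reset_index(drop=True, inplace=True)
new_df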
Output:
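If records don't all have the same number of rows, as in the sample above, a grouping key is more robust than a fixed stride. This is a minimal sketch under two assumptions: every record starts with a part number matching the hypothetical pattern "A2C followed by digits", and every row after the description holds comma-separated designators for that part, so both designator rows of the first part are kept. The helper consolidate is a hypothetical name:

```python
import pandas as pd

# Sample data from the question: one column, records stacked vertically.
df = pd.DataFrame({"a": [
    "A2C02158300", "D REC/BAS16-03W 100V 250mA SOD323 0s SMD",
    "D201,D206,D218,D219,D222,D302,D308,D408", "D409,D501,D502,D505,D506,D507,D508",
    "A2C02250500", "T BIP/PUMD3,SOT363,SMD SOLDERING", "T209,T501,T502",
]})

# Assumption: a new record starts on every row that looks like a part number.
is_part = df["a"].str.fullmatch(r"A2C\d+")
record = is_part.cumsum()  # 1 for the first record, 2 for the second, ...


def consolidate(rows):
    """Turn one record (part, description, designator rows) into tidy rows."""
    values = rows.tolist()
    part, desc, designator_rows = values[0], values[1], values[2:]
    # Join all designator rows, then split into individual designators;
    # the scalar part and description are broadcast to every row.
    return pd.DataFrame({
        "a": part,
        "b": desc,
        "c": ",".join(designator_rows).split(","),
    })


result = pd.concat(
    [consolidate(rows) for _, rows in df["a"].groupby(record)],
    ignore_index=True,
)
print(result)
```

The description is taken as a whole row, so commas inside it (such as "PUMD3,SOT363") are not split.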

I want to replace $ with 1 in a pandas Series. How can I do that?

import pandas as pd
df=pd.Series(['12', '-$10', '$10,000'])
df.replace(to_replace='$', value=None ,method='bfill')
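You can use the .str.replace accessor directly on the Series. Pass regex=False so $ is matched literally rather than as the regex end-of-string anchor:
import pandas as pd
s = pd.Series(['12', '-$10', '$10,000'])
s = s.str.replace('$', '1', regex=False)
print(s)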
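The question's first attempt used Series.replace, which behaves differently from .str.replace. A short comparison on the same Series:

```python
import pandas as pd

s = pd.Series(['12', '-$10', '$10,000'])

# Series.replace compares whole values by default, so nothing equals '$' exactly
unchanged = s.replace('$', '1')

# With regex=True it substitutes inside each string; '$' must be escaped
# because on its own it is the regex end-of-string anchor
via_regex = s.replace(r'\$', '1', regex=True)

# The .str accessor with regex=False treats '$' as a literal character
via_str = s.str.replace('$', '1', regex=False)

print(via_regex.tolist())  # ['12', '-110', '110,000']
```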

Loading/analyzing a bunch of text files in Pandas/SQL

I have a few thousand text files and would like to analyze them for trends, word patterns, etc. I am familiar with both pandas and SQL but am not sure how to "load" all these files into a table or system so that I can run code on them. Any advice?
If all the text files have the same columns, you can use something like this:
import glob
import os
import pandas as pd

path = r'C:/location_rawdata_files'  # use the folder where you stored all the .txt files
all_files = glob.glob(os.path.join(path, "*.txt"))

lst = []
for filename in all_files:
    # assumes each file is delimited text with the same columns (comma by default; pass sep= otherwise)
    df = pd.read_csv(filename, index_col=None)
    lst.append(df)

df = pd.concat(lst, axis=0, ignore_index=True)
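The answer above assumes the files are delimited tables. For free-form text, which sounds closer to the question, here is a sketch that loads each file as one row (filename, text), counts words with pandas, and optionally stores the table in SQLite so it can be queried with SQL. The helper names, the UTF-8 encoding, the word regex and the table name "documents" are all assumptions:

```python
import glob
import os
import sqlite3

import pandas as pd


def load_text_files(folder, pattern="*.txt"):
    """Read every matching file into one row: (filename, full text)."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, encoding="utf-8") as f:  # assumes UTF-8 files
            rows.append({"filename": os.path.basename(path), "text": f.read()})
    return pd.DataFrame(rows, columns=["filename", "text"])


def word_counts(docs):
    """Count lower-cased words across all documents."""
    words = docs["text"].str.lower().str.findall(r"[a-z']+").explode()
    return words.value_counts()


def save_to_sqlite(docs, db_path, table="documents"):
    """Store the documents so they can be queried with SQL."""
    conn = sqlite3.connect(db_path)
    try:
        docs.to_sql(table, conn, index=False, if_exists="replace")
        conn.commit()
    finally:
        conn.close()


# Usage (folder path from the answer above):
# docs = load_text_files(r"C:/location_rawdata_files")
# print(word_counts(docs).head(20))
# save_to_sqlite(docs, "documents.db")
```

Keeping one row per file lets you do the tokenizing in pandas and still run SQL queries such as filtering by filename afterwards.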

Concatenate CSVs into XLSX files based on filename (pandas)

I have a bunch of CSVs named '<3-letter-string> YYYY.csv'. There are four different versions of <3-letter-string>, and I want to sort the CSVs into four XLSX files, each identified by that three-letter string.
My code:
import pandas as pd
import os

full_df = pd.DataFrame()
for filename in os.listdir('C:/Users/XXXXXX/ZZZZZZ'):
    if filename.endswith(".csv"):
        print(filename)
        df = pd.read_csv(filename, skiprows=1, names=['ID','Units Sold','Retail Dollars'])
        df['Year'] = filename[-8:-4]
        full_df = pd.concat([full_df, df])
        full_df.to_excel(filename[0:3] + '.xlsx', index=False)
This makes four different XLSX files, which is what I want, but each one contains a mixture of the different CSVs.
How do I tell pandas to group them into four separate XLSX files according to the filename? My initial thought is to include filename slicing in the penultimate line and create four different concatenated full_df DataFrames to write separately, but I'm not sure how.
import os
import pandas as pd

def Get_Yo_Fantasy_Hennnnnyyyyy(folder="path"):
    full_df = pd.DataFrame()
    for filename in os.listdir(folder):
        if filename.endswith(".csv"):
            print(filename)
            df = pd.read_csv(
                os.path.join(folder, filename),  # listdir returns bare names
                skiprows=1,
                names=["ID", "Units Sold", "Retail Dollars"])
            df["Year"] = filename[-8:-4]  # 'XXX YYYY.csv' -> 'YYYY'
            df["Type"] = filename[0:3]    # 'XXX YYYY.csv' -> 'XXX'
            full_df = pd.concat([full_df, df])
    # After the loop, write one workbook per three-letter prefix
    for i in full_df.Type.unique():
        # exact match; str.contains(i) would also match values that merely contain i
        full_df[full_df.Type == i].to_excel(f"{i}.xlsx", index=False)

Get_Yo_Fantasy_Hennnnnyyyyy()
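An alternative sketch that collects the frames per prefix and concatenates each group once, instead of growing one DataFrame inside the loop. group_csvs_by_prefix and write_workbooks are hypothetical names; the column names and filename slicing come from the question:

```python
import os
from collections import defaultdict

import pandas as pd


def group_csvs_by_prefix(folder):
    """Return {prefix: combined DataFrame} for files named '<XXX> YYYY.csv'."""
    frames = defaultdict(list)
    for filename in sorted(os.listdir(folder)):
        if not filename.endswith(".csv"):
            continue
        df = pd.read_csv(os.path.join(folder, filename), skiprows=1,
                         names=["ID", "Units Sold", "Retail Dollars"])
        df["Year"] = filename[-8:-4]         # 'XXX YYYY.csv' -> 'YYYY'
        frames[filename[:3]].append(df)      # bucket by 'XXX'
    # One concat per prefix instead of one per file
    return {prefix: pd.concat(dfs, ignore_index=True)
            for prefix, dfs in frames.items()}


def write_workbooks(groups, out_dir="."):
    """Write each group to '<prefix>.xlsx' (needs an Excel engine such as openpyxl)."""
    for prefix, df in groups.items():
        df.to_excel(os.path.join(out_dir, f"{prefix}.xlsx"), index=False)


# Usage: write_workbooks(group_csvs_by_prefix("path"))
```

Separating grouping from writing also makes it easy to inspect each combined DataFrame before anything is saved.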

How to specify column type (I need string) using the pandas.to_csv method in Python?

import pandas as pd
data = {'x':['011','012','013'],'y':['022','033','041']}
Df = pd.DataFrame(data = data, dtype = str)
Df.to_csv("path/to/save.csv")
The result I obtained looks like this:
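If you're viewing the file in Excel, note that the CSV written by to_csv already contains 011, 012, ...; Excel converts those values to numbers when it opens the file. To keep the leading zeros, it's easier to export directly to an .xlsx file, even without setting the DataFrame's dtype, because the strings are stored as text cells: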
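import pandas as pd
data = {'x':['011','012','013'],'y':['022','033','041']}
Df = pd.DataFrame(data = data)
# the context manager saves and closes the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('path/to/save.xlsx') as writer:
    Df.to_excel(writer, sheet_name="Sheet1")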
I also tried some other methods, like prepending an apostrophe or quoting all fields with ", but they had no effect.
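To check whether the leading zeros survive to_csv itself, here is a sketch that writes the CSV to a string and reads it back. Passing dtype=str to read_csv keeps the values as text, while the default parser turns them into integers:

```python
import io

import pandas as pd

data = {'x': ['011', '012', '013'], 'y': ['022', '033', '041']}
df = pd.DataFrame(data=data, dtype=str)

# With no path, to_csv returns the CSV text; the leading zeros are in it
csv_text = df.to_csv(index=False)
print(csv_text)

# Reading back with dtype=str keeps '011'; the default parser yields 11
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)
as_numbers = pd.read_csv(io.StringIO(csv_text))
```

So the zeros are lost only by whatever program parses the file afterwards, which is why the .xlsx route helps when the file is meant for Excel.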