I have an Excel file with special characters. I want to write the DataFrame without the double quotes, but I receive an error. Help is very much appreciated.
The goal is to generate operation commands from the Excel file in text format:
from pandas import DataFrame
import pandas as pd
filename = r'In_file.xlsx'
df = pd.read_excel(filename, header=None)
df1 = df[0] + ' ' + df[1] + ' ' + df[2]
df1.to_csv('out_file3.txt', index=False, header=False, quoting=csv.QUOTE_NONE)
Error:
NameError Traceback (most recent call last)
<ipython-input-9-70ff5701bfb8> in <module>
9 df1 = df[0] + ' ' + df[1] + ' ' + df[2]
10
---> 11 df1.to_csv('out_file3.txt', index=False, header=False, quoting=csv.QUOTE_NONE)
NameError: name 'csv' is not defined
You're missing the import of the csv module:
import csv # <- HERE!
from pandas import DataFrame
import pandas as pd
filename = r'In_file.xlsx'
df = pd.read_excel(filename, header=None)
df1 = df[0] + ' ' + df[1] + ' ' + df[2]
df1.to_csv('out_file3.txt', index=False, header=False, quoting=csv.QUOTE_NONE)
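Note that with quoting=csv.QUOTE_NONE, to_csv raises an error if a value contains the separator unless you also pass escapechar. A minimal sketch with made-up sample data (the real Excel file isn't shown):

```python
import csv
import io
import pandas as pd

# Hypothetical stand-in for the three Excel columns
df = pd.DataFrame({0: ['set'], 1: ['port,1'], 2: ['enable']})
df1 = df[0] + ' ' + df[1] + ' ' + df[2]

# QUOTE_NONE writes values verbatim; escapechar escapes any
# embedded separator so the write does not fail
buf = io.StringIO()
df1.to_csv(buf, index=False, header=False,
           quoting=csv.QUOTE_NONE, escapechar='\\')
print(buf.getvalue())
```

Without escapechar, the same call would raise a "need to escape" error for the embedded comma.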
Related
I am reading a txt file for search variables.
I am using each variable to find matching rows in a dataframe.
for lines in lines_list:
    sn = lines
    if sn in df[df['SERIAL'].str.contains(sn)]:
        condition = df[df['SERIAL'].str.contains(sn)]
        df_new = pd.DataFrame(condition)
        df_new.to_csv('try.csv', mode='a', sep=',', index=False)
When I check the try.csv file, it has many more lines than the txt file.
The df has a lot of lines, more than the txt file.
I want to save the whole line from the search result into a dataframe or file.
I tried to append the search results to a new dataframe or csv.
First, create the line list:
f = open("text.txt", "r")
l = list(map(lambda x: x.strip(), f.readlines()))
Then write an apply function that compares values and filters:
import numpy as np

def apply_func(x):
    if str(x) in l:
        return x
    return np.nan
And get the output:
df["Serial"] = df["Serial"].apply(apply_func)
df.dropna(inplace=True)
df.to_csv("new_df.csv", mode="a", index=False)
Or try the filter method:
f = open("text.txt", "r")
l = list(map(lambda x: x.strip(), f.readlines()))
df = df.set_index("Serial").filter(items=l, axis=0).reset_index()
df.to_csv("new_df.csv", mode="a", index=False)
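Equivalently, a boolean mask with isin keeps the whole rows whose serial appears in the list. A sketch with made-up data, since the real files aren't shown:

```python
import pandas as pd

# Hypothetical stand-ins for the text file contents and the DataFrame
lines_list = ["SN001", "SN003"]
df = pd.DataFrame({"SERIAL": ["SN001", "SN002", "SN003", "SN004"],
                   "VALUE": [10, 20, 30, 40]})

# Keep only rows whose SERIAL is one of the searched values
df_new = df[df["SERIAL"].isin(lines_list)]
print(df_new)
```

This avoids the per-line loop entirely, so the output cannot contain more rows than there are matches.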
I have a pandas column which has special characters such as {{, }}, [, ], and . (the commas here are separators).
I tried using the following to replace the special characters with an underscore ('_'), but it is not working. Can you please let me know what I am doing wrong? Thanks.
import pandas as pd
data = [["facebook_{{campaign.name}}"], ["google_[email]"]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Marketing'])
print(df)
df['Marketing'].str.replace(r"\(|\)|\{|\}|\[|\]|\|", "_")
print(df)
Output:
Marketing
0 facebook_{{campaign.name}}
1 google_[email]
Marketing
0 facebook_{{campaign.name}}
1 google_[email]
From this DataFrame :
>>> import pandas as pd
>>> data = [["facebook_{{campaign.name}}"], ["google_[email]"]]
>>> df = pd.DataFrame(data, columns = ['Marketing'])
>>> df
Marketing
0 facebook_{{campaign.name}}
1 google_[email]
We can use replace as you suggested with a regex, using | which is the OR operator, except for the final \| which matches the literal symbol |.
Then we deduplicate the doubled _ and remove the final remaining _ to get the expected result:
>>> df['Marketing'] = df['Marketing'].str.replace(r"\(+|\)+|\{+|\}+|\[+|\]+|\|+|\_+|\.+", "_", regex=True).str.replace(r"_+", "_", regex=True).str.replace(r"_$", "", regex=True)
>>> df
0 facebook_campaign_name
1 google_email
Name: Marketing, dtype: object
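The same result can also be had with a single character class instead of a long alternation. A sketch on the DataFrame from the question:

```python
import pandas as pd

df = pd.DataFrame([["facebook_{{campaign.name}}"], ["google_[email]"]],
                  columns=['Marketing'])

# One character class matches any run of the special characters
# (including '_' itself) and collapses it to a single '_';
# a second pass strips a trailing '_'
df['Marketing'] = (df['Marketing']
                   .str.replace(r"[(){}\[\].|_]+", "_", regex=True)
                   .str.replace(r"_$", "", regex=True))
print(df['Marketing'].tolist())
```

Because the class already includes '_', runs like "_{{" collapse in one step, so no separate deduplication pass over "__" is needed.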
I wrote an SQL-style query against the data frames and want to save the result in a CSV file.
I tried this code and it didn't work:
q1 = "SELECT * FROM df1 join df2 on df1.Date = df2.Date"
df = pd.read_sql(q1,None)
df.to_csv('data.csv',index=False)
pd.read_sql needs an actual database connection, so passing None fails; you can join the two DataFrames directly with merge instead. You can try the following code:
import pandas as pd
df1 = pd.read_csv("Insert file path")
df2 = pd.read_csv("Insert file path")
df1['Date'] = pd.to_datetime(df1['Date'], errors='coerce', format='%Y-%m-%d')
df2['Date'] = pd.to_datetime(df2['Date'], errors='coerce', format='%Y-%m-%d')
df = df1.merge(df2,how='inner', on ='Date')
df.to_csv('data.csv',index=False)
This should solve your problem.
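With toy data standing in for the two CSV files (assumed here, since the real files aren't shown), the merge-based join behaves like the SQL inner join on Date:

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files
df1 = pd.DataFrame({'Date': ['2021-01-01', '2021-01-02'], 'A': [1, 2]})
df2 = pd.DataFrame({'Date': ['2021-01-02', '2021-01-03'], 'B': [3, 4]})

df1['Date'] = pd.to_datetime(df1['Date'], errors='coerce', format='%Y-%m-%d')
df2['Date'] = pd.to_datetime(df2['Date'], errors='coerce', format='%Y-%m-%d')

# Inner join keeps only the dates present in both frames
df = df1.merge(df2, how='inner', on='Date')
print(df)
```

Only 2021-01-02 appears in both frames, so the result has a single row carrying both A and B.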
A column in the dataframe is in DD/MM/YYYY format.
I want to slice it and rearrange it to MM/DD/YYYY (for calculation).
I have tried:
import pandas as pd
from io import StringIO
csvfile = StringIO("""
DD/MM/YYYY
01/05/2020
21/02/2021
19/06/2021
05/06/2021
11/06/2021
10/05/2021
""")
df = pd.read_csv(csvfile, sep = ',', engine='python')
df['DD/MM/YYYY'] = df['DD/MM/YYYY'].astype(str)
df['MM/DD/YYYY'] = df['DD/MM/YYYY'][3:5] + '/' + df['DD/MM/YYYY'][:2] + '/' + df['DD/MM/YYYY'][-4:]
# df['MM/DD/YYYY'] = pd.to_datetime(df['DD/MM/YYYY'][3:5] + '/' + df['DD/MM/YYYY'][:2] + '/' + df['DD/MM/YYYY'][-4:])
print (df)
But it doesn't work. What would be the right way to write it? Thank you!
Use .str:
df['MM/DD/YYYY'] = df['DD/MM/YYYY'].str[3:5] + '/' + df['DD/MM/YYYY'].str[:2] + '/' + df['DD/MM/YYYY'].str[-4:]
If possible, you can parse the datetimes with the original format specified in format='%d/%m/%Y' and then add Series.dt.strftime:
df['MM/DD/YYYY'] = pd.to_datetime(df['DD/MM/YYYY'], format='%d/%m/%Y').dt.strftime('%m/%d/%Y')
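Both approaches on a small sample (toy data assumed):

```python
import pandas as pd

df = pd.DataFrame({'DD/MM/YYYY': ['01/05/2020', '21/02/2021']})

# String slicing via the .str accessor (operates element-wise,
# unlike plain [] which slices the Series by row position)
sliced = (df['DD/MM/YYYY'].str[3:5] + '/'
          + df['DD/MM/YYYY'].str[:2] + '/'
          + df['DD/MM/YYYY'].str[-4:])

# Parsing and reformatting via datetime
parsed = (pd.to_datetime(df['DD/MM/YYYY'], format='%d/%m/%Y')
          .dt.strftime('%m/%d/%Y'))

print(sliced.tolist())
print(parsed.tolist())
```

The datetime route is safer because it validates the dates instead of blindly swapping substrings.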
For some reason, the following code only returns a dataframe of 10 rows instead of 20 (there are millions of rows in the SQL view).
When I viewed the output from print(data2), it showed the first 10 rows as a DataFrame, but the next DataFrame was empty.
import cx_Oracle as cx
import pandas as pd
conn = cx.Connection("username/pwd#server")
data = pd.DataFrame([])
SQL1 = '''SELECT * FROM TABLE_MV where rownum between '''
for i in range(1, 20, 10):
    lower = i
    upper = i + 9
    SQL3 = SQL1 + str(lower) + ' and ' + str(upper)
    data2 = pd.read_sql(SQL3, conn)
    print(data2)
    data = data.append(data2)
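For reference, Oracle assigns ROWNUM to rows as they pass the WHERE clause, so a predicate like rownum between 11 and 20 can never match: no row is ever given rownum 1 through 10 first, which is why the second batch comes back empty. The usual workaround wraps the query and filters on an aliased rownum; a sketch that only builds the SQL string (table name taken from the question, connection details assumed):

```python
def page_query(lower, upper):
    """Build a paginated Oracle query using the nested-ROWNUM idiom.

    The inner query caps rows at `upper` and aliases rownum as rn;
    the outer query can then safely filter rn >= lower.
    """
    return (
        "SELECT * FROM ("
        "SELECT t.*, rownum rn FROM TABLE_MV t WHERE rownum <= {u}"
        ") WHERE rn >= {l}"
    ).format(u=upper, l=lower)

print(page_query(11, 20))
```

Feeding this shape of query into pd.read_sql inside the loop should return each 10-row page rather than an empty frame after the first batch.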