how to save sql query result to csv in pandas - sql

I wrote a query in the data frame and want to save it in CSV file
I tried this code and didn't work
q1 = "SELECT * FROM df1 join df2 on df1.Date = df2.Date"
df = pd.read_sql(q1,None)
df.to_csv('data.csv',index=False)

You can try following code:
import pandas as pd
df1 = pd.read_csv("Insert file path")
df2 = pd.read_csv("Insert file path")
df1['Date'] = pd.to_datetime(df1['Date'] ,errors = 'coerce',format = '%Y-%m-%d')
df2['Date'] = pd.to_datetime(df2['Date'] ,errors = 'coerce',format = '%Y-%m-%d')
df = df1.merge(df2,how='inner', on ='Date')
df.to_csv('data.csv',index=False)
This should solve your problem.

Related

Creating new dataframe by search result in df

I am reading a txt file for search variable.
I am using this variable to find it in a dataframe.
for lines in lines_list:
sn = lines
if sn in df[df['SERIAL'].str.contains(sn)]:
condition = df[df['SERIAL'].str.contains(sn)]
df_new = pd.DataFrame(condition)
df_new.to_csv('try.csv',mode='a', sep=',', index=False)
When I check the try.csv file, it has much more lines the txt file has.
The df has a lots of lines, more than the txt file.
I want save the whole line from search result into a dataframe or file
I tried to append the search result to a new dataframe or csv.
first create line list
f = open("text.txt", "r")
l = list(map(lambda x: x.strip(), f.readlines()))
write this apply func has comparing values and filtering
def apply_func(x):
if str(x) in l:
return x
return np.nan
and get output
df["Serial"] = df["Serial"].apply(apply_func)
df.dropna(inplace=True)
df.to_csv("new_df.csv", mode="a", index=False)
or try filter method
f = open("text.txt", "r")
l = list(map(lambda x: x.strip(), f.readlines()))
df = df.set_index("Serial").filter(items=l, axis=0).reset_index()
df.to_csv("new_df.csv", mode="a", index=False)

How can I leave every answer from 'for'

I think my code works well.
But the problem is that my code does not leave every answer on DataFrame R.
When I print R, only the last answer appeared.
What should I do to display every answer?
I want to add answer on the next column.
import numpy as np
import pandas as pd
DATA = pd.DataFrame()
DATA = pd.read_excel('C:\gskim\P4DS/Monthly Stock adjClose2.xlsx')
DATA = DATA.set_index("Date")
DATA1 = np.log(DATA/DATA.shift(1))
DATA2 = DATA1.drop(DATA1.index[0])*100
F = pd.DataFrame(index = DATA2.index)
for i in range (0, 276):
Q = DATA2.iloc[i].dropna()
W = sorted(abs(Q), reverse = False)
W_qcut = pd.qcut(W, 5, labels = ['A', 'B', 'C', 'D', 'E'])
F = Q.groupby(W_qcut).sum()
R = pd.DataFrame(F)
print(R)
the first table is the current result, I want to fill every blank tables on the second table as a result:

Pandas how to append data from df into df2

The code bellow gets me in a loop and prints a table of data from a website.
How can i get the data from this 'output' table into a new orgaized table?
I need to get the 'Códigos de Negociação' and the 'CNPJ' into this new table.
This is a sample of the scraped table
0 1
0 Nome de Pregão FII ALIANZA
1 Códigos de Negociação ALZR11
2 CNPJ 28.737.771/0001-85
3 Classificação Setorial Financeiro e Outros/Fundos/Fundos Imobiliários
4 Site www.btgpactual.com
This is the code
import pandas as pd
list = pd.read_html('http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListados.aspx?tipoFundo=imobiliario&Idioma=pt-br')[0]
Tickers = list['Código'].tolist()
removechars = str.maketrans('', '', './-')
for i in Tickers:
try:
df = pd.read_html("http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListadosDetalhe.aspx?Sigla="+i+"&tipoFundo=Imobiliario&aba=abaPrincipal&idioma=pt-br")[0]
print(df)
except:
print('y')
And i would like to apply the removechars in the CNPJ, to clear it from dots, bars and dashes.
Expected result:
Código CNPJ
0 ALZR11 28737771000185
This code worked for me
import pandas as pd
list = pd.read_html('http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListados.aspx?tipoFundo=imobiliario&Idioma=pt-br')[0]
Tickers = list['Código'].tolist()
print(list)
CNPJ = []
Codigo = []
removechars = str.maketrans('', '', './-')
for i in Tickers:
try:
df = pd.read_html("http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListadosDetalhe.aspx?Sigla="+i+"&tipoFundo=Imobiliario&aba=abaPrincipal&idioma=pt-br")[0]
print(df)
Codigo.append(df.at[1, 1])
CNPJ.append(df.at[2,1])
df2 = pd.DataFrame({'Codigo':Codigo,'CNPJ':CNPJ})
CNPJ_No_S_CHAR = [s.translate(removechars) for s in CNPJ]
df2['CNPJ'] = pd.Series(CNPJ_No_S_CHAR)
print(df2)
except:
print('y')

pandas merge two dataframe to form a multiindex

I'm playing around with Pandas to see if I can do some stock calculation better/faster than with other tools. If I have a single stock it's easy to create daily calculation L
df['mystuff'] = df['Close']+1
If I download more than a ticker it gets complicated:
df = df.stack()
df['mystuff'] = df['Close']+1
df = df.unstack()
If I want to use prevous' day "Close" it gets too complex for me. I thought I might go back to fetch a single ticker, do any operation with iloc[i-1] or something similar (I haven't figured it yet) and then merge the dataframes.
How do I merget two dataframes of single tickers to have a multiindex?
So that:
f1 = web.DataReader('AAPL', 'yahoo', start, end)
f2 = web.DataReader('GOOG', 'yahoo', start, end)
is like
f = web.DataReader(['AAPL','GOOG'], 'yahoo', start, end)
Edit:
This is the nearest thing to f I can create. It's not exactly the same so I'm not sure I can use it instead of f.
f_f = pd.concat(['AAPL':f1,'GOOG':f2},axis=1)
Maybe I should experiment with operations working on a multiindex instead of splitting work on simpler dataframes.
Full Code:
import pandas_datareader.data as web
import pandas as pd
from datetime import datetime
start = datetime(2001, 9, 1)
end = datetime(2019, 8, 31)
a = web.DataReader('AAPL', 'yahoo', start, end)
g = web.DataReader('GOOG', 'yahoo', start, end)
# here are shift/diff calculations that I don't knokw how to do with a multiindex
a_g = web.DataReader(['AAPL','GOOG'], 'yahoo', start, end)
merged = pd.concat({'AAPL':a,'GOOG':g},axis=1)
a_g.to_csv('ag.csv')
merged.to_csv('merged.csv')
import code; code.interact(local=locals())
side note: I don't know how to compare the two csv
This is not exactly the same but it returns Multiindex you can use as in the a_g case
import pandas_datareader.data as web
import pandas as pd
from datetime import datetime
start = datetime(2019, 7, 1)
end = datetime(2019, 8, 31)
out = []
for tick in ["AAPL", "GOOG"]:
d = web.DataReader(tick, 'yahoo', start, end)
cols = [(col, tick) for col in d.columns]
d.columns = pd.MultiIndex\
.from_tuples(cols,
names=['Attributes', 'Symbols'] )
out.append(d)
df = pd.concat(out, axis=1)
Update
In case you want to calculate and add a new column in case you have multiindex columns you can follow this
import pandas_datareader.data as web
import pandas as pd
from datetime import datetime
start = datetime(2019, 7, 1)
end = datetime(2019, 8, 31)
ticks = ['AAPL','GOOG']
df = web.DataReader(ticks, 'yahoo', start, end)
names = list(df.columns.names)
df1 = df["Close"].shift()
cols = [("New", col) for col in df1.columns]
df1.columns = pd.MultiIndex.from_tuples(cols,
names=names)
df = df.join(df1)

vectorization of loop in pandas

I've been trying to vectorize the following with no such luck:
Consider two data frames. One is a list of dates:
cols = ['col1', 'col2']
index = pd.date_range('1/1/15','8/31/18')
df = pd.DataFrame(columns = cols )
What i'm doing currently is looping thru df and getting the counts of all rows that are less than or equal to the date in question with my main (large) dataframe df_main
for x in range(len(index)):
temp_arr = []
active = len(df_main[(df_main.n_date <= index[x])]
temp_arr = [index[x],active]
df= df.append(pd.Series(temp_arr,index=cols) ,ignore_index=True)
Is there a way to vectorize the above?
What about something like the following
#initializing
mycols = ['col1', 'col2']
myindex = pd.date_range('1/1/15','8/31/18')
mydf = pd.DataFrame(columns = mycols )
#create df_main (that has each of myindex's dates minus 10 days)
df_main = pd.DataFrame(data=myindex-pd.Timedelta(days=10), columns=['n_date'])
#wrap a dataframe around a list comprehension
mydf = pd.DataFrame([[x, len(df_main[df_main['n_date'] <= x])] for x in myindex])