I think my code mostly works, but it does not keep every answer in the DataFrame R.
When I print R, only the last answer appears.
What should I do to keep every answer? I want to add each answer as a new column.
import numpy as np
import pandas as pd

DATA = pd.read_excel(r'C:\gskim\P4DS/Monthly Stock adjClose2.xlsx')
DATA = DATA.set_index("Date")
DATA1 = np.log(DATA / DATA.shift(1))        # log returns
DATA2 = DATA1.drop(DATA1.index[0]) * 100    # drop the first NaN row, scale to percent
F = pd.DataFrame(index=DATA2.index)

for i in range(0, 276):
    Q = DATA2.iloc[i].dropna()
    W = sorted(abs(Q), reverse=False)
    W_qcut = pd.qcut(W, 5, labels=['A', 'B', 'C', 'D', 'E'])
    F = Q.groupby(W_qcut).sum()
    R = pd.DataFrame(F)    # R is overwritten on every pass, so only the last row survives

print(R)
The first table is the current result; I want every blank in the second table filled in as the result.
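One possible fix, sketched under the assumption that DATA2 is built as above: store each row's grouped sums in a dict keyed by the date and concatenate once after the loop, so every answer becomes its own column. Note that pd.qcut is applied to abs(Q) directly here, which keeps the quintile labels aligned with Q's index (sorting first, as above, loses that alignment).

results = {}
for i in range(len(DATA2)):
    Q = DATA2.iloc[i].dropna()
    # quintile labels stay aligned with Q's index
    W_qcut = pd.qcut(Q.abs(), 5, labels=['A', 'B', 'C', 'D', 'E'])
    results[DATA2.index[i]] = Q.groupby(W_qcut).sum()

R = pd.concat(results, axis=1)    # one column per date, rows A through E
print(R)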
This must be very easy, but I cannot get a plot of the last/any row of a dataframe.
A = data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
barplot(A[nrow(A),1:3])
I get the error message:
Error in barplot.default(A[nrow(A), 1:3]) :
'height' must be a vector or a matrix
A solution using ggplot would be very welcome!
I imported the ggplot2 library and the data set you gave, used tail() to get only the last row, then had to melt() the data to get it into the right format before plotting in ggplot2:
library(ggplot2)
library(reshape2)
A = data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
A_tail <- tail(A, 1)
tailmelt <- melt(A_tail)
ggplot(data = tailmelt, aes( x = factor(variable), y = value, fill = variable ) ) +
geom_bar( stat = 'identity' )
I'm iterating through PDFs to obtain the text entered in their form fields. When I send the rows to a CSV file, it only exports the last row, and when I print the results from the DataFrame, all the row indexes are 0. I have tried various solutions from Stack Overflow, but I can't get anything to work: what should be 0, 1, 2, 3, etc. comes in as 0, 0, 0, 0, etc.
Here is what I get when printing the results; only the last row is exported to the CSV file:
0
0 1938282828
0
0 1938282828
0
0 22222222
import glob
import pandas as pd
from PyPDF2 import PdfFileReader

infile = glob.glob('./*.pdf')
for i in infile:
    if i.endswith('.pdf'):
        pdreader = PdfFileReader(open(i, 'rb'))
        diction = pdreader.getFormTextFields()
        myfieldvalue2 = str(diction['ID'])
        df = pd.DataFrame([myfieldvalue2])
        print(df)
Thank you for any help!
You are replacing the same dataframe each time:
infile = glob.glob('./*.pdf')
for i in infile:
    if i.endswith('.pdf'):
        pdreader = PdfFileReader(open(i, 'rb'))
        diction = pdreader.getFormTextFields()
        myfieldvalue2 = str(diction['ID'])
        df = pd.DataFrame([myfieldvalue2])  # this creates a new df each time
        print(df)
Corrected code:
infile = glob.glob('./*.pdf')
df = pd.DataFrame()
for i in infile:
    if i.endswith('.pdf'):
        pdreader = PdfFileReader(open(i, 'rb'))
        diction = pdreader.getFormTextFields()
        myfieldvalue2 = str(diction['ID'])
        df = df.append([myfieldvalue2], ignore_index=True)  # accumulate; ignore_index gives 0, 1, 2, ...
print(df)
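DataFrame.append is deprecated in recent pandas releases, so as a side sketch (assuming the same PyPDF2 1.x API as above), you can collect the values in a plain list and build the frame once after the loop; this also yields the 0, 1, 2, ... index automatically:

import glob
import pandas as pd
from PyPDF2 import PdfFileReader

rows = []
for i in glob.glob('./*.pdf'):
    pdreader = PdfFileReader(open(i, 'rb'))
    rows.append(str(pdreader.getFormTextFields()['ID']))

df = pd.DataFrame(rows, columns=['ID'])   # index runs 0, 1, 2, ...
df.to_csv('form_ids.csv', index=False)    # 'form_ids.csv' is a placeholder name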
The code below gets me into a loop and prints a table of data from a website.
How can I get the data from this output table into a new, organized table?
I need to get the 'Códigos de Negociação' (ticker codes) and the 'CNPJ' (company registration number) into this new table.
This is a sample of the scraped table
0 1
0 Nome de Pregão FII ALIANZA
1 Códigos de Negociação ALZR11
2 CNPJ 28.737.771/0001-85
3 Classificação Setorial Financeiro e Outros/Fundos/Fundos Imobiliários
4 Site www.btgpactual.com
This is the code:
import pandas as pd

list = pd.read_html('http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListados.aspx?tipoFundo=imobiliario&Idioma=pt-br')[0]
Tickers = list['Código'].tolist()
removechars = str.maketrans('', '', './-')
for i in Tickers:
    try:
        df = pd.read_html("http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListadosDetalhe.aspx?Sigla=" + i + "&tipoFundo=Imobiliario&aba=abaPrincipal&idioma=pt-br")[0]
        print(df)
    except:
        print('y')
And I would like to apply removechars to the CNPJ, to clear it of dots, slashes, and dashes.
Expected result:
Código CNPJ
0 ALZR11 28737771000185
This code worked for me:
import pandas as pd

list = pd.read_html('http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListados.aspx?tipoFundo=imobiliario&Idioma=pt-br')[0]
Tickers = list['Código'].tolist()
print(list)
CNPJ = []
Codigo = []
removechars = str.maketrans('', '', './-')
for i in Tickers:
    try:
        df = pd.read_html("http://bvmf.bmfbovespa.com.br/Fundos-Listados/FundosListadosDetalhe.aspx?Sigla=" + i + "&tipoFundo=Imobiliario&aba=abaPrincipal&idioma=pt-br")[0]
        print(df)
        Codigo.append(df.at[1, 1])    # 'Códigos de Negociação' row
        CNPJ.append(df.at[2, 1])      # 'CNPJ' row
        df2 = pd.DataFrame({'Codigo': Codigo, 'CNPJ': CNPJ})
        CNPJ_No_S_CHAR = [s.translate(removechars) for s in CNPJ]
        df2['CNPJ'] = pd.Series(CNPJ_No_S_CHAR)
        print(df2)
    except:
        print('y')
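A hypothetical alternative for the character stripping, if you prefer to stay inside pandas: Series.str.replace with a regex removes the same dots, slashes, and dashes without a translation table.

df2['CNPJ'] = df2['CNPJ'].str.replace(r'[./-]', '', regex=True)   # drop . / - characters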
I'm playing around with Pandas to see if I can do some stock calculations better/faster than with other tools. If I have a single stock, it's easy to create a daily calculation:
df['mystuff'] = df['Close']+1
If I download more than one ticker, it gets complicated:
df = df.stack()
df['mystuff'] = df['Close']+1
df = df.unstack()
If I want to use the previous day's "Close", it gets too complex for me. I thought I might go back to fetching a single ticker, do any operation with iloc[i-1] or something similar (I haven't figured it out yet), and then merge the dataframes.
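For the single-ticker case, a minimal sketch of that previous-day operation (assuming df holds one ticker, as above): shift(1) aligns yesterday's value with today's row, so no iloc arithmetic is needed.

df['prev_close'] = df['Close'].shift(1)             # yesterday's close on today's row
df['overnight'] = df['Close'] - df['prev_close']    # change vs. the previous day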
How do I merge two dataframes of single tickers so that they have a MultiIndex?
So that:
f1 = web.DataReader('AAPL', 'yahoo', start, end)
f2 = web.DataReader('GOOG', 'yahoo', start, end)
is like
f = web.DataReader(['AAPL','GOOG'], 'yahoo', start, end)
Edit:
This is the nearest thing to f I can create. It's not exactly the same so I'm not sure I can use it instead of f.
f_f = pd.concat({'AAPL': f1, 'GOOG': f2}, axis=1)
Maybe I should experiment with operations working on a multiindex instead of splitting work on simpler dataframes.
Full Code:
import pandas_datareader.data as web
import pandas as pd
from datetime import datetime
start = datetime(2001, 9, 1)
end = datetime(2019, 8, 31)
a = web.DataReader('AAPL', 'yahoo', start, end)
g = web.DataReader('GOOG', 'yahoo', start, end)
# here are shift/diff calculations that I don't know how to do with a multiindex
a_g = web.DataReader(['AAPL','GOOG'], 'yahoo', start, end)
merged = pd.concat({'AAPL':a,'GOOG':g},axis=1)
a_g.to_csv('ag.csv')
merged.to_csv('merged.csv')
import code; code.interact(local=locals())
Side note: I don't know how to compare the two CSV files.
This is not exactly the same, but it returns a MultiIndex you can use as in the a_g case:
import pandas_datareader.data as web
import pandas as pd
from datetime import datetime

start = datetime(2019, 7, 1)
end = datetime(2019, 8, 31)

out = []
for tick in ["AAPL", "GOOG"]:
    d = web.DataReader(tick, 'yahoo', start, end)
    cols = [(col, tick) for col in d.columns]
    d.columns = pd.MultiIndex.from_tuples(cols, names=['Attributes', 'Symbols'])
    out.append(d)

df = pd.concat(out, axis=1)
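A quick usage sketch: once the columns are a MultiIndex, selecting on the first level returns one column per ticker, just as with the a_g frame (sort_index(axis=1) groups the column levels first):

df = df.sort_index(axis=1)      # group columns by attribute
close = df['Close']             # one column per ticker: AAPL, GOOG
returns = close.pct_change()    # daily returns per ticker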
Update
If you want to calculate and add a new column when you have MultiIndex columns, you can do the following:
import pandas_datareader.data as web
import pandas as pd
from datetime import datetime

start = datetime(2019, 7, 1)
end = datetime(2019, 8, 31)

ticks = ['AAPL', 'GOOG']
df = web.DataReader(ticks, 'yahoo', start, end)

names = list(df.columns.names)
df1 = df["Close"].shift()    # previous day's close, one column per ticker
cols = [("New", col) for col in df1.columns]
df1.columns = pd.MultiIndex.from_tuples(cols, names=names)
df = df.join(df1)
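For example (my own follow-up, not part of the original answer), the joined "New" columns line up with "Close" on the Symbols level, so the overnight move per ticker is plain column arithmetic:

overnight = df["Close"] - df["New"]   # today's close minus yesterday's, per ticker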
I've been trying to vectorize the following with no luck:
Consider two data frames. One is a list of dates:
cols = ['col1', 'col2']
index = pd.date_range('1/1/15','8/31/18')
df = pd.DataFrame(columns = cols )
What I'm doing currently is looping through df and, for each date, counting all the rows in my main (large) dataframe df_main that are less than or equal to the date in question:
for x in range(len(index)):
    temp_arr = []
    active = len(df_main[df_main.n_date <= index[x]])
    temp_arr = [index[x], active]
    df = df.append(pd.Series(temp_arr, index=cols), ignore_index=True)
Is there a way to vectorize the above?
What about something like the following:
#initializing
mycols = ['col1', 'col2']
myindex = pd.date_range('1/1/15','8/31/18')
mydf = pd.DataFrame(columns = mycols )
#create df_main (that has each of myindex's dates minus 10 days)
df_main = pd.DataFrame(data=myindex-pd.Timedelta(days=10), columns=['n_date'])
#wrap a dataframe around a list comprehension
mydf = pd.DataFrame([[x, len(df_main[df_main['n_date'] <= x])] for x in myindex], columns=mycols)
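If df_main is large, a more fully vectorized sketch (my own variant, not from the original answer): sort the dates once, then numpy's searchsorted counts the rows <= each index date in a single call.

import numpy as np

sorted_dates = np.sort(df_main['n_date'].to_numpy())
# side='right' counts entries <= x for each date in myindex
counts = np.searchsorted(sorted_dates, myindex.to_numpy(), side='right')
mydf = pd.DataFrame({'col1': myindex, 'col2': counts})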