Whenever I run this code I get:
The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
What should I do to make the code run with concat?
final_dataframe = pd.DataFrame(columns = my_columns)
for symbol in stocks['Ticker']:
    api_url = f'https://sandbox.iexapis.com/stable/stock/{symbol}/quote?token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(api_url).json()
    final_dataframe = final_dataframe.append(
        pd.Series([symbol,
                   data['latestPrice'],
                   data['marketCap'],
                   'N/A'],
                  index = my_columns),
        ignore_index = True)
See this release note, or, from another post:
"Append is the specific case (axis=0, join='outer') of concat" (link)
The changes in your code should be (I pulled the pd.Series out into a variable just for presentation; note that the Series has to be turned into a one-row DataFrame before concatenating, otherwise concat stacks its values into a single column):
s = pd.Series([symbol, data['latestPrice'], data['marketCap'], 'N/A'], index = my_columns)
final_dataframe = pd.concat([final_dataframe, s.to_frame().T], ignore_index = True)
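Since pd.concat copies the whole frame on every loop iteration, a common alternative (a sketch, assuming the same stocks and my_columns variables as in the question) is to collect the rows in a plain list and build the frame once at the end:

rows = []
for symbol in stocks['Ticker']:
    api_url = f'https://sandbox.iexapis.com/stable/stock/{symbol}/quote?token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(api_url).json()
    # one list per row, in the same order as my_columns
    rows.append([symbol, data['latestPrice'], data['marketCap'], 'N/A'])
final_dataframe = pd.DataFrame(rows, columns=my_columns)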
I'm currently trying to replace .append in my code, since it won't be supported in the future, and I'm having some trouble with the custom index I'm using.
I read the name of every .shp file in a directory and extract a date from it.
To link each file with an Excel file I have, I use the name I extract from the file's title.
df = pd.DataFrame(columns = ['date','fichier'])
for i in glob.glob("*.shp"):
    nom_parcelle = i.split("_")[2]
    if nom_parcelle not in df.index:
        # print(df.last_valid_index())
        date_recolte = i.split("_")[-1]
        new_row = pd.Series(data={'date': date_recolte.split(".")[0], 'fichier': i}, name=nom_parcelle)
        df = df.append(new_row, ignore_index=False)
This works exactly as I want.
Sadly, I can't find a way to replace it with .concat.
I looked for ways to keep the index with concat but didn't find anything that worked as I intended.
Did I miss anything?
Try the approach below with pandas.concat, based on your code:

import glob
import pandas as pd

dico_dfs = {}
for i in glob.glob("*.shp"):
    nom_parcelle = i.split("_")[2]
    # key the dict by nom_parcelle so each parcel is only added once
    if nom_parcelle not in dico_dfs:
        # print(df.last_valid_index())
        date_recolte = i.split("_")[-1]
        new_row = pd.Series(data={'date': date_recolte.split(".")[0], 'fichier': i}, name=nom_parcelle)
        dico_dfs[nom_parcelle] = new_row.to_frame()

# concatenate the one-column frames side by side, transpose, then drop the
# dict-key level so the Series names (nom_parcelle) remain as the index
df = pd.concat(dico_dfs, ignore_index=False, axis=1).T.droplevel(0)
# Output:
print(df)

          date                 fichier
nom1  20220101  a_xx_nom1_20220101.shp
nom2  20220102  b_yy_nom2_20220102.shp
nom3  20220103  c_zz_nom3_20220103.shp
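For what it's worth, if all the rows come out of a single loop anyway, you can sidestep concat entirely; a minimal sketch under the same filename assumptions:

import glob
import pandas as pd

records = {}
for i in glob.glob("*.shp"):
    nom_parcelle = i.split("_")[2]
    if nom_parcelle not in records:
        date_recolte = i.split("_")[-1]
        records[nom_parcelle] = {'date': date_recolte.split(".")[0], 'fichier': i}

# orient='index' turns the dict keys (nom_parcelle) into the row index
df = pd.DataFrame.from_dict(records, orient='index')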
I'm trying, with no luck, to method-chain pd.to_datetime() through .assign().
This works:
tcap2 = tcap.\
    assign(person = tcap['text'].apply(lambda x: x.split(" ", 1)[0]),
           date_time = tcap['text'].str.extract(r'\(([^()]+)\)'),
           text = tcap['text'].str.split(': ').str[1])
tcap2['date_time'] = pd.to_datetime(tcap2['date_time'])
but I was hoping to have the whole chunk in the same chain, like this:

tcap2 = tcap.\
    assign(person = tcap['text'].apply(lambda x: x.split(" ", 1)[0]),
           date_time = tcap['text'].str.extract(r'\(([^()]+)\)'),
           text = tcap['text'].str.split(': ').str[1]).\
    assign(date_time = lambda df: pd.to_datetime(tcap['date_time']))
I would be grateful for any advice
Thank you Nipy, you are awesome; it was just a little change needed in my lambda function (facepalm): it has to read date_time from the frame it receives, i.e. the intermediate result of the first assign, not from the original tcap, which doesn't have that column yet.
This worked an absolute treat and just makes the code so much more compact and readable:
tcap = tcap.\
    assign(person = tcap['text'].apply(lambda x: x.split(" ", 1)[0]),
           date_time = tcap['text'].str.extract(r'\(([^()]+)\)'),
           text = tcap['text'].str.split(': ').str[1]).\
    assign(date_time = lambda tcap: pd.to_datetime(tcap['date_time']))
On a separate note, to avoid the use of '\' and to make chaining like this more readable you can surround the expression with parentheses:
tcap = (tcap
        .assign(person=tcap['text'].apply(lambda x: x.split(" ", 1)[0]),
                date_time=tcap['text'].str.extract(r'\(([^()]+)\)'),
                text=tcap['text'].str.split(': ').str[1])
        .assign(date_time=lambda tcap: pd.to_datetime(tcap['date_time']))
)
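The reason the lambda matters: a callable passed to .assign receives the intermediate frame produced by the preceding calls, so the freshly created date_time column is visible inside it. A tiny self-contained sketch (toy data, not the question's tcap):

import pandas as pd

df = pd.DataFrame({'raw': ['2021-01-01', '2021-01-02']})

out = (df
       .assign(date_time=df['raw'])
       # the lambda receives the frame returned by the first assign,
       # so 'date_time' already exists when pd.to_datetime runs
       .assign(date_time=lambda d: pd.to_datetime(d['date_time']))
)
print(out['date_time'].dtype)  # datetime64[ns]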
My initial data frame is like this:
import pandas as pd
df = pd.DataFrame({'serialNo': ['aaaa','aaaa','cccc','ffff'],
                   'Date': ['2018-09-15','2018-09-16','2018-09-15','2018-09-19'],
                   'moduleLocation': ['face','head','stomach','legs'],
                   'moduleName': ['singing','dance','booze','vocals'],
                   'warning': [4402, 3747, 5555, 8754],
                   'failed': [0, 3462, 5161, 3262]})
I have performed the following steps to clean up the data. The first is to convert all columns to strings:
all_columns = list(df)
df[all_columns] = df[all_columns].astype(str)
This is followed by a function that concatenates the values of a field per serial number:
def concatenate(diagnostics, field, target):
    diagnostics.sort_values(by=['serialNo', field], inplace=True)
    diagnostics.drop_duplicates(inplace=True)
    diagnostics[target] = \
        diagnostics.groupby(['serialNo'], as_index=False)[field].transform(lambda s: ','.join(filter(None, s)))
    diagnostics.drop([field], axis=1, inplace=True)
    diagnostics.drop_duplicates(inplace=True)
    return diagnostics
module = concatenate(df[['serialNo','moduleName']], 'moduleName', 'Module')
Warn = concatenate(df[['serialNo','warning']], 'warning', 'Warn')
Err = concatenate(df[['serialNo','failed']], 'failed', 'Err')
Location = concatenate(df[['serialNo','moduleLocation']], 'moduleLocation', 'Location')
diag_final = pd.merge(module,Warn,on=['serialNo'],how='inner')
diag_final = pd.merge(diag_final,Err,on=['serialNo'],how='inner')
diag_final = pd.merge(diag_final,Location,on=['serialNo'],how='inner')
Now the problem is that the Date column no longer exists in my diag_final data frame, and I would like to have it. I do not want to change the existing function, just make sure that I have the corresponding dates. How should I achieve this?
There are likely to be multiple dates for each serial number, so you will have to concatenate the values, similar to what you are doing for moduleLocation and moduleName:
dates = concatenate(df[['serialNo','Date']], 'Date', 'Date_cat')
diag_final = pd.merge(diag_final,dates,on=['serialNo'],how='inner')
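Alternatively, if a single date per serial number would do (an assumption; the question doesn't say which date is wanted), you could merge just the first one instead of concatenating them:

# keep one (here: the first) date per serial number
first_dates = df[['serialNo', 'Date']].drop_duplicates(subset='serialNo', keep='first')
diag_final = pd.merge(diag_final, first_dates, on=['serialNo'], how='inner')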
I got an error when trying to use structural.em in the "bnlearn" package.
This is the code:
cut.learn<- structural.em(cut.df, maximize = "hc",
+ maximize.args = "restart",
+ fit="mle", fit.args = list(),
+ impute = "parents", impute.args = list(), return.all = FALSE,
+ max.iter = 5, debug = FALSE)
Error in check.data(x, allow.levels = TRUE, allow.missing = TRUE, warn.if.no.missing = TRUE, : at least one variable has no observed values.
Has anyone had the same problem? Please tell me how to fix it.
Thank you.
I got structural.em working. I am currently working on a Python interface to bnlearn that I call pybnl. I also ran into the problem you describe above.
Here is a Jupyter notebook that shows how to use structural.em from Python on the marks dataset.
The gist of it is described in slides-bnshort.pdf on page 135, "The MARKS Example, Revisited".
You have to create an initial fit with an initial imputed dataframe by hand and then provide the arguments to structural.em like so (ldmarks is the latent-discrete-marks dataframe, where the LAT column contains only missing/NA values):
library(bnlearn)
data('marks')
dmarks = discretize(marks, breaks = 2, method = "interval")
ldmarks = data.frame(dmarks, LAT = factor(rep(NA, nrow(dmarks)), levels = c("A", "B")))
imputed = ldmarks
# Randomly set values of the unobserved variable in the imputed data.frame
imputed$LAT = sample(factor(c("A", "B")), nrow(dmarks), replace = TRUE)
# Fit the parameters over an empty graph
dag = empty.graph(nodes = names(ldmarks))
fitted = bn.fit(dag, imputed)
# Although we've set imputed values randomly, nonetheless override them with a uniform distribution
fitted$LAT = array(c(0.5, 0.5), dim = 2, dimnames = list(c("A", "B")))
# Use a whitelist to enforce arcs from the latent node to all others
r = structural.em(ldmarks, fit = "bayes", impute = "bayes-lw", start = fitted,
                  maximize.args = list(whitelist = data.frame(from = "LAT", to = names(dmarks))),
                  return.all = TRUE)
You have to use bnlearn 4.4-20180620 or later, because it fixes a bug in the underlying impute function.
I'm reading a CSV file, eliminating the duplicates, and exporting the result to a database.
The problem is that this creates a column called level_0 instead of resetting the index.
Here is my code
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv('SampleData.csv', sep=';', encoding='latin1', low_memory=False)
df_projects = df['External'].drop_duplicates()
df_projects = df_projects.to_frame()
df_projects.rename(columns={'External': 'name'}, inplace=True)
df_projects = df_projects.reset_index()

con = create_engine('sqlite:///db.sqlite3')
df_projects.to_sql("inventory_projects", con, index=True, if_exists='replace')
You need to add the parameter drop=True to reset_index; without it, the old index is inserted back into the frame as a regular column:
...
df_projects = df_projects.rename('name').to_frame()
df_projects = df_projects.reset_index(drop=True)
...
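Put together with the rest of the code from the question, a minimal sketch of the whole pipeline:

import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv('SampleData.csv', sep=';', encoding='latin1', low_memory=False)

# drop_duplicates on a single column returns a Series; rename it, then make it a frame
df_projects = df['External'].drop_duplicates().rename('name').to_frame()
# drop=True discards the old row labels instead of inserting them as a column
df_projects = df_projects.reset_index(drop=True)

con = create_engine('sqlite:///db.sqlite3')
df_projects.to_sql("inventory_projects", con, index=True, if_exists='replace')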