This question already has an answer here:
Pandas to_csv call is prepending a comma
(1 answer)
Closed 1 year ago.
I am trying to do math on a specific cell in a CSV file with pandas, but whenever I do, it adds a column on the left that looks like this:
,
1,value,value
2,value,value
3,value,value
4,value,value
5,value,value
So if I run it multiple times, it looks like this:
,
1,1,1,value,value
2,2,2,value,value
3,3,3,value,value
4,4,4,value,value
5,5,5,value,value
How do I stop it from doing this?
My code is:
add_e_trial = equations_reader.at[bank_indexer_addbalance, 1] = reciever_amount
Whenever you call df.to_csv, include index=False. The leftmost column you are seeing is the index of the DataFrame.
df.to_csv(index=False)
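For example, a minimal round trip might look like this (the file name is just an assumption for illustration):
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
# write without the index column...
df.to_csv('equations.csv', index=False)
# ...so reading it back does not pick up a stray unnamed column
df = pd.read_csv('equations.csv')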
This question already has answers here:
What is [:,:-1] in python? [duplicate]
(2 answers)
Python Difference between iloc indexes
(1 answer)
Understanding slicing
(38 answers)
Closed 6 months ago.
X = df_census.iloc[:,:-1]
y = df_census.iloc[:,-1]
What is the meaning of [:,:-1] here, and also of [:,-1]?
Following your example: [:,:-1]
The first argument, :, means to get all rows of the dataframe, and :-1 means to return all columns but the last one.
In the case of just -1, it means to get only the last column.
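As a quick illustration (the data here is made up):
import pandas as pd
df = pd.DataFrame({'age': [25, 32], 'hours': [40, 38], 'income': [0, 1]})
X = df.iloc[:, :-1]  # all rows, every column except the last -> age, hours
y = df.iloc[:, -1]   # all rows, only the last column -> income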
I am working in pandas with a dataset that describes the population of a certain country per year. The dataset is construed in a weird way wherein the years aren't the columns themselves but rather the years are a value within the first row of the set. The dataset describes every year from 1960 up until now, but I only need 1970, 1980, 1990, etc. For this purpose I've created a list with all those years and tried to make a new dataset which is equivalent to the old one but only has the columns that contain a value from said list, so I don't have all this extra info I'm not using. Online I can only find instructions for removing rows or selecting by column name; since neither of those criteria applies in this situation, I thought I should ask here.
The dataset is a CSV file which I've downloaded off some world population site; here is a link to a screenshot of the data.
As you can see, some of the years are given in scientific notation, which is also how I've added them to my list.
pop = pd.read_csv('./maps/API_SP.POP.TOTL_DS2_en_csv_v2_10576638.csv',
header=None, engine='python', skiprows=4)
display(pop)
years = ['1.970000e+03','1.980000e+03','1.990000e+03','2.000000e+03','2.010000e+03','2.015000e+03', 'Country Name']
pop[pop.columns[pop.isin(years).any()]]
This is one of the things I've tried so far, which I thought made the most sense, but I am still very new to pandas, so any help would be greatly appreciated.
Using the data at https://data.worldbank.org/indicator/sp.pop.totl, copied into pastebin (first time using the service, so apologies if it doesn't work for some reason):
# actual code using CSV file saved to desktop
#df = pd.read_csv(<path to CSV>, skiprows=4)
# pastebin for reproducibility
df = pd.read_csv(r'https://pastebin.com/raw/LmdGySCf', sep='\t')
# manually select years and other columns of interest
colsX = ['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
'1990', '1995', '2000']
dfX = df[colsX]
# select every fifth year
year_cols = df.filter(regex='19|20', axis=1).columns
colsY = year_cols[[int(col) % 5 == 0 for col in year_cols]]
dfY = df[colsY]
As a general comment:
The dataset is construed in a weird way wherein the years aren't the columns themselves but rather the years are a value within the first row of the set.
This is not correct. Viewing the CSV file, it is quite clear that row 5 (Country Name, Country Code, Indicator Name, Indicator Code, 1960, 1961, ...) does contain the column names. You have read the data into pandas in such a way that those values are not treated as column names, but your first step, before trying to subset your data, should be to ensure you have read the data in properly -- which, in this case, would give you a column header named for each year.
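For instance, something along these lines should give you usable year headers (a sketch, assuming the same local file as in the question and that the decade columns exist in your copy of the data):
import pandas as pd
# keep skiprows=4 but drop header=None, so row 5 of the file becomes the header
pop = pd.read_csv('./maps/API_SP.POP.TOTL_DS2_en_csv_v2_10576638.csv', skiprows=4)
# the year columns are now plain strings like '1970', so selecting is simple
pop_decades = pop[['Country Name', '1970', '1980', '1990', '2000', '2010']]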
This question already has answers here:
Conditional Replace Pandas
(7 answers)
Adding a new pandas column with mapped value from a dictionary [duplicate]
(1 answer)
Closed 3 years ago.
This is my dataset:
Dataset example
I want to assign, to the different values such as 'W M I HOLDINGS CORP' and 'MILESTONE SCIENTIFIC INC', another variable in the column 'ticker' so that I can sort them.
In the column 'ticker' I need to add WMIH and WLSS, respectively, for the two different values.
How can I do that?
I expect the output to look something like this:
Output Example
You should be OK with this, assuming you have your dataset in a variable called df:
df.loc[df['comnam'] == 'W M I HOLDINGS CORP', 'ticker'] = 'WMIH'
df.loc[df['comnam'] == 'MILESTONE SCIENTIFIC INC', 'ticker'] = 'WLSS'
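If there are many company names to map, a dictionary plus .map() is a tidy alternative (a sketch using the column names from the question; note that rows whose comnam is not in the dict end up with NaN in ticker):
ticker_map = {
    'W M I HOLDINGS CORP': 'WMIH',
    'MILESTONE SCIENTIFIC INC': 'WLSS',
}
df['ticker'] = df['comnam'].map(ticker_map)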
This question already has an answer here:
Convert pandas._period.Period type Column names to Lowercase
(1 answer)
Closed 4 years ago.
I have a dataframe where I used
df.groupby(pd.PeriodIndex(df.columns, freq='Q'), axis=1).mean()
to combine the monthly columns into quarters by taking the mean.
However, the resulting dataframe has columns like the ones below, and I could not change the uppercase 'Q' into a lowercase 'q'.
PeriodIndex(['2000Q1', '2000Q2', '2000Q3', '2000Q4', '2001Q1', '2001Q2',
'2001Q3', '2001Q4', '2002Q1', '2002Q2', '2002Q3', '2002Q4',
'2003Q1', '2003Q2', '2003Q3', '2003Q4', '2004Q1', '2004Q2',
'2004Q3', '2004Q4', '2005Q1', '2005Q2', '2005Q3', '2005Q4',
'2006Q1', '2006Q2', '2006Q3', '2006Q4', '2007Q1', '2007Q2',
'2007Q3', '2007Q4', '2008Q1', '2008Q2', '2008Q3', '2008Q4',
'2009Q1', '2009Q2', '2009Q3', '2009Q4', '2010Q1', '2010Q2',
'2010Q3', '2010Q4', '2011Q1', '2011Q2', '2011Q3', '2011Q4',
'2012Q1', '2012Q2', '2012Q3', '2012Q4', '2013Q1', '2013Q2',
'2013Q3', '2013Q4', '2014Q1', '2014Q2', '2014Q3', '2014Q4',
'2015Q1', '2015Q2', '2015Q3', '2015Q4', '2016Q1', '2016Q2',
'2016Q3'],
dtype='period[Q-DEC]', freq='Q-DEC')
I have tried using df.columns = [x.lower() for x in df.columns] and it gives an error: 'Period' object has no attribute 'lower'.
This looks like a duplicate of the issue posted here: Convert pandas._period.Period type Column names to Lowercase
Basically, you'll want to reformat the Period output to have a lowercase q like so:
df.columns = df.columns.strftime('%Yq%q')
Alternatively, if you want to modify your PeriodIndex object directly, you can do something like:
# grab the PeriodIndex of column labels (the one you pasted in your question)
periods = df.columns
# format each Period accordingly and assign the result back
df.columns = [p.strftime('%Yq%q') for p in periods]
The %Y denotes the year, the first q is the literal lowercase "q" you want, and %q is the quarter.
Here is the documentation for a Period's strftime() method, which returns the formatted time string. At the bottom they have some nice examples!
Looking at the methods listed in the pandas documentation, lower() isn't an available method on the Period object, which is why you're getting this error (a PeriodIndex is just an array of Periods, each of which denotes a span of time).
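Here is the fix on a toy PeriodIndex, so you can see the output format (the data is made up):
import pandas as pd
idx = pd.period_range('2000Q1', periods=4, freq='Q')
print(idx.strftime('%Yq%q'))
# Index(['2000q1', '2000q2', '2000q3', '2000q4'], dtype='object')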
I'm trying to perform calculations based on the entries in a pandas dataframe. The dataframe looks something like this:
and it contains 1466 rows. I'll have to run similar calculations on other dfs with more rows later.
What I'm trying to do is calculate something like mag = (U-V)/(R-I) (but ignoring any values that are -999), put that in a new column, and then z_pred = 10**((mag-c)/m) in another new column (c and m are just hard-coded variables). I have other columns I need to add too, but I figure that'll just be an extension of the same method.
I started out by trying
for i in range(1):
    current = qso[:]
    mag = (U - V) / (R - I)
    name = current['NED']
    z_pred = 10**((mag - c) / m)
    z_meas = current['z']
but I got either a Series for z, which I couldn't operate on, or various type errors when I tried to print the values or write them to a file.
I found this question which gave me a start, but I can't see how to apply it to multiple calculations, as in my situation.
How can I achieve this?
Conditionally adding calculated columns row-wise is usually done with NumPy's np.where:
df['mag'] = np.where(~df[['U', 'V', 'R', 'I']].eq(-999).any(axis=1),
                     (df.U - df.V) / (df.R - df.I), -999)
Note: this assumes that when any of the four columns contains -999, the row is not calculated and -999 is returned instead.
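Putting it together with the z_pred step from the question, a minimal sketch with made-up data (the values of c and m here are placeholders, not from the question):
import numpy as np
import pandas as pd

df = pd.DataFrame({'U': [19.2, -999.0], 'V': [18.1, 17.9],
                   'R': [17.5, 16.8], 'I': [16.9, 16.2]})
c, m = 1.0, 2.0  # hard-coded constants from the question (values assumed)

# rows where none of the four columns is -999
valid = ~df[['U', 'V', 'R', 'I']].eq(-999).any(axis=1)
df['mag'] = np.where(valid, (df['U'] - df['V']) / (df['R'] - df['I']), -999)
df['z_pred'] = np.where(valid, 10**((df['mag'] - c) / m), -999)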