Adding columns from every other row in a pandas dataframe

In the linked picture you can see the start of my data frame. I would like to make two new columns from the 'values' column (confirmed_cases and deaths) and get rid of the 'Type' column. Essentially I want one row of data per county, with Confirmed_Cases and Deaths columns filled from the values already there. I tried the code below, but obviously the length of the values does not match the length of the index.
Any suggestions?
apidata['Confirmed_Cases'] = apidata['values'].iloc[::2].values
apidata['Deaths'] = apidata['values'].iloc[1::2].values
Maybe there's a way to double how many times each value appears in the new column? Then the first five deaths would be [5, 5, 26, 26, 0] and I could just delete every other row.

I ended up figuring it out: I created a second dataframe that keeps every other row of the first one, then added the values from the original dataframe into the new one.
apidata2 = apidata.iloc[::2].copy()  # keep every other row; .copy() avoids SettingWithCopyWarning
apidata2['Confirmed_Cases'] = apidata['values'].iloc[::2].values   # even rows hold confirmed cases
apidata2['Deaths'] = apidata['values'].iloc[1::2].values           # odd rows hold deaths
apidata2.head()
[Screenshot: finished output]
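As an alternative sketch (not from the original answer): assuming the frame really does hold two rows per county with a 'Type' column telling confirmed cases and deaths apart, a pivot produces the wide layout directly, without slicing by row parity. The column names below are illustrative guesses at the frame in the screenshot.

import pandas as pd

# Hypothetical data in the shape described: two rows per county,
# with 'Type' distinguishing the two measures.
apidata = pd.DataFrame({
    'county': ['A', 'A', 'B', 'B'],
    'Type': ['confirmed_cases', 'deaths', 'confirmed_cases', 'deaths'],
    'values': [100, 5, 250, 26],
})

# One row per county, one column per Type value; 'Type' itself
# disappears into the column labels.
wide = apidata.pivot(index='county', columns='Type', values='values').reset_index()
print(wide)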

Related

Compile a count of similar rows in a Pandas Dataframe based on multiple column values

I have two DataFrames: one containing data read in from a CSV file, and another with that data grouped by every column but the last, reindexed to include a column holding the size of each group.
df_k1 = pd.read_csv(filename, sep=';')
columns_for_groups = list(df_k1.columns)[:-1]
k1_grouped = df_k1.groupby(columns_for_groups).size().reset_index(name="Count")
I need to create a Series where row i corresponds to row i of my original DataFrame, and its value is the size of the group that row belongs to in the grouped DataFrame. What I have below works for my purposes, but I was wondering if anyone knew a faster or more elegant solution.
size_by_row = []
for row in df_k1.itertuples():
    # Find the group whose key columns match this row's key columns.
    for group in k1_grouped.itertuples():
        if row[1:-1] == group[1:-1]:
            size_by_row.append(group[-1])
            break
group_size = pd.Series(size_by_row)
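A faster, loop-free alternative is groupby(...).transform('size'), which returns one value per original row, already aligned with df_k1. A minimal sketch with made-up data (the column names here are illustrative):

import pandas as pd

# Hypothetical stand-in for the CSV data.
df_k1 = pd.DataFrame({
    'a': [1, 1, 2, 2, 2],
    'b': ['x', 'x', 'y', 'y', 'z'],
    'value': [10, 20, 30, 40, 50],
})

columns_for_groups = list(df_k1.columns)[:-1]

# transform('size') broadcasts each group's size back to its rows,
# so the result lines up with df_k1 row for row.
group_size = df_k1.groupby(columns_for_groups)['value'].transform('size')
print(group_size.tolist())  # [2, 2, 2, 2, 1]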

How to broadcast a list of data into dataframe (Or multiIndex )

I have a big dataframe, about 200k rows and 3 columns (x, y, z). Some rows don't have y and z values, only an x value. I want to make a new column where the first block of data with z values gets 1, the second gets 2, then 3, etc. Or build a MultiIndex in the same format.
[Image: example of the desired output]
I made a new column called "NO." and initialized it to zero. Then I recorded the indices where the new column should change value, with the following code:
df = pd.read_fwf(path, header=None, names=['x','y','z'])
df['NO.']=0
index_NO_changed = df.index[df['z'].isnull()]
Then I loop through those indices and assign the numbers:
for i in range(len(index_NO_changed) - 1):
    df['NO.'].iloc[index_NO_changed[i]:index_NO_changed[i+1]] = i + 1
df['NO.'].iloc[index_NO_changed[-1]:] = len(index_NO_changed)
But the problem is that this raises the warning "A value is trying to be set on a copy of a slice from a DataFrame".
Is there a better way? Would creating a MultiIndex instead of adding another column be easier, considering the size of the dataframe?
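A sketch of a vectorized alternative, assuming (as in the code above) that rows with a missing z open each new block: a cumulative sum over the null mask numbers the blocks in a single assignment, with no loop and no chained-indexing warning. The sample values here are made up.

import pandas as pd
import numpy as np

# Hypothetical data: rows where z is NaN start a new block.
df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [np.nan, 1.0, 2.0, np.nan, 3.0, 4.0],
    'z': [np.nan, 10.0, 20.0, np.nan, 30.0, 40.0],
})

# Each NaN in z increments the running count, which then carries
# forward to the rows below it: 1, 1, 1, 2, 2, 2.
df['NO.'] = df['z'].isnull().cumsum()
print(df)

To get the same labels as an index level instead of a column, df.set_index('NO.', append=True) turns them into a MultiIndex.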

pandas dataframe prevent inserting duplicate records

I'm trying to accomplish something quite simple, but my searches so far have been unsuccessful.
I have an existing dataframe generated from a multi-sheet spreadsheet.
From each sheet I loaded a number of records; each sheet corresponds to a 'Match' value in the df.
Now I need to insert multiple records from outside the spreadsheet, so it's easy to prepare a dict like this:
for m in range(from_match, to_match):
    d = {'Minuto': 0, 'Azione': event, 'Giocatore': player, 'Match': m, 'Extra': 1}
The 'Extra' field indicates that these records were NOT loaded from the spreadsheet.
The problem is that df.append(d, ignore_index=True) always inserts the record, whether or not it already exists.
That is not the result I expect.
The ideal (for me) solution would be something like this:
for m in range(from_match, to_match):
    d = {'Minuto': 0, 'Azione': event, 'Giocatore': player, 'Match': m, 'Extra': 1}
    # insert some check here
    if record d does not exist:
        df = df.append(d, ignore_index=True)
I've played with df.isin and seen solutions based on merging dataframes, but it seems to me that something like this shouldn't be complicated at all.
Any suggestions?
Thanks
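A minimal sketch of one way to do the check, assuming the identifying columns are Minuto, Azione, Giocatore, and Match (the sample values for event, player, from_match, and to_match below are made up). Note that DataFrame.append was removed in pandas 2.0, so the sketch uses pd.concat instead:

import pandas as pd

# Hypothetical starting frame and variables standing in for the question's.
df = pd.DataFrame([{'Minuto': 0, 'Azione': 'goal', 'Giocatore': 'Rossi',
                    'Match': 1, 'Extra': 1}])
event, player, from_match, to_match = 'goal', 'Rossi', 1, 3

key_cols = ['Minuto', 'Azione', 'Giocatore', 'Match']
for m in range(from_match, to_match):
    d = {'Minuto': 0, 'Azione': event, 'Giocatore': player, 'Match': m, 'Extra': 1}
    # True if any existing row matches the new record on every key column.
    exists = (df[key_cols] == pd.Series({k: d[k] for k in key_cols})).all(axis=1).any()
    if not exists:
        df = pd.concat([df, pd.DataFrame([d])], ignore_index=True)
print(df)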

pandas : Indexing for thousands of rows in dataframe

I initially had 100k rows in my dataset. I read the CSV into a pandas dataframe called data, then tried to select a subset of 51 rows using .loc. My index labels are numeric values 0, 1, 2, 3, etc. I tried this command:
data = data.loc['0':'50']
But the results were weird: it returned all the rows from 0 to 49999, as if it takes rows until it reaches an index value that starts with 50.
Similarly, new_data = data.loc['0':'19'] returned all the rows from 0 through 18999.
Could this be a bug in pandas?
You want to use .iloc in place of .loc, since you are selecting rows by position rather than by label. For example, to get the first 51 rows (positions 0 through 50):
data.iloc[:51, :]
Keep in mind that slicing with string labels like '0':'50' invites string-wise comparison of the labels; with integer labels you can also slice by label directly with data.loc[0:50], which is inclusive of both endpoints.
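A small sketch illustrating the difference, using made-up data with a default integer index like the one in the question:

import pandas as pd

# Hypothetical frame with a default RangeIndex, as in the question.
data = pd.DataFrame({'value': range(100_000)})

# Positional slicing: the end is exclusive, so :51 yields positions 0..50.
subset_iloc = data.iloc[:51]

# Label slicing with integer labels: .loc is inclusive on both ends.
subset_loc = data.loc[0:50]

print(len(subset_iloc), len(subset_loc))  # 51 51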

How do I preset the dimensions of my dataframe in pandas?

I am trying to preset the dimensions of my data frame in pandas so that I can have 500 rows by 300 columns, set before I enter any data.
I am working on a project where I need to take a column of data, copy it, and shift the copy one column to the right and one row down.
The trouble is that the last row gets cut off when I shift down (e.g. I started with 23 rows and still have 23 rows, even though shifting down by one should give 24).
Here is what I have done so far:
bolusCI = pd.DataFrame()
## set index to very high number to accommodate shifting row down by 1
bolusCI = bolus_raw[["Activity (mCi)"]].copy()
activity_copy = bolusCI.shift(1)   # shift keeps the length, so the last value falls off
activity_copy
pd.concat([bolusCI, activity_copy], axis=1)
Thanks!
There might be a more efficient way to achieve what you are looking to do, but to directly answer your question, you can initialize a DataFrame with fixed dimensions like this:
pd.DataFrame(columns=range(300), index=range(500))
You just need to define the index and columns in the constructor. The simplest way is to use pandas.RangeIndex, which mimics np.arange and range in syntax; you can also pass a name parameter to name it. See the pd.DataFrame and pd.Index documentation for details.
df = pd.DataFrame(
    index=pd.RangeIndex(500),
    columns=pd.RangeIndex(300),
)
print(df.shape)
(500, 300)
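On the shifting problem itself, presetting dimensions may not be necessary: a sketch, assuming the goal is simply not to lose the value pushed past the last row, is to extend the index by one row before shifting (bolus_raw is replaced by made-up data here):

import pandas as pd

# Hypothetical stand-in for bolus_raw[["Activity (mCi)"]].
bolusCI = pd.DataFrame({"Activity (mCi)": range(23)})

# Add one empty row at the end so shift(1) has room for the last value.
extended = bolusCI.reindex(range(len(bolusCI) + 1))
activity_copy = extended.shift(1)

result = pd.concat([extended, activity_copy], axis=1)
print(result.shape)  # (24, 2)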