Data with same row name into column - pandas

The data I tried to process are in replicates like this:
before
I want to plot them with error bars. Can I plot them as they are?
I was trying to reshape them like this so that I can plot them:
after
I tried to use a pivot table, but it seems to work only on data with at least two labels.
Thank you very much!

Try this (I checked it, and it works also with one label):
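import pandas as pd
t = ['treatment_1', 'treatment_1', 'treatment_1', 'treatment_2', 'treatment_2', 'treatment_2']
n = [10, 12, 13, 20, 22, 23]
df = pd.DataFrame(t, columns=['treatment'])
df['value'] = n
# Spread each treatment into its own column; rows that belong to another treatment become NaN
table = df.pivot(columns='treatment', values='value')
# Drop the NaN padding per column, so genuine zeros in the data are kept
# (this assumes every treatment has the same number of replicates)
table = pd.DataFrame({c: table[c].dropna().tolist() for c in table})
print(table)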
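Another way to get the same wide layout, sketched here on the assumption that the data has one `treatment` column and one `value` column as in the answer above, is to number the replicates within each treatment and pivot on that number. The `replicate` column name is made up for this illustration. The last step also computes the per-treatment mean and standard deviation from the long format, which is usually what an error-bar plot needs.

```python
import pandas as pd

df = pd.DataFrame({
    "treatment": ["treatment_1"] * 3 + ["treatment_2"] * 3,
    "value": [10, 12, 13, 20, 22, 23],
})

# Number the replicates within each treatment: 0, 1, 2, 0, 1, 2
df["replicate"] = df.groupby("treatment").cumcount()

# One row per replicate, one column per treatment
wide = df.pivot(index="replicate", columns="treatment", values="value")
print(wide)

# Mean and sample standard deviation per treatment, taken from the long format
stats = df.groupby("treatment")["value"].agg(["mean", "std"])
print(stats)
```

If matplotlib is installed, `stats["mean"].plot.bar(yerr=stats["std"])` should draw one bar per treatment with the standard deviation as the error bar, so the reshaping step is not strictly required for plotting.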

Converting dict to dataframe of Solution point values & plotting

I am trying to plot some results obtained after optimisation with Gurobi.
I have converted the dictionary to a pandas DataFrame.
Its shape is 96 x 1.
How do I use this DataFrame to plot the first row's value, the second row's value, and so on? I am attaching a snapshot of it.
Can anyone help me with this?
x={}
for t in time1:
    x[t] = [price_energy[t-1] * EnergyResource[174, t].X]
df = pd.DataFrame.from_dict(x, orient='index')
df
You can try pandas.DataFrame(data=x.values()) to properly create a pandas DataFrame while using row numbers as indices.
In the example below, I have generated a (pseudo) random dictionary with 10 values, and stored it as a data frame using pandas.DataFrame giving a name to the only column as xyz. To understand how indexing works, please see Indexing and selecting data.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Create a dictionary 'x'
rng = np.random.default_rng(121)
x = dict(zip(np.arange(10), rng.random((1, 10))[0]))
# Create a dataframe from 'x'
df = pd.DataFrame(x.values(), index=x.keys(), columns=["xyz"])
print(df)
print(df.index)
# Plot the dataframe
plt.plot(df.index, df.xyz)
plt.show()
This prints df as:
xyz
0 0.632816
1 0.297902
2 0.824260
3 0.580722
4 0.593562
5 0.793063
6 0.444513
7 0.386832
8 0.214222
9 0.029993
and gives df.index as:
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
and also plots the figure:
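For the dictionary in the question, where every key maps to a one-element list, `from_dict(..., orient='index')` already yields a 96 x 1 frame whose index holds the keys. The sketch below uses made-up numbers in place of the Gurobi results (the `cost` column name is also an assumption) and shows how to pull out the x and y values a line plot needs.

```python
import pandas as pd

# Made-up stand-in for the optimisation results: keys 1..96, one value per key
x = {t: [0.5 * t] for t in range(1, 97)}

# Keys become the index; name the single column instead of leaving it as 0
df = pd.DataFrame.from_dict(x, orient="index", columns=["cost"])

# These two sequences are what a line plot needs
xs = df.index.to_list()
ys = df["cost"].to_list()
print(df.head())
```

With matplotlib imported as `plt`, `plt.plot(df.index, df["cost"])` would then draw each value against its key.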

Add new columns to excel file from multiple datasets with Pandas in Google Colab

I'm trying to add some columns to an Excel file after some existing data, but I'm not getting good results: I just overwrite what I have. Some context: I'm reading a CSV and, for each column, I use a for loop to call value_counts and then create a frame from the result. Here is the code for just one column:
import pandas as pd
data= pd.read_csv('responses.csv')
datatoexcel = data['Music'].value_counts().to_frame()
datatoexcel.to_excel('savedataframetocolumns.xlsx') #Name of the file
This works like this ...
And with that code for only one column I have the format that I actually need for excel.
The problem comes when I try to do it in a for loop over all the columns and then "append" the resulting dataframes to the Excel file, using this code:
for columnName in df:
    datasetstoexcel = df.value_counts(columnName).to_frame()
    print(datasetstoexcel)
    # Here is my problem with the following line the .to_excel
    x.to_excel('quickgraph.xlsx') #I tried more code lines but I'll leave this one as base
The result that I want to reach is this one:
I'm really close to finishing this code, so any help would be appreciated!
How about this?
Sample data
import pandas as pd
df = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": [5, 6, 7, 8],
    "col3": [9, 9, 11, 12],
    "col4": [13, 14, 15, 16],
})
Find the value counts of each column and add them to a list
li = []
# Loop over the columns (df.shape[1]), not over the rows (len(df))
for i in range(df.shape[1]):
    value_counts = df.iloc[:, i].value_counts().to_frame().reset_index()
    li.append(value_counts)
Concatenate all the dataframes in li and write them to Excel
pd.concat(li, axis=1).to_excel("result.xlsx")
Sample output:
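When the columns have different numbers of distinct values, the side-by-side concatenation pads the shorter blocks with NaN. The sketch below illustrates this on two of the sample columns. It builds each two-column block by hand, and the `_count` suffix is a naming choice made for this example, so that the column names do not depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col3": [9, 9, 11, 12],
})

frames = []
for col in df.columns:
    counts = df[col].value_counts()
    # Left column: the distinct values; right column: how often each occurs
    frames.append(pd.DataFrame({col: counts.index, f"{col}_count": counts.to_numpy()}))

# Blocks are aligned on the row position; shorter blocks are padded with NaN
combined = pd.concat(frames, axis=1)
print(combined)
```

Calling `combined.to_excel("result.xlsx", index=False)` would then write all blocks into one sheet, assuming an Excel engine such as openpyxl is installed.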

Extracting column from Array in python

I am a beginner in Python and I am stuck with data that is an array of 32763 numbers separated by commas. Please find the data here: data
I want to convert this into two columns, the first from (0:16382) and the second from (2:32763). In the end I want to plot column 1 on the x axis and column 2 on the y axis. I tried the following code, but I am not able to extract the columns:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = np.genfromtxt('oscilloscope.txt',delimiter=',')
df = pd.DataFrame(data.flatten())
print(df)
Then I want to write the data to a file, say data1, in the format shown in the attached picture.
It is hard to answer without seeing the format of your data, but you can try
data = np.genfromtxt('oscilloscope.txt', delimiter=',')
print(data.shape)  # here we check we got something useful
# split data into x and y at the midpoint (position 16381 for 32763 values)
half = data.size // 2
x = data[:half]
# 32763 is odd, so the last value is dropped to keep x and y the same length
y = data[half:2 * half]
# now you can create a dataframe and print to file
df = pd.DataFrame({'x': x, 'y': y})
df.to_csv('data1.csv', index=False)
Try this.
# Takes a dataframe df and a chunk size and returns the chunks as a list. You can set chunk_size to whatever you want.
def split_dataframe(df, chunk_size=16382):
    chunks = list()
    # Ceiling division, so no empty chunk is appended when len(df) is a multiple of chunk_size
    num_chunks = -(-len(df) // chunk_size)
    for i in range(num_chunks):
        chunks.append(df[i*chunk_size:(i+1)*chunk_size])
    return chunks
or use
np.array_split
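The `np.array_split` suggestion can be sketched as follows. The array here is synthetic and only stands in for the 32763 values read from the file; because that length is odd, the two halves differ by one element and are trimmed to a common length before being paired in a dataframe.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 32763 comma-separated values in the file
data = np.arange(32763, dtype=float)

# Two nearly equal halves; with an odd length the first half gets the extra item
x, y = np.array_split(data, 2)

# Trim both halves to the shorter length so they can share a DataFrame
n = min(len(x), len(y))
df = pd.DataFrame({"x": x[:n], "y": y[:n]})
print(len(x), len(y), df.shape)
```

From here `df.to_csv("data1.csv", index=False)` would write the two columns to a file, as in the first answer.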

pandas df columns series

I have a dataframe, and I have done some operations on its columns as follows:
df1=sample_data.sort_values("Population")
df2=df1[(df1.Population > 500000) & (df1.Population < 1000000)]
df3=df2["Avg check"]*df2["Avg Daily Rides Last Week"]/df2["CAC"]
df4=df2["Avg check"]*df2["Avg Daily Rides Last Week"]
([[df3],[df4]])
If I understand correctly, df3 and df4 are now Series, not dataframes. There should be a way to make a new dataframe from these Series and draw a scatter plot. Please advise. Thanks.
I wanted to annotate each point and ran into an issue:
df3=df2["Avg check"]*df2["Avg Daily Rides Last Week"]/df2["CAC"]
df4=df2["Avg check"]*df2["Avg Daily Rides Last Week"]
df5=df2["Population"]
df6=df2["city_id"]
sct=plt.scatter(df5,df4,c=df3, cmap="viridis")
plt.xlabel("Population")
plt.ylabel("Avg check x Avg Daily Rides")
for i, txt in enumerate(df6):
    plt.annotate(txt,(df4[i],df5[i]))
plt.colorbar()
plt.show()
I think you can pass both Series to matplotlib.pyplot.scatter:
import matplotlib.pyplot as plt
sc = plt.scatter(df3, df4)
EDIT: Swap df5 and df4 in the annotate call, and use Series.iat to select by position:
for i, txt in enumerate(df6):
    plt.annotate(txt,(df5.iat[i],df4.iat[i]))
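The reason position-based access matters is that sorting and filtering keep the original index labels, so label `0` may no longer exist in the Series. The sketch below shows this with a small hypothetical frame (the data and the `revenue` column are invented for illustration) and collects the label and coordinates that each `plt.annotate` call would receive.

```python
import pandas as pd

# Hypothetical stand-in for the filtered frame; the index labels 7, 2, 5
# mimic what is left after sort_values and boolean filtering
df2 = pd.DataFrame(
    {
        "Population": [600000, 750000, 900000],
        "revenue": [120.0, 340.0, 210.0],
        "city_id": ["a", "b", "c"],
    },
    index=[7, 2, 5],
)
x = df2["Population"]
y = df2["revenue"]
labels = df2["city_id"]

# Label 0 is not in the index, so a lookup by the label 0 cannot be relied on
print(0 in x.index)

# .iat selects by position, independent of the index labels
points = [(txt, x.iat[i], y.iat[i]) for i, txt in enumerate(labels)]
print(points)
```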
You can create a DataFrame from Series. Here is how to do it. Simply put both Series in a dictionary:
import pandas as pd
author = ['Jitender', 'Purnima', 'Arpit', 'Jyoti']
article = [210, 211, 114, 178]
auth_series = pd.Series(author)
article_series = pd.Series(article)
frame = { 'Author': auth_series, 'Article': article_series }
and then create a DataFrame from that dictionary:
result = pd.DataFrame(frame)
The code is from geeksforgeeks.org
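Applied to the question's columns, the same dictionary approach might look like the sketch below. The rows of `sample_data` are invented and the output column names `revenue` and `revenue_per_cac` are assumptions; the point is that Series built from the same filtered frame share its index, so they line up row by row in the new DataFrame.

```python
import pandas as pd

# Hypothetical rows standing in for sample_data
sample_data = pd.DataFrame({
    "Population": [400000, 600000, 900000, 1200000],
    "Avg check": [10.0, 12.0, 8.0, 15.0],
    "Avg Daily Rides Last Week": [100, 200, 300, 400],
    "CAC": [5.0, 6.0, 4.0, 10.0],
})
df2 = sample_data[(sample_data.Population > 500000) & (sample_data.Population < 1000000)]

revenue = df2["Avg check"] * df2["Avg Daily Rides Last Week"]

# Both Series carry df2's index, so the rows are matched by label
result = pd.DataFrame({"revenue": revenue, "revenue_per_cac": revenue / df2["CAC"]})
print(result)
```

With matplotlib installed, `result.plot.scatter(x="revenue_per_cac", y="revenue")` would then draw the scatter plot directly from the new frame.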

save a named tuple in all rows of a pandas dataframe

I'm trying to save a named tuple n=NamedTuple(value1='x', value2='y') in a row of a pandas dataframe.
The problem is that the named tuple has a length of 2, because it has two fields in my case (value1 and value2), so it doesn't fit into a single cell of the dataframe.
How can I get the named tuple written into every row of a dataframe column?
df['columnd1']=n
an example:
from collections import namedtuple
import pandas as pd
n = namedtuple("test", ['param1', 'param2'])
n1 = n(param1='1', param2='2')
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df['nt'] = n1
print(df)
I don't really understand what you're trying to do, but if you want to put that named tuple in every row of a new column (i.e. like a scalar) then you can't rely on broadcasting but should instead replicate it yourself:
df['nt'] = [n1 for _ in range(df.shape[0])]
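A variant of the same idea, offered as a sketch rather than the only way to do it, wraps the repeated tuple in a `pd.Series` that reuses the frame's index. This keeps the assignment aligned even when the dataframe's index is not the default 0..n-1. The `Params` type name is made up for this example.

```python
from collections import namedtuple

import pandas as pd

Params = namedtuple("Params", ["param1", "param2"])
n1 = Params(param1="1", param2="2")

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# One reference to the tuple per row, aligned to the frame's own index
df["nt"] = pd.Series([n1] * len(df), index=df.index)
print(df)
```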