pandas dataframe export row column value

I import data from Excel into Python pandas with read_clipboard:
import pandas as pd
df = pd.read_clipboard()
The column index is the month (january, february, ..., december). The row index is the product name (orange, banana, etc.). The cell values are the monthly sales.
How can I export a CSV of the following format:
month;product;sales
To make it more visual: the input is the wide table (one row per product, one column per month), and the output should contain one line per month/product combination.

You can also use the xlrd package (note that xlrd 2.0+ reads only .xls files, so opening .xlsx needs xlrd < 2.0 or openpyxl).
Sample Book1.xlsx:
        january  february  march
Orange        4         2      4
banana        2         6      3
apple         5         1      7
Sample code:
import xlrd

book = xlrd.open_workbook("Book1.xlsx")
print(book.sheet_names())
first_sheet = book.sheet_by_index(0)
header = first_sheet.row_values(0)  # ['', 'january', 'february', 'march']
print(first_sheet.nrows)
for i in range(1, first_sheet.nrows):    # skip the header row
    row = first_sheet.row_values(i)      # e.g. ['Orange', 4.0, 2.0, 4.0]
    for j in range(1, len(row)):         # skip the product-name cell
        print("{};{};{}".format(header[j], row[0], row[j]))
Result:
january;Orange;4.0
february;Orange;2.0
march;Orange;4.0
january;banana;2.0
february;banana;6.0
march;banana;3.0
january;apple;5.0
february;apple;1.0
march;apple;7.0

If that is really all you need, something like this might solve the problem:
month = df1.columns.tolist() * len(df1)          # repeat the months once per product row
product = []
sales = []
for x in range(len(df1)):
    product += [df1.index[x]] * len(df1.columns)
    sales += df1.iloc[x].values.tolist()
df2 = pd.DataFrame({'month': month, 'product': product, 'sales': sales})
But you should look for a smarter way if you have a larger DataFrame, like what @Jon Clements suggested in the comment; one possibility is sketched below.
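A vectorized sketch along those lines (not necessarily what that comment proposed; it assumes df is the frame from read_clipboard above, with products as the index and months as the columns, and the output file name is just a placeholder):
out = (df.rename_axis('product')
         .reset_index()
         .melt(id_vars='product', var_name='month', value_name='sales'))
out = out[['month', 'product', 'sales']]   # month;product;sales column order
out.to_csv('sales_long.csv', sep=';', index=False)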

I finally solved it thanks to your advice: using unstack.
df2 = df.transpose()
df3 = df2.unstack()
df3.to_csv('my/path/name.csv', sep=';')
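If you also want the month;product;sales header row in the file, a variant of the same idea (a sketch, assuming months are the columns of the original df):
df3 = df.unstack().rename_axis(['month', 'product'])   # Series indexed by (month, product)
df3.reset_index(name='sales').to_csv('my/path/name.csv', sep=';', index=False)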

Related

Transform Dataframe rows to columns and additional steps

I have a dataframe like this:
import pandas as pd
import numpy as np
data = {'UNIT': ['UNIT_1', 'UNIT_2', 'UNIT_3', 'UNIT_4'],
        'Name_1': ['werner', 'otto', 'karl', 'fritz'],
        'Name_2': ['ottilie', 'anna', 'jasmin', ''],
        'Name_3': ['bello', 'kitti', '', '']}
df = pd.DataFrame(data)
df.replace('', np.nan, inplace=True)
display(df)
which looks like this:
     UNIT  Name_1   Name_2 Name_3
0  UNIT_1  werner  ottilie  bello
1  UNIT_2    otto     anna  kitti
2  UNIT_3    karl   jasmin    NaN
3  UNIT_4   fritz     NaN    NaN
The result that I want is one row per name, each paired with its UNIT (as in the answer's output below).
The code I have so far looks like this:
for index, row in df.iterrows():
    row_transposed = row.T
    row_transposed.dropna(inplace=True)
    df_row_transposed = pd.DataFrame(row_transposed)
    df_row_transposed_head = df_row_transposed.head(1)
    #display(df_row_transposed)
    #display(row_transposed_head)
    hr_unit = df_row_transposed_head.iloc[0]
    add_unit = hr_unit[index]
    for index, row in df_row_transposed.iterrows():
        df_row_transposed["UNIT"] = add_unit
        #row_transposed = row_transposed.iloc[index: , :]
    display(df_row_transposed)
which already creates one transposed block per row, but now I am stuck...
Any help is very much appreciated.
Try df.melt. It unpivots the Name_* columns into rows.
ddf = df.melt(id_vars='UNIT').sort_values(by='UNIT')
new_df = ddf[["value", "UNIT"]]
new_df = new_df.dropna().reset_index(drop=True)   # dropna() returns a copy, so assign the result back
new_df
Out[163]:
value UNIT
0 werner UNIT_1
1 ottilie UNIT_1
2 bello UNIT_1
3 otto UNIT_2
4 anna UNIT_2
5 kitti UNIT_2
6 karl UNIT_3
7 jasmin UNIT_3
8 fritz UNIT_4
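One caveat: sort_values is not stable by default, so the within-UNIT order (Name_1 before Name_2 before Name_3) is not guaranteed; passing kind='stable' preserves the order melt produced (a minor variant of the line above):
ddf = df.melt(id_vars='UNIT').sort_values(by='UNIT', kind='stable')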

Pandas: Newbie question on compare and (re)calculate fields with pandas

What I need to do is compare two fields in each row of a CSV file.
The data looks like this:
store;ean;price;retail_price;quantity
001;0888721396226;200;200;2
001;0888721396233;200;159;2
001;2194384654084;299;259;7
001;2194384654091;199.95;199.95;8
In case "price" is equal to "retail_price", the retail_price field must be reduced by a given percentage, e.g. 10%.
So in the example data, the retail_price in the first and last lines should be changed to 180 and 179.955.
I'm completely new to pandas, and after reading the "getting started" part I did not find anything I could build on ...
Any help or hint is appreciated (just point me in the right direction and I will figure it out myself).
Kind regards!
Use Series.eq to compare both values; where they are equal, multiply retail_price by 0.9, otherwise keep it, using numpy.where:
import numpy as np

mask = df['price'].eq(df['retail_price'])
df['retail_price'] = np.where(mask, df['retail_price'].mul(0.9), df['retail_price'])
print(df)
store ean price retail_price quantity
0 1 888721396226 200.00 180.000 2
1 1 888721396233 200.00 159.000 2
2 1 2194384654084 299.00 259.000 7
3 1 2194384654091 199.95 179.955 8
Or you can use DataFrame.loc to multiply only the matched rows by 0.9:
mask = df['price'].eq(df['retail_price'])
df.loc[mask, 'retail_price'] *= 0.9
# which works like:
df.loc[mask, 'retail_price'] = df.loc[mask, 'retail_price'] * 0.9
EDIT: to filter the rows that do not match the mask (where the mask is False), use:
df2 = df[~mask].copy()
print(df2)
store ean price retail_price quantity
1 1 888721396233 200.0 159.0 2
2 1 2194384654084 299.0 259.0 7
print(mask)
0 True
1 False
2 False
3 True
dtype: bool
This is my code:
import pandas as pd
import numpy as np

# create the multiplier from the static value in the file "prozente.txt"
with open('prozente.txt', 'r') as f:
    prozente = int(f.readline())
mulvalue = 1 - (prozente / 100)

# header=0 so the existing header row is replaced by the given names
df = pd.read_csv('1.csv', sep=';', header=0,
                 names=['store', 'ean', 'price', 'retail_price', 'quantity'])
mask = df['price'].eq(df['retail_price'])
df['retail_price'] = np.where(mask, df['retail_price'].mul(mulvalue).round(2), df['retail_price'])
df2 = df[~mask].copy()
df.to_csv('output.csv', columns=['store', 'ean', 'price', 'retail_price', 'quantity'], sep=';', index=False)
print(df)
print(df2)
using this as 1.csv:
store;ean;price;retail_price;quantity
001;0888721396226;200;200;2
001;0888721396233;200;159;2
001;2194384654084;299;259;7
001;2194384654091;199.95;199.95;8
The content of the file "prozente.txt" is:
25
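For reference: with 25 in prozente.txt, mulvalue works out to 0.75, so print(df) should show something like this (rows 0 and 3 are the matched ones; 199.95 * 0.75 rounds to 149.96):
   store            ean   price  retail_price  quantity
0      1   888721396226  200.00        150.00         2
1      1   888721396233  200.00        159.00         2
2      1  2194384654084  299.00        259.00         7
3      1  2194384654091  199.95        149.96         8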

Reshape Pandas dataframe (partial transpose)

I have a csv similar to the following, where the column heading specifies the time (hour number):
Day,Location,1,2,3
1/1/2021,A,0.26,0.25,0.49
1/1/2021,B,0.8,0.23,0.55
1/1/2021,C,0.32,0.11,0.58
1/2/2021,A,0.67,0.72,0.49
1/2/2021,B,0.25,0.09,0.56
1/2/2021,C,0.83,0.54,0.7
When I load it as a dataframe using
df = pd.read_csv(open('VirusLevels.csv', 'r'), index_col=[0,1], header=0)
Pandas creates a dataframe with indices Day and Location, and column names 1, 2, and 3.
I need it to be reshaped as shown below, where Day and Time are the indices, and the Location is the column heading:
I've tried a lot of things and followed a lot of rabbitholes, but haven't been successful. The most on-point example I could find suggested something like the following, but it doesn't work (says "KeyError: 'Day'").
df.melt(id_vars=['Day'], var_name='Time',
        value_name='VirusLevels').sort_values(by='Location').reset_index(drop=True)
Thanks in advance for any help.
Try:
df = pd.read_csv('VirusLevels.csv', index_col=[0,1])
df.rename_axis(columns='Time').stack().unstack('Location')
# or
# df.rename_axis('Time',axis='columns').stack().unstack('Location')
Output:
Location         A     B     C
Day      Time
1/1/2021 1    0.26  0.80  0.32
         2    0.25  0.23  0.11
         3    0.49  0.55  0.58
1/2/2021 1    0.67  0.25  0.83
         2    0.72  0.09  0.54
         3    0.49  0.56  0.70
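The melt attempt in the question fails with KeyError: 'Day' because Day is part of the index (index_col=[0,1]), not a column. If you prefer melt, a sketch of the same reshape (assuming the same VirusLevels.csv):
df = pd.read_csv('VirusLevels.csv')   # keep Day and Location as ordinary columns
out = (df.melt(id_vars=['Day', 'Location'], var_name='Time', value_name='VirusLevels')
         .set_index(['Day', 'Time', 'Location'])['VirusLevels']
         .unstack('Location'))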

monthly frequency time series data frame, fill NaNs with specific values

How do I assign values to the months from April to September?
I would like the April value to equal 42000, May=41000, June=61200, July=71000, August=71000.
df.index
RangeIndex(start=0, stop=60, step=1)
For a mapping like this, you would typically define a dictionary and map the values. Use .split to get the month part of the date and fillna to fill only the missing values. (September stays NaN in the output below because the question gives no value for it.)
Data:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date': ['2018-Jan', '2018-Feb', '2018-Mar', '2018-Apr', '2018-May',
                            '2018-Jun', '2018-Jul', '2018-Aug', '2018-Sep'],
                   'Value': [75267.169, 42258.868, 43793] + [np.NaN]*6})
Code:
d = {'Apr': 42000, 'May': 41000, 'Jun': 61200, 'Jul': 71000, 'Aug': 71000}
df['Value'] = df.Value.fillna(df.Date.str.split('-').str[1].map(d))
Output:
Date Value
0 2018-Jan 75267.169
1 2018-Feb 42258.868
2 2018-Mar 43793.000
3 2018-Apr 42000.000
4 2018-May 41000.000
5 2018-Jun 61200.000
6 2018-Jul 71000.000
7 2018-Aug 71000.000
8 2018-Sep NaN
A super simple (and ugly) way to do it, using pd.DataFrame.iloc to write into the Value column by position:
to_fill = [42000, 41000, 61200, 71000, 71000]
df.iloc[54:59, 1] = to_fill
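If you'd rather not hard-code positions, the same fill can be done by label (a sketch, assuming a Date column formatted like '2018-Apr' as in the sample data above; fill_map is just an illustrative name):
fill_map = {'2018-Apr': 42000, '2018-May': 41000, '2018-Jun': 61200,
            '2018-Jul': 71000, '2018-Aug': 71000}
df['Value'] = df['Value'].fillna(df['Date'].map(fill_map))   # map Date strings to values, fill only NaNs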

how to extract the unique values and its count of a column and store in data frame with index key

I am new to pandas. I have a simple question:
how do I extract the unique values of a column and their counts, and store them in a DataFrame with an index key?
I have tried:
df = df1['Genre'].value_counts()
and I am getting a Series, but I don't know how to convert it to a DataFrame object.
A pandas Series has a .to_frame() method. Try it:
df = df1['Genre'].value_counts().to_frame()
And if you want to "switch" the rows to columns:
df = df1['Genre'].value_counts().to_frame().T
Update: Full example if you want them as columns:
import pandas as pd
import numpy as np
np.random.seed(400) # To reproduce random variables
df1 = pd.DataFrame({
    'Genre': np.random.choice(['Comedy', 'Drama', 'Thriller'], size=10)
})
df = df1['Genre'].value_counts().to_frame().T
print(df)
Returns:
Thriller Comedy Drama
Genre 5 3 2
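Note that in pandas 2.0+, value_counts returns a Series named count, so the to_frame() column comes out labelled count rather than Genre; renaming first keeps the layout shown above (a small sketch):
df = df1['Genre'].value_counts().rename('Genre').to_frame().T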
Or try:
df = pd.DataFrame(df1['Genre'].value_counts())