pandas dataframe export row column value

I import data from Excel into Python pandas with read_clipboard:
import pandas as pd
df = pd.read_clipboard()
The column index is the month (january, february, ..., december). The row index is the product name (orange, banana, etc.). The cell values are the monthly sales.
How can I export a CSV of the following format:
month;product;sales
To make it more visual: the input is the wide table (one row per product, one column per month), and the output should contain one line per month/product combination.

You can also use the xlrd package (note that xlrd 2.0+ reads only .xls files, so opening .xlsx needs xlrd < 2.0 or openpyxl).
Sample Book1.xlsx:
        january  february  march
Orange        4         2      4
banana        2         6      3
apple         5         1      7
Sample code:
import xlrd

book = xlrd.open_workbook("Book1.xlsx")
print(book.sheet_names())
first_sheet = book.sheet_by_index(0)
header = first_sheet.row_values(0)  # ['', 'january', 'february', 'march']
print(first_sheet.nrows)
for i in range(1, first_sheet.nrows):    # skip the header row
    row = first_sheet.row_values(i)      # e.g. ['Orange', 4.0, 2.0, 4.0]
    for j in range(1, len(row)):         # skip the product-name cell
        print("{};{};{}".format(header[j], row[0], row[j]))
Result:
january;Orange;4.0
february;Orange;2.0
march;Orange;4.0
january;banana;2.0
february;banana;6.0
march;banana;3.0
january;apple;5.0
february;apple;1.0
march;apple;7.0

If that is really all you need, something like this might solve the problem:
month = df1.columns.tolist() * len(df1)          # repeat the months once per product row
product = []
sales = []
for x in range(len(df1)):
    product += [df1.index[x]] * len(df1.columns)
    sales += df1.iloc[x].values.tolist()
df2 = pd.DataFrame({'month': month, 'product': product, 'sales': sales})
But you should look for a smarter way if you have a larger DataFrame, like what @Jon Clements suggested in the comment; one possibility is sketched below.
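A vectorized sketch along those lines (not necessarily what that comment proposed; it assumes df is the frame from read_clipboard above, with products as the index and months as the columns, and the output file name is just a placeholder):
out = (df.rename_axis('product')
         .reset_index()
         .melt(id_vars='product', var_name='month', value_name='sales'))
out = out[['month', 'product', 'sales']]   # month;product;sales column order
out.to_csv('sales_long.csv', sep=';', index=False)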

I finally solved it thanks to your advice: using unstack.
df2 = df.transpose()
df3 = df2.unstack()
df3.to_csv('my/path/name.csv', sep=';')
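If you also want the month;product;sales header row in the file, a variant of the same idea (a sketch, assuming months are the columns of the original df):
df3 = df.unstack().rename_axis(['month', 'product'])   # Series indexed by (month, product)
df3.reset_index(name='sales').to_csv('my/path/name.csv', sep=';', index=False)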

Related

Transform Dataframe rows to columns and additional steps

I have a dataframe like this:
import pandas as pd
import numpy as np
data = {'UNIT': ['UNIT_1', 'UNIT_2', 'UNIT_3', 'UNIT_4'],
        'Name_1': ['werner', 'otto', 'karl', 'fritz'],
        'Name_2': ['ottilie', 'anna', 'jasmin', ''],
        'Name_3': ['bello', 'kitti', '', '']}
df = pd.DataFrame(data)
df.replace('', np.nan, inplace=True)
display(df)
which looks like this:
     UNIT  Name_1   Name_2 Name_3
0  UNIT_1  werner  ottilie  bello
1  UNIT_2    otto     anna  kitti
2  UNIT_3    karl   jasmin    NaN
3  UNIT_4   fritz     NaN    NaN
The result that I want is one row per name, each paired with its UNIT (as in the answer's output below).
The code I have so far looks like this:
for index, row in df.iterrows():
    row_transposed = row.T
    row_transposed.dropna(inplace=True)
    df_row_transposed = pd.DataFrame(row_transposed)
    df_row_transposed_head = df_row_transposed.head(1)
    #display(df_row_transposed)
    #display(row_transposed_head)
    hr_unit = df_row_transposed_head.iloc[0]
    add_unit = hr_unit[index]
    for index, row in df_row_transposed.iterrows():
        df_row_transposed["UNIT"] = add_unit
        #row_transposed = row_transposed.iloc[index: , :]
    display(df_row_transposed)
which already creates one transposed block per row, but now I am stuck...
Any help is very much appreciated.
Try df.melt. It unpivots the Name_* columns into rows.
ddf = df.melt(id_vars='UNIT').sort_values(by='UNIT')
new_df = ddf[["value", "UNIT"]]
new_df = new_df.dropna().reset_index(drop=True)   # dropna() returns a copy, so assign the result back
new_df
Out[163]:
value UNIT
0 werner UNIT_1
1 ottilie UNIT_1
2 bello UNIT_1
3 otto UNIT_2
4 anna UNIT_2
5 kitti UNIT_2
6 karl UNIT_3
7 jasmin UNIT_3
8 fritz UNIT_4
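One caveat: sort_values is not stable by default, so the within-UNIT order (Name_1 before Name_2 before Name_3) is not guaranteed; passing kind='stable' preserves the order melt produced (a minor variant of the line above):
ddf = df.melt(id_vars='UNIT').sort_values(by='UNIT', kind='stable')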

Pandas: Newbie question on compare and (re)calculate fields with pandas

What I need to do is compare two fields in each row of a CSV file.
The data looks like this:
store;ean;price;retail_price;quantity
001;0888721396226;200;200;2
001;0888721396233;200;159;2
001;2194384654084;299;259;7
001;2194384654091;199.95;199.95;8
In case "price" is equal to "retail_price", the retail_price field must be reduced by a given percentage, e.g. 10%.
So in the example data, the retail_price in the first and last lines should be changed to 180 and 179.955.
I'm completely new to pandas, and after reading the "getting started" part I did not find anything I could build on ...
Any help or hint is appreciated (just point me in the right direction and I will figure it out myself).
Kind regards!
Use Series.eq to compare both values; where they are equal, multiply retail_price by 0.9, otherwise keep it, using numpy.where:
import numpy as np

mask = df['price'].eq(df['retail_price'])
df['retail_price'] = np.where(mask, df['retail_price'].mul(0.9), df['retail_price'])
print(df)
store ean price retail_price quantity
0 1 888721396226 200.00 180.000 2
1 1 888721396233 200.00 159.000 2
2 1 2194384654084 299.00 259.000 7
3 1 2194384654091 199.95 179.955 8
Or you can use DataFrame.loc to multiply only the matched rows by 0.9:
mask = df['price'].eq(df['retail_price'])
df.loc[mask, 'retail_price'] *= 0.9
# which works like:
df.loc[mask, 'retail_price'] = df.loc[mask, 'retail_price'] * 0.9
EDIT: to filter the rows that do not match the mask (where the mask is False), use:
df2 = df[~mask].copy()
print(df2)
store ean price retail_price quantity
1 1 888721396233 200.0 159.0 2
2 1 2194384654084 299.0 259.0 7
print(mask)
0 True
1 False
2 False
3 True
dtype: bool
This is my code:
import pandas as pd
import numpy as np

# create the multiplier from the static value in the file "prozente.txt"
with open('prozente.txt', 'r') as f:
    prozente = int(f.readline())
mulvalue = 1 - (prozente / 100)

# header=0 so the existing header row is replaced by the given names
df = pd.read_csv('1.csv', sep=';', header=0,
                 names=['store', 'ean', 'price', 'retail_price', 'quantity'])
mask = df['price'].eq(df['retail_price'])
df['retail_price'] = np.where(mask, df['retail_price'].mul(mulvalue).round(2), df['retail_price'])
df2 = df[~mask].copy()
df.to_csv('output.csv', columns=['store', 'ean', 'price', 'retail_price', 'quantity'], sep=';', index=False)
print(df)
print(df2)
using this as 1.csv:
store;ean;price;retail_price;quantity
001;0888721396226;200;200;2
001;0888721396233;200;159;2
001;2194384654084;299;259;7
001;2194384654091;199.95;199.95;8
The content of the file "prozente.txt" is:
25
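For reference: with 25 in prozente.txt, mulvalue works out to 0.75, so print(df) should show something like this (rows 0 and 3 are the matched ones; 199.95 * 0.75 rounds to 149.96):
   store            ean   price  retail_price  quantity
0      1   888721396226  200.00        150.00         2
1      1   888721396233  200.00        159.00         2
2      1  2194384654084  299.00        259.00         7
3      1  2194384654091  199.95        149.96         8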

Reshape Pandas dataframe (partial transpose)

I have a csv similar to the following, where the column heading specifies the time (hour number):
Day,Location,1,2,3
1/1/2021,A,0.26,0.25,0.49
1/1/2021,B,0.8,0.23,0.55
1/1/2021,C,0.32,0.11,0.58
1/2/2021,A,0.67,0.72,0.49
1/2/2021,B,0.25,0.09,0.56
1/2/2021,C,0.83,0.54,0.7
When I load it as a dataframe using
df = pd.read_csv(open('VirusLevels.csv', 'r'), index_col=[0,1], header=0)
Pandas creates a dataframe with indices Day and Location, and column names 1, 2, and 3.
I need it to be reshaped as shown below, where Day and Time are the indices, and the Location is the column heading:
I've tried a lot of things and followed a lot of rabbitholes, but haven't been successful. The most on-point example I could find suggested something like the following, but it doesn't work (says "KeyError: 'Day'").
df.melt(id_vars=['Day'], var_name='Time',
        value_name='VirusLevels').sort_values(by='Location').reset_index(drop=True)
Thanks in advance for any help.
Try:
df = pd.read_csv('VirusLevels.csv', index_col=[0,1])
df.rename_axis(columns='Time').stack().unstack('Location')
# or
# df.rename_axis('Time',axis='columns').stack().unstack('Location')
Output:
Location         A     B     C
Day      Time
1/1/2021 1    0.26  0.80  0.32
         2    0.25  0.23  0.11
         3    0.49  0.55  0.58
1/2/2021 1    0.67  0.25  0.83
         2    0.72  0.09  0.54
         3    0.49  0.56  0.70
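The melt attempt in the question fails with KeyError: 'Day' because Day is part of the index (index_col=[0,1]), not a column. If you prefer melt, a sketch of the same reshape (assuming the same VirusLevels.csv):
df = pd.read_csv('VirusLevels.csv')   # keep Day and Location as ordinary columns
out = (df.melt(id_vars=['Day', 'Location'], var_name='Time', value_name='VirusLevels')
         .set_index(['Day', 'Time', 'Location'])['VirusLevels']
         .unstack('Location'))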

monthly frequency time series data frame, fill NaNs with specific values

How do I assign values to the months from April to September?
I would like the April value to equal 42000, May=41000, June=61200, July=71000, August=71000.
df.index
RangeIndex(start=0, stop=60, step=1)
For a mapping like this, you would typically define a dictionary and map the values. Use .split to get the month part of the date and fillna to fill only the missing values. (September stays NaN in the output below because the question gives no value for it.)
Data:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date': ['2018-Jan', '2018-Feb', '2018-Mar', '2018-Apr', '2018-May',
                            '2018-Jun', '2018-Jul', '2018-Aug', '2018-Sep'],
                   'Value': [75267.169, 42258.868, 43793] + [np.NaN]*6})
Code:
d = {'Apr': 42000, 'May': 41000, 'Jun': 61200, 'Jul': 71000, 'Aug': 71000}
df['Value'] = df.Value.fillna(df.Date.str.split('-').str[1].map(d))
Output:
Date Value
0 2018-Jan 75267.169
1 2018-Feb 42258.868
2 2018-Mar 43793.000
3 2018-Apr 42000.000
4 2018-May 41000.000
5 2018-Jun 61200.000
6 2018-Jul 71000.000
7 2018-Aug 71000.000
8 2018-Sep NaN
A super simple (and ugly) way to do it, using pd.DataFrame.iloc to write into the Value column by position:
to_fill = [42000, 41000, 61200, 71000, 71000]
df.iloc[54:59, 1] = to_fill
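If you'd rather not hard-code positions, the same fill can be done by label (a sketch, assuming a Date column formatted like '2018-Apr' as in the sample data above; fill_map is just an illustrative name):
fill_map = {'2018-Apr': 42000, '2018-May': 41000, '2018-Jun': 61200,
            '2018-Jul': 71000, '2018-Aug': 71000}
df['Value'] = df['Value'].fillna(df['Date'].map(fill_map))   # map Date strings to values, fill only NaNs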

how to extract the unique values and its count of a column and store in data frame with index key

I am new to pandas. I have a simple question:
how do I extract the unique values of a column and their counts, and store them in a DataFrame with an index key?
I have tried:
df = df1['Genre'].value_counts()
and I am getting a Series, but I don't know how to convert it to a DataFrame object.
A pandas Series has a .to_frame() method. Try it:
df = df1['Genre'].value_counts().to_frame()
And if you want to "switch" the rows to columns:
df = df1['Genre'].value_counts().to_frame().T
Update: Full example if you want them as columns:
import pandas as pd
import numpy as np
np.random.seed(400) # To reproduce random variables
df1 = pd.DataFrame({
    'Genre': np.random.choice(['Comedy', 'Drama', 'Thriller'], size=10)
})
df = df1['Genre'].value_counts().to_frame().T
print(df)
Returns:
Thriller Comedy Drama
Genre 5 3 2
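Note that in pandas 2.0+, value_counts returns a Series named count, so the to_frame() column comes out labelled count rather than Genre; renaming first keeps the layout shown above (a small sketch):
df = df1['Genre'].value_counts().rename('Genre').to_frame().T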
Or try:
df = pd.DataFrame(df1['Genre'].value_counts())