Increment or reset a counter based on an existing value of a DataFrame column in pandas

I have a dataframe imported from csv file along the lines of the below:
   Value  Counter
1      5        0
2     15        1
3     15        2
4     15        3
5     10        0
6     15        1
7     15        1
I want to increment the counter only if the value equals 15, and otherwise reset it to 0. I tried cumsum but am stuck on how to reset it back to zero on a non-match.
Here is my code
import pandas as pd

df = pd.read_csv("H:/test/test.csv")
# counts every 15 seen so far, but never resets on a non-match
df["Counted"] = (df["Value"] == 15).cumsum()
df.to_csv('H:/test/List.csv', index=False)
Thanks for your help

Here's my approach:
s = df.Value.ne(15)
df['Counter'] = (~s).groupby(s.cumsum()).cumsum()
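To see why this works: s is True on the rows that should reset the counter, s.cumsum() gives every stretch of 15s its own group id, and the inner cumsum counts the 15s within each group. A self-contained sketch using the sample values from the question:

```python
import pandas as pd

df = pd.DataFrame({'Value': [5, 15, 15, 15, 10, 15, 15]})

# True on rows that should reset the counter (Value != 15)
s = df['Value'].ne(15)

# s.cumsum() labels each run of 15s with a group id;
# the cumsum of ~s then counts the 15s within each group
df['Counter'] = (~s).groupby(s.cumsum()).cumsum()
print(df['Counter'].tolist())  # [0, 1, 2, 3, 0, 1, 2]
```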


Pandas: Newbie question on compare and (re)calculate fields with pandas

What I need to do is to compare 2 fields in a row in a csv-file:
Data looks like this:
store;ean;price;retail_price;quantity
001;0888721396226;200;200;2
001;0888721396233;200;159;2
001;2194384654084;299;259;7
001;2194384654091;199.95;199.95;8
If "price" is equal to "retail_price", the retail_price field must be reduced by a given percentage, e.g. -10%.
So in the example data, the first and last lines should change to 180 and 179.955.
I'm completely new to pandas, and after reading the "getting started" section I did not find anything I could build on...
Any help or hint is appreciated (just point me in the right direction, and I will figure it out myself).
Kind regards!
Use Series.eq to compare the two columns, then multiply retail_price by 0.9 only where they match, via numpy.where:
mask = df['price'].eq(df['retail_price'])
df['retail_price'] = np.where(mask, df['retail_price'].mul(0.9), df['retail_price'])
print (df)
store ean price retail_price quantity
0 1 888721396226 200.00 180.000 2
1 1 888721396233 200.00 159.000 2
2 1 2194384654084 299.00 259.000 7
3 1 2194384654091 199.95 179.955 8
Or you can use DataFrame.loc to multiply only the matched rows by 0.9:
mask = df['price'].eq(df['retail_price'])
df.loc[mask, 'retail_price'] *= 0.9
# equivalent to
df.loc[mask, 'retail_price'] = df.loc[mask, 'retail_price'] * 0.9
EDIT: to select the rows that did not match (where mask is False), use:
df2 = df[~mask].copy()
print (df2)
store ean price retail_price quantity
1 1 888721396233 200.0 159.0 2
2 1 2194384654084 299.0 259.0 7
print (mask)
0 True
1 False
2 False
3 True
dtype: bool
This is my code:
import pandas as pd
import numpy as np

# create multiplier from static value in file "prozente.txt"
with open('prozente.txt', 'r') as f:
    prozente = int(f.readline())
mulvalue = 1 - (prozente / 100)

# header=0 so the original header row is replaced by names, not a data row
df = pd.read_csv('1.csv', sep=';', header=0,
                 names=['store', 'ean', 'price', 'retail_price', 'quantity'])
mask = df['price'].eq(df['retail_price'])
df['retail_price'] = np.where(mask, df['retail_price'].mul(mulvalue).round(2), df['retail_price'])
df2 = df[~mask].copy()
df.to_csv('output.csv', columns=['store', 'ean', 'price', 'retail_price', 'quantity'], sep=';', index=False)
print(df)
print(df2)
using this as 1.csv:
store;ean;price;retail_price;quantity
001;0888721396226;200;200;2
001;0888721396233;200;159;2
001;2194384654084;299;259;7
001;2194384654091;199.95;199.95;8
The content of file "prozente.txt" is
25

Reading CSV files and importing column data as NumPy arrays

I have many csv files, each containing two columns: 'Energy' and 'Count'. My goal is to import the data and keep it as two separate numpy arrays, say X and Y, where X holds all the Energy values and Y all the Count values. The problem is that my csv files have a blank row after each data row, which is causing a lot of trouble. How can I eliminate those lines and save the data as arrays?
Energy Counts
-0.4767 0
-0.4717 0
-0.4667 0
-0.4617 0
-0.4567 0
-0.4517 0
import pandas as pd
import glob
import numpy as np
import os
import matplotlib.pyplot as plt

file_path = "path"  # file path
read_files = glob.glob(os.path.join(file_path, "*.csv"))  # get all csv files

X = []  # create empty list
Y = []  # create empty list
for files in read_files:
    df = pd.read_csv(files, header=[0])
    X.append(df['Energy'])  # store X data
    Y.append(df['Counts'])  # store Y data

X = np.array(X)
Y = np.array(Y)
print(X.shape)
print(Y.shape)
plt.plot(X[50], Y[50])
plt.show()
Ideally, if the data were saved correctly, I would get my plot; but since it is not saving correctly, I am not getting any plot.
Blank lines are skipped by default (skip_blank_lines=True), so these lines won't be read into the dataframe:
df = pd.read_csv(files, header=[0], skip_blank_lines=True)
So your whole program should be something like this (each file has the same column headers in the first line, and the columns are separated by spaces):
...
df = pd.DataFrame()
for file in read_files:
    df = df.append(pd.read_csv(file, sep=r'\s+', skip_blank_lines=True))

df.plot(x='Energy', y='Counts')
plt.show()  # requires matplotlib.pyplot imported as plt
# save both columns in one file
df.to_csv('myXYFile.csv', index=False)
# or two files with one column each
df.Energy.to_csv('myXFile.csv', index=False)
df.Counts.to_csv('myYFile.csv', index=False)
TEST PROGRAM
import pandas as pd
import io

input1 = """Energy Counts
-0.4767 0
-0.4717 0
-0.4667 0
-0.4617 0
-0.4567 0
-0.4517 0
"""
input2 = """Energy Counts
-0.4767 0
-0.4717 0
"""

df = pd.DataFrame()
for input in (input1, input2):
    df = df.append(pd.read_csv(io.StringIO(input), sep=r'\s+', skip_blank_lines=True))
print(df)
TEST OUTPUT:
Energy Counts
0 -0.4767 0
1 -0.4717 0
2 -0.4667 0
3 -0.4617 0
4 -0.4567 0
5 -0.4517 0
0 -0.4767 0
1 -0.4717 0
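Note that DataFrame.append was removed in pandas 2.0; on current versions, the same test program can collect the pieces in a list and concatenate once, which is also faster:

```python
import io
import pandas as pd

input1 = """Energy Counts
-0.4767 0

-0.4717 0
"""
input2 = """Energy Counts
-0.4767 0
"""

# read each input, skipping blank lines (the default), then concat once
frames = [pd.read_csv(io.StringIO(text), sep=r'\s+', skip_blank_lines=True)
          for text in (input1, input2)]
df = pd.concat(frames, ignore_index=True)
print(df)
```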

Counter calling in pandas?

I want to use Counter values inside a pandas DataFrame.
Effort so far:
from __future__ import unicode_literals
import spacy, en_core_web_sm
from collections import Counter
import pandas as pd

nlp = en_core_web_sm.load()
c = Counter(token.pos_ for token in nlp('The cat sat on the mat.'))
sbase = sum(c.values())
for el, cnt in c.items():
    print(el, '{0:2.2f}%'.format((100.0 * cnt) / sbase))
df = pd.DataFrame.from_dict(c, orient='index').reset_index()
print(df)
Current Output:
index 0
0 NOUN 2
1 VERB 1
2 DET 2
3 ADP 1
4 PUNCT 1
Expected Output:
The below inside dataframe:
(u'NOUN', u'28.57%')
(u'VERB', u'14.29%')
(u'DET', u'28.57%')
(u'ADP', u'14.29%')
(u'PUNCT', u'14.29%')
How do I get el and cnt into the data frame?
This is a follow-up to an earlier question where I wanted the POS distribution listed as percentages:
Percentage Count Verb, Noun using Spacy?
My understanding was that I needed to put el and cnt in place of c below:
df = pd.DataFrame.from_dict(c, orient='index').reset_index()
I can only fix your output, since I do not have the original data (note the column is the integer 0 after reset_index, not the string '0'):
(df[0]/df[0].sum()).map("{0:.2%}".format)
Out[827]:
0 28.57%
1 14.29%
2 28.57%
3 14.29%
4 14.29%
Name: 0, dtype: object
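To build the expected (tag, percentage) rows directly, the Counter can be turned into a list of tuples before constructing the frame. A sketch using a hard-coded Counter standing in for the spacy output:

```python
import pandas as pd
from collections import Counter

# stand-in for the Counter produced from the spacy doc
c = Counter({'NOUN': 2, 'VERB': 1, 'DET': 2, 'ADP': 1, 'PUNCT': 1})

sbase = sum(c.values())
# one (el, cnt-as-percentage) tuple per POS tag
df = pd.DataFrame(
    [(el, '{0:.2%}'.format(cnt / sbase)) for el, cnt in c.items()],
    columns=['pos', 'percent'],
)
print(df)
```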

Reformatting pandas table when column contains repeated headers

I have the pandas DataFrame below, and I want to reshape it so that ["File Name", "File Start Time", etc.] become the column headers. I could loop through the rows looking for those strings, but perhaps there is a simpler option?
import pandas as pd
data = pd.read_csv(file_path + 'chb01-summary.txt', skiprows=28, header=None, sep=': ', engine='python')
file source https://www.physionet.org/pn6/chbmit/chb01/chb01-summary.txt
You can use read_csv and reshape by unstack:
url = 'https://www.physionet.org/pn6/chbmit/chb01/chb01-summary.txt'
df = pd.read_csv(url, skiprows=28, sep=r':\s+', names=['a','b'], engine='python')
print (df.head())
a b
0 File Name chb01_01.edf
1 File Start Time 11:42:54
2 File End Time 12:42:54
3 Number of Seizures in File 0
4 File Name chb01_02.edf
df = (df.set_index([df['a'].eq('File Name').cumsum(), 'a'])['b']
        .unstack()
        .reset_index(drop=True))
print (df.head())
a File End Time File Name File Start Time Number of Seizures in File \
0 12:42:54 chb01_01.edf 11:42:54 0
1 13:42:57 chb01_02.edf 12:42:57 0
2 14:43:04 chb01_03.edf 13:43:04 1
3 15:43:12 chb01_04.edf 14:43:12 1
4 16:43:19 chb01_05.edf 15:43:19 0
a Seizure End Time Seizure Start Time
0 None None
1 None None
2 3036 seconds 2996 seconds
3 1494 seconds 1467 seconds
4 None None
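The same reshape can also be written with pivot once the group id is stored as a column. A minimal sketch on inline key/value data mimicking the summary file (the real file has more fields and is read with read_csv as shown in the answer):

```python
import pandas as pd

# key/value rows in the shape produced by parsing the summary file
df = pd.DataFrame({
    'a': ['File Name', 'File Start Time', 'File End Time',
          'Number of Seizures in File'] * 2,
    'b': ['chb01_01.edf', '11:42:54', '12:42:54', '0',
          'chb01_02.edf', '12:42:57', '13:42:57', '0'],
})

# a new group starts at every 'File Name' row
df['g'] = df['a'].eq('File Name').cumsum()

# pivot the key/value pairs into one row per file
out = df.pivot(index='g', columns='a', values='b').reset_index(drop=True)
print(out)
```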

Division between two numbers in a Dataframe

I am trying to calculate a percent change between 2 numbers in one column when a signal from another column is triggered.
The trigger can be found with np.where(), but what I am having trouble with is the percent change. .pct_change does not work: .pct_change(-5) gives 16.03/20.35, and I want it the other way around, 20.35/16.03. See the table below. I tried returning the array from the index in np.where and feeding it to .iloc on the 'Close' column, but it says I can't use that array to get an .iloc position. Can anyone help me solve this problem? Thank you.
IdxNum | Close | Signal (1s)
==============================
0 21.45 0
1 21.41 0
2 21.52 0
3 21.71 0
4 20.8 0
5 20.35 0
6 20.44 0
7 16.99 0
8 17.02 0
9 16.69 0
10 16.03 1<< 26.9% <<< 20.35/16.03-1 (df.Close[5]/df.Close[10]-1)
11 15.67 0
12 15.6 0
You can try this code block (note .ix has been removed from pandas; use .loc instead):
import pandas as pd

# Create DataFrame
df = pd.DataFrame({'IdxNum': range(13),
                   'Close': [21.45, 21.41, 21.52, 21.71, 20.8, 20.35, 20.44,
                             16.99, 17.02, 16.69, 16.03, 15.67, 15.6],
                   'Signal': [0] * 13})
df.loc[10, 'Signal'] = 1

# Create a function that calculates the required diff
def cal_diff(row):
    if row['Signal'] == 1:
        signal_index = int(row['IdxNum'])
        row['diff'] = df.Close[signal_index - 5] / df.Close[signal_index] - 1
    return row

# Create a column and apply that difference
df['diff'] = 0
df = df.apply(cal_diff, axis=1)
In case you don't have an IdxNum column, you can use the index to calculate the difference:
# Create DataFrame
df = pd.DataFrame({
    'Close': [21.45, 21.41, 21.52, 21.71, 20.8, 20.35, 20.44,
              16.99, 17.02, 16.69, 16.03, 15.67, 15.6],
    'Signal': [0] * 13})
df.loc[10, 'Signal'] = 1

# Calculate the required difference
df['diff'] = 0
signal_index = df[df['Signal'] == 1].index[0]
df.loc[signal_index, 'diff'] = df.Close[signal_index - 5] / df.Close[signal_index] - 1
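If you prefer to avoid apply entirely, the same figure can be computed in a vectorised way with shift: the Close from five rows back divided by the current Close is exactly the ratio the question asks for. A sketch on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Close': [21.45, 21.41, 21.52, 21.71, 20.8, 20.35, 20.44,
              16.99, 17.02, 16.69, 16.03, 15.67, 15.6],
    'Signal': [0] * 13})
df.loc[10, 'Signal'] = 1

# Close five rows back divided by current Close, only where Signal == 1
df['diff'] = np.where(df['Signal'].eq(1),
                      df['Close'].shift(5) / df['Close'] - 1,
                      0)
print(round(df.loc[10, 'diff'], 4))  # 0.2695
```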