I need to mark rows in a time series whose timestamps fall within given time-of-day blocks; say I have, e.g.,
values = ([ 'motorway' ] * 5000) + ([ 'link' ] * 300) + ([ 'motorway' ] * 7000)
df = pd.DataFrame.from_dict({
'timestamp': pd.date_range(start='2018-1-1', end='2018-1-2', freq='s').tolist()[:len(values)],
'road_type': values,
})
df.set_index('timestamp', inplace=True)
I need to add a column rush that marks rows where the timestamp is between 06:00 and 09:00 or between 15:30 and 19:00. I've seen between_time but I don't know how to apply it here.
edit: based on this answer I managed to put together
df['rush'] = (df.index.isin(df.between_time('00:00:15', '00:00:20').index)
              | df.index.isin(df.between_time('00:00:54', '00:00:59').index))
but I wonder whether there isn't a more elegant way.
One alternative using between. Note that datetime.time takes (hour, minute, second), so 06:00 is t(6, 0), not t(0, 6, 0). (The sample data only covers the first ~3.4 hours of the day, so both windows come out empty here, but the pattern carries over.)
import pandas as pd
from datetime import time as t

values = (['motorway'] * 5000) + (['link'] * 300) + (['motorway'] * 7000)
df = pd.DataFrame.from_dict({
    'timestamp': pd.date_range(start='2018-1-1', end='2018-1-2',
                               freq='s').tolist()[:len(values)],
    'road_type': values,
})
time = df['timestamp'].dt.time
df['rush'] = (time.between(t(6, 0), t(9, 0))
              | time.between(t(15, 30), t(19, 0))).values
Or slicing the df using datetime.time:

df = df.set_index(df.timestamp.dt.time)
df['rush'] = df.index.isin(df[t(6, 0):t(9, 0)].index
                           .union(df[t(15, 30):t(19, 0)].index))
df = df.reset_index(drop=True)
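A third option stays closest to the between_time the question mentions: DatetimeIndex.indexer_between_time returns the integer positions of the rows inside a time window. A minimal sketch, assuming df is indexed by timestamp as in the question's snippet:

import numpy as np

# build a boolean flag and switch on the positions inside each window
rush = np.zeros(len(df), dtype=bool)
rush[df.index.indexer_between_time('06:00', '09:00')] = True
rush[df.index.indexer_between_time('15:30', '19:00')] = True
df['rush'] = rush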
I want to get 'n' random values from the array below whenever I execute the test script. How can I achieve this in Karate, in a feature file?
[
"2972029540",
"2972033041",
"2972030914",
"2972028446",
"2972030851",
"2972026534",
"2972029484"
]
Here you go:
* def random = function(max){ return Math.floor(Math.random() * max) + 1 }
* def data = [ "2972029540", "2972033041", "2972030914", "2972028446", "2972030851", "2972026534", "2972029484" ]
# pick a random count between 1 and data.length
* def count = random(data.length)
* print 'random count is', count
# note: slice(0, count) takes the first count items; shuffle data first if the picked values themselves should be random
* def temp = data.slice(0, count)
* print temp
Read this for more info: https://stackoverflow.com/a/53975071/143475
I want to add relationships to the column 'relations' based on rel_list. Specifically, for each tuple, e.g. ('a', 'b'), I want to replace the empty 'relations' value with 'b' in the row for 'a', but without duplicates: don't also write 'a' into the row for 'b', since that pair would count as a duplicate. The following code doesn't work fully correctly:
import pandas as pd
data = {
"names": ['a', 'b', 'c', 'd'],
"ages": [50, 40, 45, 20],
"relations": ['', '', '', '']
}
rel_list = [('a', 'b'), ('a', 'c'), ('c', 'd')]
df = pd.DataFrame(data)
for rel_tuple in rel_list:
head = rel_tuple[0]
tail = rel_tuple[1]
df.loc[df.names == head, 'relations'] = tail
print(df)
The current result of df is:
  names  ages relations
0     a    50         c
1     b    40
2     c    45         d
3     d    20
However, the correct one is:
  names  ages relations
0     a    50         b
0     a    50         c
1     b    40
2     c    45         d
3     d    20
New rows need to be added, like the second row above. How can I do that?
You can craft a dataframe and merge:
(df.drop('relations', axis=1)
.merge(pd.DataFrame(rel_list, columns=['names', 'relations']),
on='names',
how='outer'
)
# .fillna('') # uncomment to replace NaN with empty string
)
Output:
  names  ages relations
0     a    50         b
1     a    50         c
2     b    40       NaN
3     c    45         d
4     d    20       NaN
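With the .fillna('') line uncommented, this should match the question's expected output up to the index labels.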
Instead of updating df you can create a new one and add relations row by row:
import pandas as pd
data = {
"names": ['a', 'b', 'c', 'd'],
"ages": [50, 40, 45, 20],
"relations": ['', '', '', '']
}
rel_list = [('a', 'b'), ('a', 'c'), ('c', 'd')]
df = pd.DataFrame(data)
new_df = pd.DataFrame(data)
new_df.loc[:, 'relations'] = ''
for head, tail in rel_list:
    new_row = df[df.names == head].copy()  # copy so the assignment below doesn't touch df
    new_row['relations'] = tail
    new_df = pd.concat([new_df, new_row])  # DataFrame.append was removed in pandas 2.0
print(new_df)
Output:
  names  ages relations
0     a    50
1     b    40
2     c    45
3     d    20
0     a    50         b
0     a    50         c
2     c    45         d
Then, if needed, in the end you can delete all rows without value in 'relations':
new_df = new_df[new_df['relations']!='']
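As an aside, concatenating inside a loop copies the growing frame on every iteration; the usual pattern is to collect the pieces and concatenate once. A sketch of the same loop in that style:

pieces = [pd.DataFrame(data)]
for head, tail in rel_list:
    piece = df[df.names == head].copy()
    piece['relations'] = tail
    pieces.append(piece)
new_df = pd.concat(pieces)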
Is there any way, when using
df = pd.read_excel(r'a.xlsx')
df2 = df.groupby(by=["col"], as_index=False).mean()
to include a new column with the number of rows grouped into each output row?
In the absence of sample data, I'm assuming you have multiple numeric columns. You can use apply() to calculate all the means and then append len() to the resulting series:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col": np.random.choice(list("ABCD"), 200),
        "val": np.random.uniform(1, 5, 200),
        "val2": np.random.uniform(5, 10, 200),
    }
)
df2 = df.groupby(by=["col"], as_index=False).apply(
    # Series.append was removed in pandas 2.0, so concat the means with the group size
    lambda d: pd.concat([d.select_dtypes("number").mean(), pd.Series({"len": len(d)})])
)
df2
df2
  col      val     val2  len
0   A  3.13064  7.63837   42
1   B   3.1057  7.50656   44
2   C   3.0111  7.82628   54
3   D  3.20709  7.32217   60
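If the columns are known up front, pandas named aggregation can express the same thing without apply(); a sketch under that assumption ("size" on any column counts the rows per group):

df2 = df.groupby("col", as_index=False).agg(
    val=("val", "mean"),
    val2=("val2", "mean"),
    len=("val", "size"),
)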
In response to the comment, a weighted-average version of the same pattern:
import numpy as np
import pandas as pd

def w_avg(df, values, weights, exp):
    # weighted average of `values`, with the weights raised to the power `exp`
    d = df[values]
    w = df[weights] ** exp
    return (d * w).sum() / w.sum()

dfg1 = pd.DataFrame(
    {
        "Jogador": np.random.choice(list("ABCD"), 200),
        "Evento": np.random.choice(list("XYZ"), 200),
        "Rating Calculado BW": np.random.uniform(1, 5, 200),
        "Lances": np.random.uniform(5, 10, 200),
    }
)
dfg = dfg1.groupby(by=["Jogador", "Evento"]).apply(
    # Series.append was removed in pandas 2.0, so concat the weighted average
    # (the same value lands in every numeric column) with the group size
    lambda g: pd.concat(
        [
            g.select_dtypes("number").agg(lambda d: w_avg(g, "Rating Calculado BW", "Lances", 1)),
            pd.Series({"len": len(g)}),
        ]
    )
)
dfg
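The apply() detour can also be avoided by materialising the weighted terms as helper columns first; a minimal sketch (the helper names w and wx are mine, exp fixed at 1):

tmp = dfg1.assign(
    w=dfg1["Lances"] ** 1,
    wx=dfg1["Rating Calculado BW"] * dfg1["Lances"] ** 1,
)
dfg = tmp.groupby(["Jogador", "Evento"]).agg(
    wx=("wx", "sum"), w=("w", "sum"), len=("w", "size")
)
dfg["Rating Calculado BW"] = dfg["wx"] / dfg["w"]
dfg = dfg[["Rating Calculado BW", "len"]]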
I have a table that looks like this:
   A       B     C
1  foo
2  foobar  blah
3
I want to count up the non empty columns from A, B and C to get a summary column like this:
   A       B     C  sum
1  foo                1
2  foobar  blah       2
3                     0
Here is how I'm trying to do it:
import pandas as pd
df = { 'A' : ["foo", "foobar", ""],
'B' : ["", "blah", ""],
'C' : ["","",""]}
df = pd.DataFrame(df)
print(df)
df['sum'] = df[['A', 'B', 'C']].notnull().sum(axis=1)
df['sum'] = (df[['A', 'B', 'C']] != "").sum(axis=1)
These last two lines are different ways to get what I want but they aren't working. Any suggestions?
df['sum'] = (df[['A', 'B', 'C']] != "").sum(axis=1)
Worked. Thanks for the assistance.
This one-liner worked for me :)
df["sum"] = df.replace("", np.nan).T.count().reset_index().iloc[:,1]
I want to insert dictionaries from a list of dictionaries as rows in a DataFrame, with each row's index equal to the value of its 'index' key, while moving the rows previously occupying those indices one step down so they don't get overwritten.
ex.
List:
rows=
[{'Abbreviation': u'3-HYDROXY-3-METHYL-GLUTARYL-COA_m',
'Charge': -5.0,
'Charged Formula': u'C27H39N7O20P3S1',
'Neutral Formula': u'C27H44N7O20P3S1',
'index': 101},
{'Abbreviation': u'5-METHYL-THF_c',
'Charge': -2.0,
'Charged Formula': u'C20H23N7O6',
'Neutral Formula': u'C20H25N7O6',
'index': 204}]
DataFrame: df
Before:
index Abbreviation
101 foo
204 bar
After:
index Abbreviation | etc..
101 3-HYDROXY-3-METHYL-GLUTARYL-COA_m .
102 foo
204 5-METHYL-THF_c
205 bar
Any help is appreciated. Thank you very much!
Regular list insertion should work here. Posting some sample code below:
d1 = {'index': 101}
d2 = {'index': 102}
d4 = {'index': 104}

l = [d1, d4]

# assuming elements of l have the key 'index' and
# are sorted in ascending order of 'index'

# inserting d2 in l
for i, v in enumerate(l):
    if v['index'] > d2['index']:
        l.insert(i, d2)
        break
else:
    # no element is larger, so d2 goes at the end
    l.append(d2)
List contents:
Before inserting d2
[{'index': 101}, {'index': 104}]
After inserting d2
[{'index': 101}, {'index': 102}, {'index': 104}]
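To get from there back to the DataFrame in the question, one hedged sketch: bump the index of each existing row that clashes with an inserted one, then concatenate and sort. This assumes, as in the example, that each inserted index collides with at most one existing row:

import pandas as pd

# `rows` as in the question (trimmed here to two keys per dict)
rows = [{'Abbreviation': u'3-HYDROXY-3-METHYL-GLUTARYL-COA_m', 'index': 101},
        {'Abbreviation': u'5-METHYL-THF_c', 'index': 204}]

df = pd.DataFrame({'index': [101, 204], 'Abbreviation': ['foo', 'bar']})
new = pd.DataFrame(rows)

# shift clashing rows one step down, then combine and sort
df.loc[df['index'].isin(new['index']), 'index'] += 1
df = (pd.concat([df, new], ignore_index=True)
        .sort_values('index')
        .reset_index(drop=True))
print(df)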