Performing a mod function on time data column pandas python

Hello, I want to apply a mod operation (% 24) to the hour part of the time column.
I believe the time column is stored as strings.
How should I go about performing this operation? Sample data:
sales_id,date,time,shopping_cart,price,parcel_size,Customer_lat,Customer_long,isLoyaltyProgram,nearest_storehouse_id,nearest_storehouse,dist_to_nearest_storehouse,delivery_cost
ORD0056604,24/03/2021,45:13:45,"[('bed', 3), ('Chair', 1), ('wardrobe', 4), ('side_table', 2), ('Dining_table', 2), ('mattress', 1)]",3152.77,medium,-38.246,145.61984,1,4,Sunshine,78.43,5.8725000000000005
ORD0096594,13/12/2018,54:22:20,"[('Study_table', 4), ('wardrobe', 4), ('side_table', 1), ('Dining_table', 2), ('sofa', 4), ('Chair', 3), ('mattress', 1)]",3781.38,large,-38.15718,145.05072,1,4,Sunshine,40.09,5.8725000000000005
ORD0046310,16/02/2018,17:23:36,"[('mattress', 2), ('wardrobe', 1), ('side_table', 2), ('sofa', 1), ('Chair', 3), ('Study_table', 4)]",2219.09,medium,144.69623,-38.00731,0,2,Footscray,34.2,16.9875
ORD0031675,25/06/2018,17:38:48,"[('bed', 4), ('side_table', 1), ('Chair', 1), ('mattress', 3), ('Dining_table', 2), ('sofa', 2), ('wardrobe', 2)]",4542.1,large,144.65506,-38.40669,1,2,Footscray,72.72,18.274500000000003
ORD0019799,05/01/2021,18:37:16,"[('wardrobe', 1), ('Study_table', 3), ('sofa', 4), ('side_table', 2), ('Chair', 4), ('Dining_table', 4), ('bed', 1)]",3132.71,L,-37.66022,144.94286,1,0,Clayton,17.77,14.931
ORD0041462,25/12/2018,07:29:33,"[('Chair', 3), ('bed', 1), ('mattress', 3), ('side_table', 3), ('wardrobe', 3), ('sofa', 4)]",4416.42,medium,-38.39154,145.87448,0,6,Sunshine,105.91,6.151500000000001
ORD0047848,30/07/2021,34:18:01,"[('Chair', 3), ('bed', 3), ('wardrobe', 4)]",2541.04,small,-37.4654,144.45832,1,2,Footscray,60.85,18.4635

Convert the values to timedeltas with to_timedelta, then drop the days part by converting to strings and keeping the last 8 characters (HH:MM:SS):
print (df)
sales_id date time
0 ORD0056604 24/03/2021 45:13:45
1 ORD0096594 13/12/2018 54:22:20
print (pd.to_timedelta(df['time']))
0 1 days 21:13:45
1 2 days 06:22:20
Name: time, dtype: timedelta64[ns]
df['time'] = pd.to_timedelta(df['time']).astype(str).str[-8:]
print (df)
sales_id date time
0 ORD0056604 24/03/2021 21:13:45
1 ORD0096594 13/12/2018 06:22:20
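If only the hour within the day is needed as a number, the timedelta components can be read directly instead of slicing a string. This is a minimal sketch, assuming the two sample rows above; the column name hour_mod24 is made up for illustration.

```python
import pandas as pd

# Two sample rows from the question (times overflow past 24 hours)
df = pd.DataFrame({"time": ["45:13:45", "54:22:20"]})

td = pd.to_timedelta(df["time"])

# .dt.components splits each timedelta into days, hours, minutes, ...
# so "hours" is already the hour within the day, i.e. total hours % 24
df["hour_mod24"] = td.dt.components["hours"]  # hypothetical column name

print(df)
```

The result stays numeric, which may be convenient if the hour is used in further calculations.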
If the overflowing days should also be added to the date column, add the timedeltas (built from the original time strings) to the dates and then format both parts with Series.dt.strftime:
dates = pd.to_datetime(df['date'], dayfirst=True) + pd.to_timedelta(df['time'])
df['time'] = dates.dt.strftime('%H:%M:%S')
df['date'] = dates.dt.strftime('%d/%m/%Y')
print (df)
sales_id date time
0 ORD0056604 25/03/2021 21:13:45
1 ORD0096594 15/12/2018 06:22:20
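The literal hour % 24 operation from the question can also be applied to the strings themselves, without timedeltas. This is a minimal sketch, assuming every value has the form H:MM:SS with an integer hour; time_mod is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame({"time": ["45:13:45", "54:22:20", "17:23:36"]})

# Split once at the first colon: column 0 is the hour, column 1 is "MM:SS"
parts = df["time"].str.split(":", n=1, expand=True)

# Apply the modulo to the integer hour and zero-pad back to two digits
hours = (parts[0].astype(int) % 24).astype(str).str.zfill(2)

df["time_mod"] = hours + ":" + parts[1]  # hypothetical column name
print(df["time_mod"].tolist())
```

Unlike the timedelta approach, this one does not track how many days were dropped, so it only fits the case where the date column should stay unchanged.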

Related

Google BigQuery Standard SQL get weight summarize result by group

Original data
structure(list(Year = c(1999, 1999, 1999, 2000, 2000, 2000),
Country = c("a", "b", "b", "a", "a", "b"), number = c(2,
3, 4, 5, 3, 6), result = c(2, 4, 5, 6, 2, 2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))
What I need is a table with the columns
Year Country weightresult
where
weightresult = result * (number / sum(number per Year, Country))
that is, result is weighted by number, and number is summed within each (Year, Country) group.
The intermediate result is
structure(list(Year = c(1999, 1999, 1999, 2000, 2000, 2000),
Country = c("a", "b", "b", "a", "a", "b"), number = c(2,
3, 4, 5, 3, 6), result = c(2, 4, 5, 6, 2, 2), weight = c(2,
7, 7, 8, 8, 6), wre = c(2, 1.71428571428571, 2.85714285714286,
3.75, 0.75, 2)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
The final result I need is
structure(list(Country = c("a", "a", "b", "b"), Year = c(1999,
2000, 1999, 2000), wre = c(2, 4.5, 4.57142857142857, 2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
How can I get this final result in BigQuery Standard SQL? My attempt:
SELECT
Year,
Country,
(number/(SUM(number) OVER (PARTITION BY Year, Country))) * result AS wre,
Count(*),
FROM `table`
Where
Year<=2020
GROUP BY Year,Country
ORDER BY Year,Country
And the error is
SELECT list expression references column number which is neither grouped nor aggregated at ...

Use the query below:
select
year,
country,
sum(number * result) / sum(number) as weighted_result
from your_table
where year <= 2020
group by year,country
order by year,country
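Summing result * number / SUM(number) within a (Year, Country) group equals SUM(number * result) / SUM(number), so no window function is needed. For the sample data this returns the four rows of the desired final result.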
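The aggregate form can be checked outside BigQuery. The following is a small pandas sketch, assuming the sample data from the question; it is only a cross-check of the arithmetic, not part of the SQL answer, and the helper column num_x_result is invented for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Year": [1999, 1999, 1999, 2000, 2000, 2000],
    "Country": ["a", "b", "b", "a", "a", "b"],
    "number": [2, 3, 4, 5, 3, 6],
    "result": [2, 4, 5, 6, 2, 2],
})

# Same idea as sum(number * result) / sum(number) ... group by year, country
df["num_x_result"] = df["number"] * df["result"]  # hypothetical helper column
sums = df.groupby(["Year", "Country"])[["num_x_result", "number"]].sum()
weighted = (sums["num_x_result"] / sums["number"]).rename("weighted_result").reset_index()

print(weighted)
```

The four values agree with the wre column of the desired result in the question (2, 4.5714..., 4.5 and 2).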

How do I select rows from pandas df without returning False values?

I have a df and I need to select rows based on some conditions in multiple columns.
Here is what I have
import pandas as pd
dat = [('p','q', 5), ('k','j', 2), ('p','-', 5), ('-','p', 4), ('q','pkjq', 3), ('pkjq','q', 2)]
df = pd.DataFrame(dat, columns = ['a', 'b', 'c'])
df_dat = df[(df[['a','b']].isin(['k','p','q','j']) & df['c'] > 3)] | df[(~df[['a','b']].isin(['k','p','q','j']) & df['c'] > 2 )]
Expected result = [('p','q', 5), ('p','-', 5), ('-','p', 4), ('q','pkjq', 3)]
The result I am getting is an all-False dataframe.

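When the condition is this complicated, I recommend building it outside the indexing expression. Reduce the two-column isin result to one boolean per row with any(axis=1), and compare c with .gt() so that operator precedence is not an issue:
cond1 = df[['a','b']].isin(['k','p','q','j']).any(axis=1) & df['c'].gt(3)
cond2 = (~df[['a','b']].isin(['k','p','q','j'])).any(axis=1) & df['c'].gt(2)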
out = df.loc[cond1 | cond2]
Out[305]:
a b c
0 p q 5
2 p - 5
3 - p 4
4 q pkjq 3
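One contributing cause of the all-False result in the question is operator precedence: in Python, & binds more tightly than >, so mask & df['c'] > 3 is evaluated as (mask & df['c']) > 3. A minimal sketch of the same selection with explicit parentheses instead of .gt(), assuming the sample data from the question (the names allowed and in_allowed are illustrative):

```python
import pandas as pd

dat = [('p', 'q', 5), ('k', 'j', 2), ('p', '-', 5),
       ('-', 'p', 4), ('q', 'pkjq', 3), ('pkjq', 'q', 2)]
df = pd.DataFrame(dat, columns=['a', 'b', 'c'])

allowed = ['k', 'p', 'q', 'j']
in_allowed = df[['a', 'b']].isin(allowed)  # boolean DataFrame, one flag per cell

# Parenthesise each comparison so it is evaluated before &
cond1 = in_allowed.any(axis=1) & (df['c'] > 3)
cond2 = (~in_allowed).any(axis=1) & (df['c'] > 2)

# Combine the row masks first, then index the frame once
out = df[cond1 | cond2]
print(out)
```

Combining the boolean masks before indexing also avoids applying | to two already-filtered DataFrames, as the original expression did.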

Is there any way to map points between 2 elliptic curves?

The elliptic curve E1: y^2 = x^3+7 over F17 has the base point G = (15, 13),
and the second elliptic curve E2: y^2 = x^3+7 over F31 uses the same base point G = (15, 13).
My question is: is there any way to calculate the equivalent point over F31 from the point over F17?
For example, given 7G = (10, 15) on the curve over F17, how can I calculate 7G over F31? The result should be 7G = (12, 14) over F31.
Below are the multiples of G on the two curves:
#----Curve F17-------#
1G = (15, 13)
2G = (2, 10)
3G = (8, 3)
4G = (12, 1)
5G = (6, 6)
6G = (5, 8)
7G = (10, 15)
8G = (1, 12)
9G = (3, 0)
10G = (1, 5)
11G = (10, 2)
12G = (5, 9)
13G = (6, 11)
14G = (12, 16)
15G = (8, 14)
16G = (2, 7)
17G = (15, 4)
#----Curve F31-------#
1G = (15, 13)
2G = (29, 17)
3G = (1, 22)
4G = (20, 19)
5G = (21, 17)
6G = (23, 23)
7G = (12, 14)
8G = (11, 27)
9G = (25, 22)
10G = (7, 19)
11G = (27, 27)
12G = (5, 9)
13G = (0, 24)
14G = (4, 12)
15G = (22, 23)
16G = (3, 13)
17G = (13, 18)
18G = (17, 23)
19G = (24, 4)
20G = (24, 27)
21G = (17, 8)
22G = (13, 13)
23G = (3, 18)
24G = (22, 8)
25G = (4, 19)
26G = (0, 7)
27G = (5, 22)
28G = (27, 4)
29G = (7, 12)
30G = (25, 9)
31G = (11, 4)
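No answer was recorded for this question. As a hedged observation only: the multiples listed above can be reproduced by running the usual chord-and-tangent addition formulas separately modulo 17 and modulo 31, starting from the scalar k. The sketch below does that with hypothetical helpers ec_add, ec_mul and on_curve (none of them from the original post). It also checks one detail: (15, 13) satisfies y^2 = x^3 + 7 modulo 17, but modulo 31 it satisfies y^2 = x^3 + 18 instead, so the F31 list appears to belong to a curve with a different constant term. The addition formulas do not involve that constant, which is why the list could still be generated.

```python
def ec_add(P, Q, p):
    """Add two points on y^2 = x^3 + b over F_p (a = 0); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) is the point at infinity
    if P == Q:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, p)  # tangent slope (Python 3.8+ modular inverse)
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p)   # chord slope
    lam %= p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def ec_mul(k, P, p):
    """Compute kP by repeated addition (fine for the tiny scalars used here)."""
    R = None
    for _ in range(k):
        R = ec_add(R, P, p)
    return R


def on_curve(P, b, p):
    """Check whether P satisfies y^2 = x^3 + b modulo p."""
    x, y = P
    return (y * y - (x ** 3 + b)) % p == 0


G = (15, 13)
print(ec_mul(7, G, 17))    # 7G computed modulo 17
print(ec_mul(7, G, 31))    # 7G computed modulo 31
print(on_curve(G, 7, 17), on_curve(G, 7, 31), on_curve(G, 18, 31))
```

In this sketch the F31 value is obtained from the scalar 7, not from the coordinates (10, 15), so it does not by itself show a way to map one point to the other when only the coordinates are known.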

Pandas: how to find the position of cells that contain a sub-string

Example:
Price | Rate p/lot | Total Comm|
947.2 1.25 BAM 1.25
129.3 2.1 $ 1.25
161.69 $ 0.8 CAD 2.00
If I search for ['$','CAD'], the expected output is:
[(1, 2), (2, 1),(2,2)]

Sorry, I found a solution like this; it may help someone:
import pandas as pd
df = pd.DataFrame([[947.2, 1.25, 'BAM 1.25'],
[129.3, 2.1, '$ 1.25'],
[161.69, '0.8 $', 'CAD 2.00']],
columns=['Price', 'Rate p/lot', 'Total Comm'])
row, column = (df.applymap(lambda x: x if any(s in str(x) for s in ['$','CAD']) else None )).values.nonzero()
t = list(zip(row,column))

You can use in with applymap:
i, j = (df.applymap(lambda x: '$' in str(x))).values.nonzero()
t = list(zip(i, j))
print (t)
[(1, 2), (2, 1)]
L = ['$','CAD']
i, j = (df.applymap(lambda x: any(y for y in L if y in str(x)))).values.nonzero()
#another solution
#i, j = (df.applymap(lambda x: any(s in str(x) for s in L))).values.nonzero()
t = list(zip(i, j))
print (t)
[(1, 2), (2, 1), (2, 2)]
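The same search can be written as a boolean mask over the whole frame. This is a minimal sketch, assuming the sample frame from the question; needles, pattern and positions are illustrative names, and re.escape is used so that '$' is matched literally instead of as a regex anchor.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame([[947.2, 1.25, 'BAM 1.25'],
                   [129.3, 2.1, '$ 1.25'],
                   [161.69, '0.8 $', 'CAD 2.00']],
                  columns=['Price', 'Rate p/lot', 'Total Comm'])

needles = ['$', 'CAD']
# Build one regex alternation with every search term escaped
pattern = '|'.join(re.escape(s) for s in needles)

# Column-wise substring test on the stringified frame
mask = df.astype(str).apply(lambda col: col.str.contains(pattern, regex=True))

# np.argwhere lists the (row, column) index pairs of the True cells
positions = [(int(r), int(c)) for r, c in np.argwhere(mask.to_numpy())]
print(positions)
```

The pairs come back in row-major order, and mask itself can be reused, for example to highlight or replace the matching cells.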

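Use str.contains:
from functools import reduce
from itertools import product
df = df.astype(str)
# product([i], rows) yields (column, row) pairs; for this data they happen to coincide with the expected (row, column) pairs
result = reduce(lambda x,y:x+y, [list(product([i],list(df.iloc[:,i][df.iloc[:,i].str.contains(r'\$|CAD')].index))) for i in range(len(df.columns))])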
Output
[(1, 2), (2, 1), (2, 2)]

Data handling for matplotlib histogram with error bars

I've got a data set which is a list of tuples in python like this:
dataSet = [(6.1248199999999997, 27), (6.4400500000000003, 4), (5.9150600000000004, 1), (5.5388400000000004, 38), (5.82559, 1), (7.6892199999999997, 2), (6.9047799999999997, 1), (6.3516300000000001, 76), (6.5168699999999999, 1), (7.4382099999999998, 1), (5.4493299999999998, 1), (5.6254099999999996, 1), (6.3227700000000002, 1), (5.3321899999999998, 11), (6.7402300000000004, 4), (7.6701499999999996, 1), (5.4589400000000001, 3), (6.3089700000000004, 1), (6.5926099999999996, 2), (6.0003000000000002, 5), (5.9845800000000002, 1), (6.4967499999999996, 2), (6.51227, 6), (7.0302600000000002, 1), (5.7271200000000002, 49), (7.5311300000000001, 7), (5.9495800000000001, 2), (5.1487299999999996, 18), (5.7637099999999997, 6), (5.5144500000000001, 44), (6.7988499999999998, 1), (5.2578399999999998, 1)]
The first element of each tuple is an energy and the second is a counter of how many sensors were affected.
I want to create a histogram to study the relation between the number of affected sensors and the energy. I'm pretty new to matplotlib (and python), but this is what I've done so far:
import math
import matplotlib.pyplot as plt
dataSet = [(6.1248199999999997, 27), (6.4400500000000003, 4), (5.9150600000000004, 1), (5.5388400000000004, 38), (5.82559, 1), (7.6892199999999997, 2), (6.9047799999999997, 1), (6.3516300000000001, 76), (6.5168699999999999, 1), (7.4382099999999998, 1), (5.4493299999999998, 1), (5.6254099999999996, 1), (6.3227700000000002, 1), (5.3321899999999998, 11), (6.7402300000000004, 4), (7.6701499999999996, 1), (5.4589400000000001, 3), (6.3089700000000004, 1), (6.5926099999999996, 2), (6.0003000000000002, 5), (5.9845800000000002, 1), (6.4967499999999996, 2), (6.51227, 6), (7.0302600000000002, 1), (5.7271200000000002, 49), (7.5311300000000001, 7), (5.9495800000000001, 2), (5.1487299999999996, 18), (5.7637099999999997, 6), (5.5144500000000001, 44), (6.7988499999999998, 1), (5.2578399999999998, 1)]
binWidth = .2
binnedDataSet = []
#create another list and append the "binning-value"
for item in dataSet:
    binnedDataSet.append((item[0], item[1], math.floor(item[0]/binWidth)*binWidth))
energies, sensorHits, binnedEnergy = [[q[i] for q in binnedDataSet] for i in (0,1,2)]
plt.plot(binnedEnergy, sensorHits, 'ro')
plt.show()
This works so far (although it doesn't even look like a histogram ;-) but OK), but now I want to calculate the mean value for each bin and append some error bars.
What's the way to do it? I looked at histogram examples for matplotlib, but they all use one-dimensional data which will be counted, so you get a frequency spectrum… That's not really what I want.

I am somewhat confused by exactly what you are trying to do, but I think this (to first order) will do what I think you want:
# uses dataSet, math and plt from the question
bin_width = .2
bottom = 5.0
top = 8.0
n_bins = int(math.ceil((top - bottom) / bin_width))
binned_data = [0.0] * n_bins   # sum of sensor hits per energy bin
binned_count = [0] * n_bins    # number of entries per energy bin
for E, cnt in dataSet:
    # E == top would index one past the last bin, so it is excluded as well
    if E < bottom or E >= top:
        print('out of range')
        continue
    bin_id = int(math.floor(n_bins * (E - bottom) / (top - bottom)))
    binned_data[bin_id] += cnt
    binned_count[bin_id] += 1
# mean number of hits per bin (0 for empty bins)
binned_averaged_data = [C_sum / hits if hits > 0 else 0 for C_sum, hits in zip(binned_data, binned_count)]
bin_edges = [bottom + j * bin_width for j in range(n_bins)]
# align='edge' makes each bar start at its left bin edge
plt.bar(bin_edges, binned_averaged_data, width=bin_width, align='edge')
plt.show()
I would also suggest looking into numpy, it would make this much simpler to write.
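A rough numpy version of the same binning, extended with the per-bin error bars the question asks about, might look like the sketch below. It assumes a subset of the data from the question and uses the standard error of the mean as the error bar, which is one common choice and not something the original answer specifies; plot_binned_means is a hypothetical helper.

```python
import numpy as np

# Subset of the (energy, sensor hits) pairs from the question
data = np.array([(6.12482, 27), (6.44005, 4), (5.91506, 1), (5.53884, 38),
                 (5.82559, 1), (5.72712, 49), (5.14873, 18), (5.51445, 44)])
energies, hits = data[:, 0], data[:, 1]

edges = np.linspace(5.0, 8.0, 16)            # 15 bins of width 0.2
centers = (edges[:-1] + edges[1:]) / 2
bin_ids = np.digitize(energies, edges) - 1   # bin index of every entry

means = np.full(len(centers), np.nan)        # NaN marks empty bins
errors = np.full(len(centers), np.nan)
for b in range(len(centers)):
    in_bin = hits[bin_ids == b]
    if in_bin.size:
        means[b] = in_bin.mean()
    if in_bin.size > 1:
        # standard error of the mean; left as NaN for single-entry bins
        errors[b] = in_bin.std(ddof=1) / np.sqrt(in_bin.size)


def plot_binned_means():
    """Draw the non-empty bins with error bars (requires matplotlib)."""
    import matplotlib.pyplot as plt
    filled = ~np.isnan(means)
    # bins with an undefined error are drawn with a zero-length error bar
    plt.errorbar(centers[filled], means[filled],
                 yerr=np.nan_to_num(errors[filled]), fmt='o')
    plt.xlabel('energy')
    plt.ylabel('mean sensor hits')
    plt.show()
```

Calling plot_binned_means() draws one marker per non-empty bin; whether the standard error or the plain standard deviation is the right error bar depends on what the plot is meant to show.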
