ID rows containing values greater than corresponding values based on a criteria from another row - pandas

I have a grouped dataframe. I have created a flag that identifies whether values in a row are less than the group maximums. This works fine. However, I want to unflag rows where the value in a third column is greater than the value in the same (third) column within each group. I have a feeling there should be an elegant and pythonic way to do this, but I can't figure it out.
The flag in my code compares the maximum tour_duration within each hh_id to the value of comp_expr on each row and assigns 1 when comp_expr does not exceed that maximum (0 otherwise). However, I want the flag to be 0 for every row of a tour_id whose min(arrivaltime) is greater than max(arrivaltime) of the tour_id that has the maximum tour_duration within the same hh_id. For example, in the given data, tour_id 16300 has the highest tour_duration in hh_id 20044, and its max arrivaltime is 960. Tour_id 16200 has min arrivaltime 1080, which is > 960, so the flag for all rows of tour_id 16200 should be 0.
Kindly assist.
import pandas as pd
import numpy as np
stops_data = pd.DataFrame({'hh_id': [20044,20044,20044,20044,20044,20044,20044,20044,20044,20044,20044,20122,20122,20122,20122,20122,20122,20122,20122,20122,20122,20122,20122,20122,],'tour_id':[16300,16300,16100,16100,16100,16100,16200,16200,16200,16000,16000,38100,38100,37900,37900,37900,38000,38000,38000,38000,38000,38000,37800,37800],'arrivaltime':[360,960,900,900,900,960,1080,1140,1140,420,840,300,960,780,720,960,1080,1080,1080,1080,1140,1140,480,900],'tour_duration':[600,600,60,60,60,60,60,60,60,420,420,660,660,240,240,240,60,60,60,60,60,60,420,420],'comp_expr':[1350,1350,268,268,268,268,406,406,406,974,974,1568,1568,606,606,606,298,298,298,298,298,298,840,840]})
stops_data['flag'] = np.where(stops_data.groupby(['hh_id'])
                              ['tour_duration'].transform('max') < stops_data['comp_expr'], 0, 1)
This is my current output: [image: Current dataset and output]
This is my desired output (see the flag column): [image: Desired output, with the changed flag values in bold]

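>>> # earliest stop of each tour
>>> first_stop = stops_data.loc[stops_data.groupby(['hh_id', 'tour_id'])['arrivaltime'].idxmin()]
>>> # per household, the tour whose earliest stop is the latest
>>> latest_tour = first_stop.loc[first_stop.groupby('hh_id')['arrivaltime'].idxmax(), 'tour_id']
>>> stops_data.loc[stops_data.tour_id.isin(latest_tour), 'flag'] = 0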
>>> stops_data
hh_id tour_id arrivaltime tour_duration comp_expr flag
0 20044 16300 360 600 1350 0
1 20044 16300 960 600 1350 0
2 20044 16100 900 60 268 1
3 20044 16100 900 60 268 1
4 20044 16100 900 60 268 1
5 20044 16100 960 60 268 1
6 20044 16200 1080 60 406 0
7 20044 16200 1140 60 406 0
8 20044 16200 1140 60 406 0
9 20044 16000 420 420 974 0
10 20044 16000 840 420 974 0
11 20122 38100 300 660 1568 0
12 20122 38100 960 660 1568 0
13 20122 37900 780 240 606 1
14 20122 37900 720 240 606 1
15 20122 37900 960 240 606 1
16 20122 38000 1080 60 298 0
17 20122 38000 1080 60 298 0
18 20122 38000 1080 60 298 0
19 20122 38000 1080 60 298 0
20 20122 38000 1140 60 298 0
21 20122 38000 1140 60 298 0
22 20122 37800 480 420 840 0
23 20122 37800 900 420 840 0
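The answer above zeroes, in each household, the tour whose first stop is the latest. The question's rule can also be coded literally: zero every tour whose earliest arrivaltime is later than the latest arrivaltime of the household's longest tour. A minimal sketch, assuming that when several tours tie for the largest tour_duration, the tour on the first such row (what idxmax returns) is the reference:

```python
import numpy as np
import pandas as pd

stops_data = pd.DataFrame({
    'hh_id': [20044] * 11 + [20122] * 13,
    'tour_id': [16300, 16300, 16100, 16100, 16100, 16100, 16200, 16200, 16200, 16000, 16000,
                38100, 38100, 37900, 37900, 37900, 38000, 38000, 38000, 38000, 38000, 38000,
                37800, 37800],
    'arrivaltime': [360, 960, 900, 900, 900, 960, 1080, 1140, 1140, 420, 840,
                    300, 960, 780, 720, 960, 1080, 1080, 1080, 1080, 1140, 1140, 480, 900],
    'tour_duration': [600, 600, 60, 60, 60, 60, 60, 60, 60, 420, 420,
                      660, 660, 240, 240, 240, 60, 60, 60, 60, 60, 60, 420, 420],
    'comp_expr': [1350, 1350, 268, 268, 268, 268, 406, 406, 406, 974, 974,
                  1568, 1568, 606, 606, 606, 298, 298, 298, 298, 298, 298, 840, 840],
})

# Original flag: 1 where comp_expr does not exceed the household's max tour_duration.
group_max = stops_data.groupby('hh_id')['tour_duration'].transform('max')
stops_data['flag'] = np.where(group_max < stops_data['comp_expr'], 0, 1)

# Reference tour per household: the tour on the first row with the largest tour_duration.
ref = stops_data.loc[stops_data.groupby('hh_id')['tour_duration'].idxmax(), ['hh_id', 'tour_id']]
# Latest arrival of each household's reference tour (a Series indexed by hh_id).
ref_max_arrival = (stops_data.merge(ref, on=['hh_id', 'tour_id'])
                   .groupby('hh_id')['arrivaltime'].max())
# Earliest arrival of every tour, repeated on each of that tour's rows.
tour_min_arrival = stops_data.groupby(['hh_id', 'tour_id'])['arrivaltime'].transform('min')
# Unflag whole tours that start after the reference tour's last arrival.
starts_later = tour_min_arrival > stops_data['hh_id'].map(ref_max_arrival)
stops_data.loc[starts_later, 'flag'] = 0
print(stops_data)
```

On the sample data both approaches zero tours 16200 and 38000. They can disagree on other data: the answer zeroes one tour per household, while this version zeroes every tour that starts after the reference tour's last arrival, which may be none or several.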

Related

How to calculate Growth rate of a time series variable in python pandas

I have data in a time series format like this:
date value
1-1-2013 100
1-2-2013 200
1-3-2013 300
1-4-2013 400
1-5-2013 500
1-6-2013 600
1-7-2013 700
1-8-2013 650
1-9-2013 450
1-10-2013 350
1-11-2013 250
1-12-2013 150

Use Series.pct_change:
In [458]: df['growth rate'] = df.value.pct_change()
In [459]: df
Out[459]:
date value growth rate
0 1-1-2013 100 NaN
1 1-2-2013 200 1.000000
2 1-3-2013 300 0.500000
3 1-4-2013 400 0.333333
4 1-5-2013 500 0.250000
5 1-6-2013 600 0.200000
6 1-7-2013 700 0.166667
7 1-8-2013 650 -0.071429
8 1-9-2013 450 -0.307692
9 1-10-2013 350 -0.222222
10 1-11-2013 250 -0.285714
11 1-12-2013 150 -0.400000
Or, if you want to show it as a percentage, multiply by 100:
In [480]: df['growth rate'] = df.value.pct_change().mul(100)
In [481]: df
Out[481]:
date value growth rate
0 1-1-2013 100 NaN
1 1-2-2013 200 100.000000
2 1-3-2013 300 50.000000
3 1-4-2013 400 33.333333
4 1-5-2013 500 25.000000
5 1-6-2013 600 20.000000
6 1-7-2013 700 16.666667
7 1-8-2013 650 -7.142857
8 1-9-2013 450 -30.769231
9 1-10-2013 350 -22.222222
10 1-11-2013 250 -28.571429
11 1-12-2013 150 -40.000000
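Series.pct_change compares each row with the row before it. A quick sketch, rebuilding the question's data, showing that it matches the formula (current - previous) / previous; passing periods=n compares with the value n rows back instead:

```python
import pandas as pd

# The question's data.
df = pd.DataFrame({
    'date': ['1-%d-2013' % m for m in range(1, 13)],
    'value': [100, 200, 300, 400, 500, 600, 700, 650, 450, 350, 250, 150],
})

growth = df['value'].pct_change()
# The same quantity written out: (current - previous) / previous.
previous = df['value'].shift(1)
manual = (df['value'] - previous) / previous

print(pd.DataFrame({'pct_change': growth, 'manual': manual}))
```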

Growth rate computed separately within each year (the first row of every year is NaN):
df['col'] = df.groupby(['Year'])['col2'].pct_change(periods=1) * 100
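The snippet above gives the row-to-row change inside each year. If a single figure per year is wanted instead, one option is to aggregate each year first and then apply pct_change across years. A sketch with hypothetical data (Year and col2 are the names from the snippet; summing each year is an assumed choice, and the mean or the year's last value may suit other data better):

```python
import pandas as pd

# Hypothetical data; Year and col2 follow the snippet above.
df = pd.DataFrame({'Year': [2012, 2012, 2013, 2013, 2014, 2014],
                   'col2': [100, 150, 200, 250, 300, 150]})

# Row-to-row growth inside each year (the first row of every year is NaN).
df['col'] = df.groupby(['Year'])['col2'].pct_change(periods=1) * 100

# One growth figure per year: total each year, then compare consecutive years.
yearly = df.groupby('Year')['col2'].sum().pct_change() * 100
print(df)
print(yearly)
```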

Pandas Pivot table: How to sort top n rows in the pivot table by category and generate new dataframe?

df = pd.DataFrame({'country': ['AUD','CAD', 'IND','JPY', 'AUD', 'CHY', 'IND', 'KRL', 'SRI', 'KRW', 'CAD'],
'area': ['N','S','N','E','W','S','NE','N','S','SE','N'], 'gdp': [349,65,60,88,75,100,200,250,150,210,160], 'income': [7000,2300,5000,1000,550,1000,2060,2750,1450,2610,1650], 'expense': [500,300,700,600,500,900,206,275,1405,210,150]})
df = df.pivot_table(index=['country','area'],values=['gdp'],aggfunc='sum').sort_values(by = ['gdp'], ascending = False, axis = 0).head(5)
With the method above, I can't get the top 5 countries by gdp. I expect a dataframe like the one below (I mocked up the expected output in MS Excel to get a feel for it). Please suggest.
new_df =
country  gdp  expense  income
AUD      424     1000    7550
  N      349      500    7000
  W       75      500     550
IND      260      906    7060
  N       60      700    5000
  NE     200      206    2060
KRL      250      275    2750
  N      250      275    2750
CAD      225      450    3950
  N      160      150    1650
  S       65      300    2300
KRW      210      210    2610
  SE     210      210    2610
new_df =
country  gdp  expense  income  area
AUD      424     1000    7550  N, W
IND      260      906    7060  N, NE
KRL      250      275    2750  N
CAD      225      450    3950  N,S
KRW      210      210    2610  SE

Use:
df.groupby('country', as_index=False).agg({'gdp': 'sum', 'area': ','.join}).sort_values(by='gdp', ascending=False).head(5)
Output
country gdp area
0 AUD 424 N,W
3 IND 260 N,NE
5 KRL 250 N
1 CAD 225 S,N
6 KRW 210 SE

You don't need a pivot table for this; it can be done this way:
df.groupby('country').agg({'gdp':'sum',
'area':','.join}).sort_values('gdp',ascending = False).head(5)
Output:
country gdp area
AUD 424 N,W
IND 260 N,NE
KRL 250 N
CAD 225 S,N
KRW 210 SE
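Neither answer includes expense and income. A sketch extending the groupby/agg idea to produce both layouts from the question: a one-row-per-country summary, and a per-area detail for the same five countries in gdp order. Sorting the area names before joining is an assumed choice that makes CAD read "N, S" as in the expected output:

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['AUD', 'CAD', 'IND', 'JPY', 'AUD', 'CHY', 'IND', 'KRL', 'SRI', 'KRW', 'CAD'],
    'area': ['N', 'S', 'N', 'E', 'W', 'S', 'NE', 'N', 'S', 'SE', 'N'],
    'gdp': [349, 65, 60, 88, 75, 100, 200, 250, 150, 210, 160],
    'income': [7000, 2300, 5000, 1000, 550, 1000, 2060, 2750, 1450, 2610, 1650],
    'expense': [500, 300, 700, 600, 500, 900, 206, 275, 1405, 210, 150],
})

# One row per country: summed numbers plus the sorted list of areas.
summary = (df.groupby('country')
             .agg({'gdp': 'sum', 'expense': 'sum', 'income': 'sum',
                   'area': lambda s: ', '.join(sorted(s))})
             .sort_values('gdp', ascending=False)
             .head(5))

# Per-area sums, then stack the five countries in the summary's gdp order.
cols = ['gdp', 'expense', 'income']
per_area = df.pivot_table(index=['country', 'area'], values=cols, aggfunc='sum')
detail = pd.concat([per_area.loc[[c], cols] for c in summary.index])

print(summary)
print(detail)
```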

How to remove unwanted values in data when reading csv file

When reading Pina_Indian_Diabities.csv, some of the values are strings, like this:
+AC0-5.4128147485
734 2
735 4
736 0
737 8
738 +AC0-5.4128147485
739 1
740 NaN
741 3
742 1
743 9
744 13
745 12
746 1
747 1
As in row 738, there are such values in other rows and columns as well.
How can I drop them?
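A minimal sketch of one way to drop them, assuming every column should be numeric: coerce each column with pd.to_numeric(errors='coerce'), which turns unparsable strings into NaN, then drop rows containing NaN. The inline CSV and its column names are hypothetical stand-ins for the real file, and note that this also drops rows that were already NaN, like row 740:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: bad strings in two columns, one empty cell.
raw = io.StringIO(
    "Pregnancies,Glucose\n"
    "2,148\n"
    "4,85\n"
    "+AC0-5.4128147485,183\n"
    "1,+AC0-5.4128147485\n"
    "3,\n"
)
df = pd.read_csv(raw)

# Convert every column to numbers; anything that cannot be parsed becomes NaN.
df = df.apply(pd.to_numeric, errors='coerce')
# Drop every row that now holds a NaN (bad strings and originally empty cells alike).
df = df.dropna()
print(df)
```

Dropping may not be necessary, though: `+AC0-` is how UTF-7 encodes a hyphen-minus, so `+AC0-5.4128147485` is probably -5.4128147485 written in UTF-7. If so, re-saving or decoding the file with the right encoding would keep those rows.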

Parsing Json Data from select query in SQL Server

I have a situation where I have a table, dbo.JsonData, with a single varchar(max) column. It has any number of rows, and each JSON string has a varying number of properties.
How can I create a query that will allow me to turn the result set from a select * query into a row/column result set?
Here is what I have tried:
SELECT *
FROM JSONDATA
FOR JSON Path
But it returns a single row of the json data all in a single column:
JSON_F52E2B61-18A1-11d1-B105-00805F49916B
[{"Json_Data":"{\"Serial_Number\":\"12345\",\"Gateway_Uptime\":17,\"Defrost_Cycles\":0,\"Freeze_Cycles\":2304,\"Float_Switch_Raw_ADC\":1328,\"Bin_status\":2304,\"Line_Voltage\":0,\"ADC_Evaporator_Temperature\":0,\"Mem_Sw\":1280,\"Freeze_Timer\":2560,\"Defrost_Timer\":593,\"Water_Flow_Switch\":3328,\"ADC_Mid_Temperature\":2560,\"ADC_Water_Temperature\":0,\"Ambient_Temperature\":71,\"Mid_Temperature\":1259,\"Water_Temperature\":1259,\"Evaporator_Temperature\":1259,\"Ambient_Temperature_Off_Board\":0,\"Ambient_Temperature_On_Board\":0,\"Gateway_Info\":\"{\\\"temp_sensor\\\":0.00,\\\"temp_pcb\\\":82.00,\\\"gw_uptime\\\":17.00,\\\"winc_fw\\\":\\\"19.5.4\\\",\\\"gw_fw_version\\\":\\\"0.0.0\\\",\\\"gw_fw_version_git\\\":\\\"2a75f20-dirty\\\",\\\"gw_sn\\\":\\\"328\\\",\\\"heap_free\\\":11264.00,\\\"gw_sig_csq\\\":0.00,\\\"gw_sig_quality\\\":0.00,\\\"wifi_sig_strength\\\":-63.00,\\\"wifi_resets\\\":0.00}\",\"ADC_Ambient_Temperature\":1120,\"Control_State\":\"Bin Full\",\"Compressor_Runtime\":134215680}"},{"Json_Data":"{\"Serial_Number\":\"12345\",\"Gateway_Uptime\":200,\"Defrost_Cycles\":559,\"Freeze_Cycles\":510,\"Float_Switch_Raw_ADC\":106,\"Bin_status\":0,\"Line_Voltage\":119,\"ADC_Evaporator_Temperature\":123,\"Mem_Sw\":113,\"Freeze_Timer\":0,\"Defrost_Timer\":66,\"Water_Flow_Switch\":3328,\"ADC_Mid_Temperature\":2560,\"ADC_Water_Temperature\":0,\"Ambient_Temperature\":71,\"Mid_Temperature\":1259,\"Water_Temperature\":1259,\"Evaporator_Temperature\":54,\"Ambient_Temperature_Off_Board\":0,\"Ambient_Temperature_On_Board\":0,\"Gateway_Info\":\"{\\\"temp_sensor\\\":0.00,\\\"temp_pcb\\\":82.00,\\\"gw_uptime\\\":199.00,\\\"winc_fw\\\":\\\"19.5.4\\\",\\\"gw_fw_version\\\":\\\"0.0.0\\\",\\\"gw_fw_version_git\\\":\\\"2a75f20-dirty\\\",\\\"gw_sn\\\":\\\"328\\\",\\\"heap_free\\\":10984.00,\\\"gw_sig_csq\\\":0.00,\\\"gw_sig_quality\\\":0.00,\\\"wifi_sig_strength\\\":-60.00,\\\"wifi_resets\\\":0.00}\",\"ADC_Ambient_Temperature\":1120,\"Control_State\":\"Defrost\",\"Compressor_Runtime\":11304}"},{"Json_Data":"{\"Seri...
What am I missing?
I can't specify the columns explicitly because the json strings aren't always the same.
This is what I expect:
Serial_Number Gateway_Uptime Defrost_Cycles Freeze_Cycles Float_Switch_Raw_ADC Bin_status Line_Voltage ADC_Evaporator_Temperature Mem_Sw Freeze_Timer Defrost_Timer Water_Flow_Switch ADC_Mid_Temperature ADC_Water_Temperature Ambient_Temperature Mid_Temperature Water_Temperature Evaporator_Temperature Ambient_Temperature_Off_Board Ambient_Temperature_On_Board ADC_Ambient_Temperature Control_State Compressor_Runtime temp_sensor temp_pcb gw_uptime winc_fw gw_fw_version gw_fw_version_git gw_sn heap_free gw_sig_csq gw_sig_quality wifi_sig_strength wifi_resets LastModifiedDateUTC Defrost_Cycle_time Freeze_Cycle_time
12345 251402 540 494 106 0 98 158 113 221 184 0 0 0 1259 1259 1259 33 0 0 0 Freeze 10833 0 78 251402 19.5.4 0.0.0 2a75f20-dirty 328.00000000 10976 0 0 -61 0 2018-03-20 11:15:28.000 0 0
12345 251702 540 494 106 0 98 178 113 517 184 0 0 0 1259 1259 1259 22 0 0 0 Freeze 10838 0 78 251702 19.5.4 0.0.0 2a75f20-dirty 328.00000000 10976 0 0 -62 0 2018-03-20 11:15:42.000 0 0
...
Thank you,
Ron
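FOR JSON PATH works in the opposite direction (it turns rows into JSON), which is why the query returns one JSON string. In SQL Server 2016 and later, OPENJSON is the built-in function for parsing JSON into rows. As a client-side alternative, and to show the data's shape, here is a Python sketch, assuming the Json_Data strings have already been fetched into a list; the two rows are abbreviated copies of the sample. Gateway_Info is a JSON document stored as a string inside each record, so it has to be decoded a second time, and building the DataFrame from a list of dicts makes properties missing from some rows come out as NaN:

```python
import json

import pandas as pd

# Two abbreviated Json_Data values (most properties omitted).
rows = [
    '{"Serial_Number":"12345","Gateway_Uptime":17,"Control_State":"Bin Full",'
    '"Gateway_Info":"{\\"temp_pcb\\":82.00,\\"winc_fw\\":\\"19.5.4\\"}"}',
    '{"Serial_Number":"12345","Gateway_Uptime":200,"Line_Voltage":119,'
    '"Gateway_Info":"{\\"temp_pcb\\":82.00,\\"gw_uptime\\":199.00}"}',
]


def flatten(text):
    """Decode one record and merge its nested Gateway_Info properties into it."""
    record = json.loads(text)
    nested = record.pop('Gateway_Info', None)
    if nested:
        # Gateway_Info is a JSON string inside the JSON record, so decode it again.
        record.update(json.loads(nested))
    return record


# Columns are the union of all properties; a property missing from a row becomes NaN.
df = pd.DataFrame([flatten(r) for r in rows])
print(df)
```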

inventory summary Access

Well, after much messing about, I have finally got one query that gives sales totals for all the products and another that gives the stock absorbed for all of the products (images below).
I was looking at Allen Brown's material and I get it, but I was wondering if I could make a summary report covering all of the available stock for all of the products. There are 31 products in total, and they all appear in both the stock query and the sales query.
http://imageshack.us/photo/my-images/810/87887.jpg
http://imageshack.us/photo/my-images/827/8787j.jpg
Any ideas on the code to use?
I guess it would be possible, but I don't really know where to start.
I really want to make a report that summarises the stock of each of the products in the queries.
Instead of having a button that works out the stock for one specific product, I'd like a button that works out the stock for all of the products in the queries at the same time.
Does this make sense?
Thanks
Sam
UPDATE
This is the query for finding stock:
SELECT TblStock.ProductID, Sum(TblStock.StockLevel) AS Stock, TblProduct.Item
FROM TblStock INNER JOIN TblProduct ON TblStock.ProductID = TblProduct.ProductID
GROUP BY TblStock.ProductID, TblProduct.Item;
This is the query for finding the quantity of sales:
SELECT TblProduct.Item, Sum(TblTotalSale.Size) AS Quantity, TblProduct.ProductID
FROM TblProduct INNER JOIN TblTotalSale ON TblProduct.[ProductID] = TblTotalSale.[ProductID]
GROUP BY TblProduct.Item, TblProduct.ProductID;
TblStock looks like
StockID ProductID StockLevel
89 32 200
90 33 72
91 34 72
92 1 528
93 3 528
94 5 528
95 9 528
96 7 528
97 18 80
98 30 72
99 31 204
Product Table looks like
ProductID Item Price StockDelivery PriceSmall Large Small
1 Carling £2.50 528 £1.40 2 1
3 Carlsburg £2.70 528 £1.60 2 1
5 IPA £2.30 528 £1.20 2 1
7 StrongBow £2.80 528 £1.65 2 1
9 RevJames £2.45 528 £1.30 2 1
11 Becks £2.90 72 1
12 WKDBlue £2.80 72 1
13 WKDRed £2.80 72 1
14 SmirnoffIce £2.80 72 1
15 KoppaburgPear £3.10 72 1
16 KoppaburgSum £3.10 72 1
17 Bulmers £2.90 72 1
18 Vodka £1.60 80 1
19 Gin £1.40 80 1
20 Sherry £1.40 80 1
21 Sambuca £1.70 80 1
22 Rum £1.60 80 1
23 Port £1.60 80 1
24 Whiskey £1.60 80 1
25 Baileys £1.60 80 1
26 Jagermeister £1.50 80 1
27 Martini £1.60 80 1
28 CokeCan £0.85 72 1
29 Coke £1.30 204 £0.30 2 1
30 LemonadeCan £0.85 72 1
31 Lemonade £1.30 204 £0.30 2 1
32 Squash £0.25 200 1
33 Tonic £0.85 72 1
34 RedBull £1.90 72 1
35 Nuts £0.60 70 1
36 Crisps £0.60 70 1
tbltotalSale looks like
TotalSalesID ProductID SalePrice Day Time Size
370 1 £2.50 05/02/2012 19:53:14 2
371 1 £1.40 05/02/2012 19:53:14 1
372 1 £2.50 05/02/2012 19:53:14 2
373 1 £1.40 05/02/2012 19:53:14 1
374 1 £2.50 05/02/2012 20:25:12 2
375 1 £1.40 05/02/2012 20:25:12 1
376 1 £2.50 05/02/2012 20:25:12 2
377 1 £1.40 05/02/2012 20:25:12 1
378 1 £2.50 05/02/2012 20:25:12 2
379 1 £2.50 05/02/2012 20:25:12 2
380 1 £1.40 05/02/2012 20:25:12 1
381 5 £2.30 05/02/2012 20:25:12 2
382 5 £2.30 05/02/2012 20:25:12 2
383 5 £1.20 05/02/2012 20:25:12 1
384 7 £2.80 05/02/2012 20:25:12 2
385 7 £1.65 05/02/2012 20:25:12 1
386 7 £1.65 05/02/2012 20:25:12 1
387 9 £1.30 05/02/2012 20:25:12 1
435 11 £2.90 05/02/2012 20:25:12 1
436 11 £2.90 05/02/2012 20:42:49 1
437 11 £2.90 05/02/2012 20:42:49 1
I can upload my database if that would be easier.
I have tried the following query for what I want, but it returns 11 results for each product ID, and none of them are correct:
SELECT QrySaleTot.Item, QrySaleTot.ProductID, [QryStockLevel].[Stock]-[QrySaleTot].[Quantity] AS StockOnHand
FROM QrySaleTot, QryStockLevel
GROUP BY QrySaleTot.Item, QrySaleTot.ProductID, [QryStockLevel].[Stock]-[QrySaleTot].[Quantity];
Thanks

You can join the two tables by id and just subtract.
SELECT Sales.ID, Stock.Level - Sales.Quantity
FROM Sales
INNER JOIN Stock
ON Sales.ID = Stock.ID
Updating is not so different. Play around with the query design window. You may wish to read:
Fundamental Microsoft Jet SQL for Access 2000
Intermediate Microsoft Jet SQL for Access 2000
Advanced Microsoft Jet SQL for Access 2000

You included this query in the update to your question:
SELECT
QrySaleTot.Item,
QrySaleTot.ProductID,
[QryStockLevel].[Stock]-[QrySaleTot].[Quantity] AS StockOnHand
FROM QrySaleTot, QryStockLevel
GROUP BY
QrySaleTot.Item,
QrySaleTot.ProductID,
[QryStockLevel].[Stock]-[QrySaleTot].[Quantity];
The first problem is you don't have a join condition ... so each row from QrySaleTot will be matched up with every row from QryStockLevel. That will produce what is called a Cartesian product, or cross join. Revise it to use a join on a field the 2 queries have in common.
The GROUP BY doesn't seem useful here because you're not computing any aggregate values.
Finally, Item is a reserved word. If you must keep that field name, bracket it everywhere you reference it in your queries like this: QrySaleTot.[Item]
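A small, runnable illustration of the join fix, using Python's built-in sqlite3 rather than Access, with a few rows from the question's tables. The two views mirror QryStockLevel and QrySaleTot. A LEFT JOIN is an assumed choice so that products with no sales keep their full stock (an INNER JOIN would drop them), and COALESCE plays the role that Nz() plays in Access:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
CREATE TABLE TblProduct (ProductID INTEGER PRIMARY KEY, Item TEXT);
CREATE TABLE TblStock (StockID INTEGER PRIMARY KEY, ProductID INTEGER, StockLevel INTEGER);
CREATE TABLE TblTotalSale (TotalSalesID INTEGER PRIMARY KEY, ProductID INTEGER, Size INTEGER);
INSERT INTO TblProduct VALUES (1, 'Carling'), (3, 'Carlsburg'), (5, 'IPA');
INSERT INTO TblStock VALUES (92, 1, 528), (93, 3, 528), (94, 5, 528);
INSERT INTO TblTotalSale (ProductID, Size) VALUES
  (1, 2), (1, 1), (1, 2), (1, 1), (1, 2), (1, 1), (1, 2), (1, 1), (1, 2), (1, 2), (1, 1),
  (5, 2), (5, 2), (5, 1);

CREATE VIEW QryStockLevel AS
  SELECT TblStock.ProductID, SUM(TblStock.StockLevel) AS Stock, TblProduct.Item
  FROM TblStock INNER JOIN TblProduct ON TblStock.ProductID = TblProduct.ProductID
  GROUP BY TblStock.ProductID, TblProduct.Item;
CREATE VIEW QrySaleTot AS
  SELECT TblProduct.Item, SUM(TblTotalSale.Size) AS Quantity, TblProduct.ProductID
  FROM TblProduct INNER JOIN TblTotalSale ON TblProduct.ProductID = TblTotalSale.ProductID
  GROUP BY TblProduct.Item, TblProduct.ProductID;
""")

# Without a join condition every sales row pairs with every stock row (Cartesian product).
cross = con.execute("SELECT COUNT(*) FROM QrySaleTot, QryStockLevel").fetchone()[0]

# Joined on ProductID: one row per product, stock minus units sold.
rows = con.execute("""
SELECT QryStockLevel.ProductID, QryStockLevel.Item,
       QryStockLevel.Stock - COALESCE(QrySaleTot.Quantity, 0) AS StockOnHand
FROM QryStockLevel
LEFT JOIN QrySaleTot ON QryStockLevel.ProductID = QrySaleTot.ProductID
ORDER BY QryStockLevel.ProductID
""").fetchall()
print(cross, rows)
```

Here the unjoined version yields 2 x 3 = 6 rows; with the question's 11 stock rows, the same effect produces 11 rows per sold product.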

We Keep Coding
