Group repeating pattern in pandas Dataframe - pandas

so i have a Dataframe that has a repeating Number Series that i want to group like this:
Number Pattern
Value
Desired Group
Value.1
1
723
1
Max of Group
2
400
1
Max of Group
8
235
1
Max of Group
5
387
2
Max of Group
7
911
2
Max of Group
3
365
3
Max of Group
4
270
3
Max of Group
5
194
3
Max of Group
7
452
3
Max of Group
100
716
4
Max of Group
104
69
4
Max of Group
2
846
5
Max of Group
3
474
5
Max of Group
4
524
5
Max of Group
So essentially the number pattern is always monotonly increasing.
Any Ideas?

You can compare Number Pattern by 1 with cumulative sum by Series.cumsum and then is used GroupBy.transform with max:
df['Desired Group'] = df['Number Pattern'].eq(1).cumsum()
df['Value.1'] = df.groupby('Desired Group')['Value'].transform('max')
print (df)
Number Pattern Value Desired Group Value.1
0 1 723 1 723
1 2 400 1 723
2 3 235 1 723
3 1 387 2 911
4 2 911 2 911
5 1 365 3 452
6 2 270 3 452
7 3 194 3 452
8 4 452 3 452
9 1 716 4 716
10 2 69 4 716
11 1 846 5 846
12 2 474 5 846
13 3 524 5 846
For monotically increasing use:
df['Desired Group'] = (~df['Number Pattern'].diff().gt(0)).cumsum()

Related

Getting the last 50 rows for each group in group by

I have this query but it is only showing the last 5 rows instead of limiting the amount of rows the group by gets
I only want the last 50 rows for each person to be sum and in the group.
SELECT playerid, SUM(gamesplayed) AS totalgames, SUM(playtimes) AS playtimeTotal, SUM(Kills) AS totalkills
FROM plugin_game
WHERE gamesplayed=1
GROUP BY playerid
ORDER BY totalkills DESC
LIMIT 50
playerid totalgames playtimeTotal totalkills
797749 8 3076 678
53854 8 5982 635
24398 8 3277 575
464657 4 1325 387
65748 4 3390 368
651532 4 3219 354
287378 6 3893 350
753808 4 2565 323
731631 4 1733 256
665338 4 1971 255
569648 2 2041 244
56488 4 2636 157
006985 3 785 93
58640 1 432 72
If i change the LIMIT to 5 it only shows
playerid totalgames playtimeTotal totalkills
797749 8 3076 678
53854 8 5982 635
24398 8 3277 575
464657 4 1325 387
65748 4 3390 368
so if we use 5 games as an example, i only want to get the SUM for the past 5 games for the group
This should work in postgre sql!
SELECT playerid,
SUM(gamesplayed) over w AS totalgames,
SUM(playtimes) over w AS playtimetotal,
SUM(kills) over w AS totalkills,
ROW_NUMBER() over w AS row
FROM plugin_game
window w AS (PARTITION BY playerid ORDER BY totalkills DESC)
WHERE gamesplayed=1 and row <=50

What is the best why to aggregate data for last 7,30,60.. days in SQL

Hi I have a table with date and the number of views that we had in our channel at the same day
date views
03/06/2020 5
08/06/2020 49
09/06/2020 50
10/06/2020 1
13/06/2020 1
16/06/2020 1
17/06/2020 102
23/06/2020 97
29/06/2020 98
07/07/2020 2
08/07/2020 198
12/07/2020 1
14/07/2020 168
23/07/2020 292
No we want to see in each calendar date the sum of the past 7 and 30 days
so the result will be
date sum_of_7d sum_of_30d
01/06/2020 0 0
02/06/2020 0 0
03/06/2020 5 5
04/06/2020 5 5
05/06/2020 5 5
06/06/2020 5 5
07/06/2020 5 5
08/06/2020 54 54
09/06/2020 104 104
10/06/2020 100 105
11/06/2020 100 105
12/06/2020 100 105
13/06/2020 101 106
14/06/2020 101 106
15/06/2020 52 106
16/06/2020 53 107
17/06/2020 105 209
18/06/2020 105 209
so I was wondering what is the best SQL that I can write in order to get it
I'm working on redshift and the actual table (not this example) include over 40B rows
I used to do something like this:
select dates_helper.date
, tbl1.cnt
, sum(tbl1.cnt) over (order by date rows between 7 preceding and current row ) as sum_7d
, sum(tbl1.cnt) over (order by date rows between 30 preceding and current row ) as sum_7d
from bi_db.dates_helper
left join tbl1
on tbl1.invite_date = dates_helper.date

Oracle - Group By Creating Duplicate Rows

I have a query that looks like this:
select nvl(trim(a.code), 'Blanks') as Ward, count(b.apcasekey) as UNSP, count(c.apcasekey) as GRAPH,
count(d.apcasekey) as "ANI/PIG",
(count(b.apcasekey) + count(c.apcasekey) + count(d.apcasekey)) as "TOTAL ACTIVE",
count(a.apcasekey) as "TOTAL OPEN" from (etc...)
group by a.code
order by Ward
The reason I have nvl(trim(a.code), 'Blanks') as Ward is that sometimes a.code is a blank string, sometimes it's a null.
The problem is that when I use the Group By statement, I can't use Ward or I get the error
Ward: Invalid Identifier
I can only use a.code so I get 2 rows for 'Blanks', as per below
1 Blanks 7 0 0 7 7
2 Blanks 23 1 1 25 30
3 W01 75 4 0 79 91
4 W02 62 1 0 63 72
5 W03 140 2 0 142 162
6 W04 6 1 0 7 7
7 W05 46 0 1 47 48
8 W06 322 46 1 369 425
9 W07 91 0 1 92 108
10 W08 93 2 0 95 104
11 W09 28 1 0 29 30
12 W10 25 0 0 25 28
What I need, is for the row with 'Blanks' to combined into 1 row. Little help?
Thanks.
You can not use the alias in the GROUP BY, but you can use the expression that builds the value:
GROUP BY nvl(trim(a.code), 'Blanks')

Getting more data while converting data int to float and doing division and Multiplying with int?

I have three columns as shown in below tableA
Student Day Shifts
129 11 4
91 9 6
166 19 8
164 26 12
146 11 6
147 16 8
201 8 3
164 4 2
186 8 6
165 7 4
171 10 4
104 5 4
1834 134 67
I am writing a tvf to calculate Value of Points generated for Students as below
ALTER function Statagic(
#StartDate date
)
RETURNS TABLE
AS
RETURN
(
with src as
( select
Division=case when Shifts=0 then 0 else cast(Day as float)/cast(Shifts as float) end,*
from TableA
)
,tgt as
(select *,Points=Student*Division from src
)
select * from tgt)
When i execute above tvf(select * from Statagic('3/16/2014'))
My output is below
129 11 4 2.75 354.75
91 9 6 1.5 136.5
166 19 8 2.375 394.25
164 26 12 2.16666666666667 355.333333333333
146 11 6 1.83333333333333 267.666666666667
147 16 8 2 294
201 8 3 2.66666666666667 536
164 4 2 2 328
186 8 6 1.33333333333333 248
165 7 4 1.75 288.75
171 10 4 2.5 427.5
104 5 4 1.25 130
1834 134 67 2 3668
Note :
If you see the last row for three columns in the table is the total of rest column.So when you see the last row in the Output of TVF for last two columns when i am adding i am not getting same data i am getting more.
Guys please help me i am struggling to fix this bug i tried in all ways but i am unable to fix it.
select 354.75+136.5+394.25+355.333333333333+267.666666666667+294+536+328+248+288.75+427.5+130=3760.750000000000
3668 is not euql to 3760.75(I am getting more 100 value)

Update or Select into ORACLE

I am using the following statement;
SELECT RESV_ID, BOOKING_CUS_ID, ACC_ID,
(SELECT F.FLI_PRICE FROM FLIGHT F WHERE F.FLI_ID = R.IN_FLIGHT_ID) AS DEPART_FLIGHT_PRICE,
(SELECT F1.FLI_PRICE FROM FLIGHT F1 WHERE F1.FLI_ID = R.OUT_FLIGHT_ID) AS RETURN_FLIGHT_PRICE,
(SELECT AC.ACC_PRICEPN FROM ACCOMMODATION AC WHERE AC.ACC_ID = R.ACC_ID) AS ACCOMMODATION_PRICE
FROM HOLIDAY_RESERVATION R;
to yield the following results;
RESV_ID BOOKING_CUS_ID ACC_ID DEPART_FLIGHT_PRICE RETURN_FLIGHT_PRICE ACCOMMODATION_PRICE
---------- -------------- ---------- ------------------- ------------------- -------------------
1 1 2 520 450 350
2 3 4 250 150 150
3 5 6 290 300 450
4 7 7 399 450 650
5 9 365 345
6 11 558 460
7 13 250 250
8 15 550 550
9 17 25 250
10 19 19 450
10 rows selected.
Question:
How do I sum up the price fields, SOME PRICES ARE NOT AVAILABLE because a reservation was either made for accommodation only or flight only, hence both values will not be present always and this is where the issue lies
DEPART_FLIGHT_PRICE RETURN_FLIGHT_PRICE ACCOMMODATION_PRICE
Furthermore:
I wish to insert or update the SUM of those three values into a SUBTOTAL in the reservation table, perhaps by using select into or update, I have spent a whole day trying to do this but my skills are limited. any help will be greatly appreciated.
Flight table
FLI_ID FLI_CO FLI_AIRCRA DEPT_AIRPORT ARRV_AIRPORT DEPT_TIME ARRV_TIME FLI_PRICE
1 BD425 Boeing 707 1 12 18-MAR-12 02.24.00 AM 18-MAR-12 06.24.00 AM 520
2 LX345 Beriev 30 6 7 20-MAR-12 03.30.00 PM 20-MAR-12 04.20.00 PM 250
3 NZ4445 Boeing 720 9 14 25-MAR-12 09.00.00 AM 25-MAR-12 05.00.00 PM 290
4 TP351 Boeing 767 10 15 25-MAR-12 11.25.00 AM 25-MAR-12 03.35.00 PM 399
5 BA472 Boeing 720 5 14 26-MAR-12 01.05.00 PM 26-MAR-12 04.15.00 PM 365
Accommodation
ACC_ID ACC_TYPE_CODE ACC_DESC ACC_PRICEPN ACC_ROOMS RESORT_ID ACC_ADDR CITY_ID
1 1 Three bedroom bungalow near theme park 500 3 1
2 1 Two bedroom bungalow next to disney house 350 2 1
3 1 One bedroom bungalow with lake view 250 2 2
4 2 One bedroom chalet near the lake 150 1 2
5 2 Four bedroom chalet near the tree house 600 4 3
Reservation
RESV_ID EMP_ID BOOKING_CUS_ID RESV_DATE HOLIDAY_S HOLIDAY_E IN_FLIGHT_ID OUT_FLIGHT_ID IN_FLIGHT_SEATS_NO OUT_FLIGHT_SEATS_NO ACC_ID SUBTOTAL
1 338 1 16-FEB-12 18-MAR-12 20-APR-12 1 11 2 2 2
2 335 3 10-JAN-12 20-MAR-12 22-APR-12 2 12 2 2 4
3 338 5 05-MAR-12 25-MAR-12 26-APR-12 3 13 2 2 6
4 328 7 02-JAN-12 25-MAR-12 25-APR-12 4 14 2 2 7
5 311 9 20-JAN-12 26-MAR-12 21-APR-12 5 15 2 2
6 317 11 07-JAN-12 27-MAR-12 22-APR-12 6 16 2 2
7 344 13 29-FEB-12 15-MAR-12 12-APR-12 7 17 2 2
8 326 15 11-JAN-12 18-MAR-12 12-APR-14 8 18 2 2
9 329 17 16-JAN-12 19-MAR-12 17-APR-12 25
10 323 19 18-FEB-12 20-MAR-12 21-APR-12 19
Okay I managed to yield the results that i wanted
SELECT HR.RESV_ID, F_IN.FLI_ID, F_IN.FLI_PRICE, F_OUT.FLI_ID, F_OUT.FLI_PRICE, AC.ACC_ID, AC.ACC_PRICEPN, NVL(F_IN.FLI_PRICE,0)+NVL(F_OUT.FLI_PRICE,0)+NVL(AC.ACC_PRICEPN,0) AS TOTAL
FROM HOLIDAY_RESERVATION HR
LEFT JOIN FLIGHT F_IN ON HR.IN_FLIGHT_ID = F_IN.FLI_ID
LEFT JOIN FLIGHT F_OUT ON HR.OUT_FLIGHT_ID = F_OUT.FLI_ID
LEFT JOIN ACCOMMODATION AC ON HR.ACC_ID = AC.ACC_ID
ORDER BY HR.RESV_ID;
YIELDS
RESV_ID FLI_ID FLI_PRICE FLI_ID FLI_PRICE ACC_ID ACC_PRICEPN TOTAL
---------- ---------- ---------- ---------- ---------- ---------- ----------- ----------
1 1 500 11 555 2 350 1405
2 2 150 12 253 4 150 553
3 3 300 13 345 6 450 1095
4 4 450 14 343 7 650 1443
5 5 345 15 242 587
6 6 460 16 460 920
7 7 250 17 250 500
8 8 550 18 550 1100
9 25 250 250
10 19 450 450
And the following statement is to update the reservation table.
Thanks to Leigh Riffel from DBA stackxchange for the following code
UPDATE HOLIDAY_RESERVATION R SET SUBTOTAL =
NVL((SELECT F.FLI_PRICE FROM FLIGHT F WHERE F.FLI_ID = R.IN_FLIGHT_ID), 0) +
NVL((SELECT F.FLI_PRICE FROM FLIGHT F WHERE F.FLI_ID = R.OUT_FLIGHT_ID), 0) +
NVL((SELECT AC.ACC_PRICEPN FROM ACCOMMODATION AC WHERE AC.ACC_ID = R.ACC_ID), 0);
Now the subtotal is populated with the values obtained from the sum performed above >>
RESV_ID EMP_ID BOOKING_CUS_ID RESV_DATE HOLIDAY_S HOLIDAY_E IN_FLIGHT_ID OUT_FLIGHT_ID IN_FLIGHT_SEATS_NO OUT_FLIGHT_SEATS_NO ACC_ID SUBTOTAL
---------- ---------- -------------- --------- --------- --------- ------------ ------------- ------------------ ------------------- ---------- ----------
1 338 1 16-FEB-12 18-MAR-12 20-APR-12 1 11 2 2 2 1405
2 335 3 10-JAN-12 20-MAR-12 22-APR-12 2 12 2 2 4 553
3 338 5 05-MAR-12 25-MAR-12 26-APR-12 3 13 2 2 6 1095
4 328 7 02-JAN-12 25-MAR-12 25-APR-12 4 14 2 2 7 1443
5 311 9 20-JAN-12 26-MAR-12 21-APR-12 5 15 2 2 587
6 317 11 07-JAN-12 27-MAR-12 22-APR-12 6 16 2 2 920
7 344 13 29-FEB-12 15-MAR-12 12-APR-12 7 17 2 2 500
8 326 15 11-JAN-12 18-MAR-12 12-APR-14 8 18 2 2 1100
9 329 17 16-JAN-12 19-MAR-12 17-APR-12 25 250
10 323 19 18-FEB-12 20-MAR-12 21-APR-12 19 450
Subsequently the code was added to a trigger (which was the original intention)
CREATE OR REPLACE TRIGGER HR_SUBTOTAL
BEFORE INSERT OR UPDATE ON HOLIDAY_RESERVATION
FOR EACH ROW
BEGIN
SELECT
NVL((SELECT F.Fli_Price FROM Flight F WHERE F.Fli_ID = :new.In_Flight_ID), 0) +
NVL((SELECT F.Fli_Price FROM Flight F WHERE F.Fli_ID = :new.Out_Flight_ID), 0) +
NVL((SELECT AC.Acc_PricePn FROM Accomodation AC WHERE AC.Acc_ID = :new.Acc_ID), 0)
INTO :new.Subtotal
FROM dual;
END;
/
For the SUM, assuming you want to treat NULL values as 0, you'd just need to do an NVL on the numbers
NVL( DEPART_FLIGHT_PRICE, 0 ) +
NVL( RETURN_FLIGHT_PRICE, 0 ) +
NVL( ACCOMMODATION_PRICE, 0 )
As for the UPDATE, it sounds like you just need a correlated UPDATE statement.
UPDATE reservation r
SET subtotal = (SELECT (SELECT NVL( DEPART_FLIGHT_PRICE, 0 ) +
NVL( RETURN_FLIGHT_PRICE, 0 ) +
NVL( ACCOMMODATION_PRICE, 0 )
FROM (SELECT RESV_ID,
BOOKING_CUS_ID,
ACC_ID,
(SELECT F.FLI_PRICE
FROM FLIGHT F
WHERE F.FLI_ID = R.IN_FLIGHT_ID) AS DEPART_FLIGHT_PRICE,
(SELECT F1.FLI_PRICE
FROM FLIGHT F1
WHERE F1.FLI_ID = R.OUT_FLIGHT_ID) AS RETURN_FLIGHT_PRICE,
(SELECT AC.ACC_PRICEPN
FROM ACCOMMODATION AC
WHERE AC.ACC_ID = R.ACC_ID) AS ACCOMMODATION_PRICE
FROM dual));
You are asking:
How do I sum up the price fields, as you can see some of them can have nulls.
DEPART_FLIGHT_PRICE RETURN_FLIGHT_PRICE ACCOMMODATION_PRICE
Just enclose them in NVL function as follows:
NVL(DEPART_FLIGHT_PRICE, 0)
and then sum them up.
For the second part, what you need is a MERGE statement. A good example is at http://www.oracle-developer.net/display.php?id=203