How can I get a table like this? - sql

I have three tables in my database - campaign, app, revenue.
Campaign:
id | name
----+------
1 | Gis1
2 | Gis2
3 | Gis3
App:
app_id | name | campaign_id
--------+-----------+-------------
1 | Paino | 1
2 | Guitar | 1
3 | DrumPads | 1
4 | Karaoke | 2
5 | Metronome | 3
Revenue:
date | app_id | revenue
------------+--------+-------------------
2018-01-01 | 3 | 78.538551844269
2018-01-01 | 4 | 38.8709466245191
2018-01-01 | 2 | 35.5413845637373
2018-01-01 | 1 | 28.6825649309465
2018-01-01 | 5 | 6.33375584214843
2018-01-02 | 4 | 75.162254483704
2018-01-02 | 1 | 73.2370500155917
2018-01-02 | 5 | 70.4319678991422
2018-01-02 | 2 | 61.6702865774691
2018-01-02 | 3 | 11.7512900955221
2018-01-03 | 3 | 96.3792688491068
2018-01-03 | 4 | 84.3478274916822
2018-01-03 | 2 | 78.6001262071822
2018-01-03 | 5 | 13.8776103129058
2018-01-03 | 1 | 1.68915693074764
2018-01-04 | 5 | 99.4360222634511
2018-01-04 | 4 | 90.1921250023309
2018-01-04 | 3 | 16.5334091972016
2018-01-04 | 2 | 10.5714115940407
2018-01-04 | 1 | 1.35598296965985
2018-01-05 | 5 | 80.2475503409425
2018-01-05 | 4 | 38.9817245329402
2018-01-05 | 2 | 34.409188396027
2018-01-05 | 3 | 20.4833489416672
2018-01-05 | 1 | 2.61399153047812
2018-01-06 | 3 | 87.8649452536831
2018-01-06 | 1 | 74.4561480870284
2018-01-06 | 5 | 21.6574319699022
2018-01-06 | 2 | 4.87542333346478
2018-01-06 | 4 | 1.14697005074565
2018-01-07 | 1 | 87.9779788101898
2018-01-07 | 4 | 77.7294346579956
2018-01-07 | 3 | 59.3464731223967
2018-01-07 | 2 | 40.95148445392
2018-01-07 | 5 | 5.06283105895021
2018-01-08 | 1 | 96.2285605244126
2018-01-08 | 2 | 95.07328406998
2018-01-08 | 3 | 92.0486340327792
2018-01-08 | 4 | 85.379685234924
2018-01-08 | 5 | 9.78507570055686
2018-01-09 | 1 | 62.8192365909115
2018-01-09 | 4 | 62.0064597273823
2018-01-09 | 5 | 48.0621315020228
2018-01-09 | 3 | 29.7547369619939
2018-01-09 | 2 | 12.2752425067087
2018-01-10 | 2 | 81.0502551311092
2018-01-10 | 3 | 48.9698039641851
2018-01-10 | 1 | 17.5580143188766
2018-01-10 | 5 | 16.961404890828
2018-01-10 | 4 | 15.8832169199418
2018-01-11 | 2 | 77.6197753309208
2018-01-11 | 4 | 37.7590440824396
2018-01-11 | 1 | 28.964817136957
2018-01-11 | 3 | 28.706793080089
2018-01-11 | 5 | 26.9639842717711
2018-01-12 | 3 | 87.2789863299996
2018-01-12 | 1 | 78.8013559572292
2018-01-12 | 4 | 57.4583081599463
2018-01-12 | 5 | 48.0822281547709
2018-01-12 | 2 | 0.839615458734033
2018-01-13 | 3 | 69.8766551973526
2018-01-13 | 2 | 58.3078275325981
2018-01-13 | 5 | 21.8336755576109
2018-01-13 | 4 | 11.370240413885
2018-01-13 | 1 | 1.86340769095961
2018-01-14 | 5 | 92.6937944833375
2018-01-14 | 4 | 87.4130741995654
2018-01-14 | 3 | 72.2022209237481
2018-01-14 | 1 | 17.323222911245
2018-01-14 | 2 | 14.1322298298443
2018-01-15 | 2 | 90.8789341373927
2018-01-15 | 4 | 74.78605271702
2018-01-15 | 1 | 65.674207749016
2018-01-15 | 5 | 33.0848315520449
2018-01-15 | 3 | 19.7583865950811
2018-01-16 | 5 | 66.2050914825085
2018-01-16 | 4 | 34.6843542862023
2018-01-16 | 1 | 29.5897929780101
2018-01-16 | 2 | 15.0023649485883
2018-01-16 | 3 | 7.54663420658891
2018-01-17 | 2 | 83.3703723270077
2018-01-17 | 3 | 61.088943523605
2018-01-17 | 4 | 46.5194411862903
2018-01-17 | 5 | 46.462239550764
2018-01-17 | 1 | 16.1838123321874
2018-01-18 | 5 | 78.0041560412725
2018-01-18 | 4 | 30.3052500891844
2018-01-18 | 2 | 29.8116578069311
2018-01-18 | 3 | 5.80476470204397
2018-01-18 | 1 | 2.28775040131831
2018-01-19 | 5 | 94.0447243349086
2018-01-19 | 2 | 93.2593723776554
2018-01-19 | 3 | 86.2968057525727
2018-01-19 | 1 | 42.7138322733396
2018-01-19 | 4 | 22.1327564577787
2018-01-20 | 3 | 98.8579713044872
2018-01-20 | 5 | 64.8200087378497
2018-01-20 | 4 | 64.7727513652878
2018-01-20 | 2 | 39.2598249004273
2018-01-20 | 1 | 25.6178488851919
2018-01-21 | 3 | 84.4040426309011
2018-01-21 | 1 | 52.0713063443698
2018-01-21 | 5 | 41.7424199787255
2018-01-21 | 4 | 35.3389400530059
2018-01-21 | 2 | 28.350741474429
2018-01-22 | 2 | 96.8320321290855
2018-01-22 | 3 | 74.0004402752697
2018-01-22 | 1 | 72.5235460636752
2018-01-22 | 5 | 53.607618058446
2018-01-22 | 4 | 41.3008316635055
2018-01-23 | 2 | 66.6286214457232
2018-01-23 | 3 | 54.1626139019933
2018-01-23 | 5 | 52.5239485716162
2018-01-23 | 4 | 25.7367743326983
2018-01-23 | 1 | 6.46491466744874
2018-01-24 | 2 | 83.5308430627458
2018-01-24 | 1 | 68.6328785122374
2018-01-24 | 4 | 55.6973785257225
2018-01-24 | 3 | 46.0264499615527
2018-01-24 | 5 | 16.4651600203735
2018-01-25 | 4 | 80.9564163429763
2018-01-25 | 5 | 62.5899942406707
2018-01-25 | 1 | 59.0336831992662
2018-01-25 | 2 | 46.4030509765701
2018-01-25 | 3 | 22.6888680448289
2018-01-26 | 4 | 76.5099290710172
2018-01-26 | 3 | 53.933127563048
2018-01-26 | 5 | 49.5466520893498
2018-01-26 | 2 | 45.1699294234721
2018-01-26 | 1 | 21.3764512981173
2018-01-27 | 1 | 90.5434132585012
2018-01-27 | 4 | 67.0016445981484
2018-01-27 | 3 | 11.2431627841556
2018-01-27 | 2 | 5.39719616685773
2018-01-27 | 5 | 2.11776835627748
2018-01-28 | 2 | 53.3541751891504
2018-01-28 | 1 | 32.9596394913923
2018-01-28 | 3 | 21.1895497351378
2018-01-28 | 4 | 16.2897762555689
2018-01-28 | 5 | 5.34709359321544
2018-01-29 | 1 | 64.5439256676011
2018-01-29 | 2 | 15.9776125576869
2018-01-29 | 4 | 11.0105036902667
2018-01-29 | 3 | 2.16601788703412
2018-01-29 | 5 | 0.555523083910259
2018-01-30 | 4 | 94.7839147312857
2018-01-30 | 5 | 72.6621727991897
2018-01-30 | 3 | 70.124043314061
2018-01-30 | 2 | 34.5961079425723
2018-01-30 | 1 | 33.2888204556319
2018-01-31 | 3 | 77.7231288650421
2018-01-31 | 1 | 56.8044673174345
2018-01-31 | 4 | 43.5046513642636
2018-01-31 | 5 | 41.5792942791069
2018-01-31 | 2 | 25.5788387345906
2018-02-01 | 3 | 95.2107725320766
2018-02-01 | 4 | 86.3486448391141
2018-02-01 | 5 | 78.3239590582078
2018-02-01 | 2 | 39.1536975881585
2018-02-01 | 1 | 36.0675078797763
2018-02-02 | 5 | 97.8803713050822
2018-02-02 | 1 | 79.4247662701352
2018-02-02 | 2 | 26.3779061699958
2018-02-02 | 3 | 22.4354942949645
2018-02-02 | 4 | 13.2603534317112
2018-02-03 | 3 | 96.1323726327063
2018-02-03 | 5 | 59.6632595622737
2018-02-03 | 1 | 27.389807545151
2018-02-03 | 2 | 7.76389782111102
2018-02-03 | 4 | 0.969840948318645
2018-02-04 | 3 | 75.3978559173567
2018-02-04 | 5 | 49.3882938530803
2018-02-04 | 2 | 39.1100010374179
2018-02-04 | 4 | 35.8242148224422
2018-02-04 | 1 | 7.23734382101905
2018-02-05 | 4 | 75.3672510776635
2018-02-05 | 5 | 64.5369740371526
2018-02-05 | 3 | 51.5082265591993
2018-02-05 | 1 | 32.3788448578061
2018-02-05 | 2 | 21.4472612365463
2018-02-06 | 3 | 53.4502002775965
2018-02-06 | 2 | 53.0717656757934
2018-02-06 | 1 | 40.8672220798649
2018-02-06 | 4 | 37.839976598642
2018-02-06 | 5 | 9.12020377129901
2018-02-07 | 5 | 97.4855788083418
2018-02-07 | 4 | 97.4608054761709
2018-02-07 | 1 | 95.6723225551752
2018-02-07 | 3 | 87.9714358507064
2018-02-07 | 2 | 38.2405435002047
2018-02-08 | 4 | 82.9874314133669
2018-02-08 | 5 | 82.6651133226406
2018-02-08 | 1 | 69.3052440890685
2018-02-08 | 2 | 51.1343060185741
2018-02-08 | 3 | 25.5081553094595
2018-02-09 | 2 | 77.6589355538231
2018-02-09 | 5 | 74.7649757096248
2018-02-09 | 4 | 74.0052834670764
2018-02-09 | 1 | 37.58471748555
2018-02-09 | 3 | 9.52726961562965
2018-02-10 | 4 | 63.5114625028904
2018-02-10 | 1 | 57.6003561091767
2018-02-10 | 3 | 33.8354238124814
2018-02-10 | 5 | 24.755497452165
2018-02-10 | 2 | 6.09719410861046
2018-02-11 | 3 | 99.3200679204704
2018-02-11 | 4 | 92.787953262445
2018-02-11 | 2 | 75.7916875546417
2018-02-11 | 1 | 74.1264023056354
2018-02-11 | 5 | 39.0543105010909
2018-02-12 | 4 | 73.9016911300489
2018-02-12 | 2 | 36.8834180654883
2018-02-12 | 3 | 30.824684325787
2018-02-12 | 5 | 29.1559120548307
2018-02-12 | 1 | 10.1162943083399
2018-02-13 | 1 | 88.0975197801571
2018-02-13 | 3 | 73.3753659668181
2018-02-13 | 4 | 63.0762892472857
2018-02-13 | 5 | 35.8151357458788
2018-02-13 | 2 | 13.4014942840453
2018-02-14 | 1 | 94.8739671484573
2018-02-14 | 2 | 91.6415916160249
2018-02-14 | 5 | 66.2281593912018
2018-02-14 | 4 | 42.94700050317
2018-02-14 | 3 | 26.5246491333787
2018-02-15 | 2 | 98.7486846642082
2018-02-15 | 5 | 69.6182587287506
2018-02-15 | 3 | 44.6821718318301
2018-02-15 | 4 | 21.9568740682904
2018-02-15 | 1 | 15.374522578894
2018-02-16 | 2 | 94.3365941896695
2018-02-16 | 1 | 53.269122319394
2018-02-16 | 3 | 39.6046035126169
2018-02-16 | 4 | 37.622514510779
2018-02-16 | 5 | 31.3474270053205
2018-02-17 | 3 | 70.0631248181593
2018-02-17 | 5 | 50.1262781461011
2018-02-17 | 2 | 43.9279952731992
2018-02-17 | 1 | 28.2582849814117
2018-02-17 | 4 | 21.0913544631149
2018-02-18 | 3 | 74.8909778287795
2018-02-18 | 2 | 74.2363801582102
2018-02-18 | 5 | 72.4878600270842
2018-02-18 | 1 | 25.6855071233935
2018-02-18 | 4 | 0.37039199763309
2018-02-19 | 3 | 83.3856751613489
2018-02-19 | 5 | 46.4974932948942
2018-02-19 | 2 | 6.43301299768522
2018-02-19 | 4 | 4.81320557633388
2018-02-19 | 1 | 2.15515010060456
2018-02-20 | 4 | 81.4230771798843
2018-02-20 | 5 | 57.7265346180577
2018-02-20 | 1 | 56.2984247130064
2018-02-20 | 2 | 49.0169450043801
2018-02-20 | 3 | 46.5627217436774
2018-02-21 | 1 | 96.5297614033189
2018-02-21 | 5 | 96.2494094090932
2018-02-21 | 3 | 31.3462847216426
2018-02-21 | 4 | 23.2941891242544
2018-02-21 | 2 | 19.9083254355315
2018-02-22 | 1 | 79.0770313884165
2018-02-22 | 2 | 64.9973229306064
2018-02-22 | 3 | 55.3855288854335
2018-02-22 | 4 | 53.814505037514
2018-02-22 | 5 | 24.401256997123
2018-02-23 | 3 | 94.6754099868804
2018-02-23 | 1 | 52.4266618064681
2018-02-23 | 5 | 43.3877704733184
2018-02-23 | 2 | 23.3815439158117
2018-02-23 | 4 | 5.92925014836784
2018-02-24 | 4 | 82.3691566567076
2018-02-24 | 3 | 59.14386332869
2018-02-24 | 1 | 56.3529858789623
2018-02-24 | 5 | 17.7818909222602
2018-02-24 | 2 | 8.08320409409884
2018-02-25 | 1 | 51.144611434977
2018-02-25 | 4 | 32.6423341915492
2018-02-25 | 2 | 25.7686248507202
2018-02-25 | 3 | 3.33917220111982
2018-02-25 | 5 | 1.98348143815742
2018-02-26 | 5 | 95.2717564467113
2018-02-26 | 2 | 89.9541470672166
2018-02-26 | 4 | 73.8019448592861
2018-02-26 | 3 | 41.1512130216618
2018-02-26 | 1 | 36.3474907902939
2018-02-27 | 5 | 79.2906637385048
2018-02-27 | 4 | 62.3354455191908
2018-02-27 | 2 | 41.5109752476831
2018-02-27 | 1 | 18.9144882775624
2018-02-27 | 3 | 2.1427167667481
2018-02-28 | 4 | 85.4665146107167
2018-02-28 | 3 | 46.1527380247259
2018-02-28 | 2 | 22.3016369603851
2018-02-28 | 1 | 7.070596022248
2018-02-28 | 5 | 4.55199247079415
As a result of the query I need to get a table that contains four columns:
period - the period, by month and by week: January, February, Week 1, Week 2, ..., Week 9
gis1 - revenue for the corresponding period for campaign Gis1
gis2 - the same for Gis2
gis3 - the same for Gis3
Here are the source files.
I wrote a query that creates the tables in the database and returns the revenue for campaign Gis1 by week:
CREATE TABLE campaign
(
id INT PRIMARY KEY,
name VARCHAR
);
COPY campaign (id, name) FROM '/home/leonid/Campaign.csv' DELIMITER ';' CSV HEADER;
CREATE TABLE app
(
app_id INT PRIMARY KEY,
name VARCHAR,
campaign_id INT REFERENCES campaign (id)
);
COPY app (app_id, name, campaign_id) FROM '/home/leonid/App.csv' DELIMITER ';' CSV HEADER;
CREATE TABLE revenue
(
date DATE,
app_id INT REFERENCES app (app_id),
revenue DOUBLE PRECISION
);
COPY revenue (date, app_id, revenue) FROM '/home/leonid/Revenue.csv' DELIMITER ';' CSV HEADER;
ALTER TABLE revenue
ADD COLUMN N SERIAL PRIMARY KEY;
SELECT DISTINCT EXTRACT (WEEK FROM r.date) AS Week, SUM (r.revenue) AS Gis1
FROM revenue r
JOIN app a ON r.app_id = a.app_id
JOIN campaign c ON a.campaign_id = c.id
WHERE c.name = 'Gis1'
GROUP BY Week
ORDER BY Week;
Maybe I need to use crosstab?

Disclaimer: Not really sure if I understood your problem. But it seems that you just want to achieve a simple pivot:
demo:db<>fiddle
WITH alldata AS ( -- 1
SELECT
r.revenue,
c.name,
EXTRACT('isoyear' FROM date) as year, -- 2
to_char(date, 'Month') as month, -- 3
EXTRACT('week' FROM date) as week -- 4
FROM
revenue r
JOIN app a ON a.app_id = r.app_id
JOIN campaign c ON c.id = a.campaign_id
)
SELECT
month || ' ' || week as period, -- 5
SUM(revenue) FILTER (WHERE name = 'Gis1') as gis1, -- 7
SUM(revenue) FILTER (WHERE name = 'Gis2') as gis2,
SUM(revenue) FILTER (WHERE name = 'Gis3') as gis3
FROM
alldata
GROUP BY year, month, week -- 6
ORDER BY year, week
The result (for a random extract of your rather large sample data):
| period | gis1 | gis2 | gis3 |
|-------------|--------------------|--------------------|--------------------|
| January 1 | 690.6488198608687 | 192.90960581696436 | 103.48598677014304 |
| January 2 | 377.6251679591726 | 85.379685234924 | 9.78507570055686 |
| January 3 | 303.6608544533801 | 121.3054939033103 | 124.4663955920365 |
| January 4 | 59.0336831992662 | 80.9564163429763 | 62.5899942406707 |
| February 5 | 123.5221801778573 | (null) | 59.6632595622737 |
| February 6 | 368.2516021023734 | 175.3567225686088 | 182.1855864844304 |
| February 7 | 368.9193547506641 | 21.0913544631149 | 122.6141381731853 |
| February 8 | 154.03324156166846 | 81.4230771798843 | 57.7265346180577 |
| February 9 | 174.5234469014203 | 73.8019448592861 | 179.11441265601024 |
1. The WITH clause computes the whole joined table. This could be done within a subquery as well.
2. Extracts the year from the date. isoyear is used instead of year to ensure there are no problems with the first week: isoyear defines exactly when the first week starts ("the first Thursday of a year is in week 1 of that year").
3. Gets the month name.
4. Gets the week number.
5. Builds a "period" out of the month name and the week number.
6. Groups by the period (mainly by year and week).
7. Does the pivot using the FILTER clause. With it you can filter what you want to aggregate, which here is the campaign.
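To answer the crosstab question: you could get the same pivot with the tablefunc extension's crosstab(), but the FILTER approach above needs no extension. A rough sketch of how the crosstab variant might look (assuming tablefunc can be installed and typing the value columns as double precision):
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    -- source query: one row per (period, campaign, summed revenue), ordered by period
    $$SELECT to_char(r.date, 'Month') || ' ' || EXTRACT('week' FROM r.date) AS period,
             c.name,
             SUM(r.revenue)
      FROM revenue r
      JOIN app a ON a.app_id = r.app_id
      JOIN campaign c ON c.id = a.campaign_id
      GROUP BY 1, c.name
      ORDER BY 1, 2$$,
    -- category query: the fixed set of campaigns that become output columns
    $$VALUES ('Gis1'), ('Gis2'), ('Gis3')$$
) AS ct(period text, gis1 double precision, gis2 double precision, gis3 double precision);
Note that period is plain text here, so rows come back in alphabetical rather than chronological order; the FILTER version, which keeps year and week around for ORDER BY, avoids that.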

Related

Redshift SQL - Count Sequences of Repeating Values Within Groups

I have a table that looks like this:
| id | date_start | gap_7_days |
| -- | ------------------- | --------------- |
| 1 | 2021-06-10 00:00:00 | 0 |
| 1 | 2021-06-13 00:00:00 | 0 |
| 1 | 2021-06-19 00:00:00 | 0 |
| 1 | 2021-06-27 00:00:00 | 0 |
| 2 | 2021-07-04 00:00:00 | 1 |
| 2 | 2021-07-11 00:00:00 | 1 |
| 2 | 2021-07-18 00:00:00 | 1 |
| 2 | 2021-07-25 00:00:00 | 1 |
| 2 | 2021-08-01 00:00:00 | 1 |
| 2 | 2021-08-08 00:00:00 | 1 |
| 2 | 2021-08-09 00:00:00 | 0 |
| 2 | 2021-08-16 00:00:00 | 1 |
| 2 | 2021-08-23 00:00:00 | 1 |
| 2 | 2021-08-30 00:00:00 | 1 |
| 2 | 2021-08-31 00:00:00 | 0 |
| 2 | 2021-09-01 00:00:00 | 0 |
| 2 | 2021-08-08 00:00:00 | 1 |
| 2 | 2021-08-15 00:00:00 | 1 |
| 2 | 2021-08-22 00:00:00 | 1 |
| 2 | 2021-08-23 00:00:00 | 1 |
For each ID, I check whether consecutive date_start values are 7 days apart, and put a 1 or 0 in gap_7_days accordingly.
I want to do the following (using Redshift SQL only):
Get the length of each sequence of consecutive 1s in gap_7_days for each ID
Expected output:
| id | date_start | gap_7_days | sequence_length |
| -- | ------------------- | --------------- | --------------- |
| 1 | 2021-06-10 00:00:00 | 0 | |
| 1 | 2021-06-13 00:00:00 | 0 | |
| 1 | 2021-06-19 00:00:00 | 0 | |
| 1 | 2021-06-27 00:00:00 | 0 | |
| 2 | 2021-07-04 00:00:00 | 1 | 6 |
| 2 | 2021-07-11 00:00:00 | 1 | 6 |
| 2 | 2021-07-18 00:00:00 | 1 | 6 |
| 2 | 2021-07-25 00:00:00 | 1 | 6 |
| 2 | 2021-08-01 00:00:00 | 1 | 6 |
| 2 | 2021-08-08 00:00:00 | 1 | 6 |
| 2 | 2021-08-09 00:00:00 | 0 | |
| 2 | 2021-08-16 00:00:00 | 1 | 3 |
| 2 | 2021-08-23 00:00:00 | 1 | 3 |
| 2 | 2021-08-30 00:00:00 | 1 | 3 |
| 2 | 2021-08-31 00:00:00 | 0 | |
| 2 | 2021-09-01 00:00:00 | 0 | |
| 2 | 2021-08-08 00:00:00 | 1 | 4 |
| 2 | 2021-08-15 00:00:00 | 1 | 4 |
| 2 | 2021-08-22 00:00:00 | 1 | 4 |
| 2 | 2021-08-23 00:00:00 | 1 | 4 |
Get the number of sequences for each ID
Expected output:
| id | num_sequences |
| -- | ------------------- |
| 1 | 0 |
| 2 | 3 |
How can I achieve this?
If you want the number of sequences, just look at the previous value. When the current value is "1" and the previous is NULL or 0, then you have a new sequence.
So:
select id,
sum( (gap_7_days = 1 and coalesce(prev_gap_7_days, 0) = 0)::int ) as num_sequences
from (select t.*,
lag(gap_7_days) over (partition by id order by date_start) as prev_gap_7_days
from t
) t
group by id;
If you actually want the lengths of the sequences, as in the intermediate results, then ask a new question. That information is not needed for this question.
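That said, if you do also need the sequence lengths from your expected output, a gaps-and-islands variant of the same LAG idea should work. A sketch (untested, using the same placeholder table t and assuming date_start reflects the intended row order; Redshift requires the explicit frame clause on the running SUM):
with flagged as (
    select t.*,
           lag(gap_7_days) over (partition by id order by date_start) as prev_gap_7_days
    from t
), grouped as (
    select *,
           -- running count of sequence starts; every row carries the id of the most recent sequence
           sum(case when gap_7_days = 1 and coalesce(prev_gap_7_days, 0) = 0
                    then 1 else 0 end)
           over (partition by id order by date_start rows unbounded preceding) as grp
    from flagged
)
select id, date_start, gap_7_days,
       -- partitioning by gap_7_days as well keeps the trailing 0 rows out of each count
       case when gap_7_days = 1
            then count(*) over (partition by id, grp, gap_7_days)
       end as sequence_length
from grouped
order by id, date_start;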

Group by bursts of occurences in TimescaleDB/PostgreSQL

This is my first question on Stack Overflow; any advice on how to ask a well-structured question is welcome.
So, I have a TimescaleDB database, which is a time-series database built on top of Postgres. It keeps most of Postgres's functionality, so if you don't know Timescale it won't be an issue.
I have a select statement which returns:
time | num_issues | actor_login
------------------------+------------+------------------
2015-11-10 01:00:00+01 | 2 | nifl
2015-12-10 01:00:00+01 | 1 | anandtrex
2016-01-09 01:00:00+01 | 1 | isaacrg
2016-02-08 01:00:00+01 | 1 | timbarclay
2016-06-07 02:00:00+02 | 1 | kcalmes
2016-07-07 02:00:00+02 | 1 | cassiozen
2016-08-06 02:00:00+02 | 13 | phae
2016-09-05 02:00:00+02 | 2 | phae
2016-10-05 02:00:00+02 | 13 | cassiozen
2016-11-04 01:00:00+01 | 6 | cassiozen
2016-12-04 01:00:00+01 | 4 | cassiozen
2017-01-03 01:00:00+01 | 5 | cassiozen
2017-02-02 01:00:00+01 | 8 | cassandraoid
2017-03-04 01:00:00+01 | 16 | erquhart
2017-04-03 02:00:00+02 | 3 | erquhart
2017-05-03 02:00:00+02 | 9 | erquhart
2017-06-02 02:00:00+02 | 5 | erquhart
2017-07-02 02:00:00+02 | 2 | greatwarlive
2017-08-01 02:00:00+02 | 8 | tech4him1
2017-08-31 02:00:00+02 | 7 | tech4him1
2017-09-30 02:00:00+02 | 17 | erquhart
2017-10-30 01:00:00+01 | 7 | erquhart
2017-11-29 01:00:00+01 | 12 | erquhart
2017-12-29 01:00:00+01 | 8 | tech4him1
2018-01-28 01:00:00+01 | 6 | ragasirtahk
And so on. Basically it returns a username per bucket of time, in this case 30 days.
The SQL query is:
SELECT DISTINCT ON(time_bucket('30 days', created_at))
time_bucket('30 days', created_at) as time,
count(id) as num_issues,
actor_login
FROM
issues_event
WHERE action = 'opened' AND repo_name='netlify/netlify-cms'
group by time, actor_login
order by time, num_issues DESC
My question is: how can I detect or group the rows that have the same actor_login and are consecutive?
For example, I would like to group the cassiozen rows from 2016-10-05 to 2017-01-03, but not together with the other cassiozen row in the column.
I have tried auxiliary columns and window functions such as LAG, but without a function or a DO statement I don't think it is possible.
I also tried with functions but couldn't find a way.
Any approach, idea or solution will be fully appreciated.
Edit: I show my desired output.
time | num_issues | actor_login | actor_group_id
------------------------+------------+------------------+----------------
2015-11-10 01:00:00+01 | 2 | nifl | 0
2015-12-10 01:00:00+01 | 1 | anandtrex | 1
2016-01-09 01:00:00+01 | 1 | isaacrg | 2
2016-02-08 01:00:00+01 | 1 | timbarclay | 3
2016-06-07 02:00:00+02 | 1 | kcalmes | 4
2016-07-07 02:00:00+02 | 1 | cassiozen | 5
2016-08-06 02:00:00+02 | 13 | phae | 6
2016-09-05 02:00:00+02 | 2 | phae | 6
2016-10-05 02:00:00+02 | 13 | cassiozen | 7
2016-11-04 01:00:00+01 | 6 | cassiozen | 7
2016-12-04 01:00:00+01 | 4 | cassiozen | 7
2017-01-03 01:00:00+01 | 5 | cassiozen | 7
2017-02-02 01:00:00+01 | 8 | cassandraoid | 12
2017-03-04 01:00:00+01 | 16 | erquhart | 13
2017-04-03 02:00:00+02 | 3 | erquhart | 13
2017-05-03 02:00:00+02 | 9 | erquhart | 13
2017-06-02 02:00:00+02 | 5 | erquhart | 13
2017-07-02 02:00:00+02 | 2 | greatwarlive | 17
2017-08-01 02:00:00+02 | 8 | tech4him1 | 18
2017-08-31 02:00:00+02 | 7 | tech4him1 | 18
2017-09-30 02:00:00+02 | 17 | erquhart | 16
2017-10-30 01:00:00+01 | 7 | erquhart | 16
2017-11-29 01:00:00+01 | 12 | erquhart | 16
2017-12-29 01:00:00+01 | 8 | tech4him1 | 21
2018-01-28 01:00:00+01 | 6 | ragasirtahk | 24
MatBaille's solution is almost perfect.
I just wanted to group the consecutive actors like this so I could extract a bunch of metrics using other attributes of the table.
You could use a so-called "gaps-and-islands" approach:
WITH
sorted AS
(
SELECT
*,
ROW_NUMBER() OVER ( ORDER BY time) AS rn,
ROW_NUMBER() OVER (PARTITION BY actor_login ORDER BY time) AS rn_actor
FROM
your_results
)
SELECT
*,
rn - rn_actor AS actor_group_id
FROM
sorted
Then the combination of (actor_login, actor_group_id) will group consecutive rows together.
db<>fiddle demo
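If you then want to compute metrics per burst (as you mention), you can aggregate on that combination. A sketch, again against the your_results placeholder from the query above:
SELECT
    actor_login,
    rn - rn_actor                AS actor_group_id,
    MIN("time")                  AS first_bucket,
    MAX("time")                  AS last_bucket,
    SUM(num_issues)              AS total_issues
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY "time")                           AS rn,
        ROW_NUMBER() OVER (PARTITION BY actor_login ORDER BY "time")  AS rn_actor
    FROM your_results
) sorted
GROUP BY actor_login, rn - rn_actor
ORDER BY first_bucket;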

Aggregating tsrange values into day buckets with a tie-breaker

So I've got a schema that lets people donate $ to a set of organizations, and that donation is tied to a certain arbitrary period of time. I'm working on a report that looks at each day, and for each organization shows the total number of donations and the total cumulative value of those donations for that organization's day.
For example, here's a mockup of 3 donors, Alpha (orange), Bravo (green), and Charlie (Blue) donating to 2 different organizations (Foo and Bar) over various time periods:
I've created a SQLFiddle that implements the above example in a schema that somewhat reflects what I'm working with in reality: http://sqlfiddle.com/#!17/88969/1
(The schema is broken out into more tables than what you'd come up with given the problem statement to better reflect the real-life version I'm working with)
So far, the query that I've managed to put together looks like this:
WITH report_dates AS (
SELECT '2018-01-01'::date + g AS date
FROM generate_series(0, 14) g
), organizations AS (
SELECT id AS organization_id FROM users
WHERE type = 'Organization'
)
SELECT * FROM report_dates rd
CROSS JOIN organizations o
LEFT JOIN LATERAL (
SELECT
COALESCE(sum(doa.amount_cents), 0) AS total_donations_cents,
COALESCE(count(doa.*), 0) AS total_donors
FROM users
LEFT JOIN donor_organization_amounts doa ON doa.organization_id = users.id
LEFT JOIN donor_amounts da ON da.id = doa.donor_amounts_id
LEFT JOIN donor_schedules ds ON ds.donor_amounts_id = da.id
WHERE (users.id = o.organization_id) AND (ds.period && tsrange(rd.date::timestamp, rd.date::timestamp + INTERVAL '1 day', '[)'))
) o2 ON true;
With the results looking like this:
| date | organization_id | total_donations_cents | total_donors |
|------------|-----------------|-----------------------|--------------|
| 2018-01-01 | 1 | 0 | 0 |
| 2018-01-02 | 1 | 250 | 1 |
| 2018-01-03 | 1 | 250 | 1 |
| 2018-01-04 | 1 | 1750 | 3 |
| 2018-01-05 | 1 | 1750 | 3 |
| 2018-01-06 | 1 | 1750 | 3 |
| 2018-01-07 | 1 | 750 | 2 |
| 2018-01-08 | 1 | 850 | 2 |
| 2018-01-09 | 1 | 850 | 2 |
| 2018-01-10 | 1 | 500 | 1 |
| 2018-01-11 | 1 | 500 | 1 |
| 2018-01-12 | 1 | 500 | 1 |
| 2018-01-13 | 1 | 1500 | 2 |
| 2018-01-14 | 1 | 1000 | 1 |
| 2018-01-15 | 1 | 0 | 0 |
| 2018-01-01 | 2 | 0 | 0 |
| 2018-01-02 | 2 | 250 | 1 |
| 2018-01-03 | 2 | 250 | 1 |
| 2018-01-04 | 2 | 1750 | 2 |
| 2018-01-05 | 2 | 1750 | 2 |
| 2018-01-06 | 2 | 1750 | 2 |
| 2018-01-07 | 2 | 1750 | 2 |
| 2018-01-08 | 2 | 2000 | 2 |
| 2018-01-09 | 2 | 2000 | 2 |
| 2018-01-10 | 2 | 1500 | 1 |
| 2018-01-11 | 2 | 1500 | 1 |
| 2018-01-12 | 2 | 0 | 0 |
| 2018-01-13 | 2 | 1000 | 2 |
| 2018-01-14 | 2 | 500 | 1 |
| 2018-01-15 | 2 | 0 | 0 |
That's pretty close. However, the problem with this query is that on days where one of a donor's donations ends and a new one begins, that donor should only be counted once, using the higher donation amount as a tie-breaker for the cumulative $ count. An example of that is 2018-01-13 for organization Foo: total_donors should be 1 and total_donations_cents 1000.
I tried to implement a tie-breaker using DISTINCT ON but got off into the weeds... any help would be appreciated!
Also, should I be worried about the performance implications of my implementation so far, given the CTEs and the CROSS JOIN?
Figured it out using DISTINCT ON: http://sqlfiddle.com/#!17/88969/4
WITH report_dates AS (
SELECT '2018-01-01'::date + g AS date
FROM generate_series(0, 14) g
), organizations AS (
SELECT id AS organization_id FROM users
WHERE type = 'Organization'
), donors_by_date AS (
SELECT * FROM report_dates rd
CROSS JOIN organizations o
LEFT JOIN LATERAL (
SELECT DISTINCT ON (date, da.donor_id)
da.donor_id,
doa.id,
doa.donor_amounts_id,
doa.amount_cents
FROM users
LEFT JOIN donor_organization_amounts doa ON doa.organization_id = users.id
LEFT JOIN donor_amounts da ON da.id = doa.donor_amounts_id
LEFT JOIN donor_schedules ds ON ds.donor_amounts_id = da.id
WHERE (users.id = o.organization_id) AND (ds.period && tsrange(rd.date::timestamp, rd.date::timestamp + INTERVAL '1 day', '[)'))
ORDER BY date, da.donor_id, doa.amount_cents DESC
) foo ON true
)
SELECT
date,
organization_id,
COALESCE(SUM(amount_cents), 0) AS total_donations_cents,
COUNT(*) FILTER (WHERE donor_id IS NOT NULL) AS total_donors
FROM donors_by_date
GROUP BY date, organization_id
ORDER BY organization_id, date;
Result:
| date | organization_id | total_donations_cents | total_donors |
|------------|-----------------|-----------------------|--------------|
| 2018-01-01 | 1 | 0 | 0 |
| 2018-01-02 | 1 | 250 | 1 |
| 2018-01-03 | 1 | 250 | 1 |
| 2018-01-04 | 1 | 1750 | 3 |
| 2018-01-05 | 1 | 1750 | 3 |
| 2018-01-06 | 1 | 1750 | 3 |
| 2018-01-07 | 1 | 750 | 2 |
| 2018-01-08 | 1 | 850 | 2 |
| 2018-01-09 | 1 | 850 | 2 |
| 2018-01-10 | 1 | 500 | 1 |
| 2018-01-11 | 1 | 500 | 1 |
| 2018-01-12 | 1 | 500 | 1 |
| 2018-01-13 | 1 | 1000 | 1 |
| 2018-01-14 | 1 | 1000 | 1 |
| 2018-01-15 | 1 | 0 | 0 |
| 2018-01-01 | 2 | 0 | 0 |
| 2018-01-02 | 2 | 250 | 1 |
| 2018-01-03 | 2 | 250 | 1 |
| 2018-01-04 | 2 | 1750 | 2 |
| 2018-01-05 | 2 | 1750 | 2 |
| 2018-01-06 | 2 | 1750 | 2 |
| 2018-01-07 | 2 | 1750 | 2 |
| 2018-01-08 | 2 | 2000 | 2 |
| 2018-01-09 | 2 | 2000 | 2 |
| 2018-01-10 | 2 | 1500 | 1 |
| 2018-01-11 | 2 | 1500 | 1 |
| 2018-01-12 | 2 | 0 | 0 |
| 2018-01-13 | 2 | 1000 | 2 |
| 2018-01-14 | 2 | 500 | 1 |
| 2018-01-15 | 2 | 0 | 0 |
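Regarding the performance question: the report_dates/organizations CTEs and the CROSS JOIN stay small (days times organizations), so the cost is dominated by the LATERAL subquery's range-overlap check. If that becomes slow, a GiST index on the range column should let the && test use an index; a sketch, assuming the table and column names from the fiddle:
-- speeds up the ds.period && tsrange(...) lookups in the LATERAL subquery
CREATE INDEX donor_schedules_period_idx
    ON donor_schedules
    USING gist (period);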

Join and fill down NULL values with last non null

I have been attempting a join that I at first believed was relatively simple, but I am now having a bit of trouble getting it exactly right. I have two sets of data which resemble the following:
stmt:
ID | stmt_dt
---+---------
1  | 1/31/15
1  | 2/28/15
1  | 3/31/15
1  | 4/30/15
1  | 5/31/15
2  | 1/31/15
2  | 2/28/15
2  | 3/31/15
2  | 4/30/15
2  | 5/31/15
3  | 1/31/15
3  | 2/28/15
3  | 3/31/15
3  | 4/30/15
3  | 5/31/15
4  | 1/31/15
4  | 2/28/15
4  | 3/31/15
4  | 4/30/15
4  | 5/31/15
renewal:
ID | renewal_dt
---+------------
1  | 2/28/15
1  | 4/30/15
2  | 2/28/15
3  | 1/31/15
Here is my desired output
ID | stmt_dt | renewal_dt
---+---------+------------
1 |1/31/15 | NA
1 |2/28/15 | 2/28/15
1 |3/31/15 | 2/28/15
1 |4/30/15 | 4/30/15
1 |5/31/15 | 4/30/15
2 |1/31/15 | NA
2 |2/28/15 | 2/28/15
2 |3/31/15 | 2/28/15
2 |4/30/15 | 2/28/15
2 |5/31/15 | 2/28/15
3 |1/31/15 | 1/31/15
3 |2/28/15 | 1/31/15
3 |3/31/15 | 1/31/15
3 |4/30/15 | 1/31/15
3 |5/31/15 | 1/31/15
4 |1/31/15 | NA
4 |2/28/15 | NA
4 |3/31/15 | NA
4 |4/30/15 | NA
4 |5/31/15 | NA
My biggest issue has been getting the merged values to fill down until the next non-null value within each group. Any ideas on how to achieve this join? Thanks!
min(...) over (... rows between 1 following and 1 following)* + join
* = LEAD
select s.ID
,s.stmt_dt
,r.renewal_dt
from stmt s
left join (select ID
,renewal_dt
,min (renewal_dt) over
(
partition by ID
order by renewal_dt
rows between 1 following
and 1 following
) as next_renewal_dt
from renewal
) r
on s.ID = r.ID
and s.stmt_dt >= r.renewal_dt
and s.stmt_dt < coalesce (r.next_renewal_dt,date '9999-01-01')
order by s.ID
,s.stmt_dt
+----+------------+------------+
| ID | stmt_dt | renewal_dt |
+----+------------+------------+
| 1 | 2015-01-31 | |
| 1 | 2015-02-28 | 2015-02-28 |
| 1 | 2015-03-31 | 2015-02-28 |
| 1 | 2015-04-30 | 2015-04-30 |
| 1 | 2015-05-31 | 2015-04-30 |
| 2 | 2015-01-31 | |
| 2 | 2015-02-28 | 2015-02-28 |
| 2 | 2015-03-31 | 2015-02-28 |
| 2 | 2015-04-30 | 2015-02-28 |
| 2 | 2015-05-31 | 2015-02-28 |
| 3 | 2015-01-31 | 2015-01-31 |
| 3 | 2015-02-28 | 2015-01-31 |
| 3 | 2015-03-31 | 2015-01-31 |
| 3 | 2015-04-30 | 2015-01-31 |
| 3 | 2015-05-31 | 2015-01-31 |
| 4 | 2015-01-31 | |
| 4 | 2015-02-28 | |
| 4 | 2015-03-31 | |
| 4 | 2015-04-30 | |
| 4 | 2015-05-31 | |
+----+------------+------------+
union all + last_value
select ID
,dt as stmt_dt
,last_value (case when tab = 'R' then dt end ignore nulls) over
(
partition by id
order by dt
,case tab when 'R' then 1 else 2 end
) as renewal_dt
from ( select 'S',ID,stmt_dt from stmt
union all select 'R',ID,renewal_dt from renewal
) as t (tab,ID,dt)
qualify tab = 'S'
order by ID
,stmt_dt
+----+------------+------------+
| ID | stmt_dt | renewal_dt |
+----+------------+------------+
| 1 | 2015-01-31 | |
| 1 | 2015-02-28 | 2015-02-28 |
| 1 | 2015-03-31 | 2015-02-28 |
| 1 | 2015-04-30 | 2015-04-30 |
| 1 | 2015-05-31 | 2015-04-30 |
| 2 | 2015-01-31 | |
| 2 | 2015-02-28 | 2015-02-28 |
| 2 | 2015-03-31 | 2015-02-28 |
| 2 | 2015-04-30 | 2015-02-28 |
| 2 | 2015-05-31 | 2015-02-28 |
| 3 | 2015-01-31 | 2015-01-31 |
| 3 | 2015-02-28 | 2015-01-31 |
| 3 | 2015-03-31 | 2015-01-31 |
| 3 | 2015-04-30 | 2015-01-31 |
| 3 | 2015-05-31 | 2015-01-31 |
| 4 | 2015-01-31 | |
| 4 | 2015-02-28 | |
| 4 | 2015-03-31 | |
| 4 | 2015-04-30 | |
| 4 | 2015-05-31 | |
+----+------------+------------+
correlated subquery in the SELECT list
select s.ID
,s.stmt_dt
,(
select max (r.renewal_dt)
from renewal r
where r.ID = s.ID
and r.renewal_dt <= s.stmt_dt
) as renewal_dt
from stmt s
order by ID
,stmt_dt
+----+------------+------------+
| ID | stmt_dt | renewal_dt |
+----+------------+------------+
| 1 | 2015-01-31 | |
| 1 | 2015-02-28 | 2015-02-28 |
| 1 | 2015-03-31 | 2015-02-28 |
| 1 | 2015-04-30 | 2015-04-30 |
| 1 | 2015-05-31 | 2015-04-30 |
| 2 | 2015-01-31 | |
| 2 | 2015-02-28 | 2015-02-28 |
| 2 | 2015-03-31 | 2015-02-28 |
| 2 | 2015-04-30 | 2015-02-28 |
| 2 | 2015-05-31 | 2015-02-28 |
| 3 | 2015-01-31 | 2015-01-31 |
| 3 | 2015-02-28 | 2015-01-31 |
| 3 | 2015-03-31 | 2015-01-31 |
| 3 | 2015-04-30 | 2015-01-31 |
| 3 | 2015-05-31 | 2015-01-31 |
| 4 | 2015-01-31 | |
| 4 | 2015-02-28 | |
| 4 | 2015-03-31 | |
| 4 | 2015-04-30 | |
| 4 | 2015-05-31 | |
+----+------------+------------+
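Note that the IGNORE NULLS and QUALIFY syntax above assumes a dialect such as Teradata. If you happen to be on PostgreSQL, the correlated-subquery version works unchanged, or the same fill-down can be written with a lateral join; a sketch:
select s.ID,
       s.stmt_dt,
       r.renewal_dt
from stmt s
left join lateral (
    -- latest renewal on or before this statement date, if any
    select renewal_dt
    from renewal
    where ID = s.ID
      and renewal_dt <= s.stmt_dt
    order by renewal_dt desc
    limit 1
) r on true
order by s.ID, s.stmt_dt;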

Count rows each month of a year - SQL Server

I have a table "Product":
| ProductId | ProductCatId | Price | Date | Deadline |
--------------------------------------------------------------------
| 1 | 1 | 10.00 | 2016-01-01 | 2016-01-27 |
| 2 | 2 | 10.00 | 2016-02-01 | 2016-02-27 |
| 3 | 3 | 10.00 | 2016-03-01 | 2016-03-27 |
| 4 | 1 | 10.00 | 2016-04-01 | 2016-04-27 |
| 5 | 3 | 10.00 | 2016-05-01 | 2016-05-27 |
| 6 | 3 | 10.00 | 2016-06-01 | 2016-06-27 |
| 7 | 1 | 20.00 | 2016-01-01 | 2016-01-27 |
| 8 | 2 | 30.00 | 2016-02-01 | 2016-02-27 |
| 9 | 1 | 40.00 | 2016-03-01 | 2016-03-27 |
| 10 | 4 | 15.00 | 2016-04-01 | 2016-04-27 |
| 11 | 1 | 25.00 | 2016-05-01 | 2016-05-27 |
| 12 | 5 | 55.00 | 2016-06-01 | 2016-06-27 |
| 13 | 5 | 55.00 | 2016-06-01 | 2016-01-27 |
| 14 | 5 | 55.00 | 2016-06-01 | 2016-02-27 |
| 15 | 5 | 55.00 | 2016-06-01 | 2016-03-27 |
I want to create a stored procedure that counts the rows of Product for each month, with the condition Year = CurrentYear, like:
| Month| SumProducts | SumExpiredProducts |
-------------------------------------------
| 1 | 3 | 3 |
| 2 | 3 | 3 |
| 3 | 3 | 3 |
| 4 | 2 | 2 |
| 5 | 2 | 2 |
| 6 | 2 | 2 |
What should I do?
You can use a query like the following:
SELECT MONTH([Date]),
COUNT(*) AS SumProducts ,
COUNT(CASE WHEN [Date] > Deadline THEN 1 END) AS SumExpiredProducts
FROM mytable
WHERE YEAR([Date]) = YEAR(GETDATE())
GROUP BY MONTH([Date])
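Since you asked for a stored procedure, you can simply wrap that query in one. A sketch for SQL Server; the procedure name is made up, and it assumes your table is named Product (the query above used a mytable placeholder):
CREATE PROCEDURE dbo.GetMonthlyProductCounts
AS
BEGIN
    SET NOCOUNT ON;

    SELECT MONTH([Date]) AS [Month],
           COUNT(*) AS SumProducts,
           COUNT(CASE WHEN [Date] > Deadline THEN 1 END) AS SumExpiredProducts
    FROM Product
    WHERE YEAR([Date]) = YEAR(GETDATE())
    GROUP BY MONTH([Date])
    ORDER BY MONTH([Date]);
END;
GO

-- call it with:
EXEC dbo.GetMonthlyProductCounts;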