Group by rows which are in sequence - sql

Suppose I have a table like this:
PASSENGER CITY DATE
43 NEW YORK 1-Jan-21
44 CHICAGO 4-Jan-21
43 NEW YORK 2-Jan-21
43 NEW YORK 3-Jan-21
44 ROME 5-Jan-21
43 LONDON 4-Jan-21
44 CHICAGO 6-Jan-21
44 CHICAGO 7-Jan-21
How would I group the Passenger and City columns in sequence to get a result like the one below?
PASSENGER CITY COUNT
43 NEW YORK 3
44 CHICAGO 1
44 ROME 1
43 LONDON 1
44 CHICAGO 2

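One way to deal with such a gaps-and-islands problem is to flag every row that starts a new island (gap = 1 unless the previous row for the same passenger and city is exactly one day earlier) and turn those flags into a running sum per passenger, which serves as a ranking.
Then group on that ranking as well.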
SELECT PASSENGER, CITY
, COUNT(*) AS "Count"
-- , MIN("DATE") AS StartDate
-- , MAX("DATE") AS EndDate
FROM (
SELECT q1.*
, SUM(gap) OVER (PARTITION BY PASSENGER ORDER BY "DATE") as Rnk
FROM (
SELECT PASSENGER, CITY, "DATE"
, CASE
WHEN 1 = TRUNC("DATE")
- TRUNC(LAG("DATE")
OVER (PARTITION BY PASSENGER, CITY ORDER BY "DATE"))
THEN 0 ELSE 1 END as gap
FROM table_name t
) q1
) q2
GROUP BY PASSENGER, CITY, Rnk
ORDER BY MIN("DATE"), PASSENGER
PASSENGER CITY Count
43 NEW YORK 3
43 LONDON 1
44 CHICAGO 1
44 ROME 1
44 CHICAGO 2
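To make the flag-and-running-sum idea easy to try without an Oracle instance, here is a minimal sketch that runs the same logic in SQLite through Python's standard sqlite3 module. It is an illustration, not the answer's exact query: the table name trips, the column name travel_date and the use of julianday() for the one-day test are assumptions of this sketch, and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Sample rows from the question, with dates as ISO strings.
ROWS = [
    (43, "NEW YORK", "2021-01-01"),
    (44, "CHICAGO", "2021-01-04"),
    (43, "NEW YORK", "2021-01-02"),
    (43, "NEW YORK", "2021-01-03"),
    (44, "ROME", "2021-01-05"),
    (43, "LONDON", "2021-01-04"),
    (44, "CHICAGO", "2021-01-06"),
    (44, "CHICAGO", "2021-01-07"),
]

QUERY = """
WITH flagged AS (
    -- gap = 0 only when the previous row for the same passenger and city
    -- is exactly one day earlier; otherwise the row starts a new island
    SELECT passenger, city, travel_date,
           CASE WHEN julianday(travel_date)
                     - julianday(LAG(travel_date) OVER (
                           PARTITION BY passenger, city
                           ORDER BY travel_date)) = 1
                THEN 0 ELSE 1 END AS gap
    FROM trips
),
ranked AS (
    -- running sum of the flags numbers the islands per passenger
    SELECT flagged.*,
           SUM(gap) OVER (PARTITION BY passenger
                          ORDER BY travel_date) AS rnk
    FROM flagged
)
SELECT passenger, city, COUNT(*) AS cnt
FROM ranked
GROUP BY passenger, city, rnk
ORDER BY MIN(travel_date), passenger
"""


def count_islands(rows):
    """Load the rows into an in-memory table and return (passenger, city, cnt)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (passenger INTEGER, city TEXT, travel_date TEXT)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in count_islands(ROWS):
        print(row)
```

Selecting from the ranked step instead of the final GROUP BY shows the gap and rnk values per row, which is a convenient way to check where each island starts.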

From Oracle 12c onwards, you can use MATCH_RECOGNIZE:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY passenger
ORDER BY "DATE"
MEASURES
FIRST(city) AS city,
COUNT(*) AS count
PATTERN (same_city+)
DEFINE
same_city AS FIRST(city) = city
);
Which, for the sample data:
CREATE TABLE table_name (PASSENGER, CITY, "DATE") AS
SELECT 43, 'NEW YORK', DATE '2021-01-01' FROM DUAL UNION ALL
SELECT 44, 'CHICAGO', DATE '2021-01-04' FROM DUAL UNION ALL
SELECT 43, 'NEW YORK', DATE '2021-01-02' FROM DUAL UNION ALL
SELECT 43, 'NEW YORK', DATE '2021-01-03' FROM DUAL UNION ALL
SELECT 44, 'ROME', DATE '2021-01-05' FROM DUAL UNION ALL
SELECT 43, 'LONDON', DATE '2021-01-04' FROM DUAL UNION ALL
SELECT 44, 'CHICAGO', DATE '2021-01-06' FROM DUAL UNION ALL
SELECT 44, 'CHICAGO', DATE '2021-01-07' FROM DUAL
Outputs:
PASSENGER CITY COUNT
43 NEW YORK 3
43 LONDON 1
44 CHICAGO 1
44 ROME 1
44 CHICAGO 2
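For readers who want to see what the pattern same_city+ amounts to, the following is a rough Python sketch of the same grouping: sort by passenger and date, then collapse each run of consecutive rows that share a city. It is only a conceptual equivalent, not how Oracle evaluates MATCH_RECOGNIZE, and the function name city_runs is made up for this example. Like the pattern, it looks only at row order, so it does not check that the dates are one day apart.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (passenger, city, ISO date).
ROWS = [
    (43, "NEW YORK", "2021-01-01"),
    (44, "CHICAGO", "2021-01-04"),
    (43, "NEW YORK", "2021-01-02"),
    (43, "NEW YORK", "2021-01-03"),
    (44, "ROME", "2021-01-05"),
    (43, "LONDON", "2021-01-04"),
    (44, "CHICAGO", "2021-01-06"),
    (44, "CHICAGO", "2021-01-07"),
]


def city_runs(rows):
    """Count consecutive rows with the same city, per passenger, in date order."""
    # Equivalent of PARTITION BY passenger ORDER BY "DATE".
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    runs = []
    for passenger, passenger_rows in groupby(ordered, key=itemgetter(0)):
        # groupby starts a new group whenever the city changes,
        # which mirrors one match of the pattern same_city+.
        for city, run in groupby(passenger_rows, key=itemgetter(1)):
            runs.append((passenger, city, len(list(run))))
    return runs


if __name__ == "__main__":
    for run in city_runs(ROWS):
        print(run)
```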
If you have ordered the input result set (note: tables should be considered to be unordered) and want to maintain that order, then:
SELECT *
FROM (SELECT t.*, ROWNUM AS rn FROM table_name t)
MATCH_RECOGNIZE (
PARTITION BY passenger
ORDER BY RN
MEASURES
FIRST(rn) AS rn,
FIRST("DATE") AS "DATE",
FIRST(city) AS city,
COUNT(*) AS count
PATTERN (same_city+)
DEFINE
same_city AS FIRST(city) = city
)
ORDER BY rn
Outputs:
PASSENGER RN DATE CITY COUNT
43 1 01-JAN-21 NEW YORK 3
44 2 04-JAN-21 CHICAGO 1
44 5 05-JAN-21 ROME 1
43 6 04-JAN-21 LONDON 1
44 7 06-JAN-21 CHICAGO 2

SQL Implementing Forward Fill logic

I have a dataset within a date range which has three columns, Product_type, date and metric. For a given product_type, data is not available for all days. For the missing rows, we would like to do a forward date fill for next n days using the last value of the metric.
Product_type date metric
A 2019-10-01 10
A 2019-10-02 12
A 2019-10-03 15
A 2019-10-04 5
A 2019-10-05 5
A 2019-10-06 5
A 2019-10-16 12
A 2019-10-17 23
A 2019-10-18 34
Here, the data from 2019-10-04 to 2019-10-06 has been forward filled. There might be bigger gaps in the dates, but we only want to fill the first n days.
Here, n=2, so rows 5 and 6 have been forward filled.
I am not sure how to implement this logic in SQL.
Here's one option. Read comments within code.
Sample data:
WITH
  test (product_type, datum, metric)
  AS
    (SELECT 'A', DATE '2019-10-01', 10 FROM DUAL
     UNION ALL
     SELECT 'A', DATE '2019-10-02', 12 FROM DUAL
     UNION ALL
     SELECT 'A', DATE '2019-10-03', 15 FROM DUAL
     UNION ALL
     SELECT 'A', DATE '2019-10-04', 5 FROM DUAL
     UNION ALL
     SELECT 'A', DATE '2019-10-16', 12 FROM DUAL
     UNION ALL
     SELECT 'A', DATE '2019-10-18', 23 FROM DUAL),
Query begins here:
  temp
  AS
    -- CB_FWD_FILL = 1 if difference between two consecutive dates is larger than 1 day
    -- (i.e. that's the gap to be forward filled)
    (SELECT product_type,
            datum,
            metric,
            LEAD (datum) OVER (PARTITION BY product_type ORDER BY datum)
              next_datum,
            CASE
              WHEN LEAD (datum)
                     OVER (PARTITION BY product_type ORDER BY datum)
                   - datum > 1
              THEN 1
              ELSE 0
            END
              cb_fwd_fill
       FROM test)
-- original data from the table
SELECT product_type, datum, metric FROM test
UNION ALL
-- DATUM is the last date which is OK; add LEVEL pseudocolumn to it to fill the gap
-- with PAR_N number of rows
SELECT product_type, datum + LEVEL, metric
  FROM (SELECT product_type, datum, metric
          FROM (-- RN = 1 means that that's the first gap in data set - that's the one
                -- that has to be forward filled
                SELECT product_type,
                       datum,
                       metric,
                       ROW_NUMBER ()
                         OVER (PARTITION BY product_type ORDER BY datum) rn
                  FROM temp
                 WHERE cb_fwd_fill = 1)
         WHERE rn = 1)
CONNECT BY LEVEL <= &par_n
ORDER BY datum;
Result:
Enter value for par_n: 2
PRODUCT_TYPE DATUM METRIC
--------------- ---------- ----------
A 2019-10-01 10
A 2019-10-02 12
A 2019-10-03 15
A 2019-10-04 5
A 2019-10-05 5 --> newly added
A 2019-10-06 5 --> rows
A 2019-10-16 12
A 2019-10-18 23
8 rows selected.
SQL>
Another solution:
WITH test (product_type, datum, metric) AS
(
SELECT 'A', DATE '2019-10-01', 10 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-02', 12 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-03', 15 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-04', 5 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-16', 12 FROM DUAL
UNION ALL
SELECT 'A', DATE '2019-10-18', 23 FROM DUAL
),
minmax(mindatum, maxdatum) AS (
SELECT MIN(datum), max(datum) from test
),
alldates (datum, product_type) AS
(
SELECT mindatum + level - 1, t.product_type FROM minmax,
(select distinct product_type from test) t
connect by mindatum + level - 1 <= (select maxdatum from minmax) -- "- 1" so that maxdatum itself is generated
),
grouped as (
select a.datum, a.product_type, t.metric,
count(t.product_type) over(partition by a.product_type order by a.datum) as grp
from alldates a
left join test t on t.datum = a.datum and t.product_type = a.product_type
),
final_table as (
select g.datum, g.product_type, g.grp, g.rn,
last_value(g.metric ignore nulls) over(partition by g.product_type order by g.datum) as metric
from (
select g.*, row_number() over(partition by product_type, grp order by datum) - 1 as rn
from grouped g
) g
)
select datum, product_type, metric
from final_table
where rn <= &par_n
order by datum
;
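The fill rule itself can be written down in a few lines of Python, which may help when checking either query against expected results. This is a sketch under one stated assumption: like the second query, it fills up to n days after every gap, not only the first gap as the first query does. The function name forward_fill is invented for this example.

```python
from datetime import date, timedelta

# Sample data used in the answers: (product_type, date, metric).
ROWS = [
    ("A", date(2019, 10, 1), 10),
    ("A", date(2019, 10, 2), 12),
    ("A", date(2019, 10, 3), 15),
    ("A", date(2019, 10, 4), 5),
    ("A", date(2019, 10, 16), 12),
    ("A", date(2019, 10, 18), 23),
]


def forward_fill(rows, n):
    """Return rows plus up to n forward-filled days after each gap."""
    # Group the observations per product, in date order.
    by_product = {}
    for product, day, metric in sorted(rows):
        by_product.setdefault(product, []).append((day, metric))

    filled = []
    for product, series in by_product.items():
        for i, (day, metric) in enumerate(series):
            filled.append((product, day, metric))
            if i + 1 < len(series):
                # Number of days with no data before the next observation.
                missing = (series[i + 1][0] - day).days - 1
                # Repeat the last known metric for at most n of those days.
                for offset in range(1, min(n, missing) + 1):
                    filled.append((product, day + timedelta(days=offset), metric))
    return filled


if __name__ == "__main__":
    for row in forward_fill(ROWS, 2):
        print(row)
```

With n = 2 this adds 2019-10-05 and 2019-10-06 with metric 5, and also 2019-10-17 with metric 12, because the one-day hole between the 16th and the 18th counts as a gap too.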

Generating random student data

I'm trying to create a process that populates a student table. I want to be able to create a different combination of a student's first/last name and dob every time I run the query.
The code below appears to work fine, as it generates 5 names. My first question is: can this be modified to generate N rows, say 20, for example? I tried using CONNECT BY LEVEL <= 20, but that gives me a syntax error.
Secondly, I know the random_date function works:
Select random_date(DATE '2001-01-01', DATE '2001-12-31') from dual
17-NOV-2001 08:31:16
But I can't seem to incorporate it into my SQL.
Any help would be greatly appreciated. Thanks in advance for your expertise and time.
ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'DD-MON-YYYY HH24:MI:SS.FF';
ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';
CREATE OR REPLACE FUNCTION random_date(
p_from IN DATE,
p_to IN DATE
) RETURN DATE
IS
BEGIN
RETURN p_from + DBMS_RANDOM.VALUE() * (p_to - p_from + 1 );
END random_date;
/
CREATE TABLE students (
student_id number(*,0),
first_name VARCHAR(25) NOT NULL,
last_name VARCHAR(25) NOT NULL,
dob DATE,
constraint teacher_pk primary key (student_id));
WITH raw_names (first_name, last_name) AS
(
SELECT 'Faith', 'Andrews' FROM dual UNION ALL
SELECT 'Tom', 'Thornton' FROM dual UNION ALL
SELECT 'Anna', 'Smith' FROM dual UNION ALL
SELECT 'Lisa', 'Jones' FROM dual UNION ALL
SELECT 'Andy', 'Beirs' FROM dual
)
, numbered_names AS
(
SELECT first_name, last_name
, ROW_NUMBER () OVER (ORDER BY dbms_random.value (0, 1)) AS first_num
, ROW_NUMBER () OVER (ORDER BY dbms_random.value (0, 2)) AS last_num
FROM raw_names
)
SELECT fn.first_num AS student_id
, fn.first_name
, ln.last_name
FROM numbered_names fn
JOIN numbered_names ln ON ln.last_num = fn.first_num
ORDER BY student_id
;
I can't debug your code as you didn't post it (the version that raised the syntax error and wouldn't accept the function).
Anyway, here's what you might do:
line #14: the function call
lines #15 - 19: show how to create the desired number of rows (must be a multiple of the number of rows in raw_names)
SQL> WITH raw_names (first_name, last_name) AS
2 (
3 SELECT 'Faith', 'Andrews' FROM dual UNION ALL
4 SELECT 'Tom', 'Thornton' FROM dual UNION ALL
5 SELECT 'Anna', 'Smith' FROM dual UNION ALL
6 SELECT 'Lisa', 'Jones' FROM dual UNION ALL
7 SELECT 'Andy', 'Beirs' FROM dual
8 )
9 , numbered_names AS
10 (
11 SELECT first_name, last_name
12 , ROW_NUMBER () OVER (ORDER BY dbms_random.value (0, 1)) AS first_num
13 , ROW_NUMBER () OVER (ORDER BY dbms_random.value (0, 2)) AS last_num
14 , random_date (date '2001-01-01', date '2001-12-31') datum
15 FROM raw_names cross join
16 table(cast(multiset(select level from dual
17 connect by level <= (select &n / count(*) from raw_names))
18 as sys.odcinumberlist))
19 )
20 SELECT fn.first_num AS student_id
21 , fn.first_name
22 , ln.last_name
23 , ln.datum
24 FROM numbered_names fn
25 JOIN numbered_names ln ON ln.last_num = fn.first_num
26 ORDER BY student_id
27 ;
Enter value for n: 20
Result:
STUDENT_ID FIRST LAST_NAM DATUM
---------- ----- -------- --------------------
1 Tom Andrews 12-NOV-2001 14:42:05
2 Faith Jones 06-MAR-2001 05:14:07
3 Tom Thornton 04-SEP-2001 16:28:25
4 Faith Beirs 29-MAR-2001 06:11:35
5 Andy Thornton 18-MAY-2001 17:32:07
6 Andy Jones 19-JAN-2001 19:39:15
7 Anna Jones 17-JAN-2001 02:51:39
8 Andy Andrews 31-DEC-2001 15:36:44
9 Faith Beirs 22-JUN-2001 05:34:22
10 Lisa Thornton 29-JUL-2001 07:00:15
11 Lisa Smith 31-JAN-2001 04:17:04
12 Anna Andrews 07-FEB-2001 09:02:21
13 Lisa Thornton 31-DEC-2001 20:18:06
14 Lisa Smith 24-SEP-2001 04:10:21
15 Tom Andrews 30-JUN-2001 12:01:04
16 Faith Jones 16-AUG-2001 19:56:54
17 Anna Beirs 23-NOV-2001 11:01:03
18 Anna Beirs 23-NOV-2001 08:33:39
19 Andy Smith 24-SEP-2001 21:27:00
20 Tom Smith 24-SEP-2001 22:07:39
20 rows selected.
SQL>
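As a cross-check for the generated data, here is a small Python sketch of the same idea: N rows, each with a random first name, last name and date of birth. It is not a translation of the query above. It picks names independently for every row, so N does not have to be a multiple of the number of names, and the function names random_date and make_students as well as the seed parameter are assumptions of this sketch.

```python
import random
from datetime import datetime, timedelta

FIRST_NAMES = ["Faith", "Tom", "Anna", "Lisa", "Andy"]
LAST_NAMES = ["Andrews", "Thornton", "Smith", "Jones", "Beirs"]


def random_date(rng, start, end):
    """Random datetime in [start, end + 1 day), like the PL/SQL random_date."""
    span_days = (end - start).days + 1
    return start + timedelta(days=rng.random() * span_days)


def make_students(n, seed=None):
    """Return n tuples (student_id, first_name, last_name, dob)."""
    rng = random.Random(seed)  # pass a seed for repeatable output
    start, end = datetime(2001, 1, 1), datetime(2001, 12, 31)
    return [
        (
            student_id,
            rng.choice(FIRST_NAMES),
            rng.choice(LAST_NAMES),
            random_date(rng, start, end),
        )
        for student_id in range(1, n + 1)
    ]


if __name__ == "__main__":
    for student in make_students(20):
        print(student)
```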

How to find the row with the highest value cell based on another column from within a group of values?

I have this table:
Site_ID Volume RPT_Date RPT_Hour
1 10 01/01/2021 1
1 7 01/01/2021 2
1 13 01/01/2021 3
1 11 01/16/2021 1
1 3 01/16/2021 2
1 5 01/16/2021 3
2 9 01/01/2021 1
2 24 01/01/2021 2
2 16 01/01/2021 3
2 18 01/16/2021 1
2 7 01/16/2021 2
2 1 01/16/2021 3
I need to select the RPT_Hour with the highest Volume for each set of dates
Needed Output:
Site_ID Volume RPT_Date RPT_Hour
1 13 01/01/2021 1
1 11 01/16/2021 1
2 24 01/01/2021 2
2 18 01/16/2021 1
SELECT site_id, volume, rpt_date, rpt_hour
FROM (SELECT t.*,
ROW_NUMBER()
OVER (PARTITION BY site_id, rpt_date ORDER BY volume DESC) AS rn
FROM MyTable) t
WHERE rn = 1;
I cannot figure out how to group the table into like date groups. If I could do that, I think the rn = 1 will return the highest volume row for each date.
The way I see it, your query is OK (but rpt_hour in the desired output is not: for site 1 on 01/01/2021, the row with volume 13 has rpt_hour 3, not 1).
with test (site_id, volume, rpt_date, rpt_hour) as
  (select 1, 10, date '2021-01-01', 1 from dual union all
   select 1, 7, date '2021-01-01', 2 from dual union all
   select 1, 13, date '2021-01-01', 3 from dual union all
   select 1, 11, date '2021-01-16', 1 from dual union all
   select 1, 3, date '2021-01-16', 2 from dual union all
   select 1, 5, date '2021-01-16', 3 from dual union all
   --
   select 2, 9, date '2021-01-01', 1 from dual union all
   select 2, 24, date '2021-01-01', 3 from dual union all
   select 2, 16, date '2021-01-01', 3 from dual union all
   select 2, 18, date '2021-01-16', 1 from dual union all
   select 2, 7, date '2021-01-16', 2 from dual union all
   select 2, 1, date '2021-01-16', 3 from dual
  ),
temp as
  (select t.*,
          row_number() over (partition by site_id, rpt_date order by volume desc) rn
   from test t
  )
select site_id, volume, rpt_date, rpt_hour
from temp
where rn = 1;
SITE_ID VOLUME RPT_DATE RPT_HOUR
---------- ---------- ---------- ----------
1 13 01/01/2021 3
1 11 01/16/2021 1
2 24 01/01/2021 3
2 18 01/16/2021 1
SQL>
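The same ROW_NUMBER approach can be tried outside Oracle. The sketch below runs it in SQLite through Python's standard sqlite3 module, using the rows exactly as they appear in the question's table. The table name readings is an assumption of this example, and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Rows from the question: (site_id, volume, rpt_date, rpt_hour).
ROWS = [
    (1, 10, "2021-01-01", 1), (1, 7, "2021-01-01", 2), (1, 13, "2021-01-01", 3),
    (1, 11, "2021-01-16", 1), (1, 3, "2021-01-16", 2), (1, 5, "2021-01-16", 3),
    (2, 9, "2021-01-01", 1), (2, 24, "2021-01-01", 2), (2, 16, "2021-01-01", 3),
    (2, 18, "2021-01-16", 1), (2, 7, "2021-01-16", 2), (2, 1, "2021-01-16", 3),
]

QUERY = """
SELECT site_id, volume, rpt_date, rpt_hour
FROM (
    -- number the rows of each (site, date) group, highest volume first
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY site_id, rpt_date
                              ORDER BY volume DESC) AS rn
    FROM readings AS t
)
WHERE rn = 1
ORDER BY site_id, rpt_date
"""


def busiest_hours(rows):
    """Return the highest-volume row per site and date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings "
        "(site_id INTEGER, volume INTEGER, rpt_date TEXT, rpt_hour INTEGER)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in busiest_hours(ROWS):
        print(row)
```

If two hours tie for the highest volume, ROW_NUMBER keeps only one of them, chosen arbitrarily; RANK in its place would keep all tied rows.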
Another option is the MAX(..) KEEP (DENSE_RANK ..) OVER (PARTITION BY ..) analytic function, which needs no subquery:
SELECT DISTINCT
site_id,
MAX(volume) KEEP (DENSE_RANK FIRST ORDER BY volume DESC) OVER
(PARTITION BY site_id, rpt_date) AS volume,
rpt_date,
MAX(rpt_hour) KEEP (DENSE_RANK FIRST ORDER BY volume DESC) OVER
(PARTITION BY site_id, rpt_date) AS rpt_hour
FROM t
GROUP BY site_id, rpt_date, volume, rpt_hour
ORDER BY site_id, rpt_date

List the branch that monthly pays the most in salaries

I have this table. The expected output should be B003, since it pays 54,000 in total.
STAFF SALARY BRAN
SL21 30000 B005
SG37 12000 B003
SG14 18000 B003
SA9 9000 B007
SG5 24000 B003
SL41 9000 B005
So far I only have this subquery, which isn't working how I expected.
SELECT BRANCHNO
FROM STAFF
WHERE (SALARY) IN (SELECT MAX(SUM(SALARY))
FROM STAFF
GROUP BY BRANCHNO);
This works but I want a subquery that returns the branchno
SELECT MAX(SUM(SALARY))
FROM STAFF
GROUP BY BRANCHNO;
select BRANCHNO, max(sum_sal)
from (SELECT BRANCHNO, SUM(SALARY) sum_sal
FROM STAFF
GROUP BY BRANCHNO) q1
group by BRANCHNO ;
The column used to group the rows can be displayed. So, add BRANCHNO to your select clause.
One option is to use rank analytic function which ranks branches by sum of their salaries in descending order; you'd then return the one(s) that rank as the highest (rnk = 1).
Sample data:
with staff (staff, salary, bran) as
  (select 'SL21', 30000, 'B005' from dual union all
   select 'SG37', 12000, 'B003' from dual union all
   select 'SG14', 18000, 'B003' from dual union all
   select 'SA9' , 9000, 'B007' from dual union all
   select 'SG5' , 24000, 'B003' from dual union all
   select 'SL41', 9000, 'B005' from dual
  )
Query:
select bran
from (select bran, rank() over (order by sum(salary) desc) rnk
      from staff
      group by bran
     )
where rnk = 1;
BRAN
----
B003
SQL>
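For anyone without Oracle at hand, the rank-by-total idea can be checked in SQLite through Python's standard sqlite3 module. This is a sketch rather than the query above: it computes the totals in a separate CTE (named totals here, an assumption of this example) before ranking them, and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Rows from the question: (staff, salary, bran).
ROWS = [
    ("SL21", 30000, "B005"),
    ("SG37", 12000, "B003"),
    ("SG14", 18000, "B003"),
    ("SA9", 9000, "B007"),
    ("SG5", 24000, "B003"),
    ("SL41", 9000, "B005"),
]

QUERY = """
WITH totals AS (
    -- one row per branch with its salary total
    SELECT bran, SUM(salary) AS total
    FROM staff
    GROUP BY bran
)
SELECT bran, total
FROM (
    -- rank 1 is the highest total; ties share the same rank
    SELECT bran, total, RANK() OVER (ORDER BY total DESC) AS rnk
    FROM totals
)
WHERE rnk = 1
"""


def top_branches(rows):
    """Return (bran, total) for the branch or branches paying the most."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (staff TEXT, salary INTEGER, bran TEXT)")
    conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(top_branches(ROWS))
```

Because RANK gives tied totals the same rank, two branches paying the same top amount would both be returned.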

How to do a partitioned outer join in BigQuery

I would like to implement the partitioned outer join in BigQuery. To give a concrete example, I'd like to achieve the partitioned outer join as the accepted answer here: https://dba.stackexchange.com/questions/227069/what-is-a-partitioned-outer-join
I understand there are a lot of discussions about this topic, but I can't make it work under BigQuery. I added partition by date after the left table following the same syntax in the answer as follows:
select * from (
select '2019-01-17' as date, 'London' as location, 11 as qty
union all
select '2019-01-15' as date, 'London' as location, 10 as qty
union all
select '2019-01-16' as date, 'Paris' as location, 20 as qty
union all
select '2019-01-17' as date, 'Boston' as location, 31 as qty
union all
select '2019-01-16' as date, 'Boston' as location, 30 as qty
) as sales partition by (date)
right join
(
select 'London' as location
union all
select 'Paris' as location
union all
select 'Boston' as location
)
as loc
using (location)
The target result I'm looking for is:
date qty location
15-JAN-19 NULL Boston
15-JAN-19 10 London
15-JAN-19 NULL Paris
16-JAN-19 30 Boston
16-JAN-19 NULL London
16-JAN-19 20 Paris
17-JAN-19 31 Boston
17-JAN-19 11 London
17-JAN-19 NULL Paris
But I got the following error: Syntax error: Unexpected keyword PARTITION at [11:12]
How can I implement it in BigQuery?
Below is for BigQuery Standard SQL
#standardSQL
SELECT `date`, qty, location
FROM (SELECT DISTINCT `date` FROM sales)
CROSS JOIN loc
LEFT JOIN sales
USING (`date`, location)
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH sales AS (
SELECT '2019-01-17' AS `date`, 'London' AS location, 11 AS qty UNION ALL
SELECT '2019-01-15', 'London', 10 UNION ALL
SELECT '2019-01-16', 'Paris', 20 UNION ALL
SELECT '2019-01-17', 'Boston', 31 UNION ALL
SELECT '2019-01-16', 'Boston', 30
), loc AS (
SELECT 'London' AS location UNION ALL
SELECT 'Paris' UNION ALL
SELECT 'Boston'
)
SELECT `date`, qty, location
FROM (SELECT DISTINCT `date` FROM sales)
CROSS JOIN loc
LEFT JOIN sales
USING (`date`, location)
-- ORDER BY `date`, location
with below result
Row date qty location
1 2019-01-15 null Boston
2 2019-01-15 10 London
3 2019-01-15 null Paris
4 2019-01-16 30 Boston
5 2019-01-16 null London
6 2019-01-16 20 Paris
7 2019-01-17 31 Boston
8 2019-01-17 11 London
9 2019-01-17 null Paris
If you need the dates in a format like 15-Jan-19, use the query below (wrapping the FORMAT_DATE expression in UPPER() would give 15-JAN-19):
#standardSQL
SELECT FORMAT_DATE('%d-%b-%y', CAST(`date` AS DATE)) AS `date`, qty, location
FROM (SELECT DISTINCT `date` FROM sales)
CROSS JOIN loc
LEFT JOIN sales
USING (`date`, location)
so result will be
Row date qty location
1 15-Jan-19 null Boston
2 15-Jan-19 10 London
3 15-Jan-19 null Paris
4 16-Jan-19 30 Boston
5 16-Jan-19 null London
6 16-Jan-19 20 Paris
7 17-Jan-19 31 Boston
8 17-Jan-19 11 London
9 17-Jan-19 null Paris
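The technique used in this answer (build every date and location combination with a cross join, then left join the facts onto it) is not specific to BigQuery. Below is a minimal sketch that runs it in SQLite through Python's standard sqlite3 module so it can be tried locally. The column name sale_date and the table aliases d, l and s are assumptions of this example.

```python
import sqlite3

SALES = [
    ("2019-01-17", "London", 11),
    ("2019-01-15", "London", 10),
    ("2019-01-16", "Paris", 20),
    ("2019-01-17", "Boston", 31),
    ("2019-01-16", "Boston", 30),
]
LOCATIONS = [("London",), ("Paris",), ("Boston",)]

QUERY = """
SELECT d.sale_date, s.qty, l.location
FROM (SELECT DISTINCT sale_date FROM sales) AS d  -- every date that occurs
CROSS JOIN loc AS l                               -- paired with every location
LEFT JOIN sales AS s                              -- qty stays NULL when no sale
       ON s.sale_date = d.sale_date
      AND s.location = l.location
ORDER BY d.sale_date, l.location
"""


def densified_sales(sales, locations):
    """Return one (sale_date, qty, location) row per date and location."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_date TEXT, location TEXT, qty INTEGER)")
    conn.execute("CREATE TABLE loc (location TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
    conn.executemany("INSERT INTO loc VALUES (?)", locations)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in densified_sales(SALES, LOCATIONS):
        print(row)
```

SQL NULL comes back as None in Python, so the missing combinations appear as rows such as ('2019-01-15', None, 'Boston').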