How to build an intercalated ranking from two tables using BigQuery

I have two tables in BigQuery with records ordered by a ranking. Given a ratio of integers, I want to be able to join both tables, keeping the order of the ranking and the proportions of the ratio of integers.
For example:
Table A:
Name    | Ranking A
Kevin   | 1
Jack    | 2
Kate    | 3
Randall | 4
Beck    | 5
Table B:
Name    | Ranking B
William | 1
Laurel  | 2
Sophie  | 3
Tess    | 4
Deja    | 5
Toby    | 6
Nick    | 7
Given a ratio 2:3 where 2 corresponds with Table A, and 3 corresponds with Table B, the expected result would be:
Name    | Ranking A | Ranking B | Final Rank
Kevin   | 1         |           | 1
Jack    | 2         |           | 2
William |           | 1         | 3
Laurel  |           | 2         | 4
Sophie  |           | 3         | 5
Kate    | 3         |           | 6
Randall | 4         |           | 7
Tess    |           | 4         | 8
Deja    |           | 5         | 9
Toby    |           | 6         | 10
Beck    | 5         |           | 11
Nick    |           | 7         | 12
Any ideas?

You can solve this problem with a bit of arithmetic. In each table, compute a running sum over the ranking that advances by 1 inside a block and jumps over the other table's block at the start of each new block: table A (blocks of 2) skips 3 positions, and table B (blocks of 3) skips 2. Basically you're building two gapped running sums, where the gaps in one are filled by the values of the other.
SELECT Name,
SUM(CASE WHEN MOD(RankingA,2) = 1 THEN 4 ELSE 1 END) OVER(ORDER BY RankingA)-3 AS rn
FROM tableA
UNION ALL
SELECT Name,
SUM(CASE WHEN MOD(RankingB,3) = 1 THEN 3 ELSE 1 END) OVER(ORDER BY RankingB) AS rn
FROM tableB
ORDER BY rn
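Then you just combine both result sets with UNION ALL and ORDER BY the generated rn. Note that rn is a sort key rather than a dense rank: with the sample data Nick gets 13 instead of 12, because table A has no sixth row to fill position 12.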

Another approach would be:
SELECT name,
IF(tbl = 1, rank, NULL) AS rankingA,
IF(tbl = 2, rank, NULL) AS rankingB,
ROW_NUMBER() OVER (ORDER BY rank_grp, tbl, rank) final_rank
FROM (
SELECT name, rankingA AS rank, DIV(rankingA - 1, 2) AS rank_grp, 1 AS tbl FROM tableA
UNION ALL
SELECT name, rankingB, DIV(rankingB - 1, 3), 2 FROM tableB
);
+---------+----------+----------+------------+
| name | rankingA | rankingB | final_rank |
+---------+----------+----------+------------+
| Kevin | 1 | | 1 |
| Jack | 2 | | 2 |
| William | | 1 | 3 |
| Laurel | | 2 | 4 |
| Sophie | | 3 | 5 |
| Kate | 3 | | 6 |
| Randall | 4 | | 7 |
| Tess | | 4 | 8 |
| Deja | | 5 | 9 |
| Toby | | 6 | 10 |
| Beck | 5 | | 11 |
| Nick | | 7 | 12 |
+---------+----------+----------+------------+
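The grouping idea in this query can be checked outside BigQuery. The following is a minimal Python sketch, not the answer's code: the function name `interleave` and its parameters are made up for illustration. It sorts rows by the same key as the query, (rank_grp, tbl, rank), where rank_grp is the integer division of (rank - 1) by the table's share of the ratio.

```python
def interleave(table_a, table_b, ratio_a, ratio_b):
    """Return (name, ranking_a, ranking_b, final_rank) rows.

    Mirrors ORDER BY rank_grp, tbl, rank from the query above:
    rank_grp = (rank - 1) // ratio, tbl = 1 for A and 2 for B.
    """
    rows = []
    for rank, name in enumerate(table_a, start=1):
        rows.append(((rank - 1) // ratio_a, 1, rank, name))
    for rank, name in enumerate(table_b, start=1):
        rows.append(((rank - 1) // ratio_b, 2, rank, name))
    rows.sort(key=lambda r: (r[0], r[1], r[2]))

    result = []
    for final_rank, (_, tbl, rank, name) in enumerate(rows, start=1):
        ranking_a = rank if tbl == 1 else None
        ranking_b = rank if tbl == 2 else None
        result.append((name, ranking_a, ranking_b, final_rank))
    return result


table_a = ["Kevin", "Jack", "Kate", "Randall", "Beck"]
table_b = ["William", "Laurel", "Sophie", "Tess", "Deja", "Toby", "Nick"]
ranking = interleave(table_a, table_b, 2, 3)
for row in ranking:
    print(row)
```

Because the final rank comes from numbering the sorted rows, it stays dense (1 to 12) even when one table runs out of rows before the other.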

You may try the following:
select Name, RankingA, RankingB,
rank() over (order by NewRank) FinalRank
from
(
select Name, cast(RankingA as string) as RankingA , '' as RankingB,
RankingA + floor((RankingA-1)/2)*3 as NewRank
from TableA
union all
select Name, '', cast(RankingB as string),
RankingB + ceiling((RankingB)/3)*2
from TableB
) T
RankingA + floor((RankingA-1)/2)*3: shifts each two consecutive RankingA values (1,2 and 3,4 and 5,6 ...) by n * 3 where n starts from 0.
RankingB + ceiling((RankingB)/3)*2: shifts each three consecutive RankingB values (1,2,3 and 4,5,6 ...) by n * 2 where n starts from 1.
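To see what the two shift formulas produce, here is a small Python sketch that evaluates them for the sample rankings. The helper names `new_rank_a` and `new_rank_b` are illustrative only; the arithmetic is taken from the two expressions explained above.

```python
import math


def new_rank_a(r):
    # RankingA + floor((RankingA - 1) / 2) * 3
    return r + ((r - 1) // 2) * 3


def new_rank_b(r):
    # RankingB + ceiling(RankingB / 3) * 2
    return r + math.ceil(r / 3) * 2


keys_a = [new_rank_a(r) for r in range(1, 6)]  # five rows in table A
keys_b = [new_rank_b(r) for r in range(1, 8)]  # seven rows in table B

# rank() over (order by NewRank): positions in the merged, sorted key list
merged = sorted([(k, "A", r) for r, k in enumerate(keys_a, 1)] +
                [(k, "B", r) for r, k in enumerate(keys_b, 1)])
final_rank = {(tbl, r): pos for pos, (_, tbl, r) in enumerate(merged, 1)}

print(keys_a)
print(keys_b)
print(final_rank[("B", 7)])
```

For this data the shifted keys never collide, so the outer rank() behaves like a row number and closes the gap left where table A runs out.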

Related

How to return the same period last year data with SQL?

I am trying to create a view in postgreSQL with the requirements as below:
The view needs to show, for every record, the value from the same period of the previous year.
Sample data:
date_sk | location_sk | division_sk | employee_type_sk | value
20180202 | 6 | 8 | 4 | 1
20180202 | 7 | 2 | 4 | 2
20190202 | 6 | 8 | 4 | 1
20190202 | 7 | 2 | 4 | 1
20200202 | 6 | 8 | 4 | 1
20200202 | 7 | 2 | 4 | 3
In the table, date_sk, location_sk, division_sk and employee_type_sk together form the key that uniquely identifies a record.
You can check the required output as below:
date_sk | location_sk | division_sk | employee_type_sk | value | value_last_year
20180202 | 6 | 8 | 4 | 1 | NULL
20180203 | 7 | 2 | 4 | 2 | NULL
20190202 | 6 | 8 | 4 | 1 | 1
20190203 | 7 | 3 | 4 | 1 | NULL
20200202 | 6 | 8 | 4 | 1 | 1
20200203 | 7 | 3 | 4 | 3 | 1
The records start on 20180202, so no data is available for the same period of the previous year. In the 4th record, division_sk differs from the same period last year, hence value_last_year is NULL.
My current solution is to create a view from the sample data with an additional column, same_date_last_year, and then LEFT JOIN it to the same table. The SQL queries are below:
CREATE VIEW test_view AS
SELECT *,
CONCAT(LEFT(date_sk, 4) - 1, RIGHT(date_sk, 4)) AS same_date_last_year
FROM test_table;
SELECT
test_view.date_sk,
test_view.location_sk,
test_view.division_sk,
test_view.employee_type_sk,
test_view.value,
test_table.value AS value_last_year
FROM test_view
LEFT JOIN test_table ON (test_view.same_date_last_year = test_table.date_sk)
We have a lot of data in the table, and the performance of my solution above is unacceptable.
Is there a different query that yields the same result with better performance?

You could simply use a correlated subquery here, which is likely to perform well (this assumes date_sk is a date column):
select *,
(
select value from t t2
where t2.date_sk=t.date_sk - interval '1' year and
t2.location_sk=t.location_sk and
t2.division_sk=t.division_sk and
t2.employee_type_sk=t.employee_type_sk
) as value_last_year
from t
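The matching rule of the correlated subquery (same location, division and employee type, date one year earlier) can be sketched in Python as a dictionary lookup. This is only an illustration under one loud assumption: date_sk is stored as an integer in yyyymmdd form, as the sample data suggests, so "one year earlier" is modelled as subtracting 10000 rather than subtracting an interval.

```python
# Sample rows from the question: (date_sk, location_sk, division_sk,
# employee_type_sk, value). ASSUMPTION: date_sk is an integer yyyymmdd.
rows = [
    (20180202, 6, 8, 4, 1),
    (20180202, 7, 2, 4, 2),
    (20190202, 6, 8, 4, 1),
    (20190202, 7, 2, 4, 1),
    (20200202, 6, 8, 4, 1),
    (20200202, 7, 2, 4, 3),
]

# Index values by the full four-column key, like an index on the table.
by_key = {(d, loc, div, emp): value for d, loc, div, emp, value in rows}

with_last_year = [
    # .get returns None when no row exists for the previous year,
    # which plays the role of NULL in the SQL result.
    (d, loc, div, emp, value, by_key.get((d - 10000, loc, div, emp)))
    for d, loc, div, emp, value in rows
]

for row in with_last_year:
    print(row)
```

The point of the sketch is that the lookup uses all four key columns, not just the date; in the database, an index covering those columns is what would let each lookup avoid a full scan.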
WITH CTE(DATE_SK,LOCATION_SK,DIVISION_SK,EMPLOYEE_TYPE_SK,VALUE)AS
(
SELECT CAST('20180202' AS DATE),6,8,4,1 UNION ALL
SELECT CAST('20180203'AS DATE),7,2,4,2 UNION ALL
SELECT CAST('20190202'AS DATE),6,8,4,1 UNION ALL
SELECT CAST('20190203'AS DATE),7,2,4,1 UNION ALL
SELECT CAST('20200202'AS DATE),6,8,4,1 UNION ALL
SELECT CAST('20200203'AS DATE),7,2,4,3
)
SELECT C.DATE_SK,C.LOCATION_SK,C.DIVISION_SK,C.EMPLOYEE_TYPE_SK,C.VALUE,
LAG(C.VALUE)OVER(PARTITION BY C.LOCATION_SK,C.DIVISION_SK,C.EMPLOYEE_TYPE_SK ORDER BY C.DATE_SK ASC)LAGG
FROM CTE AS C
ORDER BY C.DATE_SK ASC;
Please check whether the query above works for you. I assume DATE_SK is a date column or can be CAST to a date.

Select max value from column for every value in other two columns

I'm working on a web app that tracks TV shows, and I need to get the IDs of all episodes that are season finales, that is, the episode with the highest episode number in every season of every TV show.
This is a simplified version of my "episodes" table.
id tvshow_id season epnum
---|-----------|--------|-------
1 | 1 | 1 | 1
2 | 1 | 1 | 2
3 | 1 | 1 | 3
4 | 1 | 2 | 1
5 | 1 | 2 | 2
6 | 2 | 1 | 1
7 | 2 | 1 | 2
8 | 2 | 1 | 3
9 | 2 | 1 | 4
10 | 2 | 2 | 1
11 | 2 | 2 | 2
The expected output:
id
---|
3 |
5 |
9 |
11 |
I've managed to get this working for the latest season but I can't make it work for all seasons.
I've also tried to take some ideas from this but I can't seem to find a way to add the tvshow_id in there.
I'm using Postgres v10
SELECT Id from
(Select *, Row_number() over (partition by tvshow_id,season order by epnum desc) as ranking from tbl)c
Where ranking=1
You can use the SQL below to get your result, using GROUP BY in a subquery:
select id from tab_x
where (tvshow_id,season,epnum) in (
select tvshow_id,season,max(epnum)
from tab_x
group by tvshow_id,season)
Below is a simple query that gets the desired result. It should also perform well, thanks to the DISTINCT ON clause:
select
distinct on (tvshow_id,season)
id
from your_table
order by tvshow_id,season ,epnum desc
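All three queries above implement the same rule: within each (tvshow_id, season) pair, keep the row with the highest epnum. A minimal Python sketch of that rule, using the sample table from the question (the variable names are illustrative):

```python
# (id, tvshow_id, season, epnum) rows from the question's sample table
episodes = [
    (1, 1, 1, 1), (2, 1, 1, 2), (3, 1, 1, 3),
    (4, 1, 2, 1), (5, 1, 2, 2),
    (6, 2, 1, 1), (7, 2, 1, 2), (8, 2, 1, 3), (9, 2, 1, 4),
    (10, 2, 2, 1), (11, 2, 2, 2),
]

# Track the best (epnum, id) seen so far for each (tvshow_id, season).
best = {}
for ep_id, show, season, epnum in episodes:
    key = (show, season)
    if key not in best or epnum > best[key][0]:
        best[key] = (epnum, ep_id)

finale_ids = sorted(ep_id for _, ep_id in best.values())
print(finale_ids)
```

This is a single pass with one entry per season, which is roughly what the GROUP BY and DISTINCT ON variants ask the database to do.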

How to organise rows in a union

I'm having a little issue with my results: I am trying to create a fixture list for each team, playing home and away. I’ve almost got it working, but the problem I have is displayed in the following results:
I have 2 teams playing each other home and away in week 1, and the same two teams playing each other home and away in week 4.
What this should display when I insert the data and read it is: in week 1, team 4 is at home playing team 18, who are away, and then the reverse fixture is played in week 4.
In other words, rows 2 and 3 are incorrect, and only rows 1 and 4 should be shown. What do I need to change in the code below to get this to work?
INSERT INTO dbo.Fixture (WeekNumber, HomeTeamID, AwayTeamID, FixtureDate, LeagueID)
SELECT
ROW_NUMBER() OVER (PARTITION BY h.teamID ORDER BY h.TeamID, a.TeamID, h.LeagueID) AS WeekNumber,
h.TeamID,
a.TeamID,
DATEADD(day,(ROW_NUMBER() OVER (ORDER BY h.LeagueID)-1)*7,@StartFixtureWeek) AS FixtureWeek,
h.LeagueID
FROM dbo.Team h
CROSS JOIN dbo.Team a
WHERE h.TeamID <> a.TeamID
AND h.LeagueID = a.LeagueID
UNION
SELECT
ROW_NUMBER() OVER (PARTITION BY a.teamID ORDER BY h.TeamID, a.TeamID, h.LeagueID) AS WeekNumber,
h.TeamID,
a.TeamID,
DATEADD(day,(ROW_NUMBER() OVER (ORDER BY a.LeagueID)-1)*7,@StartFixtureWeek) AS FixtureWeek,
h.LeagueID
FROM dbo.Team h
CROSS JOIN dbo.Team a
WHERE h.TeamID <> a.TeamID
AND h.LeagueID = a.LeagueID
select * from dbo.Fixture
WHERE (HomeTeamID = 4 AND AwayTeamID = 18) OR (HomeTeamID = 18 AND AwayTeamID = 4)
UPDATE:
Below is an explanation and design of the desired output:
WeekNumber HomeTeamID AwayTeamID FixtureWeek LeagueID
1 1 4 10-06-2016 1
2 1 3 17-06-2016 1
3 1 2 24-06-2016 1
4 4 1 30-06-2016 1
5 3 1 06-07-2016 1
6 2 1 13-07-2016 1
1 2 3 10-06-2016 1
2 2 4 17-06-2016 1
3 3 4 24-06-2016 1
4 3 2 30-06-2016 1
5 4 2 06-07-2016 1
6 4 3 13-07-2016 1
1 5 8 10-06-2016 2
2 5 7 17-06-2016 2
3 5 6 24-06-2016 2
4 8 5 30-06-2016 2
5 7 5 06-07-2016 2
6 6 5 13-07-2016 2
1 6 7 10-06-2016 2
2 6 8 17-06-2016 2
3 7 8 24-06-2016 2
4 7 6 30-06-2016 2
5 8 6 06-07-2016 2
6 8 7 13-07-2016 2
OK, so we have two leagues (LeagueID 1 and LeagueID 2).
In League 1 there are 4 teams (TeamID 1, 2, 3, 4). They play each other home and away, but they can’t play two games within the same week.
In League 2 there are 4 teams (TeamID 5, 6, 7, 8). They play each other home and away, but they can’t play two games within the same week.
Both leagues start on the same day, and 7 days are added for every game (in other words, one game every week).
The output doesn’t show it, but preferably each team would play a home game one week, an away game the next week, then home, then away, and so on. But if we can get the above output first and then adjust the home and away ordering afterwards, that’s fine.
CROSS JOIN with h.TeamID <> a.TeamID generates all ordered pairs, i.e. both (1,2) and (2,1). I think it is easier to separate them, so the WHERE filter becomes h.TeamID < a.TeamID in one query and h.TeamID > a.TeamID in the other.
Week numbers should be assigned per league, so ROW_NUMBER should be partitioned by LeagueID. Then it is a matter of sorting the teams in some order, and then in the opposite direction.
In the result below you can see how the pattern of team pairs for 6 weeks repeats in two opposite directions.
Most likely it is possible to achieve the same or a similar result with one CROSS JOIN that uses <> without UNION and a more complicated expression in ROW_NUMBER.
Side note: don't use UNION if you don't really need it. Use UNION ALL.
Side note 2: it doesn't matter if you use CROSS JOIN or INNER JOIN here. The optimizer would generate identical execution plans. I think CROSS JOIN is more readable here, because it shows more clearly the intention to generate a Cartesian product, the set of all pairs.
Sample data
DECLARE @T TABLE (TeamID int, LeagueID int);
INSERT INTO @T (TeamID, LeagueID) VALUES
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(5, 2),
(6, 2),
(7, 2),
(8, 2);
Query
SELECT
h.TeamID AS HomeTeamID
,a.TeamID AS AwayTeamID
,h.LeagueID
,ROW_NUMBER() OVER
(PARTITION BY h.LeagueID ORDER BY h.TeamID, a.TeamID) AS WeekNumber
,1 AS SortOrder
FROM
@T AS h
CROSS JOIN @T AS a
WHERE
h.LeagueID = a.LeagueID
AND h.TeamID < a.TeamID
UNION ALL
SELECT
h.TeamID AS HomeTeamID
,a.TeamID AS AwayTeamID
,h.LeagueID
,ROW_NUMBER() OVER
(PARTITION BY h.LeagueID ORDER BY a.TeamID DESC, h.TeamID DESC) AS WeekNumber
,2 AS SortOrder
FROM
@T AS h
CROSS JOIN @T AS a
WHERE
h.LeagueID = a.LeagueID
AND h.TeamID > a.TeamID
ORDER BY
LeagueID
,SortOrder
,WeekNumber
;
Result
+------------+------------+----------+------------+-----------+
| HomeTeamID | AwayTeamID | LeagueID | WeekNumber | SortOrder |
+------------+------------+----------+------------+-----------+
| 1 | 2 | 1 | 1 | 1 |
| 1 | 3 | 1 | 2 | 1 |
| 1 | 4 | 1 | 3 | 1 |
| 2 | 3 | 1 | 4 | 1 |
| 2 | 4 | 1 | 5 | 1 |
| 3 | 4 | 1 | 6 | 1 |
| 4 | 3 | 1 | 1 | 2 |
| 4 | 2 | 1 | 2 | 2 |
| 3 | 2 | 1 | 3 | 2 |
| 4 | 1 | 1 | 4 | 2 |
| 3 | 1 | 1 | 5 | 2 |
| 2 | 1 | 1 | 6 | 2 |
| 5 | 6 | 2 | 1 | 1 |
| 5 | 7 | 2 | 2 | 1 |
| 5 | 8 | 2 | 3 | 1 |
| 6 | 7 | 2 | 4 | 1 |
| 6 | 8 | 2 | 5 | 1 |
| 7 | 8 | 2 | 6 | 1 |
| 8 | 7 | 2 | 1 | 2 |
| 8 | 6 | 2 | 2 | 2 |
| 7 | 6 | 2 | 3 | 2 |
| 8 | 5 | 2 | 4 | 2 |
| 7 | 5 | 2 | 5 | 2 |
| 6 | 5 | 2 | 6 | 2 |
+------------+------------+----------+------------+-----------+
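The pairing and week-numbering logic of the query can be reproduced in a few lines of Python, which makes it easy to experiment with other orderings. This is a sketch of the query's logic only; the function name `fixtures` is made up for illustration.

```python
def fixtures(teams):
    """teams: list of (team_id, league_id).

    Returns (home, away, league, week, sort_order) rows, ordered by
    league, sort_order, week like the query's final ORDER BY.
    """
    out = []
    for league in sorted({lg for _, lg in teams}):
        ids = sorted(t for t, lg in teams if lg == league)
        # First leg: home < away, ordered by home, then away.
        first = [(h, a) for h in ids for a in ids if h < a]
        # Second leg: home > away, ordered by away DESC, home DESC.
        second = sorted(((h, a) for h in ids for a in ids if h > a),
                        key=lambda p: (-p[1], -p[0]))
        for sort_order, leg in ((1, first), (2, second)):
            # ROW_NUMBER() within the league for this leg
            for week, (h, a) in enumerate(leg, start=1):
                out.append((h, a, league, week, sort_order))
    return out


teams = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 2), (8, 2)]
schedule = fixtures(teams)
for row in schedule:
    print(row)
```

As in the SQL result, every pairing gets its own week number within each leg, so the week numbers run from 1 to 6 per leg and league.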

Order rows by ntile and row_number

I'm trying to build a stored procedure that will return data for a Crystal Reports report.
Inside CR I'm using a multi-column layout.
I want to get a 3-column layout, something like this:
1 5 8
2 6 9
3 7 10
4
But because CR has some layout issues, it orders my table like this:
1 2 3
4 5 6
7 8 9
10
So I've tried to create a procedure that will return an extra column on which I'll sort my data.
So instead of the order 1,2,3,4 I need 1,4,7,10,2,5,8,3,6,9...
I have a table with this data:
ID | CASE_ID | CASE_DATE
--------------------------
1 | 1 | 2014-02-03
2 | 1 | 2014-02-04
3 | 1 | 2014-02-05
4 | 1 | 2014-02-06
5 | 1 | 2014-02-07
6 | 1 | 2014-02-08
7 | 1 | 2014-02-09
8 | 1 | 2014-02-10
9 | 1 | 2014-02-11
10 | 1 | 2014-02-12
And I need a stored procedure that will return this data:
ID | CASE_ID | CASE_DATE | ORDER
---------------------------------
1 | 1 | 2014-02-03 | 1
2 | 1 | 2014-02-04 | 5
3 | 1 | 2014-02-05 | 8
4 | 1 | 2014-02-06 | 2
5 | 1 | 2014-02-07 | 6
6 | 1 | 2014-02-08 | 9
7 | 1 | 2014-02-09 | 3
8 | 1 | 2014-02-10 | 7
9 | 1 | 2014-02-11 | 10
10 | 1 | 2014-02-12 | 4
Here is sql fiddle with sample data and my code: http://sqlfiddle.com/#!3/c24c1/1
Idea behind sort column:
divide all rows into 3 groups (ntile), take first item from first group, then first from second and first from third group
EDIT:
Here is my temporary solution, I hope that running this will clarify what I had in mind when I was asking this question:
--DECLARE @NUM INT;
--SET @NUM=3;
SELECT ID,
CASE_ID,
CONVERT(NVARCHAR(10),CASE_DATE,121) AS DATA,
(ROW1 - 1) * 3/*@NUM*/ + COL AS [ORDER]
FROM
( SELECT CASE_ID,
ID,
ROW AS LP,
COL,
ROW_NUMBER() OVER (PARTITION BY CASE_ID, COL ORDER BY ROW) AS ROW1,
CASE_DATE
FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY D.CASE_ID ORDER BY D.ID) AS ROW,
NTILE(3/*@NUM*/) OVER (PARTITION BY D.CASE_ID ORDER BY D.ID) AS COL,
ID,
D.CASE_ID,
CASE_DATE
FROM DATA D
WHERE D.CASE_ID = 1)X )Y
ORDER BY Y.CASE_ID,
LP
Edit: It looks like you actually want the ORDER column itself, not just the rows returned in that order.
SELECT ID,
CASE_ID,
DATA,
ROW_NUMBER() OVER (ORDER BY ROW, N) AS [ORDER]
FROM (
SELECT ID,
CASE_ID,
N,
ROW_NUMBER() OVER (PARTITION BY CASE_ID, N ORDER BY ID) AS ROW,
DATA
FROM (
SELECT
ID,
CASE_ID,
NTILE(3) OVER (PARTITION BY CASE_ID ORDER BY ID) AS N,
CONVERT(NVARCHAR(10), CASE_DATE,121) AS DATA
FROM DATA
WHERE CASE_ID = 1 ) X ) Y
ORDER BY ID;
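To follow what the nested query computes, here is a Python sketch of the same three steps for the ten sample rows: assign each row to a tile, number the rows inside each tile, then number all rows ordered by (row in tile, tile). The `ntile` helper is written for this illustration and follows the usual NTILE rule that earlier tiles receive the extra rows.

```python
def ntile(n_rows, buckets):
    """Tile number for each of n_rows ordered rows, like NTILE(buckets)."""
    base, extra = divmod(n_rows, buckets)
    tiles = []
    for b in range(1, buckets + 1):
        size = base + (1 if b <= extra else 0)  # first `extra` tiles get one more
        tiles.extend([b] * size)
    return tiles


ids = list(range(1, 11))
tiles = ntile(len(ids), 3)

# ROW_NUMBER() OVER (PARTITION BY N ORDER BY ID)
row_in_tile = []
seen = {}
for t in tiles:
    seen[t] = seen.get(t, 0) + 1
    row_in_tile.append(seen[t])

# ROW_NUMBER() OVER (ORDER BY ROW, N)
ranked = sorted(zip(row_in_tile, tiles, ids))
order_by_id = {id_: pos for pos, (_, _, id_) in enumerate(ranked, start=1)}

print(tiles)
print([order_by_id[i] for i in ids])
```

Listing the ORDER values by ID gives the sequence 1,4,7,10,2,5,8,3,6,9 mentioned in the question.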

Select dynamic couples of lines in SQL (PostgreSQL)

My objective is to build dynamic groups of rows (groups of products, by TYPE and COLOR, in fact).
I don't know if it's possible with just one SELECT query.
I want to create groups of rows (a PRODUCT is a TYPE and a COLOR) whose size is given by the NB_PER_GROUP column, and I want the grouping to follow the date order (ORDER BY DATE).
A product left on its own, for example a single product with NB_PER_GROUP = 2, is excluded from the final result.
Table :
-----------------------------------------------
NUM | TYPE | COLOR | NB_PER_GROUP | DATE
-----------------------------------------------
0 | 1 | 1 | 2 | ...
1 | 1 | 1 | 2 |
2 | 1 | 2 | 2 |
3 | 1 | 2 | 2 |
4 | 1 | 1 | 2 |
5 | 1 | 1 | 2 |
6 | 4 | 1 | 3 |
7 | 1 | 1 | 2 |
8 | 4 | 1 | 3 |
9 | 4 | 1 | 3 |
10 | 5 | 1 | 2 |
Results :
------------------------
GROUP_NUMBER | NUM |
------------------------
0 | 0 |
0 | 1 |
~~~~~~~~~~~~~~~~~~~~~~~~
1 | 2 |
1 | 3 |
~~~~~~~~~~~~~~~~~~~~~~~~
2 | 4 |
2 | 5 |
~~~~~~~~~~~~~~~~~~~~~~~~
3 | 6 |
3 | 8 |
3 | 9 |
If you have another way to solve this problem, I will accept it.
What about something like this?
select max(gn.group_number) group_number, ip.num
from products ip
join (
select date, type, color, row_number() over (order by date) - 1 group_number
from (
select op.num, op.type, op.color, op.nb_per_group, op.date, (row_number() over (partition by op.type, op.color order by op.date) - 1) % nb_per_group group_order
from products op
) sq
where sq.group_order = 0
) gn
on ip.type = gn.type
and ip.color = gn.color
and ip.date >= gn.date
group by ip.num
order by group_number, ip.num
This may only work if your nb_per_group values are the same for each combination of type and color. It may also require unique dates, but that could probably be worked around if required.
The innermost subquery partitions the rows by type and color, orders them by date, then calculates the row numbers modulo nb_per_group; this forms a 0-based counter within each group that resets to 0 after every nb_per_group rows.
The next-level subquery finds all of the 0 values we mapped in the lower subquery and assigns group numbers to them.
Finally, the outermost query ties each row in the products table to a group number, calculated as the highest group number that split off before this product's date.
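The three steps described above can be traced in Python on the question's sample table. This is a sketch of the query's logic under one assumption: the DATE column is elided in the question, so NUM is used here as a stand-in for the date order.

```python
from collections import defaultdict

# (num, type, color, nb_per_group); ASSUMPTION: num stands in for DATE order.
products = [
    (0, 1, 1, 2), (1, 1, 1, 2), (2, 1, 2, 2), (3, 1, 2, 2),
    (4, 1, 1, 2), (5, 1, 1, 2), (6, 4, 1, 3), (7, 1, 1, 2),
    (8, 4, 1, 3), (9, 4, 1, 3), (10, 5, 1, 2),
]

# Step 1: row number per (type, color) modulo nb_per_group;
# a remainder of 0 marks the first row of a new group.
counters = defaultdict(int)
starts = []
for num, typ, color, nb in products:
    if counters[(typ, color)] % nb == 0:
        starts.append((num, typ, color))
    counters[(typ, color)] += 1

# Step 2: number the group starts in date order, from 0.
group_of_start = {start: g for g, start in enumerate(starts)}

# Step 3: each row takes the highest group number among the starts of
# its own (type, color) that are not later than the row itself.
assignment = {}
for num, typ, color, _ in products:
    assignment[num] = max(
        g for (s_num, s_typ, s_col), g in group_of_start.items()
        if (s_typ, s_col) == (typ, color) and s_num <= num
    )

print(assignment)
```

On this data the sketch reproduces groups 0 to 3 from the expected result, but it also keeps NUM 7 and NUM 10 in groups of their own, so an extra filter on group size would still be needed to exclude incomplete groups as the question asks.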