SQL Interleave multiple ordered tables - sql

Let's say I have 2 tables with date ordered rows like:
products table:
date       | name
-----------+-----
09/01/2021 | P1
12/01/2021 | P2
22/01/2021 | P3
and artworks table:
date       | name
-----------+-----
19/01/2018 | A1
27/02/2019 | A2
28/02/2021 | A3
Is there any way in SQL to write a query that combines the 2 tables by "interleaving" them: first take 2 products, then 1 artwork, then the next 2 products, then the next artwork, and so on?
The result would look like:
date       | name
-----------+-----
09/01/2021 | P1
12/01/2021 | P2
19/01/2018 | A1
22/01/2021 | P3
27/02/2019 | A2

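You can use ROW_NUMBER() to produce an interleaving sort key: products get 10, 20, 30, ... and artworks get 21, 41, 61, ..., so one artwork sorts right after every second product.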
For example:
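select date, name
from (
  -- products get 10, 20, 30, ...
  select date, name,
         row_number() over (order by date) * 10 as rn
  from products
  union all
  -- artworks get 21, 41, 61, ... so each one follows every second product
  select date, name,
         row_number() over (order by date) * 20 + 1 as rn
  from artworks
) x
order by rn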
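A quick way to check the interleaving is to run the same query against SQLite from Python. This is a sketch under a few assumptions: an in-memory SQLite database (version 3.25 or later, for window functions) stands in for the real DBMS, and the question's dd/mm/yyyy dates are stored as ISO `YYYY-MM-DD` text so that ordering by `date` is chronological.

```python
import sqlite3

# In-memory stand-in for the real database; dates stored as ISO text (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (date TEXT, name TEXT);
    CREATE TABLE artworks (date TEXT, name TEXT);
    INSERT INTO products VALUES ('2021-01-09', 'P1'), ('2021-01-12', 'P2'), ('2021-01-22', 'P3');
    INSERT INTO artworks VALUES ('2018-01-19', 'A1'), ('2019-02-27', 'A2'), ('2021-02-28', 'A3');
""")

# Products get keys 10, 20, 30, ...; artworks get 21, 41, 61, ...
# so one artwork sorts right after every second product.
INTERLEAVE = """
    SELECT date, name
    FROM (
        SELECT date, name, ROW_NUMBER() OVER (ORDER BY date) * 10 AS rn
        FROM products
        UNION ALL
        SELECT date, name, ROW_NUMBER() OVER (ORDER BY date) * 20 + 1 AS rn
        FROM artworks
    ) x
    ORDER BY rn
"""

rows = conn.execute(INTERLEAVE).fetchall()
for date, name in rows:
    print(date, name)
```

Once one table runs out, the remaining rows of the other simply follow in order (A3 gets key 61, after P3 at 30). More generally, numbering products k * 10 and artworks k * (10 * m) + 1 places one artwork after every m products.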

Related

Join values in one table only when the minimum value is less than value in other table - snowflake

I have two tables:
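Table A
Purchase_date | Product_ID
--------------+-----------
20200101      | 1
20190101      | 2
20200301      | 1
20201201      | 2
Table B
Product_ID | Price | Price_change_date
-----------+-------+------------------
1          | 10    | 20191231
2          | 15    | 20201031
1          | 12    | 20200110
1          | 20    | 20201231
2          | 8     | 20190331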
I want to join these two tables based on two criteria:
If the purchase_date < min(price_change_date), return the price corresponding to the min(price_change_date)
Else return the price at the max(price_change_date) that is less than the purchase_date
I have written a query to successfully get results for the second criteria, but not the first, and I'm not sure if they can be combined within the same query.
For the tables above, the results should be:
Results
Purchase_date | Product_ID | Price | Price_change_date
--------------+------------+-------+------------------
20200101      | 1          | 10    | 20191231
20190101      | 2          | 8     | 20190331
20200301      | 1          | 12    | 20200110
20201201      | 2          | 15    | 20201031
Notice that the second row returns a price whose price_change_date is later than the purchase date.
Thanks in advance!!
You can use a lateral join:
select a.*, b.*
from a, lateral
(select b.*
from b
where b.product_id = a.product_id and b.price_change_date <= a.purchase_date
order by b.price_change_date desc
limit 1
) b;
EDIT:
The above gives you the most recent price change on or before each purchase; a purchase made before the product's first price change (such as the second row) gets no row at all. To find those records in a, you can use:
select a.*, b.min_price_change_date
from a join
     (select b.product_id, min(b.price_change_date) as min_price_change_date
      from b
      group by b.product_id
     ) b
     on a.product_id = b.product_id and
        a.purchase_date < b.min_price_change_date;
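The two criteria can also be combined in a single query. The sketch below swaps in a different technique from the answer's lateral join: it joins each purchase to every price change of the same product and uses ROW_NUMBER() to keep one row, preferring the latest change on or before the purchase and otherwise falling back to the earliest change. It runs in SQLite from Python (SQLite has no LATERAL) and assumes lowercase table names `a` and `b`, with SQLite's implicit `rowid` identifying each purchase. In Snowflake, the same ranking could be filtered with QUALIFY instead of a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (purchase_date INTEGER, product_id INTEGER);
    CREATE TABLE b (product_id INTEGER, price INTEGER, price_change_date INTEGER);
    INSERT INTO a VALUES (20200101, 1), (20190101, 2), (20200301, 1), (20201201, 2);
    INSERT INTO b VALUES (1, 10, 20191231), (2, 15, 20201031), (1, 12, 20200110),
                         (1, 20, 20201231), (2, 8, 20190331);
""")

# For each purchase (identified by SQLite's implicit rowid), rank the
# product's price changes:
#   1. changes on or before the purchase date first, latest first;
#   2. otherwise (fallback) the earliest change after the purchase.
# The row ranked 1 carries the price that applies to the purchase.
QUERY = """
    SELECT purchase_date, product_id, price, price_change_date
    FROM (
        SELECT a.purchase_date, a.product_id, b.price, b.price_change_date,
               ROW_NUMBER() OVER (
                   PARTITION BY a.rowid
                   ORDER BY CASE WHEN b.price_change_date <= a.purchase_date THEN 0 ELSE 1 END,
                            CASE WHEN b.price_change_date <= a.purchase_date
                                 THEN -b.price_change_date
                                 ELSE b.price_change_date END
               ) AS seqnum
        FROM a
        JOIN b ON b.product_id = a.product_id
    ) t
    WHERE seqnum = 1
    ORDER BY product_id, purchase_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Negating the date in the second sort key gives "latest first" for the on-or-before group and "earliest first" for the fallback group within one ascending ORDER BY. A purchase whose product has no rows in b at all is dropped by the inner join.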

Ranking of a tuple in another table

So I have 2 tables, team A and team B, with their scores. I want the rank of each team A member's score within team B, using SQL or Vertica, as shown below:
Team A Table
user score
-------------
asa 100
bre 200
cqw 50
duy 50
Team B Table
user score
------------
gfh 20
ewr 80
kil 70
cvb 90
Output:
Team A Table
user score rank in team B
------------------------------
asa 100 1
bre 200 1
cqw 50 4
duy 50 4
Try this; note that it only works in Vertica.
INTERPOLATE PREVIOUS VALUE is an outer-join predicate specific to Vertica that joins two tables on non-equal columns, using the 'last known' value in the outer-joined table to make a match succeed.
WITH
-- input, don't use in query itself
table_a (the_user,score) AS (
SELECT 'asa',100
UNION ALL SELECT 'bre',200
UNION ALL SELECT 'cqw',50
UNION ALL SELECT 'duy',50
)
,
table_b(the_user,score) AS (
SELECT 'gfh',20
UNION ALL SELECT 'ewr',80
UNION ALL SELECT 'kil',70
UNION ALL SELECT 'cvb',90
)
-- end of input - start WITH clause here
,
ranked_b AS (
SELECT
RANK() OVER(ORDER BY score DESC) AS the_rank
, *
FROM table_b
)
SELECT
a.the_user AS a_user
, a.score AS a_score
, b.the_rank AS rank_in_team_b
FROM table_a a
LEFT JOIN ranked_b b
ON a.score INTERPOLATE PREVIOUS VALUE b.score
ORDER BY 1
;
a_user|a_score|rank_in_team_b
asa | 100| 1
bre | 200| 1
cqw | 50| 4
duy | 50| 4
Simple correlated query should do:
select
a.*,
(select count(*) + 1 from table_b b where b.score > a.score) rank_in_b
from table_a a;
All you need to do is count the members of team B whose score is higher than the current user's score and add 1 to get the rank.
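As a sketch, here is the correlated COUNT(*) + 1 query run against the question's data in an in-memory SQLite database from Python (SQLite is assumed here for illustration). The column is named `the_user`, as in the Vertica answer, to avoid the reserved word USER. Because only strictly higher scores are counted, a team A score equal to some team B score gets the same rank that RANK() would give that team B row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (the_user TEXT, score INTEGER);
    CREATE TABLE table_b (the_user TEXT, score INTEGER);
    INSERT INTO table_a VALUES ('asa', 100), ('bre', 200), ('cqw', 50), ('duy', 50);
    INSERT INTO table_b VALUES ('gfh', 20), ('ewr', 80), ('kil', 70), ('cvb', 90);
""")

# Rank in team B = 1 + number of team B members with a strictly higher score.
QUERY = """
    SELECT a.the_user, a.score,
           (SELECT COUNT(*) + 1 FROM table_b b WHERE b.score > a.score) AS rank_in_b
    FROM table_a a
    ORDER BY a.the_user
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```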

Select distinct rows with max date with repeated and null values (Oracle)

I have 3 tables; let's call them Root, Detail and Revision.
I need to select the distinct codes from Root with the highest revision date, taking into account that the revision rows may not exist and/or may have repeated values in the date column.
Root: idRoot, Code
Detail: idDetail, price, idRoot
Revision: idRevision, date, idDetail
So I've started with this join query:
select code, price, date, idRevision
from Root r
inner join Detail d on d.idRoot = r.idRoot
left join Revision rv on d.idDetail = rv.idDetail;
That gives results like this:
CODE | PRICE | DATE     | idRevision
-----+-------+----------+-----------
C1   | 100   | 2/1/2016 | 1
C1   | 120   | 2/1/2016 | 3
C1   | 150   | null     | 2
C1   | 200   | 1/1/2016 | 4
C2   | 300   | null     | null
C3   | 400   | 3/1/2016 | 6
But what I really need is this result:
CODE | PRICE | DATE     | idRevision
-----+-------+----------+-----------
C1   | 120   | 2/1/2016 | 3
C2   | 300   | null     | null
C3   | 400   | 3/1/2016 | 6
I've seen several answers for similar cases, but never with null and repeated values:
Oracle: Taking the record with the max date
Fetch the row which has the Max value for a column
Oracle Select Max Date on Multiple records
Any kind of help would be really appreciated
You can use row_number(), ordering by date with nulls last and using idRevision to break ties between equal dates:
select code, price, date, idRevision
from (select code, price, date, idRevision,
             row_number() over (partition by code
                                order by date desc nulls last, idRevision desc) as seqnum
      from Root r inner join
           Detail d
           on d.idRoot = r.idRoot left join
           Revision rv
           on d.idDetail = rv.idDetail
     ) rdr
where seqnum = 1;
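The ordering logic can be tried out in SQLite from Python. This is a sketch, not Oracle: the sample rows are reconstructed from the question's join output, the dates are assumed to be month/day/year and stored as ISO text, and the date column is renamed to a hypothetical `rev_date` because DATE is a reserved word in Oracle. `rev_date IS NULL` is used as the first sort key so that NULL dates sort last without relying on NULLS LAST syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Root (idRoot INTEGER, code TEXT);
    CREATE TABLE Detail (idDetail INTEGER, price INTEGER, idRoot INTEGER);
    CREATE TABLE Revision (idRevision INTEGER, rev_date TEXT, idDetail INTEGER);
    INSERT INTO Root VALUES (1, 'C1'), (2, 'C2'), (3, 'C3');
    INSERT INTO Detail VALUES (1, 100, 1), (2, 120, 1), (3, 150, 1), (4, 200, 1),
                              (5, 300, 2), (6, 400, 3);
    -- Detail 5 (C2) has no revision at all; revision 2 exists but has a NULL date.
    INSERT INTO Revision VALUES (1, '2016-02-01', 1), (3, '2016-02-01', 2),
                                (2, NULL, 3), (4, '2016-01-01', 4), (6, '2016-03-01', 6);
""")

# Per code: non-NULL dates first, latest date first, highest idRevision breaks ties.
QUERY = """
    SELECT code, price, rev_date, idRevision
    FROM (
        SELECT r.code, d.price, rv.rev_date, rv.idRevision,
               ROW_NUMBER() OVER (
                   PARTITION BY r.code
                   ORDER BY rv.rev_date IS NULL, rv.rev_date DESC, rv.idRevision DESC
               ) AS seqnum
        FROM Root r
        JOIN Detail d ON d.idRoot = r.idRoot
        LEFT JOIN Revision rv ON rv.idDetail = d.idDetail
    ) t
    WHERE seqnum = 1
    ORDER BY code
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```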

select rows from main table based on highest date in child table between a date range

Sorry for the confusing title.
I have this table:
ApplicantID Applicant Name
-------------------------------
1 Sandeep
2 Thomas
3 Philip
4 Jerin
Along with this child table, which references the table above:
DetailsID ApplicantID CourseName Dt
---------------------------------------------------------------------
1 1 C1 10/5/2014
2 1 C2 10/18/2014
3 1 c3 7/3/2014
4 2 C1 3/2/2014
5 2 C2 10/18/2014
6 2 c3 1/1/2014
7 3 C1 1/5/2014
8 3 C2 4/18/2014
9 3 c3 2/23/2014
10 4 C1 3/15/2014
11 4 C2 2/20/2014
12 4 C2 2/20/2014
I want to get the ApplicantIDs for a given date range. For example, when I specify a date range from 3/5/2014 to 4/20/2014, I should get:
ApplicantID Applicant Name
-------------------------------
3 Philip
4 Jerin
That is, I want the applicants from the main table that have rows in the second table and whose highest date in the second table falls within the specified date range. Hope the scenario is clear.
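You can use the window function ROW_NUMBER() to find each applicant's row with the maximum date, then check that this date falls in the given range. Apply the range filter after ranking: filtering the child rows first would also return an applicant who has one date in the range and a later date outside it.
select T1.[ApplicantID], [Applicant Name]
from Table1 T1
join ( select [ApplicantID], Dt,
              ROW_NUMBER() over ( partition by [ApplicantID] order by Dt desc) as rn
       from Table2
     ) T
  on T1.[ApplicantID] = T.[ApplicantID]
 and T.rn = 1
where T.Dt BETWEEN '3/5/2014' AND '4/20/2014'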
You will need to pull the MAX per ApplicantId with a GROUP BY in a sub-query, then JOIN to that result. This should work for you:
Select A.ApplicantId, A.[Applicant Name]
From ApplicantTableName A
Join
(
Select D.ApplicantId, Max(D.Dt) DT
From DetailsTableName D
Group By D.ApplicantId
) B On A.ApplicantId = B.ApplicantId
Where B.DT Between '03/05/2014' And '04/20/2014'
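Here is a sketch of the GROUP BY/MAX approach run in SQLite from Python. The table and column names (`Applicants`, `Details`, `ApplicantName`) are hypothetical stand-ins for the question's tables, and the dates are stored as ISO `YYYY-MM-DD` text so that MAX and BETWEEN compare them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Applicants (ApplicantID INTEGER, ApplicantName TEXT);
    CREATE TABLE Details (DetailsID INTEGER, ApplicantID INTEGER, CourseName TEXT, Dt TEXT);
    INSERT INTO Applicants VALUES (1, 'Sandeep'), (2, 'Thomas'), (3, 'Philip'), (4, 'Jerin');
    INSERT INTO Details VALUES
        (1, 1, 'C1', '2014-10-05'), (2, 1, 'C2', '2014-10-18'), (3, 1, 'c3', '2014-07-03'),
        (4, 2, 'C1', '2014-03-02'), (5, 2, 'C2', '2014-10-18'), (6, 2, 'c3', '2014-01-01'),
        (7, 3, 'C1', '2014-01-05'), (8, 3, 'C2', '2014-04-18'), (9, 3, 'c3', '2014-02-23'),
        (10, 4, 'C1', '2014-03-15'), (11, 4, 'C2', '2014-02-20'), (12, 4, 'C2', '2014-02-20');
""")

# Latest date per applicant first, then keep applicants whose latest date is in range.
QUERY = """
    SELECT a.ApplicantID, a.ApplicantName
    FROM Applicants a
    JOIN (SELECT ApplicantID, MAX(Dt) AS MaxDt
          FROM Details
          GROUP BY ApplicantID) b ON a.ApplicantID = b.ApplicantID
    WHERE b.MaxDt BETWEEN ? AND ?
    ORDER BY a.ApplicantID
"""

rows = conn.execute(QUERY, ("2014-03-05", "2014-04-20")).fetchall()
print(rows)
```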

SQL Server group by set of results

I have a table with data that look like this:
product_id | filter_id
__________________
4525 5066
4525 5068
4525 5091
4526 5066
4526 5068
4526 5094
4527 5066
4527 5068
4527 5094
4528 5066
4528 5071
4528 5078
This is actually groups of three filters per product, e.g. product 4525 has filters 5066, 5068 and 5091.
The second and third groups are exactly the same set of filters (5066, 5068 and 5094), bound to different products (4526 and 4527).
I want each unique filter set only once (in other words, I want to remove the duplicate sets of filter_ids). I don't really care what happens to the product_id; I only want my unique sets of three filter_ids to be grouped under a key.
For example, this would also do:
new_id | filter_id
__________________
1 5066
1 5068
1 5091
2 5066
2 5068
2 5094
3 5066
3 5071
3 5078
I hope I explained it well enough.
Thank you.
Please try the query below, which is a bit longer than I expected; I can't think of any other logic for now.
select
distinct filter_id,
DENSE_RANK() over(order by sc) new_id
from(
select *,
(SELECT ' ' + cast(filter_id as nvarchar(10))
FROM tbl b where b.product_id=a.product_id order by filter_id
FOR XML PATH('')) SC
From tbl a
)x
order by new_id
/*-------------- Other Way ------------------*/
SELECT
DENSE_RANK() OVER (ORDER BY PRODUCT_ID) new_id,
filter_id
FROM
Table1
WHERE product_id in (
SELECT MIN(product_id) FROM(
SELECT
product_id,
SUM(filter_id*RN) OVER (PARTITION BY PRODUCT_ID) SM
FROM(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY filter_id) RN
FROM Table1
)x
)xx GROUP BY SM)
Select dense_rank()
over(order by product_id asc),filter_id
from table
If I understand the question correctly, the expected result only has the filter_ids of products 4525, 4526 and 4528: 4526 and 4527 have the same filter_ids, so only one of them is needed. In that case this query will do:
SELECT product_id
, dense_rank() OVER (ORDER BY PRODUCT_ID) new_id
, filter_id
FROM table1 c
WHERE NOT EXISTS (SELECT 1
FROM table1 a
LEFT JOIN table1 b ON a.product_id < b.product_id
WHERE b.product_id = c.product_id
GROUP BY a.product_id, b.product_id
HAVING COUNT(DISTINCT a.filter_id)
= COUNT(CASE WHEN a.filter_id = b.filter_id THEN 1
ELSE NULL
END));
To get the result, the first step is to remove the products whose list of filter_ids fully duplicates that of a lower-numbered product. To find those products, the subquery checks every pair of products to see whether the number of filter_ids in the lower-numbered one equals the number of filter_ids shared by the pair.
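To see the pairwise check in action, the query can be run as written (with an ORDER BY added for a stable result) against the question's data in SQLite from Python. This is an illustrative sketch; `table1` is the answer's table name. Products 4526 and 4527 share the same three filters, so only the lower-numbered 4526 survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (product_id INTEGER, filter_id INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    (4525, 5066), (4525, 5068), (4525, 5091),
    (4526, 5066), (4526, 5068), (4526, 5094),
    (4527, 5066), (4527, 5068), (4527, 5094),
    (4528, 5066), (4528, 5071), (4528, 5078),
])

# Drop product c when some lower-numbered product a has all of its
# filter_ids shared with c (for equal-sized sets: an identical set).
QUERY = """
    SELECT product_id,
           DENSE_RANK() OVER (ORDER BY product_id) AS new_id,
           filter_id
    FROM table1 c
    WHERE NOT EXISTS (
        SELECT 1
        FROM table1 a
        LEFT JOIN table1 b ON a.product_id < b.product_id
        WHERE b.product_id = c.product_id
        GROUP BY a.product_id, b.product_id
        HAVING COUNT(DISTINCT a.filter_id)
             = COUNT(CASE WHEN a.filter_id = b.filter_id THEN 1 END)
    )
    ORDER BY product_id, filter_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```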
If products can have different numbers of filters, and a product whose filter list is fully contained in another product's filter list should also be removed from the result, for example if with the base data
product_id | filter_id
-----------+----------
4525 | 5066
4525 | 5068
4525 | 5091
4526 | 5066
4526 | 5068
the expected result is
new_id | filter_id
-------+----------
1 | 5066
1 | 5068
1 | 5091
then the query needs to be changed to
SELECT product_id
, dense_rank() OVER (ORDER BY PRODUCT_ID) new_id
, filter_id
FROM table1 c
WHERE NOT EXISTS (SELECT b.product_id
FROM table1 a
LEFT JOIN table1 b ON a.product_id < b.product_id
WHERE b.product_id IS NOT NULL
AND b.product_id = c.product_id
GROUP BY a.product_id, b.product_id
HAVING COUNT(DISTINCT a.filter_id)
= COUNT(CASE WHEN a.filter_id = b.filter_id THEN 1
ELSE NULL
END)
OR COUNT(DISTINCT b.filter_id)
= COUNT(CASE WHEN a.filter_id = b.filter_id THEN 1
ELSE NULL
END));
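Likewise, the extended query can be tried on the smaller example data in SQLite from Python. This is a sketch; `table1` is the answer's table name and an ORDER BY is added for a stable result. Product 4526's filters are a subset of 4525's, so only 4525 is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (product_id INTEGER, filter_id INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    (4525, 5066), (4525, 5068), (4525, 5091),
    (4526, 5066), (4526, 5068),
])

# Remove product c if, compared with some lower-numbered product a, either
# all of a's filters are shared with c or all of c's filters are shared with a.
QUERY = """
    SELECT product_id,
           DENSE_RANK() OVER (ORDER BY product_id) AS new_id,
           filter_id
    FROM table1 c
    WHERE NOT EXISTS (
        SELECT b.product_id
        FROM table1 a
        LEFT JOIN table1 b ON a.product_id < b.product_id
        WHERE b.product_id IS NOT NULL
          AND b.product_id = c.product_id
        GROUP BY a.product_id, b.product_id
        HAVING COUNT(DISTINCT a.filter_id)
                 = COUNT(CASE WHEN a.filter_id = b.filter_id THEN 1 END)
            OR COUNT(DISTINCT b.filter_id)
                 = COUNT(CASE WHEN a.filter_id = b.filter_id THEN 1 END)
    )
    ORDER BY product_id, filter_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```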
I came up with a query quite similar to TechDo's second one, nine hours after him. Even if the result is similar, the idea is different: I concatenate the values of filter_id with math.
;WITH B AS (
SELECT Product_ID
, filter_id = filter_id - MIN(filter_id) OVER ()
, _ID = Row_Number() OVER (PARTITION BY Product_ID ORDER BY filter_id) - 1
-- number of digits of the largest reduced value (FLOOR + 1 also handles exact powers of 10)
, N = FLOOR(LOG10(MAX(filter_id) OVER ()
                  - MIN(filter_id) OVER ())) + 1
FROM table1 a
), G1 AS (
SELECT Product_ID
, _ID = SUM(Filter_ID * POWER(10, N * _ID))
FROM B
GROUP BY Product_ID
), G2 AS (
SELECT Product_ID = MIN(Product_ID)
FROM G1
GROUP BY _ID
)
SELECT g2.product_id
, dense_rank() OVER (ORDER BY g2.PRODUCT_ID) new_id
, a.filter_id
FROM G2
INNER JOIN table1 a ON g2.product_id = a.product_id;
The first CTE does most of the work:
filter_id is reduced by subtracting the minimum filter_id, so the reduced values range from 0 to (max - min)
an order number for each filter within its product is generated (_ID, starting at 0)
the maximum number of digits of a reduced filter_id is calculated (N)
In the following CTE those values are used to build the concatenation of the filters with SUM: the formula SUM(Filter_ID * POWER(10, N * _ID)) places a reduced filter_id every N digits. For example, with the data provided by the OP the maximum difference between filter_ids is 28, so N is 2, and the results are (the dots are added for readability):
Product_ID _ID
----------- -----------
4525 25.02.00
4526 28.02.00
4527 28.02.00
4528 12.05.00
The formula makes collisions between different filter groups impossible, but it needs a larger numeric range: if the range of filter_id values is big, the result can exceed the limit of the integer type.
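The encoding itself is plain arithmetic, so it can be reproduced in a few lines of Python to check the numbers in the table above. This sketch mirrors the CTEs (reduce each filter_id, number the filters within a product, sum the shifted values, keep the lowest product_id per key). Computing N as the digit count of (max - min) is this sketch's own choice. Python integers do not overflow, which sidesteps the integer limit mentioned above; a SQL implementation would still need a wide enough numeric type.

```python
from collections import defaultdict

# Sample data from the question: (product_id, filter_id)
data = [(4525, 5066), (4525, 5068), (4525, 5091),
        (4526, 5066), (4526, 5068), (4526, 5094),
        (4527, 5066), (4527, 5068), (4527, 5094),
        (4528, 5066), (4528, 5071), (4528, 5078)]

lo = min(f for _, f in data)
hi = max(f for _, f in data)
# N = number of digits needed for the largest reduced value (hi - lo).
n = len(str(hi - lo))

reduced = defaultdict(list)
for product_id, filter_id in data:
    reduced[product_id].append(filter_id - lo)

# key = sum(reduced * 10 ** (N * position)), positions ordered by filter_id,
# so each reduced value occupies its own block of N digits.
keys = {p: sum(f * 10 ** (n * i) for i, f in enumerate(sorted(fs)))
        for p, fs in reduced.items()}

# Keep the lowest product_id for each distinct key (the G2 step).
kept = sorted({min(p for p in keys if keys[p] == k) for k in keys.values()})

print(n, keys, kept)
```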