Types of joins and expected output - SQL

I have a table that has wholesale data and retail data.
The data is structured as follows:
Channel    Serial#  Date
WS-Build   12345    1/1/2019
WS-Dealer  34567    1/5/2021
Retail     12345    1/1/2020
Retail     34567    3/5/2021
I would like the output to match on Serial#. Each Serial# appears twice in the table, and I am trying to get a count of the number of units sold via builder or dealer.
Serial#  Channel    WholesaleDate  Retail Date
12345    WS-Build   1/1/2019       1/1/2020
34567    WS-Dealer  1/5/2021       3/5/2021
How can I achieve that by joining the table to itself?

Try joining by serial and channel:
select t1.serial#, t2.WholesaleDate, t2."Retail Date", count(*) from table1 t1
join table2 t2 on t1.serial# = t2.serial# and t1.channel = t2.channel
group by t1.serial#, t2.WholesaleDate, t2."Retail Date";

As long as the retail date is after the sale you can do the following (though I don't get where the counts come in):
SELECT
t1."Serial#",t1."Channel", t1."Date" as WholesaleDate, t2."Date" as "Retail Date"
FROM tab1 t1 JOIN tab1 t2 ON t1."Serial#" = t2."Serial#" AND t1."Date" < t2."Date"
Serial#  Channel    wholesaledate        Retail Date
12345    WS-Build   2019-01-01 00:00:00  2020-01-01 00:00:00
34567    WS-Dealer  2021-05-01 00:00:00  2021-05-03 00:00:00
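The question also asks for a count of units sold via builder vs dealer, which neither query above returns. A hedged sketch on top of the same self-join (assuming the single table is named tab1, as in the query above) would group by the wholesale channel:

-- Sketch only: count matched wholesale/retail pairs per wholesale channel.
-- t1 is the earlier (wholesale) row, t2 the later (retail) row.
SELECT t1."Channel", COUNT(*) AS units_sold
FROM tab1 t1
JOIN tab1 t2 ON t1."Serial#" = t2."Serial#" AND t1."Date" < t2."Date"
GROUP BY t1."Channel";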


In Postgresql, how do I use joins with multiple conditions including >= and <=

I have table A and table B. Each row in table A represents every time a user sends a message. Each row in table B represents every time a user buys a gift.
Goal: for each time a user sends a message, calculate how many gifts they've purchased within 7 days before the timestamp they sent the message. Some users never send messages and some never purchase gifts. If the user in table A didn't purchase a gift within those 7 days, the count should be 0.
Table A:
user_id  time
12345    2021-09-04 09:43:55
12345    2021-09-03 00:39:30
12345    2021-09-02 03:26:07
12345    2021-09-05 15:48:34
23456    2021-09-09 09:06:22
23456    2021-09-08 08:06:21
00001    2021-09-03 15:38:15
00002    2021-09-03 15:38:15
Table B:
user_id  time
12345    2021-09-01 09:43:55
12345    2021-08-03 00:42:30
12345    2021-09-03 02:16:07
00003    2021-09-05 15:48:34
23456    2021-09-03 09:06:22
23456    2021-09-10 08:06:21
Expected output:
user_id  time                 count
12345    2021-09-04 09:43:55  2
12345    2021-09-03 00:39:30  1
12345    2021-09-02 03:26:07  1
12345    2021-09-05 15:48:34  2
23456    2021-09-09 09:06:22  1
23456    2021-09-08 08:06:21  1
00001    2021-09-03 15:38:15  0
00002    2021-09-03 15:38:15  0
Query I tried:
SELECT A.user_id, A.time, coalesce(count(*), 0) as count
FROM A
LEFT JOIN B ON A.user_id = B.user_id AND B.time >= A.time - INTERVAL '7 days' AND B.time < A.time
GROUP BY 1,2
However, the count returned doesn't match the expected result; I'm not sure if I'm doing the join and conditions correctly.
You need to count values from the possibly-NULL side of the join, i.e. from table B, in order to get a correct count of 0 where there are no matches. Be more specific: change COUNT(*) to COUNT(b.column_from_b_table). See the modified query and its result below:
SELECT
A.user_id,
A.time,
coalesce(count(B.user_id), 0) as count
FROM A
LEFT JOIN B ON A.user_id = B.user_id AND
B.time >= A.time - INTERVAL '7 days' AND
B.time < A.time
GROUP BY 1,2;
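For intuition, here is a minimal, hypothetical illustration (not from the question's data) of why COUNT(*) and COUNT(column) differ on a NULL-extended row:

-- One all-NULL row, like the row a LEFT JOIN emits when nothing matches:
-- COUNT(*) counts the row itself (1), COUNT(x) skips the NULL (0).
SELECT COUNT(*) AS count_star, COUNT(x) AS count_col
FROM (VALUES (NULL::int)) AS t(x);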
user_id  time                      count
1        2021-09-03T15:38:15.000Z  0
12345    2021-09-05T15:48:34.000Z  2
23456    2021-09-08T08:06:21.000Z  1
12345    2021-09-04T09:43:55.000Z  2
12345    2021-09-03T00:39:30.000Z  1
23456    2021-09-09T09:06:22.000Z  1
2        2021-09-03T15:38:15.000Z  0
12345    2021-09-02T03:26:07.000Z  1
Let me know if this works for you.

Aggregate data based on fixed moving date window in Presto

I wanted to:
aggregate numbers in a "3-month" rolling window (e.g. Jan-Mar, Feb-Apr, Mar-May, ...)
then compare the same country & city against the same rolling window from last year
Table I already have: (unique at: country + city + month level)
country city month sum
US A 2019-03-01 3
US B 2019-03-01 4
DE C 2019-03-01 5
US A 2019-03-01 3
CN B 2019-03-01 4
US B 2019-04-01 4
UK C 2019-04-01 7
US C 2019-04-01 2
....
US A 2019-12-01 10
US B 2020-12-01 6
US C 2021-01-01 7
Step 1 ideal output:
country city period sum
US A 2019-03-01~2019-05-01 XXX
US A 2019-04-01~2019-06-01 YYY
UK A 2019-03-01~2019-05-01 ZZZ
...
UK A 2020-12-01~2021-02-01 BBB
Step 2 ideal output:
country city period sum last_year_sum year_over_year_%
US A 2019-03-01~2019-05-01 XXX 111 40%
US A 2019-04-01~2019-06-01 YYY 1111 30%
UK A 2019-03-01~2019-05-01 ZZZ 11111 20%
...
UK A 2020-12-01~2021-02-01 BBB 1111 15%
Ideally, I wanted to achieve this in Presto - any idea how to do that? Thanks!!
Unfortunately, Presto doesn't support the range window frame specification using dates. One method uses a join and aggregation, and then lag() to get last year's amount:
select t.country, t.city, t.month,
       sum(t2.sum) as this_year_sum,
       lag(sum(t2.sum), 12) over (partition by t.country, t.city order by t.month) as prev_year_sum,
       (-1 +
        sum(t2.sum) /
        lag(sum(t2.sum), 12) over (partition by t.country, t.city order by t.month)
       ) as yoy_increase
from t left join
     t t2
     on t2.country = t.country and
        t2.city = t.city and
        t2.month >= t.month and
        t2.month <= t.month + interval '2' month
group by t.country, t.city, t.month;
Note: This assumes that you have data for all months for each country/city combination.
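If months can be missing, one hedged workaround is to densify the data onto a full month spine before applying lag(); this is a sketch only, reusing the table t from above, and the date range is an assumption:

-- Build a complete country/city/month grid, then left join the real data,
-- so that lag(..., 12) reliably means "12 months earlier".
with months as (
  select m
  from unnest(sequence(date '2019-01-01', date '2021-01-01', interval '1' month)) as x(m)
),
spine as (
  select c.country, c.city, m.m as month
  from (select distinct country, city from t) c
  cross join months m
)
select s.country, s.city, s.month, coalesce(t.sum, 0) as sum
from spine s
left join t
  on t.country = s.country and t.city = s.city and t.month = s.month;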

subquery calculate days between dates

Sub query, SQL, Oracle
I'm new to subqueries and hoping to get some assistance. My thought was that the subquery would run first and then the outer query would execute based on the subquery's filter of trans_code = 'ABC'. The query works, but it pulls all dates from all transaction codes: trans_code 'ABC', 'DEF', etc.
The end goal is to calculate the number of days between dates.
The table structure is:
acct_num effective_date
1234 01/01/2020
1234 02/01/2020
1234 03/01/2020
1234 04/01/2021
I want to execute a query to look like this:
account Effective_Date Effective_Date_2 Days_Diff
1234 01/01/2020 02/01/2020 31
1234 02/01/2020 03/01/2020 29
1234 03/01/2020 04/01/2021 395
1234 04/01/2021 0
Query:
SELECT t3.acct_num,
t3.trans_code,
t3.effective_date,
MIN (t2.effective_date) AS effective_date2,
MIN (t2.effective_date) - t3.effective_date AS days_diff
FROM (SELECT t1.acct_num, t1.trans_code, t1.effective_date
FROM lawd.trans t1
WHERE t1.trans_code = 'ABC') t3
LEFT JOIN lawd.trans t2 ON t3.acct_num = t2.acct_num
WHERE t3.acct_num = '1234' AND t2.effective_date > t3.effective_date
GROUP BY t3.acct_num, t3.effective_date, t3.trans_code
ORDER BY t3.effective_date asc
TIA!
Use lead():
select t.*,
lead(effective_date) over (partition by acct_num order by effective_date) as next_effective_date,
lead(effective_date) over (partition by acct_num order by effective_date) - effective_date as diff
from lawd.trans t
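A hedged variant that also folds in the trans_code filter from the question and returns 0 for the last row, as in the desired output (the alias names are illustrative):

-- Filter to 'ABC' first, then compute the gap; COALESCE turns the last
-- row's NULL gap into 0. Oracle date subtraction yields days directly.
SELECT t.acct_num,
       t.effective_date,
       LEAD(t.effective_date) OVER (PARTITION BY t.acct_num ORDER BY t.effective_date) AS effective_date_2,
       COALESCE(LEAD(t.effective_date) OVER (PARTITION BY t.acct_num ORDER BY t.effective_date)
                - t.effective_date, 0) AS days_diff
FROM lawd.trans t
WHERE t.trans_code = 'ABC'
ORDER BY t.effective_date;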

Tracing original Value through Iteration SQL

Suppose there is a data collection system in which, whenever a record is altered, it is saved as a new record with a prefix (say M-[most recent number in the queue], which is unique).
Suppose I am given the following data set:
Customer | Original_Val
1 1020
2 1011
3 1001
I need to find the most recent value for each customer given the following table:
Customer | Most_Recent_Val | Pretained_To_Val | date
1 M-2000 M-1050 20170225
1 M-1050 M-1035 20170205
1 M-1035 1020 20170131
1 1020 NULL 20170101
2 M-1031 1011 20170105
2 1011 NULL 20161231
3 1001 NULL 20150101
My desired output would be:
Customer | Original_Val | Most_Recent_Val | date
1 1020 M-2000 20170225
2 1011 M-1031 20170105
3 1001 1001 20150101
For customer 1, there are 4 levels, i.e. (M-2000 <- M-1050 <- M-1035 <- 1020). Note that there would be no more than 10 levels of depth for each customer.
Much Appreciated! Thanks in advance.
Find the rows with the min and max date for each customer and then join them together. Something like this:
Select
    [min].Customer
    ,[min].Most_Recent_Val as Original_Val
    ,[max].Most_Recent_Val as Most_Recent_Val
    ,[max].date
From
(
    Select
        t1.Customer
        ,t1.Most_Recent_Val
        ,t1.date
    From
        table t1
        inner join (
            Select
                Customer
                ,MIN(date) as MIN_Date
            From
                table
            Group By
                Customer
        ) t2 ON t2.Customer = t1.Customer
            and t2.MIN_Date = t1.date
) [min]
inner join (
    Select
        t1.Customer
        ,t1.Most_Recent_Val
        ,t1.date
    From
        table t1
        inner join (
            Select
                Customer
                ,MAX(date) as MAX_Date
            From
                table
            Group By
                Customer
        ) t2 ON t2.Customer = t1.Customer
            and t2.MAX_Date = t1.date
) [max] ON [max].Customer = [min].Customer
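Since each Pretained_To_Val points at the previous record, another hedged option is a recursive CTE that walks the chain from each original record instead of relying on min/max dates; this is a sketch in standard SQL (drop the RECURSIVE keyword on SQL Server), assuming the table is named t:

-- Start from rows with no predecessor (the original values), follow the
-- Pretained_To_Val links, and keep the deepest row per customer.
with recursive chain as (
  select Customer, Most_Recent_Val as Original_Val, Most_Recent_Val, date
  from t
  where Pretained_To_Val is null
  union all
  select c.Customer, c.Original_Val, t.Most_Recent_Val, t.date
  from chain c
  join t on t.Customer = c.Customer
        and t.Pretained_To_Val = c.Most_Recent_Val
)
select Customer, Original_Val, Most_Recent_Val, date
from (select chain.*,
             row_number() over (partition by Customer order by date desc) as rn
      from chain
     ) x
where rn = 1;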

LEFT JOIN on multiple columns with unwanted duplicates

I have been running in circles with a query that is driving me nuts.
The background:
I have two tables, and unfortunately, both have duplicate records (we're dealing with activity logs, if that puts it into perspective). Each table comes from a different system, and I am trying to join the data together to get a pseudo-full picture (I realize that I won't get a perfect view because there is no "event key" shared between the two systems; I am attempting to match on a composite of metadata).
Here is what I am working with:
Table1
------------
JobID CustID Name ActionDate IsDuplicate
12345 11111 Ryan 1/1/2015 01:20:20 False
12345 11112 Bob 1/1/2015 02:10:20 False
12345 11111 Ryan 1/1/2015 04:15:35 True
12346 11113 Jim 1/1/2015 05:10:40 False
12346 11114 Jeb 1/1/2015 06:10:40 False
12346 11111 Ryan 1/1/2015 07:10:30 False
Table2
------------
ResponseID CustID ActionDate Browser
11123 10110 12/1/2014 23:32:15 IE
12345 11111 1/1/2015 03:20:20 IE
12345 11112 1/1/2015 05:10:20 Firefox
12345 11111 1/1/2015 06:15:35 Firefox
12346 11113 1/1/2015 07:10:40 Chrome
12346 11114 1/1/2015 08:10:40 Chrome
12346 11111 1/1/2015 10:10:30 Safari
12213 11123 2/1/2015 01:10:30 Chrome
Please note a few things:
- JobID and ResponseID are the same thing
- JobID and ResponseID are indicators of an event on the site (people are responding to an event)
- Action date does not match (system 2 has an inconsistent delay of about 2 hours, but never more than 3 hours)
- Note Table2 doesn't have a duplicate flag
- table 1 (~2,000 records) is significantly smaller than table 2 (~16,000 records)
- Note Cust 11111 is bopping around on browsers, taking the same action twice on job 12345 at different times and only taking action once on job 12346
What I am looking for:
Result (ideal)
------------
t1.JobID t1.CustID t1.Name t1.ActionDate t2.Browser
12345 11111 Ryan 1/1/2015 01:20:20 IE
12345 11112 Bob 1/1/2015 02:10:20 Firefox
12345 11111 Ryan 1/1/2015 04:15:35 Firefox
12346 11113 Jim 1/1/2015 05:10:40 Chrome
12346 11114 Jeb 1/1/2015 06:10:40 Chrome
12346 11111 Ryan 1/1/2015 07:10:30 Safari
Note that I JUST want matches for records in Table1; I am getting tons of duplicates because of the join, which is frustrating.
Here is what I have so far (which I can humbly say isn't really close):
SELECT
t1.JobID,
t1.CustID,
t1.Name,
t1.ActionDate,
t2.Browser
FROM
Table1 t1
LEFT OUTER JOIN
Table2 t2
ON
t1.JobID=t2.ResponseID AND
t1.CustID=t2.CustID AND
DATEPART(dd,t1.ActionDate)=DATEPART(dd,t2.ActionDate)
Try changing the join condition for the date to check that t2.actiondate fulfills the condition t1.actiondate <= t2.actiondate <= t1.actiondate + 3 hours
SELECT
t1.JobID, t1.CustID, t1.Name, t1.ActionDate, t2.Browser
FROM
Table1 t1
LEFT JOIN Table2 t2
ON t1.JobID = t2.ResponseID
AND t1.CustID = t2.CustID
AND t2.ActionDate >= t1.ActionDate
AND t2.ActionDate <= DATEADD(hour, 3, t1.ActionDate)
ORDER BY t1.JobID , t1.ActionDate;
With your sample data the result of this query matches your desired result.
One method is to enumerate each table using row_number() and match on the sequence numbers as well:
select t1.JobID, t1.CustID, t1.Name, t1.ActionDate, t2.Browser
from (select t1.*,
             row_number() over (partition by JobId, CustId order by ActionDate) as seqnum
      from Table1 t1
     ) t1 join
     (select t2.*,
             row_number() over (partition by ResponseId, CustId order by ActionDate) as seqnum
      from Table2 t2
     ) t2
on t1.JobId = t2.ResponseId and
t1.CustId = t2.CustId and
t1.seqnum = t2.seqnum;
This works for your sample data. However, if there is not a response for every job, then the alignment might get out of whack. If that is a possibility, then date arithmetic might be the better solution.
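If duplicates can still appear even with the 3-hour window (two Table2 rows inside one window), a hedged combination of the two answers is to rank the window matches and keep only the earliest one per Table1 row:

-- Window join from the first answer, plus ROW_NUMBER to keep a single
-- (earliest) Table2 match per Table1 record.
SELECT JobID, CustID, Name, ActionDate, Browser
FROM (SELECT t1.JobID, t1.CustID, t1.Name, t1.ActionDate, t2.Browser,
             ROW_NUMBER() OVER (PARTITION BY t1.JobID, t1.CustID, t1.ActionDate
                                ORDER BY t2.ActionDate) AS rn
      FROM Table1 t1
      LEFT JOIN Table2 t2
        ON t1.JobID = t2.ResponseID
       AND t1.CustID = t2.CustID
       AND t2.ActionDate >= t1.ActionDate
       AND t2.ActionDate <= DATEADD(hour, 3, t1.ActionDate)
     ) x
WHERE rn = 1;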