SQL: automatically fill price between dates

I'm trying to write a view from two tables. The first is a reference table that contains product IDs and weeks:
+------------+------+
| Product_id | week | t1
+------------+------+
| 1 | 1 |
| 2 | 1 |
| 1 | 2 |
| 2 | 2 |
| 1 | 3 |
| 2 | 3 |
+------------+------+ etc...
The other one contains product IDs, the weeks in which a product's price changed, and the new price:
+------------+------+-------+
| Product_id | week | price | t2
+------------+------+-------+
| 1 | 1 | 70 |
| 1 | 2 | 50 |
| 2 | 2 | 70 |
| 1 | 4 | 30 |
| 2 | 4 | 40 |
+------------+------+-------+
I know how to get this easily by joining the two tables:
+------------+------+-------+
| Product_id | week | price |
+------------+------+-------+
| 1 | 1 | 70 |
| 1 | 2 | 50 |
| 1 | 3 | |
| 1 | 4 | 30 |
| 1 | 5 | |
| 2 | 1 | |
| 2 | 2 | 70 |
| 2 | 3 | |
| 2 | 4 | 40 |
| 2 | 5 | |
+------------+------+-------+
But my goal is rather to fill in the gaps and have a price for each week (without creating any new table), like this:
+------------+------+-------+
| Product_id | week | price |
+------------+------+-------+
| 1 | 1 | 70 |
| 1 | 2 | 50 |
| 1 | 3 | 50 |
| 1 | 4 | 30 |
| 1 | 5 | 30 |
| 2 | 1 | |
| 2 | 2 | 70 |
| 2 | 3 | 70 |
| 2 | 4 | 40 |
| 2 | 5 | 40 |
+------------+------+-------+ (product 2 isn't sold yet at week 1, so it doesn't have a price).
I can't see how I would do this in SQL. I haven't used PARTITION BY or LAG yet, and it might be what I'm looking for. If anyone can push me in the right direction, I would appreciate it :)

You can use window functions. The ignore nulls clause, which Teradata supports, comes in handy here:
select
t1.product_id,
t1.week,
coalesce(
t2.price,
lag(t2.price ignore nulls) over(partition by t1.product_id order by t1.week)
) price
from t1
left join t2
on t2.product_id = t1.product_id
and t2.week = t1.week
Or better yet, as suggested by dnoeth, you can use last_value(), which avoids the need for coalesce():
select
t1.product_id,
t1.week,
last_value(t2.price ignore nulls) over(partition by t1.product_id order by t1.week) price
from t1
left join t2
on t2.product_id = t1.product_id
and t2.week = t1.week
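Not every engine accepts ignore nulls (SQLite does not, for example). A minimal sketch of a common workaround, assuming SQLite 3.25 or later driven from Python's sqlite3 module: a running count of non-null prices labels each "price period", and the maximum over that period carries the price forward. The function name fill_forward_prices and the grp column are made up for illustration, and t1 is assumed to hold weeks 1 to 5 for both products.

```python
import sqlite3

def fill_forward_prices():
    """Forward-fill t2.price over t1 without IGNORE NULLS (illustrative sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (product_id INTEGER, week INTEGER);
        CREATE TABLE t2 (product_id INTEGER, week INTEGER, price INTEGER);
        -- assumption: the reference table covers weeks 1..5 for both products
        INSERT INTO t1 VALUES (1,1),(2,1),(1,2),(2,2),(1,3),(2,3),(1,4),(2,4),(1,5),(2,5);
        INSERT INTO t2 VALUES (1,1,70),(1,2,50),(2,2,70),(1,4,30),(2,4,40);
    """)
    rows = conn.execute("""
        WITH joined AS (
            SELECT t1.product_id, t1.week, t2.price,
                   -- COUNT skips NULLs, so grp only increases on weeks with a price change
                   COUNT(t2.price) OVER (PARTITION BY t1.product_id
                                         ORDER BY t1.week) AS grp
            FROM t1
            LEFT JOIN t2
              ON t2.product_id = t1.product_id
             AND t2.week = t1.week
        )
        SELECT product_id, week,
               -- each (product_id, grp) group holds at most one non-null price
               MAX(price) OVER (PARTITION BY product_id, grp) AS price
        FROM joined
        ORDER BY product_id, week
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in fill_forward_prices():
        print(row)
```

Weeks before a product's first price change land in group 0, where the maximum of no values is NULL, which matches the empty price for product 2 in week 1.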

Use a cross join to generate the rows, then left join and window functions:
with weeks as (
select row_number() over (order by product_id) as n
from table1
)
select p.product_id, w.n as week,
       coalesce(t2.price, lag(t2.price ignore nulls) over (partition by p.product_id order by w.n)
               ) as price
from (select distinct product_id
      from table1 t1
     ) p cross join
     weeks w left join
     table2 t2
     on t2.product_id = p.product_id and t2.week = w.n
where w.n <= 5

You can do this with a LEFT JOIN.
SELECT t1.Product_id, t1.week, tmp.price
FROM t1
LEFT JOIN t2 tmp ON tmp.Product_id = t1.Product_id AND
tmp.week = (SELECT MAX(week) FROM t2
WHERE Product_id = tmp.Product_id AND week <= t1.week)
ORDER BY t1.Product_id, t1.week
I would argue it's cleaner still with OUTER APPLY, but I don't know if that's supported by Teradata.
SELECT t1.Product_id, t1.week, oa.price
FROM t1
OUTER APPLY (SELECT TOP 1 price FROM t2
WHERE Product_id = t1.Product_id AND week <= t1.week
ORDER BY week DESC) oa
ORDER BY t1.Product_id, t1.week

Related

How to join a grouped table in sql?

Novice in SQL here, but hopefully someone can help. I have two tables. For simplicity, here is how the tables are structured.
Table 1:
+------------+-------+-----------+------------+
| department | sales | date | sales_code |
+------------+-------+-----------+------------+
| 1 | 50 | 5/26/2021 | A |
+------------+-------+-----------+------------+
| 2 | 150 | 5/26/2021 | B |
+------------+-------+-----------+------------+
| 1 | 200 | 5/25/2021 | C |
+------------+-------+-----------+------------+
| 2 | 250 | 5/24/2021 | D |
+------------+-------+-----------+------------+
Table 2:
+------+------------+-------+-----------+-----------------------+
| item | department | sales | date | column I want to join |
+------+------------+-------+-----------+-----------------------+
| 31 | 1 | 50 | 5/26/2021 | x |
+------+------------+-------+-----------+-----------------------+
| 30 | 2 | 150 | 5/26/2021 | x |
+------+------------+-------+-----------+-----------------------+
| 29 | 1 | 200 | 5/25/2021 | x |
+------+------------+-------+-----------+-----------------------+
| 28 | 2 | 250 | 5/24/2021 | x |
+------+------------+-------+-----------+-----------------------+
I need to join table 2 to table 1. However, it needs to be aggregated by department sales first, because table 2 is already aggregated by department sales. Here is what I was thinking, but I cannot get it to work:
SELECT t1.*, t2.*
FROM table1 as t1
JOIN (
SELECT department, date, column_i_want, sum(sales)
FROM table2
GROUP BY department ) as t2
ON t2.department = t1.department AND t1.date = t2.date
Desired Output:
+------------+-------+-----------+------------+-----------------------+
| department | sales | date | sales_code | column I want to join |
+------------+-------+-----------+------------+-----------------------+
| 1 | 50 | 5/26/2021 | A | x |
+------------+-------+-----------+------------+-----------------------+
| 2 | 150 | 5/26/2021 | B | x |
+------------+-------+-----------+------------+-----------------------+
| 1 | 200 | 5/25/2021 | C | x |
+------------+-------+-----------+------------+-----------------------+
| 2 | 250 | 5/24/2021 | D | x |
+------------+-------+-----------+------------+-----------------------+
Any help would be appreciated.
There are several ways to do that. The easiest is to create a view (every non-aggregated column has to appear in the GROUP BY, and the sum needs an alias):
CREATE VIEW t2 AS
SELECT department, date, column_i_want, sum(sales) AS total_sales
FROM table2
GROUP BY department, date, column_i_want;
Then it's easier to join them (you can also use a WITH clause instead of a view, but it can get messy):
SELECT *
FROM table1 NATURAL JOIN t2
Here is what you want:
select t2.*, t1.sales_code
from table2 t2
join table1 t1
on t1.department = t2.department
and t1.date = t2.date
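To try the aggregate-then-join idea from the question on the sample rows, here is a small sketch using SQLite through Python's sqlite3 module. The choice of SQLite, the ISO date strings, the total_sales alias and the function name join_aggregated are assumptions made for the example; column_i_want stands in for the real column name.

```python
import sqlite3

def join_aggregated():
    """Aggregate table2 per department and date, then join it to table1 (sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (department INTEGER, sales INTEGER, date TEXT, sales_code TEXT);
        CREATE TABLE table2 (item INTEGER, department INTEGER, sales INTEGER,
                             date TEXT, column_i_want TEXT);
        INSERT INTO table1 VALUES
            (1, 50, '2021-05-26', 'A'), (2, 150, '2021-05-26', 'B'),
            (1, 200, '2021-05-25', 'C'), (2, 250, '2021-05-24', 'D');
        INSERT INTO table2 VALUES
            (31, 1, 50, '2021-05-26', 'x'), (30, 2, 150, '2021-05-26', 'x'),
            (29, 1, 200, '2021-05-25', 'x'), (28, 2, 250, '2021-05-24', 'x');
    """)
    rows = conn.execute("""
        SELECT t1.department, t1.sales, t1.date, t1.sales_code, agg.column_i_want
        FROM table1 AS t1
        JOIN (
            -- every selected non-aggregate column is listed in GROUP BY
            SELECT department, date, column_i_want, SUM(sales) AS total_sales
            FROM table2
            GROUP BY department, date, column_i_want
        ) AS agg
          ON agg.department = t1.department
         AND agg.date = t1.date
        ORDER BY t1.sales_code
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in join_aggregated():
        print(row)
```

Grouping on the same columns that the join uses keeps the subquery at one row per department and date, so the join cannot multiply the rows of table1.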

SQL conditional on another table

I am pretty new to this so I hope this is clear. How do I get the data from one table, based on 1 month before the month in another table? Example with 2 tables. Using SQL.
First Table_A: Members, amount spent each month. Primary key composite of Name+Month
+---------+--------+--------+
| Name | Month | Amount |
+---------+--------+--------+
| James | 202001 | 10 |
| James | 202002 | 5 |
| James | 202003 | 8 |
| Michael | 202001 | 3 |
| Michael | 202002 | 4 |
| Michael | 202003 | 5 |
| Michael | 202004 | 6 |
| Tom     | 202001 | 12 |
| Tom     | 202002 | 10 |
| Tom     | 202003 | 7 |
| Tom     | 202004 | 2 |
+---------+--------+--------+
Second Table_B: Members and month unsubscribed. Primary key is Name.
+--------+--------+
| Name | Month |
+--------+--------+
| James | 202003 |
| Tom    | 202004 |
+--------+--------+
Final output_table: Members and amount 1 month before unsubscribing + current members latest month amount and their status. Primary key should be Name.
+---------+--------+--------+--------------+
| Name | Month | Amount | Status |
+---------+--------+--------+--------------+
| James | 202002 | 5 | Unsubscribed |
| Tom | 202003 | 7 | Unsubscribed |
| Michael | 202004 | 6 | Subscribed |
+---------+--------+--------+--------------+
This is a bit tricky. I think this does the trick:
select t1.*,
       (case when t2.name is null then 'Subscribed' else 'Unsubscribed' end) as status
from (select t1.*, row_number() over (partition by name order by month desc) as seqnum
      from table1 t1
     ) t1 left join
     table2 t2
     on t1.name = t2.name
where (t2.name is not null and seqnum = 2) or
      (t2.name is null and seqnum = 1);
CTE approach:
with unsubscribed as (
    select a.name, a.month, a.amount
    from table1 a
    inner join table2 b
        on lower(trim(a.name)) = lower(trim(b.name))
        and a.month = b.month
),
subscribed as (
    select *,
           case when name not in (select name from unsubscribed) then 'subscribed'
                else 'unsubscribed' end as status
    from table1
)
select name, month, amount, status
from (
    select *, row_number() over (partition by name order by month desc) as rn
    from subscribed
    where (name, month) not in (select name, month from unsubscribed)
) ranked
where rn = 1
order by month
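Another way to read the requirement is "the latest row strictly before the unsubscribe month, or the latest row overall for current members". A minimal sketch of that reading, assuming SQLite 3.25 or later via Python's sqlite3 module; the table names table_a and table_b follow the question, while the function name latest_before_unsubscribe and the unsub_month column are made up for the example. Note that it picks the latest earlier month that exists in the data, which is only "one month before" when no months are missing.

```python
import sqlite3

def latest_before_unsubscribe():
    """One row per member: last month before unsubscribing, else last month (sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (name TEXT, month INTEGER, amount INTEGER,
                              PRIMARY KEY (name, month));
        CREATE TABLE table_b (name TEXT PRIMARY KEY, month INTEGER);
        INSERT INTO table_a VALUES
            ('James', 202001, 10), ('James', 202002, 5), ('James', 202003, 8),
            ('Michael', 202001, 3), ('Michael', 202002, 4),
            ('Michael', 202003, 5), ('Michael', 202004, 6),
            ('Tom', 202001, 12), ('Tom', 202002, 10),
            ('Tom', 202003, 7), ('Tom', 202004, 2);
        INSERT INTO table_b VALUES ('James', 202003), ('Tom', 202004);
    """)
    rows = conn.execute("""
        WITH ranked AS (
            SELECT a.name, a.month, a.amount, b.month AS unsub_month,
                   ROW_NUMBER() OVER (PARTITION BY a.name
                                      ORDER BY a.month DESC) AS rn
            FROM table_a AS a
            LEFT JOIN table_b AS b ON b.name = a.name
            -- keep all rows of current members, and only earlier rows of leavers
            WHERE b.month IS NULL OR a.month < b.month
        )
        SELECT name, month, amount,
               CASE WHEN unsub_month IS NULL THEN 'Subscribed'
                    ELSE 'Unsubscribed' END AS status
        FROM ranked
        WHERE rn = 1
        ORDER BY name
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in latest_before_unsubscribe():
        print(row)
```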

Hive window functions: last value of previous partition

Using Hive window functions, I would like to get the last value of the previous partition:
| name | rank | type |
| one | 1 | T1 |
| two | 2 | T2 |
| thr | 3 | T2 |
| fou | 4 | T1 |
| fiv | 5 | T2 |
| six | 6 | T2 |
| sev | 7 | T2 |
Following query:
SELECT
name,
rank,
first_value(rank) over (partition by type order by rank) as new_rank
FROM my_table
Would give:
| name | rank | type | new_rank |
| one | 1 | T1 | 1 |
| two | 2 | T2 | 2 |
| thr | 3 | T2 | 2 |
| fou | 4 | T1 | 4 |
| fiv | 5 | T2 | 5 |
| six | 6 | T2 | 5 |
| sev | 7 | T2 | 5 |
But what I need is "the last value of the previous partition":
| name | rank | type | new_rank |
| one | 1 | T1 | NULL |
| two | 2 | T2 | 1 |
| thr | 3 | T2 | 1 |
| fou | 4 | T1 | 3 |
| fiv | 5 | T2 | 4 |
| six | 6 | T2 | 4 |
| sev | 7 | T2 | 4 |
This seems quite tricky. This is a variant of gaps-and-islands. Here is the idea:
Identify the "islands" where type is the same (using difference of row numbers).
Then use lag() to introduce the previous rank into the island.
Do a min scan to get the new rank that you want.
So:
with gi as (
select t.*,
(seqnum - seqnum_t) as grp
from (select t.*,
row_number() over (partition by type order by rank) as seqnum_t,
row_number() over (order by rank) as seqnum
from t
) t
),
gi2 as (
select gi.*, lag(rank) over (order by gi.rank) as prev_rank
from gi
)
select gi2.*,
min(prev_rank) over (partition by type, grp) as new_rank
from gi2
order by rank;
Here is a SQL Fiddle (albeit using Postgres).
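For a quick local check of the three steps (islands, lag, min), here is a sketch that runs the same logic on the question's sample rows. It assumes SQLite 3.25 or later through Python's sqlite3 module rather than Hive, and it renames the columns rank and type to rnk and typ purely to stay clear of keyword clashes; the function name previous_island_last_rank is also made up.

```python
import sqlite3

def previous_island_last_rank():
    """Last rank of the previous island of equal type, per row (sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- rnk / typ are renamed stand-ins for the question's rank / type columns
        CREATE TABLE my_table (name TEXT, rnk INTEGER, typ TEXT);
        INSERT INTO my_table VALUES
            ('one', 1, 'T1'), ('two', 2, 'T2'), ('thr', 3, 'T2'), ('fou', 4, 'T1'),
            ('fiv', 5, 'T2'), ('six', 6, 'T2'), ('sev', 7, 'T2');
    """)
    rows = conn.execute("""
        WITH numbered AS (
            SELECT name, rnk, typ,
                   -- step 1: the difference of row numbers is constant within an island
                   ROW_NUMBER() OVER (ORDER BY rnk)
                 - ROW_NUMBER() OVER (PARTITION BY typ ORDER BY rnk) AS grp
            FROM my_table
        ),
        with_prev AS (
            -- step 2: bring the previous row's rank onto every row
            SELECT numbered.*, LAG(rnk) OVER (ORDER BY rnk) AS prev_rnk
            FROM numbered
        )
        SELECT name, rnk, typ,
               -- step 3: the smallest prev_rnk in an island is the rank just before it
               MIN(prev_rnk) OVER (PARTITION BY typ, grp) AS new_rank
        FROM with_prev
        ORDER BY rnk
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in previous_island_last_rank():
        print(row)
```

The first row has no predecessor, so its prev_rnk is NULL and MIN over a single NULL stays NULL, which gives the NULL in the desired output.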

Fill table with last result

I have a table that contains the following information:
id | amount | date | customer_id
1 | 0.00 | 11/12/17 | 1
2 | 54.00 | 11/12/17 | 1
3 | 60.00 | 02/12/18 | 1
4 | 0.00 | 01/18/17 | 2
5 | 14.00 | 03/12/17 | 2
6 | 24.00 | 02/22/18 | 2
7 | 0.00 | 09/12/16 | 3
8 | 74.00 | 10/01/17 | 3
What I need it to look like is the following:
ranked_id | id | amount | date | customer_id
1 | 1 | 0.00 | 11/12/17 | 1
2 | 2 | 54.00 | 11/12/17 | 1
3 | 3 | 60.00 | 02/12/18 | 1
4 | 3 | 60.00 | 02/12/18 | 1
5 | 3 | 60.00 | 02/12/18 | 1
6 | 3 | 60.00 | 02/12/18 | 1
7 | 3 | 60.00 | 02/12/18 | 1
8 | 4 | 0.00 | 01/18/17 | 2
9 | 5 | 14.00 | 03/12/17 | 2
10 | 6 | 24.00 | 02/22/18 | 2
11 | 6 | 24.00 | 02/22/18 | 2
12 | 6 | 24.00 | 02/22/18 | 2
13 | 6 | 24.00 | 02/22/18 | 2
14 | 6 | 24.00 | 02/22/18 | 2
15 | 7 | 0.00 | 09/12/16 | 3
16 | 8 | 74.00 | 10/01/17 | 3
17 | 8 | 74.00 | 10/01/17 | 3
18 | 8 | 74.00 | 10/01/17 | 3
19 | 8 | 74.00 | 10/01/17 | 3
20 | 8 | 74.00 | 10/01/17 | 3
21 | 8 | 74.00 | 10/01/17 | 3
I know that there's something with partitioning and ranking (on the ranked_id), but I can't figure out how to repeat the last row 7 times.
As @Gordon Linoff suggested, you can use the generate_series() function crossed with the distinct customer_ids to generate all the rows needed, as in t1 below. Then in t2 (also below) the row_number function is used to generate a sequential value per customer_id, which t1 is outer joined to along with the customer_id.
From there it's just a matter of getting at the last value per customer_id when there is no original data to join to, which is where the case expressions and the analytic first_value function come in. I couldn't get the last_value analytic function to work, likely due to PostgreSQL's lack of an ignore nulls directive, so I used first_value with a descending sort order, and only return the analytic value when no other data exists.
with t1 as (
select distinct
dense_rank() over (order by customer_id, generate_series) ranked_id
, customer_id
, generate_series
from table1
cross join generate_series(1,7)
), t2 as (
select row_number() over (partition by customer_id order by id) rn
, table1.*
from table1
)
select t1.ranked_id
, case when t2.customer_id is not null
then t2.id
else first_value(t2.id)
over (partition by t1.customer_id
order by id desc nulls last)
end id
, case when t2.customer_id is not null
then t2.amount
else first_value(t2.amount)
over (partition by t1.customer_id
order by id desc nulls last)
end amount
, case when t2.customer_id is not null
then t2.date
else first_value(t2.date)
over (partition by t1.customer_id
order by id desc nulls last)
end date
, t1.customer_id
from t1
left join t2
on t2.customer_id = t1.customer_id
and t2.rn = t1.generate_series
order by ranked_id;
Here's a SQL Fiddle demonstrating the code.
In Postgres, you can use generate_series() and a cross join to generate all the rows. Then you can pick the one you want:
select row_number() over (order by c.customer_id, g.i) as ranking_id,
       coalesce(t.id, c.id) as id, coalesce(t.amount, c.amount) as amount,
       coalesce(t.date, c.date) as date, c.customer_id
from (select distinct on (customer_id) t.*
      from t
      order by customer_id, date desc, id desc
     ) c cross join
     generate_series(1, 7) g(i) left join
     (select t.*, row_number() over (partition by customer_id order by date, id) as i
      from t
     ) t
     on t.customer_id = c.customer_id and t.i = g.i;
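Where generate_series() is not available, the same padding can be sketched with a recursive CTE. The example below is an illustration only, assuming SQLite 3.25 or later via Python's sqlite3 module; the names pad_to_seven, slots, rn and cnt are invented, dates are stored as ISO strings, and rows are ordered by id within each customer. Each of the 7 slots is matched to the row whose position equals the slot number, capped at the customer's row count, so slots past the end repeat the last row.

```python
import sqlite3

def pad_to_seven():
    """Repeat each customer's last row until every customer has 7 rows (sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t (id INTEGER, amount REAL, date TEXT, customer_id INTEGER);
        INSERT INTO t VALUES
            (1, 0.00, '2017-11-12', 1), (2, 54.00, '2017-11-12', 1),
            (3, 60.00, '2018-02-12', 1), (4, 0.00, '2017-01-18', 2),
            (5, 14.00, '2017-03-12', 2), (6, 24.00, '2018-02-22', 2),
            (7, 0.00, '2016-09-12', 3), (8, 74.00, '2017-10-01', 3);
    """)
    rows = conn.execute("""
        WITH RECURSIVE slots(i) AS (
            -- stand-in for generate_series(1, 7)
            SELECT 1 UNION ALL SELECT i + 1 FROM slots WHERE i < 7
        ),
        numbered AS (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY id) AS rn,
                   COUNT(*)     OVER (PARTITION BY customer_id)             AS cnt
            FROM t
        )
        SELECT ROW_NUMBER() OVER (ORDER BY n.customer_id, s.i) AS ranked_id,
               n.id, n.amount, n.date, n.customer_id
        FROM slots AS s
        -- two-argument MIN is SQLite's scalar minimum: slots beyond cnt reuse row cnt
        JOIN numbered AS n ON n.rn = MIN(s.i, n.cnt)
        ORDER BY ranked_id
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in pad_to_seven():
        print(row)
```

In PostgreSQL the scalar minimum is spelled LEAST(s.i, n.cnt) instead of the two-argument MIN used here.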

SQL Query to get results that match between three tables, or a single result for no match

Is there a way to use a where clause to check if there were zero matches between tables for a record from the first table, and produce one row of results reflecting that?
I'm trying to get results that look like this:
+----------+----------+-----------+----------+-------------+
| Results |
+----------+----------+-----------+----------+-------------+
| Date | Queue ID | From Date | To Date | Campaign ID |
| 3/1/2014 | 1 | 2/24/2014 | 3/2/2014 | 1 |
| 3/1/2014 | 2 | (NULL) | (NULL) | (NULL) |
+----------+----------+-----------+----------+-------------+
From a combination of tables that look like this:
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| Table 1 | | Table 2 | | Table 3 |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| Date | Queue | | Queue | SP | | SP | From Date | To Date | Campaign |
| | ID | | ID | ID | | ID | | | ID |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| 3/1/2014 | 1 | | 1 | 1 | | 1 | 2/24/2014 | 3/2/2014 | 1 |
| 3/1/2014 | 2 | | 1 | 2 | | 2 | 3/3/2014 | 3/9/2014 | 5 |
| | | | 1 | 3 | | 3 | 3/10/2014 | 3/16/2014 | 1 |
| | | | 1 | 4 | | 4 | 3/17/2014 | 3/23/2014 | 1 |
| | | | 1 | 5 | | 5 | 3/24/2014 | 3/30/2014 | 4 |
| | | | 2 | 6 | | 6 | 3/3/2014 | 3/9/2014 | 5 |
| | | | 2 | 7 | | 7 | 3/10/2014 | 3/16/2014 | 5 |
| | | | 2 | 8 | | 8 | 3/17/2014 | 3/23/2014 | 5 |
| | | | 2 | 9 | | 9 | 3/24/2014 | 3/30/2014 | 5 |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
I'm joining Table 1 to Table 2 on QUEUE ID,
and Table 2 to Table 3 on SP ID,
and DATE from Table 1 should fall between Table 3's FROM DATE and TO DATE.
I want a single record returned for each queue, including if there were no date matches.
Unfortunately any combinations of joins or where clauses I've tried so far only result in either one record for Queue ID 1 or multiple records for each Queue ID.
I would suggest this:
SELECT
t1.Date,
t1.QueueID,
s.FromDate,
s.ToDate,
s.CampaignID
FROM
Table1 t1
LEFT JOIN
(
SELECT
t2.QueueID,
t3.FromDate,
t3.ToDate,
t3.CampaignID
FROM
Table2 t2
INNER JOIN
Table3 t3 ON
t2.SPID = t3.SPID
) s ON
t1.QueueID = s.QueueID AND
t1.Date BETWEEN s.FromDate AND s.ToDate
SQL Fiddle here with an abbreviated dataset
A trivial amendment to AHiggins's code. Using a CTE perhaps makes it a little easier to read.
With AllDates as
(
SELECT
t2.QueueID,
t3.FromDate,
t3.ToDate,
t3.CampaignID
FROM Table2 t2
INNER JOIN Table3 t3 ON
t2.SPID = t3.SPID
)
SELECT
t1.Date,
t1.QueueID,
s.FromDate,
s.ToDate,
s.CampaignID
FROM Table1 t1
LEFT JOIN AllDates s ON
t1.QueueID = s.QueueID AND
t1.Date BETWEEN s.FromDate AND s.ToDate
You want something like:
select distinct t1.date, t1.queue_id, IFNULL(t3.from_date,'NULL'),
IFNULL(t3.to_date,'NULL'), IFNULL(t3.campaign,'NULL')
FROM table1 t1
LEFT OUTER JOIN table2 t2 on t1.queue_id = t2.queue_id
left outer join table3 t3 on t2.sp_id = t3.sp_id
where t3.from_date <= t1.date
AND t3.to_date >= t1.date
This will select distinct records from the table (eliminating null duplicates and replacing them with NULL)
SELECT t1.[Date], t1.[Queue ID], s.[From Date], s.[To Date], s.[Campaign ID]
FROM table1 t1
LEFT JOIN (SELECT t3.*, t2.[Queue ID] FROM table3 t3 JOIN table2 t2 ON t2.[SP ID] = t3.[SP ID]) s
ON s.[Queue ID] = t1.[Queue ID] AND t1.[Date] BETWEEN s.[From Date] AND s.[To Date]
SQL Fiddle
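The placement of the date test matters here, and it is easy to check on the question's data. The sketch below assumes SQLite through Python's sqlite3 module, snake_case column names and ISO date strings (all choices made for the example, as are the function names). It runs the join twice: once with the date range in the ON clause, and once with it in a WHERE clause, which discards the NULL-extended row for queue 2.

```python
import sqlite3

SUBQUERY = """
    SELECT t2.queue_id, t3.from_date, t3.to_date, t3.campaign_id
    FROM table2 AS t2
    JOIN table3 AS t3 ON t3.sp_id = t2.sp_id
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (date TEXT, queue_id INTEGER);
        CREATE TABLE table2 (queue_id INTEGER, sp_id INTEGER);
        CREATE TABLE table3 (sp_id INTEGER, from_date TEXT, to_date TEXT,
                             campaign_id INTEGER);
        INSERT INTO table1 VALUES ('2014-03-01', 1), ('2014-03-01', 2);
        INSERT INTO table2 VALUES (1,1),(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(2,9);
        INSERT INTO table3 VALUES
            (1, '2014-02-24', '2014-03-02', 1), (2, '2014-03-03', '2014-03-09', 5),
            (3, '2014-03-10', '2014-03-16', 1), (4, '2014-03-17', '2014-03-23', 1),
            (5, '2014-03-24', '2014-03-30', 4), (6, '2014-03-03', '2014-03-09', 5),
            (7, '2014-03-10', '2014-03-16', 5), (8, '2014-03-17', '2014-03-23', 5),
            (9, '2014-03-24', '2014-03-30', 5);
    """)
    return conn

def date_test_in_on(conn):
    # the date range is part of the join condition: unmatched queues keep a NULL row
    return conn.execute(f"""
        SELECT t1.date, t1.queue_id, s.from_date, s.to_date, s.campaign_id
        FROM table1 AS t1
        LEFT JOIN ({SUBQUERY}) AS s
          ON s.queue_id = t1.queue_id
         AND t1.date BETWEEN s.from_date AND s.to_date
        ORDER BY t1.queue_id
    """).fetchall()

def date_test_in_where(conn):
    # the date range filters after the join: queues with no match disappear
    return conn.execute(f"""
        SELECT t1.date, t1.queue_id, s.from_date, s.to_date, s.campaign_id
        FROM table1 AS t1
        LEFT JOIN ({SUBQUERY}) AS s
          ON s.queue_id = t1.queue_id
        WHERE t1.date BETWEEN s.from_date AND s.to_date
        ORDER BY t1.queue_id
    """).fetchall()

if __name__ == "__main__":
    conn = build_db()
    print(date_test_in_on(conn))
    print(date_test_in_where(conn))
```

With the sample rows, only the ON version returns a row for queue 2, which is the single NULL-filled record the question asks for.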