Fill table with last result - sql

I have a table that contains the following information:
id | amount | date | customer_id
1 | 0.00 | 11/12/17 | 1
2 | 54.00 | 11/12/17 | 1
3 | 60.00 | 02/12/18 | 1
4 | 0.00 | 01/18/17 | 2
5 | 14.00 | 03/12/17 | 2
6 | 24.00 | 02/22/18 | 2
7 | 0.00 | 09/12/16 | 3
8 | 74.00 | 10/01/17 | 3
What I need it to look like is the following:
ranked_id | id | amount | date | customer_id
1 | 1 | 0.00 | 11/12/17 | 1
2 | 2 | 54.00 | 11/12/17 | 1
3 | 3 | 60.00 | 02/12/18 | 1
4 | 3 | 60.00 | 02/12/18 | 1
5 | 3 | 60.00 | 02/12/18 | 1
6 | 3 | 60.00 | 02/12/18 | 1
7 | 3 | 60.00 | 02/12/18 | 1
8 | 4 | 0.00 | 01/18/17 | 2
9 | 5 | 14.00 | 03/12/17 | 2
10 | 6 | 24.00 | 02/22/18 | 2
11 | 6 | 24.00 | 02/22/18 | 2
12 | 6 | 24.00 | 02/22/18 | 2
13 | 6 | 24.00 | 02/22/18 | 2
14 | 6 | 24.00 | 02/22/18 | 2
15 | 7 | 0.00 | 09/12/16 | 3
16 | 8 | 74.00 | 10/01/17 | 3
17 | 8 | 74.00 | 10/01/17 | 3
18 | 8 | 74.00 | 10/01/17 | 3
19 | 8 | 74.00 | 10/01/17 | 3
20 | 8 | 74.00 | 10/01/17 | 3
21 | 8 | 74.00 | 10/01/17 | 3
I know there's something with partitioning and ranking (for ranked_id), but I can't figure out how to repeat each customer's last row until that customer has 7 rows.

As @Gordon Linoff suggested, you can use the generate_series() function cross joined with the distinct customer_ids to generate all the rows needed, as in t1 below. Then t2 (also below) uses the row_number() function to number each customer's rows sequentially, and t1 is outer joined to that number along with the customer_id.
From there it's just a matter of getting at the last value per customer_id when there is no original row to join to, which is where the CASE expressions and the first_value() analytic function come in. I couldn't get the last_value() analytic function to work, likely because PostgreSQL lacks an IGNORE NULLS option, so I used first_value() with a descending sort order and only return the analytic value when no other data exists.
with t1 as (
  select distinct
         dense_rank() over (order by customer_id, generate_series) as ranked_id
       , customer_id
       , generate_series
  from table1
  cross join generate_series(1, 7)
), t2 as (
  select row_number() over (partition by customer_id order by id) as rn
       , table1.*
  from table1
)
select t1.ranked_id
     , case when t2.customer_id is not null
            then t2.id
            else first_value(t2.id)
                   over (partition by t1.customer_id
                         order by t2.id desc nulls last)
       end as id
     , case when t2.customer_id is not null
            then t2.amount
            else first_value(t2.amount)
                   over (partition by t1.customer_id
                         order by t2.id desc nulls last)
       end as amount
     , case when t2.customer_id is not null
            then t2.date
            else first_value(t2.date)
                   over (partition by t1.customer_id
                         order by t2.id desc nulls last)
       end as date
     , t1.customer_id
from t1
left join t2
  on t2.customer_id = t1.customer_id
 and t2.rn = t1.generate_series  -- join on the per-customer row number, not on id
order by ranked_id;
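To try the padding logic without a PostgreSQL server, here is a minimal sketch of the same idea (number each customer's rows, join them to 1..7, fill the gaps with the row that has the customer's highest id) using Python's built-in sqlite3 module; SQLite 3.25+ is needed for window functions. It assumes ISO date strings, uses a recursive CTE as a stand-in for generate_series(), and replaces "nulls last" with an IS NULL sort key. The pad_rows helper and its target parameter are hypothetical names, and it assumes no customer has more than target rows.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (assumption).
ROWS = [
    (1, 0.00, "2017-11-12", 1), (2, 54.00, "2017-11-12", 1), (3, 60.00, "2018-02-12", 1),
    (4, 0.00, "2017-01-18", 2), (5, 14.00, "2017-03-12", 2), (6, 24.00, "2018-02-22", 2),
    (7, 0.00, "2016-09-12", 3), (8, 74.00, "2017-10-01", 3),
]

# Window that puts each customer's highest non-NULL id first
# (portable replacement for "order by id desc nulls last").
LAST = "OVER (PARTITION BY t1.customer_id ORDER BY t2.id IS NULL, t2.id DESC)"

QUERY = f"""
WITH RECURSIVE series(n) AS (  -- stand-in for generate_series(1, target)
    SELECT 1 UNION ALL SELECT n + 1 FROM series WHERE n < ?
), t1 AS (
    SELECT c.customer_id, s.n
    FROM (SELECT DISTINCT customer_id FROM table1) AS c CROSS JOIN series AS s
), t2 AS (
    SELECT row_number() OVER (PARTITION BY customer_id ORDER BY id) AS rn, table1.*
    FROM table1
)
SELECT row_number() OVER (ORDER BY t1.customer_id, t1.n) AS ranked_id,
       CASE WHEN t2.id IS NOT NULL THEN t2.id ELSE first_value(t2.id) {LAST} END AS id,
       CASE WHEN t2.id IS NOT NULL THEN t2.amount ELSE first_value(t2.amount) {LAST} END AS amount,
       CASE WHEN t2.id IS NOT NULL THEN t2.date ELSE first_value(t2.date) {LAST} END AS date,
       t1.customer_id
FROM t1
LEFT JOIN t2 ON t2.customer_id = t1.customer_id AND t2.rn = t1.n
ORDER BY ranked_id
"""


def pad_rows(rows, target=7):
    """Return every customer's rows padded to `target` rows with its last row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (id INTEGER, amount REAL, date TEXT, customer_id INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY, (target,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pad_rows(ROWS):
        print(row)
```

Calling pad_rows(ROWS) yields 21 rows, with ids 1, 2, 3, 3, 3, 3, 3 for customer 1, matching the layout the question asks for.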

In Postgres, you can use generate_series() and a cross join to generate all the rows. Then left join the original rows and use coalesce() to fall back to each customer's most recent row wherever there is no original row:
select row_number() over (order by c.customer_id, g.i) as ranked_id,
       coalesce(t.id, c.id) as id,
       coalesce(t.amount, c.amount) as amount,
       coalesce(t.date, c.date) as date,
       c.customer_id
from (select distinct on (customer_id) t.*
      from t
      order by customer_id, date desc, id desc
     ) c cross join
     generate_series(1, 7) g(i) left join
     (select t.*, row_number() over (partition by customer_id order by date, id) as i
      from t
     ) t
     on t.customer_id = c.customer_id and t.i = g.i
order by ranked_id;
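For comparison, the coalesce-based idea can be sketched with Python's sqlite3 module (SQLite 3.25+). SQLite has neither DISTINCT ON nor a built-in generate_series(), so this sketch assumes ROW_NUMBER() = 1 and a recursive CTE as substitutes, plus ISO date strings; pad_with_latest is a hypothetical helper name. The id tie-breaker matters because customer 1's first two rows share a date.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (assumption).
ROWS = [
    (1, 0.00, "2017-11-12", 1), (2, 54.00, "2017-11-12", 1), (3, 60.00, "2018-02-12", 1),
    (4, 0.00, "2017-01-18", 2), (5, 14.00, "2017-03-12", 2), (6, 24.00, "2018-02-22", 2),
    (7, 0.00, "2016-09-12", 3), (8, 74.00, "2017-10-01", 3),
]

QUERY = """
WITH RECURSIVE g(i) AS (  -- stand-in for generate_series(1, target)
    SELECT 1 UNION ALL SELECT i + 1 FROM g WHERE i < ?
), c AS (  -- latest row per customer; emulates Postgres DISTINCT ON
    SELECT id, amount, date, customer_id
    FROM (SELECT t.*, row_number() OVER (PARTITION BY customer_id
                                         ORDER BY date DESC, id DESC) AS r
          FROM t)
    WHERE r = 1
), numbered AS (  -- position of each original row within its customer
    SELECT t.*, row_number() OVER (PARTITION BY customer_id ORDER BY date, id) AS i
    FROM t
)
SELECT row_number() OVER (ORDER BY c.customer_id, g.i) AS ranked_id,
       COALESCE(n.id, c.id) AS id,
       COALESCE(n.amount, c.amount) AS amount,
       COALESCE(n.date, c.date) AS date,
       c.customer_id
FROM c
CROSS JOIN g
LEFT JOIN numbered AS n ON n.customer_id = c.customer_id AND n.i = g.i
ORDER BY ranked_id
"""


def pad_with_latest(rows, target=7):
    """Pad each customer to `target` rows, reusing its most recent row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, amount REAL, date TEXT, customer_id INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY, (target,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pad_with_latest(ROWS):
        print(row)
```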

Related

SQL : automatically fill price between dates

I'm trying to write a view from two tables. One is a reference table that contains product IDs and weeks:
+------------+------+
| Product_id | week | t1
+------------+------+
| 1 | 1 |
| 2 | 1 |
| 1 | 2 |
| 2 | 2 |
| 1 | 3 |
| 2 | 3 |
+------------+------+ etc...
The other contains product IDs, the weeks in which each product's price changed, and the new price:
+------------+------+-------+
| Product_id | week | price | t2
+------------+------+-------+
| 1 | 1 | 70 |
| 1 | 2 | 50 |
| 2 | 2 | 70 |
| 1 | 4 | 30 |
| 2 | 4 | 40 |
+------------+------+-------+
I know how to get this easily by joining the two tables:
+------------+------+-------+
| Product_id | week | price |
+------------+------+-------+
| 1 | 1 | 70 |
| 1 | 2 | 50 |
| 1 | 3 | |
| 1 | 4 | 30 |
| 1 | 5 | |
| 2 | 1 | |
| 2 | 2 | 70 |
| 2 | 3 | |
| 2 | 4 | 40 |
| 2 | 5 | |
+------------+------+-------+
But my goal is rather to fill in the gaps and have the price for each week (without creating any new table), like this:
+------------+------+-------+
| Product_id | week | price |
+------------+------+-------+
| 1 | 1 | 70 |
| 1 | 2 | 50 |
| 1 | 3 | 50 |
| 1 | 4 | 30 |
| 1 | 5 | 30 |
| 2 | 1 | |
| 2 | 2 | 70 |
| 2 | 3 | 70 |
| 2 | 4 | 40 |
| 2 | 5 | 40 |
+------------+------+-------+ (product 2 isn't sold yet at week 1, so it doesn't have a price).
I can't see how to do this in SQL. I haven't used PARTITION BY or LAG before, but they might be what I'm looking for. If anyone can push me in the right direction, I would appreciate it :)
You can use window functions. The IGNORE NULLS clause, which Teradata supports, comes in handy here:
select
t1.product_id,
t1.week,
coalesce(
t2.price,
lag(t2.price ignore nulls) over(partition by t1.product_id order by t1.week)
) price
from t1
left join t2
on t2.product_id = t1.product_id
and t2.week = t1.week
Or better yet, as suggested by dnoeth, you can use last_value(), which avoids the need for coalesce():
select
t1.product_id,
t1.week,
last_value(t2.price ignore nulls) over(partition by t1.product_id order by t1.week) price
from t1
left join t2
on t2.product_id = t1.product_id
and t2.week = t1.week
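SQLite has no IGNORE NULLS option, so the following sketch swaps in a different technique to forward-fill the price: a running count of non-NULL prices splits each product's weeks into runs that share one known price, and max() over each run picks that price up. It uses Python's sqlite3 module (SQLite 3.25+); fill_prices is a hypothetical helper name, and weeks 1 to 5 follow the expected output in the question.

```python
import sqlite3

T1 = [(p, w) for p in (1, 2) for w in range(1, 6)]  # products 1-2, weeks 1-5
T2 = [(1, 1, 70), (1, 2, 50), (2, 2, 70), (1, 4, 30), (2, 4, 40)]

QUERY = """
WITH joined AS (
    SELECT t1.product_id, t1.week, t2.price,
           -- running count of non-NULL prices: it only grows on weeks with a
           -- new price, so every week in one "last known price" run shares it
           count(t2.price) OVER (PARTITION BY t1.product_id ORDER BY t1.week) AS grp
    FROM t1
    LEFT JOIN t2 ON t2.product_id = t1.product_id AND t2.week = t1.week
)
SELECT product_id, week,
       -- each run holds at most one non-NULL price, so max() returns it
       max(price) OVER (PARTITION BY product_id, grp) AS price
FROM joined
ORDER BY product_id, week
"""


def fill_prices():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (product_id INTEGER, week INTEGER)")
    con.execute("CREATE TABLE t2 (product_id INTEGER, week INTEGER, price INTEGER)")
    con.executemany("INSERT INTO t1 VALUES (?, ?)", T1)
    con.executemany("INSERT INTO t2 VALUES (?, ?, ?)", T2)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in fill_prices():
        print(row)
```

Product 2 keeps a NULL price in week 1 because its run (grp = 0) contains no price at all.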
Use a cross join to generate the rows, then a left join and window functions:
with weeks as (
  -- numbers 1..n; assumes table1 has at least 5 rows
  select row_number() over (order by product_id) as n
  from table1
)
select p.product_id, w.n as week,
       coalesce(t2.price,
                lag(t2.price ignore nulls) over (partition by p.product_id order by w.n)
               ) as price
from (select distinct product_id
      from table1 t1
     ) p cross join
     weeks w left join
     table2 t2
     on t2.product_id = p.product_id and t2.week = w.n
where w.n <= 5
You can do this with a LEFT JOIN.
SELECT t1.Product_id, t1.week, tmp.price
FROM t1
LEFT JOIN t2 tmp ON tmp.Product_id = t1.Product_id AND
tmp.week = (SELECT MAX(week) FROM t2
WHERE Product_id = tmp.Product_id AND week <= t1.week)
ORDER BY t1.Product_id, t1.week
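The correlated MAX(week) join needs no window functions at all, so it is easy to check with a minimal sqlite3 sketch; the sample data mirrors the question (weeks 1 to 5), and fill_prices_by_join is a hypothetical helper name.

```python
import sqlite3

T1 = [(p, w) for p in (1, 2) for w in range(1, 6)]  # products 1-2, weeks 1-5
T2 = [(1, 1, 70), (1, 2, 50), (2, 2, 70), (1, 4, 30), (2, 4, 40)]

QUERY = """
SELECT t1.product_id, t1.week, tmp.price
FROM t1
LEFT JOIN t2 AS tmp
  ON tmp.product_id = t1.product_id
 -- the latest price change on or before this week (NULL if there is none)
 AND tmp.week = (SELECT MAX(t2.week) FROM t2
                 WHERE t2.product_id = t1.product_id AND t2.week <= t1.week)
ORDER BY t1.product_id, t1.week
"""


def fill_prices_by_join():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (product_id INTEGER, week INTEGER)")
    con.execute("CREATE TABLE t2 (product_id INTEGER, week INTEGER, price INTEGER)")
    con.executemany("INSERT INTO t1 VALUES (?, ?)", T1)
    con.executemany("INSERT INTO t2 VALUES (?, ?, ?)", T2)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in fill_prices_by_join():
        print(row)
```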
Arguably it's cleaner still with OUTER APPLY, but I don't know whether Teradata supports that.
SELECT t1.Product_id, t1.week, oa.price
FROM t1
OUTER APPLY (SELECT TOP 1 price FROM t2
WHERE Product_id = t1.Product_id AND week <= t1.week
ORDER BY week DESC) oa
ORDER BY t1.Product_id, t1.week

SQL group by changing column

Suppose I have a table sorted by date, like so:
+-------------+--------+
| DATE | VALUE |
+-------------+--------+
| 01-09-2020 | 5 |
| 01-15-2020 | 5 |
| 01-17-2020 | 5 |
| 02-03-2020 | 8 |
| 02-13-2020 | 8 |
| 02-20-2020 | 8 |
| 02-23-2020 | 5 |
| 02-25-2020 | 5 |
| 02-28-2020 | 3 |
| 03-13-2020 | 3 |
| 03-18-2020 | 3 |
+-------------+--------+
I want to group consecutive rows that share the same value (in date order) and add a column whose number increments each time the value changes.
I have tried a number of different things, such as using the lag function:
SELECT value, value - lag(value) over (order by date) as count
GROUP BY value
In short, I want to take the table above and have it look like:
+-------------+--------+-------+
| DATE | VALUE | COUNT |
+-------------+--------+-------+
| 01-09-2020 | 5 | 1 |
| 01-15-2020 | 5 | 1 |
| 01-17-2020 | 5 | 1 |
| 02-03-2020 | 8 | 2 |
| 02-13-2020 | 8 | 2 |
| 02-20-2020 | 8 | 2 |
| 02-23-2020 | 5 | 3 |
| 02-25-2020 | 5 | 3 |
| 02-28-2020 | 3 | 4 |
| 03-13-2020 | 3 | 4 |
| 03-18-2020 | 3 | 4 |
+-------------+--------+-------+
Eventually I want to collapse it all into one small table with the earliest date for each group:
+-------------+--------+-------+
| DATE | VALUE | COUNT |
+-------------+--------+-------+
| 01-09-2020 | 5 | 1 |
| 02-03-2020 | 8 | 2 |
| 02-23-2020 | 5 | 3 |
| 02-28-2020 | 3 | 4 |
+-------------+--------+-------+
Any help would be very appreciated
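You can use a combination of the Row_number and Dense_rank functions to get the required results. The difference between a row number over all rows and a row number within each value stays constant across a run of equal consecutive values, so it identifies each group (mytable stands for your table's name):
;with cte
as
(
select t.DATE, t.VALUE
,Row_number() over(order by t.DATE)
 - Row_number() over(partition by t.VALUE order by t.DATE) as grp
from mytable t
)
select min(DATE) as DATE, VALUE
,Dense_rank() over(order by min(DATE)) as count
from cte
group by VALUE, grp
order by min(DATE)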
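One way to combine ROW_NUMBER() and DENSE_RANK() for this gaps-and-islands problem is to subtract a per-value row number from an overall row number: the difference stays constant within each run of equal consecutive values. Here is a minimal sqlite3 sketch (SQLite 3.25+) using the question's data with ISO dates (an assumption); first_date_per_run is a hypothetical helper name. The runs are collapsed in a CTE before DENSE_RANK() is applied.

```python
import sqlite3

DATA = [
    ("2020-01-09", 5), ("2020-01-15", 5), ("2020-01-17", 5),
    ("2020-02-03", 8), ("2020-02-13", 8), ("2020-02-20", 8),
    ("2020-02-23", 5), ("2020-02-25", 5),
    ("2020-02-28", 3), ("2020-03-13", 3), ("2020-03-18", 3),
]

QUERY = """
WITH cte AS (
    SELECT date, value,
           -- constant within each run of equal consecutive values
           row_number() OVER (ORDER BY date)
             - row_number() OVER (PARTITION BY value ORDER BY date) AS grp
    FROM t
), islands AS (
    SELECT min(date) AS start_date, value
    FROM cte
    GROUP BY value, grp
)
SELECT start_date, value, dense_rank() OVER (ORDER BY start_date) AS count
FROM islands
ORDER BY start_date
"""


def first_date_per_run():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (date TEXT, value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", DATA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in first_date_per_run():
        print(row)
```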
You can use lag(), a cumulative sum, and a subquery:
SELECT date, value,
       -- 1 whenever the value changes (including the first row, where prev_value is NULL)
       SUM(CASE WHEN prev_value = value THEN 0 ELSE 1 END) OVER (ORDER BY date) AS count
FROM (SELECT t.*, LAG(value) OVER (ORDER BY date) as prev_value
      FROM t
     ) t
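The lag-plus-running-sum idea can be checked the same way. This sqlite3 sketch (SQLite 3.25+, ISO dates assumed, running_group_numbers a hypothetical helper name) returns the COUNT column from the question's first expected table.

```python
import sqlite3

DATA = [
    ("2020-01-09", 5), ("2020-01-15", 5), ("2020-01-17", 5),
    ("2020-02-03", 8), ("2020-02-13", 8), ("2020-02-20", 8),
    ("2020-02-23", 5), ("2020-02-25", 5),
    ("2020-02-28", 3), ("2020-03-13", 3), ("2020-03-18", 3),
]

QUERY = """
SELECT date, value,
       -- add 1 each time the value differs from the previous row's value
       SUM(CASE WHEN prev_value = value THEN 0 ELSE 1 END) OVER (ORDER BY date) AS count
FROM (SELECT t.*, LAG(value) OVER (ORDER BY date) AS prev_value FROM t) AS t
ORDER BY date
"""


def running_group_numbers():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (date TEXT, value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", DATA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in running_group_numbers():
        print(row)
```

Taking MIN(date) per (value, count) over this result would then give the collapsed table.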
You can use the lag() and row_number() analytic functions in succession, keeping only the rows whose value differs from the previous one:
WITH t2 AS
(
SELECT LAG(value, 1, value - 1) OVER (ORDER BY date) as lg,  -- the default makes the first row count as a change
t.*
FROM t
)
SELECT t2.date, t2.value, ROW_NUMBER() OVER (ORDER BY t2.date) as count
FROM t2
WHERE value - lg != 0

How to sum rows before a condition is met in SQL

I have a table with multiple records for the same id. It looks like this, with the rows sorted by sequence number:
+----+--------+----------+----------+
| id | result | duration | sequence |
+----+--------+----------+----------+
| 1 | 12 | 7254 | 1 |
+----+--------+----------+----------+
| 1 | 12 | 2333 | 2 |
+----+--------+----------+----------+
| 1 | 11 | 1000 | 3 |
+----+--------+----------+----------+
| 1 | 6 | 5 | 4 |
+----+--------+----------+----------+
| 1 | 3 | 20 | 5 |
+----+--------+----------+----------+
| 2 | 1 | 230 | 1 |
+----+--------+----------+----------+
| 2 | 9 | 10 | 2 |
+----+--------+----------+----------+
| 2 | 6 | 0 | 3 |
+----+--------+----------+----------+
| 2 | 1 | 5 | 4 |
+----+--------+----------+----------+
| 2 | 12 | 3 | 5 |
+----+--------+----------+----------+
For example, for id=1 I would like to sum the duration of all rows up to and including the row where result=6, which is 7254+2333+1000+5. Likewise for id=2, it would be 230+10+0. Anything after the row where result=6 is left out.
My expected output:
+----+----------+
| id | duration |
+----+----------+
| 1 | 10592 |
+----+----------+
| 2 | 240 |
+----+----------+
The sequence has to be in ascending order.
I'm not sure how I can do this in sql.
Thank you in advance!
I think you want:
select t.id, sum(t.duration)
from t
where t.sequence <= (select t2.sequence
                     from t t2
                     where t2.id = t.id and t2.result = 6
                    )
group by t.id;
In PrestoDB, I would recommend window functions:
select id, sum(duration)
from (select t.*,
             min(case when result = 6 then sequence end) over (partition by id) as sequence_6
      from t
     ) t
where sequence <= sequence_6
group by id;
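A minimal sqlite3 sketch of the window-function approach (SQLite 3.25+), using the question's data; sum_until_result_6 is a hypothetical helper name. An id with no result = 6 row gets a NULL sequence_6, so the WHERE clause drops that id entirely.

```python
import sqlite3

# (id, result, duration, sequence) rows from the question
ROWS = [
    (1, 12, 7254, 1), (1, 12, 2333, 2), (1, 11, 1000, 3), (1, 6, 5, 4), (1, 3, 20, 5),
    (2, 1, 230, 1), (2, 9, 10, 2), (2, 6, 0, 3), (2, 1, 5, 4), (2, 12, 3, 5),
]

QUERY = """
SELECT id, SUM(duration) AS duration
FROM (SELECT t.*,
             -- first sequence per id where result = 6 (NULL if there is none)
             MIN(CASE WHEN result = 6 THEN sequence END) OVER (PARTITION BY id) AS sequence_6
      FROM t) AS t
WHERE sequence <= sequence_6
GROUP BY id
ORDER BY id
"""


def sum_until_result_6():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, result INTEGER, duration INTEGER, sequence INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(sum_until_result_6())
```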
You can use a simple aggregate query with a condition that uses a subquery to look up the sequence of the record whose result is 6:
SELECT t.id, SUM(t.duration) total_duration
FROM mytable t
WHERE t.sequence <= (
SELECT sequence
FROM mytable
WHERE id = t.id AND result = 6
)
GROUP BY t.id
With your test data, this returns:
| id | total_duration |
| --- | -------------- |
| 1 | 10592 |
| 2 | 240 |
A basic group by query should solve your issue:
select
id,
sum(duration) duration
from t
group by id
For specific ids only:
select
id,
sum(duration) duration
from t
where id = 1
group by id
If you want to include the totals in your result set alongside the detail rows:
select id, duration, sequence from t
union all
select
id,
sum(duration) duration,
null sequence
from t
group by id

Select most recent inspection

I have a ROAD_INSPECTION table:
+----+------------------------+-----------+
| ID | DATE | CONDITION |
+----+------------------------+-----------+
| 1 | 01/01/2009 | 20 |
| 1 | 05/01/2013 | 16 |
| 1 | 04/29/2016 10:02:52 AM | 15 |
+----+------------------------+-----------+
| 2 | 01/01/2009 | 8 |
| 2 | 06/06/2012 9:55:13 AM | 8 |
| 2 | 04/28/2015 | 11 |
+----+------------------------+-----------+
| 3 | 06/11/2012 | 10 |
| 3 | 04/21/2015 | 19 |
+----+------------------------+-----------+
What is the most efficient way to select the most recent inspection? The query would need to include the ID and CONDITION columns, despite the fact that they wouldn't group by cleanly:
+----+------------------------+-----------+
| ID | DATE | CONDITION |
+----+------------------------+-----------+
| 1 | 04/29/2016 10:02:52 AM | 15 |
+----+------------------------+-----------+
| 2 | 04/28/2015 | 11 |
+----+------------------------+-----------+
| 3 | 04/21/2015 | 19 |
+----+------------------------+-----------+
One way is to find the latest date per id in a derived table, then join that back to the main table to get the corresponding value from the condition column, as below.
SELECT t1.id,
t1.date1,
t2.CONDITION1
FROM
(SELECT id,
max(date1) AS date1
FROM table1
GROUP BY id) t1
JOIN table1 t2 ON t1.id = t2.id
AND t1.date1 = t2.date1;
Result:
id date1 CONDITION1
-------------------------------------
1 29.04.2016 10:02:52 15
2 28.04.2015 00:00:00 11
3 21.04.2015 00:00:00 19
Or, if your RDBMS supports window functions, use the following.
SELECT id,
date1,
condition1
FROM
(SELECT id,
date1,
condition1,
row_number() over(PARTITION BY id
ORDER BY date1 DESC) AS rn
FROM table1 ) t1
WHERE rn = 1;
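To check the ROW_NUMBER() version, here is a minimal sqlite3 sketch (SQLite 3.25+). It assumes the dates are stored as ISO 8601 strings so that they sort chronologically, and latest_inspections is a hypothetical helper name.

```python
import sqlite3

# (id, date1, condition1) rows from the question, dates rewritten as ISO strings
ROWS = [
    (1, "2009-01-01 00:00:00", 20), (1, "2013-05-01 00:00:00", 16),
    (1, "2016-04-29 10:02:52", 15),
    (2, "2009-01-01 00:00:00", 8), (2, "2012-06-06 09:55:13", 8),
    (2, "2015-04-28 00:00:00", 11),
    (3, "2012-06-11 00:00:00", 10), (3, "2015-04-21 00:00:00", 19),
]

QUERY = """
SELECT id, date1, condition1
FROM (SELECT id, date1, condition1,
             -- 1 marks the newest inspection of each road
             row_number() OVER (PARTITION BY id ORDER BY date1 DESC) AS rn
      FROM table1) AS t1
WHERE rn = 1
ORDER BY id
"""


def latest_inspections():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (id INTEGER, date1 TEXT, condition1 INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in latest_inspections():
        print(row)
```

Unlike the MAX(date1) join, ROW_NUMBER() returns exactly one row per id even if two inspections share the latest date.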

Sum all sub group last value by group

Consider the following table:
ID | ITEM | GROUP_ID | VAL | COST
---+------+----------+-----------+-------
1 | A | 1 | 1 | 12
2 | B | 1 | 2 | 12
3 | C | 1 | 3 | 12
4 | D | 1 | 4 | 13
5 | D | 1 | 5 | 12
6 | E | 2 | 1 | 17
7 | E | 2 | 2 | 10
8 | E | 2 | 3 | 11
9 | E | 2 | 4 | 12
10 | F | 2 | 5 | 15
11 | F | 2 | 6 | 13
12 | F | 2 | 7 | 11
13 | F | 2 | 8 | 12
How can I get the following result?
GROUP_ID | VAL | COST
----------+-----------+-------
1 | 15 | 48
2 | 36 | 24
VAL is the sum of VAL per GROUP_ID.
COST is the sum, per GROUP_ID, of each item's last COST (the row with the highest ID for that item).
Use the analytic function ROW_NUMBER(), available in PostgreSQL, Oracle, and SQL Server:
WITH last_item as (
SELECT group_id, sum(cost) as sum_cost
FROM (
SELECT t.*,
ROW_NUMBER() over (partition by item order by id desc) as rn
FROM Table1 t
) t
WHERE rn = 1
GROUP BY t.group_id
),
val_sum as (
SELECT t.group_id, SUM(val) as sum_val
FROM Table1 t
GROUP BY t.group_id
)
SELECT v.group_id, v.sum_val, l.sum_cost
FROM val_sum v
INNER JOIN last_item l
ON v.group_id = l.group_id
OUTPUT
| group_id | sum_val | sum_cost |
|----------|---------|----------|
| 1 | 15 | 48 |
| 2 | 36 | 24 |
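A minimal sqlite3 sketch of the same two-CTE query (SQLite 3.25+); latest_cost_sums is a hypothetical helper name. As an editorial assumption, it partitions by group_id as well as item, so an item name reused in another group would still be ranked within its own group; with the sample data both choices give the same result.

```python
import sqlite3

# (id, item, group_id, val, cost) rows from the question
ROWS = [
    (1, "A", 1, 1, 12), (2, "B", 1, 2, 12), (3, "C", 1, 3, 12), (4, "D", 1, 4, 13),
    (5, "D", 1, 5, 12), (6, "E", 2, 1, 17), (7, "E", 2, 2, 10), (8, "E", 2, 3, 11),
    (9, "E", 2, 4, 12), (10, "F", 2, 5, 15), (11, "F", 2, 6, 13), (12, "F", 2, 7, 11),
    (13, "F", 2, 8, 12),
]

QUERY = """
WITH last_item AS (  -- cost of each item's highest-id row, summed per group
    SELECT group_id, SUM(cost) AS sum_cost
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY group_id, item ORDER BY id DESC) AS rn
          FROM Table1 t) t
    WHERE rn = 1
    GROUP BY group_id
), val_sum AS (  -- plain sum of val per group
    SELECT group_id, SUM(val) AS sum_val FROM Table1 GROUP BY group_id
)
SELECT v.group_id, v.sum_val, l.sum_cost
FROM val_sum v
JOIN last_item l ON v.group_id = l.group_id
ORDER BY v.group_id
"""


def latest_cost_sums():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 (id INTEGER, item TEXT, group_id INTEGER, "
                "val INTEGER, cost INTEGER)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(latest_cost_sums())
```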
Try this:
WITH LastRow (id)
AS (
SELECT MAX(id)
FROM TheTable
GROUP BY item, group_id
)
SELECT group_Id, SUM(val), SUM(CASE WHEN B.id IS NULL THEN 0 ELSE cost END)
FROM TheTable A
LEFT OUTER JOIN LastRow B ON A.id = B.id
GROUP BY group_id
EDIT:
Thanks @Juan Carlos Oropeza for creating the SQL Fiddle test data