Updating previous year values


I have a table T like this:
ID | DESC1_ID | DESC2_ID | TS       | CUM_MONTH_PREV_YEAR | CUM_MONTH_THIS_YEAR | ID2
---|----------|----------|----------|---------------------|---------------------|----
1  | 1        | 1        | 31.01.22 |                     | 220                 | 1
2  | 1        | 2        | 31.01.22 |                     | 500                 | 1
3  | 1        | 3        | 31.01.22 |                     | 22                  | 1
4  | 2        | 1        | 31.01.22 |                     | 50                  | 1
5  | 1        | 1        | 01.02.23 |                     | 230                 | 2
6  | 1        | 2        | 01.02.23 |                     | 300                 | 2
7  | 1        | 3        | 01.02.23 |                     | 32                  | 2
8  | 2        | 1        | 01.02.23 |                     | 30                  | 2
I want to update the column CUM_MONTH_PREV_YEAR for all rows where ID2 is 2.
The value should be the previous year's value. The matching date (TS) may not be exactly one year earlier.
The query below does what I want.
UPDATE T t1
SET CUM_MONTH_PREV_YEAR = (SELECT NVL(t2.CUM_MONTH_THIS_YEAR, 0)
                           FROM T t2
                           WHERE t2.TS = (SELECT MAX(TS)
                                          FROM T
                                          WHERE TS BETWEEN ADD_MONTHS(t1.TS - 7, -12) AND ADD_MONTHS(t1.TS, -12)
                                            AND DESC1_ID = t1.DESC1_ID
                                            AND DESC2_ID = t1.DESC2_ID)
                             AND t2.DESC1_ID = t1.DESC1_ID
                             AND t2.DESC2_ID = t1.DESC2_ID)
WHERE ID2 = 2;
Is there a better way to achieve this?
UPDATE:
The example is simplified, so it is only a coincidence that ID2 is 1 for the previous year and 2 for this year. In reality there are many more values.
UPDATE:
The values are stored daily (on working days).

UPDATED ANSWER
This works on a daily basis even if some dates don't exist in the previous year. In that case the first date before that is fetched.
UPDATE A_TBL t
SET t.CUM_MONTH_PREV_YEAR = (
    SELECT PREV_CUM
    FROM (
        SELECT t.DESC1_ID, t.DESC2_ID, MAX(t.TS) "TS", MAX(t.CUM_MONTH_THIS_YEAR) "PREV_CUM"
        FROM A_TBL t
        INNER JOIN A_TBL t2 ON (t2.DESC1_ID = t.DESC1_ID AND t2.DESC2_ID = t.DESC2_ID)
        WHERE To_Number(To_Char(t.TS, 'yyyymmdd')) <= To_Number(To_Char(ADD_MONTHS(t2.TS, -12), 'yyyymmdd'))
        GROUP BY t.DESC1_ID, t.DESC2_ID
    )
    WHERE DESC1_ID = t.DESC1_ID AND DESC2_ID = t.DESC2_ID
)
WHERE t.ID2 = 2;
All you need is the innermost query, which returns the data for the same date in the previous year, or the first date before it, relative to the row with ID2 = 2 (the one being updated). This is the query:
SELECT t.DESC1_ID, t.DESC2_ID, MAX(t.TS) "TS", MAX(t.CUM_MONTH_THIS_YEAR) "PREV_CUM"
FROM A_TBL t
INNER JOIN A_TBL t2 ON(t2.DESC1_ID = t.DESC1_ID And t2.DESC2_ID = t.DESC2_ID)
WHERE To_Number(To_Char(t.TS, 'yyyymmdd')) <= To_Number(To_Char(ADD_MONTHS(t2.TS, -12), 'yyyymmdd'))
GROUP BY t.DESC1_ID, t.DESC2_ID
DESC1_ID DESC2_ID TS PREV_CUM
---------- ---------- --------- ----------
2 1 31-JAN-22 50
1 2 31-JAN-22 500
1 3 31-JAN-22 22
1 1 31-JAN-22 220
The result after the update is:
Select * From A_TBL;
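ID | DESC1_ID | DESC2_ID | TS        | CUM_MONTH_PREV_YEAR | CUM_MONTH_THIS_YEAR | ID2
---|----------|----------|-----------|---------------------|---------------------|----
1  | 1        | 1        | 31-JAN-22 |                     | 220                 | 1
2  | 1        | 2        | 31-JAN-22 |                     | 500                 | 1
3  | 1        | 3        | 31-JAN-22 |                     | 22                  | 1
4  | 2        | 1        | 31-JAN-22 |                     | 50                  | 1
5  | 1        | 1        | 01-FEB-23 | 220                 | 230                 | 2
6  | 1        | 2        | 01-FEB-23 | 500                 | 300                 | 2
7  | 1        | 3        | 01-FEB-23 | 22                  | 32                  | 2
8  | 2        | 1        | 01-FEB-23 | 50                  | 30                  | 2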

Looking at your data, I think you are looking for this:
UPDATE table_name A
SET CUM_MONTH_PREV_YEAR = (SELECT B.CUM_MONTH_THIS_YEAR
FROM table_name B
WHERE A.id2 - 1 = B.id2
AND A.DESC1_ID = B.DESC1_ID
AND A.DESC2_ID = B.DESC2_ID);
SELECT *
FROM table_name;
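The "same date one year back, or the closest earlier date" lookup can also be written as a correlated subquery that orders the candidates by date and keeps only the latest one. The following is a minimal sketch, not the original Oracle statement: it runs against an in-memory SQLite database from Python, stores TS as ISO date text, uses lowercase column names of my own choosing, and approximates the question's seven-day tolerance with SQLite's date() modifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, desc1_id INTEGER, desc2_id INTEGER,"
    " ts TEXT, cum_month_prev_year INTEGER, cum_month_this_year INTEGER, id2 INTEGER)"
)
# Sample rows from the question, with TS rewritten as ISO dates (assumption).
sample = [
    (1, 1, 1, "2022-01-31", None, 220, 1),
    (2, 1, 2, "2022-01-31", None, 500, 1),
    (3, 1, 3, "2022-01-31", None, 22, 1),
    (4, 2, 1, "2022-01-31", None, 50, 1),
    (5, 1, 1, "2023-02-01", None, 230, 2),
    (6, 1, 2, "2023-02-01", None, 300, 2),
    (7, 1, 3, "2023-02-01", None, 32, 2),
    (8, 2, 1, "2023-02-01", None, 30, 2),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", sample)

# For each row being updated, take the latest row of the same DESC1_ID/DESC2_ID
# pair whose date falls in the 7 days ending exactly one year earlier.
conn.execute("""
    UPDATE t
    SET cum_month_prev_year = (
        SELECT p.cum_month_this_year
        FROM t AS p
        WHERE p.desc1_id = t.desc1_id
          AND p.desc2_id = t.desc2_id
          AND p.ts BETWEEN date(t.ts, '-1 year', '-7 days') AND date(t.ts, '-1 year')
        ORDER BY p.ts DESC
        LIMIT 1
    )
    WHERE id2 = 2
""")

updated = conn.execute(
    "SELECT id, cum_month_prev_year FROM t WHERE id2 = 2 ORDER BY id"
).fetchall()
print(updated)
```

Because the subquery picks the row by date rather than by ID2, it does not rely on ID2 being consecutive between years. Rows with no match inside the window are left as NULL.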

Related

Filling in missing data in Snowflake

I have a table in Snowflake like this:
TIME USER ITEM
1 frank 1
2 frank 0
3 frank 0
4 frank 0
5 frank 2
6 alf 5
7 alf 0
8 alf 6
9 alf 0
10 alf 9
I want to be able to replace all the zeroes with the next non-zero value, so in the end I have a table like this:
TIME USER ITEM
1 frank 1
2 frank 2
3 frank 2
4 frank 2
5 frank 2
6 alf 5
7 alf 6
8 alf 6
9 alf 9
10 alf 9
How would I write a query that does that in Snowflake?

You can use the conditional_change_event function for this:
with base_table as (
select
t1.*,
conditional_change_event(item) over (order by time desc) event_num
from test_table t1
order by time desc
)
select
t1.time,
t1.user,
t1.item old_item,
coalesce(t2.item, t1.item) new_item
from base_table t1
left join base_table t2 on t1.event_num = t2.event_num + 1 and t1.item = 0
order by t1.time asc
Above SQL Results:
+----+-----+--------+--------+
|TIME|USER |OLD_ITEM|NEW_ITEM|
+----+-----+--------+--------+
|1 |frank|1 |1 |
|2 |frank|0 |2 |
|3 |frank|0 |2 |
|4 |frank|0 |2 |
|5 |frank|2 |2 |
|6 |alf |5 |5 |
|7 |alf |0 |6 |
|8 |alf |6 |6 |
|9 |alf |0 |9 |
|10 |alf |9 |9 |
+----+-----+--------+--------+

You can use lead(ignore nulls):
select t.*,
(case when item = 0
then lead(nullif(item, 0) ignore nulls) over (partition by user order by time)
else item
end) as imputed_item
from t;
You can also phrase this using last_value():
select t.*,
last_value(nullif(item, 0) ignore nulls) over (partition by user order by time desc)
from t;

If you want to use first_value() or last_value() in Snowflake, keep in mind that Snowflake's default window frames differ from the ANSI standard. To get the ANSI default frame, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, you have to state it explicitly in the statement. Otherwise the default for these functions is ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, which is why the LAST_VALUE example from the previous answer does not work correctly. Here is one example that does work:
select t.*,
last_value(nullif(item, 0) ignore nulls) over (partition by user order by time desc rows between unbounded preceding and current row)
from t;
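To make the frame semantics concrete, here is a small Python sketch of what last_value(nullif(item, 0) ignore nulls) computes when rows are ordered by time descending and the frame ends at the current row. The function name fill_with_next_nonzero is hypothetical, and the code only mirrors the logic; it is not how Snowflake evaluates the query.

```python
def fill_with_next_nonzero(rows):
    """Replace each zero item with the next non-zero item of the same user.

    rows is a list of (time, user, item) tuples. Scanning in descending time
    order, the most recent non-zero item seen so far for a user plays the role
    of last_value(... ignore nulls) over a frame that ends at the current row.
    """
    latest_nonzero = {}  # user -> closest non-zero item at or after this time
    filled = []
    for time, user, item in sorted(rows, key=lambda r: r[0], reverse=True):
        if item != 0:
            latest_nonzero[user] = item
        # None corresponds to SQL NULL: no non-zero value at or after this row.
        filled.append((time, user, latest_nonzero.get(user)))
    return sorted(filled, key=lambda r: r[0])


rows = [
    (1, "frank", 1), (2, "frank", 0), (3, "frank", 0), (4, "frank", 0),
    (5, "frank", 2), (6, "alf", 5), (7, "alf", 0), (8, "alf", 6),
    (9, "alf", 0), (10, "alf", 9),
]
result = fill_with_next_nonzero(rows)
for row in result:
    print(row)
```

If the frame instead covered the whole partition, every row of a user would see the same final value of the scan, which is the behaviour the explicit frame avoids.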

Nothing wrong with the above solutions, but here's a different approach that I think is simpler.
select * from good
union all
select
bad.time
,bad.user
,min(good.item)
from bad
left outer join
good on good.user=bad.user and good.time>bad.time
group by
1,2
Full COPY|PASTE|RUN SQL:
with cte as (
select * from (
select 1 time, 'frank' user , 1 item union
select 2 time, 'frank' user , 0 item union
select 3 time, 'frank' user , 0 item union
select 4 time, 'frank' user , 0 item union
select 5 time, 'frank' user , 2 item union
select 6 time, 'alf' user , 5 item union
select 7 time, 'alf' user , 0 item union
select 8 time, 'alf' user , 6 item union
select 9 time, 'alf' user , 0 item union
select 10 time, 'alf' user , 9) )
, good as (select * from cte where item<> 0)
, bad as (select * from cte where item= 0)
select * from good
union all
select
bad.time
,bad.user
,min(good.item )
from bad
left outer join
good on good.user=bad.user and good.time>bad.time
group by
1,2
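One caveat with the join above: min(good.item) returns the smallest later item, which matches the next item only when the non-zero values increase over time, as they do in the sample. A variant that always picks the next non-zero row by time can be sketched with a correlated subquery. This is an illustration run on SQLite from Python, not Snowflake syntax; the table name t and the aliases cur and nxt are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (time INTEGER, user TEXT, item INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "frank", 1), (2, "frank", 0), (3, "frank", 0), (4, "frank", 0),
        (5, "frank", 2), (6, "alf", 5), (7, "alf", 0), (8, "alf", 6),
        (9, "alf", 0), (10, "alf", 9),
    ],
)

# For a zero row, look up the earliest later row of the same user whose item
# is non-zero; non-zero rows keep their own value.
filled = conn.execute("""
    SELECT cur.time,
           cur.user,
           CASE WHEN cur.item = 0
                THEN (SELECT nxt.item
                      FROM t AS nxt
                      WHERE nxt.user = cur.user
                        AND nxt.time > cur.time
                        AND nxt.item <> 0
                      ORDER BY nxt.time
                      LIMIT 1)
                ELSE cur.item
           END AS new_item
    FROM t AS cur
    ORDER BY cur.time
""").fetchall()

for row in filled:
    print(row)
```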

Possible to use a column name in a UDF in SQL?

I have a query in which a series of steps is repeated constantly over different columns, for example:
SELECT DISTINCT
MAX (
CASE
WHEN table_2."GRP1_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP1_MINIMUM_DATE",
MAX (
CASE
WHEN table_2."GRP2_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP2_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
I was considering writing a function to accomplish this as doing so would save on space in my query. I have been reading a bit about UDF in SQL but don't yet understand if it is possible to pass a column name in as a parameter (i.e. simply switch out "GRP1_MINIMUM_DATE" for "GRP2_MINIMUM_DATE" etc.). What I would like is a query which looks like this
SELECT DISTINCT
FUNCTION(table_2."GRP1_MINIMUM_DATE") AS "GRP1_MINIMUM_DATE",
FUNCTION(table_2."GRP2_MINIMUM_DATE") AS "GRP2_MINIMUM_DATE",
FUNCTION(table_2."GRP3_MINIMUM_DATE") AS "GRP3_MINIMUM_DATE",
FUNCTION(table_2."GRP4_MINIMUM_DATE") AS "GRP4_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
Can anyone tell me if this is possible/point me to some resource that might help me out here?
Thanks!

There is no direct way to do this, as @Tejash already stated, but it looks like your database model is not ideal: it would be better to have a table with USER_ID and GRP_ID as keys and MINIMUM_DATE as a separate field.
Without changing the table structure, you can use an UNPIVOT query to mimic this design:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4))
Result:
| USER_ID | GRP_ID | MINIMUM_DATE |
|---------|--------|--------------|
| 1 | 1 | 09/09/19 |
| 1 | 2 | 09/09/19 |
| 1 | 3 | 09/09/19 |
| 1 | 4 | 09/09/19 |
| 2 | 1 | 09/08/19 |
| 2 | 2 | 09/07/19 |
| 2 | 3 | 09/06/19 |
| 2 | 4 | 09/05/19 |
With this you can write your query without further code duplication and, if needed, use the PIVOT syntax to get one row per USER_ID.
The final query could then look like this:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
, INPUT_COHORT(USER_ID, ANCHOR_DATE)
AS (SELECT 1, SYSDATE-1 FROM dual UNION ALL
SELECT 2, SYSDATE-2 FROM dual UNION ALL
SELECT 3, SYSDATE-3 FROM dual)
-- Sample data ends above; the actual query starts here:
, unpiv AS (SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4)))
SELECT qcsj_c000000001000000 user_id, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE
FROM INPUT_COHORT cohort
LEFT JOIN unpiv table_2
ON cohort.USER_ID = table_2.USER_ID
pivot (MAX(CASE WHEN minimum_date <= cohort."ANCHOR_DATE" THEN 1 ELSE 0 END) AS MINIMUM_DATE
FOR grp_id IN (1 AS GRP1,2 AS GRP2,3 AS GRP3,4 AS GRP4))
Result:
| USER_ID | GRP1_MINIMUM_DATE | GRP2_MINIMUM_DATE | GRP3_MINIMUM_DATE | GRP4_MINIMUM_DATE |
|---------|-------------------|-------------------|-------------------|-------------------|
| 3 | | | | |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 1 | 1 |
This way you only have to write your calculation logic once (see line starting with pivot).
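The same idea, reshape the wide columns into (user, group, date) rows and then state the comparison once, can be sketched outside the database. The Python below is only an illustration of the unpivot and pivot steps: the fixed dates stand in for SYSDATE (assumed to be 2019-09-09, as in the result shown earlier), and the function names unpivot and flags_per_user are hypothetical.

```python
from datetime import date

GROUP_COLUMNS = [
    "GRP1_MINIMUM_DATE", "GRP2_MINIMUM_DATE",
    "GRP3_MINIMUM_DATE", "GRP4_MINIMUM_DATE",
]

# Stand-ins for the sample CTEs, with SYSDATE fixed to 2019-09-09 (assumption).
involve_ever = [
    {"USER_ID": 1, "GRP1_MINIMUM_DATE": date(2019, 9, 9), "GRP2_MINIMUM_DATE": date(2019, 9, 9),
     "GRP3_MINIMUM_DATE": date(2019, 9, 9), "GRP4_MINIMUM_DATE": date(2019, 9, 9)},
    {"USER_ID": 2, "GRP1_MINIMUM_DATE": date(2019, 9, 8), "GRP2_MINIMUM_DATE": date(2019, 9, 7),
     "GRP3_MINIMUM_DATE": date(2019, 9, 6), "GRP4_MINIMUM_DATE": date(2019, 9, 5)},
]
input_cohort = {1: date(2019, 9, 8), 2: date(2019, 9, 7), 3: date(2019, 9, 6)}


def unpivot(rows):
    """Yield one (user_id, column_name, minimum_date) triple per non-null group column."""
    for row in rows:
        for column in GROUP_COLUMNS:
            if row[column] is not None:
                yield row["USER_ID"], column, row[column]


def flags_per_user(cohort, rows):
    """Return {user_id: {column: 0/1 or None}}; the comparison is written only once."""
    result = {user_id: {column: None for column in GROUP_COLUMNS} for user_id in cohort}
    for user_id, column, minimum_date in unpivot(rows):
        if user_id in cohort:
            hit = 1 if minimum_date <= cohort[user_id] else 0
            result[user_id][column] = max(result[user_id][column] or 0, hit)
    return result


flags = flags_per_user(input_cohort, involve_ever)
for user_id, row in flags.items():
    print(user_id, row)
```

Users without any row in involve_ever keep None in every column, which corresponds to the empty cells for USER_ID 3 in the pivot result.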

Generate large dataset of dates for each user

I'm looking for SQL help generating a large dataset.
I have a list of users (id) and the date they activated with our service (activated_date).
I'm looking to generate a data set that has an entry for every day from the user's activated date to today.
What I Have Today
--------------------
id | activated_date
--------------------
2 | 01/01/2017
63 | 23/04/2018
.. | ...
--------------------
What I want to achieve
--------------------
id | date
--------------------
2 | 01/01/2017 <-- activation date
2 | 02/01/2017
2 | 03/01/2017
2 | 04/01/2017
2 | 05/01/2017
.. | ...
2 | 27/10/2018 <-- yesterday
63 | 23/04/2018 <-- activation date
63 | 24/04/2018
63 | 25/04/2018
63 | 26/04/2018
.. | ...
63 | 27/10/2018 <-- yesterday

You can generate the dates recursively, so a recursive CTE solves the problem as follows.
Create a test table:
create table your_table
(
id int not null,
activated_date timestamp not null
);
add sample data:
insert into your_table
values
(2, '2017/01/01'),
(63, '2018/04/23');
And finally, the recursive query:
with
recursive
cte(id, active_date)
as
(
select
id,
activated_date as active_date
from
your_table
union all
select
t.id,
active_date + justify_days(interval '1 days')
from
cte
inner join
your_table t
on
cte.id = t.id
where
cte.active_date < now()
)
select
*
from
cte
order by 1, 2;
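For a quick local check of the recursive approach, the sketch below runs an equivalent query on an in-memory SQLite database from Python. It is an adaptation, not the PostgreSQL statement above: dates are stored as ISO text, SQLite's date() function adds the day, the end date is passed as a parameter (fixed to 2018-10-27 here for reproducibility), and the join back to your_table is dropped because the recursive step only needs the previous row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER NOT NULL, activated_date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(2, "2017-01-01"), (63, "2018-04-23")],
)

end_date = "2018-10-27"  # assumption: "yesterday" from the question's example

# Anchor: one row per user at the activation date.
# Recursive step: add one day until the end date is reached.
rows = conn.execute(
    """
    WITH RECURSIVE cte(id, active_date) AS (
        SELECT id, activated_date FROM your_table
        UNION ALL
        SELECT id, date(active_date, '+1 day')
        FROM cte
        WHERE active_date < :end_date
    )
    SELECT id, active_date FROM cte ORDER BY id, active_date
    """,
    {"end_date": end_date},
).fetchall()

print(len(rows), rows[0], rows[-1])
```

ISO-formatted date strings sort in calendar order, so the plain text comparison in the WHERE clause is enough to stop the recursion.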

How can I combine these two tables and create a new table?

How to combine two tables and create a new table
first table :
ExitDate | fullname | outputnumber
------------------------------------------------
2012/01/01 a 10
2012/01/06 b 2
2012/01/08 c 3
2012/01/12 d 4
second table
inputnumber | date
-------------------------------
100 2012/01/05
150 2012/01/07
200 2012/01/10
the answer table
ExitDate | fullname | outputnumber | inputnumber | date
-------------------------------------------------------------------------------
2012/01/01 a 10 - -
- - - 100 2012/01/05
2012/01/06 b 2 - -
- - - 150 2012/01/07
2012/01/08 c 3 - -
- - - 200 2012/01/10
2012/01/12 d 4 - -
Note: the date and the position of each row are important, and I am using SQL Server.

If I understand correctly, you need UNION ALL. Something like this:
select * from (
    select ExitDate, fullname, outputnumber, NULL as inputnumber, NULL as [date] from first_table
    union all
    select NULL as ExitDate, NULL as fullname, NULL as outputnumber, inputnumber, [date] from second_table
) t
order by coalesce(ExitDate, [date])
The entire result is then sorted by the combined dates from the ExitDate and date columns.

I think you can have a better table like this:
select *
from (
    select fullname, 0 as io, outputnumber as number, ExitDate as date
    from table1
    union all
    select '-', 1, inputnumber, date
    from table2) t
order by date, io;
fullname | io | number | date
---------+----+--------+-------------
a | 0 | 10 | 2012/01/01
- | 1 | 100 | 2012/01/05
b | 0 | 2 | 2012/01/06
- | 1 | 150 | 2012/01/07
c | 0 | 3 | 2012/01/08
- | 1 | 200 | 2012/01/10
d | 0 | 4 | 2012/01/12

You can get the exact output you want using full outer join:
select t1.*, t2.*
from t1 full outer join
t2
on 1 = 0
order by coalesce(t1.exitdate, t2.date);
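The interleaving by date can be tried out locally with the UNION ALL form, which works on engines without FULL OUTER JOIN. This is a minimal sketch on an in-memory SQLite database from Python rather than SQL Server: the dates are stored as ISO text so that they sort correctly, and the date column is quoted with double quotes instead of square brackets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_table (ExitDate TEXT, fullname TEXT, outputnumber INTEGER)")
conn.execute('CREATE TABLE second_table (inputnumber INTEGER, "date" TEXT)')
conn.executemany(
    "INSERT INTO first_table VALUES (?, ?, ?)",
    [("2012-01-01", "a", 10), ("2012-01-06", "b", 2),
     ("2012-01-08", "c", 3), ("2012-01-12", "d", 4)],
)
conn.executemany(
    "INSERT INTO second_table VALUES (?, ?)",
    [(100, "2012-01-05"), (150, "2012-01-07"), (200, "2012-01-10")],
)

# Pad each side with NULLs for the other table's columns, then sort on
# whichever date column is filled in for the row.
combined = conn.execute("""
    SELECT ExitDate, fullname, outputnumber, inputnumber, "date"
    FROM (
        SELECT ExitDate, fullname, outputnumber, NULL AS inputnumber, NULL AS "date"
        FROM first_table
        UNION ALL
        SELECT NULL, NULL, NULL, inputnumber, "date"
        FROM second_table
    )
    ORDER BY COALESCE(ExitDate, "date")
""").fetchall()

for row in combined:
    print(row)
```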

Sum column in the same hour together and if there is no record for that hour, store value as 0

Say I have a table with these records:
Date |Instances
2015-10-12 06:15:00.000 |2
2015-10-12 06:45:00.000 |5
2015-10-12 04:15:00.000 |2
2015-10-12 04:25:00.000 |3
2015-10-12 03:15:00.000 |5
2015-10-12 02:15:00.000 |6
2015-10-12 01:15:00.000 |6
I managed to sum all instances in the same hour with the following query:
SELECT DATEPART(HOUR,DATE) as TIMEHOUR,
SUM([INSTANCES]) as INSTANCESUM
FROM TABLE (NOLOCK)
where DATE BETWEEN DATEADD(HOUR, -5, GETUTCDATE()) AND GETUTCDATE()
group by DATEPART(HOUR, DATE)
order by DATEPART(HOUR, DATE) DESC
The result is something like this:
TIMEHOUR | INSTANCESUM
6 | 7
4 | 5
3 | 5
2 | 6
1 | 6
But I need the hour to record the instancesum as 0 if there is none. Say the UTC Time is 06:00 the table will return something like this:
TIMEHOUR | INSTANCESUM
6 | 7
---> 5 | 0 <---
4 | 5
3 | 5
2 | 6
1 | 6
Is it possible on ms sql?
[EDIT]
What I need is for the result table to always have the same number of rows, depending on the time range I set. So even if there are no records in that time range, the result table still has that number of rows, just with 0 as the value:
TIMEHOUR | INSTANCESUM
6 | 0
5 | 0
4 | 0
3 | 0
2 | 0
1 | 0

See the code below:
declare @temp table (timehour int, instancesum int)
insert into @temp values (6,7), (4,5), (3,5), (2,6), (1,6) -- Replace values with your original query.
select timehour, instancesum from @temp
union
select l.timehour + 1 as timehour, 0
from
    @temp as l
    left join @temp as r on l.timehour + 1 = r.timehour
where
    l.timehour < (select max(timehour) from @temp) and
    r.timehour is null
Output:
timehour instancesum
1 6
2 6
3 5
4 5
5 0
6 7
I tried this in MS SQL since I have not worked with MySQL. However, most of the logic would remain the same. The only part I am not sure about is creating the temporary table in MySQL. If you can alter that part (if required), you are good.
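The self-join above fills a gap of a single missing hour. To always get one row per hour in the range, including ranges with no records at all, a common pattern is to generate the list of hours first and left-join the data onto it. The sketch below shows that pattern on an in-memory SQLite database from Python; it is not T-SQL, the table name readings and the parameters first_hour and last_hour are hypothetical, and the hour range is hard-coded instead of being derived from GETUTCDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, instances INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2015-10-12 06:15:00.000", 2), ("2015-10-12 06:45:00.000", 5),
        ("2015-10-12 04:15:00.000", 2), ("2015-10-12 04:25:00.000", 3),
        ("2015-10-12 03:15:00.000", 5), ("2015-10-12 02:15:00.000", 6),
        ("2015-10-12 01:15:00.000", 6),
    ],
)

# Build the hours 1..6 recursively, then attach the sums; hours without any
# matching reading get SUM(...) = NULL, which COALESCE turns into 0.
hourly = conn.execute(
    """
    WITH RECURSIVE hours(h) AS (
        SELECT :first_hour
        UNION ALL
        SELECT h + 1 FROM hours WHERE h < :last_hour
    )
    SELECT hours.h AS timehour,
           COALESCE(SUM(r.instances), 0) AS instancesum
    FROM hours
    LEFT JOIN readings AS r
           ON CAST(strftime('%H', r.ts) AS INTEGER) = hours.h
    GROUP BY hours.h
    ORDER BY hours.h DESC
    """,
    {"first_hour": 1, "last_hour": 6},
).fetchall()

for row in hourly:
    print(row)
```

Because the row count comes from the generated hours rather than from the data, an empty readings table still yields six rows, each with a sum of 0.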

To answer the original question, you could left join the table to itself on the hour. Hours with no matching instances in the left join should report their sum as NULL, and you can then wrap the sum in ISNULL (SQL Server's counterpart to Oracle's NVL) to output 0 instead. Something like this:

SELECT
    DATEPART(HOUR, t.DATE) AS TIMEHOUR
  , ISNULL(SUM(ti.[INSTANCES]), 0) AS INSTANCESUM
FROM TABLE AS t WITH (NOLOCK)
LEFT JOIN TABLE AS ti
    ON DATEPART(HOUR, t.DATE) = DATEPART(HOUR, ti.DATE)
WHERE
    t.DATE BETWEEN DATEADD(HOUR, -5, GETUTCDATE()) AND GETUTCDATE()
GROUP BY
    DATEPART(HOUR, t.DATE)
ORDER BY
    DATEPART(HOUR, t.DATE) DESC
To answer your second question in the comments:
You should be able to output the date and time in any format you want using
To_char(date, 'yyyymmdd hh24:mi:ss')
At least that's what would work in Oracle; MySQL syntax may be slightly different.

