Convert rows to columns in SQL in Teradata

I have data which looks like this:
Name   Date        Bal
John   2022-01-01  10
John   2022-01-02  4
John   2022-01-03  7
David  2022-01-01  13
David  2022-01-02  15
David  2022-01-03  20
I want the Bal values placed under one column per date, like this:
Name   2022-01-01  2022-01-02  2022-01-03
John   10          4           7
David  13          15          20
What I tried is
SELECT
NAME,
CASE WHEN DATE= '2022-01-01' THEN EOD_BALANCE ELSE NULL END "01-Jan-22",
CASE WHEN DATE= '2022-01-02' THEN EOD_BALANCE ELSE NULL END "02-Jan-22"
FROM TABL1
but I am not getting the required results.

You want a pivot query here, which means you should aggregate by name and then take the max of the CASE expressions:
SELECT
NAME,
MAX(CASE WHEN DATE = '2022-01-01' THEN EOD_BALANCE END) AS "01-Jan-22",
MAX(CASE WHEN DATE = '2022-01-02' THEN EOD_BALANCE END) AS "02-Jan-22",
MAX(CASE WHEN DATE = '2022-01-03' THEN EOD_BALANCE END) AS "03-Jan-22"
FROM TABL1
GROUP BY NAME;
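To see why the GROUP BY matters, here is a minimal sketch of the same conditional aggregation, run against an in-memory SQLite database from Python rather than against Teradata. The table name tabl1 and the column names dt and eod_balance are stand-ins chosen for the demo (the question itself mixes Bal and EOD_BALANCE), so treat it as an illustration, not the asker's actual schema.

```python
import sqlite3

# In-memory stand-in for the asker's table (column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabl1 (name TEXT, dt TEXT, eod_balance INTEGER)")
conn.executemany(
    "INSERT INTO tabl1 VALUES (?, ?, ?)",
    [
        ("John", "2022-01-01", 10), ("John", "2022-01-02", 4), ("John", "2022-01-03", 7),
        ("David", "2022-01-01", 13), ("David", "2022-01-02", 15), ("David", "2022-01-03", 20),
    ],
)

# Each CASE yields the balance for one date and NULL otherwise;
# MAX ignores the NULLs, so GROUP BY collapses the rows to one per name.
pivot_sql = """
SELECT name,
       MAX(CASE WHEN dt = '2022-01-01' THEN eod_balance END) AS "01-Jan-22",
       MAX(CASE WHEN dt = '2022-01-02' THEN eod_balance END) AS "02-Jan-22",
       MAX(CASE WHEN dt = '2022-01-03' THEN eod_balance END) AS "03-Jan-22"
FROM tabl1
GROUP BY name
ORDER BY name
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

Without the MAX and GROUP BY, every source row produces its own output row with the balance in one column and NULL in the others, which is what the query in the question returns.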

Related

snowflake sql: sum for each day between two dates

I hope someone can help. Suppose I have this table
id  actual_date  target_date  qty
1   2022-01-01   2022-01-01   2
2   2022-01-02   2022-01-01   1
3   2022-01-03   2022-01-01   3
4   2022-01-03   2022-01-02   1
5   2022-01-03   2022-01-03   2
What I would like to calculate is the qty that has to be processed on each date.
E.g. on the target date 2022-01-01 the quota qty is 6 (2+1+3).
On 2022-01-02 I also have to add the qtys that weren't processed the day before: id 2, because its actual date is 2022-01-02 (after the target date), and id 3. The quota qty for 2022-01-02 is then 1+3+1.
For 2022-01-03 it is 6 = 2+1+3, because id 3 has an actual date of 2022-01-03 (it was processed neither on 01-01 nor on 01-02) and id 4 wasn't processed on 01-02.
Here's what the desired output would look like:
target_date  qty_qouta
2022-01-01   6
2022-01-02   4
2022-01-03   6
Hopefully this gets you started. I'd recommend testing many more edge cases; the business rules don't quite feel right to me, as you don't seem to show what happens when actual > target. But I hope this helps.
WITH CTE AS( SELECT 1 ID, '2022-01-01'::DATE ACTUAL_DATE,'2022-01-01'::DATE TARGET_DATE, 2 QTY
UNION ALL SELECT 2 ID, '2022-01-02'::DATE ACTUAL_DATE,'2022-01-01'::DATE TARGET_DATE, 1 QTY
UNION ALL SELECT 3 ID, '2022-01-03'::DATE ACTUAL_DATE,'2022-01-01'::DATE TARGET_DATE, 3 QTY
UNION ALL SELECT 4 ID, '2022-01-03'::DATE ACTUAL_DATE,'2022-01-02'::DATE TARGET_DATE, 1 QTY
UNION ALL SELECT 5 ID, '2022-01-03'::DATE ACTUAL_DATE,'2022-01-03'::DATE TARGET_DATE, 2 QTY
)
,CTE2 AS(SELECT
ACTUAL_DATE D
, SUM(QTY) ACTUAL_QTY
FROM CTE GROUP BY 1)
,CTE3 AS(SELECT
TARGET_DATE D
, SUM(QTY) TARGET_QTY
FROM CTE GROUP BY 1)
SELECT
D DATE
,ACTUAL_QTY
,TARGET_QTY
,TARGET_QTY-ACTUAL_QTY DELTA
,ZEROIFNULL(LAG(DELTA)OVER(PARTITION BY 1 ORDER BY D))GHOST
,GREATEST(TARGET_QTY,DELTA+GHOST,ACTUAL_QTY)VOLIA
FROM
CTE2 FULL OUTER JOIN CTE3 USING(D);
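As a different way to read the question, here is a rough sketch that treats an item as "open" on day d when target_date <= d and actual_date >= d, and sums qty over the open items for each target date. This rule is my interpretation of the question's prose, not something the asker confirmed. It runs in SQLite from Python rather than in Snowflake, and the table name orders is made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, actual_date TEXT, target_date TEXT, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2022-01-01", "2022-01-01", 2),
        (2, "2022-01-02", "2022-01-01", 1),
        (3, "2022-01-03", "2022-01-01", 3),
        (4, "2022-01-03", "2022-01-02", 1),
        (5, "2022-01-03", "2022-01-03", 2),
    ],
)

# Assumed rule: an item counts toward day d if it was due on or before d
# and was processed on d or later. ISO date strings compare correctly as text.
quota_sql = """
SELECT d.target_date, SUM(o.qty) AS qty_quota
FROM (SELECT DISTINCT target_date FROM orders) AS d
JOIN orders AS o
  ON o.target_date <= d.target_date
 AND o.actual_date >= d.target_date
GROUP BY d.target_date
ORDER BY d.target_date
"""
quota_rows = conn.execute(quota_sql).fetchall()
for row in quota_rows:
    print(row)
```

Under this reading 2022-01-02 comes out as 5, which matches the 1+3+1 in the question's explanation but not the 4 in its desired output table, so the rule for that day would need to be confirmed before relying on it.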

Generate multiple rows of new column based on one value of another column

I have a table like below:
ID  Date
1   2022-01-01
2   2022-03-21
I want to add a new column based on the date and it should look like this
ID  Date        NewCol
1   2022-01-01  2022-02-01
1   2022-01-01  2022-03-01
1   2022-01-01  2022-04-01
1   2022-01-01  2022-05-01
2   2022-03-21  2022-04-21
2   2022-03-21  2022-05-21
Let's say that there is an @EndDate = 2022-05-31 (that's where it should stop).
I'm having a hard time trying to figure out how to do it in SSMS. Would appreciate any insights! Thanks :)
In the following solutions we use string_split in combination with replicate to generate the new records.
select ID
,Date
,dateadd(month, row_number() over(partition by ID order by (select null)), Date) as NewCol
from (
select *
from t
outer apply string_split(replicate(',',datediff(month, Date, '2022-05-31')-1),',')
) t
ID  Date        NewCol
1   2022-01-01  2022-02-01
1   2022-01-01  2022-03-01
1   2022-01-01  2022-04-01
1   2022-01-01  2022-05-01
2   2022-03-21  2022-04-21
2   2022-03-21  2022-05-21
For Azure SQL and SQL Server 2022 there is a cleaner solution based on the ordinal output column of string_split.
"The enable_ordinal argument and ordinal output column are currently
supported in Azure SQL Database, Azure SQL Managed Instance, and Azure
Synapse Analytics (serverless SQL pool only). Beginning with SQL
Server 2022 (16.x) Preview, the argument and output column are
available in SQL Server."
select ID
,Date
,dateadd(month, ordinal, Date) as NewCol
from (
select *
from t
outer apply string_split(replicate(',',datediff(month, Date, '2022-05-31')-1),',',1)
) t
with cal (id, dt) as
(
select id, date as dt from t
union all select id, dateadd(month, 1, dt) from cal where month(dt) < month('2022-05-31')
)
select t.id
,t.date
,cal.dt as new_col
from cal join t on t.id = cal.id and t.date != cal.dt
order by id, new_col
id  date        new_col
1   2022-01-01  2022-02-01
1   2022-01-01  2022-03-01
1   2022-01-01  2022-04-01
1   2022-01-01  2022-05-01
2   2022-03-21  2022-04-21
2   2022-03-21  2022-05-21
There are many ways to "explode" a row into a set; the simplest, in my opinion, is a recursive CTE:
DECLARE @endpoint date = '20220531';
DECLARE @prev date = DATEADD(MONTH, -1, @endpoint);
WITH x AS
(
SELECT ID, date, NewCol = DATEADD(MONTH, 1, date) FROM #d
UNION ALL
SELECT ID, date, DATEADD(MONTH, 1, NewCol) FROM x
WHERE NewCol < @prev
)
)
SELECT * FROM x
ORDER BY ID, NewCol;
Working example in this fiddle.
Keep in mind that if you could have more than 100 months you'll need to add OPTION (MAXRECURSION n) with a higher limit, because the default recursion limit is 100 (or just consider using a different solution at scale).
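For anyone who wants to try the recursive idea without a SQL Server instance, here is a small sketch of the same "explode" pattern in SQLite, driven from Python. It is not the T-SQL above: the table d, the columns dt and new_col, and the stop condition (keep going while the next month still falls on or before the end date) are choices made for this demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO d VALUES (?, ?)", [(1, "2022-01-01"), (2, "2022-03-21")]
)

# Anchor: first generated month per row. Recursive step: add one more month
# while the result stays on or before the end date.
# Caveat: SQLite rolls overflowing days forward ('2022-01-31' + 1 month is
# '2022-03-03'), whereas T-SQL DATEADD clamps to the month end.
explode_sql = """
WITH RECURSIVE x(id, dt, new_col) AS (
    SELECT id, dt, date(dt, '+1 month')
    FROM d
    WHERE date(dt, '+1 month') <= :endpoint
    UNION ALL
    SELECT id, dt, date(new_col, '+1 month')
    FROM x
    WHERE date(new_col, '+1 month') <= :endpoint
)
SELECT id, dt, new_col FROM x ORDER BY id, new_col
"""
exploded = conn.execute(explode_sql, {"endpoint": "2022-05-31"}).fetchall()
for row in exploded:
    print(row)
```

With the two sample rows this produces the six rows shown in the question's desired output.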

How to fill missing values for missing dates with value from date before in sql bigquery? [duplicate]

Hi, I have a product table with daily prices. The catch is that the table is only updated when there's a price change; the dates in between are not written to the table because the price is the same as the day before.
How do I fill in the missing dates with the price from the last earlier entry?
date        id  price
2022-01-01  1   5
2022-01-03  1   6
2022-01-05  1   7
2022-01-01  2   10
2022-01-02  2   11
2022-01-06  2   12
into
date        id  price
2022-01-01  1   5
2022-01-02  1   5
2022-01-03  1   6
2022-01-04  1   6
2022-01-05  1   7
2022-01-01  2   10
2022-01-02  2   11
2022-01-03  2   11
2022-01-04  2   11
2022-01-05  2   11
2022-01-06  2   12
I am currently thinking of creating a date table, joining to it, and using the lag function. Can anyone help?
select
date,id,
case
when price is null then nullPrice
else price
end as price
from(
select *,
Lag(price, 1) OVER(
ORDER BY date,id ASC) AS nullPrice
from price_table
join date_table using(date)
)
Consider below:
WITH days_by_id AS (
SELECT id, GENERATE_DATE_ARRAY(MIN(date), MAX(date)) days
FROM sample
GROUP BY id
)
SELECT date, id,
IFNULL(price, LAST_VALUE(price IGNORE NULLS) OVER (PARTITION BY id ORDER BY date)) AS price
FROM days_by_id, UNNEST(days) date LEFT JOIN sample USING (id, date);
You can use the generate_date_array function for this:
with date_arr
as(
select *
from unnest(generate_date_array('2022-01-01', '2022-05-01')) as dt)
select da.dt, t1.*
from date_arr da
left outer join table1 t1
on da.dt = t1.dt
You can replace hardcoded dates with max and min date from table.
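The gap-filling idea from the answers above can be tried locally with a rough SQLite sketch run from Python. SQLite has neither GENERATE_DATE_ARRAY nor IGNORE NULLS, so a recursive CTE stands in for the date array and a correlated subquery stands in for LAST_VALUE(... IGNORE NULLS). That is a swapped-in technique, not the BigQuery query itself, and the column name dt is an assumption used in place of date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (dt TEXT, id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        ("2022-01-01", 1, 5), ("2022-01-03", 1, 6), ("2022-01-05", 1, 7),
        ("2022-01-01", 2, 10), ("2022-01-02", 2, 11), ("2022-01-06", 2, 12),
    ],
)

# days: one row per id per calendar day between that id's first and last date.
# The scalar subquery picks the most recent price on or before each day.
fill_sql = """
WITH RECURSIVE days(id, dt, last_dt) AS (
    SELECT id, MIN(dt), MAX(dt) FROM sample GROUP BY id
    UNION ALL
    SELECT id, date(dt, '+1 day'), last_dt FROM days WHERE dt < last_dt
)
SELECT d.dt, d.id,
       (SELECT s.price
          FROM sample AS s
         WHERE s.id = d.id AND s.dt <= d.dt
         ORDER BY s.dt DESC
         LIMIT 1) AS price
FROM days AS d
ORDER BY d.id, d.dt
"""
filled = conn.execute(fill_sql).fetchall()
for row in filled:
    print(row)
```

On the sample data this yields 11 rows, with the missing days carrying the previous price forward per id.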

Select start and end dates for changing values in SQL

I have a database with accounts and historical status changes
select Date, Account, OldStatus, NewStatus from HistoricalCodes
order by Account, Date
Date        Account  OldStatus  NewStatus
2020-01-01  12345    1          2
2020-10-01  12345    2          3
2020-11-01  12345    3          2
2020-12-01  12345    2          1
2020-01-01  54321    2          3
2020-09-01  54321    3          2
2020-12-01  54321    2          3
For every account I need to determine the Start Date and End Date of each period when Status = 2. An additional challenge is that the status can change back and forth multiple times. Is there a way in SQL to create something like this for at least the first two timeframes when the account was in status 2? Any ideas?
Account  StartDt_1   EndDt_1     StartDt_2   EndDt_2
12345    2020-01-01  2020-10-01  2020-11-01  2020-12-01
54321    2020-09-01  2020-12-01
I would suggest putting this information in separate rows:
select t.*
from (select account, newstatus, date as startdate,
lead(date) over (partition by account order by date) as enddate
from t
) t
where newstatus = 2;
This produces a separate row for each period when an account has a status of 2. This is better than putting the dates in separate pairs of columns, because you do not need to know the maximum number of periods of status = 2 when you write the query.
For a fixed maximum of status changes per account, you can use window functions and conditional aggregation:
select account,
max(case when rn = 1 then date end) as start_dt1,
max(case when rn = 1 then lead_date end) as end_dt1,
max(case when rn = 2 then date end) as start_dt2,
max(case when rn = 2 then lead_date end) as end_dt2
from (
select t.*,
row_number() over(partition by account, newstatus order by date) as rn,
lead(date) over(partition by account order by date) as lead_date
from mytable t
) t
where newstatus = 2
group by account
You can extend the select clause with more conditional expressions to handle more possible ranges per account.
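Both approaches above (one row per period, and the fixed-width pivot) can be checked with a small sketch in SQLite via Python. It assumes SQLite 3.25 or later for window functions, and the names historical_codes, dt, old_status and new_status are stand-ins for the question's table and columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE historical_codes "
    "(dt TEXT, account INTEGER, old_status INTEGER, new_status INTEGER)"
)
conn.executemany(
    "INSERT INTO historical_codes VALUES (?, ?, ?, ?)",
    [
        ("2020-01-01", 12345, 1, 2), ("2020-10-01", 12345, 2, 3),
        ("2020-11-01", 12345, 3, 2), ("2020-12-01", 12345, 2, 1),
        ("2020-01-01", 54321, 2, 3), ("2020-09-01", 54321, 3, 2),
        ("2020-12-01", 54321, 2, 3),
    ],
)

# One row per status-2 period: the end date is the date of the next change.
periods_sql = """
SELECT account, start_date, end_date
FROM (
    SELECT account, new_status, dt AS start_date,
           LEAD(dt) OVER (PARTITION BY account ORDER BY dt) AS end_date
    FROM historical_codes
) AS t
WHERE new_status = 2
ORDER BY account, start_date
"""
periods = conn.execute(periods_sql).fetchall()

# Same periods pivoted into at most two start/end pairs per account.
pivot_sql = """
SELECT account,
       MAX(CASE WHEN rn = 1 THEN dt END)      AS start_dt1,
       MAX(CASE WHEN rn = 1 THEN lead_dt END) AS end_dt1,
       MAX(CASE WHEN rn = 2 THEN dt END)      AS start_dt2,
       MAX(CASE WHEN rn = 2 THEN lead_dt END) AS end_dt2
FROM (
    SELECT h.*,
           ROW_NUMBER() OVER (PARTITION BY account, new_status ORDER BY dt) AS rn,
           LEAD(dt) OVER (PARTITION BY account ORDER BY dt) AS lead_dt
    FROM historical_codes AS h
) AS t
WHERE new_status = 2
GROUP BY account
ORDER BY account
"""
pivoted = conn.execute(pivot_sql).fetchall()
print(periods)
print(pivoted)
```

Note that LEAD has to run before the filter on status 2, which is why it sits in the inner query: the end of a period is a row whose new status is something else.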

Counting employees from one job level to another

I have a snapshot of a dataset as follows:
effective_date hire_date name job_level direct_report
01.01.2018 01.01.2018 xyz 5 null
01.02.2018 01.01.2018 xyz 5 null
01.03.2018 01.01.2018 xyz 5 null
01.04.2018 01.01.2018 xyz 6 null
01.05.2018 01.01.2018 xyz 6 null
01.01.2018 01.02.2018 abc 5 null
01.02.2018 01.02.2018 abc 5 null
01.03.2018 01.02.2018 abc 5 null
01.04.2018 01.02.2018 abc 5 null
01.05.2018 01.02.2018 abc 5 null
Effective date gives an overview of the info for each employee on a daily basis.
Hire date is the date when an employee was hired.
Job level is the level at which the employee stands on that particular day.
I want to find out how many employees moved (were promoted) from level 5 to level 6 during this overall period.
Here is one method that uses two levels of aggregation. You can get the employees that were promoted by comparing the minimum date for "5" to the maximum date of "6":
select name
from t
where job_level in (5, 6)
group by name
having min(case when job_level = 5 then effective_date end) < max(case when job_level = 6 then effective_date end);
To count them:
select count(*)
from (select name
from t
where job_level in (5, 6)
group by name
having min(case when job_level = 5 then effective_date end) < max(case when job_level = 6 then effective_date end)
) x;
Alternatively, you can use lag():
select count(distinct name)
from (select t.*, lag(job_level) over (partition by name order by effective_date) as prev_job_level
from t
) t
where prev_job_level = 5 and job_level = 6;
The two are subtly different, but within the range of the ambiguity of the question. For instance, the first would count 5 --> 4 --> 6, the second would not.
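To make that difference concrete, here is a small sketch in SQLite via Python (SQLite 3.25+ assumed for lag). The dates are rewritten in ISO format so that text comparison orders them correctly, and the employee pqr with the path 5 --> 4 --> 6 is a made-up row added only to show where the two queries disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (effective_date TEXT, name TEXT, job_level INTEGER)")
months = ["2018-01-01", "2018-02-01", "2018-03-01", "2018-04-01", "2018-05-01"]
rows = (
    [(d, "xyz", lvl) for d, lvl in zip(months, [5, 5, 5, 6, 6])]
    + [(d, "abc", 5) for d in months]
    # Hypothetical employee, not in the question: goes 5 -> 4 -> 6.
    + [(d, "pqr", lvl) for d, lvl in zip(months[:3], [5, 4, 6])]
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Approach 1: earliest day at level 5 precedes latest day at level 6.
minmax_sql = """
SELECT COUNT(*)
FROM (SELECT name
      FROM t
      WHERE job_level IN (5, 6)
      GROUP BY name
      HAVING MIN(CASE WHEN job_level = 5 THEN effective_date END)
           < MAX(CASE WHEN job_level = 6 THEN effective_date END)) AS x
"""
by_minmax = conn.execute(minmax_sql).fetchone()[0]

# Approach 2: a level-6 row whose immediately preceding row was level 5.
lag_sql = """
SELECT COUNT(DISTINCT name)
FROM (SELECT t.*,
             LAG(job_level) OVER (PARTITION BY name ORDER BY effective_date)
                 AS prev_job_level
      FROM t) AS s
WHERE prev_job_level = 5 AND job_level = 6
"""
by_lag = conn.execute(lag_sql).fetchone()[0]
print(by_minmax, by_lag)
```

Here the min/max version counts both xyz and pqr, while the lag() version counts only xyz, because pqr's level-6 row is preceded by level 4.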
You can try this:
select count(distinct name) from employees e1
WHERE effective_date between '01.01.2018' and '01.05.2018'
And job_level = 5
and EXISTS (select * from employees e2 where e1.name = e2.name
and e2.effective_date > e1.effective_date
and e2.job_level = 6
)