Fill values "down" when pivoting - sql

I'm doing a PIVOT command. My row label is a date field. My columns are locations like [NY], [TX], etc. Some of the values from the source data are null, but once it's pivoted I'd like to "fill down" those nulls with the last known value in date order.
That is, if column NY has a value for 1/1/2010 but null for 1/2/2010, I want to fill down the value from 1/1/2010 to 1/2/2010, and to any other null dates below it until another value already exists. So basically I'm filling in the null gaps with the data from the closest earlier date that has data, for each of the columns.
An example of my pivot query I currently have is:
SELECT ReadingDate, [NY],[TX],[WI]
FROM
    (SELECT NAME As 'NodeName',
            CAST(FORMAT(readingdate, 'M/d/yyyy') as Date) As 'ReadingDate',
            myvalue As 'Value'
     FROM MyTable) as SourceData
PIVOT (SUM(Value) FOR NodeName IN ([NY],[TX],[WI])) as PivotTable
ORDER BY ReadingDate
But I'm not sure how to do this "fill down" to fill in the null values.
Sample source data
1/1/2010, TX, 1
1/1/2010, NY, 5
1/2/2010, NY, null
1/1/2010, WI, 3
1/3/2010, WI, 7
...
Notice how there is no WI row for 1/2 and no NY row for 1/3, which would result in nulls in the pivot result. There is also an explicit null record, which also results in a null. For NY, once pivoted, 1/2 needs to be filled in with 5 because it's the last known value; 1/3 also needs to be filled in with 5, because even though that record doesn't exist at all, it still shows up as a null in the pivot since another location has a row for that date.
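For anyone who wants to reproduce this, here is a minimal setup for the sample rows above (column names are taken from the query in the question; the data types are assumptions):
CREATE TABLE MyTable (readingdate datetime, NAME varchar(10), myvalue int);
INSERT INTO MyTable (readingdate, NAME, myvalue) VALUES
('2010-01-01', 'TX', 1),
('2010-01-01', 'NY', 5),
('2010-01-02', 'NY', NULL),
('2010-01-01', 'WI', 3),
('2010-01-03', 'WI', 7);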

This can be a pain in SQL Server. ANSI SQL supports a nice option on LAG() called IGNORE NULLS, but SQL Server doesn't (yet) support it. I would start with conditional aggregation (personal preference):
select cast(readingdate as date) as readingdate,
       sum(case when name = 'NY' then value end) as NY,
       sum(case when name = 'TX' then value end) as TX,
       sum(case when name = 'WI' then value end) as WI
from mytable
group by cast(readingdate as date);
That still leaves the NULL gaps, though, and without IGNORE NULLS we have to be a bit more clever. We can assign the NULL values into groups based on the number of non-NULL values before them. Fortunately, this is easy to do using a cumulative COUNT() function. Then, we can get the one non-NULL value in each group by using MAX() (or MIN()):
with t as (
      select cast(readingdate as date) as readingdate,
             sum(case when name = 'NY' then value end) as NY,
             sum(case when name = 'TX' then value end) as TX,
             sum(case when name = 'WI' then value end) as WI
      from mytable
      group by cast(readingdate as date)
     ),
     t2 as (
      select t.*,
             count(NY) over (order by readingdate) as NYgrp,
             count(TX) over (order by readingdate) as TXgrp,
             count(WI) over (order by readingdate) as WIgrp
      from t
     )
select readingdate,
       coalesce(NY, max(NY) over (partition by NYgrp)) as NY,
       coalesce(TX, max(TX) over (partition by TXgrp)) as TX,
       coalesce(WI, max(WI) over (partition by WIgrp)) as WI
from t2;
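For completeness: if you are on a version that has since added the ANSI IGNORE NULLS option mentioned above (SQL Server 2022 and later do, as far as I know), the fill-down reduces to LAST_VALUE ... IGNORE NULLS over a running frame. A sketch built on the same aggregation CTE:
with t as (
      select cast(readingdate as date) as readingdate,
             sum(case when name = 'NY' then value end) as NY,
             sum(case when name = 'TX' then value end) as TX,
             sum(case when name = 'WI' then value end) as WI
      from mytable
      group by cast(readingdate as date)
     )
select readingdate,
       last_value(NY) ignore nulls over (order by readingdate rows unbounded preceding) as NY,
       last_value(TX) ignore nulls over (order by readingdate rows unbounded preceding) as TX,
       last_value(WI) ignore nulls over (order by readingdate rows unbounded preceding) as WI
from t;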

Related

Fill null value by previous value and group by Postgresql

I have a table and I want to fill the null values with the previous value in date order, but there is also a group column.
For example:
Table X:
Date        Group   value
1/1/2023    A       null
2/1/2023    A       Kevin
3/1/2023    A       null
4/1/2023    A       Tom
5/1/2023    A       null
6/1/2023    A       null
1/1/2023    B       Sara
2/1/2023    B       null
So I want to group by the Group column and fill the null values of the value column. There can be many group values, and the date is unique per group. I want the result to look like this:
Date        Group   value
1/1/2023    A       null
2/1/2023    A       Kevin
3/1/2023    A       Kevin
4/1/2023    A       Tom
5/1/2023    A       Tom
6/1/2023    A       Tom
1/1/2023    B       Sara
2/1/2023    B       Sara
How can I do it in PostgreSQL? Please help me.
I have tried, but I really don't know how to do it; I'm just a newbie.
If you can have more than one NULL value consecutively, the LAG function won't help you much. A generalized solution would use:
the COUNT window function to generate a partitioning of each non-null value and the consecutive null values after it
the MAX window function to reassign the NULL values.
WITH cte AS (
    SELECT *,
           COUNT(CASE WHEN value_ IS NOT NULL THEN 1 END) OVER(
               PARTITION BY Group_
               ORDER BY Date_
           ) AS rn
    FROM tab
)
SELECT Date_, Group_, MAX(value_) OVER(PARTITION BY group_, rn) AS value_
FROM cte
ORDER BY group_, Date_
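A rough setup for trying this out, using the renamed columns from the query above (table name tab, columns Date_, Group_, value_; the data types and the day-first date interpretation are assumptions):
CREATE TABLE tab (Date_ date, Group_ text, value_ text);
-- assuming the sample dates are 1..6 January 2023
INSERT INTO tab (Date_, Group_, value_) VALUES
('2023-01-01', 'A', NULL),
('2023-01-02', 'A', 'Kevin'),
('2023-01-03', 'A', NULL),
('2023-01-04', 'A', 'Tom'),
('2023-01-05', 'A', NULL),
('2023-01-06', 'A', NULL),
('2023-01-01', 'B', 'Sara'),
('2023-01-02', 'B', NULL);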
If the input data is always in this form, we can use GREATEST and LAG:
SELECT
xdate,
xgroup,
GREATEST(xvalue, LAG(xvalue) OVER()) AS xvalue
FROM X
ORDER BY xgroup, xdate;
GREATEST returns the highest of the two (or more) values that is NOT NULL; LAG selects the value from the previous row.
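For example, a quick ad-hoc check of that behavior (not part of the original answer):
SELECT GREATEST('Kevin', NULL);  -- returns 'Kevin'; NULL arguments are ignored in PostgreSQL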
If this is not sufficient in your scenario due to possible more complex input data, please edit your question to add further cases which should be covered.
In this answer, the columns were renamed by adding an x because the original names are SQL keywords and should be avoided if possible.

SQL query to allow for latest datasets per items

I have this table in a SQL Server database, and I would like a query that gives me the values of cw1, cw2, cw3 for a restricted date condition.
Specifically, I would like a query giving me the "latest" values of cw1, cw2, cw3, falling back to previous values of cw1, cw2, cw3 when they are null for the last plan_date. This would be with a date condition.
So if the condition is plan_date between "02.01.2020" and "04.01.2020" then the result should be
item_nr  plan_date   cw1   cw2  cw3
1        04.01.2020  null  9    4
2        03.01.2020  30    15   2
where, for example, the "30" is from the last previous date for item_nr 2.
You can get the last value using first_value(). Unfortunately, that is a window function, but select distinct solves that:
select distinct item_nr,
first_value(cw1) over (partition by item_nr
order by (case when cw1 is not null then 1 else 2 end), plan_date desc
) as imputed_cw1,
first_value(cw2) over (partition by item_nr
order by (case when cw2 is not null then 1 else 2 end), plan_date desc
) as imputed_cw2,
first_value(cw3) over (partition by item_nr
order by (case when cw3 is not null then 1 else 2 end), plan_date desc
) as imputed_cw3
from t;
You can add a where clause after the from.
The first_value() window function returns the first value from each partition. The partition is ordered to put the non-NULL values first, and then order by time descending. So, the most recent non-NULL value is first.
The only downside is that it is a window function, so the select distinct is needed to get the most recent value for each item_nr.
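With the date restriction from the question added, the query might look like this (a sketch; the table name t and the ISO-formatted date literals are assumptions, and cw2/cw3 are elided for brevity):
select distinct item_nr,
       first_value(cw1) over (partition by item_nr
                              order by (case when cw1 is not null then 1 else 2 end), plan_date desc
                             ) as imputed_cw1
       -- imputed_cw2 and imputed_cw3 exactly as in the query above
from t
where plan_date between '2020-01-02' and '2020-01-04';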

Is there an SQL to show only the value that got changed from the previous row for all columns?

I have a history table that keeps all the values; whenever a value changes, a new row is added. This is the example:
name    mod_date                 create_user_id  is_active  other_column
name1   2020-01-06 22:06:58+00   1               1          value1
name1   2020-01-06 22:07:01+00   2               1          value2
name2   2020-01-06 22:07:27+00   1               1          value2
Then after a query I want the result to be like this
name    mod_date                 create_user_id  is_active  other_column
name1   2020-01-06 22:06:58+00   1               1          value1
        2020-01-06 22:07:01+00   2                          value2
name2   2020-01-06 22:07:27+00   1
The point I'm trying to make is that this makes it easier to check which values got changed after a given timestamp. The first row will always be shown in full, as it's the baseline. Then the next row only shows create_user_id and other_column, as those changed from 1 to 2 and from value1 to value2.
is_active would just be empty as it never changed.
I read about lag and partition, but it seems like they only work for one column, and I want to check every column.
This is my example
select name
from (select h.*,
lag(name) over(partition by h.id order by h.mod_date) lag_name
from history h
) h
where name <> lag_name;
You are going to have to list all the columns, but I think the simplest expression is probably:
select nullif(name, lag(name) over (order by mod_date)) as name,
mod_date,
nullif(create_user_id, lag(create_user_id) over (order by mod_date)) as create_user_id,
nullif(is_active, lag(is_active) over (order by mod_date)) as is_active,
nullif(other_column, lag(other_column) over (order by mod_date)) as other_column
from t
order by mod_date;
You can construct the query for any audit table using information_schema.columns to get all the columns in the table.
This assumes that the column values are not NULL -- and if they are, then your results are ambiguous anyway.
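As a sketch of that idea (hypothetical and PostgreSQL-flavored; the table name history and the mod_date ordering column are assumptions), you could generate the select list from the catalog:
SELECT string_agg(
         format('nullif(%1$I, lag(%1$I) over (order by mod_date)) as %1$I', column_name),
         ', ' ORDER BY ordinal_position
       ) AS select_list
FROM information_schema.columns
WHERE table_name = 'history'
  AND column_name <> 'mod_date';   -- keep mod_date itself unwrapped, as in the query above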
If you don't have too many columns, you can use lag separately to get the previous value of each column. You can also use a named window (alias), since you will be running this over the same window, to make the query easier to read.
select name
from (select h.*,
lag(name) over main_window as lag_name,
lag(create_user_id) over main_window as prev_user_id,
lag(is_active) over main_window as prev_is_active,
lag(other_column) over main_window as prev_other_column
from history h
window main_window as (partition by h.id order by h.mod_date)
) h
where name <> lag_name
or create_user_id <> prev_user_id
or is_active <> prev_is_active
or other_column <> prev_other_column
For each column, use an expression like
CASE WHEN lag(col) OVER (ORDER BY mod_date) IS DISTINCT FROM col THEN col END
That will produce NULL if the column value is the same for neighboring rows.
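Applied to every column of the sample table, that looks roughly like this (a sketch; the table name history comes from the query in the question):
SELECT CASE WHEN lag(name) OVER (ORDER BY mod_date) IS DISTINCT FROM name THEN name END AS name,
       mod_date,
       CASE WHEN lag(create_user_id) OVER (ORDER BY mod_date) IS DISTINCT FROM create_user_id THEN create_user_id END AS create_user_id,
       CASE WHEN lag(is_active) OVER (ORDER BY mod_date) IS DISTINCT FROM is_active THEN is_active END AS is_active,
       CASE WHEN lag(other_column) OVER (ORDER BY mod_date) IS DISTINCT FROM other_column THEN other_column END AS other_column
FROM history
ORDER BY mod_date;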

TSQL - comparing grouped values within a table

I need to compare grouped data to look for shifts in a calculated value. The output of my current SQL looks something like this...
Grp_ID_1  Metric   State  Value
A         Metric1  OH     50
B         Metric1  OH     65
A         Metric1  CA     20
B         Metric1  CA     35
In the example above, I need to calculate the difference between the A-Metric1-OH value of 50 and the B-Metric1-OH value of 65.
You can use LEAD to calculate the difference between rows within each Metric/State group:
SELECT Grp_ID_1, Metric, State, Value,
       LEAD(Value, 1, 0) OVER (PARTITION BY Metric, State ORDER BY Grp_ID_1) AS NextValue,
       Value - LEAD(Value, 1, 0) OVER (PARTITION BY Metric, State ORDER BY Grp_ID_1) AS ValueDif
FROM yourTable
SELECT t.grp_ID_1, t.metric, t.state, t.value,
       (SELECT MAX(t2.value)
        FROM tablename t2
        WHERE t2.metric = t.metric
          AND t2.state = t.state) - t.value AS Difference
FROM tablename t
WHERE t.state = 'OH'
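For anyone who wants to experiment with either answer, here is a throwaway setup matching the sample rows above (the table name yourTable comes from the first answer; the data types are assumptions):
CREATE TABLE yourTable (Grp_ID_1 char(1), Metric varchar(20), State char(2), Value int);
INSERT INTO yourTable (Grp_ID_1, Metric, State, Value) VALUES
('A', 'Metric1', 'OH', 50),
('B', 'Metric1', 'OH', 65),
('A', 'Metric1', 'CA', 20),
('B', 'Metric1', 'CA', 35);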

more efficiently pivot rows

I am trying to join multiple tables together. One of the tables I am trying to join has hundreds of rows of data per ID. I am trying to pivot about 100 rows for each ID into columns. The value I am trying to use isn't always in the same column. Below is an example (my real table has hundreds of rows per ID). AccNum for ID 1, for example, may be in the NumV column, but for ID 2 it may be in the CharV column.
ID  QType   CharV     NumV
1   AccNum            10
1   EmpNam  John Inc  0
1   UW      Josh      0
2   AccNum  11
2   EmpNam  CBS       0
2   UW      Dan       0
The original code I used was a select statement with hundreds of lines like the one below:
Max(Case When PM.[QType] = 'AccNum' Then NumV End) as AccNum
This code with hundreds of lines completed in just under 10 min. The problem, however, is that it only pulls in values from the column I specify, so I will always lose the data that is in the other column. (In the example above I would get AccNum 10, but not AccNum 11, because it's in the CharV column.)
I updated the code to use a pivot:
;with CTE
As
(
    Select [PMID], [QType],
           Value = concat(Nullif([CharV],''), Nullif([NumV],0))
    From [DBase].[dbo].[PM]
)
Select C.[ID] AS M_ID
,Max(c.[AccNum]) As AcctNum
,Max(c.[EmpNam]) As EmpName
and so on...
I then select all of my hundreds of rows and then pivot the data:
from CTE
pivot (max(Value) for [QType] in ([AccNum],[EmpNam],(more rows)))As c
The problem with this code, however, is that it takes almost 2 hours to run.
Is there a different, more efficient solution to what I am trying to accomplish? I need to have the speed of the first code, but the result of the second.
Perhaps you can reduce the Concat/NullIf processing by using a UNION ALL
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
For the conditional aggregation approach
No need to worry about which field; just reference Value:
Select [ID]
,[Accnum] = Max(Case When [QType] = 'AccNum' Then Value End)
,[EmpNam] = Max(Case When [QType] = 'EmpNam' Then Value End)
,[UW] = Max(Case When [QType] = 'UW' Then Value End)
From (
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
) A
Group By ID
For the PIVOT approach
Select [ID],[AccNum],[EmpNam],[UW]
From (
Select ID,QType,Value=CharV From #YourTable where CharV>''
Union All
Select ID,QType,Value=cast(NumV as varchar(25)) From #YourTable where NumV>0
) A
Pivot (max([Value]) For [QType] in ([AccNum],[EmpNam],[UW])) p
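To try these against the sample rows in the question, a rough definition of #YourTable (the data types and the empty-string/zero placeholders are assumptions):
CREATE TABLE #YourTable (ID int, QType varchar(20), CharV varchar(50), NumV decimal(18, 2));
INSERT INTO #YourTable (ID, QType, CharV, NumV) VALUES
(1, 'AccNum', '',         10),
(1, 'EmpNam', 'John Inc',  0),
(1, 'UW',     'Josh',      0),
(2, 'AccNum', '11',        0),
(2, 'EmpNam', 'CBS',       0),
(2, 'UW',     'Dan',       0);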