First non-null value (ordered) aggregate function - SQL

Given the following table in GBQ
Element, tmed, ingestion_time
Item1, 10.0, 2023-01-01
Item1, 11.0, 2023-01-02
Item2, null, 2023-01-02
Item2, 20.0, 2023-01-03
Item3, 21.0, 2023-01-03
Item3, null, 2023-01-04
Item4, null, 2023-01-04
Item4, null, 2023-01-05
I would like to retrieve, for each Element, the latest non-null value (the one with the latest ingestion_time). That would produce the following result:
Element, tmed, ingestion_time
Item1, 11.0, 2023-01-02
Item2, 20.0, 2023-01-03
Item3, 21.0, 2023-01-03
Item4, null, 2023-01-05
For this purpose, I was using the aggregate function ANY_VALUE, which, even if the documentation does not show it very clearly, takes the first non-null value (check the discussion here). Nevertheless, it takes the first non-null value independently of the DATETIME field ingestion_time.
I tried different ORDER BY options, but with no success.

You can use the ROW_NUMBER window function inside a QUALIFY clause as follows, by:
partitioning on your elements
ordering on tmed IS NULL (FALSE sorts before TRUE, so this pushes your null values down), then ingestion_time DESC (pulls your latest dates up)
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY Element ORDER BY tmed IS NULL, ingestion_time DESC) = 1

Try using the ROW_NUMBER function as in the following:
select element, tmed, ingestion_time
from
(
select *,
row_number() over (partition by element order by case when tmed is not null then 1 else 2 end, ingestion_time desc) rn
from table_name
) T
where rn = 1

Both solutions are simple and effective. Nevertheless, in order to generalize this to more fields, and not only to tmed, I found the following solution:
WITH overwritten_original_table AS (
SELECT * EXCEPT(tmed),
FIRST_VALUE(tmed IGNORE NULLS) OVER (PARTITION BY element ORDER BY ingestion_time DESC) AS tmed
-- Here, you can add more fields with the same FIRST_VALUE logic
FROM original_table
)
SELECT
element,
ANY_VALUE(tmed) AS tmed,
-- Here, you can add more fields with the ANY_VALUE logic
MAX(ingestion_time) AS ingestion_time
FROM overwritten_original_table
GROUP BY element
As this is a solution intended for more than one field, I just took the maximum ingestion_time, but you can modify it to get an ingestion_time for every field.
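For illustration, here is a minimal sketch of that generalization, assuming a second, hypothetical column tmax alongside tmed:
WITH overwritten_original_table AS (
  SELECT * EXCEPT(tmed, tmax),
    -- same FIRST_VALUE logic, repeated once per field
    FIRST_VALUE(tmed IGNORE NULLS) OVER (PARTITION BY element ORDER BY ingestion_time DESC) AS tmed,
    FIRST_VALUE(tmax IGNORE NULLS) OVER (PARTITION BY element ORDER BY ingestion_time DESC) AS tmax
  FROM original_table
)
SELECT
  element,
  ANY_VALUE(tmed) AS tmed,
  ANY_VALUE(tmax) AS tmax,
  MAX(ingestion_time) AS ingestion_time
FROM overwritten_original_table
GROUP BY element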

Related

IF with timestamp bigquery

I need to add an attribute that indicates whether each version is an original or a copy. If it is the first version of the site, it is an original; if it is not, it is a copy.
The table:
id_site id_version timestamp_version
1 5589 2/3/2022
1 2030 10/7/2022
1 1560 10/8/2022
2 6748 2/3/2022
2 7890 2/4/2022
3 4532 2/3/2022
The expected result:
id_site id_version timestamp_version type_version
1 5589 2/3/2022 original
1 2030 10/7/2022 copy
1 1560 10/8/2022 copy
2 6748 2/3/2022 original
2 7890 2/4/2022 copy
3 4532 2/3/2022 original
You can use an IF or a CASE here. They are mostly interchangeable, but my preference is CASE, since it's portable to nearly any other RDBMS, whereas IF is supported in only a few.
CASE WHEN ROW_NUMBER() OVER (PARTITION BY id_site ORDER BY timestamp_version ASC) = 1 THEN 'original' ELSE 'copy' END
Inside the CASE expression, the ROW_NUMBER() window function partitions the result set by id_site and numbers the records for each distinct id_site sequentially, ordered by timestamp_version in ascending order. We test whether that ROW_NUMBER() is 1 and label the row original or copy accordingly.
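For context, here is a minimal full-query sketch using that expression (site_versions is a hypothetical table name):
SELECT
  id_site,
  id_version,
  timestamp_version,
  CASE
    WHEN ROW_NUMBER() OVER (PARTITION BY id_site ORDER BY timestamp_version ASC) = 1 THEN 'original'
    ELSE 'copy'
  END AS type_version
FROM site_versions  -- hypothetical table name
ORDER BY id_site, timestamp_version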
You can use a window function in an if statement for that:
with test as (
select * from unnest([
struct(1 as id_site, 5589 as id_version, timestamp(date "2022-03-02") as timestamp_version),
(1, 2030, timestamp(date "2022-07-10")),
(1, 1560, timestamp(date "2022-08-10")),
(2, 6748, timestamp(date "2022-03-02")),
(2, 7890, timestamp(date "2022-04-02")),
(3, 4532, timestamp(date "2022-03-02"))
])
)
select
*,
IF(timestamp_version = min(timestamp_version) over (partition by id_site), "original", "copy") AS type_version
from test
Consider the below option:
select *,
if(lag(id_version) over prev is null, 'original', 'copy') type_version
from your_table
window prev as (partition by id_site order by timestamp_version)
If applied to the sample data in your question, the output matches the expected result above.

Fill values "down" when pivoting

I'm doing a PIVOT command. My row label is a date field. My columns are locations like [NY], [TX], etc. Some of the values from the source data are null, but once it's pivoted I'd like to "fill down" those nulls with the last known value in date order.
That is, if column NY has a value for 1/1/2010 but null for 1/2/2010, I want to fill down the value from 1/1/2010 to 1/2/2010, and to any other null dates below, until another value already exists. So basically I'm filling in the null gaps with the data from the closest earlier date that has data, for each of the columns.
An example of my pivot query I currently have is:
SELECT ReadingDate, [NY],[TX],[WI]
FROM
(SELECT NAME As 'NodeName',
CAST(FORMAT(readingdate, 'M/d/yyyy') as Date) As 'ReadingDate',
myvalue As 'Value'
FROM MyTable) as SourceData
PIVOT (SUM(Value) FOR NodeName IN ([NY],[TX],[WI])) as PivotTable
ORDER BY ReadingDate
But I'm not sure how to do this "fill down" to fill in the null values.
Sample source data
1/1/2010, TX, 1
1/1/2010, NY, 5
1/2/2010, NY, null
1/1/2010, WI, 3
1/3/2010, WI, 7
...
Notice how there is no WI row for 1/2 and no NY row for 1/3, which would result in nulls in the pivot result. There is also an explicit null record, also resulting in a null. For NY, once pivoted, 1/2 needs to be filled in with 5 because it's the last known value; 1/3 also needs to be filled in with 5, since that record doesn't even exist in the source but shows up as a null in the pivot because another location has a record for that date.
This can be a pain in SQL Server. ANSI SQL supports a nice feature on LAG(), called IGNORE NULLS, but SQL Server doesn't (yet) support it. I would start by using conditional aggregation (personal preference):
select cast(readingdate as date) as readingdate,
sum(case when name = 'NY' then value end) as NY,
sum(case when name = 'TX' then value end) as TX,
sum(case when name = 'WI' then value end) as WI
from mytable
group by cast(readingdate as date);
So, we have to be a bit more clever. We can assign the NULL values into groups based on the number of non-NULL values before them. Fortunately, this is easy to do using a cumulative COUNT() function. Then, we can get the one non-NULL value in this group by using MAX() (or MIN()):
with t as (
select cast(readingdate as date) as readingdate,
sum(case when name = 'NY' then value end) as NY,
sum(case when name = 'TX' then value end) as TX,
sum(case when name = 'WI' then value end) as WI
from mytable
group by cast(readingdate as date)
),
t2 as (
select t.*,
count(NY) over (order by readingdate) as NYgrp,
count(TX) over (order by readingdate) as TXgrp,
count(WI) over (order by readingdate) as WIgrp
from t
)
select readingdate,
coalesce(NY, max(NY) over (partition by NYgrp)) as NY,
coalesce(TX, max(TX) over (partition by TXgrp)) as TX,
coalesce(WI, max(WI) over (partition by WIgrp)) as WI
from t2;
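To make the grouping trick concrete, here is a minimal, runnable sketch (assuming SQL Server 2012+, with hypothetical inline data for the NY column only):
-- count(NY) stays at 1 across the NULL rows, so both NULLs fall into
-- NYgrp = 1 together with the value 5, and max(NY) over that group
-- fills them down.
with t as (
    select *
    from (values
        (cast('2010-01-01' as date), 5),
        (cast('2010-01-02' as date), null),
        (cast('2010-01-03' as date), null)
    ) v(readingdate, NY)
),
t2 as (
    select t.*,
           count(NY) over (order by readingdate) as NYgrp    -- 1, 1, 1
    from t
)
select readingdate,
       coalesce(NY, max(NY) over (partition by NYgrp)) as NY -- 5, 5, 5
from t2;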

Recursive CTE - consolidate start and end dates

I have the following table:
row_num customer_status effective_from_datetime
------- ------------------ -----------------------
1 Active 2011-01-01
2 Active 2011-01-02
3 Active 2011-01-03
4 Suspended 2011-01-04
5 Suspended 2011-01-05
6 Active 2011-01-06
And am trying to achieve the following result whereby consecutive rows with the same status are merged into one row with an effective from and to date range:
customer_status effective_from_datetime effective_to_datetime
--------------- ----------------------- ---------------------
Active 2011-01-01 2011-01-04
Suspended 2011-01-04 2011-01-06
Active 2011-01-06 NULL
I can get a recursive CTE to output the correct effective_to_datetime based on the next row, but am having trouble merging the ranges.
Code to generate sample data:
CREATE TABLE #temp
(
row_num INT IDENTITY(1,1),
customer_status VARCHAR(10),
effective_from_datetime DATE
)
INSERT INTO #temp
VALUES
('Active','2011-01-01')
,('Active','2011-01-02')
,('Active','2011-01-03')
,('Suspended','2011-01-04')
,('Suspended','2011-01-05')
,('Active','2011-01-06')
EDIT: SQL updated as per comment.
WITH
group_assigned_data AS
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY customer_status ORDER BY effective_from_datetime) AS status_sequence_id,
        ROW_NUMBER() OVER (                             ORDER BY effective_from_datetime) AS sequence_id,
        customer_status,
        effective_from_datetime
    FROM
        #temp
)
,
grouped_data AS
(
    SELECT
        customer_status,
        MIN(effective_from_datetime) AS min_effective_from_datetime,
        MAX(effective_from_datetime) AS max_effective_from_datetime
    FROM
        group_assigned_data
    GROUP BY
        customer_status,
        sequence_id - status_sequence_id
)
SELECT
    [current].customer_status,
    [current].min_effective_from_datetime AS effective_from,
    [next].min_effective_from_datetime AS effective_to
FROM
    grouped_data AS [current]
LEFT JOIN
    grouped_data AS [next]
        ON DATEADD(DAY, 1, [current].max_effective_from_datetime) = [next].min_effective_from_datetime
ORDER BY
    [current].min_effective_from_datetime
This isn't recursive, but that's possibly a good thing.
It doesn't deal with gaps in your data. To deal with that, you could create a calendar table with every relevant date, join on it to fill missing dates with an 'unknown' status, and then run the query against that. (In fact, you can do it in a CTE that is used by the CTEs above.)
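A minimal sketch of that calendar idea, assuming the #temp table above (the MAXRECURSION hint matters for long date ranges):
WITH calendar AS
(
    -- anchor: the full date range present in the data
    SELECT MIN(effective_from_datetime) AS d,
           MAX(effective_from_datetime) AS max_d
    FROM #temp
    UNION ALL
    SELECT DATEADD(DAY, 1, d), max_d
    FROM calendar
    WHERE d < max_d
)
SELECT COALESCE(t.customer_status, 'Unknown') AS customer_status,
       c.d AS effective_from_datetime
FROM calendar AS c
LEFT JOIN #temp AS t
    ON t.effective_from_datetime = c.d
OPTION (MAXRECURSION 0);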
At present...
- If row 2 was missing, it would not change the result
- If row 3 was missing, the end_date of the first row would change
Different behaviour can be achieved by preparing your data first, or by other methods; we'd need to know the business logic you require, though.
If any one date can have multiple status entries, you need to define what logic you want it to follow. At present the behaviour is undefined, but you could correct that simply by adding customer_status to the ORDER BY portions of ROW_NUMBER().

Unclear on LAST_VALUE - Preceding

I have a table that looks like this,
Date Value
01/01/2010 03:59:00 324.44
01/02/2010 09:31:00 NULL
01/02/2010 09:32:00 NULL
.
.
.
01/02/2010 11:42:00 NULL
I want the first valid value to appear in all following rows. This is what I did,
select date,
nvl(value, LAST_VALUE(value IGNORE NULLS) over (order by value RANGE BETWEEN 1 PRECEDING AND CURRENT ROW)) value
from
table
This shows no difference at all, but if I say RANGE BETWEEN 3 PRECEDING AND CURRENT ROW it copies the data to all the rows. I'm not clear why this is happening. Can anyone explain if I'm misunderstanding how to use preceding?
Analytic functions still work on sets of data. They do not process one row at a time; you would need PL/SQL or the MODEL clause to do that. PRECEDING refers to the last X rows, but before the analytic function has been applied.
These problems can be confusing in SQL because you have to build the logic into defining the set, instead of trying to pass data from one row to another. That's why I used CASE with LAST_VALUE in my previous answer.
Edit:
I've added a simple data set so we can all run the exact same query. VALUE1 seems to work to me; am I missing something? Part of the problem with VALUE2 is that the analytic ORDER BY uses VALUE instead of the date.
select id, the_date, value
,last_value(value ignore nulls) over
(partition by id order by the_date) value1
,nvl(value, LAST_VALUE(value IGNORE NULLS) over
(order by value RANGE BETWEEN 1 PRECEDING AND CURRENT ROW)) value2
from
(
select 1 id, date '2011-01-01' the_date, 100 value from dual union all
select 1 id, date '2011-01-02' the_date, null value from dual union all
select 1 id, date '2011-01-03' the_date, null value from dual union all
select 1 id, date '2011-01-04' the_date, null value from dual union all
select 1 id, date '2011-01-05' the_date, 200 value from dual
)
order by the_date;
Results:
ID  THE_DATE   VALUE  VALUE1  VALUE2
1   1/1/2011   100    100     100
1   1/2/2011          100
1   1/3/2011          100
1   1/4/2011          100
1   1/5/2011   200    200     200
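For what it's worth, a minimal sketch of fixing VALUE2: order the window by the_date rather than value, at which point the NVL becomes redundant and it collapses into the same expression as VALUE1 (shown here against a shortened copy of the same data):
select the_date, value
,nvl(value, last_value(value ignore nulls) over
    (order by the_date rows between unbounded preceding and current row)) value2_fixed
from
(
select date '2011-01-01' the_date, 100 value from dual union all
select date '2011-01-02' the_date, null value from dual union all
select date '2011-01-05' the_date, 200 value from dual
)
order by the_date;
-- value2_fixed: 100, 100, 200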
It is possible to copy one row at a time; I have done that using Java logic and a SQL query:
Statement sta;
ResultSet rs, rslast;
try {
    // Connection creation code omitted; "con" is an object of the Connection class.
    sta = con.createStatement();
    rs = sta.executeQuery("SELECT * FROM TABLENAME");
    rslast = sta.executeQuery("SELECT * FROM TABLENAME WHERE ID = (SELECT MAX(ID) FROM TABLENAME)");
    rslast.next();
    String digit = rslast.getString("ID");
    System.out.print("ID" + rslast.getString("ID")); // gives the ID of the last record
} catch (SQLException e) {
    e.printStackTrace();
}
Instead of using this method, you can also ORDER BY the date in descending order. From there you can build the logic so that only the last record is inserted.

SQL Server 2008 pivot without aggregate

I have a table of test score data that I need to pivot, and I am stuck on how to do it.
I have the data as this:
grade listening speaking reading writing
0 0.0 0.0 0.0 0.0
1 399.4 423.8 0.0 0.0
2 461.6 508.4 424.2 431.5
3 501.0 525.9 492.8 491.3
4 521.9 517.4 488.7 486.7
5 555.1 581.1 547.2 538.2
6 562.7 545.5 498.2 530.2
7 560.5 525.8 545.3 562.0
8 580.9 548.7 551.4 560.3
9 602.4 550.2 586.8 564.1
10 623.4 581.1 589.9 568.5
11 633.3 578.3 598.1 568.2
12 626.0 588.8 600.5 564.8
But I need it like this:
gr0 gr1 gr2 gr3 gr4 gr5 gr6 gr7 ...
listening 0.0 399.4 461.6 501.0 521.9 555.1 562.7 560.5 580.9...
speaking 0.0 423.8...
reading 0.0 0.0 424.2...
writing 0.0 0.0 431.5...
I don't need to aggregate anything, just pivot the data.
The following is one way to solve the problem, but I am not sure if it is the most efficient.
DECLARE @PivotData table(grade int, listening float, speaking float, reading float, writing float)
INSERT INTO @PivotData
SELECT 0, 0.0, 0.0, 0.0, 0.0 UNION ALL
SELECT 1, 399.4, 423.8, 0.0, 0.0 UNION ALL
SELECT 2, 461.6, 508.4, 424.2, 431.5 UNION ALL
SELECT 3, 501.0, 525.9, 492.8, 491.3
SELECT TestType, [0] As gr0, [1] as gr1, [2] as gr2, [3] as gr3
FROM
(
SELECT grade, TestType, score
FROM
(
SELECT grade, listening, speaking, reading, writing from @PivotData
) PivotData
UNPIVOT
(
score for TestType IN (listening, speaking, reading, writing)
) as initialUnPivot
) as PivotSource
PIVOT
(
max(score) FOR grade IN ([0], [1], [2], [3])
) as PivotedData
Basically, what I did was to initially unpivot the data to get a table that contains the grade, test type, and score each in its own column, and then pivot that data to get the answer you want. Because the unpivoted source data contains the TestType column, each combination of grade and test type yields a single score, so the aggregation simply returns that particular score for the combination without really aggregating anything.
I have only done it for the first 4 grades, but I am pretty sure you can tell what you need to add to have it work for all 13 grades.
Here is a solution. The code below uses Oracle's dual table to create a dummy table for the areas (e.g., listening, speaking, etc.); for SQL Server, I believe you can just omit the 'from dual' clause within each union (see the sketch after the query). The query performs a cartesian product in order to pull the column-oriented grades down into a normalized structure (columns skill, grade, and score). This is then used in the normal manner to pivot the data. I also added a 'rank' column so the data can be sorted as per the results you specified.
select skill, rank
, max(case grade when 0 then score else null end) gr0
, max(case grade when 1 then score else null end) gr1
, max(case grade when 2 then score else null end) gr2
from (
select skill, rank, grade
, case skill when 'listening' then listening
when 'speaking' then speaking
when 'reading' then reading
when 'writing' then writing end score
from tmp_grade t, (
select 'listening' skill, 1 rank from dual
union (select 'speaking', 2 from dual)
union (select 'reading', 3 from dual)
union (select 'writing', 4 from dual)
) area1
)
group by skill, rank
order by rank;
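Since the question targets SQL Server 2008, here is a hedged translation of the same query: 'from dual' is dropped and rank is bracketed, since RANK is a reserved word in T-SQL (tmp_grade is the same hypothetical table name as above):
select skill, [rank]
, max(case grade when 0 then score end) gr0
, max(case grade when 1 then score end) gr1
, max(case grade when 2 then score end) gr2
from (
    select skill, [rank], grade
    , case skill when 'listening' then listening
                 when 'speaking'  then speaking
                 when 'reading'   then reading
                 when 'writing'   then writing end score
    from tmp_grade t
    cross join (
        select 'listening' as skill, 1 as [rank]
        union all select 'speaking', 2
        union all select 'reading', 3
        union all select 'writing', 4
    ) area1
) s
group by skill, [rank]
order by [rank];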