Bigquery Query: Adding a specific value to previous rows in BigQuery - sql

I want to fill the rows whose final_date is null with the next non-null final_date that follows them, if there is one. It is hard to describe, but the desired output below should make it clearer:
This is my actual table:
DATESTAMP______________pressure__________final_date
2021-02-19T21:19:35_______10.12_____________null
2021-02-19T22:19:35_______11.13_____________null
2021-02-19T23:19:35_______10.43_____________null
2021-02-20T00:19:35_______11.98_____________null
2021-02-20T01:19:35_______10.21_____________null
2021-02-20T01:40:10_______20.21_____________2021-02-20
2021-02-24T23:11:00_______10.42_____________null
2021-02-25T00:11:00_______10.51_____________null
2021-02-25T00:11:00_______20.51_____________2021-02-25
2021-02-28T11:11:12_______10.51_____________null
2021-02-28T12:11:12_______10.52_____________null
This is my desired table after running the query:
DATESTAMP______________pressure__________final_date
2021-02-19T21:19:35_______10.12_____________2021-02-20
2021-02-19T22:19:35_______11.13_____________2021-02-20
2021-02-19T23:19:35_______10.43_____________2021-02-20
2021-02-20T00:19:35_______11.98_____________2021-02-20
2021-02-20T01:19:35_______10.21_____________2021-02-20
2021-02-20T01:40:10_______20.21_____________2021-02-20
2021-02-24T23:11:00_______10.42_____________2021-02-25
2021-02-25T00:11:00_______10.51_____________2021-02-25
2021-02-25T00:11:00_______20.51_____________2021-02-25
2021-02-28T11:11:12_______10.51_____________null
2021-02-28T12:11:12_______10.52_____________null
It doesn't matter if I have to create a new column.
This is my query:
SELECT *, IF(final_date is null, LAG(final_date ) OVER (ORDER BY DATESTAMP DESC), final_date ) AS preceding FROM(
SELECT
* FROM my_table
ORDER BY DATESTAMP ASC)
ORDER BY DATESTAMP ASC
And this is the result I got from that query:
DATESTAMP______________pressure_________final_date_______preceding
2021-02-19T21:19:35_______10.12_____________null_____________null
2021-02-19T22:19:35_______11.13_____________null_____________null
2021-02-19T23:19:35_______10.43_____________null_____________null
2021-02-20T00:19:35_______11.98_____________null_____________null
2021-02-20T01:19:35_______10.21_____________null_____________2021-02-20
2021-02-20T01:40:10_______20.21_____________2021-02-20_______2021-02-20
2021-02-24T23:11:00_______10.42_____________null_____________null
2021-02-25T00:11:00_______10.51_____________null_____________2021-02-25
2021-02-25T00:11:00_______20.51_____________2021-02-25_______2021-02-25
2021-02-28T11:11:12_______10.51_____________null_____________null
2021-02-28T12:11:12_______10.52_____________null_____________null
Can someone help me?
Thanks!

This looks like a cumulative minimum:
SELECT t.*,
MIN(final_date) OVER (ORDER BY DATESTAMP DESC) as imputed_final_date
FROM my_table t
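To see why a cumulative minimum fills the gaps, here is a minimal sketch that runs the same window expression against an in-memory SQLite database standing in for BigQuery (an assumption made only so the example is runnable; it needs SQLite 3.25 or later for window functions). The rows are a subset of the sample data in the question.

```python
import sqlite3

# In-memory stand-in for the BigQuery table; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (DATESTAMP TEXT, pressure REAL, final_date TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        ("2021-02-19T21:19:35", 10.12, None),
        ("2021-02-20T01:19:35", 10.21, None),
        ("2021-02-20T01:40:10", 20.21, "2021-02-20"),
        ("2021-02-24T23:11:00", 10.42, None),
        ("2021-02-25T00:11:00", 20.51, "2021-02-25"),
        ("2021-02-28T11:11:12", 10.51, None),
    ],
)

# Walking backwards in time (ORDER BY ... DESC), each row sees the smallest
# final_date among itself and all later rows; NULLs are ignored by MIN.
rows = conn.execute(
    """
    SELECT t.DATESTAMP,
           MIN(final_date) OVER (ORDER BY DATESTAMP DESC) AS imputed_final_date
    FROM my_table t
    ORDER BY DATESTAMP
    """
).fetchall()

for row in rows:
    print(row)
```

This relies on final_date never decreasing as DATESTAMP grows, which holds for the sample data. If a later final_date could be smaller than an earlier one, the minimum would pick the wrong value.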

Related

Create a date range based on rows but if date skips a day create another row

I have a business requirement to show data from the following table.
I need a way, through SQL, to show the data as...
So every time the User_ID, SPOT, or Date skips in sequential order, we create a new row.
Assuming SQL Server, a solution might be:
with MyTbl as (
select *
from ( values
('SomeOne1','A','2023-06-16')
,('SomeOne1','A','2023-06-17')
,('SomeOne1','A','2023-06-18')
,('SomeOne1','A','2023-06-19')
,('SomeOne1','B','2023-06-20')
,('SomeOne1','B','2023-06-21')
,('SomeOne1','B','2023-06-22')
,('SomeOne1','B','2023-06-23')
,('SomeOne1','B','2023-06-24')
,('SomeOne1','B','2023-06-25')
,('SomeOne1','B','2023-06-26')
,('SomeOne1','B','2023-06-27')
,('SomeOneB','A','2023-06-20')
,('SomeOneB','A','2023-06-21')
,('SomeOneB','A','2023-06-22')
,('SomeOneB','A','2023-06-23')
,('SomeOneB','A','2023-06-24')
,('SomeOneB','A','2023-06-25')
,('SomeOneB','A','2023-06-28')
,('SomeOneB','A','2023-06-29')
,('SomeOneB','A','2023-06-30')
,('SomeOneB','A','2023-07-01')
,('SomeOneB','A','2023-07-02')
,('SomeOneB','A','2023-07-03')
) T(UserId, Spot, this_Date)
), AssgnGrp as (
select UserId, Spot, this_date
, [Grp] = DATEADD(DAY,-1 * (DENSE_RANK() OVER (partition by UserId, Spot ORDER BY [this_date])-1), [this_date])
from MyTbl
)
select UserId, Spot, Grp, begin_date=min(this_date), end_date=max(this_date)
from AssgnGrp
group by UserId, Spot, Grp
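The grouping trick above (subtract the row's rank, in days, from its date so that consecutive dates collapse onto the same anchor date) can be tried without SQL Server. The sketch below is an adaptation to SQLite through Python's sqlite3 module, not the answer's exact code: date(..., '-N days') replaces DATEADD, and only a few of the sample rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTbl (UserId TEXT, Spot TEXT, this_date TEXT)")
conn.executemany(
    "INSERT INTO MyTbl VALUES (?, ?, ?)",
    [
        ("SomeOne1", "B", "2023-06-20"),
        ("SomeOne1", "B", "2023-06-21"),
        ("SomeOneB", "A", "2023-06-24"),
        ("SomeOneB", "A", "2023-06-25"),
        ("SomeOneB", "A", "2023-06-28"),  # gap: 06-26 and 06-27 are missing
        ("SomeOneB", "A", "2023-06-29"),
    ],
)

# Grp = this_date minus (rank - 1) days. Within a run of consecutive dates
# the date and the rank grow together, so Grp stays constant; a gap shifts it.
islands = conn.execute(
    """
    WITH AssgnGrp AS (
        SELECT UserId, Spot, this_date,
               date(this_date,
                    '-' || (DENSE_RANK() OVER (PARTITION BY UserId, Spot
                                               ORDER BY this_date) - 1) || ' days') AS Grp
        FROM MyTbl
    )
    SELECT UserId, Spot, MIN(this_date) AS begin_date, MAX(this_date) AS end_date
    FROM AssgnGrp
    GROUP BY UserId, Spot, Grp
    ORDER BY UserId, Spot, begin_date
    """
).fetchall()

for island in islands:
    print(island)
```

With these rows, SomeOneB / A splits into two ranges because of the two-day gap, while SomeOne1 / B stays a single range.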

Big Query first non null value record type

I'm working with BigQuery and I have a record field 'funnels_informations' containing two subfields: 'partnership_title' and 'voucher_code'.
I want to have the first non-null value of partnership_title and the corresponding value of voucher_code.
For example here, I want to have partnership_title=indep and voucher_code=null:
Any solution please?
Thanks in advance.
You may consider below scalar subquery for your purpose.
WITH sample_table AS (
SELECT [STRUCT(STRING(null) AS partnership_title, STRING(NULL) AS voucher_code),
('indep', NULL), ('Le', 'LB')
] AS funnels_informations
)
SELECT (SELECT AS STRUCT * EXCEPT(offset)
FROM UNNEST(t.funnels_informations) WITH OFFSET
WHERE partnership_title IS NOT NULL
ORDER BY offset LIMIT 1
).*
FROM sample_table t;
Consider below option
select fi.*
from your_table t, t.funnels_informations fi with offset
where not partnership_title is null
qualify 1 = row_number() over(partition by to_json_string(t) order by offset)
If applied to the sample data in your question, the output is partnership_title = indep and voucher_code = null.
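Both answers follow the same idea: keep the array position, drop elements whose partnership_title is null, and take the element with the lowest remaining position. A rough sketch of that logic, assuming the array has already been flattened into a hypothetical table funnels(record_id, pos, partnership_title, voucher_code), and using SQLite through Python as a stand-in for BigQuery:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened layout: one row per array element, pos = array offset.
conn.execute(
    "CREATE TABLE funnels (record_id INTEGER, pos INTEGER, "
    "partnership_title TEXT, voucher_code TEXT)"
)
conn.executemany(
    "INSERT INTO funnels VALUES (?, ?, ?, ?)",
    [
        (1, 0, None, None),
        (1, 1, "indep", None),
        (1, 2, "Le", "LB"),
    ],
)

# Filter out null titles first, then number the survivors by position
# and keep the first one per record.
first_non_null = conn.execute(
    """
    SELECT record_id, partnership_title, voucher_code
    FROM (
        SELECT f.*,
               ROW_NUMBER() OVER (PARTITION BY record_id ORDER BY pos) AS rn
        FROM funnels f
        WHERE partnership_title IS NOT NULL
    )
    WHERE rn = 1
    """
).fetchall()

print(first_non_null)
```

The voucher_code comes back as None here because it is read from the same element as the title, not searched for separately.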

Oracle SQL query to fetch data from log table

The data provided below is a tiny snapshot of a huge log table.
Please help me with a query to identify records such as those with TRAN_IDs 451140014 and 440102253.
The status of the record is getting updated to 'Definite' from 'Actual'.
According to the business rules of our application this is NOT supposed to happen. I need to fetch the list of all records in this huge table where the status is updated this way.
ROW_ID TRAN_ID TRAN_DATE CHG_TYPE DB_SESSION DB_OSUSER DB_HOST STAT_CD
500-XNEGXU 451327759 7/24/2015 11:35:26 AM Update SBLDATALOAD siebelp pas01 Actual
500-XNEGXU 451299279 7/24/2015 10:13:18 AM Update SBLDATALOAD siebelp pas01 Actual
500-XNEGXU 451140014 7/24/2015 1:04:36 AM Update SBLDATALOAD siebelp pas01 Definite
500-XNEGXU 440102253 6/23/2015 3:10:33 PM Update SBLDATALOAD convteam pas01 Actual
500-XNEGXU 426245149 5/8/2015 2:11:21 PM Update SBLDATALOAD convteam pas11 Actual
Edit :
Thanks a lot, Ponder, for your help. I made a small modification to your query to get the results in a single row. This gives me the next transaction ID, the one that flipped the status from 'Actual' to 'Definite':
select row_id, tran_id, next_tran_id,tran_date, next_tran_date,stat_cd
from (
select abc.*, lag(tran_id) over (order by tran_id desc) next_tran_id,lag(tran_date) over (order by tran_id desc) next_tran_date,
case when stat_cd='Actual' and (lag(stat_cd) over (partition by row_id order by tran_id desc)) = 'Definite' then 1
end change
from abc )
where change = 1 order by row_id, tran_id
This query, using the lead() function, displays all rows where stat_cd is 'Definite', together with the row that precedes each of them in tran_id order:
select row_id, tran_id, tran_date, stat_cd
from (
select data.*,
case when stat_cd='Definite'
or (lead(stat_cd) over (order by tran_id)) = 'Definite' then 1
end change
from data )
where change = 1 order by row_id, tran_id
You may need to change over (order by tran_id) to over (partition by row_id order by tran_id) if your data is organized this way.
Edit: Modified query after additional information was provided:
select row_id, tran_id, tran_date, stat_cd
from (
select xyz.*,
case
when stat_cd='Actual'
and (lead(stat_cd) over (order by tran_id)) = 'Definite' then 1
when stat_cd='Definite'
and (lag(stat_cd) over (order by tran_id)) = 'Actual' then 2
end change
from xyz)
where change is not null
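The lead()/lag() pairing can be checked against the five sample rows from the question. This is a sketch in SQLite via Python rather than Oracle; it adds the partition by row_id suggested earlier, keeps only the columns the logic needs, and renames the computed column to change_flag (a hypothetical name).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (row_id TEXT, tran_id INTEGER, stat_cd TEXT)")
conn.executemany(
    "INSERT INTO xyz VALUES (?, ?, ?)",
    [
        ("500-XNEGXU", 451327759, "Actual"),
        ("500-XNEGXU", 451299279, "Actual"),
        ("500-XNEGXU", 451140014, "Definite"),
        ("500-XNEGXU", 440102253, "Actual"),
        ("500-XNEGXU", 426245149, "Actual"),
    ],
)

# change_flag = 1: an 'Actual' row whose next row (by tran_id) is 'Definite'
# change_flag = 2: a 'Definite' row whose previous row was 'Actual'
flips = conn.execute(
    """
    SELECT tran_id, stat_cd, change_flag
    FROM (
        SELECT xyz.*,
               CASE
                   WHEN stat_cd = 'Actual'
                        AND LEAD(stat_cd) OVER (PARTITION BY row_id ORDER BY tran_id) = 'Definite'
                   THEN 1
                   WHEN stat_cd = 'Definite'
                        AND LAG(stat_cd) OVER (PARTITION BY row_id ORDER BY tran_id) = 'Actual'
                   THEN 2
               END AS change_flag
        FROM xyz
    )
    WHERE change_flag IS NOT NULL
    ORDER BY tran_id
    """
).fetchall()

for flip in flips:
    print(flip)
```

The two rows returned are the TRAN_IDs named in the question: 440102253 (the last 'Actual' before the flip) and 451140014 (the 'Definite' row).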

Why would the query show data from the wrong month?

I have a query:
;with date_cte as(
SELECT r.starburst_dept_name,r.monthly_past_date as PrevDate,x.monthly_past_date as CurrDate,r.starburst_dept_average - x.starburst_dept_average as Average
FROM
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY starburst_dept_name ORDER BY monthly_past_date) AS rowid
FROM intranet.dbo.cse_reports_month
) r
JOIN
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY starburst_dept_name ORDER BY monthly_past_date) AS rowid
FROM intranet.dbo.cse_reports_month
Where month(monthly_past_date) > month(DATEADD(m,-2,monthly_past_date))
) x
ON r.starburst_dept_name = x.starburst_dept_name AND r.rowid = x.rowid+1
Where r.starburst_dept_name is NOT NULL
)
Select *
From date_cte
Order by Average DESC
While testing, I altered the data in some columns to see why the query returns certain information. I don't know why, when I run the query, it returns a row with a date from January (row 4) that should not be there, as in the picture below:
The database has more rows with the exact same date '2014-01-25 00:00:00.000', so I'm not sure why it would pick up only that row and compare the average.
Before running the query I did alter that row and change its date, but I'm not sure whether that has anything to do with it.
UPDATE:
I have added the SQL Fiddle.
What I would like to get is the difference between the averages:
last month minus the month before last.
It was actually working until I altered the data.
I made the changes to test a certain situation, which obviously led
to learning that there are flaws in the query.
Based on your SQL Fiddle, this keeps rows from earlier than two months back from being joined and showing up:
SELECT
thismonth.starburst_dept_name
,lastmonth.monthtly_past_date [PrevDate]
,thismonth.monthtly_past_date [CurrDate]
,thismonth.starburst_dept_average - lastmonth.starburst_dept_average as Average
FROM dbo.cse_reports thismonth
inner join dbo.cse_reports lastmonth on
thismonth.starburst_dept_name = lastmonth.starburst_dept_name
AND month(DATEADD(MONTH,-1,thismonth.monthtly_past_date))=month(lastmonth.monthtly_past_date)
WHERE MONTH(thismonth.monthtly_past_date)=month(DATEADD(MONTH,-1,GETDATE()))
Order by thismonth.starburst_dept_average - lastmonth.starburst_dept_average DESC
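The self-join above matches on MONTH() alone, so rows from the same month of a different year would also pair up. As a hedged variation, the sketch below matches on year and month together. It uses SQLite via Python instead of SQL Server, passes a fixed "today" value in place of GETDATE() so the result is reproducible, and uses a made-up department name and averages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cse_reports (starburst_dept_name TEXT, "
    "monthly_past_date TEXT, starburst_dept_average REAL)"
)
conn.executemany(
    "INSERT INTO cse_reports VALUES (?, ?, ?)",
    [
        ("Sales", "2014-01-25", 3.0),  # older than two months: must not appear
        ("Sales", "2014-02-25", 4.0),
        ("Sales", "2014-03-25", 4.5),
    ],
)

today = "2014-04-10"  # fixed reference date standing in for GETDATE()

# cur  = rows from the month before `today`
# prev = rows from the month before cur, same department
diffs = conn.execute(
    """
    SELECT cur.starburst_dept_name,
           prev.monthly_past_date AS PrevDate,
           cur.monthly_past_date  AS CurrDate,
           cur.starburst_dept_average - prev.starburst_dept_average AS Average
    FROM cse_reports cur
    JOIN cse_reports prev
      ON prev.starburst_dept_name = cur.starburst_dept_name
     AND strftime('%Y-%m', prev.monthly_past_date)
         = strftime('%Y-%m', cur.monthly_past_date, 'start of month', '-1 month')
    WHERE strftime('%Y-%m', cur.monthly_past_date)
          = strftime('%Y-%m', ?, 'start of month', '-1 month')
    ORDER BY Average DESC
    """,
    (today,),
).fetchall()

print(diffs)
```

The 'start of month' modifier is applied before '-1 month' so that dates such as the 31st do not roll over into the wrong month.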

Debugging a SQL Query

I have a table structure like the one below. I need to select the row where User_Id = 100 and User_sub_id = 1, time_used is the minimum of all, and Timestamp is the highest. The output of my query should be:
US;1365510103204;NY;1365510103;100;1;678;
My query looks like this.
select *
from my_table
where CODE='DE'
and User_Id = 100
and User_sub_id = 1
and time_used = (select min(time_used)
from my_table
where CODE='DE'
and User_Id=100
and User_sub_id= 1);
This returns all 4 rows. I need only one: the one with the highest timestamp.
Many Thanks
CODE;Timestamp;Location;Time_recorded;User_Id;User_sub_Id;time_used
US;1365510102420;NY;1365510102;100;1;1078;
US;1365510102719;NY;1365510102;100;1;978;
US;1365510103204;NY;1365510103;100;1;878;
US;1365510102232;NY;1365510102;100;1;678;
US;1365510102420;NY;1365510102;100;1;678;
US;1365510102719;NY;1365510102;100;1;678;
US;1365510103204;NY;1365510103;100;1;678;
US;1365510102420;NY;1365510102;101;1;678;
US;1365510102719;NY;1365510102;101;1;638;
US;1365510103204;NY;1365510103;101;1;638;
Another possibly faster solution is using window functions:
select *
from (
select code,
timestamp,
-- rank by smallest time_used first, then by latest timestamp, so that
-- rn = 1 is the minimum-time_used row with the highest timestamp
row_number() over (partition by user_id, user_sub_id order by time_used, timestamp desc) as rn,
time_used,
user_id,
user_sub_id
from my_table
where CODE='US'
and User_Id = 100
and User_sub_id = 1
) t
where rn = 1;
This only needs to scan the table once, instead of twice as your solution with the sub-select does.
I would strongly recommend renaming the column timestamp.
First, it is a reserved word, and using reserved words as identifiers is not recommended.
Second, it doesn't document anything; it's a horrible name as such. time_used is much better, and you should find something similar for timestamp. Is it the "recording time", the "expiration time", the "due time", or something completely different?
Then try this:
select *
from my_table
where CODE='DE'
and User_Id=100
and User_sub_id=1
and time_used=(
select min(time_used)
from my_table
where CODE='DE'
and User_Id=100 and User_sub_id=1
)
order by "timestamp" desc -- <-- this adds sorting
limit 1; -- <-- this retrieves only one row
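To check the sub-select plus ORDER BY / LIMIT approach against the sample rows, here is a small sketch using SQLite through Python. It filters on CODE = 'US' because that is the code that appears in the sample data (the queries above use 'DE'), and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE my_table (CODE TEXT, "timestamp" INTEGER, Location TEXT, '
    "Time_recorded INTEGER, User_Id INTEGER, User_sub_id INTEGER, time_used INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("US", 1365510102420, "NY", 1365510102, 100, 1, 1078),
        ("US", 1365510102719, "NY", 1365510102, 100, 1, 978),
        ("US", 1365510103204, "NY", 1365510103, 100, 1, 878),
        ("US", 1365510102232, "NY", 1365510102, 100, 1, 678),
        ("US", 1365510102420, "NY", 1365510102, 100, 1, 678),
        ("US", 1365510102719, "NY", 1365510102, 100, 1, 678),
        ("US", 1365510103204, "NY", 1365510103, 100, 1, 678),
        ("US", 1365510102420, "NY", 1365510102, 101, 1, 678),
        ("US", 1365510102719, "NY", 1365510102, 101, 1, 638),
        ("US", 1365510103204, "NY", 1365510103, 101, 1, 638),
    ],
)

# Keep the rows with the minimum time_used for this user, then take the
# one with the highest timestamp.
result = conn.execute(
    """
    SELECT *
    FROM my_table
    WHERE CODE = 'US'
      AND User_Id = 100
      AND User_sub_id = 1
      AND time_used = (SELECT MIN(time_used)
                       FROM my_table
                       WHERE CODE = 'US'
                         AND User_Id = 100
                         AND User_sub_id = 1)
    ORDER BY "timestamp" DESC
    LIMIT 1
    """
).fetchall()

print(result)
```

The single row returned matches the output requested in the question.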
Add the following clause to the end of your query:
ORDER BY Timestamp DESC LIMIT 1