Bigquery Query: Adding a specific value to previous rows in BigQuery

I want to fill the rows whose final_date is null with the next non-null final_date that follows them, if there is one. It is hard to describe, but the desired output should make it clearer.
This is my actual table:
DATESTAMP            pressure  final_date
2021-02-19T21:19:35  10.12     null
2021-02-19T22:19:35  11.13     null
2021-02-19T23:19:35  10.43     null
2021-02-20T00:19:35  11.98     null
2021-02-20T01:19:35  10.21     null
2021-02-20T01:40:10  20.21     2021-02-20
2021-02-24T23:11:00  10.42     null
2021-02-25T00:11:00  10.51     null
2021-02-25T00:11:00  20.51     2021-02-25
2021-02-28T11:11:12  10.51     null
2021-02-28T12:11:12  10.52     null
This is my desired table after running the query:
DATESTAMP            pressure  final_date
2021-02-19T21:19:35  10.12     2021-02-20
2021-02-19T22:19:35  11.13     2021-02-20
2021-02-19T23:19:35  10.43     2021-02-20
2021-02-20T00:19:35  11.98     2021-02-20
2021-02-20T01:19:35  10.21     2021-02-20
2021-02-20T01:40:10  20.21     2021-02-20
2021-02-24T23:11:00  10.42     2021-02-25
2021-02-25T00:11:00  10.51     2021-02-25
2021-02-25T00:11:00  20.51     2021-02-25
2021-02-28T11:11:12  10.51     null
2021-02-28T12:11:12  10.52     null
It doesn't matter if I have to create a new column.
This is my query:
SELECT *,
       IF(final_date IS NULL,
          LAG(final_date) OVER (ORDER BY DATESTAMP DESC),
          final_date) AS preceding
FROM (
  SELECT *
  FROM my_table
  ORDER BY DATESTAMP ASC
)
ORDER BY DATESTAMP ASC
And this is the result I got from the query above:
DATESTAMP            pressure  final_date  preceding
2021-02-19T21:19:35  10.12     null        null
2021-02-19T22:19:35  11.13     null        null
2021-02-19T23:19:35  10.43     null        null
2021-02-20T00:19:35  11.98     null        null
2021-02-20T01:19:35  10.21     null        2021-02-20
2021-02-20T01:40:10  20.21     2021-02-20  2021-02-20
2021-02-24T23:11:00  10.42     null        null
2021-02-25T00:11:00  10.51     null        2021-02-25
2021-02-25T00:11:00  20.51     2021-02-25  2021-02-25
2021-02-28T11:11:12  10.51     null        null
2021-02-28T12:11:12  10.52     null        null
Can someone help me?
Thanks!

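This looks like a cumulative minimum. Because final_date increases along with DATESTAMP in your data, the minimum over the current row and all later rows is the next non-null final_date:
SELECT t.*,
       MIN(final_date) OVER (ORDER BY DATESTAMP DESC) AS imputed_final_date
FROM my_table t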
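
As a rough check of the cumulative-minimum idea, the sketch below loads the sample rows from the question into an in-memory SQLite database and runs the same window expression. SQLite stands in for BigQuery here, which is an assumption: the default window frame and NULL handling of MIN are taken to behave the same way for this query, and the dates are stored as ISO text so that they sort correctly.

```python
import sqlite3

# Sample rows from the question; None stands for a null final_date.
rows = [
    ("2021-02-19T21:19:35", 10.12, None),
    ("2021-02-19T22:19:35", 11.13, None),
    ("2021-02-19T23:19:35", 10.43, None),
    ("2021-02-20T00:19:35", 11.98, None),
    ("2021-02-20T01:19:35", 10.21, None),
    ("2021-02-20T01:40:10", 20.21, "2021-02-20"),
    ("2021-02-24T23:11:00", 10.42, None),
    ("2021-02-25T00:11:00", 10.51, None),
    ("2021-02-25T00:11:00", 20.51, "2021-02-25"),
    ("2021-02-28T11:11:12", 10.51, None),
    ("2021-02-28T12:11:12", 10.52, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (DATESTAMP TEXT, pressure REAL, final_date TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)

# Walking from the latest row backwards, MIN ignores NULLs and so carries
# the nearest later final_date back onto the rows that precede it.
query = """
SELECT t.DATESTAMP,
       t.pressure,
       MIN(t.final_date) OVER (ORDER BY t.DATESTAMP DESC) AS imputed_final_date
FROM my_table t
ORDER BY t.DATESTAMP, t.pressure
"""
result = conn.execute(query).fetchall()
imputed = [r[2] for r in result]

for datestamp, pressure, final_date in result:
    print(datestamp, pressure, final_date)
```

The two rows from 2021-02-28 stay null because no later row has a final_date. If final_date could ever decrease over time, a cumulative minimum would pick the wrong value and a different technique would be needed.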

Related

Create a date range based on rows but if date skips a day create another row

I have a business requirement to show data from the following table.
I need a way through SQL to show the data as...
So every time the User_ID, SPOT, or Date skips in sequential order, we create a new row.

Assuming SQL Server, a solution might be:
with MyTbl as (
select *
from ( values
('SomeOne1','A','2023-06-16')
,('SomeOne1','A','2023-06-17')
,('SomeOne1','A','2023-06-18')
,('SomeOne1','A','2023-06-19')
,('SomeOne1','B','2023-06-20')
,('SomeOne1','B','2023-06-21')
,('SomeOne1','B','2023-06-22')
,('SomeOne1','B','2023-06-23')
,('SomeOne1','B','2023-06-24')
,('SomeOne1','B','2023-06-25')
,('SomeOne1','B','2023-06-26')
,('SomeOne1','B','2023-06-27')
,('SomeOneB','A','2023-06-20')
,('SomeOneB','A','2023-06-21')
,('SomeOneB','A','2023-06-22')
,('SomeOneB','A','2023-06-23')
,('SomeOneB','A','2023-06-24')
,('SomeOneB','A','2023-06-25')
,('SomeOneB','A','2023-06-28')
,('SomeOneB','A','2023-06-29')
,('SomeOneB','A','2023-06-30')
,('SomeOneB','A','2023-07-01')
,('SomeOneB','A','2023-07-02')
,('SomeOneB','A','2023-07-03')
) T(UserId, Spot, this_Date)
), AssgnGrp as (
    select UserId, Spot, this_date
         , [Grp] = DATEADD(DAY, -1 * (DENSE_RANK() OVER (partition by UserId, Spot ORDER BY [this_date]) - 1), [this_date])
    from MyTbl
)
select UserId, Spot, Grp, begin_date = min(this_date), end_date = max(this_date)
from AssgnGrp
group by UserId, Spot, Grp
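
The grouping trick above (subtract the row's rank from its date, so that consecutive dates collapse onto the same anchor date) can be tried outside SQL Server. The sketch below is an assumption-laden port to SQLite through Python: it uses a shortened subset of the sample rows, and replaces DATEADD with julianday arithmetic, which is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTbl (UserId TEXT, Spot TEXT, this_date TEXT)")
# A shortened subset of the answer's sample rows, with one gap for SomeOneB.
conn.executemany(
    "INSERT INTO MyTbl VALUES (?, ?, ?)",
    [
        ("SomeOne1", "A", "2023-06-16"),
        ("SomeOne1", "A", "2023-06-17"),
        ("SomeOne1", "B", "2023-06-20"),
        ("SomeOneB", "A", "2023-06-24"),
        ("SomeOneB", "A", "2023-06-25"),
        ("SomeOneB", "A", "2023-06-28"),
        ("SomeOneB", "A", "2023-06-29"),
    ],
)

# Grp = date minus (rank - 1) days. Within a run of consecutive days the
# date and the rank grow together, so Grp stays constant; a gap shifts it.
query = """
WITH AssgnGrp AS (
    SELECT UserId, Spot, this_date,
           date(julianday(this_date)
                - (DENSE_RANK() OVER (PARTITION BY UserId, Spot
                                      ORDER BY this_date) - 1)) AS Grp
    FROM MyTbl
)
SELECT UserId, Spot, MIN(this_date) AS begin_date, MAX(this_date) AS end_date
FROM AssgnGrp
GROUP BY UserId, Spot, Grp
ORDER BY UserId, Spot, Grp
"""
ranges = conn.execute(query).fetchall()
for r in ranges:
    print(r)
```

SomeOneB ends up with two ranges because 2023-06-26 and 2023-06-27 are missing, which is the "create another row when a date is skipped" behaviour the question asks for.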

Big Query first non null value record type

I'm working with Big Query and I have a record field 'funnels_informations' containing two subfields: 'partnership_title' and 'voucher_code'.
I want to have the first non-null value of partnership_title and the corresponding value of voucher_code.
For example here, I want to have partnership_title=indep and voucher_code=null:
Any solution please?
Thanks in advance.

You may consider below scalar subquery for your purpose.
WITH sample_table AS (
SELECT [STRUCT(STRING(null) AS partnership_title, STRING(NULL) AS voucher_code),
('indep', NULL), ('Le', 'LB')
] AS funnels_informations
)
SELECT (SELECT AS STRUCT * EXCEPT(offset)
FROM UNNEST(t.funnels_informations) WITH OFFSET
WHERE partnership_title IS NOT NULL
ORDER BY offset LIMIT 1
).*
FROM sample_table t;
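
For readers less familiar with UNNEST ... WITH OFFSET, the logic of the subquery is roughly "walk the array in order and keep the first element whose partnership_title is not null". A plain Python sketch of that logic, using the same three sample structs (the function name first_with_title is hypothetical), might look like this:

```python
from typing import Optional

# Same three elements as the sample_table CTE, in array order.
funnels_informations = [
    {"partnership_title": None, "voucher_code": None},
    {"partnership_title": "indep", "voucher_code": None},
    {"partnership_title": "Le", "voucher_code": "LB"},
]


def first_with_title(items: list[dict]) -> Optional[dict]:
    """Return the first element with a non-null partnership_title, else None."""
    # Iterating in list order plays the role of ORDER BY offset LIMIT 1.
    return next((it for it in items if it["partnership_title"] is not None), None)


first = first_with_title(funnels_informations)
print(first)
```

The whole element is returned, so voucher_code comes along even when it is null, matching the result asked for in the question.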

Consider below option
select fi.*
from your_table t, t.funnels_informations fi with offset
where not partnership_title is null
qualify 1 = row_number() over(partition by to_json_string(t) order by offset)
If applied to the sample data in your question, the output is partnership_title = indep and voucher_code = null.

Oracle SQL query to fetch data from log table

The data below is a tiny snapshot of a huge log table.
Please help me with a query to identify records such as those with TRAN_IDs 451140014 and 440102253.
The status of the record is being updated from 'Actual' to 'Definite'.
According to the business rules of our application this is NOT supposed to happen, so I need to fetch the list of all records in this huge table where the status is updated this way.
ROW_ID      TRAN_ID    TRAN_DATE              CHG_TYPE  DB_SESSION   DB_OSUSER  DB_HOST  STAT_CD
500-XNEGXU  451327759  7/24/2015 11:35:26 AM  Update    SBLDATALOAD  siebelp    pas01    Actual
500-XNEGXU  451299279  7/24/2015 10:13:18 AM  Update    SBLDATALOAD  siebelp    pas01    Actual
500-XNEGXU  451140014  7/24/2015 1:04:36 AM   Update    SBLDATALOAD  siebelp    pas01    Definite
500-XNEGXU  440102253  6/23/2015 3:10:33 PM   Update    SBLDATALOAD  convteam   pas01    Actual
500-XNEGXU  426245149  5/8/2015 2:11:21 PM    Update    SBLDATALOAD  convteam   pas11    Actual
Edit:
Thanks a lot, Ponder, for your help. I made a small modification to your query to get the results in a single row. This gives me the next transaction id, the one that flipped the status from 'Actual' to 'Definite':
select row_id, tran_id, next_tran_id,tran_date, next_tran_date,stat_cd
from (
select abc.*, lag(tran_id) over (order by tran_id desc) next_tran_id,lag(tran_date) over (order by tran_id desc) next_tran_date,
case when stat_cd='Actual' and (lag(stat_cd) over (partition by row_id order by tran_id desc)) = 'Definite' then 1
end change
from abc )
where change = 1 order by row_id, tran_id

This query uses lead() to display every row where stat_cd is 'Definite', together with the row immediately before it in tran_id order:
select row_id, tran_id, tran_date, stat_cd
from (
select data.*,
case when stat_cd='Definite'
or (lead(stat_cd) over (order by tran_id)) = 'Definite' then 1
end change
from data )
where change = 1 order by row_id, tran_id
You may need to change over (order by tran_id) to over (partition by row_id order by tran_id) if your data is organized this way.
Edit: Modified query after additional information was provided:
select row_id, tran_id, tran_date, stat_cd
from (
select xyz.*,
case
when stat_cd='Actual'
and (lead(stat_cd) over (order by tran_id)) = 'Definite' then 1
when stat_cd='Definite'
and (lag(stat_cd) over (order by tran_id)) = 'Actual' then 2
end change
from xyz)
where change is not null
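
To see which rows the modified lead()/lag() query flags, here is a small sketch that runs it against the five sample log rows in an in-memory SQLite database. SQLite is a stand-in for Oracle, which is an assumption; only the three columns the query needs are loaded, the alias is renamed to change_flag, and the partition by row_id variant suggested above is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (row_id TEXT, tran_id INTEGER, stat_cd TEXT)")
# The five log rows from the question (only the columns the query uses).
conn.executemany(
    "INSERT INTO xyz VALUES (?, ?, ?)",
    [
        ("500-XNEGXU", 451327759, "Actual"),
        ("500-XNEGXU", 451299279, "Actual"),
        ("500-XNEGXU", 451140014, "Definite"),
        ("500-XNEGXU", 440102253, "Actual"),
        ("500-XNEGXU", 426245149, "Actual"),
    ],
)

# 1 marks the 'Actual' row just before a flip, 2 marks the 'Definite' row
# that follows an 'Actual' one; all other rows get NULL and are filtered out.
query = """
SELECT tran_id, stat_cd, change_flag
FROM (
    SELECT xyz.*,
           CASE
             WHEN stat_cd = 'Actual'
              AND LEAD(stat_cd) OVER (PARTITION BY row_id ORDER BY tran_id) = 'Definite'
               THEN 1
             WHEN stat_cd = 'Definite'
              AND LAG(stat_cd) OVER (PARTITION BY row_id ORDER BY tran_id) = 'Actual'
               THEN 2
           END AS change_flag
    FROM xyz
)
WHERE change_flag IS NOT NULL
ORDER BY tran_id
"""
flagged = conn.execute(query).fetchall()
for row in flagged:
    print(row)
```

Only the pair around the flip is returned: 440102253 (the last 'Actual' row) and 451140014 (the 'Definite' row that replaced it).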

Why would the query show data from the wrong month?

I have a query:
;with date_cte as(
SELECT r.starburst_dept_name,r.monthly_past_date as PrevDate,x.monthly_past_date as CurrDate,r.starburst_dept_average - x.starburst_dept_average as Average
FROM
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY starburst_dept_name ORDER BY monthly_past_date) AS rowid
FROM intranet.dbo.cse_reports_month
) r
JOIN
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY starburst_dept_name ORDER BY monthly_past_date) AS rowid
FROM intranet.dbo.cse_reports_month
Where month(monthly_past_date) > month(DATEADD(m,-2,monthly_past_date))
) x
ON r.starburst_dept_name = x.starburst_dept_name AND r.rowid = x.rowid+1
Where r.starburst_dept_name is NOT NULL
)
Select *
From date_cte
Order by Average DESC
While testing, I altered the data in some columns to see why the query returns certain rows. I don't know why, when I run the query, it returns a date from January (row 4) that should not be there, as in the picture below:
The database has more rows with the exact same date, '2014-01-25 00:00:00.000', so I'm not sure why it would pick only that row and compare the average.
Before running the query I did alter that row and change its date, but I'm not sure whether that has anything to do with it.
UPDATE:
I have added the SQL Fiddle.
What I would like is to subtract the averages: last month's average minus the average from two months ago.
It was actually working until I altered the data.
I made the changes to test a certain situation, which obviously led to learning that there are flaws in the query.

Based on your SQL Fiddle, this prevents rows from earlier than two months back from showing up in the join:
SELECT
thismonth.starburst_dept_name
,lastmonth.monthtly_past_date [PrevDate]
,thismonth.monthtly_past_date [CurrDate]
,thismonth.starburst_dept_average - lastmonth.starburst_dept_average as Average
FROM dbo.cse_reports thismonth
inner join dbo.cse_reports lastmonth on
thismonth.starburst_dept_name = lastmonth.starburst_dept_name
AND month(DATEADD(MONTH,-1,thismonth.monthtly_past_date))=month(lastmonth.monthtly_past_date)
WHERE MONTH(thismonth.monthtly_past_date)=month(DATEADD(MONTH,-1,GETDATE()))
Order by thismonth.starburst_dept_average - lastmonth.starburst_dept_average DESC

Debugging a SQL Query

I have a table structure like the one below. I need to select the row where User_Id = 100 and User_sub_id = 1, where time_used is the minimum of all such rows, and where Timestamp is the highest. My query should return:
US;1365510103204;NY;1365510103;100;1;678;
My query looks like this.
select *
from my_table
where CODE='DE'
and User_Id = 100
and User_sub_id = 1
and time_used = (select min(time_used)
from my_table
where CODE='DE'
and User_Id=100
and User_sub_id= 1);
This returns all 4 rows. I need only one, the one with the highest timestamp.
Many thanks
CODE;Timestamp;Location;Time_recorded;User_Id;User_sub_Id;time_used
US;1365510102420;NY;1365510102;100;1;1078;
US;1365510102719;NY;1365510102;100;1;978;
US;1365510103204;NY;1365510103;100;1;878;
US;1365510102232;NY;1365510102;100;1;678;
US;1365510102420;NY;1365510102;100;1;678;
US;1365510102719;NY;1365510102;100;1;678;
US;1365510103204;NY;1365510103;100;1;678;
US;1365510102420;NY;1365510102;101;1;678;
US;1365510102719;NY;1365510102;101;1;638;
US;1365510103204;NY;1365510103;101;1;638;

Another possibly faster solution is using window functions:
select *
from (
   select code,
          timestamp,
          min(time_used) over (partition by user_id, user_sub_id) as min_used,
          -- order by time_used first so rn = 1 always lands on a row with the
          -- minimum time_used, even when several rows share the same timestamp
          row_number() over (partition by user_id, user_sub_id order by time_used, timestamp desc) as rn,
          time_used,
          user_id,
          user_sub_id
   from my_table
   where CODE='US'
     and User_Id = 100
     and User_sub_id = 1
) t
where time_used = min_used
  and rn = 1;
This only needs to scan the table once, instead of twice as your solution with the sub-select does.
I would strongly recommend renaming the column timestamp.
First, it is a reserved word, and using reserved words as names is not recommended.
Second, it doesn't document anything, which makes it a poor name. time_used is much better, and you should find something similar for timestamp. Is it the "recording time", the "expiration time", the "due time" or something completely different?
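
The sample data contains two rows with the highest timestamp (one with time_used 878, one with 678), so it is worth checking that a row-number approach picks the right one. The sketch below runs a single-pass variant in SQLite through Python, ranking by time_used first and then by timestamp descending. Two things here are assumptions: SQLite as the engine, and the column name ts, a hypothetical rename of timestamp in the spirit of the advice above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (code TEXT, ts INTEGER, location TEXT, "
    "time_recorded INTEGER, user_id INTEGER, user_sub_id INTEGER, time_used INTEGER)"
)
# Rows from the question; ts is a hypothetical rename of the Timestamp column.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("US", 1365510102420, "NY", 1365510102, 100, 1, 1078),
        ("US", 1365510102719, "NY", 1365510102, 100, 1, 978),
        ("US", 1365510103204, "NY", 1365510103, 100, 1, 878),
        ("US", 1365510102232, "NY", 1365510102, 100, 1, 678),
        ("US", 1365510102420, "NY", 1365510102, 100, 1, 678),
        ("US", 1365510102719, "NY", 1365510102, 100, 1, 678),
        ("US", 1365510103204, "NY", 1365510103, 100, 1, 678),
        ("US", 1365510102420, "NY", 1365510102, 101, 1, 678),
        ("US", 1365510102719, "NY", 1365510102, 101, 1, 638),
        ("US", 1365510103204, "NY", 1365510103, 101, 1, 638),
    ],
)

# Rank rows so that the smallest time_used comes first and, among those,
# the highest ts; rn = 1 is then the single row the question asks for.
query = """
SELECT code, ts, location, time_recorded, user_id, user_sub_id, time_used
FROM (
    SELECT m.*,
           ROW_NUMBER() OVER (PARTITION BY user_id, user_sub_id
                              ORDER BY time_used ASC, ts DESC) AS rn
    FROM my_table m
    WHERE code = 'US' AND user_id = 100 AND user_sub_id = 1
) t
WHERE rn = 1
"""
best = conn.execute(query).fetchall()
print(best)
```

This returns the row US;1365510103204;NY;1365510103;100;1;678, the output requested in the question.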

Then try this:
select *
from my_table
where CODE='DE'
and User_Id=100
and User_sub_id=1
and time_used=(
select min(time_used)
from my_table
where CODE='DE'
and User_Id=100 and User_sub_id=1
)
order by "timestamp" desc -- <-- this adds sorting
limit 1; -- <-- this retrieves only one row

Add the following to the end of your query:
ORDER BY Timestamp DESC LIMIT 1
