BigQuery - Question about retrieving previous value - sql

I have a question about how to fill a column based on previous values.
My problem is that the column "abandoned" needs to follow some rules to be filled. The rules are:
The column "abandoned" should be filled with either "yes" or "no". I should never get two consecutive "yes" values.
The row **after** one where "date_hour_canceled_negotiation" is filled should always have "abandoned" set to "yes". When "date_hour_negotiation" or "date_hour_canceled_negotiation" is filled, "abandoned" in that same row should be set to "no".
Could someone help me, please? I would appreciate any help a lot.
I would like to do something like this:
contract  date_hour_access     date_hour_negotiation  date_hour_canceled_negotiation  status_negotiation  abandoned
111111    2022-12-01 10:20:00                                                                             yes
111111    2022-12-02 10:20:00                                                                             no
111111    2022-12-03 10:20:00                                                                             yes
111111    2022-12-04 10:20:00  2022-12-04 10:30:00                                    active              no
111111    2022-12-05 10:20:00                         2022-12-05 10:30:00             canceled            no
111111    2022-12-06 10:20:00                                                                             yes
111111    2022-12-07 10:20:00                                                                             no

One way of approaching this problem is to treat it as a gaps-and-islands problem:
compute partitions of records till a "canceled" status is encountered, using a running sum
compute a ranking for each new partition and assign 'yes' and 'no' to values of odd and even ranking numbers
force 'no' on records with the "canceled" status
WITH cte AS (
    SELECT *,
           SUM(CASE WHEN status_negotiation = 'canceled' THEN 1 ELSE 0 END)
               OVER(PARTITION BY contract ORDER BY date_hour_access) AS partitions_,
           CASE WHEN status_negotiation = 'canceled' THEN 1 ELSE 0 END AS canceled
    FROM tab
)
SELECT *,
       CASE WHEN canceled = 1
              OR MOD(ROW_NUMBER() OVER(PARTITION BY contract, partitions_, canceled ORDER BY date_hour_access), 2) = 0
            THEN 'no' ELSE 'yes'
       END AS abandoned
FROM cte
ORDER BY contract, date_hour_access
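To check the query against the sample rows, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or later, for window functions) stands in for BigQuery here, which is an assumption of this sketch. The table name tab comes from the answer, `%` replaces MOD, and the function name label_abandoned is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (contract, date_hour_access,
# date_hour_negotiation, date_hour_canceled_negotiation, status_negotiation)
ROWS = [
    (111111, "2022-12-01 10:20:00", None, None, None),
    (111111, "2022-12-02 10:20:00", None, None, None),
    (111111, "2022-12-03 10:20:00", None, None, None),
    (111111, "2022-12-04 10:20:00", "2022-12-04 10:30:00", None, "active"),
    (111111, "2022-12-05 10:20:00", None, "2022-12-05 10:30:00", "canceled"),
    (111111, "2022-12-06 10:20:00", None, None, None),
    (111111, "2022-12-07 10:20:00", None, None, None),
]

# Same logic as the answer; only MOD(x, 2) is written as x % 2 for SQLite.
QUERY = """
WITH cte AS (
    SELECT *,
           SUM(CASE WHEN status_negotiation = 'canceled' THEN 1 ELSE 0 END)
               OVER (PARTITION BY contract ORDER BY date_hour_access) AS partitions_,
           CASE WHEN status_negotiation = 'canceled' THEN 1 ELSE 0 END AS canceled
    FROM tab
)
SELECT date_hour_access,
       CASE WHEN canceled = 1
              OR (ROW_NUMBER() OVER (PARTITION BY contract, partitions_, canceled
                                     ORDER BY date_hour_access)) % 2 = 0
            THEN 'no' ELSE 'yes'
       END AS abandoned
FROM cte
ORDER BY contract, date_hour_access
"""


def label_abandoned(rows):
    """Load rows into an in-memory table and return (date_hour_access, abandoned) pairs."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tab (contract INTEGER, date_hour_access TEXT, "
        "date_hour_negotiation TEXT, date_hour_canceled_negotiation TEXT, "
        "status_negotiation TEXT)"
    )
    con.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for access, abandoned in label_abandoned(ROWS):
    print(access, abandoned)
```

On these rows the alternation restarts after the canceled row, which is what the running sum is there for.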

Related

Find the max value from previous row

In the rows below, I want to find the maximum "book_type" value:
book_id  book_type  book_time            uniq_step  book_ordered
1                   2022-10-13 00:00:00  800        0
1                   2022-10-13 00:00:00  801        0
1        poetry     2022-10-13 00:00:00  802        1
1                   2022-10-13 00:00:00  803        0
1                   2022-10-13 01:00:00  804        0
1        poetry     2022-10-13 01:00:00  802        1
In the row with uniq_step = 804 I want book_type to be poetry, but when I use the LAG window function I get ' ' (the space string).
So is there any way to take the max value within the book_time partition as the lag?
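You could try using the LAST_VALUE window function with IGNORE NULLS in place of LAG. Since the blank "book_type" values in your data are empty strings rather than NULLs, you can use a CASE expression inside the window function to turn them into NULLs, so that IGNORE NULLS skips them and the last non-blank value is returned. This assumes your DBMS supports IGNORE NULLS inside LAST_VALUE, as BigQuery does.
LAST_VALUE(CASE WHEN book_type <> "" THEN book_type END IGNORE NULLS) OVER(
    PARTITION BY book_id
    ORDER BY uniq_step
)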
Side Note: Empty spaces/strings are still values in a DBMS. If you can refactor the empty values in your database into NULL values, the DBMS will handle your data better than it does now.
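As a plain illustration of the intent behind the LAST_VALUE approach (carry the most recent non-blank book_type forward, in uniq_step order, within each book_id), here is a small Python sketch. The function name fill_book_type and the tuple layout are assumptions made for this example only, and blanks are modelled as empty strings.

```python
def fill_book_type(rows):
    """rows: iterable of (book_id, book_type, uniq_step) tuples.

    Returns (book_id, uniq_step, filled_book_type) tuples sorted by
    book_id and uniq_step, where blank or missing book_type values are
    replaced by the latest non-blank value seen so far for that book_id.
    """
    last_seen = {}  # book_id -> most recent non-blank book_type
    filled = []
    for book_id, book_type, uniq_step in sorted(rows, key=lambda r: (r[0], r[2])):
        if book_type is not None and book_type.strip() != "":
            last_seen[book_id] = book_type
        filled.append((book_id, uniq_step, last_seen.get(book_id)))
    return filled


# Sample data from the question (book_time and book_ordered omitted).
sample = [
    (1, "", 800),
    (1, "", 801),
    (1, "poetry", 802),
    (1, "", 803),
    (1, "", 804),
    (1, "poetry", 802),
]

for row in fill_book_type(sample):
    print(row)
```

Rows that come before any non-blank value (800 and 801 here) stay unfilled, which mirrors a window that only looks backwards.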

Apply a discount to order if user already ordered something else

I have a table with users, a table with levels, a table for submitted orders and one for processed orders.
Here's what the submitted orders table looks like:
OrderId  UserId  Level_Name        Discounted_Price  Order_Date               Price
1        1       OLE Core          0                 2020-11-01 00:00:00.000  19.99
2        1       Xandadu           1                 2020-11-01 00:00:00.000  0
3        2       Xandadu           0                 2020-12-05 00:00:00.000  5
4        1       Eldorado          1                 2021-01-31 00:00:00.000  9
5        2       Eldorado          0                 2021-02-20 00:00:00.000  10
6        2       Birmingham Blues  NULL              2021-07-10 00:00:00.000  NULL
What I am trying to do:
UserId 2 has an order for Birmingham Blues. They have already ordered Eldorado and so qualify for a discount on their Birmingham Blues order. Is there a way to check the entire table for this situation and, if it exists, update the discounted price to 1 and change the price to, let's say, 10 for the Birmingham Blues order?
EDIT: I have researched the use of cursors, which I'm sure would do the job, but they seem complicated and I was hoping a simpler solution would be possible. A lot of threads also seem to advise against using cursors. I also looked at this question: T-SQL: Deleting all duplicate rows but keeping one, and was thinking I could potentially use the answer to that in some way.
Based on your description and further comments, the following should hopefully meet your requirements - updating the row for the specified User where the values are currently NULL and the user has a qualifying existing order:
update s set
    s.Discounted_Price = 1,
    Price = 10
from submitted_Orders s
where s.userId = 2
  and s.Level_Name = 'Birmingham Blues'
  and s.discounted_Price is null
  and s.Price is null
  and exists (
      select * from submitted_orders so
      where so.userId = s.userId
        and so.Level_name = 'Eldorado'
        and so.Order_Date < s.Order_Date
  );
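A small sketch for trying the same logic locally, using Python's sqlite3 module with an in-memory copy of the sample table. To keep it runnable in SQLite, the statement is rewritten as a plain UPDATE whose target rows are picked by an IN subquery with the same EXISTS check; that rewrite, the Order_Date column name taken from the table header, and the function name apply_discount are assumptions of this example, not part of the original answer.

```python
import sqlite3

# Sample rows from the question's submitted orders table.
ORDERS = [
    (1, 1, "OLE Core", 0, "2020-11-01 00:00:00.000", 19.99),
    (2, 1, "Xandadu", 1, "2020-11-01 00:00:00.000", 0),
    (3, 2, "Xandadu", 0, "2020-12-05 00:00:00.000", 5),
    (4, 1, "Eldorado", 1, "2021-01-31 00:00:00.000", 9),
    (5, 2, "Eldorado", 0, "2021-02-20 00:00:00.000", 10),
    (6, 2, "Birmingham Blues", None, "2021-07-10 00:00:00.000", None),
]

# Same conditions as the T-SQL answer, expressed without UPDATE ... FROM.
UPDATE = """
UPDATE submitted_orders
SET Discounted_Price = 1, Price = 10
WHERE OrderId IN (
    SELECT s.OrderId
    FROM submitted_orders s
    WHERE s.UserId = 2
      AND s.Level_Name = 'Birmingham Blues'
      AND s.Discounted_Price IS NULL
      AND s.Price IS NULL
      AND EXISTS (
          SELECT 1
          FROM submitted_orders so
          WHERE so.UserId = s.UserId
            AND so.Level_Name = 'Eldorado'
            AND so.Order_Date < s.Order_Date
      )
)
"""


def apply_discount(orders):
    """Run the update on a fresh in-memory table; return (OrderId, Discounted_Price, Price) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE submitted_orders (OrderId INTEGER, UserId INTEGER, "
        "Level_Name TEXT, Discounted_Price INTEGER, Order_Date TEXT, Price REAL)"
    )
    con.executemany("INSERT INTO submitted_orders VALUES (?, ?, ?, ?, ?, ?)", orders)
    con.execute(UPDATE)
    rows = con.execute(
        "SELECT OrderId, Discounted_Price, Price FROM submitted_orders ORDER BY OrderId"
    ).fetchall()
    con.close()
    return rows


for row in apply_discount(ORDERS):
    print(row)
```

Only order 6 changes, because it is the only row with NULL price fields for a user who has an earlier Eldorado order.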

How do I check if a specific element inside my SQL array is followed by another one?

So, I have a grouped table that pretty much looks like this:
ID   event       date1       date2
001  click       2021-01-05  2021-01-06
     impression  2021-01-05  2021-01-06
     click       2021-04-03  2021-04-04
     click       2021-05-07  2021-05-08
090  impression  2021-02-02  2021-02-03
     impression  2021-06-04  2021-06-05
033  click       2021-03-15  2021-04-16
     impression  2021-03-15  2021-04-16
064  impression  2021-05-17  2021-05-18
     click       2021-06-19  2021-06-20
I need to get only the IDs of users who first clicked an ad (value click in the event column) and then saw an ad (value impression in the event column) on the same day, in this exact order. The date1 column is where we have to look to know whether the events happened on the same day. The final result is something like this:
ID   event       date1       date2
001  click       2021-01-05  2021-01-06
     impression  2021-01-05  2021-01-06
033  click       2021-03-15  2021-04-16
     impression  2021-03-15  2021-04-16
I tried several methods but none of them worked. Can what I'm asking be done using only SQL?
Thanks!
Consider the solution below:
select id, array_agg(event_rec) data
from `project.dataset.table`, unnest(data) event_rec
where event_rec.event in ('click', 'impression')
group by id, date1
having count(distinct event) = 2
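If applied to the sample data in your question, the output is the two expected IDs, 001 and 033, each with its click and impression rows for the shared date1.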
You can unnest and reaggregate . . . I am thinking:
select t.*,
       (select [event, next_event]
        from (select event,
                     lead(event) over (partition by date1 order by n) as next_event,
                     n
              from unnest(t.events) event with offset n
             ) e
        where next_event.event = 'impression' and
              event.event = 'click'
        qualify row_number() over (order by n) = 1
       ) as events
from t;
EDIT:
If you are looking specifically for the first two elements of the array, then this is simply:
select t.*,
       [event[safe_ordinal(1)].event, event[safe_ordinal(2)].event]
from t
where event[safe_ordinal(1)].event = 'click' and
      event[safe_ordinal(2)].event = 'impression';
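The ordering rule itself (a click immediately followed by an impression with the same date1) can also be sketched outside SQL. The Python below is only an illustration of that check on the sample data from the question; the function name, the dictionary layout, and the reading of "followed by" as adjacent array elements are all assumptions of this sketch.

```python
def ids_with_click_then_impression(table):
    """table maps each ID to its events in array order, as (event, date1) pairs.

    Returns the IDs that contain a click directly followed by an
    impression sharing the same date1.
    """
    matches = []
    for row_id, events in table.items():
        # Walk over adjacent pairs: (element i, element i + 1).
        for (event, date1), (next_event, next_date1) in zip(events, events[1:]):
            if event == "click" and next_event == "impression" and date1 == next_date1:
                matches.append(row_id)
                break  # one qualifying pair is enough for this ID
    return matches


# Sample data from the question (date2 omitted, since only date1 matters).
sample = {
    "001": [("click", "2021-01-05"), ("impression", "2021-01-05"),
            ("click", "2021-04-03"), ("click", "2021-05-07")],
    "090": [("impression", "2021-02-02"), ("impression", "2021-06-04")],
    "033": [("click", "2021-03-15"), ("impression", "2021-03-15")],
    "064": [("impression", "2021-05-17"), ("click", "2021-06-19")],
}

print(ids_with_click_then_impression(sample))
```

ID 064 is rejected because its impression comes before the click and on a different day.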

Calculate running count of events for continuous dates in SQL

I have data that can be summarized as follows:
eventid  startdate   enddate     productkey  date        startGroup  endGroup  eventGroup
123      2020-01-01  2020-01-10  123456      2020-01-01  1           0         1
123      2020-01-01  2020-01-10  123456      2020-01-02  0           0         1
123      2020-01-01  2020-01-10  123456      2020-01-03  0           0         1
123      2020-01-01  2020-01-10  123456      2020-01-04  0           1         1
234      2020-01-05  2020-01-07  123456      2020-01-05  1           0         2
234      2020-01-05  2020-01-07  123456      2020-01-06  0           0         2
234      2020-01-05  2020-01-07  123456      2020-01-07  0           1         2
123      2020-01-01  2020-01-10  123456      2020-01-08  1           0         1
123      2020-01-01  2020-01-10  123456      2020-01-09  0           0         1
123      2020-01-01  2020-01-10  123456      2020-01-10  0           1         1
I store various events for products. Since they can be overlapping, I already have code to de-dup the data, but now, with some of the (de-duped) days missing, I need to put the data back together at an event level. In the example data, you see two events, 123 (running from 1/1 to 1/10) and 234 (running from 1/5 to 1/7). I already cut out the middle two days to get rid of overlaps, and what I want output-wise is three groups of events:
1/1-1/4 (i.e. last column = 1)
1/5-1/7 (i.e. last column = 2)
1/8-1/10 (i.e. last column = 3)
I already have code to find the right start and end entries for each block of time, but don't know how to calculate the eventGroup column correctly. Current code for the last three columns is as follows:
CASE WHEN DATEADD(DAY, -1, date) = LAG(date) OVER (PARTITION BY eventid, productkey ORDER BY date) THEN 0 ELSE 1 END startGroup,
CASE WHEN DATEADD(DAY, +1, date) = LEAD(date) OVER (PARTITION BY eventid, productkey ORDER BY date) THEN 0 ELSE 1 END endGroup,
dense_rank() over (order by eventid, productkey) eventGroup
I already tried things like https://dba.stackexchange.com/questions/193680/group-rows-by-uninterrupted-dates, but still wasn't able to create the correct groups.
In Excel logic, it would be eventGroup = if ( startGroup = 0, eventGroup of previous row, eventGroup of previous row + 1), but I'm not sure how to replicate that running counter here.
Can someone help please? Thanks!
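To assign the groups, use a cumulative sum of startGroup, ordered by each row's date (every row of an event shares the same startdate, so ordering by startdate would not give a running total):
select t.*,
       sum(startGroup) over (partition by eventId, productKey order by date)
from t;
Note: This assumes that you want to restart the numbering for each event/product combination.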
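To see what the cumulative sum does on the sample rows, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later for window functions, standing in for the asker's DBMS as an assumption). The window is ordered by the per-row date column. The second query, which partitions by productkey only, is a variant that is not in the answer; it is included because it yields the 1, 2, 3 numbering described in the question. The function name running_groups is made up for this example.

```python
import sqlite3

# (eventid, productkey, date, startGroup) taken from the question's sample data.
ROWS = [
    (123, 123456, "2020-01-01", 1),
    (123, 123456, "2020-01-02", 0),
    (123, 123456, "2020-01-03", 0),
    (123, 123456, "2020-01-04", 0),
    (234, 123456, "2020-01-05", 1),
    (234, 123456, "2020-01-06", 0),
    (234, 123456, "2020-01-07", 0),
    (123, 123456, "2020-01-08", 1),
    (123, 123456, "2020-01-09", 0),
    (123, 123456, "2020-01-10", 0),
]

# Numbering restarts for each event/product combination (the answer's assumption).
PER_EVENT = """
SELECT SUM(startGroup) OVER (PARTITION BY eventid, productkey ORDER BY date) AS eventGroup
FROM t
ORDER BY date
"""

# Hypothetical variant: one running counter per product across all events.
PER_PRODUCT = """
SELECT SUM(startGroup) OVER (PARTITION BY productkey ORDER BY date) AS eventGroup
FROM t
ORDER BY date
"""


def running_groups(rows, query):
    """Load rows into an in-memory table t and return the eventGroup values in date order."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (eventid INTEGER, productkey INTEGER, date TEXT, startGroup INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    groups = [group for (group,) in con.execute(query)]
    con.close()
    return groups


print(running_groups(ROWS, PER_EVENT))
print(running_groups(ROWS, PER_PRODUCT))
```

With the per-event partition, event 123 gets groups 1 and 2 and event 234 gets its own group 1; the per-product partition numbers the three blocks consecutively.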

Summing with a column but potentially use another column

I have this table
campaignArchive
id campaignID bannerID poolID limitImpressions actualImpressions
-----------------------------------------------------------------------------
1 496 10367 7 12500 205
2 497 10367 7 12500 22860
3 498 10367 7 12500 1525
I need to sum actual impressions to date, which would ordinarily just be:
select sum(actualImpressions) as actuals from campaignArchive
However, if the actualImpressions column value exceeds the limitImpressions column value (as in row 2), I want the limitImpressions column value to be used instead.
Hope that makes sense. Any help appreciated.
You'll need to include a CASE expression.
select
    sum(case when actualImpressions > limitImpressions
             then limitImpressions
             else actualImpressions end) as actuals
from campaignArchive
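As a quick check on the three sample rows, the sketch below runs the same query in an in-memory SQLite database from Python. Using SQLite in place of the asker's DBMS and the function name capped_total are assumptions of this example.

```python
import sqlite3

# (id, campaignID, bannerID, poolID, limitImpressions, actualImpressions)
ROWS = [
    (1, 496, 10367, 7, 12500, 205),
    (2, 497, 10367, 7, 12500, 22860),
    (3, 498, 10367, 7, 12500, 1525),
]

# The query from the answer: each row contributes at most its limit.
QUERY = """
select
    sum(case when actualImpressions > limitImpressions
             then limitImpressions
             else actualImpressions end) as actuals
from campaignArchive
"""


def capped_total(rows):
    """Return the sum of actualImpressions, capped per row at limitImpressions."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE campaignArchive (id INTEGER, campaignID INTEGER, bannerID INTEGER, "
        "poolID INTEGER, limitImpressions INTEGER, actualImpressions INTEGER)"
    )
    con.executemany("INSERT INTO campaignArchive VALUES (?, ?, ?, ?, ?, ?)", rows)
    total = con.execute(QUERY).fetchone()[0]
    con.close()
    return total


print(capped_total(ROWS))  # 205 + 12500 + 1525 = 14230
```

Row 2 contributes 12500 instead of 22860, so the total is lower than a plain sum of actualImpressions.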