Multiple ARRAY_AGG in BigQuery

I have a table like this:
I wanted to group the information into arrays based on the first two variables, and this is what I did:
WITH sample as (
  -- DATE literals are quoted: an unquoted 2021-01-01 is integer arithmetic (2021 - 1 - 1)
  SELECT 1023 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'chop' as type,
  'c1023' as id_cus, 'julian' as name, '12345' as phone, 'julian#gmail.com' as mail
  UNION ALL
  SELECT 1023 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'ext' as type,
  'c1023' as id_cus, 'julian' as name, '12345' as phone, 'julian#gmail.com' as mail
  UNION ALL
  SELECT 1021 as id,10 as valuation,'inegi' as origin, DATE '2021-01-01' as date_lead,'ext' as type,
  'in-2020' as id_cus, 'lucian' as name, '12345' as phone, 'lucian#gmail.com' as mail
  UNION ALL
  SELECT 1021 as id,10 as valuation,'inegi' as origin, DATE '2021-01-01' as date_lead,'ext' as type,
  'in-2020' as id_cus, 'lucian' as name, '12345' as phone, 'lucian#gmail.com' as mail
  UNION ALL
  SELECT 1021 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'int' as type,
  'c1021' as id_cus, 'lucian' as name, '12345' as phone, 'lucian#gmail.com' as mail
  UNION ALL
  SELECT 1021 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'int' as type,
  'c1021' as id_cus, 'lucas' as name, '202342' as phone, 'lucas#gmail.com' as mail
  UNION ALL
  SELECT 1040 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'type' as type,
  'c1040' as id_cus, 'julieta' as name, '202112' as phone, 'julieta#gmail.com' as mail
  UNION ALL
  SELECT 1040 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'chop' as type,
  'c1040' as id_cus, 'julieta' as name, '202112' as phone, 'julieta#gmail.com' as mail
  UNION ALL
  SELECT 1040 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'rad' as type,
  'c1040' as id_cus, 'julieta' as name, '202112' as phone, 'julieta#gmail.com' as mail
  UNION ALL
  SELECT 1040 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'uls' as type,
  'c1040' as id_cus, 'julieta' as name, '123123' as phone, 'julieta#gmail.com' as mail
)
-- one row per (id, valuation), with all remaining columns collected into an array of structs
SELECT id,valuation,ARRAY_AGG(STRUCT(origin,date_lead,type,id_cus,name,phone,mail)) as lead
FROM sample
GROUP BY id,valuation
The problem is that I have a lot of repeated values in the last three variables (name, phone and mail).
I would like to group them as well, but I am not sure how to do that. I noticed that I can't nest an additional ARRAY_AGG inside the first one.
I am looking to get something like this:
Is there any way to do something like this? What would you do?
Thank you.

I'd wrap another SELECT ... GROUP BY around the original query like:
WITH sample as (
  -- DATE literals are quoted: an unquoted 2021-01-01 is integer arithmetic (2021 - 1 - 1)
  SELECT 1023 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'chop' as type,
  'c1023' as id_cus, 'julian' as name, '12345' as phone, 'julian#gmail.com' as mail
  UNION ALL
  SELECT 1023 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'ext' as type,
  'c1023' as id_cus, 'julian' as name, '12345' as phone, 'julian#gmail.com' as mail
  UNION ALL
  SELECT 1021 as id,10 as valuation,'inegi' as origin, DATE '2021-01-01' as date_lead,'ext' as type,
  'in-2020' as id_cus, 'lucian' as name, '12345' as phone, 'lucian#gmail.com' as mail
  UNION ALL
  SELECT 1021 as id,10 as valuation,'inegi' as origin, DATE '2021-01-01' as date_lead,'ext' as type,
  'in-2020' as id_cus, 'lucian' as name, '12345' as phone, 'lucian#gmail.com' as mail
  UNION ALL
  SELECT 1021 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'int' as type,
  'c1021' as id_cus, 'lucian' as name, '12345' as phone, 'lucian#gmail.com' as mail
  UNION ALL
  SELECT 1021 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'int' as type,
  'c1021' as id_cus, 'lucas' as name, '202342' as phone, 'lucas#gmail.com' as mail
  UNION ALL
  SELECT 1040 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'type' as type,
  'c1040' as id_cus, 'julieta' as name, '202112' as phone, 'julieta#gmail.com' as mail
  UNION ALL
  SELECT 1040 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'chop' as type,
  'c1040' as id_cus, 'julieta' as name, '202112' as phone, 'julieta#gmail.com' as mail
  UNION ALL
  SELECT 1040 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'rad' as type,
  'c1040' as id_cus, 'julieta' as name, '202112' as phone, 'julieta#gmail.com' as mail
  UNION ALL
  SELECT 1040 as id,10 as valuation,'tlv' as origin, DATE '2021-01-01' as date_lead,'uls' as type,
  'c1040' as id_cus, 'julieta' as name, '123123' as phone, 'julieta#gmail.com' as mail
)
-- outer level: one row per (id, valuation), holding an array of contacts
SELECT id,valuation,ARRAY_AGG(STRUCT(lead,name,phone,mail)) FROM (
  -- inner level: one row per contact, holding an array of that contact's leads
  SELECT id,valuation,name,phone,mail, ARRAY_AGG(STRUCT(origin,date_lead,type,id_cus)) as lead
  FROM sample
  GROUP BY id,valuation,name,phone,mail
)
GROUP BY id,valuation
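
For readers who want to see the shape this two-level grouping produces without running BigQuery, here is a minimal Python sketch of the same idea. It is only an illustration: the `rows` list is a hand-copied subset of the sample data, and the `nest` helper is a hypothetical name, not part of any library.

```python
from collections import defaultdict

# Hand-copied subset of the sample rows (date_lead omitted for brevity).
rows = [
    {"id": 1023, "valuation": 10, "origin": "tlv", "type": "chop", "id_cus": "c1023",
     "name": "julian", "phone": "12345", "mail": "julian#gmail.com"},
    {"id": 1023, "valuation": 10, "origin": "tlv", "type": "ext", "id_cus": "c1023",
     "name": "julian", "phone": "12345", "mail": "julian#gmail.com"},
    {"id": 1021, "valuation": 10, "origin": "inegi", "type": "ext", "id_cus": "in-2020",
     "name": "lucian", "phone": "12345", "mail": "lucian#gmail.com"},
    {"id": 1021, "valuation": 10, "origin": "tlv", "type": "int", "id_cus": "c1021",
     "name": "lucian", "phone": "12345", "mail": "lucian#gmail.com"},
    {"id": 1021, "valuation": 10, "origin": "tlv", "type": "int", "id_cus": "c1021",
     "name": "lucas", "phone": "202342", "mail": "lucas#gmail.com"},
]


def nest(rows):
    """Group twice: first per contact, then per (id, valuation)."""
    # Inner step, like GROUP BY id, valuation, name, phone, mail.
    inner = defaultdict(list)
    for r in rows:
        key = (r["id"], r["valuation"], r["name"], r["phone"], r["mail"])
        inner[key].append({k: r[k] for k in ("origin", "type", "id_cus")})

    # Outer step, like GROUP BY id, valuation over the inner result.
    outer = defaultdict(list)
    for (id_, valuation, name, phone, mail), lead in inner.items():
        outer[(id_, valuation)].append(
            {"name": name, "phone": phone, "mail": mail, "lead": lead}
        )
    return dict(outer)


result = nest(rows)
for key, contacts in result.items():
    print(key, contacts)
```

Each contact now appears once per (id, valuation), with its leads nested underneath, which is the structure the nested ARRAY_AGG query returns.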

Related

Calculating if an higher ranked class was offered and booked

I am trying to solve the following issue (sorry if I'm drawing this out too much):
A flight booking platform asks a price calculation engine for prices and passes all the information needed to price the various options. Technically, the platform asks for various combinations of routes (direct and indirect flights, 1, 2, 3... stops, etc.), so we will have many variants of request ID and route. There can be multiple requests per customer for the same conditions until a booking is made.
Whenever the booking platform asks for route prices, it tries to offer a higher class to the customer (only if one is available), so it makes another price call within 0-5 s but with a different class.
I know which values belong to the higher classes (e.g. U); the others are just normal classes (e.g. I).
I'm looking for a SQL query to find out whether a higher class was offered (within the next 5 s) and whether the customer booked the higher class. Basically, I want to enhance the booking table and add "upselling offered" and "upselling realized".
"upselling offered" -> If there is a request for the same customer, origin, destination and date (+-5 s) but with a different class, then "yes" (a higher class was available); if not, "no".
"upselling realized" -> "yes" if the customer asked for a lower class but then booked the upselling offer.
There could be cases where no higher class was available; in that case there would be only one class for that combination of customer, origin, etc.
The table I'm looking for should look like:
request_id route customer origin destination req_date class price booked_request_id selected_route upselling_offered upselling_realized
124 2 c a b 2000-01-01 00:00:02.000 I 22 124 2 yes no
128 1 c a b 2000-01-05 00:00:03.000 U 24 128 1 yes yes
129 2 c a b 2000-01-05 00:00:08.000 I 23 129 2 no no
SQL for Values with the booking table:
with rr as (
select 123 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:00') as req_date,'I' as class ,17 as price ,'normal request' as explanation union all
select 123 as request_id, 3 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:00') as req_date,'I' as class ,20 as price ,'normal request' as explanation union all
select 124 as request_id, 1 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:02') as req_date,'I' as class ,19 as price ,'normal request' as explanation union all
select 124 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:02') as req_date,'I' as class ,22 as price ,'normal request' as explanation union all
select 124 as request_id, 3 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:02') as req_date,'I' as class ,25 as price ,'normal request' as explanation union all
select 125 as request_id, 1 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:06') as req_date,'U' as class ,26 as price ,'uselling offer' as explanation union all
select 125 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:06') as req_date,'U' as class ,27 as price ,'uselling offer' as explanation union all
select 126 as request_id, 1 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-03 00:00:03') as req_date,'I' as class ,24 as price ,'normal request' as explanation union all
select 126 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-03 00:00:03') as req_date,'I' as class ,28 as price ,'normal request' as explanation union all
select 126 as request_id, 3 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-03 00:00:03') as req_date,'I' as class ,23 as price ,'normal request' as explanation union all
select 127 as request_id, 1 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-05 00:00:03') as req_date,'I' as class ,22 as price ,'normal request' as explanation union all
select 127 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-05 00:00:03') as req_date,'I' as class ,26 as price ,'normal request' as explanation union all
select 128 as request_id, 3 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-05 00:00:03') as req_date,'U' as class ,29 as price ,'uselling offer' as explanation union all
select 128 as request_id, 1 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-05 00:00:03') as req_date,'U' as class ,24 as price ,'uselling offer' as explanation union all
select 129 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-02-08 00:00:08') as req_date,'I' as class ,23 as price ,'normal request' as explanation union all
select 129 as request_id, 3 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-02-08 00:00:08') as req_date,'I' as class ,26 as price ,'normal request' as explanation
),
bookings as (
select 124 as booked_request_id, 2 as selected_route union all
select 128 as booked_request_id, 1 as selected_route union all
select 129 as booked_request_id, 2 as selected_route
)
--select req_date, class, lead(req_date) over (partition by customer,origin, destination, class order by req_date )
--from rr left join bookings
-- on rr.request_id = bookings.booked_request_id
-- and rr.route = bookings.selected_route
--order by class,req_date
select req_date, request_id, route, class,
case when class = 'I' then
case when
lead(req_date) over (partition by customer,origin, destination, class, req_date order by req_date ) <= dateadd(second, 5, req_date)
and lead(class) over (partition by customer,origin, destination, class, req_date order by req_date ) <> class
then 'Yes' else 'No' end
when class = 'U' then
case when
lag(req_date) over (partition by customer,origin, destination, class, req_date order by req_date ) >= dateadd(second, -5, req_date)
and lag(class) over (partition by customer,origin, destination, class, req_date order by req_date ) <> class
then 'Yes' else 'No' end
end as upselling_offered
from rr left join bookings
on rr.request_id = bookings.booked_request_id
and rr.route = bookings.selected_route
order by req_date
Unfortunately my query does not give the desired result. Any idea what I'm missing?

Why not do a select that looks for the same customer and the same route, with conditions like these:
WHERE customer = {customer}
AND price > {previous booked price}
AND class > {previous booked class}
AND req_date > {previous_req_date}
AND req_date <= {previous_req_date + 5 secs}
AND upselling_offered = yes
AND upselling_realized = yes
The items in braces are values you would code into your query request.
I also recommend an index on customer, req_date, upselling_offered and upselling_realized.
If this does not answer your question, I may not fully understand what you want to achieve.

Meanwhile I have solved it, thanks. My issue was that I partitioned by class in the window function, which gave me the wrong window for the lead/lag lookup: with class in the PARTITION BY, the neighbouring row can never have a different class.
Here is the solution:
with rr as (
select 123 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:00') as req_date,'I' as class ,17 as price ,'normal request' as explanation union all
select 123 as request_id, 3 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:00') as req_date,'I' as class ,20 as price ,'normal request' as explanation union all
select 124 as request_id, 1 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:02') as req_date,'I' as class ,19 as price ,'normal request' as explanation union all
select 124 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:02') as req_date,'I' as class ,22 as price ,'normal request' as explanation union all
select 124 as request_id, 3 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:02') as req_date,'I' as class ,25 as price ,'normal request' as explanation union all
select 125 as request_id, 1 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:06') as req_date,'U' as class ,26 as price ,'uselling offer' as explanation union all
select 125 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-01 00:00:06') as req_date,'U' as class ,27 as price ,'uselling offer' as explanation union all
select 126 as request_id, 1 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-03 00:00:03') as req_date,'I' as class ,24 as price ,'normal request' as explanation union all
select 126 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-03 00:00:03') as req_date,'I' as class ,28 as price ,'normal request' as explanation union all
select 126 as request_id, 3 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-03 00:00:03') as req_date,'I' as class ,23 as price ,'normal request' as explanation union all
select 127 as request_id, 1 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-05 00:00:03') as req_date,'I' as class ,22 as price ,'normal request' as explanation union all
select 127 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-05 00:00:03') as req_date,'I' as class ,26 as price ,'normal request' as explanation union all
select 128 as request_id, 3 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-05 00:00:03') as req_date,'U' as class ,29 as price ,'uselling offer' as explanation union all
select 128 as request_id, 1 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-01-05 00:00:03') as req_date,'U' as class ,24 as price ,'uselling offer' as explanation union all
select 129 as request_id, 2 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-02-08 00:00:08') as req_date,'I' as class ,23 as price ,'normal request' as explanation union all
select 129 as request_id, 3 as route, 'c' as customer, 'a' as origin, 'b' AS destination, convert(datetime, '2000-02-08 00:00:08') as req_date,'I' as class ,26 as price ,'normal request' as explanation
),
bookings as (
select 124 as booked_request_id, 2 as selected_route union all
select 128 as booked_request_id, 1 as selected_route union all
select 129 as booked_request_id, 2 as selected_route
),
upsell_table as (
select request_id, max(upselling_offered) as upselling, class from
( select req_date, request_id, route, class,
--lead(req_date) over (partition by customer,origin, destination order by req_date asc) AS lead_date,
--lead(class) over (partition by customer,origin, destination order by req_date asc) AS lead_class,
case when class = 'I' then
case when
lead(req_date) over (partition by customer,origin, destination order by req_date asc) <= dateadd(second, 5, req_date)
and lead(class) over (partition by customer,origin, destination order by req_date asc) <> class
then 1 else 0 end
when class = 'U' then
case when
lag(req_date) over (partition by customer, origin, destination order by req_date ) >= dateadd(second, -5, req_date)
and lag(class) over (partition by customer, origin, destination order by req_date ) <> class
then 1 else 0 end
end as upselling_offered
from rr left join bookings
on rr.request_id = bookings.booked_request_id
and rr.route = bookings.selected_route
) T1
group by request_id, class
)
select booked_request_id, selected_route,
case when upselling = 1 then 'yes' else 'no' end as 'upselling_offered'
,case when upselling = 1 and T1.class = 'U' then 'yes' else 'no' end as 'upselling_successfull'
from upsell_table T1 join bookings
on request_id = bookings.booked_request_id
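
To see why the PARTITION BY made the difference, here is a small sketch using Python's built-in sqlite3 module rather than SQL Server. It is a simplification under stated assumptions: the table keeps only four columns, `req_sec` is a made-up integer column standing in for req_date (SQLite has no dateadd), only the LEAD branch for class 'I' is shown, and the `offered` helper is hypothetical. It needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rr (request_id INTEGER, customer TEXT, req_sec INTEGER, class TEXT)"
)
conn.executemany(
    "INSERT INTO rr VALUES (?, ?, ?, ?)",
    [
        (124, "c", 2, "I"),    # normal request
        (125, "c", 6, "U"),    # upselling offer 4 seconds later
        (126, "c", 100, "I"),  # later normal request with no offer
    ],
)


def offered(partition_cols):
    """Flag rows whose next row in the window is within 5 s and has another class."""
    sql = f"""
        SELECT request_id,
               CASE WHEN LEAD(req_sec) OVER w <= req_sec + 5
                     AND LEAD(class) OVER w <> class
                    THEN 1 ELSE 0 END AS upselling_offered
        FROM rr
        WINDOW w AS (PARTITION BY {partition_cols} ORDER BY req_sec)
        ORDER BY request_id
    """
    return conn.execute(sql).fetchall()


with_class = offered("customer, class")   # the original, broken window
without_class = offered("customer")       # the corrected window

print(with_class)
print(without_class)
```

With class in the partition, request 124 never sees request 125 as its next row, so the flag stays 0 everywhere; without it, 124 is flagged.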

Retrieve last balance from update date using rank() and subquery in Oracle SQL

I'm having trouble retrieving balance information from my table. The dataset looks like this:
| Name | Last Name | Balance | Update Date |
+---------------+---------------+---------+-------------+
| John | Doe | $1600 | 2017-01-01 |
| John | Doe | $12 | 2017-01-02 |
| John | Doe | $1 | 2017-01-03 |
| John | Doe | $16 | 2017-01-04 |
| John | Doe | $16 | 2017-01-05 |
| John | Doe | $16 | 2017-01-06 |
The task is to get the most recent Balance with its Update Date, but if the Balance has stayed the same for several days, we need the first Update Date with this Balance. So in this case we need the following result:
| Name | Last Name | Balance | Update Date |
+---------------+---------------+---------+-------------+
| John | Doe | $16 | 2017-01-04 |
I tried to use my query:
select
a.name,
a.last_name,
a.balance,
a.update_date
from
(select
name,
last_name,
balance,
update_date,
rank () over (partition by name, last_name order by update_date desc) top
from
customer_balance) a
where
a.top = 1
but it obviously returns:
| Name | Last Name | Balance | Update Date |
+---------------+---------------+---------+-------------+
| John | Doe | $16 | 2017-01-06 |
I'm not sure how to modify it to get desired result. Please note that I have limited access so no temp tables, functions or anything like that is allowed. Just plain selects, nothing fancy.
I'd appreciate your help.

You can do this by using Tabibitosan to find the group of rows with the same balance that contains the row with the latest update_date (for those rows, the difference between the row number across the whole dataset and the row number within the balance is 0), and then a GROUP BY to pick the earliest update_date, like so:
WITH customer_balance AS (SELECT 'John' first_name, 'Doe' last_name, 1600 balance, to_date('01/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe' last_name, 12 balance, to_date('02/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe' last_name, 1 balance, to_date('03/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe' last_name, 16 balance, to_date('04/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe' last_name, 16 balance, to_date('05/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe' last_name, 16 balance, to_date('06/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe2' last_name, 1600 balance, to_date('01/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe2' last_name, 12 balance, to_date('02/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe2' last_name, 1 balance, to_date('03/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe2' last_name, 16 balance, to_date('04/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe2' last_name, 15 balance, to_date('05/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe2' last_name, 16 balance, to_date('06/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe2' last_name, 16 balance, to_date('07/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe3' last_name, 1600 balance, to_date('01/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe3' last_name, 12 balance, to_date('02/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe3' last_name, 1 balance, to_date('03/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe3' last_name, 16 balance, to_date('04/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe3' last_name, 16 balance, to_date('05/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe3' last_name, 16 balance, to_date('06/01/2017', 'dd/mm/yyyy') update_date FROM dual UNION ALL
SELECT 'John' first_name, 'Doe3' last_name, 17 balance, to_date('07/01/2017', 'dd/mm/yyyy') update_date FROM dual)
SELECT first_name,
last_name,
balance,
min(update_date) update_date
FROM (SELECT first_name,
last_name,
balance,
update_date,
row_number() OVER (PARTITION BY first_name, last_name ORDER BY update_date DESC) -- row number across the entire dataset (i.e. for each first_name and last_name)
- row_number() OVER (PARTITION BY first_name, last_name, balance ORDER BY update_date DESC) grp -- row number across each balance in the entire dataset.
FROM customer_balance)
WHERE grp = 0
GROUP BY first_name,
last_name,
balance;
FIRST_NAME LAST_NAME BALANCE UPDATE_DATE
---------- --------- ---------- -----------
John Doe 16 04/01/2017
John Doe2 16 06/01/2017
John Doe3 17 07/01/2017
I've provided 3 scenarios:
1. The latest rows are for the same balance, and that balance doesn't occur earlier in the dataset (i.e. your original dataset).
2. The latest rows are for the same balance, but that balance also occurs earlier in the dataset.
3. The latest row has a different balance from the previous row.
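
To make the row number difference concrete, the following sketch prints the grp value for every row of the second scenario (Doe2). It uses Python's built-in sqlite3 module instead of Oracle, so the dates are stored as ISO text and the table is trimmed to three columns; both are assumptions made only for this illustration. It needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_balance (last_name TEXT, balance INTEGER, update_date TEXT)"
)
conn.executemany(
    "INSERT INTO customer_balance VALUES (?, ?, ?)",
    [
        ("Doe2", 16, "2017-01-04"),
        ("Doe2", 15, "2017-01-05"),
        ("Doe2", 16, "2017-01-06"),
        ("Doe2", 16, "2017-01-07"),
    ],
)

# grp = row number over everything minus row number within the same balance.
# It is 0 only for the unbroken run of rows that ends at the latest update_date.
rows = conn.execute(
    """
    SELECT update_date, balance,
           ROW_NUMBER() OVER (PARTITION BY last_name ORDER BY update_date DESC)
         - ROW_NUMBER() OVER (PARTITION BY last_name, balance ORDER BY update_date DESC) AS grp
    FROM customer_balance
    ORDER BY update_date DESC
    """
).fetchall()

for row in rows:
    print(row)

# Earliest date inside the grp = 0 run, i.e. what MIN(update_date) picks.
first_of_latest_run = min(d for d, _, grp in rows if grp == 0)
print(first_of_latest_run)
```

The earlier 16 on 2017-01-04 gets a non-zero grp because the 15 on 2017-01-05 interrupts the run, which is why it is not picked up.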

Maybe you could try this query
WITH bal AS
(SELECT 'John' first_name,
'Doe' last_name,
1600 balance,
to_date('20170101', 'YYYYMMDD') update_date
FROM dual
UNION ALL SELECT 'John',
'Doe',
12 balance,
to_date('20170102', 'YYYYMMDD') update_date
FROM dual
UNION ALL SELECT 'John',
'Doe',
1 balance,
to_date('20170103', 'YYYYMMDD') update_date
FROM dual
UNION ALL SELECT 'John',
'Doe',
16 balance,
to_date('20170104', 'YYYYMMDD') update_date
FROM dual
UNION ALL SELECT 'John',
'Doe',
16 balance,
to_date('20170105', 'YYYYMMDD') update_date
FROM dual
UNION ALL SELECT 'John',
'Doe',
16 balance,
to_date('20170106', 'YYYYMMDD') update_date
FROM dual
UNION ALL SELECT 'John',
'Doe',
328 balance,
to_date('20170107', 'YYYYMMDD') update_date
FROM dual) -- The main query
SELECT *
FROM
(SELECT bal.*,
LAG(balance) OVER(PARTITION BY first_name, last_name
ORDER BY update_date)prev_balance
FROM bal )
WHERE prev_balance IS NULL
OR balance != prev_balance
In the first step we get the previous balance.
In the second step we remove all rows where the previous balance equals the current one.
BTW sorry for the layout, I answered from my smartphone.

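I don't have time to write out a tested solution, but the analytic functions lead() and lag() are intended for this:
select name, last_name, balance, update_date
from (select name,
             last_name,
             balance,
             update_date,
             row_number() over (partition by name, last_name
                                order by update_date desc)
               as rn
      from (select name,
                   last_name,
                   balance,
                   update_date,
                   lag(balance) over (partition by name, last_name
                                      order by update_date)
                     as prev_balance
            from customer_balance)
      -- keep only rows that start a new run of identical balances
      where prev_balance is null
         or balance <> prev_balance)
where rn = 1 -- the most recent run start per customer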

How to transform ticks into minute bars in SQL

I have market data stored in a table in the following format:
Timestamp Price Quantity Condition
01/11/2016 09:03:57 14.34 1 S
01/11/2016 09:03:58 14.31 5
01/11/2016 09:03:59 14.34 1 S
01/11/2016 09:03:59 14.35 2
etc.
I want to group this into bars of one minute length, looking something like this:
BarEndTime Open High Low Close
01/11/2016 09:03 14.15 14.16 14.13 14.15
01/12/2016 09:04 14.17 14.19 14.17 14.18
How do I group this data into one minute clusters based on the timestamp of the base data set? I do this fairly easily in R, but for a number of reasons I'd like to build these in SQL as well.

I have no knowledge of R, so I can only guess what "buckets" and "clusters" are. But if you are interested in the opening, minimum, maximum and closing values of Price for each one-minute interval, then the following might be helpful:
;WITH cte AS (
SELECT CONVERT(char(16),Timestamp,126) ts, MIN(Price) p0, MAX(Price) p1,
MIN(Timestamp) t0, MAX(Timestamp) t1
FROM #tbl GROUP BY CONVERT(char(16),Timestamp,126)
)
SELECT ts,(SELECT min(Price) FROM #tbl WHERE Timestamp=t0) po,
p0,p1,
(SELECT max(Price) FROM #tbl WHERE Timestamp=t1) pc
FROM cte
See here for an example.
Input:
Timestamp Price Qty Cnd
01/11/2016 09:03:57 14.34 1 S
01/11/2016 09:03:58 14.31 5
01/11/2016 09:03:59 14.34 1 S
01/11/2016 09:03:59 14.35 2
01/11/2016 09:04:37 11.84 1 S
01/11/2016 09:04:48 12.36 5
01/11/2016 09:04:49 14.54 1 S
01/11/2016 09:04:59 13.35 2
Output:
ts po p0 p1 pc
2016-01-11T09:03 14.34 14.31 14.35 14.35
2016-01-11T09:04 11.84 11.84 14.54 13.35
Since, according to the sample data, there can be more than one Price for a given Timestamp, I had to wrap the opening and closing price subqueries, e.g. (SELECT min(Price) FROM #tbl WHERE Timestamp=t0), in a min()/max() aggregate function. Maybe you can find a better way to limit these subqueries to a single-value result.
In my solution I used a common table expression (CTE), which is not available in some database systems, such as MySQL before version 8.0. So, if you are using an RDBMS without CTEs, you can easily rewrite the above using a simple subquery, since the CTE is only referenced once anyway:
SELECT ts,(SELECT min(Price) FROM #tbl WHERE Timestamp=t0) po,p0,p1,
(SELECT max(Price) FROM #tbl WHERE Timestamp=t1) pc
FROM
(SELECT CONVERT(char(16),Timestamp,126) ts, MIN(Price) p0, MAX(Price) p1,
MIN(Timestamp) t0, MAX(Timestamp) t1
FROM #tbl GROUP BY CONVERT(char(16),Timestamp,126)) subq
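
As a point of comparison, here is a sketch of the same minute bars built with window functions instead of correlated subqueries. This is a swapped-in technique, not the method above, and it runs on Python's built-in sqlite3 module rather than SQL Server. Assumptions: the table is called `ticks`, timestamps are ISO text, and there is at most one tick per timestamp, so no tie-breaking is needed (with ties you would have to add a tie-breaker to the ORDER BY). It needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (ts TEXT, price REAL)")
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?)",
    [
        ("2016-01-11 09:03:57", 14.34),
        ("2016-01-11 09:03:58", 14.31),
        ("2016-01-11 09:03:59", 14.35),
        ("2016-01-11 09:04:37", 11.84),
        ("2016-01-11 09:04:48", 12.36),
        ("2016-01-11 09:04:59", 13.35),
    ],
)

# One window per minute, ordered by time and spanning the whole minute, so
# FIRST_VALUE / LAST_VALUE give open / close and MAX / MIN give high / low.
bars = conn.execute(
    """
    SELECT DISTINCT
           strftime('%Y-%m-%d %H:%M', ts) AS bar,
           FIRST_VALUE(price) OVER w AS open,
           MAX(price) OVER w AS high,
           MIN(price) OVER w AS low,
           LAST_VALUE(price) OVER w AS close
    FROM ticks
    WINDOW w AS (PARTITION BY strftime('%Y-%m-%d %H:%M', ts)
                 ORDER BY ts
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY bar
    """
).fetchall()

for bar in bars:
    print(bar)
```

The explicit ROWS frame matters: with the default frame, LAST_VALUE would only look as far as the current row and would not return the closing price.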

If you are on Oracle:
Analytics become your friend. Here, open is the value occurring first in a minute (the lowest value if there is more than one on the first timestamp) and close is the last value occurring in a minute (the highest value if several share the last timestamp).
with dat as(
SELECT to_Date('01/11/2016 09:03:57','dd/mm/yyyy hh24:mi:ss') ts, 14.34 val, 1 qty, 'S' cond from dual union all
SELECT to_Date('01/11/2016 09:03:58','dd/mm/yyyy hh24:mi:ss') ts, 14.31 val, 5 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:03:59','dd/mm/yyyy hh24:mi:ss') ts, 14.34 val, 1 qty, 'S' cond from dual union all
SELECT to_Date('01/11/2016 09:03:59','dd/mm/yyyy hh24:mi:ss') ts, 14.35 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:03:51','dd/mm/yyyy hh24:mi:ss') ts, 14.35 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:09','dd/mm/yyyy hh24:mi:ss') ts, 14.45 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:19','dd/mm/yyyy hh24:mi:ss') ts, 14.15 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:29','dd/mm/yyyy hh24:mi:ss') ts, 14.55 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:39','dd/mm/yyyy hh24:mi:ss') ts, 14.85 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:49','dd/mm/yyyy hh24:mi:ss') ts, 14.45 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:59','dd/mm/yyyy hh24:mi:ss') ts, 14.25 val, 2 qty, null cond from dual )
select trunc(ts,'mi') as ts_minute,
min (val) keep (dense_rank first order by ts) as open_val,
max (val) keep (dense_rank last order by ts) as close_val,
min (val) min_val,
max(val) max_val
from dat
group by trunc(ts,'mi') ;
TS_MINUTE, OPEN_VAL, CLOSE_VAL, MIN_VAL, MAX_VAL
01/11/2016 9:03:00 AM, 14.35, 14.35, 14.31, 14.35
01/11/2016 9:04:00 AM, 14.45, 14.25, 14.15, 14.85

Oracle SQL Row Number selection

The rows below all relate to the same record in the same file. Basically, the record is labelled 'UNK' until someone assigns a product number to it. In this case the number 12345678 was assigned by Paul on 01Jan. Each row before or after that was written when someone changed something on that record.
What I want is to capture the row where the record first goes from UNK to a number, and to capture the user name, date, etc. from that row.
I have tried min and least, and I'm not sure about rownum or where to put it in the statement if I used it.
Car_Id Product # user name date
111 unk john 20Dec
111 unk alan 25Dec
111 unk pete 30Dec
111 12345678 paul 01Jan
111 12345678 jim 10Jan
222 unk alan 25Dec
222 unk pete 30Dec
222 87654321 paul 02Jan
222 87654321 steve 05Jan
In logical terms I want this: give me the first record after UNK.
Could I have the full query string, if possible?

Correct me if I am wrong, but your data seems to be ordered by date, so logically you could just take the first record where the product number is not "unk".
Select *
From (SELECT * FROM YourTable ORDER BY date) t -- make sure data is ordered before selecting it
where t.ProductNr <> 'unk' and -- don't get data without a number
rownum = 1 -- take the first

Sounds like maybe the analytic function row_number() would be the best way to do this:
with sample_data as (select 111 car_id, 'unk' product#, 'john' user_name, to_date('20/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 111 car_id, 'unk' product#, 'alan' user_name, to_date('21/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 111 car_id, 'unk' product#, 'pete' user_name, to_date('22/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 111 car_id, '12345678' product#, 'paul' user_name, to_date('23/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 111 car_id, '12345678' product#, 'jim' user_name, to_date('24/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 222 car_id, 'unk' product#, 'alan' user_name, to_date('25/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 222 car_id, 'unk' product#, 'pete' user_name, to_date('26/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 222 car_id, '87654321' product#, 'paul' user_name, to_date('27/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 222 car_id, '87654321' product#, 'steve' user_name, to_date('28/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual)
select car_id,
product#,
user_name,
dt
from (select sd.*,
row_number() over (partition by car_id order by dt) rn
from sample_data sd
where product# != 'unk')
where rn = 1;
CAR_ID PRODUCT# USER_NAME DT
---------- -------- --------- ---------------------
111 12345678 paul 23/12/2014 10:12:24
222 87654321 paul 27/12/2014 10:12:24
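
For anyone without an Oracle instance at hand, the same filter-then-number pattern can be tried with Python's built-in sqlite3 module. This is only a sketch: the `history` table and the `product_no` column are names made up for the example (product# is renamed to keep the SQL portable), the dates are ISO text, and the data is a trimmed copy of the sample above. It needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (car_id INTEGER, product_no TEXT, user_name TEXT, dt TEXT)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        (111, "unk", "pete", "2014-12-22"),
        (111, "12345678", "paul", "2014-12-23"),
        (111, "12345678", "jim", "2014-12-24"),
        (222, "unk", "pete", "2014-12-26"),
        (222, "87654321", "paul", "2014-12-27"),
        (222, "87654321", "steve", "2014-12-28"),
    ],
)

# Drop the 'unk' rows first, then number what is left per car by date:
# row number 1 is the first row that carries a real product number.
first_assigned = conn.execute(
    """
    SELECT car_id, product_no, user_name, dt
    FROM (SELECT h.*,
                 ROW_NUMBER() OVER (PARTITION BY car_id ORDER BY dt) AS rn
          FROM history h
          WHERE product_no <> 'unk')
    WHERE rn = 1
    ORDER BY car_id
    """
).fetchall()

for row in first_assigned:
    print(row)
```

Because the numbering is partitioned by car_id, this returns one row per car rather than a single row for the whole table.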

Conditional aggregation - once for each key

I have an aggregation problem that can probably best be described with some example data.
Below is a dataset with transports, identified by trp_no. Each such transport is loaded in a container. A container may hold multiple such transports, and in this example any transport may only be loaded in one container.
TRP_NO TRANSPORT_VOLUME COUNTRY CONTAINER_ID CONTAINER_MAX
------ ---------------- ------- ------------ -------------
1 10 SE A 80
2 20 SE A 80
3 30 SE A 80
The following keys (or functional dependencies) exist in the dataset:
trp_no -> {transport_volume, country, container_id}
container_id -> {container_max}
I want to calculate Filling Rate per Country, which is calculated as transported volume divided by the capacity. Translated into SQL, this becomes:
with sample_data as(
select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual
)
select country
,sum(transport_volume) / sum(container_max)
from sample_data
group
by country;
...which returns (10+20+30) / (80+80+80) = 25%. That is not what I want: all transports used the same container_id, so my query triple-counted the capacity.
The result I want is (10+20+30) / 80 = 75%.
So, I only want to sum container_max once for each container_id within the group.
Any ideas on how to fix the query?

This uses Rachcha's bigger sample set, which I think is necessary to really test this problem.
with sample_data as(
select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 4 as trp_no, 10 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
select 5 as trp_no, 20 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
select 6 as trp_no, 30 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
select 7 as trp_no, 10 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
select 8 as trp_no, 15 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
select 9 as trp_no, 20 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual
),
country_container_sum as
(
select country, sum(container_max) sum_container_max
from
(
select distinct country, container_id, container_max
from sample_data
)
group by country
),
country_transport_volume_sum as
(
select country, sum(transport_volume) sum_transport_volume
from sample_data
group by country
)
select country, sum_transport_volume / sum_container_max rate
from country_container_sum
join country_transport_volume_sum using (country);
Results:
COUNTRY RATE
------- ----
SE 0.666666666666667
AU 0.9

I added some more sample data to illustrate a minor fix to the query:
with sample_data as(
select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 4 as trp_no, 10 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
select 5 as trp_no, 20 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
select 6 as trp_no, 30 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
select 7 as trp_no, 10 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
select 8 as trp_no, 15 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
select 9 as trp_no, 20 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual
)
select country
,sum(transport_volume / container_max) -- Note the change here
from sample_data
group
by country;
OUTPUT:
COUNTRY SUM(TRANSPORT_VOLUME/CONTAINER_MAX)
------- -----------------------------------
SE 1.35
AU .9
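Note on the SE figure: sum(transport_volume / container_max) adds up each container's own fill rate, so it matches total volume divided by total capacity only when a country uses a single container. The following is a minimal Python sketch of that arithmetic on the sample rows above; the function and variable names are illustrative and not part of the original answer.

```python
from collections import defaultdict

# (trp_no, transport_volume, country, container_id, container_max)
rows = [
    (1, 10, "SE", "A", 80), (2, 20, "SE", "A", 80), (3, 30, "SE", "A", 80),
    (4, 10, "SE", "B", 100), (5, 20, "SE", "B", 100), (6, 30, "SE", "B", 100),
    (7, 10, "AU", "C", 50), (8, 15, "AU", "C", 50), (9, 20, "AU", "C", 50),
]

def sum_of_ratios(rows):
    """Mirror of sum(transport_volume / container_max) grouped by country."""
    out = defaultdict(float)
    for _, volume, country, _, container_max in rows:
        out[country] += volume / container_max
    return dict(out)

def ratio_of_sums(rows):
    """Total volume per country divided by capacity, counting each container once."""
    volume_by_country = defaultdict(int)
    capacity_by_container = {}
    for _, volume, country, container_id, container_max in rows:
        volume_by_country[country] += volume
        capacity_by_container[(country, container_id)] = container_max
    capacity_by_country = defaultdict(int)
    for (country, _), container_max in capacity_by_container.items():
        capacity_by_country[country] += container_max
    return {c: volume_by_country[c] / capacity_by_country[c] for c in volume_by_country}

print(sum_of_ratios(rows))   # SE is 60/80 + 60/100, i.e. about 1.35
print(ratio_of_sums(rows))   # SE is 120/180, i.e. about 0.667
```

For AU, which has a single container, both functions give about 0.9.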
EDIT:
Looking at your sample data, I think your database needs a bit of normalization. The columns for a container and the columns for a transport trip should reside in separate tables, like this:
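CREATE TABLE container (
  container_id  VARCHAR2(10) PRIMARY KEY,  -- or INTEGER; the length is illustrative
  container_max INTEGER,
  country       VARCHAR2(2)                -- length is illustrative
);
CREATE TABLE trip (
  trp_no           INTEGER PRIMARY KEY,
  transport_volume INTEGER,
  container_id     VARCHAR2(10) REFERENCES container (container_id)  -- same type as container.container_id
);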
EDIT 2:
If you want to specifically sum up the transport volumes according to the containers' capacities, you can use something like the following query (with the same sample data table sample_data from above):
select d.country,
(select sum(t.transport_volume)
from sample_data t
where t.country = d.country) /
(select sum(c.container_max)
from ( select country, container_max
from sample_data
group by container_id, country, container_max
) c
where c.country = d.country) as col1
from sample_data d
group by d.country;
OUTPUT:
COUNTRY COL1
------- -----------
SE 0.666666667
AU 0.9

Other ways are simpler, but this approach uses analytic functions. I'm adding it only because, although jonearle's response gives you the correct output, you replied that you wanted an approach that uses analytic functions.
However, analytic functions are evaluated after grouping, so you cannot collapse their results with aggregate functions or the group by clause in the same query block; that takes a second layer in the query. Depending on what other similar queries you want to run, this might be easier for you as a template query, but it's hard to tell without knowing what those queries are.
with sample_data as(
select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 4 as trp_no, 10 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
select 5 as trp_no, 20 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
select 6 as trp_no, 30 as transport_volume, 'SE' as country, 'B' as container_id, 100 as container_max from dual union all
select 7 as trp_no, 10 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
select 8 as trp_no, 15 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual union all
select 9 as trp_no, 20 as transport_volume, 'AU' as country, 'C' as container_id, 50 as container_max from dual
)
, sub as(
select x.*, sum(x.cont_mx_n) over (partition by country order by country, container_id, trp_no) as cont_mx
from(
select country
, container_id
, trp_no
, sum(transport_volume) over (partition by country order by country, container_id, trp_no) as transp_vol
, case when lead(container_id,1) over (partition by country order by country, container_id, trp_no) = container_id
then null
else container_max end as cont_mx_n
, row_number() over (partition by country order by country, container_id, trp_no) as maxchk
from sample_data
order by country, container_id, trp_no) x)
select country, transp_vol / cont_mx as rate
from sub y
where y.maxchk = (select max(x.maxchk) from sub x where x.country = y.country);
Result of the above is:
AU 0.9
SE 0.666666666666667
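A shorter variant of the same idea, offered as a sketch rather than as the method above: flag the first row of each container with row_number() and count container_max only on that row, then aggregate in an outer layer. To keep it runnable without an Oracle instance, the sketch uses SQLite through Python's sqlite3 module (this assumes SQLite 3.25 or later for window functions), so there is no dual table and the division is forced to floating point. The names flagged, rn and rates are illustrative.

```python
import sqlite3

# (trp_no, transport_volume, country, container_id, container_max)
rows = [
    (1, 10, "SE", "A", 80), (2, 20, "SE", "A", 80), (3, 30, "SE", "A", 80),
    (4, 10, "SE", "B", 100), (5, 20, "SE", "B", 100), (6, 30, "SE", "B", 100),
    (7, 10, "AU", "C", 50), (8, 15, "AU", "C", 50), (9, 20, "AU", "C", 50),
]

QUERY = """
WITH flagged AS (
    SELECT country,
           transport_volume,
           container_max,
           -- rn = 1 marks exactly one row per (country, container_id)
           ROW_NUMBER() OVER (PARTITION BY country, container_id
                              ORDER BY trp_no) AS rn
    FROM sample_data
)
SELECT country,
       -- "* 1.0" avoids SQLite integer division; Oracle does not need it
       SUM(transport_volume) * 1.0
         / SUM(CASE WHEN rn = 1 THEN container_max END) AS rate
FROM flagged
GROUP BY country
ORDER BY country
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_data (trp_no INTEGER, transport_volume INTEGER, "
    "country TEXT, container_id TEXT, container_max INTEGER)"
)
conn.executemany("INSERT INTO sample_data VALUES (?, ?, ?, ?, ?)", rows)

# Map each country to its filling rate
rates = dict(conn.execute(QUERY).fetchall())
print(rates)
```

The inner block does the analytic work and the outer block does the grouping, which is the two-layer structure described above.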

I tried this:
with sample_data as(
select 1 as trp_no, 10 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 2 as trp_no, 20 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual union all
select 3 as trp_no, 30 as transport_volume, 'SE' as country, 'A' as container_id, 80 as container_max from dual
)
select country
,sum(transport_volume) / container_max
from sample_data
group
by country, container_max;
The result was as expected.
PS: someone kindly reminded me to also group by container_id, which won't affect the result in this case but might be needed in other cases :-)
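One thing to keep in mind with this grouping: once a country uses more than one container, group by country, container_max returns one row per container capacity rather than one row per country. A minimal sketch of that behaviour on the larger sample set from the other answers, run in SQLite through Python's sqlite3 module as a stand-in for Oracle (an assumption made only so the example is runnable); container_max is added to the select list purely for readability.

```python
import sqlite3

# (trp_no, transport_volume, country, container_id, container_max)
rows = [
    (1, 10, "SE", "A", 80), (2, 20, "SE", "A", 80), (3, 30, "SE", "A", 80),
    (4, 10, "SE", "B", 100), (5, 20, "SE", "B", 100), (6, 30, "SE", "B", 100),
    (7, 10, "AU", "C", 50), (8, 15, "AU", "C", 50), (9, 20, "AU", "C", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_data (trp_no INTEGER, transport_volume INTEGER, "
    "country TEXT, container_id TEXT, container_max INTEGER)"
)
conn.executemany("INSERT INTO sample_data VALUES (?, ?, ?, ?, ?)", rows)

# Same grouping as the query above; "* 1.0" avoids SQLite integer division
per_container = conn.execute(
    """
    SELECT country,
           container_max,
           SUM(transport_volume) * 1.0 / container_max AS rate
    FROM sample_data
    GROUP BY country, container_max
    ORDER BY country, container_max
    """
).fetchall()

for country, container_max, rate in per_container:
    print(country, container_max, rate)
```

SE appears twice here, once for the 80-capacity container and once for the 100-capacity one, so a per-country rate still needs a second aggregation step.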
