My table:
booking_id
arrivel_time
departure_time
date
In some cases the same key (booking_id) has 2 records: the first has null in arrivel_time and departure_time, and the second has values (date and time) in both arrivel_time and departure_time, or only in arrivel_time.
When that happens, I would like to select only the record that has the values for that booking_id.
I am struggling with how to select that. Could you explain how to achieve this?
You can see an example of my desired results here
One method uses row_number():
select t.*
from (select t.*,
row_number() over (partition by booking_id order by arrival_time nulls last) as seqnum
from t
) t
where seqnum = 1;
In older versions of Hive, you can use:
select t.*
from (select t.*,
row_number() over (partition by booking_id
order by (case when arrival_time is not null then 1 else 2 end)
) as seqnum
from t
) t
where seqnum = 1;
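To see the CASE-based ordering in action, here is a minimal sketch that runs the same pattern against an in-memory SQLite database through Python's sqlite3 module. The sample rows are made up for illustration, SQLite stands in for Hive, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: booking 1 has a NULL row and a filled row,
# booking 2 has only a NULL row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (booking_id INTEGER, arrival_time TEXT, departure_time TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, None, None),
        (1, "2021-05-04 08:00", "2021-05-04 10:00"),
        (2, None, None),
    ],
)

# Rows with a non-NULL arrival_time sort first within each booking_id,
# so seqnum = 1 picks the filled row whenever one exists.
query = """
SELECT booking_id, arrival_time, departure_time
FROM (SELECT t.*,
             ROW_NUMBER() OVER (
                 PARTITION BY booking_id
                 ORDER BY CASE WHEN arrival_time IS NOT NULL THEN 1 ELSE 2 END
             ) AS seqnum
      FROM t) AS ranked
WHERE seqnum = 1
ORDER BY booking_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Booking 1 comes back with its filled row, while booking 2, which has no filled row, still comes back with its NULL row.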
You can use the MAX() function and the DISTINCT keyword to get the desired result:
SELECT DISTINCT
booking_id,
MAX(arrivel_time),
MAX(departure_time),
date
FROM MyTable
GROUP BY booking_id, date
However, note that MAX() returns the latest value in each column. For example, if one record has 04/05/2021 08:00 as its arrival time and another record with the same booking id has 04/05/2021 09:00 as its arrival time, the query ignores the first value and takes the second.
The query above therefore only works as intended if, for each booking id, one of the records has null values or both do.
The DISTINCT keyword then consolidates 2 rows that have exactly the same values into 1 row.
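The caveat about MAX() can be checked with a small sketch, again using Python's sqlite3 module with made-up rows. It assumes ISO-formatted text timestamps, so that MAX() on text sorts chronologically. Booking 2 has two fully filled rows, and the result mixes values from both of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable ("
    "booking_id INTEGER, arrivel_time TEXT, departure_time TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        # Booking 1: one NULL row and one filled row (the case from the question).
        (1, None, None, "2021-05-04"),
        (1, "2021-05-04 08:00", "2021-05-04 10:00", "2021-05-04"),
        # Booking 2: two filled rows (hypothetical), where MAX() mixes rows.
        (2, "2021-05-04 08:00", "2021-05-04 12:00", "2021-05-04"),
        (2, "2021-05-04 09:00", "2021-05-04 11:00", "2021-05-04"),
    ],
)

# MAX() skips NULLs and is evaluated per column, not per row.
query = """
SELECT DISTINCT booking_id, MAX(arrivel_time), MAX(departure_time), date
FROM MyTable
GROUP BY booking_id, date
ORDER BY booking_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

For booking 2 the arrival time comes from the second row and the departure time from the first, which is why this approach is only safe when at most one row per booking has values.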
Related
I have a table that contains three columns: ACCOUNT_ID, STATUS, CREATE_DATE.
I want to grab only the LAST status for each account_id based on the latest create_date.
In the example above, I should see only three records, with only the last STATUS for account_2.
Do you know a way to do this?
create table tbl (
account_id int,
status string,
create_date date);
select account_id, max(create_date) from tbl group by account_id;
gives you each account_id together with its most recent create_date (the date closest to today, assuming create_date can never be in the future, which makes sense).
Now you can join with that result to get what you want, for example something along these lines:
select account_id, status, create_date from tbl where (account_id, create_date) in (<the select expression from above>);
If you use that frequently (account with the latest create date), then consider defining a view for that.
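As a rough illustration of the MAX-then-filter approach, the sketch below runs it on SQLite via Python's sqlite3 module. The rows are invented, and it relies on SQLite accepting a row value on the left of IN (available since SQLite 3.15); other engines may need a join instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (account_id INTEGER, status TEXT, create_date TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        # Hypothetical sample rows.
        (1, "open", "2021-01-01"),
        (1, "closed", "2021-02-01"),
        (2, "open", "2021-03-01"),
    ],
)

# The inner SELECT finds the latest create_date per account;
# the outer SELECT keeps only the rows that match those pairs.
query = """
SELECT account_id, status, create_date
FROM tbl
WHERE (account_id, create_date) IN (
    SELECT account_id, MAX(create_date) FROM tbl GROUP BY account_id
)
ORDER BY account_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```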
If you have many columns and want to keep only the latest row per account, you can use QUALIFY to run the ranking logic and keep the best row, like so:
SELECT *
FROM tbl
QUALIFY row_number() over (partition by account_id order by create_date desc) = 1;
The long form is the same pattern that Ely shows in the second answer. But with the MAX(CREATE_DATE) solution, if you have two rows on the same last day, the IN solution will give you both. You can also get that behavior via QUALIFY if you use RANK.
So the SQL is the same as:
SELECT account_id, status, create_date
FROM (
SELECT *,
row_number() over (partition by account_id order by create_date desc) as rn
FROM tbl
)
WHERE rn = 1;
So the RANK form, which shows all tied rows, is:
SELECT *
FROM tbl
QUALIFY rank() over (partition by account_id order by create_date desc) = 1;
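The difference between ROW_NUMBER and RANK on ties can be sketched as follows. SQLite has no QUALIFY, so this uses the subquery long form through Python's sqlite3 module. The helper name latest_rows and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (account_id INTEGER, status TEXT, create_date TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, "open", "2021-01-01"),
        (1, "closed", "2021-02-01"),
        # Account 2 has two rows on the same last day (a tie).
        (2, "open", "2021-03-01"),
        (2, "hold", "2021-03-01"),
    ],
)


def latest_rows(ranking_fn):
    """Return the rows ranked first per account by the given window function."""
    # ranking_fn is pasted into the SQL text, so only allow known names.
    if ranking_fn not in ("row_number", "rank"):
        raise ValueError(ranking_fn)
    query = f"""
        SELECT account_id, status, create_date
        FROM (SELECT tbl.*,
                     {ranking_fn}() OVER (
                         PARTITION BY account_id ORDER BY create_date DESC
                     ) AS rn
              FROM tbl) AS ranked
        WHERE rn = 1
        ORDER BY account_id, status
    """
    return conn.execute(query).fetchall()


one_per_account = latest_rows("row_number")  # exactly one row per account
with_ties = latest_rows("rank")              # keeps both tied rows of account 2
print(one_per_account)
print(with_ties)
```

With row_number, which of the two tied rows survives for account 2 is not defined unless you add a tie-breaking column to the ORDER BY.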
I have a table A that contains id, report_day, and other columns. I also have a table B that contains id, report_day, and subscribers. I want to create a VIEW with the id, report_day, and subscribers columns, so it's a simple join:
select a.id, a.report_day, b.subscribers from schema.a
left join schema.b on a.id = b.id
and a.report_day = b.report_day
Now I want to add a column subscribers_increment based on subscribers. But for some days I don't have stats for the subscribers column, and it's set to NULL. subscribers_increment is just subscribers(current_day) - subscribers(prev_day).
I read some articles and added the following expression:
case WHEN row_number() OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1 THEN b.subscribers
else b.subscribers - COALESCE(last_value(b.subscribers) OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0::bigint::numeric)
END::integer AS subscribers_increment
And now I have the following result:
NULL is still NULL.
For example, the increment for 2021-04-07 is incorrect. It's an increment for 2 days. Can I divide this value from 2021-04-08 by the number of days (here it's 2) and write the same value for 2021-04-07 and 2021-04-08 (or at least for 2021-04-07, where it was NULL)? And apply the same logic to all days where subscribers is NULL?
So I need to follow these rules:
If I see a NULL value in the subscribers column, I should go to the next (future) NOT NULL day and take the value for that day. From this (future) value, subtract the last NOT NULL value (from the past, ordered by date, so we are looking back). Divide the result of the subtraction by the number of days and fill these rows in the subscribers_increment column.
Is it possible?
UPDATE:
For my data it should look like this:
UPDATE v2
After applying script:
UPDATE v3
The case (our increment) for 25.03-27.03 is still NULL.
The basic idea is:
1. Use lag() to get the previous subscribers and dates before joining. This assumes that the left join is the cause of all the NULL values.
2. Use a cumulative count in reverse to assign a grouping so that each NULL is combined with the next value in one group.
3. As a result of (2), the number of rows in a group (the NULLs plus the next value) is the denominator.
4. As a result of (1), the difference between subscribers and prev_subscribers is the numerator.
The actual calculation requires more window functions and case logic.
So the idea is:
with t as (
select a.id, a.report_day, b.subscribers, b.prev_report_day, b.prev_subscribers,
count(b.subscribers) over (partition by a.id order by a.report_day desc) as grp
from first_table a left join
(select b.*,
lag(b.report_day) over (partition by id order by report_day) as prev_report_day,
lag(b.subscribers) over (partition by id order by report_day) as prev_subscribers
from second_table b
) b
on a.id = b.id and a.report_day = b.report_day
)
select t.*,
(case when t.subscribers is not null and t.prev_report_day = t.report_day - interval '1 day'
then t.subscribers - t.prev_subscribers
when t.subscribers is not null
then (t.subscribers - t.prev_subscribers) / count(*) over (partition by id, grp)
when t.subscribers is null
then (max(t.subscribers) over (partition by id, grp) - max(t.prev_subscribers) over (partition by id, grp)
) / count(*) over (partition by id, grp)
end)
from t;
Here is a db<>fiddle.
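To make the grouping rule easier to follow, here is a plain-Python sketch of the same idea for a single id. It is not the SQL above, only an illustration of the arithmetic: each run of NULL days is grouped with the next non-NULL day, and the difference to the previous non-NULL value is divided evenly over that group. The function name spread_increments and the sample numbers are hypothetical. Like the SQL, it yields no increment for the first value (there is no previous value) or for trailing NULL days.

```python
from datetime import date


def spread_increments(rows):
    """rows: list of (day, subscribers_or_None), sorted by day ascending.

    Returns a list of (day, increment_or_None) in the same order.
    """
    result = []
    prev_value = None  # last non-NULL subscribers value seen so far
    pending = []       # NULL days waiting for the next non-NULL value
    for day, subs in rows:
        if subs is None:
            pending.append(day)
            continue
        group = pending + [day]  # the NULL days plus the day that closes them
        if prev_value is None:
            inc = None           # nothing to subtract from yet
        else:
            inc = (subs - prev_value) / len(group)
        result.extend((d, inc) for d in group)
        prev_value = subs
        pending = []
    # NULL days with no later value cannot be filled.
    result.extend((d, None) for d in pending)
    return result


sample = [
    (date(2021, 4, 6), 100),
    (date(2021, 4, 7), None),
    (date(2021, 4, 8), 120),
    (date(2021, 4, 9), 125),
]
increments = spread_increments(sample)
print(increments)
```

Here the jump from 100 to 120 covers two days, so 2021-04-07 and 2021-04-08 each get 10.0, and 2021-04-09 gets 5.0.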
Select only the latest amount; if it is null, then the one before that.
table a
customer|amount|date
001|2 |20201101
001|null|20201102
001|3 |20201103
002|8.9 |20201101
002|7 |20201008
002|null|20201106
Result
001|null|20201101
001|null|20201102
001|3 |20201103
002|null|20201101
002|null|20201008
002|7 |20201106
The amount should be taken from the latest record by date, and the other records should be null. If the amount is null for the latest date, it should take the previous non-null value.
My current attempt:
select top 1 [amount]
from table
where [amount] is not null
order by date desc
If you want to set all but the most recent value to NULL:
select customer_code, date,
(case when seqnum = 1 then amount end) as amount
from (select t.*,
row_number() over (partition by customer_code order by (amount is not null) desc, date desc) as seqnum
from table t
) t
where customer_code = '001'
order by date desc
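A small sketch of what this query does, run on SQLite through Python's sqlite3 module. The table name amounts and customer '009' are made up for illustration, and the example relies on the engine accepting a boolean expression such as (amount IS NOT NULL) in ORDER BY, which SQLite does; some engines would need a CASE expression instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amounts (customer_code TEXT, amount REAL, date TEXT)")
conn.executemany(
    "INSERT INTO amounts VALUES (?, ?, ?)",
    [
        ("001", 2.0, "20201101"),
        ("001", None, "20201102"),
        ("001", 3.0, "20201103"),
        # Hypothetical customer whose latest row has a NULL amount.
        ("009", 5.0, "20201101"),
        ("009", None, "20201102"),
    ],
)

# Non-NULL amounts sort first, then the newest date, so seqnum = 1 marks
# the most recent row that actually has an amount; every other row is blanked.
query = """
SELECT customer_code, date,
       CASE WHEN seqnum = 1 THEN amount END AS amount
FROM (SELECT t.*,
             ROW_NUMBER() OVER (
                 PARTITION BY customer_code
                 ORDER BY (amount IS NOT NULL) DESC, date DESC
             ) AS seqnum
      FROM amounts t) AS ranked
ORDER BY customer_code, date
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that the surviving amount stays on the row where it was recorded; it is not moved to the row with the latest date.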
Probably what you are looking for is a window function:
SELECT *
FROM (SELECT *,
row_number() over
(partition by customer
order by date desc) as rn
FROM your_table
WHERE amount is not null) t
WHERE rn = 1
You can use row_number or dense_rank depending on your needs
Create a view that returns all inserted values in descending order. Then select the first or second row according to the condition.
I have a ticket history table that saves all the actions performed on a ticket. How can I write a query that returns the first record and the last record for a specific ticket?
For example, in the above table I have one ticket with id 78580. I want to get its first row and last row based on the date column.
Just use row_number():
select t.*
from (select t.*,
row_number() over (partition by ticket_id order by action_when asc) as seqnum_a,
row_number() over (partition by ticket_id order by action_when desc) as seqnum_d
from tickets t
) t
where seqnum_a = 1 or seqnum_d = 1;
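Here is a minimal runnable sketch of the two-row-number pattern, using Python's sqlite3 module. The action column and the sample rows are assumptions made for the example; only ticket id 78580 comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (ticket_id INTEGER, action TEXT, action_when TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        (78580, "created", "2021-01-01 09:00"),
        (78580, "assigned", "2021-01-02 11:00"),
        (78580, "closed", "2021-01-03 17:00"),
        # A ticket with a single action is both its first and its last row.
        (78581, "created", "2021-01-02 10:00"),
    ],
)

# seqnum_a = 1 marks the earliest action, seqnum_d = 1 the latest one.
query = """
SELECT ticket_id, action, action_when
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY ticket_id ORDER BY action_when ASC) AS seqnum_a,
             ROW_NUMBER() OVER (PARTITION BY ticket_id ORDER BY action_when DESC) AS seqnum_d
      FROM tickets t) AS ranked
WHERE seqnum_a = 1 OR seqnum_d = 1
ORDER BY ticket_id, action_when
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Unlike the MIN/MAX approach, this returns the complete first and last rows, not just their dates.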
Use min and max to get first and last date, grouped by ticket id.
SELECT ticket_id, min(action_when), max(action_when)
FROM table_name
GROUP BY ticket_id;
I need to select the last row in mytable for a given pair of columns in Oracle v11.2:
id type timestamp raw_value normal_value
-- ---- --------- --------- ------------
1 3 3pm 3-Jun "Jon" "Jonathan"
1 3 5pm 3-Jun "Jonathan" "Jonathan"
1 3 2pm 4-Jun "John" "Jonathan"
1 3 8pm 6-Jun "Bob" "Robert"
1 5 6pm 3-Jun "NYC" "New York City"
1 5 7pm 5-Jun "N.Y.C." "New York City"
4 8 1pm 1-Jun "IBM" "International Business Machines"
4 8 5pm 8-Jun "I.B.M." "International Business Machines"
I'm thinking the query would be something like this:
SELECT raw_value, normal_value, MAX(timestamp)
FROM mytable
WHERE id = 1 and type = 3
GROUP BY id, type
For the above, this should give me:
"Bob", "Robert", 8pm 6-Jun
I do not actually need the timestamp in my answer, but only need it to select the matching row for the given id and type whose timestamp is greatest.
Will my approach work in Oracle v11.2, and if so, is there a way to omit timestamp from the selected columns since I don't actually need its value?
You can do this with the row_number() function:
select raw_value, normal_value, timestamp
from (select myt.*, ROW_NUMBER() over
(partition by id, type order by timestamp desc)
as seqnum
from mytable myt
) tmp
where seqnum = 1
and id = 1 and type = 3;
row_number() is an analytic function (aka window function) that assigns sequential numbers to rows. Every group defined by id, type gets its own numbers. The first row is the one with the most recent timestamp (order by timestamp desc). The outer select chooses this row in the where clause.
In the case of ties, this version returns only one row. To get all the rows, use rank() instead of row_number().
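The query can be tried out with the question's sample data. The sketch below uses SQLite through Python's sqlite3 module rather than Oracle, and it stores the timestamps as ISO text with an assumed year of 2013, since the question gives no year. It also drops timestamp from the select list, which the question asked about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER, type INTEGER, timestamp TEXT, raw_value TEXT, normal_value TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        # The year 2013 is an assumption; the question only gives day and hour.
        (1, 3, "2013-06-03 15:00", "Jon", "Jonathan"),
        (1, 3, "2013-06-03 17:00", "Jonathan", "Jonathan"),
        (1, 3, "2013-06-04 14:00", "John", "Jonathan"),
        (1, 3, "2013-06-06 20:00", "Bob", "Robert"),
        (1, 5, "2013-06-03 18:00", "NYC", "New York City"),
        (1, 5, "2013-06-05 19:00", "N.Y.C.", "New York City"),
        (4, 8, "2013-06-01 13:00", "IBM", "International Business Machines"),
        (4, 8, "2013-06-08 17:00", "I.B.M.", "International Business Machines"),
    ],
)

# Number the rows of each (id, type) group from newest to oldest,
# then keep the newest row of the requested group.
query = """
SELECT raw_value, normal_value
FROM (SELECT myt.*,
             ROW_NUMBER() OVER (PARTITION BY id, type ORDER BY timestamp DESC) AS seqnum
      FROM mytable myt) AS tmp
WHERE seqnum = 1
  AND id = 1 AND type = 3
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The timestamp is only needed inside the window's ORDER BY, so it does not have to appear in the outer select list.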
Try this:
SELECT m1.raw_value, m1.normal_value
FROM mytable m1
WHERE id = 1 and type = 3 and timestamp = (
SELECT MAX(timestamp)
FROM mytable m2
WHERE m1.id = m2.id and m1.type = m2.type
GROUP BY m2.id, m2.type
)
You can determine the most recent timestamp using the Oracle analytic RANK function like this:
SELECT
raw_value,
normal_value,
RANK() OVER (ORDER BY timestamp DESC) as TimestampRank
FROM myTable
WHERE id = 1 AND type = 3
This will set the TimestampRank column to 1 for the row with the highest timestamp among the rows for the given id and type. If there's a tie for the highest timestamp, all rows with the highest timestamp will have TimestampRank set to 1.
To get just the "Bob", "Robert", surround the query above with an outer query that selects just those columns and filters for TimestampRank = 1:
SELECT raw_value, normal_value
FROM (
SELECT
raw_value,
normal_value,
RANK() OVER (ORDER BY timestamp DESC) as TimestampRank
FROM myTable
WHERE id = 1 AND type = 3
)
WHERE TimestampRank = 1
Note again that if there's a tie for the highest timestamp, all rows with that value will be returned. If you always want one row regardless of ties, use ROW_NUMBER() instead of RANK() in the query above.
Try
select max(raw_value ) keep (dense_rank last order by timestamp),
max(normal_value ) keep (dense_rank last order by timestamp)
from mytable
WHERE id = 1 and type = 3