"Group" some rows together before sorting (Oracle) - sql

I'm using Oracle Database 11g.
I have a query that selects, among other things, an ID and a date from a table. Basically, what I want to do is keep the rows that have the same ID together, and then sort those "groups" of rows by the most recent date in the "group".
So if my original result was this:
ID Date
3 11/26/11
1 1/5/12
2 6/3/13
2 10/15/13
1 7/5/13
The output I'm hoping for is:
ID Date
3 11/26/11 <-- (Using this date for "group" ID = 3)
1 1/5/12
1 7/5/13 <-- (Using this date for "group" ID = 1)
2 6/3/13
2 10/15/13 <-- (Using this date for "group" ID = 2)
Is there any way to do this?

One way to get this is by using analytic functions; I don't have an example of that handy.
This is another way to get the specified result, without using an analytic function (this is ordering first by the most_recent_date for each ID, then by ID, then by Date):
SELECT t.ID
, t.Date
FROM mytable t
JOIN ( SELECT s.ID
, MAX(s.Date) AS most_recent_date
FROM mytable s
WHERE s.Date IS NOT NULL
GROUP BY s.ID
) r
ON r.ID = t.ID
ORDER
BY r.most_recent_date
, t.ID
, t.Date
The "trick" here is to return "most_recent_date" for each ID, and then join that to each row. The result can be ordered by that first, then by whatever else.
(I also think there's a way to get this same ordering using Analytic functions, but I don't have an example of that handy.)
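To try the join-on-aggregate idea without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. The column name dt and the ISO date strings are assumptions made for the demo (ISO text sorts chronologically); the query shape is the same as above.

```python
import sqlite3

# In-memory SQLite stand-in for the Oracle table; "dt" is an assumed column name
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(3, "2011-11-26"), (1, "2012-01-05"), (2, "2013-06-03"),
     (2, "2013-10-15"), (1, "2013-07-05")],
)

# Join every row to the most recent date of its id, then sort by that date first
grouped_rows = conn.execute("""
    SELECT t.id, t.dt
      FROM mytable t
      JOIN (SELECT s.id, MAX(s.dt) AS most_recent_date
              FROM mytable s
             WHERE s.dt IS NOT NULL
             GROUP BY s.id) r
        ON r.id = t.id
     ORDER BY r.most_recent_date, t.id, t.dt
""").fetchall()

for row in grouped_rows:
    print(row)
```

The rows for each id stay together, and the groups appear in order of their latest date.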

You can use the MAX ... KEEP function with your aggregate to create your sort key:
with
sample_data as
(select 3 id, to_date('11/26/11','MM/DD/RR') date_col from dual union all
select 1, to_date('1/5/12','MM/DD/RR') date_col from dual union all
select 2, to_date('6/3/13','MM/DD/RR') date_col from dual union all
select 2, to_date('10/15/13','MM/DD/RR') date_col from dual union all
select 1, to_date('7/5/13','MM/DD/RR') date_col from dual)
select
id,
date_col,
-- For illustration purposes, does not need to be selected:
max(date_col) keep (dense_rank last order by date_col) over (partition by id) sort_key
from sample_data
order by max(date_col) keep (dense_rank last order by date_col) over (partition by id);

Here is the query using analytic functions:
select
id
, date_
, max(date_) over (partition by id) as max_date
from table_name
order by max_date, id, date_
;
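The analytic version can be checked the same way. This is only a sketch: it runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than on Oracle, and the sample rows are the ones from the question rewritten as ISO date strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, date_ TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(3, "2011-11-26"), (1, "2012-01-05"), (2, "2013-06-03"),
     (2, "2013-10-15"), (1, "2013-07-05")],
)

# MAX() OVER attaches the group's latest date to every row without collapsing rows
window_rows = conn.execute("""
    SELECT id, date_, MAX(date_) OVER (PARTITION BY id) AS max_date
      FROM table_name
     ORDER BY max_date, id, date_
""").fetchall()

for row in window_rows:
    print(row)
```

Sorting by date_ last keeps the rows inside each group in chronological order.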

Related

Make a count and a max over partition by in one query in Oracle SQL

In Oracle SQL I have a table with:
userid qualification date
One Qual1 01/01/2020
One Qual2 01/01/2022
Two Qual1 01/01/2021
Three Qual2 01/01/2022
I want to have per user id:
the count of qualifications
the most recent qualification
So for this example I want:
userid qualification count
One Qual2 2
Two Qual1 1
Three Qual2 1
I thought to use something like this:
select userid,
count(qualification)OVER (PARTITION BY userid) as count_qual,
MAX(qualification) OVER (PARTITION BY userid ORDER BY date desc) as qual_id
from Qualificaitons
but it returns two rows for userid One.
You can use MAX(..) KEEP (DENSE_RANK ..) aggregation function:
SELECT userid,
MAX(qualification) KEEP (DENSE_RANK LAST ORDER BY "DATE") AS qualification,
COUNT(qualification) AS count
FROM qualifications
GROUP BY userid;
Which, for the sample data:
CREATE TABLE qualifications (userid, qualification, "DATE") AS
SELECT 'One', 'Qual1', DATE '2020-01-01' FROM DUAL UNION ALL
SELECT 'One', 'Qual2', DATE '2022-01-01' FROM DUAL UNION ALL
SELECT 'Two', 'Qual1', DATE '2021-01-01' FROM DUAL UNION ALL
SELECT 'Three', 'Qual2', DATE '2022-01-01' FROM DUAL;
Outputs:
USERID QUALIFICATION COUNT
One Qual2 2
Three Qual2 1
Two Qual1 1
You can use two functions to compute the result you want. For example:
select userid, qualification, cnt
from (
select t.*,
count(*) over(partition by userid) as cnt,
row_number() over(partition by userid order by date desc) as rn
from Qualificaitons t
) x
where rn = 1
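As a quick check of the two-function approach, here is a sketch that runs the same query shape on SQLite (3.25+) via Python's sqlite3 module. The column name qual_date is an assumption chosen to avoid the reserved word, and the dates are the sample data as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE qualifications (userid TEXT, qualification TEXT, qual_date TEXT)"
)
conn.executemany(
    "INSERT INTO qualifications VALUES (?, ?, ?)",
    [("One", "Qual1", "2020-01-01"), ("One", "Qual2", "2022-01-01"),
     ("Two", "Qual1", "2021-01-01"), ("Three", "Qual2", "2022-01-01")],
)

# cnt counts all rows of the user; rn = 1 marks the most recent qualification
latest = conn.execute("""
    SELECT userid, qualification, cnt
      FROM (SELECT q.*,
                   COUNT(*) OVER (PARTITION BY userid) AS cnt,
                   ROW_NUMBER() OVER (PARTITION BY userid
                                      ORDER BY qual_date DESC) AS rn
              FROM qualifications q) x
     WHERE rn = 1
     ORDER BY userid
""").fetchall()

for row in latest:
    print(row)
```

Filtering on rn = 1 is what reduces the result to one row per userid.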
Use FIRST_VALUE instead of MAX as the analytic function, and add DISTINCT so that each userid is returned only once:
select distinct userid, count(qualification)OVER (PARTITION BY userid) as count_qual,
first_value(qualification) OVER (PARTITION BY userid ORDER BY date desc) as qual_id
from Qualificaitons

BigQuery SQL: Sum of first N related items

I would like to know the sum of a value in the first n items in a related table. For example, I want to get the sum of a company's first 6 invoices (the invoices can be sorted by ID asc).
Current SQL:
SELECT invoices.company_id, SUM(invoices.amount)
FROM invoices
JOIN companies on invoices.company_id = companies.id
GROUP BY invoices.company_id
This seems simple but I can't wrap my head around it.
Consider also the approach below:
select company_id, (
select sum(amount)
from t.amounts amount
) as top_six_invoices_amount
from (
select invoices.company_id,
array_agg(invoices.amount order by invoices.invoice_id limit 6) amounts
from your_table invoices
group by invoices.company_id
) t
You can number the rows within each partition, ordered by invoice id, and then filter on that row number. Something like this:
with array_table as (
select 'a' field, * from unnest([3, 2, 1 ,4, 6, 3]) id
union all
select 'b' field, * from unnest([1, 2, 1, 7]) id
)
select field, sum(id) from (
select field, id, row_number() over (partition by a.field order by id desc) rownum
from array_table a
)
where rownum < 3
group by field
More examples of analytic functions:
https://medium.com/#aliz_ai/analytic-functions-in-google-bigquery-part-1-basics-745d97958fe2
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
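Applied to the invoices question, the row-numbering idea might look like the sketch below. It runs on SQLite (3.25+) through Python's sqlite3 module instead of BigQuery, and the table contents and the cutoff of 3 are made-up values for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, company_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 20), (3, 1, 30), (4, 1, 40),  # company 1: four invoices
     (5, 2, 5), (6, 2, 7)],                           # company 2: two invoices
)

top_n = 3  # hypothetical cutoff; the question uses 6

# Number each company's invoices by id, keep the first top_n, then sum them
sums = conn.execute("""
    SELECT company_id, SUM(amount)
      FROM (SELECT company_id, amount,
                   ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY id) AS rn
              FROM invoices)
     WHERE rn <= ?
     GROUP BY company_id
     ORDER BY company_id
""", (top_n,)).fetchall()

print(sums)
```

A company with fewer than top_n invoices simply contributes all of them.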

SQL group rows into pairs

I'm trying to add some sort of unique identifier (uid) to partitions made of pairs of rows, i.e. generate some uid/tag for each two rows of (identifier1,identifier2) in a window partition with size = 2 rows.
So, for example, the first 2 rows for ID X would get uid A, the next two rows for the same ID would get uid B and, if there is only one single row left in the partition for ID X, it would get id C.
Here's what I'm trying to accomplish, the picture illustrates the table's structure, I manually added the expectedIdentifier to illustrate the goal:
This is my current SQL, ntile doesn't solve it because the partition size varies:
select
rowId
, ntile(2) over (partition by firstIdentifier, secondIdentifier order by timestamp asc) as ntile
, *
from log;
Already tried ntile( (count(*) over partition...) / 2), but that doesn't work.
Generating the UID can be done with md5() or similar, but I'm having trouble tagging the rows as illustrated above (so I can md5 the generated tag/uid)
While count(*) is not supported within a Snowflake window function, count(1) is supported and can be used to create the unique identifier. Below is an example of an integer unique ID matching pairs of rows and handling "odd" row groups:
select
ntile(2) over (partition by firstIdentifier, secondIdentifier order by timestamp asc) as ntile
,ceil(count(1) over( partition by firstIdentifier, secondIdentifier order by timestamp asc) / 2) as id
, *
from log;
select *, char(65 + (row_number() over(partition by
firstidentifier,secondidentifier order by timestamp)-1)/2)
expectedidentifier from log
order by firstidentifier, timestamp
Here is the SQL Server version:
with log (firstidentifier,secondidentifier, timestamp)
as (
select 15396, 14460, 1 union all
select 15396, 14460, 1 union all
select 19744, 14451, 1 union all
select 19744, 14451, 1 union all
select 19744, 14451, 1 union all
select 15590, 12404, 1 union all
select 15590, 12404, 1 union all
select 15590, 12404, 1 union all
select 15590, 12404, 1 union all
select 15590, 12404, 1
)
select *, char(65 + (row_number() over(partition by
firstidentifier,secondidentifier order by timestamp)-1)/2)
expectedidentifier from log
order by firstidentifier,secondidentifier,timestamp
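The pairing trick is integer division of a zero-based row number by 2. A small sketch on SQLite (3.25+) via Python's sqlite3 module shows it; the column names first_id, second_id and ts, and the distinct timestamps, are assumptions made so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (first_id INTEGER, second_id INTEGER, ts INTEGER)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [(15396, 14460, 1), (15396, 14460, 2),
     (19744, 14451, 1), (19744, 14451, 2), (19744, 14451, 3)],
)

# (row_number - 1) / 2 is integer division: rows 1,2 -> 0, rows 3,4 -> 1, row 5 -> 2
# char(65 + n) turns 0, 1, 2 into 'A', 'B', 'C'
pairs = conn.execute("""
    SELECT first_id, ts,
           char(65 + (ROW_NUMBER() OVER (PARTITION BY first_id, second_id
                                         ORDER BY ts) - 1) / 2) AS pair_tag
      FROM log
     ORDER BY first_id, ts
""").fetchall()

for row in pairs:
    print(row)
```

The leftover third row of the second partition gets its own tag, which matches the "odd row" case in the question.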

Use MIN() where you cannot GROUP?

I feel pretty dumb, but I get stuck with an apparently very easy query. I have something like this, where every row is a user that watched a movie:
user_id date duration
1 01-01-01 62m
1 03-01-01 95m
2 02-01-01 58m
2 06-01-01 25m
2 08-01-01 95m
3 03-01-01 96m
Now, what I would like to have is a table where I have the first movie watched by each user and its duration. The problem is if I use MIN() then I have to GROUP both user_id and duration. But if I GROUP for duration as well, then I am basically going to have the same table back. How can I solve the problem?
You can use a ranking function like ROW_NUMBER:
WITH CTE AS
(
SELECT rn = ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date ASC),
user_id, date, duration
FROM dbo.TableName
)
SELECT user_id, date, duration FROM CTE WHERE rn = 1
The advantage of ROW_NUMBER is that you can change the logic easily. For example, if you want to reverse the logic and get the row of the last watched film per user, you just have to change ORDER BY date ASC to ORDER BY date DESC.
The advantage of the CTE (common table expression) is that you can also use it to delete or update these records; it is often used to identify or delete duplicates. You can first run it as a SELECT to see what will be deleted or updated before you execute the change.
Try this query. I haven't tested it.
SELECT date, duration FROM tablename n
WHERE NOT EXISTS(
SELECT date, user_id FROM tablename g
WHERE n.user_id = g.user_id AND g.date < n.date
);
Assuming there can only be a single record per user per date, it'd be something like this:
select t.*
from table t
inner join (
select user_id, min(date) mindate
from table
group by user_id
) t1
on t.user_id = t1.user_id
and t.date = t1.mindate
You can use ROW_NUMBER(), a ranking function that generates a sequential number for the rows within every group, ordered by the column you choose. If there is a tie, only one record per user is selected; if you want to select all of the tied records, use DENSE_RANK() rather than ROW_NUMBER():
SELECT user_id, date, duration
FROM
(
SELECT user_id, date, duration,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date) rn
FROM tableName
) a
WHERE rn = 1
This also assumes that the data type of the date column is DATE.
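The difference on ties is easy to see with a small experiment. This sketch uses SQLite (3.25+) via Python's sqlite3 module; the table name watched, the column watch_date, and the extra tied row for user 1 are assumptions added for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watched (user_id INTEGER, watch_date TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO watched VALUES (?, ?, ?)",
    [(1, "2001-01-01", 62), (1, "2001-01-01", 30),  # user 1: tie on the first date
     (1, "2001-01-03", 95), (2, "2001-01-02", 58)],
)

def first_rows(rank_fn):
    # rank_fn is one of the two fixed names below, never user input
    return conn.execute(f"""
        SELECT user_id, watch_date, duration
          FROM (SELECT user_id, watch_date, duration,
                       {rank_fn}() OVER (PARTITION BY user_id
                                         ORDER BY watch_date) AS rn
                  FROM watched)
         WHERE rn = 1
         ORDER BY user_id, duration
    """).fetchall()

with_row_number = first_rows("ROW_NUMBER")  # one row per user, tie broken arbitrarily
with_dense_rank = first_rows("DENSE_RANK")  # keeps every row tied for first place

print(with_row_number)
print(with_dense_rank)
```

Which of the tied rows ROW_NUMBER keeps is not defined unless the ORDER BY is extended with a tie-breaker column.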
If you are using SQL Server 2005 or later, you can use windowing functions.
SELECT *
FROM
(
SELECT user_id, date, duration, MIN(date) OVER(PARTITION BY user_id) AS MIN_DATE
FROM MY_TABLE
) AS RESULTS
WHERE date = MIN_DATE
The OVER clause with PARTITION BY "groups by" user_id and computes the minimum date per user_id without eliminating any rows. You then select the rows where the date equals that minimum date, which leaves the first date per user_id. This is a common trick once you know about windowing functions.
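Here is a sketch of that windowed MIN on the question's sample data, run on SQLite (3.25+) via Python's sqlite3 module rather than SQL Server. The names watched and watch_date are assumptions, and the dates are read as day-month-year and stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watched (user_id INTEGER, watch_date TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO watched VALUES (?, ?, ?)",
    [(1, "2001-01-01", 62), (1, "2001-01-03", 95),
     (2, "2001-01-02", 58), (2, "2001-01-06", 25), (2, "2001-01-08", 95),
     (3, "2001-01-03", 96)],
)

# Every row carries its user's earliest date; keep the rows that match it
first_watched = conn.execute("""
    SELECT user_id, watch_date, duration
      FROM (SELECT user_id, watch_date, duration,
                   MIN(watch_date) OVER (PARTITION BY user_id) AS min_date
              FROM watched) results
     WHERE watch_date = min_date
     ORDER BY user_id
""").fetchall()

for row in first_watched:
    print(row)
```

Because no GROUP BY is involved, duration comes along with the matching row for free.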
If you want the first watch_date per user, there should be no date before this date for this user:
SELECT *
FROM watched_movies wm
WHERE NOT EXISTS (
SELECT *
FROM watched_movies nx
WHERE nx.user_id = wm.user_id
AND nx.watch_date < wm.watch_date
);
Note: I replaced the date column by watch_date, since date is a reserved word (type name).
This should give you the duration of the first movie watched on the earliest date:
SELECT a.user_id, b.date, a.duration
FROM table a
INNER JOIN (SELECT user_id,min(date) date FROM table GROUP BY user_id) b ON a.user_id = b.user_id AND a.date = b.date
INNER JOIN (SELECT user_id,date,min(session_id) session_id FROM table GROUP BY user_id, date) c ON b.user_id = c.user_id AND b.date = c.date AND a.session_id = c.session_id
Try this:
WITH TABLE1
AS (SELECT
'1' AS USER_ID,
'01-01-01' AS DT,
62 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'1' AS USER_ID,
'03-01-01' AS DT,
95 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'02-01-01' AS DT,
58 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'06-01-01' AS DT,
25 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'2' AS USER_ID,
'08-01-01' AS DT,
95 AS DURATION
FROM
DUAL
UNION ALL
SELECT
'3' AS USER_ID,
'03-01-01' AS DT,
96 AS DURATION
FROM
DUAL)
SELECT
*
FROM
(SELECT
USER_ID,
DT,
DURATION,
RANK ( ) OVER (PARTITION BY USER_ID ORDER BY DT ASC) AS ROW_RANK
FROM
TABLE1)
WHERE
ROW_RANK = 1
Use a sub-query to get the min date then join that back to the table to get all other relevant columns.
SELECT T2.user_id
,T2.date
,T2.duration
FROM YourTable T2
INNER JOIN
(
SELECT T1.user_id
,MIN(T1.date) as first_date
FROM YourTable T1
GROUP BY T1.user_id
) SQ
ON T2.user_id = sq.user_id
AND T2.date = sq.first_date

How to reverse the table that comes from SQL query which already includes ORDER BY

Here is my query:
SELECT TOP 8 id, rssi1, date
FROM history
WHERE (siteName = 'CCL03412')
ORDER BY id DESC
This is the result:
How can I reverse this table based on date (Column2) by using SQL?
You can use the first query to get the matching ids, and use them as part of an IN clause:
SELECT id, rssi1, date
FROM history
WHERE id IN
(
SELECT TOP 8 id
FROM history
WHERE (siteName = 'CCL03412')
ORDER BY id DESC
)
ORDER BY date ASC
You could simply use a sub-query. If you apply a TOP clause the nested ORDER BY is allowed:
SELECT X.* FROM(
SELECT TOP 8 id, Column1, Column2
FROM dbo.History
WHERE (siteName = 'CCL03412')
ORDER BY id DESC) X
ORDER BY Column2
The SELECT query of a subquery is always enclosed in parentheses. It
cannot include a COMPUTE or FOR BROWSE clause, and may only include an
ORDER BY clause when a TOP clause is also specified.
Subquery Fundamentals
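The same pattern (take the newest rows in a subquery, then re-sort them outside) can be tried with the sketch below. It runs on SQLite via Python's sqlite3 module, where LIMIT plays the role of SQL Server's TOP; the column names reading_date and site_name and the twelve generated rows are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (id INTEGER PRIMARY KEY, rssi1 INTEGER, "
    "reading_date TEXT, site_name TEXT)"
)
# Twelve hypothetical readings, one per day, ids 1..12
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [(i, -60 - i, f"2020-01-{i:02d}", "CCL03412") for i in range(1, 13)],
)

# Inner query: the 8 newest rows (highest ids). Outer query: oldest of those first.
reordered = conn.execute("""
    SELECT x.id, x.rssi1, x.reading_date
      FROM (SELECT id, rssi1, reading_date
              FROM history
             WHERE site_name = 'CCL03412'
             ORDER BY id DESC
             LIMIT 8) x
     ORDER BY x.reading_date ASC
""").fetchall()

for row in reordered:
    print(row)
```

Only the outer ORDER BY determines the final row order; the inner one just decides which 8 rows are kept.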
Try the query below:
select * from (SELECT TOP 8 id, rssi1, date
FROM history
WHERE (siteName = 'CCL03412')
ORDER BY id DESC ) aa order by aa.date DESC
I didn't run it, but I think it should work.
WITH cte AS
(
SELECT id, rssi1, date, RANK() OVER (ORDER BY ID DESC) AS Rank
FROM history
WHERE (siteName = 'CCL03412')
)
SELECT id, rssi1, date
FROM cte
WHERE Rank <= 8
ORDER BY Date DESC
I have not run this, but I think it will work. Execute it and let me know if you get an error:
select id, rssi1, date from (SELECT TOP 8 id, rssi1, date
FROM history
WHERE (siteName = 'CCL03412')
ORDER BY id DESC) t order by date ;