SQL query to retrieve only one occurrence for each id - sql

This is my (simplified) table:
id eventName seriesKey eventStart
1 Event1 5000 2012-01-01 14:00:00
2 Event2 5001 2012-01-01 14:30:00
3 Event3 5000 2012-01-01 14:50:00
4 Event4 5002 2012-01-01 14:55:00
5 Event5 5001 2012-01-01 15:00:00
6 Event6 5002 2012-01-01 15:30:00
7 Event7 5002 2012-01-01 16:00:00
I have to build a query that orders the table by eventStart (ASC) but for each seriesKey, I need only one occurrence.
Thank you very much

Try aggregating with GROUP BY and using aggregate functions like MIN().
SELECT seriesKey,
MIN(eventStart) eventStart
FROM events
GROUP BY seriesKey;
This results in:
5000 2012-01-01 14:00:00.000
5001 2012-01-01 14:30:00.000
5002 2012-01-01 14:55:00.000
If you're interested in all columns from the events table, not just the two columns I chose above, there's a freaky implementation in some databases (e.g. SQL Server) which may help you:
SELECT *
FROM events e1
WHERE e1.ID IN
(
SELECT TOP 1 e2.ID
FROM events e2
WHERE e2.seriesKey = e1.seriesKey
ORDER BY e2.eventStart
);
Resulting in:
1 Event1 5000 2012-01-01 14:00:00.000
2 Event2 5001 2012-01-01 14:30:00.000
4 Event4 5002 2012-01-01 14:55:00.000

If you also need the other columns associated with the key, you have two options:
select *
from (
    select id,
           eventName,
           seriesKey,
           eventStart,
           row_number() over (partition by seriesKey order by eventStart) as rn
    from the_event_table
) t
where rn = 1
order by eventStart
or for older DBMS that do not support windowing functions:
select t1.id,
       t1.eventName,
       t1.seriesKey,
       t1.eventStart
from the_event_table t1
where t1.eventStart = (select min(t2.eventStart)
                       from the_event_table t2
                       where t2.seriesKey = t1.seriesKey)
order by eventStart
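To try the row_number() variant on the sample data from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the real DBMS (this needs SQLite 3.25 or later for window functions), and the table name events is an assumption.

```python
import sqlite3

# In-memory SQLite stands in for the real DBMS; the table name "events" is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, eventName TEXT, seriesKey INTEGER, eventStart TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "Event1", 5000, "2012-01-01 14:00:00"),
        (2, "Event2", 5001, "2012-01-01 14:30:00"),
        (3, "Event3", 5000, "2012-01-01 14:50:00"),
        (4, "Event4", 5002, "2012-01-01 14:55:00"),
        (5, "Event5", 5001, "2012-01-01 15:00:00"),
        (6, "Event6", 5002, "2012-01-01 15:30:00"),
        (7, "Event7", 5002, "2012-01-01 16:00:00"),
    ],
)

# Number the rows of each seriesKey by eventStart and keep only the earliest one.
first_per_series = conn.execute(
    """
    SELECT id, eventName, seriesKey, eventStart
    FROM (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY seriesKey ORDER BY eventStart) AS rn
        FROM events e
    ) t
    WHERE rn = 1
    ORDER BY eventStart
    """
).fetchall()

for row in first_per_series:
    print(row)
```

Each seriesKey appears once, with the full row of its earliest event, and the rows come back ordered by eventStart.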

You can get the earliest date for each seriesKey:
select * from
(
select seriesKey, min(eventStart) as mindate
from events -- table name assumed
group by seriesKey
) t
order by mindate

Related

How to order rows by the greatest date of each row, for a table with 8 date columns?

This is very different from doing an SQL order by 2 date columns (or for proper way to sort sql columns, which is only for 1 column). There, we would do something like:
ORDER BY CASE WHEN date_1 > date_2
THEN date_2 ELSE date_1 END
FYI, I'm using YYYY-MM-DD in this example for brevity, but I also need it to work for
TIMESTAMP (YYYY-MM-DD HH:MI:SS)
I have this table:
id name date_1     date_2     date_3     date_4     date_5     date_6     date_7     date_8
1  John 2008-08-11 2008-08-12 2009-08-11 2009-08-21 2009-09-11 2017-08-11 2017-09-12 2017-09-30
2  Bill 2008-09-12 2008-09-12 2008-10-12 2011-09-12 2008-09-13 2022-05-20 2022-05-21 2022-05-22
3  Andy 2008-10-13 2008-10-13 2008-10-14 2008-10-15 2008-11-01 2008-11-02 2008-11-03 2008-11-04
4  Hank 2008-11-14 2008-11-15 2008-11-16 2008-11-17 2008-12-31 2009-01-01 2009-01-02 2009-01-02
5  Alex 2008-12-15 2018-12-15 2018-12-15 2018-12-16 2018-12-17 2018-12-18 2018-12-25 2008-12-31
... But, the permutations of that give me a headache, just to think about them.
This Answer had more of a "general solution", but that was to SELECT, not to ORDER BY...
SELECT MAX(date_col)
FROM(
SELECT MAX(date_col1) AS date_col FROM some_table
UNION
SELECT MAX(date_col2) AS date_col FROM some_table
UNION
SELECT MAX(date_col3) AS date_col FROM some_table
...
)
Is there something more like that, such as could be created by iterating a loop in, say, PHP or Node.js? I need a scalable solution.
I only need to list each row once.
I want to order them each by whichever col has the most recent date of those I list on that row.
Something like:
SELECT * FROM some_table WHERE
(
GREATEST OF date_1
OR date_2
OR date_3
OR date_4
OR date_5
OR date_6
OR date_7
OR date_8
)
You can use the GREATEST function to achieve it.
SELECT GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8) max_date,t.*
FROM Tab t
ORDER BY GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8) Desc;
Result:
max_date   id name date_1     date_2     date_3     date_4     date_5     date_6     date_7     date_8
2022-05-22 2  Bill 2008-09-12 2008-09-12 2008-10-12 2011-09-12 2008-09-13 2022-05-20 2022-05-21 2022-05-22
2018-12-25 5  Alex 2008-12-15 2018-12-15 2018-12-15 2018-12-16 2018-12-17 2018-12-18 2018-12-25 2008-12-31
2017-09-30 1  John 2008-08-11 2008-08-12 2009-08-11 2009-08-21 2009-09-11 2017-08-11 2017-09-12 2017-09-30
2009-01-02 4  Hank 2008-11-14 2008-11-15 2008-11-16 2008-11-17 2008-12-31 2009-01-01 2009-01-02 2009-01-02
2008-11-04 3  Andy 2008-10-13 2008-10-13 2008-10-14 2008-10-15 2008-11-01 2008-11-02 2008-11-03 2008-11-04
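If any of the columns is NULL, GREATEST can throw off the order: in MySQL and Oracle it returns NULL as soon as one argument is NULL (PostgreSQL ignores NULL arguments instead).
Based on the accepted Answer to a Question about GREATEST handling NULL, this would apply to this table as follows. Note that when GREATEST returns NULL, COALESCE falls back to the first non-NULL column in the list, which is not necessarily the greatest one: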
SELECT COALESCE (
GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8),
date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8
) max_date,t.*
FROM TAB t
ORDER BY COALESCE (
GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8),
date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8
) DESC;
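Since the question asks for something that can be generated in a loop, here is a minimal sketch of a NULL-safe variant built from a list of column names. It is not the accepted Answer's method: it wraps every column in COALESCE with a sentinel date, so a NULL can never win or poison the comparison. The sketch runs on Python's sqlite3 module, where the multi-argument scalar MAX() stands in for GREATEST (like GREATEST in MySQL, it returns NULL if any argument is NULL). The sentinel value, the reduced set of three columns, and the rows containing NULLs are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (id INTEGER, name TEXT, date_1 TEXT, date_2 TEXT, date_3 TEXT)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "2008-08-11", None, "2017-09-30"),  # hypothetical NULL in date_2
        (2, "Bill", "2008-09-12", "2022-05-22", "2008-10-12"),
        (3, "Andy", "2008-10-13", "2008-10-14", None),  # hypothetical NULL in date_3
    ],
)

# Without protection, a single NULL makes the whole comparison NULL.
naive_john = conn.execute(
    "SELECT MAX(date_1, date_2, date_3) FROM tab WHERE id = 1"
).fetchone()[0]

# Build the NULL-safe expression from a list of column names (needs two or more
# columns, because a one-argument MAX() is the aggregate function in SQLite).
SENTINEL = "0001-01-01"  # assumed to be older than any real date in the table
columns = ["date_1", "date_2", "date_3"]
safe_args = ", ".join(f"COALESCE({col}, '{SENTINEL}')" for col in columns)
greatest_expr = f"NULLIF(MAX({safe_args}), '{SENTINEL}')"  # all-NULL rows stay NULL

rows = conn.execute(
    f"SELECT id, name, {greatest_expr} AS max_date FROM tab ORDER BY max_date DESC"
).fetchall()

print(naive_john)
for row in rows:
    print(row)
```

On a DBMS that has GREATEST, the same generated argument list should work with GREATEST in place of MAX. Column names are interpolated into the SQL text here, so they should come from a fixed list in the code, never from user input.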

SQL Select with grouping and replacing a column

I have a requirement to retrieve rows with a select query in which END_DATE is set to the EFFECTIVE_DATE of the next record with the same key (CARD_NBR in this case) minus 1 day.
I have tried using GROUP BY but I am not able to get the desired output. Could someone please guide me? The record with the most recent effective date should keep END_DATE as 9999-12-31.
Table:
CARD_NBR SERIEL_NO EFFECTIVE_DATE END_DATE
12345    1         2021-01-01     9999-12-31
12345    2         2021-01-25     9999-12-31
12345    3         2021-02-15     9999-12-31
67899    1         2021-03-01     9999-12-31
67899    2         2021-04-02     9999-12-31
67899    3         2021-05-24     9999-12-31
Output:
CARD_NBR SERIEL_NO EFFECTIVE_DATE END_DATE
12345    1         2021-01-01     2021-01-24
12345    2         2021-01-25     2021-02-14
12345    3         2021-02-15     9999-12-31
67899    1         2021-03-01     2021-04-01
67899    2         2021-04-02     2021-05-23
67899    3         2021-05-24     9999-12-31
You can use lead():
select t.*,
lead(effective_date - interval '1 day', 1, end_date) over (partition by card_nbr order by effective_date) as imputed_end_date
from t;
Date manipulations are highly database-dependent so this uses Standard SQL syntax. You can incorporate this into an update, but the best approach also depends on the database.
SQLite 3.25 and later support window functions, so you can use the code below to get your result (it uses shortened column names SRL_NO, START_DT and END_DT, and a table named T1).
SELECT A.CARD_NBR,
A.SRL_NO,
A.START_DT,
COALESCE(DATE(B.START_DT, '-1 day'), A.END_DT) AS END_DT
FROM
(
SELECT A.CARD_NBR,
A.SRL_NO,
A.START_DT,
A.END_DT,
ROW_NUMBER() OVER(PARTITION BY A.CARD_NBR ORDER BY A.SRL_NO ASC) RNUM1
FROM T1 A
)A
LEFT JOIN
(
SELECT B.CARD_NBR,
B.SRL_NO,
B.START_DT,
B.END_DT,
ROW_NUMBER() OVER(PARTITION BY B.CARD_NBR ORDER BY B.SRL_NO ASC) RNUM1
FROM T1 B
)B
ON A.CARD_NBR=B.CARD_NBR
AND A.RNUM1+1=B.RNUM1
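For comparison, the lead() idea can be checked against the sample data with a short sketch. It assumes SQLite 3.25 or later through Python's sqlite3 module, dates stored as ISO text, and a table named t with the column names from the question; DATE(x, '-1 day') is SQLite's way of subtracting a day and yields NULL when x is NULL, which lets COALESCE keep the original END_DATE on the last row of each card.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (card_nbr INTEGER, seriel_no INTEGER, effective_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (12345, 1, "2021-01-01", "9999-12-31"),
        (12345, 2, "2021-01-25", "9999-12-31"),
        (12345, 3, "2021-02-15", "9999-12-31"),
        (67899, 1, "2021-03-01", "9999-12-31"),
        (67899, 2, "2021-04-02", "9999-12-31"),
        (67899, 3, "2021-05-24", "9999-12-31"),
    ],
)

# LEAD() looks at the next effective_date within the same card; the last row of
# each card has no next row, so LEAD() is NULL and the stored end_date is kept.
rows = conn.execute(
    """
    SELECT card_nbr,
           seriel_no,
           effective_date,
           COALESCE(
               DATE(LEAD(effective_date) OVER (PARTITION BY card_nbr
                                               ORDER BY effective_date),
                    '-1 day'),
               end_date
           ) AS imputed_end_date
    FROM t
    ORDER BY card_nbr, effective_date
    """
).fetchall()

for row in rows:
    print(row)
```

This avoids the self-join on row numbers: one pass with a window function gives each row the day before its successor starts.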

Select to search column on group by query

I have a table called prices that references the products table through the product_id column. I want a query that selects prices grouped by product_id, taking the row with the max finish date, and that gets the start_date of that same price row by its id.
I tried the following query but I am getting a wrong value for the start date. It is weird because the subquery returns more than one row even though I use the price id in the where clause. That is why I put the limit on the query, but the result is wrong.
select prices.product_id, prices.id,
MAX(CASE WHEN prices.finish_date IS NULL THEN COALESCE(prices.finish_date,'9999-12-31') ELSE prices.finish_date END) as finish_date,
(select start_date from prices where prices.id = prices.id limit 1)
as start_date from prices group by prices.product_id, prices.id
How can I get the start date that belongs to the price id in my grouped row? I am using PostgreSQL.
An example of what I want from my query:
DataSet:
ID | PRODUCT_ID | START_DATE | FINISH_DATE
1 1689 2018-01-19 02:00:00 2019-11-19 23:59:59
2 1689 2019-10-11 03:00:00 2019-10-15 23:59:59
3 1689 2019-01-11 03:00:00 2019-05-15 23:59:59
4 1690 2019-11-11 03:00:00 2019-12-15 23:59:59
5 1690 2019-05-11 03:00:00 2025-12-15 23:59:59
6 1691 2019-05-11 03:00:00 null
I want this result:
ID | PRODUCT_ID | START_DATE | FINISH_DATE
1 1689 2018-01-19 02:00:00 2019-11-19 23:59:59
5 1690 2019-05-11 03:00:00 2025-12-15 23:59:59
6 1691 2019-05-11 03:00:00 9999-12-31 23:59:59
The start date should be the same value of the row before the group by.
I would recommend DISTINCT ON in Postgres:
select distinct on (p.product_id) p.*
from prices p
order by p.product_id,
p.finish_date desc nulls first;
NULL values are treated as larger than any other value, so a descending sort puts them first. However, I've included nulls first just to be explicit.
DISTINCT ON is a very handy Postgres extension, which you can learn more about in the documentation.
Try this:
with data as (
SELECT product_id,
max(COALESCE(finish_date,'9999-12-31')) as finish_date from prices group by 1)
select p.id, d.product_id, p.start_date, d.finish_date from data d join prices p on p.product_id = d.product_id and COALESCE(p.finish_date,'9999-12-31') = d.finish_date;
It surely isn't the most elegant solution, but it should work.
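DISTINCT ON is Postgres-only, so as a rough cross-check here is a portable sketch of the same "latest finish_date per product" idea using row_number(). It runs on Python's sqlite3 module (SQLite 3.25 or later) rather than Postgres, stores the timestamps as text, and treats a NULL finish_date as the far-future value from the wanted result; those choices are assumptions for the sake of a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER, product_id INTEGER, start_date TEXT, finish_date TEXT)"
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        (1, 1689, "2018-01-19 02:00:00", "2019-11-19 23:59:59"),
        (2, 1689, "2019-10-11 03:00:00", "2019-10-15 23:59:59"),
        (3, 1689, "2019-01-11 03:00:00", "2019-05-15 23:59:59"),
        (4, 1690, "2019-11-11 03:00:00", "2019-12-15 23:59:59"),
        (5, 1690, "2019-05-11 03:00:00", "2025-12-15 23:59:59"),
        (6, 1691, "2019-05-11 03:00:00", None),
    ],
)

# Rank the prices of each product by finish date, newest first, treating an
# open-ended (NULL) finish date as the far future, then keep the top row.
latest_prices = conn.execute(
    """
    SELECT id, product_id, start_date, finish_date
    FROM (
        SELECT p.id,
               p.product_id,
               p.start_date,
               COALESCE(p.finish_date, '9999-12-31 23:59:59') AS finish_date,
               ROW_NUMBER() OVER (
                   PARTITION BY p.product_id
                   ORDER BY COALESCE(p.finish_date, '9999-12-31 23:59:59') DESC
               ) AS rn
        FROM prices p
    ) ranked
    WHERE rn = 1
    ORDER BY product_id
    """
).fetchall()

for row in latest_prices:
    print(row)
```

Because the whole row is kept, id and start_date come from the same price row as the chosen finish_date, which is what the grouped query in the question could not guarantee.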

Can anyone give a hint with the following SQL

Imagine I have a table like this:
Bikename Username NumOfUsages LastTimeOfUsage
Bike1 Haldi 5 2018-12-13 12:00:00
Bike1 Torte 1 2018-08-15 12:00:00
Bike2 Haldi 3 2018-12-15 12:00:00
Bike3 Manne 2 2018-09-16 12:00:00
Bike3 Torte 5 2018-09-16 12:00:00
Now I want a result like this:
Bikename Username NumOfUsages LastTimeOfUsage
Bike1 Haldi 5 2018-12-13 12:00:00
Bike2 Haldi 3 2018-12-15 12:00:00
Bike3 Torte 5 2018-09-16 12:00:00
As you can see, I want only the entry with the max NumOfUsages for each bike.
Thanks so much for your help...
You want a correlated subquery:
select b.*
from Bikes b
where b.NumOfUsages = (select max(b1.NumOfUsages) from Bikes b1 where b1.bikename = b.bikename);
This returns the entire row with the maximum NumOfUsages for each bikename, because the subquery is correlated with the outer query (i.e. b1.bikename = b.bikename).
You can use row_number(), which most DBMSs support:
select * from
(
select *, row_number() over(partition by bikename order by NumOfUsages desc) rn
from table_name
)t where t.rn=1
You didn't specify the SQL flavor in your question, but in most cases you can simply group by bikename and take max(NumOfUsages) (and order by that as well). Note that this alone returns only the bikename and the maximum, not the matching Username.
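The correlated subquery and row_number() answers agree on the sample data but differ when two users tie for the maximum on the same bike. A small sketch to illustrate, using Python's sqlite3 module (SQLite 3.25 or later) as a stand-in DBMS; the table name Bikes follows the first answer, and the tied row added halfway through is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Bikes (Bikename TEXT, Username TEXT, NumOfUsages INTEGER, LastTimeOfUsage TEXT)"
)
conn.executemany(
    "INSERT INTO Bikes VALUES (?, ?, ?, ?)",
    [
        ("Bike1", "Haldi", 5, "2018-12-13 12:00:00"),
        ("Bike1", "Torte", 1, "2018-08-15 12:00:00"),
        ("Bike2", "Haldi", 3, "2018-12-15 12:00:00"),
        ("Bike3", "Manne", 2, "2018-09-16 12:00:00"),
        ("Bike3", "Torte", 5, "2018-09-16 12:00:00"),
    ],
)

CORRELATED = """
    SELECT b.Bikename, b.Username, b.NumOfUsages, b.LastTimeOfUsage
    FROM Bikes b
    WHERE b.NumOfUsages = (SELECT MAX(b1.NumOfUsages)
                           FROM Bikes b1
                           WHERE b1.Bikename = b.Bikename)
    ORDER BY b.Bikename, b.Username
"""

ROW_NUMBER = """
    SELECT Bikename, Username, NumOfUsages, LastTimeOfUsage
    FROM (
        SELECT b.*,
               ROW_NUMBER() OVER (PARTITION BY Bikename ORDER BY NumOfUsages DESC) AS rn
        FROM Bikes b
    ) t
    WHERE rn = 1
    ORDER BY Bikename
"""

# On the sample data there are no ties, so one row per bike comes back.
top_rows = conn.execute(CORRELATED).fetchall()

# Hypothetical extra row that ties with Haldi on Bike2.
conn.execute("INSERT INTO Bikes VALUES ('Bike2', 'Manne', 3, '2018-10-01 12:00:00')")
with_tie_correlated = conn.execute(CORRELATED).fetchall()  # keeps both tied rows
with_tie_row_number = conn.execute(ROW_NUMBER).fetchall()  # keeps one row per bike

print(len(top_rows), len(with_tie_correlated), len(with_tie_row_number))
```

Which behaviour is right depends on the requirement: the correlated subquery reports every user who shares the maximum, while row_number() picks one of them arbitrarily unless the ORDER BY gets a tie-breaker such as LastTimeOfUsage.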

Select min/max from group defined by one column as subgroup of another - SQL, HPVertica

I'm trying to find the min and max date within a subgroup of another group. Here's example 'data'
ID Type Date
1 A 7/1/2015
1 B 1/1/2015
1 A 8/5/2014
22 B 3/1/2015
22 B 9/1/2014
333 A 8/1/2015
333 B 4/1/2015
333 B 3/29/2014
333 B 2/28/2013
333 C 1/1/2013
What I'd like to identify is - within an ID, what is the min/max Date for each block of similar Type? So for ID # 333 I want the below info:
A: min & max = 8/1/2015
B: min = 2/28/2013
max = 4/1/2015
C: min & max = 1/1/2013
I'm having trouble figuring out how to identify only uninterrupted groupings of Type within a grouping of ID. For ID #1, I need to keep the two 'A' Types with separate min/max dates because they were split by a Type 'B', so I can't just pull the min date of all Type A's for ID #1, it has to be two separate instances.
What I've tried is something like the below two lines, but neither of these accurately captures the case mentioned above for ID #1 where Type B interrupts Type A.
Max(Date) OVER (Partition By ID, Type)
or this:
Row_Number() OVER (Partition By ID, Type ORDER BY Date DESC)
,then selecting Row #1 for max date, and date ASC w/ row #1 for min date
Thank you for any insight you can provide!
If I understand right, you want the min/max values for an id/type grouped using a descending date sort, but the catch is that you want them based on clusters within the id by time.
What you can do is use CONDITIONAL_CHANGE_EVENT to tag the rows on change of type, then use that in your GROUP BY on a standard min/max aggregation.
This would be the intermediate step towards getting to what you want:
select ID, Type, Date,
CONDITIONAL_CHANGE_EVENT(Type) OVER( PARTITION BY ID ORDER BY Date desc) cce
from mytable
group by ID, Type, Date
order by ID, Date desc, Type
ID Type Date cce
1 A 2015-07-01 00:00:00 0
1 B 2015-01-01 00:00:00 1
1 A 2014-08-05 00:00:00 2
22 B 2015-03-01 00:00:00 0
22 B 2014-09-01 00:00:00 0
333 A 2015-08-01 00:00:00 0
333 B 2015-04-01 00:00:00 1
333 B 2014-03-29 00:00:00 1
333 B 2013-02-28 00:00:00 1
333 C 2013-01-01 00:00:00 2
Once you have them grouped using CCE, you can aggregate on this to get the min/max you are looking for, grouping on cce. You can play with the order by at the bottom; this ordering seems to make the most sense to me.
select id, type, min(date), max(date)
from (
select ID, Type, Date,
CONDITIONAL_CHANGE_EVENT(Type) OVER( PARTITION BY ID ORDER BY Date desc) cce
from mytable
group by ID, Type, Date
) x
group by id, type, cce
order by id, 3 desc, 4 desc;
id type min max
1 A 2015-07-01 00:00:00 2015-07-01 00:00:00
1 B 2015-01-01 00:00:00 2015-01-01 00:00:00
1 A 2014-08-05 00:00:00 2014-08-05 00:00:00
22 B 2014-09-01 00:00:00 2015-03-01 00:00:00
333 A 2015-08-01 00:00:00 2015-08-01 00:00:00
333 B 2013-02-28 00:00:00 2015-04-01 00:00:00
333 C 2013-01-01 00:00:00 2013-01-01 00:00:00
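CONDITIONAL_CHANGE_EVENT is specific to Vertica. On other databases, one common way to get the same kind of block number is to flag each change of Type with LAG() and then take a running SUM() of the flags. The sketch below is only an approximation of that idea, run on Python's sqlite3 module (SQLite 3.25 or later); the column name event_date, the ISO date strings, and the alias island are assumptions, and the block numbers start at 1 rather than 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, type TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "A", "2015-07-01"),
        (1, "B", "2015-01-01"),
        (1, "A", "2014-08-05"),
        (22, "B", "2015-03-01"),
        (22, "B", "2014-09-01"),
        (333, "A", "2015-08-01"),
        (333, "B", "2015-04-01"),
        (333, "B", "2014-03-29"),
        (333, "B", "2013-02-28"),
        (333, "C", "2013-01-01"),
    ],
)

rows = conn.execute(
    """
    WITH flagged AS (
        -- 1 when the type differs from the previous row of the same id
        -- (the first row of each id also gets 1, since LAG() is NULL there)
        SELECT id, type, event_date,
               CASE WHEN type = LAG(type) OVER (PARTITION BY id ORDER BY event_date DESC)
                    THEN 0 ELSE 1 END AS changed
        FROM mytable
    ),
    grouped AS (
        -- the running total of the change flags numbers each uninterrupted block
        SELECT id, type, event_date,
               SUM(changed) OVER (PARTITION BY id ORDER BY event_date DESC
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS island
        FROM flagged
    )
    SELECT id, type, MIN(event_date) AS min_date, MAX(event_date) AS max_date
    FROM grouped
    GROUP BY id, type, island
    ORDER BY id, max_date DESC
    """
).fetchall()

for row in rows:
    print(row)
```

For ID 1 the two 'A' rows stay in separate blocks because the 'B' row between them starts a new block, which is the case the plain PARTITION BY ID, Type attempts could not capture.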