Select only one row by more conditions - sql

The query at the end of this question is supported by PostgreSQL, but H2 cannot run it because of OVER (PARTITION BY ...). How can I select only the row with the latest created time for each distinct combination of values in two columns (ecid and psid)?
Example:
id name created ecid psid
1 aa 2019-02-07 1 1
2 bb 2019-02-01 1 1
3 cc 2019-02-05 2 2
4 dd 2019-02-06 2 3
5 ee 2019-02-08 2 3
Result:
id name created ecid psid
1 aa 2019-02-07 1 1
3 cc 2019-02-05 2 2
5 ee 2019-02-08 2 3
SELECT s.*, MAX(s.created) OVER (PARTITION BY s.ecid, s.psid) AS latest FROM ...
WHERE latest = created

Use a correlated subquery:
select t1.* from tablename t1
where t1.created = ( select max(t2.created)
from tablename t2 where t1.ecid=t2.ecid and t1.psid=t2.psid)

Use NOT EXISTS:
select t.* from tablename t
where not exists (
select 1 from tablename
where ecid = t.ecid and psid = t.psid and created > t.created
)
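A third window-free form, not taken from the answers above, joins the table against a grouped subquery. The following is a minimal sketch: the table name events is an assumption, and the rows are the sample data from the question. It relies only on GROUP BY and a join, so it should not depend on OVER (PARTITION BY ...) support.

```sql
-- Hypothetical table name; columns and rows follow the question's example.
CREATE TABLE events (
    id      INTEGER PRIMARY KEY,
    name    VARCHAR(10),
    created DATE,
    ecid    INTEGER,
    psid    INTEGER
);

INSERT INTO events (id, name, created, ecid, psid) VALUES
    (1, 'aa', '2019-02-07', 1, 1),
    (2, 'bb', '2019-02-01', 1, 1),
    (3, 'cc', '2019-02-05', 2, 2),
    (4, 'dd', '2019-02-06', 2, 3),
    (5, 'ee', '2019-02-08', 2, 3);

-- Find the latest created value per (ecid, psid),
-- then join back to fetch the full row.
SELECT e.*
FROM events e
JOIN (
    SELECT ecid, psid, MAX(created) AS latest
    FROM events
    GROUP BY ecid, psid
) m
  ON m.ecid = e.ecid
 AND m.psid = e.psid
 AND m.latest = e.created
ORDER BY e.id;
-- With the sample rows this returns ids 1, 3 and 5.
```

Like the two answers above, this returns every tied row when several rows in a group share the same latest created value.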

Related

How to delete records with lower version in big query?

Let's say my table contains the following data:
id name version
1 Rahul 1
1 Rahul 2
2 John 1
3 Mike 1
2 John 2
4 Rubel 1
5 David 1
1 Rahul 3
I need to filter out the duplicate records that have a lower version. How can this be done?
The output should essentially be:
id name version
1 Rahul 3
2 John 2
3 Mike 1
4 Rubel 1
5 David 1
For this dataset, aggregation seems sufficient:
select id, name, max(version) as max_version
from mytable
group by id, name
You can use not exists as follows:
select id, name, version
from your_table t
where not exists
(select 1 from your_table tt
where tt.id = t.id and tt.version > t.version)
Or you can use the analytic function row_number as follows:
select id, name, version from
(select t.*,
row_number() over (partition by id order by version desc) as rn
from your_table t) t
where rn = 1
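The answers above select the surviving rows, while the question title asks about deleting. As a hedged sketch, the same exists condition can be inverted into a DELETE. The script below uses the question's sample rows and generic SQL that runs on engines such as PostgreSQL or SQLite. It has not been checked against BigQuery, whose DML may require different aliasing or qualification, so treat the exact statement form as an assumption there.

```sql
-- Table name mytable is taken from the aggregation answer above.
CREATE TABLE mytable (
    id      INTEGER,
    name    VARCHAR(20),
    version INTEGER
);

INSERT INTO mytable (id, name, version) VALUES
    (1, 'Rahul', 1),
    (1, 'Rahul', 2),
    (2, 'John',  1),
    (3, 'Mike',  1),
    (2, 'John',  2),
    (4, 'Rubel', 1),
    (5, 'David', 1),
    (1, 'Rahul', 3);

-- Remove every row for which a higher version of the same id exists.
DELETE FROM mytable
WHERE EXISTS (
    SELECT 1
    FROM mytable tt
    WHERE tt.id = mytable.id
      AND tt.version > mytable.version
);

-- Remaining rows: (1, Rahul, 3), (2, John, 2), (3, Mike, 1),
-- (4, Rubel, 1), (5, David, 1).
SELECT id, name, version
FROM mytable
ORDER BY id;
```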

SQL group by multiple fields get first occurrence

I have this table (sales_lines):
id sale_id sale_seq_id other_fields
----------------------------------------
1 1 1
2 1 2
3 2 1
4 3 1
5 3 2
But this table can have a duplicated sale_seq_id (yes, it's an error). Like this:
id sale_id sale_seq_id other_fields
----------------------------------------
1 1 1
2 1 2
3 1 2
4 2 1
5 3 1
6 3 1
7 3 2
Lines 3 and 6 are errors, so I should discard them.
How can I do it?
To delete the wrong records do
delete from sales_lines
where id not in
(
select min(id)
from sales_lines
group by sale_id, sale_seq_id
)
To just select the correct data, do
select min(id), sale_id, sale_seq_id
from sales_lines
group by sale_id, sale_seq_id
I would use a correlated subquery:
select sl.*
from sales_lines sl
where sl.id = (select min(sl1.id)
from sales_lines sl1
where sl1.sale_id = sl.sale_id and
sl1.sale_seq_id = sl.sale_seq_id
);
If your DBMS supports window functions, you can do:
select sl.*
from (select sl.*,
row_number() over (partition by sl.sale_id, sl.sale_seq_id order by sl.id) as seq
from sales_lines sl
) sl
where seq = 1;
This way, you get the full row, including the other fields.
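Before running the delete, it may help to list the rows it would remove. The sketch below rebuilds the question's second table (the other_fields column type is an assumption) and reuses the not in condition from the delete as a plain select.

```sql
-- Sample data from the question, including the duplicated sale_seq_id values.
CREATE TABLE sales_lines (
    id           INTEGER PRIMARY KEY,
    sale_id      INTEGER,
    sale_seq_id  INTEGER,
    other_fields VARCHAR(50)  -- assumed type; left NULL here
);

INSERT INTO sales_lines (id, sale_id, sale_seq_id) VALUES
    (1, 1, 1),
    (2, 1, 2),
    (3, 1, 2),
    (4, 2, 1),
    (5, 3, 1),
    (6, 3, 1),
    (7, 3, 2);

-- Rows that are NOT the first occurrence of their (sale_id, sale_seq_id) pair,
-- i.e. exactly the rows the delete statement would discard.
SELECT sl.*
FROM sales_lines sl
WHERE sl.id NOT IN (
    SELECT MIN(id)
    FROM sales_lines
    GROUP BY sale_id, sale_seq_id
)
ORDER BY sl.id;
-- With the sample rows this returns ids 3 and 6.
```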

Get most frequent value from a windowing function

I have a SQL table that looks like:
user_id role date
1 1 2019-11-26 21:20:54.397+00
1 2 2019-11-27 22:46:28.923+00
2 1 2019-12-06 22:17:53.925+00
2 3 2019-12-13 00:12:28.006+00
3 1 2019-11-25 21:57:17.701+00
3 1 2019-12-06 20:48:28.314+00
3 1 2019-12-15 23:59:06.81+00
4 3 2019-12-04 15:26:10.639+00
4 3 2019-11-22 19:20:01.025+00
4 3 2019-11-25 12:38:53.169+00
I would like to get, for each row, the user's most frequent role based on earlier dates only. The result should look like:
user_id role date most_frequent_role
1 1 2019-11-26 21:20:54.397+00 NULL
1 2 2019-11-27 22:46:28.923+00 1
2 1 2019-12-06 22:17:53.925+00 NULL
2 3 2019-12-13 00:12:28.006+00 1
3 1 2019-11-25 21:57:17.701+00 NULL
3 1 2019-12-06 20:48:28.314+00 1
3 1 2019-12-15 23:59:06.81+00 1
4 3 2019-12-04 15:26:10.639+00 NULL
4 3 2019-11-22 19:20:01.025+00 3
4 3 2019-11-25 12:38:53.169+00 3
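The following query reproduces the expected output for the sample data. Note that it picks MIN(role) per user rather than computing a true running mode, so it is not a general solution.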
select test.user_id,test.role,test.role_date,
case when test.role_date in
(select min(role_date) from test group by user_id) then NULL
else t.role end as MOST_FREQUENT_ROLE
from
(select user_id,min(role) as role from test group by user_id
)t
join test on t.user_id=test.user_id
order by user_id,role_date
Output
USER_ID ROLE ROLE_DATE MOST_FREQUENT_ROLE
1 1 26-NOV-19 -
1 2 27-NOV-19 1
2 1 06-DEC-19 -
2 3 13-DEC-19 1
3 1 25-NOV-19 -
3 1 06-DEC-19 1
3 1 15-DEC-19 1
4 3 22-NOV-19 -
4 3 25-NOV-19 3
4 3 04-DEC-19 3
If you strictly want to use window functions, try the following:
SELECT user_id
,role
,date
,CASE WHEN date = MIN(date) OVER(PARTITION BY user_id ORDER BY date)
THEN NULL
ELSE MIN(role) OVER(PARTITION BY user_id) END MOST_FREQUENT_ROLE
FROM YOUR_TABLE;
Technically, what you are trying to calculate is the mode (this is a statistical term).
Postgres has a built-in mode() function. Alas, it does not work as you need as a window function, so it provides little help.
I would recommend using a lateral join:
select t.*, m.role
from t left join lateral
(select t2.role
from t t2
where t2.user_id = t.user_id and
t2.date < t.date
group by t2.role
order by count(*) desc,
max(date) desc -- in the event of ties, use the most recent
limit 1
) m
on 1=1
order by user_id, date;
Here is a db<>fiddle. Note that I added some rows to give an example of where the running mode changes.
This will not be particularly efficient but an index on (user_id, date, role) should help.
If you have just a handful of roles there are probably more efficient solutions. If that is the case and performance is an issue, ask a new question.
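For engines without lateral joins, the same idea can be written as a correlated scalar subquery. This is a sketch under assumptions, not the answer's own query: the table name user_roles and the column name role_date are hypothetical, and the time zone suffix is dropped from the sample timestamps for portability.

```sql
CREATE TABLE user_roles (
    user_id   INTEGER,
    role      INTEGER,
    role_date TIMESTAMP
);

INSERT INTO user_roles (user_id, role, role_date) VALUES
    (1, 1, '2019-11-26 21:20:54.397'),
    (1, 2, '2019-11-27 22:46:28.923'),
    (2, 1, '2019-12-06 22:17:53.925'),
    (2, 3, '2019-12-13 00:12:28.006'),
    (3, 1, '2019-11-25 21:57:17.701'),
    (3, 1, '2019-12-06 20:48:28.314'),
    (3, 1, '2019-12-15 23:59:06.810'),
    (4, 3, '2019-12-04 15:26:10.639'),
    (4, 3, '2019-11-22 19:20:01.025'),
    (4, 3, '2019-11-25 12:38:53.169');

-- For each row, look only at strictly earlier rows of the same user,
-- count them per role, and keep the most frequent role.
-- Ties are broken in favour of the most recently seen role.
SELECT r.user_id,
       r.role,
       r.role_date,
       (SELECT r2.role
        FROM user_roles r2
        WHERE r2.user_id = r.user_id
          AND r2.role_date < r.role_date
        GROUP BY r2.role
        ORDER BY COUNT(*) DESC, MAX(r2.role_date) DESC
        LIMIT 1) AS most_frequent_role
FROM user_roles r
ORDER BY r.user_id, r.role_date;
-- The earliest row of each user has no history, so it gets NULL.
```

As with the lateral join, the subquery runs once per row, so an index on (user_id, role_date, role) is likely to matter on larger tables.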

SQL: SELECT value for all rows based on a value in one of the rows and a condition

I have a list of total store visits for a customer for a month. The customer has a home store but can visit other stores. Like the table below:
MemberId | HomeStoreId | VisitedStoreId | Month | Visits
1 5 5 1 5
1 5 3 1 2
1 5 2 1 1
1 5 4 1 7
I want my select statement to give the number of visits to the home store against each store for that member for that month. Like the below:
MemberId | HomeStoreId | VisitedStoreId | Month | Visits | HomeStoreVisits
1 5 5 1 5 5
1 5 3 1 2 5
1 5 2 1 1 5
1 5 4 1 7 5
I've looked at a SUM with CASE statements inside and OVER with PARTITION but I can't seem to work it out.
Thanks
I would use window functions:
select t.*,
sum(case when homestoreid = visitedstoreid then visits end) over
(partition by memberid, month) as homestorevisits
from t;
Alternatively, use a LEFT JOIN against the home-store rows:
SELECT t.MemberId, t.HomeStoreId, t.VisitedStoreId, t.Month, t.Visits,
h.Visits AS HomeStoreVisits
FROM t LEFT OUTER JOIN
(SELECT MemberId, Month, Visits
FROM t WHERE HomeStoreId = VisitedStoreId
) h ON h.MemberId = t.MemberId AND h.Month = t.Month
You can achieve this using a simple subquery.
SELECT MemberId, HomeStoreID, VisitedStoreID, Month, Visits,
(SELECT Visits FROM table t2
WHERE t2.MemberId = t1.MemberId
AND t2.HomeStoreId = t1.HomeStoreId
AND t2.Month = t1.Month
AND t2.VisitedStoreId = t2.HomeStoreId) AS HomeStoreVisits
FROM table t1
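To try the window-function answer end to end, here is a small sketch using the question's rows. The table name member_visits is an assumption; the column names follow the question.

```sql
CREATE TABLE member_visits (
    MemberId       INTEGER,
    HomeStoreId    INTEGER,
    VisitedStoreId INTEGER,
    Month          INTEGER,
    Visits         INTEGER
);

INSERT INTO member_visits (MemberId, HomeStoreId, VisitedStoreId, Month, Visits) VALUES
    (1, 5, 5, 1, 5),
    (1, 5, 3, 1, 2),
    (1, 5, 2, 1, 1),
    (1, 5, 4, 1, 7);

-- The CASE keeps only the home-store row's visits (NULL elsewhere);
-- SUM ... OVER then spreads that value to every row of the same
-- member and month without collapsing the rows.
SELECT v.*,
       SUM(CASE WHEN v.HomeStoreId = v.VisitedStoreId THEN v.Visits END)
           OVER (PARTITION BY v.MemberId, v.Month) AS HomeStoreVisits
FROM member_visits v;
-- Every one of the four sample rows gets HomeStoreVisits = 5.
```

If a member has no home-store row in a given month, the sum is NULL for that partition; wrapping it in COALESCE(..., 0) is one way to show zero instead.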

How to get minimum date by group id per client id

I am trying to select the group ID and the minimum date per client ID.
This is a sample data set:
ClientID GroupID DataDate
1 9 2016-05-01
2 8 2015-04-01
3 7 2016-07-05
1 6 2015-01-05
1 5 2014-11-12
2 4 2016-11-02
1 3 2013-02-14
2 2 2011-04-01
I wrote
SELECT
clientID, MIN(DataDate)
FROM sampleTable
GROUP BY clientID
But in this query, I do not have GroupID selected. I need to include GroupID to join another table.
If I do:
Select
ClientID, GroupID, MIN(DataDate)
FROM sampleTable
GROUP BY ClientID, GroupID
This won't really return the minimum date per client.
Could you help me? How should I do this?
You can use ROW_NUMBER instead:
SELECT
ClientID, GroupID, DataDate
FROM (
SELECT *,
rn = ROW_NUMBER() OVER(PARTITION BY ClientID ORDER BY DataDate)
FROM sampleTable
) t
WHERE rn = 1
If you want to include ties, use RANK instead of ROW_NUMBER.
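The difference between the two functions only shows up when a client has two rows with the same minimum date. The sketch below uses the question's sample data plus one extra row, (3, 10, '2016-07-05'), which is hypothetical and added only to create a tie. It uses the standard AS alias form rather than the rn = ... form above, which is specific to SQL Server.

```sql
CREATE TABLE sampleTable (
    ClientID INTEGER,
    GroupID  INTEGER,
    DataDate DATE
);

INSERT INTO sampleTable (ClientID, GroupID, DataDate) VALUES
    (1, 9,  '2016-05-01'),
    (2, 8,  '2015-04-01'),
    (3, 7,  '2016-07-05'),
    (1, 6,  '2015-01-05'),
    (1, 5,  '2014-11-12'),
    (2, 4,  '2016-11-02'),
    (1, 3,  '2013-02-14'),
    (2, 2,  '2011-04-01'),
    (3, 10, '2016-07-05');  -- hypothetical row, ties with group 7

-- RANK gives every row sharing the earliest date the value 1,
-- so all tied rows pass the filter.
SELECT ClientID, GroupID, DataDate
FROM (
    SELECT s.*,
           RANK() OVER (PARTITION BY ClientID ORDER BY DataDate) AS rnk
    FROM sampleTable s
) t
WHERE rnk = 1
ORDER BY ClientID, GroupID;
-- Returns groups 3 (client 1), 2 (client 2), and both 7 and 10 (client 3).
```

Replacing RANK with ROW_NUMBER in the same query would return only one of the two rows for client 3, and which one is not defined unless the ORDER BY adds a tie-breaker such as GroupID.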
I hope I understood your question correctly:
you want to display the minimum date for each client ID.
If my table has data like this:
CID GID D1
1 9 03-06-2016
1 6 01-06-2017
1 5 01-06-2015
1 3 01-06-2014
2 4 01-06-2017
2 8 01-06-2014
3 5 03-06-2016
2 4 01-06-2011
Output :
CID GID D1
1 3 01-06-2014
2 4 01-06-2011
3 5 03-06-2016
This is what I think you can go with:
select cx.cid,cx.gid, cx.d1 from cli cx where cx.d1=(select min(c1.d1) from cli c1 where c1.cid=cx.cid)
group by cx.cid,cx.gid,cx.d1
order by cx.gid
Hope it helps.