Alternative SQL Server query for better performance?

The query I am using:
select SUM(marks)
from Table1
where name = ?
and Date = (select top 1 Date
from Table1
where name =?
and Date < ?
order by Date desc)
Table1:
id name marks Date
1 abc 34 01/01/2021
2 abc 15 05/01/2021
3 abc 20 05/01/2021
4 def 34 05/01/2021
5 abc 12 10/01/2021
select sum(marks)
from Table1
where name ='abc'
and Date = (select top 1 Date
from Table1
where name = 'abc'
and Date < '10/01/2021'
order by Date desc)
Result: 35

Using RANK() would take comparatively less time:
select sum(marks)
from
(
select *, rank()OVER(order by date desc) as rnk
from table1
where name ='abc' and Date < '10/01/2021'
) as we
where rnk=1
Result: 35
Explanation:
Your query uses a subquery in the WHERE clause, so Table1 is filtered for name 'abc' twice. My version filters once, in a derived table in the FROM clause, which saved noticeable time in my test.
See the demo here for elapsed times (I added some dummy data to measure them).
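As a rough way to check that the two queries agree, here is a minimal sketch that runs both against the sample rows. It assumes SQLite (3.25+ for window functions) through Python's sqlite3 module rather than SQL Server, so TOP 1 becomes LIMIT 1, and it assumes the dates are stored as ISO strings so that string comparison matches date order. It says nothing about timing on SQL Server.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (assumption).
ROWS = [
    (1, "abc", 34, "2021-01-01"),
    (2, "abc", 15, "2021-01-05"),
    (3, "abc", 20, "2021-01-05"),
    (4, "def", 34, "2021-01-05"),
    (5, "abc", 12, "2021-01-10"),
]

# Original shape: the name filter appears in both the outer query and the subquery.
SUBQUERY_SQL = """
SELECT SUM(marks) FROM Table1
WHERE name = ?
  AND Date = (SELECT Date FROM Table1
              WHERE name = ? AND Date < ?
              ORDER BY Date DESC LIMIT 1)
"""

# RANK() shape: filter once, then keep every row tied for the latest date.
RANK_SQL = """
SELECT SUM(marks) FROM (
    SELECT marks, RANK() OVER (ORDER BY Date DESC) AS rnk
    FROM Table1
    WHERE name = ? AND Date < ?
) AS ranked
WHERE rnk = 1
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table1 (id INTEGER, name TEXT, marks INTEGER, Date TEXT)"
    )
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", ROWS)
    return conn

def sum_via_subquery(conn, name, before):
    return conn.execute(SUBQUERY_SQL, (name, name, before)).fetchone()[0]

def sum_via_rank(conn, name, before):
    return conn.execute(RANK_SQL, (name, before)).fetchone()[0]

conn = make_db()
print(sum_via_subquery(conn, "abc", "2021-01-10"))  # 35
print(sum_via_rank(conn, "abc", "2021-01-10"))      # 35
```

RANK() is used instead of ROW_NUMBER() on purpose: both rows dated 05/01/2021 tie for rank 1, so both are summed.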

Related

Multiple select of max values in PostgreSQL for specific ID

I have a table like this:
ID cbk due_16_30 due_31_60
1 2018-06-19 5 200
2 2018-06-19 100 -5
1 2018-06-19 -2 2
2 2018-06-18 20 Null
2 2018-06-18 50 22
1 2018-06-18 30 150
3 2018-06-18 20 70
I want to select for each specific ID a max due_16_30 and a max due_31_60 from the latest date, where date is between some start date and end date. How can I do that in PostgreSQL?
P.S. This is how the 2nd part is solved: https://stackoverflow.com/a/51493567/8495108
One method uses distinct on:
select distinct on (id) id, max(due_16_30), max(due_31_60)
from t
where cbk >= ? and cbk < ?
group by id, cbk
order by id, cbk desc;
You can do :
select t.id, t.cbk, max(t.due_16_30), max(t.due_31_60)
from table t
where cbk = (select t1.cbk
from table t1
where t1.cbk >= start_dt and t1.cbk <= end_dt
order by t1.cbk desc
limit 1
)
group by t.id, cbk
order by t.id desc
limit 1;
You need to apply the myID parameter in both the subquery and the outer query:
select ID,
cbk,
max(due_16_30) as max_due_16_30,
max(due_31_60) as max_due_31_60
from tab
where cbk in
(
select max(cbk)
from tab
where cbk between start_date and end_date
and ID = myID
)
and ID = myID
group by ID, cbk;
Try id = 3 in the demo, since there is no row with id = 3 on 2018-06-19, the latest date in the whole table:
DB-Fiddle Demo
Solved like this (simplified):
SELECT v.id, max(v.due_31_60), max(v.due_61_90), v.cbk
FROM my_table as v
JOIN (select id, max(cbk) as max_date from my_table
WHERE (cbk >= start_date and cbk <= end_date )
GROUP BY id) as q
ON q.id = v.id and v.cbk = q.max_date
GROUP BY v.id, v.cbk
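To make the join approach concrete, here is a minimal sketch run against the sample table from the question. It assumes SQLite through Python's sqlite3 module instead of PostgreSQL, uses the due_16_30 and due_31_60 columns shown in the question, and the function name latest_max_per_id is only for illustration.

```python
import sqlite3

# Sample rows from the question: (id, cbk, due_16_30, due_31_60).
ROWS = [
    (1, "2018-06-19", 5, 200),
    (2, "2018-06-19", 100, -5),
    (1, "2018-06-19", -2, 2),
    (2, "2018-06-18", 20, None),
    (2, "2018-06-18", 50, 22),
    (1, "2018-06-18", 30, 150),
    (3, "2018-06-18", 20, 70),
]

# The derived table q finds each id's latest date inside the range;
# the outer query aggregates only the rows on that date.
SQL = """
SELECT v.id, v.cbk, MAX(v.due_16_30), MAX(v.due_31_60)
FROM my_table AS v
JOIN (SELECT id, MAX(cbk) AS max_date
      FROM my_table
      WHERE cbk >= ? AND cbk <= ?
      GROUP BY id) AS q
  ON q.id = v.id AND v.cbk = q.max_date
GROUP BY v.id, v.cbk
ORDER BY v.id
"""

def latest_max_per_id(start_date, end_date):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table "
        "(id INTEGER, cbk TEXT, due_16_30 INTEGER, due_31_60 INTEGER)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(SQL, (start_date, end_date)).fetchall()
    conn.close()
    return rows

for row in latest_max_per_id("2018-06-18", "2018-06-19"):
    print(row)
```

Note that id = 3 is reported for 2018-06-18, its own latest date, even though the latest date in the whole table is 2018-06-19.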

selecting set of second lowest values

I have two columns of interest ID and Deadline:
ID Deadline (DD/MM/YYYY)
1 01/01/2017
1 05/01/2017
1 04/01/2017
2 02/01/2017
2 03/01/2017
2 06/02/2017
2 08/03/2017
Each ID can have multiple (n) deadlines. I need to select all rows where the Deadline is second lowest for each individual ID.
Desired output:
ID Deadline (DD/MM/YYYY)
1 04/01/2017
2 03/01/2017
Selecting minimum can be done by:
select min(deadline) from XXX group by ID
but I am lost with "middle" values. I am using Rpostgresql, but any idea helps as well.
Thanks for your help
One way is to use ROW_NUMBER() window function
SELECT id, deadline
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY deadline) rn
FROM xxx
) q
WHERE rn = 2 -- get only second lowest ones
or with LATERAL
SELECT t.*
FROM (
SELECT DISTINCT id FROM xxx
) i JOIN LATERAL (
SELECT *
FROM xxx
WHERE id = i.id
ORDER BY deadline
OFFSET 1 LIMIT 1
) t ON (TRUE)
Output:
id | deadline
----+------------
1 | 2017-04-01
2 | 2017-03-01
Here is a dbfiddle demo
Using ROW_NUMBER() after taking distinct records will eliminate the chance of getting the lowest date instead of second lowest if there are duplicate records.
select ID,Deadline
from (
select ID,
Deadline,
ROW_NUMBER() over(partition by ID order by Deadline) RowNum
from (select distinct ID, Deadline from SourceTable) T
) Tbl
where RowNum = 2
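The effect of duplicates can be seen in a small sketch. It assumes SQLite 3.25+ through Python's sqlite3 module, ISO-formatted deadlines, and one duplicated row (ID 1, 01/01/2017) that is not in the original data and is added only to illustrate the point.

```python
import sqlite3

# Rows from the question as ISO dates, plus one duplicate added for illustration.
ROWS = [
    (1, "2017-01-01"), (1, "2017-01-01"),  # the second copy is the added duplicate
    (1, "2017-01-05"), (1, "2017-01-04"),
    (2, "2017-01-02"), (2, "2017-01-03"),
    (2, "2017-02-06"), (2, "2017-03-08"),
]

# ROW_NUMBER() directly over the table: duplicates occupy separate positions.
PLAIN_SQL = """
SELECT id, deadline FROM (
    SELECT id, deadline,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY deadline) AS rn
    FROM xxx
) AS q
WHERE rn = 2
ORDER BY id
"""

# ROW_NUMBER() over distinct (id, deadline) pairs: duplicates collapse first.
DISTINCT_SQL = """
SELECT id, deadline FROM (
    SELECT id, deadline,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY deadline) AS rn
    FROM (SELECT DISTINCT id, deadline FROM xxx) AS t
) AS q
WHERE rn = 2
ORDER BY id
"""

def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE xxx (id INTEGER, deadline TEXT)")
    conn.executemany("INSERT INTO xxx VALUES (?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

print(run(PLAIN_SQL))     # [(1, '2017-01-01'), (2, '2017-01-03')]
print(run(DISTINCT_SQL))  # [(1, '2017-01-04'), (2, '2017-01-03')]
```

Without DISTINCT, the duplicated lowest deadline for ID 1 takes row number 2, so the lowest date is returned instead of the second lowest.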

Group BY Having COUNT, but Order on a column not contained in group

I have a table where I need to get the ID, for a group(based on ID and Name) with a COUNT(*) = 3, for the latest set of timestamps.
So for example below, I want to retrieve ID 2. As it has 3 rows, and the latest timestamps (even though ID 3 has latest timestamps overall, it doesn't have a count of 3).
But I don't understand how to order by Date, as I cannot contain it in the Group By clause, as it is not the same:
SELECT TOP 1 ID
FROM TABLE
GROUP BY ID,Name
HAVING COUNT(ID) > 2
AND Name = 'ABC'
--ORDER BY Date DESC
Sample Data
ID Name Date
1 ABC 2015-05-27 08:00
1 ABC 2015-05-27 09:00
1 ABC 2015-05-27 10:00
2 ABC 2015-05-27 11:00
2 ABC 2015-05-27 12:00
2 ABC 2015-05-27 13:00
3 ABC 2015-05-27 14:00
3 ABC 2015-05-27 15:00
In SQL Server, you need to aggregate the columns that are not in the GROUP BY list:
SELECT TOP 1 ID
FROM TABLE
WHERE Name = 'ABC'
GROUP BY ID,Name
HAVING COUNT(ID) > 2
ORDER BY MAX(Date) DESC
The name filter should be put before the group by for better performance, if you really need it.
You could do it in a nested query.
Subquery:
SELECT ID
from TABLE
GROUP BY ID
HAVING Count(ID) > 2
That gives you the IDs you want. Put that in another query:
SELECT ID, Date
FROM Table
Where ID in (Subquery)
Order by Date DESC;
First get all desired IDs, that is, all IDs having a count > 2. Get the maximum date for each such ID. Then rank these records with ROW_NUMBER, giving the latest ID rank #1. Finally, keep only the IDs ranked #1.
select name, id
from
(
select
name, id, row_number() over (partition by name order by max_date desc) as rn
from
(
select name, id, max(date) as max_date
from mytable
--where name = 'ABC'
group by name, id
having count(*) > 2
) wanted_ids
) ranked_ids
where rn = 1;
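The three steps (filter groups by count, take each group's latest date, rank by that date) can be tried on the sample data with a minimal sketch. It assumes SQLite 3.25+ through Python's sqlite3 module rather than SQL Server, and the helper name latest_full_group is hypothetical.

```python
import sqlite3

# Sample data from the question: (id, name, date).
ROWS = [
    (1, "ABC", "2015-05-27 08:00"),
    (1, "ABC", "2015-05-27 09:00"),
    (1, "ABC", "2015-05-27 10:00"),
    (2, "ABC", "2015-05-27 11:00"),
    (2, "ABC", "2015-05-27 12:00"),
    (2, "ABC", "2015-05-27 13:00"),
    (3, "ABC", "2015-05-27 14:00"),
    (3, "ABC", "2015-05-27 15:00"),
]

SQL = """
SELECT name, id
FROM (
    SELECT name, id,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY max_date DESC) AS rn
    FROM (
        -- groups with more than 2 rows, each with its latest date
        SELECT name, id, MAX(date) AS max_date
        FROM mytable
        GROUP BY name, id
        HAVING COUNT(*) > 2
    ) AS wanted_ids
) AS ranked_ids
WHERE rn = 1
"""

def latest_full_group():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, date TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(SQL).fetchall()
    conn.close()
    return rows

print(latest_full_group())  # [('ABC', 2)]
```

ID 3 has the latest timestamps overall but only two rows, so HAVING removes it before the ranking step and ID 2 wins.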

SQL get the closest row for a certain date in other table

Having a table TAB_A:
ID | Date | Value
--------------------------------
101 | 2014-03-01 | 101000001
101 | 2014-03-03 | 101000003
101 | 2014-03-06 | 101000006
102 | 2014-03-01 | 102000001
103 | 2014-03-01 | 103000001
And, for example, this single record in another table TAB_B:
ID | Date | TAB_A.Id
-----------------------------------
40002 | 2014-03-05 | 101
I need to get the closest (most recent) TAB_A.Value to TAB_B.Date field (which in this case would be '101000003' and NOT '101000006').
I've been searching for other responses with similar scenarios (like this one), but this is not exactly what I need.
Any suggestions? Thanks in advance for your help.
EDIT: I forgot to mention that TAB_A has over 200K records and TAB_B has over 55M records.
Seeing as the tag is sql-server, Limit won't work. Instead, use top
SELECT TOP 1 ID, Date, Value
FROM TAB_A
WHERE Date < (SELECT Date from TAB_B where ID=40002)
ORDER BY Date DESC
or
SELECT ID, Date, Value
FROM tab_a
WHERE date=
(SELECT MAX(date)
FROM TAB_A
WHERE Date <
(SELECT Date
FROM TAB_B
WHERE ID=40002)
)
If you want just 1 result in the last query, use DISTINCT. For example, if the date you were looking for is 2014-03-01, the 2nd query would show you 3 examples, with DISTINCT just 1. In the first query, TOP 1 already ensures that you have just 1 result.
EDIT: updated for the comment below:
SELECT b.id, b.date, a.value FROM
(SELECT TOP 1 ID, Date, Value
FROM TAB_A
WHERE Date < (SELECT Date from TAB_B B where ID=40002)
ORDER BY Date DESC) a
,
(SELECT id,date,[TAB_A.id] FROM tab_b )b
WHERE a.id=b.[TAB_A.id]
Excuse my capital letters/small letters inconsistency...
SELECT ID, Date, Value
FROM TAB_A
WHERE Date < (SELECT Date from TAB_B where ID=40002)
ORDER BY Date DESC LIMIT 1
First you select the date you need from TAB_B in a subquery. Then you select all rows from TAB_A whose dates are earlier than it (you can change < to <= if you need to). Then you order by date descending and keep only the first row (the latest one). I think that you could also use MAX (but I am not sure).
Try this:
SELECT TOP 1 * FROM (
SELECT A.ID,A.Value, MIN(DATEDIFF(day,A.Date,B.Date)) as MinDiff
FROM TAB_A A, TAB_B B
GROUP BY A.ID,A.Value ) as T
WHERE MinDiff>0
ORDER BY MinDiff
Result:
ID VALUE MINDIFF
101 101000003 2
See result in SQL Fiddle.
Explanation:
Inner query will select ID,Value and minimum date difference. With the outer query, we can select the record having minimum date difference which is greater than 0.
You should have an index on date in tab_a for this to perform well (requires sqlserver 2008+):
declare @tab_a table(id int, Date date, value int)
insert @tab_a values (101,'2014-03-01',101000001),
(101,'2014-03-03',101000003),(101,'2014-03-06',101000006),
(102,'2014-03-01',102000001),(103,'2014-03-01',103000001)
declare @tab_b table(id int, Date date, tab_a_id int)
insert @tab_b
values
( 40002, '2014-03-05', 101 ), ( 40002, '2014-03-02', 101 )
select b.ID, b.Date bdate, x.Date adate, x.value
from @tab_b b
outer apply
(select top 1 value, date
from @tab_a a
where a.date <= b.date
and a.id = b.TAB_A_Id
order by a.date desc
) x
Result:
ID bdate adate value
40002 2014-03-05 2014-03-03 101000003
40002 2014-03-02 2014-03-01 101000001
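For readers without a SQL Server instance, the same per-row lookup can be sketched with correlated subqueries. This assumes SQLite through Python's sqlite3 module, which has no OUTER APPLY, so each output column gets its own LIMIT 1 subquery. The index name ix_tab_a_id_date is hypothetical, and nothing here has been measured at the 55M-row scale mentioned in the question.

```python
import sqlite3

# Same sample data as the table variables above.
TAB_A = [
    (101, "2014-03-01", 101000001),
    (101, "2014-03-03", 101000003),
    (101, "2014-03-06", 101000006),
    (102, "2014-03-01", 102000001),
    (103, "2014-03-01", 103000001),
]
TAB_B = [
    (40002, "2014-03-05", 101),
    (40002, "2014-03-02", 101),
]

# For each tab_b row, pick the latest tab_a row on or before its date.
SQL = """
SELECT b.id,
       b.date AS bdate,
       (SELECT a.date FROM tab_a AS a
         WHERE a.id = b.tab_a_id AND a.date <= b.date
         ORDER BY a.date DESC LIMIT 1) AS adate,
       (SELECT a.value FROM tab_a AS a
         WHERE a.id = b.tab_a_id AND a.date <= b.date
         ORDER BY a.date DESC LIMIT 1) AS value
FROM tab_b AS b
ORDER BY b.date DESC
"""

def closest_values():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab_a (id INTEGER, date TEXT, value INTEGER)")
    conn.execute("CREATE TABLE tab_b (id INTEGER, date TEXT, tab_a_id INTEGER)")
    # Composite index so each lookup can seek on (id, date) instead of scanning.
    conn.execute("CREATE INDEX ix_tab_a_id_date ON tab_a (id, date)")
    conn.executemany("INSERT INTO tab_a VALUES (?, ?, ?)", TAB_A)
    conn.executemany("INSERT INTO tab_b VALUES (?, ?, ?)", TAB_B)
    rows = conn.execute(SQL).fetchall()
    conn.close()
    return rows

for row in closest_values():
    print(row)
```

The rows come back in the same order as the result shown above: the 2014-03-05 record maps to 101000003 and the 2014-03-02 record to 101000001.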
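The following query returns a single record that matches the condition. Note that ROWNUM is Oracle syntax (it is not available in SQL Server), and Oracle applies the ROWNUM filter before ORDER BY, so the ordering does not choose which row is returned.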
SELECT Value
FROM TAB_A
WHERE DATE < (SELECT Date from TAB_B WHERE ID= '40002') and ROWNUM <= 1
ORDER BY DATE;

SQL select columns group by

If I have a table which is of the following format:
ID NAME NUM TIMESTAMP BOOL
1 A 5 09:50 TRUE
1 B 6 13:01 TRUE
1 A 1 10:18 FALSE
2 A 3 12:20 FALSE
1 A 1 05:30 TRUE
1 A 12 06:00 TRUE
How can I get the ID, NAME and NUM for each unique ID, NAME pair with the latest Timestamp and BOOL=TRUE.
So for the above table the output should be:
ID NAME NUM
1 A 5
1 B 6
I tried using Group By but I cannot seem to get around that either I need to put an aggregator function around num (max, min will not work when applied to this example) or specifying it in group by (which will end up matching on ID, NAME, and NUM combined). Both as far as I can think will break in some case.
PS: I am using SQL Developer (that is the SQL developed by Oracle I think, sorry I am a newbie at this)
If you're using at least SQL-Server 2005 you can use the ROW_NUMBER function:
WITH CTE AS
(
SELECT ID, NAME, NUM,
RN = ROW_NUMBER()OVER(PARTITION BY ID, NAME ORDER BY TIMESTAMP DESC)
FROM Table
WHERE BOOL='TRUE'
)
SELECT ID, NAME, NUM FROM CTE
WHERE RN = 1
Result:
ID NAME NUM
1 A 5
1 B 6
Here's the fiddle: http://sqlfiddle.com/#!3/a1dc9/10/0
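The same CTE can be checked locally with a minimal sketch. It assumes SQLite 3.25+ through Python's sqlite3 module instead of SQL Server. The table name readings and the column names ts and flag are stand-ins for the question's table, TIMESTAMP and BOOL, and the timestamps are kept as HH:MM strings, which sort correctly as text.

```python
import sqlite3

# Sample rows from the question: (id, name, num, ts, flag).
ROWS = [
    (1, "A", 5, "09:50", "TRUE"),
    (1, "B", 6, "13:01", "TRUE"),
    (1, "A", 1, "10:18", "FALSE"),
    (2, "A", 3, "12:20", "FALSE"),
    (1, "A", 1, "05:30", "TRUE"),
    (1, "A", 12, "06:00", "TRUE"),
]

# Number the TRUE rows within each (id, name) pair, newest first, and keep #1.
SQL = """
WITH cte AS (
    SELECT id, name, num,
           ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY ts DESC) AS rn
    FROM readings
    WHERE flag = 'TRUE'
)
SELECT id, name, num
FROM cte
WHERE rn = 1
ORDER BY id, name
"""

def latest_true_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings "
        "(id INTEGER, name TEXT, num INTEGER, ts TEXT, flag TEXT)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(SQL).fetchall()
    conn.close()
    return rows

print(latest_true_rows())  # [(1, 'A', 5), (1, 'B', 6)]
```

The 10:18 row for (1, A) is later than 09:50 but has BOOL = FALSE, so the WHERE clause removes it before the rows are numbered.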
select t1.* from table as t1 inner join
(
select ID, NAME, max(TIMESTAMP) as TIMESTAMP from table
where BOOL='TRUE'
group by ID, NAME
) as t2
on t1.id=t2.id and t1.name=t2.name and t1.timestamp=t2.timestamp
where t1.BOOL='TRUE'
select t1.*
from TABLE1 as t1
left join
TABLE1 as t2
on t1.id=t2.id and t1.name=t2.name and t2.BOOL='TRUE' and t1.TIMESTAMP<t2.TIMESTAMP
where t1.BOOL='TRUE' and t2.id is null
should do it for you.