How to optimise a query that uses multiple sub select statements

I was hoping someone could help me with this:
My table is:
id Version datetime name resource
---|--------|------------|--------|---------
1 | 1 | 03/03/2009 | con1 | 399
2 | 2 | 03/03/2009 | con1 | 244
3 | 3 | 01/03/2009 | con1 | 555
4 | 1 | 03/03/2009 | con2 | 200
5 | 2 | 03/03/2009 | con2 | 500
6 | 3 | 04/03/2009 | con2 | 600
7 | 4 | 31/03/2009 | con2 | 700
For each distinct "name", I need to select the row with the greatest "datetime" that is less than or equal to a given date; if multiple rows satisfy that condition, I need the one with the maximum version.
The result if the given date were '04/03/2009' would be:
id Version datetime name resource
---|--------|------------|--------|---------
2 | 2 | 03/03/2009 | con1 | 244
6 | 3 | 04/03/2009 | con2 | 600
Currently I've created the following query, which works, but I suspect it's not the best when it comes to performance when run on a large table:
SELECT [id], [Version], [datetime], [name], [resource]
FROM theTable
WHERE [Version] =
(
SELECT MAX(Version) FROM theTable AS theTable2 WHERE theTable.[name] = theTable2.[name]
AND theTable2.[datetime] =
(
SELECT MAX(theTable3.[datetime]) FROM theTable AS theTable3
WHERE theTable2.[name] = theTable3.[name] AND theTable3.[datetime] <= '04/03/2009'
)
)
I'd appreciate it if someone could suggest a more efficient way to do this and, if possible, provide an example :-).
Thanks in advance.

You can use ROW_NUMBER() with PARTITION BY. This lets you rank the rows within each name, and you then select only the row ranked 1. First, filter out the rows dated after the given date (using WHERE), then, within each partition, order by datetime and version descending. The first row is then the one with the maximum datetime and, in case of a datetime tie, the maximum version as well.
SELECT [id], [Version], [datetime], [name], [resource]
FROM
(
SELECT [id], [Version], [datetime], [name], [resource], row_number()
OVER (PARTITION BY [name] ORDER BY [datetime] DESC, [Version] DESC) as groupIndex
FROM theTable
WHERE [datetime] <= '04/03/2009'
) AS t
WHERE groupIndex = 1
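To check the ranking approach against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made purely so the example is runnable; it needs SQLite 3.25 or later for window functions). The dates are stored as ISO strings so that string comparison matches date order, which is also an assumption of this sketch.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE theTable ([id] INTEGER, [Version] INTEGER, "
    "[datetime] TEXT, [name] TEXT, [resource] INTEGER)"
)
rows = [
    (1, 1, "2009-03-03", "con1", 399),
    (2, 2, "2009-03-03", "con1", 244),
    (3, 3, "2009-03-01", "con1", 555),
    (4, 1, "2009-03-03", "con2", 200),
    (5, 2, "2009-03-03", "con2", 500),
    (6, 3, "2009-03-04", "con2", 600),
    (7, 4, "2009-03-31", "con2", 700),
]
conn.executemany("INSERT INTO theTable VALUES (?, ?, ?, ?, ?)", rows)

# Rank rows per name: latest datetime first, highest version breaks ties.
query = """
SELECT [id], [Version], [datetime], [name], [resource]
FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY [name] ORDER BY [datetime] DESC, [Version] DESC
    ) AS groupIndex
    FROM theTable
    WHERE [datetime] <= ?
) AS t
WHERE groupIndex = 1
ORDER BY [name]
"""
latest = conn.execute(query, ("2009-03-04",)).fetchall()
print(latest)
```

With the given date of 4 March 2009 this returns the rows with id 2 and id 6, matching the expected result in the question. Passing the date as a parameter in an unambiguous format also sidesteps the day/month ambiguity of a literal such as '04/03/2009'.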

Related

SQL use of OVER and PARTITION BY

I have the following table;
ClientID | Location | Episode | Date
001 | Area1 | 4 | 01Dec16
001 | Area2 | 3 | 01Nov16
001 | Area2 | 2 | 01Oct16
001 | Area1 | 1 | 01Sep16
002 | Area2 | 3 | 21Dec16
002 | Area1 | 2 | 21Nov16
002 | Area1 | 1 | 21Oct16
And I'm looking to create 2 new columns based on the latest episode of the client
ClientID | Location | Episode | Date | LatestEpisode | LatestLocation
001 | Area1 | 4 | Dec | 4 | Area1
001 | Area2 | 3 | Nov | 4 | Area1
001 | Area2 | 2 | Oct | 4 | Area1
001 | Area1 | 1 | Sep | 4 | Area1
002 | Area2 | 3 | Dec | 3 | Area2
002 | Area1 | 2 | Nov | 3 | Area2
002 | Area1 | 1 | Oct | 3 | Area2
I have worked out I can use OVER to work out the LatestEpisode:
LatestEpisode = MAX(Episode) OVER(PARTITION BY ClientID)
But can't work out how to get the LatestLocation?
EDIT: Sorry if I haven't got the format right, this is my first post. I was trying to look at how to post correctly but I found it quite confusing
I have searched stackoverflow many times over the last 3 days and have found various ways using OVER and ROW NUMBER() but I don't have a lot of experience of them. Many of the examples I had found previously were fine for producing an aggregated table but I want to keep the full table, this is why I thought using OVER was the way to go.

SQL Server 2012 introduced the FIRST_VALUE() function,
which enables you to write your select query like this:
SELECT ClientID,
Location,
Episode,
[Date],
LatestEpisode = FIRST_VALUE(Episode) OVER(PARTITION BY ClientID ORDER BY [Date] DESC),
LatestLocation = FIRST_VALUE(Location) OVER(PARTITION BY ClientID ORDER BY [Date] DESC)
FROM tableName
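As a quick check of the FIRST_VALUE() idea on the sample rows, here is a small sketch run through Python's sqlite3 module instead of SQL Server (an assumption for runnability; SQLite 3.25+ supports the same window function). The table name episodes, the column name EpisodeDate and the ISO date strings are choices made for this sketch, not part of the original post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episodes (ClientID TEXT, Location TEXT, "
    "Episode INTEGER, EpisodeDate TEXT)"
)
conn.executemany(
    "INSERT INTO episodes VALUES (?, ?, ?, ?)",
    [
        ("001", "Area1", 4, "2016-12-01"),
        ("001", "Area2", 3, "2016-11-01"),
        ("001", "Area2", 2, "2016-10-01"),
        ("001", "Area1", 1, "2016-09-01"),
        ("002", "Area2", 3, "2016-12-21"),
        ("002", "Area1", 2, "2016-11-21"),
        ("002", "Area1", 1, "2016-10-21"),
    ],
)

# FIRST_VALUE over a descending date order picks each client's latest row,
# and every row of that client carries the value, so no rows are collapsed.
query = """
SELECT ClientID, Location, Episode,
       FIRST_VALUE(Episode)  OVER (PARTITION BY ClientID ORDER BY EpisodeDate DESC) AS LatestEpisode,
       FIRST_VALUE(Location) OVER (PARTITION BY ClientID ORDER BY EpisodeDate DESC) AS LatestLocation
FROM episodes
ORDER BY ClientID, Episode DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Every row for client 001 ends with (4, 'Area1') and every row for client 002 ends with (3, 'Area2'), which is the table the question asks for.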

In SQL Server, I would do this with cross apply:
select e.*, elast.episode as LatestEpisode, elast.location as LatestLocation
from episodes e cross apply
(select top 1 e2.*
from episodes e2
where e2.clientId = e.clientId
order by e2.episode desc
) elast;
Although you can express this logic with window functions, the lateral join (implemented in SQL Server using the apply keyword) is a more natural way of expressing the logic.
If you are not familiar with lateral joins, you can think of them as correlated subqueries in the from clause, but subqueries that can return multiple columns and rows. I should add, though, that one of the main use cases is for table-valued functions, so it is a very powerful construct.

First, select the LatestEpisode for each client; you can then use this value to identify the row from which to take LatestLocation:
SELECT *
,(
SELECT Location
FROM Episodes
WHERE ClientId = MyTable.ClientId
AND Episode = MyTable.LatestEpisode
) AS LatestLocation
FROM (
SELECT *
,MAX(Episode) OVER (PARTITION BY ClientId) AS LatestEpisode
FROM Episodes
) AS MyTable
You can also use common table expression (CTE):
WITH cte
AS (
SELECT *
,MAX(Episode) OVER (PARTITION BY ClientId) AS LatestEpisode
FROM Episodes
)
SELECT cte.*
,(
SELECT Location
FROM Episodes
WHERE ClientId = cte.ClientId
AND Episode = cte.LatestEpisode
) AS LatestLocation
FROM cte

I have worked on it and was able to produce the required result.
Please try the query below:
Declare @Table table ( ClientID varchar(max), Location varchar(500), Episode int, Dated varchar(30))
Insert Into @Table
Values ('001', 'Area1', 4 ,'01Dec16' )
,('001', 'Area2', 3, '01Nov16')
, ('001', 'Area2', 2, '01Oct16')
,('001' ,'Area1' ,1, '01Sep16')
,('002' ,'Area2' ,3, '21Dec16')
,('002' ,'Area1' ,2, '21Nov16')
,('002' ,'Area1' ,1, '21Oct16')
; WITH LL AS
(
SELECT ClientID ,MAX(CAST (Dated as Date)) as maxdate
FROM @Table
GROUP BY ClientID
)
, Area AS
(
SELECT Location, x.ClientID, x.Dated FROM @Table x INNER JOIN LL b ON x.ClientID = b.ClientID AND x.Dated = b.maxdate
)
SELECT a.*
, LatestEpisode = MAX(Episode) OVER(PARTITION BY a.ClientID)
, LatestLocation = b.Location
FROM @Table a
INNER JOIN Area b ON a.ClientID = b.ClientID

SQL Server - Insert lines with null values when month doesn't exist

I have a table like this one:
Yr | Mnth | W_ID | X_ID | Y_ID | Z_ID | Purchases | Sales | Returns |
2015 | 10 | 1 | 5210 | 1402 | 2 | 1000.00 | etc | etc |
2015 | 12 | 1 | 5210 | 1402 | 2 | 12000.00 | etc | etc |
2016 | 1 | 1 | 5210 | 1402 | 2 | 1000.00 | etc | etc |
2016 | 3 | 1 | 5210 | 1402 | 2 | etc | etc | etc |
2014 | 3 | 9 | 880 | 2 | 7 | etc | etc | etc |
2014 | 12 | 9 | 880 | 2 | 7 | etc | etc | etc |
2015 | 5 | 9 | 880 | 2 | 7 | etc | etc | etc |
2015 | 7 | 9 | 880 | 2 | 7 | etc | etc | etc |
For each combination of (W, X, Y, Z) I would like to insert the months that don't appear in the table and are between the first and last month.
In this example, for combination (W=1, X=5210, Y=1402, Z=2), I would like to have additional rows for 2015/11 and 2016/02, where Purchases, Sales and Returns are NULL. For combination (W=9, X=880, Y=2, Z=7) I would like to have additional rows for months between 2014/4 and 2014/11, 2015/01 and 2015/04, 2016/06.
I hope I have explained myself correctly.
Thank you in advance for any help you can provide.

The process is rather cumbersome in this case, but quite possible. One method uses a recursive CTE. Another uses a numbers table. I'm going to use the latter.
The idea is:
Find the minimum and maximum values for the year/month combination for each set of ids. For this, the values will be turned into months since time 0 using the formula year*12 + (month - 1). The month is made zero-based so that dividing by 12 and taking the remainder recover the year and month exactly, including December.
Generate a bunch of numbers.
Generate all rows between the two values for each combination of ids.
For each generated row, use arithmetic to re-extract the year and month.
Use left join to bring in the original data.
The query looks like:
with n as (
select row_number() over (order by (select null)) - 1 as n -- start at 0
from master.dbo.spt_values
),
minmax as (
select w_id, x_id, y_id, z_id, min(yr*12 + mnth - 1) as minyyyymm,
max(yr*12 + mnth - 1) as maxyyyymm
from t
group by w_id, x_id, y_id, z_id
),
wxyz as (
select minmax.*, minmax.minyyyymm + n.n as yyyymm,
(minmax.minyyyymm + n.n) / 12 as yyyy,
((minmax.minyyyymm + n.n) % 12) + 1 as mm
from minmax join
n
on minmax.minyyyymm + n.n <= minmax.maxyyyymm
)
select wxyz.yyyy, wxyz.mm, wxyz.w_id, wxyz.x_id, wxyz.y_id, wxyz.z_id,
<columns from t here>
from wxyz left join
t
on wxyz.w_id = t.w_id and wxyz.x_id = t.x_id and wxyz.y_id = t.y_id and
wxyz.z_id = t.z_id and wxyz.yyyy = t.yr and wxyz.mm = t.mnth;
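The month arithmetic behind this approach is easy to get off by one, so here is a small Python sketch of just that step. It assumes months are encoded zero-based, as year*12 + (month - 1), so that integer division and modulo by 12 recover the year and month exactly. The function name fill_months is hypothetical, and the sample months are those of the first id combination in the question.

```python
def fill_months(first, last):
    """Return every (year, month) pair from first to last, inclusive."""
    # Encode each month as a single integer with a zero-based month part.
    start = first[0] * 12 + (first[1] - 1)
    end = last[0] * 12 + (last[1] - 1)
    # Decode: integer division gives the year, the remainder gives month - 1.
    return [(idx // 12, idx % 12 + 1) for idx in range(start, end + 1)]


# Months present for the combination (W=1, X=5210, Y=1402, Z=2).
present = {(2015, 10), (2015, 12), (2016, 1), (2016, 3)}

all_months = fill_months(min(present), max(present))
missing = [m for m in all_months if m not in present]
print(missing)
```

For this combination the missing months come out as 2015/11 and 2016/02, the two rows the question wants inserted with NULL values.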

Thank you for your help.
Your solution works, but I noticed it does not perform very well. In the meantime I have managed to find a solution to my problem.
DECLARE @start_date DATE, @end_date DATE;
SET @start_date = (SELECT MIN(EOMONTH(DATEFROMPARTS(Yr, Mnth, 1))) FROM Table_Input);
SET @end_date = (SELECT MAX(EOMONTH(DATEFROMPARTS(Yr, Mnth, 1))) FROM Table_Input);
DECLARE @tdates TABLE (Period DATE, Yr INT, Mnth INT);
WHILE @start_date <= @end_date
BEGIN
INSERT INTO @tdates(Period, Yr, Mnth) VALUES(@start_date, YEAR(@start_date), MONTH(@start_date));
SET @start_date = EOMONTH(DATEADD(mm, 1, DATEFROMPARTS(YEAR(@start_date), MONTH(@start_date), 1)));
END
DECLARE @pks TABLE (W_ID NVARCHAR(50), X_ID NVARCHAR(50)
, Y_ID NVARCHAR(50), Z_ID NVARCHAR(50)
, PerMin DATE, PerMax DATE);
INSERT INTO @pks (W_ID, X_ID, Y_ID, Z_ID, PerMin, PerMax)
SELECT W_ID, X_ID, Y_ID, Z_ID
, MIN(EOMONTH(DATEFROMPARTS(Yr, Mnth, 1))) AS PerMin
, MAX(EOMONTH(DATEFROMPARTS(Yr, Mnth, 1))) AS PerMax
FROM Table_Input
GROUP BY W_ID, X_ID, Y_ID, Z_ID;
-- assumes Table_Output has Yr and Mnth columns alongside the ids and measures
INSERT INTO Table_Output(W_ID, Yr, Mnth, X_ID, Y_ID, Z_ID
, ComprasLiquidas, RTV, DevManuais, ComprasBrutas, Vendas, Stock, ReceitasComerciais)
SELECT TP.W_ID, TP.Yr, TP.Mnth, TP.X_ID, TP.Y_ID, TP.Z_ID
, TA.ComprasLiquidas, TA.RTV, TA.DevManuais, TA.ComprasBrutas, TA.Vendas, TA.Stock, TA.ReceitasComerciais
FROM
(
SELECT W_ID, X_ID, Y_ID, Z_ID, Yr, Mnth
FROM @tdates CROSS JOIN @pks
WHERE Period BETWEEN PerMin AND PerMax
) AS TP
LEFT JOIN Table_Input AS TA
ON TP.W_ID = TA.W_ID AND TP.X_ID = TA.X_ID AND TP.Y_ID = TA.Y_ID
AND TP.Z_ID = TA.Z_ID
AND TP.Yr = TA.Yr
AND TP.Mnth = TA.Mnth
ORDER BY TP.W_ID, TP.X_ID, TP.Y_ID, TP.Z_ID, TP.Yr, TP.Mnth;
I do the following:
Get the Min and Max date of the entire table - @start_date and @end_date variables;
Create an auxiliary table with all months between Min and Max - @tdates table;
Get all the combinations of (W_ID, X_ID, Y_ID, Z_ID) along with the min and max dates of that combination - @pks table;
Create the cartesian product between @tdates and @pks, and in the WHERE clause I filter the results between the Min and Max of the combination;
Compute a LEFT JOIN of the cartesian product table with the input data table.

Measuring two arrays for equality of lowest numbers

I have the following SQL Tables:
create table events (
sensor_id integer not null,
event_type integer not null,
value integer not null,
time timestamp unique not null
);
And it looks something like this:
sensor_id | event_type | value | time
-----------+------------+------------+--------------------
2 | 2 | 5 | 2014-02-13 12:42:00
2 | 4 | -42 | 2014-02-13 13:19:57
2 | 2 | 2 | 2014-02-13 14:48:30
3 | 2 | 7 | 2014-02-13 12:54:39
2 | 3 | 54 | 2014-02-13 13:32:36
What is the easiest way to return the most recent value, in terms of time, for each sensor_id and event_type? The result should look like this:
sensor_id | event_type | value
-----------+------------+-----------
2 | 2 | 2
2 | 3 | 54
2 | 4 | -42
3 | 2 | 7
I can't get my head around it.

Try the T-SQL code below:
SELECT e.sensor_id, e.event_type, e.value
FROM
(SELECT e.sensor_id, time = MAX(e.time)
FROM dbo.events e WITH(NOLOCK)
GROUP BY e.sensor_id, e.event_type) m
JOIN dbo.events e WITH(NOLOCK) ON e.sensor_id = m.sensor_id
AND e.time = m.time
ORDER BY e.sensor_id
How are you sorting the rows? I can't seem to see the pattern.
EDIT: Got the sorting now!
SELECT e.sensor_id, e.event_type, e.value
FROM
(SELECT e.sensor_id, time = MAX(e.time)
FROM dbo.events e WITH(NOLOCK)
GROUP BY e.sensor_id, e.event_type) m
JOIN dbo.events e WITH(NOLOCK) ON e.sensor_id = m.sensor_id
AND e.time = m.time
ORDER BY e.sensor_id, e.sensor_id + e.event_type

One way is to do:
SELECT value FROM events WHERE sensor_id=? AND event_type=? ORDER BY time DESC
Then the first one is the most recent.
Most databases let you limit the result to just one row, but the syntax depends on the database (for example TOP 1 in SQL Server, or LIMIT 1 in MySQL and PostgreSQL).
I put a question mark where you would need to put a number.

SELECT
sensor_id, event_type, value
FROM
(
SELECT
ROW_NUMBER() OVER
(PARTITION BY sensor_id, event_type ORDER BY time DESC) AS Ordering,
*
FROM events
) data
WHERE
data.Ordering = 1

SQL Find First Occurrence

I've been at this for about an hour now and am making little to no progress - thought I'd come here for some help/advice.
So, given a sample of my table:
+-----------+-----------------------------+--------------+
| MachineID | DateTime | AlertType |
+-----------+-----------------------------+--------------+
| 56 | 2015-10-05 00:00:23.0000000 | 2000 |
| 42 | 2015-10-05 00:01:26.0000000 | 1006 |
| 50 | 2015-10-05 00:08:33.0000000 | 1018 |
| 56 | 2015-10-05 00:08:48.0000000 | 2003 |
| 56 | 2015-10-05 00:10:15.0000000 | 2000 |
| 67 | 2015-10-05 00:11:59.0000000 | 3001 |
| 60 | 2015-10-05 00:13:02.0000000 | 1006 |
| 67 | 2015-10-05 00:13:08.0000000 | 3000 |
| 56 | 2015-10-05 00:13:09.0000000 | 2003 |
| 67 | 2015-10-05 00:14:50.0000000 | 1018 |
| 67 | 2015-10-05 00:15:00.0000000 | 1018 |
| 47 | 2015-10-05 00:16:55.0000000 | 1006 |
+-----------+-----------------------------+--------------+
How would I get the first occurrence of a MachineID w/ an AlertType of 2000
and the last occurrence of the same MachineID w/ an AlertType of 2003?
Here is what I have tried - but it is not outputting what I expect.
SELECT *
FROM [Alerts] a
where
DateTime >= '2015-10-05 00:00:00'
AND DateTime <= '2015-10-06 00:00:00'
and not exists(
select b.MachineID
from [Alerts] b
where b.AlertType=a.AlertType and
b.MachineID<a.MachineID
)
order by a.DateTime ASC
EDIT: The above code doesn't get me what I want because I am not specifically telling it to search for AlertType = 2000 or AlertType = 2003, but even when I try that, I am still unable to gather my desired results.
Here is what I would like my output to display:
+-----------+-----------------------------+--------------+
| MachineID | DateTime | AlertType |
+-----------+-----------------------------+--------------+
| 56 | 2015-10-05 00:00:23.0000000 | 2000 |
| 56 | 2015-10-05 00:13:09.0000000 | 2003 |
+-----------+-----------------------------+--------------+
Any help with this would be greatly appreciated!

Not sure, but:
select * from [Table]
WHERE [DateTime] IN (
SELECT MIN([DateTime]) as [DateTime]
FROM [Table]
WHERE AlertType = 2000
GROUP BY MachineId
UNION ALL
SELECT MAX([DateTime]) as [DateTime]
FROM [Table]
WHERE AlertType = 2003
GROUP BY MachineId)
ORDER BY MachineId, AlertType

It looks like your outer section takes all records between 2015-10-05 to 2015-10-06, which includes all the records sorted by date. The inner portion only happens when no records fit the outer date range.
Looks like GSazheniuk has it right, but I am not sure if you just want the 2 records or everything that matches the MachineID and the two alerts?

Not sure what your attempt has to do with your question, but to answer this:
How would I get the first occurrence of a MachineID w/ an AlertType of
2000 and the last occurrence of the same MachineID w/ an AlertType of
2003?
Simple:
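-- ORDER BY is not allowed directly before UNION ALL, so each TOP 1 gets its own derived table
SELECT * FROM (
SELECT TOP 1 * FROM Alerts WHERE AlertType = 2000 ORDER BY [DateTime] ASC
) AS first2000
UNION ALL
SELECT * FROM (
SELECT TOP 1 * FROM Alerts WHERE AlertType = 2003 ORDER BY [DateTime] DESC
) AS last2003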

I think everyone misses that your alert type is NOT a deciding factor, but a supplemental.
This should give you what you are looking for. I walked through the whole process.
IF OBJECT_ID('tempdb..#alerts') IS NOT NULL DROP table #alerts
CREATE TABLE #alerts
(
MachineID int,
dte DATETIME,
alerttype int
)
INSERT INTO #alerts VALUES ('56','20151005 00:00:23','2000')
INSERT INTO #alerts VALUES ('42','20151005 00:01:26','1006')
INSERT INTO #alerts VALUES ('50','20151005 00:08:33','1018')
INSERT INTO #alerts VALUES ('56','20151005 00:08:48','2003')
INSERT INTO #alerts VALUES ('56','20151005 00:10:15','2000')
INSERT INTO #alerts VALUES ('67','20151005 00:11:59','3001')
INSERT INTO #alerts VALUES ('60','20151005 00:13:02','1006')
INSERT INTO #alerts VALUES ('67','20151005 00:13:08','3000')
INSERT INTO #alerts VALUES ('56','20151005 00:13:09','2003')
INSERT INTO #alerts VALUES ('67','20151005 00:14:50','1018')
INSERT INTO #alerts VALUES ('67','20151005 00:15:00','1018')
INSERT INTO #alerts VALUES ('47','20151005 00:16:55','1006')
GO
WITH rnk as ( --identifies the order of the records.
Select
MachineID,
dte = dte,
rnk = RANK() OVER (partition BY machineid ORDER BY dte DESC) --ranks each machine ID's rows by date (latest first, so rank 1 is the last record)
FROM #alerts
),
agg as( --Pulls your first and last record
SELECT
MachineID,
frst = MIN(rnk),
lst = MAX(rnk)
FROM rnk
GROUP BY MachineID
)
SELECT
pop.MachineID,
pop.dte,
pop.alerttype
FROM #alerts pop
JOIN rnk r ON pop.MachineID = r.MachineID AND pop.dte = r.dte --the date join allows you to hook into your ranks
JOIN agg ON pop.MachineID = agg.MachineID
WHERE agg.frst = r.rnk OR agg.lst = r.rnk -- or clause can be replaced by two queries with a union all
ORDER BY 1,2 --viewability... machineID, date
I personally use CROSS APPLY to perform tasks like this, but CTEs are much more visually friendly for this exercise.

subtract data from single column

I have a database table with columns named piece, diff and type.
Here's what the table looks like
id | piece | diff | type
1 | 20 | NULL | cake
2 | 15 | NULL | cake
3 | 10 | NULL | cake
I want each row's diff to be the previous row's piece minus its own piece, e.g. 20 - 15 = 5, then 15 - 10 = 5, and so on, with type used as the WHERE condition.
Result will be like this
id | piece | diff | type
1 | 20 | 0 | cake
2 | 15 | 5 | cake
3 | 10 | 5 | cake
Here's the code I have so far, but I don't think I'm on the right track:
SELECT
tableblabla.id,
(tableblabla.cast(pieces as decimal(7, 2)) - t.cast(pieces as decimal(7, 2))) as diff
FROM
tableblabla
INNER JOIN
tableblablaas t ON tableblabla.id = t.id + 1
Thanks for the help

Use the LAG/LEAD window functions.
This assumes that you want the difference per type; otherwise, remove PARTITION BY from the window function.
select id, piece,
Isnull(lag(piece)over(partition by type order by id) - piece,0) as Diff,
type
From yourtable
If you are using SQL Server prior to 2012, use this:
;WITH cte
AS (SELECT Row_number()OVER(partition by type ORDER BY id) RN,*
FROM Yourtable)
SELECT a.id,
a.piece,
Isnull(b.piece - a.piece, 0) AS diff,
a.type
FROM cte a
LEFT JOIN cte b
ON a.rn = b.rn + 1
AND a.type = b.type -- RN restarts per type, so match within the same type
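To see the LAG() version produce the requested output, here is a minimal sketch that runs the same window expression through Python's sqlite3 module (a stand-in for SQL Server, assumed only so the example can be executed; SQLite 3.25+ is needed). The table name pieces is hypothetical, and COALESCE is used in place of ISNULL because SQLite does not have the latter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pieces (id INTEGER, piece INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO pieces VALUES (?, ?, ?)",
    [(1, 20, "cake"), (2, 15, "cake"), (3, 10, "cake")],
)

# LAG(piece) is the previous row's piece within the same type;
# the first row has no predecessor, so its NULL difference becomes 0.
query = """
SELECT id, piece,
       COALESCE(LAG(piece) OVER (PARTITION BY type ORDER BY id) - piece, 0) AS diff,
       type
FROM pieces
ORDER BY id
"""
diffs = conn.execute(query).fetchall()
for row in diffs:
    print(row)
```

The diff column comes out as 0, 5, 5 for ids 1, 2 and 3, matching the result table in the question.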
