Understanding a Correlated Subquery

Understanding a Correlated Subquery - sql

I want to create a query that returns the most recent date for a date field and the highest value of a integer field for each "assessment" record. What I think is required is a correlated subquery and using the MAX function.
example data would be as follows
the date field could have duplicate dates for each assessment but each duplicate date group would have a different the integer in the integer field.
eg
1256 2/6/14 0
1256 2/6/14 1
1256 1/6/14 0
4534 3/6/14 0
4534 3/6/14 1
4534 3/6/14 2
select assessment, Max(correctnum) maxofcorrectnum, dateeffect
from lraassm outerassm
where dateeffect =
(select MAX(dateeffect) maxofdateeffect
from pthdbo.lraassm innerassm
innerassm.assessment = outerassm.assessment
group by innerassm.assessment)
group by assessment, dateeffect
so my theory is that the inner query executes and gives the outer query the criteria for the dateeffect field in the outer query and then the outer query would return the maximum of the correctnum field for this dateeffect and also return its corresponding assessment and the dateeffect.
Could someone please confirm this is correct. How does the subquery handle the rows? what other ways are there to solve this problem? thanks

Your query is doing the right thing, but granted, the correlated subquery is a little difficult to understand. What the subquery does is, it filters the records based on assessment from the outer query and then returns the maximum dateeffect for that assessment. In fact, you don't need the group by clause on the correlated query.
These types of queries are where common when working with data in ERP systems, when you're only interested in "latest" records, etc. This is also known as a "top segment" type of query (which the query optimizer is sometimes able to figure out by itself). I've found, that on SQL Server 2005 or newer, it is a lot easier to use the ROW_NUMBER() function. The following query should return the same as yours, namely one record from lraassm for each assessment, that has the highest value of dateeffect and correctnum.
select * from (
select
assessment, dateeffect, correctnum,
ROW_NUMBER() OVER (
PARTITION BY assessment,
ORDER BY dateeffect DESC, correctnum DESC
) AS segment
from lraassm) AS innerQuery
where segment = 1

This is the query I worked out using my tables. But it will get you on the right track and you should be able to substitute your fields/tables in.
Select * from Decode
where updated_time = (Select MAX(updated_time)from DECODE)
That Query gives you every record that has the most recent updated_time. The next query will return the greatest entry_id value as well as the most recent updated_time from those Records
Select MAX(entry_id), updated_time from Decode
where updated_time = (Select MAX(updated_time)from DECODE)
group by updated_time
The result is 2 columns 1 record, 1st column is the Maximum value of entry id, the second is the most recent updated_time. Is that what you wanted to return?

Related

get the latest records

I am currently still on my SQL educational journey and need some help!
The query I have is as below;
SELECT
Audit_Non_Conformance_Records.kf_ID_Client_Reference_Number,
Audit_Non_Conformance_Records.TimeStamp_Creation,
Audit_Non_Conformance_Records.Clause,
Audit_Non_Conformance_Records.NC_type,
Audit_Non_Conformance_Records.NC_Rect_Received,
Audit_Non_Conformance_Records.Audit_Num
FROM Audit_Non_Conformance_Records
I am trying to tweak this to show only the most recent results based on Audit_Non_Conformance_Records.TimeStamp_Creation
I have tried using MAX() but all this does is shows the latest date for all records.
basically the results of the above give me this;
But I only need the result with the date 02/10/2019 as this is the latest result. There may be multiple results however. So for example if 02/10/2019 had never happened I would need all of the idividual recirds from the 14/10/2019 ones.
Does that make any sense at all?

You can filter with a subquery:
SELECT
kf_ID_Client_Reference_Number,
TimeStamp_Creation,
Clause,
NC_type,
NC_Rect_Received,
Audit_Num
FROM Audit_Non_Conformance_Records a
where TimeStamp_Creation = (
select max(TimeStamp_Creation)
from Audit_Non_Conformance_Records
)
This will give you all whose TimeStamp_Creation is equal to the greater value available in the table.
If you want all records that have the greatest day (exluding time), then you can do:
SELECT
kf_ID_Client_Reference_Number,
TimeStamp_Creation,
Clause,
NC_type,
NC_Rect_Received,
Audit_Num
FROM Audit_Non_Conformance_Records a
where cast(TimeStamp_Creation as date) = (
select cast(max(TimeStamp_Creation) as date)
from Audit_Non_Conformance_Records
)
Edit
If you want the latest record per refNumber, then you can correlate the subquery, like so:
SELECT
kf_ID_Client_Reference_Number,
TimeStamp_Creation,
Clause,
NC_type,
NC_Rect_Received,
Audit_Num
FROM Audit_Non_Conformance_Records a
where TimeStamp_Creation = (
select max(TimeStamp_Creation)
from Audit_Non_Conformance_Records a1
where a1.refNumber = a.refNumber
)
For performance, you want an index on (refNumber, TimeStamp_Creation).

If you want the latest date in SQL Server, you can express this as:
SELECT TOP (1) WITH TIES ancr.kf_ID_Client_Reference_Number,
ancr.TimeStamp_Creation,
ancr.Clause,
ancr.NC_type,
ancr.NC_Rect_Received,
ancr.Audit_Num
FROM Audit_Non_Conformance_Records ancr
ORDER BY CONVERT(date, ancr.TimeStamp_Creation) DESC;
SQL Server is pretty good about handling dates with conversions, so I would not be surprised if this used an index on TimeStamp_Creation.

SQL statement to match dates that are the closest?

I have the following table, let's call it Names:
Name Id Date
Dirk 1 27-01-2015
Jan 2 31-01-2015
Thomas 3 21-02-2015
Next I have the another table called Consumption:
Id Date Consumption
1 26-01-2015 30
1 01-01-2015 20
2 01-01-2015 10
2 05-05-2015 20
Now the problem is, that I think that doing this using SQL is the fastest, since the table contains about 1.5 million rows.
So the problem is as follows, I would like to match each Id from the Names table with the Consumption table provided that the difference between the dates are the lowest, so we have: Dirk consumes on 27-01-2015 about 30. In case there are two dates that have the same "difference", I would like to calculate the average consumption on those two dates.
While I know how to join, I do not know how to code the difference part.
Thanks.
DBMS is Microsoft SQL Server 2012.
I believe that my question differs from the one mentioned in the comments, because it is much more complicated since it involves comparison of dates between two tables rather than having one date and comparing it with the rest of the dates in the table.

This is how you could it in SQL Server:
SELECT Id, Name, AVG(Consumption)
FROM (
SELECT n.Id, Name, Consumption,
RANK() OVER (PARTITION BY n.Id
ORDER BY ABS(DATEDIFF(d, n.[Date], c.[Date]))) AS rnk
FROM Names AS n
INNER JOIN Consumption AS c ON n.Id = c.Id ) t
WHERE t.rnk = 1
GROUP BY Id, Name
Using RANK with PARTITION BY n.Id and ORDER BY ABS(DATEDIFF(d, n.[Date], c.[Date])) you can locate all matching records per Id: all records with the smallest difference in days are going to have rnk = 1.
Then, using AVG in the outer query, you are calculating the average value of Consumption between all matching records.
SQL Fiddle Demo

Subqueries: What am I doing fundamentally wrong?

I thought that selecting values from a subquery in SQL would only yield values from that subset until I found a very nasty bug in code. Here is an example of my problem.
I'm selecting the rows that contain the latest(max) function by date. This correctly returns 4 rows with the latest check in of each function.
select *, max(date) from cm where file_id == 5933 group by function_id;
file_id function_id date value max(date)
5933 64807 1407941297 1 1407941297
5933 64808 1407941297 11 1407941297
5933 895175 1306072348 1306072348
5933 895178 1363182349 1363182349
When selecting only the value from the subset above, it returns function values from previous dates, i.e. rows that don't belong in the subset above. You can see the result below where the dates are older than in the first subset.
select temp.function_id, temp.date, temp.value
from (select *, max(date)
from cm
where file_id 5933
group by function_id) as temp;
function_id date value
64807 1306072348 1 &lt-outdated row, not in first subset
64808 1306072348 17 &lt-outdated row, not in first subset
895175 1306072348
895178 1363182349
What am I doing fundamentally wrong? Shouldn't selects performed on subqueries only return possible results from those subqueries?

SQLite allows you to use MAX() to select the row to be returned by a GROUP BY, but this works only if the MAX() is actually computed.
When you throw the max(date) column away, this no longer works.
In this case, you actually want to use the date value, so you can just keep the MAX():
SELECT function_id,
max(date) AS date,
value
FROM cm
WHERE file_id = 5933
GROUP BY function_id

You seem to be missing the fact that your subquery is returning ALL rows for the given file_id. If you want to restrict your subquery to recs with the most recent date, then you need to restrict it with a WHERE NOT EXISTS clause to check that no more recent records exist for the given condition.

Perhaps my question was not formulated correctly, but this post had the solutions I was essentially looking for:
https://stackoverflow.com/a/123481/2966951
https://stackoverflow.com/a/121435/2966951
Filtering out the most recent row was my problem. I was surprised that selecting from a subquery with a max value could yield anything other than that value.

Get a Row if within certain time period of other row

I have a SQL statement that I am currently using to return a number of rows from a database:
SELECT
as1.AssetTagID, as1.TagID, as1.CategoryID,
as1.Description, as1.HomeLocationID, as1.ParentAssetTagID
FROM Assets AS as1
INNER JOIN AssetsReads AS ar ON as1.AssetTagID = ar.AssetTagID
WHERE
(ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
I am wanting to do a query that will get the row with the oldest DateScanned from this query and also get another row from the database if there was one that was within a certain period of time from this row (say 5 seconds for an example). The oldest record would be relatively simple by selecting the first record in a descending sort, but how would I also get the second record if it was within a certain time period of the first?
I know I could do this process with multiple queries, but is there any way to combine this process into one query?
The database that I am using is SQL Server 2008 R2.
Also please note that the DateScanned times are just placeholders and I am taking care of that in the application that will be using this query.

Here is a fairly general way to approach it. Get the oldest scan date using min() as a window function, then use date arithmetic to get any rows you want:
select t.* -- or whatever fields you want
from (SELECT as1.AssetTagID, as1.TagID, as1.CategoryID,
as1.Description, as1.HomeLocationID, as1.ParentAssetTagID,
min(DateScanned) over () as minDateScanned, DateScanned
FROM Assets AS as1
INNER JOIN AssetsReads AS ar ON as1.AssetTagID = ar.AssetTagID
WHERE (ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
) t
where datediff(second, minDateScanned, DateScanned) <= 5;

I am not really sure of sql server syntax, but you can do something like this
SELECT * FROM (
SELECT
TOP 2
as1.AssetTagID,
as1.TagID,
as1.CategoryID,
as1.Description,
as1.HomeLocationID,
as1.ParentAssetTagID ,
ar.DateScanned,
LAG(ar.DateScanned) OVER (order by ar.DateScanned desc) AS lagging
FROM
Assets AS as1
INNER JOIN AssetsReads AS ar
ON as1.AssetTagID = ar.AssetTagID
WHERE (ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
ORDER BY
ar.DateScanned DESC
)
WHERE
lagging IS NULL or DateScanned - lagging < '5 SECONDS'
I have tried to sort the results by DateScanned desc and then just the top most 2 rows. I have then used the lag() function on DateScanned field, to get the DateScanned value for the previous row. For the topmost row the DateScanned shall be null as its the first record, but for the second one it shall be value of the first row. You can then compare both of these values to determine whether you wish to display the second row or not
more info on the lagging function: http://blog.sqlauthority.com/2011/11/15/sql-server-introduction-to-lead-and-lag-analytic-functions-introduced-in-sql-server-2012/

How do I count data from 2 different tables by date

I have 2 tables with no relations, both tables have different number of columns, but there are a few columns that are the same but hold different data. I was able to create a function or view of only the data I wanted, but when I try to count the data by filtering the date, I always get the wrong count in return. Let me explain by showing the 2 functions and what I try to do:
Function 1
ID - number from 1 to 8
data sent - YES or NO
Date - date value
Function 2
ID - number from 1 to 8
data sent - yes or no
date - date value
Upon running both separately, I get all the rows from the tables and everything looks good.
Then I try to add the following to each function:
select
count([data sent]), ID
from function1
Where (date between #date1 and #date2)
group by ID
The above statement works great and gives me the right result for each function.
Now I thought what if I want to add those 2 functions into one and get the count from both functions on 1 page.
So I created the following function:
Function 3
select
count(Function1.[data sent]) as Expr1,
Function1.id,
count(Function2.[data sent]) as Expr2,
Function1.date
from
Function1
LEFT OUTER JOIN
Function2 on Function1.id = Function2.id
Where
(Function1.date between #date1 and #date2)
group by
Function1.id
Upon running the above, I get the following table:
ID Expr1 Expr2
On both Expr1 and Expr2, I get results which I am not sure where they come from. I guess something is being multiplied by 100000 since one table holds almost 15000 rows and the other around 5000 rows.
What I would like to know first is if it possible at all to be able to filter by date and count records from both table at the same time. If anyone need more information please let me know and I will be glad to share and explain more.
Thank you

The LEFT OUTER JOIN is taking each row of the left table, finding ALL of the rows in the right table with the same id field, and creating that many rows in the result table. Since id isn't what we usually think of as an identity field (it looks more like a "deviceId" or something), you'll get lots of matches for each one. Repeat 15000 times and you get your combinatorial explosion.
Tip: To debug things like this, you can create sample tables with a tiny subset of the real data, say 10 rows from each, and run your query on them. You'll see the issue immediately.
It's possible to filter by date. It's hard to recommend an actual solution without better understanding your phrase "I want to add those 2 functions into one and get the count from both functions on 1 page".

Why can't you create a temporary table for each function then join them together?

Maybe subqueries can help you to achieve what you want:
SELECT
ID = COALESCE(f1.ID, f2.ID),
Date = COALESCE(f1.Date, f2.Date),
f1.Expr1,
f2.Expr2
FROM (
SELECT
ID,
Date,
Expr1 = COUNT([data sent])
FROM Function1
WHERE Date BETWEEN #date1 AND #date2
GROUP BY
ID,
Date
) f1
FULL JOIN (
SELECT
ID,
Date,
Expr2 = COUNT([data sent])
FROM Function2
WHERE Date BETWEEN #date1 AND #date2
GROUP BY
ID,
Date
) f2
ON f1.ID = f2.ID AND f1.Date = f2.Date
This query also uses full (outer) join instead of left join, in case the right side of the join contains rows that have no match in the left side (and you want those rows).

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Understanding a Correlated Subquery - sql

Related

get the latest records

SQL statement to match dates that are the closest?

Subqueries: What am I doing fundamentally wrong?

Get a Row if within certain time period of other row

How do I count data from 2 different tables by date

Categories

Resources