Howto get newest dataset with SQL join?

Howto get newest dataset with SQL join? - sql

I have got the following join:
SELECT l.cFirma AS Lieferant,
SUM(la.fEKNetto) AS Verbindlichkeiten,
l.kLieferant AS Lieferanten_ID,
100 - gk1.fFaktor * 100 AS Grundkondition,
MAX(gk1.dDatum) AS Datum
FROM tBestellung b, tArtikel a, tBestellpos p, tLieferant l, tLiefArtikel la, tGrundkondition gk1
WHERE
CAST('01.01.2010' AS DATETIME) <= CAST(b.dErstellt AS DATETIME)
AND b.cType = 'B'
AND p.tBestellung_kBestellung = b.kBestellung
AND a.kArtikel = p.tArtikel_kArtikel
AND l.kLieferant = la.tLieferant_KLieferant
AND a.kArtikel = la.tArtikel_kArtikel
AND gk1.tLieferant_kLieferant = l.kLieferant
GROUP BY l.kLieferant, cFirma, gk1.fFaktor
ORDER BY Verbindlichkeiten DESC, Lieferant
Please fokus on the table "tGrundkondition" alias gk1. There is a DATETIME column called "dDatum" and a foreign key "tLieferant_kLieferant".
Now I need only the latest data from this table, joined with the other stuff. I already used the MAX(gk1.dDatum) function, but I still get all entries of gk1. I need only the latest (with the highest dDate). Actually I don't need to output the date but only to filter the data.
I'm running this statement on Microsoft SQL Server via ODBC. Do you need any further information?
I hope you can help me. Thanks in advance.

You need to use a correlated subquery, for example add the following:
WHERE gk1.DATUM = (SELECT MAX(SUB.DATUM) FROM tGrundkondition SUB
WHERE SUB.tLieferant_kLieferant = l.kLieferant)
I am not sure this is 100% correct because I don't know your table structure, but it should give you an idea.

Try to do something like this:
SELECT l.cFirma AS Lieferant,
SUM(la.fEKNetto) AS Verbindlichkeiten,
l.kLieferant AS Lieferanten_ID,
100 - gk1.fFaktor * 100 AS Grundkondition,
gk1.dDatum AS Datum
FROM tBestellung b, tArtikel a, tBestellpos p, tLieferant l, tLiefArtikel la, tGrundkondition gk1
WHERE
CAST('01.01.2010' AS DATETIME) <= CAST(b.dErstellt AS DATETIME)
AND b.cType = 'B'
AND p.tBestellung_kBestellung = b.kBestellung
AND a.kArtikel = p.tArtikel_kArtikel
AND l.kLieferant = la.tLieferant_KLieferant
AND a.kArtikel = la.tArtikel_kArtikel
AND gk1.tLieferant_kLieferant = l.kLieferant
AND gk1.dDatum = (SELECT MAX(dDatum) from _ITS TABLE_)
GROUP BY l.kLieferant, cFirma, gk1.fFaktor
ORDER BY Verbindlichkeiten DESC, Lieferant
I don't know if it works on SQL SERVER.... but I used a lot on DB2

SELECT l.cFirma AS Lieferant,
SUM(la.fEKNetto) AS Verbindlichkeiten,
l.kLieferant AS Lieferanten_ID,
100 - gk1.fFaktor * 100 AS Grundkondition,
gk1.dDatum AS Datum
FROM (
SELECT TOP 1 *
FROM tGrundkondition
ORDER BY
dDatum DESC
) gk1,
tBestellung b, tArtikel a, tBestellpos p, tLieferant l, tLiefArtikel la
WHERE
CAST('01.01.2010' AS DATETIME) <= CAST(b.dErstellt AS DATETIME)
AND b.cType = 'B'
AND p.tBestellung_kBestellung = b.kBestellung
AND a.kArtikel = p.tArtikel_kArtikel
AND l.kLieferant = la.tLieferant_KLieferant
AND a.kArtikel = la.tArtikel_kArtikel
AND gk1.tLieferant_kLieferant = l.kLieferant
GROUP BY l.kLieferant, cFirma, gk1.fFaktor
ORDER BY Verbindlichkeiten DESC, Lieferant

Related

Improve SQL query performance. UNION vs OR in this situation

Problem
So the situation that I am facing here with this SQL Query, is that it is taking about 12 seconds to run making the screen super slow.
Goal
Do the necessary changes in order to improve the performance and make it faster. I was thinking about instead of the OR in the Where clause to use the UNION?
SELECT Tool.*, Interview.*
FROM Tool
INNER JOIN Interview ON Interview.Id = Tool.InterviewId
WHERE (Tool.ToolTypeId = #ToolTypeId
AND Tool.Is_Active = 1
AND Tool.InterviewId = #InterviewId
AND Tool.ToolId = #ToolId
AND Tool.CustomerId = #CustomerId)
OR Tool.Id = (
SELECT TOP 1 SubTool.Id
FROM Tool SubTool
INNER JOIN Interview subInterview ON subInterview.Id = SubTool.ToolId
WHERE SubTool.ToolTypeId = #ToolTypeId
AND SubTool.Is_Active = 1
AND SubTool.InterviewId != #InterviewId
AND SubTool.ToolId = #ToolId
AND subTool.CustomerId = #CustomerId
AND convert(datetime, subTool.DateTime, 120) < #ToolDateTime
ORDER BY subTool.DateTime DESC, subTool.StartDate DESC,
subTool.EndDate, subTool.Id DESC
)
ORDER BY Tool.StartDate, Tool.Id
NOTE: I believe the actual query output is not necessary in this case, since we are looking for some structural issues that might be impacting the performance.

I would suggest rephrasing the query to eliminate the subquery in the WHERE clause.
If you are looking for one row in the result set regardless of conditions, you can use:
SELECT TOP (1) Tool.*, Interview.*
FROM Tool JOIN
Interview
ON Interview.Id = Tool.InterviewId
WHERE Tool.ToolTypeId = #ToolTypeId AND
Tool.Is_Active = 1
Tool.ToolId = #ToolId AND
Tool.CustomerId = #CustomerId AND
(Tool.InterviewId = #InterviewId OR
convert(datetime, Tool.DateTime, 120) < #ToolDateTime
)
ORDER BY (CASE WHEN Tool.InterviewId = #InterviewId THEN 1 ELSE 2 END),
Tool.DateTime DESC, sTool.StartDate DESC, Tool.EndDate, Tool.Id DESC;
Your final ORDER BY suggests that you are expecting more than one row for the first condition. So, you can use a subquery and window functions:
SELECT ti.*
FROM (SELECT Tool.*, Interview.*,
COUNT(*) OVER (PARTITION BY (CASE WHEN Tool.InterviewId = #InterviewId THEN 1 ELSE 0 END)) as cnt_interview_match,
ROW_NUMBER() OVER (ORDER BY subTool.DateTime DESC, subTool.StartDate DESC, subTool.EndDate, subTool.Id DESC) as seqnum
FROM Tool JOIN
Interview
ON Interview.Id = Tool.InterviewId
WHERE Tool.ToolTypeId = #ToolTypeId AND
Tool.Is_Active = 1
Tool.ToolId = #ToolId AND
Tool.CustomerId = #CustomerId AND
(Tool.InterviewId = #InterviewId OR
convert(datetime, Tool.DateTime, 120) < #ToolDateTime
)
) ti
WHERE InterviewId = #InterviewId OR
(cnt_interview_match = 0 AND seqnum = 1);
Note that the subquery requires that the columns have different names, so you might need to fiddle with that.
Then, you want an index on TOOL(ToolTypeId, Is_Active, ToolId, CustomerId, InterviewId, DateTime). I assume that Interview(Id) is already indexed as the primary key of the table.

My interpretation of what your query does (and by inference, what you want it to do) is different from #GordonLinoff's.
Gordon's queries could be paraphrased as...
If there are rows with Tool.InterviewId = #InterviewId...
Return those rows and only those rows
If there are no such rows to return...
Return the latest row for other InterviewId's
My understanding of what your query actually does is that you want both possibilities returned together, at all times.
This ambiguity is an example of why you really should include example data.
For example, this would be a re-wording or your existing query (using UNION as per your own suggestion)...
WITH
filter_tool_customer AS
(
SELECT Tool.*, Interview.*
FROM Tool
INNER JOIN Interview ON Interview.Id = Tool.InterviewId
WHERE Tool.ToolTypeId = #ToolTypeId
AND Tool.Is_Active = 1
AND Tool.ToolId = #ToolId
AND Tool.CustomerId = #CustomerId
),
matched AS
(
SELECT *
FROM filter_tool_customer
WHERE InterviewId = #InterviewId
),
mismatched AS
(
SELECT TOP 1 *
FROM filter_tool_customer
WHERE InterviewId <> #InterviewId
AND DateTime < CONVERT(VARCHAR(20), #ToolDateTime, 120)
ORDER BY DateTime DESC,
StartDate DESC,
EndDate,
Id DESC
),
combined AS
(
SELECT * FROM matched
UNION ALL
SELECT * FROM mismatched
)
SELECT * FROM combined ORDER BY StartDate, Id

How to select a single row for each unique ID

SQL novice here learning on the job, still a greenhorn. I have a problem I don't know how to overcome. Using IBM Netezza and Aginity Workbench.
My current output will try to return one row per case number based on when a task was created. It will only keep the row with the newest task. This gets me about 85% of the way there. The issue is that sometimes multiple tasks have a create day of the same day.
I would like to incorporate Task Followup Date to only keep the newest row if there are multiple rows with the same Case Number. I posted an example of what my current code outputs and what i would like it to output.
Current code
SELECT
A.PS_CASE_ID AS Case_Number
,D.CASE_TASK_TYPE_NM AS Task
,C.TASK_CRTE_TMS
,C.TASK_FLWUP_DT AS Task_Followup_Date
FROM VW_CC_CASE A
INNER JOIN VW_CASE_TASK C ON (A.CASE_ID = C.CASE_ID)
INNER JOIN VW_CASE_TASK_TYPE D ON (C.CASE_TASK_TYPE_ID = D.CASE_TASK_TYPE_ID)
INNER JOIN ADMIN.VW_RSN_CTGY B ON (A.RSN_CTGY_ID = B.RSN_CTGY_ID)
WHERE
(A.PS_Z_SPSR_ID LIKE '%EFT' OR A.PS_Z_SPSR_ID LIKE '%CRDT')
AND CAST(A.CASE_CRTE_TMS AS DATE) >= '2020-01-01'
AND B.RSN_CTGY_NM = 'Chargeback Initiation'
AND CAST(C.TASK_CRTE_TMS AS DATE) = (SELECT MAX(CAST(C2.TASK_CRTE_TMS AS DATE)) from VW_CASE_TASK C2 WHERE C2.CASE_ID = C.CASE_ID)
GROUP BY
A.PS_CASE_ID
,D.CASE_TASK_TYPE_NM
,C.TASK_CRTE_TMS
,C.TASK_FLWUP_DT
Current output
Desired output

You could use ROW_NUMBER here:
WITH cte AS (
SELECT DISTINCT A.PS_CASE_ID AS Case_Number, D.CASE_TASK_TYPE_NM AS Task,
C.TASK_CRTE_TMS, C.TASK_FLWUP_DT AS Task_Followup_Date,
ROW_NUMBER() OVER (PARTITION BY A.PS_CASE_ID ORDER BY C.TASK_FLWUP_DT DESC) rn
FROM VW_CC_CASE A
INNER JOIN VW_CASE_TASK C ON A.CASE_ID = C.CASE_ID
INNER JOIN VW_CASE_TASK_TYPE D ON C.CASE_TASK_TYPE_ID = D.CASE_TASK_TYPE_ID
INNER JOIN ADMIN.VW_RSN_CTGY B ON A.RSN_CTGY_ID = B.RSN_CTGY_ID
WHERE (A.PS_Z_SPSR_ID LIKE '%EFT' OR A.PS_Z_SPSR_ID LIKE '%CRDT') AND
CAST(A.CASE_CRTE_TMS AS DATE) >= '2020-01-01' AND
B.RSN_CTGY_NM = 'Chargeback Initiation' AND
CAST(C.TASK_CRTE_TMS AS DATE) = (SELECT MAX(CAST(C2.TASK_CRTE_TMS AS DATE))
FROM VW_CASE_TASK C2
WHERE C2.CASE_ID = C.CASE_ID)
)
SELECT
Case_Number,
Task,
TASK_CRTE_TMS,
Task_Followup_Date
FROM cte
WHERE rn = 1;

One method used window functions:
with cte as (
< your query here >
)
select x.*
from (select cte.*,
row_number() over (partition by case_number, Task_Followup_Date
order by TASK_CRTE_TMS asc
) as seqnum
from cte
) x
where seqnum = 1;

Oracle SQL Group by Clause

I would like to write a sql to get top 5 table space storage metric. Below query gives the metric about all tbspaces. Appreciate if someone fine tune this to have only top N
SELECT
ts.tablespace_name AS TBNAME,
round((ts.tablespace_size/1024/1024),2) AS SIZE_MB,
round((ts.tablespace_used_size/1024/1024),2) AS USED_MB,
round(((ts.tablespace_size - ts.tablespace_used_size)/1024/1024),2) AS FREE_MB
FROM
mgmt$db_tablespaces ts,
(SELECT d.target_guid, d.tablespace_name, count(d.file_name) df_count,
sum(decode(d.autoextensible, 'YES', 1, 0)) auto_extend
FROM mgmt$db_datafiles d, mgmt$target t
WHERE t.target_guid = '<id>' AND
(t.target_type='rac_database' OR
(t.target_type='oracle_database' AND t.TYPE_QUALIFIER3 != 'RACINST')) AND
t.target_guid = d.target_guid
GROUP BY d.target_guid, d.tablespace_name) df
WHERE
ts.target_guid = df.target_guid AND
df.tablespace_name = ts.tablespace_name
ORDER BY ts.tablespace_size;`
Thanks

You can use the ROWNUM. Oracle applies rownum to the result after it has been returned.
You need to filter the result after it has been returned, so a subquery is required. You can also use RANK() function to get Top-N results.
SELECT
*
FROM
(
SELECT
ts.tablespace_name AS TBNAME,
round((ts.tablespace_size/1024/1024),2) AS SIZE_MB,
round((ts.tablespace_used_size/1024/1024),2) AS USED_MB,
round(((ts.tablespace_size - ts.tablespace_used_size)/1024/1024),2) AS FREE_MB
FROM
mgmt$db_tablespaces ts,
(SELECT d.target_guid, d.tablespace_name, count(d.file_name) df_count,
sum(decode(d.autoextensible, 'YES', 1, 0)) auto_extend
FROM mgmt$db_datafiles d, mgmt$target t
WHERE t.target_guid = '<id>' AND
(t.target_type='rac_database' OR
(t.target_type='oracle_database' AND t.TYPE_QUALIFIER3 != 'RACINST')) AND
t.target_guid = d.target_guid
GROUP BY d.target_guid, d.tablespace_name) df
WHERE
ts.target_guid = df.target_guid AND
df.tablespace_name = ts.tablespace_name
ORDER BY ts.tablespace_size
)
WHERE ROWNUM <= 5;

Fastest way to check if the the most recent result for a patient has a certain value

Mssql < 2005
I have a complex database with lots of tables, but for now only the patient table and the measurements table matter.
What I need is the number of patient where the most recent value of 'code' matches a certain value. Also, datemeasurement has to be after '2012-04-01'. I have fixed this in two different ways:
SELECT
COUNT(P.patid)
FROM T_Patients P
WHERE P.patid IN (SELECT patid
FROM T_Measurements M WHERE (M.code ='xxxx' AND result= 'xx')
AND datemeasurement =
(SELECT MAX(datemeasurement) FROM T_Measurements
WHERE datemeasurement > '2012-01-04' AND patid = M.patid
GROUP BY patid
GROUP by patid)
AND:
SELECT
COUNT(P.patid)
FROM T_Patient P
WHERE 1 = (SELECT TOP 1 case when result = 'xx' then 1 else 0 end
FROM T_Measurements M
WHERE (M.code ='xxxx') AND datemeasurement > '2012-01-04' AND patid = P.patid
ORDER by datemeasurement DESC
)
This works just fine, but it makes the query incredibly slow because it has to join the outer table on the subquery (if you know what I mean). The query takes 10 seconds without the most recent check, and 3 minutes with the most recent check.
I'm pretty sure this can be done a lot more efficient, so please enlighten me if you will :).
I tried implementing HAVING datemeasurment=MAX(datemeasurement) but that keeps throwing errors at me.

So my approach would be to write a query just getting all the last patient results since 01-04-2012, and then filtering that for your codes and results. So something like
select
count(1)
from
T_Measurements M
inner join (
SELECT PATID, MAX(datemeasurement) as lastMeasuredDate from
T_Measurements M
where datemeasurement > '01-04-2012'
group by patID
) lastMeasurements
on lastMeasurements.lastmeasuredDate = M.datemeasurement
and lastMeasurements.PatID = M.PatID
where
M.Code = 'Xxxx' and M.result = 'XX'

The fastest way may be to use row_number():
SELECT COUNT(m.patid)
from (select m.*,
ROW_NUMBER() over (partition by patid order by datemeasurement desc) as seqnum
FROM T_Measurements m
where datemeasurement > '2012-01-04'
) m
where seqnum = 1 and code = 'XXX' and result = 'xx'
Row_number() enumerates the records for each patient, so the most recent gets a value of 1. The result is just a selection.

Joining two SELECT statements using Outer Join with multiple alias

I have a complicated select statement that when it is executed, I give it a date or date range I want and the output comes out. The problem is I don't know how to join the same SQL statement with the first Statement having 1 date range and the second statement having another data range. Example below:
When I execute the select statement, I choose for the Month of November:
EMPLID NAME Current_Gross_Hours(November)
When I execute the select statement again, I choose from January to November:
EMPLID NAME Year_To_Date_Hours(January - November)
What I want:
EMPLID NAME Current_Gross_Hours(November) Year_To_Date_Hours(January - November)
The SQL Select statement runs correctly if execute by themselves. But I don't know how to join them.
Here is the SQL code that I want to write, but I don't know how to write the SQL statement correctly. Any help or direction is greatly appreciated.
(SELECT DISTINCT
SUM("PSA"."AL_HOURS") AS "Current Gross Hours", "PSJ"."EMPLID","PSP"."NAME"
FROM
"PS_JOB" "PSJ", "PS_EMPLOYMENT" "PSE", "PS_PERSONAL_DATA" "PSP", "PS_AL_CHK_HRS_ERN" "PSA"
WHERE
((("PSA"."CHECK_DT" = TO_DATE('2011-11-01', 'YYYY-MM-DD')) AND
("PSJ"."PAYGROUP" = 'SK2') AND
(("PSJ"."EFFSEQ"= (
SELECT MAX("INNERALIAS"."EFFSEQ")
FROM "PS_JOB" INNERALIAS
WHERE "INNERALIAS"."EMPL_RCD_NBR" = "PSJ"."EMPL_RCD_NBR"
AND "INNERALIAS"."EMPLID" = "PSJ"."EMPLID"
AND "INNERALIAS"."EFFDT" = "PSJ"."EFFDT")
AND
"PSJ"."EFFDT" = (
SELECT MAX("INNERALIAS"."EFFDT")
FROM "PS_JOB" INNERALIAS
WHERE "INNERALIAS"."EMPL_RCD_NBR" = "PSJ"."EMPL_RCD_NBR"
AND "INNERALIAS"."EMPLID" = "PSJ"."EMPLID"
AND "INNERALIAS"."EFFDT" <= SYSDATE)))))
AND
("PSJ"."EMPLID" = "PSE"."EMPLID" ) AND ("PSJ"."EMPLID" = "PSP"."EMPLID" ) AND ("PSJ"."FILE_NBR" = "PSA"."FILE_NBR" ) AND ("PSJ"."PAYGROUP" = "PSA"."PAYGROUP" ) AND ("PSE"."EMPLID" = "PSP"."EMPLID" )
GROUP BY
"PSJ"."EMPLID", "PSP"."NAME"
) AS "Q1"
LEFT JOIN
(SELECT DISTINCT
SUM("PSA"."AL_HOURS") AS "YEAR_TO_DATE Gross Hours", "PSJ"."EMPLID"
FROM
"PS_JOB" "PSJ", "PS_EMPLOYMENT" "PSE", "PS_PERSONAL_DATA" "PSP", "PS_AL_CHK_HRS_ERN" "PSA"
WHERE
((("PSA"."CHECK_DT" BETWEEN TO_DATE('2011-01-01', 'YYYY-MM-DD') AND TO_DATE('2011-11-01', 'YYYY-MM-DD')) AND
("PSJ"."PAYGROUP" = 'SK2') AND
(("PSJ"."EFFSEQ"= (
SELECT MAX("INNERALIAS"."EFFSEQ")
FROM "PS_JOB" INNERALIAS
WHERE "INNERALIAS"."EMPL_RCD_NBR" = "PSJ"."EMPL_RCD_NBR"
AND "INNERALIAS"."EMPLID" = "PSJ"."EMPLID"
AND "INNERALIAS"."EFFDT" = "PSJ"."EFFDT")
AND
"PSJ"."EFFDT" = (
SELECT MAX("INNERALIAS"."EFFDT")
FROM "PS_JOB" INNERALIAS
WHERE "INNERALIAS"."EMPL_RCD_NBR" = "PSJ"."EMPL_RCD_NBR"
AND "INNERALIAS"."EMPLID" = "PSJ"."EMPLID"
AND "INNERALIAS"."EFFDT" <= SYSDATE)))))
AND
("PSJ"."EMPLID" = "PSE"."EMPLID" ) AND ("PSJ"."EMPLID" = "PSP"."EMPLID" ) AND ("PSJ"."FILE_NBR" = "PSA"."FILE_NBR" ) AND ("PSJ"."PAYGROUP" = "PSA"."PAYGROUP" ) AND ("PSE"."EMPLID" = "PSP"."EMPLID" )
GROUP BY
"PSJ"."EMPLID"
) AS "Q2"
ON "Q1"."EMPLID"="Q2"."EMPLID"
ORDER BY
"Q1"."NAME"

You are missing a SELECT ... FROM at the start. The AS keyword only works when creating column aliases, not query block aliases. Quotation marks are only necessary for case-sensitive column names - they don't look incorrect in your example but they frequently cause mistakes.
SELECT Q1.NAME, ...
FROM
(
SELECT ...
) Q1
JOIN
(
SELECT ...
) Q2
ON Q1.EMPLID=Q2.EMPLID
ORDER BY Q1.NAME

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Howto get newest dataset with SQL join? - sql

You need to use a correlated subquery, for example add the following: WHERE gk1.DATUM = (SELECT MAX(SUB.DATUM) FROM tGrundkondition SUB WHERE SUB.tLieferant_kLieferant = l.kLieferant) I am not sure this is 100% correct because I don't know your table structure, but it should give you an idea.

Related

Improve SQL query performance. UNION vs OR in this situation

How to select a single row for each unique ID

Oracle SQL Group by Clause

Fastest way to check if the the most recent result for a patient has a certain value

Joining two SELECT statements using Outer Join with multiple alias

Categories

Resources