Efficient SQL Query Design

What is generally considered the most efficient way to do this type of query?
We have a database of 10 years' worth of laboratory data, and we would like to select performance data for various tests. This query, for example, selects the number of hours it has taken to do a test, calculates an average turnaround time (TAT), and lets us plot a sparkline of average TAT per day.
Say we have 100 test names: is it acceptable in terms of performance to iterate over the test names in a loop and fire this query off once per iteration? Or is there a more efficient way?
SELECT
Date_Authorised_Index.Date_Authorised
, Result_Set.Date_Booked_In
, avg(DATEDIFF('hh',Result_Set.Date_Time_Booked_In,Result_Set.Date_Time_Authorised)) as HrsIn
, count(Date_Authorised_Index.Date_Authorised) as numbers
, Date_Authorised_Index.Registration_Number
, Date_Authorised_Index.Request_Row_ID
, Date_Authorised_Index.Specimen_Number
, Result_Set.Authorised_By
, Result_Set.Namespace
, Result_Set.Set_Code
, Result_Set.Date_Time_Authorised
, Request.Date_Time_Received
, Request.Location
FROM
Date_Authorised_Index Date_Authorised_Index
, Result_Set Result_Set
, Request
WHERE
Date_Authorised_Index.Date_Authorised = Result_Set.Date_Authorised
AND Date_Authorised_Index.Request_Row_ID = Request.Request_Row_ID
AND Date_Authorised_Index.Request_Row_ID = Result_Set.Request_Row_ID
AND (Date_Authorised_Index.Discipline='C') AND Result_Set.Set_Code=?
GROUP BY
Result_Set.Date_Booked_In

For starters, I would rewrite this query so it uses explicit join syntax.
Also, even though MySQL does not force you to restate every non-aggregate column in the GROUP BY clause, that doesn't mean leaving them out is a good thing.
Unless Result_Set.Date_Booked_In uniquely identifies a row, you are selecting arbitrary values from multiple rows for every column that is neither aggregated nor grouped.
SELECT
dai.Date_Authorised
, rs.Date_Booked_In
, avg(DATEDIFF('hh',rs.Date_Time_Booked_In,rs.Date_Time_Authorised)) as HrsIn
, count(dai.Date_Authorised) as numbers
, dai.Registration_Number
, dai.Request_Row_ID
, dai.Specimen_Number
, rs.Authorised_By
, rs.Namespace
, rs.Set_Code
, rs.Date_Time_Authorised
, r.Date_Time_Received
, r.Location
FROM
Date_Authorised_Index dai
INNER JOIN Result_Set rs ON (dai.Date_Authorised = rs.Date_Authorised
AND dai.Request_Row_ID = rs.Request_Row_ID)
INNER JOIN Request r ON (dai.Request_Row_ID = r.Request_Row_ID)
WHERE
(dai.Discipline= 'C') AND rs.Set_Code=?
GROUP BY
rs.Date_Booked_In
If you want to select all 100 set codes in one go, just make a new table containing the set codes you want and join against it.
Make sure you index the field sc.Set_Code (or, better yet, make it the primary key).
SELECT lots_of_columns
FROM table1 as dai
INNER JOIN table2 as rs ON (what you joined on before)
INNER JOIN table3 as r ON (same here)
INNER JOIN Setcodes as sc ON (sc.Set_Code = rs.Set_Code) -- extra join
WHERE
dai.discipline = 'C'
GROUP BY rs.Date_Booked_In
Or you can even use an `IN (...)` list like below, although that will probably be slower than a join.
SELECT lots_of_columns
FROM table1 as dai
INNER JOIN table2 as rs ON (what you joined on before)
INNER JOIN table3 as r ON (same here)
WHERE
dai.discipline = 'C' AND rs.Set_Code IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
GROUP BY rs.Date_Booked_In
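
As a rough sketch of the single-query idea, the following uses Python's built-in sqlite3 module with a made-up, trimmed-down Result_Set table. The sample rows and the Hours_Taken column (standing in for the DATEDIFF expression) are assumptions for illustration, not the real schema. It fetches the average turnaround for several set codes in one round trip instead of one query per loop iteration. Note that once more than one set code is selected, Set_Code has to be part of the GROUP BY, otherwise the averages of different tests get mixed together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Result_Set (
    Set_Code TEXT,
    Date_Booked_In TEXT,
    Hours_Taken REAL   -- hypothetical stand-in for the DATEDIFF expression
);
INSERT INTO Result_Set VALUES
    ('UE',  '2011-05-01', 2.0),
    ('UE',  '2011-05-01', 4.0),
    ('UE',  '2011-05-02', 6.0),
    ('LFT', '2011-05-01', 10.0),
    ('FBC', '2011-05-01', 1.0);
""")

# One placeholder per wanted set code, all bound in a single query.
wanted = ["UE", "LFT"]
placeholders = ",".join("?" for _ in wanted)
sql = f"""
SELECT Set_Code, Date_Booked_In, AVG(Hours_Taken) AS HrsIn, COUNT(*) AS numbers
FROM Result_Set
WHERE Set_Code IN ({placeholders})
GROUP BY Set_Code, Date_Booked_In
ORDER BY Set_Code, Date_Booked_In
"""
rows = conn.execute(sql, wanted).fetchall()
for row in rows:
    print(row)
```

The application can then split the result by Set_Code to draw one sparkline per test.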

Related

Error in nested SQL statement

Can someone help me fix this SQL statement? I have 2 tables... trying to get a list of all records in table 1 (c) along with a count (if any) of matching records in table 2 (cp_docs).
SELECT TOP 100 c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
cp_docs.cpd
FROM cal_procedure c
OUTTER JOIN (select cal_procedure as cp, count(id) as cpd
from cal_procedure_doc
group by cal_procedure) cp_docs
ON cp_docs.cp = c.cal_procedure
Thanks,
Tracy

Hard to say without the error message, but your outer join has a couple of issues.
OUTER is misspelled as OUTTER.
Your OUTER keyword needs to be prefixed with LEFT, RIGHT, or FULL. With the logic in your query, you most likely want LEFT.
Fixed SQL:
SELECT TOP 100 c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
cp_docs.cpd
FROM cal_procedure c
LEFT OUTER JOIN (select cal_procedure as cp, count(id) as cpd
from cal_procedure_doc
group by cal_procedure) cp_docs
ON cp_docs.cp = c.cal_procedure
Now, in your query you could get NULL values in the cpd column when a procedure has no rows in the cal_procedure_doc table. If you look at Max's answer, you would get 0s instead. If you want to keep your current approach but display zeros, wrap cp_docs.cpd in a COALESCE function:
coalesce(cp_docs.cpd, 0)
In the end, I think Max's answer is easier to read, and it is probably the way I would write this query. If the tables are huge, you may want to check how each version performs to see whether one is better than the other.
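
To make the NULL-versus-zero difference concrete, here is a small sketch using Python's sqlite3 module. The two tables are cut down to the columns that matter and the sample rows are invented; TOP 100 is left out because SQLite does not support it. It runs the LEFT OUTER JOIN form (with and without COALESCE) next to a correlated-subquery form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cal_procedure (id INTEGER PRIMARY KEY, cal_procedure TEXT);
CREATE TABLE cal_procedure_doc (id INTEGER PRIMARY KEY, cal_procedure TEXT);
INSERT INTO cal_procedure (cal_procedure) VALUES ('CP-1'), ('CP-2');
-- Only CP-1 has documents; CP-2 has none.
INSERT INTO cal_procedure_doc (cal_procedure) VALUES ('CP-1'), ('CP-1');
""")

# Derived-table join: unmatched procedures get NULL unless COALESCE is applied.
join_sql = """
SELECT c.cal_procedure, cp_docs.cpd, COALESCE(cp_docs.cpd, 0)
FROM cal_procedure c
LEFT OUTER JOIN (SELECT cal_procedure AS cp, COUNT(id) AS cpd
                 FROM cal_procedure_doc
                 GROUP BY cal_procedure) cp_docs
  ON cp_docs.cp = c.cal_procedure
ORDER BY c.cal_procedure
"""
join_rows = conn.execute(join_sql).fetchall()

# Correlated subquery: COUNT(*) over zero rows is already 0.
subquery_sql = """
SELECT c.cal_procedure,
       (SELECT COUNT(*) FROM cal_procedure_doc d
        WHERE d.cal_procedure = c.cal_procedure) AS cpd
FROM cal_procedure c
ORDER BY c.cal_procedure
"""
subquery_rows = conn.execute(subquery_sql).fetchall()

print(join_rows)
print(subquery_rows)
```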

You can just add a subquery to the SELECT clause. It's cleaner than joining a derived table. If you try to read someone else's query to figure out how a calculation is done, you'll start with the SELECT clause. If the SELECT clause points you to a table alias (e.g. cp_docs), you need to find the table in the FROM clause, and so on. The execution plans are almost identical; the proposed SELECT clause subquery actually eliminates one innocuous Compute Scalar step.
SELECT c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
(SELECT COUNT(*) FROM cal_procedure_doc where cal_procedure = c.cal_procedure) AS cpd
FROM cal_procedure c

Perhaps you want OUTER APPLY:
SELECT TOP 100 c.cal_procedure, c.description, c.active, c.create_user,
c.create_date, c.edit_user, c.edit_date, c.id, cp_docs.cpd
FROM cal_procedure c OUTER APPLY
(select count(id) as cpd
from cal_procedure_doc
where cal_procedure = c.cal_procedure
) cp_docs
ORDER BY ? ? ? ;

Matching values from two columns with the values of two columns in a sub-query

I have a problem I can't really figure out, even though I thought I had the solution.
I think this is DB2 SQL by the way.
I have a customer number and a country code (extracted from a string using SUBSTR) which I don't want to find in combination in a subquery, like so:
SELECT ku.orgnr AS customer ,
Substr(bu.bank_account_swiftadr,5,2) AS country
FROM db811.bet_utl bu
LEFT JOIN db811.henv_utl bh
ON bu.betaling_urn = bh.betaling_urn
LEFT JOIN db811.betaling_status bs
ON bu.betaling_status = bs.betaling_status
LEFT JOIN db811.kunde_orgnr ku
ON bu.kundenr = ku.kundenr
WHERE bu.kanal = 'N'
AND (ku.orgnr, Substr(bu.bank_account_swiftadr,5,2)) ;
NOT IN the results of the query below:
SELECT ku.orgnr AS customer ,
Substr(bu.bank_account_swiftadr,5,2) AS country ,
COUNT(*) AS numberof
FROM db811.bet_utl_hist bu
LEFT JOIN db811.kunde_orgnr ku
ON bu.kundenr = ku.kundenr
WHERE bu.kanal = 'N'
AND bu.betalingsdato > '2016-01-01'
GROUP BY ku.orgnr ,
substr(bu.bank_account_swiftadr,5,2);
This should work, I thought, but it seems to match on just one of them, and I need both to match in order to exclude the row with the NOT IN.
I assume I am missing something basic since I am quite new at this.
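
One way to require that both values match is a correlated NOT EXISTS that compares the two columns together. The sketch below uses Python's sqlite3 module with two invented tables (current_payments and history_payments, standing in for the two queries above), so the names and rows are assumptions rather than the real DB2 schema. NOT EXISTS also sidesteps a known trap: if the subquery of a NOT IN returns a NULL (possible here, since ku.orgnr comes from a LEFT JOIN), the NOT IN can never evaluate to true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for the two result sets in the question.
CREATE TABLE current_payments (customer TEXT, country TEXT);
CREATE TABLE history_payments (customer TEXT, country TEXT);
INSERT INTO current_payments VALUES
    ('100', 'NO'), ('100', 'SE'), ('200', 'NO'), ('300', 'DK');
INSERT INTO history_payments VALUES
    ('100', 'NO'), ('200', 'SE');
""")

# A row is excluded only when BOTH customer and country match a history row.
sql = """
SELECT cur.customer, cur.country
FROM current_payments cur
WHERE NOT EXISTS (
    SELECT 1
    FROM history_payments hist
    WHERE hist.customer = cur.customer
      AND hist.country  = cur.country
)
ORDER BY cur.customer, cur.country
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Customer 100 with country SE is kept even though customer 100 appears in the history, because that exact pair does not.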

My inner join resulted in the repetition of many rows

I wrote the following query:
SELECT
CAN.Cycle
, CAN.FECCandID
, CAN.CID
, CAN.FirstLastP
, CAN.Party
, CAN.DistIDRunFor
, CAN.DistIDCurr
, CAN.CurrCand
, CAN.CycleCand
, CAN.CRPICO
, CAN.RecipCode
, CAN.NoPacs
FROM Cands16 AS CAN
JOIN MercerRobert_Indivs AS MER
ON CAN.CID = MER.RecipID
My goal was to return every row from the Cands16 in which CID = RecipID. This was the result:
While the MER table does have multiple rows with the same value for RecipID, every CID value in the Cands16 table is unique. I do not want these duplicate rows in my query result. So what should I do? I am using SQL Server 2016 Management Studio.

Seeing that you do not use any columns from MER, it seems you just want to know whether a matching ID exists in MER. So the easiest fix would be to replace the join with an EXISTS check:
select
CAN.Cycle
, CAN.FECCandID
, CAN.CID
, CAN.FirstLastP
, CAN.Party
, CAN.DistIDRunFor
, CAN.DistIDCurr
, CAN.CurrCand
, CAN.CycleCand
, CAN.CRPICO
, CAN.RecipCode
, CAN.NoPacs
from Cands16 CAN
where exists (
select *
from MercerRobert_Indivs MER
where CAN.CID = MER.RecipID
) ;
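
The effect is easy to reproduce on a toy dataset. This sketch uses Python's sqlite3 module with cut-down versions of the two tables; the Amount column and all sample rows are invented for illustration. The plain join returns one row per matching MER row, while EXISTS returns each candidate at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cands16 (CID TEXT PRIMARY KEY, FirstLastP TEXT);
CREATE TABLE MercerRobert_Indivs (RecipID TEXT, Amount INTEGER);  -- Amount is hypothetical
INSERT INTO Cands16 VALUES
    ('N001', 'Candidate A'), ('N002', 'Candidate B'), ('N003', 'Candidate C');
-- N001 appears three times, N002 once, N003 never.
INSERT INTO MercerRobert_Indivs VALUES
    ('N001', 500), ('N001', 1000), ('N001', 250), ('N002', 2700);
""")

join_rows = conn.execute("""
    SELECT CAN.CID
    FROM Cands16 AS CAN
    JOIN MercerRobert_Indivs AS MER ON CAN.CID = MER.RecipID
    ORDER BY CAN.CID
""").fetchall()

exists_rows = conn.execute("""
    SELECT CAN.CID
    FROM Cands16 AS CAN
    WHERE EXISTS (SELECT * FROM MercerRobert_Indivs MER
                  WHERE CAN.CID = MER.RecipID)
    ORDER BY CAN.CID
""").fetchall()

print(join_rows)    # one row per matching MER row
print(exists_rows)  # one row per candidate
```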

Your example SQL uses a simple join.
SQL Server (Transact-SQL) JOINS are used to retrieve data from multiple tables. A SQL Server JOIN is performed whenever two or more tables are joined in a SQL statement.
There are 4 different types of SQL Server joins:
SQL Server INNER JOIN (or sometimes called simple join)
SQL Server LEFT OUTER JOIN (or sometimes called LEFT JOIN)
SQL Server RIGHT OUTER JOIN (or sometimes called RIGHT JOIN)
SQL Server FULL OUTER JOIN (or sometimes called FULL JOIN)
See this link for more details and explanations of the different joins you can use.

Try:
SELECT
CAN.Cycle
, CAN.FECCandID
, CAN.CID
, CAN.FirstLastP
, CAN.Party
, CAN.DistIDRunFor
, CAN.DistIDCurr
, CAN.CurrCand
, CAN.CycleCand
, CAN.CRPICO
, CAN.RecipCode
, CAN.NoPacs
FROM Cands16 AS CAN
-- the same RecipID can occur several times in MercerRobert_Indivs
-- so make sure this is not the case before joining ...
JOIN (SELECT DISTINCT RecipID FROM MercerRobert_Indivs) AS MER
ON CAN.CID = MER.RecipID

SQL: SELECT DISTINCT not returning distinct values

The code below is supposed to return unique records in the lp_num field from the subquery to then be used in the outer query, but I am still getting multiples of the lp_num field. A ReferenceNumber can have multiple ApptDate records, but each lp_num can only have 1 ref_num. That's why I tried to retrieve unique lp_num records all the way down in the subquery, but it doesn't work. I am using Report Builder 3.0.
The desired output would be to have only unique records in the lp_num field. This is because each value in the lp_num field is one single pallet. The info to the right is when it arrived (ApptDate) and what the reference number is for the delivery (ref_num). Therefore, it makes no sense for a pallet to have multiple receipt dates; it can only arrive once.
SELECT DISTINCT
dbo.ISW_LPTrans.item,
dbo.ISW_LPTrans.lot,
dbo.ISW_LPTrans.trans_type,
dbo.ISW_LPTrans.lp_num,
dbo.ISW_LPTrans.ref_num,
(MIN(CONVERT(VARCHAR(10),dbo.CW_CheckInOut.ApptDate,101))) as appt_date_only,
dbo.CW_CheckInOut.ApptTime,
dbo.item.description,
dbo.item.u_m,
dbo.ISW_LPTrans.qty,
(CASE
WHEN dbo.ISW_LPTrans.trans_type = 'F'
THEN 'Produced internally'
ELSE
(CASE
WHEN dbo.ISW_LPTrans.trans_type = 'R'
THEN 'Received from outside'
END)
END
) as original_source
FROM
dbo.ISW_LPTrans
INNER JOIN dbo.CW_Dock_Schedule ON LTRIM(RTRIM(dbo.ISW_LPTrans.ref_num)) = dbo.CW_Dock_Schedule.ReferenceNumber
INNER JOIN dbo.CW_CheckInOut ON dbo.CW_CheckInOut.TruckID = dbo.CW_Dock_Schedule.TruckID
INNER JOIN dbo.item ON dbo.item.item = dbo.ISW_LPTrans.item
WHERE
(dbo.ISW_LPTrans.trans_type = 'R') AND
--CONVERT(VARCHAR(10),dbo.CW_CheckInOut.ApptDate,101) <= CONVERT(VARCHAR(10),dbo.ISW_LPTrans.trans_date,101) AND
dbo.ISW_LPTrans.lp_num IN
(SELECT DISTINCT
dbo.ISW_LPTrans.lp_num
FROM
dbo.ISW_LPTrans
INNER JOIN dbo.item ON dbo.ISW_LPTrans.item = dbo.item.item
INNER JOIN dbo.job ON dbo.ISW_LPTrans.ref_num = dbo.job.job AND dbo.ISW_LPTrans.ref_line_suf = dbo.job.suffix
WHERE
(dbo.ISW_LPTrans.trans_type = 'W' OR dbo.ISW_LPTrans.trans_type = 'I') AND
dbo.ISW_LPTrans.ref_num IN
(SELECT
dbo.ISW_LPTrans.ref_num
FROM
dbo.ISW_LPTrans
--INNER JOIN dbo.ISW_LPTrans on dbo.ISW_LPTrans.
WHERE
dbo.ISW_LPTrans.item LIKE @item AND
dbo.ISW_LPTrans.lot LIKE @lot AND
dbo.ISW_LPTrans.trans_type = 'F'
GROUP BY
dbo.ISW_LPTrans.ref_num
) AND
dbo.ISW_LPTrans.ref_line_suf IN
(SELECT
dbo.ISW_LPTrans.ref_line_suf
FROM
dbo.ISW_LPTrans
--INNER JOIN dbo.ISW_LPTrans on dbo.ISW_LPTrans.
WHERE
dbo.ISW_LPTrans.item LIKE @item AND
dbo.ISW_LPTrans.lot LIKE @lot AND
dbo.ISW_LPTrans.trans_type = 'F'
GROUP BY
dbo.ISW_LPTrans.ref_line_suf
)
GROUP BY
dbo.ISW_LPTrans.lp_num
HAVING
SUM(dbo.ISW_LPTrans.qty) < 0
)
GROUP BY
dbo.ISW_LPTrans.item,
dbo.ISW_LPTrans.lot,
dbo.ISW_LPTrans.trans_type,
dbo.ISW_LPTrans.lp_num,
dbo.ISW_LPTrans.ref_num,
dbo.CW_CheckInOut.ApptDate,
dbo.CW_CheckInOut.ApptTime,
dbo.item.description,
dbo.item.u_m,
dbo.ISW_LPTrans.qty
ORDER BY
dbo.ISW_LPTrans.lp_num

In a nutshell: the way you use DISTINCT is logically wrong from an SQL perspective.
Your DISTINCT is in an IN subquery in the WHERE clause, and at that point in the code it has absolutely no effect (apart from a possible performance penalty). Think about it: if the outer query returns non-unique values of dbo.ISW_LPTrans.lp_num (which obviously happens), those values can still be within the distinct values of the IN subquery. The IN does not enforce a 1-to-1 match; it only enforces that the outer query values are among the inner values, and they can match multiple times. So it is definitely not DISTINCT's fault.
I would go through the following check steps:
See if there is insufficient JOIN ON condition(s) in the outer FROM section that leads to data multiplication (e.g. if a table has primary-to-foreign key relation on several columns, but you join on one of them only etc.).
Check which of the sources contains non-distinct records in the outer FROM section - then either cleanse your source, or adjust the JOIN condition and / or the WHERE clause so that you only pick distinct & correct records. In fact you might need to SELECT DISTINCT in the FROM sections - there it would make much more sense.
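
Both points can be seen on a tiny example. The sketch below uses Python's sqlite3 module with two invented tables, pallets and appointments, which loosely stand in for ISW_LPTrans and the dock/check-in tables; all names and rows are assumptions. The first query shows that DISTINCT inside IN does not stop the join from multiplying rows. The second collapses to one row per pallet by grouping on lp_num only, so MIN can actually pick the earliest date (in the original query, ApptDate and ApptTime are also in the GROUP BY, which keeps one row per appointment).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables.
CREATE TABLE pallets (lp_num TEXT, ref_num TEXT);
CREATE TABLE appointments (ref_num TEXT, appt_date TEXT);
INSERT INTO pallets VALUES ('LP1', 'R1'), ('LP2', 'R2');
-- Reference R1 has two appointment records.
INSERT INTO appointments VALUES
    ('R1', '2020-01-01'), ('R1', '2020-01-05'), ('R2', '2020-01-03');
""")

# DISTINCT in the IN subquery only filters; the join still fans out LP1.
fan_out = conn.execute("""
    SELECT p.lp_num, a.appt_date
    FROM pallets p
    JOIN appointments a ON a.ref_num = p.ref_num
    WHERE p.lp_num IN (SELECT DISTINCT lp_num FROM pallets)
    ORDER BY p.lp_num, a.appt_date
""").fetchall()

# Grouping by the pallet alone yields one row per lp_num.
one_per_pallet = conn.execute("""
    SELECT p.lp_num, MIN(a.appt_date) AS first_appt
    FROM pallets p
    JOIN appointments a ON a.ref_num = p.ref_num
    GROUP BY p.lp_num
    ORDER BY p.lp_num
""").fetchall()

print(fan_out)
print(one_per_pallet)
```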

Should I make a table and insert or union all

I have to create a report that measures the total credits received on an application. My problem is that I have 4 separate products with varying criteria in the WHERE clause, which doesn't allow me to make just one query and be done with the dataset. I was originally going to make 4 separate queries and then UNION them all together.
I don't know if I'm limiting myself because of my skill set, and I'm wondering if a UNION is the best approach here. Should I make a table instead and insert this data into it, then use that table for my report, instead of making 4 separate queries and unioning them together?
Here's a snippet of one of the queries. Each of the other three is similar but different. I'm not specifically searching for help with this code so much as looking for the concept I should apply to complete the task.
SELECT distinct
w.application_id,
w.product_id,
X.scenario_id,
X.history_id,
x.is_applied,
product.product_name,
475 as total_fee_amount,
w.status,
w.funding_status,
w.amt_requested
FROM FEE INNER JOIN
(select * from APP
where ((status = 'W') or (APP.funding_status = 'F' and APP.status = 'A'
and APP.delete_app <> 1 and amt_requested between '25000' and '150000'))) as w
ON FEE.history_id = w.history_id INNER JOIN
product ON w.product_id = product.product_id INNER JOIN
(select scenario_id, history_id, is_applied from calc_history
where is_applied = '1' AND save_type = '0' ) as X
ON FEE.history_id = X.history_id AND FEE.scenario_id = X.scenario_id
WHERE
product.product_id in ('1064','1053','1065')
I'm using SQL Server 2008
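
For reference, the UNION approach described above can be sketched as follows, using Python's sqlite3 module and a single invented app table. The second product group, its criteria, and its fee of 250 are made up purely to show the shape; only the 475 fee and the product IDs 1064 and 1053 come from the snippet. Each branch carries its own WHERE criteria, and UNION ALL simply stacks the branches without the duplicate-removal step that plain UNION performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified application table.
CREATE TABLE app (
    application_id INTEGER,
    product_id TEXT,
    status TEXT,
    amt_requested INTEGER
);
INSERT INTO app VALUES
    (1, '1064', 'W', 30000),
    (2, '1053', 'A', 200000),
    (3, '2001', 'W', 5000),
    (4, '2001', 'D', 5000);
""")

sql = """
SELECT application_id, product_id, 475 AS total_fee_amount
FROM app
WHERE product_id IN ('1064', '1053') AND status = 'W'
UNION ALL
SELECT application_id, product_id, 250 AS total_fee_amount  -- made-up fee
FROM app
WHERE product_id = '2001' AND status IN ('W', 'A')
ORDER BY application_id
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Whether this or a populated reporting table is preferable depends on factors not stated in the question, such as how often the report runs and how slow the individual branches are.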
