Leverage newer T-SQL capabilities to make a query more efficient

I have this T-SQL query written in a very basic and inefficient way:
SELECT
e.ExchangeId,
e.ExchangeName,
s.StockId,
s.StockName,
sp.StockDate,
sp.StockPrice,
DATEADD(YEAR, -3, sp.StockDate) AS YearTDate,
(SELECT spm.StockPrice FROM dbo.StockPrices spm WITH(NOLOCK) WHERE (spm.StockId = sp.StockId) AND (spm.StockDate = DATEADD(YEAR, -3, sp.StockDate))) AS YearTPrice,
(SELECT TOP 1 spm.StockDate FROM dbo.StockPrices spm WITH(NOLOCK) WHERE (spm.StockId = sp.StockId) AND (spm.StockDate = DATEADD(YEAR, -3, sp.StockDate)) ORDER BY spm.StockPriceId) AS LatestDate,
(SELECT TOP 1 spm.StockPrice FROM dbo.StockPrices spm WITH(NOLOCK) WHERE (spm.StockId = sp.StockId) AND (spm.StockDate = DATEADD(YEAR, -3, sp.StockDate)) ORDER BY spm.StockPriceId) AS LatestPrice,
((SELECT TOP 1 spm.StockPrice FROM dbo.StockPrices spm WITH(NOLOCK) WHERE (spm.StockId = sp.StockId) AND (spm.StockDate = DATEADD(YEAR, -3, sp.StockDate)) ORDER BY spm.StockPriceId) - sp.StockPrice) AS PL,
CASE WHEN sp.StockPrice < (SELECT MIN(spm.StockPrice) FROM dbo.StockPrices spm WITH(NOLOCK) WHERE (spm.StockId = sp.StockId) AND (spm.StockDate BETWEEN DATEADD(YEAR, -3, sp.StockDate) AND DATEADD(DAY, -1, sp.StockDate))) THEN 'Opportunity' ELSE 'None' END AS [Status]
FROM dbo.StockPrices sp WITH(NOLOCK)
INNER JOIN dbo.Stocks s WITH(NOLOCK)
ON s.StockId = sp.StockId
INNER JOIN dbo.Exchanges e WITH(NOLOCK)
ON e.ExchangeId = s.ExchangeId
GO
How can I rewrite this query to be more efficient, for example by using the WITH keyword or some other feature I might not be aware of?

I would start by moving your 5 subqueries into a single query. Of the 4 subqueries that look up the row from three years earlier, all but one use TOP (1) ordered by StockPriceId, so I'm going to assume the same should apply to YearTPrice (which currently has no TOP or ORDER BY, and would raise an error if more than one row matched).
For the MIN value, I use a windowed MIN instead.
I also remove the NOLOCK hints, as they are clearly being abused. If you "must" (you don't) have the NOLOCK hint against every table in the query, then change the isolation level of the transaction instead.
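As a rough sketch of that last point, changing the isolation level for the session could look like the following. READ UNCOMMITTED carries the same dirty-read risks as NOLOCK, so this only illustrates the mechanism; it is not a recommendation to use it.

```sql
-- Applies to every table read in this session, replacing per-table NOLOCK hints.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- ... run the query here, without any WITH(NOLOCK) hints ...

-- Restore the SQL Server default afterwards.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```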
SELECT e.ExchangeId,
e.ExchangeName,
s.StockId,
s.StockName,
sp.StockDate,
sp.StockPrice,
DATEADD(YEAR, -3, sp.StockDate) AS YearTDate,
spm.StockPrice AS YearTPrice,
spm.StockDate AS LatestDate,
spm.StockPrice AS LatestPrice,
spm.StockPrice - sp.StockPrice AS PL,
CASE WHEN sp.StockPrice < spm.MinPrice THEN 'Opportunity' ELSE 'None' END AS [Status]
FROM dbo.StockPrices sp
INNER JOIN dbo.Stocks s ON s.StockId = sp.StockId
INNER JOIN dbo.Exchanges e ON e.ExchangeId = s.ExchangeId
--I use an outer apply, as I don't know if a row is guaranteed to be returned
OUTER APPLY (SELECT TOP (1)
dt.StockPrice,
dt.StockDate,
dt.MinPrice
FROM (SELECT ca.StockPriceId,
ca.StockPrice,
ca.StockDate,
MIN(StockPrice) OVER (PARTITION BY ca.StockId) AS MinPrice
FROM dbo.StockPrices ca
WHERE ca.StockId = sp.StockId
AND ca.StockDate = DATEADD(YEAR, -3, sp.StockDate)) dt --DATEADD is applied to the outer sp.StockDate, not to ca.StockDate, so a suitable index on ca can still be used for a seek here
ORDER BY dt.StockPriceId) spm;
Of course, this is all completely untested as no sample data exists, so I have no way of knowing how much this will change your query's performance (or even whether it affects the results), but it does reduce 5 or 6 scans of StockPrices down to 1 or 2.
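One thing to watch: the windowed MIN above only sees rows for the single date three years earlier, whereas the original query compared against the minimum over the whole range from three years back to the previous day. A minimal sketch that keeps the original range, using a second OUTER APPLY (the alias mp is made up for illustration, and this is equally untested against real data):

```sql
SELECT e.ExchangeId,
       e.ExchangeName,
       s.StockId,
       s.StockName,
       sp.StockDate,
       sp.StockPrice,
       DATEADD(YEAR, -3, sp.StockDate) AS YearTDate,
       spm.StockPrice AS YearTPrice,
       spm.StockDate AS LatestDate,
       spm.StockPrice AS LatestPrice,
       spm.StockPrice - sp.StockPrice AS PL,
       CASE WHEN sp.StockPrice < mp.MinPrice THEN 'Opportunity' ELSE 'None' END AS [Status]
FROM dbo.StockPrices sp
     INNER JOIN dbo.Stocks s ON s.StockId = sp.StockId
     INNER JOIN dbo.Exchanges e ON e.ExchangeId = s.ExchangeId
     -- Row from exactly three years earlier (lowest StockPriceId wins, as in the original TOP 1)
     OUTER APPLY (SELECT TOP (1)
                         ca.StockPrice,
                         ca.StockDate
                  FROM dbo.StockPrices ca
                  WHERE ca.StockId = sp.StockId
                    AND ca.StockDate = DATEADD(YEAR, -3, sp.StockDate)
                  ORDER BY ca.StockPriceId) spm
     -- Minimum price over the same range the original CASE expression used
     OUTER APPLY (SELECT MIN(r.StockPrice) AS MinPrice
                  FROM dbo.StockPrices r
                  WHERE r.StockId = sp.StockId
                    AND r.StockDate BETWEEN DATEADD(YEAR, -3, sp.StockDate)
                                        AND DATEADD(DAY, -1, sp.StockDate)) mp;
```

Both APPLY blocks filter on StockId and StockDate, so an index leading on those two columns (an assumption about your schema) would probably help each of them.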

Related

Joining two tables on columns that don't equal

I have an equipment table and a downtime table that I want to join so that I can display all the equipment and its downtime hours. If there is no downtime for a certain piece of equipment, I want to display a zero instead of NULL. This is what I have below. It only gives me the equipment that has downtime in the other table.
Select a.EquipNbr,
ISNULL(Sum(a.Downtime),0)
From MobileDowntime (nolock) a
Join MblEquip (nolock) b on a.EquipNbr = b.EquipNbr
Where b.DelFlg = 0 and
b.EquipNbr <> 'Clean Shop' and
a.DateTm Between DATEADD(month, DATEDIFF(month, 0, getDate()), 0) and DATEADD(month, DATEDIFF(month, -1, getDate()), -1)
Group By a.EquipNbr
Order by a.EquipNbr Asc
Here is an example of what I am trying to accomplish. The downtime table only captures data on change, so there might not be any downtime for a piece of equipment for the whole month.
66 total pieces of equipment
Equipment / Downtime
1717 57
1723 0
1724 0
1725 50
1728 0
1734 35
1738 0
You want a left join and to move conditions on the MobileDowntime table to the on clause:
Select e.EquipNbr, coalesce(sum(md.Downtime), 0)
From MblEquip e left join
MobileDowntime md
on md.EquipNbr = e.EquipNbr and
md.DateTm between DATEADD(month, DATEDIFF(month, 0, getDate()), 0) and DATEADD(month, DATEDIFF(month, -1, getDate()), -1)
where e.DelFlg = 0 and e.EquipNbr <> 'Clean Shop'
group by e.EquipNbr
order by e.EquipNbr Asc;
Note that I replaced your table aliases (hopefully correctly). a and b are meaningless. Instead, I used abbreviations for the table names.
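To see why the date condition belongs in the ON clause, here is a small self-contained sketch using table variables. The sample rows are invented, and the half-open date range is my own choice rather than the BETWEEN used above.

```sql
-- Invented sample data: 1723 has downtime, but only outside the current month.
DECLARE @MblEquip TABLE (EquipNbr varchar(20) PRIMARY KEY, DelFlg bit);
DECLARE @MobileDowntime TABLE (EquipNbr varchar(20), DateTm datetime, Downtime int);

INSERT INTO @MblEquip VALUES ('1717', 0), ('1723', 0);
INSERT INTO @MobileDowntime VALUES
    ('1717', GETDATE(), 57),
    ('1723', DATEADD(month, -2, GETDATE()), 40);

DECLARE @MonthStart datetime = DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0);
DECLARE @NextMonthStart datetime = DATEADD(month, 1, @MonthStart);

-- Date filter in ON: every piece of equipment is kept; 1723 shows 0.
SELECT e.EquipNbr, COALESCE(SUM(md.Downtime), 0) AS Downtime
FROM @MblEquip e
     LEFT JOIN @MobileDowntime md
            ON md.EquipNbr = e.EquipNbr
           AND md.DateTm >= @MonthStart
           AND md.DateTm < @NextMonthStart
WHERE e.DelFlg = 0
GROUP BY e.EquipNbr
ORDER BY e.EquipNbr;

-- Date filter in WHERE: the outer join is effectively undone and 1723 disappears.
SELECT e.EquipNbr, COALESCE(SUM(md.Downtime), 0) AS Downtime
FROM @MblEquip e
     LEFT JOIN @MobileDowntime md
            ON md.EquipNbr = e.EquipNbr
WHERE e.DelFlg = 0
  AND md.DateTm >= @MonthStart
  AND md.DateTm < @NextMonthStart
GROUP BY e.EquipNbr
ORDER BY e.EquipNbr;
```

Adding `md.DateTm IS NULL OR ...` to the WHERE clause does not rescue the second form: 1723 still has a joined row with a non-NULL date outside the month, so it is filtered out.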
Final Answer
Select b.EquipNbr, Sum(ISNULL((a.Downtime),0)) From MobileDowntime (nolock) a
RIGHT OUTER Join MblEquip (nolock) b on a.EquipNbr = b.EquipNbr
Where b.DelFlg = 0 and b.EquipNbr != 'Clean Shop'
AND
(
a.datetm is null or
(a.DateTm Between DATEADD(month, DATEDIFF(month, 0, getDate()), 0)
and DATEADD(month, DATEDIFF(month, -1, getDate()), -1) )
)
Group By b.EquipNbr Order by b.EquipNbr Asc
Fiddle: https://dbfiddle.uk/?rdbms=sqlserver_2012&fiddle=cc2c2cce139cda7d7c5878d6c967da34
Step by Step
Step 1:
What you need to do is to use an outer-join, and a function that replaces NULL with zero (that you are doing).
So as a first step you would do the following:
Select b.EquipNbr, ISNULL((a.Downtime),0) From MobileDowntime (nolock) a
RIGHT OUTER Join MblEquip (nolock) b on a.EquipNbr = b.EquipNbr
Step 2: With Group by
Next, you can add the GROUP BY to get the following:
Select b.EquipNbr, Sum(ISNULL((a.Downtime),0)) From MobileDowntime (nolock) a
RIGHT OUTER Join MblEquip (nolock) b on a.EquipNbr = b.EquipNbr
Where b.DelFlg = 0 and b.EquipNbr != 'Clean Shop'
Group By b.EquipNbr Order by b.EquipNbr Asc
The final part is the where condition using the dates.
Update
The conversion error I think was because of the numerical comparison != .
I did an experiment and converted the Varchar to Int.
Then I changed the != to not like.
Select b.EquipNbr, Sum(ISNULL((a.Downtime),0)) From MobileDowntime (nolock) a
RIGHT OUTER Join MblEquip (nolock) b on a.EquipNbr = b.EquipNbr
Where b.DelFlg = 0 and b.EquipNbr not like 'Clean Shop'
AND
(
a.datetm is null or
(a.DateTm Between DATEADD(month, DATEDIFF(month, 0, getDate()), 0)
and DATEADD(month, DATEDIFF(month, -1, getDate()), -1) )
)
Group By b.EquipNbr Order by b.EquipNbr Asc
You can use a LEFT OUTER JOIN, which returns NULL when a piece of equipment has no downtime hours.

I need help improving my SQL query for pulling a recent document count

This is just a portion of the query, but it seems to be the bottleneck:
SELECT CAST (CASE WHEN EXISTS
(SELECT 1
FROM dbo.CBDocument
WHERE (FirmId = R.FirmId) AND
(ContributionDate > DATEADD(m, -3, GETDATE())) AND
((EntityTypeId = 2600 AND EntityId = P.IProductId) OR
(EntityTypeId = 2500 AND EntityId = M.IManagerId)))
THEN 1 ELSE 0 END AS BIT) AS HasRecentDocuments
FROM dbo.CBIProduct P
JOIN dbo.CBIManager M ON P.IManagerId = M.IManagerId
JOIN dbo.CBIProductRating R ON P.IProductId = R.IProductId
JOIN dbo.CBIProductFirmDetail D ON (D.IProductId = P.IProductId) AND
(R.FirmId = D.FirmId)
CROSS APPLY (SELECT TOP 1 RatingDate, IProductRatingId, FirmId
FROM dbo.CBIProductRating
WHERE (IProductId = P.IProductId) AND (FirmId = R.FirmId)
ORDER BY RatingDate DESC) AS RD
WHERE (R.IProductRatingId = RD.IProductRatingId) AND (R.FirmId = RD.FirmId)
There are a lot of other columns that I typically pull back that need the CROSS APPLY and the other joins. The bit I need to optimize is the sub-query in the case statement. This subquery takes over 3 minutes to return 119k records. I know enough about SQL to get this far, but there has to be a way to make this more efficient.
The gist of the query is just to return a flag if the associated product has any documents that have been added to the system within the last 3 months.
Edit: My DB is hosted in Azure and the database tuning advisor won't connect to it. There is a tuning advisor component in Azure, but it's not suggesting anything. There must be a better approach to the query.
Edit: In an attempt to further simplify and determine the culprit, I whittled it down to this query: (Rather than determine if a recent doc exists, it just counts recent docs.)
SELECT D.FirmId, P.IProductId
,(SELECT COUNT(DocumentId) FROM dbo.CBDocument WHERE
(FirmId = D.FirmId) AND
(ContributionDate > DATEADD(m, -3, GETDATE())) AND
((EntityTypeId = 2600 AND EntityId = P.IProductId) OR
(EntityTypeId = 2500 AND EntityId = M.IManagerId))) AS RecentDocCount
FROM dbo.CBIProduct P
FULL JOIN dbo.CBIProductFirmDetail D ON D.IProductId = P.IProductId
JOIN dbo.CBIManager M ON M.IManagerId = P.IManagerId
That runs in 3 minutes, 53 seconds.
If I declare a variable to store the date (DECLARE @Today DATE = GETDATE())
and put the variable in place of GETDATE() in the query (DATEADD(m, -3, @Today)), it runs in 12 seconds.
Is there a known performance issue with GETDATE()? As far as I know, I can't use the variable in a view definition.
Does this shine any light on anything that could point to a solution? I suppose I could turn the whole thing into a stored procedure, but then I also have to adjust the application code.
Thanks.
This is the query that you claim needs optimization:
SELECT CAST(CASE WHEN EXISTS (SELECT 1
FROM dbo.CBDocument d
WHERE (d.FirmId = R.FirmId) AND
(d.ContributionDate > DATEADD(m, -3, GETDATE())) AND
((d.EntityTypeId = 2600 AND d.EntityId = P.IProductId) OR
(d.EntityTypeId = 2500 AND d.EntityId = M.IManagerId)
)
)
. . .
I'll trust your judgement. I think phrasing the query like this gives you more paths to optimization:
SELECT CAST(CASE WHEN EXISTS (SELECT 1
FROM dbo.CBDocument d
WHERE d.FirmId = R.FirmId AND
d.ContributionDate > DATEADD(m, -3, GETDATE()) AND
d.EntityTypeId = 2600 AND d.EntityId = P.IProductId
) OR
EXISTS (SELECT 1
FROM dbo.CBDocument d
WHERE d.FirmId = R.FirmId AND
d.ContributionDate > DATEADD(m, -3, GETDATE()) AND
d.EntityTypeId = 2500 AND d.EntityId = M.IManagerId
)
. . .
Then you want an index on CBDocument(FirmId, EntityTypeId, EntityId, ContributionDate).
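Spelled out as DDL, that index might look like this. The index name is hypothetical; only the table and the column order come from the suggestion above.

```sql
-- Equality columns first (FirmId, EntityTypeId, EntityId), the range column (ContributionDate) last.
CREATE NONCLUSTERED INDEX IX_CBDocument_Firm_EntityType_Entity_Date
    ON dbo.CBDocument (FirmId, EntityTypeId, EntityId, ContributionDate);
```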
Operations such as correlated subqueries and full outer joins are rather expensive, and I would suggest looking for alternatives to them. Whilst I am not familiar with your data model or data, I suggest changing the "from table" to CBIProductFirmDetail. I have further assumed that it can be inner joined to the product table, with the manager table then inner joined to the product table. If that join sequence is correct, this removes the expense of some outer joins.
In place of the correlated subquery to determine a count, I suggest you treat that as a subquery which is left joined.
SELECT
d.FirmId
, p.IProductId
, COALESCE(Docs.RecentDocCount,0) RecentDocCount
FROM dbo.CBIProductFirmDetail d
JOIN dbo.CBIProduct p ON d.IProductId = p.IProductId
JOIN dbo.CBIManager m ON p.IManagerId = m.IManagerId
LEFT JOIN (
SELECT
FirmId
, EntityId
, EntityTypeId
, COUNT(DocumentId) recentdoccount
FROM dbo.CBDocument
WHERE ContributionDate > DATEADD(m, -3, GETDATE())
AND EntityTypeId IN (2500,2600)
GROUP BY
FirmId
, EntityId
, EntityTypeId
) AS docs ON d.FirmId = docs.FirmId
AND (
(docs.EntityTypeId = 2600 AND docs.EntityId = p.IProductId)
OR (docs.EntityTypeId = 2500 AND docs.EntityId = m.IManagerId)
)
;
There might be benefit in dividing that subquery too to avoid the awkward OR in that join, so:
SELECT
d.FirmId
, p.IProductId
, COALESCE(d2500.DocCount,0) + COALESCE(d2600.DocCount,0) RecentDocCount
FROM dbo.CBIProductFirmDetail d
JOIN dbo.CBIProduct p ON d.IProductId = p.IProductId
JOIN dbo.CBIManager m ON p.IManagerId = m.IManagerId
LEFT JOIN (
SELECT
FirmId
, EntityId
, COUNT(DocumentId) doccount
FROM dbo.CBDocument
WHERE ContributionDate > DATEADD(m, -3, GETDATE())
AND EntityTypeId = 2500
GROUP BY
FirmId
, EntityId
) AS d2500 ON d.FirmId = d2500.FirmId
AND m.IManagerId = d2500.EntityId
LEFT JOIN (
SELECT
FirmId
, EntityId
, COUNT(DocumentId) doccount
FROM dbo.CBDocument
WHERE ContributionDate > DATEADD(m, -3, GETDATE())
AND EntityTypeId = 2600
GROUP BY
FirmId
, EntityId
) AS d2600 ON d.FirmId = d2600.FirmId
AND p.IProductId = d2600.EntityId
;
Depending on your data and indexes, it may be faster to use a left join:
SELECT CAST(CASE when x.FirmId is not null THEN 1 ELSE 0 END AS BIT) AS HasRecentDocuments
FROM dbo.CBIProduct P
JOIN dbo.CBIManager M ON P.IManagerId = M.IManagerId
JOIN dbo.CBIProductRating R ON P.IProductId = R.IProductId
JOIN dbo.CBIProductFirmDetail D ON (D.IProductId = P.IProductId) AND (R.FirmId = D.FirmId)
LEFT JOIN dbo.CBDocument x ON x.FirmId = R.FirmId
AND x.ContributionDate > DATEADD(m, -3, GETDATE())
AND ( (x.EntityTypeId = 2600 AND x.EntityId = P.IProductId)
OR (x.EntityTypeId = 2500 AND x.EntityId = M.IManagerId))
CROSS APPLY (SELECT TOP 1 RatingDate, IProductRatingId, FirmId
FROM dbo.CBIProductRating
WHERE (IProductId = P.IProductId) AND (FirmId = R.FirmId)
ORDER BY RatingDate DESC) AS RD
WHERE (R.IProductRatingId = RD.IProductRatingId) AND (R.FirmId = RD.FirmId)
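It certainly looks simpler. Note, however, that the LEFT JOIN returns one row per matching document, so a product with several recent documents would appear several times unless the result is de-duplicated (for example with DISTINCT or GROUP BY).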

SQL statement causing timeouts

Here is my code that I really need to get revised. I did it in a simple way as I am not a pro in SQL.
SELECT Inv1.AutoIndex, Inv1.DocState, Inv1.OrderNum,
Inv1.ExtOrderNum, dbo.Client.ubARSMS, dbo.Client.Fax1
FROM dbo.InvNum Inv1 INNER JOIN
dbo.Client ON Inv1.AccountID = dbo.Client.DCLink
WHERE (dbo.Client.ubARSMS = 1)
AND (Inv1.OrderDate >= dbo.Client.udARSMSACTDATE)
AND Inv1.OrderNum NOT IN (SELECT o.OrderNum
FROM dbo.net_OrderSMSLog o
WHERE (o.DocState = 4))
AND Inv1.AutoIndex NOT IN(SELECT Inv2.OrigDocID
FROM dbo.InvNum Inv2
WHERE Inv2.OrderNum = Inv1.OrderNum)
AND
(
DATEPART(YEAR, Inv1.InvDate) = DATEPART(YEAR, GETDATE())
AND DATEPART(MONTH, Inv1.InvDate) = DATEPART(MONTH, GETDATE())
AND DATEPART(DAY, Inv1.InvDate) = DATEPART(DAY, GETDATE())
OR
DATEPART(YEAR, Inv1.InvDate) = DATEPART(YEAR,DATEADD(dd,-1,GETDATE()))
AND DATEPART(MONTH, Inv1.InvDate) = DATEPART(MONTH,DATEADD(dd,-1,GETDATE()))
AND DATEPART(DAY, Inv1.InvDate) = DATEPART(DAY,DATEADD(dd,-1,GETDATE()))
)
I need this to work as fast as possible.
This is your query:
SELECT Inv1.AutoIndex, Inv1.DocState, Inv1.OrderNum,
Inv1.ExtOrderNum, c.ubARSMS, c.Fax1
FROM dbo.InvNum Inv1 INNER JOIN
dbo.Client c
ON Inv1.AccountID = c.DCLink AND Inv1.OrderDate >= c.udARSMSACTDATE
WHERE (c.ubARSMS = 1) AND
Inv1.OrderNum NOT IN (SELECT o.OrderNum
FROM dbo.net_OrderSMSLog o
WHERE (o.DocState = 4)
) AND
Inv1.AutoIndex NOT IN (SELECT Inv2.OrigDocID
FROM dbo.InvNum Inv2
WHERE Inv2.OrderNum = Inv1.OrderNum
) OR
(Inv1.InvDate >= CAST(DATEADD(day, -1, GETDATE()) as date) AND
Inv1.InvDate < CAST(DATEADD(day, 1, GETDATE()) as date)
)
This is really two queries, which you can combine using UNION ALL. The first is:
SELECT Inv1.AutoIndex, Inv1.DocState, Inv1.OrderNum,
Inv1.ExtOrderNum, c.ubARSMS, c.Fax1
FROM dbo.InvNum Inv1 INNER JOIN
dbo.Client c
ON Inv1.AccountID = c.DCLink AND Inv1.OrderDate >= c.udARSMSACTDATE
WHERE (c.ubARSMS = 1) AND
Inv1.OrderNum NOT IN (SELECT o.OrderNum
FROM dbo.net_OrderSMSLog o
WHERE (o.DocState = 4)
) AND
Inv1.AutoIndex NOT IN (SELECT Inv2.OrigDocID
FROM dbo.InvNum Inv2
WHERE Inv2.OrderNum = Inv1.OrderNum
)
For this, I would suggest indexes on Client(ubARSMS, DCLink, udARSMSACTDATE), InvNum(AccountID, OrderNum, AutoIndex), InvNum(OrderNum, OrigDocID), and net_OrderSMSLog(DocState, OrderNum).
For the second query:
SELECT Inv1.AutoIndex, Inv1.DocState, Inv1.OrderNum,
Inv1.ExtOrderNum, c.ubARSMS, c.Fax1
FROM dbo.InvNum Inv1 INNER JOIN
dbo.Client c
ON Inv1.AccountID = c.DCLink AND Inv1.OrderDate >= c.udARSMSACTDATE
WHERE (Inv1.InvDate >= CAST(DATEADD(day, -1, GETDATE()) as date) AND
Inv1.InvDate < CAST(DATEADD(day, 1, GETDATE()) as date)
)
You want an index on InvNum(InvDate, AccountID, OrderDate) and Client(DCLink, udARSMSACTDATE).
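Note that in the question as posted, the date test is wrapped in parentheses and ANDed with the other conditions, so it can also be read as a single query restricted to invoices dated today or yesterday. A minimal sketch under that reading, assuming InvDate is a date or datetime column (the variable names are made up):

```sql
-- Half-open range covering yesterday and today; no functions are applied to Inv1.InvDate.
DECLARE @FromDate date = DATEADD(day, -1, CAST(GETDATE() AS date));
DECLARE @ToDate   date = DATEADD(day,  1, CAST(GETDATE() AS date));

SELECT Inv1.AutoIndex, Inv1.DocState, Inv1.OrderNum,
       Inv1.ExtOrderNum, c.ubARSMS, c.Fax1
FROM dbo.InvNum Inv1
     INNER JOIN dbo.Client c
             ON Inv1.AccountID = c.DCLink
WHERE c.ubARSMS = 1
  AND Inv1.OrderDate >= c.udARSMSACTDATE
  AND Inv1.InvDate >= @FromDate
  AND Inv1.InvDate <  @ToDate
  -- NOT EXISTS in place of NOT IN
  AND NOT EXISTS (SELECT 1
                  FROM dbo.net_OrderSMSLog o
                  WHERE o.DocState = 4
                    AND o.OrderNum = Inv1.OrderNum)
  AND NOT EXISTS (SELECT 1
                  FROM dbo.InvNum Inv2
                  WHERE Inv2.OrderNum = Inv1.OrderNum
                    AND Inv2.OrigDocID = Inv1.AutoIndex);
```

NOT EXISTS and NOT IN behave differently when the subquery column can contain NULLs (NOT IN then returns no rows at all), so check that against your data before swapping one for the other.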
My query above was working when the load on the server was minimal and there were fewer records to pull.
The timeout was the only thing I needed to handle, without having to create indexes on the client's main database.
So I simply created a second Windows service that checks whether the first Windows service has crashed or stopped due to a timeout and, if so, restarts it.
That solved my issue of the timeout crashing my service.

If field = Null then use a different field

I have written some SQL code to display all clients whose offers are about to expire in the next 90 days by using the DateOffered field. However, there is another field in the database called OfferExpirydate. I would use this field, but it is not always filled out.
My question: I want the code to look at OfferExpirydate and, if it has a value, use it; otherwise it should use the DateOffered field as my code below does.
(If OfferExpirydate is not filled out, it is set to NULL.)
Any help on this would be great, thanks.
SELECT
DateOffered,
pr.ClientID,
pr.id AS profileID,
cf.Clntnme,
pm.Lender,
ABS(DATEDIFF(DAY, DateOffered, DATEADD(d,-90, GETDATE()))) AS 'NoOfDays'
FROM tbl_profile AS pr
INNER JOIN tbl_Profile_Mortgage AS pm
ON pr.id = pm.fk_profileID
INNER JOIN dbo.tbl_ClientFile AS cf
ON pr.ClientID = cf.ClientID
WHERE
DateCompleted IS NULL AND
DateOffered > DATEADD(d,-90, GETDATE())
AND DATEDIFF(DAY, DateOffered, DATEADD(d,-90, GETDATE())) > -15
ORDER BY DateOffered ASC
COALESCE(col1, col2, ...)
will pick the first non-null value.
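A small self-contained sketch of how that could apply here. The table variable and its rows are invented, and the fallback of DateOffered plus 90 days is an assumption about how your offers expire, not something stated in the question.

```sql
-- Invented sample data: client 2 has no explicit expiry date.
DECLARE @Offers TABLE (ClientID int, DateOffered date, OfferExpirydate date NULL);

INSERT INTO @Offers VALUES
    (1, '2024-01-10', '2024-03-01'),
    (2, '2024-01-10', NULL);

-- COALESCE returns OfferExpirydate when it is set,
-- otherwise the assumed expiry of DateOffered + 90 days.
SELECT ClientID,
       COALESCE(OfferExpirydate, DATEADD(DAY, 90, DateOffered)) AS EffectiveExpiry
FROM @Offers;
```

Client 1 keeps its explicit expiry date, while client 2 falls back to the date derived from DateOffered. The same COALESCE expression can then be used in the WHERE clause in place of the separate date checks.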
Try this:
SELECT DateOffered,
pr.ClientID,
pr.id AS profileID,
cf.Clntnme,
pm.Lender,
ABS(DATEDIFF(DAY, DateOffered, DATEADD(d,-90, GETDATE()))) AS 'NoOfDays'
FROM tbl_profile AS pr
INNER JOIN tbl_Profile_Mortgage AS pm
ON pr.id = pm.fk_profileID
INNER JOIN dbo.tbl_ClientFile AS cf
ON pr.ClientID = cf.ClientID
WHERE DateCompleted IS NULL AND
1 = CASE WHEN OfferExpirydate IS NOT NULL AND DATEDIFF(DAY, OfferExpirydate, GETDATE()) > -15 THEN 1
WHEN DateOffered > DATEADD(d,-90, GETDATE()) AND DATEDIFF(DAY, DateOffered, DATEADD(d,-90, GETDATE())) > -15 THEN 1
ELSE 0
END
ORDER BY DateOffered ASC

How do I correctly join four tables in MSsql on one column?

I'm trying to join four separate queries on "PROD_CD" to return the correct output into one query to prevent having to merge the queries together in another language after. With the current one (and I've tried many variations, all with various problems), I'm receiving a lot of duplicate results and different numbers of duplicates for each.
Here's the current query I've been trying (all the date functions are for determining dataset over a span of time - the database is very old and uses Clarion time):
$query_ats = "SELECT
plog.prod_cd as prod_id,
ord_log.ORDER_QTY as total_so,
ediordlg.ORDER_QTY as total_edi_so,
inv_data.IN_STOCK as in_stock
FROM plog
INNER JOIN ord_log
ON plog.prod_cd = ord_log.prod_cd
INNER JOIN ediordlg
ON plog.prod_cd = ediordlg.prod_cd AND ord_log.prod_cd = ediordlg.prod_cd
INNER JOIN inv_data
ON plog.prod_cd = inv_data.prod_cd AND ord_log.prod_cd = inv_data.prod_cd AND ediordlg.prod_cd = inv_data.prod_cd
WHERE
inv_data.CLASS_CD = 'ALG7'
AND
dateadd(day, plog.EST_DT, '18001228') BETWEEN getdate() and dateadd(day, $x, getdate())
AND
dateadd(day, ord_log.SHIP_DT, '18001228') BETWEEN getdate() and dateadd(day, $x, getdate())
AND
dateadd(day, ediordlg.SHIP_DT, '18001228') BETWEEN getdate() and dateadd(day, $x, getdate())
GROUP BY plog.prod_cd, plog.log_qty, ord_log.ORDER_QTY, ediordlg.ORDER_QTY, inv_data.IN_STOCK
ORDER BY plog.prod_cd ASC";
And this is a sample of what it outputs:
Array
(
[prod_id] => ALG-809
[total_so] => 4
[total_edi_so] => 46
[in_stock] => 0
)
Array
(
[prod_id] => ALG-809
[total_so] => 6
[total_edi_so] => 46
[in_stock] => 0
)
Array
(
[prod_id] => ALG-809
[total_so] => 7
[total_edi_so] => 46
[in_stock] => 0
)
Here are the four separate queries that return the correct results:
$query_stock = "SELECT
prod_cd,
inv_data.DESCRIP,
inv_data.IN_STOCK
from
inv_data
where
inv_data.CLASS_CD = 'ALG7'
ORDER BY
inv_data.prod_cd ASC";
$query_po = "SELECT
plog.prod_cd,
SUM(plog.log_qty) as total_po
FROM
plog JOIN inv_data ON plog.prod_cd = inv_data.prod_cd
WHERE
inv_data.CLASS_CD = 'ALG7'
AND
dateadd(day, EST_DT, '18001228') BETWEEN getdate() and dateadd(day, $x, getdate())
GROUP BY
plog.prod_cd
ORDER BY
plog.prod_cd ASC";
$query_so = "SELECT
ord_log.prod_cd,
SUM(ord_log.ORDER_QTY) as total_so
FROM
ord_log JOIN inv_data ON ord_log.prod_cd = inv_data.prod_cd
WHERE
inv_data.CLASS_CD = 'ALG7'
AND
dateadd(day, SHIP_DT, '18001228') BETWEEN getdate() and dateadd(day, $x, getdate())
GROUP BY
ord_log.PROD_CD
ORDER BY
ord_log.prod_cd ASC";
$query_edi = "SELECT
ediordlg.prod_cd,
SUM(ediordlg.ORDER_QTY) as total_so_EDI
FROM
ediordlg JOIN inv_data ON ediordlg.prod_cd = inv_data.prod_cd
WHERE
inv_data.CLASS_CD = 'ALG7'
AND
dateadd(day, SHIP_DT, '18001228') BETWEEN getdate() and dateadd(day, $x, getdate())
GROUP BY
ediordlg.PROD_CD
ORDER BY
ediordlg.prod_cd ASC";
I'm sure it's the JOIN I'm using but I can't figure it out for the life of me. Any suggestions? Thanks!
Rule 1 of SQL in cases like this: Select initially from the thing that you want rows out of, join to the things you want more information about. In your case what you want seems to be the product ID and this seems to be uniquely stored in the inv_data table, so we'll start with that.
select
i.prod_cd as product,
i.descrip as description,
i.in_stock
from
inv_data as i
where
i.class_cd = 'ALG7'
order by
i.prod_cd asc
;
You'll get one of each because only one of each is stored. The rest is just details. Let's add some joins
select
i.prod_cd as product,
i.descrip as description,
i.in_stock,
sum(l.order_qty) as l_total_so,
sum(e.order_qty) as e_total_so
from
inv_data as i
inner join plog as p
on i.prod_cd = p.prod_cd
inner join ord_log as l
on i.prod_cd = l.prod_cd
inner join ediordlg as e
on i.prod_cd = e.prod_cd
where
i.class_cd = 'ALG7'
group by
i.prod_cd,
i.descrip,
i.in_stock
order by
i.prod_cd asc
;
You don't need to list so many columns in your on clauses because they're all the same anyway (by definition, because the earlier inner join succeeded).
If it turns out that plog and not inv_data is your master table simply reverse them in the query. If you want both total values together, use sum(l.order_qty + e.order_qty) as total_so instead of creating two columns.
The thing to understand is that joins can multiply the results. Understanding which tables have more and doing something in each case to limit the "extra" results each time they would be extra will result in a clean resultset. In this case probably just summing and grouping is enough, but in a complex case you would need to join to a sub-query that selects back a sufficiently distinct set.
And, on a related note, distinct is poison for your query! It has a very specific purpose which is not to fix "too many duplicate rows after joining." If you're using it for that you probably have a bug that could have unknown other side effects. Try to fix it with group by and smarter on statements first, then nested queries.
Not related to your question, but important: It appears that you are expanding a variable $x inside your query. This is probably not safe (or efficient); try to use a parametrized query instead.
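As a rough sketch of the "join to a sub-query" approach mentioned above, each log table can be summed per product first and then joined to inv_data, so the joins cannot multiply rows. The @Days variable stands in for $x and would be bound as a query parameter from the application; its value here is made up. The use of LEFT JOIN (so that products with no matching log rows still appear with 0) is my assumption about what you want.

```sql
DECLARE @Days int = 30;  -- hypothetical value; bind this as a parameter instead of expanding $x

SELECT i.prod_cd,
       i.descrip,
       i.in_stock,
       COALESCE(po.total_po, 0)      AS total_po,
       COALESCE(so.total_so, 0)      AS total_so,
       COALESCE(edi.total_so_edi, 0) AS total_so_edi
FROM inv_data AS i
     -- One row per product from each log table, so nothing is duplicated by the joins
     LEFT JOIN (SELECT p.prod_cd, SUM(p.log_qty) AS total_po
                FROM plog AS p
                WHERE DATEADD(day, p.EST_DT, '18001228')
                      BETWEEN GETDATE() AND DATEADD(day, @Days, GETDATE())
                GROUP BY p.prod_cd) AS po
            ON po.prod_cd = i.prod_cd
     LEFT JOIN (SELECT l.prod_cd, SUM(l.ORDER_QTY) AS total_so
                FROM ord_log AS l
                WHERE DATEADD(day, l.SHIP_DT, '18001228')
                      BETWEEN GETDATE() AND DATEADD(day, @Days, GETDATE())
                GROUP BY l.prod_cd) AS so
            ON so.prod_cd = i.prod_cd
     LEFT JOIN (SELECT e.prod_cd, SUM(e.ORDER_QTY) AS total_so_edi
                FROM ediordlg AS e
                WHERE DATEADD(day, e.SHIP_DT, '18001228')
                      BETWEEN GETDATE() AND DATEADD(day, @Days, GETDATE())
                GROUP BY e.prod_cd) AS edi
            ON edi.prod_cd = i.prod_cd
WHERE i.CLASS_CD = 'ALG7'
ORDER BY i.prod_cd ASC;
```

The date predicates are copied unchanged from the four separate queries in the question, so each derived table should return the same totals as its standalone counterpart.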
why not use a DISTINCT?
$query_ats = "SELECT DISTINCT
plog.prod_cd as prod_id,
ord_log.ORDER_QTY as total_so,
ediordlg.ORDER_QTY as total_edi_so,
inv_data.IN_STOCK as in_stock
FROM plog
INNER JOIN ord_log
ON plog.prod_cd = ord_log.prod_cd
INNER JOIN ediordlg
ON plog.prod_cd = ediordlg.prod_cd AND ord_log.prod_cd = ediordlg.prod_cd
INNER JOIN inv_data
ON plog.prod_cd = inv_data.prod_cd AND ord_log.prod_cd = inv_data.prod_cd AND ediordlg.prod_cd = inv_data.prod_cd
WHERE
inv_data.CLASS_CD = 'ALG7'
AND
dateadd(day, plog.EST_DT, '18001228') BETWEEN getdate() and dateadd(day, $x, getdate())
AND
dateadd(day, ord_log.SHIP_DT, '18001228') BETWEEN getdate() and dateadd(day, $x, getdate())
AND
dateadd(day, ediordlg.SHIP_DT, '18001228') BETWEEN getdate() and dateadd(day, $x, getdate())
GROUP BY plog.prod_cd, plog.log_qty, ord_log.ORDER_QTY, ediordlg.ORDER_QTY, inv_data.IN_STOCK
ORDER BY plog.prod_cd ASC";