Bizarre result in SQL query - PostgreSQL

Bizarre result in SQL query - PostgreSQL - sql

I discovered this strange behavior with this query:
-- TP4N has stock_class = 'Bond'
select lot.symbol
, round(sum(lot.qty_left), 4) as "Qty"
from ( select symbol
, qty_left
-- , amount
from trade_lot_tbl t01
where t01.symbol not in (select symbol from stock_tbl where stock_class = 'Cash')
and t01.qty_left > 0
and t01.trade_date <= current_date -- only current trades
union
select 'CASH' as symbol
, sum(qty_left) as qty_left
-- , sum(amount) as amount
from trade_lot_tbl t11
where t11.symbol in (select symbol from stock_tbl where stock_class = 'Cash')
and t11.qty_left > 0
and t11.trade_date <= current_date -- only current trades
group by t11.symbol
) lot
group by lot.symbol
order by lot.symbol
;
Run as is, the Qty for TP4N is 1804.42
Run with the two 'amount' lines un-commented, which as far as I can tell should NOT affect the result, yet Qty for TP4N = 1815.36. Only ONE of the symbols (TP4N) has a changed value, all others remain the same.
Run with the entire 'union' statement commented out results in Qty for TP4N = 1827.17
The correct answer, as far as I can tell, is 1827.17.
So, to summarize, I get three different values by modifying parts of the query that, as far as I can tell, should NOT affect the answer.
I'm sure I'm going to kick myself when the puzzle is solved, this smells like a silly mistake.

Likely, what you are seeing is caused by the use of union. This set operator deduplicates the resultsets that are returned by both queries. So adding or removing columns in the unioned sets may affect the final resultset (by default, adding more columns reduces the risk of duplication).
As a rule of thumb: unless you do want deduplication, you should use union all (which is also more efficient, since the database does not need to search for duplicates).

Related

Alter a existing SQL statement, to give an additional column of data, but to not affect performance, so best approach

In this query, I want to add a new column, which gives the SUM of a.VolumetricCharge, but only where PremiseProviderBillings.BillingCategory = 'Water'. But i don't want to add it in the obvious place since that would limit the rows returned, I only want it to get the new column value
SELECT b.customerbillid,
-- Here i need SUM(a.VolumetricCharge) but where a.BillingCategory is equal to 'Water'
Sum(a.volumetriccharge) AS Volumetric,
Sum(a.fixedcharge) AS Fixed,
Sum(a.vat) AS VAT,
Sum(a.discount) + Sum(deferral) AS Discount,
Sum(Isnull(a.estimatedconsumption, 0)) AS Consumption,
Count_big(*) AS Records
FROM dbo.premiseproviderbillings AS a WITH (nolock)
LEFT JOIN dbo.premiseproviderbills AS b WITH (nolock)
ON a.premiseproviderbillid = b.premiseproviderbillid
-- Cannot add a where here since that would limit the results and change the output
GROUP BY b.customerbillid;

Bit of a tricky one, as what you're asking for will definitely affect performance (your asking SQL Server to do more work after all!).
However, we can add a column to your results which performs a conditional sum so that it does not affect the result of the other columns.
The answer lies in using a CASE expression!
Sum(
CASE
WHEN PremiseProviderBillings.BillingCategory = 'Water' THEN
a.volumetriccharge
ELSE
0
END
) AS WaterVolumetric

Is there a performance benefit to repeating a WHERE filter in subqueries?

I have the following query:
WITH prices AS (
SELECT itemId
, monthId
, MIN(lastPrice / firstPrice) AS gain
FROM (
SELECT *
, FIRST_VALUE(price) OVER (PARTITION BY monthId
ORDER BY date) AS firstPrice
, LAST_VALUE(price) OVER (PARTITION BY monthId
ORDER BY date) AS lastPrice
FROM (
SELECT *
FROM foo
WHERE monthId = 82 -- a repeat of the final WHERE
) x
) x
WHERE firstPrice != 0
AND lastPrice != 0
GROUP BY itemId
, monthId
)
SELECT f.monthId
, f.itemId
, p.gain
FROM foo f
LEFT JOIN prices p
ON f.itemId = p.itemId
AND f.monthId = p.monthId
WHERE gain IS NOT NULL
AND monthId = 82 -- repeated above
As noted, the full query ends with a WHERE monthId = 82 clause, which is also present in the prices subquery.
If I remove the WHERE from the subquery, the result is the same. This makes sense since the result would be naturally constrained by the final WHERE.
However, the case without the subquery WHERE runs dramatically slower (40 vs. 3 minutes). However, I'm not proficient enough at SQL to know if this is expected or if it's merely an artifact of statistics (I've run the version with the subquery WHERE many, many times already and only now tried to remove it).
It'd make sense for this to improve performance since it allows the server to only perform the operations within prices (there are many more in my real case) on the subset of rows with monthId = 82. However, I don't know if the compiler already optimizes the subquery to filter it with that subset regardless and therefore the benefit I'm seeing is merely an illusion.
For the record, my actual FIRST/LAST_VALUE calls have ROWS BETWEEN PRECEEDING UNBOUNDED AND FOLLOWING UNBOUNDED, just omitted them to simplify the query.

The SQL Server optimizer is smart enough to push where filters into subqueries under many circumstances. However, optimizers make mistakes and they miss situations -- as would appear to be the case here. In general, you can check the query plan to see if it makes a difference.
I would be inclined to repeat the logic, just to be sure that the query is as efficient as possible.

Eliminating Entries Based On Revision

I need to figure out how to eliminate older revisions from my query's results, my database stores orders as 'Q000000' and revisions have an appended '-number'. My query currently is as follows:
SELECT DISTINCT Estimate.EstimateNo
FROM Estimate
INNER JOIN EstimateDetails ON EstimateDetails.EstimateID = Estimate.EstimateID
INNER JOIN EstimateDoorList ON EstimateDoorList.ItemSpecID = EstimateDetails.ItemSpecID
WHERE (Estimate.SalesRepID = '67' OR Estimate.SalesRepID = '61') AND Estimate.EntryDate >= '2017-01-01 00:00:00.000' AND EstimateDoorList.SlabSpecies LIKE '%MDF%'
ORDER BY Estimate.EstimateNo
So for instance, the results would include:
Q120455-10
Q120445-11
Q121675-2
Q122361-1
Q123456
Q123456-1
From this, I need to eliminate 'Q120455-10' because of the presence of '-11' for that order, and 'Q123456' because of the presence of the '-1' revision. I'm struggling greatly with figuring out how to do this, my immediate thought was to use case statements but I'm not sure what is the best way to implement them and how to filter. Thank you in advance, let me know if any more information is needed.

First you have to parse your EstimateNo column into sequence number and revision number using CHARINDEX and SUBSTRING (or STRING_SPLIT in newer versions) and CAST/CONVERT the revision to a numeric type
SELECT
SUBSTRING(Estimate.EstimateNo,0,CHARINDEX('-',Estimate.EstimateNo)) as [EstimateNo],
CAST(SUBSTRING(Estimate.EstimateNo,CHARINDEX('-',Estimate.EstimateNo)+1, LEN(Estimate.EstimateNo)-CHARINDEX('-',Estimate.EstimateNo)+1) as INT) as [EstimateRevision]
FROM
...
You can then use
APPLY - to select TOP 1 row that matches the EstimateNo or
Window function such as ROW_NUMBER to select only records with row number of 1
For example, using a ROW_NUMBER would look something like below:
SELECT
ROW_NUMBER() OVER(PARTITION BY EstimateNo ORDER BY EstimateRevision DESC) AS "LastRevisionForEstimate",
-- rest of the needed columns
FROM
(
-- query above goes here
)
You can then wrap the query above in a simple select with a where predicate filtering out a specific value of LastRevisionForEstimate, for instance
SELECT --needed columns
FROM -- result set above
WHERE LastRevisionForEstimate = 1
Please note that this is to a certain extent, pseudocode, as I do not have your schema and cannot test the query
If you dislike the nested selects, check out the Common Table Expressions

Trouble with pulling distinct data

Ok this is hard to explain partially because I'm bad at sql but this code isn't doing exactly what I want it to do. I'll try to explain what it is supposed to do as best I can and hopefully someone can spot a glaring mistake. I'm sorry about the long winded explanation but there is a lot going on here and I really could use the help.
The point of this script is to search for parts which need to be obsoleted. in other words they haven't been used in three years and are still active.
When we obsolete part, "part.status" is set to 'O'. It is normally null. Also, the word 'OBSOLETE' is usually written in to "part.description"
The "WORK_ORDER" contains every scheduled work order. These are defined by base,lot, and sub ID's. It also contains many dates such as the date when the work order was closed.
the "REQUIREMENT" table contains all the parts require for each job. many jobs may require multiple parts, some at different legs of the job. The way this is handled is that for a given "REQUIREMENT.WORKORDER_BASE_ID" and "REQUIREMENT.WORKORDER_LOT_ID", they may be listed on a dozen or so subsequent rows. Each line specifies a different "REQUIREMENT.PART_ID". The sub id separates what leg of the job that the part is needed. All of the parts I care about start with 'PCH'
When I run this code it returns 14 lines, I happen to know it should be returning about 39 right now. I believe the screwy part starts at line 17. I found that code on another form hoping that it would help solve the original problem. Without that code, I get like 27K lines because the DB is pulling every criteria matching requirement from every criteria matching work order. Many of these parts are used on multiple jobs. I've also tried using DISTINCT on REQUIREMENT.PART_ID which seems like it should solve the problem. Alas it doesn't.
So I know despite all the information I probably still didn't give nearly enough. Does anyone have any suggestions?
SELECT
PART.ID [Engr Master]
,PART.STATUS [Master Status]
,WO.CLOSE_DATE
,PT.ID [Die]
,PT.STATUS [Die Status]
FROM PART
CROSS APPLY(
SELECT
WORK_ORDER.BASE_ID
,WORK_ORDER.LOT_ID
,WORK_ORDER.SUB_ID
,WORK_ORDER.PART_ID
,WORK_ORDER.CLOSE_DATE
FROM WORK_ORDER
WHERE
GETDATE() - (360*3) > WORK_ORDER.CLOSE_DATE
AND PART.ID = WORK_ORDER.PART_ID
AND PART.STATUS ='O'
)WO
CROSS APPLY(
SELECT
REQUIREMENT.WORKORDER_BASE_ID
,REQUIREMENT.WORKORDER_LOT_ID
,REQUIREMENT.WORKORDER_SUB_ID
,REQUIREMENT.PART_ID
FROM REQUIREMENT
WHERE
WO.BASE_ID = REQUIREMENT.WORKORDER_BASE_ID
AND WO.LOT_ID = REQUIREMENT.WORKORDER_LOT_ID
AND WO.SUB_ID = REQUIREMENT.WORKORDER_SUB_ID
AND REQUIREMENT.PART_ID LIKE 'PCH%'
)REQ
CROSS APPLY(
SELECT
PART.ID
,PART.STATUS
FROM PART
WHERE
REQ.PART_ID = PART.ID
AND PART.STATUS IS NULL
)PT
ORDER BY PT.ID

This is difficult to understand without any sample data, but I took a stab at it anyway. I removed the second JOIN to PART (that had alias PART1) as it seemed unecessary. I also removed the subquery that was looking for parts HAVING COUNT(PART_ID) = 1
The first JOIN to PART should be done on REQUIREMENT.PART_ID = PART.PART_ID as the relationship as already been defined from WORK_ORDER to REQUIREMENT, hence you can JOIN PART directly to REQUIREMENT at this point.
EDIT 03/23/2015
If I understand this correctly, you just need a distinct list of PCH parts, and their respective last (read: MAX) CLOSE_DATE. If that is the case, here is what I propose.
I broke the query up into a couple of CTE's. The first CTE is simply going through the PART table and pulling out a DISTINCT list of PCH parts, grouping by PART_ID and DESCRIPTION.
The second CTE, is going through the REQUIREMENT table, joining to the WORK_ORDER table and, for each PART_ID (handled by the PARTITION) assigning the CLOSE_DATE a ROW_NUMBER in descending order. This will ensure that each ROW_NUMBER with a value of "1" will be the Max CLOSE_DATE for each PART_ID.
The final SELECT statement simply JOINS the two Cte's on PART_ID, filtering where LastCloseDate = 1 (the ROW_NUMBER assigned in the second CTE).
If I understand the requirements correctly, this should give you the desired results.
Additionally, I removed the filter WHERE PART.DESCRIPTION NOT LIKE 'OB%' because we're already filtering by PART.STATUS IS NULL and you stated above that an 'O' is placed in this field for Obsolete parts. Also, [DIE] and [ENGR MASTER] have the same value in the 27 rows being pulled before, so I just used the same field and labeled them differently.
; WITH Parts AS(
SELECT prt.PART_ID AS [ENGR MASTER]
, prt.DESCRIPTION
FROM PART prt
WHERE prt.STATUS IS NULL
AND prt.PART_ID LIKE 'PCH%'
GROUP BY prt.ID, prt.DESCRIPTION
)
, LastCloseDate AS(
SELECT req.PART_ID
, wrd.CLOSE_DATE
, ROW_NUMBER() OVER(PARTITION BY req.PART_ID ORDER BY wrd.CLOSE_DATE DESC) AS LastCloseDate
FROM REQUIREMENT req
INNER JOIN WORK_ORDER wrd
ON wrd.BASE_ID = req.WORKORDER_BASE_ID
AND wrd.LOT_ID = req.WORKORDER_LOT_ID
AND wrd.SUB_ID = req.WORKORDER_SUB_ID
WHERE wrd.CLOSE_DATE IS NOT NULL
AND GETDATE() - (365 * 3) > wrd.CLOSE_DATE
)
SELECT prt.PART_ID AS [DIE]
, prt.PART_ID AS [ENGR MASTER]
, prt.DESCRIPTION
, lst.CLOSE_DATE
FROM Parts prt
INNER JOIN LastCloseDate lst
ON prt.PART_ID = lst.PART_ID
WHERE LastCloseDate = 1

SQL Output Question

Edited
I am running into an error and I know what is happening but I can't see what is causing it. Below is the sql code I am using. Basically I am getting the general results I want, however I am not accurately giving the query the correct 'where' clause.
If this is of any assistance. The count is coming out as this:
Total Tier
1 High
2 Low
There are 4 records in the Enrollment table. 3 are active, and 1 is not. Only 2 of the records should be displayed. 1 for High, and 1 for low. The second Low record that is in the total was flagged as 'inactive' on 12/30/2010 and reflagged again on 1/12/2011 so it should not be in the results. I changed the initial '<=' to '=' and the results stayed the same.
I need to exclude any record from Enrollments_Status_Change that where the "active_status" was changed to 0 before the date.
SELECT COUNT(dbo.Enrollments.Customer_ID) AS Total,
dbo.Phone_Tier.Tier
FROM dbo.Phone_Tier as p
JOIN dbo.Enrollments as eON p.Phone_Model = e.Phone_Model
WHERE (e.Customer_ID NOT IN
(Select Customer_ID
From dbo.Enrollment_Status_Change as Status
Where (Change_Date >'12/31/2010')))
GROUP BY dbo.Phone_Tier.Tier
Thanks for any assistance and I apologize for any confusion. This is my first time here and i'm trying to correct my etiquette on the fly.

If you don't want any of the fields from that table dbo.Enrollment_Status_Change, and you don't seem to use it in any way — why even include it in the JOINs? Just leave it out.
Plus: start using table aliases. This is very hard to read if you use the full table name in each JOIN condition and WHERE clause.
Your code should be:
SELECT
COUNT(e.Customer_ID) AS Total, p.Tier
FROM
dbo.Phone_Tier p
INNER JOIN
dbo.Enrollments e ON p.Phone_Model = e.Phone_Model
WHERE
e.Active_Status = 1
AND EXISTS (SELECT DISTINCT Customer_ID
FROM dbo.Enrollment_Status_Change AS Status
WHERE (Change_Date <= '12/31/2010'))
GROUP BY
p.Tier
Also: most likely, your EXISTS check is wrong — since you didn't post your table structures, I can only guess — but my guess would be:
AND EXISTS (SELECT * FROM dbo.Enrollment_Status_Change
WHERE Change_Date <= '12/31/2010' AND CustomerID = e.CustomerID)
Check for existence of any entries in dbo.Enrollment_Status_Change for the customer defined by e.CustomerID, with a Change_Date before that cut-off date. Right?

Assuming you want to:
exclude all customers whose latest enrollment_status_change record was since the start of 2011
but
include all customers whose latest enrollment_status_change record was earlier than the end of 2010 (why else would you have put that EXISTS clause in?)
Then this should do it:
SELECT COUNT(e.Customer_ID) AS Total,
p.Tier
FROM dbo.Phone_Tier p
JOIN dbo.Enrollments e ON p.Phone_Model = e.Phone_Model
WHERE dbo.Enrollments.Active_Status = 1
AND e.Customer_ID NOT IN (
SELECT Customer_ID
FROM dbo.Enrollment_Status_Change status
WHERE (Change_Date >= '2011-01-01')
)
GROUP BY p.Tier
Basically, the problem with your code is that joining a one-to-many table will always increase the row count. If you wanted to exclude all the records that had a matching row in the other table this would be fine -- you could just use a LEFT JOIN and then set a WHERE clause like Customer_ID IS NULL.
But because you want to exclude a subset of the enrollment_status_change table, you must use a subquery.
Your intention is not clear from the example given, but if you wanted to exclude anyone who's enrollment_status_change as before 2011, but include those who's status change was since 2011, you'd just swap the date comparator for <.
Is this any help?

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas