Two identical queries that give different results - sql

I have 2 queries: one is written in ANSI SQL, another is written using oracle dialect.
I think that they both must give the same resultset, but it is no true. First query gives 385 rows and the second - only 25
First:
SELECT idclient, cl.surname, sum(sub1.s)
FROM client cl JOIN incomestatement incst USING(idclient)
JOIN (SELECT c.idincome ID, sum(inst.total) AS s
FROM instalment inst JOIN credit c USING(idcredit)
WHERE inst.paydate > c.paydate AND c.isloaned = 1
GROUP BY c.idincome) sub1 ON incst.idincome = sub1.ID
GROUP BY idclient, cl.surname;
Second:
SELECT c.idclient, c.surname, sum(sub.s)
FROM client c, incomestatement inc,
(SELECT sum(inst.total) as s, cr.idincome as id
FROM instalment inst, credit cr
WHERE inst.paydate > cr.paydate AND cr.isloaned = 1 AND cr.idcredit = inst.idcredit
GROUP BY cr.idincome
) sub
WHERE c.idclient = inc.idclient AND inc.income = sub.ID
group by c.idclient, c.surname;
So why they don't give the same result?

I'd approach the problem in steps.
Do the two sub-queries produce the same data sets?
If they do, proceed to step 2.
If not, then you have two simpler queries to analyze and dissect.
Given that the pair of sub-queries produce the same answer, you can then establish whether the Client and IncomeStatement joins give the same results (treat it as another sub-query)
If they do, proceed to step 3.
If not, then you have a pair of queries (one with JOIN, one with classic SQL notation) to analyze and dissect.
Given that the pair of joins and the pair of subqueries each produce the same result, analyze why the join of these does not work correctly.

Have you made a commit?
It's possible that you don´t commited some transactions, so the results can be differents.

Related

When I try to get MIN value of manufactured parts I get (Only one expression ... when the subquery is not introduced with EXISTS)

I try to get MIN value of manufactured parts grouped by project like so:
This is my query:
SELECT
proinfo.ProjectN
,ProjShipp.[Parts]
,ProjShipp.Qty AS 'Qty Total'
,Sum(DailyProduction.Quantity) AS 'Qty Manufactured'
,(SELECT DailySumPoteau.IdProject, MIN(DailySumPoteau.DailySum)
FROM (SELECT PShipp.IdProject, SUM(DailyWelding.Quantity) DailySum
FROM DailyWeldingPaintProduction DailyWelding
INNER JOIN ProjectShipping PShipp ON PShipp.id=DailyWelding.FK_idPartShip
WHERE PShipp.id=ProjShipp.id
GROUP BY PShipp.id,PShipp.IdProject)DailySumPoteau
GROUP BY DailySumPoteau.IdProject ) AS 'Qt Pole'
FROM [dbo].[DailyWeldingPaintProduction] DailyProduction
INNER join ProjectShipping ProjShipp on ProjShipp.id=DailyProduction.FK_idPartShip
inner join ProjectInfo proinfo on proinfo.id=IdProject
GROUP By proinfo.id
,proinfo.ProjectN
,ProjShipp.[Parts]
,ProjShipp.Qty
,ProjShipp.[Designation]
,ProjShipp.id
I have three tables:
01 - ProjectInfo: it stores information about the project:
02 - ProjectShipping: it stores information about the parts and it has ProjectInfoId as foreign key:
03 - DailyWeldingPaintProduction: it stores information about daily production and it has ProjectShippingId as foreign key:
but when I run it I get this error:
Msg 116, Level 16, State 1, Line 13
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
How can I solve this problem?.
From your target results, I suspect that you want a window MIN(). Assuming that your query works and generates the correct results when the subquery is removed (column QtPole left apart), that would be:
SELECT pi.ProjectN, ps.[Parts], ps.Qty AS QtyTotal,
SUM(dp.Quantity) AS QtyManufactured,
MIN(SUM(dp.Quantity)) OVER(PARTITION BY pi.ProjectN) AS QtPole
ps.Designation
FROM [dbo].[DailyWeldingPaintProduction] dp
INNER join ProjectShipping ps on ps.id=dp.FK_idPartShip
INNER join ProjectInfo pi on pi.id=IdProject
GROUP BY pi.id, pi.ProjectN, ps.[Parts], ps.Qty, ps.Designation, ps.id
Side note: don't use single quotes for identifiers; they should be reserved for literal strings only. Use the proper quoting character for your database (in SQL Server: square brackets) - or better yet, use identifiers that do not require being quoted.
Formulating the query in the way you have done is not necessarily the best solution. As the other solution mentions, the best method in this instance is probably to use a window function / OVER. But since this can depend on indexes, and also to understand what went wrong, I will give you the way to fix the original query.
The issue with your query is that it has a correlated subquery in the SELECT which returns two values. What you are trying to do can be done in RDBMSs that support row constructors, unfortunately SQl Server is not one of them.
What you are trying to get at here is to get a whole resultset per row of the table.
The correct syntax for your query is to APPLY the resultset of the subquery for every row. You can CROSS APPLY in this instance because you are guaranteed a result anyway due to the correlation:
SELECT
proinfo.ProjectN
,ProjShipp.[Parts]
,ProjShipp.Qty AS 'Qty Total'
,Sum(DailyProduction.Quantity) AS 'Qty Manufactured'
,QtPole.IdProject
,QtPole.MinDailySum
FROM [dbo].[DailyWeldingPaintProduction] DailyProduction
INNER join ProjectShipping ProjShipp on ProjShipp.id=DailyProduction.FK_idPartShip
inner join ProjectInfo proinfo on proinfo.id=ProjShipp.IdProject
CROSS APPLY (
SELECT DailySumPoteau.IdProject, MIN(DailySumPoteau.DailySum) MinDailySum
FROM (SELECT DailyWelding.FK_idPartShip IdProject, SUM(DailyWelding.Quantity) DailySum
FROM DailyWeldingPaintProduction DailyWelding
WHERE DailyWelding.FK_idPartShip=ProjShipp.id
GROUP BY DailyWelding.FK_idPartShip) DailySumPoteau
GROUP BY DailySumPoteau.IdProject
) AS QtPole
GROUP By proinfo.id
,proinfo.ProjectN
,ProjShipp.[Parts]
,ProjShipp.Qty
,ProjShipp.[Designation]
,ProjShipp.id
,QtPole.IdProject
,QtPole.MinDailySum
I have taken the liberty of cleaning up the subquery by removing the unnecessary ProjectShipping reference. Note that the addition of grouping columns here does not matter because of the correlation to ProjShipp.Id
Note also that depending on indexes and density and such like, it may be better to formulate the subquery as a JOIN instead, with the correlation on the outside in the ON. You would need to modify the grouping in that case.

SQL Math Operation In Correlated Subquery

I am working with three tables, basically, one is a bill of materials, one contains part inventory, and the last one contains work orders or jobs. I am trying to find out if it is possible to have a correlated subquery that can perform a math operation using a value from the outer query. Here's an example of what I'm trying to do:
SELECT A.work_order,A.assembly,A.job_quantity,
(SELECT COUNT(X.part_number)
FROM bom X
WHERE X.assembly = A.assembly
AND (X.quantity_required * A.job_quantity) >= (SELECT Y.quantity_available FROM inventory Y WHERE
Y.part_number = X.part_number)) AS negatives
FROM work_orders A
ORDER BY A.assembly ASC
I am attempting to find out, for a given work order, if there are parts that we do not have enough of to build the assembly. I'm currently getting an "Error correlating fields" error. Is it possible to do this kind of operation in a single query?
Try moving the subquery to a join, something like this:
SELECT a.work_order, a.assembly, a.job_quantity, n.negatives
FROM work_orders a JOIN (SELECT x.part_number, COUNT(x.part_number) as negatives
FROM bom x JOIN work_orders b
ON x.assembly = b.assembly
WHERE (x.quantity_required * b.job_quantity) >= (SELECT y.quantity_available
FROM inventory y WHERE
y.part_number = x.part_number)
GROUP BY x.part_number) n
ON a.part_number = n.part_number
ORDER BY a.assembly ASC
Or create a temporary cursor with the subquery and then use it to join the main table.
Hope this helps.
Luis

Two almost identical queries returning different results

I am getting different results for the following two queries and I have no idea why. The only difference is one has an IN and one has an equals.
Before I go into the queries you should know that I found a better way to do it by moving the subquery into a common table expression, but this is still driving me crazy! I really want to know what caused the issue in the first place, I am asking out of curiosity
Here's the first query:
use [DB.90_39733]
Select distinct x.uniqproducer, cn.Firstname,cn.lastname,e.code,
ecn.FirstName, ecn.LastName, ecn.entid, x.uniqline
from product x
join employ e on e.EmpID=x.uniqproducer
join contactname cn on cn.uniqentity=e.uniqentity
join [ETL_GAWR92]..idlookupentity ide on ide.enttype='EM'
and ide.UniqEntity=e.UniqEntity
left join [ETL_GAWR92]..EntConName ecn on ecn.entid=ide.empid
and ecn.opt='Y'
Where x.UniqProducer =(SELECT TOP 1 idl.UniqEntity
FROM [ETL_GAWR92]..IDLookupEntity idl
LEFT JOIN [ETL_GAWR92]..Employ e2 ON e2.ProdID = ''
WHERE idl.empID = e2.EmpID AND
idl.EntType = 'EM')
And the second one:
use [DB.90_39733]
Select distinct x.uniqproducer, cn.Firstname,cn.lastname,e.code,
ecn.FirstName, ecn.LastName, ecn.entid, x.uniqline
from product x
join employ e on e.EmpID=x.uniqproducer
join contactname cn on cn.uniqentity=e.uniqentity
join [ETL_GAWR92]..idlookupentity ide on ide.enttype='EM'
and ide.UniqEntity=e.UniqEntity
left join [ETL_GAWR92]..EntConName ecn on ecn.entid=ide.empid
and ecn.opt='Y'
Where x.UniqProducer IN (SELECT TOP 1 idl.UniqEntity
FROM [ETL_GAWR92]..IDLookupEntity idl
LEFT JOIN [ETL_GAWR92]..Employ e2 ON e2.ProdID = ''
WHERE idl.empID = e2.EmpID AND
idl.EntType = 'EM')
The first query returns 0 rows while the second query returns 2 rows.The only difference is x.UniqProducer = versus x.UniqProducer IN for the last where clause.
Thanks for your time
SELECT TOP 1 doesn't guarantee that the same record will be returned each time.
Add an ORDER BY to your select to make sure the same record is returned.
(SELECT TOP 1 idl.UniqEntity
FROM [ETL_GAWR92]..IDLookupEntity idl
LEFT JOIN [ETL_GAWR92]..Employ e2 ON e2.ProdID = ''
WHERE idl.empID = e2.EmpID AND
idl.EntType = 'EM' ORDER BY idl.UniqEntity)
I would guess (with strong emphasis on the word “guess”) that the reason is based on how equals and in are processed by the query engine. For equals, SQL knows it needs to do a comparison with a specific value, where for in, SQL knows it needs to build a subset, and find if the "outer" value is in that "inner" subset. Yes, the end results should be the same as there’s only 1 row returned by the subquery, but as #RickS pointed out, without any ordering there’s no guarantee of which value ends up “on top” – and the (sub)query plan used to build the in - driven subquery might differ from that used by the equals pull.
A follow-up question: which is the correct dataset? When you analyze the actual data, should you have gotten zero, two, or a different number of rows?

COUNT is outputting more than one row

I am having a problem with my SQL query using the count function.
When I don't have an inner join, it counts 55 rows. When I add the inner join into my query, it adds a lot to it. It suddenly became 102 rows.
Here is my SQL Query:
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'
GROUP BY [fmsStage].[dbo].[File].[FILENUMBER]
Also, I have to do TOP 1 at the SELECT statement because it returns 51 rows with random numbers inside of them. (They are probably not random, but I can't figure out what they are.)
What do I have to do to make it just count the rows from [fmsStage].[dbo].[file].[FILENUMBER]?
First, your query would be much clearer like this:
SELECT COUNT(f.[FILENUMBER])
FROM [fmsStage].[dbo].[File] f INNER JOIN
[fmsStage].[dbo].[Container] c
ON v.[FILENUMBER] = c.[FILENUMBER]
WHERE f.[RELATIONCODE] = 'SHIP02' AND
c.DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08';
No GROUP BY is necessary. Otherwise you'll just one row per file number, which doesn't seem as useful as the overall count.
Note: You might want COUNT(DISTINCT f.[FILENUMBER]). Your question doesn't provide enough information to make a judgement.
Just remove GROUP BY Clause
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'

Timeout running SQL query

I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min
It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.
what about this?
SELECT auth_user.*,
C1.tickets_captured__count
C2.assigned_tickets__count
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with beter performance)