Using NOT EXISTS with a subquery - sql

I'm trying to create a query to retrieve all customers who purchased one SKU ('Red399') but did not also purchase a second SKU ('Red323'). It seems like the best way to do this is by using NOT EXISTS with a subquery to filter out people who bought 'Red323'.
My query isn't returning any errors, but it also isn't returning any results, and I think it might be because I have too many conditions in the initial WHERE clause but I'm not sure:
SELECT DISTINCT o."FirstName", o."LastName", o."Email", ol."SKU"
FROM flight_export_order o
JOIN flight_export_orderline ol
ON o."OrderDisplayID" = ol."OrderDisplayID"
WHERE ol."SKU" = 'Red399'
AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020'
AND NOT EXISTS
(SELECT DISTINCT o."Email"
FROM flight_export_order o
JOIN flight_export_orderline ol
ON o."OrderDisplayID" = ol."OrderDisplayID"
WHERE ol."SKU" = 'Red323'
AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020')

You don't need the subquery.
You can group by customer and set the conditions in the having clause:
SELECT o."FirstName", o."LastName", o."Email"
FROM flight_export_order o INNER JOIN flight_export_orderline ol
ON o."OrderDisplayID" = ol."OrderDisplayID"
WHERE ol."SKU" IN ('Red399', 'Red323')
AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020'
GROUP BY o."FirstName", o."LastName", o."Email"
HAVING SUM((ol."SKU" = 'Red399')::int) > 0
AND SUM((ol."SKU" = 'Red323')::int) = 0
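The conditional-aggregation idea can be tried out on a toy data set. The following is only a sketch: it uses Python's built-in sqlite3 instead of Postgres, the tables orders and orderlines and their columns are made-up stand-ins for the real ones, and SUM(CASE ...) replaces the Postgres-specific ::int cast.

```python
import sqlite3

# Hypothetical, simplified schema; not the asker's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orderlines (order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 'a@example.com'), (2, 'a@example.com'),
                              (3, 'b@example.com'), (4, 'c@example.com');
    INSERT INTO orderlines VALUES (1, 'Red399'), (2, 'Red323'),
                                  (3, 'Red399'), (4, 'Red323');
""")

# One group per customer: keep groups with at least one Red399 line
# and no Red323 line.
rows = conn.execute("""
    SELECT o.email
    FROM orders o JOIN orderlines ol ON ol.order_id = o.order_id
    WHERE ol.sku IN ('Red399', 'Red323')
    GROUP BY o.email
    HAVING SUM(CASE WHEN ol.sku = 'Red399' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN ol.sku = 'Red323' THEN 1 ELSE 0 END) = 0
    ORDER BY o.email
""").fetchall()

having_result = [r[0] for r in rows]
print(having_result)
```

Customer a bought both SKUs (in separate orders) and customer c bought only Red323, so only b should remain.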

You were on the right track to begin with. EXISTS / NOT EXISTS are the right tools:
SELECT o.* -- or just the columns you need
FROM flight_export_order o
WHERE o."OrderDate" BETWEEN '2020-07-22' AND '2020-08-03'
AND EXISTS (
SELECT FROM flight_export_orderline
WHERE "OrderDisplayID" = o."OrderDisplayID"
AND "SKU" = 'Red399'
)
AND NOT EXISTS (
SELECT FROM flight_export_orderline
WHERE "OrderDisplayID" = o."OrderDisplayID"
AND "SKU" = 'Red323'
);
With a multicolumn index on flight_export_orderline ("OrderDisplayID", "SKU"), this is as fast as it gets. An index on just ("OrderDisplayID") (like you probably have) goes a long way, too.
Plus an index on flight_export_order("OrderDate"), obviously.
I see no need for any expensive aggregating or DISTINCT. See:
Select rows which are not present in other table
Aside 1: try to avoid quoted CaMeL-case identifiers in Postgres if you can. See:
Are PostgreSQL column names case-sensitive?
Aside 2: It's recommended to use ISO 8601 date format (YYYY-MM-DD), which is always unambiguous and independent of locale and session settings.
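As for why the query in the question returns no rows at all: its subquery never refers to the outer row, so NOT EXISTS is false for every customer as soon as anyone bought 'Red323' in the date range. A small sketch of the difference, run with Python's sqlite3 on invented tables (orders, orderlines and the email column are assumptions, not the real schema). It correlates on the customer's email, which is one reading of the requirement; correlating on the order ID, as in the query above, checks each order separately instead.

```python
import sqlite3

# Hypothetical, simplified schema; not the asker's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orderlines (order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 'a@example.com'), (2, 'a@example.com'),
                              (3, 'b@example.com'), (4, 'c@example.com');
    INSERT INTO orderlines VALUES (1, 'Red399'), (2, 'Red323'),
                                  (3, 'Red399'), (4, 'Red323');
""")

# Uncorrelated: the subquery finds *some* Red323 buyer, so NOT EXISTS
# is false for every outer row and nothing is returned.
uncorrelated = conn.execute("""
    SELECT DISTINCT o.email
    FROM orders o JOIN orderlines ol ON ol.order_id = o.order_id
    WHERE ol.sku = 'Red399'
      AND NOT EXISTS (
          SELECT 1
          FROM orders o2 JOIN orderlines ol2 ON ol2.order_id = o2.order_id
          WHERE ol2.sku = 'Red323')
""").fetchall()

# Correlated: the subquery only looks at orders of the same customer.
correlated = conn.execute("""
    SELECT DISTINCT o.email
    FROM orders o JOIN orderlines ol ON ol.order_id = o.order_id
    WHERE ol.sku = 'Red399'
      AND NOT EXISTS (
          SELECT 1
          FROM orders o2 JOIN orderlines ol2 ON ol2.order_id = o2.order_id
          WHERE ol2.sku = 'Red323'
            AND o2.email = o.email)
    ORDER BY o.email
""").fetchall()

print(uncorrelated, correlated)
```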

Related

If transaction within date range, then return customer name (and not all the transactions!)

This code is taking a significant amount of time to run. It's returning every single transaction within the date range but I just need to know if the customer has had at least one transaction, then include the CustomerID, CustomerName, Type, Sign, ReportingName.
I think I need to GROUP BY 'CustomerID' but again only if there was a transaction within the date range. And of course, I'm sure there is an optimal way to execute the below TSQL because it's quite slow at present.
Thanks in advance for any help!
SELECT [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
Check your indexes for fragmentation to speed up the query, and make sure the relevant indexes exist in the first place.
If you need just one result, use TOP 1:
SELECT TOP 1 [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
If you only need to check for the existence of a row, and not actually get any data from it then use EXISTS() rather than INNER JOIN, e.g.
SELECT vpr.[RelatedNameId] AS CustomerID
,vpr.[RelatedName] AS CustomerName
,tt.[ParticluarType] AS Type
,prd.[Sign]
,prd.[ReportingName]
FROM [AFGPurchase].[IvL].[Account] AS acc
INNER JOIN [AFGPurchase].[IvL].[Position] AS pos ON acc.[AccountId] = pos.[AccountId]
INNER JOIN [AFGPurchase].[IvL].[Product] AS prd ON pos.[ProductID] = prd.[ProductId]
INNER JOIN [ABC].[dbo].[vwPrimary] AS vpr ON acc.[ReportingEntityId] = vpr.[RelatedNameId]
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] AS tt ON acc.[TaxTreatmentId] = tt.[TaxTreatmentId]
WHERE tt.[RegistrationType] LIKE 'NON%'
AND prd.[Sign]='XYZ2'
AND pos.[Quantity]<>0
AND EXISTS
( SELECT 1
FROM [AFGPurchase].[IvL].[Transaction] AS tr
WHERE tr.[PositionId] = pos.[PositionId]
AND tr.[EffectiveDate] BETWEEN '2021-12-31' AND '2022-12-31'
);
N.B. I have added in table aliases and removed all the unnecessary parentheses for readability - you may disagree that it is more readable, but I would expect that most people would agree
This may not offer any performance benefit over simply grouping by the columns you are selecting and keeping your joins as they are. SQL is, after all, a declarative language where you tell the engine what you want, not how to get it, so you may find that the two plans are the same because you are requesting the same result. Using EXISTS does have the advantage of being more semantically tied to what you are trying to do, though, so it gives the optimiser the best chance of getting to the right plan. If you are still having performance issues, you may need to inspect the execution plan and see if it suggests any indexes.
Finally, if you are really still using SQL Server 2008 then you really need to start thinking about your upgrade path. It has been completely unsupported for over 3 years now.
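To illustrate the difference in row counts between joining to the transactions and merely testing for their existence, here is a small sketch using Python's sqlite3 rather than SQL Server. The two tables and their columns are invented for the example and are much simpler than the schema in the question.

```python
import sqlite3

# Hypothetical two-table model: one customer row, many transaction rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (customer_id INTEGER, effective_date TEXT);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO transactions VALUES (1, '2022-01-15'), (1, '2022-03-01'),
                                    (1, '2022-06-30'), (2, '2020-05-05');
""")

# INNER JOIN: one output row per matching transaction.
join_rows = conn.execute("""
    SELECT c.customer_id, c.name
    FROM customers c
    JOIN transactions t ON t.customer_id = c.customer_id
    WHERE t.effective_date BETWEEN '2021-12-31' AND '2022-12-31'
""").fetchall()

# EXISTS: one output row per customer with at least one match.
exists_rows = conn.execute("""
    SELECT c.customer_id, c.name
    FROM customers c
    WHERE EXISTS (
        SELECT 1
        FROM transactions t
        WHERE t.customer_id = c.customer_id
          AND t.effective_date BETWEEN '2021-12-31' AND '2022-12-31')
""").fetchall()

print(len(join_rows), exists_rows)
```

The join repeats Alice once per transaction, while the EXISTS form lists her once and leaves out Bob, whose only transaction is outside the range.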

SQL Math Operation In Correlated Subquery

I am working with three tables, basically, one is a bill of materials, one contains part inventory, and the last one contains work orders or jobs. I am trying to find out if it is possible to have a correlated subquery that can perform a math operation using a value from the outer query. Here's an example of what I'm trying to do:
SELECT A.work_order,A.assembly,A.job_quantity,
(SELECT COUNT(X.part_number)
FROM bom X
WHERE X.assembly = A.assembly
AND (X.quantity_required * A.job_quantity) >= (SELECT Y.quantity_available FROM inventory Y WHERE
Y.part_number = X.part_number)) AS negatives
FROM work_orders A
ORDER BY A.assembly ASC
I am attempting to find out, for a given work order, if there are parts that we do not have enough of to build the assembly. I'm currently getting an "Error correlating fields" error. Is it possible to do this kind of operation in a single query?
Try moving the subquery to a join, something like this:
SELECT a.work_order, a.assembly, a.job_quantity, n.negatives
FROM work_orders a JOIN (SELECT b.work_order, COUNT(x.part_number) as negatives
FROM bom x JOIN work_orders b
ON x.assembly = b.assembly
WHERE (x.quantity_required * b.job_quantity) >= (SELECT y.quantity_available
FROM inventory y WHERE
y.part_number = x.part_number)
GROUP BY b.work_order) n
ON a.work_order = n.work_order
ORDER BY a.assembly ASC
Or create a temporary cursor with the subquery and then use it to join the main table.
Hope this helps.
Luis
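For comparison, here is a sketch of a join-based version that avoids nesting one correlated subquery inside another. It runs on SQLite through Python's sqlite3, which is not the engine from the question, and the column types and sample rows are assumptions; only the table and column names come from the question. LEFT JOINs are used so that work orders with no shortage still appear with a count of 0, and the comparison operator (>=) is kept as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bom (assembly TEXT, part_number TEXT, quantity_required INTEGER);
    CREATE TABLE inventory (part_number TEXT, quantity_available INTEGER);
    CREATE TABLE work_orders (work_order TEXT, assembly TEXT, job_quantity INTEGER);
    INSERT INTO bom VALUES ('ASM1', 'P1', 2), ('ASM1', 'P2', 1), ('ASM2', 'P1', 1);
    INSERT INTO inventory VALUES ('P1', 10), ('P2', 100);
    INSERT INTO work_orders VALUES ('WO1', 'ASM1', 6), ('WO2', 'ASM2', 3);
""")

# The inventory row is joined only when the required quantity reaches the
# available quantity, so COUNT(i.part_number) counts the short parts.
shortages = conn.execute("""
    SELECT a.work_order, a.assembly, a.job_quantity,
           COUNT(i.part_number) AS negatives
    FROM work_orders a
    LEFT JOIN bom x ON x.assembly = a.assembly
    LEFT JOIN inventory i
           ON i.part_number = x.part_number
          AND x.quantity_required * a.job_quantity >= i.quantity_available
    GROUP BY a.work_order, a.assembly, a.job_quantity
    ORDER BY a.assembly ASC
""").fetchall()

print(shortages)
```

With this data WO1 needs 12 of P1 against 10 available, so it has one short part; WO2 has none.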

The "where" condition worked not as expected ("or" issue)

I have a problem joining these 4 tables
Model of my database
I want to count the number of reservations with different sorts (user [mrbs_users.id], room [mrbs_room.room_id], area [mrbs_area.area_id]).
However, when I execute this query (for the user (id=1) )
SELECT count(*)
FROM mrbs_users JOIN mrbs_entry ON mrbs_users.name=mrbs_entry.create_by
JOIN mrbs_room ON mrbs_entry.room_id = mrbs_room.id
JOIN mrbs_area ON mrbs_room.area_id = mrbs_area.id
WHERE mrbs_entry.start_time BETWEEN "145811700" and "1463985000"
or
mrbs_entry.end_time BETWEEN "1458120600" and "1463992200" and mrbs_users.id = 1
The result is the total number of reservations of every user, not just the user who has the id = 1.
So if anyone could help me.. Thanks in advance.
Use parentheses in the where clause whenever you have more than one condition. Your where is parsed as:
WHERE (mrbs_entry.start_time BETWEEN "145811700" and "1463985000" ) or
(mrbs_entry.end_time BETWEEN "1458120600" and "1463992200" and
mrbs_users.id = 1
)
Presumably, you intend:
WHERE (mrbs_entry.start_time BETWEEN 145811700 and 1463985000 or
mrbs_entry.end_time BETWEEN 1458120600 and 1463992200
) and
mrbs_users.id = 1
Also, I removed the quotes around the string constants. It is bad practice to mix data types, and in some databases, the conversion between types can make the query less efficient.
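The effect of AND binding more tightly than OR is easy to reproduce on a tiny table. This is a sketch with Python's sqlite3 and an invented entries table with made-up numbers; the same precedence rule applies in MySQL.

```python
import sqlite3

# Hypothetical table: one reservation for user 1, one for user 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (user_id INTEGER, start_time INTEGER, end_time INTEGER);
    INSERT INTO entries VALUES (1, 100, 200), (2, 150, 250);
""")

# Without parentheses: a OR (b AND c). The user filter only applies
# to the end_time branch, so user 2 slips in via start_time.
without_parens = conn.execute("""
    SELECT COUNT(*) FROM entries
    WHERE start_time BETWEEN 90 AND 160
       OR end_time BETWEEN 190 AND 260 AND user_id = 1
""").fetchone()[0]

# With parentheses: (a OR b) AND c. The user filter applies to both.
with_parens = conn.execute("""
    SELECT COUNT(*) FROM entries
    WHERE (start_time BETWEEN 90 AND 160
           OR end_time BETWEEN 190 AND 260)
      AND user_id = 1
""").fetchone()[0]

print(without_parens, with_parens)
```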
The problem is caused by how the conditions in the WHERE clause are grouped: AND binds more tightly than OR, so the user filter only applies to the second range.
So it should be:
WHERE (mrbs_entry.start_time BETWEEN 145811700 AND 1463985000
OR
mrbs_entry.end_time BETWEEN 1458120600 AND 1463992200)
AND mrbs_users.id = 1
Moreover, when you use only INNER JOIN (JOIN), you can also put the filter in the ON clause. For inner joins the result is the same, and query planners generally treat both forms alike, so this is a matter of style rather than speed.
Your query in this case would look like this:
SELECT COUNT(*)
FROM mrbs_users
JOIN mrbs_entry ON mrbs_users.name=mrbs_entry.create_by
JOIN mrbs_room ON mrbs_entry.room_id = mrbs_room.id
AND
(mrbs_entry.start_time BETWEEN 145811700 AND 1463985000
OR mrbs_entry.end_time BETWEEN 1458120600 AND 1463992200
)
AND mrbs_users.id = 1
JOIN mrbs_area ON mrbs_room.area_id = mrbs_area.id

Select a field called "return" in postgreSQL

I'm having a problem with a query in Postgres: the table cgporders_items has a field called return, and I cannot get the actual value of that field with this query; it returns all zeros.
SELECT "Cgporder".id AS "Cgporder__id"
,"Sale".preorder_number AS "Sale__preorder_number"
,"Contact".id AS "Contact__id"
,"Contact".NAME AS "Contact__name"
,"Ptype".NAME AS "Ptype__name"
,(
SELECT code
FROM products
WHERE id = "CgporderItem".parent_id
) AS "Product__parent_code"
,"Product".id AS "Product__id"
,"Product".code AS "Product__code"
,"Product".NAME AS "Product__name"
,"CgporderItem".quantity AS "CgporderItem__quantity"
,"CgporderItem".return AS "CgporderItem__return"
,"CgporderItem".cep_id AS "CgporderItem__cep"
FROM cgporders AS "Cgporder"
INNER JOIN contacts AS "Contact" ON ("Contact".id = "Cgporder".contact_id)
INNER JOIN cgporders_items AS "CgporderItem" ON ("Cgporder".id = "CgporderItem".cgporder_id)
INNER JOIN products AS "Product" ON ("Product".id = "CgporderItem".product_id)
INNER JOIN ptypes AS "Ptype" ON ("Ptype".id = "Product".ptype_id)
LEFT JOIN cgporders_sales AS "CgporderSale" ON ("Cgporder".id = "CgporderSale".cgporder_id)
LEFT JOIN sales AS "Sale" ON ("Sale".id = "CgporderSale".sale_id)
WHERE "CgporderItem".parent_id != 0
AND "Cgporder"."issue_date" >= '2015-11-27'
AND "Cgporder"."issue_date" <= '2015-11-27'
AND "Cgporder"."status" = 'confirmed'
ORDER BY "Ptype".NAME
,"Product"."code";
There are actually a lot of rows that match the select condition, but the query returns zero for "CgporderItem".return AS "CgporderItem__return"
If I make a simple query like select "return" from cgporders_items it works. But in this query it does not work.
Can you help me please?
"return" is a reserved word in SQL, but not in Postgres. See the list here. The following code works fine in Postgres (SQL Fiddle is here):
create table dum (return int);
select dum.return from dum;
Your problem is something else. If I had to guess, the where clause is too restrictive (the condition on dates is a bit suspect).

SQL Reporting count of parameter in a column

I am working in SSRS 3.0 with a SQL table including the following fields:
ApptID BookedBy ConfirmedBy CancelledBy
I also have a parameter set up to select which users to filter by (it matches data in the BookedBy, ConfirmedBy and CancelledBy columns) called #Scheduler (which is a multi-value parameter/array).
I need to get a count for booked, confirmed and scheduled for how many times any value in the Scheduler parameter shows up in that column.
Basically:
COUNT(BookedBy IN (#Scheduler)) AS BookedCount
Can anyone help me out with the syntax for doing this?
Try this
SELECT SUM(CASE WHEN BookedBy IN (#Scheduler) THEN 1 ELSE 0 END) as [BookedCount],
SUM(CASE WHEN ConfirmedBy IN (#Scheduler) THEN 1 ELSE 0 END) as [ConfirmedCount],
SUM(CASE WHEN CancelledBy IN (#Scheduler) THEN 1 ELSE 0 END) as [CancelledCount]
FROM tablename
WHERE BookedBy IN (#Scheduler) OR
ConfirmedBy IN (#Scheduler) OR
CancelledBy IN (#Scheduler)
NB: not tested, might contain typos.
If your input is a list separated by commas you can convert that to a table. See a reference like this:
http://www.projectdmx.com/tsql/sqlarrays.aspx
For this use case I'd recommend one of the solutions that saves the result in a CTE (since you only need to convert your input once and this will be fastest)
Then you could use that table (called sTable with column name) like this:
SELECT Count(Bo.Name) as [BookedCount],
Count(Co.Name) as [ConfirmedCount],
Count(Ca.Name) as [CancelledCount]
FROM tablename
LEFT JOIN sTable Bo ON BookedBy = Bo.name
LEFT JOIN sTable Co ON ConfirmedBy = Co.name
LEFT JOIN sTable Ca ON CancelledBy = Ca.name
I guess this will work but it does not seem as nice as the others:
SELECT (SELECT COUNT(*) FROM tablename WHERE BookedBy in (#Scheduler)) AS [BookedCount],
(SELECT COUNT(*) FROM tablename WHERE ConfirmedBy in (#Scheduler)) as [ConfirmedCount],
(SELECT COUNT(*) FROM tablename WHERE CancelledBy in (#Scheduler)) as [CancelledCount]
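The per-column counting can also be done in a single pass with conditional aggregation. The sketch below uses Python's sqlite3 instead of SSRS and SQL Server; the appts table, its sample rows and the schedulers list (standing in for the multi-value Scheduler parameter) are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE appts (appt_id INTEGER, booked_by TEXT,
                        confirmed_by TEXT, cancelled_by TEXT);
    INSERT INTO appts VALUES (1, 'ann', 'bob', NULL),
                             (2, 'cat', 'ann', 'bob'),
                             (3, 'bob', 'cat', NULL),
                             (4, 'cat', 'cat', 'cat');
""")

# Stand-in for the multi-value report parameter.
schedulers = ["ann", "bob"]
marks = ",".join("?" * len(schedulers))  # one placeholder per value

# Each SUM counts the rows whose column value is in the list;
# NULLs fall through to the ELSE branch and count as 0.
sql = f"""
    SELECT SUM(CASE WHEN booked_by    IN ({marks}) THEN 1 ELSE 0 END),
           SUM(CASE WHEN confirmed_by IN ({marks}) THEN 1 ELSE 0 END),
           SUM(CASE WHEN cancelled_by IN ({marks}) THEN 1 ELSE 0 END)
    FROM appts
"""
counts = conn.execute(sql, schedulers * 3).fetchone()
print(counts)
```

Only the placeholders are interpolated into the SQL text; the values themselves are passed as bound parameters.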