Making select query more efficient (subquery slows run speed) - sql

The query below has taken forever to run ever since I added the subquery to it.
I originally tried to accomplish my goal by having two joins but the results were wrong.
Does anyone know the correct way to write this?
SELECT
c.cus_Name,
COUNT(o.orderHeader_id) AS Orders,
(select count(ol.orderLines_id) from orderlines ol where ol.orderLines_orderId = o.orderHeader_id) as linesOrderd,
MAX(o.orderHeader_dateCreated) AS lastOrdered,
SUM(o.orderHeader_totalSell) AS orderTotal,
SUM(o.orderHeader_currentSell) AS sellTotal
FROM
cus c
JOIN
orderheader o ON o.orderHeader_customer = c.cus_id
group by
c.cus_name
order by
orderTotal desc

For the data you want, I think this is the way to go:
SELECT c.cus_Name,
COUNT(o.orderHeader_id) AS Orders,
SUM(ol.cnt) as linesOrderd,
MAX(o.orderHeader_dateCreated) AS lastOrdered,
SUM(o.orderHeader_totalSell) AS orderTotal,
SUM(o.orderHeader_currentSell) AS sellTotal
FROM cus c JOIN
orderheader o
ON o.orderHeader_customer = c.cus_id LEFT JOIN
(SELECT ol.orderLines_orderId, count(*) as cnt
FROM orderlines ol
GROUP BY ol.orderLines_orderId
) ol
ON ol.orderLines_orderId = o.orderHeader_id
GROUP BY c.cus_name
ORDER BY orderTotal DESC;
I'm not sure if it will be much faster, but it will at least produce a sensible result -- the total number of order lines for a customer rather than the number of order lines on an arbitrary order.
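To see why the order lines are counted in a derived table before joining, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question, but the rows are made up for illustration and SQLite stands in for whatever database you actually run.

```python
import sqlite3

# Hypothetical miniature data set; names follow the question, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cus (cus_id INTEGER PRIMARY KEY, cus_Name TEXT);
CREATE TABLE orderheader (orderHeader_id INTEGER PRIMARY KEY,
                          orderHeader_customer INTEGER,
                          orderHeader_totalSell REAL);
CREATE TABLE orderlines (orderLines_id INTEGER PRIMARY KEY,
                         orderLines_orderId INTEGER);
INSERT INTO cus VALUES (1, 'Acme');
INSERT INTO orderheader VALUES (10, 1, 100.0), (11, 1, 50.0);
-- order 10 has three lines, order 11 has none
INSERT INTO orderlines VALUES (1, 10), (2, 10), (3, 10);
""")

# Joining the raw lines repeats each header once per line, inflating the totals.
naive = conn.execute("""
SELECT c.cus_Name, COUNT(o.orderHeader_id), SUM(o.orderHeader_totalSell)
FROM cus c
JOIN orderheader o ON o.orderHeader_customer = c.cus_id
LEFT JOIN orderlines ol ON ol.orderLines_orderId = o.orderHeader_id
GROUP BY c.cus_Name
""").fetchone()

# Counting lines per order first keeps one row per header in the outer join.
fixed = conn.execute("""
SELECT c.cus_Name,
       COUNT(o.orderHeader_id),
       COALESCE(SUM(ol.cnt), 0),
       SUM(o.orderHeader_totalSell)
FROM cus c
JOIN orderheader o ON o.orderHeader_customer = c.cus_id
LEFT JOIN (SELECT orderLines_orderId, COUNT(*) AS cnt
           FROM orderlines
           GROUP BY orderLines_orderId) ol
       ON ol.orderLines_orderId = o.orderHeader_id
GROUP BY c.cus_Name
""").fetchone()

print(naive)  # order count and total are inflated by the repeated header rows
print(fixed)  # one row per order header, so the aggregates are correct
```

The COALESCE is my addition: with a LEFT JOIN, a customer whose orders have no lines at all would otherwise get NULL rather than 0.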

Strictly speaking, that subselect should not even be allowed: it references o.orderHeader_id, which is neither grouped nor aggregated, so the count is only very indirectly related to the grouping. Do you want to count all order lines of all orders that belong to one customer? Normally this would be done with a second join, but then each orderheader row is repeated once per order line, which produces wrong results in the other aggregations.
Moving the subselect into the joined table should help.
Could you replace orderheader o with
(select o.*, (select count(ol.orderLines_id) from orderlines ol where ol.orderLines_orderId = o.orderHeader_id) as linesOrder from orderheader o) as o
and replace the subselect in the SELECT list with
sum(o.linesOrder)

Related

oracle sql statement help to query against multiple tables

I am struggling with a SQL statement and hoping a guru can help a beginner out. Currently I have multiple SELECT ... IN statements, but I think there is a better way, as I have been stuck.
Below are the tables and pertinent columns in each table
country
-country_id
barcodes_banned_in_country
-barcode(varchar)
-country_id
-country_name
orders
-order_id
-country_name
item
-order_id
-item_id
-barcode(varchar)
The goal is to get all orders that are banned based off the barcode banned list.
Any help with this sql statement would be appreciated.
One option uses exists:
select o.*
from orders o
where exists (
select 1
from barcodes_banned_in_country bic
inner join item i on i.barcode = bic.barcode
where i.order_id = o.order_id and bic.country_name = o.country_name
)
This returns all orders in which at least one item has a barcode that is banned in the order's country.
If, on the other hand, you want the list of the banned barcodes per order, then you can join and aggregate:
select o.order_id, o.country_name, listagg(i.barcode, ',') banned_barcodes
from orders o
inner join item i
on i.order_id = o.order_id
inner join barcodes_banned_in_country bic
on i.barcode = bic.barcode
and bic.country_name = o.country_name
group by o.order_id, o.country_name
Note that, as commented by MT0, you should really be storing the id of the country in orders rather than the country name. Accordingly, you wouldn't need the country name in the banned barcodes table.
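As a rough illustration of why EXISTS is convenient here, the sketch below runs the first query on a few invented rows and compares it with a plain join. It uses Python's sqlite3 module rather than Oracle, so treat it as a demonstration of the logic only; the sample data is an assumption.

```python
import sqlite3

# Invented sample rows; only the columns the queries need are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE barcodes_banned_in_country (barcode TEXT, country_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE item (item_id INTEGER PRIMARY KEY, order_id INTEGER, barcode TEXT);
INSERT INTO barcodes_banned_in_country VALUES ('A1', 'France'), ('B2', 'Spain');
INSERT INTO orders VALUES (1, 'France'), (2, 'Spain'), (3, 'France');
-- order 1 contains the banned barcode twice; orders 2 and 3 contain
-- barcodes that are banned only in some other country
INSERT INTO item VALUES (1, 1, 'A1'), (2, 1, 'A1'), (3, 2, 'A1'), (4, 3, 'B2');
""")

# EXISTS answers "is there at least one banned item?" once per order.
exists_rows = conn.execute("""
SELECT o.order_id, o.country_name
FROM orders o
WHERE EXISTS (
    SELECT 1
    FROM barcodes_banned_in_country bic
    INNER JOIN item i ON i.barcode = bic.barcode
    WHERE i.order_id = o.order_id AND bic.country_name = o.country_name
)
ORDER BY o.order_id
""").fetchall()

# A plain join returns the order once per banned item instead.
join_rows = conn.execute("""
SELECT o.order_id
FROM orders o
INNER JOIN item i ON i.order_id = o.order_id
INNER JOIN barcodes_banned_in_country bic
        ON i.barcode = bic.barcode AND bic.country_name = o.country_name
ORDER BY o.order_id
""").fetchall()

print(exists_rows)
print(join_rows)
```

With the join you would need DISTINCT or GROUP BY to get back to one row per order, which is what the aggregate version above does.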

Order of Execution of Subqueries in SQL

SELECT customerid,
(SELECT COUNT(*)
FROM orders
WHERE customers.customerid = orders.customerid) as total_orders
FROM customers
Can anyone explain how this SQL code works? As I understand it, the subquery should always return the same number, because the total number of rows where
customers.customerid = orders.customerid is the same. But it displays each customer and the total_orders made by him/her. What order of execution produces this?
Please find the database here:
https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_distinct
Your query is:
SELECT c.customerid,
(SELECT COUNT(*)
FROM orders o
WHERE c.customerid = o.customerid
) as total_orders
FROM customers c;
(Note that I added table aliases and qualified all column names.)
This is a scalar, correlated subquery. It is a scalar subquery because it returns a single value (rather than a table).
It is correlated because the subquery is linked to the outer query. This is the part that confuses you.
Basically, the outer query says that the result set will have one row for each customer.
The subquery then says that, for each of those customer rows, the value is the number of rows in orders that match that row's customer.
Although writing the query with a subquery is totally fine, this would often be written as:
SELECT c.customerid, COUNT(o.customerid) as total_orders
FROM customers c LEFT JOIN
orders o
ON c.customerid = o.customerid
GROUP BY c.customerId
You are using a correlated subquery, which means the inner query is evaluated once for each row of the outer query.
In your case, the inner query is evaluated for every customer because of the where clause customers.customerid = orders.customerid. So the aggregate function COUNT(*) returns the number of orders for each customer. Since your outer query selects customerid and total_orders, you get two columns.
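If it helps to see this concretely, here is a small sketch (Python's sqlite3 module, with made-up rows rather than the w3schools data) that runs the correlated subquery and the LEFT JOIN / GROUP BY form side by side.

```python
import sqlite3

# Made-up rows: customer 1 has two orders, customer 2 has one, customer 3 none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerid INTEGER PRIMARY KEY);
CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid INTEGER);
INSERT INTO customers VALUES (1), (2), (3);
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# The subquery is re-evaluated for each outer row, with c.customerid fixed
# to that row's value.
correlated = conn.execute("""
SELECT c.customerid,
       (SELECT COUNT(*)
        FROM orders o
        WHERE o.customerid = c.customerid) AS total_orders
FROM customers c
ORDER BY c.customerid
""").fetchall()

# Equivalent formulation: join first, then count per customer.
joined = conn.execute("""
SELECT c.customerid, COUNT(o.customerid) AS total_orders
FROM customers c
LEFT JOIN orders o ON c.customerid = o.customerid
GROUP BY c.customerid
ORDER BY c.customerid
""").fetchall()

print(correlated)
print(joined)
```

Note that the join version counts o.customerid rather than *, so a customer with no orders gets 0 instead of 1.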

SQL a SELECT within a SELECT? Northwind (Microsoft)

First of all, I'm practicing with Northwind database (Microsoft creation).
The table design I'm working with is:
The question I'm trying to solve is:
Which Product is the most popular? (number of items)
Well, my query was:
SELECT DISTINCT
P.ProductName
FROM
Products P,
[Order Details] OD,
Orders O,
Customers C
WHERE
C.CustomerID = O.CustomerID
and O.OrderID = OD.OrderID
and OD.ProductID = P.ProductID
and P.UnitsInStock = (SELECT MAX(P.UnitsInStock) Items
FROM Products P)
Now, I had exactly one result as they asked:
ProductName
1 Rhönbräu Klosterbier
Yet I doubt that my query was good. Do I really need a SELECT within a SELECT?
It feels like duplication for some reason.
Any help would be appreciated. Thanks.
To get the most popular (best-selling) product, use a query like this:
SELECT TOP 1 P.ProductName,
SUM(OD.Quantity) AS SumQuantity
FROM [Order Details] OD
INNER JOIN Products P ON
P.ProductID = OD.ProductID
GROUP BY P.ProductID, P.ProductName
ORDER BY SumQuantity DESC;
Does the most units in stock necessarily equate to the most popular product? I don't think that is always a true statement (It could even be the opposite in fact.).
To me the question is asking, which is the most popular product sold. If you think about it that way, you'd be looking at the amount sold for each product and selecting the product with the most sold.
Does that make sense?
With regard to your specific query: it effectively uses only the Products table. The joins contribute no columns to the output or to the filter; they only repeat each product once per matching order line, which is why you needed DISTINCT.
I would personally rewrite your query as the following:
SELECT
P.ProductName
FROM
Products P
INNER JOIN
(SELECT
MAX(P.UnitsInStock) AS Items
FROM Products P) maxProd
ON P.UnitsInStock= maxProd.Items
About your question, it is perfectly acceptable to utilize a subquery (the select in the where clause). It is even necessary at times. Most of the time I would use an Inner Join like I did above, but if the dataset is small enough, it shouldn't make much difference with query time.
In this scenario, you should rethink the question that is being asked and think about what being the most popular item means.
Rethinking the problem:
Let's look at the datasets that you've shown above. Which could be used to tell you how many products have been sold? A customer would have to order a product, right? Looking at the two tables that are potentially applicable, one contains details about number of items sold, quantity, or you could think of popularity in terms of the number of times appearing in orders. Start with that dataset and use a similar methodology to what you've done, but perhaps you'll have to use a sum and group by. Why? Perhaps more than one customer bought the item.
The problem with the dataset is it doesn't tell you the name of the product. It only gives you the ID. There is a table though that has this information. Namely, the Products table. You'll notice that both tables have the Product ID variable, and you are able to join on this.
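A minimal sketch of that line of reasoning, run through Python's sqlite3 module on a few invented rows. Only the columns needed here are created, and SQLite's LIMIT 1 stands in for SQL Server's TOP 1, so this shows the shape of the query rather than exact Northwind syntax.

```python
import sqlite3

# Invented rows; the real Northwind tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE [Order Details] (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);
INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chang');
-- Chai is sold on two orders (10 + 5 units), Chang on one (12 units)
INSERT INTO [Order Details] VALUES (100, 1, 10), (101, 1, 5), (100, 2, 12);
""")

# Sum the quantity per product, join to Products for the name,
# then keep the row with the largest total.
best = conn.execute("""
SELECT p.ProductName, SUM(od.Quantity) AS total_sold
FROM [Order Details] od
JOIN Products p ON p.ProductID = od.ProductID
GROUP BY p.ProductID, p.ProductName
ORDER BY total_sold DESC
LIMIT 1
""").fetchone()

print(best)
```

Swapping SUM(od.Quantity) for COUNT(od.OrderID) would rank by the number of orders a product appears on instead of units sold, which is the other reading of "popular".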
You can find the most popular product by counting the number of orders placed for each product. The one with the most orders is the most popular product.
The script below gives you the most popular product based on the number of orders placed.
;WITH cte_1
AS(
SELECT p.ProductID,ProductName, count(OrderID) CNT
FROM Products p
JOIN [Order Details] od ON p.ProductID=od.ProductID
GROUP BY p.ProductID,ProductName)
SELECT top 1 ProductName
FROM cte_1
ORDER BY CNT desc
If you want every product that shares the top order count, use 'with ties':
;WITH cte_1
AS(
SELECT p.ProductID,ProductName, count(OrderID) CNT
FROM Products p
JOIN [Order Details] od ON p.ProductID=od.ProductID
GROUP BY p.ProductID,ProductName)
SELECT top 1 with ties ProductName
FROM cte_1
ORDER BY CNT desc
In your sample code, you tried to pull the product with the maximum stock held. Since you joined with other tables (like Order Details), you get multiple results for the same product. If you want the product with the maximum stock, you can use any of the following scripts.
SELECT ProductName
FROM Products P
WHERE P.UnitsInStock = (SELECT MAX(P.UnitsInStock) Items
FROM Products P)
OR
SELECT top 1 ProductName
FROM Products P
ORDER BY P.UnitsInStock desc
OR
SELECT top 1 with ties ProductName --use with ties in order to pull top products having same UnitsInStock
FROM Products P
ORDER BY P.UnitsInStock desc

How can I refine this query to narrow the returned resultset

I have the following Table structure:
Against which I am running the following query;
SELECT DISTINCT c.VatCountryCode,
od.IntrastatCommodityCode,
CAST(ROUND(SUM(od.Quantity * od.UnitCost/vt.ConversionFactor),2) AS DECIMAL(12,2)) AS Value,
SUM(od.Quantity) AS Quantity
FROM Contacts c
INNER JOIN OrderHeaders oh ON c.ContactId = oh.CustomerId
INNER JOIN OrderDetails od ON oh.OrderId = od.OrderId
INNER JOIN VatTransactions vt ON c.ContactId = vt.OriginatingContactId
WHERE c.VatCountryCode <> RTRIM('GB')
AND c.ContactId = vt.originatingContactId
AND vt.VatTransactionDate Between '20160101' AND '20160229'
GROUP BY VatCountryCode,IntrastatCommodityCode
ORDER BY VatCountryCode,IntrastatCommodityCode
Which produces the results illustrated:
The results returned in the value and quantity are way too high given that it should only be referencing a two month period, so I'm guessing that it is in fact pulling results from all of the records in the OrderDetails Table.
The InvoiceId and the ContactId are referenced in the VatTransactions Table which I'm using to set the date span for the query on.
I suspect (but am not sure) that this is a case where I shouldn't be using only inner joins. If anyone could suggest where to make alterations (and why, so that I can put that knowledge to use in future), I'd be most grateful.
I think it might be this line:
vt.VatTransactionDate Between '20160101' AND '20160229'
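You are comparing the column against two string literals, so the result depends on how VatTransactionDate is stored and implicitly converted, which could give confusing results. I would try making the types explicit:
vt.VatTransactionDate >= CAST('20160101' AS DATE) AND vt.VatTransactionDate < CAST('20160301' AS DATE)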
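Separately from the date comparison, it may be worth checking whether the join to VatTransactions multiplies rows. This is only a guess, since the table structure isn't shown: the sketch below (Python's sqlite3 module, invented rows, only the columns needed) assumes a contact can have several VAT transactions in the period, in which case joining on the contact id alone repeats every order detail once per transaction.

```python
import sqlite3

# Invented rows: one contact, one order with quantity 2, three VAT transactions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (ContactId INTEGER PRIMARY KEY, VatCountryCode TEXT);
CREATE TABLE OrderHeaders (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER);
CREATE TABLE OrderDetails (OrderId INTEGER, Quantity INTEGER);
CREATE TABLE VatTransactions (OriginatingContactId INTEGER, VatTransactionDate TEXT);
INSERT INTO Contacts VALUES (1, 'FR');
INSERT INTO OrderHeaders VALUES (100, 1);
INSERT INTO OrderDetails VALUES (100, 2);
INSERT INTO VatTransactions VALUES
    (1, '2016-01-05'), (1, '2016-01-20'), (1, '2016-02-10');
""")

# Same join shape as the question: VatTransactions is matched on contact only,
# so the single order detail row appears once per transaction.
joined_qty = conn.execute("""
SELECT SUM(od.Quantity)
FROM Contacts c
JOIN OrderHeaders oh ON c.ContactId = oh.CustomerId
JOIN OrderDetails od ON oh.OrderId = od.OrderId
JOIN VatTransactions vt ON c.ContactId = vt.OriginatingContactId
WHERE vt.VatTransactionDate BETWEEN '2016-01-01' AND '2016-02-29'
""").fetchone()[0]

# What was actually ordered.
actual_qty = conn.execute("SELECT SUM(Quantity) FROM OrderDetails").fetchone()[0]

print(joined_qty, actual_qty)
```

If your data looks like this, the totals will be too high regardless of the date filter, and the join would need to tie each transaction to a specific order or invoice rather than just to the contact.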

Improve efficiency of PostgreSQL Query - One to Many, Count is 1

I would like to improve the efficiency of the following query, if possible:
SELECT * FROM orders o
INNER JOIN order_items oi
ON o.id = oi.order_id
WHERE o.fulfilled = false
AND o.id NOT IN (SELECT order_id
FROM order_items
WHERE sku = '011111'
GROUP BY order_id
HAVING COUNT(order_id) = 1)
There is a one to many relationship between the orders and order_items tables (o.id = oi.order_id).
The goal is to select all of the information from two tables, with the following conditions:
The order has not been fulfilled (orders.fulfilled = false).
Exclude all of the orders that have exactly one order item with an SKU of '011111' (oi.sku like '011111').
Any help is appreciated!
IN can be slower, so I modified the query to use a join instead (a left join that keeps only the orders with no match in the subquery):
select * from orders o
inner join order_items oi
on o.id = oi.order_id
left join( select order_id
from order_items
where sku = '011111'
group by order_id
having count(order_id) = 1) T
on T.order_id = o.id
where o.fulfilled = false
and T.order_id is null
count(whatever) will usually force a full table scan (the engine has no idea how many order_items rows there are per order, and you cannot create an index on an aggregate), unless another clause can use an index. Most likely an sku not equaling something will not be selective enough (I'm guessing you have a lot of skus). You can look at the EXPLAIN output, and you will probably see a full table scan in the IN part of your query.
If that's the case, you have the option of caching the count data and indexing it, using a trigger function that updates a current_count column every time an order is placed or fulfilled. Or you could cache a query that keeps track of the count (say, if the information does not need to be refreshed very often).
Can we assume that an order can't have more than one item with the same sku on the same order?
Can we assume that you can't have an order with no items?
If so, writing the opposite might be faster. The query below finds all orders that have any sku other than '011111'. Also, correlated subqueries are often faster than non-correlated ones (although optimizers are smart enough to rewrite this a lot of the time), and an EXISTS clause is often faster than an IN clause, since the engine can stop as soon as it finds one matching row.
SELECT *
FROM orders o
INNER JOIN order_items oi
ON o.id = oi.order_id
WHERE o.fulfilled = false
AND EXISTS (SELECT 'x'
FROM order_items oi2
WHERE o.id = oi2.order_id
AND oi2.sku != '011111')
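One thing to check before switching: the EXISTS version and the original NOT IN version only agree if no order mixes '011111' with other skus. A small sketch with invented rows (Python's sqlite3 module standing in for PostgreSQL, and 0 used for false) shows where they differ. It selects only order ids to keep the output short.

```python
import sqlite3

# Invented rows: order 1 holds only '011111', order 2 only another sku,
# order 3 holds both. All three are unfulfilled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, fulfilled INTEGER);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
INSERT INTO orders VALUES (1, 0), (2, 0), (3, 0);
INSERT INTO order_items VALUES
    (1, 1, '011111'),
    (2, 2, '022222'),
    (3, 3, '011111'), (4, 3, '022222');
""")

# Original formulation: drop orders that have exactly one '011111' item row.
not_in_ids = [row[0] for row in conn.execute("""
SELECT o.id
FROM orders o
WHERE o.fulfilled = 0
  AND o.id NOT IN (SELECT order_id
                   FROM order_items
                   WHERE sku = '011111'
                   GROUP BY order_id
                   HAVING COUNT(order_id) = 1)
ORDER BY o.id
""")]

# "Opposite" formulation: keep orders that have any item with another sku.
exists_ids = [row[0] for row in conn.execute("""
SELECT o.id
FROM orders o
WHERE o.fulfilled = 0
  AND EXISTS (SELECT 'x'
              FROM order_items oi2
              WHERE oi2.order_id = o.id
                AND oi2.sku != '011111')
ORDER BY o.id
""")]

print(not_in_ids)
print(exists_ids)
```

The mixed order (id 3) is dropped by NOT IN but kept by EXISTS, so which one is right depends on whether "exactly one order item with an SKU of '011111'" means the order's only item or just one of its items.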