Efficiency of joining subqueries in SQL Server

Efficiency of joining subqueries in SQL Server - sql

I have a customers and orders table in SQL Server 2008 R2. Both have indexes on the customer id (called id). I need to return details about all customers in the customers table and information from the orders table, such as details of the first order.
I currently left join my customers table on a subquery of the orders table, with the subquery returning the information I need about the orders. For example:
SELECT c.id
,c.country
,First_orders.product
,First_orders.order_id
FROM customers c
LEFT JOIN SELECT( id,
product
FROM (SELECT id
,product
,order_id
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY Order_Date asc) as order_No
FROM orders) orders
WHERE Order_no = 1) First_Orders
ON c.id = First_orders.id
I'm quite new to SQL and want to understand if I'm doing this efficiently. I end up left joining quite a few subqueries like this onto the customers table in one select query and it can take tens of minutes to run.
So am I doing this efficiently or can it be improved? For example, I'm not sure if my index on id in the orders table is of any use and maybe I could speed up the query by creating a temporary table of what is in the subquery first and creating a unique index on id in the temporary table so SQL Server knows id is now a unique column and then joining my customers table to this temporary table? I typically have one or two million rows in the customers and orders tables.
Many thanks in advance!

You can remove one of your subqueries to make it a little more efficient:
SELECT c.id
,c.country
,First_orders.product
,First_orders.order_id
FROM customers c
LEFT JOIN (SELECT id
,product
,order_id
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY Order_Date asc) as order_No
FROM orders) First_Orders
ON c.id = First_orders.id AND First_Orders.order_No = 1
In your above query, you need to be careful where you place your parentheses as I don't think it will work. Also, you're returning product in your results, but not including in your nested subquery.

For someone who is just learning SQL, your query looks pretty good.
The index on customers may or may not be used for the query -- you would need to look at the execution plan. An index on orders(id, order_date) could be used quite effectively for the row_number function.
One comment is on the naming of fields. The field orders.id should not be the customer id. That should be something like 'orders.Customer_Id`. Keeping the naming system consistent across tables will help you in the future.

Try this...its easy to understand
;WITH cte
AS (
SELECT id
,product
,order_id
,ROW_NUMBER() OVER (
PARTITION BY id ORDER BY Order_Date ASC
) AS order_No
FROM orders
)
SELECT c.id
,c.country
,c1.Product
,c1.order_id
FROM customers c
INNER JOIN cte c1 ON c.id = c1.id
WHERE c1.order_No = 1

Related

Making table consisting all customers and last orders of each customer

Lets say, I have two tables: Customers and Orders. I need to get result consisting all customers and their last order.
I am trying to make this query, because my bigger goal is to iterate over all customers and their last orders to get crucial information. I am trying to do this using cursor, so I need that table.
-edit-
I have MSSQL database on SQL 2014 server.
I have relation one-to-many, where customers have many orders.
I need to migrate data from one DB to another with different data schema. I thought about making sql script to get data from one DB and then using cursor and variables to insert data to a new one. There are not many records so performance is not an issue.

Let us have Customer(cid PK, and possibly other columns) and Orders(cid FK, order_time) having 1:N cardinality. The the solution can be along these lines:
select c.*, o.*
from customer c
join orders o on c.cid = o.cid
join
(
select cid, max(order_time) max_order_time
from orders
group by cid
) t on o.cid = t.cid and
o.order_time = t.max_order_time

My first thought is to use row_number():
select
from customer c join
(select o.*, row_number() over (partition by cid order by order_time desc) as seqnum
from orders o
where order_time < '2018-01-01' and order_time >= '2017-01-01'
) o
on c.cid = o.cid and o.seqnum = 1;

Can we use order by in subquery? If not why sometime could use top(n) order by?

I'm an entry level trying to learn more about SQL,
I have a question "can we use order by in subquery?" I did look for some article says no we could not use.
But on the other hand, I saw examples using top(n) with order by in subquery:
select c.CustomerId,
c.OrderId
from CustomerOrder c
inner join (
select top 2
with TIES CustomerId,
COUNT(distinct OrderId) as Count
from CustomerOrder
group by CustomerId
order by Count desc
) b on c.CustomerId = b.CustomerId
So now I'm bit confused.
Could anyone advise?
Thank you very much.

Yes, you are right we cannot use order by in a inner query. Because it is acting as a table. A table in itself needs to be sorted when queried for different purposes.
In your query itself the inner query is select some records using Top 2. Eventhough these are top 2 records only, they form a table with 2 records which is enough for it to recognized as a table and join it with another table
The right query will be:-
SELECT * FROM
(
SELECT c.CustomerId, c.OrderId, DENSE_RANK() OVER(ORDER BY b.count DESC) AS RANK
FROM CustomerOrder c
INNER JOIN
(SELECT CustomerId, COUNT(distinct OrderId) as Count
FROM CustomerOrder GROUP BY CustomerId) b
ON c.CustomerId = b.CustomerId
) a
WHERE RANK IN (1,2);
Hope I have answered your question.

Yes we can use order by clause in sub query, for example i have a table named as product (check the screen shot of table http://prntscr.com/f15j3z). Chek this query on your side and revert me in case of any doubt.
select p1.* from product as p1 where product_id = (select p2.product_id from product as p2 order by product_id limit 0,1)

yes we can use order by in subquery,but it is pointless to use it.
It is better to use it in the outer query.There is no use of ordering the result of subquery, because result of inner query will become the input for outer query and it does not have to do any thing with the order of the result of subquery.

A simple nested SQL statement

I have a question in SQL that I am trying to solve. I know that the answer is very simple but I just can not get it right. I have two tables, one with customers and the other one with orders. The two tables are connected using customer_id. The question is to list all the customers that did not make any order! The question is to be run in MapInfo Professional, a GIS desktop software, so not every SQL command is applicable to that program. In other words, I will be thankful if I get more than approach to solve that problem.
Here is how I have been thinking:
SELECT customer_id
from customers
WHERE order_id not in (select order_id from order)
and customer.customer_id = order.customer_id

How about this:
SELECT * from customers
WHERE customer_id not in (select customer_id from order)
The logic is, if we don't have a customer_id in order that means that customer has never placed an order. As you have mentioned that customer_id is the common key, hence above query should fetch the desired result.

SELECT c.customer_id
FROM customers c
LEFT JOIN orders o ON (o.customer_id = c.customer_id)
WHERE o.order_id IS NULL

... The NOT EXISITS way:
SELECT * FROM customers
WHERE NOT EXISTS (
SELECT * FROM orders
WHERE orders.customer_id = customer.customer_id
)

There are some problems with your approach:
There is probably no order_id in the customers table, but in your where-statement you refer to it
The alias (or table-name) order in the where-statement (order.customer_id) is not known because there is no join statement in there
If there would be a join, you would filter out all customers without orders, exactly the opposite of what you want
Your question is difficualt to answer to me because I do not know which SQL subset MapInfo GIS understands, but lets try:
select * from customers c where not exists (select * from order o where o.customer_id=c.customer_id)

Using SQL query to find details of customers who ordered > x types of products

Please note that I have seen a similar query here, but think my query is different enough to merit a separate question.
Suppose that there is a database with the following tables:
customer_table with customer_ID (key field), customer_name
orders_table with order_ID (key field), customer_ID, product_ID
Now suppose I would like to find the names of all the customers who have ordered more than 10 different types of product, and the number of types of products they ordered. Multiple orders of the same product does not count.
I think the query below should work, but have the following questions:
Is the use of count(distinct xxx) generally allowed with a "group by" statement?
Is the method I use the standard way? Does anybody have any better ideas (e.g. without involving temporary tables)?
Below is my query
select T1.customer_name, T1.customer_ID, T2.number_of_products_ordered
from customer_table T1
inner join
(
select cust.customer_ID as customer_identity, count(distinct ord.product_ID) as number_of_products_ordered
from customer_table cust
inner join order_table ord on cust.customer_ID=ord.customer_ID
group by ord.customer_ID, ord.product_ID
having count(distinct ord.product_ID) > 10
) T2
on T1.customer_ID=T2.customer_identity
order by T2.number_of_products_ordered, T1.customer_name

Isn't that what you are looking for? Seems to be a little bit simpler. Tested it on SQL Server - works fine.
SELECT customer_name, COUNT(DISTINCT product_ID) as products_count FROM customer_table
INNER JOIN orders_table ON customer_table.customer_ID = orders_table.customer_ID
GROUP BY customer_table.customer_ID, customer_name
HAVING COUNT(DISTINCT product_ID) > 10

You could do it more simply:
select
c.id,
c.cname,
count(distinct o.pid) as `uniques`
from o join c
on c.id = o.cid
group by c.id
having `uniques` > 10

SQL query for join with condition

I have these two tables:
Customers: Id, Name
Orders: Id, CustomerId, Time, Status
I want to get a list of customers for which the LAST order does not have a status of 'Wrong'.
I know how to use a LEFT JOIN to get a count of orders for each customer, but I don't know how I can use this statement for what I want. Maybe a JOIN is not the right thing to use too, I'm not sure.
It's possible that customers do not have any order, and they should be returned.
I'm abstracting the real tables here, but the scenario is for a windows phone app sending notifications. I want to get all clients for which their last notification does not have a 'Dropped' status. I can sort their notifications (orders) by the 'Time' field. Thanks for the help, while I continue experimenting with subqueries in the where clause.

Select ...
From Customers As C
Where Not Exists (
Select 1
From Orders As O1
Join (
Select O2.CustomerId, Max( O2.Time ) As Time
From Orders As O2
Group By O2.CustomerId
) As LastOrderTime
On LastOrderTime.CustomerId = O1.CustomerId
And LastOrderTime.Time = O1.Time
Where O1.Status = 'Dropped'
And O1.CustomerId = C.Id
)
There are obviously alternatives based on the actual database product and version. For example, in SQL Server one could use the TOP command or a CTE perhaps. However, without knowing what specific product is being used, the above solution should produce the results you want in almost any database product.
Addition
If you were using a product that supported ranking functions (which database product and version isn't mentioned) and common-table expressions, then an alternative solution might be something like so:
With RankedOrders As
(
Select O.CustomerId, O.Status
, Row_Number() Over( Partition By CustomerId Order By Time Desc ) As Rnk
From Orders As O
)
Select ...
From Customers
Where Not Exists (
Select 1
From RankedOrders As O1
Where O1.CustomerId = C.Id
And O1.Rnk = 1
And O1.Status = 'Dropped'
)

Assuming Last order refers to the Time column here is my query:
SELECT C.Id,
C.Name,
MAX(O.Time)
FROM
Customers C
INNER JOIN Orders O
ON C.Id = O.CustomerId
WHERE
O.Status != 'Wrong'
GROUP BY C.Id,
C.Name
EDIT:
Regarding your table configuration. You should really consider revising the structure to include a third table. They would look like this:
Customer
CustomerId | Name
Order
OrderId | Status | Time
CompletedOrders
CoId | CustomerId | OrderId
Now what you do is store the info about a customer or order in their respective tables ... then when an order is made you just create a CompletedOrders entry with the ids of the 2 individual records. This will allow for a 1 to Many relationship between customer and orders.

Didn't check it out, but something like this?
SELECT c.CustmerId, c.Name, MAX(o.Time)
FROM Customers c
LEFT JOIN Orders o ON o.CustomerId = c.CustomerId
WHERE o.Status <> 'Wrong'
GROUP BY c.CustomerId, C.Name

You can get list of customers with the LAST order which has status of 'Wrong' with something like
select customerId from orders where status='Wrong'
group by customerId
having time=max(time)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Efficiency of joining subqueries in SQL Server - sql

Try this...its easy to understand ;WITH cte AS ( SELECT id ,product ,order_id ,ROW_NUMBER() OVER ( PARTITION BY id ORDER BY Order_Date ASC ) AS order_No FROM orders ) SELECT c.id ,c.country ,c1.Product ,c1.order_id FROM customers c INNER JOIN cte c1 ON c.id = c1.id WHERE c1.order_No = 1

Related

Making table consisting all customers and last orders of each customer

Can we use order by in subquery? If not why sometime could use top(n) order by?

A simple nested SQL statement

Using SQL query to find details of customers who ordered > x types of products

SQL query for join with condition

Categories

Resources