SQL Cross Apply with three tables - sql

I am trying to combine 3 tables using Cross Apply in a time-efficient manner. I can get the results that I want, but the run time is too long. The three tables are:
-CUSTOMERS, which has the columns CustomerId(primary key) and CurrentSetType
-HISTORY, which has the columns CustomerId(foreign key), SetType, and TimeStamp
-UPDATELIST, which has the column CustomerId
My goal is to find the most recent SetType from HISTORY for each CustomerId in UPDATELIST that is different from the CurrentSetType (this is part of a glorified 'undo' button). I believe my problem is that the CUSTOMERS and HISTORY tables are enormous, and I don't think I'm getting them pared down to the smaller UPDATELIST before doing a cross apply on the entire thing. My current query is this:
DECLARE @UPDATELIST TABLE (Identifier INT NOT NULL PRIMARY KEY);
INSERT INTO @UPDATELIST (Identifier) VALUES (#####); -- a few hundred lines of this
SELECT CustomerId, ITEM.SetType
FROM CUSTOMERS
CROSS APPLY
(SELECT TOP 1 SetType FROM HISTORY
WHERE HISTORY.CustomerId IN (SELECT Identifier FROM @UPDATELIST)
AND HISTORY.CustomerId = CUSTOMERS.CustomerId
AND HISTORY.SetType != CUSTOMERS.CurrentSetType ORDER BY TimeStamp DESC) AS ITEM
What is the most efficient query for this?
EDIT: I am using MSDN SQL version 12.0.5532

My first thought would be something like this:
SELECT
CustomerID
, SetType
FROM
(
SELECT
C.CustomerID
, H.SetType
, ROW_NUMBER() OVER (PARTITION BY C.CustomerID ORDER BY H.TimeStamp DESC) R
FROM
CUSTOMERS C
JOIN UPDATELIST U ON U.CustomerId = C.CustomerId
JOIN HISTORY H ON
H.CustomerId = C.CustomerId
AND H.SetType <> C.CurrentSetType
) Q
WHERE R = 1
How does that work?
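To make the idea above easier to try out, here is a minimal sketch that runs the same ROW_NUMBER() query against a tiny in-memory database. It assumes SQLite (3.25 or later, for window functions) as a stand-in for SQL Server, and the sample rows are made up for illustration, so it says nothing about performance on the real tables.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (CustomerId INTEGER PRIMARY KEY, CurrentSetType TEXT);
CREATE TABLE HISTORY (CustomerId INTEGER, SetType TEXT, TimeStamp TEXT);
CREATE TABLE UPDATELIST (CustomerId INTEGER PRIMARY KEY);
INSERT INTO CUSTOMERS VALUES (1, 'B'), (2, 'A'), (3, 'C');
INSERT INTO HISTORY VALUES
  (1, 'A', '2020-01-01'), (1, 'C', '2020-02-01'), (1, 'B', '2020-03-01'),
  (2, 'A', '2020-01-01'),
  (3, 'A', '2020-01-01');
INSERT INTO UPDATELIST VALUES (1), (2);
""")

# Join down to UPDATELIST first, then keep the newest differing SetType.
rows = conn.execute("""
SELECT CustomerId, SetType
FROM (
  SELECT C.CustomerId, H.SetType,
         ROW_NUMBER() OVER (PARTITION BY C.CustomerId
                            ORDER BY H.TimeStamp DESC) AS R
  FROM CUSTOMERS C
  JOIN UPDATELIST U ON U.CustomerId = C.CustomerId
  JOIN HISTORY H ON H.CustomerId = C.CustomerId
                AND H.SetType <> C.CurrentSetType
) Q
WHERE R = 1
ORDER BY CustomerId
""").fetchall()

print(rows)
```

Customer 2 has no history entry that differs from its current type and customer 3 is not in UPDATELIST, so only customer 1 comes back.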

Related

Making table consisting all customers and last orders of each customer

Let's say I have two tables: Customers and Orders. I need a result containing all customers and their last order.
I am trying to write this query because my bigger goal is to iterate over all customers and their last orders to get crucial information. I am trying to do this using a cursor, so I need that table.
-edit-
I have an MSSQL database on a SQL Server 2014 server.
I have a one-to-many relation, where customers have many orders.
I need to migrate data from one DB to another with a different data schema. I thought about writing a SQL script to get data from one DB and then using a cursor and variables to insert the data into the new one. There are not many records, so performance is not an issue.
Let us have Customer(cid PK, and possibly other columns) and Orders(cid FK, order_time) with 1:N cardinality. Then the solution can be along these lines:
select c.*, o.*
from customer c
join orders o on c.cid = o.cid
join
(
select cid, max(order_time) max_order_time
from orders
group by cid
) t on o.cid = t.cid and
o.order_time = t.max_order_time
My first thought is to use row_number():
select c.*, o.*
from customer c join
(select o.*, row_number() over (partition by cid order by order_time desc) as seqnum
from orders o
where order_time < '2018-01-01' and order_time >= '2017-01-01'
) o
on c.cid = o.cid and o.seqnum = 1;
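The two approaches above differ when a customer has two orders with the same latest order_time: the MAX() join returns both, while row_number() returns exactly one. The sketch below shows this on made-up data. It assumes SQLite (3.25+) in place of SQL Server, a hypothetical order_id column used as a tie-breaker, and it leaves out the date filter.

```python
import sqlite3

# Hypothetical schema and rows; customer 1 has two orders tied for "latest".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cid INTEGER, order_time TEXT);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES
  (10, 1, '2017-03-01'), (11, 1, '2017-06-01'), (12, 1, '2017-06-01'),
  (20, 2, '2017-05-01');
""")

# Approach 1: join back on MAX(order_time); ties produce several rows.
max_join = conn.execute("""
SELECT c.cid, o.order_id
FROM customer c
JOIN orders o ON c.cid = o.cid
JOIN (SELECT cid, MAX(order_time) AS max_order_time
      FROM orders GROUP BY cid) t
  ON o.cid = t.cid AND o.order_time = t.max_order_time
ORDER BY c.cid, o.order_id
""").fetchall()

# Approach 2: row_number() with a tie-breaker; always one row per customer.
ranked = conn.execute("""
SELECT c.cid, o.order_id
FROM customer c
JOIN (SELECT o.*, ROW_NUMBER() OVER (PARTITION BY cid
                                     ORDER BY order_time DESC, order_id DESC) AS seqnum
      FROM orders o) o
  ON c.cid = o.cid AND o.seqnum = 1
ORDER BY c.cid
""").fetchall()

print(max_join)
print(ranked)
```

If a cursor is going to walk the result, the one-row-per-customer guarantee of the second form is probably the safer choice.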

Returning IDs from two other tables or null if no IDs found using a left join SQL Server

I am wondering if someone could help me. I am trying to join two tables and return an id if one is there; if there is no id, I want to return null but still return the row for that product rather than drop it. My query below returns twice the number of records, and I cannot figure out why.
SELECT
T2.ProductID, FirstChild.SupplierID, SecondChild.AccountID
FROM
Products T2
LEFT OUTER JOIN
(
SELECT TOP(1) SupplierID, Reference,CompanyID, Row_Number() OVER (Partition By SupplierID Order By SupplierID) AS RowNo FROM Suppliers
)
FirstChild ON T2.SupplierReference = FirstChild.Reference AND RowNo = 1 AND FirstChild.CompanyID = T2.CompanyID
LEFT OUTER JOIN
(
SELECT TOP(1) AccountID, SageKey,CompanyID, Row_Number() OVER (Partition By AccountID Order By AccountID) AS RowNo2 FROM Accounts
)
SecondChild ON T2.ProductAccountReference = SecondChild.Reference AND RowNo2 = 1 AND SecondChild.CompanyID =T2.CompanyID
Example of what I am trying to do
ProductID  SupplierID  AccountID
1          5           2
2          6           NULL
3          NULL        NULL
Using OUTER APPLY and ditching the ROW_NUMBER seems like a better choice here:
SELECT
p.ProductId
,FirstChild.SupplierId
,SecondChild.AccountId
FROM
Products p
OUTER APPLY (SELECT TOP (1) s.SupplierId
FROM
Suppliers s
WHERE
p.SupplierReference = s.Reference
AND p.CompanyId = s.CompanyId
ORDER BY
s.SupplierId
) FirstChild
OUTER APPLY (SELECT TOP (1) a.AccountId
FROM
Accounts a
WHERE
p.ProductAccountReference = a.Reference
AND p.CompanyId = a.CompanyId
ORDER BY
a.AccountID
) SecondChild
As your query is written above, the derived tables are not correlated with the outer query. That means TOP(1) returns whichever SupplierId SQL Server happens to pick based on optimization, and if that row does not match the product you won't get a value. You need to correlate the subquery with the outer table and select TOP 1; adding an ORDER BY in the derived table is like identifying the row number you want.
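A small runnable sketch of the "one correlated row per product" idea follows. SQLite has no OUTER APPLY, so this swaps in correlated scalar subqueries with LIMIT 1, which give the same result when only one column per lookup is needed. The table layout (a Reference column on both Suppliers and Accounts) and all sample rows are assumptions made for the illustration.

```python
import sqlite3

# Hypothetical data: product 1 matches two suppliers and two accounts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, SupplierReference TEXT,
                       ProductAccountReference TEXT, CompanyId INTEGER);
CREATE TABLE Suppliers (SupplierId INTEGER PRIMARY KEY, Reference TEXT, CompanyId INTEGER);
CREATE TABLE Accounts (AccountId INTEGER PRIMARY KEY, Reference TEXT, CompanyId INTEGER);
INSERT INTO Products VALUES (1, 'S1', 'A1', 1), (2, 'S2', 'A9', 1), (3, 'S9', 'A9', 1);
INSERT INTO Suppliers VALUES (5, 'S1', 1), (7, 'S1', 1), (6, 'S2', 1);
INSERT INTO Accounts VALUES (2, 'A1', 1), (3, 'A1', 1);
""")

# Plain LEFT JOINs multiply rows when a product matches several children.
plain_join_count = conn.execute("""
SELECT COUNT(*)
FROM Products p
LEFT JOIN Suppliers s ON s.Reference = p.SupplierReference AND s.CompanyId = p.CompanyId
LEFT JOIN Accounts a ON a.Reference = p.ProductAccountReference AND a.CompanyId = p.CompanyId
""").fetchone()[0]

# Correlated "top 1" lookups: one row per product, NULL when nothing matches.
rows = conn.execute("""
SELECT p.ProductId,
       (SELECT s.SupplierId FROM Suppliers s
         WHERE s.Reference = p.SupplierReference AND s.CompanyId = p.CompanyId
         ORDER BY s.SupplierId LIMIT 1) AS SupplierId,
       (SELECT a.AccountId FROM Accounts a
         WHERE a.Reference = p.ProductAccountReference AND a.CompanyId = p.CompanyId
         ORDER BY a.AccountId LIMIT 1) AS AccountId
FROM Products p
ORDER BY p.ProductId
""").fetchall()

print(plain_join_count)
print(rows)
```

On SQL Server the OUTER APPLY form is preferable when more than one column is needed from the matched row.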
If it's just showing duplicate records, wouldn't an inelegant solution just be to add distinct in the select line?

Easiest Approach to Selecting Top Result from Windowed Function?

So, let's say I have a list of customers and I want to select details for all customers, as well as their most purchased product from a specific class of products. Even if they have not purchased one of these products I want to select the customer detail, while simply displaying null for their most purchased product from that class.
I would start with the following either as a CTE or temp table:
SELECT
CUST_NUMBER
,PRODUCT
,ROW_NUMBER() OVER (PARTITION BY CUST_NUMBER ORDER BY COUNT(ORDER_NUM) DESC) [ProdRank]
FROM ORDERS
WHERE PROD_CLASS = 'x'
GROUP BY
CUST_NUMBER
,PRODUCT
The thing is this - There can be many different products within this product class, and I am only interested in selecting where ProdRank = 1. As you might know though, I cannot specify either in the WHERE or in HAVING clause for ProdRank to = 1.
I get the error message "Windowed functions can only appear in the SELECT or ORDER BY clauses."
The situation is further complicated by the fact that many customers may have not ordered any products within the product class. Because of this I cannot simply left join the customer list to the above and specify WHERE ProdRank = 1, or else it mimics an inner join and I drop any customers where ProdRank is Null.
The method I've come up with in order to deal with this is to first create a temp table with the code above as #Products which includes the customer and every product with the respective ranking. I then create a second temp table called #TopProducts where I simply :
SELECT * FROM
#Products WHERE
ProdRank = 1
After that I just left join against #TopProducts from my Customers table.
It seems like there should be a simpler way of dealing with this though. Is there any way I can select the top partitioned result of ROW_NUMBER() or RANK() in one step, rather than creating two temp tables?
Use a Common Table Expression
WITH topProducts AS (
SELECT
CUST_NUMBER
,PRODUCT
,ROW_NUMBER() OVER (PARTITION BY CUST_NUMBER ORDER BY COUNT(ORDER_NUM) DESC) [ProdRank]
FROM ORDERS
WHERE PROD_CLASS = 'x'
GROUP BY
CUST_NUMBER
,PRODUCT
)
SELECT *
FROM CustomerDetails c
LEFT JOIN TopProducts p ON (ProdRank = 1 AND c.CUST_NUMBER = p.CUST_NUMBER)
Use a subquery:
SELECT *
FROM CustomerDetails c
LEFT JOIN (
SELECT
CUST_NUMBER
,PRODUCT
,ROW_NUMBER() OVER (PARTITION BY CUST_NUMBER ORDER BY COUNT(ORDER_NUM) DESC) [ProdRank]
FROM ORDERS
WHERE PROD_CLASS = 'x'
GROUP BY
CUST_NUMBER
,PRODUCT
) p ON (ProdRank = 1 AND c.CUST_NUMBER = p.CUST_NUMBER)
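The reason both answers put ProdRank = 1 in the ON clause rather than in WHERE can be checked with a short sketch. It assumes SQLite (3.25+) instead of SQL Server and uses made-up rows. The count is computed in an inner query before ranking, which is only a conservative choice for this sketch and not a change to the approach.

```python
import sqlite3

# Hypothetical data: customer 1 bought class 'x' products, 2 and 3 did not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerDetails (CUST_NUMBER INTEGER PRIMARY KEY);
CREATE TABLE ORDERS (ORDER_NUM INTEGER PRIMARY KEY, CUST_NUMBER INTEGER,
                     PRODUCT TEXT, PROD_CLASS TEXT);
INSERT INTO CustomerDetails VALUES (1), (2), (3);
INSERT INTO ORDERS VALUES
  (100, 1, 'P', 'x'), (101, 1, 'P', 'x'), (102, 1, 'Q', 'x'),
  (103, 2, 'Z', 'y');
""")

# Rank each customer's products in class 'x' by number of orders.
ranked = """
SELECT CUST_NUMBER, PRODUCT,
       ROW_NUMBER() OVER (PARTITION BY CUST_NUMBER ORDER BY Cnt DESC) AS ProdRank
FROM (SELECT CUST_NUMBER, PRODUCT, COUNT(ORDER_NUM) AS Cnt
      FROM ORDERS
      WHERE PROD_CLASS = 'x'
      GROUP BY CUST_NUMBER, PRODUCT) g
"""

# Filter inside ON: customers without a ranked product are kept with NULL.
on_filter = conn.execute(f"""
SELECT c.CUST_NUMBER, p.PRODUCT
FROM CustomerDetails c
LEFT JOIN ({ranked}) p ON p.ProdRank = 1 AND c.CUST_NUMBER = p.CUST_NUMBER
ORDER BY c.CUST_NUMBER
""").fetchall()

# Filter in WHERE: the NULL rows are discarded, so it acts like an inner join.
where_filter = conn.execute(f"""
SELECT c.CUST_NUMBER, p.PRODUCT
FROM CustomerDetails c
LEFT JOIN ({ranked}) p ON c.CUST_NUMBER = p.CUST_NUMBER
WHERE p.ProdRank = 1
ORDER BY c.CUST_NUMBER
""").fetchall()

print(on_filter)
print(where_filter)
```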
I would use outer apply and top in your scenario. Does that make sense?
There are a few examples here: Real life example, when to use OUTER / CROSS APPLY in SQL
I would write a piece of code, but I'm on mobile and that's really not comfortable...

Refactoring a tsql view which uses row_number() to return rows with a unique column value

I have a SQL view, which I'm using to retrieve data. Let's say it's a large list of products, which are linked to the customers who have bought them. The view should return only one row per product, no matter how many customers it is linked to. I'm using the row_number function to achieve this. (This example is simplified; the generic situation would be a query where only one row should be returned for each unique value of some column X. Which row is returned is not important.)
CREATE VIEW productView AS
SELECT * FROM
(SELECT
Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
customer.Id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
) as temp
WHERE temp.product_numbering = 1
Now let's say that the total number of rows in this view is ~1 million, and running select * from productView takes 10 seconds. Performing a query such as select * from productView where productID = 10 takes the same amount of time. I believe this is because the query gets evaluated to this:
SELECT * FROM
(SELECT
Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
customer.Id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
) as temp
WHERE product_numbering = 1 and products.Id = 10
I think this is causing the inner subquery to be evaluated in full each time. Ideally I'd like to use something along the following lines
SELECT
Row_number() OVER(PARTITION BY products.productID ORDER BY products.productID) AS product_numbering,
customer.id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
WHERE product_numbering = 1
But this doesn't seem to be allowed. Is there any way to do something similar?
EDIT -
After much experimentation, the actual problem I believe I am having is how to force a join to return exactly 1 row. I tried to use outer apply, as suggested below. Some sample code.
CREATE TABLE Products (id int not null PRIMARY KEY)
CREATE TABLE Customers (
id int not null PRIMARY KEY,
productId int not null,
value varchar(20) NOT NULL)
declare @count int = 1
while @count <= 150000
begin
insert into Customers (id, productID, value)
values (@count,@count/2, 'Value ' + cast(@count/2 as varchar))
insert into Products (id)
values (@count)
SET @count = @count + 1
end
CREATE NONCLUSTERED INDEX productId ON Customers (productID ASC)
With the above sample set, the 'get everything' query below
select * from Products
outer apply (select top 1 *
from Customers
where Products.id = Customers.productID) Customers
takes ~1000ms to run. Adding an explicit condition:
select * from Products
outer apply (select top 1 *
from Customers
where Products.id = Customers.productID) Customers
where Customers.value = 'Value 45872'
takes about the same amount of time. This 1000ms for a fairly simple query is already too much, and it scales the wrong way (upwards) when adding additional similar joins.
Try the following approach, using a Common Table Expression (CTE). With the test data you provided, it returns specific ProductIds in less than a second.
create view ProductTest as
with cte as (
select
row_number() over (partition by p.id order by p.id) as RN,
c.*
from
Products p
inner join Customers c
on p.id = c.productid
)
select *
from cte
where RN = 1
go
select * from ProductTest where ProductId = 25
What if you did something like:
SELECT ...
FROM products
OUTER APPLY (SELECT TOP 1 * from customer where customerid = products.buyerid) as customer
...
Then the filter on productId should help. It might be worse without filtering, though.
The problem is that your data model is flawed. You should have three tables:
Customers (customerId, ...)
Products (productId,...)
ProductSales (customerId, productId)
Furthermore, the sale table should probably be split into 1-to-many (Sales and SalesDetails). Unless you fix your data model you're just going to run in circles chasing red-herring problems. If the system is not your design, fix it. If the boss doesn't let you fix it, then fix it. If you cannot fix it, then fix it. There isn't an easy way out for the bad data model you're proposing.
This will probably be fast enough if you really don't care which customer you bring back:
select p1.*, c1.*
FROM products p1
Left Join (
select p2.id, max( c2.id) max_customer_id
From products p2
Join customer c2 on
c2.productID = p2.id
group by p2.id
) product_max_customer on
product_max_customer.id = p1.id
Left join customer c1 on
c1.id = product_max_customer.max_customer_id
;

SQL query for join with condition

I have these two tables:
Customers: Id, Name
Orders: Id, CustomerId, Time, Status
I want to get a list of customers for which the LAST order does not have a status of 'Wrong'.
I know how to use a LEFT JOIN to get a count of orders for each customer, but I don't know how I can use this statement for what I want. Maybe a JOIN is not the right thing to use too, I'm not sure.
It's possible that customers do not have any order, and they should be returned.
I'm abstracting the real tables here, but the scenario is for a windows phone app sending notifications. I want to get all clients for which their last notification does not have a 'Dropped' status. I can sort their notifications (orders) by the 'Time' field. Thanks for the help, while I continue experimenting with subqueries in the where clause.
Select ...
From Customers As C
Where Not Exists (
Select 1
From Orders As O1
Join (
Select O2.CustomerId, Max( O2.Time ) As Time
From Orders As O2
Group By O2.CustomerId
) As LastOrderTime
On LastOrderTime.CustomerId = O1.CustomerId
And LastOrderTime.Time = O1.Time
Where O1.Status = 'Dropped'
And O1.CustomerId = C.Id
)
There are obviously alternatives based on the actual database product and version. For example, in SQL Server one could use the TOP command or a CTE perhaps. However, without knowing what specific product is being used, the above solution should produce the results you want in almost any database product.
Addition
If you were using a product that supported ranking functions (which database product and version isn't mentioned) and common-table expressions, then an alternative solution might be something like so:
With RankedOrders As
(
Select O.CustomerId, O.Status
, Row_Number() Over( Partition By CustomerId Order By Time Desc ) As Rnk
From Orders As O
)
Select ...
From Customers As C
Where Not Exists (
Select 1
From RankedOrders As O1
Where O1.CustomerId = C.Id
And O1.Rnk = 1
And O1.Status = 'Dropped'
)
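The ranked NOT EXISTS form above can be exercised on a few rows to confirm that customers with no orders are still returned. This is a sketch under assumptions: SQLite (3.25+) stands in for the unspecified database product, the rows are invented, and 'Sent' is a hypothetical status value.

```python
import sqlite3

# Hypothetical data: Ann's last order is Dropped, Bob's last order is Sent,
# and Cy has no orders at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER, Time TEXT, Status TEXT);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Orders VALUES
  (1, 1, '2020-01-01', 'Sent'), (2, 1, '2020-02-01', 'Dropped'),
  (3, 2, '2020-01-01', 'Dropped'), (4, 2, '2020-02-01', 'Sent');
""")

# Keep every customer whose most recent order (Rnk = 1) is not 'Dropped'.
rows = conn.execute("""
WITH RankedOrders AS (
  SELECT O.CustomerId, O.Status,
         ROW_NUMBER() OVER (PARTITION BY O.CustomerId ORDER BY O.Time DESC) AS Rnk
  FROM Orders AS O
)
SELECT C.Id, C.Name
FROM Customers AS C
WHERE NOT EXISTS (
  SELECT 1
  FROM RankedOrders AS O1
  WHERE O1.CustomerId = C.Id
    AND O1.Rnk = 1
    AND O1.Status = 'Dropped'
)
ORDER BY C.Id
""").fetchall()

print(rows)
```

Bob is kept even though an older order was dropped, because only the row ranked first is tested.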
Assuming Last order refers to the Time column here is my query:
SELECT C.Id,
C.Name,
MAX(O.Time)
FROM
Customers C
INNER JOIN Orders O
ON C.Id = O.CustomerId
WHERE
O.Status != 'Wrong'
GROUP BY C.Id,
C.Name
EDIT:
Regarding your table configuration. You should really consider revising the structure to include a third table. They would look like this:
Customer
CustomerId | Name
Order
OrderId | Status | Time
CompletedOrders
CoId | CustomerId | OrderId
Now what you do is store the info about a customer or order in their respective tables ... then when an order is made you just create a CompletedOrders entry with the ids of the 2 individual records. This will allow for a 1 to Many relationship between customer and orders.
Didn't check it out, but something like this?
SELECT c.CustomerId, c.Name, MAX(o.Time)
FROM Customers c
LEFT JOIN Orders o ON o.CustomerId = c.CustomerId
WHERE o.Status <> 'Wrong'
GROUP BY c.CustomerId, C.Name
You can get the list of customers whose LAST order has a status of 'Wrong' with something like
select o.customerId from orders o
where o.status='Wrong'
and o.time=(select max(o2.time) from orders o2 where o2.customerId=o.customerId)