I need to write a query which would return results in line with the following.. For three tables, customers, products and order history, I want to find the latest order for each product for each customer. Any help appreciated.
The latest order for each product for each customer - you can achieve this by using a ROW_NUMBER() window function and partitioning your data by those for each criteria.
So try something like this (just guessing table and column names, since you haven't provided anything to go on):
;WITH NewestData AS
(
SELECT
oh.OrderDate,
c.CustomerName,
p.ProductName,
RowNum = ROW_NUMBER() OVER (PARTITION BY oh.CustomerID, oh.ProductID
ORDER BY oh.OrderDate DESC)
FROM
dbo.OrderHistory oh
INNER JOIN
dbo.Customer c ON oh.CustomerID = c.CustomerID
INNER JOIN
dbo.Product p ON oh.ProductID = p.ProductID
)
SELECT
OrderDate, CustomerName, ProductName
FROM
NewestData
WHERE
RowNum = 1
OK, explanation time:
the CTE (Common Table Expression) basically joins the three tables (guessing what the table and column names are, and how they are connected) and selects some of the columns from those tables (you could add more columns, if you need them, of course!)
the RowNum is a consecutive number, starting at 1, for each "partition" of data; the PARTITION BY clause expresses that for each combination of a (CustomerID, ProductID), you want to have a "partition" which gets numbered (1, 2, 3, 4,.....) based on the ORDER BY clause - here, it gets number with 1 for the most recent order for that partition (for that customer+product).
So in the end, all you need to do, is select from that CTE, and select those rows only that have RowNum = 1 - those are the most recent orders for each "partition" of (CustomerID, ProductID) in your order history table.
This works from SQL Server 2005 on and newer versions - it is not supported in 2000 .....
Related
I am trying to write a SQL query that returns the name and purchase amount of the five customers in each state who have spent the most money.
Table schemas
customers
|_state
|_customer_id
|_customer_name
transactions
|_customer_id
|_transact_amt
Attempts look something like this
SELECT state, Sum(transact_amt) AS HighestSum
FROM (
SELECT name, transactions.transact_amt, SUM(transactions.transact_amt) AS HighestSum
FROM customers
INNER JOIN customers ON transactions.customer_id = customers.customer_id
GROUP BY state
) Q
GROUP BY transact_amt
ORDER BY HighestSum
I'm lost. Thank you.
Expected results are the names of customers with the top 5 highest transactions in each state.
ERROR: table name "customers" specified more than once
SQL state: 42712
First, you need for your JOIN to be correct. Second, you want to use window functions:
SELECT ct.*
FROM (SELECT c.customer_id, c.name, c.state, SUM(t.transact_amt) AS total,
ROW_NUMBER() OVER (PARTITION BY c.state ORDER BY SUM(t.transact_amt) DESC) as seqnum
FROM customers c JOIN
transaactions t
ON t.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, c.state
) ct
WHERE seqnum <= 5;
You seem to have several issues with SQL. I would start with understanding aggregation functions. You have a SUM() with the alias HighestSum. It is simply the total per customer.
You can get them using aggregation and then by using the RANK() window function. For example:
select
state,
rk,
customer_name
from (
select
*,
rank() over(partition by state order by total desc) as rk
from (
select
c.customer_id,
c.customer_name,
c.state,
sum(t.transact_amt) as total
from customers c
join transactions t on t.customer_id = c.customer_id
group by c.customer_id
) x
) y
where rk <= 5
order by state, rk
There are two valid answers already. Here's a third:
SELECT *
FROM (
SELECT c.state, c.customer_name, t.*
, row_number() OVER (PARTITION BY c.state ORDER BY t.transact_sum DESC NULLS LAST, customer_id) AS rn
FROM (
SELECT customer_id, sum(transact_amt) AS transact_sum
FROM transactions
GROUP BY customer_id
) t
JOIN customers c USING (customer_id)
) sub
WHERE rn < 6
ORDER BY state, rn;
Major points
When aggregating all or most rows of a big table, it's typically substantially faster to aggregate before the join. Assuming referential integrity (FK constraints), we won't be aggregating rows that would be filtered otherwise. This might change from nice-to-have to a pure necessity when joining to more aggregated tables. Related:
Why does the following join increase the query time significantly?
Two SQL LEFT JOINS produce incorrect result
Add additional ORDER BY item(s) in the window function to define which rows to pick from ties. In my example, it's simply customer_id. If you have no tiebreaker, results are arbitrary in case of a tie, which may be OK. But every other execution might return different results, which typically is a problem. Or you include all ties in the result. Then we are back to rank() instead of row_number(). See:
PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?
While transact_amt can be NULL (has not been ruled out) any sum may end up to be NULL as well. With an an unsuspecting ORDER BY t.transact_sum DESC those customers come out on top as NULL comes first in descending order. Use DESC NULLS LAST to avoid this pitfall. (Or define the column transact_amt as NOT NULL.)
PostgreSQL sort by datetime asc, null first?
I have two tables:
patients(ID, Firstname, Lastname, ...)
records(ID, Date, Time, Version)
I want to (inner) join these tables, so I have the records with patient data, but in the column for Version I want always the first value that was recorded for the patient (so with the minimum of date and time dependent on the patient (id)). I tried with subquery but HANA doesn't allow ORDER-BY or LIMIT clause in subqueries.
How can I implement this with SQL? (HANA SQL)
Kind regards and thanks in advance.
HANA supports window functions, so you can join against a derived table that picks the first version:
select p.*, r.id, r.date, r.time, r.version
from patients p
join (
select id, date, time, version, patient_id,
row_number() over (partition by patient_id order by version) as rn
from records
) r on p.id = r.patient_id and r.rn = 1
The above assumes that the records table has a column patient_id that contains the id of the patients table to which that record belongs to.
I have sql such as:
select
c.customerID, sum(o.orderCost)
from customer c, order o
where c.customerID=o.customerID
group by c.customerID;
This returns a list of
customerID, orderCost
where orderCost is the total cost of all orders the customer has made. I want to select the customer who has paid us the most (who has the highest orderCost). Do I need to create a nested query for this?
You need a nested query, but you don't have to access the tables twice if you use analytic functions.
select customerID, sumOrderCost from
(
select customerID, sumOrderCost,
rank() over (order by sumOrderCost desc) as rn
from (
select c.customerID, sum(o.orderCost) as sumOrderCost
from customer c, orders o
where c.customerID=o.customerID
group by c.customerID
)
)
where rn = 1;
The rank() function ranks the results from your original query by the sum() value, then you only pick those with the highest rank - that is, the row(s) with the highest total order cost.
If more than one customer has the same total order cost, this will return both. If that isn't what you want you'll have to decide how to determine which single result to use. If you want the lowest customer ID, for example, add that to the ranking function:
select customerID, sumOrderCost,
rank() over (order by sumOrderCost desc, customerID) as rn
You can adjust you original query to return other data instead, just for the ordering, and not include it in the outer select.
You need to create nested query for this.
Two queries.
I am trying to select the max dates on a field with other tables, to only give me one distinct row for the max date and not other rows with other dates. the code i have for max is
SELECT DISTINCT
Cust.CustId,
LastDate=(Select Max(Convert(Date,TreatmentFieldHstry.TreatmentDateTime))
FROM TreatmentFieldHstry
WHERE Cust.CustSer = Course.CustSer
AND Course.CourseSer = Session.CourseSer
AND Session.SessionSer = TreatmentFieldHstry.SessionSer)
This gives multiple rows depending on how many dates - i just want one for the max - can anyone help with this?
Thanks
You didn't specify exactly what database and version you're using - but if you're on SQL Server 2005 or newer, you can use something like this (a CTE with the ROW_NUMBER ranking function) - I've simplified it a bit, since I don't know what those other tables are that you have in your select, that don't ever show up in any of the SELECT column lists.....
;WITH TopData AS
(
SELECT c.CustId, t.TreatmentDateTime,
ROW_NUMBER() OVER(PARTITION BY c.CustId ORDER BY t.TreatmentDateTime DESC) AS 'RowNum'
FROM
dbo.TreatmentFieldHstry t
INNER JOIN
dbo.Customer c ON c.CustId = t.CustId -- or whatever JOIN condition you have
WHERE
c.CustSer = Course.CustSer
)
SELECT
*
FROM
TopData
WHERE
RowNum = 1
Basically, the CTE (Common Table Expression) partitions your data by CustId and order by TreatmentDateTime (descending - newest first) - and numbers every entry with a consecutive number - for each "partition" (e.g. for each new value of CustId). With this, the newest entry for each customer has RowNum = 1 which is what I use to select it from that CTE.
I'm trying to write a SQL Query for DB2 Version 8 which retrieves the most recent order of a specific part for a list of users. The query receives a parameter which contains a list of customerId numbers and the partId number. For example,
Order Table
OrderID
PartID
CustomerID
OrderTime
I initially wanted to try:
Select * from Order
where
OrderId = (
Select orderId
from Order
where
partId = #requestedPartId# and customerId = #customerId#
Order by orderTime desc
fetch first 1 rows only
);
The problem with the above query is that it only works for a single user and my query needs to include multiple users.
Does anyone have a suggestion about how I could expand the above query to work for multiple users? If I remove my "fetch first 1 rows only," then it will return all rows instead of the most recent. I also tried using Max(OrderTime), but I couldn't find a way to return the OrderId from the sub-select.
Thanks!
Note: DB2 Version 8 does not support the SQL "TOP" function.
Try the following one. I didn't test it. The idea is that you first find all orders for all your specified customers. These will be grouped and you find the biggest order time for each customer (combination of group by and max). This is the foo query, that identifies the records that you need. Than you join it with your order table to retrieve the necessary information for these orders.
select o.*
from order o inner join
(select customerId, max(orderTime)
from order o
where customerId in ( #customerIds#)
and partId = #requestedPartId#
group by customerId) foo
on o.customerId = foo.customerId
and o.orderTime = foo.orderTime
EDIT: The above query gives you the most recent order for each customer you specified under the condition, that there is only one order per customer and orderTime. To get only one order it is slightly different. The following example assumes that the orderTime is unique, meaning there are no two orders at the same time in the database. This is generally be true if orderTime is recorded in milliseconds.
select o.*
from order o inner join
(select customerId, max(orderTime)
from order o
where customerId in ( #customerIds#)
and partId = #requestedPartId#) foo
on o.orderTime = foo.orderTime