Query two similar tables and combine sorted results - sql

I have three tables
orders.orderid (and other non-pertinent stuff)
payment.orderid
payment.transactiondate
payment.amount
projectedpayment.orderid
projectedpayment.projecteddate
projectedpayment.projectedamount
Essentially, payment represents when actual payments are received; projectedpayment represents when the system thinks they should be received. I need to build a query to compare projected vs actual.
I'd like to query them such that each row in the query has the orderid, payment.transactiondate, payment.amount, projectedpayment.projecteddate, projectedpayment.projectedamount, with the rows from payment and projectedpayment sorted by their respective dates. e.g.,
orderid transactiondate amount projecteddate projectedamount
1 2015-01-01 12.34 2015-01-03 12.34
1 2015-01-15 12.34 2015-01-15 12.44
1 null null 2015-02-01 12.34
2 2014-12-31 50.00 null null
So broken down by order, what are the actual and projected payments, where there may be more projected payments than actual, or more actual payments than projected, aligned by date (simply by sorting the two, nothing more complex than that).
It seems like I should be able to achieve this with a left join from orders to some kind of union of the other two tables sorted with an order by, but I haven't been able to make it work, so it may be something completely different. I know I cannot join all three of order, payment, and projectedpayment or I get the cross-product of the latter two tables.
I happen to be using postgresql 9.4, but hopefully we don't need to get too database-specific.

I dont know postgres sorry :( but if you know how to do partitioned row numbers something like this should work.
select
coalesce(a.orderid,b.orderid) as orderid
,transactiondate
,amount
,projecteddate
,projectedamount
FROM
(select
orderid
,ransactiondate
,amount
,row_number() over (partition by orderid order by orderid,transactiondate) as rn
from payment) as a
full join
(select
orderid
,projecteddate
,projectedamount
,row_number() over (partition by orderid order by orderid,projecteddate) as rn
from projectedpayment) as b
on a.orderid= b.orderid
and a.rn = b.rn
*this is sqlserver syntax (2K5+ AFAIK)
The logic here is that you need to assign a unique number to each predicted and actual payment so that you can join the two tables together but only have each row matching a single row from the other table.
If you have ONLY ONE PAYMENT PER DAY then yo could do the full join on the order ID and date without worrying about row numbers.
the full join allows you to have nulls on either side so you will need to coalesce orderid
*also this doesn't show orders with NO payments or predictions.. comment if this is an issue.

This should work
Select * from Orders o
Left Join Payments p on o.ID = p.OrderID
Left Join ProjectedPaymentp pp on o.ID = pp.OrderID
Order By o.ID

If i correctly understand, the following query should help:
select o.orderid, ap.transactiondate, ap.amount, pp.projecteddate, pp.projectedamount
from orders o
left join
(
select p.orderid, p.transactiondate, p.amount,
row_number() over (partition by p.orderid order by p.transactiondate) n
from payment p
) ap on o.orderid = ap.order
left join
(
select p.orderid, p.projecteddate, p.projectedamount,
row_number() over (partition by p.orderid order by p.projecteddate) n
from projectedpayment p
) pp on o.orderid = ap.order and (ap.n is null or ap.n = pp.n)
order by o.orderid, ap.n, pp.n
UPD
Another option (works in slightly different way and you can have NULL values not only for last records for same orderid but it will be completely sorted by date, in one timeline):
select o.orderid, ap.transactiondate, ap.amount, pp.projecteddate, pp.projectedamount
from orders o
inner join
(
select ap.orderid, ap.transactiondate d from payment ap
union
select ap.orderid, ap.projecteddate d from projectedpayment pp
) d on d.orderid = o.orderid
left join payment ap on ap.orderid = o.orderid and ap.transactiondate = d.d
left join projectedpayment pp on pp.orderid = o.orderid and pp.projecteddate = d.d
order by o.orderid, d.d

Related

SQL: payments and orders query

I have 2 tables...
the first one "orders" collects
memberid
order_id
authorization date
the 2nd one "payments" collects
memberid
paymentid
payment number
last_udpate
a memberid might contain more than 1 card and I need to create a query that brings me the orderid and which payment number has used on each order taking into account when was the last update and authorization date..
I've got till here..
SELECT orders.order_id, payments.payments_number
FROM orders
JOIN payments
ON orders.memberid = payments.memberid
I'm a bit lost what subquery should I use to make it work..
Thanks!
Hmmm . . . I think a correlated subquery does what you want:
select o.*,
(select p.payment_number
from payments p
where p.memberid = o.memberid and
p.last_update <= o.authorization_date
order by last_update desc
limit 1
) as payment_number
from orders o;
You can use an additional join if you need other columns from payments.
Since you specify the cardinality of 1 to many and I am not sure of which way your cardinality is most accurately represented.
If each order has many payments then maybe you want something like this.
SELECT o.memberid,
o.order_id,
o.[authorization date],
p.payments_number,
p.last_udpate
FROM orders o
LEFT JOIN payments p
ON o.memberid = p.memberid
If it is the other way round, maybe something like this
SELECT o.memberid,
o.order_id,
o.[authorization date],
p.payments_number,
p.last_udpate
FROM payments p
LEFT JOIN orders o
ON o.memberid = p.memberid
Something similar to what was proposed but using a CTE could be:
;
with cte_partial_results as
(
SELECT o.memberid,
o.order_id,
o.[authorization date] as [authorization_date],
p.paymentid,
p.[payment number] as payment_number
p.last_update,
ROW_NUMBER() OVER(PARTITION BY o.memberid, o.order_id ORDER BY p.last_udpate desc) as RowNo
FROM orders o
LEFT JOIN payments p ON o.memberid = p.memberid
WHERE
p.last_udpate <= o.[authorization date]
)
SELECT
memberid,
order_id,
payment_number,
last_update as most_recent_update
--add other cols you want.
FROM
cte_partial_results
WHERE
RowNo = 1

Query returning individuals most recent order, date of the order, number of products in the order and the total amount

Need help with a query to returns each individuals most recent order, date of the order, number of products in the order and the total amount. I am kind of stuck trying to get the number of products and total.
Here are the table diagrams
Not sure if I should be using multiple joins or subqueries:
SELECT FirstName, LastName, MAX(O.OrderDate), O.OrderDate
FROM Customer C
INNER JOIN Order O ON C.CustomerID = O.CustomerID
It is always good practice to start from assumed dim tables such as product.This Query should help. It is better to aggregate quantity and amount to get results at aggregate level
SELECT FirstName
,LastName
,max(o.orderdate) Orderdate
,Sum(Quantity) Quantity
,Sum(TotalAmount) TotalAmount
FROM Product p
INNER JOIN Orderitem oi
ON Oi.product_id = p.id
INNER JOIN
ORDER o
ON o.id = oi.order_id
INNER JOIN Customer c
ON c.id = o.Customer_id
GROUP BY FirstName
,LastName
not sure if you want aggregations but here you go:
SELECT customer.firstname, customer.lastname, COUNT(DISTINCT orderitem.productid), [order].totalamount
FROM [order] LEFT JOIN customer ON [order].customerid=customer.id LEFT JOIN orderitem ON order.id=orderitem.orderid
WHERE [order].date=MAX([order].date)
GROUP BY customer.firstname,customer.lastname, [order].totalamount
Still don't know why you are applying a where clause for the last order, it's up to you to keep or not the where condtion.

Getting max value before given date

I am pretty new to using MS SQL 2012 and I am trying to create a query that will:
Report the order id, the order date and the employee id that processed the order
report the maximum shipping cost among the orders processed by the same employee prior to that order
This is the code that I've come up with, but it returns the freight of the particular order date. Whereas I am trying to get the maximum freight from all the orders before the particular order.
select o.employeeid, o.orderid, o.orderdate, t2.maxfreight
from orders o
inner join
(
select employeeid, orderdate, max(freight) as maxfreight
from orders
group by EmployeeID, OrderDate
) t2
on o.EmployeeID = t2.EmployeeID
inner join
(
select employeeid, max(orderdate) as mostRecentOrderDate
from Orders
group by EmployeeID
) t3
on t2.EmployeeID = t3.EmployeeID
where o.freight = t2.maxfreight and t2.orderdate < t3.mostRecentOrderDate
Step one is to read the order:
select o.employeeid, o.orderid, o.orderdate
from orders o
where o.orderid = #ParticularOrder;
That gives you everything you need to go out and get the previous orders from the same employee and join each one to the row you get from above.
select o.employeeid, o.orderid, o.orderdate, o2.freight
from orders o
join orders o2
on o2.employeeid = o.employeeid
and o2.orderdate < o.orderdate
where o.orderid = #ParticularOrder;
Now you have a whole bunch of rows with the first three values the same and the fourth is the freight cost of each previous order. So just group by the first three fields and select the maximum of the previous orders.
select o.employeeid, o.orderid, o.orderdate, max( o2.freight ) as maxfreight
from orders o
join orders o2
on o2.employeeid = o.employeeid
and o2.orderdate < o.orderdate
where o.orderid = #ParticularOrder
group by o.employeeid, o.orderid, o.orderdate;
Done. Build your query in stages and many times it will turn out to be much simpler than you at first thought.
It is unclear why you are using t3. From the question it doesn't sound like the employee's most recent order date is relevant at all, unless I am misunderstanding (which is absolutely possible).
I believe the issue lies in t2. You are grouping by orderdate, which will return the max freight for that date and employeeid, as you describe. You need to calculate a maximum total from all orders that occurred before the date that the order occurred on, for that employee, for every row you are returning.
It probably makes more sense to use a subquery for this.
SELECT o.employeeid, o.orderid, o.orderdate, m.maxfreight
FROM
orders o LEFT OUTER JOIN
(SELECT max(freight) as maxfreight
FROM orders AS f
WHERE f.orderdate <= o.orderdate AND f.employeeid = o.employeeid
) AS m
Hoping this is syntactically correct as I'm not in front of SSMS right now. I also included a left outer join as your previous query with an inner join would have excluded any rows where an employee had no previous orders (i.e. first order ever).
You can do what you want with a correlated subquery or apply. Here is one way:
select o.employeeid, o.orderid, o.orderdate, t2.maxfreight
from orders o outer apply
(select max(freight) as maxfreight
from orders o2
where o2.employeeid = o.employeid and
o2.orderdate < o.orderdate
) t2;
In SQL Server 2012+, you can also do this with a cumulative maximum:
select o.employeeid, o.orderid, o.orderdate,
max(freight) over (partition by employeeid
order by o.orderdate rows between unbounded preceding and 1 preceding
) as maxfreight
from orders o;

SQL INNER JOIN Without Repeats

Getting the next table:
Column1 - OrderID - Earliest orders of customers from Column2
Column2 - CustomerID - Customers from orders in Column1
Column3 - OrderID - All *Other* orders of customers from Column2
which do not appear in Column1
This is my query and I'm looking for a way to apply the rules mentioned above:
SELECT O1.orderid, C1.customerid, O2.Orderid
FROM orders AS O1
INNER JOIN customers AS C1 ON O1.customerid = C1.customerid
RIGHT JOIN orders AS O2 ON C1.customerid = O2.customerid
WHERE O1.orderdate >= '2014-01-01'
AND O1.orderdate <= '2014-03-31'
ORDER BY O1.orderid
Thanks in advance
Not entirely sure why you want to get a result out like this as the earliest order will repeat for each order for the given customer.
SELECT earliestOrders.orderid, C1.customerid, O1.Orderid
FROM orders AS O1
INNER JOIN customers AS C1 ON O1.customerid = C1.customerid
INNER JOIN (
select o.customerid, min(o.OrderId) as OrderId
from orders o
Group by o.customerid
) earliestOrders
ON earliestOrders.CustomerId = C1.CustomerId
AND earliestOrders.orderid <> O1.Orderid
To find the first order per customer, look for first order dates per customer and then pick the one or one of the orders made by the customer then. (If orderdate really is just a date one customer can have placed more than one order that day, so we pick one of them. With MIN(orderid) we are likely to get the first one of that bunch :-)
Outer join the other orders and you are done.
If your dbms supports IN clauses on tuples, you get a quite readable statement:
select first_order.orderid, first_order.customerid, later_order.orderid
from
(
select customerid, min(first_order.orderid) as first_orderid
from orders
where (customerid, orderdate) in
(
select customerid, min(orderdate)
from orders
group by cutomerid
)
) first_order
left join orders later_order
on later_order.customerid = first_order.customerid
and later_order.orderid <> first_order.orderid
;
If your dbms doesn't support IN clauses on tuples, the statement looks a bit more clumsy:
select first_order.orderid, first_order.customerid, later_order.orderid
from
(
select first_orders.customerid, min(first_orders.orderid) as orderid
from orders first_orders
inner join
(
select customerid, min(orderdate)
from orders
group by cutomerid
) first_order_dates
on first_order_dates.customerid = first_orders.customerid
and first_order_dates.orderdate = first_orders.orderdate
group by first_orders.customerid
) first_order
left join orders later_order
on later_order.customerid = first_order.customerid
and later_order.orderid <> first_order.orderid
;

SQL Statement Help - Select latest Order for each Customer

Say I have 2 tables: Customers and Orders. A Customer can have many Orders.
Now, I need to show any Customers with his latest Order. This means if a Customer has more than one Orders, show only the Order with the latest Entry Time.
This is how far I managed on my own:
SELECT a.*, b.Id
FROM Customer a INNER JOIN Order b ON b.CustomerID = a.Id
ORDER BY b.EntryTime DESC
This of course returns all Customers with one or more Orders, showing the latest Order first for each Customer, which is not what I wanted. My mind was stuck in a rut at this point, so I hope someone can point me in the right direction.
For some reason, I think I need to use the MAX syntax somewhere, but it just escapes me right now.
UPDATE: After going through a few answers here (there's a lot!), I realized I made a mistake: I meant any Customer with his latest record. That means if he does not have an Order, then I do not need to list him.
UPDATE2: Fixed my own SQL statement, which probably caused no end of confusion to others.
I don't think you do want to use MAX() as you don't want to group the OrderID. What you need is an ordered sub query with a SELECT TOP 1.
select *
from Customers
inner join Orders
on Customers.CustomerID = Orders.CustomerID
and OrderID = (
SELECT TOP 1 subOrders.OrderID
FROM Orders subOrders
WHERE subOrders.CustomerID = Orders.CustomerID
ORDER BY subOrders.OrderDate DESC
)
Something like this should do it:
SELECT X.*, Y.LatestOrderId
FROM Customer X
LEFT JOIN (
SELECT A.Customer, MAX(A.OrderID) LatestOrderId
FROM Order A
JOIN (
SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
GROUP BY Customer
) Y ON X.Customer = Y.Customer
This assumes that two orders for the same customer may have the same EntryTime, which is why MAX(OrderID) is used in subquery Y to ensure that it only occurs once per customer. The LEFT JOIN is used because you stated you wanted to show all customers - if they haven't got any orders, then the LatestOrderId will be NULL.
Hope this helps!
--
UPDATE :-) This shows only customers with orders:
SELECT A.Customer, MAX(A.OrderID) LatestOrderId
FROM Order A
JOIN (
SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
GROUP BY Customer
While I see that you've already accepted an answer, I think this one is a bit more intuitive:
select a.*
,b.Id
from customer a
inner join Order b
on b.CustomerID = a.Id
where b.EntryTime = ( select max(EntryTime)
from Order
where a.Id = b.CustomerId
);
a.Id = b.CustomerId because you want the max EntryTime of all orders (in b) for the customer (a.Id).
I would have to run something like this through an execution plan to see the difference in execution, but where the TOP function is done after-the-fact and that using order by can be expensive, I believe that using max(EntryTime) would be the best way to run this.
You can use a window function.
SELECT *
FROM (SELECT a.*, b.*,
ROW_NUMBER () OVER (PARTITION BY a.ID ORDER BY b.orderdate DESC,
b.ID DESC) rn
FROM customer a, ORDER b
WHERE a.ID = b.custid)
WHERE rn = 1
For each customer (a.id) it sorts all orders and discards everything but the latest.
ORDER BY clause includes both order date and entry id, in case there are multiple orders on the same date.
Generally, window functions are much faster than any look-ups using MAX() on large number of records.
This query is much faster than the accepted answer :
SELECT c.id as customer_id,
(SELECT co.id FROM customer_order co WHERE
co.customer_id=c.id
ORDER BY some_date_column DESC limit 1) as last_order_id
FROM customer c
SELECT Cust.*, Ord.*
FROM Customers cust INNER JOIN Orders ord ON cust.ID = ord.CustID
WHERE ord.OrderID =
(SELECT MAX(OrderID) FROM Orders WHERE Orders.CustID = cust.ID)
Something like:
SELECT
a.*
FROM
Customer a
INNER JOIN Order b
ON a.OrderID = b.Id
INNER JOIN (SELECT Id, max(EntryTime) as EntryTime FROM Order b GROUP BY Id) met
ON
b.EntryTime = met.EntryTime and b.Id = met.Id
One approach that I haven't seen above yet:
SELECT
C.*,
O1.ID
FROM
dbo.Customers C
INNER JOIN dbo.Orders O1 ON
O1.CustomerID = C.ID
LEFT OUTER JOIN dbo.Orders O2 ON
O2.CustomerID = C.ID AND
O2.EntryTime > O1.EntryTime
WHERE
O2.ID IS NULL
This (as well as the other solutions I believe) assumes that no two orders for the same customer can have the exact same entry time. If that's a concern then you would have to make a choice as to what determines which one is the "latest". If that's a concern post a comment and I can expand the query if needed to account for that.
The general approach of the query is to find the order for a customer where there is not another order for the same customer with a later date. It is then the latest order by definition. This approach often gives better performance then the use of derived tables or subqueries.
A simple max and "group by" is sufficient.
select c.customer_id, max(o.order_date)
from customers c
inner join orders o on o.customer_id = c.customer_id
group by c.customer_id;
No subselect needed, which slows things down.