Oracle SQL Missing Keyword - sql

I'm trying to display the most popular product (format_id) in a given month (JAN-14) and group them by a count of each format_id.
Here is my Query :
select PRODUCT, AMOUNT
from (Select Order_108681091.Order_Date, order_line_108681091.Format_id as Product,
COUNT(*) AS AMOUNT FROM order_line_108681091
Inner Join order_108681091
On order_108681091.order_id = order_line_108681091.order_id
Where order_108681091.Order_Date like '%JAN-14%'
group by Format_id
order by AMOUNT desc);
How can I do this?

You said you have the condition, so let's start with adding it.
There is no need to ORDER BY, so remove it.
Remove Order_date from the subquery SELECT
Use aliases
The subquery itself would be enough.
SELECT l.Format_id as PRODUCT,
COUNT(*) AS AMOUNT
FROM order_line_108681091 l
INNER JOIN order_108681091 o
ON o.order_id = l.order_id
WHERE o.Order_Date LIKE '%JAN-14%'
GROUP BY
l.Format_id;

You have to specify the inner join:
inner join order_108681091 ON order_108681091.ID = order_line_108681091.ID
or something like that. Also your where clause probably won't work unless you're storing that date as a string and not a datetime datatype.

You don't need the subquery at all. And, I'm very uncomfortable using like on a date directly. Explicitly convert the date to a string:
select ol.Format_id as Product, COUNT(*) AS AMOUNT
from order_line_108681091 ol Inner Join
order_108681091 o
ON o.order_id = ol.order_id
where to_char(o.Order_Date, 'MON-YYYY') = 'JAN-2014'
group by ol.Format_id
order by count(*) desc;
Actually, if you have an index on Order_Date, you can use the following (to take advantage of the index):
select ol.Format_id as Product, COUNT(*) AS AMOUNT
from order_line_108681091 ol Inner Join
order_108681091 o
ON o.order_id = ol.order_id
where o.Order_Date >= to_date('2014-01-01', 'YYYY-MM-DD') and
o.Order_Date < to_date('2014-02-01', 'YYYY-MM-DD')
group by ol.Format_id
order by count(*) desc;
Moving the function from the column to the constant allows the use of an index on the column.
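To make the range filter concrete, here is a minimal sketch that runs the same half-open date range against an in-memory SQLite database from Python. The shortened table names (orders, order_line), the sample rows, and the ISO-8601 text dates are assumptions made for illustration; SQLite has no Oracle DATE type or to_date(), so the bounds are plain strings here.

```python
import sqlite3

# Hypothetical miniature of the two tables; names are shortened for the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
    CREATE TABLE order_line (order_id INTEGER, format_id INTEGER);
    CREATE INDEX idx_orders_date ON orders (order_date);
    INSERT INTO orders VALUES (1, '2014-01-05'), (2, '2014-01-20'), (3, '2014-02-01');
    INSERT INTO order_line VALUES (1, 10), (1, 20), (2, 10), (3, 20);
""")

# Half-open range: >= first day of the month, < first day of the next month.
# The column is compared as-is, so an index on order_date remains usable.
january_counts = conn.execute("""
    SELECT ol.format_id AS product, COUNT(*) AS amount
    FROM order_line ol
    JOIN orders o ON o.order_id = ol.order_id
    WHERE o.order_date >= '2014-01-01' AND o.order_date < '2014-02-01'
    GROUP BY ol.format_id
    ORDER BY amount DESC
""").fetchall()
print(january_counts)
```

The order dated 2014-02-01 falls outside the range because the upper bound is exclusive, which is what makes the pattern safe for columns that also carry a time component.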

Related

Why use GROUP BY in WINDOW FUNCTION

I'm currently working with the Northwind database and want to see which companies placed the most orders in 1997. I'm being asked to use window functions, so I wrote this:
select c.customerid,
c.companyname,
rank() over (order by count(orderid) desc )
from customers c
inner join orders o on c.customerid = o.customerid
where date_part('year',orderdate) = 1997;
However, this code asks me to use GROUP BY with c.customerid, and I simply don't understand why. Supposedly this code will give me all the customer ids and names, and after that the window function kicks in, giving them a rank based on the number of orders. So why group them?
Here:
rank() over (order by count(orderid) desc )
You have an aggregate function in the over() clause of the window function (count(orderid)), so you do need a group by clause. Your idea is to put in the same group all orders of the same customer:
select c.customerid,
c.companyname,
rank() over (order by count(*) desc) as rn
from customers c
inner join orders o on c.customerid = o.customerid
where o.orderdate >= date '1997-01-01' and o.orderdate < date '1998-01-01'
group by c.customerid;
Notes:
Filtering on literal dates is much more efficient than applying a date function on the date column
count(orderid) is equivalent to count(*) in the context of this query
Postgres understands functionally dependent columns: assuming that customerid is the primary key of customers, it is sufficient to put just that column in the group by clause
It is a good practice to give aliases to expressions in the select clause
Another good practice is to prefix all columns with the (alias of) table they belong to
You would use it correctly in an aggregation query. That would be:
select c.customerid, c.companyname, count(*) as num_orders,
rank() over (order by count(*) desc) as ranking
from customers c inner join
orders o
on c.customerid = o.customerid
where date_part('year',orderdate) = 1997
group by c.customerid, c.companyname;
This counts the number of orders per customer in 1997. It then ranks the customers based on the number of orders.
I would advise you to use:
where orderdate >= '1997-01-01' and
orderdate < '1998-01-01'
For the filtering by year. This allows Postgres to use an index if one is available.
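A small runnable sketch of the aggregate-then-rank pattern, using Python's sqlite3 module rather than Postgres. The sample customers and orders are made up (they are not Northwind data), dates are stored as ISO text, and it assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerid TEXT PRIMARY KEY, companyname TEXT);
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid TEXT, orderdate TEXT);
    INSERT INTO customers VALUES ('A', 'Alpha'), ('B', 'Beta'), ('C', 'Gamma');
    INSERT INTO orders VALUES
        (1, 'A', '1997-03-01'), (2, 'A', '1997-06-01'),
        (3, 'B', '1997-07-15'), (4, 'B', '1998-01-02'),
        (5, 'C', '1996-12-31');
""")

# GROUP BY collapses the joined rows to one row per customer first;
# RANK() then runs over those grouped rows, ordered by their counts.
ranked = conn.execute("""
    SELECT c.customerid, c.companyname, COUNT(*) AS num_orders,
           RANK() OVER (ORDER BY COUNT(*) DESC) AS ranking
    FROM customers c
    JOIN orders o ON o.customerid = c.customerid
    WHERE o.orderdate >= '1997-01-01' AND o.orderdate < '1998-01-01'
    GROUP BY c.customerid, c.companyname
    ORDER BY ranking, c.customerid
""").fetchall()
print(ranked)
```

Customer C has no 1997 orders, so the inner join plus the date filter drops it before any ranking happens.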

SQL Maximum number of orders

I have this task:
Write a query to find out in which year the maximum number of orders was made by the company.
How do I write it?
This is everything I could do, but it is not what I need at all...
SELECT company_name, order_date
FROM customers
INNER JOIN orders
ON customers.order_id = orders.order_id
WHERE order_date = (SELECT MAX(order_date) FROM orders)
GROUP BY customer_id, order_date;
You can try the below options - choose top/limit based on your DBMS
If SQL Server
SELECT top 1 company_name, year(order_date),count(orders.order_id) as total_order
FROM customers
INNER JOIN orders
ON customers.order_id = orders.order_id
GROUP BY company_name, year(order_date)
order by total_order desc
OR
If MySQL:
SELECT company_name, year(order_date),count(orders.order_id) as total_order
FROM customers
INNER JOIN orders
ON customers.order_id = orders.order_id
GROUP BY company_name, year(order_date)
order by total_order desc limit 1
I think you want to aggregate by year and pull the top row. In standard SQL, that would be:
SELECT EXTRACT(year FROM o.order_date) as yyyy, COUNT(*) as num_orders
FROM orders o
GROUP BY EXTRACT(year FROM o.order_date)
ORDER BY COUNT(*) DESC
FETCH FIRST 1 ROW ONLY;
Note that the exact syntax might vary by database. For instance, many databases support YEAR() in addition to or instead of EXTRACT(). Not all databases support FETCH FIRST. Some use TOP or LIMIT. But the above is standard SQL which is the tag on this question.
Also, the customers table is not needed for the query. No JOIN is needed.
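As a rough illustration of "aggregate by year and pull the top row", the sketch below does the same thing in SQLite through Python. It is not standard SQL: SQLite uses strftime('%Y', ...) where the answer uses EXTRACT, and LIMIT where the answer uses FETCH FIRST. The orders table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
    INSERT INTO orders (order_date) VALUES
        ('2019-02-01'), ('2019-11-30'),
        ('2020-01-15'), ('2020-06-01'), ('2020-12-24'),
        ('2021-03-03');
""")

# One row per year, busiest year first, keep only the top row.
# Only the orders table is involved; no join is required.
busiest_year = conn.execute("""
    SELECT strftime('%Y', order_date) AS yyyy, COUNT(*) AS num_orders
    FROM orders
    GROUP BY strftime('%Y', order_date)
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()
print(busiest_year)
```

Note that strftime returns the year as text, and that if two years tie for the maximum, LIMIT 1 returns only one of them.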

Getting max value before given date

I am pretty new to using MS SQL 2012 and I am trying to create a query that will:
Report the order id, the order date and the employee id that processed the order
report the maximum shipping cost among the orders processed by the same employee prior to that order
This is the code that I've come up with, but it returns the freight of the particular order date. Whereas I am trying to get the maximum freight from all the orders before the particular order.
select o.employeeid, o.orderid, o.orderdate, t2.maxfreight
from orders o
inner join
(
select employeeid, orderdate, max(freight) as maxfreight
from orders
group by EmployeeID, OrderDate
) t2
on o.EmployeeID = t2.EmployeeID
inner join
(
select employeeid, max(orderdate) as mostRecentOrderDate
from Orders
group by EmployeeID
) t3
on t2.EmployeeID = t3.EmployeeID
where o.freight = t2.maxfreight and t2.orderdate < t3.mostRecentOrderDate
Step one is to read the order:
select o.employeeid, o.orderid, o.orderdate
from orders o
where o.orderid = @ParticularOrder;
That gives you everything you need to go out and get the previous orders from the same employee and join each one to the row you get from above.
select o.employeeid, o.orderid, o.orderdate, o2.freight
from orders o
join orders o2
on o2.employeeid = o.employeeid
and o2.orderdate < o.orderdate
where o.orderid = @ParticularOrder;
Now you have a whole bunch of rows with the first three values the same and the fourth is the freight cost of each previous order. So just group by the first three fields and select the maximum of the previous orders.
select o.employeeid, o.orderid, o.orderdate, max( o2.freight ) as maxfreight
from orders o
join orders o2
on o2.employeeid = o.employeeid
and o2.orderdate < o.orderdate
where o.orderid = @ParticularOrder
group by o.employeeid, o.orderid, o.orderdate;
Done. Build your query in stages and many times it will turn out to be much simpler than you at first thought.
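The staged approach can be tried end to end with a small sketch like the following, which uses Python's sqlite3 module instead of SQL Server. The sample rows and the particular_order value are hypothetical, and the order id is passed as a bound parameter (?).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, employeeid INTEGER,
                         orderdate TEXT, freight REAL);
    INSERT INTO orders VALUES
        (1, 1, '2020-01-01', 10.0), (2, 1, '2020-01-05', 30.0),
        (3, 1, '2020-01-10', 20.0), (4, 2, '2020-01-03', 99.0),
        (5, 1, '2020-01-15', 5.0);
""")
particular_order = 5  # hypothetical order id to inspect

# Stage 2: pair the order with every earlier order by the same employee.
previous = conn.execute("""
    SELECT o.employeeid, o.orderid, o.orderdate, o2.freight
    FROM orders o
    JOIN orders o2 ON o2.employeeid = o.employeeid
                  AND o2.orderdate < o.orderdate
    WHERE o.orderid = ?
    ORDER BY o2.orderdate
""", (particular_order,)).fetchall()

# Stage 3: collapse those rows into a single maximum.
summary = conn.execute("""
    SELECT o.employeeid, o.orderid, o.orderdate, MAX(o2.freight) AS maxfreight
    FROM orders o
    JOIN orders o2 ON o2.employeeid = o.employeeid
                  AND o2.orderdate < o.orderdate
    WHERE o.orderid = ?
    GROUP BY o.employeeid, o.orderid, o.orderdate
""", (particular_order,)).fetchall()
print(previous)
print(summary)
```

Because this is an inner join, an employee's very first order would produce no row at all in either stage.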
It is unclear why you are using t3. From the question it doesn't sound like the employee's most recent order date is relevant at all, unless I am misunderstanding (which is absolutely possible).
I believe the issue lies in t2. You are grouping by orderdate, which will return the max freight for that date and employeeid, as you describe. You need to calculate a maximum total from all orders that occurred before the date that the order occurred on, for that employee, for every row you are returning.
It probably makes more sense to use a subquery for this.
SELECT o.employeeid, o.orderid, o.orderdate, m.maxfreight
FROM
orders o OUTER APPLY
(SELECT max(f.freight) as maxfreight
FROM orders AS f
WHERE f.orderdate < o.orderdate AND f.employeeid = o.employeeid
) AS m
Hoping this is syntactically correct as I'm not in front of SSMS right now. I used OUTER APPLY because a derived table in a plain join cannot reference o, and because your previous query with an inner join would have excluded any rows where an employee had no previous orders (i.e. first order ever).
You can do what you want with a correlated subquery or apply. Here is one way:
select o.employeeid, o.orderid, o.orderdate, t2.maxfreight
from orders o outer apply
(select max(freight) as maxfreight
from orders o2
where o2.employeeid = o.employeeid and
o2.orderdate < o.orderdate
) t2;
In SQL Server 2012+, you can also do this with a cumulative maximum:
select o.employeeid, o.orderid, o.orderdate,
max(freight) over (partition by employeeid
order by o.orderdate rows between unbounded preceding and 1 preceding
) as maxfreight
from orders o;
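A quick way to check that the windowed form and the correlated form agree is to run both on the same data. The sketch below does that in SQLite via Python, with a scalar correlated subquery standing in for OUTER APPLY (which SQLite lacks). The sample rows are invented, and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, employeeid INTEGER,
                         orderdate TEXT, freight REAL);
    INSERT INTO orders VALUES
        (1, 1, '2020-01-01', 10.0), (2, 1, '2020-01-05', 30.0),
        (3, 1, '2020-01-10', 20.0), (4, 2, '2020-01-03', 99.0),
        (5, 1, '2020-01-15', 5.0);
""")

# Running maximum over all earlier rows of the same employee;
# "1 PRECEDING" as the frame end keeps the current row out.
windowed = conn.execute("""
    SELECT o.employeeid, o.orderid, o.orderdate,
           MAX(o.freight) OVER (PARTITION BY o.employeeid
                                ORDER BY o.orderdate
                                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                               ) AS maxfreight
    FROM orders o
    ORDER BY o.employeeid, o.orderdate
""").fetchall()

# Same question asked with a correlated scalar subquery.
correlated = conn.execute("""
    SELECT o.employeeid, o.orderid, o.orderdate,
           (SELECT MAX(o2.freight)
            FROM orders o2
            WHERE o2.employeeid = o.employeeid
              AND o2.orderdate < o.orderdate) AS maxfreight
    FROM orders o
    ORDER BY o.employeeid, o.orderdate
""").fetchall()
print(windowed == correlated)
```

Each employee's first order gets NULL (None in Python) in both forms. The two can diverge when one employee has several orders on the same date: the ROWS frame counts physically preceding rows, while the subquery compares dates strictly.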

query with subquery with 1 result(max) for each year

I have to write a query that shows, for each year, which shipper had the maximum total cost.
My query currently shows the total cost of each shipper for each year. The result should instead list each year with the shipper that had the highest total, and that total cost.
Thanks in advance.
select year(OrderDate), s.ShipperID, sum(freight)
from orders o
join shippers s on o.ShipVia = s.ShipperID
group by year(OrderDate),s.ShipperID
Select a.FreightYear, a.ShipperID, a.FreightValue
from
(
select year(OrderDate) FreightYear, s.ShipperID, sum(freight) FreightValue
from orders o
join shippers s on o.ShipVia = s.ShipperID
group by year(OrderDate),s.ShipperID
) a
inner join
(
select FreightYear, max(FreightTotal) MaxFreight
from
(
select year(OrderDate) FreightYear, s.ShipperID, sum(freight) FreightTotal
from orders o
join shippers s on o.ShipVia = s.ShipperID
group by year(OrderDate),s.ShipperID
) x
group by FreightYear
) max on max.FreightYear = a.FreightYear and max.MaxFreight = a.FreightValue
order by FreightYear
Inner query a is your original query, getting the value of freight by shipper.
Inner query max gets the max value for each year, and then query max is joined to query a, restricting the rows in a to be those with a value for a year = to the max value for the year.
Cheers -
It's marginally shorter if you use windowing functions.
select shippers_ranked.OrderYear as OrderYear,
shippers_ranked.ShipperId as ShipperId,
shippers_ranked.TotalFreight as TotalFreight
from
(
select shippers_freight.*, row_number() over (partition by shippers_freight.OrderYear order by shippers_freight.TotalFreight desc) as Ranking
from
(
select year(OrderDate) as OrderYear,
s.ShipperID as ShipperId,
sum(freight) as TotalFreight
from orders o
inner join shippers s on o.ShipVia = s.ShipperID
group by year(OrderDate), s.ShipperID
) shippers_freight
) shippers_ranked
where shippers_ranked.Ranking = 1
order by shippers_ranked.OrderYear
;
You need to decide what you would like to happen if two shippers have the same TotalFreight for a year - as the code above stands you will get one row (non-deterministically). If you would like one row, I would add ShipperId to the order by in the over() clause so that you always get the same row. If in the same TotalFreight case you would like multiple rows returned, use dense_rank() rather than row_number().
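The difference between the two tie-handling choices is easy to see on data that contains a tie. The following sketch uses Python's sqlite3 module (SQLite 3.25 or later) with an invented, simplified orders table in which the year is already a column; the column names are assumptions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_year INTEGER, ship_via INTEGER, freight INTEGER);
    INSERT INTO orders VALUES
        (2020, 1, 50), (2020, 1, 50), (2020, 2, 100), (2020, 3, 40),
        (2021, 1, 70), (2021, 2, 30);
""")

# Same query twice; only the ranking expression changes.
QUERY = """
    WITH totals AS (
        SELECT order_year, ship_via AS shipper_id, SUM(freight) AS total_freight
        FROM orders
        GROUP BY order_year, ship_via
    ),
    ranked AS (
        SELECT totals.*, {rank_expr} AS ranking
        FROM totals
    )
    SELECT order_year, shipper_id, total_freight
    FROM ranked
    WHERE ranking = 1
    ORDER BY order_year, shipper_id
"""

# Exactly one row per year; shipper_id in ORDER BY makes the pick repeatable.
one_per_year = conn.execute(QUERY.format(
    rank_expr="ROW_NUMBER() OVER (PARTITION BY order_year "
              "ORDER BY total_freight DESC, shipper_id)")).fetchall()

# Every shipper that shares the top total for the year.
with_ties = conn.execute(QUERY.format(
    rank_expr="DENSE_RANK() OVER (PARTITION BY order_year "
              "ORDER BY total_freight DESC)")).fetchall()
print(one_per_year)
print(with_ties)
```

In 2020 shippers 1 and 2 both total 100, so the row_number() variant keeps only shipper 1 while the dense_rank() variant returns both.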

SQL Statement Help - Select latest Order for each Customer

Say I have 2 tables: Customers and Orders. A Customer can have many Orders.
Now, I need to show any Customers with his latest Order. This means if a Customer has more than one Orders, show only the Order with the latest Entry Time.
This is how far I managed on my own:
SELECT a.*, b.Id
FROM Customer a INNER JOIN Order b ON b.CustomerID = a.Id
ORDER BY b.EntryTime DESC
This of course returns all Customers with one or more Orders, showing the latest Order first for each Customer, which is not what I wanted. My mind was stuck in a rut at this point, so I hope someone can point me in the right direction.
For some reason, I think I need to use the MAX syntax somewhere, but it just escapes me right now.
UPDATE: After going through a few answers here (there's a lot!), I realized I made a mistake: I meant any Customer with his latest record. That means if he does not have an Order, then I do not need to list him.
UPDATE2: Fixed my own SQL statement, which probably caused no end of confusion to others.
I don't think you do want to use MAX() as you don't want to group the OrderID. What you need is an ordered sub query with a SELECT TOP 1.
select *
from Customers
inner join Orders
on Customers.CustomerID = Orders.CustomerID
and OrderID = (
SELECT TOP 1 subOrders.OrderID
FROM Orders subOrders
WHERE subOrders.CustomerID = Orders.CustomerID
ORDER BY subOrders.OrderDate DESC
)
Something like this should do it:
SELECT X.*, Y.LatestOrderId
FROM Customer X
LEFT JOIN (
SELECT A.Customer, MAX(A.OrderID) LatestOrderId
FROM Order A
JOIN (
SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
GROUP BY A.Customer
) Y ON X.Customer = Y.Customer
This assumes that two orders for the same customer may have the same EntryTime, which is why MAX(OrderID) is used in subquery Y to ensure that it only occurs once per customer. The LEFT JOIN is used because you stated you wanted to show all customers - if they haven't got any orders, then the LatestOrderId will be NULL.
Hope this helps!
--
UPDATE :-) This shows only customers with orders:
SELECT A.Customer, MAX(A.OrderID) LatestOrderId
FROM Order A
JOIN (
SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
GROUP BY A.Customer
While I see that you've already accepted an answer, I think this one is a bit more intuitive:
select a.*
,b.Id
from customer a
inner join Order b
on b.CustomerID = a.Id
where b.EntryTime = ( select max(o.EntryTime)
from Order o
where o.CustomerID = a.Id
);
o.CustomerID = a.Id because you want the max EntryTime of all orders (in o) for the customer (a.Id).
I would have to compare the execution plans to be sure, but because TOP is applied only after the rows are sorted, and ORDER BY can be expensive, I believe that using max(EntryTime) would be the best way to run this.
You can use a window function.
SELECT *
FROM (SELECT a.*, b.*,
ROW_NUMBER () OVER (PARTITION BY a.ID ORDER BY b.orderdate DESC,
b.ID DESC) rn
FROM customer a, ORDER b
WHERE a.ID = b.custid)
WHERE rn = 1
For each customer (a.id) it sorts all orders and discards everything but the latest.
ORDER BY clause includes both order date and entry id, in case there are multiple orders on the same date.
Generally, window functions are much faster than any look-ups using MAX() on large number of records.
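Here is a minimal, runnable version of the ROW_NUMBER() idea, written against SQLite (3.25 or later) from Python. The tables are renamed to customers and orders to avoid the reserved word ORDER, and the rows are invented; it demonstrates the result, not the performance claim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, custid INTEGER, orderdate TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES
        (10, 1, '2020-01-01'), (11, 1, '2020-03-01'),
        (12, 1, '2020-03-01'), (13, 2, '2020-02-01');
""")

# Number each customer's orders from newest to oldest, then keep row 1.
# o.id DESC breaks the tie between orders 11 and 12 (same date).
latest = conn.execute("""
    SELECT customer_id, name, order_id, orderdate
    FROM (SELECT c.id AS customer_id, c.name,
                 o.id AS order_id, o.orderdate,
                 ROW_NUMBER() OVER (PARTITION BY c.id
                                    ORDER BY o.orderdate DESC, o.id DESC) AS rn
          FROM customers c
          JOIN orders o ON o.custid = c.id) AS ranked
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()
print(latest)
```

Customer Cy has no orders and is left out by the inner join, which matches the asker's updated requirement.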
This query is much faster than the accepted answer :
SELECT c.id as customer_id,
(SELECT co.id FROM customer_order co WHERE
co.customer_id=c.id
ORDER BY some_date_column DESC limit 1) as last_order_id
FROM customer c
SELECT Cust.*, Ord.*
FROM Customers cust INNER JOIN Orders ord ON cust.ID = ord.CustID
WHERE ord.OrderID =
(SELECT MAX(OrderID) FROM Orders WHERE Orders.CustID = cust.ID)
Something like:
SELECT
a.*
FROM
Customer a
INNER JOIN Order b
ON b.CustomerID = a.Id
INNER JOIN (SELECT CustomerID, max(EntryTime) as EntryTime FROM Order GROUP BY CustomerID) met
ON
b.EntryTime = met.EntryTime and b.CustomerID = met.CustomerID
One approach that I haven't seen above yet:
SELECT
C.*,
O1.ID
FROM
dbo.Customers C
INNER JOIN dbo.Orders O1 ON
O1.CustomerID = C.ID
LEFT OUTER JOIN dbo.Orders O2 ON
O2.CustomerID = C.ID AND
O2.EntryTime > O1.EntryTime
WHERE
O2.ID IS NULL
This (as well as the other solutions I believe) assumes that no two orders for the same customer can have the exact same entry time. If that's a concern then you would have to make a choice as to what determines which one is the "latest". If that's a concern post a comment and I can expand the query if needed to account for that.
The general approach of the query is to find the order for a customer where there is not another order for the same customer with a later date. It is then the latest order by definition. This approach often gives better performance than the use of derived tables or subqueries.
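To show both the anti-join and the tie problem it mentions, here is a small sketch in SQLite via Python. The data is invented and deliberately contains two orders with the same entry time. The second variant, which falls back to the higher order id on a tie, is only one possible tie-break rule and is an assumption of this example, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         entry_time TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES
        (10, 1, '2020-01-01 10:00'), (11, 1, '2020-03-01 09:00'),
        (12, 1, '2020-03-01 09:00'), (13, 2, '2020-02-01 12:00');
""")

# Keep o1 only when no "later" order o2 exists for the same customer.
ANTI_JOIN = """
    SELECT c.id, o1.id
    FROM customers c
    JOIN orders o1 ON o1.customer_id = c.id
    LEFT JOIN orders o2 ON o2.customer_id = c.id AND {later}
    WHERE o2.id IS NULL
    ORDER BY c.id, o1.id
"""

# "Later" by entry time only: tied orders both survive.
plain = conn.execute(
    ANTI_JOIN.format(later="o2.entry_time > o1.entry_time")).fetchall()

# "Later" by entry time, then by order id (hypothetical tie-break rule).
tie_broken = conn.execute(ANTI_JOIN.format(
    later="(o2.entry_time > o1.entry_time"
          " OR (o2.entry_time = o1.entry_time AND o2.id > o1.id))")).fetchall()
print(plain)
print(tie_broken)
```

With the plain condition Ann appears twice, once for each of her tied orders; with the tie-break she appears once.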
A simple max and "group by" is sufficient.
select c.customer_id, max(o.order_date)
from customers c
inner join orders o on o.customer_id = c.customer_id
group by c.customer_id;
No subselect is needed; a subselect would slow things down.