I have this task:
Write a query to find out in which year the maximum number of orders was made by the company.
How do i write it?
This is everything I could do, but it is not what I need at all...
SELECT company_name, order_date
FROM customers
INNER JOIN orders
ON customers.order_id = orders.order_id
WHERE order_date = (SELECT MAX(order_date) FROM orders)
GROUP BY customer_id, order_date;
You can try the below options - choose top/limit based on your DBMS
If SQL Server
SELECT top 1 company_name, year(order_date),count(orders.order_id) as total_order
FROM customers
INNER JOIN orders
ON customers.order_id = orders.order_id
GROUP BY company_name, year(order_date)
order by total_order desc
OR
If MySQL:
SELECT company_name, year(order_date),count(orders.order_id) as total_order
FROM customers
INNER JOIN orders
ON customers.order_id = orders.order_id
GROUP BY company_name, year(order_date)
order by total_order desc limit 1
I think you want to aggregate by year and pull the top row. In standard SQL, that would be:
SELECT EXTRACT(year FROM o.order_date) as yyyy, COUNT(*) as num_orders
FROM orders o
GROUP BY EXTRACT(year FROM o.order_date)
ORDER BY COUNT(*) DESC
FETCH FIRST 1 ROW ONLY;
Note that the exact syntax might vary by database. For instance, many databases support YEAR() in addition to or instead of EXTRACT(). Not all databases support FETCH FIRST. Some use TOP or LIMIT. But the above is standard SQL which is the tag on this question.
Also, the customers table is not needed for the query. No JOIN is needed.
Related
I am trying to add a new column to my table which will be the average value calculated as the division of two existing columns. Therefore Average value = Total Sales / Number of Orders.
My data looks like this:click to view picture
I don't understand why Example Code A does not work but Example Code B does. Please can someone explain?
Example Code A
%%sql
SELECT
c.country,
count(distinct c.customer_id) customer_num,
count(i.invoice_id) order_num,
ROUND(SUM(i.total),2) total_sales,
order_num / total_sales avg_order_value
FROM customer c
LEFT JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 4 DESC;
Example Code B
%%sql
WITH
customer_sales AS
(
SELECT
c.country,
count(distinct c.customer_id) customer_num,
count(i.invoice_id) order_num,
ROUND(SUM(i.total),2) total_sales
FROM customer c
LEFT JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 4 DESC
)
SELECT
country,
customer_num,
order_num,
total_sales,
total_sales / order_num avg_order_value
FROM customer_sales;
Thank you!
Depending on the DBMS some allow you to reference the alias in the calculation (in the same select) and others require you to either bring it outside in an outer query or state your previous aggregation/functions, such as counts or sums.
SELECT
c.country,
count(distinct c.customer_id) customer_num,
count(i.invoice_id) order_num,
ROUND(SUM(i.total),2) total_sales,
count(i.invoice_id) / ROUND(SUM(i.total),2) avg_order_value
FROM customer c
LEFT JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 4 DESC;
Im currently working with the northwind database and want to see the companies with more orders place in 1997. Im being ask to use windows function so i wrote this
select c.customerid,
c.companyname,
rank() over (order by count(orderid) desc )
from customers c
inner join orders o on c.customerid = o.customerid
where date_part('year',orderdate) = 1997;
However this code ask me to use GROUP BY with c.customerid. And i simply don't understand why. Supposedly this code will give me all the customers id and names and after that the window function kicks in giving them a rank base on the amount of orders. So why group them?
Here:
rank() over (order by count(orderid) desc )
You have an aggregate function in the over() clause of the window function (count(orderid)), so you do need a group by clause. Your idea is to put in the same group all orders of the same customer:
select c.customerid,
c.companyname,
rank() over (order by count(*) desc) as rn
from customers c
inner join orders o on c.customerid = o.customerid
where o.orderdate = date '1997-01-01' and o.orderdate < '1998-01-01'
group by c.customerid;
Notes:
Filtering on literal dates is much more efficient than applying a date function on the date column
count(orderid) is equivalent to count(*) in the context of this query
Postgres understands functionnaly-dependent column: assuming that customerid is the primary key of customer, it is sufficient to put just that column in the group by clause
It is a good practice to give aliases to expressions in the select clause
Another good practice is to prefix all columns with the (alias of) table they belong to
You would use it correctly in an aggregation query. That would be:
select c.customerid, c.companyname, count(*) as num_orders,
rank() over (order by count(*) desc) as ranking
from customers c inner join
orders o
on c.customerid = o.customerid
where date_part('year',orderdate) = 1997
group by c.customerid, c.companyname;
This counts the number of orders per customer in 1997. It then ranks the customers based on the number of orders.
I would advise you to use:
where orderdate >= '1997-01-01' and
orderdate < '1998-01-01'
For the filtering by year. This allows Postgres to use an index if one is available.
I am pretty new to using MS SQL 2012 and I am trying to create a query that will:
Report the order id, the order date and the employee id that processed the order
report the maximum shipping cost among the orders processed by the same employee prior to that order
This is the code that I've come up with, but it returns the freight of the particular order date. Whereas I am trying to get the maximum freight from all the orders before the particular order.
select o.employeeid, o.orderid, o.orderdate, t2.maxfreight
from orders o
inner join
(
select employeeid, orderdate, max(freight) as maxfreight
from orders
group by EmployeeID, OrderDate
) t2
on o.EmployeeID = t2.EmployeeID
inner join
(
select employeeid, max(orderdate) as mostRecentOrderDate
from Orders
group by EmployeeID
) t3
on t2.EmployeeID = t3.EmployeeID
where o.freight = t2.maxfreight and t2.orderdate < t3.mostRecentOrderDate
Step one is to read the order:
select o.employeeid, o.orderid, o.orderdate
from orders o
where o.orderid = #ParticularOrder;
That gives you everything you need to go out and get the previous orders from the same employee and join each one to the row you get from above.
select o.employeeid, o.orderid, o.orderdate, o2.freight
from orders o
join orders o2
on o2.employeeid = o.employeeid
and o2.orderdate < o.orderdate
where o.orderid = #ParticularOrder;
Now you have a whole bunch of rows with the first three values the same and the fourth is the freight cost of each previous order. So just group by the first three fields and select the maximum of the previous orders.
select o.employeeid, o.orderid, o.orderdate, max( o2.freight ) as maxfreight
from orders o
join orders o2
on o2.employeeid = o.employeeid
and o2.orderdate < o.orderdate
where o.orderid = #ParticularOrder
group by o.employeeid, o.orderid, o.orderdate;
Done. Build your query in stages and many times it will turn out to be much simpler than you at first thought.
It is unclear why you are using t3. From the question it doesn't sound like the employee's most recent order date is relevant at all, unless I am misunderstanding (which is absolutely possible).
I believe the issue lies in t2. You are grouping by orderdate, which will return the max freight for that date and employeeid, as you describe. You need to calculate a maximum total from all orders that occurred before the date that the order occurred on, for that employee, for every row you are returning.
It probably makes more sense to use a subquery for this.
SELECT o.employeeid, o.orderid, o.orderdate, m.maxfreight
FROM
orders o LEFT OUTER JOIN
(SELECT max(freight) as maxfreight
FROM orders AS f
WHERE f.orderdate <= o.orderdate AND f.employeeid = o.employeeid
) AS m
Hoping this is syntactically correct as I'm not in front of SSMS right now. I also included a left outer join as your previous query with an inner join would have excluded any rows where an employee had no previous orders (i.e. first order ever).
You can do what you want with a correlated subquery or apply. Here is one way:
select o.employeeid, o.orderid, o.orderdate, t2.maxfreight
from orders o outer apply
(select max(freight) as maxfreight
from orders o2
where o2.employeeid = o.employeid and
o2.orderdate < o.orderdate
) t2;
In SQL Server 2012+, you can also do this with a cumulative maximum:
select o.employeeid, o.orderid, o.orderdate,
max(freight) over (partition by employeeid
order by o.orderdate rows between unbounded preceding and 1 preceding
) as maxfreight
from orders o;
I'm trying to display the most popular product (format_id) in a given month (JAN-14) and group them by a count of each format_id.
Here is my Query :
select PRODUCT, AMOUNT
from (Select Order_108681091.Order_Date, order_line_108681091.Format_id as Product,
COUNT(*) AS AMOUNT FROM order_line_108681091
Inner Join order_108681091
On order_108681091.order_id = order_line_108681091.order_id
Where order_108681091.Order_Date like '%JAN-14%'
group by Format_id
order by AMOUNT desc);
How can i do this ?
You said you have the condition, so let's start with adding it.
There is no need to ORDER BY, so remove it.
Remove Order_date from the subquery SELECT
Use aliases
The subquery itself would be enough.
SELECT l.Format_id as PRODUCT,
COUNT(*) AS AMOUNT
FROM order_line_108681091 l
INNER JOIN order_108681091 o
ON o.order_id = l.order_id
WHERE o.Order_Date LIKE '%JAN-14%'
GROUP BY
l.Format_id;
You have to specify the inner join:
inner join order_108681091 ON order_1078681091.ID = order_line_018681091.ID
or something like that. Also your where clause probably wont work unless you're storing that date as a string and not a datetime datatype.
You don't need the subquery at all. And, I'm very uncomfortable using like on a date directly. Explicitly convert the date to a string:
select ol.Format_id as Product, COUNT(*) AS AMOUNT
from order_line_108681091 ol Inner Join
order_108681091 o
ON o.order_id = ol.order_id
where to_char(o.Order_Date, 'MMM-YYYY') = 'JAN-2014'
group by ol.Format_id
order by count(*) desc;
Actually, if you have in index on OrderDate, you can use the following (to take advantage of the index):
select ol.Format_id as Product, COUNT(*) AS AMOUNT
from order_line_108681091 ol Inner Join
order_108681091 o
ON o.order_id = ol.order_id
where o.Order_Date >= to_date('2014-01-01', 'YYYY-MM-DD') and
o.Order_Date < to_date('2014-02-01', 'YYYY-MM-DD')
group by ol.Format_id
order by count(*) desc;
Moving the function from the column to the constant allows the use of an index on the column.
I have to make a query where I show for each year wich shipper had the maximum total cost.
My query now show for each year the total cost of each shipper. So in the result i must have a list of the years, for each year the shipper and the total cost.
Thanks in advance.
select year(OrderDate), s.ShipperID, sum(freight)
from orders o
join shippers s on o.ShipVia = s.ShipperID
group by year(OrderDate),s.ShipperID
Select a.FreightYear, a,ShipperID, a.FreightValue
from
(
select year(OrderDate) FreightYear, s.ShipperID, sum(freight) FreightValue
from orders o
join shippers s on o.ShipVia = s.ShipperID
group by year(OrderDate),s.ShipperID
) a
inner join
(
select FreightYear, max(FrieghtTotal) MaxFreight
from
(
select year(OrderDate) FreightYear, s.ShipperID, sum(freight) FreightTotal
from orders o
join shippers s on o.ShipVia = s.ShipperID
group by year(OrderDate),s.ShipperID
) x
group by FreightYear
) max on max.FreightYear = a.FreightYear and max.MaxFreight = a.FreightValue
order by FreightYear
Inner query a is your original query, getting the value of freight by shipper.
Inner query max gets the max value for each year, and then query max is joined to query a, restricting the rows in a to be those with a value for a year = to the max value for the year.
Cheers -
It's marginally shorter if you use windowing functions.
select shippers_ranked.OrderYear as OrderYear,
shippers_ranked.ShipperId as ShipperId,
shippers_ranked.TotalFreight as TotalFreight
from
(
select shippers_freight.*, row_number() over (partition by shippers_freight.OrderYear order by shippers_freight.TotalFreight desc) as Ranking
from
(
select year(OrderDate) as OrderYear,
s.ShipperID as ShipperId,
sum(freight) as TotalFreight
from orders o
inner join shippers s on o.ShipVia = s.ShipperID
group by year(OrderDate), s.ShipperID
) shippers_freight
) shippers_ranked
where shippers_ranked.Ranking = 1
order by shippers_ranked.OrderYear
;
You need to decide what you would like to happen if two shippers have the same TotalFreight for a year - as the code above stands you will get one row (non-deterministically). If you would like one row, I would add ShipperId to the order by in the over() clause so that you always get the same row. If in the same TotalFreight case you would like multiple rows returned, use dense_rank() rather than row_number().