Why use GROUP BY in WINDOW FUNCTION - sql

I'm currently working with the Northwind database and want to see which companies placed the most orders in 1997. I'm being asked to use a window function, so I wrote this:
select c.customerid,
c.companyname,
rank() over (order by count(orderid) desc )
from customers c
inner join orders o on c.customerid = o.customerid
where date_part('year',orderdate) = 1997;
However, this query fails and asks me to use GROUP BY with c.customerid, and I simply don't understand why. Supposedly this code will give me all the customer ids and names, and after that the window function kicks in, ranking them based on the number of orders. So why group them?

Here:
rank() over (order by count(orderid) desc )
You have an aggregate function (count(orderid)) in the over() clause of the window function, so you do need a group by clause: aggregates are computed per group, and the window function then runs over the grouped rows. The idea is to put all orders of the same customer in the same group:
select c.customerid,
c.companyname,
rank() over (order by count(*) desc) as rn
from customers c
inner join orders o on c.customerid = o.customerid
where o.orderdate >= date '1997-01-01' and o.orderdate < date '1998-01-01'
group by c.customerid;
Notes:
Filtering on literal dates is much more efficient than applying a date function on the date column
count(orderid) is equivalent to count(*) in the context of this query
Postgres understands functionally dependent columns: assuming that customerid is the primary key of customers, it is sufficient to put just that column in the group by clause
It is a good practice to give aliases to expressions in the select clause
Another good practice is to prefix all columns with the (alias of) table they belong to
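To make the evaluation order concrete, here is a minimal sketch that runs the grouped query against an in-memory SQLite database from Python. SQLite (3.25 or newer, for window functions) is only a stand-in for Postgres here, and the three customers and seven orders are made-up data, not Northwind rows. SQLite has no date '...' literal, so the dates are compared as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerid TEXT PRIMARY KEY, companyname TEXT);
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid TEXT, orderdate TEXT);
    -- Made-up sample data, not taken from Northwind.
    INSERT INTO customers VALUES ('A', 'Alpha'), ('B', 'Beta'), ('C', 'Gamma');
    INSERT INTO orders VALUES
        (1, 'A', '1997-02-01'), (2, 'A', '1997-05-10'), (3, 'A', '1997-09-30'),
        (4, 'B', '1997-03-15'), (5, 'B', '1997-07-04'),
        (6, 'C', '1997-11-11'), (7, 'C', '1998-01-05');
""")

# GROUP BY collapses the joined rows to one row per customer first;
# RANK() then runs over those grouped rows, ordered by each group's COUNT(*).
ranked = conn.execute("""
    SELECT c.customerid, c.companyname, COUNT(*) AS num_orders,
           RANK() OVER (ORDER BY COUNT(*) DESC) AS rn
    FROM customers c
    INNER JOIN orders o ON c.customerid = o.customerid
    WHERE o.orderdate >= '1997-01-01' AND o.orderdate < '1998-01-01'
    GROUP BY c.customerid, c.companyname
    ORDER BY rn, c.customerid
""").fetchall()

for row in ranked:
    print(row)
```

Customer C's 1998 order is filtered out before grouping, so C is ranked on a single order.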

You would use it correctly in an aggregation query. That would be:
select c.customerid, c.companyname, count(*) as num_orders,
rank() over (order by count(*) desc) as ranking
from customers c inner join
orders o
on c.customerid = o.customerid
where date_part('year',orderdate) = 1997
group by c.customerid, c.companyname;
This counts the number of orders per customer in 1997. It then ranks the customers based on the number of orders.
I would advise you to filter by year like this:
where orderdate >= '1997-01-01' and
orderdate < '1998-01-01'
This allows Postgres to use an index on orderdate if one is available.
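The effect of the two filter styles on index usage can be sketched with SQLite's EXPLAIN QUERY PLAN, driven from Python. This is only an illustration: SQLite stands in for Postgres (whose planner and EXPLAIN output differ), strftime replaces date_part, and the index name idx_orders_orderdate is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid TEXT, orderdate TEXT);
    -- Hypothetical index on the date column.
    CREATE INDEX idx_orders_orderdate ON orders (orderdate);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable plan step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Range predicate on the bare column: the index can be searched.
range_plan = plan("""
    SELECT orderid FROM orders
    WHERE orderdate >= '1997-01-01' AND orderdate < '1998-01-01'
""")

# Function wrapped around the column: every row has to be visited.
func_plan = plan("""
    SELECT orderid FROM orders
    WHERE strftime('%Y', orderdate) = '1997'
""")

print(range_plan)
print(func_plan)
```

The range version is reported as a SEARCH on the index, while the function version is reported as a SCAN; the exact wording of the plan text varies between SQLite versions.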

Related

Find customer who bought least on W3schools SQL

I'm new to SQL Server and I'm trying to do some exercises. I want to find customers who bought least on W3schools database. My solution for this case is:
Join Customers with OrderDetails via CustomerID
Select CustomerNames that have least OrderID appeared after using JOIN.
Here is my query:
SELECT COUNT(OrderID), CustomerID
FROM Orders
GROUP BY CustomerID
ORDER BY COUNT(CustomerID) ASC
HAVING COUNT(OrderID) = '1'
When I ran this query, the message said: Syntax error near "Having". What is wrong with my query?
Please help me figure it out.
My solution for this case is:
Join Customers with OrderDetails via CustomerID
Select CustomerNames that have least OrderID appeared after using JOIN.
As @thorsten-kettner lamented:
You say in your explanation that you join and then show the customer
name. Your query does neither of the two things...
Furthermore, your question has severe grammatical errors making it hard to decipher.
I want to find customers who bought least on W3schools database.
Nonetheless, the queries below can be tried in the Try-SQL Editor at w3schools.com.
To get the list of customers who have at least 1 order:
SELECT C.CustomerName FROM [Customers] AS C
JOIN [Orders] AS O
ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID
ORDER BY C.CustomerName
To get the list of customers who have exactly 1 order:
SELECT C.CustomerName FROM [Customers] AS C
JOIN [Orders] AS O
ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID
HAVING COUNT(O.OrderID) = 1
ORDER BY C.CustomerName
To get the customer who made the fewest orders:
This includes customers who made no order at all. Use JOIN instead of LEFT JOIN if you only want to consider customers who made at least one order.
You can remove LIMIT 1 to get the whole list sorted by the number of orders placed.
SELECT C.CustomerName, COUNT(O.OrderID) FROM [Customers] AS C
LEFT JOIN [Orders] AS O
ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID
ORDER BY COUNT(O.OrderID), C.CustomerName
LIMIT 1;
Addendum
As commented by @sticky-bit,
The ORDER BY clause has to come after the HAVING clause.
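A small sketch of that clause-order rule, run against an in-memory SQLite database from Python. The Orders rows are made up, and SQLite is only a stand-in for SQL Server, so the error text will differ, but both reject ORDER BY placed before HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    -- Made-up sample data.
    INSERT INTO Orders VALUES (1, 10), (2, 10), (3, 20), (4, 30), (5, 30), (6, 40);
""")

# ORDER BY before HAVING, as in the question: a syntax error.
wrong = """
    SELECT COUNT(OrderID), CustomerID FROM Orders
    GROUP BY CustomerID
    ORDER BY COUNT(CustomerID) ASC
    HAVING COUNT(OrderID) = 1
"""
try:
    conn.execute(wrong)
    wrong_order_failed = False
except sqlite3.OperationalError as exc:
    wrong_order_failed = True
    print("rejected:", exc)

# HAVING first, ORDER BY last: the query is accepted.
fixed = conn.execute("""
    SELECT COUNT(OrderID), CustomerID FROM Orders
    GROUP BY CustomerID
    HAVING COUNT(OrderID) = 1
    ORDER BY CustomerID
""").fetchall()
print(fixed)
```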
You want a TOP 1 WITH TIES query, something like this:
SELECT TOP 1 WITH TIES CustomerID
FROM Orders
GROUP BY CustomerID
ORDER BY COUNT(OrderID);
In case you are using MySQL, try the following version:
SELECT CustomerID
FROM Orders
GROUP BY CustomerID
HAVING COUNT(OrderID) = (
SELECT COUNT(OrderID)
FROM Orders
GROUP BY CustomerID
ORDER BY COUNT(OrderID)
LIMIT 1
);
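To see why ties matter, here is a sketch comparing a plain LIMIT 1 with the HAVING-plus-subquery form on made-up data where two customers share the lowest order count. It runs on SQLite from Python as a stand-in; SQLite accepts the same HAVING/LIMIT syntax as the MySQL version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    -- Made-up data: customers 20 and 40 both have exactly one order.
    INSERT INTO Orders VALUES (1, 10), (2, 10), (3, 20), (4, 30), (5, 30), (6, 40);
""")

# LIMIT 1 keeps a single row and silently drops the other tied customer.
limit_one = conn.execute("""
    SELECT CustomerID FROM Orders
    GROUP BY CustomerID
    ORDER BY COUNT(OrderID), CustomerID
    LIMIT 1
""").fetchall()

# Comparing each group's count with the smallest count keeps every tied customer.
with_ties = conn.execute("""
    SELECT CustomerID FROM Orders
    GROUP BY CustomerID
    HAVING COUNT(OrderID) = (
        SELECT COUNT(OrderID) FROM Orders
        GROUP BY CustomerID
        ORDER BY COUNT(OrderID)
        LIMIT 1
    )
    ORDER BY CustomerID
""").fetchall()

print(limit_one, with_ties)
```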

SQL Maximum number of orders

I have this task:
Write a query to find out in which year the maximum number of orders was made by the company.
How do I write it?
This is everything I could do, but it is not what I need at all...
SELECT company_name, order_date
FROM customers
INNER JOIN orders
ON customers.order_id = orders.order_id
WHERE order_date = (SELECT MAX(order_date) FROM orders)
GROUP BY customer_id, order_date;
You can try the options below; choose TOP or LIMIT depending on your DBMS.
For SQL Server:
SELECT top 1 company_name, year(order_date),count(orders.order_id) as total_order
FROM customers
INNER JOIN orders
ON customers.order_id = orders.order_id
GROUP BY company_name, year(order_date)
order by total_order desc
Or, for MySQL:
SELECT company_name, year(order_date),count(orders.order_id) as total_order
FROM customers
INNER JOIN orders
ON customers.order_id = orders.order_id
GROUP BY company_name, year(order_date)
order by total_order desc limit 1
I think you want to aggregate by year and pull the top row. In standard SQL, that would be:
SELECT EXTRACT(year FROM o.order_date) as yyyy, COUNT(*) as num_orders
FROM orders o
GROUP BY EXTRACT(year FROM o.order_date)
ORDER BY COUNT(*) DESC
FETCH FIRST 1 ROW ONLY;
Note that the exact syntax might vary by database. For instance, many databases support YEAR() in addition to or instead of EXTRACT(). Not all databases support FETCH FIRST. Some use TOP or LIMIT. But the above is standard SQL which is the tag on this question.
Also, the customers table is not needed for the query. No JOIN is needed.
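The aggregate-by-year approach can be sketched on SQLite from Python. The assumptions are loud ones: the order rows are made up, strftime('%Y', ...) replaces EXTRACT (which SQLite lacks), and LIMIT 1 replaces FETCH FIRST 1 ROW ONLY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
    -- Made-up data: one order in 1996, three in 1997, two in 1998.
    INSERT INTO orders VALUES
        (1, '1996-12-30'),
        (2, '1997-01-15'), (3, '1997-06-01'), (4, '1997-11-20'),
        (5, '1998-02-02'), (6, '1998-03-03');
""")

# Count orders per year, then keep only the year with the highest count.
top_year = conn.execute("""
    SELECT strftime('%Y', o.order_date) AS yyyy, COUNT(*) AS num_orders
    FROM orders o
    GROUP BY strftime('%Y', o.order_date)
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()

print(top_year)
```

Only the orders table is touched, which matches the point that no JOIN is needed.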

SQL TOP 1 Syntax for a nested query

I'm new to SQL Server and am trying to use TOP 1 to get the company with the most orders in my DB. My existing query already works, but I don't know how to apply TOP 1 to it properly. I think only the syntax is missing.
Query #1 is working fine:
SELECT
c.CompanyName, COUNT(DISTINCT OrderID) as Nombre_Commande
FROM
Orders O
INNER JOIN
Customers C ON O.CustomerID = c.CustomerID
GROUP BY
c.CompanyName
What I am trying to do
SELECT TOP (1) *
FROM
(SELECT
c.CompanyName, COUNT(DISTINCT OrderID) AS Nombre_Commande
FROM
Orders O
INNER JOIN
Customers C ON O.CustomerID = c.CustomerID
GROUP BY
c.CompanyName)
You need to give the derived table an alias. Also, specifying TOP without an ORDER BY clause is pretty pointless, because rows are returned as a set without any guaranteed order unless one is explicitly specified with an ORDER BY clause:
SELECT TOP (1) *
FROM (
SELECT c.CompanyName, COUNT(DISTINCT OrderID) as Nombre_Commande
FROM Orders O
INNER JOIN Customers C ON O.CustomerID=c.CustomerID
GROUP by c.CompanyName
) AS YourTable
ORDER BY Nombre_Commande DESC -- or whichever column is meaningful for your ranking
How about this?
SELECT TOP 1 c.CompanyName, COUNT(DISTINCT OrderID) as Nombre_Commande
FROM Orders O INNER JOIN
Customers C
ON O.CustomerID = c.CustomerID
GROUP by c.CompanyName
ORDER BY Nombre_Commande DESC;
This assumes that Nombre_Commande is what you want to order by.
By the way, I would be surprised if COUNT(DISTINCT) were really needed for this query. COUNT(*) or COUNT(OrderId) should be sufficient.
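As a rough check of that last remark, the sketch below computes both counts side by side on made-up data, using SQLite from Python with LIMIT 1 in place of TOP 1. It assumes OrderID is the primary key of Orders, in which case the join yields one row per order and the two counts agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    -- Made-up sample data.
    INSERT INTO Customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Orders VALUES (1, 1), (2, 1), (3, 2);
""")

# ORDER BY makes "the first row" well defined; without it the pick is arbitrary.
top_company = conn.execute("""
    SELECT c.CompanyName,
           COUNT(DISTINCT o.OrderID) AS distinct_count,
           COUNT(*) AS row_count
    FROM Orders o
    INNER JOIN Customers c ON o.CustomerID = c.CustomerID
    GROUP BY c.CompanyName
    ORDER BY distinct_count DESC
    LIMIT 1
""").fetchone()

print(top_company)
```

If the query joined a detail table with several rows per order, the two counts would diverge and COUNT(DISTINCT) would be the one to keep.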

Query to pull second order date for a customer(SQL 2014)

I have a schema with customers, orders and order dates.
A customer can have orders on multiple dates. I need a calculated member that returns the first order date and the second order date, along with other associated metrics.
I was able to get the first order date and associated data using min(orderdate) as the first order, but I'm having trouble querying for the second order date. Any suggestions would help! Thanks
My query:
-- I have all the information in one table, so my query looks like this
Select customerid, orderid, min(orderdate) as firstorderdate, ...
From customer Where firstorderdate between '01/01/2015' and GETDATE()
(since I only want those customers who made their first purchase this year)
Query for their second purchase:
Select customerid, orderid, orderdate from ( select customerid,
orderid, orderdate, row_number() over (partition by customerid,
orderid order by orderdate) rn from customer ) t
Where rn <= 2
Without seeing your current query, it's difficult to understand. I assume your current query is like this:
select c.customerid, o.orderid, min(od.orderdate)
from customers c
join orders o on c.customerid = o.customerid
join orderdates od on o.orderid = od.orderid
group by c.customerid, o.orderid
Another way of doing the same query is to use row_number. Doing it this way, you're not restricted to just the first in the group:
select customerid, orderid, orderdate
from (
select c.customerid, o.orderid, od.orderdate,
row_number() over (partition by c.customerid, o.orderid
order by od.orderdate) rn
from customers c
join orders o on c.customerid = o.customerid
join orderdates od on o.orderid = od.orderid
) t
where rn <= 2
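Here is a sketch of how row_number can return the first and second order date side by side, run on SQLite (3.25+) from Python. It rests on assumptions not in the answer above: a single orders table with one date per order, as the question describes, made-up rows, and a partition by customerid only so that the numbering runs across all of a customer's orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid INTEGER, orderdate TEXT);
    -- Made-up data: customer 1 has three orders, customer 2 has one.
    INSERT INTO orders VALUES
        (1, 1, '2015-01-10'), (2, 1, '2015-03-05'), (3, 1, '2015-06-01'),
        (4, 2, '2015-02-20');
""")

# Number each customer's orders by date, then pivot rows 1 and 2 into columns.
first_two = conn.execute("""
    SELECT customerid,
           MAX(CASE WHEN rn = 1 THEN orderdate END) AS first_order,
           MAX(CASE WHEN rn = 2 THEN orderdate END) AS second_order
    FROM (
        SELECT customerid, orderdate,
               ROW_NUMBER() OVER (PARTITION BY customerid ORDER BY orderdate) AS rn
        FROM orders
    ) t
    WHERE rn <= 2
    GROUP BY customerid
    ORDER BY customerid
""").fetchall()

for row in first_two:
    print(row)
```

A customer with only one order gets NULL (None in Python) for the second date.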

Getting max value before given date

I am pretty new to using MS SQL 2012 and I am trying to create a query that will:
Report the order id, the order date and the employee id that processed the order
Report the maximum shipping cost among the orders processed by the same employee prior to that order
This is the code that I've come up with, but it returns the freight for the particular order date, whereas I am trying to get the maximum freight across all the orders before the particular order.
select o.employeeid, o.orderid, o.orderdate, t2.maxfreight
from orders o
inner join
(
select employeeid, orderdate, max(freight) as maxfreight
from orders
group by EmployeeID, OrderDate
) t2
on o.EmployeeID = t2.EmployeeID
inner join
(
select employeeid, max(orderdate) as mostRecentOrderDate
from Orders
group by EmployeeID
) t3
on t2.EmployeeID = t3.EmployeeID
where o.freight = t2.maxfreight and t2.orderdate < t3.mostRecentOrderDate
Step one is to read the order:
select o.employeeid, o.orderid, o.orderdate
from orders o
where o.orderid = @ParticularOrder;
That gives you everything you need to go out and get the previous orders from the same employee and join each one to the row you get from above.
select o.employeeid, o.orderid, o.orderdate, o2.freight
from orders o
join orders o2
on o2.employeeid = o.employeeid
and o2.orderdate < o.orderdate
where o.orderid = @ParticularOrder;
Now you have a whole bunch of rows with the first three values the same and the fourth is the freight cost of each previous order. So just group by the first three fields and select the maximum of the previous orders.
select o.employeeid, o.orderid, o.orderdate, max( o2.freight ) as maxfreight
from orders o
join orders o2
on o2.employeeid = o.employeeid
and o2.orderdate < o.orderdate
where o.orderid = @ParticularOrder
group by o.employeeid, o.orderid, o.orderdate;
Done. Build your query in stages and many times it will turn out to be much simpler than you at first thought.
It is unclear why you are using t3. From the question it doesn't sound like the employee's most recent order date is relevant at all, unless I am misunderstanding (which is absolutely possible).
I believe the issue lies in t2. You are grouping by orderdate, which will return the max freight for that date and employeeid, as you describe. For every row you return, you need the maximum freight across all orders by that employee that occurred before the date of that row's order.
It probably makes more sense to use a subquery for this.
SELECT o.employeeid, o.orderid, o.orderdate,
(SELECT max(f.freight)
FROM orders AS f
WHERE f.orderdate < o.orderdate AND f.employeeid = o.employeeid
) AS maxfreight
FROM orders o
I'm not in front of SSMS right now, so treat this as a sketch. The correlated subquery returns NULL when an employee has no previous orders (i.e. first order ever), so those rows are kept, whereas your previous query with inner joins would have excluded them.
You can do what you want with a correlated subquery or apply. Here is one way:
select o.employeeid, o.orderid, o.orderdate, t2.maxfreight
from orders o outer apply
(select max(freight) as maxfreight
from orders o2
where o2.employeeid = o.employeeid and
o2.orderdate < o.orderdate
) t2;
In SQL Server 2012+, you can also do this with a cumulative maximum:
select o.employeeid, o.orderid, o.orderdate,
max(freight) over (partition by employeeid
order by o.orderdate rows between unbounded preceding and 1 preceding
) as maxfreight
from orders o;
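The window-frame version and a correlated-subquery version can be compared on a small data set. The sketch below uses SQLite (3.25+) from Python as a stand-in for SQL Server, with made-up rows in which each employee has at most one order per date. If several orders shared a date, the two forms could differ, because the row-based frame counts earlier rows while the subquery compares dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, employeeid INTEGER,
                         orderdate TEXT, freight REAL);
    -- Made-up data; no employee has two orders on the same date.
    INSERT INTO orders VALUES
        (1, 1, '2012-01-01', 10.0),
        (2, 1, '2012-01-02', 30.0),
        (3, 1, '2012-01-03', 20.0),
        (4, 2, '2012-01-01', 5.0),
        (5, 2, '2012-01-04', 7.0);
""")

# Running maximum over all earlier rows of the same employee, excluding the current row.
windowed = conn.execute("""
    SELECT o.orderid,
           MAX(o.freight) OVER (PARTITION BY o.employeeid
                                ORDER BY o.orderdate
                                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                               ) AS maxfreight
    FROM orders o
    ORDER BY o.orderid
""").fetchall()

# The same result expressed as a correlated subquery.
correlated = conn.execute("""
    SELECT o.orderid,
           (SELECT MAX(o2.freight) FROM orders o2
            WHERE o2.employeeid = o.employeeid
              AND o2.orderdate < o.orderdate) AS maxfreight
    FROM orders o
    ORDER BY o.orderid
""").fetchall()

print(windowed)
```

Each employee's first order has no earlier rows, so its maxfreight is NULL (None in Python) in both versions.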