Execution order of SQL aggregate functions

I have a sales table in SQLite:
purchase_date  units_sold  customer_id
15             1           1
17             1           1
30             3           1
I want to get the total units_sold for each customer on the first and last dates of their purchases. My query is:
select customer_id,
sum(units_sold) total_units_sold
from sales
group by customer_id
having purchase_date = min(purchase_date)
or purchase_date = max(purchase_date)
I was expecting results like:
customer_id  total_units_sold
1            4
but I got:
customer_id  total_units_sold
1            5
I would like to know why this solution doesn't work.

The order of evaluation is the problem.
Note: HAVING is evaluated after GROUP BY has aggregated the rows, so SUM(units_sold) has already been computed over the whole group, and HAVING can only keep or discard that group, not individual rows.
You need to select the rows you want in a subquery first.
For example, I numbered each customer's rows by date to find the first row,
and numbered them again in descending order to find the last row (the first row after sorting descending),
and then applied the GROUP BY to the filtered rows.
The complete example:
select customer_id,sum(units_sold) from (
select customer_id, units_sold,purchase_date,
ROW_NUMBER() over(partition by customer_id order by purchase_date) As RowDatefirst,
ROW_NUMBER() over(partition by customer_id order by purchase_date desc)As RowDatelast
from sales
) t where t.RowDatefirst = 1 or t.RowDatelast=1
group by customer_id
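To see both the failure and the fix side by side, here is a minimal sketch using Python's built-in sqlite3 module on the question's three rows. It assumes SQLite 3.25 or later (needed for ROW_NUMBER()); the column aliases rn_first and rn_last are just shorter names for RowDatefirst and RowDatelast.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (purchase_date INTEGER, units_sold INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [(15, 1, 1), (17, 1, 1), (30, 3, 1)])

# SUM() is computed over every row in the group before HAVING runs,
# so HAVING can only keep or discard the whole group (total 5).
unfiltered = conn.execute(
    "SELECT customer_id, SUM(units_sold) FROM sales GROUP BY customer_id"
).fetchall()

# Window-function version from the answer: pick the first/last rows first, then group.
first_last = conn.execute("""
    SELECT customer_id, SUM(units_sold) AS total_units_sold
    FROM (
        SELECT customer_id, units_sold,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date) AS rn_first,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date DESC) AS rn_last
        FROM sales
    ) t
    WHERE t.rn_first = 1 OR t.rn_last = 1
    GROUP BY customer_id
""").fetchall()

print(unfiltered)  # [(1, 5)]
print(first_last)  # [(1, 4)]
```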

Try this:
SELECT a.customer_id, SUM(a.units_sold) as total_units_sold
FROM sales a
INNER JOIN (
SELECT customer_id, MIN(purchase_date) as _first ,MAX(purchase_date) as _last
FROM sales
GROUP BY customer_id
) b ON a.customer_id = b.customer_id AND
(a.purchase_date = b._first OR a.purchase_date = b._last)
GROUP BY a.customer_id
http://sqlfiddle.com/#!7/0a4a4/7
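The JOIN answer needs no window functions, so it also works on SQLite versions older than 3.25. The sketch below runs it through Python's sqlite3 module; customer 2, who bought only once, is a hypothetical row added to show that a single purchase, being both the first and the last, is matched once by the OR condition and therefore counted once, not twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (purchase_date INTEGER, units_sold INTEGER, customer_id INTEGER)")
# Customer 1 is the question's data; customer 2 (hypothetical) bought only once.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(15, 1, 1), (17, 1, 1), (30, 3, 1), (20, 2, 2)],
)

rows = conn.execute("""
    SELECT a.customer_id, SUM(a.units_sold) AS total_units_sold
    FROM sales a
    INNER JOIN (
        SELECT customer_id, MIN(purchase_date) AS _first, MAX(purchase_date) AS _last
        FROM sales
        GROUP BY customer_id
    ) b ON a.customer_id = b.customer_id
       AND (a.purchase_date = b._first OR a.purchase_date = b._last)
    GROUP BY a.customer_id
    ORDER BY a.customer_id
""").fetchall()

print(rows)  # [(1, 4), (2, 2)]
```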

Related

On same row: last purchase quantity + date, and total quantity in stock (several stock places) - SQL server

I am trying to get the following result in SQL Server:
For every item code in the order rows table, the quantity and date of its most recent purchase order, plus the total amount in stock for that item code from the warehouse table.
Order rows:
ORDER_DATE ITEM_CODE QTY
2019-03-01 A 5
2019-03-02 A 3
2019-03-05 A 4
2019-03-03 B 3
2019-03-04 B 10
Warehouse:
ITEM_CODE INSTOCK STOCKPLACE
A 10 VV
A 3 LP
A 8 XV
B 5 VV
B 15 LP
Wanted result (Latest order date, latest order qty and total in stock):
ORDER_DATE ITEM_CODE QTY INSTOCK
2019-03-05 A 4 21
2019-03-04 B 10 20
I have tried some queries, but they all failed. I have a steep learning curve ahead of me :) Thanks in advance for all the help!
Here is one method:
select o.*, wh.*
from (select wh.item_code, sum(wh.instock) as instock
from warehouse wh
group by wh.item_code
) wh outer apply
(select top (1) o.*
from orders o
where o.item_code = wh.item_code
order by o.order_date desc
) o;
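SQLite has no APPLY operator, but the OUTER APPLY (SELECT TOP (1) ... ORDER BY order_date DESC) pattern can be imitated with a LEFT JOIN on the rowid of the newest order. The sketch below is that translation, not the T-SQL itself; the lowercase table and column names and item "C" (stock but no orders) are hypothetical. Item C shows why OUTER rather than CROSS APPLY matters: it is kept, with NULL order columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_date TEXT, item_code TEXT, qty INTEGER);
    CREATE TABLE warehouse (item_code TEXT, instock INTEGER, stockplace TEXT);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("2019-03-01", "A", 5), ("2019-03-02", "A", 3), ("2019-03-05", "A", 4),
    ("2019-03-03", "B", 3), ("2019-03-04", "B", 10),
])
conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?)", [
    ("A", 10, "VV"), ("A", 3, "LP"), ("A", 8, "XV"),
    ("B", 5, "VV"), ("B", 15, "LP"),
    ("C", 7, "VV"),  # hypothetical item with stock but no orders
])

# The LEFT JOIN on "rowid of the newest order for this item" plays the role of
# OUTER APPLY (SELECT TOP (1) ... ORDER BY order_date DESC).
rows = conn.execute("""
    SELECT wh.item_code, o.order_date, o.qty, wh.instock
    FROM (SELECT item_code, SUM(instock) AS instock
          FROM warehouse
          GROUP BY item_code) wh
    LEFT JOIN orders o
      ON o.rowid = (SELECT o2.rowid FROM orders o2
                    WHERE o2.item_code = wh.item_code
                    ORDER BY o2.order_date DESC
                    LIMIT 1)
    ORDER BY wh.item_code
""").fetchall()

for row in rows:
    print(row)
# ('A', '2019-03-05', 4, 21)
# ('B', '2019-03-04', 10, 20)
# ('C', None, None, 7)
```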
You can use row_number() with APPLY:
select t.*, wh.instock
from (select o.*, row_number () over (partition by item_code order by o.order_date desc) as seq
from [Order] o
) t cross apply
( select sum(wh.instock) as instock
from warehouse wh
where wh.item_code = t.item_code
) wh
where t.seq = 1;
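The ROW_NUMBER() part of this answer carries over to SQLite unchanged; since SQLite has no CROSS APPLY, the sketch below swaps in a correlated scalar subquery for the per-item stock total. The table names orders and warehouse are assumptions (the answer's [Order] would also work in SQLite, which accepts bracket quoting), and SQLite 3.25+ is needed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_date TEXT, item_code TEXT, qty INTEGER);
    CREATE TABLE warehouse (item_code TEXT, instock INTEGER, stockplace TEXT);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("2019-03-01", "A", 5), ("2019-03-02", "A", 3), ("2019-03-05", "A", 4),
    ("2019-03-03", "B", 3), ("2019-03-04", "B", 10),
])
conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?)", [
    ("A", 10, "VV"), ("A", 3, "LP"), ("A", 8, "XV"), ("B", 5, "VV"), ("B", 15, "LP"),
])

rows = conn.execute("""
    SELECT t.order_date, t.item_code, t.qty,
           -- correlated subquery standing in for CROSS APPLY (SELECT SUM(...))
           (SELECT SUM(wh.instock) FROM warehouse wh
            WHERE wh.item_code = t.item_code) AS instock
    FROM (SELECT o.*,
                 ROW_NUMBER() OVER (PARTITION BY item_code ORDER BY order_date DESC) AS seq
          FROM orders o) t
    WHERE t.seq = 1
    ORDER BY t.item_code
""").fetchall()

print(rows)  # [('2019-03-05', 'A', 4, 21), ('2019-03-04', 'B', 10, 20)]
```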
Your orders aren't identified by a unique ID, so if multiple orders for the same item fell on the same date, you would have no way of telling which one was the most recent order that day.
Anyway, assuming the data you posted is correct and ORDER_DATE + ITEM_CODE together form a unique key, you could use grouping and a CTE to get the desired output as follows.
;WITH MostRecentOrders (ITEM_CODE, ORDER_DATE)
AS (
SELECT
O.ITEM_CODE
, MAX(O.ORDER_DATE) AS ORDER_DATE
FROM
#Order O
GROUP BY ITEM_CODE
)
SELECT
O.ORDER_DATE
, O.ITEM_CODE
, O.QTY
, SUM(WH.INSTOCK) AS INSTOCK
FROM
#Warehouse WH
INNER JOIN #Order O ON O.ITEM_CODE = WH.ITEM_CODE
INNER JOIN MostRecentOrders MRO ON MRO.ITEM_CODE = O.ITEM_CODE
AND MRO.ORDER_DATE = O.ORDER_DATE
GROUP BY
O.ORDER_DATE
, O.ITEM_CODE
, O.QTY
ORDER BY O.ITEM_CODE
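The CTE version uses only grouping and joins, so it runs almost verbatim in SQLite once the temp tables #Order and #Warehouse are replaced by ordinary tables (hypothetically named orders and warehouse here). A hypothetical extra B order on 2019-03-04 illustrates the caveat about missing unique IDs: both orders on the latest date come back. If two such orders also had the same QTY, they would fall into one group and INSTOCK would be counted twice. QTY is added to the ORDER BY only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_date TEXT, item_code TEXT, qty INTEGER);
    CREATE TABLE warehouse (item_code TEXT, instock INTEGER, stockplace TEXT);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("2019-03-01", "A", 5), ("2019-03-02", "A", 3), ("2019-03-05", "A", 4),
    ("2019-03-03", "B", 3), ("2019-03-04", "B", 10),
    ("2019-03-04", "B", 2),  # hypothetical second B order on the same latest date
])
conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?)", [
    ("A", 10, "VV"), ("A", 3, "LP"), ("A", 8, "XV"), ("B", 5, "VV"), ("B", 15, "LP"),
])

rows = conn.execute("""
    WITH MostRecentOrders (ITEM_CODE, ORDER_DATE) AS (
        SELECT O.ITEM_CODE, MAX(O.ORDER_DATE) AS ORDER_DATE
        FROM orders O
        GROUP BY ITEM_CODE
    )
    SELECT O.ORDER_DATE, O.ITEM_CODE, O.QTY, SUM(WH.INSTOCK) AS INSTOCK
    FROM warehouse WH
    INNER JOIN orders O ON O.ITEM_CODE = WH.ITEM_CODE
    INNER JOIN MostRecentOrders MRO ON MRO.ITEM_CODE = O.ITEM_CODE
                                   AND MRO.ORDER_DATE = O.ORDER_DATE
    GROUP BY O.ORDER_DATE, O.ITEM_CODE, O.QTY
    ORDER BY O.ITEM_CODE, O.QTY
""").fetchall()

for row in rows:
    print(row)
# ('2019-03-05', 'A', 4, 21)
# ('2019-03-04', 'B', 2, 20)
# ('2019-03-04', 'B', 10, 20)
```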

How to find the date range between two orders from the Order table with respect to subsequent Customer_Ids?

For example, let's say we have customer_id = 1, and he has placed 3 orders in 2 years:
1st Order_date = '1st Jan 2015'
2nd Order_date = '5th June 2015'
3rd Order_date = '2nd Feb 2016'.
This has to be calculated yearly from the date of his first order.
Please let me know how to achieve this in HiveQL.
select ord_rnk_1.customer_id,ord_rnk_1.order_id as 1st_order, ord_rnk_2.order_id as 2nd_order, ord_rnk_1.order_date as 1st_order_date, ord_rnk_2.order_date as 2nd_order_date,
CASE
WHEN ord_rnk_2.order_id IS NULL THEN '1st purchase'
WHEN datediff(ord_rnk_2.order_date,ord_rnk_1.order_date) <=365 THEN 'repeat purchase'
ELSE '1st purchase'
end as customer_type
from
(
select customer_id,order_id, order_date from
(select customer_id,order_id, order_date,row_number() over(partition by customer_id order by order_date asc) rank from
(select distinct customer_id, order_id, to_date(from_unixtime(unix_timestamp(order_date, 'dd/MM/yyyy'))) as order_date
from table_t1
) abc
) order_rank where order_rank.rank=1
) ord_rnk_1
left join
(
select customer_id,order_id, order_date from
(select customer_id,order_id, order_date,row_number() over(partition by customer_id order by order_date asc) rank
from
(select distinct customer_id, order_id, to_date(from_unixtime(unix_timestamp(order_date, 'dd/MM/yyyy'))) as order_date
from table_t1
) abc
) order_rank where order_rank.rank=2
) ord_rnk_2
on ord_rnk_1.customer_id=ord_rnk_2.customer_id
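HiveQL can't be run here, so the following is a sketch of the same logic translated to SQLite via Python's sqlite3 module: rank each customer's orders, LEFT JOIN the 1st to the 2nd, and classify by the gap. It assumes dates are already stored as ISO yyyy-MM-dd strings, uses julianday() in place of Hive's datediff(), and needs SQLite 3.25+. Customer 1 follows the question's example; customers 2 and 3 are hypothetical. Note that the rank = 2 condition sits in the ON clause, so customers without a second order survive the LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_t1 (customer_id INTEGER, order_id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO table_t1 VALUES (?, ?, ?)", [
    (1, 101, "2015-01-01"), (1, 102, "2015-06-05"), (1, 103, "2016-02-02"),  # question's example
    (2, 201, "2015-03-10"),                                                   # hypothetical: one order only
    (3, 301, "2015-01-01"), (3, 302, "2016-03-01"),                           # hypothetical: gap > 365 days
])

rows = conn.execute("""
    WITH ranked AS (
        SELECT customer_id, order_id, order_date,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS rnk
        FROM table_t1
    )
    SELECT r1.customer_id,
           CASE
               WHEN r2.order_id IS NULL THEN '1st purchase'
               WHEN julianday(r2.order_date) - julianday(r1.order_date) <= 365 THEN 'repeat purchase'
               ELSE '1st purchase'
           END AS customer_type
    FROM ranked r1
    LEFT JOIN ranked r2
      ON r2.customer_id = r1.customer_id AND r2.rnk = 2
    WHERE r1.rnk = 1
    ORDER BY r1.customer_id
""").fetchall()

print(rows)  # [(1, 'repeat purchase'), (2, '1st purchase'), (3, '1st purchase')]
```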

How to filter data in SQL based on percentile

I have 2 tables. The first one contains customer information such as id, age, and name. The second table contains their id, the product they purchased, and the purchase_date (dates range from 2016 to 2018).
Table 1
-------
customer_id
customer_age
customer_name
Table2
------
customer_id
product
purchase_date
My desired result is a table containing the customer_name and product of customers who made a purchase in 2017 and are older than 75% of the customers who made a purchase in 2016.
Depending on your flavor of SQL, you can get quartiles using the more general ntile analytical function. This basically adds a new column to your query.
SELECT MIN(customer_age) as min_age FROM (
SELECT customer_id, customer_age, ntile(4) OVER(ORDER BY customer_age) AS q4 FROM table1
WHERE customer_id IN (
SELECT customer_id FROM table2 WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
) q
WHERE q4=4
This returns the lowest age of the 4th-quartile customers, which can be used in a subquery against the customers who made purchases in 2017.
The argument to ntile is how many buckets you want to divide into. In this case 75%+ equals 4th quartile, so 4 buckets is OK. The OVER() clause specifies what you want to sort by (customer_age in our case), and also lets us partition (group) the data if we want to, say, create multiple rankings for different years or countries.
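To make the bucketing concrete, here is a minimal sketch that runs the NTILE(4) query in SQLite (3.25+) through Python's sqlite3 module. The eight customers aged 20 to 55, all buying in 2016, are hypothetical sample data; table1 and table2 follow the question's names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (customer_id INTEGER, customer_age INTEGER, customer_name TEXT);
    CREATE TABLE table2 (customer_id INTEGER, product TEXT, purchase_date TEXT);
""")
# Hypothetical sample: eight customers who bought in 2016, aged 20..55.
ages = [20, 25, 30, 35, 40, 45, 50, 55]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(i, age, f"cust{i}") for i, age in enumerate(ages, start=1)])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [(i, "widget", "2016-06-01") for i in range(1, len(ages) + 1)])

# NTILE(4) splits the 8 ordered rows into 4 buckets of 2; bucket 4 is the oldest quarter.
(min_age,) = conn.execute("""
    SELECT MIN(customer_age) AS min_age FROM (
        SELECT customer_id, customer_age,
               NTILE(4) OVER (ORDER BY customer_age) AS q4
        FROM table1
        WHERE customer_id IN (
            SELECT customer_id FROM table2
            WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
    ) q
    WHERE q4 = 4
""").fetchone()

print(min_age)  # 50: bucket 4 holds ages 50 and 55
```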
Age is a horrible field to include in a database. Every day it changes. You should have date-of-birth or something similar.
To get the age at the 75th percentile among customers who purchased in 2016, there are several possibilities. I usually go for row_number() and count(*):
select min(customer_age)
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt;
Then, to use this for a query for 2017:
with a2016 as (
select min(customer_age) as customer_age
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt
)
select c.*, cp.product_id
from customers c join
customer_products cp
on cp.customer_id = c.customer_id and
cp.purchase_date >= '2017-01-01' and
cp.purchase_date < '2018-01-01' join
a2016 a
on c.customer_age >= a.customer_age;
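The full 2017 query can be checked in SQLite (3.25+) with a small sketch like the one below. The customers and customer_products tables follow the answer's naming; all rows are hypothetical. With eight 2016 buyers, seqnum >= 0.75 * cnt keeps the 6th, 7th and 8th ages, so the cut-off is 45; NTILE(4) would put only the top two of eight rows in its last bucket, so the two approaches can differ slightly at the boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_age INTEGER, customer_name TEXT);
    CREATE TABLE customer_products (customer_id INTEGER, product_id TEXT, purchase_date TEXT);
""")
# Hypothetical data: customers 1-8 bought in 2016; customers 7, 9 and 10 bought in 2017.
people = [(1, 20), (2, 25), (3, 30), (4, 35), (5, 40), (6, 45), (7, 50), (8, 55), (9, 60), (10, 30)]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(cid, age, f"cust{cid}") for cid, age in people])
purchases = [(cid, "p2016", "2016-05-01") for cid in range(1, 9)]
purchases += [(7, "p7", "2017-02-01"), (9, "p9", "2017-03-01"), (10, "p10", "2017-04-01")]
conn.executemany("INSERT INTO customer_products VALUES (?, ?, ?)", purchases)

rows = conn.execute("""
    WITH a2016 AS (
        SELECT MIN(customer_age) AS customer_age
        FROM (SELECT c.*,
                     ROW_NUMBER() OVER (ORDER BY customer_age) AS seqnum,
                     COUNT(*) OVER () AS cnt
              FROM customers c
              WHERE EXISTS (SELECT 1 FROM customer_products cp
                            WHERE cp.customer_id = c.customer_id
                              AND cp.purchase_date >= '2016-01-01'
                              AND cp.purchase_date < '2017-01-01')) c
        WHERE seqnum >= 0.75 * cnt
    )
    SELECT c.customer_name, cp.product_id
    FROM customers c
    JOIN customer_products cp
      ON cp.customer_id = c.customer_id
     AND cp.purchase_date >= '2017-01-01'
     AND cp.purchase_date < '2018-01-01'
    JOIN a2016 a
      ON c.customer_age >= a.customer_age
    ORDER BY c.customer_id
""").fetchall()

print(rows)  # [('cust7', 'p7'), ('cust9', 'p9')]
```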

How to do a group by without having to pass all the columns from the select?

I have the following SELECT, whose goal is to select all customers who have had no sales since day X, and also to return the date and number of their last sale:
select s.customerId, s.saleId, max (s.date) from sales s
group by s.customerId, s.saleId
having max(s.date) <= '05-16-2013'
This returns the following:
19 | 300 | 26/09/2005
19 | 356 | 29/09/2005
27 | 842 | 10/05/2012
In other words, the first 2 lines are from the same customer (id 19). I want only one record for each client: the record with the max date, in this case the second record in this list.
By that logic, I should remove s.saleId from the GROUP BY clause, but if I do, I of course get the error:
Invalid expression in the select list (not contained in either an
aggregate function or the GROUP BY clause)
I'm using Firebird 1.5
How can I do this?
GROUP BY summarizes data by aggregating a group of rows, returning one row per group. You're using the aggregate function max(), which will return the maximum value from one column for a group of rows.
Let's look at some data. I renamed the column you called "date" to saledate.
create table sales (
customerId integer not null,
saleId integer not null,
saledate date not null
);
insert into sales values
(1, 10, '2013-05-13'),
(1, 11, '2013-05-14'),
(1, 12, '2013-05-14'),
(1, 13, '2013-05-17'),
(2, 20, '2013-05-11'),
(2, 21, '2013-05-16'),
(2, 31, '2013-05-17'),
(2, 32, '2013-03-01'),
(3, 33, '2013-05-14'),
(3, 35, '2013-05-14');
You said
In other words, the first 2 lines are from the same customer (id 19). I want only one record for each client: the record with the max date, in this case the second record in this list.
select s.customerId, max (s.saledate)
from sales s
where s.saledate <= '2013-05-16'
group by s.customerId
order by customerId;
customerId max
--
1 2013-05-14
2 2013-05-16
3 2013-05-14
What does that table mean? It means that the latest date on or before May 16 on which customer "1" bought something was May 14; the latest date on or before May 16 on which customer "2" bought something was May 16. If you use this derived table in joins, it will return predictable results with consistent meaning.
Now let's look at a slightly different query. MySQL permits this syntax (when ONLY_FULL_GROUP_BY is disabled, which was the default before MySQL 5.7.5) and returns the result set below.
select s.customerId, s.saleId, max(s.saledate) max_sale
from sales s
where s.saledate <= '2013-05-16'
group by s.customerId
order by customerId;
customerId saleId max_sale
--
1 10 2013-05-14
2 20 2013-05-16
3 33 2013-05-14
The sale with ID "10" didn't happen on May 14; it happened on May 13. This query has produced a falsehood. Joining this derived table with the table of sales transactions will compound the error.
That's why Firebird correctly raises an error. The solution is to drop saleId from the SELECT clause.
Now, having said all that, you can find the customers who have had no sales since May 16 like this.
select distinct customerId from sales
where customerID not in
(select customerId
from sales
where saledate >= '2013-05-16')
And you can get the right customerId and the "right" saleId like this. (I say "right" saleId, because there could be more than one on the day in question. I just chose the max.)
select sales.customerId, sales.saledate, max(saleId)
from sales
inner join (select customerId, max(saledate) max_date
from sales
where saledate < '2013-05-16'
group by customerId) max_dates
on sales.customerId = max_dates.customerId
and sales.saledate = max_dates.max_date
inner join (select distinct customerId
from sales
where customerID not in
(select customerId
from sales
where saledate >= '2013-05-16')) no_sales
on sales.customerId = no_sales.customerId
group by sales.customerId, sales.saledate
Personally, I find common table expressions make it easier for me to read SQL statements like that without getting lost in the SELECTs.
with no_sales as (
select distinct customerId
from sales
where customerID not in
(select customerId
from sales
where saledate >= '2013-05-16')
),
max_dates as (
select customerId, max(saledate) max_date
from sales
where saledate < '2013-05-16'
group by customerId
)
select sales.customerId, sales.saledate, max(saleId)
from sales
inner join max_dates
on sales.customerId = max_dates.customerId
and sales.saledate = max_dates.max_date
inner join no_sales
on sales.customerId = no_sales.customerId
group by sales.customerId, sales.saledate
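The CTE version can be checked against the answer's own sample data with Python's sqlite3 module; SQLite accepts these statements unchanged. Only customer 3 has no sale on or after May 16, and of its two sales on May 14 the query reports the larger saleId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customerId INTEGER NOT NULL, saleId INTEGER NOT NULL, saledate DATE NOT NULL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, 10, "2013-05-13"), (1, 11, "2013-05-14"), (1, 12, "2013-05-14"), (1, 13, "2013-05-17"),
    (2, 20, "2013-05-11"), (2, 21, "2013-05-16"), (2, 31, "2013-05-17"), (2, 32, "2013-03-01"),
    (3, 33, "2013-05-14"), (3, 35, "2013-05-14"),
])

rows = conn.execute("""
    WITH no_sales AS (
        SELECT DISTINCT customerId FROM sales
        WHERE customerId NOT IN (SELECT customerId FROM sales
                                 WHERE saledate >= '2013-05-16')
    ),
    max_dates AS (
        SELECT customerId, MAX(saledate) AS max_date
        FROM sales
        WHERE saledate < '2013-05-16'
        GROUP BY customerId
    )
    SELECT sales.customerId, sales.saledate, MAX(saleId)
    FROM sales
    INNER JOIN max_dates ON sales.customerId = max_dates.customerId
                        AND sales.saledate = max_dates.max_date
    INNER JOIN no_sales ON sales.customerId = no_sales.customerId
    GROUP BY sales.customerId, sales.saledate
""").fetchall()

print(rows)  # [(3, '2013-05-14', 35)]
```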
Then you can use the following query.
EDIT: changed after a comment by likeitlikeit so that only one row per customerId is returned, even when a customer has multiple saleIds that satisfy the condition:
select x.customerID, max(x.saleID), max(x.x_date) from (
select s.customerId, s.saleId, max (s.date) x_date from sales s
group by s.customerId, s.saleId
having max(s.date) <= '05-16-2013'
and max(s.date) = ( select max(s1.date)
from sales s1
where s1.customerId = s.customerId)) x
group by x.customerID
You can try taking MAX(s.saleId) and removing saleId from the GROUP BY clause.
A subquery should do the job. I can't test it right now, but it seems OK:
SELECT s.customerId, s.saleId, subq.maxdate
FROM sales AS s
INNER JOIN (SELECT customerId, MAX(date) AS maxdate
FROM sales
GROUP BY customerId
HAVING MAX(date) <= '05-16-2013'
) AS subq
ON s.customerId = subq.customerId AND s.date = subq.maxdate
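Joining back on the maximum date returns every sale made on that date, so this approach can still produce more than one row per customer. The sketch below runs the subquery in SQLite with the column renamed to saledate and ISO date literals (both assumptions, so that string comparison orders dates correctly), using the sample rows from the earlier answer: customer 3 comes back twice, once per sale on May 14.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customerId INTEGER NOT NULL, saleId INTEGER NOT NULL, saledate DATE NOT NULL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, 10, "2013-05-13"), (1, 11, "2013-05-14"), (1, 12, "2013-05-14"), (1, 13, "2013-05-17"),
    (2, 20, "2013-05-11"), (2, 21, "2013-05-16"), (2, 31, "2013-05-17"), (2, 32, "2013-03-01"),
    (3, 33, "2013-05-14"), (3, 35, "2013-05-14"),
])

rows = conn.execute("""
    SELECT s.customerId, s.saleId, subq.maxdate
    FROM sales AS s
    INNER JOIN (SELECT customerId, MAX(saledate) AS maxdate
                FROM sales
                GROUP BY customerId
                HAVING MAX(saledate) <= '2013-05-16') AS subq
      ON s.customerId = subq.customerId AND s.saledate = subq.maxdate
    ORDER BY s.saleId
""").fetchall()

print(rows)  # [(3, 33, '2013-05-14'), (3, 35, '2013-05-14')]
```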

SQL: Need help with query construction

I am relatively new to SQL and I need some help with some basic query construction.
Problem: To retrieve the number of orders and the customer id from a table based on a set of parameters.
I want to write a query to find the number of orders for each customer (column: CustomerId), along with the CustomerId, where the number of orders is greater than or equal to 10 and the status of the order is Active. I also want to know the first transaction date of an order belonging to each CustomerId.
Table Description:
product_orders
Orderid CustomerId Transaction_date Status
------- ---------- ---------------- -------
1 23 2-2-10 Active
2 22 2-3-10 Active
3 23 2-3-10 Deleted
4 23 2-3-10 Active
Query that I have written:
select count(*), customerid
from product_orders
where status = 'Active'
GROUP BY customerid
ORDER BY customerid;
The above statement gives me the count of all active orders for each customer ID, but does not enforce the condition of at least 10 orders.
I don't know how to display the first transaction date along with the orders for a customer ID (the status could be Active or Deleted, it doesn't matter).
The ideal result should look like:
Total Orders CustomerID Transaction Date (the first transaction date)
------------ ---------- ----------------
11 23 1-2-10
Thanks in advance. I hope you guys would be kind enough to stop by and help me out.
Cheers,
Leonidas
SELECT
COUNT(*) AS [Total Orders],
CustomerID,
MIN(Transaction_date) AS [Transaction Date]
FROM product_orders
WHERE product_orders.Status = 'Active'
GROUP BY
CustomerId
HAVING COUNT(*) >= 10
HAVING lets you filter on aggregates such as COUNT(), and MIN() returns the first date.
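A quick way to check the HAVING filter is to run it in SQLite through Python's sqlite3 module. All rows below are hypothetical: customer 23 has 11 active orders (plus one deleted), customer 24 exactly 10, and customer 22 only 3. Comparing >= 10 with > 10 shows that a customer with exactly 10 orders is kept only by >=, which is what "at least 10" requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE product_orders (
    orderid INTEGER PRIMARY KEY, customerid INTEGER, transaction_date TEXT, status TEXT)""")
rows = [(23, f"2010-02-{d:02d}", "Active") for d in range(2, 13)]   # 11 active orders
rows += [(23, "2010-01-15", "Deleted")]                             # 1 deleted order
rows += [(24, f"2010-03-{d:02d}", "Active") for d in range(1, 11)]  # exactly 10 active
rows += [(22, f"2010-02-{d:02d}", "Active") for d in range(3, 6)]   # 3 active
conn.executemany(
    "INSERT INTO product_orders (customerid, transaction_date, status) VALUES (?, ?, ?)", rows)

query = """
    SELECT COUNT(*) AS total_orders, customerid, MIN(transaction_date) AS first_active
    FROM product_orders
    WHERE status = 'Active'
    GROUP BY customerid
    HAVING COUNT(*) {op} 10
    ORDER BY customerid
"""
at_least_10 = conn.execute(query.format(op=">=")).fetchall()
more_than_10 = conn.execute(query.format(op=">")).fetchall()

print(at_least_10)   # [(11, 23, '2010-02-02'), (10, 24, '2010-03-01')]
print(more_than_10)  # [(11, 23, '2010-02-02')]
```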
select
count(*),
customerid,
MIN(Transaction_date)
from product_orders
where status = 'Active'
GROUP BY customerid
HAVING COUNT(*) >= 10
ORDER BY customerid
If you want the earliest date irrespective of status you can sub-query for it
select
count(*),
customerid,
(SELECT min(Transaction_date) FROM product_orders WHERE product_orders.customerid = p.customerid) AS FirstDate
from product_orders p
where status = 'Active'
GROUP BY customerid
HAVING COUNT(*) >= 10
ORDER BY customerid
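The correlated subquery ignores the status filter, so a deleted order that predates all active ones becomes the first date. The following sketch checks this in SQLite with hypothetical rows: customer 23's earliest order, on 2010-01-15, is a deleted one, and it is reported even though only active orders are counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE product_orders (
    orderid INTEGER PRIMARY KEY, customerid INTEGER, transaction_date TEXT, status TEXT)""")
rows = [(23, f"2010-02-{d:02d}", "Active") for d in range(2, 13)]   # 11 active orders
rows += [(23, "2010-01-15", "Deleted")]                             # earlier, but deleted
rows += [(24, f"2010-03-{d:02d}", "Active") for d in range(1, 11)]  # exactly 10 active
conn.executemany(
    "INSERT INTO product_orders (customerid, transaction_date, status) VALUES (?, ?, ?)", rows)

result = conn.execute("""
    SELECT COUNT(*), customerid,
           -- inner product_orders is unaliased, so this looks at all statuses
           (SELECT MIN(transaction_date) FROM product_orders
            WHERE product_orders.customerid = p.customerid) AS FirstDate
    FROM product_orders p
    WHERE status = 'Active'
    GROUP BY customerid
    HAVING COUNT(*) >= 10
    ORDER BY customerid
""").fetchall()

print(result)  # [(11, 23, '2010-01-15'), (10, 24, '2010-03-01')]
```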
This query should give you the total active orders for each customer that has 10 or more active orders. It will also display the first active order date.
Select Count(OrderId) as TotalOrders,
CustomerId,
Min(Transaction_Date) as FirstActiveOrder
From Product_Orders
Where [Status] = 'Active'
Group By CustomerId
Having Count(OrderId) >= 10
select count(*), customerid, MIN(Transaction_date) from product_orders
where status = 'Active'
GROUP BY customerid having count(*) >= 10
ORDER BY customerid