Query on MAX on date column, and COUNT of another column - sql

I wrote the following query with CTEs, but I was wondering whether there is a simpler way to write it, maybe with subqueries. I'm retrieving everything from one table, SALES, using three columns: AgentID, SaleDate, and OrderID.
WITH RECENT_SALE AS (
SELECT AGENTID,
SALEDATE,
ROW_NUMBER() OVER (PARTITION BY AGENTID ORDER BY SALEDATE DESC) AS RN
FROM SALES
)
,
COUNT_SALE AS (
SELECT AGENTID,
COUNT(ORDERID) AS COUNTORDERS
FROM SALES
GROUP BY AGENTID
)
SELECT RECENT_SALE.AGENTID,
SALEDATE,
COUNTORDERS
FROM RECENT_SALE
INNER JOIN COUNT_SALE ON RECENT_SALE.AGENTID = COUNT_SALE.AGENTID
WHERE RECENT_SALE.RN = 1;

It looks to me like you're just trying to get the total number of sales per agent as well as the date of his or her most recent sale? If I understand your structure correctly (and I may not), then it seems pretty straightforward. I'm guessing orderid is the primary key of SALES?
SELECT agentid, MAX(saledate) AS saledate -- Most recent sale date
, COUNT(orderid) AS countsales -- total sales
FROM sales
GROUP BY agentid;
There does not seem to be any need for CTEs or subqueries here.
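As a minimal sketch of the single GROUP BY query above, here it is run against an in-memory SQLite database from Python. The sample rows and the sqlite3 harness are assumptions made for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (orderid INTEGER PRIMARY KEY, agentid INTEGER, saledate TEXT)"
)
conn.executemany(
    "INSERT INTO sales (orderid, agentid, saledate) VALUES (?, ?, ?)",
    [
        (1, 10, "2020-01-05"),
        (2, 10, "2020-02-11"),
        (3, 10, "2020-01-20"),
        (4, 20, "2020-03-01"),
    ],
)

# One row per agent: most recent sale date and total number of orders.
rows = conn.execute(
    """
    SELECT agentid,
           MAX(saledate)  AS saledate,
           COUNT(orderid) AS countsales
    FROM sales
    GROUP BY agentid
    ORDER BY agentid
    """
).fetchall()

for row in rows:
    print(row)
```

Dates are stored as ISO 'YYYY-MM-DD' text here, so MAX compares them correctly as strings.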

Try this, which gives a running count of orders per agent by sale date:
SELECT
saledate,
AGENTID,
sum(count(orderid)) over(partition by AGENTID order by saledate) as countorders
FROM SALES
group by
saledate,
AGENTID

Related

Trouble getting SQL Server subquery to pick desired results

I am given a database to use in SQL server.
The tables are:
Price (prodID, from, price)
Product (prodID, name, quantity)
PO (prodID, orderID, amount)
Order (orderID, date, address, status, trackingNumber, custID, shipID)
Shipping (shipID, company, time, price)
Customer (custID, name)
Address (addrID, custID, address)
I need to determine the ID and current price of each product.
The from attribute in the Price table holds the dates on which the prices were updated. Each ID in the table has multiple prices and dates associated with it, but no date is common to all of the IDs. The dates are in 'YYYY-MM-DD' format and range from 2018 to 2019-12-31.
My current query looks like:
select distinct p.prodID, p.price
from Price as p
where p.[from] >= '2019-12-23' and p.[from] in (select [from]
from Price
group by [from]
having max([from]) <= '2019-12-31')
order by p.prodID;
which returns a table with multiple prices for some of the IDs and also excludes other IDs altogether.
I was told that I needed a subquery to perform this.
I believe that I may be being too specific in my query to produce the desired results.
My main goal is to fix my current query to select one of each prodID and price from the most recent from date.
One option uses window functions:
select *
from (
select p.*, row_number() over(partition by p.prodid order by p.[from] desc) rn
from price p
where p.[from] <= convert(date, getdate())
) t
where rn = 1
This returns the latest row for each prodid where from is not greater than the current date.
As an alternative, you could also use with ties:
select top (1) with ties p.*
from price p
where p.[from] <= convert(date, getdate())
order by row_number() over(partition by p.prodid order by p.[from] desc)
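To see the row_number() approach on concrete data, here is a small sketch run against an in-memory SQLite database from Python (SQLite 3.25 or later is assumed for window functions). The sample prices are made up, and a fixed cutoff date stands in for getdate() so the result is reproducible.

```python
import sqlite3

# Hypothetical sample rows; the Price(prodID, from, price) layout is from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Price (prodID INTEGER, [from] TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Price (prodID, [from], price) VALUES (?, ?, ?)",
    [
        (1, "2019-01-01", 5.0),
        (1, "2019-06-01", 6.0),
        (1, "2020-06-01", 7.0),  # takes effect after the cutoff, so it is ignored
        (2, "2018-03-15", 2.5),
    ],
)

cutoff = "2019-12-31"  # assumed stand-in for the current date

# Rank each product's prices from newest to oldest and keep the newest one.
rows = conn.execute(
    """
    SELECT prodID, price
    FROM (
        SELECT p.prodID, p.price,
               ROW_NUMBER() OVER (PARTITION BY p.prodID ORDER BY p.[from] DESC) AS rn
        FROM Price AS p
        WHERE p.[from] <= ?
    ) AS t
    WHERE rn = 1
    ORDER BY prodID
    """,
    (cutoff,),
).fetchall()

for row in rows:
    print(row)
```

Every product with at least one price on or before the cutoff appears exactly once, which addresses both symptoms described in the question (duplicate prices and missing IDs).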

SQL: Difference between consecutive rows

I have a table with 3 columns: order id, member id, order date.
I need to pull the distribution of orders broken down by the number of days between 2 consecutive orders for each member id.
What I have is this:
SELECT
a1.member_id,
count(distinct a1.order_id) as num_orders,
a1.order_date,
DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id+1;
It's not helping me completely as the output I need is:
You can use lag() to get the date of the previous order by the same customer:
select o.*,
datediff(
order_date,
lag(order_date) over(partition by member_id order by order_date, order_id)
) days_diff
from orders o
When there are two rows for the same date, the smallest order_id is considered first. Also note that I fixed your datediff() syntax: in Hive, the function just takes two dates, and no unit.
I just don't get the logic you want to compute num_orders.
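A small sketch of the lag() idea, run from Python against an in-memory SQLite database (SQLite 3.25 or later is assumed for window functions). SQLite has no datediff(), so the day difference is computed with julianday() instead; the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows; the three columns follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, member_id INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders (order_id, member_id, order_date) VALUES (?, ?, ?)",
    [
        (1, 1, "2020-01-01"),
        (2, 1, "2020-01-10"),
        (3, 1, "2020-01-10"),  # same day as order 2; order_id breaks the tie
        (4, 2, "2020-02-01"),
        (5, 2, "2020-03-01"),
    ],
)

# Days between each order and the same member's previous order (NULL for the first).
rows = conn.execute(
    """
    SELECT order_id,
           member_id,
           CAST(julianday(order_date)
                - julianday(LAG(order_date) OVER (
                      PARTITION BY member_id ORDER BY order_date, order_id))
                AS INTEGER) AS days_diff
    FROM orders
    ORDER BY member_id, order_date, order_id
    """
).fetchall()

for row in rows:
    print(row)
```

Grouping the result by days_diff and counting rows would then give the distribution the question asks about.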
Maybe something like this:
SELECT
a1.member_id,
a1.order_id,
a1.order_date,
DATEDIFF(DAY, a2.order_date, a1.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id
and a2.order_date < a1.order_date
where not exists (
select 1
from orders as intermediate_order
where intermediate_order.member_id = a1.member_id
and intermediate_order.order_date < a1.order_date
and intermediate_order.order_date > a2.order_date);

Difference between multiple dates

I am working in a database with multiple orders from multiple suppliers. I would like to know the difference in days between order 1 and order 2, order 2 and order 3, order 3 and order 4, and so on, for each supplier separately. I need this to compute the standard deviation of the days between orders for each supplier.
Hopefully someone can help..
What you describe is lag() with aggregation:
select supplier,
stddev(orderdate - prev_orderdate) as std_orderdate
from (select t.*,
lag(orderdate) over (partition by supplier order by orderdate) as prev_orderdate
from t
) t
group by supplier;
You would typically use the window function lag() and date arithmetic.
Assuming the following data structure for table orders:
order_id int primary key
supplier_id int
order_date date
You would go:
select
o.*,
order_date
- lag(order_date) over(partition by supplier_id order by order_date) date_diff
from orders o
Which gives you, for each order, the difference in days from the previous order of the same supplier (or null if this is the first order of the supplier).
You can then compute the standard deviation with aggregation:
select supplier_id, stddev(date_diff)
from (
select
o.*,
order_date
- lag(order_date) over(partition by supplier_id order by order_date) date_diff
from orders o
) x
group by supplier_id
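Here is a hedged sketch of the same two steps on made-up data, run from Python against an in-memory SQLite database (SQLite 3.25 or later is assumed for lag()). SQLite has no built-in stddev(), so the gaps are computed in SQL and the sample standard deviation is taken with Python's statistics.stdev. Whether your database's stddev() is the sample or the population variant is something to check in its documentation.

```python
import sqlite3
import statistics
from collections import defaultdict

# Hypothetical sample rows using the orders(order_id, supplier_id, order_date) layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, supplier_id INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders (order_id, supplier_id, order_date) VALUES (?, ?, ?)",
    [
        (1, 1, "2021-01-01"),
        (2, 1, "2021-01-11"),
        (3, 1, "2021-01-31"),
        (4, 2, "2021-02-01"),
        (5, 2, "2021-02-04"),
        (6, 2, "2021-02-07"),
    ],
)

# Step 1: days between each order and the supplier's previous order.
query = """
    SELECT supplier_id, date_diff
    FROM (
        SELECT supplier_id,
               CAST(julianday(order_date)
                    - julianday(LAG(order_date) OVER (
                          PARTITION BY supplier_id ORDER BY order_date))
                    AS INTEGER) AS date_diff
        FROM orders
    ) AS x
    WHERE date_diff IS NOT NULL
"""
gaps = defaultdict(list)
for supplier_id, date_diff in conn.execute(query):
    gaps[supplier_id].append(date_diff)

# Step 2: sample standard deviation per supplier (needs at least two gaps).
std_by_supplier = {
    supplier: statistics.stdev(diffs)
    for supplier, diffs in gaps.items()
    if len(diffs) >= 2
}
print(std_by_supplier)
```

Supplier 1 has gaps of 10 and 20 days, supplier 2 has two gaps of 3 days, so the second supplier's deviation is zero.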

How do you find the average datediff by customer?

I have the following data:
CustomerID OrderDate
1 2011-02-16
1 2011-04-20
2 2011-04-25
2 2011-10-24
2 2011-11-14
How do I find the average DATEDIFF for each customer? The results I want would be the CustomerID and the average difference of dates between their orders. I really appreciate the help. This has had me stuck for months. Thank you in advance.
Additional notes:
I cannot use the lag function because of the server I am using.
In SQL Server 2012 you can partition the lag, changing Binaya Regmi's query to:
with cte as (
SELECT CustomerID, OrderDate,
LAG(OrderDate) OVER (PARTITION BY CustomerID
ORDER BY CustomerID, OrderDate) AS PrevDate
FROM T)
Select customerid, avg(datediff(d, prevdate, orderdate)) average
From cte
Group By customerid
A non-optimized query that uses mostly standard SQL (as the requester has not stated his RDBMS) is:
SELECT customerid, avg(datediff(d,prevdate, OrderDate)) average
FROM (SELECT ext.customerid, ext.OrderDate, max(prevdate) prevdate
FROM orders ext
INNER JOIN (SELECT customerid, orderdate prevdate
FROM orders) sub
ON ext.customerid = sub.customerid
AND ext.OrderDate > sub.prevdate
GROUP BY ext.customerid, orderdate) a
GROUP BY customerid
Assuming T is your table and you want average difference of dates in day, following is the code in SQL Server 2012:
with cte as (
SELECT CustomerID, OrderDate,
LAG(OrderDate) OVER (PARTITION BY CustomerID
ORDER BY CustomerID, OrderDate) AS PrevDate
FROM T)
select customerid, avg(datediff(d, prevdate, orderdate )) as AvgDay
from cte group by customerid;
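Since the asker cannot use LAG, here is a sketch of the self-join variant applied to the question's own five rows, run from Python against an in-memory SQLite database. The table name orders is taken from the self-join answer; julianday() replaces datediff() because SQLite has no datediff().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customerid INTEGER, orderdate TEXT)")
# The sample data from the question.
conn.executemany(
    "INSERT INTO orders (customerid, orderdate) VALUES (?, ?)",
    [
        (1, "2011-02-16"),
        (1, "2011-04-20"),
        (2, "2011-04-25"),
        (2, "2011-10-24"),
        (2, "2011-11-14"),
    ],
)

# For each order, the previous order date is the latest earlier date of the
# same customer; the outer query averages the gaps in days.
rows = conn.execute(
    """
    SELECT customerid,
           AVG(julianday(orderdate) - julianday(prevdate)) AS average
    FROM (
        SELECT ext.customerid, ext.orderdate, MAX(sub.orderdate) AS prevdate
        FROM orders AS ext
        INNER JOIN orders AS sub
                ON ext.customerid = sub.customerid
               AND ext.orderdate > sub.orderdate
        GROUP BY ext.customerid, ext.orderdate
    ) AS a
    GROUP BY customerid
    ORDER BY customerid
    """
).fetchall()

for row in rows:
    print(row)
```

Customer 1 has a single gap of 63 days; customer 2 has gaps of 182 and 21 days, which average to 101.5. A customer with only one order produces no row, because the inner join finds no earlier order.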

Can I limit the amount of rows to be used for a group in a GROUP BY statement

I'm having an odd problem.
I have a table with the columns product_id, sales and day.
Not all products have sales every day. I'd like to get the average number of sales that each product had in the last 10 days where it had sales.
Usually I'd get the average like this:
SELECT product_id, AVG(sales)
FROM table
GROUP BY product_id
Is there a way to limit the amount of rows to be taken into consideration for each product?
I'm afraid it's not possible but I wanted to check if someone has an idea
Update to clarify:
Product may be sold on days 1,3,5,10,15,17,20.
Since I don't want the average of all days but only the average of the days where the product actually got sold, doing something like
SELECT product_id, AVG(sales)
FROM table
WHERE day > '01/01/2009'
GROUP BY product_id
won't work
If you want the last 10 calendar days up to each product's most recent sale:
SELECT t.product_id, AVG(t.sales)
FROM table t
JOIN (
SELECT product_id, MAX(sales_date) as max_sales_date
FROM table
GROUP BY product_id
) t_max ON t.product_id = t_max.product_id
AND DATEDIFF(day, t.sales_date, t_max.max_sales_date) < 10
GROUP BY t.product_id;
The date difference function is SQL Server specific; you'd have to replace it with your server's syntax for date difference functions.
To get the last 10 days when the product had any sale:
SELECT product_id, AVG(sales)
FROM (
SELECT product_id, sales, DENSE_RANK() OVER
(PARTITION BY product_id ORDER BY sales_date DESC) AS rn
FROM Table
) As t_rn
WHERE rn <= 10
GROUP BY product_id;
This assumes sales_date is a date, not a datetime. You'd have to extract the date part if the field is a datetime.
And finally, a version free of window functions:
SELECT product_id, AVG(sales)
FROM Table t
WHERE sales_date IN (
SELECT TOP(10) sales_date
FROM Table s
WHERE t.product_id = s.product_id
ORDER BY sales_date DESC)
GROUP BY product_id;
Again, sales_date is assumed to be a date, not a datetime. Use other limiting syntax if TOP is not supported by your server.
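A compact sketch of the DENSE_RANK variant on made-up data, run from Python against an in-memory SQLite database (SQLite 3.25 or later is assumed for window functions). The table name product_sales is hypothetical, and the window is shrunk from 10 sale days to 2 so that the effect is visible on a handful of rows.

```python
import sqlite3

# Hypothetical table name and rows; column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_sales (product_id INTEGER, sales INTEGER, sales_date TEXT)"
)
conn.executemany(
    "INSERT INTO product_sales (product_id, sales, sales_date) VALUES (?, ?, ?)",
    [
        (1, 10, "2009-01-01"),
        (1, 20, "2009-01-03"),
        (1, 30, "2009-01-05"),
        (2, 4, "2009-01-02"),
    ],
)

last_n_sale_days = 2  # the question uses 10; 2 keeps the example small

# Rank each product's sale days from newest to oldest, keep the newest N,
# then average the sales over those days only.
rows = conn.execute(
    """
    SELECT product_id, AVG(sales)
    FROM (
        SELECT product_id, sales,
               DENSE_RANK() OVER (
                   PARTITION BY product_id ORDER BY sales_date DESC) AS rn
        FROM product_sales
    ) AS t_rn
    WHERE rn <= ?
    GROUP BY product_id
    ORDER BY product_id
    """,
    (last_n_sale_days,),
).fetchall()

for row in rows:
    print(row)
```

Product 1 averages only its two most recent sale days (20 and 30), while product 2, which has a single sale day, is averaged over that one day.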
Give this a whirl. The sub-query selects the last ten days of a product where there was a sale, the outer query does the aggregation.
SELECT t1.product_id, SUM(t1.sales) / COUNT(*)
FROM table t1
WHERE t1.day IN (
SELECT TOP 10 t2.day
FROM table t2
WHERE t2.product_id = t1.product_id
ORDER BY t2.day DESC
)
GROUP BY t1.product_id
BTW: This approach uses a correlated subquery, which may not be very performant, but it should work in theory.
I'm not sure I understand correctly, but if you'd like to get the average sales for the last 10 days for your products, you can do as follows:
SELECT Product_Id, Sum(Sales)/Count(*) FROM (SELECT Product_Id, Sales FROM Table WHERE SaleDate >= #Date) t GROUP BY Product_Id HAVING Count(*) > 0
Or you can use the AVG aggregate function, which is easier:
SELECT Product_Id, AVG(Sales) FROM (SELECT Product_Id, Sales FROM Table WHERE SaleDate >= #Date) t GROUP BY Product_Id
Updated
Now I get what you meant. As far as I know, it is not possible to do this in one query. It could be possible if we could do something like this (Northwind database):
select a.CustomerId,count(a.OrderId)
from Orders a INNER JOIN(SELECT CustomerId,OrderDate FROM Orders Order By OrderDate) AS b ON a.CustomerId=b.CustomerId GROUP BY a.CustomerId Having count(a.OrderId)<10
but you can't use ORDER BY in subqueries unless you use TOP, which is not suitable for this case. But maybe you can do it as follows:
SELECT ProductId, Sales INTO #temp FROM table Order By Day
select a.ProductId,Sum(a.Sales) /Count(a.Sales)
from table a INNER JOIN #temp AS b ON a.ProductId=b.ProductId GROUP BY a.ProductId Having count(a.Sales)<=10
If this is a table of sales transactions, then there should not be any rows in it for days on which there were no sales. I.e., if ProductId 21 had no sales on 1 June, then this table should not have any rows with productId = 21 and day = '1 June'. Therefore you should not have to filter anything out; there should not be anything to filter out.
Select ProductId, Avg(Sales) AvgSales
From Table
Group By ProductId
should work fine. So if it's not, then you have not explained the problem completely or accurately.
Also, in your question, you show Avg(Sales) in the example SQL query, but in the text you mention "average number of sales that each product ... ". Do you want the average sales amount, or the average count of sales transactions? And do you want this average by product alone (i.e., one output value reported for each product), or do you want the average per product per day?
If you want the average per product alone, is it for just those sales in the ten days prior to now, or in the ten days prior to the date of the last sale for each product?
If the latter, then
Select ProductId, Avg(Sales) AvgSales
From Table T
Where day > (Select Max(Day) - 10
From Table
Where ProductId = T.ProductID)
Group By ProductId
If you want the average per product alone, for just those sales in the ten days with sales prior to the date of the last sale for each product, then
Select ProductId, Avg(Sales) AvgSales
From Table T
Where (Select Count(Distinct day) From Table
Where ProductId = T.ProductID
And Day > T.Day) < 10
Group By ProductId