Difference between multiple dates - sql

I am working in a database that holds multiple orders from multiple suppliers. I would like to know the difference in days between order 1 and order 2, order 2 and order 3, order 3 and order 4, and so on, computed separately for each supplier. I need this to calculate the standard deviation of the days between orders for each supplier.
Hopefully someone can help.

What you describe is lag() with aggregation:
select supplier,
stddev(orderdate - prev_orderdate) as std_orderdate
from (select t.*,
lag(orderdate) over (partition by supplier order by orderdate) as prev_orderdate
from t
) t
group by supplier;

You would typically use the window function lag() and date arithmetic.
Assuming the following data structure for table orders:
order_id int primary key
supplier_id int
order_date date
You would go:
select
o.*,
order_date
- lag(order_date) over(partition by supplier_id order by order_date) date_diff
from orders o
Which gives you, for each order, the difference in days from the previous order of the same supplier (or null if this is the first order of the supplier).
You can then compute the standard deviation with aggregation:
select supplier_id, stddev(date_diff)
from (
select
o.*,
order_date
- lag(order_date) over(partition by supplier_id order by order_date) date_diff
from orders o
) x
group by supplier_id
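The lag-then-aggregate approach above can be tried end to end with a small sketch. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions), and the sample rows are invented for illustration. SQLite has no stddev() and stores dates as text, so julianday() stands in for date subtraction and the sample standard deviation is computed in Python instead.

```python
import sqlite3
import statistics
from collections import defaultdict

# Hypothetical sample data; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders ("
    "order_id integer primary key, supplier_id integer, order_date text)"
)
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        (1, 1, "2021-01-01"),
        (2, 1, "2021-01-11"),
        (3, 1, "2021-01-31"),
        (4, 2, "2021-02-01"),
        (5, 2, "2021-02-06"),
        (6, 2, "2021-02-11"),
    ],
)

# Days since the previous order of the same supplier (NULL for the first order).
rows = conn.execute(
    """
    select supplier_id,
           julianday(order_date)
             - julianday(lag(order_date) over (
                 partition by supplier_id order by order_date)) as date_diff
    from orders
    """
).fetchall()

# Collect the non-NULL gaps per supplier.
gaps = defaultdict(list)
for supplier_id, date_diff in rows:
    if date_diff is not None:
        gaps[supplier_id].append(date_diff)

# Sample standard deviation needs at least two gaps per supplier.
std_by_supplier = {
    s: statistics.stdev(d) for s, d in gaps.items() if len(d) >= 2
}
print(std_by_supplier)
```

On a database that has stddev() built in, the aggregation can stay in SQL as in the answers above; the Python step here only replaces that one missing function.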

Related

Calculating Top N items per dimension

I have the following query that shows total sales for each product on an hourly basis. However, it is very big data and I don't want to see all products, so would like to see the top 1000 product_id based on sales for each date, hour, and category_id dimensions.
SELECT date,
hour,
category_id,
product_id,
sum(sales) AS sales
FROM a
LEFT JOIN b
ON a.product_id = b.product_id
WHERE date(date) >= date('2021-01-01')
GROUP BY 1, 2, 3, 4
How can I do this in Athena?
Thanks in advance.
You can apply the rank function to your result and then keep only the rows whose rank is within the limit you want:
SELECT date,
hour,
category_id,
product_id,
sales
FROM
(
SELECT *,
rank() OVER (PARTITION BY date, hour, category_id
ORDER BY sales DESC) AS rnk
FROM (your query)
)
WHERE rnk <= 1000
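As a rough illustration of the rank-and-filter pattern, here is a sketch that runs against SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later) rather than Athena. The table name sales_by_product, the sample rows, and the small TOP_N are all made up; the table stands in for the result of the aggregated query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the output of the aggregated query.
conn.execute(
    "create table sales_by_product ("
    "date text, hour integer, category_id integer, "
    "product_id integer, sales real)"
)
conn.executemany(
    "insert into sales_by_product values (?, ?, ?, ?, ?)",
    [
        ("2021-01-01", 0, 10, 100, 5.0),
        ("2021-01-01", 0, 10, 101, 9.0),
        ("2021-01-01", 0, 10, 102, 7.0),
        ("2021-01-01", 1, 10, 100, 3.0),
        ("2021-01-01", 1, 10, 101, 1.0),
    ],
)

TOP_N = 2  # the question uses 1000; kept small so the effect is visible

# Rank products inside each (date, hour, category_id) group, keep the top N.
top_rows = conn.execute(
    """
    select date, hour, category_id, product_id, sales
    from (
        select *,
               rank() over (partition by date, hour, category_id
                            order by sales desc) as rnk
        from sales_by_product
    )
    where rnk <= ?
    order by date, hour, category_id, rnk
    """,
    (TOP_N,),
).fetchall()

for row in top_rows:
    print(row)
```

One design point: rank() gives tied rows the same rank, so a group can return more than N rows when sales tie at the cutoff. Swapping in row_number() caps each group at exactly N rows, at the cost of breaking ties arbitrarily.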

Trouble getting SQL Server subquery to pick desired results

I am given a database to use in SQL server.
The tables are:
Price (prodID, from, price)
Product (prodID, name, quantity)
PO (prodID, orderID, amount)
Order (orderID, date, address, status, trackingNumber, custID, shipID)
Shipping (shipID, company, time, price)
Customer (custID, name)
Address (addrID, custID, address)
I need to determine the ID and current price of each product.
The from attribute in the Price table holds the date on which each price was updated. Each ID has multiple prices and dates associated with it, but no date is common to all IDs. The dates are in 'YYYY-MM-DD' format and range from 2018 to 2019-12-31.
My current query looks like:
select distinct p.prodID, p.price
from Price as p
where p.[from] >= '2019-12-23' and p.[from] in (select [from]
from Price
group by [from]
having max([from]) <= '2019-12-31')
order by p.prodID;
which returns a table with multiple prices for some of the IDs and also excludes other IDs altogether.
I was told that I needed a subquery to perform this.
I believe that I may be being too specific in my query to produce the desired results.
My main goal is to fix my current query to select one of each prodID and price from the most recent from date.
One option uses window functions:
select *
from (
select p.*, row_number() over(partition by p.prodid order by p.[from] desc) rn
from price p
where p.[from] <= convert(date, getdate())
) t
where rn = 1
This returns the latest row for each prodid where from is not greater than the current date.
As an alternative, you could also use with ties:
select top (1) with ties p.*
from price p
where p.[from] <= convert(date, getdate())
order by row_number() over(partition by p.prodid order by p.[from] desc)
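The row_number() variant can be checked with a small sketch. It runs on SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later), not SQL Server, so the reserved column name is quoted as "from" instead of [from]. The sample rows are invented, and a fixed as_of date replaces getdate() to keep the result reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "from" is a reserved word, so it has to be quoted in SQLite as well.
conn.execute('create table price (prodid integer, "from" text, price real)')
conn.executemany(
    "insert into price values (?, ?, ?)",
    [
        (1, "2019-01-01", 10.0),
        (1, "2019-06-01", 12.0),
        (2, "2018-03-15", 4.0),
        (2, "2019-12-30", 5.0),
        (2, "2020-02-01", 6.0),  # later than as_of, so it must be ignored
    ],
)

as_of = "2019-12-31"  # hypothetical cutoff used in place of the current date

# Number each product's prices from newest to oldest and keep the newest.
# ISO 'YYYY-MM-DD' strings sort in date order, so text comparison is safe.
current_prices = conn.execute(
    """
    select prodid, price
    from (
        select p.*,
               row_number() over (partition by p.prodid
                                  order by p."from" desc) as rn
        from price p
        where p."from" <= ?
    ) t
    where rn = 1
    order by prodid
    """,
    (as_of,),
).fetchall()

print(current_prices)
```

Each product appears exactly once, which addresses both symptoms in the question: no duplicate prices per ID and no IDs dropped because their last update fell outside a narrow date window.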

SQL: Difference between consecutive rows

Table with 3 columns: order id, member id, order date
I need to pull the distribution of orders broken down by the number of days between two consecutive orders, per member id.
What I have is this:
SELECT
a1.member_id,
count(distinct a1.order_id) as num_orders,
a1.order_date,
DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id+1;
It's not helping me completely as the output I need is:
You can use lag() to get the date of the previous order by the same member:
select o.*,
datediff(
order_date,
lag(order_date) over(partition by member_id order by order_date, order_id)
) days_diff
from orders o
When there are two rows for the same date, the smallest order_id is considered first. Also note that I fixed your datediff() syntax: in Hive, the function just takes two dates, and no unit.
I don't understand the logic you want for computing num_orders, though.
Maybe something like this:
SELECT
a1.member_id,
a1.order_id,
a1.order_date,
DATEDIFF(DAY, a2.order_date, a1.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id
and a2.order_date < a1.order_date
where not exists (
-- no order by the same member falls strictly between a2 and a1
select 1
from orders as intermediate_order
where intermediate_order.member_id = a1.member_id
and intermediate_order.order_date < a1.order_date
and intermediate_order.order_date > a2.order_date);

Query on MAX on date column, and COUNT of another column

I performed the following query with cte's, but I was wondering if there was a simpler way of writing the code, maybe with subqueries? I'm retrieving everything from one table SALES, but I'm using 3 columns: AgentID, SaleDate, and OrderID.
WITH RECENT_SALE AS (
SELECT AGENTID,
SALEDATE,
ROW_NUMBER() OVER (PARTITION BY AGENTID ORDER BY SALEDATE DESC) AS RN
FROM SALES
)
,
COUNT_SALE AS (
SELECT AGENTID,
COUNT(ORDERID) AS COUNTORDERS
FROM SALES
GROUP BY AGENTID
)
SELECT RECENT_SALE.AGENTID,
SALEDATE,
COUNTORDERS
FROM RECENT_SALE
INNER JOIN COUNT_SALE ON RECENT_SALE.AGENTID = COUNT_SALE.AGENTID
WHERE RECENT_SALE.RN = 1;
It looks to me like you're just trying to get the total number of sales per agent as well as the date of his or her most recent sale? If I understand your structure correctly (and I may not), then it seems pretty straightforward. I'm guessing orderid is the primary key of SALES?
SELECT agentid, MAX(saledate) AS saledate -- Most recent sale date
, COUNT(orderid) AS countsales -- total sales
FROM sales
GROUP BY agentid;
There does not seem to be any need for CTEs or subqueries here.
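To see that a single GROUP BY covers both the most recent sale date and the sale count, here is a small sketch using Python's sqlite3 module. The sample rows are invented, and orderid is assumed to be the primary key, as the answer guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed structure: orderid is the primary key of sales.
conn.execute(
    "create table sales ("
    "orderid integer primary key, agentid integer, saledate text)"
)
conn.executemany(
    "insert into sales values (?, ?, ?)",
    [
        (1, 7, "2021-03-01"),
        (2, 7, "2021-03-09"),
        (3, 8, "2021-02-14"),
    ],
)

# One pass over the table: latest sale date and number of sales per agent.
summary = conn.execute(
    """
    select agentid,
           max(saledate)  as saledate,    -- most recent sale date
           count(orderid) as countsales   -- total sales
    from sales
    group by agentid
    order by agentid
    """
).fetchall()

print(summary)
```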
Try this, which gives a running count of sales per agent by date:
SELECT
saledate,
AGENTID,
sum(count(orderid)) over(partition by AGENTID order by saledate) as running_sales
FROM SALES
group by
saledate,
AGENTID

Postgresql: How to use a WITH subquery with JOIN

I have 2 tables: orders and contragents. Each contragent might have many orders. Each order has an order_date. I want to get the first order date for each contragent, but with a caveat: if there was a gap of more than 180 days between orders, I need to "forget" the orders before the gap (and thus the first order after the gap is considered "the first").
For this, I've implemented the following statement:
with o1 as (
select order_date, lag(order_date) over(order by order_date ASC) as prev_order_date
from orders o
where o.contragent_code = :code
order by order_date desc)
select o1.order_date from o1
where extract(day from o1.order_date-o1.prev_order_date)>=180 or o1.prev_order_date is null
order by order_date desc
limit 1
This returns a single value for the contragent whose code is passed as :code, which is what I need.
But I cannot figure out how to write a select that returns this date for every contragent in the table!
The only way I was able to do it was with CREATE FUNCTION, but I will not be able to use that in production, so any advice is highly appreciated!
You want to add partition by, which works for over much like group by works for aggregates.
with o1 as (
select contragent_code, order_date, lag(order_date) over(partition by contragent_code order by order_date ASC) as prev_order_date
from orders o
order by order_date desc)
select o1.contragent_code, o1.order_date from o1
where extract(day from o1.order_date-o1.prev_order_date)>=180 or o1.prev_order_date is null
order by order_date desc
Now lag looks for the previous order_date among rows with the same contragent_code.
UPDATE: in the end, that turned out not to be quite enough, because it can return several dates per contragent. This is the final statement:
with s as (
select o.contragent_code, o.order_date,
case
when
extract(day from order_date-lag(order_date) over(partition by contragent_code order by order_date asc))>=180
then o.order_date else null
end as date_with_gap
from orders o
) select contragent_code, coalesce(max(date_with_gap), min(order_date)) from s
group by contragent_code
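The final statement can be exercised with a small sketch. It runs on SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later) instead of PostgreSQL, so julianday() differences replace extract(day from ...). The sample rows are invented: contragent A has one gap longer than 180 days, and contragent B has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (contragent_code text, order_date text)")
conn.executemany(
    "insert into orders values (?, ?)",
    [
        ("A", "2020-01-01"),
        ("A", "2020-02-01"),
        ("A", "2020-09-01"),  # 213 days after the previous order: a gap
        ("A", "2020-10-01"),
        ("B", "2020-03-01"),
        ("B", "2020-04-01"),
    ],
)

# Mark every order that follows a gap of 180+ days, then take the latest
# marked order per contragent, falling back to the very first order.
first_orders = conn.execute(
    """
    with s as (
        select contragent_code,
               order_date,
               case
                   when julianday(order_date)
                        - julianday(lag(order_date) over (
                            partition by contragent_code
                            order by order_date)) >= 180
                   then order_date
               end as date_with_gap
        from orders
    )
    select contragent_code,
           coalesce(max(date_with_gap), min(order_date)) as first_order_date
    from s
    group by contragent_code
    order by contragent_code
    """
).fetchall()

print(first_orders)
```

The coalesce() is what handles contragents without any gap: max(date_with_gap) is NULL for them, so the earliest order date is used instead.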