Query item with closest date based on current date - sql

I am trying to get, for each item number, the closest date and its price based on the current date. The query gives me output, but not the way I want.
The same item has different prices, and the query doesn't filter them down to one row per item.
Here's my query:
SELECT distinct [ITEM_NO]
,min(REQUIRED_DATE) as Date
,[PRICE]
FROM [DATA_WAREHOUSE].[app].[OHCMS_HOPS_ORDERS]
where (REQUIRED_DATE) >= GETDATE() and PRICE is not null
group by ITEM_NO,PRICE
order by ITEM_NO
Any ideas?

You can use the ROW_NUMBER window function to do this:
SELECT ITEM_NO,
REQUIRED_DATE,
PRICE
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY ITEM_NO ORDER BY REQUIRED_DATE) rn
FROM [DATA_WAREHOUSE].[app].[OHCMS_HOPS_ORDERS]
where REQUIRED_DATE >= GETDATE() and PRICE is not null
)t1
WHERE rn = 1
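A minimal sketch of this ROW_NUMBER approach, assuming Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in for SQL Server. The table name `orders` and the fixed date replacing GETDATE() are assumptions for illustration; the rows are the example table from the greatest-n-per-group answer in this thread.

```python
import sqlite3

# Hypothetical in-memory stand-in for OHCMS_HOPS_ORDERS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ITEM_NO TEXT, REQUIRED_DATE TEXT, PRICE INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("A", "2019-05-29", 10),
        ("A", "2019-06-01", 20),
        ("A", "2019-06-04", 30),
        ("A", "2019-06-05", 40),
        ("B", "2019-06-01", 80),
    ],
)

TODAY = "2019-06-03"  # fixed stand-in for GETDATE(); ISO strings sort as dates

next_due = conn.execute(
    """
    SELECT ITEM_NO, REQUIRED_DATE, PRICE
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY ITEM_NO ORDER BY REQUIRED_DATE) AS rn
        FROM orders
        WHERE REQUIRED_DATE >= ? AND PRICE IS NOT NULL
    ) t1
    WHERE rn = 1  -- earliest upcoming date per item
    ORDER BY ITEM_NO
    """,
    (TODAY,),
).fetchall()
print(next_due)
```

Item B has no date on or after 2019-06-03, so it drops out entirely; for the remaining items, the `rn = 1` filter keeps exactly one row each, which is what removes the duplicate prices.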

Could you order by the absolute value of DATEDIFF?
ORDER BY ABS(DATEDIFF(day, REQUIRED_DATE, GETDATE()))
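To pick the nearest date in either direction, the absolute difference can drive ROW_NUMBER's ORDER BY. A sketch, again assuming SQLite via Python's sqlite3 as a stand-in: `julianday()` replaces DATEDIFF(day, ...), a fixed date replaces GETDATE(), and the `orders` table with these rows is hypothetical. The extra `REQUIRED_DATE` sort key is an added tie-breaker, not part of the original suggestion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ITEM_NO TEXT, REQUIRED_DATE TEXT, PRICE INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("A", "2019-05-29", 10),
        ("A", "2019-06-01", 20),
        ("A", "2019-06-04", 30),
        ("A", "2019-06-05", 40),
        ("B", "2019-06-01", 80),
    ],
)

TODAY = "2019-06-03"  # fixed stand-in for GETDATE()

closest = conn.execute(
    """
    SELECT ITEM_NO, REQUIRED_DATE, PRICE
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY ITEM_NO
                   -- day distance in either direction; ties go to the earlier date
                   ORDER BY ABS(julianday(REQUIRED_DATE) - julianday(?)), REQUIRED_DATE
               ) AS rn
        FROM orders
        WHERE PRICE IS NOT NULL
    ) t
    WHERE rn = 1
    ORDER BY ITEM_NO
    """,
    (TODAY,),
).fetchall()
print(closest)
```

Item A resolves to 2019-06-04 (one day away) rather than 2019-06-01 (two days away), and item B keeps its only row.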

This looks like a variation of the greatest-n-per-group problem.
I'm not quite certain which constraint you're looking to impose:
1. Largest date
2. Most recent date (but not in the future)
3. Closest date to today (past or future)
Here's an example table, and which row each option would return if queried on 6/3/2019:
| Item | RequiredDate | Price |
|------|--------------|-------|
| A | 2019-05-29 | 10 |
| A | 2019-06-01 | 20 | <-- #2
| A | 2019-06-04 | 30 | <-- #3
| A | 2019-06-05 | 40 | <-- #1
| B | 2019-06-01 | 80 |
But I'm going to guess you're looking for #2.
We can identify the most recent date for each item by grouping by item and applying an aggregate such as MAX to each group:
SELECT o.Item, MAX(o.RequiredDate) AS MostRecentDt
FROM Orders o
WHERE o.RequiredDate <= GETDATE()
GROUP BY o.Item
Which returns this:
| Item | MostRecentDt |
|------|--------------|
| A | 2019-06-01 |
| B | 2019-06-01 |
However, once we've grouped, the trouble is joining back to the original table to get the full row, so that we can select any other columns (such as Price) that aren't part of the GROUP BY.
Using ROW_NUMBER, we can instead sort the rows within each item and number them in order (here, newest to oldest):
SELECT *, ROW_NUMBER() OVER(PARTITION BY Item ORDER BY RequiredDate DESC) rn
FROM Orders o
WHERE o.RequiredDate <= GETDATE()
| Item | RequiredDate | Price | rn |
|------|--------------|-------|----|
| A | 2019-06-01 | 20 | 1 |
| A | 2019-05-29 | 10 | 2 |
| B | 2019-06-01 | 80 | 1 |
Since we sorted DESC, we now just need to filter for rn = 1 to get the most recent row per item:
WITH OrderedPastItems AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Item ORDER BY RequiredDate DESC) rn
FROM Orders o
WHERE o.RequiredDate <= GETDATE()
)
SELECT *
FROM OrderedPastItems
WHERE rn = 1
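The same CTE can be run end to end against the example table. This sketch assumes Python's sqlite3 module (SQLite 3.25+) in place of SQL Server and passes the query date 2019-06-03 as a parameter instead of calling GETDATE(); the `Orders` table is hypothetical and holds the example rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Item TEXT, RequiredDate TEXT, Price INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        ("A", "2019-05-29", 10),
        ("A", "2019-06-01", 20),
        ("A", "2019-06-04", 30),
        ("A", "2019-06-05", 40),
        ("B", "2019-06-01", 80),
    ],
)

most_recent = conn.execute(
    """
    WITH OrderedPastItems AS (
        SELECT *,
               -- newest past date per item gets rn = 1
               ROW_NUMBER() OVER (PARTITION BY Item ORDER BY RequiredDate DESC) AS rn
        FROM Orders o
        WHERE o.RequiredDate <= ?
    )
    SELECT Item, RequiredDate, Price
    FROM OrderedPastItems
    WHERE rn = 1
    ORDER BY Item
    """,
    ("2019-06-03",),
).fetchall()
print(most_recent)
```

Both items return their latest row on or before 2019-06-03, with the matching Price, without any join back to the original table.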
Here's an MCVE in SQL Fiddle.
Further Reading:
SQL selecting rows by most recent date
Select row with most recent date per user

Related

Count distinct customers over rolling window partition

My question is similar to "redshift: count distinct customers over window partition", but I have a rolling window partition.
My query looks like this, but DISTINCT within a windowed COUNT is not supported in Redshift:
select p_date, seconds_read,
count(distinct customer_id) over (order by p_date rows between unbounded preceding and current row) as total_cumulative_customer
from table_x
My goal is to calculate the total number of unique customers up to every date (hence the rolling window).
I tried using a dense_rank() approach, but it simply fails, since I cannot use a window function like this:
select p_date, max(total_cumulative_customer) over ()
from (select p_date, seconds_read,
dense_rank() over (order by customer_id rows between unbounded preceding and current row) as total_cumulative_customer -- WILL FAIL HERE
from table_x) t
Any workaround or different approach would be helpful!
EDIT:
Input data sample:
+------+----------+--------------+
| Cust | p_date | seconds_read |
+------+----------+--------------+
| 1 | 1-Jan-20 | 10 |
| 2 | 1-Jan-20 | 20 |
| 4 | 1-Jan-20 | 30 |
| 5 | 1-Jan-20 | 40 |
| 6 | 5-Jan-20 | 50 |
| 3 | 5-Jan-20 | 60 |
| 2 | 5-Jan-20 | 70 |
| 1 | 5-Jan-20 | 80 |
| 1 | 5-Jan-20 | 90 |
| 1 | 7-Jan-20 | 100 |
| 3 | 7-Jan-20 | 110 |
| 4 | 7-Jan-20 | 120 |
| 7 | 7-Jan-20 | 130 |
+------+----------+--------------+
Expected output:
+----------+--------------------------+------------------+--------------------------------------------+
| p_date | total_distinct_cum_cust | sum_seconds_read | Comment |
+----------+--------------------------+------------------+--------------------------------------------+
| 1-Jan-20 | 4 | 100 | total distinct cust = 4 i.e. 1,2,4,5 |
| 5-Jan-20 | 6 | 450 | total distinct cust = 6 i.e. 1,2,3,4,5,6 |
| 7-Jan-20 | 7 | 910 | total distinct cust = 7 i.e. 1,2,3,4,5,6,7 |
+----------+--------------------------+------------------+--------------------------------------------+
For this operation:
select p_date, seconds_read,
count(distinct customer_id) over (order by p_date rows between unbounded preceding and current row) as total_cumulative_customer
from table_x;
You can do pretty much what you want with two levels of aggregation:
select min_p_date,
sum(count(*)) over (order by min_p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, min(p_date) as min_p_date
from table_x
group by customer_id
) c
group by min_p_date;
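Here is a sketch of the two-level aggregation on the question's sample data, using Python's sqlite3 module (SQLite 3.25+) as a stand-in for Redshift. The dates are rewritten in ISO form so they sort correctly as text, and the two levels are spelled out as nested subqueries rather than as `sum(count(*))`; that restructuring is my assumption, not the answer's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (customer_id INTEGER, p_date TEXT, seconds_read INTEGER)")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 10), (2, "2020-01-01", 20), (4, "2020-01-01", 30),
        (5, "2020-01-01", 40), (6, "2020-01-05", 50), (3, "2020-01-05", 60),
        (2, "2020-01-05", 70), (1, "2020-01-05", 80), (1, "2020-01-05", 90),
        (1, "2020-01-07", 100), (3, "2020-01-07", 110), (4, "2020-01-07", 120),
        (7, "2020-01-07", 130),
    ],
)

running = conn.execute(
    """
    SELECT min_p_date,
           SUM(new_customers) OVER (
               ORDER BY min_p_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_distinct_customers
    FROM (
        -- second level: how many customers first appeared on each date
        SELECT min_p_date, COUNT(*) AS new_customers
        FROM (
            -- first level: each customer's first date
            SELECT customer_id, MIN(p_date) AS min_p_date
            FROM table_x
            GROUP BY customer_id
        ) c
        GROUP BY min_p_date
    ) d
    ORDER BY min_p_date
    """
).fetchall()
print(running)
```

Because only each customer's first date is counted, repeat visits (customer 1 appears on three dates) never inflate the running total. A date on which no new customer appears would be missing from this output, since it produces no group.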
Summing the seconds read as well is a bit tricky, but you can use the same idea:
select p_date,
sum(sum(seconds_read)) over (order by p_date rows between unbounded preceding and current row) as seconds_read,
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by p_date rows between unbounded preceding and current row) as running_distinct_customers
from (select customer_id, p_date, seconds_read,
row_number() over (partition by customer_id order by p_date) as seqnum
from table_x
) c
group by p_date;
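A sketch of the seqnum variant on the same sample data, assuming Python's sqlite3 module (SQLite 3.25+) in place of Redshift. As in the answer, `seqnum = 1` marks a customer's first row; the per-date sums are computed in a subquery before the running totals, which is an equivalent restructuring of `sum(sum(...))` rather than the answer's literal query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (customer_id INTEGER, p_date TEXT, seconds_read INTEGER)")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 10), (2, "2020-01-01", 20), (4, "2020-01-01", 30),
        (5, "2020-01-01", 40), (6, "2020-01-05", 50), (3, "2020-01-05", 60),
        (2, "2020-01-05", 70), (1, "2020-01-05", 80), (1, "2020-01-05", 90),
        (1, "2020-01-07", 100), (3, "2020-01-07", 110), (4, "2020-01-07", 120),
        (7, "2020-01-07", 130),
    ],
)

totals = conn.execute(
    """
    SELECT p_date,
           SUM(day_seconds) OVER (
               ORDER BY p_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS seconds_read,
           SUM(day_new) OVER (
               ORDER BY p_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_distinct_customers
    FROM (
        -- per-date seconds, plus how many rows are a customer's first row
        SELECT p_date,
               SUM(seconds_read) AS day_seconds,
               SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS day_new
        FROM (
            SELECT customer_id, p_date, seconds_read,
                   ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY p_date) AS seqnum
            FROM table_x
        ) c
        GROUP BY p_date
    ) d
    ORDER BY p_date
    """
).fetchall()
print(totals)
```

The output reproduces both columns of the expected result in the question: 100/450/910 seconds and 4/6/7 distinct customers.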
One workaround uses a correlated subquery:
select p_date, seconds_read,
(
select count(distinct t1.customer_id)
from table_x t1
where t1.p_date <= t.p_date
) as total_cumulative_customer
from table_x t
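A sketch of the correlated subquery on the sample data, assuming Python's sqlite3 module as a stand-in engine; whether Redshift accepts this exact correlated form is not verified here. `DISTINCT` is added so each date appears once instead of once per input row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (customer_id INTEGER, p_date TEXT, seconds_read INTEGER)")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 10), (2, "2020-01-01", 20), (4, "2020-01-01", 30),
        (5, "2020-01-01", 40), (6, "2020-01-05", 50), (3, "2020-01-05", 60),
        (2, "2020-01-05", 70), (1, "2020-01-05", 80), (1, "2020-01-05", 90),
        (1, "2020-01-07", 100), (3, "2020-01-07", 110), (4, "2020-01-07", 120),
        (7, "2020-01-07", 130),
    ],
)

cumulative = conn.execute(
    """
    SELECT DISTINCT t.p_date,
           -- re-count distinct customers over every row up to this date
           (SELECT COUNT(DISTINCT t1.customer_id)
            FROM table_x t1
            WHERE t1.p_date <= t.p_date) AS total_cumulative_customer
    FROM table_x t
    ORDER BY t.p_date
    """
).fetchall()
print(cumulative)
```

The subquery rescans all earlier rows for each outer row, so its cost grows with the table size times the number of rows it is evaluated for.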
I'd like to add that you can also accomplish this with an explicit self join, which is, in my opinion, more straightforward and readable than the subquery approaches described in the other answers.
select
t1.p_date,
sum(t2.seconds_read) as sum_seconds_read,
count(distinct t2.customer_id) as distinct_cum_cust_totals
from
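(select distinct p_date from table_x) t1 -- one row per date, so each t2 row is summed only once per date
join
table_x t2
on
t2.p_date <= t1.p_date
group by
t1.p_date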
Most query planners will rewrite a correlated subquery like the ones above into an efficient join like this, so either solution is usually fine. For the general case, though, I believe the join is the better solution, since some engines (like BigQuery) won't allow certain correlated subqueries and force you to define the join explicitly in your query.
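A sketch comparing a plain self join with one that joins a list of distinct dates to the table, assuming Python's sqlite3 module as a stand-in engine and the question's sample data. COUNT(DISTINCT) is unaffected by duplicate t1 rows, but SUM is not: joining the raw table to itself repeats each t2 row once for every t1 row sharing a date, so the derived table of distinct dates is what keeps `sum_seconds_read` correct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (customer_id INTEGER, p_date TEXT, seconds_read INTEGER)")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 10), (2, "2020-01-01", 20), (4, "2020-01-01", 30),
        (5, "2020-01-01", 40), (6, "2020-01-05", 50), (3, "2020-01-05", 60),
        (2, "2020-01-05", 70), (1, "2020-01-05", 80), (1, "2020-01-05", 90),
        (1, "2020-01-07", 100), (3, "2020-01-07", 110), (4, "2020-01-07", 120),
        (7, "2020-01-07", 130),
    ],
)

# Raw self join: SUM is multiplied by the number of t1 rows on each date.
naive = conn.execute(
    """
    SELECT t1.p_date, SUM(t2.seconds_read)
    FROM table_x t1
    JOIN table_x t2 ON t2.p_date <= t1.p_date
    GROUP BY t1.p_date
    ORDER BY t1.p_date
    """
).fetchall()

# Distinct dates on the left side: each t2 row is counted once per date.
fixed = conn.execute(
    """
    SELECT t1.p_date,
           SUM(t2.seconds_read) AS sum_seconds_read,
           COUNT(DISTINCT t2.customer_id) AS distinct_cum_cust_totals
    FROM (SELECT DISTINCT p_date FROM table_x) t1
    JOIN table_x t2 ON t2.p_date <= t1.p_date
    GROUP BY t1.p_date
    ORDER BY t1.p_date
    """
).fetchall()
print(naive)
print(fixed)
```

On 2020-01-01 the raw join reports 400 seconds (four rows times 100), while the distinct-date version reports the expected 100.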

Getting date, and count of unique customers when first order was placed

I have a table called orders that looks like this:
+--------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| memberid | int(11) | YES | | NULL | |
| deliverydate | date | YES | | NULL | |
+--------------+---------+------+-----+---------+-------+
And that contains the following data:
+------+----------+--------------+
| id | memberid | deliverydate |
+------+----------+--------------+
| 1 | 991 | 2019-10-25 |
| 2 | 991 | 2019-10-26 |
| 3 | 992 | 2019-10-25 |
| 4 | 992 | 2019-10-25 |
| 5 | 993 | 2019-10-24 |
| 7 | 994 | 2019-10-21 |
| 6 | 994 | 2019-10-26 |
| 8 | 995 | 2019-10-26 |
+------+----------+--------------+
I would like a result set returning each unique date, and a separate column showing how many customers placed their first order on that day.
I'm having problems querying this the right way, especially when the data contains multiple orders from the same customer on the same day.
My approach has been to:
1. Get all unique memberids that placed an order during the time period I want to look at.
2. Keep only the ones that placed their first order during the period, by excluding memberids that placed an order before the period.
3. Group by delivery date and count all unique memberids (but this obviously counts unique memberids for each day individually!).
Here's the corresponding SQL:
SELECT deliverydate,COUNT(DISTINCT memberid) FROM orders
WHERE
MemberId IN (SELECT DISTINCT memberid FROM orders WHERE deliverydate BETWEEN '2019-10-25' AND '2019-10-26')
AND NOT
MemberId In (SELECT DISTINCT memberid FROM orders WHERE deliverydate < '2019-10-25')
GROUP BY deliverydate
ORDER BY deliverydate ASC;
But with the above data, this returns:
+--------------+--------------------------+
| deliverydate | COUNT(DISTINCT memberid) |
+--------------+--------------------------+
| 2019-10-25 | 2 |
| 2019-10-26 | 2 |
+--------------+--------------------------+
The count for 2019-10-26 should be 1.
Appreciate any help :)
You can aggregate twice:
select first_deliverydate, count(*) cnt
from (
select min(deliverydate) first_deliverydate
from orders
group by memberid
) t
group by first_deliverydate
order by first_deliverydate
The subquery gives you the first order date of each member, then the outer query aggregates and counts by first order date.
This demo on DB Fiddle with your sample data returns:
first_deliverydate | cnt
:----------------- | --:
2019-10-21 | 1
2019-10-24 | 1
2019-10-25 | 2
2019-10-26 | 1
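The double aggregation can be checked with Python's sqlite3 module standing in for MySQL; this is a sketch, and the table and rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, memberid INTEGER, deliverydate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 991, "2019-10-25"), (2, 991, "2019-10-26"),
        (3, 992, "2019-10-25"), (4, 992, "2019-10-25"),
        (5, 993, "2019-10-24"), (7, 994, "2019-10-21"),
        (6, 994, "2019-10-26"), (8, 995, "2019-10-26"),
    ],
)

first_orders = conn.execute(
    """
    SELECT first_deliverydate, COUNT(*) AS cnt
    FROM (
        -- one row per member: the date of their first order
        SELECT MIN(deliverydate) AS first_deliverydate
        FROM orders
        GROUP BY memberid
    ) t
    GROUP BY first_deliverydate
    ORDER BY first_deliverydate
    """
).fetchall()
print(first_orders)
```

Member 992's two orders on 2019-10-25 collapse to a single row in the inner query, which is why no DISTINCT is needed in the outer COUNT.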
In MySQL 8.0, this can also be achieved with window functions:
select deliverydate first_deliverydate, count(*) cnt
from (
select deliverydate, row_number() over(partition by memberid order by deliverydate) rn
from orders
) t
where rn = 1
group by deliverydate
order by deliverydate
Demo on DB Fiddle
You first have to figure out each member's first delivery date:
SELECT firstdeliverydate,COUNT(DISTINCT memberid) FROM (
select memberid, min(deliverydate) as firstdeliverydate
from orders
WHERE
MemberId IN (SELECT DISTINCT memberid FROM orders WHERE deliverydate BETWEEN '2019-10-25' AND '2019-10-26')
AND NOT
MemberId In (SELECT DISTINCT memberid FROM orders WHERE deliverydate < '2019-10-25')
group by memberid)
t1
group by firstdeliverydate
Get the first order of each customer with NOT EXISTS, and then GROUP BY deliverydate to count the distinct customers who placed their first order on each date:
select o.deliverydate, count(distinct o.memberid) counter
from orders o
where not exists (
select 1 from orders
where memberid = o.memberid and deliverydate < o.deliverydate
)
group by o.deliverydate
See the demo.
Results:
| deliverydate | counter |
| ------------------- | ------- |
| 2019-10-21 00:00:00 | 1 |
| 2019-10-24 00:00:00 | 1 |
| 2019-10-25 00:00:00 | 2 |
| 2019-10-26 00:00:00 | 1 |
But if you want results for all the dates in the table, including dates where there were no orders from new customers (so the counter will be 0):
select d.deliverydate, count(distinct o.memberid) counter
from (
select distinct deliverydate
from orders
) d left join orders o
on o.deliverydate = d.deliverydate and not exists (
select 1 from orders
where memberid = o.memberid and deliverydate < o.deliverydate
)
group by d.deliverydate
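A sketch of the LEFT JOIN version in Python's sqlite3 module (standing in for MySQL). To show a zero count, it adds one hypothetical order (id 9, member 991 on 2019-10-27) that is not in the question's data: member 991 already ordered earlier, so that date has no new customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, memberid INTEGER, deliverydate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 991, "2019-10-25"), (2, 991, "2019-10-26"),
        (3, 992, "2019-10-25"), (4, 992, "2019-10-25"),
        (5, 993, "2019-10-24"), (7, 994, "2019-10-21"),
        (6, 994, "2019-10-26"), (8, 995, "2019-10-26"),
        (9, 991, "2019-10-27"),  # hypothetical extra row, not in the question
    ],
)

per_day = conn.execute(
    """
    SELECT d.deliverydate, COUNT(DISTINCT o.memberid) AS counter
    FROM (SELECT DISTINCT deliverydate FROM orders) d
    LEFT JOIN orders o
      ON o.deliverydate = d.deliverydate
     AND NOT EXISTS (          -- keep only a member's first-day orders
         SELECT 1 FROM orders o2
         WHERE o2.memberid = o.memberid AND o2.deliverydate < o.deliverydate
     )
    GROUP BY d.deliverydate
    ORDER BY d.deliverydate
    """
).fetchall()
print(per_day)
```

On 2019-10-27 nothing survives the join condition, so `o.memberid` is NULL and COUNT(DISTINCT ...) yields 0 rather than dropping the date.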

Oracle query to find start and end dates of active status records

I have data like the following:
+------+-----------+----------+-------+------------+
| Code | StartDate | EndDate | Unit | CodeStatus |
+------+-----------+----------+-------+------------+
| 1001 | 20100101 | 20101231 | UnitA | Active |
| 1001 | 20110101 | 20151231 | UnitB | Active |
| 1001 | 20160101 | 21000101 | UnitB | Inactive |
| 1002 | 20160101 | 20181231 | UnitA | Active |
| 1002 | 20190101 | 21000101 | UnitA | Inactive |
| 1003 | 20140101 | 21000101 | UnitC | Active |
+------+-----------+----------+-------+------------+
If we look at the first code (1001), there are two active records. In the output I want the earliest start date and the latest end date of the active records, something like this:
+------+-----------+----------+----------+
| Code | StartDate | EndDate | Status |
+------+-----------+----------+----------+
| 1001 | 20100101 | 20151231 | Inactive |
| 1002 | 20160101 | 20181231 | Inactive |
| 1003 | 20140101 | 21000101 | Active |
+------+-----------+----------+----------+
This table has around a million records and I pull the data using an API, so performance also matters.
Can someone please help me with the query?
You seem to want the period when the code is active and the current status. This is a basic aggregation query with a twist:
select code,
min(case when codestatus = 'Active' then startdate end) as active_start_date,
max(case when codestatus = 'Active' then enddate end) as active_end_date,
max(codestatus) keep (dense_rank first order by startdate desc) as current_code_status
from codes
group by code;
KEEP is a handy Oracle feature that works essentially like an aggregate version of first_value().
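SQLite has no KEEP clause, so this sketch emulates `keep (dense_rank first order by startdate desc)` with a ROW_NUMBER subquery joined to the conditional aggregates. It assumes Python's sqlite3 module (SQLite 3.25+) in place of Oracle, stores the YYYYMMDD dates as integers so they compare numerically, and uses the hypothetical table name `codes` from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE codes (code INTEGER, startdate INTEGER, enddate INTEGER, unit TEXT, codestatus TEXT)"
)
conn.executemany(
    "INSERT INTO codes VALUES (?, ?, ?, ?, ?)",
    [
        (1001, 20100101, 20101231, "UnitA", "Active"),
        (1001, 20110101, 20151231, "UnitB", "Active"),
        (1001, 20160101, 21000101, "UnitB", "Inactive"),
        (1002, 20160101, 20181231, "UnitA", "Active"),
        (1002, 20190101, 21000101, "UnitA", "Inactive"),
        (1003, 20140101, 21000101, "UnitC", "Active"),
    ],
)

summary = conn.execute(
    """
    SELECT a.code, a.active_start_date, a.active_end_date,
           s.codestatus AS current_code_status
    FROM (
        -- conditional aggregation: only active rows feed MIN/MAX
        SELECT code,
               MIN(CASE WHEN codestatus = 'Active' THEN startdate END) AS active_start_date,
               MAX(CASE WHEN codestatus = 'Active' THEN enddate END) AS active_end_date
        FROM codes
        GROUP BY code
    ) a
    JOIN (
        -- status of the row with the latest start date (the KEEP emulation)
        SELECT code, codestatus,
               ROW_NUMBER() OVER (PARTITION BY code ORDER BY startdate DESC) AS rn
        FROM codes
    ) s ON s.code = a.code AND s.rn = 1
    ORDER BY a.code
    """
).fetchall()
print(summary)
```

For 1002 the active period comes out as 20160101 to 20181231, which is what the stated rule (earliest active start, latest active end) gives for the sample rows.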
You just need the MAX and MIN aggregate functions:
with cte as
(
select code,min(StartDate) as mStartDate,
max(EndDate) as mEndDate from table
where CodeStatus='Active' group by code
)
,cte1 as
(
select CodeStatus,code,
row_number()over(partition by code order by StartDate desc) rn
from table
) select cte.*,cte1.CodeStatus
from cte join cte1 on cte.code=cte1.code
where cte1.rn=1
You can use something like this to get the start and end dates as well as the current status for your codes. Performance will depend on how you restrict the rows in the WITH section, either by code or by date.
WITH CodeDates AS
(
select
code,
min(StartDate) startdate,
max(EndDate) enddate
from table
Group by code
)
Select c.code,
c.startdate,
c.enddate,
t.codestatus
From CodeDates c
Join table t
on t.code = c.code
and t.enddate = c.enddate
To get the expected result you posted from your sample data, you could use conditional aggregation, i.e. only pass the dates to min() or max() if the status is active. That gives you the minimum start date and the maximum end date. To get the current status, check whether the current time is at or after the start time and before the end time; if it is, pass the status to an aggregate such as max().
SELECT code,
to_char(min(CASE
WHEN codestatus = 'Active' THEN
startdate
END), 'YYYYMMDD') startdate,
to_char(max(CASE
WHEN codestatus = 'Active' THEN
enddate
END), 'YYYYMMDD') enddate,
max(CASE
WHEN startdate <= sysdate
AND enddate > sysdate THEN
codestatus
END) status
FROM elbat
GROUP BY code
ORDER BY code;
db<>fiddle
But I suspect there may be more to it. What if there is more than one active period? Is it correct to take the start of the earliest and the end of the latest? What if more than one period matches the current time; which one determines the current status?

Getting max and latest rows in SQL

I have a table of orders, where multiple orders can be created for a given Name on the same day. I need to return the latest order for each date and name; if there are multiple orders on that day for a name, return the one with the largest order value.
Sample data:
ID | NAME | OrderDate | OrderValue
----+------+--------------+--------------
1 | A | 2019-01-15 | 100
2 | B | 2019-01-15 | 200
3 | A | 2019-01-15 | 150
4 | C | 2019-01-17 | 450
5 | D | 2019-01-18 | 300
6 | C | 2019-01-17 | 500
Result returned should be:
ID | NAME | OrderDate | OrderValue
----+------+--------------+--------------
2 | B | 2019-01-15 | 200
3 | A | 2019-01-15 | 150
5 | D | 2019-01-18 | 300
6 | C | 2019-01-17 | 500
I can do this with multiple SQL queries, but is there a simple single query that achieves the above result?
Starting with SQL Server 2005, you can just use ROW_NUMBER():
SELECT ID, Name, OrderDate, OrderValue
FROM (
SELECT
o.*,
ROW_NUMBER() OVER(PARTITION BY Name, OrderDate ORDER BY OrderValue DESC) rn
FROM orders o
) x WHERE rn = 1
ROW_NUMBER() assigns a rank to each record within groups of records having the same Name and OrderDate, sorted by OrderValue. The record with the highest order value gets row number 1.
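A sketch of the ROW_NUMBER query against the question's sample rows, using Python's sqlite3 module (SQLite 3.25+) as a stand-in for SQL Server; an outer ORDER BY ID is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ID INTEGER, Name TEXT, OrderDate TEXT, OrderValue INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2019-01-15", 100), (2, "B", "2019-01-15", 200),
        (3, "A", "2019-01-15", 150), (4, "C", "2019-01-17", 450),
        (5, "D", "2019-01-18", 300), (6, "C", "2019-01-17", 500),
    ],
)

best = conn.execute(
    """
    SELECT ID, Name, OrderDate, OrderValue
    FROM (
        SELECT o.*,
               -- highest value per (Name, OrderDate) gets rn = 1
               ROW_NUMBER() OVER (PARTITION BY Name, OrderDate ORDER BY OrderValue DESC) AS rn
        FROM orders o
    ) x
    WHERE rn = 1
    ORDER BY ID
    """
).fetchall()
print(best)
```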
With older versions, you can filter the table with a correlated subquery and a NOT EXISTS condition:
SELECT ID, Name, OrderDate, OrderValue
FROM orders o
WHERE NOT EXISTS (
SELECT 1
FROM orders o1
WHERE
o1.Name = o.Name
AND o1.OrderDate = o.OrderDate
AND o1.OrderValue > o.OrderValue
)
The NOT EXISTS condition ensures that no other record has a higher OrderValue for the same Name and OrderDate.
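The two approaches differ when two orders tie on the highest value. This sketch, using Python's sqlite3 module (SQLite 3.25+) as a stand-in for SQL Server, adds a hypothetical order 7 for B on 2019-01-15 with the same value as order 2: NOT EXISTS keeps both rows, while ROW_NUMBER keeps exactly one of them (which one is not defined without a tie-breaker in its ORDER BY).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ID INTEGER, Name TEXT, OrderDate TEXT, OrderValue INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2019-01-15", 100), (2, "B", "2019-01-15", 200),
        (3, "A", "2019-01-15", 150), (4, "C", "2019-01-17", 450),
        (5, "D", "2019-01-18", 300), (6, "C", "2019-01-17", 500),
        (7, "B", "2019-01-15", 200),  # hypothetical tie with order 2
    ],
)

# NOT EXISTS: keep a row unless a strictly higher value exists in its group.
not_exists_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT ID FROM orders o
        WHERE NOT EXISTS (
            SELECT 1 FROM orders o1
            WHERE o1.Name = o.Name
              AND o1.OrderDate = o.OrderDate
              AND o1.OrderValue > o.OrderValue
        )
        ORDER BY ID
        """
    )
]

# ROW_NUMBER: exactly one row per (Name, OrderDate), even when values tie.
row_number_rows = conn.execute(
    """
    SELECT ID FROM (
        SELECT ID,
               ROW_NUMBER() OVER (PARTITION BY Name, OrderDate ORDER BY OrderValue DESC) AS rn
        FROM orders
    ) x
    WHERE rn = 1
    """
).fetchall()
print(not_exists_ids, len(row_number_rows))
```

If ties should collapse to one row deterministically, adding a second sort key such as ID to the ROW_NUMBER ordering is one option.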
Use cross apply:
select o.id, name, orderdate, o.ordervalue
from orders o
cross apply (select top 1 id, ordervalue from orders where name=o.name and orderdate=o.OrderDate order by ordervalue desc) oo
where o.id=oo.id
order by o.id

SELECT based on multiple fields in MS-SQL

I have a table with 4 columns:
AcctNumb | PeriodEndingDate | WaterConsumption | ReadingType
There are multiple records for each AcctNumb, each with the date it was recorded.
What I want to do is grab the most recent date, consumption reading, and reading type for each account.
I have tried using MAX(PeriodEndingDate) with GROUP BY AcctNumb, but then I would need to aggregate all the other columns, and none of the aggregate functions help for WaterConsumption, etc.
Can anyone point me in the right direction?
Thanks
EDIT
Here is a sample table
+----------+------------------+------------------+-------------+
| AcctNumb | PeriodEndingDate | WaterConsumption | ReadingType |
+----------+------------------+------------------+-------------+
| 1000 | 2018-03-31 | 122230 | A |
| 1001 | 2018-03-31 | 24850 | A |
| 1002 | 2018-03-31 | 88540 | A |
| 1000 | 2017-12-31 | 123800 | A |
| 1001 | 2017-12-31 | 3000 | E |
+----------+------------------+------------------+-------------+
The ReadingType is whether it's an actual (A) reading, or an estimate (E).
Try this:
SELECT
AcctNumb,
PeriodEndingDate,
WaterConsumption,
ReadingType
FROM (SELECT
AcctNumb,
PeriodEndingDate,
WaterConsumption,
ReadingType,
ROW_NUMBER() OVER (PARTITION BY AcctNumb ORDER BY PeriodEndingDate DESC) AS MostrecentRecord
FROM <TableName>) dt
WHERE MostrecentRecord= 1
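A sketch of this query on the question's sample table, assuming Python's sqlite3 module (SQLite 3.25+) as a stand-in for SQL Server and the hypothetical table name `readings` in place of `<TableName>`. The DESC in the window's ORDER BY is what makes row 1 the most recent reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (AcctNumb INTEGER, PeriodEndingDate TEXT, WaterConsumption INTEGER, ReadingType TEXT)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (1000, "2018-03-31", 122230, "A"),
        (1001, "2018-03-31", 24850, "A"),
        (1002, "2018-03-31", 88540, "A"),
        (1000, "2017-12-31", 123800, "A"),
        (1001, "2017-12-31", 3000, "E"),
    ],
)

latest = conn.execute(
    """
    SELECT AcctNumb, PeriodEndingDate, WaterConsumption, ReadingType
    FROM (
        SELECT *,
               -- newest period per account gets 1
               ROW_NUMBER() OVER (PARTITION BY AcctNumb ORDER BY PeriodEndingDate DESC) AS MostrecentRecord
        FROM readings
    ) dt
    WHERE MostrecentRecord = 1
    ORDER BY AcctNumb
    """
).fetchall()
print(latest)
```

Account 1001's estimated reading from 2017-12-31 is dropped in favour of its newer actual reading, with the matching consumption and reading type carried along.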
This can be done using ROW_NUMBER. It has been asked and answered thousands of times, but the query is easier to write than to find a duplicate.
select *
from
(
select *
, RowNum = ROW_NUMBER() over(partition by AcctNumb order by PeriodEndingDate desc)
from YourTable
) x
where x.RowNum = 1
SELECT DQ.* FROM
(SELECT *,
Row_Number() OVER (PARTITION BY AcctNumb ORDER BY PeriodEndingDate DESC) AS RN
FROM YourTable
) AS DQ
WHERE DQ.RN = 1