I have weekly data of each product stock. I want to group it by year-month and get the first value of each month. In other words, I want to get the opening stock of each month, regardless the day of the month.
+------------+---------+
| MyDate | MyValue |
+------------+---------+
| 2018-01-06 | 2 |*
| 2018-01-13 | 7 |
| 2018-01-20 | 5 |
| 2018-01-27 | 2 |
| 2018-02-03 | 3 |*
| 2018-02-10 | 10 |
| 2018-02-17 | 6 |
| 2018-02-24 | 4 |
| 2018-03-03 | 7 |*
| 2018-03-10 | 5 |
| 2018-03-17 | 3 |
| 2018-03-24 | 4 |
| 2018-03-31 | 6 |
+------------+---------+
Desired results:
+----------------+---------+
| FirstDayOfMonth| MyValue |
+----------------+---------+
| 2018-01-01 | 2 |
| 2018-02-01 | 3 |
| 2018-03-01 | 7 |
+----------------+---------+
I thought this might work, but it ain't.
select
[product],
datefromparts(year([MyDate]), month([MyDate]), 1),
FIRST_VALUE(MyValue) OVER (PARTITION BY [Product], YEAR([MyDate]), MONTH([MyDate]) ORDER BY [MyDate] ASC) AS MyValue
from
MyTable
group by
[Product],
YEAR([MyDate]), MONTH([MyDate])
Edit. Thank you. The accent in my question is not how to get the first day of the month. I know that there are different techniques for that.
The accent is how to get the FIRST value in month (the opening stock). If there is a chance to get the closing stock in one shot - it would be great. The answers based on ROW_NUMBER do not allow to get closing stock in one shot, would require two joins.
Edit after accepting answer
Please consider John Cappelletti's answer as an alternative to the accepted one: https://stackoverflow.com/a/53559750/1903793
You don't really need the GROUP BY if you have chosen the window function route:
SELECT Product, DATEADD(DAY, 1, EOMONTH(MyDate, -1)) AS Month, MyValue
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Product, DATEADD(DAY, 1, EOMONTH(MyDate, -1)) ORDER BY MyDate) AS rn
FROM t
) AS x
WHERE rn = 1
UPDATE
To get the last row for the month just do a UNION ALL <above query> but change the order by clause to ORDER BY MyDate DESC. This will give you two rows per product-month.
You can use apply & eomonth to find the last day of month & add one day :
select distinct dateadd(day, 1, eomonth(t1.mydate, -1)) as FistDayOfMonth, t1.myvalue
from table t cross apply
( select top (1) t1.mydate, t1.myvalue
from table t1
where t1.product = t.product and
year(t1.MyDate) = year(t.MyDate) and month(t1.MyDate) = month(t.MyDate)
order by t1.mydate
) t1;
Could also use a rowNumber and cte.
DEMO
WITH CTE as (
SELECT '2018-01-06' myDate, 2 Myvalue UNION ALL
SELECT '2018-01-13', 7 UNION ALL
SELECT '2018-01-20', 5 UNION ALL
SELECT '2018-01-27', 2 UNION ALL
SELECT '2018-02-03', 3 UNION ALL
SELECT '2018-02-10', 10 UNION ALL
SELECT '2018-02-17', 6 UNION ALL
SELECT '2018-02-24', 4 UNION ALL
SELECT '2018-03-03', 7 UNION ALL
SELECT '2018-03-10', 5 UNION ALL
SELECT '2018-03-17', 3 UNION ALL
SELECT '2018-03-24', 4 UNION ALL
SELECT '2018-03-31', 6),
CTE2 as (SELECT *
, Row_Number() over (partition by DATEADD(month, DATEDIFF(month, 0, MyDate), 0) order by myDate) RN
FROM CTE)
SELECT DATEADD(month, DATEDIFF(month, 0, MyDate), 0), MyValue
FROM cte2
WHERE RN = 1
Giving us:
+----+---------------------+---------+
| | (No column name) | MyValue |
+----+---------------------+---------+
| 1 | 01.01.2018 00:00:00 | 2 |
| 2 | 01.02.2018 00:00:00 | 3 |
| 3 | 01.03.2018 00:00:00 | 7 |
+----+---------------------+---------+
Just another option is using the WITH TIES, and then a little cheat for the date
Example
Select top 1 with ties
MyDate = convert(varchar(7),MyDate,120)+'-01'
,MyValue
from YourTable
Order By Row_Number() over (Partition By convert(varchar(7),MyDate,120) Order By MyDate)
Returns
MyDate MyValue
2018-01-01 2
2018-02-01 3
2018-03-01 7
Related
I am new to SQL and trying to write a statement similar to a 'for loop' in other languages and am stuck. I want to filter out rows of the table where for all of attribute 1, attribute2=attribute3 without using functions.
For example:
| Year | Month | Day|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 4 | 4 |
| 2 | 3 | 4 |
| 2 | 3 | 3 |
| 2 | 4 | 4 |
| 3 | 4 | 4 |
| 3 | 4 | 4 |
| 3 | 4 | 4 |
I would only want the row
| Year | Month | Day|
|:---- |:------:| -----:|
| 3 | 4 | 4 |
because it is the only where month and day are equal for all of the values of year they share.
So far I have
select year, month, day from dates
where month=day
but unsure how to apply the constraint for all of year
-- month/day need to appear in aggregate functions (since they are not in the GROUP BY clause),
-- but the HAVING clause ensure we only have 1 month/day value (per year) here, so MIN/AVG/SUM/... would all work too
SELECT year, MAX(month), MAX(day)
FROM my_table
GROUP BY year
HAVING COUNT(DISTINCT (month, day)) = 1;
year
max
max
3
4
4
View on DB Fiddle
So one way would be
select distinct [year], [month], [day]
from [Table] t
where [month]=[day]
and not exists (
select * from [Table] x
where t.[year]=x.[year] and t.[month] <> x.[month] and t.[day] <> x.[day]
)
And another way would be
select distinct [year], [month], [day] from (
select *,
Lead([month],1) over(partition by [year] order by [month])m2,
Lead([day],1) over(partition by [year] order by [day])d2
from [table]
)x
where [month]=m2 and [day]=d2
Hi have a table A with the following data:
+------+-------+----+--------+
| YEAR | MONTH | PA | AMOUNT |
+------+-------+----+--------+
| 2020 | 1 | N | 100 |
+------+-------+----+--------+
| 2020 | 2 | N | 100 |
+------+-------+----+--------+
| 2020 | 3 | O | 100 |
+------+-------+----+--------+
| 2020 | 4 | N | 100 |
+------+-------+----+--------+
| 2020 | 5 | N | 100 |
+------+-------+----+--------+
| 2020 | 6 | O | 100 |
+------+-------+----+--------+
I'd like to have the following result:
+---------+---------+--------+
| FROM | TO | AMOUNT |
+---------+---------+--------+
| 2020-01 | 2020-02 | 200 |
+---------+---------+--------+
| 2020-03 | 2020-03 | 100 |
+---------+---------+--------+
| 2020-04 | 2020-05 | 200 |
+---------+---------+--------+
| 2020-06 | 2020-06 | 100 |
+---------+---------+--------+
My DB is DB2/400.
I have tried with ROW_NUMBER partitioning, subqueries but I can't figure out how to solve this.
I understand this as a gaps-and-island problem, where you want to group together adjacent rows that have the same PA.
Here is an approach using the difference between row numbers to build the groups:
select min(year_month) year_month_start, max(year_month) year_month_end, sum(amount) amount
from (
select a.*, year * 100 + month year_month
row_number() over(order by year, month) rn1,
row_number() over(partition by pa order by year, month) rn2
from a
) a
group by rn1 - rn2
order by year_month_start
You can try the below -
select min(year)||'-'||min(month) as from_date,max(year)||'-'||max(month) as to_date,sum(amount) as amount from
(
select *,row_number() over(order by month)-
row_number() over(partition by pa order by month) as grprn
from t1
)A group by grprn,pa order by grprn
This works in tsql, guess you can adaot it to db2-400?
SELECT MIN(Dte) [From]
, MAX(Dte) [To]
-- ,PA
, SUM(Amount)
FROM (
SELECT year * 100 + month Dte
, Pa
, Amount
, ROW_NUMBER() OVER (PARTITION BY pa ORDER BY year * 100 + month) +
10000- (YEar*100+Month) rn
FROM tabA a
) b
GROUP BY Pa
, rn
ORDER BY [From]
, [To]
The trick is the row number function partitioned by PA and ordered by date, This'll count one up for each month for the, when added to the descnding count of month and month, you will get the same number for consecutive months with same PA. You the group by PA and the grouping yoou made, rn, to get the froups, and then Bob's your uncle.
I have below table, it shows user_id and ride_date.
+---------+------------+
| user_id | ride_date |
+---------+------------+
| 1 | 2019-11-01 |
| 1 | 2019-11-03 |
| 1 | 2019-11-05 |
| 2 | 2019-11-03 |
| 2 | 2019-11-04 |
| 2 | 2019-11-05 |
| 2 | 2019-11-06 |
| 3 | 2019-11-03 |
| 3 | 2019-11-04 |
| 3 | 2019-11-05 |
| 3 | 2019-11-06 |
| 4 | 2019-11-05 |
| 4 | 2019-11-07 |
| 4 | 2019-11-08 |
| 4 | 2019-11-09 |
| 5 | 2019-11-11 |
| 5 | 2019-11-13 |
+---------+------------+
I want user_id who took rides for 3 or more consecutive days along with days on which they took consecutive rides
The desired result is as below
+---------+-----------------------+
| user_id | consecutive_ride_date |
+---------+-----------------------+
| 2 | 2019-11-03 |
| 2 | 2019-11-04 |
| 2 | 2019-11-05 |
| 2 | 2019-11-06 |
| 3 | 2019-11-03 |
| 3 | 2019-11-04 |
| 3 | 2019-11-05 |
| 3 | 2019-11-06 |
| 4 | 2019-11-08 |
| 4 | 2019-11-09 |
| 4 | 2019-11-10 |
+---------+-----------------------+
SQL Fiddle
With LAG() and LEAD() window functions:
with cte as (
select *,
datediff(
day,
lag([ride_date]) over (partition by [user_id] order by [ride_date]),
[ride_date]
) prev1,
datediff(
day,
lag([ride_date], 2) over (partition by [user_id] order by [ride_date]),
[ride_date]
) prev2,
datediff(
day,
[ride_date],
lead([ride_date]) over (partition by [user_id] order by [ride_date])
) next1,
datediff(
day,
[ride_date],
lead([ride_date], 2) over (partition by [user_id] order by [ride_date])
) next2
from Table1
)
select [user_id], [ride_date]
from cte
where
(prev1 = 1 and prev2 = 2) or
(prev1 = 1 and next1 = 1) or
(next1 = 1 and next2 = 2)
See the demo.
Results:
> user_id | ride_date
> ------: | :---------
> 2 | 03/11/2019
> 2 | 04/11/2019
> 2 | 05/11/2019
> 2 | 06/11/2019
> 3 | 03/11/2019
> 3 | 04/11/2019
> 3 | 05/11/2019
> 3 | 06/11/2019
> 4 | 07/11/2019
> 4 | 08/11/2019
> 4 | 09/11/2019
Here is one way to adress this gaps-and-island problem:
first, assign a rank to each user ride with row_number(), and recover the previous ride_date (aliased lag_ride_date)
then, compare the date of the previous ride to the current one in a conditional sum, that increases when the dates are successive ; by comparing this with the rank of the user ride, you get groups (aliased grp) that represent consecutive rides with a 1 day spacing
do a window count how many records belong to each group (aliased cnt)
filter on records whose window count is greater than 3
Query:
select user_id, ride_date
from (
select
t.*,
count(*) over(partition by user_id, grp) cnt
from (
select
t.*,
rn1
- sum(case when ride_date = dateadd(day, 1, lag_ride_date) then 1 else 0 end)
over(partition by user_id order by ride_date) grp
from (
select
t.*,
row_number() over(partition by user_id order by ride_date) rn1,
lag(ride_date) over(partition by user_id order by ride_date) lag_ride_date
from Table1 t
) t
) t
) t
where cnt >= 3
Demo on DB Fiddle
This is a typical gaps and island problems.
We can solve it as follows
with data
as (
select user_id
,ride_date
,dateadd(day
,-row_number() over(partition by user_id order by ride_date asc)
,ride_date) as grp_field
from Table1
)
,consecutive_days
as(
select user_id
,ride_date
,count(*) over(partition by user_id,grp_field) as cnt
from data
)
select *
from consecutive_days
where cnt>=3
order by user_id,ride_date
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=7bb851d9a12966b54afb4d8b144f3d46
There is no need to apply gaps-and-islands methodologies to this problem. The problem is much simpler to solve.
You can return the users and first date just by using LEAD():
SELECT t1.*
FROM (SELECT t1.*,
LEAD(ride_date, 2) OVER (PARTITION BY user_id ORDER BY ride_date) as ride_date_2
FROM table1 t1
) t1
WHERE ride_date_2 = DATEADD(day, 2, ride_date);
If you want the actual dates, you can unpivot the results:
SELECT DISTINCT t1.user_id, v.ride_date
FROM (SELECT t1.*,
LEAD(ride_date, 2) OVER (PARTITION BY user_id ORDER BY ride_date) as ride_date_2
FROM table1 t1
) t1 CROSS APPLY
(VALUES (t1.ride_date),
(DATEADD(day, 1, t1.ride_date)),
(DATEADD(day, 2, t1.ride_date))
) v(ride_date)
WHERE t1.ride_date_2 = DATEADD(day, 2, t1.ride_date)
ORDER BY t1.user_id, v.ride_date;
I have a table called finance that I store all payment of the customer. The main columns are: ID,COSTUMERID,DATEPAID,AMOUNTPAID.
What I need is a list of dates by COSTUMERID with dates of its first payment and any other payment that is grater than 1 year of the last one. Example:
+----+------------+------------+------------+
| ID | COSTUMERID | DATEPAID | AMOUNTPAID |
+----+------------+------------+------------+
| 1 | 1 | 2015-01-10 | 10 |
| 2 | 1 | 2016-01-05 | 30 |
| 2 | 1 | 2017-02-20 | 30 |
| 3 | 2 | 2016-03-15 | 100 |
| 4 | 2 | 2017-02-15 | 100 |
| 5 | 3 | 2017-05-01 | 25 |
+----+------------+------------+------------+
What I expect as result:
+------------+------------+
| COSTUMERID | DATEPAID |
+------------+------------+
| 1 | 2015-01-01 |
| 1 | 2017-02-20 |
| 2 | 2016-03-15 |
| 3 | 2017-05-01 |
+------------+------------+
Costumer 1 have 2 dates: the first one + one more that have more then 1 year after the last one.
I hope I make my self clear.
I think you just want lag():
select t.*
from (select t.*,
lag(datepaid) over (partition by customerid order by datepaid) as prev_datepaid
from t
) t
where prev_datepaid is null or
datepaid > dateadd(year, 1, prev_datepaid);
Gordon's solution is correct, as long as you are only looking at the previous row (previous payment) diff, but I wonder if Antonio is looking for payments greater than one year from the last 1 year payment, in which case this becomes a more complex problem to solve. Take the following example:
CREATE TABLE #Test (
CustomerID smallint
,DatePaid date
,AmountPaid smallint )
INSERT INTO #Test
SELECT 1, '2015-1-10', 10
INSERT INTO #Test
SELECT 1, '2016-1-05', 30
INSERT INTO #Test
SELECT 1, '2017-2-20', 30
INSERT INTO #Test
SELECT 1, '2017-6-30', 50
INSERT INTO #Test
SELECT 1, '2018-3-5', 50
INSERT INTO #Test
SELECT 1, '2018-5-15', 50
INSERT INTO #Test
SELECT 2, '2016-3-15', 100
INSERT INTO #Test
SELECT 2, '2017-6-15', 100
WITH CTE AS (
SELECT
CustomerID
,DatePaid
,LAG(DatePaid) OVER (PARTITION BY CustomerID ORDER BY DatePaid) AS PreviousPaidDate
,AmountPaid
FROM #Test )
SELECT
*
,-DATEDIFF(DAY, DatePaid, PreviousPaidDate) AS DayDiff
,CASE WHEN DATEDIFF(DAY, PreviousPaidDate, DatePaid) >= 365 THEN 1 ELSE 0 END AS Paid
FROM CTE
Row number 5 is > 1 year from the last 1 year payment, but subtracting from previous row doesn't address this. This may or may not matter but I wanted to point it out in case that is what he means.
I am not sure if this is possible.... But can I pull data for 01-01-16 - 12/31/16 only if there was no data before then. So if they made a purchase before 01-01-16 I would not want to see any data. Bascially I am trying to find new customers for 2016. So in the example below John would NOT pull any data.
Name | Item # | Qty | Purch Date |
-----------+---------+-----+------------+
John | 12 | 7 | 2016-01-05 |
John | 22 | 14 | 2011-01-06 |
John | 11 | 9 | 2013-02-05 |
In this example it would pull all 3.
Name | Item # | Qty | Purch Date |
-----------+---------+-----+------------+
John | 12 | 7 | 2016-01-05 |
John | 22 | 14 | 2016-01-06 |
John | 11 | 9 | 2016-02-05 |
Standard SQL solution based on a Scalar Subquery:
select * from tab as t1
where purch_date between '2016-01-01' and '2016-12-31' -- purchase in 2016
and not exists -- but no purchase by the same customer in a previous year
( select * from tab as t2
where t1.name = t2.name
and t2.purch_date < '2016-01-01'
)
Use FIRST_VALUE function to get the first purchase date for each name and check if it is in the expected range of dates.
select name,item,qty,purch_date
from (
select t.*,
first_value(purch_date) over(partition by name order by purch_date) as first_purchase_date
from tablename t
) x
where first_purchase_date>='2016-01-01' and first_purchase_date<='2016-12-31'
--add a filter condition for purch_date if needed
I would be inclined to use min() as a window function:
select t.*
from (select t.*, min(purch_date) over (partition by name) as min_purch_date
from t
) t
where min_purchase_date >= '2016-01-01' and min_purchase_date < '2017-01-01';
If you just want new customers in 2016, but don't need the details:
select t.name
from t
group by t.name
having min(purchase_date) >= '2016-01-01' and min(purchase_date) < '2017-01-01';