SQL Server: how to do 3 months' partition? - sql

I want to count the number of rows in a 0-3 month window, that is, the current month plus the three months before it. Months are stored in MYMONTH in YYYYMM format, for example 201601 for January 2016. I am using SQL Server 2014. How can I do the partition over 3 months?
SELECT COUNT(*),
COUNT(*)
/
(COUNT(*) OVER (PARTITION
BY MYMONTH RANGE BETWEEN 3 MONTH PRECEDING AND CURRENT MONTH))
FROM myData
Sample
| Month | Value | ID |
-------------------------|
| 201601 | 1 | X |
| 201601 | 1 | Y |
| 201601 | 1 | Y |
| 201602 | 1 | Z |
| 201603 | 1 | A |
| 201604 | 1 | B |
| 201605 | 1 | C |
| 201607 | 1 | E |
| 201607 | 10 | EE |
| 201607 | 100 | EEE|
Counts
| Month | Count | Count3M | Count/Count3M |
-------------------------------------------
| 201601| 3 | 3 | 3/3 |
| 201602| 1 | 4 | 1/4 |
| 201603| 1 | 5 | 1/5 |
| 201604| 1 | 6 | 1/6 |
| 201605| 1 | 4 | 1/4 |
| 201607| 3 | 5 | 3/5 |

You can try this (MSSQL 2012):
Sample data
CREATE TABLE mytable(
MONT INTEGER NOT NULL
,Value INTEGER NOT NULL
,ID VARCHAR(5) NOT NULL
);
INSERT INTO mytable(MONT,Value,ID) VALUES (201601,1,'X');
INSERT INTO mytable(MONT,Value,ID) VALUES (201601,1,'Y');
INSERT INTO mytable(MONT,Value,ID) VALUES (201601,1,'Y');
INSERT INTO mytable(MONT,Value,ID) VALUES (201602,1,'Z');
INSERT INTO mytable(MONT,Value,ID) VALUES (201603,1,'A');
INSERT INTO mytable(MONT,Value,ID) VALUES (201604,1,'B');
INSERT INTO mytable(MONT,Value,ID) VALUES (201605,1,'C');
INSERT INTO mytable(MONT,Value,ID) VALUES (201607,1,'E');
INSERT INTO mytable(MONT,Value,ID) VALUES (201607,10,'EE');
INSERT INTO mytable(MONT,Value,ID) VALUES (201607,100,'EEE');
Query 1
SELECT MONT, RC,
RC + LAG(RC,3,0) OVER (ORDER BY MONT)
+ LAG(RC,2,0) OVER (ORDER BY MONT)
+ LAG(RC,1,0) OVER (ORDER BY MONT) AS RC_3M_PREC
FROM (SELECT MONT
, COUNT(*) RC
FROM mytable
GROUP BY MONT
) A
Output:
MONT RC RC_3M_PREC
----------- ----------- -----------
201601 3 3
201602 1 4
201603 1 5
201604 1 6
201605 1 4
201607 3 6
Or using what you proposed (option ROWS ... PRECEDING):
Query 2:
SELECT MONT, RC
, COALESCE(SUM(RC) OVER (ORDER BY MONT ROWS BETWEEN 3 PRECEDING AND CURRENT ROW),0) AS RC_3M
FROM (SELECT MONT
, COUNT(*) RC
FROM mytable
GROUP BY MONT
) A
Output:
MONT RC RC_3M
----------- ----------- -----------
201601 3 3
201602 1 4
201603 1 5
201604 1 6
201605 1 4
201607 3 6
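Note that both queries above look back three rows of the grouped result, not three calendar months. That is why 201607 shows 6 while the question expects 5: 201606 has no rows, so the window reaches back to 201603. The following is a minimal Python sketch of the calendar-based window, only to check the expected numbers. The helper names month_index and rolling_counts are made up for this illustration and are not part of any answer.

```python
from collections import Counter

def month_index(yyyymm: int) -> int:
    """Map a YYYYMM integer (e.g. 201601) to a running month number."""
    return (yyyymm // 100) * 12 + (yyyymm % 100) - 1

def rolling_counts(months, lookback=3):
    """Return (month, count, count over the month and `lookback` prior calendar months)."""
    per_month = Counter(months)
    rows = []
    for m in sorted(per_month):
        window_total = sum(
            count
            for other, count in per_month.items()
            if 0 <= month_index(m) - month_index(other) <= lookback
        )
        rows.append((m, per_month[m], window_total))
    return rows

# MYMONTH values from the sample table in the question
sample = [201601, 201601, 201601, 201602, 201603,
          201604, 201605, 201607, 201607, 201607]
result = rolling_counts(sample)
for row in result:
    print(row)
```

Because the window is defined on the month number rather than on row position, a missing month simply contributes zero. In SQL the same effect would need either a calendar table with one row per month or a self-join on the month difference.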

If you want to count rows in the previous three months, just use conditional aggregation. You do need a way to enumerate the months:
SELECT COUNT(*),
SUM(CASE WHEN yyyymm_counter <= 3 THEN 1 ELSE 0 END)
FROM (SELECT md.*,
DENSE_RANK() OVER (ORDER BY MYMONTH DESC) as yyyymm_counter
FROM myData md
) md;
Another way without the subquery converts the month value to an actual date. Let me assume that it is a string:
SELECT COUNT(*),
SUM( CASE WHEN DATEDIFF(month, CAST(MYMONTH + '01' as DATE), GETDATE()) <= 3
THEN 1 ELSE 0
END)
FROM MyData;
I've left the / out of the answer. You need to be aware that SQL Server does integer division, so you may not get the results you want unless you convert the values to a non-integer type (I would suggest multiplying by 1.0 or using 1.0 instead of 1 in the queries).

Related

SQL Query to apply a command to multiple rows

I am new to SQL and trying to write a statement similar to a 'for loop' in other languages and am stuck. I want to filter out rows of the table where for all of attribute 1, attribute2=attribute3 without using functions.
For example:
| Year | Month | Day|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 4 | 4 |
| 2 | 3 | 4 |
| 2 | 3 | 3 |
| 2 | 4 | 4 |
| 3 | 4 | 4 |
| 3 | 4 | 4 |
| 3 | 4 | 4 |
I would only want the row
| Year | Month | Day|
|:---- |:------:| -----:|
| 3 | 4 | 4 |
because it is the only one where month and day are equal for all of the values of year they share.
So far I have
select year, month, day from dates
where month=day
but I am unsure how to apply the constraint across all rows of a year.
-- month/day need to appear in aggregate functions (since they are not in the GROUP BY clause),
-- but the HAVING clause ensure we only have 1 month/day value (per year) here, so MIN/AVG/SUM/... would all work too
SELECT year, MAX(month), MAX(day)
FROM my_table
GROUP BY year
HAVING COUNT(DISTINCT (month, day)) = 1;
| year | max | max |
| 3 | 4 | 4 |
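The row-value form COUNT(DISTINCT (month, day)) used above is PostgreSQL syntax and is not accepted by SQL Server. A portable variant, sketched here under that assumption, compares MIN and MAX per year instead. The example runs the query in an in-memory SQLite database from Python purely to check the logic. The table name dates comes from the question, and the extra MIN(month) = MIN(day) condition is an addition that enforces the month = day requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (year INTEGER, month INTEGER, day INTEGER)")
conn.executemany(
    "INSERT INTO dates VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 2), (1, 4, 4),
     (2, 3, 4), (2, 3, 3), (2, 4, 4),
     (3, 4, 4), (3, 4, 4), (3, 4, 4)],
)

# A year qualifies when it has a single month value, a single day value,
# and those two values are equal.
query = """
SELECT year, MIN(month) AS month, MIN(day) AS day
FROM dates
GROUP BY year
HAVING MIN(month) = MAX(month)
   AND MIN(day) = MAX(day)
   AND MIN(month) = MIN(day)
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only plain GROUP BY and HAVING are used, so the same statement should carry over to SQL Server unchanged apart from bracket-quoting the column names if desired.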
So one way would be
select distinct [year], [month], [day]
from [Table] t
where [month]=[day]
and not exists (
select * from [Table] x
where t.[year]=x.[year] and (t.[month] <> x.[month] or t.[day] <> x.[day])
)
And another way would be
select distinct [year], [month], [day] from (
select *,
Lead([month],1) over(partition by [year] order by [month])m2,
Lead([day],1) over(partition by [year] order by [day])d2
from [table]
)x
where [month]=m2 and [day]=d2

Identify two rows with 1 year or more of difference

I have a table called finance in which I store all customer payments. The main columns are: ID,COSTUMERID,DATEPAID,AMOUNTPAID.
What I need is a list of dates by COSTUMERID containing the first payment and any other payment made more than 1 year after the last one. Example:
+----+------------+------------+------------+
| ID | COSTUMERID | DATEPAID | AMOUNTPAID |
+----+------------+------------+------------+
| 1 | 1 | 2015-01-10 | 10 |
| 2 | 1 | 2016-01-05 | 30 |
| 2 | 1 | 2017-02-20 | 30 |
| 3 | 2 | 2016-03-15 | 100 |
| 4 | 2 | 2017-02-15 | 100 |
| 5 | 3 | 2017-05-01 | 25 |
+----+------------+------------+------------+
What I expect as result:
+------------+------------+
| COSTUMERID | DATEPAID |
+------------+------------+
| 1 | 2015-01-10 |
| 1 | 2017-02-20 |
| 2 | 2016-03-15 |
| 3 | 2017-05-01 |
+------------+------------+
Customer 1 has 2 dates: the first one plus one more that is more than 1 year after the last one.
I hope I have made myself clear.
I think you just want lag():
select t.*
from (select t.*,
lag(datepaid) over (partition by customerid order by datepaid) as prev_datepaid
from t
) t
where prev_datepaid is null or
datepaid > dateadd(year, 1, prev_datepaid);
Gordon's solution is correct, as long as you are only looking at the previous row (previous payment) diff, but I wonder if Antonio is looking for payments greater than one year from the last 1 year payment, in which case this becomes a more complex problem to solve. Take the following example:
CREATE TABLE #Test (
CustomerID smallint
,DatePaid date
,AmountPaid smallint )
INSERT INTO #Test
SELECT 1, '2015-1-10', 10
INSERT INTO #Test
SELECT 1, '2016-1-05', 30
INSERT INTO #Test
SELECT 1, '2017-2-20', 30
INSERT INTO #Test
SELECT 1, '2017-6-30', 50
INSERT INTO #Test
SELECT 1, '2018-3-5', 50
INSERT INTO #Test
SELECT 1, '2018-5-15', 50
INSERT INTO #Test
SELECT 2, '2016-3-15', 100
INSERT INTO #Test
SELECT 2, '2017-6-15', 100;
WITH CTE AS (
SELECT
CustomerID
,DatePaid
,LAG(DatePaid) OVER (PARTITION BY CustomerID ORDER BY DatePaid) AS PreviousPaidDate
,AmountPaid
FROM #Test )
SELECT
*
,DATEDIFF(DAY, PreviousPaidDate, DatePaid) AS DayDiff
,CASE WHEN DATEDIFF(DAY, PreviousPaidDate, DatePaid) >= 365 THEN 1 ELSE 0 END AS Paid
FROM CTE
Row number 5 (2018-03-05) is more than 1 year after the last payment that itself qualified (2017-02-20), but subtracting from the previous row doesn't address this. This may or may not matter, but I wanted to point it out in case that is what he means.
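To make the second interpretation concrete, here is a minimal Python sketch that compares each payment with the last payment that was kept rather than with the previous row. It assumes "one year" means 365 days, matching the DATEDIFF(DAY, ...) >= 365 test above. The function name anchored_payments is hypothetical.

```python
from datetime import date, timedelta

def anchored_payments(payments, gap=timedelta(days=365)):
    """Keep each customer's first payment, then every payment that falls
    at least `gap` after the most recently kept payment."""
    kept = {}
    for customer, paid in sorted(payments):  # by customer, then date
        dates = kept.setdefault(customer, [])
        if not dates or paid - dates[-1] >= gap:
            dates.append(paid)
    return kept

# Same rows as the #Test table above: (CustomerID, DatePaid)
payments = [
    (1, date(2015, 1, 10)), (1, date(2016, 1, 5)), (1, date(2017, 2, 20)),
    (1, date(2017, 6, 30)), (1, date(2018, 3, 5)), (1, date(2018, 5, 15)),
    (2, date(2016, 3, 15)), (2, date(2017, 6, 15)),
]
kept = anchored_payments(payments)
for customer, dates in kept.items():
    print(customer, [d.isoformat() for d in dates])
```

For customer 1 this keeps 2015-01-10, 2017-02-20 and 2018-03-05, whereas the previous-row comparison keeps only the first two. In SQL this running dependency cannot be expressed with LAG alone; it would typically need a recursive CTE or a loop.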

Grouping by multiple ranges in SQL Server

I tried to search for a solution, but with no success.
How can I group my table from looking like this:
from | to | zone
1 | 1 | 1
1 | 2 | 1
1 | 3 | 1
1 | 4 | 2
1 | 5 | 2
1 | 6 | 2
1 | 7 | 1
1 | 8 | 1
1 | 9 | 1
1 | 10 | 9
2 | 1 | 7
2 | 2 | 7
2 | 3 | 7
2 | 4 | 2
2 | 5 | 2
2 | 6 | 2
2 | 7 | 7
2 | 8 | 7
2 | 9 | 7
To look like this :
from | to | zone
1 | 1-3 | 1
1 | 4-6 | 2
1 | 7-9 | 1
1 | 10 | 9
2 | 1-3 | 7
2 | 4-6 | 2
2 | 7-9 | 7
Thank you for your help
One approach here is to use the difference of row numbers method, using the to column as one row number, and a row number over a partition by from and zone as the other row number. It is a bit difficult to explain why this works in so many words. It might be best to view the demo link below to explore the query.
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [from], zone ORDER BY [to]) rn
FROM yourTable
)
SELECT
t.[from],
CONVERT(varchar(10), MIN(t.[to])) + '-' + CONVERT(varchar(10), MAX([to])) AS [to],
t.zone
FROM cte t
GROUP BY
t.[from],
t.zone,
t.[to] - t.rn
ORDER BY
t.[from],
MIN(t.[to]);
Demo here:
Rextester
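As a rough illustration of why the difference of row numbers works, the Python sketch below replays the same steps on the sample data: number the rows per (from, zone) in to order, then group by to minus that number. Inside one unbroken run both values increase by 1 per row, so the difference stays constant; after a gap the difference jumps. The function name islands is made up for this example.

```python
from collections import defaultdict

def islands(rows):
    """rows: iterable of (from, to, zone). Returns (from, min_to, max_to, zone) per run."""
    rn = defaultdict(int)        # ROW_NUMBER() per (from, zone), ordered by to
    groups = defaultdict(list)   # key: (from, zone, to - rn)
    for frm, to, zone in sorted(rows):
        rn[(frm, zone)] += 1
        groups[(frm, zone, to - rn[(frm, zone)])].append(to)
    return sorted(
        (frm, min(tos), max(tos), zone)
        for (frm, zone, _), tos in groups.items()
    )

# Sample table from the question
rows = (
    [(1, t, 1) for t in (1, 2, 3)] + [(1, t, 2) for t in (4, 5, 6)]
    + [(1, t, 1) for t in (7, 8, 9)] + [(1, 10, 9)]
    + [(2, t, 7) for t in (1, 2, 3)] + [(2, t, 2) for t in (4, 5, 6)]
    + [(2, t, 7) for t in (7, 8, 9)]
)
result = islands(rows)
for row in result:
    print(row)
```

For from = 1 and zone = 1 the differences are 0, 0, 0 for to = 1..3 and 3, 3, 3 for to = 7..9, which is what splits the two runs apart.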
This is generally known as a Gaps and Islands problem. If you are using SQL Server 2012+, then:
;WITH cte
AS (SELECT *,
Sum(CASE WHEN zone = prev_zone THEN 0 ELSE 1 END)OVER(partition BY [from] ORDER BY [to]) AS grp
FROM (SELECT *,
Lag(zone)OVER(partition BY [from] ORDER BY [to]) AS prev_zone
FROM yourtable) a)
SELECT [from],
[to] = Concat(Min([to]), '-', Max([to])),
zone = Min(zone)
FROM cte
GROUP BY [from],grp
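;with mycte
AS
(
select
[from]
,min([to]) minto
,max([to]) maxto
,[zone]
from
mytable
group by
[from]
,[zone]
)
-- Note: grouping by [from] and [zone] alone merges non-adjacent runs of the same zone,
-- e.g. 1-3 and 7-9 for from = 1, zone = 1 are reported as 1-9
select
[from] AS [from]
,concat(minto, '-', maxto) AS [to]
,[zone] AS [zone]
from
mycte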

Update a column and refer back it in the same query

I have a table in SQL Server 2014 and need to recursively update a column based on its previous value. For example:
---------------------------------------
ID | price | diff_with_prev_price |
---------------------------------------
1 | 29 | 0 |
2 | 25 | 0 |
3 | 20 | 0 |
4 | 35 | 0 |
5 | 40 | 0 |
--------------------------------------|
I want to recursively update third column like below
---------------------------------------
ID | price | diff_with_prev_price |
---------------------------------------
1 | 29 | 0 |
2 | 25 | 25 |
3 | 20 | 5 |
4 | 35 | -30 |
5 | 40 | 10 |
--------------------------------------|
It is the sum of the previous value of the third column and the next value of 'price'.
Can someone please give a hint on how to do this using either a CTE or LEAD/LAG, but without cursors? I have to update a million rows.
You can try this:
SELECT 1 AS ID , 29 AS price, 0 AS diff_with_prev_price
INTO #tmp
UNION SELECT 2 AS ID , 25 AS price, 0 AS diff_with_prev_price
UNION SELECT 3 AS ID , 20 AS price, 0 AS diff_with_prev_price
UNION SELECT 4 AS ID , 35 AS price, 0 AS diff_with_prev_price
UNION SELECT 5 AS ID , 40 AS price, 0 AS diff_with_prev_price;
-- The statement before a CTE must be terminated with a semicolon
WITH cte AS
(
SELECT
ID
, price
, diff_with_prev_price
, price - ISNULL(LAG(price) OVER (ORDER BY ID),0) AS new_value
FROM #tmp
)
UPDATE t
SET diff_with_prev_price = t.new_value
FROM cte t;
SELECT * FROM #tmp;
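For reference, the expression price - ISNULL(LAG(price) OVER (ORDER BY ID), 0) sets each row to its price minus the previous row's price, with 0 standing in for the missing predecessor of the first row. The small Python sketch below mirrors that calculation on the sample rows. The function name diff_with_prev is invented for the illustration.

```python
def diff_with_prev(rows):
    """rows: iterable of (id, price). Returns (id, price, price - previous price),
    treating the missing previous price of the first row as 0."""
    out = []
    prev_price = 0
    for id_, price in sorted(rows):  # ORDER BY ID
        out.append((id_, price, price - prev_price))
        prev_price = price
    return out

rows = [(1, 29), (2, 25), (3, 20), (4, 35), (5, 40)]
updated = diff_with_prev(rows)
for row in updated:
    print(row)
```

These values (29, -4, -5, 15, 5) are a plain row-to-row difference and are not the same as the expected table in the question, whose rule depends on the previously computed value of the third column.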

Get list of counts by date

I have two tables. One containing a list of applications. And another one containing counts associated to them every week. Now I want to get as a result the app name and the count for this week and the previous. Let me explain this.
app:
+----+-------------+
| id | name |
+----+-------------+
| 1 | Office 2007 |
+----+-------------+
| 2 | Office 2010 |
+----+-------------+
| 3 | Office 2013 |
+----+-------------+
count:
+----+--------+-------+------------+
| id | app_id | count | date |
+----+--------+-------+------------+
| 1 | 1 | 200 | 2016-01-11 |
+----+--------+-------+------------+
| 2 | 2 | 500 | 2016-01-11 |
+----+--------+-------+------------+
| 3 | 3 | 750 | 2016-01-11 |
+----+--------+-------+------------+
| 4 | 1 | 180 | 2016-01-18 |
+----+--------+-------+------------+
| 5 | 2 | 378 | 2016-01-18 |
+----+--------+-------+------------+
| 6 | 3 | 1000 | 2016-01-18 |
+----+--------+-------+------------+
And this is the result I need. I need all the applications with the count of this week and the previous:
+-------------+-----------------+-----------------+
| app | count_this_week | count_prev_week |
+-------------+-----------------+-----------------+
| Office 2007 | 180 | 200 |
+-------------+-----------------+-----------------+
| Office 2010 | 378 | 500 |
+-------------+-----------------+-----------------+
| Office 2013 | 1000 | 750 |
+-------------+-----------------+-----------------+
A script runs every week which fills the count table. And now I need to get a report also on a weekly basis.
Honestly I'm a bit lost as I don't know how to declare the conditions for the columns.
You can try grouping first by DATEPART(WEEK,C.date) and name, and then splitting the counts into two columns with another GROUP BY (see Query below).
EDIT
If there is exactly one record per week per app, a single GROUP BY is enough:
SELECT
appname,
SUM(CASE WHEN weekno = 0 THEN sumcount ELSE 0 END) as thisweek,
SUM(CASE WHEN weekno = 1 THEN sumcount ELSE 0 END) as lastweek
FROM
(
SELECT
DATEPART(WEEK,CURRENT_TIMESTAMP) - DATEPART(WEEK,C.date) as weekno,
name as appname,
count as sumcount
FROM App A
INNER JOIN CountTable C ON A.[id] = C.[app_id]
WHERE DATEPART(WEEK,C.date) BETWEEN DATEPART(WEEK,CURRENT_TIMESTAMP) - 1 AND DATEPART(WEEK,CURRENT_TIMESTAMP)
)T
GROUP BY appname
Query
SELECT
appname,
SUM(CASE WHEN weekno = 0 THEN sumcount ELSE 0 END) as thisweek,
SUM(CASE WHEN weekno = 1 THEN sumcount ELSE 0 END) as lastweek
FROM
(
SELECT
DATEPART(WEEK,CURRENT_TIMESTAMP) - DATEPART(WEEK,C.date) as weekno,
name as appname,
SUM(count) as sumcount
FROM App A INNER JOIN CountTable C ON A.[id] = C.[app_id]
WHERE DATEPART(WEEK,C.date) BETWEEN DATEPART(WEEK,CURRENT_TIMESTAMP) - 1 AND DATEPART(WEEK,CURRENT_TIMESTAMP)
GROUP BY DATEPART(WEEK,C.date),name
) AS T
GROUP BY appname
Output
| appname | thisweek | lastweek |
|-------------|----------|----------|
| Office 2007 | 180 | 200 |
| Office 2010 | 378 | 500 |
| Office 2013 | 1000 | 750 |
You can use this generic query with a variable for the current week day:
DECLARE @week date = '2016-01-18';
WITH data AS (
SELECT a.name, c.[count]
, w = CASE WHEN c.[date] = @week THEN 0 ELSE 1 END
FROM @Counts c
INNER JOIN @Apps a ON c.app_id = a.id
WHERE [date] = @week OR [date] = DATEADD(day, -7, @week)
)
SELECT App = name, count_this_week = [0], count_prev_week = [1]
FROM data d
PIVOT (
MAX([count])
FOR w IN ([0], [1])
) p
Output:
App count_this_week count_prev_week
Office 2007 180 200
Office 2010 378 500
Office 2013 1000 750
Your data:
DECLARE @Apps TABLE ([id] int, [name] varchar(11));
DECLARE @Counts TABLE([id] int, [app_id] int, [count] int, [date] date);
INSERT INTO @Apps([id], [name])
VALUES
(1, 'Office 2007'),
(2, 'Office 2010'),
(3, 'Office 2013')
;
INSERT INTO @Counts([id], [app_id], [count], [date])
VALUES
(1, 1, 200, '2016-01-11'),
(2, 2, 500, '2016-01-11'),
(3, 3, 750, '2016-01-11'),
(4, 1, 180, '2016-01-18'),
(5, 2, 378, '2016-01-18'),
(6, 3, 1000, '2016-01-18')
;
SELECT *
FROM count
JOIN app ON app.id=count.app_id
WHERE date BETWEEN '2016-01-11' AND '2016-01-18'
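The join above returns one row per app and week. To get the two weekly counts side by side, the conditional aggregation idea from the earlier answers can be applied to exact dates. The sketch below checks this against the sample data using an in-memory SQLite database from Python as a stand-in for SQL Server. The table and column names app_count, cnt and counted_on are renamed for the example and are assumptions, not the asker's schema.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE app_count (id INTEGER PRIMARY KEY, app_id INTEGER,
                        cnt INTEGER, counted_on TEXT);
INSERT INTO app VALUES (1, 'Office 2007'), (2, 'Office 2010'), (3, 'Office 2013');
INSERT INTO app_count VALUES
  (1, 1, 200, '2016-01-11'), (2, 2, 500, '2016-01-11'), (3, 3, 750, '2016-01-11'),
  (4, 1, 180, '2016-01-18'), (5, 2, 378, '2016-01-18'), (6, 3, 1000, '2016-01-18');
""")

this_week = date(2016, 1, 18)
prev_week = this_week - timedelta(days=7)

# One output row per app; each CASE picks the count for one of the two dates.
query = """
SELECT a.name,
       MAX(CASE WHEN c.counted_on = :this_week THEN c.cnt END) AS count_this_week,
       MAX(CASE WHEN c.counted_on = :prev_week THEN c.cnt END) AS count_prev_week
FROM app a
JOIN app_count c ON c.app_id = a.id
WHERE c.counted_on IN (:this_week, :prev_week)
GROUP BY a.name
ORDER BY a.name
"""
params = {"this_week": this_week.isoformat(), "prev_week": prev_week.isoformat()}
report = conn.execute(query, params).fetchall()
for row in report:
    print(row)
```

Comparing on the exact run dates avoids the week-number arithmetic, which can misbehave around a year boundary, but it assumes the weekly script always runs exactly seven days apart.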