I have a SQL table of Customer_ID, showing Payments by Year. The first (of many) customer appears like this:
ID Payment Year
112 0 2004
112 0 2005
112 0 2006
112 9592 2007
112 12332 2008
112 9234 2011
112 5400 2012
112 7392 2014
112 8321 2015
Note that some years are missing. I need to create 10 new columns, showing the Payments in the previous 10 years, for each row. The resulting table should look like this:
ID Payment Year T-1 T-2 T-3 T-4 T-5 T-6 T-7 T-8 T-9 T-10
112 0 2004 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
112 0 2005 0 NULL NULL NULL NULL NULL NULL NULL NULL NULL
112 0 2006 0 0 NULL NULL NULL NULL NULL NULL NULL NULL
112 9592 2007 0 0 0 NULL NULL NULL NULL NULL NULL NULL
112 12332 2008 9592 0 0 0 NULL NULL NULL NULL NULL NULL
112 9234 2011 NULL NULL 12332 9592 0 0 0 NULL NULL NULL
112 5400 2012 9234 NULL NULL 12332 9592 0 0 0 NULL NULL
112 7392 2014 NULL 5400 9234 NULL NULL 12332 9592 0 0 0
112 8321 2015 7392 NULL 5400 9234 NULL NULL 12332 9592 0 0
I am well aware that this is a large duplication of data, and so seems like a strange thing to do. However, I would still like to do it! (the data is being prepared for a predictive model, in which previous payments (and other info) will be used to predict the current year's payment)
I'm not really sure where to start with this. I have been looking at using pivot, but can't figure out how to get it to select values from a customer's previous year.
I would very much like to do this in SQL. If that is not possible I may be able to copy the table into R - but SQL is my preference.
Any help much appreciated.
You could use lag() if you had full data:
select t.*,
lag(payment, 1) over (partition by id order by year) as t_1,
lag(payment, 2) over (partition by id order by year) as t_2,
. . .
from t;
However, for your situation with missing intermediate years, left join may be simpler:
select t.*,
t1.payment as t_1,
t2.payment as t_2,
. . .
from t left join
t t1
on t1.id = t.id and
t1.year = t.year - 1 left join
t t2
on t2.id = t.id and
t2.year = t.year - 2 left join
. . .;
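A minimal sketch of the left-join approach above, run against the sample rows from the question. It uses Python's sqlite3 module rather than SQL Server purely so it can be run anywhere; the table name t, the column names and the MAX_LAG constant are assumptions for illustration. It also assumes (id, year) is unique, otherwise the joins would multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, payment INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(112, 0, 2004), (112, 0, 2005), (112, 0, 2006), (112, 9592, 2007),
     (112, 12332, 2008), (112, 9234, 2011), (112, 5400, 2012),
     (112, 7392, 2014), (112, 8321, 2015)],
)

MAX_LAG = 3  # hypothetical setting; the question asks for 10

# Build one output column and one self-join per lag, so nothing is typed by hand.
select_cols = ", ".join(f"t{k}.payment AS t_{k}" for k in range(1, MAX_LAG + 1))
joins = " ".join(
    f"LEFT JOIN t AS t{k} ON t{k}.id = t.id AND t{k}.year = t.year - {k}"
    for k in range(1, MAX_LAG + 1)
)
query = (
    f"SELECT t.id, t.payment, t.year, {select_cols} "
    f"FROM t {joins} ORDER BY t.id, t.year"
)

lagged_rows = conn.execute(query).fetchall()
for row in lagged_rows:
    print(row)
```

Because each join matches on the calendar year (year - k) rather than on row position, a missing year simply produces NULL in that column, which is the behaviour the question asks for.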
I think your friend will be LAG
Here's an implementation:
Declare @t table (
ID int,
Payment int,
Yr int
)
Insert Into @t Values(112,0,2004)
Insert Into @t Values(112,0,2005)
Insert Into @t Values(112,0,2006)
Insert Into @t Values(112,9592,2007)
Insert Into @t Values(112,12332,2008)
Insert Into @t Values(112,9234,2011)
Insert Into @t Values(112,5400,2012)
Insert Into @t Values(112,7392,2014)
Insert Into @t Values(112,8321,2015)
Insert Into @t Values(113,0,2009)
Insert Into @t Values(113,9234,2011)
Insert Into @t Values(113,5400,2013)
Insert Into @t Values(113,8321,2015)
;with E1(n) as (Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1 Union All Select 1)
,E2(n) as (Select 1 From E1 a, E1 b)
,E4(n) as (Select 1 From E2 a, E2 b)
,E5(n) as (Select row_number() over(order by isnull(null,1)) From E4 a, E1 b)
,IDYears as (
Select z.ID, Yr = y.n
From (
Select
Id,
MinYear = min(Yr),
MaxYear = max(Yr)
From @t a
Group By Id
) z
Inner Join E5 y On y.n between z.MinYear and z.MaxYear
)
Select
*,
[t-1] = Lag(B.Payment, 1) Over(Partition By a.ID Order By a.Yr),
[t-2] = Lag(B.Payment, 2) Over(Partition By a.ID Order By a.Yr),
[t-3] = Lag(B.Payment, 3) Over(Partition By a.ID Order By a.Yr),
[t-4] = Lag(B.Payment, 4) Over(Partition By a.ID Order By a.Yr),
[t-5] = Lag(B.Payment, 5) Over(Partition By a.ID Order By a.Yr),
[t-6] = Lag(B.Payment, 6) Over(Partition By a.ID Order By a.Yr),
[t-7] = Lag(B.Payment, 7) Over(Partition By a.ID Order By a.Yr),
[t-8] = Lag(B.Payment, 8) Over(Partition By a.ID Order By a.Yr),
[t-9] = Lag(B.Payment, 9) Over(Partition By a.ID Order By a.Yr),
[t-10] = Lag(B.Payment, 10) Over(Partition By a.ID Order By a.Yr)
From IDYears a
Left Join @t b On a.ID = b.ID and a.Yr = b.Yr
Order By A.ID
I have two tables tab_a as
SUB_ID AMOUNT
1 10
2 5
3 7
4 15
5 4
and tab_b as
slab_number slab_start slab_end
1 12 20
2 21 25
3 26 35
slab_start will always be 1 more than slab_end of previous slab number
If I run the running total for tab_a my result is
select sub_id , sum(amount) OVER(ORDER BY sub_id) run_sum
from tab_a
sub_id run_sum
1 10
2 15
3 22
4 37
5 41
I need a SQL query that finds the slab_number for each run_sum. If run_sum is less than the slab_start of the first slab, the result should be 0. If run_sum is more than the slab_end of the last slab, the result should be NULL, except for the first row that crosses that limit, which should get the last slab_number.
Expected result is
sub_id run_sum slab_number
1 10 0
2 15 1
3 22 2
4 37 3
5 41 NULL
I have tried this.
First, find the first running sum that crosses the limit, i.e. the last slab_end:
select min( run_sum )
from (select sub_id , sum(amount) OVER(ORDER BY sub_id) run_sum
from tab_a ) where run_sum>=35
Then use the query below:
select sub_id,
run_sum,
case
when run_sum <
(select SLAB_START from tab_b where slab_number = '1') then
0
when run_sum = 37 then
(select max(slab_number) from tab_b)
when run_sum > 37 then
NULL
else
(select slab_number
from tab_b
where run_sum between SLAB_START and slab_end)
end slab_number
from (select sub_id, sum(amount) OVER(ORDER BY sub_id) run_sum from tab_a)
Is there a better way to do this?
Somewhat strange requirement :) Use some analytic functions and CASE WHEN expressions. Use row_number() when you need to find the first of something, and max() over() and sum() over() when you need information from other rows:
with
a as (
select sub_id, row_number() over (order by sub_id) rn,
sum(amount) over (order by sub_id) rs
from tab_a),
b as (select tab_b.*, max(slab_number) over () msn from tab_b )
select sub_id, rs,
case when sn is null and row_number() over (partition by sn order by sub_id) = 1
then msn else sn
end sn
from (
select sub_id, rs, max(msn) over () msn,
case when slab_number is null and rn = 1 then 0 else slab_number end sn
from a left join b on rs between slab_start and slab_end)
dbfiddle demo
you could try this:
select a.sub_id , sum(a.amount) OVER(ORDER BY a.sub_id) run_sum
,case when b.slab_number=1 then 0 else lag(b.slab_number,1) over (order by a.sub_id)end slab_number
from tab_a a
left join tab_b b on a.SUB_ID = b.slab_number
I think this is basically a left join with a default value:
select a.*,
(case when a.run_sum < bb.min_slab_start then 0
else b.slab_number
end) as slab_number
from (select sub_id,
sum(amount) over (order by sub_id) as run_sum
from tab_a
) a left join
tab_b b
on a.run_sum between slab_start and slab_end cross join
(select min(slab_start) as min_slab_start
from tab_b
) bb;
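The left join above leaves NULL for every row past the last slab, including the first row that crosses the limit. A possible extension, sketched here with Python's sqlite3 module (assuming a SQLite build with window functions, 3.25 or later), gives that first crossing row the last slab_number. The extra CASE branch and the names max_slab_end and max_slab_number are my own reading of the requirement, not part of the answer, and the sketch assumes every amount is positive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab_a (sub_id INTEGER, amount INTEGER);
INSERT INTO tab_a VALUES (1, 10), (2, 5), (3, 7), (4, 15), (5, 4);
CREATE TABLE tab_b (slab_number INTEGER, slab_start INTEGER, slab_end INTEGER);
INSERT INTO tab_b VALUES (1, 12, 20), (2, 21, 25), (3, 26, 35);
""")

slab_query = """
WITH a AS (
    SELECT sub_id, amount,
           SUM(amount) OVER (ORDER BY sub_id) AS run_sum
    FROM tab_a
),
bb AS (
    SELECT MIN(slab_start)  AS min_slab_start,
           MAX(slab_end)    AS max_slab_end,
           MAX(slab_number) AS max_slab_number
    FROM tab_b
)
SELECT a.sub_id, a.run_sum,
       CASE
           -- below the first slab: default to 0
           WHEN a.run_sum < bb.min_slab_start THEN 0
           -- inside a slab: take it from the range join
           WHEN b.slab_number IS NOT NULL THEN b.slab_number
           -- past the last slab, but the previous running sum was not:
           -- this is the row that crosses the limit
           WHEN a.run_sum - a.amount <= bb.max_slab_end THEN bb.max_slab_number
           ELSE NULL
       END AS slab_number
FROM a
CROSS JOIN bb
LEFT JOIN tab_b AS b
       ON a.run_sum BETWEEN b.slab_start AND b.slab_end
ORDER BY a.sub_id
"""

slab_rows = conn.execute(slab_query).fetchall()
for row in slab_rows:
    print(row)
```

The previous running sum is recovered as run_sum - amount, which avoids a second window function; that shortcut is only valid because the running sum is a plain cumulative SUM over the same ordering.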
I'm trying to find out whether SQL Server 2008 can assign a sequence number without using cursors. Let's say I have the following table, which defines a driver's route going from one location to another (NULL means he is going from home):
RouteID SourceLocationID DestinationLocationID DriverID Created Updated
------- ---------------- --------------------- -------- ------- -------
1 NULL 219 1 10:20 10:23
2 219 266 1 10:21 10:24
3 266 NULL 1 10:22 10:25
4 NULL 54 2 10:23 10:26
5 54 NULL 2 10:24 10:27
6 NULL 300 1 10:25 10:28
7 300 NULL 1 10:26 10:29
I want to group the records between the rows where sourceLID is NULL and the destinationLID is null, so I get the following (generating a sequence number for each grouping set):
DriverID DestinationLocationID TripNumber
-------- --------------------- ----------
1 219 1 (his first trip)
1 266 1
1 300 2 (his second trip)
2 54 1
Is there a way I could use GROUP BY here rather than cursors?
a quick try:
with cte as
( select DestinationLocationID
, DriverID
, tripid = row_number()
over ( partition by driverid
order by DestinationLocationID)
from table1
where sourcelocationid is NULL
UNION ALL
select table1.DestinationLocationID
, table1.DriverID
, cte.tripid
from table1
join cte on table1.SourceLocationID=cte.DestinationLocationID
and table1.DriverID=cte.DriverID
where cte.DestinationLocationID is not null
)
select * from cte
Try this:
select driverid, destinationlocationid, count(destinationlocationid) from
(
select driverid, destinationlocationid from table1 where sourcelocationid is NULL
union all
select driverid, sourcelocationid from table1 where destinationlocationid is NULL
)A group by driverid, destinationlocationid
Try this,
Declare @t table(RouteID int, SourceLocationID int,DestinationLocationID int
,DriverID int,Created time, Updated time)
insert into @t
values(1, NULL, 219, 1, '10:20','10:23'),
(2 ,219,266, 1, '10:21','10:24'),
(3,266, NULL, 1, '10:22','10:25'),
(4, NULL, 54, 2, '10:23','10:26'),
(5,54, NULL, 2, '10:24','10:27'),
(6,NULL,300, 1, '10:25','10:28'),
(7,300,NULL, 1, '10:26','10:29')
;
WITH CTE
AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY DriverID ORDER BY Created
) RN
FROM @t
)
,CTE1
AS (
SELECT *
,1 TripNumber
FROM CTE
WHERE RN = 1
UNION ALL
SELECT A.*
,CASE
WHEN A.SourceLocationID IS NULL
THEN B.TripNumber + 1
ELSE B.TripNumber
END
FROM CTE1 B
INNER JOIN CTE A ON B.DriverID = A.DriverID
WHERE A.RN > B.RN
)
SELECT DISTINCT DestinationLocationID
,DriverID
,TripNumber
FROM CTE1
WHERE DestinationLocationID IS NOT NULL
ORDER BY DriverID
Use a correlated sub-query to count previous trips, plus 1 to get this trip number.
select DriverID,
DestinationLocationID,
(select count(*) + 1
from routes t2
where t1.DriverID = t2.DriverID
and t1.RouteID > t2.RouteID
and DestinationLocationID IS NULL) as TripNumber
from routes t1
where DestinationLocationID IS NOT NULL
order by DriverID, DestinationLocationID;
Executes like this:
SQL>select DriverID,
SQL& DestinationLocationID,
SQL& (select count(*) + 1
SQL& from routes t2
SQL& where t1.DriverID = t2.DriverID
SQL& and t1.RouteID > t2.RouteID
SQL& and DestinationLocationID IS NULL) as TripNumber
SQL&from routes t1
SQL&where DestinationLocationID IS NOT NULL
SQL&order by DriverID, DestinationLocationID;
DriverID DestinationLocationID TripNumber
=========== ===================== ============
1 219 1
1 266 1
1 300 2
2 54 1
4 rows found
I've got code like this:
SELECT id, YEAR(datek) AS YEAR, COUNT(*) AS NUM
FROM Orders
GROUP BY GROUPING SETS
(
(id, YEAR(datek)),
id,
YEAR(datek),
()
);
It gives me this output:
1 NULL 4
2 NULL 11
3 NULL 6
NULL NULL 21
1 2006 36
2 2006 56
3 2006 51
NULL 2006 143
1 2007 130
2 2007 143
3 2007 125
NULL 2007 398
1 2008 79
2 2008 116
3 2008 73
NULL 2008 268
NULL NULL 830
1 NULL 249
2 NULL 326
3 NULL 255
What I need to do is write it without GROUPING SETS (or CUBE or ROLLUP) but with the same result. I thought about writing three different queries and joining them with UNION. I tried using NULL in the GROUP BY clause, but it does not work:
SELECT id, YEAR(datek) AS rok, COUNT(*) AS NUM
FROM Orders
GROUP BY id, YEAR(datek)
UNION
SELECT id, YEAR(datek) AS rok, COUNT(*) AS NUM
FROM Orders
GROUP BY id, null
order by id, YEAR(datek)
I also have a question about PIVOT. What syntax can replace a query that uses PIVOT?
Thanks for your time and all the answers!
You are right in that you need separate queries, although you actually need 4, and rather than GROUP BY NULL, just group by the columns in the corresponding grouping set, and replace the column in the SELECT with NULL:
SELECT id, YEAR(datek) AS rok, COUNT(*) AS NUM
FROM Orders
GROUP BY id, YEAR(datek)
UNION ALL
SELECT id, NULL, COUNT(*) AS NUM
FROM Orders
GROUP BY id
UNION ALL
SELECT NULL, YEAR(datek), COUNT(*) AS NUM
FROM Orders
GROUP BY YEAR(datek)
UNION ALL
SELECT NULL, NULL, COUNT(*) AS NUM
FROM Orders
ORDER BY ID, Rok
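A small sketch of the four-query UNION ALL pattern, run with Python's sqlite3 module on a handful of made-up orders (the rows are illustrative, not the asker's data). SQLite has no YEAR() function, so the year is extracted with strftime in a CTE named o; both the CTE and the sample dates are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (id INTEGER, datek TEXT);
INSERT INTO Orders VALUES
    (1, '2006-01-05'), (1, '2007-03-01'),
    (2, '2006-07-09'), (2, '2006-08-01');
""")

union_query = """
WITH o AS (
    -- stand-in for YEAR(datek), which SQLite does not provide
    SELECT id, CAST(strftime('%Y', datek) AS INTEGER) AS rok
    FROM Orders
)
SELECT id, rok, COUNT(*) AS num FROM o GROUP BY id, rok   -- (id, year)
UNION ALL
SELECT id, NULL, COUNT(*) FROM o GROUP BY id              -- (id)
UNION ALL
SELECT NULL, rok, COUNT(*) FROM o GROUP BY rok            -- (year)
UNION ALL
SELECT NULL, NULL, COUNT(*) FROM o                        -- () grand total
"""

grouping_rows = conn.execute(union_query).fetchall()
for row in grouping_rows:
    print(row)
```

Each branch corresponds to one grouping set, and the columns that are not grouped in that branch are returned as NULL, which is how GROUPING SETS reports them as well.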
With regard to a replacement for PIVOT I think the best alternative is to use a conditional aggregate, e.g. instead of:
SELECT pvt.SomeGroup,
pvt.[A],
pvt.[B],
pvt.[C]
FROM T
PIVOT (SUM(Val) FOR Col IN ([A], [B], [C])) AS pvt;
You would use:
SELECT T.SomeGroup,
[A] = SUM(CASE WHEN T.Col = 'A' THEN T.Val ELSE 0 END),
[B] = SUM(CASE WHEN T.Col = 'B' THEN T.Val ELSE 0 END),
[C] = SUM(CASE WHEN T.Col = 'C' THEN T.Val ELSE 0 END)
FROM T
GROUP BY T.SomeGroup;
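The conditional aggregate can be tried out with Python's sqlite3 module; the rows below are invented for illustration. One difference worth knowing: with ELSE 0 a group that has no rows for a column gets 0, whereas PIVOT would return NULL there. Leaving out the ELSE branch (the C_or_null column, a name made up for this sketch) keeps the NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (SomeGroup TEXT, Col TEXT, Val INTEGER);
INSERT INTO T VALUES ('g1', 'A', 1), ('g1', 'A', 2), ('g1', 'B', 5), ('g2', 'C', 7);
""")

pivot_query = """
SELECT SomeGroup,
       SUM(CASE WHEN Col = 'A' THEN Val ELSE 0 END) AS A,
       SUM(CASE WHEN Col = 'B' THEN Val ELSE 0 END) AS B,
       SUM(CASE WHEN Col = 'C' THEN Val ELSE 0 END) AS C,
       -- no ELSE: the sum stays NULL when the group has no 'C' rows
       SUM(CASE WHEN Col = 'C' THEN Val END) AS C_or_null
FROM T
GROUP BY SomeGroup
ORDER BY SomeGroup
"""

pivot_rows = conn.execute(pivot_query).fetchall()
for row in pivot_rows:
    print(row)
```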
This is what I have
ID Name DateTime Value Group
1 Mark 1/1/2010 0 1
2 Mark 1/2/2010 1 1
3 Mark 1/3/2010 0 1
4 Mark 1/4/2010 0 2
40 Mark 1/5/2010 1 2
5 Mark 1/9/2010 1 2
6 Mark 1/6/2010 1 2
7 Kelly 1/1/2010 0 3
8 Kelly 1/2/2010 1 3
9 Kelly 1/3/2010 1 3
10 Nancy 1/4/2010 0 4
11 Nancy 1/5/2010 0 4
12 Nancy 1/6/2010 1 5
13 Nancy 1/7/2010 0 5
What I want is to get the rows per "name" per "group" with minimum datetime after the value becomes 1. From the above example, I would need to get
3 Mark 1/3/2010 0 1
6 Mark 1/6/2010 1 2
9 Kelly 1/3/2010 1 3
13 Nancy 1/7/2010 0 5
Based on the description of your rules, I believe the output will actually be a bit different since 2010-01-05 was the first DateTime where the Value = 1 for Group 2 for Mark.
ID Name DateTime Value Group
3 Mark 2010-01-03 0 1
6 Mark 2010-01-06 1 2
9 Kelly 2010-01-03 1 3
13 Nancy 2010-01-07 0 5
The below code will work as demonstrated in this SQLFiddle.
SELECT sub.ID
, sub.Name
, sub.[DateTime]
, sub.Value
, sub.[Group]
FROM
(SELECT t.ID
, t.Name
, t.[DateTime]
, t.Value
, t.[Group]
, SequentialOrder = ROW_NUMBER() OVER
(PARTITION BY t.Name, t.[Group]
ORDER BY t.[DateTime])
FROM Test t
JOIN
(SELECT Name
, [Group]
, MinimumDateTime = MIN([DateTime])
FROM Test
WHERE Value = 1
GROUP BY Name
, [Group]) mint
ON t.Name = mint.Name
AND t.[Group] = mint.[Group]
WHERE t.[DateTime] > mint.MinimumDateTime) sub
WHERE sub.SequentialOrder = 1
ORDER BY ID;
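To check the query above against the sample rows without a SQL Server instance, here is a sketch of the same logic using Python's sqlite3 module (assuming SQLite 3.25 or later for ROW_NUMBER). The column names dt and grp replace DateTime and Group to avoid reserved words, and the dates are stored as ISO strings read as month/day/year; all of these are assumptions of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Test (ID INTEGER, Name TEXT, dt TEXT, Value INTEGER, grp INTEGER)"
)
conn.executemany(
    "INSERT INTO Test VALUES (?, ?, ?, ?, ?)",
    [(1, "Mark", "2010-01-01", 0, 1), (2, "Mark", "2010-01-02", 1, 1),
     (3, "Mark", "2010-01-03", 0, 1), (4, "Mark", "2010-01-04", 0, 2),
     (40, "Mark", "2010-01-05", 1, 2), (5, "Mark", "2010-01-09", 1, 2),
     (6, "Mark", "2010-01-06", 1, 2), (7, "Kelly", "2010-01-01", 0, 3),
     (8, "Kelly", "2010-01-02", 1, 3), (9, "Kelly", "2010-01-03", 1, 3),
     (10, "Nancy", "2010-01-04", 0, 4), (11, "Nancy", "2010-01-05", 0, 4),
     (12, "Nancy", "2010-01-06", 1, 5), (13, "Nancy", "2010-01-07", 0, 5)],
)

first_after_query = """
SELECT sub.ID, sub.Name, sub.dt, sub.Value, sub.grp
FROM (
    SELECT t.ID, t.Name, t.dt, t.Value, t.grp,
           ROW_NUMBER() OVER (PARTITION BY t.Name, t.grp ORDER BY t.dt) AS seq
    FROM Test t
    JOIN (
        -- earliest date on which Value = 1, per name and group
        SELECT Name, grp, MIN(dt) AS first_one
        FROM Test
        WHERE Value = 1
        GROUP BY Name, grp
    ) m ON m.Name = t.Name AND m.grp = t.grp
    WHERE t.dt > m.first_one          -- keep only rows after that date
) sub
WHERE sub.seq = 1                     -- the earliest of the remaining rows
ORDER BY sub.ID
"""

first_after_rows = conn.execute(first_after_query).fetchall()
for row in first_after_rows:
    print(row)
```

Group 4 (Nancy) never reaches Value = 1, so the inner join drops it, which matches the expected output.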
Below is my query. It assumes that records are received in date order:
WITH TBL_1 AS
(
SELECT A.*, ROW_NUMBER() OVER(PARTITION BY NAME, GROUP ORDER BY DATE) AS RN
FROM TABLE A
WHERE (NAME, GROUP) IN
(SELECT NAME, GROUP FROM TABLE WHERE VALUE = 1)
),
TBL_2 AS
(
SELECT * FROM TBL_1 WHERE VALUE = 1
),
TBL_3 AS
(
SELECT A.*
FROM TBL_1 AS A
INNER JOIN TBL_2 AS B
ON B.NAME = A.NAME
AND B.GROUP = A.GROUP
AND A.RN > B.RN
)
SELECT *
FROM TBL_3
WHERE (NAME, GROUP, DATE) IN
(SELECT NAME, GROUP, MIN(DATE) FROM TBL_3 GROUP BY NAME, GROUP)
In SQL Server 2012 you can do this:
SELECT * FROM (
SELECT DISTINCT
ID,
Name,
DateTime,
Value,
Gr,
LAG(ID) OVER (PARTITION BY Name, Gr ORDER BY DateTime) F
FROM (
SELECT
ID,
Name,
DateTime,
Value,
Gr,
CASE WHEN LAG(Value) OVER (PARTITION BY Name, Gr ORDER BY DateTime) = 1 THEN 1 ELSE 0 END F
FROM
T
) TT
WHERE F = 1
) TT WHERE F IS NULL
ORDER BY Gr, Name, DateTime
Fiddle: http://www.sqlfiddle.com/#!6/5a0fa2/19
using window functions:
with cte as (
select
*,
row_number() over(partition by [Group], Name order by [DateTime]) as rn,
dense_rank() over(order by [Group], Name) as rnk
from Table1
)
select c1.*
from cte as c1
inner join cte as c2 on c2.rn = c1.rn - 1 and c2.rnk = c1.rnk and c2.Value = 1
where
not exists (select * from cte as c3 where c3.rn <= c1.rn - 2 and c3.rnk = c1.rnk and c3.Value = 1)
or apply:
select t1.*
from Table1 as t1
cross apply (
select top 1 t2.Value, t2.DateTime
from Table1 as t2
where
t2.[Group] = t1.[Group] and t2.Name = t1.Name and
t2.[DateTime] < t1.[DateTime]
order by t2.[Datetime] desc
) as t2
where
t2.Value = 1 and
not exists (
select *
from Table1 as t3
where
t3.[Group] = t1.[Group] and t3.Name = t1.Name and
t3.[DateTime] < t2.[DateTime] and t3.Value = 1
)
sql fiddle demo
Update: I forgot to mention that your expected output seems to be incorrect. The second row should have id = 6 instead of 5 (see sql fiddle).
My table records is like below
ym cnt
200901 57
200902 62
200903 67
...
201001 84
201002 75
201003 75
...
201101 79
201102 77
201103 80
...
I want to compute the difference between the current month and the previous month.
The result should look like this:
ym cnt diff
200901 57 57
200902 62 5 (62 - 57)
200903 67 5 (67 - 62)
...
201001 84 ...
201002 75
201003 75
...
201101 79
201102 77
201103 80
...
Can anyone tell me how to write SQL to get this result, with good performance?
UPDATE:
Sorry for the brief description.
My solution is:
Step 1: load the current month's data into temp table 1.
Step 2: load the previous month's data into temp table 2.
Step 3: left join the two tables to compute the result.
Temp_Table1
SELECT (ym - 1) as ym , COUNT( item_cnt ) as cnt
FROM _table
GROUP BY (ym - 1 )
order by ym
Temp_Table2
SELECT ym , COUNT( item_cnt ) as cnt
FROM _table
GROUP BY ym
order by ym
select ym , (b.cnt - a.cnt) as diff from Temp_Table2 a
left join Temp_Table1 b
on a.ym = b.ym
If I want to compare a month in this year with the same month last year, I only need to change ym - 1 to ym - 100.
In practice, though, the GROUP BY key is not only ym: there are up to 15 keys and up to 100 million records.
So I am looking for a solution that is easy to maintain and performs well.
For MSSQL, this has one reference to the table, so potentially it can be faster (maybe not) than left join which has two references to the table:
-- ================
-- sample data
-- ================
declare @t table
(
ym varchar(6),
cnt int
)
insert into @t values ('200901', 57)
insert into @t values ('200902', 62)
insert into @t values ('200903', 67)
insert into @t values ('201001', 84)
insert into @t values ('201002', 75)
insert into @t values ('201003', 75)
-- ===========================
-- solution
-- ===========================
select
ym2,
diff = case when cnt1 is null then cnt2
when cnt2 is null then cnt1
else cnt2 - cnt1
end
from
(
select
ym1 = max(case when k = 2 then ym end),
cnt1 = max(case when k = 2 then cnt end),
ym2 = max(case when k = 1 then ym end),
cnt2 = max(case when k = 1 then cnt end)
from
(
select
*,
rn = row_number() over(order by ym)
from @t
) t1
cross join
(
select k = 1 union all select k = 2
) t2
group by rn + k
) t
where ym2 is not null
Can anyone tell me how to write SQL to get this result
Absolutely. Simply get the row with the next highest date, and subtract.
with good performance?
No. Relational databases are not really meant to be traversed linearly, and even using indexes appropriately would require a virtual linear traversal.
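For reference, the "get the adjacent row and subtract" idea can be sketched with the LAG window function (available in SQL Server 2012 and later, and in SQLite 3.25 and later, which is what this Python sqlite3 example assumes). The table name monthly is hypothetical, and only the six rows actually listed in the question are loaded, so 201001 is compared with 200903 here rather than with the elided months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (ym TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [("200901", 57), ("200902", 62), ("200903", 67),
     ("201001", 84), ("201002", 75), ("201003", 75)],
)

diff_query = """
SELECT ym,
       cnt,
       -- the first row has no predecessor, so treat the previous count as 0
       cnt - COALESCE(LAG(cnt) OVER (ORDER BY ym), 0) AS diff
FROM monthly
ORDER BY ym
"""

diff_rows = conn.execute(diff_query).fetchall()
for row in diff_rows:
    print(row)
```

With additional grouping keys, those keys would go into a PARTITION BY clause inside the OVER(...). Whether this performs acceptably on 100 million rows depends on indexing and the engine, so it should be measured rather than assumed.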