Linear Interpolation in SQL

I work with crashes and annual mileage for the same year (the Year column in my table). Crashes are there for every record, but annual mileage is not. The NULLs for mileage can be at the beginning or at the end of the time period for a given customer, and a couple of annual mileage records in the middle can be missing as well. I do not know how to overcome this. I tried to do it in a CASE statement, but I do not know how to code it properly. The issue needs to be resolved in SQL, and I am using SQL Server.
This is what the output looks like; I need to have mileage for every single year for each customer.
The data I am pulling from is a proprietary database and the records themselves must stay untouched as they are. I just need a query that turns my current output into one where I have mileage for every year. I appreciate any input!
Year   Customer   Crashes   Annual_Mileage
2009   123        5         3453453
2010   123        1         NULL
2011   123        0         54545
2012   123        14        376457435
2013   123        3         63453453
2014   123        4         NULL
2015   123        15        6346747
2016   123        0         NULL
2017   123        2         534534
2018   123        7         NULL
2019   123        11        NULL
2020   123        15        565435
2021   123        12        474567546
2022   123        7         NULL
Desired Results
Year   Customer   Crashes   Annual_Mileage
2009   123        5         3453453
2010   123        1         175399 (prior value is taken)
2011   123        0         54545
2012   123        14        376457435
2013   123        3         63453453
2014   123        4         34900100 (avg of 2 adjacent values)
2015   123        15        6346747
2016   123        0         3440641 (avg of 2 adjacent values)
2017   123        2         534534
2018   123        7         534534 (prior value is taken)
2019   123        11        549985 (avg of 2 adjacent values)
2020   123        15        565435
2021   123        12        474567546
2022   123        7         474567546 (prior value is taken)
SELECT Year,
       Customer,
       Crashes,
       CASE
           WHEN Annual_Mlg IS NOT NULL THEN Annual_Mlg
           WHEN Annual_Mlg IS NULL THEN
               CASE
                   WHEN PREV.Annual_Mlg IS NOT NULL
                        AND NEXT.Annual_Mlg IS NOT NULL
                   THEN ( PREV.Annual_Mlg + NEXT.Annual_Mlg ) / 2
                   ELSE 0
               END
       END AS Annual_Mlg
FROM #table
The above code doesn't work, but I just need to start somehow, and that is what I have currently.
I understand what I need to do; I just do not know how to code it in SQL.
After I applied the row_number() function I got incorrect output for the first 2 clients, while for the remaining 4 clients row_number() gave the correct output. I have no idea why that is. I thought maybe it is because I used a FULL JOIN earlier to combine the mileage and crashes tables?

Your use of #table tells me that you're using MS SQL Server (a temporary table, probably in a stored procedure).
You want to:
select all the rows in #table
joined with the matching row (if any) for the previous year, and
joined with the matching row (if any) for the next year
Then it's easy. Assuming the primary key on your #table is composed of the year and customer columns, something like this ought to do you:
select t.year ,
       t.customer ,
       t.crashes ,
       annual_mileage = coalesce(
                            t.annual_mileage ,
                            ( coalesce( p.annual_mileage, 0 ) +
                              coalesce( n.annual_mileage, 0 )
                            ) / 2
                        )
from #table t                                 -- take all the rows
left join #table p on p.year     = t.year - 1 -- with the matching row for
                  and p.customer = t.customer -- the previous year (if any)
left join #table n on n.year     = t.year + 1 -- and the matching row for
                  and n.customer = t.customer -- the next year (if any)
Notes:
What value you default to if the previous or next year doesn't exist is up to you (zero? some arbitrary value?).
Is the previous/next year guaranteed to be the current year +/- 1? If not, you may have to use derived tables as the source for the prev/next data, selecting the closest previous/next year (that sort of thing complicates the query significantly); one way to do that is sketched just below, and another is shown in the Edited To Note section.
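One possible sketch of that "closest previous/next non-NULL year" idea, assuming #table has the year, customer, crashes and annual_mileage columns shown in the question, uses OUTER APPLY with TOP (1); it also skips over runs of NULL mileage:
select t.year ,
       t.customer ,
       t.crashes ,
       annual_mileage = coalesce(
                            t.annual_mileage ,
                            ( coalesce( p.annual_mileage, n.annual_mileage ) +
                              coalesce( n.annual_mileage, p.annual_mileage )
                            ) / 2
                        )
from #table t
outer apply ( select top (1) pm.annual_mileage      -- closest earlier year with a value
              from   #table pm
              where  pm.customer = t.customer
              and    pm.year     < t.year
              and    pm.annual_mileage is not null
              order by pm.year desc ) p
outer apply ( select top (1) nm.annual_mileage      -- closest later year with a value
              from   #table nm
              where  nm.customer = t.customer
              and    nm.year     > t.year
              and    nm.annual_mileage is not null
              order by nm.year asc ) n
When only one side exists, the two coalesce arguments collapse to the same value, so the prior (or next) mileage is carried over as-is; when neither exists the result stays NULL.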
Edited To Note:
If you have discontiguous years for each customer such that the "previous" and "next" years for a given customer are not necessarily the current year +/- 1, then something like this is probably the most straightforward way to find the previous/next year.
We assign a sequential number in lieu of year for each customer, using the row_number() ranking function. This query, then,
select row_nbr = row_number() over (
                     partition by x.customer
                     order by x.year
                 ) ,
       x.*
from   #table x
would produce results along these lines:
row_nbr   customer   year   ...
1         123        1992   ...
2         123        1993   ...
3         123        1995   ...
4         123        2020   ...
1         456        2001   ...
2         456        2005   ...
3         456        2020   ...
Wrapping that numbering in a common table expression so we can join it to itself on the row number leads us to this:
;with numbered as (
    select row_nbr = row_number() over (
                         partition by x.customer
                         order by x.year
                     ) ,
           x.*
    from   #table x
)
select year           = t.year ,
       customer       = t.customer ,
       crashes        = t.crashes ,
       annual_mileage = coalesce(
                            t.annual_mileage ,
                            ( coalesce( p.annual_mileage, 0 ) +
                              coalesce( n.annual_mileage, 0 )
                            ) / 2
                        )
from numbered t
left join numbered p on p.customer = t.customer and p.row_nbr = t.row_nbr - 1
left join numbered n on n.customer = t.customer and n.row_nbr = t.row_nbr + 1

Related

Combining Two Tables & Summing REV amts by Mth

Below are my two tables of data
Acct BillingDate REV
101 01/05/2018 5
101 01/30/2018 4
102 01/15/2018 2
103 01/4/2018 3
103 02/05/2018 2
106 03/06/2018 5
Acct BillingDate Lease_Rev
101 01/15/2018 2
102 01/16/2018 1
103 01/19/2018 2
104 02/05/2018 3
105 04/02/2018 1
Desired Output
Acct   Jan   Feb   Mar   Apr
101    11
102     3
103     5     2
104           3
105                       1
106                 5
My SQL Script is Below:
SELECT [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,SUM(case when [NewSalesHistory].[billingdate] between '6/1/2016' and '6/30/2016' then REV else 0 end ) + [X].[Jun-16] AS 'Jun-16'
FROM [NewSalesHistory]
FULL join (SELECT [Account]
,SUM(case when [BWLease].[billingdate] between '6/1/2016' and '6/30/2016' then Lease_REV else 0 end ) as 'Jun-16'
FROM [AirgasPricing].[dbo].[BWLease]
GROUP BY [Account]) X ON [NewSalesHistory].[Account] = [X].[Account]
GROUP BY [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,[X].[Jun-16]
I am having trouble combining these tables. If there is a REV amount and a Lease_Rev amount, they should combine (sum) for that account. If there is no Lease_Rev amount (which is the majority of the time), the query brings back NULLs for all the other accounts' REV amounts from Table 1. Table 1 can have duplicate accounts with different REV values, while Table 2 has each account only once, with its Lease_Rev. The output above is how I would like to see the data.
What am I missing here? Thanks!
I would suggest union all and group by:
select acct,
       sum(case when billingdate >= '2016-01-01' and billingdate < '2016-02-01' then rev end) as rev_201601,
       sum(case when billingdate >= '2016-02-01' and billingdate < '2016-03-01' then rev end) as rev_201602,
       . . .
from ((select nsh.acct, nsh.billingdate, nsh.rev
       from NewSalesHistory nsh
      ) union all
      (select bl.acct, bl.billingdate, bl.lease_rev as rev
       from AirgasPricing..BWLease bl
      )
     ) x
group by acct;
Okay, so there are a few things going on here:
1) As Gordon Linoff mentioned you can perform a union all on the two tables. Be sure to limit your column selections and name your columns appropriately:
select
x as consistentname1,
y as consistentname2,
z as consistentname3
from [NewSalesHistory]
union all
select
a as consistentname1,
b as consistentname2,
c as consistentname3
from [BWLease]
2) Your desired result contains a pivoted month column. Generate a column with your desired granularity on the result of the union in step one. For example, months:
concat(datepart(yy, Date_),'-',datename(mm,Date_)) as yyyyM
Then perform aggregation using a group by:
select sum(...) as desiredcolumnname
...
group by PK1, PK2, yyyyM
Finally, PIVOT to obtain your result: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
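A rough sketch of that final PIVOT step; here unioned_rev is only a placeholder name for the result of steps 1 and 2, and the month labels are examples of the yyyyM values that the concat expression above would produce:
select Acct, [2018-January], [2018-February], [2018-March], [2018-April]
from (
        select Acct, yyyyM, REV      -- unioned_rev stands in for the unioned, labelled result
        from   unioned_rev
     ) src
pivot (
        sum(REV)
        for yyyyM in ([2018-January], [2018-February], [2018-March], [2018-April])
     ) pvt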
3) If you have other fields/columns that you wish to present then you first need to determine whether they are measures (can be aggregated) or are dimensions. That may be best addressed in a follow up question after you've achieved what you set out for in this part.
Hope it helps
As an aside, it seems like you are preparing data for reporting. Performing these transformations can be facilitated using a GUI such as MS Power Query. As long as your end goal is not data manipulation in the DB itself, you do not need to resort to raw sql.

Aggregate payments per year per customer per type

Please consider the following payment data:
customerID paymentID paymentType paymentDate paymentAmount
---------------------------------------------------------------------
1 1 A 2015-11-28 500
1 2 A 2015-11-29 -150
1 3 B 2016-03-07 300
2 4 A 2015-03-03 200
2 5 B 2016-05-25 -100
2 6 C 2016-06-24 700
1 7 B 2015-09-22 110
2 8 B 2016-01-03 400
I need to tally per year, per customer, the sum of the diverse payment types (A = invoice, B = credit note, etc), as follows:
year customerID paymentType paymentSum
-----------------------------------------------
2015 1 A 350 : paymentID 1 + 2
2015 1 B 110 : paymentID 7
2015 1 C 0
2015 2 A 200 : paymentID 4
2015 2 B 0
2015 2 C 0
2016 1 A 0
2016 1 B 300 : paymentID 3
2016 1 C 0
2016 2 A 0
2016 2 B 300 : paymentID 5 + 8
2016 2 C 700 : paymentId 6
It is important that there are values for every category (so for 2015, customer 1 has 0 payment value for type C, but still it is good to see this).
In reality, there are over 10 payment types and about 30 customers. The total date range is 10 years.
Is this possible to do in SQL only, and if so, could somebody show me how? If possible, please use relatively easy queries so that I can learn from them, for instance by storing intermediate results in a #temptable.
Any help is greatly appreciated!
A simple GROUP BY with SUM() on the paymentAmount will give you what you want:
select year = datepart(year, paymentDate),
customerID,
paymentType,
paymentSum = sum(paymentAmount)
from payment_data
group by datepart(year, paymentDate), customerID, paymentType
This is a simple query that generates the required 0s. Note that it may not be the most efficient way to generate this result set. If you already have lookup tables for customers or payment types, it would be preferable to use those rather than the CTEs1 I use here:
declare #t table (customerID int,paymentID int,paymentType char(1),paymentDate date,
paymentAmount int)
insert into #t(customerID,paymentID,paymentType,paymentDate,paymentAmount) values
(1,1,'A','20151128', 500),
(1,2,'A','20151129',-150),
(1,3,'B','20160307', 300),
(2,4,'A','20150303', 200),
(2,5,'B','20160525',-100),
(2,6,'C','20160624', 700),
(1,7,'B','20150922', 110),
(2,8,'B','20160103', 400)
;With Customers as (
select DISTINCT customerID from #t
), PaymentTypes as (
select DISTINCT paymentType from #t
), Years as (
select DISTINCT DATEPART(year,paymentDate) as Yr from #t
), Matrix as (
select
customerID,
paymentType,
Yr
from
Customers
cross join
PaymentTypes
cross join
Years
)
select
m.customerID,
m.paymentType,
m.Yr,
COALESCE(SUM(paymentAmount),0) as Total
from
Matrix m
left join
#t t
on
m.customerID = t.customerID and
m.paymentType = t.paymentType and
m.Yr = DATEPART(year,t.paymentDate)
group by
m.customerID,
m.paymentType,
m.Yr
Result:
customerID paymentType Yr Total
----------- ----------- ----------- -----------
1 A 2015 350
1 A 2016 0
1 B 2015 110
1 B 2016 300
1 C 2015 0
1 C 2016 0
2 A 2015 200
2 A 2016 0
2 B 2015 0
2 B 2016 300
2 C 2015 0
2 C 2016 700
(We may also want to play games with a numbers table and/or generate actual start and end dates for years if the date processing above needs to be able to use an index)
Note also how similar the top of my script is to the sample data in your question - except it's actual code that generates the sample data. You may wish to consider presenting sample code in such a way in the future since it simplifies the process of actually being able to test scripts in answers.
1CTEs - Common Table Expressions. They may be thought of as conceptually similar to temp tables - except we don't actually (necessarily) materialize the results. They also are incorporated into the single query that follows them and the whole query is optimized as a whole.
Your suggestion to use temp tables means that you'd be breaking this into multiple separate queries that then necessarily force SQL to perform the task in an order that we have selected rather than letting the optimizer choose the best approach for the above single query.
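To illustrate that contrast, a sketch of the temp-table route against the same #t sample table (this is only an example, not part of the answer above): the matrix has to be materialized first, and the join then runs as a second statement.
select c.customerID, p.paymentType, y.Yr
into   #Matrix
from       (select distinct customerID from #t) c
cross join (select distinct paymentType from #t) p
cross join (select distinct DATEPART(year, paymentDate) as Yr from #t) y

select m.customerID, m.paymentType, m.Yr,
       COALESCE(SUM(t.paymentAmount), 0) as Total
from   #Matrix m
left join #t t
       on  m.customerID  = t.customerID
       and m.paymentType = t.paymentType
       and m.Yr          = DATEPART(year, t.paymentDate)
group by m.customerID, m.paymentType, m.Yr
Here SQL Server has to build and fill #Matrix before it can even start the second statement, whereas the single CTE query above is optimized as one unit.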

Data check between source and the target as per transformation rule

I have two tables one is source and the other is target.
I need to check if the data is properly transformed according to the transformation logic or not.
Here is the source table: EmpIn
empid year quarter amount
5 2007 q1 100
5 2007 q2 200
5 2007 q3 300
5 2007 q3 100
5 2007 q4 50
5 2007 q4 100
5 2007 q4 150
Target table after the transformation: EmpOut
empid year quarter amount sequence_number
5 2007 q1 100 0
5 2007 q2 200 0
5 2007 q3 300 0
5 2007 q3 400 1
5 2007 q4 50 0
5 2007 q4 150 1
5 2007 q4 300 2
The transformation logic is: if there is another entry of an amount for the same quarter and the same year, the amount is added to the previous amount and the sequence number is increased by 1 in the target.
For example, in the source table in 2007 q3 we have two amounts. The first one is 300; it goes to the target as is, with a sequence number of zero. The next entry is an addition to the previous amount, which gives 400, and the sequence number is increased by 1.
The same transformation happens for the fourth quarter.
I need to validate whether the data was properly transformed into the target table according to this logic.
/*Creating dataset*/
create table #tmp1 (empid int, year int, quarter varchar(25), amount int)
Insert into #tmp1
select 5,2007,'q1',100 union
select 5,2007,'q2',200 union
select 5,2007,'q3',300 union
select 5,2007,'q3',100 union
select 5,2007,'q4',50 union
select 5,2007,'q4',100 union
select 5,2007,'q4',150
/*Intermediate dataset*/
select
     ROW_NUMBER() over (partition by empid, quarter order by amount) as ID
    ,*
into #tmp2
from #tmp1
order by 2,3,4
/*Desired output dataset*/
select
     a.empid
    ,a.year
    ,a.quarter
    ,sum(b.amount) as amount
    ,a.ID - 1 as [sequence number]
from #tmp2 a
join #tmp2 b
  on  a.empid   = b.empid
  and a.year    = b.year
  and a.quarter = b.quarter
  and a.ID     >= b.ID
group by
     a.ID
    ,a.empid
    ,a.year
    ,a.quarter
    ,a.amount
Order By 2,3,4,1
I guess you can't just validate the transformation algorithm itself? Then:
Add a field original_amount to the destination table.
Loop over all rows with sequence number > 0.
Subtract the amount of the former row (sequence number - 1) from the current row's amount and update the new field with the result.
For sequence number = 0, update the field with the amount of the same row.
Compare the two tables' amount and original_amount columns.
Like so.
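A set-based sketch of that check, assuming the target's sequence column is named sequence_number and that LAG is available (SQL Server 2012 or later): it reconstructs the original per-row amounts from the running totals in EmpOut and verifies that, per empid/year/quarter, they match the EmpIn amounts as a multiset.
;with reconstructed as (
    -- undo the running total: original amount = this cumulative amount minus the previous one
    select empid, year, quarter,
           amount - lag(amount, 1, 0) over (partition by empid, year, quarter
                                            order by sequence_number) as amount
    from   EmpOut
), r as (
    select empid, year, quarter, amount,
           row_number() over (partition by empid, year, quarter, amount order by amount) as dup_nbr
    from   reconstructed
), s as (
    select empid, year, quarter, amount,
           row_number() over (partition by empid, year, quarter, amount order by amount) as dup_nbr
    from   EmpIn
)
select coalesce(r.empid,   s.empid)   as empid,
       coalesce(r.year,    s.year)    as year,
       coalesce(r.quarter, s.quarter) as quarter,
       r.amount as reconstructed_amount,
       s.amount as source_amount
from r
full join s
       on  s.empid   = r.empid
       and s.year    = r.year
       and s.quarter = r.quarter
       and s.amount  = r.amount
       and s.dup_nbr = r.dup_nbr
where r.empid is null or s.empid is null    -- any row returned here means the tables disagree
The dup_nbr column just lets two identical original amounts in the same quarter pair up one-to-one.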

SQL Rolling Total up to a certain date

I have two tables that I'm working with. Let's call them "Customers" and "Points".
The Points table looks like this:
Account Year M01 M02 M03 M04 M05 M06 M07 M08 M09 M10 M11 M12
123 2011 10 0 0 0 10 0 10 0 0 0 0 10
123 2012 0 0 0 0 10 0 0 10 10 10 10 20
123 2013 5 0 0 0 0 0 0 0 0 0 0 0
But these points work on a rolling 12 months. Calculating a current customer's points is simple enough, but the challenge is customers who are no longer active. Say Customer 123 became inactive in Jan 2013; we would only want to calculate Feb'12-Jan'13. This is where the other table, Customers, comes in; let's simplify and say it looks just like this:
Account End Date
123 20130105
Now, what I want to do is create a query that calculates the amount of points that each customer has. (Current 12 months for active customers, last 12 months they were active for customers who are no longer active.)
Here's some more information:
I'm running SQL Server 2008.
These tables have been supplied to me like this, I can't modify them.
An active customer is one who has an end date of 99991231 (Dec 31 9999)
The points table only populates for years in which the customer is active. For example, someone who becomes an active customer in Feb 2009 has an entry for the year 2009; if they became inactive in July 2009, their points only cover Feb-July 2009, and there is no row for 2008 because they weren't a customer back then. Jan and Aug-Dec 2009 will show 0's.
Additionally, the record is only created if the customer gains any points that year. If a customer gets 0 points in a year, there will be no record of it.
For border cases, if you get into the first day of a month, then that month is counted. Example, let's say today is April 1st, 2013, that means we sum up May'12-April'13.
This is a pretty complex question. If there's anything I can explain better please let me know. Thank you!
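For reference, the 12-month window described above can be written with SQL Server 2008-compatible date arithmetic; this is just a sketch, and the variable names are illustrative:
declare @month_start  datetime = dateadd(month, datediff(month, 0, getdate()), 0)  -- first day of the current month
declare @window_end   datetime = dateadd(month, 1, @month_start)                   -- exclusive upper bound
declare @window_start datetime = dateadd(year, -1, @window_end)                    -- 12 months back
-- run on any day in April 2013 this gives @window_start = 2012-05-01 and @window_end = 2013-05-01,
-- i.e. May'12 through April'13 as described above
The answers below take a similar route, comparing unpivoted month dates against getdate() or the customer's end date.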
Unfortunately with your table structure of points you will have to unpivot the data. An unpivot takes the data from the multiple columns into rows. Once the data is in the rows, it will be much easier to join, filter the data and total the points for each account. The code to unpivot the data will be similar to this:
select account,
cast(cast(year as varchar(4))+'-'+replace(month_col, 'M', '')+'-01' as date) full_date,
pts
from points
unpivot
(
pts
for month_col in ([M01], [M02], [M03], [M04], [M05], [M06], [M07], [M08], [M09], [M10], [M11], [M12])
) unpiv
See SQL Fiddle with Demo. The query gives a result similar to this:
| ACCOUNT | FULL_DATE | PTS |
------------------------------
| 123 | 2011-01-01 | 10 |
| 123 | 2011-02-01 | 0 |
| 123 | 2011-03-01 | 0 |
| 123 | 2011-04-01 | 0 |
| 123 | 2011-05-01 | 10 |
Once the data is in this format, you can join the Customers table to get the total points for each account, so the code will be similar to the following:
select
c.account, sum(pts) TotalPoints
from customers c
inner join
(
select account,
cast(cast(year as varchar(4))+'-'+replace(month_col, 'M', '')+'-01' as date) full_date,
pts
from points
unpivot
(
pts
for month_col in ([M01], [M02], [M03], [M04], [M05], [M06], [M07], [M08], [M09], [M10], [M11], [M12])
) unpiv
) p
on c.account = p.account
where
(
c.enddate = '9999-12-31'
and full_date >= dateadd(year, -1, getdate())
and full_date <= getdate()
)
or
(
c.enddate <> '9999-12-31'
and dateadd(year, -1, [enddate]) <= full_date
and full_date <= [enddate]
)
group by c.account
See SQL Fiddle with Demo
Lousy data structure. The first thing to do is to unpivot it. Then you get a table with year-month-points as the columns.
From here, you can just select the most recent 12 months. In fact, you don't even have to worry about when a customer left, since presumably they have not collected points since then.
Here is an example in SQL:
with points as (
select 123 as account, 2012 as year,
10 as m01, 0 as m02, 0 as m03, 0 as m04, 10 as m05, 0 as m06,
10 as m07, 0 as m08, 0 as m09, 0 as m10, 0 as m11, 10 as m12
),
points_ym as (
select account, YEAR, mon, cast(right(mon, 2) as int) as monnum, points
from points
unpivot (points for mon in (m01, m02, m03, m04, m05, m06, m07, m08, m09, m10, m11, m12)
) as unpvt
)
select account, SUM(points)
from points_ym
where year*12+monnum >= year(getdate())*12+MONTH(getdate()) - 11
group by account

Generate year to date by month report in SQL [duplicate]

Possible Duplicate:
Running total by grouped records in table
I am trying to put together an SQL statement that returns the SUM of a value by month, but on a year to date basis. In other words, for the month of March, I am looking to get the sum of a value for the months of January, February, and March.
I can easily do a group by to get a total for each month by itself, and potentially calculate the year to date value I need in my application from this data by looping through the results set. However, I was hoping to have some of this work handled with my SQL statement.
Has anyone ever tackled this type of problem with an SQL statement, and if so, what is the trick that I am missing?
My current sql statement for monthly data is similar to the following:
Select month, year, sum(value) from mytable group by month, year
If I include a where clause on the month, and only group by the year, I can get the result for a single month that I am looking for:
select year, sum(value) from mytable where month <= selectedMonth group by year
However, this requires me to have a particular month pre-selected or to utilize 12 different SQL statements to generate one clean result set.
Any guidance that can be provided would be greatly appreciated!
Update: The data is stored on an IBM iSeries.
declare #Q as table
(
mmonth INT,
value int
)
insert into #Q
values
(1,10),
(1,12),
(2,45),
(3,23)
select sum(January) as UpToJanuary,
sum(February)as UpToFebruary,
sum(March) as UpToMarch from (
select
case when mmonth<=1 then sum(value) end as [January] ,
case when mmonth<=2 then sum(value) end as [February],
case when mmonth<=3 then sum(value) end as [March]
from #Q
group by mmonth
) t
Produces:
UpToJanuary UpToFebruary UpToMarch
22 67 90
You get the idea, right?
NOTE: This could be done more easily with PIVOT, but I don't know if you are using SQL Server or not.
As far as I know DB2 does support windowing functions although I don't know if this is also supported on the iSeries version.
If windowing functions are supported (I believe IBM calls them OLAP functions) then the following should return what you want (provided I understood your question correctly)
select month,
year,
value,
sum(value) over (partition by year order by month asc) as sum_to_date
from mytable
order by year, month
create table mon
(
[y] int not null,
[m] int not null,
[value] int not null,
primary key (y,m))
select a.y, a.m, a.value, sum(b.value)
from mon a
join mon b on a.y = b.y and a.m >= b.m
group by a.y, a.m, a.value
2011 1 120 120
2011 2 130 250
2011 3 500 750
2011 4 10 760
2011 5 140 900
2011 6 100 1000
2011 7 110 1110
2011 8 90 1200
2011 9 70 1270
2011 10 150 1420
2011 11 170 1590
2011 12 600 2190
You should try joining the table to itself with a month-behind-a-month condition and generating a synthetic month-group code to group by, as follows:
select
    sum(value),
    year,
    up_to_month
from (
    select a.value,
           a.year,
           b.month as up_to_month
    from my.rep as a join my.rep as b on a.year = b.year and b.month >= a.month
) as t
group by up_to_month, year
For example, given this source table:
db2 => select * from my.rep
VALUE YEAR MONTH
----------- ----------- -----------
100 2011 1
200 2011 2
300 2011 3
400 2011 4
db2 -t -f rep.sql
1 YEAR UP_TO_MONTH
----------- ----------- -----------
100 2011 1
300 2011 2
600 2011 3
1000 2011 4