Get multiple counts by 5 year increments

This is my table:
index_melanoma_yr Total_Melanoma Total_Virus
2000 700 12
2001 746 7
2002 724 12
2003 815 15
2004 893 16
2005 1020 22
I would like to count by 5 year increments. So, 2000-2004, 2005-2009, etc. I can hard code this, but since there are so many years, I'm wondering if there is a more efficient way.
Here's how I got the initial counts:
SELECT index_melanoma_yr,
COUNT(DISTINCT PersonID) AS Total_Melanoma,
SUM( CASE
WHEN index_virus_yr IS NOT NULL THEN
1
ELSE
0
END
) AS Total_Virus
FROM Asare_ViralMelanoma_IndexDates
GROUP BY index_melanoma_yr
ORDER BY index_melanoma_yr

You can apply some simple integer arithmetic, year / 5 * 5, to the year column and then GROUP BY the result. This assumes the year column is an integer, so the division truncates (for example, 2003 / 5 * 5 evaluates to 2000).
SELECT MIN(index_melanoma_yr) AS Year_Start,
MAX(index_melanoma_yr) AS Year_End,
COUNT(DISTINCT PersonID) AS Total_Melanoma,
SUM( CASE
WHEN index_virus_yr IS NOT NULL THEN
1
ELSE
0
END
) AS Total_Virus
FROM Asare_ViralMelanoma_IndexDates
GROUP BY index_melanoma_yr / 5 * 5
ORDER BY Year_Start
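As a rough check of the bucketing idea, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The table name `IndexDates`, the sample rows, and the `Bucket_Start`/`Bucket_End` columns are assumptions made up for illustration; unlike MIN/MAX, the computed bounds show the nominal 5-year window even when some years have no rows.

```python
import sqlite3

def five_year_counts(rows):
    """rows: iterable of (PersonID, index_melanoma_yr, index_virus_yr)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE IndexDates ("
        "PersonID INTEGER, index_melanoma_yr INTEGER, index_virus_yr INTEGER)"
    )
    con.executemany("INSERT INTO IndexDates VALUES (?, ?, ?)", rows)
    query = """
        SELECT index_melanoma_yr / 5 * 5     AS Bucket_Start,  -- integer division truncates
               index_melanoma_yr / 5 * 5 + 4 AS Bucket_End,
               COUNT(DISTINCT PersonID)      AS Total_Melanoma,
               SUM(CASE WHEN index_virus_yr IS NOT NULL THEN 1 ELSE 0 END) AS Total_Virus
        FROM IndexDates
        GROUP BY index_melanoma_yr / 5 * 5
        ORDER BY Bucket_Start
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Hypothetical sample: three people indexed in 2000-2004, two in 2005-2009.
sample = [
    (1, 2000, None),
    (2, 2003, 2004),
    (3, 2004, None),
    (4, 2005, 2006),
    (5, 2009, 2009),
]
print(five_year_counts(sample))
```

The years are stored as integers here; if they were stored as text or decimals, the division would not truncate and the grouping would need an explicit cast.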

Related

Linear Interpolation in SQL

I work with crashes and mileage recorded for the same year, which is stored in the Year column. Crashes are present for every record, but annual mileage is not. NULLs for mileage can occur at the beginning or at the end of the time period for a given customer, and a couple of annual mileage records in between can be missing as well. I do not know how to overcome this. I tried to do it in a CASE statement, but I do not know how to code it properly. The issue needs to be resolved in SQL, and I use SQL Server.
This is what the output looks like; I need to have mileage for every single year for each customer.
The data I am pulling from is in a proprietary database, and the records themselves should be left untouched. I just need code in the query that turns my current output into one where I have mileage for every year. I appreciate any input!
Year Customer Crashes Annual_Mileage
2009 123 5 3453453
2010 123 1 NULL
2011 123 0 54545
2012 123 14 376457435
2013 123 3 63453453
2014 123 4 NULL
2015 123 15 6346747
2016 123 0 NULL
2017 123 2 534534
2018 123 7 NULL
2019 123 11 NULL
2020 123 15 565435
2021 123 12 474567546
2022 123 7 NULL
Desired Results
Year Customer Crashes Annual_Mileage
2009 123 5 3453453
2010 123 1 175399 (prior value is taken)
2011 123 0 54545
2012 123 14 376457435
2013 123 3 63453453
2014 123 4 34900100 (avg of 2 adjacent values)
2015 123 15 6346747
2016 123 0 3440641 (avg of 2 adjacent values)
2017 123 2 534534
2018 123 7 534534 (prior value is taken)
2019 123 11 549985 (avg of 2 adjacent values)
2020 123 15 565435
2021 123 12 474567546
2022 123 7 474567546 (prior value is taken)
SELECT Year,
Customer,
Crashes,
CASE
WHEN Annual_Mlg IS NOT NULL THEN Annual_Mlg
WHEN Annual_Mlg IS NULL THEN
CASE
WHEN PREV.Annual_Mlg IS NOT NULL
AND NEXT.Annual_Mlg IS NOT NULL
THEN ( PREV.Annual_Mlg + NEXT.Annual_Mlg ) / 2
ELSE 0
END
END AS Annual_Mlg
FROM #table
The above code doesn't work, but I need to start somewhere and that is what I have currently.
I understand what I need to do; I just do not know how to code it in SQL.
After I applied the row_number() function, I got this output for the first 2 clients, while for the remaining 4 clients row_number() gave the correct output. I have no idea why. I thought it might be because I used a "full join" earlier to combine the mileage and crashes tables?

Your use of #table tells me that you're using MS SQL Server (a temporary table, probably in a stored procedure).
You want to:
select all the rows in #table
joined with the matching row (if any) for the previous year, and
joined with the matching row (if any) for the next year
Then it's easy. Assuming the primary key on your #table is composed of the year and customer columns, something like this ought to do you:
select t.year ,
t.customer ,
t.crashes ,
annual_mileage = coalesce(
t.annual_mileage ,
( coalesce( p.annual_mileage, 0 ) +
coalesce( n.annual_mileage, 0 )
) / 2
)
from #table t -- take all the rows
left join #table p on p.year = t.year - 1 -- with the matching row for
and p.customer = t.customer -- the previous year (if any)
left join #table n on n.year = t.year + 1 -- and the matching row for
and n.customer = t.customer -- the next year (if any)
Notes:
What value you default to if the previous or next year doesn't exist is up to you (zero? some arbitrary value?)
Is the previous/next year guaranteed to be the current year +/- 1? If not, you may have to use derived tables as the source for the prev/next data, selecting the closest previous/next year (that sort of thing complicates the query significantly).
Edited To Note:
If you have discontiguous years for each customer such that the "previous" and "next" years for a given customer are not necessarily the current year +/- 1, then something like this is probably the most straightforward way to find the previous/next year.
We use a derived table in our from clause and assign a sequential number, in lieu of the year, to each customer's rows using the ranking function row_number(). This query, then,
select row_nbr = row_number() over (
partition by x.customer
order by x.year
) ,
x.*
from #table x
would produce results along these lines:
row_nbr customer year ...
1 123 1992 ...
2 123 1993 ...
3 123 1995 ...
4 123 2020 ...
1 456 2001 ...
2 456 2005 ...
3 456 2020 ...
And that leads us to this:
-- number each customer's rows once, then join the numbered set to itself
;with numbered as (
select row_nbr = row_number() over (
partition by x.customer
order by x.year
) ,
x.*
from #table x
)
select year = t.year ,
customer = t.customer ,
crashes = t.crashes ,
annual_mileage = coalesce(
t.annual_mileage,
(
coalesce(p.annual_mileage,0) +
coalesce(n.annual_mileage,0)
) / 2
)
from numbered t
-- row_nbr only exists in the numbered set, so p and n must come from it, not from #table
left join numbered p on p.customer = t.customer and p.row_nbr = t.row_nbr-1
left join numbered n on n.customer = t.customer and n.row_nbr = t.row_nbr+1

rolling sum to calculate YTD for each month group by product and save to separate columns using SQL

I have a data like this:
Order_No Product Month Qty
3001 r33 1 8
3002 r34 1 11
3003 r33 1 17
3004 r33 2 3
3005 r34 2 11
3006 r34 3 1
3007 r33 3 -10
3008 r33 3 18
I'd like to calculate total YTD qty for product and each month and save to separate columns. Below is what I want
Product Qty_sum_jan Qty_sum_feb Qty_sum_mar
r33 25 28 36
r34 11 22 23
I know how to use window functions to calculate rolling sums, but I have no idea how to spread them into separate columns. I currently use something like this:
case when Month = 1 then sum(Qty) over(partition by Product order by Month) else 0 end as Qty_sum_jan,
case when Month <=2 then sum(Qty) over(partition by Product order by Month) else 0 end as Qty_sum_feb,
case when Month <=3 then sum(Qty) over(partition by Product order by Month) else 0 end as Qty_sum_mar,
This gets me the rolling sum per order, but how do I get to the product level shown above? If I use GROUP BY, it throws an error since Month is not in the GROUP BY clause. I also cannot just use MAX to get the last value: qty can be negative, so the last value may not be the maximum. I use Spark SQL, by the way.

To my understanding, there is no need to use window functions. The following query achieves your desired output:
select
product,
sum(case when month = 1 then qty else 0 end) as sum_qty_jan,
sum(case when month <= 2 then qty else 0 end) as sum_qty_feb,
sum(case when month <= 3 then qty else 0 end) as sum_qty_mar
from your_table
group by 1;
Output:
product sum_qty_jan sum_qty_feb sum_qty_mar
r33 25 28 36
r34 11 22 23
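The conditional-aggregation approach is plain SQL, so it can be checked outside Spark as well. Below is a minimal sketch that runs it on the question's eight rows using SQLite from Python; the table name `orders` is an assumption, and `GROUP BY product` is spelled out instead of the positional `group by 1`.

```python
import sqlite3

def ytd_by_product(rows):
    """rows: iterable of (Order_No, Product, Month, Qty)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (order_no INTEGER, product TEXT, month INTEGER, qty INTEGER)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT product,
               SUM(CASE WHEN month = 1  THEN qty ELSE 0 END) AS sum_qty_jan,
               SUM(CASE WHEN month <= 2 THEN qty ELSE 0 END) AS sum_qty_feb,
               SUM(CASE WHEN month <= 3 THEN qty ELSE 0 END) AS sum_qty_mar
        FROM orders
        GROUP BY product
        ORDER BY product
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# The rows from the question.
orders = [
    (3001, "r33", 1, 8), (3002, "r34", 1, 11), (3003, "r33", 1, 17),
    (3004, "r33", 2, 3), (3005, "r34", 2, 11), (3006, "r34", 3, 1),
    (3007, "r33", 3, -10), (3008, "r33", 3, 18),
]
print(ytd_by_product(orders))
```

Because each column sums every row up to its month, negative quantities are handled naturally, which is what ruled out the MAX-based idea in the question.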

Get the latest price SQLITE

I have a table which contains _id, underSubheadId, wefDate, price.
Whenever a product is created or a price is edited, an entry is also made in this table.
What I want is that if I enter a date, I get the latest price of all distinct underSubheadIds before that date (or on that date if no earlier entry is found).
_id underHeadId wefDate price
1 1 2016-11-01 5
2 2 2016-11-01 50
3 1 2016-11-25 500
4 3 2016-11-01 20
5 4 2016-11-11 30
6 5 2016-11-01 40
7 3 2016-11-20 25
8 5 2016-11-15 52
If I enter 2016-11-20 as date I should get
1 5
2 50
3 25
4 30
5 52
I have achieved this result using the ROW_NUMBER function in SQL Server, but I want the same result in SQLite, which doesn't have such a function.
Also, if a date like 2016-10-25 (which has no entries on or before it) is entered, I want the price from the earliest entry.
For example, for 1 we would get a price of 5, since the nearest (and first) entry is 2016-11-01.
This is the query for SQL Server, which works fine. But I want it for SQLite, which doesn't have the ROW_NUMBER function.
select underSubHeadId,price from(
select underSubHeadId,price, ROW_NUMBER() OVER (Partition By underSubHeadId order by wefDate desc) rn from rates
where wefDate<='2016-11-19') newTable
where newTable.rn=1
Thank You

This is a little tricky, but here is one way:
select t.*
from t
where t.wefDate = (select max(t2.wefDate)
from t t2
where t2.underSubHeadId = t.underSubHeadId and
t2.wefdate <= '2016-11-20'
);
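Here is a minimal sketch of that correlated-subquery approach, run on the question's rows with Python's built-in sqlite3 module. The table is named `rates` as in the question's SQL Server query, the column is spelled `underSubHeadId`, and the dates are stored as ISO-8601 text so that string comparison matches date order; those choices are assumptions about the real schema.

```python
import sqlite3

RATES = [
    (1, 1, "2016-11-01", 5), (2, 2, "2016-11-01", 50),
    (3, 1, "2016-11-25", 500), (4, 3, "2016-11-01", 20),
    (5, 4, "2016-11-11", 30), (6, 5, "2016-11-01", 40),
    (7, 3, "2016-11-20", 25), (8, 5, "2016-11-15", 52),
]

def latest_prices(as_of):
    """Return (underSubHeadId, price) for the newest row on or before as_of."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE rates (_id INTEGER, underSubHeadId INTEGER, wefDate TEXT, price INTEGER)"
    )
    con.executemany("INSERT INTO rates VALUES (?, ?, ?, ?)", RATES)
    query = """
        SELECT r.underSubHeadId, r.price
        FROM rates r
        WHERE r.wefDate = (SELECT MAX(r2.wefDate)
                           FROM rates r2
                           WHERE r2.underSubHeadId = r.underSubHeadId
                             AND r2.wefDate <= ?)
        ORDER BY r.underSubHeadId
    """
    result = con.execute(query, (as_of,)).fetchall()
    con.close()
    return result

print(latest_prices("2016-11-20"))
print(latest_prices("2016-10-25"))
```

Note that for a date earlier than every entry, such as 2016-10-25, the subquery yields NULL and no rows come back, so the "fall back to the first entry" part of the question is not covered by this query on its own.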

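-- SQLite only: with max(), the bare column price is taken from the row
-- that holds the maximum wefDate, i.e. the latest price, not the highest one
select underHeadId, price, max(wefDate)
from t
where wefDate <= '2016-11-20'
group by underHeadId;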

SQL script to partition data on a column and return the max value [duplicate]

This question already has answers here:
How to group by on consecutive values in SQL
(2 answers)
Closed 6 years ago.
I have a requirement to compute bonus payout based on spread goal and date achieved as follows:
Spread Goal | Date Achieved | Bonus Payout
----------------------------------------------
$3,500 | < 27 wks | $2,000
$3,500 | 27 wks to 34 wks | $1,000
$3,500 | > 34 wks | $0
I have a table in SQL Server 2014 where the subset of the data is as follows:
EMP_ID WK_NUM NET_SPRD_LCL
123 10 0
123 11 1500
123 15 3600
123 18 3800
123 19 4000
Based on the requirement, I need to look for records where NET_SPRD_LCL is greater than or equal to 3500 during 2 continuous wk_num.
So, in my example, WK_NUM 15 and 18 (which in my case are continuous because I have a calendar table that I join to to exclude the holiday weeks) are less than 27 wks and have NET_SPRD_LCL > 3500.
For this case, I want to output the MAX(WK_NUM), its associated NET_SPRD_LCL and BONUSPAYOUT = 2000. So, the output should be as follows:
EMP_ID WK_NUM NET_SPRD_LCL BONUSPAYOUT
123 18 3800 2000
If this meets the first requirement, the script should output and quit. If not, then I will look for the second requirement where Date Achieved is between 27 wks to 34 wks.
I hope I was able to explain my requirement clearly :-)
Thanks for the help.

Nice question! I racked my brain over situations such as 4 rows in a row with 3500 or more, and came up with this.
You can use CTE, recursive CTE and ROW_NUMBER():
;WITH cte AS(
SELECT EMP_ID,
WK_NUM,
NET_SPRD_LCL,
ROW_NUMBER() OVER (PARTITION BY EMP_ID ORDER BY WK_NUM) rn
FROM YourTable
)
, recur AS (
SELECT EMP_ID,
WK_NUM,
NET_SPRD_LCL,
rn,
1 as lev
FROM cte
WHERE rn = 1
UNION ALL
SELECT c.EMP_ID,
c.WK_NUM,
c.NET_SPRD_LCL,
c.rn,
CASE WHEN c.NET_SPRD_LCL < 3500 THEN Lev+1 ELSE Lev END
FROM cte c
INNER JOIN recur r
ON r.EMP_ID = c.EMP_ID AND r.rn+1 = c.rn
)
SELECT TOP 1 WITH TIES
EMP_ID,
WK_NUM,
NET_SPRD_LCL,
CASE WHEN WK_NUM < 27 THEN $2000
WHEN WK_NUM between 27 and 34 THEN $1000
ELSE $0 END as Bonus
FROM recur
WHERE NET_SPRD_LCL >= 3500
ORDER BY ROW_NUMBER() OVER(PARTITION BY EMP_ID,lev ORDER BY WK_NUM)%2
Output for data you provided:
EMP_ID WK_NUM NET_SPRD_LCL Bonus
123 18 3800 2000,00
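For comparison, here is a simpler sketch of one reading of the requirement: find the first week whose row and the immediately preceding row (by WK_NUM, per employee) are both at or above the goal, then pick the bonus tier from that week. It is not the recursive CTE above; it uses a self-join through a correlated subquery and runs on SQLite from Python. The table name `spread`, the `goal` parameter, and the "first qualifying pair wins" rule are assumptions.

```python
import sqlite3

def bonus_rows(rows, goal=3500):
    """rows: iterable of (EMP_ID, WK_NUM, NET_SPRD_LCL)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE spread (EMP_ID INTEGER, WK_NUM INTEGER, NET_SPRD_LCL INTEGER)")
    con.executemany("INSERT INTO spread VALUES (?, ?, ?)", rows)
    # Pair every row with the previous row of the same employee and keep
    # the pairs where both rows reach the goal.
    query = """
        SELECT cur.EMP_ID, cur.WK_NUM, cur.NET_SPRD_LCL
        FROM spread cur
        JOIN spread prev
          ON prev.EMP_ID = cur.EMP_ID
         AND prev.WK_NUM = (SELECT MAX(x.WK_NUM)
                            FROM spread x
                            WHERE x.EMP_ID = cur.EMP_ID
                              AND x.WK_NUM < cur.WK_NUM)
        WHERE cur.NET_SPRD_LCL >= ? AND prev.NET_SPRD_LCL >= ?
        ORDER BY cur.EMP_ID, cur.WK_NUM
    """
    first_hit = {}
    for emp_id, wk_num, net_spread in con.execute(query, (goal, goal)):
        first_hit.setdefault(emp_id, (wk_num, net_spread))  # keep earliest pair only
    con.close()

    result = []
    for emp_id, (wk_num, net_spread) in sorted(first_hit.items()):
        if wk_num < 27:
            bonus = 2000
        elif wk_num <= 34:
            bonus = 1000
        else:
            bonus = 0
        result.append((emp_id, wk_num, net_spread, bonus))
    return result

sample = [(123, 10, 0), (123, 11, 1500), (123, 15, 3600), (123, 18, 3800), (123, 19, 4000)]
print(bonus_rows(sample))
```

On the question's sample this picks week 18 for employee 123, the same row the question asks for. Rows are treated as consecutive when no other row for that employee lies between them, which mirrors the question's statement that holiday weeks are already excluded.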

SQL Nested Multiple Select Statement

I'm trying to create a SQL statement with multiple nested SELECT statements, to be used for a recordset import in Excel VBA.
What I want to do is create something like:
SELECT
N.LimitN,
Sum(N.amountN),
Sum(N1.amountN1)
FROM (
SELECT year as yearN, Sum(amount) as amountN, limit as limitN
FROM table1
WHERE year = 2013
GROUP BY year, limit) as N
JOIN (
SELECT year as yearN1, Sum(amount) as amountN1, limit as limitN1
FROM table1
WHERE year = 2014
GROUP BY year, limit) as N1
ON N.LimitN = N1.LimitN1
GROUP BY N.LimitN
ORDER BY N.LimitN;
So that if my Raw data is like this:
Year Amount Limit
2013 100 20
2013 90 30
2013 120 40
2013 5 20
2013 100 30
2013 105 40
2013 150 50
2014 115 20
2014 50 30
2014 95 40
2014 110 50
2014 30 20
My Resulting Table/record set will be like this:
Limit AmountN (i.e. 2013) Amount N1 (i.e. 2014)
20 105 145
30 190 50
40 225 95
50 150 110
Thanks in Advance
Peter

It feels like you're overcomplicating the query a little. What you want is just a year-wise sum of amount, grouped by limit. This can be done using a CASE:
SELECT
limit,
SUM(CASE WHEN year=2013 THEN amount ELSE 0 END) amountN,
SUM(CASE WHEN year=2014 THEN amount ELSE 0 END) amountN1
FROM myTable
GROUP BY limit
ORDER BY limit;
(if we're talking Access here, you will need to use IIF instead of CASE)
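The CASE-based pivot can be checked against the question's raw data with a short sketch using SQLite from Python. One assumption to flag: `limit` is a reserved word in SQLite (and in several other engines), so the sketch quotes it as `"limit"`; the table name `table1` comes from the question.

```python
import sqlite3

RAW = [
    (2013, 100, 20), (2013, 90, 30), (2013, 120, 40), (2013, 5, 20),
    (2013, 100, 30), (2013, 105, 40), (2013, 150, 50),
    (2014, 115, 20), (2014, 50, 30), (2014, 95, 40), (2014, 110, 50),
    (2014, 30, 20),
]

def amounts_by_limit(rows):
    """rows: iterable of (year, amount, limit)."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE table1 (year INTEGER, amount INTEGER, "limit" INTEGER)')
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT "limit",
               SUM(CASE WHEN year = 2013 THEN amount ELSE 0 END) AS amountN,
               SUM(CASE WHEN year = 2014 THEN amount ELSE 0 END) AS amountN1
        FROM table1
        GROUP BY "limit"
        ORDER BY "limit"
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(amounts_by_limit(RAW))
```

A single pass with two conditional sums also keeps limits that appear in only one of the two years, which the inner join of two subqueries in the question would drop.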
