Aggregate values with multiple conditions - sql

I've searched the forum but can't quite find what I'm looking for. Apologies if this has already been answered.
I have a table with the following example values:
FY Period Version Value
2013 3 1 9954
2013 3 2 9954
2013 4 1 11498
2013 4 2 11498
2013 4 3 11498
2014 1 1 448
2014 1 2 448
2014 1 3 0
2014 2 1 3150
2014 2 2 3150
2014 3 1 0
2014 3 2 0
2014 3 3 5059
2014 4 1 11118
2014 4 2 0
2014 4 3 11118
I'm looking to sum the values for the highest version number, within each period and each FY, so the expected result for this particular data set would be:
(9954 + 11498 + 0 + 3150 + 5059 + 11118) = 40,779
I've done something similar before with the over (partition by ...) approach, but I can't get it to work on this data set. Any pointers would be greatly appreciated.

A simple approach is to use row_number():
select sum(value)
from (select t.*,
row_number() over (partition by fy, period order by version desc) as seqnum
from your_table t   -- "table" is a reserved word; replace your_table with the real table name
) t
where seqnum = 1;
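To check the row_number() approach against the expected total, here is a minimal sketch that runs the same query on the question's sample rows. It assumes SQLite (version 3.25 or later, which added window functions) through Python's standard `sqlite3` module; the table name `versions` and the lower-case column names are made up for the example.

```python
import sqlite3

# Sample rows from the question: (FY, Period, Version, Value)
ROWS = [
    (2013, 3, 1, 9954), (2013, 3, 2, 9954),
    (2013, 4, 1, 11498), (2013, 4, 2, 11498), (2013, 4, 3, 11498),
    (2014, 1, 1, 448), (2014, 1, 2, 448), (2014, 1, 3, 0),
    (2014, 2, 1, 3150), (2014, 2, 2, 3150),
    (2014, 3, 1, 0), (2014, 3, 2, 0), (2014, 3, 3, 5059),
    (2014, 4, 1, 11118), (2014, 4, 2, 0), (2014, 4, 3, 11118),
]

def sum_latest_versions(rows):
    """Sum Value over the highest Version within each (FY, Period) group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE versions (fy INT, period INT, version INT, value INT)")
    con.executemany("INSERT INTO versions VALUES (?, ?, ?, ?)", rows)
    (total,) = con.execute("""
        SELECT SUM(value)
        FROM (SELECT v.*,
                     -- 1 marks the highest version in each fy/period
                     ROW_NUMBER() OVER (PARTITION BY fy, period
                                        ORDER BY version DESC) AS seqnum
              FROM versions v) AS ranked
        WHERE seqnum = 1
    """).fetchone()
    con.close()
    return total

print(sum_latest_versions(ROWS))  # 40779
```

Ordering by version descending and keeping `seqnum = 1` picks exactly one row per group even when older versions carry the same value, which is why 2014 period 1 contributes 0 rather than 448.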

Related

R - get a vector that tells me whether a value of another vector is its first appearance or not

I have a data frame of sales with three columns: the code of the customer, the month the customer bought that item, and the year.
A customer can buy something in September and then make another purchase in December, so they appear twice. But I'm interested in knowing the genuinely new customers by month and year.
So I thought of iterating with some checks, using the %in% operator to build a boolean vector that tells me whether a customer is new, and then counting by month and year with SQL using this new vector.
But I'm wondering if there's a specific function or a better way to do that.
This is an example of the data I would like to have:
date cust month new_customer
1 14975 25 1 TRUE
2 14976 30 1 TRUE
3 14977 22 1 TRUE
4 14978 4 1 TRUE
5 14979 25 1 FALSE
6 14980 11 1 TRUE
7 14981 17 1 TRUE
8 14982 17 1 FALSE
9 14983 18 1 TRUE
10 14984 7 1 TRUE
11 14985 24 1 TRUE
12 14986 22 1 FALSE
To put it more simply: the data frame is sorted by date, and I want a vector (new_customer) that tells me whether the customer purchased something for the first time. For example, customer 25 bought something on the first day and again four days later, so on the second purchase they are not a new customer. The same applies to customers 17 and 22.
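For the flag itself, base R's `!duplicated(cust)` returns TRUE exactly where a value has not occurred earlier in the vector. The same single-pass logic is sketched below in Python for illustration; the `cust` list is copied from the example table, which is already sorted by date, and the function name is hypothetical.

```python
def first_appearance_flags(customers):
    """Return True where a customer id appears for the first time, False otherwise."""
    seen = set()
    flags = []
    for cust in customers:
        flags.append(cust not in seen)  # new only if not seen earlier in date order
        seen.add(cust)
    return flags

# cust column from the example table above (sorted by date)
cust = [25, 30, 22, 4, 25, 11, 17, 17, 18, 7, 24, 22]
print(first_appearance_flags(cust))
```

The result depends on the rows being sorted by date first; on unsorted data the "first" appearance is just the first row encountered.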
I created dummy data myself with id, month (numeric), and year:
dat <-data.frame(
id = c(1,2,3,4,5,6,7,8,1,3,4,5,1,2,2),
month = c(1,6,7,8,2,3,4,8,11,1,10,9,1,12,2),
year = c(2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2021,2021,2021,2021,2021)
)
id month year
1 1 1 2019
2 2 6 2019
3 3 7 2019
4 4 8 2019
5 5 2 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
9 1 11 2020
10 3 1 2020
11 4 10 2021
12 5 9 2021
13 1 1 2021
14 2 12 2021
15 2 2 2021
Then, group by id and arrange by year and month (order is meaningful). Then use filter and row_number().
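library(dplyr)  # provides %>%, group_by(), arrange(), filter(), row_number()
dat %>%
  group_by(id) %>%
  arrange(year, month) %>%   # sorts the whole table; the id grouping is kept
  filter(row_number() == 1)  # first row per id = that customer's first purchase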
id month year
<dbl> <dbl> <dbl>
1 1 1 2019
2 5 2 2019
3 2 6 2019
4 3 7 2019
5 4 8 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
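The same first-purchase logic, plus the per-month count the question ultimately asks for, can be sketched in Python for illustration. `DAT` copies the dummy data above; the function names are hypothetical.

```python
from collections import Counter

# Dummy data from above: (id, month, year)
DAT = [(1, 1, 2019), (2, 6, 2019), (3, 7, 2019), (4, 8, 2019), (5, 2, 2019),
       (6, 3, 2020), (7, 4, 2020), (8, 8, 2020), (1, 11, 2020), (3, 1, 2020),
       (4, 10, 2021), (5, 9, 2021), (1, 1, 2021), (2, 12, 2021), (2, 2, 2021)]

def first_purchases(rows):
    """Keep each id's earliest (year, month) row, like arrange() + filter(row_number() == 1)."""
    first = {}
    for cust_id, month, year in sorted(rows, key=lambda r: (r[2], r[1])):
        first.setdefault(cust_id, (cust_id, month, year))  # later rows for the id are ignored
    return sorted(first.values(), key=lambda r: (r[2], r[1]))

def new_customers_per_month(rows):
    """Count how many customers made their first purchase in each (year, month)."""
    return Counter((year, month) for _, month, year in first_purchases(rows))

print(first_purchases(DAT))
print(new_customers_per_month(DAT))
```

Counting the first-purchase rows per (year, month) gives the number of new customers per period directly, without building a separate boolean vector.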
Sample Code
You can adapt your code according to this logic:
Create the table:
CREATE TABLE PURCHASE(Posting_Date DATE,Customer_Id VARCHAR(10),Customer_Name VARCHAR(15));
Sample data in the table:
Posting_Date Customer_Id Customer_Name
2018-01-01 C_01 Jack
2018-02-01 C_01 Jack
2018-03-01 C_01 Jack
2018-04-01 C_02 James
2019-04-01 C_01 Jack
2019-05-01 C_01 Jack
2019-05-01 C_03 Gill
2020-01-01 C_02 James
2020-01-01 C_04 Jones
Code
WITH Date_CTE (PostingDate,CustomerID,FirstYear)
AS
(
SELECT MIN(Posting_Date) as [Date],
Customer_Id,
YEAR(MIN(Posting_Date)) as [F_Purchase_Year]
FROM PURCHASE
GROUP BY Customer_Id
)
SELECT T.[ActualYear],(CASE WHEN T.[Customer Status] = 'new' THEN COUNT(T.[Customer Status]) END) AS [New Customer]
FROM (
SELECT DISTINCT YEAR(T2.Posting_Date) AS [ActualYear],
T2.Customer_Id,
(CASE WHEN T1.FirstYear = YEAR(T2.Posting_Date) THEN 'new' ELSE 'old' END) AS [Customer Status]
FROM Date_CTE AS T1
left outer join PURCHASE AS T2 ON T1.CustomerID = T2.Customer_Id
) AS T
GROUP BY T.[ActualYear],T.[Customer Status]
Final Result
ActualYear New Customer
2018 2
2019 1
2020 1
2019 NULL
2020 NULL
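A customer is new exactly in the year of their first purchase, so a simpler variant counts first-purchase years directly and never produces the `NULL` rows for returning customers. The sketch below runs it on SQLite through Python's `sqlite3` module; the lower-case table and column names are adaptations for SQLite, and `strftime('%Y', ...)` stands in for SQL Server's `YEAR(...)`.

```python
import sqlite3

# Sample data from above: (Posting_Date, Customer_Id, Customer_Name)
PURCHASES = [
    ("2018-01-01", "C_01", "Jack"), ("2018-02-01", "C_01", "Jack"),
    ("2018-03-01", "C_01", "Jack"), ("2018-04-01", "C_02", "James"),
    ("2019-04-01", "C_01", "Jack"), ("2019-05-01", "C_01", "Jack"),
    ("2019-05-01", "C_03", "Gill"), ("2020-01-01", "C_02", "James"),
    ("2020-01-01", "C_04", "Jones"),
]

def new_customers_per_year(rows):
    """Return (year, number of customers whose first purchase fell in that year)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE purchase (posting_date TEXT, customer_id TEXT, customer_name TEXT)")
    con.executemany("INSERT INTO purchase VALUES (?, ?, ?)", rows)
    result = con.execute("""
        WITH first_purchase AS (
            SELECT customer_id, MIN(posting_date) AS first_date
            FROM purchase
            GROUP BY customer_id
        )
        SELECT CAST(strftime('%Y', first_date) AS INTEGER) AS actual_year,
               COUNT(*) AS new_customers
        FROM first_purchase
        GROUP BY actual_year
        ORDER BY actual_year
    """).fetchall()
    con.close()
    return result

print(new_customers_per_year(PURCHASES))  # [(2018, 2), (2019, 1), (2020, 1)]
```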

Compare data for a specific column grouping and update based on criteria

I have a table with the following structure:
Employee Project Task Accomplishment Score Year
John A 1 5 60 2016
John A 1 6 40 2018
John A 2 3 30 2016
Simon B 2 0 30 2017
Simon B 2 4 30 2019
David C 1 3 20 2015
David C 1 2 40 2016
David C 3 0 25 2017
David C 3 5 35 2017
I want to create a view with Oracle SQL, based on the table above, that looks like this:
Employee Project Task Accomplishment Score Year UpdateScore Comment
John A 1 5 60 2016 60
John A 1 6 40 2018 100 (=60+40)
John A 2 3 30 2016 30
Simon B 2 0 30 2017 30
Simon B 2 4 40 2019 40 (no update because Accomplishment was 0)
David C 1 3 20 2015 20
David C 1 2 40 2016 60 (=20+40)
David C 3 0 25 2017 25
David C 3 5 35 2017 35 (no update because Accomplishment was 0)
The grouping is Employee-Project-Task.
The rule for the UpdateScore column:
If, for a specific Employee-Project-Task group, the Accomplishment value in the previous year is greater than 0, add the previous year's Score to the latest year's Score for the same group.
For example, John-A-1 is a group distinct from John-A-2. For John-A-1 the Accomplishment is 5 (greater than 0) in 2016, so we add the 2016 Score to the 2018 Score for John-A-1, and the updated score becomes 100.
For Simon-B-2, the accomplishment was 0, so there will be no update for 2019 for Simon-B-2.
Note: I don't need the Comment field, it is there just for more clarification.
Use analytic functions to determine if there was a score for the previous year, and if so, add it to the UpdatedScore.
select Employee, Project, Task, Accomplishment, Score, Year,
case when lag(Year) over (partition by Employee, Project, Task order by Year) = Year - 1
then lag(Score) over (partition by Employee, Project, Task order by Year)
else 0
end + Score as UpdatedScore
from EmployeeScore;
This is a bit strange: you count an accomplishment of 0 in one year but not in the next. Okay.
Use analytic functions:
select t.*,
(case when lag(accomplishment) over (partition by Employee, Project, Task order by year) > 0
then lag(score) over (partition by Employee, Project, Task order by year)
else 0
end) + score as update_score
from t;
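A minimal sketch of the lag() answer above, run on SQLite (3.25 or later) through Python's `sqlite3` module with the question's input rows. The lower-case names are adaptations for the example. The two David-C-3 rows are left out because both are dated 2017, and `order by year` alone does not define which of them comes first, so lag() could return either one.

```python
import sqlite3

# Input rows from the question: (employee, project, task, accomplishment, score, year),
# without the two David-C-3 rows that tie on year 2017
SCORES = [
    ("John", "A", 1, 5, 60, 2016), ("John", "A", 1, 6, 40, 2018),
    ("John", "A", 2, 3, 30, 2016), ("Simon", "B", 2, 0, 30, 2017),
    ("Simon", "B", 2, 4, 30, 2019), ("David", "C", 1, 3, 20, 2015),
    ("David", "C", 1, 2, 40, 2016),
]

def update_scores(rows):
    """Add the previous row's score when its accomplishment was > 0, per employee-project-task."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employee_score (employee TEXT, project TEXT, task INT, "
                "accomplishment INT, score INT, year INT)")
    con.executemany("INSERT INTO employee_score VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute("""
        SELECT employee, project, task, year,
               (CASE WHEN LAG(accomplishment) OVER (PARTITION BY employee, project, task
                                                     ORDER BY year) > 0
                     THEN LAG(score) OVER (PARTITION BY employee, project, task ORDER BY year)
                     ELSE 0  -- first row of a group (LAG is NULL) or accomplishment was 0
                END) + score AS update_score
        FROM employee_score
        ORDER BY employee, project, task, year
    """).fetchall()
    con.close()
    return result

for row in update_scores(SCORES):
    print(row)
```

If same-year rows matter, adding a tiebreaker column to the `order by` inside the window makes the result deterministic.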

Add column value to next column in SQL

My SQL table is:
Week Year Applications
1 2017 0
2 2017 10
3 2017 20
4 2017 50
5 2017 0
1 2018 10
2 2018 0
3 2018 40
4 2018 50
5 2018 10
And I want an SQL query that gives the output below:
Week Year Applications
1 2017 0
2 2017 10
3 2017 30
4 2017 80
5 2017 80
1 2018 10
2 2018 10
3 2018 50
4 2018 100
5 2018 110
Can anyone help me write this query?
You could use SUM() OVER to get cumulative sum:
SELECT *, SUM(Applications) OVER(PARTITION BY Year ORDER BY Week) AS CumulativeApplications
FROM tab
It looks like you want a cumulative sum:
select week, year,
sum(applications) over (partition by year order by week) as cumulative_applications
from t;
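A quick way to confirm the cumulative-sum query reproduces the expected output is to run it on the question's data. This sketch uses SQLite (3.25 or later) through Python's `sqlite3` module; the table name `t` follows the answer above.

```python
import sqlite3

# (week, year, applications) from the question
APPLICATIONS = [(1, 2017, 0), (2, 2017, 10), (3, 2017, 20), (4, 2017, 50), (5, 2017, 0),
                (1, 2018, 10), (2, 2018, 0), (3, 2018, 40), (4, 2018, 50), (5, 2018, 10)]

def cumulative_applications(rows):
    """Running total of applications per year, ordered by week."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (week INT, year INT, applications INT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT week, year,
               SUM(applications) OVER (PARTITION BY year ORDER BY week)
                 AS cumulative_applications
        FROM t
        ORDER BY year, week
    """).fetchall()
    con.close()
    return result

for row in cumulative_applications(APPLICATIONS):
    print(row)
```

With only `ORDER BY` in the window, the default frame runs from the start of the partition to the current row and includes ties, so two rows with the same week in one year would get the same combined total. Adding `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` counts strictly row by row instead.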

SQL Query Return 0 on weeks in between

I have this query that works, but the result is not what I want:
it returns only the years and weeks that have data, and I want it to return 0 for the weeks in between.
For example, this returns:
year week totalstop
2017 50 7
2018 1 3
2018 3 5
but I want it to return:
year week totalstop
2017 50 7
2017 51 0
2017 52 0
2018 1 3
2018 2 0
2018 3 5
and so on.
Here is the current query:
SELECT year(date1) [year], datepart(week,date1) [week], sum(stop) totalstop
from Table1 where
building in (select item from dbo.fn_Split('A1,A2,A3,A4,A5',','))
and
date1 between '2017-12-12' and '2018-05-08'
and grp = 1
group by year(date1),datepart(week,date1)
order by year(date1),[week]
I am using MS SQL Server 2016.
I need help modifying it to fit my needs, as I'm out of ideas at the moment.
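The usual technique is to generate every (year, week) bucket in the date range first and LEFT JOIN the aggregated totals onto it, using COALESCE to turn missing weeks into 0. Below is a minimal sketch of that idea on SQLite through Python's `sqlite3` module; `table1`, `date1`, and `stop` mirror the question, and the sample rows are made up. One loud assumption: week numbers here come from SQLite's `strftime('%W', ...)` (Monday-based), which does not match SQL Server's `DATEPART(week, ...)`. A SQL Server port would build the day list with a recursive CTE using `DATEADD(day, 1, d)` (with `OPTION (MAXRECURSION 0)` for long ranges) or use a calendar table.

```python
import sqlite3

def weekly_totals_with_gaps(stops, start, end):
    """Return (year, week, totalstop) for every week in [start, end], with 0 for empty weeks."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (date1 TEXT, stop INT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?)", stops)
    result = con.execute("""
        WITH RECURSIVE days(d) AS (          -- one row per day in the range
            SELECT date(:start)
            UNION ALL
            SELECT date(d, '+1 day') FROM days WHERE d < date(:end)
        ),
        weeks AS (                           -- every (year, week) bucket in the range
            SELECT DISTINCT CAST(strftime('%Y', d) AS INTEGER) AS yr,
                            CAST(strftime('%W', d) AS INTEGER) AS wk
            FROM days
        ),
        totals AS (                          -- the original aggregation
            SELECT CAST(strftime('%Y', date1) AS INTEGER) AS yr,
                   CAST(strftime('%W', date1) AS INTEGER) AS wk,
                   SUM(stop) AS total
            FROM table1
            WHERE date1 BETWEEN :start AND :end
            GROUP BY yr, wk
        )
        SELECT w.yr, w.wk, COALESCE(t.total, 0) AS totalstop
        FROM weeks AS w
        LEFT JOIN totals AS t ON t.yr = w.yr AND t.wk = w.wk
        ORDER BY w.yr, w.wk
    """, {"start": start, "end": end}).fetchall()
    con.close()
    return result

stops = [("2017-12-12", 7), ("2018-01-02", 3), ("2018-01-16", 5)]
print(weekly_totals_with_gaps(stops, "2017-12-12", "2018-01-20"))
```

Because the week list is driven by the calendar rather than by the data, weeks with no stops still appear, which is exactly what the inner aggregation alone cannot produce.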

Using the lag function to find a moving average in SQL

I need to find a moving average over the previous 12 rows, and I need my result set to look like this:
t Year Month Sales MovingAverage
1 2010 3 20 NULL
2 2010 4 22 NULL
3 2010 5 24 NULL
4 2010 6 25 NULL
5 2010 7 23 NULL
6 2010 8 26 NULL
7 2010 9 28 NULL
8 2010 10 26 NULL
9 2010 11 29 NULL
10 2010 12 27 NULL
11 2011 1 28 NULL
12 2011 2 30 NULL
13 2011 3 27 25.67
14 2011 4 29 26.25
15 2011 5 26 26.83
For row 13 I need the average of rows 1 to 12, returned in the MovingAverage column of row 13. Rows 1-12 have a MovingAverage of NULL because at least 12 previous rows are needed for the calculation. The columns t, Year, Month, and Sales already exist; I need to create the MovingAverage column. I am using PostgreSQL, but the syntax should be very similar.
Don't use the lag() function. A window frame gives you a built-in moving average. Well, almost:
select t.*,
       avg(sales) over (order by t rows between 12 preceding and 1 preceding) as MovingAvg
from your_table t;
The problem is that this will also produce an average (of fewer than 12 rows) for rows 2 through 12. To prevent that:
select t.*,
       (case when row_number() over (order by t) > 12
             then avg(sales) over (order by t rows between 12 preceding and 1 preceding)
        end) as MovingAvg
from your_table t;
The frame ends at 1 preceding so that row 13 averages rows 1 to 12, as in the expected output. Because t has no gaps, range between instead of rows between (PostgreSQL 11 or later) would give the same result for this query.
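To confirm the frame boundaries line up with the expected output (row 13 averaging rows 1 to 12), here is a sketch on SQLite (3.25 or later) through Python's `sqlite3` module with the question's first 15 rows. The table name `sales` is an assumption, and `ROUND(..., 2)` is added only to match the two-decimal display in the question.

```python
import sqlite3

# (t, year, month, sales) from the question
SALES = [(1, 2010, 3, 20), (2, 2010, 4, 22), (3, 2010, 5, 24), (4, 2010, 6, 25),
         (5, 2010, 7, 23), (6, 2010, 8, 26), (7, 2010, 9, 28), (8, 2010, 10, 26),
         (9, 2010, 11, 29), (10, 2010, 12, 27), (11, 2011, 1, 28), (12, 2011, 2, 30),
         (13, 2011, 3, 27), (14, 2011, 4, 29), (15, 2011, 5, 26)]

def moving_average(rows):
    """Average of the 12 previous rows; NULL (None) until 12 previous rows exist."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (t INT, year INT, month INT, sales INT)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    result = con.execute("""
        SELECT t,
               CASE WHEN ROW_NUMBER() OVER (ORDER BY t) > 12
                    THEN ROUND(AVG(sales) OVER (ORDER BY t
                               ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING), 2)
               END AS moving_avg
        FROM sales
        ORDER BY t
    """).fetchall()
    con.close()
    return result

for t, avg in moving_average(SALES):
    print(t, avg)
```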