I would like to identify the last state of a variable at the end of each month. For example:
Variable Date Operation State
A 01Jan2019 1 10
A 10Jan2019 3 20
A 31Jan2019 4 50
A 05Feb2019 7 60
A 22Feb2019 8 70
B 06Jan2019 2 10
B 07Jan2019 3 20
B 07Feb2019 6 60
B 15Mar2019 9 80
The result should look like:
Variable Month Year Last_State_Until_End_of_Month
A 1 2019 50
A 2 2019 70
A 3 2019 70
B 1 2019 20
B 2 2019 60
B 3 2019 80
Please note that for variable A the last state in March is the same as in February (no change was made during March). I don't know if it helps, but there is an operation ID that increases with each change of state, independent of the choice of variable.
Thanks to Panagiotis Kanavos, the following query works:
-- Table variable holding the sample data
declare @table_test TABLE (variable_name nvarchar(255), date_var date, op_id int, state int)
insert into @table_test Values
('A', '01Jan2019', 1 ,10),
('A', '10Jan2019', 3 ,20),
('A', '31Jan2019', 4 ,50),
('A', '05Feb2019', 7 ,60),
('A', '22Feb2019', 8 ,70),
('B', '06Jan2019', 2 ,10),
('B', '07Jan2019', 3 ,20),
('B', '07Feb2019', 6 ,60),
('B', '15Mar2019', 9 ,80)
select
year(date_var) as year
,month(date_var) as month
,variable_name
,Last_value(State) OVER (partition by year(date_var),month(date_var),variable_name order by date_var ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as last_value
from @table_test
Order by variable_name, year(date_var),month(date_var)
This query returns:
year month variable_name last_value
2019 1 A 50
2019 1 A 50
2019 1 A 50
2019 2 A 70
2019 2 A 70
2019 1 B 20
2019 1 B 20
2019 2 B 60
2019 3 B 80
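The query above returns one row per input row and has no row for A in March. As a minimal sketch of one way to get one row per variable and month with the last state carried forward, the example below runs a SQLite query through Python's sqlite3 module. The table name table_test, the ISO date strings, and the use of SQLite instead of SQL Server are assumptions made for illustration; the months are taken from the months that occur anywhere in the data.

```python
import sqlite3

# In-memory SQLite stand-in for the question's data (ISO dates sort correctly as text).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE table_test (variable_name TEXT, date_var TEXT, op_id INT, state INT)"
)
con.executemany(
    "INSERT INTO table_test VALUES (?, ?, ?, ?)",
    [
        ("A", "2019-01-01", 1, 10),
        ("A", "2019-01-10", 3, 20),
        ("A", "2019-01-31", 4, 50),
        ("A", "2019-02-05", 7, 60),
        ("A", "2019-02-22", 8, 70),
        ("B", "2019-01-06", 2, 10),
        ("B", "2019-01-07", 3, 20),
        ("B", "2019-02-07", 6, 60),
        ("B", "2019-03-15", 9, 80),
    ],
)

# Cross join every variable with every month seen in the data, then look up
# the most recent state recorded in or before that month.
query = """
WITH months AS (SELECT DISTINCT substr(date_var, 1, 7) AS ym FROM table_test),
     vars   AS (SELECT DISTINCT variable_name FROM table_test)
SELECT v.variable_name,
       CAST(substr(m.ym, 6, 2) AS INTEGER) AS month,
       CAST(substr(m.ym, 1, 4) AS INTEGER) AS year,
       (SELECT t.state
          FROM table_test AS t
         WHERE t.variable_name = v.variable_name
           AND substr(t.date_var, 1, 7) <= m.ym
         ORDER BY t.date_var DESC, t.op_id DESC
         LIMIT 1) AS last_state
FROM vars AS v CROSS JOIN months AS m
ORDER BY v.variable_name, m.ym
"""
last_states = con.execute(query).fetchall()
for row in last_states:
    print(row)
```

A month that has no rows for any variable would not appear at all with this approach; a separate calendar table would be needed in that case.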
I would like to find the highest month in each year and sum the value column for that month.
value Year Month
4 2019 10
1 2019 11
5 2019 11
1 2019 11
1 2019 12
8 2019 12
1 2019 12
1 2020 1
10 2020 1
3 2021 1
2 2021 2
11 2021 2
1 2021 2
3 2021 2
2 2021 3
From the table above, I would like to take the highest month in each year and sum the value column for that month.
For example, in 2019 the highest month is 12; there are 3 rows for that month and the sum of the value column is 10.
The output should be:
value Year Month
10 2019 12
11 2020 1
2 2021 3
Supposing that the table is called "example_table", you can use the following query:
select sum(example_table.value), example_table.year, example_table.month
from example_table
join (
select year, max(month) "month"
from example_table
group by year
) sub on example_table.year = sub.year and example_table.month = sub.month
group by example_table.year, example_table.month
order by example_table.year
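To check the query against the sample data, here is a small sketch that loads the rows into an in-memory SQLite database from Python. SQLite is used only as a convenient stand-in (an assumption, since the question does not name a database); the query text is the one from the answer.

```python
import sqlite3

# Sample rows from the question as (value, year, month).
rows = [
    (4, 2019, 10), (1, 2019, 11), (5, 2019, 11), (1, 2019, 11),
    (1, 2019, 12), (8, 2019, 12), (1, 2019, 12),
    (1, 2020, 1), (10, 2020, 1),
    (3, 2021, 1), (2, 2021, 2), (11, 2021, 2), (1, 2021, 2),
    (3, 2021, 2), (2, 2021, 3),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE example_table (value INT, year INT, month INT)")
con.executemany("INSERT INTO example_table VALUES (?, ?, ?)", rows)

# The subquery finds the highest month per year; the join keeps only those
# rows, and the outer GROUP BY sums their values.
query = """
select sum(example_table.value), example_table.year, example_table.month
from example_table
join (
    select year, max(month) "month"
    from example_table
    group by year
) sub on example_table.year = sub.year and example_table.month = sub.month
group by example_table.year, example_table.month
order by example_table.year
"""
result = con.execute(query).fetchall()
print(result)
```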
I have a data frame of sales with three columns: the code of the customer, the month the customer bought that item, and the year.
A customer can buy something in September and then make another purchase in December, and so appear twice. But I'm interested in knowing the completely new customers by month and year.
So I have thought of iterating over the rows with some checks, using the %in% function to build a boolean vector that tells me whether a customer is new or not, and then counting by month and year with SQL using this new vector.
But I'm wondering if there's a specific function or a better way to do that.
This is an example of the data I would like to have:
date cust month new_customer
1 14975 25 1 TRUE
2 14976 30 1 TRUE
3 14977 22 1 TRUE
4 14978 4 1 TRUE
5 14979 25 1 FALSE
6 14980 11 1 TRUE
7 14981 17 1 TRUE
8 14982 17 1 FALSE
9 14983 18 1 TRUE
10 14984 7 1 TRUE
11 14985 24 1 TRUE
12 14986 22 1 FALSE
To put it more simply: the data frame is sorted by date, and I'm interested in a vector (new_customer) that tells me whether the customer purchased something for the first time or not. For example, customer 25 bought something on the first day and then bought something again four days later, so is not a new customer on the second purchase. The same can be seen with customers 17 and 22.
I created dummy data myself with an id, a month in numeric format, and a year:
dat <-data.frame(
id = c(1,2,3,4,5,6,7,8,1,3,4,5,1,2,2),
month = c(1,6,7,8,2,3,4,8,11,1,10,9,1,12,2),
year = c(2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2021,2021,2021,2021,2021)
)
id month year
1 1 1 2019
2 2 6 2019
3 3 7 2019
4 4 8 2019
5 5 2 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
9 1 11 2020
10 3 1 2020
11 4 10 2021
12 5 9 2021
13 1 1 2021
14 2 12 2021
15 2 2 2021
Then group by id and arrange by year and month (the order matters). Then use filter and row_number() to keep the first row of each id.
library(dplyr)

dat %>%
group_by(id) %>%
arrange(year, month) %>%
filter(row_number() == 1)
id month year
<dbl> <dbl> <dbl>
1 1 1 2019
2 5 2 2019
3 2 6 2019
4 3 7 2019
5 4 8 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
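The question also asks for the new_customer flag itself rather than only the first row per customer. As a rough sketch in Python (not the R the question uses), a flag of that kind can be built by remembering which customers have already been seen while walking the rows in date order. The cust values are copied from the question's example; the variable names are illustrative.

```python
# Customer codes in date order, copied from the example in the question.
cust = [25, 30, 22, 4, 25, 11, 17, 17, 18, 7, 24, 22]

seen = set()          # customers that have already appeared
new_customer = []     # True on a customer's first purchase, False afterwards
for c in cust:
    new_customer.append(c not in seen)
    seen.add(c)

print(new_customer)
```

In base R the same flag can be obtained with !duplicated() applied to the customer column, provided the data frame is already sorted by date.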
Sample Code
You can adapt your code to follow this logic:
Create Table:
CREATE TABLE PURCHASE(Posting_Date DATE, Customer_Id VARCHAR(10), Customer_Name VARCHAR(15));
Insert Data Into Table
Posting_Date Customer_Id Customer_Name
2018-01-01 C_01 Jack
2018-02-01 C_01 Jack
2018-03-01 C_01 Jack
2018-04-01 C_02 James
2019-04-01 C_01 Jack
2019-05-01 C_01 Jack
2019-05-01 C_03 Gill
2020-01-01 C_02 James
2020-01-01 C_04 Jones
Code
WITH Date_CTE (PostingDate,CustomerID,FirstYear)
AS
(
SELECT MIN(Posting_Date) as [Date],
Customer_Id,
YEAR(MIN(Posting_Date)) as [F_Purchase_Year]
FROM PURCHASE
GROUP BY Customer_Id
)
SELECT T.[ActualYear],(CASE WHEN T.[Customer Status] = 'new' THEN COUNT(T.[Customer Status]) END) AS [New Customer]
FROM (
SELECT DISTINCT YEAR(T2.Posting_Date) AS [ActualYear],
T2.Customer_Id,
(CASE WHEN T1.FirstYear = YEAR(T2.Posting_Date) THEN 'new' ELSE 'old' END) AS [Customer Status]
FROM Date_CTE AS T1
left outer join PURCHASE AS T2 ON T1.CustomerID = T2.Customer_Id
) AS T
GROUP BY T.[ActualYear],T.[Customer Status]
Final Result
ActualYear New Customer
2018 2
2019 1
2020 1
2019 NULL
2020 NULL
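The NULL rows in the result come from grouping by the customer status as well as the year. If only the count of new customers per year is needed, a simpler shape is to compute each customer's first purchase year and count customers per first year. The sketch below tries that idea in SQLite through Python; SQLite, the strftime call, and the column alias first_year are assumptions for illustration, not part of the answer above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE PURCHASE (Posting_Date TEXT, Customer_Id TEXT, Customer_Name TEXT)"
)
con.executemany(
    "INSERT INTO PURCHASE VALUES (?, ?, ?)",
    [
        ("2018-01-01", "C_01", "Jack"),
        ("2018-02-01", "C_01", "Jack"),
        ("2018-03-01", "C_01", "Jack"),
        ("2018-04-01", "C_02", "James"),
        ("2019-04-01", "C_01", "Jack"),
        ("2019-05-01", "C_01", "Jack"),
        ("2019-05-01", "C_03", "Gill"),
        ("2020-01-01", "C_02", "James"),
        ("2020-01-01", "C_04", "Jones"),
    ],
)

# Inner query: one row per customer with the year of the first purchase.
# Outer query: number of customers whose first purchase falls in each year.
query = """
SELECT first_year, COUNT(*) AS new_customers
FROM (
    SELECT Customer_Id,
           CAST(strftime('%Y', MIN(Posting_Date)) AS INTEGER) AS first_year
    FROM PURCHASE
    GROUP BY Customer_Id
)
GROUP BY first_year
ORDER BY first_year
"""
new_per_year = con.execute(query).fetchall()
print(new_per_year)
```

A year in which no customer made a first purchase would simply be missing from this result.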
I'm trying to get a running total as of a date. This is the data I have
Date    Transaction Amount    End of Week Balance
jan 1   5                     100
jan 2   3                     100
jan 3   4                     100
jan 4   3                     100
jan 5   1                     100
jan 6   3                     100
I would like to find out what the daily end balance is. My thought is to get a running total from each day to the end of the week and subtract it from the end of week balance, like below
Date    Transaction Amount    Running total    End of Week Balance    Balance - Running total
jan 1   5                     19               100                    86
jan 2   3                     14               100                    89
jan 3   4                     11               100                    93
jan 4   3                     7                100                    96
jan 5   1                     4                100                    97
jan 6   3                     3                100                    100
I can use
SUM(transactionAmount) OVER (Order by Date)
to get a running total. Is there a way to specify that I only want the total of transactions that have taken place after the date?
You can use sum() as a window function, but accumulate in reverse:
select t.*,
(end_of_week_balance -
sum(transactionAmount) over (order by date desc)
)
from t;
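One detail worth checking is whether the current day's transaction should be part of the reversed sum. With order by date desc and the default frame, the current row is included, which gives 81 for jan 1; the last column in the question (86, 89, ...) corresponds to summing only the transactions after the date. The sketch below compares both variants in SQLite via Python. The integer day column and the names incl and excl are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE t (day INT, transactionAmount INT, end_of_week_balance INT)"
)
# day 1..6 stands in for jan 1..jan 6 from the question.
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 5, 100), (2, 3, 100), (3, 4, 100), (4, 3, 100), (5, 1, 100), (6, 3, 100)],
)

query = """
SELECT day,
       -- reverse running total including the current day
       end_of_week_balance
         - SUM(transactionAmount) OVER (ORDER BY day DESC) AS incl,
       -- reverse running total of later days only (empty frame on the last day)
       end_of_week_balance
         - COALESCE(SUM(transactionAmount) OVER (
               ORDER BY day DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS excl
FROM t
ORDER BY day
"""
balances = con.execute(query).fetchall()
for row in balances:
    print(row)
```

The excl column reproduces the "Balance - Running total" column from the question, while incl is what the shorter query yields.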
If you have this example:
1> select i, sum(i) over (order by i) S from integers where i<10;
2> go
i S
----------- -----------
1 1
2 3
3 6
4 10
5 15
6 21
7 28
8 36
9 45
you can also do:
1> select i, sum(case when i>3 then i else 0 end) over (order by i) S from integers where i<10;
2> go
i S
----------- -----------
1 0
2 0
3 0
4 4
5 9
6 15
7 22
8 30
9 39
Good morning everybody.
I am working with the latest version of SQL Workbench.
I have a table grouped by Year and Week, and by type of document verification (3 types in total).
Year Week Verif_Type Total
2020 1 1 3
2020 1 3 1
2020 2 1 1
2020 2 2 6
2020 2 3 3
I want to know what percentage of the verifications each type accounts for, per week and year.
My question is: how can I add a percent column, such that the result looks like this?
Year Week Verif_Type Total Percent
2020 1 1 3 75
2020 1 3 1 25
2020 2 1 1 10
2020 2 2 6 60
2020 2 3 3 30
I have already computed the total count per week, but that table has a different size, so I can't combine the two directly.
Thank you for your help :)
For MySQL version 8.0 use Query#1, which uses a window function; for older versions of MySQL use Query#2 (using a subquery).
Schema and insert statements:
create table yourtable(Year int, Week int, Verif_Type int, Total int);
insert into yourtable values(2020, 1, 1, 3);
insert into yourtable values(2020, 1, 3, 1);
insert into yourtable values(2020, 2, 1, 1);
insert into yourtable values(2020, 2, 2, 6);
insert into yourtable values(2020, 2, 3, 3);
Query#1 (using window function)
Select year, week, verif_type, total, (total*100/sum(total)over(partition by year, week)) percent
From yourtable
Output:
year week verif_type total percent
2020 1 1 3 75.0000
2020 1 3 1 25.0000
2020 2 1 1 10.0000
2020 2 2 6 60.0000
2020 2 3 3 30.0000
Query#2 (using subquery)
select year, week, verif_type, total, (total*100/(select sum(total) from yourtable b where a.year=b.year and a.Week=b.Week)) percent
From yourtable a
Output:
year week verif_type total percent
2020 1 1 3 75.0000
2020 1 3 1 25.0000
2020 2 1 1 10.0000
2020 2 2 6 60.0000
2020 2 3 3 30.0000
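For readers without a MySQL instance at hand, the window-function variant can be tried in SQLite through Python, as sketched below. SQLite is an assumption here, and because SQLite does integer division on integers, the sketch multiplies by 100.0 instead of 100; the table and column names follow the answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE yourtable (Year INT, Week INT, Verif_Type INT, Total INT)"
)
con.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [(2020, 1, 1, 3), (2020, 1, 3, 1), (2020, 2, 1, 1), (2020, 2, 2, 6), (2020, 2, 3, 3)],
)

# Each row's total divided by the sum of totals in the same year and week.
query = """
SELECT Year, Week, Verif_Type, Total,
       Total * 100.0 / SUM(Total) OVER (PARTITION BY Year, Week) AS percent
FROM yourtable
ORDER BY Year, Week, Verif_Type
"""
percents = con.execute(query).fetchall()
for row in percents:
    print(row)
```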
I need to find a moving average for the previous 12 rows. I need to have my result set look like this.
t Year Month Sales MovingAverage
1 2010 3 20 NULL
2 2010 4 22 NULL
3 2010 5 24 NULL
4 2010 6 25 NULL
5 2010 7 23 NULL
6 2010 8 26 NULL
7 2010 9 28 NULL
8 2010 10 26 NULL
9 2010 11 29 NULL
10 2010 12 27 NULL
11 2011 1 28 NULL
12 2011 2 30 NULL
13 2011 3 27 25.67
14 2011 4 29 26.25
15 2011 5 26 26.83
For row 13 I need to average rows 1 to 12 and have the result returned in row 13, column MovingAverage. Rows 1-12 have a MovingAverage of NULL because there should be at least 12 previous rows for the calculation. The columns t, Year, Month, and Sales already exist. I need to create the MovingAverage column. I am using PostgreSQL, but the syntax should be very similar elsewhere.
Don't use the lag() function. There is a built-in moving average function. Well, almost:
select t.*, avg(sales) over (order by t range between 12 preceding and current row)
from table t;
The problem is that this will produce an average for the first 11 months. To prevent that:
select t.*,
(case when row_number() over (order by t) >= 12
then avg(sales) over (order by t range between 12 preceding and current row)
end) as MovingAvg
from table t;
Note that the syntax rows between instead of range between would be very similar for this query.
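The expected output in the question averages the 12 rows before the current row and leaves the current row out (row 13 shows the average of rows 1 to 12). A frame ending at 1 preceding expresses that directly. The sketch below checks this against the sample figures using SQLite through Python; SQLite, the table name sales_table, and the count-based NULL guard are assumptions for illustration, not the query from the answer above.

```python
import sqlite3

# Sales for t = 1..15, copied from the question.
sales = [20, 22, 24, 25, 23, 26, 28, 26, 29, 27, 28, 30, 27, 29, 26]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_table (t INT, sales INT)")
con.executemany(
    "INSERT INTO sales_table VALUES (?, ?)",
    list(enumerate(sales, start=1)),
)

# The frame covers the 12 rows before the current one. The CASE returns NULL
# until the frame actually holds 12 rows.
query = """
SELECT t,
       CASE WHEN COUNT(*) OVER (ORDER BY t ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING) = 12
            THEN ROUND(AVG(sales) OVER (ORDER BY t ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING), 2)
       END AS MovingAverage
FROM sales_table
ORDER BY t
"""
moving = con.execute(query).fetchall()
for row in moving:
    print(row)
```

With this frame the first twelve rows come back as NULL and rows 13 to 15 match the MovingAverage values listed in the question.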