In SQL Server, I maintain the following data:
S.No  Year  Component1  Total Run years
1     2011  AAA         3
2     2011  BBB         5
3     2011  CCC         7
4     2012  AAA         6
5     2012  BBB         2
6     2012  CCC         4
7     2013  AAA         3
8     2013  BBB         2
9     2013  CCC         5
I would like to calculate the cumulative Total Run years for each Component1, accumulated year by year.
The required result looks like this:
Year  Component1  Total Run years
2011  AAA         3
2011  BBB         5
2011  CCC         7
2012  AAA         9
2012  BBB         7
2012  CCC         11
2013  AAA         12
2013  BBB         9
2013  CCC         16
The following should easily solve your problem.
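-- [table] is a placeholder for the actual table name.
-- Names that contain spaces or are reserved words must be bracketed.
SELECT
    [Year],
    Component1,
    [Total Run years],
    SUM([Total Run years]) OVER (
        PARTITION BY Component1
        ORDER BY [Year]
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS CUMULATIVE_TOTAL
FROM [table]
GROUP BY [Year], Component1, [Total Run years]
ORDER BY [Year], Component1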
See here https://stevestedman.com/2013/04/rows-and-range-preceding-and-following/ for more examples using ROWS, PRECEDING, FOLLOWING.
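For readers who want to check the expected numbers outside the database, here is a minimal Python sketch of the same running total, using the sample rows from the question. The function name cumulative_by_component is made up for this illustration; it is not part of any library.

```python
from collections import defaultdict

# Sample rows from the question: (year, component, total_run_years).
rows = [
    (2011, "AAA", 3), (2011, "BBB", 5), (2011, "CCC", 7),
    (2012, "AAA", 6), (2012, "BBB", 2), (2012, "CCC", 4),
    (2013, "AAA", 3), (2013, "BBB", 2), (2013, "CCC", 5),
]


def cumulative_by_component(rows):
    """Return (year, component, running_total) tuples ordered by year, component."""
    running = defaultdict(int)  # running total per component
    result = []
    # Sorting by year first plays the role of ORDER BY Year in the window.
    for year, component, value in sorted(rows):
        running[component] += value
        result.append((year, component, running[component]))
    return result


cumulative = cumulative_by_component(rows)
for row in cumulative:
    print(*row)
```

The dictionary keyed by component corresponds to PARTITION BY Component1, and processing the rows in year order corresponds to the ORDER BY inside the window.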
I have a pandas df of the following format
STOCK YR MONTH DAY PRICE
AAA 2022 1 1 10
AAA 2022 1 2 11
AAA 2022 1 3 10
AAA 2022 1 4 15
AAA 2022 1 5 10
BBB 2022 1 1 5
BBB 2022 1 2 10
BBB 2022 2 1 10
BBB 2022 2 2 15
What I am looking to do is to filter this df such that I am grouping by STOCK and YR and MONTH and selecting the groups with 3 or more entries.
So the resulting df looks like
STOCK YR MONTH DAY PRICE
AAA 2022 1 1 10
AAA 2022 1 2 11
AAA 2022 1 3 10
AAA 2022 1 4 15
AAA 2022 1 5 10
Note that BBB is eliminated as it had only 2 rows in each group, when grouped by STOCK, YR and MONTH
I tried df.groupby(['STOCK','YR','MONTH']).filter(lambda x: x.STOCK.nunique() > 5), but this resulted in an empty frame.
I also tried df.groupby(['STOCK','YR','MONTH']).filter(lambda x: x['STOCK','YR','MONTH'].nunique() > 5), but this raised KeyError: ('STOCK', 'YR', 'MONTH')
Thanks!
Use GroupBy.transform('count'):
df[df.groupby(['STOCK', 'YR', 'MONTH'])['STOCK'].transform('count').ge(3)]
or 'size':
df[df.groupby(['STOCK', 'YR', 'MONTH'])['STOCK'].transform('size').ge(3)]
output:
STOCK YR MONTH DAY PRICE
0 AAA 2022 1 1 10
1 AAA 2022 1 2 11
2 AAA 2022 1 3 10
3 AAA 2022 1 4 15
4 AAA 2022 1 5 10
Use GroupBy.transform:
To count the rows in each group (without excluding possible NaNs), use 'size':
#keep groups with 3 or more rows
df[df.groupby(['STOCK', 'YR', 'MONTH'])['STOCK'].transform('size').ge(3)]
Or:
#filter with a lambda is likely to be slow on a large df
df.groupby(['STOCK','YR','MONTH']).filter(lambda x: len(x) >= 3)
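As a runnable sketch of the answers above, the following rebuilds the question's frame and applies the size-based filter. It also shows why the nunique() attempt could not work: inside a single group the grouping column has exactly one distinct value. The variable names (keys, unique_per_group, group_size, filtered) are chosen for this example only.

```python
import pandas as pd

# The frame from the question.
df = pd.DataFrame({
    "STOCK": ["AAA"] * 5 + ["BBB"] * 4,
    "YR": [2022] * 9,
    "MONTH": [1, 1, 1, 1, 1, 1, 1, 2, 2],
    "DAY": [1, 2, 3, 4, 5, 1, 2, 1, 2],
    "PRICE": [10, 11, 10, 15, 10, 5, 10, 10, 15],
})

keys = ["STOCK", "YR", "MONTH"]

# Within one group the grouping column holds a single distinct value,
# so nunique() is always 1 and can never exceed a threshold of 5.
unique_per_group = df.groupby(keys)["STOCK"].nunique()

# 'size' counts the rows of each group; transform broadcasts that count
# back onto every row, which gives a boolean mask aligned with df.
group_size = df.groupby(keys)["STOCK"].transform("size")
filtered = df[group_size >= 3]
print(filtered)
```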
I have a data frame of sales with three columns: the code of the customer, the month the customer bought that item, and the year.
A customer can buy something in September and then make another purchase in December, and so appear twice. But I'm interested in knowing the genuinely new customers by month and year.
So I have thought of iterating with some checks, using the %in% operator to build a boolean vector that tells me whether a customer is new or not, and then counting by month and year with SQL using this new vector.
But I'm wondering if there's a specific function or a better way to do that.
This is an example of the data I would like to have:
date cust month new_customer
1 14975 25 1 TRUE
2 14976 30 1 TRUE
3 14977 22 1 TRUE
4 14978 4 1 TRUE
5 14979 25 1 FALSE
6 14980 11 1 TRUE
7 14981 17 1 TRUE
8 14982 17 1 FALSE
9 14983 18 1 TRUE
10 14984 7 1 TRUE
11 14985 24 1 TRUE
12 14986 22 1 FALSE
To put it more simply: the data frame is sorted by date, and I'm interested in a vector (new_customer) that tells me whether the customer purchased something for the first time or not. For example, customer 25 bought something on the first day and then bought something again four days later, so that second purchase is not from a new customer. The same can be seen with customers 17 and 22.
I created dummy data myself with id, month (in numeric format), and year:
dat <-data.frame(
id = c(1,2,3,4,5,6,7,8,1,3,4,5,1,2,2),
month = c(1,6,7,8,2,3,4,8,11,1,10,9,1,12,2),
year = c(2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2021,2021,2021,2021,2021)
)
id month year
1 1 1 2019
2 2 6 2019
3 3 7 2019
4 4 8 2019
5 5 2 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
9 1 11 2020
10 3 1 2020
11 4 10 2021
12 5 9 2021
13 1 1 2021
14 2 12 2021
15 2 2 2021
Then, group by id and arrange by year and month (order is meaningful). Then use filter and row_number().
library(dplyr)

# keep the earliest (year, month) row for each id
dat %>%
  group_by(id) %>%
  arrange(year, month) %>%
  filter(row_number() == 1)
id month year
<dbl> <dbl> <dbl>
1 1 1 2019
2 5 2 2019
3 2 6 2019
4 3 7 2019
5 4 8 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
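The boolean new_customer vector asked for in the question can also be sketched directly. The following is a minimal Python illustration, assuming the purchases are already sorted by date as the question states; flag_new_customers is a hypothetical helper name, and the customer codes are the ones from the question's example.

```python
def flag_new_customers(customers):
    """Return True for the first occurrence of each customer, False afterwards.

    Assumes `customers` is already in chronological order.
    """
    seen = set()
    flags = []
    for cust in customers:
        flags.append(cust not in seen)  # new only if never seen before
        seen.add(cust)
    return flags


# Customer codes in date order, taken from the question's example.
cust = [25, 30, 22, 4, 25, 11, 17, 17, 18, 7, 24, 22]
new_customer = flag_new_customers(cust)
print(new_customer)
```

Keeping a set of already-seen customers avoids rescanning the earlier rows for every purchase, which is what a repeated %in% check inside a loop would do.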
Sample Code
You can adapt your code to follow this logic.
Create the table (Customer_Id is a character column, because the ids look like C_01):
CREATE TABLE PURCHASE(Posting_Date DATE, Customer_Id VARCHAR(10), Customer_Name VARCHAR(15));
Insert data into the table:
Posting_Date Customer_Id Customer_Name
2018-01-01 C_01 Jack
2018-02-01 C_01 Jack
2018-03-01 C_01 Jack
2018-04-01 C_02 James
2019-04-01 C_01 Jack
2019-05-01 C_01 Jack
2019-05-01 C_03 Gill
2020-01-01 C_02 James
2020-01-01 C_04 Jones
Code
WITH Date_CTE (PostingDate,CustomerID,FirstYear)
AS
(
SELECT MIN(Posting_Date) as [Date],
Customer_Id,
YEAR(MIN(Posting_Date)) as [F_Purchase_Year]
FROM PURCHASE
GROUP BY Customer_Id
)
-- Count only the 'new' customers and group by year alone,
-- so the 'old' groups do not produce extra NULL rows.
SELECT T.[ActualYear],
       SUM(CASE WHEN T.[Customer Status] = 'new' THEN 1 ELSE 0 END) AS [New Customer]
FROM (
    SELECT DISTINCT YEAR(T2.Posting_Date) AS [ActualYear],
           T2.Customer_Id,
           (CASE WHEN T1.FirstYear = YEAR(T2.Posting_Date) THEN 'new' ELSE 'old' END) AS [Customer Status]
    FROM Date_CTE AS T1
    LEFT OUTER JOIN PURCHASE AS T2 ON T1.CustomerID = T2.Customer_Id
) AS T
GROUP BY T.[ActualYear]
ORDER BY T.[ActualYear]
Final Result
ActualYear New Customer
2018 2
2019 1
2020 1
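The logic of the query (find each customer's first purchase year, then count customers per first year) can be sketched in a few lines of Python. This is only an illustration using the sample rows above; new_customers_per_year is a hypothetical name.

```python
from collections import Counter
from datetime import date

# (Posting_Date, Customer_Id) pairs from the sample table.
purchases = [
    (date(2018, 1, 1), "C_01"), (date(2018, 2, 1), "C_01"),
    (date(2018, 3, 1), "C_01"), (date(2018, 4, 1), "C_02"),
    (date(2019, 4, 1), "C_01"), (date(2019, 5, 1), "C_01"),
    (date(2019, 5, 1), "C_03"), (date(2020, 1, 1), "C_02"),
    (date(2020, 1, 1), "C_04"),
]


def new_customers_per_year(purchases):
    """Count customers by the year of their first purchase."""
    first_year = {}
    for posting_date, customer_id in purchases:
        year = posting_date.year
        # Keep the earliest year seen for each customer (the CTE's MIN).
        if customer_id not in first_year or year < first_year[customer_id]:
            first_year[customer_id] = year
    # One count per first-purchase year, sorted by year.
    return dict(sorted(Counter(first_year.values()).items()))


counts = new_customers_per_year(purchases)
print(counts)
```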
I need to create a custom quarter calculation that always starts from the previous month, whatever month and year we are in, and counts backwards to assign quarters. Previous years' quarters are to be numbered 5, 6, etc.
So the goal is to move the quarter grouping one month back.
Assume we run the query on December 11th; the result should be:
YEAR MNTH QTR QTR_ALT
2017 1 1 12
2017 2 1 12
2017 3 1 11
2017 4 2 11
2017 5 2 11
2017 6 2 10
2017 7 3 10
2017 8 3 10
2017 9 3 9
2017 10 4 9
2017 11 4 9
2017 12 4 8
2018 1 1 8
2018 2 1 8
2018 3 1 7
2018 4 2 7
2018 5 2 7
2018 6 2 6
2018 7 3 6
2018 8 3 6
2018 9 3 5
2018 10 4 5
2018 11 4 5
2018 12 4 1
2019 1 1 1
2019 2 1 1
2019 3 1 2
2019 4 2 2
2019 5 2 2
2019 6 2 3
2019 7 3 3
2019 8 3 3
2019 9 3 4
2019 10 4 4
2019 11 4 4
2019 12 4 THIS IS SKIPPED
The starting point is excluding the current month, so that the data ends on the last day of the previous month:
SELECT DISTINCT
YEAR,
MNTH,
QTR
FROM TABLE
WHERE DATA BETWEEN
(SELECT DATE_TRUNC(YEAR,ADD_MONTHS(CURRENT_DATE, -24))) AND
(SELECT DATE_TRUNC(MONTH,CURRENT_DATE)-1)
ORDER BY YEAR, MNTH, QTR
The following gets you all the dates you need, with the extra columns.
select to_char(add_months(a.dt, -b.y), 'YYYY') as year,
to_char(add_months(a.dt, -b.y), 'MM') as month,
ceil(to_number(to_char(add_months(a.dt, -b.y), 'MM')) / 3) as qtr,
ceil(b.y/3) as alt_qtr
from
(select trunc(sysdate, 'MONTH') as dt from dual) a,
(select rownum as y from dual connect by level <= 24) b;
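To make the arithmetic of this query easier to follow, here is a rough Python sketch of the same idea: step back 1 to 24 months from the current month and take the ceiling of the step count divided by 3. Note that this numbers the alternative quarters 1, 2, 3, ... going backwards from the most recent month, which is what the query does; it does not reproduce the 5, 6, ... numbering for earlier years shown in the question. The function name trailing_months and the run date of 2019-12-11 are assumptions made for this example.

```python
from datetime import date


def trailing_months(today, count=24):
    """Return (year, month, qtr, alt_qtr) for the `count` months before today's month."""
    # Index of the current month counted in whole months.
    base = today.year * 12 + (today.month - 1)
    rows = []
    for back in range(1, count + 1):
        year, month0 = divmod(base - back, 12)
        month = month0 + 1
        qtr = (month + 2) // 3     # calendar quarter, same as ceil(month / 3)
        alt_qtr = (back + 2) // 3  # shifted quarter, same as ceil(back / 3)
        rows.append((year, month, qtr, alt_qtr))
    return rows


# Assumed run date: December 11th, 2019.
rows = trailing_months(date(2019, 12, 11))
for row in rows[:4]:
    print(row)
```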
I have this query that works, but the result is not what I want.
It returns only the years and weeks that have data; I want weeks without data to be returned with a total of 0.
For example, this returns
year week totalstop
2017 50 7
2018 1 3
2018 3 5
but I want it to return
year week totalstop
2017 50 7
2017 51 0
2017 52 0
2018 1 3
2018 2 0
2018 3 5
and so on.
Here is the current query:
SELECT year(date1) [year], datepart(week, date1) [week], sum(stop) totalstop
FROM Table1
WHERE building IN (SELECT item FROM dbo.fn_Split('A1,A2,A3,A4,A5', ','))
  AND date1 BETWEEN '2017-12-12' AND '2018-05-08'
  AND grp = 1
GROUP BY year(date1), datepart(week, date1)
ORDER BY year(date1), [week]
I am using MS SQL Server 2016.
I need help modifying it to meet my needs, as I am out of ideas at the moment.
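One common way to get the zero rows is to build the complete list of (year, week) pairs for the date range first and then look up the totals, defaulting to 0. The following is a minimal Python sketch of that idea, not a SQL Server solution. It assumes ISO week numbering, which can differ from what datepart(week, ...) returns around year boundaries; fill_missing_weeks is a hypothetical helper name, and the totals are the three rows from the example output.

```python
from datetime import date, timedelta


def fill_missing_weeks(totals, start, end):
    """Return {(year, week): total} for every week in [start, end], using 0 when absent.

    Assumption: weeks follow the ISO calendar (date.isocalendar()).
    """
    filled = {}
    day = start
    while day <= end:
        iso_year, iso_week, _ = day.isocalendar()
        key = (iso_year, iso_week)
        # The first day seen in a week creates the entry; later days keep it.
        filled.setdefault(key, totals.get(key, 0))
        day += timedelta(days=1)
    return filled


# Totals for the weeks that actually have data, from the example output.
totals = {(2017, 50): 7, (2018, 1): 3, (2018, 3): 5}
filled = fill_missing_weeks(totals, date(2017, 12, 12), date(2018, 5, 8))
for (year, week), total in list(filled.items())[:6]:
    print(year, week, total)
```

In SQL the same shape is usually obtained by left joining the aggregated result onto a calendar or numbers table that lists every week in the range.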
How do I normalize this table:
Frequency (PK) Year (PK) Quarter (PK) Month (PK) Value
Monthly 2013 1 1 1
Quarterly 2013 1 0 2
Yearly 2013 0 0 3
The table is not in 2nd normal form, because when Frequency = Yearly, Value depends on a subset of the primary key (Frequency, Year).
I've thought about adding a surrogate key. Then the Quarter and Month columns could be nullable.
Surrogate (PK) Frequency Year Quarter Month Value
1 Monthly 2013 1 1 1
2 Quarterly 2013 1 NULL 2
3 Yearly 2013 NULL NULL 3
But this doesn't solve the problem, because the 2nd normal form definition also applies to candidate keys. Dividing the table into three tables based on Frequency doesn't sound like a good idea, because it would introduce if statements into my business logic:
if (frequency == Monthly) then select from DataMonthly
I'm going to assume that a couple of years' worth of data might look something like this. Correct me if I'm wrong. (I'm going to ignore the issue of whether using zeroes is a good idea or a bad idea.)
Frequency Year Quarter Month Value
--
Monthly 2012 1 1 1
Monthly 2012 1 2 2
Monthly 2012 1 3 3
Monthly 2012 2 4 4
Monthly 2012 2 5 5
Monthly 2012 2 6 6
Monthly 2012 3 7 7
Monthly 2012 3 8 8
Monthly 2012 3 9 9
Monthly 2012 4 10 10
Monthly 2012 4 11 11
Monthly 2012 4 12 12
Quarterly 2012 1 0 2
Quarterly 2012 2 0 5
Quarterly 2012 3 0 8
Quarterly 2012 4 0 11
Yearly 2012 0 0 3
Monthly 2013 1 1 1
Monthly 2013 1 2 2
Monthly 2013 1 3 3
Monthly 2013 2 4 4
Monthly 2013 2 5 5
Monthly 2013 2 6 6
Monthly 2013 3 7 7
Monthly 2013 3 8 8
Monthly 2013 3 9 9
Monthly 2013 4 10 10
Monthly 2013 4 11 11
Monthly 2013 4 12 12
Quarterly 2013 1 0 2
Quarterly 2013 2 0 5
Quarterly 2013 3 0 8
Quarterly 2013 4 0 11
Yearly 2013 0 0 3
From that data we can deduce two functional dependencies. A functional dependency answers the question, "Given one value for the set of attributes 'X', do we know one and only one value for the set of attributes 'Y'?"
{Year, Quarter, Month}->Frequency
{Year, Quarter, Month}->Value
Given one value for the set of attributes {Year, Quarter, Month}, we know one and only one value for the set of attributes {Frequency}. And given one value for the set of attributes {Year, Quarter, Month}, we know one and only one value for the set of attributes {Value}.
The problem you were running into involved including "Frequency" as part of the primary key. It's really not.
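The two dependencies above can be checked mechanically against the assumed sample data. The sketch below is only an illustration: functional_dependency_holds and sample_rows are hypothetical helpers, and the rows are generated to match the table assumed earlier in this answer. A check like this can only show that a dependency is not violated by the sample; it cannot prove that the dependency holds in general.

```python
def functional_dependency_holds(rows, determinant, dependent):
    """Return True if each value of `determinant` maps to exactly one value of `dependent`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        # setdefault stores y the first time x appears, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


def sample_rows():
    """Rebuild the two years of sample data assumed in this answer."""
    rows = []
    for year in (2012, 2013):
        for month in range(1, 13):
            rows.append({"Frequency": "Monthly", "Year": year,
                         "Quarter": (month + 2) // 3, "Month": month, "Value": month})
        for quarter, value in zip(range(1, 5), (2, 5, 8, 11)):
            rows.append({"Frequency": "Quarterly", "Year": year,
                         "Quarter": quarter, "Month": 0, "Value": value})
        rows.append({"Frequency": "Yearly", "Year": year,
                     "Quarter": 0, "Month": 0, "Value": 3})
    return rows


rows = sample_rows()
key = ["Year", "Quarter", "Month"]
print(functional_dependency_holds(rows, key, ["Frequency"]))
print(functional_dependency_holds(rows, key, ["Value"]))
print(functional_dependency_holds(rows, ["Frequency", "Year"], ["Value"]))
```

The third check uses {Frequency, Year} as the determinant and fails on this data, because the monthly rows of one year carry many different values.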
This table could probably do without the [Frequency] and [Quarter] columns.
Why do you want to have these in? Is there any added value in having the Quarterly and Yearly values precalculated in this table?
Comment: since the quarterly and yearly Values are not just the sum of their months, [Quarter] is mandatory.
This will work too:
Year (PK) Quarter (PK) Month (PK) Value
2013 1 1 1
2013 1 0 2
2013 0 0 3
Yearly results:
SELECT
[Value]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 0 AND [Month] = 0
Quarterly results:
SELECT
[Value]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 1 AND [Month] = 0
Monthly results:
SELECT
[Value] AS [Results]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 1 AND [Month] = 1
Would this work for you?