Normalize a Table That Contains Monthly, Yearly and Quarterly Data - sql

How do I normalize this table:
Frequency (PK) Year (PK) Quarter (PK) Month (PK) Value
Monthly 2013 1 1 1
Quarterly 2013 1 0 2
Yearly 2013 0 0 3
The table is not in second normal form because, when Frequency = Yearly, Value depends on only a subset of the primary key: (Frequency, Year).
I've thought about adding a surrogate key. Then the Quarter and Month columns could be nullable.
Surrogate (PK) Frequency Year Quarter Month Value
1 Monthly 2013 1 1 1
2 Quarterly 2013 1 NULL 2
3 Yearly 2013 NULL NULL 3
But this doesn't solve the problem, because the definition of second normal form also applies to candidate keys. Dividing the table into three tables based on Frequency doesn't sound like a good idea either, because it would introduce if statements into my business logic:
if (frequency == Monthly) then select from DataMonthly

I'm going to assume that a couple of years' worth of data might look something like this. Correct me if I'm wrong. (I'm going to ignore the issue of whether using zeroes is a good idea or a bad idea.)
Frequency Year Quarter Month Value
--
Monthly 2012 1 1 1
Monthly 2012 1 2 2
Monthly 2012 1 3 3
Monthly 2012 2 4 4
Monthly 2012 2 5 5
Monthly 2012 2 6 6
Monthly 2012 3 7 7
Monthly 2012 3 8 8
Monthly 2012 3 9 9
Monthly 2012 4 10 10
Monthly 2012 4 11 11
Monthly 2012 4 12 12
Quarterly 2012 1 0 2
Quarterly 2012 2 0 5
Quarterly 2012 3 0 8
Quarterly 2012 4 0 11
Yearly 2012 0 0 3
Monthly 2013 1 1 1
Monthly 2013 1 2 2
Monthly 2013 1 3 3
Monthly 2013 2 4 4
Monthly 2013 2 5 5
Monthly 2013 2 6 6
Monthly 2013 3 7 7
Monthly 2013 3 8 8
Monthly 2013 3 9 9
Monthly 2013 4 10 10
Monthly 2013 4 11 11
Monthly 2013 4 12 12
Quarterly 2013 1 0 2
Quarterly 2013 2 0 5
Quarterly 2013 3 0 8
Quarterly 2013 4 0 11
Yearly 2013 0 0 3
From that data we can deduce two functional dependencies. A functional dependency answers the question, "Given one value for the set of attributes 'X', do we know one and only one value for the set of attributes 'Y'?"
{Year, Quarter, Month}->Frequency
{Year, Quarter, Month}->Value
Given one value for the set of attributes {Year, Quarter, Month}, we know one and only one value for the set of attributes {Frequency}. And given one value for the set of attributes {Year, Quarter, Month}, we know one and only one value for the set of attributes {Value}.
The problem you were running into involved including "Frequency" as part of the primary key. It's really not.
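As a rough illustration of the definition above, a functional dependency can be checked mechanically against sample rows. The sketch below rebuilds the 2012 rows from the table above and tests the two dependencies; the helper name `holds` is hypothetical. Note that a check like this can only show that the sample does not contradict a dependency, not that the dependency holds for all future data.

```python
from collections import defaultdict

def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in lhs)
        seen[key].add(tuple(row[c] for c in rhs))
    # The dependency holds when every lhs value maps to exactly one rhs value.
    return all(len(values) == 1 for values in seen.values())

# The 2012 rows of the sample data above (zeroes mark "not applicable").
quarterly_values = {1: 2, 2: 5, 3: 8, 4: 11}
rows = [
    {"Frequency": "Monthly", "Year": 2012, "Quarter": (m - 1) // 3 + 1, "Month": m, "Value": m}
    for m in range(1, 13)
]
rows += [
    {"Frequency": "Quarterly", "Year": 2012, "Quarter": q, "Month": 0, "Value": v}
    for q, v in quarterly_values.items()
]
rows.append({"Frequency": "Yearly", "Year": 2012, "Quarter": 0, "Month": 0, "Value": 3})

fd_frequency = holds(rows, ["Year", "Quarter", "Month"], ["Frequency"])
fd_value = holds(rows, ["Year", "Quarter", "Month"], ["Value"])
fd_reverse = holds(rows, ["Frequency"], ["Value"])  # Frequency alone does not determine Value
print(fd_frequency, fd_value, fd_reverse)  # True True False
```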

This table could probably do without the [Frequency] and [Quarter] columns.
Why do you want to keep them? Is there any added value in having the quarterly and yearly values precalculated in this table? Comment: The quarterly and yearly values are not just the sum of their months' values.
So [Quarter] is mandatory.
This will work too:
Year (PK) Quarter (PK) Month (PK) Value
2013 1 1 1
2013 1 0 2
2013 0 0 3
Yearly results:
SELECT
[Value]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 0 AND [Month] = 0
Quarterly results:
SELECT
[Value]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 1 AND [Month] = 0
Monthly results:
SELECT
[Value] AS [Results]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 1 AND [Month] = 1
Would this work for you?

Related

SQL query to Find highest value in table and sum the corresponding value

For each year, I would like to find the highest value in the Month column and sum the value column over the rows for that month.
value Year Month
4 2019 10
1 2019 11
5 2019 11
1 2019 11
1 2019 12
8 2019 12
1 2019 12
1 2020 1
10 2020 1
3 2021 1
2 2021 2
11 2021 2
1 2021 2
3 2021 2
2 2021 3
In the table above, I would like to find the highest month for each year.
In 2019 the highest month is 12, which has 3 rows, so the sum of the value column is 10.
The output should be:
value Year Month
10 2019 12
11 2020 1
2 2021 3
Supposing that the table is called "example_table", you can use the following query:
select sum(example_table.value), example_table.year, example_table.month
from example_table
join (
select year, max(month) "month"
from example_table
group by year
) sub on example_table.year = sub.year and example_table.month = sub.month
group by example_table.year, example_table.month
order by example_table.year
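To check the query against the sample data from the question, here is a minimal sketch that loads the rows into an in-memory SQLite database and runs the same join. SQLite is an assumption made only so the example is self-contained; the table alias `t` is added for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (value INTEGER, year INTEGER, month INTEGER)")
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?, ?)",
    [
        (4, 2019, 10), (1, 2019, 11), (5, 2019, 11), (1, 2019, 11),
        (1, 2019, 12), (8, 2019, 12), (1, 2019, 12),
        (1, 2020, 1), (10, 2020, 1),
        (3, 2021, 1), (2, 2021, 2), (11, 2021, 2), (1, 2021, 2),
        (3, 2021, 2), (2, 2021, 3),
    ],
)

# The subquery finds the highest month per year; the join keeps only those rows.
query = """
SELECT SUM(t.value) AS value, t.year, t.month
FROM example_table AS t
JOIN (
    SELECT year, MAX(month) AS month
    FROM example_table
    GROUP BY year
) AS sub ON t.year = sub.year AND t.month = sub.month
GROUP BY t.year, t.month
ORDER BY t.year
"""
result = conn.execute(query).fetchall()
print(result)  # [(10, 2019, 12), (11, 2020, 1), (2, 2021, 3)]
```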

R - get a vector that tells me if a value of another vector is the first appearence or not

I have a data frame of sales with three columns: the code of the customer, the month the customer bought that item, and the year.
A customer can buy something in September and then make another purchase in December, and so appear twice. But I'm interested in the customers who are completely new in each month and year.
So I have thought of iterating over the rows, using the %in% operator to build a boolean vector that tells me whether each customer is new, and then counting by month and year with SQL using this new vector.
But I'm wondering if there's a specific function or a better way to do that.
This is an example of the data I would like to have:
date cust month new_customer
1 14975 25 1 TRUE
2 14976 30 1 TRUE
3 14977 22 1 TRUE
4 14978 4 1 TRUE
5 14979 25 1 FALSE
6 14980 11 1 TRUE
7 14981 17 1 TRUE
8 14982 17 1 FALSE
9 14983 18 1 TRUE
10 14984 7 1 TRUE
11 14985 24 1 TRUE
12 14986 22 1 FALSE
To put it more simply: the data frame is sorted by date, and I'm interested in a vector (new_customer) that tells me whether the customer purchased something for the first time or not. For example, customer 25 bought something on the first day and again four days later, so the second purchase is not from a new customer. The same can be seen with customers 17 and 22.
I created dummy data myself with an id, a numeric month, and a year:
dat <-data.frame(
id = c(1,2,3,4,5,6,7,8,1,3,4,5,1,2,2),
month = c(1,6,7,8,2,3,4,8,11,1,10,9,1,12,2),
year = c(2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2021,2021,2021,2021,2021)
)
id month year
1 1 1 2019
2 2 6 2019
3 3 7 2019
4 4 8 2019
5 5 2 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
9 1 11 2020
10 3 1 2020
11 4 10 2021
12 5 9 2021
13 1 1 2021
14 2 12 2021
15 2 2 2021
Then group by id, arrange by year and month (the order matters), and use filter and row_number() to keep the first row of each id.
library(dplyr)

dat %>%
  group_by(id) %>%
  arrange(year, month) %>%
  filter(row_number() == 1)
id month year
<dbl> <dbl> <dbl>
1 1 1 2019
2 5 2 2019
3 2 6 2019
4 3 7 2019
5 4 8 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
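The question asked for a boolean vector rather than a filtered data frame. In R, `!duplicated(x)` returns exactly such a vector. As a language-neutral sketch of the same idea, the following Python walks the customer codes in date order and flags a code the first time it is seen; the function name `first_appearance` is hypothetical, and the input is the cust column from the question's example.

```python
def first_appearance(values):
    """Flag each element as True if it has not appeared earlier in the sequence."""
    seen = set()
    flags = []
    for v in values:
        flags.append(v not in seen)
        seen.add(v)
    return flags

# Customer codes from the question, already sorted by date.
cust = [25, 30, 22, 4, 25, 11, 17, 17, 18, 7, 24, 22]
new_customer = first_appearance(cust)
print(new_customer)
# [True, True, True, True, False, True, True, False, True, True, True, False]
```

This relies on the rows being sorted by date, as the question states; on unsorted data the flags would mark the first row encountered rather than the earliest purchase.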
Sample code
You can adapt your code according to this logic.
Create the table:
CREATE TABLE PURCHASE(Posting_Date DATE, Customer_Id VARCHAR(10), Customer_Name VARCHAR(15));
Insert data into the table:
Posting_Date Customer_Id Customer_Name
2018-01-01 C_01 Jack
2018-02-01 C_01 Jack
2018-03-01 C_01 Jack
2018-04-01 C_02 James
2019-04-01 C_01 Jack
2019-05-01 C_01 Jack
2019-05-01 C_03 Gill
2020-01-01 C_02 James
2020-01-01 C_04 Jones
Code
WITH Date_CTE (PostingDate,CustomerID,FirstYear)
AS
(
SELECT MIN(Posting_Date) as [Date],
Customer_Id,
YEAR(MIN(Posting_Date)) as [F_Purchase_Year]
FROM PURCHASE
GROUP BY Customer_Id
)
SELECT T.[ActualYear],(CASE WHEN T.[Customer Status] = 'new' THEN COUNT(T.[Customer Status]) END) AS [New Customer]
FROM (
SELECT DISTINCT YEAR(T2.Posting_Date) AS [ActualYear],
T2.Customer_Id,
(CASE WHEN T1.FirstYear = YEAR(T2.Posting_Date) THEN 'new' ELSE 'old' END) AS [Customer Status]
FROM Date_CTE AS T1
left outer join PURCHASE AS T2 ON T1.CustomerID = T2.Customer_Id
) AS T
GROUP BY T.[ActualYear],T.[Customer Status]
Final Result
ActualYear New Customer
2018 2
2019 1
2020 1
2019 NULL
2020 NULL
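If only the count of new customers per year is needed, a shorter formulation is possible: take each customer's first purchase year, then count customers per first year. The sketch below runs that idea on the sample rows in an in-memory SQLite database. SQLite and its `strftime` function are assumptions made to keep the example runnable; in SQL Server the same shape would use YEAR(MIN(Posting_Date)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PURCHASE (Posting_Date TEXT, Customer_Id TEXT, Customer_Name TEXT)")
conn.executemany(
    "INSERT INTO PURCHASE VALUES (?, ?, ?)",
    [
        ("2018-01-01", "C_01", "Jack"), ("2018-02-01", "C_01", "Jack"),
        ("2018-03-01", "C_01", "Jack"), ("2018-04-01", "C_02", "James"),
        ("2019-04-01", "C_01", "Jack"), ("2019-05-01", "C_01", "Jack"),
        ("2019-05-01", "C_03", "Gill"), ("2020-01-01", "C_02", "James"),
        ("2020-01-01", "C_04", "Jones"),
    ],
)

# Inner query: one row per customer with the year of the first purchase.
# Outer query: number of customers whose first purchase fell in each year.
query = """
SELECT first_year, COUNT(*) AS new_customers
FROM (
    SELECT Customer_Id,
           CAST(strftime('%Y', MIN(Posting_Date)) AS INTEGER) AS first_year
    FROM PURCHASE
    GROUP BY Customer_Id
) AS firsts
GROUP BY first_year
ORDER BY first_year
"""
new_per_year = conn.execute(query).fetchall()
print(new_per_year)  # [(2018, 2), (2019, 1), (2020, 1)]
```

This avoids the NULL rows in the result above, but a year with no new customers at all would simply be absent from the output.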

Determine the first occurrence of a particular customer visiting the store in a particular month

I need a per-month (and per-year) breakdown of the number of customers [aliased as Patient_ID] who made their first visit to a store. The date-times of store visits are stored in the [MDT Review Date] column of the table.
Customers can come to the store multiple times throughout the year, which increases the total count, but what I require is ONLY the first time a customer visited.
E.g. Tom Bombadil visited the store once in January 2019, so the count increased to 1. He then visited 4 times in March, so the count should be 1 for March, 0 for February and 1 for January. He visited again 4 times in October and 2 times in December.
I require that Tom Bombadil be counted once and only once for a particular month: his first occurrence in that month.
The output should be like:
rn1 YEAR Month_Number Month Total_Count
1 2010 6 June 2
1 2010 7 July 1
1 2010 8 August 5
1 2010 10 October 5
1 2010 11 November 3
1 2011 1 January 4
1 2011 2 February 6
1 2011 4 April 7
1 2011 5 May 4
1 2011 6 June 10
1 2011 7 July 10
1 2011 8 August 14
1 2011 9 September 4
1 2011 10 October 8
1 2011 11 November 11
1 2011 12 December 11
1 2012 1 January 8
1 2012 2 February 21
Please refer to my query. What I have attempts to use the windowing function COUNT to count the store visits per month. Then the ROW_NUMBER function attempts to assign a unique number to each visit. What am I doing wrong?
select
*
from
(select distinct
row_number() over (partition by p.Patient_ID, p.PAT_Forename1, p.PAT_Surname
order by PAT_Forename1, p.Patient_ID, PAT_Surname) AS rn1,
datepart(year, [DATE_COLUMN]) as YEAR,
datepart(month, [DATE_COLUMN]) as Month_Number,
datename(month,[DATE_COLUMN]) as Month,
count(p.Patient_ID) over (partition by datepart(year,[DATE_COLUMN]),
datename(month, [DATE_COLUMN])) as Total_Count
from
Tablename m
inner join
TableName p on m.PK_ID = p.PK_ID
) as temp
where
rn1 = 1
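One reading of the requirement is "count each customer at most once per month", which is a distinct count per year and month rather than a row number. In the query above, ROW_NUMBER is partitioned by patient only, so it does not restart each month, and the windowed COUNT counts every visit row. The sketch below shows the distinct-count idea on made-up data in an in-memory SQLite database. The table `visits`, its columns, and the sample rows are all hypothetical, and `strftime` is SQLite-specific (SQL Server would use DATEPART).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient_id INTEGER, visit_date TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        # Patient 1: one visit in January, four in March (hypothetical data).
        (1, "2019-01-10"), (1, "2019-03-01"), (1, "2019-03-05"),
        (1, "2019-03-09"), (1, "2019-03-20"),
        # Patient 2: one visit in January, one in March.
        (2, "2019-01-15"), (2, "2019-03-02"),
    ],
)

# COUNT(DISTINCT ...) counts each patient once per (year, month) group,
# no matter how many visits fall in that month.
query = """
SELECT CAST(strftime('%Y', visit_date) AS INTEGER) AS visit_year,
       CAST(strftime('%m', visit_date) AS INTEGER) AS month_number,
       COUNT(DISTINCT patient_id) AS total_count
FROM visits
GROUP BY visit_year, month_number
ORDER BY visit_year, month_number
"""
monthly_counts = conn.execute(query).fetchall()
print(monthly_counts)  # [(2019, 1, 2), (2019, 3, 2)]
```

Months without any visits (February here) do not appear in the result; producing an explicit 0 for them would need a join against a list of months.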

SQL Query Return 0 on weeks in between

I have this query that works, but the result is not what I want.
It returns only the years and weeks that have data; I want the weeks without data to be returned with 0.
For example, this returns:
year week totalstop
2017 50 7
2018 1 3
2018 3 5
but I want it to return:
year week totalstop
2017 50 7
2017 51 0
2017 52 0
2018 1 3
2018 2 0
2018 3 5
and so on.
Here is the current query:
SELECT year(date1) [year],datepart(week,date1) [week],sum(stop) totalstop
from Table1 where
building in (select item from dbo.fn_Split('A1,A2,A3,A4,A5',','))
and
date1 between '2017-12-12' and '2018-05-08'
and grp = 1
group by year(date1),datepart(week,date1)
order by year(date1),[week]
I am using MS SQL Server 2016.
I need help modifying it to fit my needs, as I am out of ideas at the moment.
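The usual way to get zero rows is to first generate every week in the date range and then attach the totals to that list, so that weeks without data still exist in the result (in SQL this is typically a LEFT JOIN from a calendar or numbers table). The sketch below shows the idea in Python on made-up data. The function name `weekly_totals` and the sample stops are hypothetical, and it uses ISO week numbering, which may differ from the default numbering of datepart(week, ...) in SQL Server.

```python
from datetime import date, timedelta

def weekly_totals(stops, start, end):
    """Sum stops per ISO (year, week); weeks in the range with no data get 0."""
    totals = {}
    # Step 1: register every week in the range with a total of 0.
    day = start
    while day <= end:
        iso_year, iso_week, _ = day.isocalendar()
        totals.setdefault((iso_year, iso_week), 0)
        day += timedelta(days=1)
    # Step 2: add the actual data on top of the zero-filled weeks.
    for stop_day, stop in stops:
        if start <= stop_day <= end:
            iso_year, iso_week, _ = stop_day.isocalendar()
            totals[(iso_year, iso_week)] += stop
    # Dicts keep insertion order, so the weeks come out chronologically.
    return [(y, w, total) for (y, w), total in totals.items()]

# Hypothetical sample data: (date, stop) pairs.
stops = [
    (date(2017, 12, 12), 4), (date(2017, 12, 14), 3),
    (date(2018, 1, 2), 3), (date(2018, 1, 16), 5),
]
filled = weekly_totals(stops, date(2017, 12, 12), date(2018, 1, 16))
for row in filled:
    print(row)
```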

TSQL query to filter configuration dates list

How should I form a query if I want the records from the table below from September 2012 onward, i.e. Year greater than or equal to 2012 and, within 2012, Month greater than or equal to September?
This query does not work. It returns 2012 (7, 8, 9, 10, 11, 12) and 2013 (1 up to 12), which is not right, because I wanted to see 2012 (9, 10, 11, 12) and 2013 (1 up to 12). It is including the 7th and 8th months of 2012.
select * from ConfigurationDate
where Year >= 2012 OR ( Year = 2012 AND Month >= 9 )
Order By Year,Month ASC
Table Schema
DateId INT Auto Inc
Year INT
Month INT
Dummy Data
DateId Year Month
1 2012 7
2 2012 8
3 2012 9
4 2012 10
5 2012 11
6 2012 12
7 2013 1
8 2013 2
9 2013 3
10 2013 4
11 2013 5
12 2013 6
13 2013 7
14 2013 8
15 2013 9
16 2013 10
Actually, thinking about it, I don't need to include 2012 in the first WHERE condition, since it is covered by the second condition. So the answer is below:
select * from ConfigurationDate
where Year > 2012 OR ( Year = 2012 AND Month >= 9 )
Order By Year,Month ASC
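To see the difference between the two predicates, the sketch below loads the dummy data into an in-memory SQLite database (an assumption made only for a runnable example) and compares the original query, the corrected one, and an alternative that folds year and month into a single number. The `Year * 100 + Month` form is a common shorthand, not something from the original post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ConfigurationDate (DateId INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER)"
)
# Same rows as the dummy data: 2012 months 7-12, then 2013 months 1-10.
months = [(2012, m) for m in range(7, 13)] + [(2013, m) for m in range(1, 11)]
conn.executemany("INSERT INTO ConfigurationDate (Year, Month) VALUES (?, ?)", months)

def ids(where):
    """Return the DateId values matching the given WHERE clause, in date order."""
    sql = f"SELECT DateId FROM ConfigurationDate WHERE {where} ORDER BY Year, Month"
    return [row[0] for row in conn.execute(sql)]

# Original: "Year >= 2012" alone already matches every 2012 row.
broken_ids = ids("Year >= 2012 OR (Year = 2012 AND Month >= 9)")
# Corrected: later years entirely, plus 2012 from September on.
fixed_ids = ids("Year > 2012 OR (Year = 2012 AND Month >= 9)")
# Alternative: compare a single combined year-month number.
combined_ids = ids("Year * 100 + Month >= 201209")

print(broken_ids)    # DateId 1 through 16
print(fixed_ids)     # DateId 3 through 16
print(combined_ids)  # DateId 3 through 16
```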