R - get a vector that tells me if a value of another vector is the first appearence or not - sql

I have a data frame of sales with three columns: the code of the customer, the month the customer bought that item, and the year.
A customer can buy something in september and then in december make another purchase, so appear two times. But I'm interested in knowing the absolutely new customoers by month and year.
So I have thought in make an iteration and some checks and use the %in% function and build a boolean vector that tells me if a customer is new or not and then count by month and year with SQL using this new vector.
But I'm wondering if there's a specific function or a better way to do that.
This is an example of the data I would like to have:
date cust month new_customer
1 14975 25 1 TRUE
2 14976 30 1 TRUE
3 14977 22 1 TRUE
4 14978 4 1 TRUE
5 14979 25 1 FALSE
6 14980 11 1 TRUE
7 14981 17 1 TRUE
8 14982 17 1 FALSE
9 14983 18 1 TRUE
10 14984 7 1 TRUE
11 14985 24 1 TRUE
12 14986 22 1 FALSE
So put it more simple: the data frame is sorted by date, and I'm interested in a vector (new_customer) that tells me if the customer purchased something for the first time or not. For example customer 25 bought something the first day, and then four days later bought something again, so is not a new customer. The same can be seen with customer 17 and 22.

I create dummy data my self with id, month of numeric format, and year
dat <-data.frame(
id = c(1,2,3,4,5,6,7,8,1,3,4,5,1,2,2),
month = c(1,6,7,8,2,3,4,8,11,1,10,9,1,12,2),
year = c(2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2021,2021,2021,2021,2021)
)
id month year
1 1 1 2019
2 2 6 2019
3 3 7 2019
4 4 8 2019
5 5 2 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020
9 1 11 2020
10 3 1 2020
11 4 10 2021
12 5 9 2021
13 1 1 2021
14 2 12 2021
15 2 2 2021
Then, group by id and arrange by year and month (order is meaningful). Then use filter and row_number().
dat %>%
group_by(id) %>%
arrange(year, month) %>%
filter(row_number() == 1)
id month year
<dbl> <dbl> <dbl>
1 1 1 2019
2 5 2 2019
3 2 6 2019
4 3 7 2019
5 4 8 2019
6 6 3 2020
7 7 4 2020
8 8 8 2020

Sample Code
You can change in your code according to this logic:-
Create Table:-
CREATE TABLE PURCHASE(Posting_Date DATE,Customer_Id INT,Customer_Name VARCHAR(15));
Insert Data Into Table
Posting_Date Customer_Id Customer_Name
2018-01-01 C_01 Jack
2018-02-01 C_01 Jack
2018-03-01 C_01 Jack
2018-04-01 C_02 James
2019-04-01 C_01 Jack
2019-05-01 C_01 Jack
2019-05-01 C_03 Gill
2020-01-01 C_02 James
2020-01-01 C_04 Jones
Code
WITH Date_CTE (PostingDate,CustomerID,FirstYear)
AS
(
SELECT MIN(Posting_Date) as [Date],
Customer_Id,
YEAR(MIN(Posting_Date)) as [F_Purchase_Year]
FROM PURCHASE
GROUP BY Customer_Id
)
SELECT T.[ActualYear],(CASE WHEN T.[Customer Status] = 'new' THEN COUNT(T.[Customer Status]) END) AS [New Customer]
FROM (
SELECT DISTINCT YEAR(T2.Posting_Date) AS [ActualYear],
T2.Customer_Id,
(CASE WHEN T1.FirstYear = YEAR(T2.Posting_Date) THEN 'new' ELSE 'old' END) AS [Customer Status]
FROM Date_CTE AS T1
left outer join PURCHASE AS T2 ON T1.CustomerID = T2.Customer_Id
) AS T
GROUP BY T.[ActualYear],T.[Customer Status]
Final Result
ActualYear New Customer
2018 2
2019 1
2020 1
2019 NULL
2020 NULL

Related

SQL query to Find highest value in table and sum the corresponding value

I would like to group Highest values in month column group by year and Sum the value column
value
Year
Month
4
2019
10
1
2019
11
5
2019
11
1
2019
11
1
2019
12
8
2019
12
1
2019
12
1
2020
1
10
2020
1
3
2021
1
2
2021
2
11
2021
2
1
2021
2
3
2021
2
2
2021
3
In above table I would like to extract highest value of month group by year
in year 2019 highest month is 12 so there are 3 rows and sum of value column will be 10
The output should be
value
Year
Month
10
2019
12
11
2020
1
2
2021
3
supposing that the table is called "example_table" you can use the following query:
select sum(example_table.value), example_table.year, example_table.month
from example_table
join (
select year, max(month) "month"
from example_table
group by year
) sub on example_table.year = sub.year and example_table.month = sub.month
group by example_table.year, example_table.month
order by example_table.year

Reverse track forced records relationships based on user-defined tagging

I have this table where the tagging [Tag_To] is updated by an algorithm based on Year and Period of coverage. My current task (in question) is to update the Status given the Year.
ID Year Method Period_From Period_To SeqNo Tag_To Status
-----------------------------------------------------------------------------------
10 2019 A 2019-01-01 2019-12-31 1
11 2019 B 2019-01-01 2019-06-30 2 1
12 2019 B 2019-07-01 2019-12-31 3 1
13 2019 C 2019-01-01 2019-06-30 4 2
14 2020 A 2020-01-01 2020-12-31 1
15 2020 B 2020-01-01 2020-06-30 2 1
16 2020 B 2020-07-01 2020-12-31 3 1
17 2020 C 2020-01-01 2020-12-31 4 2,3
18 2021 A 2021-01-01 2021-12-31 1
19 2021 B 2021-01-01 2021-12-31 2 1
20 2021 C 2021-07-01 2021-12-31 3 2
The SeqNo is applied per Year and the Tag_To is done based on period of coverage.
11 and 12 are tagged to 10 since B follows A and their period falls within 10 period coverage.
13 is tagged to 11 since C follows B and the period...
15 and 16 to 14
Also note that 17 is tagged to 15 and 16 (2,3) because 17's coverage spans across the 2 periods of 15 and 16 combined
and so on...
The objective is to update the Status by Year such that each path is considered Closed if the path already has Methods A, B and C (there are actually more methods, but to simplify). Status should be Open for paths that haven't completed the methods.
From the example above, there are 5 paths:
10(A)-->11(B)-->13(C) = Closed
10(A)-->12(B)-->??? = Open
14(A)-->15(B)-->17(C) = Closed
14(A)-->16(B)-->17(C) = Closed
18(A)-->19(B)-->20(C) = Closed
Therefore the status update should be:
ID Year Method Period_From Period_To SeqNo Tag_To Status
-----------------------------------------------------------------------------------
10 2019 A 2019-01-01 2019-12-31 1 Open
11 2019 B 2019-01-01 2019-06-30 2 1 Closed
12 2019 B 2019-07-01 2019-12-31 3 1 Open
13 2019 C 2019-01-01 2019-06-30 4 2 Closed
14 2020 A 2020-01-01 2020-12-31 1 Closed
15 2020 B 2020-01-01 2020-06-30 2 1 Closed
16 2020 B 2020-07-01 2020-12-31 3 1 Closed
17 2020 C 2020-01-01 2020-12-31 4 2,3 Closed
18 2021 A 2021-01-01 2021-12-31 1 Closed
19 2021 B 2021-01-01 2021-12-31 2 1 Closed
20 2021 C 2021-07-01 2021-12-31 3 2 Closed
I hope I have explained everything clearly. Would really appreciate if anyone could help.
Just to update viewers that I have managed to solve this on my own although the solution is super non-dynamic and quite inefficient, it pretty much did the job for me. Here's what I did.
UPDATE Table SET
Status =
CASE WHEN Method = 'B'
AND NOT EXISTS ( SELECT * FROM Table P INNER JOIN
(
SELECT VALUE AS Tag_To
FROM Table AV
CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
WHERE AV.Method = 'C'
) C ON P.Sequence_No = C.Tag_To
WHERE P.ID = AValue.ID
)
THEN 'Open'
WHEN Method = 'A'
AND NOT EXISTS ( SELECT * FROM Table P INNER JOIN
(
SELECT VALUE AS Tag_To
FROM Table AV
CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
WHERE AV.Method = 'B'
) C ON P.Sequence_No = C.Tag_To
WHERE P.ID = AValue.ID
)
THEN 'Open'
ELSE 'Closed'
END
FROM Table AValue
WHERE Year = #Year
;WITH CTE AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY A.Method ORDER BY A.Sequence_No ASC) SN,
A.ID,
A.Method,
A.Sequence_No,
A.Tag_To,
A.Period_From,
A.Period_To,
A.Status
FROM Table A
LEFT JOIN
(
SELECT VALUE AS Tag_To
FROM Table AV
CROSS APPLY STRING_SPLIT(AV.Tag_To, ',')
WHERE Year = #Year
) B ON A.Sequence_No = B.Tag_To
WHERE Year = #Year
),
CTE2 AS
(
SELECT DISTINCT SN FROM CTE
WHERE Status = 'Open'
)
UPDATE Table SET
Status = 'Open'
FROM Table
INNER JOIN CTE ON Table.ID = CTE.ID
INNER JOIN CTE2 ON CTE.SN = CTE2.SN
Yeah, it's ugly but, hey, it did the job! :)

Computing weekly percent on grouped by table

Good morning everybody.
I work on the last version of SQL Workbench.
I got a table group by Year and Week, and a type of document verification (3 at total).
Year
Week
Verif_Type
Total
2020
1
1
3
2020
1
3
1
2020
2
1
1
2020
2
2
6
2020
2
3
3
I want to know how many percent of each verification type are performed by week and year.
My question is there : How can I print the percent column next such that
Year
Week
Verif_Type
Total
Percent
2020
1
1
3
75
2020
1
3
1
25
2020
2
1
1
10
2020
2
2
6
60
2020
2
3
3
30
I have already computed total count per week but the table have different sizes, so I can't use operations with it.
Thank you for your help :)
For MySql version 8.0 use Query#1 using windows function and for older version of MySQL use Query#2 (using subquery).
Schema and insert statements:
create table yourtable(Year int, Week int, Verif_Type int, Total int);
insert into yourtable values(2020, 1, 1, 3);
insert into yourtable values(2020, 1, 3, 1);
insert into yourtable values(2020, 2, 1, 1);
insert into yourtable values(2020, 2, 2, 6);
insert into yourtable values(2020, 2, 3, 3);
Query#1 (using window function)
Select year, week, verif_type, total, (total*100/sum(total)over(partition by year, week)) percent
From yourtable
Output:
year
week
verif_type
total
percent
2020
1
1
3
75.0000
2020
1
3
1
25.0000
2020
2
1
1
10.0000
2020
2
2
6
60.0000
2020
2
3
3
30.0000
Query#2 (using subquery)
select year, week, verif_type, total, (total*100/(select sum(total) from yourtable b where a.year=b.year and a.Week=b.Week)) percent
From yourtable a
Output:
year
week
verif_type
total
percent
2020
1
1
3
75.0000
2020
1
3
1
25.0000
2020
2
1
1
10.0000
2020
2
2
6
60.0000
2020
2
3
3
30.0000
db<fiddle here

Compare data from for specific column grouping and Update based on criteria

I have a table with the following structure:
Employee Project Task Accomplishment Score Year
John A 1 5 60 2016
John A 1 6 40 2018
John A 2 3 30 2016
Simon B 2 0 30 2017
Simon B 2 4 30 2019
David C 1 3 20 2015
David C 1 2 40 2016
David C 3 0 25 2017
David C 3 5 35 2017
I want to create a view with Oracle SQLout of the above table which looks like as follows:
Employee Project Task Accomplishment Score Year UpdateScore Comment
John A 1 5 60 2016 60
John A 1 6 40 2018 100 (=60+40)
John A 2 3 30 2016 30
Simon B 2 0 30 2017 30
Simon B 2 4 40 2019 40 (no update because Accomplishement was 0)
David C 1 3 20 2015 20
David C 1 2 40 2016 60 (=20+40)
David C 3 0 25 2017 25
David C 3 5 35 2017 35 (no update because Accomplishement was 0)
The Grouping is: Employee-Project-Task.
The Rule of the UpdateScore column:
If for a specific Employee-Project-Task group Accomplishment column value is greater than 0 for the previous year, add the previous year's score to the latest year for the same Employee-Project-Task group.
For example: John-A-1 is a group which is different from John-A-2. So as we can see for John-A-1 the Accomplishment is 5 (which is greater than 0) in 2016, so we add the Score from 2016 with the score of 2018 for the John-A-1 and the updated score becomes 100.
For Simon-B-2, the accomplishment was 0, so there will be no update for 2019 for Simon-B-2.
Note: I don't need the Comment field, it is there just for more clarification.
Use analytic functions to determine if there was a score for the previous year, and if so, add it to the UpdatedScore.
select Employee, Project, Task, Accomplishment, Score, Year,
case when lag(Year) over (partition by Employee, Project order by Year) = Year - 1
then lag(Score) over (partition by Employee, Project order by Year)
else 0
end + Score as UpdatedScore
from EmployeeScore;
This is a bit strange -- you are counting the accomplishment of 0 in one year but not the next. Okay.
Use analytic functions:
select t.*,
(case when lag(accomplishment) over (partition by Employee, Project, Task order by year) > 0
then lag(score) over (partition by Employee, Project, Task order by year)
else 0
end) + score as update_score
from t;
from t

Custom SQL for quarter count starting from previous month

I need to create a custom quarter calculator to start always from previous month no matter month, year we are at and count back to get quarter. Previous year wuarters are to be numbered 5, 6 etc
So the goal is to move quarter grouping one month back.
Assume we run query on December 11th, result should be:
YEAR MNTH QTR QTR_ALT
2017 1 1 12
2017 2 1 12
2017 3 1 11
2017 4 2 11
2017 5 2 11
2017 6 2 10
2017 7 3 10
2017 8 3 10
2017 9 3 9
2017 10 4 9
2017 11 4 9
2017 12 4 8
2018 1 1 8
2018 2 1 8
2018 3 1 7
2018 4 2 7
2018 5 2 7
2018 6 2 6
2018 7 3 6
2018 8 3 6
2018 9 3 5
2018 10 4 5
2018 11 4 5
2018 12 4 1
2019 1 1 1
2019 2 1 1
2019 3 1 2
2019 4 2 2
2019 5 2 2
2019 6 2 3
2019 7 3 3
2019 8 3 3
2019 9 3 4
2019 10 4 4
2019 11 4 4
2019 12 4 THIS IS SKIPPED
Starting point is eliminating current_date so data end at previous month's last day
SELECT DISTINCT
YEAR,
MNTH,
QTR
FROM TABLE
WHERE DATA BETWEEN
(SELECT DATE_TRUNC(YEAR,ADD_MONTHS(CURRENT_DATE, -24))) AND
(SELECT DATE_TRUNC(MONTH,CURRENT_DATE)-1)
ORDER BY YEAR, MNTH, QTR
The following gets you all the dates you need, with the extra columns.
select to_char(add_months(a.dt, -b.y), 'YYYY') as year,
to_char(add_months(a.dt, -b.y), 'MM') as month,
ceil(to_number(to_char(add_months(a.dt, -b.y), 'MM')) / 3) as qtr,
ceil(b.y/3) as alt_qtr
from
(select trunc(sysdate, 'MONTH') as dt from dual) a,
(select rownum as y from dual connect by level <= 24) b;