I have a large set of imperfect data, from this data I reverse engineering a table for the coding used.
For this particular task, it is know that all records with a specific division code should all have the same group ID and plan ID (which are not included in the data) from another source I been able to add a close but imperfect (and incomplete) mapping of the group ID and plan ID. Now I want to work backwards and build a division mapping table. I have gotten data down to a format like this:
Division Year Group Plan Cnt
52 2019 30 101 9031
52 2020 30 101 9562
54 2019 60 602 3510
54 2020 60 602 3385
56 2019 76 904 1113
56 2020 76 905 1125
56 2020 76 001 6
The Division and Year columns should from a primary key. As you can see 56, 2020 is not unique, but by looking at the cnt column it is easy to see that the record with a count of 6 is a bad record and should be dropped.
What I need is a method to return each division and year pair once with the group and plan IDs that have the largest count.
Thank You
I found the answer using the Rank() function and WHERE clause:
SELECT *
FROM (
SELECT Division, Year, Group, Plan_Cd
, RANK() OVER (PARTITION BY Division, Year ORDER BY Cnt DESC ) AS 'rk'
FROM DivisionMap ) R
WHERE rk = 1
Related
I have the below table
substring(area,6,3)
qty
101
10
103
15
102
11
104
30
105
25
107
17
108
23
106
48
And I am looking to get a result as below without repeating the IIF ( as it's a cumulative of 4 sequences) in the area:
new_area(substring(area,6,3)
sum_qty
101-104
66
105-108
117
I don't know how to create the new area column to be able to get the sum qty
Looking forward to your help.
Please also add an explanation so I will understand how the query is running.
I think this is what you are looking for.
We just use the window function row_number() to create the Grp
NOTE: If you have repeating values in AREA use dense_rank() instead of row_number()
Example
Select new_area = concat(min(area),'-',max(area))
,qty = sum(qty)
From (
Select area=substring(area,6,3)
,qty
,Grp = (row_number() over (order by substring(area,6,3))-1) / 4
From YourTable
) A
Group By Grp
Results
new_area qty
101-104 66
105-108 113 -- get different results
If you were to run the subquery, you would see the following.
Then it becomes a small matter to aggregate the data grouped by the created column GRP
Tables - Store
Stores
Date
Customer_ID
A
01/01/2020
1111
C
01/01/2020
1111
F
02/01/2020
1234
A
02/01/2020
1111
A
02/01/2020
2222
Tables - Customer
Customer_ID
Age_Group
Income_Level
1111
26-30
Low
1234
25 and below
Mid
2222
31-60
High
I want to know how I can get this output.
Stores
Age_Group
Percentage_by_Age
Income_Level
Percentage_By_Income
A
25 and below
10
Low
80
A
25 and below
10
Mid
10
A
25 and below
10
High
10
A
26 - 30
42
Low
15
A
26 - 30
42
Mid
65
A
26 - 30
42
High
20
A
31 - 60
48
Low
30
A
31 - 60
48
Mid
50
A
31 - 60
48
High
20
I am using SQL to query from different tables.
First I need to aggregate the number of customers by stores, then in each store, I want to find out how many customers visited Store A in a particular age group(25 and below), and how many of them are in which income level.
May I know how I can go about solving this query?
Thanks.
My current solution/thought process
SELECT
stores AS Stores,
Age_Group AS Age,
Income_Level AS Income
COUNT(DISTINCT(Customer_ID)) AS Number_of_Customers
FROM tables JOIN tables....
GROUP BY Stores, Ages, Income;
And then manually calculating the percentages.
But it doesn't seem right.
Is there a way to produce an example output table using just SQL?
As per your requirement, Common Table Expressions can be used . You can use below code to get the expected output.
WITH
data_for_percent_by_income AS (
SELECT
COUNT(customer_id) AS cus_count_in_per_income_level_and_agegrp,
Age_group AS age_g,income_level AS inc_lvl
FROM
`project.dataset.Customer2`
WHERE
customer_id IN (
SELECT customer_id
FROM
`project.dataset.Store5`
WHERE stores='A')
GROUP BY
Age_group,income_level),tot_cus_in_defined_income_level AS (
SELECT
COUNT(customer_id) AS cus_count_in_per_income_level,Age_group AS ag
FROM
`project.dataset.Customer2`
WHERE
customer_id IN (
SELECT
customer_id
FROM
`project.dataset.Store5`
WHERE stores='A')
GROUP BY
Age_group),
tot_cus_storeA AS(
SELECT
COUNT(*) AS tot_cus_in_A
FROM
`project.dataset.Customer2`
WHERE customer_id IN (
SELECT customer_id
FROM
`project.dataset.Store5`
WHERE stores='A') ),
final_view AS(
SELECT
ROUND(cus_count_in_per_income_level_and_agegrp*100/cus_count_in_per_income_level) AS p_by_inc,
age_g,inc_lvl
FROM
data_for_percent_by_income
INNER JOIN
tot_cus_in_defined_income_level
ON
data_for_percent_by_income.age_g=tot_cus_in_defined_income_level.ag )
SELECT
stores,tot_cus_in_defined_income_level.ag AS age_group,income_level,
ROUND(cus_count_in_per_income_level*100/tot_cus_in_A) AS percentage_by_age,
p_by_inc AS percentage_by_income
FROM
tot_cus_in_defined_income_level,tot_cus_storeA,`project.dataset.Customer2`,`project.dataset.Store5`
INNER JOIN
final_view
ON
age_group=final_view.age_g AND income_level=final_view.inc_lvl
WHERE
tot_cus_in_defined_income_level.ag = Age_group AND stores='A'
GROUP BY
stores,percentage_by_age,age_group,income_level,percentage_by_income
ORDER BY Age_group
I have attached the screenshots of the input table and output table.
Customer Table
Store Table
Output Table
SELECT
s.Stores AS Stores,
c.age_group AS Age,
a.income_level AS Affluence,
CAST(COUNT(DISTINCT c.Customer_ID) AS numeric)*100/SUM(CAST(COUNT(DISTINCT c.Customer_ID) AS numeric)) OVER(PARTITION BY s.Stores ) AS Perc_of_Members
This is what I did in the end.
I'm trying to get a running total within a group but my current code just gives me an aggregate sum.
For example, my data looks like this
ID ShiftNum Status Type Rate HourlyWage Hours Total_Amount
12542 1 Full A 1 12.5 40 500
12542 1 Full A 1 12.5 35 420
12542 2 Full A 1 10 40 400
12542 2 Full B 1.2 10 40 480
17842 1 Full A 1 11 27 297
17842 1 Full B 1.3 11 30 429
And what I want is a running total within the same ID, Shift Number, and Status. For example, I want something like this as my final result
ID ShiftNum Status Type Rate HourlyWage Hours Total_Amount Running_Tot
12542 1 Full A 1 12.5 40 500 500
12542 1 Full A 1 12.5 35 420 920
12542 2 Full A 1 10 40 400 400
12542 2 Full B 1.2 10 40 480 880
17842 1 Full A 1 11 27 297 297
17842 1 Full B 1.3 11 30 429 726
However, my current code just gives me the total sum within each group. For example, 920, 920 for row 1&2. Here's my code.
Select a.*,
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY ID, ShiftNum, Status) as Runnint_Tot
from table a
How do I fix my code to get the final result I want?
You need an ordering column that uniquely defines each row. There is not an obvious one in your row, but something like this:
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY hours) as Running_Tot
Or:
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status
ORDER BY (SELECT NULL)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as Running_Tot
The problem you are facing is because the ORDER BY keys have ties. The default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Note the RANGE. That means that all rows with ties are combined.
Also note that there is no utility to including the PARTITION BY keys in the ORDER BY (well . . . there is one exception in SQL Server if you don't care about the ordering, then including a key can be a handy short-cut). The ordering occurs within a partition.
If your rows can have exact duplicates, I would first suggest that you add a primary key. But, in the meantime, you could use:
with a as (
select a.*,
row_number() over (order by id, shiftnum, status) as seqnum
from tablea a
)
Select a.*,
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY seqnum) as Running_Tot
from a;
The ordering will be arbitrary, but it will at least accumulate.
I have a requirement to calculate rolling compound interest on several accounts in pl/sql. I was looking for help/advice on how to script calculate these calculations. The calculations I need are in the last two columns of the output below (INTERESTAMOUNT AND RUNNING TOTAL). I found similar examples of this on here, but nothing specifically fitting these requirements in pl/sql. I am also new to CTE/Recursive Techniques and the Model technique I found required a specific iteration which would be variable in this case. Please see my problem below:
Calculations:
INTERESTAMOUNT = (Previous Year RUNNING TOTAL+ Current Year AMOUNT) * INTEREST_RATE
RUNNINGTOTAL = (Previous Year RUNNING TOTAL+ Current Year AMOUNT) * (1 + INTEREST_RATE) - CURRENT YEAR EXPENSES
Input Table:
YEAR ACCT_ID AMOUNT INTEREST_RATE EXPENSES
2002 1 1000 0.05315 70
2003 1 1500 0.04213 80
2004 1 800 0.03215 75
2005 1 950 0.02563 78
2000 2 750 0.07532 79
2001 2 600 0.06251 75
2002 2 300 0.05315 70
Desired Output:
YEAR ACCT_ID AMOUNT INTEREST_RATE EXPENSES INTERESTAMOUNT RUNNINGTOTAL
2002 1 1000 0.05315 70 53.15 983.15
2003 1 1500 0.04213 80 104.62 2507.77
2004 1 800 0.03215 75 106.34 3339.11
2005 1 950 0.02563 78 109.93 4321.04
2000 2 750 0.07532 79 56.49 727.49
2001 2 600 0.06251 75 82.98 1335.47
2002 2 300 0.05315 70 86.93 1652.4
One way to do it is with a recursive cte.
with rownums as (select t.*
,row_number() over(partition by acct_id order by yr) as rn
from t) -- t is your tablename
,cte(rn,yr,acct_id,amount,interest_rate,expenses,running_total,interest_amount) as
(select rn,yr,acct_id,amount,interest_rate,expenses
,(amount*(1+interest_rate))-expenses
,amount*interest_rate
from rownums
where rn=1
union all
select t.rn,t.yr,t.acct_id,t.amount,t.interest_rate,t.expenses
,((c.running_total+t.amount)*(1+t.interest_rate))-t.expenses
,(c.running_total+t.amount)*t.interest_rate
from cte c
join rownums t on t.acct_id=c.acct_id and t.rn=c.rn+1
)
select * from cte
Sample Demo
Generate row numbers using row_number function
Calculate the interest and running total of the first row for each acct_id (anchor in the recursive cte).
Join every row to the next one (ordered by ascending order of year column) for each account_id and compute the running total and interest for the subsequent rows.
Some help please? Just a noob here starting to learn how to write SQL and ran into this problem. I know how to use the MAX function but I can't figure out how to join all these requirements together. I have two tables, Accounts and Books (below is an example of the data)
Accounts
ID Series YesorNot Dated Filed Plan Year
1 123 Yes 06/12/2015 2015
2 123 No 06/12/2015 2015
3 145 Yes 06/06/2015 2015
4 145 No 02/02/2015 2014
5 198 Yes 02/03/2015 2015
6 187 Yes 02/14/2013 2013
7 153 Yes 01/02/2011 2011
Books
Primary Key Date Created ID
1 06/13/2015 123
2 06/12/2015 123
3 06/07/2015 145
4 02/02/2015 145
5 02/03/2015 198
Two tables: Accounts and Books
Looking for:
1. Data that exists in both tables by the Project ID = Primary Key
2. I only want one unqiue Series (Series also = ID)
3. I want the MAX (most recent) value of Plan Year, and then if there are duplicates for Plan Year, I need the MAX (most recent) value of Date Created.
4. I just need the columns Project ID, Series, YesorNot, Date Filed, Plan Year so my output should be like this:
Project ID Series YesorNot Dated Filed Plan Year
1 123 Yes 06/12/2015 2015
3 145 Yes 06/06/2015 2015
4 145 No 02/02/2015 2014
5 198 Yes 02/03/2015 2015
First join the tables:
SELECT B.Primary_Key as Project_ID, A.Series, A.YesorNot, A.Date_Filed, A.Plan_Year
FROM Books B
JOIN Accounts A ON B.ID = A.Series
You should have been able to get this far on your own (and you should have posted it as part of the question) -- if you can't I'd say find a different career. Assuming you could now the slightly harder part.
Now we add a row number based on your criteria
ROW_NUMBER() PARTITION BY (B.Primary_Key, A.Series, A.YesorNot, A.Date_Filed ORDER BY A.Date_Year DESC, B.Date_Created DESC) AS RN
Now just take the first of the row number.
SELECT Project_ID, Series, YesorNot, Date_Filed, Plan_Year
FROM (
SELECT B.Primary_Key as Project_ID, A.Series, A.YesorNot, A.Date_Filed, A.Plan_Year,
ROW_NUMBER() PARTITION BY (B.Primary_Key, A.Series, A.YesorNot, A.Date_Filed ORDER BY A.Date_Year DESC, B.Date_Created DESC) AS RN
FROM Books B
JOIN Accounts A ON B.ID = A.Series
) X
WHERE RN = 1