SQLite: Generate rows for missing years within range (system year + 9)

I have an SQLite 3.38.2 table that has rows for certain years:
with data (year_, amount) as (
values
(2024, 100),
(2025, 200),
(2025, 300),
(2026, 400),
(2027, 500),
(2028, 600),
(2028, 700),
(2028, 800),
(2029, 900),
(2031, 100)
)
select * from data
YEAR_  AMOUNT
2024   100
2025   200
2025   300
2026   400
2027   500
2028   600
2028   700
2028   800
2029   900
2031   100
I want at least one row for each year in the range from the current system year through the system year + 9. In other words, I want rows for 10 years, starting with the current year (currently 2023).
However, I'm missing rows for certain years: 2023, 2030, and 2032. So I want to generate filler rows for those missing years. The amount for the filler rows would be null.
It would look like this:
YEAR_  AMOUNT
2023   --filler
2024   100
2025   200
2025   300
2026   400
2027   500
2028   600
2028   700
2028   800
2029   900
2030   --filler
2031   100
2032   --filler
In an SQLite query, how can I select the rows and generate filler rows within the 10 year range?
Edit: I would prefer not to manually create a list of years in the query or in a table. I would rather create a dynamic range within the query.

You may use a calendar table approach, which will require keeping a table containing all years which you want to appear in your report. Left join this calendar table to your current table to get the result you want.
WITH years AS (
  SELECT 2023 AS YEAR_ UNION ALL
  SELECT 2024 UNION ALL
  ...
  SELECT 2040
)
SELECT y.YEAR_, d.AMOUNT
FROM years y
LEFT JOIN data d
  ON d.YEAR_ = y.YEAR_
WHERE y.YEAR_ BETWEEN CAST(strftime('%Y', 'now') AS INTEGER)
                  AND CAST(strftime('%Y', 'now') AS INTEGER) + 9
ORDER BY y.YEAR_;
Note that the filter and the ORDER BY must use y.YEAR_, the calendar side of the join: filtering on d.YEAR_ would discard the filler rows, whose d.YEAR_ is NULL. Also, SQLite has no SYSYEAR function; strftime('%Y', 'now') returns the current year.
Note that in practice, rather than using the above CTE years, you may create a bona fide table containing the next hundred years in it.

Yet another option is a recursive CTE that generates your N years:
WITH RECURSIVE cte AS (
  SELECT 2023 AS year_
  UNION ALL
  SELECT year_ + 1
  FROM cte
  WHERE year_ < 2023 + 10 - 1
)
SELECT cte.year_, data.amount
FROM cte
LEFT JOIN data
  ON cte.year_ = data.year_
ORDER BY cte.year_;
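The recursive-CTE approach can be made fully dynamic, as the edit requests, by deriving the starting year from strftime('%Y', 'now') instead of hard-coding 2023. A minimal runnable sketch using Python's sqlite3 module; the in-memory table and sample rows are stand-ins for the real data:

```python
import sqlite3

# In-memory demo of the recursive-CTE approach with a dynamic start year.
# Table name and sample rows are illustrative, not the asker's real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (year_ INTEGER, amount INTEGER)")

# Take "this year" from SQLite itself so the seed rows land in the window.
this_year = conn.execute(
    "SELECT CAST(strftime('%Y', 'now') AS INTEGER)").fetchone()[0]
conn.executemany("INSERT INTO data VALUES (?, ?)",
                 [(this_year + 1, 100), (this_year + 2, 200), (this_year + 6, 900)])

sql = """
WITH RECURSIVE years AS (
    SELECT CAST(strftime('%Y', 'now') AS INTEGER) AS year_
    UNION ALL
    SELECT year_ + 1 FROM years
    WHERE year_ < CAST(strftime('%Y', 'now') AS INTEGER) + 9
)
SELECT years.year_, data.amount
FROM years
LEFT JOIN data ON data.year_ = years.year_
ORDER BY years.year_
"""
rows = conn.execute(sql).fetchall()
for year_, amount in rows:
    print(year_, amount)  # amount is None for the filler years
```

The LEFT JOIN keeps every generated year; years with no matching row come back with a NULL (None) amount, which is exactly the "filler" behaviour asked for.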


Get cumulative sum that reset for each year

Please consider this table:
Year Month Value YearMonth
2011 1 70 201101
2011 1 100 201101
2011 2 200 201102
2011 2 50 201102
2011 3 80 201103
2011 3 250 201103
2012 1 100 201201
2012 2 200 201202
2012 3 250 201203
I want to get a cumulative sum based on each year. For the above table I want to get this result:
Year Month Sum
-----------------------
2011 1 170
2011 2 420 <--- 250 + 170
2011 3 750 <--- 330 + 250 + 170
2012 1 100
2012 2 300 <--- 200 + 100
2012 3 550 <--- 250 + 200 + 100
I wrote this code:
Select c1.YearMonth, Sum(c2.Value) CumulativeSumValue
From #Tbl c1, #Tbl c2
Where c1.YearMonth >= c2.YearMonth
Group By c1.YearMonth
Order By c1.YearMonth Asc
But its CumulativeSumValue is calculated twice for each YearMonth:
YearMonth CumulativeSumValue
201101 340 <--- 170 * 2
201102 840 <--- 420 * 2
201103 1500
201201 850
201202 1050
201203 1300
How can I achieve my desired result?
I wrote this query:
select year, (Sum (aa.[Value]) Over (partition by aa.Year Order By aa.Month)) as 'Cumulative Sum'
from #Tbl aa
But it returned multiple records for 2011:
Year Cumulative Sum
2011 170
2011 170
2011 420
2011 420
2011 750
2011 750
2012 100
2012 300
2012 550
You are creating a cartesian product here. In your ANSI-89 implicit JOIN (you really need to stop using those and switch to ANSI-92 syntax) you are joining on c1.YearMonth >= c2.YearMonth.
For your first month you have two rows with the same value of the year and month, so each of those 2 rows joins to the other 2; this results in 4 rows:
Year  Month  Value1  Value2
2011  1      70      70
2011  1      70      100
2011  1      100     70
2011  1      100     100
When you SUM this value you get 340, not 170, as you have 70+70+100+100.
Instead of a triangular JOIN, however, you should be using a windowed SUM. As you also want to aggregate each month into a single row, you'll need to aggregate inside the windowed SUM like so:
SELECT V.YearMonth,
       SUM(SUM(V.Value)) OVER (PARTITION BY V.Year ORDER BY V.YearMonth) AS CumulativeSum
FROM (VALUES (2011, 1, 70, 201101),
             (2011, 1, 100, 201101),
             (2011, 2, 200, 201102),
             (2011, 2, 50, 201102),
             (2011, 3, 80, 201103),
             (2011, 3, 250, 201103),
             (2012, 1, 100, 201201),
             (2012, 2, 200, 201202),
             (2012, 3, 250, 201203)) V (Year, Month, Value, YearMonth)
GROUP BY V.YearMonth,
         V.Year;
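The query above is T-SQL, but the same idea carries over to other engines: aggregate each month first, then take a running SUM partitioned by year. A sketch of the equivalent in SQLite (window functions require 3.25+), run from Python with the question's data and using a subquery in place of the nested SUM(SUM(...)):

```python
import sqlite3

# Aggregate each month in a subquery, then take a running SUM per year.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Year INT, Month INT, Value INT, YearMonth INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (2011, 1, 70, 201101), (2011, 1, 100, 201101),
    (2011, 2, 200, 201102), (2011, 2, 50, 201102),
    (2011, 3, 80, 201103), (2011, 3, 250, 201103),
    (2012, 1, 100, 201201), (2012, 2, 200, 201202),
    (2012, 3, 250, 201203),
])

sql = """
SELECT YearMonth,
       SUM(monthly) OVER (PARTITION BY Year ORDER BY YearMonth) AS CumulativeSum
FROM (SELECT Year, YearMonth, SUM(Value) AS monthly
      FROM t
      GROUP BY Year, YearMonth)
ORDER BY YearMonth
"""
for ym, cum in conn.execute(sql):
    print(ym, cum)  # 201101 170, 201102 420, 201103 750, 201201 100, ...
```

The PARTITION BY Year restarts the running total at each year boundary, which is what the triangular join failed to do for the duplicate-month rows.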

Pivot table with Pandas

I have a small issue trying to do a simple pivot with pandas. One column has values that are entered more than once, each time with a different value in a second column and a year in a third column. What I want to do is get a sum of the second column for the year, using the values of the first column as rows.
import pandas as pd
year = 2022
base = pd.read_csv("Database.csv")
raw_monthly = pd.read_csv("Monthly.csv")
raw_types = pd.read_csv("types.csv")
monthly = raw_monthly.assign(Year=year)
ty = raw_types[['cparty', 'sales']]
typ = ty.rename(columns={"Sales": "sales"})
type = typ.assign(Year=year)
fin = pd.concat([base, monthly, type])
fin.drop(fin.tail(1).index, inplace=True)
currentYear = fin.loc[fin['Year'] == 2022]
final = pd.pivot_table(currentYear, index=['cparty', 'sales'], values='sales', aggfunc='sum')
With the above I am getting this result, but what I want is to have the two sales values of '3' for 2022 summed into a single value, so later I can also break it down by year. Any help appreciated!
Edit: The issue seems to come from the fact that the 3 CSVs are concatenated into a single dataframe. Doing the 3-to-1 CSV conversion manually in Excel and then using the groupby answer works as intended, but it does not work if I try to combine the 3 CSVs automatically using
fin = pd.concat([base, monthly, type])
The 3 csvs look like this.
Base looks like this:
cparty sales year
0 50969 -146602.14 2016
1 51056 -104626.62 2016
2 51129 -101742.99 2016
3 51036 -81801.84 2016
4 51649 -35992.60 2016
monthly looks like this, missing the year
cparty sales
0 818243 -330,052.47
1 82827 -178,630.85
2 508637 -156,369.87
3 29253 -104,028.30
4 596037 -95,312.07
type is like this.
cparty sales
0 582454 -16,056.46
1 597321 24,336.16
2 567172 20,736.78
3 614070 18,590.45
4 5601295 -3,661.46
What I am attempting to do is add a new Year column to the last two, set to 2022, so that later I can do the groupby per year. When I try to concat the 3 CSVs, it breaks down.
Suppose cparty is a categorical metric
# create a sample dataframe with year, cparty and sales
df = pd.DataFrame({
    'year': [2022, 2022, 2018, 2019, 2020, 2021, 2022, 2022, 2022, 2021, 2019, 2018],
    'cparty': ['cparty1', 'cparty1', 'cparty1', 'cparty2', 'cparty2', 'cparty2', 'cparty2', 'cparty3', 'cparty4', 'cparty4', 'cparty4', 'cparty4'],
    'sales': [230, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
})
df
###
year cparty sales
0 2022 cparty1 230
1 2022 cparty1 100
2 2018 cparty1 200
3 2019 cparty2 300
4 2020 cparty2 400
5 2021 cparty2 500
6 2022 cparty2 600
7 2022 cparty3 700
8 2022 cparty4 800
9 2021 cparty4 900
10 2019 cparty4 1000
11 2018 cparty4 1100
output = df.groupby(['year','cparty']).sum()
output
###
sales
year cparty
2018 cparty1 200
cparty4 1100
2019 cparty2 300
cparty4 1000
2020 cparty2 400
2021 cparty2 500
cparty4 900
2022 cparty1 330
cparty2 600
cparty3 700
cparty4 800
Filter by year
final = output.query('year == 2022')
final
###
sales
year cparty
2022 cparty1 330
cparty2 600
cparty3 700
cparty4 800
Have figured out the issue.
result = res.groupby(['Year', 'cparty']).sum()
output = result.query('Year == 2022')
output
##
sales
Year cparty
2022 3 -20409.04
4 12064.34
5 9656.64
8081 51588.55
8099 5625.22
... ...
Baron's groupby method was the way to go. The issue is that it only works if I have all the data in one CSV from the beginning. I was trying to add the year manually for the two new CSVs that I concat to the base, setting Year = 2022. The errors come when I concat the 3 different CSVs. If I don't add Year = 2022 it works, giving this:
cparty sales Year
87174 3 -3.89 2022.0
27 3 -20,405.15 NaN
If I do .fillna(2022) then it won't work as expected.
C:\Users\user\AppData\Local\Temp/ipykernel_14252/1015456002.py:32: FutureWarning: Dropping invalid columns in DataFrameGroupBy.add is deprecated. In a future version, a TypeError will be raised. Before calling .add, select only columns which should be valid for the function.
result = fin.groupby(['Year', 'cparty']).sum()
cparty sales Year
87174 3 -3.89 2022.0
27 3 -20,405.15 2022.0
After adding the year, the sum is not performed, so I never get the single row 'cparty' 3, 'sales' -20,409.04, Year 2022.
Any feedback appreciated.
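Judging by values like '-20,405.15' above, a likely culprit is that sales arrives from the new CSVs as comma-formatted strings, which pandas cannot sum together with the floats in base. A hedged sketch of the cleanup step (the two small frames below are stand-ins for the real CSVs):

```python
import pandas as pd

# Stand-ins for the real CSVs: base already has numeric sales and a Year;
# the new file carries sales as comma-formatted strings and no Year.
base = pd.DataFrame({'cparty': [3], 'sales': [-3.89], 'Year': [2022]})
monthly = pd.DataFrame({'cparty': [3], 'sales': ['-20,405.15']})

# Normalize before concatenating: strip thousands separators, cast to
# float, then tag the rows with the year.
monthly['sales'] = (monthly['sales'].astype(str)
                    .str.replace(',', '', regex=False)
                    .astype(float))
monthly = monthly.assign(Year=2022)

fin = pd.concat([base, monthly], ignore_index=True)
result = fin.groupby(['Year', 'cparty'], as_index=False)['sales'].sum()
print(result)  # one row: Year 2022, cparty 3, sales -20409.04
```

Once sales is numeric everywhere, the concat + groupby collapses the duplicate cparty rows into the single summed value the question asks for.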

How to compare same period of different date ranges in columns in BigQuery standard SQL

I have a hard time figuring out how to compare the same period (e.g. ISO week 48) of different years for a certain metric in different columns. I am new to SQL and haven't fully understood how PARTITION BY works, but I guess I'll need it for my desired output.
How can I sum the data in the "metric" column and compare the same period across different date ranges (e.g. years) in one table?
current table
date iso_week iso_year metric
2021-12-01 48 2021 1000
2021-11-30 48 2021 850
...
2020-11-28 48 2020 800
2020-11-27 48 2020 950
...
2019-11-27 48 2019 700
2019-11-26 48 2019 820
desired output
iso_week metric_thisYear metric_prevYear metric_prev2Year
48 1850 1750 1520
...
Consider the below simple approach:
select *
from (
  select * except(date)
  from your_table
)
pivot (sum(metric) as metric for iso_year in (2021, 2020, 2019))
If applied to the sample data in your question, the output matches the desired table above.
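For readers who land here with the data already in pandas rather than BigQuery, the same week-by-year comparison can be sketched with pivot_table (the frame below is an illustrative stand-in built from the question's sample rows):

```python
import pandas as pd

# Sample rows for ISO week 48 across three years, as in the question.
df = pd.DataFrame({
    'iso_week': [48, 48, 48, 48, 48, 48],
    'iso_year': [2021, 2021, 2020, 2020, 2019, 2019],
    'metric':   [1000, 850, 800, 950, 700, 820],
})

# Sum the metric per week, spreading the years across columns.
out = df.pivot_table(index='iso_week', columns='iso_year',
                     values='metric', aggfunc='sum')
print(out)  # week 48 -> 2019: 1520, 2020: 1750, 2021: 1850
```

This mirrors the BigQuery PIVOT: one row per iso_week, one summed column per year, ready for this-year/previous-year comparisons.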

SQL Query to pull the avg values for 1day gap dob's of clients

I have a requirement with the below table.
Conditions:
1. I have to take the average of the salaries of clients who have a 1-day date-of-birth gap.
2. If a client has no neighbour with a 1-day DOB gap, that client is not taken into consideration.
Please see the results.
Table:
ClientID ClinetDOB's Slaries
1 2012-03-14 300
2 2012-04-11 400
3 2012-05-09 200
4 2012-06-06 400
5 2012-07-30 600
6 2012-08-14 1200
7 2012-08-15 1800
8 2012-08-17 1200
9 2012-08-20 2400
10 2012-08-21 1500
The result should look like this:
ClientID ClinetDOB's AVG(Slaries)
7 2012-08-15 1500 -- avg of 1200, 1800 (ClientIDs 6 and 7 have DOBs 1 day apart)
10 2012-08-21 1950 -- avg of 2400, 1500 (ClientIDs 9 and 10 have DOBs 1 day apart)
Please help.
Thank you in advance!
A self-join connects the current record with all records having the previous day's date. The GROUP BY allows many records sharing the same date to be counted. t1 itself needs to be accounted for separately, so its Slaries is added afterwards and count(*) is incremented by 1 to calculate the average.
select t1.ClientID,
       t1.ClinetDOBs,
       (t1.Slaries + sum(t2.Slaries)) / (count(*) + 1) as Avg_Slaries
from table1 t1
inner join table1 t2
  on t1.ClinetDOBs = dateadd(day, 1, t2.ClinetDOBs)
group by t1.ClientID,
         t1.ClinetDOBs,
         t1.Slaries
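The dateadd call above is SQL Server syntax; in SQLite the same self-join can be written with date(..., '+1 day'). A runnable sketch with the question's data, run through Python's sqlite3 (note that client 10 comes out with their own DOB, 2012-08-21):

```python
import sqlite3

# Self-join each client to clients born exactly one day earlier, then
# average their salaries together with the client's own salary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ClientID INT, ClinetDOBs TEXT, Slaries INT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (1, '2012-03-14', 300), (2, '2012-04-11', 400), (3, '2012-05-09', 200),
    (4, '2012-06-06', 400), (5, '2012-07-30', 600), (6, '2012-08-14', 1200),
    (7, '2012-08-15', 1800), (8, '2012-08-17', 1200), (9, '2012-08-20', 2400),
    (10, '2012-08-21', 1500),
])

sql = """
SELECT t1.ClientID,
       t1.ClinetDOBs,
       (t1.Slaries + SUM(t2.Slaries)) / (COUNT(*) + 1) AS Avg_Slaries
FROM table1 t1
JOIN table1 t2
  ON t1.ClinetDOBs = date(t2.ClinetDOBs, '+1 day')
GROUP BY t1.ClientID, t1.ClinetDOBs, t1.Slaries
ORDER BY t1.ClientID
"""
for row in conn.execute(sql):
    print(row)  # (7, '2012-08-15', 1500) and (10, '2012-08-21', 1950)
```

Clients with no neighbour born the previous day (1 through 6, 8 and 9 here) produce no join match and simply drop out, satisfying the second condition.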

how to calculate % in PPS 2010

I have these columns in a table that I use as the FACT table, and I am using the time intelligence filter in PPS 2010.
I have the measures sum(materials), sum(sales) and sum(material_%).
In the PPS dashboard designer I have included this cube and all the measures, using an analytic chart.
I have built separate graphs for each column (material, sales, material_%).
For sales and materials there is no problem when I use the time filter.
In the material_% graph, the time filter "current quarter in months" (showing three months) shows the correct values.
But when I use the "current quarter" filter (sum of all 3 months), it shows 146% (83 + 33 + 30) for the actual values and 150% (50 + 50 + 50) for the target values.
It should actually show me 46% for actual and 50% for target:
the quarter value should be sum of material over all 3 months / sum of sales over all 3 months, but it is just summing the material_% column over the 3 months.
Time filter: Year :: Half Year :: Quarter :: Month :: Day
DataBase Table:
Month Year Material sales Material_% [ material / sales]
Jan_Act 2011 500 600 83
Jan_target 2011 400 800 50
Feb_Act 2011 300 900 33
Feb_target 2011 300 600 50
Mar_Act 2011 300 900 30
Mar_target 2011 300 600 50
......
Jan_Act 2012 0 0 0
Jan_target 2012 600 1000 60
.............
Dec_Act 2012 0 0 0
Dec_target 2012 600 800 75
MDX Query:
SELECT
  HIERARCHIZE( { [Time_dim].[Year - Half Year - Quarter - Month - Date].DEFAULTMEMBER } ) ON COLUMNS,
  HIERARCHIZE( { [Ven Bi Actfctmaster].[Act Fct].&[ACTUAL],
                 [Ven Bi Actfctmaster].[Act Fct].&[TARGET] } ) ON ROWS
FROM [Vin Finance]
WHERE ( [Measures].[Materials - Ven Bifullrptmaster] )
Please help me sort out this issue.

I solved this issue by changing the aggregation of the '%' measure from Sum to AverageOfChildren in the Properties tab.
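The arithmetic behind the symptom is worth spelling out: summing a precomputed percentage column gives the meaningless 146%, AverageOfChildren gives the mean of the monthly percentages, and the figure the question actually expects is the ratio of sums. A small sketch using the actual-month values from the table (the stored material_% values are taken verbatim from the question):

```python
# Monthly actuals from the table: (material, sales, stored material_%).
months = [(500, 600, 83), (300, 900, 33), (300, 900, 30)]

sum_of_pcts = sum(pct for _, _, pct in months)   # what SUM of the % column did
avg_of_pcts = sum_of_pcts / len(months)          # what AverageOfChildren computes
ratio_of_sums = 100 * sum(m for m, _, _ in months) / sum(s for _, s, _ in months)

print(sum_of_pcts)            # 146 (the wrong quarter value)
print(round(avg_of_pcts, 1))  # 48.7
print(round(ratio_of_sums))   # 46 (sum of material / sum of sales)
```

So AverageOfChildren stops the percentages from being added up, but the strictly correct quarter measure would be a calculated member defined as sum(material) / sum(sales) rather than any aggregation of the stored % column.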