How to calculate a sum across different tables separated by year, and what is the best data model to support it? Power BI - sum

Desired Result
Choose a year from the slicer and return the summed Score in a scorecard.
Problems
No single measure can be put in the scorecard and applied across all tables when a year is selected in the slicer.
e.g.
sum(table20[Score])
sum(table21[Score])
sum(table22[Score])
I have to avoid appending the tables, as each single-year table contains millions of rows.
All tables are imported from a Power BI dataflow without using DirectQuery.
Data
table20
YEAR GEO Type Score
2020 Asia A 1
2020 Africa A 2.5
2020 CentralAfrica A 0
2020 Europe A 0
2020 MiddleEast A 0
2020 America A 1.5
table21
YEAR GEO Type Score
2021 Asia A 3
2021 Africa A 2
2021 CentralAfrica A 6
2021 Europe A 1
2021 MiddleEast A 2
2021 America A 8
table22
YEAR GEO Type Score
2022 Asia A 4
2022 Africa A 0
2022 CentralAfrica C 3
2022 Europe C 4
2022 MiddleEast A 1
2022 America A 5
Relationship
A calendar table links those tables.
Calander = CALENDAR(DATE(2020,1,1),DATE(2022,12,31))
What is the best way to achieve this without appending the tables into one larger dataset?
I'm not sure whether aggregations based on imported tables can help with performance.
Please advise if my concept is fundamentally wrong, e.g. if appending the tables cannot be avoided.

Related

Produce weekly and quarterly stats from a monthly figure

I have a sample of a table as below:
Customer Ref
Bear Rate
Distance
Month
Revenue
ABA-IFNL-001
1000
01/01/2022
-135
ABA-IFNL-001
1000
01/02/2022
-135
ABA-IFNL-001
1000
01/03/2022
-135
ABA-IFNL-001
1000
01/04/2022
-135
ABA-IFNL-001
1000
01/05/2022
-135
ABA-IFNL-001
1000
01/06/2022
-135
I also have a sample of a calendar table as below:
Date        Year  Week  Quarter  WeekDay  Qtr Start   Qtr End     Week Day
04/11/2022  2022  45    4        Fri      30/09/2022  29/12/2022  1
05/11/2022  2022  45    4        Sat      30/09/2022  29/12/2022  2
06/11/2022  2022  45    4        Sun      30/09/2022  29/12/2022  3
07/11/2022  2022  45    4        Mon      30/09/2022  29/12/2022  4
08/11/2022  2022  45    4        Tue      30/09/2022  29/12/2022  5
09/11/2022  2022  45    4        Wed      30/09/2022  29/12/2022  6
10/11/2022  2022  45    4        Thu      30/09/2022  29/12/2022  7
11/11/2022  2022  46    4        Fri      30/09/2022  29/12/2022  1
12/11/2022  2022  46    4        Sat      30/09/2022  29/12/2022  2
13/11/2022  2022  46    4        Sun      30/09/2022  29/12/2022  3
14/11/2022  2022  46    4        Mon      30/09/2022  29/12/2022  4
15/11/2022  2022  46    4        Tue      30/09/2022  29/12/2022  5
16/11/2022  2022  46    4        Wed      30/09/2022  29/12/2022  6
17/11/2022  2022  46    4        Thu      30/09/2022  29/12/2022  7
How can I join/link the tables to report on revenue over weekly and quarterly periods using the calendar table? I can put into two tables if needed as an output eg:
Quarter Starting  Quarter  Revenue
31/12/2021        1        500
01/04/2022        2        400
01/07/2022        3        540
30/09/2022        4        540

Week Date Start  Week  Revenue
31/12/2021       41    33.75
07/01/2022       42    33.75
14/01/2022       43    33.75
21/01/2022       44    33.75
I am using Alteryx for this, but I wouldn't mind an explanation of the possible logic in SQL so I can apply it in the system.
Thanks
Before I get into the answer: you're going to have a data integrity issue. All the revenue data is aggregated at a monthly level, whereas your quarters start and end on some day within a month.
For example, Q4 starts September 30th (Friday) and ends December 29th (Thursday). You may have a day or two that bleeds from another month into a quarter, which might throw off the data a bit (especially if there's a large amount of revenue on the days that bleed into the quarter).
Additionally, because your revenue is aggregated at a monthly level, a weekly calculation doesn't make much sense unless you have more granular data (weekly, or ideally daily), since you'll probably just be dividing the monthly revenue by 4.
That being said, you'll want to use the Cross Tab tool in Alteryx to get the data how you want it. But before you do that, aggregate your data at a quarterly level.
You can do this with an if statement or some other data cleansing tool (sorry, it's been a while since I used Alteryx). Something like:
# Pseudo code - this won't actually work!
# For determining quarter
if (month) between (30/09/2022,29/12/2022) then 4
where you can derive the logic from your calendar table. Then once you have the quarter, you can join in the Quarter Start date based on your quarter calculation.
Now you have a nice clean table that might look something like this:
Month       Revenue  Quarter  Quarter Start Date
01/01/2022  -135     4        30/09/2022
01/01/2022  -135     4        30/09/2022
Aggregate on your quarter to get a cleaner table
Quarter Start Date  Quarter  Revenue
30/09/2022          4        300
Then use Cross Tab, pivoting on the quarter start date.
For SQL, you'd be pivoting the data: essentially, taking the values from a row of data and converting them into columns. It will look a bit janky because the data is so customized, but here's a good question that goes over pivoting - Simple way to transpose columns and rows in SQL?
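The quarter-assignment and aggregation steps above can be sketched in plain Python. This is only an illustration of the logic, not Alteryx or SQL syntax. The Q4 range comes from the calendar table; the end dates of the other quarters are an assumption (the day before the next quarter's start), and `quarter_for` is a hypothetical helper name.

```python
from datetime import date

# Quarter ranges as (quarter, start, end). Q4 is taken from the calendar table;
# the other end dates are ASSUMED to be the day before the next quarter starts.
quarters = [
    (1, date(2021, 12, 31), date(2022, 3, 31)),
    (2, date(2022, 4, 1), date(2022, 6, 30)),
    (3, date(2022, 7, 1), date(2022, 9, 29)),
    (4, date(2022, 9, 30), date(2022, 12, 29)),
]

# Monthly revenue rows from the question's sample: (month start, revenue).
monthly = [(date(2022, m, 1), -135) for m in range(1, 7)]


def quarter_for(day):
    """Return (quarter, quarter_start) for the range that contains `day`."""
    for number, start, end in quarters:
        if start <= day <= end:
            return number, start
    raise ValueError(f"{day} is outside the known quarters")


# Aggregate revenue per quarter (the step before any cross tab / pivot).
totals = {}
for month, revenue in monthly:
    key = quarter_for(month)
    totals[key] = totals.get(key, 0) + revenue

for (number, start), revenue in sorted(totals.items()):
    print(start.strftime("%d/%m/%Y"), number, revenue)
```

Because each month is assigned by its first day, a month that straddles a quarter boundary lands entirely in one quarter, which is the data integrity caveat mentioned at the start of this answer.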

Is there a function in SQL that automatically generates more rows by month?

I've got a large database that's got all our transactions and shipping costs from them, here's a simplified version:
Source Table
Date      ROUTE      Cost
01/20/21  USA to UK  $40
01/01/21  USA to UK  $40
01/10/21  USA to UK  $40
12/20/20  USA to UK  $30
11/20/20  USA to UK  $20
11/20/20  USA to UK  $20
And I want to see the average cost by month, so it would look like:
Route      Nov 2020  Dec 2020  Jan 2021
USA to UK  $20       $30       $40
How do I write code that I can rerun when, say, April comes around and I have to refresh this table, without having to create new columns for Feb, March, etc.?
Here is a possible way of doing it by using PIVOT in Snowflake: https://docs.snowflake.com/en/sql-reference/constructs/pivot.html
Let's say "monthname" is a column you extracted from your "Date" column. Then something like this may help (using avg, since you asked for the average cost):
select * from yourTable
pivot(avg(cost) for monthname in ('January', 'February', 'March', 'April'))
order by route;
The values in your monthname column must match the ones listed in the brackets.
As this is a fairly static solution, you still have to adjust the code every month. Writing a stored procedure could help here: https://docs.snowflake.com/en/sql-reference/stored-procedures.html
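If the pivot can be done outside the database, the month columns can be derived from the data itself, so nothing needs editing when a new month arrives. The following is a minimal sketch in plain Python using the sample rows from the question; it is not Snowflake code, and the `YYYY-MM` column labels are an arbitrary choice made here so that the columns sort chronologically.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (date as MM/DD/YY, route, cost in dollars).
rows = [
    ("01/20/21", "USA to UK", 40),
    ("01/01/21", "USA to UK", 40),
    ("01/10/21", "USA to UK", 40),
    ("12/20/20", "USA to UK", 30),
    ("11/20/20", "USA to UK", 20),
    ("11/20/20", "USA to UK", 20),
]

# Collect costs per (route, (year, month)).
costs = defaultdict(list)
for raw_date, route, cost in rows:
    d = datetime.strptime(raw_date, "%m/%d/%y")
    costs[(route, (d.year, d.month))].append(cost)

# The month columns come from the data, so new months appear automatically.
months = sorted({month for _, month in costs})
routes = sorted({route for route, _ in costs})

header = ["Route"] + [f"{year}-{month:02d}" for year, month in months]
table = []
for route in routes:
    line = [route]
    for month in months:
        values = costs.get((route, month))
        # Average cost for the month, or None when the route has no rows.
        line.append(sum(values) / len(values) if values else None)
    table.append(line)

print(header)
for line in table:
    print(line)
```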

Include "0" results in COUNT(*) aggregate

Good morning. I've searched the forum for this question, but the results I found didn't give me a solution.
I have two tables.
CARS:
Id Model
1 Seat
2 Audi
3 Mercedes
4 Ford
BREAKDOWNS:
IdBd Description Date Price IdCar
1 Engine 01/01/2020 500 € 3
2 Battery 05/01/2020 0 € 1
3 Wheel's change 10/02/2020 110,25 € 4
4 Electronic system 15/03/2020 100 € 2
5 Brake failure 20/05/2020 0 € 4
6 Engine 25/05/2020 400 € 1
I want to write a query that shows the number of breakdowns per month that cost 0 €.
I have this query:
SELECT Year(breakdowns.[Date]) AS YEAR, StrConv(MonthName(Month(breakdowns.[Date])),3) AS MONTH, Count(*) AS [BREAKDOWNS]
FROM cars LEFT JOIN breakdowns ON (cars.Id = breakdowns.IdCar AND breakdowns.[Price]=0)
GROUP BY breakdowns.[Price], Year(breakdowns.[Date]), Month(breakdowns.[Date]), MonthName(Month(breakdowns.[Date]))
HAVING ((Year([breakdowns].[Date]))=[Insert a year:])
ORDER BY Year(breakdowns.[Date]), Month(breakdowns.[Date]);
And the result is (if I put year '2020'):
YEAR MONTH BREAKDOWNS
2020 January 1
2020 May 1
And I want:
YEAR MONTH BREAKDOWNS
2020 January 1
2020 February 0
2020 March 0
2020 May 1
Thanks!
The HAVING condition should be in WHERE (otherwise it changes the Outer to an Inner join). But as long as you don't use columns from cars there's no need to join it.
To get rows for months that have no zero-price breakdowns, switch to conditional aggregation. Access doesn't support the standard SQL CASE expression, so use IIf instead. Don't group by Price, or each distinct price gets its own row.
SELECT Year(breakdowns.[Date]) AS YEAR,
StrConv(MonthName(Month(breakdowns.[Date])),3) AS MONTH,
Sum(IIf(breakdowns.[Price]=0, 1, 0)) AS [BREAKDOWNS]
FROM breakdowns
INNER JOIN cars
ON (cars.Id = breakdowns.IdCar)
WHERE ((Year([breakdowns].[Date]))=[Insert a year:])
GROUP BY Year(breakdowns.[Date]), Month(breakdowns.[Date]), MonthName(Month(breakdowns.[Date]))
ORDER BY Year(breakdowns.[Date]), Month(breakdowns.[Date]);
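To see conditional aggregation produce the zero rows, here is a small self-contained sketch that loads the question's BREAKDOWNS sample into an in-memory SQLite database. SQLite is only a stand-in for Access here: it supports CASE and uses strftime instead of Year/Month, so the SQL text differs from what Access would accept. Dates are rewritten in ISO format and prices as plain numbers for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE breakdowns "
    "(IdBd INTEGER, Description TEXT, Date TEXT, Price REAL, IdCar INTEGER)"
)
conn.executemany(
    "INSERT INTO breakdowns VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Engine", "2020-01-01", 500, 3),
        (2, "Battery", "2020-01-05", 0, 1),
        (3, "Wheel's change", "2020-02-10", 110.25, 4),
        (4, "Electronic system", "2020-03-15", 100, 2),
        (5, "Brake failure", "2020-05-20", 0, 4),
        (6, "Engine", "2020-05-25", 400, 1),
    ],
)

# Count only zero-price rows, but keep every month that has any breakdown.
query = """
    SELECT strftime('%Y', Date) AS year,
           strftime('%m', Date) AS month,
           SUM(CASE WHEN Price = 0 THEN 1 ELSE 0 END) AS breakdowns
    FROM breakdowns
    WHERE strftime('%Y', Date) = ?
    GROUP BY year, month
    ORDER BY year, month
"""
result = conn.execute(query, ("2020",)).fetchall()
for row in result:
    print(row)
```

April is still absent from the result because it has no breakdown rows at all; getting a row for it would need a calendar or months table to join against.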

pandas count groupby 2 attributes

I have a list that looks like this:
Name arrival_date location
Tom 2019-12-12 Hardware store
Tina 2019-12-31 Post office
Tina 2019-12-14 Post office
Tina 2019-11-30 Police station
It has a few thousand entries, and the data runs from April 2018 to April 2020.
Now I would like to count the number of arrivals at each location in each month over the 2 years,
so that it basically looks like this:
October 2018
Hardware Store:26
Police Station:13
...
November 2019
Hardware Store:226
Police Station:113
...
What is a good way to do this with pandas?
Use Series.dt.strftime with GroupBy.size for counts per both attributes:
#if necessary
#df['arrival_date'] = pd.to_datetime(df['arrival_date'])
#df = df.sort_values('arrival_date')
s = df['arrival_date'].dt.strftime('%B %Y').rename('month-year')
df = df.groupby([s, 'location'], sort=False).size().reset_index(name='count')
print (df)
      month-year        location  count
0  December 2019  Hardware store      1
1  December 2019     Post office      2
2  November 2019  Police station      1

Normalize monthly payments

First, sorry for my bad English. I'm trying to normalize a table in a pension system where subscribers are paid monthly. I need to know who has been paid and who has not and how much they've been paid. I believe I'm using SQL Server. Here's an example:
id_subscriber id_receipt year month pay_value payment type_pay
12 1 2016 January 100 80 1
13 1 2016 January 100 100 1
14 1 2016 January 100 100 1
12 2 2016 February 100 100 2
13 2 2016 February 100 80 1
But I'm not happy repeating the year and the month for every single subscriber. It doesn't seem right. Is there a better way to store this data?
EDIT:
The case is as follows: this company has many subscribers who must pay monthly, and payment can be made in various ways. The company produces a single receipt for many customers, and for each customer that receipt may cover one or more installments.
These are my other tables:
tbl_subscriber
id_suscriber(PK) first_name last_name address tel_1 tel_2
12 Juan Perez xxx xxx xxx
13 Pedro Lainez xxx xxx xxx
14 Maria Lopez xxx xxx xxx
tbl_receipt
id_receipt(PK) value elaboration_date deposit_date
1 1,000.00 2015-09-16 2015-09-20
2 890.00 2015-12-01 2015-12-18
tbl_type_paym
id type description
1 bank xxxx
2 ventanilla xxx
This basically seems fine. You could split dates out into a separate table and reference that, but that strikes me as a kind of silly way to do it. I would recommend storing the month as an integer instead of a varchar column though. Besides not storing the same string over and over you can more reasonably do comparisons.
You could also use date values, although that might not be worth the trouble when you don't want greater granularity than the month.
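As a rough illustration of the integer-month suggestion, the sketch below stores the question's sample rows with `month` as an integer and then finds underpaid installments up to a given month with a simple numeric comparison. It uses an in-memory SQLite database as a stand-in for SQL Server, and the table name `tbl_payment` is hypothetical (the question does not name this table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tbl_payment (
        id_subscriber INTEGER NOT NULL,
        id_receipt    INTEGER NOT NULL,
        year          INTEGER NOT NULL,
        month         INTEGER NOT NULL CHECK (month BETWEEN 1 AND 12),
        pay_value     REAL    NOT NULL,
        payment       REAL    NOT NULL,
        type_pay      INTEGER NOT NULL
    )
""")

# Sample rows from the question, with month names replaced by integers.
rows = [
    (12, 1, 2016, 1, 100, 80, 1),
    (13, 1, 2016, 1, 100, 100, 1),
    (14, 1, 2016, 1, 100, 100, 1),
    (12, 2, 2016, 2, 100, 100, 2),
    (13, 2, 2016, 2, 100, 80, 1),
]
conn.executemany("INSERT INTO tbl_payment VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Integer year/month values compare and sort naturally, unlike month names.
underpaid = conn.execute("""
    SELECT id_subscriber, year, month, pay_value - payment AS shortfall
    FROM tbl_payment
    WHERE payment < pay_value
      AND year * 12 + month <= 2016 * 12 + 2
    ORDER BY year, month, id_subscriber
""").fetchall()
print(underpaid)
```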