Calculate sum by grouping dates into weeks in Power BI

I have a 'Date' column whose data type is date, and another column 'Amount' tied to each date. How can I create new columns representing the sum of the total amount collected each 'week', based on each day of the 'Date' column? The 'week' needs to be derived from the 'Date' column by grouping the first 7 dates into week 1, the next 7 dates into week 2, and so on. Basically I need to show a week-over-week comparison in a chart for different categories in different years, for which I need to prepare my data.
I'm including a sample dataset below for a better understanding:
| Categ | Date | Amount |
|-------|------------|--------|
| abc | 11/29/2019 | $1 |
| abc | 11/30/2019 | $2 |
| abc | 12/1/2019 | $3 |
| abc | 12/2/2019 | $3.5 |
| abc | 12/3/2019 | $0 |
| abc | 12/4/2019 | $6 |
| abc | 12/5/2019 | $4 |
| abc | 12/6/2019 | $6.5 |
| abc | 12/7/2019 | $4.5 |
| xyz | 11/29/2019 | $3 |
| xyz | 11/30/2019 | $2 |
| xyz | 12/01/2019 | $4.5 |
| xyz | 12/02/2019 | $8 |
| xyz | 12/03/2019 | $2 |
| xyz | 12/04/2019 | $4 |
| xyz | 12/05/2019 | $2.5 |
| xyz | 12/06/2019 | $9 |
| xyz | 12/07/2019 | $6 |
| abc | 11/29/2020 | $0.5 |
| abc | 11/30/2020 | $3.5 |
| abc | 12/01/2020 | $2 |
| abc | 12/02/2020 | $6 |
| abc | 12/03/2020 | $7 |
| abc | 12/04/2020 | $5 |
| abc | 12/05/2020 | $3 |
| abc | 12/06/2020 | $4 |
| abc | 12/07/2020 | $2 |
I'm expecting my final table to look like either Table 1 or Table 2 below:
TABLE 1:
| Categ | Year | Week1 | Week2 |
|-------|------|-------|-------|
| abc | 2019 | $19.5 | $30.5 |
| abc | 2020 | $27 | $6 |
| xyz | 2019 | $26 | $15 |
TABLE 2:
| Categ | Week 1 - 2019 | Week 2 - 2019 | Week 1 - 2020 | Week 2 - 2020 |
|-------|---------------|---------------|---------------|---------------|
| abc | $19.5 | $30.5 | $27 | $6 |
| xyz | $26 | $15 | - | - |

First off, your sample seems to be inconsistent: the detail data adds to 104.5 (30.5 for abc 2019, 41 for xyz 2019, 33 for abc 2020), while the summary tables add to 124.0, so something is off there.
Beyond that, it's a question of date math. Find the year using a custom column with the formula
= Date.Year([Date])
Find the lowest date and year using
MinDate= Table.Min(#"PreviousStepName", "Date")[Date],
MinYear= Table.Min(#"PreviousStepName", "Year")[Year],
The day offsets you want are (Date) - (MinDate) - (Year - MinYear)*365 - (Year - MinYear), or in code, the formula below in a custom column
= Number.From([Date]) - Number.From(MinDate) -([Year]-MinYear)*365 -([Year]-MinYear)
That math gives you a running day offset for each row.
Then integer-divide that column by 7 and add 1 to get the week number.
Remove the extra columns, pivot, and you're done. Sample code for desired output #1 is below:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Date", type date}}),
    #"Added Custom1" = Table.AddColumn(#"Changed Type", "Year", each Date.Year([Date])),
    MinDate = Table.Min(#"Added Custom1", "Date")[Date],
    MinYear = Table.Min(#"Added Custom1", "Year")[Year],
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Week", each Number.From([Date]) - Number.From(MinDate) - ([Year] - MinYear) * 365 - ([Year] - MinYear)),
    #"Integer-Divided Column" = Table.TransformColumns(#"Added Custom2", {{"Week", each 1 + Number.IntegerDivide(_, 7), Int64.Type}}),
    #"Removed Columns" = Table.RemoveColumns(#"Integer-Divided Column", {"Date"}),
    #"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns", {{"Week", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns", {{"Week", type text}}, "en-US")[Week]), "Week", "Amount", List.Sum)
in
    #"Pivoted Column"

Related

Converting multi-indexed dataframe to markdown

Currently I have a multi-indexed dataframe.
I want its markdown to look like this:
| | | col |
|-------|----------|-------|
| April | 1 | 6 |
| April | 2 | 7 |
| August| 1 | 14 |
But what I get after df.to_markdown(tablefmt='github') is
| | col |
|------------------|-------|
| ('April', 1) | 6 |
| ('April', 2) | 7 |
| ('August', 1) | 14 |
Can anyone tell me how to get the markdown as I want?
You first need to flatten the multi-index with pandas.DataFrame.reset_index, then set index=False in pandas.DataFrame.to_markdown:
out = (
    df.reset_index()
      .rename(columns={'idx1': '', 'idx2': ''})
      .to_markdown(tablefmt='github', index=False)
)
# Output:
print(out)
| | | col |
|----------|----|-------|
| April | 1 | 6 |
| April | 2 | 7 |
| August | 1 | 14 |
| August | 2 | 15 |
| December | 1 | 22 |
| December | 2 | 23 |
| February | 1 | 2 |
| February | 2 | 3 |
| January | 1 | 0 |

How to split single row yearly values into multi row monthly values in PostgreSQL?

I have a table of values from yearly payments like this:
| id | date | yearly_payment |
| ------ | -------- | :--------------: |
| 1 | 06/01/21 | $600 |
| 2 | 06/01/22 | $720 |
What I am trying to achieve is:
| id | date | monthly_payment |
| ------ | -------- | :---------------: |
| 1 | 06/01/21 | $50 |
| 1 | 07/01/21 | $50 |
| 1 | ... | $50 |
| 1 | 05/01/22 | $50 |
| 2 | 06/01/22 | $60 |
| 2 | 07/01/22 | $60 |
| 2 | ... | $60 |
| 2 | 05/01/23 | $60 |
I thought I could achieve this through some transformation on a pivot table, but to no avail. This solution gets me close, but I can't quite figure out how to achieve it within Postgres.
Would this work?
select
    y.id,
    y.date + interval '1 month' * gs.a as date,
    y.yearly_payment / 12 as monthly_payment
from yearly_payments y
cross join generate_series(0, 11) gs (a)
Beware of rounding: if yearly_payment is an integer, then you would want to divide by 12.0 to force a numeric context, as in the sketch below.
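A minimal sketch of the same query with the divisor written as 12.0, assuming yearly_payment is stored as an integer (round() is optional and only tidies the output):
select
    y.id,
    y.date + interval '1 month' * gs.a as date,
    round(y.yearly_payment / 12.0, 2) as monthly_payment  -- 12.0 forces numeric division instead of integer truncation
from yearly_payments y
cross join generate_series(0, 11) gs (a);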

SQL: some selections into one (or get two columns from one)

I use PostgreSQL and I have two tables (for example).
Let's say table1 contains stores; there are 2 types, 'candy store' and 'dental store'.
Each row contains information about a customer's purchase in a particular store.
As a result, I want to get, grouped by id, the money from each type of store and the last date of purchase. Money from candy stores should be summed starting from 2016, but money from dental stores starting from 2018.
table1:
+----+---------+------------------+-------+
| id | store | date of purchase | money |
| 1 | store 1 | 2016-01-01 | 10 |
| 1 | store 5 | 2018-01-01 | 50 |
| 2 | store 2 | 2017-01-20 | 10 |
| 2 | store 3 | 2019-02-20 | 15 |
| 3 | store 2 | 2017-02-02 | 20 |
| 3 | store 6 | 2019-01-01 | 60 |
| 1 | store 1 | 2015-01-01 | 20 |
+----+---------+------------------+-------+
table2 :
+---------+--------+
| store | type |
| store 1 | candy |
| store 2 | candy |
| store 3 | candy |
| store 4 | dental |
| store 5 | dental |
| store 6 | dental |
+---------+--------+
I want my query to return a table like this:
+----+---------------+-----------------+---------------+-----------------+
| id | money( candy) | the last date c | money(dental) | the last date d |
| 1 | 10 | 2016-01-01 | 50 | 2018-01-01 |
| 2 | 25 | 2019-02-20 | - | - |
| 3 | 20 | 2017-02-02 | 60 | 2019-01-01 |
+----+---------------+-----------------+---------------+-----------------+
If I understand correctly, this is what you want to do:
select t.id
     , sum(t.money) filter (where ty.type = 'candy')  as candymoney
     , max(t.purchasedate) filter (where ty.type = 'candy')  as candylastdate
     , sum(t.money) filter (where ty.type = 'dental') as dentalmoney
     , max(t.purchasedate) filter (where ty.type = 'dental') as dentallastdate
from table1 t
join table2 ty on t.store = ty.store
group by t.id
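The candy-since-2016 / dental-since-2018 requirement can go into the same filter clauses. A sketch, assuming the purchasedate and money column names used above:
select t.id
     , sum(t.money) filter (where ty.type = 'candy'  and t.purchasedate >= date '2016-01-01') as candymoney
     , max(t.purchasedate) filter (where ty.type = 'candy'  and t.purchasedate >= date '2016-01-01') as candylastdate
     , sum(t.money) filter (where ty.type = 'dental' and t.purchasedate >= date '2018-01-01') as dentalmoney
     , max(t.purchasedate) filter (where ty.type = 'dental' and t.purchasedate >= date '2018-01-01') as dentallastdate
from table1 t
join table2 ty on t.store = ty.store
group by t.id
order by t.id;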

Combining certain columns from multiple files based on a particular column, and not eliminating duplicate names

I had to combine the values of the 2nd column from multiple files, based on the 7th column of all the files. So, based on Ed Morton's answer to a similar question (Combining certain columns of several tab-delimited files based on first column), I wrote code like this:
awk 'FNR==1 { ++numFiles }
     !seen[$7]++ { keys[++numKeys] = $7 }
     { a[$7,numFiles] = $2 }
     END {
         for (keyNr=1; keyNr<=numKeys; keyNr++) {
             key = keys[keyNr]
             printf "%s", key
             for (fileNr=1; fileNr<=numFiles; fileNr++) {
                 printf "\t%s", ((key,fileNr) in a ? a[key,fileNr] : "NA")
             }
             print ""
         }
     }' file1.txt file2.txt file3.txt > combined.txt
INPUT FILE 1 :
+-------+-----------------+----------+-------------+----------+-------------+-------------+
| ID | adj.P.Val_file1 | P.Value | t | B | logFC | Gene.symbol |
+-------+-----------------+----------+-------------+----------+-------------+-------------+
| 36879 | 1.66E-09 | 7.02E-14 | -12.3836337 | 21.00111 | -2.60060826 | AA |
| 33623 | 1.66E-09 | 7.39E-14 | -12.3599517 | 20.95461 | -2.53106808 | AA |
| 23271 | 2.70E-09 | 2.30E-13 | -11.8478184 | 19.93024 | -2.15050984 | BB |
| 67 | 2.70E-09 | 2.40E-13 | -11.829044 | 19.892 | -3.06680932 | BB |
| 33207 | 1.21E-08 | 1.35E-12 | -11.0793461 | 18.32425 | -2.65246816 | CC |
| 24581 | 1.81E-08 | 2.41E-12 | -10.8325542 | 17.79052 | -1.87937753 | CC |
| 32009 | 3.25E-08 | 5.05E-12 | -10.5240537 | 17.11081 | -1.46505166 | CC |
+-------+-----------------+----------+-------------+----------+-------------+-------------+
INPUT FILE 2 :
+-------+-----------------+----------+------------+-----------+------------+--------------+
| ID | adj.P.Val_file2 | P.Value | t | B | logFC | Gene.symbol |
+-------+-----------------+----------+------------+-----------+------------+--------------+
| 40000 | 5.43E-13 | 1.21E-17 | 17.003819 | 29.155646 | 2.4805744 | FGH |
| 32388 | 1.15E-11 | 5.12E-16 | 14.920047 | 25.829874 | 2.2497567 | FGH |
| 33623 | 6.08E-11 | 4.43E-15 | -13.8115 | 23.870549 | -2.8161587 | ASD |
| 25002 | 6.08E-11 | 5.40E-15 | 13.713018 | 23.689571 | 2.2164681 | ASD |
| 33207 | 2.03E-10 | 2.29E-14 | -13.009752 | 22.36291 | -2.8787392 | ASD |
| 13018 | 2.03E-10 | 2.71E-14 | 12.929201 | 22.207038 | 3.0181585 | ASD |
| 5539 | 2.24E-10 | 3.48E-14 | 12.810902 | 21.976634 | 3.0849706 | ASD |
+-------+-----------------+----------+------------+-----------+------------+--------------+
DESIRED OUTPUT :
+-------------+-----------------+-----------------+
| Gene.symbol | adj.P.Val_file1 | adj.P.Val_file2 |
+-------------+-----------------+-----------------+
| AA | 1.66E-09 | NA |
| AA | 1.66E-09 | NA |
| BB | 2.70E-09 | NA |
| BB | 2.70E-09 | NA |
| CC | 1.21E-08 | NA |
| CC | 1.81E-08 | NA |
| CC | 3.25E-08 | NA |
| FGH | NA | 5.43E-13 |
| FGH | NA | 1.15E-11 |
| ASD | NA | 6.08E-11 |
| ASD | NA | 6.08E-11 |
| ASD | NA | 2.03E-10 |
| ASD | NA | 2.03E-10 |
| ASD | NA | 2.24E-10 |
+-------------+-----------------+-----------------+
The problem is that the 7th column has repeated names, and the code only takes the first occurrence of a particular name; I want the results for all the repeated names. I tried deleting each line of the code to understand it, but couldn't come up with a solution.
Finally figured out the answer by myself!
I just have to eliminate the !seen[$7]++ condition from my code, as including it means only the first occurrence of any repeated name in the 7th column (nth column in general) is considered.

PDI Kettle - How to Normalize Advanced Structure?

I have 7 columns of data in a MySQL Database. The Year1 column belongs to the Revenue1 column. The following columns have the same structure. I know how to handle this in SQL, but not in PDI. Can anyone describe how to do it?
mySQL table structure
+--------+-------+-------+-------+----------+----------+----------+
| Ticker | Year1 | Year2 | Year3 | Revenue1 | Revenue2 | Revenue3 |
+--------+-------+-------+-------+----------+----------+----------+
| ABC    | 2010  | 2011  | 2012  | 250000   | 500000   | 1000000  |
+--------+-------+-------+-------+----------+----------+----------+
Desired normalized output from PDI:
+--------+------+-----------+---------+
| Ticker | Year | Keyfigure | Value   |
+--------+------+-----------+---------+
| ABC    | 2010 | Revenue   | 250000  |
| ABC    | 2011 | Revenue   | 500000  |
| ABC    | 2012 | Revenue   | 1000000 |
+--------+------+-----------+---------+
Have you tried using the row denormaliser?