How to sum consecutive rows in Power Query

In Power Query I have a column "% sum of all". I need to create a custom column "Sum Consecutive" in which each row's value is the current row's "% sum of all" plus the previous row's "Sum Consecutive".
Two images, "Current row situation" and "New Custom Column Expectation", show the current state and the new column I need in Power Query.
Can you help me find the code/command to create this new column?
Although there are similar solved questions in DAX, I still need to keep editing the file afterwards, so this should be done in the M language in Power Query.
Thank you!

Not sure how performant my approaches are. I would think both should be reasonably efficient as they only loop over each row in the table once (and "remember" the work done in the previous rows). However, maybe the conversion to records/list and then back to table is slow for large tables (I don't know).
Approach 1: Isolate the input column as a list, transform the list by cumulatively adding, put the transformed list back in the table as a new column.
let
    someTable = Table.FromColumns(
        {List.Repeat({0.0093}, 7) & List.Repeat({0.0086}, 7) & {0.0068, 0.0068}},
        {"% of sum of all"}
    ),
    listToLoopOver = someTable[#"% of sum of all"],
    // Build the running total one element at a time; 'try' covers the first
    // row, where there is no previous accumulated value to read
    cumulativeSum = List.Accumulate(
        List.Positions(listToLoopOver),
        {},
        (listState, currentIndex) =>
            let
                numberToAdd = listToLoopOver{currentIndex},
                sum = try listState{currentIndex - 1} + numberToAdd otherwise numberToAdd,
                append = listState & {sum}
            in
                append
    ),
    backToTable = Table.FromColumns(
        Table.ToColumns(someTable) & {cumulativeSum},
        Table.ColumnNames(someTable) & {"Cumulative sum"}
    )
in
    backToTable
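If the conversion to a list is what ends up slow on a large table, buffering the extracted column before looping might help. This is just a sketch of that idea (treat List.Buffer here as an assumption, not a measured fix):

// Hypothetical variant of the listToLoopOver step above; the rest of the
// query is unchanged. List.Buffer pins the list in memory so the positional
// lookups inside List.Accumulate don't re-evaluate the upstream steps.
listToLoopOver = List.Buffer(someTable[#"% of sum of all"]),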
Approach 2: Convert the table to a list of records, loop over each record and add a new field (representing the new column) to each record, then convert the transformed list of records back into a table.
let
    someTable = Table.FromColumns(
        {List.Repeat({0.0093}, 7) & List.Repeat({0.0086}, 7) & {0.0068, 0.0068}},
        {"% of sum of all"}
    ),
    listToLoopOver = Table.ToRecords(someTable),
    cumulativeSum = List.Accumulate(
        List.Positions(listToLoopOver),
        {},
        (listState, currentIndex) =>
            let
                numberToAdd = Record.Field(listToLoopOver{currentIndex}, "% of sum of all"),
                sum = try listState{currentIndex - 1}[Cumulative sum] + numberToAdd otherwise numberToAdd, // 'try' should only be necessary for the first item
                recordToAdd = listToLoopOver{currentIndex} & [Cumulative sum = sum],
                append = listState & {recordToAdd}
            in
                append
    ),
    backToTable = Table.FromRecords(cumulativeSum)
in
    backToTable
I couldn't find a function in the reference for M/Power Query that sums a list cumulatively.
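For what it's worth, a running sum can also be produced with List.Generate rather than List.Accumulate. The following is a minimal sketch of that pattern on a hard-coded list (my own illustration, not taken from the reference):

let
    values = {0.0093, 0.0093, 0.0086, 0.0068},
    runningSums = List.Generate(
        // Seed: start at position 0 with the first value as the running total
        () => [position = 0, total = values{0}],
        // Keep producing output while the position is inside the list
        each [position] < List.Count(values),
        // Advance one position and add the next value to the total; the guard
        // keeps the last step from indexing past the end of the list
        each [
            position = [position] + 1,
            total = [total] + (if [position] + 1 < List.Count(values) then values{[position] + 1} else 0)
        ],
        // Emit just the running total from each state
        each [total]
    )
in
    runningSums // {0.0093, 0.0186, 0.0272, 0.034}

The same idea could be spliced into either approach above in place of List.Accumulate.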


Complex row manipulation based on column value in SQL or Power Query

I have a call dataset. It looks like this:
If a call about a certain member comes in within 30 days of an "original call", that call is considered a callback. I need some logic or Power Query magic to handle this dataset using that logic. The end result should look like this:
Right now, I have the table left joined to itself, which gives me every possible combination. I thought I could do something with that, but it's proven difficult, and with over 2 million unique case keys, the duplicates kill run time and overload memory. Any suggestions? I'd prefer to do the manipulation in the Power Query editor but can do it in SQL. Please and thank you.
I think you can do this in Power Query, but I have no idea how it will run with two million records.
It may be able to be sped up with judicious use of the Table.Buffer function. But give it a try as written first.
The code should be reasonably self-documenting:
- Group by Member ID.
- For each Member ID, create a table from a list of records, generated using the stated logic.
- Expand the tables.
- Mark the rows to be deleted by shifting the Datediff column up by one and applying appropriate logic to the Datediff and shifted columns.
The code assumes that the dates for each Member ID are in ascending order; if not, an extra sorting step would need to be added.
Try this M code. (Change the Source line to be congruent with your own data source).
Edit:
Code edited to allow for multiple call backs from an initial call
let
    //Change next line to be congruent with your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {
        {"Case key", type text}, {"Member ID", Int64.Type}, {"Call Date", type date}}),

    //Group by Member ID,
    //then create tables with the call back dates using the stated logic
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Member ID"}, {
        {"Call Backs", (t) => Table.FromRecords(
            List.Generate(
                () => [ck = t[Case key]{0}, cd = t[Call Date]{0}, cb = null, df = null, idx = 0],
                each [idx] < Table.RowCount(t),
                each [
                    ck = if Duration.Days(t[Call Date]{[idx] + 1} - [cd]) < 30
                         then [ck] else t[Case key]{[idx] + 1},
                    cd = if Duration.Days(t[Call Date]{[idx] + 1} - [cd]) < 30
                         then [cd] else t[Call Date]{[idx] + 1},
                    cb = if Duration.Days(t[Call Date]{[idx] + 1} - [cd]) < 30
                         then t[Call Date]{[idx] + 1} else null,
                    df = if Duration.Days(t[Call Date]{[idx] + 1} - [cd]) < 30
                         then Duration.Days(t[Call Date]{[idx] + 1} - [cd]) else null,
                    idx = [idx] + 1],
                each Record.FromList({[ck], [cd], [cb], [df]},
                    {"Case key", "Call Date", "Call Back Date", "Datediff"})))}
        }),
    #"Expanded Call Backs" = Table.ExpandTableColumn(#"Grouped Rows", "Call Backs",
        {"Case key", "Call Date", "Call Back Date", "Datediff"},
        {"Case key", "Call Date", "Call Back Date", "Datediff"}),

    //Shift the Datediff column up one row (appending a null at the end)
    //so each row can see the next row's Datediff
    #"Shifted Datediff" = Table.FromColumns(
        Table.ToColumns(#"Expanded Call Backs") & {
            List.RemoveFirstN(#"Expanded Call Backs"[Datediff], 1) & {null}},
        type table[Member ID = Int64.Type, Case key = text, Call Date = date,
            Call Back Date = date, Datediff = Int64.Type, shifted = Int64.Type]),

    //A row with no Datediff whose next row has one is an "original call" row
    //superseded by its call back rows; drop it and keep everything else
    #"Filter" = Table.SelectRows(#"Shifted Datediff", each [shifted] = null or [Datediff] <> null),
    #"Removed Columns" = Table.RemoveColumns(#"Filter", {"shifted"})
in
    #"Removed Columns"
Example with multiple callbacks
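As for the Table.Buffer suggestion above, here is a sketch of one place it might be applied (an assumption on my part, not benchmarked). Buffering each grouped subtable means the repeated positional lookups such as t[Call Date]{[idx]+1} inside List.Generate read from an in-memory copy instead of re-evaluating the group:

#"Grouped Rows" = Table.Group(#"Changed Type", {"Member ID"}, {
    {"Call Backs", (t) =>
        let
            buffered = Table.Buffer(t) // hypothetical tweak: pin the group's rows in memory
        in
            Table.FromRecords(
                List.Generate(
                    () => [ck = buffered[Case key]{0}, cd = buffered[Call Date]{0}, cb = null, df = null, idx = 0],
                    each [idx] < Table.RowCount(buffered),
                    each [
                        ck = if Duration.Days(buffered[Call Date]{[idx] + 1} - [cd]) < 30
                             then [ck] else buffered[Case key]{[idx] + 1},
                        cd = if Duration.Days(buffered[Call Date]{[idx] + 1} - [cd]) < 30
                             then [cd] else buffered[Call Date]{[idx] + 1},
                        cb = if Duration.Days(buffered[Call Date]{[idx] + 1} - [cd]) < 30
                             then buffered[Call Date]{[idx] + 1} else null,
                        df = if Duration.Days(buffered[Call Date]{[idx] + 1} - [cd]) < 30
                             then Duration.Days(buffered[Call Date]{[idx] + 1} - [cd]) else null,
                        idx = [idx] + 1],
                    each Record.FromList({[ck], [cd], [cb], [df]},
                        {"Case key", "Call Date", "Call Back Date", "Datediff"})))}
    }),

This step would replace the #"Grouped Rows" step in the query above; the rest is unchanged.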
I think you can do this with the LEAD function.
Here is the fiddle: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=f7cabdbe4d1193e5f0da6bd6a4571b96
SELECT
    a.*,
    LEAD(CallDate, 1) OVER (
        PARTITION BY memberId
        ORDER BY CallDate
    ) AS "CallbackDate",
    LEAD(CallDate, 1) OVER (
        PARTITION BY memberId
        ORDER BY CallDate
    ) - a.calldate AS DateDiff
FROM
    mytable a

Get the item with the highest count

Can you please help me to get the item with the highest count using DAX?
Measure = FIRSTNONBLANK('Table1'[ItemName],CALCULATE(COUNT('Table2'[Instance])))
This shows the first ItemName in the table but doesn't get the ItemName with the highest count.
Thanks
Well, it's more complicated than I would have wanted, but here's what I came up with.
There are a couple of things you're hoping to do that are not so straightforward in DAX. First, you want an aggregated aggregation ;) -- in this case, the max of a count. Second, you want to use a value from one column that you identify by what's in another column. That's row-based thinking, and DAX prefers column-based thinking.
So, to do the aggregate of aggregates, we just have to slog through it. SUMMARIZE gives us counts of items. The MAX and RANKX functions could help us find the biggest count, but wouldn't be so useful for getting the ItemName. TOPN gives us the whole row where our count is the biggest.
But now we need to get our ItemName out of that row, so SELECTCOLUMNS lets us pick the field to work with. Finally, we really want a value, not a 1-column, 1-row table, so FIRSTNONBLANK finishes the job.
Hope it helps.
Here's my DAX
MostFrequentItem =
VAR SummaryTable = SUMMARIZE ( 'Table', 'Table'[ItemName], "CountsByItem", COUNT ( 'Table'[ItemName] ) )
VAR TopSummaryItemRow = TOPN(1, SummaryTable, [CountsByItem], DESC)
VAR TopItem = SELECTCOLUMNS (TopSummaryItemRow, "TopItemName", [ItemName])
RETURN FIRSTNONBLANK (TopItem, [TopItemName])
Here's the DAX without using variables (not tested, sorry; it should be close):
MostFrequentItem_2 =
FIRSTNONBLANK (
SELECTCOLUMNS (
TOPN (
1,
SUMMARIZE ( 'Table', 'Table'[ItemName], "Count", COUNT ( 'Table'[ItemName] ) ),
[Count], DESC
),
"ItemName", [ItemName]
),
[ItemName]
)
Here's the mock data:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcipNSspJTS/NVYrVIZ/nnFmUnJOKznRJzSlJxMlyzi9PSs3JAbODElMyizNQmLEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Stuff = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Stuff", type text}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type", {{"Stuff", "ItemName"}})
in
    #"Renamed Columns"

using criteria in an update query involving a join

I'm using MS Access.
The SQL below updates the CurrNumTees field in the parent tblContact records with the number of tblTorTee records that have an end date (which is not the ultimate effect I am aiming for, but I provide it as a starting point).
UPDATE tblContact
INNER JOIN tblTorTee ON tblContact.ContactId = tblTorTee.TorId
SET tblContact!CurNumTees = DCount("[tblTorTee.EndDate]",
"tbltortee","Torid = " & [ContactId]);
I need to update the CurrNumTees field with the number of records in tblTorTee that do not have an EndDate, in other words, that field is blank. I’ve tried using WHERE and HAVING and IS NULL in various combinations and locations, but without success. Could you help point me in the right direction?
The MS Access COUNT function does not count nulls, so I think you have to do this in two stages.
Firstly create a query like this:
SELECT TorId, IIF(ISNULL(EndDate),1,0) AS isN
FROM tblTorTee
WHERE EndDate IS NULL;
And save it as QryEndDateNull
Now you can run an Update Query like this:
UPDATE tblContact
SET tblContact.CurNumTees = DSUM("IsN","QryEndDateNull","TorId = " & [ContactID]);
Saving calculated data (data dependent on other data) is usually a bad design, especially aggregate data. You should just calculate it when needed.
Did you try the IS NULL criteria within the DCount()?
UPDATE tblContact
SET CurNumTees = DCount("*", "tblTorTee", "EndDate Is Null AND TorId = " & [ContactId]);

MS Access/Access-SQL: Switch Function to Calculate Distances based on Criteria

I am somewhat new to MS Access/SQL and Stack Overflow, so bear with me. I have a problem that I can't seem to figure out. I had posted this before, but got no responses, and editing my original post did not help.
I have quantity and distance data for assets for each week. Not all weeks have quantities or distances (some have just one or the other). Here is a sample of the data for one asset (as CSV):
Asset,Week,Qty,Dist,Actual_Dist
2153,1,,125,
2153,2,,65,
2153,3,50.1,118,
2153,4,,123,
2153,5,96.6,91,
2153,6,,103,
2153,7,,120,
2153,8,,106,
2153,9,100.6,,
2153,13,96,,
2153,14,,102,
2153,15,,40,
2153,18,84.82,,
2153,21,97.8,,
2153,25,96.7,,
2153,28,31.27,63,
2153,29,77.5,,
What I want to be able to do is take a SUM of the "Dist" field until the row has a corresponding "Qty" field value, and this would be calculated in the "Actual_Dist" field.
Looking at my example, Weeks 1 and 2 have no Qty values, however in Week 3, Qty is 50.1 and I would want the "Actual_Dist" calculated as the sum of "Dist" from Weeks 1-3 inclusive. So essentially, Row 3's "Actual_Dist" would be SUM(125+65+118).
Right now I see 3 cases:
Case 1: as above, if no Qty, but has a Dist, then sum the distances until the next Qty value.
Case 2: If Qty exists, but no Dist, then disregard
Case 3: If Qty has a value, and Dist has a value, and Qty has previous values before it (i.e., Week 28), then "Actual_Dist" = Dist
So I was thinking of doing a select switch to cover the two main cases (1 & 3):
SELECT Asset, Week, Qty, Dist, Switch(Qty LIKE 'NULL' AND Dist <> 'NULL', SUM(Dist), Qty <> 'NULL' AND Dist <> 'NULL', Dist) AS Actual_Dist
Not sure if my Switch is done right, but I think you get the picture.
Now my issue comes in with the SUM function above. How do I get it to sum properly and take the distance values before a Qty value is present (Case 2)? I apologize if the formatting or presentation of the sample data is poor; I just signed up. I will likely have to clarify some points, so let me know and I will clarify as necessary.
It is also important to note that this is just one asset, and there are many. For the sum function above, I need it to be able to sum the records above for ANY given number of records.
Hope someone can help.
Thanks
EDIT: @Cha posted the following:
SELECT Data.Asset, Data.Week, Data.Qty, Data.Dist,
    Switch(
        Not IsNull([Qty]) And Not IsNull([Dist]), [Dist],
        Not IsNull([Qty]),
            Nz(DSum("Dist", "Data",
                "Asset=" & CStr([Asset]) & " And Week <= " & CStr([Week]) &
                " And Week > " & CStr(Nz(DMin("Week", "Data",
                    "Asset=" & CStr([Asset]) & " And Week < " & CStr([Week]) &
                    " And Not IsNULL(Qty)"), 0))), 0)
    ) AS Actual_Dist
FROM Data;
This code gave me errors due to data mismatch, so I changed all the data types to "Number" and modified the code as follows:
SELECT Data.Asset, Data.Week, Data.Qty, Data.Dist,
    Switch(
        Not IsNull([Qty]),
            Nz(DSum("Dist", "[Data]",
                "Asset=" & [Asset] & " And Week <= " & [Week] &
                " And Week > " & Nz(DMin("Week", "[Data]",
                    "Asset=" & [Asset] & " And Week < " & [Week] &
                    " And Not IsNULL([Qty])"), 0)), 0)
    ) AS Actual_Dist
FROM Data;
The above code now satisfies Case 1, but only for Rows 3 and 5. It does not cover Rows 9 and 13, and it needs to apply there too. I believe the issue with those rows is that "Dist" is NULL.
There is another issue: Case 1 and Case 3 occasionally overwrite each other (when both Qty and Dist are not NULL). Is there a way to create one switch to run Case 1, and another (with the same code) to apply Case 3 but not Case 1?
Any help would be much appreciated!
Thanks
If you can use VBA, then try this:
Public Sub GetActual_Dist()
    Const TABLE_NAME As String = "tblAssets"
    Dim sTemp_Dist As Single
    'Skip rows that have a Qty but no Dist (Case 2); order by Week so the
    'running total accumulates in the intended sequence
    With CurrentDb.OpenRecordset( _
        "SELECT * FROM [" & TABLE_NAME & "] " & _
        "WHERE NOT ([Qty] Is Not Null AND [Dist] Is Null) " & _
        "ORDER BY [Asset], [Week];")
        If .EOF And .BOF Then
            MsgBox "No data"
            Exit Sub
        End If
        Do Until .EOF
            If IsNull(![Qty]) Then
                'No Qty yet: keep accumulating the distance
                sTemp_Dist = sTemp_Dist + Nz(![Dist], 0)
            Else
                'Qty present: write the accumulated distance plus this row's
                'own Dist (Week 3 in the example is 125 + 65 + 118), then reset
                .Edit
                ![Actual_Dist] = sTemp_Dist + Nz(![Dist], 0)
                .Update
                sTemp_Dist = 0
            End If
            .MoveNext
        Loop
    End With
End Sub
Change the TABLE_NAME to the one in your database.

Creating filter with SQL queries

I am trying to create a filter with SQL queries but am having trouble with numeric values linking to other tables.
Every time I try to link to another table, it takes the same record and repeats it for every element in the other table.
For example, here is the query:
SELECT ELEMENTS.RID, TAXONOMIES.SHORT_DESCRIPTION, [type], ELEMENT_NAME, ELEMENT_ID,
    SUBSTITUTION_GROUPS.DESCRIPTION, namespace_prefix, datatype_localname
FROM ELEMENTS, SUBSTITUTION_GROUPS, TAXONOMIES, SCHEMAS, DATA_TYPES
WHERE ELEMENTS.TAXONOMY_ID = TAXONOMIES.RID
    AND ELEMENTS.ELEMENT_SCHEMA_ID = SCHEMAS.RID
    AND ELEMENTS.DATA_TYPE_ID = DATA_TYPES.RID
    AND ELEMENTS.SUBSTITUTION_GROUP_ID = 0
The last line is the actual filtering criteria.
Here is an example result:
There should be only ONE result ("Item" has an RID of 0), but it repeats a copy of that one record for every row in the SUBSTITUTION_GROUPS table (there are 4).
Here is my database schema for reference. The lines indicate relationships between tables and the circles indicate the values I want:
You forgot the join between ELEMENTS and SUBSTITUTION_GROUPS in your query.
SELECT
    ELEMENTS.RID, TAXONOMIES.SHORT_DESCRIPTION, [type], ELEMENT_NAME, ELEMENT_ID,
    SUBSTITUTION_GROUPS.DESCRIPTION, namespace_prefix, datatype_localname
FROM
    ELEMENTS, SUBSTITUTION_GROUPS, TAXONOMIES, SCHEMAS, DATA_TYPES
WHERE
    ELEMENTS.TAXONOMY_ID = TAXONOMIES.RID
    AND ELEMENTS.ELEMENT_SCHEMA_ID = SCHEMAS.RID
    AND ELEMENTS.DATA_TYPE_ID = DATA_TYPES.RID
    AND ELEMENTS.SUBSTITUTION_GROUP_ID = SUBSTITUTION_GROUPS.RID
    AND ELEMENTS.SUBSTITUTION_GROUP_ID = 0