Google Data Studio incorrect calculated metrics

I am creating calculated metrics in Data Studio and I am having trouble with the results.
Metric 1 uses this formula:
COUNT_DISTINCT(CASE WHEN ( Event Category = "ABC" AND Event Action = "XXX" AND Event Label = "123" ) THEN ga clientId (user) ELSE " " END )
[[To count the events with distinct clientIds]]
Metric 2 uses this formula:
COUNT_DISTINCT(CASE WHEN ( Event Category = "ABC" AND Event Action = "YYY" AND Event Label = "456" ) THEN ga clientId (user) ELSE " " END )
[[To count the events with distinct clientIds]]
Metric 3 uses this formula:
COUNT_DISTINCT(CASE WHEN ( Event Category = "ABC" AND Event Action = "ZZZ" AND Event Label = "789" ) THEN userId(user) ELSE " " END )
[[To count the events with distinct userIds]]
The formulas work fine and when I do Metric 2/ Metric 1 the number is correct for a one day time span. When I do Metric 3/Metric 2 the number is wrong. Why is this? It doesn't make sense to me since they are both numerical values.
Also, when I increase the date range the Metric 2 / Metric 1 is incorrect too! Any ideas why these are not working?

If you are aggregating over more than a certain amount of data, then these calculations will not be exact; they will be approximations. That would be consistent with the ratios going wrong once you widen the date range.

I have noticed that Google Data Studio is more accurate with data properly loaded into BigQuery rather than data loaded through something else like a PostgreSQL connector. Otherwise, APPROX_COUNT_DISTINCT may be used.
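Separately from approximation, it may be worth checking the ELSE " " branch in the formulas from the question. Assuming COUNT_DISTINCT treats the blank string like any other non-null value, every metric picks up one extra distinct value whenever at least one row fails the condition, which skews ratios between metrics. The Python sketch below only emulates that behaviour on made-up rows; the row data and the helper names `count_distinct` and `metric` are hypothetical, and it is not Data Studio code.

```python
# Hypothetical event rows: (category, action, label, client_id)
rows = [
    ("ABC", "XXX", "123", "c1"),
    ("ABC", "XXX", "123", "c2"),
    ("ABC", "YYY", "456", "c1"),
    ("DEF", "QQQ", "000", "c3"),
]


def count_distinct(values):
    """Count distinct non-null values, as a COUNT DISTINCT aggregate would."""
    return len({v for v in values if v is not None})


def metric(rows, action, label, placeholder):
    """Emulate COUNT_DISTINCT(CASE WHEN ... THEN client_id ELSE placeholder END)."""
    return count_distinct(
        cid if (cat == "ABC" and act == action and lab == label) else placeholder
        for cat, act, lab, cid in rows
    )


# With " " as the ELSE value, the blank string is counted as one more "client".
metric1_blank = metric(rows, "XXX", "123", " ")
metric2_blank = metric(rows, "YYY", "456", " ")

# With a null ELSE value, only real client ids are counted.
metric1_null = metric(rows, "XXX", "123", None)
metric2_null = metric(rows, "YYY", "456", None)

print(metric2_blank / metric1_blank, metric2_null / metric1_null)
```

If this applies to your data, replacing ELSE " " with ELSE NULL (or dropping the ELSE branch) in the calculated fields would be the thing to try.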

Related

SQL select column group by where the ratio of a value is 1

I am using PSQL.
I have a table with a few columns. One column, event, can have 4 different values: X1, X2, Y1, Y2. Another column holds the name of the service, and I want to group by that column.
My goal is to write a query that, for a specific service name, verifies that count(X1) == count(X2) and, if not, displays a new column with "error".
Is this even possible? I am kind of new to SQL and not sure how to write this.
So far I tried something like this
select
    service_name, event, count(service_name)
from
    service_table st
group by
    (service_name, event);
I am getting the count of each event for a specific service_name, but I would like to verify that the count of event 1 == the count of event 2 for each service_name.
I want to add that each service_name has a choice of only 2 different events.
You may not need a subquery/CTE for this, but it will work (and makes the logic easier to follow):
WITH event_counts_by_service AS (
    SELECT
        service_name
        , COUNT(CASE WHEN event='X1' THEN 1 END) AS count_x1
        , COUNT(CASE WHEN event='X2' THEN 1 END) AS count_x2
    FROM service_table
    GROUP BY service_name
)
SELECT service_name
    , CASE WHEN count_x1=count_x2 THEN NULL ELSE 'Error' END AS are_counts_equal
FROM event_counts_by_service
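To try the conditional-count approach without a database server, here is a minimal sketch that runs the same query shape against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the service names svc_a and svc_b are made up, and SQLite stands in for PostgreSQL here, so treat it as an illustration of the logic rather than a test of your setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service_table (service_name TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO service_table VALUES (?, ?)",
    [
        ("svc_a", "X1"), ("svc_a", "X2"),                   # balanced
        ("svc_b", "X1"), ("svc_b", "X1"), ("svc_b", "X2"),  # one extra X1
    ],
)

query = """
WITH event_counts_by_service AS (
    SELECT service_name,
           COUNT(CASE WHEN event = 'X1' THEN 1 END) AS count_x1,
           COUNT(CASE WHEN event = 'X2' THEN 1 END) AS count_x2
    FROM service_table
    GROUP BY service_name
)
SELECT service_name,
       CASE WHEN count_x1 = count_x2 THEN NULL ELSE 'Error' END AS are_counts_equal
FROM event_counts_by_service
ORDER BY service_name
"""

# COUNT ignores the NULLs produced when the CASE has no matching branch,
# so each column counts only the rows for its own event.
results = conn.execute(query).fetchall()
print(results)
```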

Complex row manipulation based on column value in SQL or Power Query

I have a call dataset. Looks like this
If a call about a certain member comes in within 30 days of an "original call", that call is considered a callback. I need some logic or Power Query magic to handle this dataset using this logic. So the end result should look like this
Right now, I have the table left joined to itself which gives me every possible combination. I thought I could do something with that but it's proven difficult and when I have over 2 million unique case keys, the duplicates kill run time and overload memory. Any suggestions? I'd prefer to do the manipulation in Power Query editor but can do it in SQL. Plz and thank you.
I think you can do this in Power Query, but I have no idea how it will run with two million records.
It may be able to be sped up with judicious use of the Table.Buffer function. But give it a try as written first.
The code should be reasonably self-documenting:
Group by Member ID.
For each Member ID, create a table from a list of records which is created using the stated logic.
Expand the tables.
Mark the rows to be deleted by shifting up the Datediff column by one and applying appropriate logic to the Datediff and shifted columns.
The code assumes that the dates for each Member ID are in ascending order. If not, an extra sorting step would need to be added.
Try this M code. (Change the Source line to be congruent with your own data source).
Edit:
Code edited to allow for multiple call backs from an initial call
let
    //Change next line to be congruent with your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"Case key", type text}, {"Member ID", Int64.Type}, {"Call Date", type date}}),
    //Group by Member ID
    // then create tables with call back date using the stated logic
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Member ID"}, {
        {"Call Backs",(t)=>Table.FromRecords(
            List.Generate(
                ()=>[ck=t[Case key]{0}, cd=t[Call Date]{0}, cb = null, df=null, idx=0],
                each [idx] < Table.RowCount(t),
                each [ck=if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30
                            then [ck] else t[Case key]{[idx]+1},
                    cd=if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30
                            then [cd] else t[Call Date]{[idx]+1},
                    cb = if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30
                            then t[Call Date]{[idx]+1} else null,
                    df = if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30
                            then Duration.Days(t[Call Date]{[idx]+1} - [cd]) else null,
                    idx = [idx]+1],
                each Record.FromList({[ck],[cd],[cb],[df]},{"Case key","Call Date","Call Back Date", "Datediff"}))
        )}
    }),
    #"Expanded Call Backs" = Table.ExpandTableColumn(#"Grouped Rows", "Call Backs",
        {"Case key", "Call Date", "Call Back Date", "Datediff"},
        {"Case key", "Call Date", "Call Back Date", "Datediff"}),
    #"Shifted Datediff" = Table.FromColumns(
        Table.ToColumns(#"Expanded Call Backs") & {
            List.RemoveFirstN(#"Expanded Call Backs"[Datediff]) & {null}},
        type table[Member ID=Int64.Type, Case key=text, Call Date=date, Call Back Date=date, Datediff=Int64.Type, shifted=Int64.Type ]),
    #"Filter" = Table.SelectRows(#"Shifted Datediff", each [shifted]=null or [Datediff]<>null),
    #"Removed Columns" = Table.RemoveColumns(Filter,{"shifted"})
in
    #"Removed Columns"
Example with multiple callbacks
I think you can do this with the LEAD function.
Here is the fiddle: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=f7cabdbe4d1193e5f0da6bd6a4571b96
select
    a.*,
    LEAD(CallDate, 1) OVER (
        Partition by memberId
        ORDER BY
            CallDate
    ) AS "CallbackDate",
    LEAD(CallDate, 1) OVER (
        Partition by memberId
        ORDER BY
            CallDate
    ) - a.calldate AS DateDiff
from
    mytable a
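For anyone who wants to check the callback rule on a small sample before running it on millions of rows, here is a rough Python sketch of one reading of the logic: within each member, the first call is the "original call", later calls less than 30 days after it are its callbacks, and the first call outside that window starts a new original call. The function name `find_callbacks`, the tuple layout, the sample data, and the strict "< 30 days" threshold are all assumptions made for illustration.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def find_callbacks(calls, window_days=30):
    """calls: iterable of (case_key, member_id, call_date) tuples.

    Returns rows of (member_id, case_key, call_date, callback_date, datediff).
    An original call with no callbacks gets a single row with None values.
    """
    result = []
    ordered = sorted(calls, key=itemgetter(1, 2))  # by member, then date
    for member_id, group in groupby(ordered, key=itemgetter(1)):
        anchor = None          # (case_key, call_date) of the current original call
        has_callback = False
        for case_key, _, call_date in group:
            if anchor is not None and (call_date - anchor[1]).days < window_days:
                # Within the window: record it as a callback of the anchor.
                diff = (call_date - anchor[1]).days
                result.append((member_id, anchor[0], anchor[1], call_date, diff))
                has_callback = True
            else:
                # Outside the window (or first call): close the previous anchor.
                if anchor is not None and not has_callback:
                    result.append((member_id, anchor[0], anchor[1], None, None))
                anchor = (case_key, call_date)
                has_callback = False
        if anchor is not None and not has_callback:
            result.append((member_id, anchor[0], anchor[1], None, None))
    return result


calls = [
    ("A", 1, date(2020, 1, 1)),
    ("B", 1, date(2020, 1, 10)),
    ("C", 1, date(2020, 1, 20)),
    ("D", 1, date(2020, 3, 1)),
    ("E", 2, date(2020, 1, 5)),
]
rows = find_callbacks(calls)
for row in rows:
    print(row)
```

This makes a single pass per member after sorting, so it avoids the self-join blow-up described in the question.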

Find changes made in PGSQL table using SQL in Excel VBA

Is it possible to create an SQL query to compare a field within a single table to see if a change has been made and if possible list the before and after?
I have the following SQL query written in Excel 2010 VBA, which connects to an Oracle PostgreSQL database:
Dim au As String
au = "SELECT id, priority, flag, code " _
    & "FROM hist WHERE ( aud_dt >= '18/05/2020' AND aud_dt <='18/05/2020' )"
Set rs = conn.Execute(au)
With ActiveSheet.QueryTables.Add(Connection:=rs, Destination:=Range("A1"))
    .Refresh
End With
Where fields include:
"priority" is the field that I'd like to check for changes, which will be a single number between 0-9
"code" is the record that has been assigned the priority and is a mixture of numbers and letters up to 7 characters
"flag" shows a 1 as the active record, and 2 as an edited record
"id" refers to the user account
I'd ideally like to end up with something like: id | priority | flag | priority_old | flag_old | code
This should show the before and after changes to the priority. If a record shows priority=3, flag=2 and code=Ab12, there must also be a record with a 1 flag, as that is now the active record. If the active record has the same priority number for that code, I'm not interested in it: it just means something else was changed instead, as I have not listed all the column fields.
If the active record now shows priority=4, flag=1 and code=Ab12, that would be exactly the record I need to see.
Consider a self-join query (possibly you need to adjust date filter in WHERE depending on when items change):
SELECT h1.id, h1.priority, h1.flag,
       h2.priority AS priority_old, h2.flag AS flag_old, h1.code
FROM hist h1
LEFT JOIN hist h2
    ON h1.code = h2.code
    AND h1.priority <> h2.priority
    AND h1.flag = 1
    AND h2.flag <> 1
-- aud_dt exists on both sides of the self-join, so it has to be qualified
WHERE (h1.aud_dt >= '2020-05-18' AND h1.aud_dt <= '2020-05-18')
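As a quick way to see what a self-join of this kind returns, here is a small sketch against an in-memory SQLite database via Python's sqlite3 module. The sample rows, the user ids u1 and u2, and the text-typed aud_dt column are invented for the demo, and SQLite stands in for PostgreSQL. It is also a variation rather than the exact query above: it uses an inner join and moves the h1.flag = 1 test into WHERE, so that only active records whose priority actually changed come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hist (id TEXT, priority INTEGER, flag INTEGER, code TEXT, aud_dt TEXT)"
)
conn.executemany(
    "INSERT INTO hist VALUES (?, ?, ?, ?, ?)",
    [
        ("u1", 4, 1, "Ab12", "2020-05-18"),  # active record, priority changed
        ("u1", 3, 2, "Ab12", "2020-05-18"),  # edited (old) record
        ("u2", 5, 1, "Cd34", "2020-05-18"),  # active record, priority unchanged
        ("u2", 5, 2, "Cd34", "2020-05-18"),  # edited record, same priority
    ],
)

query = """
SELECT h1.id, h1.priority, h1.flag,
       h2.priority AS priority_old, h2.flag AS flag_old, h1.code
FROM hist h1
JOIN hist h2
  ON h1.code = h2.code
 AND h1.priority <> h2.priority
 AND h2.flag <> 1
WHERE h1.flag = 1
  AND h1.aud_dt BETWEEN ? AND ?
"""

# Only Ab12 has an old record with a different priority, so only it is returned.
changes = conn.execute(query, ("2020-05-18", "2020-05-18")).fetchall()
print(changes)
```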

How to sum consecutive rows in Power Query

In Power Query I have a column "% sum of all". I need to create a custom column "Sum Consecutive" where each row's value is the "% sum of all" of the current row plus the "Sum Consecutive" value of the previous row.
Current row situation
New Custom Column Expectation
You can see two images that show the current situation and the next situation I need in the Power Query.
Can you please help me find a code/command to create this new column like that?
Although there are similar solved questions in DAX, I still need to keep editing the file after that, so it should be performed in M language in power query.
Thank you!
Not sure how performant my approaches are. I would think both should be reasonably efficient as they only loop over each row in the table once (and "remember" the work done in the previous rows). However, maybe the conversion to records/list and then back to table is slow for large tables (I don't know).
Approach 1: Isolate the input column as a list, transform the list by cumulatively adding, put the transformed list back in the table as a new column.
let
    someTable = Table.FromColumns({List.Repeat({0.0093}, 7) & List.Repeat({0.0086}, 7) & {0.0068, 0.0068}}, {"% of sum of all"}),
    listToLoopOver = someTable[#"% of sum of all"],
    cumulativeSum = List.Accumulate(List.Positions(listToLoopOver), {}, (listState, currentIndex) =>
        let
            numberToAdd = listToLoopOver{currentIndex},
            sum = try listState{currentIndex - 1} + numberToAdd otherwise numberToAdd,
            append = listState & {sum}
        in
            append
    ),
    backToTable = Table.FromColumns(Table.ToColumns(someTable) & {cumulativeSum}, Table.ColumnNames(someTable) & {"Cumulative sum"})
in
    backToTable
Approach 2: Convert the table to a list of records, loop over each record and add a new field (representing the new column) to each record, then convert the transformed list of records back into a table.
let
    someTable = Table.FromColumns({List.Repeat({0.0093}, 7) & List.Repeat({0.0086}, 7) & {0.0068, 0.0068}}, {"% of sum of all"}),
    listToLoopOver = Table.ToRecords(someTable),
    cumulativeSum = List.Accumulate(List.Positions(listToLoopOver), {}, (listState, currentIndex) =>
        let
            numberToAdd = Record.Field(listToLoopOver{currentIndex}, "% of sum of all"),
            sum = try listState{currentIndex - 1}[Cumulative sum] + numberToAdd otherwise numberToAdd, // 'try' should only be necessary for first item
            recordToAdd = listToLoopOver{currentIndex} & [Cumulative sum = sum],
            append = listState & {recordToAdd}
        in
            append
    ),
    backToTable = Table.FromRecords(cumulativeSum)
in
    backToTable
I couldn't find a function in the reference for M/Power Query that sums a list cumulatively.
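For comparison, or as a way to check the M output on a small sample, the same running total can be sketched in Python with itertools.accumulate. The sample values mirror the test table used in the two approaches above; the variable names are only illustrative.

```python
from itertools import accumulate

# Same sample column as in the M examples above.
values = [0.0093] * 7 + [0.0086] * 7 + [0.0068, 0.0068]

# accumulate yields the running total: each item is the current value
# plus the running total of all previous rows.
cumulative = list(accumulate(values))

# Pair each original value with its running total, like the new column.
rows = [
    {"% of sum of all": value, "Cumulative sum": total}
    for value, total in zip(values, cumulative)
]

for row in rows:
    print(row)
```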

Add Query with Month filter to an Access Database with Date/Time Field

I have an Access Database with many fields connected through a datagridview in my vb.net project. Two of these fields contain Date/Time Values. I want to create a query through the query builder that uses input from the user to find records that match the dates the user wants. This "where clause" works :
WHERE BETWEEN ? AND ?
This creates a toolstrip in which I can input 2 dates so that the query can fill the datagridview with the records.
What I want now is a query like the one above, except that this time the user inputs the name of the month he wants (e.g. February or 02). Is there any way to do that?
EDIT: Tried using Plutonix's code (Sailing is the Column name) :
WHERE SAILING BETWEEN #01/31/yyyy# AND #03/01/yyyy#
and I got this error: "Cannot convert entry to valid datetime TO_DATE function might be required"
EDIT 2: I have created a combobox containing all 12 months and I have a commandbutton. I want to find a way so that if the user selects one of the 12 months from that combobox and clicks the commandbutton, to have the datagridview control (access database) show him only the records that go with that month based on their datetime(Short Date) fields. What code should I put in my commandbutton_click ?
Eventually I created a long query where I added a "Between ? and ?" in my where clause for every Year included in the database (2004-2015), created a huge if clause that gives the beginnings and endings of the month requested for every year and used all those strings to query the database.
If Month.Text = "JANUARY" Then
A04A = "01/01/04"
A04B = "31/01/04"
A05A = "01/01/05"
A05B = "31/01/05"
A06A = "01/01/06"
A06B = "31/01/06"
A07A = "01/01/07"
A07B = "31/01/07"
A08A = "01/01/08"
A08B = "31/01/08"
A09A = "01/01/09"
A09B = "31/01/09"
A10A = "01/01/10"
A10B = "31/01/10"
A11A = "01/01/11"
A11B = "31/01/11"
A12A = "01/01/12"
A12B = "31/01/12"
A13A = "01/01/13"
A13B = "31/01/13"
A14A = "01/01/14"
A14B = "31/01/14"
A15A = "01/01/15"
A15B = "31/01/15"
A16A = "01/01/16"
A16B = "31/01/16"
ElseIf etc etc
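The per-year start and end dates above do not have to be typed out by hand; they can be computed from the month and the range of years. The following is only a sketch of that idea in Python, not VB.NET code for the project: the function name `month_ranges`, the MONTHS list, and the 2004-2015 default range (taken from the description above) are assumptions, and the resulting dates would still need to be passed to the "Between ? and ?" parameters.

```python
import calendar
from datetime import date

# Month names as they might appear in the combobox (assumed upper case).
MONTHS = [
    "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
    "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER",
]


def month_ranges(month_name, first_year=2004, last_year=2015):
    """Return (first_day, last_day) of the given month for every year in range."""
    month = MONTHS.index(month_name.upper()) + 1
    ranges = []
    for year in range(first_year, last_year + 1):
        # monthrange returns (weekday of day 1, number of days in the month),
        # which takes care of leap years for February.
        last_day = calendar.monthrange(year, month)[1]
        ranges.append((date(year, month, 1), date(year, month, last_day)))
    return ranges


ranges = month_ranges("FEBRUARY")
for start, end in ranges:
    print(start.strftime("%d/%m/%y"), end.strftime("%d/%m/%y"))
```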