DAX: calculating a rolling average with AVERAGE

While I was setting up my report page in Power BI, I came across a weird issue. I was trying to calculate an average of some values which come together with their dates in tableA.
More precisely, tableA has a date field and a numeric field (feature), and there can be several values for the same date. In addition, the date field points to a date field in a common calendar table (calendarTable). I would like to calculate an average of feature, let's say the daily average.
To achieve this, I tried to create a new measure as follows:
Average = CALCULATE(
    AVERAGE('tableA'[feature]),
    USERELATIONSHIP('tableA'[date], 'calendarTable'[date]),
    GROUPBY(date, 'calendarTable'[date])
)
What I got is a 'cumulative' average instead of a daily average. In other words, for each date the set of values to be averaged grows, including all the previous values.
I've also performed the calculation in SQL with success (in DAX there is no need to refer to tableB, as I used a calculated column):
SELECT
    CAST(a.Date AS Date) AS Dates,
    AVG(DATEDIFF(MINUTE, b.Date, a.Date)) AS AVG_DURATION
FROM
    tableB AS b
INNER JOIN
    tableA AS a ON a.ID = b.ID
GROUP BY
    CAST(a.Date AS Date)
ORDER BY
    Dates ASC;
Does anyone have an idea how to get the same result in DAX as in SQL? I've already tried applying some filters on the dates, but with no luck.
Thanks.

Although there is some confusion in your question, the instructions below should help you achieve your requirement.
If I understand correctly, you are looking for a daily AVERAGE of your column feature. You can do this with a new calculated table, using the GROUPBY functionality in DAX. Click on New table, use the code below, and check whether you get your expected output:
group_by_date =
GROUPBY (
    your_table_name,
    your_table_name[date_column_name].[Date],
    "Average Feature", AVERAGEX ( CURRENTGROUP (), your_table_name[feature] )
)
Now, if you are looking for DAX that calculates the same (but redundant) result in each row, you can use the code below:
date_wise_average =
VAR current_row_date = MIN ( your_table_name[date_column_name].[Date] )
RETURN
    CALCULATE (
        AVERAGE ( your_table_name[feature] ),
        FILTER (
            ALL ( your_table_name ),
            your_table_name[date_column_name].[Date] = current_row_date
        )
    )
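Alternatively, if the date shown on the visual comes from calendarTable through the inactive relationship mentioned in the question, a plain measure without the invalid GROUPBY argument may be all that is needed (a sketch reusing the question's table and column names):

Average Feature =
CALCULATE (
    AVERAGE ( 'tableA'[feature] ),
    USERELATIONSHIP ( 'tableA'[date], 'calendarTable'[date] )
)

With 'calendarTable'[date] on the visual's axis, this measure is evaluated once per day rather than cumulatively.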


Fill in gaps in time series after a group by date in BigQuery

I have a big dataset with multiple time series of different products and quantities sold. After I grouped by Product_ID and Date (monthly), products that were not sold during a specific month leave gaps in the time series. How could I fill all these gaps with rows where the quantity sold is 0?
I found similar solutions but haven't been able to adapt them to my situation.
Your question is rather vague (best to provide example input and output in future), but here is my guess as to what you are trying to do.
You will probably want to create a calendar table with the intervals of your data. I will use daily intervals as the example, since you have not specified; you can adapt for other granularities as required.
CREATE OR REPLACE TABLE `project.my_dataset.daily_calendar` (
  MY_DATE DATE NOT NULL
);

INSERT INTO `project.my_dataset.daily_calendar`
SELECT
  MY_DATE
FROM
(
  SELECT DATE_ADD(DATE '2000-01-01', INTERVAL param DAY) AS MY_DATE
  FROM UNNEST(GENERATE_ARRAY(0, 10000, 1)) AS param
);
The above will create a calendar with daily intervals covering 10,000 days starting from the year 2000; adjust according to your requirements.
You will then want to cross join your products with the calendar data, which creates a record for every day and every product, and then pull in the quantities:
select
  cal.MY_DATE as date,
  prod.product_name,
  ifnull(d.quantity, 0) as quantity -- if no data then make the value 0
from `project.my_dataset.daily_calendar` as cal
cross join (select distinct product_name from `project.dataset.table`) as prod
left join `project.dataset.table` as d
  on d.product_name = prod.product_name and d.date = cal.MY_DATE
From your vague description, this could work. Ideally, provide example input and output in future to get a more accurate response.
Edit:
An alternate (arguably better) method to fill the calendar table, as suggested by @Cylldby:
INSERT INTO `project.my_dataset.daily_calendar`
SELECT
  MY_DATE
FROM
(
  SELECT param AS MY_DATE
  FROM UNNEST(GENERATE_DATE_ARRAY('2010-01-01', '2030-12-31')) AS param
);

How to join two tables in DAX using a custom condition

I have a Cartons table which contains two datetime columns: the entering warehouse date and the exiting warehouse date. For my report I need to calculate a table which shows how many cartons are in the warehouse at the end of each day. My idea was to get, for each date, the number of cartons whose entering date is lower than the current date and whose exiting date is higher than the current date. So I need to translate the following SQL into DAX:
SELECT d.date, COUNT(c.Id) AS 'Count of cartons'
FROM #dim d
INNER JOIN Inventory.Cartons c
    ON d.date BETWEEN c.EnteringWarehouseTime AND c.ExitingWarehouseTime
GROUP BY d.date
ORDER BY d.date
where #dim is a table with all dates.
But all joins in DAX can be performed only through relationships. I could make a cross join of these tables and filter the result, but that operation would take too much time. Do I have other options?
Actually, you can simulate a relationship with DAX. If I understand your question and the data model correctly, you want to count, for each given day, the cartons that are still in the warehouse, right? The formula below does the following: for each day in the Date table (VALUES('Date')), it calculates how many rows of the Cartons table pass some filtering (COUNTROWS('Cartons')). The filtering works like this: on the current value of the day (think of it as a foreach in C#), it counts the rows of the Cartons table whose entering date is lower than or equal to the current date, and whose exiting date is higher than or equal to the current date or is BLANK(), meaning the carton is still in the warehouse.
CALCULATETABLE (
    ADDCOLUMNS (
        VALUES ( 'Date' ),
        "Cartons",
        -- capture the currently iterated day before context transition
        VAR CurrentDate = 'Date'[Date]
        RETURN
            CALCULATE (
                COUNTROWS ( 'Cartons' ),
                FILTER (
                    'Cartons',
                    'Cartons'[EnteringWarehouseTime] <= CurrentDate
                ),
                FILTER (
                    'Cartons',
                    OR (
                        'Cartons'[ExitingWarehouseTime] >= CurrentDate,
                        ISBLANK ( 'Cartons'[ExitingWarehouseTime] )
                    )
                )
            )
    )
)
This is very similar to the "Open orders" pattern. Check out daxpatterns.com
If you want to simulate a relationship you can always use the COUNTROWS() > 0 pattern as a filter.
For example, if you want to do a SUM of Value on your main table, but only for those rows that have a match in the referenced table, without a relationship:
CALCULATE (
    SUM ( 'MainTable'[Value] ),
    FILTER (
        'MainTable',
        -- capture the current row's key, then count matching rows
        VAR CurrentFK = 'MainTable'[FK]
        RETURN
            CALCULATE (
                COUNTROWS ( 'ReferencedTable' ),
                'ReferencedTable'[PK] = CurrentFK
            ) > 0
    )
)

Access SQL Query: Comparing Date In Select Statement

I have a problem that I simply cannot seem to figure out. I have a list of employees with different travel dates, and I want to display all of them in a cascading list format. The problem is that I only want to see each employee once, and only with the date closest to today.
For example, I could have 'Smith' in there multiple times, with dates before and after today, as we also keep historical records. This means I can't just take the min, as it may display a date before today, and the max is too far forward.
The code example below ALMOST works. The problem is in the select statement. I want to show the minimum date after today, but instead it gives me 0's and -1's where the dates should be. There might be another way to do this altogether, but this is the only configuration that seems to allow the other information, such as Site, Position, and Comments, to be displayed correctly alongside it.
SELECT A.`Last Name` AS [Last Name], Min(A.`Date In`) > Now() AS [Date In],
       Max(B.Site) AS Site, Max(B.Position), Max(B.Comments) AS Comments
FROM Deployments AS A
INNER JOIN Deployments AS B ON A.ID = B.ID
GROUP BY A.`FSR Name`
HAVING (((Max(A.`Actual TEP IN`))>Now()));
I did a group by on Name because I only want to see each individual once. If I don't join the table to itself, it gives a self-reference error. This is my first time posting, so I hope this makes sense! All help will be greatly appreciated!
Not sure what DB you're on, but in general you need to return MIN(date) instead of the result of the comparison Min(Date) > Now(). I'm guessing this is where you're seeing 0's and -1's, since those would be the result of the comparison, when you want the minimum date value itself.
Also, if you just want people who have a trip date in the future, restrict your query with a WHERE clause, do a GROUP BY, and you get rid of the self-join. Also note that the example below resolves some discrepancies in your OP, such as selecting based on "Last Name" but grouping on "FSR Name"; these must be consistent, whichever field you're concerned about.
Example:
SELECT A.[FSR Name] AS [FSR Name],
Min(A.[Date In]) AS [Date In],
Max(A.Site) AS Site,
Max(A.Position) AS Position,
Max(A.Comments) AS Comments
FROM Deployments AS A
WHERE A.[Date In] > Now()
GROUP BY A.[FSR Name];
EDIT: If you need to make sure that Site, Position, and Comments all come from the same row, you have to do something like one of these options:
If you have a Primary Key:
select * from Deployments A3
where A3.pk_value =
    (select max(A2.pk_value) from Deployments A2
     where A2.[Date In] =
         (select Max([Date In]) from Deployments A where A.[FSR Name] = A2.[FSR Name])
       and A2.[FSR Name] = A3.[FSR Name])
This guarantees you get one row per FSR Name, even if there are multiple rows for that FSR with the same "latest" date.
Otherwise, you can leave out the secondary query dealing with the pk_value, but you run the risk of getting multiple rows for an FSR that has multiple records with the same "latest" date.
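In sketch form, that simpler variant (same tables and columns as above) would be:

select * from Deployments A2
where A2.[Date In] =
    (select Max([Date In]) from Deployments A where A.[FSR Name] = A2.[FSR Name])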
Note: when you get to queries this complex, running on a full-featured database (SQL Server, Oracle, anything but Access) allows you to use much more sophistication. For this example, window functions would give you the answer without as much wrangling. Not sure if you're stuck with Access for now, but consider it for the future, anyway.
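For illustration, a sketch of that window-function approach, assuming SQL Server syntax and the same Deployments columns as above:

SELECT [FSR Name], [Date In], Site, Position, Comments
FROM (
    SELECT D.*,
           ROW_NUMBER() OVER (PARTITION BY [FSR Name]
                              ORDER BY [Date In], ID) AS rn  -- rank each FSR's future trips
    FROM Deployments AS D
    WHERE [Date In] > GETDATE()
) AS ranked
WHERE rn = 1;  -- keep only the earliest future trip per FSR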
Try something like this:
Select A.LastName, A.DateIn, A.Site, A.Position, A.Comments
From deployments a
Where not exists
      (Select *
       From deployments b
       Where b.id <> a.id
         and (abs(datediff(d, getdate(), a.datein)) > abs(datediff(d, getdate(), b.datein))
              or (abs(datediff(d, getdate(), a.datein)) = abs(datediff(d, getdate(), b.datein))
                  and a.id > b.id)))
Instead of the funny mins and maxes you are using to try to get the row with the datein closest to today, try using datediff. With this function, you can specify what type of date or time part you want to compare (day, month, year, minute) and then find the difference between two datetimes. In this case, I used getdate() to get the current date and time. Then we want the datein with the smallest datediff value, i.e. the datein closest to today. Datediff can return positive or negative values, so I used abs to take the absolute value of the result; that way it doesn't matter whether the date is before or after today.
Then we look in the deployments table. The subquery says to look at all the rows other than the current one, then find all the rows that have a smaller datediff than the current record, and also the rows that have the same datediff as the current record and a smaller id. We only include the current record if nothing fits these criteria. It is a little weird to think about, but this type of query should help you find what you are looking for much more easily. The only thing is that you will need to add criteria to the where clause of the subquery to determine which entries to compare. As it stands, this query looks at all the entries in your deployments table and pulls back the one row with the datein closest to today. Since you want one row for each person, it will need a few more conditions.

DAX formula to calculate a sum between 2 dates

I have a couple of tables in PowerPivot:
A stock table, WKRelStrength, whose fields are:
Ticker, Date, StockvsMarket% (values are percentages), RS+- (values can be 0 or 1)
A calendar table, Cal, with a Date field.
There is a many-to-one relationship between the tables.
I am trying to aggregate RS+- against each row, for dates between 3 months ago and the date of that row, i.e. a 3-month-to-date sum. I have tried numerous calculations, but the best I can get is a circular reference error. Here is my formula:
=calculate(sum([RS+-]),DATESINPERIOD(Cal[Date],LASTDATE(Cal[Date]),-3,Month))
Here is the xlsx file.
I couldn't download the file, but what you are after is what Rob Collie calls the 'Greatest Formula in the World' (GFITW). This is untested, but try:
= CALCULATE (
    SUM ( WKRelStrength[RS+-] ),
    FILTER (
        ALL ( Cal ),
        Cal[Date] <= MAX ( Cal[Date] )
            && Cal[Date] >= MAX ( Cal[Date] ) - 90
    )
)
Note, this will give you the previous 90 days, which is approximately 3 months; getting exactly the prior 3 calendar months may be possible but is arguably less optimal, as you would be comparing slightly different lengths of time (personal choice, I guess).
Also, this will behave 'strangely' in a total, in that it will use the last date in your selection.
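Should you want exactly the prior 3 calendar months anyway, one possible variant swaps the 90-day arithmetic for EDATE (an untested sketch, same tables as above):

= CALCULATE (
    SUM ( WKRelStrength[RS+-] ),
    FILTER (
        ALL ( Cal ),
        Cal[Date] <= MAX ( Cal[Date] )
            && Cal[Date] > EDATE ( MAX ( Cal[Date] ), -3 )
    )
)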
First of all, the formula you are using is designed to work as a measure. It may not work well as a calculated column. Secondly, it is better to do such aggregations at the measure level than for individual records.
Then again, I do not fully understand your situation, but if it is absolutely important for you to do this at the record level, you may want to use the EARLIER function.
If you want to filter a function based on a value in the corresponding row, you just have to wrap your column name in the EARLIER function. Try changing the LASTDATE to EARLIER in your formula.
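As an illustration of that EARLIER approach, a calculated column on WKRelStrength might look like the sketch below; note that the per-Ticker filter is an assumption based on the table layout:

=SUMX (
    FILTER (
        WKRelStrength,
        WKRelStrength[Ticker] = EARLIER ( WKRelStrength[Ticker] )
            && WKRelStrength[Date] <= EARLIER ( WKRelStrength[Date] )
            && WKRelStrength[Date] > EDATE ( EARLIER ( WKRelStrength[Date] ), -3 )
    ),
    WKRelStrength[RS+-]
)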

Is there a way to handle immutability that's robust and scalable?

Since BigQuery is append-only, I was thinking about stamping each record I upload with an 'effective date', similar to how PeopleSoft works, if anybody is familiar with that pattern.
Then I could issue a select statement that joins on the max effective date:
select UTC_USEC_TO_MONTH(timestamp) as month, sum(amt)/100 as sales
from foo.orders as all
join (select id, max(effdt) as max_effdt from foo.orders group by id) as latest
on all.effdt = latest.max_effdt and all.id = latest.id
group by month
order by month;
Unfortunately, I believe this won't scale because of the BigQuery 'small joins' restriction, so I wanted to see if anyone else had thought through this use case.
Yes, adding a timestamp to each record (or, in some cases, a flag that captures the state of a particular record) is the right approach. The small side of a BigQuery "Small Join" can actually be at least 8 MB (this value is compressed on our end, so it is usually 2 to 10 times larger uncompressed), so for lookup-table-type subqueries this can actually cover a lot of records.
In your case, it's not clear to me what the exact query you are trying to run is. It looks like you are trying to return the most recent sales times of every individual item, and then JOIN this information with the SUM of sales amt per month of each item? Can you provide more info about the query?
It might be possible to do this all in one query. For example, in our wikipedia dataset, an example might look something like...
SELECT contributor_username, UTC_USEC_TO_MONTH(timestamp * 1000000) AS month,
       SUM(num_characters) AS total_characters_used
FROM [publicdata:samples.wikipedia]
WHERE (contributor_username != '' OR contributor_username IS NOT NULL)
  AND timestamp > 1133395200 AND timestamp < 1157068800
GROUP BY contributor_username, month
ORDER BY contributor_username DESC, month DESC;
...to provide wikipedia contributions per user per month (like sales per month per item). This result is actually really large, so you would have to limit it by date range.
UPDATE (based on comments below): a similar query that finds num_characters for the latest wikipedia revisions by contributors after a particular time...
SELECT current.contributor_username, current.num_characters
FROM
  (SELECT contributor_username, num_characters, timestamp AS time
   FROM [publicdata:samples.wikipedia]
   WHERE contributor_username != '' AND contributor_username IS NOT NULL) AS current
JOIN
  (SELECT contributor_username, MAX(timestamp) AS time
   FROM [publicdata:samples.wikipedia]
   WHERE contributor_username != '' AND contributor_username IS NOT NULL
     AND timestamp > 1265073722
   GROUP BY contributor_username) AS latest
ON
  current.contributor_username = latest.contributor_username
  AND current.time = latest.time;
If your query requires you to first build a large aggregate (for example, you need to run an essentially accurate COUNT DISTINCT), another option is to break the query up into two queries. The first query could provide the max effective date by month, along with a count, and save this result as a new table. Then you could run the sum query against the resulting table.
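A sketch of that two-step idea, reusing the question's foo.orders fields; the intermediate table name foo.order_latest is hypothetical and would be set as the destination table of the first query:

-- Query 1: save this result as foo.order_latest
SELECT id, MAX(effdt) AS max_effdt
FROM foo.orders
GROUP BY id;

-- Query 2: aggregate against the saved lookup table
SELECT UTC_USEC_TO_MONTH(o.timestamp) AS month, SUM(o.amt) / 100 AS sales
FROM foo.orders AS o
JOIN foo.order_latest AS latest
  ON o.id = latest.id AND o.effdt = latest.max_effdt
GROUP BY month
ORDER BY month;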
You could also store monthly sales records in separate tables and only query the particular tables for the months you are interested in, simplifying your monthly sales summaries (this could also be a more economical use of BigQuery). When you need to find aggregates across all tables, you can run your queries with multiple tables listed after the FROM clause.
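In BigQuery's legacy SQL, a comma between tables in the FROM clause acts as a UNION ALL rather than a join, so a summary across several such monthly tables might look like this (the monthly table names are hypothetical):

SELECT UTC_USEC_TO_MONTH(timestamp) AS month, SUM(amt) / 100 AS sales
FROM foo.orders_2013_01, foo.orders_2013_02, foo.orders_2013_03
GROUP BY month
ORDER BY month;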