SSAS Trace Logs TextData Parsing

I logged an Analysis Services trace with Extended Events.
I want to summarize this data to find out which columns and measures are used in the Tabular cube.
I have the TextData column; is there any way to parse it and extract this information?
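One rough approach (a sketch, not a definitive answer): load the trace into SQL Server and scan TextData for bracketed object references. The table names dbo.SsasTrace and dbo.TabularObjects are hypothetical; you would fill the latter with the measure and column names of your model (for example from the $System.TMSCHEMA_MEASURES and $System.TMSCHEMA_COLUMNS DMVs).

-- Hypothetical tables: dbo.SsasTrace(TextData nvarchar(max)) holds the
-- captured events; dbo.TabularObjects(ObjectName) lists the model's
-- measures and columns (assumed to contain no LIKE wildcard characters).
-- Counts how often each object appears in the logged queries.
SELECT
    o.ObjectName,
    COUNT(*) AS QueryCount
FROM dbo.SsasTrace AS t
JOIN dbo.TabularObjects AS o
    -- Escape [ and ] since they are wildcard characters in LIKE
    ON t.TextData LIKE '%\[' + o.ObjectName + '\]%' ESCAPE '\'
GROUP BY o.ObjectName
ORDER BY QueryCount DESC;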

Related

How to update/insert a new record with an updated value from a staging table in Azure Data Explorer

I have a requirement where data is ingested from Azure IoT Hub. Sample incoming data:
{
  "message": {
    "deviceId": "abc-123",
    "timestamp": "2022-05-08T00:00:00+00:00",
    "kWh": 234.2
  }
}
I have the same column mapping in the Azure Data Explorer table. kWh always comes as a cumulative value, not the delta between two timestamps. Now I need another table that holds the difference between the last inserted kWh value and the current kWh.
It would be a great help if anyone has a suggestion or solution here.
I'm able to calculate the difference on the fly using prev(), but I need to update the table while inserting the data.
As far as I know, there is no way to perform data manipulation on the fly while ingesting Azure IoT data into Azure Data Explorer through JSON mapping. However, I found a couple of approaches you can take to get the calculations you need. Both approaches involve creating a secondary table to store the calculated data.
Approach 1
This is the closest approach I found to on-the-fly data manipulation. For this to work, you need to create a function that calculates the difference in the kWh field for the latest entry. Once you have the function created, you can bind it to the secondary (target) table using an update policy, which makes it trigger for every new entry in your source table.
Refer to the following resource, Ingest JSON records, which walks through an example of creating a function and binding it to the target table.
Note that you would have to write your own function that calculates the difference in kWh; a rough sketch is given below.
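A minimal sketch of that pattern, with hypothetical names (SourceKwh as the source table, KwhDiff as the target) and the assumption that, inside an update policy, the source table reference is scoped to the newly ingested batch:

// Target table holding the cumulative reading plus the computed delta
.create table KwhDiff (deviceId: string, timestamp: datetime, kWh: real, kWhDelta: real)

// Function run by the update policy: join each new reading against the
// last row already stored per device and compute the difference.
// Note: if one batch contains several rows for the same device, each is
// compared to the last stored value, not to the other rows in the batch.
.create-or-alter function KwhDelta() {
    SourceKwh
    | join kind=leftouter (
        KwhDiff
        | summarize arg_max(timestamp, kWh) by deviceId
        | project deviceId, lastKwh = kWh
    ) on deviceId
    | project deviceId, timestamp, kWh, kWhDelta = kWh - coalesce(lastKwh, kWh)
}

// Bind the function to the target table so it fires on every ingestion
.alter table KwhDiff policy update
@'[{"IsEnabled": true, "Source": "SourceKwh", "Query": "KwhDelta()", "IsTransactional": true}]'

Each control command is run separately; IsTransactional makes the ingestion fail rather than silently skip the update if the function errors.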
Approach 2
If you do not need real-time data manipulation and your business can tolerate a delay of about a minute, you can create a query similar to the one below, which calculates the temperature difference from the source table (jsondata in my scenario) and writes it to the target table (jsondiffdata):
.set-or-append jsondiffdata <| jsondata | serialize
| extend temperature = temperature - prev(temperature, 1)
| project timesent, humidity, temperature
Refer to the following resource for more information on how to Ingest from query. You can use Microsoft Power Automate to schedule this query to run every minute.
Be cautious if you decide to go with the second approach: it relies on serialization, which can prevent query parallelism in many scenarios. Review this resource on window functions and identify a query approach that is better optimized for your needs.

Google Analytics - Data Lengths of Dimensions & Metrics

Team,
We have a daily data pull from Google Analytics; we load Source, Geo, Geo Network, Device & AdWords data into SQL Server for reporting purposes.
Today we had a production failure because of lengthy keyword dimension data from Google. Is there a way to figure out the data sizes of all the columns, including the full list of dimensions & metrics? Please share if there is a place where this is all defined.
Thanks
Senthil

SSAS: How to handle date dimension when date is null

I'm trying to add a new column to my SSAS cube. The column is a date field, and links to my DimDate table (a Date dimension). This date represents the project completion date.
However, not all of the projects have a completion date, because old projects were never assigned this value. This is expected; we don't want to put bogus dates into the field just to get SSAS to work.
When processing the cube, it crashes with:
Errors in the OLAP storage engine: The attribute key cannot be found when
processing: Table: 'dbo_FactMyTable', Column: 'MyDate_id', Value: '0'.
The attribute is 'Date Id'.
I can't disable "missing values" for the entire project because in most cases, this really is an error. How can I disable missing values for this dimension?
Or is there a better way to handle missing dates/values like this?
A small correction: based on your question, you need to change processing error handling for the specific measure group, not the dimension. You can do it for all dimensions linked to a given measure group, but not for a single dimension.
You can first process the individual measure group for Table: 'dbo_FactMyTable' with the necessary missing-value settings, and then process the rest of your cube with default settings.
The main problem here is how to process the rest of the cube. You might have a sophisticated system that creates processing XMLA scripts dynamically based on knowledge of data updates (I do it with SSIS); in that case you would not be asking this question. Suppose your environment is simpler: you update the cube and would like to process it as a whole. In such a scenario I would suggest the following workflow:
Process Default all dimensions (performs initial processing, or handles structure changes)
Process Update all dimensions
Process the cube with Unprocess, invalidating it
Process your special measure group with the adjusted error settings
Process the cube with Process Default
This will first update the dimensions, then clear the processed status flag from all measure groups in the cube. After that you process your measure group with the special error settings, which sets the processed status for that measure group. Then, during Process Default on the cube, only unprocessed measure groups are covered, which excludes your special measure group from the processing scope. A rough XMLA sketch of the measure-group step follows.
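For illustration, here is a hedged XMLA sketch of the special measure-group step, with placeholder object IDs (MyOlapDb, MyCube, FactMyTable); the ErrorConfiguration relaxes key-not-found handling just for this one processing command:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>MyOlapDb</DatabaseID>
      <CubeID>MyCube</CubeID>
      <MeasureGroupID>FactMyTable</MeasureGroupID>
    </Object>
    <Type>ProcessFull</Type>
    <!-- Convert unmatched keys (the NULL dates mapped to 0) to the
         dimension's unknown member instead of failing processing -->
    <ErrorConfiguration>
      <KeyErrorAction>ConvertToUnknown</KeyErrorAction>
      <KeyNotFound>IgnoreError</KeyNotFound>
    </ErrorConfiguration>
  </Process>
</Batch>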
The answer is a bit complicated, but this article did a great job of explaining it, including screenshots for the SSAS-challenged like me.
http://msbusinessintelligence.blogspot.com/2015/06/handling-null-dates-in-sql-server.html?m=1

How to send data to only one Azure SQL DB Table from Azure Streaming Analytics?

Background
I have set up an IoT project using an Azure Event Hub and Azure Stream Analytics (ASA) based on tutorials from here and here. JSON formatted messages are sent from a wifi enabled device to the event hub using webhooks, which are then fed through an ASA query and stored in one of three Azure SQL databases based on the input stream they came from.
The device (Particle Photon) transmits 3 different messages with different payloads, for which there are 3 SQL tables defined for long term storage/analysis. The next step includes real-time alerts, and visualization through Power BI.
The ASA Query
SELECT
ParticleId,
TimePublished,
PH,
-- and other fields
INTO TpEnvStateOutputToSQL
FROM TpEnvStateInput
SELECT
ParticleId,
TimePublished,
EventCode,
-- and other fields
INTO TpEventsOutputToSQL
FROM TpEventsInput
SELECT
ParticleId,
TimePublished,
FreshWater,
-- and other fields
INTO TpConsLevelOutputToSQL
FROM TpConsLevelInput
Problem: For every message received, the data is pushed to all three tables in the database, not only to the output specified in the query. The table the data belongs in gets populated with a new row as expected, while the other two tables get populated with NULLs in the columns for which no data existed.
From the ASA documentation it was my understanding that the INTO keyword would direct the output to the specified sink. But that does not seem to be the case, as the output from all three inputs gets pushed to all sinks (all 3 SQL tables).
The test script I wrote for the Particle Photon will send one of each type of message with hardcoded fields, in the order: EnvState, Event, ConsLevels, each 15 seconds apart, repeating.
Here is an example of the output being sent to all tables, showing one column from each table, generated using this query (in Visual Studio):
SELECT
t1.TimePublished as t1_t2_t3_TimePublished,
t1.ParticleId as t1_t2_t3_ParticleID,
t1.PH as t1_PH,
t2.EventCode as t2_EventCode,
t3.FreshWater as t3_FreshWater
FROM dbo.EnvironmentState as t1, dbo.Event as t2, dbo.ConsumableLevel as t3
WHERE t1.TimePublished = t2.TimePublished AND t2.TimePublished = t3.TimePublished
For an input event of type TpEnvStateInput, where the key 'PH' exists (and not the keys 'EventCode' or 'FreshWater', which belong to TpEventInput and TpConsLevelInput, respectively), an entry in only the EnvironmentState table is desired.
Question:
Is there a bug somewhere in the ASA query, or a misunderstanding on my part on how ASA should be used/setup?
I was hoping I would not have to define three separate Stream Analytics containers, as they tend to be rather pricey. After running through this tutorial, and leaving 4 ASA containers running for one day, I used up nearly $5 in Azure credits. At a projected $150/mo cost, there's just no way I could justify sticking with Azure.
ASA is intended for complex event processing. In your queries you are essentially using ASA to pass data from the event hub to tables. It will be much cheaper to host a simple "worker web app" that processes the incoming events.
This blog post covers the best practices:
http://blogs.msdn.com/b/servicebus/archive/2015/01/16/event-processor-host-best-practices-part-1.aspx
ASA is great if you are doing some transformations, filters, light analytics on your input data in real-time. Furthermore, it also works great if you have some Azure Machine Learning models that are exposed as functions (currently in preview).
In your example, all three "select into" statements are reading from the same input source, and don't have any filter clauses, so all rows would be selected.
If you only want to select specific rows for each of the outputs, you have to specify a filter condition. For example, assuming you only want records with a non-null value in the column "PH" for the output "TpEnvStateOutputToSQL", the ASA query would look like the one below.
SELECT
ParticleId,
TimePublished,
PH
-- and other fields
INTO TpEnvStateOutputToSQL
FROM TpEnvStateInput
WHERE PH IS NOT NULL
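Applied to all three outputs, the whole job query might look like the sketch below, assuming each message type is identified by a non-null value in its distinguishing field:

SELECT ParticleId, TimePublished, PH
INTO TpEnvStateOutputToSQL
FROM TpEnvStateInput
WHERE PH IS NOT NULL

SELECT ParticleId, TimePublished, EventCode
INTO TpEventsOutputToSQL
FROM TpEventsInput
WHERE EventCode IS NOT NULL

SELECT ParticleId, TimePublished, FreshWater
INTO TpConsLevelOutputToSQL
FROM TpConsLevelInput
WHERE FreshWater IS NOT NULL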

Access system tables in calculation

Is there a way to access system tables in an SSAS cube calculation?
For example the following query can be executed on a SSAS cube to return a last processed date:
SELECT LAST_DATA_UPDATE FROM $System.MDSCHEMA_CUBES WHERE CUBE_NAME = 'Cube'
How would one access this information in a calculation?
Background: We were using ASSP (a third-party sproc) to get the last cube processed date. Recently, this sproc threw an exception on one of our cubes and caused SSAS to go down. Using the above line of MDX did not have this behavior. I would rather not have our cube depend on third-party code, so I am looking for a way to access LAST_DATA_UPDATE in a calc for a specific cube name.
I usually include a detached dimension in my cubes, e.g. "Cube Information", which includes attributes like this. Other useful attributes could expose the currency of the data, e.g. when the last underlying ETL process completed, or the release/build of your cube.
I feed this "Cube Information" dimension from a SQL view that returns a single row with whatever data is needed; you could use your SELECT statement there. The view also needs to return a key column with a fixed value, e.g. 1, along the lines of the sketch below.
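A minimal sketch of such a view, assuming a linked server to the SSAS instance (here called SSAS_LINK, a placeholder) so the DMV can be read via OPENQUERY; 'Cube' is likewise a placeholder for your cube name:

-- Returns one row: a fixed key plus the cube's last processed date
CREATE VIEW dbo.CubeInformation
AS
SELECT
    1 AS CubeInfoKey,
    CONVERT(datetime, x.LAST_DATA_UPDATE) AS LastProcessedDate
FROM OPENQUERY(SSAS_LINK,
    'SELECT LAST_DATA_UPDATE FROM $System.MDSCHEMA_CUBES WHERE CUBE_NAME = ''Cube''') AS x;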
By "detached" I mean the "Cube Information" dimension has no entries in the Cube Dimension Relationship tab.
In the cube calculation script, I assign the DEFAULT_MEMBER property for that dimension to the fixed key value from the SQL view.
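As a sketch, with illustrative dimension, attribute, and key names, that assignment in the calculation script could be:

ALTER CUBE CURRENTCUBE
UPDATE DIMENSION [Cube Information].[Cube Info Key],
DEFAULT_MEMBER = [Cube Information].[Cube Info Key].&[1];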
Any client tool can then access those Dimension attributes.