Stream Analytics query for storing Application Insights custom metrics in a SQL DB - azure-stream-analytics

I want to move all the custom metrics being logged in Application Insights into a SQL database.
I have enabled continuous export on Application Insights, which dumps the custom metrics to blob storage.
From there I want Stream Analytics to write the data into Azure SQL.
The issue is that I am not able to write the transformation query in Stream Analytics.
We will have hundreds of custom metrics being logged.
I want to store them in SQL like this:
Time | Metric | Value
-------------------------------------
I am trying to achieve this with query:
SELECT
    flat.PropertyName,
    flat.PropertyValue
INTO
    [outputdb-ai3]
FROM
    [storage-ai] A
OUTER APPLY
    GetRecordProperties(A.[context].[custom]) AS flat
But no luck so far; please suggest a fix.
Thanks

Here is the query to get the desired result.
SELECT
    Input.internal.data.id,
    Input.context.data.eventtime,
    recordProperty.PropertyName AS Name,
    recordProperty.PropertyValue.Value
INTO
    [outputdb]
FROM
    [storage-ai] AS Input TIMESTAMP BY Input.context.data.eventtime
CROSS APPLY GetElements(Input.[context].[custom].[metrics]) AS flat
CROSS APPLY GetRecordProperties(flat.ArrayValue) AS recordProperty
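For completeness, here is a minimal sketch of a destination table that the [outputdb] output could map to. The column names mirror the projected fields; the table name and the column types are assumptions, not part of the original answer:

CREATE TABLE dbo.CustomMetrics (
    id        NVARCHAR(200),  -- Input.internal.data.id
    eventtime DATETIME2,      -- Input.context.data.eventtime
    Name      NVARCHAR(200),  -- metric name from GetRecordProperties
    Value     FLOAT           -- metric value
);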

Improve performance of subtracting values in the same table in SQL

For a metering project I use a simple SQL table in the following format:
ID
Timestamp: dat_Time
Metervalue: int_Counts
Meterpoint: fk_MetPoint
While this works nicely in general, I have not found an efficient solution for one specific problem: one Meterpoint is a submeter of another Meterpoint, and I am interested in the delta of those two Meterpoints to get the remaining consumption. Since the registration of counts is done by one device, I get datapoints for the various Meterpoints at the same Timestamp.
I found a solution using a subquery, but it appears to be inefficient:
SELECT
    A.dat_Time,
    (A.int_Counts - (SELECT B.int_Counts
                     FROM tbl_Metering AS B
                     WHERE B.fk_MetPoint = 2 AND B.dat_Time = A.dat_Time)) AS Delta
FROM tbl_Metering AS A
WHERE fk_MetPoint = 1
How could I improve this query?
Thanks in advance
You can try using a window function instead:
SELECT m.dat_Time,
       (m.int_Counts - m.int_Counts_2) AS Delta
FROM (SELECT m.*,
             MAX(CASE WHEN fk_MetPoint = 2 THEN int_Counts END) OVER (PARTITION BY dat_Time) AS int_Counts_2
      FROM tbl_Metering m
     ) m
WHERE fk_MetPoint = 1
From a query point of view, you should at a minimum switch to a set-based approach instead of an inline sub-query per row (a GROUP BY would do), but this is a prime candidate for a window function, just as the "Great" Gordon Linoff suggested.
However, if this is a metering project, then we should expect a high volume of records, if not now then certainly over time.
I would recommend you look into altering the input so that the delta is stored as its own first-class column. This moves much of the performance hit to the write process, which presumably occurs only once per record, whereas your SELECT will be executed many times.
This can be done with an INSTEAD OF trigger, or you could write it into the business logic. In a recent IoT project we computed and stored these additional properties with each inserted reading, to greatly simplify many types of aggregate and analysis queries (see the trigger sketch at the end of this answer):
Id of the Previous sequential reading
Timestamp of the Previous sequential reading
Value Delta
Time Delta
Number of readings between this and the previous reading
The last one sounds close to your scenario; we were deliberately batching multiple sequential readings into a single record.
You could also process the received data into a separate table that includes this level of aggregation information, so as not to pollute the raw feed and to allow you to re-process it on demand.
You could redirect your analysis queries to this second table, which is now effectively a data warehouse of sorts.
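As a rough illustration of the trigger approach, here is a minimal sketch of an INSTEAD OF INSERT trigger that computes and stores the delta against the previous sequential reading at write time. The table and column names follow the question; the trigger name, the int_Delta column, and the previous-reading lookup are assumptions:

-- Hypothetical: assumes tbl_Metering has been extended with an int_Delta
-- column and that ID is an IDENTITY column populated automatically.
CREATE TRIGGER trg_Metering_ComputeDelta
ON tbl_Metering
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO tbl_Metering (dat_Time, int_Counts, fk_MetPoint, int_Delta)
    SELECT i.dat_Time,
           i.int_Counts,
           i.fk_MetPoint,
           -- Delta against the most recent earlier reading of the same Meterpoint
           i.int_Counts - prev.int_Counts
    FROM inserted AS i
    OUTER APPLY (SELECT TOP (1) m.int_Counts
                 FROM tbl_Metering AS m
                 WHERE m.fk_MetPoint = i.fk_MetPoint
                   AND m.dat_Time < i.dat_Time
                 ORDER BY m.dat_Time DESC) AS prev;
END;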

Hybrid Query Example in AgensGraph

I am using AgensGraph, but I don't know how to write a hybrid query; any examples of hybrid queries in AgensGraph would help a lot.
In AgensGraph you can write hybrid queries in two ways:
Let's say you create the following:
CREATE GRAPH AG;
CREATE VLABEL dev;
CREATE (:dev {name: 'someone', year: 2015});
CREATE (:dev {name: 'somebody', year: 2016});
CREATE TABLE history (year, event)
AS VALUES (1996, 'PostgreSQL'), (2016, 'AgensGraph');
1- Cypher in SQL
Syntax:
SELECT [column_name]
FROM ({table_name|SQL-query|CYPHERquery})
WHERE [column_name operator value];
Example:
SELECT n->>'name' as name
FROM history, (MATCH (n:dev) RETURN n) as dev
WHERE history.year > (n->>'year')::int;
Result:
  name
---------
 someone
(1 row)
2- SQL in Cypher
Syntax:
MATCH [table_name]
WHERE (column_name operator {value|SQLquery|CYPHERquery})
RETURN [column_name];
Example:
MATCH (n:dev)
WHERE n.year < (SELECT year FROM history WHERE event = 'AgensGraph')
RETURN properties(n) AS n;
Result:
                 n
------------------------------------
 {"name": "someone", "year": 2015}
(1 row)
You can find more information here.
I found more info on the hybrid query language in these slides. Every other bit of information I have been able to find is just the same example that Eya posted, in different places.
I agree that more information about hybrid queries in AgensGraph would be great, as it seems like a killer feature of the software.
Let's assume that we have a network management system and we keep our network topology in the graph part of AgensGraph (graph format) and our time-series data (such as date and time information for specific devices) in the relational part of AgensGraph (table format). In this case we have a graph and tables, and if we want, we can write a hybrid query that fetches data from both models.
In our graph we have different devices that are connected to each other, such as a modem, IoT sensors, etc. For each of these devices we also have related information stored in tables, such as download speed, upload speed, or CPU usage.
In the following hybrid queries, our goal is to collect the information for specific devices by querying both the graph and the tables simultaneously.
Cypher in SQL
In this hybrid query we look for modem devices that are having issues, with abnormality type 2 (2 indicates that the device has issues with its download and upload speed). Once we find those devices, we return their id, download speed, and upload speed to investigate the issue. As you can see in the following query, the inner query is Cypher and the outer query is SQL.
SELECT id, sysdnbps, sysupbps
FROM public.modemrdb
WHERE to_jsonb(id) IN
    (SELECT id
     FROM (MATCH (m:modem)
           WHERE m.abnormaltype = 2
           RETURN m.name) AS s(id));
SQL in Cypher
In this hybrid query we look for modem devices whose CPU usage is above 80 (outside the threshold range), which indicates there is an issue with those devices. Once we find them, we return those modems and any IoT devices that are connected to them. As you can see in the following example, the inner query is SQL and the outer query is Cypher.
MATCH p = (n:modem)-[r*1..2]->(iot)
WHERE n.name IN
    (SELECT to_jsonb(id)
     FROM public.modemrdb
     WHERE syscpuusage >= 80)
RETURN p;
This can be another example of a hybrid query.

How to unnest Google Analytics custom dimension in Google Data Prep

Background story:
We use Google Analytics to track user behaviour on our website. The data is exported daily into BigQuery. Our implementation is quite complex and we use a lot of custom dimensions.
Requirements:
1. The data needs to be imported into our internal databases to enable better and more strategic insights.
2. The process needs to run without requiring human interaction.
The problem:
Google Analytics data needs to be in a flat format so that we can import it into our database.
Question: How can I unnest custom dimensions data using Google Data Prep?
What it looks like:
----------------
customDimensions
----------------
[{"index":10,"value":"56483799"},{"index":16,"value":"·|·"},{"index":17,"value":"N/A"}]
What I need it to look like:
----------------------------------------------------------
customDimension10 | customDimension16 | customDimension17
----------------------------------------------------------
56483799 | ·|· | N/A
I know how to achieve this using a standard SQL query in the BigQuery interface, but I really want to have a Google Data Prep flow that does it automatically.
Define the flat format and create it in BigQuery first.
You could:
create one big table and repeat several values using CROSS JOINs on all the arrays in the table
create multiple tables (per array) and use ids to connect them, e.g.
for session custom dimensions concatenate fullvisitorid / visitstarttime
for hits concatenate fullvisitorid / visitstarttime / hitnumber
for products concatenate fullvisitorid / visitstarttime / hitnumber / productSku
The second option is a bit more effort, but you save storage because you're not repeating all the information everywhere.
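For reference, a minimal BigQuery Standard SQL sketch of the kind of flattening query the question alludes to, pivoting session-level custom dimensions into columns. The table name is a placeholder and the index numbers are taken from the example above; treat both as assumptions:

SELECT
  -- Session id built from the keys suggested above
  CONCAT(fullVisitorId, '/', CAST(visitStartTime AS STRING)) AS session_id,
  (SELECT value FROM UNNEST(customDimensions) WHERE index = 10) AS customDimension10,
  (SELECT value FROM UNNEST(customDimensions) WHERE index = 16) AS customDimension16,
  (SELECT value FROM UNNEST(customDimensions) WHERE index = 17) AS customDimension17
FROM `project.dataset.ga_sessions_20190101`;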

How to use the output as another input?

The case: we have a 1-day aggregation window which sums the values received from Event Hub (1, 2, 3 ... a value is sent each minute), and we set the output to a blob called 1dayresult. Now we want to use that blob data as the input to another, 1-week aggregation: each week we want to read the data from the blob and do the calculation. So, can we set the 1-day result blob as the input for the 1-week aggregation? We know we can set the window size to 7 days, but we think that will hurt performance: if we use the 1-day result blob as input we only need 7 values, whereas with a 7-day window we would pull in more than 7*24*60 values for the calculation. We also want a monthly aggregation, but the maximum size of the window is 7 days. So how can we achieve this?
You can use a WITH statement to "chain" multiple subqueries together, where the next subquery uses the output of the previous one as its input. Take a look at this doc.
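A minimal sketch of that chaining, assuming a hypothetical input eventhubInput with a value field and an eventTime timestamp, and an output weeklyOutput (none of these names come from the question):

WITH DailyTotals AS (
    -- First step: 1-day totals
    SELECT System.Timestamp() AS WindowEnd,
           SUM(value) AS DailyTotal
    FROM eventhubInput TIMESTAMP BY eventTime
    GROUP BY TumblingWindow(day, 1)
)
-- Second step: aggregate the daily totals over a 7-day window
SELECT System.Timestamp() AS WindowEnd,
       SUM(DailyTotal) AS WeeklyTotal
INTO weeklyOutput
FROM DailyTotals
GROUP BY TumblingWindow(day, 7)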
However, as you noted, sometimes it can be more efficient to persist intermediate output results in blob storage or another Event Hub. You can define the same storage location both as an input and as an output, so one subquery can be writing to it while another is reading from it.
The maximum window size in Azure Stream Analytics is indeed 7 days. For larger-window computations involving larger amounts of historic data, it may be better to use a product like Azure Data Factory.

Azure Stream Analytics - long living calculation

I'm using Azure Stream Analytics for real-time analytics and I have a basic problem: I have a field by which I would like to count the number of messages.
The JSON is in the following format:
{ "categoryId": 100, "name": "hello" }
I would like to see the number of count by category, so I assume that the query in Azure stream analytics should be:
SELECT
    categoryId,
    COUNT(*) AS categoryCount
INTO
    categoriesCount
FROM
    categoriesInput
GROUP BY
    categoryId
The problem is that I have to add a TumblingWindow or SlidingWindow to the GROUP BY clause. Is there a way to avoid that and have the calculation run indefinitely? I also need to make sure the output is written to SQL Server.
How about a sliding window with a length of 1? That way it would act like a pointer, and every time it changes you can do the calculation.
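A minimal sketch of that suggestion, keeping the input and output names from the question; the unit (second) and duration (1) of the window are assumptions:

SELECT
    categoryId,
    COUNT(*) AS categoryCount
INTO
    categoriesCount
FROM
    categoriesInput
GROUP BY
    categoryId,
    SlidingWindow(second, 1)

Note that this emits a result whenever an event enters or leaves the window, and each result counts only the events inside that 1-second window, not a running total since the job started.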
Hope this helps!
Mert