KQL: Unpacking array into columns

I have the following records:
Message = User: Value1 \r\nComponent: Value2\r\nResult description: Value3\r\nName: Value4
Message = Event type: ValueA\r\nApplication: ValueB\r\nApplication\\Path: ValueC\r\nUser: ValueD
Using | extend Message = split(Message, "\\r\\n") I get the following results:
Message = ["User:Value1", "Component:Value2", "Result description:Value3", "Name:Value4"]
Message = ["Event type:ValueA", "Application:ValueB", "Path:ValueC", "User:ValueD"]
I would like to use the "keys" here as column names and the "values" to populate them as such:
User   | Component | ResultDescription | Name   | EventType | Application | Path
-------|-----------|-------------------|--------|-----------|-------------|-------
Value1 | Value2    | Value3            | Value4 |           |             |
ValueD |           |                   |        | ValueA    | ValueB      | ValueC
I've tried using both mv-expand and bag_unpack, but the most I've successfully been able to do looks like this:
Message
__________
["User:value1"]
["Component:value2"]
["ResultDescription:value3"]
and so on.
How can I do this?

It'd be inefficient, but you could try this:
use extract_all() to extract the key-value pairs from the input message
expand the pairs using mv-apply, and create a property bag out of them using summarize make_bag()
use evaluate bag_unpack() to unpack the property bag into columns
[Note: you may need to adjust the regular expression used in the example below to match the contents of your keys/values]
datatable(s:string)
[
    'Message = User: Value1 \r\nComponent: Value2\r\nResult description: Value3\r\nName: Value4',
    'Message = Event type: ValueA\r\nApplication: ValueB\r\nApplication\\Path: ValueC\r\nUser: ValueD',
]
| mv-apply e = extract_all(#"(\w+): (\w+)", dynamic([1,2]), s) on (
    // e[0] is the captured key, e[1] the captured value
    project p = pack(tostring(e[0]), e[1])
    | summarize b = make_bag(p)
)
| evaluate bag_unpack(b)
|s                                                                                   |Application|Component|description|Name  |Path  |type  |User  |
|------------------------------------------------------------------------------------|-----------|---------|-----------|------|------|------|------|
|Message = User: Value1 Component: Value2Result description: Value3Name: Value4      |           |Value2   |Value3     |Value4|      |      |Value1|
|Message = Event type: ValueAApplication: ValueBApplication\Path: ValueCUser: ValueD |ValueB     |         |           |      |ValueC|ValueA|ValueD|

Workspace does not return the correct value for me

I have a problem with some code.
If I write Recenzes select: [:a | a komponenta nazev = 'Hitachi P21'] I get some matching records. But if I use something like this:
| brzdy |
brzdy := (((
(Sekces select: [:b | b nazev = 'Brzdy']) collect: [:b | b komponenty]) flatten)
select: [:c | c vyrobce nazev = 'Hitachi']) collect: [:d | d nazev].
I can get 'Hitachi P21' with the ^ command. But if I use the variable brzdy here: Recenzes select: [:a | a komponenta nazev = brzdy] I don't get anything.
In a nutshell: I want to show Recenzes for Komponenty which are in Sekces with value 'Brzdy'; they are stored in the column Komponenty (a Set) of both Recenzes and Sekces.
Does anyone know why?
Since brzdy is the result of a #collect: message, it is a collection of strings, not a single string. Therefore no element a would satisfy the condition a komponenta nazev = brzdy, because you would be comparing objects of different classes. Try something along the lines of
Recenzes select: [:a | brzdy includes: a komponenta nazev]
As a side note, remember that you may eliminate some parentheses by using select:thenCollect: instead of (select: blah) collect: bluh. For instance
brzdy := (Sekces select: [:b | b nazev = 'Brzdy'] thenCollect: [:b | b komponenty]) flatten
select: [:c | c vyrobce nazev = 'Hitachi']
thenCollect: [:d | d nazev]
(I'm not familiar with the #flatten message, so I can't tell whether it is necessary or superfluous).

KQL: mv-expand OR bag_unpack equivalent command to convert a list to multiple columns

According to mv-expand documentation:
Expands multi-value array or property bag.
mv-expand is applied on a dynamic-typed column so that each value in the collection gets a separate row. All the other columns in an expanded row are duplicated.
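For example (using a made-up two-column table), one input row becomes one row per array element:
datatable(name:string, tags:dynamic)
[
    "host1", dynamic(["prod", "linux"]),
]
| mv-expand tags
// yields two rows: ("host1", "prod") and ("host1", "linux"), with name duplicated in each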
Just as the mv-expand operator creates a row for each element in the list, is there an equivalent operator/way to make each element in a list an additional column?
I checked the documentation and found Bag_Unpack:
The bag_unpack plugin unpacks a single column of type dynamic by treating each property bag top-level slot as a column.
However, it doesn't seem to work on a list; it works on top-level JSON properties instead.
Using bag_unpack (like the below query):
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Smitha", "Age":30}),
]
| evaluate bag_unpack(d)
It will do the following:
Name Age
John 20
Dave 40
Smitha 30
Is there a command/way (see some_command_which_helps below) with which I can achieve the following (convert a list to columns):
datatable(d:dynamic)
[
dynamic(["John", "Dave"])
]
| evaluate some_command_which_helps(d)
That translates to something like:
Col1 Col2
John Dave
Is there an equivalent where I can convert a list/array to multiple columns?
For reference: We can run the above queries online on Log Analytics in the demo section if needed (however, it may require login).
you could try something along the following lines
(that said, from an efficiency standpoint, you may want to consider restructuring the data set to begin with, using a schema that matches how you plan to actually consume/query it)
datatable(d:dynamic)
[
    dynamic(["John", "Dave"]),
    dynamic(["Janice", "Helen", "Amber"]),
    dynamic(["Jane"]),
    dynamic(["Jake", "Abraham", "Gunther", "Gabriel"]),
]
| extend r = rand()               // random key, used to regroup the expanded rows per source row
| mv-expand with_itemindex = i d  // i holds the zero-based index of each element
| summarize b = make_bag(pack(strcat("Col", i + 1), d)) by r
| project-away r
| evaluate bag_unpack(b)
which will output:
|Col1 |Col2 |Col3 |Col4 |
|------|-------|-------|-------|
|John |Dave | | |
|Janice|Helen |Amber | |
|Jane | | | |
|Jake |Abraham|Gunther|Gabriel|
To extract key-value pairs from text and convert them to columns without hardcoding the key names in the query:
print message="2020-10-15T15:47:09 Metrics: duration=2280, function=WorkerFunction, count=0, operation=copy_into, invocationId=e562f012-a994-4fc9-b585-436f5b2489de, tid=lct_b62e6k59_prd_02, table=SALES_ORDER_SCHEDULE, status=success"
// extract each key=value pair as a [key, value] array
| extend Properties = extract_all(#"(?P<key>\w+)=(?P<value>[^, ]*),?", dynamic(["key","value"]), message)
| mv-apply Properties on (summarize make_bag(pack(tostring(Properties[0]), Properties[1])))
| evaluate bag_unpack(bag_)  // bag_ is make_bag's default output column name
| project-away message
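This should yield a single row with one column per extracted key (duration, function, count, operation, invocationId, tid, table, and status), with the original message column projected away.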

Kusto: How to unpivot - turn columns into rows?

Using the StormEvents table on the Samples database on the help cluster:
StormEvents
| where State startswith "AL"
| where EventType has "Wind"
| where StartTime == "2007-01-02T02:16:00Z"
| project StartTime, State, EventType, InjuriesDirect, InjuriesIndirect, DeathsDirect, DeathsIndirect
I would like row-based output with one row per casualty type, i.e. something in the shape of StartTime, State, EventType, CasualtyType, CasualtyCount.
I see the pivot() function, but it appears to only go the other direction, from rows to columns.
I've been trying various pack() ideas, but can't seem to get the required output.
Example:
StormEvents
| where State startswith "AL"
| where EventType has "Wind"
| where StartTime == "2007-01-02T02:16:00Z"
| project StartTime, State, EventType, InjuriesDirect, InjuriesIndirect, DeathsDirect, DeathsIndirect
| extend Packed = pack(
    "CasualtyType", "InjuriesDirect", "CasualtyCount", InjuriesDirect,
    "CasualtyType", "InjuriesIndirect", "CasualtyCount", InjuriesIndirect,
    "CasualtyType", "DeathsDirect", "CasualtyCount", DeathsDirect,
    "CasualtyType", "DeathsIndirect", "CasualtyCount", DeathsIndirect
)
| project-away InjuriesDirect, InjuriesIndirect, DeathsDirect, DeathsIndirect
| mv-expand Packed
This gives me too many rows, and it's not clear to me how to convert them to columns anyway.
What's a correct pattern to use for the required output?
you could try something along the following lines:
let casualty_types = dynamic(["InjuriesDirect", "DeathsDirect", "InjuriesIndirect", "DeathsIndirect"]);
StormEvents
| where State startswith "AL"
| where EventType has "Wind"
| where StartTime == "2007-01-02T02:16:00Z"
| project StartTime, State, EventType, properties = pack_all()  // bag of all input columns
| mv-apply casualty_type = casualty_types to typeof(string) on (
    // one output row per entry in casualty_types
    project casualty_type, casualty_count = tolong(properties[casualty_type])
)
| project-away properties
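pack_all() captures every column of the input row into a single property bag; mv-apply then emits one row per name in casualty_types and looks that name up in the bag, so each event unpivots into four (casualty_type, casualty_count) rows.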

WSO2BAM REST stream input to BAM/Cassandra; can't get to the EVENT_KS data using hive query?

The background for this question is essentially an article written by Sachini Jayasekara @ WSO2 called Using Different Reporting Frameworks with WSO2 Business Activity Monitor. I am doing more or less exactly the same, but rather using the REST API to define a data stream and invoking the REST WS API to push data into BAM, then using Hive queries to get to the data. However, it seems that I have missed something, as the attribute data is not shown. Hence this question.
I am currently using the REST API, invoked through a Perl-based daemon, with the following stream definition and payload:
{
    "name": "currentcostRealtime2.stream",
    "version": "1.0.6",
    "nickName": "Currentcost Realtime",
    "description": "This is the Currentcost realtime stream",
    "payloadData": [
        { "name": "sensor",    "type": "INT" },
        { "name": "temp",      "type": "FLOAT" },
        { "name": "timestamp", "type": "STRING" },
        { "name": "watt",      "type": "INT" }
    ]
}
.. and payload definition ..
[
    {
        "payloadData" : [SENSOR, TEMP, "TIMESTAMP", WATT] ,
    }
]
I should note that the payload is string-replaced before it's committed; e.g. the actual payload that is committed looks like:
[
    {
        "payloadData" : [1, 18.7, "2014-06-15 16:15:56", 1] ,
    }
]
The queries execute with no apparent problem, but I now have an issue with the Hive query in BAM, which outputs entries but not their values. E.g., executing the following Hive query:
CREATE TABLE IF NOT EXISTS CurrentCostDataTemp ( sensor INT, temp FLOAT, ts TIMESTAMP, watt INT )
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
"cassandra.port" = "9160",
"cassandra.ks.name" = "EVENT_KS",
"cassandra.ks.username" = "admin",
"cassandra.ks.password" = "admin",
"cassandra.cf.name" = "currentcostRealtime2_stream",
"cassandra.columns.mapping" = "payload_sensor, payload_temp, payload_timestamp, payload_watt" );
select * from CurrentCostDataTemp;
.. but this gives only the following output, i.e. NO attribute-level data is shown. However, it is evident that there are EVENT_KS entries, given that it outputs 4 rows. So the question is: how do I reference the data to extract the values, or is there something else going on here that I am not aware of?
| key                                    | sensor | temp | ts | watt |
|----------------------------------------|--------|------|----|------|
| 1402816273765::192.168.1.106::9443::52 |        |      |    |      |
| 1402815283659::192.168.1.106::9443::51 |        |      |    |      |
| 1402815238323::192.168.1.106::9443::49 |        |      |    |      |
| 1402815280532::192.168.1.106::9443::50 |        |      |    |      |
I have verified that the data is in Cassandra by checking with cqlsh - see here:
cqlsh:EVENT_KS> select * from "currentcostRealtime_stream";
key | Description | Name | Nick_Name | StreamId | Timestamp | Version | meta_ipAdd | payload_sensor | payload_temp | payload_timestamp | payload_watt
----------------------------------------+-----------------------------------------+----------------------------+----------------------+----------------------------------+---------------+---------+------------+----------------+--------------+---------------------+--------------
1402815283659::192.168.1.106::9443::51 | This is the Currentcost realtime stream | currentcostRealtime.stream | Currentcost Realtime | currentcostRealtime.stream:1.0.5 | 1402815283659 | 1.0.5 | null | 1 | 18.7 | 2014-06-15 14:54:43 | 1
1402815238323::192.168.1.106::9443::49 | This is the Currentcost realtime stream | currentcostRealtime.stream | Currentcost Realtime | currentcostRealtime.stream:1.0.5 | 1402815238323 | 1.0.5 | null | 1 | 18.7 | 2014-06-15 14:53:58 | 1
1402815280532::192.168.1.106::9443::50 | This is the Currentcost realtime stream | currentcostRealtime.stream | Currentcost Realtime | currentcostRealtime.stream:1.0.5 | 1402815280532 | 1.0.5 | null | 1 | 18.7 | 2014-06-15 14:54:40 | 1
1402816273765::192.168.1.106::9443::52 | This is the Currentcost realtime stream | currentcostRealtime.stream | Currentcost Realtime | currentcostRealtime.stream:1.0.5 | 1402816273765 | 1.0.5 | null | 1 | 18.7 | 2014-06-15 15:11:13 | 1
(4 rows)
cqlsh:EVENT_KS>
Most likely this is only a minor issue that I have overlooked, but it would be great if someone who has seen this before could respond.
When adding a remote table definition to an external MySQL DB, the tables are all created, so the problem seems to be getting to the attribute data in the EVENT_KS table itself, and having that created and accessed through the Hive script.
Thanks in advance!
/Jorgen
[UPDATE - Thursday 19th - SOLVED] Got it working with a few hints from this question. The following code works fine now, which is great. Thanks to you guys for taking the time to respond.
drop table CurrentCostDataTemp10;
drop table CurrentCostDataTemp_Summary10;
CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataTemp10 ( messageRowID STRING, payload_sensor INT, messageTimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT )
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
"cassandra.port" = "9160",
"cassandra.ks.name" = "EVENT_KS",
"cassandra.ks.username" = "<USER>",
"cassandra.ks.password" = "<PASSWORD>",
"cassandra.cf.name" = "currentcostsimple5_stream",
"cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt" );
CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataTemp_Summary10 ( messageRowID STRING, payload_sensor INT, messageTimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT )
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
'mapred.jdbc.driver.class' = 'com.mysql.jdbc.Driver',
'mapred.jdbc.url' = 'jdbc:mysql://127.0.0.1:8889/currentcost' ,
'mapred.jdbc.username' = '<USER>',
'mapred.jdbc.password' = '<PASSWORD>',
'hive.jdbc.update.on.duplicate'= 'true',
'hive.jdbc.primary.key.fields' = 'messageRowID',
'hive.jdbc.table.create.query' = 'CREATE TABLE CurrentCostDataTemp1 ( messageRowID VARCHAR(100) NOT NULL PRIMARY KEY, payload_sensor TINYINT(4), messageTimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql DATETIME, payload_watt INT ) ');
insert overwrite table CurrentCostDataTemp_Summary10 select messageRowID, payload_sensor, messageTimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt FROM CurrentCostDataTemp10;
Using Different Reporting Frameworks with WSO2 Business Activity Monitor. By Sachini Jayasekara
I have amended your query as follows. Please try it.
CREATE external TABLE IF NOT EXISTS CurrentCostDataTemp ( key string, sensor INT, temp FLOAT, ts TIMESTAMP, watt INT )
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
"cassandra.port" = "9160",
"cassandra.ks.name" = "EVENT_KS",
"cassandra.ks.username" = "admin",
"cassandra.ks.password" = "admin",
"cassandra.cf.name" = "currentcostRealtime2_stream",
"cassandra.columns.mapping" = ":key,payload_sensor, payload_temp, payload_timestamp, payload_watt" );
select * from CurrentCostDataTemp;
Try changing the 1st line of the script as follows.
CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataTemp (key STRING, sensor INT, temp FLOAT, ts TIMESTAMP, watt INT)
(Remove key STRING part if it gives errors.)
Note: You may have to run DROP TABLE CurrentCostDataTemp before running the above, in case the table was already created by an earlier run.

Universal SQL query for different input elements

I have a SQL query problem in the following abstract sample context: there are two different inputs for my SQL query, defined as 'MainElement', with key 123 for the one and key 789 for the other main element.
Further, I have a table called Relation with columns pk, 1stElement, 2ndElement and 3rdElement.
Furthermore, there is a table called Props with columns pk, name and valueString. The special feature of this context is that the column name in Props defines two further columns of table Relation, called 4thElement and 5thElement, as rows, with their values in column valueString.
| pk  | 1stElement | 2ndElement | 3rdElement |
|-----|------------|------------|------------|
| abc | 123        | 456        | null       |
| def | 789        | 101112     | 131415     |

| pk  | Name       | ValueString |
|-----|------------|-------------|
| def | 4thElement | 161718      |
| def | 5thElement | ghi         |
As you can see, MainElement 789 has values for 4thElement and 5thElement in Props, but MainElement 123 doesn't have any values in Props.
What I need is a universal SQL query with input value 1stElement, e.g. 123 or 789, that returns a result for both main elements, independent of the fact that MainElement 123 doesn't have any value in Props.
Sample result:
| 1stElement | 2ndElement | 3rdElement | 4thElement | 5thElement |
|------------|------------|------------|------------|------------|
| 123        | 456        | null       | null       | null       |
| 789        | 101112     | 131415     | 161718     | ghi        |
I am using Oracle SQL Developer.
SELECT
    rel.1stElement,
    ....
FROM
    Relation rel,
    Props pro
WHERE
    ?
Thanks in advance.
This should do the job; what you need is a typical pivot query:
SELECT rel.pk, rel.1stElement, rel.2ndElement, rel.3rdElement
     -- pivot the Props rows into columns, one MAX(CASE ...) per property name
     , MAX(CASE WHEN pro.Name = '4thElement'
                THEN pro.ValueString
                ELSE NULL
           END) AS 4thElement
     , MAX(CASE WHEN pro.Name = '5thElement'
                THEN pro.ValueString
                ELSE NULL
           END) AS 5thElement
FROM Relation rel
-- LEFT OUTER JOIN keeps main elements that have no Props rows at all (e.g. 123)
LEFT OUTER JOIN Props pro
    ON rel.pk = pro.pk
GROUP BY rel.pk, rel.1stElement, rel.2ndElement, rel.3rdElement
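To restrict the output to a single input value, a filter such as WHERE rel.1stElement = 123 (or a bind variable for 123/789) can be added between the join and the GROUP BY; thanks to the LEFT OUTER JOIN, element 123 is still returned, with NULL in 4thElement and 5thElement.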