Kusto | calculate percentage grouped by 2 columns - kql

I have a result set that looks similar to the table below, which I extended with a Percentage column like so:
datatable (Code:string, App:string, Requests:long)
[
"200", "tra", 63,
"200", "api", 1036,
"302", "web", 12,
"200", "web", 219,
"500", "web", 2,
"404", "api", 18
]
| as T
| extend Percentage = round(100.0 * Requests / toscalar(T | summarize sum(Requests)), 2)
The problem is I really want the percentage to be calculated from the total Requests of each App rather than from the grand total.
For example, for App "api" where Code is "200", instead of 76.74% of the grand total, I want to express it as a percentage of just the Requests for App "api", which would be 98.29%.
I haven't really tried anything that would be considered valid syntax. Any help much appreciated.

You can use the join or lookup operators:
datatable (Code:string, App:string, Requests:long)
[
"200", "tra", 63,
"200", "api", 1036,
"302", "web", 12,
"200", "web", 219,
"500", "web", 2,
"404", "api", 18
]
| as T
| lookup ( T | summarize sum(Requests) by App ) on App
| extend Percentage = round(100.0 * Requests / sum_Requests, 2)
| project Code, App, Requests, Percentage
Code  App  Requests  Percentage
200   api  1036      98.29
404   api  18        1.71
200   tra  63        100
302   web  12        5.15
200   web  219       93.99
500   web  2         0.86
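Since the answer mentions both operators, here is a hedged sketch of the equivalent join variant (TotalPerApp is just an alias introduced for this sketch); lookup is usually the cleaner choice here because it doesn't duplicate the App key column:
datatable (Code:string, App:string, Requests:long)
[
"200", "tra", 63,
"200", "api", 1036,
"302", "web", 12,
"200", "web", 219,
"500", "web", 2,
"404", "api", 18
]
| as T
// join each row to its per-App total, then compute the share within that App
| join kind=inner ( T | summarize TotalPerApp = sum(Requests) by App ) on App
| extend Percentage = round(100.0 * Requests / TotalPerApp, 2)
| project Code, App, Requests, Percentage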

Related

Azure Data Factory: ErrorCode=TypeConversionFailure,Exception occurred when converting value : ErrorCode: 2200

Can someone let me know why Azure Data Factory is trying to convert a value from type String to type Double?
I am getting the error:
{
    "errorCode": "2200",
    "message": "ErrorCode=TypeConversionFailure,Exception occurred when converting value '+44 07878 44444' for column name 'telephone2' from type 'String' (precision:255, scale:255) to type 'Double' (precision:15, scale:255). Additional info: Input string was not in a correct format.",
    "failureType": "UserError",
    "target": "Copy Table to EnrDB",
    "details": [
        {
            "errorCode": 0,
            "message": "'Type=System.FormatException,Message=Input string was not in a correct format.,Source=mscorlib,'",
            "details": []
        }
    ]
}
My Sink looks like the following:
I don't have any mapping set.
The column setting for the field 'telephone2' is as follows:
I changed the 'table option' to none; however, I got the following error:
{
    "errorCode": "2200",
    "message": "Failure happened on 'Source' side. ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Internal system error occurred.\r\nStatement ID: {C2C38377-5A14-4BB7-9298-28C3C351A40E} | Query hash: 0x2C885D2041993FFA | Distributed request ID: {6556701C-BA76-4D0F-8976-52695BBFE6A7}. Total size of data scanned is 134 megabytes, total size of data moved is 102 megabytes, total size of data written is 0 megabytes.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Internal system error occurred.\r\nStatement ID: {C2C38377-5A14-4BB7-9298-28C3C351A40E} | Query hash: 0x2C885D2041993FFA | Distributed request ID: {6556701C-BA76-4D0F-8976-52695BBFE6A7}. Total size of data scanned is 134 megabytes, total size of data moved is 102 megabytes, total size of data written is 0 megabytes.,Source=.Net SqlClient Data Provider,SqlErrorNumber=75000,Class=17,ErrorCode=-2146232060,State=1,Errors=[{Class=17,Number=75000,State=1,Message=Internal system error occurred.,},{Class=0,Number=15885,State=1,Message=Statement ID: {C2C38377-5A14-4BB7-9298-28C3C351A40E} | Query hash: 0x2C885D2041993FFA | Distributed request ID: {6556701C-BA76-4D0F-8976-52695BBFE6A7}. Total size of data scanned is 134 megabytes, total size of data moved is 102 megabytes, total size of data written is 0 megabytes.,},],'",
    "failureType": "UserError",
    "target": "Copy Table to EnrDB",
    "details": []
}
Any more thoughts?
The issue was resolved by changing the column DataType on the database to match the DataType recorded in Azure Data Factory, i.e. StringType.
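For reference, the fix amounts to a change like the following on the sink database (a hedged T-SQL sketch; the table name dbo.Contacts and the nvarchar length are hypothetical, only the column telephone2 comes from the error message):
-- Hypothetical table name; retype the sink column as text so values like
-- '+44 07878 44444' no longer need a String-to-Double conversion during the copy.
ALTER TABLE dbo.Contacts ALTER COLUMN telephone2 NVARCHAR(255) NULL;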

Find and compare formatting standards

I'm looking for a web resource where you can preview what code would look like in different formatting standards so I can choose and possibly set it up in Prettier.
I recently came across this code and thought the formatting looked amazing.
{name: 'year', size: 365 * 24 * 60 * 60 * 1},
{name: 'day', size: 24 * 60 * 60 * 1},
{name: 'hour', size: 60 * 60 * 1},
{name: 'minute', size: 60 * 1},
{name: 'second', size: 1}
I think it was formatted by hand, but I've seen people do this kind of alignment when assigning values to variables with names of different lengths, like:
const bob   = "bob";
const alice = "alice";
If anyone could share some insight into the name of that style, I would greatly appreciate it.
And if you have any recommendations on which code formatting standard to generally follow when doing web dev, feel free to share.

BigQuery - Extract all array elements inside the array Dynamically

I have already posted this question, but now I'm trying to achieve this with BigQuery.
I have JSON data like below.
{
    "userid": null,
    "appnumber": "9",
    "trailid": "1547383536",
    "visit": [{
            "visitNumber": "1",
            "time": "0",
            "hour": "18",
            "minute": "15"
        },
        {
            "visitNumber": "2",
            "time": "2942",
            "hour": "18",
            "minute": "15"
        }
    ]
}
I want to extract the visit array values dynamically.
Like below (pipe-delimited columns):
userid,appnumber| trailid |
visit.visitnumber | visit.time | visit.hour | visit.minute |
visit.visitnumber | visit.time | visit.hour | visit.minute
As you can see, I have 2 JSON elements inside the visit array, so I want to extract visitNumber, time, hour, and minute dynamically. Sometimes I may have 3 or 5 values inside the array, and it should extract all 3 or 5 JSON objects automatically (I mean dynamically).
There is a common way to extract this, like JsonExtractScalar(JsonExtract(visit,'$.[0].visitnumber')) (this syntax may be wrong, but we would use something similar). Here I'm manually using [0] to extract the first element in the array.
If it has 10+ elements then I would have to write [0]...[1]...[2]...[10]. That is what I want to avoid: without listing every element, the query should dynamically pick up all 10 elements and extract them.
Could someone help me with the extract queries?
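A minimal sketch of one way to do this in BigQuery standard SQL (not from the original post; the column name payload and the inlined sample document are assumptions for illustration): JSON_EXTRACT_ARRAY splits the visit array and UNNEST produces one row per element, so the same query works no matter how many visits the array contains.
WITH src AS (
  -- In practice this would be your table's JSON string column (payload is an assumed name).
  SELECT '{"userid":null,"appnumber":"9","trailid":"1547383536","visit":[{"visitNumber":"1","time":"0","hour":"18","minute":"15"},{"visitNumber":"2","time":"2942","hour":"18","minute":"15"}]}' AS payload
)
SELECT
  JSON_EXTRACT_SCALAR(payload, '$.userid')    AS userid,
  JSON_EXTRACT_SCALAR(payload, '$.appnumber') AS appnumber,
  JSON_EXTRACT_SCALAR(payload, '$.trailid')   AS trailid,
  JSON_EXTRACT_SCALAR(v, '$.visitNumber')     AS visit_visitnumber,
  JSON_EXTRACT_SCALAR(v, '$.time')            AS visit_time,
  JSON_EXTRACT_SCALAR(v, '$.hour')            AS visit_hour,
  JSON_EXTRACT_SCALAR(v, '$.minute')          AS visit_minute
FROM src,
UNNEST(JSON_EXTRACT_ARRAY(payload, '$.visit')) AS v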

Column names when exporting ORC files from hive server 2 using beeline

I am facing a problem where exporting results from Hive Server 2 to ORC files shows some kind of default column names (e.g. _col0, _col1, _col2) instead of the original ones created in Hive. We are using pretty much default components from HDP-2.6.3.0.
I am also wondering if the below issue is related:
https://issues.apache.org/jira/browse/HIVE-4243
Below are the steps we are taking:
Connecting:
export SPARK_HOME=/usr/hdp/current/spark2-client
beeline
!connect jdbc:hive2://HOST1:2181,HOST2:2181,HOST2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Creating test table and inserting sample values:
create table test(str string);
insert into test values ('1');
insert into test values ('2');
insert into test values ('3');
Running test query:
select * from test;
+-----------+--+
| test.str  |
+-----------+--+
| 1         |
| 2         |
| 3         |
+-----------+--+
Exporting as ORC:
insert overwrite directory 'hdfs://HOST1:8020/tmp/test' stored as orc select * from test;
Getting the results:
hdfs dfs -get /tmp/test/000000_0 test.orc
Checking the results:
java -jar orc-tools-1.4.1-uber.jar data test.orc
Processing data file test.orc [length: 228]
{"_col0":"1"}
{"_col0":"2"}
{"_col0":"3"}
java -jar orc-tools-1.4.1-uber.jar meta test.orc
Processing data file test.orc [length: 228]
Structure for test.orc
File Version: 0.12 with HIVE_13083
Rows: 2
Compression: SNAPPY
Compression size: 262144
Type: struct<_col0:string>
Stripe Statistics:
  Stripe 1:
    Column 0: count: 2 hasNull: false
    Column 1: count: 2 hasNull: false min: 1 max: 3 sum: 2
File Statistics:
  Column 0: count: 2 hasNull: false
  Column 1: count: 2 hasNull: false min: 1 max: 3 sum: 2
Stripes:
  Stripe: offset: 3 data: 11 rows: 2 tail: 60 index: 39
    Stream: column 0 section ROW_INDEX start: 3 length 11
    Stream: column 1 section ROW_INDEX start: 14 length 28
    Stream: column 1 section DATA start: 42 length 5
    Stream: column 1 section LENGTH start: 47 length 6
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
File length: 228 bytes
Padding length: 0 bytes
Padding ratio: 0%
Looking at the results I can see _col0 as the column name while expecting the original str.
Any ideas on what I am missing?
Update
I noticed that the connection from beeline was going to hive 1.x, and not 2.x as wanted. I changed the connection to the Hive Server 2 Interactive URL:
Connected to: Apache Hive (version 2.1.0.2.6.3.0-235)
Driver: Hive JDBC (version 1.21.2.2.6.3.0-235)
Transaction isolation: TRANSACTION_REPEATABLE_READ
And tried again with the same sample. It even prints out the schema correctly:
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:test.str, type:string, comment:null)], properties:null)
But still no luck in getting it to the ORC file.
Solution
You need to enable Hive LLAP (Interactive SQL) in Ambari, then change the connection string you are using. For example, my connection became jdbc:hive2://.../;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2
Note the additional "-hive2" at the end of the URL. Here is a tutorial video from Hortonworks.
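Putting it together, the full beeline connect command from the question would look something like this (the ZooKeeper hosts are placeholders for your own quorum; the only real change is the -hive2 suffix on the namespace):
!connect jdbc:hive2://ZK_HOST1:2181,ZK_HOST2:2181,ZK_HOST3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-hive2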
"Proof"
After connecting to the updated Hive endpoint, I ran
create table t_orc(customer string, age int) stored as orc;
insert into t_orc values('bob', 12),('kate', 15);
Then
~$ hdfs dfs -copyToLocal /apps/hive/warehouse/t_orc/000000_0 ~/tmp/orc/hive2.orc
~$ orc-metadata tmp/orc/hive2.orc
{ "name": "tmp/orc/hive2.orc",
"type": "struct<customer:string,age:int>",
"rows": 2,
"stripe count": 1,
"format": "0.12", "writer version": "HIVE-13083",
"compression": "zlib", "compression block": 262144,
"file length": 305,
"content": 139, "stripe stats": 46, "footer": 96, "postscript": 23,
"row index stride": 10000,
"user metadata": {
},
"stripes": [
{ "stripe": 0, "rows": 2,
"offset": 3, "length": 136,
"index": 67, "data": 23, "footer": 46
}
]
}
Where orc-metadata is a tool distributed by the ORC repo on GitHub.
You have to set this in a Hive script or the Hive shell; otherwise put it in a .hiverc file in your home directory or in any of the other Hive user properties files.
set hive.cli.print.header=true;

How to store sorted set of objects in redis?

I would like to know how to store a list of objects in Redis. That is, I have a key like this.
users:pro
{
name: "Bruce", age: "20", score: 100,
name: "Ed", age: "22", score: 80
}
I want to store a list of hashes as the value of a particular key, and I would like to use the score field of each hash as the score in a sorted set. How could I accomplish this?
I have seen examples of putting a single hash at a key, but what if I want multiple hashes, and one of the hash fields must act as the score field for the sorted set?
Using a single key to store all your hashes will require some serialization, as Redis doesn't support nested data structures. The result would be the following:
key: users:pro
|
+-----> field        value
        name:Bruce   "age: 20, score: 100"
        name:Ed      "age: 22, score: 80"
> HMSET users:pro name:Bruce "age: 20, score: 100" name:Ed "age:22, score:80"
The corresponding Sorted Set would be:
key: users:pro.by_scores
|
+---> scores:  80          100
+---> values:  "name:Ed"   "name:Bruce"
> ZADD users:pro.by_scores 80 "name:Ed" 100 "name:Bruce"
Note 1: this approach mandates a unique ID per user; currently the name property is used, which could be problematic.
Note 2: to avoid the serialization (and deserialization), you can consider using a dedicated key per user. That means doing:
key: users:pro:Bruce
|
+-----> field   value
        age     20
        score   100
key: users:pro:Ed
|
+-----> field   value
        age     22
        score   80
> HMSET users:pro:Bruce age 20 score 100
> HMSET users:pro:Ed age 22 score 80
key: users:pro.by_scores
|
+---> scores:  80               100
+---> values:  "users:pro:Ed"   "users:pro:Bruce"
> ZADD users:pro.by_scores 80 "users:pro:Ed" 100 "users:pro:Bruce"
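A hedged sketch of reading the data back with this layout: ZRANGEBYSCORE returns the per-user keys whose scores fall in a range (the 90 threshold below is just an example value), and each returned key can then be passed to HGETALL to fetch that user's hash; ZREVRANGE with WITHSCORES lists everyone from highest to lowest score.
> ZRANGEBYSCORE users:pro.by_scores 90 +inf
> HGETALL users:pro:Bruce
> ZREVRANGE users:pro.by_scores 0 -1 WITHSCORES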