KUSTO QUERY LANGUAGE (KQL) - Cannot unpack the dictionary - kql

I have a table into which I uploaded a file with a column converted to JSON, but when I try to unpack it using KQL, it does not work.
Id | Query Name | Workitem Id | Logged Date | Details
0 | Bug Stats | 111 | 2022-06-08T02:26:43.111196Z | {'AssignedTo': 'me', 'ClosedDate': None, 'CreatedDate': '2022-03-08T19:28:15.673Z', 'StartDate': None, 'State': 'For Review', 'Tags': 'tags', 'Title': 'Title', 'WorkItemType': 'Bug'}
Returns nothing:
Datatable
| extend Details = parse_json(Details) // I tried todynamic() as well, but same response
| evaluate bag_unpack(Details)
Returns an error:
Datatable
| evaluate bag_unpack(Details)
ERROR:
Semantic error: evaluate bag_unpack(): the following error(s) occurred while evaluating the output schema: evaluate bag_unpack(): argument #1 expected to be a reference to a dynamic column. Query: 'DevopsQueriesTest |evaluate bag_unpack(Details) '

You do need to first invoke parse_json() on Details, to create a dynamic value out of the string value.
See: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/parsejsonfunction
However, the input string you have isn't a valid JSON payload - it uses single quotes instead of double quotes.
It's best to fix the component that generates the data, and ingest valid payloads instead of invalid ones.
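For example, if the component producing the payload happens to be Python code that stringifies a dict with str(), switching to json.dumps() is enough to make the payload valid JSON (a minimal sketch; the producer being Python is only an assumption):
import json

details = {
    "AssignedTo": "me",
    "ClosedDate": None,
    "CreatedDate": "2022-03-08T19:28:15.673Z",
    "WorkItemType": "Bug",
}

# str()/repr() yields single quotes and None -> not valid JSON, so parse_json()
# cannot turn it into a property bag
print(str(details))

# json.dumps() yields double quotes and null -> valid JSON that parse_json()
# and bag_unpack() handle as expected
print(json.dumps(details))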
If, for whatever reason, you can't/won't fix the producer - and you prefer paying a performance hit as part of your queries - you can use the translate() function, for example.
See: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/translatefunction
Moreover, the None in your payload is invalid too - it needs to be wrapped in double quotes for the JSON payload to become valid.
You can use the replace_string() function to fix that at query runtime.
See: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/replace-string-function
print s = "{'AssignedTo': 'me', 'ClosedDate': None, 'CreatedDate': '2022-03-08T19:28:15.673Z', 'StartDate': None, 'State': 'For Review', 'Tags': 'tags', 'Title': 'Title', 'WorkItemType': 'Bug'}"
| project s = parse_json(replace_string(translate("'", '"', s), "None", '"None"'))
| evaluate bag_unpack(s)
AssignedTo | ClosedDate | CreatedDate | StartDate | State | Tags | Title | WorkItemType
me | None | 2022-03-08 19:28:15.6730000 | None | For Review | tags | Title | Bug

Related

How do you convert text column data with Ruby JSON format ("key" => "value") to standard JSON?

I have data in the comment column of the payments table. The data is stored as plain text in the following format:
{"foo"=>"bar"}
I need to query the value of the specific "foo" key and tried the following:
select comment::json -> 'foo' from payments
but because the data stored is not in JSON format I get the following error:
invalid input syntax for type json DETAIL: Token "=" is invalid. CONTEXT: JSON data, line 1: {"foo"=>"bar"}
which refers to the => that Ruby uses for Hashes.
Is there a way to convert the text data to JSON data on-the-fly so I can then access the specific keys I need?
You can replace the => with a : to make that example a valid JSON value:
replace(comment, '=>', ':')::jsonb ->> 'foo'
It sounds like the data is technically valid Ruby, which means we can do something a bit clever.
require 'json'

def parse_data(data_string)
  eval(data_string).to_json
end
This should do the trick, so long as the data is trusted.

How to extract data from array in a JSON message using CloudWatch Logs Insights?

I log messages that are JSON objects. The JSON has an array that contains key/value pairs:
{
...
"arr": [{"key": "foo", "value": "bar"}, ...],
...
}
Now I want to filter results that contain a specific key and extract the values for that key in the array.
I've tried using regex, something like parse #message /.*"key":"my_specific_key","value":(?<value>.*}).*/ which extracts the value but also returns the rest of the message. Also it doesn't filter the results.
How can I filter results and extract the values for a specific key?
If the entries in your CloudWatch log group are actually showing up as JSON, you can just reference the key directly in any place you would use a field.
(You don't need the #; CloudWatch appends that automatically to all default fields.)
If you are using Python, you can use aws_lambda_powertools to do this as well, in a very slick way (and it's an actual AWS product).
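A minimal sketch of that Powertools approach, assuming a standard Lambda handler (the service name, key names, and values are illustrative):
from aws_lambda_powertools import Logger

logger = Logger(service="payments")  # every log line is emitted as structured JSON

@logger.inject_lambda_context
def handler(event, context):
    # Because the output is JSON, Logs Insights discovers fields such as arr.0.key
    # automatically, so you can filter and parse on them directly.
    logger.info("order processed", extra={"arr": [{"key": "foo", "value": "bar"}]})
    return {"statusCode": 200}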
If they are showing up in your log as a string, then it may be an escaped string and you'll have to match it exactly - including spaces and whatnot. When you parse, you will want to do something like this:
If this is the string of your log message: '{"AKey" : "AValue", "Key2" : "Value2"}'
parse #message "{\"*\" : \"*\", \"*\" : \"*\"}" as akey, akey_value, key2, key2_value
Then you can filter, count, or do anything else against those variables. parse is specifically a statement to match a pattern and assign each wildcard to a variable, one at a time, in order.
Though with a complex JSON, if your above regex works, then all you need is a filter statement:
fields #message
| parse #message ... your regex ... as value_var
| filter value_var like /some more regex/
If it's not a string in the log entry but an actual JSON object, you can just reference the key:
filter a_key like "some value" (or a regex here)
See https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData-discoverable-fields.html for more info.
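If you want to run such a Logs Insights query programmatically, here is a minimal boto3 sketch (the log group name, time window, and regex are placeholders, not values from the original question):
import time
import boto3

logs = boto3.client("logs")

# Filter entries whose "arr" contains the key we care about, then pull the matching value.
query = r"""
fields @timestamp, @message
| filter @message like /"key":\s*"my_specific_key"/
| parse @message /"key":\s*"my_specific_key",\s*"value":\s*"(?<value>[^"]*)"/
| display @timestamp, value
"""

resp = logs.start_query(
    logGroupName="/aws/lambda/my-function",  # placeholder log group
    startTime=int(time.time()) - 3600,       # last hour
    endTime=int(time.time()),
    queryString=query,
)

# Poll until the query finishes, then print the extracted values.
while True:
    results = logs.get_query_results(queryId=resp["queryId"])
    if results["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in results.get("results", []):
    print({f["field"]: f["value"] for f in row})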

GCP BigQuery throwing Invalid URL while trying get_status of the schema even though the schema exists

The schema definitely exists, and I can copy and run it in BigQuery.
In requirements.txt we are using the latest version of the package (google-cloud-bigquery).
exception Invalid URL 'None/bigquery/v2/projects/XXXXXX/XXXXX
No schema supplied. Perhaps you meant http://None/bigquery/v2/projects/XXXX/XXX
Code snippet:
client = bigquery.Client(project="XXXXXXXX")
client_bq.get_table(table_ref)
It's not clear to me what exactly you are passing in that table_ref variable, so I'm going to explain what that method is expecting; I hope that helps you get it right.
Note: be careful - your client object is created as client but used as client_bq; that won't run.
The method client.get_table(table_ref) (docs) expects a table_ref that, according to the documentation, is a pointer to a table. If table_ref is a string, it must include a project ID, dataset ID, and table ID, each separated by a . (dot). For example, from a public BigQuery dataset:
table_ref = "bigquery-public-data.noaa_historic_severe_storms.storms_2019"
Given this table_ref the following code should work:
client = bigquery.Client(project="my_project_id")
client.get_table(table_ref)
On the other hand, if you want to create table_ref as a Table object, notice that you don't need to provide a schema; this parameter is optional (docs).
table_ref = bigquery.table.Table("bigquery-public-data.noaa_historic_severe_storms.storms_2019")
If your goal is to create a valid schema, the docs state that it is built with SchemaField objects; the documentation is the best reference at this point. If you want a sample, you can get one by saving the Table object and reading its schema attribute as follows:
>>> table = client.get_table(table_ref)
>>> table.schema
(running the code in a Python console)
Sample Output:
[SchemaField('magnitude', 'FLOAT', 'NULLABLE', 'Description', ()),
SchemaField('magnitude_type', 'STRING', 'NULLABLE', 'Description', ()),
SchemaField('flood_cause', 'STRING', 'NULLABLE', 'Description', ())]
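If you would rather define a schema by hand than copy it from an existing table, here is a minimal sketch using SchemaField (the project, dataset, table, and field names are illustrative):
from google.cloud import bigquery

client = bigquery.Client(project="my_project_id")  # placeholder project

# A schema is just a list of SchemaField objects.
schema = [
    bigquery.SchemaField("magnitude", "FLOAT", mode="NULLABLE"),
    bigquery.SchemaField("magnitude_type", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("flood_cause", "STRING", mode="NULLABLE"),
]

table = bigquery.Table("my_project_id.my_dataset.my_table", schema=schema)
# table = client.create_table(table)  # uncomment to actually create the table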

pandas to_gbq claims a schema mismatch while the schemas are exactly the same; on GitHub all such issues are claimed to have been solved in 2017

I am trying to append a table to a different table through pandas, pulling the data from BigQuery and sending it to a different BigQuery dataset. While the table schema is exactly the same, I get the error:
pandas_gbq.gbq.InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.
This error occurred earlier when I went for table overwrites, but in this case the datasets are too large to do that (and that is not a sustainable solution anyway).
df = pd.read_gbq(query, project_id="my-project", credentials=bigquery_key,
                 dialect='standard')
pd.io.gbq.to_gbq(df, dataset, projectid,
                 if_exists='append',
                 table_schema=[{'name': 'Date', 'type': 'STRING'},
                               {'name': 'profileId', 'type': 'STRING'},
                               {'name': 'Opco', 'type': 'STRING'},
                               {'name': 'country', 'type': 'STRING'},
                               {'name': 'deviceType', 'type': 'STRING'},
                               {'name': 'userType', 'type': 'STRING'},
                               {'name': 'users', 'type': 'INTEGER'},
                               {'name': 'sessions', 'type': 'INTEGER'},
                               {'name': 'bounceRate', 'type': 'FLOAT'},
                               {'name': 'sessionsPerUser', 'type': 'FLOAT'},
                               {'name': 'avgSessionDuration', 'type': 'FLOAT'},
                               {'name': 'pageviewsPerSession', 'type': 'FLOAT'}],
                 credentials=bigquery_key)
The schema in BigQuery is as follows:
Date STRING
profileId STRING
Opco STRING
country STRING
deviceType STRING
userType STRING
users INTEGER
sessions INTEGER
bounceRate FLOAT
sessionsPerUser FLOAT
avgSessionDuration FLOAT
pageviewsPerSession FLOAT
I then get the following error:
Traceback (most recent call last): File "..file.py", line 63, in
<module>
main()
File "..file.py", line 57, in main
updating_general_data(bigquery_key)
File "..file.py", line 46, in updating_general_data
credentials=bigquery_key)
File
"..\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\gbq.py",
line 162, in to_gbq
credentials=credentials, verbose=verbose, private_key=private_key)
File
"..\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas_gbq\gbq.py",
line 1141, in to_gbq
"Please verify that the structure and " pandas_gbq.gbq.InvalidSchema: Please verify that the structure and
data types in the DataFrame match the schema of the destination table.
To me it seems that there is a one-to-one match. I've seen other threads about this, and they mainly talk about date formats, even though the date is already a string in this case and is declared as a STRING in the table_schema as well.
The ultimate solution to this: instead of manually specifying the schema, which is always prone to type-casting/naming errors, it's best to get the schema from the table itself. So, create a client using the latest version of the API:
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file(
    'credentials.json')
project_id = 'your_project_id'
client = bigquery.Client(credentials=credentials, project=project_id)
Get the table you want to write/append to:
table = client.get_table('your_dataset.your_table')
table
Generate schema from the table:
generated_schema = [{'name':i.name, 'type':i.field_type} for i in table.schema]
generated_schema
Rename your DataFrame columns accordingly:
data.columns = [i.name for i in table.schema]
Pass the same schema while pushing it to BigQuery:
data.to_gbq(project_id = 'your_project_id',
destination_table = 'your_dataset.your_table',
credentials = service_account.Credentials.from_service_account_file(
'credentials.json'),
table_schema = generated_schema,
progress_bar = True,
if_exists = 'replace')
I had real trouble with this, and fixed it by getting pandas-gbq to create the table rather than creating it in the UI and trying to match the schema.
I had a column in my dataframe called "No." Deleting the period got rid of the issue and the schema was inferred.
Most likely, the problem arises because the column names in the DataFrame and the schema do not match.
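If you suspect such a mismatch, a quick diagnostic is to diff the DataFrame columns against the generated schema before calling to_gbq (a small sketch reusing the data and generated_schema variables from the answer above):
# Column names present on one side but not the other are the usual culprit.
schema_names = {field['name'] for field in generated_schema}
frame_names = {str(col) for col in data.columns}

print("In DataFrame but not in table:", frame_names - schema_names)
print("In table but not in DataFrame:", schema_names - frame_names)

# Dtypes are the other usual suspect; print them side by side for comparison.
print(data.dtypes)
print([(field['name'], field['type']) for field in generated_schema])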

BigQuery UDF Internal Error

We had a simple UDF in BigQuery that somehow throws an error, and the query keeps returning:
Query Failed
Error: An internal error occurred and the request could not be completed.
The query was simply trying to use UDF to perform a SHA256.
SELECT
input AS title,
input_sha256 AS title_sha256
FROM
SHA256(
SELECT
title AS input
FROM
[bigquery-public-data:hacker_news.stories]
GROUP BY
input
)
LIMIT
1000
The in-line UDF is pasted below. However, I cannot post the full UDF as Stack Overflow complains about too much code in the post. The full UDF can be seen in this gist.
function sha256(row, emit) {
emit(
{
input: row.input,
input_sha256: CryptoJS.SHA256(row.input).toString(CryptoJS.enc.Hex)
}
);
}
bigquery.defineFunction(
'SHA256', // Name of the function exported to SQL
['input'], // Names of input columns
[
{'name': 'input', 'type': 'string'},
{'name': 'input_sha256', 'type': 'string'}
],
sha256 // Reference to JavaScript UDF
);
Not sure if it helps, but the Job-ID is
bigquery:bquijob_7fd3b51c_153c058dc7c
Looks like there is a similar issue at:
https://code.google.com/p/google-bigquery/issues/detail?id=478
Short answer - this is an issue related to memory allocation that I uncovered via my own testing and fixed today, but it will take a little while to flow out to production.
Slightly longer answer - we just rolled out a fix today for an issue where users were hitting "out of memory" errors when scaling up their UDFs over larger numbers of rows, even though the UDF would succeed on smaller numbers of rows. The queries that were hitting that condition are now running fine on our internal / test trees. However, since public BigQuery hosts have much higher traffic loads, the JavaScript engine that executes the UDFs (V8) behaves somewhat differently in production than it does in internal trees. Specifically, there's a new memory allocation error that some of the previously OOMing jobs are now hitting that we couldn't observe until the queries ran on a fully-loaded tree.
It's a minor error with a quick fix, but we'd ideally let it flow through our regular testing and QA cycle. This should put the fix in production in about a week, assuming nothing else goes wrong with the candidate. Would that be acceptable for you?
I am re-using the answer box to provide the full query text. It works if you uncomment the LIMIT 40.
SELECT input, input_sha256 FROM JS(
(
SELECT title AS input
FROM [bigquery-public-data:hacker_news.stories]
GROUP BY input
//LIMIT 40
),
input,
"[ {'name': 'input', 'type': 'string'}, {'name': 'input_sha256', 'type': 'string'} ] ",
"function(row, emit) {
var CryptoJS=CryptoJS||function(h,s){var f={},g=f.lib={},q=function(){},m=g.Base={extend:function(a){q.prototype=this;var c=new q;a&&c.mixIn(a);c.hasOwnProperty('init')||(c.init=function(){c.$super.init.apply(this,arguments)});c.init.prototype=c;c.$super=this;return c},create:function(){var a=this.extend();a.init.apply(a,arguments);return a},init:function(){},mixIn:function(a){for(var c in a)a.hasOwnProperty(c)&&(this[c]=a[c]);a.hasOwnProperty('toString')&&(this.toString=a.toString)},clone:function(){return this.init.prototype.extend(this)}}, r=g.WordArray=m.extend({init:function(a,c){a=this.words=a||[];this.sigBytes=c!=s?c:4*a.length},toString:function(a){return(a||k).stringify(this)},concat:function(a){var c=this.words,d=a.words,b=this.sigBytes;a=a.sigBytes;this.clamp();if(b%4)for(var e=0;e<a;e++)c[b+e>>>2]|=(d[e>>>2]>>>24-8*(e%4)&255)<<24-8*((b+e)%4);else if(65535<d.length)for(e=0;e<a;e+=4)c[b+e>>>2]=d[e>>>2];else c.push.apply(c,d);this.sigBytes+=a;return this},clamp:function(){var a=this.words,c=this.sigBytes;a[c>>>2]&=4294967295<< 32-8*(c%4);a.length=h.ceil(c/4)},clone:function(){var a=m.clone.call(this);a.words=this.words.slice(0);return a},random:function(a){for(var c=[],d=0;d<a;d+=4)c.push(4294967296*h.random()|0);return new r.init(c,a)}}),l=f.enc={},k=l.Hex={stringify:function(a){var c=a.words;a=a.sigBytes;for(var d=[],b=0;b<a;b++){var e=c[b>>>2]>>>24-8*(b%4)&255;d.push((e>>>4).toString(16));d.push((e&15).toString(16))}return d.join('')},parse:function(a){for(var c=a.length,d=[],b=0;b<c;b+=2)d[b>>>3]|=parseInt(a.substr(b, 2),16)<<24-4*(b%8);return new r.init(d,c/2)}},n=l.Latin1={stringify:function(a){var c=a.words;a=a.sigBytes;for(var d=[],b=0;b<a;b++)d.push(String.fromCharCode(c[b>>>2]>>>24-8*(b%4)&255));return d.join('')},parse:function(a){for(var c=a.length,d=[],b=0;b<c;b++)d[b>>>2]|=(a.charCodeAt(b)&255)<<24-8*(b%4);return new r.init(d,c)}},j=l.Utf8={stringify:function(a){try{return decodeURIComponent(escape(n.stringify(a)))}catch(c){throw Error('Malformed UTF-8 data');}},parse:function(a){return n.parse(unescape(encodeURIComponent(a)))}}, u=g.BufferedBlockAlgorithm=m.extend({reset:function(){this._data=new r.init;this._nDataBytes=0},_append:function(a){'string'==typeof a&&(a=j.parse(a));this._data.concat(a);this._nDataBytes+=a.sigBytes},_process:function(a){var c=this._data,d=c.words,b=c.sigBytes,e=this.blockSize,f=b/(4*e),f=a?h.ceil(f):h.max((f|0)-this._minBufferSize,0);a=f*e;b=h.min(4*a,b);if(a){for(var g=0;g<a;g+=e)this._doProcessBlock(d,g);g=d.splice(0,a);c.sigBytes-=b}return new r.init(g,b)},clone:function(){var a=m.clone.call(this); a._data=this._data.clone();return a},_minBufferSize:0});g.Hasher=u.extend({cfg:m.extend(),init:function(a){this.cfg=this.cfg.extend(a);this.reset()},reset:function(){u.reset.call(this);this._doReset()},update:function(a){this._append(a);this._process();return this},finalize:function(a){a&&this._append(a);return this._doFinalize()},blockSize:16,_createHelper:function(a){return function(c,d){return(new a.init(d)).finalize(c)}},_createHmacHelper:function(a){return function(c,d){return(new t.HMAC.init(a, d)).finalize(c)}}});var t=f.algo={};return f}(Math);
(function(h){for(var s=CryptoJS,f=s.lib,g=f.WordArray,q=f.Hasher,f=s.algo,m=[],r=[],l=function(a){return 4294967296*(a-(a|0))|0},k=2,n=0;64>n;){var j;a:{j=k;for(var u=h.sqrt(j),t=2;t<=u;t++)if(!(j%t)){j=!1;break a}j=!0}j&&(8>n&&(m[n]=l(h.pow(k,0.5))),r[n]=l(h.pow(k,1/3)),n++);k++}var a=[],f=f.SHA256=q.extend({_doReset:function(){this._hash=new g.init(m.slice(0))},_doProcessBlock:function(c,d){for(var b=this._hash.words,e=b[0],f=b[1],g=b[2],j=b[3],h=b[4],m=b[5],n=b[6],q=b[7],p=0;64>p;p++){if(16>p)a[p]= c[d+p]|0;else{var k=a[p-15],l=a[p-2];a[p]=((k<<25|k>>>7)^(k<<14|k>>>18)^k>>>3)+a[p-7]+((l<<15|l>>>17)^(l<<13|l>>>19)^l>>>10)+a[p-16]}k=q+((h<<26|h>>>6)^(h<<21|h>>>11)^(h<<7|h>>>25))+(h&m^~h&n)+r[p]+a[p];l=((e<<30|e>>>2)^(e<<19|e>>>13)^(e<<10|e>>>22))+(e&f^e&g^f&g);q=n;n=m;m=h;h=j+k|0;j=g;g=f;f=e;e=k+l|0}b[0]=b[0]+e|0;b[1]=b[1]+f|0;b[2]=b[2]+g|0;b[3]=b[3]+j|0;b[4]=b[4]+h|0;b[5]=b[5]+m|0;b[6]=b[6]+n|0;b[7]=b[7]+q|0},_doFinalize:function(){var a=this._data,d=a.words,b=8*this._nDataBytes,e=8*a.sigBytes; d[e>>>5]|=128<<24-e%32;d[(e+64>>>9<<4)+14]=h.floor(b/4294967296);d[(e+64>>>9<<4)+15]=b;a.sigBytes=4*d.length;this._process();return this._hash},clone:function(){var a=q.clone.call(this);a._hash=this._hash.clone();return a}});s.SHA256=q._createHelper(f);s.HmacSHA256=q._createHmacHelper(f)})(Math);
(function(){var h=CryptoJS,j=h.lib.WordArray;h.enc.Base64={stringify:function(b){var e=b.words,f=b.sigBytes,c=this._map;b.clamp();b=[];for(var a=0;a<f;a+=3)for(var d=(e[a>>>2]>>>24-8*(a%4)&255)<<16|(e[a+1>>>2]>>>24-8*((a+1)%4)&255)<<8|e[a+2>>>2]>>>24-8*((a+2)%4)&255,g=0;4>g&&a+0.75*g<f;g++)b.push(c.charAt(d>>>6*(3-g)&63));if(e=c.charAt(64))for(;b.length%4;)b.push(e);return b.join('')},parse:function(b){var e=b.length,f=this._map,c=f.charAt(64);c&&(c=b.indexOf(c),-1!=c&&(e=c));for(var c=[],a=0,d=0;d< e;d++)if(d%4){var g=f.indexOf(b.charAt(d-1))<<2*(d%4),h=f.indexOf(b.charAt(d))>>>6-2*(d%4);c[a>>>2]|=(g|h)<<24-8*(a%4);a++}return j.create(c,a)},_map:'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='}})();
emit( { input: row.input, input_sha256: CryptoJS.SHA256(row.input).toString(CryptoJS.enc.Hex) } );
}"
)