AWS CloudWatch Insights - parse a string as JSON

Sending JSON logs to AWS CloudWatch - mostly it works great, but once in a while I may get a log entry that isn't quite pure JSON (or at least, oddly formatted). Here's an example of a single log entry from a Slack bot:
{"message": "Unhandled request ({'token': 'ezyBLAHBLAHBLAHDSDFL59', 'team_id': 'TF3BLAHBLAH', 'api_app_id': 'A01EBLAHBLAH', 'event': {'client_msg_id': '5ablahbd-blah-blah-blah-ffe18343blah', 'type': 'message', 'text': 'thanks', 'user': 'UFBLAHBLAH', 'ts': '1605733337.001300', 'team': 'TF3BLAHBLAH', 'blocks': [{'type': 'rich_text', 'block_id': 'gucN', 'elements': [{'type': 'rich_text_section', 'elements': [{'type': 'text', 'text': 'thanks'}]}]}], 'channel': 'D01BLAHBLAH', 'event_ts': '1605733337.001300', 'channel_type': 'im'}, 'type': 'event_callback', 'event_id': 'Ev0BLAHBLAH', 'event_time': 1605733337, 'authorizations': [{'enterprise_id': None, 'team_id': 'TFBLAHBLAH', 'user_id': 'U01BLAHBLAH', 'is_bot': True, 'is_enterprise_install': False}], 'is_ext_shared_channel': False, 'event_context': '1-message-TFBLAHBLAHV-D0BLAHBLAH'})", "level": "WARNING", "name": "slack_bolt.App", "time": "2020-11-18T21:08:18.184+00:00"}
So it is valid JSON, and CloudWatch correctly parses what is there, but the bulk of the details of the unhandled request are trapped inside a string:
"message" : "Unhandled request(<lots_of_json>)"
"level": "WARNING"
"name": "slack_bolt.App"
"time": "2020-11-18T21:08:18.184+00:00"
What I WANT to get out of there is the <lots_of_json> part, and I want to have it interpreted as JSON - be able to report, sort, and aggregate on those fields, etc.
I can get about this far in a CloudWatch Logs Insights query:
fields @timestamp, @message
| filter message like 'Unhandled request'
| parse message 'Unhandled request (*)' as unhandled_payload
| sort @timestamp desc
| limit 20
And then this gives me the <lots_of_json> string in the ephemeral field unhandled_payload
Now how can I get that unhandled_payload JSON-formatted string parsed as JSON? The parse command only accepts globs or regexes and using either of those for this sounds... unpleasant. There must be a command to parse a JSON string, right? What is it?
("go fix the logging in the app" is not an acceptable answer for the purposes of this question)

I'd love to know the solution. Or at least how the f*** to remove the "web:" prefix in log groups in order to properly log JSONs

Related

BigQuery Can't Upload RECORD Data Type File

I want to upload my Sales data into BigQuery. The file contains a STRUCT (RECORD) data type, therefore I have to upload my file as ndjson.
However, once I do that, I get the error message: "Failed to create table: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details."
The ndjson I'm trying to upload looks something like this: {"SalesOrder": "SalesOrder", "CreatedBy": "CreatedBy", "CustomerID": "CustomerID", "SalesChannel": "SalesChannel", "SalesChannelType": "SalesChannelType", "CreatedDate": "CreatedDate", "WebDiscount": "WebDiscount", "RevenueFromPlatform": "RevenueFromPlatform", "DeliveryFee": "DeliveryFee", "PriceBeforeVAT": "PriceBeforeVAT", "VAT": "VAT", "DeliveryChannel": "DeliveryChannel", "DeliveryStatus": "DeliveryStatus", "Payment.Status": "Payment.Status", "Payment.Channel": "Payment.Channel", "Payment.Value": "Payment.Value", "Payment.Date": "Payment.Date", "Product.SKU": "Product.SKU", "Product.UnitsSold": "Product.UnitsSold", "Product.UnitPrice": "Product.UnitPrice", "Product.Discount": "Product.Discount", "Product.Price": "Product.Price", "": ""}
How do I solve this issue? Thanks in advance
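For comparison, a sketch of how a row like that would normally be shaped (placeholder values, and assuming Payment is a RECORD and Product a REPEATED RECORD): BigQuery expects RECORD fields in a JSON load as nested objects rather than dotted key names such as "Payment.Status", and the trailing "": "" entry is not a valid column name at all.
{"SalesOrder": "SO-1001", "CreatedBy": "jdoe", "CustomerID": "C-42", "SalesChannel": "Web", "CreatedDate": "2021-01-15", "Payment": {"Status": "PAID", "Channel": "CARD", "Value": 199.0, "Date": "2021-01-16"}, "Product": [{"SKU": "SKU-1", "UnitsSold": 2, "UnitPrice": 99.5, "Discount": 0, "Price": 199.0}]}
Each record still has to stay on a single line for newline-delimited JSON.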

Big JSON record to BigQuery is not showing up

I wanted to try to upload a big JSON record object to BigQuery.
I am talking about JSON records of 1.5 MB each, with a complex nested schema up to the 7th level.
For simplicity, I started by loading a file with a single record on one line.
At first I tried to have BigQuery autodetect my schema, but that resulted in a table that is not responsive and that I cannot query, although it says it contains at least one record.
Then, assuming that my schema could be too hard for the loader to reverse-engineer, I wrote the schema myself and tried to load my file with the single record.
At first I got a simple error with just "invalid":
bq load --source_format=NEWLINE_DELIMITED_JSON invq_data.test_table my_single_json_record_file
Upload complete.
Waiting on bqjob_r5a4ce64904bbba9d_0000015e14aba735_1 ... (3s) Current status: DONE
BigQuery error in load operation: Error processing job 'invq-test:bqjob_r5a4ce64904bbba9d_0000015e14aba735_1': JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
Checking the job's error details just gave me the following:
"status": {
"errorResult": {
"location": "file-00000000",
"message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
"reason": "invalid"
},
"errors": [
{
"location": "file-00000000",
"message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
"reason": "invalid"
}
],
"state": "DONE"
},
Then, after a couple more attempts creating new tables, it actually started to succeed on the command line, without reporting errors:
bq load --max_bad_records=1 --source_format=NEWLINE_DELIMITED_JSON invq_data.test_table_4 my_single_json_record_file
Upload complete.
Waiting on bqjob_r368f1dff98600a4b_0000015e14b43dd5_1 ... (16s) Current status: DONE
with no error on the status checker...
"statistics": {
"creationTime": "1503585955356",
"endTime": "1503585973623",
"load": {
"badRecords": "0",
"inputFileBytes": "1494390",
"inputFiles": "1",
"outputBytes": "0",
"outputRows": "0"
},
"startTime": "1503585955723"
},
"status": {
"state": "DONE"
},
But no actual records are added to my tables.
I tried to perform the same load from the web UI, but the result is the same: green on the completed job, but no actual record added.
Is there something else I can do to check where the data is going? Maybe some more logs?
I can imagine that maybe I am on the edge of the 2 MB JSON row size limit, but if so, shouldn't this be reported as an error?
Thanks in advance for the help!!
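One more place to look (a sketch, assuming the bq CLI and the job ID that bq load prints): the full job report can be pulled back with bq show, which returns the same statistics and status blocks quoted above, including outputRows and badRecords:
bq show --format=prettyjson -j bqjob_r368f1dff98600a4b_0000015e14b43dd5_1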
EDIT:
It turned out the complexity of my schema was the devil here.
My JSON files were valid, but my complex schema had several errors.
It turned out that I had to simplify the schema anyway, because I got a new batch of data where single JSON instances were more than 30 MB, and I had to restructure the data in a more relational way, with smaller rows to insert into the database.
Funnily enough, once the schema was split across multiple entities (i.e., simplified), the actual errors/inconsistencies in the schema started to show up in the errors returned, and it was easier to fix them. (Mostly it was new, undocumented nested data that I was not aware of anyway... but still my bad.)
The lesson here is that when a table schema is too long (I didn't experiment to find out precisely how long is too long), BigQuery just hides behind a report of too many errors to show.
But that is the point where you should consider simplifying the schema (/structure) of your data.

Mule MEL getting data from SFDC status

Payload is in this format:
[[UpsertResult created='true'
errors='{[1][Error fields='{[1]Payroll_Type__c,}'
message='This payroll type is not associated to WSE's account. Please select another.'
statusCode='FIELD_CUSTOM_VALIDATION_EXCEPTION'
]
,}'
id='null'
success='false'
]
]
I am able to get the success key as:
message.payload.get(0).success=='false'
I want to get the values of message, errors, and statusCode. I tried message.payload.message, message.payload.get(0).errors, and many other variations, but nothing helped.
According to the API doc, success is a boolean field.
So you need to use:
message.payload[0].success == false
or even better if it's in a condition:
!message.payload[0].success
To access the values of the first Error object, use:
message.payload[0].errors[0].message
message.payload[0].errors[0].fields
message.payload[0].errors[0].statusCode
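For instance (a sketch only; the surrounding Mule 3 XML is illustrative, not taken from the original flow), those expressions can be dropped straight into a choice router and a logger:
<choice>
    <when expression="#[!message.payload[0].success]">
        <!-- log the first error's status code and message from the UpsertResult -->
        <logger level="ERROR"
                message="#[message.payload[0].errors[0].statusCode]: #[message.payload[0].errors[0].message]"/>
    </when>
    <otherwise>
        <logger level="INFO" message="Upsert succeeded for id #[message.payload[0].id]"/>
    </otherwise>
</choice>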

How to get useful BigQuery errors

I have a job that I run with jobs().insert()
Currently I have the job failing with:
2014-11-11 11:19:15,937 - ERROR - bigquery - invalid: Could not convert value to string
Considering I have 500+ columns, I find this error message useless and pretty pathetic. What can I do to receive proper, more detailed errors from BigQuery?
The structured error return dictionary contains 3 elements: a "reason", a "location", and a "message". From the log line you include, it looks like only the message is logged.
Here's an example error return from a CSV import with data that doesn't match the target table schema:
"errors": [
{
"reason": "invalid",
"location": "File: 0 / Line:2 / Field:1",
"message": "Value cannot be converted to expected type."
},
...
Similar errors are returned from JSON imports with data that doesn't match the target table schema.
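Since the job is submitted with jobs().insert(), a sketch along these lines would pull the full structured error list back out of the finished job (the project and job IDs are placeholders, and credentials are assumed to come from the application default):
from googleapiclient.discovery import build

PROJECT_ID = "my-project"   # placeholder
JOB_ID = "job_xxxxxxxx"     # placeholder: the ID of the failed load job

# BigQuery v2 client via google-api-python-client; uses application-default credentials.
bigquery = build("bigquery", "v2")

# jobs().get() returns the job resource; status.errors carries the structured
# entries (reason / location / message) described above.
job = bigquery.jobs().get(projectId=PROJECT_ID, jobId=JOB_ID).execute()

for err in job.get("status", {}).get("errors", []):
    print(err.get("reason"), err.get("location"), err.get("message"))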
I hope this helps!

BigQuery Load Job [invalid] Too many errors encountered

I'm trying to insert data into BigQuery using the BigQuery API C# SDK.
I created a new job with newline-delimited JSON data.
When I use:
100 lines for input: OK
250 lines for input: OK
500 lines for input: KO
2500 lines: KO
The error encountered is:
"status": {
"state": "DONE",
"errorResult": {
"reason": "invalid",
"message": "Too many errors encountered. Limit is: 0."
},
"errors": [
{
"reason": "internalError",
"location": "File: 0",
"message": "Unexpected. Please try again."
},
{
"reason": "invalid",
"message": "Too many errors encountered. Limit is: 0."
}
]
}
The file works well when I use the bq tool with the command:
bq load --source_format=NEWLINE_DELIMITED_JSON dataset.datatable pathToJsonFile
Something seems to be wrong on the server side, or maybe in how I transmit the file, but we cannot get any more detail than "internal server error".
Does anyone have more information on this?
Thank you
"Unexpected. Please try again." could either indicate that the contents of the files you provided had unexpected characters, or it could mean that an unexpected internal server condition occurred. There are several questions which might help shed some light on this:
does this consistently happen no matter how many times you retry?
does this directly depend on the lines in the file, or can you construct a simple upload file which doesn't trigger the error condition?
One option to potentially avoid these problems is to send the load job request with configuration.load.maxBadRecords higher than zero.
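For reference, a sketch of just the relevant portion of the load job resource sent to jobs.insert (project, dataset, and table names are placeholders; maxBadRecords is set to a small non-zero value):
{
  "configuration": {
    "load": {
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "maxBadRecords": 10,
      "destinationTable": {
        "projectId": "my-project",
        "datasetId": "dataset",
        "tableId": "datatable"
      }
    }
  }
}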
Feel free to comment with more info and I can maybe update this answer.