CloudWatch Logs Insights: working with multiple @messages - amazon-cloudwatch

I have the following query with the following output:
Query:
filter @message like /A:|B:/
Output:
[INFO] 2020-07-28T09:20:48.406Z requestid A: [{'Delivery': OK, 'Entry': 12323 }]
[INFO] 2020-07-28T09:20:48.407Z requestid B: {'MyValue':0}
I would like to print ONLY the A message when, in the B message, 'MyValue' = 0. For the above example, I should get the following output:
Output:
[INFO] 2020-07-28T09:20:48.406Z requestid A: [{'Delivery': OK, 'Entry': 12323 }]
For the next example:
[INFO] 2020-07-28T09:20:48.406Z requestid A: [{'Delivery': OK, 'Entry': 12323 }]
[INFO] 2020-07-28T09:20:48.407Z requestid B: {'MyValue':12}
The output should be empty
I can't do something like this, because then I would miss the A message:
filter @message like /A:|B:/
| filter MyValue = 0
Any ideas?

If anyone is still interested, there ARE ways to get the first and last items when grouping by a field. So if you can fit your data into pairs of messages, it might help.
For example, given API Gateway access log (each row is a #message):
2021-09-14T14:09:00.452+03:00 (01c53288-5d25-*******) Extended Request Id: ***************
2021-09-14T14:09:00.452+03:00 (01c53288-5d25-*******) Verifying Usage Plan for request: 01c53288-5d25-*******. API Key: API Stage: **************/dev
2021-09-14T14:09:00.454+03:00 (01c53288-5d25-*******) API Key authorized because method 'ANY /path/{proxy+}' does not require API Key. Request will not contribute to throttle or quota limits
2021-09-14T14:09:00.454+03:00 (01c53288-5d25-*******) Usage Plan check succeeded for API Key and API Stage **************/dev
2021-09-14T14:09:00.454+03:00 (01c53288-5d25-*******) Starting execution for request: 01c53288-5d25-*******
2021-09-14T14:09:00.454+03:00 (01c53288-5d25-*******) HTTP Method: GET, Resource Path: /path/json.json
2021-09-14T14:09:00.468+03:00 (01c53288-5d25-*******) Method completed with status: 304
We can get the method, URI, and return code from the last 2 rows.
To do this, I parse the relevant data into fields, and then aggregate them by request id (which I also parse).
The magic is using stats functions like sortsFirst() and sortsLast() and grouping by @reqid (see the AWS docs).
Note: IMO, don't use earliest() and latest(), as they depend on the built-in @timestamp and behaved oddly for me when two sequential messages had the same timestamp.
So, for example, using this query:
filter @message like "Method"
| parse @message /\((?<@reqid>.*?)\) (.*?) (Method: (?<@method>.*?), )?(.*?:)* (?<@data>[^\ ]*)/
| sort @timestamp desc
| stats sortsFirst(@method) as @reqMethod, sortsFirst(@data) as @reqPath, sortsLast(@data) as @reqCode by @reqid
| limit 20
We would get the following desired output:
@reqid @reqMethod @reqPath @reqCode
f42e2b44-b858-45cb-***************** GET /path-******.json 304
fecddb03-3804-4ff5-***************** OPTIONS /path-******.json 200
e8e47185-6280-4e1e-***************** GET /path-******.json 304
e4fa9a0c-6d75-4e26-***************** GET /path-******.json 304
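Applying the same trick back to the first question above, a rough, untested sketch might look like this (it assumes the request id is the third space-delimited token of each message, and that the stats functions skip rows where a parse did not match):
filter @message like /A:|B:/
| parse @message /\[INFO\] \S+ (?<@reqid>\S+) [AB]:/
| parse @message /A: (?<@aBody>.*)/
| parse @message /B: \{'MyValue':(?<@myValue>\d+)\}/
| stats sortsFirst(@aBody) as @aMessage, sortsLast(@myValue) as @bValue by @reqid
| filter @bValue = "0"
The idea is the same: collapse each request's pair of messages into one row with stats, then filter on the B side while keeping the A side.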

Related

cube.js API load endpoint responds with 413

When calling the load endpoint with a query larger than ~1700 bytes, we receive a 413 (request entity too large) error. We have narrowed the threshold down to somewhere between 1706 and 1758 bytes.
Steps to reproduce the behavior:
POST a large query to <host>:<port>/cubejs-api/v1/load
Receive a 413
Removing one or two entries from dimensions causes the query to work as expected (standard JSON response and 200 status)
Version: 0.31.14
The smallest failing query we have is:
{
  "query": {
    "measures": [
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.count",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalDiscAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalCopayAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalDedctAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalRejAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalExGrtaAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsGrsAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsStlAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsRbnsAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsIbnrAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsIncrAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsRskStlAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsRskRbnsAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsRskIbnrAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsRskIncrAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsReinsAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsReinsRbnsAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsReinsIbnrAmt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.totalClmsReinsIncrAmt"
    ],
    "dimensions": [
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.clntCoCd",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.trtyClntGrpNm",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.grpNbr",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.primLfNbr",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.insLfNbr",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.rskIncptnDt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.rskRnwlDt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.trtmtStrtDt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.trtmtEndDt",
      "rpt_clm_clm_990_clab9f47d0000356e2szw7p19.admsnDt"
    ]
  },
  "queryType": "multi"
}
I tried submitting an issue on the cube.js GitHub, but they marked it as a question and asked that I post it here. I have also searched their docs and have not been able to find any configuration related to this. It looks like the max payload size is hard-coded to 1 MB (see link), but here we are failing at 1758 bytes.
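For reference, a minimal reproduction sketch in Python (the host, port, and token are hypothetical placeholders; query.json holds the JSON shown above):
import requests  # third-party: pip install requests

# Hypothetical endpoint and token; substitute your deployment's values.
URL = "http://localhost:4000/cubejs-api/v1/load"
TOKEN = "<api-token>"

with open("query.json") as f:
    payload = f.read()

# Confirm we are far below the documented 1 MB cap.
print(len(payload.encode("utf-8")), "bytes")

resp = requests.post(URL, data=payload,
                     headers={"Content-Type": "application/json",
                              "Authorization": TOKEN})
print(resp.status_code)  # 413 once the payload crosses ~1758 bytes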

AWS CloudWatch parsing for logging type

My CloudWatch log is coming in the below format:
2022-08-04T12:55:52.395Z 1d42aae9-740f-437d-bdf1-4e8c747e0f04 INFO 14 Field Service activities within Launch Advisory are a core set of activities and recommendations that are proven to support successful deployments and accelerate time-to-value. For customers implementing an AEC Product for the first time, the first year of Field Services available to the Customer will be comprised of Launch Advisory activities only. Google’s Launch Advisory services team will work with the Customer's solution implementation team to guide, assess, and make recommendations for the implementation of newly licensed APAC Products..
2022-08-04T12:55:52.395Z: the timestamp
1d42aae9-740f-437d-bdf1-4e8c747e0f04: the request id
INFO: the logging type
The rest is the actual message.
I want to parse the above fields from the message. Taking the AWS documentation as a reference, I started writing the following query, but it's not working:
fields @timestamp, @message, @logStream
| PARSE @message "* [*] [*] *" as loggingTime, requestId, loggingType, loggingMessage
| sort @timestamp desc
| display loggingTime, requestId, loggingType, loggingMessage
| limit 200
But the above parsing expression is not working. Can someone suggest how this message can be parsed?
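Not from the thread, but as a hedged sketch: the sample line contains no brackets, so the glob pattern "* [*] [*] *" has nothing to anchor on. Since the first three fields are space-delimited and only the trailing message contains spaces, a regex parse along these lines may work:
fields @timestamp, @message, @logStream
| parse @message /(?<loggingTime>\S+)\s+(?<requestId>[0-9a-f-]+)\s+(?<loggingType>\w+)\s+(?<loggingMessage>.*)/
| sort @timestamp desc
| display loggingTime, requestId, loggingType, loggingMessage
| limit 200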

How to filter the response via API

I wanted to know if this is possible. I have 2 APIs I am testing.
API 1 gives a list of the jobs posted by the user.
Response =
"jobId": 15596, "jobTitle": "PHP developer"
API 2 gives the following response.
"total CVs": 19, "0-7days": 12,"status": "New Resume"
Meaning: in the "New Resume" bucket we have a total of 19 CVs, and of those 19, 12 have an ageing of 0-7 days. This response relates to the jobs posted.
When I hit the APIs I get the correct numbers, but on the front end API 1 will be used as a dropdown to select a job, and then the New Resume bucket, ageing, and total CVs will be shown for that job.
What I wanted to know: is it possible to test the two APIs together, sort of like the filtering on the front end, or is the only way to test to check that each response I get is correct on its own?
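If API 2 accepts the job id as a parameter, one option is to chain the two calls in a test script and assert the relationship between them. A rough sketch (every path and parameter name here is hypothetical; substitute your real endpoints):
import requests  # third-party: pip install requests

BASE = "https://api.example.com"  # hypothetical base URL

# API 1: the list of jobs posted by the user (assumed endpoint).
jobs = requests.get(f"{BASE}/jobs", timeout=10).json()

for job in jobs:
    # API 2: CV buckets for one job (assumed to take the job id as a query param).
    stats = requests.get(f"{BASE}/cv-stats",
                         params={"jobId": job["jobId"]}, timeout=10).json()
    # Invariant the front end relies on: an ageing count can never
    # exceed the bucket's total CV count.
    assert stats["0-7days"] <= stats["total CVs"], job["jobId"]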

How to return distinct multi-word strings from blocks of usage log text with POSIX?

First time poster and still learning the ropes, so I apologize if the description below is overly verbose.
I have a database of usage logs I'm pulling data from via various pre-parsed fields. This query is intended to return the count of how many times a distinct error signature was logged over a given period of time. Each logged error is assigned a signature_id; errors of the same type are all assigned the same signature_id. One of the fields I'm returning in my query, message, returns the entire message stack trace/block of usage log text.
I want my query to group by signature_id, which is a pre-parsed field in the table I'm selecting from. I'm struggling to make it work because, while similar error types are assigned the same signature_id, every usage log differs slightly due to the timestamp of when the message was logged. So my query is grouping by message instead of signature_id.
EX: What my query returns if I return the entire usage log message:
signature_id: b2dea422
message: 2019-01-17 18:01:52,130 ip-BLANK [WARN ][159] [request_id=00e74d7c] Type=Blank.Multiverse.Web.Mvc.Attributes.Usage+UsageLoggingException Message=UsageLogContext not present in HttpContext.Current.Items Data: Signature=b2dea422 Stack Trace: at Blank.Blank.Web.Mvc.Attributes.Usage.GetUsageLogContext() at Blank.Blank.Web.Mvc.Attributes.Usage.AddData(Object data)
count: 1
signature_id: b2dea422
message: 2019-01-17 16:21:36,681,130 ip-BLANK [WARN ][38] [request_id=c140f8ea] Type=Blank.Multiverse.Web.Mvc.Attributes.Usage+UsageLoggingException Message=UsageLogContext not present in HttpContext.Current.Items Data: Signature=b2dea422 Stack Trace: at Blank.Blank.Web.Mvc.Attributes.Usage.GetUsageLogContext() at Blank.Blank.Web.Mvc.Attributes.Usage.AddData(Object data)
count: 1
I mentioned above that every usage log differs due to the timestamp of when a given message was logged, but similar error types are assigned the same signature_id. Similar error types also share the same Exception Message=...
EX: Every time a message is logged with signature_id=ab7d890pq, it will also have Exception Message=Cannot read property 'get' of undefined in the message block.
Since the table I'm selecting from doesn't have a pre-parsed exception_message field, I want to parse out the Exception Message= string so my GROUP BY will return the count of distinct logged signature_ids and a column with the exception message for each distinct signature.
My current query, shown below, begins to parse out the exceptionmessage string, but I can't get it to return the entire string:
SELECT CASE
WHEN sourcecategory = 'source_hello_world_category' THEN 'hwCategory'
END AS Service,
signature,
NULLIF(SUBSTRING(REGEXP_SUBSTR(message, 'Message=\\w+[[:space:]]+'), 9), '') AS exceptionmessage,
count(*)
FROM user_usage_logs
WHERE (signature IS NOT NULL
AND signature NOT IN ('ccce9e73',
'787dd1b5',
'17fc66bc',
'ca384d1f',
'20121ecb'))
AND sourcecategory IN ('source_hello_world_category')
AND messagetime > (getdate() - 1)
GROUP BY signature,
sourcecategory,
exceptionmessage
ORDER BY count(*) DESC
LIMIT 10;
The code shown above returns:
signature_id exceptionmessage count
b1det422 Cannot 31,321
330ope77 Unauthorized 1,207
53m6m466 Reference 311
This is an example of what I want returned:
signature_id exceptionmessage count
b1det422 Cannot read property 'get' of undefined Stack 31,321
330ope77 Unauthorized access response for many users 1,207
53m6m466 Reference cannot be set to an empty.object.3 311
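Not part of the question, but a possible direction: \w+ only matches word characters, so the capture stops at the first space. A hedged sketch of a fix, assuming a Redshift-style dialect and that the exception text always ends at the literal ' Data:' marker ('Message=' is 8 characters, ' Data:' is 6, hence the 9 and -14):
SELECT signature,
       SUBSTRING(REGEXP_SUBSTR(message, 'Message=.* Data:'), 9,
                 LEN(REGEXP_SUBSTR(message, 'Message=.* Data:')) - 14) AS exceptionmessage,
       count(*)
FROM user_usage_logs
GROUP BY signature, exceptionmessage
ORDER BY count(*) DESC;
Note the greedy .* will overshoot if ' Data:' can appear a second time later in the message block.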

How to avoid hitting the 10 requests per second limit per user

We run multiple short queries in parallel and hit the 10 requests per second limit.
According to the docs, throttling might occur if we hit a limit of 10 API requests per second per user per project.
We send a "start query job" request, and then we call getQueryResults() with a timeoutMs of 60,000. However, we get a response after ~1 sec; we look for jobComplete in the JSON response, and since it is not there, we need to send getQueryResults() again many times and hit the threshold, which causes an error, not a slowdown. The sample code is below.
Our questions are:
1. What is a "user"? Is it an App Engine user, or a user id that we can put in the connection string or in the query itself?
2. Is it really per BigQuery API project?
3. What is the behavior? We got an error: "Exceeded rate limits: too many user/method api request limit for this user_method", not the throttling behavior the docs describe, and all of our processing fails.
4. As seen below in the code, why do we get the response after ~1 sec and not according to our timeout? Are we doing something wrong?
Thanks a lot
Here is a sample of the code:
while res is None or 'jobComplete' not in res or not res['jobComplete']:
    try:
        res = self.service.jobs().getQueryResults(projectId=self.project_id,
                                                  jobId=jobId, timeoutMs=60000,
                                                  maxResults=maxResults).execute()
    except HTTPException:
        if independent:
            raise
Are you saying that even though you specify timeoutMs=60000, it is returning within 1 second but the job is not yet complete? If so, this is a bug.
The quota limits for getQueryResults are actually currently much higher than 10 requests per second. The reason the docs say only 10 is because we want to have the ability to throttle it down to that amount if someone is hitting us too hard. If you're currently seeing an error on this API, it is likely that you're calling it at a very high rate.
I'll try to reproduce the problem where we don't wait for the timeout ... if that is really what is happening it may be the root of your problems.
def query_results_long(self, jobId, maxResults, res=None):
    start_time = query_time = None
    while res is None or 'jobComplete' not in res or not res['jobComplete']:
        if start_time:
            logging.info('requested for query results ended after %s', query_time)
            time.sleep(2)
        start_time = datetime.now()
        res = self.service.jobs().getQueryResults(projectId=self.project_id,
                                                  jobId=jobId, timeoutMs=60000,
                                                  maxResults=maxResults).execute()
        query_time = datetime.now() - start_time
    return res
Then in the App Engine log I had this:
requested for query results ended after 0:00:04.959110
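For what it's worth, a hedged sketch of polling with a client-side backoff (same google-api-python-client service object as in the snippets above; the point is just to space the calls out so a fast server-side return doesn't become a burst of requests):
import time

def poll_query_results(service, project_id, job_id, max_results, max_wait_s=300):
    # Poll getQueryResults, backing off between calls so we stay
    # under the per-user rate limit even if the server returns early.
    delay = 1.0
    waited = 0.0
    while True:
        res = service.jobs().getQueryResults(
            projectId=project_id, jobId=job_id,
            timeoutMs=60000, maxResults=max_results).execute()
        if res.get('jobComplete'):
            return res
        if waited >= max_wait_s:
            raise RuntimeError('query did not complete within %s s' % max_wait_s)
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, 30)  # cap the backoff at 30 seconds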