AWS CloudWatch parsing for logging type - amazon-cloudwatch

My CloudWatch log comes in the format below:
2022-08-04T12:55:52.395Z 1d42aae9-740f-437d-bdf1-4e8c747e0f04 INFO 14 Field Service activities within Launch Advisory are a core set of activities and recommendations that are proven to support successful deployments and accelerate time-to-value. For customers implementing an AEC Product for the first time, the first year of Field Services available to the Customer will be comprised of Launch Advisory activities only. Google’s Launch Advisory services team will work with the Customer's solution implementation team to guide, assess, and make recommendations for the implementation of newly licensed APAC Products..
2022-08-04T12:55:52.395Z is the timestamp
1d42aae9-740f-437d-bdf1-4e8c747e0f04 is the request ID
INFO is the logging type
The rest is the actual message
I want to parse the above fields from the message. Taking the AWS documentation as a reference, I started writing the following query, but it's not working:
fields #timestamp, #message, #logStream
| PARSE #message "* [*] [*] *" as loggingTime, requestId, loggingType, loggingMessage
| sort #timestamp desc
| display loggingTime, requestId, loggingType, loggingMessage
| limit 200
But the above parse expression is not working. Can someone suggest how this message can be parsed?
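For what it's worth, here is a sketch of a query that should match this layout, assuming the standard @timestamp/@message system fields (note the @ rather than #) and single spaces between the four parts; the sample message contains no square brackets, so the "* [*] [*] *" glob never has anything to match:
fields @timestamp, @message, @logStream
| parse @message /(?<loggingTime>\S+)\s+(?<requestId>\S+)\s+(?<loggingType>\S+)\s+(?<loggingMessage>.*)/
| sort @timestamp desc
| display loggingTime, requestId, loggingType, loggingMessage
| limit 200
The regex form of parse with named capture groups avoids having to guess how the glob wildcards split on whitespace.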

Related

Am I able to view a list of devices by partition in IoT Hub?

I have 2 nodes of a cluster receiving messages from IoT Hub. I split their responsibility by partition: node 1 reads from partitions 1, 3, 5, 7, and 9, and the other from 2, 4, 6, 8, and 0. Recently, my partition 8 stops responding until I stop my code and restart it. It seems like a device is sending a message that locks up the partition. What I want to do is list all devices in my partition 8. Is that possible? Is there a Cloud Shell command to get those devices in a list?
Not sure this will help you, but you can see the partition on the incoming messages. For example you could use Azure Stream Analytics to see the partitions using this query:
Select GetMetadataPropertyValue(IoTHub, '[IoTHub].[ConnectionDeviceId]') as DeviceId, partitionId
from IoTHub
Also, if you run locally in Visual Studio, it will tell you which device is sending malformed JSON, e.g.:
[Warning] 10/21/2021 9:12:54 AM : User Warning Source 'IoTHub' had 1 occurrences of kind 'InputDeserializerError.InvalidData' between processing times '2021-10-21T15:12:50.5076449Z' and '2021-10-21T15:12:50.5712076Z'. Could not deserialize the input event(s) from resource 'Partition: [1], Offset: [455266583232], SequenceNumber: [634800], DeviceId: [DeviceName]' as Json. Some possible reasons: 1) Malformed events 2) Input source configured with incorrect serialization format
Also check your "Activity Log" blade in the ASA job. It may have more details for you.
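On the Cloud Shell part of the question: IoT Hub doesn't expose a per-partition device list (the partition for device-to-cloud messages is derived from a hash of the device ID), but you can at least dump every registered device identity with the Azure CLI IoT extension. A hedged example, with my-hub standing in for your hub name:
az extension add --name azure-iot
az iot hub device-identity list --hub-name my-hub --query "[].deviceId" -o tsv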

How to change the Splunk Forwarder Formatting of my Logs?

I am using our enterprise's Splunk forwarder, which seems to be logging events in Splunk like this, making the Splunk logs a bit difficult to read:
{"log":"[https-jsse-nio-8443-exec-5] 19 Jan 2021 15:30:57,237+0000 UTC INFO rdt.damien.services.CaseServiceImpl CaseServiceImpl :: showCase :: Case Created \n","stream":"stdout","time":"2021-01-19T15:30:57.24005568Z"}
However, there are other orgs in our sibling enterprise whose Splunk logs look like this, which is far more readable. (There is no technical relationship between us and them, so I can't leverage their tech support to triage this.)
[http-nio-8443-exec-7] 15 Jan 2021 21:08:49,511+0000 INFO DaoOImpl [{applicationSystemCode=dao-app, userId=ANONYMOUS, webAnalyticsCorrelationId=|}]: This is a sample log
Please note the difference in logs (mine vs other):
{"log":"[https-jsse-nio-8443-exec-5]..
vs
[http-nio-8443-exec-7]...
Our enterprise team is struggling to determine what causes this. I checked my app.log, which looks fine (logged using Log4j) and doesn't have the aforementioned {"log": ...} wrapper:
[https-jsse-nio-8443-exec-5] 19 Jan 2021 15:30:57,237+0000 UTC INFO rdt.damien.services.CaseServiceImpl CaseServiceImpl :: showCase :: Case Created
Could someone guide me as to where the problem/configuration could lie that is causing the Splunk forwarder to send logs in the {"log": ...} format to Splunk? I thought it had something to do with JSON type vs. RAW, which I don't fully understand either; if that is the cause, what configs are driving it?
Over the course of the investigation I found that it is not Splunk that's doing this, but rather the Docker container. Docker defaults to the json-file logging driver, which writes container output to the /var/lib/docker/containers folder in files with a *-json.log suffix, and those files hold the logs in the {"log": ...} format.
I now need to figure out how to change the Docker logging driver so that it writes in a non-JSON format.
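In case it helps the next person: the default driver is set in /etc/docker/daemon.json (or per container with --log-driver). A minimal sketch, assuming the built-in local driver is acceptable; this only affects containers created after the daemon is restarted:
{
  "log-driver": "local",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
Both json-file and local keep docker logs working. Another option worth evaluating is Docker's splunk logging driver (--log-driver=splunk with an HEC URL and token), which ships events to Splunk directly instead of going through files picked up by the forwarder.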

Is it possible to create log source health alerts in Azure Sentinel?

I am attempting to create an alert that lets me know if a data source stops providing logs to Sentinel. While I know it displays anomalies in log data on the dashboard, I am hoping to receive alerts if a source stops providing logs for an extended period of time.
Something like creating a rule with the following query (CEF in this case):
CommonSecurityLog
| where TimeGenerated > ago(24h)
| summarize count() by DeviceVendor, DeviceProduct, DeviceName, DeviceExternalID
| where count_ == 0
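One caveat with the count_ == 0 check: summarize only produces rows for sources that actually logged something in the window, so a silent source never shows up at all. A sketch of a variant that keys on the last time each source was seen (same table and columns; the look-back and silence thresholds are arbitrary):
CommonSecurityLog
| where TimeGenerated > ago(14d) // look back far enough that known sources still appear
| summarize LastLogReceived = max(TimeGenerated) by DeviceVendor, DeviceProduct, DeviceName, DeviceExternalID
| where LastLogReceived < ago(24h) // sources that have been silent for the last 24 hours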

How to set up job dependencies in Google BigQuery?

I have a few jobs: say, one loads a text file from a Google Cloud Storage bucket into a BigQuery table, and another is a scheduled query that copies data from one table to another with some transformation. I want the second job to depend on the success of the first one. How do we achieve this in BigQuery, if it is possible to do so at all?
Many thanks.
Best regards,
Right now a developer needs to put together the chain of operations themselves.
It can be done either using Cloud Functions (supports Node.js, Go, Python) or via a Cloud Run container (supports the gcloud API and any programming language).
Basically you need to:
issue a job
get the job ID
poll the job ID
when the job is finished, trigger the next step
If using Cloud Functions:
place the file into a dedicated GCS bucket
set up a GCF that monitors that bucket; when a new file is uploaded, it executes a function that imports the file into BigQuery and waits until the operation ends
at the end of the GCF you can trigger other functions for the next step
Another use case with Cloud Functions (a minimal sketch follows below):
A: a trigger starts the GCF
B: the function executes the query (copy data to another table)
C: it gets a job ID and fires another function with a bit of delay
I: that function gets the job ID
J: it polls whether the job is ready
K: if not ready, it fires itself again with a bit of delay
L: if ready, it triggers the next step, which could be a dedicated or parameterized function
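A minimal Python sketch of that flow with the google-cloud-bigquery client; the function names, table IDs, and the Pub/Sub topic used to re-trigger the poller are illustrative assumptions, not part of the original answer:
import base64
import json

from google.cloud import bigquery, pubsub_v1

client = bigquery.Client()
publisher = pubsub_v1.PublisherClient()
TOPIC = "projects/my-project/topics/bq-job-poller"  # hypothetical topic

def start_copy(event, context):
    # B/C: run the copy query and hand the job ID to the poller function.
    job = client.query("CREATE OR REPLACE TABLE my_dataset.copy AS SELECT * FROM my_dataset.src")
    publisher.publish(TOPIC, json.dumps({"job_id": job.job_id}).encode("utf-8"))

def poll_job(event, context):
    # I/J/K/L: check the job; if it is still running, re-publish the job ID so this
    # function fires again (ideally with a delay, e.g. via Cloud Tasks); otherwise
    # trigger the next step.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    job = client.get_job(payload["job_id"])
    if job.state != "DONE":
        publisher.publish(TOPIC, json.dumps(payload).encode("utf-8"))
    elif job.error_result:
        print(f"Job {job.job_id} failed: {job.error_result}")
    else:
        print(f"Job {job.job_id} finished - trigger the next step here")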
It is possible to address your scenario with either Cloud Functions (CF) or with a scheduler (Airflow). The first approach is event-driven, so your data gets crunched immediately; with a scheduler, expect a data availability delay.
As has been stated, once you submit a BigQuery job you get back a job ID, which needs to be checked until the job completes. Then, based on the status, you can handle success or failure post-actions respectively.
If you were to develop a CF, note that there are certain limitations, like execution time (max 9 min), which you would have to address in case the BigQuery job takes longer than that to complete. Another challenge with CF is idempotency: making sure that if the same data file event arrives more than once, the processing does not result in duplicate data.
Alternatively, you can consider using some event-driven serverless open source projects like BqTail - Google Cloud Storage BigQuery Loader with post-load transformation.
Here is an example of the bqtail rule.
rule.yaml
When:
  Prefix: "/mypath/mysubpath"
  Suffix: ".json"
Async: true
Batch:
  Window:
    DurationInSec: 85
Dest:
  Table: bqtail.transactions
  Transient:
    Dataset: temp
    Alias: t
  Transform:
    charge: (CASE WHEN type_id = 1 THEN t.payment + f.value WHEN type_id = 2 THEN t.payment * (1 + f.value) END)
  SideInputs:
    - Table: bqtail.fees
      Alias: f
      'On': t.fee_id = f.id
OnSuccess:
  - Action: query
    Request:
      SQL: SELECT
        DATE(timestamp) AS date,
        sku_id,
        supply_entity_id,
        MAX($EventID) AS batch_id,
        SUM(payment) payment,
        SUM((CASE WHEN type_id = 1 THEN t.payment + f.value WHEN type_id = 2 THEN t.payment * (1 + f.value) END)) charge,
        SUM(COALESCE(qty, 1.0)) AS qty
        FROM $TempTable t
        LEFT JOIN bqtail.fees f ON f.id = t.fee_id
        GROUP BY 1, 2, 3
      Dest: bqtail.supply_performance
      Append: true
    OnFailure:
      - Action: notify
        Request:
          Channels:
            - "#e2e"
          Title: Failed to aggregate data to supply_performance
          Message: "$Error"
    OnSuccess:
      - Action: query
        Request:
          SQL: SELECT CURRENT_TIMESTAMP() AS timestamp, $EventID AS job_id
          Dest: bqtail.supply_performance_batches
          Append: true
      - Action: delete
You want to use an orchestration tool, especially if you want to set up these tasks as recurring jobs.
We use Google Cloud Composer, a managed service based on Airflow, for workflow orchestration, and it works great. It comes with automatic retries, monitoring, alerting, and much more.
You might want to give it a try.
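Under the Composer/Airflow approach, the dependency from the question maps naturally onto two tasks. A minimal sketch using the Google provider operators (bucket, file, table names, and the SQL are placeholder assumptions):
# Sketch of an Airflow DAG: load a file from GCS into BigQuery, and only if
# that succeeds, run the transformation query. All resource names are
# hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="load_then_transform",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    load_file = GCSToBigQueryOperator(
        task_id="load_file",
        bucket="my-bucket",
        source_objects=["incoming/data.csv"],
        destination_project_dataset_table="my_project.my_dataset.raw_table",
        source_format="CSV",
        write_disposition="WRITE_TRUNCATE",
    )

    transform = BigQueryInsertJobOperator(
        task_id="transform",
        configuration={
            "query": {
                "query": "CREATE OR REPLACE TABLE my_dataset.clean AS "
                         "SELECT * FROM my_dataset.raw_table WHERE payment IS NOT NULL",
                "useLegacySql": False,
            }
        },
    )

    # transform runs only after load_file succeeds
    load_file >> transform
Composer images ship with the Google provider, so both operators should be available out of the box.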
Basically, you can use Cloud Logging to observe almost all kinds of operations in GCP.
BigQuery is no exception. When a query job completes, you can find the corresponding entry in the Logs Viewer.
The next question is how to pinpoint the exact query you want. One way to achieve this is to use a labeled query (that is, attach labels to your query) [1].
For example, you can use the bq command below to issue a query with the foo:bar label:
bq query \
--nouse_legacy_sql \
--label foo:bar \
'SELECT COUNT(*) FROM `bigquery-public-data`.samples.shakespeare'
Then, when you go to the Logs Viewer and apply the log filter below, you will find exactly the log entry generated by the query above:
resource.type="bigquery_resource"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.labels.foo="bar"
The next question is how to emit an event based on this log entry for the next workload. This is where Cloud Pub/Sub comes into play.
Two ways to publish an event based on a log pattern are:
Log Routers: set a Pub/Sub topic as the sink destination [2]
Log-based Metrics: create an alert policy whose notification channel is Pub/Sub [3]
So, the next workload can subscribe to the Pub/Sub topic and be triggered when the previous query has completed.
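For the Log Router option [2], here is a hedged example of wiring the filter above to a Pub/Sub topic (project, topic, and sink names are placeholders):
# Create the topic the next workload will subscribe to
gcloud pubsub topics create bq-labelled-query-done

# Route only the matching jobCompletedEvent logs into it
gcloud logging sinks create bq-labelled-query-sink \
  pubsub.googleapis.com/projects/my-project/topics/bq-labelled-query-done \
  --log-filter='resource.type="bigquery_resource" AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.labels.foo="bar"'

# Grant the sink's writer identity permission to publish to the topic
gcloud pubsub topics add-iam-policy-binding bq-labelled-query-done \
  --member="$(gcloud logging sinks describe bq-labelled-query-sink --format='value(writerIdentity)')" \
  --role="roles/pubsub.publisher"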
Hope this helps ~
[1] https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfiguration
[2] https://cloud.google.com/logging/docs/routing/overview
[3] https://cloud.google.com/logging/docs/logs-based-metrics

What is the correct way to add a PCTC SSR in SITA Web Services (SWS)

I'm trying to use SITA Web Services to add an emergency contact to a booking. I'm using this XML, but I keep getting a "013 - ACTION" error:
<OTA_AirBookModifyRQ xmlns="http://www.opentravel.org/OTA/2003/05" TransactionIdentifier="" Version="2003.5.0">
  <AirBookModifyRQ BookingReferenceID="JKYZJ" ModificationType="5">
    <TravelerInfo>
      <SpecialReqDetails>
        <SpecialServiceRequests>
          <SpecialServiceRequest SSRCode="PCTC" Status="11" TravelerRefNumberRPHList="1">
            <Airline Code="XS"></Airline>
            <Text>DOCTOR DR/XS222H.SISTER</Text>
          </SpecialServiceRequest>
        </SpecialServiceRequests>
      </SpecialReqDetails>
    </TravelerInfo>
  </AirBookModifyRQ>
</OTA_AirBookModifyRQ>
The reason you're getting "013 - ACTION" is that you are specifying "11" (which means NN - Need) as the status. Change the status to "26", which means HK - Hold Confirmed.
Note also that for PCTC the element text must be in a specific format to be accepted correctly by the reservation system. See the Agent Reservation guide for more details on this.
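Applying that, the SpecialServiceRequest element from the question would look something like this (same sample data as above; only the Status value changes, and the Text content still needs to follow the PCTC format rules mentioned):
<SpecialServiceRequest SSRCode="PCTC" Status="26" TravelerRefNumberRPHList="1">
  <Airline Code="XS"/>
  <Text>DOCTOR DR/XS222H.SISTER</Text>
</SpecialServiceRequest>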