CloudWatch - Graph Log Stream consumption

I am trying to graph IncomingLogEvents or IncomingBytes or PutLogEvents or whatever metric would help me understand which log streams are sending the most logs to a specific log group in CloudWatch.
Did anyone run into this? I was able to graph the IncomingBytes metric for log groups, but not for the log streams within those log groups.
I have several containers in my environment and each container sends its logs through a separate log stream within the same log group.
Costs suddenly started rising due to errors on the containers, and I was able to identify which log group is causing it, but I cannot find a way to identify which log stream it is.
Docker does not help here either, and of course I can check the container logs, but I want to be able to alert on an increase, i.e. detect when a log stream is sending more logs than normal, since that would alert me to cost increases as well as to errors.
I know I can monitor log errors with other log centralization tools, or even with CloudWatch, but I need to know from CloudWatch which container is sending the highest amount of logs.
There used to be a metric for this, but it got deprecated and I cannot find any documentation that would help me use whatever metric or solution replaced it.
The metric was storedBytes ("storedBytes": 0), which since its deprecation always returns 0.
Thank you for any help you can provide, and I hope this question helps others achieve their goals too.
CloudWatch dashboard: I can build this graph for log groups only.
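Since storedBytes is no longer populated and the Incoming* metrics only exist per log group, one possible workaround is a CloudWatch Logs Insights query that aggregates message size per log stream. Below is a minimal Python/boto3 sketch of that idea; the log group name and time window are placeholders, and message length is only a proxy for ingested bytes.

import time
import boto3

logs = boto3.client("logs")

# Approximate per-stream log volume over the last hour.
# strlen(@message) is used as a proxy for bytes since storedBytes is deprecated.
query = logs.start_query(
    logGroupName="/ecs/my-app",  # placeholder log group
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString=(
        "stats sum(strlen(@message)) as approxBytes, count(*) as events "
        "by @logStream | sort approxBytes desc | limit 20"
    ),
)

# Poll until the query finishes, then print the noisiest streams.
while True:
    response = logs.get_query_results(queryId=query["queryId"])
    if response["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(2)

for row in response["results"]:
    print({field["field"]: field["value"] for field in row})

From there you could publish the result as a custom metric and alarm on it, but that part is left out of this sketch.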

Related

Azure App Insights, is there a way to query for thread count details?

This question is mostly for DevOps experts familiar with App Insights.
I found an issue in my app: it seems some threads are being created and never released, causing the thread count to increase until it eventually ends in the "CGI error", which usually happens when you exceed your quota on some resource.
I already identified that the exceeded resource is the thread count, thanks to the Metrics option, which gives you a graphical representation of how it is being consumed (and released when an app restart happens).
I would like more detail on this: not the aggregated information, but the actual data behind this graph. Any lead would help me understand which place is creating and not releasing threads: a namespace, a class name, anything.
Is there another place where I could get this information in a very detailed way? App Insights queries seem to lack this metric.
Thanks in advance.
AFAIK there is no direct way to do this. The only way that I can see is by adding custom logging inside your application and sending the logs to a Log Analytics Workspace.
Inside your function app in the portal, go to 'Diagnostic settings' and connect it to your Log Analytics workspace (if one doesn't exist, create one).
Inside the Log Analytics workspace you will find your custom logs either under the 'Custom Logs' tab or under the 'Application Insights' tab. After that, find the correct field and parse it, something like:
customMetrics
| extend d = parse_json(customDimensions) // expand the customDimensions payload into a dynamic object
| extend processSessionId = d.processSessionId // pull out the field you care about
For Azure-related topics there is also a decent Q&A platform here:
https://learn.microsoft.com/en-us/answers/products/azure?product=all
For KQL (Kusto Query Language) this is a handy page:
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/tutorial?pivots=azuremonitor
Hope this helps somewhat

How to monitor data traffic in a WAN using Splunk?

My task is actually for self-learning. I know the basics of Splunk and how it works: overall, it needs logs first, and further analysis comes later.
Right now I am working in a small company without a dedicated firewall device; there is just a router. I am planning to keep real-time track of internet speed, data usage by each host, and similar things. Overall, I want to monitor all the network usage in this small WAN environment.
My question is: since Splunk needs logs, how can I get logs of all the network flow when there is just a router and no external device to create logs that can be fed to Splunk later?
I tried Google, but I only find preset software that automatically captures logs and shows analysis, while I just need the logs.
What you need is a way to get logs off your router to somewhere Splunk can parse them.
Most typically, this is done by sending the router's syslog messages to a syslog collector like SC4S.
From there, Splunk can get the messages processed and ingested for analysis.
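If you would rather experiment without SC4S first, Splunk can also receive the router's syslog directly on a UDP input. This is only a minimal sketch of an inputs.conf stanza on the Splunk instance or a heavy forwarder; the port, index name, and sourcetype are assumptions you would adjust, and SC4S remains the better option beyond a lab setup.

# inputs.conf (the router must be configured to forward syslog to this host and port)
# assumes an index called "network" already exists
[udp://514]
sourcetype = syslog
connection_host = ip
index = network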

How to start a VM instance using Cloud Scheduler

Background and Goal
I have a Debian/Linux VM on GCP which I manually start every morning; after it runs, it shuts itself down using a Linux command. I want to automate the start of the VM using Cloud Scheduler. The question asked in GCP auto shutdown and startup using Google Cloud Schedulers has several answers, and I am interested in pursuing the answer (https://stackoverflow.com/a/65062924/10322004) proposed by #nikelone because it seems simple and has also been endorsed by #Damien and #RayFoss as being easy. I am a neophyte in these matters and could not fully comprehend their replies, so this post was created to elicit clearer answers for a person like me.
What I have tried
I have gone to https://cloud.google.com/compute/docs/reference/rest/v1/instances/start (call this page A), tried the API, and was able to successfully start my already-stopped VM when I clicked on the Execute button. I presume this means that my entries were fine and can be used in conjunction with appropriate software like Cloud Scheduler to perform the start function on a predefined schedule. But the problem is that I do not know how to proceed from here. My questions are below.
My Questions
On page A, the last three paragraphs are titled Authorization Scopes, IAM permissions, and Examples, and none of them say anything specific about what the user should do. Is it correct to assume that they have nothing to do with Cloud Scheduler, but are related to other methods of achieving the same goal? If this is not correct, then my next question is: what should I be doing to follow the statements in these three paragraphs?
Assuming that the answer to question 1 is "yes", meaning I can now start scheduling with Cloud Scheduler, I next looked at the quickstart for Cloud Scheduler at https://cloud.google.com/scheduler/docs/quickstart (call this page B). The list of items to do is quite large, including installing the Cloud SDK, running quite a few commands on the console, enabling some features, setting up Pub/Sub, creating a job, running the job, and verifying the results in Pub/Sub. This looks like a daunting set of tasks, and I could not understand why it is necessary to jump through these hoops to use something that was already achieved with just a few keystrokes earlier. So are all these steps necessary? Or is there a way to use Cloud Scheduler directly without going through so many intermediate steps?
Now assume that the answer to question 2 is that I have to perform all steps stated on page B. If I run into some problem while accomplishing the tasks outlined on page B, my VM may get messed up irretrievably. Is there a way in which the Cloud Platform or its components can be used to reset my VM to its current state as of today, which is working fine? I really do not want to end up with something worse than what I have now.
To answer your questions:
Auth scopes and IAM permissions are required for you to call Compute Engine API methods such as instances.start and instances.stop. You need to set the right scope and the right IAM permission on your job, or else it will fail. They are indeed related to the method that you're interested in calling, so you must keep them in mind. What you see in the Examples section are the ways to call the API using different programming languages, so you don't need to pay attention to them, as you will create the job through the Cloud Console. To further address this part, see the full steps I included below.
The answer that you're trying to follow uses an HTTP target, while the quickstart you've linked uses Pub/Sub; they are different because they cover separate use cases. This link shows the proper instructions for creating a scheduler job with an HTTP target. You can create this kind of job straight from the Cloud Console or with a one-line gcloud command (a sketch is included after the steps below). If your config is incorrect, the trigger will not execute the endpoint URL and you will see an error that you must fix.
Addressed in answer #2.
Basically, you just need to follow the instructions in the link you've sent. However, I'll post them here as well, along with my explanation:
Go to https://cloud.google.com/scheduler. Click on Go to Console. Click on Create Job. Fill in the required fields (those with red asterisks) when creating a Scheduler job.
Select HTTP as the target type.
Enter this as your URL (modify the capitalized words).
https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/INSTANCE_ZONE/instances/INSTANCE_NAME/start
Choose HTTP method POST.
Click Show more and under Auth header choose "Add OAuth token".
Enter your service account. This is used to pass an OAuth token when your scheduler job calls the Compute API. Make sure that the service account you use has the "Compute Instance Admin" role, because this role contains the permissions to start/stop your instance. See this instruction on how to grant access to a service account. If you're not sure which service account to use, feel free to use the Compute Engine default service account.
Add this as the Scope:
https://www.googleapis.com/auth/cloud-platform
The description of this scope:
See, edit, configure, and delete your Google Cloud Platform data.
Repeat these steps for the stop-instance job, changing the URL in #3 from /start to /stop.
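For reference, here is a rough sketch of the one-line gcloud equivalent mentioned in answer #2. The job name, schedule, project, zone, instance, and service account are placeholders; depending on your gcloud version you may also need a --location flag.

gcloud scheduler jobs create http start-my-vm \
    --schedule="0 7 * * 1-5" \
    --time-zone="Etc/UTC" \
    --uri="https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/INSTANCE_ZONE/instances/INSTANCE_NAME/start" \
    --http-method=POST \
    --oauth-service-account-email=SERVICE_ACCOUNT_EMAIL \
    --oauth-token-scope=https://www.googleapis.com/auth/cloud-platform

A second job pointing at the same URL with /stop at the end would cover the stop side.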

Setting up AWS CloudWatch Alarms to alert via SNS if in INSUFFICIENT_DATA state

I've been completely unable to get a clear, straight answer about undertaking this, and am at a point where there doesn't seem to be one doc page or forum post to refer to.
Currently, I've got CloudWatch alarms set in place to monitor a variety of variables and alert on breach, through SNS to Slack. One thing that I'm looking to do after the fact is make sure that any alarms in the INSUFFICIENT_DATA state get their own alerts sent to Slack as well. I assumed setting up composite alarms would lead me down the right path, but it doesn't actually look like that method enables me to send out alerts on that state. Getting the CloudWatch agents out to the instances, along with the configs, was done via a Chef recipe.
My infrastructure is on the larger side, and there are at least five metrics being monitored, which leaves me with 1370+ alarms. Getting these alarms to alert on the INSUFFICIENT_DATA state needs to be done, ideally, in one large swathe, or at least in large individual batches. I'm not finding a single, rational way to do this either. Scripting this out could be the answer, sure, but I'm unsure how to tailor the script to grab each and every alarm name and add this specific type of action to them.
If anyone has any idea about how to proceed, I'd be immensely grateful. This is a big blocker right now, so I'm willing to work out a solution one phase at a time. Thanks very much!!
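For the scripting route mentioned above, a rough Python/boto3 sketch could look like the following: it pages through every metric alarm and re-puts each one with an SNS topic appended to its InsufficientDataActions. The topic ARN is a placeholder, and alarms built on metric math, extended statistics, or composite alarms are skipped here and would need extra handling.

import boto3

cloudwatch = boto3.client("cloudwatch")
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:slack-alerts"  # placeholder

paginator = cloudwatch.get_paginator("describe_alarms")
for page in paginator.paginate(AlarmTypes=["MetricAlarm"]):
    for alarm in page["MetricAlarms"]:
        actions = alarm.get("InsufficientDataActions", [])
        if SNS_TOPIC_ARN in actions:
            continue  # this alarm is already wired up
        if "Statistic" not in alarm:
            continue  # metric-math / extended-statistic alarms need separate handling
        # Re-put the alarm with its existing settings plus the new action.
        cloudwatch.put_metric_alarm(
            AlarmName=alarm["AlarmName"],
            AlarmDescription=alarm.get("AlarmDescription", ""),
            ActionsEnabled=alarm["ActionsEnabled"],
            OKActions=alarm.get("OKActions", []),
            AlarmActions=alarm.get("AlarmActions", []),
            InsufficientDataActions=actions + [SNS_TOPIC_ARN],
            MetricName=alarm["MetricName"],
            Namespace=alarm["Namespace"],
            Statistic=alarm["Statistic"],
            Dimensions=alarm.get("Dimensions", []),
            Period=alarm["Period"],
            EvaluationPeriods=alarm["EvaluationPeriods"],
            Threshold=alarm["Threshold"],
            ComparisonOperator=alarm["ComparisonOperator"],
            TreatMissingData=alarm.get("TreatMissingData", "missing"),
        )

At the scale of 1370+ alarms it is worth dry-running this against a handful of alarms first, for example by printing the parameters instead of calling put_metric_alarm.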

How to send data in IIS logs to CloudWatch logs

How can I send the data in IIS logs to Amazon CloudWatch Logs so that I can monitor the performance of my website?
One of the things that I am trying to monitor is the average size of my web requests. I know that IIS logs have the data about the size of a web request (BytesRecv, BytesSent), and I can have CloudWatch Logs read my IIS log files, but what I cannot figure out is a way to tell CloudWatch Logs that BytesRecv and BytesSent should be treated as two data points.
I don't think the CloudWatch Logs service has that capability. When it ingests logs like IIS, you can create simple filters to match something, like 404 errors, and then you can create datapoints on the number of those errors in a given time period. However, I haven't found a way to extract data from logs directly in CloudWatch.
I believe the solution to this problem would be to use Amazon Kinesis to get the log files out of CloudWatch and then process them with EMR to get those data points and then put that information into S3. A lot easier said than done, I know. I think the toughest part of this would be writing your EMR logic and then putting that data into some kind of consolidated format to write to S3. I'd recommend asking for help around that area.
Another option would be to have Amazon Kinesis drop the log files in S3, then trigger an AWS Lambda function when those log files are uploaded. The Lambda function could then parse those log files, extract the information you need, put it into some kind of JSON, XML, etc. and write that to S3. The hard part here is writing the Lambda function. This link describes how to use Lambda to parse CloudTrail logs written to S3, so you could probably follow a lot of that logic to do this.
http://docs.aws.amazon.com/lambda/latest/dg/wt-cloudtrail-events-adminuser.html
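For what it's worth, here is a minimal sketch of the Lambda piece of that approach. It assumes an S3 put trigger, a W3C log layout where sc-bytes is the 14th field, and a destination prefix called summaries/ -- all assumptions to adjust rather than part of the answer above.

import json
import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Triggered by the S3 put event for a raw IIS log file.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    sizes = []
    for line in body.splitlines():
        if line.startswith("#"):  # skip W3C header lines
            continue
        fields = line.split()
        sizes.append(int(fields[13]))  # assumes sc-bytes is the 14th field; adjust the index

    summary = {
        "source": key,
        "requests": len(sizes),
        "avg_bytes_sent": sum(sizes) / len(sizes) if sizes else 0,
    }
    s3.put_object(
        Bucket=bucket,
        Key="summaries/" + key + ".json",
        Body=json.dumps(summary).encode("utf-8"),
    )
    return summary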
If you can get this info into your IIS logs, you can ship them to CloudWatch Logs.
You can send the logs via the EC2Config service or the SSM Agent; more details are documented in this post.
Then you can apply existing filters to your log group, or create a custom metric filter to extract the fields that you want from the logs, i.e. a custom log metric based on log filters, e.g.
[serverip, method, uri, query, port, dash, clientip, useragent, status, zero1, zero2, millis]
or some specific filters.
So you can now either use metric filters as mentioned above or Logs Insights queries to create dashboards.
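To make the metric-filter route concrete, here is a hedged Python/boto3 sketch that turns a bytes-sent field from a space-delimited IIS log line into a custom metric. The log group name, namespace, and the exact field list are assumptions that must match your actual W3C logging configuration.

import boto3

logs = boto3.client("logs")

# Space-delimited metric filter pattern: every field in the IIS log line gets a name,
# so the metric transformation can reference one of them as the metric value.
filter_pattern = (
    "[date, time, serverip, method, uri, query, port, username, clientip, "
    "useragent, status, substatus, win32status, bytes_sent, bytes_recv, time_taken]"
)

logs.put_metric_filter(
    logGroupName="iis-logs",  # placeholder log group
    filterName="iis-bytes-sent",
    filterPattern=filter_pattern,
    metricTransformations=[
        {
            "metricName": "BytesSent",
            "metricNamespace": "Custom/IIS",
            "metricValue": "$bytes_sent",  # publish the extracted field as the datapoint value
        }
    ],
)

Once the filter is in place, graphing the Average statistic of that custom metric gives roughly the average response size the question asks about.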