How to send data in IIS logs to CloudWatch logs - amazon-cloudwatch

How can I send the data in IIS logs to Amazon CloudWatch Logs so that I can monitor the performance of my website?
One of the things I am trying to monitor is the average request size of my web requests. I know that IIS logs contain data about the size of each web request (BytesRecv, BytesSent), and I can have CloudWatch Logs read my IIS log files, but what I cannot figure out is a way to tell CloudWatch Logs that BytesRecv and BytesSent should be treated as two datapoints.

I don't think the CloudWatch Logs service has that capability. When it ingests logs such as IIS logs, you can create simple filters to match something, like 404 errors, and then create datapoints on the number of those errors in a given time period. However, I haven't found a way to extract numeric values from logs directly in CloudWatch.
I believe the solution to this problem would be to use Amazon Kinesis to get the log files out of CloudWatch, process them with EMR to derive those datapoints, and then put that information into S3. A lot easier said than done, I know. I think the toughest part would be writing your EMR logic and then putting that data into some kind of consolidated format to write to S3. I'd recommend asking for help around that area.
Another option would be to have Amazon Kinesis drop the log files in S3 and then trigger an AWS Lambda function when those log files are uploaded. The Lambda function could parse the log files, extract the information you need, put it into some kind of JSON, XML, etc., and write that to S3. The hard part here is writing the Lambda function. This link describes how to use Lambda to parse CloudTrail logs written to S3, so you could probably follow a lot of that logic to do this.
http://docs.aws.amazon.com/lambda/latest/dg/wt-cloudtrail-events-adminuser.html
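As a rough illustration of that second approach, here is a minimal sketch of what such a Lambda handler might look like, assuming the IIS logs are written in the default W3C format (with sc-bytes and cs-bytes columns); the output bucket name and key layout are placeholders, not anything from the original post:

```python
import json
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket for the summarized output -- replace with your own.
OUTPUT_BUCKET = "my-iis-log-summaries"

def handler(event, context):
    """Triggered by an S3 put event; parses a W3C-format IIS log file
    and writes a small JSON summary (total/average bytes) back to S3."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    fields = []
    total_sent = total_recv = requests = 0
    for line in body.splitlines():
        if line.startswith("#Fields:"):
            fields = line.split()[1:]          # column names, e.g. sc-bytes, cs-bytes
        elif line and not line.startswith("#") and fields:
            values = dict(zip(fields, line.split()))
            total_sent += int(values.get("sc-bytes", 0))
            total_recv += int(values.get("cs-bytes", 0))
            requests += 1

    summary = {
        "source_log": f"{bucket}/{key}",
        "requests": requests,
        "total_bytes_sent": total_sent,
        "total_bytes_received": total_recv,
        "avg_bytes_sent": total_sent / requests if requests else 0,
    }
    s3.put_object(
        Bucket=OUTPUT_BUCKET,
        Key=f"summaries/{key}.json",
        Body=json.dumps(summary).encode("utf-8"),
    )
    return summary
```

Instead of (or in addition to) writing JSON back to S3, the same handler could publish the totals as custom metrics with CloudWatch's put_metric_data.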

If you can get this information into your IIS logs, you can ship them to CloudWatch Logs.
You can send the logs via the EC2Config service or the SSM Agent; more details are documented in this post.
Then you can apply existing filters to your log group or create a custom filter to extract the fields you want from the logs, i.e., a custom log metric based on log filters, e.g.
[serverip, method, uri, query, port, dash, clientip, useragent, status, zero1, zero2, millis]
or some more specific filters.
So you can now either use metric filters as mentioned above or CloudWatch Logs Insights queries for creating dashboards.
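As a hedged boto3 sketch of creating such metric filters: the log group name and the field order below are assumptions you would adjust to match the columns your IIS logs actually emit (the bytes fields are not in the pattern shown above), and since each metric filter carries a single metric transformation, the two byte fields need two filters:

```python
import boto3

logs = boto3.client("logs")

LOG_GROUP = "my-iis-log-group"   # placeholder -- your IIS log group
# Space-delimited pattern: field names/positions are assumptions; adjust them
# so bytes_sent / bytes_recv line up with the byte columns in your logs.
PATTERN = ("[serverip, method, uri, query, port, dash, clientip, useragent, "
           "status, bytes_sent, bytes_recv, millis]")

for name, value in [("IISBytesSent", "$bytes_sent"),
                    ("IISBytesRecv", "$bytes_recv")]:
    logs.put_metric_filter(
        logGroupName=LOG_GROUP,
        filterName=name,
        filterPattern=PATTERN,
        metricTransformations=[{
            "metricName": name,
            "metricNamespace": "IIS",
            "metricValue": value,   # publish the extracted field as the datapoint
            "defaultValue": 0,
        }],
    )
```

Once the filters are in place, the extracted values appear as CloudWatch metrics in the chosen namespace, and you can chart their Average or Sum statistics on a dashboard or alarm on them.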

Related

CloudWatch - Graph Log Stream consumption

I am trying to graph IncomingLogEvents, IncomingBytes, PutLogEvents, or whatever metric would help me understand which log streams are sending the most logs to a specific log group in CloudWatch.
Did anyone run into this? I was able to graph the IncomingBytes metric for log groups, but not for log streams within those log groups.
I have several containers in my environment, and each container sends its logs through a separate log stream within the same log group.
Suddenly costs started rising due to errors in the containers. I was able to identify which log group is causing it, but I cannot find a way to identify which log stream is.
Docker does not help either, and of course I can check container logs, but I want to be able to alert on an increase, i.e., detect when a log stream is sending more logs than normal, as that will alert me on cost increases as well as on errors.
I know I can monitor log errors with other log-centralization tools or even with CloudWatch, but I need to know from CloudWatch which container is the one sending the highest amount of logs.
There used to be a metric that got deprecated, and I cannot find any documentation that would help me use whatever metric or solution replaced it.
The metric was "storedBytes": 0, which since its deprecation will always be 0.
Thank you for any help you can provide, and I hope this question helps others achieve their goals too.
You can build a CloudWatch dashboard for this, but only at the log-group level, not per log stream.
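One workaround worth trying (just a sketch, not confirmed against your setup) is a CloudWatch Logs Insights query grouped by @logStream, which at least tells you which streams produced the most events over a window; for example, run from Python with boto3:

```python
import time
import boto3

logs = boto3.client("logs")
LOG_GROUP = "/my/app/log-group"   # placeholder -- the log group whose cost spiked

# Count events per log stream over the last hour. You could sum
# strlen(@message) instead if you care about bytes rather than event counts.
query = "stats count(*) as events by @logStream | sort events desc | limit 20"

end = int(time.time())
start_resp = logs.start_query(
    logGroupName=LOG_GROUP,
    startTime=end - 3600,
    endTime=end,
    queryString=query,
)

# Poll until the query completes, then print each stream name and its count.
while True:
    result = logs.get_query_results(queryId=start_resp["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({f["field"]: f["value"] for f in row})
```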

Send Google Home data to Splunk

I'm trying to send data generated by a Google Home mesh network to a Splunk instance. I'd especially like to capture which devices are connected to which points throughout the day. This information is available in the app, but it does not seem possible to stream it to a centralized logging platform. Is there a way I'm not seeing?
There are two methods of ingesting Google Cloud data supported by Splunk:
--Push-based method: data is sent to a Splunk HTTP Event Collector (HEC) through a Pub/Sub to Splunk Dataflow job.
--Pull-based method: data is fetched from the Google Cloud APIs through the Splunk Add-on for Google Cloud Platform.
It is generally recommended that you use the push-based method to ingest Google Cloud data into Splunk. The pull-based method is recommended only in certain cases, namely:
--Your Splunk deployment does not offer a Splunk HEC endpoint.
--Your log volume is low.
--You want to pull Cloud Monitoring metrics, Cloud Storage objects, or low-volume logs.
--You are already managing one or more Splunk heavy forwarders or are using a hosted Inputs Data Manager for Splunk Cloud.
More information about how to set this up and work through the problem can be found at the following link:
https://cloud.google.com/architecture/exporting-stackdriver-logging-for-splunk
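For reference, the push side ultimately boils down to posting JSON events to a Splunk HEC endpoint. Here is a minimal, hedged sketch of doing that directly (the host, port, token, and sourcetype are placeholders; in the recommended setup the Pub/Sub to Splunk Dataflow template handles this for you):

```python
import requests

# Placeholders -- your Splunk host and a valid HEC token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def send_event(event: dict, sourcetype: str = "google:cloud:pubsub") -> None:
    """Post a single event to the Splunk HTTP Event Collector."""
    resp = requests.post(
        HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        json={"event": event, "sourcetype": sourcetype},
        timeout=10,
        verify=True,   # point this at your CA bundle if Splunk uses a private cert
    )
    resp.raise_for_status()

send_event({"message": "device connected", "device": "living-room-speaker"})
```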

How to monitor data traffic in WAN using splunk?

My task is actually for self-learning. I know the basics of Splunk and how it works; overall it needs logs, and further analysis is a later step.
Right now, I am working in a small company without a dedicated firewall device; there is just a router. I am planning to keep real-time track of internet speed and data usage by each host, and similar things. Overall, I want to monitor all the network usage in this small WAN environment.
My question is: since Splunk needs logs, how can I get a log of all the network flows when there is just a router and no external device to create logs that can be fed to Splunk later?
I tried Google, but I only find pre-built software that automatically captures logs and shows analysis, while I just need the logs.
What you need is a way to get logs off your router to somewhere Splunk can parse them.
Most typically, this is done by sending the router's syslog messages to a syslog collector like SC4S.
From there, Splunk can get the messages processed and ingested for analysis.
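If you just want to experiment before standing up SC4S, a bare-bones alternative (purely a learning sketch, not a production collector) is a small UDP listener that writes whatever the router sends to a file that Splunk then monitors; point the router's syslog output at this host and port:

```python
import socket

LISTEN_ADDR = ("0.0.0.0", 5514)           # use 514 if running with root privileges
LOG_FILE = "/var/log/router/router.log"   # a path your Splunk input monitors

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(LISTEN_ADDR)

# Append each incoming syslog datagram, prefixed with the sender's IP,
# so Splunk can attribute traffic to the right device.
with open(LOG_FILE, "a", buffering=1) as out:
    while True:
        data, (src_ip, _) = sock.recvfrom(8192)
        out.write(f"{src_ip} {data.decode(errors='replace').rstrip()}\n")
```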

How to start a VM instance using Cloud Scheduler

Background and Goal
I have a Debian/Linux VM on GCP which I manually start every morning; after it runs, it shuts itself down using a Linux command. I want to automate the start of the VM using Cloud Scheduler. The question asked in GCP auto shutdown and startup using Google Cloud Schedulers has several answers, and I am interested in pursuing the answer (https://stackoverflow.com/a/65062924/10322004) proposed by #nikelone because it seems simple and has been endorsed by #Damien and #RayFoss as being easy. I am a neophyte in these matters and could not fully comprehend their replies, so this post was created to elicit clearer answers for a person like me.
What I have tried
I have gone to https://cloud.google.com/compute/docs/reference/rest/v1/instances/start (call this page A), tried the API, and was able to successfully start my already-stopped VM when I clicked the Execute button. I presume this means that my entries were fine and can be used with appropriate software like Cloud Scheduler to perform the start on a predefined schedule. But the problem is that I do not know or understand how to proceed from here. My questions are below.
My Questions
On page A, the last three paragraphs are titled Authorization Scopes, IAM permissions, and Examples, and none of them say anything specific about what the user should do. Is it correct to assume that they have nothing to do with Cloud Scheduler, but relate to other methods of achieving the same goal? If this is not correct, then my next question is: what should I be doing to follow the statements in these three paragraphs?
Assuming the answer to question 1 is "yes", meaning I can now start scheduling with Cloud Scheduler, I next looked at the quickstart for Cloud Scheduler at https://cloud.google.com/scheduler/docs/quickstart (call this page B). The list of things to do is quite large: installing the Cloud SDK, running quite a few commands on the console, enabling some features, setting up Pub/Sub, creating a job, running the job, and verifying the results in Pub/Sub. This looks like a daunting set of tasks, and I could not understand why it is necessary to jump through these hoops to use something that was already achieved with just a few keystrokes earlier. So are all these steps necessary? Or is there a way to use Cloud Scheduler directly without going through so many intermediate steps?
Now assume that the answer to question 2 is that I have to perform all the steps stated on page B. If I run into a problem while accomplishing those tasks, my VM may get messed up irretrievably. Is there a way in which the Cloud Platform or its components can reset my VM to its current state as of today, which is working fine? I really do not want to end up with something worse than what I have now.
To answer your questions:
Auth scopes and IAM permissions are required for you to call Compute Engine API methods such as instances.start and instances.stop. You need to set the right scope and the right IAM permission on your job or it will fail, so they are indeed related to the method you're interested in calling and you must keep them in mind. What you see under Examples are ways to call the API using different programming languages, so you don't need to pay attention to them since you will create the job through the Cloud Console. To further address this part, see the full steps I included below.
The answer that you're trying to follow uses an HTTP target, while the quickstart you've linked uses Pub/Sub; they are different because they have separate use cases. This link shows proper instructions for creating a Scheduler job with an HTTP target. You can create this kind of job straight from the Cloud Console or with a one-liner gcloud command. If your config is incorrect, the trigger will not execute the endpoint URL and you will see an error that you must fix.
Addressed in answer #2.
Basically, you just need to follow the instructions in the link you've sent. However, I'll post them here as well, along with my explanation:
Go to https://cloud.google.com/scheduler, click Go to Console, then click Create Job. Fill in the required fields (those with red asterisks) when creating the Scheduler job.
Select HTTP as target type.
Enter this as your URL (modify the capitalized words).
https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/INSTANCE_ZONE/instances/INSTANCE_NAME/start
Choose HTTP method POST.
Click Show more and choose the Auth header "Add OAuth Token".
Enter your service account. This is used to pass an OAuth token when your Scheduler job calls the Compute API. Make sure that the service account you will use has the "Compute Instance Admin" role, because this role contains the permissions to start/stop your instance. See this instruction on how to grant access to a service account. If you're not sure which service account to use, feel free to use the Compute Engine default service account.
Add this as the Scope:
https://www.googleapis.com/auth/cloud-platform
The description of this scope:
See, edit, configure, and delete your Google Cloud Platform data.
Repeat these steps for the Stop instance job, changing the URL in step 3 to end with /stop.
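If you prefer code over the Console, the same job can be created with the Cloud Scheduler client library. The sketch below mirrors the steps above; the project, region, zone, instance name, schedule, and service account are placeholders, and it assumes the google-cloud-scheduler package is installed:

```python
from google.cloud import scheduler_v1

PROJECT = "my-project-id"          # placeholder
REGION = "us-central1"             # region where the Scheduler job lives
ZONE = "us-central1-a"             # zone of the VM
INSTANCE = "my-vm"                 # instance name
SERVICE_ACCOUNT = "my-sa@my-project-id.iam.gserviceaccount.com"

client = scheduler_v1.CloudSchedulerClient()
parent = f"projects/{PROJECT}/locations/{REGION}"

job = scheduler_v1.Job(
    name=f"{parent}/jobs/start-{INSTANCE}",
    schedule="0 7 * * *",          # every day at 07:00
    time_zone="Etc/UTC",
    http_target=scheduler_v1.HttpTarget(
        uri=(f"https://compute.googleapis.com/compute/v1/projects/{PROJECT}"
             f"/zones/{ZONE}/instances/{INSTANCE}/start"),
        http_method=scheduler_v1.HttpMethod.POST,
        # OAuth token of a service account with the Compute Instance Admin role.
        oauth_token=scheduler_v1.OAuthToken(
            service_account_email=SERVICE_ACCOUNT,
            scope="https://www.googleapis.com/auth/cloud-platform",
        ),
    ),
)

client.create_job(parent=parent, job=job)
```

Creating a second job whose URI ends with /stop gives you the shutdown counterpart.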

Possible to get image from Amazon S3 but create it if it doesn't exist

I'm not sure how to word the question, but here is what I am looking to do.
I have a site that uses custom map tile overlays on a google map.
The javascript calls a php file on my server that checks to see if an existing map tile exists for the given x, y, and zoom level.
If it exists, it displays that image using file_get_contents.
If it doesn't exist, it creates the new tile then displays it.
I would like to use Amazon S3 to store and serve the images, since there could end up being a lot of them and my server is slow. If I have my script check whether the image exists on Amazon and then display it, I am guessing I am not getting the benefits of the speed of Amazon's CDN. Is there a way to do this?
Or is there a way to try to pull the file from Amazon first, then set something up on Amazon to redirect to my script if the file's not there?
Maybe host the script on another of Amazon's services? The tile generation is also quite slow in some cases.
Thanks
Ideas:
1 - Use CloudFront, but point it to a cluster of tile-generation machines. This way, you can generate tiles on demand, and any future requests are served right from CloudFront.
2 - Use CloudFront, but back it with an S3 store of generated tiles. Turn on logging for the S3 bucket so you can detect failed requests, then consume those logs on a schedule and generate the missing tiles (see the sketch after this list for the basic check-and-generate logic). This is a cheaper way of generating tiles, but it means that when a tile request fails, the user gets nothing.
3 - Just pre-generate all the tiles. Throw tasks into an SQS queue, then spin up a collection of EC2 instances to generate the tiles. This will cost the most up front, but all users get a fast experience.
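As a sketch of the check-and-generate fallback behind ideas 1 and 2: the bucket name, key layout, and the render_tile helper below are hypothetical stand-ins for your existing PHP logic, and the real version would sit behind CloudFront.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-map-tiles"   # hypothetical tile bucket fronted by CloudFront

def render_tile(x: int, y: int, zoom: int) -> bytes:
    """Placeholder for your existing (slow) tile-generation code."""
    raise NotImplementedError

def get_or_create_tile(x: int, y: int, zoom: int) -> bytes:
    key = f"tiles/{zoom}/{x}/{y}.png"
    try:
        # Serve the tile straight from S3 if it has already been generated.
        return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    except ClientError as err:
        if err.response["Error"]["Code"] not in ("NoSuchKey", "404"):
            raise
    # Otherwise generate it once, cache it in S3, and return it.
    tile = render_tile(x, y, zoom)
    s3.put_object(Bucket=BUCKET, Key=key, Body=tile,
                  ContentType="image/png",
                  CacheControl="public, max-age=86400")
    return tile
```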
I've written a blog post with a strategy for dealing with this. It's designed to make intelligent and thrifty use of CloudFront, maximize caching, and deal with new versions of existing images. You may find the technique described there helpful. The example code shows how to handle different dimensions (e.g., thumbnails) of images; you could modify it to handle different zoom levels.
I need to update that post to support CloudFront custom origins, and I think that for your application you might be better off skipping S3 and using a custom origin. The advantage of a custom origin is simply that it's probably going to be easier to manage all of your images on your local filesystem compared to managing them on S3.