Our applications (MapReduce jobs, microservices) run entirely on AWS.
We intend to use a single service for viewing logs (for debugging), monitoring them, and raising alarms (notifications based on a threshold).
Are there any specific benefits to using an external service provider like Sumo Logic over the service AWS itself provides (CloudWatch in this case)?
In the interest of full disclosure, I'm an engineer at Sumo Logic. But here is an analysis done by one of my colleagues a few months ago as to why you would want to use Sumo Logic specifically over AWS CloudWatch itself:
You can't easily search across multiple log groups in CloudWatch. In Sumo, you can define metadata to easily query across log groups, and even across log sources outside of AWS, within the same query.
CloudWatch does not include any pre-built apps (out-of-the-box queries, dashboards, and alerts). Sumo Logic provides a wide variety of apps and integrations, all included with your subscription: https://www.sumologic.com/applications/
With CloudWatch, you pay for dashboards, you pay to query your data, and you pay to use its query API. These are all included in your Sumo Logic subscription (depending on the subscription level you choose).
Sumo provides 30 days of log retention out of the box. Data retention is another a la carte cost when using CloudWatch. Sumo Logic also provides you with the ability to forward your logs to S3 for long-term storage outside of our platform.
CloudWatch does not include advanced analytics operators. Sumo Logic includes operators such as Outlier, Log Reduce, and Log Compare as part of the platform.
Regarding search time (Sumo Logic vs. AWS CloudWatch Logs Insights, AWS's CloudWatch log search), here is a quote from a customer with 100 AWS accounts' worth of CloudTrail logs: "We can search all 100 of our accounts in Sumo in the same amount of time it takes us to search 1 account with AWS's CloudWatch."
Sumo Logic also provides Threat Intelligence as part of your subscription, so you can check all of your logs against CrowdStrike's threat database of known IoCs.
Sumo training and certification is included with your subscription.
On a personal note, I can also say that Sumo Logic's post-sales support is top-notch, as we put a huge emphasis on customer success.
Please keep in mind that this analysis by my colleague is a few months old, so some of these items may have been addressed by AWS since then.
I would like to know how we can address this scenario in Azure Log Analytics: I need to generate kube-audit logs from different clusters every week and retain these logs for approximately 400 days. Storing them in Log Analytics will cost me more, and it is not an optimized architecture because I will not need them that often. So I would like to know from experts what the best way is to design the architecture so that the kube-audit logs can be retained for 400 days and queried when required, without incurring too much cost.
PS: I have also heard from my team that querying 400 days of logs always times out in KQL.
Log Analytics offerings:
Log Analytics now provides the capability to manage service tiers at table scope. You can set a table's data to the Archive tier, which cannot be queried directly but costs much less, with retention spans of up to 7 years (a rough configuration sketch follows after the links below).
When needed, you can elevate a subset of your archived data into the Analytics tier, which gives you the capability to query it. This action is called a "search job".
Another option is to elevate an entire time period into the Analytics tier; this is called "restore logs".
Tables' service tiers -
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/data-retention-archive?tabs=api-1%2Capi-2
Search job offering -
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/search-jobs?tabs=api-1%2Capi-2%2Capi-3
Restore logs -
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/restore?tabs=api-1%2Capi-2
All of these are currently in public preview.
Both offerings, search jobs and restore logs, give you the ability to work with your data on demand; I can't comment or make suggestions regarding the actual cost.
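For illustration, here is a minimal sketch of configuring that split (short interactive retention plus long archive retention) on a single table through the Tables REST API referenced in the links above. The table name, retention values, and api-version are placeholders and should be checked against the linked docs, since these capabilities are in preview.

```python
# Minimal sketch: set a Log Analytics table to 30 days of interactive
# retention and 400 days of total (archive) retention via the Tables
# REST API. Names, values, and api-version are illustrative only.
import requests

SUBSCRIPTION_ID = "<subscription-id>"       # placeholders -- fill in your own
RESOURCE_GROUP = "<resource-group>"
WORKSPACE = "<log-analytics-workspace>"
TABLE = "AKSAudit"                          # hypothetical: whichever table holds your kube-audit logs
TOKEN = "<aad-bearer-token>"                # e.g. obtained via azure-identity

url = (
    "https://management.azure.com"
    f"/subscriptions/{SUBSCRIPTION_ID}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    "/providers/Microsoft.OperationalInsights"
    f"/workspaces/{WORKSPACE}/tables/{TABLE}"
)
params = {"api-version": "2021-12-01-preview"}   # check the docs for the current version
body = {
    "properties": {
        "retentionInDays": 30,        # interactive (Analytics) retention
        "totalRetentionInDays": 400,  # total retention; the remainder is archived
    }
}

resp = requests.patch(
    url,
    params=params,
    json=body,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print(resp.json())
```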
Azure Data Explorer solution:
Another option is to use Azure Storage (for example) to hold your data. Azure Data Explorer (ADX) provides the capability to create an external table, which is a logical view on top of your data; the data itself is kept outside the ADX cluster. You can query your data with ADX, but expect some degradation in query performance (a query sketch follows after the link below).
ADX external table offering -
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/schema-entities/externaltables
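As a rough illustration of that setup, here is how you might query such an external table from Python with the azure-kusto-data client. The cluster URL, database name, external table name (KubeAuditArchive), and column names are made-up placeholders.

```python
# Rough sketch: query an ADX external table (data kept in Azure Storage)
# with the azure-kusto-data client. Cluster, database, table, and column
# names are placeholders.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://mycluster.westeurope.kusto.windows.net"   # placeholder cluster
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

# external_table() addresses the logical view; the blobs behind it stay
# outside the ADX cluster, so expect slower queries than native tables.
query = """
external_table('KubeAuditArchive')
| where todatetime(stageTimestamp) > ago(30d)
| summarize count() by verb
"""

response = client.execute("auditdb", query)   # placeholder database name
for row in response.primary_results[0]:
    print(row["verb"], row["count_"])
```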
I have some app data that is currently stored in Splunk. But I am looking for a way to feed the Splunk data directly into BigQuery. My goal is to analyze the app data in BigQuery and perhaps create Data Studio dashboards based on it.
I know there are a lot of third-party connectors that can help me with this, but I am looking for a solution where I can use features of Splunk or BigQuery to connect the two together without relying on third-party connectors.
Based on your comment indicating that you're interested in resources to egress data from Splunk into BigQuery with custom software, I would suggest using each tool's REST API.
You don't indicate whether this is a one-time or a recurring task; that may affect where you want the software that performs this operation to run. If it's a one-time thing and you have a decent internet connection, you may just want to write a console application and run the migration from your own machine. If it's a recurring operation, you might instead look at the various "serverless" hosting options out there (e.g. Azure Functions, Google Cloud Functions, or AWS Lambda). Beyond the development experience, note that you may have to pay an egress bandwidth cost on top of normal service charges.
Beyond that, you need to decide whether it makes more sense to do a bulk export from Splunk to an external file that you load into Google Drive and then import into BigQuery, or to download the records as paged data via HTTPS so you can perform some ETL on top of them (e.g. replace nulls with empty strings, update datetime values to match Google's exacting standards, etc.). If you go the latter route, it looks as though this is the documentation you'd use from Splunk, and on the BigQuery side you can use either Google's newer, higher-performance Storage Write API or their legacy streaming API for ingestion. Both options have SDKs across varied languages (e.g. C#, Go, Ruby, Node.js, Python, etc.), though only the legacy streaming API supports plain HTTP REST calls.
Finally, don't forget about OAuth2 authentication on both sides of the operation, though this is typically abstracted away by the SDKs offered by either party, so it's rarely something you have to deal with in detail. A rough sketch of the paged-download-plus-streaming route follows.
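Here is that sketch (not production code): it pulls results from Splunk's search/jobs/export REST endpoint and pushes them into BigQuery with the legacy streaming insert. Hosts, credentials, the Splunk search, and the target table are placeholders, and the exact shape of Splunk's JSON output should be verified against its documentation.

```python
# Rough sketch of the "streamed download + streaming insert" route:
# pull search results from Splunk's REST API and push them into
# BigQuery with the legacy streaming API.
import json
import requests
from google.cloud import bigquery

SPLUNK_URL = "https://splunk.example.com:8089/services/search/jobs/export"
SPLUNK_AUTH = ("admin", "<password>")           # or a session/bearer token
BQ_TABLE = "my-project.app_dataset.app_events"  # hypothetical target table

bq_client = bigquery.Client()

# Stream results out of Splunk as newline-delimited JSON.
resp = requests.post(
    SPLUNK_URL,
    auth=SPLUNK_AUTH,
    data={"search": "search index=app_data earliest=-24h", "output_mode": "json"},
    verify=False,   # only if you haven't set up a trusted Splunk TLS cert
    stream=True,
)
resp.raise_for_status()

batch = []
for line in resp.iter_lines():
    if not line:
        continue
    event = json.loads(line)
    result = event.get("result")
    if not result:
        continue
    # Light ETL, as mentioned above: replace nulls with empty strings.
    batch.append({k: ("" if v is None else v) for k, v in result.items()})
    if len(batch) >= 500:
        errors = bq_client.insert_rows_json(BQ_TABLE, batch)
        assert not errors, errors
        batch = []

if batch:
    errors = bq_client.insert_rows_json(BQ_TABLE, batch)
    assert not errors, errors
```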
Does GCP have a job scheduling service like Azure Scheduler, where jobs can be scheduled and managed dynamically via API?
Google's cron service is configured in a static file, and it seems their answer to this is to use it to poke a roll-your-own service backed by Pub/Sub and a data store. I'm looking for Quartz-like functionality, consumable by App Engine, that can be managed and invoked via an API, as opposed to managing a cluster, queue, and compute instance/VM deployment of Quartz (or the like) or rolling a custom solution. It should support on the order of 50 million jobs per day, with retry/recoverability and dynamic per-tenant scheduling.
This is the cheapest and easiest way I can imagine building a solution today on top of an existing App Engine-based project:
As you observed, currently there is no such API/service directly available on GCP. There is an open feature request (on GAE) for it.
But, also as you observed, it is possible to build and use a custom solution, just like the one you proposed.
Depending on the context even simpler solutions are possible. For a GAE context check out, for example, How to schedule repeated jobs or tasks from user parameters in Google App Engine?.
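To give a feel for the roll-your-own pattern mentioned above, here is a bare-bones sketch: a static cron entry hits an endpoint every minute, which reads dynamically scheduled jobs from Datastore and fans them out over Pub/Sub. The entity kind, topic, and field names are invented for illustration, and real per-tenant scheduling, retries, and sharding at 50M jobs/day would need considerably more care.

```python
# Bare-bones sketch of "static cron pokes a dynamic scheduler":
# a cron.yaml entry hits /run-due-jobs every minute; the handler reads
# dynamically scheduled jobs from Datastore and publishes them to Pub/Sub.
import datetime

from flask import Flask
from google.cloud import datastore, pubsub_v1

app = Flask(__name__)
ds = datastore.Client()
publisher = pubsub_v1.PublisherClient()
TOPIC = publisher.topic_path("my-project", "scheduled-jobs")  # hypothetical topic

@app.route("/run-due-jobs")
def run_due_jobs():
    now = datetime.datetime.utcnow()
    query = ds.query(kind="Job")                 # hypothetical kind
    query.add_filter("next_run", "<=", now)
    for job in query.fetch(limit=1000):
        # Publish the job payload; a worker subscribed to the topic does the work,
        # and Pub/Sub redelivery provides basic retry behaviour.
        publisher.publish(TOPIC, job["payload"].encode("utf-8"), tenant=job["tenant"])
        # Naive rescheduling: fixed interval stored on the job entity.
        job["next_run"] = now + datetime.timedelta(seconds=job["interval_seconds"])
        ds.put(job)
    return "ok"
```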
I am writing software that creates a large graph database. The software needs to access dozens of different REST APIs, making millions of requests in total. The data will then be processed by a Hadoop cluster. Each of these APIs has rate limits that vary by requests per second, per window, per day, and per user (typically via OAuth).
Does anyone have any suggestions on how I might use either a Map function or another Hadoop-ecosystem tool to manage these queries? The goal would be to leverage the parallel processing in Hadoop.
Because of the varied rate limits, it often makes sense to switch to a different API query while waiting for the first limit to reset. An example would be one API call that creates nodes in the graph and another that enriches the data for that node. I could have the system go out and enrich the data for the new nodes while waiting for the first API limit to reset.
I have tried using SQS queuing on EC2 to manage the various API limits and states (creating a queue for each API call), but have found it to be ridiculously slow.
Any ideas?
It looks like the best option for my scenario will be Storm, or specifically the Trident abstraction. It gives me the greatest flexibility for both workload management and process management.
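Independent of the framework that ends up executing the work, the juggling act described in the question (switch to whichever API currently has budget left) can be sketched with simple per-API token buckets. This is purely illustrative, with hypothetical API names and limits, and is not a Storm/Trident implementation:

```python
# Illustrative token-bucket multiplexer: keep a per-API budget and always
# work on whichever API has tokens left instead of blocking on one limit.
# API names, rates, and the work queues are hypothetical.
import time
from collections import deque

class TokenBucket:
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_take(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Hypothetical per-API limits and pending request queues.
buckets = {"create_node": TokenBucket(2, 10), "enrich_node": TokenBucket(5, 20)}
queues = {"create_node": deque(), "enrich_node": deque()}

def run(call_api):
    """call_api(api_name, request) performs the actual HTTP call."""
    while any(queues.values()):
        progressed = False
        for api, queue in queues.items():
            if queue and buckets[api].try_take():
                call_api(api, queue.popleft())
                progressed = True
        if not progressed:
            time.sleep(0.1)  # every bucket is empty; wait for limits to refill
```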
I'm building an app on top of Amazon S3. How can I keep my S3 usage under a set budget? Suppose I don't want unexpected traffic to overcharge my AWS account; I'd rather the app become unavailable.
There is no way to set a budget for AWS. But this feature is being requested very often, so probably one day it will be implemented.
https://forums.aws.amazon.com/thread.jspa?threadID=58127
AWS announced the general availability of the functionality to Monitor Estimated Charges Using Billing Alerts via Amazon CloudWatch on May 10, 2012 (which, according to Daniel Lopez's answer [+1], had already been available to AWS premium accounts since the end of 2011):
We regularly estimate the total monthly charge for each AWS service that you use. When you enable monitoring for your account, we begin storing the estimates as CloudWatch metrics, where they'll remain available for the usual 14 day period. [...]
As outlined in the introductory blog post, you can start by using the billing alerts to let you know when your AWS bill will be higher than expected; see Monitor Your Estimated Charges Using Amazon CloudWatch for more details on this functionality.
This is already pretty useful for many basic needs; however, using the CloudWatch APIs to retrieve the stored metrics yourself (see the GetMetricStatistics API and Getting Statistics for a Metric for usage samples) actually allows you to drive arbitrary workflows and business logic based upon this data.
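For example, here is a small sketch using today's boto3 to pull the EstimatedCharges metric yourself (billing metrics are published in us-east-1 and require billing alerts to be enabled on the account); the statistics period and any follow-up action are up to you:

```python
# Sketch: read the month-to-date EstimatedCharges metric via GetMetricStatistics.
from datetime import datetime, timedelta

import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")  # billing metrics live in us-east-1
resp = cw.get_metric_statistics(
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=21600,              # billing estimates are only updated a few times a day
    Statistics=["Maximum"],
)

points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
if points:
    latest = points[-1]["Maximum"]
    print(f"Estimated month-to-date charges: ${latest:.2f}")
    # From here you can drive whatever workflow you like, e.g. stop instances.
```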
Regarding those estimates, their limits are stressed as well:
It is important to note that these are estimates, not predictions. The estimate approximates the cost of your AWS usage to date within the current billing cycle and will increase as you continue to consume resources. [...] It does not take trends or potential changes in your AWS usage pattern into account. [emphasis mine]
It seems there is still no solution provided by Amazon.
Take a look at Amazon Price-Watcher - Monitor your bill and auto-shut down your instances:
So here is a basic script I've put together in Python which will sit and monitor the current price of your instance, and shut it down if it goes over a certain price-limit. (In the future, this can be changed to maybe throttling incoming bandwidth, or emailing the admin).
As of December 2011, if you have an AWS premium account, you can use CloudWatch to monitor your estimated charges, and if they go over a certain limit you can trigger different actions (such as shutting down a machine); see the sketch after the link below.
http://blog.bitnami.org/2011/12/monitor-your-estimated-aws-charges-with.html
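For completeness, here is a minimal boto3 sketch of that kind of billing alarm, notifying an SNS topic whose subscribers could then take an action such as shutting machines down; the threshold and topic ARN are placeholders:

```python
# Sketch: alarm when estimated month-to-date charges exceed a budget.
import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")  # billing metrics live in us-east-1
cw.put_metric_alarm(
    AlarmName="estimated-charges-over-100-usd",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,               # 6 hours; estimates update a few times a day
    EvaluationPeriods=1,
    Threshold=100.0,            # placeholder budget in USD
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # placeholder topic
)
```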