Is there a way to stop CloudWatch Logs Insights from searching after a first match? - amazon-cloudwatch

I am searching through a week's worth of flow logs to check whether an IP address is present or not; however, whenever there's a match, the query still continues, consuming resources and time.
How do I query and return only the latest event matching an IP address?
I have set limit = 1, but the query still continues.
Sample query:
filter isIpv4InSubnet(srcAddr, "127.0.0.1/32") | limit 1
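For the "latest event" part specifically, the usual Logs Insights pattern is to sort before limiting; a minimal sketch using the question's subnet (note this bounds the result set but does not short-circuit the scan):
filter isIpv4InSubnet(srcAddr, "127.0.0.1/32")
| sort @timestamp desc
| limit 1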

Related

Reducing database load from consecutive queries

I have an application which calls the database multiple times to achieve one simple goal.
A little information about this application: in short, it scrapes data from a webpage and stores specific information from that page into a database. The important fields in this context are player name, position, kill points, and class:
Player name - has every potential to change or remain the same from day to day.
Position - there can be multiple players sitting at one specific position.
Kill points - have the potential to increase or remain the same from day to day.
Class - there are only 2 possibilities a class can be. For example, A can change to B or remain A (the same in reverse), but it cannot be C, D, E, or F.
The player name can change on any particular day, and position can change depending on the kill-point increase since the last update, which brings us back to the goal: search the database day by day, from the current date back as far as 2021-02-22, starting at the most recent entry for a player name and backtracking to the previous day to check whether that player name is still the same or has changed.
The main reference for detecting a change is the kill points. As the days go on, this number will either stay exactly the same or increase; it can never decrease.
So now onto the implementation of this application.
The first query that runs finds the most recent entry for the player name:
SELECT TOP(1) * FROM [changes] WHERE [CharacterName]=#charname AND [Territory]=#territory AND [Archived]=0 ORDER BY [Recorded] DESC
It then continues to check the previous day's entries with the following query:
SELECT TOP(1) * FROM [changes] WHERE [Territory]=#territory AND [CharacterName]=#charname AND [Recorded]=#searchdate AND ([Class] LIKE '%{Class}%' OR [Class] LIKE '%{GetOpposite(Class)}%') AND [Archived]=0
If no results are found, it then proceeds to find an alternative name with the following query:
SELECT TOP(5) * FROM [changes] WHERE [Kills] <= #kills AND [Recorded]='{Data.Recorded.AddDays(-1):yyyy-MM-dd}' AND [Territory]=#territory AND [Mode]=#mode AND ([Class] LIKE #original OR [Class] LIKE #opposite) AND [Archived]=0 ORDER BY [Kills] DESC
The aim of the query above is to get the top 5 entries that are the closest possible matches, and then cross-reference them against the day ahead:
SELECT COUNT(*) FROM [changes] WHERE [CharacterName]=#CharacterName AND [Territory]=#Territory AND [Recorded]=#SearchedDate AND [Archived]=0
When checking the day ahead: if the character name is not found in the day ahead, then this is considered to be the old player name for this specific character; otherwise, if all 5 of the results are found to be present in the day-ahead searches, then this name is considered to be new to the table.
From the date this application started running up to today, that adds up to over 400 individual queries against the database to achieve one goal.
It is also worth noting that this table grows by 14,400-14,500 rows each and every day.
The overall question: is it possible to consolidate all these queries into fewer calls to the database, reduce the number of queries, and improve performance?
What you can do to improve performance will be based on what parts of the application stack you can manipulate. Things to try:
Store Less Data - Database content retrieval speed is largely based on how well the database is ordered/normalized and how much data needs to be searched for each query. Keeping a cache of previously scraped pages and only storing data when something has changed between the current scrape and the last one would guarantee fewer redundant requests to the db.
Separate specific classes of data - Separating data into dedicated tables would allow you to query a specific table for a specific character, etc., effectively removing one WHERE clause.
Reduce time between queries - Fewer incoming concurrent requests mean less resource contention and faster response times to prior requests.
Use another data structure - The only reason you're using TOP() is that you need data ordered in some specific way (most recent, etc.). If you used an in-memory data structure that keeps the data ordered and still easily queryable, you could offload some SQL requests to this structure instead of the db.
The suggestions above are not exhaustive, but what you do to improve performance is largely a function of what in the application stack you have the ability to modify.
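As one concrete illustration of reducing round trips, the day-by-day loop could be collapsed into a single set-based query that pulls the whole date range at once and lets the application walk it in memory. This is a sketch against the question's [changes] table; the #today placeholder is mine, not from the question:
-- One round trip: every candidate row for the territory in the date range,
-- newest first, so the day-by-day backtracking happens in application memory.
SELECT [CharacterName], [Recorded], [Kills], [Class]
FROM [changes]
WHERE [Territory] = #territory
  AND [Archived] = 0
  AND [Recorded] BETWEEN '2021-02-22' AND #today
ORDER BY [Recorded] DESC, [Kills] DESC;
Whether this wins depends on the range size and row width, but with an index on (Territory, Recorded) it could replace hundreds of round trips with one range scan.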

Splunk Failed Login Report

I am relatively new to Splunk and I am trying to create a report that will display a hostname and the number of times that host failed to log in within the past five minutes, when it failed 3 or more times. The only way I was able to get the initial search results I want is to look only within the past 5 minutes, as you can see in my query:
index="wineventlog" EventCode=4625 earliest=-5min | stats count by host,_time | stats count by host | search count > 2
This returns the host and the count. The issue is that if I use this query in my report, it can run every five minutes, but the hosts that were listed previously get removed, as they are no longer included in the search results.
I found ways to generate logs that I can then search for separately (http://docs.splunk.com/Documentation/Splunk/6.6.2/Alert/LogEvents), but it didn't work the way I expected.
I am looking for an answer to any of these questions that can help me get the intended results:
Can my original search be improved to still only return results where the failed logins occurred within a 5-minute window, but be able to search over any time period?
Is there a way to send the results from the query I already have to a report, where the results will not be cleared out when the search is run again?
Is there any other option I haven't considered to achieve the desired result?
If you only care about the last 5 minutes then search only the last 5 minutes. Searching more is just wasting resources.
Consider writing your results to a summary index (using collect) with a scheduled search, and have your report/dashboard display values from the summary index.
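A minimal sketch of that pattern, assuming a summary index named summary_failed_logins has been created (the index name is an assumption, not from the answer): a search scheduled every 5 minutes feeds the summary, and the report reads from it over any time range.
Scheduled search, run every 5 minutes:
index="wineventlog" EventCode=4625 earliest=-5min | stats count by host,_time | stats count by host | search count > 2 | collect index=summary_failed_logins
Report, over any time range:
index=summary_failed_logins | stats sum(count) as failures by host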

How to find traffic and number of hits per URL in Splunk?

I have been using Splunk as a log monitoring tool, but recently learned that we can also get network traffic and the number of hits per URL from it.
For example, I have a URL like the one below and I want to know the total number of hits that occurred over the last week:
https://stackoverflow.com/
What would be the query that I need to write to get the number of hits (count) per day/period of time in Splunk?
I tried this:
"url" | stats sum(linecount) as Total
which returns a hit count of >1000 for the last 15 minutes, which is not correct.
Thanks in advance.
It will be quick and accurate when you specify the index, host, and site names:
index name = the environment of the application, e.g. SIT/UAT/QA/pre-prod/production
host name = the instance on which the application is hosted
site name = in my example it will be https://stackoverflow.com
Query = index="SIT*" host="*host_name*" "https://stackoverflow.com" "/questions" | stats sum(linecount) as Total
By executing the above query, I can get the number of hits for the stackoverflow.com/questions URL.
The above query has given accurate results, and in Splunk there is a drop-down option to select the time period.
Try one of these queries to return the total number of hits:
"url" | stats count
Or:
"url" | stats sum(count) as total
The query below is a good example of getting site request counts:
index="bcom" "https://www.bloomingdales.com/" | stats sum(linecount) as Total

BigQuery Issue: Query Failed with "Request was blocked to protect the systems operation"

Please can you advise why we are seeing this error for a query we were previously able to run?
Error: Request was blocked to protect the systems operation. Please contact
We have tried running this query several times.
Writing an email to the address returned:
you may not have permission to post messages to the group
I get this message when querying a 12 TB table with around 25B rows. The query I am trying to run selects from one table, with a cross join on another table where two values in table A are between two values in table B, and I am doing a group by on two fields. As mentioned before, all was working fine for the last 15 months until yesterday.
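For context, the query shape described above looks roughly like this; every table and column name here is hypothetical, reconstructed from the description:
SELECT a.key1, a.key2, COUNT(*) AS n
FROM dataset.table_a AS a
CROSS JOIN dataset.table_b AS b
WHERE a.value1 BETWEEN b.low1 AND b.high1
  AND a.value2 BETWEEN b.low2 AND b.high2
GROUP BY a.key1, a.key2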
To address your points in turn:
1 - Copying from shollyman's comment concerning your error:
The short answer: a cross join involving a table of that size is problematic given any reasonably sized second table. The message indicates that the BQ team is explicitly blocking this query due to its behavior.
2 - I think you couldn't email the address because it's a Google Group; you need to register with these first. There should be a way for you to do so. It's also possible (notice the error message says "may") that your message just needs to be accepted by a member of the group before it goes through.
3 - If your issue is recent, it's most likely because you recently added enough data to one of your tables to make the cross join too big.
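One common mitigation (a suggestion of mine, not from the original answers) is to shrink one side before the join, for example by pre-filtering table_b from the sketch above in a CTE so the cross join multiplies against far fewer rows:
WITH b_small AS (
  SELECT low1, high1, low2, high2
  FROM dataset.table_b
  WHERE is_active   -- hypothetical predicate; anything that cuts rows helps
)
SELECT a.key1, a.key2, COUNT(*) AS n
FROM dataset.table_a AS a
CROSS JOIN b_small AS b
WHERE a.value1 BETWEEN b.low1 AND b.high1
  AND a.value2 BETWEEN b.low2 AND b.high2
GROUP BY a.key1, a.key2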

Rails - Complicated SQL Query

I need help with a query that does the following:
1. Start from the newest record and go downwards to the older records.
2. It needs to be ordered by created_at time.
3. If there are new records in the database by created_at time, retrieve them, but do not get records I already got from step 1.
4. I want to get only 16 records at a time. That number can change later.
5. Do not retrieve records I already sent from a previous request.
Also, just to let you know, this is initiated via $.ajax.
The reason for this is that I am sending new and old records to the client in real time. Think of it like this: a user starts off visiting the website and gets the current records, starting with the newest. Then the user can go get older records, but the same request also retrieves the brand-new records. With a twist of only 16 records at a time.
Do I make sense?
This is what I currently have for code:
RssEntry.includes(:rss_item).where("rss_items.rss_id in (?) AND (rss_entries.id < ? OR rss_entries.created_at > ?)", rssids, lid, time).order("rss_entries.id DESC").limit(16)
lid = the last (lowest) id from those records
rssids = the ids from which to get the records
time = the last time it made the records call
That code above is only the beginning. I now need help making sure it fits my requirements above.
UPDATE 1
Ok, so I managed to do what I wanted, but with 2 SQL queries. I really don't think it is possible to do what I want in one SQL query.
Any help is greatly appreciated.
Thanks.
Firstly, use scopes to get what you want:
class RssEntry < ActiveRecord::Base
  scope :with_items,  -> { includes(:rss_item) }
  scope :newer_first, -> { order("rss_entries.created_at DESC") }

  def self.in(args)
    where(id: args)
  end

  def self.since(time)
    where("rss_entries.created_at > ?", time)
  end
end
then
RssEntry.with_items.in(rssids).since(time).offset(lid).limit(16).newer_first
It should work as expected.
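For comparison, the two-query approach mentioned in UPDATE 1 might look something like this; it is a sketch using the question's variable names (lid, rssids, time), not the asker's actual code:
# Query 1 - the next page of older records, below the lowest id already sent.
older = RssEntry.includes(:rss_item)
                .where("rss_items.rss_id IN (?) AND rss_entries.id < ?", rssids, lid)
                .order("rss_entries.id DESC")
                .limit(16)

# Query 2 - any brand-new records created since the last call.
newer = RssEntry.includes(:rss_item)
                .where("rss_items.rss_id IN (?) AND rss_entries.created_at > ?", rssids, time)
                .order("rss_entries.id DESC")
Concatenating newer and older for the response keeps the 16-at-a-time paging for old records while still surfacing everything new since the last call.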