Time-based sliding window query in Splunk

Is there a way to do a time-based sliding window query in Splunk in real time? To give some insight into what I am looking for: let's say log statements are published to Splunk; can I get a count of the errors that have occurred in the last 15 minutes? And this has to be a sliding window that continuously updates me on the state of the system.

As you said, you can use real-time queries:
1. Create your query.
2. Make it real time (last 15 minutes).
3. Save it as an alert.
4. Set the cron schedule for the query to run.
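For example, a search along these lines, saved as a real-time alert, would do it; the index, sourcetype, and the literal "ERROR" term are assumptions about how your logs are indexed:
index=main sourcetype=app_logs "ERROR" earliest=rt-15m latest=rt
| stats count AS errors_last_15m
The rt-15m/rt time modifiers keep the 15-minute window sliding, so the count is re-evaluated continuously. If the cost of a real-time search is a concern, a scheduled alert that runs every minute over earliest=-15m latest=now is a close approximation.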
Hope it helps.

Related

BigQuery view appears to be generating a huge number of queries

I have a view which is populated by a SQL request to a regular dataset. As far as I know, not a soul is looking at that view (at least not very often). But if I go to the view and click on Project History, there is an IAM service account running the query every 15 seconds. Each query shuffles 2.55 MB and reads 70,000 records, so I'd really prefer that it didn't do this.
The source dataset used to create the view shows that its last modified date was 3 days ago, so the service account is not being triggered by a change in the source. I checked the job scheduler and there is nothing there. So what is triggering it, and how can I tell it to calm down?

How to auto-refresh a Power BI dataset after a failure

I have a scheduled refresh set up for a dataset in Power BI.
In case of a refresh failure, I want Power BI to retry the refresh, up to 5 times.
Is there a way to do it?
For the time being it doesn't seem possible, as confirmed by this post. You can play with the "Command time out in minutes (optional)" setting for your query when creating your data source, as noted in the comments.
It's under Advanced options.
If the timeout is left blank, the default is 10 minutes. So if the issue is that your queries are timing out, this may be the fix for you.
Another workaround is to schedule your data source to update multiple times at half-hour increments. Note that depending on how big your dataset is, this may place a burden on the server you are pulling from. If that is the case, looking into incremental refresh would be your next go-to.
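If the built-in options aren't enough, a more hands-on alternative (not what the post above describes) is to drive the refresh yourself through the Power BI REST API and put the retry logic in your own code. A rough Python sketch, assuming you already have an Azure AD access token for the Power BI API and the dataset ID (both placeholders below):
# Retry a dataset refresh up to 5 times via the Power BI REST API.
# DATASET_ID and TOKEN are placeholders; the 60-second polling interval is arbitrary.
import time
import requests

API = "https://api.powerbi.com/v1.0/myorg"
DATASET_ID = "<dataset-id>"          # placeholder
TOKEN = "<azure-ad-access-token>"    # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def trigger_refresh():
    # Start a refresh of the dataset.
    requests.post(f"{API}/datasets/{DATASET_ID}/refreshes", headers=HEADERS).raise_for_status()

def last_refresh_status():
    # Look at the most recent entry in the refresh history.
    r = requests.get(f"{API}/datasets/{DATASET_ID}/refreshes?$top=1", headers=HEADERS)
    r.raise_for_status()
    history = r.json().get("value", [])
    return history[0]["status"] if history else "Unknown"

for attempt in range(5):                        # retry up to 5 times, as asked
    trigger_refresh()
    while last_refresh_status() == "Unknown":   # "Unknown" means still running
        time.sleep(60)
    if last_refresh_status() == "Completed":
        break
Bear in mind that datasets on shared capacity have daily refresh limits that API-triggered refreshes may count against.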
Hope this helps.

BigQuery Google Analytics Export Processing Time Management

Our company has many scheduled reports in BigQuery that generate aggregation tables of Google Analytics data. Because we cannot control when Google Analytics data is imported into our BigQuery environment, we keep getting days with no data.
This means we then have to manually run the data for the missing days.
I have edited my scheduled query to keep pushing back the time of day it runs; however, it is now running around 8 AM. These queries produce reports for stakeholders, and the stakeholders are requesting them earlier. Is there any way to guarantee the processing time of the Google Analytics export to BigQuery?
You may also think about a scheduled-query solution that reruns at a later time if the requested table isn't available yet.
You can't currently add a conditional trigger to a BigQuery scheduled query.
You could manually add a fail-safe to your query that checks for yesterday's table, using a combination of the code below and DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY):
SELECT
  MAX(FORMAT_TIMESTAMP('%F %T',
    TIMESTAMP(PARSE_DATE('%Y%m%d',
      REGEXP_EXTRACT(_TABLE_SUFFIX, r'^\d\d\d\d\d\d\d\d')))))
FROM `DATASET.ga_sessions_*` AS ga_sessions
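Putting that snippet together with DATE_SUB might look like the sketch below; DATASET is a placeholder for your export dataset, and the ASSERT wrapper is just one assumed way of making the scheduled query stop when yesterday's table hasn't landed yet:
-- Abort the scheduled script if yesterday's daily export is not there yet.
ASSERT (
  SELECT MAX(PARSE_DATE('%Y%m%d', REGEXP_EXTRACT(_TABLE_SUFFIX, r'^\d\d\d\d\d\d\d\d')))
  FROM `DATASET.ga_sessions_*`
) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
AS 'The ga_sessions table for yesterday has not been exported yet';
-- ...the usual aggregation statements would follow here...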
Obviously this will fail if the conditions are not met and will not retry, which I understand is not much of an improvement on your current setup.
I've encountered this many times in the past and eventually had to move my data pipelines to another solution, as scheduled queries are still quite simplistic.
I would recommend you take a look at CRMint for simple pipelines into BigQuery:
https://github.com/google/crmint
If you still find this too simplistic, then you should look at Google Cloud Composer, where you can check that a table exists before running a particular job in a pipeline.

Is there a way to pause querying in Google Data Studio while editing a report

I'm building a bar chart in Google Data Studio in a report connected to BigQuery, calculating min, max, and avg for a metric with one dimension. The problem is that every time I edit the chart to add the metric and change its calculation (for instance from sum to min), a BigQuery query is run, which is very wasteful. So I was wondering if there is a way to pause the querying until I finish constructing/editing the chart, and then unpause it so that only the final query for the final chart runs.
Thanks in advance
Turning off the pre-fetch cache may lower costs: https://support.google.com/datastudio/answer/7020039?hl=en. But I'm not sure whether that will stop queries from being issued during edits.
You can also try writing your own connector using Apps Script and fetching data using the BigQuery service. If you set up an intermediary Apps Script cache, you reduce the number of times you have to hit BigQuery.
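As a minimal sketch of that caching idea, assuming the BigQuery advanced service is enabled in the Apps Script project (PROJECT_ID and the SQL you pass in are placeholders, and CacheService entries are limited to 100 KB and six hours):
// Run a given query at most once per cache window; repeated chart edits in the
// report reuse the cached result instead of hitting BigQuery again.
var PROJECT_ID = 'your-gcp-project';  // placeholder

function runQueryCached(sql) {
  var cache = CacheService.getScriptCache();
  var key = Utilities.base64Encode(
      Utilities.computeDigest(Utilities.DigestAlgorithm.MD5, sql));
  var cached = cache.get(key);
  if (cached !== null) {
    return JSON.parse(cached);  // serve the cached result
  }
  var response = BigQuery.Jobs.query({query: sql, useLegacySql: false}, PROJECT_ID);
  cache.put(key, JSON.stringify(response), 21600);  // keep for 6 hours (the max)
  return response;
}
A connector's getData() would then call runQueryCached() instead of querying BigQuery directly.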

Best practice for getRefreshedUserItems

On the data extracts page, Yodlee describes best practices for using getRefreshedUserItems, but I think there are a few more details that should be shared:
Is the 1-minute recommendation just in place to mitigate having to deal with large amounts of returned data? Is it within reason to poll for refreshed accounts only every 5 minutes instead?
Say I do set up my process to retrieve refreshed items every 5 minutes as described above, but my process fails to run during one of the iterations. If I leave it alone, does that mean that for 24 hours there are a few items whose refresh I will have failed to pick up? If so, how are others handling this? By recording a timestamp for each successful call to getRefreshedUserItems, or perhaps by iterating over their local cache of financial institutions that haven't been synced in more than 24 hours and retrieving updates for those as a one-off call? Or something else?
The main reason for keeping the limit at 1 minute is the high number of refreshes; at the current point in time you may not have a high number of users, but in the future that number may grow.
Coming to your question about handling failure cases: say one of your jobs fails to fetch the items for a particular instance (the duration passed in the request). You can keep a record of all such failed requests and run a follow-up job every hour that re-triggers the request for each failed duration. This way you won't miss any items and will keep your data in sync with Yodlee.
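A minimal sketch of that bookkeeping in Python; get_refreshed_user_items() is a hypothetical placeholder for however you call Yodlee's getRefreshedUserItems in your stack, and the JSON file is just the simplest possible durable record of failed windows:
import json
from datetime import datetime
from pathlib import Path

FAILED_WINDOWS = Path("failed_windows.json")  # durable record of failed polls

def get_refreshed_user_items(start, end):
    """Placeholder: call Yodlee's getRefreshedUserItems for the window [start, end)."""
    raise NotImplementedError("wire this up to your Yodlee client")

def load_failed():
    return json.loads(FAILED_WINDOWS.read_text()) if FAILED_WINDOWS.exists() else []

def save_failed(windows):
    FAILED_WINDOWS.write_text(json.dumps(windows))

def poll_window(start, end):
    # Fetch one 5-minute window; on failure, record it for the hourly catch-up job.
    try:
        items = get_refreshed_user_items(start, end)
        # ...hand `items` to the rest of your pipeline...
    except Exception:
        save_failed(load_failed() + [[start.isoformat(), end.isoformat()]])

def hourly_catch_up():
    # Re-run every recorded failed window; keep only the ones that fail again.
    still_failing = []
    for start_s, end_s in load_failed():
        try:
            items = get_refreshed_user_items(
                datetime.fromisoformat(start_s), datetime.fromisoformat(end_s))
            # ...hand `items` to the rest of your pipeline...
        except Exception:
            still_failing.append([start_s, end_s])
    save_failed(still_failing)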