Querying in time intervals throughout history in Splunk - sql

I have a query that returns a count.
I want to get counts at a daily/weekly/monthly granularity, spanning a year back.
Currently I can get the counts manually from the presets (last 30 days, last 15 days, etc.) or from a date range (e.g. between 20180101 and 20180201), but what I really want is a query that says
"get me a weekly count that spans a year back from today", and it'll return:
2018-11-15 to 2018-11-22 : count = 10
2018-11-08 to 2018-11-15 : count = 3
2018-11-01 to 2018-11-08 : count = 6
...
2017-11-15 to 2017-11-22 : count = 11

This should get you started.
index=foo earliest=-1y | bucket span=1w _time | stats count by _time

Splunk: Split a time period into hourly intervals

index="dummy" url="https://www.dummy.com" status="200 OK"
| stats count by id
| where count > 10
If I run the above query over 1 day, I would get, for example:
id count
ABC 50
XYZ 60
..
This would mean ABC hit https://www.dummy.com 50 times in 1 day, and XYZ called it 60 times.
Now I want to check this over 1 day, but in two-hour intervals.
Suppose ABC made that request 25 times at 12:00 AM and another 25 times at 3:00 AM,
and XYZ made all 60 requests between 12 AM and 2 AM.
I want the output to look like this (the time format doesn't matter):
id count time
XYZ 60 12:00 AM
ABC 25 12:00 AM
ABC 25 2:00 AM
..
You can use bin to group events into time buckets. You can use any span value, but for the 2 hours you mentioned, the updated query would be:
index="dummy" url="https://www.dummy.com" status="200 OK"
| bin _time span=2h
| stats count by id, _time
| where count > 10

timechart is creating stats which are not part of the search in Splunk

I was extracting some volume data for PE testing from prod systems, using the following query.
I expect to get event counts from 9 AM to 6 PM with respect to proxy names, but the following code creates stats for the entire day. Please help me remove this extra data.
Query
index= index_Name environmentName= Env_name clientAppName="App_Name"
| eval eventHour=strftime(_time,"%H")
| where eventHour<18 AND eventHour>=9
| timechart count span=60m by proxyName
result:
Time             | Proxy1 | Proxy2
---------------- | ------ | ------
2022-02-16 06:00 | 0      | 0
2022-02-16 07:00 | 0      | 0
2022-02-16 08:00 | 0      | 0
2022-02-16 09:00 | 27     | 34
The best way to narrow the time window is by using the earliest and latest options in the search command.
To find the events between 9am and 6pm today:
index= index_Name environmentName= Env_name clientAppName="App_Name" earliest=@d+9h latest=@d+18h
| timechart count span=60m by proxyName
To find the events from yesterday between 9am and 6pm:
index= index_Name environmentName= Env_name clientAppName="App_Name" earliest=-1d@d+9h latest=-1d@d+18h
| timechart count span=60m by proxyName
The @d+9h construct says to go to the beginning of the day and add 9 hours.

TSQL reduce the amount of data returned by a query to a parametrically defined sample

I have a table containing a large amount of data which is stored on change.
tbl_bigOne
----------
timestamp | var01 | var02 | ...
2016-01-14 15:20:21 | 10.1 | 100.6 | ...
2016-01-14 15:20:26 | 11.2 | 110.3 | ...
2016-01-14 15:21:27 | 52.1 | 620.1 | ...
2016-01-14 15:35:00 | 13.5 | 230.6 | ...
...
2016-01-15 09:18:01 | 94.4 | 140.0 | ...
2016-01-15 10:01:15 | 105.3 | 188.7 | ...
...
and so on for years of data
What I would like to obtain is a query/stored procedure that, given two datetime references (date_from and date_to), returns the selected data.
Now, the query just mentioned is pretty straightforward. What I would also like to achieve is to cap the number of rows returned per day (if data is available) while averaging the values.
Let's give a few examples:
date_from: 2016-01-14 00:00:00
date_to: 2016-01-20 23:59:59
max_points:12
In this case the time window is 7 days and I would like a maximum of 12 rows for each day of the 7-day window, giving a maximum total of 84 rows, with the values averaged within each group, since the data for each day is now partitioned into 12.
One way to see this partitioning is that every two hours' worth of data for that specific day is averaged, generating one of the 12 rows required for the day.
date_from: 2016-01-14 00:00:00
date_to: 2016-01-14 23:59:59
max_points:1440
In this case the time window is one day and, if the data is available, I would like a maximum of 1440 rows (for each day) for the selected period.
In this way the parameter defines the maximum number of rows for each day. The minimum time window is one day; nothing below that.
Can something like this be achieved just using TSQL?
Thank you.
Edit: addressing the observations raised by @Thorsten Kettner.
Use the analytic function ROW_NUMBER() to number the matching rows per day. Then only keep rows up to the given limit. If you want the rows chosen arbitrarily when more exist than needed, number the rows in random order using NEWID().
select timestmp, var01, var02, var03
from
(
    select
        mytable.*,
        row_number() over (partition by convert(date, timestmp) order by newid()) as rn
    from mytable
    where convert(date, timestmp) between @start_date and @end_date
) numbered
where rn <= @limit
order by timestmp;
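Note that this keeps up to @limit arbitrary rows per day rather than averaging. If the averaged variant described in the question is wanted, here is a rough sketch only; it assumes the question's tbl_bigOne with a [timestamp] column and that @max_points divides the 86400 seconds of a day evenly:
declare @date_from datetime = '2016-01-14 00:00:00';
declare @date_to   datetime = '2016-01-20 23:59:59';
declare @max_points int     = 12;

select
    convert(date, [timestamp]) as sample_date,
    -- slice number 0 .. @max_points-1 within the day
    datediff(second, convert(date, [timestamp]), [timestamp]) / (86400 / @max_points) as slice_no,
    min([timestamp]) as slice_start,
    avg(var01) as avg_var01,
    avg(var02) as avg_var02
from tbl_bigOne
where [timestamp] >= @date_from
  and [timestamp] <  dateadd(day, 1, convert(date, @date_to))
group by
    convert(date, [timestamp]),
    datediff(second, convert(date, [timestamp]), [timestamp]) / (86400 / @max_points)
order by sample_date, slice_no;
With @max_points = 12 each slice is two hours; with 1440 each slice is one minute, matching the second example in the question.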

MS ACCESS – Return a daily count of booked resources within a date range

Please note: this is not for an Access project as such, but a legacy application that uses an Access database for its back end.
Setup
Part of the application is a kind of Gantt chart, fixed to single day columns, where each row represents a single resource. Resources are booked out for a range of days and a booking is for a single resource, so they cannot overlap on a row. The range of dates that is in view is user selectable, open ended, and can be changed by various methods, including horizontal scrolling using mouse or keyboard.
Problem
I've been tasked with adding a row to the top of the chart to indicate overall resource usage for each day. Of course that's trivially easy to do by simply querying for each day in the range separately, but unfortunately that is proving to be an expensive process and therefore slows down horizontal scrolling a lot. So I'm looking for a way to do it more efficiently, hopefully with fewer database reads.
Here is a highly simplified example of the bookings table:
booking_ID | start_Date | end_Date   | resource_ID
---------- | ---------- | ---------- | -----------
1          | 2014-07-17 | 2014-07-20 | 21
2          | 2014-08-24 | 2014-08-29 | 4
3          | 2014-08-26 | 2014-09-02 | 21
4          | 2014-08-28 | 2014-09-04 | 19
Ideally, I would like a single query that returns each day within the specified range, along with a count of how many bookings there are on those days. So querying the data above for 20 days from 2014-07-17 would produce this:
check_Date | resources_Used
---------- | --------------
2014-07-17 | 1
2014-07-18 | 1
2014-07-19 | 1
2014-07-20 | 1
2014-07-21 | 0
2014-07-22 | 0
2014-07-23 | 0
2014-08-24 | 1
2014-08-25 | 1
2014-08-26 | 2
2014-08-27 | 2
2014-08-28 | 3
2014-08-29 | 3
2014-08-30 | 2
2014-08-31 | 2
2014-09-01 | 2
2014-09-02 | 2
2014-09-03 | 1
2014-09-04 | 1
2014-09-05 | 0
I can get a list of dates in the range by using a table of integers (starting at 0), with this:
SELECT CDATE('2014-07-17') + ID AS check_Date FROM Integers WHERE ID < 20
And I can get the count of resources used for a single day with something like this:
SELECT COUNT(*) AS resources_Used
FROM booking
WHERE start_Date <= CDATE('2014-09-04')
AND end_Date >= CDATE('2014-09-04')
But I can't figure out how (or if) I can tie them both together to get the desired results. Is this even possible?
Create a table called "calendar" and put a list of dates into it covering the necessary timeframe. It just needs one column called check_date with one row for each date. Use Excel, start at whatever date and just drag down, then import into the new table.
After your calendar table is set up you can run the following:
select c.check_date, count(b.resource_id) as resources_used
from calendar c, bookings b
where c.check_date between b.start_date and b.end_date
group by c.check_date
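One caveat: written as above, days with no bookings drop out entirely, while the sample output keeps them with a count of 0. A LEFT JOIN variant is one possible sketch; it assumes Access accepts the non-equi join when it is written directly in SQL view (the query designer cannot display it), and it uses the question's booking table plus bracketed query parameters for the date range:
SELECT c.check_date, COUNT(b.booking_ID) AS resources_Used
FROM calendar AS c
LEFT JOIN booking AS b
  ON (c.check_date >= b.start_Date AND c.check_date <= b.end_Date)
WHERE c.check_date BETWEEN [date_from] AND [date_to]
GROUP BY c.check_date
ORDER BY c.check_date;
COUNT(b.booking_ID) ignores the Nulls produced by unmatched calendar rows, so empty days come back as 0.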

Design Hours of Operation SQL Table

I am designing a SQL table to store hours of operation for stores.
Some stores have very simple hours: Monday to Sunday from 9:30AM to 10:00PM
Others are a little more complicated. Please consider the following scenario:
Monday: Open All Day
Tuesday: 7:30AM – 2:30PM & 4:15PM – 11:00 PM
Wednesday: 7:00PM – 12:30 AM (technically closing on Thursday morning)
Thursday: 9:00AM – 6:00PM
Friday: closed.
How would you design the table(s)?
EDIT
The hours will be used to show whether a store is open at a user-selected time.
A different table can probably handle any exceptions, such as holidays.
The store hours will not change from week to week.
A table like this would work both for the output you posted and for simply returning a bit (open? yes/no):
Store | Day | Open | Closed
---------------------------
1 | 1 | 0000 | 2400
1 | 2 | 0730 | 1430
1 | 2 | 1615 | 2300
...
Features:
Using 24-hour time isn't necessary, but it makes the math easier.
Store ID would presumably join to a lookup table where you stored Store information
Day ID would translate to day of week (1 = Sunday, 2 = Monday, etc.)
To query for your dataset, just:
SELECT Day, Open, Closed... (you'd want to format Open/Closed, obviously)
To query IsOpen?, just:
SELECT CASE WHEN @desiredtime BETWEEN Open AND Closed THEN 1 ELSE 0 END
FROM table
WHERE store = @Store
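Purely as an illustration of this design, a minimal T-SQL sketch might look like the following. Table and column names and types are my own guesses (Open/Closed renamed to avoid reserved words), and the day numbering follows 1 = Sunday as above:
CREATE TABLE StoreHours (
    StoreID   INT      NOT NULL,  -- joins to the store lookup table
    DayID     TINYINT  NOT NULL,  -- 1 = Sunday, 2 = Monday, ...
    OpenHHMM  SMALLINT NOT NULL,  -- 24-hour HHMM, e.g. 730 = 7:30 AM
    CloseHHMM SMALLINT NOT NULL   -- 24-hour HHMM, e.g. 2300 = 11:00 PM
);

INSERT INTO StoreHours VALUES (1, 2, 0,    2400);  -- Monday: open all day
INSERT INTO StoreHours VALUES (1, 3, 730,  1430);  -- Tuesday: 7:30 AM - 2:30 PM
INSERT INTO StoreHours VALUES (1, 3, 1615, 2300);  -- Tuesday: 4:15 PM - 11:00 PM

-- Is store 1 open on Tuesday at 5:00 PM?
DECLARE @Store INT = 1, @DayID TINYINT = 3, @desiredtime SMALLINT = 1700;
SELECT COALESCE(MAX(CASE WHEN @desiredtime BETWEEN OpenHHMM AND CloseHHMM THEN 1 ELSE 0 END), 0) AS IsOpen
FROM StoreHours
WHERE StoreID = @Store AND DayID = @DayID;
MAX collapses the two Tuesday windows into a single yes/no, and COALESCE covers days with no rows at all (a closed day).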
Think of it more as defining time frames; days and weeks are more complex because they have rules and defined starts and stops.
How would you define a time frame?
One constraint (a start time and day) and one reference duration (the hours, minutes, ... of the span). Now the shifts (time frames) can span multiple days, and you don't have to work through complex logic to extract and use the data in calculations.
**Store_Hours**
Store | Day | Open | DURATION
---------------------------
1 | 1 | 0000 | 24
1 | 2 | 0730 | 7
1 | 2 | 1615 | 6.75
...
1 | 3 | 1900 | 5.5
Do you have to do more than just store and display the hours?
I think a design that needs to tell whether a store is open at a particular time has to be informed by all of the possibilities; otherwise you will end up unable to accommodate something.
What about holiday exceptions?
I would consider storing them as intervals based on a base time (minutes since the start of the week).
So 0 is midnight on Monday.
Interval 1 would be 0 - 1440
Interval 2 would be 1890 - 2310
etc.
You could easily convert a user selected time into a minute offset and determine if a store was open.
Your only remaining problems would be converting the intervals back into a friendly display (probably some extensive logic, but not impossible) and the wrap-around at minute 10080 -> 0.
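A rough sketch of that idea in SQL (T-SQL syntax; table and column names are invented for illustration):
-- Each row is one open interval, in minutes since Monday 00:00 (0 .. 10079).
CREATE TABLE store_open_interval (
    store_id     INT NOT NULL,
    open_minute  INT NOT NULL,
    close_minute INT NOT NULL   -- may exceed 10080 when the interval wraps past Sunday night
);

INSERT INTO store_open_interval VALUES (1, 0,    1440);  -- Monday: open all day
INSERT INTO store_open_interval VALUES (1, 1890, 2310);  -- Tuesday: 7:30 AM - 2:30 PM

-- Is store 1 open at weekly minute offset @t?  (e.g. Tuesday 9:20 AM = 1440 + 560 = 2000)
DECLARE @t INT = 2000;
SELECT CASE WHEN EXISTS (
    SELECT 1 FROM store_open_interval
    WHERE store_id = 1
      AND (@t BETWEEN open_minute AND close_minute
           OR @t + 10080 BETWEEN open_minute AND close_minute)  -- wrap-around at 10080 -> 0
) THEN 1 ELSE 0 END AS is_open;
Converting a user-selected date/time into that minute offset is just a day-of-week and time-of-day calculation, done either in the application or in SQL.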