Show the sum of an event per day by user in Splunk - splunk

I want to be able to show the sum of an event (let's say clicks) per day but broken down by user type. The results I'm looking for will look like this:
User Role | 01/01 | 01/02 | 01/03 | ...
Guest     | 500   | 450   | 348   | 55
Admin     | 220   | 200   | 150   | 75
Here is my initial start but I'm unsure how to do the pivots on this to produce a table and visual chart
earliest=-30d index=* role=Guest OR role=Admin | count clicks as clickCount | ...
I'm unsure on how to both only count by day but then also only count by role to render them as shown above. Thanks for the help in advance.

You can create a timechart by day and then untable, convert the _time into a day field with formatted mm/dd value, and then construct an xyseries with the rows as columns and the day as the header:
| timechart span=1d count by role as "User Role"
| untable _time name value
| eval day=strftime(_time, "%m/%d")
| xyseries name day value
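To make the pivot easier to follow, here is a minimal Python sketch of the same steps outside Splunk: count events per role and day, format the day as mm/dd, then turn the days into columns. The sample events are invented for illustration and are not the asker's data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical click events as (timestamp, role); not data from the question.
events = [
    (datetime(2024, 1, 1, 9, 30), "Guest"),
    (datetime(2024, 1, 1, 14, 5), "Guest"),
    (datetime(2024, 1, 1, 16, 45), "Admin"),
    (datetime(2024, 1, 2, 8, 15), "Guest"),
    (datetime(2024, 1, 2, 11, 0), "Admin"),
    (datetime(2024, 1, 2, 19, 20), "Admin"),
]

# Count events per (role, day), with the day formatted as mm/dd
# (roughly what timechart + untable + strftime produce).
counts = Counter((role, ts.strftime("%m/%d")) for ts, role in events)

# Pivot: one row per role, one column per day (the xyseries step).
days = sorted({day for _, day in counts})
roles = sorted({role for role, _ in counts})
table = {role: [counts[(role, day)] for day in days] for role in roles}

print("User Role", *days)
for role, row in table.items():
    print(role, *row)
```

Sorting mm/dd strings only keeps the columns in date order while the range stays within one calendar year.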

Related

timechart is creating stats which are not part of the search in Splunk

I was extracting some volume data for PE testing from prod systems, using the following query.
I expect to get event counts per proxy name for 9 AM to 6 PM only, but the query produces stats for the entire day. Please help me remove the extra data.
Query
index= index_Name environmentName= Env_name clientAppName="App_Name"
| eval eventHour=strftime(_time,"%H")
| where eventHour<18 AND eventHour>=9
| timechart count span=60m by proxyName
Result:
Time             | Proxy1 | proxy2
2022-02-16 06:00 | 0      | 0
2022-02-16 07:00 | 0      | 0
2022-02-16 08:00 | 0      | 0
2022-02-16 09:00 | 27     | 34
The best way to narrow the time window is by using the earliest and latest options in the search command.
To find the events between 9am and 6pm today:
index= index_Name environmentName= Env_name clientAppName="App_Name" earliest=@d+9h latest=@d+18h
| timechart count span=60m by proxyName
To find the events from yesterday between 9am and 6pm:
index= index_Name environmentName= Env_name clientAppName="App_Name" earliest=-1d@d+9h latest=-1d@d+18h
| timechart count span=60m by proxyName
The @d+9h construct says to go to the beginning of the day and add 9 hours.
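The "go to the beginning of the day and add 9 hours" arithmetic can be sketched in Python as below. This only illustrates how the window boundaries are computed; the function name day_window and its parameters are made up for the example and are not part of Splunk.

```python
from datetime import datetime, timedelta

def day_window(now, days_back=0, start_hour=9, end_hour=18):
    """Return (earliest, latest) for an hour window on a given day.

    Illustrative helper only: step back days_back days, snap to
    midnight, then add the hour offsets.
    """
    shifted = now - timedelta(days=days_back)
    midnight = shifted.replace(hour=0, minute=0, second=0, microsecond=0)
    return (midnight + timedelta(hours=start_hour),
            midnight + timedelta(hours=end_hour))

now = datetime(2022, 2, 16, 14, 37, 12)
print(day_window(now))               # today's 09:00 to 18:00
print(day_window(now, days_back=1))  # yesterday's 09:00 to 18:00
```

The sketch uses naive datetimes, so it ignores time zones and daylight-saving changes.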

Splunk search if message is x for more than 5 minutes

I have two specific messages in splunk data that I'm searching for per user.
on-screen
off-screen
Does anyone know how I can search in Splunk for a user whose last message has been message="off-screen" for more than 5 minutes, with a query that runs every 2 minutes?
index="document" (message="off-screen")
My query will be run every 2 minutes, so I want to check for the event with message off-screen. Then, the next time around, check whether 5 minutes have elapsed since the off-screen message was fired and that no on-screen event was fired in that time period for that user.
Is this possible?
If you want to find off-screen messages that don't have an on-screen message within 5 minutes, then you can use a transaction. Let's say your raw data is:
| makeresults count=10
| streamstats count
| eval _time=_time-(count*60)
| eval message=case(count=1,"on-screen",count=2,"on-screen",count=5,"off-screen",count=8,"off-screen",count=9,"on-screen",count=10,"on-screen")
| eval user=case(count=1,"Alice",count=2,"Bob",count=5,"Alice",count=8,"Bob",count=9,"Alice",count=10,"Bob")
| where NOT isnull(user)
| table _time user message
That would look like this:
_time               | user  | message
2021-05-28 13:57:50 | Alice | on-screen
2021-05-28 13:56:50 | Bob   | on-screen
2021-05-28 13:53:50 | Alice | off-screen
2021-05-28 13:50:50 | Bob   | off-screen
2021-05-28 13:49:50 | Alice | on-screen
2021-05-28 13:48:50 | Bob   | on-screen
You need a transaction that gathers each user's corresponding on-screen and off-screen messages as long as they are within 5 minutes of each other. But you need to keep the orphans, where the off-screen message doesn't have a corresponding on-screen message. Then you filter out the transactions that have both, and you are left with just the orphans:
message="off-screen" OR message="on-screen"
| transaction user maxpause=5m keeporphans=true startswith="message=off-screen" endswith="message=on-screen"
| where mvcount(message)<2
| table _time user message
That would produce this output:
_time               | user | message
2021-05-28 13:50:50 | Bob  | off-screen
Here is a runnable example:
| makeresults count=10
| streamstats count
| eval _time=_time-(count*60)
| eval message=case(count=1,"on-screen",count=2,"on-screen",count=5,"off-screen",count=8,"off-screen",count=9,"on-screen",count=10,"on-screen")
| eval user=case(count=1,"Alice",count=2,"Bob",count=5,"Alice",count=8,"Bob",count=9,"Alice",count=10,"Bob")
| where NOT isnull(user)
| table _time user message
| search message="off-screen" OR message="on-screen"
| transaction user maxpause=5m keeporphans=true startswith="message=off-screen" endswith="message=on-screen"
| where mvcount(message)<2
| table _time user message
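For readers who want to check the orphan logic by hand, here is a simplified Python sketch using the sample rows from the table above. It does not reproduce the full semantics of the transaction command; it only tests whether each off-screen event is followed by an on-screen event for the same user within 5 minutes. The function name and the max_gap parameter are illustrative.

```python
from datetime import datetime, timedelta

# Sample rows from the answer: (_time, user, message).
events = [
    (datetime(2021, 5, 28, 13, 57, 50), "Alice", "on-screen"),
    (datetime(2021, 5, 28, 13, 56, 50), "Bob", "on-screen"),
    (datetime(2021, 5, 28, 13, 53, 50), "Alice", "off-screen"),
    (datetime(2021, 5, 28, 13, 50, 50), "Bob", "off-screen"),
    (datetime(2021, 5, 28, 13, 49, 50), "Alice", "on-screen"),
    (datetime(2021, 5, 28, 13, 48, 50), "Bob", "on-screen"),
]

def orphaned_off_screen(events, max_gap=timedelta(minutes=5)):
    """Return (time, user) for off-screen events with no later
    on-screen event from the same user within max_gap."""
    orphans = []
    for ts, user, message in sorted(events):
        if message != "off-screen":
            continue
        answered = any(
            u == user and m == "on-screen" and ts < t <= ts + max_gap
            for t, u, m in events
        )
        if not answered:
            orphans.append((ts, user))
    return orphans

print(orphaned_off_screen(events))
```

Alice's off-screen event is answered after 4 minutes, while Bob's is answered only after 6 minutes, so only Bob is returned. A scheduled check would also need to ignore off-screen events that are less than 5 minutes old, since they have not had time to be answered yet.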

Subtract two aggregated values in Bar Chart

My data is like -
+-----------+------------------+-----------------+-------------+
| Issue Num | Created On | Closed at | Issue Owner |
+-----------+------------------+-----------------+-------------+
| 1 | 12/21/2016 15:26 | 1/13/2017 9:48 | Name 1 |
| 2 | 1/10/2017 7:38 | 1/13/2017 9:08 | Name 2 |
| 3 | 1/13/2017 8:57 | 1/13/2017 8:58 | Name 2 |
| 4 | 12/20/2016 20:30 | 1/13/2017 5:46 | Name 2 |
| 5 | 12/21/2016 19:30 | 1/13/2017 1:14 | Name 1 |
| 6 | 12/20/2016 20:30 | 1/12/2017 9:11 | Name 1 |
| 7 | 1/9/2017 17:44 | 1/12/2017 1:52 | Name 1 |
| 8 | 12/21/2016 19:36 | 1/11/2017 16:59 | Name 1 |
| 9 | 12/20/2016 19:54 | 1/11/2017 15:45 | Name 1 |
+-----------+------------------+-----------------+-------------+
What I am trying to achieve is
Number of issues created per week
Number of issues closed per week
Net number of issues remaining per week
I am able to resolve the top two points but unable to approach the last.
My attempt -
This gives me number of issues created every week.
Similarly I have done for Closed per week.
For Net number of issues (Created-Closed) -
I tried adding Closed At column along with Created On but I can't see second bar in the chart along with Created On either.
Something like this
I tried doing the same in excel -
I want something of this sort but with another column as the difference of
number of issues created that week - number of issues closed that week.
In this case, 8-6=2.
You could use a calculated field (Analysis -> Create Calculated Field). Something like this:
{FIXED [Create Date]:Count(if DATEPART('year',[Create Date]) = 2016 then [Number of Records] end)} - {FIXED [Closed Date]:Count(if DATEPART('year',[Closed Date]) = 2016 then [Number of Records] end)}
This calculation uses LOD (level of detail) expressions to pull back both sets of values. It counts the 2016 records for each date field and then subtracts the closed count from the created count.
For more on LOD expressions, see here:
https://www.tableau.com/about/blog/LOD-expressions
Use this as your measure and pull in one of your date fields as the dimension.
The normal way to solve this problem is to reshape the data so you have one row per status change instead of one row per issue, with a column named [Date] and a column named [Action]. The action can be submit or close (or, in a more complex world, also approve, reject, and so on, to track the full history).
You can do the reshaping without modifying your source data by using a UNION to get two copies of each row, with appropriate calculated fields to make the visible columns make sense. For example, create a calculated field called Date that returns the submission date or the closing date depending on whether the row is from the first or second union, and a similar one called Action whose value depends on that as well. Then filter out Close actions that have a null date.
Or you can preprocess the data to reshape it.
Or you can use data blending: make two sources that point to the same data source, but customize the linking fields to line up the submit and close dates (e.g., duplicate the data connection and rename both date fields to have the same name). In this case, you probably want to create a scaffolding source that has every date, but no other data, and use it as the primary data source. That avoids filtering out data from the secondary source for dates that don't appear in the primary. The blending approach can be brittle.
Assuming you used the UNION approach instead of Data Blending, then you can count the number of submissions and closures within a certain date range, or compute a running total of the difference to see the backlog size over time.
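The reshaping idea is independent of Tableau, so here is a small Python sketch of it using the dates from the question's table (times dropped). It assumes weeks start on Monday, which is a choice made for this example; the Action values submit and close follow the naming used above.

```python
from collections import Counter
from datetime import date, timedelta

# (issue number, created on, closed at) from the question, dates only.
issues = [
    (1, date(2016, 12, 21), date(2017, 1, 13)),
    (2, date(2017, 1, 10), date(2017, 1, 13)),
    (3, date(2017, 1, 13), date(2017, 1, 13)),
    (4, date(2016, 12, 20), date(2017, 1, 13)),
    (5, date(2016, 12, 21), date(2017, 1, 13)),
    (6, date(2016, 12, 20), date(2017, 1, 12)),
    (7, date(2017, 1, 9), date(2017, 1, 12)),
    (8, date(2016, 12, 21), date(2017, 1, 11)),
    (9, date(2016, 12, 20), date(2017, 1, 11)),
]

def week_start(d):
    """Monday of the week containing d (assumed week convention)."""
    return d - timedelta(days=d.weekday())

# Reshape: one row per status change instead of one row per issue.
actions = []
for _, created, closed in issues:
    actions.append((created, "submit"))
    if closed is not None:  # skip close rows with a null date
        actions.append((closed, "close"))

created_per_week = Counter(week_start(d) for d, a in actions if a == "submit")
closed_per_week = Counter(week_start(d) for d, a in actions if a == "close")

# Net change per week and running backlog.
rows = []
backlog = 0
for week in sorted(set(created_per_week) | set(closed_per_week)):
    net = created_per_week[week] - closed_per_week[week]
    backlog += net
    rows.append((week, created_per_week[week], closed_per_week[week], net, backlog))

for row in rows:
    print(*row)
```

With the nine sample issues, six are created in the week of 2016-12-19 and none closed, and three are created and nine closed in the week of 2017-01-09, so the backlog goes from 6 back to 0.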

Access Query: get difference of dates with a twist

I'm going to do my best to explain this so I apologize in advance if my explanation is a little awkward. If I am foggy somewhere, please tell me what would help you out.
I have a table filled with circuits and dates. Each circuit gets trimmed on a time cycle of about 36 months or 48 months. I have a column that gives me this info. I have one record for every time a circuit's trim cycle has been completed. I am attempting to link a known circuit outage list, to a table with their outage data, to a table with the circuit's trim history. The twist is the following:
I only want to get back circuits that have exceeded their trim cycles by 6 months. So I would need to take all records for a circuit, look at each individual record, find the most recent previous record relative to the record currently being examined (I will need every record examined individually), calculate the difference between the two records in months, then return only the records that exceeded 6 months of difference between any two entries for a given feeder.
Here is an example of the data:
+----+--------+----------+-------+
| ID | feeder | comp | cycle |
| 1 | 123456 | 1/1/2001 | 36 |
| 2 | 123456 | 1/1/2004 | 36 |
| 3 | 123456 | 7/1/2007 | 36 |
| 4 | 123456 | 3/1/2011 | 36 |
| 5 | 123456 | 1/1/2014 | 36 |
+----+--------+----------+-------+
Here is an example of the result set I would want (please note: cycle can vary by circuit, so the value in the cycle column needs to be in the calculation to determine if I exceeded the cycle by 6 months between trimmings):
+----+--------+----------+-------+
| ID | feeder | comp | cycle |
| 3 | 123456 | 7/1/2007 | 36 |
| 4 | 123456 | 3/1/2011 | 36 |
+----+--------+----------+-------+
This is the query I started but I'm failing really hard at determining how to make the date calculations correctly:
SELECT temp_feederList.Feeder, Temp_outagesInfo.causeType, Temp_outagesInfo.StormNameThunder, Temp_outagesInfo.deviceGroup, Temp_outagesInfo.beginTime, tbl_Trim_History.COMP, tbl_Trim_History.CYCLE
FROM (temp_feederList
LEFT JOIN Temp_outagesInfo ON temp_feederList.Feeder = Temp_outagesInfo.Feeder)
LEFT JOIN tbl_Trim_History ON Temp_outagesInfo.Feeder = tbl_Trim_History.CIRCUIT_ID;
I wasn't really able to figure out where I need to go from here to get that most recent entry and perform the mathematical comparison. I've never been asked to do SQL this complex before, so I want to thank all of you for your patience and any assistance you're willing to lend.
I'm making some assumptions, but this uses a subquery to give you rows in the feeder list where the previous completed date was greater than the number of months ago indicated by the cycle:
SELECT tbl_Trim_History.ID, tbl_Trim_History.feeder,
tbl_Trim_History.comp, tbl_Trim_History.cycle
FROM tbl_Trim_History
WHERE tbl_Trim_History.comp>
(SELECT Max(DateAdd("m", tbl_Trim_History.cycle, comp))
FROM tbl_Trim_History T2
WHERE T2.feeder = tbl_Trim_History.feeder AND
T2.comp < tbl_Trim_History.comp)
If you needed to check for longer than 36 months you could add an arbitrary value to the months calculated by the DateAdd function.
Also, I don't know whether the value of cycle specifies the number of months since the prior cycle or the number of months until the next one. If the latter, I would change tbl_Trim_History.cycle in the DateAdd function to just cycle.
SELECT tbl_trim_history.ID, tbl_trim_history.Feeder,
tbl_trim_history.Comp, tbl_trim_history.Cycle,
(select max(comp) from tbl_trim_history T
where T.feeder=tbl_trim_history.feeder and
t.comp<tbl_trim_history.comp) AS PriorComp,
IIf(DateDiff("m",[priorcomp],[comp])>36,"x") AS [Select]
FROM tbl_trim_history;
This query identifies (with an X in the last column) the records from tbl_trim_history that exceed the cycle time - but as noted in the comments I'm not entirely sure if this is what you need or not, or how to incorporate the other 2 tables. Once you see what it is doing you can modify it to only keep the records you need.
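To sanity-check the comparison against the sample data, here is a rough Python sketch of the same "compare each record with the most recent prior record" logic. It counts month boundaries the way DateDiff("m", ...) does, and it assumes that "exceeded by 6 months" means a gap of at least cycle + 6 months, which is the reading that reproduces the asker's expected rows. The function names and the grace_months parameter are illustrative.

```python
from datetime import date

# (ID, feeder, comp, cycle) from the question's sample data.
history = [
    (1, "123456", date(2001, 1, 1), 36),
    (2, "123456", date(2004, 1, 1), 36),
    (3, "123456", date(2007, 7, 1), 36),
    (4, "123456", date(2011, 3, 1), 36),
    (5, "123456", date(2014, 1, 1), 36),
]

def months_between(earlier, later):
    """Number of month boundaries crossed between two dates."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def overdue_trims(rows, grace_months=6):
    """Rows whose gap to the previous trim of the same feeder is at
    least cycle + grace_months (assumed interpretation)."""
    last_comp = {}
    overdue = []
    for row in sorted(rows, key=lambda r: (r[1], r[2])):
        _, feeder, comp, cycle = row
        prior = last_comp.get(feeder)
        if prior is not None and months_between(prior, comp) >= cycle + grace_months:
            overdue.append(row)
        last_comp[feeder] = comp
    return overdue

for row in overdue_trims(history):
    print(*row)
```

The gaps for feeder 123456 are 36, 42, 44 and 34 months, so IDs 3 and 4 are returned, matching the desired result set in the question.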

Quartile for subgroups in SQL Server 2008

I have a table with the times that athletes of a sports club take to run a lap around the field. Each athlete has several entries in that table, one for each time they run, and for statistical purposes I need to gather some statistics regarding the time they take.
I already have the basic statistics like average time, median time, etc. However, I have no idea how exactly to do the bottom and top quartiles.
I see some examples of quartiles for a whole table (in this case the whole club), but I have no idea how to compute them for subgroups, such as each distinct athlete in the table. Could anyone point me in the right direction or give me an example?
The relevant data is in a very simple structure like this (there are more columns but in this case they don't matter)
LAP_ID | ATHLETE| TIME |
1 | Ath_X | 120 |
2 | Ath_Y | 160 |
3 | Ath_X | 90 |
4 | Ath_X | 80 |
5 | Ath_Z | 113 |
6 | Ath_X | 115 |
EDIT: There seems to be some misunderstanding. By quartile I mean the 1st and 3rd quartiles: the point that separates the lowest 25% of the data from the highest 75%, and the point that separates the highest 25% of the data from the lowest 75%.
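To pin down what is being asked for, here is a small Python sketch (not T-SQL) that computes the 1st and 3rd quartile per athlete from the sample rows. It uses the standard library's statistics.quantiles with its default interpolation, which is only one of several quartile conventions, and it returns None for athletes with fewer than two laps; both choices are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import quantiles

# (LAP_ID, ATHLETE, TIME) from the question's sample table.
laps = [
    (1, "Ath_X", 120),
    (2, "Ath_Y", 160),
    (3, "Ath_X", 90),
    (4, "Ath_X", 80),
    (5, "Ath_Z", 113),
    (6, "Ath_X", 115),
]

# Group lap times by athlete (the "subgroup" step).
times_by_athlete = defaultdict(list)
for _, athlete, lap_time in laps:
    times_by_athlete[athlete].append(lap_time)

def first_and_third_quartile(times):
    """Return (Q1, Q3), or None when there are fewer than two values."""
    if len(times) < 2:
        return None
    q1, _median, q3 = quantiles(times, n=4)
    return q1, q3

result = {a: first_and_third_quartile(t) for a, t in times_by_athlete.items()}
for athlete, quartile_pair in sorted(result.items()):
    print(athlete, quartile_pair)
```

The same shape carries over to SQL: partition the rows by athlete first, then apply whichever quartile definition is wanted within each partition.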