How to get a count of events by IP for each day of the past week, then calculate a daily average count per IP over 3 days as well as over 7 days - Splunk

Not sure if I articulated my problem well in the title, but let me elaborate here. I need to find IPs whose daily average count from the past 3 days is at least 1.5 times (150% of) their daily average count from the past 7 days. I am looking for spikes in activity based on those two averages. The way I phrased it may sound confusing, so let me show you what I have and why I'm having issues calculating the averages.
index=blah_blah earliest=-7d
| bucket _time span=1d
| stats count by ip _time
| sort ip
| trendline sma3(count) as 3_Day_Average
| trendline sma7(count) as 7_Day_Average
| where '3_Day_Average' > '7_Day_Average' * 1.5
This provides incorrect averages because if an IP has no events on a particular day, that day doesn't appear in the statistics table at all, so it isn't factored into the average. Worse, the moving average pulls in a different IP's counts to fill the window. So if one IP is missing counts for 2 of the 7 days, for example, the calculation takes 2 counts from the next IP and includes them in the average for the original IP that was missing those 2 days. I need the days without counts to still show (as zero) so they can be calculated into these averages. If this doesn't make sense, feel free to ask questions. I appreciate the help.

Instead of stats, try timechart. The timechart command will fill in zeros for spans that have no data.
index=blah_blah earliest=-7d
| timechart span=1d count by ip
| untable _time ip count
| sort ip _time
| trendline sma3(count) as 3_Day_Average
| trendline sma7(count) as 7_Day_Average
| where '3_Day_Average' > '7_Day_Average' * 1.5

Related

Group event counts by hour over time

I currently have a query that aggregates events over the last hour, and alerts my team if events are over a specific threshold. The query was recently accidentally disabled, and it turns out there were times when the alert should have fired but did not.
My goal is to apply this alert query logic to the previous month and determine how many times the alert would have fired had it been functional. However, I am having a hard time figuring out how best to group these. In pseudo code, running over a 30-day time frame, I would basically have:
index="some_index" | where count > n | group by hour
Hopefully this makes sense, if not, I am happy to provide some clarification.
Thanks in advance
This should get you started:
index=foo | bin span=1h _time | stats count by _time | where count > n

Splunk - How can I get cumulative values for a day over a period of time?

One of the things I'm using Splunk to monitor is electricity usage; one of the fields indexed is the cumulative Kw value for the day. How can I get the last value of the day for a given timespan? In other words, output the total Kw for each day for a month. I've tried using
host=Electricity earliest=-4w@w1 latest=+w@w1 | timechart last(live_day_kw) as Kw
but for the data I have it seems to be adding each day together, so it's increasing day on day rather than showing daily values. For example, day 1 is 7kw, day 2 is 14kw, and day 3 is 21kw, where I'd expect each to be ~7kw. I've also just checked that the live_day_kw value does reset to zero at midnight.
Not quite sure of what you're looking for, but maybe this will help.
host=Electricity earliest=-4w@w1 latest=+w@w1 | timechart span=1d last(live_day_kw) as Kw
For the benefit of those looking for the same solution I managed to solve it thus:
host=Electricity earliest=-4w@w1 | timechart latest(live_day_kw) as "Kw_Day" | eval Kw_Day = round(Kw_Day,2)
I also needed the search time range set to 'month to date', and that got exactly what I needed.

Calculating interest using SQL

I am using PostgreSQL, and have a table for a billing cycle and another for payments made in a billing cycle.
I am trying to figure out how to calculate interest based on how much amount was left after each billing cycle's last payment date. Problem is that every time a repayment is made, the interest has to be calculated on the amount remaining after that.
My thoughts on building this query are as follows: build rows for all dates from the last pay date of the billing cycle to today; using partitioning, get the remaining amount for the first date; for the second date, take the amount from the previous row, add interest to it, and then calculate interest on that new amount.
Unfortunately I am stuck just at the thought and can't figure out how to make this into a query!
Here's some sample data to make things easier to understand.
Billing Cycles:
id | ends_at
-----+---------------------
1 | 2017-11-30
2 | 2017-11-30
Payments:
amount    | billing_cycle_id | type      | created_at
----------+------------------+-----------+----------------------------
6000.0000 | 1                | payment   | 2017-11-15 18:40:22.151713
2000.0000 | 1                | repayment | 2017-11-19 11:45:15.6167
2000.0000 | 1                | repayment | 2017-12-02 11:46:40.757897
So, as you can see, the user made a repayment on the 19th, so the amount due for interest after the end date (30th Nov 2017) is only 4000. From the 30th to the 2nd, interest will be calculated daily on 4000. However, from the 2nd, interest needs to be calculated on the remaining 2000 only.
Interest calculations (today being 2017-12-04):
date       | amount    | interest
-----------+-----------+----------
2017-12-01 | 4000      | 100        // First day of pending dues
2017-12-02 | 2100      | 52.5       // Second day of pending dues
2017-12-03 | 2152.5    | 53.8125    // Third day of pending dues
2017-12-04 | 2206.3125 |            // Fourth day's interest will be added tomorrow
Your data is too sparse. It doesn't make any sense to need to write this query, because over time the query will get significantly more complicated. What happens when interest rates change over time?
The table itself (or a secondary table, depending on how you want to structure it) could have a running balance you add every time a deposit / withdrawal is made. (I suggest this table be add-only) Otherwise you're making both the calculation and accounting far harder on yourself than it should be. Even with the way you've presented the problem here, there's not enough information to do the calculation. (interest rate is missing) When that's the case, your stored procedure is going to be too complicated. Complicated means bugs, and people get irritated about bugs when you're talking about their money.
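To make that concrete, here is one possible shape for an add-only ledger with a running balance. All table and column names are hypothetical, and the 2.5% daily rate used below is only inferred from the question's sample numbers, since the real rate isn't given.

-- Add-only ledger: every charge, repayment, or interest posting is a new row,
-- and the balance after that row is stored on the row itself.
CREATE TABLE ledger_entries (
    id               bigserial     PRIMARY KEY,
    billing_cycle_id integer       NOT NULL,
    entry_type       text          NOT NULL,  -- 'payment' | 'repayment' | 'interest'
    amount           numeric(12,4) NOT NULL,
    balance_after    numeric(12,4) NOT NULL,  -- running balance carried forward
    created_at       timestamptz   NOT NULL DEFAULT now()
);

-- Posting one day's interest is then just an insert based on the latest balance:
INSERT INTO ledger_entries (billing_cycle_id, entry_type, amount, balance_after)
SELECT l.billing_cycle_id,
       'interest',
       round(l.balance_after * 0.025, 4),   -- assumed 2.5%/day
       round(l.balance_after * 1.025, 4)
  FROM ledger_entries l
 WHERE l.billing_cycle_id = 1
 ORDER BY l.id DESC
 LIMIT 1;

The current amount due is then simply the latest row for the billing cycle, so the interest question becomes a lookup rather than a calculation. That said, if you do want to compute it on the fly the way the question describes, the day-by-day walk can be written as a recursive CTE, again with hypothetical names and the assumed rate:

-- Day-by-day walk, starting from the balance owed when the cycle ended
-- (6000 charged - 2000 repaid before 2017-11-30 = 4000 in the sample data).
WITH RECURSIVE daily AS (
    SELECT DATE '2017-12-01' AS day, 4000::numeric AS amount
    UNION ALL
    SELECT d.day + 1,
           d.amount * 1.025                            -- add yesterday's interest (assumed 2.5%/day)
             - COALESCE((SELECT SUM(p.amount)          -- subtract any repayment made on this day
                           FROM payments p
                          WHERE p.billing_cycle_id = 1
                            AND p.type = 'repayment'
                            AND p.created_at::date = d.day + 1), 0)
      FROM daily d
     WHERE d.day < CURRENT_DATE
)
SELECT day, round(amount, 4) AS amount, round(amount * 0.025, 4) AS interest
  FROM daily;

With the sample data and a run date of 2017-12-04, this reproduces the amount column of the worked table above.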

Crystal reports formula for getting an average of group Minimum and Maximum running totals

First question here.
My question is how do I get an average of running total minimums and an average of running total maximums? I'm thinking I will need to use a formula rather than running totals but I don't know what that formula is.
I'm writing a Crystal report that gives occurrences of a thing over time: how many times did a thing happen each month for a year, grouped by month, and how long did it take to happen (average time, minimum time, maximum time)?
Year: 2017
Month    | How Many | Average (days) | Minimum Time | Maximum Time
---------+----------+----------------+--------------+-------------
January  | 15       | 5              | 2            | 16
February | 7        | 4              | 1            | 10
March    | 20       | 6              | 4            | 12
Average  | 14       | 5              | 2.33         | 12.66
I am using a running total in the month groups to get the average, min, and max for each month.
But when I get to the average for all the groups, I want the average of all the minimums and the average of all the maximums. I don't want the minimum for the year, which would be 1 (I could use a running total for that); I want 2.33. Crystal doesn't let me do a running total average on running total minimums.
I hope that makes sense. Thanks in advance for your help.
I would use a Formula Field to create a variable to accumulate values and then a second Formula Field to count the number of times it has accumulated a value. Then you divide the accumulated total by the counter to get your average.
The formula for the accumulator would be...
WhilePrintingRecords;
Shared Numbervar MinsAccumulator := MinsAccumulator + {#Minimum_Time};
The formula for the counter would be...
WhilePrintingRecords;
Shared Numbervar counter := counter + 1;
Do keep in mind though, if the section you use this within is repeated at all, then you will also need a formula field to reset the values of your variables to zero. This would look like this...
WhilePrintingRecords;
Shared Numbervar counter := 0;
Just drop each formula field into the section where you want it to be evaluated, and then suppress the fields so they don't display. Then you can create additional formula fields to display the value of the variable you want to use in your report. Here is the formula for displaying the variable.
Shared Numbervar counter;
counter;
If you aren't familiar with using variables in Crystal Report just respond back and I can explain them in more detail. They can be a little tricky at first.

Designing a scalable points leaderboard system using SQL Server

I'm looking for suggestions for scaling a points leaderboard system. I already have a working version using a very normalized strategy. This first version was essentially a table which looked something like this.
UserPoints - PK: (UserId,Date)
+------------+--------+---------------------+
| UserId | Points | Date |
+------------+--------+---------------------+
| 1 | 10 | 2011-03-17 07:16:36 |
| 2 | 35 | 2011-03-17 08:09:26 |
| 3 | 40 | 2011-03-17 08:05:36 |
| 1 | 65 | 2011-03-17 09:01:37 |
| 2 | 16 | 2011-03-17 10:12:35 |
| 3 | 64 | 2011-03-17 12:51:33 |
| 1 | 300 | 2011-03-17 12:19:21 |
| 2 | 1200 | 2011-03-17 13:24:13 |
| 3 | 510 | 2011-03-17 17:29:32 |
+------------+--------+---------------------+
I then have a stored procedure which basically groups by UserId and sums the Points. I can also pass @StartDate and @EndDate parameters to create a leaderboard for a specific time period, for example, time windows for Top Users for the Day / Week / Month / Lifetime.
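Roughly, the query inside that stored procedure is along these lines (the table and column names are from the sample above; the TOP (100) leaderboard size is just illustrative):

SELECT TOP (100)
       UserId,
       SUM(Points) AS TotalPoints
  FROM UserPoints
 WHERE [Date] >= @StartDate
   AND [Date] <  @EndDate
 GROUP BY UserId
 ORDER BY TotalPoints DESC;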
This seemed to work well with a moderate amount of data, but things became noticeably slower as the number of points records passed a million or so. The test data I'm working with is just over a million point records created by about 500 users distributed over a timespan of 3 months.
Is there a different way to approach this? I have experimented with denormalizing the data by pre-grouping the points into hour datetime buckets to reduce the number of rows. But I'm starting to think the real problem I need to worry about is the increasing number of users that need to be accounted for in the leaderboard. The time window sizes will generally be small but more and more users will start generating points within any given window.
Unfortunately I don't have access to 'Jobs' since I'm using SQL Azure and the Agent is not available (yet). But, I am open to the idea of scaling this using a different storage system if you are convincing enough.
My past work experience tells me I should look into data warehousing since this is almost a reporting problem. But at the same time I need it to be as real-time as possible.
Update
Ultimately, I would like to support custom leaderboards that could span from Monday 8am - Friday 6pm every week. But that's down the road and why I'm trying to not get too fancy with the aggregation. I'm willing to settle with basic Day/Week/Month/Year/AllTime windows for now.
The tricky part is that I really can't store them denormalized, because I need these windows to be TimeZone convertible. The system is multi-tenant and therefore all data is stored as UTC. The problem is that a week starts at different hours for different customers, so aggregating the sums together will cause some points to fall into the wrong buckets.
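In other words, the raw UTC rows would stay untouched and only each customer's requested window would be shifted into UTC before aggregating. A minimal sketch, assuming a hypothetical per-customer @TenantUtcOffsetMinutes setting and window parameters given in the customer's local time:

-- Local time = UTC + offset, so the local window boundaries are shifted back to UTC.
DECLARE @UtcStart datetime = DATEADD(minute, -@TenantUtcOffsetMinutes, @LocalStart);
DECLARE @UtcEnd   datetime = DATEADD(minute, -@TenantUtcOffsetMinutes, @LocalEnd);

SELECT UserId, SUM(Points) AS TotalPoints
  FROM UserPoints
 WHERE [Date] >= @UtcStart
   AND [Date] <  @UtcEnd
 GROUP BY UserId
 ORDER BY TotalPoints DESC;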
Here are a few thoughts:
1. Sticking with SQL Azure: you can have another table, PointsTotals. Every time you add a row to your UserPoints table, also increment the TotalPoints value for that UserId in PointsTotals (or insert a new row if they don't have a row to increment). Now you always have totals computed for each UserId. (A rough sketch of this follows below.)
2. Going with Azure Table Storage: create a UserPoints table, with the Partition Key being UserId. This keeps all of a user's points rows together, where you'd easily be able to sum them. And you can borrow the idea from suggestion #1, creating a separate PointsTotals table, with PartitionKey being UserId and RowKey probably being the total points.
If it were my problem, I'd ignore the timestamps and store the user and points totals by day
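Here's a rough sketch of what suggestion #1 could look like in SQL Azure; the PointsTotals shape and the @UserId/@Points parameters are hypothetical and would live in whatever procedure (or trigger) inserts the UserPoints row:

CREATE TABLE PointsTotals (
    UserId      int NOT NULL PRIMARY KEY,
    TotalPoints int NOT NULL
);

-- Run once per new UserPoints row: bump the existing total, or create it.
MERGE PointsTotals AS t
USING (SELECT @UserId AS UserId, @Points AS Points) AS s
   ON t.UserId = s.UserId
WHEN MATCHED THEN
    UPDATE SET t.TotalPoints = t.TotalPoints + s.Points
WHEN NOT MATCHED THEN
    INSERT (UserId, TotalPoints) VALUES (s.UserId, s.Points);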
I decided to go with the idea of storing points along with a timespan (StartDate and EndDate columns) localized to the customer's current TimeZone setting. An extra benefit I realized with this is that I can 'purge' old leaderboard round data after a few months without affecting the lifetime total of points.