How to cluster and create a timechart in Splunk

I have a field, LogMsg, containing error messages that I am grouping based on similarity using cluster.
What I am trying to achieve is a display that shows a time series of the grouped errors:
index="my_index_here" LogLevel=ERROR
| cluster showcount=t t=0.2 field=Message | eval "Error Count" = cluster_count
| head 10 | timechart count("Error Count") By LogMsg span=60m
The idea is this:
1. Get all the error messages: LogLevel=ERROR
2. Group the items based on the Message field: | cluster showcount=t t=0.2 field=Message | eval "Error Count" = cluster_count
3. Get the top 10 results: | head 10
4. Draw a timechart: timechart count("Error Count") By LogMsg span=60m. The timechart should plot the number of error messages generated by each cluster against time, something like:
Message                      | 8:00 | 9:00 | 10:00 | 11:00
-----------------------------+------+------+-------+------
Unable to authenticate       | 90   | 40   | 30    | 60
Another Error                | 80   | 40   | 30    | 60
Yet another error            | 70   | 40   | 30    | 60
---                          | ---  | ---  | ---   | ---
The 10th most frequent error | 50   | 40   | 30    | 60
My approach above is not working; it returns a blank plot.

The way to debug SPL is to execute one pipe at a time and verify the results before adding the next pipe.
One thing I believe you'll discover is that the head command ruins the timechart. It's possible all of the top 10 results will be in the same hour, so the results may be less than useful.
A common cause of a "blank plot" is a stats or timechart command that references a non-existent or null field. You should discover which field is null while debugging.
FWIW, here's a run-anywhere query similar to yours that produces a plot.
index=_internal log_level=INFO
| cluster showcount=t t=0.2 field=event_message
| eval "Error Count" = cluster_count
| head 10
| timechart count("Error Count") By group span=60m
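Note that the run-anywhere example splits By group, a field present on Splunk's internal events, rather than By the clustered field. In the original search, cluster groups on field=Message while timechart splits By LogMsg; if either field is missing from the events, the chart comes up blank. A quick probe such as | stats count by LogMsg placed after the cluster step (an illustrative check, not part of the fix) will show whether the field is still populated at that point.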

Related

Conditionally remove a field in Splunk

I have a table generated by chart that lists the results of a compliance scan
These results are typically Pass, Fail, and Error - but sometimes there is "Unknown" as a response
I want to show the percentage of each (Pass, Fail, Error, Unknown), so I do the following:
| fillnull value=0 Pass Fail Error Unknown
| eval _total=Pass+Fail+Error+Unknown
<calculate percentages for each field>
<append "%" to each value (Pass, Fail, Error, Unknown)>
What I want to do is eliminate a "totally" empty column, and only display it if it actually exists somewhere in the source data (not merely because of the fillnull command)
Is this possible?
I was thinking something like this, but cannot figure out the second step:
| eventstats max(Unknown) as _unk
| <if _unk is 0, drop the field>
Edit:
This could just as easily be reworded to:
if every entry for a given field is identical, remove it
Logically, this would look something like:
if(mvcount(values(fieldname))<2), fields - fieldname
Except, of course, that's not valid SPL
Could you try this logic after the chart:
``` fill with null values ```
| fillnull value=null()
``` rotate 90° twice, dropping empty/null columns ```
| transpose 0 include_empty=false | transpose 0 header_field=column | fields - column
[edit:] It works when I do the following, but I'm not sure it's easy to make it work under all conditions:
| stats count | eval keep=split("1 2 3 4 5"," ") | mvexpand keep
| table keep nokeep
| fillnull value=null()
| transpose 0 include_empty=false | transpose 0 header_field=column | fields - column
[edit2:] And if you need to add more null() values, it could be done like this:
| stats count | eval keep=split("1 2 3 4 5"," "), nokeep=0 | mvexpand keep
| table keep nokeep
| foreach nokeep [ eval nokeep=if(nokeep==0,null(),nokeep) ]
| transpose 0 include_empty=false | transpose 0 header_field=column | fields - column

Dynamic creation of Table type

I have a column table with a single column.
I would like to create a table type with all the elements in the column of the above-mentioned table as column names, with fixed datatype and size, and use it in a function,
similar to this:
Dynamic creation of table in tsql
Any suggestions would be appreciated.
EDIT:
To finish a product, a machine has to perform different Jobs on the material with different Tools.
I have a list of Jobs a machine can perform and a list of Tools: a specific tool for a specific Job.
Each Job needs a specific tool and a number of hours (to change the tool once it reaches its change time). A Job can be performed many times on a product. (In this case, if a Job is performed for 1 hour, the tool has been used for 1 hour.)
For each product, a set of tools will be at work in a sequence, so I need a report showing, for each product, the number of hours each tool has worked.
EDIT 2:
Product table

ProductID | Jobs
----------+------
1         | job1
1         | job2
1         | job3
1         | .
1         | .
1         | 100th
2         | job1
2         | .
2         | .
2         | 200th
Jobs table

Jobs | tool    | time
-----+---------+-----
job1 | tool 10 | 2
job1 | tool 09 | 1
job2 | tool 11 | 4
job3 | tool 17 | 0.5
required report (this table does not physically exist)

productID | job1 | job2 | job3 | job4 | job5 | ...
----------+------+------+------+------+------+----
1         | 20   | 10   | 5    | .    | .    | .
2         | 10   | 13   | 5    | .    | .    | .
Based on the added information, there are two main requirements here:
1. You want to sum up the time spent producing each product, grouped by the jobs involved.
2. You want a cross-table report showing the times from step 1 against products and jobs.
For the first bit, you could probably do this with a query like this:
SELECT
p.product_id,
j.jobs,
SUM(j.time) as SUM_TIME
FROM
products p
INNER JOIN jobs j
ON p.jobs = j.jobs
GROUP BY
p.product_id,
j.jobs;
For the second part: this is usually called a PIVOT report.
SAP HANA does not provide a dynamic SQL command for generating output in this form (some other DBMSs do).
However, this dynamic transformation is usually relevant for data presentation and not so much for processing.
So, as you probably want to use some form of front end for this report (e.g. MS Excel, Crystal Reports, Business Objects X, Tableau, ...), I would recommend doing the transformation and formatting in the front-end report. Look for "PIVOT" or "CROSSTAB" options to do that.
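If the set of jobs that should become columns is small and known up front, a static pivot is also possible in plain SQL with conditional aggregation. A minimal sketch building on the grouping query above; the three job names are illustrative placeholders:

SELECT
    p.product_id,
    -- one conditional sum per job that should become a column
    SUM(CASE WHEN j.jobs = 'job1' THEN j.time ELSE 0 END) AS job1,
    SUM(CASE WHEN j.jobs = 'job2' THEN j.time ELSE 0 END) AS job2,
    SUM(CASE WHEN j.jobs = 'job3' THEN j.time ELSE 0 END) AS job3
FROM
    products p
INNER JOIN jobs j
    ON p.jobs = j.jobs
GROUP BY
    p.product_id;

Every new job requires adding a column by hand, which is exactly why the dynamic variant is usually left to the reporting front end.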

Creating a session ID and applying it to events

I have event data from an app that helps tell me what people are doing inside my app.
userID | timestamp | name        | value
A      | 1         | Launch      | 23
A      | 3         | ClickButton | Header
B      | 2         | Launch      | 10
B      | 5         | ClickBanner | ad
etc.
I am defining a session as follows: any time someone has been out of the app for more than 5 minutes, the next entry is a new session. So if you come back in after 4 minutes, it is still the same session.
I use a lag to select the previous launch timestamp, add that event's time value in seconds, and then take the difference to the next launch. That way I can select the first timestamp of each session.
Now I need to map each non-Launch event back to the session it is a part of, so I can easily analyze things such as "What percent of sessions include an ad click?"
I'm pulling my data using Hive and am not having success finding an efficient way to do this, as my dataset is fairly large.
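One way to do this in a single pass, without self-joins, is with Hive's window functions (available since Hive 0.11): flag each row that starts a new session, then take a running sum of those flags per user. A minimal sketch, assuming a table events(userID, ts, name, value) with ts in epoch seconds and the 5-minute (300-second) gap; all names here are illustrative:

-- Flag session starts (first event per user, or a gap of more than 300 s),
-- then turn the running count of flags into a per-user session ID.
SELECT userID, ts, name, value,
       CONCAT(userID, '-',
              CAST(SUM(new_session) OVER (PARTITION BY userID ORDER BY ts
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                   AS STRING)) AS session_id
FROM (
  SELECT userID, ts, name, value,
         CASE WHEN LAG(ts) OVER (PARTITION BY userID ORDER BY ts) IS NULL
                OR ts - LAG(ts) OVER (PARTITION BY userID ORDER BY ts) > 300
              THEN 1 ELSE 0 END AS new_session
  FROM events
) flagged;

Since every event, not just launches, then carries a session_id, questions like "what percent of sessions include an ad click?" become simple aggregations over session_id.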

Rails, SQL: private chat, how to find last message in each conversation

I've got the following schema:
+----+------+------+-----------+---------------------+--------+
| id | from | to | message | timestamp | readed |
+----+------+------+-----------+---------------------+--------+
| 46 | 2 | 6 | 123 | 2013-11-19 19:12:19 | 0 |
| 44 | 2 | 3 | 123 | 2013-11-19 19:12:12 | 0 |
| 43 | 2 | 1 | ????????? | 2013-11-19 18:37:11 | 0 |
| 42 | 1 | 2 | adf | 2013-11-19 18:37:05 | 0 |
+----+------+------+-----------+---------------------+--------+
from and to are the users' IDs; message is, obviously, the message; then the timestamp and the read flag.
When a user opens his profile, I want him to see the list of dialogs he participated in, with the last message of each dialog.
To find a conversation between two people I wrote this simple code (in the Message model):
def self.conversation(from, to)
  where(from: [from, to], to: [from, to])
end
So, I can now sort the messages and get the last one. But it's not cool to fire a separate query for each dialog.
How could I achieve the result I'm looking for with fewer queries?
UPDATE:
OK, it looks like it's not really clear what I'm trying to achieve.
For example, four users – Kitty, Dandy, Beggy and Brucy – used that chat.
When Brucy opens her dialogs, she should see:
Beggy: hello brucy haw ar u! | <--- the last message from beggy
-------
Dandy: Hi brucy! | <--- the last message from dandy
--------
Kitty: Hi Kitty, my name is Brucy! | <--- this last message is from the current user
So, three separate dialogs. Then, Brucy can enter any dialog to continue the private conversation.
And I can't figure out how to fetch these records without firing a query for each dialog between users.
This answer is a bit late, but there doesn't seem to be a great way to do this, in Rails 3.2.x at least.
However, here is the solution I came up with (as I had the same problem on my website).
@sender_ids =
  Message.where(recipient_id: current_user.id)
         .order("created_at DESC")
         .select("DISTINCT owner_id")
         .paginate(per_page: 10, page: params[:page])

sql_queries =
  @sender_ids.map do |user|
    user_id = user.owner_id
    "(SELECT * FROM messages WHERE owner_id = #{user_id} "\
    "AND recipient_id = #{current_user.id} ORDER BY id DESC "\
    "LIMIT 1)"
  end.join(" UNION ALL ")

@messages = Message.find_by_sql(sql_queries)
ActiveRecord::Associations::Preloader.new(@messages, :owner).run
This gets the last 10 unique people who sent you messages.
For each of those people, it builds a UNION ALL query to get the last message received from each of them. With 50,000 rows, the query completes in about 20 ms. And to get associations preloaded you have to use the Preloader directly, as shown, because .includes will not work with .find_by_sql.
def self.conversation(from, to)
  order("timestamp asc").last
end
Edit:
This RailsCast will be helpful:
http://railscasts.com/episodes/316-private-pub?view=asciicast
EDIT2:
def self.conversation(from, to)
  select(:from, :to, :message).where(from: [from, to], to: [from, to]).group(:from, :to).order("timestamp DESC").limit(1)
end
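If your database supports window functions (PostgreSQL, or MySQL 8+), the last message of every dialog can also be fetched in a single query by ranking messages within each participant pair. A minimal sketch against the schema above; the user id 2 stands in for the current user and is purely illustrative:

-- Rank messages inside each conversation pair (LEAST/GREATEST makes
-- A->B and B->A fall into the same partition), newest first.
SELECT id, "from", "to", message, "timestamp"
FROM (
  SELECT m.*,
         ROW_NUMBER() OVER (
           PARTITION BY LEAST("from", "to"), GREATEST("from", "to")
           ORDER BY "timestamp" DESC
         ) AS rn
  FROM messages m
  WHERE 2 IN ("from", "to")  -- the current user's id
) ranked
WHERE rn = 1;

The result can be handed to find_by_sql just like the UNION ALL string above.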

Date Join Query with Calculated Fields

I'm creating an Access 2010 database to replace an old Paradox one. I'm just now getting to queries, and there is no hiding that I am new to SQL.
What I am trying to do is set up a query to be used by a graph. The graph's Y axis is to be a simple percentage passed, and the X axis is a certain day. The graph will be created on form load and as subsequent new records are entered, with a date range of "Between Date() And Date()-30" (30 days, rolling).
The database I'm working with can have multiple inspections per day with multiple passes and multiple fails. Each inspection is a separate record.
For instance, on 11/26/2012 there were 7 inspections done; 5 passed and 2 failed, a 71% ((5/7)*100%) acceptance. The "11/26/2012" and "71%" represent a data point on the graph. On 11/27/2012 there were 8 inspections done; 4 passed and 4 failed, a 50% acceptance. Etc.
Here is an example of a query with fields "Date" and "Disposition" of date range "11/26/2012 - 11/27/2012:"
SELECT Inspection.Date, Inspection.Disposition
FROM Inspection
WHERE (((Inspection.Date) Between #11/26/2012# And #11/27/2012#) AND ((Inspection.Disposition)="PASS" Or (Inspection.Disposition)="FAIL"));
Date | Disposition
11/26/2012 | PASS
11/26/2012 | FAIL
11/26/2012 | FAIL
11/26/2012 | PASS
11/26/2012 | PASS
11/26/2012 | PASS
11/26/2012 | PASS
11/27/2012 | PASS
11/27/2012 | PASS
11/27/2012 | FAIL
11/27/2012 | PASS
11/27/2012 | FAIL
11/27/2012 | PASS
11/27/2012 | FAIL
11/27/2012 | FAIL
*NOTE - The date field is of type "Date," and the Disposition field is of type "Text." There are days where no inspections are done, and these days are not to show up on the graph. The inspection disposition can also be listed as "NA," which refers to another type of inspection not to be graphed.
Here is the layout I want to create in another query (again, for brevity, only 2 days in range):
Date | # Insp | # Passed | # Failed | % Acceptance
11/26/2012 | 7 | 5 | 2 | 71
11/27/2012 | 8 | 4 | 4 | 50
What I think needs to be done is some type of join on the record dates themselves and "calculated fields" in the rest of the query results. The problem is
that I haven't found out how to "flatten" the records by date AND maintain a count of the number of inspections and the number passed/failed all in one query. Do I need multiple layered queries for this? I prefer not to store any of the queries as tables as the only use of these numbers is in graphical form.
I was thinking of making new columns in the database to get around the "Disposition" field being textual by assigning a PASS a "1" and a FAIL a "0", but this seems like a cop-out. There has to be a way to make this work in SQL; I just haven't found applicable examples.
Thanks for your help! Any input or suggestions are appreciated! Example databases with forms, queries, and graphs are also helpful!
You could group by Date, and then use aggregates like sum and count to calculate statistics for that group:
select [Date]
, count(*) as [# Insp]
, sum(iif(Disposition = 'PASS',1,0)) as [# Passed]
, sum(iif(Disposition = 'FAIL',1,0)) as [# Failed]
, 100.0 * sum(iif(Disposition = 'PASS',1,0)) / count(*) as [% Acceptance]
from YourTable
where Disposition in ('PASS', 'FAIL')
group by
[Date]