How Can I Generate A Visualisation with Multiple Data Series In Splunk

I have been experimenting with Splunk, trying to emulate some basic functionality from the OSISoft PI Time Series database.
I have two data points that I wish to display trends for over time in order to compare fluctuations between them, specifically power network MW analogue tags.
In PI this is very easy to do; however, I am having difficulty figuring out how to do it in Splunk.
How do I achieve this, given the field values "SubstationA_T1_MW" and "SubstationA_T2_MW" in the field Tag?
The fields involved are TimeStamp, Tag, Value, and Status
Edit:
Sample Input and Output listed below:

I suspect you're going to be most interested in timechart for this
Something along the following lines may get you towards what you're looking for:
index=ndx sourcetype=srctp Value=* TimeStamp=* Status=* (Tag=SubstationA_T1_MW OR Tag=SubstationA_T2_MW) earliest=-2h
| eval _time=strptime(TimeStamp,"%m/%d/%Y %H:%M:%S.%N")
| timechart span=15m max(Value) as Value by Tag
timechart relies on the internal, hidden _time field (which is in Unix epoch time), so if _time doesn't already match TimeStamp, you need the eval statement I added to convert your TimeStamp (which I've assumed is in mm/dd/yyyy format) into Unix epoch time in _time.
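If you want to sanity-check that conversion before charting, a small variation like this (a sketch reusing the field names from the question, untested against your data) renders the parsed _time back into a readable string so you can compare it with the original TimeStamp:
index=ndx sourcetype=srctp Value=* TimeStamp=* Status=* (Tag=SubstationA_T1_MW OR Tag=SubstationA_T2_MW) earliest=-2h
| eval _time=strptime(TimeStamp,"%m/%d/%Y %H:%M:%S.%N")
| eval parsed_check=strftime(_time,"%m/%d/%Y %H:%M:%S.%3N")
| table TimeStamp parsed_check Tag Value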
Also, go take the free, self-paced Splunk Fundamentals 1 class

Showing trends over time is done by the timechart command. The command requires times be expressed in epoch form in the _time field. Do that using the strptime function.
Of course, this presumes the data is indexed and fields extracted already.
index=foo
| eval _time = strptime(TimeStamp, "%m/%d/%Y %H:%M:%S.%3N")
| timechart max(Value) by Tag

Related

Show message chain in search

I have a message chain whose messages are coming into Splunk.
The chain consists of ten different messages: five messages from one system and five from another (backup) system.
Messages from the primary system share a common SrcMsgId value, and messages from the backup system are likewise grouped by a common SrcMsgId.
Messages from the backup system also have a Mainsys_srcMsgId value, which is identical to the main system's SrcMsgId.
The message chain from the backup system enters Splunk immediately after the messages from the main system.
How can I display a chain of all ten messages? Ideally the messages from the first (main) system would come first, then those from the second (backup) system, along with each message's arrival time at the server.
I understand that for the time I will include _time in the query. I have become a little familiar with the query syntax, but I still have a lot of difficulty writing queries.
Please help me with an example of a correct query.
Thank you in advance!
You're starting with quite a challenging query! :-)
To combine the two chains, they'll need a common field. The SrcMsgId field won't do since it can represent different message chains. What you can do is create a new common field using Mainsys_srcMsgId, if present, and SrcMsgId. Then link the messages via that field using streamstats. Finally sort by the common field to put them together. Here's an untested sample query:
index=foo
```Get Mainsys_srcMsgId, if it exists; otherwise, get SrcMsgId```
| eval joiner = coalesce(Mainsys_srcMsgId, SrcMsgId)
| streamstats count by joiner
```Find the earliest event for each chain so we can sort by it later```
| eventstats min(_time) as starttime by joiner
```Order the results by time, msgId, sequence```
| sort starttime joiner count
```Discard our scratch fields```
| fields - starttime joiner count
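If you also want each message's server arrival time shown as a readable column next to the chain, a small variation like this might help (also untested; the strftime format and the table columns are only illustrative, and it keeps the joiner field so you can see which chain each message belongs to):
index=foo
| eval joiner = coalesce(Mainsys_srcMsgId, SrcMsgId)
| streamstats count as seq by joiner
| eventstats min(_time) as starttime by joiner
| sort starttime joiner seq
| eval arrival_time = strftime(_time, "%Y-%m-%d %H:%M:%S.%3N")
| table arrival_time joiner SrcMsgId Mainsys_srcMsgId seq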

Use sub-second precision on "earliest" in Splunk query

I have a Splunk search string. If I add earliest=10/05/2020:23:59:58, the search string still works. However, if I change that to earliest=10/05/2020:23:59:58:01, I get an error message saying invalid value "10/05/2020:23:59:58:01" for time term 'earliest'. Does that mean Splunk's earliest parameter is limited to second precision? I cannot find the answer in their documentation.
Thanks!
Yes, earliest's precision is limited to "standard" Unix epoch time (i.e. the number of whole seconds elapsed since the Unix epoch, 01 Jan 1970 00:00:00 UTC) because the _time field holds whole-number seconds.
Splunk knows how to parse timestamps it sees with more precision than whole seconds, but that does not mean _time natively stores that precision.
_time, and therefore anything that references it (like earliest), does not understand subsecond precision. For that, you will need another field in your event that carries the subsecond part.
For millisecond search time, include timeformat=%m/%d/%Y:%H:%M:%S:%3N together with your earliest=10/05/2020:23:59:58:01.
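Placement-wise, that might look like the following (the index and sourcetype are placeholders; the timeformat pattern and earliest value are taken from the line above):
index=foo sourcetype=bar timeformat="%m/%d/%Y:%H:%M:%S:%3N" earliest="10/05/2020:23:59:58:01"
| table _time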

How to unnest Google Analytics custom dimension in Google Data Prep

Background story:
We use Google Analytics to track user behaviour on our website. The data is exported daily into Big Query. Our implementation is quite complex and we use a lot of custom dimensions.
Requirements:
1. The data needs to be imported into our internal databases to enable better and more strategic insights.
2. The process needs to run without requiring human interaction
The problem:
Google Analytics data needs to be in a flat format so that we can import it into our database.
Question: How can I unnest custom dimensions data using Google Data Prep?
What it looks like:
----------------
customDimensions
----------------
[{"index":10,"value":"56483799"},{"index":16,"value":"·|·"},{"index":17,"value":"N/A"}]
What I need it to look like:
----------------------------------------------------------
customDimension10 | customDimension16 | customDimension17
----------------------------------------------------------
56483799 | ·|· | N/A
I know how to achieve this using a standard SQL query in Big Query interface but I really want to have a Google Data Prep flow that does it automatically.
Define the flat format and create it in BigQuery first.
You could either:
- create one big table and repeat several values using CROSS JOINs on all the arrays in the table, or
- create multiple tables (one per array) and use ids to connect them, e.g.
  - for session custom dimensions, concatenate fullvisitorid / visitstarttime
  - for hits, concatenate fullvisitorid / visitstarttime / hitnumber
  - for products, concatenate fullvisitorid / visitstarttime / hitnumber / productSku
The second option is a bit more effort, but you save storage because you're not repeating all the information for everything.
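For reference, one way to produce exactly the flat shape shown in the question with standard SQL in BigQuery is a per-index subselect rather than a full CROSS JOIN (a sketch only; the export table name is an assumption, and the index numbers 10/16/17 come from the sample above):
SELECT
  fullVisitorId,
  visitStartTime,
  (SELECT value FROM UNNEST(customDimensions) WHERE index = 10) AS customDimension10,
  (SELECT value FROM UNNEST(customDimensions) WHERE index = 16) AS customDimension16,
  (SELECT value FROM UNNEST(customDimensions) WHERE index = 17) AS customDimension17
FROM `my_project.my_dataset.ga_sessions_20200101`
The same flat SELECT can then serve as the source that your Data Prep flow or import job reads from.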

format splunk query by renaming search elements

I could use a little help with a splunk query I’m trying to use.
This query works fine for gathering the info I need:
index=prd_aws_billing (source="/*2017-12.csv") LinkedAccountId="1234567810" OR LinkedAccountId="123456789" ProductName="Amazon Elastic Compute Cloud" | stats sum(UnBlendedCost) AS Cost by ResourceId,UsageType,user_Name,user_Engagement
However I’d like to refine that a bit. I’d like to represent user_Engagement as just Engagement and user_Name as “Resource Name”.
I tried using AS to change the output, like I did to change UnBlendedCost to just “Cost”. But when I do that it kills my query, and nothing is returned. For instance if I do either:
index=prd_aws_billing (source="/*2017-12.csv") LinkedAccountId="123456789" OR LinkedAccountId="1234567810" ProductName="Amazon Elastic Compute Cloud" | stats sum(UnBlendedCost) AS Cost by ResourceId AS “Resource Name”,UsageType,user_Name,user_Engagement AS “Engagement”
Or
index=prd_aws_billing (source="/*2017-12.csv") LinkedAccountId="123456789" OR LinkedAccountId="1234567819" ProductName="Amazon Elastic Compute Cloud" ResourceID AS “Resource Name” user_Engagement AS “Engagement” | stats sum(UnBlendedCost) AS Cost by ResourceId AS “Resource Name”,UsageType,user_Name,user_Engagement AS “Engagement”
The query dies, and no info is returned. How can I reformat the search elements listed after the 'by' clause?
Use the |rename command. AS can only rename the aggregated fields produced by stats (as you did with UnBlendedCost -> Cost); it cannot be applied to the fields listed after the by clause.
index=prd_aws_billing (source="/*2017-12.csv") LinkedAccountId="1234567810" OR LinkedAccountId="123456789" ProductName="Amazon Elastic Compute Cloud"
| stats sum(UnBlendedCost) AS Cost by ResourceId,UsageType,user_Name,user_Engagement
| rename user_Name as "Resource Name" user_Engagement as Engagement
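Alternatively, you can rename before the stats so the by clause uses the display names directly (a sketch along the same lines, not tested against your data):
index=prd_aws_billing (source="/*2017-12.csv") LinkedAccountId="1234567810" OR LinkedAccountId="123456789" ProductName="Amazon Elastic Compute Cloud"
| rename user_Name as "Resource Name" user_Engagement as Engagement
| stats sum(UnBlendedCost) AS Cost by ResourceId, UsageType, "Resource Name", Engagement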

Getting table creation time in Big Query

How do you get the creation time for a table in the dataset?
bq show my_project:my_dataset.my_table
gives you
Table my_project:my_dataset.my_table
Last modified Schema Total Rows Total Bytes Expiration
----------------- ------------------ ------------ ------------- ------------
**16 Oct 14:47:41** |- field1: string 3 69
|- field2: string
|- field3: string
We can use the "Last modified" date, but it's missing the year! Also, some cryptic parsing logic needs to be applied to extract the date.
Is this meta information available through any other specific 'bq' based commands?
I am looking to use this information to determine an appropriate table decorator for the table, since it seems that if the decorator goes back 4 hours (on a recurring basis) and the table/partition has existed for only 3 hours, the query errors out.
Ideally, it would be nice if the decorator defaulted the time window to "now - table creation time" whenever the specified window is larger than "now - table creation time".
FWIW this information is available in the API, which the bq tool calls under the covers: https://developers.google.com/bigquery/docs/reference/v2/tables#resource
If you use bq --format=json you can get the information easily:
$ bq --format=prettyjson show publicdata:samples.wikipedia
{
"creationTime": "1335916132870", ...
}
This is the exact value to use in the table decorator.
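For example, that creationTime value can be dropped straight into a range decorator in a legacy SQL query (illustrative only, since as noted below decorators can only reach back 7 days, so this particular table is far too old for the query to succeed):
bq query "SELECT COUNT(*) FROM [publicdata:samples.wikipedia@1335916132870-]"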
While I'm not sure that I like the idea of having a 'really low start value' be interpreted as table creation time, I've got other options:
1. Table#0 means the table at creation time.
2. Table#0 means the table at the earliest time at which a snapshot is available.
I'm leaning towards #2, since snapshots can only go back 7 days in time.