format splunk query by renaming search elements - splunk

I could use a little help with a splunk query I’m trying to use.
This query works fine for gathering the info I need:
index=prd_aws_billing (source="/*2017-12.csv") LinkedAccountId="1234567810" OR LinkedAccountId="123456789" ProductName="Amazon Elastic Compute Cloud" | stats sum(UnBlendedCost) AS Cost by ResourceId,UsageType,user_Name,user_Engagement
However, I’d like to refine that a bit. I’d like to represent user_Engagement as just Engagement and user_Name as "Resource Name".
I tried using AS to change the output, like I did to change UnBlendedCost to just "Cost". But when I do that it kills my query, and nothing is returned. For instance, if I do either:
index=prd_aws_billing (source="/*2017-12.csv") LinkedAccountId="123456789" OR LinkedAccountId="1234567810" ProductName="Amazon Elastic Compute Cloud" | stats sum(UnBlendedCost) AS Cost by ResourceId AS "Resource Name",UsageType,user_Name,user_Engagement AS "Engagement"
Or
index=prd_aws_billing (source="/*2017-12.csv") LinkedAccountId="123456789" OR LinkedAccountId="1234567819" ProductName="Amazon Elastic Compute Cloud" ResourceID AS "Resource Name" user_Engagement AS "Engagement" | stats sum(UnBlendedCost) AS Cost by ResourceId AS "Resource Name",UsageType,user_Name,user_Engagement AS "Engagement"
The query dies, and no info is returned. How can I reformat the search elements listed after the 'by' clause?

Use the | rename command. You can only use AS to rename the fields that are being transformed inside | stats; the fields listed after by have to be renamed afterwards:
index=prd_aws_billing (source="/*2017-12.csv") LinkedAccountId="1234567810" OR LinkedAccountId="123456789" ProductName="Amazon Elastic Compute Cloud"
| stats sum(UnBlendedCost) AS Cost by ResourceId,UsageType,user_Name,user_Engagement
| rename user_Name as "Resource Name" user_Engagement as Engagement

Related

How to get stats from combined aggregated bin data in AWS Cloudwatch Logs Insights

I have some AWS CloudWatch logs which output values every 5 seconds. I'd like to get the max over a rolling 10-minute interval and then get the average value per day based on that. Using the CloudWatch Logs Insights query syntax, I cannot seem to get the result of the first bin aggregation to use in the subsequent bin. I tried:
fields @timestamp, @message
| filter @logStream like /mylog/
| parse @message '*' as threadCount
| stats max(threadCount) by bin(600s) as maxThreadCount
| stats avg(maxThreadCount) by bin(24h) as avgThreadCount
But the query syntax is invalid for multiple stats functions. Combining the last two lines into one like:
| stats avg(max(threadCount) by bin(600s)) by bin(24h) as threadCountAvg
That is also invalid. I can't seem to find much in the AWS docs. Am I out of luck? Does anyone know a trick?

Splunk query to take a search from one index and add a field's value from another index?

How can I write a Splunk query to take a search from one index and add a field's value from another index? I've been reading explanations that involve joins, subsearches, and coalesce, and none seem to do what I want -- even though the example is extremely simple. I am not sure what I am not understanding yet.
main-index has src field which is an IP address and a field I will restrict my results on. I will look over a short amount of time, e.g.
index="main-index" sourcetype="main-index-source" main-index-field="wildcard-restriction*" earliest=-1h | stats count by src
other-index has src_ip field which is an IP address, and has the hostname. It's DHCP leases, so I need to check a longer time frame, and return only the most recent result for a given IP address. I want to get back the hostname from src_nt_host, e.g.
index="other-index" sourcetype="other-index-sourcetype" earliest=-14d
I would like to end up with the following values:
IP address, other-index.src_nt_host, main-index.count
main-index has the smallest amount of records, if that helps for performance reasons.
If I understand you correctly, you need to look at two different time ranges in two different indices. In that case, a join will most likely be needed.
Here's one way it can be done:
index=ndx1 sourcetype=srctp1 field1="someval" src="*" earliest=-1h
| stats count by src
| join src
[| search index=ndx2 sourcetype=srctp2 field2="otherval" src_ip=* src_nt_host=* earliest=-14d
| stats count by src_ip src_nt_host
| fields - count
| rename src_ip as src ]
You may need to flip the order of the searches, depending on how many results they each return, and how long they take to run.
You may also be able to achieve what you're looking for without using a join, but we'd need some sample data to suggest a better approach.
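As a rough sketch of that join-free idea, using the same placeholder index and field names as above (untested, and assuming the earliest modifiers apply per OR clause and that src and src_ip hold the same IP values):
(index=ndx1 sourcetype=srctp1 field1="someval" src="*" earliest=-1h) OR (index=ndx2 sourcetype=srctp2 field2="otherval" src_ip=* src_nt_host=* earliest=-14d)
| eval src=coalesce(src,src_ip)
| stats count(eval(index="ndx1")) AS count latest(src_nt_host) AS src_nt_host by src
| where count>0
The coalesce collects both IP fields into src, latest() keeps the most recent hostname seen for each address, and the final where drops addresses that never appeared in the first index.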

How Can I Generate A Visualisation with Multiple Data Series In Splunk

I have been experimenting with Splunk, trying to emulate some basic functionality from the OSISoft PI Time Series database.
I have two data points that I wish to display trends for over time in order to compare fluctuations between them, specifically power network MW analogue tags.
In PI this is very easy to do, however I am having difficulty figuring out how to do it in Splunk.
How do I achieve this given the field values "SubstationA_T1_MW" and "SubstationA_T2_MW" in the field Tag?
The fields involved are TimeStamp, Tag, Value, and Status.
Edit: sample input and output were provided below (not reproduced here).
I suspect you're going to be most interested in timechart for this.
Something along the following lines may get you towards what you're looking for:
index=ndx sourcetype=srctp Value=* TimeStamp=* Status=* (Tag=SubstationA_T1_MW OR Tag=SubstationA_T2_MW) earliest=-2h
| eval _time=strptime(TimeStamp,"%m/%d/%Y %H:%M:%S.%N")
| timechart span=15m max(Value) as Value by Tag
timechart relies on the internal, hidden _time field (which is in Unix epoch time) - so if _time doesn't match TimeStamp, you need the eval statement I added to convert from your TimeStamp to Unix epoch time in _time (which I've assumed is in mm/dd/yyyy format).
Also, go take the free, self-paced Splunk Fundamentals 1 class.
Showing trends over time is done by the timechart command. The command requires that times be expressed in epoch form in the _time field. Do that using the strptime function.
Of course, this presumes the data is indexed and fields extracted already.
index=foo
| eval _time = strptime(TimeStamp, "%m/%d/%Y %H:%M:%S.%3N")
| timechart max(Value) by Tag

Splunk: How can we get the combined result of cache and memory on a Splunk dashboard using a Splunk query

I am trying to get the combined results of cache and memory utilization on a Splunk dashboard using a Splunk query. Can someone please help me with the query?
Assuming cache and memory utilization come from separate queries, and the values are in fields such as cache_util and memory_util, there could be a couple of ways to achieve the same thing:
(index=cache cache_util=*) OR (index=memory memory_util=*) | eval util=coalesce(cache_util,memory_util) | timechart avg(util) by host
Another way:
index=cache cache_util=* | stats avg(cache_util) by host | appendcols [search index=memory memory_util=* | stats avg(memory_util) by host ]
If you can give some examples of your data or your search, a better answer could be provided.

How do you run a saved query from Big Query cli and export result to CSV?

I have a saved query in Big Query but it's too big to export as CSV. I don't have permission to export to a new table so is there a way to run the query from the bq cli and export from there?
From the CLI you can't directly access your saved queries, as it's a UI-only feature for now, but as explained here there is a feature request for that.
If you just want to run it once to get the results you can copy the query from the UI and just paste it when using bq.
Using the docs example query you can try the following with a public dataset:
QUERY="SELECT word, SUM(word_count) as count FROM publicdata:samples.shakespeare WHERE word CONTAINS 'raisin' GROUP BY word"
bq query "$QUERY" > results.csv
The output of cat results.csv should be:
+---------------+-------+
|     word      | count |
+---------------+-------+
| dispraisingly |     1 |
| praising      |     8 |
| Praising      |     4 |
| raising       |     5 |
| dispraising   |     2 |
| raisins       |     1 |
+---------------+-------+
Just replace the QUERY variable with your saved query.
Also, take into account whether you are using standard or legacy SQL, controlled with the --use_legacy_sql flag.
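For example, the equivalent of the sample query above in standard SQL might look like this (assuming the shakespeare table is also available under bigquery-public-data.samples, and using LIKE in place of the legacy CONTAINS):
QUERY="SELECT word, SUM(word_count) AS count FROM \`bigquery-public-data.samples.shakespeare\` WHERE word LIKE '%raisin%' GROUP BY word"
bq query --use_legacy_sql=false "$QUERY" > results.csv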
Reference docs here.
Despite what you may have understood from the official documentation, you can get large query results from bq query, but there are multiple details you have to be aware of.
To start, here's an example. I got all of the rows of the public table usa_names.usa_1910_2013 from the public dataset bigquery-public-data by using the following commands:
total_rows=$(bq query --use_legacy_sql=false --format=csv "SELECT COUNT(*) AS total_rows FROM \`bigquery-public-data.usa_names.usa_1910_2013\`;" | xargs | awk '{print $2}');
bq query --use_legacy_sql=false --max_rows=$((total_rows + 1)) --format=csv "SELECT * FROM \`bigquery-public-data.usa_names.usa_1910_2013\`;" > output.csv
The result of this command was a CSV file with 5552454 lines, with the first two containing header information. The number of rows in this table is 5552452, so it checks out.
Here's where the caveats come into play:
Regardless of what the documentation might seem to say when it comes to query download limits specifically, those limits seem to only apply to the Web UI, meaning bq is exempt from them;
At first, I was using the Cloud Shell to run this bq command, but the number of rows was so big that streaming the result set into it killed the Cloud Shell instance! I had to use a Compute instance with at least the resources of an n1-standard-4 (4 vCPUs, 16 GiB RAM), and even with all of this RAM the query took 10 minutes to finish (note that the query itself runs server-side; it's just a problem of buffering the results);
I'm manually copy-pasting the query itself, as there doesn't seem to be a way to reference saved queries directly from bq;
You don't have to use Standard SQL, but you have to specify max_rows, because otherwise it'll only return you 100 rows (100 is the current default value of this argument);
You'll still be facing the usual quotas & limits associated with BigQuery, so you might want to run this as a batch job or not, it's up to you. Also, don't forget that the maximum response size for a query is 128 MiB, so you might need to break the query into multiple bq query commands in order to not hit this size limit. If you want a public table that's big enough to hit this limitation during queries, try the samples.wikipedia one from bigquery-public-data dataset.
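As a rough illustration of that last point, the split could look something like the sketch below, partitioning the example table on its year column so each response stays under the cap (the cut-off year and the --max_rows value are arbitrary assumptions, and tail -n +3 relies on the two header lines mentioned earlier):
# first chunk keeps the header lines
bq query --use_legacy_sql=false --format=csv --max_rows=10000000 "SELECT * FROM \`bigquery-public-data.usa_names.usa_1910_2013\` WHERE year < 1970;" > output.csv
# second chunk is appended with its header lines dropped
bq query --use_legacy_sql=false --format=csv --max_rows=10000000 "SELECT * FROM \`bigquery-public-data.usa_names.usa_1910_2013\` WHERE year >= 1970;" | tail -n +3 >> output.csv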
I think that's about it! Just make sure you're running these commands on a beefy machine and after a few tries it should give you the result you want!
P.S.: There's currently a feature request to increase the size of CSVs you can download from the Web UI. You can find it here.