Splunk bucket name conversion from epoch to human-readable script - splunk

I'm facing the following issue: I have some frozen buckets in a Splunk environment whose names are saved in epoch format. More specifically, the template is:
db_1181756465_1162600547_1001
which, when converted, gives me the end date (the first number) and the start date (the second number). So, based on my example:
1181756465 = Wednesday 13 June 2007 17:41:05
1162600547 = Saturday 4 November 2006 00:35:47
Now, how to convert epoch to human-readable is clear to me, otherwise I couldn't have put the translation here. My problem is that I have a file full of bucket names that must be converted, with hundreds of entries; so I'm asking whether there is a script or some other way to automate this conversion and print the output to a file. The idea is to have a final output something like this:
db_1181756465_1162600547_1001 = Wednesday 13 June 2007 17:41:05 - Saturday 4 November 2006 00:35:47

You could use Splunk to view these values. They are output by the dbinspect command, which provides startEpoch & endEpoch times for the frozen bucket:
| dbinspect index=* state=frozen
| eval startDate=strftime(startEpoch,"%A %d %B %Y %H:%M:%S")
| eval endDate=strftime(endEpoch,"%A %d %B %Y %H:%M:%S")
| fields index, path, startDate, endDate
This listing example uses hot buckets, since I don't have frozen buckets on this test system.
If you just have the list of folder names, you can upload it to a Splunk instance as a CSV and do some processing to extract startDate & endDate:
| makeresults
| eval frozenbucket="db_1181756465_1162600547_1001"
| eval temp=split(frozenbucket,"_")
| eval sDate=tonumber(mvindex(temp,2))
| eval eDate=tonumber(mvindex(temp,1))
| eval startDate=strftime(sDate,"%A %d %B %Y %H:%M:%S")
| eval endDate=strftime(eDate,"%A %d %B %Y %H:%M:%S")
| fields frozenbucket,startDate,endDate
| fields - _time
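If you'd rather do the whole file outside Splunk, a small standalone script can batch-convert it. Below is a minimal Python 3 sketch, assuming one bucket name per line in the input file and that the epochs should be rendered in UTC (as in the example above); the file names buckets.txt and buckets_converted.txt are placeholders.
#!/usr/bin/env python3
# Minimal sketch: batch-convert Splunk frozen bucket names of the form
# db_<endEpoch>_<startEpoch>_<id> into "name = end - start" lines.
# buckets.txt / buckets_converted.txt are placeholder file names.
from datetime import datetime, timezone

def human(epoch):
    dt = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    # Same format as in the question, e.g. "Wednesday 13 June 2007 17:41:05"
    return f"{dt:%A} {dt.day} {dt:%B %Y %H:%M:%S}"

def convert(bucket):
    parts = bucket.split("_")  # ["db", endEpoch, startEpoch, id]
    return f"{bucket} = {human(parts[1])} - {human(parts[2])}"

with open("buckets.txt") as src, open("buckets_converted.txt", "w") as dst:
    for line in src:
        name = line.strip()
        if name:
            dst.write(convert(name) + "\n")
Running it against the example name writes db_1181756465_1162600547_1001 = Wednesday 13 June 2007 17:41:05 - Saturday 4 November 2006 00:35:47, which matches the requested output format.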

Related

Validate and change different date formats in pyspark

This is a continuation of this problem (Validate and change the date formats in pyspark).
In the above scenario the solution was perfect, but what if I have timestamp date formats and some more different date formats like the ones below?
df = sc.parallelize([['12-21-2006'],
['05/30/2007'],
['01-01-1984'],
['22-12-2017'],
['12222019'],
['2020/12/23'],
['2020-12-23'],
['12.11.2020'],
['22/02/2012'],
['2020/12/23 04:50:10'],
['12/23/1996 05:56:20'],
['23/12/2002 10:30:50'],
['24.12.1990'],
['12/03/20']]).toDF(["Date"])
df.show()
+-------------------+
| Date|
+-------------------+
| 12-21-2006|
| 05/30/2007|
| 01-01-1984|
| 22-12-2017|
| 12222019|
| 2020/12/23|
| 2020-12-23|
| 12.11.2020|
| 22/02/2012|
|2020/12/23 04:50:10|
|12/23/1996 05:56:20|
|23/12/2002 10:30:50|
| 24.12.1990|
| 12/03/20|
+-------------------+
When I tried the same way of solving this (Validate and change the date formats in pyspark), I get an error. As far as I know, the error is due to the timestamp formats, and records with similar patterns such as MM/dd/yyyy and dd/MM/yyyy are not able to be converted into the required format.
sdf = df.withColumn("d1", F.to_date(F.col("Date"),'yyyy/MM/dd')) \
.withColumn("d2", F.to_date(F.col("Date"),'yyyy-MM-dd')) \
.withColumn("d3", F.to_date(F.col("Date"),'MM/dd/yyyy')) \
.withColumn("d4", F.to_date(F.col("Date"),'MM-dd-yyyy')) \
.withColumn("d5", F.to_date(F.col("Date"),'MMddyyyy')) \
.withColumn("d6", F.to_date(F.col("Date"),'MM.dd.yyyy')) \
.withColumn("d7", F.to_date(F.col("Date"),'dd-MM-yyyy')) \
.withColumn("d8", F.to_date(F.col("Date"),'dd/MM/yy')) \
.withColumn("d9", F.to_date(F.col("Date"),'yyyy/MM/dd HH:MM:SS'))\
.withColumn("d10", F.to_date(F.col("Date"),'MM/dd/yyyy HH:MM:SS'))\
.withColumn("d11", F.to_date(F.col("Date"),'dd/MM/yyyy HH:MM:SS'))\
.withColumn("d12", F.to_date(F.col("Date"),'dd.MM.yyyy')) \
.withColumn("d13", F.to_date(F.col("Date"),'dd-MM-yy')) \
.withColumn("result", F.coalesce("d1", "d2", "d3", "d4",'d5','d6','d7','d8','d9','d10','d11','d12','d13'))
sdf.show()
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 34.0 (TID 34, ip-10-191-0-117.eu-west-1.compute.internal, executor driver): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '01-01-1984' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
Is there a better way of solving this? I just want to know whether there is a function or library that can convert any kind of date format into a single date format.
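A possible way around this, sketched below under the assumption of Spark 3.x: the SparkUpgradeException comes from the default spark.sql.legacy.timeParserPolicy=EXCEPTION; setting it to CORRECTED (as the error message itself suggests) makes strings that don't match a pattern come back as null, so coalesce can fall through to the next candidate. Also note that minutes and seconds are 'mm' and 'ss' in the patterns ('MM' is month and 'SS' is fraction-of-second), and that ambiguous values such as 12/03/20 simply resolve to whichever listed pattern matches first, so the order of the list encodes an assumption. The tiny DataFrame here just reuses a few of the sample values.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
# With CORRECTED, unparseable strings yield null instead of raising SparkUpgradeException
spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")

df = spark.createDataFrame(
    [("12-21-2006",), ("22-12-2017",), ("12222019",),
     ("2020/12/23 04:50:10",), ("12/03/20",)], ["Date"])

# Candidate patterns, most specific first; extend the list for your data
patterns = [
    "yyyy/MM/dd HH:mm:ss", "MM/dd/yyyy HH:mm:ss", "dd/MM/yyyy HH:mm:ss",
    "yyyy/MM/dd", "yyyy-MM-dd", "MM/dd/yyyy", "MM-dd-yyyy", "MMddyyyy",
    "MM.dd.yyyy", "dd-MM-yyyy", "dd/MM/yyyy", "dd.MM.yyyy", "dd/MM/yy",
]

sdf = df.withColumn("result",
                    F.coalesce(*[F.to_date(F.col("Date"), p) for p in patterns]))
sdf.show()
If the formats really are arbitrary, another option is a Python UDF around dateutil.parser.parse, but it still has to guess between dd/MM and MM/dd for ambiguous inputs.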

Divide two timecharts in Splunk

I want to divide two timecharts (ideally the result would also look like a timechart, but anything else that emphasizes the trend is also fine).
I have two types of URLs and I can generate timecharts for them like this:
index=my-index sourcetype=access | regex _raw="GET\s/x/\w+" | timechart count
index=my-index sourcetype=access | regex _raw="/x/\w+/.*/\d+.*\s+HTTP" | timechart count
The purpose is to emphasize that the relative number of URLs of the second type is increasing and the relative number of URLs of the first type is decreasing.
This is why I want to divide them (ideally the second one by the first one).
For example, if the first series generates 2, 4, 8, 4 and the second one generates 4, 9, 20, 12, I want a single dashboard that somehow shows the result 2, 2.25, 2.5, 3.
I managed to pull this information together with the following, but not to generate a timechart and not to divide the two series:
index=my-index sourcetype=access
| eval type = if(match(_raw, "GET\s/x/\w+"), "new", if(match(_raw, "/x/\w+/.*/\d+.*\s+HTTP"), "old", "other"))
| table type
| search type != "other"
| stats count as "Calls" by type
I also tried some approaches using eval, but none of them work.
Try this query:
index=my-index sourcetype=access
| eval type = if(match(_raw, "GET\s/x/\w+"), "new", if(match(_raw, "/x/\w+/.*/\d+.*\s+HTTP"), "old", "other"))
| fields type
| search type != "other"
| timechart count(eval(type="new")) as "New", count(eval(type="old")) as "Old"
| eval Div=if(New=0, 0, Old/New)

Spark Scala : How to read fixed record length File

I have a simple question: how do I read files with a fixed record length? I have 2 fields in the record - name & state.
File Data-
John OHIO
VictorNEWYORK
Ron CALIFORNIA
File Layout-
Name String(6);
State String(10);
I just want to read it and create a DataFrame from this file. Just to elaborate more on “fixed record length”: for example, since “OHIO” is only 4 characters long, in the file it is padded with 6 trailing spaces.
The record length here is 16.
Thanks,
Sid
Read your input file:
val rdd = sc.textFile("your_file_path")
Then use substring to split the fields and convert the RDD to a DataFrame using toDF() (in spark-shell the required implicits are already in scope):
val df = rdd.map(l => (l.substring(0, 6).trim(), l.substring(6, 16).trim()))
.toDF("Name","State")
df.show(false)
Result:
+------+----------+
|Name |State |
+------+----------+
|John |OHIO |
|Victor|NEWYORK |
|Ron |CALIFORNIA|
+------+----------+
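For completeness, if you happen to be working in PySpark rather than Scala, the same fixed-offset idea works with substring on a DataFrame column. A minimal sketch, with the file path as a placeholder:
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read each 16-character record as a single string column named "value",
# then slice by the fixed offsets: Name = chars 1-6, State = chars 7-16.
# Note that substring() is 1-based.
df = (spark.read.text("your_file_path")
      .select(
          F.trim(F.substring("value", 1, 6)).alias("Name"),
          F.trim(F.substring("value", 7, 10)).alias("State")))
df.show(truncate=False)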

Stata time conversion

I am working with an int (%8.0g) variable called timeinsecond that was badly coded. For example, a value of 12192 for this variable should mean 3h 23min 12s. I'm trying to create a new variable that, based on the value of timeinsecond, would give me the total time expressed as HH:MM:SS.
In the example I mentioned, the new variable would be 03:23:12.
Stata uses units of milliseconds for date-times, so assuming that no time here is longer than 24 hours, you can use this principle:
. clear
. set obs 1
number of observations (_N) was 0, now 1
. gen timeinsecond = 12192
. gen double wanted = timeinsecond * 1000
. format wanted %tcHH:MM:SS
. list
+---------------------+
| timein~d wanted |
|---------------------|
1. | 12192 03:23:12 |
+---------------------+
All documented at help datetime.

Not able to split chararray field containing spaces and tabs between the words. Help me with the command using Apache Pig?

Sample.txt File
2017-01-01 10:21:59 THURSDAY -39 3 Pick up a bus - Travel for two hours
2017-02-01 12:45:19 FRIDAY -55 8 Pick up a train - Travel for one hour
2017-03-01 11:35:49 SUNDAY -55 8 Pick up a train - Travel for one hour
.
.
When I executed the suggested command, the line got split into three fields. However, when I do the operation below, it does not work as expected:
A = LOAD 'Sample.txt' USING PigStorage() as (line:chararray);
B = foreach A generate STRSPLIT(line, ' ', 3);
C = foreach B generate $2;
split C into buslog if $0 matches '.*bus*.', trainlog if $0 matches '.*train*.';
Note: a dump of C gives the result below.
THURSDAY -39 3 Pick up a bus - Travel for two hours
FRIDAY -55 8 Pick up a train - Travel for one hour
SUNDAY -55 8 Pick up a train - Travel for one hour
Requirement: from the above result, I want to split the train and bus records into two relations, but it is not happening as expected.
The syntax is '.*string.*'. Notice that there is .* on both sides of the string.
split C into buslog if $0 matches '.*bus.*', trainlog if $0 matches '.*train.*';