Getting error "Regex: syntax error in subpattern name (missing terminator)" in Splunk

I have been extracting fields in Splunk and this works fine for all headers, but for the header l-m-n I get the error "syntax error in subpattern name (missing terminator)."
I have done the same for other headers and it works; this is the only header with a hyphen in its name, and no matter how many times I try, I cannot get it to extract.
Headers:
Content-Type: application/json
Accept: application/json,application/problem+json
l-m-n: txxxmnoltr
Accept-Encoding: gzip
The rex I am trying in Splunk is rex field=u "l-m-n: (?<l-m-n>.*)". Could you please guide me here?

rex cannot extract into a field name with hyphens. However, you can solve this with rename:
| rex field=u "l-m-n: (?<lmn>.*)" | rename lmn AS "l-m-n"
In general, I would avoid hyphens in field names, as a hyphen can be mistaken for a minus sign. If you want to use the field l-m-n, you will need to quote it everywhere, like 'l-m-n'. I would strongly suggest you stick with the field name lmn.
Try running the following to see what I mean
| makeresults | eval l-m-n=10 | eval l=1 | eval m=1 | eval n=1 | eval result_noquote=l-m-n | eval result_quoted='l-m-n'
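The point of that example is that result_noquote is evaluated as the arithmetic l minus m minus n (giving -1), while the single-quoted result_quoted refers to the field named l-m-n (giving 10).
For completeness, here is a minimal end-to-end sketch against a synthetic event; the field name u and the sample header value are taken from your question:
| makeresults
| eval u="l-m-n: txxxmnoltr"
| rex field=u "l-m-n: (?<lmn>.*)"
| rename lmn AS "l-m-n"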

Related

Extract a field from nested json in a splunk query

This is the data I have:
{ "a":"1",
"b":2,
"c": { "x":"3", "y":"4",}
}
Let's suppose I have tons of events in that format. What I want to do is write a query that extracts only the "x" values from all events. I don't want anything else returned, just the "x"s.
I've tried multiple examples and went through pages of documentation, yet I still did not succeed; there must be something I'm missing. Please advise.
It would help to know what you've tried so far and how those attempts failed to meet expectations.
Have you tried the rex command?
| rex max_match=0 "\\\"x\\\":\\\"(?<x>[^\\\"]+)"
The forest of backslashes is needed to escape the embedded quotation marks through multiple parsers.
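If you are typing the search directly into the search bar rather than embedding it in dashboard XML, a single level of escaping is enough. A minimal self-contained test using the sample event from the question might look like this:
| makeresults
| eval _raw="{ \"a\":\"1\", \"b\":2, \"c\": { \"x\":\"3\", \"y\":\"4\"} }"
| rex max_match=0 "\"x\":\"(?<x>[^\"]+)"
| table x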
This query works for me. Note that the trailing comma after the "y" value has been removed, because Splunk will produce unexpected results if the JSON is not valid.
| makeresults
| eval _raw="{ \"a\":\"1\",
\"b\":2,
\"c\": { \"x\":\"3\", \"y\":\"4\"}
}"
| spath output=foo path=c.x
| table foo
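Against your real events, where _raw already contains the JSON, the same idea reduces to appending the spath to your base search, for example (the base search here is a placeholder):
<your base search> | spath output=x path=c.x | table x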

Splunk Rex: Extracting fields of a string to a value

I'm a newbie to Splunk, trying to build some dashboards, and I need help extracting fields from a particular variable.
In my case I want to extract only the KB_List":"KB000119050,KB000119026,KB000119036" values into a column.
Expected output:
KB_List
KB000119050,KB000119026,KB000119036
I have tried:
| rex field=_raw "\*"KB_List":(?<KB_List>\d+)\*"
The relevant part is in the log excerpt below:
svc_log_ERROR","Impact":4.0,"CategoryId":"94296c474f356a0009019ffd0210c738","hasKBList":"true","lastNumOfAlerts":1,"splunkURL":false,"impactedInstances":"","highestSeverity":"Minor","Source":"hsym-plyfss01","reqEmail":"true","AlertGroup":"TIBCOP","reqPage":"","KB_List":"KB000119050,KB000119026,KB000119036","reqTicket":"true","autoTicket":true,"SupportGroup":"TESTPP","Environment":"UAT","Urgency":4.0,"AssetId":"AST000000000159689","LiveSupportGroup":"TESTPP","sentPageTo":"TESTPP"},"Notification":{"":{"requestId":"532938335"}},"":
rex field=_raw "KB_List\":\"(?<KB_List>[^\"]+)\""
This regular expression looks for anything that begins with KB_List":", then captures everything up to the next ".
In your example you are only capturing digits (\d+), whereas the contents of the KB_List field also contain other characters ("KB" and ",").
At last, I figured it out after looking through many articles:
| rex "KB_List\":\"(?<KB_Listed>[^\"]+)" | table KB_Listed

Using Named Group Capture With rex In a Splunk Dashboard Query?

While trying to use rex as part of a Splunk search, I have a regular expression that works fine:
eventtype=my_type | rex field=_raw ".*\[(?<foo>.*?)\].*" | table _time, foo
But when I try to save the search into a dashboard table I get the following error:
Error parsing XML on line 29: Premature end of data in tag form line 1
I know my query is fine because when I click the "Run Search" button while adding it to the dashboard table I get a valid result. But when I click the save button I get the above error.
I suspect the named group capture within the regular expression is throwing off the XML parser.
How do I use a rex regular expression with name capture as part of a dashboard query?
Thank you in advance for your consideration and response.
To use named group capture inside the dashboard's XML you have to replace the angle brackets with the entities &lt; and &gt;:
... | rex field=_raw ".*\[(?&lt;foo&gt;.*?)\].*" | ...
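For example, inside the dashboard's Simple XML the saved search would look roughly like this (the surrounding panel elements are omitted, and the exact element names depend on your dashboard version):
<search>
  <query>eventtype=my_type | rex field=_raw ".*\[(?&lt;foo&gt;.*?)\].*" | table _time, foo</query>
</search>
Wrapping the query in a CDATA section is another commonly used way to avoid hand-escaping the angle brackets.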

The stats command isn't returning any results?

I have the following query:
search (...) AND ERROR
| rex field=error "^.*(?<vcbn>Value cannot be null.)$"
| stats count(vcbn) by error
but for whatever reason the stats count(vcbn) by error isn't generating any results.
Additionally, the rex field=error "^.*(?<vcbn>Value cannot be null.)$" isn't building a new field in the list on the left of the event search results.
The search itself returns 170 events.
Splunk Version: 4.3.3
It looks like the rex command is not able to extract at search time. Can you provide a sample _raw log event, or the 'error' field from a log event?
Also refer to:
http://docs.splunk.com/Documentation/Splunk/6.0.1/SearchReference/Rex
So after a good bit of research, I found a solution. The first problem was that I misunderstood the field parameter of the rex command: it tells the parser which field to search through. The next thing I had to do was use the line anchors ^ and $. Finally, I added the trailing .* so that the expression would match through the entire _raw field.
rex "^.*(?<vcbn>Value cannot be null).*$"
| stats count(vcbn)
NOTE: the _raw field is built in.
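Putting it together with the original search, the working version is something along these lines (since rex defaults to _raw, the explicit field parameter can be dropped):
search (...) AND ERROR
| rex "^.*(?<vcbn>Value cannot be null).*$"
| stats count(vcbn)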

Data between quotes and field separator

In the example given below, the last line is not uploaded. I get an error:
Data between close double quote (") and field separator:
This looks like a bug, since all the data between pipe symbols should be treated as a single field.
Schema: one:string,two:string,three:string,four:string
Upload file:
This | is | test only | to check quotes
second | line | "with quotes" | no text
third line | with | "start quote" and | a word after quotes
The first and second lines above are processed, but not the third.
Update:
Can someone please explain why all of the following lines work except the third?
This | is | test only | to check quotes
second | line | "with quotes" | no text
third line | with | "start quote" and | a word after quotes
forth line | enclosed | {"GPRS","MCC_DETECTED":false,"MNC_DETECTED":false} | how does this work?
fifth line | with | {"start quote"} and | a word after quotes
There may be some fancy explanation for this, but from the end-user perspective it is absurd.
From the CSV RFC4180 page: "If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote."
You probably want to do this:
This | is | test only | to check quotes
second | line | "with quotes" | no text
third line | with | " ""start quote"" and " | a word after quotes
More about our CSV input format here.
Using --quote worked perfectly.
bq load --source_format CSV --quote "" --field_delimiter \t --max_bad_records 10 -E UTF-8 <destination table> <source files>
API V2
https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.quote
bq command
--quote: Quote character to use to enclose records. Default is ". To indicate no quote character at all, use an empty string.
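For the pipe-delimited sample in the question, a sketch of the same idea (the dataset, table, and file names here are made up) would be:
bq load --source_format CSV --quote "" --field_delimiter "|" -E UTF-8 mydataset.mytable ./upload.txt one:string,two:string,three:string,four:string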
Try this as an alternative:
Load the MySQL backup files into a Cloud SQL instance.
Read the data in BigQuery straight out of MySQL.
Longer how-to:
https://medium.com/google-cloud/loading-mysql-backup-files-into-bigquery-straight-from-cloud-sql-d40a98281229
You can also use other flags while uploading the data. I used the bq tool with the following flags:
bq load -F , --source_format CSV --skip_leading_rows 1 --max_bad_records 1 --format csv -E UTF-8 yourdataset gs://datalocation.
Try loading with the bq shell every time.
I had to load 1100 columns. When trying with the console, even with all the error options set, it threw a lot of errors, and ignoring the errors in the console means losing records.
Hence I tried with the shell and succeeded in loading all the records.
Try the following:
bq load --source_format CSV --quote "" --field_delimiter \t --allow_jagged_rows --ignore_unknown_values --allow_quoted_newlines --max_bad_records 10 -E UTF-8 {dataset_name}.{table_name} gs://{google_cloud_storage_location}/* {col_1}:{data_type1},{col_2}:{data_type2}, ....
References:
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#bigquery_load_table_gcs_csv-cli
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#csv-options