A regex in SQL or Spark (Scala) - sql

I am a new developer in Spark Scala. I am not familiar with regex, but I want to write a regex that can extract an ID like this:
abcd_mss5884_mww020_025_b => mss5884
abv_c_e_mss478_mww171_172 => mss478
abv_c_e_mww171_172 => otherwise, return THE SAME input string
So, from the input string, I should return the characters starting at "mss..." and stop at the first "_" after the "mss", of course (the other underscores should be ignored).
How can I do this, please?
Should I use a regex? A regex in SQL or in Scala?
Or should I just use a simple substring method?

Simply use the regexp_extract function. Something like this:
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("abcd_mss5884_mww020_025_b", "abv_c_e_mss478_mww171_172", "abv_c_e_mww171_172").toDF("input")

df.withColumn("ID", regexp_extract($"input", "^(.*)(mss[^_]+)_(.*)$", 2))
  .withColumn("ID", when($"ID" =!= "", $"ID").otherwise($"input"))
  .show(false)
+-------------------------+------------------+
|input |ID |
+-------------------------+------------------+
|abcd_mss5884_mww020_025_b|mss5884 |
|abv_c_e_mss478_mww171_172|mss478 |
|abv_c_e_mww171_172 |abv_c_e_mww171_172|
+-------------------------+------------------+
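Since the question also asks whether to write the regex in SQL or in Scala: the same regexp_extract function is available on the Spark SQL side, so you can do it there as well. A minimal sketch, assuming the DataFrame above is registered as a temp view named t (the view name is just chosen for illustration):
df.createOrReplaceTempView("t")
spark.sql("""
  SELECT input,
         CASE WHEN regexp_extract(input, 'mss[^_]+', 0) != ''
              THEN regexp_extract(input, 'mss[^_]+', 0)
              ELSE input
         END AS ID
  FROM t
""").show(false)
Either form gives the same result; the simpler pattern 'mss[^_]+' just grabs "mss" plus everything up to (but not including) the next underscore.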


Pattern match using regexp_extract_all

I am trying to build an array from this string and need help with the pattern for regexp_extract_all.
Here is my input string, which contains keyword/value pairs:
BEGIN
DECLARE p_JSON STRING DEFAULT """
{
"instances": [{
"LT_20MN_SalesContrctCnt": 388.0,
"Pyramid_Index": '',
"MARKET": "'Growth Markets','Europe'",
"SERVICE_DIM": "'S&C','F&M'",
"SG_MD": "'All Service Group'"
}]}
""";
SELECT split(x,":")[OFFSET(0)] as keyword, split(x,":")[OFFSET(1)] keyword_value
FROM unnest(split(REGEXP_REPLACE(JSON_EXTRACT(p_JSON, '$.instances'),r'([\'\"\[\]{}])', ''))) as x
END;
The above SQL is failing at SPLIT due to the "," within the data.
All I am trying to do here is build two columns, keyword and value.
The idea here is that if I can extract each row using REGEXP_EXTRACT_ALL without the trailing "," then I should be able to split into keyword and keyword_value columns. By the way, the names and number of keywords/values are not fixed.
Intended output from REGEXP_EXTRACT_ALL:
"LT_20MN_SalesContrctCnt": 388.0
"Pyramid_Index": ''
"MARKET": "'Growth Markets','Europe'"
"SERVICE_DIM": "'S&C','F&M'"
"SG_MD": "'All Service Group'"
Appreciate if you can suggest a better way to handle this.
Thanks in advance.
Using your sample data, I just added an extra REGEXP_REPLACE to replace ," with #" so we can avoid splitting on the commas inside the values. See the approach below:
SELECT
  SPLIT(arr, ":")[OFFSET(0)] AS keyword,
  SPLIT(arr, ":")[OFFSET(1)] AS keyword_value
FROM sample_data,
UNNEST(SPLIT(REGEXP_REPLACE(REGEXP_REPLACE(JSON_EXTRACT(p_JSON, '$.instances'), r'[\[\]{}]', ''), r',"', '#"'), '#')) arr
Output: (result table screenshot omitted)
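To see what that delimiter swap is doing outside of BigQuery, here is a minimal plain-Scala sketch of the same idea (illustration only, on a hypothetical raw string; not the BigQuery API): strip the structural characters, rewrite the pair separator ," to #", then split twice.
// Illustration only: mimic the REGEXP_REPLACE + SPLIT approach on a sample string.
val raw = """{"LT_20MN_SalesContrctCnt": 388.0,"Pyramid_Index": '',"MARKET": "'Growth Markets','Europe'"}"""
val pairs = raw
  .replaceAll("""[\[\]{}]""", "")   // drop [ ] { }
  .replaceAll(",\"", "#\"")         // only the comma separating pairs is followed by a double quote
  .split("#")

pairs.foreach { p =>
  val Array(keyword, value) = p.split(":", 2)  // split on the first ':' only
  println(s"${keyword.trim} -> ${value.trim}")
}
The commas inside values like "'Growth Markets','Europe'" are not followed by a double quote, so they survive the swap, which is exactly why the trick avoids the bad split.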

How to use Splunk functions in the query

Does anyone here know how I can use built-in functions (case) in a Splunk query? All the examples I found handle the query results (I cannot put it after eval or |).
I need something like:
index=case(indexVar == "qa", "qa-all", indexVar == "prod", "prod-all") sourcetype="kube:container:rail-service"
Note: I cannot just concatenate indexVar + "-all".
The case function may be built-in, but that doesn't mean you can use it anywhere. It's only valid with the eval, fieldformat, and where commands.
A workaround would be to put the eval in a subsearch.
sourcetype="kube:container:rail-service" [
| makeresults
| eval index=case(indexVar == "qa", "qa-all", indexVar == "prod", "prod-all")
| fields index ]

How can we write a Splunk query to check whether subField2 is present and, if present, get the counts of all subField2 values

{
index:"myIndex",
field1: "myfield1",
field2: {"subField1":"mySubField1","subField2":145,"subField3":500},
...
..
.
}
SPL : index:"myIndex" eval result = if(field.subField2) .....
Does the dot operator work in SPL?
I am assuming your data is in JSON format. If so, you can use spath to extract fields from your structured data, then just check whether the field is present with isnotnull:
index="myIndex" | spath | where isnotnull('field2.subField2')
Presuming your data is in JSON format, this should do it:
index=myIndex sourcetype=srctp field2{}.subField2=*
If those are multivalue fields, you'll need to do an mvexpand first

Neo4j - case-insensitive Lucene query [duplicate]

Is it possible to run a case-insensitive Cypher query on Neo4j?
Try this: http://console.neo4j.org/
When I type in this:
start n=node(*)
match n-[]->m
where (m.name="Neo")
return m
it returns one row. But when I type in this:
start n=node(*)
match n-[]->m
where (m.name="neo")
return m
it does not return anything, because the name is saved as "Neo". Is there a simple way to run case-insensitive queries?
Yes, by using a case-insensitive regular expression:
WHERE m.name =~ '(?i)neo'
https://neo4j.com/docs/cypher-manual/current/clauses/where/#case-insensitive-regular-expressions
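For reference, (?i) is the standard Java-regex inline flag for case-insensitive matching, and Cypher's =~ operator uses Java regular expressions. A quick illustration in plain Scala (not Cypher), just to show what the flag does:
// The (?i) prefix makes the whole pattern case-insensitive.
val pattern = "(?i)neo".r
println(pattern.findFirstIn("Neo"))   // Some(Neo)
println(pattern.findFirstIn("NEO"))   // Some(NEO)
println(pattern.findFirstIn("nemo"))  // None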
Another way would be:
WHERE LOWER(m.Name) = LOWER("Neo")
And if you are using the Neo4j Client (.NET):
Client.Cypher.Match("(m:Entity)")
.Where("LOWER(m.Name) = LOWER({name})")
.WithParam("name", inputName)
.Return(m => m.As<Entity>())
.Results
.FirstOrDefault();
If anybody is looking for how to do this with a parameter, I managed to do it like this:
query = "{}{}{}".format('Match (n) WHERE n.pageName =~ "'"(?i)", name, '" RETURN n')
and "name" is the variable or your parameter
You can pass a parameter to the case-insensitive regular expression like:
WHERE m.name =~ ('(?i)' + {param})

LINQ match word with boundaries

Say I have an nvarchar field in my database that looks like this:
1, "abc abccc dabc"
2, "abccc dabc"
3, "abccc abc dabc"
I need a LINQ select query that matches the word "abc" with word boundaries, not as part of a longer string.
In this case only rows 1 and 3 would match.
from row in table.AsEnumerable()
where row.Foo.Split(new char[] {' ', '\t'}, StringSplitOptions.None)
.Contains("abc")
select row
It's important to include the call to AsEnumerable, which means the query is executed on the client side; otherwise (I'm pretty sure) the where clause won't get converted into SQL successfully.
Maybe a regular expression like this (nb - not compiled or tested):
var matches = from a in yourCollection
              where Regex.IsMatch(a.field, @"(^|\s)abc(\s|$)")
              select a;
datacontext.Table.Where(
    e => Regex.IsMatch(e.field, @"(.*?[\s\t]|^)abc([\s\t].*?|$)")
);
or
datacontext.Table.Where(
    e => e.field.Split(' ', '\t').Contains("abc")
);
For efficiency, you want to do as much of the filtering as possible on the server, and then the rest of the filtering on the client. You can't use Regex on the server (SQL Server doesn't support it), so the solution is to first use a LIKE-type search (by calling .Contains) and then use Regex on the client to further refine the results:
db.MyTable
  .Where(t => t.MyField.Contains("abc"))
  .AsEnumerable() // Executes locally from this point on
  .Where(t => Regex.IsMatch(t.MyField, @"\babc\b"))
This ensures that you retrieve only the rows from SQL Server that contain the letters 'abc' (regardless of whether they're a word-boundary match or not) and use Regex on the client side to further restrict the result set so that only matches on word boundaries are included.