Sumo Logic: Count every matching string within a field

I have a parsed field and I need to count the number of times a given string appears within it. It seems relatively simple, but I've been searching through Sumo documentation and now I'm not sure this is even possible. Please help!

I have an idea for a hacky solution using replace() regex variant.
If inputField is your input field and you want to count the number of times "is" occurs in inputField, then
| "This is a very hacky solution which might get you in trouble" as inputField
| replace(inputField, /is/, "#") as matched
| replace(matched, /[^#]/, "") as skipTheRest
| length(skipTheRest) as finalCount
The solution assumes # is not present in the input field.
Disclaimer: I am currently employed by Sumo Logic.
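To check the replace trick outside Sumo Logic, here is a minimal Python sketch of the same two-step idea. The function name count_by_replace and the marker parameter are illustrative only and are not part of any Sumo API. Like the query, it assumes the marker character never occurs in the input.

```python
import re

def count_by_replace(text, pattern, marker="#"):
    """Count regex matches by turning each into a marker, then deleting everything else."""
    if marker in text:
        raise ValueError("marker must not occur in the input")
    # Step 1: every match of the pattern becomes a single marker character.
    marked = re.sub(pattern, marker, text)
    # Step 2: remove every character that is not the marker.
    only_markers = re.sub("[^" + re.escape(marker) + "]", "", marked)
    # Step 3: the remaining length is the number of matches.
    return len(only_markers)

sample = "This is a very hacky solution which might get you in trouble"
print(count_by_replace(sample, "is"))  # prints 2
```

The two matches are the "is" inside "This" and the standalone word "is".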

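If I understand the question correctly, we have a field A which we have parsed, and now we want to check whether it contains some string s. Note that this counts matching records, not repeated occurrences within a single value.
In that case, the lines below can be appended to your query.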
| if(A matches "*s*", 1, 0) as ct
| sum(ct)

Related

Regex match first number if it does not appear at the end

I am currently facing a Regex problem which apparently I cannot find an answer to.
My Regex is embedded in a teradata SQL of the form:
REGEXP_SUBSTR(column, 'regex_pattern')
I want to find the first appearance of any number except if it appears at the end of the string.
For Example:
"YEL2X30" -> "2"
"YEL19XYZ05" -> "19"
"YELLOW05" -> ""
I tried it with '[0-9]+(?!$)/' but this always returns a blank string.
Thanks in Advance!
This is a shot in the dark, since I'm unfamiliar with Teradata and the SQL functionality it supports. However, reading the docs on the REGEXP_SUBSTR() function, it seems you may want to use the optional 3rd and 4th arguments along with a slightly different regular expression:
[0-9]+(?![0-9]|$)
Meaning: one or more digits that are not followed by either the end of the string or another digit.
I believe the following syntax should now retrieve the 1st appearance of any number from the matching results:
REGEXP_SUBSTR(column, '[0-9]+(?![0-9]|$)', 1, 1)
The 3rd parameter states the position in the source string at which to start searching, whereas the 4th selects which occurrence to return when there are multiple matches, here the 1st (that is how I read the docs). For example: abc123def456ghi789 would return 123.
Fiddling around in online IDEs gave me this:
CREATE TABLE TBL (TST varchar(100));
INSERT INTO TBL values ('YEL2X30'), ('YEL19XYZ05'), ('YELLOW05'), ('abc123def456ghi789');
SELECT REGEXP_SUBSTR(TST, '[0-9]+(?![0-9]|$)', 1, 1) as 'RESULTS' FROM TBL;
Resulted in:
RESULTS
2
19
NULL
123
NOTE: I also noticed that leaving out the 3rd and 4th parameters made no difference, since both default to 1 when they are omitted. I tested this over here.
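The lookahead pattern can also be sanity-checked outside the database. The sketch below uses Python's re module, which is an assumption made for illustration: Teradata's regex engine may differ in details. The helper name first_inner_number is made up here.

```python
import re

# Same pattern as in the answer: a digit run not followed by another digit or the end of the string.
PATTERN = re.compile(r"[0-9]+(?![0-9]|$)")

def first_inner_number(value):
    """Return the first digit run that does not sit at the end of the string, or None."""
    match = PATTERN.search(value)
    return match.group(0) if match else None

for sample in ["YEL2X30", "YEL19XYZ05", "YELLOW05", "abc123def456ghi789"]:
    print(sample, "->", first_inner_number(sample))
```

The [0-9] alternative inside the lookahead is what stops the engine from backtracking to a shorter digit run, such as the 0 in 05.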
Possibly the simplest way is to look for digits followed by a non-digit. Then keep all the digits:
regexp_substr(regexp_substr(column, '[0-9]+[^0-9]'), '[0-9]+')

SSRS if field value in list

I've looked through a number of tutorials and asks, and haven't found a working solution to my problem.
Suppose my dataset has two columns: sort_order and field_value. sort_order is an integer and field_value is a numerical (10,2).
I want to format some rows as #,#0 and others as #,#0.00.
Normally I would just do
iif( fields!sort_order.value = 1 or fields!sort_order.value = 23 or .....
unfortunately, the list is fairly long.
I'd like to do the equivalent of if fields!sort_order.value in (1,2,21,63,78,...) then...
As recommended in another post, I tried the following (if sort in list, then just output a 0, else a 1. this is just to test the functionality of the IN operator):
=iif( fields!sort_order.Value IN split("1,2,3,4,5,6,8,10,11,15,16,17,18,19,20,21,26,30,31,33,34,36,37,38,41,42,44,45,46,49,50,52,53,54,57,58,59,62,63,64,67,68,70,71,75,76,77,80,81,82,92,98,99,113,115,116,120,122,123,127,130,134,136,137,143,144,146,147,148,149,154,155,156,157,162,163,164,165,170,171,172,173,183,184,185,186,192,193,194,195,201,202,203,204,210,211,212,213,263",","),0,1)
However, it doesn't look like the SSRS expression editor wants to accept the "IN" operator. Which is strange, because all the examples I've found that solve this problem use the IN operator.
Any advice?
Try using IndexOf function:
=IIF(Array.IndexOf(split("1,2,3,4,...",","),fields!sort_order.Value)>-1,0,1)
Note that the whole list of values must be inside quotation marks, as a single string.
Also consider the recommendation of @Jakub. I recommend this solution if you are feeding your report via a stored procedure and you can't touch it.
Let me know if this helps.
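The membership test behind the IndexOf expression can be sketched in Python as follows. This only illustrates the logic and is not SSRS code. The shortened list and the choice of which rows get the whole-number format are assumptions, since the question does not say which group the list represents.

```python
# Illustrative subset of the sort orders; assumed here to be the rows shown without decimals.
WHOLE_NUMBER_ORDERS = {int(item) for item in "1,2,3,4,5,6,8,10".split(",")}

def format_value(sort_order, field_value):
    """Pick the display format from list membership, like the IIF/IndexOf expression."""
    if sort_order in WHOLE_NUMBER_ORDERS:
        return f"{field_value:,.0f}"   # roughly the #,#0 format
    return f"{field_value:,.2f}"       # roughly the #,#0.00 format

print(format_value(1, 1234.56))
print(format_value(7, 1234.56))
```

Splitting a string yields strings, so the sketch converts each item to int before comparing it with the integer sort order. A similar type mismatch is worth checking for in the report expression.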

Using SQL - how do I match an exact number of characters?

My task is to validate existing data in an MSSQL database. I've got some SQL experience, but not enough, apparently. We have a zip code field that must be either 5 or 9 digits (US zip). What we are finding in the zip field are embedded spaces and other oddities that will be prevented in the future. I've searched enough to find the references for LIKE that leave me with this "novice approach":
ZIP NOT LIKE '[0-9][0-9][0-9][0-9][0-9]'
AND ZIP NOT LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
Is this really what I must code? Is there nothing similar to...?
ZIP NOT LIKE '[\d]{5}' AND ZIP NOT LIKE '[\d]{9}'
I will loathe validating longer fields! I suppose, ultimately, both code sequences will be equally efficient (or should be).
Thanks for your help
Unfortunately, LIKE is not regex-compatible, so there is nothing like \d. However, combining a length function with a numeric function may provide an acceptable result:
WHERE ISNUMERIC(ZIP) <> 1 OR LEN(ZIP) NOT IN(5,9)
I would, however, not recommend it, because ISNUMERIC will return 1 for a +, - or valid currency symbol. The minus sign in particular may be prevalent in the data set, so I'd still favor your "novice" approach.
Another approach is to use:
ZIP LIKE '%[^0-9]%' OR LEN(ZIP) NOT IN(5,9)
which will find any row where zip contains a character that is not 0-9 (i.e. only 0-9 is allowed) or where the length is not 5 or 9.
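The rule these answers aim for (digits only, length 5 or 9) can be written down compactly as a reference when testing the SQL. The Python sketch below is only such a reference, not a T-SQL replacement, and the function names are made up for illustration.

```python
import re

# Exactly 5 or exactly 9 ASCII digits, nothing else.
ZIP_PATTERN = re.compile(r"[0-9]{5}|[0-9]{9}")

def is_valid_zip(zip_code):
    """Regex form of the rule."""
    return ZIP_PATTERN.fullmatch(zip_code) is not None

def is_valid_zip_by_length(zip_code):
    """Same rule phrased like the SQL: no non-digit character and a length of 5 or 9."""
    has_non_digit = re.search(r"[^0-9]", zip_code) is not None
    return not has_non_digit and len(zip_code) in (5, 9)

for candidate in ["12345", "123456789", "12 345", "-1234", "1234567"]:
    print(repr(candidate), is_valid_zip(candidate), is_valid_zip_by_length(candidate))
```

Both functions should agree on every input. Rows that the SQL flags but these functions accept, or the reverse, point to a mistake in the WHERE clause.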
There are a few ways you could achieve that.
You can replace [0-9] with _ (which matches any single character, so it checks length only, not digits), like
ZIP NOT LIKE '_'
Or use LEN(), so it's like
LEN(ZIP) NOT IN(5,9)
You are looking for a length function (LENGTH() in some databases, LEN() in SQL Server):
select * from table WHERE LEN(ZIP)=5;
select * from table WHERE LEN(ZIP)=9;
To test for non-numeric values you can use ISNUMERIC():
WHERE ISNUMERIC(ZIP) <> 1

Understanding OpenERP Domain Filter?

I would like to ask if you could please explain the anatomy of OpenERP domain filters. I have to use them in my project.
Please explain the description of the following domain filter.
['|',('order_id.user_id','=',user.id),('order_id.user_id','=',False)]
I want to know the exact meaning of ('order_id.user_id','=',user.id): what are order_id, user_id, and user.id? Are they referencing any table? If yes, then how am I supposed to know which one...
Basically I want to decipher the notation from the bottom up so that I can use it as per my requirement.
This one is pretty simple.
Consider the following fields (I've only given the XML here; the Python you'll have to manage yourself)
<field name="a"/>
<field name="b"/>
<field name="c"/>
Single Condition
Consider some simple conditions in programming
if a = 5 # where a is the variable and 5 is the value
In Open ERP domain filter it would be written this way
[('a','=',5)] # where a should be a field in the model and 5 will be the value
So the syntax we derive is
('field_name', 'operator', value)
Now let's try to apply another field in place of static value 5
[('a','=',b)] # where a and b should be the fields in the model
In the above, note that the first variable a is enclosed in single quotes whereas the value b is not. The variable to be compared always comes first and is enclosed in single quotes, and the value is just the field name. But if you want to compare variable a with the literal value 'b', you have to do the below
[('a','=','b')] # where only a is the field name and b is the value (field b's value will not be taken for comparison in this case)
Condition AND
In Programming
if a = 5 and b = 10
In Open ERP domain filter
[('a','=',5),('b','=',10)]
Note that if you don't specify an operator at the beginning, the AND condition is applied. If you want to replace the static values, you can simply remove the 5 and give the field name (strictly without quotes)
[('a','=',c),('b','=',c)]
Condition OR
In Programming
if a = 5 or b = 10
In Open ERP domain filter
['|',('a','=',5),('b','=',10)]
Note that the leading '|' makes this an OR condition; a plain comma-separated list is an AND condition. If you want to compare against fields, you can simply remove the 5 and give the field name (strictly without quotes)
Multiple Conditions
In Programming
if a = 5 or (b != 10 and c = 12)
In Open ERP domain filter
['|',('a','=',5),'&',('b','!=',10),('c','=',12)]
Also this post from Arya will be greatly helpful to you. Cheers!!
The '|' is an OR that gets applied to the next two comparisons. The (..., '=', False) gets converted into an IS NULL, so the SQL for this would be
WHERE order_id.user_id = x OR order_id.user_id is NULL
The default is AND, which is why you don't see ['&', ('field1', '=', 1), ('field2', '=', 2)] everywhere.
Note that another useful one is ('field1', '!=', False) which gets converted to WHERE field1 IS NOT NULL
There isn't a lot of great documentation for this and they get quite tricky with multiple operators as you have to work through the tuples in reverse consuming the operators. I find I use complex ones infrequently enough that I just turn on query logging in Postgres and use trial and error observing the generated queries until I get it right.
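One way to get comfortable with the prefix notation is to evaluate a domain by hand. The sketch below is a simplified, hypothetical evaluator written for illustration. It is not OpenERP's implementation, it supports only a handful of operators, and it treats a record as a plain dict. The names matches and COMPARATORS are made up here.

```python
import operator

# Small, assumed subset of comparison operators for this sketch.
COMPARATORS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}

def matches(domain, record):
    """Evaluate a prefix-notation domain (a list of operators and tuples) against a dict."""
    pos = 0

    def parse():
        nonlocal pos
        token = domain[pos]
        pos += 1
        if token == "|":                      # OR consumes the next two terms
            left, right = parse(), parse()
            return left or right
        if token == "&":                      # explicit AND consumes the next two terms
            left, right = parse(), parse()
            return left and right
        if token == "!":                      # NOT consumes the next term
            return not parse()
        field, op, value = token              # a leaf: (field_name, operator, value)
        return COMPARATORS[op](record.get(field), value)

    result = True
    while pos < len(domain):                  # top-level terms are implicitly ANDed
        result = parse() and result
    return result

# if a = 5 or (b != 10 and c = 12)
domain = ["|", ("a", "=", 5), "&", ("b", "!=", 10), ("c", "=", 12)]
print(matches(domain, {"a": 1, "b": 3, "c": 12}))
print(matches(domain, {"a": 1, "b": 10, "c": 12}))
```

Each operator sits in front of the terms it combines, which is why the operators are separate list elements rather than wrappers around nested tuples.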

regex skip prefix if it's there, like country code of phone number

I need a regex to get phone numbers without country codes, if they're there. It seems like lookarounds should work for this, but I can't quite get to the final solution. Here are the subjects:
In-country: 0008003428573
Outside: +91 4058 825058
With dots: +91.88.4732.1354
The desired matches are:
8003428573
4058 825058
88.4732.1354
I know I can use (?!91) to avoid matching 91 such as
(?!91)[1-9][-. 0-9]{8,11}[0-9](?![0-9])
...but then it matches the 1 like 1 4058 825058.
I also found a complete solution using an if-then condition while testing in Perl:
(?!91)(?(?=1)(?<!9)|)[1-9][-. 0-9]{8,11}[0-9](?![0-9])
but then found out it doesn't work with NSRegularExpression in Objective C.
The solution cannot use groups, since I have multiple regexes for different situations that are processed by the same code. The code can't use group 1 in some cases and group 2 in others, unless there's no way to solve this with regular expressions. The 91 must not be in the overall match.
Is there a way to do this with a regex in Obj-C?
Here is a solution using groups. Note that the result is always in group 1, since the other groups are non-capturing: a group that starts with ?: is a non-capturing group.
(?:\b0+|\B\+91[. ]?)([1-9]\d+(?:[. ]\d+)*)\b
See it here on Regexr. When you hover the mouse over the match you can see the content of the only group.
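The group-based pattern can be tried against the three subjects from the question. The sketch below uses Python's re module as a stand-in, which is an assumption made for illustration: NSRegularExpression is built on a different engine, so the behaviour should be re-checked there. The helper name strip_prefix is made up.

```python
import re

# Non-capturing prefix (leading zeros or +91 with optional separator), then the number in group 1.
PHONE = re.compile(r"(?:\b0+|\B\+91[. ]?)([1-9]\d+(?:[. ]\d+)*)\b")

def strip_prefix(subject):
    """Return the phone number without its prefix, or None if nothing matches."""
    match = PHONE.search(subject)
    return match.group(1) if match else None

for subject in ["0008003428573", "+91 4058 825058", "+91.88.4732.1354"]:
    print(subject, "->", strip_prefix(subject))
```

Because the prefix alternatives are non-capturing, calling code can always read group 1 regardless of which prefix was present.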