Redis SCAN matching - redis

If I have the following keys in my Redis:
Person:1234
Person:1234:email
Person:name:John
Person:lastName:Doe
I am trying to only MATCH the first key using SCAN.
I have tried with Person:*, but that of course returns all of them. Person:[^:]* or similar attempts did not work, I tried to match everything excluding the :.
Could someone point me in the right direction?

Redis scan match only support glob style matching. It cannot do regex matching. In order to achieve your goal, you have two options:
Scan all keys and do matching on client side.
Use Lua script to do the scan and matching. You can try the following one-liner as an example:
redis-cli eval 'local res = redis.call("scan", ARGV[1]); local matches = {}; for i,v in ipairs(res[2]) do if v == string.match(v, ARGV[2]) then matches[#matches+1] = v end end res[2] = matches; return res' 0 cursor-starting-from-0 'Person:[^:]*'
This one-liner returns results exactly like the built-in scan command. I'm not a Lua expert, and the code is not fully tested.
Also, Lua's matching is NOT regex matching, although it can solve most problems. You need to take Lua's reference to check if it matches your case.

Related

How can i query for specific matching substrings while ignoring the rest in a SCAN query?

trying to do a redis SCAN command and trying to figure out how to do glob-pattern substring matching for words instead of single characters (using ruby redis gem)
redis.set("first:url:123", "val1")
redis.set("second:url:123", "val2")
redis.set("third:url:123", "val3")
redis.set("fourth:url:123", "val4")
cursor = 0
pattern = "[first,second]:url:*" ## I only want the first and second keys
redis.scan(cursor, match: pattern)
# => ...
--
according to the docs here i found these available options but it looks like it only works for single characters, how can i use it for words?
h[ae]llo matches hello and hallo, but not hillo
Edit:
https://globster.xyz/
makes me think that using {first,second}:url:123 should work, but that doesnt seem to work either
Redis doesn't support regex expressions for key name patterns, only glob-like expressions.
You can revert to an EVAL Lua script if you are amendment about it. Here's one that does it, but do read the comments: https://gist.github.com/itamarhaber/19c8393f465b62c9cfa8

How to write a redis scan command to match either of two patterns

I want to use HSCAN match clause to match a key which is of type 1 or type 2. A regex would be like ^match1|^match2. Is it possible to do this in glob style pattern.
Redis does not offer a straight forward way to match multiple patterns.
Redis matchs using glob-style pattern which is very limited.
Supported glob-style patterns:
h?llo matches hello, hallo and hxllo
h*llo matches hllo and heeeello
h[ae]llo matches hello and hallo, but not hillo
h[^e]llo matches hallo, hbllo, ... but not hello
h[a-b]llo matches hallo and hbllo
Use \ to escape special characters if you want to match them verbatim.
You can't use glob-style for that, but you can Lua your way around this. That means you can use
EVAL and a script (can be similar to https://github.com/itamarhaber/redis-lua-scripts/blob/master/scanregex.lua) for performing the HSCAN match1* first, and then filtering with Lua on match2.*.

regex not working correctly when the test is fine

For my database, I have a list of company numbers where some of them start with two letters. I have created a regex which should eliminate these from a query and according to my tests, it should. But when executed, the result still contains the numbers with letters.
Here is my regex, which I've tested on https://www.regexpal.com
([^A-Z+|a-z+].*)
I've tested it against numerous variations such as SC08093, ZC000191 and NI232312 which shouldn't match and don't in the tests, which is fine.
My sql query looks like;
SELECT companyNumber FROM company_data
WHERE companyNumber ~ '([^A-Z+|a-z+].*)' order by companyNumber desc
To summerise, strings like SC08093 should not match as they start with letters.
I've read through the documentation for postgres but I couldn't seem to find anything regarding this. I'm not sure what I'm missing here. Thanks.
The ~ '([^A-Z+|a-z+].*)' does not work because this is a [^A-Z+|a-z+].* regex matching operation that returns true even upon a partial match (regex matching operation does not require full string match, and thus the pattern can match anywhere in the string). [^A-Z+|a-z+].* matches a letter from A to Z, +,|or a letter fromatoz`, and then any amount of any zero or more chars, anywhere inside a string.
You may use
WHERE companyNumber NOT SIMILAR TO '[A-Za-z]{2}%'
See the online demo
Here, NOT SIMILAR TO returns the inverse result of the SIMILAR TO operation. This SIMILAR TO operator accepts patterns that are almost regex patterns, but are also like regular wildcard patterns. NOT SIMILAR TO '[A-Za-z]{2}%' means all records that start with two ASCII letters ([A-Za-z]{2}) and having anything after (%) are NOT returned and all others will be returned. Note that SIMILAR TO requires a full string match, same as LIKE.
Your pattern: [^A-Z+|a-z+].* means "a string where at least some characters are not A-Z" - to extend that to the whole string you would need to use an anchored regex as shown by S-Man (the group defined with (..) isn't really necessary btw)
I would probably use a regex that specifies want the valid pattern is and then use !~ instead.
where company !~ '^[0-9].*$'
^[0-9].*$ means "only consists of numbers" and the !~ means "does not match"
or
where not (company ~ '^[0-9].*$')
Not start with a letter could be done with
WHERE company ~ '^[^A-Za-z].*'
demo: db<>fiddle
The first ^ marks the beginning. The [^A-Za-z] says "no letter" (including small and capital letters).
Edit: Changed [A-z] into the more precise [A-Za-z] (Why is this regex allowing a caret?)

Issue with using wildcards with lucene .net QueryParser

I have following code for Lucene .Net search:
If I use query like:
AccountId:1 AND CompanyId:1 AND CreatedOn:[636288660000000000 TO 636315443990000000] AND AuditProperties.FriendlyName.NewValue:CustomerId|235
It works fine with exact match with CustomerId = 235.
However, if I try to search for a wildcard match like for example:
AccountId:1 AND CompanyId:1 AND CreatedOn:[636288660000000000 TO 636315443990000000] AND AuditProperties.FriendlyName.NewValue:CustomerId|*235*
it doesn't fetch me any results. I think it is still going for an exact match with value "*235*" Am I missing anything here?
Thanks!
As per the QueryParser syntax documentation, the character | is not supported. However, it is not very clear whether you intended it to be a logical OR or a literal character.
Logical OR
The correct syntax for logical OR is either CustomerId OR *235*, CustomerId *235* or CustomerId||*235*.
Also, if this is meant to be a logical OR, you have to allow for a leading wildcard character as pointed out in Howto perform a 'contains' search rather than 'starts with' using Lucene.Net.
parser.AllowLeadingWildcard = true;
Literal |
To search for a literal pipe character, you should escape the character so the parser doesn't confuse it with a logical OR.
CustomerId\|*235*

How can I extract field names from SQL with Perl?

I have a series of select statements in a text file and I need to extract the field names from each select query. This would be easy if some of the fields didn't use nested functions like to_char() etc.
Given select statement fields that could have several nested parenthese like:
ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name,
Or the simple case of just base_field_name as a field, what would the regex look like in Perl?
Don't try to write a regex parser (though perl regexes can handle nested patterns like that), use SQL::Statement::Structure.
Why not ask the target database itself how it would interpret the queries?
In perl, one can use the DBI to query the prepared representation of a SQL query. Sometimes this is database-specific: some drivers (under the perl DBD:: namespace) support their RDBMS' idea of describing statements in ways analogous to the RDBMS' native C or C++ API.
It can be done generically, however, as the DBI will put the names of result columns in the statement handle attribute NAME. The following, for example, has a good chance of working on any DBI-supported RDBMS:
use strict;
use warnings;
use DBI;
use constant DSN => 'dbi:YouHaveNotToldUs:dbname=we_do_not_know';
my $dbh = DBI->connect(DSN, ..., { RaiseError => 1 });
my $sth;
while (<>) {
next unless /^SELECT/i; # SELECTs only, assume whole query on one line
chomp;
my $sql = /\bWHERE\b/i ? "$_ AND 1=0" : "$_ WHERE 1=0"; # XXX ugly!
eval {
$sth = $dbh->prepare($sql); # some drivers don't know column names
$sth->execute(); # until after a successful execute()
};
print $#, next if $#; # oops, problem with that one
print join(', ', #{$sth->{NAME}}), "\n";
}
The XXX ugly! bit there tries to append an always-false condition on the SELECT, so that the SQL engine doesn't have to do any real work when you execute(). It's a terribly naive approach -- that /\bWHERE\b/i test is no more correctly identifying a SQL WHERE clause than simple regexes correctly parse out SELECT field names -- but it is likely to work.
In a somewhat related problem at the office I used:
my #SqlKeyWordList = qw/select from where .../; # (1)
my #Candidates =split(/\s/,$SqlSelectQuery); # (2)
my %FieldHash; # (3)
for my $Word (#Candidates) {
next if grep($word,#SqlKeyWordList);
$FieldHash($Word)++;
}
Comments:
SqlKeyWordList contains all the SQL keywords that are potentially in the SQL statement (we use MySQL, there are many SQL dialiects, choosing/building this list is work, look at my comments below!). If someone decided to use a keyword as a field name, you will need a regex after all (beter to refactor the code).
Split the SQL statement into a list of words, this is the trickiest part and WILL REQUIRE tweeking. For now it uses Perl notion of "space" (=not in word) to split. Splitting the field list (select a,b,c) and the "from" portion of the SQL might be advisabel here, depends on your SQL statements.
%MyFieldHash will contain one entry per select field (and gunk, until you validated your SqlKeyWorkList and the regex in (2)
Beware
there is nothing in this code that could not be done in Python.
your life would be much easier if you can influence the creation of said SQL statements. (e.g. make sure each field is written to a comment)
there are so many things that can/will go wrong in this parsing approach, you really should sidestep the issue entirely, by changing the process (saves time in the long run).
this is the regex we use at the office
my #Candidates=split(/[\s
\(
\)
\+
\,
\*
\/
\-
\n
\
\=
\r
]+/,$SqlSelectQuery
);
How about splitting each line into terms (replace every parenthesis, comma and space with a newline), then sorting:
perl -ne's/[(), ]/\n/g; print' < textfile | sort -u
You'll end up with a lot of content like:
fieldname1
fieldname1
formatstring
ltrim
rtrim
t_char