Spring Data JPA Like or Containing - sql

I'm experimenting with Spring Boot Data JPA and, reading the documentation, I got confused. What's the difference?
What I understood is that the "Like" operator generates SQL without the "%" surrounding my String (where name like 'String'), and the "Containing" operator generates SQL with the "%" surrounding my String (where name like '%String%'). Am I wrong?
I used the "Like" operator and it works fine in situations where the "%" is required on both sides, so I'm really confused!

It is correct that you can emulate a containing with a like.
The differences are:
With like, you have to enclose your search string in wildcards yourself.
With like, you can place wildcards not only at the beginning or end but also in the middle, use several of them, and use other wildcards such as _, which matches a single character.
A final, subtle difference is that containing escapes wildcards contained in your search argument, whereas like does not. So when searching for abc%def the two behave differently:
value           | containing     | like (with additional `%` around the search string)
----------------|----------------|----------------------------------------------------
123abc%def456   | matches        | matches
123abcXYZdef456 | does not match | matches
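
The escaping difference can be reproduced outside Spring with a small Python sketch. It only emulates SQL LIKE semantics: the helpers `like_to_regex`, `containing_to_like` and `like_matches`, and the choice of `\` as the escape character, are assumptions made for illustration and are not Spring Data's actual implementation.

```python
import re

ESCAPE = "\\"  # assumed LIKE escape character for this sketch


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an equivalent regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == ESCAPE and i + 1 < len(pattern):
            # An escaped character always matches itself literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any run of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def containing_to_like(term):
    """Escape wildcards in the term, then wrap it in % (Containing-style)."""
    escaped = "".join(ESCAPE + c if c in "%_" + ESCAPE else c for c in term)
    return "%" + escaped + "%"


def like_matches(value, pattern):
    """LIKE must match the whole value, hence fullmatch."""
    return like_to_regex(pattern).fullmatch(value) is not None


term = "abc%def"
containing = containing_to_like(term)   # wildcards in the term are escaped
plain_like = "%" + term + "%"           # wildcards in the term stay active

for value in ("123abc%def456", "123abcXYZdef456"):
    print(value, like_matches(value, containing), like_matches(value, plain_like))
```

With the escaped pattern the `%` inside the term only matches a literal percent sign, which is why the second value is rejected by the Containing-style pattern but accepted by the plain LIKE pattern.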

Related

How to do regEx in Spark SQL

I have to create a data frame in which the rows in one column should be a name I extract from a long URL. Let's say I have the following url:
https://xxx.xxxxxx.com/xxxxx/y...y/?...?/<irrelevant>
Unfortunately I can't disclose the exact URLs, but what I can say is that the x parts are strings that don't change (i.e. all URLs in the database contain those patterns and they are known), y...y is a username of unknown length that may change with each URL, and ?...? is the name I am interested in (again a string of unknown length). After that there may be multiple strings separated by / which are not useful. How exactly would I do that? Up until now I have used three different UDFs which rely on substrings and indexes, but I think that's a very cumbersome solution.
I am not very familiar with Regex or with Spark SQL, so even just the regex would be useful.
Thanks
Edit: I think I got the regex down, now I just need to find out how to use it.
https:\/\/xxx\.xxxxxx\.com\/xxxxx\/(?:[^0-9\/]+)\/([a-zA-z]*)
I have modified your regex a bit. Regex:
^https:\/\/www\.example\.com\/user=\/(.*?)\/(.*?)(?:\/.*|$)$
It will capture two groups:
1st group - username
2nd group - some name
You can use Spark's regexp_extract function to select regex capture groups, e.g.:
import spark.implicits._
import org.apache.spark.sql.functions.regexp_extract
val df = Seq(
  ("https://www.example.com/user=/username1/name3/asd"),
  ("https://www.example.com/user=/username2/name2"),
  ("https://www.example.com/user=/username3/name1/asd"),
  ("https://www.example.com/user=")
).toDF("url")
val r = "^https:\\/\\/www\\.example\\.com\\/user=\\/(.*?)\\/(.*?)(?:\\/.*|$)$"
df.select(
  $"url",
  regexp_extract($"url", r, 1).as("username"),
  regexp_extract($"url", r, 2).as("name")
).show(false)
Result:
+-------------------------------------------------+---------+-----+
|url |username |name |
+-------------------------------------------------+---------+-----+
|https://www.example.com/user=/username1/name3/asd|username1|name3|
|https://www.example.com/user=/username2/name2 |username2|name2|
|https://www.example.com/user=/username3/name1/asd|username3|name1|
|https://www.example.com/user= | | | <- not correct url
+-------------------------------------------------+---------+-----+
P.S. you can use regex101.com for validating your regular expressions
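
To try the pattern without a Spark session, here is a minimal Python sketch using the standard `re` module on the same sample URLs. The helper name `extract` is hypothetical, and returning empty strings for a non-matching URL is a choice made here to mirror the empty cells in the result table above.

```python
import re

# Same pattern as in the answer above; "/" needs no escaping in Python.
URL_RE = re.compile(r"^https://www\.example\.com/user=/(.*?)/(.*?)(?:/.*|$)$")


def extract(url):
    """Return (username, name), or ("", "") when the URL does not match."""
    m = URL_RE.search(url)
    return (m.group(1), m.group(2)) if m else ("", "")


urls = [
    "https://www.example.com/user=/username1/name3/asd",
    "https://www.example.com/user=/username2/name2",
    "https://www.example.com/user=/username3/name1/asd",
    "https://www.example.com/user=",
]

for url in urls:
    print(url, extract(url))
```

The lazy groups `(.*?)` stop at the first `/` that lets the rest of the pattern match, which is what keeps the username and the name apart.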

Orient-db regex modifiers

I'm working with an OrientDB database and I have issues with regex pattern matching. I really need a case-insensitive modifier in the query, but somehow it doesn't work as I expect.
Query:
select from UserAccounts where email MATCHES '^ther.*'
It returns the lowercase matches, as expected.
Whenever I try to add a modifier outside the delimiters, i.e.
select from UserAccounts where email MATCHES '\^ther.*\i'
I get an empty collection. Actually the query returns an empty collection whenever delimiters are present.
If there is no way to attach modifiers, I could probably replace each alphabetic character with an expression in square brackets, i.e.
select from UserAccounts where email MATCHES "^[tT][hH][eE][rR].*"
But I'm not really happy with this solution.
Using the Java case-insensitive regex modifier (from Pattern's special constructs) works in OrientDB 1.7.9 - for your example:
select from UserAccounts where email MATCHES '(?i)^ther.*'
(See also: Pattern - Special Constructs)
I've added a comment to the corresponding OrientDB issue as well.
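
Python's `re` module accepts the same inline `(?i)` flag as Java's Pattern, so its effect can be checked quickly outside the database. This is only a sketch: the e-mail addresses are made up, and `fullmatch` is used on the assumption that MATCHES has to match the whole value, as Java's `String.matches` does.

```python
import re

# Hypothetical sample values, not taken from the question.
emails = ["ther@example.com", "THERESA@example.com", "other@example.com"]

# (?i) at the start of the pattern switches on case-insensitive matching.
pattern = re.compile(r"(?i)^ther.*")

# Whole-value matching, assumed to correspond to the MATCHES operator.
matched = [e for e in emails if pattern.fullmatch(e)]
print(matched)
```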
Unfortunately, there is no way to specify regex modifiers in the MATCHES operator.
For now, a good solution would be to create a custom function, where you can use the whole power of JS regexps.
But we definitely should add the ability to specify modifiers in MATCHES. Could you create a feature request?

Using SQL like for pattern query

I have a PHP function that accepts a parameter called $letter and I want to set the default value of the parameter to a pattern which is "any number or any symbol". How can I do that?
This is my query, by the way:
"select ID from $wpdb->posts where post_title LIKE '".$letter."%'"
I tried posting on the WordPress Stack Exchange and they told me to post it here, as this is an SQL/general programming question rather than one specific to WordPress.
Thank you! Replies much appreciated :)
In order to match just numbers or letters (I'm not sure exactly what you mean by symbols) you can use the RLIKE operator in MySQL:
SELECT ... WHERE post_title RLIKE '^[A-Za-z0-9]'
So by default $letter would be [A-Za-z0-9], which matches any letter from a to z (either case) or any digit from 0 to 9. If you need specific symbols you can add them to the list (but - has to be first or last, since otherwise it denotes a range). The ^ anchors the match to the beginning of the string. Note that RLIKE needs no trailing %: % is a LIKE wildcard, and in a regular expression it would only match a literal percent sign. So you will need something like:
"select ID from $wpdb->posts where post_title RLIKE '^".$letter."'"
Of course, I have to warn you about SQL injection attacks if you build your query like this without sanitizing the input (making sure it doesn't contain any ' (apostrophe)).
Edit
To match a title that starts with a number just use [0-9] - that means it will match one digit from 0 to 9
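
The default-parameter idea and the injection warning can be sketched together in Python. This is an illustration under stated assumptions, not the WordPress code: an in-memory SQLite database stands in for MySQL, a user-defined REGEXP function stands in for RLIKE, and the table contents and the helper name `find_ids` are made up.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    return int(value is not None and re.search(pattern, value) is not None)


def find_ids(conn, letter="[0-9]"):
    """Return ids of posts whose title starts with the given character class."""
    # The pattern is passed as a bound parameter, never concatenated into the SQL.
    rows = conn.execute(
        "SELECT id FROM posts WHERE post_title REGEXP ? ORDER BY id",
        ("^" + letter,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, post_title TEXT)")
conn.executemany(
    "INSERT INTO posts (post_title) VALUES (?)",
    [("1984",), ("Apple pie",), ("42 things",), ("Banana",)],
)

print(find_ids(conn))        # default: titles starting with a digit
print(find_ids(conn, "A"))   # titles starting with "A"
```

Binding the pattern as a parameter keeps a stray apostrophe from breaking out of the SQL string. It does not stop a caller from supplying an arbitrary regular expression, so the value of `letter` may still need validating.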

IP Address/Hostname match regex

I need to match an IP address/hostname against other strings with a regular expression:
Like 20.20.20.20
should match with 20.20.20.20
should match with [http://20.20.20.20/abcd]
should not match with 20.20.20.200
should not match with [http://20.20.20.200/abcd]
should not match with [http://120.20.20.20/abcd]
should match with AB_20.20.20.20
should match with 20.20.20.20_AB
At present I am using something like this regular expression: "(.*[^(\w)]|^)20.20.20.20([^(\w)].*|$)"
But it does not work for the last two cases, because "\w" is equal to [a-zA-Z0-9_] and I want the underscore "_" to count as a separator as well. I tried different combinations but did not succeed. Please help me with this regular expression.
(.*[_]|[^(\w)]|^)10.10.10.10([_]|[^(\w)].*|$)
I spent some more time on this. This regular expression seems to work.
I don't know which language you're using, but with Perl-like regular expressions you could use the following, shorter expression:
(?:\b|\D)20\.20\.20\.20(?:\b|\D)
This effectively says:
Match word boundary (\b, here: the start of the word) or a non-digit (\D).
Match IP address.
Match word boundary (\b, here: the end of the word) or a non-digit (\D).
Note 1: ?: makes the group (\b|\D) non-capturing, i.e. what it matched is not stored for later reference. You probably don't need the word boundaries/non-digits to be stored. If you actually need them stored, just remove the two ?:s.
Note 2: This might be nit-picking, but you need to escape the dots in the IP address part of the regular expression, otherwise you'd also match any other character at those positions. Using 20.20.20.20 instead of 20\.20\.20\.20, you might for example match a line carrying a timestamp when you're searching through a log file...
2012-07-18 20:20:20,20 INFO Application startup successful, IP=20.20.20.200
...even though you're looking for IP addresses and that particular one (20.20.20.200) explicitly shouldn't match, according to your question. Admittedly though, this example is quite an edge case.
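
The cases from the question and the log line from Note 2 can be checked with a short Python sketch. Python's `re` module is used here as one example of a Perl-like regex engine; the variable names are illustrative only.

```python
import re

IP_RE = re.compile(r"(?:\b|\D)20\.20\.20\.20(?:\b|\D)")

# Test cases and expected results taken from the question.
cases = {
    "20.20.20.20": True,
    "http://20.20.20.20/abcd": True,
    "20.20.20.200": False,
    "http://20.20.20.200/abcd": False,
    "http://120.20.20.20/abcd": False,
    "AB_20.20.20.20": True,
    "20.20.20.20_AB": True,
}

results = {text: IP_RE.search(text) is not None for text in cases}
for text, found in results.items():
    print(f"{text!r}: {found} (expected {cases[text]})")

# Note 2: with unescaped dots the timestamp 20:20:20,20 is matched by mistake.
LOG_LINE = "2012-07-18 20:20:20,20 INFO Application startup successful, IP=20.20.20.200"
UNESCAPED_RE = re.compile(r"(?:\b|\D)20.20.20.20(?:\b|\D)")
print(UNESCAPED_RE.search(LOG_LINE) is not None)   # unescaped dots: matches
print(IP_RE.search(LOG_LINE) is not None)          # escaped dots: no match
```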

SQL to return results for the following regex

I have the following regular expression:
WHERE A.srvc_call_id = '40750564' AND REGEXP_LIKE (A.SRVC_CALL_DN, '[^TEST]')
The row that contains 40750564 has "TEST CALL" in the column SRVC_CALL_DN and REGEXP_LIKE doesn't seem to be filtering it out. Whenever I run the query it returns the row when it shouldn't.
Is my regex pattern wrong? Or does SQL not accept [^whatever]?
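Outside square brackets the caret anchors the expression to the start of a string, but as the first character inside square brackets it negates the character class. By enclosing the letters T, E, S & T in square brackets you're searching, as barsju suggests, for individual characters rather than for the string TEST: [^TEST] matches any single character other than T, E or S, and 'TEST CALL' contains several of those (the space, C, A and L), so the row is returned.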
You say that SRVC_CALL_DN contains the string 'TEST CALL', but you don't say where in the string. You also say that you're looking for where this string doesn't match. This implies that you want to use not regexp_like(...
Putting all this together I think you need:
AND NOT REGEXP_LIKE (A.SRVC_CALL_DN, '^TEST[[:space:]]CALL')
This excludes every row from your query where the string starts with 'TEST CALL'. However, if this string may be in any position in the column, you need to remove the caret (^).
This also assumes that the string is always in upper case. If it's in mixed case or lower, then you need to change it again. Something like the following:
AND NOT REGEXP_LIKE (upper(A.SRVC_CALL_DN), '^TEST[[:space:]]CALL')
By upper-casing SRVC_CALL_DN you ensure that the match works regardless of case, but the query may no longer be able to use an index on this column. I wouldn't worry about this particular point, as regular expression queries can be fairly poor at using indexes anyway and it appears as though SRVC_CALL_ID is indexed.
Also if it may not include 'CALL' you will have to remove this. It is best when using regular expressions to make your match pattern as explicit as possible; so include 'CALL' if you can.
Try with '^TEST' or '^TEST.*'
Your regexp matches any string that contains at least one character other than T, E or S.
But your case is simple: the value starts with TEST. Why not use a simple LIKE (negated with NOT if you want to exclude those rows):
LIKE 'TEST%'
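
The difference between the negated character class and the anchored literal can be seen in a few lines of Python. This is a sketch only: Oracle's REGEXP_LIKE uses POSIX-style expressions while Python's `re` module is Perl-like, but `[^...]` and a leading `^` behave the same way in both for these patterns.

```python
import re

value = "TEST CALL"

# [^TEST] is a negated character class: any one character other than T, E or S.
first_other = re.search(r"[^TEST]", value)
print(repr(first_other.group(0)))   # the first such character in the value

# ^TEST anchors the literal string TEST to the start of the value.
starts_with_test = re.search(r"^TEST", value) is not None
print(starts_with_test)

# Only a value made up entirely of T, E and S fails to match [^TEST].
only_tes = re.search(r"[^TEST]", "TEST") is None
print(only_tes)
```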