Compare String to a wildcard pattern string in SQL database column - sql

How would I compare a file name with a naming convention that's saved as a Windows wildcard pattern in a SQL column? For example:
I have a file HCLA_MCLA_20220308 and I want to check if there is a wildcard naming convention that matches this file name in my File_Naming_Convention column of my table. The match should be for "*HCLA_MCLA_**"
[Image: HCLA naming convention]
All the naming conventions use either * or ?. A * means any number of characters before and/or any number of characters after. A ? means exactly one wildcard character in that position. I'm not sure why some conventions end with two stars (**).
My wildcard file naming convention column looks like this:
[Image: section of column]
I tried a query like the one below, but it doesn't seem to work. The wildcard tokens in the SQL table appear to be Windows wildcards, so in my query I tried to replace them with their SQL equivalents.
Select *
from MPM_FTP_Eligibility_Files_List
where replace(File_Naming_Convention,'*','%') like 'HCLA_MCLA_20220308'
Update:
Here is the fix to take care of both * and ?. Needed to swap positions between my file name and like pattern in my query, as jarlh accurately pointed out.
Select * from MPM_FTP_Eligibility_Files_List where 'FCS-CA25_1_20220301_1000N_Cigna_W_Trans.txt' like replace(replace(File_Naming_Convention,'*','%'),'?','_')
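A minimal sketch of the swapped comparison, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for the real server. The sample rows are made up for illustration and matching_conventions is a hypothetical helper name. One caveat worth keeping in mind: an underscore already present in a stored convention (as in HCLA_MCLA_) also behaves as a single-character wildcard once the value is used as a LIKE pattern.

```python
import sqlite3

# In-memory stand-in for the real table; the rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MPM_FTP_Eligibility_Files_List (File_Naming_Convention TEXT)"
)
conn.executemany(
    "INSERT INTO MPM_FTP_Eligibility_Files_List VALUES (?)",
    [("*HCLA_MCLA_**",), ("FCS-CA??_1_*",), ("*OTHER_*",)],
)


def matching_conventions(file_name):
    """Return the stored conventions whose wildcard pattern matches file_name."""
    # The file name is the left operand; the converted column is the pattern.
    sql = (
        "SELECT File_Naming_Convention FROM MPM_FTP_Eligibility_Files_List "
        "WHERE ? LIKE replace(replace(File_Naming_Convention, '*', '%'), '?', '_')"
    )
    return [row[0] for row in conn.execute(sql, (file_name,))]


print(matching_conventions("HCLA_MCLA_20220308"))
```

Passing the file name as a bound parameter also avoids building the SQL string by hand.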

Related

Django ORM underscore wildcard

I have been searching for a way of using Django ORM to use the SQL underscore wildcard, and do something equivalent to this:
SELECT * FROM table
WHERE field LIKE 'abc_wxyz'
Currently, I am doing:
field_like = 'abc_wxyz'
result = MyClass.objects.extra(where=["field LIKE " + field_like])
I already tried contains() and icontains(), but they are not what I need: they wrap the value in percent signs and escape the underscore:
SELECT * FROM table
WHERE field LIKE '%abc/_wxyz%'
Thanks!
You can use the __regex lookup to build more complex expressions than __contains, __startswith or __endswith allow (prefix any of these with "i", as in icontains, to make the lookup case-insensitive). In your case, I think
MyClass.objects.filter(field__regex=r'^abc.wxyz$')
Would do what you are trying to do.
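If the LIKE pattern is only known at run time, one option is to translate it into an anchored regular expression first. The helper below is a hypothetical sketch (like_to_regex is not part of Django), and it assumes the database backend's regex syntax treats "." and ".*" the way Python's re module does.

```python
import re


def like_to_regex(like_pattern):
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in like_pattern:
        if ch == "_":
            parts.append(".")       # exactly one character
        elif ch == "%":
            parts.append(".*")      # any run of characters, possibly empty
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return "^" + "".join(parts) + "$"


pattern = like_to_regex("abc_wxyz")
print(pattern)

# Intended use with the model from the question (not executed here):
#   MyClass.objects.filter(field__regex=pattern)
```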
You can use the field__contains attribute.
for example:
MyClass.objects.filter(field__contains='abc_wxyz')
This is equivalent to:
SELECT * FROM MyClass WHERE field LIKE 'abc_wxyz'
Lord Elron's answer is incorrect. Django escapes all developer supplied wildcard characters to the LIKE-type lookups. The statement is equivalent to
SELECT * FROM MyClass WHERE field LIKE '%abc/_wxyz%'
(as the OP discovered) and the underscore has no effect.
See Escaping percent signs and underscores in LIKE statements
The field lookups that equate to LIKE SQL statements (iexact, contains, icontains, startswith, istartswith, endswith and iendswith) will automatically escape the two special characters used in LIKE statements – the percent sign and the underscore.

Regex not working in LIKE condition

I'm currently using Oracle SQL developer and am trying to write a query that will allow me to search for all fields that resemble a certain value but always differ from it.
SELECT last_name FROM employees WHERE last_name LIKE 'Do[^e]%';
So the result that I'm after would be: Give me all last names that start with 'Do' but are not 'Doe'.
I got the square brackets method from a general SQL basics book so I assume any SQL database should be able to run it.
This is my first post and I'd be happy to clarify if my question wasn't clear enough.
In Oracle's LIKE no regular expressions can be used. But you can use REGEXP_LIKE.
SELECT last_name FROM employees WHERE REGEXP_LIKE(last_name, '^Do[^e]');
The ^ at the beginning of the pattern anchors it to the beginning of the compared string. In other words the string must start with the pattern to match. And there is no wildcard needed at the end, as there is no anchor for the end of the string (which would be $). And you seem to already know the meaning of [^e].
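To see which names this pattern accepts, here is a small sketch using Python's re module as a stand-in for Oracle's REGEXP_LIKE (an assumption; for a pattern this simple the two should behave alike). The sample names are invented. Note that [^e] requires a third character, so a bare 'Do' is rejected, and so is any name whose third letter is 'e', such as 'Doerr'.

```python
import re

# Starts with "Do", followed by any single character other than "e".
starts_with_do_not_doe = re.compile(r"^Do[^e]")

last_names = ["Doe", "Doyle", "Donovan", "Do", "Doerr", "Smith"]
matches = [name for name in last_names if starts_with_do_not_doe.search(name)]
print(matches)
```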

Is there a way to exclude a character(s) from the SQL wildcard %?

I am attempting to do file searches with SQL. I need to keep the searches within the directory I am doing the comparisons against. When doing a LIKE or NOT LIKE comparison in SQL, the wildcard '%' is used to represent 0, 1, or many characters. Are there ways to exclude or delimit characters involved in that wildcard search? I need to do a comparison like the following:
LIKE 'C:/Data/%.txt'
However, I do not want the '%' to find any records where there is a slash '/' character involved in the wildcard search. I would want it to return examples like:
C:/Data/file1.txt
C:/Data/records_2017.txt
C:/Data/listing#7.txt
But I do NOT want the wildcard to find records with slash in it, because in my situation, it goes beyond the current folder I am interested in doing file searches in:
C:/Data/OtherFolder/file1.txt
C:/Data/System/SomethingElse/records_2016.txt
The above examples would be returned, because everything between the slash after Data and the .txt extension is all free game for the wildcard. I do NOT want the above examples to be returned in my wildcard search.
I tried doing character sets, such as [^/], but it seems to only work to exclude strings BEGINNING with a slash '/' character. I need to prevent wildcard from using a slash ANYWHERE in the string.
You can't do this with a single LIKE, but you could do this:
WHERE MyColumn LIKE 'C:/Data/%.txt'
AND MyColumn NOT LIKE 'C:/Data/%/%.txt'
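A quick check of this LIKE / NOT LIKE pair against the paths from the question, using an in-memory SQLite table through Python's sqlite3 module as a stand-in for the real database. The table name files is an assumption made for the sketch.

```python
import sqlite3

paths = [
    "C:/Data/file1.txt",
    "C:/Data/records_2017.txt",
    "C:/Data/listing#7.txt",
    "C:/Data/OtherFolder/file1.txt",
    "C:/Data/System/SomethingElse/records_2016.txt",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (MyColumn TEXT)")
conn.executemany("INSERT INTO files VALUES (?)", [(p,) for p in paths])

# The first condition keeps .txt files under C:/Data/; the second drops
# anything that has a further slash between the folder and the extension.
rows = conn.execute(
    "SELECT MyColumn FROM files "
    "WHERE MyColumn LIKE 'C:/Data/%.txt' "
    "AND MyColumn NOT LIKE 'C:/Data/%/%.txt' "
    "ORDER BY MyColumn"
).fetchall()
top_level = [row[0] for row in rows]
print(top_level)
```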
Most databases support some form of regular expressions. For instance:
col regexp '^C:/Data/[^/]*[.]txt$'
The specific match operator/function varies by database.
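As a sketch of the regular-expression approach, here is the same idea in Python's re module, used only as a stand-in for whichever operator the database provides. The key parts are [^/]* (any run of non-slash characters) and [.] (a literal dot); in_data_folder is a hypothetical helper name.

```python
import re

# Any run of non-slash characters between the folder and the ".txt" suffix.
same_folder_txt = re.compile(r"^C:/Data/[^/]*[.]txt$")


def in_data_folder(path):
    """True when path is a .txt file directly inside C:/Data/."""
    return same_folder_txt.search(path) is not None


print(in_data_folder("C:/Data/file1.txt"))
print(in_data_folder("C:/Data/OtherFolder/file1.txt"))
```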
If you are using Oracle, you can use REGEXP_LIKE instead of LIKE for finer control of what records will match. Check this document for details on how to use it.

Table or column name cannot start with numeric?

I tried to create table named 15909434_user with syntax like below:
CREATE TABLE 15909434_user ( ... )
That produced an error, of course. After a bit of research on Google, I found a good article here that explains:
When you create an object in PostgreSQL, you give that object a name. Every table has a name, every column has a name, and so on. PostgreSQL uses a single data type to define all object names: the name type.
A value of type name is a string of 63 or fewer characters. A name must start with a letter or an underscore; the rest of the string can contain letters, digits, and underscores.
...
If you find that you need to create an object that does not meet these rules, you can enclose the name in double quotes. Wrapping a name in quotes creates a quoted identifier. For example, you could create a table whose name is "3.14159"—the double quotes are required, but are not actually a part of the name (that is, they are not stored and do not count against the 63-character limit). ...
Okay, now I know how to solve this by using this syntax (putting double quotes around the table name):
CREATE TABLE "15909434_user" ( ... )
You can create a table or column named "15909434_user" or user_15909434, but you cannot create a table or column name that begins with a digit without using double quotes.
So I am curious about the reason behind this (beyond it being a convention). Why does this convention apply? Is it to avoid a syntax limitation, or is there some other reason?
Thanks in advance for your attention!
It comes from the original SQL standards, which through several layers of indirection eventually get to an identifier start block, which is one of several things, but primarily it is "a simple latin letter". Other characters are allowed too; if you want to see all the details, go to http://en.wikipedia.org/wiki/SQL-92 and follow the links to the actual standard (page 85).
Having non-numeric identifier introducers makes writing a parser to decode SQL for execution easier and quicker, but a quoted form is fine too.
Edit: Why is it easier for the parser?
The problem for a parser is more in the SELECT-list clause than the FROM clause. The select-list is the list of expressions that are selected from the tables, and this is very flexible, allowing simple column names and numeric expressions. Consider the following:
SELECT 2e2 + 3.4 FROM ...
If table names, and column names could start with numerics, is 2e2 a column name or a valid number (e format is typically permitted in numeric literals) and is 3.4 the table "3" and column "4" or is it the numeric value 3.4 ?
Having the rule that identifiers start with simple latin letters (and some other specific things) means that a parser that sees 2e2 can quickly discern this will be a numeric expression, same deal with 3.4
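A toy illustration of that point, not how any real SQL parser is written: with the "identifiers start with a letter or underscore" rule, a lexer can classify a token by pattern alone, with no lookup of table or column names. The two regular expressions and the classify function are simplified assumptions made for this sketch.

```python
import re

# Simplified token shapes, assumed for illustration only.
NUMBER = re.compile(r"\d+(\.\d+)?([eE][+-]?\d+)?")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify(token):
    """Classify an unquoted token purely from its spelling."""
    if NUMBER.fullmatch(token):
        return "number"
    if IDENTIFIER.fullmatch(token):
        return "identifier"
    return "invalid"  # would need double quotes to be used as a name


for token in ["2e2", "3.4", "user_15909434", "15909434_user"]:
    print(token, classify(token))
```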
While it would be possible to devise a scheme to allow numeric leading characters, this might lead to even more obscure rules (opinion), so this rule is a nice solution. If you allowed digits first, then it would always need quoting, which is arguably not as 'clean'.
Disclaimer: I've simplified the above slightly, ignoring correlation names to keep it short. I'm not totally familiar with Postgres, but I have double-checked the above answer against the Oracle RDB documentation and the SQL spec.
I'd imagine it's to do with the grammar.
SELECT 24*DAY_NUMBER as X from MY_TABLE
is fine, but ambiguous if 24 was allowed as a column name.
Adding quotes means you're explicitly referring to an identifier not a constant. So in order to use it, you'd always have to escape it anyway.

Return sql rows where field contains ONLY non-alphanumeric characters

I need to find out how many rows in my SQL Server table have a particular field that contains ONLY non-alphanumeric characters.
I'm thinking I need a regular expression along the lines of [^a-zA-Z0-9], but I'm not sure of the exact syntax needed to return the rows that have no valid alphanumeric characters in that field.
SQL Server doesn't have regular expressions. It uses the LIKE pattern matching syntax which isn't the same.
As it happens, you are close. Just need leading+trailing wildcards and move the NOT
WHERE whatever NOT LIKE '%[a-z0-9]%'
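The logic of "no alphanumeric character anywhere" can be sketched in Python, with re standing in for the LIKE character class. The sample values are invented and only_non_alphanumeric is a hypothetical name. Both letter ranges are spelled out here because Python's matching is case-sensitive, whereas [a-z] under a case-insensitive SQL Server collation would also cover uppercase; accented letters fall outside these ASCII ranges in this sketch.

```python
import re

# One ASCII letter or digit anywhere in the string disqualifies the value.
HAS_ALNUM = re.compile(r"[A-Za-z0-9]")


def only_non_alphanumeric(value):
    """True when value contains no ASCII letter or digit at all."""
    return HAS_ALNUM.search(value) is None


values = ["---", "a-1", "#!?", "2022", " . "]
kept = [v for v in values if only_non_alphanumeric(v)]
print(kept)
```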
If you have short strings you should be able to create a few LIKE patterns ('[^a-zA-Z0-9]', '[^a-zA-Z0-9][^a-zA-Z0-9]', ...) to match strings of different length. Otherwise you should use CLR user defined function and a proper regular expression - Regular Expressions Make Pattern Matching And Data Extraction Easier.
This will not work correctly, e.g. abcÑxyz will pass thru this as it has a,b,c... you need to work with Collate or check each byte.