For the data below (there are many more nodes in the Team Foundation Server table that I need to refer to; this is just a sample):
Nodes
------------------------
\node1\node2\node3\
\node1\node2\node5\
\node1\node2\node3\node4\
\node1\node2\node3\node4\node5\
I was wondering if I can apply something like this (the query below does not give the required results):
select * from table_a where nodes like '\node1\node2\%\'
to get the below data
\node1\node2\node3\
\node1\node2\node5\
and something like (below does not give the required results)
select * from table_a where nodes like '\node1\node2\%\%\'
to get
\node1\node2\node3\
\node1\node2\node5\
\node1\node2\node3\node4\
Can the above be done with the LIKE operator? Please suggest.
Thanks
You'll need to combine two terms, LIKE and NOT LIKE:
select * from table_a where
nodes like '\node1\node2\%\' AND
nodes NOT like '\node1\node2\%\%\'
for the first query, and a similar solution for the second. That's with "plain SQL". There are probably SQL Server specific functions which will count the number of "\" characters in the column, for instance.
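A minimal sketch of the LIKE / NOT LIKE combination, run from Python against an in-memory SQLite database rather than SQL Server (an assumption made only so the example is self-contained; in SQLite, as in SQL Server, LIKE has no default escape character, so the backslashes are literal). The helper names `path` and `children` are hypothetical.

```python
import sqlite3

SEP = "\\"  # a single backslash character

def path(*parts):
    """Build a node path with a leading and a trailing separator."""
    return SEP + SEP.join(parts) + SEP

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (nodes TEXT)")
conn.executemany(
    "INSERT INTO table_a (nodes) VALUES (?)",
    [
        (path("node1", "node2", "node3"),),
        (path("node1", "node2", "node5"),),
        (path("node1", "node2", "node3", "node4"),),
        (path("node1", "node2", "node3", "node4", "node5"),),
    ],
)

def children(prefix, max_extra_levels):
    """Rows below prefix that add between 1 and max_extra_levels levels."""
    wanted = prefix + "%" + SEP                                # one more level or deeper
    too_deep = prefix + ("%" + SEP) * (max_extra_levels + 1)   # one level too many
    cur = conn.execute(
        "SELECT nodes FROM table_a "
        "WHERE nodes LIKE ? AND nodes NOT LIKE ? ORDER BY nodes",
        (wanted, too_deep),
    )
    return [row[0] for row in cur]

one_level = children(path("node1", "node2"), 1)         # node3 and node5 rows
up_to_two_levels = children(path("node1", "node2"), 2)  # also the node3\node4 row
```

The second call is the "similar solution" for the second query: the NOT LIKE pattern simply gains one more `%\` segment.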
Maybe use the delimiter to get the results.
It is unclear what you are actually trying to get, but you could use the
substr
function to either count or find the position of the delimiter '\' character.
It seems like this would work (basically just eliminating the last backslash):
select * from table_a where nodes like '\node1\node2\%\%'
EDIT
You could also try this:
select * from table_a where
nodes like '\node1\node2\%\' or
nodes like '\node1\node2\%\%\'
A little late to the party, but it appears that the problem is still open. Could it be that the backslashes are escaping the wildcard meaning of the percent signs? And the backslash n could be getting interpreted as well.
Doesn't sql-server know a wildcard for a single character?
select * from table_a
where nodes LIKE '#node1#node2#node_#';
nodes
---------------------
#node1#node2#node5#
#node1#node2#node3#
I tested this on PostgreSQL, where it is hard to insert a backslash, which is why I replaced the backslashes with #.
Here is another possibility - negate more than one backslash (# used for my convenience):
SELECT * FROM table_a
WHERE (nodes LIKE '#node1#node2#%#'
AND NOT nodes LIKE '#node1#node2#%#%#');
PostgreSQL also offers the possibility to match against patterns, with SIMILAR TO or ~:
SELECT * FROM table_a
WHERE nodes SIMILAR TO '#node1#node2#[^#]*#';
nodes
---------------------
#node1#node2#node5#
#node1#node2#node3#
[] encloses a group of alternatively allowed characters; for example, [aeiou] would be a lowercase vowel. But when the caret is the first character inside the brackets, the group is negated, so [^aeiou] means anything but a lowercase vowel, and [^#] means anything but a #.
The asterisk after that expression means that the preceding character can occur as often as you like, from 0 to a million times. (+ would mean at least once, ? would mean 0 or 1 times.)
So '#node1#node2#[^#]*#' means '#node1#node2#', followed by anything but a hash, 0 or single or multiple times, and then, finally a hash.
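The same pattern can be tried outside the database. This is a small sketch using Python's `re` module; the sample rows are assumed, with # standing in for the backslash as in the answer. Because SIMILAR TO has to match the whole string, `fullmatch` is the closest analogue.

```python
import re

# Assumed sample rows, with '#' in place of the backslash.
rows = [
    "#node1#node2#node3#",
    "#node1#node2#node5#",
    "#node1#node2#node3#node4#",
    "#node1#node2#node3#node4#node5#",
]

# Prefix, then any run of non-'#' characters, then one closing '#'.
one_level = re.compile(r"#node1#node2#[^#]*#")

# fullmatch anchors the pattern at both ends, like SIMILAR TO does.
matches = [r for r in rows if one_level.fullmatch(r)]
```

Only the two rows that add exactly one level after node2 end up in `matches`; the deeper rows contain an extra # that `[^#]*` refuses to consume.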
I'm trying to find the most efficient way to do some pattern validation in T-SQL and struggling with how to check against a list of values. This example works:
SELECT *
FROM SomeTable
WHERE Code LIKE '[0-9]JAN[0-9][0-9]'
OR Code LIKE '[0-9]FEB[0-9][0-9]'
OR Code LIKE '[0-9]MAR[0-9][0-9]'
OR Code LIKE '[0-9]APRIL[0-9][0-9]'
but I am stuck on wondering if there is a syntax that will support a list of possible values within the single like statement, something like this (which does not work)
SELECT *
FROM SomeTable
WHERE Code LIKE '[0-9][JAN, FEB, MAR, APRIL][0-9][0-9]'
I know I can leverage charindex, patindex, etc., just wondering if there is a simpler supported syntax for a list of possible values or some way to nest an IN statement within the LIKE. thanks!
I think the closest you'll be able to get is with a table value constructor, like this:
SELECT *
FROM SomeTable st
INNER JOIN (VALUES
('[0-9]JAN[0-9][0-9]'),
('[0-9]FEB[0-9][0-9]'),
('[0-9]MAR[0-9][0-9]'),
('[0-9]APRIL[0-9][0-9]')) As p(Pattern) ON st.Code LIKE p.Pattern
This is still less typing and slightly more efficient than the OR option, if not as brief as we hoped for. If you knew the month was always three characters we could do a little better:
Code LIKE '[0-9]___[0-9][0-9]'
Unfortunately, I'm not aware of a SQL Server pattern character for "0 or 1" characters. But if you want ALL months, we can at least use this to reduce the number of patterns:
SELECT *
FROM SomeTable
WHERE (Code LIKE '[0-9]___[0-9][0-9]'
OR Code LIKE '[0-9]____[0-9][0-9]'
OR Code LIKE '[0-9]_____[0-9][0-9]')
You'll want to test this to check if the data might contain false positive matches, and of course the table-value constructor could use this strategy, too. Also, I really hope you're not storing dates in a varchar column, which is a broken schema design.
One final option you might have is building the pattern on the fly. Something like this:
Code LIKE '[0-9]' + 'JAN' + '[0-9][0-9]'
But how you find that middle portion is up to you.
The native TSQL string functions don't support anything like that.
But you can use a workaround (dbfiddle) such as
WHERE CASE WHEN Code LIKE '[0-9]%[^ ][0-9][0-9]' THEN SUBSTRING(Code, 2, LEN(Code) - 3) END
IN
( 'JAN', 'FEB', 'MAR', 'APRIL' )
So first of all check that the string starts with a digit and ends in a non-space character followed by two digits and then check the remainder of the string (not matched by the digit check) is one of the values you want.
The reason for including the SUBSTRING inside the CASE is so that it is only evaluated on strings that pass the LIKE check. This avoids possible "Invalid length parameter passed to the LEFT or SUBSTRING function." errors if it were evaluated on a shorter string.
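The logic of this workaround can be sketched outside T-SQL. The Python below is only an illustration of the two-step check (shape first, then the middle part against a list); the function name `is_valid_code` is hypothetical, and collation effects such as case-insensitive matching are ignored.

```python
import re

ALLOWED = {"JAN", "FEB", "MAR", "APRIL"}

# Same shape as LIKE '[0-9]%[^ ][0-9][0-9]':
# a digit, anything, a non-space character, then two digits.
SHAPE = re.compile(r"[0-9].*[^ ][0-9][0-9]", re.DOTALL)

def is_valid_code(code):
    # Step 1: the shape check guards the slice below, like the CASE guards SUBSTRING.
    if SHAPE.fullmatch(code) is None:
        return False
    # Step 2: code[1:-2] plays the role of SUBSTRING(Code, 2, LEN(Code) - 3).
    return code[1:-2] in ALLOWED

results = {c: is_valid_code(c) for c in ["1JAN23", "7APRIL05", "1JUN23", "XJAN23", "12"]}
```

Strings that fail the shape check never reach the slice, which is the same ordering guarantee the CASE expression provides in the query.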
Newbie here. I've been searching for hours now, but I can't seem to find the correct answer or properly phrase my search.
I have thousands of rows (order IDs) that I want to put in an IN list, and I have to run a LIKE on these values at the same time, since the column contains JSON and there's no dedicated table that only has the order_id value. I am running the query in BigQuery.
Sample Input:
ORD12345
ORD54376
Table I'm trying to Query: transactions_table
Query:
SELECT order_id, transaction_uuid,client_name
FROM transactions_table
WHERE JSON_VALUE(transactions_table,'$.ordernum') LIKE IN ('%ORD12345%','%ORD54376%')
Just doesn't work especially if I have thousands of rows.
Also, how do I add the order id that I am querying so that it appears under an order_id column in the query result?
Desired Output:
Option one
WITH transf as (Select order_id, transaction_uuid,client_name , JSON_VALUE(transactions_table,'$.ordernum') as o_num from transactions_table)
Select * from transf where o_num like '%ORD12345%' or o_num like '%ORD54376%'
Option two
split o_num using "-" as the separator, create a table of orders like (select 'ORD12345' as num
Union all
Select 'ORD54376' as num) and inner join it with transf.o_num
One method uses OR:
WHERE JSON_VALUE(transactions_table, '$.ordernum') LIKE '%ORD12345%' OR
JSON_VALUE(transactions_table, '$.ordernum') LIKE '%ORD54376%'
An alternative method uses regular expressions:
WHERE REGEXP_CONTAINS(JSON_VALUE(transactions_table, '$.ordernum'), 'ORD12345|ORD54376')
According to the documentation, the LIKE operator works as follows:
Checks if the STRING in the first operand X matches a pattern
specified by the second operand Y. Expressions can contain these
characters:
A percent sign "%" matches any number of characters or
bytes.
An underscore "_" matches a single character or byte.
You can escape "\", "_", or "%" using two backslashes. For example, "\\%". If
you are using raw strings, only a single backslash is required. For
example, r"\%".
Thus, the syntax would be like the following:
SELECT
order_id,
transaction_uuid,
client_name
FROM
transactions_table
WHERE
JSON_VALUE(transactions_table,
'$.ordernum') LIKE '%ORD12345%'
OR JSON_VALUE(transactions_table,
'$.ordernum') LIKE '%ORD54376%'
Notice that we specify two conditions connected with the OR logical operator.
As a bonus, when querying large datasets it is good practice to select only the columns you need in your output (either in a temp table or in the final view) instead of using *. BigQuery is columnar, which is one of the reasons it is fast.
As an alternative for using LIKE, you can use REGEXP_CONTAINS, according to the documentation:
Returns TRUE if value is a partial match for the regular expression, regex.
Using the following syntax:
REGEXP_CONTAINS(value, regex)
It also works if, instead of a more elaborate regular expression, you pass a plain STRING between single or double quotes. In addition, you can use the pipe operator (|) to OR several search terms together when you have more than one expression to search for, as follows:
where regexp_contains(email,"gary|test")
I hope it helps.
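For thousands of order IDs, the alternation pattern would normally be generated rather than typed. The sketch below uses Python's `re` module only to illustrate the idea; the sample IDs and the helper name `find_order_id` are assumptions. Returning the matched text is also one way to get an order_id value for the result; in BigQuery, REGEXP_EXTRACT with the same pattern would be the likely analogue.

```python
import re

order_ids = ["ORD12345", "ORD54376"]  # assumed sample input

# Escape each id so it is treated literally, then OR them together with '|'.
pattern = re.compile("|".join(re.escape(o) for o in order_ids))

def find_order_id(ordernum):
    """Return the first listed id found in the text, or None if there is none."""
    m = pattern.search(ordernum)
    return m.group(0) if m else None

found = find_order_id("prefix-ORD12345-suffix")
not_found = find_order_id("ORD99999")
```

`search` gives a partial match, which corresponds to wrapping each ID in % on both sides with LIKE.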
In REGEX you can do something like [a-c]+, which will match on
aaabbbccc
abcccaabc
cbccaa
b
aaaaaaaaa
In SQL LIKE it seems that one can either do the equivalent of ".*" which is "%", or [a-c]. Is it possible to use the +(at least one) quantifier in SQL to do [a-c]+?
EDIT: Just to clarify, the desired end-query would look something like
SELECT * FROM table WHERE column LIKE '[a-c]+'
which would then match on the list above, but would NOT match on e.g "xxxxxaxxxx"
As a general rule, SQL Server's LIKE patterns are much weaker than regular expressions. For your particular example, you can do:
where col not like '%[^a-c]%'
That is, the column contains no characters that are not a, b, or c.
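The double negative is almost, but not exactly, the same as [a-c]+. The sketch below compares the two in Python (the function names are hypothetical, and collation-dependent case handling in SQL Server is ignored). The one difference is the empty string, which contains no forbidden character but also fails "at least one".

```python
import re

def no_char_outside_a_to_c(s):
    """Mirror of: col NOT LIKE '%[^a-c]%'."""
    return re.search(r"[^a-c]", s) is None

def one_or_more_a_to_c(s):
    """Mirror of the regex [a-c]+ applied to the whole string."""
    return re.fullmatch(r"[a-c]+", s) is not None

samples = ["aaabbbccc", "abcccaabc", "cbccaa", "b", "aaaaaaaaa", "xxxxxaxxxx", ""]

# Strings on which the two checks disagree.
differences = [s for s in samples if no_char_outside_a_to_c(s) != one_or_more_a_to_c(s)]
```

If empty strings can occur in the column, an extra condition such as `col <> ''` would be needed to match the + semantics exactly.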
You can use regex-like character classes in SQL with LIKE, e.g.:
SELECT * FROM Table WHERE Field LIKE '%[^a-z0-9 .]%'
This works in SQL
Or in your case (with NOT, since you want the rows that contain no character outside a-c)
SELECT * FROM Table WHERE Field NOT LIKE '%[^a-c]%'
It seems you want some data from the database but don't know exactly what it looks like. You should show your column and all the characters that you want in that field.
Do you know how to remove the kind of characters below in a single query?
Note: I'm retrieving this data from the Access app and putting only the valid data into SQL.
select DISTINCT ltrim(rtrim(a.Company)) from [Legacy].[dbo].[Attorney] as a
This is the company name column. I need to keep rows made up of letters only, and remove rows that are numbers only, rows that mix numbers and letters, NULLs, empty strings, and everything else such as +, -.
Based on your extremely vague "rules" I am going to make a guess.
Maybe something like this will be somewhere close.
select DISTINCT ltrim(rtrim(a.Company))
from [Legacy].[dbo].[Attorney] as a
where LEN(ltrim(rtrim(a.Company))) > 1
and IsNumeric(a.Company) = 0
This will exclude entries that are shorter than 2 characters or that can be converted to a number.
This should select the rows you want to delete:
where company not like '%[a-zA-Z]%' and -- contains no letters at all
company like '%[^ a-zA-Z0-9.&]%' -- has a not-allowed character
The list of allowed characters in the second expression may not be complete.
If this works, then you can easily adapt it for a delete statement.
I have a question because I'm really bad at SQL. I understand basic functions but when
it gets a bit more complex, I'm completely lost.
here is what I have:
tables: tA, tB
columns: tA: refA tB: refB
basically refA and refB represent the same thing (some id of a form like xxx-xxx-xxx), but
refB can have information appended (like xxx-xxx-xxx_Zxxx or xxx-xxx-xxx Zxxx)
here is what I know how to do:
querying items that are in a table but not in another (when they are exactly the same)
select refA
from tA
where not exists (select *
from tB
where tB.refB = tA.refA
)
What I want to do:
I want a query that will list items from refA that are not in refB.
BUT, Problem is if I run a "simple" query with a NOT EXISTS like I just showed, it will return everything,
because of the appends. so I thought about using some syntax like this:
SELECT refA
FROM tA
WHERE NOT EXISTS (SELECT *
FROM tB
WHERE tB.refB LIKE CONCAT(tA.refA,'%'))
but... of course, it doesn't work.
Could someone show me how it should be done, and also explain how it works, so I can learn ?
Thanks in advance !
edit: additional info
I can't use a left() or something alike, because the ref format is similar but not always the same (varies in number of characters).
The only way to detect the end of the id before the append, is that there is either a blank space or an underscore.
edit 2: data sample causing problems (MON, Jan. 10th)
here is some actual data from the tables, which makes most answers people have given here
miss some results :/
in tA:
B20-60-04-6A-1
B20-60-04-6A-11
B20-60-04-6A-12
B20-60-04-6A-13
in tB:
B20-60-04-6A-11_XX
B20-60-04-6A-12_XX
B20-60-04-6A-13_XX
problem with mid(), left(), etc. is that if we check "B20-60-04-6A-1" (14 chars)
against the 14 first chars, it will return 3 positives, while in fact it is not in tB...
so, how can we proceed ?
Examples of data patterns in tA are like this:
(X, XYZ: characters. x: alphanumerical)
Xxx-xx-xx-x
Xxx-xx-xx-xx
Xxx-xx-xx-xx-xx
Xxx-xx-xx-xx-xx-x
etc
examples of data patterns in tB:
Xxx-xx-xx-xx-xx-XYZ-xx Z xxx_XX
Xxx-xx-xx-xx-xx-XYZZxxx_XX
Xxx-xx-xx-xx-xx Z xxx_XX
XYZ are always the same 3 characters. When we do not have XYZ, there is always a blank space or an underscore.
so the string of data we compare should be trimmed according to this:
- from start to -XYZ string
- or, if no -XYZ in the string, from start to the first " " or "_"
I'd write that lightning fast in VBA, but in SQL... well, I'll give it a shot, but I'm really bad at it :D
So, first off, you need a function that will change refB to not have the appended information, so it can be compared properly with refA. There will be several approaches, but something like this should work:
Left(tb.RefB, InStr(Replace(tb.RefB+"_", " ", "_"), "_") -1)
That will convert any refB like "123-456 123 EXTRA STUFF" or "123-456_123_EXTRA_STUFF" into "123-456". That result should then be okay to compare directly with a refA.
EDIT: A short explanation of the expression above. What I'm doing is:
1. Adding an underscore to the end of refB, so that there's always at least one underscore (this copes with the case where refB is the same as refA, e.g. "123" becomes "123_").
2. Replacing all spaces in refB with underscores (the Replace function). Now we know that the separator is always an underscore, and we also know from step 1 that there will be at least one underscore.
3. Finding the location of the first underscore (the InStr function). This is the position where refB is split between refA and the additional stuff.
4. Grabbing all the characters between the start of the string and this first underscore, i.e. the part before the separator.
So, that gives you something like this:
select refA
from tA
where not exists (select *
from tB
where Left(tb.RefB, InStr(Replace(tb.RefB+"_", " ", "_"), "_") -1) = tA.refA
)
I would use this approach rather than comparing with wildcards, or trimming refB to match the length of refA, because of this scenario:
refA
====
123
123-456
123-456-789
refB
====
123-456-789_This_is_a_test
In this case, trimming or wildcard matching refA with refB will result in success for all refAs, because "123*", "123-456*" and "123-456-789*" all match "123-456-789_This_is_a_test".
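The difference between the two strategies can be checked on the sample data from the question. This Python sketch mirrors the Left/InStr/Replace expression; the name `base_ref` is hypothetical, and the extra "-XYZ" rule from the question's second edit is deliberately not handled.

```python
def base_ref(ref_b):
    """Part of ref_b before the first space or underscore (all of it if none)."""
    # Append '_' so a separator always exists, and treat spaces as underscores.
    normalized = (ref_b + "_").replace(" ", "_")
    return ref_b[: normalized.index("_")]

t_a = ["B20-60-04-6A-1", "B20-60-04-6A-11", "B20-60-04-6A-12", "B20-60-04-6A-13"]
t_b = ["B20-60-04-6A-11_XX", "B20-60-04-6A-12_XX", "B20-60-04-6A-13_XX"]

# Exact comparison against the extracted base reference.
known = {base_ref(b) for b in t_b}
missing_exact = [a for a in t_a if a not in known]

# Prefix (wildcard-style) comparison, shown for contrast.
missing_prefix = [a for a in t_a if not any(b.startswith(a) for b in t_b)]
```

With the exact comparison, "B20-60-04-6A-1" is reported as missing from tB. With the prefix comparison it is wrongly treated as present, because it is a prefix of "B20-60-04-6A-11_XX".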
So you want everything from A where not in B, but where only the start of B's id matches?
select refA
from tA
left outer join tB
on tA.refA = left( tB.refB, len(tA.refA)) --trim B's id to the length of A's
where tB.refB is null
Maybe use a left() function, if one exists in access? Like this:
SELECT refA
FROM tA
WHERE NOT EXISTS (SELECT *
FROM tB
WHERE Left(tB.refB, Len(tA.refA)) = tA.refA)
If, as you said, you have to look for a space or underscore in refB, you can use this (Access has no two-argument Max(), so the first separator is found by turning spaces into underscores first):
SELECT refA
FROM tA
WHERE NOT EXISTS (SELECT *
FROM tB
WHERE Left(tB.refB, InStr(Replace(tB.refB & '_', ' ', '_'), '_') - 1) = tA.refA)
I'd change the schema. Your second table should have two columns, one containing the first part of the identifier, the other containing the second. If the original column was the primary key, just create a unique multi-column index and disallow NULL values.
You can also add a foreign key constraint this way, and/or optimize the comparisons by introducing a surrogate key in the first table and referencing that from the second.
If you do not have an index on the substring you are trying to match, you will end up with a full scan for each value you are looking for, which is hideously expensive.
I think your suggestion will work in a slightly different format, generally the wild card in Access is *, unless you have set ANSI 92 mode, however you can use ALIKE with % in 'ordinary' mode.
EDIT : DIFFERENT IDEA
SELECT tA.refA
FROM tA
WHERE (((tA.refA)
Not In (SELECT Mid(tb.RefB,1,Len(ta.RefA)) FROM tb)));
This is valid syntax and close to the syntax you say you want to write:
SELECT refA
FROM tA
WHERE NOT EXISTS (
SELECT *
FROM tB
WHERE tB.refB ALIKE tA.refA & '%'
);