LIKE operator two columns clickhouse - sql

I want to select rows in a ClickHouse table where two string columns are LIKE each other (for example, where column1 is 'Hello' and column2 is '%llo').
I tried LIKE operator:
SELECT * FROM table_name WHERE column1 LIKE column2;
but it said:
Received exception from server (version 21.2.8):
Code: 44. DB::Exception: Received from localhost:9000. DB::Exception: Argument at index 1 for function like must be constant: while executing 'FUNCTION like(column1 : 17, column2 : 17) -> like(column1, column2) UInt8 : 28'.
It seems that the second argument must be a constant value. Is there any other way to apply this condition?

ClickHouse's LIKE supports only a constant pattern argument.
There is no general solution. The same limitation applies to the regex functions and similar (because ClickHouse compiles the expression once and applies it to the column byte stream before splitting it into rows).
In some cases you can use the position or countSubstrings functions for this task.

You can use LOCATE or POSITION for this (https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/). The query would look something like this:
SELECT *
FROM table_name
WHERE position(column1, column2, character_length(column1) - character_length(column2) + 1) > 0;
This may be flawed. It seems that in ClickHouse most string functions work on bytes or on variable-length UTF-8 byte sequences rather than on characters, so one has to pay attention to how the functions work and how they should be combined. I am using the third parameter, start_pos, above and assume that it refers to the character position, but it could just as well be bytes; I have not been able to find this information in the docs.
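The bytes-versus-characters concern above can be illustrated outside the database. The following is a minimal sketch in plain Python, not ClickHouse, and it makes no claim about which unit ClickHouse's start_pos actually uses. The helper name suffix_found and the sample strings are made up for illustration. It shows what can go wrong if a start offset computed from character lengths is applied to a byte-based search:

```python
def suffix_found(haystack, needle, start):
    """Return True if needle occurs in haystack at or after index start."""
    return haystack.find(needle, start) != -1


text, suffix = "éllox", "llo"  # text does NOT end with suffix

# Consistent units: character offset on character strings.
char_start = len(text) - len(suffix)
by_chars = suffix_found(text, suffix, char_start)

# Consistent units: byte offset on UTF-8 byte strings.
raw_text, raw_suffix = text.encode("utf-8"), suffix.encode("utf-8")
byte_start = len(raw_text) - len(raw_suffix)
by_bytes = suffix_found(raw_text, raw_suffix, byte_start)

# Mixed units: character-based offset applied to a byte search.
# "é" takes two bytes, so the search starts too early and finds "llo"
# even though it is not at the end of the string.
mixed = suffix_found(raw_text, raw_suffix, char_start)

print(by_chars, by_bytes, mixed)
```

With consistent units both checks reject the string, while the mixed version reports a match, so a query that combines a character-length function with a byte-based search could return false positives on non-ASCII data.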

Sending ARRAY to VALUES clause fails

If I want to construct a temporary valueset for testing, I can do something like this:
SELECT * FROM (VALUES (97.99), (98.01), (99.00))
which will result in this:
     COLUMN1
1    97.99
2    98.01
3    99.00
However, if I want to construct a result set where one of the columns contains an ARRAY, like this:
SELECT * FROM (VALUES (97.99, [14, 37]), (98.01, []), (99.00, [14]))
I would expect this:
     COLUMN1    COLUMN2
1    97.99      [14, 37]
2    98.01      []
3    99.00      [14]
but I actually get the following error:
Invalid expression [ARRAY_CONSTRUCT(14, 37)] in VALUES clause
I don't see anything in the documentation for the VALUES clause that explains why this is invalid. What am I doing wrong here and how can I generate a result set with an ARRAY column?
I think the values clause only allows primitive types. You can define it as a string in single quotes and use parse_json to turn it into an array:
SELECT $1 COL1, parse_json($2)::array COL2
FROM (VALUES (97.99, '[14, 37]'), (98.01, '[]'), (99.00, '[14]'));
VALUES() has some restrictions:
Each expression must be a constant, or an expression that can be evaluated as a constant during compilation of the SQL statement.
Most simple arithmetic expressions and string functions can be evaluated at compile time, but most other expressions cannot.
https://docs.snowflake.com/en/sql-reference/constructs/values.html
From the documentation
Each expression must be a constant, or an expression that can be
evaluated as a constant during compilation of the SQL statement.
Most simple arithmetic expressions and string functions can be
evaluated at compile time, but most other expressions cannot.
The documentation doesn't explicitly say this, but given the ability of arrays to hold multiple data types and a varying number of elements, I suspect that arrays in most SQL-based databases are dynamic and are not evaluated at compile time. Maybe some experts can shed more light on this.
Back to your problem, I would just use explicit select statements like:
select 97.99, [14, 37] union all
select 98.01, [];

SQL : IN operator vs multiple ORs

There is a behaviour I would like to understand for good.
Query #1:
SELECT count(id) FROM table WHERE message like '%TEXT1%'
Output : 504
Query #2
SELECT count(distinct id) FROM table WHERE message like '%TEXT2%'
Output : 87
Query #3
SELECT count(distinct id) FROM table WHERE message in ('%TEXT1%','%TEXT2%' )
Output : 0
I want to understand why am I getting zero in the third query. Based on this, the ( , ) is equivalent to a multiple OR. Isn't this OR inclusive ?
the ( , ) is equivalent to a multiple OR. Isn't this OR inclusive ?
Sure, it's inclusive. But it's still an equality comparison, with no wildcard matching. It's like writing
WHERE (message = '%TEXT1%' or message = '%TEXT2%')
rather than
WHERE (message LIKE '%TEXT1%' or message LIKE '%TEXT2%')
IN does not take wildcards. They are specific to LIKE.
So, you need to use:
WHERE message like '%TEXT1%' OR message like '%TEXT2%'
Or, you can use regular expressions:
WHERE message ~ 'TEXT1|TEXT2'
IN checks if the value on its left-hand side is equal to any of the values in the list. It does not support pattern matching.
This behavior is standard ANSI SQL, and is also described in Postgres documentation:
expression IN (value [, ...])
The right-hand side is a parenthesized list of scalar expressions. The result is “true” if the left-hand expression's result is equal to any of the right-hand expressions. This is a shorthand notation for:
expression = value1 OR expression = value2 OR ...
So if you want to match against several possible patterns, you need OR:
where message like '%TEXT1%' or message like '%TEXT2%'
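The difference between IN and LIKE can be checked with a small, self-contained sketch. It uses Python's built-in sqlite3 module rather than the asker's database, and the table name messages and its sample rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, message TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        (1, "has TEXT1 inside"),
        (2, "has TEXT2 inside"),
        (3, "%TEXT1%"),  # the literal string, percent signs included
        (4, "unrelated"),
    ],
)


def count_ids(where):
    """Count distinct ids for a hard-coded WHERE clause (demo only)."""
    sql = f"SELECT count(DISTINCT id) FROM messages WHERE {where}"
    return conn.execute(sql).fetchone()[0]


# IN compares for equality, so only the literal '%TEXT1%' row qualifies.
in_count = count_ids("message IN ('%TEXT1%', '%TEXT2%')")

# LIKE treats % as a wildcard, so rows 1, 2 and 3 qualify.
like_count = count_ids("message LIKE '%TEXT1%' OR message LIKE '%TEXT2%'")

print(in_count, like_count)
```

In the asker's data no message is literally equal to '%TEXT1%' or '%TEXT2%', which is consistent with the count of 0.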

SQL FTS and comparison statements

Short story. I am working on a project where I need to communicate with SQLite database. And there I have several problems:
There is one FTS table with nodeId and nodeName columns. I need to select all nodeIds for which nodeNames contains some text pattern. For instance all node names with "Donald" inside. Something similar was discussed in this thread. The point is that I can't use CONTAINS keyword. Instead I use MATCH. And here is the question itself: how should this "Donald" string be "framed"? With '*' or with '%' character? Here is my query:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH "Donald"
Is it OK to write multiple comparison in SELECT statement? I mean something like this:
SELECT * FROM distanceTable WHERE pointId = 1 OR pointId = 4 OR pointId = 203 AND distance<200
I hope that it does not sound very confusing. Thank you in advance!
Edit: Sorry, I missed the fact that you are using FTS4. It looks like you can just do this:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH 'Donald'
Here is relevant documentation.
No wildcard characters are needed in order to match all entries in which Donald is a discrete word (e.g. the above will match Donald Duck). If you want to match Donald as part of a word (e.g. Donalds) then you need to use * in the appropriate place:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH 'Donald*'
If your query wasn't working, it was probably because you used double quotes.
From the SQLite documentation:
The MATCH operator is a special syntax for the match()
application-defined function. The default match() function
implementation raises an exception and is not really useful for
anything. But extensions can override the match() function with more
helpful logic.
FTS4 is an extension that provides a match() function.
Yes, it is OK to use multiple conditions as in your second query. When you have a complex set of conditions, it is important to understand the order in which the conditions will be evaluated. AND is always evaluated before OR (they are analogous to mathematical multiplication and addition, respectively). In practice, I think it is always best to use parentheses for clarity when using a combination of AND and OR:
--This is the same as with no parentheses, but is clearer:
SELECT * FROM distanceTable WHERE
pointId = 1 OR
pointId = 4 OR
(pointId = 203 AND distance<200)
--This is something completely different:
SELECT * FROM distanceTable WHERE
(pointId = 1 OR pointId = 4 OR pointId = 203) AND
distance<200
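The precedence rule can be verified with a short sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration; the table and column names follow the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distanceTable (pointId INTEGER, distance INTEGER)")
conn.executemany(
    "INSERT INTO distanceTable VALUES (?, ?)",
    [(1, 500), (4, 100), (203, 150), (203, 900)],
)


def rows(where):
    """Run a hard-coded WHERE clause and return the rows in a stable order."""
    sql = (
        "SELECT pointId, distance FROM distanceTable "
        f"WHERE {where} ORDER BY pointId, distance"
    )
    return conn.execute(sql).fetchall()


# No parentheses: AND binds tighter than OR.
unparenthesized = rows("pointId = 1 OR pointId = 4 OR pointId = 203 AND distance<200")

# Same meaning, written explicitly.
and_first = rows("pointId = 1 OR pointId = 4 OR (pointId = 203 AND distance<200)")

# Different meaning: the distance filter now applies to every pointId.
or_first = rows("(pointId = 1 OR pointId = 4 OR pointId = 203) AND distance<200")

print(unparenthesized)
print(and_first)
print(or_first)
```

The first two queries return the same rows, including point 1 at distance 500, while the third drops that row because the distance condition now applies to all three points.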

Return rows where first character is non-alpha

I'm trying to retrieve all rows that start with any non-alpha character in SQLite but can't seem to get it working. I've currently got this code, but it returns every row:
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[A-z]%'
Is there a way to retrieve all rows where the first character of TestNames is not part of the alphabet?
Are you going for the first character only?
select * from TestTable WHERE substr(TestNames,1) NOT LIKE '%[^a-zA-Z]%'
The substr function (can also be called as left() in some SQL languages) will help isolate the first char in the string for you.
edit:
Maybe substr(TestNames,1,1) in SQLite; I don't have a ready instance to test the syntax on.
Added:
select * from TestTable WHERE Upper(substr(TestNames,1,1)) NOT in ('A','B','C','D','E',....)
Doesn't seem optimal, but functionally it will work. I'm unsure what character functions there are to express a range of letters in SQLite.
I used upper() so that you don't need to list the lower-case letters in the NOT IN statement; I hope SQLite knows what that is.
try
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[^a-zA-Z]%'
SELECT * FROM NC_CRIT_ATTACH WHERE substring(FILENAME,1,1) NOT LIKE '[A-z]%';
SHOULD be a little faster as it is
A) First getting all of the data from the first column only, then scanning it.
B) Still a full-table scan unless you index this column.
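One reason the original query returns every row is that SQLite's LIKE only treats % and _ as wildcards, so '[A-z]' is matched as literal text. A possible alternative, not taken from the answers above, is SQLite's GLOB operator, which does understand bracket character classes. The sketch below uses Python's built-in sqlite3 module, and the sample rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (TestNames TEXT)")
conn.executemany(
    "INSERT INTO TestTable VALUES (?)",
    [("Alice",), ("bob",), ("9lives",), ("_tmp",), ("[A-z]x",)],
)


def names(where):
    """Return TestNames for a hard-coded WHERE clause, in insertion order."""
    sql = f"SELECT TestNames FROM TestTable WHERE {where} ORDER BY rowid"
    return [row[0] for row in conn.execute(sql)]


# LIKE has no character classes: only the row that literally starts
# with the text "[A-z]" is excluded.
like_rows = names("TestNames NOT LIKE '[A-z]%'")

# GLOB uses * instead of % and supports [A-Za-z] style classes.
glob_rows = names("TestNames NOT GLOB '[A-Za-z]*'")

print(like_rows)
print(glob_rows)
```

Note that GLOB is case-sensitive, which is why both A-Z and a-z are listed, and that this only covers ASCII letters.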

Regular expressions inside SQL Server

I have stored values in my database that look like 5XXXXXX, where X can be any digit. In other words, I need to match incoming SQL query strings like 5349878.
Does anyone have an idea how to do it?
I have different cases like XXXX7XX for example, so it has to be generic. I don't care about representing the pattern in a different way inside the SQL Server.
I'm working with c# in .NET.
You can write queries like this in SQL Server:
--each [0-9] matches a single digit, this would match 5xx
SELECT * FROM YourTable WHERE SomeField LIKE '5[0-9][0-9]'
stored value in DB is: 5XXXXXX [where x can be any digit]
You don't mention data types - if numeric, you'll likely have to use CAST/CONVERT to change the data type to [n]varchar.
Use:
WHERE CHARINDEX('5', column) = 1
AND CHARINDEX('.', column) = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
CHARINDEX
ISNUMERIC
i have also different cases like XXXX7XX for example, so it has to be generic.
Use:
WHERE PATINDEX('%7%', column) = 5
AND CHARINDEX('.', column) = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
PATINDEX
Regex Support
SQL Server 2005+ can support regex through CLR integration, but the catch is that you have to create the CLR user-defined function yourself before you have the ability. There are numerous articles providing example code if you google them. Once you have that in place, you can use:
5\d{6} for your first example
\d{4}7\d{2} for your second example
For more info on regular expressions, I highly recommend this website.
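The two patterns can be tried out before wiring them into a CLR function. The sketch below uses Python's re module purely to illustrate what the patterns accept; it is not SQL Server or .NET code, and the helper name matches is made up for illustration. fullmatch is used so that the whole value must fit the pattern, and re.ASCII restricts \d to the digits 0-9:

```python
import re

# Seven digits, the first of which is 5 (the 5XXXXXX case).
STARTS_WITH_5 = re.compile(r"5\d{6}", re.ASCII)

# Seven digits with a 7 in the fifth position (the XXXX7XX case).
SEVEN_IN_FIFTH = re.compile(r"\d{4}7\d{2}", re.ASCII)


def matches(pattern, value):
    """Return True only if the entire value fits the pattern."""
    return pattern.fullmatch(value) is not None


print(matches(STARTS_WITH_5, "5349878"))   # seven digits starting with 5
print(matches(STARTS_WITH_5, "534987"))    # one digit too short
print(matches(SEVEN_IN_FIFTH, "1234756"))  # 7 in the fifth position
print(matches(SEVEN_IN_FIFTH, "1234567"))  # fifth position is 5
```

If the patterns are used with a search-style function instead of a full match, they would need ^ and $ anchors to avoid matching longer strings that merely contain seven suitable digits.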
Try this
select * from mytable
where p1 not like '%[^0-9]%' and substring(p1,1,1)='5'
Of course, you'll need to adjust the substring value, but the rest should work...
In order to match a digit, you can use [0-9].
So you could use 5[0-9][0-9][0-9][0-9][0-9][0-9] and [0-9][0-9][0-9][0-9]7[0-9][0-9]. I do this a lot for zip codes.
SQL Wildcards are enough for this purpose. Follow this link: http://www.w3schools.com/SQL/sql_wildcards.asp
you need to use a query like this:
select * from mytable where msisdn like '%7%'
or
select * from mytable where msisdn like '56655%'