Using the LIKE operator in Spark SQL (Databricks)

I'm using Spark SQL, and I created some views to join some data. I have to join these views on a string column, which is why I had to use the LIKE operator:
select table1.perfume, table2.perfume
from global_temp.gv_table1 table1
join global_temp.gv_table2 table2
on(lower(table1.perfume) like CONCAT('%', lower(table2.perfume), '%') )
The problem is that this query does not return all the expected results. For example, there's a perfume in table1 called "FlowerBomb" and a perfume in table2 called "Flowerbomb Eau du parfum", but after the join this perfume was not displayed.
Is there a problem with the LIKE operator?

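You've got the operands of your LIKE expression in the wrong order. LIKE checks whether the string on the left matches the pattern on the right, so the longer string belongs on the left and the '%'-wrapped shorter one on the right.
Since table2.perfume contains table1.perfume, the expression should be: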
on(lower(table2.perfume) like CONCAT('%', lower(table1.perfume), '%') )
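To see why the operand order matters, both join conditions can be run against the two sample names from the question. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Spark SQL; the in-memory tables t1 and t2 are hypothetical substitutes for the global temp views, and || is used instead of CONCAT because older SQLite versions lack CONCAT.

```python
import sqlite3


def join_rows(on_clause):
    """Run the perfume join with the given ON clause and return the matched pairs."""
    con = sqlite3.connect(":memory:")
    # Hypothetical stand-ins for global_temp.gv_table1 and global_temp.gv_table2.
    con.execute("CREATE TABLE t1 (perfume TEXT)")
    con.execute("CREATE TABLE t2 (perfume TEXT)")
    con.execute("INSERT INTO t1 VALUES ('FlowerBomb')")
    con.execute("INSERT INTO t2 VALUES ('Flowerbomb Eau du parfum')")
    sql = "SELECT t1.perfume, t2.perfume FROM t1 JOIN t2 ON " + on_clause
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


# Original order: does the short name match '%<long name>%'? Never, here.
wrong = join_rows("lower(t1.perfume) LIKE '%' || lower(t2.perfume) || '%'")
# Corrected order: does the long name match '%<short name>%'? Yes.
right = join_rows("lower(t2.perfume) LIKE '%' || lower(t1.perfume) || '%'")

print(wrong)  # []
print(right)  # [('FlowerBomb', 'Flowerbomb Eau du parfum')]
```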

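You may also want to convert this to the Spark DataFrame API. It's fairly simple:
from pyspark.sql import functions as F
# Assumes an active SparkSession named spark (as in a Databricks notebook).
table1 = spark.table("global_temp.gv_table1")
table2 = spark.table("global_temp.gv_table2")
result = table1.alias("t1").join(
    table2.alias("t2"),
    F.expr("lower(t2.perfume) like concat('%', lower(t1.perfume), '%')"))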

Related

SQL issue with Like operator not working as expected

I have an SQL issue using Teradata SQL Assistant with the LIKE operator, as shown in the example below:
Table A:
id
23_0
111_10
201_540
I should select only the ids that end with '_0'.
I tried the query below, but it returns all three ids:
select * from A
where id like '%_0'
but I expect only:
id
23_0
Any ideas?
The problem is that _ is a special character in LIKE: it matches any single character, so '%_0' matches every id that ends in 0. One fix is to escape it:
where id like '%$_0' escape '$'
You can also use right():
where right(id, 2) = '_0'
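The difference between the unescaped and escaped patterns can be checked directly. This is a minimal sketch using Python's sqlite3 module rather than Teradata; SQLite supports the same LIKE ... ESCAPE syntax, but it has no RIGHT() function, so substr() with a negative start stands in for it here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (id TEXT)")
con.executemany("INSERT INTO A VALUES (?)", [("23_0",), ("111_10",), ("201_540",)])


def ids(where):
    """Return the ids that satisfy a WHERE clause, in insertion order."""
    sql = "SELECT id FROM A WHERE " + where + " ORDER BY rowid"
    return [row[0] for row in con.execute(sql)]


# '_' is a single-character wildcard, so '%_0' means "any char followed by 0 at the end".
unescaped = ids("id LIKE '%_0'")
# Escaping '_' with '$' turns it into a literal underscore.
escaped = ids("id LIKE '%$_0' ESCAPE '$'")
# SQLite's equivalent of right(id, 2): a negative start counts from the end.
suffix = ids("substr(id, -2) = '_0'")

print(unescaped)  # ['23_0', '111_10', '201_540']
print(escaped)    # ['23_0']
print(suffix)     # ['23_0']
```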

How can this query be optimized?

I have a simple SELECT query on a table that uses the LIKE operator with several different values. The query is as follows:
SELECT *
FROM BatchServices.dbo.TaskLog
WHERE EntryTime BETWEEN '20190407' AND '20190408' AND
TaskGroup LIKE '%CSR%' AND
(LogText LIKE '%error%' OR LogText LIKE '%fail%')
The query above works and returns the expected results, but I don't want multiple LIKEs in one query, so I tried something like this:
SELECT *
from BatchServices.dbo.TaskLog
WHERE taskgroup = 'csr' AND
LogText IN ( '%error%','%fail%') AND
EntryTime>'2019-04-07'
ORDER BY EntryTime ASC
This query is not giving me any results.
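I'm looking for a smarter-looking query that still returns the results. Any help?
Use the LIKE operator with an OR condition. IN only tests for exact equality, so the '%' characters inside an IN list are compared literally rather than treated as wildcards: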
SELECT * from BatchServices.dbo.TaskLog WHERE taskgroup ='csr' AND
(LogText like '%error%' or LogText like '%fail%')
AND EntryTime>'2019-04-07'
ORDER BY EntryTime ASC
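The reason the IN version returns nothing can be shown on a few sample rows. This is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server; the three log rows are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TaskLog (TaskGroup TEXT, LogText TEXT, EntryTime TEXT)")
# Hypothetical sample rows.
con.executemany(
    "INSERT INTO TaskLog VALUES (?, ?, ?)",
    [
        ("csr", "job failed on step 2", "2019-04-07 10:00"),
        ("csr", "unexpected error", "2019-04-07 11:00"),
        ("csr", "completed", "2019-04-07 12:00"),
    ],
)

# IN compares for equality: '%error%' is matched as a literal string, so nothing matches.
in_rows = con.execute(
    "SELECT LogText FROM TaskLog WHERE LogText IN ('%error%', '%fail%')"
).fetchall()

# LIKE treats % as a wildcard; OR combines the two patterns.
like_rows = con.execute(
    "SELECT LogText FROM TaskLog WHERE taskgroup = 'csr' "
    "AND (LogText LIKE '%error%' OR LogText LIKE '%fail%') "
    "AND EntryTime > '2019-04-07' ORDER BY EntryTime ASC"
).fetchall()

print(in_rows)    # []
print(like_rows)  # [('job failed on step 2',), ('unexpected error',)]
```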
The LIKE operators are not the problem; the leading wildcards are, because a pattern that starts with '%' cannot use an index seek on that column. Unless you can get rid of them, your optimization options are limited to making sure you have a covering index on EntryTime, and replacing the "*" with the specific columns you need.
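The effect of a leading wildcard on index use can be observed in a query plan. This is a minimal sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, not SQL Server's planner; the index names are hypothetical, and the exact plan wording varies between SQLite versions, so only the key parts are checked.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TaskLog (TaskGroup TEXT, LogText TEXT, EntryTime TEXT)")
# Hypothetical indexes on both filtered columns.
con.execute("CREATE INDEX ix_entrytime ON TaskLog (EntryTime)")
con.execute("CREATE INDEX ix_logtext ON TaskLog (LogText)")


def plan(sql):
    """Join the detail column (always the last one) of SQLite's query plan rows."""
    return " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))


# Leading wildcard: ix_logtext cannot narrow the search, so the table is scanned.
like_only = plan("SELECT * FROM TaskLog WHERE LogText LIKE '%error%'")
# A range on EntryTime can be resolved through ix_entrytime; LIKE is then a filter.
with_range = plan(
    "SELECT * FROM TaskLog WHERE EntryTime > '2019-04-07' "
    "AND LogText LIKE '%error%'"
)

print(like_only)   # e.g. SCAN TaskLog
print(with_range)  # e.g. SEARCH TaskLog USING INDEX ix_entrytime (EntryTime>?)
```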

How to convert this SAS code to SQL Server code?

SAS CODE:
data table1;
set table2;
_sep1 = findc(policynum,'/&,');
_count1 = countc(policynum,'/&,');
_sep2 = findc(policynum,'-');
_count2 = countc(policynum,'-');
_sep3 = findc(policynum,'_*');
_count3 = countc(policynum,'_*');
How can I convert this into a SELECT statement like the one below:
select
*,
/*Code converted to SQL from above*/
from table2
For example, I tried the code below:
select
*,
charindex('/&,',policynum) as _sep1,
LEN(policynum) - LEN(REPLACE(policynum,'/&,','')) as _count1
from table2
But I got this error: ERROR 42S02: Function 'CHARINDEX(UNKNOWN, VARCHAR)' does not exist. Unable to identify a function that satisfies that given argument types. You may need to add explicit typecasts.
Note that the variable pol_no is of type 'character varying(50) not null'.
I am running this using Aginity Workbench for Netezza. I believe this is IBM.
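Assuming SQL Server, based on CHARINDEX(), this may work:
You need to apply CHARINDEX once for each character and take the smallest nonzero result to find the first occurrence. CHARINDEX returns 0 when a character is absent, and MIN() in SQL Server is an aggregate that takes a single argument, so the positions are collected in a VALUES list.
There may be a better-suited function in SQL Server, but I don't know enough to suggest one.
select
*,
coalesce((select min(p)
  from (values (charindex('/', policynum)),
               (charindex('&', policynum)),
               (charindex(',', policynum))) as v(p)
  where p > 0), 0) as _sep1
from table2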
EDIT: based on the OP's notes:
Netezza is an IBM product, so INSTR may be available there in place of CHARINDEX:
select
*,
min(instr(policynum, '/'), instr(policynum, '&')) as _sep1
from table2
https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.sqls.doc/ids_sqs_2336.htm
The FINDC and COUNTC functions search for characters and count them, respectively.
In SQL, you can use the LIKE operator with the '%' and '_' wildcards to find characters, e.g.:
SELECT * FROM <table_name> WHERE <column_name> LIKE '%-%';
and
SELECT COUNT(*) FROM <table_name> WHERE <column_name> LIKE '%-%';
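Note that LIKE supports only the % and _ wildcards (SQL Server also accepts [...] character ranges); full regular expressions need a separate function or operator, where the database provides one. Also, COUNT(*) counts matching rows, not the number of matching characters within each value, which is what COUNTC returns.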
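Because LIKE only answers "does this row contain the character", a per-row equivalent of FINDC and COUNTC needs string functions instead. This is a minimal sketch of one such technique (not taken from the answers above), run in SQLite through Python's sqlite3 module: replace every character of the set with '/' so a single instr() finds the first of them, and measure how much shorter the string gets when the set's characters are removed. The Python findc/countc helpers are my reading of the SAS semantics (1-based position, 0 when absent), used only to check the SQL; the sample policy numbers are made up. The same idea should carry over to CHARINDEX/LEN in SQL Server, while the matching Netezza functions are an assumption to verify.

```python
import sqlite3


def findc(s, chars):
    """Reference: 1-based position of the first char of s that is in chars, else 0."""
    for i, ch in enumerate(s, start=1):
        if ch in chars:
            return i
    return 0


def countc(s, chars):
    """Reference: number of characters of s that appear in chars."""
    return sum(1 for ch in s if ch in chars)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table2 (policynum TEXT)")
samples = ["AB/12&3", "X-1", "A,B,C", "P&&Q"]  # hypothetical values
con.executemany("INSERT INTO table2 VALUES (?)", [(s,) for s in samples])

# _sep1: map '&' and ',' to '/' (a member of the set), then one instr() finds the first.
# _count1: how many characters disappear when '/', '&' and ',' are removed.
rows = con.execute("""
    SELECT policynum,
           instr(replace(replace(policynum, '&', '/'), ',', '/'), '/') AS _sep1,
           length(policynum)
             - length(replace(replace(replace(policynum, '/', ''), '&', ''), ',', ''))
             AS _count1
    FROM table2
    ORDER BY rowid
""").fetchall()

for policynum, sep1, count1 in rows:
    print(policynum, sep1, count1)
```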

select count(distinct) where col not like ('%d1,%d2,...')

select
count(distinct [PROV_CT])
from
[HRecent]
where
[PROV_CT] not like ('%P125, %P961, %P160, %P960, %P220, %P004')
Can I write a query like this? It returns output that differs from the output of this query:
select
count(distinct [PROV_CT])
from
[HRecent]
where
[PROV_CT] not like '%P125' and
[PROV_CT] not like '%P220' and
[PROV_CT] not like '%P960' and
[PROV_CT] not like '%P004' and
[PROV_CT] not like '%P961' and
[PROV_CT] not like '%P160'
Can anyone help me out please? I want to write an optimised query.
You cannot write the query using a single string literal, as in:
[PROV_CT] not like ('%P125, %P961, %P160, %P960, %P220, %P004')
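This predicate treats the whole literal as one pattern, so it only excludes values that contain 'P125, ' followed (after any characters) by 'P961, ' and so on, ending in 'P004'. It doesn't look for the separate patterns '%P125', '%P961', etc.
If you have a very big list of patterns against which a NOT LIKE comparison is to be performed, it might be simpler to do it like this: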
select
count(distinct [PROV_CT])
from
[HRecent]
cross apply (
select count(*)
from (values ('%P125'), ('%P961'),
('%P160'), ('%P960'),
('%P220'), ('%P004') ) AS t(v)
where [PROV_CT] LIKE t.v) AS x(cnt)
where x.cnt = 0
Using the VALUES table value constructor, you create an inline table containing all the patterns against which the [PROV_CT] column is compared. The CROSS APPLY then counts, with a single LIKE, how many of those patterns each row matches, and WHERE x.cnt = 0 keeps only the rows that match none of them.
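The difference between the single-literal predicate and a pattern table can be checked on a few rows. This is a minimal sketch in SQLite via Python's sqlite3 module; SQLite has no CROSS APPLY, so it swaps in NOT EXISTS against a VALUES-based CTE, which expresses the same "matches none of the patterns" test. The sample PROV_CT values are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE HRecent (PROV_CT TEXT)")
# Hypothetical sample values, including one duplicate.
con.executemany(
    "INSERT INTO HRecent VALUES (?)",
    [("AAP125",), ("BBP961",), ("CCP999",), ("CCP999",), ("DDP004",), ("EEP777",)],
)

# One literal = one pattern containing commas, so it excludes nothing here.
single_literal = con.execute(
    "SELECT count(DISTINCT PROV_CT) FROM HRecent "
    "WHERE PROV_CT NOT LIKE '%P125, %P961, %P160, %P960, %P220, %P004'"
).fetchone()[0]

# Patterns as an inline table; keep the rows that match none of them.
not_exists = con.execute("""
    WITH patterns(v) AS (
        VALUES ('%P125'), ('%P961'), ('%P160'), ('%P960'), ('%P220'), ('%P004')
    )
    SELECT count(DISTINCT PROV_CT)
    FROM HRecent AS h
    WHERE NOT EXISTS (SELECT 1 FROM patterns AS p WHERE h.PROV_CT LIKE p.v)
""").fetchone()[0]

print(single_literal)  # 5
print(not_exists)      # 2
```

Adding a pattern only means adding one more row to the VALUES list, which is the main advantage over a chain of NOT LIKE conditions.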

SQL Query for finding a column name where matching text is in column

This is my first Stack Overflow question, although I've lurked for quite a while. I'm writing a web app in PHP/SQLite, and I'm trying to get a column name back from the following SQL query:
SELECT lexemeid FROM lexeme, sense WHERE lexeme.lexemeid =
sense.senselexemeid AND (lexeme.lexeme LIKE '%apani%' OR lexeme.variant
LIKE '%apani%' OR lexeme.affixedform LIKE '%apani%' OR sense.example
LIKE '%apani%');
I'm offering a full-text lookup across a few different fields. The query above works, but for each result I'd also like to get the name of the column where my wildcard matched. I want something like the query above, with the SELECT looking more like:
SELECT lexemeid, COLUMN NAME FROM...
I'd also welcome any ideas for making my SQL query look or perform better (maybe combining LIKE and IN?). I'm basically trying to join lexeme.lexemeid and sense.senselexemeid and do a wildcard lookup on a text string (in this case, "apani").
Thanks!
Assuming each row matches in only one of the columns, you could use a CASE expression. (If several columns match, CASE returns the label of the first WHEN branch that is true.)
SELECT lexemeid,
CASE WHEN lexeme.lexeme LIKE '%apani%' THEN 'lexeme'
WHEN lexeme.variant LIKE '%apani%' THEN 'variant'
...
WHEN sense.example LIKE '%apani%' THEN 'example'
END AS ColumnName
FROM ...
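Since the app uses SQLite, the CASE approach can be tried directly with Python's sqlite3 module. This is a minimal sketch with hypothetical sample rows; it writes the join as an explicit JOIN and binds the search term once as a named parameter (:t), which is my addition rather than part of the answer, and also keeps user input out of the SQL string.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema and rows mirroring the question's columns.
con.executescript("""
    CREATE TABLE lexeme (lexemeid INTEGER PRIMARY KEY, lexeme TEXT,
                         variant TEXT, affixedform TEXT);
    CREATE TABLE sense (senselexemeid INTEGER, example TEXT);
    INSERT INTO lexeme VALUES (1, 'apani', NULL, NULL),
                              (2, 'kobo', 'kobapani', NULL),
                              (3, 'rusa', NULL, NULL);
    INSERT INTO sense VALUES (1, 'x'), (2, 'y'), (3, 'rusa apani');
""")

rows = con.execute("""
    SELECT l.lexemeid,
           CASE WHEN l.lexeme      LIKE :t THEN 'lexeme'
                WHEN l.variant     LIKE :t THEN 'variant'
                WHEN l.affixedform LIKE :t THEN 'affixedform'
                WHEN s.example     LIKE :t THEN 'example'
           END AS ColumnName
    FROM lexeme AS l
    JOIN sense AS s ON l.lexemeid = s.senselexemeid
    WHERE l.lexeme LIKE :t OR l.variant LIKE :t
       OR l.affixedform LIKE :t OR s.example LIKE :t
    ORDER BY l.lexemeid
""", {"t": "%apani%"}).fetchall()

print(rows)  # [(1, 'lexeme'), (2, 'variant'), (3, 'example')]
```

NULL columns make their LIKE evaluate to NULL, so CASE simply moves on to the next branch.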