Find out what pattern was matched when using a LIKE query? - sql

If I perform a normal query with a bunch of LIKE conditions in it, is it possible to return which search term actually caused each row to be returned?
So if I ran:
select cand_id
FROM cand_kw
WHERE client_id='client'
AND ( ( UPPER(kw) LIKE '%ANDREW%' AND UPPER(kw) LIKE '%POSTINGS%' )
OR ( UPPER(kw) LIKE '%BRET%' )
OR ( UPPER(kw) LIKE '%TIM%' ) )
And it returned some rows, is there a way to tag each row with the term that was actually matched? So if '%ANDREW%' was what caused a row to be returned, I could then show that information.
The database engine is Oracle 9i. I realize this is normally the job of something like full-text search, which this database is not set up to handle, so I am just trying to fake it in a way.

It is a bit tricky, because more than one keyword may match. You could use a CASE expression in the SELECT clause, but then you would get only the first matching keyword.
Another approach is to put each keyword on a separate row, join that list to the original table to filter it, and then aggregate the list of matching keywords. Note that LISTAGG requires Oracle 11g Release 2 or later, so on 9i the aggregation step needs a different mechanism.
So:
SELECT c.cand_id, LISTAGG(k.kw, ', ') WITHIN GROUP (ORDER BY k.kw) matches
FROM cand_kw c
INNER JOIN (
SELECT 'ANDREW' kw FROM DUAL
UNION ALL SELECT 'POSTINGS' FROM DUAL
UNION ALL SELECT 'BRET' FROM DUAL
UNION ALL SELECT 'TIM' FROM DUAL
) k ON UPPER(c.kw) LIKE '%' || k.kw || '%'
WHERE c.client_id = 'client'
GROUP BY c.cand_id
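The join-then-aggregate idea above can be tried out without an Oracle instance. The following is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in database; the sample rows and the helper name matched_terms are made up for illustration, and the aggregation is done in Python rather than with LISTAGG so the output order is deterministic.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cand_kw (cand_id INTEGER, client_id TEXT, kw TEXT);
    INSERT INTO cand_kw VALUES
        (1, 'client', 'Andrew postings'),
        (2, 'client', 'bret'),
        (3, 'client', 'nobody'),
        (4, 'other',  'tim');
""")

def matched_terms(conn, client_id, terms):
    """Return {cand_id: sorted list of search terms found in kw}."""
    # One row per search term, built as a UNION ALL derived table.
    term_rows = " UNION ALL ".join("SELECT ? AS kw" for _ in terms)
    sql = f"""
        SELECT c.cand_id, k.kw
        FROM cand_kw c
        JOIN ({term_rows}) k
          ON UPPER(c.kw) LIKE '%' || UPPER(k.kw) || '%'
        WHERE c.client_id = ?
        GROUP BY c.cand_id, k.kw
    """
    result = {}
    for cand_id, term in conn.execute(sql, [*terms, client_id]):
        result.setdefault(cand_id, []).append(term)
    return {cid: sorted(ts) for cid, ts in result.items()}

matches = matched_terms(conn, "client", ["ANDREW", "POSTINGS", "BRET", "TIM"])
print(matches)
```

Note that this reports every term that occurs in a row. It does not reproduce the original AND condition between ANDREW and POSTINGS, which would have to be checked separately on the aggregated list.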

Related

How to efficiently select records matching substring in another table using BigQuery?

I have a table of several million strings that I want to match against a table of about twenty thousand strings like this:
#standardSQL
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Unfortunately this is taking an awfully long time.
Considering that the fragment table is only 20k records, can I load it into a JavaScript array using a UDF and match it that way? I'm trying to figure out how to do this right now, but perhaps there's already some magic I could use to make this faster. I tried a CROSS JOIN and got "resources exceeded" fairly quickly. I've also tried using EXISTS, but I can't reference record.name inside that subquery's WHERE without getting an error.
Example using Public Data
This seems to reflect about the same amount of data ...
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT LOWER(name) AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT record.* FROM `record`
JOIN `fragment` ON record.name
LIKE CONCAT('%', fragment.name, '%')
Below is for BigQuery Standard SQL
#standardSQL
WITH record AS (
SELECT LOWER(text) AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT DISTINCT LOWER(name) AS name
FROM `bigquery-public-data.usa_names.usa_1910_current`
), temp_record AS (
SELECT record, TO_JSON_STRING(record) id, name, item
FROM record, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
), temp_fragment AS (
SELECT name, item FROM fragment, UNNEST(REGEXP_EXTRACT_ALL(name, r'\w+')) item
)
SELECT AS VALUE ANY_VALUE(record) FROM (
SELECT ANY_VALUE(record) record, id, r.name name, f.name fragment_name
FROM temp_record r
JOIN temp_fragment f
USING(item)
GROUP BY id, name, fragment_name
)
WHERE name LIKE CONCAT('%', fragment_name, '%')
GROUP BY id
The query above completed in 375 seconds, while the original query was still running after 2740 seconds, so I did not wait for it to complete.
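The query above first joins records and fragments on shared words, and only then applies the substring test to the surviving pairs. A rough sketch of that two-step idea in plain Python (not BigQuery) is shown below; the function name find_matches and the sample strings are hypothetical.

```python
import re
from collections import defaultdict

def find_matches(records, fragments):
    """Return {record: sorted fragments that occur in it as substrings}."""
    # Index each fragment under every word it contains.
    by_word = defaultdict(set)
    for frag in fragments:
        for word in re.findall(r"\w+", frag.lower()):
            by_word[word].add(frag)

    result = {}
    for rec in records:
        text = rec.lower()
        # Candidate fragments share at least one whole word with the record.
        candidates = set()
        for word in set(re.findall(r"\w+", text)):
            candidates |= by_word.get(word, set())
        # Confirm with the real substring test, as the final LIKE does.
        hits = sorted(f for f in candidates if f.lower() in text)
        if hits:
            result[rec] = hits
    return result

records = ["I asked Mary Ann about it", "maryland is a state", "nothing here"]
fragments = ["mary", "mary ann", "helen"]
found = find_matches(records, fragments)
print(found)
```

One consequence of the word-level prefilter is visible in the sample: "maryland" is not reported for the fragment "mary", although a plain LIKE '%mary%' would match it. The speed-up comes at the cost of only finding fragments that share a whole word with the record.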
Mikhail's answer appears to be faster, but let's have one that doesn't need to SPLIT or separate the text into words.
First, compute a regular expression with all the words to be searched:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
)
SELECT FORMAT('(%s)',STRING_AGG(name,'|'))
FROM fragment
Now you can take the resulting string and use it in a case-insensitive regular expression:
#standardSQL
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
), largestring AS (
SELECT '(?i)(mary|margaret|helen|more_names|more_names|more_names|josniel|khaiden|sergi)'
)
SELECT record.* FROM `record`
WHERE REGEXP_CONTAINS(record.name, (SELECT * FROM largestring))
(~510 seconds)
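The same "one big alternation" idea can be sketched in Python with the standard re module. This is only an illustration: Python's regex engine is not the RE2 engine BigQuery uses, and the helper names build_pattern and first_fragment are hypothetical. The sketch also escapes each fragment, which the STRING_AGG query above does not do, so fragments containing characters such as "." or "(" are matched literally.

```python
import re

def build_pattern(fragments):
    """Compile one case-insensitive alternation over all fragments."""
    # Escape each fragment so regex metacharacters are matched literally,
    # and list longer fragments first so "mary ann" is tried before "mary".
    ordered = sorted(set(fragments), key=lambda f: (-len(f), f))
    body = "|".join(re.escape(f) for f in ordered)
    return re.compile("(" + body + ")", re.IGNORECASE)

def first_fragment(pattern, text):
    """Return the first fragment found in text (lower-cased), or None."""
    m = pattern.search(text)
    return m.group(1).lower() if m else None

pattern = build_pattern(["mary", "helen", "a.b"])
print(first_fragment(pattern, "Hello HELEN"))
print(first_fragment(pattern, "nothing to see"))
```

Capturing the group also tells you which fragment matched, which REGEXP_CONTAINS alone does not report.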
As alluded to in my question, I worked on a version using a JavaScript UDF which solves this, albeit more slowly than the answer I accepted. For completeness, I'm posting it here because perhaps someone (like myself in the future) may find it useful.
CREATE TEMPORARY FUNCTION CONTAINS_ANY(str STRING, fragments ARRAY<STRING>)
RETURNS STRING
LANGUAGE js AS """
for (var i in fragments) {
if (str.indexOf(fragments[i]) >= 0) {
return fragments[i];
}
}
return null;
""";
WITH record AS (
SELECT text AS name
FROM `bigquery-public-data.hacker_news.comments`
WHERE text IS NOT NULL
), fragment AS (
SELECT name AS name, COUNT(*)
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE name IS NOT NULL
GROUP BY name
), fragment_array AS (
SELECT ARRAY_AGG(name) AS names, COUNT(*) AS count
FROM fragment
GROUP BY LENGTH(name)
), records_with_fragments AS (
SELECT record.name,
CONTAINS_ANY(record.name, fragment_array.names)
AS fragment_name
FROM record INNER JOIN fragment_array
ON CONTAINS_ANY(name, fragment_array.names) IS NOT NULL
)
SELECT * EXCEPT(rownum) FROM (
SELECT record.name,
records_with_fragments.fragment_name,
ROW_NUMBER() OVER (PARTITION BY record.name) AS rownum
FROM record
INNER JOIN records_with_fragments
ON records_with_fragments.name = record.name
AND records_with_fragments.fragment_name IS NOT NULL
) WHERE rownum = 1
The idea is that the list of fragments is small enough that it can be processed in an array, similar to Felipe's answer using regular expressions. The first thing I do is create a fragment_array table which is grouped by fragment length, a cheap way of preventing an over-sized array, which I found can cause UDF timeouts.
Next I create a table called records_with_fragments that joins those arrays to the original records, finding only those which contain a matching fragment using the JavaScript UDF CONTAINS_ANY(). This will result in a table containing some duplicates since one record may match multiple fragments.
The final SELECT then pulls in the original record table, joins to records_with_fragments to determine which fragment matched, and also uses the ROW_NUMBER() function to prevent duplicates, e.g. only showing the first row of each record as uniquely identified by its name.
Now, the reason I do the join in the final query is because in my actual data there are more fields I want besides just the string being matched. Earlier on in my actual data I create a table of DISTINCT strings which then later need to be re-joined.
Voila! Not the most elegant but it gets the job done.
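For readers who want to experiment with the logic outside BigQuery, here is a rough Python port of the steps described above: chunk the fragments by length, scan each record against the chunks, and keep a single match per record. The function names other than contains_any are hypothetical, and unlike ROW_NUMBER() without an ORDER BY, this sketch picks the match deterministically (shortest-length chunk first).

```python
from itertools import groupby

def contains_any(text, fragments):
    """Port of the UDF: first fragment found in text, else None."""
    for frag in fragments:
        if frag in text:          # case-sensitive, like indexOf
            return frag
    return None

def chunk_by_length(fragments):
    """Group fragments into lists by length, like GROUP BY LENGTH(name)."""
    ordered = sorted(set(fragments), key=lambda f: (len(f), f))
    return [list(group) for _, group in groupby(ordered, key=len)]

def first_match_per_record(records, fragments):
    """Return {record: one matching fragment}, skipping non-matching records."""
    chunks = chunk_by_length(fragments)
    result = {}
    for rec in records:
        for chunk in chunks:
            hit = contains_any(rec, chunk)
            if hit is not None:
                result[rec] = hit   # keep only one hit, like rownum = 1
                break
    return result

names = ["Tim", "Bret", "Andrew", "Ann"]
print(chunk_by_length(names))
print(first_match_per_record(["Andrew met Tim", "no names"], names))
```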

select TableData where ColumnData start with list of strings

The following query selects rows from a table where the column value starts with a OR b OR c. What I am looking for, though, is a way to select rows where the value starts with any of a list of strings.
SELECT * FROM Table WHERE Name LIKE '[abc]%'
But I want something like
SELECT * FROM Table WHERE Name LIKE '[ab,ac,ad,ae]%'
Can anybody suggest the best way of selecting rows where the column value starts with one of a list of strings? I don't want to use the OR operator; I specifically want to pass a list of strings.
The most general solution you would have to use is this:
SELECT *
FROM Table
WHERE Name LIKE 'ab%' OR Name LIKE 'ac%' OR Name LIKE 'ad%' OR Name LIKE 'ae%';
However, certain databases offer some pattern or regex support which you might be able to use. For example, in SQL Server you could write:
SELECT *
FROM Table
WHERE NAME LIKE 'a[bcde]%';
MySQL has a REGEXP operator which supports regex matching in place of LIKE, and you could write:
SELECT *
FROM Table
WHERE NAME REGEXP '^a[bcde]';
Oracle and Postgres also have regex matching support.
To add to Tim's answer, another approach could be to join your table with a sub-query of those values:
SELECT *
FROM mytable t
JOIN (SELECT 'ab' AS value
UNION ALL
SELECT 'ac'
UNION ALL
SELECT 'ad'
UNION ALL
SELECT 'ae') v ON t.name LIKE v.value || '%'

Ordering in SQL while using logical operators

I'm trying to write an SQL query that uses the "OR" operator. The thing is that I want it to work with, let's say, "priorities". I have an entity Item, which has two fields that I use in search:
description
title
And there is an SQL query:
select * from item
where description like '%a%' or title like '%a%';
What I want is this: if two entities are returned, one matching like '%a%' by description and the other by title, the one that matches by title should come first in the list. In other words, it should have higher priority. Is there a way to express this in SQL?
Dialect: Oracle / H2
In Oracle, you can use a CASE expression to order by a value that ranks rows according to which condition they match:
/* test case */
with item(title, description) as (
select '__x__', '__x__' from dual union all
select '__x__', '__a__' from dual union all
select '__a__', '__x__' from dual union all
select '__a__', '__a__' from dual
)
/* the query */
select *
from item
where description like '%a%' or title like '%a%'
order by case
when title like '%a%'
then 1
else 2
end
This gives:
TITLE DESCR
----- -----
__a__ __x__
__a__ __a__
__x__ __a__
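The ORDER BY CASE technique is not Oracle-specific. As a quick way to check it, here is a minimal sketch using Python's standard sqlite3 module with the same four test rows; the matched_by column and the extra tiebreaker columns in the ORDER BY are additions for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (title TEXT, description TEXT);
    INSERT INTO item VALUES
        ('__x__', '__x__'),
        ('__x__', '__a__'),
        ('__a__', '__x__'),
        ('__a__', '__a__');
""")

rows = conn.execute("""
    SELECT title, description,
           CASE WHEN title LIKE '%a%' THEN 'title'
                ELSE 'description' END AS matched_by
    FROM item
    WHERE description LIKE '%a%' OR title LIKE '%a%'
    ORDER BY CASE WHEN title LIKE '%a%' THEN 1 ELSE 2 END,
             title, description   -- tiebreaker for a stable order
""").fetchall()

for row in rows:
    print(row)
```

Title matches sort first; the row that matches only by description comes last, and the row matching neither field is filtered out by the WHERE clause.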

Compare Items in the "IN" Clause and the resultset

I'd like to achieve something like the following. I have this simple query:
SELECT ENT_ID,TP_ID FROM TC_LOGS WHERE ENT_ID IN (1,2,3,4,5)
Now the table TC_LOGS may not have all the items in the IN clause. So, assuming that TC_LOGS has only 1 and 2, I'd like to compare the items in the IN clause (1,2,3,4,5) with the result set (1,2) and get a result such as FOUND - 1,2 and NOT FOUND - 3,4,5. I have implemented this by applying an XSL transformation to the result set in the application code, but I'd like to achieve it in a query, which I feel is a more elegant solution. I also tried the following query with NVL, just to separate the FOUND and NOT FOUND items:
SELECT NVL(ENT_ID,"NOT FOUND") FROM TC_LOGS WHERE ENT_ID IN(1,2,3,4,5)
I was expecting a result of 1,2,NOT FOUND,NOT FOUND,NOT FOUND
But the above query doesn't return any result. I'd appreciate it if someone could point me in the right direction. Thanks in advance.
Assuming that the items in your IN list come (or can come) from another query, you can do something like
WITH src AS (
SELECT level id
FROM dual
CONNECT BY level <= 5)
SELECT nvl(to_char(ent_id), 'Not Found' )
FROM src
LEFT OUTER JOIN tc_logs ON (src.id = tc_logs.ent_id)
In my case, the src query is just generating the numbers 1 through 5. You could just as easily fetch that data from a different table, load the numbers into a collection that you query using the TABLE operator, load the numbers into a temporary table that you query, etc. depending on how the IN list data is determined.
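To see the left-join idea in action without Oracle, here is a small sketch using Python's standard sqlite3 module. The wanted ids are supplied as a VALUES list instead of CONNECT BY, the sample rows are invented, and the helper name found_report is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tc_logs (ent_id INTEGER, tp_id INTEGER);
    INSERT INTO tc_logs VALUES (1, 10), (2, 20);
""")

def found_report(conn, wanted_ids):
    """Map each wanted id to 'FOUND' or 'NOT FOUND' (wanted_ids non-empty)."""
    values = ", ".join("(?)" for _ in wanted_ids)
    sql = f"""
        WITH src(id) AS (VALUES {values})
        SELECT src.id,
               CASE WHEN COUNT(l.ent_id) > 0 THEN 'FOUND'
                    ELSE 'NOT FOUND' END
        FROM src
        LEFT JOIN tc_logs l ON l.ent_id = src.id
        GROUP BY src.id
        ORDER BY src.id
    """
    return dict(conn.execute(sql, list(wanted_ids)).fetchall())

report = found_report(conn, [1, 2, 3, 4, 5])
print(report)
```

The key point is that the list of wanted ids drives the query: every id appears once in the output, whether or not tc_logs has a matching row.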
NVL isn't going to work because no rows (not even NULLs) are returned for values in the IN list that have no match.
What you can do is something like this:
SELECT NVL(TO_CHAR(ENT_ID), 'NOT FOUND')
FROM TC_LOGS
RIGHT OUTER JOIN (
SELECT 1 AS TempID FROM DUAL UNION
SELECT 2 FROM DUAL UNION
SELECT 3 FROM DUAL UNION
SELECT 4 FROM DUAL UNION
SELECT 5 FROM DUAL) Sub ON ENT_ID = TempID
The outer join will return NULLS for ENT_ID where there are no matches. Note, I'm not an Oracle person so I can't guarantee that this syntax is perfect.
If you have a table (let's call it src) that contains all the values (1,2,3,4,5), you can use a full join.
You can also use (WITH src AS ( SELECT level id FROM dual CONNECT BY level <= 5) as the src table.
SELECT
ent_id,tl.tp_id,src.tp_id
FROM
src
FULL JOIN
tc_logs tl
USING (ent_id)
ORDER BY
ent_id
Here is a reference for the Oracle full join: http://psoug.org/snippet/Oracle-PL-SQL-ANSI-Joins-FULL-JOIN_738.htm

Using LIKE in an Oracle IN clause

I know I can write a query that will return all rows that contain any number of values in a given column, like so:
Select * from tbl where my_col in (val1, val2, val3,... valn)
but if val1, for example, can appear anywhere in my_col, which has datatype varchar(300), I might instead write:
select * from tbl where my_col LIKE '%val1%'
Is there a way of combining these two techniques? I need to search for some 30 possible values that may appear anywhere in the free-form text of the column.
Combining these two statements in the following ways does not seem to work:
select * from tbl where my_col LIKE ('%val1%', '%val2%', 'val3%',....)
select * from tbl where my_col in ('%val1%', '%val2%', 'val3%',....)
What would be useful here would be a LIKE ANY predicate as is available in PostgreSQL
SELECT *
FROM tbl
WHERE my_col LIKE ANY (ARRAY['%val1%', '%val2%', '%val3%', ...])
Unfortunately, that syntax is not available in Oracle. You can expand the quantified comparison predicate using OR, however:
SELECT *
FROM tbl
WHERE my_col LIKE '%val1%' OR my_col LIKE '%val2%' OR my_col LIKE '%val3%', ...
Or alternatively, create a semi join using an EXISTS predicate and an auxiliary array data structure (see this question for details):
SELECT *
FROM tbl t
WHERE EXISTS (
SELECT 1
-- Alternatively, store those values in a temp table:
FROM TABLE (sys.ora_mining_varchar2_nt('%val1%', '%val2%', '%val3%'/*, ...*/))
WHERE t.my_col LIKE column_value
)
For true full-text search, you might want to look at Oracle Text: http://www.oracle.com/technetwork/database/enterprise-edition/index-098492.html
A REGEXP_LIKE will do a case-insensitive regexp search.
select * from Users where Regexp_Like (User_Name, 'karl|anders|leif','i')
This will be executed as a full table scan, just like the LIKE ... OR solution, so performance will be really bad unless the table is small. If the query is not run often, that might be OK.
If you need some kind of performance, you will need Oracle Text (or some external indexer).
To get substring indexing with Oracle Text you will need a CONTEXT index. It's a bit involved as it's made for indexing large documents and text using a lot of smarts. If you have particular needs, such as substring searches in numbers and all words (including "the" "an" "a", spaces, etc) , you need to create custom lexers to remove some of the smart stuff...
If you insert a lot of data, Oracle Text will not make things faster, especially if you need the index to be updated within the transactions and not periodically.
No, you cannot do this. The values in the IN clause must be exact matches. You could modify the select thusly:
SELECT *
FROM tbl
WHERE my_col LIKE '%val1%'
OR my_col LIKE '%val2%'
OR my_col LIKE '%val3%'
...
If the val1, val2, val3... are similar enough, you might be able to use regular expressions in the REGEXP_LIKE operator.
Yes, you can use this query (Instead of 'Specialist' and 'Developer', type any strings you want separated by comma and change employees table with your table)
SELECT * FROM employees em
WHERE EXISTS (select 1 from table(sys.dbms_debug_vc2coll('Specialist', 'Developer')) mt
where em.job like ('%' || mt.column_value || '%'));
Why my query is better than the accepted answer: You don't need a CREATE TABLE permission to run it. This can be executed with just SELECT permissions.
In Oracle you can use regexp_like as follows:
select *
from table_name
where regexp_like (name, '^(value-1|value-2|value-3....)');
The caret (^) anchors the match at the beginning of the string, and
the pipe (|) indicates an OR between alternatives.
This one is pretty fast :
select * from listofvalue l
inner join tbl on tbl.mycol like '%' || l.value || '%'
Just to add to @Lukas Eder's answer.
An improvement that avoids creating tables and inserting values
(we can select from dual and unpivot to achieve the same result "on the fly"):
with all_likes as
(select * from
(select '%val1%' like_1, '%val2%' like_2, '%val3%' like_3, '%val4%' as like_4, '%val5%' as like_5 from dual)
unpivot (
united_columns for subquery_column in ("LIKE_1", "LIKE_2", "LIKE_3", "LIKE_4", "LIKE_5"))
)
select * from tbl
where exists (select 1 from all_likes where tbl.my_col like all_likes.united_columns)
I prefer this
WHERE CASE WHEN my_col LIKE '%val1%' THEN 1
WHEN my_col LIKE '%val2%' THEN 1
WHEN my_col LIKE '%val3%' THEN 1
ELSE 0
END = 1
I'm not saying it's optimal, but it works and it's easily understood. Most of my queries are ad hoc and run once, so performance is generally not an issue for me.
select * from tbl
where exists (select 1 from all_likes where all_likes.value = substr(tbl.my_col,0, length(tbl.my_col)))
You can put your values in ODCIVARCHAR2LIST and then join it as a regular table.
select tabl1.* FROM tabl1 LEFT JOIN
(select column_value txt from table(sys.ODCIVARCHAR2LIST
('%val1%','%val2%','%val3%')
)) Vals ON tabl1.column LIKE Vals.txt WHERE Vals.txt IS NOT NULL
You don't need a collection type as mentioned in https://stackoverflow.com/a/6074261/802058. Just use a subquery:
SELECT *
FROM tbl t
WHERE EXISTS (
SELECT 1
FROM (
SELECT 'val1%' AS val FROM dual
UNION ALL
SELECT 'val2%' AS val FROM dual
-- ...
-- or simply use a subquery here
)
WHERE t.my_col LIKE val
)
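The EXISTS form has one practical advantage over a plain join against the pattern list: a row that matches several patterns is still returned only once. The sketch below illustrates this with Python's standard sqlite3 module standing in for Oracle; the sample rows and the helper name rows_like_any are hypothetical, and the pattern list is passed as a VALUES table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (my_col TEXT);
    INSERT INTO tbl VALUES
        ('has val1 inside'), ('val2 and val1'), ('no match');
""")

def rows_like_any(conn, patterns):
    """Return my_col values matching at least one LIKE pattern (non-empty list)."""
    values = ", ".join("(?)" for _ in patterns)
    sql = f"""
        WITH pats(val) AS (VALUES {values})
        SELECT t.my_col
        FROM tbl t
        WHERE EXISTS (SELECT 1 FROM pats WHERE t.my_col LIKE pats.val)
        ORDER BY t.my_col
    """
    return [row[0] for row in conn.execute(sql, list(patterns))]

hits = rows_like_any(conn, ["%val1%", "%val2%"])
print(hits)
```

Here 'val2 and val1' matches both patterns but appears once in the result, whereas an inner join on the same LIKE condition would return it twice.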