How to check whether a word is found in table rows?

I have this PostgreSQL table for storing words:
CREATE TABLE IF NOT EXISTS words
(
id bigint NOT NULL DEFAULT nextval('processed_words_id_seq'::regclass),
keyword character varying(300) COLLATE pg_catalog."default"
);
insert into words (keyword)
VALUES ('while swam is interesting');
CREATE TABLE IF NOT EXISTS trademarks
(
id bigint NOT NULL DEFAULT nextval('trademarks_id_seq'::regclass),
trademark character varying(300) COLLATE pg_catalog."default"
);
The trademarks table will hold thousands of registered trademark names.
I want to check whether the keywords stored in words.keyword match a trademark, not only when the keyword is a single word but also when the trademark is one word within a group of words. For example:
I have the keyword 'while swam is interesting' stored in words.keyword and the trademark 'swam' stored in trademarks.trademark. That is a word match, and I want to detect it using SQL.
I tried this:
select
w.id, w.keyword, t.trademark
from
words w
inner join
trademarks t
on
t.trademark ilike '%'||w.keyword||'%'
where
w.keyword = 'all';
But the result is wrong: I need to match complete words, not parts of words. How can I fix this?

Split the keyword field on spaces to create an array, then use any twice: once in the where clause (any word list containing 'all') and once in the join (any trademark that appears in a word list that also contains 'all'):
select
w.id,
w.keyword,
t.trademark
from words w
join trademarks t on t.trademark =
any(string_to_array(w.keyword, ' '))
where 'all' = any(string_to_array(w.keyword, ' '))
This matches the whole word, so no need to worry about word boundaries etc.
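The whole-word semantics of the join above can be sketched outside the database. This is a minimal Python illustration, not part of the original answer; the function name matching_trademarks and the sample trademarks are made up for the example.

```python
def matching_trademarks(keyword, trademarks):
    """Return the trademarks that equal one of the space-separated words in keyword.

    Mirrors `t.trademark = any(string_to_array(w.keyword, ' '))`:
    the comparison is exact, so partial words never match.
    """
    terms = set(keyword.split(" "))
    return [t for t in trademarks if t in terms]


# Hypothetical sample data, for illustration only.
sample_trademarks = ["swam", "swa", "interest", "while"]
print(matching_trademarks("while swam is interesting", sample_trademarks))
```

Only 'swam' and 'while' are returned; 'swa' and 'interest' are substrings of words in the keyword but not whole words, so they are skipped.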
You can tidy this up and improve performance with a CTE that unnests the words:
with w as (
select
id,
keyword,
unnest(string_to_array(keyword, ' ')) as term
from words
where 'all' = any(string_to_array(keyword, ' '))
)
select
w.id,
w.keyword,
t.trademark
from w
join trademarks t on t.trademark = w.term
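Add indexes to speed this up:
create index words_keyword_terms_index on words using gin (string_to_array(keyword, ' '));
create index trademarks_trademark_index on trademarks (trademark);
A GIN index indexes the individual elements of an array, so it has to be built on the split array rather than on the raw varchar column, which has no default GIN operator class. The planner can use it only with array operators such as @>, so the filter would need to be written as string_to_array(keyword, ' ') @> array['all'] instead of with = any.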

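One option is to use a regular expression, as richyen mentions.
If you need to use plain SQL, what you have is nearly correct, except that the pattern should surround the word with spaces, as in '% '||t.trademark||' %'. Note that this pattern alone misses a trademark that is the first or last word of the keyword, because there is no space on that side.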
select w.id, w.keyword, t.trademark
from words w
join trademarks t
on w.keyword like '% '||t.trademark||' %'
where t.trademark = 'all'
union all /*condition when there are only single words without a space*/
select w.id, w.keyword, t.trademark
from words w
join trademarks t
on w.keyword =t.trademark
where t.trademark = 'all';
https://dbfiddle.uk/k5bYAPUh
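One way to cover the first-word, last-word and single-word cases with a single pattern is to pad the keyword itself with spaces before applying LIKE. The sketch below is an assumption-laden illustration rather than the answerer's query: it runs the SQL against an in-memory SQLite database standing in for PostgreSQL, and the sample rows are invented. A trademark containing % or _ would still act as a wildcard.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, keyword TEXT);
    CREATE TABLE trademarks (id INTEGER PRIMARY KEY, trademark TEXT);
    INSERT INTO words (keyword) VALUES
        ('while swam is interesting'), ('swam first'), ('ends with swam'),
        ('swam'), ('swampland tours');
    INSERT INTO trademarks (trademark) VALUES ('swam');
""")

# Padding the keyword with spaces lets one pattern match a word in any position.
rows = conn.execute("""
    SELECT w.keyword
    FROM words w
    JOIN trademarks t
      ON ' ' || w.keyword || ' ' LIKE '% ' || t.trademark || ' %'
    ORDER BY w.id
""").fetchall()

matched = [row[0] for row in rows]
print(matched)
```

Every row except 'swampland tours' matches, so the union all branch for single words is no longer needed with this form.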

You can split the keyword into an array and then check whether the trademark is inside the split array.
select
w.id, w.keyword, t.trademark
from
words w
inner join
trademarks t
on
t.trademark = ANY( string_to_array(REPLACE(w.keyword , ' ' , '~^~' ),'~^~' , '') )
where
w.keyword = 'all';
Example where the trademark and the keywords overlap:
SELECT
ARRAY['swam'] && string_to_array(REPLACE('while swam is interesting',' ','~^~'), '~^~' ,'');
Example testing whether all the keywords are inside the trademark array:
SELECT
ARRAY['swam'] @> string_to_array(REPLACE('while swam is interesting',' ','~^~'), '~^~' ,'');

Related

Optimize query performance for millions of rows

I have this PostgreSQL table for storing words:
CREATE TABLE IF NOT EXISTS words
(
id bigint NOT NULL DEFAULT nextval('processed_words_id_seq'::regclass),
keyword character varying(300) COLLATE pg_catalog."default"
);
insert into words (keyword)
VALUES ('while swam is interesting');
CREATE TABLE IF NOT EXISTS trademarks
(
id bigint NOT NULL DEFAULT nextval('trademarks_id_seq'::regclass),
trademark character varying(300) COLLATE pg_catalog."default"
);
The trademarks table will hold thousands of registered trademark names.
I want to check whether the keywords stored in words.keyword match a trademark, not only when the keyword is a single word but also when the trademark is one word within a group of words. For example:
I have the keyword 'while swam is interesting' stored in words.keyword and the trademark 'swam' stored in trademarks.trademark. That is a word match, and I want to detect it using SQL.
I tried this:
select
w.id,
w.keyword,
t.trademark
from words w
join trademarks t on t.trademark =
any(string_to_array(w.keyword, ' '))
where 'all' = any(string_to_array(w.keyword, ' '))
The SQL query works correctly, but it takes too long to execute. The execution time for a table with 30 million records is 10 seconds. Is there some way to speed it up?

You can create a trigram index on words.keyword that supports queries with regular expressions (gin_trgm_ops is provided by the pg_trgm extension, which must be installed first):
CREATE INDEX IF NOT EXISTS words_keyword ON words USING GIN (keyword gin_trgm_ops);
This index should be used by a query like :
select
w.id,
w.keyword,
t.trademark
from words w
inner join trademarks t
on w.keyword ~ t.trademark
where w.keyword ~ 'all'
This should be checked with EXPLAIN ANALYSE for this query.
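Note that a plain ~ match finds the trademark anywhere in the keyword, including inside a longer word. To keep the whole-word requirement, the pattern needs word boundaries; in PostgreSQL regular expressions the boundary escape is \y. The idea is sketched below in Python, where the equivalent escape is \b; the helper name contains_whole_word is hypothetical.

```python
import re


def contains_whole_word(text, word):
    """Return True if word occurs in text as a complete word, ignoring case."""
    # re.escape keeps characters such as '.' or '+' in the word from acting as regex syntax.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(contains_whole_word("while swam is interesting", "swam"))
print(contains_whole_word("swampland tours", "swam"))
```

The first call matches and the second does not, because 'swam' is only a prefix of 'swampland'.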

select rows that contain non alphanumeric

I need to select the rows whose names contain non-alphanumeric characters. I have part of the solution, but I am also getting rows with spaces, hyphens and single quotes. So I need to pull the rows that contain non-alphanumeric characters without pulling rows merely because they contain a space, - or '.
After some research online, I have this:
select *
from EMP
where NAME like '%[^a-z,1-9]%'
I don't want to get rows with - or spaces or '

Here is a way to find the bad characters using a numbers table:
create table #GoodLetters(letter char(1))
insert into #GoodLetters
values
('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')
,('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9')
,('-'),(' '),('''')
--select * from #GoodLetters
select a._key, a.name,substring(a.name, v.number+1, 1)
from (select 'Your Key' _key,name from emp ) a
join master..spt_values v on v.number < len(a.name)
left join #GoodLetters GL on substring(a.name, v.number+1, 1)=GL.letter
where v.type = 'P'
and GL.letter is null
drop table #GoodLetters

I think it should go like this:
select *
from EMP
where NAME NOT LIKE '%-%' AND NAME NOT LIKE '%[ ]%' AND NAME NOT LIKE '%''%';
'%[ ]%' --> matches a name that contains a space
'' --> a single quote inside a string literal is written by doubling it, so no ESCAPE clause is needed

You can also add the ', - and space characters to the pattern's character class.
select *
from EMP
where NAME like '%[^a-z,1-9''- ]%'

Hyphen is a little tricky. Because it denotes ranges inside character classes, it needs to be the first character in the class to be treated literally. So:
where name like '%[^- a-z,1-9]%'
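The same hyphen-first rule applies to regular-expression character classes, which makes the pattern easy to try outside the database. The following is a rough Python sketch, not SQL Server syntax: the sample names are invented, and the class also adds the apostrophe and the digit 0 so that it covers the asker's full list of allowed characters.

```python
import re

# Hyphen comes first so it is a literal; apostrophe and 0-9 are added (an assumption
# about the asker's intent) so that only truly unexpected characters are flagged.
DISALLOWED = re.compile(r"[^- 'a-z0-9]", re.IGNORECASE)

# Hypothetical sample names, for illustration only.
names = ["Anne-Marie", "O'Brien", "John Smith", "J@ne", "Bob#2"]
flagged = [name for name in names if DISALLOWED.search(name)]
print(flagged)
```

Only 'J@ne' and 'Bob#2' are flagged. In SQL Server, whether a-z also covers uppercase letters depends on the column's collation.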

Search for a word in the column string and list those words

I have two tables: table 1 with columns W_ID and word, and table 2 with columns N_ID and note. I have to list every N_ID whose note contains a word from the word column of table 1 (the easy part), and also list those words in another column without duplicating the N_ID. That means using STUFF to concatenate all the words found in the note for that particular N_ID. I tried using a full-text index with CONTAINS, but it only allows searching for one word at a time. Any suggestions on how I can use a while loop to achieve this?

If there is a maximum number of words you want displayed per N_ID, you can pivot this. You could have them in a single column by concatenating them, but I would recommend against that. Here is a pivot that supports up to 4 words per N_ID. You can adjust it as needed.
SELECT
n_id,
[1] AS word_1,
[2] AS word_2,
[3] AS word_3,
[4] AS word_4
FROM (
SELECT
n_id,
word,
ROW_NUMBER() OVER (PARTITION BY n_id ORDER BY word) AS rn
FROM tbl2
JOIN tbl1 ON
tbl2.note LIKE '%'+tbl1.word+'[ ,.?!]%'
) AS source_table
PIVOT (
MAX(word)
FOR rn IN ([1],[2],[3],[4])
) AS pivot_table
*Updated the join to look for a space or punctuation mark that marks the end of a word.

You can join your tables together based on a positive result from the charindex function.
In SQL Server 2017 you can run:
SELECT n_id, string_agg(word, ', ')
FROM words
inner join notes on 0 < charindex(words.word, notes.note)
GROUP BY n_id;
Prior to SQL Server 2017, there is no string_agg, so you'll need to use stuff, which is trickier:
select
stuff((
SELECT ', ' + word
FROM words
where 0 < charindex(words.word, notes.note)
FOR XML PATH('')
), 1, 2, '')
from notes;
I used the following schema:
CREATE table WORDS
(W_ID int identity primary key
,word varchar(100)
);
CREATE table notes
(N_ID int identity primary key
,note varchar(1000)
);
insert into words (word) values
('No'),('Nope'),('Nah');
insert into notes (note) values
('I am not going to do this. Nah!!!')
,('It is OK.');
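To see what the aggregation should produce for this sample data, here is a small Python sketch of the same logic. It is an illustration only: the function name words_per_note is made up, and the case-insensitive comparison assumes the server uses a case-insensitive collation, which is what makes CHARINDEX ignore case.

```python
def words_per_note(words, notes):
    """Map each note id to the comma-separated words found in that note.

    Uses a case-insensitive substring test (assumed to match CHARINDEX under a
    case-insensitive collation). Notes without any match are omitted, like the
    inner join in the string_agg query.
    """
    result = {}
    for n_id, note in notes.items():
        found = [w for w in words if w.lower() in note.lower()]
        if found:
            result[n_id] = ", ".join(found)
    return result


words = ["No", "Nope", "Nah"]
notes = {1: "I am not going to do this. Nah!!!", 2: "It is OK."}
print(words_per_note(words, notes))
```

Note 1 yields 'No, Nah': 'No' is reported because it is a substring of 'not', which shows that a substring test like charindex does not enforce whole-word matching.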

PostgreSQL: Check if each item in array is contained by a larger string

I have an array of strings in PostgreSQL:
SELECT ARRAY['dog', 'cat', 'mouse'];
And I have a large paragraph:
Dogs and cats have a range of interactions. The natural instincts of each species lead towards antagonistic interactions, though individual animals can have non-aggressive relationships with each other, particularly under conditions where humans have socialized non-aggressive behaviors.
The generally aggressive interactions between the species have been noted in cultural expressions.
For each item in the array, I want to check if it appears in my large paragraph string. I know for any one string, I could do the following:
SELECT paragraph_text ILIKE '%dog%';
But is there a way to simultaneously check every string in the array (for an arbitrary number of array elements) without resorting to plpgsql?

I believe you want something like this (assuming paragraph_text is a column of a table named table1):
SELECT
paragraph_text,
sub.word,
paragraph_text ILIKE '%' || sub.word || '%' as is_word_in_text
FROM
table1 CROSS JOIN (
SELECT unnest(ARRAY['dog', 'cat', 'mouse']) as word
) as sub;
The function unnest(array) creates a set of rows from the array values. You can then do a CROSS JOIN, which combines every row from table1 with every row from that unnested set.
If paragraph_text is a static value (not from a table), you can just do the following, with paragraph_text replaced by the literal string or a parameter:
SELECT
paragraph_text,
sub.word,
paragraph_text ILIKE '%' || sub.word || '%' as is_word_in_text
FROM (
SELECT unnest(ARRAY['dog', 'cat', 'mouse']) as word
) as sub;
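The result of the unnest-plus-ILIKE approach is one boolean per array element. A minimal Python sketch of that outcome follows, using only the first sentence of the question's paragraph; the function name check_terms is hypothetical.

```python
def check_terms(paragraph, terms):
    """Return {term: bool}, True when the term occurs anywhere in the paragraph.

    Case-insensitive substring test, like `paragraph_text ILIKE '%' || word || '%'`.
    """
    lowered = paragraph.lower()
    return {term: term.lower() in lowered for term in terms}


paragraph = "Dogs and cats have a range of interactions."
print(check_terms(paragraph, ["dog", "cat", "mouse"]))
```

Here 'dog' and 'cat' are found (inside 'Dogs' and 'cats') and 'mouse' is not, which is one row per term, just as the cross join returns.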

This solution will work only for PostgreSQL 8.4 and above, as unnest is not available in earlier versions.
drop table if exists t;
create temp table t (col1 text, search_terms text[] );
insert into t values
('postgress is awesome', array['postgres', 'is', 'bad']),
('i like open source', array['open', 'code', 'i']),
('sql is easy', array['mysql']);
drop table if exists t1;
select *, unnest(search_terms) as search_term into temp t1 from t;
-- depending on how you like to do pattern matching.
-- it will look for term not whole words
select *, position(search_term in col1) from t1;
-- This will match only whole words.
select *, string_to_array(col1, E' ')@>string_to_array(search_term, E' ') from t1;
Basically, you need to flatten the array of search_terms into one column and then match the long string against each search term row by row.

Can I use full text search a row in a table where the condition is an array of values from a select query?

I want to create a view and I want to do a full text search of a row using a set of keywords. These keywords exist in a table in the database.
So is it possible to do something like below where I can use a select statement to dynamically determine which keywords to filter on.
SELECT * FROM table1
WHERE CONTAINS(Row1,
'[SELECT k.Name FROM KeywordCategory kc
inner join Keyword k
on kc.KeywordId = k.Id
where kc.Category in ('BrandA', 'BrandB', 'BrandC')]')

The CONTAINS search condition cannot reference other tables, but you can get around this limitation by constructing a variable from the keywords.
-- build search condition, example: '"keyword1" OR "keyword2" OR "keyword3"'
declare @SearchCondition nvarchar(4000)
SELECT @SearchCondition = IsNull(@SearchCondition + ' OR ', '') + '"' + k.Name + '"'
FROM KeywordCategory kc
inner join Keyword k on kc.KeywordId = k.Id
where kc.Category in ('BrandA', 'BrandB', 'BrandC')
SELECT *
FROM table1
WHERE Contains(*, @SearchCondition)
You won't be able to do this in a view though, so you would have to write it as a function or stored procedure.
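If the search condition is assembled in application code instead of T-SQL, the same string can be built with a simple join. This is only a sketch: the function name build_search_condition is made up, and it assumes the keywords contain no double quotes.

```python
def build_search_condition(keywords):
    """Build a condition such as '"keyword1" OR "keyword2"' from a list of keywords.

    Assumes the keywords themselves contain no double quotes.
    """
    return " OR ".join(f'"{keyword}"' for keyword in keywords)


print(build_search_condition(["keyword1", "keyword2", "keyword3"]))
```

The resulting string should then be passed to the query as a parameter rather than concatenated into the SQL text.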

Yes, you can use CONTAINS on a full-text indexed column. If you want to find two words near each other, you can use the NEAR operator, for example CONTAINS(Row1, 'keyword1 NEAR keyword2').
