Optimize query performance for millions of rows - SQL

I have this PostgreSQL table for storing words:
CREATE TABLE IF NOT EXISTS words
(
    id bigint NOT NULL DEFAULT nextval('processed_words_id_seq'::regclass),
    keyword character varying(300) COLLATE pg_catalog."default"
);
insert into words (keyword)
VALUES ('while swam is interesting');
CREATE TABLE IF NOT EXISTS trademarks
(
    id bigint NOT NULL DEFAULT nextval('trademarks_id_seq'::regclass),
    trademark character varying(300) COLLATE pg_catalog."default"
);
The trademarks table will hold thousands of registered trademark names.
I want to check whether the values in words.keyword match a trademark, not only as a whole string but also when the trademark appears as a single word inside a group of words. For example:
I have the keyword while swam is interesting stored in words.keyword, and the trademark swam stored in trademarks.trademark. That is a word match, and I want to detect it using SQL.
I tried this:
select
w.id,
w.keyword,
t.trademark
from words w
join trademarks t on t.trademark = any(string_to_array(w.keyword, ' '))
where 'all' = any(string_to_array(w.keyword, ' '))
The query returns the correct results, but it is too slow: for a table with 30 million records the execution time is about 10 seconds. Is there a way to speed it up?

You can create an index on words.keyword that supports regular-expression queries:
CREATE INDEX IF NOT EXISTS words_keyword ON words USING GIN (keyword gin_trgm_ops);
This index should be used by a query like:
select
w.id,
w.keyword,
t.trademark
from words w
inner join trademarks t
on w.keyword ~ t.trademark
where w.keyword ~ 'all'
This should be verified with EXPLAIN ANALYZE for this query.
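Note that the gin_trgm_ops operator class comes from the pg_trgm extension, so if that extension is not already installed it has to be enabled before the index can be created:
-- pg_trgm provides the gin_trgm_ops operator class used by the index above
CREATE EXTENSION IF NOT EXISTS pg_trgm;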

Related

How can I check whether a word is found in table rows?

I have this PostgreSQL table for storing words:
CREATE TABLE IF NOT EXISTS words
(
    id bigint NOT NULL DEFAULT nextval('processed_words_id_seq'::regclass),
    keyword character varying(300) COLLATE pg_catalog."default"
);
insert into words (keyword)
VALUES ('while swam is interesting');
CREATE TABLE IF NOT EXISTS trademarks
(
    id bigint NOT NULL DEFAULT nextval('trademarks_id_seq'::regclass),
    trademark character varying(300) COLLATE pg_catalog."default"
);
The trademarks table will hold thousands of registered trademark names.
I want to check whether the values in words.keyword match a trademark, not only as a whole string but also when the trademark appears as a single word inside a group of words. For example:
I have the keyword while swam is interesting stored in words.keyword, and the trademark swam stored in trademarks.trademark. That is a word match, and I want to detect it using SQL.
I tried this:
select w.id, w.keyword, t.trademark
from words w
inner join trademarks t
  on t.trademark ilike '%'||w.keyword||'%'
where w.keyword = 'all';
But the result I get is wrong: it also matches parts of words. I need to match complete words, not parts of words. How can I fix this?
Split the keyword field on spaces to create an array, then use any twice: once in the where clause (i.e. any word list containing 'all') and once for the join (i.e. any trademark that appears in a word list that also contains 'all'):
select
w.id,
w.keyword,
t.trademark
from words w
join trademarks t on t.trademark = any(string_to_array(w.keyword, ' '))
where 'all' = any(string_to_array(w.keyword, ' '))
This matches the whole word, so no need to worry about word boundaries etc.
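As a quick illustration with the sample keyword from the question, only exact word matches return true:
-- whole-word match succeeds
select 'swam' = any(string_to_array('while swam is interesting', ' '));  -- true
-- a partial word does not match
select 'swa' = any(string_to_array('while swam is interesting', ' '));   -- false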
You can tidy this up and improve performance with a CTE that unnests the words:
with w as (
    select
        id,
        keyword,
        unnest(string_to_array(keyword, ' ')) as term
    from words
    where 'all' = any(string_to_array(keyword, ' '))
)
select
    w.id,
    w.keyword,
    t.trademark
from w
join trademarks t on t.trademark = w.term
Add indexes to make this fly:
create index words_keywords_index on words using gin (string_to_array(keyword, ' '));
create index trademarks_trademark_index on trademarks (trademark);
GIN indexes index the individual elements of the array rather than the value as a whole, which is what makes the per-word lookup fast.
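Note that the planner can only use a GIN index over string_to_array(keyword, ' ') when the filter is written with an array operator such as @> (contains) rather than = any(...); a sketch of the filter rewritten that way, assuming the expression index above:
-- Sketch: same filter as in the CTE, expressed with the array "contains"
-- operator so the GIN expression index can be used.
select id, keyword
from words
where string_to_array(keyword, ' ') @> array['all'];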
One option is to use regular expressions, as richyen mentions.
If you need plain SQL, what you have is close; you just need to search for the trademark surrounded by spaces, i.e. a pattern of the form '% '||t.trademark||' %':
select w.id, w.keyword, t.trademark
from words w
join trademarks t
on w.keyword like '% '||t.trademark||' %'
where t.trademark = 'all'
union all /* handles keywords that are a single word, with no spaces */
select w.id, w.keyword, t.trademark
from words w
join trademarks t
on w.keyword =t.trademark
where t.trademark = 'all';
https://dbfiddle.uk/k5bYAPUh
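A variation on the same idea is to pad the keyword with a space on each side, which also catches a trademark at the very start or end of a multi-word keyword without needing the union (a sketch, not from the linked fiddle):
-- Sketch: padding w.keyword with spaces lets '% trademark %' match words
-- at the start and end of the keyword as well.
select w.id, w.keyword, t.trademark
from words w
join trademarks t
  on ' '||w.keyword||' ' like '% '||t.trademark||' %'
where t.trademark = 'all';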
You can split the keyword into an array and then check whether the trademark is inside the split array.
select w.id, w.keyword, t.trademark
from words w
inner join trademarks t
  on t.trademark = any( string_to_array(replace(w.keyword, ' ', '~^~'), '~^~', '') )
where w.keyword = 'all';
Example of the trademark and keyword arrays overlapping (&&):
SELECT ARRAY['swam'] && string_to_array(REPLACE('while swam is interesting', ' ', '~^~'), '~^~', '');
Example of the trademark array being contained in the keyword array (<@):
SELECT ARRAY['swam'] <@ string_to_array(REPLACE('while swam is interesting', ' ', '~^~'), '~^~', '');
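Note that string_to_array can split on the space delimiter directly, so the REPLACE step is not strictly necessary; the overlap check can be written more simply as:
-- Sketch: split on the space delimiter directly, no REPLACE needed
SELECT ARRAY['swam'] && string_to_array('while swam is interesting', ' ');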

SQL: How can I do a keyword search using words stored in a separate search table?

I am doing a keyword search in a SQL table where I want to search for a set of keywords word1, word2, ... , wordn and flag instances where these keywords are found in a new column. Assuming that I am looking for these keywords in a column [Description] in #TABLE, the query I am using looks like this:
SELECT *
, (CASE WHEN [Description] LIKE '%word1%' THEN 'word1'
WHEN [Description] LIKE '%word2%' THEN 'word2'
...
WHEN [Description] LIKE '%wordn%' THEN 'wordn'
END) as Keywords
INTO #RESULTS_TABLE
FROM #TABLE
Now, the problem with this method is that the keywords I am searching for are hard-coded, which makes it inconvenient to alter the set of keywords. Instead, I would like to keep the keywords in a separate table #KEYWORDS, in a column [words], and reference all the keywords listed in that table for the search. That would allow me to alter the search table and re-run the select query against it, without having to change the select code.
Question: Assuming I have a table #KEYWORDS populated with the keywords I want to search, what is the best way to write the keyword search query so that it gets the keywords from the table rather than from hardcoded terms?
My first choice would be a temp table:
create table #words_table (
    words varchar(100)
);
insert into #words_table values
('word1'),
('word2'); -- etc. for the rest of your rows
Then you can:
select t.[Description]
from your_table t
join #words_table wt
  on t.[Description] LIKE CONCAT('%', wt.words, '%')
or
select t.[Description]
from your_table t
join #words_table wt
  on t.[Description] LIKE '%' + wt.words + '%'
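To mirror the shape of the original query, where matches land in a Keywords column of #RESULTS_TABLE, the join version might look like the sketch below; note that, unlike the CASE expression, it produces one row per matching keyword rather than only the first match:
-- Sketch: writes each matching keyword into a Keywords column of the
-- results table, driven by the keyword table instead of hard-coded terms.
SELECT t.*, wt.words AS Keywords
INTO #RESULTS_TABLE
FROM #TABLE t
JOIN #words_table wt
  ON t.[Description] LIKE '%' + wt.words + '%';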

Comparing email records between two tables

I am trying to compare a table of users from one DB
and check whether that Email Address exists in our Dynamics CRM user base
I have the table User_V from that other interface,
and I created the following query, but it only gives me the results where useremail is NULL,
when I am only trying to find those who exist in user_v and not in [SystemUserBase].
select *
from [HSchool].[dbo].[user_v] as u
where not exists (select InternalEMailAddress
from [HrProd_MSCRM].[dbo].[SystemUserBase] as inn
where ltrim (rtrim (LOWER (u.useremail))) collate database_default <> ltrim (rtrim (LOWER(inn.InternalEMailAddress))))
I hope I am as clear as possible,
thank you in advance!
You are doing a double negative.
select *
from [HSchool].[dbo].[user_v] as u
where not exists (select InternalEMailAddress
from [HrProd_MSCRM].[dbo].[SystemUserBase] as inn
where ltrim (rtrim (LOWER (u.useremail))) collate database_default <> ltrim (rtrim (LOWER(inn.InternalEMailAddress))))
The sub-query finds the records that do not match, and then there is a NOT EXISTS on top of that.
Changing the <> in the sub-query's WHERE clause to an equality (=) should work.

Search for a word in the column string and list those words

I have two tables: table 1 with columns W_ID and word, and table 2 with columns N_ID and note. I have to list every N_ID where a word from table 1's word column appears in the note column (the easy part), and also list those words in another column without duplicating the N_ID, which means using STUFF to concatenate all the words found in the note for that particular N_ID. I tried using
a FULL TEXT INDEX with CONTAINS,
but it only allows searching for one word at a time. Any suggestions on how I can use a while loop to achieve this?
If there is a maximum number of words you want displayed for N_ID, you can pivot this. You could have them in a single column by concatenating them, but I would recommend against that. Here is a pivot that supports up to 4 words per N_ID. You can adjust it as needed. You can view the SQL Fiddle for this here.
SELECT
    n_id,
    [1] AS word_1,
    [2] AS word_2,
    [3] AS word_3,
    [4] AS word_4
FROM (
    SELECT
        n_id,
        word,
        ROW_NUMBER() OVER (PARTITION BY n_id ORDER BY word) AS rn
    FROM tbl2
    JOIN tbl1
        ON tbl2.note LIKE '%' + tbl1.word + '[ ,.?!]%'
) AS source_table
PIVOT (
    MAX(word)
    FOR rn IN ([1], [2], [3], [4])
) AS pivot_table
*Updated the join to require a space or punctuation character after the word, so that partial words are not matched.
You can join your tables together based on a positive result from the charindex function.
In SQL Server 2017 you can run:
SELECT n_id, string_agg(word, ', ')
FROM words
inner join notes on 0 < charindex(words.word, notes.note)
GROUP BY n_id;
Prior to SQL Server 2017 there is no string_agg, so you'll need to use STUFF with FOR XML PATH, which is trickier:
select
    stuff((
        SELECT ', ' + word
        FROM words
        where 0 < charindex(words.word, notes.note)
        FOR XML PATH('')
    ), 1, 2, '')
from notes;
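Since the question asks for the N_ID alongside the concatenated words, the STUFF version can also select it; a sketch of that form:
-- Sketch: one row per note, with the matching words concatenated next to n_id.
select
    n.n_id,
    stuff((
        select ', ' + w.word
        from words w
        where 0 < charindex(w.word, n.note)
        for xml path('')
    ), 1, 2, '') as matched_words
from notes n;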
I used the following schema:
CREATE table WORDS
(W_ID int identity primary key
,word varchar(100)
);
CREATE table notes
(N_ID int identity primary key
,note varchar(1000)
);
insert into words (word) values
('No'),('Nope'),('Nah');
insert into notes (note) values
('I am not going to do this. Nah!!!')
,('It is OK.');

Indexing a LEFT operation in SQL Server

I have a database table of E.164 calling codes (e.g. 1 for USA/Canada, 44 for the United Kingdom, etc). Here's the table design:
CREATE TABLE CountryCosts (
    CallingCode varchar(5) NOT NULL PRIMARY KEY,
    IsFree bit NOT NULL
)
I have a scalar function which accepts a full phone number and indicates if any calling code in the table matches the number (simply by checking if the number begins with any CallingCode in the table) and indicates whether IsFree is true or not.
SELECT TOP 1
    CallingCode,
    IsFree
FROM CountryCosts
WHERE
    IsFree = 1
    AND LEFT( @recipient, LEN( CallingCode ) ) = CallingCode
(Variations exist, including using SELECT COUNT(1) inside a SELECT CASE WHEN EXISTS, and using @recipient LIKE CONCAT( CallingCode, '%' ) as the predicate.)
The Actual Execution Plan reports that the main expense is a Clustered Index Scan of the clustered PK index.
I want to know if there is any way to improve performance by adding another index. Is there any index that works on varchar columns that SQL Server would use to optimize the LEFT predicate?