SQL query to find matches

I have written a query that finds matches within the same table, and it gives me the expected results, but I don't understand part of it. Here is the query:
select f.name, f.id, f.industry, d.name, d.id, d.industry
from product_table f, product_table d
where (f.name like '%' || d.name || '%') and
(f.industry like '%' || d.industry || '%')
I know that this part is what looks for matches between the two columns:
(..... like '%' || ..... || '%')
But what does each part of it do exactly and what does it mean?

This query is executing a self-join (here, a cross self-join) in which we query two instances of the same table. In this case it looks like some form of data quality exercise, where we suspect we might have near-duplicate records: that is, more than one record for the same combination of product name and industry. The use of wildcards identifies records where the value of one column is wholly embedded in another column: for instance, '%STACK%' matches 'META STACKOVERFLOW'.
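To see the containment test in isolation, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle. SQLite also uses || for concatenation and supports LIKE, but its LIKE is case-insensitive for ASCII by default, so treat this as an illustration rather than an exact Oracle reproduction. The helper name contains is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def contains(haystack, needle):
    # Build the pattern '%' || needle || '%' in SQL and test haystack against it
    row = conn.execute("SELECT ? LIKE '%' || ? || '%'", (haystack, needle)).fetchone()
    return bool(row[0])

print(contains("META STACKOVERFLOW", "STACK"))  # True: 'STACK' is embedded
print(contains("STACK", "META STACKOVERFLOW"))  # False: the test only runs one way
```

The second call shows that the value wrapped in wildcards has to be the shorter one.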
The posted version has a potential flaw: if two records match exactly, you get two hits (one for F:D, one for D:F), and every record also matches itself. You can work around that by adding a filter on id:
select f.name, f.id, f.industry,
d.name, d.id, d.industry
from product_table f, product_table d
where (f.name like '%' || d.name || '%')
and (f.industry like '%' || d.industry || '%')
and ( ( f.name = d.name
and f.industry = d.industry
and f.id < d.id )
or f.name != d.name
or f.industry != d.industry
)
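A quick check of the id filter, again with sqlite3 standing in for Oracle. The three rows are hypothetical: rows 1 and 2 are exact duplicates, and row 3 embeds the same name. Without the extra filter, every row also pairs with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_table (id INTEGER, name TEXT, industry TEXT)")
# Hypothetical rows: 1 and 2 are exact duplicates, 3 embeds the same name
conn.executemany("INSERT INTO product_table VALUES (?, ?, ?)", [
    (1, "STACK", "TECH"),
    (2, "STACK", "TECH"),
    (3, "META STACKOVERFLOW", "TECH"),
])

BASE = """
    SELECT f.id, d.id
    FROM product_table f, product_table d
    WHERE f.name LIKE '%' || d.name || '%'
      AND f.industry LIKE '%' || d.industry || '%'
"""
# Keep each exact duplicate pair once (f.id < d.id) and drop self-matches
ID_FILTER = """
      AND ((f.name = d.name AND f.industry = d.industry AND f.id < d.id)
           OR f.name != d.name
           OR f.industry != d.industry)
"""
ORDER = " ORDER BY f.id, d.id"

naive = conn.execute(BASE + ORDER).fetchall()
filtered = conn.execute(BASE + ID_FILTER + ORDER).fetchall()
print(naive)     # [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3)]
print(filtered)  # [(1, 2), (3, 1), (3, 2)]
```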
The double vertical bar (||, often called a "double pipe") is the concatenation operator: it joins strings together. (Many programming languages use + for this, but Oracle reserves + strictly for arithmetic on numbers.)
I'm still not clear on why we put it before and after only the second column: f.name like '%' || d.name || '%'
In this case, the query is concatenating a wildcard. Given d.name = 'XYZ', the pattern '%' || d.name || '%' becomes '%XYZ%', which matches these values of f.name:
'1XYZ1'
'11XYZ11'
'11XYZ'
'XYZ1'
'XYZ' <---- matching same record
We don't need to wrap f.name in wildcards because the query is a self-join, so every value of name also appears on the left-hand side of the filter. When f.name = '1XYZ1', it matches '%' || d.name || '%' for these values of d.name:
'1XYZ1' <---- matching same record
'XYZ1'
'XYZ'
So you're going to get multiple hits already. Embedding both sides of the filter in wildcards will only generate more noisy duplicates.
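The two lists above can be reproduced with a short sketch (sqlite3 as a stand-in; the five names are the example values from the answer). matching_f fixes d.name and collects the f.name values that pass the filter; matching_d fixes f.name and collects the d.name values. Both helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
names = ["1XYZ1", "11XYZ11", "11XYZ", "XYZ1", "XYZ"]

def passes(f_name, d_name):
    # The filter from the query: f.name LIKE '%' || d.name || '%'
    return bool(conn.execute("SELECT ? LIKE '%' || ? || '%'", (f_name, d_name)).fetchone()[0])

def matching_f(d_name):
    # f.name values that match for a fixed d.name
    return [f for f in names if passes(f, d_name)]

def matching_d(f_name):
    # d.name values that match for a fixed f.name
    return [d for d in names if passes(f_name, d)]

print(matching_f("XYZ"))    # ['1XYZ1', '11XYZ11', '11XYZ', 'XYZ1', 'XYZ']
print(matching_d("1XYZ1"))  # ['1XYZ1', 'XYZ1', 'XYZ']
```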

Related

How do I use a like in a join with SQL92?

I'm trying to use a LIKE in a join statement. The reason is that I have two varchar fields, but the second field is a comma-separated list.
SELECT c.Class, c.SubClass, c.Value, c.Pairs, c.Description, prodt.ProductType, p.Type
FROM pub.table1 c
INNER JOIN pub.Table2 p ON c.SubClass like CONCAT(CONCAT('%',p.Type),'%')
INNER JOIN pub.Table3 prodt ON p.ProdType = prodt.ProductType
WHERE c.Class = 'whatever'
In this case p.Type is the comma separated list and the SubClass is the normal varchar.
I have tried a few different things already. I would have thought the following would work, but it did not:
INNER JOIN pub.Table2 p ON c.SubClass like '%' + p.Type +'%'
Is there a way to do this, or do I have to do something different, like another SELECT, with SQL92? Thank you in advance!
From your explanation, you need to swap the LIKE since p.Type contains the list of values.
p.Type like '%' || c.SubClass || '%'
Your code:
SELECT c.Class, c.SubClass, c.Value, c.Pairs, c.Description, prodt.ProductType, p.Type
FROM pub.table1 c
INNER JOIN pub.Table2 p ON p.Type like '%' || c.SubClass || '%'
INNER JOIN pub.Table3 prodt ON p.ProdType = prodt.ProductType
WHERE c.Class = 'whatever'
Here's a sample test to show this:
select 'matched'
where 'abc' like '%abc,def,ghi,jkl%'
select 'matched'
where 'abc,def,ghi,jkl' like '%abc%'
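The same sample test, run through Python's sqlite3 module as a stand-in for the asker's database (SQLite accepts SELECT ... WHERE without a FROM clause; the helper name matched is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def matched(value, pattern):
    # Same shape as the answer's test: SELECT 'matched' WHERE value LIKE pattern
    return conn.execute("SELECT 'matched' WHERE ? LIKE ?", (value, pattern)).fetchone() is not None

print(matched("abc", "%abc,def,ghi,jkl%"))  # False: the list is not inside 'abc'
print(matched("abc,def,ghi,jkl", "%abc%"))  # True: 'abc' is inside the list
```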
I am guessing that you want something like this:
ON ',' || p.type || ',' like '%,' || c.subclass || ',%'
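Why the extra commas help, sketched with sqlite3 as a stand-in: the plain '%' || c.SubClass || '%' pattern also matches partial items, such as 'ab' inside 'abc'. Wrapping both the list and the searched value in commas only matches whole items. This assumes the list has no spaces after the commas; the query constants and the is_match helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

NAIVE = "SELECT ? LIKE '%' || ? || '%'"
# Wrap both the list and the searched item in commas so only whole items match
DELIMITED = "SELECT ',' || ? || ',' LIKE '%,' || ? || ',%'"

def is_match(sql, type_list, subclass):
    return bool(conn.execute(sql, (type_list, subclass)).fetchone()[0])

types = "abc,def,ghi,jkl"
print(is_match(NAIVE, types, "ab"))       # True: false positive, 'ab' is part of 'abc'
print(is_match(DELIMITED, types, "ab"))   # False
print(is_match(DELIMITED, types, "def"))  # True
```

The leading and trailing commas added to the list are what let the first and last items ('abc', 'jkl') match the same ',item,' pattern as the middle ones.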
|| has been the string concatenation operator in ISO/ANSI SQL for a long time. To be honest, I don't remember if the adoption of this operator dates back 26 years.

PostgreSQL - join tables using pattern matching

I have two tables and need to join them using two columns that are similar.
The first table, articles, has a column called slug with the slug lines for articles, e.g. 'trump-fails-yet-again'.
The second table, log, has a column called path with the URL path for the articles, e.g. '/articles/trump-fails-yet-again/'.
Here is my search query:
"SELECT articles.title, count(*) as num FROM articles, log WHERE articles.slug LIKE CONCAT('%',log.path) GROUP BY articles.title;"
This returns nothing but an empty result, []
I have also tried:
"SELECT articles.title, count(*) as num FROM articles JOIN log ON articles.slug SIMILAR TO CONCAT('%',log.path) GROUP BY articles.title;"
That returns a DataError: invalid regular expression: quantifier operand invalid
Any help is greatly appreciated!
Try this:
select articles.title, count(*) as views from articles
join log on log.path ~~ ('%' || articles.slug || '%')
group by articles.title;
You have a slash at the end of the path. How about this?
SELECT a.title, count(*) as num
FROM articles a JOIN
log l
ON l.path LIKE '%' || a.slug || '%'
GROUP BY a.title;
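A sketch of the corrected direction with sqlite3 standing in for PostgreSQL (the titles and log rows are hypothetical; only the slug and path formats come from the question). The long value (path) sits on the left of LIKE and the short one (slug) is wrapped in wildcards; the question's direction returns an empty result, consistent with the [] the asker saw.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (title TEXT, slug TEXT);
    CREATE TABLE log (path TEXT);
    -- Hypothetical sample data in the question's slug/path format
    INSERT INTO articles VALUES ('Trump fails yet again', 'trump-fails-yet-again'),
                                ('Another story', 'another-story');
    INSERT INTO log VALUES ('/articles/trump-fails-yet-again/'),
                           ('/articles/trump-fails-yet-again/'),
                           ('/articles/another-story/'),
                           ('/');
""")

# Path on the left, slug wrapped in wildcards
rows = conn.execute("""
    SELECT a.title, count(*) AS num
    FROM articles a
    JOIN log l ON l.path LIKE '%' || a.slug || '%'
    GROUP BY a.title
    ORDER BY num DESC
""").fetchall()

# The question's direction: the slug never contains the path, so nothing joins
reversed_rows = conn.execute("""
    SELECT a.title, count(*) FROM articles a
    JOIN log l ON a.slug LIKE '%' || l.path
    GROUP BY a.title
""").fetchall()

print(rows)           # [('Trump fails yet again', 2), ('Another story', 1)]
print(reversed_rows)  # []
```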
You should also learn to use proper, explicit JOIN syntax. Never use commas in the FROM clause.
Because each slug maps to exactly one path (a 1:1 function), you can compare for equality instead:
SELECT articles.title, count(*) as num
FROM articles
JOIN log ON log.path = '/articles/' || articles.slug || '/'
GROUP BY articles.title;
Or, even better, put that mapping in a function:
CREATE FUNCTION slug_to_article_path( slug text )
RETURNS text AS
$$
SELECT '/articles/' || slug || '/';
$$ LANGUAGE sql
IMMUTABLE;
SELECT articles.title, count(*) as num
FROM articles
JOIN log ON log.path = slug_to_article_path(articles.slug)
GROUP BY articles.title;
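The equality approach can be sketched the same way, with sqlite3's create_function registering a Python version of slug_to_article_path in place of PostgreSQL's CREATE FUNCTION (the sample data and the views helper are hypothetical). The extra '/comments/' path shows the difference: LIKE counts it as a view of the article, while the equality join does not.

```python
import sqlite3

def slug_to_article_path(slug):
    # Python counterpart of the SQL function: map a slug to its log path
    return "/articles/" + slug + "/"

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it (stand-in for CREATE FUNCTION)
conn.create_function("slug_to_article_path", 1, slug_to_article_path)
conn.executescript("""
    CREATE TABLE articles (title TEXT, slug TEXT);
    CREATE TABLE log (path TEXT);
    INSERT INTO articles VALUES ('Trump fails yet again', 'trump-fails-yet-again'),
                                ('Another story', 'another-story');
    INSERT INTO log VALUES ('/articles/trump-fails-yet-again/'),
                           ('/articles/trump-fails-yet-again/'),
                           ('/articles/another-story/'),
                           ('/articles/another-story/comments/');
""")

def views(join_condition):
    return conn.execute(
        "SELECT articles.title, count(*) AS num FROM articles "
        "JOIN log ON " + join_condition +
        " GROUP BY articles.title ORDER BY articles.title"
    ).fetchall()

exact = views("log.path = slug_to_article_path(articles.slug)")
fuzzy = views("log.path LIKE '%' || articles.slug || '%'")
print(exact)  # [('Another story', 1), ('Trump fails yet again', 2)]
print(fuzzy)  # [('Another story', 2), ('Trump fails yet again', 2)]
```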

SQL join statement on 2 tables

I am creating a join between two tables, Product_Tree and Product, on the two columns Model_Number and Manufacturer to return the matching rows.
Here are the 2 tables:
Table : Product_Tree
Col1,Col2,Model_Number,Col3,Manufacturer,Col4
111111,Pepsi,aaa,111111,aaa,description
222222,Miranda,bbb,222222,bbb,description
333333,Cola,bbb,333333,bbb,description
Table : Product
Model_Number,Manufacturer
a,a
b,b
c,c
d,d
Here is the query:
SELECT Product_Tree.col0,Product_Tree.col1,Product_Tree.col2,Product_Tree.col3,Product_Tree.Model_Number,Product_Tree.Manufacturer
FROM Product_Tree
JOIN Product ON Product.model_number LIKE ''''%''''Product_Tree.MODEL_NUMBER''''%''''
AND Product.manufacturer LIKE ''''%''Product_Tree.MANUFACTURER''''%'''';
I am getting this error:
ORA-00911: invalid character
You'll need to use the concatenation operator to attach your % wildcards to the columns Product_Tree.MODEL_NUMBER and Product_Tree.MANUFACTURER:
SELECT Product_Tree.col0,Product_Tree.col1,Product_Tree.col2,Product_Tree.col3,Product_Tree.Model_Number,Product_Tree.Manufacturer
FROM Product_Tree
JOIN Product ON Product.model_number LIKE '%' || Product_Tree.MODEL_NUMBER || '%'
AND Product.manufacturer LIKE '%' || Product_Tree.MANUFACTURER || '%';
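With the sample data from the question, the pattern as written returns no rows: each Product value ('a', 'b', ...) is shorter than the Product_Tree value it is compared against, so 'a' LIKE '%aaa%' is false. If the goal is to find Product_Tree rows that contain the Product values, the wildcards belong around the Product columns instead. A sketch with sqlite3 standing in for Oracle (the selected columns are trimmed for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product_Tree (Col1 TEXT, Col2 TEXT, Model_Number TEXT,
                               Col3 TEXT, Manufacturer TEXT, Col4 TEXT);
    CREATE TABLE Product (Model_Number TEXT, Manufacturer TEXT);
""")
conn.executemany("INSERT INTO Product_Tree VALUES (?, ?, ?, ?, ?, ?)", [
    ("111111", "Pepsi", "aaa", "111111", "aaa", "description"),
    ("222222", "Miranda", "bbb", "222222", "bbb", "description"),
    ("333333", "Cola", "bbb", "333333", "bbb", "description"),
])
conn.executemany("INSERT INTO Product VALUES (?, ?)",
                 [("a", "a"), ("b", "b"), ("c", "c"), ("d", "d")])

# Wildcards around Product_Tree columns, as in the answer
as_written = conn.execute("""
    SELECT pt.Col2 FROM Product_Tree pt
    JOIN Product p ON p.Model_Number LIKE '%' || pt.Model_Number || '%'
                  AND p.Manufacturer LIKE '%' || pt.Manufacturer || '%'
""").fetchall()

# Wildcards around Product columns: the longer Product_Tree value goes on the left
swapped = conn.execute("""
    SELECT pt.Col2, p.Model_Number FROM Product_Tree pt
    JOIN Product p ON pt.Model_Number LIKE '%' || p.Model_Number || '%'
                  AND pt.Manufacturer LIKE '%' || p.Manufacturer || '%'
    ORDER BY pt.Col1
""").fetchall()

print(as_written)  # []
print(swapped)     # [('Pepsi', 'a'), ('Miranda', 'b'), ('Cola', 'b')]
```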
I'm guessing that this query is inside a script and is itself quoted with single quotes ('), which is why you have single quotes all over the place. If that's the case, every single quote inside the statement must be doubled, so your quoted SQL statement would be:
SELECT Product_Tree.col0,Product_Tree.col1,Product_Tree.col2,Product_Tree.col3,Product_Tree.Model_Number,Product_Tree.Manufacturer
FROM Product_Tree
JOIN Product ON Product.model_number LIKE ''%'' || Product_Tree.MODEL_NUMBER || ''%''
AND Product.manufacturer LIKE ''%'' || Product_Tree.MANUFACTURER || ''%'';
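The doubled quotes follow the usual SQL rule that a single quote inside a single-quoted literal is written as ''. A small Python sketch of that rule, using sqlite3 to confirm that the doubled form parses back to the original condition (building SQL text by concatenation like this is for illustration only):

```python
import sqlite3

condition = "Product.model_number LIKE '%' || Product_Tree.MODEL_NUMBER || '%'"

# Inside a single-quoted SQL literal, each ' must be written as ''
embedded = condition.replace("'", "''")
print(embedded)  # Product.model_number LIKE ''%'' || Product_Tree.MODEL_NUMBER || ''%''

# Round trip: the SQL parser turns each '' back into a single '
conn = sqlite3.connect(":memory:")
parsed = conn.execute("SELECT '" + embedded + "'").fetchone()[0]
print(parsed == condition)  # True
```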

Change the left join query based on a predefined filter clause (condition)

I have a fine-grained SQL query:
SELECT A.Form_Id,
B.CONTAINER_ID,
A.FORM_DESC,
A.FORM_TITLE,
A.LAYOUT,
A.TOTAL_COLUMNS,
COUNT (*) Over () AS Total_Rows
,ROW_NUMBER () OVER ( ORDER BY CONTAINER_ID ASC ) ROWNM
FROM FORM_DEFINITION A
LEFT JOIN
(SELECT CONTAINER_ID,FORM_ID FROM FORM_CONTAINER_DEFINITION
) B
On A.Form_Id = B.Form_Id
WHERE UPPER(TRIM(A.FORM_ID)) LIKE '%' || UPPER(TRIM('FORM2')) || '%'
It's working fine in SQL Developer, but we are using our own framework, which adds a dynamic filter clause (WHERE condition) to every query, like this:
AND UPPER(TRIM(A.FORM_ID)) LIKE '%' || UPPER(TRIM('FORM2')) || '%'
but I changed it to:
WHERE UPPER(TRIM(A.FORM_ID)) LIKE '%' || UPPER(TRIM('FORM2')) || '%'
I should not change the filter clause. Could you please suggest how I can modify the left-join query to use the predefined filter clause?

PSQL: Concatenate a value with LIKE in stored procedure

I am trying to concatenate a variable into a LIKE pattern, and for some reason it only finds values where the search word is at the end of the text variable.
I am using PostgreSQL 8.4 and this is stored in a function (stored procedure)
Consider this example:
a.key1 is "HELLO"
a_text is "I SAY HELLO TO THE WORLD"
Code:
SELECT count(1), a.key1, a.active, a.campkeydbid
FROM campkeydb a
WHERE a_text LIKE '%'|| a.key1 ||'%'
GROUP BY a.key1, a.active, a.campkeydbid
INTO a_count, a_campaignkey, a_active, a_campkeydbid;
In this stored procedure it does NOT return the values; it does not find the word "HELLO".
It ONLY returns the values if a_text ends with the word, as in "I SAY HELLO".
Does anyone know what I am doing wrong? It seems correct, since I am concatenating a % on both sides of a.key1.
You can use the position string function instead of like. Here is a sample query using a table of the regions and departments of France. I'll try and find all the departments that have a name that includes the region name.
select r.name as region, d.name as department, position( r.name in d.name) as pos
from regions r
join departments d on d.region = r.code
where position( r.name in d.name) != 0
and r.name != d.name;
The results are
region department pos
"Corse" "Corse-du-Sud" 1
"Corse" "Haute-Corse" 7
I added the pos column to show that strings are indexed from 1, not 0. I tried the same thing with 'like' (both queries have the same query plan and should give the same performance):
select r.name as region, d.name as department, position( r.name in d.name) as pos
from regions r
join departments d on d.region = r.code
where d.name like '%' || r.name || '%'
and r.name != d.name;
I prefer the look of the first query, but both do the same thing. Your logic seems correct, so the problem is probably a typo in one of the strings.
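A quick sanity check of the asker's pattern, with sqlite3 standing in for PostgreSQL. SQLite's instr() plays the role of position() here: both return a 1-based index, or 0 when the substring is absent. The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def like_match(text, key):
    # The asker's filter: text LIKE '%' || key || '%'
    return bool(conn.execute("SELECT ? LIKE '%' || ? || '%'", (text, key)).fetchone()[0])

def instr_pos(text, key):
    # 1-based position of key in text, 0 if absent
    return conn.execute("SELECT instr(?, ?)", (text, key)).fetchone()[0]

print(like_match("I SAY HELLO TO THE WORLD", "HELLO"))  # True: match in the middle
print(instr_pos("I SAY HELLO TO THE WORLD", "HELLO"))   # 7
print(instr_pos("Corse-du-Sud", "Corse"), instr_pos("Haute-Corse", "Corse"))  # 1 7
```

A word in the middle of the text matches, which supports the conclusion that the pattern itself is fine.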