SQL: join on best string match

I have a table with a string "identifier" that I want to match against a "group" table, finding the "best match" (that is, the group whose prefix matches the longest part of the string).
For instance, assume that I have two groups: "19" and "19.10". What I want is:
item "19.10.1" is part of the group "19.10"
item "19.10.xxxx" is part of the group "19.10"
item "19.20" is part of the group "19"
What I have so far is something like this:
SELECT * FROM Items i
LEFT JOIN MyGroup g ON g.Prefix = SUBSTRING(i.ItemID,1,LEN(g.Prefix))
That returns every matching group, but I don't know how to keep only the "best match" (i.e. the longest match) from my results.
By the way, I'm working on SQL Server 2005.
Example SQL Fiddle:
http://sqlfiddle.com/#!3/9a9d8/1

Try this one.
SELECT t.ItemID, g1.prefix, g1.GroupDesc
FROM Items i1
LEFT JOIN MyGroup g1 ON g1.Prefix = SUBSTRING(i1.ItemID,1,LEN(g1.Prefix))
RIGHT JOIN
(
SELECT i2.ItemID, max(len(g2.prefix)) AS ln
FROM Items i2
LEFT JOIN MyGroup g2 ON g2.Prefix = SUBSTRING(i2.ItemID,1,LEN(g2.Prefix))
GROUP BY i2.ItemID
) t ON i1.ItemID = t.ItemID AND len(g1.prefix) = t.ln
You can test it on this test data:
CREATE TABLE dbo.MyGroup
(GroupDesc VARCHAR(100),
Prefix VARCHAR(10) );
CREATE TABLE dbo.Items
(ItemDesc VARCHAR(100),
ItemID VARCHAR(10) );
INSERT INTO MyGroup (GroupDesc, Prefix)
VALUES ( 'Group A', '19' );
INSERT INTO MyGroup (GroupDesc, Prefix)
VALUES ( 'Group B', '19.10' );
INSERT INTO MyGroup (GroupDesc, Prefix)
VALUES ( 'Group C', '19.10.3' );
INSERT INTO Items (ItemDesc, ItemID)
VALUES ( 'Item 1', '19.10.4' );
INSERT INTO Items (ItemDesc, ItemID)
VALUES ( 'Item 2', '19.10.3' );
INSERT INTO Items (ItemDesc, ItemID)
VALUES ( 'Item 3', '19.20' );
INSERT INTO Items (ItemDesc, ItemID)
VALUES ( 'Item 4', '44.55' );

I came up with this:
with tmp as
(
SELECT * FROM Items i
LEFT JOIN MyGroup g ON g.Prefix = SUBSTRING(i.ItemID,1,LEN(g.Prefix))
)
SELECT a.* FROM tmp a
WHERE LEN(a.prefix) = (SELECT MAX(LEN(b.prefix)) FROM tmp b WHERE a.itemid = b.itemid)
OR a.prefix IS NULL -- also keep items that match no group
Seems to work...
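Another way to keep only the longest prefix per item is to rank the candidate groups with ROW_NUMBER() (available since SQL Server 2005) and keep rank 1. The sketch below shows that idea, assuming SQLite through Python's sqlite3 module purely so it runs anywhere: SQLite spells SUBSTRING/LEN as substr/length and needs version 3.25+ for window functions. The data is the test data from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyGroup (GroupDesc TEXT, Prefix TEXT);
CREATE TABLE Items (ItemDesc TEXT, ItemID TEXT);
INSERT INTO MyGroup VALUES ('Group A', '19'), ('Group B', '19.10'), ('Group C', '19.10.3');
INSERT INTO Items VALUES ('Item 1', '19.10.4'), ('Item 2', '19.10.3'),
                         ('Item 3', '19.20'), ('Item 4', '44.55');
""")

# Rank every matching group per item by prefix length, longest first.
# Items with no matching group survive the LEFT JOIN with a NULL prefix and get rank 1.
rows = conn.execute("""
SELECT ItemID, Prefix, GroupDesc
FROM (
    SELECT i.ItemID, g.Prefix, g.GroupDesc,
           ROW_NUMBER() OVER (PARTITION BY i.ItemID
                              ORDER BY length(g.Prefix) DESC) AS rn
    FROM Items i
    LEFT JOIN MyGroup g
           ON g.Prefix = substr(i.ItemID, 1, length(g.Prefix))
)
WHERE rn = 1
ORDER BY ItemID
""").fetchall()

for row in rows:
    print(row)
```

Unlike the MAX(LEN(...)) comparison, ROW_NUMBER() returns exactly one row per item even if two groups had prefixes of the same length.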

Related

Insert records from two tables that match

I have the following tables:
CREATE TABLE forms
(
ID INT NOT NULL,
NAME TEXT NOT NULL,
TITLE TEXT NOT NULL
);
CREATE TABLE new_forms
(
ID INT NOT NULL,
NAME TEXT NULL,
TITLE TEXT NULL
);
INSERT INTO forms VALUES (0, 'test', 'test');
INSERT INTO new_forms VALUES (0, 'new_test', NULL);
And I'm using the following query:
INSERT INTO forms(id, name, title)
SELECT
1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id;
SELECT * FROM forms;
The idea is to add both matching rows to the table.
In this example these two new records should be added:
1 test test
1 new_test test
But it's only adding the last one.
I have tried every kind of join and none of them worked.
Thanks

Your query joins each row of forms to its match in new_forms, so it returns only one (merged) row per match. If you need both the merged row and a copy of the original, combine two SELECTs with UNION ALL:
INSERT INTO forms(id, name, title)
SELECT
1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id
UNION ALL
SELECT
1, COALESCE(f.name, nf.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id;
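A quick way to confirm the UNION ALL version is to run it against the question's sample data. This is a minimal sketch assuming SQLite via Python's sqlite3 (the question does not name its database); the first SELECT produces the merged row and the second produces a copy of the original row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forms (ID INT NOT NULL, NAME TEXT NOT NULL, TITLE TEXT NOT NULL);
CREATE TABLE new_forms (ID INT NOT NULL, NAME TEXT NULL, TITLE TEXT NULL);
INSERT INTO forms VALUES (0, 'test', 'test');
INSERT INTO new_forms VALUES (0, 'new_test', NULL);

-- Same statement as the answer: merged row UNION ALL original row.
INSERT INTO forms (id, name, title)
SELECT 1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
FROM forms f LEFT OUTER JOIN new_forms nf ON nf.id = f.id
UNION ALL
SELECT 1, COALESCE(f.name, nf.name), COALESCE(nf.title, f.title)
FROM forms f LEFT OUTER JOIN new_forms nf ON nf.id = f.id;
""")

rows = conn.execute(
    "SELECT id, name, title FROM forms WHERE id = 1 ORDER BY name"
).fetchall()
print(rows)
```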

Dynamically update table with column from another table

I have a table customer like this:
CREATE TABLE tbl_customer (
id INTEGER,
name VARCHAR(16),
voucher VARCHAR(16)
);
and a voucher table like this:
CREATE TABLE tbl_voucher (
id INTEGER,
code VARCHAR(16)
);
Now imagine that the customer table always has rows with id and name filled in, but the voucher has to be assigned periodically from the tbl_voucher table.
Important: each voucher may be assigned to only one customer (i.e. it must be unique).
I wrote a query like this:
UPDATE tbl_customer
SET voucher = (
SELECT code
FROM tbl_voucher
WHERE code NOT IN (
SELECT voucher
FROM tbl_customer
WHERE voucher IS NOT NULL
)
LIMIT 1
)
WHERE voucher IS NULL;
However, this does not work as expected: the subquery that looks for an unused voucher is evaluated once, and that single voucher is then applied to every customer.
Any ideas on how I can solve this without using programming structures such as loops?
Also, some example data so you can imagine what I would like to happen:
INSERT INTO tbl_customer VALUES (1, 'Sara', 'ABC');
INSERT INTO tbl_customer VALUES (2, 'Simon', 'DEF');
INSERT INTO tbl_customer VALUES (3, 'Andy', NULL);
INSERT INTO tbl_customer VALUES (4, 'Alice', NULL);
INSERT INTO tbl_voucher VALUES (1, 'ABC');
INSERT INTO tbl_voucher VALUES (2, 'LOL');
INSERT INTO tbl_voucher VALUES (3, 'ZZZ');
INSERT INTO tbl_voucher VALUES (4, 'BBB');
INSERT INTO tbl_voucher VALUES (5, 'CCC');
After the query runs, I'd expect Andy to get the voucher LOL and Alice to get ZZZ.

I'm going to guess this is MySQL, where this is a pain. The following numbers the customers without a voucher and the unused vouchers separately with user variables, then pairs them up in a SELECT:
select c.id, c.name, v.code as voucher
from (select c.*, (@rnc := @rnc + 1) as rn
from tbl_customer c cross join
(select @rnc := 0) params
where c.voucher is null
) c join
(select v.*, (@rnv := @rnv + 1) as rn
from tbl_voucher v cross join
(select @rnv := 0) params
where not exists (select 1 from tbl_customer c where c.voucher = v.code)
) v
on c.rn = v.rn;
You can now use this for the update:
update tbl_customer c join
(select c.id, v.code as voucher
from (select c.*, (@rnc := @rnc + 1) as rn
from tbl_customer c cross join
(select @rnc := 0) params
where c.voucher is null
) c join
(select v.*, (@rnv := @rnv + 1) as rn
from tbl_voucher v cross join
(select @rnv := 0) params
where not exists (select 1 from tbl_customer c where c.voucher = v.code)
) v
on c.rn = v.rn
) cv
on c.id = cv.id
set c.voucher = cv.voucher;
This relies on id being unique in tbl_customer. Also, without an ORDER BY, which customer receives which unused voucher is not guaranteed.
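Where window functions are available (MySQL 8.0+, SQLite 3.25+), ROW_NUMBER() can replace the user-variable counters and makes the pairing order explicit. The sketch below is a SQLite translation run through Python's sqlite3, not the answer's MySQL statement. It assumes customer ids are unique (1 to 4), pairs customers and unused vouchers in id order, stages the pairs in a temporary table, and applies them with a correlated subquery, since UPDATE ... FROM only exists in SQLite 3.33+.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_customer (id INTEGER, name VARCHAR(16), voucher VARCHAR(16));
CREATE TABLE tbl_voucher (id INTEGER, code VARCHAR(16));
-- Assumption: unique customer ids.
INSERT INTO tbl_customer VALUES (1, 'Sara', 'ABC'), (2, 'Simon', 'DEF'),
                                (3, 'Andy', NULL), (4, 'Alice', NULL);
INSERT INTO tbl_voucher VALUES (1, 'ABC'), (2, 'LOL'), (3, 'ZZZ'), (4, 'BBB'), (5, 'CCC');

-- Number customers without a voucher and unused vouchers separately,
-- then pair them up by that number.
CREATE TEMP TABLE pairs AS
WITH c AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM tbl_customer WHERE voucher IS NULL
), v AS (
    SELECT code, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM tbl_voucher
    WHERE code NOT IN (SELECT voucher FROM tbl_customer WHERE voucher IS NOT NULL)
)
SELECT c.id, v.code FROM c JOIN v ON c.rn = v.rn;

-- Each customer without a voucher gets its paired code (NULL if vouchers ran out).
UPDATE tbl_customer
SET voucher = (SELECT code FROM pairs WHERE pairs.id = tbl_customer.id)
WHERE voucher IS NULL;
""")

result = conn.execute("SELECT name, voucher FROM tbl_customer ORDER BY id").fetchall()
print(result)
```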

How to UPDATE pivoted table in SQL SERVER

I have a flat (attribute/value) table that I need to join to my main table on the EAN attribute in order to update gid (the id from my main table).
id attrib value gid
1 weight 10 NULL
1 ean 123123123112 NULL
1 color blue NULL
2 weight 5 NULL
2 ean 331231313123 NULL
I tried pivoting the ean rows into a column and then joining both tables on ean, and up to that point everything works fine.
--update SideTable
--set gid = ab_id
select gid, ab_id
from SideTable
pivot (max (value) for attrib in ([EAN],[MPN])) as b
join MainTable as c
on c.ab_ean = b.EAN
where b.EAN !='' AND c.ab_archive = '0'
When I run the select, both id columns look right, but when I uncomment the first two lines and remove the select, the whole table is set to the first gid from my main table.
It should set my main id on all attribute rows of each ID whose ean matches a row in my main table.
I am sorry for my terrible English, but I hope someone can help me with this.

The reason your update does not work is that there is no link between the source and the target of the update. Although you reference SideTable in the FROM clause, the PIVOT operator effectively destroys that reference, leaving no link back to the instance of SideTable that you are updating. With no link, every row is updated with the same value; the result is undefined, and in practice it is whichever value SQL Server happens to apply from the FROM.
This can be demonstrated by running the following:
DECLARE @S TABLE (ID INT, Attrib VARCHAR(50), Value VARCHAR(50), gid INT);
INSERT @S
VALUES
(1, 'weight', '10', NULL), (1, 'ean', '123123123112', NULL), (1, 'color', 'blue', NULL),
(2, 'weight', '5', NULL), (2, 'ean', '331231313123', NULL);
SELECT s.*
FROM @S AS s
PIVOT (MAX(Value) FOR attrib IN ([EAN],[MPN])) AS pvt;
You clearly have a table aliased s in the FROM clause; however, because you have used PIVOT, you cannot use SELECT s.*. You get the following error:
The column prefix 's' does not match with a table name or alias name used in the query.
You haven't provided sample data for your main table, but I am about 95% certain your PIVOT is not needed; I think you can do the update with ordinary JOINs:
UPDATE s
SET gid = ab_id
FROM SideTable AS s
INNER JOIN SideTable AS ean
ON ean.ID = s.ID
AND ean.attrib = 'ean'
INNER JOIN MainTable AS m
ON m.ab_EAN = ean.Value
WHERE m.ab_archive = '0'
AND m.ab_EAN != '';
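The self-join idea can be checked end to end. This sketch runs it in SQLite through Python's sqlite3, an assumption made only for a runnable example; because older SQLite versions lack UPDATE ... FROM, the join is written as a correlated subquery, with COALESCE leaving gid untouched for IDs that have no match (mirroring the INNER JOIN). The MainTable rows (ab_id 100 and 200) are hypothetical, since the question does not show that table. Note that SQLite compares 'ean' case-sensitively by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SideTable (ID INT, attrib TEXT, value TEXT, gid INT);
INSERT INTO SideTable VALUES
  (1, 'weight', '10', NULL), (1, 'ean', '123123123112', NULL), (1, 'color', 'blue', NULL),
  (2, 'weight', '5', NULL), (2, 'ean', '331231313123', NULL);

-- Hypothetical main table: only the EAN of ID 1 is present.
CREATE TABLE MainTable (ab_id INT, ab_ean TEXT, ab_archive TEXT);
INSERT INTO MainTable VALUES (100, '123123123112', '0'), (200, '999', '0');

-- For every attribute row, find the 'ean' row with the same ID and copy the
-- matching main-table id; keep the current gid when nothing matches.
UPDATE SideTable
SET gid = COALESCE((
    SELECT m.ab_id
    FROM SideTable AS ean
    JOIN MainTable AS m ON m.ab_ean = ean.value
    WHERE ean.ID = SideTable.ID
      AND ean.attrib = 'ean'
      AND m.ab_archive = '0'
      AND m.ab_ean != ''
), gid);
""")

result = conn.execute(
    "SELECT ID, attrib, gid FROM SideTable ORDER BY ID, attrib"
).fetchall()
print(result)
```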

As per the comment on the question, you need an UPDATE ... FROM statement (an UPDATE that joins to its source in the FROM clause).
A standard version looks like:
UPDATE
T
SET
T.col1 = OT.col1,
T.col2 = OT.col2
FROM
Some_Table T
INNER JOIN
Other_Table OT
ON
T.id = OT.id
WHERE
T.col3 = 'cool'
Applied to your query, carrying the pivoted id through the subquery and joining back to SideTable on it (SideTable itself has no EAN column):
update a
set a.gid = p.ab_id
from SideTable As a
Inner join (
select b.id, c.ab_id
from SideTable
pivot (max (value) for attrib in ([EAN],[MPN])) as b
join MainTable as c
on c.ab_ean = b.EAN
where b.EAN !='' AND c.ab_archive = '0') p ON a.id = p.id

Try breaking it down a bit more, like this:
update SideTable
set SideTable.gid = p.ab_id
FROM
(
select b.id, c.ab_id
from SideTable
pivot (max (value) for attrib in ([EAN],[MPN])) as b
join MainTable as c
on c.ab_ean = b.EAN
where b.EAN !='' AND c.ab_archive = '0'
) p
WHERE p.id = SideTable.id

Update table based on comparing data from different tables and priority

Say I have the below table which holds customer data:
DECLARE @customer TABLE (ref varchar(10), RepName varchar(10), City varchar(10))
INSERT INTO @customer
SELECT 'CustomerA', 'Tom', 'London' UNION ALL
SELECT 'CustomerC', 'John', 'London'
I also have two other tables, SourceA and SourceB, with the same structure, which hold customer data as well.
I have a script that compares the data in these three tables and inserts the differences into the table below:
DECLARE @diffs TABLE (ref varchar(10), existing_value varchar(100), new_value varchar(100), source_table varchar(100), column_name varchar(100))
INSERT INTO @diffs
SELECT 'CustomerA', 'Tom', 'Tom A', 'SourceA', 'RepName' UNION ALL
SELECT 'CustomerA', 'Tom', 'Tom Ax', 'SourceB', 'RepName' UNION ALL
SELECT 'CustomerC', 'London', 'New York', 'SourceA', 'City'
This table shows that the rep name in our customer table differs from both SourceA and SourceB: the current value is Tom, but SourceA has Tom A and SourceB has Tom Ax. It also shows a difference in city, but only in SourceA.
I use the table below to decide which source to use when updating the customer table:
DECLARE @temp TABLE (column_name varchar(100), source_to_use varchar(100), source_priority int)
INSERT INTO @temp
SELECT 'RepName', 'SourceA', 1 UNION ALL
SELECT 'RepName', 'SourceB', 2 UNION ALL
SELECT 'City', 'SourceB', 1 UNION ALL
SELECT 'City', 'SourceA', 2
Based on this, I need to update the rep name to Tom A and the city to New York according to source_priority. Before writing the update statement, I tried to get the right rows using this:
SELECT *
FROM @diffs d
LEFT OUTER JOIN @temp t ON t.column_name = d.column_name and d.source_table = t.source_to_use
AND source_priority = CASE WHEN EXISTS
(
SELECT source_priority
FROM @temp x
Where source_priority = 1 AND d.source_table = x.source_to_use
) THEN 1 ELSE 2 END
But this does not give me what I want. Is there any way to query these tables and update the customer table with the differences based on priority?
Thanks

One way I can think of: join the three tables into a single row per customer, with separate columns for each source. Then use CROSS APPLY to choose the value according to @temp. Here is an example for RepName that assumes @customer has a row for every customer:
select c.ref, repname.repname
from (select c.*, ca.repname as repname_a, ca.city as city_a,
cb.repname as repname_b, cb.city as city_b
from @customer c left join
SourceA ca
on c.ref = ca.ref left join
SourceB cb
on c.ref = cb.ref
) c cross apply
(select top 1
(case when t.source_to_use = 'SourceA' and c.repname_a is not null then c.repname_a
when t.source_to_use = 'SourceB' and c.repname_b is not null then c.repname_b
end) as repname
from @temp t
where t.column_name = 'RepName'
order by t.source_priority
) repname;

I'd go for a CTE. Something like this should work:
WITH cte AS (
SELECT d.*, t.source_priority
FROM #diffs d
LEFT OUTER JOIN #temp t
ON t.column_name = d.column_name
AND d.source_table = t.source_to_use
), mins AS (
SELECT ref, column_name, MIN(source_priority) source_priority
FROM cte
GROUP BY ref, column_name
)
SELECT cte.ref, cte.column_name, cte.new_value
FROM cte INNER JOIN mins
ON cte.ref = mins.ref
AND cte.column_name = mins.column_name
AND cte.source_priority = mins.source_priority
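The CTE answer stops at the SELECT. A minimal sketch of the whole flow, assuming SQLite via Python's sqlite3 instead of SQL Server table variables: the same priority logic picks one new value per (ref, column_name), and one UPDATE per target column applies it. The column is named new_value, as the CTE expects, and the priority table is renamed to the hypothetical temp_priority because TEMP is a keyword in SQLite. One UPDATE per column is a design choice here, since SQL cannot take a column name as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (ref TEXT, RepName TEXT, City TEXT);
INSERT INTO customer VALUES ('CustomerA', 'Tom', 'London'), ('CustomerC', 'John', 'London');

CREATE TABLE diffs (ref TEXT, existing_value TEXT, new_value TEXT,
                    source_table TEXT, column_name TEXT);
INSERT INTO diffs VALUES
  ('CustomerA', 'Tom', 'Tom A', 'SourceA', 'RepName'),
  ('CustomerA', 'Tom', 'Tom Ax', 'SourceB', 'RepName'),
  ('CustomerC', 'London', 'New York', 'SourceA', 'City');

-- Hypothetical name for the question's @temp table.
CREATE TABLE temp_priority (column_name TEXT, source_to_use TEXT, source_priority INT);
INSERT INTO temp_priority VALUES
  ('RepName', 'SourceA', 1), ('RepName', 'SourceB', 2),
  ('City', 'SourceB', 1), ('City', 'SourceA', 2);

-- Keep, per (ref, column), the diff whose source has the lowest priority
-- number among the sources that actually reported a difference.
CREATE TEMP TABLE winners AS
WITH cte AS (
    SELECT d.*, t.source_priority
    FROM diffs d
    LEFT JOIN temp_priority t
           ON t.column_name = d.column_name AND t.source_to_use = d.source_table
), mins AS (
    SELECT ref, column_name, MIN(source_priority) AS source_priority
    FROM cte GROUP BY ref, column_name
)
SELECT cte.ref, cte.column_name, cte.new_value
FROM cte JOIN mins
  ON cte.ref = mins.ref AND cte.column_name = mins.column_name
 AND cte.source_priority = mins.source_priority;

-- COALESCE keeps the current value when there is no winning diff.
UPDATE customer SET RepName = COALESCE(
    (SELECT new_value FROM winners w WHERE w.ref = customer.ref AND w.column_name = 'RepName'),
    RepName);
UPDATE customer SET City = COALESCE(
    (SELECT new_value FROM winners w WHERE w.ref = customer.ref AND w.column_name = 'City'),
    City);
""")

result = conn.execute("SELECT ref, RepName, City FROM customer ORDER BY ref").fetchall()
print(result)
```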

Realization of an algorithm in an SQL query

I have a database which has 2 tables:
CREATE TABLE RecipeDB (
RecipeID INTEGER PRIMARY KEY AUTOINCREMENT,
Name VARCHAR,
Recipe VARCHAR,
Origin VARCHAR,
Category VARCHAR,
Favoured BOOL);
CREATE TABLE IngredientDB (
RecipeID INTEGER REFERENCES RecipeDB(RecipeID),
Ingredient VARCHAR,
Quantity VARCHAR);
(One-to-many relation between Recipe and Ingredients)
I also have ActionScript code with ingArr:Array, an array of ingredient strings.
Now, I would like to implement the following queries:
1) Select (all fields) the one recipe that contains the most ingredients from the array. If more than one recipe has the same number of matches, divide the number of matches by the total number of ingredients in the recipe and return the one with the highest ratio. If there are no matches, return nothing.
2) As above, but return the 10 recipes with the most matches, without the tie-breaking check. Sort the results by the number of matches.
Any ideas how to compose those queries in SQLite?

(The SQL statements below are for SQLite.)
For the second query, you need the top 10 recipes that match the most ingredients.
What you need is to:
count the rows that match your ingredient list (use the IN operator)
order the result by that count in descending order (4, 3, 2, ...)
limit the result to 10
So the SQL statement looks like:
SELECT
r.RecipeId, COUNT(1) cnt, r.Name, r.Recipe, r.Origin, r.Category, r.Favoured
FROM
RecipeDB r
INNER JOIN IngredientDB i USING(RecipeID)
WHERE
i.Ingredient in ('ingr_1',..,'ingr_x')
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
Using AIR + AS3, it can be something like this:
var sqls:SQLStatement = new SQLStatement()
sqls.sqlConnection = YOUR SQL CONNECTION
// your ingredient list
var ingredients:Array = ['i2', 'i3', 'i4']
// use to build the in parameter array
var inParams:Array = []
// fill parameter values
for(var i:int = 0; i < ingredients.length; ++i) {
inParams[i] = '?'
sqls.parameters[i] = ingredients[i]
}
// build the query
var qry:String = "SELECT r.RecipeId, COUNT(1) cnt, r.Name, r.Recipe, r.Origin,"+
"r.Category, r.Favoured FROM RecipeDB r INNER JOIN IngredientDB i USING(RecipeID)"+
"WHERE i.Ingredient in (" + inParams.join(',') + ") GROUP BY 1 "+
"ORDER BY 2 DESC LIMIT 10"
// set the query
sqls.text = qry
//execute
sqls.execute()
For the first query, the idea is the same, but you also need to count all the ingredients in each recipe to compute the ratio matches / total.
What you need is to:
count the rows that match your ingredient list (use the IN operator)
compute a rank by dividing that count by the recipe's total number of ingredients
get the best match
limit the result to 1
So the SQL statement looks like:
SELECT
i1.RecipeId, (cast(rs.cnt as real) / cast (COUNT(1) as real)) rank,
rs.Name, rs.Recipe, rs.Origin, rs.Category, rs.Favoured
FROM
IngredientDB i1
INNER JOIN (
SELECT
r.RecipeId, COUNT(1) cnt, r.Name, r.Recipe, r.Origin, r.Category, r.Favoured
FROM
RecipeDB r
INNER JOIN IngredientDB i USING(RecipeID)
WHERE
i.Ingredient in ('ingr_1',..,'ingr_x')
GROUP BY 1
) rs USING (RecipeId)
GROUP BY 1
ORDER BY rs.cnt DESC, 2 DESC
LIMIT 1
Using the same approach as the previous AS3 example, the query can be built as:
var ingredients:Array = ['i2', 'i3', 'i4']
var inParams:Array = []
for(var i:int = 0; i < ingredients.length; ++i) {
inParams[i] = '?'
sqls.parameters[i] = ingredients[i]
}
var qry:String = "SELECT i1.RecipeId, (cast(rs.cnt as real) / cast (COUNT(1) as real)) rank,"+
"rs.Name, rs.Recipe, rs.Origin, rs.Category, rs.Favoured "+
"FROM IngredientDB i1 INNER JOIN ("+
"SELECT r.RecipeId, COUNT(1) cnt, r.Name, r.Recipe, r.Origin, r.Category, r.Favoured "+
"FROM RecipeDB r INNER JOIN IngredientDB i USING(RecipeID) "+
"WHERE i.Ingredient in (" + inParams.join(',') + ") GROUP BY 1) rs USING (RecipeId) "+
"GROUP BY 1 ORDER BY rs.cnt DESC, 2 DESC LIMIT 1"

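Both SQLite queries can also be tried from Python's sqlite3 module, binding one ? per ingredient the same way the ActionScript code builds its IN list. The recipes and ingredients below are hypothetical test data, and the best-recipe query orders by match count first and by the match/total ratio second, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RecipeDB (RecipeID INTEGER PRIMARY KEY AUTOINCREMENT, Name VARCHAR);
CREATE TABLE IngredientDB (RecipeID INTEGER REFERENCES RecipeDB(RecipeID),
                           Ingredient VARCHAR, Quantity VARCHAR);
-- Hypothetical data: recipes 1 and 2 both match two ingredients,
-- but recipe 2 has fewer ingredients in total.
INSERT INTO RecipeDB (Name) VALUES ('Soup'), ('Salad'), ('Stew');
INSERT INTO IngredientDB VALUES
  (1, 'i2', '1'), (1, 'i3', '1'), (1, 'i9', '1'),
  (2, 'i2', '1'), (2, 'i4', '1'),
  (3, 'i3', '1'), (3, 'i8', '1');
""")

ingredients = ["i2", "i3", "i4"]
placeholders = ",".join("?" * len(ingredients))  # "?,?,?"

# Query 2: up to 10 recipes ordered by number of matching ingredients.
top10 = conn.execute(f"""
    SELECT r.RecipeID, r.Name, COUNT(*) AS matches
    FROM RecipeDB r JOIN IngredientDB i USING (RecipeID)
    WHERE i.Ingredient IN ({placeholders})
    GROUP BY r.RecipeID
    ORDER BY matches DESC, r.RecipeID
    LIMIT 10
""", ingredients).fetchall()

# Query 1: most matches wins; ties are broken by matches / total ingredients.
# With no matches the inner query is empty, so fetchone() returns None.
best = conn.execute(f"""
    SELECT m.RecipeID, m.Name, m.matches,
           CAST(m.matches AS REAL) / COUNT(*) AS ratio
    FROM (SELECT r.RecipeID, r.Name, COUNT(*) AS matches
          FROM RecipeDB r JOIN IngredientDB i USING (RecipeID)
          WHERE i.Ingredient IN ({placeholders})
          GROUP BY r.RecipeID) m
    JOIN IngredientDB i2 USING (RecipeID)
    GROUP BY m.RecipeID
    ORDER BY m.matches DESC, ratio DESC
    LIMIT 1
""", ingredients).fetchone()

print(top10)
print(best)
```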
The queries can be tuned slightly, but the T-SQL below demonstrates the answers you're looking for in a fairly readable way.
BEGIN
-- setup test
DECLARE @one TABLE(id INT, name VARCHAR(10))
DECLARE @many TABLE(pid INT, name VARCHAR(10))
INSERT INTO @one VALUES
(1, 'AAA'),
(2, 'BBB'),
(3, 'CCC')
INSERT INTO @many VALUES
(1, 'x'),(1, 'y'),(1, 'z'),
(2, 'x'),(2, 'y'),
(3, 'z')
--
-- WHERE m.name IN ('x', 'y')
-- 'x', 'y' represent your list of ingredients
-- answer 1
SELECT * FROM @one WHERE id = (
SELECT TOP 1 x.id FROM (
SELECT o.id, COUNT(o.id) 'match', (SELECT COUNT(*) FROM @many WHERE pid=o.id) 'total' FROM @one o
INNER JOIN @many m ON o.id = m.pid
WHERE m.name IN ('x', 'y', 'z')
GROUP BY o.id
) as x ORDER BY x.match DESC, CAST(x.match AS FLOAT)/x.total DESC -- FLOAT avoids integer division
)
-- answer 2
SELECT * FROM @one WHERE id IN (
SELECT TOP 10 x.id FROM (
SELECT o.id, COUNT(o.id) 'match' FROM @one o
INNER JOIN @many m ON o.id = m.pid
WHERE m.name IN ('x', 'y')
GROUP BY o.id
) as x ORDER BY x.match DESC
)
END
