Contains function with string splitting - sql

I am trying to use a contains() function to do matching on two columns when joining tables.
I have two problems
Problem 1
The data looks as such:
col1: '["Red","Blue","Green","yes","purple","car","yellow"]'
col2: 'This Is Not Yellow'
Using contains(LOWER("col1"), LOWER("col2")) works for some rows, but not for the one above. I need to split col2 into words and look for each word individually in col1, which I am having trouble doing.
Problem 2
I also have cases that look like this:
col1: '["House","brick","purple","blue"]'
col2: 'Very big houses'
So again, col2 would need to be split and each word looked up individually in col1. In addition, "houses" would need to lose 1 character from the right (to make "house"), and other examples might need 2 characters removed.
For this I was inclined to put together a dictionary to swap out for the appropriate names, or use some sort of NLP stemmers technique to remove the plurals from the word.
Any help on either of those very welcome
Thanks!

try this code:
create table t1 (
color varchar(32)
);
insert into t1 values ('Red');
insert into t1 values ('Green');
insert into t1 values ('yes');
insert into t1 values ('purple');
insert into t1 values ('car');
insert into t1 values ('yellow');
insert into t1 values ('House');
insert into t1 values ('brick');
insert into t1 values ('blue');
create table t2(
words varchar(100)
);
insert into t2 values ('This Is Not Yellow');
insert into t2 values ('Very big houses');
select * from t1 join t2 on t2.words like concat('%',color, '%');
result is exactly what you want
color  | words
yellow | This Is Not Yellow
House  | Very big houses
By the way, I was using MySQL, where LIKE seems to ignore case. If case causes problems on your system, convert both sides to the same case:
select * from t1 join t2 on lower(t2.words) like
concat('%',lower(color), '%');
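The join above can be tried without a MySQL server. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in, so it is an assumption that your engine behaves the same way; SQLite spells concat() as ||. The helper name like_join and the third phrase are hypothetical and only there to show a side effect: the match is on substrings, which is why 'houses' matches 'House', but also why 'scary' matches 'car'.

```python
import sqlite3


def like_join(colors, phrases):
    """Return (color, words) pairs where the color occurs anywhere in the phrase."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (color TEXT)")
    conn.execute("CREATE TABLE t2 (words TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?)", [(c,) for c in colors])
    conn.executemany("INSERT INTO t2 VALUES (?)", [(p,) for p in phrases])
    # lower() on both sides makes the comparison case-insensitive explicitly,
    # instead of relying on the engine's default LIKE behaviour.
    rows = conn.execute(
        "SELECT color, words FROM t1 JOIN t2 "
        "ON lower(t2.words) LIKE '%' || lower(t1.color) || '%'"
    ).fetchall()
    conn.close()
    return rows


colors = ["Red", "Green", "yes", "purple", "car", "yellow", "House", "brick", "blue"]
# The last phrase is hypothetical, added to show an unwanted substring match.
phrases = ["This Is Not Yellow", "Very big houses", "A scary film"]
matches = like_join(colors, phrases)
```

If whole-word matching is required, the phrase would have to be split into words first, as the question suggests.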

Related

How to string match in SQL(Oracle)

TABLE_1 has 5 strings like these : 'AK___', 'AB_DE', 'AB__E', 'AE__E', 'AF___'
One underscore stands for any one letter or number.
If given 'ABZDE',
is there a way to select 'AB_DE', 'AB__E' in my Table_1?
create table TABLE_1 ( modelname varchar2(10) )
INSERT INTO Table_1 VALUES ('AK___')
INSERT INTO Table_1 VALUES ('AB_DE')
INSERT INTO Table_1 VALUES ('AB__E')
INSERT INTO Table_1 VALUES ('AE__E')
INSERT INTO Table_1 VALUES ('AF___')
SELECT *
FROM Table_1
WHERE modelcode like 'AEZDE' --of course, this select clause doesn't work as I expected.
The wildcards are stored in the modelname column, so the column needs to be the right-hand argument of the LIKE operator:
SELECT *
FROM Table_1
WHERE 'AEZDE' like modelname
-- Here ------------^
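To check the reversed LIKE against the sample data, here is a small sketch that uses Python's sqlite3 module in place of Oracle. That substitution is an assumption: both treat _ as "exactly one character" in a LIKE pattern, but other details may differ. The helper name matching_patterns is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE_1 (modelname TEXT)")
conn.executemany(
    "INSERT INTO TABLE_1 VALUES (?)",
    [("AK___",), ("AB_DE",), ("AB__E",), ("AE__E",), ("AF___",)],
)


def matching_patterns(conn, value):
    """Return the stored patterns that the given value satisfies."""
    # The literal value is on the left, the column holding the
    # wildcard patterns is on the right of LIKE.
    rows = conn.execute(
        "SELECT modelname FROM TABLE_1 WHERE ? LIKE modelname ORDER BY modelname",
        (value,),
    )
    return [row[0] for row in rows]


found = matching_patterns(conn, "ABZDE")
```

With the question's input 'ABZDE' this selects the two patterns the asker expected, while 'AEZDE' (the value used in the query) only matches 'AE__E'.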
I think you're misusing the LIKE operator; it needs the wildcard characters. I believe the structure of the query you're looking for is:
SELECT *
FROM Table_1
WHERE modelcode like '%AE%DE%'
But I'm not sure. Do the letters change between your question and your code?

How to define destination for an append query Microsoft Access

I'm trying to append two tables in MS Access. This is the SQL view of my query at the moment:
INSERT INTO MainTable
SELECT
FROM Table1 INNER JOIN Table2 ON Table1.University = Table2.University;
Where "University" is the only field name that would have similarities between the two tables. When I try and run the query, I get this error:
Query must have at least one destination field.
I assumed that the INSERT INTO MainTable portion of my SQL was defining the destination, but apparently I am wrong. How can I specify my destination?
Your SELECT statement must actually select something:
INSERT INTO MainTable
SELECT col1, col2
FROM Table1 INNER JOIN Table2 ON Table1.University = Table2.University;
Besides Luke Ford's answer (which is correct), there's another gotcha to consider:
MS Access (at least Access 2000, where I just tested it) seems to match the columns by name.
In other words, when you execute the query from Luke's answer:
INSERT INTO MainTable
SELECT col1, col2
FROM ...
...MS Access assumes that MainTable has two columns named col1 and col2, and tries to insert col1 from your query into col1 in MainTable, and so on.
If the column names in MainTable are different, you need to specify them in the INSERT clause.
Let's say the columns in MainTable are named foo and bar, then the query needs to look like this:
INSERT INTO MainTable (foo, bar)
SELECT col1, col2
FROM ...
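The explicit column list can be illustrated with a small runnable sketch. It uses Python's sqlite3 module rather than MS Access, which is an assumption worth stating: SQLite maps SELECT columns to the target by position, not by name, so it does not reproduce the Access behaviour described above. Naming the destination columns works the same way in both. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (University TEXT, col1 TEXT);
    CREATE TABLE Table2 (University TEXT, col2 TEXT);
    CREATE TABLE MainTable (foo TEXT, bar TEXT);
    -- Hypothetical sample rows; only university 'A' is in both tables.
    INSERT INTO Table1 VALUES ('A', 'a1'), ('B', 'b1');
    INSERT INTO Table2 VALUES ('A', 'a2'), ('C', 'c2');
    """
)

# Name the destination columns explicitly so the mapping does not
# depend on the source columns being called foo and bar.
conn.execute(
    "INSERT INTO MainTable (foo, bar) "
    "SELECT Table1.col1, Table2.col2 "
    "FROM Table1 INNER JOIN Table2 ON Table1.University = Table2.University"
)
main_rows = conn.execute("SELECT foo, bar FROM MainTable ORDER BY foo").fetchall()
```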
As other users have mentioned, your SELECT statement is empty. If you'd like to select more than just col1 and col2, that is possible. If you want to select all columns in the two tables that are to be appended, you can use SELECT *, which selects everything in the tables.

SQL Server - Contain Multiples Values

I need to retrieve the value of a column with SELECT, but the column can hold multiple values.
I don't know in advance which checkboxes the user will select.
Ex:
Insert Into MyTable (dados) Values ('a1') I want the result = Angulo 1
Insert Into MyTable (dados) Values ('a2';'a3') I want the result = Angulo 2
Insert into MyTable (dados) Values ('a3'; a1) I want the result = Angulo 3; Angulo 1
Insert into MyTable (dados) Values ('a6'; 'a7'; 'a4') I want the result = Angulo 6; Angulo 7;Angulo4
I am trying with SELECT CASE WHEN, but it still fails.
I suspect you are asking how to use the IN keyword in your SELECT statements? It is a little unclear what you are trying to do.
Try this:
SELECT *
FROM MyTable
WHERE dados IN ('a6','a7','a4')
Assuming you have a table named MyTable and a column named dados with 3 rows in that table for a6, a7 and a4, this will return all the matches (in this case, all three rows).
Good luck.
When you say:
insert into MyTable(dados)
Values ('a6', 'a7', 'a4')
You are saying "I have one column to put data into called dados." Then, you are providing three values. This will fail in any database (even apart from the fact that the semicolons should be commas).
Perhaps you want:
insert into MyTable(dados)
Values ('a6;a7;a4')
That is only one value, a string.
This suggests a denormalized database. You might want three different rows in a table, one for each value, connected together by some key.
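As a rough sketch of that normalized layout, the following uses Python's sqlite3 module instead of SQL Server, and every name in it (the selections table, the form_id key, the labels helper) is hypothetical. Each selected checkbox becomes its own row, and the 'Angulo N' text is built when reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per selected value; form_id ties the rows of one submission together.
    CREATE TABLE selections (form_id INTEGER, dados TEXT);
    INSERT INTO selections VALUES
        (1, 'a1'),
        (2, 'a2'), (2, 'a3'),
        (3, 'a6'), (3, 'a7'), (3, 'a4');
    """
)


def labels(conn, form_id):
    """Build the display text for one submission, in insertion order."""
    rows = conn.execute(
        # substr(dados, 2) drops the leading 'a' and keeps the number.
        "SELECT 'Angulo ' || substr(dados, 2) FROM selections "
        "WHERE form_id = ? ORDER BY rowid",
        (form_id,),
    )
    return "; ".join(row[0] for row in rows)


third = labels(conn, 3)
```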
Here are some examples if you're using SQL Server 2008 or later:
if(OBJECT_ID('tempdb..#dados') is not null)
DROP TABLE #dados
select top 100 * INTO #dados FROM
(
values(1,2,3),
(4,5,6),
(7,8,9)
) t(a,b,c)
select * FROM #dados
INSERT INTO #dados (a,b,c)
values(11,22,33),
(44,55,66),
(77,88,99)
SELECT * FROM #dados
INSERT INTO #dados (a,b,c)
SELECT * FROM
(
values(111,222,333),
(444,555,666),
(777,888,999)
) t(a,b,c)
SELECT * FROM #dados
If you want to insert multiple rows (not columns) the syntax is
Insert Into
MyTable (dados)
Values
('a1'),
('a2')
Looks like you're trying to ask for two things.
How to insert multiple values would be done in the following way:
Insert Into MyTable (dados) Values ('a6'),('a7'),('a4')
If you want to return the actual values 'Angulo' + the number, you can use the following:
CREATE TABLE MyTable
(
Dados varchar(255)
)
Insert Into MyTable (dados) Values ('a12')
Insert Into MyTable (dados) Values ('a2'),('a3')
Insert Into MyTable (dados) Values ('a3'),('a1')
Insert Into MyTable (dados) Values ('a6'),('a7'),('a4')
SELECT 'Angulo'+ SUBSTRING(dados,PATINDEX('%[0-9]%',dados),LEN(dados))
FROM MyTable
It finds the first digit (assuming the number you're after always comes first) and takes the rest of the string from there. It then prepends the prefix 'Angulo' (e.g. Angulo1, Angulo7, etc.).
If these aren't what you're after, please explain further what you need.

Short but tough SQL Query (T-SQL, SQL Server)

Suppose I have 2 tables, and each table has N columns. There are NO duplicate rows in table1.
And now we want to know what datasets in table2 (including duplicates) are also contained in table1.
I tried
select * from table1
intersect
select * from table2
But this only gives me unique rows that are in both tables. I don't want unique rows; I want to see all rows in table2 that are in table1...
Keep in mind!! I cannot do
select *
from table1 a, table2 b
where a.table1col = b.table2col
...because I don't know the number of columns of the tables at runtime.
Sure I could do something with dynamic SQL and iterate over the column numbers but I'm asking this precisely because it seems too simple a query for that kind of stuff..
Example:
create table table1 (table1col int)
create table table2 (table2col int)
insert into table1 values (8)
insert into table1 values (7)
insert into table2 values (1)
insert into table2 values (8)
insert into table2 values (7)
insert into table2 values (7)
insert into table2 values (2)
insert into table2 values (9)
I want my query then to return:
8
7
7
If the number of columns is not known, you will have to resort to a value computed over a whole row to make a match.
One such function is CHECKSUM.
Returns the checksum value computed over a row of a table, or over a
list of expressions. CHECKSUM is intended for use in building hash
indices.
SQL Statement
SELECT tm.*
FROM (
SELECT *, CS = CHECKSUM(*) -- keep the row's own columns so tm.* returns them
FROM Table2
) tm
INNER JOIN (
SELECT CS = CHECKSUM(*)
FROM Table2
INTERSECT
SELECT CHECKSUM(*)
FROM Table1
) ti ON ti.CS = tm.CS
Note that CHECKSUM might introduce collisions. You will have to test for that before doing any operation on your data.
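One way to rule out checksum collisions is to compare the real column values, pairing the columns of both tables by position using metadata read at runtime. The following is a minimal sketch of that idea, not the CHECKSUM approach itself. It uses Python's sqlite3 module as a stand-in for SQL Server (SQLite has no CHECKSUM), the helper names column_names and rows_also_in are hypothetical, and the table names are assumed to come from trusted code because they are interpolated into the SQL text.

```python
import sqlite3


def column_names(conn, table):
    """Read the column names of a table at runtime."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


def rows_also_in(conn, candidates, reference):
    """Return every row of `candidates` (duplicates included) found in `reference`."""
    cand_cols = column_names(conn, candidates)
    ref_cols = column_names(conn, reference)
    if len(cand_cols) != len(ref_cols):
        raise ValueError("tables must have the same number of columns")
    # Pair columns by position; IS is SQLite's NULL-safe equality.
    condition = " AND ".join(
        f'r."{rc}" IS c."{cc}"' for cc, rc in zip(cand_cols, ref_cols)
    )
    sql = (
        f'SELECT c.* FROM "{candidates}" AS c '
        f'WHERE EXISTS (SELECT 1 FROM "{reference}" AS r WHERE {condition})'
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (table1col INTEGER)")
conn.execute("CREATE TABLE table2 (table2col INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(8,), (7,)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(1,), (8,), (7,), (7,), (2,), (9,)])
result = rows_also_in(conn, "table2", "table1")
```

This is the dynamic-SQL route the question hoped to avoid, so it is better seen as a verification step for the checksum matches than as a replacement.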
Edit
If you are using SQL Server 2005 or later, you might make this a bit more robust by throwing in HASHBYTES.
The downside of HASHBYTES is that you need to specify the columns on which you want to operate, but for all the columns you do know up front, you could use it to prevent collisions.
EXCEPT vs INTERSECT
EXCEPT returns any distinct values from the left query that are not also found on the right query.
INTERSECT returns any distinct values that are returned by both the query on the left and right sides of the INTERSECT operand.
Maybe EXCEPT can solve your problem

How to append distinct records from one table to another

How do I append only distinct records from a master table to another table when the master may have duplicates? For example, I only want distinct records in the smaller table, but I need to insert/append records to what I already have in the smaller table.
Ignoring any concurrency issues:
insert into smaller (field, ... )
select distinct field, ... from bigger
except
select field, ... from smaller;
You can also rephrase it as a join:
insert into smaller (field, ... )
select distinct b.field, ...
from bigger b
left join smaller s on s.key = b.key
where s.key is NULL
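The EXCEPT form from this answer can be exercised with a small sketch. It runs against Python's sqlite3 module, which is an assumption about your setup (SQLite and SQL Server both spell the operator EXCEPT, while Oracle uses MINUS). The single-column tables, sample values, and the helper name append_missing are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bigger (field INTEGER);
    CREATE TABLE smaller (field INTEGER);
    INSERT INTO bigger VALUES (2), (2), (3), (3), (4);  -- contains duplicates
    INSERT INTO smaller VALUES (1), (2);                -- already has 2
    """
)


def append_missing(conn):
    """Append the distinct rows of bigger that smaller does not have yet."""
    conn.execute(
        "INSERT INTO smaller (field) "
        "SELECT DISTINCT field FROM bigger "
        "EXCEPT "
        "SELECT field FROM smaller"
    )


def smaller_rows(conn):
    return [row[0] for row in conn.execute("SELECT field FROM smaller ORDER BY field")]


append_missing(conn)
after_first = smaller_rows(conn)
append_missing(conn)  # running it again adds nothing
after_second = smaller_rows(conn)
```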
If you don't like NOT EXISTS and EXCEPT/MINUS (cute, Remus!), there is also a LEFT JOIN solution:
INSERT INTO smaller(a,b)
SELECT DISTINCT master.a, master.b FROM master
LEFT JOIN smaller ON smaller.a=master.a AND smaller.b=master.b
WHERE smaller.pkey IS NULL
You don't say the scale of the problem so I'll mention something I recently helped a friend with.
He works for an insurance company that provides supplemental dental and vision benefits management for other insurance companies. When they get a new client, they also get a new database that can have tens of millions of records. They wanted to identify all possible dupes against the data they already had in a master database of hundreds of millions of records.
The solution we came up with was to identify two distinct combinations of field values (normalized in various ways) that would indicate a high probability of a dupe. We then created a new table containing MD5 hashes of the combos plus the id of the master record they applied to. The MD5 columns were indexed. All new records would have their combo hashes computed and if either of them had a collision with the master the new record would be kicked out to an exceptions file for some human to deal with it.
The speed of this surprised the hell out of us (in a nice way) and it has had a very acceptable false-positive rate.
You could use the distinct keyword to filter out duplicates:
insert into AnotherTable
(col1, col2, col3)
select distinct col1, col2, col3
from MasterTable
Based on Microsoft SQL Server and its Transact-SQL. Untested as always, and it assumes target_table has the same number of columns as source_table (otherwise list the column names between INSERT INTO and SELECT):
INSERT INTO target_table
SELECT DISTINCT s.row1, s.row2
FROM source_table AS s
WHERE NOT EXISTS(
-- correlate the subquery so only rows missing from target_table are inserted
SELECT 1
FROM target_table AS t
WHERE t.row1 = s.row1 AND t.row2 = s.row2)
Something like this would work for SQL Server (you don't mention what RDBMS you're using):
INSERT INTO table (col1, col2, col3)
SELECT DISTINCT t2.a, t2.b, t2.c
FROM table2 AS t2
WHERE NOT EXISTS (
SELECT 1
FROM table
WHERE table.col1 = t2.a AND table.col2 = t2.b AND table.col3 = t2.c
)
Tune where appropriate, depending on exactly what defines "distinctness" for your table.