How to compare string data to table data in SQL Server - I need to know if a value in a string doesn't exist in a column

I have two tables: one is an import table, the other is referenced by a FK constraint on the table the import data will eventually be loaded into. In the import table a user can provide a list of semicolon-separated values that correspond to values in the 2nd table.
So we're looking at something like this:
TABLE 1
ID | Column1
1 | A; B; C; D
TABLE 2
ID | Column2
1 | A
2 | B
3 | D
4 | E
The requirement is:
Rows in TABLE 1 with a value not in TABLE 2 (C in our example) should be marked as invalid for manual cleanup by the user. Rows where all values are valid are handled by another script that already works.
In production we'll be dealing with 6 columns that need to be checked and imports of AT LEAST 100k rows at a time. As a result I'd like to do all the work in the DB, not in another app.
BTW, it's SQL2008.
I'm stuck. Does anyone have any ideas? Thanks!

Seems to me you could pass ID & Column1 values from Table1 to a Table-Valued function (or a temp table in-line) which would parse the ;-delimited list, returning individual values per record.
Here are a couple options:
T-SQL: Parse a delimited string
Quick T-Sql to parse a delimited string
The result (ID, value) from the function could be used to compare (unmatched query) against values in Table 2.
SELECT DISTINCT tmp.ID
FROM tmp
LEFT JOIN Table2 ON Table2.Column2 = tmp.value
WHERE Table2.Column2 IS NULL
The ID results of the comparison would then be used to flag records in Table 1.
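A minimal sketch of that approach for SQL Server 2008, which has no built-in split function. It splits the list by casting it to XML, so it assumes the values never contain XML-special characters such as & or <. The table variables @Table1 and @Table2 and the CTE name Split are made up for this illustration and loaded with sample data:

```sql
-- Sample data; @Table1 and @Table2 are stand-ins for the real tables.
DECLARE @Table1 TABLE (ID int, Column1 varchar(200));
DECLARE @Table2 TABLE (ID int, Column2 varchar(50));
INSERT INTO @Table1 (ID, Column1) VALUES (1, 'A; B; C; D'), (2, 'A; D');
INSERT INTO @Table2 (ID, Column2) VALUES (1, 'A'), (2, 'B'), (3, 'D'), (4, 'E');

WITH Split AS (
    -- Turn 'A; B; C' into <v>A</v><v> B</v><v> C</v> and shred it into rows.
    SELECT t.ID,
           LTRIM(RTRIM(x.n.value('.', 'varchar(50)'))) AS Value
    FROM (
        SELECT ID,
               CAST('<v>' + REPLACE(Column1, ';', '</v><v>') + '</v>' AS xml) AS Doc
        FROM @Table1
    ) AS t
    CROSS APPLY t.Doc.nodes('/v') AS x(n)
)
-- Unmatched query: IDs that have at least one value missing from @Table2.
SELECT DISTINCT s.ID
FROM Split AS s
LEFT JOIN @Table2 AS t2 ON t2.Column2 = s.Value
WHERE t2.Column2 IS NULL;
```

With this sample data the query returns only ID 1, because C is missing from the second table. For large imports it may be worth writing the split rows to an indexed temp table first and running the unmatched query against that.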

Perhaps inserting those composite values into 'TABLE 1' seemed like the most convenient solution at one time. However, unless your users are using SQL Server Management Studio or something similar to enter the values directly into the table, I assume there must be a software layer between the UI and the database. If so, you're going to save yourself a lot of headaches both now and in the long run by investing a little time in altering your code to split the semicolon-delimited inputs into discrete values before inserting them into the database. This will result in 'TABLE 1' looking something like this
TABLE 1
ID | Column1
1 | A
1 | B
1 | C
1 | D
It's then trivial to write the SQL to find those IDs which are invalid.
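For instance, assuming [TABLE 1] holds one value per row as laid out above, a query along these lines should list the invalid IDs (a sketch, not tested against the real schema):

```sql
-- IDs in [TABLE 1] that have at least one value with no match in [TABLE 2].
SELECT DISTINCT T1.ID
FROM [TABLE 1] AS T1
WHERE NOT EXISTS (
    SELECT 1
    FROM [TABLE 2] AS T2
    WHERE T2.Column2 = T1.Column1
);
```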

If it is possible, try putting the values in separate rows when importing (instead of storing it as ; separated).
This might help.

Here is an easy and straightforward solution for the IDs of the invalid rows, despite its lack of performance because of string manipulations.
select T1.ID
from [TABLE 1] T1
left join [TABLE 2] T2
on ('; ' + T1.COLUMN1 + '; ') like ('%; ' + T2.COLUMN2 + '; %')
where T1.COLUMN1 is not null
group by T1.ID, T1.COLUMN1
having count(T2.COLUMN2) < len(T1.COLUMN1) - len(replace(T1.COLUMN1, ';', '')) + 1
There are two assumptions:
The semicolon-separated list does not contain duplicates
TABLE 2 does not contain duplicates in COLUMN2.
The second assumption can easily be fixed by using (select distinct COLUMN2 from [TABLE 2]) rather than [TABLE 2].

Related

Joining two tables in SQL in which one column has to be "cleaned"

I need to join two tables in SQL that have two related columns (column ID1 in Table 1 and column ID2 in Table 2). ID1 in table 1 consists of 6 digits, whereas ID2 in table 2 consists of 6 digits with an additional quotation mark (") at the beginning and end of the string. I need to remove these quotation marks and join the two tables to check whether any values occur in both columns.
I know how to remove first and last character of the string in table 2:
SELECT SUBSTRING ([ID2],2,Len([ID2])-2) FROM [dbo].[table2]
I need to join this new "trimmed" column with the other column from table 1.
Any suggestions?

Assuming you are using MS SQL Server and need everything from table1 plus the matching rows from table2:
sample:
table1 | table2
[ID1] | [ID2]
547832 | "547832"
-----------------------------
select table1.*, table2.*
from
db.tb1 table1
left join
db.tb2 table2
on
table1.[ID1] = SUBSTRING(table2.[ID2], 2, LEN(table2.[ID2]) - 2);
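If the goal is only to see which values occur in both columns, an inner join may be closer to what is needed, since it drops the rows without a match. A small sketch, using table variables and sample rows invented for the illustration, and REPLACE instead of SUBSTRING (note that REPLACE strips every quotation mark in the string, not just the first and last character):

```sql
-- @table1 and @table2 are hypothetical stand-ins for the real tables.
DECLARE @table1 TABLE (ID1 varchar(6));
DECLARE @table2 TABLE (ID2 varchar(8));
INSERT INTO @table1 (ID1) VALUES ('547832'), ('111111');
INSERT INTO @table2 (ID2) VALUES ('"547832"'), ('"222222"');

-- Only IDs present in both tables survive the inner join.
SELECT t1.ID1
FROM @table1 AS t1
INNER JOIN @table2 AS t2
    ON t1.ID1 = REPLACE(t2.ID2, '"', '');
```

With these sample rows the query returns 547832 only.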

First give the trimmed column its own name with AS. A column alias cannot be referenced in the ON clause of the same SELECT, so wrap that SELECT in a derived table and join to it.
Try something like the below
syntax: SELECT SUBSTRING(columnname, position, length) AS Newcolumnname FROM Tablename;
EX: SELECT c.Newstr, Table2.name
FROM (SELECT SUBSTRING(customerName, 1, 5) AS Newstr FROM Customer) AS c
JOIN Table2 ON c.Newstr = Table2.name;

I am using MS SQL, yes.
Thanks for the reply. However, why is it a left join and not an inner join here? Just curious.
So, essentially what I need to do is:
In the first table I have around 10 columns, and in the second table I have 5 columns. They all have different names; ID was just used as an example. Two of the columns from table 2 appear to have similar values to two of the columns from table 1 (one is an ID of 6 digits, the other is names). I want to remove the first and last character of the ID column in table 2 and join that and the names column with ID and names from table 1. Hope it makes sense.

Oracle Compare data between two different table

I have two tables. In one, every field is VARCHAR2; the other uses a different type for each kind of data.
For Example :
Table One
==========================
Col 1 VARCHAR2 UNIQUE KEY
Col 2 VARCHAR2
Col 3 VARCHAR2
===========================
Table Two
==========================
Col One VARCHAR2 UNIQUE KEY
Col Two TIMESTAMP
Col Three NUMBER
==========================
We also have a mapping table. It says which column of Table One has to be compared with which column of Table Two.
For Example
Mapping Table
==============================
Table One Table Two
==============================
Col 1 Col One
Col 2 Col Three
Col 3 Col Two
==============================
Now, using the UNIQUE KEY of Table One, we have to find the same row in Table Two, compare the rows column by column, and get the changes in data.
Currently we use a Java program that compares the data row by row and column by column and reports the differences between rows with the same UNIQUE KEY. It works fine but takes too much time, as we have 100000 records in the DB.
Now my question is: is there any way I can compare the data at the SQL level and get the changes in data?

You can do it 'manually' with a query like this: It's a lot of work, but there are only three different types of checks you need to do, so it's not very complex:
select
*
from
Table1 t1
full outer join Table2 t2 on t2.ID = t1.ID
where
-- Check ID: the record is missing from one of the two tables.
t1.ID is null or
t2.ID is null or
-- Not nullable field can be easily compared.
t1.NotNullableField1 <> t2.NotNullableField1 or
-- Nullable field is slightly more work.
t1.NullableField1 <> t2.NullableField1 or
(t1.NullableField1 is null and t2.NullableField1 is not null) or
(t1.NullableField1 is not null and t2.NullableField1 is null)
Another solution is to use MINUS, which is a bit like UNION, only it returns a dataset minus the records in a second dataset:
select * from Table1 t1
MINUS
select * from Table2 t2
This works only one way (which might be fine for your purpose), but you can also combine it with UNION to make it bidirectional.
select
*
from
( select * from Table1
MINUS
select * from Table2)
UNION ALL
( select * from Table2
MINUS
select * from Table1)
The output of both solutions is a bit different.
In the FULL OUTER JOIN query, the IDs will be joined and the values of the matching rows will be displayed next to each other as a single row.
In the MINUS query, the result will be presented as a single dataset. If a record exists in only one of the tables, it will be displayed. If a record (ID) exists in both tables, but other fields are different, you will get both rows. So it's a bit harder to compare them.
See: http://www.techonthenet.com/oracle/minus.php
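Applied to the tables in the question, where one side is all VARCHAR2, the text columns have to be converted before they can be compared. A rough Oracle sketch under several assumptions: the identifiers table_one, table_two, col_1 and so on are placeholders for the real names, the pairs follow the mapping table from the question, the timestamp text is stored as 'YYYY-MM-DD HH24:MI:SS', and every value converts cleanly (TO_NUMBER and TO_TIMESTAMP raise an error otherwise). DECODE is used because it treats two NULLs as equal:

```sql
-- Rows that exist in both tables but differ in at least one mapped column.
SELECT t1.col_1,
       t1.col_2, t2.col_three,
       t1.col_3, t2.col_two
FROM table_one t1
JOIN table_two t2
  ON t2.col_one = t1.col_1
WHERE DECODE(TO_NUMBER(t1.col_2), t2.col_three, 0, 1) = 1
   OR DECODE(TO_TIMESTAMP(t1.col_3, 'YYYY-MM-DD HH24:MI:SS'), t2.col_two, 0, 1) = 1;
```

Rows missing from one side would still need the FULL OUTER JOIN or MINUS approach above.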

Compare comma separated list with individual row in table

I have to compare comma separated values with a column in the table and find out which values are not in database. [kind of master data validation]. Please have a look at the sample data below:
table data in database:
id name
1 abc
2 def
3 ghi
SQL part :
Here I am getting a comma-separated list like ('abc','def','ghi','xyz').
Now xyz is an invalid value, so I want to take that value and return it as output saying "invalid value".
It is possible if I split those values, put them in a temp table, loop through each value and compare one by one.
But is there a more optimal way to do this?

I'm not sure if I got the question right, however, I would personally be trying to get to something like this:
SELECT
D.id,
CASE
WHEN B.Name IS NULL THEN D.name
ELSE 'invalid value'
END
FROM
data AS D
LEFT JOIN badNames AS B ON B.Name = D.Name
-- if the columns use a case-insensitive collation, the equals sign matches regardless of case
There is one table with bad names, or invalid values if you prefer. This can be a temporary table as well, depending on usage (black-listed words should be a table, ad hoc invalid values provided by a service should be a temp table, etc.).
NOTE: The select above can be nested in a view, so the data remain as they were, yet you gain the correctness information. Otherwise I would create a cursor inside a function that would go through the select like the one above and alter the original data, if that is the goal...

It sounds like you just need a NOT EXISTS / LEFT JOIN, as in:
SELECT tmp.InvalidValue
FROM dbo.HopeThisIsNotAWhileBasedSplit(@CSVlist) tmp
WHERE NOT EXISTS (
SELECT *
FROM dbo.[Table] tbl
WHERE tbl.Field = tmp.InvalidValue
);
Of course, depending on the size of the CSV list coming in, the number of rows in the table you are checking, and the style of splitter you are using, it might be better to dump the CSV to a temp table first (as you mentioned doing in the question).
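The temp table variant could look roughly like this. It is only a sketch: #Incoming is a made-up name, dbo.YourTable stands in for the table from the question, and the INSERT uses hard-coded sample values where the real code would load the split list:

```sql
-- Stage the incoming values; the primary key gives the lookup an index.
CREATE TABLE #Incoming (name varchar(50) NOT NULL PRIMARY KEY);
INSERT INTO #Incoming (name) VALUES ('abc'), ('def'), ('ghi'), ('xyz');

-- Values from the list that are not present in the master table.
SELECT i.name AS InvalidValue
FROM #Incoming AS i
WHERE NOT EXISTS (
    SELECT *
    FROM dbo.YourTable AS t   -- hypothetical table holding id, name
    WHERE t.name = i.name
);

DROP TABLE #Incoming;
```

With the sample data from the question (abc, def, ghi in the table) this returns xyz.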

Try following query:
SELECT SplitedValues.name,
CASE WHEN YourTable.Id IS NULL THEN 'invalid value' ELSE NULL END AS Result
FROM SplitedValues
LEFT JOIN yourTable ON SplitedValues.name = YourTable.name

MS SQL - Joining on two tables with a substringed key in one column

I have 2 tables I need to join; however, on one of the tables I need to extract a key from a varchar field in each row.
Table 1 Description (numeric 18,varchar 4000)
descriptionid description
1 Blah Blah: Queue 1Blah Blah
2 foobar:Queue 2
3 rem:Queue 2 -This is a note
4 Anotherrow: Queue 3
5 Something else
Table 2 Queue - (numeric 18, varchar 100)
queueid queue
123 Queue 1
124 Queue 2
127 Queue 3
129 Queue 4
So I need to produce the output like so
View 3 Queue-Description (numeric 18, numeric 18)
descriptionid queueid
1 123
2 124
3 124
4 127
5 null
So in table 1 row 1, I need to strip out the value Queue 1 from the description, verify it is in the queue table, and look up the queueid.
I am unable to change the structure of tables 1 and 2.
What ways can this be achieved in MSSQL?
What is the most efficient way to do this in SQL - using MSSQL 2005 here.

most efficient way
Well... don't know about that but it is a way.
select T1.descriptionid,
T2.queueid
from Table1 as T1
left outer join Table2 as T2
on T1.description like '%'+T2.queue+'%'
Another way
select T1.descriptionid,
T2.queueid
from Table1 as T1
left outer join Table2 as T2
on charindex(T2.queue, T1.description, 1) > 0
If there is more than one match (see comment by Ed Harper) you can use this to pick the one with the longest match.
select T1.descriptionid,
T2.queueid
from Table1 as T1
outer apply (
select top 1 T3.queueid
from Table2 as T3
where charindex(T3.queue, T1.description, 1) > 0
order by len(T3.queue) desc
) as T2(queueid)

The most efficient way to do this is to add an extra column to your table and store the ID extracted from the string in it. You can do this when rows are added, and you can process the existing ones fairly easily. But trying to left join like this will be very slow.
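As a rough sketch of that idea, assuming the table structure may be changed after all (the question says it may not) and reusing the Table1/Table2 names from the answer above. The new column name queueid on Table1 is an assumption:

```sql
-- One-time schema change: a place to store the extracted key.
ALTER TABLE Table1 ADD queueid numeric(18) NULL;
GO

-- Backfill existing rows, picking the longest matching queue name per row.
UPDATE T1
SET queueid = M.queueid
FROM Table1 AS T1
OUTER APPLY (
    SELECT TOP 1 T2.queueid
    FROM Table2 AS T2
    WHERE CHARINDEX(T2.queue, T1.description) > 0
    ORDER BY LEN(T2.queue) DESC
) AS M;
```

After the backfill, the view becomes a plain select of descriptionid and queueid, and the column can be indexed.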

SQL Server 2005 has no built-in regex support in T-SQL, but you can extract your queue string with regular expressions through CLR integration. The Data Extraction section on this page contains an example.
In a stored procedure you can then build an indexed temp table that contains a new column (this allows you to do this without changing the table metadata).
If you can change the table metadata you can:
Use a trigger to write the extracted value into another column (on insert).
Or, if the information is not needed immediately, a daily SQL job could extract the information.

Matching records with wild cards from two different tables

I have two tables with the following data (amongst other data).
Table 1
Value 1
'003232339639
'00264644106272
0026461226291#
I need to match the second column in the table below using column 1 as an identifier
Table 2
Value 1 Value 2
00264 1
0026485 2
0026481 3
00322889 4
00323283 5
00323288 6
So the results I need will be as follows:
Result
Table 1, Value 1 Table 2, Value 2
'003232339639......4
'00264644106272....1
0026461226291#.....1
Any help will be appreciated - very stuck here and doing it manually at the moment in excel.
I hope this format makes sense - first time I am using this forum.

Melany, the question is kind of confusing (not written clearly), perhaps that's why no one is responding. I'll make an attempt to explain how similar selects are done.
SELECTING DATA FROM TABLE1 WHERE A MATCHING COLUMN (COL1) EXISTS IN BOTH TABLES
SELECT * FROM TABLE1
INNER JOIN TABLE2
ON TABLE1.COL1 = TABLE2.COL1
AND TABLE1.COL1 = 'XYZ'
USING A SUBSELECT FOR THE SAME
SELECT * FROM TABLE1
WHERE COL1 IN(SELECT COL1 FROM TABLE2
WHERE COL1 = 'XYZ')

In SQL, the wildcard for zero or more characters is %, and is to be used with the keyword LIKE.
So I suggest the following (if your purpose is really to match rows in Table1 for which Value1 begins like a value in Table2.Value1):
SELECT Table1.Value1, Table2.Value2 FROM Table1 JOIN Table2 ON Table1.Value1 LIKE CONCAT(Table2.Value1, '%');
Edit: replace CONCAT(x, y) with x || y for some DBMSs (SQLite for instance).
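A prefix join like this can return several rows when more than one prefix fits, and the leading apostrophe in some of the sample values would stop any prefix from matching. A possible SQL Server flavoured sketch that strips the apostrophe and keeps only the longest matching prefix. The DBMS is an assumption (the question does not say which one is used), and the table variables hold a subset of the sample data:

```sql
-- @Table1 and @Table2 are stand-ins filled with a few rows from the question.
DECLARE @Table1 TABLE (Value1 varchar(30));
DECLARE @Table2 TABLE (Value1 varchar(30), Value2 int);
INSERT INTO @Table1 (Value1) VALUES ('''00264644106272'), ('0026461226291#');
INSERT INTO @Table2 (Value1, Value2) VALUES ('00264', 1), ('0026485', 2), ('0026481', 3);

SELECT t1.Value1, m.Value2
FROM @Table1 AS t1
OUTER APPLY (
    -- Longest prefix wins; rows without any matching prefix get NULL.
    SELECT TOP 1 t2.Value2
    FROM @Table2 AS t2
    WHERE REPLACE(t1.Value1, '''', '') LIKE t2.Value1 + '%'
    ORDER BY LEN(t2.Value1) DESC
) AS m;
```

Both sample rows come back with Value2 = 1, which agrees with the expected result in the question for those two values.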
