Matching up 2 fields with differing Character lengths, substr? Padding? - sql

I need to be able to write a SQL query to match customerid where the # of characters differs between tables. As you will see below, table 1 has the CustomerId with no padding (the # of characters may differ, as shown in the example). Table 2 has a specific format of '0001' + zero padding to make the field a total of 30 characters.
So, if I needed to write SQL for this, for table1 CustomerIds, would this be some type of substring?
Example 1:
Table1 has customerid as '123456'.
Table 2 has customerid as '000100000000000000000000123456'
Example 2:
Table 1 has customerid as '98765432'
Table 2 has customerid as '000100000000000000000098765432'

You should pad the value from table1 with '0001' and zeroes until it is 30 characters and then use that to compare:
where '0001' || lpad(columnname, 26, '0') = columnname2
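As a fuller sketch (table and column names are taken from the question; Oracle-style LPAD and || are assumed, as in the snippet above):
-- Pad table1's id to table2's 30-character '0001' + zero-padded format before comparing
SELECT t1.customerid, t2.customerid
FROM   table1 t1
JOIN   table2 t2
  ON   t2.customerid = '0001' || LPAD(t1.customerid, 26, '0');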

where table2.customerID like '%' || table1.customerID
"like" is really a misnomer here: with no other wildcards it behaves as "equals", and the leading '%' matches an arbitrary string of any length (zero or more characters).

LIKE is definitely a good option.
One more way: assuming that the columns contain only numeric characters, you could use
ON ( TO_NUMBER(SUBSTR(tab2.customerid, 5)) = tab1.customerid )
Adding a function-based index on TO_NUMBER(SUBSTR(tab2.customerid, 5)) may speed up the query.
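A sketch of that function-based index (Oracle; the index name is made up for illustration):
-- The indexed expression must match the join expression exactly for the optimizer to use it
CREATE INDEX idx_tab2_custid_num
  ON tab2 (TO_NUMBER(SUBSTR(customerid, 5)));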

You can also use the LTRIM function for this, as in the query below, trimming leading 0s and 1s from both customerids (be aware that two different ids that differ only in their leading 0s and 1s would falsely match):
select * from ns_table2 a,ns_table3 b where
ltrim(b.val1,'01')=ltrim(a.val1,'01') ;
sample input:
create table ns_table2(val1 varchar(30),val2 varchar(30));
create table ns_table3(val1 varchar(30),val2 varchar(30));
insert into ns_table2 values('123456','table2');
insert into ns_table2 values('98765432','table2');
insert into ns_table3 values('000100000000000000000000123456','table3');
insert into ns_table3 values('000100000000000000000098765432','table3');
select * from ns_table2 a,ns_table3 b where
ltrim(b.val1,'01')=ltrim(a.val1,'01') ;
sample output:
123456 table2 000100000000000000000000123456 table3
98765432 table2 000100000000000000000098765432 table3

Related

Joining two tables in SQL in which one column has to be "cleaned"

I need to join two tables in SQL which have two related columns (column ID1 in Table 1 and column ID2 in Table 2). ID1 in Table 1 consists of 6 digits, whereas ID2 in Table 2 consists of 6 digits plus additional quotation marks (") at the beginning and end of the string. I need to remove these quotation marks and join the two tables to verify whether any values occur in both columns.
I know how to remove the first and last character of the string in table 2:
SELECT SUBSTRING ([ID2],2,Len([ID2])-2) FROM [dbo].[table2]
I need to join this new "trimmed" column with the other column from table 1.
Any suggestions?
Assuming you are using a MS SQL Server database, and need everything from table1 plus the matched rows from table2, then:
sample:
table1 | table2
[ID] | [ID]
547832 | "547832"
-----------------------------
select table1.*, table2.*
from db.tb1 table1
left join db.tb2 table2
  on table1.[ID] = SUBSTRING(table2.[ID], 2, LEN(table2.[ID]) - 2);
First extract your trimmed column under a different name using 'AS' (in a derived table), and then you can join on it.
Try something like the below:
syntax: SELECT SUBSTRING(columnname, position, length) AS Newcolumnname FROM Tablename;
EX: SELECT c.Newstr, t2.name
    FROM (SELECT SUBSTRING(customerName, 1, 5) AS Newstr FROM Customer) c
    JOIN Table2 t2 ON c.Newstr = t2.name;
I am using MS SQL, yes.
Thanks for the reply. However, why is it a left join and not an inner join here? Just curious.
So, essentially what I need to do is:
In the first table I have around 10 columns; in the second table I have 5 columns. They all have different names; ID was just used as an example. Two of the columns from table 2 appear to hold the same kind of values as two of the columns from table 1 (one is a 6-digit ID, the other is names). I want to remove the first and last character of the ID values in table 2 and join that column, together with the names column, against the ID and names columns from table 1. Hope that makes sense.

Inner join on two columns where one column has a single trailing character

Hi I'm new to SQL and I have 2 tables that I am trying to do an inner-join with.
------------------------
First table:
------------------------
ID-Number CustomerName
------------------------
Second table
------------------------
ID-Number CustomerDevice
(ID with a single trailing character)
Questions
What would be the best-performing way to execute the inner join on both tables' ID-Number?
Is there a method to remove the trailing character within the inner-join command?
You don't have much choice. Here is how you can express the logic:
select . . .
from t1 join
t2
on t2.id like t1.id + '_';
Unfortunately, this may not make use of indexes. (Also note that + for string concatenation is SQL Server-specific).
You might be able to rewrite the query as:
on t1.id = left(t2.id, len(t2.id) - 1)
This should be able to use an index on t1(id).
The best approach is to fix the data, so your ids are the same type, same length, and have a properly declared foreign key relationship. Another alternative available in SQL Server is an index on a computed column:
alter table t2 add realId as (left(id, len(id) - 1));
create index idx_t2_realId on t2(realId);
Then write the join logic using realId.
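For example, a sketch of that join, using the computed column defined above:
-- realId already has the trailing character stripped, so it compares directly with t1.id
select t1.*, t2.*
from t1
join t2 on t2.realId = t1.id;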
Would this work?
SELECT
    t1.[ID-Number],
    t1.CustomerName,
    t2.CustomerDevice
FROM t1
INNER JOIN t2 ON t1.[ID-Number] = LEFT(t2.[ID-Number], LEN(t2.[ID-Number]) - 1)
EDIT: Forgot the 1
Given that the table Customer has this column
ID_number int not null;
And the table Device has this column
ID_number varchar(15);
And we know that Device.ID_number, if it is not NULL, is always equal to some Customer.ID_number with a letter appended, then (SQL Server):
SELECT *
FROM Customer c
JOIN Device d
ON c.ID_number = CAST(SUBSTRING(d.ID_number, 1, LEN(d.ID_number) - 1) AS int)
More robust solutions that allow for more possibilities in the data require more defensive coding. You may want to define a scalar function to process Customer.ID_number.
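For instance, a minimal sketch of such a scalar function (the function name and the format check are assumptions, not from the answer):
-- Hypothetical helper: returns the numeric customer id, or NULL when the format is unexpected
CREATE FUNCTION dbo.DeviceCustomerId (@id varchar(15))
RETURNS int
AS
BEGIN
    RETURN CASE
               WHEN @id IS NULL THEN NULL
               WHEN @id NOT LIKE '%[0-9][A-Za-z]' THEN NULL   -- expect digits followed by one letter
               ELSE CAST(LEFT(@id, LEN(@id) - 1) AS int)
           END;
END;
The join then becomes ON c.ID_number = dbo.DeviceCustomerId(d.ID_number), at the cost of a per-row function call.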

Best Way to do a JOIN on Product UPC varchar(20)?

I am trying to do a JOIN on two Tables. Each table contains a UPC varchar2(20) from different data sources.
What makes this JOIN a bit difficult is that the values of UPC can vary in size, can sometimes be padded with leading zeros, and sometimes not. All contain a trailing check digit.
For example:
        Table 1            Table 2
UPC     "00000123456789"   "123456789"
        "234567890"        "234567890"
        "00000003456789"   "00000003456789"
        "3456799"          "00000003456799"
My thoughts are to convert each to a long and then do the compare. Or I can append leading zeros. Or do a "contains"-style comparison.
What is the best way to do the join using SQL?
You can try this:
select * from
table1 inner join table2
on (CAST(CAST(table1.UPC AS BIGINT) AS VARCHAR))
=(CAST(CAST(table2.UPC AS BIGINT) AS VARCHAR))
or
select * from
table1 inner join table2
on (RIGHT(table1.UPC,(LEN(table1.UPC) - PATINDEX('%[^0]%',table1.UPC)) + 1))
=(RIGHT(table2.UPC,(LEN(table2.UPC) - PATINDEX('%[^0]%',table2.UPC)) + 1))
This is not the highest-performance option, but it is the simplest:
SELECT
T1.UPC,
T2.Column1
FROM
table1 T1
INNER JOIN table2 T2 ON
RIGHT(REPLICATE('0', 20) + T2.UPC, 20) = RIGHT(REPLICATE('0', 20) + T1.UPC, 20)
Alternatively, you can create computed columns for these padded UPCs, and place indexes upon them. However, this comes with a slew of restrictions. I have not been able to use this in the real world very many times.
Indexes on Computed Columns (MSFT)
If you have the ability to add columns to your table, you could have persisted computed columns to just cast your varchar to bigint. Then they can be indexed and the joins on these would be a lot quicker. In apps where there are way more reads than writes, this can be worthwhile.
create table ack
(
    UPC      varchar(20) null,
    UPCValue as isnull(convert(bigint, UPC), 0) persisted
)
You don't have to do the isnull if Upc doesn't need to support null.
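For instance, an index and a join through that computed column might look like this (the second table, ack2, and the index name are assumptions for illustration):
-- Index the persisted computed column so the join can seek on it
create index idx_ack_upcvalue on ack (UPCValue);

-- Assuming ack2 has an equivalent persisted UPCValue computed column
select a.UPC, b.UPC
from ack a
join ack2 b on b.UPCValue = a.UPCValue;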
Most UPC values are 12 digits, or 13 for the EAN variety (see http://en.m.wikipedia.org/wiki/Universal_Product_Code). So if you can get away with altering your tables to make the column size match the expected value size that would be the simplest way. You could alter the tables to be varchar(18) to allow for 5-6 leading zeros, while still allowing the values to safely be cast to bigint. The maximum size of bigint is 9,223,372,036,854,775,807 (19 digits), so any numeric value stored in varchar(18) will fit. Then you can query easily:
Select *
From tab1
Join tab2
on cast (tab1.upc as bigint) = cast(tab2.upc as bigint);
If you're stuck with varchar(20) columns, casting to bigint will still work assuming your actual data doesn't contain erroneous values that exceed the maximum size of bigint. But obviously that isn't bulletproof. A string of twenty nines ('99999999999999999999'), for example, will result in data truncation.
If you really need 20 digits you'll need to compare the strings by left-padding with zeros or trimming off the left zeros, as shown in some of the other answers.

Where the value of one column is contained in the value of another column

I'm trying to write a query which compares two tables and finds all entries where one field is contained in another field. For example, one field contains a single 5-digit login ID, e.g. 12345. The second field contains one or multiple IDs separated by commas but with the characters text^ in front, e.g. text^12345 or text^12345,54321,13579,97531
If I try
Select * from table1 a
join table2 b
on b.login_id LIKE '%' + a.login_id + '%'
What I am finding is that it only joins on the last entry in the list. So if a.login_id = 12345 it only brings back rows where b.login_id = text^12345 or text^54321,12345, but not text^12345,54321
Am I just missing something?
Note: I am using SQL Server 2008, so the query can't use CONCAT.
You need to use CONCAT() to assemble your string:
Select * from table1.login_id a
join table2.login_id b
on b.login_id LIKE CONCAT('%', a.login_id, '%')
Perhaps a where exists is appropriate:
select * from table1 as a where exists (
select b.login_id from table2 as b where a.login_id like concat('%', b.login_id, '%')
);
You are missing something. Your issue is not reproducible.
When I run the following:
DECLARE @Table1 TABLE (login_id varchar(255));
DECLARE @Table2 TABLE (login_id varchar(255));
INSERT INTO @Table1 VALUES ('12345');
INSERT INTO @Table2 VALUES ('text^12345,54321');
Select * from @Table1 a
join @Table2 b
on b.login_id LIKE '%' + a.login_id + '%';
I get:
login_id | login_id
12345 | text^12345,54321
So when you say
it only brings back where b.login_id = text^12345 or text^54321,12345
but not text^12345,54321
You are wrong. It does bring that text back. Find the difference between the query you posted in your question, and the actual query you are using, and you will find what you are missing. Or if there is no difference in the query, then the difference is in the data. The data in your question may not be comparable to your actual data.

How to compare string data to table data in SQL Server - I need to know if a value in a string doesn't exist in a column

I have two tables: one is an import table, the other is the lookup table referenced by a FK constraint on the table the imported rows will eventually be put into. In the import table a user can provide a list of semicolon-separated values that correspond to values in the 2nd table.
So we're looking at something like this:
TABLE 1
ID | Column1
1 | A; B; C; D
TABLE 2
ID | Column2
1 | A
2 | B
3 | D
4 | E
The requirement is:
Rows in TABLE 1 with a value not in TABLE 2 (C in our example) should be marked as invalid for manual cleanup by the user. Rows where all values are valid are handled by another script that already works.
In production we'll be dealing with 6 columns that need to be checked and imports of AT LEAST 100k rows at a time. As a result I'd like to do all the work in the DB, not in another app.
BTW, it's SQL2008.
I'm stuck; does anyone have any ideas? Thanks!
Seems to me you could pass ID & Column1 values from Table1 to a Table-Valued function (or a temp table in-line) which would parse the ;-delimited list, returning individual values per record.
Here are a couple options:
T-SQL: Parse a delimited string
Quick T-Sql to parse a delimited string
The result (ID, value) from the function could be used to compare (unmatched query) against values in Table 2.
SELECT DISTINCT tmp.ID
FROM tmp
LEFT JOIN Table2 ON Table2.Column2 = tmp.value
WHERE Table2.Column2 IS NULL
The ID results of the comparison would then be used to flag records in Table 1.
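Putting those pieces together, here is a minimal sketch (SQL Server 2008; dbo.SplitList is a hypothetical splitter written with the common XML trick, not taken from the linked answers):
-- Inline TVF that splits a delimited string into one value per row
-- (assumes the values contain no XML-special characters such as < or &)
CREATE FUNCTION dbo.SplitList (@list varchar(max), @delim varchar(5))
RETURNS TABLE
AS
RETURN
(
    SELECT LTRIM(RTRIM(x.i.value('.', 'varchar(100)'))) AS value
    FROM (SELECT CAST('<i>' + REPLACE(@list, @delim, '</i><i>') + '</i>' AS xml) AS doc) d
    CROSS APPLY d.doc.nodes('/i') AS x(i)
);
GO

-- Flag IDs where at least one of the parsed values has no match in TABLE 2
SELECT t1.ID
FROM [TABLE 1] t1
CROSS APPLY dbo.SplitList(t1.Column1, ';') s
LEFT JOIN [TABLE 2] t2 ON t2.Column2 = s.value
GROUP BY t1.ID
HAVING COUNT(t2.Column2) < COUNT(*);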
Perhaps inserting those composite values into 'TABLE 1' seemed like the most convenient solution at one time. However, unless your users are using SQL Server Management Studio or something similar to enter the values directly into the table, I assume there must be a software layer between the UI and the database. If so, you're going to save yourself a lot of headaches both now and in the long run by investing a little time in altering your code to split the semicolon-delimited inputs into discrete values before inserting them into the database. This will result in 'TABLE 1' looking something like this:
TABLE 1
ID | Column1
1 | A
1 | B
1 | C
1 | D
It's then trivial to write the SQL to find those IDs which are invalid.
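For instance, a minimal sketch against that normalized layout (column names as in the example tables above):
-- IDs that have at least one value with no match in TABLE 2
SELECT DISTINCT t1.ID
FROM [TABLE 1] t1
LEFT JOIN [TABLE 2] t2 ON t2.Column2 = t1.Column1
WHERE t2.Column2 IS NULL;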
If it is possible, try putting the values in separate rows when importing (instead of storing it as ; separated).
This might help.
Here is an easy and straightforward solution for the IDs of the invalid rows, despite its lack of performance because of string manipulations.
select T1.ID
from [TABLE 1] T1
left join [TABLE 2] T2
  on ('; ' + T1.COLUMN1 + '; ') like ('%; ' + T2.COLUMN2 + '; %')
where T1.COLUMN1 is not null
group by T1.ID, T1.COLUMN1
having count(T2.COLUMN2) < len(T1.COLUMN1) - len(replace(T1.COLUMN1, ';', '')) + 1
There are two assumptions:
The semicolon-separated list does not contain duplicates
TABLE 2 does not contain duplicates in COLUMN2.
The second assumption can easily be fixed by using (select distinct COLUMN2 from [TABLE 2]) rather than [TABLE 2].