Need to join 2 tables using substring of different lengths on one column with length stated in second table - sql

I need to join two tables on a substring of a column in Table 1, but the length of that substring varies from row to row. The first few digits of RefNr are the joining key, and the Length column in Table 2 states how many digits to use. ShortRef in Table 2 has a fixed length of 9, so it also needs to be truncated to Length characters (which is easy to do). Table 1 is my problem: Length tells you how much of ShortRef to use and also how much of RefNr in Table 1 to take, and those two substrings form the join. I am not sure how to do this in SSMS, or whether it is possible.
Since Table 2 determines how much to substring, I currently don't see a solution, and I don't know whether LIKE would work or how to use it here.
Example:
TABLE 1
|RefNr |
|----------------|
|1234567890101234|
|9876543210090876|
|1234569000100223|
TABLE 2
|ShortRef | Length | Name |
|---------|--------|------|
|123456789|8 |Alice |
|123456909|8 |Cindy |
|987654999|6 |Ben |
RESULTS SHOULD BE:
|RefNr | Substr Table1&2 based on Length in Table2 | Name |
|----------------|-------------------------------------------|------|
|1234567890101234| 12345678 |Alice |
|9876543210090876| 987654 |Ben |
|1234569000100223| 12345690 |Cindy |

I don't know if this is exactly what you want, but I got the correct output for the example tables with this:
SELECT RefNr
,tb2."Substring Table1 based on length in Table2"
,tb2.Name
FROM Table1
INNER JOIN
-- SUBSTRING is 1-based in T-SQL, so start at 1 and take exactly Length characters
(SELECT SUBSTRING(ShortRef, 1, Length) as "Substring Table1 based on length in Table2",
Name,
Length
FROM Table2) as tb2
ON SUBSTRING(RefNr, 1, tb2.Length) = tb2."Substring Table1 based on length in Table2"

It looks like you want to join the tables together with string operations:
select t1.*, left(t2.ShortRef, t2.length), t2.name
from table1 t1 join
table2 t2
on left(t2.ShortRef, t2.length) = left(t1.refnr, t2.length);
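The prefix join can be tried end to end on the sample rows from the question. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, so substr(x, 1, n) plays the role of LEFT(x, n), and it assumes both RefNr and ShortRef are stored as text. The JoinKey alias is made up for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (RefNr TEXT);
    CREATE TABLE Table2 (ShortRef TEXT, Length INTEGER, Name TEXT);
    INSERT INTO Table1 VALUES
        ('1234567890101234'), ('9876543210090876'), ('1234569000100223');
    INSERT INTO Table2 VALUES
        ('123456789', 8, 'Alice'), ('123456909', 8, 'Cindy'), ('987654999', 6, 'Ben');
""")

# Join on the first Length characters of both columns; Length comes from Table2.
rows = conn.execute("""
    SELECT t1.RefNr,
           substr(t2.ShortRef, 1, t2.Length) AS JoinKey,
           t2.Name
    FROM Table1 AS t1
    JOIN Table2 AS t2
      ON substr(t1.RefNr, 1, t2.Length) = substr(t2.ShortRef, 1, t2.Length)
    ORDER BY t2.Name
""").fetchall()

for row in rows:
    print(row)
```

Because the comparison length is taken from each Table2 row, every RefNr is tested against every ShortRef at that row's own length, which is what lets rows with Length 6 and Length 8 coexist in one join.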

Related

Looking for a better way to dynamically pick a column source in SQL

To start with an example, let's say I need an SQL view with a structure like:
ID | Text01 | Text02 | Text03 | Text04 | Text05
Depending on the type of item, what is stored in each column could change, for example item with ID 1 may use 'Length' in Text01 while ID 2 may use Text02 to store length.
Now assume there is another table that explains the mapping:
ID | Text01 | Text02
--------------------
1 | Length |
2 | | Length
I want a way to directly populate the query based on the mapping.
I know I could use a case statement, eg.
case when mapping.text01 = 'length' then sourcetable.length ...
However my actual scenario consists of 40 dynamic columns and up to 150 fields which could be mapped to a column, which makes this option less viable.
Is there any way to convert the text of "sourcetable.length" to a column source or any other ideas you can recommend to potentially simplify this process?
You have a lousy data structure because you are storing data across the tables in columns rather than in rows.
You can do what you want, basically by unpivoting the data and then joining:
with t1 as (
select t1.id, v.colname, v.colvalue
from table1 t1 cross apply
(values ('Text01', t1.Text01),
('Text02', t1.Text02),
('Text03', t1.Text03),
. . .
) v(colname, colvalue)
),
t2 as (
select m.id, v.colname, v.colfield
from mapping m cross apply
(values ('Text01', m.Text01),
('Text02', m.Text02),
('Text03', m.Text03),
. . .
) v(colname, colfield)
)
select t1.id, t2.colfield, t1.colvalue
from t1 join
t2
on t1.id = t2.id and t1.colname = t2.colname;
If you want the data in a single row, then you would have to re-pivot the results.
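As a rough illustration of the unpivot-then-join idea, here is a sketch that runs on SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY, so UNION ALL is swapped in to do the unpivoting. The table name source, the two-column layout, and the sample values ('10cm', '25cm') are invented for the example, and unmapped cells are assumed to be NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical two-column versions of the source and mapping tables.
    CREATE TABLE source  (ID INTEGER, Text01 TEXT, Text02 TEXT);
    CREATE TABLE mapping (ID INTEGER, Text01 TEXT, Text02 TEXT);
    INSERT INTO source  VALUES (1, '10cm', NULL), (2, NULL, '25cm');
    INSERT INTO mapping VALUES (1, 'Length', NULL), (2, NULL, 'Length');
""")

# Unpivot both tables into (ID, colname, value) rows, then join on ID and colname.
rows = conn.execute("""
    WITH s AS (
        SELECT ID, 'Text01' AS colname, Text01 AS colvalue FROM source
        UNION ALL
        SELECT ID, 'Text02', Text02 FROM source
    ),
    m AS (
        SELECT ID, 'Text01' AS colname, Text01 AS colfield FROM mapping
        UNION ALL
        SELECT ID, 'Text02', Text02 FROM mapping
    )
    SELECT s.ID, m.colfield, s.colvalue
    FROM s
    JOIN m ON s.ID = m.ID AND s.colname = m.colname
    WHERE m.colfield IS NOT NULL   -- extra filter: skip columns with no mapping
    ORDER BY s.ID
""").fetchall()

print(rows)
```

With 40 columns the UNION ALL branches get long, but they are mechanical and could be generated once rather than written by hand.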

SQL - Select the longest substrings

I have the data like that.
AB
ABC
ABCD
ABCDE
EF
EFG
IJ
IJK
IJKL
and I just want to get ABCDE, EFG, IJKL. How can I do that in Oracle SQL?
The strings have a minimum length of 2 but no fixed length; they can be anywhere from 2 to 100 characters.
In the event that you mean "longest string for each sequence of strings", the answer is a little different -- you are not guaranteed that all have a length of 4. Instead, you want to find the strings where adding a letter isn't another string.
select t.str
from table t
where not exists (select 1
from table t2
where substr(t2.str, 1, length(t.str)) = t.str and
length(t2.str) = length(t.str) + 1
);
Do note that performance of this query will not be great if you have even a moderate number of rows.
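To check the "no string exists that extends this one by a single character" logic against the sample data, here is a small sketch using Python's sqlite3 module in place of Oracle. The table is named strings here (a made-up name, since table itself is a reserved word), and substr and length behave the same way for this purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strings (str TEXT)")
conn.executemany(
    "INSERT INTO strings VALUES (?)",
    [(s,) for s in ["AB", "ABC", "ABCD", "ABCDE", "EF", "EFG", "IJ", "IJK", "IJKL"]],
)

# Keep a string only if no other string is the same text plus exactly one more character.
longest = [
    row[0]
    for row in conn.execute("""
        SELECT t.str
        FROM strings AS t
        WHERE NOT EXISTS (
            SELECT 1
            FROM strings AS t2
            WHERE substr(t2.str, 1, length(t.str)) = t.str
              AND length(t2.str) = length(t.str) + 1
        )
        ORDER BY t.str
    """)
]

print(longest)
```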
Alternatively, select all rows where the string is not a substring of any other row. It's not clear if this is what you want, though.
select t.str
from table t
where not exists (
select 1
from table t2
where t2.str <> t.str and
instr(t2.str, t.str) > 0
);

MS-SQL JOIN with multiple SUBSTRING and LIKE

I have an MS SQL 2005/2008 database and am trying to compare two tables of data using substrings with the % wildcard, to find values in one table that are within one character of a column in the other table.
Example is:
UPDATE table1
SET table1.Marker = 1
FROM table1
INNER JOIN table2
ON table1.ForeignKey = table2.ID
AND table1.CharacterColumn LIKE SUBSTRING(table2.CharacterColumn , 1, 5) + '%' + SUBSTRING(table2.CharacterColumn , 7, 8)
UPDATE table1
SET table1.Marker = 1
FROM table1
INNER JOIN table2
ON table1.ForeignKey = table2.ID
AND table1.CharacterColumn LIKE SUBSTRING(table2.CharacterColumn , 1, 6) + '%' + SUBSTRING(table2.CharacterColumn , 8, 8)
At present this routine takes a while to run, as the column can contain up to 10 characters, table1 has 300 million rows (though the working dataset is maybe 300k) and table2 has 2 million rows (a working dataset of 100k).
My question is: is the JOIN statement the best way to do a one-character-out search on a column?
I can't give exact examples as the data is protected; however, this should help:
Table2 -
ID | FK | Name
1 | 100 | Phillips
2 | 100 | Bloggs
3 | 100 | Jones
Table1 -
ID | Table2FK | Name
1 | 100 | Philpips
2 | 100 | Bloggs
3 | 100 | Jones
As you can see, table2 record 1 is within one character of table1 record 1, and I want to identify that. The one character that differs can be at any point in the string.
When you wrap a column in a SQL function, SQL Server can generally no longer seek on an index over that column. With tables as large as the ones you describe, it will need to do a lot of CPU-intensive work such as index scans. You have two alternatives:
Create an indexed view with columns holding those substrings. It will take longer to build the first time, but after that you will be able to join easily.
The second alternative is to modify your tables to break the character column apart into two separate columns and then create an index on those two columns.
String operations are very costly, so it is best to break strings apart into separate columns ahead of time instead of doing it at query time.
Indexed Views Documentation http://technet.microsoft.com/en-us/library/dd171921(v=sql.100).aspx
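To make the second alternative concrete, here is a sketch of precomputed, indexed parts, run on SQLite through Python's sqlite3 module rather than SQL Server. It covers only one wildcard position (character 5); the column names Prefix4 and Suffix6 and the index name are invented, and the join on Table2FK = FK is an assumption based on the sample tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (ID INTEGER, FK INTEGER, Name TEXT,
                         Prefix4 TEXT, Suffix6 TEXT);
    CREATE TABLE table1 (ID INTEGER, Table2FK INTEGER, Name TEXT,
                         Prefix4 TEXT, Suffix6 TEXT, Marker INTEGER DEFAULT 0);
    INSERT INTO table2 (ID, FK, Name) VALUES
        (1, 100, 'Phillips'), (2, 100, 'Bloggs'), (3, 100, 'Jones');
    INSERT INTO table1 (ID, Table2FK, Name) VALUES
        (1, 100, 'Philpips'), (2, 100, 'Bloggs'), (3, 100, 'Jones');

    -- Split each name once: characters 1-4, and everything from character 6 on.
    UPDATE table1 SET Prefix4 = substr(Name, 1, 4), Suffix6 = substr(Name, 6);
    UPDATE table2 SET Prefix4 = substr(Name, 1, 4), Suffix6 = substr(Name, 6);

    -- Hypothetical index so the join compares plain columns, not expressions.
    CREATE INDEX ix_table2_parts ON table2 (FK, Prefix4, Suffix6);
""")

# Rows that agree everywhere except character 5 (exact duplicates are excluded).
near = conn.execute("""
    SELECT t1.Name, t2.Name
    FROM table1 AS t1
    JOIN table2 AS t2
      ON t1.Table2FK = t2.FK
     AND t1.Prefix4 = t2.Prefix4
     AND t1.Suffix6 = t2.Suffix6
    WHERE t1.Name <> t2.Name
""").fetchall()

print(near)
```

Covering a mismatch at any position would need one such pair of columns (or one pass) per position, so this trades storage and load time for cheaper joins.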

SQL query for two values of one row based off same table column

I have two columns of one row of a report that I would like to be based off the same one column in a SQL table.
For example, in the report it should be something like:
ID | Reason | SubReason
1 | Did not like | Appearance
In the SQL table it is something like:
ID | ReturnReason
1 | Did not like
1 | XX*SR*Appearance
1 | XX - TestData
1 | XX - TestData2
The SubReason column is being newly added and the current SQL query is something like:
SELECT ID, ReturnReason AS 'Reason'
FROM table
WHERE LEFT(ReturnReason,2) NOT IN ('XX')
And now I'd like to add a column in the SELECT statement for SubReason, which should be the value if *SR* is in the value. This however won't work because it also has 'XX' in the value, which is omitted by the current WHERE clause.
SELECT t.ID, t.ReturnReason AS 'Reason',
SUBSTRING(t1.ReturnReason,7,10000) as 'SubReason'
FROM t
LEFT JOIN t as t1 on t.id=t1.id and t1.ReturnReason LIKE 'XX*SR*%'
WHERE t.ReturnReason NOT LIKE 'XX%'
SQLFiddle demo
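The self-join can be checked against the sample rows from the question. This sketch runs on SQLite via Python's sqlite3 module instead of SQL Server; the table name return_reasons is made up, and substr(x, 7) takes everything after the six-character 'XX*SR*' prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE return_reasons (ID INTEGER, ReturnReason TEXT);
    INSERT INTO return_reasons VALUES
        (1, 'Did not like'),
        (1, 'XX*SR*Appearance'),
        (1, 'XX - TestData'),
        (1, 'XX - TestData2');
""")

# The outer rows are the normal reasons; the joined alias sr carries only the *SR* rows.
rows = conn.execute("""
    SELECT t.ID,
           t.ReturnReason AS Reason,
           substr(sr.ReturnReason, 7) AS SubReason
    FROM return_reasons AS t
    LEFT JOIN return_reasons AS sr
           ON sr.ID = t.ID
          AND sr.ReturnReason LIKE 'XX*SR*%'
    WHERE t.ReturnReason NOT LIKE 'XX%'
""").fetchall()

print(rows)
```

The XX filter applies only to the outer alias, which is why the sub-reason row survives even though it also starts with XX.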

Explode range of integers out for joining in SQL

I have one table that stores a range of integers in a field, sort of like a print range, (e.g. "1-2,4-7,9-11"). This field could also contain a single number.
My goal is to join this table to a second one that has discrete values instead of ranges.
So if table one contains
1-2,5
9-15
7
And table two contains
1
2
3
4
5
6
7
8
9
10
The result of the join would be
1-2,5 1
1-2,5 2
1-2,5 5
7 7
9-15 9
9-15 10
Working in SQL Server 2008 R2.
Use a string split function of your choice to split on comma. Figure out the min/max values and join using between.
SQL Fiddle
MS SQL Server 2012 Schema Setup:
create table T1(Col1 varchar(10))
create table T2(Col2 int)
insert into T1 values
('1-2,5'),
('9-15'),
('7')
insert into T2 values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)
Query 1:
select T1.Col1,
T2.Col2
from T2
inner join (
select T1.Col1,
cast(left(S.Item, charindex('-', S.Item+'-')-1) as int) MinValue,
cast(stuff(S.Item, 1, charindex('-', S.Item), '') as int) MaxValue
from T1
cross apply dbo.Split(T1.Col1, ',') as S
) as T1
on T2.Col2 between T1.MinValue and T1.MaxValue
Results:
| COL1 | COL2 |
|-------|------|
| 1-2,5 | 1 |
| 1-2,5 | 2 |
| 1-2,5 | 5 |
| 9-15 | 9 |
| 9-15 | 10 |
| 7 | 7 |
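Since dbo.Split is a user-defined function that is not shown, the same split, min/max, BETWEEN idea can be sketched with the splitting done in Python and the join done in SQLite (via the sqlite3 module). The helper to_bounds and the Bounds table are names invented for this illustration.

```python
import sqlite3

def to_bounds(item):
    """Turn '1-2' into (1, 2) and a single value such as '5' into (5, 5)."""
    low, _, high = item.partition("-")
    return int(low), int(high or low)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Col1 TEXT);
    CREATE TABLE T2 (Col2 INTEGER);
    CREATE TABLE Bounds (Col1 TEXT, MinValue INTEGER, MaxValue INTEGER);
    INSERT INTO T1 VALUES ('1-2,5'), ('9-15'), ('7');
    INSERT INTO T2 VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
""")

# Split each Col1 on commas and store one (min, max) row per piece.
col1_values = [row[0] for row in conn.execute("SELECT Col1 FROM T1")]
for col1 in col1_values:
    for item in col1.split(","):
        conn.execute("INSERT INTO Bounds VALUES (?, ?, ?)", (col1, *to_bounds(item)))

# Join the discrete values to the ranges they fall into.
rows = conn.execute("""
    SELECT b.Col1, T2.Col2
    FROM T2
    JOIN Bounds AS b ON T2.Col2 BETWEEN b.MinValue AND b.MaxValue
    ORDER BY T2.Col2
""").fetchall()

for row in rows:
    print(row)
```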
Like everybody has said, this is a pain to do natively in SQL Server. If you must then I think this is the proper approach.
First determine your rules for parsing the string, then break down the process into well-defined and understood problems.
Based on your example, I think this is the process:
1. Separate the comma-separated values in the string into rows
2. If the data does not contain a dash, then it's finished (it's a standalone value)
3. If it does contain a dash, parse the left and right sides of the dash
4. Given the left and right sides (the range), expand all the values between them into rows
I would create a temp table to populate the parsing results into which needs two columns:
SourceRowID INT, ContainedValue INT
and another to use for intermediate processing:
SourceRowID INT, ContainedValues VARCHAR
Parse your comma-separated values into their own rows using a CTE. Step 1 is now a well-defined and understood problem to solve:
Turning a Comma Separated string into individual rows
So your result from the source
'1-2,5'
will be:
'1-2'
'5'
From there, SELECT from that processing table where the field does not contain a dash. Step 2 is now a well-defined and understood problem to solve. These are standalone numbers and can go straight into the results temp table. The results table should also get the ID reference to the original row.
Next would be to parse the values to the left and right of the dash using CHARINDEX to locate it, then the appropriate LEFT and RIGHT functions as needed. This will give you the starting and ending value.
Here is a relevant question for accomplishing this. Step 3 is now a well-defined and understood problem to solve:
T-SQL substring - separating first and last name
Now you have separated the starting and ending values. Use another function which can explode this range. Step 4 is now a well-defined and understood problem to solve:
SQL: create sequential list of numbers from various starting points
SELECT all N between #min and #max
What is the best way to create and populate a numbers table?
and, also, insert it into the temp table.
Now what you should have is a temp table with every value in the exploded range.
Simply JOIN that to the other table on the values now, then to your source table on the ID reference and you're there.
My suggestion is to add one more field and many more records to your ranges table. Specifically, the primary key would be the integer and the other field would be the range. Records would look like this:
number range
1 1-2,5
2 1-2,5
3 na
4 na
5 1-2,5
etc
Having said that, this is still rather limiting because a number can only have one range. If you want to be thorough, set up a many to many relationship between numbers and ranges.
As far as I can tell, your best option is something like the following:
Create a table-valued function that accepts your ranges and converts them to a collection of ints. So 1-3,5 would return:
1
2
3
5
Then use these results to join to other tables. I don't have an exact function to do this at hand, but this one seems like an excellent start.
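The range-to-integers conversion that such a function would perform can be sketched outside the database. This is plain Python rather than a T-SQL table-valued function, and the name expand_ranges is made up; it assumes well-formed input of non-negative integers, with ranges written as low-high.

```python
def expand_ranges(spec):
    """Expand a print-range string such as '1-3,5' into a list of integers."""
    values = []
    for item in spec.split(","):
        # partition returns an empty high part when there is no dash
        low, _, high = item.strip().partition("-")
        values.extend(range(int(low), int(high or low) + 1))
    return values

print(expand_ranges("1-3,5"))
```

Each returned integer would become one row of the function's result, ready to be joined to the table of discrete values.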