Join over substring across tables in MSSQL - sql

I have two tables: one has a column containing a URL, and the other has a column containing a substring of that URL.
Table 1
Id | URL
----------
1 ...\aaa_common\
2 ...\aaa_qa..
3 ...\aaa_test\Analytics
Table 2
SomeId | compname
-----------------
1 aaa_common
2 aaa_qa
3 aaa_test
It is possible to join using string functions (CHARINDEX and SUBSTRING), but is there an easier alternative?

Yes, you can use a join, but I'm not sure it is the best method, because joining on strings is not a good idea; I'm also not sure whether your tables contain repeated values. If you still need to do it, I suggest adding one more column to Table1, populating it with just the compname extracted from the URL using SUBSTRING, and then joining the two tables on that new column of Table1 and compname of Table2.
To use SUBSTRING, you need to be 100% sure of the position/pattern of the compname within the URL string in Table1.
Please look at this DEMO, which has an example of a join on a string using SUBSTRING and CHARINDEX.

You can join using like, but it will be a bit of a performance hit.
select
    *
from
    table_1 t1
    inner join table_2 t2 on
        t1.url like concat('%', t2.compname, '%')
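To try the LIKE join on the sample rows from the question, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite is only a stand-in for SQL Server here (an assumption made so the example runs anywhere): it concatenates with || where the answer uses CONCAT, and instr() plays the role of CHARINDEX. The URLs are stored exactly as abbreviated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, url TEXT);
    CREATE TABLE table_2 (someid INTEGER, compname TEXT);
""")
# URLs kept exactly as abbreviated in the question ("..." is literal here).
conn.executemany("INSERT INTO table_1 VALUES (?, ?)", [
    (1, "...\\aaa_common\\"),
    (2, "...\\aaa_qa.."),
    (3, "...\\aaa_test\\Analytics"),
])
conn.executemany("INSERT INTO table_2 VALUES (?, ?)", [
    (1, "aaa_common"), (2, "aaa_qa"), (3, "aaa_test"),
])

# Same idea as the answer: the URL contains compname anywhere.
like_rows = conn.execute("""
    SELECT t1.id, t2.compname
    FROM table_1 t1
    INNER JOIN table_2 t2 ON t1.url LIKE '%' || t2.compname || '%'
    ORDER BY t1.id
""").fetchall()

# Literal substring search, without LIKE wildcard semantics.
instr_rows = conn.execute("""
    SELECT t1.id, t2.compname
    FROM table_1 t1
    INNER JOIN table_2 t2 ON instr(t1.url, t2.compname) > 0
    ORDER BY t1.id
""").fetchall()

print(like_rows)
print(instr_rows)
```

One thing to keep in mind: the underscore in values such as aaa_common is itself a single-character wildcard in LIKE, so the LIKE version can match slightly more than the literal text. A literal search (CHARINDEX in SQL Server, instr in this sketch) avoids that.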

Related

Left Join with Partial match SQL Server

I have 2 tables in SQL Server that I am trying to make a left join from so that all records from table1 are shown and any data from table2 is shown if it exists. They are as follows
Table1
id Customername Jobid
--------------------------------
2754444 Jones 123
2854233 Smith 234
Table2
key Location
-----------------------------
FD#2754444 London
FEC#2854233 Liverpool
I can get an inner join query to work as below, but I obviously get only matching records (which I don't want: I want all records from table1 and any matching values from table2).
This works:
$query = "select distinct table1.id, table1.customername, table1.jobid, table2.location, table2.[key]
from table1
inner join table2
on table1.id= RIGHT([table2].[key],7)"
So changing it to a left join:
This does not work:
$query = "select distinct table1.id, table1.customername, table1.jobid, table2.location, table2.[key]
from table1
left join table2
on table1.id = RIGHT([table2].[key],7)"
It does not return any of the table2 data. Any advice on what I am doing wrong would be very welcome.
Thanks in advance.
I put together a SQL Fiddle to show that your query should work (based on a guess about datatypes). Given that you've wrapped your queries as strings, that raises the question of whether your problem is actually with SQL, or if the ODBC (or whatever) connection is actually returning a parser error rather than a result set. Have you looked at what the db is providing in return? Have you ensured that there is whitespace between each word, even for line breaks (copying your text as-is shows CRs and LFs, but check your code); otherwise, it's quite possible that you're sending SQL Server something like "SELECT * FROMTABLEWHERETHING" rather than "SELECT * FROM TABLE WHERE".
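The whitespace point can be illustrated with a small sketch. The asker's $query looks like PHP; Python is used here purely for illustration, and the line list is a shortened, hypothetical version of the query. It shows how gluing the lines together without a separator fuses keywords with the preceding token.

```python
# Shortened, hypothetical version of the query, split into lines.
lines = [
    "select distinct table1.id, table2.location",
    "from table1",
    "left join table2",
    "on table1.id = RIGHT([table2].[key],7)",
]

# Joining with no separator fuses the end of one line with the next keyword.
glued = "".join(lines)

# Joining with a space (or keeping the newlines) preserves the token boundaries.
spaced = " ".join(lines)

print(glued)
print(spaced)
```

Printing the final query string just before it is sent to the server is a quick way to rule this out.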
Thank you all for your input. For some reason RIGHT was not returning anything, so I managed to resolve it with:
left JOIN [table2] ON [table1].id= substring([key],(CHARINDEX ('#',[key] , 1)+1),7)
Thanks for all your responses.
Jim
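A minimal sketch of the accepted fix, run against SQLite from Python as a stand-in for SQL Server (an assumption for runnability: substr and instr replace SUBSTRING and CHARINDEX). The third customer row is hypothetical and was added only to show that the left join keeps unmatched rows from table1. The explicit CAST reflects the guess, also made in the answer above, that id is numeric while key is text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, customername TEXT, jobid INTEGER);
    CREATE TABLE table2 ("key" TEXT, location TEXT);
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (2754444, "Jones", 123),
    (2854233, "Smith", 234),
    (2954000, "Brown", 345),  # hypothetical row with no match in table2
])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [
    ("FD#2754444", "London"),
    ("FEC#2854233", "Liverpool"),
])

# Take everything after the '#', convert it to a number, and left join on it.
rows = conn.execute("""
    SELECT t1.id, t1.customername, t2.location
    FROM table1 t1
    LEFT JOIN table2 t2
        ON t1.id = CAST(substr(t2."key", instr(t2."key", '#') + 1) AS INTEGER)
    ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```

The unmatched customer comes back with a NULL (None in Python) location, which is the behaviour the question asked for.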

Using LIKE in a JOIN query

I have two separate data tables.
This is Table1:
Customer Name Address 1 City State Zip
ACME COMPANY 1 Street Road Maspeth NY 11777
This is Table2:
Customer Active Account New Contact
ACME Y John Smith
I am running a query with a JOIN that only includes rows where the joined fields from both tables are equal.
I am joining Customer Name from Table1 and Customer from Table2. Obviously no match. What I am trying to do is show results where the first 4 characters match in each table so I get a result or match. Is this possible using LIKE or LEFT?
Yes, that's possible.
But I doubt that every name in Table2 has only 4 letters, so here's a solution where the name in Table2 is the beginning of the name in Table1.
Concatenate the string with a %. It's a wildcard for "anything or nothing".
SELECT
*
FROM
Table1
INNER JOIN Table2 ON Table1.CustomerName LIKE CONCAT(Table2.Customer, '%');
String concatenation may work differently between DBMSs.
It probably is, though this might depend on the database you are using. For example, in Microsoft SQL Server, something like this would work:
SELECT *
FROM [Table1] INNER JOIN [Table2]
ON LEFT([Table1].[Customer Name],4) = LEFT([Table2].[Customer],4)
Syntax may be different if using other RDBMS. What are you trying this on?
Seems like this should work:
Select *
From Table1, Table2
Where Table1.CustomerName Like Cat('%',Trim(Table2.CustomerName),'%')
If you are only trying to match the first four characters, you can use the following:
SELECT * -- or list your columns
FROM Table1 T1
JOIN Table2 T2
ON SUBSTRING(T1.CustomerName, 1, 4) = SUBSTRING(T2.Customer, 1, 4)
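The prefix-LIKE approach and the first-four-characters approach do not always return the same rows. The sketch below compares them, using SQLite from Python as a stand-in (an assumption for runnability; || and substr replace CONCAT and LEFT/SUBSTRING). The column names are simplified and the 'ACME INC' row is hypothetical, added only to expose the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (customer_name TEXT, city TEXT);
    CREATE TABLE table2 (customer TEXT, active TEXT);
""")
conn.execute("INSERT INTO table1 VALUES ('ACME COMPANY', 'Maspeth')")
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [
    ("ACME", "Y"),
    ("ACME INC", "N"),  # hypothetical row, not in the question
])

# Table2's value must be a prefix of Table1's value.
prefix_rows = conn.execute("""
    SELECT t1.customer_name, t2.customer
    FROM table1 t1
    JOIN table2 t2 ON t1.customer_name LIKE t2.customer || '%'
    ORDER BY t2.customer
""").fetchall()

# Only the first four characters on each side have to agree.
first4_rows = conn.execute("""
    SELECT t1.customer_name, t2.customer
    FROM table1 t1
    JOIN table2 t2
        ON substr(t1.customer_name, 1, 4) = substr(t2.customer, 1, 4)
    ORDER BY t2.customer
""").fetchall()

print(prefix_rows)
print(first4_rows)
```

The prefix version is stricter: 'ACME INC' is not a prefix of 'ACME COMPANY', so only the four-character comparison pairs them. Which behaviour is wanted depends on how the short names were derived.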

Hive SQL: How to join number to a string of delimited numbers

I need to join two tables by an ID where one ID is stored as a number (i.e. 12345) the other ID is stored as a pipe delimited string (i.e 12345|12346|12347). Is there a quick way to join on the two? Thanks!
** I guess I should say join if the number ID (12345) is in the string of numbers (12345|12346|12347). In theory this example would join as 12345 is in the pipe delimited string.
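This will work in Hive if the number is the first entry in the delimited string. Note that split takes a regular expression, so the pipe has to be escaped:
select obj1.*,obj2.some_fields from table1 obj1
JOIN table2 obj2
on (obj1.id=split(obj2.id,'\\|')[0])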
It's not clear to me if you mean SQL or HiveQL.
Is there a quick way to join on the two?
No, not really.
Your DB schema violates First Normal Form. Joining these tables will be slow and error prone.
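For a more DB-agnostic approach, try the following (the + string concatenation is SQL Server syntax; other DBMSs use CONCAT or ||):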
SELECT *
FROM Table1 t1
INNER JOIN Table2 t2
ON t2.id LIKE ('%' + CAST(t1.id as varchar) + '%')
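A plain '%12345%' pattern also matches when the number is only part of a longer ID. A common workaround is to wrap both the delimited string and the searched number in the delimiter. The sketch below shows both variants, using SQLite from Python as a stand-in (an assumption for runnability; || is SQLite's concatenation operator). The column name ids and the ID 2345 are hypothetical, added to show the false positive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (ids TEXT);
""")
# 2345 is a hypothetical ID that is NOT in the list but is a substring of 12345.
conn.executemany("INSERT INTO table1 VALUES (?)", [(12345,), (2345,)])
conn.execute("INSERT INTO table2 VALUES ('12345|12346|12347')")

# Naive containment test: also matches partial numbers.
naive_rows = conn.execute("""
    SELECT t1.id
    FROM table1 t1
    JOIN table2 t2 ON t2.ids LIKE '%' || t1.id || '%'
    ORDER BY t1.id
""").fetchall()

# Delimiter-wrapped test: '|12345|12346|12347|' LIKE '%|12345|%'.
exact_rows = conn.execute("""
    SELECT t1.id
    FROM table1 t1
    JOIN table2 t2 ON '|' || t2.ids || '|' LIKE '%|' || t1.id || '|%'
    ORDER BY t1.id
""").fetchall()

print(naive_rows)
print(exact_rows)
```

Either way the join cannot use an ordinary index on the delimited column, which is the performance cost of the schema mentioned above.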

Fuzzy match column to column

I'm trying to find a way to match a column of clean data in table 1 to a column of dirty data in table2 without making any changes to the dirty data. I was thinking a fuzzy match, but there are too many entries in the clean table to allow for CTEs to be used. So, for example:
Table 1
GroupID CompanyName
123 CompanyA
445 CompanyB
556 CompanyC
Table 2
GroupID Patientname
AE123789 PatientA
123987 PatientB
445111 PatientC
And I'm trying to match the insurance company to the patient using the group number. Is there a matching method out there? (Fortunately the group numbers are actually much longer and when looking for a single group's worth of patients, fuzzy matching works really well, so they seem to be unique enough to be applied here).
Working in SQL Server 2008.
This changes slightly depending on which database you are using, but it looks like you're looking for something like this:
MSSQL
select *
from table1 t1
join table2 t2 on t2.groupid like '%'+cast(t1.groupid as varchar(max))+'%'
SQL Fiddle Demo
MySQL - use Concat():
select *
from table1 t1
join table2 t2 on t2.groupid like concat('%',t1.groupid,'%')

Join on ((= Statement) or ((!= Statement), other stipulations))

I have some information where the IdNumbers (not primary key Ids, just random Ids assigned to individuals) are not always correct in my first table.
Therefore I am joining my second table on both Ids and names, and trying to get it to where it will join on names only if the IdNumbers do not match.
I'm working on a query with a join statement that is roughly as follows (I'm leaving out the SELECT, WHERE, and ORDER BY sections because I believe that they are not having an effect on this issue and I don't want to be confusing, as they are stupidly complex - if the portion of the query below should be working like I want it to and the problem is obviously somewhere else, then just tell me so and that will answer my question):
FROM Table1
FULL OUTER JOIN Table2 ON ((Table1.IdNumber = Table2.IdNumber)
OR (Table1.IdNumber != Table2.IdNumber
AND Table1.Lname = Table2.Lname
AND Table1.Fname = Table2.Fname))
However, it is joining the people who have both matching Ids and matching names multiple times like so:
Fname M Lname Table1.IdNumber Table2.IdNumber2
Matthew - Smith 1 2
Matthew H Smith 2 1
Matthew - Smith 1 1
Matthew H Smith 2 2
So it is pulling the last 2 because their ids match, but also joining the first 2 because their ids do not match and their names match, but why is it even joining the first 2 to begin with? I suspect that it ignores the != statement when deciding where to join since the other conditions are fulfilled, but I'd like it to take this != statement into account somehow.
If this should be working, like I said before, just tell me and it will answer my question.
(*EDIT)
Sorry, I should have named these properly - I've revised the names. And the full outer join is necessary, I need everything from both tables no matter what and it's working fine, but thank you for the suggestion.
Given how messy this would be to do in one JOIN, I would suggest using a temp table to hold the relationships.
You can insert all of your IDs from the first table into a temp table, then do two passes to update a column holding the 2nd table ID: first where the ID matches, and second where the ID doesn't match but the name does.
You can then use this table to join the two tables, retrieving up to one record from table 2 for each record in table 1.
I think you want something like this:
select t1.*,t2.*
from t1,t2
where t1.id = t2.id
and t1.name = t2.name
union
select t1.*,t2.*
from t1,t2
where t1.id <> t2.id
and t1.name = t2.name
Your query should work, if the columns are coming from the right tables. Because you are not using table aliases, I suspect that you have an expression such as:
fname1 = fname2
and both columns are in the same table. Or worse:
fname1 = fname1
which is essentially always TRUE (except when fname1 is NULL).
Your query might work, but it will be inefficient in most databases, because they will use nested loop optimizations. Consider rewriting the query to be:
from table1 t1 full outer join
table2 byID
on t1.IdNumber = byID.IdNumber full outer join
table2 byName
on t1.fname = byName.fname and t1.lname = byName.lname and t1.idNumber <> byName.idNumber
This will require changing other clauses in your query, typically to something like:
coalesce(byId.column, byName.column) as Column
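To see why the original condition pairs every Matthew Smith with every other one, and one possible way to restrict the name fallback, here is a sketch using SQLite from Python. This is not any answerer's exact method: it is an illustrative variant of the "ID first, name second" idea, written with NOT EXISTS. An inner join is used instead of FULL OUTER JOIN because older SQLite versions lack it, and the Anna Jones rows are hypothetical, added to show the fallback firing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (idnumber INTEGER, fname TEXT, lname TEXT);
    CREATE TABLE table2 (idnumber INTEGER, fname TEXT, lname TEXT);
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (1, "Matthew", "Smith"),
    (2, "Matthew", "Smith"),
    (3, "Anna", "Jones"),   # hypothetical: wrong ID in table1
])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [
    (1, "Matthew", "Smith"),
    (2, "Matthew", "Smith"),
    (7, "Anna", "Jones"),   # hypothetical: same person, different ID
])

# The condition from the question: each row pair is tested on its own,
# so (1, 2) and (2, 1) qualify through the name branch.
or_rows = conn.execute("""
    SELECT t1.idnumber, t2.idnumber
    FROM table1 t1
    JOIN table2 t2
        ON t1.idnumber = t2.idnumber
        OR (t1.idnumber != t2.idnumber
            AND t1.fname = t2.fname AND t1.lname = t2.lname)
    ORDER BY t1.idnumber, t2.idnumber
""").fetchall()

# Name fallback only when neither row has an ID match anywhere.
strict_rows = conn.execute("""
    SELECT t1.idnumber, t2.idnumber
    FROM table1 t1
    JOIN table2 t2
        ON t1.idnumber = t2.idnumber
        OR (t1.fname = t2.fname AND t1.lname = t2.lname
            AND NOT EXISTS (SELECT 1 FROM table2 x
                            WHERE x.idnumber = t1.idnumber)
            AND NOT EXISTS (SELECT 1 FROM table1 y
                            WHERE y.idnumber = t2.idnumber))
    ORDER BY t1.idnumber, t2.idnumber
""").fetchall()

print(or_rows)
print(strict_rows)
```

The != test is not ignored in the original query. It is evaluated per pair of rows, and the pair (1, 2) genuinely has different IDs and equal names. Excluding it requires knowing that a better match exists elsewhere, which is what the NOT EXISTS subqueries (or the temp-table passes suggested above) provide.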