How to convert a varchar value to datatype int in SQL Server 2008 with inner join - sql

I have two lookup tables MyProviders and MyGroups. In my stored procedure, I have a temp table (replaced with an actual table for this example) with data. One column EntityId refers to either provider or a group. EntityTypeId tells me in that temp table if the entity is 1 = Provider or 2 = Group. EntityId can either have numeric GroupId or alphanumeric ExternalProviderId.
I want to check whether any record in my temp table has a combination of ClientOid + EntityId that does not exist in the MyProviders or MyGroups table.
create table MyProviders
(
id int,
clientoid varchar(20),
externalproviderid varchar(20),
name varchar(25)
)
create table MyGroups
(
id int,
clientoid varchar(20),
name varchar(25)
)
create table MyJobDetails
(
clientoid varchar(20),
entityid varchar(20),
entitytypeid int,
entityname varchar(30)
)
insert into MyJobDetails values ('M.OID', 'MONYE', 1, 'Mark')
insert into MyJobDetails values ('M.OID', 2, 1, 'Lori')
insert into MyJobDetails values ('M.OID', 2, 2, 'Group 1')
insert into MyJobDetails values ('M.OID', 44444, 2, 'Group 2')
insert into MyProviders values (1, 'M.OID', 'MONY', 'Richard')
insert into MyProviders values (2, 'M.OID', '2', 'Mike')
insert into MyProviders values (3, 'M.OID', '3', 'Lori')
insert into MyGroups values (1, 'M.OID', 'Group 1')
insert into MyGroups values (2, 'M.OID', 'Group 2')
I tried the following query to determine if there is an invalid entity or not.
select
COUNT(*)
from
MyJobDetails as jd
where
not exists (select 1
from MyProviders as p
where p.ClientOID = jd.ClientOID
and p.ExternalProviderID = CAST(jd.EntityId as varchar(20))
and jd.EntityTypeId = 1)
and not exists (select 1
from MyGroups as g
where g.ClientOID = jd.ClientOID
and g.Id = jd.EntityId
and jd.EntityTypeId = 2)
This works as expected until I get an alphanumeric value in my temp table that doesn't exist in the provider table. I then get the following error message:
Conversion failed when converting the varchar value 'MONYE' to data type int.
I have tried adapting the solutions from other threads that use IsNumeric, but that didn't work either. In this example, I need to return 1 for the one invalid entry, MONYE, which doesn't exist in either the MyProviders or the MyGroups table.
Also, can I optimize the query to achieve this in a better way?

This is a really bad design in my opinion.
Since you're referencing one out of two tables, you cannot enforce referential integrity.
And having different datatypes for your keys makes things even more horrible.
I would do the following:
use two separate foreign keys in MyJobDetails, one to MyProviders (varchar(20)) and another one to MyGroups (int),
make them both nullable,
establish a proper foreign key relationship to the referenced table for each of those two.
This way, both can be the correct datatype for each referenced table, and you won't need the EntityTypeId column anymore.
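As a rough sketch of that design: the constraint names, the NOT NULL / UNIQUE constraints on the lookup tables, the composite (clientoid, key) foreign keys, and the CHECK constraint are all my assumptions and not part of the original schema. SQL Server does not verify a foreign key when any of its columns is NULL, which is what lets each row point at only one of the two tables.

```sql
-- Hypothetical redesign; all constraint names and the added UNIQUE/CHECK constraints are assumptions.
create table MyProviders
(
    id int not null primary key,
    clientoid varchar(20) not null,
    externalproviderid varchar(20) not null,
    name varchar(25),
    constraint UQ_MyProviders_Client_External unique (clientoid, externalproviderid)
);

create table MyGroups
(
    id int not null primary key,
    clientoid varchar(20) not null,
    name varchar(25),
    constraint UQ_MyGroups_Client_Id unique (clientoid, id)
);

create table MyJobDetails
(
    clientoid varchar(20) not null,
    externalproviderid varchar(20) null,  -- set for provider rows
    groupid int null,                     -- set for group rows
    entityname varchar(30),
    constraint FK_MyJobDetails_Provider foreign key (clientoid, externalproviderid)
        references MyProviders (clientoid, externalproviderid),
    constraint FK_MyJobDetails_Group foreign key (clientoid, groupid)
        references MyGroups (clientoid, id),
    -- exactly one of the two references must be filled in
    constraint CK_MyJobDetails_OneEntity check
        ((externalproviderid is not null and groupid is null)
         or (externalproviderid is null and groupid is not null))
);
```

With something like this in place, an invalid provider or group simply cannot be inserted, so the validation query is no longer needed for this table.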
As a side note: whenever you use varchar in SQL Server, whether you're defining a parameter, a variable, or using it in a CAST expression, I would recommend always explicitly defining a length for that varchar.
Or do you know what length this varchar in your conversion here is going to be?
CAST(jd.EntityId as varchar)
Use an explicit length - always - it's just a good, safe practice to employ:
CAST(jd.EntityId as varchar(15))
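To illustrate why the missing length matters, here is a small sketch (the variable and column alias names are made up for the example). In a declaration, varchar without a length means varchar(1); in CAST and CONVERT it means varchar(30). Both truncate silently.

```sql
-- In a variable declaration, varchar without a length is varchar(1).
declare @v varchar = 'abcdef';
select @v as declared_value;   -- 'a'

-- In CAST/CONVERT, varchar without a length is varchar(30).
select len(cast(replicate('x', 40) as varchar)) as cast_length;   -- 30
```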

In the second AND NOT EXISTS section you compare g.Id, an int, with jd.EntityId, a varchar. Cast the g.Id as a varchar.
and not exists (select 1
from #MyGroups as g
where g.ClientOID = jd.ClientOID
and CAST(g.Id AS VARCHAR(20)) = jd.EntityId
and jd.EntityTypeId = 2)
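Applied to the whole query from the question, the check might look like the sketch below. It uses the permanent table names from the question rather than temp tables, and it assumes the tables and sample rows from the question already exist. Because the int column is converted to varchar, EntityId is never converted to int, so 'MONYE' can no longer raise the conversion error.

```sql
-- Sketch: count job detail rows whose entity exists in neither lookup table.
select count(*) as InvalidEntities
from MyJobDetails as jd
where not exists (select 1
                  from MyProviders as p
                  where p.ClientOID = jd.ClientOID
                    and p.ExternalProviderID = jd.EntityId
                    and jd.EntityTypeId = 1)
  and not exists (select 1
                  from MyGroups as g
                  where g.ClientOID = jd.ClientOID
                    and cast(g.Id as varchar(20)) = jd.EntityId   -- compare as strings
                    and jd.EntityTypeId = 2);
```

Note that with the sample data in the question, the group row with EntityId 44444 has no matching group either, so I would expect this to count that row in addition to MONYE.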

Try this
select count(*)
from (
select clientoid,entityid from #MyJobDetails where entitytypeid=1
except
select p.ClientOID ,convert(varchar(200),p.ExternalProviderID) from #MyProviders p inner join #MyJobDetails jd on p.ClientOID = jd.ClientOID and p.ExternalProviderID = CAST(jd.EntityId as varchar(20)) where jd.EntityTypeId = 1
except
select g.ClientOID,convert(varchar(200),g.Id) from #MyGroups g inner join #MyJobDetails jd on g.ClientOID = jd.ClientOID and convert(varchar(20),g.Id) = jd.EntityId where jd.EntityTypeId = 2
)a

Related

Insert into table from select only when select returns valid rows

I want to insert into table from select statement but it is required that insert only happens when select returns valid rows. If no rows return from select, then no insertion happens.
insert into products (name, type) select 'product_name', type from prototype where id = 1
However, the above sql does insertion even when select returns no rows.
It tries to insert NULL values.
I know the following sql can check if row exists
select exists (select true from prototype where id = 1)
How to write a single SQL to add the above condition to insert to exclude the case ?
You are inserting the wrong way. See the example below, that doesn't insert any row since none matches id = 1:
create table products (
name varchar(10),
type varchar(10)
);
create table prototype (
id int,
name varchar(10),
type varchar(10)
);
insert into prototype (id, name, type) values (5, 'product5', 'type5');
insert into prototype (id, name, type) values (7, 'product7', 'type7');
insert into products (name, type) select name, type from prototype where id = 1
-- no rows were inserted.

How to join on columns that contain strings that aren't exact matches in SQL Server?

I am trying to create a simple table join on columns from two tables that are equivalent but not exact matches. For example, the row value in table A might be "Georgia Production" and the corresponding row value in table B might be "Georgia Independent Production Co".
I first tried a wild card in the join like this:
select BOLFlatFile.*, customers.City, customers.FEIN_Registration_No, customers.ST
from BOLFlatFile
Left Join Customers on (customers.Name Like '%'+BOLFlatFile.Customer+'%');
and this works great for 90% of the data. However, if the string in table A does not appear verbatim in table B, it returns null.
So, back to the above example: if the value in table A were "Georgia Independent", it would work, but if it were "Georgia Production", it would not.
This might be a complicated way of still being wrong, but this works with the sample I've mocked up.
The first assumption is that, because you are "wildcard searching" for a string from one table in another, all of the words in the first table's column appear in the second table's column, which means the string in the second table will always be longer than the one in the first table.
The second assumption is that there is a unique id on the first table; if there is not, you can create one by using the row_number function and ordering on your string column.
The approach below firstly creates some sample data (I've used tablea and tableb to represent your tables).
Then a dummy table is created to store the uniqueid for your first table and the string column.
Next, a loop iterates over the strings in the dummy table. On each pass it inserts the unique id and the first word of the remaining string into the handler table, then removes that word from the string. The handler table is what you will use to join the two target tables together.
The final section joins the first table to the handler table on the unique id, and then joins the second table to the handler table on the key words longer than 3 letters (avoiding "the", "and", etc.). It also requires that the string in table b is longer than the one in table a, per the first assumption above.
declare @tablea table (
id int identity(1,1),
helptext nvarchar(50)
);
declare @tableb table (
id int identity(1,1),
helptext nvarchar(50)
);
insert @tablea (helptext)
values
('Text to find'),
('Georgia Production'),
('More to find');
insert @tableb (helptext)
values
('Georgia Independent Production'),
('More Text to Find'),
('something Completely different'),
('Text to find');
-- dummy table: the strings still to be split
declare @stringtable table (
id int,
string nvarchar(50)
);
-- handler table: one row per word per id
declare @stringmatch table (
id int,
stringmatch nvarchar(20)
);
insert @stringtable (id, string)
select id, helptext from @tablea;
update @stringtable set string = string + ' ';
-- peel the first word off every remaining string on each pass
while exists (select 1 from @stringtable)
begin
insert @stringmatch (id, stringmatch)
select id, substring(string,1,charindex(' ',string)) from @stringtable;
update @stringmatch set stringmatch = ltrim(rtrim(stringmatch));
update tb set string=replace(string, stringmatch, '') from @stringtable tb inner join @stringmatch ma
on tb.id=ma.id and charindex(ma.stringmatch,tb.string)>0;
update @stringtable set string=LTRIM(string);
delete from @stringtable where string='' or string is null;
end
select a.*, b.* from @tablea a inner join @stringmatch m on a.id=m.id
inner join @tableb b on CHARINDEX(m.stringmatch,b.helptext)>0 and len(b.helptext)>len(a.helptext)
where len(m.stringmatch)>3; -- only key words longer than 3 letters
It all depends on how complex you want to make this matching. There are various ways of matching these strings, and some may work better than others. Below is an example of how you can split the names in your BOLFlatFile and Customers tables into separate words by using string_split (available from SQL Server 2016 onwards).
The example below will match any row where all the words in the BOLFlatFile customer field are contained within the Customers name field (note: it won't take the ordering of the words into account).
The code below will match the first two sample pairs as expected, but not the last pair ('Test 3' and 'Test String 2').
CREATE TABLE BOLFlatFile
(
[customer] NVARCHAR(500)
)
CREATE TABLE Customers
(
[name] NVARCHAR(500)
)
INSERT INTO Customers VALUES ('Georgia Independent Production Co')
INSERT INTO BOLFlatFile VALUES ('Georgia Production')
INSERT INTO Customers VALUES ('Test String 1')
INSERT INTO BOLFlatFile VALUES ('Test 1')
INSERT INTO Customers VALUES ('Test String 2')
INSERT INTO BOLFlatFile VALUES ('Test 3')
;with BOLFlatFileSplit
as
(
SELECT *,
COUNT(*) OVER(PARTITION BY [customer]) as [WordsInName]
FROM
BOLFlatFile
CROSS APPLY
STRING_SPLIT([customer], ' ')
),
CustomerSplit as
(
SELECT *
FROM
Customers
CROSS APPLY
STRING_SPLIT([name], ' ')
)
SELECT
a.Customer,
b.name
FROM
CustomerSplit b
INNER JOIN
BOLFlatFileSplit a
ON
a.value = b.value
GROUP BY
a.Customer, b.name
HAVING
COUNT(*) = MAX([WordsInName])

How to split data in SQL Server table row

I have table of transaction which contains a column transactionId that has values like |H000021|B1|.
I need to make a join with table Category which has a column CategoryID with values like H000021.
I cannot apply join unless data is same.
So I want to split or remove the unnecessary data contained in transactionId so that I can join both tables.
Kindly help me with the solutions.
Create a computed column with the code only.
Initial scenario:
create table Transactions
(
transactionId varchar(12) primary key,
whatever varchar(100)
)
create table Category
(
transactionId varchar(7) primary key,
name varchar(100)
)
insert into Transactions
select '|H000021|B1|', 'Anything'
insert into Category
select 'H000021', 'A category'
Add computed column:
alter table Transactions add transactionId_code as substring(transactionid, 2, 7) persisted
Join using the new computed column:
select *
from Transactions t
inner join Category c on t.transactionId_code = c.transactionId
This gives a straightforward query plan.
You should fix your data so the columns are the same. But sometimes we are stuck with other people's bad design decisions. In particular, the transaction data should contain a column for the category -- even if the category is part of the id.
In any case:
select . . .
from transaction t join
category c
on transactionid like '|' + categoryid + '|%';
Or if the category id is always 7 characters:
select . . .
from transaction t join
category c
on categoryid = substring(transactionid, 2, 7)
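If the category id is not always 7 characters, one possible variation is to cut out everything between the first two pipes with CHARINDEX. The table variables and sample rows below are made up for illustration, and the sketch assumes every transactionid starts with a pipe and contains a second one; a value without a second pipe would make SUBSTRING fail with an invalid length error.

```sql
-- Hypothetical sample data for illustration only.
declare @transactions table (transactionid varchar(50));
declare @category table (categoryid varchar(20), name varchar(100));

insert into @transactions values ('|H000021|B1|'), ('|X9|B2|');
insert into @category values ('H000021', 'A category'), ('X9', 'Another category');

-- Take the text between the leading pipe and the next pipe, whatever its length.
select t.transactionid, c.categoryid, c.name
from @transactions as t
join @category as c
  on c.categoryid = substring(t.transactionid, 2, charindex('|', t.transactionid, 2) - 2);
```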
You can do this with a query like the following:
CREATE TABLE #MyTable
(PrimaryKey int PRIMARY KEY,
KeyTransacFull varchar(50)
);
GO
CREATE TABLE #MyTransaction
(PrimaryKey int PRIMARY KEY,
KeyTransac varchar(50)
);
GO
INSERT INTO #MyTable
SELECT 1, '|H000021|B1|'
INSERT INTO #MyTable
SELECT 2, '|H000021|B1|'
INSERT INTO #MyTransaction
SELECT 1, 'H000021'
SELECT * FROM #MyTable
SELECT * FROM #MyTransaction
SELECT *
FROM #MyTable
JOIN #MyTransaction ON KeyTransacFull LIKE '|'+KeyTransac+'|%'
DROP TABLE #MyTable
DROP TABLE #MyTransaction

Filtering a group of records

Please see the SQL structure below:
CREATE table TestTable (id int not null identity, [type] char(1), groupid int)
INSERT INTO TestTable ([type], groupid) values ('a',1)
INSERT INTO TestTable ([type], groupid) values ('a',1)
INSERT INTO TestTable ([type], groupid) values ('b',1)
INSERT INTO TestTable ([type], groupid) values ('b',1)
INSERT INTO TestTable ([type], groupid) values ('a',2)
INSERT INTO TestTable ([type], groupid) values ('a',2)
The first four records are part of group 1 and the fifth and sixth records are part of group 2.
If there is at least one b in the group then I want the query to only return b's for that group. If there are no b's then the query should return all records for that group.
Here you go
SELECT *
FROM testtable
LEFT JOIN (SELECT distinct groupid FROM TestTable WHERE type = 'b'
) blist ON blist.groupid = testtable.groupid
WHERE (blist.groupid = testtable.groupid and type = 'b') OR
(blist.groupid is null)
How it works:
Left join to the list of groups that contain at least one b.
Then, in the WHERE clause: if the row's group is in that list, take only the type b rows. Otherwise (no match, so blist.groupid is null) take everything.
As an after-note, you could shorten the WHERE clause, because the join condition already guarantees that blist.groupid equals testtable.groupid whenever it is not null:
WHERE blist.groupid is null OR type = 'b'
I think this is less clear -- but is often how advanced users will do it.
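The same rule can also be written without the join, as a sketch using NOT EXISTS. The table variable below is just a stand-in for TestTable so the example is self-contained: a row is kept if it is a b, or if its group contains no b at all.

```sql
-- Stand-in for TestTable, with the sample rows from the question.
declare @TestTable table (id int not null identity, [type] char(1), groupid int);

insert into @TestTable ([type], groupid)
values ('a', 1), ('a', 1), ('b', 1), ('b', 1), ('a', 2), ('a', 2);

select t.id, t.[type], t.groupid
from @TestTable as t
where t.[type] = 'b'
   or not exists (select 1
                  from @TestTable as b
                  where b.groupid = t.groupid
                    and b.[type] = 'b');
-- returns ids 3 and 4 (the b rows of group 1) and ids 5 and 6 (all of group 2)
```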

How do I return the column name in table where a null value exists?

I have a table of more than 2 million rows and over 100 columns. I need to run a query that checks if there are any null values in any row or column of the table and return an ID number where there is a null. I've thought about doing the following, but I was wondering if there is a more concise way of checking this?
SELECT [ID]
from [TABLE_NAME]
where
[COLUMN_1] is null
or [COLUMN_2] is null
or [COLUMN_3] is null or etc.
Your method is fine. If your challenge is writing out the where statement, then you can run a query like this:
select column_name+' is null or '
from information_schema.columns c
where c.table_name = 'table_name'
Then copy the results into a query window and use them for building the query.
I used SQL Server syntax for the query, because it looks like you are using SQL Server. Most databases support the INFORMATION_SCHEMA tables, but the syntax for string concatenation varies among databases. Remember to remove the final or at the end of the last comparison.
You can also copy the column list into Excel and use Excel formulas to create the list.
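If you would rather not paste the pieces together by hand, a sketch along these lines builds the whole statement in one string. 'TABLE_NAME' and [ID] are the placeholders from the question, the variable name is made up, and the is_nullable filter is my own addition to skip columns that can never be null. FOR XML PATH is used for the concatenation because it works on older SQL Server versions.

```sql
declare @sql nvarchar(max);

-- Concatenate ' or [col] is null' for every nullable column, then strip the leading ' or '.
select @sql = stuff((
        select ' or ' + quotename(c.column_name) + ' is null'
        from information_schema.columns as c
        where c.table_name = 'TABLE_NAME'
          and c.is_nullable = 'YES'
        order by c.ordinal_position
        for xml path(''), type
    ).value('.', 'nvarchar(max)'), 1, 4, N'');

set @sql = N'select [ID] from [TABLE_NAME] where ' + @sql;

print @sql;                       -- inspect the generated statement first
-- exec sp_executesql @sql;       -- then run it once it looks right
```

If the same table name exists in several schemas, you would also want to filter on c.table_schema.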
You can use something similar to the following:
declare @T table
(
ID int,
Name varchar(10),
Age int,
City varchar(10),
Zip varchar(10)
)
insert into @T values
(1, 'Alex', 32, 'Miami', NULL),
(2, NULL, 24, NULL, NULL)
;with xmlnamespaces('http://www.w3.org/2001/XMLSchema-instance' as ns)
select ID,
(
select *
from @T as T2
where T1.ID = T2.ID
for xml path('row'), elements xsinil, type
).value('count(/row/*[@ns:nil = "true"])', 'int') as NullCount
from @T as T1