I have a table with three columns
unified_id
master_id
(the third column is a primary key identity)
For an id which could be in either id column, I want to find the set of ids.
For example, if the data looks like this
X001,X123
X002,X123
X123,X123
then starting from X001 I want to return with the set of X001, X002 and X123. Likewise I want the same set starting with any member.
We can assume that the set is composed of values in the unified_id column, that is we assume that there is always a row like the third one, relating the master id to itself.
When I have just one row the follow query hits maximum recursion level, so I frankly have no idea how to solve this with recursive queries.
2BM0000023, 1TE0000111
Code:
CREATE TABLE link_unified_id
(unified_id VARCHAR(16) NOT NULL,
master_id VARCHAR(16) NOT NULL
);
insert into link_unified_id values ('X001','X123');
insert into link_unified_id values ('X002','X123');
insert into link_unified_id values ('X003','X123');
insert into link_unified_id values ('X123','X123');
insert into link_unified_id values ('X999','X999');
with cte1 ( unified_id) as
(
select
up1.unified_id
from
link_unified_id as up1
where
up1.unified_id = 'X001'
union all
select
up2.unified_id
from
link_unified_id as up2
inner join
cte1 on up2.unified_id = cte1.unified_id
)
select *
from cte1
/* hoping to see output like this:
X001
X002
X003
X123
*/
Maybe I don't get it but why not just join the master id from the selected item to the same table using that master id?
Something like this:
select lu1.unified_id
from link_unified_id lu1,
link_unified_id lu2
where lu1.master_id = lu2.master_id
and lu2.unified id = 'X001'
In my opinion, my requirement is to build an "undirected graph" and then find paths in it (not sure what the terminology is), and this the solution to this in SQL is rather procedural (e.g. Recursive CTE in presence of circular references). I have decided to actually build a graph by inserting these pairs via a Python library which seems more elegant than trying to do this in SQL.
Related
I have the tables below:
tb_profile tb_mbx tb_profile_mbx tb_profile_cd
id id id_profile id_perfil
cod_mat mbx id_mbx id_cd (matches id_mbx)
concil bp
masc
I need to create a query that when validating that the id_cd 1,2,4,5
and 6 exists in tb_profile_cd, perform an insert in the
tb_profile_mbx table with the cod_matrix parameters of the tb_profile
table.
Remembering that each concil has its ID in the tb_mbx table and a
concil has many cod_mat.
Another point is that the concil id_mbx represents the id_cd of the
tb_profile_cd table.
One more point is that as I said above, that a concil has many
cod_mat. I have around 20 thousand records for each concil.
For my need, try to consult the query below, but Oracle returned an error:
insert into tb_profile_mbx values (seq_profile_mbx.nextval,
(select id from tb_profile where concil like '%NEXXERA%')
,(select id from tb_mbx where mbx like '%NEXXERA%')
,null
,null);
ORA-01427: single line subquery returns more than one line
Would there be another way to do this query?
Thanks in advance!
You can insert all matching combinations of matches using:
insert into tb_profile_mbx (id_profile, id_mbx)
select p.id, m.id
from tb_profile p join
tb_mbx m
on p.concil like '%NEXXERA%' and m.mbx like '%NEXXERA%';
I would recommend running the select to see if it returns the values you want.
You only show two columns for tb_profile_mbx, so I've only included two columns in the insert.
I have a large (5 million row, 300+ column) csv file I need to import into a staging table in SQL Server, then run a script to split each row up and insert data into the relevant tables in a normalized db. The format of the source table looks something like this:
(fName, lName, licenseNumber1, licenseIssuer1, licenseNumber2, licenseIssuer2..., specialtyName1, specialtyState1, specialtyName2, specialtyState2..., identifier1, identifier2...)
There are 50 licenseNumber/licenseIssuer columns, 15 specialtyName/specialtyState columns, and 15 identifier columns. There is always at least one of each of those, but the remaining 49 or 14 could be null. The first identifier is unique, but is not used as the primary key of the Person in our schema.
My database schema looks like this
People(ID int Identity(1,1))
Names(ID int, personID int, lName varchar, fName varchar)
Licenses(ID int, personID int, number varchar, issuer varchar)
Specialties(ID int, personID int, name varchar, state varchar)
Identifiers(ID int, personID int, value)
The database will already be populated with some People before adding the new ones from the csv.
What is the best way to approach this?
I have tried iterating over the staging table one row at a time with select top 1:
WHILE EXISTS (Select top 1 * from staging)
BEGIN
INSERT INTO People Default Values
SET #LastInsertedID = SCOPE_IDENTITY() -- might use the output clause to get this instead
INSERT INTO Names (personID, lName, fName)
SELECT top 1 #LastInsertedID, lName, fName from staging
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 #LastInsertedID, licenseNumber1, licenseIssuer1 from staging
IF (select top 1 licenseNumber2 from staging) is not null
BEGIN
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 #LastInsertedID, licenseNumber2, licenseIssuer2 from staging
END
-- Repeat the above 49 times, etc...
DELETE top 1 from staging
END
One problem with this approach is that it is prohibitively slow, so I refactored it to use a cursor. This works and is significantly faster, but has me declaring 300+ variables for Fetch INTO.
Is there a set-based approach that would work here? That would be preferable, as I understand that cursors are frowned upon, but I'm not sure how to get the identity from the INSERT into the People table for use as a foreign key in the others without going row-by-row from the staging table.
Also, how could I avoid copy and pasting the insert into the Licenses table? With a cursor approach I could try:
FETCH INTO ...#LicenseNumber1, #LicenseIssuer1, #LicenseNumber2, #LicenseIssuer2...
INSERT INTO #LicenseTemp (number, issuer) Values
(#LicenseNumber1, #LicenseIssuer1),
(#LicenseNumber2, #LicenseIssuer2),
... Repeat 48 more times...
.
.
.
INSERT INTO Licenses(personID, number, issuer)
SELECT #LastInsertedID, number, issuer
FROM #LicenseTEMP
WHERE number is not null
There still seems to be some redundant copy and pasting there, though.
To summarize the questions, I'm looking for idiomatic approaches to:
Break up one large staging table into a set of normalized tables, retrieving the Primary Key/identity from one table and using it as the foreign key in the others
Insert multiple rows into the normalized tables that come from many repeated columns in the staging table with less boilerplate/copy and paste (Licenses and Specialties above)
Short of discreet answers, I'd also be very happy with pointers towards resources and references that could assist me in figuring this out.
Ok, I'm not an SQL Server expert, but here's the "strategy" I would suggest.
Calculate the personId on the staging table
As #Shnugo suggested before me, calculating the personId in the staging table will ease the next steps
Use a sequence for the personID
From SQL Server 2012 you can define sequences. If you use it for every person insert, you'll never risk an overlapping of IDs. If you have (as it seems) personId that were loaded before the sequence you can create the sequence with the first free personID as starting value
Create a numbers table
Create an utility table keeping numbers from 1 to n (you need n to be at least 50.. you can look at this question for some implementations)
Use set logic to do the insert
I'd avoid cursor and row-by-row logic: you are right that it is better to limit the number of accesses to the table, but I'd say that you should strive to limit it to one access for target table.
You could proceed like these:
People:
INSERT INTO People (personID)
SELECT personId from staging;
Names:
INSERT INTO Names (personID, lName, fName)
SELECT personId, lName, fName from staging;
Licenses:
here we'll need the Number table
INSERT INTO Licenses (personId, number, issuer)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then licenseNumber1
when 2 then licenseNumber2
...
when 50 then licenseNumber50
end as licenseNumber,
case nbrs.n
when 1 then licenseIssuer1
when 2 then licenseIssuer2
...
when 50 then licenseIssuer50
end as licenseIssuer
from staging
cross join
(select n from numbers where n>=1 and n<=50) nbrs
) WHERE licenseNumber is not null;
Specialties:
INSERT INTO Specialties(personId, name, state)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then specialtyName1
when 2 then specialtyName2
...
when 15 then specialtyName15
end as specialtyName,
case nbrs.n
when 1 then specialtyState1
when 2 then specialtyState2
...
when 15 then specialtyState15
end as specialtyState
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) WHERE specialtyName is not null;
Identifiers:
INSERT INTO Identifiers(personId, value)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then identifier1
when 2 then identifier2
...
when 15 then identifier15
end as value
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) WHERE value is not null;
Hope it helps.
You say: but the staging table could be modified
I would
add a PersonID INT NOT NULL column and fill it with DENSE_RANK() OVER(ORDER BY fname,lname)
add an index to this PersonID
use this ID in combination with GROUP BY to fill your People table
do the same with your names table
And then use this ID for a set-based insert into your three side tables
Do it like this
SELECT AllTogether.PersonID, AllTogether.TheValue
FROM
(
SELECT PersonID,SomeValue1 AS TheValue FROM StagingTable
UNION ALL SELECT PersonID,SomeValue2 FROM StagingTable
UNION ALL ...
) AS AllTogether
WHERE AllTogether.TheValue IS NOT NULL
UPDATE
You say: might cause a conflict with IDs that already exist in the People table
You did not tell anything about existing People...
Is there any sure and unique mark to identify them? Use a simple
UPDATE StagingTable SET PersonID=xyz WHERE ...
to set existing PersonIDs into your staging table and then use something like
UPDATE StagingTable
SET PersonID=DENSE RANK() OVER(...) + MaxExistingID
WHERE PersonID IS NULL
to set new IDs for PersonIDs still being NULL.
I have a table with a column that can have values separated by ",".
Example column group:
id column group:
1 10,20,30
2 280
3 20
I want to create a SELECT with where condition on column group where I can search for example 20 ad It should return 1 and 3 rows or search by 20,280 and it should return 1 and 2 rows.
Can you help me please?
As pointed out in comments,storing mutiple values in a single row is not a good idea..
coming to your question,you can use one of the split string functions from here to split comma separated values into a table and then query them..
create table #temp
(
id int,
columnss varchar(100)
)
insert into #temp
values
(1,'10,20,30'),
(2, '280'),
(3, '20')
select *
from #temp
cross apply
(
select * from dbo.SplitStrings_Numbers(columnss,',')
)b
where item in (20)
id columnss Item
1 10,20,30 20
3 20 20
The short answer is: don't do it.
Instead normalize your tables to at least 3NF. If you don't know what database normalization is, you need to do some reading.
If you absolutely have to do it (e.g. this is a legacy system and you cannot change the table structure), there are several articles on string splitting with TSQL and at least a couple that have done extensive benchmarks on various methods available (e.g. see: http://sqlperformance.com/2012/07/t-sql-queries/split-strings)
Since you only want to search, you don't really need to split the strings, so you can write something like:
SELECT id, list
FROM t
WHERE ','+list+',' LIKE '%,'+#searchValue+',%'
Where t(id int, list varchar(max)) is the table to search and #searchValue is the value you are looking for. If you need to search for more than one value you have to add those in a table and use a join or subquery.
E.g. if s(searchValue varchar(max)) is the table of values to search then:
SELECT distinct t.id, t.list
FROM t INNER JOIN s
ON ','+t.list+',' LIKE '%,'+s.searchValue+',%'
If you need to pass those search values from ADO.Net consider table parameters.
insert into Orders values ('1111',
(Select CustomerID from Customers where CustomerID = (Select CustomerID from customers where CompanyName= 'erp')),
(Select EmployeeID from Employees where EmployeeID = (Select EmployeeID from Employees where FirstName = 'Hello')),
(Select ShipperID from Shippers Where ShipperID = (Select ShipperID from Shippers where CompanyName= 'Ntat')),
'2014-12-01','2013-12-01','22','22','aa','aa','dd','gs','ga','ga','qq');
i am unable to run this Query as i m getting error :
Error Code: 1242. Subquery returns more than 1 row
Kindly help
The INSERT command comes in two flavors:
(1) either you have all your values available, as literals or SQL Server variables - in that case, you can use the INSERT .. VALUES() approach:
INSERT INTO dbo.YourTable(Col1, Col2, ...., ColN)
VALUES(Value1, Value2, #Variable3, #Variable4, ...., ValueN)
Note: I would recommend to always explicitly specify the list of column to insert data into - that way, you won't have any nasty surprises if suddenly your table has an extra column, or if your tables has an IDENTITY or computed column. Yes - it's a tiny bit more work - once - but then you have your INSERT statement as solid as it can be and you won't have to constantly fiddle around with it if your table changes.
(2) if you don't have all your values as literals and/or variables, but instead you want to rely on another table, multiple tables, or views, to provide the values, then you can use the INSERT ... SELECT ... approach:
INSERT INTO dbo.YourTable(Col1, Col2, ...., ColN)
SELECT
SourceColumn1, SourceColumn2, #Variable3, #Variable4, ...., SourceColumnN
FROM
dbo.YourProvidingTableOrView
Here, you must define exactly as many items in the SELECT as your INSERT expects - and those can be columns from the table(s) (or view(s)), or those can be literals or variables. Again: explicitly provide the list of columns to insert into - see above.
You can use one or the other - but you cannot mix the two - you cannot use VALUES(...) and then have a SELECT query in the middle of your list of values - pick one of the two - stick with it.
For more details and further in-depth coverage, see the official MSDN SQL Server Books Online documentation on INSERT - a great resource for all questions related to SQL Server!
TL;DR
There is a design integrity issue with your application, from which you will not be able to recover at a Sql Query level.
In Detail
Using non-key values to lookup foreign keys during an insert is not a great idea, as you've now found - the error message indicates that one or more of the subqueries has matched multiple rows, and now you are faced with an idempotence issue.
e.g. Lets just say that in this instance, you have more than one Employee with the name 'Hello'. Your options appear to be:
Either attribute the order to the FIRST employee with the name 'Hello' - obviously this is potentially unfair to the real employee who made the sale
Insert multiple orders, one for each employee - but now we risk double shipping and billing issues.
So the real solution is to ensure that you carry all of the key fields (either a Primary or Unique Key, whether natural or surrogate) for each of the FK role columns through your application at all times.
This then means that you can insert the data with confidence
insert into Orders values ('1111',
#CustomerId,
#EmployeeId,
#ShipperId,
'2014-12-01','2013-12-01','22','22','aa','aa','dd','gs','ga','ga','qq');
You will have to do this thing with the help of procedure because you are getting more than one value in select statement....
You will have to pass value one by one in insert statement
create procedure test
as
declare #customerid int
declare #empid int
declare #shipperid int
begin
set #customerid= (Select CustomerID from customers where CompanyName='erp')
set #empid=(Select EmployeeID from Employees where FirstName = 'Hello')
set #shipperid =(Select ShipperID from Shippers where CompanyName='Ntat')
-- but note down that it will assign last value to variable
-- but if it returns more than one value you will have to create a temporary table and --then assign value to it and will have to apply loop
-- like this create #temp1 (customerid id)
insert into orders values(#customerid,#smpid,#shipperid,'val1','val2'...ans so one)
end
this is one on my database tables template.
Id int PK
Title nvarchar(10) unique
ParentId int
This is my question.Is there a problem if i create a relation between "Id" and "ParentId" columns?
(I mean create a relation between a table to itself)
I need some advices about problems that may occur during insert or updater or delete operations at developing step.thanks
You can perfectly join the table with it self.
You should be aware, however, that your design allows you to have multiple levels of hierarchy. Since you are using SQL Server (assuming 2005 or higher), you can have a recursive CTE get your tree structure.
Proof of concept preparation:
declare #YourTable table (id int, parentid int, title varchar(20))
insert into #YourTable values
(1,null, 'root'),
(2,1, 'something'),
(3,1, 'in the way'),
(4,1, 'she moves'),
(5,3, ''),
(6,null, 'I don''t know'),
(7,6, 'Stick around');
Query 1 - Node Levels:
with cte as (
select Id, ParentId, Title, 1 level
from #YourTable where ParentId is null
union all
select yt.Id, yt.ParentId, yt.Title, cte.level + 1
from #YourTable yt inner join cte on cte.Id = yt.ParentId
)
select cte.*
from cte
order by level, id, Title
No, you can do self join in your table, there will not be any problem. Are you talking which types of problems in insert, update, delete operation ? You can check some conditions like ParentId exists before adding new record, or you can check it any child exist while deleting parent.
You can do self join like :
select t1.Title, t2.Title as 'ParentName'
from table t1
left join table t2
on t1.ParentId = t2.Id
You've got plenty of good answers here. One other thing to consider is referential integrity. You can have a foreign key on a table that points to another column in the same table. Observe:
CREATE TABLE tempdb.dbo.t
(
Id INT NOT NULL ,
CONSTRAINT PK_t PRIMARY KEY CLUSTERED ( Id ) ,
ParentId INT NULL ,
CONSTRAINT FK_ParentId FOREIGN KEY ( ParentId ) REFERENCES tempdb.dbo.t ( Id )
)
By doing this, you ensure that you're not going to get garbage in the ParentId column.
Its called Self Join and it can be added to a table as in following example
select e1.emp_name 'manager',e2.emp_name 'employee'
from employees e1 join employees e2
on e1.emp_id=e2.emp_manager_id
I have seen this done without errors before on a table for menu hierarchy you shouldnt have any issues providing your insert / update / delete queries are well written.
For instance when you insert check a parent id exists, when you delete check you delete all children too if this action is appropriate or do not allow deletion of items that have children.
It is fine to do this (it's a not uncommon pattern). You must ensure that you are adding a child record to a parent record that actually exists etc., but there's noting different here from any other constraint.
You may want to look at recursive common table expressions:
http://msdn.microsoft.com/en-us/library/ms186243.aspx
As a way of querying an entire 'tree' of records.
This is not a problem, as this is a relationship that's common in real life. If you do not have a parent (which happens at the top level), you need to keep this field "null", only then do update and delete propagation work properly.