Will order by preserve? - sql

create table source_table (id number);
insert into source_table values(3);
insert into source_table values(1);
insert into source_table values(2);
create table target_table (id number, seq_val number);
create sequence example_sequence;
insert into target_table
select id, example_sequence.nextval
from
> (select id from source_table ***order by id***);
Is it officially assured that for the id's with the lower values in source_table corresponding sequence's value will also be lower when inserting into the source_table? In other words, is it guaranteed that the sorting provided by order by clause will be preserved when inserting?
EDIT
The question is not: 'Are rows ordered in a table as such?' but rather 'Can we rely on the order by clause used in the subquery when inserting?'.
To even more closely illustrate this, the contents of the target table in the above example, after running the query like select * from target_table order by id would be:
ID | SEQ_VAL
1 1
2 2
3 3
Moreover, if i specified descending ordering when inserting like this:
insert into target_table
select id, example_sequence.nextval
from
> (select id from source_table ***order by id DESC***);
The output of the same query from above would be:
ID | SEQ_VAL
1 3
2 2
3 1
Of that I'm sure, I have tested it multiple times. My question is 'Can I always rely on this ordering?'

Tables in a relational database are not ordered, and any apparent ordering in the result set of a cursor which lacks an ORDER BY is an artifact of data storage, is not guaranteed, and later actions on the table may cause this apparent ordering to change. If you want the results of a cursor to be ordered in a particular manner you MUST use an ORDER BY.

Related

DB2 Using with statement to keep last id

In my project I need to create a script that insert data with auto generate value for the primary key and then to reuse this number for foreign on other tables.
I'm trying to use the WITH statement in order to keep that value.
For instance, I'm trying to do this:
WITH tmp as (SELECT ID FROM (INSERT INTO A ... VALUES ...))
INSERT INTO B ... VALUES tmp.ID ...
But I can't make it work.
Is it at least possible to do it or am I completely wrong???
Thank you
Yes, it is possible, if your DB2-server version supports the syntax.
For example:
create table xemp(id bigint generated always as identity, other_stuff varchar(20));
create table othertab(xemp_id bigint);
SELECT id FROM FINAL TABLE
(INSERT INTO xemp(other_stuff)
values ('a'), ('b'), ('c'), ('d')
) ;
The above snippet of code gives the result below:
ID
--------------------
1
2
3
4
4 record(s) selected.
If you want to re-use the ID to populate another table:
with tmp1(id) as ( SELECT id FROM new TABLE (INSERT INTO xemp(other_stuff) values ('a1'), ('b1'), ('c1'), ('d1') ) tmp3 )
, tmp2 as (select * from new table (insert into othertab(xemp_id) select id from tmp1 ) tmp4 )
select * from othertab;
As per my understanding
You will have to create an auto-increment field with the sequence object (this object generates a number sequence).
You can CREATE SEQUENCE to achieve the auto increment value :
CREATE SEQUENCE seq_person
MINVALUE 1
START WITH 1
INCREMENT BY 1
CACHE 10

Split one large, denormalized table into a normalized database

I have a large (5 million row, 300+ column) csv file I need to import into a staging table in SQL Server, then run a script to split each row up and insert data into the relevant tables in a normalized db. The format of the source table looks something like this:
(fName, lName, licenseNumber1, licenseIssuer1, licenseNumber2, licenseIssuer2..., specialtyName1, specialtyState1, specialtyName2, specialtyState2..., identifier1, identifier2...)
There are 50 licenseNumber/licenseIssuer columns, 15 specialtyName/specialtyState columns, and 15 identifier columns. There is always at least one of each of those, but the remaining 49 or 14 could be null. The first identifier is unique, but is not used as the primary key of the Person in our schema.
My database schema looks like this
People(ID int Identity(1,1))
Names(ID int, personID int, lName varchar, fName varchar)
Licenses(ID int, personID int, number varchar, issuer varchar)
Specialties(ID int, personID int, name varchar, state varchar)
Identifiers(ID int, personID int, value)
The database will already be populated with some People before adding the new ones from the csv.
What is the best way to approach this?
I have tried iterating over the staging table one row at a time with select top 1:
WHILE EXISTS (Select top 1 * from staging)
BEGIN
INSERT INTO People Default Values
SET #LastInsertedID = SCOPE_IDENTITY() -- might use the output clause to get this instead
INSERT INTO Names (personID, lName, fName)
SELECT top 1 #LastInsertedID, lName, fName from staging
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 #LastInsertedID, licenseNumber1, licenseIssuer1 from staging
IF (select top 1 licenseNumber2 from staging) is not null
BEGIN
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 #LastInsertedID, licenseNumber2, licenseIssuer2 from staging
END
-- Repeat the above 49 times, etc...
DELETE top 1 from staging
END
One problem with this approach is that it is prohibitively slow, so I refactored it to use a cursor. This works and is significantly faster, but has me declaring 300+ variables for Fetch INTO.
Is there a set-based approach that would work here? That would be preferable, as I understand that cursors are frowned upon, but I'm not sure how to get the identity from the INSERT into the People table for use as a foreign key in the others without going row-by-row from the staging table.
Also, how could I avoid copy and pasting the insert into the Licenses table? With a cursor approach I could try:
FETCH INTO ...#LicenseNumber1, #LicenseIssuer1, #LicenseNumber2, #LicenseIssuer2...
INSERT INTO #LicenseTemp (number, issuer) Values
(#LicenseNumber1, #LicenseIssuer1),
(#LicenseNumber2, #LicenseIssuer2),
... Repeat 48 more times...
.
.
.
INSERT INTO Licenses(personID, number, issuer)
SELECT #LastInsertedID, number, issuer
FROM #LicenseTEMP
WHERE number is not null
There still seems to be some redundant copy and pasting there, though.
To summarize the questions, I'm looking for idiomatic approaches to:
Break up one large staging table into a set of normalized tables, retrieving the Primary Key/identity from one table and using it as the foreign key in the others
Insert multiple rows into the normalized tables that come from many repeated columns in the staging table with less boilerplate/copy and paste (Licenses and Specialties above)
Short of discreet answers, I'd also be very happy with pointers towards resources and references that could assist me in figuring this out.
Ok, I'm not an SQL Server expert, but here's the "strategy" I would suggest.
Calculate the personId on the staging table
As #Shnugo suggested before me, calculating the personId in the staging table will ease the next steps
Use a sequence for the personID
From SQL Server 2012 you can define sequences. If you use it for every person insert, you'll never risk an overlapping of IDs. If you have (as it seems) personId that were loaded before the sequence you can create the sequence with the first free personID as starting value
Create a numbers table
Create an utility table keeping numbers from 1 to n (you need n to be at least 50.. you can look at this question for some implementations)
Use set logic to do the insert
I'd avoid cursor and row-by-row logic: you are right that it is better to limit the number of accesses to the table, but I'd say that you should strive to limit it to one access for target table.
You could proceed like these:
People:
INSERT INTO People (personID)
SELECT personId from staging;
Names:
INSERT INTO Names (personID, lName, fName)
SELECT personId, lName, fName from staging;
Licenses:
here we'll need the Number table
INSERT INTO Licenses (personId, number, issuer)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then licenseNumber1
when 2 then licenseNumber2
...
when 50 then licenseNumber50
end as licenseNumber,
case nbrs.n
when 1 then licenseIssuer1
when 2 then licenseIssuer2
...
when 50 then licenseIssuer50
end as licenseIssuer
from staging
cross join
(select n from numbers where n>=1 and n<=50) nbrs
) WHERE licenseNumber is not null;
Specialties:
INSERT INTO Specialties(personId, name, state)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then specialtyName1
when 2 then specialtyName2
...
when 15 then specialtyName15
end as specialtyName,
case nbrs.n
when 1 then specialtyState1
when 2 then specialtyState2
...
when 15 then specialtyState15
end as specialtyState
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) WHERE specialtyName is not null;
Identifiers:
INSERT INTO Identifiers(personId, value)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then identifier1
when 2 then identifier2
...
when 15 then identifier15
end as value
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) WHERE value is not null;
Hope it helps.
You say: but the staging table could be modified
I would
add a PersonID INT NOT NULL column and fill it with DENSE_RANK() OVER(ORDER BY fname,lname)
add an index to this PersonID
use this ID in combination with GROUP BY to fill your People table
do the same with your names table
And then use this ID for a set-based insert into your three side tables
Do it like this
SELECT AllTogether.PersonID, AllTogether.TheValue
FROM
(
SELECT PersonID,SomeValue1 AS TheValue FROM StagingTable
UNION ALL SELECT PersonID,SomeValue2 FROM StagingTable
UNION ALL ...
) AS AllTogether
WHERE AllTogether.TheValue IS NOT NULL
UPDATE
You say: might cause a conflict with IDs that already exist in the People table
You did not tell anything about existing People...
Is there any sure and unique mark to identify them? Use a simple
UPDATE StagingTable SET PersonID=xyz WHERE ...
to set existing PersonIDs into your staging table and then use something like
UPDATE StagingTable
SET PersonID=DENSE RANK() OVER(...) + MaxExistingID
WHERE PersonID IS NULL
to set new IDs for PersonIDs still being NULL.

Inserting a Row in a Table from Select Query

I'm using two Stored Procedures and two Table in My Sql Server.
First Table Structure.
Second Table Structure.
When a Customer books a Order then the Data will be Inserted in Table 1.
I'm using a Select Query in another Page Which Selects the Details from the Second Table.
If a row with a billno from first table is not Present in Second Table I want to Insert into the Second Table with some Default Values in the Select Query. How can I do this
??
If you want to insert in the same query, you will have to create a stored procedure. There you'll query if row exists in second table, and, if not, insert a new entity in second table.
Your code should look something like this:
-- --------------------------------------------------------------------------------
-- Routine DDL
-- Note: comments before and after the routine body will not be stored by the server
-- --------------------------------------------------------------------------------
DELIMITER $$
CREATE DEFINER=`table`#`%` PROCEDURE `insertBill`(IN billNo int, val1 int, val2 int, val3 int)
BEGIN
DECLARE totalres INT DEFAULT 0;
select count(*) from SECOND_TABLE where Bill_Number = billNo INTO totalres;
IF totalres < 1 THEN
INSERT into SECOND_TABLE values(val1,val2,val3);
END IF;
END
Val1,val2 and val3 are the valuest to be inserted into second table.
Hope this helps.
What you do is to LEFT JOIN the two tables and then select only the ones where the second table had no row to join, meaning the bill number were missing.
In the example below, you can replace #default_inform_status and #default_response_status with your default values.
INSERT INTO second_table (Bill_Number, Rest_Inform_Status, Rest_Response_Status)
SELECT ft.Bill_Number, #default_inform_status, #default_response_status
FROM first_table ft
LEFT JOIN second_table st
ON st.Bill_Number = ft.Bill_number
WHERE st.Bill_Number IS NULL
If it is possible to have duplicates of the same Bill_Number in the first table, you should also add a DISTINCT after the SELECT. But considering the fact that it is a primary key, this is no issue for you.

Insert output IDs into another table

I have a status table, and another table containing additional data. My object IDs are the PK in the status table, so I need to insert those into the additional data table for each new row.
I need to insert a new row into my statusTable for each new listing, containing just constants.
declare #temp TABLE(listingID int)
insert into statusTable(status, date)
output Inserted.listingID into #temp
select 1, getdate()
from anotherImportedTable
This gets me enough new listing IDs to use.
I now need to insert the actual listing data into another table, and map each row to one of those listingIDs -
insert into listingExtraData(listingID, data)
select t.listingID, a.data
from #temp t, anotherImportedTable a
Now this obviously doesn't work, because otherDataTable and the IDs in #temp are unrelated... so I get far too many rows inserted.
How can I insert each row from anotherImportedTable into listingExtraData along with a unique newly created listingID? could I possibly trigger some more sql at the point I do the output in the first block of sql?
edit: thanks for the input so far, here's what the tables look like:
anotherImportedTable:
data
statusTable:
listingID (pk), status, date
listingExtraData:
data, listingID
You see that I only want to create one entry into statusTable per row in anotherImportedTable, then put one listingID with a row from anotherImportedTable into listingExtraData... I'm thinking that I might have to resort to a cursor perhaps?
Ok, here's how you can do it (if I'm right about what you actually want to do):
insert into listingExtraData(listingID, data)
select q1.listingID, q2.data
from
(select ListingID, ROW_NUMBER() OVER (order by ListingID) as rn from #temp t) as q1
inner join (select data, ROW_NUMBER() over (order by data) as rn from anotherImportedTable) q2 on q1.rn = q2.rn
In case you matching logic differs you will need to change sorting of anotherImportedTable. In case your match order can not be achieved by ordering anotherImportTable [in one way or another] then you're out of luck.

Normalizing a table, from one to the other

I'm trying to normalize a mysql database....
I currently have a table that contains 11 columns for "categories". The first column is a user_id and the other 10 are category_id_1 - category_id_10. Some rows may only contain a category_id up to category_id_1 and the rest might be NULL.
I then have a table that has 2 columns, user_id and category_id...
What is the best way to transfer all of the data into separate rows in table 2 without adding a row for columns that are NULL in table 1?
thanks!
You can create a single query to do all the work, it just takes a bit of copy and pasting, and adjusting the column name:
INSERT INTO table2
SELECT * FROM (
SELECT user_id, category_id_1 AS category_id FROM table1
UNION ALL
SELECT user_id, category_id_2 FROM table1
UNION ALL
SELECT user_id, category_id_3 FROM table1
) AS T
WHERE category_id IS NOT NULL;
Since you only have to do this 10 times, and you can throw the code away when you are finished, I would think that this is the easiest way.
One table for users:
users(id, name, username, etc)
One for categories:
categories(id, category_name)
One to link the two, including any extra information you might want on that join.
categories_users(user_id, category_id)
-- or with extra information --
categories_users(user_id, category_id, date_created, notes)
To transfer the data across to the link table would be a case of writing a series of SQL INSERT statements. There's probably some awesome way to do it in one go, but since there's only 11 categories, just copy-and-paste IMO:
INSERT INTO categories_users
SELECT user_id, 1
FROM old_categories
WHERE category_1 IS NOT NULL