I am in the midst of refactoring my database to support an upgrade to our web-based tool.
Imagine that I have four tables: Milestones, Categories, Skills, and PayRates. Under the old schema, each of these tables only listed a name, and that name was the key for the table.
Under the new schema, each table has not only a name, but also a generated uniqueidentifier that serves as the key for the table.
Now, also imagine that I have a table Tasks, where each Task consists of a name, a Milestone, a Category, a Skill, and a PayRate, and each of these is selected from their respective tables. Under the old schema, this table only stored names. Under the new schema, this table will store the IDs for the four tables instead of the names, like the following:
TaskID TaskName MilestoneID CategoryID SkillID RateID
where TaskID is a generated uniqueidentifier for that Task.
Each of these tables currently contains data that needs to be transferred from the old schema to the new schema. I can assume that the names of each of the four components of Tasks, and the names of the Tasks themselves, are unique in the old schema.
My question is, what is the simplest query to move the data from the old schema into the new one?
This is being done to support storing two separate lists of Milestones, Tasks, etc. in the same database.
I'd do the following:
Step zero: Back up your database.
Step one: Add the unique identifiers to Milestones, Categories, Skills, and PayRates. This is a simple column addition with a default that generates a new identifier for each existing name.
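A minimal sketch of that addition, assuming SQL Server (the DEFAULT is evaluated per existing row, so every name gets its own GUID); repeat for the other three tables:
ALTER TABLE Milestones
    ADD MilestoneID uniqueidentifier NOT NULL DEFAULT NEWID();
-- Once populated, move the primary key over (constraint name is hypothetical):
-- ALTER TABLE Milestones DROP CONSTRAINT PK_Milestones;
-- ALTER TABLE Milestones ADD CONSTRAINT PK_Milestones PRIMARY KEY (MilestoneID);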
Step two: Add the four new ID columns to the existing Tasks table (and add foreign keys to the source tables if desired), without deleting the old name columns.
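A sketch of step two, keeping the columns nullable until they are filled in; the column names follow the update in step three (the question's table shows RateID, this answer uses PayRateID):
ALTER TABLE Tasks ADD
    MilestoneID uniqueidentifier NULL,
    CategoryID  uniqueidentifier NULL,
    SkillID     uniqueidentifier NULL,
    PayRateID   uniqueidentifier NULL;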
Step three: Run the following (assuming the old name columns are called Milestone, Category, Skill, and PayRate):
update Tasks set
    MilestoneID = (select MilestoneID from Milestones
                   where Milestone = Tasks.Milestone),
    CategoryID  = (select CategoryID from Categories
                   where Category = Tasks.Category),
    SkillID     = (select SkillID from Skills
                   where Skill = Tasks.Skill),
    PayRateID   = (select PayRateID from PayRates
                   where PayRate = Tasks.PayRate)
Step four: Check that everything is in place (no new ID column should be left null).
Step five: Delete the old name columns from the Tasks table and make the new ID columns non-null.
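A sketch of step five, again assuming SQL Server:
ALTER TABLE Tasks DROP COLUMN Milestone, Category, Skill, PayRate;
ALTER TABLE Tasks ALTER COLUMN MilestoneID uniqueidentifier NOT NULL;
-- ...repeat the ALTER COLUMN for CategoryID, SkillID, and PayRateID.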
I figure this has to be easy; I'm just not sure how to ask the question.
I have thousands of records I imported from an Excel spreadsheet into a Microsoft Access table, with a field that I want to extract into a new table. How do I move the data from the field in the existing table to a new table and maintain the relationships to the records?
The goal is to move the existing data from the one field into a new table that will have a one-to-many relationship with the existing parent table.
For example, in a table called tblProperties I have the following fields:
Property_Address | Property_Owner | UtilityMeter_Number
I want to maintain a history of utility meters on properties as they are replaced, so I want to move the UtilityMeter_Number field from tblProperties into a new table called tblMeters and create a one-many relationship between the two so I can have multiple meter records for each property record.
How do I move all the existing data from the UtilityMeter_Number field in tblProperties into tblMeters, and maintain the relationship?
What is what I'm trying to do called, and how do I do it?
This is called normalizing the data structure.
Use a SELECT DISTINCT query to create unique records. Use that dataset as source to create a new table. Something like:
SELECT DISTINCT CustID, LName, FName, MName INTO Customers FROM Orders;
Now delete the unnecessary LName, FName, MName fields from the Orders table.
The tables are related on the common CustID fields. An autonumber primary key is not utilized. If you do want a relationship on an autonumber PK, then continue with the following steps:
add an autonumber field in the new table
create a number field in the original table
run an UPDATE action query to populate the new number field with the autonumber value from the new table, joining the tables on the common CustID fields
also delete the CustID field from the original table
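Applied to the question's tables, a minimal sketch (Access SQL; this assumes Property_Address uniquely identifies a property and can serve as the linking field):
SELECT DISTINCT Property_Address, UtilityMeter_Number
INTO tblMeters
FROM tblProperties;
UtilityMeter_Number can then be deleted from tblProperties, and the two tables related on Property_Address.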
Consider my data as an inventory list separated by categories.
When I started, I had one table that should have been split into two; otherwise the columns in a given row of the oldTable would have been unrelated. I have created two new tables in my database, one for categories and the other for data/items. Now I am trying to use the oldTable's existing data to fill the new data/items table, so I can learn SQL and not have to do it manually. The categories table I filled in manually because I could not see how to do it otherwise.
The old table has:
tableName (
id,
categoryA,
categoryB,
categoryC,
categoryD,
categoryE,
categoryF,
isPriorityA,
isPriorityB,
isPriorityC,
isPriorityD,
isPriorityE,
isPriorityF
)
The new tables have:
Categories (
cat_id,
name
)
dataItem (
item_id,
cat_id,
name,
priority,
description,
URL
)
How do I force the new dataItem table to require that cat_id match one of the values in the Categories.cat_id column, and give an error if a value outside that set is added? I believe this is called mapping or linking the tables, thereby making them related tables.
How do I copy the tableName data into the dataItem table one column at a time, in alphabetical order, bringing the name and priority along and letting the item_id value auto-increment?
Sounds like you want to use a foreign key to limit dataItem.cat_id to values in Categories.cat_id. Something like this:
ALTER TABLE dataItem ADD FOREIGN KEY (cat_id) REFERENCES Categories(cat_id);
Exact syntax may depend on which database you are using. For more info on foreign keys see: http://www.w3schools.com/sql/sql_foreignkey.asp
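With that constraint in place, an insert that references a nonexistent category fails immediately; a quick sketch (cat_id 999 is assumed not to exist in Categories):
-- Fails with a foreign key violation, since 999 is not in Categories:
INSERT INTO dataItem (cat_id, name, priority) VALUES (999, 'example item', 1);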
In our application, users can create different lists (like SharePoint); for example, a user can create a list of cars (name, model, brand) and a list of students (name, dob, address, nationality), etc.
Our application should be able to query on different columns of a list, so we can't just serialize each row and save it in one row.
Should I create a new table at runtime for each newly created list? If that were the best solution, then I suppose Microsoft SharePoint would have done it that way as well?
Should I use the following schema
Lists (Id, Name)
ListColumns (Id, ListId, Name)
ListRows (Id, ListId)
ListData(RowId, ColumnId, Value)
Though this means a single row will create as many rows in the ListData table as there are columns in the list, which just doesn't feel right.
Have you dealt with this situation? How did you handle it in database?
What you describe is called EAV (Entity-Attribute-Value model).
For a list with 3 columns and 1000 entries:
1 record in Lists
3 records in ListColumns
and 3000 Entries in ListData
This is fine. I'm not a fan of creating tables on-the-fly because it could mess up your database and you would have to "generate" your SQL queries dynamically. I would get a strange feeling when users could CREATE/DROP/ALTER Tables in my database!
Another nice feature of the EAV model is that you can merge two lists easily without dropping or altering a table.
Edit:
I think you also need the ListRows table (which your schema already includes) to tell you which ListData records belong together in a row!
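To illustrate reading the EAV layout back, here is a hedged sketch that fetches one list's contents, one result row per (row, column) pair, using the question's schema (the list id 1 is hypothetical):
SELECT r.Id AS RowId, c.Name AS ColumnName, d.Value
FROM ListRows r
JOIN ListData d ON d.RowId = r.Id
JOIN ListColumns c ON c.Id = d.ColumnId
WHERE r.ListId = 1
ORDER BY r.Id, c.Id;
Pivoting these pairs back into one result row per list row has to happen in application code or via dynamic SQL, which is the usual cost of EAV.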
Well, I've experienced something like this before. I don't want to share the actual table schema, so let's do some thought exercises using some of the suggested table structures:
Let's have a lists table containing a list of all my lists
Let's also have a columns table containing the metadata (column names)
Now we need a values table which contains the column values
We also need a rows table which contains a list of all the rows, otherwise it gets very difficult to work out how many rows there actually are
To keep things simple, let's just make everything a string (VARCHAR) and have a go at coming up with some queries:
Counting all the rows in a table
SELECT COUNT(*) FROM [rows]
JOIN [lists]
ON [rows].list_id = [Lists].id
WHERE [Lists].name = 'Cars'
Hmm, not too bad, compared to:
SELECT * FROM [Cars]
Inserting a row into a table
BEGIN TRANSACTION
DECLARE @row_id INT
DECLARE @list_id INT
SELECT @list_id = id FROM [lists] WHERE name = 'Cars'
INSERT INTO [rows] (list_id) VALUES (@list_id)
SELECT @row_id = @@IDENTITY
DECLARE @column_id INT
-- === Need one of these for each column ===
SELECT @column_id = id FROM [columns]
WHERE name = 'Make'
AND list_id = @list_id
INSERT INTO [values] (column_id, row_id, value)
VALUES (@column_id, @row_id, 'Rover')
-- === Need one of these for each column ===
SELECT @column_id = id FROM [columns]
WHERE name = 'Model'
AND list_id = @list_id
INSERT INTO [values] (column_id, row_id, value)
VALUES (@column_id, @row_id, 'Metro')
COMMIT TRANSACTION
Um, starting to get a little bit hairy compared to:
INSERT INTO [Cars] ([Make], [Model]) VALUES ('Rover', 'Metro')
Simple queries
I'm now getting bored of constructing tediously complex SQL statements, so maybe you can have a go at coming up with equivalent queries for the following statements:
SELECT [Model] FROM [Cars] WHERE [Make] = 'Rover'
SELECT [Cars].[Make], [Cars].[Model], [Owners].[Name] FROM [Cars]
JOIN [Owners] ON [Owners].id = [Cars].owner_id
WHERE [Owners].Age > 50
SELECT [Cars].[Make], [Cars].[Model], [Owners].[Name] FROM [Cars]
JOIN [Owners] ON [Owners].id = [Cars].owner_id
JOIN [Addresses] ON [Addresses].id = [Owners].address_id
WHERE [Addresses].City = 'London'
I hope you are beginning to get the idea...
In short - I've experienced this before and I can assure you that creating a database inside a database in this way is definitely a Bad Thing.
If you need to do anything but the most basic querying on these lists (and literally I mean "Can I have all the items in this list please?"), you should try and find an alternative.
As long as each user pretty much has their own database, I'd definitely recommend the CREATE TABLE approach. Even if they don't, I'd still recommend that you at least consider it.
Perhaps a potential solution is that creating a list involves issuing CREATE TABLE statements for those entities/lists? It sounds like the DB structure or schema can change at runtime, at the user's command, so perhaps something like this might help:
The user wants to create a new list of an entity never seen before; call it Computer.
The user defines the attributes (screensize, CpuSpeed, AmountRAM, NumberOfCores).
The system lets the user create it in the UI.
The system generally makes them all strings, unless it can tell that all supplied values are indeed dates or numbers.
Build the CREATE scripts and execute them against the DB (see the sketch after this list).
Insert the data that the user defined into that new table.
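A hedged sketch of what the generated DDL and first insert might look like for the Computer example, with every column defaulting to a string (all names and values here are illustrative):
-- Generated from the user's list definition; everything is a string:
CREATE TABLE [Computer] (
    [Id] INT IDENTITY PRIMARY KEY,
    [screensize] VARCHAR(255),
    [CpuSpeed] VARCHAR(255),
    [AmountRAM] VARCHAR(255),
    [NumberOfCores] VARCHAR(255)
);
-- Parameterized in real code; literals shown for readability:
INSERT INTO [Computer] ([screensize], [CpuSpeed], [AmountRAM], [NumberOfCores])
VALUES ('15 inch', '2.4 GHz', '8 GB', '4');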
Properly coded, we're working with the requirements given: let users create new entities. There was no mention of scale here. Of course, this requires all input to be sanitized, queries parameterized, actions logged, etc.
The negative comment below doesn't actually give any good reasons, but creates a bit of FUD. I'd be interested in addressing any concerns with this potential solution. We haven't heard about scale, security, performance, or usage (internal LAN vs. internet).
You should absolutely not dynamically create tables when your users create lists. That isn't how databases are meant to work.
Your schema is correct, and the pluralization is, in my opinion, also correct, though I would remove the camel case and call them lists, list_columns, list_rows and list_data.
I would further improve upon your schema by skipping the rows and columns tables; they serve no purpose. Simply attach a row/column number to each cell, and keep things sparse: don't bother holding empty cells in the database. You retain the ability to query/sort based on row/column, your queries will be (potentially very much) faster because the number of list_cells rows is reduced, and you won't have to do any crazy joining to link your data back to its table.
Here is the complete schema:
create table lists (
id int primary key,
name varchar(25) not null
);
create table list_cells (
id int primary key,
list_id int not null references lists(id)
on delete cascade on update cascade,
row int not null,
col int not null,
data varchar(25) not null
);
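To show that row/column access still works, a quick hedged sketch that reads one list back in row/column order (the list id 1 is hypothetical; note that row and col may need quoting as identifiers in some engines):
SELECT row, col, data
FROM list_cells
WHERE list_id = 1
ORDER BY row, col;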
It sounds like you might have SharePoint already deployed in your environment.
Consider integrating your application with SharePoint and having it be your datastore. There's no need to recreate all the things you like about SharePoint when you could leverage it.
It'd take a bit of configuring, but you could call SP web services to CRUD your list data for you:
inserting list data into SharePoint via web services
reading SP lists via web services
SharePoint 2010 can also expose lists via OData, which would be simple to consume from any application.
I have a SQLite database of notes that have columns _id, title, details, listid
_id is the auto incremented primary key
title and details are string data fields
listid is a foreign key pointing to a list name in another table.
I'd like to find a way to have notes that are in multiple lists or notes that are linked in such a way that updating one will update the other or be edited simultaneously by some other means.
The overall goal is to have copies of the same note in multiple lists where you edit one and the rest update automatically.
I've thought of adding an extra column with a sort of link id that will be shared by all linked notes, creating a way to update other notes.
Have three tables:
NOTE: _id, title, details
LIST: _id, listname
NOTES_IN_LIST: note_id, list_id
Then whenever you add a note to a list, you add a new row to NOTES_IN_LIST that connects that note's note_id to the list's list_id.
Whenever you edit a note, you just edit it in the NOTE table.
Whenever you list the contents of the list that you have the id for, you do a SELECT something like:
SELECT title, details
from NOTE
where NOTE._id in (
SELECT note_id from NOTES_IN_LIST
where list_id=<your list id>
)
or
SELECT title, details
from NOTE, NOTES_IN_LIST
where
NOTE._id=NOTES_IN_LIST.note_id
and
NOTES_IN_LIST.list_id=<your list id>
Hmm, to transfer the old notes to the new structure, I would:
create a new notes table with a new autoincrement id field
then SELECT DISTINCT (note title, note details) into that new notes table
then join the old notes table to the new notes table on old_title = new_title and old_detail = new_detail, select from that the new note id and the old list id, and insert the resulting rows into the NOTES_IN_LIST table (see the sketch after this list)
then I think you can delete the old notes table
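A hedged sketch of those steps in SQLite, assuming the pre-migration table is called old_notes and has the columns from the question (title, details, listid):
-- 1. New notes table with an autoincrement id:
CREATE TABLE NOTE (_id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT, details TEXT);
-- 2. One row per distinct note:
INSERT INTO NOTE (title, details)
SELECT DISTINCT title, details FROM old_notes;
-- 3. Rebuild the note-to-list links by matching on title/details:
INSERT INTO NOTES_IN_LIST (note_id, list_id)
SELECT NOTE._id, old_notes.listid
FROM old_notes
JOIN NOTE ON old_notes.title = NOTE.title
         AND old_notes.details = NOTE.details;
-- 4. Then the old table can go:
-- DROP TABLE old_notes;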
Make sure no one edits or adds notes while this is happening, or you will lose notes.
Also, you will need to update the UI to work against the new notes table, add notes to lists not by copying them but by inserting a new row into NOTES_IN_LIST, etc.
One note can have many lists, one list can have many notes.
You need an associative table that has a note id and a list id.
SQLite 3.6.19+ natively supports (and enforces) Foreign Keys, see SQLite Foreign Key Support.
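A minimal sketch of that associative table in SQLite (note that foreign key enforcement must be switched on per connection):
PRAGMA foreign_keys = ON; -- enforcement is off by default in SQLite
CREATE TABLE NOTES_IN_LIST (
    note_id INTEGER NOT NULL REFERENCES NOTE(_id),
    list_id INTEGER NOT NULL REFERENCES LIST(_id),
    PRIMARY KEY (note_id, list_id)
);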
We have three databases that are physically separated by region, one in LA, SF and NY. All the databases share the same schema but contain data specific to their region. We're looking to merge these databases into one and mirror it. We need to preserve the data for each region but merge them into one db. This presents quite a few issues for us, for example we will certainly have duplicate Primary Keys, and Foreign Keys will be potentially invalid.
I'm hoping to find someone who has had experience with a task like this who could provide some tips, strategies and words of experience on how we can accomplish the merge.
For example, one idea was to create composite keys and then change our code and sprocs to find the data via the composite key (region/original pk). But this requires us to change all of our code and sprocs.
Another idea was to just import the data and let it generate new PK's and then update all the FK references to the new PK. This way we potentially don't have to change any code.
Any experience is welcome!
I have no first-hand experience with this, but it seems to me like you ought to be able to uniquely map PK -> New PK for each server. For instance, generate new PKs such that data from LA server has PK % 3 == 2, SF has PK % 3 == 1, and NY has PK % 3 == 0. And since, as I understood your question anyway, each server only stores FK relationships to its own data, you can update the FKs in identical fashion.
NewLA = OldLA*3-1
NewSF = OldSF*3-2
NewNY = OldNY*3
You can then merge those and have no duplicate PKs. This is essentially, as you already said, just generating new PKs, but structuring it this way allows you to trivially update your FKs (assuming, as I did, that the data on each server is isolated). Good luck.
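As a hedged sketch of the remapping on, say, the LA database (table and column names invented; any FK columns referencing the key get the same formula, and constraints may need to be disabled while this runs):
-- Every remapped LA key satisfies new_id % 3 == 2:
UPDATE Orders SET OrderID = OrderID * 3 - 1;
UPDATE OrderItems SET OrderID = OrderID * 3 - 1; -- matching FK column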
BEST: add a RegionCode column and include it in your PKs, but that is a lot of legwork.
HACK: if your IDs are INTs, a quick fix is to add a fixed, region-specific value to each key on import. INTs can be as large as 2,147,483,647.
local server data:
LA IDs: 1,2,3,4,5,6
SF IDs: 1,2,3,4,5
NY IDs: 1,2,3,4,5,6,7,9
add 100000000 to LA's IDs
add 200000000 to SF's IDs
add 300000000 to NY's IDs
combined server data:
LA IDs: 100000001,100000002,100000003,100000004,100000005,100000006
SF IDs: 200000001,200000002,200000003,200000004,200000005
NY IDs: 300000001,300000002,300000003,300000004,300000005,300000006,300000007,300000009
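A hedged sketch of applying the offset during import (names invented; every FK column that references the key needs the same offset, and FK checks may need to be disabled while this runs):
-- On the LA extract, before merging into the combined server:
UPDATE SomeTable SET ID = ID + 100000000;
UPDATE ChildTable SET SomeTableID = SomeTableID + 100000000;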
I have done this and I say change your keys (pick a method) rather than changing your code. Invariably you will either miss a stored procedure or introduce a bug. With data changes, it is pretty easy to write tests to look for orphaned records or to verify that things were matched up correctly. With code changes, especially code that is working correctly, it is too easy to miss something.
One thing you could do is set up the tables with regional data to use GUIDs. That way, the primary keys in each region are unique, and you can mix and match data (import data from one region to another). For the tables which hold shared data (like type tables), you can keep the primary keys the way they are (since they should be the same everywhere).
Here is some information about GUIDs:
http://www.sqlteam.com/article/uniqueidentifier-vs-identity
Maybe SQL Server Management Studio lets you convert columns to use GUIDs easily. I hope so!
Best of luck.
What I have done in a situation like this is:
create a new db with the same schema, but only tables: no PKs, FKs, checks, etc.
transfer data from DB1 to this source db
for each table in the target database, find the top number for the PK
for each table in the source database, update their PKs, FKs, etc., starting with (top number + 1) from the target db
for each table in the target database, set identity insert to on
import data from the source db to the target db
for each table in the target database, set identity insert to off
clear the source db
repeat for DB2
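A hedged sketch of the identity-insert portion for one table (T-SQL; database and table names invented):
SET IDENTITY_INSERT TargetDb.dbo.Orders ON;
INSERT INTO TargetDb.dbo.Orders (OrderID, CustomerID, Total)
SELECT OrderID, CustomerID, Total FROM SourceDb.dbo.Orders;
SET IDENTITY_INSERT TargetDb.dbo.Orders OFF;
-- SQL Server allows IDENTITY_INSERT to be ON for only one table per session.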
As Jon mentioned, I would use GUIDs to solve the merge task, and I see two different solutions that require GUIDs:
1) Permanently change your database schema to use GUIDs instead of INTEGER (IDENTITY) as the primary key.
This is a good solution in general, but if you have a lot of non-SQL code that is somehow bound to the way your identifiers work, it could require quite a few code changes. And since you are merging databases, you may need to update your application anyway so that it works with one region's data only, based on the user logged in, etc.
2) Temporarily add GUIDs for migration purposes only, and after the data is migrated, drop them.
This one is kind of more tricky, but once you write this migration script, you can (re-)run it multiple times to merge databases again in case you screw it up the first time. Here is an example:
Table: PERSON (ID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL)
Table: ADDRESS (ID INT PRIMARY KEY, City VARCHAR(100) NOT NULL, PERSON_ID INT)
Your alter scripts are (note that for all PKs we automatically generate the GUID):
ALTER TABLE PERSON ADD UID UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID())
ALTER TABLE ADDRESS ADD UID UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID())
ALTER TABLE ADDRESS ADD PERSON_UID UNIQUEIDENTIFIER NULL
Then you update the FKs to be consistent with INTEGER ones:
--// set ADDRESS.PERSON_UID
UPDATE ADDRESS
SET ADDRESS.PERSON_UID = PERSON.UID
FROM ADDRESS
INNER JOIN PERSON
ON ADDRESS.PERSON_ID = PERSON.ID
You do this for all PKs (automatically generate GUID) and FKs (update as shown above).
Now you create your target database. In this target database you also add the UID columns for all the PKs and FKs. Also disable all FK constraints.
Now you insert from each of your source databases to the target one (note: we do not insert PKs and integer FKs):
INSERT INTO TARGET_DB.dbo.PERSON (UID, NAME)
SELECT UID, NAME FROM SOURCE_DB1.dbo.PERSON
INSERT INTO TARGET_DB.dbo.ADDRESS (UID, CITY, PERSON_UID)
SELECT UID, CITY, PERSON_UID FROM SOURCE_DB1.dbo.ADDRESS
Once you inserted data from all the databases, you run the code opposite to the original to make integer FKs consistent with GUIDs on the target database:
--// set ADDRESS.PERSON_ID
UPDATE ADDRESS
SET ADDRESS.PERSON_ID = PERSON.ID
FROM ADDRESS
INNER JOIN PERSON
ON ADDRESS.PERSON_UID = PERSON.UID
Now you may drop all the UID columns:
ALTER TABLE PERSON DROP COLUMN UID
ALTER TABLE ADDRESS DROP COLUMN UID
ALTER TABLE ADDRESS DROP COLUMN PERSON_UID
So at the end you should get a rather long migration script that does the job for you. The point is: IT IS DOABLE.
NOTE: none of what is written here has been tested.