I'm trying to build a query that will fetch all changed rows from a source table, comparing it to a target table.
The primary key (it's not really defined as a primary key, just what we know identifies a unique row) is a composite that consists of lots of foreign keys, approximately 15, most of which can have NULL values. For simplicity, let's say the primary key consists of these three key columns and there are two value fields that need to be compared:
CREATE TABLE SourceTable
(
    Key1 int NOT NULL,
    Key2 nvarchar(10),
    Key3 int,
    Value1 nvarchar(255),
    Value2 int
)
If Key1 = 1, Key2 = NULL, and Key3 = 4, then I would like to compare it to the row in the target that has exactly the same values in the key fields, including the NULL in Key2.
The value fields can also have NULL values.
So what's the best approach to use when designing queries like this, where NULL values should be considered real values and compared?
ISNULL? COALESCE? Intersect?
Any suggestions?
ANSI SQL has the IS [NOT] DISTINCT FROM construct, which has not been implemented in SQL Server yet (Connect request).
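For illustration, if SQL Server did support it, the NULL-safe key comparison would look something like this (hypothetical only, since this syntax is not valid T-SQL at the time of writing):
SELECT *
FROM SourceTable S
     JOIN DestinationTable D
       ON S.Key1 = D.Key1
      AND S.Key2 IS NOT DISTINCT FROM D.Key2
      AND S.Key3 IS NOT DISTINCT FROM D.Key3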
It is possible to simulate this functionality in SQL Server using EXCEPT/INTERSECT, however. Both of these treat NULLs as equal in comparisons. You want to find rows where the key columns are the same but the value columns are different, so this should do it:
SELECT *
FROM SourceTable S
     JOIN DestinationTable D
       /* Join the key columns on (NULL-safe) equality */
       ON S.Key1 = D.Key1
      AND NOT EXISTS (SELECT S.Key2, S.Key3
                      EXCEPT
                      SELECT D.Key2, D.Key3)
      /* ...and the value columns on (NULL-safe) inequality */
      AND NOT EXISTS (SELECT S.Value1, S.Value2
                      INTERSECT
                      SELECT D.Value1, D.Value2)
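A quick way to convince yourself of the NULL handling (the CASTs are only there to give the untyped NULL literals a type):
-- Returns one row: INTERSECT treats two NULLs as equal
SELECT CAST(NULL AS int) AS x
INTERSECT
SELECT CAST(NULL AS int);
-- Returns no rows: EXCEPT likewise sees the two NULLs as 'the same'
SELECT CAST(NULL AS int) AS x
EXCEPT
SELECT CAST(NULL AS int);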
NULLs don't play nicely with foreign keys: changing a NULL to a value will not (in SQL Server) cause it to cascade when updated.
It's best to avoid the NULL value (and for many other reasons too!). Instead, get the DBA to nominate some other 'magic' value of the same data type but outside of the domain. Examples: for DATE, a far-distant or far-future date; for INTEGER, zero or a negative value; for VARCHAR, a value in double curly braces to denote a metadata value, e.g. '{{NONE}}', '{{UNKNOWN}}', '{{NA}}', etc., plus a CHECK constraint to ensure ordinary values cannot start/end with double curly braces.
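A minimal sketch of the VARCHAR convention (the table and constraint names are illustrative, not from the question):
CREATE TABLE dbo.Customer
(
    CustomerName varchar(50) NOT NULL
        -- either a sanctioned magic value, or something that can't be mistaken for one
        CONSTRAINT CK_Customer_Name CHECK (
            CustomerName IN ('{{NONE}}', '{{UNKNOWN}}', '{{NA}}')
            OR (CustomerName NOT LIKE '{{%' AND CustomerName NOT LIKE '%}}')
        )
);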
Alternatively, model missing information by absence of a tuple in a relvar (closed world assumption) ;)
I have an automated process that makes incremental inserts based on the defined primary keys of the table. In a few tables, the primary key has NULL values, and we cannot do anything to solve it at the root, so we have to deal with it.
Once the primary keys of the table are obtained, we run a query to delete records as follows:
DELETE FROM table
USING stg_table
WHERE table.pk_1 = stg_table.pk_1 AND table.pk_2 = stg_table.pk_2
This works except when some value in pk_1 or pk_2 is NULL. I have done the following to solve it:
DELETE FROM table
USING stg_table
WHERE NVL(table.pk_1, 'null') = NVL(stg_table.pk_1, 'null')
However, this only works if pk_1 is a VARCHAR; otherwise, it will fail.
I could modify my automated script to get the type of each column and, depending on that, pass a different value to the NVL function, but I was wondering if there is any way of achieving the deletion correctly using just Snowflake.
If you want NULL values to match, you can use:
DELETE FROM table
USING stg_table
WHERE (table.pk_1 = stg_table.pk_1 OR (table.pk_1 IS NULL AND stg_table.pk_1 IS NULL)) AND
      (table.pk_2 = stg_table.pk_2 OR (table.pk_2 IS NULL AND stg_table.pk_2 IS NULL))
Or, use EQUAL_NULL(), Snowflake's NULL-safe comparison function:
DELETE FROM table
USING stg_table
WHERE EQUAL_NULL(table.pk_1, stg_table.pk_1) AND
EQUAL_NULL(table.pk_2, stg_table.pk_2);
All that said, primary keys are never NULL -- by definition in SQL. And it is really rare to design databases where a NULL value in two different tables is supposed to match.
IS DISTINCT FROM is a NULL-safe comparison operator, and it allows comparing tuples:
DELETE FROM table
USING stg_table
WHERE (table.pk_1,table.pk_2) IS NOT DISTINCT FROM (stg_table.pk_1, stg_table.pk_2);
I have a table with 20 columns. The first is my primary key. The remaining columns are attributes about the primary key. I need to evaluate each column on a row-by-row basis. If an attribute column has a value other than NULL, then do some further processing.
The way I am familiar with doing this in T-SQL would be a WHILE loop to step through the rows, capture the values from the columns, and evaluate the values for further processing.
Does anyone have any better ideas?
You could unpivot the table and filter out the NULLs:
select pk, col, val
from (select pk, col, val
      from table t
      unpivot (val for col in (attr1, attr2, . . . )) as unpvt
     ) u
where val is not null;
This will provide a list of the columns and associated non-NULL values. Note: it assumes that the types of the attribute columns are all the same.
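If the attribute types differ, one hedged workaround (the column names and target type here are illustrative) is to cast everything to a common type before unpivoting:
select pk, col, val
from (select pk,
             cast(attr1 as nvarchar(255)) as attr1,
             cast(attr2 as nvarchar(255)) as attr2
      from table t) s
unpivot (val for col in (attr1, attr2)) as unpvt
where val is not null;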
select PK from table where col1 is not null or col2 is not null or col3 is not null
etc.
I think the best approach would be to first determine whether any column in a row actually has a NULL value. For that you could use something like the following:
SELECT Attr1 + Attr2 + Attr3 -- add all the columns; the result is NULL if any column is NULL. All of the columns must be of the same (addable) type.
Let's say that you have a WHILE loop (you could use a cursor as well, but they are considered slower compared to WHILE loops) that checks each row against the result of this expression; when you find the result to be NULL, then you could actually check which column is NULL. This should speed up the process a little bit.
Also, this looks like a pretty easy way of finding the rows that have NULL values: https://dba.stackexchange.com/questions/14864/test-if-any-fields-are-null
Q: Is there any way to implement self-documenting enumerations in "standard SQL"?
EXAMPLE:
Column: PlayMode
Legal values: 0=Quiet, 1=League Practice, 2=League Play, 3=Open Play, 4=Cross Play
What I've always done is just define the field as "char(1)" or "int", and define the mnemonic ("league practice") as a comment in the code.
Any BETTER suggestions?
I'd definitely prefer using standard SQL, so the database type (MySQL, MSSQL, Oracle, etc.) shouldn't matter. I'd also prefer using any application language (C, C#, Java, etc.), so the programming language shouldn't matter, either.
Thank you VERY much in advance!
PS:
It's my understanding that using a second table - to map a code to a description, for example "table playmodes (char(1) id, varchar(10) name)" - is very expensive. Is this necessarily correct?
The normal way is to use a static lookup table, sometimes called a "domain table" (because its purpose is to restrict the domain of a column variable).
It's up to you to keep the underlying values of any enums or the like in sync with the values in the database (you might write a code generator that generates the enum from the domain table, invoked whenever something in the domain table changes).
Here's an example:
--
-- the domain table
--
create table dbo.play_mode
(
    id          int         not null primary key clustered ,
    description varchar(32) not null unique nonclustered
)

insert dbo.play_mode values ( 0 , 'Quiet' )
insert dbo.play_mode values ( 1 , 'LeaguePractice' )
insert dbo.play_mode values ( 2 , 'LeaguePlay' )
insert dbo.play_mode values ( 3 , 'OpenPlay' )
insert dbo.play_mode values ( 4 , 'CrossPlay' )

--
-- A table referencing the domain table. The column playmode_id is constrained to
-- one of the values contained in the domain table play_mode.
--
create table dbo.game
(
    id          int not null primary key clustered ,
    team1_id    int not null foreign key references dbo.team( id ) ,
    team2_id    int not null foreign key references dbo.team( id ) ,
    playmode_id int not null foreign key references dbo.play_mode( id )
)
go
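As for keeping application-side enums in sync, a minimal sketch of the code-generator idea mentioned above is just a query over the domain table (the output format is illustrative; adapt it to your language's enum syntax):
-- emit one enum member per domain row, e.g. "Quiet = 0,"
select '    ' + description + ' = ' + cast( id as varchar(11) ) + ','
from dbo.play_mode
order by id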
Some people, for reasons of "economy", might suggest using a single catch-all table for all such codes, but in my experience that ultimately leads to confusion. Best practice is a single small table for each set of discrete values.
Add a foreign key to a "codes" table.
The codes table would have the code value as its PK; add a string description column where you enter the description of the value.
table: PlayModes
columns: PlayMode number -- primary key
         Description string
I can't see this as being very expensive; databases are built around joining tables like this.
That information should be in the database somewhere, not in comments.
So you should have a table containing those codes and probably a FK from your table to it.
I agree with @Nicholas Carey (+1): a static data table with two columns, say “Key” or “ID” and “Description”, with foreign key constraints on all tables using the codes. Often the ID columns are simple surrogate keys (1, 2, 3, etc., with no significance attached to the value), but when reasonable I go a step further and use “special” codes. Following are a few examples.
If the values are a sequence (say, Ordered, Paid, Processed, Shipped), I might use 1, 2, 3, 4 to indicate sequence. This can make things easier if you want to find all rows “up through” a given stage, such as all orders that have not yet been shipped (ID < 4). If you are into planning ahead, make them 10, 20, 30, 40; this will allow you to add values “in between” existing values, if/when new codes or statuses come along. (Yes, you cannot and should not try to anticipate everything and anything that might have to be done some day, but a bit of pre-planning like this can make some changes that much simpler.)
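For instance, a minimal illustration of such an “up through” query (hypothetical table and column names):
-- all orders not yet shipped, relying on the codes being ordered
SELECT *
FROM Orders
WHERE StatusID < 4;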
Keys/IDs are often integers (1 byte, 2 byte, 4 byte, whatever). There’s little cost to making them character values instead (1 char, 2 char, 3 char, 4 char). That’s character, not variable character. Done this way, you can have mnemonics on your codes, such as
O, P, R, S
Or, Pd, Pr, Sh
Ordr, Paid, Proc, Ship
…or whatever floats your boat. Done this way, I have found that it can save a lot of time when analyzing or debugging. You still want the lookup table, for relational integrity as well as a reminder for the more obscure codes.
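A minimal sketch of that variant (illustrative names; fixed-width char codes instead of surrogate integers):
create table dbo.order_status
(
    id          char(4)     not null primary key ,  -- 'Ordr', 'Paid', 'Proc', 'Ship'
    description varchar(32) not null unique
)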
There is a need to build a constraint on a column that guarantees that only one value in all rows is 1 and all the others are 0.
A solution with triggers exists, but I would like to have something built in.
Is such a thing possible at all?
Edit
Actually, I just noticed you are on SQL Server 2008, so you could use a filtered index for this:
CREATE UNIQUE NONCLUSTERED INDEX UIX ON YourTable (col) WHERE col = 1
Original Answer
The easiest way would probably be to store this one special PK in a separate one-row table. The "no more than one row" aspect can be enforced by making a constant, CHECK-constrained column the primary key:
CREATE TABLE OneRowTable
(
    lock CHAR(1) DEFAULT 'X' NOT NULL PRIMARY KEY CHECK (lock = 'X'),
    OtherTablePK int
);
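To complete the pattern (a hypothetical sketch; it assumes the base table is called YourTable with an int id column), point the single row at the base table:
ALTER TABLE OneRowTable
    ADD CONSTRAINT FK_OneRowTable_YourTable
    FOREIGN KEY (OtherTablePK) REFERENCES YourTable (id);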
Otherwise, assuming you have an id field comprised of positive integers, you could add a computed column with the following definition:
case when col=1 then -1 else id end
and add a unique constraint to that.
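Put together, a minimal sketch of that approach (illustrative table, column, and constraint names):
-- every row with col = 1 maps to the same value (-1), so the unique
-- constraint permits at most one such row; other rows map to their unique id
ALTER TABLE YourTable ADD one_enforcer AS (CASE WHEN col = 1 THEN -1 ELSE id END);
ALTER TABLE YourTable ADD CONSTRAINT UQ_YourTable_one_enforcer UNIQUE (one_enforcer);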
I have this table which doesn't have a primary key.
I'm going to insert some records into a new table to analyze them, and I'm thinking of creating a new primary key from the values of all the available columns.
If this were a programming language like Java I would:
int hash = column1 * 31 + column2 * 31 + column3 * 31;
Or something like that. But this is SQL.
How can I create a primary key from the values of the available columns? It won't work for me to simply mark all the columns as the PK, because what I need to do is compare them with data from another DB's table.
My table has 3 numbers and a date.
EDIT: What my problem is
I think a bit more background is needed; I'm sorry for not providing it before.
I have a database (dm) that is being updated every day from another db (original source). It has records from the past two years.
Last month (July) the update process broke, and for a month no data was updated into the dm.
I manually created a table with the same structure in my Oracle XE and copied the records from the original source into my db (myxe). I copied only records from July, to create a report needed by the end of the month.
Finally, on Aug 8, the update process got fixed, and the records which had been waiting to be migrated by this automatic process got copied into the database (from originalsource to dm).
This process cleans the data out of the original source once it is copied (into dm).
Everything looked fine, but we have just realized that a number of the records got lost (about 25% of July's).
So what I want to do is use my backup (myxe) and insert into the database (dm) all those missing records.
The problems here are:
They don't have a well defined PK.
They are in separate databases.
So I thought that if I could create, from both tables, a unique pk that gave the same number, I could tell which rows were missing and insert them.
EDIT 2
So I did the following in my local environment:
select a.*
from the_table@PRODUCTION a, the_table b
where a.idle = b.idle
  and a.activity = b.activity
  and a.finishdate = b.finishdate
This returns all the rows that are present in both databases (the intersection). I've got 2,000 records.
What I'm going to do next is delete them all from the target db and then just insert them all from my db into the target table.
I hope I don't get into something worse :-S
The danger of creating a hash value by combining the 3 numbers and the date is that it might not be unique and hence cannot be used safely as a primary key.
Instead, I'd recommend using an auto-incrementing ID for your primary key.
Just create a surrogate key:
ALTER TABLE mytable ADD pk_col INT
UPDATE mytable
SET pk_col = rownum
ALTER TABLE mytable MODIFY pk_col INT NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
or this:
ALTER TABLE mytable ADD pk_col RAW(16)
UPDATE mytable
SET pk_col = SYS_GUID()
ALTER TABLE mytable MODIFY pk_col RAW(16) NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
The latter uses GUIDs, which are unique across databases, but they consume more space and are much slower to generate (your INSERTs will be slow).
Update:
If you need to create same PRIMARY KEYs on two tables with identical data, use this:
MERGE
INTO    mytable v
USING   (
        -- ROW_NUMBER() numbers the rows in sorted order (plain rownum would
        -- be assigned before the ORDER BY and give a nondeterministic mapping)
        SELECT  rowid AS rid,
                ROW_NUMBER() OVER (ORDER BY col1, col2, col3) AS rn
        FROM    mytable
        ) u
ON      (v.rowid = u.rid)
WHEN MATCHED THEN
UPDATE
SET     pk_col = u.rn
Note that the tables should be identical down to each row (i.e. have the same number of rows with the same data in them).
Update 2:
For your particular problem, you don't need a PK at all.
If you just want to select the records missing in dm, use this (on the dm side):
SELECT *
FROM mytable@myxe
MINUS
SELECT *
FROM mytable
This will return all records that exist in mytable@myxe but not in mytable@dm.
Note that it will collapse any duplicates.
Assuming that you have ensured uniqueness...you can do almost the same thing in SQL. The only problem will be the conversion of the date to a numeric value so that you can hash it.
SELECT Table2.SomeFields
FROM Table1
LEFT OUTER JOIN Table2
    ON (Table1.col1 * 31) + (Table1.col2 * 31) + (Table1.col3 * 31) +
       ((DATEPART(year, Table1.date) + DATEPART(month, Table1.date) + DATEPART(day, Table1.date)) * 31) = Table2.hashedPk
The above query would work for SQL Server; the only difference for Oracle would be in how you handle the date conversion. Moreover, there are other functions for converting dates in SQL Server as well, so this is by no means the only solution.
And you can combine this with Quassnoi's SET statement to populate the new field as well. Just use the left-hand side of the join condition's logic for the value.
If you're loading your new table with values from the old table, and you then need to join the two tables, you can only "properly" do this if you can uniquely identify each row in the original table. Quassnoi's solution will allow you to do this, IF you can first alter the old table by adding a new column.
If you cannot alter the original table, generating some form of hash code based on the columns of the old table would work -- but, again, only if the hash codes uniquely identify each row. (Oracle has checksum functions, right? If so, use them.)
If hash code uniqueness cannot be guaranteed, you may have to settle for a primary key composed of as many columns as are required to ensure uniqueness (i.e. the natural key). If there is no natural key, well, I heard once that Oracle provides a rownum for each row of data; could you use that?
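If you do go the hashing route on the Oracle side, one hedged sketch (ORA_HASH is a real Oracle function; the column names are reused from the EDIT 2 query above, the delimiter is illustrative, and collisions remain possible, so verify uniqueness before trusting the result as a key):
-- one hash per row, built from the three "key" columns and the date
SELECT ORA_HASH(idle || '|' || activity || '|' || TO_CHAR(finishdate, 'YYYYMMDDHH24MISS')) AS row_hash
FROM the_table;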