Delete from table using columns with null values in Snowflake

I have an automatic process to make incremental inserts based on defined primary keys of the table. In a few tables, the primary key has null values and we cannot do anything to fix it at the root, so we have to deal with it.
Once the primary keys of the table are obtained, we run a query to delete records as follows:
DELETE FROM table
USING stg_table
WHERE table.pk_1 = stg_table.pk_1 AND table.pk_2 = stg_table.pk_2
This works except if some value in the pk_1 or pk_2 is null. I have done the following to solve it:
DELETE FROM table
USING stg_table
WHERE NVL(table.pk_1, 'null') = NVL(stg_table.pk_1, 'null')
However, this only works if pk_1 is a VARCHAR; otherwise, it fails.
I could modify my automated script to get the type of each column and pass a different value to the NVL function depending on that, but I was wondering if there is any way of doing the deletion correctly using just Snowflake.

If you want NULL values to match, you can use:
DELETE FROM table
USING stg_table
WHERE (table.pk_1 = stg_table.pk_1 OR table.pk_1 IS NULL AND stg_table.pk_1 IS NULL) AND
(table.pk_2 = stg_table.pk_2 OR table.pk_2 IS NULL AND stg_table.pk_2 IS NULL)
Or, use EQUAL_NULL(), Snowflake's NULL-safe comparison function:
DELETE FROM table
USING stg_table
WHERE EQUAL_NULL(table.pk_1, stg_table.pk_1) AND
EQUAL_NULL(table.pk_2, stg_table.pk_2);
All that said, primary keys are never NULL -- by the definition of SQL. And it is really rare to create databases where NULL values in two different tables are supposed to match.
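To see why the plain = version skips rows with NULL keys, here is a minimal sketch that can be run locally. It uses Python's built-in sqlite3 module as a stand-in for Snowflake, which is an assumption made purely so the example is runnable: SQLite has no DELETE ... USING, so the match is written with EXISTS, and SQLite's IS operator plays the role of EQUAL_NULL(). The table names target and stg are hypothetical.

```python
import sqlite3


def remaining_after_delete(op):
    """Delete target rows that match a staging row using comparison `op`,
    then return the rows left in target."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE target (pk_1 INTEGER, pk_2 TEXT);
        CREATE TABLE stg    (pk_1 INTEGER, pk_2 TEXT);
        INSERT INTO target VALUES (1, 'a'), (2, NULL), (3, 'c');
        INSERT INTO stg    VALUES (1, 'a'), (2, NULL);
    """)
    # `op` is either "=" (NULL never matches) or "IS" (NULL-safe in SQLite).
    conn.execute(f"""
        DELETE FROM target
        WHERE EXISTS (
            SELECT 1 FROM stg
            WHERE target.pk_1 {op} stg.pk_1
              AND target.pk_2 {op} stg.pk_2
        )
    """)
    rows = conn.execute(
        "SELECT pk_1, pk_2 FROM target ORDER BY pk_1"
    ).fetchall()
    conn.close()
    return rows


plain = remaining_after_delete("=")       # row (2, NULL) survives
null_safe = remaining_after_delete("IS")  # row (2, NULL) is deleted too
print(plain)
print(null_safe)
```

With =, the comparison NULL = NULL evaluates to NULL rather than true, so the row with the NULL key is never deleted. The NULL-safe comparison treats the two NULLs as a match.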

IS [NOT] DISTINCT FROM is a null-safe operator, and it allows comparing tuples:
DELETE FROM table
USING stg_table
WHERE (table.pk_1,table.pk_2) IS NOT DISTINCT FROM (stg_table.pk_1, stg_table.pk_2);

Related

Inserting from one table or view into another but avoid combination duplicates- Can this be done in SQL? If so how?

I have a table that has a primary key WORKITEMID, and the following 3 foreign keys PRODSERVID,PROCESSID,and TASKKNOWID.
I have a view that I can create that also has PRODSERVID,PROCESSID, AND TASKKNOWID. This view will usually have ALL the records in above table plus some new ones - not in the table. The 'table' by definition is meant to hold the unique combinations of PRODSERVID, PROCESSID, and TASKKNOWID.
I would like to insert from the view into the table any new combinations in the view not present in the table. And I don't want to overwrite the existing WORKITEMIDs in the INSERT- because those WORKITEMIDs are used elsewhere.
Can this be done in SQL?
Thanks
Absolutely, the simplest form of criteria for this is to use the negation of EXISTS()
INSERT INTO [TableName] (PRODSERVID,PROCESSID,TASKKNOWID,... )
SELECT PRODSERVID,PROCESSID,TASKKNOWID,...
FROM [ViewName] v
WHERE NOT EXISTS (
SELECT 1 FROM [TableName] t
WHERE t.PRODSERVID = v.PRODSERVID AND t.PROCESSID = v.PROCESSID AND t.TASKKNOWID = v.TASKKNOWID
)
Replace the ... with your other fields.
You could also use a non-correlated outer join, but I find NOT EXISTS makes the intent much clearer.
There is a good comparison of the different approaches to this issue in this article: T-SQL commands performance comparison – NOT IN vs SQL NOT EXISTS vs SQL LEFT JOIN vs SQL EXCEPT
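A small runnable sketch of the NOT EXISTS insert, using Python's sqlite3 module instead of SQL Server so it can be tried anywhere. The names work_item and source_rows are hypothetical, and a plain table stands in for the view. Running the same statement twice shows that existing WORKITEMIDs are left alone and that nothing is inserted the second time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_item (
        WORKITEMID INTEGER PRIMARY KEY,   -- assigned automatically on insert
        PRODSERVID INTEGER,
        PROCESSID  INTEGER,
        TASKKNOWID INTEGER
    );
    -- Stand-in for the view: all existing combinations plus one new one.
    CREATE TABLE source_rows (
        PRODSERVID INTEGER, PROCESSID INTEGER, TASKKNOWID INTEGER
    );
    INSERT INTO work_item VALUES (10, 1, 1, 1), (11, 1, 2, 1);
    INSERT INTO source_rows VALUES (1, 1, 1), (1, 2, 1), (2, 1, 1);
""")

INSERT_NEW = """
    INSERT INTO work_item (PRODSERVID, PROCESSID, TASKKNOWID)
    SELECT v.PRODSERVID, v.PROCESSID, v.TASKKNOWID
    FROM source_rows v
    WHERE NOT EXISTS (
        SELECT 1 FROM work_item t
        WHERE t.PRODSERVID = v.PRODSERVID
          AND t.PROCESSID  = v.PROCESSID
          AND t.TASKKNOWID = v.TASKKNOWID
    )
"""

first_run = conn.execute(INSERT_NEW).rowcount   # only the new combination
second_run = conn.execute(INSERT_NEW).rowcount  # nothing left to insert
rows = conn.execute(
    "SELECT * FROM work_item ORDER BY WORKITEMID"
).fetchall()
print(first_run, second_run, rows)
```

Note that the = comparisons here will not match rows where one of the three IDs is NULL; if NULLs can occur, a NULL-safe comparison is needed.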

How to set all table columns to NOT NULL at once?

Is this even possible, if so how? if not then I would be happy with a way that doesn't require typing each column name one by one. My use-case is that I create a table from a query and would like to make all columns NOT NULL because I later do ORM using Slick and it is a lot nicer to have all those column types not null (and therefore non Option[X]). This is static data so the column values will not be null and won't change either.
Unlike MySQL, Postgres doesn't figure out that the originating query columns are all NOT NULL already.
I'd like to avoid adding the constraints one by one in my script, which would break whenever the query schema changes, i.e.
CREATE TABLE dentistry_procedure AS SELECT * FROM ...
ALTER TABLE dentistry_procedure ALTER column * SET NOT NULL;
How?
You could use metadata table and build dynamic query:
SELECT format('ALTER TABLE %I '||STRING_AGG(format('ALTER COLUMN %I SET NOT NULL', COLUMN_NAME),CHR(13)||',')
, MIN(TABLE_NAME))
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES'
AND TABLE_NAME = 't';
db<>Fiddle demo
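If the statement is assembled in the calling script rather than in SQL, the same idea can be sketched in Python. This is only an illustration: build_set_not_null and quote_ident are hypothetical helpers, and the list of nullable columns is assumed to have been fetched already from INFORMATION_SCHEMA.COLUMNS. Unlike %I, which quotes only when needed, this sketch always double-quotes identifiers.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_set_not_null(table: str, nullable_columns: Sequence[str]) -> str:
    """Build one ALTER TABLE statement that marks every given column NOT NULL."""
    if not nullable_columns:
        raise ValueError("no nullable columns to alter")
    clauses = ",\n  ".join(
        f"ALTER COLUMN {quote_ident(col)} SET NOT NULL"
        for col in nullable_columns
    )
    return f"ALTER TABLE {quote_ident(table)}\n  {clauses};"


# Column names here are made up for the example.
statement = build_set_not_null("dentistry_procedure", ["code", "description"])
print(statement)
```

The generated statement still has to be executed against the database, and it will fail if any of the columns actually contains a NULL.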

Adding Row in existing table (SQL Server 2005)

I want to add another row in my existing table and I'm a bit hesitant if I'm doing the right thing because it might skew the database. I have my script below and would like to hear your thoughts about it.
I want to add another row for 'Jane' in the table, which will be 'SKATING' in the ACT column.
Table: [Emp_table].[ACT].[LIST_EMP]
My script is:
INSERT INTO [Emp_table].[ACT].[LIST_EMP]
([ENTITY],[TYPE],[EMP_COD],[DATE],[LINE_NO],[ACT],[NAME])
VALUES
('REG','EMP','45233','2016-06-20 00:00:00:00','2','SKATING','JANE')
Will this do the trick?
Your statement looks ok. If the database has a problem with it (for example, due to a foreign key constraint violation), it will reject the statement.
If any of the fields in your table are numeric (and not varchar or char), just remove the quotes around the corresponding field. For example, if emp_cod and line_no are int, insert the following values instead:
('REG','EMP',45233,'2016-06-20 00:00:00:00',2,'SKATING','JANE')
Inserting records into a database has always been the most common reason why I've lost a lot of the hair on my head!
SQL is great when it comes to SELECTs or even UPDATEs, but when it comes to INSERTs it's like someone from another planet came into the SQL standards committee and managed to get their way of doing it implemented into the final SQL standard!
If your table does not have an automatic primary key that automatically gets generated on every insert, then you have to code it yourself to manage avoiding duplicates.
Start by writing a normal SELECT to see if the record(s) you're going to add don't already exist. But as Robert implied, your table may not have a primary key because it looks like a LOG table to me. So insert away!
If it does require a unique record every time, then I strongly suggest you create a primary key for the table, either an auto generated one or a combination of your existing columns.
Assuming the first five combined columns make a unique key, this select will determine if your data you're inserting does not already exist...
SELECT COUNT(*) AS FoundRec FROM [Emp_table].[ACT].[LIST_EMP]
WHERE [ENTITY] = wsEntity AND [TYPE] = wsType AND [EMP_COD] = wsEmpCod AND [DATE] = wsDate AND [LINE_NO] = wsLineno
The wsXXX declarations, you will have to replace them with direct values or have them DECLAREd earlier in your script.
If you ran this alone and received a value of 1 or more, then the data exists already in your table, at least those 5 first columns. A true duplicate test will require you to test EVERY column in your table, but it should give you an idea.
To do it all as one statement, turn the VALUES list into a SELECT, because INSERT ... VALUES cannot take a WHERE clause:
INSERT INTO [Emp_table].[ACT].[LIST_EMP]
([ENTITY],[TYPE],[EMP_COD],[DATE],[LINE_NO],[ACT],[NAME])
SELECT 'REG','EMP','45233','2016-06-20 00:00:00:00','2','SKATING','JANE'
WHERE (SELECT COUNT(*) AS FoundRec FROM [Emp_table].[ACT].[LIST_EMP]
WHERE [ENTITY] = wsEntity AND [TYPE] = wsType AND
[EMP_COD] = wsEmpCod AND [DATE] = wsDate AND
[LINE_NO] = wsLineno) = 0
Just replace the wsXXX variables with the values you want to insert.
I hope that made sense.
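For anyone who wants to try the insert-only-if-absent pattern without a SQL Server instance, here is a minimal sketch using Python's sqlite3 module. The table and column names (list_emp, emp_type, act_date and so on) are simplified, hypothetical versions of the ones in the question, and named parameters take the place of the wsXXX placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE list_emp (
        entity TEXT, emp_type TEXT, emp_cod TEXT, act_date TEXT,
        line_no INTEGER, act TEXT, name TEXT
    )
""")

# Insert the row only when no row with the same five key columns exists.
INSERT_IF_ABSENT = """
    INSERT INTO list_emp
        (entity, emp_type, emp_cod, act_date, line_no, act, name)
    SELECT :entity, :emp_type, :emp_cod, :act_date, :line_no, :act, :name
    WHERE NOT EXISTS (
        SELECT 1 FROM list_emp
        WHERE entity = :entity AND emp_type = :emp_type
          AND emp_cod = :emp_cod AND act_date = :act_date
          AND line_no = :line_no
    )
"""

row = {
    "entity": "REG", "emp_type": "EMP", "emp_cod": "45233",
    "act_date": "2016-06-20", "line_no": 2,
    "act": "SKATING", "name": "JANE",
}

inserted_first = conn.execute(INSERT_IF_ABSENT, row).rowcount
inserted_again = conn.execute(INSERT_IF_ABSENT, row).rowcount
total = conn.execute("SELECT COUNT(*) FROM list_emp").fetchone()[0]
print(inserted_first, inserted_again, total)
```

The second call inserts nothing because the key columns already exist. A real primary key or unique constraint is still the more robust guard, since this check alone does not protect against two sessions inserting at the same time.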

Find changed rows (composite key with nulls)

I'm trying to build a query that will fetch all changed rows from a source table, comparing it to a target table.
The primary key (it's not really defined as a primary key, just what we know identifies a unique row) is a composite that consists of lots of foreign keys, approximately 15, most of which can have NULL values. For simplicity, let's say the primary key consists of these three key columns and there are 2 value fields that need to be compared:
CREATE TABLE SourceTable
(
Key1 int NOT NULL,
Key2 nvarchar(10),
Key3 int,
Value1 nvarchar(255),
Value2 int
)
If Key1 = 1, Key2 = NULL and Key3 = 4, then I would like to compare it to the row in target that has exactly the same values in the key fields, including NULL in Key2.
The value fields can also have NULL values.
So what's the best approach to use when designing queries like this where NULL values should be considered as real values and compared?
ISNULL? COALESCE? Intersect?
Any suggestions?
ANSI SQL has the IS [NOT] DISTINCT FROM construct, but SQL Server only implemented it in SQL Server 2022, so it is not available on older versions.
It is possible to simulate this functionality in SQL Server using EXCEPT/INTERSECT however. Both of these treat NULL as equal in comparisons. You are wanting to find rows where the key columns are the same but the value columns are different. So this should do it.
SELECT *
FROM SourceTable S
JOIN DestinationTable D
ON S.Key1 = D.Key1
/*Join the key columns on equality*/
AND NOT EXISTS (SELECT S.Key2,
S.Key3
EXCEPT
SELECT D.Key2,
D.Key3)
/*and the value columns on inequality*/
AND NOT EXISTS (SELECT S.Value1,
S.Value2
INTERSECT
SELECT D.Value1,
D.Value2)
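Here is a runnable sketch of the same query against sample data, using Python's sqlite3 module as a stand-in for SQL Server (an assumption for the sake of a self-contained example; SQLite also treats NULLs as equal in EXCEPT and INTERSECT). The sample rows are made up: one changed row with a NULL in its key, one unchanged row with NULLs on both sides, and one row missing from the destination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceTable (
        Key1 INT NOT NULL, Key2 TEXT, Key3 INT, Value1 TEXT, Value2 INT
    );
    CREATE TABLE DestinationTable (
        Key1 INT NOT NULL, Key2 TEXT, Key3 INT, Value1 TEXT, Value2 INT
    );
    INSERT INTO SourceTable VALUES
        (1, NULL, 4,    'new',  10),    -- Value1 changed, NULL in the key
        (2, 'x',  NULL, 'same', NULL),  -- unchanged, NULLs on both sides
        (3, 'y',  7,    'v',    1);     -- not present in the destination
    INSERT INTO DestinationTable VALUES
        (1, NULL, 4,    'old',  10),
        (2, 'x',  NULL, 'same', NULL);
""")

changed = conn.execute("""
    SELECT S.Key1, S.Key2, S.Key3, S.Value1, D.Value1
    FROM SourceTable S
    JOIN DestinationTable D
      ON S.Key1 = D.Key1
     -- key columns equal, with NULL treated as equal to NULL
     AND NOT EXISTS (SELECT S.Key2, S.Key3
                     EXCEPT
                     SELECT D.Key2, D.Key3)
     -- at least one value column differs, again NULL-safe
     AND NOT EXISTS (SELECT S.Value1, S.Value2
                     INTERSECT
                     SELECT D.Value1, D.Value2)
""").fetchall()
print(changed)
```

Only the first row should come back: its keys match despite the NULL in Key2, and Value1 differs. The second row is filtered out because its values, including the NULL in Value2, are identical on both sides.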
Nulls don't play nice with foreign keys: changing a null to a value will not (in SQL Server) cause it to cascade when updated.
Best to avoid the null value (and for many other reasons too!) Instead get the DBA to nominate some other 'magic' value of the same data type but outside of the domain type. Examples: DATE: far distant or far future date value. INTEGER: zero or negative value. VARCHAR: value in double-curly braces to denote meta data value e.g. '{{NONE}}', '{{UNKNOWN}}', '{{NA}}', etc then a CHECK constraint to ensure values cannot start/end with double curly braces.
Alternatively, model missing information by absence of a tuple in a relvar (closed world assumption) ;)

Intervals: How can I make sure there is just one row with a null value in a timestamp column in table?

I have a table with a column which contains a 'valid until' Date and I want to make sure that this can only be set to null in a single row within the table. Is there an easy way to do this?
My table looks like this (postgres):
CREATE TABLE 123.myTable(
some_id integer NOT NULL,
valid_from timestamp without time zone NOT NULL DEFAULT now(),
valid_until timestamp without time zone,
someString character varying)
some_id and valid_from are my PK. I want nobody to enter a line with a null value in column valid_until if there is already a line with null for this PK.
Thank you
In PostgreSQL, you have two basic approaches.
Use 'infinity' instead of null. Then your unique constraint works as expected. Or if you cannot do that:
CREATE UNIQUE INDEX null_valid_from ON mytable(some_id) WHERE valid_until IS NULL
I have used both approaches. I usually find the first approach cleaner, and it works better with range types and exclusion constraints in newer versions of PostgreSQL (to ensure no two time ranges overlap for a given some_id), but the second approach is often useful where the first cannot be done.
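The partial unique index can be tried out with Python's sqlite3 module, since SQLite accepts the same CREATE UNIQUE INDEX ... WHERE syntax. This is a sketch under that assumption, not a PostgreSQL test; the table name my_table and the index name one_open_row are hypothetical, and timestamps are stored as text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (
        some_id     INTEGER NOT NULL,
        valid_from  TEXT    NOT NULL,
        valid_until TEXT,
        PRIMARY KEY (some_id, valid_from)
    );
    -- At most one row per some_id may have valid_until IS NULL.
    CREATE UNIQUE INDEX one_open_row
        ON my_table (some_id) WHERE valid_until IS NULL;
""")

conn.execute("INSERT INTO my_table VALUES (1, '2020-01-01', '2021-01-01')")
conn.execute("INSERT INTO my_table VALUES (1, '2021-01-01', NULL)")
conn.execute("INSERT INTO my_table VALUES (2, '2021-01-01', NULL)")  # other id: fine

try:
    # Second open-ended row for some_id = 1 violates the partial index.
    conn.execute("INSERT INTO my_table VALUES (1, '2022-01-01', NULL)")
    second_open_row_rejected = False
except sqlite3.IntegrityError:
    second_open_row_rejected = True

open_rows_for_1 = conn.execute(
    "SELECT COUNT(*) FROM my_table WHERE some_id = 1 AND valid_until IS NULL"
).fetchone()[0]
print(second_open_row_rejected, open_rows_for_1)
```

Rows with a non-NULL valid_until are not covered by the index at all, so closed intervals for the same some_id are unaffected.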
Depending on the database, you can't have null in a primary key (I don't know about all databases, but in sql server you can't). The easiest way around this I can think of is to set the date time to the minimum value, and then add a unique constraint on it, or set it to be the primary key.
I suppose another way would be to set up a trigger to check the other values in the table to see if another entry is null, and if there is one, don't allow the insert.
As Kevin said in his answer, you can set up a database trigger to stop someone from inserting more than one row where the valid until date is NULL.
The SQL statement that checks for this condition is:
SELECT COUNT(*)
FROM TABLE
WHERE valid_until IS NULL;
If the count is not equal to 1, then your table has a problem.
The process that adds a row to this table has to perform the following:
1. Find the row where the valid until value is NULL.
2. Update the valid until value to the current date, or some other meaningful date.
3. Insert the new row with the valid until value set to NULL.
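The steps above could be wrapped in a single transaction so the table never ends up with zero or two open rows. Below is a minimal sketch with Python's sqlite3 module; add_version, my_table and some_string are hypothetical names, and the new row's valid_from is used as the closing date of the previous row, which is just one possible choice of "meaningful date".

```python
import sqlite3


def add_version(conn, some_id, valid_from, payload):
    """Close the currently open row for some_id, then insert a new open row."""
    with conn:  # single transaction: commits on success, rolls back on error
        # Steps 1 and 2: find the open row and close it.
        conn.execute(
            "UPDATE my_table SET valid_until = ? "
            "WHERE some_id = ? AND valid_until IS NULL",
            (valid_from, some_id),
        )
        # Step 3: insert the new row with valid_until left NULL.
        conn.execute(
            "INSERT INTO my_table (some_id, valid_from, valid_until, some_string) "
            "VALUES (?, ?, NULL, ?)",
            (some_id, valid_from, payload),
        )


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        some_id INTEGER NOT NULL,
        valid_from TEXT NOT NULL,
        valid_until TEXT,
        some_string TEXT,
        PRIMARY KEY (some_id, valid_from)
    )
""")

add_version(conn, 1, "2020-01-01", "first")
add_version(conn, 1, "2021-01-01", "second")

history = conn.execute(
    "SELECT valid_from, valid_until, some_string FROM my_table "
    "WHERE some_id = 1 ORDER BY valid_from"
).fetchall()
print(history)
```

This only helps if every writer goes through the same routine; a constraint or index in the database itself is what actually prevents a stray second NULL.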
I'm assuming you are storing effective-dated records and are also using a valid from date.
If so, you could use CRUD stored procedures to enforce this. E.g. the insert closes off any null valid dates before inserting a new record with a null valid date.
You probably need other stored procedure validation to avoid overlapping records and to allow deleting and editing records. It may be more efficient (in terms of where clauses / faster queries) to use a date far in the future rather than using null.
I know only Oracle in sufficient detail, but the same might work in other databases:
Create another column which always contains a fixed value (say '0') and include this column in your unique key.
Don't use NULL but a specific very high or low value. In many cases this is actually easier to use than a NULL value.
Make a function-based unique key on a function converting the date, including the null value, to some other value (e.g. a string representation for dates and 'x' for null).
Make a materialized view which gets updated on every change on your main table and put a constraint on that view.
select count(*) cnt from table where valid_until is NULL
might work as the select statement, with a check constraint limiting the cnt value to 0 and 1.
I would suggest inserting to that table through an SP and putting your constraint in there, as triggers are quite hidden and will likely be forgotten about. If that's not an option, the following trigger will work:
CREATE TABLE dbo.TESTTRIGGER
(
YourDate Date NULL
)
GO
CREATE TRIGGER DupNullDates
ON dbo.TESTTRIGGER
FOR INSERT, UPDATE
AS
DECLARE @nullCount int
SELECT @nullCount = (SELECT COUNT(*) FROM dbo.TESTTRIGGER WHERE YourDate IS NULL)
IF (@nullCount > 1)
BEGIN
RAISERROR('Cannot have Multiple Nulls', 16, 1)
ROLLBACK TRAN
END
GO
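Well, if you use MS SQL you can just add a unique index on that column, which will allow only one NULL. This does not carry over to every RDBMS, though: PostgreSQL, for example, treats NULLs as distinct in a unique index by default and so allows any number of them.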