Getting difference from two tables and deleting it in SQL Server

I am having trouble writing the logic of a query that deletes rows which do not exist in one of two tables.
For example, I have tables "Stage" and "Parent". I am using a composite primary key (a primary key spanning multiple columns) to uniquely identify records.
Stage structure and data

| S_Column1 (PK) | S_Column2 (PK) | S_Column3 (PK) | S_Column4 | S_Column5 |
|----------------|----------------|----------------|-----------|-----------|
| PRIDATA1 | PRIDATA2 | PRIDATA3 | DJUC | DSSDC |
| PRIDATA4 | PRIDATA5 | PRIDATA6 | JDNC | JDDOS |

Parent structure and data

| P_Column1 (PK) | P_Column2 (PK) | P_Column3 (PK) | P_Column4 | P_Column5 |
|----------------|----------------|----------------|-----------|-----------|
| PRIDATA1 | PRIDATA2 | PRIDATA3 | DJUC | DSSDC |
| PRIDATA4 | PRIDATA5 | PRIDATA6 | JDNC | JDDOS |
| PRIDATA7 | PRIDATA8 | PRIDATA9 | FFED | NHUY |
The above is just a sample of the structure and data of the two tables.
So basically what I want to do is write a query that deletes the row whose primary key is (PRIDATA7, PRIDATA8, PRIDATA9), because no matching entry exists in the Stage table.
I am not very experienced, but I know I need to find the matching rows (e.g. with a JOIN) and delete the remaining rows from the Parent table whose keys are not present in the Stage table.
PS: I will be using this in a trigger.

Try NOT EXISTS (note that the correlated subquery has to compare against the Parent columns, P_Column1 etc., not S_Column1):
delete from parent
where not exists (
    select 1
    from stage s
    where s.S_Column1 = parent.P_Column1
      and s.S_Column2 = parent.P_Column2
      and s.S_Column3 = parent.P_Column3)

You might be looking for the EXCEPT operator.
Read here: https://msdn.microsoft.com/pl-pl/library/ms188055(v=sql.110).aspx
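For reference, a sketch of how EXCEPT could drive the delete. Column and table names are taken from the question, and this is untested against the real schema:

```sql
-- Find composite keys present in Parent but missing from Stage,
-- then delete exactly those rows from Parent.
DELETE p
FROM Parent p
JOIN (
    SELECT P_Column1, P_Column2, P_Column3 FROM Parent
    EXCEPT
    SELECT S_Column1, S_Column2, S_Column3 FROM Stage
) d
  ON  p.P_Column1 = d.P_Column1
  AND p.P_Column2 = d.P_Column2
  AND p.P_Column3 = d.P_Column3;
```

EXCEPT compares whole rows at once, which reads naturally for composite keys, but the NOT EXISTS form above is usually just as good a plan in SQL Server.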

Related

Postgres / SQL Databases: How to enforce unique combination of key/value Pairs

A new project requires a dynamic data model, meaning that the properties for a record are stored in a separate table like this:
Items:
ID | insertiondate
1 | 2017-01-31
Properties:
ID | fk_Item_ID | Key | Value
1 | 1 | referenceNr | 1
2 | 1 | office | O1
...
What I need now is a way to enforce that a "referenceNr" is unique per "office".
So inserting the value pairs (1, O2) and (2, O1) into this table is fine, but (1, O1) has to violate the constraint.
Is there a simple way to handle this?
Even if the project really calls for some key/value entries, this doesn't seem to be the case for referenceNr and office, since you want to apply a constraint on the pair. So simply put the two in your Items table and add the constraint.
The only other option I see is to make the two one entry:
ID | fk_Item_ID | Key | Value
1 | 1 | 'referenceNr/office' | '1/O1'
I'd go for the first solution. Have key/value pairs only where absolutely necessary (and where the DBMS may be oblivious as to their content and mutual relations).
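Sketched as DDL, the first option could look like this (column and constraint names here are illustrative assumptions, not from the original schema):

```sql
ALTER TABLE Items ADD COLUMN referenceNr integer;
ALTER TABLE Items ADD COLUMN office text;

-- One referenceNr per office: the pair must be unique
ALTER TABLE Items
    ADD CONSTRAINT uq_items_refnr_office UNIQUE (referenceNr, office);
```

With this in place, inserting (1, 'O2') and (2, 'O1') succeeds, while a second (1, 'O1') raises a unique-violation error.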

Selecting Sorted Records Prior to Target Record

The background to this question is that we have had to hand-roll replication between a 3rd party Oracle database and our SQL Server database since there are no primary keys defined in the Oracle tables but there are unique indexes.
In most cases the following method works fine: we load the values of the columns in the unique index along with an MD5 hash of all column values from each corresponding table in the Oracle and SQL Server databases and are able to then calculate what records need to be inserted/deleted/updated.
However, in one table the sheer number of rows precludes us from loading all records into memory from the Oracle and SQL Server databases. So we need to do the comparison in blocks.
The method I am considering is: query the first n records from the Oracle table, then - using the same sort order - query the SQL Server table for all records up to and including the last record returned from Oracle, and compare the two data sets to determine what needs to be inserted/deleted/updated.
Once that has been done, load the next n records from the Oracle database and query the records in the SQL Server table that, when sorted in the same way, fall between (and include) the first and last records in that data set.
My question is: how to achieve this in SQL Server? If I have the values of the nth record (having queried the table in Oracle with a certain sort order) how can I return the range of records up to and including the record with those values from SQL Server?
Example
I have the following table:
| Id | SOU_ORDREF | SOU_LINESEQ | SOU_DATOVER | SOU_TIMEOVER | SOU_SEQ | SOU_DESC |
|-----------------------------------------------------|------------|-------------|-------------------------|------------------|---------|------------------------|
| AQ000001_10_25/07/2004 00:00:00_14_1 | AQ000001 | 10 | 2004-07-25 00:00:00.000 | 14 | 1 | Black 2.5mm Cable |
| AQ000004_91_26/07/2004 00:00:00_15.4833333333333_64 | AQ000004 | 91 | 2004-07-26 00:00:00.000 | 15.4333333333333 | 63 | 2.5mm Yellow Cable |
| AQ000005_31_26/07/2004 00:00:00_10.8333333333333_18 | AQ000005 | 31 | 2004-07-26 00:00:00.000 | 10.8333333333333 | 18 | Rotary Cam Switch |
| AQ000012_50_26/07/2004 00:00:00_11.3_17 | AQ000012 | 50 | 2004-07-26 00:00:00.000 | 11.3 | 17 | 3Mtr Heavy Gauge Cable |
The Id field is basically a concatenation of the five fields which make up the unique index on the table i.e. SOU_ORDREF, SOU_LINESEQ, SOU_DATOVER, SOU_TIMEOVER, and SOU_SEQ.
What I would like to do is to be able to query, for example, all the records (when sorted by those columns) up to the record with the Id 'AQ000005_31_26/07/2004 00:00:00_10.8333333333333_18' which would give us the following result (I'll just show the ids):
| Id |
|-----------------------------------------------------|
| AQ000001_10_25/07/2004 00:00:00_14_1 |
| AQ000004_91_26/07/2004 00:00:00_15.4833333333333_64 |
| AQ000005_31_26/07/2004 00:00:00_10.8333333333333_18 |
So, the query has not included the record with Id 'AQ000012_50_26/07/2004 00:00:00_11.3_17' since it comes after 'AQ000005_31_26/07/2004 00:00:00_10.8333333333333_18' when we order by SOU_ORDREF, SOU_LINESEQ, SOU_DATOVER, SOU_TIMEOVER, and SOU_SEQ.
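To express "all rows up to and including a given composite key" in SQL Server 2008, which has no row-value comparison such as (a, b) <= (x, y) in the WHERE clause, the tuple comparison can be expanded by hand. A sketch, assuming the table is called SOURCE_TABLE (a placeholder name) and the last key loaded from Oracle is held in variables:

```sql
DECLARE @ordref   varchar(20) = 'AQ000005',
        @lineseq  int         = 31,
        @datover  datetime    = '2004-07-26',
        @timeover float       = 10.8333333333333,
        @seq      int         = 18;

-- Every row strictly before the key on some leading column,
-- plus the boundary row itself (<= on the last column).
SELECT *
FROM   SOURCE_TABLE
WHERE  SOU_ORDREF < @ordref
   OR (SOU_ORDREF = @ordref AND SOU_LINESEQ < @lineseq)
   OR (SOU_ORDREF = @ordref AND SOU_LINESEQ = @lineseq AND SOU_DATOVER < @datover)
   OR (SOU_ORDREF = @ordref AND SOU_LINESEQ = @lineseq AND SOU_DATOVER = @datover
       AND SOU_TIMEOVER < @timeover)
   OR (SOU_ORDREF = @ordref AND SOU_LINESEQ = @lineseq AND SOU_DATOVER = @datover
       AND SOU_TIMEOVER = @timeover AND SOU_SEQ <= @seq)
ORDER BY SOU_ORDREF, SOU_LINESEQ, SOU_DATOVER, SOU_TIMEOVER, SOU_SEQ;
```

One caveat: SOU_TIMEOVER appears to be a float, and the equality comparisons in this pattern are only safe if Oracle and SQL Server hold bit-identical values for it.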

nullable foreign key columns denormalized table vs many normalized tables

In our entitlement framework each "resource" (a resource is just any entity that you want to protect by assigning privileges to roles, which can or cannot access it based on those privileges) is stored in a resource table like below.
DESIGN1
RESOURCE TABLE
id (int) | namespace (varchar) | entity_id | black_listed (boolean)
1 | com.mycompany.core.entity1 |24 | false
2 | com.mycompany.core.entity2 |24 | false --note entity2
3 | com.mycompany.core.entity10 |3 | false -- note entity10
Each resource in the table represents a different entity, e.g. entity1, entity2, ..., entity10; entity_id is basically entity1.id, entity2.id, entity3.id, and so on. Because the RESOURCE table keeps resources for all kinds of entities, its entity_id column can't have a proper foreign key constraint. We are thinking of refactoring this schema as follows:
DESIGN 2
RESOURCE TABLE
id | description | entity1_id | entity2_id | entity3_id | entity4_id | entity5_id | entity6_id | black_listed(boolean)
1 | com.mycompany.core.entity1 | 24 | null | null | null | null | null | false
2 | com.mycompany.core.entity2 | null | 24 | null | null | null | null | false
Now entity1_id will have a proper FK to the entity1 table, entity2_id will have a proper FK to entity2, and so on. The downside of this approach is that every row will have NULL in all but one of the entity columns, i.e. you can only have one entity resource per row. Nullable columns also seem like an anti-pattern, especially for FK relationships. One other way would be to normalize the schema and create a resource table for each entity type, but that would be insane to maintain and would quickly become a headache; not saying it's good or bad, but it doesn't look like a practical design.
Is there a better way to design such a table where proper FK relationships are also maintained? Or would you endorse Design 2?
You need to create one table for all entities, with id as a surrogate primary key and (entity_type, entity_id) as a unique key.
id entity_type entity_id
1 entity1 24
2 entity2 24
Then you need to have only one column in RESOURCE referring to this table (say entities). Your RESOURCE table will look like as in the first example, but the difference is there will be only one entities table, not 10.
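As DDL, the suggestion could be sketched like this (table and column names are illustrative assumptions based on the question):

```sql
-- One row per protected entity instance, of whatever type
CREATE TABLE entities (
    id          int IDENTITY PRIMARY KEY,
    entity_type varchar(50) NOT NULL,   -- e.g. 'entity1', 'entity2', ...
    entity_id   int         NOT NULL,   -- the id within that entity's table
    CONSTRAINT uq_entities UNIQUE (entity_type, entity_id)
);

-- RESOURCE now needs only one non-nullable FK
CREATE TABLE resource (
    id           int IDENTITY PRIMARY KEY,
    namespace    varchar(200) NOT NULL,
    entity_ref   int NOT NULL REFERENCES entities(id),
    black_listed bit NOT NULL DEFAULT 0
);
```

The trade-off is that entities.entity_id itself still cannot carry a database-enforced FK to the ten underlying tables; the integrity of that column has to be maintained by the application (or by triggers).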

How can I link my Junction table to my main table

I have a SQL database with the main table called Results. This table stores a record of results of tests that are run nightly.
The Results table has many fields, but for argument's sake let's just say for now it looks like this:
ResultID (unique key field, generated upon insert)
Result (nvarchar(10))
What I wanted to be able to record was a list of tags used in the tests that were run. The tags may be different for each result, and an array of them is stored.
I created a junction table as shown below called Tags:
TagID (int, unique key field, generated at runtime)
ResultID (int)
ScenarioTag (nvarchar(128))
FeatureTag (nvarchar(128))
So what I'm looking to do is to link these 2 together. I'm not so great with databases, I'll be honest.
I was thinking that when I save the test results with my normal SQL query, immediately afterwards I would loop through each tag and save the tags to this new table, but maybe I'm wrong here?
Pseudocode:
//Returned from previous SQL statement that inserted results values into the DB
int ResultID = SQLQueryReturnValue;
Foreach TAG in TAGS
{
    string SQLQuery = "INSERT INTO Tags (ResultID, ScenarioTag, FeatureTag) VALUES (@ResultID, @ScenarioTag, @FeatureTag)";
    CmdSql.Parameters.Clear(); // don't accumulate parameters across iterations
    CmdSql.Parameters.AddWithValue("@ResultID", ResultID);
    CmdSql.Parameters.AddWithValue("@ScenarioTag", TAG.Scenario);
    CmdSql.Parameters.AddWithValue("@FeatureTag", TAG.Feature);
    CmdSql.CommandText = SQLQuery;
    CmdSql.ExecuteNonQuery();
}
Here's an example of what each table might actually look like:
Results Table
|ResultID | Result |
| 10032 | Pass |
| 10031 | Fail |
| 10030 | Fail |
Tags Table
| TagID | ResultID | ScenarioTag | FeatureTag |
| 6 | 10032 | Cheque | Trading |
| 5 | 10032 | GBP | Sales |
| 4 | 10031 | Direct Credit | Trading |
| 3 | 10031 | GBP | Purchase |
| 2 | 10030 | Wire | Dividends |
| 1 | 10030 | USD | Payments |
So finally onto my question... Is there a way that I can physically link this new "Tags" table to my Results table? It's informally linked via ResultID, but there's no physical link.
Is this what you're looking for? (Assumption: this query looks from the Results side; results do not necessarily have to have tags...)
SELECT *
FROM Results
LEFT JOIN Tags ON Results.ResultID=Tags.ResultID
EDIT: Maybe I did not understand what you mean by "physically". You could add a foreign key constraint:
ALTER TABLE Tags ADD CONSTRAINT FK_Tags_Results FOREIGN KEY (ResultID) REFERENCES Results(ResultID);
This constraint adds a relation between these tables, making sure that only values existing in Results are allowed as ResultID in Tags. On the other hand, you cannot delete a Results row that still has children in Tags...
If you do this you could alter the top query to:
SELECT *
FROM Tags
INNER JOIN Results ON Results.ResultID=Tags.ResultID
Now you are looking from Tags (the leading table) and you know that each tag must have a matching Results row (INNER JOIN).
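As a side note, if deleting a Results row should automatically remove its Tags rather than be blocked, the constraint can be declared with a cascade (an option, not a requirement):

```sql
-- Recreate the FK with cascading deletes
ALTER TABLE Tags DROP CONSTRAINT FK_Tags_Results;
ALTER TABLE Tags ADD CONSTRAINT FK_Tags_Results
    FOREIGN KEY (ResultID) REFERENCES Results(ResultID)
    ON DELETE CASCADE;
```

Whether that behavior is desirable depends on whether orphaned tags have any meaning in your model; blocking the delete (the default, NO ACTION) is the safer choice if you're unsure.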

SQL: Creating a common table from multiple similar tables

I have multiple databases on a server, each with a large table where most rows are identical across all databases. I'd like to move this table to a shared database and then have an override table in each application database which has the differences between the shared table and the original table.
The aim is to make updating and distributing the data easier as well as keeping database sizes down.
Problem constraints
The table is a hierarchical data store with date based validity.
table DATA (
ID int primary key,
CODE nvarchar,
PARENT_ID int foreign key references DATA(ID),
END_DATE datetime,
...
)
Each unique CODE in DATA may have a number of rows, but at most a single row where END_DATE is null or greater than the current time (a single valid row per CODE). New references are only made to valid rows.
Updating the shared database should not require anything to be run in application databases. This means any override tables are final once they have been generated.
Existing references to DATA.ID must point to the same CODE, but other columns do not need to be the same. This means any current rows can be invalidated if necessary and multiple occurrences of the same CODE may be combined.
PARENT_ID references must have same parent CODE before and after the split. The actual PARENT_ID value may change if necessary.
The shared table is updated regularly from an external source and these updates need to be reflected in each database's DATA. CODEs that do not appear in the external source can be thought of as invalid, new references to these will not be added.
Existing functionality will continue to use DATA, so the new view (or alternative) must be transparent. It may, however, contain more rows than the original provided earlier constraints are met.
New functionality will use the shared table directly.
Select performance is a concern, insert/update/delete is not.
The solution needs to support SQL Server 2008 R2.
Possible solution
-- in a single shared DB
DATA_SHARED (table)
-- in each app DB
DATA_SHARED (synonym to DATA_SHARED in shared DB)
DATA_OVERRIDE (table)
DATA (view of DATA_SHARED and DATA_OVERRIDE)
Take an existing DATA table to become DATA_SHARED.
Exclude IDs with more than one possible CODE so only rows common across all databases remain. These missing rows will be added back once the data is updated the first time.
Unfortunately every DATA_OVERRIDE will need all rows that differ in any database, not only rows that differ between DATA_SHARED and the previous DATA. There are several IDs that differ only in a single database, which causes the override tables in all the other databases to inflate. Ideas?
This solution causes DATA_SHARED to have a discontinuous ID space. It's a mild annoyance rather than a major issue, but worth noting.
edit: I should be able to keep all of the rows in DATA_SHARED, just invalidate them, then I only need to store differing rows in DATA_OVERRIDE.
I can't think of any situations where PARENT_ID references become invalid, thoughts?
Before:
DB1.DATA
ID | CODE | PARENT_ID | END_DATE
1 | A | NULL | NULL
2 | A1 | 1 | 2020
3 | A2 | 1 | 2010
DB2.DATA
ID | CODE | PARENT_ID | END_DATE
1 | A | NULL | NULL
2 | X | NULL | NULL
3 | A2 | 1 | 2010
4 | X1 | 2 | NULL
5 | A1 | 1 | 2020
After initial processing (DATA_SHARED created from DB1.DATA):
SHARED.DATA_SHARED
ID | CODE | PARENT_ID | END_DATE
1 | A | NULL | NULL
3 | A2 | 1 | 2010
-- END_DATE is omitted from DATA_OVERRIDE as every row is implicitly invalid
DB1.DATA_OVERRIDE
ID | CODE | PARENT_ID
2 | A1 | 1
DB2.DATA_OVERRIDE
ID | CODE | PARENT_ID
2 | X |
4 | X1 | 2
5 | A1 | 1
After update from external data where A1 exists in source but X and X1 don't:
SHARED.DATA_SHARED
ID | CODE | PARENT_ID | END_DATE
1 | A | NULL | NULL
3 | A2 | 1 | 2010
6 | A1 | 1 | 2020
edit: The DATA view would be something like this (it selects from DATA_SHARED rather than from itself; ordering is left to the consuming query, since a SQL Server view can't contain ORDER BY without TOP):
select D.ID, ...
from DATA_SHARED D
left join DATA_OVERRIDE O on D.ID = O.ID
where O.ID is null
union all
select ID, ...
from DATA_OVERRIDE
Given the small number of rows in DATA_OVERRIDE, performance is good enough.
Alternatives
I also considered an approach where instead of DATA_SHARED sharing IDs with the original DATA, there would be mapping tables to link DATA.IDs to DATA_SHARED.IDs. This would mean DATA_SHARED would have a much cleaner ID-space and there could be less data duplication, but the DATA view would require some fairly heavy joins. The additional complexity is also a significant negative.
Conclusion
Thank you for your time if you made it all the way to the end, this question ended up quite long as I was thinking it through as I wrote it. Any suggestions or comments would be appreciated.