Prevent duplicate rows being inserted


I'm trying to use an SQL INSERT statement to migrate rows from a table in one database to a table in a different database. The statement works until I add a unique index on the destination table; after that, I can't get the INSERT to exclude the duplicates. Here's what I thought should work:
INSERT INTO [MyDB].[dbo].[HPB] (
[HPID],
[BusinessID]
)
SELECT
PersonId = (SELECT ID FROM [MyDB].[dbo].[HP] WHERE PersonID = lPersonId),
lBusinessId
FROM [MyOriginalDB].[dbo].[tblEmployment]
WHERE
lPersonId in (SELECT PersonID FROM [MyDB].[dbo].[HP])
AND
lBusinessId in (SELECT ID FROM [MyDB].[dbo].[Business])
AND
NOT EXISTS (SELECT * FROM [MyDB].[dbo].[HPB] WHERE
[HPID] = (SELECT ID FROM [MyDB].[dbo].[HP] WHERE PersonID = lPersonId)
AND [BusinessID] = lBusinessId)
The schema for the HPB table is:
CREATE TABLE [dbo].[HPB](
[ID] [int] IDENTITY(1,1) NOT NULL,
[HPID] [int] NOT NULL,
[BusinessID] [int] NOT NULL,
CONSTRAINT [PK_HealthProfessionalBusiness] PRIMARY KEY CLUSTERED ([ID]))
The unique index is on the [MyDB].[dbo].[HPB] table, on columns (HPID, BusinessID).
When I run the insert I get an error about duplicate rows, and I can't work out why the condition below doesn't exclude the duplicates:
NOT EXISTS (SELECT * FROM [MyDB].[dbo].[HPB] WHERE
[HPID] = (SELECT ID FROM [MyDB].[dbo].[HP] WHERE PersonID = lPersonId)
AND [BusinessID] = lBusinessId)

Insert MyDB.dbo.HPB( HPID, BusinessID )
Select HP.ID, E.lBusinessId
From [MyOriginalDB].[dbo].[tblEmployment] As E
Join [MyDB].[dbo].[HP] As HP
On HP.PersonID = E.lPersonId
Join [MyDB].[dbo].[Business] As B
On B.ID = E.lBusinessId
Left Join [MyDB].[dbo].[HPB] As HPB
On HPB.BusinessID = E.lBusinessId
And HPB.HPID = HP.ID
Where HPB.ID Is Null
Group By HP.ID, E.lBusinessId

Use:
INSERT INTO [MyDB].[dbo].[HPB]
([HPID], [BusinessID])
SELECT DISTINCT
h.id,
e.lbusinessid
FROM [MyOriginalDB].[dbo].[tblEmployment] e
JOIN [MyDB].[dbo].[HP] h ON h.personid = e.lpersonid
WHERE e.lbusinessid in (SELECT ID FROM [MyDB].[dbo].[Business])
AND NOT EXISTS (SELECT NULL
FROM [MyDB].[dbo].[HPB] hb
WHERE hb.businessid = e.lbusinessid
AND hb.hpid = h.id)
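
Why the original NOT EXISTS was not enough: that check only sees rows already stored in HPB, so two source rows that map to the same (HPID, BusinessID) pair both pass it, and the second one then violates the unique index. DISTINCT (or GROUP BY) collapses them before the insert. The sketch below reproduces this with Python's built-in sqlite3 module; the single in-memory database, the lower-case table names and the sample rows are simplifying assumptions, not the asker's real schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE hp (id INTEGER PRIMARY KEY, personid INTEGER);
    CREATE TABLE tblEmployment (lpersonid INTEGER, lbusinessid INTEGER);
    CREATE TABLE hpb (id INTEGER PRIMARY KEY,
                      hpid INTEGER NOT NULL,
                      businessid INTEGER NOT NULL);
    CREATE UNIQUE INDEX ux_hpb ON hpb (hpid, businessid);
    INSERT INTO hp VALUES (10, 1), (20, 2);
    -- hypothetical data: person 1 is listed twice for business 100
    INSERT INTO tblEmployment VALUES (1, 100), (1, 100), (2, 100);
""")

insert_sql = """
    INSERT INTO hpb (hpid, businessid)
    SELECT {distinct} h.id, e.lbusinessid
    FROM tblEmployment e
    JOIN hp h ON h.personid = e.lpersonid
    WHERE NOT EXISTS (SELECT NULL FROM hpb hb
                      WHERE hb.businessid = e.lbusinessid
                        AND hb.hpid = h.id)
"""

# Without DISTINCT both copies of (10, 100) pass the NOT EXISTS check,
# because neither is in hpb yet, and the unique index rejects the second.
try:
    con.execute(insert_sql.format(distinct=""))
    failed_without_distinct = False
except sqlite3.IntegrityError:
    failed_without_distinct = True
con.rollback()

# With DISTINCT the duplicate source rows are collapsed first.
con.execute(insert_sql.format(distinct="DISTINCT"))
rows = con.execute(
    "SELECT hpid, businessid FROM hpb ORDER BY hpid").fetchall()
print(failed_without_distinct, rows)
```

Running the DISTINCT version a second time inserts nothing, because the NOT EXISTS check now finds every pair already in the target table.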

Related

Duplicate rows returned even though group by is used

This is my query
SELECT p.book FROM customers_books p
INNER JOIN books b ON p.book = b.id
INNER JOIN bookprices bp ON bp.book = p.book
WHERE b.status = 'PUBLISHED' AND bp.currency_code = 'GBP'
AND p.book NOT IN (SELECT cb.book FROM customers_books cb WHERE cb.customer = 1)
GROUP BY p.book, p.created_date ORDER BY p.created_date DESC
Given the data in my customers_books table, I expect only book IDs 8, 6 and 1 to be returned, but the query returns 8, 6, 1, 1.
The table structures are:
CREATE TABLE "public"."customers_books" (
"id" int8 NOT NULL,
"created_date" timestamp(6),
"book" int8,
"customer" int8
);
CREATE TABLE "public"."books" (
"id" int8 NOT NULL,
"created_date" timestamp(6),
"status" varchar(255) COLLATE "pg_catalog"."default"
);
CREATE TABLE "public"."bookprices" (
"id" int8 NOT NULL,
"currency_code" varchar(255) COLLATE "pg_catalog"."default",
"book" int8
);
What am I doing wrong here?
I don't really want p.created_date in the GROUP BY, but the ORDER BY forced me to include it.

You have too many joins in the outer query:
SELECT b.id
FROM books b INNER JOIN
bookprices bp
ON bp.book = b.id
WHERE b.status = 'PUBLISHED' AND bp.currency_code = 'GBP' AND
NOT EXISTS (SELECT 1
FROM customers_books cb
WHERE cb.book = b.id AND cb.customer = 1
) ;
Note that I replaced the NOT IN with NOT EXISTS. I strongly, strongly discourage you from using NOT IN with a subquery. If the subquery returns any NULL values, then NOT IN returns no rows at all. It is better to sidestep this issue just by using NOT EXISTS.
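
The NULL pitfall is easy to reproduce. The following is a minimal sketch using Python's built-in sqlite3 module with cut-down versions of the tables; the sample rows, including the NULL book, are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY);
    CREATE TABLE customers_books (book INTEGER, customer INTEGER);
    INSERT INTO books VALUES (1), (2), (3);
    -- hypothetical data: customer 1 owns book 1 and has one row with a NULL book
    INSERT INTO customers_books VALUES (1, 1), (NULL, 1);
""")

# "id NOT IN (1, NULL)" is never true: it is false for id = 1 and
# unknown (NULL) for every other id, so no row survives the filter.
not_in = con.execute("""
    SELECT id FROM books
    WHERE id NOT IN (SELECT book FROM customers_books WHERE customer = 1)
    ORDER BY id
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists = con.execute("""
    SELECT b.id FROM books b
    WHERE NOT EXISTS (SELECT 1 FROM customers_books cb
                      WHERE cb.book = b.id AND cb.customer = 1)
    ORDER BY b.id
""").fetchall()

print(not_in, not_exists)
```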

Performance tuning of query

I have a query which takes a long time to run. It is probably because I used too many isnulls in the join condition. How can I optimise it by removing the isnull?
Is there any alternate way without updating the table? The query is given below:
select pos.C_id
,pos.s_id
,pos.A_id
,pos.Ad_id
,pos.Pr_id
,pos.prog_id
,pos.port_id
,pos.o_type
,pos.o_id
,pos.s_id
,pos.c_id
,pos.s_type_id
,pos.s_type
,pos.e_date
,pos.mv
,0 is_pub
, 1 is_adj
,pos.is_unsup
,getdate() date
,getdate() timestamp
from #temp pos
left join acc c with(nolock) ON pos._id = c.c_id
AND pos.account_id = c.account_id
AND isnull(pos.Pr_id,0) = isnull(c.pr_id,0)
AND isnull(pos.prog_id,0) = isnull(c.prog_id,0)
AND isnull(pos.port_id,0) = isnull(c.port_id,0)
and isnull(pos.style_type_id,0)=isnull(c.s_type_id,0)
AND pos.s_id = c._id
AND pos.c_id = c.c_id
AND pos.s_type = c.s_type
AND pos.is_unsup = c.is_uns
AND pos.is_pub = 1
where c.a_id is null

What about the query below?
select pos.C_id
,pos.s_id
,pos.A_id
,pos.Ad_id
,pos.Pr_id
,pos.prog_id
,pos.port_id
,pos.o_type
,pos.o_id
,pos.s_id
,pos.c_id
,pos.s_type_id
,pos.s_type
,pos.e_date
,pos.mv
,0 is_pub
, 1 is_adj
,pos.is_unsup
,getdate() date
,getdate() timestamp
from #temp pos
left join acc c with(nolock) ON pos._id = c.c_id
AND pos.account_id = c.account_id
AND pos.Pr_id = c.pr_id
AND pos.prog_id = c.prog_id
AND pos.port_id = c.port_id
and pos.style_type_id=c.s_type_id
AND pos.s_id = c._id
AND pos.c_id = c.c_id
AND pos.s_type = c.s_type
AND pos.is_unsup = c.is_uns
AND pos.is_pub = 1
where c.a_id is null
union all
select pos.C_id
,pos.s_id
,pos.A_id
,pos.Ad_id
,pos.Pr_id
,pos.prog_id
,pos.port_id
,pos.o_type
,pos.o_id
,pos.s_id
,pos.c_id
,pos.s_type_id
,pos.s_type
,pos.e_date
,pos.mv
,0 is_pub
, 1 is_adj
,pos.is_unsup
,getdate() date
,getdate() timestamp
from #temp pos
left join acc c with(nolock) ON pos._id = c.c_id
AND pos.account_id = c.account_id
AND pos.s_id = c._id
AND pos.c_id = c.c_id
AND pos.s_type = c.s_type
AND pos.is_unsup = c.is_uns
AND pos.is_pub = 1
where c.a_id is null
AND pos.Pr_id is null AND c.pr_id is null
AND pos.prog_id is null AND c.prog_id is null
AND pos.port_id is null AND c.port_id is null
and pos.style_type_id is null AND c.s_type_id is null

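Try using
AND (pos.Pr_id = c.pr_id OR (pos.Pr_id IS NULL AND c.pr_id IS NULL))
instead of
AND ISNULL(pos.Pr_id,0) = ISNULL(c.pr_id,0)
A plain comparison or IS NULL test can use an index on the column, whereas wrapping the column in ISNULL() generally prevents an index seek. Note that the two forms are not strictly equivalent: ISNULL(x, 0) also treats a real 0 as equal to NULL.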
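
The two join conditions do not always match the same rows, which is worth checking before swapping one for the other. Here is a small sketch with Python's built-in sqlite3 module; SQLite has no two-argument ISNULL, so COALESCE stands in for it, and the pairs table with its four rows is invented for the demonstration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE pairs (a INTEGER, b INTEGER);
    -- hypothetical value pairs, standing in for pos.Pr_id and c.pr_id
    INSERT INTO pairs VALUES (NULL, NULL), (0, NULL), (1, 1), (1, 2);
""")

# COALESCE(x, 0) plays the role of ISNULL(x, 0): NULL and 0 become equal.
coalesce_matches = con.execute("""
    SELECT a, b FROM pairs
    WHERE COALESCE(a, 0) = COALESCE(b, 0)
    ORDER BY rowid
""").fetchall()

# The explicit form matches equal values or two NULLs, and nothing else.
explicit_matches = con.execute("""
    SELECT a, b FROM pairs
    WHERE a = b OR (a IS NULL AND b IS NULL)
    ORDER BY rowid
""").fetchall()

print(coalesce_matches)
print(explicit_matches)
```

If 0 can never occur as a real ID, the two forms agree; if it can, the pair (0, NULL) is matched by the first form only.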

You really need a clustered index on your temp table #temp. Without one, SQL Server has no index to use for the join and will typically resort to a hash join, which may not perform well on large tables.
If any of the ID columns in your #temp table is a unique key, create the clustered index on it:
CREATE CLUSTERED INDEX cx_temp ON #temp (YourIDColumn);
If #temp has no key column, add an identity primary key to it like this:
1) If you create #temp with the SELECT INTO approach, change
select *
into #temp
from YourQuery
to
select IDENTITY (int, 1,1) ThisIsTheKey, *
into #temp
from YourQuery
-- and then add the index
CREATE CLUSTERED INDEX cx_temp ON #temp (ThisIsTheKey);
2) If you use the CREATE TABLE + INSERT INTO approach, simply add the column like this:
CREATE TABLE #TEMP (
ThisIsTheKey INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
..
..
-- ALL YOUR OTHER COLUMNS
..
..
)
Don't rely on covering nonclustered indexes: the optimizer will probably not use them, because you are selecting too many columns.
Test with the clustered index first; if you need further optimization, you can then try adding specific nonclustered indexes.

Left join does not return null values

I have two tables:
AppWindowsEvent:
CREATE TABLE [AppWindowsEvent]
(
[idAppWindowEvent] INT IDENTITY(1,1)
, [idAppWindow] INT
, [idEventType] INT
, [Order] INT
, CONSTRAINT PK_idAppWindowEvent PRIMARY KEY ([idAppWindowEvent])
, CONSTRAINT FK_idAppWindowEvent_AppWindow FOREIGN KEY ([idAppWindow]) REFERENCES [AppWindow]([idAppWindow])
, CONSTRAINT FK_idAppWindowEvent_EventType FOREIGN KEY ([idEventType]) REFERENCES [EventType]([idEventType])
)
Event:
CREATE TABLE [Event]
(
[idEvent] [INT] IDENTITY(1,1) NOT NULL
, [idEventType] [INT] NOT NULL
, [idEntity] [INT] NOT NULL
, CONSTRAINT PK_IdEvent PRIMARY KEY([idEvent])
, CONSTRAINT [FK_Event_EventType] FOREIGN KEY([idEventType]) REFERENCES [EventType] ([idEventType])
)
When I run this query:
SELECT
*
FROM
AppWindowsEvent AWE
LEFT JOIN Event E ON AWE.idEventType = E.idEventType
WHERE
AWE.idMill = 1
AND AWE.idAppWindow = 1
ORDER BY
AWE.[Order] ASC
The result does not contain any NULL rows.
And when I run this:
SELECT
*
FROM
AppWindowsEvent AWE
LEFT JOIN Event E ON AWE.idEventType = E.idEventType
AND E.[idEntity] = 1234
WHERE
AWE.idMill = 1
AND AWE.idAppWindow = 1
ORDER BY
AWE.[Order] ASC
The result does contain NULL rows.
NOTE:
I need the entire set of events, both those that are already configured and those that are not. If I wanted a specific set of events, I could filter by a specific idEntity of the Event table in the ON clause, and the result comes back correctly, but only for that idEntity. In my case I need every idEntity.

Try this:
SELECT *
FROM
AppWindowsEvent AWE
LEFT JOIN Event E ON AWE.idEventType = E.idEventType
WHERE
AWE.idMill = 1
AND AWE.idAppWindow = 1
AND E.[idEntity] = 1234
ORDER BY
AWE.[Order] ASC
Or, if you don't want NULL values to appear for the second table, you can use an INNER JOIN instead of a LEFT JOIN:
SELECT *
FROM
AppWindowsEvent AWE
Inner JOIN Event E ON AWE.idEventType = E.idEventType
AND E.[idEntity] = 1234
WHERE
AWE.idMill = 1
AND AWE.idAppWindow = 1
ORDER BY
AWE.[Order] ASC
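
Where the idEntity filter sits decides whether the unmatched rows survive. A condition in the ON clause only limits which Event rows can match, so every AppWindowsEvent row is kept; the same condition in the WHERE clause runs after the join and discards the rows where E.idEntity is NULL, which makes the LEFT JOIN behave like an INNER JOIN. The sketch below shows the difference with Python's built-in sqlite3 module, using trimmed-down tables and made-up rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE AppWindowsEvent (idAppWindowEvent INTEGER PRIMARY KEY,
                                  idEventType INTEGER);
    CREATE TABLE Event (idEvent INTEGER PRIMARY KEY,
                        idEventType INTEGER,
                        idEntity INTEGER);
    INSERT INTO AppWindowsEvent VALUES (1, 100), (2, 200);
    -- hypothetical data: only event type 100 is configured for entity 1234
    INSERT INTO Event VALUES (1, 100, 1234), (2, 200, 9999);
""")

# Filter in ON: unmatched AppWindowsEvent rows are kept, padded with NULL.
filter_in_on = con.execute("""
    SELECT AWE.idEventType, E.idEntity
    FROM AppWindowsEvent AWE
    LEFT JOIN Event E ON AWE.idEventType = E.idEventType
                     AND E.idEntity = 1234
    ORDER BY AWE.idAppWindowEvent
""").fetchall()

# Filter in WHERE: the NULL-padded rows fail the test and are removed.
filter_in_where = con.execute("""
    SELECT AWE.idEventType, E.idEntity
    FROM AppWindowsEvent AWE
    LEFT JOIN Event E ON AWE.idEventType = E.idEventType
    WHERE E.idEntity = 1234
    ORDER BY AWE.idAppWindowEvent
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```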

SQL query : SELECT

CREATE TABLE WRITTEN_BY
( Re_Id CHAR(15) NOT NULL,
Pub_Number INT NOT NULL,
PRIMARY KEY(Re_Id, Pub_Number),
FOREIGN KEY(Re_Id) REFERENCES RESEARCHER(Re_Id),
FOREIGN KEY(Pub_Number) REFERENCES PUBLICATION(Pub_Number));
CREATE TABLE WORKING_ON
( Re_Id CHAR(15) NOT NULL,
Pro_Code CHAR(15) NOT NULL,
PRIMARY KEY(Re_Id, Pro_Code, Subpro_Code),
FOREIGN KEY(Re_Id) REFERENCES RESEARCHER(Re_Id));
Re_Id stands for ID of a researcher
Pub_Number stands for ID of a publication
Pro_Code stands for ID of a project
The Written_by table stores a publication's ID and its author
The Working_on table stores a project's ID and who is working on it
Now I have this task:
For each project, find the researcher who wrote the most publications.
This is what I've done so far:
SELECT Pro_Code,WORK.Re_Id
FROM WORKING_ON AS WORK , WRITTEN_BY AS WRITE
WHERE WORK.Re_Id = WRITE.Re_Id
So I get a table containing the researcher ID and project ID for every researcher who has at least one publication. But what's next? How do I solve this problem?

You haven't said which platform you're on but try this. It handles the case where there are ties as well.
select g.Pro_Code, g.Re_Id, g.numpublished
from
(
SELECT work.Pro_Code, WORK.Re_Id, count(WRITE.pub_number) as numpublished
FROM WORKING_ON WORK JOIN WRITTEN_BY AS WRITE ON WORK.Re_Id = WRITE.Re_Id
GROUP BY work.Pro_Code, WORK.Re_Id
) g
inner join
(
select Pro_code, max(numpublished) as maxpublished
from (
SELECT work.Pro_Code, WORK.Re_Id, count(WRITE.pub_number) numpublished
FROM WORKING_ON WORK JOIN WRITTEN_BY AS WRITE ON WORK.Re_Id = WRITE.Re_Id
GROUP BY work.Pro_Code, WORK.Re_Id
) g2
group by Pro_code
) m
on m.Pro_code = g.Pro_Code and m.maxpublished = g.numpublished
Some platforms will allow you to write it this way:
with g as (
SELECT work.Pro_Code, WORK.Re_Id, count(WRITE.pub_number) as numpublished
FROM WORKING_ON WORK JOIN WRITTEN_BY AS WRITE ON WORK.Re_Id = WRITE.Re_Id
GROUP BY work.Pro_Code, WORK.Re_Id
)
select g.Pro_Code, g.Re_Id, g.numpublished
from g
inner join
(
select Pro_code, max(numpublished) as maxpublished
from g
group by Pro_code
) m
on m.Pro_code = g.Pro_Code and m.maxpublished = g.numpublished
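
To see the tie handling in action, here is a small runnable sketch of the CTE version using Python's built-in sqlite3 module. The tables are reduced to the columns the query needs, the aliases are shortened to w and wb, and the sample rows are invented so that project p1 has two researchers tied on two publications each.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE WRITTEN_BY (Re_Id TEXT, Pub_Number INTEGER);
    CREATE TABLE WORKING_ON (Re_Id TEXT, Pro_Code TEXT);
    -- hypothetical data: r1, r2, r3 work on p1; r2 also works on p2
    INSERT INTO WORKING_ON VALUES ('r1','p1'), ('r2','p1'), ('r3','p1'), ('r2','p2');
    -- r1 and r2 wrote two publications each, r3 wrote one
    INSERT INTO WRITTEN_BY VALUES ('r1',1), ('r1',2), ('r2',3), ('r2',4), ('r3',5);
""")

top_writers = con.execute("""
    WITH g AS (
        SELECT w.Pro_Code, w.Re_Id, COUNT(wb.Pub_Number) AS numpublished
        FROM WORKING_ON w
        JOIN WRITTEN_BY wb ON w.Re_Id = wb.Re_Id
        GROUP BY w.Pro_Code, w.Re_Id
    )
    SELECT g.Pro_Code, g.Re_Id, g.numpublished
    FROM g
    JOIN (SELECT Pro_Code, MAX(numpublished) AS maxpublished
          FROM g
          GROUP BY Pro_Code) m
      ON m.Pro_Code = g.Pro_Code AND m.maxpublished = g.numpublished
    ORDER BY g.Pro_Code, g.Re_Id
""").fetchall()

for row in top_writers:
    print(row)
```

Both tied researchers are returned for p1. Note that the inner JOIN drops researchers with no publications at all, so a project whose members have written nothing does not appear in the result.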

I think that you are looking for something like the following :
select
tm.pro_code as pro_code,
tm.re_id as re_id,
max(total) as max_pub
from (
select *
from (
select
wo.pro_code as pro_code,
wr.re_id as re_id,
count(wr.pub_number) as total
from
written_by wr,
working_on wo
where
wr.re_id = wo.re_id
group by wr.re_id,wo.pro_code
) t
) tm
group by pro_code

If you are using MS SQL, this should work:
With cte as (
select b.Pro_Code, a.Re_Id, COUNT(distinct a.Pub_Number) as pubs
from WRITTEN_BY a
inner join WORKING_ON b
on a.Re_Id = b.Re_Id
group by b.Pro_Code, a.Re_Id)
SELECT c.Pro_Code, c.Re_Id, c.pubs from cte c
where c.pubs = (select MAX(m.pubs) from cte m where m.Pro_Code = c.Pro_Code)

How to go about this implementation of query

My database tables are as follows:
CREATE TABLE [dbo].[vProduct](
[nId] [int] NOT NULL, -- Primary key
[sName] [varchar](255) NULL
)
CREATE TABLE [dbo].[vProductLanguage](
[kProduct] [int] NOT NULL,
[kLanguage] [int] NOT NULL -- Foreign key to table vLanguage
)
CREATE TABLE [dbo].[vLanguage](
[nId] [int] NOT NULL, -- Primary key
[sName] [varchar](50) NULL,
[language] [char](2) NULL
)
Table vProduct is related to vProductLanguage on vProduct.nId = vProductLanguage.kProduct
Table vProductLanguage is related to vLanguage on vProductLanguage.kLanguage = vLanguage.nId
So vProductLanguage holds the languages that have been selected for each product.
(Sample rows for vProduct, vProductLanguage and vLanguage were shown in an image.)
What I want is to select all languages from vLanguage, together with the selected languages from vProductLanguage for a given vProduct.
I tried the query below, but it only returns the languages that are associated with the product.
select * from
vProductLanguage
left join vLanguage on vProductLanguage.kLanguage = vLanguage.nId
left join vProduct on vProductLanguage.kProduct = vProduct.nId
Where vProduct.nId = 1
I want to select all the rows from vLanguage and the matching rows from vProductLanguage.
I hope my question is clear.

It sounds like you want to start your JOIN with the vLanguage table first:
select *
from vLanguage l
left join vProductLanguage pl
on l.nid = pl.kLanguage
left join vProduct p
on pl.kProduct = p.nid
and p.nid = 1
This will return all rows from the vLanguage table and any matching rows from the vProductLanguage table.
If you have more than one vProduct then you can rewrite the query slightly to:
select *
from vLanguage l
left join
(
select pl.kLanguage,
p.nid,
p.sName
from vProductLanguage pl
left join vProduct p
on pl.kProduct = p.nid
where p.nid = 1
) p
on l.nid = p.kLanguage
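
As a rough check of the derived-table version, the sketch below runs it with Python's built-in sqlite3 module. The product and language rows are made up, the column types are simplified, and the derived table is aliased sel here only to keep it visually distinct from the inner vProduct alias.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE vProduct (nId INTEGER PRIMARY KEY, sName TEXT);
    CREATE TABLE vLanguage (nId INTEGER PRIMARY KEY, sName TEXT);
    CREATE TABLE vProductLanguage (kProduct INTEGER, kLanguage INTEGER);
    -- hypothetical sample data
    INSERT INTO vProduct VALUES (1, 'Product A'), (2, 'Product B');
    INSERT INTO vLanguage VALUES (1, 'English'), (2, 'German'), (3, 'French');
    -- product 1 has English and French selected, product 2 has German
    INSERT INTO vProductLanguage VALUES (1, 1), (1, 3), (2, 2);
""")

# Every language is returned once; kProduct is NULL where product 1
# has not selected that language.
languages = con.execute("""
    SELECT l.sName, sel.kProduct
    FROM vLanguage l
    LEFT JOIN (
        SELECT pl.kLanguage, pl.kProduct
        FROM vProductLanguage pl
        JOIN vProduct p ON pl.kProduct = p.nId
        WHERE p.nId = 1
    ) sel ON l.nId = sel.kLanguage
    ORDER BY l.nId
""").fetchall()

print(languages)
```

Because the product filter is applied inside the derived table, rows belonging to other products never reach the outer LEFT JOIN, so German shows up as unselected rather than being matched to product 2.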
