Duplicate rows in SQL Server

Duplicate rows in SQL Server - sql

I am having duplicate rows with the same storeactivityid show up in my results...
This is the primary key, so this should not happen. Is there something wrong with my joins that could cause this? I could use distinct, but that will not solve the issue here.
Any tips or advice? There are 3 duplicates showing for each result!
select pd.storeactivityid,e.EMPLOYEENAME,c.ChainName,c.UserCode as ChainNumber,
s.storenumber,s.StoreNameAndNumber,
pd.startdatetime,
pd.enddatetime,
cast((datediff(s, pd.startdatetime, pd.enddatetime) / 3600.0) as decimal(9,2)) as duration,
exceptioncodes,pe.Description,isnull(pd.approved, 0) as approved,
isnull(pd.comment, '') as comment,
pd.modifieddate
from payrolldetail pd with (nolock)
inner join payperiods pp with (nolock) on pd.enddatetime between pp.begindate and pp.enddate and pp.CompanyID = #companyid
left join stores s with (nolock) on pd.storeid = s.storeid
left join chains c with (nolock) on c.chainid = s.chaincode
left join employees e with (nolock) on pd.employeeid = e.employeeid
inner join payrollexceptions pe with (nolock) on pd.ExceptionCodes = pe.Code
where pd.companyid = #companyid
and cast(getdate() as date) between pp.begindate and pp.enddate
and exceptioncodes = #exceptioncodes
and pd.companyid = #companyid

If it is a primary key, you can be certain that in the actual table you do not have duplicate rows with the same storeactivityid.
Your query returns rows with the same storeactivityid because at least one of the joined tables has the matches the condition specified in the join.
My best guess it is due to the followoing join:
inner join payperiods pp with (nolock) on pd.enddatetime between pp.begindate and pp.enddate and pp.CompanyID = #companyid
Is it possible that a company has multiple payrolldetails within the same range of dates specified in the payperiods table?

i do not know what is in each of the tables,
but the easiest way i found to debug something like his is
select [storeactivityid],count([storeactivityid]) as [count]
from [<table>]
<Start adding joins in one at a time>
where [count] > 1
group by [storeactivityid]

If you use NOLOCK hint to be able to do 'dirty reads' (read uncommitted) and some modifications are happen in the same time on this tables, this may cause missing or double count of even unique rows!
If there is no activity on the server, no updates/inserts/deletes, than there is something inside data of your tables that caused duplicating, as other guys have already said.

Related

Sum fields of an Inner join

How I can add two fields that belong to an inner join?
I have this code:
select
SUM(ACT.NumberOfPlants ) AS NumberOfPlants,
SUM(ACT.NumOfJornales) AS NumberOfJornals
FROM dbo.AGRMastPlanPerformance MPR (NOLOCK)
INNER JOIN GENRegion GR ON (GR.intGENRegionKey = MPR.intGENRegionLink )
INNER JOIN AGRDetPlanPerformance DPR (NOLOCK) ON
(DPR.intAGRMastPlanPerformanceLink =
MPR.intAGRMastPlanPerformanceKey)
INNER JOIN vwGENPredios P (NOLOCK) ON ( DPR.intGENPredioLink =
P.intGENPredioKey )
INNER JOIN AGRSubActivity SA (NOLOCK) ON (SA.intAGRSubActivityKey =
DPR.intAGRSubActivityLink)
LEFT JOIN (SELECT RA.intGENPredioLink, AR.intAGRActividadLink,
AR.intAGRSubActividadLink, SUM(AR.decNoPlantas) AS
intPlantasTrabajads, SUM(AR.decNoPersonas) AS NumOfJornales,
SUM(AR.decNoPlants) AS NumberOfPlants
FROM AGRRecordActivity RA WITH (NOLOCK)
INNER JOIN AGRActividadRealizada AR WITH (NOLOCK) ON
(AR.intAGRRegistroActividadLink = RA.intAGRRegistroActividadKey AND
AR.bitActivo = 1)
INNER JOIN AGRSubActividad SA (NOLOCK) ON (SA.intAGRSubActividadKey
= AR.intAGRSubActividadLink AND SA.bitEnabled = 1)
WHERE RA.bitActive = 1 AND
AR.bitActive = 1 AND
RA.intAGRTractorsCrewsLink IN(2)
GROUP BY RA.intGENPredioLink,
AR.decNoPersons,
AR.decNoPlants,
AR.intAGRAActivityLink,
AR.intAGRSubActividadLink) ACT ON (ACT.intGENPredioLink IN(
DPR.intGENPredioLink) AND
ACT.intAGRAActivityLink IN( DPR.intAGRAActivityLink) AND
ACT.intAGRSubActivityLink IN( DPR.intAGRSubActivityLink))
WHERE
MPR.intAGRMastPlanPerformanceKey IN(4) AND
DPR.intAGRSubActivityLink IN( 1153)
GROUP BY
P.vchRegion,
ACT.NumberOfFloors,
ACT.NumOfJournals
ORDER BY ACT.NumberOfFloors DESC
However, it does not perform the complete sum. It only retrieves all the values of the columns and adds them 1 by 1, instead of doing the complete sum of the whole column.
For example, the query returns these results:
What I expect is the final sums. In NumberOfPlants the result of the sum would be 163,237 and of NumberJornales would be 61.
How can I do this?

First of all the (nolock) hints are probably not accomplishing the benefit you hope for. It's not an automatic "go faster" option, and if such an option existed you can be sure it would be already enabled. It can help in some situations, but the way it works allows the possibility of reading stale data, and the situations where it's likely to make any improvement are the same situations where risk for stale data is the highest.
That out of the way, with that much code in the question we're better served with a general explanation and solution for you to adapt.
The issue here is GROUP BY. When you use a GROUP BY in SQL, you're telling the database you want to see separate results per group for any aggregate functions like SUM() (and COUNT(), AVG(), MAX(), etc).
So if you have this:
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
You will get a separate row per ColumnA group, even though it's not in the SELECT list.
If you don't really care about that, you can do one of two things:
Remove the GROUP BY If there are no grouped columns in the SELECT list, the GROUP BY clause is probably not accomplishing anything important.
Nest the query
If option 1 is somehow not possible (say, the original is actually a view) you could do this:
SELECT SUM(SumB)
FROM (
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
) t
Note in both cases any JOIN is irrelevant to the issue.

SQL - Can I use a CASE Statement to left join the same table based on different criteria?

I have a stored proc that is using a LEFT JOIN to select a key from it. This proc is used by two different applications that have different criteria for when this key should be selected.
I'm getting a syntax error on this. The first statement should be used as a LEFT JOIN if the row was created 60 days in advance based on the #IsCheckDateRequired field.
The correlation name 't' is specified multiple times in a FROM clause
If the #IsCheckDateRequiredField is not necessary, I just want to left join the table on the keys.
SELECT
t.PrimaryKey
FROM dbo.WTable w
LEFT JOIN dbo.Table t
on t.PrimaryKey = w.PrimaryKey
and datediff(day, t.CreatedDate, GETUTCDATE()) > 60
and #IsCheckDateRequired = 1
LEFT JOIN dbo.Table t
on t.PrimaryKey = w.PrimaryKey
Is this possible to do in the same proc? My SELECT statement above is attempting to select t.Primary key and is complaining because I've left joined it on the same name twice.
I'm kind of new to SQL as well and am still learning best practices, so should this even be what I'm trying to do or am I way off base here?
Any advice would help for sure!
Thanks in advance!

Thanks to the comments on helping my with my approach.
It's not allowed to do the same alias on the left join. So instead of having the logic be in the LEFT JOIN Statements, I was able to put it in my select.
SELECT
CASE #IsAutoRenewalRun
when 1 then t.PrimaryKey
when 0 then t2.PrimaryKey
END
as PrimaryKey
FROM dbo.WTable w
LEFT JOIN dbo.Table t
on t.PrimaryKey = w.PrimaryKey
and datediff(day, t.CreatedDate, GETUTCDATE()) > 60
and #IsCheckDateRequired = 1
LEFT JOIN dbo.Table t2
on t2.PrimaryKey = w.PrimaryKey

Multiple joins on the same table, Results Not Returned if Join Field is NULL

SELECT organizations_organization.code as organization,
core_user.email as Created_By,
assinees.email as Assigned_To,
from tickets_ticket
JOIN organizations_organization on tickets_ticket.organization_id = organizations_organization.id
JOIN core_user on tickets_ticket.created_by_id = core_user.id
Left JOIN core_user as assinees on assinees.id = tickets_ticket.currently_assigned_to_id
In the above query, if tickets_ticket.currently_assigned_to_id is null then that that row from tickets_ticket is not returned
> Records In tickets_ticket = 109
> Returned Records = 4 (out of 109 4 row has value for currently_assigned_to_id rest 105 are null )
> Expected Records = 109 (with nulll set for Assigned_To)
Note I am trying to achieve multiple joins on the same table

LEFT JOIN can not kill output records,
your problem is here:
JOIN core_user on tickets_ticket.created_by_id = core_user.id
this join kills non-matching records
try
LEFT JOIN core_user on tickets_ticket.created_by_id = core_user.id

First, this is not the actual code you are running. There is a comma before the from clause that would cause a syntax error. If you have left out a where clause, then that would explain why you are seeing no rows.
When using left joins, conditions on the first table go in the where clause. Conditions on subsequent tables go in the on clause.
That said, a where clause may not be the problem. I would suggest using left joins from the first table onward -- along with table aliases:
select oo.code as organization, cu.email as Created_By, a.email as Assigned_To,
from tickets_ticket tt left join
organizations_organization oo
on tt.organization_id = oo.id left join
core_user cu
on tt.created_by_id = cu.id left join
core_user a
on a.id = tt.currently_assigned_to_id ;
I suspect that you have data in your data model that is unexpected -- perhaps bad organizations, perhaps bad created_by_id. Keep all the tickets to see what is missing.
That said, you should probably be including something like tt.id in the result set to identify the specific ticket.

Left Join not working as expected

Hi helpful clever people, I am running into an issue when trying to prepare a query. I am trying to join a count of sales and a running total of sales to a prebuilt temp table.
The temp table (TMP_WEEK_SHOP) just has 2 rows, a list of week codes(WEEK_ID) and then an entry with that code for every location(SHOP_ID).
Now my issue is that no matter what i do, my queries will always omit results from the temp table if there were no sales for that location during that week. I need all entries for every week to be entered, even the 0 sales shops.
From what i can gather a left outer join should give me this, but no matter what i have tried it keeps omitting any shops without sales. I should say I am running on an SQL Server 2005 environment with Server Management Studio.
Any help at all would be fantastic!
USE CX
SELECT tws.WEEK_ID as Week, isnull(tws.SHOP_ID,0)as Shop, isnull(count(sal.SALES_ITEMS_ID),0)Sales, isnull(sum(sal.LINEVALUE),0)Sales_Value
FROM TMP_WEEK_SHOP tws
JOIN CX_DATES dat
on dat.WEEK_ID=tws.WEEK_ID
LEFT OUTER JOIN CX_SALES_ITEMS sal
on sal.DATE_ID=dat.DATE_ID
and sal.SHOP_NUM=tws.SHOP_ID
JOIN CX_STYLES sty
on sal.STY_QUAL = sty.STY_QUAL
WHERE sty.STY_RET_TYPE='BIKES'
and (sal.SHOP_NUM='70006' or sal.SHOP_NUM='70008' or sal.SHOP_NUM='70010' or sal.SHOP_NUM='70018' or sal.SHOP_NUM='70028' or sal.SHOP_NUM='70029' or sal.SHOP_NUM='70012' or sal.SHOP_NUM='70016' or sal.SHOP_NUM='70026')
GROUP BY tws.WEEK_ID, tws.SHOP_ID
ORDER BY tws.WEEK_ID, tws.SHOP_ID

This is your query, formatted a bit better so I can read it:
SELECT tws.WEEK_ID as Week, isnull(tws.SHOP_ID,0)as Shop,
isnull(count(sal.SALES_ITEMS_ID),0)Sales, isnull(sum(sal.LINEVALUE),0)Sales_Value
FROM TMP_WEEK_SHOP tws JOIN
CX_DATES dat
on dat.WEEK_ID = tws.WEEK_ID LEFT OUTER JOIN
CX_SALES_ITEMS sal
on sal.DATE_ID = dat.DATE_ID and
sal.SHOP_NUM = tws.SHOP_ID JOIN
CX_STYLES sty
on sal.STY_QUAL = sty.STY_QUAL
WHERE sty.STY_RET_TYPE='BIKES' and
(sal.SHOP_NUM in ('70006', '70008', '70010', '70018', '70028', '70029', '70012', '70016', '70026')
GROUP BY tws.WEEK_ID, tws.SHOP_ID
ORDER BY tws.WEEK_ID, tws.SHOP_ID;
You have three problems that are "undoing" the left outer join. The inner join condition will fail when sal.STY_QUAL is NULL. SImilarly, the where conditions have the same problem.
You need for all the joins to be left outer joins and to move the where conditions to on clauses:
SELECT tws.WEEK_ID as Week, isnull(tws.SHOP_ID,0)as Shop,
isnull(count(sal.SALES_ITEMS_ID),0)Sales, isnull(sum(sal.LINEVALUE),0)Sales_Value
FROM TMP_WEEK_SHOP tws JOIN
CX_DATES dat
on dat.WEEK_ID = tws.WEEK_ID LEFT OUTER JOIN
CX_SALES_ITEMS sal
on sal.DATE_ID = dat.DATE_ID and
sal.SHOP_NUM = tws.SHOP_ID and
sal.SHOP_NUM in ('70006', '70008', '70010', '70018', '70028', '70029', '70012', '70016', '70026'
) LEFT OUTER JOIN
CX_STYLES sty
on sal.STY_QUAL = sty.STY_QUAL and
sty.STY_RET_TYPE='BIKES'
GROUP BY tws.WEEK_ID, tws.SHOP_ID
ORDER BY tws.WEEK_ID, tws.SHOP_ID;
In addition, count() never returns a NULL value, so using isnull() or coalesce() is unnecessary.

your other join onvolving CX_SALES_ITEMS sal needs to be an left outer join as well

How to optimize the query? t-sql

This query works about 3 minutes and returns 7279 rows:
SELECT identity(int,1,1) as id, c.client_code, a.account_num,
c.client_short_name, u.uso, us.fio, null as new, null as txt
INTO #ttable
FROM accounts a INNER JOIN Clients c ON
c.id = a.client_id INNER JOIN Uso u ON c.uso_id = u.uso_id INNER JOIN
Magazin m ON a.account_id = m.account_id LEFT JOIN Users us ON
m.user_id = us.user_id
WHERE m.status_id IN ('1','5','9') AND m.account_new_num is null
AND u.branch_id = #branch_id
ORDER BY c.client_code;
The type of 'client_code' field is VARCHAR(6).
Is it possible to somehow optimize this query?

Insert the records in the Temporary table without using Order by Clause and then Sort them using the c.client_code. Hope it should help you.
Create table #temp
(
your columns...
)
and Insert the records in this table Without Using the Order by Clause. Now run the select with Order by Clause

Do you have indexes set up for your tables? An index on foreign key columns as well as Magazin.status might help.

Make sure there is an index on every field used in the JOINs and in the WHERE clause
If one or the tables you select from are actually views, the problem may be in the performance of these views.

Always try to list tables earlier if they are referenced in the where clause - it cuts off row combinations as early as possible. In this case, the Magazin table has some predicates in the where clause, but is listed way down in the tables list. This means that all the other joins have to be made before the Magazin rows can be filtered - possibly millions of extra rows.
Try this (and let us know how it went):
SELECT ...
INTO #ttable
FROM accounts a
INNER JOIN Magazin m ON a.account_id = m.account_id
INNER JOIN Clients c ON c.id = a.client_id
INNER JOIN Uso u ON c.uso_id = u.uso_id
LEFT JOIN Users us ON m.user_id = us.user_id
WHERE m.status_id IN ('1','5','9')
AND m.account_new_num is null
AND u.branch_id = #branch_id
ORDER BY c.client_code;
This kind of optimization can greatly improve query performance.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Duplicate rows in SQL Server - sql

i do not know what is in each of the tables, but the easiest way i found to debug something like his is select [storeactivityid],count([storeactivityid]) as [count] from [<table>] <Start adding joins in one at a time> where [count] > 1 group by [storeactivityid]

Related

Sum fields of an Inner join

SQL - Can I use a CASE Statement to left join the same table based on different criteria?

Multiple joins on the same table, Results Not Returned if Join Field is NULL

Left Join not working as expected

How to optimize the query? t-sql

Categories

Resources