SQL Server 2012: locating duplicate column entries in a table

I am using SQL Server 2012 and trying to identify rows where a SourceDataID value is associated with two or more distinct PartyCode values, and I'm having difficulty.
SELECT PartyCode, SourceDataID, count (*) as CNT
FROM CustomerOrderLocation (nolock)
GROUP BY PartyCode, SourceDataID
HAVING (Count(PartyCode)>1)
ORDER BY PartyCode
Results are returning as such:
W3333 948_O 31
(party code/sourcedataid/CNT)
This shows me the total number of rows where the PartyCode and the SourceDataID are listed together in the table. However, I need it to show a count of any instances where W3333 lists 948_O as the SourceDataID more than once.
I'm not having luck structuring the query to pull the results I am looking to get. How can I do this?

A CTE combined with ROW_NUMBER() and a PARTITION BY clause is helpful for finding duplicates of this kind. Code below:
WITH CTE AS(
SELECT PartyCode, SourceDataID,
ROW_NUMBER()OVER(PARTITION BY SourceDataID ORDER BY SourceDataID) RN
FROM CustomerOrderLocation (NOLOCK))
SELECT * FROM CTE WHERE RN > 1
This should return every row after the first for each SourceDataID, that is, every additional PartyCode row attached to that SourceDataID.
If you want to see the entire result, change the last SELECT statement to:
SELECT * FROM CTE ORDER BY PartyCode, RN
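To see what the RN > 1 filter keeps, here is a minimal sketch that runs the same CTE shape against an in-memory SQLite database from Python. The sample rows are made up, SQLite stands in for SQL Server (so the NOLOCK hint is dropped), and the window is ordered by PartyCode rather than SourceDataID so that the numbering is deterministic; all of these are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample data; the real CustomerOrderLocation table is not shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerOrderLocation (PartyCode TEXT, SourceDataID TEXT)")
conn.executemany(
    "INSERT INTO CustomerOrderLocation VALUES (?, ?)",
    [("W3333", "948_O"), ("W3333", "948_O"), ("W4444", "948_O"), ("W5555", "777_X")],
)

# Number the rows inside each SourceDataID, then keep everything after the first row.
query = """
WITH CTE AS (
    SELECT PartyCode, SourceDataID,
           ROW_NUMBER() OVER (PARTITION BY SourceDataID ORDER BY PartyCode) AS RN
    FROM CustomerOrderLocation
)
SELECT PartyCode, SourceDataID, RN
FROM CTE
WHERE RN > 1
ORDER BY SourceDataID, RN
"""
extra_rows = conn.execute(query).fetchall()
print(extra_rows)
```

Note that the filter keeps a repeated (W3333, 948_O) row as well as the different PartyCode W4444, because the partition only looks at SourceDataID.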

Thanks for the help everyone. I did not do the best job of describing the issue but this is the query I ended up creating to get my result set.
;with cte1 (sourcedataid, partycode) as (
select sourcedataid, partycode
from customerorderparty (nolock)
group by PartyCode, SourceDataID
)
select count(sourcedataid), sourcedataid
from cte1
group by sourcedataid
having count(sourcedataid) > 1
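The same result can probably be obtained without a CTE by counting distinct PartyCode values per SourceDataID. The sketch below is only an illustration: it uses Python's sqlite3 module with invented rows in place of the real customerorderparty table, and it assumes PartyCode is never NULL (COUNT(DISTINCT ...) skips NULLs, while the GROUP BY version would count a NULL group).

```python
import sqlite3

# Hypothetical rows standing in for the customerorderparty table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customerorderparty (PartyCode TEXT, SourceDataID TEXT)")
conn.executemany(
    "INSERT INTO customerorderparty VALUES (?, ?)",
    [
        ("W3333", "948_O"),
        ("W3333", "948_O"),  # same pair repeated: should not count twice
        ("W4444", "948_O"),  # second distinct PartyCode for 948_O
        ("W5555", "777_X"),
        ("W5555", "777_X"),
    ],
)

# One row per SourceDataID that is attached to more than one distinct PartyCode.
query = """
SELECT SourceDataID, COUNT(DISTINCT PartyCode) AS PartyCodes
FROM customerorderparty
GROUP BY SourceDataID
HAVING COUNT(DISTINCT PartyCode) > 1
"""
shared_ids = conn.execute(query).fetchall()
print(shared_ids)
```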

Related

Is there any optimal way to find the count of rows

I wrote a SQL query with an inner query and an outer query; the outer query aggregates the result of the inner one. I now need the number of rows returned by the outer query. To get it, I wrapped the whole thing in another SELECT and used COUNT(*), which works, but I would like to know whether there is a cleaner way to calculate the row count. Please see my query below and suggest the best way to do this.
SELECT count(*) FROM (
SELECT
COUNT(*) NO_OF_EMP
,SUM(tbl.AMOUNT) TOTAL_AMOUNT
,tbl.YYYYMM
,tbl.DATA_PICKED_BY_NAME
,MIN(DATA_PICKED_DATE) DATA_PICKED_DATE
,ROW_NUMBER() OVER (ORDER BY tbl.REFERENCE_ID) AS ROW_NUM
FROM (
SELECT
SALARY_REPORT_ID
,EMP_NAME
,EMP_CODE
,PAY_CODE
,PAY_CODE_NAME
,AMOUNT
,PAY_MODE
,PAY_CODE_DESC
,YYYYMM
,REMARK
,EMP_ID
,PRAN_NUMBER
,PF_NUMBER
,PRAN_NO
,ATTOFF_EMPCODE
,DATA_PICKED_DATE
,DATA_PICKED_BY
,DATA_PICKED_BY_NAME
,SUBSTR(REFERENCE_ID,0,3) REFERENCE_ID
FROM SALARY_DETAIL_REPORT_HISTORY
WHERE PAY_CODE=999
AND REFERENCE_ID LIKE '202%'
) tbl
GROUP BY tbl.REFERENCE_ID,tbl.YYYYMM,tbl.DATA_PICKED_BY_NAME
order by tbl.YYYYMM
)mytbl1
Since the outer query returns one row per group, you can instead count the distinct combinations of the grouping columns directly, here by concatenating them into a single value:
SELECT count(distinct SUBSTR(REFERENCE_ID,0,3) || YYYYMM || DATA_PICKED_BY_NAME)
FROM SALARY_DETAIL_REPORT_HISTORY
WHERE PAY_CODE=999
AND REFERENCE_ID LIKE '202%'
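As a rough check of the idea that the number of groups equals the number of distinct grouping-key combinations, the sketch below compares both counts on a tiny invented table using Python's sqlite3 module. It counts distinct tuples through SELECT DISTINCT instead of concatenating the columns, which sidesteps the case where two different combinations concatenate to the same string. SQLite's SUBSTR is 1-based, so SUBSTR(REFERENCE_ID, 1, 3) is used here as an assumed equivalent of the original SUBSTR(REFERENCE_ID, 0, 3).

```python
import sqlite3

# Hypothetical, heavily trimmed version of SALARY_DETAIL_REPORT_HISTORY.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALARY_DETAIL_REPORT_HISTORY ("
    "REFERENCE_ID TEXT, YYYYMM TEXT, DATA_PICKED_BY_NAME TEXT, PAY_CODE INTEGER)"
)
conn.executemany(
    "INSERT INTO SALARY_DETAIL_REPORT_HISTORY VALUES (?, ?, ?, ?)",
    [
        ("2021A", "202101", "ANN", 999),
        ("2021B", "202101", "ANN", 999),
        ("2022A", "202102", "BOB", 999),
        ("2022B", "202102", "ANN", 999),
        ("2023A", "202101", "ANN", 100),  # filtered out by PAY_CODE
    ],
)

# Original approach: wrap the GROUP BY query and count its rows.
wrapped = conn.execute("""
SELECT COUNT(*) FROM (
    SELECT COUNT(*) AS NO_OF_EMP
    FROM SALARY_DETAIL_REPORT_HISTORY
    WHERE PAY_CODE = 999 AND REFERENCE_ID LIKE '202%'
    GROUP BY SUBSTR(REFERENCE_ID, 1, 3), YYYYMM, DATA_PICKED_BY_NAME
)
""").fetchone()[0]

# Alternative: count the distinct grouping-key tuples directly.
distinct_keys = conn.execute("""
SELECT COUNT(*) FROM (
    SELECT DISTINCT SUBSTR(REFERENCE_ID, 1, 3), YYYYMM, DATA_PICKED_BY_NAME
    FROM SALARY_DETAIL_REPORT_HISTORY
    WHERE PAY_CODE = 999 AND REFERENCE_ID LIKE '202%'
)
""").fetchone()[0]

print(wrapped, distinct_keys)
```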

How can I resolve the distinct issue in SQL Server 2005?

I am trying to get distinct values from my query. I tried the query below, but I am not getting the expected result. Can anyone suggest how to resolve the issue?
I want the result to be distinct on part_id.
http://tinypic.com/view.php?pic=9scx21&s=8#.UupFqT2SzyQ
Thanks in advance.
Why do you think the result is not correct? The rows returned are distinct.
DISTINCT is applied to all the selected columns; there is no such thing as a DISTINCT(p.part_id) that ignores the other columns.
What you probably want is a single row for each part_id.
If you don't have any rule for which row should be returned, you can use ROW_NUMBER:
select *
from
(
select all your columns
, row_number() over (partition by p.part_id order by p.part_id) as rn
from ....
where ...
) as dt
where rn = 1
If there are rules that determine which row should be returned (oldest, newest, and so on), simply ORDER BY that column (DESC for the newest) instead of ORDER BY p.part_id.
order by part_id;
Use SELECT DISTINCT p.part_id FROM ... at the beginning and add GROUP BY p.part_id at the end.
DISTINCT applies to all the selected columns together, so you can add more columns, but remember to add them to the GROUP BY as well.
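The ROW_NUMBER approach from the first answer can be sketched as follows. The table name parts, its columns vendor and updated_on, and the sample rows are all hypothetical, and the query runs on SQLite through Python's sqlite3 module rather than on SQL Server 2005; the "newest row wins" rule is just one possible choice.

```python
import sqlite3

# Hypothetical table: several rows per part_id, only one wanted per part.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_id INTEGER, vendor TEXT, updated_on TEXT)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [
        (1, "acme", "2014-01-01"),
        (1, "bolt", "2014-01-20"),
        (2, "acme", "2014-01-05"),
    ],
)

# Number the rows within each part_id, newest first, and keep only row 1.
query = """
SELECT part_id, vendor, updated_on
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY p.part_id ORDER BY p.updated_on DESC) AS rn
    FROM parts p
) AS dt
WHERE rn = 1
ORDER BY part_id
"""
one_per_part = conn.execute(query).fetchall()
print(one_per_part)
```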

Over clause in SQL Server

I have the following query
select * from
(
SELECT distinct
rx.patid
,rx.fillDate
,rx.scriptEndDate
,MAX(datediff(day, rx.filldate, rx.scriptenddate)) AS longestScript
,rx.drugClass
,COUNT(rx.drugName) over(partition by rx.patid,rx.fillDate,rx.drugclass) as distinctFamilies
FROM [I 3 SCI control].dbo.rx
where rx.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
GROUP BY rx.patid, rx.fillDate, rx.scriptEndDate,rx.drugName,rx.drugClass
) r
order by distinctFamilies desc
which produces results that look like
This should mean that, for the patID and dates shown in each row, there are five unique drug names. However, when I run the following query:
select distinct *
from rx
where patid = 1358801781 and fillDate between '2008-10-17' and '2008-11-16' and drugClass='H4B'
I have a result set returned that looks like
You can see that while there are in fact five rows returned for the second query between the dates of 2008-10-17 and 2009-01-15, there are only three unique names. I've tried various ways of modifying the over clause, all with different levels of non-success. How can I alter my query so that I only find unique drugNames within the timeframe specified for each row?
Taking a shot at it:
SELECT DISTINCT
patid,
fillDate,
scriptEndDate,
MAX(DATEDIFF(day, fillDate, scriptEndDate)) AS longestScript,
drugClass,
MAX(rn) OVER(PARTITION BY patid, fillDate, drugClass) as distinctFamilies
FROM (
SELECT patid, fillDate, scriptEndDate, drugClass,rx.drugName,
DENSE_RANK() OVER(PARTITION BY patid, fillDate, drugClass ORDER BY drugName) as rn
FROM [I 3 SCI control].dbo.rx
WHERE drugClass IN ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
)x
GROUP BY x.patid, x.fillDate, x.scriptEndDate,x.drugName,x.drugClass,x.rn
ORDER BY distinctFamilies DESC
Not sure if DISTINCT is really necessary - left it in since you've used it.
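The key idea in this answer is that the highest DENSE_RANK inside a partition equals the number of distinct values in that partition, which works around windowed COUNT not accepting DISTINCT. A minimal sketch of just that idea, using Python's sqlite3 module and an invented, trimmed-down rx table (the drug names and rows are assumptions):

```python
import sqlite3

# Hypothetical slice of the rx table: five fills, but only three distinct drug names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rx (patid INTEGER, fillDate TEXT, drugClass TEXT, drugName TEXT)")
conn.executemany(
    "INSERT INTO rx VALUES (?, ?, ?, ?)",
    [
        (1, "2008-10-17", "H4B", "drugA"),
        (1, "2008-10-17", "H4B", "drugA"),
        (1, "2008-10-17", "H4B", "drugB"),
        (1, "2008-10-17", "H4B", "drugC"),
        (1, "2008-10-17", "H4B", "drugB"),
        (2, "2008-10-17", "H4B", "drugA"),
    ],
)

# Plain windowed COUNT counts rows, not distinct names.
row_counts = conn.execute("""
SELECT DISTINCT patid,
       COUNT(drugName) OVER (PARTITION BY patid, fillDate, drugClass) AS n
FROM rx
ORDER BY patid
""").fetchall()

# DENSE_RANK gives equal names the same rank, so MAX(rank) is the distinct count.
distinct_counts = conn.execute("""
SELECT DISTINCT patid, fillDate, drugClass,
       MAX(rn) OVER (PARTITION BY patid, fillDate, drugClass) AS distinctFamilies
FROM (
    SELECT patid, fillDate, drugClass,
           DENSE_RANK() OVER (PARTITION BY patid, fillDate, drugClass
                              ORDER BY drugName) AS rn
    FROM rx
) x
ORDER BY patid
""").fetchall()

print(row_counts, distinct_counts)
```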

Sort by count SQL reporting services

I have a simple query in a tablix control that gets all the leads in one month. I then use the tablix control to group them by lead source, and I have an associated count column. I want to sort my report by the count, descending, without doing it in the query. I keep getting an error saying you cannot sort on an aggregate.
Thanks.
Another option is to wrap your query in a subquery and put the ORDER BY clause in the outer query.
Suppose your GROUP BY query looks like this:
select lead_source, count(*) cnt
from your_table
group by lead_source
Then you can write:
select lead_source, cnt from (
select lead_source, count(*) cnt
from your_table
group by lead_source
) t
order by cnt desc
Replace your_table and the GROUP BY column list to match your table structure.
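A small runnable sketch of the subquery approach, using Python's sqlite3 module. The leads table and its rows are made up for illustration, and the derived table is given an alias (t), which SQL Server requires.

```python
import sqlite3

# Hypothetical leads table with one row per lead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (lead_source TEXT)")
conn.executemany(
    "INSERT INTO leads VALUES (?)",
    [("web",), ("phone",), ("web",), ("email",), ("web",), ("email",)],
)

# Aggregate in the inner query, sort on the aggregate in the outer query.
query = """
SELECT lead_source, cnt
FROM (
    SELECT lead_source, COUNT(*) AS cnt
    FROM leads
    GROUP BY lead_source
) AS t
ORDER BY cnt DESC
"""
sorted_counts = conn.execute(query).fetchall()
print(sorted_counts)
```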

use Row_number after applying distinct

I am creating a stored procedure that returns a result set with DISTINCT applied. Now I want to implement server-side paging, so I tried using ROW_NUMBER on the distinct result like this:
WITH CTE AS
(
SELECT ROW_NUMBER() OVER(ORDER BY tblA.TeamName DESC)
as Row,tblA.TeamId,tblA.TeamName,tblA.CompId,tblA.CompName,tblA.Title,tblA.Thumbnail,tblA.Rank,tblA.CountryId,tblA.CountryName
FROM
(
--The table query starts with SELECT
)tblA
)
SELECT CTE.* FROM CTE
WHERE CTE.Row BETWEEN @StartRowIndex AND @StartRowIndex+@NumRows-1
ORDER BY CTE.CountryName
However, the rows are assigned a row number first and DISTINCT is applied afterwards, which is why I am getting duplicate values. How can I get the distinct rows first and then assign row numbers to them?
Is there a solution for this? Am I missing something?
need answer ASAP.
thanks in advance!
Don't you need to add "partition by" to your ROW_NUMBER statement?
ROW_NUMBER() OVER(PARTITION BY ___, ___ ORDER BY tblA.TeamName DESC)
In the blank spaces, place the column names you would like to create a new row number for. Duplicates will receive a number that is NOT 1 so you might not need the distinct.
To gather the unique values you could write a subquery where the stored procedure only grabs the rows with a 1 in them.
select * from
(
your code
) as t where row = 1
Hope that helps.
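An alternative that matches what the question asks for ("distinct first, then row numbers") is to apply DISTINCT in a derived table and number the already de-duplicated rows outside it. The sketch below is an assumption-laden illustration: it uses Python's sqlite3 module, a hypothetical teams table with only two of the columns, the column name RowNum instead of Row, and ? placeholders in place of the stored procedure's parameters.

```python
import sqlite3

# Hypothetical source data containing duplicate rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (TeamId INTEGER, TeamName TEXT)")
conn.executemany(
    "INSERT INTO teams VALUES (?, ?)",
    [(1, "Alpha"), (1, "Alpha"), (2, "Bravo"), (3, "Charlie"), (3, "Charlie"), (4, "Delta")],
)

# De-duplicate in the inner query, number the distinct rows, then page on the numbers.
query = """
WITH CTE AS (
    SELECT ROW_NUMBER() OVER (ORDER BY d.TeamName DESC) AS RowNum,
           d.TeamId, d.TeamName
    FROM (SELECT DISTINCT TeamId, TeamName FROM teams) AS d
)
SELECT RowNum, TeamId, TeamName
FROM CTE
WHERE RowNum BETWEEN ? AND ?
ORDER BY RowNum
"""
start_row_index, num_rows = 2, 2  # stand-ins for the paging parameters
page = conn.execute(query, (start_row_index, start_row_index + num_rows - 1)).fetchall()
print(page)
```

Because the numbering happens after DISTINCT, each page contains each team at most once.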
I'm not sure why you're doing this:
WHERE CTE.Row BETWEEN @StartRowIndex AND @StartRowIndex+@NumRows-1