Removing SQL Duplicated Fields based on HistoryID - sql

I have a database with duplicate values. Specifically, SiteCode, LastName, FirstName, DateofService, Payer, BilledAmount, NetReceivable, and ContractualDiscount together form records that recur repeatedly throughout this table.
I'd like to remove all but one instance of each such record, and I'm trying to do so by picking just one NetBilledHistoryID (which is unique for each record in the table).
Unfortunately, when I run this query, I still get the duplicated values.
How can I correct this so that my SELECT query eliminates these duplicates? Or, even better, should I be using a different query technique altogether?
SELECT *
FROM [Reports].[dbo].[NetBilledHistory] t1
WHERE EXISTS (
SELECT 1 FROM [Reports].[dbo].[NetBilledHistory] AS t2
WHERE t2.SiteCode = t1.SiteCode
AND t2.LastName = t1.LastName
AND t2.FirstName = t1.FirstName
AND t2.DateofService = t1.DateofService
AND t2.Payer = t1.Payer
AND t2.BilledAmount = t1.BilledAmount
AND t2.NetReceivable = t1.NetReceivable
AND t2.ContractualDiscount = t1.ContractualDiscount
AND t2.NetBilledHistoryID < t1.NetBilledHistoryID)

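You can achieve that by using a CTE with ROW_NUMBER(), which numbers the rows within each group of duplicates so that you can keep only row 1: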
WITH cte_query AS (
SELECT ROW_NUMBER() OVER(
partition by SiteCode, LastName, FirstName,
DateofService, Payer, BilledAmount, NetReceivable,
ContractualDiscount
order by NetBilledHistoryID) AS RowNum, *
FROM [Reports].[dbo].[NetBilledHistory]
)
SELECT * FROM cte_query where RowNum = 1;
Adjust the columns and the ORDER BY as needed to fit your purpose.
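The sketch below is a minimal way to check both approaches. It uses Python's built-in sqlite3 module as a stand-in for SQL Server and assumes the bundled SQLite is 3.25 or newer, which is needed for window functions. The NetBilledHistory table is a hypothetical, cut-down version with four of the key columns. The sketch shows why the query in the question keeps returning duplicates: EXISTS with t2.NetBilledHistoryID < t1.NetBilledHistoryID selects exactly the later copies. NOT EXISTS keeps the first copy of each record, which matches the ROW_NUMBER() CTE.

```python
import sqlite3

# Hypothetical sample data: a cut-down NetBilledHistory with four key columns.
# IDs 1-3 are the same record repeated; ID 4 is unique.
ROWS = [
    (1, "S1", "Smith", "2020-01-01", 100.0),
    (2, "S1", "Smith", "2020-01-01", 100.0),
    (3, "S1", "Smith", "2020-01-01", 100.0),
    (4, "S2", "Jones", "2020-02-01", 50.0),
]

# Correlated subquery from the question: "a duplicate with a smaller ID exists".
DUPLICATE_WITH_SMALLER_ID = """
    SELECT 1 FROM NetBilledHistory AS t2
    WHERE t2.SiteCode = t1.SiteCode
      AND t2.LastName = t1.LastName
      AND t2.DateofService = t1.DateofService
      AND t2.BilledAmount = t1.BilledAmount
      AND t2.NetBilledHistoryID < t1.NetBilledHistoryID
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE NetBilledHistory (NetBilledHistoryID INTEGER PRIMARY KEY, "
        "SiteCode TEXT, LastName TEXT, DateofService TEXT, BilledAmount REAL)"
    )
    conn.executemany("INSERT INTO NetBilledHistory VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def ids_with_exists(conn, negate):
    # EXISTS returns the later copies; NOT EXISTS returns the first copy of each record.
    keyword = "NOT EXISTS" if negate else "EXISTS"
    sql = (
        "SELECT t1.NetBilledHistoryID FROM NetBilledHistory AS t1 "
        f"WHERE {keyword} ({DUPLICATE_WITH_SMALLER_ID}) ORDER BY 1"
    )
    return [row[0] for row in conn.execute(sql)]


def ids_with_row_number(conn):
    # The CTE from the answer: number each duplicate group by ID and keep row 1.
    sql = """
        WITH cte_query AS (
            SELECT ROW_NUMBER() OVER (
                       PARTITION BY SiteCode, LastName, DateofService, BilledAmount
                       ORDER BY NetBilledHistoryID) AS RowNum, *
            FROM NetBilledHistory
        )
        SELECT NetBilledHistoryID FROM cte_query WHERE RowNum = 1 ORDER BY 1
    """
    return [row[0] for row in conn.execute(sql)]


conn = make_db()
print(ids_with_exists(conn, negate=False))  # [2, 3]
print(ids_with_exists(conn, negate=True))   # [1, 4]
print(ids_with_row_number(conn))            # [1, 4]
```

One difference is worth knowing. The = comparisons in the EXISTS version never match NULLs, so rows with a NULL in any key column are all treated as unique. PARTITION BY, by contrast, groups NULLs together.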

Related

window functions and grouping - how to propagate data from row 1 through row 3 in a partition?

I have this query below:
select
Contact.IndividualID,
Contact.IndividualID as ContactId,
Contact.CaseNumber as CaseID,
[Case].ProgramCode as Benefit,
Contact.Email as EmailAddress,
'' as EmailTo,
Contact.FirstName,
[Case].CaseProgramIndividualStatusCode,
[Case].ReviewDueDate as RenewDueDate,
[Case].ReviewDueDate as BenefitExpirationDate,
[Case].ProgramCode as ProgramCode,
pref.Phone as MobileNumber,
Contact.IsHeadOfHousehold,
row_number() over (partition by Contact.CaseNumber order by Contact.IsHeadOfHousehold desc) as row
from
SOMETABLE_Contact_Dev Contact
inner join
SOMETABLE_Case_Dev [Case] on Contact.IndividualID = [Case].IndividualID and Contact.CaseNumber = [Case].CaseNumber
left join
[SSP RE Preferences] pref on Contact.IndividualID = pref.ContactId
where
(([Case].RenewalTypeCode = 'AC' and [Case].ReviewStatusCode in ('RI','RR')) or
([Case].RenewalTypeCode = 'PS' and [Case].ReviewStatusCode = 'RI')) and
DateDiff(day, getdate(), [Case].ReviewDueDate) = 40 and
Contact.Email is not null and
[Case].ProgramCode in ('KC','KT','CC','MA')
And here's the result set from running this query:
Here's what I'm having trouble with: when there's a grouping as defined by the partition, I want the EmailTo field to contain the email address of the top record of the group (row 1):
When there's no grouping, just use the Email Address for the EmailTo field. What's the best way to go about this?
You could use first_value() over the relevant partition.
For example, assuming you mean the same partition as the one used for row_number, add the following line after your row_number line (first add a comma at the end of the row_number line):
FIRST_VALUE(Contact.Email) OVER (partition by Contact.CaseNumber order by Contact.IsHeadOfHousehold desc) AS first_email
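Here is a small sketch of what FIRST_VALUE() does with such a partition. It uses Python's sqlite3 module in place of SQL Server (SQLite 3.25+ assumed) and a hypothetical three-row Contact table. A contact that is alone in its case simply gets its own email, which covers the "no grouping" situation. IndividualID is an added tie-breaker (an assumption, not part of the question) so that the result is deterministic when two members have the same IsHeadOfHousehold value. In your query, the FIRST_VALUE expression would replace '' as EmailTo.

```python
import sqlite3

# Hypothetical contacts: case C1 has two members, case C2 has one.
CONTACTS = [
    (1, "C1", "head@example.com", 1),
    (2, "C1", "child@example.com", 0),
    (3, "C2", "solo@example.com", 0),
]


def email_to_per_contact(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Contact (IndividualID INTEGER, CaseNumber TEXT, "
        "Email TEXT, IsHeadOfHousehold INTEGER)"
    )
    conn.executemany("INSERT INTO Contact VALUES (?, ?, ?, ?)", rows)
    # FIRST_VALUE copies row 1's Email to every row in the same case;
    # IndividualID is an added tie-breaker so the result is deterministic.
    sql = """
        SELECT IndividualID,
               Email,
               FIRST_VALUE(Email) OVER (
                   PARTITION BY CaseNumber
                   ORDER BY IsHeadOfHousehold DESC, IndividualID) AS EmailTo
        FROM Contact
        ORDER BY IndividualID
    """
    return conn.execute(sql).fetchall()


for row in email_to_per_contact(CONTACTS):
    print(row)
```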

Access: query crashes

I have the following query (let's call it Query1) (kindly created here by Erik von Asmuth):
SELECT PARTNERID
,NAME
,FIRST_NAME
,UID
,DATA_R
FROM MY_TABLE
WHERE MY_TABLE.[DATA_R] = (
SELECT MAX(t.[DATA_R])
FROM MY_TABLE AS t
WHERE t.PARTNERID = MY_TABLE.PARTNERID
)
ORDER BY PARTNERID;
MY_TABLE has 20,000 records and is a query (even though the name might suggest otherwise) with the following form:
SELECT [MYTABLE_O].PARTNERID, [MYTABLE_O].NAME, [MYTABLE_O].FIRST_NAME, [MYTABLE_O].[Codice fiscale] AS CF, [MYTABLE_O].Date AS DATA_R
FROM [MYTABLE_O] LEFT JOIN [TO_EXCLUDE] ON [MYTABLE_O].[PARTNERID] = [TO_EXCLUDE].[PARTNERID]
WHERE ((([TO_EXCLUDE].PARTNERID) Is Null));
(I want to exclude elements that have already been considered, which are listed in table TO_EXCLUDE.)
When I run Query1, MS Access freezes. How can I avoid this, or make the query more efficient and stable?
I have tried indexing both PARTNERID and DATA_R in MYTABLE_O.
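You may have to write the result of the subquery to a temporary table first, for example with a make-table query:
SELECT PARTNERID, MAX([DATA_R]) AS MAXDATAR
INTO TempTable
FROM MY_TABLE
GROUP BY PARTNERID;
and then, in Query1, replace
SELECT MAX(t.[DATA_R])
FROM MY_TABLE AS t
with
SELECT t.MAXDATAR
FROM TempTable AS t
(the temporary table holds the column MAXDATAR, not DATA_R).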
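Access can't be run here, so the sketch below only illustrates the shape of this fix. Python's sqlite3 stands in for Access, and MY_TABLE is a hypothetical three-row table in place of the real query. The per-partner maximum is computed once into a temporary table: CREATE TEMP TABLE ... AS is SQLite syntax, and in Access the equivalent is the make-table query. The correlated subquery then reads that small table. The sketch checks the result only and says nothing about how Access itself will perform.

```python
import sqlite3

# Hypothetical rows standing in for the MY_TABLE query (PARTNERID, NAME, DATA_R).
ROWS = [
    (1, "Rossi", "2020-01-01"),
    (1, "Rossi", "2021-06-01"),
    (2, "Bianchi", "2019-03-01"),
]


def latest_per_partner(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MY_TABLE (PARTNERID INTEGER, NAME TEXT, DATA_R TEXT)")
    conn.executemany("INSERT INTO MY_TABLE VALUES (?, ?, ?)", rows)
    # Step 1: compute each partner's latest date once and store it.
    conn.execute(
        "CREATE TEMP TABLE TempTable AS "
        "SELECT PARTNERID, MAX(DATA_R) AS MAXDATAR FROM MY_TABLE GROUP BY PARTNERID"
    )
    # Step 2: the correlated subquery reads the pre-aggregated table
    # instead of aggregating MY_TABLE again for every outer row.
    sql = """
        SELECT PARTNERID, NAME, DATA_R
        FROM MY_TABLE
        WHERE DATA_R = (SELECT t.MAXDATAR FROM TempTable AS t
                        WHERE t.PARTNERID = MY_TABLE.PARTNERID)
        ORDER BY PARTNERID
    """
    return conn.execute(sql).fetchall()


print(latest_per_partner(ROWS))
```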

Remove duplicate row based on select statement

I have two SELECT statements that return duplicated data. What I'm trying to accomplish is to remove a duplicated leg, but I'm having a hard time getting to the second row programmatically.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes,i.ABID from inv_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Minutes = i2.Minutes
and i.Uid <> i2.Uid
and i.abid = i2.abid
order by i.EndDate
This select statement returns the following data.
As you can see, it returns duplicate rows where the Minutes and ABID are the same but the InvIDs are different. What I need to do is remove one of the InvIDs where the criteria match. It doesn't matter which one.
The second select statement is returning different data.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes from InvoiceLines_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Uid = i2.Uid
and i.Abid <> i2.Abid
and i.Language <> i2.Language
order by i.startdate desc
In this SELECT statement, I want to remove an InvID where the UID is the same, and then select the lowest Minutes. In this case, I would remove the following InvIDs: 2537676, 2537210.
My goal is to remove those rows...
I could accomplish this by using a cursor to grab the InvID and removing it with a simple DELETE statement, but I'm trying to stay away from cursors.
Any suggestions on how I can accomplish this?
You can use EXISTS to delete all duplicates except the one with the highest InvID, by deleting those rows for which another row exists with the same values but a higher InvID:
delete from inv_v
where exists (
select 1 from inv_v i2
where i2.InvID > inv_v.InvID
and i2.minutes = inv_v.minutes
and i2.EndDate = inv_v.EndDate
and i2.abid = inv_v.abid
and i2.uid <> inv_v.uid -- not sure why <> is used here, copied from question
)
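Here is a hedged sketch of that DELETE, run against SQLite through Python's sqlite3. A plain table named inv stands in for inv_v, because SQLite does not allow deleting from a view without an INSTEAD OF trigger. The rows are hypothetical: InvID 10 and 11 are the same leg under different UIDs, so only 11 should survive.

```python
import sqlite3

# Hypothetical invoice lines: 10 and 11 are the same leg under different UIDs.
LINES = [
    (10, "u1", "2014-08-05", 30, 7),
    (11, "u2", "2014-08-05", 30, 7),
    (12, "u3", "2014-08-06", 45, 7),
]


def delete_keep_highest(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inv (InvID INTEGER PRIMARY KEY, UID TEXT, "
        "EndDate TEXT, Minutes INTEGER, ABID INTEGER)"
    )
    conn.executemany("INSERT INTO inv VALUES (?, ?, ?, ?, ?)", rows)
    # Delete a row whenever a matching row with a higher InvID exists,
    # so only the highest InvID of each duplicate set survives.
    conn.execute("""
        DELETE FROM inv
        WHERE EXISTS (
            SELECT 1 FROM inv AS i2
            WHERE i2.InvID > inv.InvID
              AND i2.Minutes = inv.Minutes
              AND i2.EndDate = inv.EndDate
              AND i2.ABID = inv.ABID
              AND i2.UID <> inv.UID)
    """)
    return [row[0] for row in conn.execute("SELECT InvID FROM inv ORDER BY InvID")]


print(delete_keep_highest(LINES))  # [11, 12]
```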
I have faced similar problems with duplicate data. Someone told me to use PARTITION BY and other methods, but those caused performance issues.
However, I had a primary key in my table, which let me select one row from each set of duplicates and then delete it.
For example, in the first SELECT statement, "minutes" and "ABID" are the criteria for considering rows duplicates, but "InvID" can be used to distinguish between the duplicate rows.
So you can use the query below to remove the duplicates.
delete from inv_i where inv_id in (select max(inv_id) from inv_i group by minutes,abid having count(*) > 1 );
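This simple concept was helpful to me. It can be helpful in your case if "Inv_id" is unique. Note that each run deletes only the highest inv_id of each duplicate group, so if a group can contain more than two rows, repeat the statement until it deletes nothing.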
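The sketch below shows the "one row per group per run" behaviour in SQLite via Python's sqlite3, with a hypothetical inv_i table in which three rows share the same (minutes, abid). A single pass removes only inv_id 3. Repeating the statement until it affects zero rows leaves one row per group.

```python
import sqlite3

# Hypothetical rows: inv_id 1, 2 and 3 share the same (minutes, abid).
ROWS = [(1, 30, 7), (2, 30, 7), (3, 30, 7), (4, 45, 8)]

DELETE_MAX = """
    DELETE FROM inv_i
    WHERE inv_id IN (SELECT MAX(inv_id) FROM inv_i
                     GROUP BY minutes, abid HAVING COUNT(*) > 1)
"""


def dedupe(rows, repeat):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inv_i (inv_id INTEGER PRIMARY KEY, minutes INTEGER, abid INTEGER)")
    conn.executemany("INSERT INTO inv_i VALUES (?, ?, ?)", rows)
    while True:
        # Each pass removes only the highest inv_id of every duplicate group.
        deleted = conn.execute(DELETE_MAX).rowcount
        if not repeat or deleted == 0:
            break
    return [row[0] for row in conn.execute("SELECT inv_id FROM inv_i ORDER BY inv_id")]


print(dedupe(ROWS, repeat=False))  # [1, 2, 4]
print(dedupe(ROWS, repeat=True))   # [1, 4]
```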
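To keep, for each UID, only the row with the lowest Minutes, number the rows within each UID and keep row 1. (Partitioning by InvID as well would put every row in its own partition, since InvID is unique, so every row would get rn = 1.)
;WITH CTE AS
(
SELECT InvID
,[UID]
,StartDate
,EndDate
,[Minutes]
,ROW_NUMBER() OVER (PARTITION BY [UID] ORDER BY [Minutes] ASC) rn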
FROM InvoiceLines_v
)
SELECT *
FROM CTE
WHERE rn = 1
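Here is a small check of how the choice of partition affects this ROW_NUMBER() filter. It uses SQLite via Python's sqlite3 (3.25+ assumed) and a hypothetical InvoiceLines table. Partitioning by UID alone keeps the lowest-Minutes row per UID. Partitioning by InvID and UID together keeps every row, because InvID is unique.

```python
import sqlite3

# Hypothetical rows: UID "a" has two lines; the one with fewer Minutes should be kept.
ROWS = [(100, "a", 20), (101, "a", 15), (102, "b", 40)]


def first_per_partition(rows, partition_by):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE InvoiceLines (InvID INTEGER PRIMARY KEY, UID TEXT, Minutes INTEGER)")
    conn.executemany("INSERT INTO InvoiceLines VALUES (?, ?, ?)", rows)
    # partition_by is a trusted, hard-coded column list in this sketch.
    sql = f"""
        WITH CTE AS (
            SELECT InvID, UID, Minutes,
                   ROW_NUMBER() OVER (PARTITION BY {partition_by} ORDER BY Minutes ASC) AS rn
            FROM InvoiceLines
        )
        SELECT InvID FROM CTE WHERE rn = 1 ORDER BY InvID
    """
    return [row[0] for row in conn.execute(sql)]


print(first_per_partition(ROWS, "UID"))         # [101, 102]
print(first_per_partition(ROWS, "InvID, UID"))  # [100, 101, 102]
```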
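Replace <ORIGINAL_TABLE> with your table name. QUERY 1 keeps one row per (minutes, ABID) combination; QUERY 2 keeps, for each UID, the row with the lowest minutes.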
QUERY 1:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY minutes, ABID ORDER BY minutes, ABID) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
QUERY 2:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY UID ORDER BY minutes) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
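Deleting through a CTE, as QUERY 1 and QUERY 2 do, is SQL Server syntax. Here is a hedged sketch of the same idea in SQLite via Python's sqlite3 (3.25+ assumed). SQLite cannot delete through a CTE, so the CTE only identifies the extra rows by rowid, and the DELETE targets the table itself. The table and data are hypothetical. ORDER BY InvID is an added tie-breaker so that the lowest InvID in each group is the one kept.

```python
import sqlite3

# Hypothetical rows: InvID 1, 2 and 3 share the same (Minutes, ABID).
ROWS = [(1, 30, 7), (2, 30, 7), (3, 30, 7), (4, 45, 8)]


def delete_duplicates(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE original_table (InvID INTEGER, Minutes INTEGER, ABID INTEGER)")
    conn.executemany("INSERT INTO original_table VALUES (?, ?, ?)", rows)
    # The CTE finds rows numbered above 1 in their group (by rowid);
    # the DELETE then removes those rows from the real table.
    conn.execute("""
        WITH DUP_TABLE AS (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY Minutes, ABID ORDER BY InvID) AS ROW_NO
            FROM original_table
        )
        DELETE FROM original_table
        WHERE rowid IN (SELECT rid FROM DUP_TABLE WHERE ROW_NO > 1)
    """)
    return [row[0] for row in conn.execute("SELECT InvID FROM original_table ORDER BY InvID")]


print(delete_duplicates(ROWS))  # [1, 4]
```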

distinct value per column

I am looking at a report on policy exceptions based on various criteria such as Beacon Score, Debt to Income, and Loan to Value. This information is kept in multiple different tables, and right now the Loan to Value column is causing multiple entries in my report because a specific loan might have multiple pieces of collateral. For proper exception monitoring, I only need one entry.
With all that said, how might I execute the following code so that it returns only one row per dbo.Folders.Id? Just putting DISTINCT after SELECT does not seem to work. (Sensitive values are masked with '#'.)
SELECT dbo.Folders.LoanOfficerId,
dbo.Folders.Id,
dbo.CollateralType.Description,
dbo.Customers.CUSTNAME,
dbo.Folders.DateLoanActivated,
dbo.Folders.CurrentAccountBalance,
dbo.Folders.UnadvancedCommitAmount,
dbo.Folders.BeaconScore,
dbo.Folders.DebtToIncome,
dbo.Collateral.LoanToValue
FROM dbo.Folders
INNER JOIN dbo.Customers
ON dbo.Folders.CustomersNAMEKEY = dbo.Customers.NAMEKEY
INNER JOIN dbo.Collateral
ON dbo.Folders.Id = dbo.Collateral.FoldersID
INNER JOIN dbo.CollateralType
ON dbo.Collateral.CollateralTypeCollCode = dbo.CollateralType.CollCode
WHERE ( (dbo.Folders.BeaconScore < ###)
AND (dbo.Folders.BeaconScore > ###)
AND (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CollateralCode <> ##)
)
OR ( (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CustomerType <> '###')
AND (dbo.Folders.CustomerType <> '###')
AND (dbo.Folders.DebtToIncome > ##)
)
OR ( (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CustomerType = '###')
AND (dbo.Folders.DebtToIncome > ##)
)
OR ( (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CustomerType = '###')
AND (dbo.Folders.DebtToIncome > ##)
)
OR (dbo.Collateral.LoanToValue > dbo.CollateralType.LTV)
Any constructive criticism of my code is welcome. (Static values in the above statement are on the docket to be replaced later with a thresholds/criteria table.) From what I have seen, others have suggested using ROW_NUMBER() with PARTITION BY, but I am unable to make the syntax work.
Comment about formatting: learn to use table aliases. They make the query easier to read and write.
If you only need one row per folder from the results, you can use row_number(). This enumerates the rows for each folder, and you just keep the first one. You can do this using:
with t as (
<your query here>
)
select t.*
from (select t.*,
row_number() over (partition by Id order by (select NULL)) as seqnum -- Id is dbo.Folders.Id from your select list
from t
) t
where seqnum = 1;
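Here is a minimal sketch of the same pattern in SQLite via Python's sqlite3 (3.25+ assumed). A hypothetical report table stands in for the output of the joined query, with folder 1 having two pieces of collateral. Instead of order by (select NULL), which keeps an arbitrary row, this sketch orders by LoanToValue DESC so that the highest-LTV collateral is kept. That ordering is an assumption about what matters for exception monitoring.

```python
import sqlite3

# Hypothetical output of the report query: folder 1 has two pieces of collateral.
REPORT_ROWS = [(1, "Auto", 0.80), (1, "Home", 0.95), (2, "Boat", 0.70)]


def one_row_per_folder(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE report (Id INTEGER, Description TEXT, LoanToValue REAL)")
    conn.executemany("INSERT INTO report VALUES (?, ?, ?)", rows)
    # Same shape as the answer; ORDER BY LoanToValue DESC is an assumption that
    # keeps the highest-LTV collateral instead of an arbitrary row.
    sql = """
        WITH t AS (SELECT Id, Description, LoanToValue FROM report)
        SELECT Id, Description, LoanToValue
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY Id ORDER BY LoanToValue DESC) AS seqnum
              FROM t) AS numbered
        WHERE seqnum = 1
        ORDER BY Id
    """
    return conn.execute(sql).fetchall()


print(one_row_per_folder(REPORT_ROWS))  # [(1, 'Home', 0.95), (2, 'Boat', 0.7)]
```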
On the other hand, if you needed to aggregate information from the collateral tables, then you would use group by in your query with the appropriate aggregation functions.

Getting row number for query

I have a query which will return one row. Is there any way I can find the row index of the row I'm querying when the table is sorted?
I've tried rowid but got #582 when I was expecting row #7.
Eg:
CategoryID Name
I9GDS720K4 CatA
LPQTOR25XR CatB
EOQ215FT5_ CatC
K2OCS31WTM CatD
JV5FIYY4XC CatE
--> C_L7761O2U CatF <-- I want this row (#5)
OU3XC6T19K CatG
L9YKCYAYMG CatH
XKWMQ7HREG CatI
I've tried rowid with unexpected results:
SELECT rowid FROM Categories WHERE CategoryID = 'C_L7761O2U' ORDER BY Name
EDIT: I've also tried J Cooper's suggestion (below), but the row numbers just aren't right.
using (var cmd = conn.CreateCommand()) {
    cmd.CommandText = @"SELECT (SELECT COUNT(*) FROM Recipes AS t2 WHERE t2.RecipeID <= t1.RecipeID) AS row_Num
FROM Recipes AS t1
WHERE RecipeID = @recipeId
ORDER BY Name";
    cmd.Parameters.AddWithValue("@recipeId", id);
    idx = Convert.ToInt32(cmd.ExecuteScalar());
}
Here is a way to get the row number in SQLite:
SELECT CategoryID,
Name,
(SELECT COUNT(*)
FROM mytable AS t2
WHERE t2.Name <= t1.Name) AS row_Num
FROM mytable AS t1
ORDER BY Name, CategoryID;
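Here is the same counting idea applied to the question's sample rows, filtered down to the single row of interest. It uses Python's sqlite3, and the table is built from the listing in the question. The count gives a 1-based position, so CatF comes out as 6; subtract 1 if you want the zero-based index the question marks as #5.

```python
import sqlite3

# The sample rows from the question.
CATEGORIES = [
    ("I9GDS720K4", "CatA"), ("LPQTOR25XR", "CatB"), ("EOQ215FT5_", "CatC"),
    ("K2OCS31WTM", "CatD"), ("JV5FIYY4XC", "CatE"), ("C_L7761O2U", "CatF"),
    ("OU3XC6T19K", "CatG"), ("L9YKCYAYMG", "CatH"), ("XKWMQ7HREG", "CatI"),
]


def position_by_name(category_id):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Categories (CategoryID TEXT PRIMARY KEY, Name TEXT)")
    conn.executemany("INSERT INTO Categories VALUES (?, ?)", CATEGORIES)
    # Count how many rows sort at or before this one by Name: a 1-based position.
    row = conn.execute(
        """
        SELECT (SELECT COUNT(*) FROM Categories AS t2 WHERE t2.Name <= t1.Name)
        FROM Categories AS t1
        WHERE t1.CategoryID = ?
        """,
        (category_id,),
    ).fetchone()
    return row[0]


print(position_by_name("C_L7761O2U"))  # 6 (subtract 1 for a zero-based index)
```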
Here's a funny trick you can use in Spatialite to get the order of values. If you use the count() function with a WHERE clause limiting to only values >= the current value, then the count will actually give the order. So if I have a point layer called "mypoints" with columns "value" and "val_order" then:
SELECT value, (
SELECT count(*) FROM mypoints AS my
WHERE my.value>=mypoints.value) AS val_order
FROM mypoints
ORDER BY value DESC;
Gives the descending order of the values.
I can update the "val_order" column this way:
UPDATE mypoints SET val_order = (
SELECT count(*) FROM mypoints AS my
WHERE my.value>=mypoints.value
);
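Spatialite is an SQLite extension, so the same UPDATE can be tried in plain SQLite. The sketch below uses Python's sqlite3 with a hypothetical mypoints table that has only the two columns involved. It also shows how ties behave: equal values share the larger order number.

```python
import sqlite3


def rank_descending(values):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mypoints (value REAL, val_order INTEGER)")
    conn.executemany("INSERT INTO mypoints (value) VALUES (?)", [(v,) for v in values])
    # Each row's order is the number of rows whose value is >= its own.
    conn.execute(
        """
        UPDATE mypoints SET val_order = (
            SELECT count(*) FROM mypoints AS my
            WHERE my.value >= mypoints.value)
        """
    )
    return conn.execute("SELECT value, val_order FROM mypoints ORDER BY value DESC").fetchall()


print(rank_descending([3.0, 10.0, 7.0]))  # [(10.0, 1), (7.0, 2), (3.0, 3)]
print(rank_descending([5.0, 5.0, 1.0]))   # [(5.0, 2), (5.0, 2), (1.0, 3)]
```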
What you are asking can be explained in two different ways, but I'm assuming you want to sort the resulting table and then number those rows according to the sort.
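DECLARE @resultrow int;
-- Number all rows by Name first, then filter; filtering first would always give 1.
SELECT @resultrow = RowNumber
FROM (SELECT CategoryID,
             ROW_NUMBER() OVER (ORDER BY Name ASC) AS RowNumber
      FROM Categories) AS numbered
WHERE CategoryID = 'C_L7761O2U';
SELECT @resultrow;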
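Why the filter has to come after the numbering can be checked in SQLite. The sketch below uses Python's sqlite3 (3.25+ assumed) and the sample rows from the question. Applying WHERE CategoryID = ... before ROW_NUMBER() leaves a single row, which is always numbered 1. Numbering every row in a derived table first and then filtering returns the row's actual position by Name.

```python
import sqlite3

# The sample rows from the question.
CATEGORIES = [
    ("I9GDS720K4", "CatA"), ("LPQTOR25XR", "CatB"), ("EOQ215FT5_", "CatC"),
    ("K2OCS31WTM", "CatD"), ("JV5FIYY4XC", "CatE"), ("C_L7761O2U", "CatF"),
    ("OU3XC6T19K", "CatG"), ("L9YKCYAYMG", "CatH"), ("XKWMQ7HREG", "CatI"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Categories (CategoryID TEXT PRIMARY KEY, Name TEXT)")
    conn.executemany("INSERT INTO Categories VALUES (?, ?)", CATEGORIES)
    return conn


def row_number_after_filter(conn, category_id):
    # Filtering first leaves a single row, so ROW_NUMBER() is always 1.
    sql = "SELECT ROW_NUMBER() OVER (ORDER BY Name) FROM Categories WHERE CategoryID = ?"
    return conn.execute(sql, (category_id,)).fetchone()[0]


def row_number_before_filter(conn, category_id):
    # Number every row in Name order first, then pick the one we want.
    sql = """
        SELECT RowNumber FROM (
            SELECT CategoryID, ROW_NUMBER() OVER (ORDER BY Name ASC) AS RowNumber
            FROM Categories) AS numbered
        WHERE CategoryID = ?
    """
    return conn.execute(sql, (category_id,)).fetchone()[0]


conn = make_db()
print(row_number_after_filter(conn, "C_L7761O2U"))   # 1
print(row_number_before_filter(conn, "C_L7761O2U"))  # 6
```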