CTE is not returning expected values - sql

I have a table in which I am trying to find duplicate rows using a Common Table Expression. The fields that I am working with are as follows:
LogTime (DataType: datetime2(7),null)
ControllerIP (DataType: nvarchar(max),null)
I have two rows of data that have the same data in them, as far as I can tell. I did a LEN check on both columns to make sure they are equal lengths as well, yet the rows do not come back as duplicates when using the CTE below. Is there something different I need to do with the LogTime column? I have never run into this.
WITH CTE AS
(
SELECT rn = ROW_NUMBER()
OVER(
PARTITION BY LogTime , ControllerIP
ORDER BY Id ASC), *
FROM [DownTime].[dbo].[Records]
)
SELECT * FROM cte
WHERE FileName = '141101.CSV' AND rn > 1
Order By ID
GO
Also, I am using Microsoft SQL Server 2008R2.

Your plan is sound. If you're not finding duplicates, it's because duplicates don't exist. You can apply some functions to the columns to make finding duplicates more likely, such as trimming spaces from the IP and reducing the precision of the datetime2.
WITH CTE AS (
SELECT rn = ROW_NUMBER() OVER(
PARTITION BY CAST(LogTime AS datetime2(2)), RTRIM(LTRIM(ControllerIP))
ORDER BY Id ASC), *
FROM [DownTime].[dbo].[Records]
)
SELECT * FROM cte
WHERE FileName = '141101.CSV' AND rn > 1
Order By ID
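To see why two rows that look identical can still land in different partitions, here is a small self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere; window functions need SQLite 3.25 or later), and the two sample rows are invented. The timestamps differ only far out in the fractional seconds and one IP has a leading space, so the exact partition finds nothing while the normalized one does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Records (Id INTEGER PRIMARY KEY, LogTime TEXT, ControllerIP TEXT)"
)
# Invented sample rows: near-identical, but not equal byte for byte.
conn.executemany(
    "INSERT INTO Records (LogTime, ControllerIP) VALUES (?, ?)",
    [
        ("2014-11-01 08:15:30.1200000", "10.0.0.5"),
        ("2014-11-01 08:15:30.1200437", " 10.0.0.5"),
    ],
)


def duplicate_ids(time_expr, ip_expr):
    """Return the Ids that get rn > 1 when partitioning by the given expressions."""
    sql = f"""
        WITH cte AS (
            SELECT Id,
                   ROW_NUMBER() OVER (
                       PARTITION BY {time_expr}, {ip_expr}
                       ORDER BY Id) AS rn
            FROM Records
        )
        SELECT Id FROM cte WHERE rn > 1 ORDER BY Id
    """
    return [row[0] for row in conn.execute(sql)]


# Partitioning on the raw values: every row is alone in its partition.
exact = duplicate_ids("LogTime", "ControllerIP")

# substr(LogTime, 1, 22) keeps two fractional digits. This only approximates
# CAST(LogTime AS datetime2(2)), since substr truncates the extra digits.
normalized = duplicate_ids("substr(LogTime, 1, 22)", "trim(ControllerIP)")

print(exact)       # no duplicates reported
print(normalized)  # the second row is reported as a duplicate
```

If the normalized version finds rows that the exact one misses, comparing the raw values side by side (for example by selecting the full-precision LogTime) should show where they differ.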

Related

Removing duplicate rows based on one column same values but keep one record

SQL Server Version
Remove all dupe rows (rows 3 through 18) with service_date = '2018-08-29 13:05:00.000' but keep the oldest row (row 2), and of course keep row 1 since it has a different service_date. Don't mind the create_timestamp or document_file since it's the same customer. Any idea?
In SQL Server, we can try deleting using a CTE:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY service_date ORDER BY create_timestamp) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;
The strategy here is to assign a row number to each group of records sharing the same service_date, with 1 being assigned to the oldest record in that group. Then, we can phrase the delete by just targeting all records which have a row number greater than 1.
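Here is a minimal runnable sketch of that strategy, using Python's sqlite3 module as a stand-in for SQL Server (an assumption for portability; it needs SQLite 3.25 or later for ROW_NUMBER). SQLite cannot delete through a CTE the way SQL Server can, so the sketch deletes by key instead. The id column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE yourTable (
           id INTEGER PRIMARY KEY,   -- hypothetical key, used to target rows
           service_date TEXT,
           create_timestamp TEXT)"""
)
conn.executemany(
    "INSERT INTO yourTable (service_date, create_timestamp) VALUES (?, ?)",
    [
        ("2018-08-28 09:00:00", "2018-08-30 10:00:00"),  # unique date, kept
        ("2018-08-29 13:05:00", "2018-08-30 10:01:00"),  # oldest duplicate, kept
        ("2018-08-29 13:05:00", "2018-08-30 10:02:00"),  # deleted
        ("2018-08-29 13:05:00", "2018-08-30 10:03:00"),  # deleted
    ],
)

# Number the rows inside each service_date group, oldest first, then delete
# every row whose number is greater than 1.
conn.execute(
    """
    DELETE FROM yourTable
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY service_date
                       ORDER BY create_timestamp) AS rn
            FROM yourTable
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT id, service_date FROM yourTable ORDER BY id"
).fetchall()
print(remaining)
```

Running the inner SELECT on its own first is a cheap way to preview which rows would be removed before committing to the delete.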
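You don't need to use PARTITION BY. Please use the query below for efficient performance; I have tested it and it works fine. Note that it keeps only the two rows with the earliest create_timestamp in the whole table and deletes the rest, so it fits only data laid out like the sample.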
with result as
(
select *, row_number() over(order by create_timestamp) as Row_To_Delete from TableName
)
delete from result where result.Row_To_Delete>2
I think you will want to remove these rows on a per-customer basis.
That is, if the customers are different, you will want to keep the entries even when they fall on the same date.
If so, you need to add the customer column to the PARTITION BY clause used to identify duplicate rows.
By copying and modifying Tim's solution, you can try the following:
;WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer, service_date ORDER BY create_timestamp) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;

Remove oldest duplicates and keep latest duplicate by time stamp

I have a query as follows:
;WITH Duplicates AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ChannelName, SerialNumber, ReadingDate ORDER BY ChannelName) AS Rownumber
FROM [Staging].[UriData]
)
DELETE FROM Duplicates WHERE Rownumber > 1
--AND ROWNUMBER >=< ???
OPTION (MAXRECURSION 0)
This works great and finds the duplicates in the table. However, the table is frequently updated with corrected data.
By the time the query has run, there could have been three or more updates.
This means I want to delete all but the latest records. There is a timestamp field in the table, that denotes when the latest insert happened. I am assuming I should use this field to determine which is the latest row, and any that are not the highest row number, delete them. Is this the correct approach?
TIA
Yes, you can use the timestamp column with ROW_NUMBER(). You also don't need the MAXRECURSION hint, because your CTE is not recursive.
;WITH Duplicates AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ChannelName, SerialNumber, ReadingDate ORDER BY timestamp DESC) AS Rownumber
FROM [Staging].[UriData]
)
DELETE d
FROM Duplicates d
WHERE Rownumber > 1;
DELETE older
FROM Staging.UriData older
WHERE EXISTS(SELECT 1
FROM Staging.UriData newer
WHERE newer.ChannelName = older.ChannelName
and newer.SerialNumber = older.SerialNumber
and newer.ReadingDate = older.ReadingDate
and newer.timestamp > older.timestamp
)
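The EXISTS form above can be tried out with the sketch below, which uses Python's sqlite3 module as a stand-in for SQL Server (an assumption; the sample rows and the Value column are hypothetical). A row is deleted whenever another row with the same ChannelName, SerialNumber and ReadingDate has a later timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE UriData (
           ChannelName TEXT, SerialNumber TEXT, ReadingDate TEXT,
           timestamp TEXT,
           Value REAL)  -- hypothetical payload column"""
)
conn.executemany(
    "INSERT INTO UriData VALUES (?, ?, ?, ?, ?)",
    [
        ("ch1", "S1", "2020-01-01", "2020-01-01 10:00:00", 1.0),  # superseded
        ("ch1", "S1", "2020-01-01", "2020-01-01 11:00:00", 2.0),  # superseded
        ("ch1", "S1", "2020-01-01", "2020-01-01 12:00:00", 3.0),  # latest, kept
        ("ch2", "S1", "2020-01-01", "2020-01-01 10:00:00", 9.0),  # no duplicate
    ],
)

# Inside the subquery the table is aliased as "newer", so the unaliased name
# UriData refers to the outer row that is being considered for deletion.
conn.execute(
    """
    DELETE FROM UriData
    WHERE EXISTS (
        SELECT 1
        FROM UriData AS newer
        WHERE newer.ChannelName = UriData.ChannelName
          AND newer.SerialNumber = UriData.SerialNumber
          AND newer.ReadingDate = UriData.ReadingDate
          AND newer.timestamp > UriData.timestamp
    )
    """
)

remaining = conn.execute(
    "SELECT ChannelName, timestamp, Value FROM UriData ORDER BY ChannelName"
).fetchall()
print(remaining)
```

One difference from the ROW_NUMBER() version: if two rows in a group share the same latest timestamp, neither is strictly newer than the other, so both survive here.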

Select Single and Duplicate Row and Return Multiple Columns

I'm currently working with my database in SQL Server. I have a table with 23 fields, and it has both single and duplicate rows. How can I select both of them without having any duplicate data?
I have tried this query:
SELECT
Code, Stuff, and other fields....
FROM
(
SELECT
*,ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Code) AS RN
FROM
my_table
)t
WHERE RN = 1
The above code just returns the data from the duplicate rows. But I want the "single rows" returned as well.
This is the illustration.
Thank you for the help.
Could it be as simple as:
SELECT DISTINCT Code, Stuff FROM MyTable
Or, just add stuff to the partition by clause:
PARTITION BY Code,Stuff ORDER BY Code
Try This
You may need to add Stuff and more fields in Partition BY
SELECT
Code, Stuff
FROM
(
SELECT
*,ROW_NUMBER() OVER (PARTITION BY Code,Stuff ORDER BY Code) AS RN
FROM
my_table
)t
WHERE RN = 1

Select all but last row in Oracle SQL

I want to pull all rows except the last one in Oracle SQL
My database is like this
Prikey - Auto_increment
common - varchar
miles - int
So I want to sum all rows except the last row ordered by primary key grouped by common. That means for each distinct common, the miles will be summed (except for the last one)
Note: the question was changed after this answer was posted. The first two queries work for the original question. The last query (in the addendum) works for the updated question.
This should do the trick, though it will be a bit slow for larger tables:
SELECT prikey, authnum FROM myTable
WHERE prikey <> (SELECT MAX(prikey) FROM myTable)
ORDER BY prikey
This query is longer, but for a large table it should be faster. I'll leave it to you to decide:
SELECT * FROM (
SELECT
prikey,
authnum,
ROW_NUMBER() OVER (ORDER BY prikey DESC) AS RowRank
FROM myTable)
WHERE RowRank <> 1
ORDER BY prikey
Addendum: there was an update to the question; here's the updated answer.
SELECT
common,
SUM(miles)
FROM (
SELECT
common,
miles,
ROW_NUMBER() OVER (PARTITION BY common ORDER BY prikey DESC) AS RowRank
FROM myTable
)
WHERE RowRank <> 1
GROUP BY common
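As a quick check of the addendum query, here is a small sketch run through Python's sqlite3 module rather than Oracle (an assumption made so it runs without a server; it needs SQLite 3.25 or later, and the sample rows are invented).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (prikey INTEGER PRIMARY KEY, common TEXT, miles INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable (common, miles) VALUES (?, ?)",
    [
        ("a", 10),   # prikey 1
        ("a", 20),   # prikey 2
        ("b", 5),    # prikey 3
        ("a", 30),   # prikey 4, last row for "a", excluded
        ("b", 7),    # prikey 5, last row for "b", excluded
        ("c", 100),  # prikey 6, only row for "c", excluded
    ],
)

# RowRank 1 is the row with the highest prikey in each group; dropping it
# leaves every row except the last one per value of common.
totals = conn.execute(
    """
    SELECT common, SUM(miles)
    FROM (
        SELECT common,
               miles,
               ROW_NUMBER() OVER (
                   PARTITION BY common ORDER BY prikey DESC) AS RowRank
        FROM myTable
    )
    WHERE RowRank <> 1
    GROUP BY common
    ORDER BY common
    """
).fetchall()
print(totals)
```

Note that a value of common with only one row disappears from the result entirely, because its single row is also its last row.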
Looks like I am a little too late but here is my contribution, similar to Ed Gibbs' first solution but instead of calculating the max id for each value in the table and then comparing I get it once using an inline view.
SELECT d1.prikey,
d1.authnum
FROM myTable d1,
(SELECT MAX(prikey) prikey FROM myTable) d2
WHERE d1.prikey != d2.prikey
At least I think this is more efficient if you want to go without the use of Analytics.
A query to retrieve all the records in the table except the first row and the last row (this uses SQL Server's TOP, which Oracle does not support):
select * from table_name
where primary_id_column not in
(
select top 1 primary_id_column from table_name order by primary_id_column asc
)
and
primary_id_column not in
(
select top 1 primary_id_column from table_name order by primary_id_column desc
)

SQL ROW_NUMBER and sorting issue

In a SQL 2005/2008 database we have a table BatchMaster. Columns:
RecordId (bigint, auto-incremented id), BatchNumber (bigint, unique non-clustered index), BatchDate. We have a sproc that returns paginated data from this table. That sproc works fine for most of the clients, but on one SQL Server instance we have a problem with the order of the records.
In general, at sproc we do
select * from
(
select row_number() over (order by bm.BatchDate desc, bm.BatchNumber desc) as Row,
*
from dbo.BatchMaster bm with (nolock)
) t
where Row between @StartingRow and @EndgingRow
So, as you can see from the script above, we want to return records sorted by BatchDate and BatchNumber. That does not happen for one of our clients:
Records are in wrong order. Also, notice first column (Row), it is not in ascending order.
Can someone explain why so?
Assuming you want the lowest BatchNumber for a given BatchDate to get the smallest Row number, and that you want the results ordered by Row, try this:
select * from
(
select row_number() over (order by bm.BatchDate desc, bm.BatchNumber asc) as Row,
*
from dbo.BatchMaster bm with (nolock)
) t
where Row between @StartingRow and @EndgingRow
order by Row
Your code doesn't actually sort the results, it only sets 'Row' based on the order of BatchDate and Batchnumber and appears to be doing that correctly. You need to add ORDER BY Row to your statement.
Change your query to include a sort in the outermost query
select * from
(
select row_number() over (order by bm.BatchDate desc, bm.BatchNumber desc) as Row,
*
from dbo.BatchMaster bm with (nolock)
) t
where Row between @StartingRow and @EndgingRow
order by Row
The ORDER BY clause in your ROW_NUMBER ranking function only applies to calculating the value of that ranking function, it does not actually order the results.
If you would like the records returned in a certain order, you will need to specify that in the outer query: ORDER BY [Row]
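A minimal sketch of the paging pattern with the outer ORDER BY in place, run through Python's sqlite3 module as a stand-in for SQL Server (an assumption; needs SQLite 3.25 or later). The sample rows are invented, and the row-number column is called RowNum here, a hypothetical name chosen only to stay clear of keyword clashes in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE BatchMaster (
           RecordId INTEGER PRIMARY KEY,
           BatchNumber INTEGER UNIQUE,
           BatchDate TEXT)"""
)
# Inserted in a deliberately scrambled order.
conn.executemany(
    "INSERT INTO BatchMaster (BatchNumber, BatchDate) VALUES (?, ?)",
    [
        (101, "2010-01-01"),
        (105, "2010-01-03"),
        (103, "2010-01-02"),
        (104, "2010-01-03"),
        (102, "2010-01-02"),
    ],
)


def fetch_page(starting_row, ending_row):
    """Return (RowNum, BatchNumber) pairs for one page, sorted by RowNum."""
    return conn.execute(
        """
        SELECT RowNum, BatchNumber
        FROM (
            SELECT ROW_NUMBER() OVER (
                       ORDER BY BatchDate DESC, BatchNumber DESC) AS RowNum,
                   BatchNumber
            FROM BatchMaster
        )
        WHERE RowNum BETWEEN ? AND ?
        ORDER BY RowNum  -- this, not the OVER clause, fixes the output order
        """,
        (starting_row, ending_row),
    ).fetchall()


page = fetch_page(2, 4)
print(page)
```

Without the final ORDER BY the engine may still happen to return rows in RowNum order, which is probably why the problem only showed up on one instance.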