remove duplicate records with a criteria

I am using a script that requires unique values only. I have a table with duplicates like the one below, and I need to keep only the unique values (the first occurrence), irrespective of what is inside the brackets.
Can I delete the duplicate records and keep the unique ones using a single query?
Input table
ID Name
1 (Del)testing
2 (Del)test
3 (Delete)testing
4 (Delete)tester
5 (Del)tst
6 (Delete)tst
So the output table should be something like:
Output table
ID Name
1 (Del)testing
2 (Del)test
3 (Delete)tester
4 (Del)tst

SELECT DISTINCT * FROM FOO;
It depends on how much data you have to retrieve. If you only have to change Delete -> Del, you can try REPLACE:
http://technet.microsoft.com/en-us/library/ms186862.aspx
Grouping functions should also help you.
I don't think this will be an easy query.

Assumption: The name column always has all strings in the format given in the sample data.
Try this:
;with cte as
(select *, rank() over
(partition by substring(name, charindex(')',name)+1,len(name)+1 - charindex(')',name))
order by id) rn
from tbl
),
filtered_cte as
(select * from cte
where rn = 1
)
select rank() over (partition by getdate() order by id,getdate()) id , name
from filtered_cte
How this works:
The first CTE, cte, uses rank() to rank each occurrence of the string outside the brackets in the name column, ordered by id.
The second CTE, filtered_cte, returns only the first row for each occurrence of that string. At this step we have the expected rows, but not yet in the desired format.
The final select partitions by and orders by the getdate() function. getdate() is chosen as a dummy so that rank() gives continuous values for the id column, used the same way as in step 1.
Note that this solution returns the filtered values but does not delete anything from the source table. If you wish, you can delete from the CTE created in step 1 to remove data from the source table.
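The delete mentioned in the note above could look roughly like the following. This is only a sketch: it assumes the table is named tbl with columns id and name, as in the query above, that every name has the form '(prefix)text', and it uses row_number() rather than rank() so that exactly one row per group is kept.

```sql
-- Sketch only: tbl, id and name follow the answer above.
-- Rows are grouped by the text after the closing bracket;
-- within each group the lowest id gets rn = 1.
;with cte as
(
    select id,
           row_number() over
           (
               partition by substring(name, charindex(')', name) + 1, len(name))
               order by id
           ) as rn
    from tbl
)
-- Deleting through the CTE removes the matching rows from tbl itself.
delete from cte
where rn > 1;
```

Running the CTE with a select instead of the delete first is a cheap way to check which rows would be removed.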

First use this update to make them uniform
Update table set name = replace(Name, '(Del)' , '(Delete)')
then delete the repetitive names
Delete from table where id in
(Select id from (Select Row_Number() over(Partition by Name order by id) as rn,* from table) x
where rn > 1)

First create the input data table
CREATE TABLE test
(ID int,Name varchar(20));
INSERT INTO test
(ID, Name)
VALUES
(1, '(Del)testing'),
(2, '(Del)test'),
(3, '(Delete)testing'),
(4, '(Delete)tester'),
(5, '(Del)tst'),
(6, '(Delete)tst');
Select Query
select id, name
from (
select id, name ,
ROW_NUMBER() OVER(PARTITION BY substring(name,PATINDEX('%)%',name)+1,20) ORDER BY id) rn
from test ) t
where rn= 1
order by 1
SQL Fiddle Link
http://www.sqlfiddle.com/#!6/a02b0/34

Related

Computed column formula for row_number function

I am trying to find a formula for a computed column that will return the same results as the ROW_NUMBER() function, since I cannot use the ROW_NUMBER() function and save the results in the table.
I have a table as below;
ID YEAR
1 2018
2 2018
3 2019
4 2019
5 2020
6 2018
7 2019
I would like a formula that computes and assigns numbers to the rows depending on the year, as shown below:
ID YEAR COMPUTED COLUMN
1 2018 1
2 2018 2
3 2019 1
4 2019 2
5 2020 1
6 2018 3
7 2019 3

I am trying to find a formula for a computed column that will return same results as the ROW_NUMBER() function.
You just can't use window functions in a computed column; window functions operate on a range of rows (called the window frame), while a computed column has visibility to the row it belongs to only.
I cannot use the ROW_NUMBER() function and save results on the table
While this is technically possible, I would not recommend it. This is derived information, that can be computed on the fly whenever needed. You could use a view instead:
create view myview
as
select
id,
year,
row_number() over (partition by year order by id) rn
from mytable

There is always a way around it.
It's not a good way though...
select
id,
year,
row_number() over(partition by year order by id) rn,
(select count(*) from Table1 t where t.YEAR=d.YEAR and t.ID<=d.ID) PoorRN
from Table1 d
order by id
This is going to perform very badly.
If your table has over a million rows, just go ahead and throw it on production and tell us what happened ;)

Update: Based on the comments by OP, I have a new suggestion -
From the original table - create a view and then give numbers to each year
create view original_table_view
as
select *, ROW_NUMBER() over ( partition by [year] order by ID) as numbering
from original_table
Create a new table and apply a Merge on the new table from view
create table New_table (id int, [year] int, numbering int)
Create Merge syntax -
merge new_table t using original_table_view v
on (t.id = v.id) --merge_condition
when matched then update set
t.[year]=v.[year],
t.numbering=v.numbering
when not matched by target
then insert ( id, [year], numbering)
values (v.id, v.[year], v.numbering)
when not matched by source
then delete;
So now, if new values are inserted into the original table in the future, you just need to run the merge query to update the new table.
View the data:
select * from original_table order by ID
select * from original_table_view order by ID
select * from New_table order by ID
Old Answer:
I confused your question about a computed column with one about a calculated column (sorry about that).
You cannot use a window function such as ROW_NUMBER() in a computed column in SQL Server. However, you can create a view on the table and query that instead.
example -
create table as below
create table year_computed2 (ID int, Year int )
go
insert into year_computed2 values (1,2018 )
insert into year_computed2 values (2,2018 )
insert into year_computed2 values (3,2019 )
insert into year_computed2 values (4,2019 )
insert into year_computed2 values (5,2020 )
insert into year_computed2 values (6,2018 )
insert into year_computed2 values (7,2019 )
go
Now create a view using the Row_number() function (you can also use Rank() and Ntile())
create view year_computed_view as
select *, ROW_NUMBER() over ( partition by [year] order by ID) as Rn from year_computed2
Now Query this view like below
select * from year_computed_view order by ID
In case you were talking about a calculated column, the answer below applies.
Dense_Rank()
select DISTINCT id, [year], DENSE_RANK() over ( order by [year]) as [Dense_rank]
from year_computed2
ORDER BY ID
If Dense_rank doesn't meet your needs, try the two functions below.
Rank()
select DISTINCT id, [year], RANK() over ( partition by [year] order by ID) as [Rank]
from year_computed2
ORDER BY ID
NTile()
declare @ntile_count int = (select count (distinct [year]) from year_computed2)
select id, [year], ntile(@ntile_count) over (partition by [year] order by id) as rn from year_computed2 ORDER BY ID

Change value of duplicated rows

There is a table with two columns (ID, Data), and there are 3 rows with the same values.
ID Data
4 192.168.0.22
4 192.168.0.22
4 192.168.0.22
Now I want to change the Data column of the third row. When I run the update, SQL Server generates an error saying that I cannot change the value.
I can delete all 3 rows, but I cannot delete the third row separately.
This table belongs to software that I bought, and I changed the third server's IP.

You can try the following query
create table #tblSimilarValues(id int, ipaddress varchar(20))
insert into #tblSimilarValues values (4, '192.168.0.22'),
(4, '192.168.0.22'),(4, '192.168.0.22')
Use the query below if you want to change all rows
with oldData as (
select *,
count(*) over (partition by id, ipaddress) as cnt
from #tblSimilarValues
)
update oldData
set ipaddress = '192.168.0.22_1'
where cnt > 1;
select * from #tblSimilarValues
Use the query below if you want to skip the first row
;with oldData as (
select *,
ROW_NUMBER () over (partition by id, ipaddress order by id, ipaddress) as cnt
from #tblSimilarValues
)
update oldData
set ipaddress = '192.168.0.22_2'
where cnt > 1;
select * from #tblSimilarValues
drop table #tblSimilarValues

Since there is no column that allows us to distinguish these rows from each other, there's no "third row" (nor a first or second one for that matter).
We can use a ROW_NUMBER function to apply arbitrary row numbers to these rows, however, and if we place that in a CTE, we can apply DELETE/UPDATE actions via the CTE and use the arbitrary row numbers:
declare @t table (ID int not null, Data varchar(15))
insert into @t(ID,Data) values
(4,'192.168.0.22'),
(4,'192.168.0.22'),
(4,'192.168.0.22')
;With ArbitraryAssignments as (
select *,ROW_NUMBER() OVER (PARTITION BY ID, Data ORDER BY Data) as rn
from @t
)
delete from ArbitraryAssignments where rn > 2
select * from @t
This produces two rows of output - one row was deleted.
Note that I say that the ROW_NUMBER is arbitrary. One of the expressions in both the PARTITION BY and ORDER BY clauses is the same. By definition, then, we know that no real ORDER is defined by this (because all rows within the same partition, by definition, have the same value for that expression).

In this case the ID column allows duplicate values, which is wrong; ID should be unique.
What you can do is create a new column and make it unique or the primary key, or change the duplicate values in the ID column and make that column unique or the primary key.
Then, using that unique or primary key, you can update the DATA column with a query like the one below:
UPDATE <Table Name>
SET DATA = 'new data'
WHERE ID = 3;
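One way to add such a key could be sketched as follows. The table name dbo.ServerList and the column name RowId are hypothetical (the question does not name the table), and the value 3 assumes the three existing rows receive RowId 1, 2 and 3, which should be checked with a select first.

```sql
-- Sketch only: dbo.ServerList and RowId are hypothetical names.
-- Adding an identity column numbers the existing rows as well.
alter table dbo.ServerList
    add RowId int identity(1, 1) not null;
go

-- Inspect the generated values before changing anything.
select RowId, ID, Data
from dbo.ServerList;

-- Each former duplicate now has its own RowId, so one row can be targeted.
update dbo.ServerList
set Data = 'new data'
where RowId = 3;
```

The go separator matters here: the new column has to exist before the batch that references it is compiled.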

Unique ID using function with every record in Insert statement

I have a statement in stored procedure
INSERT into table(ID, name, age)
SELECT fnGetLowestFreeID(), name, age
FROM #tempdata
The function fnGetLowestFreeID() gets the lowest free ID of the table table.
I want to insert unique ID with every record in the table. I have tried iteration and transaction. But they aren't fitting the scenario.
I cannot use Identity Column. I have this restriction of using IDs between 0-4 and assigning the lowest free ID using that function. In case of returned ID greater than 4, the function is returning an error. Suppose there are already 1 and 2 in the table. The function will return 0 and I have to assign this ID to the new record, 3 to the next record and so on on the basis of number of records in the #tempdata.

Try this:
CREATE TABLE dbo.Tmp_City(Id int NOT NULL IDENTITY(1, 1),
Name varchar(50), Country varchar(50))
OR add an identity column to an existing table (SQL Server cannot turn an existing column into an identity column):
ALTER TABLE dbo.Tmp_City
ADD Id int NOT NULL IDENTITY(1, 1)
OR
create a sequence and use NEXT VALUE FOR that sequence as the ID
in the insert statement.

You can make use of a rank function like row_number and do something like this.
INSERT into table(ID, name, age)
SELECT row_number() over (order by id) + fnGetLowestFreeID(), name, age
FROM #tempdata

Here are 3 scenarios:
1) Show the function you are using.
2) It doesn't make sense to use a function and expect it to return a unique value for every row.
Still, you can use row_number:
INSERT into table(ID, name, age)
SELECT row_number() over (order by id) + fnGetLowestFreeID(), name, age
FROM #tempdata
3) Otherwise, get rid of the function and use max(id)+1, since you don't want to use an identity column.

You could use a Numbers table to join the query doing your insert. You can google the concept for more info, but essentially you create a table (for example with the name "Numbers") with a single column "nums" of some integer type, and then you add some amount of rows, starting with 0 or 1, and going as far as you need. For example, you could end with this table content:
nums
----
0
1
2
3
4
5
6
Then you can use such a table to modify your insert, you don't need the function anymore:
INSERT into table(ID, name, age)
SELECT t2.nums, t.name, t.age
FROM (
SELECT name, age, row_number() over (order by name) as seq
FROM #tempdata
) t
INNER JOIN (
SELECT n.nums, row_number() over (order by n.nums) as seq
FROM Numbers n
WHERE n.nums < 5 AND NOT EXISTS (
SELECT * FROM table WHERE table.ID = n.nums
)
) t2 ON t.seq = t2.seq
Now, this query leaves out one of your requirements, raising an error when no slots are available, but that is easy to fix. Before the insert, run a query that tests whether the count of records in table plus the count of records in #tempdata is higher than 5. If so, raise the error, since you know there would not be enough free slots for the records in #tempdata.
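A minimal sketch of that check, to run before the insert. The name dbo.Slots is a hypothetical stand-in for the target table, and the sketch assumes the IDs already stored there are unique and lie in the range 0-4.

```sql
-- Sketch only: dbo.Slots is a hypothetical name for the target table.
declare @used int = (select count(*) from dbo.Slots where ID between 0 and 4);
declare @needed int = (select count(*) from #tempdata);

-- Only the IDs 0 to 4 exist, so at most 5 rows fit in total.
if @used + @needed > 5
begin
    raiserror('Not enough free IDs in the range 0-4.', 16, 1);
    return;  -- leave the batch or procedure without inserting
end;
```

Inside a stored procedure the return exits the procedure, so the insert that follows is never reached when the check fails.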
Side note: table looks like a terrible name for a table, I hope that in your real code you have a meaningful name :)

How to select unique id when ids are same and have different string values in variable

I am working with Microsoft SQL Server. Now I have many same ids and the values of other variables associated with them may or may not be same.
I just want to select one unique id every time (I don't care about values of other variables). Values of other variables can be anything. I am just focused on selecting unique id and any values associated with any of the duplicate ids.

You could use the row_number function to assign a unique number to every ID, and then query just one of them:
SELECT id, variablea, variableb
FROM (SELECT id, variablea, variableb,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY variablea) AS rn
FROM mytable) t
WHERE rn = 1

Another option is to use the WITH TIES clause in concert with Row_Number()
Note: No Extra Field.
Example
Declare @YourTable Table ([id] int,[variableA] varchar(50),[variableB] varchar(50))
Insert Into @YourTable Values
(1,'xyz','abc')
,(1,'fgh','rty')
,(2,'qwe','ui')
,(3,'jk','vbn')
,(3,'asd','ty')
,(3,'fgh','po')
Select top 1 with ties *
From @YourTable
Order By Row_Number() over (Partition By ID Order by variableA)
Returns
id variableA variableB
1 fgh rty
2 qwe ui
3 asd ty

How to retrieve specific rows from SQL Server table?

I was wondering is there a way to retrieve, for example, 2nd and 5th row from SQL table that contains 100 rows?
I saw some solutions with WHERE clause but they all assume that the column on which WHERE clause is applied is linear, starting at 1.
Is there other way to query a SQL Server table for a specific rows in case table doesn't have a column whose values start at 1?
P.S. - I know of a solution with temporary tables, where you copy the output of your select statement and add a linear column to the table. I am using T-SQL.

Try this,
SELECT * FROM (
SELECT
*, ROW_NUMBER() OVER (ORDER BY ColumnName ASC) AS rownumber
FROM TableName
) as temptablename
WHERE rownumber IN (2,5)

With SQL Server:
; WITH Base AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY id) RN FROM YourTable
)
SELECT *
FROM Base WHERE RN IN (2, 5)
Replace id with your primary key or your ordering column, and YourTable with your table.
This is a CTE (Common Table Expression), so it isn't a temporary table. It is expanded together with your query.

There is no 2nd or 5th row in the table.
There is only the 2nd or 5th result in a resultset that you return, as determined by the order you specify in that query.
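To illustrate the point, here is a small sketch in which YourTable, id and name are hypothetical. The same table yields a possibly different "2nd row" depending on the ORDER BY; OFFSET ... FETCH is used here and needs SQL Server 2012 or later.

```sql
-- Sketch only: YourTable, id and name are hypothetical names.
-- The "2nd row" when the result is ordered by id.
select *
from YourTable
order by id
offset 1 rows fetch next 1 rows only;

-- The "2nd row" when the result is ordered by name:
-- this may well be a different row of the same table.
select *
from YourTable
order by name
offset 1 rows fetch next 1 rows only;
```

Without an ORDER BY there is no guarantee at all about which row comes back in a given position.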

If you are on SQL Server 2005 or above, you could use Row_Number() function. Ex:
;With CTE as (
select col1, ..., row_number() over (order by yourOrderingCol) rn
from yourTable
)
select col1,...
from cte
where rn in (2,5)
Please note that yourOrderingCol will decide the value of row number (i.e. rn).
