Remove the duplicates except one record in SQL Server - sql

Here I wanted to delete all records with value 1 and only keep a single record

Without knowing your DBMS it's hard to say which query you need. If your DBMS supports CTEs and row_number(), the query below will work.
-- rn restarts at 1 for each column_1 value, so one row per value is kept
with cte as
(select *, row_number() over (partition by column_1 order by column_1) as rn from table_name)
delete from cte where rn > 1;
In SQL Server this will work fine.
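If you want to try the row_number() idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the same statement: SQLite cannot delete through a CTE, so the numbered rows are matched back to the table by rowid. It assumes SQLite 3.25 or later (for window functions) and reuses the table_name / column_1 names from the query above.

```python
import sqlite3

def delete_duplicates(conn):
    # Number the rows within each column_1 value, then delete every row
    # numbered 2 or higher. The match back to the table is done by rowid
    # because SQLite does not allow DELETE against a CTE.
    conn.execute("""
        DELETE FROM table_name
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       row_number() OVER (PARTITION BY column_1 ORDER BY rowid) AS rn
                FROM table_name
            )
            WHERE rn > 1
        )
    """)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_1 INTEGER)")
conn.executemany("INSERT INTO table_name VALUES (?)", [(1,), (1,), (1,), (2,)])
delete_duplicates(conn)
remaining = conn.execute(
    "SELECT column_1 FROM table_name ORDER BY column_1"
).fetchall()
print(remaining)
```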

Given the nature of your data, I would suggest removing all rows and adding a new one back in:
truncate table t;
insert into t(column_1)
values (1);
Be careful! The truncate table removes all rows from the table.

Related

Oracle Insert Select with order by

I am working on a PL/SQL procedure where I am using an insert-select statement.
I need to insert into the table in an ordered manner, but the ORDER BY I used in the select is not working.
Is there any specific way in Oracle to insert rows in an orderly fashion?
The use of an ORDER BY within an INSERT SELECT is not pointless as long as it can change the content of the inserted data, i.e. with a sequence NEXTVAL included in the SELECT clause. And this even if the inserted rows won't be sorted when fetched - that's the role of your ORDER BY clause in your SELECT clause when accessing the rows.
For such a goal, you can use a work-around placing your ORDER BY clause in a sub-query, and it works:
INSERT INTO myTargetTable
(
SELECT mySequence.nextval, sq.* FROM
( SELECT f1, f2, f3, ...fx
FROM mySourceTable
WHERE myCondition
ORDER BY mySortClause
) sq
)
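The effect of the ordered sub-query on generated keys can be tried out with Python's sqlite3 module. This is only a sketch under assumptions: an AUTOINCREMENT key stands in for mySequence.nextval, the source and target tables are made up for the illustration, and the in-order key assignment shown is what SQLite does in practice rather than something Oracle promises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (name TEXT)")
# The AUTOINCREMENT key plays the role of the Oracle sequence here.
conn.execute(
    "CREATE TABLE target (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
conn.executemany(
    "INSERT INTO source VALUES (?)", [("cherry",), ("apple",), ("banana",)]
)

# The ORDER BY decides which row receives which generated id.
conn.execute("INSERT INTO target (name) SELECT name FROM source ORDER BY name")

rows = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(rows)
```

Note that the final SELECT still carries its own ORDER BY: the generated ids record the order, but reading the rows back sorted is a separate step.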
The typical use case for an ordered insert is to co-locate particular values in the same blocks (effectively reducing the clustering factor of indexes on the columns by which you have ordered the data).
This generally requires a direct path insert ...
insert /*+ append */ into ...
select ...
from ...
order by ...
There's nothing invalid about this as long as you accept that it's only worthwhile for bulk data, that the data will load above the high water mark only, and that there are locking issues involved.
Another approach, which achieves mostly the same effect but is arguably more suitable for OLTP systems, is to create the table in a cluster.
The standard Oracle table is a heap-organized table. A heap-organized table is a table with rows stored in no particular order.
Sorting has no effect when inserting rows and is completely pointless. You need an ORDER BY only when projecting/selecting the rows.
That is how the Oracle RDBMS is designed.
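A small sketch of that point, using Python's sqlite3 module as a stand-in for Oracle (the your_table / your_column names are placeholders): the rows go in unsorted, and the order is requested only when reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (your_column INTEGER)")
# Insert in arbitrary order; no attempt is made to sort on the way in.
conn.executemany("INSERT INTO your_table VALUES (?)", [(3,), (1,), (2,)])

# The ordering is stated where it matters: in the query that reads the rows.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT your_column FROM your_table ORDER BY your_column"
    )
]
print(ordered)
```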
I'm pretty sure that Oracle does not guarantee that rows are stored in a table in any specific order (even if they were inserted in that order).
Performance and storage considerations far outweigh ordering considerations (as every user might have a different preference for order)
Why not just use an "ORDER BY" clause in your SELECT statement?
Or better yet, create a VIEW that already has the ORDER BY clause in it?
CREATE VIEW your_table_ordered AS
SELECT *
FROM your_table
ORDER BY your_column

Insert into combined with select where

Let's say we have a query like this (my actual query is similar to this but pretty long)
insert into t1(id1,c1,c2)
select id1,c1,c2 from t2
where not exists(select * from t1 where t1.id1=t2.id1-1)
Does this query select first and insert all, or insert each selected item one by one?
It matters because I'm trying to insert a record depending on the previously inserted records, and it doesn't seem to work.
First the select query is run, so it selects all the rows that match your filter. After that the insert is performed. There is no row-by-row insertion when you use a single statement.
Still if you want to do something recursive that will check after each insert you can use CTEs (Common Table Expressions). http://msdn.microsoft.com/en-us/library/ms190766(v=sql.105).aspx
This runs a select statement one time and then inserts based on that. It is much more efficient that way.
Since you already know what you will be inserting, you should be able to handle this in your select query rather than looking at what you have already inserted.
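The select-then-insert behaviour can be checked with a small experiment. The sketch below uses Python's sqlite3 module rather than SQL Server, so treat it as an illustration of the set-based semantics and not as proof of what any particular engine does internally. The table and column names follow the question; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id1 INTEGER, c1 TEXT, c2 TEXT)")
conn.execute("CREATE TABLE t2 (id1 INTEGER, c1 TEXT, c2 TEXT)")
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "b", "y"), (3, "c", "z")],
)

# t1 starts empty, so NOT EXISTS is true for every row of t2 when the
# SELECT is evaluated. If rows were inserted one at a time and the filter
# re-checked after each insert, id1 = 2 would be rejected once id1 = 1
# had been inserted.
conn.execute("""
    INSERT INTO t1 (id1, c1, c2)
    SELECT id1, c1, c2 FROM t2
    WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.id1 = t2.id1 - 1)
""")

inserted_ids = [row[0] for row in conn.execute("SELECT id1 FROM t1 ORDER BY id1")]
print(inserted_ids)
```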

Delete rows from CTE in SQL SERVER

I have a CTE which is a select statement on a table. Now if I delete 1 row from the CTE, will it delete that row from my base table?
Also is it the same case if I have a temp table instead of CTE?
Checking the DELETE statement documentation, yes, you can use a CTE to delete from and it will affect the underlying table. Similarly for UPDATE statements...
Also is it the same case if I have a temp table instead of CTE?
No, deletion from a temp table will affect the temp table only -- there's no connection to the table(s) the data came from, a temp table is a stand alone object.
You can think of CTE as a subquery, it doesn't have a temp table underneath.
So, if you run a delete statement against your CTE, you will delete rows from the table, provided SQL Server can infer which table to update/delete based on your CTE. Otherwise you'll see an error.
If you use temp table, and you delete rows from it, then the source table will not be affected, as temp table and original table don't have any correlation.
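The temp-table half of this is easy to demonstrate. Here is a minimal sketch with Python's sqlite3 module (SQLite does not support deleting through a CTE, so only the temp-table case is shown; the base and base_copy names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (id INTEGER)")
conn.executemany("INSERT INTO base VALUES (?)", [(1,), (2,), (3,)])

# The temp table is a separate copy of the data, not a window onto base.
conn.execute("CREATE TEMP TABLE base_copy AS SELECT id FROM base")
conn.execute("DELETE FROM base_copy WHERE id = 2")

base_ids = [row[0] for row in conn.execute("SELECT id FROM base ORDER BY id")]
copy_ids = [row[0] for row in conn.execute("SELECT id FROM base_copy ORDER BY id")]
print(base_ids, copy_ids)
```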
In cases where you have a subquery, say one joining multiple tables, and you need to use it in multiple places, both a CTE and a temp table can be used. If, however, you want to delete records based on the subquery condition, then a CTE is the way to go. Sometimes you can simply use the delete statement without a CTE, since only rows that satisfy the query conditions get deleted, even when multiple conditions are used for filtering.

Deleting duplicate rows in a database without using rowid or creating a temp table

Many years ago, I was asked during a phone interview to delete duplicate rows in a database. After giving several solutions that do work, I was eventually told the restrictions are:
Assume table has one VARCHAR column
Cannot use rowid
Cannot use temporary tables
The interviewer refused to give me the answer. I've been stumped ever since.
After asking several colleagues over the years, I'm convinced there is no solution. Am I wrong?!
And if you did have an answer, would a new restriction suddenly present itself? Since you mention ROWID, I assume you were using Oracle. The solutions are for SQL Server.
Inspired by SQLServerCentral.com http://www.sqlservercentral.com/scripts/T-SQL/62866/
while(1=1) begin
delete top (1)
from MyTable
where VarcharColumn in
(select VarcharColumn
from MyTable
group by VarcharColumn
having count(*) > 1)
if @@rowcount = 0
break
end
Deletes one row at a time. When the second to last row of a set of duplicates disappears then the remaining row won't be in the subselect on the next pass through the loop. (BIG Yuck!)
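The loop above depends on T-SQL's DELETE TOP (1). For comparison, here is a rough sketch of a different loop that also avoids rowid and temporary tables: for each duplicated value, delete every copy and insert a single one back. This is a swapped-in technique, not the SQLServerCentral script, and it is written with Python's sqlite3 module, so the null-safe IS comparison is SQLite-specific. MyTable and VarcharColumn are the names from the loop above.

```python
import sqlite3

def dedupe_without_rowid(conn):
    # Find every value that occurs more than once.
    duplicated = conn.execute(
        "SELECT VarcharColumn FROM MyTable "
        "GROUP BY VarcharColumn HAVING count(*) > 1"
    ).fetchall()
    for (value,) in duplicated:
        # IS (rather than =) also matches NULL in SQLite.
        conn.execute("DELETE FROM MyTable WHERE VarcharColumn IS ?", (value,))
        conn.execute("INSERT INTO MyTable (VarcharColumn) VALUES (?)", (value,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (VarcharColumn TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?)", [("a",), ("a",), ("b",), ("c",), ("c",), ("c",)]
)
dedupe_without_rowid(conn)
values = [
    row[0]
    for row in conn.execute("SELECT VarcharColumn FROM MyTable ORDER BY VarcharColumn")
]
print(values)
```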
Also, see http://www.sqlservercentral.com/articles/T-SQL/63578/ for inspiration. There RBarry Young suggests a way that might be modified to store the deduplicated data in the same table, delete all the original rows, then convert the stored deduplicated data back into the right format. He had three columns, so not exactly analogous to what you are doing.
And then it might be do-able with a cursor. Not sure and don't have time to look it up. But create a cursor to select everything out of the table, in order, and then a variable to track what the last row looked like. If the current row is the same, delete, else set the variable to the current row.
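This is a completely jacked-up way to do it, but given the asinine requirements, here is a workable solution assuming SQL 2005 or later:
-- ROW_NUMBER() cannot appear directly in a WHERE clause, so it is computed in a CTE first
WITH numbered AS
(SELECT ROW_NUMBER() OVER (PARTITION BY [MyField] ORDER BY [MyField]) AS rn FROM MyTable)
DELETE FROM numbered WHERE rn > 1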
I would put a unique number of fixed size in the VARCHAR column for the duplicated rows, then parse out the number and delete all but the minimum row. Maybe that's what his VARCHAR constraint is for. But that stinks because it assumes that your unique number will fit. Lame question. You didn't want to work there anyway. ;-)
Assume you are implementing the DELETE statement for a SQL engine. How will you delete one of two rows in a table that are exactly identical? You need something to distinguish one from the other!
You actually cannot delete entirely duplicate rows (ALL columns being equal) under the following constraints(as provided to you)
No use of ROWID or ROWNUM
No Temporary Table
No procedural code
It can, however, be done if even one of the conditions is relaxed. Here are solutions, each relaxing one of the three conditions.
Assume table is defined as below
Create Table t1 (
col1 varchar2(100),
col2 number(5),
col3 number(2)
);
Duplicate rows identification:
Select col1, col2, col3
from t1
group by col1, col2, col3
having count(*) >1
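To see what this identification query returns, here is a small sketch using Python's sqlite3 module in place of Oracle (so TEXT and INTEGER replace varchar2 and number; the sample rows are invented, and a count column is added for readability):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [("a", 1, 1), ("a", 1, 1), ("b", 2, 2)],
)

# One output row per group of fully identical rows that occurs more than once.
dups = conn.execute("""
    SELECT col1, col2, col3, count(*)
    FROM t1
    GROUP BY col1, col2, col3
    HAVING count(*) > 1
""").fetchall()
print(dups)
```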
Duplicate rows can also be identified using this:
select col1, col2, col3, row_number() over (partition by col1, col2, col3 order by col1, col2, col3) rn from t1
NOTE: The row_number() analytic function cannot be used in a DELETE statement as suggested by JohnFx at least in Oracle 10g.
Solution using ROWID
delete from t1 where rowid > (select min(t1_inner.rowid) from t1 t1_inner where t1_inner.col1 = t1.col1 and t1_inner.col2 = t1.col2 and t1_inner.col3 = t1.col3);
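SQLite also exposes a rowid, so the same shape of statement can be tried with Python's sqlite3 module. This is a sketch against SQLite, not Oracle, using the t1 definition above with SQLite column types and invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [("a", 1, 1), ("a", 1, 1), ("a", 1, 1), ("b", 2, 2)],
)

# Keep the row with the smallest rowid in each group of identical rows
# and delete the rest.
conn.execute("""
    DELETE FROM t1
    WHERE rowid > (
        SELECT min(t1_inner.rowid)
        FROM t1 AS t1_inner
        WHERE t1_inner.col1 = t1.col1
          AND t1_inner.col2 = t1.col2
          AND t1_inner.col3 = t1.col3
    )
""")

survivors = conn.execute(
    "SELECT col1, col2, col3 FROM t1 ORDER BY col1"
).fetchall()
print(survivors)
```

The equality comparisons mean rows containing NULL in any of the three columns are never treated as duplicates of each other.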
Solution using temp table
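create table t1_dups as (
-- the duplicate-identification query listed above
select col1, col2, col3
from t1
group by col1, col2, col3
having count(*) > 1
);
delete from t1
where (t1.col1, t1.col2, t1.col3) in (select col1, col2, col3 from t1_dups);
insert into t1
(select col1, col2, col3 from t1_dups);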
Solution using procedural code
This will use an approach similar to the case where we use a temp table.
create table temp as
select c1,c2
from table
group by c1,c2
having(count(*)>1 or count(*)=1);
Now drop the base table.
Rename the temp table to the base table's name.
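The create / drop / rename sequence can be sketched end to end with Python's sqlite3 module. The names base_table and deduped are placeholders (the snippet above uses "table" and "temp"), and the HAVING clause is left out because every group has a count of at least 1 anyway. Keep in mind that recreating a table this way loses indexes, constraints and grants on the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base_table (c1 TEXT, c2 INTEGER)")
conn.executemany(
    "INSERT INTO base_table VALUES (?, ?)",
    [("a", 1), ("a", 1), ("b", 2), ("b", 2), ("c", 3)],
)

# Build the deduplicated copy: GROUP BY collapses identical rows.
conn.execute(
    "CREATE TABLE deduped AS SELECT c1, c2 FROM base_table GROUP BY c1, c2"
)
# Swap it in under the original name.
conn.execute("DROP TABLE base_table")
conn.execute("ALTER TABLE deduped RENAME TO base_table")

result = conn.execute("SELECT c1, c2 FROM base_table ORDER BY c1").fetchall()
print(result)
```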
Mine was resolved using this query:
delete from where in (select from group by having count(*) >1)
in PLSQL

Insert into temp values (select.... order by id)

I'm using an Informix (Version 7.32) DB. On one operation I create a temp table with the ID of a regular table and a serial column (so I would have all the IDs from the regular table numbered continuously). But I want to insert the info from the regular table ordered by ID something like:
CREATE TEMP TABLE tempTable (id serial, folio int );
INSERT INTO tempTable(id,folio)
SELECT 0,folio FROM regularTable ORDER BY folio;
But this creates a syntax error (because of the ORDER BY)
Is there any way I can order the info then insert it to the tempTable?
UPDATE: The reason I want to do this is that the regular table has about 10,000 items, and a JSP file has to show every record, which would take too long, so what I really want is to paginate the output. This version of Informix has neither LIMIT nor SKIP. I can't renumber the serial because it is used in a relationship, and this is the only way we found to get a fixed number of results on one page (for example, 500 results per page). The regular table has gaps in its IDs (called folio) because records have been deleted. If I were to use
SELECT * FROM regularTable WHERE folio BETWEEN X AND Y
I would get maybe 300 in one page, then 500 in the next page
You can do this by breaking up the SQL into two temp tables:
CREATE TEMP TABLE tempTable1 (
id serial,
folio int);
SELECT folio FROM regularTable ORDER BY folio
INTO TEMP tempTable2;
INSERT INTO tempTable1(id,folio) SELECT 0,folio FROM tempTable2;
In Informix, when using a SELECT as a sub-clause in an INSERT statement, you are limited to a subset of the SELECT syntax.
The following SELECT clauses are not supported in this case:
INTO TEMP
ORDER BY
UNION.
Additionally, the FROM clause of the SELECT can not reference the same table as referenced by the INSERT (not that this matters in your case).
It's been years since I worked on Informix, but perhaps something like this will work:
INSERT INTO tempTable(id,folio)
SELECT 0, folio
FROM (
SELECT folio FROM regularTable ORDER BY folio
);
You might try iterating a cursor over the SELECT ... ORDER BY and doing the INSERTs within the loop.
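A rough sketch of that cursor idea, written with Python's sqlite3 module rather than Informix: read the folios in order, insert them one by one with an explicit position, then page on the position instead of on folio. The page helper and the sample folios are invented for the illustration; regularTable and tempTable are the names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regularTable (folio INTEGER)")
# Folios with gaps, inserted out of order.
conn.executemany(
    "INSERT INTO regularTable VALUES (?)", [(f,) for f in (40, 10, 35, 20, 50)]
)
conn.execute("CREATE TEMP TABLE tempTable (id INTEGER PRIMARY KEY, folio INTEGER)")

# Iterate over the ordered result and insert row by row, so each folio
# gets a gap-free position number.
ordered = conn.execute("SELECT folio FROM regularTable ORDER BY folio").fetchall()
for position, (folio,) in enumerate(ordered, start=1):
    conn.execute(
        "INSERT INTO tempTable (id, folio) VALUES (?, ?)", (position, folio)
    )

def page(conn, number, size):
    """Return the folios on the given 1-based page."""
    first = (number - 1) * size + 1
    last = number * size
    return [
        row[0]
        for row in conn.execute(
            "SELECT folio FROM tempTable WHERE id BETWEEN ? AND ? ORDER BY id",
            (first, last),
        )
    ]

print(page(conn, 1, 2), page(conn, 2, 2), page(conn, 3, 2))
```

Because the pages are cut on the position column, every page except the last holds exactly the requested number of rows, regardless of gaps in folio.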
It makes no sense to order the rows as you insert into a table. Relational databases do not allow you to specify the order of rows in a table.
Even if you could, SQL does not guarantee a query will return rows in any order, such as the order you inserted them. You must specify an ORDER BY clause to guarantee an order for a query result.
So it would do you no good to change the order in which you insert the rows.
As stated by Bill, there's not a lot of point ordering the input, you really need to order the output. In the simplistic example you've provided, it just makes no sense, so I can only assume that the real problem you're trying to solve is more complex - deduplication perhaps?
The functionality you're after is CREATE SEQUENCE, but I'm pretty sure it's not available in such an old version of Informix.
If you really need to do what you're asking, you could look into UNLOADing the data in the required order, and then LOADing it again. That would ensure the SERIAL values get allocated sequentially.
Would something like this work?
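SELECT
folio
FROM
(
-- sort in the innermost query: ROWNUM is assigned before ORDER BY within the same query block
SELECT
ROWNUM n,
folio
FROM
(SELECT folio FROM regularTable ORDER BY folio)
)
WHERE
n BETWEEN 501 AND 1000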
It may not be terribly efficient if the table grows larger or you're fetching later "pages", but 10K rows is pretty small.
I don't recall if Informix has a ROWNUM concept, I use Oracle.
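For engines that have window functions, the same numbering-then-filtering idea can be written with row_number(). The sketch below uses Python's sqlite3 module (SQLite 3.25 or later assumed), not Informix or Oracle; fetch_page and the sample folios are made up for the illustration.

```python
import sqlite3

def fetch_page(conn, first, last):
    """Return the folios whose sorted position lies between first and last."""
    return [
        row[0]
        for row in conn.execute(
            """
            SELECT folio FROM (
                SELECT folio, row_number() OVER (ORDER BY folio) AS n
                FROM regularTable
            )
            WHERE n BETWEEN ? AND ?
            ORDER BY n
            """,
            (first, last),
        )
    ]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regularTable (folio INTEGER)")
conn.executemany(
    "INSERT INTO regularTable VALUES (?)", [(f,) for f in (5, 1, 9, 3, 7, 12)]
)
print(fetch_page(conn, 2, 4))
```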