I have a SQL query that transfers a fair amount of response data down the wire, but I want to get the total row count as quickly as possible to facilitate binding in the UI. Basically, I need to take a snapshot of all the rows that meet certain criteria and then be able to page through the resulting rows.
Here's what I currently have:
SELECT --primary key column
INTO #tempTable
FROM --some table
--some filter clause
ORDER BY --primary key column
SELECT @@ROWCOUNT
SELECT --the primary key column and some others
FROM #tempTable
JOIN -- some table
DROP TABLE #tempTable
Every once in a while, the query results end up out of order (presumably because I am doing an unordered select from the temp table).
As I see it, I have a couple of options:
Add a second order by clause to the select from the temp table.
Move the order by clause to the second select and let the first select be unordered.
Create the temporary table with a primary key column to force the ordering of the temp table.
What is the best way to do this?
Use number 2. Just because you have a primary key on the table does not mean that the result set of a SELECT statement will be ordered (even if what you see happens to be).
There's no need to order the data when putting it in the temp table, so take that one out. You'll get the same @@ROWCOUNT value either way.
So do this:
SELECT --primary key column
INTO #tempTable
FROM --some table
--some filter clause
SELECT @@ROWCOUNT
SELECT --the primary key column and some others
FROM #tempTable
JOIN -- some table
ORDER BY --primary key column
DROP TABLE #tempTable
Move the order by from the first select to the second select.
A database isn't a spreadsheet. You don't put the data into a table in a particular order.
Just make sure you order it properly when you get it back out.
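As a rough illustration of the advice above, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The items table, the temp_keys temp table, the id >= ? filter and the function names are all made up for the example; cursor.rowcount stands in for @@ROWCOUNT, and LIMIT/OFFSET is just one way to page.

```python
import sqlite3


def take_snapshot(conn, min_id):
    """Copy the matching keys into a temp table and return how many there are."""
    conn.execute("DROP TABLE IF EXISTS temp_keys")
    conn.execute("CREATE TEMP TABLE temp_keys (id INTEGER PRIMARY KEY)")
    # No ORDER BY here: the order rows go in says nothing about the order they come out.
    cur = conn.execute(
        "INSERT INTO temp_keys (id) SELECT id FROM items WHERE id >= ?", (min_id,)
    )
    return cur.rowcount  # rows inserted, the counterpart of @@ROWCOUNT


def fetch_page(conn, page_size, page_number):
    """Return one page of the snapshot, ordered by the key."""
    return conn.execute(
        "SELECT i.id, i.name FROM temp_keys AS k "
        "JOIN items AS i ON i.id = k.id "
        "ORDER BY i.id LIMIT ? OFFSET ?",
        (page_size, page_size * page_number),
    ).fetchall()


def build_demo():
    """Create a hypothetical in-memory items table, inserted out of key order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(3, "c"), (1, "a"), (5, "e"), (2, "b"), (4, "d")],
    )
    return conn
```

The count is available as soon as the snapshot is taken, and every page is sorted by the query that actually returns it, so the paging stays stable no matter how the temp table stores its rows.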
Personally I would select out the data in the order you want to eventually have it. So in your first select, have your order by. That way it can take advantage of any existing indexes and access paths.
I have this table, where every column is a VARCHAR (or equivalent):
field001 field002 field003 field004 field005 .... field500
500 VARCHAR columns. No primary keys. And no column is guaranteed to be unique. So the only way to know for sure if two rows are the same is to compare the values of all columns.
(Yes, this should be in TheDailyWTF. No, it's not my fault. Bear with me here).
I inserted a duplicate set of rows by mistake, and I need to find them and remove them.
There are 12 million rows in this table, so I'd rather not recreate it.
However, I do know what rows were mistakenly inserted (I have the .sql file).
So I figured I'd create another table and load it with those. And then I'd do some sort of join that would compare all columns on both tables and then delete the rows that are equal from the first table. I tried a NATURAL JOIN as that looked promising, but nothing was returned.
What are my options?
I'm using Amazon Redshift (so PostgreSQL 8.4 if I recall), but I think this is a general SQL question.
You can treat the whole row as a single record in Postgres (and thus I think in Redshift).
The following works in Postgres, and will keep one of the duplicates
delete from the_table
where ctid not in (select min(ctid)
from the_table
group by the_table); --<< Yes, the group by is correct!
This is going to be slow!
Grouping over so many columns and then deleting with a NOT IN will take quite some time, especially if a lot of rows are going to be deleted.
If you want to delete all duplicate rows (not keeping any of them), you can use the following:
delete from the_table
where the_table in (select the_table
from the_table
group by the_table
having count(*) > 1);
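For anyone who wants to try the keep-one idea on a toy table first, here is a small sketch in Python with the built-in sqlite3 module. It is not Postgres or Redshift: SQLite has no ctid and no whole-row reference, so this version assumes the implicit rowid as the row identifier and lists the columns explicitly. The three-column table and the function names are only for illustration.

```python
import sqlite3


def delete_duplicates_keep_one(conn, table, columns):
    """Delete exact duplicate rows, keeping the copy with the smallest rowid."""
    column_list = ", ".join(columns)
    # Table and column names are interpolated, so they must come from trusted code.
    sql = (
        f"DELETE FROM {table} WHERE rowid NOT IN "
        f"(SELECT MIN(rowid) FROM {table} GROUP BY {column_list})"
    )
    return conn.execute(sql).rowcount  # number of rows removed


def build_demo():
    """Create a hypothetical three-column stand-in for the 500-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE the_table (field001 TEXT, field002 TEXT, field003 TEXT)"
    )
    conn.executemany(
        "INSERT INTO the_table VALUES (?, ?, ?)",
        [("a", "b", "c"), ("x", "y", "z"), ("a", "b", "c"), ("a", "b", "d")],
    )
    return conn
```

Grouping by every column is what makes two rows count as "the same" here; rows that differ in even one column end up in separate groups and are left alone.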
You should be able to identify all the mistakenly inserted rows using CREATEXID. If you group by CREATEXID on your table as below and get the count, you should be able to see how many rows were inserted in your transaction and then remove them with a DELETE command.
SELECT CREATEXID,COUNT(1)
FROM yourtable
GROUP BY 1;
One simplistic solution is to recreate the table, e.g.
CREATE TABLE my_temp_table (
-- add column definitions here, just like the original table
);
INSERT INTO my_temp_table SELECT DISTINCT * FROM original_table;
DROP TABLE original_table;
ALTER TABLE my_temp_table RENAME TO original_table;
or even
CREATE TABLE my_temp_table AS SELECT DISTINCT * FROM original_table;
DROP TABLE original_table;
ALTER TABLE my_temp_table RENAME TO original_table;
This is a trick, but it may help.
Each row in the table carries the ID of the transaction in which it was inserted or updated, in the xmin system column (see System Columns in the PostgreSQL documentation). Using it, you can find the transaction ID under which you inserted the wrong data. Then just delete the rows using
delete from my_table where xmin = <the_wrong_transaction_id>;
PS: Be careful and try it on a test table first.
Can I select rows on row version?
I am querying a database table periodically for new rows.
I want to store the last row version and then read all rows from the previously stored row version.
I cannot add anything to the table, the PK is not generated sequentially, and there is no date field.
Is there any other way to get all the rows that are new since the last query?
I am creating a new table that contains all the primary keys of the rows that have been processed and will join on that table to get new rows, but I would like to know if there is a better way.
EDIT
This is the table structure:
Everything except product_id and stock_code are fields describing the product.
You can cast the rowversion to a bigint; then, when you read the rows again, cast the column to bigint and compare it against your previously stored value. The problem with this approach is that selecting on the cast of the rowversion forces a table scan each time, which could be slow if your source table is large.
I haven't tried a persisted computed column of this, I'd be interested to know if it works well.
Sample code (Tested in SQL Server 2008R2):
DECLARE @TABLE TABLE
(
Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
Data VARCHAR(10) NOT NULL,
LastChanged ROWVERSION NOT NULL
)
INSERT INTO @TABLE(Data)
VALUES('Hello'), ('World')
SELECT
Id,
Data,
LastChanged,
CAST(LastChanged AS BIGINT)
FROM
@TABLE
DECLARE @Latest BIGINT = (SELECT MAX(CAST(LastChanged AS BIGINT)) FROM @TABLE)
SELECT * FROM @TABLE WHERE CAST(LastChanged AS BIGINT) >= @Latest
EDIT: It seems I've misunderstood, and you don't actually have a ROWVERSION column, you just mentioned row version as a concept. In that case, SQL Server Change Data Capture would be the only thing left I could think of that fits the bill: http://technet.microsoft.com/en-us/library/bb500353(v=sql.105).aspx
Not sure if that fits your needs, as you'd need to be able to store the LSN of "the last time you looked" so you can query the CDC tables properly. It lends itself more to data loads than to typical queries.
Assuming you can create a temporary table, the EXCEPT command seems to be what you need:
Copy your table into a temporary table.
The next time you look, select everything from your table EXCEPT everything from the temporary table, and extract the keys you need from the result.
Make sure your temporary table is up to date again.
Note that your temporary table only needs to contain the keys you need. If this is just one column, you can go for a NOT IN rather than EXCEPT.
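The steps above could look roughly like this. It is a sketch in Python with the built-in sqlite3 module, not SQL Server. The product_id and stock_code columns come from the question, but the products table name, the seen_keys table (holding only the keys) and the function names are assumptions made for the example.

```python
import sqlite3


def fetch_new_keys(conn):
    """Return keys that appeared since the last call, then bring the snapshot up to date."""
    new_keys = [
        row[0]
        for row in conn.execute(
            "SELECT product_id FROM products "
            "EXCEPT SELECT product_id FROM seen_keys "
            "ORDER BY product_id"
        )
    ]
    # Record the keys just returned so the next call skips them.
    conn.executemany(
        "INSERT INTO seen_keys (product_id) VALUES (?)", [(k,) for k in new_keys]
    )
    return new_keys


def build_demo():
    """Create a hypothetical products table with non-sequential keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (product_id TEXT PRIMARY KEY, stock_code TEXT)"
    )
    conn.execute("CREATE TABLE seen_keys (product_id TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)", [("p-9", "s9"), ("p-2", "s2")]
    )
    return conn
```

Note that this only detects new rows. Changes to rows whose key has already been seen go unnoticed unless the snapshot also stores the other columns.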
My table has many duplicate records:
SELECT ENROLMENT_NO_DATE, COUNT(ENROLMENT_NO_DATE) AS NumOccurrences
FROM Import_Master GROUP BY ENROLMENT_NO_DATE HAVING ( COUNT(ENROLMENT_NO_DATE) > 1 )
I need to remove a duplicate record if it occurs a second time, keeping the first (or any one) of the records. How can I do that?
You can use a CTE to perform this task:
;with cte as
(
select ENROLMENT_NO_DATE,
row_number() over(partition by ENROLMENT_NO_DATE order by ENROLMENT_NO_DATE) rn
from Import_Master
)
delete from cte where rn > 1
See SQL Fiddle with Demo
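If you want to experiment with the ROW_NUMBER() idea outside SQL Server, here is a sketch in Python with the built-in sqlite3 module. It assumes SQLite 3.25 or newer for window functions. SQLite cannot delete through a CTE the way SQL Server does, so this version deletes by rowid instead, and it orders each partition by rowid so that the first-inserted row is the one kept. The name column and the function names are invented for the demo.

```python
import sqlite3


def delete_repeats(conn):
    """Delete every row after the first within each ENROLMENT_NO_DATE group."""
    cur = conn.execute(
        "DELETE FROM Import_Master WHERE rowid IN ("
        "  SELECT rid FROM ("
        "    SELECT rowid AS rid,"
        "           ROW_NUMBER() OVER ("
        "             PARTITION BY ENROLMENT_NO_DATE ORDER BY rowid"
        "           ) AS rn"
        "    FROM Import_Master"
        "  ) WHERE rn > 1"
        ")"
    )
    return cur.rowcount  # number of rows removed


def build_demo():
    """Create a hypothetical Import_Master table with one repeated key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Import_Master (ENROLMENT_NO_DATE TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO Import_Master VALUES (?, ?)",
        [("E1", "first"), ("E1", "second"), ("E2", "only"), ("E1", "third")],
    )
    return conn
```

Ordering the window by something other than the partition column is what makes the surviving row predictable; ordering by ENROLMENT_NO_DATE alone leaves the choice arbitrary.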
One method could be to create a secondary, temporary table
CREATE TABLE Import_Master_Deduped AS SELECT * FROM Import_Master WHERE FALSE;
This will create an empty table with identical structure to Import_Master. Now impose uniqueness on the new table with an index:
CREATE UNIQUE INDEX Import_Master_Ndx ON Import_Master_Deduped(ENROLMENT_NO_DATE);
Finally copy the table with duplicated records inside with INSERT IGNORE, so that duplicated records will not get inserted:
INSERT IGNORE INTO Import_Master_Deduped SELECT * FROM Import_Master;
At this point, after checking everything is OK, you can rename the two tables swapping their names (this will lose any old indexes), or TRUNCATE the Import_Master table and copy back the deduped records from the new table into the old.
In the second case, recreate the UNIQUE constraint on the old table to avoid further duplicates; in the first, recreate any old indexes on the new table.
Finally, you remove the table you don't need anymore.
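Here is the same copy-and-ignore idea as a sketch in Python with the built-in sqlite3 module. SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE and uses WHERE 0 rather than WHERE FALSE for portability across versions; the name column and the function names are assumptions. The ORDER BY rowid on the copy is my addition so that the first-inserted duplicate is the one that survives.

```python
import sqlite3


def copy_without_duplicates(conn):
    """Copy Import_Master into a new table that rejects repeated keys."""
    # Empty table with the same columns as the original.
    conn.execute(
        "CREATE TABLE Import_Master_Deduped AS "
        "SELECT * FROM Import_Master WHERE 0"
    )
    conn.execute(
        "CREATE UNIQUE INDEX Import_Master_Ndx "
        "ON Import_Master_Deduped (ENROLMENT_NO_DATE)"
    )
    # Rows that would violate the unique index are skipped silently.
    conn.execute(
        "INSERT OR IGNORE INTO Import_Master_Deduped "
        "SELECT * FROM Import_Master ORDER BY rowid"
    )


def build_demo():
    """Create a hypothetical Import_Master table with one repeated key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Import_Master (ENROLMENT_NO_DATE TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO Import_Master VALUES (?, ?)",
        [("E1", "first"), ("E1", "second"), ("E2", "only"), ("E1", "third")],
    )
    return conn
```

The original table is left untouched, so you can compare row counts before swapping or truncating anything.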
I am trying to insert values from a table into another existing table and have just the values I am inserting be sorted in descending order based on a specific column while leaving the existing records at the top of the table. How do I do that? I have tried to use an Order By statement but whether I use the column name of the table I'm pulling from or the destination table's column name I get an error. Also this is being run in VBA using DoCmd.RunSQL.
Here is my existing query:
INSERT INTO AllMetersAvgRSSI
(longitude,latitude,AvgRSSI)
Select
Prem.longitude, Prem.latitude,
DataByColl.[Avg RSSI]
From [Prem]
Left
Join DataByColl ON (Prem.meter_miu_id
= DataByColl.[MIU ID])
Order BY [AvgRSSI] desc
Final Result
I continued to fiddle with this and discovered that you can use an ORDER BY just as I have shown above to do exactly what I was trying to do. The problem was apparently caused by the name of the column I wanted sorted differing between the tables: Avg RSSI in the source and AvgRSSI in the destination. When I changed the destination table to use the same field name as the source table, it ordered the incoming rows while leaving the existing rows alone. I also ran a test where I changed the name in the destination table to AverageRSSI, and it worked the same way. So in the end the problem was that the field names differed only by a space. The final query is:
INSERT INTO AllMetersAvgRSSI
(longitude,latitude,[Avg RSSI])
Select
Prem.longitude, Prem.latitude,
DataByColl.[Avg RSSI]
From [Prem]
Left
Join DataByColl ON (Prem.meter_miu_id
= DataByColl.[MIU ID])
Order BY [Avg RSSI] desc
Ordering in an INSERT makes no sense from a database standpoint. How the database puts the rows into a table depends on the underlying physical structure of the table, not the order in which they are inserted.
Maybe your application relies on an auto incrementing column being in a certain order which would then be dependent on the order of insertion, but if that's the case then I would say that you've made a mistake in your database design as there shouldn't be business logic designed around an auto incrementing column.
Remove the ORDER BY from your INSERT statement and if you need to retrieve rows in a particular order later then use an ORDER BY there.
Create a temp table and add the first result set in the desired order. Insert your new values into the table, then query the table with an ORDER BY and put the results into your temp table. When you select from your temp table, the results will usually come back in the order you added them, but that order is only guaranteed if you apply another ORDER BY.
Don't forget to drop your temp table after displaying the results.
I want to run the following sql command:
ALTER TABLE `my_table` ADD UNIQUE (
`ref_id` ,
`type`
);
The problem is that some of the data in the table would make this invalid, therefore altering the table fails.
Is there a clever way in MySQL to delete the duplicate rows?
SQL can, at best, handle this arbitrarily. To put it another way: this is your problem.
You have data that currently isn't unique. You want to make it unique. You need to decide how to handle the duplicates.
There are a variety of ways of handling this:
Modifying or deleting duplicate rows by hand if the numbers are sufficiently small;
Running statements to update or delete duplicates that meet certain criteria to get to a point where the exceptions can be dealt with on an individual basis;
Copying the data to a temporary table, emptying the original and using queries to repopulate the table; and
so on.
Note: these all require user intervention.
You could of course just copy the table to a temporary table, empty the original and copy in the rows just ignoring those that fail but I expect that won't give you the results that you really want.
If you don't care which row gets deleted, use IGNORE (note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this works only on older versions):
ALTER IGNORE TABLE `my_table` ADD UNIQUE (
`ref_id` ,
`type`
);
What you can do is add a temporary identity column to your table. With that, you can write a query to identify and delete the duplicates (you can modify the query a little to make sure exactly one copy from each set of duplicate rows is retained).
Once this is done, drop the temporary column and add the unique constraint to your original columns.
Hope this helps.
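A minimal sketch of that idea, in Python with the built-in sqlite3 module rather than MySQL. SQLite already gives every row an implicit rowid, so the sketch assumes that in place of adding and dropping a temporary identity column. The note column, the index name and the function names are invented for the example.

```python
import sqlite3


def dedupe_and_add_unique(conn):
    """Keep the lowest-rowid row per (ref_id, type), then enforce uniqueness."""
    deleted = conn.execute(
        "DELETE FROM my_table WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM my_table GROUP BY ref_id, type)"
    ).rowcount
    # This would fail if any duplicate (ref_id, type) pair were still present.
    conn.execute(
        "CREATE UNIQUE INDEX my_table_ref_type ON my_table (ref_id, type)"
    )
    return deleted


def build_demo():
    """Create a hypothetical my_table with one duplicated (ref_id, type) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (ref_id INTEGER, type TEXT, note TEXT)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?)",
        [(1, "a", "keep"), (1, "a", "dup"), (1, "b", "keep"), (2, "a", "keep")],
    )
    return conn
```

Which copy survives is a policy decision; MIN(rowid) simply keeps the oldest row, and you would swap in your own rule if another copy is the one worth keeping.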
What I've done in the past is export the unique set of data, drop the table, recreate it with the unique columns and import the data.
It is often faster than trying to figure out how to delete the duplicate data.
There is a good KB article that provides a step-by-step approach to finding and removing rows that have duplicate values. It provides two approaches - a one-off approach for finding and removing a single row and a broader solution to solving this when many rows are involved.
http://support.microsoft.com/kb/139444
Here is a snippet I used to delete duplicate rows in one of the tables
BEGIN TRANSACTION
Select *,
rank() over (Partition by PolicyId, PlanSeqNum, BaseProductSeqNum,
CoInsrTypeCd, SupplierTypeSeqNum
order by CoInsrAmt desc) as MyRank
into #tmpTable
from PlanCoInsr
select distinct PolicyId,PlanSeqNum,BaseProductSeqNum,
SupplierTypeSeqNum, CoInsrTypeCd, CoInsrAmt
into #tmpTable2
from #tmpTable where MyRank=1
truncate table PlanCoInsr
insert into PlanCoInsr
select * from #tmpTable2
drop table #tmpTable
drop table #tmpTable2
COMMIT
This worked for me:
ALTER TABLE table_name ADD UNIQUE KEY field_name (field_name)
You will have to find some other field that is unique because deleting on ref_id and type alone will delete them all.
To get the duplicates:
select ref_id, type from my_table group by ref_id, type having count(*)>1
Xaprb has some clever tricks (maybe too clever): http://www.xaprb.com/blog/2007/02/06/how-to-delete-duplicate-rows-with-sql-part-2/