My scenario is like I have thousands of row in a table. And the table has 8 columns. I would like to check whether the exact same row exists in the table. My requirement I shouldn't insert the same row in the table twice. Obviously, to do so I have to compare every row before any new insertion. I don't have much SQL expertise, so I don't know what is the best way to so that I can gain maximum efficiency. Please advice me.
A simple option is to create unique index that contains all those columns, e.g.
create unique index ui1_your_table on your_table (col1, col2,..., col8);
Database would take care about the rest.
Related
It is possible to maintain a table of a database organized alphabetically
through triggers whenever you insert a new row like this:
INSERT INTO Software (name_software) VALUES ('linux');
name_software
1 windows
2 CAD
name_software
1 CAD
2 linux
3 windows
I am using the sybase central. I apologize if my post seems very inconsistent tried to explain in the simplest way.
Thank you.
The order of rows in a table (physically in the database) is decided by a clustered index. Put one on the name_software column and that's it.
But
1) you really don't "need" to sort the data in the table physically like this. It is a database... :) You can sort it by a query.
2) clustered index is most often on primary key and there can of course be only one on a table...
You want to re-order the entire table (and re-seed the identity column for that table) each time you insert (or update) a record?
Why can't you include ORDER BY ASC in your query when retrieving data instead?
I have a table people with less than 100,000 records and I have taken a backup of this table using the following:
create table people_backup as select * from people
I add some new records to my people table over time, but eventually I want to merge the records from my backup table into people. Unfortunately I cannot simply DROP my table as my new records will be lost!
So I want to update the records in my people table using the records from people_backup, based on their primary key id and I have found 2 ways to do this:
MERGE the tables together
use some sort of fancy correlated update
Great! However, both of these methods use SET and make me specify what columns I want to update. Unfortunately I am lazy and the structure of people may change over time and while my CTAS statement doesn't need to be updated, my update/merge script will need changes, which feels like unnecessary work for me.
Is there a way merge entire rows without having to specify columns? I see here that not specifying columns during an INSERT will direct SQL to insert values by order, can the same methodology be applied here, is this safe?
NB: The structure of the table will not change between backups
Given that your table is small, you could simply
DELETE FROM table t
WHERE EXISTS( SELECT 1
FROM backup b
WHERE t.key = b.key );
INSERT INTO table
SELECT *
FROM backup;
That is slow and not particularly elegant (particularly if most of the data from the backup hasn't changed) but assuming the columns in the two tables match, it does allow you to not list out the columns. Personally, I'd much prefer writing out the column names (presumably those don't change all that often) so that I could do an update.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
select bottom rows in natural order
People imagine that i have this table :
persons
columns of the table are NAME and ID
and i insert this
insert into persons values ('name','id');
insert into persons values ('John','1');
insert into persons values ('Jack','3');
insert into persons values ('Alice','2');
How can i select this information order by the insertion? My query would like :
NAME ID
name id
John 1
Jack 3
Alice 2
Without indexs (autoincrements), it's possible?
I'm pretty sure its not. From my knowldege sql data order is not sequetional with respect to insertion. The only idea I have is along with each insertion have a timestamp and sort by that time stamp
This is not possible without adding a column or table containing a timestamp. You could add a timestamp column or create another table containing IDs and a timestamp and insert in to that at the same time.
You cannot have any assumptions about how the DBMS will store data and retrieve them without specifying order by clause. I.e. PostgreSQL uses MVCC and if you update any row, physically a new copy of a row will be created at the end of a table datafile. Using a plain select causes pg to use sequence scan scenario - it means that the last updated row will be returned as the last one.
I have to agree with the other answers, Without a specific field/column todo this... well its a unreliable way... While i have not actually ever had a table without an index before i think..
you will need something to index it by, You can go with many other approaches and methods... For example, you use some form of concat/join of strings and then split/separate the query results later.
--EDIT--
For what reason do you wish not to use these methods? time/autoinc
Without storing some sort of order information during insert, the database does not automatically keep track of every record ever inserted and their order (this is probably a good thing ;) ). Autoincrement cannot be avoided... even with timestamp, they can hold same value.
I am making a table that will be borrowing a couple of columns from another table, but not all of them. Right now my new table doesn't have anything in it. I want to add X number of empty tuples to my table so that I can then begin adding data from my old table to my new table via update statements.
Is that the best way to do it? What is the syntax for adding empty rows? Thanks
Instead of inserting nulls and then updating them, cant you just insert data from the other table directly. using something like this -
INSERT INTO Table1 (col1, col2, col3)
SELECT column1, column2, column3
FROM Table2
WHERE <some condition>
If you still want to insert empty records, then you will have to make sure that all your columns allow nulls. Then you can use something like this -
Table1
PrimaryKey_Col | col1 | col2 | col3
Insert INTO Table1 (PrimaryKey_col) values (<some value>)
This will make sure your a new row is inserted with a primary key and the rest of the columns are nulls. And these records can be updated later.
No, this is not even a good way, let alone the best way.
If you look at it conceptually adding empty rows serves no purpose.
In databases each row of a table corresponds to a true statement (fact). Adding a row with all NULLs (even if possible) records nothing and represents inconsistent data on its own. Especially having multiple empty records.
Also, if you are even able to add a row with all NULLs to a table that's an indication that you have no data integrity rules for the row so and that's mostly what databases are about - integrity rules and quality of data. A well designed database should not accept contradictory or meaningless data (and empty row is meaningless)
As Pavanred answered inserting real data is a single command, so there is no benefit in making it two or more commands.
I have a table, and there is no column which stores a field of when the record/row was added. How can I get the latest entry into this table? There would be two cases in this:
Loop through entire table and get the largest ID, if a numeric ID is being used as the identifier. But this would be very inefficient for a large table.
If a random string is being used as the identifier (which is probably very, very bad practise), then this would require more thinking (I personally have no idea other than my first point above).
If I have one field in each row of my table which is numeric, and I want to add it up to get a total (so row 1 has a field which is 3, row 2 has a field which is 7, I want to add all these up and return the total), how would this be done?
Thanks
1) If the id is incremental, "select max(id) as latest from mytable". If a random string was used, there should still be an incremental numeric primary key in addition. Add it. There is no reason not to have one, and databases are optimized to use such a primary key for relations.
2) "select sum(mynumfield) as total from mytable"
for the last thing use a SUM()
SELECT SUM(OrderPrice) AS OrderTotal FROM Orders
assuming they are all in the same column.
Your first question is a bit unclear, but if you want to know when a row was inserted (or updated), then the only way is to record the time when the insert/update occurs. Typically, you use a DEFAULT constraint for inserts and a trigger for updates.
If you want to know the maximum value (which may not necessarily be the last inserted row) then use MAX, as others have said:
SELECT MAX(SomeColumn) FROM dbo.SomeTable
If the column is indexed, MSSQL does not need to read the whole table to answer this query.
For the second question, just do this:
SELECT SUM(SomeColumn) FROM dbo.SomeTable
You might want to look into some SQL books and tutorials to pick up the basic syntax.