I have a SQL Server 2000 table in a production environment with about 80 million rows. I need to add a nullable bit column to the table. Adding a nullable column to a production table is a quick, schema-only change, but I also need to add an index on that column.
Will the table/server lock up when I add the index? Is there a way of achieving this with the least possible impact on performance?
Thanks
You should always test your changes in a staging environment before updating production, and that environment should be a realistic representation of your production environment. If that is not possible for some reason, then make the change after business hours or during non-peak hours.
Adding an index on a bit column is generally not a good idea. Indexes work best when the column has a wide range of distinct values. With only 1, 0, or NULL across 80 million rows, you will get a large index that does little good, so to answer your question, the presence of this index will have a negative impact on server performance.
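For reference, the change being discussed would look roughly like this (the table, column, and index names are placeholders, not taken from the question):

-- Adding the nullable column itself is a quick, metadata-only change
ALTER TABLE dbo.BigTable ADD IsFlagged bit NULL;

-- Building the index scans all 80 million rows and holds locks while it runs,
-- and as noted above, an index on a lone bit column rarely helps the optimizer
CREATE NONCLUSTERED INDEX IX_BigTable_IsFlagged ON dbo.BigTable (IsFlagged);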
I am a newbie in Postgres.
We have implemented SCD type-2 in our project using Postgres. The input file is a full refresh file with approximately 30 million records daily.
Account number is the key column.
The approximate number of new records will be 20K/day.
If a record is missing from the source, that record is closed with an end date in the target. Approximately 10K records are closed per day.
The run time for the query is increasing steadily. Will indexing help speed up the process?
Any suggestion on the index to be used?
Are those 30 million records each stored as a row in the database? If so, creating and maintaining indexes over that many rows will itself be a burden on the database to some extent. However, PostgreSQL has a relatively new index type called the BRIN index which might help you a bit. I wrote a blog post about it some months ago; you can have a look and, of course, research it further.
http://blog.bajratechnologies.com/2016/09/16/Postgres-BRIN-Index/
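As a rough illustration only (the table and column names below are assumptions, not from the question): a BRIN index works best when the indexed column correlates with the physical row order, while the SCD-2 key lookups usually want a plain B-tree:

-- BRIN: tiny index, effective if rows are loaded roughly in load_date order
CREATE INDEX scd_account_load_date_brin
    ON scd_account USING brin (load_date);

-- B-tree on the business key used to match incoming records
CREATE INDEX scd_account_number_idx
    ON scd_account (account_number);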
You'll have to look at the execution plans of the slow queries to determine whether indexes will help and which indexes you should create.
The correct index often helps a lot with a query, and with a read-only database you can create as many as you need.
You should make sure that any indexes are created after you load the table, since indexes slow down inserts a lot. Either drop and recreate the table before the daily load, or truncate it and drop all the indexes, recreating them once the load is finished.
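A sketch of that load pattern, reusing the hypothetical names from above (server-side COPY shown; adapt to however you actually load the file):

-- Drop indexes before the daily full refresh ...
DROP INDEX IF EXISTS scd_account_number_idx;

TRUNCATE TABLE scd_account;
COPY scd_account FROM '/data/daily_full_refresh.csv' WITH (FORMAT csv);

-- ... and rebuild them once the load has completed
CREATE INDEX scd_account_number_idx ON scd_account (account_number);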
I have a simple table that contains a varchar(100) column. I am trying to populate it with 1 billion unique records. I have a stored proc that takes a table type parameter containing 1000 records at a time and inserts them into the table while checking that no duplicate exists. After about 50 million rows the performance goes down. I tried sharding the table and using SQL Server table partitioning with balanced distribution, but no gain was observed.
How can I build this solution in SQL with reasonable performance?
You might want to try de-duping the data before you put it into the database, then disabling the unique key while inserting so you don't have to deal with rebuilding it as you go.
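A minimal sketch of that disable/rebuild pattern, assuming SQL Server and hypothetical table and index names (this only works for a non-clustered unique index):

-- Disabling the unique index stops it (and its uniqueness check)
-- from being maintained during the bulk inserts
ALTER INDEX UQ_BigStrings_Value ON dbo.BigStrings DISABLE;

-- ... run the batched inserts of pre-deduplicated data here ...

-- Rebuilding re-enables the index and re-validates uniqueness in one pass
ALTER INDEX UQ_BigStrings_Value ON dbo.BigStrings REBUILD;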
I have a table in Oracle 11g R2 that contains approximately 90,000 spatial (geographic) records. Hundreds of the records are duplicated due to bad practice of the users.
Is there any way to measure the performance of the database/table before and after removing the duplicates?
A table with 90,000 records is quite a small table. Hundreds of duplicates is less than 1%, which is also quite a small amount of "garbage". That amount can't cause big performance problems (if your application has a good design). I don't think you can create tests that will show any significant difference in performance between "before" and "after".
You can also delete the duplicates and then create a unique constraint to prevent this situation in the future.
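If you go that route, the classic Oracle pattern looks something like this (the table and key column names are assumptions):

-- Keep the row with the lowest ROWID for each key, delete the rest
DELETE FROM parcels p
 WHERE p.ROWID > (SELECT MIN(p2.ROWID)
                    FROM parcels p2
                   WHERE p2.parcel_id = p.parcel_id);

-- Prevent new duplicates from being inserted
ALTER TABLE parcels ADD CONSTRAINT parcels_parcel_id_uq UNIQUE (parcel_id);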
One way to measure global performance of an Oracle database is via the facilities of the Grid Control (aka Enterprise Manager) that shows a number of measurements (CPU, IOs, memory, etc).
Another way is to run some typical queries in sqlplus (with SET TIMING ON) and compare their response times before the removal and after the removal. That is assuming that by "performance" you mean the elapsed time for those queries.
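For example, in SQL*Plus (the query below is just a stand-in for whatever your typical queries are):

SET TIMING ON

-- Run a representative query; SQL*Plus prints an "Elapsed:" line after it.
-- Repeat before and after the duplicate removal and compare the times.
SELECT COUNT(*)
  FROM parcels
 WHERE region_code = 'NW';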
Like Dmitry said, 90,000 rows is a very small table with a tiny fraction of duplicate rows. The presence or absence of those duplicates is unlikely to make any noticeable difference.
i. Create a temp table from the source table (with the indexes, of course).
ii. Then delete the duplicated rows from the temp table (or from the source, it doesn't matter which).
iii. Compare the explain plans for both tables and you will get the answer.
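A sketch of step iii in Oracle, reusing the hypothetical names from above:

-- Assume parcels_dedup is the temp copy with the duplicates already removed
EXPLAIN PLAN FOR SELECT COUNT(*) FROM parcels WHERE region_code = 'NW';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

EXPLAIN PLAN FOR SELECT COUNT(*) FROM parcels_dedup WHERE region_code = 'NW';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);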
I have a huge table (200 million records). About 70% of it is not needed now (there is a column ACTIVE in the table and those records have the value 'N'). There are a lot of multi-column indexes, but none of them includes that column. Will removing that 70% of records improve SELECT (ACTIVE='Y') performance (because Oracle has to read table blocks with no active records and then exclude them from the final result)? Is a shrink space necessary?
It's really impossible to say without knowing more about your queries.
At one extreme, access by primary key would only improve if the height of the supporting index was reduced, which would probably require deletion of the rows and then a rebuild of the index.
At the other extreme, if you're selecting nearly all active records then a full scan of the table with 70% of the rows removed (and the table shrunk) would take only 30% of the pre-deletion time.
There are many other considerations -- selecting a set of data and accessing the table via indexes, and needing to reject 99% of rows after reading the table because it turns out that there's a positive correlation between the required rows and an inactive status.
One way of dealing with this would be through list partitioning the table on the ACTIVE column. That would move inactive records to a partition that could be eliminated from many queries, with no need to index the column, and would keep the time for full scans of active records down.
If you really do not need these inactive records, why don't you just delete them instead of marking them inactive?
Edit: Furthermore, although indexing a column with a 70/30 split is not generally helpful, you could try a couple of other indexing tricks.
For example, if you have an indexed column which is frequently used in queries (client_id?) then you can add the active flag to that index. You could also construct a partial index:
create index my_table_active_clients
on my_table (case when active = 'Y' then client_id end);
... and then query on:
select ...
from ...
where (case when active = 'Y' then client_id end) = :client_id
This would keep the index smaller, and both indexing approaches would probably be helpful.
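The first trick (adding the active flag to an existing indexed column) might look something like this, with assumed index and column names:

-- Composite index: a query filtering on client_id and active = 'Y' can
-- discard inactive rows from the index alone, without visiting the table
create index my_table_client_active
on my_table (client_id, active);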
Another edit: A beneficial side effect of partitioning could be that it keeps the inactive records and active records "physically" apart, and every block read into memory from the "active" partition of course only has active records. This could have the effect of improving your cache efficiency.
Partitioning, putting the ACTIVE = 'N' records in a separate partition, might be a good option.
http://docs.oracle.com/cd/B19306_01/server.102/b14223/parpart.htm
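A hypothetical sketch of what that could look like if the table is rebuilt via CTAS (the names are made up, and indexes, constraints and grants would need to be recreated as well):

-- List-partition on ACTIVE so queries on active rows can prune the
-- inactive partition entirely
CREATE TABLE my_table_part
PARTITION BY LIST (active)
(
    PARTITION p_active   VALUES ('Y'),
    PARTITION p_inactive VALUES ('N')
)
AS SELECT * FROM my_table;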
Yes, it most likely will. But depending on your access patterns, the improvement will most likely not be that big. Adding an index that includes the column would be a better solution for the future, IMHO.
Most probably not. DELETE will not reduce the size of the table's segment. Additional maintenance might help. After the DELETE, also execute:
ALTER TABLE <tablename> SHRINK SPACE COMPACT;
ALTER INDEX <indexname> SHRINK SPACE COMPACT; -- for every index on the table
Alternatively, you can use the old-school approach:
ALTER TABLE <tablename> MOVE;
ALTER INDEX <indexnamename> REBUILD;
When deleting 70% of a table, also consider CTAS (CREATE TABLE AS SELECT) as a possible solution. It will be much faster.
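A hedged sketch of the CTAS route (table names are placeholders; indexes, constraints, triggers and grants must be recreated on the new table):

-- Copy only the active rows into a new segment, then swap the tables
CREATE TABLE my_table_new AS
    SELECT * FROM my_table WHERE active = 'Y';

-- Recreate indexes, constraints, triggers and grants on my_table_new, then:
DROP TABLE my_table;
RENAME my_table_new TO my_table;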
Indexing plays a vital role in SELECT queries. Performance will increase drastically if you use the indexed columns in the query. Yes, deleting rows will improve performance somewhat, but not drastically.
I have a large table (~40 million records) in SQL Server 2008 R2 that has high traffic on it (it is constantly growing, being selected from and edited...).
Up to now I have been accessing rows in this table by their id (a simple identity key). I have a column, let's call it GUID, that is unique for most of the rows, but some rows share the same value in that column.
That GUID column is nvarchar(max), and the table has about 10 keys and constraints, with an index only on the simple identity key column.
I want to set an index on this column without causing anything to crash or making the table unavailable.
How can I do so?
Please keep in mind this is a large table with high traffic on it, and it must stay online and available.
Thanks
Well, the answer to this one is easy (but you probably won't like it): You can't.
SQL Server requires an index key to be at most 900 bytes. It also requires the key to be stored "in-row" at all times. Because an NVARCHAR(MAX) column can grow significantly larger than that (up to 2 GB) and is also most often stored outside of the standard row data pages, SQL Server does not allow an NVARCHAR(MAX) column in an index key.
One option you have is to make this GUID column an actual UNIQUEIDENTIFIER data type (or at least a CHAR(32)). Indexing GUIDs is still not recommended because they cause high fragmentation, but at least it is possible. However, that is not a quick or simple change, and if you need the table to stay online while you make it, I strongly recommend you get outside help.
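Purely as a hedged sketch of what such a change might look like (hypothetical column and index names; it assumes the existing text values are valid GUIDs, and ONLINE index builds require Enterprise edition):

-- Add a properly typed column and backfill it in small batches to keep locks short
ALTER TABLE dbo.BigTable ADD GuidValue uniqueidentifier NULL;

-- Repeat this batch (e.g. in a WHILE loop) until no rows are updated
UPDATE TOP (10000) dbo.BigTable
   SET GuidValue = CONVERT(uniqueidentifier, GuidText)
 WHERE GuidValue IS NULL
   AND GuidText IS NOT NULL;

-- Non-unique index, since some rows share the same value
CREATE NONCLUSTERED INDEX IX_BigTable_GuidValue
    ON dbo.BigTable (GuidValue)
    WITH (ONLINE = ON);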