Methods to Accelerate Reads From a Large Table - SQL

I am trying to log everything in SQL, so I decided to add a table named log and write everything to it. The log table is:
ID UNIQUEIDENTIFIER -- PK
LogDate DATETIME -- PK
IP NVARCHAR
Action NVARCHAR
Info XML
UniqueID BIGINT
I log everything to this table: logins, permission checks, page views, object access, and so on.
Then I realized I also need some log-restore functionality, so some log records are restorable and some are not. The log table has about 8 million records, but only about 200 thousand of them are restorable, so every restore required a select over 8 million rows. I therefore decided to add a new table, log_restore, and copy the restorable logs into it:
ID UNIQUEIDENTIFIER
LogDate DATETIME
IP NVARCHAR
Action NVARCHAR
Info XML
UniqueID BIGINT -- PK
OK, so logging everything works fine.
But when I need to view the logs, the procedure gets all records from the log table and merges (UNIONs) them with the log_restore table.
So I need to speed up this procedure without affecting inserts (that is, without making them slower). These are my ideas:
When adding a record to log_restore, add it to the log table as well (so the select needs no UNION)
Create a view for this SELECT statement
Add simple-datatype columns instead of XML
Add a clustered PK on a simple-datatype column like BIGINT
What are your ideas? Any suggestions?

In general, one should try to use as little space as possible; it greatly helps reduce disk seeks when executing a query. And comparing smaller datatypes always requires less time!
The following tunings can be made on the columns (a schema sketch follows the list):
use non-nullable columns (decreases storage space, reduces the number of tests)
store LogDate in the form of a timestamp (UNSIGNED INT, 4 bytes) instead of DATETIME (8 bytes)
the IP address shouldn't be stored as an NVARCHAR; if you are storing IPv4 addresses, 4 bytes would be enough (BINARY(4)), while IPv6 support requires 18 bytes (VARBINARY(16)). Meanwhile, NVARCHAR would require 30 bytes for IPv4 and 78 bytes for IPv6... (search the web for inet_ntoa, inet_aton, inet_ntop, inet_pton to learn how to switch between the binary and string representations of the addresses)
instead of storing similar data in two separate tables, add a Restorable flag column of type BIT indicating whether a log entry can or cannot be restored
your idea about the Info column is right: it would be better to use a TEXT or NTEXT data type
instead of using an NVARCHAR type for Action, you could consider having an Action table containing all the possible actions (assuming they are finite in number) and referencing them with an integer foreign key (the smaller the int, the better)
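Putting those points together, a tuned log table might look roughly like this (a sketch only: the column sizes, the Actions lookup table and its SMALLINT key are assumptions, and NVARCHAR(MAX) stands in for the TEXT/NTEXT suggestion above):
CREATE TABLE dbo.Actions (
    ActionID SMALLINT NOT NULL PRIMARY KEY,
    Name VARCHAR(100) NOT NULL -- one row per possible action
);
CREATE TABLE dbo.log (
    UniqueID BIGINT NOT NULL PRIMARY KEY, -- clustered by default
    LogDate DATETIME NOT NULL, -- or an integer epoch value, as suggested above
    IP VARBINARY(16) NOT NULL, -- 4 bytes for IPv4, 16 for IPv6
    ActionID SMALLINT NOT NULL REFERENCES dbo.Actions (ActionID),
    Restorable BIT NOT NULL DEFAULT (0), -- replaces the separate log_restore table
    Info NVARCHAR(MAX) NULL
);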
Index optimization is very important too. Use an index on multiple columns if your query tests multiple columns at the same time. For example, if you select all the restorable rows corresponding to a specific IP over a certain range of time, this would greatly enhance the speed of the query:
CREATE NONCLUSTERED INDEX IX_IndexName ON log (Restorable ASC, IP ASC, LogDate ASC)
If you need to retrieve all the restorable rows from an IP address corresponding to a specific action, over a given range of time, the index should be chosen like this:
CREATE NONCLUSTERED INDEX IX_IndexName ON log (IP ASC, Action ASC, LogDate ASC)
Etc.
To be honest, I would actually need to see your full SQL query in order to do proper optimization...

Options for table enhancements:
Add a Restorable BIT NULL column and create a filtered index on it (see the sketch after this list).
The XML data type is a LOB data type and is stored outside the row. If you are not using any of the XML data type methods, then you do not need it, and it hampers your performance a lot. Add an XML_code VARCHAR(...) NULL column (choose a length as described below) and copy all data from your XML column into it.
Choose the length of the column so as to keep the maximum row size (the total maximum size of all columns) under 8 KB. A VARCHAR(MAX) column may be stored in-row if the row fits in 8 KB, so if you have a significant number of short XML values, VARCHAR(MAX) can help.
If you are not working with Unicode data, change all NVARCHAR columns to VARCHAR.
Use a UNION ALL with a WHERE clause to filter duplicates instead of a plain UNION.
The UNIQUEIDENTIFIER column does not help you here. If two records cannot have the same DATETIME (or maybe DATETIME2) value, then that column can be your unique ID on its own. Alternatively, consider changing the ID column to an INT, as you can order by an INT in a sensible manner.
Regarding your own ideas: idea (4) will not help. Create indexes on both tables that match your WHERE clauses and JOIN columns.
Work in several iterations: simplify the data types, then check performance; create index(es), then check again; and so on. There should be a balance between minimizing the space used and usability; you may want to keep some data as text rather than encoding it to int or binary.
Use Profiler or the Database Engine Tuning Advisor to determine bottlenecks and improvement opportunities.
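For the first and fourth points, here is a sketch of what the filtered index and the UNION ALL query could look like (column names follow the question; the index name and INCLUDE list are made up):
-- Filtered index covering only the restorable rows (about 200 thousand of 8 million)
CREATE NONCLUSTERED INDEX IX_log_Restorable
    ON dbo.log (LogDate)
    INCLUDE (IP, Action, UniqueID)
    WHERE Restorable = 1;
-- UNION ALL skips the implicit DISTINCT sort that UNION performs;
-- the WHERE clause keeps rows present in both tables from appearing twice
SELECT ID, LogDate, IP, Action, UniqueID FROM dbo.log
UNION ALL
SELECT ID, LogDate, IP, Action, UniqueID FROM dbo.log_restore AS r
WHERE NOT EXISTS (SELECT 1 FROM dbo.log AS l WHERE l.ID = r.ID);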

I'm going to make a wild guess that the property that identifies whether something is "restorable" or not is the Action column. If that is the case, then partition the table by that column and forget about the log_restore table.
MSDN - Partitioned Table and Index Concepts
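A rough sketch of what that could look like, assuming Action is bounded to NVARCHAR(50) (a MAX type cannot be a partition column) and that the restorable action names happen to sort into one contiguous range; every name below is illustrative:
CREATE PARTITION FUNCTION pf_LogByAction (NVARCHAR(50))
    AS RANGE RIGHT FOR VALUES (N'Restore'); -- rows with Action >= N'Restore' land in partition 2
CREATE PARTITION SCHEME ps_LogByAction
    AS PARTITION pf_LogByAction ALL TO ([PRIMARY]);
-- Creating (or rebuilding) the clustered index on the partition scheme physically
-- moves the rows into their partitions; drop an existing clustered PK first if there is one
CREATE CLUSTERED INDEX CIX_log_Action_LogDate
    ON dbo.log (Action, LogDate)
    ON ps_LogByAction (Action);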

The first thing you have to worry about is the memory of the machine: how much does the server have? Then compare that with the size of your database, or maybe just the size of the table you are working on. If the memory is too low compared with the size of the table, then you have to add more memory to the server. That's the first thing you have to do.
A Sysadmin’s Guide to Microsoft SQL Server Memory

Related

Best type for ID column in Activity Log table

I need to choose a type for the ID column in an Activity Log table, which would look like this:
ID | Type               | UserID
   | User Logged in     | UUID1
   | Something happened | UUID2
Any thoughts or recommendations?
You basically have three choices for an auto-generated id:
int (typically 4 bytes)
bigint (typically 8 bytes)
universal unique identifier (UUID) of some sort, depending on the database
If you don't expect your table to grow into billions of rows, then use int. If your table can get really big (think millions of inserts per day), then use bigint. Both of these preserve insertion order, which can be quite convenient.
If you want to anonymize the inserts, then use a UUID. Do note that this takes up more space, and because the values are not (necessarily) generated in order, they can cause table or index fragmentation.
I think it is fair to say that the most typical type would be int. Also note that you may be using a database that has other options.
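For instance, in SQL Server the int/bigint options are typically paired with IDENTITY; the table below is an invented example, not the poster's actual schema:
CREATE TABLE dbo.ActivityLog (
    ID BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- preserves insertion order
    Type NVARCHAR(100) NOT NULL,
    UserID UNIQUEIDENTIFIER NOT NULL -- the user's UUID, as in the example rows
);
-- UUID alternative for the log's own key:
-- ID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY
-- (NEWSEQUENTIALID() avoids much of the fragmentation that random NEWID() values cause)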

Adding a Column to a Table in SQL Server Takes a Long Time

I want to add a new column to a SQL Server table with about 20,000 rows using this query:
ALTER TABLE [db].[dbo].[table]
ADD entry_date date;
The query is still running; it has been going for 10 minutes so far. Is that normal?
10 minutes seems like a long time, but adding a column takes time.
Why? Adding a column (generally) requires changing the layout of the table on the pages where the data is stored. In your case, the entry date is 3 bytes, so every row needs to be expanded by 3 bytes -- and still fit on one data page.
Adding columns generally requires rewriting all data.
Also note that while this is occurring, locks and transactions are also involved. So, if other activity on the server is accessing the table (particularly data changes), then that also affects performance.
Oh, did I mention that indexes are involved as well? Indexes care about the physical location of data.
All that said, 10 minutes seems like a long time for 20,000 rows. You might find that things go much faster if you do the following (a sketch follows the list):
Create a new table with the columns you want -- but no indexes except for the primary key or clustered index.
Insert the data from the existing table.
Add whatever indexes and triggers you want.
Of course, you need to be careful if you have an identity column.
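A rough sketch of that rebuild, using invented column names and assuming an INT IDENTITY primary key:
-- New table with the extra column and only the primary key
CREATE TABLE dbo.table_new (
    id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    payload NVARCHAR(200) NULL,
    entry_date DATE NULL
);
-- Keep the original identity values while copying
SET IDENTITY_INSERT dbo.table_new ON;
INSERT INTO dbo.table_new (id, payload, entry_date)
SELECT id, payload, NULL FROM dbo.[table];
SET IDENTITY_INSERT dbo.table_new OFF;
-- Add the remaining indexes and triggers, then swap the names
EXEC sp_rename 'dbo.[table]', 'table_old';
EXEC sp_rename 'dbo.[table_new]', 'table';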

Alter Column Size in Large SQL Server database causes issues

I have a table with more than 600 million records and a NVARCHAR(MAX) column.
I need to change the column to a fixed size, but with such a large DB I usually have only two options:
Change it using the designer in SSMS, which takes a very long time and perhaps never finishes
Change it using ALTER COLUMN, but then the database doesn't work after that, perhaps because we have several indexes
My questions would be: what are possible solutions to achieve the desired results without errors?
Note: I use SQL Server on Azure and this database is in production
Thank you
Clarification: all the current data already fits within the new length that I want to set.
I've never done this on Azure, but I have done it with a hundred million rows on SQL Server.
I would add a new column of the desired size, allowing NULL values of course, and create the desired index(es).
After that, I would update this new column in chunks, trimming/converting the old column values into the new one. Finally, remove the indexes on the old column and drop it.
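A sketch of that chunked backfill, with invented table and column names (the batch size is arbitrary):
ALTER TABLE dbo.BigTable ADD Comment_v2 NVARCHAR(400) NULL;
GO
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    -- Small batches keep locks short and let the transaction log truncate
    UPDATE TOP (50000) dbo.BigTable
    SET Comment_v2 = LEFT(Comment, 400)
    WHERE Comment_v2 IS NULL
      AND Comment IS NOT NULL;
    SET @rows = @@ROWCOUNT;
END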
Change it using the designer in SSMS and this takes a very long time
and perhaps never finishes
What SSMS will do behind the scenes is
create a new table with the new data type for your column
copy all the data from the original table to the new table
rename new table with the old name
drop old table
recreate all the indexes on new table
Of course it will take time to copy all of your "more than 600 million records".
Besides, the whole table will be locked for the duration of the data loading with (HOLDLOCK, TABLOCKX).
Change it using Alter Column, but then the database doesn't work after
that, perhaps because we have several indexes
That is not true.
If there is any index involved (and it can only be an index that has that field as an included column, because of its nvarchar(max) data type), the server will give you an error and do nothing until you drop that index. And if no index is affected, it simply cannot be the case that the database "doesn't work after that, perhaps because we have several indexes".
Please think one more time about what you want to achieve by changing the type of that column. If you think you'll gain some space by doing so, that's wrong.
After switching from nvarchar(max) to nchar(n) (from a variable-length type to a fixed-length type), your data will occupy more space than it does now, because you will store a fixed number of bytes even if the column holds NULL or only one to three characters.
And if you just want to change max to something else, for example 4000, you gain nothing, because this is not the real data size but only the maximum size the data can have. And if your strings are small enough, your nvarchar(max) data is already stored in-row, not as LOB data as it was with ntext.

Adding an index to a large table

I have a large table (~40 million records) in SQL Server 2008 R2 that has high traffic on it (constantly growing, selected, and edited).
Up until now I have been accessing rows in this table by its id (a simple identity key). I have a column, let's call it GUID, that is unique for most of the rows, but some rows share the same value for that column.
That GUID column is nvarchar(max), and the table has about 10 keys and constraints, with an index only on the simple identity key column.
I want to set an index on this column without causing anything to crash or making the table unavailable.
How can I do so ?
Please keep in mind this is a large table that has high traffic on, and it must stay online and available
Thanks
Well, the answer to this one is easy (but you probably won't like it): You can't.
SQL Server requires the index key to be less than 900 bytes. It also requires the key to be stored "in-row" at all times. As an NVARCHAR(MAX) column can grow significantly larger than 900 bytes (up to 2 GB) and is also most often stored outside of the standard row data pages, SQL Server does not allow an index key to include an NVARCHAR(MAX) column.
One option you have is to make this GUID column an actual UNIQUEIDENTIFIER datatype (or at least a CHAR(32)). Indexing GUIDs is still not recommended because they cause high fragmentation, but at least it would be possible. However, that is neither a quick nor a simple thing to do, and if you need the table to stay online during this change, I strongly recommend you get outside help.
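One possible shape for that change, sketched with invented names and assuming the nvarchar(max) values really are well-formed GUID strings (ONLINE index builds need Enterprise Edition):
ALTER TABLE dbo.BigTable ADD GuidValue UNIQUEIDENTIFIER NULL;
GO
-- Backfill in small batches to keep locking and log growth under control
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (20000) dbo.BigTable
    SET GuidValue = CONVERT(UNIQUEIDENTIFIER, [GUID]) -- fails on malformed strings, so cleanse those first
    WHERE GuidValue IS NULL
      AND [GUID] IS NOT NULL;
    SET @rows = @@ROWCOUNT;
END
GO
-- Non-unique index, since some rows share the same GUID value
CREATE NONCLUSTERED INDEX IX_BigTable_GuidValue
    ON dbo.BigTable (GuidValue)
    WITH (ONLINE = ON);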

How do I add a column to a large SQL Server table

I have a SQL Server table in production that has millions of rows, and it turns out that I need to add a column to it. Or, to be more accurate, I need to add a field to the entity that the table represents.
Syntactically this isn't a problem, and if the table didn't have so many rows and wasn't in production, this would be easy.
Really what I'm after is the course of action. There are plenty of websites out there with extremely large tables, and they must add fields from time to time. How do they do it without substantial downtime?
One thing I should add: I did not want the column to allow nulls, which means that I'd need to have a default value.
So I either need to figure out how to add a column with a default value in a timely manner, or I need to figure out a way to populate the column later and then set it to not allow nulls.
ALTER TABLE table1 ADD
newcolumn int NULL
GO
should not take that long... What takes a long time is inserting columns in the middle of other columns, because then the engine needs to create a new table and copy the data to the new table.
I did not want the column to allow nulls, which would mean that I'd need to have a default value.
Adding a NOT NULL column with a DEFAULT constraint to a table of any number of rows (even billions) became a lot easier starting in SQL Server 2012 (but only for Enterprise Edition), as they allowed it to be an online operation (in most cases) where, for existing rows, the value is read from metadata and not actually stored in the row until the row is updated or the clustered index is rebuilt. Rather than paraphrase any more, here is the relevant section from the MSDN page for ALTER TABLE:
Adding NOT NULL Columns as an Online Operation
Starting with SQL Server 2012 Enterprise Edition, adding a NOT NULL column with a default value is an online operation when the default value is a runtime constant. This means that the operation is completed almost instantaneously regardless of the number of rows in the table. This is because the existing rows in the table are not updated during the operation; instead, the default value is stored only in the metadata of the table and the value is looked up as needed in queries that access these rows. This behavior is automatic; no additional syntax is required to implement the online operation beyond the ADD COLUMN syntax. A runtime constant is an expression that produces the same value at runtime for each row in the table regardless of its determinism. For example, the constant expression "My temporary data", or the system function GETUTCDATETIME() are runtime constants. In contrast, the functions NEWID() or NEWSEQUENTIALID() are not runtime constants because a unique value is produced for each row in the table. Adding a NOT NULL column with a default value that is not a runtime constant is always performed offline and an exclusive (SCH-M) lock is acquired for the duration of the operation.
While the existing rows reference the value stored in metadata, the default value is stored on the row for any new rows that are inserted and do not specify another value for the column. The default value stored in metadata is moved to an existing row when the row is updated (even if the actual column is not specified in the UPDATE statement), or if the table or clustered index is rebuilt.
Columns of type varchar(max), nvarchar(max), varbinary(max), xml, text, ntext, image, hierarchyid, geometry, geography, or CLR UDTS, cannot be added in an online operation. A column cannot be added online if doing so causes the maximum possible row size to exceed the 8,060 byte limit. The column is added as an offline operation in this case.
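As an illustration of the behavior that excerpt describes (the table, column, and constraint names are invented):
-- On SQL Server 2012+ Enterprise Edition this is a near-instant, metadata-only change,
-- because GETUTCDATE() is a runtime constant
ALTER TABLE dbo.Orders
    ADD CreatedUtc DATETIME NOT NULL
    CONSTRAINT DF_Orders_CreatedUtc DEFAULT (GETUTCDATE());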
The only real solution for continuous uptime is redundancy.
I acknowledge @Nestor's answer that adding a new column shouldn't take long in SQL Server, but nevertheless, it could still be an outage that is not acceptable on a production system. An alternative is to make the change in a parallel system and then, once the operation is complete, swap the new for the old.
For example, if you need to add a column, you may create a copy of the table, then add the column to that copy, and then use sp_rename() to move the old table aside and the new table into place.
If you have referential integrity constraints pointing to this table, this can make the swap even more tricky. You probably have to drop the constraints briefly as you swap the tables.
For some kinds of complex upgrades, you could completely duplicate the database on a separate server host. Once that's ready, just swap the DNS entries for the two servers and voilà!
I supported a stock exchange company in the 1990's who ran three duplicate database servers at all times. That way they could implement upgrades on one server, while retaining one production server and one failover server. Their operations had a standard procedure of rotating the three machines through production, failover, and maintenance roles every day. When they needed to upgrade hardware, software, or alter the database schema, it took three days to propagate the change through their servers, but they could do it with no interruption in service. All thanks to redundancy.
"Add the column and then perform relatively small UPDATE batches to populate the column with a default value. That should prevent any noticeable slowdowns"
And after that you have to set the column to NOT NULL, which will fire off as one big transaction. So everything will run really fast until you do that, and you have probably gained very little overall. I only know this from first-hand experience.
You might want to rename the current table from X to Y. You can do this with the command sp_rename '[OldTableName]', '[NewTableName]'.
Recreate the new table as X with the new column set to NOT NULL, and then batch insert from Y to X, supplying a default value for the new column either in your insert or as a default constraint on the new column when you recreate table X.
I have done this type of change on a table with hundreds of millions of rows. It still took over an hour, but it didn't blow out our trans log. When I tried to just change the column to NOT NULL with all the data in the table it took over 20 hours before I killed the process.
Have you tested just adding a column, filling it with data, and then setting the column to NOT NULL?
So in the end I don't think there's a magic bullet.
SELECT INTO a new table and rename. Example, adding column i to table A:
select *, 1 as i
into A_tmp
from A_tbl
-- Add any indexes here
exec sp_rename 'A_tbl', 'A_old'
exec sp_rename 'A_tmp', 'A_tbl'
Should be fast and won't touch your transaction log like inserting in batches might.
(I just did this today with a 70 million row table in under 2 minutes.)
You can wrap it in a transaction if you need it to be an online operation (something might change in the table between the select into and the renames).
Another technique is to add the column to a new related table (assume a one-to-one relationship, which you can enforce by giving the FK a unique index). You can then populate this in batches, and then add the join to this table wherever you want the data to appear. Note that I would only consider this for a column that I would not want to use in every query on the original table, or if the record width of my original table was getting too large, or if I was adding several columns.
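A sketch of that side-table approach, with invented names:
CREATE TABLE dbo.OrderExtra (
    OrderID INT NOT NULL
        CONSTRAINT FK_OrderExtra_Orders REFERENCES dbo.Orders (OrderID),
    NewValue INT NOT NULL,
    CONSTRAINT UQ_OrderExtra_OrderID UNIQUE (OrderID) -- enforces the one-to-one relationship
);
-- Populate in batches, then join it in only where the new data is needed:
SELECT o.OrderID, o.Total, e.NewValue
FROM dbo.Orders AS o
LEFT JOIN dbo.OrderExtra AS e ON e.OrderID = o.OrderID;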