SQL Server partitioning and index strategy - sql

I'm looking for a partitioning solution, and, obviously, I'm interested in the performance side of it: I don't want to hurt query performance too much while gaining the maintenance benefits.
My question is about one of my tables, which looks similar to this:
Id bigint IDENTITY PRIMARY KEY CLUSTERED,
DatetimeFiled,
/*
a lot of other fields
*/
According to the data structure and usage, it's suitable to split the table into partitions by the DatetimeFiled (classic), because I have filters by date on this table.
But I do have filters by Id as well. Moreover, I have JOINs that use the Id field as a predicate, which now benefits from its uniqueness (https://www.brentozar.com/archive/2015/08/performance-benefits-of-unique-indexes/).
So, I decided to use (Id, DatetimeFiled) as the UNIQUE CLUSTERED INDEX.
But I'm not sure whether it will still benefit JOINs on the Id field.
And is it OK to use that column order? I've usually seen the partitioning column listed first.

Using a trailing clustered index column as the partition column is a common and useful approach. You can find rows by Id by seeking the Clustered Index in each partition, and you can find rows by DatetimeFiled through partition elimination.
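For illustration, here's a minimal T-SQL sketch of that setup; the partition boundaries, filegroup, and object names are assumptions, not from the original post:

CREATE PARTITION FUNCTION pf_ByMonth (datetime2)
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

CREATE PARTITION SCHEME ps_ByMonth
AS PARTITION pf_ByMonth ALL TO ([PRIMARY]);

CREATE TABLE dbo.MyTable
(
    Id            bigint IDENTITY NOT NULL,
    DatetimeFiled datetime2 NOT NULL
    /* a lot of other fields */
);

-- Unique clustered index: Id leads (helps the joins on Id), and the partition
-- column trails (a unique index on a partitioned table must contain it).
CREATE UNIQUE CLUSTERED INDEX CIX_MyTable
    ON dbo.MyTable (Id, DatetimeFiled)
    ON ps_ByMonth (DatetimeFiled);

A query that filters only on Id still seeks, but it has to probe each partition; a query that also filters on DatetimeFiled gets partition elimination.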

Related

More tables with the same clustered index, can I omit the PK index on the satellite table?

I have multiple tables storing call data, and they have the same clustered index: start_time (DATETIME). The base table is "calls", and I have a "calls_participants" and a "calls_other_data". All tables also have a call_id CHAR(36) column which identifies a call, so it should be indexed of course.
I am going to store a lot of rows (1 billion) and would like to save space and maintenance costs if possible, so my idea is to index the call_id column only on the base table, and so the other tables would not have any index apart from the CLUSTERED start_time index.
Then if I would have to access a row in the calls_other_data table based on the call_id, I would write something like this:
SELECT cod.some_column
FROM calls_other_data cod
WHERE cod.start_time = (SELECT start_time
                        FROM calls
                        WHERE call_id = '36-chars-unique-value')
  AND cod.call_id = '36-chars-unique-value'
I would say the performance of this query is about the same as if there were an index on calls_other_data.call_id, since the calls.call_id index can be used the same way: the start_time value (the clustering key) is included automatically, so SQL Server has to perform the same steps:
Index seek on (either table).call_id to get the start_time
Clustered index seek on calls_other_data.start_time
I just never read about such design and would like to read other people's opinion about it :) Are you aware of any drawbacks?
Obviously, if a row is missing from the calls table, then it will be hard to look for it in the other tables, but that I do not mind.
Thanks :)
I see what you're trying to get at. calls_other_data would still carry both a call_id column and a start_time column, just like the calls table, but the calls_other_data.call_id column would not be indexed, because indexes come with a storage cost. That seems to be your thinking.
Something to note here is that since your clustered index is not unique on any of your tables, SQL Server will make it unique by adding some additional hidden data called a uniqueifier. So you already have extra storage here that you may not have considered, making your attempts to "optimize" storage somewhat moot.
I would advise against the approach. Storage is cheap, unique indexes are of great help to the optimizer, and indexes on foreign key columns (or foreign-key-like columns, if you don't actually have any referential integrity) are a good rule of thumb.
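If you do go the conventional route, a minimal sketch would be the following (the index names are mine, and it assumes (start_time, call_id) uniquely identifies a row in calls_other_data):

-- A unique clustered index avoids the hidden uniqueifier on duplicate start_time values.
CREATE UNIQUE CLUSTERED INDEX CIX_calls_other_data
    ON dbo.calls_other_data (start_time, call_id);

-- Direct lookups by call_id no longer need a detour through the calls table.
CREATE NONCLUSTERED INDEX IX_calls_other_data_call_id
    ON dbo.calls_other_data (call_id);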

How to create trigger "ORDER BY"

I want to order my table Jogadores by Total value which I can achieve by writing this:
ALTER TABLE `Jogadores` ORDER BY `Total` DESC ;
My question is how do I use this as a trigger every time I edit a Total value or insert a new row?
I'm using phpMyAdmin
Thanks
I'm not sure I understand what you are trying to accomplish. I am going to assume you want your table sorted in that fashion for when you retrieve it. If that is true, you can sort it upon retrieval rather than altering the table.
SELECT *
FROM Jogadores
ORDER BY Total DESC;
Why? That's not usually how SQL databases work
What if some totals are the same? What would the secondary sort order be?
It's also usually not a good idea to store a total in your operational db. Is this a data warehouse?
It would probably be easier to add an index on total and create a view to order by that column.
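For what that suggestion would look like in MySQL (a sketch; only Jogadores and Total come from the question, the other names are mine):

-- Index the column you sort by so the ORDER BY can use it.
CREATE INDEX idx_jogadores_total ON Jogadores (Total);

-- A view that presents the rows in the desired order on simple retrieval.
CREATE VIEW JogadoresByTotal AS
SELECT *
FROM Jogadores
ORDER BY Total DESC;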
After reading the new comments, what you want is not doable. See this answer.
Original content:
As others have said, the physical order of the rows in the table doesn't matter since you can always sort when querying. If you're thinking of clustered indexes, then all you need to do is define your primary key properly.
The InnoDB term for a primary key index. InnoDB table storage is organized based on the values of the primary key columns, to speed up queries and sorts involving the primary key columns. For best performance, choose the primary key columns carefully based on the most performance-critical queries. Because modifying the columns of the clustered index is an expensive operation, choose primary columns that are rarely or never updated.
So a clustered index would achieve what you want, but it is probably not what you actually need. Just so you know, the clustered index speedups are practically zero if you're dealing with less than a million or so rows (rough estimate).
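As a small illustration, if you were defining the table from scratch (the non-key columns here are assumptions), the primary key you declare is what InnoDB clusters the table on:

CREATE TABLE Jogadores (
    Id    INT AUTO_INCREMENT PRIMARY KEY,  -- InnoDB organizes the table's storage by this key
    Nome  VARCHAR(100),
    Total INT
);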

Searching for record(s) in a table that has over 200 Million Rows

Which type of index should be used on the table? The data is loaded (once a month) into an empty table. I then place a nonclustered composite index on two of the columns. I'm wondering if merging the two fields into one would increase performance when searching, or does it not matter? Should I be working with an identity column that has a clustered primary key index?
You should index the field(s) most likely to be used in the where clause as people query the table. Don't worry about the primary key - it already has an index.
If you can define a unique primary key that can be used when querying the table, this will be used as the clustered index and will be the fastest for selects.
If your select query has to use the two fields you mentioned, keep them separate. Performance will not be impacted and the schema is not spoiled.
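A hedged sketch of what that looks like (the table and column names are placeholders, since the question doesn't give them): a single composite nonclustered index on the two search columns, kept as separate fields.

CREATE NONCLUSTERED INDEX IX_BigTable_Search
    ON dbo.BigTable (ColumnA, ColumnB);

SQL Server can seek this index when the WHERE clause filters on ColumnA alone or on both columns together, so there's no need to merge the fields into one.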
"A clustered index is particularly efficient on columns that are often searched for ranges of values. After the row with the first value is found using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent."
With this in mind, you probably won't see much benefit from having a clustered index on your primary key (ID) unless it has business meaning for your application. If you have a date value that you commonly query, then it may make more sense to put the clustered index on that column, for queries like:
select * from table where created > '2013-01-01' and created < '2013-02-01'
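If that pattern matches your workload, the change might look roughly like this (the table and index names are placeholders, not from the question):

-- Cluster on the commonly queried date column (assumes no conflicting
-- clustered index or clustered primary key already exists on the table).
CREATE CLUSTERED INDEX CIX_MyBigTable_created
    ON dbo.MyBigTable (created);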
I have seen data warehouses use a concatenated key approach. Whether this works for you depends on your queries. Obviously querying a single field value will be faster than multiple fields, particularly when there is one less lookup in the B-tree index.
Alternatively, if you have 200 million rows in a table you could look at breaking the data out into multiple tables if it makes sense to do so.
You're saying that you're loading all this data every month, so I have to assume that all the data is relevant. If there were data in your table that is considered "old" and not relevant to searches, then you could move it out into an archive table (using the same schema) so your queries only run against "current" data.
Otherwise, you could look at a sharding approach like the one used by NoSQL databases such as MongoDB. If MongoDB is not an option, you could implement the same shard-key-style logic in your application; I doubt your database's SQL drivers support sharding natively.

Issue with big tables (no primary key available)

Table1 has around 10 lakh (1 million) records and does not contain any primary key. Retrieving the data using a SELECT command (with a specific WHERE condition) is taking a large amount of time. Can we reduce the retrieval time by adding a primary key to the table, or do we need to take some other approach? Kindly help me.
A primary key does not have a direct effect on performance. But indirectly, it does. This is because when you add a primary key to a table, SQL Server creates a unique index (clustered by default) that is used to enforce entity integrity. But you can create your own unique indexes on a table. So, strictly speaking, a primary key does not affect performance, but the index used by the primary key does.
When should a primary key be used?
A primary key is needed for referring to a specific record.
To make your SELECTs run fast, you should consider adding an index on the appropriate columns you're using in your WHERE clause.
E.g., to speed up SELECT * FROM "Customers" WHERE "State" = 'CA', one should create an index on the State column.
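A minimal example of that index, using the names from the query above:

CREATE INDEX IX_Customers_State ON "Customers" ("State");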
A primary key will not help if the primary key column isn't in your WHERE clause.
If you would like to make your query faster, you can create a nonclustered index on the columns in your WHERE clause. You may also want to INCLUDE columns in your index (it depends on your SELECT clause); see the sketch below.
The SQL optimizer will seek on your indexes, which will make your query faster.
(But think about when data is added to your table: insert operations may take longer if you create indexes on many columns.)
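A hedged sketch of that kind of index (the table name, filter column, and included columns are placeholders):

-- Nonclustered index on the WHERE column, with the SELECTed columns INCLUDEd
-- so the query can be satisfied from the index alone.
CREATE NONCLUSTERED INDEX IX_Table1_FilterCol
    ON dbo.Table1 (FilterCol)
    INCLUDE (ColA, ColB);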
It depends on the SELECT statement, and the size of each row in the table, the number of rows in the table, and whether you are retrieving all the data in each row or only a small subset of the data (and if a subset, whether the data columns that are needed are all present in a single index), and on whether the rows must be sorted.
If all the columns of all the rows in the table must be returned, then you can't speed things up by adding an index. If, on the other hand, you are only trying to retrieve a tiny fraction of the rows, then providing appropriate indexes on the columns involved in the filter conditions will greatly improve the performance of the query. If you are selecting all, or most, of the rows but only selecting a few of the columns, then if all those columns are present in a single index and there are no conditions on columns not in the index, an index can help.
Without a lot more information, it is hard to be more specific. There are whole books written on the subject, including:
Relational Database Index Design and the Optimizers
One way you can do it is to create indexes on your table. It's always better to create a primary key, which creates a unique index that by default will reduce the retrieval time.
The optimizer chooses an index scan if the index columns are referenced in the SELECT statement and if the optimizer estimates that an index scan will be faster than a table scan. Index files generally are smaller and require less time to read than an entire table, particularly as tables grow larger. In addition, the entire index may not need to be scanned. The predicates that are applied to the index reduce the number of rows to be read from the data pages.
Read more: Advantages of using indexes in database?

Clustered index dilemma - ID or sort?

I have a table with two very important fields:
id INT identity(1,1) PRIMARY KEY
identifiersortcode VARCHAR(900)
My app always sorts and pages search results in the UI based on identifiersortcode, but all table joins (and they are legion) are on the id field. (Aside: yes, the sort code really is that long. There's a strong BL reason.)
Also, due to O/RM use, most SELECT statements are going to pull almost every column.
Currently, the clustered index is on id, but I'm wondering if the TOP / ORDER BY portion of most queries would make identifiersortcode a more attractive option as the clustered key, even considering all of the table joins going on.
Inserts on the table and changes to the identifiersortcode are limited enough that changing my clustered index wouldn't be a problem for insert/update operations.
Trying to make the sort code's non-clustered index a covering index (using INCLUDE) is not a good option. There are a number of large columns, and some of them have a lot of update activity.
Kimberly L. Tripp's criteria for a clustered index are that it be:
Unique
Narrow
Static
Ever Increasing
Based on that, I'd stick with your integer identity id column, which satisfies all of the above. Your identifiersortcode would fail most, if not all, of those requirements.
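In practice that usually means keeping id as the clustered primary key and giving identifiersortcode its own nonclustered index to support the TOP / ORDER BY, roughly like this (a sketch; the table name is a placeholder):

CREATE NONCLUSTERED INDEX IX_MyTable_identifiersortcode
    ON dbo.MyTable (identifiersortcode);

Note that VARCHAR(900) sits exactly at the 900-byte index key limit of older SQL Server versions, so this works but leaves no headroom.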
To correctly determine which field will benefit most from the clustered index, you need to do some homework. The first thing that you should consider is the selectivity of your joins. If your execution plans filter rows from this table FIRST, then join on the other tables, then you are not really benefiting from having the clustered index on the primary key, and it makes more sense to have it on the sort key.
If however, your joins are selective on other tables (they are filtered, then an index seek is performed to select rows from this table), then you need to compare the performance of the change manually versus the status quo.
Currently, the clustered index is on id, but I'm wondering if the TOP / ORDER BY portion of most queries would make identifiersortcode a more attractive option as the clustered key, even considering all of the table joins going on.
Making identifiersortcode a CLUSTERED KEY will only help if it is used both in filtering and ordering conditions.
This means it would have to be chosen as the leading table in all your joins and accessed via a Clustered Index Scan or Clustered Index Range Scan.
Otherwise, it will only make things worse: first, all secondary indexes will be larger; second, inserts in non-increasing order will cause page splits, which will make them run longer and result in a larger table.
Why, for God's sake, does your identifier sort code need to be 900 characters long? If you really need 900 characters to be distinct for sorting, it should probably be broken up into multiple fields.
Apart from repeating what Chris B. said, I think you should really stick to your current PK, since - as you said - all joins are on the Id.
I guess you have already indexed the identifiersortcode...
Nevertheless, IF you have performance issues, I would really think twice about this ##"%$£ identifiersortcode !-)