How to create an index for a string column in SQL?

I have a table with 3 columns: a list id, a name, and a numeric value.
The goal is to use the table to retrieve and update the numeric value for a name in various lists.
The problem is that SQL Server refuses to create an index on the name column because it's a string column of variable length.
Without an index, selecting by name will be inefficient, and the option of using a fixed-length text column would waste a lot of storage space, as names can be fairly long.
What is the best way to build this table and its indexes?
(running SQL Server 2008)

If your string is longer than 900 bytes, then it can't be an index key, regardless of whether it is variable or fixed length.
One idea would be to at least make seeks more selective by adding a computed column, e.g.:
CREATE TABLE dbo.Strings
(
-- other columns,
WholeString VARCHAR(4000),
Substring AS CONVERT(VARCHAR(10), WholeString) PERSISTED
);
CREATE INDEX ss ON dbo.Strings(Substring);
Now when searching for a row to update, you can say:
WHERE s.Substring = LEFT(@string, 10)
AND s.WholeString = @string;
This will at least help the optimizer narrow its search down to the index pages where the exact match is most likely to live. You may want to experiment with that length, as it depends on how many similar strings you have and what will best help the optimizer weed out a single page. You may also want to experiment with including some or all of the other columns in the ss index, with or without the INCLUDE clause (whether this is useful will vary greatly with factors such as what else your update query does, the read/write ratio, etc.).
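For instance, a covering variant might look like this; NumericValue is a hypothetical name for the value column described in the question, so adjust it to your schema:
CREATE INDEX ss_covering ON dbo.Strings(Substring)
    INCLUDE (WholeString, NumericValue);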

A regular index can't be created on ntext or text columns (I guess your name column is of that type, or (n)varchar longer than 900 bytes). You can create a full-text index on that column type.
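If you go the full-text route, the setup is roughly the sketch below; it assumes a table dbo.Names with a unique key index PK_Names, both names made up for illustration:
CREATE FULLTEXT CATALOG ftNames AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.Names(Name) KEY INDEX PK_Names;
Note that a full-text index serves CONTAINS/FREETEXT word searches rather than ordinary equality seeks, so it fits searching within long text more than exact lookups.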

Related

What is a better schema for indexing: a combined varchar column or several integer columns?

I want to make my table schema better. A record will be inserted into this table every microsecond.
The table is already too big, so I could not test against the table itself.
Current setup (columns id, name, one, two, three):
SELECT *
FROM table
WHERE name = 'foo'
AND one = 1
AND two = 2
AND three = 3;
Maybe in the future (columns id, name, path):
SELECT *
FROM table
WHERE
name = 'foo'
AND path = '1/2/3';
If I change three integer columns to one varchar column, will the SQL run faster than now?
Using PostgreSQL
The varchar length will be 5~12.
I think I could use bigint with zerofill (1/2/3 to 1000010200003), which may be faster than varchar.
Premature optimization is the root of all evil.
If you have a fixed number of integers, or at least a reasonable upper limit, stick with having an individual column for each.
You would then use a combined index over all columns, ideally with the non-nullable and selective columns first.
If you want to optimize, use smallint which only takes up two bytes.
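A minimal sketch of that layout, with table and column names invented to match the question:
CREATE TABLE records (
    id    bigserial PRIMARY KEY,
    name  text NOT NULL,
    one   smallint NOT NULL,
    two   smallint NOT NULL,
    three smallint NOT NULL
);
-- one combined index covering the whole WHERE clause
CREATE INDEX records_name_one_two_three_idx ON records (name, one, two, three);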
If I change three integer columns to one varchar column, will the SQL run faster than now?
Not noticeably so. You might see some small impacts on performance, balancing things such as:
Are the string columns bigger or smaller than the integer keys (resulting in marginally bigger or smaller data pages and indexes)?
Is an index on two variable-length strings less efficient than an index on one variable-length string and three fixed-length keys?
Do the results match what you need, or is additional processing needed after you fetch a record?
In either case the available index is going to be used to find the row(s) that match the conditions. This is an index seek, because the comparisons are all equality. Postgres will then go directly to the rows you need. There is a lot of work going on beyond just the index comparisons.
You are describing 1,000,000 inserts per second, or about 86.4 billion inserts each day -- that is a lot. Under such circumstances, you are not using an off-the-shelf instance of Postgres running on your laptop. You should have proper DBA support to answer a question like this.

Use a string value like a numeric value in Oracle

I have a condition in my Oracle query:
AND a.ACCARDACNT > '0880080200000006' and a.ACCARDACNT < '0880080200001000'
The ACCARDACNT column in the table is of type varchar and is indexed, but in this condition I want to use it as a number. When I execute this query, the execution plan shows that the CBO can use the index and scan the table by index.
Is that true?
I want to compare the values as numbers and also have an index be used. Is there any solution?
If it is guaranteed that all ACCARDACNT are numbers, then just use
and to_number(a.accardacnt) > 880080200000006 and to_number(a.accardacnt) < 880080200001000;
This makes sure that the numbers are not compared as strings, where '2' > '10' because, looking at the first characters, '2' is greater than '1'.
(In case of decimal numbers, make sure that the decimal separator stored in the strings matches the current session settings.)
If you want to provide an index for this, use this function-based index:
create index idx_accardacnt on mytable( to_number(accardacnt) );
or a composite index containing to_number(accardacnt). As the execution plan for the string query showed an index being used, the same should be true for the numeric comparison and the function-based index. (Remember, a DBMS is free to use the provided indexes or not. We are simply offering them; the DBMS knows best whether it makes sense to use them in a query.)
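A composite variant could look like the following; the second column is invented purely for illustration:
create index idx_accardacnt_comp on mytable( to_number(accardacnt), account_type );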
I think you cannot use numeric comparison and an index together.
the execution plan shows that CBO can use index
There is a chance that a full index scan is used here, so it is just a table scan but with fewer columns.
A possible approach is to convert the numbers to fixed-length strings with leading zeros and then use those in the comparison. In this case the index will be used.
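For example, padding the bounds to the stored 16-character width (the width is taken from the literals in the question):
AND a.ACCARDACNT > LPAD('880080200000006', 16, '0')
AND a.ACCARDACNT < LPAD('880080200001000', 16, '0')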
Your current query should be able to use an index. But the problem is that you are comparing text while expecting it to sort numerically. It may not, in general, because text sorts lexicographically in SQL (i.e. in dictionary order). So, to get the correct sorting behavior, you will have to cast ACCARDACNT to a number:
AND CAST(LTRIM(a.ACCARDACNT, '0') AS FLOAT) BETWEEN 880080200000007 AND 880080200000999
Another option still would be a computed column:
alter table mytable add accardacnt_num as (to_number(accardacnt));
and provide one or more indexes containing this column:
create index idx_accardacnt_num on mytable(accardacnt_num);
Then existing code continues working, but new queries can benefit from the numeric column:
and a.accardacnt_num > 880080200000006 and a.accardacnt_num < 880080200001000;
I think the following logic does what you might really want:
a.ACCARDACNT > '0880080200000006' and
a.ACCARDACNT < '0880080200001000' and
length(ACCARDACNT) = 16
In addition, this can use an index on the column, if an appropriate one is available.
This would not be correct if you wanted the 15-character account number '880080200000060' to match your criteria. My guess is that you do not want this.
You have to create a function-based index on the accardacnt column:
create index idx_fn_accardant on table_name( to_number(accardacnt) );
and convert the column to a number in the WHERE clause of the query:
where to_number(ACCARDACNT) > 880080200000006 and to_number(ACCARDACNT) < 880080200001000

SQL Server: Get records that match specific data at a specific location?

I have a varchar column that has data such as 00110100001110100100010011111, and I need to get back records that have 1 in position 5 and 0 in position 11. What is the fastest way I can search for them?
Right now I'm thinking of using SUBSTRING: substring(column, 5, 1) = '1' and substring(column, 11, 1) = '0'. Is this the best way?
Thanks.
LIKE '____1_____0%' is the simplest way with your current structure. It will involve a full table scan though due to the leading wildcard.
What does this string of characters represent though?
If it is a fixed set of boolean values you might consider separating them out into individual bit columns and indexing them individually.
This is more space efficient as 8 values can fit into 2 bytes (including null bitmap) as opposed to 2 values in 2 bytes for the varchar version.
You might well still end up with table scans, however, as these indexes will not be selective enough to be used unless the values are skewed and you are searching for the less common values. But at least SQL Server will be able to maintain separate column statistics and use the indexes when this would help.
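A sketch of the fixed-set reshaping, with table and column names invented for illustration:
ALTER TABLE dbo.Entities ADD Flag5 bit, Flag11 bit; -- one bit column per position
CREATE INDEX IX_Entities_Flag5_Flag11 ON dbo.Entities(Flag5, Flag11);
-- the search then becomes a plain sargable predicate
SELECT EntityId FROM dbo.Entities WHERE Flag5 = 1 AND Flag11 = 0;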
If it is an arbitrary set (e.g. an ever-growing history of states) then you should probably separate it out into a new table (EntityId, Position (int), Value (bit)). You can then use a relational division query to bring back all EntityIds matching the desired pattern.
SELECT EntityId
FROM EntityState -- assumed name for the new (EntityId, Position, Value) table
WHERE ( Position = 5
AND Value = 1
)
OR ( Position = 11
AND Value = 0
)
GROUP BY EntityId
HAVING COUNT(*) = 2
Use SUBSTRING. You can parameterise SUBSTRING, so if you want positions 3 and 13 you can change it, or have it in a UDF, etc.
It depends on what you want, of course.
If it's static positions, use Martin Smith's answer because it's cleaner.
I suspect you need to refactor this column into several discrete ones, though.
Do positions 5 and 11 stay constant? Do you have the ability to create computed columns and indexes?
If the answer to both of these questions is "yes", then you should be able to achieve good performance by implementing the following general idea:
Create computed column on substring(column, 5, 1).
Create computed column on substring(column, 11,1).
Create a composite index on both of these columns.
Then, in your query, just use the exact same expressions as in the definitions of your computed columns (such as substring(column, 5, 1) = '1' and substring(column, 11, 1) = '0', as you already proposed).
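Put together, the idea looks like this sketch (table and column names are assumptions):
ALTER TABLE dbo.MyTable ADD Pos5 AS SUBSTRING(flags, 5, 1);
ALTER TABLE dbo.MyTable ADD Pos11 AS SUBSTRING(flags, 11, 1);
-- SUBSTRING is deterministic and precise, so the computed columns are indexable
CREATE INDEX IX_MyTable_Pos5_Pos11 ON dbo.MyTable(Pos5, Pos11);
SELECT * FROM dbo.MyTable
WHERE SUBSTRING(flags, 5, 1) = '1' AND SUBSTRING(flags, 11, 1) = '0';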
That being said, if you can, do yourself a favor and normalize your data model. Your table is not even in the 1st normal form!

MySQL MyISAM indexing a text column

I have a slow-performing query on my table. It has a WHERE clause such as:
where supplier = 'Microsoft'
The column type is text. In phpMyAdmin I looked to see if I could add an index to the table, but the option is disabled. Does this mean that you cannot index a text column? Does this mean that every update query like this is performing a full table scan?
Would the best thing then be to separate the column into its own table, place an ID in the current table, and put an index on that? Would this potentially speed up the query?
You need to add a prefix length to the index. Have a look at the Column Indexes docs.
The following creates an index on the first 50 bytes of the supplier field:
mysql> create index supplier_ix on t(supplier(50));
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
But maybe you should rethink the datatype of supplier? Judging from the name, it doesn't sound like a typical text field...
You should do a check on:
select max(length(supplier)) from the_table;
If the length is less than 255, you can (and you should) convert it to varchar(255) and build an index on it.
Choosing the right data type is more important.
If the values are long, building an index on a limited prefix length will help.
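A sketch of the conversion, assuming the length check above came back under 255:
ALTER TABLE the_table MODIFY supplier VARCHAR(255);
CREATE INDEX supplier_ix ON the_table (supplier);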
Did I understand you right? It's a TEXT column? As in the type that corresponds to BLOB? Might I advise considering a VARCHAR for this column?

Creating indexes for optimizing the execution of Stored Procedures

The WHERE clause of one of my queries looks like this:
and tbl0.Type = 'Alert'
AND (tbl0.AccessRights like '%'+TblCUG0.userGroup+'%'
or tbl0.AccessRights like 'All' )
AND (tbl0.ExpiryDate > CONVERT(varchar(8), GETDATE(), 1)
or tbl0.ExpiryDate is null)
order by tbl0.Priority,tbl0.PublishedDate desc, tbl0.Title asc
I would like to know which columns I should create indexes on and which type of index will suit best. Also, I have heard that indexes don't work with LIKE and wildcards at the start. So what should be the approach to optimizing the queries?
1 and tbl0.Type = 'Alert'
2 AND (tbl0.AccessRights like '%'+TblCUG0.userGroup+'%'
3 or tbl0.AccessRights like 'All' )
4 AND (tbl0.ExpiryDate > CONVERT(varchar(8), GETDATE(), 1)
5 or tbl0.ExpiryDate is null)
Most likely, you will not be able to use an index with a WHERE clause like this.
Line 1: You could create an index on tbl0.Type, but if you have many rows and few actual values, SQL Server will just skip the index and table scan anyway. Also, having nothing to do with the index issue, a column like this, a code/flag value, is better stored as a fixed-width value: char(1), tinyint, etc., where "A" = alert or 1 = alert. I would name the column XyzType, where Xyz is what the type describes (DoctorType, CarType, etc.). I would then create a new table XyzType, with a FK back to this column in tbl0; this new table would have two columns, XyzType PK and XyzDescription, where you expand out the name.
Line 2: Are you combining multiple values into tbl0.AccessRights and trying to use the LIKE to find values within it? If so, split this out into a different table; then you can remove the LIKE and possibly add an index there.
Line 3: OR kills index usage. Imagine looking through the phone book for all names that are "Smith" or start with "G": you can't just use the index. You may try splitting the query into a UNION or UNION ALL around the OR so an index can be used (one part looks for "Smith" and the other part looks for "G"). You have not provided enough of the query to determine whether this is possible in your case. You may need to use a derived table that contains this UNION so you can join it to the rest of your query.
Line 4: tbl0.ExpiryDate could benefit from an index, but the OR will kill its usage; see the Line 3 comment.
Line 5: You may try the OR/UNION trick discussed above, or just not use NULL: put in a default like '01/01/3000' so you don't need the OR. A sketch of the UNION ALL rewrite for lines 4 and 5 follows.
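This is roughly the shape of that rewrite (the column list and the remaining predicates are abbreviated, so treat it as a pattern, not a drop-in):
SELECT tbl0.* FROM tbl0 WHERE tbl0.Type = 'Alert' AND tbl0.ExpiryDate > GETDATE()
UNION ALL
SELECT tbl0.* FROM tbl0 WHERE tbl0.Type = 'Alert' AND tbl0.ExpiryDate IS NULL;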
SQL Server's Database Tuning Advisor can suggest which indexes will optimize your query, including covering indexes that will optimize the selected columns that you do not include in your query. Just because you add an index doesn't mean that the query optimizer will use it. Some indexes may cost more to use than others, so the optimizer will choose the best indexes using the underlying tables' statistics.
Offhand, you could add all ordering and criteria columns to an index, but that would be useless if, for example, there are too few distinct Priority values to make it worth the storage.
You are right about LIKE and wildcards. An index is a B-tree, which means that it can speed up searches for specific values or range queries. A wildcard at the beginning means that the query will have to touch all records to check whether they match the pattern. A wildcard at the end means that the query will only have to touch items that start with the substring up to the wildcard, partially turning this into a range query that can benefit from an index.
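In concrete form, using the AccessRights column from the question (the values are illustrative):
WHERE AccessRights LIKE 'All%'    -- trailing wildcard: a range seek on an index is possible
WHERE AccessRights LIKE '%admin%' -- leading wildcard: every row must be examined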