What is a better schema for indexing: a combined varchar column or several integer columns? - sql

I want to improve my table schema. The table receives roughly one insert per microsecond, and it is already too big for me to test changes against directly.
Current setup (columns id, name, one, two, three):
SELECT *
FROM table
WHERE name = 'foo'
AND one = 1
AND two = 2
AND three = 3;
Maybe in the future (columns id, name, path):
SELECT *
FROM table
WHERE
name = 'foo'
AND path = '1/2/3';
If I change three integer columns to one varchar column, will the SQL run faster than now?
I am using PostgreSQL.
The varchar length would be 5–12 characters.
I think I could also use a zero-padded bigint (encoding 1/2/3 as 1000010200003), which might be faster than varchar.

Premature optimization is the root of all evil.
If you have a fixed number of integers, or at least a reasonable upper limit, stick with having an individual column for each.
You would then use a combined index over all columns, ideally with the non-nullable and most selective columns first.
If you want to optimize, use smallint, which only takes up two bytes.
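A minimal sketch of that layout in Postgres (my_table is a placeholder name, and the smallint choice assumes one/two/three always fit in two bytes):

-- individual integer columns plus a combined index over the WHERE-clause columns
CREATE TABLE my_table (
    id    bigserial PRIMARY KEY,
    name  text      NOT NULL,
    one   smallint  NOT NULL,
    two   smallint  NOT NULL,
    three smallint  NOT NULL
);

CREATE INDEX my_table_name_one_two_three_idx
    ON my_table (name, one, two, three);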

If I change three integer columns to one varchar column, will the SQL run faster than now?
Not noticeably so. You might see some small impacts on performance, balancing factors such as:
Are the string columns bigger or smaller than the integer keys (resulting in marginally bigger or smaller data pages and indexes)?
Is an index on two variable-length strings less efficient than an index on one variable-length string and three fixed-length keys?
Do the results match what you need or is additional processing needed after you fetch a record?
In either case the available index is going to be used to find the row(s) that match the conditions. This is an index seek, because the comparisons are all equality. Postgres will then go directly to the rows you need. There is a lot of work going on beyond just the index comparisons.
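If you want to verify that on your own data, EXPLAIN will show whether the planner performs the index lookup (the table and index names below are the placeholders from the sketch above):

EXPLAIN ANALYZE
SELECT *
FROM my_table
WHERE name = 'foo'
  AND one = 1
  AND two = 2
  AND three = 3;
-- expect an "Index Scan using my_table_name_one_two_three_idx" node in the plan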
You are describing 1,000,000 inserts per second -- over 86 billion inserts each day -- and that is a lot. Under such circumstances, you are not using an off-the-shelf instance of Postgres running on your laptop. You should have proper DBA support to answer a question like this.

Related

dictionary database, one table vs table for each char

I have a very simple database that contains one table, T:
wOrig nvarchar(50), not null
wTran nvarchar(50), not null
The table has more than 50 million rows. I execute a simple query:
select wTran from T where wOrig = 'myword'
The query takes about 40 seconds to complete. I divided the table based on the first character of wOrig, and the execution time became much smaller (roughly in proportion to each new table's size).
Am I missing something here? Shouldn't the database use a more efficient way to do the search, like a binary search?
My question: what changes to the database options, given this situation, could make the search more efficient so that all the data can stay in one table?
You should be using an index. For your query, you want an index on T(wOrig). Your query will be much faster:
create index idx_T_wOrig on T(wOrig);
Depending on considerations such as space and insert/update characteristics, a clustered index on (wOrig) or (wOrig, wTran) might be the best solution.
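For example, assuming SQL Server (the index names are placeholders):

-- nonclustered index that also covers the query, so wTran comes back without a lookup
create index idx_T_wOrig_covering on T (wOrig) include (wTran);

-- or, if space and insert/update patterns allow, a clustered index instead
-- create clustered index cix_T_wOrig_wTran on T (wOrig, wTran);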

Why is my SQL QUERY using CONTAINS numbers taking up to 2 minutes

I have a table named Locations with a FullText Index on all columns. There's one PK Column (INT) and the rest are TEXT/VARCHAR. This table has 300,000 records.
The following query is taking 2 minutes to return one record.
SELECT TOP 1 * FROM Locations WHERE CONTAINS(*, '"1*"') ORDER BY LocationID
This slow query time is consistent when using any combination of numbers from 1 to 3 digits in length.
Searches using characters (a-zA-Z) perform normally, with sub-25-millisecond response times.
Any idea why the numeric values are causing such a performance hit?
I suspect it is a combination of 2 causes.
Cause 1: Wildcard searches on common prefixes are slow. Do the records contain a lot of strings (numeric or alphanumeric) that begin with "1"? If so, that might explain the poor performance.
Wildcard searches tend to be slower than other full text searches. The more terms there are that contain the prefix ("1" in your case), the more work the full text engine has to do.
Although 300,000 records is not a lot for the full text engine to handle, factors like the number of unique terms in each record, and the number of records and columns in which each of those terms appears, will have an even bigger impact on search performance.
Cause 2: Missing index on ORDER BY columns. You should make sure the LocationID column is indexed since that is how you're sorting the results. It is possible that "1*" is generating a lot of results, all of which need to be sorted. If there is no index, the sort could take a long time.
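LocationID looks like your primary key, so it may already be indexed; if it is not, a plain index on it (the name below is a placeholder) removes the sort cost:

CREATE INDEX IX_Locations_LocationID ON Locations (LocationID);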

Index on VARCHAR column

I have a table of 32,589 rows, and one of the columns is called 'Location' and is a Varchar(40) column type. The column holds a location, which is actually a suburb, all uppercase text.
A function that uses this table does a:
IF EXISTS(SELECT * FROM MyTable WHERE Location = 'A Suburb')
...
Would it be beneficial to add an index to this column, for efficiency? This is mostly a read-only table, so there are few edits or inserts except for maintenance.
Without an index SQL Server will have to perform a table scan to find the first instance of the location you're looking for. You might get lucky and have the value be in one of the first few rows, but it could be at row 32,000, which would be a waste of time. Adding an index only takes a few seconds and you'll probably see a big performance gain.
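For example (the index name is a placeholder):

CREATE NONCLUSTERED INDEX IX_MyTable_Location ON MyTable (Location);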
I concur with Brian Shamblen's answer.
Also, try using TOP 1 in the inner select
IF EXISTS(SELECT TOP 1 * FROM MyTable WHERE Location = 'A Suburb')
You don't have to select all the records matching your criteria for EXISTS, one is enough.
An opportunistic approach to performance tuning is usually a bad idea.
To answer the specific question - if your function is using location in a where clause, and the table has more than a few hundred rows, and the values in the location column are not all identical, creating an index will speed up your function.
Whether you notice any difference is hard to say - there may be much bigger performance problems lurking in the database, and you might be fixing the wrong problem.

How to create an index for a string column in sql?

I have a table with 3 columns: a list id, name and numeric value.
The goal is to use the table to retrieve and update the numeric value for a name in various lists.
The problem is that SQL Server refuses to create an index on the name column because it's a variable-length string column.
Without an index, selecting by name will be inefficient, and the alternative of a fixed-length text column would waste a lot of storage space, since names can be fairly long.
What is the best way to build this table and its indexes?
(running SQL Server 2008)
If your string is longer than 900 bytes, then it can't be an index key, regardless of whether it is variable or fixed length.
One idea would be to at least make seeks more selective by adding a computed column. e.g.
CREATE TABLE dbo.Strings
(
-- other columns,
WholeString VARCHAR(4000),
Substring AS CONVERT(VARCHAR(10), WholeString) PERSISTED
);
CREATE INDEX ss ON dbo.Strings(Substring);
Now when searching for a row to update, you can say:
WHERE s.Substring = LEFT(@string, 10)
AND s.WholeString = @string;
This will at least help the optimizer narrow its search down to the index pages where the exact match is most likely to live. You may want to experiment with that length as it depends on how many similar strings you have and what will best help the optimizer weed out a single page. You may also want to experiment with including some or all of the other columns in the ss index, with or without using the INCLUDE clause (whether this is useful will vary greatly on various factors such as what else your update query does, read/write ratio, etc).
A regular index can't be created on ntext or text columns (I guess your name column is of that type, or an (n)varchar longer than 900 bytes). You can create a full-text index on such a column.
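A rough sketch of the full-text route, assuming SQL Server and hypothetical object names (the table needs a unique, non-nullable, single-column key index before the full-text index can be created):

CREATE FULLTEXT CATALOG NamesCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.MyTable (name)
    KEY INDEX PK_MyTable; -- the table's unique key index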

SQL Server: Get records that match specifc data at specific location?

I have a varchar column that has data such as 00110100001110100100010011111, and I need to get back records that have 1 in position 5 and 0 in position 11. What is the fastest way I can search for them?
Right now I'm thinking of using substring: substring(column, 5, 1)==1 and substring (column, 11,1)==0. Is this the best way?
Thanks.
LIKE '____1_____0%' is the simplest way with your current structure. It will involve a full table scan though due to the leading wildcard.
What does this string of characters represent though?
If it is a fixed set of boolean values you might consider separating them out into individual bit columns and indexing them individually.
This is more space efficient as 8 values can fit into 2 bytes (including null bitmap) as opposed to 2 values in 2 bytes for the varchar version.
You might well still end up with table scans, however, as these indexes will not be selective enough to be used unless the values are skewed and you are searching for the less common values; but at least SQL Server will be able to maintain separate column statistics and use the indexes when this would help.
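If you go the bit-column route, a rough sketch might look like this (MyTable and the column names are placeholders; in practice you would add one bit column per position):

ALTER TABLE MyTable ADD Pos05 bit, Pos11 bit;

CREATE INDEX IX_MyTable_Pos05 ON MyTable (Pos05);
CREATE INDEX IX_MyTable_Pos11 ON MyTable (Pos11);

SELECT *
FROM MyTable
WHERE Pos05 = 1
  AND Pos11 = 0;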
If it is an arbitrary set (e.g. an ever growing history of states) then you should probably separate out into a new table (EntityId, Position (int), Value (bit)). You can then use a relational division query to bring back all EntityIds matching the desired pattern.
SELECT EntityId
FROM EntityStates -- hypothetical name for the new (EntityId, Position, Value) table
WHERE ( Position = 5
        AND Value = 1
      )
   OR ( Position = 11
        AND Value = 0
      )
GROUP BY EntityId
HAVING COUNT(*) = 2
Use SUBSTRING. You can parameterize SUBSTRING, so if you want positions 3 and 13 you can change them, or wrap the logic in a UDF, etc.
It depends on what you want, of course.
If the positions are static, use Martin Smith's answer because it's cleaner.
I suspect you need to refactor this column into several discrete ones, though.
Do positions 5 and 11 stay constant? Do you have the ability to create computed columns and indexes?
If the answer to both of these questions is "yes", then you should be able to achieve good performance by implementing the following general idea:
Create a computed column on substring(column, 5, 1).
Create a computed column on substring(column, 11, 1).
Create a composite index on both of these columns.
Then, in your query, just use the exact same expressions as in the definitions of your computed columns (such as substring(column, 5, 1) = '1' and substring(column, 11, 1) = '0', as you already proposed).
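A minimal sketch of that approach (MyTable, BitString, and the computed-column names are placeholders):

-- computed columns over the positions you query
ALTER TABLE MyTable
    ADD Pos05 AS SUBSTRING(BitString, 5, 1),
        Pos11 AS SUBSTRING(BitString, 11, 1);

-- composite index on both computed columns
CREATE INDEX IX_MyTable_Pos05_Pos11 ON MyTable (Pos05, Pos11);

-- use the same expressions in the query (or reference the computed columns directly)
SELECT *
FROM MyTable
WHERE SUBSTRING(BitString, 5, 1) = '1'
  AND SUBSTRING(BitString, 11, 1) = '0';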
That being said, if you can, do yourself a favor and normalize your data model. Your table is not even in the 1st normal form!