I need to store variable-length strings in a SQL Server table; the length can vary from 0 to 10,000 or so characters.
I need to filter the text with the LIKE operator ('%abc' or 'abc%').
I thought of using varchar or nvarchar, but I'm unable to create an index on this datatype.
Please suggest design advice for choosing the column data type and making the column indexable.
Thank you
SQL Server will use an index for LIKE 'abc%'. However, it will not use the index if the wildcard comes first.
If you are searching for complete words, then you should investigate CONTAINS() and full-text search.
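A minimal sketch of the full-text route, assuming a hypothetical table dbo.Docs with a primary key index PK_Docs and an nvarchar(max) column Body (all names made up):

CREATE FULLTEXT CATALOG DocsCatalog;       -- catalog name is hypothetical
CREATE FULLTEXT INDEX ON dbo.Docs (Body)
    KEY INDEX PK_Docs ON DocsCatalog;

-- a word search then uses the full-text index instead of scanning
SELECT * FROM dbo.Docs WHERE CONTAINS(Body, N'abc');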
"I thought of using varchar or nvarchar, but I'm unable to create an index on this datatype."
Those are the proper datatypes for text-based information.
The reason you're not able to create an index has nothing to do with the datatype per se: in SQL Server, an index can be created only if the column (or set of columns) in the index has a maximum theoretical size of 900 bytes or less.
This means a VARCHAR(1000) (or VARCHAR(MAX)) column cannot be indexed. This size limit is your issue, not the datatype.
Solution: use fewer bytes in the columns you want to index. Or alternatively, as Gordon suggested, check out the full-text indexing capabilities of SQL Server.
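One way to stay under the 900-byte cap, sketched here with hypothetical names (dbo.Docs, Body): index only a persisted prefix of the wide column, and let 'abc%' searches seek on it before re-checking the full value.

ALTER TABLE dbo.Docs
ADD BodyPrefix AS CAST(LEFT(Body, 400) AS varchar(400)) PERSISTED;
CREATE INDEX IX_Docs_BodyPrefix ON dbo.Docs (BodyPrefix);

SELECT * FROM dbo.Docs
WHERE BodyPrefix LIKE 'abc%'   -- index seek on the 400-byte prefix
  AND Body LIKE 'abc%';        -- re-check against the full column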
Related
What is a good approach to keeping an nvarchar field unique? I have a field which stores URLs of MP3 files. The URL length can be anything from 10 to 4,000 characters. I tried to create an index and it says it cannot create the index because the total length exceeds 900 bytes.
If the field is not indexed, searching it is going to be slow. I am using C# and ASP.NET MVC for the front end.
You could use the CHECKSUM function and put an index on the checksum column.
--*** Add extra column to your table that will hold checksum
ALTER TABLE Production.Product
ADD cs_Pname AS CHECKSUM(Name);
GO
--*** Create index on new column
CREATE INDEX Pname_index ON Production.Product (cs_Pname);
GO
Then you can retrieve the data quickly using the following query:
SELECT *
FROM Production.Product
WHERE CHECKSUM(N'Bearing Ball') = cs_Pname
AND Name = N'Bearing Ball';
Here is the documentation: http://technet.microsoft.com/en-us/library/ms189788.aspx
You can use a hash function (theoretically it doesn't guarantee that two different titles will have different hashes, but it should be good enough: MD5 Collisions) and then apply the index on that column.
MD5 in SQL Server
You could create a hash code of the URL and use that integer as a unique index on your DB. Be sure to convert all characters to lowercase first, so that all URLs are in the same format; the same URL will then generate an equal hash code.
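A hedged sketch combining the two hash suggestions above, using SQL Server's HASHBYTES rather than a hand-rolled hash; the table and column names are made up. Note that with a unique index, the (unlikely) case of two different URLs colliding on the same hash would be rejected at insert time.

-- hypothetical table dbo.Mp3 with Url nvarchar(4000)
ALTER TABLE dbo.Mp3
ADD UrlHash AS CAST(HASHBYTES('MD5', LOWER(Url)) AS binary(16)) PERSISTED;

CREATE UNIQUE INDEX UX_Mp3_UrlHash ON dbo.Mp3 (UrlHash);

-- lookups compute the same hash, then verify the full URL
-- (@url is the search parameter)
SELECT * FROM dbo.Mp3
WHERE UrlHash = HASHBYTES('MD5', LOWER(@url))
  AND Url = @url;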
After some time I landed in the Sybase world (ASE 15, to be specific), and I am a bit terrified.
The functions and functionality I know from SQL Server are missing, and it makes me feel like I'm back in the early '90s.
To the point:
I have to prepare a one-off report, and some of the text is stored in an image column (I don't know why someone did that).
So what I did was:
select CAST(CAST(REQUEST AS VARBINARY(16384)) AS VARCHAR(16384)) as RequestBody
from table
The problem arises because some requests are longer than 16384 bytes, and I have no idea how to get at the rest of the data.
What is even worse, I don't know where to look for information: the Sybase documentation is scarce at best, and in comparison with the MS world it's practically nonexistent.
According to the docs, you need to use the CONVERT function, like this:
SELECT CONVERT(VARBINARY(2048), raw_data) as raw_data_str FROM table;
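A single CONVERT still caps the output length, though. As an alternative, ASE's READTEXT can page through the column in chunks; a hedged sketch, assuming the table is named request_log and has an id column (both names made up):

set textsize 2147483647   -- raise the session limit for text/image data

declare @ptr varbinary(16)
select @ptr = textptr(REQUEST) from request_log where id = 1

readtext request_log.REQUEST @ptr 0 16384       -- bytes 0..16383
readtext request_log.REQUEST @ptr 16384 16384   -- next chunk, and so on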
Instead of using varbinary(16384) and varchar(16384), try using varbinary(max) and varchar(max). In that case, the maximum data length will be 2 GB.
See:
http://msdn.microsoft.com/en-us/library/ms176089.aspx and http://msdn.microsoft.com/en-us/library/ms188362.aspx
What is the length of the REQUEST column in the table?
Can you please help me with this? (I didn't find it in the Teradata documentation, which is honestly a little overwhelming.) My table has this column: BAN DECIMAL(9,0). Now I want to change it to BAN DECIMAL(15,0) COMPRESS 0. How can I do it, and what does COMPRESS 0 (or any other COMPRESS value) mean anyway?
I hope this is possible and I don't have to create a new table and then copy the data from the old table. The table is very, very big; when I do a COUNT(*) on it I get this error: 2616 numeric overflow occurred during computation
The syntax diagram for ALTER TABLE doesn't seem to support directly changing a column's data type. (Teradata SQL DDL Documentation). COMPRESS 0 compresses zeroes. Teradata supports a lot of different kinds of compression.
Numeric overflow here probably means you've exceeded the range of an integer. To make that part work, just cast to a bigger data type (you don't need to change the column's data type to do this):
select cast(count(*) as bigint)
from table_name;
You asked three different questions:
1. You cannot change the data type of a column from DECIMAL(9,0) to DECIMAL(15,0). Your best bet would be to create a new column (NEW_BAN), assign values from your old column, drop the old column, and rename NEW_BAN back to BAN; see the sketch below.
2. COMPRESS 0 is not a constraint. It means that values of zero are compressed out of the table, saving disk space.
3. Your COUNT(*) is returning that error because the table has more than 2,147,483,647 rows (the max value of an INTEGER). Cast the result as BIGINT (as shown by Catcall).
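A hedged sketch of the add/copy/drop/rename sequence from point 1; the table name is made up, and the final RENAME assumes your Teradata release supports column renames:

ALTER TABLE my_table ADD NEW_BAN DECIMAL(15,0) COMPRESS 0;
UPDATE my_table SET NEW_BAN = BAN;
ALTER TABLE my_table DROP BAN;
ALTER TABLE my_table RENAME NEW_BAN TO BAN;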
And I agree, the documentation can be overwhelming. But be patient and focus only on the SQL titles for your exact release. They really are well written.
You cannot use ALTER TABLE to change the data type from DECIMAL(9,0) to DECIMAL(15,0) because it crosses the byte boundary required to store the values in the table. For Teradata 13.10, see the Teradata manual SQL Data Definition Language Detailed Topics, pages 61-65, for more details on using ALTER TABLE to change column data types.
I have a huge table with two columns: Id and Title. Id is a bigint and I'm free to choose the type of the Title column: varchar, char, text, whatever. The Title column contains random text strings like "abcdefg", "q", "allyourbasebelongtous", with a maximum of 255 chars.
My task is to get strings by a given substring. Substrings also have random length and can occur at the start, middle, or end of a string. The most obvious way to perform it:
SELECT * FROM t WHERE Title LIKE '%abc%'
I don't care about INSERT; I need only to do fast selects. What can I do to make the search as fast as possible?
I use MS SQL Server 2008 R2. Full-text search will be useless, as far as I can see.
If you don't care about storage, you can create another table with partial Title entries, one for each trailing substring of each normal title (up to 255 entries per title).
That way you can index these substrings and match only against the beginning of the string, which should greatly improve performance; see the sketch below.
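A sketch of that idea in T-SQL, with made-up names; it assumes a Numbers helper table holding the integers 1..255:

CREATE TABLE TitleSuffix (
    Id     bigint       NOT NULL,  -- points back to t.Id
    Suffix varchar(255) NOT NULL   -- one trailing substring of Title
);
CREATE INDEX IX_TitleSuffix ON TitleSuffix (Suffix);

-- one row per trailing substring of every Title
INSERT INTO TitleSuffix (Id, Suffix)
SELECT t.Id, SUBSTRING(t.Title, n.n, 255)
FROM t
JOIN Numbers n ON n.n <= LEN(t.Title);

-- '%abc%' becomes an index-friendly prefix match on the suffixes
SELECT DISTINCT t.*
FROM t
JOIN TitleSuffix s ON s.Id = t.Id
WHERE s.Suffix LIKE 'abc%';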
If you want to use less space than Randy's answer and there is considerable repetition in your data, you can create an N-ary tree data structure where each edge is the next character, and hang each string and trailing substring in your data on it.
Number the nodes in depth-first order. Then you can create a table with up to 255 rows for each of your records, holding the Id of your record and the node id in your tree that matches the string or trailing substring. When you search, find the node id that represents the string you are searching for (and all trailing substrings) and do a range search.
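A hedged sketch of the table side of this; all names are hypothetical, and building the tree itself is left to application code:

CREATE TABLE TrieNode (
    NodeId     int NOT NULL PRIMARY KEY,  -- depth-first number
    SubtreeMax int NOT NULL               -- largest NodeId in this node's subtree
);
CREATE TABLE RecordNode (
    RecordId bigint NOT NULL,             -- t.Id
    NodeId   int    NOT NULL              -- node matching a (trailing) substring
);
CREATE INDEX IX_RecordNode ON RecordNode (NodeId, RecordId);

-- @node is the node reached by walking the edges 'a', 'b', 'c';
-- every match falls in its subtree's contiguous DFS range
DECLARE @node int = 42;  -- hypothetical value
SELECT DISTINCT rn.RecordId
FROM RecordNode rn
JOIN TrieNode tn ON tn.NodeId = @node
WHERE rn.NodeId BETWEEN tn.NodeId AND tn.SubtreeMax;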
Sounds like you've ruled out all good alternatives.
You already know that your query
SELECT * FROM t WHERE TITLE LIKE '%abc%'
won't use an index, it will do a full table scan every time.
If you were sure that the string was at the beginning of the field, you could do
SELECT * FROM t WHERE TITLE LIKE 'abc%'
which would use an index on Title.
Are you sure full text search wouldn't help you here?
Depending on your business requirements, I've sometimes used the following logic:
Do a "begins with" query (LIKE 'abc%') first, which will use an index.
Depending on whether any rows are returned (or how many), conditionally move on to the "harder" search that will do the full scan (LIKE '%abc%').
Depends on what you need, of course, but I've used this in situations where I can show the easiest and most common results first, and only move on to the more difficult query when necessary.
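A minimal T-SQL sketch of that two-step logic, using the table and column from the question:

SELECT * FROM t WHERE Title LIKE 'abc%';        -- cheap: can use an index seek

IF @@ROWCOUNT = 0
    SELECT * FROM t WHERE Title LIKE '%abc%';   -- expensive: full scan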
You can add a computed column to the table: TitleLength AS LEN(Title) PERSISTED. This stores the length of the Title column; create an index on it.
Also add another computed column: ReverseTitle AS REVERSE(Title) PERSISTED.
Now when someone searches for a keyword, check whether the length of the keyword equals TitleLength. If so, do an "=" search. If the keyword is shorter than TitleLength, do a LIKE: first Title LIKE 'abc%', then ReverseTitle LIKE 'cba%'. Similar to Brad's approach, you only run the next, more difficult query if required.
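A sketch of those computed columns and indexes (the index names are made up):

ALTER TABLE t ADD TitleLength AS LEN(Title) PERSISTED;
ALTER TABLE t ADD ReverseTitle AS REVERSE(Title) PERSISTED;
CREATE INDEX IX_t_TitleLength ON t (TitleLength);
CREATE INDEX IX_t_ReverseTitle ON t (ReverseTitle);

-- a suffix search '%abc' becomes a prefix search on the reversed column
SELECT * FROM t WHERE ReverseTitle LIKE REVERSE('abc') + '%';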
Also, if the 80-20 rule applies to your keywords/substrings (i.e. most of the searches are on a minority of the keywords), then you can also consider some sort of caching. For example, say you find that many users search for the keyword "abc" and this keyword search returns records with Ids 20, 22, 24, and 25: you can store this in a separate table and have it indexed.
And now when someone searches for a keyword, first look in this "cache" table to see if the search was already performed by an earlier user. If so, there's no need to look in the main table again; simply return the results from the "cache" table.
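A hedged sketch of such a cache table; every name here is made up:

CREATE TABLE KeywordCache (
    Keyword varchar(255) NOT NULL,
    Id      bigint       NOT NULL   -- matching t.Id
);
CREATE CLUSTERED INDEX IX_KeywordCache ON KeywordCache (Keyword, Id);

-- check the cache first; on a miss, fall back to the main table
-- (@keyword is the search term)
SELECT Id FROM KeywordCache WHERE Keyword = @keyword;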
You can also combine the above with SQL Server full-text search, assuming you have a valid reason not to rely on it alone: use full-text search first to shortlist the result set, and then run a SQL query against your table to get exact results, using the Ids returned by the full-text search as a parameter along with your keyword.
All this obviously assumes you have to use SQL. If not, you can explore something like Apache Solr.
Indexed views are a newer SQL Server feature: create a view, index the column you need to search, and then use that view in your search, which will give you faster results.
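A hedged sketch of an indexed view, assuming the table from the question lives in the dbo schema; indexed views need SCHEMABINDING and a unique clustered index before any other index can be added:

CREATE VIEW dbo.vTitles WITH SCHEMABINDING AS
SELECT Id, Title FROM dbo.t;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vTitles_Id ON dbo.vTitles (Id);
CREATE INDEX IX_vTitles_Title ON dbo.vTitles (Title);
GO
-- NOEXPAND forces use of the view's indexes on non-Enterprise editions
SELECT * FROM dbo.vTitles WITH (NOEXPAND) WHERE Title LIKE 'abc%';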
Use an ASCII charset and a clustered index on the char column.
The charset influences search performance because it determines the data size in both RAM and on disk; the bottleneck is often I/O.
Your column is at most 255 characters long, so you can use a normal index on your char field rather than full-text, which is faster. Do not select unnecessary columns in your SELECT statement.
Lastly, add more RAM to the server and increase the cache size.
Put a primary key on the specific column and index it in clustered form.
Then search using any method (wildcard, =, or anything else); the search will be optimal because the table is already clustered, so the engine knows where to find the values (the column is already in sorted order).
I have two columns containing text: one will be at most 150 chars long and the other at most 700 chars long.
My question is: should I use varchar for both, or should I use text for the 700-char column? Why?
Thanks,
The varchar data type in MySQL < 5.0.3 cannot hold data longer than 255 characters, while in MySQL >= 5.0.3 it has a maximum of 65,535 characters.
So it depends on the platform you're targeting and your deployment requirements. If you want to be sure that it will work on MySQL versions below 5.0.3, go with a text type field for your longer column.
http://dev.mysql.com/doc/refman/5.0/en/char.html
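A minimal sketch of the two columns under those limits; the table and column names are made up:

CREATE TABLE posts (
    short_text VARCHAR(150) NOT NULL,
    long_text  VARCHAR(700) NOT NULL  -- needs MySQL >= 5.0.3
    -- on MySQL < 5.0.3, use instead: long_text TEXT
);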
An important consideration is that with varchar the database stores the data directly in the table row, while with text the database stores a pointer to a separate tablespace in which the data is kept. So, unless you run into the row-length limit (64K in MySQL, 4-32K in DB2, 8K in SQL Server 2000), I would normally use varchar.
Not sure about MySQL specifically, but in MS SQL you should definitely use a VARCHAR for anything under 8,000 characters long if you want to be able to run any sort of comparison on the value in the field. For example, this would be possible with a VARCHAR:
select your_column from your_table
where your_column like '%dogs%'
but not with a TEXT field.
More information regarding the TEXT field in MySQL 5.4 can be found here, and more information about the VARCHAR field can be found here.
I'd pick the option based on the type of data being entered. For example, if it's a (potentially) super-long username and a biography, I'd do varchar and text. Sure, you're limiting the bio to 700 chars, but what if you add HTML formatting down the road, keeping the 700-char limit but allowing HTML tags for formatting?
Twitter may use text for their tweets, since it could be quicker to add metadata to the posts (e.g. URL hrefs and #user PKs) to cache the additional data instead of calculating it on every render.