Search HTML stored as binary image in SQL 2000/2005 (without fulltext)

I am building a simple search tool to search through 'n' articles of html content. I have tried the fulltext search option and all was well until we went live and I have had a load of trouble with the webhost getting stuff sorted properly.
So I might have to move to a host that does not have SQL fulltext support.
All of the articles are stored in a SQL 'image' column. All I want to do is run a LIKE '%keyword%' search on this column, but I have no idea how to do this or if it is even possible.
Can SQL Server decode the binary and do a search on the fly?
Or will I be better off just storing a text only version of the content in a second column?
I have looked at the Lucene.net project but am not sure if this will work on a shared hosting platform.
Any help will be much appreciated.
cheers.
craig

It depends on your version of SQL server - in 2000, you're probably out of luck. "Image" really is just a binary blob - no string functions or anything will work on it.
In SQL Server 2005, you could possibly convert this (either in the database schema or on the fly, with a CAST) to VARCHAR(MAX) - a text type up to 2 GB, which can deal with the normal string functions. As far as I know, IMAGE can't be cast straight to VARCHAR, so go through VARBINARY(MAX) first; the column can then be searched using WHERE CAST(CAST(blob AS VARBINARY(MAX)) AS VARCHAR(MAX)) LIKE '.......'
It won't be exactly lightning swift - but it might work. I would prefer changing the datatype of that column to VARCHAR(Max), though - all just text, up to 2 GB supported - should be good enough for a few HTML documents.
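A minimal sketch of the on-the-fly search on SQL Server 2005. The table and column names (Articles, ArticleId, Body) are hypothetical, and it assumes the image column holds single-byte (non-Unicode) HTML; if the bytes are UTF-16, NVARCHAR(MAX) would be the target type instead. The intermediate VARBINARY(MAX) cast is there because, to my knowledge, image cannot be converted to a character type directly.

```sql
-- Hypothetical names: Articles (table), ArticleId (key), Body (image column).
-- Assumption: Body contains single-byte text, so VARCHAR(MAX) is the right target.
SELECT ArticleId
FROM Articles
WHERE CAST(CAST(Body AS VARBINARY(MAX)) AS VARCHAR(MAX)) LIKE '%keyword%';
```

This scans and converts every row, and it also matches keywords that appear inside HTML tags or attributes, so it is probably only workable for a small number of articles.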
Marc

Related

How to get the length of the contents of a varbinary field with SQL in Advantage Database Server db?

Does anyone know if it is possible to get the length of the contents of a varbinary field using SQL with Advantage Database Server V11?
Regards
The obvious function to look for would be LENGTH(field) or LEN(field) (see online help).
If those only work on character fields, then you can always cast.

Storing 30k-char strings, should I use NVARCHAR(max)?

From what I found on the internet, I should be using NVARCHAR(max) to store some strings that contain up to 30k characters. I don't need to search through the strings, only store and retrieve them. The type TEXT seems obsolete, and VARCHAR isn't as friendly as NVARCHAR about Unicode, so I should be using NVARCHAR(max), right?
Secondly, for now I'm using a SQL database from a free webhosting service. I'm creating the structure via phpMyAdmin, and when creating columns it doesn't suggest NVARCHAR, only VARCHAR and other types. Is NVARCHAR still available, maybe with a DDL command? The SQL version is 5.5.32.
Thanks, and sorry if my questions are unclear, I'm kind of a newb.

Search and Replace a partial string / substring in mssql tables

I was tasked with moving an installation of Orchard CMS to a different server and domain. All the content (page content, menu structure, links, etc.) is stored in an MSSQL database. The good part: When moving the physical files of the Orchard installation to the new server, the database will stay the same, no need to migrate it. The bad thing: There are lots and lots of absolute URLs scattered all over the pages and menus.
I have isolated / pinned down the tables and fields in which the URLs occur, but I lack the (MS)SQL experience/knowledge to do a "search - replace". So I come here for help (I have tried exporting the tables to .sql files, doing a search-replace in a text editor, and then re-importing the .sql files to the database, but ran into several syntax errors... so I need to do this the "SQL way").
To give an example:
The table Common_BodyPartRecord has the field Text of type ntext that contains HTML content. I need to find every occurrence of the partial string /oldserver.com/foo/ and replace it with /newserver.org/bar/. There can be multiple occurrences of the pattern within the same table entry.
(In total I have 5 patterns that will need replacing, all partial string / substrings of urls, domains/paths, etc.)
I usually do frontend stuff and came to this assignment by chance. I used MySQL back in the day when I was playing around with PHP related stuff, but never got past the basics of SQL - it would be helpful if you could keep your explanations more or less newbie-friendly.
The SQL server version is SQL Server 9.0.4053, I have access to the database via the Microsoft SQL Server Management Studio 12
Any help is highly appreciated!
You can't manipulate the NTEXT datatype directly, but you can CAST it to NVARCHAR(MAX), then use the REPLACE function to perform the string replacement, then CAST it back to NTEXT. This can all be done in a single UPDATE statement.
update MyTable
set MyColumn = cast(replace(cast(MyColumn as nvarchar(max)), N'/oldserver.com/foo/', N'/newserver.org/bar/') as ntext)
where cast(MyColumn as nvarchar(max)) LIKE N'%/oldserver.com/foo/%'
The WHERE clause in the UPDATE statement above is used to prevent SQL Server from making non-changes, i.e. if the value does not need to be changed then there is no need to update it to itself.
The CAST function is used to change the data type of a value. NTEXT is a legacy data type used for storing large character values, NVARCHAR(MAX) is a new and more versatile data type for storing large character values. The REPLACE function can not operate on NTEXT values, hence the need to CAST it to NVARCHAR(MAX) first, do the replace, then CAST it back to NTEXT afterwards.
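Applied to the table from the question, the same statement might look like the sketch below. It uses the Common_BodyPartRecord table and Text column named in the question; wrapping it in a transaction with before/after counts is my own suggestion, not something the question asked for, so that the change can be checked and rolled back if the numbers look wrong.

```sql
BEGIN TRANSACTION;

-- How many rows contain the old URL fragment before the change?
SELECT COUNT(*) AS RowsToChange
FROM Common_BodyPartRecord
WHERE CAST([Text] AS nvarchar(max)) LIKE N'%/oldserver.com/foo/%';

-- Replace every occurrence within each matching row.
UPDATE Common_BodyPartRecord
SET [Text] = CAST(REPLACE(CAST([Text] AS nvarchar(max)),
                          N'/oldserver.com/foo/',
                          N'/newserver.org/bar/') AS ntext)
WHERE CAST([Text] AS nvarchar(max)) LIKE N'%/oldserver.com/foo/%';

-- Count again; no rows should still contain the old fragment.
SELECT COUNT(*) AS RowsRemaining
FROM Common_BodyPartRecord
WHERE CAST([Text] AS nvarchar(max)) LIKE N'%/oldserver.com/foo/%';

-- Run exactly one of these once the counts look right (or wrong):
-- COMMIT TRANSACTION;
-- ROLLBACK TRANSACTION;
```

The other patterns can be handled by repeating the UPDATE with a different pair of strings. Don't leave the transaction open for long, since it holds locks on the table until it is committed or rolled back.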

What is the best SQL type to use for a large string variable?

Apologies for the rather basic question.
I have an error string that is built dynamically. The data in the string is passed by various third parties so I don't have any control, nor do I know the ultimate size of the string.
I have a transaction table that currently logs details and I want to include the string so that I can reference back to it if necessary.
2 questions:
How should I store it in the database?
Should I do anything else such as constrain the string in code?
I'm using Sql Server 2008 Web.
For non-Unicode text, use varchar(max); for Unicode text, use nvarchar(max).
Maximum size is 2 GB for either.
Other alternatives are:
binary or varbinary
Drawbacks: you can't search these fields as text, and the (max) variant can't be used as an index key
and the maximum size: 2 GB for varbinary(max) (plain binary and varbinary(n) stop at 8,000 bytes).
There are also TEXT and NTEXT, but they are deprecated and will be removed in a future version,
so I don't suggest using them.
They have similar drawbacks: many string functions don't accept them, and they can't be indexed or used in ORDER BY.
So the best choice is either varchar(max) or nvarchar(max).
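As a rough illustration, a log table with an nvarchar(max) column could look like this. The table and column names (TransactionLog, ErrorText and so on) are made up for the example; in practice the column would more likely be added to the existing transaction table with ALTER TABLE ... ADD.

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE TransactionLog (
    TransactionLogId int IDENTITY(1,1) PRIMARY KEY,
    LoggedAt         datetime NOT NULL DEFAULT GETDATE(),
    ErrorText        nvarchar(max) NULL   -- Unicode text, up to 2 GB
);

-- Store a string of arbitrary length.
INSERT INTO TransactionLog (ErrorText)
VALUES (N'Third-party error: something went wrong');

-- Normal string functions work on the column.
SELECT TransactionLogId,
       LEN(ErrorText)        AS CharCount,
       DATALENGTH(ErrorText) AS ByteCount
FROM TransactionLog
WHERE ErrorText LIKE N'%error%';
```

Because the column has no practical length limit, any cap on the string size would have to be applied in application code before the insert.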
You can use SQL Server nvarchar(MAX).
Alternatively, you can enable and use the FILESTREAM feature of SQL Server 2008 (it's supported by the Web edition) to deal with very large amounts of data, such as documents.
Of course, you should first be sure that you will actually benefit from this feature.

Using full-text search with PDF files in SQL Server 2005

I've got a strange problem with indexing PDF files in SQL Server 2005, and hope someone can help. My database has a table called MediaFile with the following fields - MediaFileId int identity pk, FileContent image, and FileExtension varchar(5). I've got my web application storing file contents in this table with no problems, and am able to use full-text searching on doc, xls, etc with no problems - the only file extension not working is PDF. When performing full-text searches on this table for words which I know exist inside of PDF files saved in the table, these files are not returned in the search results.
The OS is Windows Server 2003 SP2, and I've installed Adobe iFilter 6.0. Following the instructions on this blog entry, I executed the following commands:
exec sp_fulltext_service 'load_os_resources', 1;
exec sp_fulltext_service 'verify_signature', 0;
After this, I restarted the SQL Server, and verified that the iFilter for the PDF extensions is installed correctly by executing the following command:
select document_type, path from sys.fulltext_document_types where document_type = '.pdf'
This returns the following information, which looks correct:
document_type: .pdf
path: C:\Program Files\Adobe\PDF IFilter 6.0\PDFFILT.dll
Then I (re)created the index on the MediaFile table, selecting FileContent as the column to index and the FileExtension as its type. The wizard creates the index and completes successfully. To test, I'm performing a search like this:
SELECT MediaFileId, FileExtension FROM MediaFile WHERE CONTAINS(*, '"house"');
This returns DOC files which contain this term, but not any PDF files, although I know that there are definitely PDF files in the table which contain the word house.
Incidentally, I got this working once for a few minutes, where the search above returned the correct PDF files, but then it just stopped working again for no apparent reason.
Any ideas as to what could be stopping SQL Server 2005 from indexing PDF's, even though Adobe iFilter is installed and appears to be loaded?
Thanks Ivan. I eventually managed to get this working by starting everything from scratch. It seems like the order in which things are done makes a big difference, and the advice given on the linked blog to turn off the 'load_os_resources' setting after loading the iFilter probably isn't the best option, as this will cause the iFilter to not be loaded when the SQL Server is restarted.
If I recall correctly, the sequence of steps that eventually worked for me was as follows:
1. Ensure that the table does not have a full-text index already (and if so, delete it)
2. Install Adobe iFilter
3. Execute the command exec sp_fulltext_service 'load_os_resources', 1;
4. Execute the command exec sp_fulltext_service 'verify_signature', 0;
5. Restart SQL Server
6. Verify PDF iFilter is installed
7. Create full-text index on table
8. Do full re-index
Although this did the trick, I'm quite sure I performed these steps a few times before it eventually started working properly.
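The sequence above can be sketched in T-SQL roughly as follows. The catalog name (MediaFileCatalog), the key index name (PK_MediaFile) and the language code 1033 (English) are assumptions for illustration; the table and column names come from the question. Installing the iFilter and restarting the service happen outside SQL, so the script is meant to be run in two parts.

```sql
-- Part 1: run after installing the Adobe iFilter.
-- Drop any existing full-text index on the table first.
IF EXISTS (SELECT * FROM sys.fulltext_indexes
           WHERE object_id = OBJECT_ID('MediaFile'))
    DROP FULLTEXT INDEX ON MediaFile;

EXEC sp_fulltext_service 'load_os_resources', 1;
EXEC sp_fulltext_service 'verify_signature', 0;

-- (Restart the SQL Server service here, then continue with part 2.)

-- Part 2: confirm the PDF iFilter is registered.
SELECT document_type, path
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';

-- Hypothetical catalog name.
CREATE FULLTEXT CATALOG MediaFileCatalog;

-- PK_MediaFile is an assumed name for the unique index on MediaFileId.
CREATE FULLTEXT INDEX ON MediaFile
(
    FileContent TYPE COLUMN FileExtension LANGUAGE 1033
)
KEY INDEX PK_MediaFile
ON MediaFileCatalog
WITH CHANGE_TRACKING OFF, NO POPULATION;

-- Full re-index.
ALTER FULLTEXT INDEX ON MediaFile START FULL POPULATION;
```

The population runs in the background, so a CONTAINS query issued immediately afterwards may return nothing until it has finished.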
I've just struggled with it for an hour, but finally got it working. I did everything you did, so just try to simplify the query (I replaced * with field name and removed double quotes on term):
SELECT MediaFileId, FileExtension FROM MediaFile WHERE CONTAINS(FileContent, 'house')
Also, when you create the full-text index, make sure you specify the language. And as a last resort, you can try changing the field type from Image to varbinary(MAX).