Using full-text search with PDF files in SQL Server 2005

I've got a strange problem with indexing PDF files in SQL Server 2005, and hope someone can help. My database has a table called MediaFile with the following fields - MediaFileId int identity pk, FileContent image, and FileExtension varchar(5). I've got my web application storing file contents in this table with no problems, and am able to use full-text searching on doc, xls, etc with no problems - the only file extension not working is PDF. When performing full-text searches on this table for words which I know exist inside of PDF files saved in the table, these files are not returned in the search results.
The OS is Windows Server 2003 SP2, and I've installed Adobe iFilter 6.0. Following the instructions on this blog entry, I executed the following commands:
exec sp_fulltext_service 'load_os_resources', 1;
exec sp_fulltext_service 'verify_signature', 0;
After this, I restarted the SQL Server, and verified that the iFilter for the PDF extensions is installed correctly by executing the following command:
select document_type, path from sys.fulltext_document_types where document_type = '.pdf'
This returns the following information, which looks correct:
document_type: .pdf
path: C:\Program Files\Adobe\PDF IFilter 6.0\PDFFILT.dll
Then I (re)created the index on the MediaFile table, selecting FileContent as the column to index and the FileExtension as its type. The wizard creates the index and completes successfully. To test, I'm performing a search like this:
SELECT MediaFileId, FileExtension FROM MediaFile WHERE CONTAINS(*, '"house"');
This returns DOC files which contain this term, but not any PDF files, although I know that there are definitely PDF files in the table which contain the word house.
Incidentally, I got this working once for a few minutes, where the search above returned the correct PDF files, but then it just stopped working again for no apparent reason.
Any ideas as to what could be stopping SQL Server 2005 from indexing PDFs, even though Adobe iFilter is installed and appears to be loaded?

Thanks Ivan. I eventually managed to get this working by starting everything from scratch. It seems that the order in which things are done makes a big difference, and the advice given on the linked blog to turn off the 'load_os_resources' setting after loading the iFilter probably isn't the best option, as this will cause the iFilter not to be loaded when SQL Server is restarted.
If I recall correctly, the sequence of steps that eventually worked for me was as follows:
1. Ensure that the table does not have an index already (and if so, delete it)
2. Install Adobe iFilter
3. Execute the command exec sp_fulltext_service 'load_os_resources', 1;
4. Execute the command exec sp_fulltext_service 'verify_signature', 0;
5. Restart SQL Server
6. Verify PDF iFilter is installed
7. Create full-text index on table
8. Do full re-index
Although this did the trick, I'm quite sure I performed these steps a few times before it eventually started working properly.
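For reference, the sequence above can be sketched in T-SQL roughly as follows. This is only a sketch under assumptions: the catalog name MediaFileCatalog, the key index name PK_MediaFile and the language code 1033 (English) are illustrative and not taken from the original setup, and the service restart has to happen outside the script.

```sql
-- Step 1: drop an existing full-text index on the table, if there is one.
IF EXISTS (SELECT * FROM sys.fulltext_indexes
           WHERE object_id = OBJECT_ID('MediaFile'))
    DROP FULLTEXT INDEX ON MediaFile;

-- Steps 3-4: let SQL Server load filters registered with the OS
-- and skip the signature check on them.
EXEC sp_fulltext_service 'load_os_resources', 1;
EXEC sp_fulltext_service 'verify_signature', 0;

-- Step 5: restart the SQL Server service here, then continue.

-- Step 6: confirm the PDF filter is visible to SQL Server.
SELECT document_type, path
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';

-- Step 7: create the catalog and the index (names are assumptions).
CREATE FULLTEXT CATALOG MediaFileCatalog;

CREATE FULLTEXT INDEX ON MediaFile
    (FileContent TYPE COLUMN FileExtension LANGUAGE 1033)
    KEY INDEX PK_MediaFile          -- assumed name of the unique index on MediaFileId
    ON MediaFileCatalog
    WITH CHANGE_TRACKING OFF, NO POPULATION;

-- Step 8: run the full re-index explicitly.
ALTER FULLTEXT INDEX ON MediaFile START FULL POPULATION;
```

Creating the index with NO POPULATION and then starting the population by hand keeps the "create" and "re-index" steps separate, which makes it easier to see which of them fails.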

I've just struggled with it for an hour, but finally got it working. I did everything you did, so just try to simplify the query (I replaced * with field name and removed double quotes on term):
SELECT MediaFileId, FileExtension FROM MediaFile WHERE CONTAINS(FileContent, 'house')
Also, when you create the full-text index, make sure you specify the language. As a last resort, you could try changing the field type from image to varbinary(MAX).

Related

SQL developer spool to txt row

I have a bat file that calls SQL Developer and spools the result of a query to a text file. However, the result comes out all on one line; it doesn't seem to know where each row ends.
For example, if I run the spool script in sql developer manually, the txt file looks perfect like this:
"Item","Qty","Price"
"A11","4","0.86"
"A12","3","0.56"
"A14","5","0.3"
But if I run it with the bat file, it comes out like this:
"Item","Qty","Price""A11","4","0.86""A12","3","0.56""A14","5","0.3"
Without the right format, when I import it into Excel, all the data ends up in one cell.
I have tried all kinds of formatting settings such as SET PAGESIZE and SET TERMOUT, but none of them work. On another machine I ran exactly the same code and did not have this problem.
bat file code:
@echo off
C:
cd C:\sqldeveloper\sqldeveloper\bin
sdcli migration -actions=mkconn,runsql -connDetails=target_oracle:oracle:XXXXX -conn=target_oracle -sql="C:\Desktop\1.sql"
1.sql:
spool "C:\Desktop\test.txt"
@C:\2.sql as script(F5);
spool off
2.sql:
Select /*csv*/ * From (
select * from item
);
I have been stuck on this for a while; if you have a solution, please let me know. Thank you.
The answer is to use the right SQL Developer command line interface for the job.
We have two:
SDCLI - this is a headless version of the full SQL Developer program - fancy way of saying, no GUI. It can do things like perform a database export or invoke a Cart feature. Using it to run a SQL statement via the Migration task is like using a flamethrower to defrost your car's windshield - although probably not nearly as fun.
SQLcl - this is a command-line, interactive interface to the Oracle database. It's a Java-based version of SQL*Plus. It has the same code as SQL Developer when it comes to making connections, running scripts, etc - but it's only 20MB vs 200+MB, and it only needs a JRE vs a JDK.
Both of these programs are in your sqldeveloper\bin folder - but SQLcl is also a separate, standalone, supported product.
So, to do what you want, you need to change this:
sdcli migration -actions=mkconn,runsql -connDetails=target_oracle:oracle:XXXXX -conn=target_oracle -sql="C:\Desktop\1.sql"
to this:
sql user/pwd@server:port/service @c:\users\jdsmith\desktop\1.sql
And your 1.sql can be this:
spool c:\users\jdsmith\desktop\locations-so.csv
Select /*csv*/ * From locations;
spool off
exit
Bonus: SQLcl will run through this MUCH faster.

How to use this weird .sql file?

I have a very strange 'reload.sql' file that I need to use to build a database.
It references about 200 XXX.dat files with straight-up readable data (although useless without explanations regarding the meaning of the fields).
I have tried MS SQL Server, MySQL Workbench (on a server hosted locally on WAMP), and accessing it directly through DBeaver and IBConsole, but I cannot manage to execute/build it.
It uses a weird syntax. There are elements like
begin
...
end
go
that pointed me towards T-SQL, but running sqlcmd on it gave me thousands upon thousands of errors regarding keywords.
Specifically, the very first batch of executable lines says
SET OPTION date_order = 'YMD'
go
SET OPTION PUBLIC.preserve_source_format = 'OFF'
go
SET TEMPORARY OPTION tsql_outer_joins = 'ON'
go
SET TEMPORARY OPTION st_geometry_describe_type = 'binary'
go
SET TEMPORARY OPTION st_geometry_on_invalid = 'Ignore'
go
SET TEMPORARY OPTION non_keywords = 'attach,compressed,detach,kerberos,nchar,nvarchar,refresh,varbit'
go
which generates about 150 'Incorrect syntax near OPTION keyword' errors on its own. According to Google, SET OPTION is part of a 'rexx' procedure, but then 'date_order' should be 'DATFMT', right?
Another lead is Sybase, but I cannot for the life of me get it to work (in my attempts I did manage to build a .db file, which is useless to me since I can't open it either).
I've tried accessing it through ODBC drivers as well, but none worked (the Paradox ODBC driver did not crash, but reported an error with a FROM clause, and those are generated automatically).
I need to know a way to build a database from this file or directly access the data it references, which I can't really post since it contains private medical data.
Also, what madman came up with this?
The very first Google result (for me anyway) for 'st-geometry-describe-option' shows that this is a SAP SQL Anywhere database: http://dcx.sybase.com/1200/en/dbadmin/st-geometry-describe-option.html
So I would suggest starting from the SQL Anywhere documentation and you will need to install the database software beforehand.

Help with DB2 Error when trying to execute SQL

I started using the system with a pre-made file called DB2.SQL. I am using this because it is what the tutorial said to use. I then edited this file and replaced the contents with my own code:
CREATE DATABASE BANKDB13 BUFFERPOOL BP0;
When I try to execute it, though, I get this error:
DSNE377A INPUT DATA SET RECFM MUST BE F OR FB WITH LRECL 80
What does this error mean and how do I correct it on the file?
I am running it with Vista TN3270 on Windows 7 over TSO, in SPUFI mode.
What I've tried so far:
When I start editing the file, I get a screen for changing the defaults. I have tried setting RECORD FORMAT to both F and FB, as well as setting RECORD LENGTH to 80, with no success.
EDIT:
I resolved the problem by deleting the DB2.SQL file and recreating it, and also making sure that the sizes I gave for the files were consistent with each other.
What SQL are you trying to execute on it?
The error means that the record format (RECFM) of the input data set must be either F (fixed) or FB (fixed block), with a logical record length (LRECL) of 80.
That is what the error means; how to correct it depends on the SQL you're running and the desired outcome.
Which tutorial are you referring to? Do you have a link? Is this a real-world problem, homework, or are you expanding your knowledge into mainframe DB2?
Your SQL snippet above is creating a DB, what is the INPUT DATASET file format that you are subsequently running SQL on?

SQL Server 2005 Management Studio - Recover Accidentally Closed Tab

Is there a way to do this if an unsaved tab gets accidentally closed?
I was able to recover a query I was working on after accidentally closing the tab. If you actually ran the query, it should be in SQL Server's query cache. Query the query cache and order the results by creation date. More info on the SQL Server query cache:
Modify a query like this one (found at http://msdn.microsoft.com/en-us/library/ee343986(v=SQL.100).aspx)
SELECT cp.objtype AS PlanType,
OBJECT_NAME(st.objectid,st.dbid) AS ObjectName,
cp.refcounts AS ReferenceCounts,
cp.usecounts AS UseCounts,
st.text AS SQLBatch,
qp.query_plan AS QueryPlan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st;
to get your desired result. The "st.text" column will have the query that was run on the database server.
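One possible way to handle the "order by creation date" part is to go through sys.dm_exec_query_stats, which records creation_time and last_execution_time for cached plans. A minimal sketch, assuming you have VIEW SERVER STATE permission; the TOP 50 limit is an arbitrary choice, not something from the original answer:

```sql
-- List recently executed statements still in the plan cache, newest first.
SELECT TOP 50
       qs.creation_time,
       qs.last_execution_time,
       st.text AS SQLBatch
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.last_execution_time DESC;
```

This only finds queries that were actually executed and whose plans have not yet been evicted from the cache, so it is worth running soon after the tab was closed.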
I also found on the MSDN website that it is not possible to recover these files, but I would give this a try (it worked for me):
Take a look in the folder C:\Users\YOURUSERIDHERE\Documents\SQL Server Management Studio\Backup Files\Solution1 and choose the files dated when the machine restarted or the crash happened. (Source: SQLBlog.com)
Take a look in the folder C:\Users\[your username goes here]\AppData\Local\Temp\. This didn't work for me because my .sql files were 0 KB and, although the .tmp files had content, I couldn't find a way to extract the code from them. I suppose it can sometimes be helpful, depending on the reason for the system reboot or crash. (Source: ayesamson.com)
I'm not sure that there is, but using TimeSnapper can be a help to show what was previously in the window.
I don't believe so. I checked on the msdn website and there's a thread about this and the answer is no.
Navigate to My Documents\SQL Server Management Studio Express\Backup Files\Solution1 and you will find the recovered backup files. This is the only solution.
Take a look in the folder C:\Users\YOURUSERIDHERE\AppData\Local\temp, then sort the files by date modified and pick the most recent .sql file that has a size greater than 0 bytes. That worked for me.
Unfortunately SSMS currently does not have the Undo Closed Tab feature. I have created the following Connect Item so Microsoft will hopefully add this in a future version: https://connect.microsoft.com/SQLServer/Feedback/Details/1044403

Search HTML stored as binary image in SQL 2000/2005 (without fulltext)

I am building a simple search tool to search through 'n' articles of html content. I have tried the fulltext search option and all was well until we went live and I have had a load of trouble with the webhost getting stuff sorted properly.
So I might have to move to a host that does not have SQL fulltext support.
All of the articles are stored in a SQL 'image' column. All I want to do is run a LIKE '%keyword%' search on this column, but I have no idea how to do this or whether it is even possible.
Can SQL Server decode the binary and do a search on the fly?
Or will I be better off just storing a text only version of the content in a second column?
I have looked at the Lucene.net project but am not sure if this will work on a shared hosting platform.
Any help will be much appreciated.
cheers.
craig
It depends on your version of SQL server - in 2000, you're probably out of luck. "Image" really is just a binary blob - no string functions or anything will work on it.
In SQL Server 2005, you could possibly convert this (either in the database schema or on the fly, with a CAST) to VARCHAR(MAX) - a text type up to 2 GB, which can deal with the normal string functions, and can be searched using WHERE CAST(blob AS VARCHAR(MAX)) LIKE '.......'
It won't be exactly lightning swift - but it might work. I would prefer changing the datatype of that column to VARCHAR(Max), though - all just text, up to 2 GB supported - should be good enough for a few HTML documents.
Marc
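As a rough illustration of the on-the-fly conversion described above on SQL Server 2005: to my knowledge a direct CAST from image to VARCHAR(MAX) is rejected, so this sketch goes through VARBINARY(MAX) first. The table name Articles and the columns ArticleId and Body are hypothetical, and it assumes the HTML was stored in a single-byte encoding (use NVARCHAR(MAX) instead if it was stored as UTF-16).

```sql
-- Articles, ArticleId and Body are assumed names; Body is the image column.
-- image -> VARBINARY(MAX) -> VARCHAR(MAX), then an ordinary LIKE search.
SELECT ArticleId
FROM Articles
WHERE CAST(CAST(Body AS VARBINARY(MAX)) AS VARCHAR(MAX)) LIKE '%keyword%';
```

Every row has to be converted and scanned, so no index can help; this matches the "won't be exactly lightning swift" caveat and is another reason to consider storing a text-only copy in a second column.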