So I've learned the difference between FREETEXT, FREETEXTTABLE, CONTAINS, and CONTAINSTABLE. And I've created a pretty cool search engine that combines a full-text enabled search with a tagging system (with a little help from you guys).
But where have you gone to really learn about and master full-text searching and get the most out of it in real-world scenarios? I'm struggling now with things like database design with full-text indexing in mind, and writing efficient queries that reference multiple tables each with their own full-text-indexed columns.
Any good articles or tutorials you know of are welcome.
Not an article or a tutorial, but if you're willing to spend a few bucks your single best source of information would be Pro Full-Text Search in SQL Server 2008 by Michael Coles and Hilary Cotter.
http://apress.com/book/view/9781430215943
You could start by going straight to the source (assuming that you haven't already).
Full-Text Search (SQL Server)
SQL Server 2008 Full-Text Search: Internals and Enhancements
Related
So I have some experience with Microsoft Access, building database apps and writing VBA for people in different divisions at work, and I have learned a lot in that realm. However,
now the need for SQL Server has arrived, and I have never really ventured into that realm, so let the questions begin:
How vastly different is what I am about to get myself into?
I know that experience is the best teacher, but I learned a lot through books when it came to Access, VBA, SQL, etc., so can anyone suggest materials or resources for learning this way?
It seems as though I am going to have to learn to be the DBA, so I've got to get cracking on the learning. I appreciate any and all help a lot. Thanks!
Microsoft e-learning provides a few free courses and a free ebook targeting SQL Server 2008 here (not sure why you are targeting SQL Server 2005).
Of interest: SQL Server 2005 Learning Resources
Database design skills to some extent transcend the RDBMS; a big difference when moving from Access is the use of stored procedures and the T-SQL constructs available to you.
The Microsoft Press books are excellent.
SQL Server 2005 Books Online contains almost everything you need to know, but the structure is quite daunting at first.
One thing that is really worth learning is how to use SQL Server Profiler, not just for profiling performance problems, but also for seeing what is happening behind the scenes.
I have a SQL 2000 database with around 10 million rows and I need to make a query to get the product information based on the full / partial text search.
Based on this I need to join back to other tables to check on my business process.
I have this implemented using a SQL proc, but I can only validate around 6 rows a second (without threads; it's long business logic). I am trying to find better ways to improve performance.
Lucene.NET might help with this. I have a couple of questions.
Can you point me to the right sources?
While building an index in Lucene, how would I keep the SQL database and the Lucene index in sync?
Do you think Lucene can give real performance gain?
You can start with Mark Krellenstein's 'Search Engine versus DBMS', to see whether a full text search engine, such as Lucene, is the solution for you. In theory, Lucene should be faster than SQL for textual search, but your mileage may vary.
You can do incremental updates with Lucene, which are a bit similar to database replication. This keeps the Lucene index synchronized with the database.
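The incremental-update idea can be sketched roughly as follows. This is a toy Python illustration, not Lucene's API: `ToyIndex`, `sync`, the `products` table, and its `version` column are all hypothetical names, and an in-memory SQLite database stands in for SQL Server. The assumption is that every insert or update bumps a monotonically increasing version (a timestamp or rowversion-style column would play the same role), so each sync pass only touches rows changed since the last watermark.

```python
import sqlite3


def tokenize(text):
    """Lowercase and split on whitespace; a real analyzer does much more."""
    return text.lower().split()


class ToyIndex:
    """In-memory stand-in for a search index: term -> set of row ids."""

    def __init__(self):
        self.postings = {}  # term -> set of row ids
        self.docs = {}      # row id -> terms currently indexed for that row

    def upsert(self, row_id, text):
        self.delete(row_id)  # drop stale postings before re-adding
        terms = set(tokenize(text))
        self.docs[row_id] = terms
        for term in terms:
            self.postings.setdefault(term, set()).add(row_id)

    def delete(self, row_id):
        for term in self.docs.pop(row_id, set()):
            self.postings[term].discard(row_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


def sync(conn, index, last_version):
    """Index only rows changed since last_version; return the new watermark."""
    rows = conn.execute(
        "SELECT id, name, version FROM products "
        "WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    for row_id, name, version in rows:
        index.upsert(row_id, name)
        last_version = version
    return last_version


# Hypothetical schema: 'version' is bumped on every insert or update.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "red widget", 1), (2, "blue widget", 2)],
)

index = ToyIndex()
watermark = sync(conn, index, 0)  # initial full build

# Later, one row changes, and only that row is re-indexed.
conn.execute("UPDATE products SET name = 'green gadget', version = 3 WHERE id = 1")
watermark = sync(conn, index, watermark)
```

Row deletions are not handled here; a real setup would need something like a tombstone flag or a delete log so that removed rows also leave the index.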
Here is an article on using LINQ to Lucene to work with SQL. This may point you in the right direction.
One of the aspects of computer science/practical software engineering I am weaker at is actually doing significant work in database systems. That is to say, I can do simple queries on smaller datasets, no problem. However, working with complex queries on large datasets requires a level of understanding of databases that is beyond me right now. For example, I built an amusing query some time ago that computed a join of size n^2 where n = 20,000, and the hosting server suspended my account for blowing the CPU. Shocking.
I am interested in bringing myself up to speed on how to design schema and queries that, well, don't bring down the server. Pursuant to that end, what materials do you recommend that discuss professional database/SQL design and writing?
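To put the join mentioned above in numbers, here is a rough Python sketch comparing a compare-everything join with one that builds a lookup first. The function names and the sample data are made up for illustration. In a real database, an index on the join column (or the engine's own hash join) plays roughly the role of the `lookup` dictionary, though which strategy is actually chosen depends on the engine and its optimizer.

```python
def nested_loop_join(left, right):
    """Compare every left row with every right row: len(left) * len(right) checks."""
    comparisons = 0
    matches = []
    for l_key, l_val in left:
        for r_key, r_val in right:
            comparisons += 1
            if l_key == r_key:
                matches.append((l_key, l_val, r_val))
    return matches, comparisons


def hash_join(left, right):
    """Build a lookup on one side, then probe it once per row of the other."""
    lookup = {}
    for r_key, r_val in right:
        lookup.setdefault(r_key, []).append(r_val)
    probes = 0
    matches = []
    for l_key, l_val in left:
        probes += 1
        for r_val in lookup.get(l_key, []):
            matches.append((l_key, l_val, r_val))
    return matches, probes


n = 200  # kept small so the sketch runs instantly
left = [(i, f"L{i}") for i in range(n)]
right = [(i, f"R{i}") for i in range(n)]

slow_matches, slow_work = nested_loop_join(left, right)
fast_matches, fast_work = hash_join(left, right)

print(slow_work, fast_work)  # 40000 200
print(20_000 ** 2)           # 400000000 comparisons at n = 20,000
```

Both functions return the same rows; only the amount of work differs, and the gap grows quadratically with n.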
I would go to the bookstore and pick out some books on performance tuning for the database of your choice (it is very different depending on the database backend). This will help you understand what not to do, which is critical to designing databases.
Here's a site with a lot of good info
http://wiki.lessthandot.com/index.php/Category:Data_Management
For generic SQL I would go for Celko's books. For vendor specific, it depends on the platform of your choice. I know the SQL Server platform well and for that my praise goes to the Inside series.
Blogs are also useful. Look at the all-time SQL tag right here on SO and check the top answerers' info; some have personal blogs that are very useful. E.g. go through Quassnoi's blog, it has a LOT of useful info on MySQL, Oracle, SQL Server.
Looks like my data warehouse project is moving to Teradata next year (from SQL Server 2005).
I'm looking for resources about best practices on Teradata - from limitations of its SQL dialect to idioms and conventions for getting queries to perform well - particularly if they highlight things which are significantly different from SQL Server 2005. Specifically tips similar to those found in The Art of SQL (which is more Oracle-focused).
My business processes are currently in T-SQL stored procedures and rely fairly heavily on SQL Server 2005 features like PIVOT, UNPIVOT, and Common Table Expressions to produce about 27m rows of output a month from a 4TB data warehouse.
One place to start is here: http://www.teradataforum.com/
This might be a little late, but there are a few things I have learned about Teradata that I can warn you about.
Use the most recent version as often as possible.
For V12 the optimizer was re-written and the database performs much better now.
Try to realize that SQL Server and Teradata are very different beasts; most of the concepts will not transition well.
Do not underestimate the importance of a primary index.
The locks that Teradata uses are very primitive when compared to other databases.
Do NOT use TERA mode. You do not have any legacy code; ANSI mode is far superior and is widely encouraged.
Join indexes are very helpful tools, but they do not provide all the answers.
Parallelism: take the time to understand how FASTLOAD, MULTILOAD, and TPUMP work and find out how you can leverage them in your ETL strategy.
If you are attempting to run a query which needs to be performant, do not use any casts; with casts, the optimizer will not use statistics to generate the best execution plan.
Working with dates is going to be a pain, just a warning.
Teradata is very DDL oriented; try to understand all the syntax related to creating a table.
Compression is a wonderful tool; if you have any values which are repeated in a table, make use of it.
There are not many tools available with Teradata, be prepared to build a lot. The tools that exist are very expensive.
Unfortunately, I do not know much about SQL Server, so I cannot say what tools in SQL Server appear in Teradata.
Hope this helps
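On the primary index point: Teradata spreads a table's rows across its parallel units (AMPs) by hashing the primary index value, so a poorly chosen index can leave a few units holding most of the data. The Python sketch below only simulates that effect. The md5-based `amp_for` function, the count of 8 units, and the example keys are assumptions for illustration and are not Teradata's actual hashing.

```python
import hashlib
from collections import Counter


def amp_for(value, amp_count):
    """Map a key to a unit by hashing it (md5 here, purely for illustration)."""
    digest = hashlib.md5(str(value).encode()).hexdigest()
    return int(digest, 16) % amp_count


def rows_per_amp(keys, amp_count):
    """Count how many rows land on each unit for the given key values."""
    counts = Counter(amp_for(k, amp_count) for k in keys)
    return [counts.get(a, 0) for a in range(amp_count)]


AMPS = 8       # hypothetical number of parallel units
ROWS = 10_000  # hypothetical table size

# High-cardinality key (something like an order id): rows spread widely.
unique_keys = range(ROWS)
# Low-cardinality key (something like a two-valued flag): rows pile up.
skewed_keys = [i % 2 for i in range(ROWS)]

even = rows_per_amp(unique_keys, AMPS)
skewed = rows_per_amp(skewed_keys, AMPS)

print(even)    # roughly equal counts per unit
print(skewed)  # at most two units hold all the rows
```

With the two-valued key, at most two of the eight simulated units do any work, which is the kind of skew a well-chosen primary index is meant to avoid.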
I would also look into the recently launched Teradata Developer Exchange as well as the TeradataForum and forums on Teradata's main website.
I don't know of any good references available online. Teradata has some design manuals that are available for download, but they're more instruction manuals and not "best practices" as such. Check them out here: http://www.info.teradata.com/DataWarehouse/eTeradata-BrowseBy.cfm?page=Teradata%20Database
Alternatively, you need to find a friendly Teradata expert to bounce ideas off. Try Teradata themselves, or find a local consultant with Teradata experience.
Best practices on Teradata isn't a topic that gets a lot of discussion, and most of the best tricks tend to be proprietary knowledge of the person/people who discovered them.
Sorry,
David Stewardson
Satyam Computer Services
Top of the list on a Google search for "Teradata Best Practices" gave me TERADATA ADVISORY GROUP SETS BEST PRACTICES FOR BUSINESS OBJECTS AND TERADATA CUSTOMERS
EDIT: Seeing as that's just advertising, as you've pointed out, see how you go with these. Please bear in mind that I don't have a clue what Teradata is and can't see myself using it any time this side of the 22nd century AD.
Teradata Discussion Forums
Best Practices for Teradata Deployments
Best Study Guides For NCR Teradata Certifications
The middle one looks promising with its nice long link tree at the top
Oracle® Business Intelligence Applications Installation and Configuration Guide > Preinstallation and Predeployment Considerations for Oracle BI Applications > Teradata-Specific Database Guidelines for Oracle Business Analytics Warehouse >
and the first link, to the forums, should put you in touch with the right people.
I have an MS SQL database and have a varchar field that I would like to do queries like where name like '%searchTerm%'. But right now it is too slow, even with SQL enterprise's full text indexing.
Can someone explain how Lucene .Net might help my situation? How does the indexer work? How do queries work?
What is done for me, and what do I have to do?
I saw this guy (Michael Neel) present on Lucene at a user group meeting - effectively, you build index files (using Lucene) and they have pointers to whatever you want (database rows, whatever)
http://code.google.com/p/vinull/source/browse/#svn/Examples/LuceneSearch
Very fast, flexible and powerful.
What's good with Lucene is the ability to index a variety of things (files, images, database rows) together in your own index using Lucene and then translating that back to your business domain, whereas with SQL Server, it all has to be in SQL to be indexed.
It doesn't look like his slides are up there in Google code.
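The "index with pointers to database rows" idea can be sketched in a few lines of Python. This is a toy inverted index, not Lucene's API or file format; the sample `rows`, `build_index`, `search`, and `like_scan` are all hypothetical names for illustration. The ids returned by `search` are the pointers you would then use to fetch the full rows from the database.

```python
from collections import defaultdict

# Hypothetical rows, as if read from the database: primary key -> text column.
rows = {
    1: "Stainless steel water bottle",
    2: "Steel toe work boots",
    3: "Insulated water jug",
}


def like_scan(rows, term):
    """Roughly what LIKE '%term%' amounts to: inspect every row's text."""
    term = term.lower()
    return sorted(rid for rid, text in rows.items() if term in text.lower())


def build_index(rows):
    """Build term -> row ids once, up front."""
    index = defaultdict(set)
    for rid, text in rows.items():
        for token in text.lower().split():
            index[token].add(rid)
    return index


def search(index, *terms):
    """Look up each term and intersect the id sets (AND semantics)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


index = build_index(rows)
print(search(index, "steel"))           # [1, 2]
print(search(index, "steel", "water"))  # [1]
print(like_scan(rows, "ate"))           # [1, 3]
print(search(index, "ate"))             # []
```

The last two lines show the trade-off: a token-based index answers whole-word queries with a dictionary lookup instead of a scan, but it does not match arbitrary substrings the way `'%searchTerm%'` does unless the text is tokenized differently.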
This article (strangely enough it's on the top of the Google search results :) has a fairly good description of how the Lucene search could be optimised.
Properly configured, Lucene should easily beat SQL (pre-2005) full-text indexing search. If you are on MS SQL 2005 and your search performance is still too slow, you might consider checking your DB setup.