Can I write SQL using speech recognition? - sql

I have wrist pain when I type, and I would like to start writing SQL statements, stored procedures, and views using speech recognition.

Yes. SQL is well-suited to speech recognition (as well-suited as a programming language can be, that is), given its limited vocabulary and sentence-like structure. Aside from formatting the SQL so that it looks nice, I can dictate it much faster than I can type it. Dictating code isn't for everyone, however. It can be quite frustrating in the beginning. The people who try this and stick with it will probably be those who have no other choice.
I use Dragon NaturallySpeaking 10 Professional. The Professional version has the tools that are needed to create a custom vocabulary like this. Version 9 should work fine, also. It's expensive, so try to get the company you work for to pay for it if possible. Get a decent headset microphone also. The one that comes with NaturallySpeaking isn't good enough (but you may want to try it first to see if it works for you). KnowBrainer is a good place for microphone recommendations.
2009-01-05 Update: I have added tips below specific to dictating in SQL Server Management Studio.
2012-01-04 Update: I have been keeping track of Microsoft's WSR for quite a while now, hoping tools would be added to easily create a completely custom vocabulary from scratch like I am doing in this tutorial with NaturallySpeaking. Unfortunately, it appears that this can only be done through the API (SAPI). I don't have the time to write that code, so I will continue to use NaturallySpeaking to write code until something better comes along.
Preparation
Clean up your database names and code
Dictating "SELECT PT_17, PT_28, PT_29 FROM HIK.dbo.PATINFO" would be a pain in the butt, but I guess it would be possible. You would have to set a lot of pronunciations, since NaturallySpeaking would have no idea how "PT_17" would sound. This would be preferable for dictation:
SELECT Patient.FirstName, Patient.MiddleName, Patient.LastName FROM Claim.dbo.Patient AS Patient WHERE Patient.LastName LIKE '%smith%'
I switched to my TSQL vocabulary to dictate the above statement. Everything up to the LIKE statement is spoken just as it appears. '%smith%' was dictated as "open-single-quote percent-sign sierra mike india tango hotel percent-sign close-single-quote [PAUSE] compound-that". Using consistent table aliases and always preceding fields with them helps improve accuracy, since NaturallySpeaking keeps statistics of how often one word appears near another.
Create a word list of SQL keywords
Put one word on each line. You can optionally follow a word with a backslash (\) and a pronunciation. NaturallySpeaking uses a small backup dictionary of words to determine the pronunciation of words you add to a vocabulary, so it has no problem figuring out how SELECT, FROM, and WHERE are pronounced. It can sometimes figure out a compound word, and it makes its best guess for something like XACT_ABORT. I would provide pronunciations for cases like these. The database you use will determine what words the list contains - check your documentation for a list of keywords. Your list will look something like this, but be much longer.
SELECT
WHERE
FROM
XACT_ABORT\exact-abort
MAXDOP
NOLOCK\no-lock
LEN
RETURNS
CURSOR
MONEY
Also add these words
\New-Line
\New-Paragraph
\All-Caps
\All-Caps-On
\All-Caps-Off
\Cap
\Caps-On
\Caps-Off
\No-Caps
\No-Caps-On
\No-Caps-Off
\No-Space
\No-Space-On
\No-Space-Off
\space-bar
\tab-key
a\alpha
b\bravo
c\charlie
d\delta
e\echo
f\foxtrot
g\golf
h\hotel
i\india
j\juliet
k\kilo
l\lima
m\mike
n\november
o\oscar
p\papa
q\quebec
r\romeo
s\sierra
t\tango
u\uniform
v\victor
w\whiskey
x\xray
y\yankee
z\zulu
PM
AM
one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
thirty
forty
fifty
sixty
seventy
eighty
ninety
hundred
thousand
million
billion
trillion
Keep this list around, since you'll probably modify it several times and re-create your vocabulary to get it the way you like it.
Create a word list of your database object names
This is how I do it in SQL Server:
SELECT DISTINCT * FROM
(
SELECT DISTINCT [name] FROM Database1.[dbo].[sysobjects] WHERE xtype not IN ('F', 'S', 'PK', 'D', 'UQ')
UNION
SELECT DISTINCT column_name AS [name] FROM Database1.information_schema.[columns]
UNION
SELECT DISTINCT [name] FROM Database2.[dbo].[sysobjects] WHERE xtype not IN ('F', 'S', 'PK', 'D', 'UQ')
UNION
SELECT DISTINCT column_name AS [name] FROM Database2.information_schema.[columns]
...
) AS UnionTable
Copy and paste the results into a text file.
Create pronunciations for your database object names
Use the same format for pronunciations as listed above. An easy way to create these is to use a regex search and replace function. In SQL Server Management Studio or Visual Studio, the following (non-standard) regex will create pronunciations for two-word mixed-case names.
Find: ^{[A-Z][a-z]+}{[A-Z][a-z]+}$
Replace: \0\\\1-\2
Review the pronunciations and clean up anything that doesn't look right. For acronyms, ASP becomes "A.S.P.". Keep this list around, as well. If you decide to make vocabularies for other programming languages, you will probably include these words if you're a database developer.
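If your editor's regex flavor differs, the same search and replace can be sketched in a few lines of Python. This is only an illustration, not part of the original workflow: the function name `with_pronunciation` is made up, and it assumes names built purely from capitalized words (it extends the two-word regex above to any number of words and leaves everything else untouched for manual review).

```python
import re

# Matches names made of two or more capitalized words, e.g. "FirstName".
MIXED_CASE = re.compile(r"(?:[A-Z][a-z]+){2,}")

def with_pronunciation(name):
    """Return 'Name\\Spoken-Form' for mixed-case names, else the name unchanged."""
    if not MIXED_CASE.fullmatch(name):
        return name  # acronyms, underscores, etc. still need a manual pronunciation
    words = re.findall(r"[A-Z][a-z]+", name)
    return name + "\\" + "-".join(words)

for line in ["FirstName", "PatientMiddleName", "PATINFO", "Claim"]:
    print(with_pronunciation(line))
```

Names like PATINFO come through unchanged, so they are easy to spot when you review the list.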
Create a text document that contains all of your SQL code (views, procedures, etc.)
SQL Server:
SELECT * FROM Database1.dbo.[View] UNION SELECT * FROM Database1.dbo.Routine UNION
SELECT * FROM Database2.dbo.[View] UNION SELECT * FROM Database2.dbo.Routine
...
ORDER BY [Name]
Remove comments and literal strings. Regex search and replace works well for this.
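As a rough sketch of that cleanup step, the following Python strips line comments, block comments, and single-quoted literals in one pass. The name `strip_sql_noise` is hypothetical, and the pattern makes simplifying assumptions: it does not handle nested block comments or quote characters inside bracketed or double-quoted identifiers.

```python
import re

# One pass, so a "--" inside a string or a quote inside a comment is not misread.
SQL_NOISE = re.compile(
    r"""
    '(?:[^']|'')*'      # single-quoted literal; '' is an escaped quote
    | --[^\n]*          # line comment, up to the end of the line
    | /\*.*?\*/         # block comment (non-nested)
    """,
    re.VERBOSE | re.DOTALL,
)

def strip_sql_noise(sql):
    """Replace comments and string literals with a single space."""
    return SQL_NOISE.sub(" ", sql)

sample = "SELECT a -- it's a comment\nFROM t /* note */ WHERE b = 'x -- y'"
print(strip_sql_noise(sample))
```

Replacing with a space rather than nothing keeps neighbouring keywords from running together.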
Build your vocabulary
Install NaturallySpeaking and create a new user if you have not already.
Create a new vocabulary
Click on "NaturallySpeaking | Manage Vocabularies...". Click New. Name the vocabulary something appropriate, such as "SQL". Base it on "Base General - Empty Dictation". When it asks you if you want to scan your email or documents, click cancel.
Import words
Click "Words | Import". Add the two word lists you created and import them.
Adapt to writing style
Click "Tools | Accuracy Center". Click "Add words from your documents to the vocabulary". Use the default settings, and select the document you created which contains your code.
Try dictating some SQL
The first thing you'll probably want to dictate is a select statement. Keep in mind that SELECT is what you use to begin a command in NaturallySpeaking that selects text. Because of this, you'll want to say "Cap" before dictating it so NaturallySpeaking doesn't get confused. That's it. Well, at least enough to get you started. Modify your word lists, pronunciations, and word properties as needed. There are other things you can do to increase accuracy and the speed at which you can dictate. As I think of them, I will edit this post and add them here.
Tips for dictating into SQL Server Management Studio
If you dictate into SQL Server Management Studio, you may notice very slow performance. Try the following to alleviate this:
Turn off all toolbars (create macros to access commonly used functionality)
Keep as few panes and documents open as possible
Keep only one database open at a time
Hide search results after you're done with them (Ctrl+R)
If all else fails, close and reopen Management Studio
Display the tab stops in the edit window to make it easier to format your SQL.
Query Analyzer from SQL Server 2000 does not have these issues.

http://voicecode.io
I recently released VoiceCode, a coding-by-voice solution I created to solve my own RSI issues.
I use it for coding in Sublime Text and Xcode, as well as general computer usage. It works for writing code in any language including SQL. The great thing about this solution is that all the commands can be chained into "command phrases" so you don't have to pause between every individual command like you do with other voice command solutions.
It has built-in support for all standard variable-name formats (snake case, camel case, etc.), built-in commands for every permutation of keyboard shortcuts (e.g. command-shift-5, command-option-shift-T, and so on), cursor movement commands, app switching commands, window switching commands, commands for symbol combos like "=>", "||", ">=", etc., and tons more. Plus, it is very easy to add your own custom commands.

Related

SQL - searching database with the LIKE operator

Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope, however, if a user misspells dinosaurs? For example:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Is there any way to do this programmatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526).
You can also use the DIFFERENCE function (documented on the same page as SOUNDEX), which compares levels of similarity (4 being the most similar, 0 being the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
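To see why the two spellings collapse to the same code, here is a minimal Python sketch of the classic American Soundex rules. It is an illustration only, not SQL Server's implementation: SQL Server's SOUNDEX can differ in edge cases depending on the database compatibility level.

```python
# Consonant groups of classic American Soundex; vowels, h, w and y get no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)

def soundex(word):
    """Return the four-character Soundex code of word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate two equal codes
            prev = code
    return (result + "000")[:4]

print(soundex("Dinosaurs"), soundex("Dinosores"))
```

Because vowels carry no code, swapping or dropping them leaves the result unchanged, while a wrong consonant (for example "Dibosaurs") changes it.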
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzy Logic answer provided by @Neil Knight (+1 to that, for me!).
This Stack Overflow article also details possible sources for implementations of Fuzzy Logic in TSQL. One respondent also outlined Full Text Indexing as a potential option that you might want to investigate.
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer: there is nothing built into most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that sound alike and thus correct for phonetic misspellings, and because it ignores vowels it even tolerates a dropped vowel such as "Dinosars". But if the user truly "fat-fingered" a consonant and entered, say, "Dibosaurs", SoundEx would not return a match (D126 instead of D526).
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
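The "closest dictionary word" step can be sketched with an edit distance that counts substitutions, insertions, deletions and adjacent transpositions (the optimal string alignment variant). This is a minimal Python illustration, not how any particular search engine does it; the names `edit_distance` and `suggest` and the sample word list are assumptions, and a real dictionary would need an index rather than a linear scan.

```python
def edit_distance(a, b):
    """Optimal string alignment distance: substitutions, insertions,
    deletions, and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

def suggest(term, dictionary):
    """Return the dictionary word closest to term (the first one wins ties)."""
    term = term.lower()
    return min(dictionary, key=lambda word: edit_distance(term, word.lower()))

print(suggest("Dinosayrs", ["dinosaurs", "dinner", "donors"]))
```

In practice you would probably also cap the accepted distance, so that a term with no near neighbour is left alone instead of being "corrected" to something unrelated.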
You can also try SUBSTR() to compare only the first three or so characters of each name. Below is an example of how that can be achieved:
SELECT Table1.Fname, Table1.Lname
FROM Table1, Table2
WHERE substr(Table1.Fname, 1, 3) || substr(Table1.Lname, 1, 3) = substr(Table2.Fname, 1, 3) || substr(Table2.Lname, 1, 3)
ORDER BY Table1.Fname;

Good SQL search tool?

FreeTextTable is really great for searching, as it actually returns a relevancy score for each item it finds.
The problem is, it doesn't support the logical operator AND, so if I have 10 items with the word 'ice' in it, but not 'cream', and vice versa, then 20 results will be returned, when in this scenario 0 should've been returned.
Are there any alternative tools to search a SQL Server database? Or should I just write my own code to provide 'AND' functionality (i.e. doing two separate searches in the scenario 'Ice Cream', splitting each search by spaces)?
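For what it's worth, the "write my own AND" route could look roughly like the Python sketch below. It assumes you have already run one ranked search per word (for example one FREETEXTTABLE call each) and collected the KEY and RANK columns into a dict per term. The function name `and_search`, the sample keys and ranks, and the choice to sum ranks are all assumptions for illustration.

```python
def and_search(term_results):
    """term_results: one {key: rank} dict per search term.
    Keep only keys found for every term; order by summed rank, highest first."""
    if not term_results:
        return []
    common = set(term_results[0])
    for results in term_results[1:]:
        common &= set(results)
    totals = {key: sum(results[key] for results in term_results) for key in common}
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical per-term results: row key -> relevancy rank.
ice = {1: 120, 2: 80, 3: 40}
cream = {2: 90, 3: 200, 4: 10}
print(and_search([ice, cream]))
```

Rows that match only 'ice' or only 'cream' drop out, which is the behaviour the question asks for.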
You can try SQL Search from RedGate.
It is a free tool (though not open source) - I have used it before and it is very powerful.
There is also a free SQL search tool from ApexSQL you can try. It integrates into SSMS and can also show relationship diagrams and help with safely removing or renaming objects in your database. They do require you to leave an email address, but the product itself is completely free. ApexSQL Search
Since you have full text search enabled to use FREETEXTTABLE perhaps you could make use of CONTAINS instead? (I have to be honest, I've not used full text search myself).
It would appear you can query like this:
SELECT Name, Price FROM Product
WHERE CONTAINS(Name, 'ice')
AND CONTAINS(Name, 'cream')

Searching with words one character long (MySQL)

I have a table Books in my MySQL database which has the columns Title (varchar(255)) and Edition (varchar(20)). Example values for these are "Introduction to Microeconomics" and "4".
I want to let users search for Books based on Title and Edition. So, for example they could enter "Microeconomics 4" and it would get the proper result. My question is how I should set this up on the database side.
I've been told that FULLTEXT search is generally a good way to do things like this. However, because the edition is sometimes just a single character ("4"), full-text search would have to be set up to index single-character words (ft_min_word_len = 1). This, I've heard, is very inefficient.
So, how should I set up searches of this database?
UPDATE: I'm aware that CONCAT/LIKE could be used here. My question is whether it would be too slow. My Books database has hundreds of thousands of books, and a lot of users are going to be searching it.
Here are the steps for a solution:
1) Read the search string from the user.
2) Split the string into parts at the spaces (" ") between the words.
3) Use the following query to get the result:
SELECT * FROM books WHERE Title LIKE '%part[0]%' AND Edition LIKE '%part[1]%';
Here part[0] and part[1] are the separate words from the search string.
The PHP code for the above could be:
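<?php
// $string is the search value; $mysqli is assumed to be an open mysqli connection (the mysql_* functions were removed in PHP 7).
$string_array = explode(" ", $string);
$title = "%" . $string_array[0] . "%";
$edition = "%" . $string_array[1] . "%";
// A prepared statement keeps the user's input out of the SQL text.
$stmt = $mysqli->prepare("SELECT * FROM books WHERE Title LIKE ? AND Edition LIKE ?");
$stmt->bind_param("ss", $title, $edition);
$stmt->execute();
$result = $stmt->get_result()->fetch_array();
?>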
$string_array[0] could be extended to hold all of the parts except the last one, which would handle a search such as "Introduction to Microeconomics 4".
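The extension to several title words can be sketched as follows. This is a Python illustration using an in-memory SQLite table as a stand-in for the MySQL Books table; the function name `search_books` and the sample rows are made up. It assumes the last word is always the edition, and it does not escape `%` or `_` typed by the user.

```python
import sqlite3

def search_books(conn, text):
    """Treat the last word as the edition and every earlier word as a title term."""
    parts = text.split()
    if len(parts) < 2:
        return []
    *title_terms, edition = parts
    where = " AND ".join(["Title LIKE ?"] * len(title_terms) + ["Edition LIKE ?"])
    params = [f"%{term}%" for term in title_terms] + [f"%{edition}%"]
    query = f"SELECT Title, Edition FROM Books WHERE {where}"
    return conn.execute(query, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (Title TEXT, Edition TEXT)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?)",
    [
        ("Introduction to Microeconomics", "4"),
        ("Introduction to Microeconomics", "3"),
        ("Macroeconomics", "4"),
    ],
)
print(search_books(conn, "Introduction Microeconomics 4"))
```

Only the placeholders are generated dynamically; the user's words always travel as bound parameters.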
For your application, where you're interested in just title and edition, I suspect that using a FULLTEXT index with MATCH/AGAINST and reducing ft_min_word_len to 1 would not have that much impact performance-wise (if your data were more verbose or user-written content, then I might hesitate).
The easiest way to check is to change the value, REPAIR the table to account for the new ft_min_word_len and rebuild the index, and do some simple benchmarking.
Having said that, for your application, I might consider looking into Sphinx. It's definitely going to be orders of magnitude faster, and your content is relatively static, so a delay between re-indexing runs (Sphinx's main drawback IMO) isn't an issue. Plus, with careful usage of the wordforms and exceptions, you could map things like 4/four/fourth/IV all to the same token for improved searching.

First Name Variations in a Database

I am trying to determine what the best way is to find variations of a first name in a database. For example, I search for Bill Smith. I would like it return "Bill Smith", obviously, but I would also like it to return "William Smith", or "Billy Smith", or even "Willy Smith". My initial thought was to build a first name hierarchy, but I do not know where I could obtain such data, if it even exists.
Since users can search the directory, I thought this would be a key feature. For example, people I went to school with called me Joe, but I always go by Joseph now. So, I was looking at doing a phonetic search on the last name, either with NYSIIS or Double Metaphone, and then searching on the first name using this name hierarchy. Is there a better way to do this - maybe some sort of graded relevance using a full text search on the full name instead of a two part search on the first and last name? Part of me thinks that if I stored a name as a single value instead of multiple values, it might facilitate more search options at the expense of being able to address a user by the first name.
As far as platform, I am using SQL Server 2005 - however, I don't have a problem shifting some of the matching into the code; for example, pre-seeding the phonetic keys for a user, since they wouldn't change.
Any thoughts or guidance would be appreciated. Countless searches have pretty much turned up empty. Thanks!
Edit: It seems that there are two very distinct camps on the functionality and I am definitely sitting in the middle right now. I could see the argument of a full-text search - most likely done with a lack of data normalization, and a multi-part approach that uses different criteria for different parts of the name.
The problem ultimately comes down to user intent. The Bill / William example is a good one, because it shows the mutation of a first name based upon the formality of the usage. I think that building a name hierarchy is the more accurate (and extensible) solution, but is going to be far more complex. The fuzzy search approach is easier to implement at the expense of accuracy. Is this a fair comparison?
Resolution: Upon doing some tests, I have decided to go with an approach where the initial registration will take a full name and I will split it out into multiple fields (forename, surname, middle, suffix, etc.). Since I am sure that it won't be perfect, I will allow the user to edit the "parts", including adding a maiden or alternate name. As far as searching goes, with either solution I am going to need to maintain what variations exist, either in a database table or as a thesaurus. Neither has an advantage over the other in this case. I think it is going to come down to performance, and I will have to actually run some benchmarks to determine which is best. Thank you, everyone, for your input!
In my opinion you should either do a feature right and make it complete, or you should leave it off to avoid building a half-assed intelligence into a computer program that still gets it wrong most of the time ("Looks like you're writing a letter", anyone?).
In the case of human names, a computer will get it wrong most of the time; doing it right and complete is impossible, IMHO. Maybe you can hack something that does the most common English names. But actually, the intelligence to look for both "Bill" and "William" is built into almost any English-speaking person - I would leave it to them to connect the dots.
The term you are looking for is Hypocorism:
http://en.wikipedia.org/wiki/Hypocorism
And Wikipedia lists many of them. You could bang out some Python or Perl to scrape that page and put it in a db.
I would go with a structure like this:
create table given_names (
id int primary key,
name text not null unique
);
create table hypocorisms (
id int references given_names(id),
name text not null,
primary key (id, name)
);
insert into given_names values (1, 'William');
insert into hypocorisms values (1, 'Bill');
insert into hypocorisms values (1, 'Billy');
Then you could write a function/sproc to normalize a name:
normalize_given_name('Bill'); --returns William
One issue you will face is that different names can have the same hypocorism (Albert -> Al, Alan -> Al)
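Because of that ambiguity, the normalizing function probably has to return a list of candidates rather than a single name. Here is a minimal Python sketch using an in-memory SQLite copy of the two tables above. The extra Albert/Alan rows and the case-insensitive matching are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table given_names (id integer primary key, name text not null unique);
    create table hypocorisms (
        id integer references given_names(id),
        name text not null,
        primary key (id, name)
    );
    insert into given_names values (1, 'William'), (2, 'Albert'), (3, 'Alan');
    insert into hypocorisms values (1, 'Bill'), (1, 'Billy'), (2, 'Al'), (3, 'Al');
""")

def normalize_given_name(conn, name):
    """Return every given name the input could stand for, sorted alphabetically."""
    rows = conn.execute(
        """
        select g.name from given_names g where lower(g.name) = lower(?)
        union
        select g.name from hypocorisms h join given_names g on g.id = h.id
        where lower(h.name) = lower(?)
        order by 1
        """,
        (name, name),
    ).fetchall()
    return [row[0] for row in rows]

print(normalize_given_name(conn, "Bill"))
print(normalize_given_name(conn, "Al"))
```

A search for "Al Smith" would then look for every returned candidate plus the name as typed.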
I think your basic approach is solid. I don't think fulltext is going to help you. For seeding, behindthename.com seems to have large amount of the data you want.
Are you using SQL Server 2005 Express with Advanced Services? It sounds to me like you would benefit from full-text indexing, and more specifically from CONTAINS and CONTAINSTABLE. Here is a link describing the uses of CONTAINSTABLE:
http://msdn.microsoft.com/en-us/library/ms189760.aspx
and here is the download link for SQL Server 2005 With Advanced Services:
http://www.microsoft.com/downloads/details.aspx?familyid=4C6BA9FD-319A-4887-BC75-3B02B5E48A40&displaylang=en
Hope this helps,
Andrew
You can use SQL Server Full Text Search and do an inflectional or thesaurus search.
Basically like:
SELECT ProductId, ProductName
FROM ProductModel
WHERE CONTAINS(CatalogDescription, ' FORMSOF(THESAURUS, metal) ')
Check out:
http://en.wikipedia.org/wiki/SQL_Server_Full_Text_Search#Inflectional_Searches
http://msdn.microsoft.com/en-us/library/ms345119.aspx
http://www.mssqltips.com/tip.asp?tip=1491
Not sure what your application is, but if your users know at the time of sign up that people from their past might be searching the database for them, you could offer them the chance in the user profile to define other names they might be known as (including last names, women change these all the time and makes finding them much harder!) and that they want people to be able to search on. Store these in a separate related table. Then search on that. Just make the structure such that you can define one name as the main name (the one you use for everything except the search.)
You'll find that you're dabbling in an area known as "Natural Language Processing" and you'll need to do several things, most of which can be found under the topic of stemming.
Simplistic stemming simply breaks the word apart, but more advanced algorithms associate words that mean the same thing - for instance Google might use stemming to convert "cat" and "kitten" to "feline" and search for all three, weighing the actual word provided by the user as slightly heavier so exact matches return before stemmed matches.
It's a known problem, and there are open source stemmers available.
-Adam
No, Full Text searches will not help to solve your problem.
I think you might want to take a look at some of the following links: (Funny, no one mentioned SoundEx till now)
SoundEx - MSDN
SoundEx - Google results
InformIT - Tolerant Search algorithms
Basically SoundEx allows you to evaluate the level of similarity in similar sounding words. The function is also available on SQL 2005.
As a side issue, instead of returning similar results, it might prove more intuitive to the user to use an AJAX-based script to deliver similar-sounding names before the user initiates his/her search. That way you can show the user "similar names" or "did you mean..." kind of data.
Here's an idea for automatically finding "name synonyms" like Bill/William. That problem has been studied in the broader context of synonyms in general: inducing them from statistics of which words commonly appear in the same contexts in a large text corpus like the Web. You could try combining that approach with a list of names like Moby Names; I don't know if it's been done before.
Here are some pointers.

Best way to implement a stored procedure with full text search

I would like to run a search with MSSQL Full text engine where given the following user input:
"Hollywood square"
I want the results to have both Hollywood and square[s] in them.
I can create a method on the web server (C#, ASP.NET) to dynamically produce a sql statement like this:
SELECT TITLE
FROM MOVIES
WHERE CONTAINS(TITLE,'"hollywood*"')
AND CONTAINS(TITLE, '"square*"')
Easy enough. HOWEVER, I would like this in a stored procedure for added speed benefit and security for adding parameters.
Can I have my cake and eat it too?
I agree with the above; look into AND clauses:
SELECT TITLE
FROM MOVIES
WHERE CONTAINS(TITLE,'"hollywood*" AND "square*"')
However, you shouldn't have to split the input sentences; you can use a variable:
SELECT TITLE
FROM MOVIES
WHERE CONTAINS(TITLE, @parameter)
By the way:
CONTAINS searches for the exact term
FREETEXT searches for any term in the phrase
The last time I had to do this (with MSSQL Server 2005) I ended up moving the whole search functionality over to Lucene (the Java version, though Lucene.Net now exists I believe). I had high hopes of the full text search but this specific problem annoyed me so much I gave up.
Have you tried using the AND logical operator in your string? I pass in a raw string to my sproc and stuff 'AND' between the words.
http://msdn.microsoft.com/en-us/library/ms187787.aspx
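Stuffing AND between the words can be done on the web server before calling the stored procedure. The Python sketch below shows the string manipulation only (the question uses C#, so treat it as pseudocode for that side). The function name `build_contains_condition` is made up, and keeping only word characters is an assumption chosen so that quotes typed by the user cannot break the search condition.

```python
import re

def build_contains_condition(user_input):
    """Turn 'Hollywood square' into '"hollywood*" AND "square*"'."""
    words = re.findall(r"\w+", user_input.lower())
    return " AND ".join(f'"{word}*"' for word in words)

print(build_contains_condition("Hollywood square"))
```

The resulting string would then be passed as a single parameter to the stored procedure and used as the CONTAINS search condition. An empty result should be treated as "no search" rather than sent to the database.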