Synonyms in QnA Maker - synonym

How can I make QnAMaker recognize synonyms in the questions asked in order to help it return the correct answer. For example, the word disbursement and distribution in my work mean the same thing. Is there a way to ensure that QnA maker will understand each.

All you need to do is include the phrases as individual questions for that particular "answer". As you can see from the included image, you can enter an initial phrase (aka question) and then enter additional phrases by clicking "Add alternative phrases". Both of
these phrases are then associated to the one provided answer.
Hope of help!

Related

is this possible to manipulate a description to same meaning but different words with data manipulation

i want to copy a data from a website which sells courses like ITIL, Prince2 and PMP and many other IT sector courses now there are 20,000 different courses's description is there.
However, i want to use selenium to scrape all the data but description is still subject to copyright.
Kindly let me know how i can manipulate all of that description to data to same meaning but different words.
Is there any API which can give me an access to build an code which will be helping these description data by using it's synonymous or which can change it's grammer to completely new sentennces but same meaning.
Kindly let me know where to start this.
Thanks,
The task you are referring to is called paraphrasing.
There is a lot of research on the field. In arXiv you fill find research papers on the topic. However, since you are asking for an API, I am assuming you don't want to implement these models by your self. Luckily, some authors have published their models online on GitHub. (Note: some are a re-implementation by someone else.)
When you use some of these implementations, note that most offer a pre-trained model. Do read which data set was used for training and try to pick the one that is the most similar to the data that you are facing. By doing so, more words in the domain of your descriptions will be available and more synonyms can be used.

Question Answering with Lucene

For a toy project, I want to implement an automated question answering system with Lucene and I'm trying to figure out a reasonable way to implement it. The basic operation is as follows:
1) The user will enter a question.
2) The system will identify the keywords in the question.
3) The keywords will be searched in a large knowledgebase and matching sentences will be shown as answers.
My knowledgebase (i.e., corpus) is not structured. It is just a large, continuous text (say, a user manual without any chapters). I mean that the only structure is that sentences and paragraphs are identified.
I plan to treat each sentence or paragraph as a separate document. To present the answer in a context, I may consider keeping one sentence/paragraph before/after the indexed one as payload. I would like to know if that makes sense. Also, I'm wondering if there are other tried and well-known approaches for that kind of systems. As an example, another approach that comes to mind is to index large chunks of the corpus as documents with the token positions, then process the vicinity of found keywords to construct my answers.
I would appreciate direct recommendations based on experience or intuition, but also tutorials or introductory materials to question-answering systems with Lucene in mind.
Thanks.
It's not an unreasonable approach to take.
One enhancement you might consider is incorporating learning feedback, so that you can continually improve the scoring of content vs search terms. To do this you would ask users to rate the answers that come back ('helpful vs unhelpful'), that way you can start to rank documents against keywords based on the historical data. You could classify potential documents as helpful/unhelpful for given keywords by using a simple Bayesian classifier.
Indexing each sentence as a document will give you some problems. You've pointed out one: you would need to store the surrounding texts a payloads. That means you'll need to store each sentence three times (before, during and after), and you'll have to manually get into the payload.
If you want to go the route of each sentence being a document, I would recommend coming up with an ID for each sentence and storing that as a separate field. Then you can display [ID-1, ID, ID+1] in each result.
The bigger question though is: how should you break up the text into documents? Identifying semantically related areas seems difficult, so doing it by sentence/paragraph might be the only way to go. A better way would be if you could find which text is the header of a section, and then put everything in that section as a document.
You might also want to use the index (if your corpus has one). The terms there could be boosted, as they are presumably more important.
Instead of luncene which does text indexing, search and retrieval, I think using something like Apache Mahout would help with this. Mahout considers text as knowledge and doing that makes the answering the question better than just text matching. Mahout is a machine learning and data mining f/w which fits this domain better. Just a very high level thought.
--Sai

Business Applications: What are the fundamental features of a search form?

In a typical business application it is quite common to have forms that are used for searching.
Some basic features are:
A pane that contains the search criteria
A grid to display the results
Sorting on the grid
A detail page that opens when an item is selected in the results grid
What other features would you expect in a business application's search functionality?
Maybe it's a bit trite but there is some sense in this picture:
removed dead ImageShack link
Do it as it shown at the second example, not as at the 3rd one.
There is a well known extreme programming principle - YAGNI. I think it's absolutely appliabe to almost any problem. You always can add something new if it's necessary, but it's much more difficult to remove something what is already exist because someone already uses it even if it's wrong.
How about the ability to save search criteria, in order to easily re-run a search later. Or, the ability to easily, cleanly, print the list of results.
If search refining is allowed (given a search result, limited future searches to the current results), you may also want to add a breadcrumb system, so that the user can see the sequence of refinements that lead you to the current result-set -- and by clicking on a breadcrumb, return to a previous refinement stage.
Faceted search:
(source: msdn.com)
This is displayed in the area in the right ellipse. There are filters and the engine shows the number of results that will remain after aplying the filter. This is very useful and can be done without pain in some search engines, such as Apache Solr. Of course, implement this only if filters make sense in your task.
Aggregate summary info, like total(s), count(s) or percentages.
One or more menus, like right click context for the grid, a ribbon or menu on top.
Your list for the UI elements is kinda good. Export, print (asking them whether it is really necessary to print this?), category/tag and language selection is worth to consider. Smart and working pagination (don't forget ordering).
Please do not force a search to open in a new (or even worse, always in the same window). Links of search results should be copy-pastable (always use GET),
But it really matters to have a functional (i.e. a really good) algorithm. Mostly I google company websites, because their search engine is, cough, awwwwkward. Looking for a feature chart, technical spec, pricing etc. one is not interested in press releases and vica-versa.
Search engine providers offer integration into company websites.
Use Auto-complete wherever possible on your text input fields.
If using selects or combo boxes with related information try and use chain selects to organise the information.
Where results depend on location try and serve relevant results.
Also remember to keep the search form as simple as possible even down to one text field. To refine the search you can have an alternate form as an "Advanced Search interface".
Printing, export.
A grid to display the results
Watch out not to display results a user is not authorized to see (roles / permissions / access rights).
A detail page that opens when an item is selected in the results grid
In case a user attempts to circumvent the search page links and enter some document directly, again, check out for permissions.
Validation, validation, validation.
It should be very hard, near impossible, for me to run a query that makes no sense. ie, start date occurring after an end date.
Export a numerical dataset (even if it only has one numeric column - so just make it so by default) to CSV for import into Excel (people love this function, even if only 1% of users seem to use it with any regularity. Just ask yourself when's the last time you highlighted something for copy-n-paste. Would it have been easier to open a CSV?
Refinable searches (think Google's use of site: -). People who use the search utility a lot will appreciate this. People who don't won't know it's not there.
The ability to choose to display 1 records, 5 records, 100 records, 1000 records, etc. "Paging" I believe is what we most commonly call it ;).
You mentioned sortable grids. Somebody else mentioned auto-sum or auto-count. Those are good if (once again) you have largely numeric data. But those are almost report-oriented functions.
Hope this helps.
One thing you can do is have a drop down of most common searches in plain english. e.g. "High value sales in New York in last 5 days". This is the equivalent of user selecting an amount, the city, date ranges etc. done conveniently for them.
Another thing is to have multiple search criteria tabs based on perspective of the user. Like "sales search", "reporting search", "admin search" etc.
ALso consider limiting the number of entries retrieved in the search and allow users to do more narrow searches. This depends on the business needs however.
The most commonly used search option listed first and in a prominent location.
I think your requirements are good. Take a cue from Google. Google got it right. One text box where you type whatever you want, and your engine spits out the answers. Most folks will try this, and if the answers are good enough, then that is what they will use. In the back-end, you'll probably want to flatten all of the data into a big honkin' table and then index it or use a SQL query with "LIKE" in it.
However, you will probably want to allow the user to refine the search. For this, have a link to "Advanced Search" and use a form there to specify filter criteria. This lets the user zero in on the results if basic search is not good enough. For the results on th is page, you will certainly want to have sorting on key fields, but do it after you have produced the initial result set.
It depends on the content that you are searching for.. make it relevant :) Search always look easy but can be incredibly difficult to get right.
Not mentioned yet, but very important I think - a search that actually works. This item is often neglected and makes the rest a bit moot.

What is the logic behind google spellcheck

When I wanted to search a word or some thing in google; If there is some spelling mistake in that word or sentence, google can get back me with correct spell or corrected sentence. Can anyone explain me how exactly this is being done. I will happy if anyone can explain in terms of programming than in terms of database and all those stuff. Thank you.
Combination of string comparison (with dictionary), stemming and popularity match word base on its large user statistic data.
EDIT: there's a wikipedia page that may helps you understand how computer spell check works.

Tool or methods for automatically creating contextual links within a large corpus of content?

Here's the basic scenario - I have a corpus of say 100,000 newspaper-like articles. Minimally they will all have a well-defined title, and some amount of body content.
What I want to do is find runs of text in articles that ought to link to other articles.
So, if article Foo has a run of text like "Students in 8th grade are being encouraged to read works by John-Paul Sartre" and article Bar is titled (and about) "The important works of John-Paul Sartre", I'd like to automagically create that HTML link from Foo to Bar within the text of Foo.
You should ask yourself something before adding the links. What benefit for users do you want to achieve by doing this? You probably want to increase the navigability of your site. Maybe it is better to create an easier way to add links to older articles in form used to submit new ones. Maybe it is possible to add a "one click search for selected text" feature. Maybe you can add a wiki-like functionality that lets users propose link for selected text. You probably want to add links to related articles (generated through tagging system or text mining) below the articles.
Some potential problems with fully automated link adder:
You may need to implement a good word sense disambiguation algorithm to avoid confusing or even irritating the user by placing bad automatic links with regex (or simple substring matching).
As the number of articles is large you do not want to generate the html for extra links on every request, cache it instead.
You need to make a decision on duplicate titles or titles that contain other title as substring (either take longest title or link to most recent article or prefer article from same category).
TLDR version: find alternative solutions that provide desired functionality to the users.
What you are looking for are text mining tools. You can find more info and links at http://en.wikipedia.org/wiki/Text_mining. You might also want to check out Lucene and its ports at http://lucene.apache.org. Using these tools, the basic idea would be to find a set of similar articles based on the article (or title) in question. You could search various properties of the article including titles and content or both. A tagging system a la Delicious (or Stackoverflow) might also be helpful. Rather than pre-creating the links between articles, you'd present the relevant articles in an interface much like the Related questions interface on the right-hand side of this page.
If you wanted to find and link specific text in each article, I think you'd need to do some preprocessing to select pertinent phrases to key on. Even then I think it would be very hard not to miss things due to punctuation/misspellings or to not include irrelevant links for the same reasons.