How to 'remember' removal of entities (IN UI) - wit.ai

In the wit.ai UI when entering a story the interface constantly identifies and tags a number of entities that I don't want it to.
Screenshot showing UI
In the image above you can see two "package_type" entities being registered against the "How do I buy" sentence. I cant stop this from happening. Its wrong.
If I remove them manually, it adds them back in anyway.
I've reached out to their support, with no answer and also tweeted them to see if I can get a response.
Any help would be greatly appreciated.

You have to define correctly the package_type entity in Understanding tab. What search strategy to choose. If you know that package_type will always be one of the standard words from some sort of a list = set it only to be keywords and add them in the list (all in Understanding tab). That would be very strict and correct recognition of the entity all the time!
If you set package_type to be free-text and keywords - then that entity will be recognized by the existing keyword OR also you trust wit to "guess" it sometimes - things might go fuzzy here.
And if you use "trait" ... just don't:) don't use it for the type of entities which are more or less defined topics. This is used more for "detecting" the notion of the phrase. Positive \ Negative \ Happy \ Sad... It works - but it's a little fuzzy too:)
So just switch package_type to keywords, add some package types to the keywords list and see how the recognition works.
Made this: https://wit.ai/yuraantonov/My%20First%20App/entities

Related

How to use Alignment API to generate a Alignment Format file?

I am going to attend the Instance Matching of OAEI, now I need to make my results to Alignment Format. In order to achieve it, I have learned official tutorials.(link:http://alignapi.gforge.inria.fr/tutorial/tutorial1/index.html).
But there are many differences between the method taught and the method I want. In other words, I can't understand the API.
This is my situation:
I have 2 rdf file(person11.rdf and person12.rdf respectively.data link is http://oaei.ontologymatching.org/2010/im/index.html, the PR dataset), each file has information of many person. I want to find the coreferent entities, the results must be printed in Alignment Format. I find the results by using SPARQL, but I don't know how to print it in Alignment Format.
So, I have three questions:
First, if I want to generate a Alignment Format file, is the method taught the only way?
Second, can you give me your method(code better) to generate the Alignment Format file? Maybe I am wrong from the beginning, can you give me some suggestions?
Third, if you attended OAEI or know something about Instance Matching, can you give me some advice? I want to find the coreferent entities.
Thank you!
First question: I guess that the "mentioned method" is the one in tutorial1. It is not the appropriate one since you have to write a program to output the alignment format and this is a command line interface tutorial. In this case, you'd better look at http://alignapi.gforge.inria.fr/tutorial/tutorial2/index.html
Then, there are basically two ways to do:
The advised one (for several reasons and for participating to OAEI) is to follow these tutorials, to create an empty alignment in it, to create the correspondences from the results of your SPARQL query and to render it. Everything is covered by the tutorials but the part concerning your SPARQL queries. This assumes that you are programming in Java.
The non-advised solution (primarily non advised because you will have to debug your own renderer), is to write, in any programming language that you want a program that output the format (which corresponds to what you cite).
Think about it: how would you expect that the Alignment API knows the results of your SPARQL query? If you come up with a nice solution, contact the API developers, they may integrate it and others could benefit.
Second question: I cannot do better than what is above.
Third question: too general. Read the OAEI results (http://oaei.ontologymatching.org) and look at the code of others.
Good luck!

Query String vs Resource Path for Filtering Criteria

Background
I have 2 resources: courses and professors.
A course has the following attributes:
id
topic
semester_id
year
section
professor_id
A professor has the the following attributes:
id
faculty
super_user
first_name
last_name
So, you can say that a course has one professor and a professor may have many courses.
If I want to get all courses or all professors I can: GET /api/courses or GET /api/professors respectively.
Quandary
My quandary comes when I want to get all courses that a certain professor teaches.
I could use either of the following:
GET /api/professors/:prof_id/courses
GET /api/courses?professor_id=:prof_id
I'm not sure which to use though.
Current solution
Currently, I'm using an augmented form of the latter. My reasoning is that it is more scale-able if I want to add in filtering/sorting criteria.
I'm actually encoding/embedding JSON strings into the query parameters. So, a (decoded) example might be:
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
The request above would retrieve all courses that were (or are currently being) taught by the professor with the provided professor_id in the year 2016, sorted according to topic title in ascending ASCII order.
I've never seen anyone do it this way though, so I wonder if I'm doing something stupid.
Closing Questions
Is there a standard practice for using the query string vs the resource path for filtering criteria? What have some larger API's done in the past? Is it acceptable, or encouraged to use use both paradigms at the same time (make both endpoints available)? If I should indeed be using the second paradigm, is there a better organization method I could use besides encoding JSON? Has anyone seen another public API using JSON in their query strings?
Edited to be less opinion based. (See comments)
As already explained in a previous comment, REST doesn't care much about the actual form of the link that identifies a unique resource unless either the RESTful constraints or the hypertext transfer protocol (HTTP) itself is violated.
Regarding the use of query or path (or even matrix) parameters is completely up to you. There is no fixed rule when to use what but just individual preferences.
I like to use query parameters especially when the value is optional and not required as plenty of frameworks like JAX-RS i.e. allow to define default values therefore. Query parameters are often said to avoid caching of responses which however is more an urban legend then the truth, though certain implementations might still omit responses from being cached for an URI containing query strings.
If the parameter defines something like a specific flavor property (i.e. car color) I prefer to put them into a matrix parameter. They can also appear within the middle of the URI i.e. /api/professors;hair=grey/courses could return all cources which are held by professors whose hair color is grey.
Path parameters are compulsory arguments that the application requires to fulfill the request in my sense of understanding otherwise the respective method handler will not be invoked on the service side in first place. Usually this are some resource identifiers like table-row IDs ore UUIDs assigned to a specific entity.
In regards to depicting relationships I usually start with the 1 part of a 1:n relationship. If I face a m:n relationship, like in your case with professors - cources, I usually start with the entity that may exist without the other more easily. A professor is still a professor even though he does not hold any lectures (in a specific term). As a course wont be a course if no professor is available I'd put professors before cources, though in regards to REST cources are fine top-level resources nonetheless.
I therefore would change your query
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
to something like:
GET /api/professors/teacher45/courses;year=2016?sort=asc&onField=topic
I changed the semantics of your fields slightly as the year property is probably better suited on the courses rather then the professors resource as the professor is already reduced to a single resource via the professors id. The courses however should be limited to only include those that where held in 2016. As the sorting is rather optional and may have a default value specified, this is a perfect candidate for me to put into the query parameter section. The field to sort on is related to the sorting itself and therefore also belongs to the query parameters. I've put the year into a matrix parameter as this is a certain property of the course itself, like the color of a car or the year the car was manufactured.
But as already explained previously, this is rather opinionated and may not match with your or an other folks perspective.
I could use either of the following:
GET /api/professors/:prof_id/courses
GET /api/courses?professor_id=:prof_id
You could. Here are some things to consider:
Machines (in particular, REST clients) should be treating the URI as an opaque thing; about the closest they ever come to considering its value is during resolution.
But human beings, staring that a log of HTTP traffic, do not treat the URI opaquely -- we are actually trying to figure out the context of what is going on. Staying out of the way of the poor bastard that is trying to track down a bug is a good property for a URI Design to have.
It's also a useful property for your URI design to be guessable. A URI designed from a few simple consistent principles will be a lot easier to work with than one which is arbitrary.
There is a great overview of path segment vs query over at Programmers
https://softwareengineering.stackexchange.com/questions/270898/designing-a-rest-api-by-uri-vs-query-string/285724#285724
Of course, if you have two different URI, that both "follow the rules", then the rules aren't much help in making a choice.
Supporting multiple identifiers is a valid option. It's completely reasonable that there can be more than one way to obtain a specific representation. For instance, these resources
/questions/38470258/answers/first
/questions/38470258/answers/accepted
/questions/38470258/answers/top
could all return representations of the same "answer".
On the /other hand, choice adds complexity. It may or may not be a good idea to offer your clients more than one way to do a thing. "Don't make me think!"
On the /other/other hand, an api with a bunch of "general" principles that carry with them a bunch of arbitrary exceptions is not nearly as easy to use as one with consistent principles and some duplication (citation needed).
The notion of a "canonical" URI, which is important in SEO, has an analog in the API world. Mark Seemann has an article about self links that covers the basics.
You may also want to consider which methods a resource supports, and whether or not the design suggests those affordances. For example, POST to modify a collection is a commonly understood idiom. So if your URI looks like a collection
POST /api/professors/:prof_id/courses
Then clients are more likely to make the associate between the resource and its supported methods.
POST /api/courses?professor_id=:prof_id
There's nothing "wrong" with this, but it isn't nearly so common a convention.
GET /api/courses?where={professor_id: "teacher45", year: 2016}&order={attr: "topic", sort: "asc"}
I've never seen anyone do it this way though, so I wonder if I'm doing something stupid.
I haven't either, but syntactically it looks a little bit like GraphQL. I don't see any reason why you couldn't represent a query that way. It would make more sense to me as a single query description, rather than breaking it into multiple parts. And of course it would need to be URL encoded, etc.
But I would not want to crazy with that right unless you really need to give to your clients that sort of flexibility. There are simpler designs (see Roman's answer)

Does my API design violate RESTful principles?

I'm currently (I try to) designing a RESTful API for a social network. But I'm not sure if my current approach does still accord to the RESTful principles. I'd be glad if some brighter heads could give me some tips.
Suppose the following URI represents the name field of a user account:
people/{UserID}/profile/fields/name
But there are almost hundred possible fields. So I want the client to create its own field views or use predefined ones. Let's suppose that the following URI represents a predefined field view that includes the fields "name", "age", "gender":
utils/views/field-views/myFieldView
And because field views are kind of higher logic I don't want to mix support for field views into the "people/{UserID}/profile/fields" resource. Instead I want to do the following:
utils/views/field-views/myFieldView/{UserID}
Another example
Suppose we want to perform some quantity operations (hope that this is the right name for it in English). We have the following URIs whereas each of them points to a list of persons -- the friends of them:
GET people/exampleUID-1/relationships/friends
GET people/exampleUID-2/relationships/friends
And now we want to find out which of their friends are also friends of mine. So we do this:
GET people/myUID/relationships/intersections/{Value-1};{Value-2}
Whereas "{Value-1/2}" are the url encoded values of "people/exampleUID-1/friends" and "people/exampleUID-2/friends". And then we get back a representation of all people which are friends of all three persons.
Though Leonard Richardson & Sam Ruby state in their book "RESTful Web Services" that a RESTful design is somehow like an "extreme object oriented" approach, I think that my approach is object oriented and therefore accords to RESTful principles. Or am I wrong?
When not: Are such "object oriented" approaches generally encouraged when used with care and in order to avoid query-based REST-RPC hybrids?
Thanks for your feedback in advance,
peta
I've never worked with REST, but I'd have assumed that GETting a profile resource at '''/people/{UserId}/profile''' would yield a document, in XML or JSON or something, that includes all the fields. Client-side I'd then ignore the fields I'm not interested in. Isn't that much nicer than having to (a) configure a personalised view on the server or (b) make lots of requests to fetch each field?
Hi peta,
I'm still reading through RESTful Web Services myself, but I'd suggest a slightly different approach than the proposed one.
Regarding the first part of your post:
utils/views/field-views/myFieldView/{UserID}
I don't think that this is RESTful, as utils is not a resource. Defining custom views is OK, however these views should be (imho) a natural part of your API's URI scheme. To incorporate the above into your first URI example, I would propose one of the following examples instead of creating a special view for it:
people/{UserID}/profile/fields/name,age,gender/
people/{UserID}/profile/?fields=name,age,gender
The latter example considers fields as an input value for your algorithm. This might be a better approach than having fields in the URI as it is not a resource itself - it just puts constraints on the existing view of people/{UserID}/profile/. Technically, it's very similar as pagination, where you would limit a view by default and allow clients to browse through resources by using ?page=1, ?page=2 and so on.
Regarding the second part of your post:
This is a more difficult one to crack.
First:
Having intersection in the URI breaks your URI scheme a bit. It's not a resource by itself and also it sits on the same level as friends, whereas it would be more suitable one level below or as an input value for your algorithm, i.e.
GET people/{UserID}/relationships/friends/intersections/{Value-1};{Value-2}
GET people/{UserID}/relationships/friends/?intersections={Value-1};{Value-2}
I'm again personally inclined to the latter, because similarly as in the first case, you are just constraining the existing view of people/{UserID}/relationships/friends/
Secondly, regarding:
Whereas "{Value-1/2}" are the url
encoded values of
"people/exampleUID-1/friends" and
"people/exampleUID-2/friends"
If you meant that {Value-1/2} contain the whole encoded response of the mentioned GET requests, then I would avoid that - I don't think that the RESTful way. Since friends is a resource by itself, you may want to expose it and access it directly, i.e.:
GET friends/{UserID-1};{UserID-2};{UserID-3}
One important thing to note here - I've used ; between user IDs in the previous example, whereas I used , in the fields example above. The reasoning is that both represent a different operator. In the first case we needed OR (,) in order to get all three fields, while in the last example above we had to use AND (;) in order to get an intersection.
Usage of two types of operators can over-complicate the API design, but it should provide more flexibility in the end.
thanks for your clarifying answers. They are exactly what I was asking for. Unfortunately I hadn't the time to read "RESTful Web Services" from cover to cover; but I will catch it up as soon as possible. :-)
Regarding the first part of my post:
You're right. I incline to your first example, and without fields. I think that the I don't need it at all. (At the moment) Why do you suggest the use of OR (,) instead of AND (;)? Intuitively I'd use the AND operator because I want all three of them and not just the first one existing. (Like on page 121 the colorpairs example)
Regarding the second part:
With {Value-1/2} I meant only the url-encoded value of the URIs -- not their response data. :) Here I incline with you second example. Here it should be obvious that under the hood an algorithm is involed when calculating intersecting friends. And beside that I'm probably going to add some further operations to it.
peta

Business Applications: What are the fundamental features of a search form?

In a typical business application it is quite common to have forms that are used for searching.
Some basic features are:
A pane that contains the search criteria
A grid to display the results
Sorting on the grid
A detail page that opens when an item is selected in the results grid
What other features would you expect in a business application's search functionality?
Maybe it's a bit trite but there is some sense in this picture:
removed dead ImageShack link
Do it as it shown at the second example, not as at the 3rd one.
There is a well known extreme programming principle - YAGNI. I think it's absolutely appliabe to almost any problem. You always can add something new if it's necessary, but it's much more difficult to remove something what is already exist because someone already uses it even if it's wrong.
How about the ability to save search criteria, in order to easily re-run a search later. Or, the ability to easily, cleanly, print the list of results.
If search refining is allowed (given a search result, limited future searches to the current results), you may also want to add a breadcrumb system, so that the user can see the sequence of refinements that lead you to the current result-set -- and by clicking on a breadcrumb, return to a previous refinement stage.
Faceted search:
(source: msdn.com)
This is displayed in the area in the right ellipse. There are filters and the engine shows the number of results that will remain after aplying the filter. This is very useful and can be done without pain in some search engines, such as Apache Solr. Of course, implement this only if filters make sense in your task.
Aggregate summary info, like total(s), count(s) or percentages.
One or more menus, like right click context for the grid, a ribbon or menu on top.
Your list for the UI elements is kinda good. Export, print (asking them whether it is really necessary to print this?), category/tag and language selection is worth to consider. Smart and working pagination (don't forget ordering).
Please do not force a search to open in a new (or even worse, always in the same window). Links of search results should be copy-pastable (always use GET),
But it really matters to have a functional (i.e. a really good) algorithm. Mostly I google company websites, because their search engine is, cough, awwwwkward. Looking for a feature chart, technical spec, pricing etc. one is not interested in press releases and vica-versa.
Search engine providers offer integration into company websites.
Use Auto-complete wherever possible on your text input fields.
If using selects or combo boxes with related information try and use chain selects to organise the information.
Where results depend on location try and serve relevant results.
Also remember to keep the search form as simple as possible even down to one text field. To refine the search you can have an alternate form as an "Advanced Search interface".
Printing, export.
A grid to display the results
Watch out not to display results a user is not authorized to see (roles / permissions / access rights).
A detail page that opens when an item is selected in the results grid
In case a user attempts to circumvent the search page links and enter some document directly, again, check out for permissions.
Validation, validation, validation.
It should be very hard, near impossible, for me to run a query that makes no sense. ie, start date occurring after an end date.
Export a numerical dataset (even if it only has one numeric column - so just make it so by default) to CSV for import into Excel (people love this function, even if only 1% of users seem to use it with any regularity. Just ask yourself when's the last time you highlighted something for copy-n-paste. Would it have been easier to open a CSV?
Refinable searches (think Google's use of site: -). People who use the search utility a lot will appreciate this. People who don't won't know it's not there.
The ability to choose to display 1 records, 5 records, 100 records, 1000 records, etc. "Paging" I believe is what we most commonly call it ;).
You mentioned sortable grids. Somebody else mentioned auto-sum or auto-count. Those are good if (once again) you have largely numeric data. But those are almost report-oriented functions.
Hope this helps.
One thing you can do is have a drop down of most common searches in plain english. e.g. "High value sales in New York in last 5 days". This is the equivalent of user selecting an amount, the city, date ranges etc. done conveniently for them.
Another thing is to have multiple search criteria tabs based on perspective of the user. Like "sales search", "reporting search", "admin search" etc.
ALso consider limiting the number of entries retrieved in the search and allow users to do more narrow searches. This depends on the business needs however.
The most commonly used search option listed first and in a prominent location.
I think your requirements are good. Take a cue from Google. Google got it right. One text box where you type whatever you want, and your engine spits out the answers. Most folks will try this, and if the answers are good enough, then that is what they will use. In the back-end, you'll probably want to flatten all of the data into a big honkin' table and then index it or use a SQL query with "LIKE" in it.
However, you will probably want to allow the user to refine the search. For this, have a link to "Advanced Search" and use a form there to specify filter criteria. This lets the user zero in on the results if basic search is not good enough. For the results on th is page, you will certainly want to have sorting on key fields, but do it after you have produced the initial result set.
It depends on the content that you are searching for.. make it relevant :) Search always look easy but can be incredibly difficult to get right.
Not mentioned yet, but very important I think - a search that actually works. This item is often neglected and makes the rest a bit moot.

Tool or methods for automatically creating contextual links within a large corpus of content?

Here's the basic scenario - I have a corpus of say 100,000 newspaper-like articles. Minimally they will all have a well-defined title, and some amount of body content.
What I want to do is find runs of text in articles that ought to link to other articles.
So, if article Foo has a run of text like "Students in 8th grade are being encouraged to read works by John-Paul Sartre" and article Bar is titled (and about) "The important works of John-Paul Sartre", I'd like to automagically create that HTML link from Foo to Bar within the text of Foo.
You should ask yourself something before adding the links. What benefit for users do you want to achieve by doing this? You probably want to increase the navigability of your site. Maybe it is better to create an easier way to add links to older articles in form used to submit new ones. Maybe it is possible to add a "one click search for selected text" feature. Maybe you can add a wiki-like functionality that lets users propose link for selected text. You probably want to add links to related articles (generated through tagging system or text mining) below the articles.
Some potential problems with fully automated link adder:
You may need to implement a good word sense disambiguation algorithm to avoid confusing or even irritating the user by placing bad automatic links with regex (or simple substring matching).
As the number of articles is large you do not want to generate the html for extra links on every request, cache it instead.
You need to make a decision on duplicate titles or titles that contain other title as substring (either take longest title or link to most recent article or prefer article from same category).
TLDR version: find alternative solutions that provide desired functionality to the users.
What you are looking for are text mining tools. You can find more info and links at http://en.wikipedia.org/wiki/Text_mining. You might also want to check out Lucene and its ports at http://lucene.apache.org. Using these tools, the basic idea would be to find a set of similar articles based on the article (or title) in question. You could search various properties of the article including titles and content or both. A tagging system a la Delicious (or Stackoverflow) might also be helpful. Rather than pre-creating the links between articles, you'd present the relevant articles in an interface much like the Related questions interface on the right-hand side of this page.
If you wanted to find and link specific text in each article, I think you'd need to do some preprocessing to select pertinent phrases to key on. Even then I think it would be very hard not to miss things due to punctuation/misspellings or to not include irrelevant links for the same reasons.