How can I implement hierarchical tags (tags belonging to other tags) with acts_as_taggable_on? - ruby-on-rails-3

On our website for a cancer-related organization, we have a flat tag structure with tags like "Leukemia" but also "Chronic Myelogenous Leukemia" and "Acute Lymphoblastic Leukemia". We have a rule that anything tagged with a specific kind of leukemia should also be tagged with the main "Leukemia" tag, but there is no programmatic link between them.
It'd be nice if there were such a link (some relation between tags describing one as the parent or child of another), so that on, say, the "Leukemia" page we could show links to the sub-topics: AML, CML, etc.
It looks like the developers don't plan on supporting this (according to a Jan 2011 GitHub issue), but it seems like a common enough use case that maybe someone has found a workaround, perhaps by modifying the Tag model to make tags themselves taggable (insert Xzibit photo).
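A minimal sketch of one workaround, assuming you store the hierarchy yourself next to acts_as_taggable_on (for example through a hypothetical parent reference you add to each tag; the gem does not provide one). It is written in Python rather than ActiveRecord code, so `TagTree`, `add_tag`, `children`, `ancestors` and `expand` are illustrative names only. Each tag points to at most one parent; `children` gives the sub-topic links for a page like "Leukemia", and `expand` enforces the "a specific leukemia tag implies Leukemia" rule automatically.

```python
from collections import defaultdict


class TagTree:
    """Adjacency-list tag hierarchy: each tag has at most one parent."""

    def __init__(self):
        self.parent = {}                    # tag -> parent tag, or None for a root
        self._children = defaultdict(list)  # parent tag -> direct child tags

    def add_tag(self, name, parent=None):
        # Tags are added once, parents first; this also rules out cycles.
        if name in self.parent:
            raise ValueError(f"tag already exists: {name}")
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent tag: {parent}")
        self.parent[name] = parent
        if parent is not None:
            self._children[parent].append(name)

    def children(self, name):
        # Direct sub-topics, e.g. the links to show on the "Leukemia" page.
        return list(self._children.get(name, []))

    def ancestors(self, name):
        # Follow parent pointers up to the root, nearest ancestor first.
        result = []
        current = self.parent.get(name)
        while current is not None:
            result.append(current)
            current = self.parent.get(current)
        return result

    def expand(self, tags):
        # Add every ancestor so a specific tag implies its broader tags.
        expanded = []
        for tag in tags:
            for t in [tag] + self.ancestors(tag):
                if t not in expanded:
                    expanded.append(t)
        return expanded


tree = TagTree()
tree.add_tag("Leukemia")
tree.add_tag("Chronic Myelogenous Leukemia", parent="Leukemia")
tree.add_tag("Acute Lymphoblastic Leukemia", parent="Leukemia")
print(tree.children("Leukemia"))
print(tree.expand(["Chronic Myelogenous Leukemia"]))
```

In a Rails app the same idea would become a self-referencing parent/children association on whatever model holds the tags; whether you extend the gem's tag model or keep a separate table is a design choice the gem leaves to you.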

Related

2sxc Knowledge Management solution hurdles

I'm evaluating 2sxc as a possible platform for implementing a knowledge management solution but we're in a bit of a rush. Our alternative is DNN Live Articles.
So far I really like the look of 2sxc, but I have questions regarding our possible use of it.
My main questions are about hierarchical lists (such as nested categories) and permissions.
I've installed some apps, such as FAQs with Categories, but I can't find any where the categories are nested. I tried creating a Content Type with two fields: the first is the Category Name and the second is the Parent Category. For the second I created a field with a Data Type of Entity, but the only Input Type options are default and Content Block Items. It works, but when I create a new category, the list offered in the Parent Category field includes just about everything. I'm not sure I understand the concept behind this.
The second issue is permissions. Does the system support permissions in some way? We'd like to lock down knowledge articles by category, but I haven't seen any implementation that shows how one would do this.
Regarding #1: I don't understand your question, sorry :)
Regarding #2: there is no rule-based security, so you can't say "items with category X may be edited, but items with category Y may not".
BUT: you can easily implement this in your UI, if your main concern is user guidance and not "bad people with very good IT skills".
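Because there is no rule-based security, any per-category restriction has to live in your own view code. A minimal sketch of that UI-only guidance, written in Python for illustration rather than in 2sxc's own templating; the role names, categories and the `EDITABLE_CATEGORIES` mapping are hypothetical. It only decides which edit buttons to render and does not stop a user who calls the backend directly.

```python
# Hypothetical mapping from user roles to the categories they may edit.
EDITABLE_CATEGORIES = {
    "hr-editor": {"HR", "Policies"},
    "it-editor": {"IT", "Security"},
}


def can_edit(user_roles, article_category):
    """True if any of the user's roles may edit this category.

    UI guidance only: the content store itself does not enforce this.
    """
    return any(
        article_category in EDITABLE_CATEGORIES.get(role, set())
        for role in user_roles
    )


def editable_titles(user_roles, articles):
    # Titles of the articles that should get an edit button for this user.
    return [a["title"] for a in articles if can_edit(user_roles, a["category"])]


articles = [
    {"title": "Leave policy", "category": "HR"},
    {"title": "VPN setup", "category": "IT"},
]
print(editable_titles(["hr-editor"], articles))
```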

Tree structure of data in REST - URL always from root?

Problem
When the data have a tree structure of parent/child/grandchild entities, we often duplicate information in the URL by specifying parent IDs, even when that's not necessary. What's the best way to design a RESTful API in such a case? Can the URLs be shortened and the parent IDs omitted?
Example
The tree is as follows: The top-most entity is a product. Each product has 0-N reviews. Each review can have 0-M comments attached. In theory, there can be an arbitrary depth of this tree.
The naive RESTful API would look like this (assuming only GET endpoints):
/products ... list of products
/products/123 ... specific product 123
/products/123/reviews ... list of reviews for product '123'
/products/123/reviews/abc ... specific review 'abc'
/products/123/reviews/abc/comments ... list comments for review 'abc'
Hang on, wait a minute... The last two labels I have written do not say anything about product '123'. Yes, the review 'abc' belongs to that product, but as a human, I don't need to know that. And if the review ID 'abc' is unique among all reviews, neither does the computer.
So, for example, when we send an update (PATCH) request for review 'abc', we don't need to know the whole hierarchy of parent objects up to the tree root (products), e.g. that it belongs to product '123' in this case. Of course, we assume each object has a unique ID among all objects of that entity, but that's natural behavior in, for example, relational databases, so many people (well, their APIs) are in this situation.
Questions
1. If the IDs of "child entities" are unique among all entities of that type, would it be best practice to design the API like this?
/reviews/abc ... specific review 'abc'
/reviews/abc/comments ... list comments for review 'abc'
/comments/xyz ... specific comment 'xyz'
2. If the answer to (1) is yes, should an endpoint like this be valid as well? Why? Why not?
/products/123/reviews/abc/comments/xyz ... specific comment 'xyz'
3. If short URLs are allowed (or even preferred), isn't this a bit inconsistent then?
/products/123/reviews ... list reviews for product '123'
/reviews/abc ... specific review 'abc'
/reviews ... what should be here? all reviews?
1. Yes.
2. It depends. I wouldn't recommend it, but if you find a use case for it, why not?
3. I see no inconsistency. Yes, in this situation /reviews should be a list of all reviews in the system, but if that makes no sense for your application, then /reviews can just yield a 404 and everything's fine.
Ideally, the design of URLs should be decoupled from the rest of the REST API. That means that as long as your URLs uniquely identify your resources, they are (from a purely theoretical point of view) "well designed".
But an API is an interface, and it should be treated as such. APIs are consumed by machines, but those machines are programmed by people, so design does matter. It's the same reason you want nice URLs on your blog: there is no technical reason for them, but they improve the experience of users who want to read, share, remember or understand your URLs. (You may say that Google searches for keywords in URLs, so it is a technical reason, but it's not: Google's bot is just one of your users, a website consumer, and optimizing for the bot is like any other optimization for your users, so it's still interface design.)
If the design of your URLs matters (for any reason), then in my opinion the best approach is to keep them simple. As simple as you can. Your observation is exactly right: you don't need to mimic the hierarchy of your resources or the way you store data in the database. Eventually it would only get in your way, and in the way of people who want to consume your API.
If a resource is uniquely identified within a collection by an ID, then design its URL as just /collection/{id}. Look at how Facebook does it: the majority of its API does exactly this, and the structure of its URLs is pretty flat.
There doesn't even need to be a /collection resource for listing all existing objects. You can link to them only from places where it makes sense, like /products/123/reviews, which can list links pointing to /reviews/{id}.
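A minimal sketch of that flat layout, using a hand-rolled regex router and in-memory dictionaries instead of any real web framework; all data, handler names and the `(status, body)` return convention are hypothetical. Each resource is addressed only by its own ID, the nested /products/{id}/reviews list just links out to flat /reviews/{id} URLs, and /reviews simply has no route, so it returns 404.

```python
import re

# Hypothetical in-memory data; review and comment IDs are unique system-wide.
PRODUCTS = {"123": {"name": "Widget"}}
REVIEWS = {"abc": {"product_id": "123", "text": "Great"}}
COMMENTS = {"xyz": {"review_id": "abc", "text": "Agreed"}}


def get_product(pid):
    if pid not in PRODUCTS:
        return 404, None
    return 200, dict(PRODUCTS[pid], reviews=f"/products/{pid}/reviews")


def list_product_reviews(pid):
    if pid not in PRODUCTS:
        return 404, None
    # The nested list only links out to the flat review URLs.
    return 200, [f"/reviews/{rid}" for rid, r in REVIEWS.items() if r["product_id"] == pid]


def get_review(rid):
    if rid not in REVIEWS:
        return 404, None
    r = REVIEWS[rid]
    return 200, {"text": r["text"], "product": f"/products/{r['product_id']}"}


def list_review_comments(rid):
    if rid not in REVIEWS:
        return 404, None
    return 200, [f"/comments/{cid}" for cid, c in COMMENTS.items() if c["review_id"] == rid]


def get_comment(cid):
    if cid not in COMMENTS:
        return 404, None
    c = COMMENTS[cid]
    return 200, {"text": c["text"], "review": f"/reviews/{c['review_id']}"}


# Each resource is addressed by its own ID; no parent IDs in the path.
ROUTES = [
    (re.compile(r"^/products/(\w+)$"), get_product),
    (re.compile(r"^/products/(\w+)/reviews$"), list_product_reviews),
    (re.compile(r"^/reviews/(\w+)$"), get_review),
    (re.compile(r"^/reviews/(\w+)/comments$"), list_review_comments),
    (re.compile(r"^/comments/(\w+)$"), get_comment),
]


def dispatch(path):
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler(*match.groups())
    # No route means 404, e.g. /reviews when listing all reviews isn't offered.
    return 404, None


print(dispatch("/products/123/reviews"))
print(dispatch("/reviews/abc"))
```

Adding a /reviews listing later would be one more route, without touching any existing URL.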
Why do I think complicated URLs are bad?
Relations between resources are graphs, and you can't put graphs into URLs
Putting other IDs and hierarchies into URLs makes things more complicated for no reason. Hierarchies in APIs are rarely that simple: relations between resources are more often complicated graphs than simple trees. So don't encode the links between resources in your URLs. There are better places to put information about relations (hypermedia formats, link headers, or at least references by ID), and they are not limited to a single string the way a URL is, so they let you describe relations more precisely.
You're torturing your consumer by requiring too many parameters
By requiring more information in the URL, you force the consumer to remember all this context and all those IDs, or to know those values in advance. You require more (unnecessary) input, when in reality there is no reason for a consumer to remember a product's ID just to check out one of its reviews.
Evolvability
If your URLs are not well decoupled, think hard about what happens when the structure of your data changes over time. With simple URLs, nothing really happens. With complicated URLs, every time you change how your API resources are related, you also need to change the URLs to keep up with that structure. And as everyone knows, changing URLs is hard, whether we are talking about the web or APIs. Hypermedia partly solves this, but even without hypermedia you can at least keep your URLs light and as resistant to change as possible.
Your design could look like this:
/products/{id} - specific product, links to an endpoint with list of its reviews
/products/{id}/reviews - lists links to endpoints of reviews of the product
/reviews/{id} - specific review, should link to reviewed product and it could even link to the list above, if it seems to be useful for an API consumer
In fact, any of those resources can also link to anything else in the system, if it's useful or if there is a logical connection. Some linking systems (such as hypermedia formats) make those links easier to understand, because you can specify a rel attribute that tells the consumer what the link points to (self points to the resource itself, next could point to another page, etc.).
Of course, as always, it depends on your specific case. But generally, I'd recommend keeping URLs decoupled and simple. Also, I wouldn't try to mirror complicated relations or hierarchies in URLs.
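A minimal sketch of carrying relations in the representation instead of the URL, loosely in the style of HAL's `_links` object; the function names and the custom relation names `product` and `comments` are hypothetical, while `self` and `next` are standard link relation names. The review's URL stays flat (/reviews/{id}), and its connection to the product is expressed as a link.

```python
import json


def review_resource(review_id, product_id, text):
    """Review representation: relations live in links, not in the URL path."""
    return {
        "id": review_id,
        "text": text,
        "_links": {
            "self": {"href": f"/reviews/{review_id}"},
            "product": {"href": f"/products/{product_id}"},
            "comments": {"href": f"/reviews/{review_id}/comments"},
        },
    }


def page_links(base, page, has_next):
    # "self" and "next" are standard relation names; paging via ?page= is assumed.
    links = {"self": {"href": f"{base}?page={page}"}}
    if has_next:
        links["next"] = {"href": f"{base}?page={page + 1}"}
    return links


print(json.dumps(review_resource("abc", "123", "Great"), indent=2))
print(page_links("/products/123/reviews", 1, True))
```

If a review later gets attached to something other than a product, only the links in its representation change; /reviews/abc keeps working.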
As long as the URL can uniquely identify the resource, it is correct.
So the approaches in both Q-1) and Q-2) are fine to use and can be mixed. It is like providing different entry points to the same resource.
The answer comes back to your business use case. If there is no need for more than one entry point, just stick with one; it will simplify the code.
To Q-3, ‘/reviews’ will mean all reviews. Also you don’t need to support that if there is no business use-case to get all reviews in your system.
Hope this helps.

Hiding or Promoting specific content within a page to search engines

A bit of an SEO question here.
I've got a site with a ton of pages of content, and I know a lot of that content is the same on every page.
I thought that search engines keyed off the differences in page content so that they could promote the right text, but when I look at the summary in Google and Bing, it shows my 'feedback' block (which is where I just ask for feedback).
Yahoo (and the summary in Facebook) shows my search options menu.
These aren't really things that are going to make a person want to click on the page.
So I'm wondering what the best way is to either hide this content from search engines, or improve the visibility of the other content that should get indexed.
The page structure is pretty consistent, so I thought it would have been easy for the search robots to pick this stuff out, but apparently not.
You may want to try using a meta tag like this:
<meta name="description" content="Here is a short summary of the page">
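A minimal sketch of generating a page-specific description tag, so each page gets its own summary instead of the shared feedback or menu text. The `meta_description` helper is hypothetical, and the default `max_len` of 160 is an assumed soft limit, not an official search-engine rule; the text is HTML-escaped so quotes or ampersands in the summary cannot break the attribute.

```python
import html


def meta_description(summary, max_len=160):
    """Build a description meta tag from a page-specific summary.

    max_len is an assumed soft limit, not an official search-engine rule.
    """
    text = " ".join(summary.split())  # collapse whitespace and newlines
    if len(text) > max_len:
        # Cut at the last word boundary before the limit and mark the cut.
        text = text[:max_len].rsplit(" ", 1)[0].rstrip(",.;:") + "..."
    return f'<meta name="description" content="{html.escape(text, quote=True)}">'


print(meta_description("Leukemia & lymphoma resources"))
```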
Search engines also prefer title and header tags over regular text.
A meta description is the best way to do that.
However, be aware that the structure of your page is also important: search engines prefer to use the meta tag, but they also weigh structure, keywords, headers and things like that.
I ran into the same trouble a couple of months ago: Google showed my price and download sections rather than the meta description. I solved it by rewriting the meta description (more accurate and shorter, 177 characters), removing tags from the price and download sections, and making some slight adjustments to the page structure. Now the Google summary is what I want.
Hope this helps you!

Tool or methods for automatically creating contextual links within a large corpus of content?

Here's the basic scenario - I have a corpus of say 100,000 newspaper-like articles. Minimally they will all have a well-defined title, and some amount of body content.
What I want to do is find runs of text in articles that ought to link to other articles.
So, if article Foo has a run of text like "Students in 8th grade are being encouraged to read works by Jean-Paul Sartre" and article Bar is titled (and about) "The important works of Jean-Paul Sartre", I'd like to automagically create an HTML link from Foo to Bar within the text of Foo.
You should ask yourself something before adding the links: what benefit for users do you want to achieve? You probably want to make your site easier to navigate. Maybe it is better to offer an easier way to add links to older articles in the form used to submit new ones. Maybe you can add a "one-click search for selected text" feature. Maybe you can add wiki-like functionality that lets users propose a link for selected text. You probably also want to show links to related articles (generated through a tagging system or text mining) below each article.
Some potential problems with a fully automated link adder:
You may need a good word sense disambiguation algorithm, to avoid confusing or even irritating users with bad automatic links placed by regex (or simple substring) matching.
As the number of articles is large, you do not want to generate the HTML for the extra links on every request; cache it instead.
You need to decide how to handle duplicate titles, or titles that contain another title as a substring (e.g. take the longest title, link to the most recent article, or prefer an article from the same category).
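A minimal sketch of the naive matching approach, showing how the "longest title wins" rule and one-link-per-phrase can be applied. `add_links` and its `phrases` argument (a mapping from key phrase to article URL) are hypothetical. It assumes plain-text input, phrases that start and end with word characters, and it does no word sense disambiguation. Note that exact matching would not connect the Sartre example on its own, since Bar's full title never appears in Foo's text; "Jean-Paul Sartre" would have to be registered as a key phrase for Bar first.

```python
import html
import re


def add_links(text, phrases, self_url=None):
    """Link runs of plain text that match other articles' key phrases.

    phrases: dict mapping key phrase -> article URL (hypothetical format).
    Longer phrases are tried first at each position, and each phrase is
    linked at most once. Links to the article itself (self_url) are skipped.
    """
    if not phrases:
        return html.escape(text)
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in ordered) + r")\b", re.IGNORECASE
    )
    lookup = {p.lower(): url for p, url in phrases.items()}
    out, pos, used = [], 0, set()
    for m in pattern.finditer(text):
        key = m.group(1).lower()
        url = lookup[key]
        if key in used or url == self_url:
            continue  # leave this occurrence as plain text
        used.add(key)
        out.append(html.escape(text[pos:m.start()]))
        out.append(f'<a href="{html.escape(url)}">{html.escape(m.group(1))}</a>')
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(add_links(
    "Students are encouraged to read works by Jean-Paul Sartre.",
    {"Jean-Paul Sartre": "/articles/bar", "Sartre": "/articles/other"},
))
```

Following the caching point above, you would run this once per article when it is saved and store the resulting HTML, not call it on every request.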
TL;DR version: look for alternative solutions that provide the desired functionality to users.
What you are looking for are text mining tools. You can find more info and links at http://en.wikipedia.org/wiki/Text_mining. You might also want to check out Lucene and its ports at http://lucene.apache.org. Using these tools, the basic idea would be to find a set of similar articles based on the article (or title) in question. You could search various properties of the article, such as the title, the content, or both. A tagging system à la Delicious (or Stack Overflow) might also be helpful. Rather than pre-creating links between articles, you'd present the relevant articles in an interface much like the Related questions list on the right-hand side of this page.
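A tiny pure-Python illustration of one common text-mining technique for "related articles": TF-IDF weighting plus cosine similarity. It is not Lucene's API or its scoring model, and `tokenize`, `tfidf_vectors` and `related` are hypothetical names; Lucene would give you far more robust indexing and ranking over 100,000 articles.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """docs: dict id -> text. Returns dict id -> {term: weight}."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter()  # number of documents containing each term
    for c in counts.values():
        df.update(c.keys())
    # Smoothed IDF, log((1 + n) / (1 + df)) + 1, keeps every weight positive.
    return {
        doc_id: {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items()}
        for doc_id, c in counts.items()
    }


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def related(docs, doc_id, k=3):
    # Rank every other article by similarity; drop ones sharing no terms.
    vectors = tfidf_vectors(docs)
    target = vectors[doc_id]
    scores = sorted(
        ((cosine(target, v), other) for other, v in vectors.items() if other != doc_id),
        reverse=True,
    )
    return [other for score, other in scores[:k] if score > 0]


docs = {
    "foo": "Students in 8th grade read works by Jean-Paul Sartre",
    "bar": "The important works of Jean-Paul Sartre",
    "baz": "Football scores from this weekend",
}
print(related(docs, "foo"))
```

For a real corpus you would compute the vectors once and store them, rather than rebuilding them on every `related` call as this sketch does.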
If you wanted to find and link specific text in each article, I think you'd need to do some preprocessing to select pertinent phrases to key on. Even then, I think it would be very hard to avoid missing matches because of punctuation or misspellings, and just as hard to avoid adding irrelevant links for the same reasons.

SEO for Ultraseek 5.7

We've got Ultraseek 5.7 indexing the content on our corporate intranet site, and we'd like to make sure our web pages are being optimized for it.
Which SEO techniques are useful for Ultraseek, and where can I find documentation about these features?
Features I've considered implementing:
Make the title and first H1 contain the most valuable information about the page
Implement a sitemap.xml file
Ping the Ultraseek xpa interface when new content is added
Use "SEO-Friendly" URL strings
Add Meta keywords to the HTML pages.
The most important bit of advice anyone can get when optimizing a website for search engines and indeed for tools like Ultraseek is this...
Write your web pages for your human audience first and foremost. Don't do anything odd to try and optimize your website for a search engine. Don't stuff keywords into your URL if it makes the URL less sensible. Think human first.
Having said this, the following techniques usually make things better for both the humans and the machines!
Use headings (h1 through h6) to give your page a structure. Imagine them arranged in a tree view, with an h1 containing some h2 tags, h2 tags containing h3 tags, and so on. I usually use the h1 tag (there should be only one h1 tag) for the site name and the h2 tag for the page name, with h3 tags as sub-headings where appropriate.
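A minimal sketch of checking a page against that heading advice using Python's standard `html.parser`; `HeadingCollector` and `heading_problems` are hypothetical names, and the two rules it checks (exactly one h1, no skipped levels) are this answer's conventions, not documented Ultraseek requirements.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1..h6 start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(page_html):
    parser = HeadingCollector()
    parser.feed(page_html)
    levels = parser.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one h1, found {levels.count(1)}")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"h{prev} followed by h{cur} skips a level")
    return problems


print(heading_problems("<h1>Website</h1><h3>Music</h3>"))
```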
Sitemaps are very useful because they contain a list of your pages; think of a sitemap as a request for the pages you would like included in the index. They don't normally carry much context, though.
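A minimal sketch of generating a sitemap.xml in the sitemaps.org format (a `urlset` of `url` entries with `loc` and optional `lastmod`) using Python's standard ElementTree. `build_sitemap` and its `(url, lastmod)` input are hypothetical, and whether Ultraseek 5.7 reads sitemap.xml at all should be checked against its own documentation.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: iterable of (url, lastmod) pairs; lastmod is 'YYYY-MM-DD' or None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and <
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    ("https://intranet.example/Category/Music/", "2011-05-01"),
    ("https://intranet.example/page?x=1&y=2", None),
]))
```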
Friendly URL strings are great for humans. I'd much rather visit www.website.com/Category/Music/ than www.website.com?x=3489, and they also give the machines more context about your page. It especially helps if the URL matches your h1 and h2 tags, like this:
www.website.com/Category/Music/
<h1>Website</h1>
<h2>Category: Music</h2>
<p>Welcome to the music category!</p>
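A minimal sketch of deriving such friendly URL segments from page or category names, so the URL stays in step with the headings; `slugify` and `friendly_path` are hypothetical helpers. Capitalization is kept to match the /Category/Music/ example, accents are stripped to plain ASCII, and characters with no ASCII equivalent are simply dropped.

```python
import re
import unicodedata


def slugify(text):
    """Turn a page or category name into a readable URL path segment."""
    # Decompose accented letters and drop the accents: "Café" -> "Cafe".
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Replace each run of non-alphanumeric characters with a single hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", ascii_text).strip("-")


def friendly_path(*segments):
    return "/" + "/".join(slugify(s) for s in segments) + "/"


print(friendly_path("Category", "Music"))
print(friendly_path("Category", "Rock & Roll"))
```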
Meta keywords (and description) are useful - but as per the above advice, you need to make sure that it all matches up. Use a small but targeted set of keywords that highlight what is specifically different about the page and make sure your description is a good summary of the page content. Imagine that it is used beneath the title in a list of search results (even though it might not be!)
Navigation! Providing clear navigation, as well as back links (such as breadcrumbs), will always help. If someone clicks on a search result, it might not be the exact page they are after, but it may well be very close. By highlighting where people have landed in your navigation and by providing a breadcrumb trail that tells them where they are, you let them move through your pages easily even if the search hasn't taken them to the perfect location.
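A minimal sketch of building a breadcrumb trail straight from a friendly URL path, which works best when the URLs already mirror the page structure as described above. `breadcrumbs`, `render_breadcrumbs` and the `titles` lookup (path prefix to human-readable label) are hypothetical; segments without a label fall back to the raw segment text, and the current page is rendered as plain text rather than a link.

```python
import html


def breadcrumbs(path, titles):
    """Build (label, url) pairs for every ancestor of a friendly URL path."""
    crumbs = [(titles.get("/", "Home"), "/")]
    prefix = "/"
    for segment in [s for s in path.split("/") if s]:
        prefix += segment + "/"
        crumbs.append((titles.get(prefix, segment), prefix))
    return crumbs


def render_breadcrumbs(crumbs):
    # Every crumb except the last is a link; the last is the current page.
    parts = [
        f'<a href="{html.escape(url)}">{html.escape(label)}</a>'
        for label, url in crumbs[:-1]
    ]
    parts.append(html.escape(crumbs[-1][0]))
    return " &gt; ".join(parts)


trail = breadcrumbs("/Category/Music/", {"/": "Website", "/Category/Music/": "Music"})
print(render_breadcrumbs(trail))
```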