How to build a recursive structure with MongoDB - sql

I'm trying to do something that is usually simple in SQL (for example, with a foreign key that points back into the same table): build a recursive data structure. It may be just as easy with MongoDB; I just don't know yet.
For this example, I'll talk about Pages in a Website. I'd like to make a multiple level page structure. So there could be:
Home
Our Products
    Product 1
    Product 2
About us
    Where are we?
Contact us
Let's say pages would have a title and a content.
I need to know the best way to do this, and also how I could build a sitemap from that data structure (a page that lists every page at every level).
I'm building a node.js app with MongoDB for this case.
EDIT: Wouldn't it work by simply referencing a parent page in each page? Pages would be like { title: 'test', content: 'hello world', parentPage: ObjectID(parent page) }
Thanks for the help!

Personally I would implement a materialised paths structure here. It is very easy to update and to query using prefix-anchored, case-sensitive regexes (which means the query can use an index). An example document would look like:
{_id: {}, path: 'about_us/where_are_we'}
This also, as you can see, allows SEO-friendly URLs to hit directly on this tree, giving you maximum power. This is particularly helpful in help systems where you would like to display a URL like:
/help/how-to-use-my-site
Here how-to-use-my-site can hit directly on the path or, going even further, you can store two fields and match directly on the full text, like:
{_id: {}, path: 'about_us/where_are_we', normalised_url: 'where_are_we'}
Of course, as the other answer said, you have to know how you wish to access your content, but materialised paths are a good start in my opinion.
You can read more on tree structures in Mongo here: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
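A minimal sketch of the materialised-path idea, run against a plain in-memory array instead of a real collection. The sample pages and the helper names (escapeRegExp, descendantsRegex, findDescendants, comparePaths, buildSitemap) are assumptions made for illustration; only the path field comes from the answer above. The regex it builds is the anchored, case-sensitive prefix form described above, and the same RegExp object could be passed as the value of path in a find() filter.

```javascript
// In-memory stand-in for a pages collection (hypothetical sample data).
const pages = [
  { path: 'home', title: 'Home' },
  { path: 'our_products', title: 'Our Products' },
  { path: 'our_products/product_1', title: 'Product 1' },
  { path: 'our_products/product_2', title: 'Product 2' },
  { path: 'about_us', title: 'About us' },
  { path: 'about_us/where_are_we', title: 'Where are we?' },
];

// Escape regex metacharacters so the path is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Anchored, case-sensitive prefix regex matching everything below parentPath.
function descendantsRegex(parentPath) {
  return new RegExp('^' + escapeRegExp(parentPath) + '/');
}

// In-memory equivalent of find({ path: descendantsRegex(parentPath) }).
function findDescendants(parentPath) {
  const re = descendantsRegex(parentPath);
  return pages.filter((page) => re.test(page.path));
}

// Compare paths segment by segment so a parent always sorts before its children.
function comparePaths(a, b) {
  const as = a.split('/');
  const bs = b.split('/');
  for (let i = 0; i < Math.min(as.length, bs.length); i++) {
    if (as[i] !== bs[i]) return as[i] < bs[i] ? -1 : 1;
  }
  return as.length - bs.length;
}

// Sitemap: one line per page, indented by the depth of its path.
function buildSitemap() {
  return [...pages]
    .sort((a, b) => comparePaths(a.path, b.path))
    .map((page) => '  '.repeat(page.path.split('/').length - 1) + page.title);
}

console.log(buildSitemap().join('\n'));
```

Note that this sitemap comes out in alphabetical path order. If the pages need a hand-picked order, an extra ordering field (not part of the answer above) would be needed.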

You will need to know how you want to access your data.
The last time I used a tree structure I implemented this (taking inspiration from various sources) in Ruby. It stores an _id path and the complete URI (slugified page titles). It is a pain to handle structures like this.
On the other hand, you can create a collection of documents (roots) with embedded documents (branches and leaves). It is simpler to handle, but you will have to fetch the whole tree when querying, and you can query the inner documents only if you know how deep they are.
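As a rough illustration of the embedded-documents option, here is a sketch that assumes each page carries a children array (a hypothetical field name, as are the sample pages and the helpers sitemapLines and findByTitle). It shows why the whole root document has to be loaded first: both the sitemap and the lookup walk the tree in application code.

```javascript
// One root document with embedded branches and leaves (hypothetical shape).
const root = {
  title: 'Our Products',
  content: 'Everything we sell',
  children: [
    { title: 'Product 1', content: 'First product', children: [] },
    {
      title: 'Product 2',
      content: 'Second product',
      children: [{ title: 'Product 2 manual', content: 'How to use it', children: [] }],
    },
  ],
};

// Depth-first walk producing one indented sitemap line per page.
function sitemapLines(page, depth = 0) {
  const lines = ['  '.repeat(depth) + page.title];
  for (const child of page.children || []) {
    lines.push(...sitemapLines(child, depth + 1));
  }
  return lines;
}

// Find a page by title at any depth; returns null when nothing matches.
function findByTitle(page, title) {
  if (page.title === title) return page;
  for (const child of page.children || []) {
    const hit = findByTitle(child, title);
    if (hit) return hit;
  }
  return null;
}

console.log(sitemapLines(root).join('\n'));
```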
In my experience, all the work needed to support a tree structure is not worth the candle (unless it is a requirement). Most users will create a loose structure based more on tags than on fixed categories.

Related

Formatting couchdb-lucene results with a couchdb list

Situation...
I have a simple couchapp that lists out emails that are stored in the couch database, these emails are queried with a simple view and then piped through a list to give me a pretty table that I can click on the emails to view them. That works great.
The next evolution of this app was to add some full-text searching of the subject line of the emails with couchdb-lucene. I think I have that nailed down as well, since I can search using Lucene and get valid results back. What I can't quite grasp is how to take those results and pipe them back into my existing list function so they get formatted correctly.
Here is an example of my view + list URL that gives me the HTML
http://localhost:5984/tenant103/_design/Email/_list/emaillist/by_type?startkey=["Email",2367264774866]&endkey=["Email",0]&limit=20&descending=true&include_docs=true
And here is my search URL that also gives me results
http://localhost:5984/_fti/local/tenant103/_design/Email/by_subject?q=OM-2875&include_docs=true
My thinking was I would build the URL like this
http://localhost:5984/_fti/local/tenant103/_design/Email/_list/emaillist/by_subject?q=OM-2875&include_docs=true
But that just returns
{
reason: "bad_request",
code: 400
}
This is a learning project for myself with CouchDB so I may not be getting some simple concepts here.
CouchDB-Lucene does not natively support list transformations and CouchDB can only apply list transformations to its own map/reduce views. Sorry about that!
Robert Newson.

Umbraco: A/B testing, links in structure

I'm having a problem when trying to A/B test certain nodes in my node-tree in Umbraco.
What I want to do is to copy a node in the node-tree to a specific spot and use that B-structure to see which of the structures works best, using Google analytics.
For example we have two node structures, let's call them "Private" and "Sweden".
Their structure with childnodes and properties are exactly the same. The only difference between them is the propertyvalues (content). The "Private"-URL is www.mysite.com/Private and the "Sweden"-URL is www.mysite.com/Sweden.
What I would like to do is to change every link on the B-structure, so that it points to its match at the A-structure. The problem is that since it's two different structures, it will have two different alternative links.
In other words, entering the B-structure should happen by chance, and the next click should move the visitor back to the A-structure.
We manage what page it should load (either the A-node or the B-node) with scripts, so that it has a 50% chance for each node, and if it lands on the B-node, Google analytics will save data. What we can't manage is that every link on that page will be to the A-node.
I'd appreciate any help I can get.
Regards,
David
There are a couple of ways that seem likely to give you a start at least.
The /config/urlrewriting.config file allows you to set up multiple redirect rules within Umbraco, so a section like the following might work in sending all requests (whether /sweden/pagename/ or /private/pagename/) back to the private structure. Not sure how GA will handle it:
rewriteUrlParameter="ExcludeFromClientQueryString" destinationUrl="http://www.mysite.com/private/$1" redirect="Domain" redirectMode="Permanent" ignoreCase="true" />
Secondly a simple httpmodule (http://support.microsoft.com/kb/307996) can process all page requests and redirect as required - you could do a gaq_push here directly or indirectly.
I'd be interested to know how you get on - it seems a good area for extension to Umbraco.
I'm not sure I have understood perfectly what you need to do, so please excuse any assumptions that may prove mistaken. Here's what I think:
Since A & B nodes should share the same html content (besides the links of course), why don't you make the link href attribute dynamic by using a bit of razor in the template or macro:
@{ var isANode = CurrentPage.Parent.Name == "Sweden"; }
A similar approach would work if you are using web forms.
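For links that are built in script rather than in the template, the same idea can be sketched as a small pure function. This is only an illustration: the function name toAStructure is hypothetical, and treating /Private as the A-structure root and /Sweden as the B-structure root is an assumption (the thread does not state which is which). The match is case-insensitive, in line with the ignoreCase setting in the rewrite rule above.

```javascript
// Map a link inside the B-structure to its twin in the A-structure.
// ASSUMPTION: '/Private' is the A root and '/Sweden' is the B root.
function toAStructure(href, aRoot = '/Private', bRoot = '/Sweden') {
  const lowerHref = href.toLowerCase();
  const lowerRoot = bRoot.toLowerCase();
  // Only rewrite the root itself or paths below it, so '/Swedenborg' is left alone.
  if (lowerHref === lowerRoot || lowerHref.startsWith(lowerRoot + '/')) {
    return aRoot + href.slice(bRoot.length);
  }
  return href;
}

console.log(toAStructure('/Sweden/pagename/'));
```

In a browser, the function could be applied to the href of each link on a B-node page before the page is shown.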
We finally decided to use the alternative-template solution. Since there seems to be no generic solution for my case, we had to create an alternative template with specific macros to render the different information for every document type we're using.
Creating dynamic links for every page is a hell of a job at this stage of the project, since there are so many pages and links. Some links are also built in JavaScript, which is another problem.
I copied the A-structure to another node only so that I could change the property values. There might be a problem logging and tracking the information with Google Analytics though, so that's the next step for us in this project. In our alternative templates we get the property values from the B-structure.
Still, if anyone has a better solution I'd highly appreciate it!
Regards,
David

simple wiki and reference tracking

I'm trying to wrap my head around designing a simple wiki-style app. In a traditional wiki, say Wikipedia, are 'links' referenced in any kind of backend/complex way (i.e. HABTM), or are links simply links?
I'm trying to decide what to do myself, for something a bit different but similar. I have pages written by individuals, which they can attribute to themselves or credit to, say, a famous author. Should I save this attribution as merely a tag? The tag would create a reference to the famous person, which may or may not exist but could also be created, and would be nothing more than a link. Or do I dive deep and create a real data relationship (HABTM)?
Thoughts?
A SQL-style Has And Belongs To Many mapping table is never necessary in Mongo.
If you'd like to provide, for example, a "what links here" view for a page, then I would do something like this for each page in your Wiki. I'll give an example of a page about pandas:
{
    _id: "Panda",
    text: "Page's contents go here",
    links: ["Raccoon", "Weasel"]
}
You're using the page's title as its _id. To find titles of pages that link to "Raccoon", you can query like:
db.pages.find({"links": "Raccoon"})
Obviously, you should make an index on "links".
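To make the matching rule concrete, here is a small in-memory sketch of the same "what links here" lookup. The extra Raccoon and Weasel pages and the helper name whatLinksHere are assumptions added for illustration. The filter mirrors how an equality match on an array field behaves: a document matches when any element of links equals the value.

```javascript
// In-memory stand-in for the pages collection (the Panda page is from the answer,
// the other two are hypothetical).
const wikiPages = [
  { _id: 'Panda', text: "Page's contents go here", links: ['Raccoon', 'Weasel'] },
  { _id: 'Raccoon', text: 'About raccoons', links: ['Weasel'] },
  { _id: 'Weasel', text: 'About weasels', links: [] },
];

// Equivalent of find({ links: title }): match if any array element equals title.
function whatLinksHere(title) {
  return wikiPages.filter((page) => page.links.includes(title)).map((page) => page._id);
}

console.log(whatLinksHere('Weasel'));
```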

How to handle Url structure for parent/child data?

We are implementing a new Url structure for our existing site (Urdb.org) and I'm struggling with Url mechanics and how it relates to SEO.
In our world, we deal with the parent entity: "record", i.e. world-records, e.g. "Largest Toothpick Beard" and "attempt", e.g. "George Gaspar, Feb 2009". There is only one page, but the various attempts are on different tabs within the page.
The choices for Url are:
urdb.org/WR/toothpick-beard#GeorgeGaspar1
urdb.org/WR/toothpick-beard/GeorgeGaspar1
urdb.org/WR/toothpick-beard?attempt=GeorgeGaspar1
I had been planning on going with choice 1, but the problem is that, unless I'm mistaken, the page has to load first and then dynamically switch to the attempt view that the user is requesting, which would be awkward.
Choice #2 seems to work from a server-side POV, but I'm strongly inclined to reduce the number of unique Urls across the site.
The only reason I list #3 is that I know in Google Webmaster tools I can instruct them to ignore certain querystring values.
Help is appreciated!
The last thing you want to do for SEO is limit the amount of content (unless it's all duplicate). You'll find everywhere that content is king: more spiderable content means a better chance of ranking.
Anyway, I'd go with the second suggestion for full-on SEO friendliness. If you don't want the attempts to be spidered, you could use suggestion 3 and exclude the attempt parameter from search engines using robots.txt.
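The practical difference between the three choices can be checked with the standard URL class. The sketch below (the helper name splitUrl is hypothetical, and an http scheme is assumed) separates what is sent to the server from the fragment, which the browser keeps to itself. That is why choice 1 can only switch tabs after the page has loaded.

```javascript
// The three candidate URLs from the question, with an assumed http scheme.
const choices = [
  'http://urdb.org/WR/toothpick-beard#GeorgeGaspar1',
  'http://urdb.org/WR/toothpick-beard/GeorgeGaspar1',
  'http://urdb.org/WR/toothpick-beard?attempt=GeorgeGaspar1',
];

// Split a URL into the part the server receives and the client-only fragment.
function splitUrl(urlText) {
  const url = new URL(urlText);
  return {
    sentToServer: url.pathname + url.search,
    clientOnly: url.hash,
  };
}

for (const choice of choices) {
  console.log(splitUrl(choice));
}
```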

Suggestions on addressing REST resources for API

I'm a new REST convert and I'm trying to design my first RESTful (hopefully) api and here is my question about addressing resources
Some notes first:
The data described here are 3D render jobs.
A user (graphics company) has multiple projects.
A project has multiple render jobs.
A render job has multiple frames.
There is a hierarchy enforced in the data (1 render job belongs to one project, to one user)
How's this for naming my resources...?
https://api.myrenderjobsite.com/
/users/graphicscompany/projects
/users/graphicscompany/projects/112233
/users/graphicscompany/projects/112233/renders/
/users/graphicscompany/projects/112233/renders/889900
/users/graphicscompany/projects/112233/renders/889900/frames/0004
OR a shortened address for renders?
/users/graphicscompany/renders/889900
/users/graphicscompany/renders/889900/frames/0004
OR should I shorten (even more) the address if possible, omitting the user when not needed...?
/projects/112233/
/renders/889900/
/renders/889900/frames/0004
THANK YOU!
Instead of thinking about your API in terms of URLs, try thinking of it more like pages and links between those pages.
Consider the following:
Will it be reasonable to create a resource for users? Do you have 10, 20 or 50 users? Or do you have 10,000 users? If it is the latter, then creating a single resource that represents all users is probably not going to work too well when you do a GET on it.
Is the list of Users a reasonable root URL, i.e. the entry point into your service? Should the list of projects that belong to a GraphicsCompany be a separate resource, or should it just be embedded into the GraphicsCompany resource? You can ask the same question of each of the 1-to-many relationships that exist. Even if you do decide to merge the list of projects into the GraphicsCompany resource, you may still want a distinct resource to exist simply for the purpose of being able to POST to it in order to create a new project for that company.
Using this approach you should be able to get a good idea of most of the resources in your API and how they are connected without having to worry about what your URLs look like. In fact, if you do the design right, then any client application you write will not need to know anything about the URLs that you create. The only part of the system that cares what the URL looks like is your server, so that it can dispatch the request to the right controller.
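A rough sketch of what "pages and links" can look like in practice. Everything here is hypothetical (the data, the links/rel/href field names, and the helpers projectRepresentation and follow); the ids are borrowed from the question's example URLs. The point is only that the URL layout lives in one place on the server, while the client picks links by relation name and never builds a URL itself.

```javascript
// Hypothetical in-memory data; ids taken from the question's example URLs.
const projects = {
  '112233': { name: 'Demo project', renderIds: ['889900'] },
};

// The server is the only place that knows the URL layout.
const urls = {
  project: (id) => `/projects/${id}`,
  render: (id) => `/renders/${id}`,
};

// Build a representation of a project that links to its render jobs.
function projectRepresentation(id) {
  const project = projects[id];
  return {
    name: project.name,
    links: [
      { rel: 'self', href: urls.project(id) },
      ...project.renderIds.map((renderId) => ({ rel: 'render', href: urls.render(renderId) })),
    ],
  };
}

// Client side: follow links by relation name instead of constructing URLs.
function follow(representation, rel) {
  return representation.links.filter((link) => link.rel === rel).map((link) => link.href);
}

console.log(follow(projectRepresentation('112233'), 'render'));
```

If the server later switches to the longer nested paths, only the urls object changes and a client written this way keeps working.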
The other significant question you need to ask yourself is what media type are you going to use for these resources. How many different clients will need to access these resources? Are you writing the clients, or is someone else? Should you attempt to reuse an existing standard like XHTML and classes/microformats? Could you squeeze most of the information into Atom? Maybe Atom with some extra namespaces like GDATA does it? Or is this only going to be used internally so you can just create your own media types, like application/vnd.YourCompany.Project+xml, application/vnd.YourCompany.Render+xml, etc.
There are many things to think about when designing a REST API. Don't get hung up on what your URLs look like, and try to avoid doing "design by URL".
Presuming that you authenticate to the service, I would use the 1st option, but remove the user, particularly if the user is the currently logged in user.
If user actually represents something else (like client), I would include it, but not if it simply designates the currently logged in user. Agree with StaxMan, though, don't worry too much about squeezing the paths, as readability is key in RESTful APIs.
Personally I would not try to squeeze the path too much. Some redundant information is helpful both for seeing quickly what a resource is and for future expansion.
Generally users won't be typing paths anyway, so verbosity is not all that bad.