Writing SEO-friendly pages that can be toggled public or private

We want our application to be able to create static, searchable pages based on user profile information, which would be linkable to other public profiles.
I am looking at LinkedIn as an example... it seems like they actually auto-generate the page as a static file that is indexable and searchable.
Can someone suggest how we would do this? I am thinking there would need to be a cron job that runs and writes the file out under a known path and file name.
The user may want to keep the whole page private, in which case I imagine the job would need to delete the file.
There are a lot of sub-requirements, but that's the general concept, and I wanted to start getting ideas and feedback.
Thanks.

You can do without the cron job if you generate the static pages in real time whenever the profile information is created or updated, or whenever the user changes the setting that keeps the info public or private. This way you are not constantly looping through all users, and you do not depend on another component (your cron job) running.
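For illustration, here is a minimal sketch of that approach in Java; the class name, the output directory, and keying the file on the username are all assumptions made up for the example, not anything from the original post:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PublicProfileWriter {

    private final Path publicDir = Paths.get("/var/www/public-profiles");

    // Call this from whatever code saves the profile or its public/private setting.
    public void onProfileSaved(String username, boolean isPublic, String renderedHtml) throws IOException {
        Path page = publicDir.resolve(username + ".html");
        if (isPublic) {
            Files.createDirectories(publicDir);
            Files.write(page, renderedHtml.getBytes(StandardCharsets.UTF_8)); // (re)generate the static page
        } else {
            Files.deleteIfExists(page); // the profile went private, so remove the indexable file
        }
    }
}

The same hook covers both the create/update case and the private toggle, so no periodic job has to scan every user.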

One alternative would be to adopt an explicit RESTful information architecture so that a profile resource ("page") is addressable with a permanent URL. The resulting resource could be a static page. Or not. That would be an implementation detail invisible to the search engine crawler and any web browser accessing the resource.

umnik700's answer is fairly dead-on if you're not considering issues related to authentication or who gets to see what. Consider the difference between the profiles you see when you're logged into Facebook versus those same profiles' publicly facing, searchable counterparts. Even MySpace, with a lot less consideration for search engine privacy, has viewability that depends on your relationship to the other person, defaulting, for private profiles, to "This profile has been set to private by the user" or something to that effect.
If you're looking to suddenly scale out a social tool where individuals are sharing their personal information, I would suggest umnik700's answer (dynamically generate the content, but not the URLs, for public versions of the profile) with the following corollary: you need to be able to support privacy preferences varying from extremely strict to completely open, and default to a version that errs toward the stricter, more private version of the profile. If you're just now pushing out searchable personal content when there was never any way to find it outside the site before, it's important not to abuse information given under different pretenses.
I know this probably requires more scalability and added functionality than you were hoping this project would take, but doing otherwise would most likely be taken as a violation of your user base's tacit trust. Anyway, the best strategy will probably require you to lean on your database more anyway, so it might be time to rework it a bit, including adding some privacy preferences.

Related

DokuWiki: Mix content from human with content from automated process

We run DokuWiki.
We have one page for every server.
We want to mix automated content (like number of CPUs) with content created by human beings by hand and keyboard.
What is an easy and not so "dirty" way to solve this?
Include generated pages and their sections into user-maintained pages, or vice versa. As a benefit, you will be able to forbid user access to the generated pages (namespaces) via ACLs.
Use plugins like data or sqlite to include smaller pieces of information on the page.
It might be enough to have discussions available for generated pages.

Alfresco permissions depending on whether document is currently part of workflow or not

Out-of-the-box, an Alfresco user can read a document based on:
The document's permissions
The user's role
The user's groups
Whether the user owns the document or not
Maybe some other factors I forgot?
Now, I want to add a new factor: Whether the document is currently part of a workflow.
Alfresco's permissionDefinitions.xml allows me to define permissions based on authorities such as ROLE_LOCK_OWNER etc, but it does not seem to be the right place to add permission conditions.
I guess I will have to write some Java source code, but I am not sure what classes are responsible for this, and whether there is an Alfresco way to customize them?
So, I assume you want nodes that are attached to a workflow to have different access rights? You need to think about the behavior you want in all of the UIs and protocols you are exposing (e.g. Share, WebDAV, CIFS, FTP, etc.).
If you want to set a permission on a node, you can do that via JavaScript as well as Java (see http://docs.alfresco.com/5.2/references/API-JS-setPermission.html and http://docs.alfresco.com/5.2/references/dev-services-permission.html). As was mentioned in one of the comments, you can also get the number of active workflows on a node by referencing the activeWorkflows property in JavaScript (http://docs.alfresco.com/5.2/references/API-JS-ScriptNode.html) or in Java.
Depending on the specifics, I might implement this in different ways, but if all you want to do is have the permission change, you could just update it at the beginning and end of your workflow with a simple JavaScript call. The only downside is that it doesn't take into consideration the workflow getting cancelled. You could also create a policy/behavior on an aspect you attach, or even have a rule or job run that updates content based on the activeWorkflows values.
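If you do end up going the Java route the question mentions, a rough sketch using the foundation services could look like the following; the helper class, the "read-only for everyone while in a workflow" rule, and the Spring-injected setters are assumptions for illustration only:

import java.util.List;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.security.PermissionService;
import org.alfresco.service.cmr.workflow.WorkflowInstance;
import org.alfresco.service.cmr.workflow.WorkflowService;

public class WorkflowPermissionHelper {

    private PermissionService permissionService; // injected via Spring
    private WorkflowService workflowService;     // injected via Spring

    // Call this at the start and end of the workflow, or from a behavior/scheduled job.
    public void updatePermissions(NodeRef doc) {
        List<WorkflowInstance> active = workflowService.getWorkflowsForContent(doc, true);
        if (!active.isEmpty()) {
            // Document is in an active workflow: stop inheriting and grant read-only access.
            permissionService.setInheritParentPermissions(doc, false);
            permissionService.setPermission(doc, PermissionService.ALL_AUTHORITIES,
                    PermissionService.CONSUMER, true);
        } else {
            // No active workflow (finished or cancelled): fall back to the inherited permissions.
            permissionService.deletePermission(doc, PermissionService.ALL_AUTHORITIES,
                    PermissionService.CONSUMER);
            permissionService.setInheritParentPermissions(doc, true);
        }
    }

    public void setPermissionService(PermissionService permissionService) {
        this.permissionService = permissionService;
    }

    public void setWorkflowService(WorkflowService workflowService) {
        this.workflowService = workflowService;
    }
}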

Yii: maximizing code reuse with per-user site configurations

The client I'm working for has a CMS written in Yii. Currently a part of their business is customizing the CMS to meet the specific needs of each customer. About 90% of the code is reused, essentially by copying and pasting from one directory to another. While I've been working on this project, I've had to merge changes into the shared codebase several times.
All, or most, of these sites are hosted on the same server, and it would seem to make more sense to have a single login that changes what features we show based on who logs in. In some cases that means overriding whole or partial views (e.g., the _form.php might change from customer to customer), including the controller and model. Most of the time, it means adding a new controller for a bit of functionality written just for that client.
I've read about having both a front-end and back-end site here: http://www.yiiframework.com/wiki/63/organize-directories-for-applications-with-front-end-and-back-end-using-webapplicationend-behavior but that doesn't seem to be the right fit (I don't want everyone coming to a different start PHP file, for instance).
Ideally, I'd have users log in and get assigned a site ID, which will filter data in the shared MVC objects, add in the ones written specifically for them, or override them where necessary.
Intuitively it seems like something like this would make sense:
Shared controllers go here:
/protected/controllers
Overrides and additions for client1 go here:
/protected/controllers/client1
or:
/protected/client1/controllers
But I'm not sure how to get Yii to do this in the most efficient and easy-to-manage way. Is this something that's going to work with Yii, or am I breaking it in unintended ways? If it will work, what's the best way to accomplish it so that it's clear to me six months from now, or to some random developer who replaces me?
Do you know RBAM?
With role-based access you can tailor your application in a more or less granular way.

Suggestions on addressing REST resources for API

I'm a new REST convert and I'm trying to design my first RESTful (hopefully) API. Here is my question about addressing resources.
Some notes first:
The data described here are 3D render jobs.
A user (graphics company) has multiple projects.
A project has multiple render jobs.
A render job has multiple frames.
There is a hierarchy enforced in the data (one render job belongs to one project, which belongs to one user).
How's this for naming my resources...?
https://api.myrenderjobsite.com/
/users/graphicscompany/projects
/users/graphicscompany/projects/112233
/users/graphicscompany/projects/112233/renders/
/users/graphicscompany/projects/112233/renders/889900
/users/graphicscompany/projects/112233/renders/889900/frames/0004
OR a shortened address for renders?
/users/graphicscompany/renders/889900
/users/graphicscompany/renders/889900/frames/0004
OR should I shorten (even more) the address if possible, omitting the user when not needed...?
/projects/112233/
/renders/889900/
/renders/889900/frames/0004
THANK YOU!
Instead of thinking about your API in terms of URLs, try thinking of it more like pages and links between those pages.
Consider the following:
Will it be reasonable to create a resource for users? Do you have 10, 20 or 50 users? Or do you have 10,000 users? If it is the latter, then creating a single resource that represents all users is probably not going to work too well when you do a GET on it.
Is the list of users a reasonable root URL, i.e. the entry point into your service? Should the list of projects that belong to a GraphicsCompany be a separate resource, or should it just be embedded into the GraphicsCompany resource? You can ask the same question of each of the one-to-many relationships that exist. Even if you do decide to merge the list of projects into the GraphicsCompany resource, you may still want a distinct resource to exist simply for the purpose of being able to POST to it in order to create a new project for that company.
Using this approach you should be able to get a good idea of most of the resources in your API and how they are connected, without having to worry about what your URLs look like. In fact, if you do the design right, then any client application you write will not need to know anything about the URLs that you create. The only part of the system that cares what the URL looks like is your server, so that it can dispatch the request to the right controller.
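As a rough illustration of that dispatching point, here is how the hierarchical addresses from the question might map onto controllers using JAX-RS annotations; the class, the media types, and the method names are all invented for the example:

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;

@Path("/users/{company}/projects")
public class ProjectResource {

    // GET /users/graphicscompany/projects/112233
    @GET
    @Path("/{projectId}")
    @Produces("application/vnd.example.project+xml")
    public String getProject(@PathParam("company") String company,
                             @PathParam("projectId") String projectId) {
        // Return a representation of the project, including links to its renders,
        // e.g. /users/graphicscompany/projects/112233/renders
        return "<project id='" + projectId + "'/>";
    }

    // GET /users/graphicscompany/projects/112233/renders/889900/frames/0004
    @GET
    @Path("/{projectId}/renders/{renderId}/frames/{frameId}")
    @Produces("application/vnd.example.frame+xml")
    public String getFrame(@PathParam("projectId") String projectId,
                           @PathParam("renderId") String renderId,
                           @PathParam("frameId") String frameId) {
        return "<frame id='" + frameId + "'/>";
    }
}

The client never has to build those paths itself if each project representation links to its renders and frames.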
The other significant question you need to ask yourself is what media type are you going to use for these resources. How many different clients will need to access these resources? Are you writing the clients, or is someone else? Should you attempt to reuse an existing standard like XHTML and classes/microformats? Could you squeeze most of the information into Atom? Maybe Atom with some extra namespaces like GDATA does it? Or is this only going to be used internally so you can just create your own media types, like application/vnd.YourCompany.Project+xml, application/vnd.YourCompany.Render+xml, etc.
There are many things to think about when designing a REST API; don't get hung up on what your URLs look like, and really try to avoid doing "design by URL".
Presuming that you authenticate to the service, I would use the first option, but remove the user, particularly if the user is the currently logged-in user.
If the user actually represents something else (like a client), I would include it, but not if it simply designates the currently logged-in user. I agree with StaxMan, though: don't worry too much about squeezing the paths, as readability is key in RESTful APIs.
Personally, I would not try to squeeze the path too much; some amount of redundant information is helpful both for quickly seeing what a resource is and for future expansion.
Generally users won't be typing paths anyway, so verbosity is not all that bad.

Which multilingual web design solution is fastest for the user, if this is indeed an issue?

Context:
I'm in the design phase of what I'm hoping will be a big website (lots of traffic, lots of users reading and writing to database).
I want to offer this website in the three languages I speak myself (English, French, and by the time I finish the website, I will hopefully have learned enough Spanish to offer that too)
Dilemma:
I'm wondering how I should go about offering these various languages (and perhaps more in the future).
Criteria:
Many methods exist for designing multi-language websites. I'm looking for the technique that will result in a faster browsing experience for the user.
Choices:
Currently, I can think of (and have read about) the following choices. They are sorted in order of preference up to now.
Store all language-specific strings in a database and fetch the right one depending on preferred language (members can choose which language they prefer), browser default language, and which language is selected during the current session, in that order.
Pros:
Most of the time, a single test at the beginning of the session confirms which language to use for the remainder of the session (stored in a SESSION variable). Otherwise, a user logging in also fetches the right language and keeps it until he/she logs out (no further tests). So the testing part should be pretty fast.
Cons:
I'm afraid that accessing the database all the time would be quite time-consuming (longer page load for the user), especially considering that lots of users could also be accessing the database at the same time for the same reason (getting the website text in the correct language), but also for posting comments and such.
Strings which include variables (e.g. "Hello " + user.name + ", how are you?") are harder to store because the variable (e.g. user name) changes for each user.
A direct link to a portal for a specific language would be ugly (e.g. www.site.com?lang=es)
Store all language-specific strings in a text file and fetch the right one depending on preferred language (members can choose which language they prefer), browser default language, and which language is selected during the current session, in that order.
Pros:
Most of the time, a single test at the beginning of the session confirms which language to use for the remainder of the session (stored in a SESSION variable). Otherwise, a user logging in also fetches the right language and keeps it until he/she logs out (no further tests). So the testing part should be pretty fast.
Cons:
I'm afraid that accessing the text file all the time would be quite time-consuming (longer page load for the user), especially considering that lots of users could also be accessing the file at the same time for the same reason (getting the website text in the correct language).
Strings which include variables (e.g. "Hello " + user.name + ", how are you?") are harder to store because the variable (e.g. user name) changes for each user.
I don't think multiple users could access the text file concurrently, though I may be wrong. If that's the case, though, every user loading a page would have to wait for his/her turn to access the text file.
Fetching the very last string of the text file could take pretty long.
A direct link to a portal for a specific language would be ugly (e.g. www.site.com?lang=es)
Creating multiple versions of the website in separate folders, where each version is in a different language.
Pros:
No extra-treatment is needed for handling languages, so no extra waiting time.
Cons:
Maintaining the website will be like going to school: painful, long, and it makes you stupid after doing the same thing over and over again.
Ugly URL (e.g. www.site.com/es/ instead of www.site.com)
Additionally, the choices above could be combined with one or more of the following techniques:
Caching certain frequently requested pages (in a singleton or static PHP function?). Certain sentences could also be cached for every language.
Pros
Quicker access for frequently-requested pages.
Which pages need caching can be determined dynamically, with time.
Cons
I'm not sure about this one, but would this end up bloating the server's RAM?
Rewriting the URL could be used for many things.
A user looking for direct access to one language could do so using www.site.com/fr/somefile and would be redirected to www.site.com/somefile, but with the selected language being stored in a session variable.
Pros
Search engines like this because they have two different pages to show for two different languages
Cons
Bookmarking a page doesn't mean you'll end up with the right language when you come back, unless I put the language information in the URL (www.site.com/somefile?lang=fr)
A little more info
I usually use the following technologies to make a website:
PHP
SQL
XHTML
CSS
Javascript (and AJAX)
This being said, if a solution requires that I learn a new language or something, I'm very open to doing so. I have no deadline for this project and I do intend to learn a lot from doing it!
Conclusion:
What I'm looking for is a method that allows me to offer multiple languages while not increasing page load time and not going crazy trying to maintain the website. If you guys/gals have other ideas I should consider, I will try adding them to my list. Another possibility is that I'm overdoing this. Maybe I won't gain enough time with these methods for all of this to be worth it; I just don't know how to verify whether I need to worry about this or not, so if you have any ideas for that, it would also help me.
Whether you use a database or a filesystem to store the translations, you should be loading the text all at once and then serving it from memory. Most applications will typically not have so much text that this becomes a problem. In Java or .Net this could be accomplished by storing the text in a singleton or static object. Then all the strings are in RAM and do not need to be loaded or parsed. If your platform does not have a convenient way to store data in ram, you could run a separate caching application such as memcached.
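As a minimal sketch of the load-once-and-serve-from-memory idea (the table name, the column names, and the key format are just assumptions for the example):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

public final class TranslationCache {

    private static final Map<String, String> STRINGS = new HashMap<>();

    // Load every translation once at startup; after that each lookup is a plain map get.
    public static void loadFrom(Connection db) throws SQLException {
        try (Statement st = db.createStatement();
             ResultSet rs = st.executeQuery("SELECT lang, msg_key, msg_text FROM translations")) {
            while (rs.next()) {
                STRINGS.put(rs.getString("lang") + ":" + rs.getString("msg_key"),
                            rs.getString("msg_text"));
            }
        }
    }

    public static String get(String lang, String key) {
        String text = STRINGS.get(lang + ":" + key);
        return text != null ? text : key; // fall back to the key so missing strings are easy to spot
    }
}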
The rest of your concerns can be mitigated by hiding the details. Build or find a framework that lets you load your translations and then look them up by some key. If you decide to switch to files or a database later, the rest of your code is unaffected. In the short term do whichever is easier for you. I've found that it's best to have a mix: it's easier to manage application text along with the source code in a version control system. But some text changes often, or needs to change without requiring a build+deployment cycle, and that text should be in the DB.
Finally, don't build strings with substitutions in them. Use some kind of format string, because otherwise your translators will go crazy trying to translate sentence fragments.
(Warning: Java code sample)
//WRONG
String msg = "Hello, " + username + ", welcome back.";
//RIGHT
String fmt = "Hello, %s, welcome back."; // in real code: load this string from a file or the db
String msg = String.format(fmt, username);
Another person mentioned encoding the language in the URL. This is the preferred way to do it if you care what a search engine thinks of your site. Google recommends using different hostnames or a different subdirectory. This means that the language headers sent by the user can't be used for anything, except perhaps initially sending them to one landing page or another. You will need to determine the language for each request based on the incoming URL (this actually simplifies your code a lot later on). In Java I'd store the language code in the Request and just grab it whenever I need it.
The easiest way to handle language codes in the URL is to use re-writing. A client sends a request for www.yoursite.com/de/somepage and internally you re-write the request to www.yoursite.com/somepage and store the language identifier somewhere. In Java each request has an HttpServletRequest object where you can store attributes for the lifecycle of the request. If your framework doesn't have anything like that, you can just add a parameter to the URL: www.yoursite.com/de/somepage => www.yoursite.com/somepage?lang=de. If you are using hostname-based languages you can use hostnames such as de.yoursite.com or www.yoursite.de. There are pros and cons to this approach. For one thing, using country-code TLDs means registering new TLDs and trying to figure out whether a country code is appropriate to represent a language (it's often not). Using different hostnames/domains means you have to consider under which domains cookies are stored. If you want a cookie-free subdomain you need to plan this carefully. But from the coding side a language-based hostname doesn't need any additional re-writing; you can read the hostname (it's the Host header in the HTTP request) and parse that to determine the language.
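A minimal sketch of that re-writing step as a servlet filter, assuming the application is deployed at the root context and that a two-letter path prefix is the language (the attribute name "lang" is just an example):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

public class LanguageFilter implements Filter {

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        String path = request.getRequestURI(); // e.g. /de/somepage
        // A real implementation would check the prefix against the list of supported languages.
        if (path.length() > 3 && path.charAt(0) == '/' && path.charAt(3) == '/') {
            String lang = path.substring(1, 3);                                 // "de"
            request.setAttribute("lang", lang);                                 // read it later wherever it is needed
            request.getRequestDispatcher(path.substring(3)).forward(req, res);  // forward to /somepage
        } else {
            request.setAttribute("lang", "en");                                 // default language
            chain.doFilter(req, res);
        }
    }

    public void init(FilterConfig config) {}

    public void destroy() {}
}

Hostname-based languages would skip the forward entirely and just parse the Host header instead.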
Offer the initial page in a language depending on the Accept-Language HTTP header.
Let the user set the language in the current session and, if they're authenticated, in their user profile.
In your code and templates, mark strings as "translatable." You should have tools that gather all the strings from your codebase and let your translators translate them.
Have a layer which loads the translations from the database either individually or as a bundle, and apply them to the page which is loading. Cache these parts to make them fast -- every page load shouldn't make a hundred calls to the database for every translatable string.
Check out how Django does it -- it should be enlightening.
"I'm afraid that accessing [the database/text file] all the time would be quite time-consuming"
It would be, but that's why you'd likely be using caching to some extent. Nearly all large sites are accessing data stored outside the HTML page itself and, as such, utilize caching techniques as needed.
Your question regarding speed really is irrelevant to having multiple languages. It's an issue of storing data (content) so that it's easy to maintain and present to the user. Whether it's one language or ten, the problem is the same.
Create the most generic form of the site that you can. Import the translations from a database, with fallback (i.e. an order of languages: if a translation does not exist, use the next best language; for German: German, Dutch, English, etc.).
You would solve performance issues by keeping caches of the dynamically created pages. [Check the dependent data and update if necessary]
The preferred language that a user would like is passed along in the HTTP request headers. An explicit language selector or query string would often be unnecessary.
Resource files would be one way to go. They are easier to send to translators. However, they can be difficult to reuse amongst multiple websites.
Databases are convenient because they are the first thing that should be backed up on a website. They also have the benefit of being fast. However, if you have an extremely database-focused project, you may not want to add additional strain on your database.
For my solutions I want this:
The language should be indicated in the URL; it works better with Google indexing the page and people following the links in Google's search results.
As many pre-generated translations as possible, for faster page serving.
The first is quite easily done by having a URL like http://example.com/fr/and-so-on. URL rewriting can turn that into http://example.com/and-so-on?lang=fr, which is potentially easier to handle.
For pre-generating translations, it is good to use an HTML template framework so you can generate translated templates from one set of source templates. A blunt approach is to generate a sed script from a language's key-value file, and run that sed script on each template to get a translated version.
What remains then is to translate the dynamically generated parts of the pages. There are a few tools for that: Java has resource bundles, and GNU gettext is quite a nice tool.
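For the Java-bundle option, a small sketch (the bundle base name, the key, and the property file contents are made-up examples):

import java.text.MessageFormat;
import java.util.Locale;
import java.util.ResourceBundle;

public class Greetings {

    // messages_fr.properties would contain a line like: welcome.back=Bonjour {0}, bon retour.
    public static String welcomeBack(String username, Locale locale) {
        ResourceBundle bundle = ResourceBundle.getBundle("messages", locale);
        return MessageFormat.format(bundle.getString("welcome.back"), username);
    }
}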