How do permalinks get stored? - apache

I would like to implement permalinks on my website (I use JSP+Servlets, if that makes any difference) and was wondering how they work. Are they stored as physical pages on the server, or do the values go into a database and the URLs get generated dynamically?
For example, http://jsfiddle.net/8MBHZ/
Is 8MBHZ a physical html page?

That is the static-looking URL of the page. When such a request reaches the server, the value 8MBHZ is extracted from the URL. Using that value, the page content is looked up in the database, and the retrieved content is then rendered.
(Unlike dynamic URLs with varying query strings, static-looking URLs are not indexed multiple times under different addresses, which has a positive effect on search engine optimization (SEO).)
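For the JSP+Servlets setup mentioned in the question, a minimal sketch of that lookup flow might look like the following; the /p/* mapping, ContentDao, and Page are hypothetical placeholders for your own URL scheme and data-access layer.

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: resolve a permalink token (e.g. 8MBHZ) to content stored in a database.
@WebServlet("/p/*")
public class PermalinkServlet extends HttpServlet {

    private final ContentDao contentDao = new ContentDao(); // hypothetical data-access object

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // For a request to /p/8MBHZ, getPathInfo() returns "/8MBHZ"
        String path = req.getPathInfo();
        String token = (path == null) ? "" : path.replace("/", "");

        Page page = contentDao.findByToken(token); // look the token up in the database
        if (page == null) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }

        // Hand the stored content to a JSP (or any other view) for rendering
        req.setAttribute("page", page);
        req.getRequestDispatcher("/WEB-INF/page.jsp").forward(req, resp);
    }
}
```

So nothing named 8MBHZ needs to exist on disk; the token is only a key the server uses to find the content before rendering it.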

Related

Share dynamic content on LinkedIn

I have a JS-based CMS that populates a single page with different content based on URL parameters passed to the page. I am using the share URL format (https://www.linkedin.com/shareArticle?mini=true&url=''&title=''&summary=''&source='').
But the parameters I pass are never used; it always falls back to what is being served directly from the server.
Do I have to use the API to make this work, and if so, can I use the API without making the user authenticate?
Is there a correct way to pass this so that LinkedIn will display the correct data?
After testing this more, I realised that LinkedIn's share URL does not take its parameters; it only takes what is served from the server. So I changed my build process not to fetch the pages at run time but to precompile them onto the server. Maybe in the future LinkedIn will have resolved this for dynamic pages.
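For what it's worth, the underlying behaviour is exactly that: the share crawler reads the HTML that the shared URL returns (typically its title and Open Graph tags), not the shareArticle query parameters. A hedged sketch of serving that metadata per article from the server, in the JSP+Servlets style used elsewhere on this page; Article, ArticleStore, and the /article/* mapping are assumptions:

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: serve per-article share metadata from the server, since the share
// crawler looks at the returned HTML rather than the shareArticle parameters.
@WebServlet("/article/*")
public class ArticleServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String path = req.getPathInfo();
        String id = (path == null) ? "" : path.substring(1);

        Article article = ArticleStore.find(id); // hypothetical content lookup
        if (article == null) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }

        resp.setContentType("text/html;charset=UTF-8");
        PrintWriter out = resp.getWriter();
        out.println("<!DOCTYPE html><html><head>");
        out.println("<title>" + article.title + "</title>");
        // Open Graph tags are what share crawlers typically read from the served page.
        // (Real code should HTML-escape these values.)
        out.println("<meta property=\"og:title\" content=\"" + article.title + "\"/>");
        out.println("<meta property=\"og:description\" content=\"" + article.summary + "\"/>");
        out.println("</head><body>" + article.body + "</body></html>");
    }
}
```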

Sitemap for dynamically generated content

I need to index my website on Google and other search engines, but my website is a database of IP addresses and each page is dynamically generated, like
example.com/show.php?&ip=ipvalue
Would it be possible to get Google and other search engines to index every IP address I have in the database by linking directly to URLs like the one shown above?
I know how to set up a proper sitemap file for static content, but I cannot see how I could tell a search engine to index a URL that doesn't physically exist until a user passes a value that is in the database.
No one (neither human visitors nor search engine bots) knows if a document is created dynamically or if it exists as a static file.
A search engine would have no reason to handle a link to http://example.com/show.php?ip=127.0.0.1 differently from a link to http://example.com/ip/127.0.0.1. By using URL rewriting (e.g., mod_rewrite for Apache), you could rewrite your URLs in such a way.
So: Just link to these pages from a place that search engine crawlers can access.
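The answer names Apache mod_rewrite; as a rough illustration of the same clean-URL idea in the JSP+Servlets setup used elsewhere on this page, a servlet mapped to a path prefix can serve such URLs directly, with no rewrite layer at all. IpDao and IpRecord are hypothetical placeholders:

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: serve "clean" URLs such as /ip/127.0.0.1 straight from a servlet,
// so no separate rewrite rules are needed.
@WebServlet("/ip/*")
public class IpPageServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // For /ip/127.0.0.1, getPathInfo() returns "/127.0.0.1"
        String path = req.getPathInfo();
        String ip = (path == null) ? "" : path.substring(1);

        IpRecord record = IpDao.lookup(ip); // hypothetical database lookup
        if (record == null) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }

        req.setAttribute("record", record);
        req.getRequestDispatcher("/WEB-INF/ip.jsp").forward(req, resp);
    }
}
```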

Should a sitemap have *every* URL

I have a site with a huge number (well, thousands or tens of thousands) of dynamic URLs, plus a few static URLs.
In theory, due to some cunning SEO linkage on the homepage, it should be possible for any spider to crawl the site and discover all the dynamic URLs via a spider-friendly search.
Given this, do I really need to worry about expending the effort to produce a dynamic sitemap index that includes all these URLs, or should I simply ensure that all the main static URLs are in there?
The actual way in which I would generate this isn't a concern - I'm just questioning the need to do it.
Indeed, Google's FAQ about this (and yes, I know they're not the only search engine!) recommends including URLs in the sitemap that might not be discovered by a crawl; on that basis, if every URL in your site is reachable from another, surely the only URL you really need as a baseline in your sitemap for a well-designed site is your homepage?
If there is more than one way to get to a page, you should pick a main URL for each page that contains the actual content, and put those URLs in the site map. I.e. the site map should contain links to the actual content, not every possible URL to get to the same content.
Also consider putting canonical meta tags in the pages with this main URL, so that spiders can recognise a page even if it's reachable through different dynamic URLs.
Spiders only spend a limited time searching each site, so you should make it easy to find the actual content as soon as possible. A site map can be a great help as you can use it to point directly to the actual content so that the spider doesn't have to look for it.
We have had pretty good results using these methods, and Google now indexes 80-90% of our dynamic content. :)
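One way to keep such a sitemap in sync without maintaining it by hand is to generate it from the same database that backs the pages, emitting exactly one canonical URL per piece of content. A sketch under those assumptions (PageEntry and PageDao are hypothetical; lastModified is assumed to be a java.util.Date):

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.text.SimpleDateFormat;
import java.util.List;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: generate /sitemap.xml from the database so it always lists the
// canonical URL of each content page (one entry per page, not per route).
@WebServlet("/sitemap.xml")
public class SitemapServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("application/xml;charset=UTF-8");
        SimpleDateFormat w3cDate = new SimpleDateFormat("yyyy-MM-dd");

        PrintWriter out = resp.getWriter();
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">");

        List<PageEntry> pages = PageDao.listCanonicalPages(); // hypothetical query
        for (PageEntry page : pages) {
            out.println("  <url>");
            out.println("    <loc>" + page.canonicalUrl + "</loc>");
            out.println("    <lastmod>" + w3cDate.format(page.lastModified) + "</lastmod>");
            out.println("  </url>");
        }
        out.println("</urlset>");
    }
}
```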
In an SO podcast they talked about the limits on the number of links you can include/submit in a sitemap and how you need to break them over multiple files; the sitemap protocol allows up to 50,000 URLs per sitemap file, with a sitemap index file tying the individual sitemaps together.
"Given this, do I really need to worry about expending the effort to produce a dynamic sitemap index that includes all these URLs, or should I simply ensure that all the main static URLs are in there?"
I was under the impression that the sitemap wasn't necessarily about disconnected pages, but rather about increasing the crawling of existing pages. In my experience, when a site includes a sitemap, minor pages (even when prominently linked to) are more likely to appear in Google results. Depending on the pagerank/inbound links etc. of your site, this may be less of an issue.

Can search engines index pages generated by server side code?

I'm guessing a site like Stack Overflow doesn't keep an HTML file around for every question ever asked. Instead, server-side code creates the page every time a question is clicked on (I think). Is it possible for search engines to index every question on Stack Overflow, or would a page-per-question need to be kept in the directory so the search engine can crawl it?
Yes. Search engines can index dynamically generated pages no problem. In fact, from the search engine bot's perspective, it can't really even distinguish between a dynamically generated page and a static one.
You might be interested by the Dynamic URLs vs. static URLs post on the Official Google Webmaster Central Blog.
Yes it's perfectly possible - when a link is followed the server returns HTML just like any other web page. The only difference is that the server generated it, rather than a person.
As far as the client (be it a browser or search engine) is concerned, there is no difference between a server-generated page and a static file. They're virtually indistinguishable (depending on how the page is generated, it might be missing Last-Modified headers, etc). As such, yes, search engines can index generated pages without a problem.
That said, there is something to be said for giving them a hint. Using sitemaps, for example, gives a search engine a nice listing of all your pages, so it's less likely to miss them. More importantly, it can summarize last modified times, to focus the search engine's attention on what has changed recently. This isn't mandatory, but it does help - regardless of whether the pages are static HTML or generated.
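On the Last-Modified point, a generated page can supply that header too. In a servlet, overriding getLastModified() is enough for the container to add the header and answer If-Modified-Since requests with 304 Not Modified; QuestionStore below is a hypothetical stand-in for wherever the content's timestamp lives.

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A generated page can still advertise a Last-Modified date, just like a
// static file. Overriding getLastModified() lets the container set the
// header and handle conditional GET requests automatically.
@WebServlet("/question/*")
public class QuestionServlet extends HttpServlet {

    @Override
    protected long getLastModified(HttpServletRequest req) {
        // Hypothetical lookup of when this question's content last changed (in millis)
        return QuestionStore.lastUpdatedMillis(req.getPathInfo());
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/html;charset=UTF-8");
        resp.getWriter().println("<html><body>...render the question here...</body></html>");
    }
}
```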
Any link that uses a GET can be followed by most crawlers. Anything that requires a POST will generally be ignored.
The mechanism for generating the page is irrelevant.
Yes, as long as it is not restricted by robots.txt or meta tags. A search engine requests a web page just like a normal user; no one has access to the server-side code (unless your site is hacked).
Search engines can see pretty much anything on a given Web page that isn't hidden behind client-side code (i.e., JavaScript).
So, if there's a URL that you can enter into your browser's address bar to get this page, and this page is linked to from somewhere, a search engine will find it and "see" the same content that you do. The fact that the page was generated dynamically by a server is irrelevant to a search engine, since what is sent to a browser upon requesting a URL is still just an HTML file.
In other words, that HTML file doesn't exist in the same form on the server (it's actually server-side code that generates HTML, not a static HTML file), but that's not what a search engine crawls and indexes. It follows links to document URLs, exactly what you see in your browser's address bar, and indexes the HTML those URLs return.

When is a website considered "static" or "dynamic"

I have created a site which parses XML files and displays their content on the appropriate page. Is my site a dynamic web page or a static web page?
How do dynamic and static web pages differ?
I feel it's dynamic, because I parse the content from XML files; initially I don't have any content in my main page.
What do you think about this? Please explain.
I would describe your pages as dynamic. "Static" usually means that the file sitting on the web server is delivered as-is to the user; since you're assembling the pages from data files, I'd call them dynamic even if you're not building in any dynamically-changing data.
I don't think this is a hard and fast definition though. If someone feels the page is static because it's assembled from static pages, that's another way to look at it.
This is actually an interesting question..
I would have said it's a dynamic website, as the content is generated programmatically... but if the XML files do not change, it's no less "static" than straight HTML files served through Apache.
Say you have a site that is regular HTML files - it would be considered a static web-page; but if you take those HTML files, store them in a database, and have a simple page that allows /view.php?page=index - does that make it a dynamic site?
I would say no, it's just a static site served through a database, or XML files (instead of a file-system).
Basically: if the content changes without you manually editing those XML files, I would say it's a dynamic site. If it only changes when you edit the files, then I would say it's a static site.
Static web pages are plain HTML content delivered as-is. If you are processing any kind of XML file on the server side and generating content accordingly, that is a dynamic page. Static pages only change content when the page itself is actually edited and modified.
The first result on Google, had you searched for it, explains this: http://websiteowner.info/articles/pages/pagetypes.asp
Also, stating that static websites are not updated regularly is not correct. The web and HTML were around before we started writing stuff in Perl and PHP, and there are/were sites that had heavy traffic and were modified manually.
A simple way to distinguish between static and dynamic:
Static: straight HTML files
Dynamic: HTML generated through server-side code and a data store (XML, database, etc.)
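As a concrete illustration of that dynamic case, and close to the XML setup in the question, here is a minimal sketch in which the HTML is produced at request time from an XML data file; the /news mapping and news.xml layout are assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// "Dynamic" in the sense used above: the HTML is produced at request time
// from a data store (here an XML file), not read from a ready-made .html file.
@WebServlet("/news")
public class NewsServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/html;charset=UTF-8");
        try {
            // news.xml is a hypothetical data file, e.g. <news><item>...</item></news>
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new File(getServletContext().getRealPath("/WEB-INF/news.xml")));

            NodeList items = doc.getElementsByTagName("item");
            PrintWriter out = resp.getWriter();
            out.println("<html><body><ul>");
            for (int i = 0; i < items.getLength(); i++) {
                out.println("<li>" + items.item(i).getTextContent() + "</li>");
            }
            out.println("</ul></body></html>");
        } catch (Exception e) {
            throw new ServletException("Could not parse news.xml", e);
        }
    }
}
```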
KISS - Dynamic pages change without changing the page itself.
Your pages are dynamic, because once deployed the content can be changed without changing the page's HTML.
Any content that is fixed and always renders the same is considered Static.