Are code examples indexed by search engines? - seo

If I have a code example in a blog article with escaped content wrapped inside of a <pre><code> tag combination then will a search engine read that? What about if the code is also syntax highlighted (so it's littered with <span> tags with special, color-based classes)?

As long as it is in the page source, crawlers can see it. They might treat it and use it in different ways, though. It is Google, so anything can happen in their algorithms.
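To illustrate why the highlighting markup should not matter much, here is a minimal sketch of what a very simple indexer could do: throw away the tags and keep the text nodes. It is not how Google or any real crawler works internally; the `TextExtractor` class, the `visible_text` helper and the `kw` class name are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes, the way a very simple indexer might."""

    def __init__(self):
        # convert_charrefs=True turns &lt; back into < before handle_data.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the text content of an HTML fragment with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# The same escaped snippet, once plain and once with highlighter spans.
plain = '<pre><code>if (a &lt; b) { return a; }</code></pre>'
highlighted = (
    '<pre><code><span class="kw">if</span> (a &lt; b) { '
    '<span class="kw">return</span> a; }</code></pre>'
)

print(visible_text(plain))
print(visible_text(plain) == visible_text(highlighted))
```

Both versions reduce to the same text, so the `<span>` tags only change how the code looks, not what text is in the page source.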

Related

Trying to understand Google Results and meta tags

Note: this does NOT regard ranking, I just want the results to look better overall.
I'm working with a "news site" with a lot of articles, some dynamic, some static.
The developers haven't really given much thought about SEO but now want the Google Results to look a bit prettier - which landed on my table.
In the source code there are a few meta tags, for example:
<meta name="twitter:title" content="content">
<meta name="og:title" content="content">
Running it through Google Structured Data Testing Tool shows what I'd expect but it doesn't look like my search result for that specific link has the correct snippet.
Seems like it doesn't want to pick the og:description content all the time. Sometimes it does, and sometimes it also adds the title again in the snippet.
What I don't get: is Google using og:title for results, or is that only for things like Facebook sharing? Do I simply need the one below, since that is actually missing from the code?
The description itself would be the same as og:description since they contain the same content.
<meta name="description" content="content">
As far as I understand it can be quite tricky to customize these sorts of things but could it really be that hard to have any sort of consistency throughout the results from our page?
There are two things you can do but both come with a caveat.
Google takes anything from your site as a suggestion. There is no way to program it to perform identically in all situations. If Google's algorithm believes there is a better way to present a result, it will ignore any direction you give it and auto-generate a new presentation for your page.
That said, there are two things you can do:
Add meta tags with the exact text you'd like to appear on the SERP. The page title may or may not be appended with your brand/company name. If it already contains the company/brand name, Google is more likely to leave it where it is.
Google takes text from the page based on what it thinks is more important/relevant to the search. For News, using either HTML5 elements (nav, article, aside) or labelling your divs with a class using those key words will help Google understand what the real content is. Asides are less likely to be used while Articles will be focused upon.
I would also recommend having authors write their own custom descriptions and insert them with your CMS. They're likely much better at constructing a good summary than Google or an auto-summary script. Google will experiment with alternative descriptions occasionally, but once something solidifies itself as popular in terms of click rate, it'll stick.
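Before changing templates, it can help to check which pages are actually missing the tags. The following is a rough sketch of such a check, assuming you already have the page HTML as a string; the `MetaAudit` class, the `audit` function and the list of wanted tags are made up for this example. Open Graph tags are normally written with a `property=` attribute rather than `name=`, so the sketch accepts both.

```python
from html.parser import HTMLParser


class MetaAudit(HTMLParser):
    """Collects <meta> tags keyed by their name= or property= attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("name") or attrs.get("property")
        if key:
            self.meta[key.lower()] = attrs.get("content") or ""


def audit(html, wanted=("description", "og:title", "og:description")):
    """Return the wanted meta tags that are missing or have empty content."""
    parser = MetaAudit()
    parser.feed(html)
    parser.close()
    return [key for key in wanted if not parser.meta.get(key)]


# The two tags quoted in the question, and nothing else.
page = """
<head>
<meta name="twitter:title" content="content">
<meta name="og:title" content="content">
</head>
"""
print(audit(page))
```

For the two tags from the question this reports `description` and `og:description` as missing. It only tells you what is in the markup; Google still treats whatever you add as a suggestion.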

Code related web searches

Is there a way to search the web which does NOT remove punctuation? For example, I want to search for window.window->window (Yes, I actually do, this is a structure in mozilla plugins). I figure that this HAS to be a fairly rare string.
Unfortunately, Google, Bing, AltaVista, Yahoo, and Excite all strip the punctuation and just show anything with the word "window" in it. And according to Google, on their site, at least, there is NO WAY AROUND IT.
In general, searching for chunks of code must be hard for this reason... anyone have any hints?
Try Google Code Search (I tried "window.window->window", but it doesn't seem to return any relevant results for this query).
There are similar tools all over the internet, like Codase or Koders, but I'm not sure they let you search for exactly this string. Anyway, they might be useful to you, so I think they're worth mentioning.
edit: It is very unlikely you'll find a general-purpose search engine which will allow you to search for something like "window.window->window", because most search engines do some processing on the document before storing it. For instance, they might represent it internally as vectors of words (a vector space model) and use that to do the search, not the actual original string. Creating such a vector involves first splitting the document on punctuation and other delimiters. This is a very complex and interesting subject which I can't tell you much more about. My bad memory has done a pretty good job of erasing it since I studied it at school!
BTW they might do the same kind of processing on your query too. You might want to read about tf-idf, which is probably light years from what Google and its peers are doing, but it can give you a hint about what happens to your query.
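To make the tokenizing step and tf-idf concrete, here is a toy sketch. It is a textbook-style version, not what any real search engine runs; the `tokenize` and `tf_idf` functions and the sample documents are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Split on anything that is not a letter or digit, so punctuation is lost.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # term frequency * inverse document frequency
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# After tokenizing, the query is just the word "window" three times.
print(tokenize("window.window->window"))

docs = [
    "window.window->window is a structure in mozilla plugins",
    "open a new browser window",
    "plugin api reference",
]
for weights in tf_idf(docs):
    print(sorted(weights, key=weights.get, reverse=True)[:3])
```

Once the `.` and `->` are gone, the query cannot be told apart from a plain search for "window", which matches what the question describes.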
As you discovered, there is no way to do that in the main Google engine by itself. However, if you are looking for information about Mozilla, then your best bet would be to structure your query more like this:
"window.window->window" +Mozilla
OR +XUL
+ another search string related to what you are trying to do
SymbolHound is a web search that does not remove punctuation from the queries. There is an option to search source code repositories (like the now-discontinued Google Code Search), but it also has the option to search the Internet for special characters (primarily programming-related sites such as StackOverflow).
Try it here: http://www.symbolhound.com
-Tom (co-founder)

Hiding or Promoting specific content within a page to search engines

A bit of an SEO question here.
I've got a site with a ton of pages of content. I know lots of the content is the same on each page.
I thought that search engines keyed off of the differences in page content so that they could promote the correct data, but when I look at the summary in Google and Bing, the summary shows my 'feedback' block (which is where I just ask for feedback).
Yahoo (and the summary in Facebook) shows my search options menu.
These aren't really things that are going to make a person want to click on the page.
So I'm wondering what the best way is to either hide this content from search engines, or improve the visibility of the other content that should get indexed.
The page structure is pretty consistent, so I thought it would have been easy for the search robots to pick this stuff out, but apparently not.
You may want to try using a meta tag like this.
<meta name="description" content="Here is a short summary of the page">
Search engines also prefer title and header tags over regular text.
Meta is the best way to do that.
However, beware that the structure of your page is also important: search engines prefer to use the meta tag, but they also weigh the structure, keywords, headers and things like that.
I ran into this trouble a couple of months ago. I found that Google showed the price and download text rather than the meta description. I solved that by reorganizing the meta description (more accurate and shorter, 177 characters) and eliminating tags from the price and download sections. I also made some slight adjustments to the structure. Now the Google summary is what I want.
Hope this helps you!

Tool or methods for automatically creating contextual links within a large corpus of content?

Here's the basic scenario - I have a corpus of say 100,000 newspaper-like articles. Minimally they will all have a well-defined title, and some amount of body content.
What I want to do is find runs of text in articles that ought to link to other articles.
So, if article Foo has a run of text like "Students in 8th grade are being encouraged to read works by Jean-Paul Sartre" and article Bar is titled (and about) "The important works of Jean-Paul Sartre", I'd like to automagically create that HTML link from Foo to Bar within the text of Foo.
You should ask yourself something before adding the links. What benefit for users do you want to achieve by doing this? You probably want to increase the navigability of your site. Maybe it is better to create an easier way to add links to older articles in the form used to submit new ones. Maybe it is possible to add a "one click search for selected text" feature. Maybe you can add wiki-like functionality that lets users propose a link for selected text. You probably want to add links to related articles (generated through a tagging system or text mining) below the articles.
Some potential problems with a fully automated link adder:
You may need to implement a good word sense disambiguation algorithm to avoid confusing or even irritating the user by placing bad automatic links with regex (or simple substring matching).
As the number of articles is large, you do not want to generate the HTML for extra links on every request; cache it instead.
You need to make a decision on duplicate titles or titles that contain another title as a substring (either take the longest title, link to the most recent article, or prefer an article from the same category).
TLDR version: find alternative solutions that provide desired functionality to the users.
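If you do go the automated route anyway, the "take the longest title" rule can be sketched roughly as below. This is plain case-insensitive substring matching on word boundaries, so it has exactly the word-sense problems listed above, and it assumes titles start and end with a letter or digit. The `add_links` function, the titles and the URLs are made up for this example, and you would cache the result rather than run it on every request.

```python
import html
import re


def add_links(body, titles_to_urls):
    """Link title mentions in a plain-text body; returns an HTML string."""
    if not titles_to_urls:
        return html.escape(body)
    # Longest titles first, so a longer title wins over one it contains.
    titles = sorted(titles_to_urls, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in titles) + r")\b",
        re.IGNORECASE,
    )
    lookup = {t.lower(): url for t, url in titles_to_urls.items()}

    out, pos = [], 0
    for match in pattern.finditer(body):
        text = match.group(0)
        url = lookup.get(text.lower())
        out.append(html.escape(body[pos:match.start()]))
        if url is None:
            # Unusual case-folding mismatch: leave the text unlinked.
            out.append(html.escape(text))
        else:
            out.append('<a href="{}">{}</a>'.format(html.escape(url), html.escape(text)))
        pos = match.end()
    out.append(html.escape(body[pos:]))
    return "".join(out)


articles = {
    "Text mining": "/articles/text-mining",
    "Text mining tools": "/articles/text-mining-tools",
}
print(add_links("A survey of text mining tools & libraries.", articles))
```

The alternation is tried in order at each position, which is why sorting by length is enough to prefer "Text mining tools" over "Text mining" here. Duplicate titles would still need one of the other tie-breakers (most recent article, same category).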
What you are looking for are text mining tools. You can find more info and links at http://en.wikipedia.org/wiki/Text_mining. You might also want to check out Lucene and its ports at http://lucene.apache.org. Using these tools, the basic idea would be to find a set of similar articles based on the article (or title) in question. You could search various properties of the article including titles and content or both. A tagging system a la Delicious (or Stackoverflow) might also be helpful. Rather than pre-creating the links between articles, you'd present the relevant articles in an interface much like the Related questions interface on the right-hand side of this page.
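As a rough idea of what "find a set of similar articles" means, here is a tiny sketch using word counts and cosine similarity. Lucene does this far better (proper analysis, weighting, an index), so treat it only as an illustration; the `bag_of_words`, `cosine` and `related` functions and the sample articles are made up for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lower-cased alphanumeric words in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def related(target_id, articles, top_n=3):
    """Rank the other articles by similarity to articles[target_id]."""
    vectors = {aid: bag_of_words(text) for aid, text in articles.items()}
    scores = [
        (cosine(vectors[target_id], vec), aid)
        for aid, vec in vectors.items()
        if aid != target_id
    ]
    # Highest score first; ties broken by article id for stable output.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [aid for score, aid in scores[:top_n] if score > 0]


articles = {
    "foo": "Students are being encouraged to read works by Sartre",
    "bar": "The important works of Sartre",
    "baz": "Local football results",
}
print(related("foo", articles))
```

With 100,000 articles you would not compare every pair like this on the fly; you would precompute the related list per article or let a search index do the lookup.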
If you wanted to find and link specific text in each article, I think you'd need to do some preprocessing to select pertinent phrases to key on. Even then I think it would be very hard not to miss things due to punctuation/misspellings or to not include irrelevant links for the same reasons.

SEO for Ultraseek 5.7

We've got Ultraseek 5.7 indexing the content on our corporate intranet site, and we'd like to make sure our web pages are being optimized for it.
Which SEO techniques are useful for Ultraseek, and where can I find documentation about these features?
Features I've considered implementing:
Make the title and first H1 contain the most valuable information about the page
Implement a sitemap.xml file
Ping the Ultraseek xpa interface when new content is added
Use "SEO-Friendly" URL strings
Add Meta keywords to the HTML pages.
The most important bit of advice anyone can get when optimizing a website for search engines and indeed for tools like Ultraseek is this...
Write your web pages for your human audience first and foremost. Don't do anything odd to try and optimize your website for a search engine. Don't stuff keywords into your URL if it makes the URL less sensible. Think human first.
Having said this, the following techniques usually make things better for both the humans and the machines!
Use headings (h1 through h6) to give your page a structure. Imagine them being arranged in a tree view, with an h1 containing some h2 tags and h2 tags containing h3 tags and so on. I usually use the h1 tag (there should be only one h1 tag) for the site name and the h2 tag for the page name, with h3 tags as sub-headings where appropriate.
Sitemaps are very useful as they contain a list of your pages; consider this a request for the pages you would like included in any index. They don't normally contain much context though.
Friendly URL strings are great for humans. I'd much rather visit www.website.com/Category/Music/ than www.website.com?x=3489 - it does also mean that you give the machines some more context for your page. It especially helps if the URL matches your h1 and h2 tags. Like this:
www.website.com/Category/Music/
<h1>Website</h1>
<h2>Category: Music</h2>
<p>Welcome to the music category!</p>
Meta keywords (and description) are useful - but as per the above advice, you need to make sure that it all matches up. Use a small but targeted set of keywords that highlight what is specifically different about the page and make sure your description is a good summary of the page content. Imagine that it is used beneath the title in a list of search results (even though it might not be!)
Navigation! Providing clear navigation, as well as back links (such as bread-crumbs) will always help. If someone clicks on a search result, it might not be the exact page they are after, but it may well be very close. By highlighting where people have landed in your navigation and by providing a bread-crumb that tells them where they are, they will be able to traverse your pages easily even if the search hasn't taken them to the perfect location.
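For the sitemap point above, here is a minimal sketch of generating a sitemap.xml in the standard sitemaps.org format. Whether Ultraseek 5.7 actually reads this file is something to confirm in its documentation; I am not asserting that it does. The `build_sitemap` function, the URLs and the date are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: iterable of (absolute_url, last_modified 'YYYY-MM-DD' or None)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL for us.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


pages = [
    ("http://www.website.com/Category/Music/", "2024-01-15"),
    ("http://www.website.com/?x=3489&y=1", None),
]
print(build_sitemap(pages))
```

You would normally build the page list from your CMS or file system and regenerate the file whenever content is added.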