mod_rewrite: convert to lowercase and replace + with - (Apache)

I have a simple news site that gets news from RSS feeds. Currently the URLs look something like this:
http://domain.com/news/This+and+That+Happened+There/
I would simply like to change this to
http://domain.com/news/this-and-that-happened-there/
(it looks cleaner)
Your help will be appreciated.

I'm sorry, but I doubt it's possible in your case. Based on what you're saying, the This+and+That+Happened+There part is the unique ID of the news item, which means it has to remain unchanged in order for it to work on the website (be recognised by the backend/database).
If, on the other hand, you have some other ID in the URL, say http://domain.com/news/This+and+That+Happened+There/123/ where 123 is the ID, then it should be doable, as whatever comes before it shouldn't matter. All you'd need is to have your CMS (or whatever backend you're using) format the URLs that way.
You did add the seo tag to this question, so I understand that's what you want to improve. I'd say in this case it won't matter (but I'm not an expert in that field).
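If there is such a separate ID segment, the rewrite itself could be sketched with mod_rewrite's internal tolower map and the [N] flag. This is an untested sketch that assumes access to the main server config (RewriteMap is not allowed in .htaccess) and a trailing numeric ID in the URL:

```apache
RewriteEngine On
RewriteMap lc int:tolower

# Replace one + with - per pass; [N] restarts rewriting until no + remains
RewriteRule ^news/([^+]+)\+(.*)$ /news/$1-$2 [N]

# Once no + is left, lowercase the slug and redirect to the clean URL
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^news/([^/]+)/([0-9]+)/$ /news/${lc:$1}/$2/ [R=301,L]
```

The backend would then look the item up by the numeric ID and ignore the slug entirely.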

Related

How to limit scrapy to a particular section of a website, e.g. http://www.domain.com/section/

I have a scrapy project which crawls all the internal links of a given website. This is working fine, however we have found a few situations where we want to limit the crawling to a particular section of a website.
For example, if you could imagine a bank has a special section for investor information, e.g. http://www.bank.com/investors/
So in the example above, everything in http://www.bank.com/investors/ only would be crawled. For example, http://www.bank.com/investors/something/, http://www.bank.com/investors/hello.html, http://www.bank.com/investors/something/something/index.php
I know I could write some hacky code in parse_url that scans the URL and skips it if it doesn't meet my requirements (i.e. it's not under /investors/), but that seems horrible.
Is there a nice way to do this?
Thank you.
I figured this out.
You need to pass an allow pattern to the LinkExtractor for the URLs you want to permit.
For example:
Rule(LinkExtractor(allow=(self.this_folder_only)), callback="parse_url", follow=True)
Everything else will be denied.
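The allow argument takes regular expressions. A quick stdlib-only sketch of how such a pattern filters the example URLs above (the pattern name and URLs are illustrative stand-ins for self.this_folder_only):

```python
import re

# Hypothetical pattern standing in for self.this_folder_only:
# only URLs under /investors/ are allowed.
this_folder_only = r"^https?://www\.bank\.com/investors/"

urls = [
    "http://www.bank.com/investors/something/",
    "http://www.bank.com/investors/hello.html",
    "http://www.bank.com/investors/something/something/index.php",
    "http://www.bank.com/branches/",
]

# LinkExtractor applies the same kind of regex match to every extracted link
allowed = [u for u in urls if re.search(this_folder_only, u)]
```

Everything that doesn't match the pattern (here, /branches/) is dropped before the spider ever follows it.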

Is there any way to change the structure of urls in Shopify?

Is there any way to change the actual structure of a url in Shopify, rather than just the handle? For instance if I add a product it will be available at the following urls:
/products/some-product
/collections/collection-name/products/some-product
Is there any way I could change this to /collection-name/some-product, i.e. remove unnecessary words from the url?
I also realise you can add redirects, but this isn't what I want.
When thinking about the product page, you should never consider using the URL that contains 'collections'. If you take a close look at the source code of a product page, you'll see they all have a rel canonical tag pointing to
../products/some-product
even if the product is displayed under the URL
../collections/collection-name/products/some-product
If the collections URL doesn't have that canonical tag, add it; otherwise crawlers/robots would consider it duplicate content, because two different URLs would show the same content.
So if you're OK with the first part, you'll only have
../products/some-product
In that case, you will never be able to change the
../products/
part. But this is good, as it helps Shopify store owners maintain a really well-structured organization of products.
If for some reason you still need to play hard with URLs, you can dig a bit into Application Proxies.

Using sunspot_rails with rails

I'm writing a rails app that allows people to submit links and titles a little bit like reddit.
I want to make it so that people can enter a URL and find the record with a similar URL. This gets tricky if people leave off the http:// at the beginning, or do or don't include the trailing /.
How do I set that up using Solr/Sunspot?
If I've understood your query properly, there are a couple of problems combined here.
You can probably use something like URI.parse(incoming_url) and then extract the relevant parts of the URL that you want. I'd then take that info and convert it to a slug using something like slugify or acts_as_slug.
I'm not sure why you want to tie that functionality into Sunspot, though.
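The normalization step can be sketched as follows (shown in Python for brevity; the Rails version would use URI.parse the same way, and the exact equivalence rules here — dropping a leading www., trailing slashes, and missing schemes — are assumptions):

```python
from urllib.parse import urlparse

def normalize_url(raw):
    # Assume http:// when the scheme was left off
    if "://" not in raw:
        raw = "http://" + raw
    parts = urlparse(raw)
    # Treat host case, a leading www., and a trailing slash as insignificant
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    return host + path
```

Storing this normalized form alongside each record lets an exact-match lookup stand in for a fuzzy Solr query.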

SEO Friendly URL Rewriter Parameters

I would appreciate your advice on how to incorporate parameters into SEO-friendly URLs.
We have decided to have the "techie" parameters first, followed by the "SEO Slug"
\product\ABC123\fly-your-own-helicopter
much like S.O. - if the SEO Slug changes, or is truncated, or missing, we still have the Product and ABC123 parameters; various articles say that having such extra data doesn't harm SEO ranking.
We need to have additional parameters; we could use "-" to separate parameters as it makes them look similar to the SEO Slug, or we could/should use something else?
\product\ABC123-BOYTOY-2\boys\toys\fly-your-own-helicopter
This is product=ABC123, in Category=BOYTOY and Page=2.
We also want to keep the hierarchy as flat as possible, and thus I think:
\product-ABC123-BOYTOY-2\boys\toys\fly-your-own-helicopter
would be better - one level less.
We have a number of "zones", e.g.
\product-ABC123\seo-slug-for-product
\category-BOYTOY\seo-slug-for-category
\article-54321\terms-and-conditions
it would help us a lot if we could just user our 5 digit Page ID number instead, so these would become
\12345-ABC123\seo-slug-for-product
\23456-BOYTOY\seo-slug-for-category
\54321\terms-and-conditions
(Products & Categories have a number of different Page IDs for different templates, this style would take us straight to the right one)
I would appreciate your insight into what parameter separators to use, and if the leading techi-data is going work well for us.
In case relevant:
Classic ASP application on IIS7 + MSSQL2008
Product & Category codes contain A-Z, 0-9, "_" only.
Personally, I don't think any of the following:
\12345-ABC123\seo-slug-for-product
\product-ABC123-BOYTOY-2\boys\toys\fly-your-own-helicopter
are particularly "friendly". They may be "OK" for SEO, but you might lose the friendly part in the coding you have at the beginning of the URL.
Why can't you have something like this:
\product\seo-slug-for-product
And then have a table or dictionary which maps the slug to the product ID. That way when your MVC controller receives the slug as a parameter it can look up all the other values.
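That lookup can be sketched as a simple mapping (the slugs and IDs here are made up; a real app would back this with a database table rather than an in-memory dictionary):

```python
# Illustrative slug -> product ID mapping
slug_to_id = {
    "seo-slug-for-product": "ABC123",
    "fly-your-own-helicopter": "ABC124",
}

def product_id_for(slug):
    # Returns None for unknown slugs, which the controller can turn into a 404
    return slug_to_id.get(slug)
```

The controller receives only the slug, resolves it to the ID, and the URL carries no technical parameters at all.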
Worst case, I would do it the SO way. Which is more like:
\product\123456\seo-slug-for-product
The number is the product ID. I think they do it so that the titles of the articles can change and the old URLs still work. That's why both
https://stackoverflow.com/questions/3023298
and
https://stackoverflow.com/questions/3023298/seo-friendly-url-rewriter-parameters
work. They use:
<link rel="canonical"
href="https://stackoverflow.com/questions/3023298/seo-friendly-url-rewriter-parameters">
to ensure that google indexes only one page.

Is a deep directory structure a bad thing for SEO?

A friend of mine told me that the company he works at is redoing the SEO for their large website. Large == both the number of pages and the traffic they get a day.
Currently they have a (quote) deeply nested site, which I'm assuming means /x/y/z/a/b/c.. or something. I also know it's very unRESTful from some of the pages I've seen -> e.g. foo.blah?a=1&b=2&c=3......z=24 (yep, lots of crap in the URL).
So updating their SEO sounds like a much-needed thing.
But they are going flat. I mean -> totally flat. e.g. /foo-bar-pew-pew-abc-article1
This scares the bollox out of me.
From what he said (if I understood him right), each - character doesn't mean a new hierarchical level.
So /foo-bar-pew-pew-abc-article1 does not mean /foo/bar/pew/pew/abc/article1
A space would be replaced by a -. A + represents a space, but only if the two words are supposed to be one word (whatever that means). I.e. Jean-Luke would be jean+luke, but if I had a subject like 'hello world', that would be listed as hello-world.
Excuse me while I blow my head up.
Is it just me, or is it totally silly to go completely flat? I was under the impression that when SEO people say keep it as flat as possible, they are trying to say keep it to 1 or 2 levels, with 4 being the utter max.
So is a flat hierarchy a 'really really good thing' for SEO ... for MEDIUM and LARGE sites (lots of resources, not necessarily lots of hits/page views)?
Well, let's take a step back and look at what SEO is supposed to accomplish; it's meant to help a search engine identify quality, relevant content for users based on key phrases and terms.
Take, for example, the following blog URLs:
* http://blog.example.com/articles/2010/01/20/how-to-improve-seo/
* http://blog.example.com/how-to-improve-seo/
Yes, one is deep and the other is flat; but the URL structure is important for two reasons:
* URL terms and phrases are high-value targets for determining the relevance of a page by a search engine
* A confusing URL may immediately force a user to skip your link in the search results
Let's face it: Google and other search engines can associate even the worst URLs with relevant content.
Take, for example, a search for "sears kenmore white refrigerator" in Google: http://www.google.com/search?q=sears+kenmore+white+refrigerator&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a.
Notice the top hit? The URL is http://www.sears.com/shc/s/p_10153_12605_04665802000P, and yet Google replaces the lousy URL with www.sears.com › Refrigerators › Top Freezers. (Granted, 2 results down is the true URL.)
If your goal for SEO is optimized organic relevance, then I would wholeheartedly recommend generating either key/value pairs in the URL, like www.sears.com/category/refrigerators/company/kenmore (meh), or phrase-like URLs like www.sears.com/kenmore/refrigerators/modelNumber. You want to align your URLs with the user's search terms and phrases to maximize your effort.
In the end, if you offer valuable content and you structure your content and site properly, the search engines will accurately gather it. You just need to help them realize how specific and authoritative your content is. :)
Generally, the less navigation to reach content, the better. But with a logical breadcrumb strategy and well-thought-out deep linking, excess directory depth can be managed without hurting SEO and visibility in search.
Remember that Google is trying to return the most relevant link and the best user experience, so if your site has 3 URLs coming up for the same search term and it takes 2 or 3 exits to find the appropriate content, Google will read that as bad and start lowering all of your URLs in the SERPs.
You have to consider how visitors will find your content - not just navigate it. Think content discovery, not just navigation.
HTH
Flat or deeply nested really shouldn't affect the SEO. The key part is how those individual pages are linked to, which determines how they get ranked. I wrote some basic stuff on this years ago, but essentially as long as pages are not buried so deeply within a site that it takes several clicks (or links, from Google's perspective) to reach them, they should rank fairly much the same in either case. Google used to put a lot more weight on keywords in URLs, but this has been scaled back in more recent algorithm changes. It helps to have keywords there, but it's no longer the be-all and end-all.
What you/they will need to consider are the following two important points:
1) How will the URL structure be perceived by the users of the site? Will they be able to easily navigate the site without having to rely on the URL structure in the address bar?
2) In making navigational changes such as this, it's vitally important to set up redirects from the old URLs. Google hates 404s; you should either return 410 (Gone) HTTP responses for pages that are no longer valid, or 301 (permanent redirect) responses pointing at the new URLs.
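On Apache, for example, those two cases could be sketched like this (the paths are placeholders; IIS would use its own rewrite/redirect configuration instead):

```apache
# Old deep URL permanently redirected to its new flat equivalent
Redirect 301 /x/y/z/a/b/c/old-article /foo-bar-pew-pew-abc-article1

# Page removed for good: tell crawlers it is gone, not just missing
Redirect gone /x/y/z/retired-article
```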
In making any large changes such as this you can save loads of time getting the site indexed successfully by utilising XML sitemaps and Google's webmaster console.