real estate large scale 301 redirect - apache

I'm trying to work out how to handle redirects for a new client's real estate website.
We have no access at all to the old site, and the URL structure on the new site is necessarily different because of randomly generated property IDs (our system generates a different ID from the old one).
The old URL structure is www.mydomain.com/property/view?=1111
The new URL structure is www.mydomain.com/property/street-name/2222
My instinct is to do manual 301s for every property (about 6000), matching by page title, but sadly I can't, as I have no access to the structure of the old website, and despite spidering it numerous times I can't get a full list of all the properties.
If anyone could give me advice on how best to avoid a bad user experience and a Google frying, I would really appreciate it.
Thanks in advance.
Mark

It depends on what the 1111 is. If that corresponds to an MLS ID number (some sort of UID) then you should be able to use regex to get it to work. Most of the IDX vendors offer a way to grab listings via an MLSID.
If that 1111 is instead just a GUID of the previous IDX vendor, then you might be out of luck and would need to do everything manually.
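If the old IDs can be matched to new listings (for example through a shared MLS number), the redirects reduce to a lookup table. Here is a minimal Python sketch of that idea. The mapping data, the function names and the second example URL are made up for illustration; only the two URL shapes come from the question.

```python
import re

# Hypothetical data: old property ID -> new path, matched through a shared
# key such as an MLS number. In practice this would come from a listing export.
OLD_TO_NEW = {
    "1111": "/property/street-name/2222",
    "1112": "/property/other-street/2223",
}

# Old-style URL as described in the question: /property/view?=1111
OLD_URL = re.compile(r"^/property/view\?=(\d+)$")


def resolve(old_url, mapping=OLD_TO_NEW):
    """Return the new path for an old URL, or None if the ID is unknown."""
    match = OLD_URL.match(old_url)
    if match is None:
        return None
    return mapping.get(match.group(1))


def rewrite_map_lines(mapping):
    """Render the mapping as 'key value' lines, one per property."""
    return [f"{old} {new}" for old, new in sorted(mapping.items())]
```

The "key value" lines are the plain-text format that Apache's RewriteMap directive can read, so the same table could drive the 301s on the server. What to do with unknown IDs (a 404, or a redirect to a search page) is left to the caller.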

Related

Solution for listening to file once/download

I have energyshop.se, which is a small webshop I'm building for a customer. She sells various things, including meditations as .mp3 files and on discs. Customers can choose to buy:
a disc, which is then sent to their home address.
a single listen to one part of a meditation, or a single listen to all parts of a meditation.
some items are also available for download at a different price.
So the question is, how do I solve this? We use PayPal on her site to take payments, and I know that after a payment I can redirect users to a "thankyouforthepurchase" page if I want to. That leads me to think that one option is to take users to that page, where they can download or listen, but I don't know how to "connect" the purchased item with the displayed files to be downloaded and/or listened to once, or how to limit this. I mean, if the page is energyshop.se/thanks, someone who has made a purchase can just copy that address and go straight there.
There is also the idea of using codes in some way. If they make a purchase, they get a code sent to them for downloading or listening, but how do I generate this code? It has to be unique every time, and it has to stop working once someone has used it, so that nobody can save the code and reuse it.
Well, I'm kind of out of ideas and not sure how to do this. I just want to wrap this project up, but I think I have to solve this for her.
I guess the solution depends on how much control (or security) she feels is required. If all you need is a reasonable confidence that a user has paid, then you can be much more relaxed about the whole thing.
The 'thanks' page could easily provide the content based on the transaction ID that PayPal returns; you can use this to control what is shown. If you're not too worried about a 'listen-once' item being replayed, or a 'download' being downloaded a bunch of times, then you can avoid a lot of edge cases (where a download fails or a listen-once feed dies halfway, etc.) and simply serve links to content based on the transaction.
If it's abused, then you can put effort into locking down the content: serving mp3 streams from one-time links, tracking downloads in a database and so on. That will cost significantly more (in both time and server resources), so if you can, try simple first :)
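If it does come to one-time links, the core of it is a random token tied to the purchased item plus a counter of remaining uses. Below is a minimal Python sketch of that idea. The class and method names are hypothetical, and the in-memory dict stands in for the database table a real shop would use.

```python
import secrets


class DownloadTokens:
    """In-memory token store; a real shop would keep this in a database."""

    def __init__(self):
        self._tokens = {}  # token -> [item_id, uses_left]

    def issue(self, item_id, uses=1):
        """Create an unguessable token for a purchased item."""
        token = secrets.token_urlsafe(16)
        self._tokens[token] = [item_id, uses]
        return token

    def redeem(self, token):
        """Return the item id and consume one use, or None if invalid or spent."""
        entry = self._tokens.get(token)
        if entry is None or entry[1] <= 0:
            return None
        entry[1] -= 1
        return entry[0]
```

The token would be issued once the payment is confirmed and put into the link on the 'thanks' page or in an email. Giving a few uses instead of exactly one is a cheap way to soften the failed-download edge case mentioned above.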

Why randomize your file names for cloud storage/CDN?

When you look at a profile picture on a social networking site like Twitter, they store image files like:
http://a1.twimg.com/profile_images/1082228637/a-smile_twitter_100.jpg
or even with a date somewhere in the path like 20110912. The only immediate benefit I can think of is preventing a bot from going through and downloading all files in your storage in a linear fashion. Am I missing any other benefits? What is the best way to go about randomizing it?
I am using Amazon S3 so I will have one subdomain serving all my static content. My plan was to store an integer ID in my database and then just concat the URL with the id to form the location.
One reason I cryptographically scramble identifiers in public URLs is so that the business' rate of growth is not always public.
If the current ids can be deduced simply by creating a new user account or uploading an image, then an outside person can calculate the growth rate (or an upper limit) by doing this on a regular basis and seeing how many ids were used during the elapsed time.
Whether it's stagnating or whether it's exploding exponentially, I want to be able to control the release of this information instead of letting competitors or business analysts be able to deduce it for themselves.
Offline examples of this are invoice and check numbers. If you get billed by or paid by a company on a regular basis, then you can see how many invoices or checks they write in that time period.
Here's a CPAN (Perl) module I maintain that scrambles 32-bit ids using two way encryption based on SkipJack:
http://metacpan.org/pod/Crypt::Skip32
It's a direct translation of the Skip32 algorithm written in C by Greg Rose:
http://www.qualcomm.com.au/PublicationsDocs/skip32.c
Use of this approach maps each 32-bit id into an (effectively random) corresponding 32-bit number which can be reversed back into the original id. You don't have to save anything extra in your database.
I convert the scrambled id into 8 hex digits for displaying in URLs.
Once your ids approach 4.29 billion (32-bits) you'll need to plan for extending the URL structure to support more, but I like having shorter URLs for as long as possible.
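To illustrate the general idea of a reversible 32-bit scramble, here is a small Python sketch. It is not Skip32 and not a vetted cipher: it is a generic four-round Feistel network with HMAC-SHA256 as the round function, and the key, round count and function names are assumptions chosen for the example.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # assumption: any private byte string
ROUNDS = 4  # assumption: chosen for illustration only


def _round(half, round_no, key):
    """Keyed round function: 16-bit input -> 16-bit output."""
    msg = bytes([round_no]) + half.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def scramble(n, key=SECRET_KEY):
    """Map a 32-bit id to another 32-bit number (a bijection)."""
    if not 0 <= n < 2**32:
        raise ValueError("id must fit in 32 bits")
    left, right = n >> 16, n & 0xFFFF
    for i in range(ROUNDS):
        left, right = right, left ^ _round(right, i, key)
    return (left << 16) | right


def unscramble(n, key=SECRET_KEY):
    """Invert scramble() by running the rounds backwards."""
    if not 0 <= n < 2**32:
        raise ValueError("value must fit in 32 bits")
    left, right = n >> 16, n & 0xFFFF
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(left, i, key), left
    return (left << 16) | right


def to_url_id(n):
    """Scrambled id as 8 hex digits for use in a URL."""
    return format(scramble(n), "08x")


def from_url_id(text):
    """Recover the original id from the 8 hex digits."""
    return unscramble(int(text, 16))
```

As with the module described above, nothing extra is stored: the database keeps the plain integer and the URL carries the scrambled form.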
Changing URLs is a safe way to invalidate outdated assets.
It is also a necessity if you want to allow users to store private images. Using a path that can be deduced from the user's account name/id/path would render privacy settings useless as soon as you store assets on a CDN.
Mainly, it prevents name collisions. More than one person might upload "IMG_0001.JPG", for example. You also avoid limits on the number of files in one directory, and you can shard images across multiple servers - there's no way a huge site like Twitter or Facebook could store all photos on one server, no matter how large.
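A short Python sketch of how a collision-free, shardable key might be built. The function name and the two-level prefix layout are assumptions for illustration, not how any particular site does it.

```python
import uuid


def storage_key(original_filename):
    """Build a random storage key that keeps only the file extension."""
    if "." in original_filename:
        ext = original_filename.rsplit(".", 1)[-1].lower()
    else:
        ext = "bin"  # assumption: placeholder extension when none is given
    name = uuid.uuid4().hex
    # The first hex characters act as a prefix that spreads files across
    # directories (or servers) instead of piling them into one place.
    return f"{name[:2]}/{name[2:4]}/{name}.{ext}"
```

Two uploads of "IMG_0001.JPG" get different keys, and the prefix can be used to pick a directory or shard.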

New site going on to an old domain

I have a client who over the years has managed to get their product to the top of Google for many different search terms. They're adamant that the new site shouldn't have a detrimental effect on their Google ranking.
The new site will replace the one on their current domain, as well as going up on 5 further domains.
Will any of this cost the client their current ranking on Google?
Google regularly re-ranks the sites it has indexed. If the site changes, the ranking very well could too, for example if more or fewer people link to it or if the terms on the site (the content) are different.
The effect might be good or bad, but uploading different content isn't going to make their rank go away overnight or anything like that.
PageRank is mostly about incoming links, so if the incoming links aren't broken, PageRank will not be affected much.
Overall ranking is not just PageRank, though, so further discussion is needed.
If they retain the current link structure they should be fine.

How to decide whether to split up a VB.Net application and, if so, how to split it up?

I have 2 1/2 years experience of VB.Net, mostly self taught, so please bear with me if I seem rather noobish still and do not know some of the basics. I would recommend you grab a cup of tea before starting on this, as it appears to have got quite long...
I currently have a rather large application (VB.Net website) of over 15000 lines of code at the last count. It does not do retail or anything particularly complex like that - it is literally just a wholesale viewing website with admin frontend, catalogue / catalogue management system and pageview system.
I don't really know much about how .Net applications work in the background - whether they are all loaded on the same thread or if each has its own thread... I just know how to code them, or at least like to think I do... :-)
Basically my application is set up as follows:
There are two different areas - the customer area and the administration frontend.
The main part of the customer frontend is the Catalogue. The MasterPage will load a list of products but that's all, and this is common to all the customer frontend pages.
I tend to work on only one or several parts of the application at a time before uploading the changes. So, for example, I may alter the hierarchy of the Catalogue and change the Catalogue page to match the hierarchy change whilst leaving everything else alone.
The pageview database is getting really quite large and so it is getting rather slow when the application is first requested due to the way it works.
The application timeout is set to 5 minutes. I don't know how to change it; I have even tried asking this question on here, and I seem to remember the solution was quite complex and I was recommended not to change it. If a customer requests the application more than 5 minutes after the last page view, it will reload the application from scratch, which means there is a very slow page load whenever there have been more than 5 minutes of inactivity.
I am not sure if this needs consideration to determine how best to split the application up, if at all, but each part of the catalogue system is set up as follows:
A Manager class at the top level, which is used by the admin frontend to add, edit and remove items of the specified type and the customer frontend to retrieve a list of items of the specified type. For example the "RangeManager" will contain a list of product "Ranges" and will be used to interact with these from the customer frontend.
An Item class, for example Range, which contains a list of Attributes. For example Name, Description, Visible, Created, CreatedBy and so on. The form for adding / editing loops through these to display relevant controls for the administrator. For example a Checkbox for BooleanAttribute.
An Attribute class, which can be of type StringAttribute, BooleanAttribute, IntegerAttribute and so on. There are also custom Attributes (not just datatypes) such as RangeAttribute, UserAttribute and so on. These are given a data field which is used to get a piece of data specific to the item it is contained in when it is first requested. Basically the Item is given a DataRow which is stored and accessed by Attributes only when they are first requested.
When one item is requested from a specific manager, the manager will loop through all the items in the database and create a new instance of the item class for each. For example, when a Range is requested from the RangeManager, the RangeManager will loop through all of the DataRows in the Ranges table and create a new instance of Range for each one. As stated above, it simply creates a new instance with the DataRow, rather than loading all the data into it there and then. The Attributes themselves fetch the relevant data from the DataRow as and when they're first requested.
It just seems a tad stupid, in my mind, to recompile and upload the entire application every time I fix a minor bug or a spelling mistake for a word which is in the code behind (for example if I set the text of a Label dynamically). A fix / change to the Catalogue page, the way it is now, may mean a customer trying to view the Contact page, which is in no way related to the Catalogue page apart from by having the same MasterPage, cannot do so because the DLL is being uploaded.
Basically my question is, given my current situation, how would people suggest I change the architecture of the application by way of splitting it into multiple applications? I mean would it be just customer / admin, or customer / admin and pageviews, or some other way? Or not at all? Are there any other alternatives which I have not mentioned here? Could web services come in handy here? Like split the catalogue itself into a different application and just have the masterpage for all the other pages use a web service to get the names of the products to list on the left hand side? Am I just way WAY over-complicating things? Judging by the length of this question I probably am, and it wouldn't be the first time... I have tried to keep it short, but I always fail... :-)
Many thanks in advance, and sorry if I have just totally confused you!
Regards,
Richard
15000 LOC is not really all that big.
It sounds like you are not pre-compiling your site for publishing. You may want to read this: http://msdn.microsoft.com/en-us/library/1y1404zt(v=vs.80).aspx
Recompiling and uploading the application is the best way to do it. If all you are changing is your markup, that can be uploaded individually (e.g. changing some html layout in an aspx page).
I don't know what you mean here by application timeout, but if your app domain recycles every 5 minutes, then that doesn't seem right at all. You should look into this.
Also, if you find yourself working on various different parts of the site (i.e. many different changes), but need to deploy only some items in isolation, then you should look into how you are using your source control tools (you are using one, aren't you?). Look into something like Git and branching/merging.
Start by reading:
Application Architecture Guide

Good URL strategy for sitemap and SEO

I run a site where users have their own profile pages. They are also able to post products for sale (that they have made) and write/import blog posts. I am going to be implementing a sitemap and I need to make a final decision with the URL strategy.
Here's what I currently have for products (where 1234 is the product ID that I use to lookup that product):
N.B "product" is a fixed string (although it's another word in the actual site) - all others are dynamic depending on the item.
example.com/product/1234.product-category.product-name
Should I change to any of these, i.e.:
example.com/maker/users_name/product-category/product-name/1234
example.com/product/product-category/product-name/1234
example.com/product/1234/product-category/product-name
The main items for consideration are:
Where should the product ID go in the URL? Both in terms of readability by the user but also in SEO terms
Should I include the user's name (as he/she made that product) ?
Should I attempt to remove the ID altogether?
I think the first example (example.com/product/1234.product-category.product-name) is the best format, but I would consider changing the "." to "-". I am just thinking that if a product name somehow ends in something that triggers a different handler on your server, like ".php" or ".jsp", you might get some undesired effects.
Where should the product ID go in the URL? Both in terms of readability by the user but also in SEO terms
I don't really think it matters too much where the product ID goes but as far as the user reading it, I think they pay attention to the end of the line so I would put the ID first leaving the most descriptive part (the product name) at the end.
Should I include the user's name (as he/she made that product) ?
Not sure if you allow your users to change user names, but if you do, I would leave the user name out. An example would be someone getting married and changing their last name. This would hurt your SEO, since the URL would change but search engines would have already indexed it the old way. You'd have to put some permanent redirects in place to handle this, which could be avoided by just leaving the username out.
Should I attempt to remove the ID altogether?
You should leave the ID in the URL in the event that two products have the same name and your algorithm to generate the URL creates a duplicate link.
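A minimal Python sketch of the ID-first format with "-" as the separator. The helper names are hypothetical, and the route is assumed to look the product up by ID only, treating the rest of the path as decoration.

```python
import re


def slugify(text):
    """Lowercase and replace every run of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def product_url(product_id, category, name):
    """Build /product/<id>-<category>-<name>, with no dots in the path."""
    return f"/product/{product_id}-{slugify(category)}-{slugify(name)}"


# The ID is the leading run of digits; whatever follows is ignored on lookup.
PRODUCT_URL = re.compile(r"^/product/(\d+)(?:-[a-z0-9-]*)?$")


def product_id_from_url(path):
    """Return the product ID from a URL path, or None if it does not match."""
    match = PRODUCT_URL.match(path)
    return int(match.group(1)) if match else None
```

Because only the ID is used for the lookup, two products with the same name still get distinct URLs, and a name ending in ".php" simply becomes "-php" in the slug.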
I prefer this:
example.com/users_name/product-category/product-name/1234
However, one should be aware that the URL sometimes gets too long, which makes it difficult to share or promote in a blog or a forum. Why not simply use
example.com/1234 and use the title to carry the other details, like category and product name?
Nowadays, I think search engines are getting smarter, and short URLs are used more and more.