I have a website that is being updated with a new domain URL, and I want a fast way of searching the whole website for the old URL so I can find all of the references. Is there anything, such as automation, that can help me do this?
If the old URL does not exist anymore, you could use a tool that checks for dead links, for example deadlinkchecker or brokenlinkcheck.
Another way is to write a script to crawl the site.
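If you have file-system access to the site source, a plain text search for the old domain is usually the quickest option. If you only have the live site, a few lines of Node.js can crawl it for you. The sketch below assumes Node 18+ (for the built-in `fetch`), follows only same-origin `href` links, and simply reports every page whose HTML still contains the old URL string; `START_URL` and `OLD_URL` are placeholders you would replace:

```javascript
// Minimal sketch: crawl a site and report every page that still references the old URL.
// Assumes Node 18+ (built-in fetch). START_URL and OLD_URL are placeholders.
const START_URL = 'https://www.example.com/';
const OLD_URL = 'old-domain.com';

const seen = new Set();

async function crawl(url) {
  if (seen.has(url)) return;
  seen.add(url);

  let html;
  try {
    html = await (await fetch(url)).text();
  } catch {
    return; // skip pages that fail to load
  }

  if (html.includes(OLD_URL)) {
    console.log('Old URL referenced on:', url);
  }

  // Naive href extraction; good enough for a quick audit, not a real HTML parser.
  for (const match of html.matchAll(/href="([^"]+)"/g)) {
    let next;
    try {
      next = new URL(match[1], url); // resolve relative links
    } catch {
      continue; // ignore malformed hrefs
    }
    if (next.origin === new URL(START_URL).origin) {
      await crawl(next.origin + next.pathname + next.search); // drop #fragments
    }
  }
}

crawl(START_URL).then(() => console.log('Done,', seen.size, 'pages checked'));
```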
Related
I have a little issue concerning the sitemap of a website.
The website is : http://parmemarin.com.
If I go to http://www.xml-sitemaps.com/ and try to generate a sitemap for my website, I end up with only one link in the sitemap, which is _inc/conditions.php.
There is no index.php or any of my other links (index.php?page=...).
Can someone help me with this one?
Thanks
Well, that could be a problem with xml-sitemaps.com, it could be a problem with your site, or it could be a combination of both. I don't know that service, so I can't tell how it really works and what it does.
Skimming through your markup, I noticed that you link to some URLs without encoding the ampersand. & (if used to separate parameters) should be written as &amp; when you link it in HTML.
I wonder: why don't you build your sitemap yourself? You are the only one who knows for sure which pages exist on your site. Collect all your URLs and put them in a file, as http://www.sitemaps.org/ describes.
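For reference, the format described there is just a short XML file listing your URLs, something like this (the ?page=about entry is only a placeholder for whichever page values actually exist on your site):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://parmemarin.com/index.php</loc>
  </url>
  <url>
    <loc>http://parmemarin.com/index.php?page=about</loc>
  </url>
</urlset>
```

Note that if a `<loc>` URL contains &, it has to be written as &amp; inside the XML as well.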
Thanks for your answer.
I had totally forgotten to delete the meta noindex, nofollow tag... That clearly didn't help :)
I have a website which, when you first visit it, just displays the normal domain, so /. When visitors use the form, they get forwarded to, let's say, /question/DYNAMIC (question ID).
So Google has no way to see these links.
Is there a way to tell Google about all of these links without entering them manually and without having to keep the list up to date, since some questions might be removed at a later date?
Submit an XML sitemap
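Since the /question/ID pages come from your data, the usual approach is to generate the sitemap from the database rather than maintain it by hand, so it stays current as questions are added or removed. A minimal sketch in JavaScript, where the question IDs and the domain are placeholders for your own data and site:

```javascript
// Sketch: build sitemap XML from the current question IDs.
// The IDs passed in would come from your own database query; the domain is a placeholder.
function buildSitemap(questionIds) {
  const urls = questionIds
    .map(id => `  <url><loc>https://www.example.com/question/${id}</loc></url>`)
    .join('\n');
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
         '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
         urls + '\n</urlset>';
}

// Serve the result at something like /sitemap.xml and submit that URL in Webmaster Tools.
console.log(buildSitemap([101, 102, 257]));
```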
While building a live site on its actual live hosting platform, is there a way to tell Google not to index the website yet? I found the following:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
But would that tell them to never come back, or would they simply see the noindex tag and not list the results? Then, when they come back to crawl again later and my site is good to go, I would have removed the noindex and the site would start getting indexed?
Sounds like you want to use a robots.txt file instead:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449&topic=2370588&ctx=topic
Update your robots.txt file when you want your content to be indexed.
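For example, a robots.txt at the root of the domain that blocks all well-behaved crawlers while you build the site looks like this:

```
User-agent: *
Disallow: /
```

When the site is ready, change `Disallow: /` to an empty `Disallow:` (or remove the file) and crawlers will start indexing on their next visit.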
You can use the robots.txt method.
You can specify which subpages may be crawled, and Google checks the file again before indexing each time it comes back. So you can delete the file later in order to get fully indexed.
More Information
About /robots.txt
Robots.txt File Generator
You can always change it. The way Google and other robots find your page is when it is linked to from another page. As long as it isn't linked to from another page, it won't be found. Also, once your site is up, chances are that it will appear far back in the list of results.
I am looking for a way to use the FireShot API with JS: given a URL (or perhaps a list), take a screenshot with the FireShot API, upload it to Imgur, and then return the URLs to the user, or perhaps something like Markdown to use quickly in forums.
Method 1: Open new window
I tried opening the URL in a new window, but found that I can't control that page with JS due to cross-domain problems. The same goes for iframes.
Method 2: simple $.get()
A simple $.get() won't work because of the same cross-domain issues, I guess?
http://jsfiddle.net/t6aeq/
$.get($url.val(), function(data) {
    console.log(data);
});
Method 3: Via PHP "Proxy"
So I tried creating a simple PHP script that gets the HTML of the URL and returns it to my JS (using file_get_contents($url)). But some sites, like Microsoft, detect that I am using an automated method and serve an error page of sorts. I also can't seem to find a way to use jQuery to query the returned HTML for link[rel=stylesheet], script, style and body to append to the head and a div respectively. I posted about that in another question.
A new idea: Embed scripts at the browser level
So I thought a way of getting around these issues might be to use iMacros or Greasemonkey or something to insert scripts into pages at the browser level instead? Any guidance or tips on how I can do that? Also, I'd prefer a pure JS/PHP method if available, so users are not limited to using a browser plugin/script (though I will be the only user for now).
It suddenly came to my mind that this may not work because the FireShot API key and Imgur are limited to the domain? Any solutions?
You might be able to inject the FireShot script using Greasemonkey. But, first use GM_xmlhttpRequest() to fetch an API key, for that page's domain, from the "Create FireShot API Key" page.
Note that GM_xmlhttpRequest() does not have the same cross-domain issues that $.get() has.
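As a rough sketch of just the cross-domain fetch (the key-creation URL and the response handling below are placeholders, since the real "Create FireShot API Key" page and its response format aren't shown here):

```javascript
// ==UserScript==
// @name     FireShot key fetch (sketch)
// @include  http://example.com/*
// @grant    GM_xmlhttpRequest
// ==/UserScript==

// Placeholder URL: substitute the real "Create FireShot API Key" page.
GM_xmlhttpRequest({
  method: 'GET',
  url: 'http://example.com/create-fireshot-api-key?domain=' + location.hostname,
  onload: function (response) {
    // Pull the key out of response.responseText however that page returns it,
    // then inject and initialize the FireShot script with it.
    console.log('Got', response.responseText.length, 'bytes from the key page');
  }
});
```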
However, at this point you might be better off just writing your own Firefox add-on. Maybe start with FireShot's code for ideas. Also see the Screengrab add-on.
So I have a website that I recently made changes to, and one of the changes was removing a page from the site. I deleted the page; it doesn't exist anymore.
However, when you search for my site, one of the results is the page that I deleted. People are clicking on the page and getting an error.
How do I remove that page from the search results?
Here is the solution:
First, get your site into Google Webmaster Tools. Then go to Site configuration --> Crawler access --> Remove URL. Click on "New removal request" and add the page you want to remove, and make sure you have also added that page to the robots.txt of your site. Google will deindex the page within 24 hours.
You can simply wait for Google's robots to find out that it doesn't exist anymore.
A trick that used to work is to upload a sitemap to Google in which you add the URL of the deleted page, set it to top priority, and mark it as changing every day. That way the Google robots will prioritize that page and find out more quickly that it is not there anymore.
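If you try that trick, the relevant sitemap entry just uses the optional `<changefreq>` and `<priority>` elements, with the deleted page's address in `<loc>` (the URL here is a placeholder):

```xml
<url>
  <loc>http://www.example.com/deleted-page.html</loc>
  <changefreq>daily</changefreq>
  <priority>1.0</priority>
</url>
```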
There might be other ways but none that are known to me.
You can remove specific pages using the webmaster tools I believe.
Yahoo Web tools offer a similar service as I understand it.
This information was correct the last time I tried to do this a little while ago.
You should go to https://www.google.com/webmasters/tools/removals and remove the pages you want.