Is there a way to check a URL before loading it? I have a URL that changes dynamically, and after multiple changes it sometimes returns null and sometimes returns a JSON error.
Is this possible?
Thanks
I'm scraping http://www.germandeli.com/Meats/Sausages
I would like to extract the link for every product (or item) on the page. I use scrapy shell to test, but it keeps returning an empty list [ ].
Here is the code I use:
response.xpath('*//h2[@class="item-cell-name"]/a/@href')
Any help would be greatly appreciated.
Unfortunately, the item content is rendered through JS. Luckily, the page sends an AJAX request to fetch a JSON list of the items, which makes parsing much easier. You can check the XHR tab in the Google Chrome developer tools to see the request and imitate it with the required headers.
This URL returns the list of products. You can change the limit and offset parameters in the URL to fetch the next set of data. To parse the JSON content, you can use json.loads from the standard library.
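The approach above could be sketched roughly as follows. This is only an illustration: the endpoint in API_URL and the "items" and "url" keys are assumptions, since the real URL and JSON layout have to be read from the XHR tab.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; the real one is whatever appears in the XHR tab.
API_URL = "https://example.com/api/items"


def page_url(offset, limit=24):
    """Build the URL for one page of results from limit and offset."""
    return f"{API_URL}?{urlencode({'limit': limit, 'offset': offset})}"


def extract_links(body):
    """Parse a JSON response body and return the product links."""
    data = json.loads(body)
    # The 'items' and 'url' keys are assumed; inspect the real response.
    return [item["url"] for item in data.get("items", [])]


# A made-up response body, standing in for response.text in a spider.
sample = '{"items": [{"url": "/a"}, {"url": "/b"}]}'
print(page_url(24))
print(extract_links(sample))
```

In a spider, the callback would pass the response body to extract_links and request page_url with the next offset until an empty list comes back.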
Is this possible? I have an action triggered as JS, which I respond to appropriately, but sometimes the request fails, at which point I'd like to change the respond_to format to HTML so that I can redirect to a thank-you page. So far I've tried:
redirect_to(thank_you_path, format: 'html') and return if @next_question.nil?
but this still tries to redirect as a JS response, which of course fails. Is there any way I can convert a JS request to an HTML response?
That wouldn't work because the browser would try to evaluate the HTML as JS. You would need to return JS that redirects the page.
I am using Scrapy to scrape content from a website, and I need help scraping a response that is loaded dynamically via AJAX.
When the content is loaded via AJAX, the URL does not change; it stays the same while the content changes. I need to crawl the content on that event.
thank you,
G.kavirajan
yield FormRequest('http://addons.prestashop.com/en/modules/featureproduct/ajax-homefeatured.php',
                  formdata={'type': 'new', 'ajax': '1'},
                  callback=self.your_callback_method)
Below are the URLs, which you can easily catch using Fiddler or Firebug.
This is for the featured tab: http://addons.prestashop.com/en/modules/featureproduct/ajax-homefeatured.php?ajax=1&type=random
This is for the new tab: http://addons.prestashop.com/en/modules/featureproduct/ajax-homefeatured.php?ajax=1&type=new
You can request these URLs directly to get the results you need. The website uses a POST request to get the data for these URLs, but when I tried a GET request with the same parameters, it also worked properly.
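As a small sketch of the GET variant, the two tab URLs above can be built from their parameters. The tab_url helper is a hypothetical name introduced here for illustration.

```python
from urllib.parse import urlencode

BASE = "http://addons.prestashop.com/en/modules/featureproduct/ajax-homefeatured.php"


def tab_url(tab_type):
    """Return the GET URL for one tab ('random' = featured, 'new' = new)."""
    return f"{BASE}?{urlencode({'ajax': 1, 'type': tab_type})}"


print(tab_url("random"))
print(tab_url("new"))
```

Inside a spider, each of these could then be passed to scrapy.Request together with a callback, in place of the FormRequest shown earlier.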
On my error page that I redirect to for any 404s, I'd like to record the url that the user tried to get to.
I've tried this but it doesn't work:
ErrorDocument 404 /error/?url=%{HTTP_REFERRER}
Can anyone tell me how to do it?
Try it with %{REQUEST_URI}. I'm not certain this will work in ErrorDocument since I've never tested it, but it's worth trying.
ErrorDocument 404 /error/?url=%{REQUEST_URI}
There isn't a direct way, nor a perfect one, but there are a few workarounds with PHP.
For example, I currently use a function to create the links on each page, so I would just need to add file_exists() to that main function (a few lines in a single function).
This is the function I would use to create URLs:
function url ($Value)
{
    // Do some stuff with the url
    // [Not shown]
    if (!file_exists("/internal/path/".$Value))
    {
        // Call a function to store the error in a database
        error ("404", $Value);
        // One way of handling it: replace '/' with an encoded space and search for that string.
        // Example: "path/to/page" would become "path to page".
        $Value=str_replace("/","%20",$Value);
        return "http://www.example.com/search=".$Value;
    }
    else
    {
        // If the page exists, create the normal link.
        // $FullUrl is assumed to be built in the part not shown above.
        return $FullUrl;
    }
}
This is my regular way of creating a URL (the function returns the link, so it has to be echoed):
<?php echo url('path/to/page'); ?>
I just thought about this method. It's great as it allows you to find missing pages even IF the user doesn't click on the links. Thank you for making me think about it and now I'll use it in my page (:
Another, 'simpler' method (in case you do not wrap links) is to store the last couple of pages visited in $_SESSION['lastpage'] and $_SESSION['lastlastpage']. If a 404 is found, store the corresponding page from which the user tried to access the broken page. It's not a perfect solution, since you have to manually find the broken link in the previous page, but at least it gives you some idea of where it is.
Disadvantage: As you can see, both solutions ONLY work with internal broken links.
It would seem there isn't a way.
I have a use case where I am setting the page focus to a particular element (having an anchor before it). When a user is not signed in, there is a redirect to the login page and after signing in, the user is redirected to the page in question, with the URL encoded.
I see that a URL of the form link#target works as expected (focusing on the element) while the url encoded link link%23target doesn't. Is this expected behavior?
Edit: If this is the expected behavior, is there a work around to focus on the target? As in, a way around url encode?
Edit adding more info:
Assume the following code:
page1.html
... html before the anchor ...
<a name="test">Some code</a>
... html after the anchor ...
I am accessing the page as page1.html%23test. This doesn't work the same way as page1.html#test. Is there a jQuery method to implement this? Would location.hash contain test even after it has been url encoded? I have no control on changing the url encoding.
Edit:
As I knew which named anchor I wanted to go to after page is redirected, I did a
window.location.hash = namedAnchor
to solve the issue. This JS line is output only if a customer is successfully signed in. Solved my issue, though not the generic answer I was looking for. I was looking for a way to avoid escaping of # in url encode.
Yes. Encoding the # as %23 effectively says "I just mean a plain old '#' character, not a URL fragment". The same is true of other reserved characters: escaping them stops them from having special meaning in the URL.
In your case you do want to encode the URL when passing it to your login page as a parameter, but your login page should decode the URL before performing the redirect.
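A minimal sketch of that encode-then-decode round trip, using Python's standard library purely for illustration (the function names are hypothetical, and the login page in the question may be written in any language):

```python
from urllib.parse import quote, unquote, urlsplit


def encode_return_to(target):
    """Encode the target so it can travel as a query parameter value."""
    return quote(target, safe="")


def redirect_target(encoded):
    """Decode the parameter value before using it as the redirect URL."""
    return unquote(encoded)


encoded = encode_return_to("page1.html#test")
decoded = redirect_target(encoded)

# While encoded, '%23' is plain data, so no fragment is recognised.
print(encoded, repr(urlsplit(encoded).fragment))
# After decoding, '#test' is a real fragment again.
print(decoded, repr(urlsplit(decoded).fragment))
```

Redirecting to the decoded value is what lets the browser treat test as the anchor again.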
You can handle this string either with PHP or another scripting language, or with JavaScript using encodeURIComponent (and decodeURIComponent to reverse it). I wrote an article about that, which you can check at http://www.stoimen.com/blog/2009/05/25/javascript-encode-cyrillic-symbols-with-encodeuricomponent/
Hope that helps. Whichever method you use, check the result rather than relying on the default behavior.