How to crawl an ASP.NET WebForms link using Scrapy

I want to scrape a WebForms site, but the links aren't regular hrefs. They look like the one below, and I want Scrapy to pick up that link and follow it:
<a id="ctl00_ContentPlaceHolder1_DtGrdAttraf_ctl06_LnkBtnDisplayHadith" title="some title" class="Txt" onmouseover="changeStyle(this, 'lnk')" onmouseout="changeStyle(this, 'Txt TxtSmall')" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("ctl00$ContentPlaceHolder1$DtGrdAttraf$ctl06$LnkBtnDisplayHadith", "", false, "", "http://www.sonnaonline.com/DisplayResults.aspx?Menu=1&ParentID=13&Flag=dbID&Selid=8483", false, true))">the link text</a>

ASP.NET WebForms is a form-driven framework, so you have to fill in the form and post it yourself to reach the page the link leads to.
How do you do that?
First, have a look at my Scrapy code here:
https://github.com/Timezone-design/python-scrapy-asp-net/blob/master/scrapy_spider/spiders/burzarada_spider.py
You should first find out what WebForm_DoPostBackWithOptions() does on the page. Open the page source (Ctrl+U) and search for the function.
You will soon see what it does and which form fields it fills with these values: "ctl00$ContentPlaceHolder1$DtGrdAttraf$ctl06$LnkBtnDisplayHadith", "", false, "", "http://www.sonnaonline.com/DisplayResults.aspx?Menu=1&ParentID=13&Flag=dbID&Selid=8483", false, true.
Then the rest is clear.
Extract the href of the a tag into a string with
href = response.css('... a::attr(href)').extract()[0] # first match, assuming there are many <a>s there
Then split the argument string "ctl00$ContentPlaceHolder1$DtGrdAttraf$ctl06$LnkBtnDisplayHadith", "", false, "", "http://www.sonnaonline.com/DisplayResults.aspx?Menu=1&ParentID=13&Flag=dbID&Selid=8483", false, true by commas, fill the values into the proper input elements, and post the form with scrapy.FormRequest.
yield scrapy.FormRequest(
    'https://burzarada.hzz.hr/Posloprimac_RadnaMjesta.aspx',
    formdata={
        '__EVENTTARGET': eventTarget,
        '__EVENTARGUMENT': eventArgument,
        '__LASTFOCUS': lastFocus,
        '__VIEWSTATE': viewState,
        '__VIEWSTATEGENERATOR': viewStateGenerator,
        '__VIEWSTATEENCRYPTED': viewStateEncrypted,
        'ctl00$MainContent$ddlPageSize': pageSize,
        'ctl00$MainContent$ddlSort': sort,
    },
    callback=self.parse_multiple_pages,
)
Explanation:
https://burzarada.hzz.hr/Posloprimac_RadnaMjesta.aspx # URL the form is posted to.
formdata # form data as a dict. Keys are the input names.
callback # function that receives the response and carries on from there.
Voilà! You reach the page, and the response is passed as an argument to the function you gave as callback.
You can see some examples in the link above.
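The splitting step described above can be sketched as follows. This is a minimal sketch, not the code from the linked repository: the helper names parse_postback_args and build_formdata are made up for illustration, and it assumes the usual WebForms argument order, where the first argument is the event target, the second the event argument, and the fifth the URL the form is posted to. It uses a regular expression instead of a plain split on commas so that a comma inside a quoted URL does not break the result.

```python
import re

# Example href copied from the question.
HREF = (
    'javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions('
    '"ctl00$ContentPlaceHolder1$DtGrdAttraf$ctl06$LnkBtnDisplayHadith", "", false, "", '
    '"http://www.sonnaonline.com/DisplayResults.aspx?Menu=1&ParentID=13&Flag=dbID&Selid=8483", '
    'false, true))'
)

# Matches either a double-quoted string or a bare true/false literal.
ARG_RE = re.compile(r'"([^"]*)"|\b(true|false)\b')


def parse_postback_args(href):
    """Return the arguments of WebForm_PostBackOptions(...) as a list."""
    marker = "WebForm_PostBackOptions("
    start = href.index(marker) + len(marker)
    inner = href[start:href.rindex("))")]
    args = []
    for quoted, literal in ARG_RE.findall(inner):
        # Bare literals become Python booleans, quoted values stay strings.
        args.append(literal == "true" if literal else quoted)
    return args


def build_formdata(args, hidden_fields):
    """Merge the page's hidden inputs with the postback target (assumed order)."""
    formdata = dict(hidden_fields)
    formdata["__EVENTTARGET"] = args[0]
    formdata["__EVENTARGUMENT"] = args[1]
    return formdata


args = parse_postback_args(HREF)
action_url = args[4]  # assumed to be the URL the form posts to
# The __VIEWSTATE value is a placeholder; read the real one from the page.
formdata = build_formdata(args, {"__VIEWSTATE": "<value read from the page>"})
```

In a spider, action_url and formdata would then be passed to scrapy.FormRequest(action_url, formdata=formdata, callback=...). The hidden field values such as __VIEWSTATE have to be read from the response first; Scrapy's FormRequest.from_response can pre-fill them from the page's form.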

Related

How to have PhantomJSCloud wait for embedded JavaScript in content

I am passing an HTML string to PhantomJSCloud in the "content" of the page. The HTML has JS embedded in it inside script tags at the end of the string. When I get back the JPEG that I request from PJSC, the objects that the JS manipulates have not been manipulated. I know the JS works, because I can copy and paste the whole HTML string into a file, open it in Chrome, and watch it happen.
It uses Chart.js, which has an animation option, but I have set it to false.
Currently my request JSON looks like this:
{
    "pages": [
        {
            "content": "$$$CONTENT$$$",
            "renderSettings": {
                "quality": 100,
                "selector": "[id='report']"
            },
            "requestSettings": {
                "ioWait": 5000,
                "waitInterval": 5000
            }
        }
    ]
}
I replace "$$$CONTENT$$$" with my actual HTML string. The whole request takes less than 5 seconds, so "waitInterval" doesn't seem to be what I'm looking for.
This turned out not to be a question of whether it was waiting on the JavaScript. My JavaScript was manipulating text, and some of that text had a \n inside it. It apparently needed a \\\n instead.
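One way to avoid escaping by hand is to let a JSON serializer build the request body. The sketch below assumes the request is assembled in Python, which the post does not say, and uses a made-up HTML string; the settings are copied from the request above. It only shows that a JavaScript \n escape inside the HTML survives a JSON round trip unchanged when the serializer does the escaping.

```python
import json

# Hypothetical report HTML; the embedded script contains a JavaScript "\n" escape
# (a backslash followed by n, not a real line break).
html = '<div id="report"></div><script>var label = "line one\\nline two";</script>'

request = {
    "pages": [
        {
            "content": html,
            "renderSettings": {"quality": 100, "selector": "[id='report']"},
            "requestSettings": {"ioWait": 5000, "waitInterval": 5000},
        }
    ]
}

# json.dumps escapes backslashes and double quotes, so the script text
# is the same after the receiving side parses the JSON.
body = json.dumps(request)
round_tripped = json.loads(body)["pages"][0]["content"]
```

Pasting the HTML into a JSON template as plain text skips this escaping, which is consistent with the fix described above.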

Generate Url from virtual field or from value of another field

I would like to generate a Url in a list in keystoneJS. I prefer that the url not be stored in mongo.
Attempted:
Virtual field: works, but will not generate raw HTML for the href.
Types.Url: I get the href format, but I need a value from another field in my model, so it generates the url with undefined. Example:
{ type: Types.Url, label: "Link", default: "http://www.stackoverflow.com/ask?id=" + this._id }
Any help on how to pull this off would be much appreciated.
For your second point, this._id is not available when adding fields to the model, hence why you're getting undefined.
Instead, try using a pre-save hook on your model:
yourModel.pre('save', function(next) {
    this.link = "http://www.stackoverflow.com/ask?id=" + this._id;
    next();
});
I'm not quite sure if you're trying to generate a link this way every time, or if the user should also be able to add their own link. If the latter, you'll need to check in the pre-save hook whether the link has already been filled in.
I hope that helps and sorry it took so long to get an answer on this!

get request url after AJAX request

I have a search page with link Search?params but any subsequent search requests are made via Ajax forms using Asp.Net. It makes a request to an action with a different name like InstantSearch?params but in the browser I see Search?params.
From this page I have a link to another page and I need to save the Url to return back to this page.
But if I had an AJAX request, Request.Url returns InstantSearch?params, not the link from browser address bar. And the action from this link returns only a Partial View, so when it returns to the previous URL the page is messed up.
How do I get the link of the previous page, from the browser address bar in Asp.Net, not the actual last requested URL?
While searching we are loading masonry containers like this:
$("#main-content-container").load("/Kit/InstantSearch?" + parameters, function() {
$('#mason-container').imagesLoaded(function() {
$('#mason-container').masonry({
itemSelector: '.kit-thumb-container',
columnWidth: 210,
isFitWidth: true,
gutter: 10
});
});
});
Then I'm calling foundation Joyride on same page and need to pass current page URL to return back. Joyride calls onload of the page under this link:
@Html.ActionLink("Go to kit details help", "OrderPageHelp", "Kit", new { returnUrl = Request.Url }, new { @style = "font-size:16px;" })
The return URL I need is Kit/Search?params, but Request.Url returns the last request, made when loading masonry with Kit/InstantSearch?params.
How can I pass the needed Url without hard-coding it?
So this one's a bit old, but I found myself in a similar situation recently and found a quick workaround. Posting it in case anyone's interested.
You can solve this problem by taking advantage of the TempData class.
TempData can be used to store data between requests. The information remains as long as the session is active, until you retrieve the data again.
So when the user first loads the page, before the AJAX method is triggered, store the URL in a variable on the page and in the TempData("YourVariableName") object, and create the action link with the saved URL. When the AJAX request fires, it overwrites the value in Request.Url. So check for a value in TempData("YourVariableName"); if it is there, use that value and store it in TempData("YourVariableName") again. This keeps the original page URL even after many AJAX requests have been triggered. Code in Visual Basic:
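@Code
    Dim LastURL As String = ""
    If Not TempData("LastURL") Is Nothing Then
        ' Reuse the stored URL and store it again so it survives the next request.
        LastURL = TempData("LastURL")
        TempData("LastURL") = LastURL
    Else
        ' First load: remember the URL of the full page.
        LastURL = Request.Url.AbsoluteUri
        TempData("LastURL") = LastURL
    End If
End Code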
And pass the value stored in the LastURL variable as a parameter to your action link.

Is it possible to pass data via the post method to magnific popup when ajax loading content?

I'm using magnific popup and ajax loading content into it and passing values to the ajax content by appending a query string to the url, which works fine except in IE7 (and probably IE8 as well). The reason is very likely the length of the query string, because it works when I shorten it.
So my question is: is it possible to pass it via some sort of data setting and make it use POST instead of GET? Or does it already use POST and I just need to use the right method?
This is what I have:
$.magnificPopup.open({
    tLoading: "",
    modal: false,
    type: 'ajax',
    alignTop: true,
    items: { src: urlContainingVeryLongQueryString },
    callbacks: {
        ajaxContentAdded: function() {
            ...
My test url is 906 characters long in total (well within IE7's 2000ish limit).
The ajax.settings option (http://dimsemenov.com/plugins/magnific-popup/documentation.html#ajax_type) is passed to the jQuery.ajax method (http://api.jquery.com/jQuery.ajax/#jQuery-ajax-settings), e.g.:
$.magnificPopup.open({
    tLoading: "",
    modal: false,
    type: 'ajax',
    alignTop: true,
    items: { src: 'http://example.com/ajax' },
    ajax: {
        settings: {
            type: 'POST',
            data: {
                foo: 'bar'
            }
        }
    }
});

set jqGrid page before url is called

I am looking for a way to set the page of a jqGrid to x...
My use case is someone is using my grid...
They click on a patient to edit that patient (I am not using jqGrid's modal edit screen... too many modal windows already)...
When they save what they did to that patient, I want to redirect the browser back to the screen where they clicked on that patient, and back to the SAME PAGE...
The thing to keep in mind.
I am using ASP.NET MVC4. I call the first page via an action method. The url variable of my grid is another action in the same controller. That action is what I send my page and row variables down to. I am sure that this can be done; however, I have no idea how to achieve it. So far I have tried to set the page variable and rows variable in my document.ready before I call the jqGrid...
tbl.jqGrid({
    loadBeforeSend: function () {
        page: pageFromTemp;
        rows: rowFromTemp
    }
});
Basically, I have tried different ways to do it. The above is just one of them.
I have tried to reload the grid in the document.ready. But that doesn't make any sense. Why reload the grid when you haven't given it any of the parameters it needs...
I have tried to set the variable in the beforeRequest event. I have a function that I try and set it in...
beforeRequest: function () {
    if ((rowFromTemp != "") && (pageFromTemp != "")) {
        $(this).trigger('reloadGrid', [{ page: pageFromTemp, rowNum: rowFromTemp, url: '/Encounters/GetAjaxPagedGridData/' }]);
        //$.extend($(this).setGridParam({ page: pageFromTemp })),
        //$.extend($(this).setGridParam({ rowNum: rowFromTemp })),
        //$.extend($(this).setGridParam({ url: '/Encounters/GetAjaxPagedGridData/' }))
        //$.trigger('reloadGrid', [{ page: pageFromTemp, rowNum: rowFromTemp, url: '/Encounters/GetAjaxPagedGridData/'}]);
    }
},
But that doesn't work either. I am obviously missing something. What am I doing wrong...
Got it to change to the right page using loadComplete and $("frTable").trigger({})
But now I am getting a flashing Loading screen which indicates to me that it is still loading the data...
If I set a breakpoint in my code, I can confirm that it is loading the data. I am not doing something right here.
Load the grid in document ready with its datatype set to local, its url unassigned, and the grid hidden. When you want it to load, set the parameters, trigger the load, and then show it to the user.