C# stumped with screen scraping issue on aspx page

I'm having trouble scraping the HTML returned by a postback on a site. It is an aspx page, and I am trying to get the generated HTML from it.
I have looked at the cookie, session, and form data being sent using Chrome developer tools, but I still cannot get the page to respond with the search results, despite mimicking almost all of it in my code.
There are 3 dropdowns on the page, 2 of which are pre-populated when you first visit the page. After you choose values for the first 2 (the page does a postback every time you select a value in either of them), it populates the 3rd dropdown. Once you select a value in the 3rd dropdown, you hit the search button and the results come back in a table below.
After hitting the search button and getting the results on screen, I went into developer tools, grabbed all of the values that looked relevant (especially the form values), and reproduced them in my code, but still no luck. I even captured the big viewstate exactly.
Here is one of the many code samples I've tried. Admittedly, I'm not very familiar with some of these classes, and I've been trying different code snippets.
I'm not sure whether my code is wrong or whether I'm just missing the form data or cookies needed to make the POST execute and return the correct data. My code currently returns HTML from the page in the responseInString variable, but it looks like the initial version of the page (as if you were visiting it for the first time): no dropdowns are selected and the 3rd is not populated with any values. So I don't know whether my code is actually hitting the code-behind and doing the form POST.
Any help would be greatly appreciated. Thank you!
// Requires: using System.Collections.Specialized; using System.Net; using System.Text;
using (var wb = new WebClient())
{
    var data = new NameValueCollection();
    // ASP.NET WebForms hidden field names start with two underscores.
    data["__EVENTTARGET"] = "";
    data["__EVENTARGUMENT"] = "";
    data["__LASTFOCUS"] = "";
    data["__VIEWSTATE"] = "(giant viewstate)";
    data["__VIEWSTATEGENERATOR"] = "D86C5D2F";
    // 3 more form input/select fields after this, with values corresponding to the dropdowns.

    wb.Headers.Add(HttpRequestHeader.Cookie,
        ".ASPXANONYMOUS=(long string);" +
        "ASP.NET_SessionId=(Redacted);" +
        " _gid=GA1.2.1071490528.1676265043;" +
        "LoginToken=(Redacted);" +
        "LoginUserID=(Redacted);" +
        "_ga=GA1.1.1195633641.1675746985;" +
        "_ga_38VTY8CNGZ=GS1.1.1676265043.7.1.1676265065.0.0.0");
    wb.Headers.Add("Sec-Fetch-Dest", "document");
    wb.Headers.Add("Sec-Fetch-Mode", "navigate");
    wb.Headers.Add("Sec-Fetch-Site", "same-origin");
    wb.Headers.Add("Sec-Fetch-User", "?1");
    wb.Headers.Add("Content-Type", "application/x-www-form-urlencoded");

    // POST the form and decode the returned page
    var response = wb.UploadValues("(the web page url)", "POST", data);
    string responseInString = Encoding.UTF8.GetString(response);
    return responseInString;
}
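One thing that may be worth checking: WebForms pages generally expect the hidden fields (__VIEWSTATE, and __EVENTVALIDATION when the page emits it) to come from the response immediately before the POST, so a viewstate copied once from the browser can easily be stale. Below is a minimal sketch, in Python with only the standard library, of collecting the hidden fields from a previous response and merging them with the dropdown values. The helper names, the sample HTML, and the field name ddlFirst are assumptions for illustration, not taken from the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, selections, event_target=""):
    """Merge the page's hidden fields with the chosen form values."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload.setdefault("__EVENTARGUMENT", "")
    # Name of the control that triggers the postback (empty for a submit button)
    payload["__EVENTTARGET"] = event_target
    payload.update(selections)
    return payload


# Made-up stand-in for the HTML returned by the previous GET or POST
SAMPLE_HTML = """
<form method="post" action="./Search.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <select name="ddlFirst"><option value="1">One</option></select>
</form>
"""

payload = build_postback(SAMPLE_HTML, {"ddlFirst": "1"}, event_target="ddlFirst")
body = urlencode(payload)  # form-encoded body for the next POST
print(sorted(payload))
```

Under this approach each step (first dropdown, second dropdown, search) would be its own round trip, with the hidden fields re-read from each response before the next POST.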

Related

TestCafe: Using the same Selector across multiple tests is returning old, incorrect data

I have several similar pages that all load up several header elements based on various inputs. They are auto-generated.
I am writing a TestCafe test to confirm that the correct headers have loaded in the correct order for each page. Some pages have more headers, some have fewer.
My tests all follow the same basic pattern:
import { Selector } from 'testcafe';

test.disablePageCaching('log in and check that columns load in correct order',
    async (tc: TestController) => {
        const myPage = new MyPage(tc);
        await tc.expect(myPage.getScreen().exists).ok(); // Confirm page load
        myPage.navigateToRelevantPage();
        const headers = Selector(headerClassName);
        const expectedHeaders = ['array', 'of', 'expected', 'values'];
        const count = await headers.count;
        for (let i = 0; i < count; i++) {
            // Await the text first, then lower-case the resulting string
            const text = (await headers.nth(i).innerText).toLowerCase();
            await tc.expect(expectedHeaders[i].toLowerCase()).eql(text);
        }
    });
(If you spot any small syntax errors, please rest assured that the problem isn't an errant parenthesis or a misspelled variable name.)
I have 4 of these tests in the same file, and I hop from one to the next. The thing is, I seem to be retaining old data when I hop from one test to the next.
Say my first test checked 10 header elements; my headers.count value is 10. If my second test only contains 3 header elements, I would expect my headers.count value to be 3. Instead, it is still 10. TestCafe seems to be writing the new data over the old while still retaining what was left from the previous test.
Is there an option of some sort to tell TestCafe to purge this old data between tests? I have tried the disablePageCaching option, but that is not working for me.

I eventually figured this out: I was collecting data too soon after navigating to a new page. I needed to call await tc.expect(myPage.getScreen().exists).ok() after navigating to the new page; that gave TestCafe enough time to recognize which data was new and which was old.

Vuetify Data Table jump to page with selected item

Using Vuetify Data Tables, I'm trying to figure out whether there's a way to determine which page the currently selected item is on and then jump to that page. My use case: I pull data out of the route to determine which item was selected in a Data Table, so that when a user follows that URL or refreshes the page, the same item is automatically selected for them. This works just fine; however, I can't figure out how to get the Data Table to display the page that contains the selection.
For example, user visits mysite.com/11
The Data Table shows 10 items per page.
When the user enters the site, item #11 is currently auto-selected, but it is on the 2nd page of items. How can I get this to show items 11-20 on page load?

I ended up using a solution similar to what @ExcessJudgement posted. Thank you for putting that CodePen together, BTW! I created this function:
jumpToSelection: function () {
    this.$nextTick(() => {
        let selected = this.selected[0];
        let page = Math.ceil((this.products.indexOf(selected) + 1) / this.pagination.rowsPerPage);
        this.pagination.sortBy = "id";
        this.$nextTick(() => {
            this.pagination.page = page;
        });
    });
}
I'm not sure why I needed to wrap this in a $nextTick(), but it would not work otherwise. If anybody has any insight into why this is the case, it would be useful to know.
The second $nextTick() was needed because setting sortBy and then page in the same tick left the page unchanged, and since I find the page based on the ID, I need to make sure the table is sorted properly before jumping pages. A bit convoluted, but it's working.
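The page arithmetic used in jumpToSelection can be checked in isolation. Here is a small sketch in Python (the function name page_of is hypothetical), assuming 1-based page numbers and a zero-based item index like the one indexOf returns:

```python
import math


def page_of(index, rows_per_page):
    """Return the 1-based page holding the item at zero-based `index`."""
    if index < 0:
        raise ValueError("item not found in the list")
    if rows_per_page <= 0:
        raise ValueError("rows_per_page must be positive")
    return math.ceil((index + 1) / rows_per_page)


# Item #11 sits at index 10; with 10 rows per page it lands on page 2.
print(page_of(10, 10))  # 2
# Item #10 (index 9) is still the last row of page 1.
print(page_of(9, 10))   # 1
```

Note that indexOf returns -1 when the item is not in the list, which makes the original formula yield page 0, so a guard like the one above may be worth adding in the component as well.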

Confirmation Draw has completed

I have created a custom button that lets users select which columns to output to CSV, so the button is not created as part of the table initialization. A modal pops up with checkboxes generated from the column headers. It is worth noting that I have regex search on each column header.
The issue is that I am using server-side processing, so the only rows exported are those currently visible. As a workaround, I read page.info().recordsDisplay, set the page length to that value, and redraw. The modal pops up saying it is loading data from the server; once the table is fully populated, the modal's HTML changes to the export checkboxes. After the export, the table reverts to its default length.
What I need is to capture the moment when the rows are fully rendered so I can do the HTML switch. Right now I am using a timeout. The data can take a while to populate, as there are some 13k rows if no search is applied. What is the best way to do this, and is there a more efficient way?
var tableHeaders = [];
var table = $('#example').DataTable();
// Collect the header text of every column
table.columns().every(function () {
    tableHeaders.push($(this.header()).text());
});
var pageLength = table.page.info().length; // Remember the default length to restore later
// Show every matching record on one page, then redraw
table.page.len(table.page.info().recordsDisplay).draw();

The issue may lie with the server-side processing parameters start and length, which define the portion of data requested from the server on each draw.
If your source data has a huge number of rows, chances are they are not sent all at once, which is the whole point of server-side processing. So you can try to manipulate those parameters.
Alternatively, you can make use of the draw event, which is fired on each redraw.
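To illustrate how start and length select the slice of rows returned for a draw, here is a minimal server-side sketch in Python. The function name handle_draw and the in-memory ROWS list are assumptions for illustration; a real endpoint would query a database and apply the search filter before slicing. The request and response keys (draw, start, length, recordsTotal, recordsFiltered, data) follow the DataTables server-side processing protocol, in which a length of -1 asks for all records.

```python
# Fake data set standing in for the real table (assumption for the sketch)
ROWS = [{"id": i, "name": f"item {i}"} for i in range(1, 26)]


def handle_draw(params, rows=ROWS):
    """Build a DataTables-style response for one draw request."""
    draw = int(params.get("draw", 0))
    start = max(int(params.get("start", 0)), 0)
    length = int(params.get("length", 10))
    # A negative length means "return everything from start onwards"
    page = rows[start:] if length < 0 else rows[start:start + length]
    return {
        "draw": draw,                 # echoed back so the client can match replies
        "recordsTotal": len(rows),    # rows before filtering
        "recordsFiltered": len(rows), # no search filter applied in this sketch
        "data": page,
    }


first_page = handle_draw({"draw": "1", "start": "0", "length": "10"})
everything = handle_draw({"draw": "2", "start": "0", "length": "-1"})
print(len(first_page["data"]), len(everything["data"]))  # 10 25
```

Seen this way, setting the page length to recordsDisplay makes the client request the whole filtered set in a single draw, which is why the table takes a while to populate with around 13k rows.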

PDFBox: Fill out a PDF by repeatedly adding a one-page template containing a form

Following the SO question "Java pdfBox: Fill out pdf form, append it to pddocument, and repeat", I had trouble appending a cloned page to a new PDF.
The code from that page seemed really interesting, but it didn't work for me.
In fact, the answer doesn't work because it is always the same PDField that you modify and add to the list. So the next time you call getField with the initial name, the field isn't found and you get an NPE. I tried with the same PDFBox version (1.8.12) used in the nice GitHub project, but I can't understand how its author gets this working.
I had the same issue today while trying to append a form to several pages with different values on each. I wondered whether the solution was to duplicate the field, but I couldn't manage to do it properly. I always ended up with a PDF containing the same values in every form.
(I provided a link to the template document for Mkl, but I have since removed it because the document doesn't belong to me.)
Edit: Following Mkl's advice, I figured out what I was missing, but performance is really bad when duplicating every page, and the file size isn't satisfactory. Maybe there's a way to optimize this by reusing similar parts of the PDF.

Finally I got it working without reloading the template each time. The resulting file is what I wanted: not too big (4 MB for 164 pages).
I think I made 2 mistakes before: one in page creation, and probably one in field duplication.
Here is the working code, in case someone gets stuck on the same problem.
Form creation:
    PDAcroForm finalForm = new PDAcroForm(finalDoc, new COSDictionary());
    finalForm.setDefaultResources(originForm.getDefaultResources());
Page creation:
    PDPage clonedPage = templateDocument.getPage(0);
    // Shallow copy of the page dictionary, so content and resources are shared with the template
    COSDictionary clonedDict = new COSDictionary(clonedPage.getCOSObject());
    // Drop the template's annotations; each page gets its own widgets below
    clonedDict.removeItem(COSName.ANNOTS);
    clonedPage = new PDPage(clonedDict);
    finalDoc.addPage(clonedPage);
Field duplication: (rename field to become unique and set value)
    PDTextField field = (PDTextField) originForm.getField(fieldName);
    PDPage page = finalDoc.getPages().get(nPage);
    PDTextField clonedField = new PDTextField(finalForm);
    List<PDAnnotationWidget> widgetList = new ArrayList<>();
    for (PDAnnotationWidget paw : field.getWidgets()) {
        // New widget with the same appearance string and position as the template's
        PDAnnotationWidget newWidget = new PDAnnotationWidget();
        newWidget.getCOSObject().setString(COSName.DA, paw.getCOSObject().getString(COSName.DA));
        newWidget.setRectangle(paw.getRectangle());
        widgetList.add(newWidget);
    }
    clonedField.setQ(field.getQ()); // To get text centered
    clonedField.setWidgets(widgetList);
    clonedField.setValue(value);
    clonedField.setPartialName(fieldName + cnt++);
    fields.add(clonedField);
    page.getAnnotations().addAll(clonedField.getWidgets());
And at the end of the process:
    finalDoc.getDocumentCatalog().setAcroForm(finalForm);
    finalForm.setFields(fields);
    finalForm.flatten();

web2py SQLFORM accepting too fast

I have a web2py SQLFORM that is generated and returned by an AJAX call, and the form is put in a DIV I defined. I want that SQLFORM to be an update form, not an insert form. The problem is that the form immediately runs its accept function once it is written to that DIV. This doesn't happen if the form is for inserting, only for updating. That initial accept fails, and hitting the submit button does not allow a second accept.
I don't know why the accept fails or why it happens immediately.
Here's the JavaScript function that makes the AJAX call:
function displayForm(currID) {
    // Remove everything from the DIV we want to use
    $('#window').empty();
    // Call the ajax to bring down the form to update the series
    ajax('{{=URL('newForm')}}/' + currID, [], 'window');
}
And here is the newForm controller:
def newSerForm():
    record = db.myTable(request.args[0])
    form = SQLFORM(db.myTable, record, fields=['series_name', 'image_thumbnail'])
    if form.accepts(request.vars, session):
        print 'Series update successful!'
    else:
        print 'Series update Failed...'
    return form
displayForm is fired by clicking a button. Once you click it, the form accepts and fails, and the submit button doesn't work again. Is there a way to make an SQLFORM do this? The weird thing is that if I change this to insert into myTable, it works fine and behaves exactly as it should. But doing it this way doesn't work.
OK, now this is where it gets weird.
I tried to achieve the same functionality with a totally different approach: an iFrame. I made new functions in my controllers that create the form based on request.args[0]. It looks like this:
def editEntry():
    print request.args[0]
    record = db.myTable(request.args[0])
    form = SQLFORM(db.CC_user_submission, record, fields=['series_name', 'image_thumbnail']).process()
    return dict(form=form)
And then there is a corresponding HTML page that just displays the form. What could be simpler, right? I go to that page through a link that supplies the correct argument, and it takes me to a page with an update form. Updating works perfectly. Great, now let's put it in an iFrame instead of linking to it. I put it in an iFrame on the original page and open up the iFrame. It doesn't work. I have no idea what is going on. Is there any explanation for this?

By using the iFrame method I actually got this one to work. Since the iFrame has to be appended with jQuery, which needs quotes, and the iFrame URL also needs quotes, the notation got pretty confusing, but it's doable. It looked like this:
    var myURL = "{{=URL('editEntry')}}/" + idArg.toString();
    $('#window').append("<iframe src='" + myURL + "' width='450' height='400'></iframe>");
It's not pretty, but it works.
