Every day, I need to check my visa application status on the USCIS website (https://egov.uscis.gov/casestatus/landing.do). Since manually doing it gets cumbersome, I created automation in UIPath to run every few hours and email me if the status changed. However, it still needs to open the browser, navigate to the page, read the result, etc.
Is there a better way of going about this?
I tried finding if USCIS has any API that I could programmatically call, but there doesn't seem to be any. I looked at the page and found that the text box for the receipt number has the following HTML:
<input id="receipt_number" name="appReceiptNum" type="text" class="form-control textbox" maxlength="13"/>
So, from Postman, I tried firing a GET request:
GET https://egov.uscis.gov/casestatus/landing.do?receipt_number=XXXXXXXX
where XXXXXXXX would be my actual application number. But this didn't work and it just returned the main page. I tried switching it to a POST, but that didn't work either and returned the same result. On further inspection, I realized that the actual result page has a different URL, so I tried GET and POST both, on the result URL:
GET https://egov.uscis.gov/casestatus/mycasestatus.do?receipt_number=XXXXXXXX
This gets me a page telling me that there were validation errors and they didn't recognize.
Went back to the manual process to see if I was missing anything. The result page URL has a format
https://egov.uscis.gov/casestatus/mycasestatus.do?JSESSIONID=ZZZZZZZZZ
where ZZZZZZZZZ is the value of JSESSIONID cookie set during the landing page. So I changed my process to:
Send a GET request to the landing page (https://egov.uscis.gov/casestatus/landing.do)
Copy the value of JSESSIONID cookie from the response and set that as a query parameter in the request to the result page (https://egov.uscis.gov/casestatus/mycasestatus.do), while sending receipt_number as the payload in a POST request
This isn't working either. My end goal was to write a Python or Java code (since those are the two I am familiar with) to get me the result, but I guess if I can't get my manual requests working from Postman, getting it to work from code is a pipe dream.
You don't need the session tag, just change the param name in your postman request to appReceiptNum and it will work: https://egov.uscis.gov/casestatus/mycasestatus.do?appReceiptNum=LINXXXXXXXXXX
#Alok
What you require is term "headless browser / scraper"
just created a quick sample ( but in node.js)
const today = formatYmd(new Date())
const browser = await puppeteer.launch()
const page = await browser.newPage()
console.log("going to URL")
await page.goto(url)
await page.$eval('#receipt_number', (el,receipt) => el.value = `${receipt}`, process.env.RECEIPT_NUMBER)
await page.click('input[type="submit"]')
console.log("waiting for submission to be completed.")
await page.waitForSelector('div.current-status-sec').catch(t => console.log("Not able to load status screen"))
const status = removeTags(await page.$eval('.current-status-sec', el => el.innerText))
console.log(`${today}: ${status}`)
await page.screenshot({path: `./screenshot/${today}_screenshot.png`})
browser.close()
You can find the full repo here.
https://github.com/Parthashah/uscis-status-check
The code will provide back screenshot and status
I just created a simple USCIS web crawler scraper Spring Boot app using HtmlUnit.
The only thing is my crawler ignores the wrong case number. But is working. The github link is here: https://github.com/somych1/USCISCaseStatusWebScraper
public ResponseDTO getStatus(String caseId){
ResponseDTO responseDTO = new ResponseDTO();
//browser setup
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setRedirectEnabled(true);
webClient.getOptions().setJavaScriptEnabled(false);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getCookieManager().setCookiesEnabled(false);
webClient.getOptions().setTimeout(8000);
webClient.getOptions().setDownloadImages(false);
webClient.getOptions().setGeolocationEnabled(false);
webClient.getOptions().setAppletEnabled(false);
try{
// loading the HTML to a Document Object
HtmlPage page = webClient.getPage(url);
// case lookup
HtmlInput input = page.getHtmlElementById("receipt_number");
input.setValueAttribute(caseId);
HtmlInput button = page.getElementByName("initCaseSearch");
HtmlPage pageAfterClick = button.click();
// new page after click
HtmlHeading1 h1 = pageAfterClick.getFirstByXPath("//div/h1");
HtmlParagraph paragraph = pageAfterClick.getFirstByXPath("//div/p");
//setting response object
responseDTO.setCaseId(caseId);
responseDTO.setStatus(status);
responseDTO.setDescription(description);
} catch (IOException ex) {
ex.printStackTrace();
}
return responseDTO;
}
I sucessfully display a web site on WebView2 in my VB.net (Visual Studio 2017) project but can not get html souce code. Please advise me how to get html code.
My code:
Private Sub testbtn_Click(sender As Object, e As EventArgs) Handles testbtn.Click
WebView2.CoreWebView2.Navigate("https://www.microsoft.com/")
End Sub
Private Sub WebView2_NavigationCompleted(sender As Object, e As CoreWebView2NavigationCompletedEventArgs) Handles WebView2.NavigationCompleted
Dim html As String = ?????
End Sub
Thank you indeed for your advise in advance.
I've only just started messing with the WebView2 earlier today as well, and was just looking for this same thing. I did manage to scrape together this solution:
Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")
' The Html comes back with unicode character codes, other escaped characters, and
' wrapped in double quotes, so I'm using this code to clean it up for what I'm doing.
html = Regex.Unescape(html)
html = html.Remove(0, 1)
html = html.Remove(html.Length - 1, 1)
Converted my code from C# to VB on the fly, so hopefully didn't miss any syntax errors.
Adding to #Xaviorq8 answer, you can use Span to get rid of generating new strings with Remove:
html = Regex.Unescape(html)
html = html.AsSpan()[1..^1].ToString();
I must credit #Xaviorq8; his answer was needed to solve my problem.
I was successfully using .NET WebBrowser and Html Agility Pack but I wanted to replace WebBrowser with .NET WebView2.
Snippet (working code with WebBrowser):
using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.Load(webBrowser1.DocumentStream);
HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[#id=\"apptAndReportsTbl\"]");
Snippet (failing code with WebView2):
using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[#id=\"apptAndReportsTbl\"]");
Success withWebView2 and Html Agility Pack
using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
// thanks to #Xaviorq8 answer (next 3 lines)
html = Regex.Unescape(html);
html = html.Remove(0, 1);
html = html.Remove(html.Length - 1, 1);
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[#id=\"apptAndReportsTbl\"]");
The accepted answer is on the right track. However, it's missing on important thing:
The returned string is NOT HTMLEncoded, it's JSON!
So to do it right, you need to deserialize the JSON, which is just as simple:
Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")
html = Await JsonSerializer.DeserializeAsync(Of String)(html);
We have the requirement to take a form submission and save some data, then redirect the user to a page offsite, but in redirecting, we need to "submit" a form with POST, not GET.
I was hoping there was an easy way to accomplish this, but I'm starting to think there isn't. I think I must now create a simple other page, with just the form that I want, redirect to it, populate the form variables, then do a body.onload call to a script that merely calls document.forms[0].submit();
Can anyone tell me if there is an alternative? We might need to tweak this later in the project, and it might get sort of complicated, so if there was an easy we could do this all non-other page dependent that would be fantastic.
Anyway, thanks for any and all responses.
Doing this requires understanding how HTTP redirects work. When you use Response.Redirect(), you send a response (to the browser that made the request) with HTTP Status Code 302, which tells the browser where to go next. By definition, the browser will make that via a GET request, even if the original request was a POST.
Another option is to use HTTP Status Code 307, which specifies that the browser should make the redirect request in the same way as the original request, but to prompt the user with a security warning. To do that, you would write something like this:
public void PageLoad(object sender, EventArgs e)
{
// Process the post on your side
Response.Status = "307 Temporary Redirect";
Response.AddHeader("Location", "http://example.com/page/to/post.to");
}
Unfortunately, this won't always work. Different browsers implement this differently, since it is not a common status code.
Alas, unlike the Opera and FireFox developers, the IE developers have never read the spec, and even the latest, most secure IE7 will redirect the POST request from domain A to domain B without any warnings or confirmation dialogs! Safari also acts in an interesting manner, while it does not raise a confirmation dialog and performs the redirect, it throws away the POST data, effectively changing 307 redirect into the more common 302.
So, as far as I know, the only way to implement something like this would be to use Javascript. There are two options I can think of off the top of my head:
Create the form and have its action attribute point to the third-party server. Then, add a click event to the submit button that first executes an AJAX request to your server with the data, and then allows the form to be submitted to the third-party server.
Create the form to post to your server. When the form is submitted, show the user a page that has a form in it with all of the data you want to pass on, all in hidden inputs. Just show a message like "Redirecting...". Then, add a javascript event to the page that submits the form to the third-party server.
Of the two, I would choose the second, for two reasons. First, it is more reliable than the first because Javascript is not required for it to work; for those who don't have it enabled, you can always make the submit button for the hidden form visible, and instruct them to press it if it takes more than 5 seconds. Second, you can decide what data gets transmitted to the third-party server; if you use just process the form as it goes by, you will be passing along all of the post data, which is not always what you want. Same for the 307 solution, assuming it worked for all of your users.
You can use this aproach:
Response.Clear();
StringBuilder sb = new StringBuilder();
sb.Append("<html>");
sb.AppendFormat(#"<body onload='document.forms[""form""].submit()'>");
sb.AppendFormat("<form name='form' action='{0}' method='post'>",postbackUrl);
sb.AppendFormat("<input type='hidden' name='id' value='{0}'>", id);
// Other params go here
sb.Append("</form>");
sb.Append("</body>");
sb.Append("</html>");
Response.Write(sb.ToString());
Response.End();
As result right after client will get all html from server the event onload take place that triggers form submit and post all data to defined postbackUrl.
HttpWebRequest is used for this.
On postback, create a HttpWebRequest to your third party and post the form data, then once that is done, you can Response.Redirect wherever you want.
You get the added advantage that you don't have to name all of your server controls to make the 3rd parties form, you can do this translation when building the POST string.
string url = "3rd Party Url";
StringBuilder postData = new StringBuilder();
postData.Append("first_name=" + HttpUtility.UrlEncode(txtFirstName.Text) + "&");
postData.Append("last_name=" + HttpUtility.UrlEncode(txtLastName.Text));
//ETC for all Form Elements
// Now to Send Data.
StreamWriter writer = null;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = postData.ToString().Length;
try
{
writer = new StreamWriter(request.GetRequestStream());
writer.Write(postData.ToString());
}
finally
{
if (writer != null)
writer.Close();
}
Response.Redirect("NewPage");
However, if you need the user to see the response page from this form, your only option is to utilize Server.Transfer, and that may or may not work.
Something new in ASP.Net 3.5 is this "PostBackUrl" property of ASP buttons. You can set it to the address of the page you want to post directly to, and when that button is clicked, instead of posting back to the same page like normal, it instead posts to the page you've indicated. Handy. Be sure UseSubmitBehavior is also set to TRUE.
This should make life much easier.
You can simply use Response.RedirectWithData(...) method in your web application easily.
Imports System.Web
Imports System.Runtime.CompilerServices
Module WebExtensions
<Extension()> _
Public Sub RedirectWithData(ByRef aThis As HttpResponse, ByVal aDestination As String, _
ByVal aData As NameValueCollection)
aThis.Clear()
Dim sb As StringBuilder = New StringBuilder()
sb.Append("<html>")
sb.AppendFormat("<body onload='document.forms[""form""].submit()'>")
sb.AppendFormat("<form name='form' action='{0}' method='post'>", aDestination)
For Each key As String In aData
sb.AppendFormat("<input type='hidden' name='{0}' value='{1}' />", key, aData(key))
Next
sb.Append("</form>")
sb.Append("</body>")
sb.Append("</html>")
aThis.Write(sb.ToString())
aThis.End()
End Sub
End Module
Thought it might interesting to share that heroku does this with it's SSO to Add-on providers
An example of how it works can be seen in the source to the "kensa" tool:
https://github.com/heroku/kensa/blob/d4a56d50dcbebc2d26a4950081acda988937ee10/lib/heroku/kensa/post_proxy.rb
And can be seen in practice if you turn of javascript. Example page source:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Heroku Add-ons SSO</title>
</head>
<body>
<form method="POST" action="https://XXXXXXXX/sso/login">
<input type="hidden" name="email" value="XXXXXXXX" />
<input type="hidden" name="app" value="XXXXXXXXXX" />
<input type="hidden" name="id" value="XXXXXXXX" />
<input type="hidden" name="timestamp" value="1382728968" />
<input type="hidden" name="token" value="XXXXXXX" />
<input type="hidden" name="nav-data" value="XXXXXXXXX" />
</form>
<script type="text/javascript">
document.forms[0].submit();
</script>
</body>
</html>
PostbackUrl can be set on your asp button to post to a different page.
if you need to do it in codebehind, try Server.Transfer.
#Matt,
You can still use the HttpWebRequest, then direct the response you receive to the actual outputstream response, this would serve the response back to the user. The only issue is that any relative urls would be broken.
Still, that may work.
I suggest building an HttpWebRequest to programmatically execute your POST and then redirect after reading the Response if applicable.
Here's what I'd do :
Put the data in a standard form (with no runat="server" attribute) and set the action of the form to post to the target off-site page.
Before submitting I would submit the data to my server using an XmlHttpRequest and analyze the response. If the response means you should go ahead with the offsite POSTing then I (the JavaScript) would proceed with the post otherwise I would redirect to a page on my site
In PHP, you can send POST data with cURL. Is there something comparable for .NET?
Yes, HttpWebRequest, see my post below.
The GET (and HEAD) method should never be used to do anything that has side-effects. A side-effect might be updating the state of a web application, or it might be charging your credit card. If an action has side-effects another method (POST) should be used instead.
So, a user (or their browser) shouldn't be held accountable for something done by a GET. If some harmful or expensive side-effect occurred as the result of a GET, that would be the fault of the web application, not the user. According to the spec, a user agent must not automatically follow a redirect unless it is a response to a GET or HEAD request.
Of course, a lot of GET requests do have some side-effects, even if it's just appending to a log file. The important thing is that the application, not the user, should be held responsible for those effects.
The relevant sections of the HTTP spec are 9.1.1 and 9.1.2, and 10.3.
Typically, all you'll ever need is to carry some state between these two requests. There's actually a really funky way to do this which doesn't rely on JavaScript (think <noscript/>).
Set-Cookie: name=value; Max-Age=120; Path=/redirect.html
With that cookie there, you can in the following request to /redirect.html retrieve the name=value info, you can store any kind of information in this name/value pair string, up to say 4K of data (typical cookie limit). Of course you should avoid this and store status codes and flag bits instead.
Upon receiving this request you in return respond with a delete request for that status code.
Set-Cookie: name=value; Max-Age=0; Path=/redirect.html
My HTTP is a bit rusty I've been going trough RFC2109 and RFC2965 to figure how reliable this really is, preferably I would want the cookie to round trip exactly once but that doesn't seem to be possible, also, third-party cookies might be a problem for you if you are relocating to another domain. This is still possible but not as painless as when you're doing stuff within your own domain.
The problem here is concurrency, if a power user is using multiple tabs and manages to interleave a couple of requests belonging to the same session (this is very unlikely, but not impossible) this may lead to inconsistencies in your application.
It's the <noscript/> way of doing HTTP round trips without meaningless URLs and JavaScript
I provide this code as a prof of concept: If this code is run in a context that you are not familiar with I think you can work out what part is what.
The idea is that you call Relocate with some state when you redirect, and the URL which you relocated calls GetState to get the data (if any).
const string StateCookieName = "state";
static int StateCookieID;
protected void Relocate(string url, object state)
{
var key = "__" + StateCookieName + Interlocked
.Add(ref StateCookieID, 1).ToInvariantString();
var absoluteExpiration = DateTime.Now
.Add(new TimeSpan(120 * TimeSpan.TicksPerSecond));
Context.Cache.Insert(key, state, null, absoluteExpiration,
Cache.NoSlidingExpiration);
var path = Context.Response.ApplyAppPathModifier(url);
Context.Response.Cookies
.Add(new HttpCookie(StateCookieName, key)
{
Path = path,
Expires = absoluteExpiration
});
Context.Response.Redirect(path, false);
}
protected TData GetState<TData>()
where TData : class
{
var cookie = Context.Request.Cookies[StateCookieName];
if (cookie != null)
{
var key = cookie.Value;
if (key.IsNonEmpty())
{
var obj = Context.Cache.Remove(key);
Context.Response.Cookies
.Add(new HttpCookie(StateCookieName)
{
Path = cookie.Path,
Expires = new DateTime(1970, 1, 1)
});
return obj as TData;
}
}
return null;
}
Copy-pasteable code based on Pavlo Neyman's method
RedirectPost(string url, T bodyPayload) and GetPostData() are for those who just want to dump some strongly typed data in the source page and fetch it back in the target one.
The data must be serializeable by NewtonSoft Json.NET and you need to reference the library of course.
Just copy-paste into your page(s) or better yet base class for your pages and use it anywhere in you application.
My heart goes out to all of you who still have to use Web Forms in 2019 for whatever reason.
protected void RedirectPost(string url, IEnumerable<KeyValuePair<string,string>> fields)
{
Response.Clear();
const string template =
#"<html>
<body onload='document.forms[""form""].submit()'>
<form name='form' action='{0}' method='post'>
{1}
</form>
</body>
</html>";
var fieldsSection = string.Join(
Environment.NewLine,
fields.Select(x => $"<input type='hidden' name='{HttpUtility.UrlEncode(x.Key)}' value='{HttpUtility.UrlEncode(x.Value)}'>")
);
var html = string.Format(template, HttpUtility.UrlEncode(url), fieldsSection);
Response.Write(html);
Response.End();
}
private const string JsonDataFieldName = "_jsonData";
protected void RedirectPost<T>(string url, T bodyPayload)
{
var json = JsonConvert.SerializeObject(bodyPayload, Formatting.Indented);
//explicit type declaration to prevent recursion
IEnumerable<KeyValuePair<string, string>> postFields = new List<KeyValuePair<string, string>>()
{new KeyValuePair<string, string>(JsonDataFieldName, json)};
RedirectPost(url, postFields);
}
protected T GetPostData<T>() where T: class
{
var urlEncodedFieldData = Request.Params[JsonDataFieldName];
if (string.IsNullOrEmpty(urlEncodedFieldData))
{
return null;// default(T);
}
var fieldData = HttpUtility.UrlDecode(urlEncodedFieldData);
var result = JsonConvert.DeserializeObject<T>(fieldData);
return result;
}
I have the following code that is using HtmlAgilityPack to pull back html code for a number of websites. All seems to be working well, apart from asos.com. When running a url through, it returns random characters (‹\b\0\0\0\0\0\0UÍ „ï&¾CãÁ¢ø›\bãhìÁ3-«Ziý}z‘š/»ómf³Ü`]In#iÉÑbr[œ¡Ä¬v7Ðœ¶7N[GáôSv;Ü°?[†.ã*3Ž¢G×ù6OƒäwPŒõH\rÙ¸\vzìmèÎ;M›4q_K¨Ð)
HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
doc.OptionReadEncoding = false;
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");
request.Timeout = 10000;
request.ReadWriteTimeout = 32000;
request.UserAgent = "TEST";
request.Method = "GET";
request.Accept = "text/html";
request.AllowAutoRedirect = false;
request.CookieContainer = new CookieContainer();
StreamReader reader = new StreamReader(request.GetResponse().GetResponseStream(), Encoding.Default); //put your encoding
doc.Load(reader);
string html = doc.DocumentNode.OuterHtml;
I have ran the url through Fiddler, however cant seem to see anything to suggest there should be a problem. Any ideas where i'm going wrong?
See header image from fiddler here: http://i.stack.imgur.com/2LRFY.png
This has nothing to do with Html Agility Pack, it's because you have set AllowAutoRedirect to false. Remove it and it will work. The site apparently does a redirect, you need to follow it if you want the final HTML text.
Note the Html Agility Pack has a utility HtmlWeb class that can download file directly as an HmlDocument:
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(#"http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");
I know the title isn't very elaborate, but I have tried this multiple times (to figure out how) but I could never find out how to do so. I want to do stuff like upload a "paste" to pastebin.com, upload a picture to twitpic.com, upload a file to rapidshare.com, etcetera.
How would I do so? Thanks!
(Visual Basic 2010 Express | Windows 7 Ultimate)
I conscious that Visual Basic 2010 express will have some way to interact with the server side.
If you couldn't find you need to change the language.
To post in twitpic you need to use their API givne in the following URL.
http://twitpic.com/api.do
let's say
<form action="http://twitpic.com/api/uploadAndPost">
<input name="media"></input>
<input name="username"></input>
<input name="password"></input>
<input name="message"></input>
</form>
It depends. You can just do a cross-domain form submission (set the action to a page on another domain), or you can do server-server communication, or you can use JSONP (JSON wrapped in a function call).
VB.NET code for a Pastebin submission is:
Dim req As HttpWebRequest = DirectCast(WebRequest.Create("http://pastebin.com/api_public.php"), HttpWebRequest)
req.ContentType = "application/x-www-form-urlencoded"
req.Method = "POST"
Dim postData As String = "paste_code=Simple Example"
Dim postBytes As Byte() = Encoding.UTF8.GetBytes(postData)
req.ContentLength = postBytes.Length
Dim reqStream As Stream = req.GetRequestStream()
reqStream.Write(postBytes, 0, postBytes.Length)
reqStream.Close()
Dim resp As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
Dim respText As String = New StreamReader(resp.GetResponseStream(), Encoding.UTF8).ReadToEnd()
respText is the generated paste bin URL. This can obviously be improved. It's an initial demonstration.