How can I scrape the captcha image (captchaImage) from this website?
I tried MSHTML, but this website uses a JavaScript function to retrieve the captcha and set it as the image's src. How can I achieve this?
Imports MahApps.Metro.Controls
Imports System.Net
Imports System.Windows.Forms
Class MainWindow
Inherits MetroWindow
Private Sub MetroWindow_Loaded(sender As Object, e As RoutedEventArgs)
wb.Navigate("https://www.irctc.co.in/eticketing/loginHome.jsf")
AddHandler wb.LoadCompleted, AddressOf wb_Loaded
End Sub
Private Sub btngo_Click(sender As Object, e As RoutedEventArgs) Handles btngo.Click
Dim htmldoc As MSHTML.IHTMLDocument2 = wb.Document
Dim usrtxtdoc As MSHTML.IHTMLElement = htmldoc.all.item("j_username", 0)
Dim usrpwddoc As MSHTML.IHTMLElement = htmldoc.all.item("j_password", 0)
Dim captchadoc As MSHTML.IHTMLElement = htmldoc.all.item("j_captcha", 0)
usrtxtdoc.innerText = txtusrname.Text
usrpwddoc.innerText = txtpwd.Text
captchadoc.innerText = txtcaptcha.Text
End Sub
Private Sub wb_Loaded(sender As Object, e As System.Windows.Navigation.NavigationEventArgs)
MsgBox("Loaded")
Dim htmldoc As MSHTML.IHTMLDocument2 = wb.Document
Dim htmldoc2 As MSHTML.HTMLDocument = wb.Document
Dim captchaimg As MSHTML.HTMLImg = htmldoc.all.item("cimage", 0)
Dim bitmap As New BitmapImage
bitmap.BeginInit()
bitmap.UriSource = New Uri(wb.FindResource("captchaImage"))
bitmap.EndInit()
imgcaptcha.Source = bitmap
End Sub
Private Sub wb_Navigated(sender As Object, e As NavigationEventArgs) Handles wb.Navigated
lblwbstatus.Content = "Load Completed"
End Sub
Private Sub wb_Navigating(sender As Object, e As NavigatingCancelEventArgs) Handles wb.Navigating
lblwbstatus.Content = "Navigating Please wait"
End Sub
Private Sub lblwbstatus_MouseDoubleClick(sender As Object, e As MouseButtonEventArgs) Handles lblwbstatus.MouseDoubleClick
wb.Refresh()
End Sub
End Class
You can download the source from this link.
Normally you would do this:
Dim htmldoc As mshtml.IHTMLDocument2 = wb.Document.DomDocument
Dim captchaimg As mshtml.HTMLImg = htmldoc.all.item("cimage", 0)
Dim imgRange As IHTMLControlRange = htmldoc.body.createControlRange()
For Each img As IHTMLImgElement In htmldoc.images
If img.nameProp = "captchaImage" Then
imgRange.add(img)
imgRange.execCommand("Copy", False, Nothing)
Using bmp As Bitmap = Clipboard.GetDataObject().GetData(DataFormats.Bitmap)
bmp.Save("c:\test.bmp")
End Using
End If
Next
However, the image has an alpha channel that doesn't get copied to the clipboard because of Internet Explorer issues (as you can read in "Copying image from page results in black image").
Another approach would be to look in the Internet Explorer cache, but that image won't be cached because of its HTTP headers, so you are out of luck.
The code above works well, provided you take the following into account:
1- The item type, as in your target web page ("IMG").
2- The image's exact name: for example, CaptchaImg.jpg must be written as CaptchaImg.jpg.
3- Add a reference to mshtml and add Imports mshtml to your project.
4- To add the reference: right-click References in your project --> click Add Reference --> click Browse --> navigate to
C:\Windows\assembly\GAC\Microsoft.mshtml\7.0.3300.0__b03f5f7f11d50a3a\Microsoft.mshtml.dll
--> click OK. This adds Microsoft.mshtml.dll to your references.
Finally, import it in your project (Imports mshtml).
5- Change the save path from bmp.Save("c:\test.bmp") to, for example, bmp.Save("c:\test\test.bmp"),
for security reasons: writing to the root of the drive may require administrator privileges.
I'm navigating to Google Images using a WebBrowser control. The aim is to be able to right-click any image, download it, and use it as a PictureBox background.
I have my own ContextMenuStrip with Copy on it and have disabled the built-in context menu.
The issue I am having is that the coordinates returned from CurrentDocument.MouseMove are always relative to the first (top-left) image.
So my code works correctly if the image I want is the very first image on the page; however, clicking on any other image always returns the coordinates of the first image.
It would appear that the coordinates are relative to each image rather than to the page.
Private WithEvents CurrentDocument As HtmlDocument
Dim MousePoint As Point
Dim Ele As HtmlElement
Private Sub Google_covers_Load(sender As Object, e As EventArgs) Handles MyBase.Load
WebBrowser1.IsWebBrowserContextMenuEnabled = False
WebBrowser1.ContextMenuStrip = ContextMenuStrip1
End Sub
Private Sub WebBrowser1_Navigated(sender As Object, e As WebBrowserNavigatedEventArgs) Handles WebBrowser1.Navigated
CurrentDocument = WebBrowser1.Document
End Sub
Private Sub CurrentDocument_MouseMove(sender As Object, e As HtmlElementEventArgs) Handles CurrentDocument.MouseMove
MousePoint = New Point(e.MousePosition.X, e.MousePosition.Y)
Me.Text = e.MousePosition.X & " | " & e.MousePosition.Y
End Sub
Private Sub ContextMenuStrip1_Opening(sender As Object, e As System.ComponentModel.CancelEventArgs) Handles ContextMenuStrip1.Opening
Ele = CurrentDocument.GetElementFromPoint(MousePoint)
If Ele.TagName = "IMG" Then
CopyToolStripMenuItem.Visible = True
Else
CopyToolStripMenuItem.Visible = False
End If
End Sub
Private Sub CopyToolStripMenuItem_Click(sender As System.Object, e As System.EventArgs) Handles CopyToolStripMenuItem.Click
Dim ToImg = Ele.GetAttribute("src")
mp3_row_edit.PictureBox1.BackgroundImage = New System.Drawing.Bitmap(New IO.MemoryStream(New System.Net.WebClient().DownloadData(ToImg)))
ToImg = Nothing
End Sub
This code allows you to use a standard WebBrowser control to navigate to the Google Image search page and select/download an image with a right-click of the mouse.
To test it, drop a WebBrowser Control and a FlowLayoutPanel on a Form and navigate to a Google Image search page.
Things to know:
WebBrowser.DocumentCompleted: This event is raised each time one of the Sub-Documents inside a main HtmlDocument page is completed. Thus, it can be raised multiple times. We need to check whether the WebBrowser.ReadyState = WebBrowserReadyState.Complete.
Read these notes about this: How to get an HtmlElement value inside Frames/IFrames?
The images in the Google search page can be inserted in the Document in 2 different manners: both using a Base64Encoded string and using the classic src=[URI] format. We need to be ready to get both.
The mouse click position can be expressed in either absolute or relative coordinates, referenced by e.ClientMousePosition or e.OffsetMousePosition, respectively.
Read the notes about this feature here: Getting mouse click coordinates in a WebBrowser Document
The WebBrowser emulation mode can be important. We should use the most recent compatible mode available in the current machine.
Read this answer and apply the modifications needed to have the most recent Internet Explorer mode available: How can I get the WebBrowser control to show modern contents?.
Note that an event handler is wired up when the current Document is completed and is removed when the Browser navigates to another page. This prevents undesired calls to the DocumentCompleted event.
When the current Document is complete, right-clicking an image creates a new PictureBox control that is added to a FlowLayoutPanel for presentation.
The code in the Mouse click handler (Protected Sub OnHtmlDocumentClick()) detects whether the current image is a Base64Encoded string or an external source URI.
In the first case, it calls Convert.FromBase64String to convert the string into a Byte array; in the second case, it uses the WebClient class to download the image as a Byte array.
In both cases, the array is then passed to another method (Private Function GetBitmapFromByteArray()) that returns an Image from the array, using Image.FromStream() and a MemoryStream initialized with the Byte array.
The code here is not performing null checks and similar fail-proof tests. It ought to, that's up to you.
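As one possible way to add such checks, the sketch below separates the Base64 case with a few guards. DataUriHelper and TryDecodeBase64Source are hypothetical names, not part of the code in this answer. The function returns Nothing when the src is empty, is not a Base64 data URI, or contains malformed Base64, so the caller can fall back to downloading the image.

```vbnet
Imports System

Module DataUriHelper
    ' Marker that precedes the payload in a Base64 data URI,
    ' e.g. "data:image/png;base64,iVBORw0KGgo..."
    Private Const Base64Marker As String = "base64,"

    ' Hypothetical helper: returns the decoded bytes of a Base64 data URI,
    ' or Nothing when the source is missing, is not a data URI, or is malformed.
    Public Function TryDecodeBase64Source(source As String) As Byte()
        If String.IsNullOrEmpty(source) Then Return Nothing
        If Not source.StartsWith("data:", StringComparison.OrdinalIgnoreCase) Then Return Nothing

        Dim markerIndex As Integer = source.IndexOf(Base64Marker, StringComparison.OrdinalIgnoreCase)
        If markerIndex < 0 Then Return Nothing

        Try
            Return Convert.FromBase64String(source.Substring(markerIndex + Base64Marker.Length))
        Catch ex As FormatException
            ' The payload was not valid Base64
            Return Nothing
        End Try
    End Function
End Module
```

In OnHtmlDocumentClick, a Nothing result would mean the src should be treated as a regular URI and downloaded with WebClient instead.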
Imports System.IO
Imports System.Net
Public Class frmBrowser
Private WebBrowserDocumentEventSet As Boolean = False
Private base64Pattern As String = "base64,"
Private Sub frmBrowser_Load(sender As Object, e As EventArgs) Handles MyBase.Load
WebBrowser1.ScriptErrorsSuppressed = True
WebBrowser1.IsWebBrowserContextMenuEnabled = False
End Sub
Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
If WebBrowser1.ReadyState = WebBrowserReadyState.Complete AndAlso WebBrowserDocumentEventSet = False Then
WebBrowserDocumentEventSet = True
AddHandler WebBrowser1.Document.MouseDown, AddressOf OnHtmlDocumentClick
End If
End Sub
Protected Sub OnHtmlDocumentClick(sender As Object, e As HtmlElementEventArgs)
Dim currentImage As Image = Nothing
If Not (e.MouseButtonsPressed = MouseButtons.Right) Then Return
Dim source As String = WebBrowser1.Document.GetElementFromPoint(e.ClientMousePosition).GetAttribute("src")
If source.Contains(base64Pattern) Then
Dim base64 As String = source.Substring(source.IndexOf(base64Pattern) + base64Pattern.Length)
currentImage = GetBitmapFromByteArray(Convert.FromBase64String(base64))
Else
Using wc As WebClient = New WebClient()
currentImage = GetBitmapFromByteArray(wc.DownloadData(source))
End Using
End If
Dim p As PictureBox = New PictureBox() With {
.Image = currentImage,
.Height = Math.Min(FlowLayoutPanel1.ClientRectangle.Height, FlowLayoutPanel1.ClientRectangle.Width),
.Width = .Height,
.SizeMode = PictureBoxSizeMode.Zoom
}
FlowLayoutPanel1.Controls.Add(p)
End Sub
Private Sub WebBrowser1_Navigating(sender As Object, e As WebBrowserNavigatingEventArgs) Handles WebBrowser1.Navigating
If WebBrowser1.Document IsNot Nothing Then
RemoveHandler WebBrowser1.Document.MouseDown, AddressOf OnHtmlDocumentClick
WebBrowserDocumentEventSet = False
End If
End Sub
Private Function GetBitmapFromByteArray(imageBytes As Byte()) As Image
Using ms As MemoryStream = New MemoryStream(imageBytes)
Return DirectCast(Image.FromStream(ms).Clone(), Image)
End Using
End Function
End Class
I have this code in VB.NET:
Imports RasterEdge.Imaging.Basic.TextSearch
Imports RasterEdge.XDoc.PDF
Public Class Form1
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
'Open a document
Dim doc As PDFDocument = New PDFDocument(Application.StartupPath & "/Vertrag.pdf")
'Set the search options
Dim options As RESearchOption = New RESearchOption()
options.IgnoreCase = False
options.WholeWord = False
'Replace "#Name" with "#Lame"
doc.Replace("#Name", "#Lame", options) ' <-- Here I get the error
doc.Save(Application.StartupPath & "/testoutput.pdf")
End Sub
End Class
I marked where the System.ArgumentOutOfRangeException error pops up.
All of this uses the RasterEdge DLLs to replace a piece of text with some other piece of text (#Name --> #Lame, just a test).
Do you have any ideas?
I want my code to download a file from a website and save it to a directory that the user has selected in the FolderBrowserDialog. I've tried the code below without success:
' Download the files
If My.Computer.Network.IsAvailable Then
Try
wClient.DownloadFile(New Uri("DOWNLOAD LINK"), FolderBrowserDialog1.SelectedPath & "FILENAME.123")
wClient.DownloadFile(New Uri("DOWNLOAD LINK"), FolderBrowserDialog1.SelectedPath & "FileName.123")
wClient.DownloadFile(New Uri("Download LINK"), FolderBrowserDialog1.SelectedPath & "FileName.123")
Catch ex As Exception
MessageBox.Show(ex.Message)
End Try
Here is some sample code that should get you started.
First we declare wClient as a WebClient WithEvents so we can react when the file finishes downloading. I have used VLC Media Player as an example download; change it to suit your needs. NOTE: I did this in a button click event, which you don't necessarily need to do.
Imports System.ComponentModel
Imports System.Net
Public Class Form1
Private WithEvents wClient As New WebClient()
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim fileName As String = "vlc.exe"
Dim downloadFile As String = "https://get.videolan.org/vlc/2.2.6/win32/vlc-2.2.6-win32.exe" ' VLC media player
Dim folderPath As String
' Ask the user for the target folder; stop if the dialog is cancelled
Using folderDialog As New FolderBrowserDialog()
If folderDialog.ShowDialog() <> DialogResult.OK Then Return
folderPath = folderDialog.SelectedPath
End Using
If My.Computer.Network.IsAvailable Then
' Path.Combine inserts the directory separator between folder and file name
Dim combinePath As String = System.IO.Path.Combine(folderPath, fileName)
wClient.DownloadFileAsync(New Uri(downloadFile), combinePath)
End If
End Sub
Private Sub wClient_DownloadFileCompleted(sender As Object, e As AsyncCompletedEventArgs) Handles wClient.DownloadFileCompleted
MessageBox.Show("File Downloaded")
End Sub
End Class
Have a look at wClient's event list to see the many options that are available, such as the one I have used here, which shows a message box once the file has been downloaded.
Webclient events https://msdn.microsoft.com/en-us/library/system.net.webclient_events(v=vs.110).aspx
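The answer's use of Path.Combine matters because FolderBrowserDialog.SelectedPath normally comes back without a trailing separator, so joining it to a file name with & (as in the question) fuses the folder and file names into one. A minimal sketch of the difference follows; the folder and file names are made-up placeholders, and BuildTargetPath is a hypothetical helper name.

```vbnet
Imports System
Imports System.IO

Module PathDemo
    ' Hypothetical helper: builds the download target from a folder and a file name.
    ' Path.Combine adds the directory separator only when it is missing.
    Public Function BuildTargetPath(folder As String, fileName As String) As String
        Return Path.Combine(folder, fileName)
    End Function

    Sub Main()
        ' Placeholder values standing in for FolderBrowserDialog1.SelectedPath and the file name.
        Dim selectedPath As String = "C:\Downloads"
        Dim fileName As String = "FileName.123"

        ' Plain concatenation: nothing is inserted between folder and file name.
        Console.WriteLine(selectedPath & fileName)

        ' Path.Combine: the separator is supplied for you.
        Console.WriteLine(BuildTargetPath(selectedPath, fileName))
    End Sub
End Module
```

On Windows, the first line printed has no backslash between Downloads and FileName.123, while the second one does.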
I'm making a Windows Phone app that should get the classroom of a specific class from a WebView called "DebWeb". DebWeb loads the site that lists all the classrooms, but I want my app to search for my class only.
I previously made an app with almost the same objective (searching for the name of an app in the page source), but that was in VB for PC. Now I'm working in VB for Metro (or for the App Store) and I can't use the same code.
For example, in VB for PC I can use:
Dim EHTML = DebWeb.Document.All.Item(1)
Dim sourceString As String = EHTML.InnerHtml
'Use Regex Match to search in sourceString
But in VB for Metro it shows me the error " 'Document' is not a member of 'Windows.UI.XAML.Controls.WebView' ", so I can't get the page's source code and I can't look for the classroom.
I looked at the MSDN page about WebView, but the closest thing I can get is the "DocumentTitle", not the content.
This is my code; everything "works" except the "Source" variable:
Dim Source = DebWeb.[Control] 'Here is where I need the Control to get the SourceCode
Dim m As System.Text.RegularExpressions.Match = System.Text.RegularExpressions.Regex.Match(Source.ToString, _
"DERECHO CONSTITUCIONAL", _
System.Text.RegularExpressions.RegexOptions.IgnoreCase)
Edited with my Entire code:
Private Sub MainPage_Loaded(sender As Object, e As RoutedEventArgs) Handles Me.Loaded
Dim URL As String = "http://goo.gl/uqohKw"
Me.DebWeb.Navigate(New Uri(URL))
End Sub
Private Sub DebWeb_LoadCompleted(ByVal sender As Object, ByVal e As WebViewNavigationCompletedEventArgs)
LListo.Text = "Listo!"
Dim html As String = DebWeb.InvokeScriptAsync("eval", New String() {"document.documentElement.outerHTML;"}).ToString
Dim Source = html
Dim m As System.Text.RegularExpressions.Match = System.Text.RegularExpressions.Regex.Match(Source.ToString, _
"LECTURA CRÍTICA", _
System.Text.RegularExpressions.RegexOptions.IgnoreCase)
If (m.Success) Then
Dim key As String = m.Groups(1).Value
End If
End Sub
Something like this?
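Private Async Sub Button_Click(sender As Object, e As RoutedEventArgs)
Try
' Await requires the method to be marked Async
Dim html As String = Await myWebView.InvokeScriptAsync("eval", New String() {"document.documentElement.outerHTML;"})
Catch ex As Exception
' Script invocation failed (for example, the page has not finished loading)
End Try
End Sub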
More Info here
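Once the HTML string is available, one more detail in the question's code is worth noting: m.Groups(1).Value is always empty there, because the pattern contains no capture group. The sketch below shows one way to capture the text next to the class name. FindClassroom is a hypothetical helper, and the table-cell layout it expects is purely an assumption about the page, so the pattern would need adjusting to the real markup.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ClassroomSearch
    ' Hypothetical helper: returns the text of the table cell that follows the
    ' cell containing className, or Nothing when there is no match.
    ' The <td>...</td><td>...</td> layout is an assumption about the page.
    Public Function FindClassroom(html As String, className As String) As String
        If String.IsNullOrEmpty(html) Then Return Nothing

        ' Regex.Escape keeps any special characters in the class name literal.
        ' The parentheses create capture group 1, which m.Groups(1) reads.
        Dim pattern As String = Regex.Escape(className) & "\s*</td>\s*<td[^>]*>\s*(.*?)\s*</td>"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)

        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' Made-up HTML fragment used only to exercise the helper.
        Dim sample As String = "<tr><td>LECTURA CRÍTICA</td><td>Aula 204</td></tr>"
        Console.WriteLine(FindClassroom(sample, "LECTURA CRÍTICA"))
    End Sub
End Module
```

If only the presence of the class name matters, m.Success alone is enough and no capture group is needed.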
I get an error at "href" in the code below. Please help me.
Private Sub WebBrowser1_NewWindow(ByVal sender As Object, ByVal e As System.ComponentModel.CancelEventArgs) Handles WebBrowser1.NewWindow
Dim thiselement As HtmlElement = WebBrowser1.Document.ActiveElement
Dim targeturl As String = thiselement.GetAttribute("href")
e.Cancel = True
Dim window As New Form1
window.Show()
window.WebBrowser1.Navigate(targeturl)
End Sub
At "href" I get the error: Object reference not set to an instance of an object.
My code is in VB.NET 2010.
WebBrowser1.Document.ActiveElement is returning Nothing because there is no active element. Therefore, when you call GetAttribute("href") on it, you get this error: Object reference not set to an instance of an object.
Handle the Navigating event instead. Example:
AddHandler WebBrowser1.Navigating, Sub(source As Object, args As WebBrowserNavigatingEventArgs)
Dim uriClicked As Uri = args.Url
' Create your new form or do whatever you want to do here
End Sub