I have crawled data from a web page and want to extract some values from the HTML. How can I do that?
An example of the crawled data:
<table width="272" cellpadding="5" cellspacing="0" class="floatleft" style="border-right:1px dotted #BFBFBF">
<tr>
<td width="130" class="paymentlabel"><strong>Transfer Fee</strong></td>
<td width="120"> $11 </td>
<td><img src="http://i.i-sgcm.com/used_cars/qmark_red_18x18.png" width="18" height="18" /></td>
</tr>
<tr>
<td class="paymentlabel" valign="top"><strong>Down Payment</strong></td>
<td> $47,444 (change) <p style="line-height:14px;"><span class="font_gray">Maximum 50% Loan</span></p> </td>
<td><img src="http://i.i-sgcm.com/used_cars/qmark_red_18x18.png" width="18" height="18" /></td>
</tr>
<tr>
<td class="paymentlabel"><strong>1st Instalment</strong></td>
<td> $901 </td>
<td><img src="http://i.i-sgcm.com/used_cars/qmark_red_18x18.png" width="18" height="18" /></td>
</tr>
<tr bgcolor="#FFF4F4">
<td class="paymentlabel" valign="top"><strong>Total</strong></td>
<td valign="top"> <strong class="font_red"> $48,356<br /><span class="font_gray" style="font-weight:normal;">(excluding insurance)</span> </strong> </td>
<td><img src="http://i.i-sgcm.com/used_cars/qmark_red_18x18.png" width="18" height="18" /></td>
</tr>
<tr>
<td class="font_gray font_10" colspan="3" style="padding:7px 0 7px 5px; line-height:12px;">Estimates based on 50% loan at 2.80% interest rate. <br />Check with seller for exact figure.</td>
</tr>
</table>
Controller for crawler:
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
request.UserAgent = "A .NET Web Crawler";
WebResponse response = request.GetResponse();
Stream stream = response.GetResponseStream();
StreamReader reader = new StreamReader(stream);
string htmlText = reader.ReadToEnd();
return htmlText;
I want to get the values of Transfer Fee (i.e. $11), Down Payment (i.e. $47,444), 1st Instalment (i.e. $901) and Total (i.e. $48,356). Is that possible? Thanks for the help!
Yes, you can. You need a library that loads the HTML and lets you query it for what you need. Try one or all of the following, which allow you to do exactly that:
CsQuery
AngleSharp
Html Agility Pack
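The libraries above are .NET packages. As a rough, language-neutral illustration of the same "load the HTML, then query it" idea, here is a minimal sketch using only Python's standard-library html.parser. The class name PaymentTableParser, the function extract_amounts and the shortened sample markup are assumptions made for this example, not part of the original crawler.

```python
import re
from html.parser import HTMLParser


class PaymentTableParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_amounts(html_text):
    """Maps each row label to the first dollar amount in the next cell."""
    parser = PaymentTableParser()
    parser.feed(html_text)
    amounts = {}
    for row in parser.rows:
        if len(row) < 2:
            continue  # e.g. the footnote row that spans all columns
        match = re.search(r"\$[\d,]+", row[1])
        if match:
            amounts[row[0]] = match.group(0)
    return amounts


# Shortened version of the table from the question (an assumption).
sample = """
<table>
<tr><td class="paymentlabel"><strong>Transfer Fee</strong></td><td> $11 </td></tr>
<tr><td class="paymentlabel"><strong>Down Payment</strong></td>
<td> $47,444 (change) <p><span class="font_gray">Maximum 50% Loan</span></p> </td></tr>
<tr><td class="paymentlabel"><strong>1st Instalment</strong></td><td> $901 </td></tr>
<tr><td class="paymentlabel"><strong>Total</strong></td>
<td> <strong class="font_red"> $48,356<br /><span>(excluding insurance)</span> </strong> </td></tr>
<tr><td colspan="3">Estimates based on 50% loan at 2.80% interest rate.</td></tr>
</table>
"""

amounts = extract_amounts(sample)
print(amounts)
```

This sketch does not handle nested tables. With a dedicated HTML library you would normally select the cell that follows each td.paymentlabel instead of walking the rows by hand.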
I am setting up a customized packing slip form via Advanced PDFs in NetSuite and need help with a formatting request. I want to indent only kit items and not their components, but I am having trouble placing the code. NetSuite will not accept my changes to the form because the #list and #if tags are misplaced. I have tried placing the code both before and after the #list tag.
If someone could clear this up by showing an example of what it should look like, it would be most appreciated. Let me know if more information is needed, and thanks! -Ryan
<table class="itemtable" style="width: 100%; margin-top: 10px;">
<thead>
<tr>
<th colspan="12">${salesorder.item[0].item#label}</th>
<th align="right" colspan="4">${salesorder.item[0].quantityordered#label}</th>
<th align="right" colspan="4">${salesorder.item[0].quantityremaining#label}</th>
<th align="right" colspan="4">${salesorder.item[0].quantity#label}</th>
</tr>
</thead>
<#list salesorder.item as tranline> //callout to list items from item fulfillment
<#if tranline.custcol_9r_noncomponent='T'> //checks to see if items are kits items or not
<#if tranline.custcolitemtype="Kit/Package"><tr style="font-weight:bold"> //bolds kit items
<#else><tr style="font-weight:normal"> //if not kit item, uses normal formatting
</#if> //for kit items use this formatting
<td width="15%" class="item" font-size="7pt">${tranline.item}</td>
<td width="20%" class="item">${tranline.description}</td>
<td width="8%" class="item" align="center">${tranline.quantityremaining}</td>
<td width="8%" class="item" align="center">${tranline.quantity}</td>
<td width="8%" class="item"> </td>
<td width="9%" class="item"> </td>
</tr>
<#else> //for component items, use this formatting
<tr>
<td width="15%" class="kititem" font-size="7pt"> ${tranline.item}</td>
<td width="20%" class="kititem">${tranline.description}</td>
<td width="8%" class="kititem" align="center">${tranline.quantityremaining}</td>
<td width="8%" class="kititem" align="center">${tranline.quantity}</td>
<td width="8%" class="kititem"> </td>
<td width="9%" class="kititem"> </td>
</tr>
</#if> //should refer to tranline.custcol_9r_noncomponent='T'statement
</#list> //should close the callout for list items
</table> //ends table
I am trying to migrate a web forum where I have no control over the database etc., and I am using Scrapy to pick out the pieces. It is based on an old phpBB 2.x forum.
It is not very well structured, so there are a few challenges.
I now have an HTML string from which I need to remove the surrounding <td></td> and <span></span> and the Report link at the bottom.
Starting with:
<td colspan="2"><span class="postbody"></span>
<table width="90%" cellspacing="1" cellpadding="3" border="0" align="center">
<tr>
<td><span class="genmed"><b>Some wrote :</b></span></td>
</tr>
<tr>
<td class="quote">
<table width="90%" cellspacing="1" cellpadding="3" border="0" align="center">
<tr>
<td><span class="genmed"><b>Another wrote:</b></span></td>
</tr>
<tr>
<td class="quote">Just for test
a link
</td>
</tr>
</table>
<span class="postbody">
<br>
<br>
Test quote #1</span>
</td>
</tr>
</table>
<span class="postbody">
<br>
<br>
Test quote #2<br>
Another link: linktext<br>
_________________<br>/ author
<br>
text<br>
<div align="right">[ Rapportera
] </div>
</span><span class="gensmall"></span>
</td>
Wanted result:
<table width="90%" cellspacing="1" cellpadding="3" border="0" align="center">
<tr>
<td><span class="genmed"><b>Some wrote :</b></span></td>
</tr>
<tr>
<td class="quote">
<table width="90%" cellspacing="1" cellpadding="3" border="0" align="center">
<tr>
<td><span class="genmed"><b>Another wrote:</b></span></td>
</tr>
<tr>
<td class="quote">Just for test
a link
</td>
</tr>
</table>
<span class="postbody">
<br>
<br>
Test quote #1</span>
</td>
</tr>
</table>
<br>
<br>
Test quote #2<br>
Another link: linktext<br>
_________________<br>/ author
<br>
text<br>
Any tips?
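Why not simply remove the wrapper as a prefix and a suffix (str.removeprefix and str.removesuffix, Python 3.9+)?
html = html.strip().removeprefix('<td colspan="2"><span class="postbody"></span>')
and
html = html.removesuffix('</td>').strip().removesuffix('</span>')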
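As a slightly fuller sketch of the same idea, the function below removes the outer wrapper, the report block and the last opening postbody span by matching whole substrings. It avoids passing the wrapper to str.strip, which treats its argument as a set of characters rather than as a prefix or suffix. The name unwrap_post and the shortened sample post are assumptions for illustration, and the patterns are tied to this exact phpBB markup.

```python
import re

# Pitfall: str.strip removes any of the given characters from both ends.
print('<td>data</td>'.strip('<td>'))  # prints: ata</

PREFIX = '<td colspan="2"><span class="postbody"></span>'
SUFFIX_RE = re.compile(r'</span>\s*<span class="gensmall">\s*</span>\s*</td>\s*$')
REPORT_RE = re.compile(r'<div align="right">.*?</div>\s*', re.DOTALL)
OPEN_SPAN = '<span class="postbody">'


def unwrap_post(html):
    """Strips the wrapper cell, the report block and the outer span."""
    html = html.strip()
    if html.startswith(PREFIX):
        html = html[len(PREFIX):]
    html = SUFFIX_RE.sub('', html)   # trailing </span>...</td>
    html = REPORT_RE.sub('', html)   # the "[ Rapportera ]" block
    # Assumption: the last postbody span is the outer one to drop;
    # spans inside quoted tables come earlier and are left alone.
    head, sep, tail = html.rpartition(OPEN_SPAN)
    if sep:
        html = head + tail
    return html.strip()


# Shortened stand-in for the post in the question (an assumption).
post = (
    '<td colspan="2"><span class="postbody"></span>\n'
    '<table><tr><td class="quote">Quoted text</td></tr></table>\n'
    '<span class="postbody">\n'
    'Reply text<br>\n'
    '<div align="right">[ Rapportera\n] </div>\n'
    '</span><span class="gensmall"></span>\n'
    '</td>'
)

cleaned = unwrap_post(post)
print(cleaned)
```

For anything less regular than this, parsing the fragment with an HTML library and re-serializing the wanted children is likely to be more robust than string matching.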
I'm using a simple form to upload an image file:
<tr>
<td align="right">Photo</td>
<td><input type="file" size="30" name="photo" /></td>
</tr>
Then here is my PHP file. I'm using mysql_fetch_array.
row[6] is my photo.
// retrieves a row data and returns it as an associative array
while($row=mysql_fetch_array($result)){
echo "<table border='1' align='center' class='table_background'>
<tr>
<td align='left' width=100>UserName=</td>
<td align='left' width='400'></td>
</tr>
<tr>
<td align='left' width=100>Title=</td>
<td align='left' width='400'>$row[1]</td>
</tr>
<tr>
<td align='left'width=100>Category=</td>
<td align='left'width='400'>$row[2]</td>
</tr>
<tr>
<td align='left'width=100>Description=</td>
<td align='left'width='400'>$row[3]</td>
</tr>
<tr>
<td align='left'width=100>State=</td>
<td align='left'width='400'>$row[4]</td>
</tr>
<tr>
<td align='left'width=100>Photo=</td>
<td align='left'width='400'>$row[5]</td>
</tr>
<tr>
<td align='left'width=100>Date=</td>
<td align='left'width='200'>$row[6]</td>
</tr>
<br>
}
<br>
<br>
</table>";
}
In my database I use BLOB as the type of the image column. The image is uploaded to the database successfully,
but the page just displays the image file name.
A BLOB comes back as a string of binary data, so you have to display it a bit differently:
echo '<img src="data:image/jpeg;base64,'.base64_encode( $row[6] ).'"/>';
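To make the data URI format in that line explicit, here is a small sketch of the same encoding step written in Python. The helper names to_data_uri and image_tag, the three-byte sample value and the default image/jpeg MIME type are assumptions for illustration; the MIME type should match whatever format is actually stored in the BLOB.

```python
import base64


def to_data_uri(blob, mime_type="image/jpeg"):
    """Encodes raw image bytes as a data: URI usable in an img src."""
    encoded = base64.b64encode(blob).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_tag(blob, mime_type="image/jpeg"):
    """Wraps the data URI in an <img> element."""
    return f'<img src="{to_data_uri(blob, mime_type)}"/>'


# Stand-in bytes; a real row would hold the full image data.
print(image_tag(b"abc"))
# prints: <img src="data:image/jpeg;base64,YWJj"/>
```

Embedding images this way makes the page larger by roughly a third of the image size, so it suits small images better than large photos.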
I am attempting to pull a value and a header (string) from a website, but I am unable to find the element using Selenium.
My Code
I used Firebug to get the XPath and this is what it determined:
//*[@id="DimensionForm"]/p[1]/table/tbody/tr[3]/td[3]
Code
Dim Right as double
Dim Marker as string
Marker = selenium.findElementByXPath("//*[@id="DimensionForm"]/p[1]/table/tbody/tr[2]/td[3]").getAttribute("value")
Right = selenium.findElementByXPath("//*[@id="DimensionForm"]/p[1]/table/tbody/tr[3]/td[3]").getAttribute("value")
HTML CODE
<form id="DimensionForm" name="validate" action="Dimension" method="post">
<div style="margin-top: 7px"></div>
<p><table width="100%" cellspacing="0" border="0" cellpadding="0" class="element">
<tr>
<td> </td><td class="formtitlenobg" colspan="6" align='right'>
AREA DIMENSIONS (AREA A) <span class='quote'> Front</span> 25.24</td>
</tr>
<tr align="right">
<td class="tablerowlightgreen" width=10> </td>
<th class="formtitle" width=250 align="left">Property</th>
<th class="formtitle" width=50>Check</th> <th class="formtitle" width=75>Front</th>
<th class="formtitle" width=75>Center</th><th class="formtitle" width=75>Left</th>
<th class="formtitle" width=120>Right</th>
<th class="formtitle" width=100>Total</th>
<td class="tablerow" width=50> </td>
<td class="tablerow"> </td>
</tr>
<tr align="right" nowrap>
<td> </td>
<td class="table" align="left"><strong>
Property O</strong></td>
<td class="table">+</td>
<td class="table">10</td>
<td class="table">12</td>
<td class="table"><strong>12</strong></td>
<td class="table"><strong><font class="front">
100</font></strong></td>
<td class="table">120</td>
<td> </td>
<td> </td>
</tr>
</table></td>
</tr></table>
You have incorrectly nested quotes:
selenium.findElementByXPath("//*[@id="DimensionForm"]/p[1]/table/tbody/tr[2]/td[3]")
Perhaps you meant:
selenium.findElementByXPath("//*[@id='DimensionForm']/p[1]/table//tr[2]/td[3]")
Note the single-quotes in the second line!
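The quoting rule can be tried outside the browser. The sketch below is only an illustration under stated assumptions: it uses Python's standard-library xml.etree.ElementTree, which supports a limited subset of XPath, on a simplified, well-formed stand-in for the form. It is not the Selenium wrapper's API.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the page (an assumption).
doc = ET.fromstring("""
<html><body>
<form id="DimensionForm">
  <table>
    <tr><th>Property</th><th>Front</th><th>Right</th></tr>
    <tr><td>Property O</td><td>10</td><td>12</td></tr>
  </table>
</form>
</body></html>
""")

# Single quotes inside the XPath, double quotes around the string
# literal, so the two kinds of quote never collide.
xpath = ".//*[@id='DimensionForm']/table/tr[2]/td[3]"
cell = doc.find(xpath)
print(cell.text)  # prints: 12
```

The same principle applies in any host language: pick one quote character for the string literal and the other for the attribute value, or escape the inner quotes using the host language's escape rule.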
I want to click a checkbox based on a table column's value using XPath. Below is the HTML:
<table id="tblHotels">
<TBODY>
<TR>
<TH align="left">
<INPUT checkbox="" id="chkNSelectAll name=chkNSelectAll type="/>
</TH>
<TH align="left" title="Hotel">Hotel</TH>
<TH title=" align=left">
<SPAN id="spnExpandBtn">
<IMG/>
</SPAN>
</TH>
<TH align="left" title="Hotel">Hotel</TH>
<TH align="left" title="Reg Date">Reg Date</TH>
<TH align="left" title="Room Type">Room Type</TH>
<TH align="left" title="Location">Location</TH>
<TH align="left" title="Room Number">Room Number</TH>
</TR>
<TR>
<TD colSpan="11">
<IMG src=" ../NoExpiry/images/uaimBSpacer.gif"/>
</TD>
</TR>
<TR>
<TD>
<INPUT id="chkNSelect" name="chkNSelect" type="checkbox" value="on"/>
</TD>
<TD customHiddenText="">MATHEW Joe</TD>
<TD/>
<TD customHiddenText="">
<SPAN>Affray (
<STRONG/>Kim Lee)
</SPAN>
</TD>
<TD class="regDate customHiddenText=">10/01/2014</TD>
<TD customHiddenText="">1HE</TD>
<TD customHiddenText="">South West </TD>
<TD id="tdChildroom name=" tdChildroom=""/>
<INPUT id="hidYID" name="hidYID" type="hidden" value="409">
<INPUT id="hidYD" name="hidYD" type="hidden">
<INPUT id="hidYDID" name="hidYDID" type="hidden" value="1015389"/>
</INPUT>
</INPUT>
</TR>
<TR>
<TD>
<INPUT id="chkNSelect" name="chkNSelect" type="checkbox" value="on"/>
</TD>
<TD customHiddenText="">MATHEW Penny</TD>
<TD/>
<TD customHiddenText="">
<SPAN>Affray (
<STRONG/>Jim Lee)
</SPAN>
</TD>
<TD class="regDate customHiddenText=">10/01/2014</TD>
<TD customHiddenText="">1HE</TD>
<TD customHiddenText="">South West </TD>
<TD id="tdChildroom name=" tdChildroom=""/>
<INPUT id="hidYID" name="hidYID" type="hidden" value="409">
<INPUT id="hidYD" name="hidYD" type="hidden">
<INPUT id="hidYDID" name="hidYDID" type="hidden" value="1015389"/>
</INPUT>
</INPUT>
</TR>
</TBODY>
</table>
Here is what I am trying, but it always clicks the first checkbox:
Driver.FindElementByXPath("//td[contains(text(),'MATHEW Penny')]/preceding::td/input[@name='chkNSelect']").Click()
If I only look for the cell with the text, it is found. So why can it not find the preceding checkbox, and why does it jump to the first row's checkbox?
Driver.FindElementByXPath("//td[contains(text(),'MATHEW Penny')]")
My requirement is to select the first checkbox (do something, e.g. add it to another table), uncheck it, then check the second checkbox (do something, e.g. add it to another table).
Use for MATHEW Penny:
//td[contains(text(),'Penny')]/preceding-sibling::td/input[@name='chkNSelect']
Use for MATHEW Joe:
//td[contains(text(),'Joe')]/preceding-sibling::td/input[@name='chkNSelect']
It is selecting all input elements named chkNSelect that come before the td with the text MATHEW Penny.
Use
//td[contains(text(),'MATHEW Penny')]/preceding::td/input[last()][@name='chkNSelect']
to select only the first such input.
You could try:
first targeting the tr
containing the td with the text node you want (using a predicate)
and then going to an input within a td in that table row
So that translates to:
Driver.FindElementByXPath("//tr[td[contains(text(),'MATHEW Penny')]]/td/input[@name='chkNSelect']")
Breakdown:
//tr[
    td[
        contains(text(),'MATHEW Penny')
    ]
]
/td/input[@name='chkNSelect']
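The row-first approach can be checked with a small, self-contained sketch. It uses Python's standard-library xml.etree.ElementTree as a stand-in for the browser. Two assumptions apply: ElementTree supports only a subset of XPath and has no contains(), so the predicate matches the cell text exactly, and the table is reduced to two well-formed rows with distinct value attributes (row1, row2) so the result is visible. The helper name checkbox_for is hypothetical.

```python
import xml.etree.ElementTree as ET

# Reduced, well-formed stand-in for tblHotels (an assumption).
table = ET.fromstring("""
<table id="tblHotels">
  <tr>
    <td><input name="chkNSelect" type="checkbox" value="row1"/></td>
    <td>MATHEW Joe</td>
  </tr>
  <tr>
    <td><input name="chkNSelect" type="checkbox" value="row2"/></td>
    <td>MATHEW Penny</td>
  </tr>
</table>
""")


def checkbox_for(guest):
    """Finds the checkbox in the row that has a cell equal to guest."""
    # Pick the row first via a predicate on its cells, then descend
    # to the input inside that same row.
    xpath = f".//tr[td='{guest}']/td/input[@name='chkNSelect']"
    return table.find(xpath)


print(checkbox_for("MATHEW Penny").get("value"))  # prints: row2
print(checkbox_for("MATHEW Joe").get("value"))    # prints: row1
```

Because the predicate narrows the match to one row before descending, the lookup cannot fall through to an earlier row's checkbox, which is what happened with the preceding:: axis in the question.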