HtmlAgilityPack Scrape table from website - vb.net

I am learning how to use HtmlAgilityPack and I cannot find any documentation on scraping tables in the way I need.
My table looks like this
| | NAME | PRICE | TIME |
| --- | ------------------------------------- | ---------------- | --------- |
| | Arma 3 | $29.99 | 1586h 57m |
| | DayZ | $44.99 | 28h 05m |
| | Survarium | Free or No Price | 02h 25m |
| | Squad | $49.99 | 11h 05m |
| | Squad - Public Testing | Not in store | 0h 0m |
| | Counter-Strike: Global Offensive | Free or No Price | 00h 26m |
| | Infestation: Survivor Stories Classic | Free or No Price | 00h 05m |
| | PLAYERUNKNOWN'S BATTLEGROUNDS | $29.99 | 0h 0m |
I have tried many things found on Google regarding tables and HtmlAgilityPack, but none have worked.
I have a ListView that I want the data to go into. The only values I want loaded into the ListView are
gname, gprice and gtime, keeping in mind that these values change depending on the games listed.
I can't post what I've already tried, as there would be too many website posts to go back through and find.
The HTML for the table on the website is this:
<div class="col-12">
<h2>Dan Andrews Steam Profile 8 Games</h2>
<div class="table-responsive">
<table class="table game-table">
<tbody>
<tr>
<th></th>
<th class="gname">NAME</th>
<th style="width:200px">PRICE</th>
<th style="width:200px">TIME</th>
</tr>
<tr>
<td class="gicon"><img src="https://cdn.cloudflare.steamstatic.com/steam/apps/107410/capsule_184x69.jpg" onerror="this.src='/assets/images/applogo.svg'">
</td>
<td class="gname">Arma 3</td>
<td class="gprice">$29.99</td>
<td class="gtime">1586h 57m</td>
</tr>
<tr>
<td class="gicon"><img src="https://cdn.cloudflare.steamstatic.com/steam/apps/221100/capsule_184x69.jpg" onerror="this.src='/assets/images/applogo.svg'">
</td>
<td class="gname">DayZ</td>
<td class="gprice">$44.99</td>
<td class="gtime">28h 05m</td>
</tr>
<tr>
<td class="gicon"><img src="https://cdn.cloudflare.steamstatic.com/steam/apps/355840/capsule_184x69.jpg" onerror="this.src='/assets/images/applogo.svg'">
</td>
<td class="gname">Survarium</td>
<td class="gprice">Free or No Price</td>
<td class="gtime">02h 25m</td>
</tr>
<tr>
<td class="gicon"><img src="https://cdn.cloudflare.steamstatic.com/steam/apps/393380/capsule_184x69.jpg" onerror="this.src='/assets/images/applogo.svg'">
</td>
<td class="gname">Squad</td>
<td class="gprice">$49.99</td>
<td class="gtime">11h 05m</td>
</tr>
<tr>
<td class="gicon"><img src="https://cdn.cloudflare.steamstatic.com/steam/apps/774941/capsule_184x69.jpg" onerror="this.src='/assets/images/applogo.svg'">
</td>
<td class="gname">Squad - Public Testing</td>
<td class="gprice">Not in store</td>
<td class="gtime">0h 0m</td>
</tr>
<tr>
<td class="gicon"><img src="https://cdn.cloudflare.steamstatic.com/steam/apps/730/capsule_184x69.jpg" onerror="this.src='/assets/images/applogo.svg'">
</td>
<td class="gname">Counter-Strike: Global Offensive</td>
<td class="gprice">Free or No Price</td>
<td class="gtime">00h 26m</td>
</tr>
<tr>
<td class="gicon"><img src="https://cdn.cloudflare.steamstatic.com/steam/apps/226700/capsule_184x69.jpg" onerror="this.src='/assets/images/applogo.svg'">
</td>
<td class="gname">Infestation: Survivor Stories Classic</td>
<td class="gprice">Free or No Price</td>
<td class="gtime">00h 05m</td>
</tr>
<tr>
<td class="gicon"><img src="https://cdn.cloudflare.steamstatic.com/steam/apps/578080/capsule_184x69.jpg" onerror="this.src='/assets/images/applogo.svg'">
</td>
<td class="gname">PLAYERUNKNOWN'S BATTLEGROUNDS</td>
<td class="gprice">$29.99</td>
<td class="gtime">0h 0m</td>
</tr>
</tbody>
</table>
</div>
</div>

Here is a possible solution in C# (even though the question is tagged with VB.Net). You get a list of Game objects here. And you can then bind these to a ListView or whatever you may need.
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public class Game
{
    public string Img { get; set; }
    public string Name { get; set; }
    public double? Price { get; set; }
    public string? PriceStatus { get; set; }
    public TimeSpan? Time { get; set; }
}

// Loads the saved page at htmlFilePath and returns one Game per data row.
public static List<Game> Go(string htmlFilePath)
{
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(File.ReadAllText(htmlFilePath));
    Regex timeRegex = new Regex(@"\b(\d+)h\b\s*\b(\d+)m\b", RegexOptions.Compiled);
    Regex priceRegex = new Regex(@"\b\$?(\d+\.\d\d)\b", RegexOptions.Compiled);
    List<Game> gamesList =
        doc.DocumentNode.SelectNodes("//table[@class='table game-table']/tbody/tr")
            // The header row contains only <th> cells, so it yields null and is skipped.
            .Select(tr => tr.SelectNodes("td")?.ToList())
            .Where(tds => tds != null && tds.Any())
            .Select(
                (tds) =>
                {
                    return
                        new
                        {
                            Img = tds[0].SelectSingleNode("img").Attributes["src"].Value,
                            Name = tds[1].InnerText,
                            PriceString = tds[2].InnerText,
                            PriceMatch = priceRegex.Match(tds[2].InnerText),
                            TimeString = tds[3].InnerText,
                            TimeMatch = timeRegex.Match(tds[3].InnerText),
                        };
                })
            .Select(
                (v) =>
                {
                    Game game =
                        new Game
                        {
                            Img = v.Img,
                            Name = v.Name,
                        };
                    // Set Price (invariant culture, so "29.99" parses the same on every locale)
                    if (v.PriceMatch.Success)
                    {
                        game.Price = double.Parse(v.PriceMatch.Groups[1].Value, CultureInfo.InvariantCulture);
                        game.PriceStatus = "In Store";
                    }
                    else
                    {
                        // No numeric price: keep the cell text, e.g. "Free or No Price"
                        game.PriceStatus = v.PriceString;
                    }
                    // Set Time
                    if (v.TimeMatch.Success)
                    {
                        int hours = int.Parse(v.TimeMatch.Groups[1].Value);
                        int minutes = int.Parse(v.TimeMatch.Groups[2].Value);
                        game.Time = new TimeSpan(hours, minutes, 0);
                    }
                    return game;
                })
            .ToList();
    return gamesList;
}
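
For readers who want to check the parsing logic without a .NET project, here is a rough Python sketch of the same idea using only the standard library. It is not a port of HtmlAgilityPack: `GameTableParser`, `parse_games` and the trimmed `SAMPLE` markup are names made up for this illustration, and it keys on the `gname`, `gprice` and `gtime` classes from the question's HTML rather than on cell position.

```python
import re
from datetime import timedelta
from html.parser import HTMLParser

# Same intent as the C# regexes: a price like "$29.99" and a time like "28h 05m".
PRICE_RE = re.compile(r"\$?(\d+\.\d\d)\b")
TIME_RE = re.compile(r"\b(\d+)h\s*(\d+)m\b")


class GameTableParser(HTMLParser):
    """Collects the text of td.gname / td.gprice / td.gtime, one dict per <tr>."""

    FIELDS = ("gname", "gprice", "gtime")

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows
        self._row = None     # row currently being built
        self._field = None   # field name of the <td> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            css = dict(attrs).get("class") or ""
            self._field = css if css in self.FIELDS else None

    def handle_data(self, data):
        if self._field is not None:
            self._row[self._field] = self._row.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # the header row has only <th> cells, so it stays empty
                self.rows.append(self._row)
            self._row = None


def parse_games(html):
    """Return a list of dicts with name, price, price_status and time."""
    parser = GameTableParser()
    parser.feed(html)
    games = []
    for row in parser.rows:
        price_text = row.get("gprice", "").strip()
        price = PRICE_RE.search(price_text)
        time = TIME_RE.search(row.get("gtime", ""))
        games.append({
            "name": row.get("gname", "").strip(),
            "price": float(price.group(1)) if price else None,
            "price_status": "In Store" if price else price_text,
            "time": timedelta(hours=int(time.group(1)), minutes=int(time.group(2))) if time else None,
        })
    return games


# Trimmed copy of two rows from the question's table (image URLs shortened).
SAMPLE = """
<table class="table game-table"><tbody>
<tr><th></th><th class="gname">NAME</th><th>PRICE</th><th>TIME</th></tr>
<tr><td class="gicon"><img src="x.jpg"></td><td class="gname">Arma 3</td>
    <td class="gprice">$29.99</td><td class="gtime">1586h 57m</td></tr>
<tr><td class="gicon"><img src="y.jpg"></td><td class="gname">Survarium</td>
    <td class="gprice">Free or No Price</td><td class="gtime">02h 25m</td></tr>
</tbody></table>
"""

if __name__ == "__main__":
    for game in parse_games(SAMPLE):
        print(game)
```

Rows without a numeric price keep the original cell text in `price_status`, which mirrors how the C# answer fills `PriceStatus`.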

Related

Using Playwright how to select next td of a given inner text

This one has me baffled. Basically, using playwright, I'm trying to verify values on a table. Given, "Cat", I should see if "Dog" exists, or if given "Space", I should see if "Rocket" exists.
I tried
const planet = (await page.locator('tr:has(td.col_d:has-text("Saturn")) >> a')).innerText();
but that didn't work. I thought of grabbing the innerText of all the <td> elements, sticking it into an array, then looking for where the initial text is in the array (Cat) and checking whether the text at the next index is correct (i.e. Dog). Isn't there an easier way that I don't know of yet?
<tbody>
<tr>
<td class="labelCol">Title A</td>
<td class="dataCol col02"><span>
<a href="/00578000000VqXe" title="POS ""</a>
Data A
</td>
<td class="labelCol">Title X</td>
<td class="dataCol">Data X</td>
</tr>
<tr>
<td class="labelCol">Cat</td>
<td class="dataCol col02">Dog</td>
<td class="labelCol">Saturn</td>
<td class="dataCol">Jupiter</td>
</tr>
<tr>
<td class="labelCol">Blue</td>
<td class="dataCol col02">Red</td>
<td class="labelCol">Reason</td>
<td class="dataCol">Space</td > </tr>
Rocket
</td>
</tr >
</tbody>
I don't particularly like this, but if you don't care whether they are in the very next cell, you could just assert that 'Dog' is to the right of 'Cat' and 'Rocket' is to the right of 'Space', like this:
await expect(page.locator(`td:right-of(:text-is("Cat"))`).first()).toHaveText('Dog');
await expect(page.locator(`td:right-of(:text-is("Space"))`).first()).toHaveText('Rocket');
Or, if Dog needs to immediately follow Cat, you could do something like this:
const cat = page.locator(`td:text-is("Cat")`);
await expect(cat.locator(`//following-sibling::td`).first()).toHaveText('Dog');

ASP.NET CORE LINQ GROUP BY

I have a small problem with group by. I want to achieve this:
Nobukti : 001/des/2019
1000
2000
3000
4000
In my controller:
var transaction = await (from a in _context.Transaction group a by a.Nobukti into pg
join b in _context.ChartAccount on pg.FirstOrDefault().Kodeakun
equals b.Kodeakun
select new Transaksi
{
Nobukti = pg.First().Nobukti,
Kodeakun = pg.FirstOrDefault().Kodeakun + b.Namaakun
}).ToListAsync();
In my view:
<table class="table">
    @foreach (var item in Model)
    {
        <thead>
            <tr>
                <th>
                    @Html.DisplayNameFor(model => model.Nobukti)
                </th>
                <td>
                    @Html.DisplayFor(modelItem => item.Nobukti)
                </td>
            </tr>
        </thead>
        <tbody>
            <tr>
                @foreach (var book in item.Nobukti)
                {
                    <td>
                        @Html.DisplayFor(modelItem => item.Kodeakun)
                    </td>
                }
            </tr>
        </tbody>
    }
</table>
Can anyone help me? I don't know which is wrong, my controller or my view.
Sorry for my English.

Extracting data from table with Scrapy

I have this table
<table class="specs-table">
<tbody>
<tr>
<td colspan="2" class="group">Sumary</td>
</tr>
<tr>
<td class="specs-left">Name</td>
<td class="specs-right">ROG GL552JX </td>
</tr>
<tr class="noborder-bottom">
<td class="specs-left">Category</td>
<td class="specs-right">Gaming </td>
</tr>
<tr>
<td colspan="2" class="group">Technical Details</td>
</tr>
<tr>
<td class="specs-left">Name</td>
<td class="specs-right">Asus 555 </td>
</tr>
<tr>
<td class="specs-left">Resolution </td>
<td class="specs-right">1920 x 1080 pixels </td>
</tr>
<tr class="noborder-bottom">
<td class="specs-left"> Processor </td>
<td class="specs-right"> 2.1 GHz </td>
</tr>
</tbody>
</table>
From this table I want my Scrapy to find the first occurrence of the text "Name" and to copy the value from the next cell (In this case "ROG GL552JX") and find the next occurrence of the text "Name" and copy the value "Asus 555".
The result I need:
'Name': [u'ROG GL552JX'],
'Name': [u'Asus 555'],
The problem is that in this table I have two occurrences of the text "Name" and Scrapy copies the value of both occurrences.
My result is:
'Name': [u'ROG GL552JX', u'Asus 555'],
My bot:
def parse(self, response):
    next_selector = response.xpath('//*[@aria-label="Pagina urmatoare"]//@href')
    for url in next_selector.extract():
        yield Request(urlparse.urljoin(response.url, url))
    item_selector = response.xpath('//*[contains(@class, "pb-name")]//@href')
    for url in item_selector.extract():
        yield Request(urlparse.urljoin(response.url, url), callback=self.parse_item)

def parse_item(self, response):
    l = ItemLoader(item=PcgItem(), response=response)
    l.add_xpath('Name', '//tr/td[contains(text(), "Name")]/following-sibling::td/text()', MapCompose(unicode.strip, unicode.title))
    return l.load_item()
How can I solve this problem?
Thank you
If you need one item per Name, then you should do something like:
for sel in response.xpath('//tr/td[contains(text(), "Name")]/following-sibling::td/text()'):
    l = ItemLoader(...)
    # sel is a single Selector here, so use extract() rather than extract_first()
    l.add_value('Name', sel.extract())
    ...
    yield l.load_item()
If you want it all inside a single item instead, I would recommend leaving it as it is (a list), because a scrapy.Item behaves like a dictionary, so it cannot hold two Name keys.
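
To make the "one item per Name" idea concrete without a running spider, here is a small standard-library sketch. It uses `xml.etree.ElementTree` instead of Scrapy selectors (an assumption made so that it runs anywhere), and `items_per_label` and `TABLE` are hypothetical names; `TABLE` is a trimmed copy of the table from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's table; it is well-formed, so an XML parser can read it.
TABLE = """
<table class="specs-table">
  <tbody>
    <tr><td colspan="2" class="group">Sumary</td></tr>
    <tr><td class="specs-left">Name</td><td class="specs-right">ROG GL552JX </td></tr>
    <tr><td class="specs-left">Category</td><td class="specs-right">Gaming </td></tr>
    <tr><td colspan="2" class="group">Technical Details</td></tr>
    <tr><td class="specs-left">Name</td><td class="specs-right">Asus 555 </td></tr>
  </tbody>
</table>
"""


def items_per_label(table_xml, label):
    """Yield one dict per row whose left cell equals `label`."""
    root = ET.fromstring(table_xml.strip())
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Group rows have a single cell with colspan=2, so they are skipped here.
        if len(cells) == 2 and (cells[0].text or "").strip() == label:
            yield {label: (cells[1].text or "").strip()}


if __name__ == "__main__":
    for item in items_per_label(TABLE, "Name"):
        print(item)
```

Each matching row becomes its own dictionary, which sidesteps the fact that a single dictionary can hold only one value per key.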

Update Partial view MVC4

I have this controller:
public ActionResult PopulateTreeViewModel()
{
MainModelPopulate mainModelPopulate = new MainModelPopulate();
// populate model
return View(mainModelPopulate);
}
That has a view like this:
@model xxx.xxx.MainModelPopulate
<table>
    @foreach (var item2 in Model.CountryList)
    {
        <tr>
            <td>
                @Html.DisplayFor(modelItem => item2.CountryName);
            </td>
        </tr>
        foreach (var item3 in item2.BrandList)
        {
            <tr>
                <td>
                    @Html.DisplayFor(modelItem => item3.BrandName);
                </td>
            </tr>
            foreach (var item4 in item3.ProductList)
            {
                <tr>
                    <td>
                        @Html.ActionLink(item4.ProductName, "FunctionX", new { idLab = item3.BrandID, idDep = item4.ProductID });
                    </td>
                </tr>
            }
        }
    }
</table>
The FunctionX controller is like this :
public ActionResult FunctionX(int idBrand=1 , int idProd=1)
{
List<ListTypeModel> typeModelList = new List<ListTypeModel>();
// populate typeModelList
return PartialView(typeModelList);
}
}
with this partial view:
@model IEnumerable<TControl.Models.ListTypeModel>
<table class="table">
    <tr>
        <th>
            @Html.DisplayNameFor(model => model.Name)
        </th>
        <th></th>
    </tr>
    @foreach (var item in Model)
    {
        <tr>
            <td>
                @Html.DisplayFor(modelItem => item.Name)
            </td>
        </tr>
    }
</table>
I want to add this partial view in my main view (PopulateTreeViewModel) and update the table with the relative type of product contained in Function X.
I also tried substituting @Html.ActionLink with @Ajax.ActionLink, and it behaves the same way.
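Have you tried @Html.Action("FunctionX", new { idBrand = item3.BrandID, idProd = item4.ProductID })? The first argument is the action name, and the route value names have to match the parameter names of FunctionX (idBrand and idProd), otherwise the default values are used. Html.RenderAction works too, but it returns void, so it has to be called inside a code block: @{ Html.RenderAction("FunctionX", new { idBrand = item3.BrandID, idProd = item4.ProductID }); }
There are other options too. Please refer to http://www.dotnet-tricks.com/Tutorial/mvc/Q8V2130113-RenderPartial-vs-RenderAction-vs-Partial-vs-Action-in-MVC-Razor.html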

RadioButtonList Model Binding in Asp.Net MVC 4.0

I want to do model binding for my radio button group. The situation is like this:
Whenever I check either the USD or the % radio button, I want my controller to receive a true or false value for the chosen radio button when I submit the form. However, the view does not seem to pass any value, since I declared int properties for them in my model. I get all the other values from the form except the radio buttons.
I would appreciate any help
@for (int i = 0; i < Model.LstOrderDetails.Count; i++)
{
<tr>
@if (!string.IsNullOrEmpty(Model.LstOrderDetails[i].OrderedDiscountAmount) &&
(!string.IsNullOrEmpty(Model.LstOrderDetails[i].OrderedDiscountPerc)))
{
if(decimal.Round(Convert.ToDecimal(Model.LstOrderDetails[i].OrderedDiscountAmount), 2, MidpointRounding.AwayFromZero) != 0M)
{
<td class="col-md-2 col-xs-2">
<table style="font-size: 11px;">
<tr>
<td>
USD
</td>
<td> @Html.RadioButton(Model.LstOrderDetails[i].OrderDetailID, Model.LstOrderDetails[i].IsDiscountAmnt, Convert.ToBoolean(Model.LstOrderDetails[i].IsDiscountAmnt))
</td>
<td rowspan="2">@Html.TextBoxFor(x => x.LstOrderDetails[i].OrderedDiscountAmount, new { @class = "form-control", placeholder = "Discount", style = "font-size: 12px;" })
</td>
</tr>
<tr>
<td>
%
</td>
<td>@Html.RadioButton(Model.LstOrderDetails[i].OrderDetailID, Model.LstOrderDetails[i].IsDiscountPerc, Convert.ToBoolean(Model.LstOrderDetails[i].IsDiscountPerc))
</td>
<td>
</td>
</tr>
</table>
</td>
}
else
{
if (decimal.Round(Convert.ToDecimal(Model.LstOrderDetails[i].OrderedDiscountPerc), 2, MidpointRounding.AwayFromZero) != 0M)
{
removeafterzero = Model.LstOrderDetails[i].OrderedDiscountPerc.Substring(0, Model.LstOrderDetails[i].OrderedDiscountPerc.LastIndexOf('.'));
<td class="col-md-2 col-xs-2">
<table style="font-size: 11px;">
<tr>
<td>
USD
</td>
<td> @Html.RadioButton(Model.LstOrderDetails[i].OrderDetailID, Model.LstOrderDetails[i].IsDiscountAmnt, Convert.ToBoolean(Model.LstOrderDetails[i].IsDiscountAmnt))
</td>
<td rowspan="2">@Html.TextBoxFor(x => removeafterzero, new { @class = "form-control", placeholder = "Discount", style = "font-size: 12px;" })
</td>
</tr>
<tr>
<td>
%
</td>
<td> @Html.RadioButton(Model.LstOrderDetails[i].OrderDetailID, Model.LstOrderDetails[i].IsDiscountPerc, Convert.ToBoolean(Model.LstOrderDetails[i].IsDiscountPerc))
</td>
<td>
</td>
</tr>
</table>
</td>
}
}
}
//Model Part
public class OrderDetail
{
.
.
.
public string OrderedDiscountAmount { get; set; }
public string OrderedDiscountPerc { get; set; }
public int IsDiscountPerc { get; set; }
public int IsDiscountAmnt { get; set; }
}
To tie a field to your model, you need to use a "For" helper. Try changing your radio buttons to
@Html.RadioButtonFor(x => x.LstOrderDetails[i].OrderDetailID, "Dollar")
@Html.RadioButtonFor(x => x.LstOrderDetails[i].OrderDetailID, "Percent")
Since both are tied to the same field, that field will have the value of the selected radio button (Dollar or Percent in this example).
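
As a language-neutral illustration of why one shared field is enough, the sketch below mimics what a browser submits for a radio group: only the checked input contributes a name=value pair. It is plain Python, not ASP.NET code; `encode_form` is a made-up helper and `DiscountType` is a hypothetical property name, not one from the question's model.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(controls):
    """Encode radio inputs the way a form post does: unchecked radios send nothing."""
    pairs = [(c["name"], c["value"]) for c in controls if c.get("checked")]
    return urlencode(pairs)


# Hypothetical field name in the indexed form that MVC list binding expects.
FIELD = "LstOrderDetails[0].DiscountType"

radios = [
    {"name": FIELD, "value": "Dollar", "checked": False},
    {"name": FIELD, "value": "Percent", "checked": True},
]

body = encode_form(radios)   # URL-encoded request body
posted = parse_qs(body)      # what the server side sees after decoding

if __name__ == "__main__":
    print(body)
    print(posted)
```

Because both inputs share one name, the server receives a single key whose value identifies the selected option, so a separate boolean per radio button is not needed.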