Why do I get "www.hankyung.com" in this script? How can I match class="info" but not class="info press"? - beautifulsoup

Why does this script return the "www.hankyung.com" URL? Is there a way to match only the link with class="info" and not the one with class="info press"?
links[0]
<div class="info_group"> <a class="info press" href="http://www.hankyung.com/" onclick="return goOtherCR(this, 'a=nws*a.prof&r=1&i=88000107_000000000000000004520785&g=015.0004520785&u='+urlencode(this.href));" target="_blank"><span class="thumb_box"><img alt="" class="thumb" height="20" onerror="this.src='data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7';this.className='thumb bg_default_press'" src="https://search.pstatic.net/common/?src=https%3A%2F%2Fmimgnews.pstatic.net%2Fimage%2Fupload%2Foffice_logo%2F015%2F2018%2F08%2F01%2Flogo_015_18_20180801163901.png&type=f54_54&expire=24&refresh=true" width="20"/></span>한국경제</a><span class="info">1시간 전</span><a class="info" href="https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=105&oid=015&aid=0004520785" onclick="return goOtherCR(this, 'a=nws*a.nav&r=1&i=88000107_000000000000000004520785&u='+urlencode(this.href));" target="_blank">네이버뉴스</a> </div>
news_url = links[0].find("a", {"class":"info"}).get("href")
news_url
>>>'http://www.hankyung.com/'

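find("a", {"class": "info"}) matches any tag whose class list contains info, so the first link, which has class="info press", is returned. You can use :not() to exclude the unwanted class (bs4 4.7.1+):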
news_url = links[0].select_one('a.info:not(.press)')['href']
news_url
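For reference, here is a minimal sketch of why find() picks the first link and how two alternatives behave. The markup is a trimmed copy of the div above (attributes such as onclick are dropped), and the names group, first_match, css_match and exact_match are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question; only the relevant attributes are kept.
html = """
<div class="info_group">
  <a class="info press" href="http://www.hankyung.com/">한국경제</a>
  <span class="info">1시간 전</span>
  <a class="info" href="https://news.naver.com/main/read.nhn?mode=LSD&amp;mid=sec&amp;sid1=105&amp;oid=015&amp;aid=0004520785">네이버뉴스</a>
</div>
"""

group = BeautifulSoup(html, "html.parser")

# find() matches on any single class token, so class="info press" also qualifies.
first_match = group.find("a", {"class": "info"})["href"]

# The CSS :not() pseudo-class skips anchors that also carry the "press" class.
css_match = group.select_one("a.info:not(.press)")["href"]

# Alternative without CSS: a tag function that compares the whole class list.
exact_match = group.find(lambda tag: tag.name == "a" and tag.get("class") == ["info"])

print(first_match)
print(css_match)
print(exact_match["href"])
```

The tag-function variant compares the complete class list, so it does not depend on the CSS selector support that select_one needs.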

Related

Scraping data from p tags with Selenium

I searched a lot online but couldn't find an example similar to the one below. I'm trying to pull text from a web page. The first p tag has no location line; the second p tag does. When I pull the data, I only get the contents of the p tag that has the location line and cannot get the contents of the other p tag. How can I pull the data inside both the first and the second p tag?
HTML from the page source:
<div class=" col-md-8">
<p>
<i class='fa fa-home main-color'></i> ORHAN MAH.İBRAHİM CAD. NO:35
<br>
<i class='fa fa-phone main-color'></i>
<a class="gri" href="tel:0508-2920344">0508-2920344 </a>
<br />
<i class='fa fa-clock-o main-color'></i>
<span class="red">19.01.2022</span>
</p>
<p>
<i class='fa fa-home main-color'></i> HAZAN MAH.ÖKTEM CAD. NO:13/B
<br>
<i class='fa fa-phone main-color'></i>
<a class="gri" href="tel:0584 837 23 70">0584 837 23 70 </a>
<br>
<i class="fa fa-map-marker main-color"></i>
<a class="gri" href="https://www.google.com/maps?q=35.554433,25.887766" target="_blank">Haritada</a>
<br />
<i class='fa fa-clock-o main-color'></i>
<span class="red">20.01.2022</span>
</p>
</div>
Here is the selenium code I used to pull the data from the HTML source above:
item = browser.find_elements_by_class_name("col-md-10")
urls = browser.find_elements_by_xpath("//div[@class=' col-md-10']/p/a[2]")
for i in zip(item, urls):
    try:
        address = i[0].find_element_by_css_selector("p").text.split("\n")[:2]
    except:
        address = None
    try:
        phone = i[0].find_element_by_xpath("//a[@class='gri'][1]").text
    except:
        phone = None
    print(address)
    print(phone)
    try:
        url = i[1].get_attribute('href').replace("https://www.google.com/maps?q=", "")
    except:
        url = None
    try:
        date = i[0].find_element_by_xpath("//span[@class='red'][1]").text
    except:
        date = None
    print(url)
    print(date)
Use the XPath //div[@class=' col-md-8']/p. It returns both p elements.
You can then loop over them with a for loop and apply whatever string operations you need to the text of each p tag.
The first p block has no location section; the second p block does. For the first block, I want to print None in place of the location. When I try to pull the values with zip_longest, the location is not pulled correctly.
#1.p tag block
ORHAN MAH.İBRAHİM CAD. NO:35
0508-2920344
19.01.2022
#2.p tag block
HAZAN MAH.ÖKTEM CAD. NO:13/B
0584 837 23 70
Haritada
20.01.2022
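One way to sketch the per-block approach is shown below. It assumes the page source (for example browser.page_source) is handed to BeautifulSoup rather than queried through Selenium element lookups. The helper parse_block and the constant MAPS_PREFIX are hypothetical names, and the HTML is copied from the question. Each field is looked up inside its own p block, so a missing location becomes None without shifting the other values.

```python
from bs4 import BeautifulSoup

# Page source copied from the question.
page_source = """
<div class=" col-md-8">
<p>
<i class='fa fa-home main-color'></i> ORHAN MAH.İBRAHİM CAD. NO:35
<br>
<i class='fa fa-phone main-color'></i>
<a class="gri" href="tel:0508-2920344">0508-2920344 </a>
<br />
<i class='fa fa-clock-o main-color'></i>
<span class="red">19.01.2022</span>
</p>
<p>
<i class='fa fa-home main-color'></i> HAZAN MAH.ÖKTEM CAD. NO:13/B
<br>
<i class='fa fa-phone main-color'></i>
<a class="gri" href="tel:0584 837 23 70">0584 837 23 70 </a>
<br>
<i class="fa fa-map-marker main-color"></i>
<a class="gri" href="https://www.google.com/maps?q=35.554433,25.887766" target="_blank">Haritada</a>
<br />
<i class='fa fa-clock-o main-color'></i>
<span class="red">20.01.2022</span>
</p>
</div>
"""

MAPS_PREFIX = "https://www.google.com/maps?q="


def parse_block(p):
    """Collect the fields of one <p> block; absent fields become None."""
    phone = p.select_one('a.gri[href^="tel:"]')
    location = p.select_one(f'a.gri[href^="{MAPS_PREFIX}"]')
    date = p.select_one("span.red")
    return {
        # The address is the first non-empty text string inside the block.
        "address": next(p.stripped_strings, None),
        "phone": phone.get_text(strip=True) if phone else None,
        "location": location["href"].replace(MAPS_PREFIX, "") if location else None,
        "date": date.get_text(strip=True) if date else None,
    }


soup = BeautifulSoup(page_source, "html.parser")
rows = [parse_block(p) for p in soup.select("div.col-md-8 > p")]
for row in rows:
    print(row)
```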

Call a function from product-variants.tpl in Prestashop

I'm building a website in Prestashop and my problem is the following:
I created a new table in the database (MySQL) called 'tallas' that stores the users' size preferences. Now I need to query this table from the product-variants.tpl page, because I would like the option chosen by the customer to be checked by default.
I created this function in classes/Product.php:
public function getTalla($id_customer){
    $result = Db::getInstance()->ExecuteS('
        SELECT `talla1`
        FROM `tallas`
        WHERE `cliente` = `$id_customer`');
    return $result;
}
I would like the function to receive the user id as a parameter (I don't know how to do that yet) and return the value of talla1. Then I would like to mark the option selected by the customer as checked in the radio input in themes/asmart/templates/catalog/_partials/product-variants.tpl:
<li class="float-xs-left input-container">
<label>
<input class="input-color" type="radio" data-product-attribute="{$id_attribute_group}" name="group[{$id_attribute_group}]" value="{$id_attribute}"{if $group_attribute.selected} checked="checked"{/if}>
<span
{if $group_attribute.html_color_code}class="color" style="background-color: {$group_attribute.html_color_code}" {/if}
{if $group_attribute.texture}class="color texture" style="background-image: url({$group_attribute.texture})" {/if}
><span class="sr-only">{$group_attribute.name}</span></span>
</label>
</li>
Honestly, I have no idea how to do this. Can someone help me?
UPDATE: The problem is solved! Here is what I did:
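In classes/Product.php:
public function getTalla()
{
    // Id of the current customer, taken from the context and cast to an integer.
    $variable = (int) Context::getContext()->customer->id;
    // Concatenate the id: a single-quoted PHP string does not interpolate variables.
    $result = Db::getInstance()->executeS('SELECT * FROM `tallas` WHERE `cliente` = ' . $variable);
    return $result;
}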
In ProductController.php:
$this->context->smarty->assign('images_ex', $this->product->getTalla());
In product-variants.tpl:
{if $images_ex[0]['talla1'] == $group_attribute.name} checked="checked"{/if}
I hope this can help someone in the future.

How to select specific text to scrape

I'm trying to scrape the following HTML. I want to get just the "Some Header" part and not the additional info.
<li class="media">
<div class="media-body">
<h4> Some Header <span class="label label-info"> additional Info </span> </h4> Address info
<br>
</div> </li>
I'm trying the following:
val li: Elements = ul.select("li")
val list: Elements = li.select("a")
val headers: Elements = list.select("h4")
When I then try to get the inner text via headers.text(), I get both "Some Header" and "additional Info".
How can I scrape only the "Some Header" part?
You are almost there. You are probably looking for ownText(), which returns only the text that belongs directly to the element and skips the text of child elements such as the span:
String s = "<li class=\"media\"> \n" +
" <div class=\"media-body\"> \n" +
" <h4> Some Header <span class=\"label label-info\"> additional Info </span> </h4> Address info\n" +
" <br> \n" +
" </div> </li>";
Document document = Jsoup.parse(s);
// The sample HTML has no <a> element, so select the <h4> inside the <li> directly.
Elements headers = document.select("li h4");
System.out.println(headers.first().ownText());
Output:
Some Header

Scrapy CSV export shows the same data in all rows

I'm trying to scrape the following HTML:
<ul class="results-list" id="search-results">
<li>
<h3 class="name">First John</h3>
<div class="details">
email
<span class="phone">999999999</span>
</div>
</li>
<li>
<h3 class="name">Second John</h3>
<div class="details">
email
<span class="phone">999999999</span>
</div>
</li>
</ul>
When I run my spider, I get 2 rows containing the same information. I have Name, Email and Phone columns, and the Name column of both rows, for example, contains:
First John,Second John.
My Scrapy code is the following:
people = response.xpath('//ul[@class="results-list"]/li')
for person in people:
    item = SpiderItem()
    item['Name'] = person.xpath(
        '//h3/text()').extract()
    item['Email'] = person.xpath(
        '//div[@class="details"]/a/@href').extract()
    item['Phone'] = person.xpath(
        '//div[@class="details"]/span[@class="phone"]/text()').extract()
    yield item
However when I run scrapy crawl MySpider -o output.csv I get the same information in all rows.
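You are using absolute paths in your XPath expressions. An expression that starts with // searches the whole document, even when it is called on person. Prefix it with a dot (.//) to make it relative to the current li, and use extract_first() to get a single value instead of a list: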
for person in people:
    item = SpiderItem()
    item['Name'] = person.xpath(
        './/h3/text()').extract_first()
    item['Email'] = person.xpath(
        './/div[@class="details"]/a/@href').extract_first()
    item['Phone'] = person.xpath(
        './/div[@class="details"]/span[@class="phone"]/text()').extract_first()
    yield item
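The difference between a document-wide and a scoped search can be sketched without Scrapy. This example is only an illustration: it uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors (an assumption made so it runs on its own), and it emulates the document-wide search by searching from the root element. The markup is a condensed copy of the HTML from the question, which happens to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the markup from the question.
markup = """
<ul class="results-list" id="search-results">
  <li>
    <h3 class="name">First John</h3>
    <div class="details">email <span class="phone">999999999</span></div>
  </li>
  <li>
    <h3 class="name">Second John</h3>
    <div class="details">email <span class="phone">999999999</span></div>
  </li>
</ul>
"""

root = ET.fromstring(markup.strip())

# Document-wide search: every <h3> in the tree, no matter which <li> is being processed.
all_names = [h3.text for h3 in root.findall(".//h3")]

# Search scoped to each <li>: exactly one name and one phone per row.
rows = []
for person in root.findall("./li"):
    rows.append({
        "Name": person.findtext("./h3"),
        "Phone": person.findtext("./div[@class='details']/span[@class='phone']"),
    })

print(all_names)
print(rows)
```

The document-wide list holds both names at once, which is what ended up in every CSV row; the scoped lookups give one name per li.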

How do I call the controller from a new folder in MVC4

How do I call a controller from an anchor tag when the controller is located at:
Area->Ticket->TicketController
Here is the code:
<a href="@Url.Action("TicketTemplate", "MyTickets", new {area = string.Empty,controller = "TicketTemplate", page = Model.PageNumber, sort = "DateCreated ", isAsc = isAsc })">
Date Created
<span class="@clsDateCreated" style="text-align: right;"></span>
</a>
The above code is not working.
How do I call the controller?
Here is the path:
~/Areas/Ticket/Controllers/TicketTemplateController.cs
You can do it like this and it will work, but I don't know whether there is a better solution.
<a href="@Url.Action("MyTickets", "Ticket/TicketTemplate", new {area = string.Empty,controller = "TicketTemplate", page = Model.PageNumber, sort = "DateCreated ", isAsc = isAsc })">
Date Created
<span class="@clsDateCreated" style="text-align: right;"></span>
</a>