Web scraping LinkedIn job posts using Python, Selenium & PhantomJS

Some LinkedIn job posts contain a "See more" button that expands the whole job description:
https://www.linkedin.com/jobs/view/401243784/?refId=3024203031501300167509&trk=d_flagship3_search_srp_jobs
I tried to expand it using element.click(), but the source I get after expansion contains placeholder divs instead of the original div. How can I scrape the hidden text?
This is what I get from driver.page_source:
<div class="jobs-ghost-placeholder jobs-ghost-placeholder--medium jobs-ghost-placeholder--thin mb2"></div>
<div class="jobs-ghost-placeholder jobs-ghost-placeholder--x-small jobs-ghost-placeholder--thin mb2"></div>
<div class="jobs-ghost-placeholder jobs-ghost-placeholder--small jobs-ghost-placeholder--thin"></div>
Instead of the source I see in Chrome's inspector:
<div id="ember7189" class="jobs-description-details pt5 ember-view"> <h3 class="jobs-box__sub-title js-formatted-exp-title">Seniority Level</h3>
<p class="jobs-box__body js-formatted-exp-body">Associate</p>
<!---->
<h3 class="jobs-box__sub-title js-formatted-industries-title">Industry</h3>
<ul class="jobs-box__list jobs-description-details__list js-formatted-industries-list">
<li class="jobs-box__list-item jobs-description-details__list-item">Real Estate</li>
<li class="jobs-box__list-item jobs-description-details__list-item">Information Technology and Services</li>
</ul>
<h3 class="jobs-box__sub-title js-formatted-employment-status-title">Employment Type</h3>
<p class="jobs-box__body js-formatted-employment-status-body">Full-time</p>
<h3 class="jobs-box__sub-title js-formatted-job-functions-title">Job Functions</h3>
<ul class="jobs-box__list jobs-description-details__list js-formatted-job-functions-list">
<li class="jobs-box__list-item jobs-description-details__list-item">Information Technology</li>
<li class="jobs-box__list-item jobs-description-details__list-item">Project Management</li>
<li class="jobs-box__list-item jobs-description-details__list-item">Product Management</li>
</ul>
</div>
I also tried different timeout values for the wait, e.g. WebDriverWait(driver, 3), but in vain.
Code:
employment_type = wait.until(EC.presence_of_element_located(
(By.CSS_SELECTOR, 'div.jobs-description__details>div.jobs-description-details>p.js-formatted-employment-status-body'))).text
This raises a TimeoutException, because the page only contains the jobs-ghost-placeholder divs and no element matching the CSS selector.

Related

Cypress - get an element in iframe

I solved the problem of getting into the iframe, but now I can't get my element. Maybe I'm searching for it the wrong way, but it has already taken me too much time and I don't know what to do next.
Source code:
<div id="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_inpDruhVozidla_ADX" class="inputCell" style="visibility:visible;display:inherit;">
<span id="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_lblDruhVozidla_ADX" class="labels labelC1_n W270">Druh vozidla:
</span>
<div id="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_cmbDruhVozidla_ADX" tabindex="13" class="RadDropDownList RadDropDownList_CMS_Black RadComboBoxInput" style="width:216px;height:23px;font-weight:bold;font-size:10pt;font-family:Arial;color:#396170;border-width:1px;border-style:Solid;border-color:#FDC267;background-color:#F9FBFC;">
<span class="rddlInner">
<span class="rddlFakeInput"></span>
<span class="rddlIcon"><!-- --></span>
</span>
<div class="rddlSlide" id="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_cmbDruhVozidla_ADX_DropDown" style="display:none;">
<div class="rddlPopup rddlPopup_CMS_Black">
<ul class="rddlList">
<li class="rddlItem rddlItemSelected"></li>
<li class="rddlItem">Osobní automobily</li>
<li class="rddlItem">Motocykly</li>
<li class="rddlItem">Užitkové automobily</li>
</ul>
</div>
</div>
<input id="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_cmbDruhVozidla_ADX_ClientState" name="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_cmbDruhVozidla_ADX_ClientState" type="hidden" />
</div>
</div>
My get function:
cy.get('#iframe-id')
.iframe('body #elementToFind')
.should('exist')
Thank you all for helping me.
Unfortunately, Cypress has some open issues regarding interaction with iframes. But here's a pretty straightforward workaround: https://github.com/cypress-io/cypress/issues/136#issuecomment-328100955.
Anyway, I believe that this can work only if the domain of the outer page and of the iframe are the same, due to the same-origin limitation.

How to use indexes in XPath

I have a popup with three dropdowns; their ids are regenerated
each time the popup is created:
The first element:
<a aria-required="true" class="select" aria-disabled="false"
aria-describedby="5715:0-label" aria-haspopup="true" tabindex="0" role="button"
title="" href="javascript:void(0);" data-aura-rendered-by="5733:0"
data-interactive-lib-uid="10">Stage 1 - Needs Assessment</a>
While I'm able to identify the element above with the simple XPath //*[@class='select'][1], the other two, which look the same to me (example below), can't be identified by an index like //*[@class='select'][2]. I tried the 'following' axis without success, but my syntax may be wrong.
Example of a dropdown element I'm unable to locate:
<a aria-required="false" class="select" aria-disabled="false"
aria-describedby="6280:0-label" aria-haspopup="true" tabindex="0" role="button"
title="" href="javascript:void(0);" data-aura-rendered-by="6290:0"
data-interactive-lib-uid="16">--None--</a>
Any ideas what I am missing, apart from advanced XPath knowledge?
Thank you!
//*[@class='select'][2] will return the required node only if both links are children of the same parent, e.g.
<div>
<a class="select">Stage 1 - Needs Assessment</a>
<a class="select">--None--</a>
</div>
If links are children of different parents, e.g.
<div>
<a class="select">Stage 1 - Needs Assessment</a>
</div>
<div>
<a class="select">--None--</a>
</div>
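you should wrap the expression in parentheses so that the index applies to the whole result set rather than to each parent:
(//*[@class='select'])[1]
for the first and
(//*[@class='select'])[2]
for the second.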
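The difference between a per-parent position and a document-wide position can be reproduced with a few lines of Python. This is only a sketch: it uses the standard library's xml.etree.ElementTree, which implements a subset of XPath and cannot evaluate the parenthesised form, so indexing the result list stands in for (//*[@class='select'])[2]. The MARKUP string is a made-up reduction of the popup, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each link sits in its own parent <div>.
MARKUP = """
<root>
  <div><a class="select">Stage 1 - Needs Assessment</a></div>
  <div><a class="select">--None--</a></div>
</root>
"""

root = ET.fromstring(MARKUP)

# Per-parent position: "an <a> that is the second <a> child of its parent".
# No <div> here has two links, so nothing matches.
per_parent = root.findall(".//a[2]")

# Document-wide position: collect every match first, then index the list.
all_links = root.findall(".//a[@class='select']")
second_link = all_links[1]  # Python lists are 0-based, so [1] is the second match

print(len(per_parent))
print(second_link.text)
```

With a full XPath engine, such as the one the browser uses when Selenium evaluates a locator, the parenthesised expression does the same job in a single query.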

Autocomplete with Selenium

I'm having some (actually, a lot) of trouble automating selection from autocomplete options with Selenium. Currently, I am able to automate the inputting of the text, though, I am not able to select anything from the appearing drop-down suggestions list that pops up. I tried searching here for some answers to my problem, but nothing has worked. Below is the element that appears with the suggestions that I am trying to select:
<div class="cs-autocomplete-popup">
<div class="inner">
<div class="cs-autocomplete-Matches csc-autocomplete-Matches">
<ul>
<li class="cs-autocomplete-matchItem csc-autocomplete-matchItem">
<span class="csc-autocomplete-matchItem-content cs-autocomplete-matchItem-content" id="matchItem::matchItemContent">john doe</span>
</li>
</ul>
</div>
<div class="csc-autocomplete-addToPanel cs-autocomplete-addToPanel">
<hr>
<div class="content csc-autocomplete-addTermTo cs-autocomplete-addTermTo">Add "John Doe" to:</div>
<ul>
<li class="cs-autocomplete-authorityItem csc-autocomplete-authorityItem" id="authorityItem:">Local Persons</li>
</ul>
</div>
</div>
<div class="cs-autocomplete-popup-miniView csc-autocomplete-popup-miniView" style="top: 2px; left: 149px; display: none;"><div class="cs-miniView">
john doe
<div>
<span class="csc-autocomplete-popup-miniView-field1Label cs-autocomplete-popup-miniView-field1Label">b.</span>
<span class="csc-autocomplete-popup-miniView-field1 cs-autocomplete-popup-miniView-field1" id="field1"></span>
</div>
<div>
<span class="csc-autocomplete-popup-miniView-field2Label cs-autocomplete-popup-miniView-field2Label">d.</span>
<span class="csc-autocomplete-popup-miniView-field2 cs-autocomplete-popup-miniView-field2" id="field2"></span>
</div>
<div>
<span class="csc-autocomplete-popup-miniView-field3 cs-autocomplete-popup-miniView-field3" id="field3"></span>
</div>
<div>
</div>
</div></div>
</div>
From this, I am trying to select "john doe". Does anyone know an efficient/complete way of doing this? I would very much appreciate the help.
// Locate the autocomplete text box (replace the placeholder with the field's real XPath).
driver.findElement(By.xpath("get the exact field address of autocomplete textbox"));
// Crude pause so that the suggestion list has time to render.
Thread.sleep(5000);
// Collect every suggestion span whose text is exactly "john doe".
List<WebElement> links = driver.findElements(By.xpath("//span[contains(@class, 'cs-autocomplete-matchItem-content') and .='john doe']"));
for (WebElement link : links) {
    if (link.getText().equalsIgnoreCase("john doe")) {
        link.click();
        break; // stop after the first match; the list may go stale after the click
    }
}

Selenium - Not able to Click Dynamically Visible Menu

I have a menu whose li (list) elements only become enabled after you hover the mouse over a particular label.
driver.get("www.snapdeal.com");
Actions actions = new Actions(driver);
actions.moveToElement(driver.findElement(By.id("loggedOutAccount"))).build().perform();
//Wait for 5 Secs
driver.findElement(By.className("accountLink")).click();// Here it's throwing Element not visible exception
This code performs the mouse-hover properly but is not able to click the "Click here to sign in" link, even though the element is visible when I check manually.
DOM structure:
<div id="loggedOutAccount" class="hd-rvmp-logout">
<a class="signIn" href="javascript:void(0);">
<i class="iconHeader accountUser"></i>
<label class="my-account-lang"> My Account</label>
<i class="mar_2_left right-downArrow breadcrumbArrow-down"></i>
</a>
<div class="sdNavDropWrapper accDetails" style="display: none; z-index: 999;">
<ul class="positionAbsolute pull-right">
<li class="customLoggedInState">
<div class="left triangle"></div>
<div class="right triangle"></div>
<div>
<a class="accountLink" href="javascript:void(0);">Click here to sign in ></a>
</div>
</li>
<li class="stop-event">
<li class="stop-event">
<li class="stop-event">
<li class="stop-event">
<li class="stop-event">
</ul>
</div>
</div>
Use XPath for both elements, as below:
driver.get("https://www.snapdeal.com");
Actions actions = new Actions(driver);
actions.moveToElement(driver.findElement(By.xpath("yourxpathhere"))).build().perform();
driver.findElement(By.xpath("yourxpathhere")).click();
I think the class or id is repeated on other elements for styling purposes, so an XPath is a better way to pin down the unique element.
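The posted DOM also shows the accountLink anchor inside a div with style="display: none", which fits the "element not visible" error if the click fires before the hover has revealed the dropdown. One possible approach, offered as an assumption rather than a confirmed fix, is to poll until the link reports itself as displayed and only then click. Selenium ships WebDriverWait for this kind of polling; the sketch below shows the idea without the library. wait_until and FakeLink are hypothetical names, and FakeLink merely imitates an element's is_displayed() and click() methods.

```python
import time


def wait_until(predicate, timeout=5.0, poll=0.1):
    """Call predicate until it returns a truthy value or timeout (seconds) passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


class FakeLink:
    """Hypothetical stand-in for an element that becomes visible on the third check."""

    def __init__(self):
        self.checks = 0
        self.clicked = False

    def is_displayed(self):
        self.checks += 1
        return self.checks >= 3

    def click(self):
        self.clicked = True


link = FakeLink()
# Hover would happen here; then wait for visibility before clicking.
wait_until(link.is_displayed, timeout=2.0, poll=0.01)
link.click()
```

In a real test the predicate would be the is_displayed method of the located link, and the hover action would run just before the wait.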

Is there a Microformat for the Hours a Business is open?

I was wondering if there was yet a Microformat for a business's hours of operation.
If not, who do I submit a standard to?
After submitting the same question to the Microformats mailing list, I received a reply from someone named Martin Hepp who apparently has come up with a specification for this.
He provided me with the following links:
The GoodRelations vocabulary provides
a standard way for business hours of
operation, see:
http://www.ebusiness-unibw.org/wiki/Rdfa4google
http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey
#
the full spec and other materials are at
http://www.ebusiness-unibw.org/wiki/GoodRelations
This is used e.g. by Bestbuy to expose the opening hours of their 1000k
stores in the US.
Best
Martin
The most widely used markup for opening hours on the Web is GoodRelations.
Here is an example:
<div xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:gr="http://purl.org/goodrelations/v1#"
xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<div about="#store" typeof="gr:LocationOfSalesOrServiceProvisioning">
<div property="rdfs:label" content="Pizzeria La Mamma"></div>
<div rel="vcard:adr">
<div typeof="vcard:Address">
<div property="vcard:country-name" content="Germany"></div>
<div property="vcard:locality" content="Munich"></div>
<div property="vcard:postal-code" content="85577"></div>
<div property="vcard:street-address" content="1234 Main Street"></div>
</div>
</div>
<div property="vcard:tel" content="+33 408 970-6104"></div>
<div rel="foaf:depiction" resource="http://www.pizza-la-mamma.com/image_or_logo.png"></div>
<div rel="vcard:geo">
<div>
<div property="vcard:latitude" content="48.08" datatype="xsd:float"></div>
<div property="vcard:longitude" content="11.64" datatype="xsd:float"></div>
</div>
</div>
<div rel="gr:hasOpeningHoursSpecification">
<div about="#mon_fri" typeof="gr:OpeningHoursSpecification">
<div property="gr:opens" content="08:00:00" datatype="xsd:time"></div>
<div property="gr:closes" content="18:00:00" datatype="xsd:time"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Friday"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Thursday"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Wednesday"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Tuesday"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Monday"></div>
</div>
</div>
<div rel="gr:hasOpeningHoursSpecification">
<div about="#sat" typeof="gr:OpeningHoursSpecification">
<div property="gr:opens" content="08:30:00" datatype="xsd:time"></div>
<div property="gr:closes" content="14:00:00" datatype="xsd:time"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Saturday"></div>
</div>
</div>
<div rel="foaf:page" resource=""></div>
</div>
</div>
Note that the Microformats suggestion from Ton does not really model that this is an opening hour, so a client cannot do a lot with it. GoodRelations markup is supported by many major companies. For example, BestBuy is using GoodRelations on all of their 1000+ store pages for indicating opening hours.
An HTML microformat could look like this:
<ol class="business_hours">
<li class="monday">Maandag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="17:00:00+01">18.00</span> uur</li>
<li class="tuesday">Dinsdag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="17:00:00+01">18.00</span> uur</li>
<li class="wednesday">Woensdag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="17:00:00+01">18.00</span> uur</li>
<li class="thursday">Donderdag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="17:00:00+01">18.00</span> uur</li>
<li class="friday">Vrijdag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="17:00:00+01">18.00</span> uur</li>
<li class="saturday">Zaterdag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="15:00:00+01">16.00</span> uur</li>
<li>Zondag Gesloten</li>
</ol>
Excuse my Dutch :)
My 2 cents.
The Microformats community has updated its wiki with a suggested way of implementing operating hours based on hCalendar.
http://microformats.org/wiki/operating-hours
See https://schema.org/openingHours
Schema.org is an initiative launched on 2 June 2011 by Bing, Google and Yahoo.
An example:
<strong>Opening Hours:</strong>
<time itemprop="openingHours" datetime="Tu,Th 16:00-20:00">
Tuesdays and Thursdays 4-8pm
</time>
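A client that reads this markup has to interpret the datetime value itself. As a rough sketch, assuming the value follows the pattern of two-letter day codes (optionally as a range such as Mo-Fr) followed by a 24-hour time span, it could be parsed like this. parse_opening_hours is a hypothetical helper, not part of any Schema.org tooling, and it does not handle ranges that wrap around the week (e.g. Sa-Mo).

```python
from datetime import time

# Two-letter day codes used in openingHours values.
DAY_CODES = ("Mo", "Tu", "We", "Th", "Fr", "Sa", "Su")


def parse_opening_hours(value):
    """Parse a value such as 'Tu,Th 16:00-20:00' into (days, opens, closes)."""
    days_part, hours_part = value.split()
    days = []
    for item in days_part.split(","):
        if "-" in item:
            # A range such as 'Mo-Fr' expands to every day between the endpoints.
            start, end = item.split("-")
            i, j = DAY_CODES.index(start), DAY_CODES.index(end)
            days.extend(DAY_CODES[i:j + 1])
        else:
            DAY_CODES.index(item)  # raises ValueError for an unknown day code
            days.append(item)
    opens_text, closes_text = hours_part.split("-")
    return days, time.fromisoformat(opens_text), time.fromisoformat(closes_text)


days, opens, closes = parse_opening_hours("Tu,Th 16:00-20:00")
print(days, opens, closes)
```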
Perhaps http://microformats.org/ may be of use...
If this is still useful, you should submit your proposal to the microformats community through their wiki at microformats.org.
The wiki documents the process for proposing a new microformat specification.
Hope that helps.