How to identify page boundaries in a multiple-page job .prn file (PostScript)

Can someone help me identify the page boundaries in a multiple-page job .prn file?
My goal is to insert the PJL command @PJL SET MEDIASOURCE=TRAYX at the start of every page of a multiple-page job file,
where X is the page number, for example:
for Page 1: @PJL SET MEDIASOURCE=TRAY1
for Page 2: @PJL SET MEDIASOURCE=TRAY2
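Since the title says the .prn contains PostScript, one common way to identify page boundaries is to look for the DSC comments that most drivers emit, in particular the %%Page: lines that precede each page's content. Below is a minimal sketch under that assumption (the file name is illustrative); note that simply splicing @PJL lines into the middle of a PostScript stream is not by itself valid PJL, so locating the boundaries is only the first step.

# Minimal sketch: list the byte offsets of DSC "%%Page:" comments in a
# PostScript .prn file. Assumes the job is DSC-conformant; "job.prn" is
# an illustrative file name.
def find_page_boundaries(path):
    boundaries = []
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"%%Page:"):
                # e.g. b"%%Page: 3 3" - the page label and ordinal follow the keyword
                boundaries.append((offset, line.strip().decode("latin-1")))
            offset += len(line)
    return boundaries

for offset, comment in find_page_boundaries("job.prn"):
    print(offset, comment)

If the file contains no %%Page: comments, the driver did not produce DSC-conformant output and the boundaries have to be found some other way.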

Related

Make Selenium scroll LinkedIn to scrape jobs

I have this code scraping each job title and company name from:
https://www.linkedin.com/jobs/search/?geoId=106155005&location=Egypt
This is for every job title:
job_titles = browser.find_elements_by_css_selector("a.job-card-list__title")
c = []
for title in job_titles:
    c.append(title.text)
print(c)
print(len(c))
This is for every company name:
Company_Names = browser.find_elements_by_css_selector("a.job-card-container__company-name")
d = []
for name in Company_Names:
    d.append(name.text)
print(d)
print(len(d))
I provided the URL above, and there are many, many pages!
How can I make Selenium auto-open each page and scrape all of the roughly 4,000 results available?
I have found a way to paginate to each page, but I don't yet know how to scrape each page.
So the URL is:
https://www.linkedin.com/jobs/search/?geoId=106155005&location=Egypt&start=25
The start parameter in the URL increments by 25 from one page to the next.
So we add this piece of code, which navigates successfully to the other pages:
page = 25
pagination = browser.get('https://www.linkedin.com/jobs/search/?geoId=106155005&location=Egypt&start={}'.format(page))
for i in range(1, 40):
    page = i * 25
    pagination = browser.get('https://www.linkedin.com/jobs/search/?geoId=106155005&location=Egypt&start={}'.format(page))
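For what it's worth, a minimal way to combine that pagination loop with the scraping code from the question might look like the sketch below. It reuses the selectors and browser object from above, assumes roughly 40 pages of 25 results, and uses a plain time.sleep as a crude stand-in for a proper WebDriverWait.

import time

all_titles = []
all_companies = []

for i in range(0, 40):  # assumes about 40 pages of 25 results each
    offset = i * 25
    browser.get('https://www.linkedin.com/jobs/search/?geoId=106155005&location=Egypt&start={}'.format(offset))
    time.sleep(3)  # crude wait for the page to load

    job_titles = browser.find_elements_by_css_selector("a.job-card-list__title")
    company_names = browser.find_elements_by_css_selector("a.job-card-container__company-name")

    all_titles.extend(title.text for title in job_titles)
    all_companies.extend(name.text for name in company_names)

print(len(all_titles), len(all_companies))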

In Adobe Acrobat JavaScript, how can I force a page to become "editable" before a certain part of a script acts upon it?

What I'm trying to do: Iterate over each page in a PDF, and extract the number of words on each page.
What is happening instead: The code below returns 0 words for any page that has not become "editable". Although I have selected the option for all pages to become editable at once, Acrobat will not maintain the editability of a page for very long after I have left that page. Side note: it also seems to cap how many pages can be "editable" at once. This is a problem because right now I'm working with a 10-page selection of a PDF file; the same code will have to work with a 120+ page PDF. Please click 'Edit PDF' --> 'Scanned Documents' --> 'Settings' to see what I mean by "editable". I have already selected the option to have all pages become editable at once.
What I've tried so far: I've tried various ways to get Acrobat to make the page being iterated upon the "active" one so that it would become editable. I've tried manually setting the page number after each iteration of the for loop, and including an artificial delay like the h-indexed for loop in the sample code. I've tried looking for some sort of method that determines which page is the "active" one, but I've had no luck so far.
var CurrDoc = app.activeDocs[0];
CurrDoc.title;
var NumPagesInDoc = CurrDoc.numPages;
console.println("Document has " + NumPagesInDoc + " pages");
for (var j = 0; j < NumPagesInDoc; j++)
{
    var NumWordsOnPage = CurrDoc.getPageNumWords(j);
    CurrDoc.pageNum = j;
    for (var h = 0; h < 10000; h++); // <-- I've tried adding in delays to give time so that
                                     //     Acrobat can catch up, but this hasn't worked.
    console.println("Page number: " + j + " has this number of words: " + NumWordsOnPage);
}
Output:
Document has 10 pages
Page number: 0 has this number of words: 309
Page number: 1 has this number of words: 0
Page number: 2 has this number of words: 0
Page number: 3 has this number of words: 0
Page number: 4 has this number of words: 0
Page number: 5 has this number of words: 0
Page number: 6 has this number of words: 0
Page number: 7 has this number of words: 0
Page number: 8 has this number of words: 0
Page number: 9 has this number of words: 158
true
Note: Different pages might return a non-zero count at different times, depending on which pages I've clicked on most recently before running the script.
Any guidance or help would be greatly appreciated. Thank you for your time.
So, I'm still not entirely sure what the issue is, but I've found a way to get Acrobat to work most of the time.
Before clicking the "make all pages editable" option, zoom all the way out until you can see all the pages in the document. For whatever reason, when I did this, it seemed to refresh something about the settings and once again made all the pages editable. This even seemed to work when I opened a totally different PDF and pressed "make all pages editable", even without zooming out.

Can I count rows across multiple pages of a table with robot framework?

I am quite new to Robot Framework, and have only been working solo on it at work for a month or so.
Currently I am trying to count the total number of rows in a table within the application I am testing (Chrome-based).
This is what I am using:
${count}=    Get Element Count    //table[@class='options-table']/tbody/tr
This brings back a value of 5, which is only counting the first page. However, I'm expecting it to bring back 76, as there are multiple pages.
Can anyone help with how to bring back the number of rows across multiple pages?
${count}=    Get Element Count    //table[@class='options-table']/tbody/tr
Expected result: 76
Actual result: 5 (only the first page)
To avoid slightly complex logic (iterating through pages, summing up element counts) in a Robot Framework keyword, you could write your own keyword in Python, for example.
In this case you need a keyword that takes an element locator (//table[@class='options-table']/tbody/tr to be specific) and a list of page URLs.
To implement such a keyword, create a file like ExtendedSeleniumLib.py:
from robot.libraries.BuiltIn import BuiltIn

def get_element_count_from_pages(locator, *page_urls):
    # Reuse the SeleniumLibrary instance already opened by the test.
    seleniumlib = BuiltIn().get_library_instance('SeleniumLibrary')
    element_count = 0
    for url in page_urls:
        seleniumlib.go_to(url)
        element_count += seleniumlib.get_element_count(locator)
    return element_count
and from your test code you can use it like:
*** Settings ***
Library           SeleniumLibrary
Library           ExtendedSeleniumLib

*** Variables ***
${SE HEADER LOCATOR}    //a[@class='site-header--link fs-headline1 fw-bold']

*** Test Cases ***
Count Elements On Multiple Pages Example
    [Setup]    Open Browser    https://stackoverflow.com    Firefox
    Maximize Browser Window
    Set Selenium Speed    0.1
    ${count}=    Get Element Count From Pages    ${SE HEADER LOCATOR}
    ...    https://iot.stackexchange.com/
    ...    https://sqa.stackexchange.com/
    ...    https://robotics.stackexchange.com/
    Should Be Equal As Integers    ${count}    3
    [Teardown]    Close Browser
This example iterates through three Stack Exchange sites and counts the header elements. As there should be only one on each page, the expected result is 3. Based on this you should be able to count the table rows on your pages.
For how to configure the search path for libraries and resources, check the relevant chapter of the Robot Framework User Guide: Configuring where to search libraries and other extensions. If you place the Python file in the same directory as your robot file, then you do not need to do anything.
Please check the code below. It assumes that the total number of pages will not be more than 100, since I'm not familiar with the webpage; you could instead take this number from the webpage if it is available. Also, if you are sure that the number of rows per page is always 5, then you can use the formula below:
5 * (total number of pages - 1) + row count of the last page
This gives you the total row count across all pages without traversing all the pages. Also, please add any time synchronisation steps needed for a successful run.
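As a quick sanity check of that formula with made-up numbers (76 rows spread over 16 pages of at most 5 rows each):

rows_per_page = 5    # rows on every full page (illustrative)
total_pages = 16     # 15 full pages plus one partial page (illustrative)
last_page_rows = 1   # rows on the last page (illustrative)
total = rows_per_page * (total_pages - 1) + last_page_rows
print(total)         # 76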
Get Count of All Pages
    ${next_page_locator}    Set Variable    enter next page icon/link xpath here
    ${first_row_locator}    Set Variable    enter first row xpath here
    ${total_count}    Set Variable    0
    : FOR    ${index}    IN RANGE    1    100
    \    Wait Until Element Is Visible    ${first_row_locator}
    \    ${count}    Get Element Count    //table[@class='options-table']/tbody/tr
    \    ${total_count}    Evaluate    ${count} + ${total_count}
    \    ${next_link_present}    Run Keyword And Return Status    Page Should Contain Element    ${next_page_locator}
    \    Exit For Loop If    ${next_link_present} is ${False}
    \    Click Element    ${next_page_locator}

Scrapy Rules - Navigating one page at a time

My scrapy script has rules specified as below:
rules = (Rule(LinkExtractor(allow=(), restrict_xpaths=<xpath for next page>), callback=parse_website, follow= True, ),)
The website itself has navigation, but each page only shows the link to the next page, i.e. as page 1 loads I can get the link to page 2, and so on.
How do I get my spider to navigate through all of the n pages?
Thank you!
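For reference, a rule like that is usually already enough: with follow=True, the CrawlSpider re-applies the link extractor to every page it fetches, so it keeps discovering the next-page link page after page until there are none left. A minimal sketch under that assumption (spider name, start URL, XPath and parsed fields are all placeholders):

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class WebsiteSpider(CrawlSpider):
    name = "website"                                # placeholder name
    start_urls = ["https://example.com/page/1"]     # placeholder start URL

    rules = (
        # restrict_xpaths should point at the "next page" link on each page;
        # follow=True makes the spider keep following that link.
        Rule(LinkExtractor(restrict_xpaths="//a[@rel='next']"),
             callback="parse_website", follow=True),
    )

    def parse_website(self, response):
        # placeholder parsing logic
        yield {"url": response.url}

One caveat: by default a CrawlSpider does not pass the start URL's response to the rule callback, so if the first page also needs parsing you would override parse_start_url as well.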

Display (include) MediaWiki table of contents (TOC) on another page

In MediaWiki, we would like to display tables of contents (from multiple pages) on one other page. We know that this can be done automatically, e.g. if we include pages 1, 2 & 3 like this:
{{:Page 1}}
{{:Page 2}}
{{:Page 3}}
on page X, then page X displays a combined TOC for pages 1, 2 & 3.
But we want a table on page X which shows each TOC in a separate cell. Is there any way to include each TOC individually?
I have tried using <noinclude></noinclude> tags around the text on pages 1, 2 & 3 and then forcing a table of contents outside (using __TOC__) but that only creates a TOC on page X (using the contents of page X).
You can't do this directly. The table of contents is generated dynamically on each page, for all the sections that appear on the current page.
When you include the sections (or at least the section headings) of the other pages, they will show up in the TOC of page X. The __TOC__ magic word only tells MediaWiki where to generate the TOC for page X itself.
Some possible workarounds:
1. Include the sections (or just the headings) of pages 1, 2 and 3. They will show up in the TOC of page X even when contained in a <div style="display:none;"> - a really ugly way.
2. Copy the TOC tables manually to page X. You can view their HTML by looking at the generated HTML source of pages 1, 2 and 3 in your browser.
3. Write an extension that allows transclusion of TOCs from other pages. It might introduce a new parser function {{toc:<pagename>}} and be able to call the TOC-generating function in the context of another page.
4. Include only the section headings as a list. On pages 1, 2 and 3 you will need to write
== <onlyinclude><includeonly>##</includeonly> Heading Number One </onlyinclude> ==
=== <onlyinclude><includeonly>###</includeonly> Part One of Heading Number One </onlyinclude> ===
...
which you will be able to include in the table at Page X with
{{:Page 1}}
It should show up as a numbered list, like the TOC.