How do I use chained `css()` calls so the selector in the second call uses the first call as the context? - scrapy

I'm processing a table row-by-row and need to sniff the ids of the rows:
<table id="tbl">
<tr id="row_1">
<td id="cell_1">...</td>
</tr>
<tr id="row_2">
<td id="cell_2">...</td>
</tr>
</table>
So my code looks something like:
def parse_table(self, response):
    rows = response.css('#tbl > tr')
    for row in rows:
        rowid = row.css('::attr(id)')
        if rowid.extract_first().startswith('row'):
            ...
However, this way, the second call to .css() gives me IDs of all the descendants of row, not just its direct children. I.e. for the above example HTML, it returns "cell_1" as well as "row_1". How do I scope the chained css() call so it only acts on direct children of the given row?
I've tried using the :scope pseudo-class but that doesn't seem to be supported by Scrapy, and :root gives me no results.
Alternatively, can I just get the value of the id attribute without going through CSS?

I can show you how to use XPath for the same task:
def parse_table(self, response):
    for row in response.xpath('//*[@id="tbl"]/tr'):
        rowid = row.xpath('./@id').extract_first()
        if rowid.startswith('row'):
            ...
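Why the CSS version over-matches: Scrapy (via parsel) translates a bare `::attr(id)` into roughly `descendant-or-self::*/@id`, so it reads the id of the row *and* of every element inside it, while `./@id` only reads the attribute of the current node. Inside Scrapy, `row.xpath('./@id').get()` (or, in recent Scrapy/parsel versions, `row.attrib.get('id')`) answers the "without CSS" part. To check the same "direct children only, own attribute only" logic outside Scrapy, here is a minimal sketch that swaps in the standard library's `xml.etree.ElementTree` instead of Scrapy and assumes the table is well-formed XML; `row_ids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

TABLE = """
<table id="tbl">
  <tr id="row_1"><td id="cell_1">...</td></tr>
  <tr id="row_2"><td id="cell_2">...</td></tr>
</table>
"""


def row_ids(table_xml: str) -> list[str]:
    """Return the id of each direct <tr> child of the table, ignoring cell ids."""
    table = ET.fromstring(table_xml)
    ids = []
    # "./tr" matches only direct children, like '#tbl > tr' in CSS
    for row in table.findall("./tr"):
        # .get() reads the attribute of this element only, never its descendants
        rowid = row.get("id") or ""
        if rowid.startswith("row"):
            ids.append(rowid)
    return ids


print(row_ids(TABLE))  # ['row_1', 'row_2']
```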

Related

htmx: How to swap table row with hx-swap-oob?

I want to use hx-swap-oob to replace a table row of the existing page "out of band".
in browser:
<table>
<tr id="offer_1">....</tr>
<tr id="offer_2">....</tr> (old)
<tr id="offer_3">....</tr>
</table>
From Server to client:
<table hx-swap-oob="outerHTML:#offer_2" hx-select="#offer_2">
<tr id="offer_2"> .... </tr> (new)
</table>
But up to now this is the result:
<table>
<tr id="offer_1">....</tr>
<table hx-swap-oob="outerHTML:#offer_2" hx-select="#offer_2">
<tr id="offer_2"> .... </tr> (new)
</table>
<tr id="offer_3">....</tr>
</table>
I guess hx-select is not evaluated when htmx gets this snippet from the server.
How can I swap a row out-of-band?
Take a look at the new extension multi-swap.
https://htmx.org/extensions/multi-swap/
It allows swapping multiple elements marked with the id attribute.
For each element it is possible to choose which swap method should be used.
This does work:
<tr hx-swap-oob="true" id="offer_2"> .... </tr> (new)
But it has a drawback:
You need to modify the method that creates this row. Depending on your context, you might already have such a method. Why should you modify it just because its result is used out-of-band?
If you use Django, you can use this snippet to add the hx-swap-oob attribute after the HTML has been created:
import re

from django.utils.safestring import SafeString, mark_safe


def add_oob_attribute(html):
    """
    I would like to avoid this ugly hack
    https://github.com/bigskysoftware/htmx/issues/423
    """
    assert isinstance(html, SafeString)
    # Match only "<tagname" so tags without attributes (e.g. "<tr>") work too
    new, count = re.subn(r'(<[A-Za-z][\w-]*)', r'\1 hx-swap-oob="true"', html, count=1)
    if count != 1:
        raise ValueError(f'Could not add hx-swap-oob: {html}')
    return mark_safe(new)
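If you are not using Django, the same idea works on plain strings. This is a minimal sketch with the `SafeString`/`mark_safe` handling dropped, so it assumes your templating layer has already escaped the fragment. The regex stops at the tag name, so an attribute-less tag such as `<tr>` gets the attribute inside the tag rather than after its `>`.

```python
import re

# Matches the opening "<tagname" of the first element only
_FIRST_TAG = re.compile(r"(<[A-Za-z][\w-]*)")


def add_oob_attribute(html: str) -> str:
    """Insert hx-swap-oob="true" into the first tag of an HTML fragment."""
    new, count = _FIRST_TAG.subn(r'\1 hx-swap-oob="true"', html, count=1)
    if count != 1:
        raise ValueError(f"Could not add hx-swap-oob: {html}")
    return new


print(add_oob_attribute('<tr id="offer_2"><td>new</td></tr>'))
# <tr hx-swap-oob="true" id="offer_2"><td>new</td></tr>
```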
I created an issue to find a better solution in the future:
https://github.com/bigskysoftware/htmx/issues/423

Selenium xpath with multiple conditions and fetch element value

I have a table row (tr) whose id is generated dynamically:
<tbody>
<tr id="24686" tabindex="0">
<td class="nowrap xh-highlight" style="padding: 3px 8px;">Available</td>
</tr>
</tbody>
I have Xpath 1: (//tbody/tr/td[contains(text(),'Available')])[1] which returns
Available
and XPath 2: //tr[1]/@id which returns
ld_9050427
22707
What I want is a single XPath that finds the first row whose status is Available and returns its id. Later on I want to reuse this id in the rest of the process.
I tried something like below but it didn't work
(//tbody/tr[/@id and/td[contains(text(),'Disponible')]])[1]
If you want to select a tr that has an id attribute (any value) and a table cell with the text "Available", try
//tr[@id and td='Available']
To extract the id value for further use, you need the get_attribute (Python) / getAttribute (Java) method.
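To see what a query like `//tr[@id and td='Available']` matches, it can be tried outside the browser with Python's standard-library `xml.etree.ElementTree`. That module supports only a subset of XPath (there is no `and`), so the two conditions become two chained predicates. This is a sketch outside Selenium; the extra rows in the sample are hypothetical data and `first_available_id` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical sample: only the middle row comes from the question
TBODY = """
<tbody>
  <tr id="24685" tabindex="0"><td class="nowrap">Sold</td></tr>
  <tr id="24686" tabindex="0"><td class="nowrap">Available</td></tr>
  <tr id="24687" tabindex="0"><td class="nowrap">Available</td></tr>
</tbody>
"""


def first_available_id(tbody_xml: str) -> Optional[str]:
    tbody = ET.fromstring(tbody_xml)
    # [@id]            -> the row has an id attribute
    # [td='Available'] -> the row has a <td> child whose full text is "Available"
    # find() returns the first match in document order, or None
    row = tbody.find(".//tr[@id][td='Available']")
    return None if row is None else row.get("id")


print(first_available_id(TBODY))  # 24686
```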
To find the first number whose status is Available and then return its id, you can use the following solution:
xpath:
"//tbody//tr//td[text()='Available']/.."
Note 1: The .. in the XPath refers to the parent node.
Note 2: As you are looking for the first match that satisfies the condition, you have to use either:
Python:
find_element_by_xpath() (in Selenium 4: find_element(By.XPATH, ...))
Java:
findElement()
C#:
FindElement()
Note 3: Finally, you have to use getAttribute("id") / get_attribute("id") to extract the value of the id attribute.
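In Python, the find-then-read-id step could look like the minimal sketch below. It assumes Selenium 4, where the `find_element_by_*` helpers were deprecated and later removed in favour of `find_element(By.XPATH, ...)`; `By.XPATH` is the plain string `"xpath"`, which keeps the sketch free of a selenium import. `driver` is whatever WebDriver instance you already have, and `first_available_row_id` is a hypothetical helper name.

```python
# The ".." steps from the matching <td> up to its parent <tr>
XPATH_FIRST_AVAILABLE_ROW = "//tbody//tr//td[text()='Available']/.."


def first_available_row_id(driver) -> str:
    """Return the id of the first <tr> whose cell reads 'Available'."""
    # find_element returns the first match (it raises NoSuchElementException if none)
    row = driver.find_element("xpath", XPATH_FIRST_AVAILABLE_ROW)
    # get_attribute("id") reads the id of the <tr> itself, e.g. "24686"
    return row.get_attribute("id")
```

Usage: `row_id = first_available_row_id(driver)`, then pass `row_id` on to the later steps.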

Updating HTML table through JSTL and SQL

I have an HTML table which initially displays all records in the DB. I have just added a jQuery Datepicker component, and now I want to retrieve records based on date. I have no problem extracting the date from the Datepicker component, but I do have a problem updating the existing HTML table. My code is pretty standard.
<sql:query var="retrieveSecurity" dataSource="jdbc/SecurityApp">SELECT * FROM security</sql:query>
<tbody>
<c:forEach var="row" items="${retrieveSecurity.rows}">
<tr>
<td>${row.security_id}</td>
<td>${row.description}</td>
<td class="center">${row.bid_price}</td>
<td class="center">${row.bid_percent_change}</td>
<td>${row.comment}</td>
</tr>
</c:forEach>
</tbody>
What I basically want to do is to have another SQL query, something like this...
<sql:query var="retrieveSecurity" dataSource="jdbc/SecurityApp">SELECT * FROM security where trade_date = "MYVALUE"
</sql:query>
I then want to update the tbody tag so that it uses this new SQL query instead. The columns won't change, so it should be the same table, just with filtered data. I'm not sure if what I am asking for is the easiest or best solution; I'm just looking for something that will work quickly for a prototype.
Hope this makes sense.

How can I insert an HTML tag before its parent tag using the Transformer from Genshi?

I need to modify the file table in my Trac browser view by creating a class that implements the ITemplateStreamFilter interface. I tried using the Transformer from genshi.filters.transform. My table looks like this:
<tbody>
<tr class="even">
<td class="name">
<a class="parent" title="Parent Directory" ..>..</a>
</td>
..
</tr>
..
</tbody>
I now need to insert a <td> tag just before the first cell in the first row of the table. The problem is that I can only identify the position of the column before which I want to put the new cell by searching for the "Parent Directory" title: Transformer('//*[@title="Parent Directory"]'). How can I step one tag up and then put the new cell before the first <td class="name"> tag?
I'm not THAT familiar with Transformer's XPath support, BUT:
What about
Transformer('(//td[*[@title="Parent Directory"]])[1]') and then using the before() method?
As far as I understand, this should select the first td node that has a child node with the attribute title="Parent Directory".
If you want to select any td with that kind of child node, use
Transformer('//td[*[@title="Parent Directory"]]')
However, this only works if Transformer supports those XPath expressions.
Edit 1
If you're sure your td has the attribute class="name", you can also use Transformer('(//td[@class="name" and *[@title="Parent Directory"]])[1]')
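Independently of how much XPath Genshi supports, the "step one tag up, then insert before it" logic can be sketched with Python's standard-library `xml.etree.ElementTree`, which understands `..` (parent) but not nested predicates. This swaps in a different library purely for illustration; it is not the Genshi `Transformer` API, and the helper name, sample markup and new cell text are hypothetical.

```python
import xml.etree.ElementTree as ET

TBODY = """
<tbody>
  <tr class="even">
    <td class="name"><a class="parent" title="Parent Directory" href="..">../</a></td>
    <td class="size"></td>
  </tr>
</tbody>
"""

LINK = ".//a[@title='Parent Directory']"


def insert_cell_before_parent_link(tbody: ET.Element, text: str) -> None:
    """Insert a new <td> right before the cell holding the 'Parent Directory' link."""
    cell = tbody.find(LINK + "/..")    # one step up:  <a> -> <td class="name">
    row = tbody.find(LINK + "/../..")  # two steps up: <a> -> <td> -> <tr>
    if cell is None or row is None:
        raise ValueError("no 'Parent Directory' link found")
    new_cell = ET.Element("td")
    new_cell.text = text
    # Elements don't track their parent, so insert via the row at the cell's index
    row.insert(list(row).index(cell), new_cell)


tbody = ET.fromstring(TBODY)
insert_cell_before_parent_link(tbody, "new")
print([td.get("class") or td.text for td in tbody.find("tr")])  # ['new', 'name', 'size']
```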

Calculate modulo of index in Struts2 iterator

I'm using the Struts2 iterator to set up a list of checkboxes in a table. I want to have 10 checkboxes per row, so I'm doing the following:
<table>
<tr>
<s:iterator value="securityMasterFields" status="fieldNameStatus" var="fieldName">
<s:if test="#fieldNameStatus.index % 10 ==0">
</tr><tr>
</s:if>
<td>
<s:checkbox name="fieldsToShow" fieldValue="%{fieldName}" value="%{fieldName}"/>
</td>
</s:iterator>
</tr>
</table>
It never goes through the if, so I'm assuming the modulo is not being calculated correctly. How do I do it?
Thanks
Well, I had to add some parentheses and it worked correctly. The loop was working, it was just that it wasn't going through the if.
<s:if test="(#fieldNameStatus.index % 8 )==0"></tr><tr></s:if>
It looks good to me. Two thoughts:
1) Try printing the result of the test in an s:property tag.
2) It looks like you will get empty table rows: at index 0 the test is already true, so the first <tr> is closed before it receives any cells. Are you looking at the generated HTML or just the rendered output? If it is just the rendered output, then unless some CSS gives the table padding and borders, a row without a 'td' element may collapse and make it look as if nothing is being added. So make sure you output the empty 'td' elements too!
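The row-break logic itself does not depend on Struts2. Here is a small Python sketch (not OGNL) of the same modulo test. It starts a new row only when `index % per_row == 0` and `index > 0`, which avoids the empty leading `<tr></tr>` produced at index 0. The function name, the `per_row` default of 10 and the plain `<td>name</td>` cell markup are assumptions for illustration; `<s:checkbox>` renders different markup.

```python
def checkbox_rows(fields: list[str], per_row: int = 10) -> str:
    """Render one <td> per field, starting a new <tr> every per_row items."""
    parts = ["<table>", "<tr>"]
    for index, name in enumerate(fields):
        # Same modulo test as the OGNL expression, plus index > 0
        # so the first row is not closed before it gets any cells
        if index > 0 and index % per_row == 0:
            parts.append("</tr><tr>")
        parts.append(f"<td>{name}</td>")
    parts += ["</tr>", "</table>"]
    return "".join(parts)


print(checkbox_rows(["a", "b", "c"], per_row=2))
# <table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>
```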