Downloading a page that doesn't return a status code - scrapy

I have found a page I need to download that doesn't include an HTTP status code in the returned headers. I get the error ParseError: ('non-integer status code', b'Tag: "14cc1-5a76434e32f9e"'), which is obviously accurate. But otherwise the returned data is complete.
I'm just trying to save the page content manually in a callback, something like afilehandle.write(response.body). It's a PDF. Is there a way I can bypass this error and still get the contents of the page?
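The save-it-manually-in-a-callback part can be sketched as follows (the helper name and path are illustrative; the response object only needs a `.body` attribute, as scrapy's Response has):

```python
def save_pdf(response, path):
    """Dump the raw response bytes to disk. This works for PDFs because
    response.body is the undecoded payload as received from the server."""
    with open(path, "wb") as fh:
        fh.write(response.body)
```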
Here is the returned data, which also crashed Fiddler. The first thing in the header is Tag.
Tag: "14cc1-5a76434e32f9
e"..Accept-Ranges: bytes
..Content-Length: 85185.
.Keep-Alive: timeout=15,
max=100..Connection: Ke
ep-Alive..Content-Type:
application/pdf....%PDF-
1.4.%ÓôÌá.1 0 obj.<<./Cr
eationDate(D:20200606000
828-06'00')./Creator(PDF
sharp 1.50.4740 \(www.pd
fsharp.com\))./Producer(
PDFsharp 1.50.4740 \(www
.pdfsharp.com\)).>>.endo
bj.2 0 obj.<<./Type/Cata
log./Pages 3 0 R.>>.endo
bj.3 0 obj.<<./Type/Page
s./Count 2./Kids[4 0 R 8
0 R].>>.endobj.4 0 obj.
<<./Type/Page./MediaBox[
0 0 612 792]./Parent 3 0
R./Contents 5 0 R./Reso
urces.<<./ProcSet [/PDF/
Text/Ima.... etc
Note: For anyone not familiar with PDF file structure, %PDF-1.4 and everything after it is the correct format for a PDF document. Chrome downloads the PDF just fine even with the bad headers.

In the end, I modified the file twisted/web/_newclient.py directly so it doesn't throw the error, and used a distinctive status code that I could identify later:
def statusReceived(self, status):
    parts = status.split(b' ', 2)
    if len(parts) == 2:
        version, codeBytes = parts
        phrase = b""
    elif len(parts) == 3:
        version, codeBytes, phrase = parts
    else:
        raise ParseError(u"wrong number of parts", status)
    try:
        statusCode = int(codeBytes)
    except ValueError:
        # Changes were made here
        version = b'HTTP/1.1'  # just assume it is what it should be
        statusCode = 5200  # deal with invalid status codes later
        phrase = b'non-integer status code'  # sure, pass on the error message
        # and commented out the line below.
        # raise ParseError(u"non-integer status code", status)
    self.response = Response._construct(
        self.parseVersion(version),
        statusCode,
        phrase,
        self.headers,
        self.transport,
        self.request,
    )
And I set the spider to accept that status code.
class MySpider(Spider):
    handle_httpstatus_list = [5200]
However, in the end I discovered the target site behaved correctly when accessed via https, so I ended up rolling back all the above changes.
Note that the above hack works until you update the library, at which point you need to reapply it. But it could get the job done if you are desperate.
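Rather than editing the installed file, the same tolerance could be monkey-patched at runtime from your own project, so it survives reinstalls (though not API changes in twisted). A sketch of the wrap-and-fall-back shape, using a stand-in class in place of twisted's HTTPClientParser (the 5200 sentinel mirrors the hack above):

```python
class StandInParser:
    """Stand-in for twisted.web._newclient.HTTPClientParser, for illustration."""
    def statusReceived(self, status):
        # Raises ValueError/IndexError on a malformed status line,
        # analogous to twisted raising ParseError.
        self.status = int(status.split(b' ')[1])

# Keep a reference to the original method, then replace it with a wrapper.
_orig = StandInParser.statusReceived

def _tolerant_status(self, status):
    try:
        _orig(self, status)
    except (ValueError, IndexError):
        self.status = 5200  # sentinel status the spider whitelists

StandInParser.statusReceived = _tolerant_status
```

With the real class, the patch would be applied once at import time (e.g. in the spider module) before any requests are made.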

Related

Why FF_5 is not posting EBS records to subledgers?

I'm trying to post a document through tcode FF_5 (electronic bank statements) as SWIFT MT940 (international format) with the immediate-posting parameter. Bank accounting posting works fine, but subledger posting doesn't work correctly.
After debugging, I found that the document is posted by FM 'POSTING_INTERFACE_DOCUMENT'. In the return table t_bapiret2 I get the message "Batch Input for screen SAPLFCPD 0100 does not exist" (Type: S, ID: 00, NR: 344). When I post without background processing, I have to enter the customer name in field BSEC-NAME1 on this screen, and then it posts fine.
I want to automate this process. How should I pass data to the ftpost[] or bdcdata[] tables to inject the customer name? I tried various ways in debugging mode but none of them worked for me.
Sample BDCDATA[] record that I created:
ft-program = 'SAPLFCPD'.
ft-dynpro = '0100'.
ft-dynbegin = 'X'.
APPEND ft.
CLEAR ft.
ft-fnam = 'BSEC-NAME1'.
ft-fval = 'TEST'.
APPEND ft.
EDIT:
Sample bank statement:
:20:MT940
:25:/PL22112110212000180204832110
:28C:56
:60F:C220525PLN89107,30
:61:2205250525D269,98N152NONREF//6450501100324535
152 0
:86:020~00152
~20ZAM.PL111111111, FVKOR/0022
~2111/2205/2401120
~22˙
~23˙
~24˙
~25˙
~3010202964
~310000620200678839
~32CUSTOMER NAME
~33˙
~38PL23102029640000620200678839
~60˙
~63˙
:62F:C220525PLN88837,32
:64:C220525PLN88837,32
-
This is a one-time client; he has no master data, which is why I want to inject it.
I would really appreciate any help.
I added some code to process it as BDC; the entries are now visible in SM35.
The code looks like this:
ENHANCEMENT 1 es_bdc_feban. "active version
  DATA lv_session TYPE apqi-groupid.
  lv_session = |{ sy-datum }{ sy-timlo(4) }|.
  DATA: lv_name1 LIKE bsec-name1.
  GET PARAMETER ID 'FEBAN_NAME1' FIELD lv_name1.
  IF lv_name1 IS NOT INITIAL.
    CALL FUNCTION 'BDC_OPEN_GROUP'
      EXPORTING
        client              = sy-mandt    " Client
        group               = lv_session  " Session name
        keep                = 'X'         " Indicator to keep processed sessions
        user                = sy-uname    " Batch input user
      EXCEPTIONS
        client_invalid      = 1   " Client is invalid
        destination_invalid = 2   " Target system is invalid/no longer relevant
        group_invalid       = 3   " Batch input session name is invalid
        group_is_locked     = 4   " Batch input session is protected elsewhere
        holddate_invalid    = 5   " Lock date is invalid
        internal_error      = 6   " Internal error of batch input (see SYSLOG)
        queue_error         = 7   " Error reading/writing the queue (see SYSLOG)
        running             = 8   " Session is already being processed
        system_lock_error   = 9   " System error when protecting BI session
        user_invalid        = 10  " BI user is not valid
        OTHERS              = 11.
    IF sy-subrc <> 0.
    ENDIF.
    mode = 'Q'.
    CLEAR: funct, sgfunct.
*   funct = 'B'.
*   sgfunct = 'B'.
    ft-program = 'SAPLFCPD'.
    ft-dynpro = '0100'.
    ft-dynbegin = 'X'.
    APPEND ft TO ft[].
    CLEAR: ft-program, ft-dynpro, ft-dynbegin.
    ft-fnam = 'BSEC-NAME1'.
    ft-fval = lv_name1.
    APPEND ft TO ft[].
    CALL FUNCTION 'BDC_INSERT'
      EXPORTING
        tcode     = tcode
      TABLES
        dynprotab = ft.
    CALL FUNCTION 'BDC_CLOSE_GROUP'.
    COMMIT WORK AND WAIT.
    SUBMIT rsbdcsub EXPORTING LIST TO MEMORY
      WITH mappe EQ lv_session
      WITH von EQ sy-datum
      WITH bis EQ sy-datum
      WITH z_verarb EQ 'X'
      WITH fehler EQ ''
      WITH logall EQ 'X'
      AND RETURN.
  ENDIF.
ENDENHANCEMENT.
Variables entries:
Tcode = 'FB01'
FT[]:
<asx:abap version="1.0" xmlns:asx="http://www.sap.com/abapxml"><asx:values><_--5CTYPE_--3D_--25_T00004S00000371O0000147040><item><PROGRAM>SAPMF05A</PROGRAM><DYNPRO>0100</DYNPRO><DYNBEGIN>X</DYNBEGIN><FNAM/><FVAL/></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BDC_CURSOR</FNAM><FVAL>RF05A-NEWKO</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BKPF-BLDAT</FNAM><FVAL>25.05.2022</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BKPF-BLART</FNAM><FVAL>WB</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BKPF-BUKRS</FNAM><FVAL>1700</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BKPF-BUDAT</FNAM><FVAL>25.05.2022</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BKPF-WAERS</FNAM><FVAL>PLN</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BKPF-XBLNR</FNAM><FVAL>PBE01PL41022056</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BKPF-BKTXT</FNAM><FVAL>0000375800001</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>RF05A-NEWBS</FNAM><FVAL>40</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>RF05A-NEWKO</FNAM><FVAL>1232000000</FVAL></item><item><PROGRAM>SAPMF05A</PROGRAM><DYNPRO>0300</DYNPRO><DYNBEGIN>X</DYNBEGIN><FNAM/><FVAL/></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BSEG-WRBTR</FNAM><FVAL>269,98</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BSEG-VALUT</FNAM><FVAL>25.05.2022</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BSEG-ZUONR</FNAM><FVAL>0000375800001PLN</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BSEG-SGTXT</FNAM><FVAL>NONREF 020152 ZAM.PL146751217, 
FVKOR/002211/2205/2</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BDC_CURSOR</FNAM><FVAL>RF05A-NEWKO</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>RF05A-NEWBS</FNAM><FVAL>50</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>RF05A-NEWKO</FNAM><FVAL>1430101010</FVAL></item><item><PROGRAM>SAPLKACB</PROGRAM><DYNPRO>0002</DYNPRO><DYNBEGIN>X</DYNBEGIN><FNAM/><FVAL/></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BDC_OKCODE</FNAM><FVAL>/00</FVAL></item><item><PROGRAM>SAPMF05A</PROGRAM><DYNPRO>0300</DYNPRO><DYNBEGIN>X</DYNBEGIN><FNAM/><FVAL/></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BSEG-WRBTR</FNAM><FVAL>269,98</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BSEG-VALUT</FNAM><FVAL>25.05.2022</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BSEG-ZUONR</FNAM><FVAL>PL1467512</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BSEG-SGTXT</FNAM><FVAL>NONREF 020152 ZAM.PL111111111, FVKOR/002211/2205/2</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BDC_CURSOR</FNAM><FVAL>RF05A-NEWKO</FVAL></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BDC_OKCODE</FNAM><FVAL>/11</FVAL></item><item><PROGRAM>SAPLKACB</PROGRAM><DYNPRO>0002</DYNPRO><DYNBEGIN>X</DYNBEGIN><FNAM/><FVAL/></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BDC_OKCODE</FNAM><FVAL>/00</FVAL></item><item><PROGRAM>SAPLFCPD</PROGRAM><DYNPRO>0100</DYNPRO><DYNBEGIN>X</DYNBEGIN><FNAM/><FVAL/></item><item><PROGRAM/><DYNPRO>0000</DYNPRO><DYNBEGIN/><FNAM>BSEC-NAME1</FNAM><FVAL>CUSTOMER NAME</FVAL></item></_--5CTYPE_--3D_--25_T00004S00000371O0000147040></asx:values></asx:abap>
The data may look slightly different between the debugger and the bank statement.
There are 2 entries in SM35; the first is processed correctly, but the 2nd one has error entries in its log.
Can somebody help me, please?
Most likely you are confusing the working principles of FEBAN and FF_5.
In SM35 you will see the BI sessions created by FF_5. You need to process them to create the real postings.
I also recommend retrying the failed postings via the FEBP transaction, which FF_5 calls under the hood. It does almost the same as FF_5 and uses FF_5 data, but it can repost the failed records.
One interesting parameter FEBP has is Bk Pstg Only ("Only post to G/L"), which may be set silently by FF_5 and may prevent you from posting to subledgers. Though I can't confirm this; it's only an assumption.
P.S. I also recommend never changing automatically generated batch sessions the way you do, neither SAPLFCPD nor any others.
Problem solved. I passed the records in ft[] in the wrong order.
A very useful technique is using tcode SHDB to simulate how the records should be passed. In my case the FT[] table should contain:
SAPMF05A scr. 0100
[... required fields ...]
SAPLFCPD scr. 0100
BSEC-NAME1 <-- injected missing field
SAPMF05A scr. 0300
[... required fields ...]
SAPMF05A scr. 0301
[... required fields ... -> SAVE]
Topic can be closed. Thank you.

Error saying "Expecting value: line 1 column 1 (char 0)" when retrieving the venue for the tip with the greatest number of agree counts

I am using the Foursquare API to retrieve the venue for the tip with the greatest number of agree counts, using the requests library in JupyterLab, but I am getting the error "Expecting value: line 1 column 1 (char 0)". I am new to API calls, so please help me fix this error. Below is my code:
tip_id = '5ab5575d73fe2516ad8f363b'  # tip id
# define URL
url = 'http://api.foursquare.com/v2/tips/{}?client_id={}&client_secret={}&v={}'.format(
    tip_id, CLIENT_ID, CLIENT_SECRET, VERSION)
# send GET request and examine results
result = requests.get(url).json()
print(result['response']['tip']['venue']['name'])
print(result['response']['tip']['venue']['location'])
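"Expecting value: line 1 column 1 (char 0)" is what .json() raises when the body isn't JSON at all, for instance an HTML error page or an empty body. A small guard, purely illustrative, that surfaces what the server actually sent instead of the bare decode error:

```python
def safe_json(resp):
    """Parse resp.json(), but on failure raise an error showing the HTTP
    status and the first bytes of the body, which usually reveals why the
    payload wasn't JSON. `resp` only needs the .json(), .status_code and
    .text attributes that requests.Response provides."""
    try:
        return resp.json()
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        raise ValueError(
            "Non-JSON response (HTTP %s): %r" % (resp.status_code, resp.text[:200])
        ) from exc
```

Used as `result = safe_json(requests.get(url))`, the new message would show whether the server returned an error page instead of JSON.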

Scrapy - get depth level of failed requests with no response

When parsing scraped pages I also save the depth the request was scraped from using response.meta['depth'].
I recently started using errback to log all failed requests into a separate file, and having the depth there would help me a lot. I believe I could use failure.value.response.meta['depth'] for those pages which actually got a response but failed due to, e.g., an HTTP status error like 403; however, when an error like TCPTimeout is encountered there is no response.
Is it possible to get the depth level of a failed request with no response?
EDIT 1: I tried failure.request.meta['depth'] but that gives an error. meta can be found, but it has no depth key.
EDIT 2: The issue seems to be that the depth key is created only when the first response is received. So, as I understand it, if the first request (a start_url) doesn't receive a response, the depth key has not been created yet, hence the exception.
I'm going to experiment with this as per the depth middleware:
if 'depth' not in response.meta:
    response.meta['depth'] = 0
Yep, the issue turns out to be exactly as I described it in EDIT 2. This is how I fixed it:
def start_requests(self):
    for u in self.start_urls:
        yield scrapy.Request(u, errback=self.my_errback)

def my_errback(self, failure):
    if 'depth' not in failure.request.meta:
        failure.request.meta['depth'] = 0
    depth = failure.request.meta['depth']
    # do something with depth...
Big thanks to @Galecio, who pointed me in the right direction!

If response contains the word 'any' then match response contains is failing

Let's say I have a scenario like this:
Scenario: Call a Get API and validate the response
  Given path 'myteam'
  When method get
  Then status 201
  And print response
  And match response contains { teamFeature: 'pick any feature' }
And my API response is:
{
  "id": "6c0377cd-96c9-4651-bcc8-0c9a7d962bc3",
  "teamFeature": "pick any feature"
}
Then I get an error like:
example.feature:19 - javascript evaluation failed: feature'}, :1:9 Missing close quote
feature'}
^ in at line number 1 at column number 9
If my API response does not contain the word 'any' and I change the match statement accordingly, then it works fine. It looks like I need to escape the word 'any' somehow.
How can I escape the word 'any'?
Not sure if this is a bug in Karate.
I tried to call:
com.intuit.karate.Match match = new com.intuit.karate.Match("pick any feature");
System.out.println(match.contains("pick any feature"));
and received the following error:
Exception in thread "main" java.lang.RuntimeException: javascript
evaluation failed: pick any feature, :1:5 Expected ; but found
any pick any feature
^ in at line number 1 at column number 5 at com.intuit.karate.ScriptBindings.eval(ScriptBindings.java:152) at
com.intuit.karate.ScriptBindings.updateBindingsAndEval(ScriptBindings.java:142)
at
com.intuit.karate.ScriptBindings.evalInNashorn(ScriptBindings.java:127)
at com.intuit.karate.Script.evalJsExpression(Script.java:423) at
com.intuit.karate.Script.evalKarateExpression(Script.java:337) at
com.intuit.karate.Script.evalKarateExpression(Script.java:203) at
com.intuit.karate.Match.(Match.java:67) at
com.intuit.karate.Match.(Match.java:53)
Yes this is a bug in Karate, we've opened an issue: https://github.com/intuit/karate/issues/678
The workaround suggested by @BabuSekaran will work:
* def response = { foo: 'a any b' }
* def temp = { foo: 'a any b' }
* match response contains temp

python-ldap: Retrieve only a few entries from LDAP search

I wish to mimic the behavior of the ldapsearch -z flag, retrieving only a specific number of entries from LDAP using python-ldap.
However, it keeps failing with the exception SIZELIMIT_EXCEEDED.
The problem is reported in multiple places, but the suggested solution doesn't seem to work:
Python-ldap search: Size Limit Exceeded
LDAP: ldap.SIZELIMIT_EXCEEDED
I am using search_ext_s() with the sizelimit parameter set to 1, which I am sure is not more than the server limit.
In Wireshark, I see that 1 entry is returned and then the server raises SIZELIMIT_EXCEEDED; this is the same as the ldapsearch -z behavior.
But the following line raises an exception, and I don't know how to retrieve the entry that was returned:
conn.search_ext_s(<base>,ldap.SCOPE_SUBTREE,'(cn=demo_user*)',['dn'],sizelimit=1)
Based on the discussion in the comments, this is how I achieved it:
import ldap

# These are not mandatory; I just have a habit
# of setting them against Microsoft Active Directory
ldap.set_option(ldap.OPT_REFERRALS, 0)
ldap.set_option(ldap.OPT_PROTOCOL_VERSION, 3)

conn = ldap.initialize('ldap://<SERVER-IP>')
conn.simple_bind(<username>, <password>)

# Using the async search version
ldap_result_id = conn.search_ext(<base-dn>, ldap.SCOPE_SUBTREE,
                                 <filter>, [desired-attrs],
                                 sizelimit=<your-desired-sizelimit>)
result_set = []
try:
    while 1:
        result_type, result_data = conn.result(ldap_result_id, 0)
        if result_data == []:
            break
        else:
            # Handle the singular entry any way you wish.
            # I am appending here.
            if result_type == ldap.RES_SEARCH_ENTRY:
                result_set.append(result_data)
except ldap.SIZELIMIT_EXCEEDED:
    print 'Hitting sizelimit'
print result_set
Sample Output:
# My server has about 500 entries for 'demo_user' - 1,2,3 etc.
# My filter is '(cn=demo_user*)', attrs = ['cn'] with sizelimit of 5
$ python ldap_sizelimit.py
Hitting sizelimit
[[('CN=demo_user0,OU=DemoUsers,DC=ad,DC=local', {'cn': ['demo_user0']})],
[('CN=demo_user1,OU=DemoUsers,DC=ad,DC=local', {'cn': ['demo_user1']})],
[('CN=demo_user10,OU=DemoUsers,DC=ad,DC=local', {'cn': ['demo_user10']})],
[('CN=demo_user100,OU=DemoUsers,DC=ad,DC=local', {'cn': ['demo_user100']})],
[('CN=demo_user101,OU=DemoUsers,DC=ad,DC=local', {'cn': ['demo_user101']})]]
You may play around with more server controls to sort these etc., but I think the basic idea is conveyed ;)
You have to use the async search method LDAPObject.search_ext() and collect the results separately with LDAPObject.result() until the exception ldap.SIZELIMIT_EXCEEDED is raised.
The accepted answer works if you are searching for fewer users than specified by the server's sizelimit, but will fail if you wish to gather more than that (the default for AD is 1000 users).
Here's a Python 3 implementation that I came up with after heavily editing what I found here and in the official documentation. At the time of writing, it works with the pip package python-ldap version 3.2.0.
import ldap
from ldap.controls import SimplePagedResultsControl

def get_list_of_ldap_users():
    hostname = "google.com"
    username = "username_here"
    password = "password_here"
    base = "dc=google,dc=com"

    print(f"Connecting to the LDAP server at '{hostname}'...")
    connect = ldap.initialize(f"ldap://{hostname}")
    connect.set_option(ldap.OPT_REFERRALS, 0)
    connect.simple_bind_s(username, password)

    search_flt = "(cn=demo_user*)"  # get all users with a specific cn
    page_size = 1  # how many users to fetch per page; must not exceed the server maximum (default is 1000)
    searchreq_attrlist = ["cn", "sn", "name", "userPrincipalName"]  # change these to the attributes you care about

    req_ctrl = SimplePagedResultsControl(criticality=True, size=page_size, cookie='')
    # Async search so we can collect each page with result3()
    msgid = connect.search_ext(base=base, scope=ldap.SCOPE_SUBTREE,
                               filterstr=search_flt, attrlist=searchreq_attrlist,
                               serverctrls=[req_ctrl])
    total_results = []
    pages = 0
    while True:  # loop over all of the pages using the same cookie, otherwise the search will fail
        pages += 1
        rtype, rdata, rmsgid, serverctrls = connect.result3(msgid)
        for user in rdata:
            total_results.append(user)
        pctrls = [c for c in serverctrls
                  if c.controlType == SimplePagedResultsControl.controlType]
        if pctrls and pctrls[0].cookie:
            # Copy the cookie from the response control to the request control
            req_ctrl.cookie = pctrls[0].cookie
            msgid = connect.search_ext(base=base, scope=ldap.SCOPE_SUBTREE,
                                       filterstr=search_flt, attrlist=searchreq_attrlist,
                                       serverctrls=[req_ctrl])
        else:
            break
    return total_results