python-ldap: Retrieve only a few entries from LDAP search

I wish to mimic the ldapsearch -z flag behavior of retrieving only a specific amount of entries from LDAP using python-ldap.
However, it keeps failing with the exception SIZELIMIT_EXCEEDED.
There are multiple links where the problem is reported, but the suggested solutions don't seem to work:
Python-ldap search: Size Limit Exceeded
LDAP: ldap.SIZELIMIT_EXCEEDED
I am using search_ext_s() with the sizelimit parameter set to 1, which I am sure is not more than the server limit.
In Wireshark, I see that 1 entry is returned and that the server raises SIZELIMIT_EXCEEDED. This is the same as the ldapsearch -z behavior.
But the following line raises an exception, and I don't know how to retrieve the entry that was returned:
conn.search_ext_s(<base>, ldap.SCOPE_SUBTREE, '(cn=demo_user*)', ['dn'], sizelimit=1)

Based upon the discussion in the comments, this is how I achieved it:
import ldap

# These are not mandatory, I just have a habit
# of setting them against Microsoft Active Directory
ldap.set_option(ldap.OPT_REFERRALS, 0)
ldap.set_option(ldap.OPT_PROTOCOL_VERSION, 3)

conn = ldap.initialize('ldap://<SERVER-IP>')
conn.simple_bind(<username>, <password>)

# Using the async search version
ldap_result_id = conn.search_ext(<base-dn>, ldap.SCOPE_SUBTREE,
                                 <filter>, [desired-attrs],
                                 sizelimit=<your-desired-sizelimit>)
result_set = []
try:
    while 1:
        result_type, result_data = conn.result(ldap_result_id, 0)
        if result_data == []:
            break
        else:
            # Handle the singular entry any way you wish.
            # I am appending here
            if result_type == ldap.RES_SEARCH_ENTRY:
                result_set.append(result_data)
except ldap.SIZELIMIT_EXCEEDED:
    print 'Hitting sizelimit'
print result_set
Sample Output:
# My server has about 500 entries for 'demo_user' - 1,2,3 etc.
# My filter is '(cn=demo_user*)', attrs = ['cn'] with sizelimit of 5
$ python ldap_sizelimit.py
Hitting sizelimit
[[('CN=demo_user0,OU=DemoUsers,DC=ad,DC=local', {'cn': ['demo_user0']})],
[('CN=demo_user1,OU=DemoUsers,DC=ad,DC=local', {'cn': ['demo_user1']})],
[('CN=demo_user10,OU=DemoUsers,DC=ad,DC=local', {'cn': ['demo_user10']})],
[('CN=demo_user100,OU=DemoUsers,DC=ad,DC=local', {'cn': ['demo_user100']})],
[('CN=demo_user101,OU=DemoUsers,DC=ad,DC=local', {'cn': ['demo_user101']})]]
You may play around with more server-side controls to sort these etc., but I think the basic idea is conveyed ;)
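For example, server-side sorting can be requested with the RFC 2891 sort control that python-ldap ships in ldap.controls.sss. This is only a sketch and assumes your server advertises that control (AD generally does); the placeholders are the same as in the code above.

from ldap.controls.sss import SSSRequestControl

sort_ctrl = SSSRequestControl(criticality=False, ordering_rules=['cn'])
ldap_result_id = conn.search_ext(<base-dn>, ldap.SCOPE_SUBTREE,
                                 <filter>, [desired-attrs],
                                 serverctrls=[sort_ctrl],
                                 sizelimit=<your-desired-sizelimit>)
# ...then collect the entries with conn.result() exactly as above.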

You have to use the asynchronous search method LDAPObject.search_ext() and separately collect the results with LDAPObject.result() until the exception ldap.SIZELIMIT_EXCEEDED is raised.

The accepted answer works if you are searching for fewer users than specified by the server's sizelimit, but it will fail if you wish to gather more than that (the default for AD is 1000 users).
Here's a Python 3 implementation that I came up with after heavily editing what I found here and in the official documentation. At the time of writing, it works with the pip3 package python-ldap, version 3.2.0.
import ldap
from ldap.controls import SimplePagedResultsControl

def get_list_of_ldap_users():
    hostname = "google.com"
    username = "username_here"
    password = "password_here"
    base = "dc=google,dc=com"

    print(f"Connecting to the LDAP server at '{hostname}'...")
    connect = ldap.initialize(f"ldap://{hostname}")
    connect.set_option(ldap.OPT_REFERRALS, 0)
    connect.simple_bind_s(username, password)

    search_flt = "(cn=demo_user*)"  # get all users with a specific cn
    page_size = 1  # how many users to request per page; must not exceed the server maximum (default is 1000)
    searchreq_attrlist = ["cn", "sn", "name", "userPrincipalName"]  # change these to the attributes you care about

    req_ctrl = SimplePagedResultsControl(criticality=True, size=page_size, cookie='')
    # Use the asynchronous search_ext() so that result3() below can consume the msgid
    msgid = connect.search_ext(base=base, scope=ldap.SCOPE_SUBTREE, filterstr=search_flt,
                               attrlist=searchreq_attrlist, serverctrls=[req_ctrl])

    total_results = []
    pages = 0
    while True:  # loop over all of the pages using the same cookie, otherwise the search will fail
        pages += 1
        rtype, rdata, rmsgid, serverctrls = connect.result3(msgid)
        for user in rdata:
            total_results.append(user)

        pctrls = [c for c in serverctrls if c.controlType == SimplePagedResultsControl.controlType]
        if pctrls:
            if pctrls[0].cookie:
                # Copy the cookie from the response control to the request control and fetch the next page
                req_ctrl.cookie = pctrls[0].cookie
                msgid = connect.search_ext(base=base, scope=ldap.SCOPE_SUBTREE, filterstr=search_flt,
                                           attrlist=searchreq_attrlist, serverctrls=[req_ctrl])
            else:
                break
        else:
            break
    return total_results
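Calling it could look something like this (just a sketch; the hostname and credentials inside the function are obviously placeholders you would replace):

if __name__ == "__main__":
    users = get_list_of_ldap_users()
    print(f"Retrieved {len(users)} entries")
    for dn, attrs in users:
        if dn:  # skip referral results, which come back as (None, [...]) when OPT_REFERRALS is 0
            print(dn, attrs.get("cn"))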

Related

How to send an answer in a group poll?

How can I vote in a poll in a group? I need to select one of the options, but I have not found a function that is responsible for this.
I can create a poll, but not vote as a client, like Lonami showed in another question:
await client.send_message('#username', file=types.InputMediaPoll(
    poll=types.Poll(
        id=...,       # type: long (random id)
        question=..., # type: string (the question)
        answers=...   # type: list of PollAnswer (up to 10 answers)
    )
))
Use message.click(index) as stated in the docs.
message = await client.get_messages(chat, ids=xx)
# ^ get the message containing poll or from events
await message.click(0) # index starts from 0 == first
For multiple choice polls, pass a list of indexes.
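So, for a multiple-answer poll, something along these lines should work (the indices here are just an example):

# vote for the first and third options of a multiple-choice poll
await message.click([0, 2])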

Python BigQuery Storage Write retry strategy when writing to default stream

I'm testing python-bigquery-storage to insert multiple items into a table using the _default stream.
I used the example shown in the official docs as a basis, and modified it to use the default stream.
Here is a minimal example that's similar to what I'm trying to do:
customer_record.proto
syntax = "proto2";

message CustomerRecord {
  optional string customer_name = 1;
  optional int64 row_num = 2;
}
append_rows_default.py
from itertools import islice

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types
from google.cloud.bigquery_storage_v1 import writer
from google.protobuf import descriptor_pb2

import customer_record_pb2

import logging
logging.basicConfig(level=logging.DEBUG)

CHUNK_SIZE = 2  # Maximum number of rows to use in each AppendRowsRequest.

def chunks(l, n):
    """Yield successive `n`-sized chunks from `l`."""
    _it = iter(l)
    while True:
        chunk = [*islice(_it, 0, n)]
        if chunk:
            yield chunk
        else:
            break

def create_stream_manager(project_id, dataset_id, table_id, write_client):
    # Use the default stream.
    # The stream name is:
    # projects/{project}/datasets/{dataset}/tables/{table}/_default
    parent = write_client.table_path(project_id, dataset_id, table_id)
    stream_name = f'{parent}/_default'

    # Create a template with fields needed for the first request.
    request_template = types.AppendRowsRequest()

    # The initial request must contain the stream name.
    request_template.write_stream = stream_name

    # So that BigQuery knows how to parse the serialized_rows, generate a
    # protocol buffer representation of our message descriptor.
    proto_schema = types.ProtoSchema()
    proto_descriptor = descriptor_pb2.DescriptorProto()
    customer_record_pb2.CustomerRecord.DESCRIPTOR.CopyToProto(proto_descriptor)
    proto_schema.proto_descriptor = proto_descriptor
    proto_data = types.AppendRowsRequest.ProtoData()
    proto_data.writer_schema = proto_schema
    request_template.proto_rows = proto_data

    # Create an AppendRowsStream using the request template created above.
    append_rows_stream = writer.AppendRowsStream(write_client, request_template)
    return append_rows_stream

def send_rows_to_bq(project_id, dataset_id, table_id, write_client, rows):
    append_rows_stream = create_stream_manager(project_id, dataset_id, table_id, write_client)
    response_futures = []
    row_count = 0

    # Send the rows in chunks, to limit memory usage.
    for chunk in chunks(rows, CHUNK_SIZE):
        proto_rows = types.ProtoRows()
        for row in chunk:
            row_count += 1
            proto_rows.serialized_rows.append(row.SerializeToString())

        # Create an append rows request containing the rows.
        request = types.AppendRowsRequest()
        proto_data = types.AppendRowsRequest.ProtoData()
        proto_data.rows = proto_rows
        request.proto_rows = proto_data

        future = append_rows_stream.send(request)
        response_futures.append(future)

    # Wait for all the append rows requests to finish.
    for f in response_futures:
        f.result()

    # Shut down background threads and close the streaming connection.
    append_rows_stream.close()
    return row_count

def create_row(row_num: int, name: str):
    row = customer_record_pb2.CustomerRecord()
    row.row_num = row_num
    row.customer_name = name
    return row

def main():
    write_client = bigquery_storage_v1.BigQueryWriteClient()
    rows = [create_row(i, f"Test{i}") for i in range(0, 20)]
    send_rows_to_bq("PROJECT_NAME", "DATASET_NAME", "TABLE_NAME", write_client, rows)

if __name__ == '__main__':
    main()
Note:
In the above, CHUNK_SIZE is 2 just for this minimal example, but, in a real situation, I used a chunk size of 5000.
In real usage, I have several separate streams of data that need to be processed in parallel, so I make several calls to send_rows_to_bq, one for each stream of data, using a thread pool (one thread per stream of data). (I'm assuming here that AppendRowsStream is not meant to be shared by multiple threads, but I might be wrong).
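For reference, the parallel calls look roughly like this (simplified; streams_of_rows is just a stand-in name for my real per-stream data):

from concurrent.futures import ThreadPoolExecutor

def send_all_streams(write_client, streams_of_rows):
    # One thread per stream of data, each thread getting its own AppendRowsStream
    # via the send_rows_to_bq function defined above.
    with ThreadPoolExecutor(max_workers=len(streams_of_rows)) as pool:
        futures = [
            pool.submit(send_rows_to_bq, "PROJECT_NAME", "DATASET_NAME",
                        "TABLE_NAME", write_client, rows)
            for rows in streams_of_rows
        ]
        return [f.result() for f in futures]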
It mostly works, but I often get a mix of intermittent errors in the call to append_rows_stream's send method:
google.cloud.bigquery_storage_v1.exceptions.StreamClosedError: This manager has been closed and can not be used.
google.api_core.exceptions.Unknown: None There was a problem opening the stream. Try turning on DEBUG level logs to see the error.
I think I just need to retry on these errors, but I'm not sure how to best implement a retry strategy here. My impression is that I need to use the following strategy to retry errors when calling send:
If the error is a StreamClosedError, the append_rows_stream stream manager can't be used anymore, and so I need to call close on it and then call my create_stream_manager again to create a new one, then try to call send on the new stream manager.
Otherwise, on any google.api_core.exceptions.ServerError error, retry the call to send on the same stream manager.
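In code, I picture something roughly like this (only a sketch of the idea; send_with_retry and make_stream are hypothetical helpers of mine, where make_stream would just call create_stream_manager again):

import google.api_core.exceptions
from google.cloud.bigquery_storage_v1.exceptions import StreamClosedError

def send_with_retry(append_rows_stream, request, make_stream, max_attempts=3):
    """Hypothetical retry wrapper around AppendRowsStream.send().

    Returns the (possibly new) stream manager together with the response future.
    """
    for _ in range(max_attempts):
        try:
            return append_rows_stream, append_rows_stream.send(request)
        except StreamClosedError:
            # The stream manager can't be reused: close it and build a new one.
            append_rows_stream.close()
            append_rows_stream = make_stream()
        except google.api_core.exceptions.ServerError:
            # Transient server error: retry send() on the same stream manager.
            continue
    raise RuntimeError("AppendRows request failed after retries")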
Am I approaching this correctly?
Thank you.
The best solution to this problem is to update to the newer library release.
This problem happens (or was happening) in older versions because once the Write API connection reaches 10 MB, it hangs.
If updating the library does not help, you can try these options:
Limit each connection to < 10 MB.
Disconnect and reconnect to the API.
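For the first option, one way to keep requests comfortably under the limit would be size-aware batching instead of a fixed CHUNK_SIZE. This is only a sketch, reusing the types from the question, and it assumes the 10 MB figure applies per AppendRowsRequest:

from google.cloud.bigquery_storage_v1 import types

MAX_REQUEST_BYTES = 9 * 1024 * 1024  # stay safely below the ~10 MB limit

def build_requests(rows, max_bytes=MAX_REQUEST_BYTES):
    """Pack serialized rows into AppendRowsRequests no larger than max_bytes."""
    batch, size = [], 0
    for row in rows:
        data = row.SerializeToString()
        if batch and size + len(data) > max_bytes:
            yield _make_request(batch)
            batch, size = [], 0
        batch.append(data)
        size += len(data)
    if batch:
        yield _make_request(batch)

def _make_request(serialized_rows):
    proto_rows = types.ProtoRows()
    proto_rows.serialized_rows.extend(serialized_rows)
    request = types.AppendRowsRequest()
    proto_data = types.AppendRowsRequest.ProtoData()
    proto_data.rows = proto_rows
    request.proto_rows = proto_data
    return request

Each yielded request can then be passed to append_rows_stream.send() exactly as in the question's code.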

Download a page doesn't return a status code

I have found a page I need to download that doesn't include an HTTP status code in the returned headers. I get the error ParseError: ('non-integer status code', b'Tag: "14cc1-5a76434e32f9e"'), which is obviously accurate. But otherwise the returned data is complete.
I'm just trying to save the page content manually in a callback, afilehandle.write(response.body) sort of thing. It's a PDF. Is there a way I can bypass this and still get the contents of the page?
Below is the returned response, which also crashed Fiddler. The first thing in the header is Tag.
Tag: "14cc1-5a76434e32f9
e"..Accept-Ranges: bytes
..Content-Length: 85185.
.Keep-Alive: timeout=15,
max=100..Connection: Ke
ep-Alive..Content-Type:
application/pdf....%PDF-
1.4.%ÓôÌá.1 0 obj.<<./Cr
eationDate(D:20200606000
828-06'00')./Creator(PDF
sharp 1.50.4740 \(www.pd
fsharp.com\))./Producer(
PDFsharp 1.50.4740 \(www
.pdfsharp.com\)).>>.endo
bj.2 0 obj.<<./Type/Cata
log./Pages 3 0 R.>>.endo
bj.3 0 obj.<<./Type/Page
s./Count 2./Kids[4 0 R 8
0 R].>>.endobj.4 0 obj.
<<./Type/Page./MediaBox[
0 0 612 792]./Parent 3 0
R./Contents 5 0 R./Reso
urces.<<./ProcSet [/PDF/
Text/Ima.... etc
Note: For anyone not familiar with PDF file structure, %PDF-1.4 and everything after it is the correct format for a PDF document. Chrome downloads the PDF just fine even with the bad headers.
In the end, I modified the file twisted/web/_newclient.py directly to not throw the error, and use a weird status code that I could identify:
def statusReceived(self, status):
    parts = status.split(b' ', 2)
    if len(parts) == 2:
        version, codeBytes = parts
        phrase = b""
    elif len(parts) == 3:
        version, codeBytes, phrase = parts
    else:
        raise ParseError(u"wrong number of parts", status)
    try:
        statusCode = int(codeBytes)
    except ValueError:
        # Changes were made here
        version = b'HTTP/1.1'  # just assume it is what it should be
        statusCode = 5200      # deal with invalid status codes later
        phrase = b'non-integer status code'  # sure, pass on the error message
        # and commented out the line below.
        # raise ParseError(u"non-integer status code", status)
    self.response = Response._construct(
        self.parseVersion(version),
        statusCode,
        phrase,
        self.headers,
        self.transport,
        self.request,
    )
And I set the spider to accept that status code.
class MySpider(Spider):
    handle_httpstatus_list = [5200]
However, in the end I discovered the target site behaved correctly when accessed via https, so I ended up rolling back all the above changes.
Note that the above hack would work until you updated the library, at which point you would need to reapply it. But it could possibly get the job done if you are desperate.
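If you want to avoid editing the installed file, the same change could probably be applied as a monkey-patch from your own project at startup. This is an untested sketch, where statusReceived is the modified function above defined at module level in your code:

from twisted.web import _newclient
from twisted.web._newclient import ParseError, Response  # needed by statusReceived above

# Replace Twisted's parser method with the lenient version defined above, so you
# don't have to re-edit the installed file every time the package is reinstalled.
_newclient.HTTPClientParser.statusReceived = statusReceived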

Payara asadmin command to monitor a specific resource

Does anyone know the asadmin command line equivalent to display the Resource data as shown in the image below (ie the Resource __TimerPool)?
I'm using Payara 4.1.1.171.1.
I typed asadmin monitor --help and it provided this:
monitor [--help]
--type type
[--filename filename]
[--interval interval]
[--filter filter]
instance-name
The type field only accepts "httplistener", "jvm" and "webmodule" as inputs.
So I can't use a "resource" or "jdbcpool" as a type.
Oddly enough, in the old GlassFish 2.1 documentation (https://docs.oracle.com/cd/E19879-01/821-0185/gelol/index.html) you can select "jdbcpool" as the type.
Any help is appreciated.
I couldn't really find the answer in the Payara documentation (https://docs.payara.fish/documentation/payara-server/monitoring-service/monitoring-service.html), but using part of the GlassFish documentation (https://docs.oracle.com/cd/E18930_01/html/821-2416/ghmct.html#gipzv) I was able to get what I needed.
The command is asadmin get --monitor server.resources.__TimerPool.*
This then returns (this is a partial output):
server.resources.__TimerPool.numconnused-highwatermark = 2
server.resources.__TimerPool.numconnused-lastsampletime = 1559826720029
server.resources.__TimerPool.numconnused-lowwatermark = 0
server.resources.__TimerPool.numconnused-name = NumConnUsed
server.resources.__TimerPool.numconnused-starttime = 1559823838730
server.resources.__TimerPool.numconnused-unit = count
server.resources.__TimerPool.numpotentialconnleak-count = 0
server.resources.__TimerPool.numpotentialconnleak-description = Number of potential connection leaks
server.resources.__TimerPool.numpotentialconnleak-lastsampletime = -1
server.resources.__TimerPool.numpotentialconnleak-name = NumPotentialConnLeak
server.resources.__TimerPool.numpotentialconnleak-starttime = 1559823838735
server.resources.__TimerPool.numpotentialconnleak-unit = count
server.resources.__TimerPool.waitqueuelength-count = 0
server.resources.__TimerPool.waitqueuelength-description = Number of connection requests in the queue waiting to be serviced.
server.resources.__TimerPool.waitqueuelength-lastsampletime = -1
server.resources.__TimerPool.waitqueuelength-name = WaitQueueLength
server.resources.__TimerPool.waitqueuelength-starttime = 1559823838735
server.resources.__TimerPool.waitqueuelength-unit = count
Command get executed successfully.
It's important to add the .* at the end of the asadmin command in asadmin get --monitor server.resources.__TimerPool.*
If you neglect that and just enter asadmin get --monitor server.resources.__TimerPool it'll return
No monitoring data to report.
Command get executed successfully.
To see the list of resources you have available to monitor, type asadmin list --monitor server.resources.*
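Once you know the dotted names from the output above, you should also be able to fetch a single statistic directly with the same get syntax, for example:

asadmin get --monitor server.resources.__TimerPool.waitqueuelength-count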

Lego-EV3: How to fix EOFError when catching user-input via multiprocessing?

Currently, I am working with an EV3 Lego robot that is controlled by several neurons. Now I want to modify the code (running on Python 3) in such a way that one can change certain parameter values at runtime via the shell (Ubuntu) in order to manipulate the robot's dynamics at any time (and multiple times). Here is a schema of what I have achieved so far, based on a short example code:
from multiprocessing import Process
from multiprocessing import SimpleQueue
import ev3dev.ev3 as ev3

class Neuron:
    (definitions of class variables and update functions)

def check_input(queue):
    while (True):
        try:
            new_para = str(input("Type 'parameter=value': "))
            float(new_para[2:])  # checking for float in input
            var = new_para[0:2]
            if (var == "k="):    # change parameter k
                queue.put(new_para)
            elif (var == "g="):  # change parameter g
                queue.put(new_para)
            else:
                print("Error. Type 'k=...' or 'g=...'")
                queue.put(0)     # put anything in queue
        except (ValueError, EOFError):
            print("New value is not a number. Try again!")

(some neuron-specific initializations)

queue = SimpleQueue()
check = Process(target=check_input, args=(queue,))
check.start()

while (True):
    if (not queue.empty()):
        cmd = queue.get()
        var = cmd[0]
        val = float(cmd[2:])
        if (var == "k"):
            Neuron.K = val
        elif (var == "g"):
            Neuron.g = val
    (updating procedure for neurons, writing data to file)
Since I am new to multiprocessing, there are certainly some mistakes concerning locking, efficiency and so on, but the robot moves and the input prompts appear in the shell. However, the current problem is that it's actually impossible to make an input:
> python3 controller_multiprocess.py
> Type 'parameter=value': New value is not a number. Try again!
> Type 'parameter=value': New value is not a number. Try again!
> Type 'parameter=value': New value is not a number. Try again!
> ... (and so on)
I know that this behaviour is related to catching EOFError, because that error is what occurs when the exception handler is removed (and the process crashes). Hence, the program just rushes through the try block and assumes, over and over again, that no input (i.e. an empty string) was made. Why does this happen? When check_input is not run in a separate process, the program patiently waits for an input as expected. And how can one fix or bypass this issue so that changing parameters becomes possible as intended?
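As far as I can tell, multiprocessing closes/redirects the child process's stdin, which would explain why input() hits end-of-file immediately. Here is a minimal sketch, independent of the robot code, that should reproduce the same behaviour:

from multiprocessing import Process

def ask():
    try:
        input("Type something: ")  # the child's stdin is redirected, so this raises EOFError
    except EOFError:
        print("input() raised EOFError immediately in the child process")

if __name__ == "__main__":
    p = Process(target=ask)
    p.start()
    p.join()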
Thanks in advance!