LDAP filter boolean expression maximum number of arguments

I was writing a small test case to see which is more efficient, multiple small queries or a single big query, when I encountered this limitation.
The query looks like this:
(| (clientid=1) (clientid=2) (clientid=3) ...)
When the number of clients goes beyond 2103 (?!), the LDAP server throws an error:
error code 1 - Operations Error
As far as I can tell, the actual filter string length does not matter; mine was ~69KB (for comparison, Microsoft AD's length limit is 10MB). I tried with longer attribute names and got the same strange limit: 2103 operands.
Does anyone have more information about this limitation?
Is this something specified in the LDAP protocol specification or is it implementation specific?
Is it configurable?
I tested this against IBM Tivoli Directory Server V6.2 using both the UnboundID and JNDI Java libraries.

It cannot be more than 8099 characters. See http://www-01.ibm.com/support/docview.wss?uid=swg21295980
Also, what you are doing is not good practice. If there are common attributes these entries share (e.g., country code, department number, location, etc.), try to retrieve the results using the common criteria those attributes give you. If not, divide your search filter into smaller ones, each with a few predicates, and execute multiple searches. It depends on the programming language you're using, but try to execute each search in a separate thread to speed up your data retrieval process.
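For illustration, here is a minimal sketch of the chunked approach in Java with the UnboundID LDAP SDK (one of the libraries the question mentions). The attribute name clientid comes from the question; the chunk size of 500 is an arbitrary value I chose to stay well below the observed ~2103-operand limit:

import com.unboundid.ldap.sdk.*;
import java.util.ArrayList;
import java.util.List;

public class ChunkedSearch {
    // Arbitrary chunk size, safely below the observed ~2103 operand limit.
    private static final int CHUNK_SIZE = 500;

    public static List<SearchResultEntry> searchClients(
            LDAPConnection conn, String baseDN, List<String> clientIds)
            throws LDAPSearchException {
        List<SearchResultEntry> entries = new ArrayList<>();
        for (int i = 0; i < clientIds.size(); i += CHUNK_SIZE) {
            int end = Math.min(i + CHUNK_SIZE, clientIds.size());
            List<Filter> ors = new ArrayList<>();
            for (String id : clientIds.subList(i, end)) {
                ors.add(Filter.createEqualityFilter("clientid", id));
            }
            // One search per chunk; each filter stays under the operand limit.
            SearchResult result = conn.search(
                    baseDN, SearchScope.SUB, Filter.createORFilter(ors));
            entries.addAll(result.getSearchEntries());
        }
        return entries;
    }
}

To parallelize as suggested above, hand each chunk's search to a thread pool instead of running them sequentially.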

Related

LDAP search for multiple complete DNs?

Assume I have an array of N DNs (distinguished names), e.g.:
cn=foo,dc=capmon,dc=lan
cn=bar,dc=capmon,dc=lan
cn=Fred Flintstone,ou=CapMon,dc=capmon,dc=lan
cn=Clark Kent,ou=yada,ou=whatnot,dc=capmon,dc=lan
They are not related and I cannot reduce/simplify the search. I have N complete DNs and want N records.
Can I write a single LDAP search that will return exactly N records, one for each DN? The assumption being that performance of both client and server will be better if I do it all in one search. Had it been SQL, it would be:
SELECT *
FROM dc=capmon,dc=lan
WHERE dn IN (
"cn=foo,dc=capmon,dc=lan",
"cn=bar,dc=capmon,dc=lan",
"cn=Fred Flintstone,ou=CapMon,dc=capmon,dc=lan",
"cn=Clark Kent,ou=yada,ou=whatnot,dc=capmon,dc=lan"
)
rather than doing individual LDAP searches in a for loop (which I do know how to do).
I tried against an MS Active Directory. There, all entries (seem to) have a distinguishedName attribute, and a search filter like this works (I added some newlines for readability):
(|
(distinguishedName=cn=ppolicy,dc=capmon,dc=lan)
(distinguishedName=cn=Users,dc=capmon,dc=lan)
<more ORed terms>
)
But this doesn't work:
(|
(dn=cn=ppolicy,dc=capmon,dc=lan)
(dn=cn=Users,dc=capmon,dc=lan)
<more ORed terms>
)
even though the returned records look like they contain dn attributes. :-(
An OpenLDAP server's records don't have distinguishedName attributes, and neither of the filters above works against it.
Can I do something that will work against most major LDAP servers?
It's not possible to "read" several entries in a single operation.
You can do a single search operation that will match and return several entries, but you cannot search on the DN itself.
I've seen several applications that try to get several entries by using complex filters such as (|(cn=foo)(cn=bar)(cn=Fred Flintstone)), but this may return more entries than intended unless all CN values are unique. It's not really good practice either, as there are limits on the number of elements you can have in the filter, and such requests are usually not optimized in terms of I/O.
It will be faster to read each individual entry, as LDAP servers are optimized for such operations. If you want to reduce latency, you can issue multiple asynchronous search operations on the same connection.
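As a concrete illustration of that advice, here is a minimal sketch using the UnboundID SDK, reading each DN with a base-scope lookup and fanning the reads out across a small thread pool. The use of LDAPConnectionPool and the pool size of 4 are my assumptions, not part of the original answer:

import com.unboundid.ldap.sdk.LDAPConnectionPool;
import com.unboundid.ldap.sdk.SearchResultEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class DnReader {
    public static List<SearchResultEntry> readAll(
            LDAPConnectionPool pool, List<String> dns) throws Exception {
        ExecutorService exec = Executors.newFixedThreadPool(4); // assumed pool size
        List<Future<SearchResultEntry>> futures = new ArrayList<>();
        for (String dn : dns) {
            // getEntry() performs a base-scope read of a single DN.
            futures.add(exec.submit(() -> pool.getEntry(dn)));
        }
        List<SearchResultEntry> entries = new ArrayList<>();
        for (Future<SearchResultEntry> f : futures) {
            SearchResultEntry e = f.get();
            if (e != null) { // null means the DN does not exist
                entries.add(e);
            }
        }
        exec.shutdown();
        return entries;
    }
}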

How to correctly search (Real Estate Transaction Standard aka RETS) server?

I am trying to interact with a RETS (Real Estate Transaction Standard) server to find all listings where matrix_unique_id field is greater than or equal to 0.
After logging in, I tried the following URI
Search.ashx?SearchType=Property&Class=Listing&Limit=1000&Query=(matrix_unique_id=0+)&StandardNames=0
The above call returns
<RETS ReplyCode="20201" ReplyText="No Records Found."/>
But then I supplied a valid Matrix_Unique_Id value like this
Search.ashx?SearchType=Property&Class=Listing&Limit=1000&Query=(matrix_unique_id=59075770+)&StandardNames=0
Now that returns something, but not what I am expecting.
Here is the documentation for RETS 1.7.2 and a PDF
Additionally, here is an example of how to search a RETS server; it is for a different server, but both adhere to the same specification.
https://www.flexmls.com/developers/rets/tutorials/example-rets-session/
Additionally, I used RETS Connector to query the listings and was able to download them with no issues, which indicates that my account is working and has permission to search.
Question: How can I correctly search for all properties where the field Matrix_Unique_Id is 0+?
To get the full result set, try the following logic:
(ModificationTimestamp=2000-01-01T00:00:00+)
This will return all the listings from the year 2000 onwards. If you need older listings, use 1990 or earlier in the query.
Note: your example query (matrix_unique_id=0+) is not working because its pattern may not be correct; for instance, the field may only accept an 8-digit number as input.
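For what it's worth, here is a rough Java sketch of issuing that query over HTTP. The host name, search path, and session cookie are placeholders; a real session must first perform the RETS Login transaction and reuse the cookies it returns:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class RetsSearch {
    public static void main(String[] args) throws IOException {
        String query = URLEncoder.encode(
                "(ModificationTimestamp=2000-01-01T00:00:00+)", "UTF-8");
        // Placeholder host and path; use your server's Search transaction URL.
        URL url = new URL("http://rets.example.com/Search.ashx"
                + "?SearchType=Property&Class=Listing&Limit=1000"
                + "&Query=" + query + "&StandardNames=0");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Cookie", "RETS-Session-ID=placeholder"); // from Login
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // COMPACT or XML response body
            }
        }
    }
}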

Endeca returning zero-count refinements that do not appear in the ref app?

I am using the Endeca 3.1.2 Assembler API. When I hit the Endeca query, it gives me a bunch of refinements, some with zero counts and some with positive counts.
Example:
category
category1 (0)
category2 (25)
category3 (0)
This is the result I am getting. When I run the same query in the jspref application, I do not get any refinements with a zero count.
I don't want those zero-count refinements included in the available refinements.
Please help me resolve this.
You might have disabled refinements enabled in your query.
Check whether you have the Ndr parameter in the Dgraph request log file.
If so, ensure your code doesn't call the ENEQuery.setNavDisabledRefinementsConfig() method.
Endeca has a feature called implicit dimensions. It might be the case that an implicit dimension is being displayed on the front end; Endeca provides implicit dimensions as part of the query response.
The following code is used to get an implicit dimension:
Navigation.getCompleteDimensions().getDimension(dimensionid)
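If the zero-count values still come back, one pragmatic option is to filter them out before rendering. Here is a minimal sketch against the Assembler API's refinement model; it assumes RefinementMenu.getRefinements() and Refinement.getCount() are available as in Assembler 3.1.x, so verify against your version's Javadoc:

import com.endeca.infront.cartridge.RefinementMenu;
import com.endeca.infront.cartridge.model.Refinement;
import java.util.ArrayList;
import java.util.List;

public class RefinementFilter {
    // Keep only refinements whose record count is positive.
    public static List<Refinement> nonZero(RefinementMenu menu) {
        List<Refinement> kept = new ArrayList<>();
        for (Refinement r : menu.getRefinements()) {
            if (r.getCount() > 0) {
                kept.add(r);
            }
        }
        return kept;
    }
}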

DATA_BUFFER_EXCEEDED error when calling RFC_READ_TABLE?

My Java/Groovy program receives table names and table fields from user input; it queries those tables in SAP and returns their contents.
The user input may concern the tables CDPOS and CDHDR. After reading the SAP documentation and googling, I found these are tables storing change document logs, but I did not find any remote-enabled function modules that can be used from Java to perform this kind of query.
Then I used the deprecated RFC function module RFC_READ_TABLE and tried to build customized queries depending only on this RFC. However, I found that if the number of desired fields I pass to this RFC is more than 2, I always get the DATA_BUFFER_EXCEEDED error, even if I limit the max rows.
I am not authorized as an ABAP developer in the SAP system and cannot add any function modules to the existing systems, so I can only write code to accomplish this requirement in Java.
Am I doing something wrong? Could you give me some hints on that issue?
DATA_BUFFER_EXCEEDED only happens if the total width of the fields you want to read exceeds the width of the DATA parameter, which may vary depending on the SAP release (512 characters for current systems). It has nothing to do with the number of rows; it's about the size of a single dataset.
So the question is: what are the contents of the FIELDS parameter? If it's empty, this means "read all fields." CDHDR is 192 characters wide, so I'd assume the problem is CDPOS, which is 774 characters wide. The main culprits would be the fields VALUE_OLD and VALUE_NEW, both 245 characters.
Even if you don't get developer access, you should prod someone to get read-only dictionary access to be able to examine the structures in detail.
Shameless plug: RCER contains a wrapper class for RFC_READ_TABLE that takes care of field handling and ensures that the total width of the selected fields is below the limit imposed by the function module.
Also be aware that these tables can be HUGE in production environments - think billions of entries. You can easily bring your database to a grinding halt by performing excessive read operations on these tables.
PS: RFC_READ_TABLE is not released for customer use as per SAP note 382318, and note 758278 recommends creating your own function module and provides a template with improved logic.
Use BBP_RFC_READ_TABLE instead
There is a way around the DATA_BUFFER_EXCEEDED error. Although this function is not released for customer use as per SAP OSS note 382318, you can get around this issue by changing the way you pass parameters to the function. It's not a single field that is causing your error; if a row of data exceeds 512 bytes, this error will be raised. CDPOS will have this issue for sure!
The workaround, if you know how to call the function using JCo and pass table parameters, is to specify the exact fields you want returned. You can then keep your returned results under the 512-byte limit.
Using your example of table CDPOS, specify something like this and you should be good to go... (be careful, CDPOS can get massive! You should specify and pass a WHERE clause!)
FIELDS = 'OBJECTCLAS'....
FIELDS = 'OBJECTID'
In Java it can be expressed as:
listParams.setValue(this.getpObjectclas(), "OBJECTCLAS");
By limiting the fields you are returning, you can avoid this error.
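Putting the pieces together, here is a minimal sketch using SAP JCo 3.x. The destination name, the WHERE clause, and the field list are placeholders of mine; the point is that the combined width of the requested fields must stay under the 512-byte DATA limit:

import com.sap.conn.jco.*;

public class CdposReader {
    public static void main(String[] args) throws JCoException {
        JCoDestination dest = JCoDestinationManager.getDestination("ABAP_AS");
        JCoFunction fn = dest.getRepository().getFunction("RFC_READ_TABLE");

        fn.getImportParameterList().setValue("QUERY_TABLE", "CDPOS");
        fn.getImportParameterList().setValue("DELIMITER", "|");
        fn.getImportParameterList().setValue("ROWCOUNT", 100); // always limit rows

        // Request only narrow fields; omitting FIELDS means "all fields",
        // which exceeds 512 bytes for CDPOS (774 characters wide).
        JCoTable fields = fn.getTableParameterList().getTable("FIELDS");
        for (String name : new String[] {"OBJECTCLAS", "OBJECTID", "CHANGENR"}) {
            fields.appendRow();
            fields.setValue("FIELDNAME", name);
        }

        // A WHERE clause keeps the read from scanning the whole table.
        JCoTable options = fn.getTableParameterList().getTable("OPTIONS");
        options.appendRow();
        options.setValue("TEXT", "OBJECTCLAS = 'BELEG'"); // placeholder criterion

        fn.execute(dest);

        JCoTable data = fn.getTableParameterList().getTable("DATA");
        for (int i = 0; i < data.getNumRows(); i++) {
            data.setRow(i);
            System.out.println(data.getString("WA")); // delimited row image
        }
    }
}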

Database vs. Front-End for Output Formatting

I've read that (all things being equal) PHP is typically faster than MySQL at arithmetic and string manipulation operations. This being the case, where does one draw the line between what one asks the database to do versus what is done by the web server(s)? We use stored procedures exclusively as our data-access layer. My unwritten rule has always been to leave output formatting (including string manipulation and arithmetic) to the web server. So our queries return:
unformatted dates
null values
no calculated values (i.e. return values for columns "foo" and "bar" and let the web server calculate foo*bar if it needs to display value foobar)
no substring-reduced fields (except when the shortened field is so significantly shorter that we want to do it at the database level to reduce result set size)
two separate columns to let the front end case the output as required
What I'm interested in is feedback about whether this is generally an appropriate approach or whether others know of compelling performance/maintainability considerations that justify pushing these activities to the database.
Note: I'm intentionally tagging this question to be dbms-agnostic, as I believe this is an architectural consideration that comes into play regardless of one's specific dbms.
I would draw the line based on how likely each layer is to be swapped out for another implementation. It's very likely that you will never use a different RDBMS or have a mobile version of your site, but you never know.
The more orthogonal a data point is (the more its form is independent of where it's displayed), the closer it should be to being released from the database in that final form. If on every theoretical version of your site your values A and B are rendered as A * B, then A * B should be returned by your database and never calculated client side.
Now say you have something that's format-heavy, like a date. Sometimes you have short dates, long dates, English dates... One pure form should be returned from the database and then formatted in PHP.
The orthogonality point works in reverse as well: the more dynamic a data point is in its representation/display, the more it should be handled client side. If a string A is always taken as a substring of its first six characters, have it returned from the database pre-substring'ed. If the length of the substring depends on some factor, like six for mobile and ten for your web app, then return the larger string from the database and format it at run time using PHP.
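To make the date example concrete, here is a minimal sketch of the "one pure form, many presentations" idea (in Java, since the other snippets in this thread are Java; the same pattern applies in PHP):

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class DateFormatting {
    public static void main(String[] args) {
        // One pure form, as returned by the database (ISO-8601).
        LocalDate raw = LocalDate.parse("2009-03-17");

        DateTimeFormatter shortStyle = DateTimeFormatter.ofPattern("M/d/yy", Locale.US);
        DateTimeFormatter longStyle = DateTimeFormatter.ofPattern("d MMMM yyyy", Locale.UK);

        System.out.println(raw.format(shortStyle)); // 3/17/09
        System.out.println(raw.format(longStyle));  // 17 March 2009
    }
}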
Usually, data formatting is better done on the client side, especially culture-specific formatting.
Dynamic pivoting (i.e., variable columns) is also an example of what is better done on the client side.
When it comes to string manipulation and dynamic arrays, PHP is far more powerful than any RDBMS I'm aware of.
However, data formatting can use additional data which is also kept in the database. For instance, the coloring info for each row can be stored in an additional table.
You should then match the color to each row on the database side, but wrap it in the tags on the PHP side.
The rule of thumb is: retrieve everything you need for formatting in as few database round-trips as possible, then do the formatting itself on the client side.
I believe in returning the data pretty much as-is from the database and letting it be formatted on the front end instead. I don't stick to it religiously, but in general I think it's better, as it provides greater flexibility: e.g., one sproc can service n different requirements for data, each of which can format the data as it needs. Otherwise, you end up with multiple queries returning the same data with slightly different formatting from the DB, which (from a SQL Server point of view) reduces execution-plan caching benefits and therefore hurts performance.
Leave output formatting to the web server