ServerSide Sorting control prevents retrieval of operational attributes - ldap

If I use a ServerSideSort control, the entries are sorted nicely for me, but I can't retrieve any operational attributes, even if I specifically request them, e.g. "entryUUID" or "+"; all I get are the ordinary attributes. If I remove the SSS control, I get the operational attributes as I always did before.
Is this a known feature of the SSS specification, or a known problem in OpenLDAP 2.4.30?

This condition, and those described in your comments, sound like a server software defect. An LDAP-compliant server should either:
return the entries with the attributes as requested, even if unsorted (criticality false), or
return unavailableCriticalExtension and no entries (criticality true)
As for sorting on an operational attribute like entryUUID, several servers I tested refused to sort, but did return results (with criticality false).
Perhaps you could export the data to an LDIF file and deal with your entries with single digit entryUUID, and re-import the data.

Related

LDAP search for multiple complete DNs?

Assume I have an array of N DNs (distinguished names), e.g.:
cn=foo,dc=capmon,dc=lan
cn=bar,dc=capmon,dc=lan
cn=Fred Flintstone,ou=CapMon,dc=capmon,dc=lan
cn=Clark Kent,ou=yada,ou=whatnot,dc=capmon,dc=lan
They are not related and I cannot reduce/simplify the search. I have N complete DNs and want N records.
Can I write a single LDAP search that will return exactly N records, one for each DN? The assumption being that performance of both client and server will be better if I do it all in one search. Had it been SQL, it would be:
SELECT *
FROM dc=capmon,dc=lan
WHERE dn IN (
"cn=foo,dc=capmon,dc=lan",
"cn=bar,dc=capmon,dc=lan",
"cn=Fred Flintstone,ou=CapMon,dc=capmon,dc=lan",
"cn=Clark Kent,ou=yada,ou=whatnot,dc=capmon,dc=lan"
)
rather than doing individual LDAP searches in a for loop (which I do know how to do).
I tried against an MS Active Directory. There, all entries (seem to) have a distinguishedName attribute, and a search filter like this works (I added some newlines for readability):
(|
(distinguishedName=cn=ppolicy,dc=capmon,dc=lan)
(distinguishedName=cn=Users,dc=capmon,dc=lan)
<more ORed terms>
)
But this doesn't work:
(|
(dn=cn=ppolicy,dc=capmon,dc=lan)
(dn=cn=Users,dc=capmon,dc=lan)
<more ORed terms>
)
even though the returned records look like they contain dn attributes. :-(
An OpenLDAP server's records don't have distinguishedName attributes, and neither of the filters above works against it.
Can I do something that will work against most major LDAP servers?
It's not possible to "Read" several entries in a single operation.
You can do a single search operation that will match and return several entries, but you cannot search on the "DN" itself.
I've seen several applications that try to get several entries by using complex filters such as "(|(cn=foo)(cn=bar)(cn=Fred Flintstone))", but this may return more entries than expected unless all CN values are unique. It's not really a good practice either, as there are limits on the number of elements you can have in the filter, and such requests are usually not optimized in terms of I/O.
It will be faster to read each individual entry, as LDAP servers are optimized for such operations. If you want to reduce the latency, you can issue multiple asynchronous search operations on the same connection.
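As a rough illustration of "reading each individual entry": a read by DN is normally expressed as a search whose base is the DN itself, with base scope and a filter that matches any entry. The sketch below only builds those request parameters as plain dictionaries; the function name and dictionary keys are made up for this example, and you would map them onto whatever LDAP client library you use.

```python
from typing import Dict, Iterable, List, Sequence


def base_read_requests(
    dns: Iterable[str], attributes: Sequence[str] = ("*",)
) -> List[Dict[str, object]]:
    """Build one search request per DN (hypothetical request shape)."""
    return [
        {
            "base": dn,                   # the entry to read is the search base
            "scope": "base",              # base scope: only the entry named by the base
            "filter": "(objectClass=*)",  # presence filter that matches any entry
            "attributes": list(attributes),
        }
        for dn in dns
    ]


requests = base_read_requests(
    ["cn=foo,dc=capmon,dc=lan", "cn=bar,dc=capmon,dc=lan"],
    attributes=["cn", "entryUUID"],
)
```

Each request returns at most one entry, so N DNs give at most N records, and the requests can be sent without waiting for earlier ones to finish if the client library supports asynchronous operations.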

Getting the exact edited data from SQL Server

I have two Tables:
Articles(artID, artContents, artPublishDate, artCategoryID, publisherID).
ArticleUpdated(upArtID, upArtContents, upArtEditedData, upArtPublishDate, upArtCategory, upArtOriginalArticleID, upPublisherID)
A user logs in to the application and updates an article's contents in the artContents column. I want to know:
Which changes did the user make to the article's contents?
I also want to store both versions of the article: the original version and the edited version.
What should I do to accomplish these two tasks?
Any necessary changes to the tables?
The query for getting the exact edited data of artContents.
(By "exact edited data" I mean: there may be 5000 characters in the column, and the user may edit 200 characters in the middle or somewhere else; I want exactly those edited characters, before the edit and after the edit.)
Note: I am using ASP.NET with C# for development.
You are not going to be able to compute the exact edits using SQL. You need an algorithm such as the Unix diff on files (which works at the line level). At the character level, the algorithm would be some variation of Levenshtein distance. If diff meets your needs, you could download it, write a stored procedure to call it, and then use it in the database. This would be rather expensive.
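To make the character-level idea concrete, here is a minimal sketch in Python (used purely for illustration; the asker's stack is C#). It relies on the standard library's difflib, which finds longest matching blocks rather than computing Levenshtein distance, so it is a swapped-in technique and not the exact algorithm named above. The function name changed_spans is made up for this example.

```python
import difflib
from typing import List, Tuple


def changed_spans(before: str, after: str) -> List[Tuple[str, int, str, str]]:
    """Return (operation, offset in `before`, old text, new text) for each edit."""
    # autojunk=False disables the popularity heuristic so long texts are compared fully.
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    return [
        (tag, i1, before[i1:i2], after[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, deleted, and inserted spans
    ]


print(changed_spans("The cat sat", "The dog sat"))
# [('replace', 4, 'cat', 'dog')]
```

Running the comparison in application code when the edit is saved avoids calling an external tool from a stored procedure.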
The part of your question about maintaining the different versions is much easier. I would add two columns, EffDate and EndDate, to each record. You can get the most recent version by looking for the row where EndDate is NULL, and you can find the version that was active at any given time by comparing that time against EffDate and EndDate. MERGE is generally useful for maintaining such a table.
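The EffDate/EndDate idea can be sketched as follows. This is an assumption-laden illustration: it uses Python with an in-memory SQLite database so that it runs anywhere, the table name ArticleVersions is hypothetical, and it closes the old version with a plain UPDATE followed by an INSERT instead of SQL Server's MERGE.

```python
import sqlite3


def create_versions_table(conn):
    # Hypothetical table: one row per version of an article.
    conn.execute(
        """CREATE TABLE ArticleVersions (
               artID       INTEGER NOT NULL,
               artContents TEXT    NOT NULL,
               EffDate     TEXT    NOT NULL,  -- when this version became current
               EndDate     TEXT               -- NULL while this version is current
           )"""
    )


def save_version(conn, art_id, contents, now):
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            "UPDATE ArticleVersions SET EndDate = ? "
            "WHERE artID = ? AND EndDate IS NULL",
            (now, art_id),
        )
        conn.execute(
            "INSERT INTO ArticleVersions (artID, artContents, EffDate, EndDate) "
            "VALUES (?, ?, ?, NULL)",
            (art_id, contents, now),
        )


def current_version(conn, art_id):
    row = conn.execute(
        "SELECT artContents FROM ArticleVersions "
        "WHERE artID = ? AND EndDate IS NULL",
        (art_id,),
    ).fetchone()
    return row[0] if row else None


def version_at(conn, art_id, when):
    # ISO-8601 timestamps stored as text compare correctly as strings.
    row = conn.execute(
        "SELECT artContents FROM ArticleVersions "
        "WHERE artID = ? AND EffDate <= ? AND (EndDate IS NULL OR EndDate > ?)",
        (art_id, when, when),
    ).fetchone()
    return row[0] if row else None
```

With both versions stored, the original and edited texts can be fetched and compared in application code.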
Basically, this type of requirement needs custom logging.
In the example you provided (the user edits about 200 characters somewhere within a 5000-character column), there can also be cases where the user changes particular words in several different places in the text.
You can use http://nlog-project.org/ for logging; it's a fast and robust tool that we normally use for .NET logging.
Also you can take a look
http://www.codeproject.com/Articles/38756/Two-Simple-Approaches-to-WinForms-Dirty-Tracking
Asp.net Event for change tracking of entities
What would be the best way to implement change tracking on an object
The links above should clear up how to do it.
You would obviously need to track down and store every change.

LDAP filter boolean expression maximum number of arguments

I was writing a small test case to see what's more efficient, multiple small queries or a single big query, when I encountered this limitation.
The query looks like this:
(| (clientid=1) (clientid=2) (clientid=3) ...)
When the number of clients goes beyond 2103 ?! the LDAP server throws an error:
error code 1 - Operations Error
As far as I can tell, the actual filter string length does not matter: it was about 69 KB (at least for Microsoft AD, the length limit is 10 MB). I tried with longer attribute names and got the same strange limit: 2103 operands.
Does anyone have more information about this limitation?
Is this something specified in the LDAP protocol specification or is it implementation specific?
Is it configurable?
I tested this against IBM Tivoli Directory Server V6.2 using both the UnboundID and JNDI Java libraries.
It cannot be more than 8099 characters. See http://www-01.ibm.com/support/docview.wss?uid=swg21295980
Also, what you are doing is not good practice. If these entries share common attributes (e.g., country code, department number, location), try to retrieve the results using criteria based on those attributes. If not, divide your search filter into smaller ones, each with only a few predicates, and execute multiple searches. How you do this depends on the programming language you're using, but try to execute each search in a separate thread to speed up data retrieval.
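A minimal sketch of the "smaller filters, multiple searches" approach, in Python for illustration. The run_search callable is a hypothetical stand-in for whatever your LDAP library provides: it is assumed to take a filter string and return a list of entries, and to be safe to call from several threads (for example by using one connection per thread). The chunk size of 500 is an arbitrary assumption, not a documented server limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def escape_filter_value(value: str) -> str:
    """Escape the characters that RFC 4515 requires escaping in a filter value."""
    return "".join(
        "\\%02x" % ord(ch) if ch in "\\*()\x00" else ch for ch in value
    )


def chunked_or_filters(
    attribute: str, values: Iterable[object], chunk_size: int = 500
) -> List[str]:
    """Split one huge OR filter into several smaller OR filters."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    items = list(values)
    filters = []
    for start in range(0, len(items), chunk_size):
        terms = "".join(
            "(%s=%s)" % (attribute, escape_filter_value(str(v)))
            for v in items[start:start + chunk_size]
        )
        filters.append("(|%s)" % terms)
    return filters


def search_in_chunks(
    run_search: Callable[[str], list],  # hypothetical: filter string -> entries
    attribute: str,
    values: Iterable[object],
    chunk_size: int = 500,
    max_workers: int = 4,
) -> list:
    """Run each smaller filter in a worker thread and merge the results."""
    filters = chunked_or_filters(attribute, values, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = pool.map(run_search, filters)  # results come back in filter order
        return [entry for batch in batches for entry in batch]
```

For example, chunked_or_filters("clientid", [1, 2, 3], chunk_size=2) yields "(|(clientid=1)(clientid=2))" and "(|(clientid=3))", each of which stays well below the operand count that triggered the error.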

django objects...values() select only some fields

I'm optimizing the memory load (~2GB, offline accounting and analysis routine) of this line:
l2 = Photograph.objects.filter(**(movie.get_selectors())).values()
Is there a way to convince django to skip certain columns when fetching values()?
Specifically, the routine obtains all rows of the table matching certain criteria (db is optimized and performs it very quickly), but it is a bit too much for python to handle - there is a long string referenced in each row, storing the urls for thumbnails.
I only really need three fields from each row, but, if all the fields are included, it suddenly consumes about 5kB/row which sadly pushes the RAM to the limit.
The values(*fields) function allows you to specify which fields you want.
Check out the QuerySet method only(). When you declare that you only want certain fields to be loaded immediately, the QuerySet will not pull in the other fields of your object until you try to access them.
If you have to deal with ForeignKeys that must also be pre-fetched, then also check out select_related.
The two links above to the Django documentation have good examples that should clarify their use.
Take a look at Django Debug Toolbar; it comes with a debugsqlshell management command that lets you see the SQL queries being generated, along with the time taken, as you play around with your models in a Django/Python shell.

Database vs. Front-End for Output Formatting

I've read that (all things equal) PHP is typically faster than MySQL at arithmetic and string manipulation operations. This being the case, where does one draw the line between what one asks the database to do versus what is done by the web server(s)? We use stored procedures exclusively as our data-access layer. My unwritten rule has always been to leave output formatting (including string manipulation and arithmetic) to the web server. So our queries return:
unformatted dates
null values
no calculated values (i.e. return values for columns "foo" and "bar" and let the web server calculate foo*bar if it needs to display value foobar)
no substring-reduced fields (except when shortened field is so significantly shorter that we want to do it at database level to reduce result set size)
two separate columns to let front-end case the output as required
What I'm interested in is feedback about whether this is generally an appropriate approach or whether others know of compelling performance/maintainability considerations that justify pushing these activities to the database.
Note: I'm intentionally tagging this question to be dbms-agnostic, as I believe this is an architectural consideration that comes into play regardless of one's specific dbms.
I would draw the line based on how easily certain layers could be swapped out for other implementations. It's very likely that you will never use a different RDBMS or have a mobile version of your site, but you never know.
The more orthogonal a data point is, the closer it should be to being released from the database in that form. If on every theoretical version of your site your values A and B are rendered A * B, that should be returned by your database as A * B and never calculated client side.
Let's say you have something that's format heavy like a date. Sometimes you have short dates, long dates, English dates... One pure form should be returned from the database and then that should be formatted in PHP.
So the orthogonality point works in reverse as well. The more dynamic a data point is in its representation/display, the more it should be handled client side. If a string A is always taken as a substring of the first six characters, then have that be returned from the database as pre-substring'ed. If the length of the substring depends on some factor, like six for mobile and ten for your web app, then return the larger string from the database and format it at run time using PHP.
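As a small sketch of that split, here is how a front-end layer might format one unformatted row differently per client. It is written in Python for illustration only (the thread is about PHP); the row keys, the client names, the date formats, and the six/ten character limits are assumptions taken from or modeled on the example above.

```python
from datetime import datetime

# Display rules that vary per client stay in the front end (hypothetical values).
TITLE_LIMITS = {"mobile": 6, "web": 10}
DATE_FORMATS = {"mobile": "%d/%m/%y", "web": "%Y-%m-%d %H:%M"}


def format_row(row: dict, client: str) -> dict:
    """Format one raw database row for the given client type."""
    return {
        # The substring length depends on the client, so the database returns the full string.
        "title": row["title"][:TITLE_LIMITS[client]],
        # One pure date value comes from the database and is formatted per client here.
        "published": row["published"].strftime(DATE_FORMATS[client]),
        # A value that is rendered the same way everywhere arrives precomputed.
        "total": row["total"],
    }


row = {"title": "Orthogonality", "published": datetime(2024, 3, 5, 14, 30), "total": 42}
print(format_row(row, "mobile"))
# {'title': 'Orthog', 'published': '05/03/24', 'total': 42}
```

The same row can serve both clients, so the query does not need a variant per display format.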
Usually, data formatting is better done on client side, especially culture-specific formatting.
Dynamic pivoting (i.e. variable columns) is also an example of what is better done on the client side.
When it comes to string manipulation and dynamic arrays, PHP is far more powerful than any RDBMS I'm aware of.
However, data formatting can use additional data which is also kept in the database. Like, the coloring info for each row can be stored in additional table.
You should then match the color to each row on the database side, but wrap it in the tags on the PHP side.
The rule of thumb is: retrieve everything you need for formatting in as few database round-trips as possible, then do the formatting itself on the client side.
I believe in returning the data pretty much as-is from the database and letting it be formatted on the front-end instead. I don't stick to it religiously, but in general I think it's better as it provides greater flexibility - e.g. one sproc can service n different requirements for data, each of which can format the data as it individually needs. Otherwise, you end up with multiple queries returning the same data with slightly different formatting from the DB (which, from a SQL Server point of view, reduces execution plan caching benefits and therefore has a negative impact on performance).
Leave output formatting to the web server