What is wrong with this LDAP filter packet? - ldap

I am trying to port a program which queries an LDAP server from Perl to Go, and with the Go version I am receiving a response that the filter is malformed:
00000057: LdapErr: DSID-0C0C0968, comment: The server was unable to decode a search request filter, data 0, v1db1\x00
I have used tcpdump to capture the data transmitted to the server with both the Perl and Go versions of my program, and have found that they are sending slightly different filter packets. This question is not about any possible bugs in the Go program, but simply about understanding the contents of the LDAP filter packets.
The encoded filter is:
(objectClass=*)
And the Perl-generated packet (which the server likes) looks like this:
ASCII . . o b j e c t C l a s s
Hex 87 0b 6f 62 6a 65 63 74 43 6c 61 73 73
Byte# 0 1 2 3 4 5 6 7 8 9 10 11 12
The Go-generated packet (which the server doesn't like) looks like this:
ASCII . . . . o b j e c t C l a s s
Hex a7 0d 04 0b 6f 62 6a 65 63 74 43 6c 61 73 73
Byte# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
This is my own breakdown of the packets:
## Byte 0: Tag
When I dissect Byte 0 from both packets, I see they are identical, except for the Primitive/Constructed bit, which is set to Primitive in the Perl version, and Constructed in the Go version. See DER encoding for details.
Bit# 87 6 54321
Perl 10 0 00111
Go 10 1 00111
Bits 87: In both packets, 10 = Context Specific
Bit 6: In the Perl version 0 = Primitive, in the Go version 1 = Constructed
Bits 54321: 00111 = 7 = Object descriptor
## Byte 1: Length
11 bytes for the Perl version, 13 for the Go version
## Bytes 2-3 for the Go version
Byte 2: Tag 04: Substring Filter (See section 4.5.1 of RFC 4511)
Byte 3: Length of 11 bytes
## Remainder: Payload
For both packets this is simply the ASCII text objectClass
My reading of RFC 4511 section 4.5.1 suggests that the Go version is "more" correct, yet the Perl version is the one that works with the server. What gives?
Wireshark is able to parse both packets, and interprets them both equally.

The Perl version is correct, and the Go version is incorrect.
As you point out, RFC 4511 section 4.5.1 specifies encoding for the filter elements, like:
Filter ::= CHOICE {
and [0] SET SIZE (1..MAX) OF filter Filter,
or [1] SET SIZE (1..MAX) OF filter Filter,
not [2] Filter,
equalityMatch [3] AttributeValueAssertion,
substrings [4] SubstringFilter,
greaterOrEqual [5] AttributeValueAssertion,
lessOrEqual [6] AttributeValueAssertion,
present [7] AttributeDescription,
approxMatch [8] AttributeValueAssertion,
extensibleMatch [9] MatchingRuleAssertion,
... }
And in this case, the relevant portion is:
present [7] AttributeDescription,
The AttributeDescription element is defined in section 4.1.4 of the same specification:
AttributeDescription ::= LDAPString
-- Constrained to <attributedescription>
-- [RFC4512]
And from section 4.1.2:
LDAPString ::= OCTET STRING -- UTF-8 encoded,
-- [ISO10646] characters
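So this means that the present filter component is an octet string, which is a primitive element. Because the LDAP ASN.1 module is defined with IMPLICIT TAGS, the [7] tag replaces the OCTET STRING tag instead of wrapping it. Go is incorrectly converting it to a constructed element (a [7] wrapper around an inner 04 OCTET STRING), and the directory server is correctly rejecting that malformed request.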
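To make the difference between the two packets concrete, here is a minimal Python sketch (not taken from either the Perl or the Go program). It splits a single-octet BER identifier into class, constructed flag, and tag number, and builds the primitive encoding of a present filter. The helper names `decode_identifier` and `encode_present` are hypothetical, and the sketch assumes tag numbers below 31 and values shorter than 128 bytes.

```python
CLASSES = ("universal", "application", "context-specific", "private")

def decode_identifier(octet):
    """Split a single BER identifier octet (tag numbers below 31 only)."""
    tag_class = CLASSES[octet >> 6]      # bits 8-7: class
    constructed = bool(octet & 0x20)     # bit 6: primitive (0) or constructed (1)
    tag_number = octet & 0x1F            # bits 5-1: tag number
    return tag_class, constructed, tag_number

def encode_present(attribute):
    """Encode a 'present' filter as a primitive [7] element (short-form length only)."""
    value = attribute.encode("utf-8")
    if len(value) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    # 0x87 = context-specific class, primitive, tag number 7
    return bytes([0x87, len(value)]) + value

print(decode_identifier(0x87))   # first byte of the Perl packet
print(decode_identifier(0xA7))   # first byte of the Go packet
print(encode_present("objectClass").hex(" "))
```

The encoded bytes should match the Perl capture in the question, which is the form the server accepts.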


how to zlib inflate a gzip/deflate archive

I have an archive encoded with gzip 1.5. I'm unable to decode it using the C zlib library: zlib inflate() returns error code -3 (Z_DATA_ERROR) with stream.msg = "unknown compression method".
$ gzip --list --verbose vmlinux.z
method crc date time compressed uncompressed ratio uncompressed_name
defla 12169518 Apr 29 13:00 4261643 9199404 53.7% vmlinux
The first 32 bytes of the file are:
00000000 1f 8b 08 08 29 f4 8a 60 00 03 76 6d 6c 69 6e 75 |....)..`..vmlinu|
00000010 78 00 ec 9a 7f 54 1c 55 96 c7 6f 75 37 d0 fc 70 |x....T.U..ou7..p|
I see the first 18 bytes are the RFC-1952 gzip header.
After the NUL, I expect the next byte to be RFC-1951 deflate or RFC-1950 zlib (I'm not sure which).
So, I pass zlib inflate() a z_stream:next_in pointing to the byte #0x12.
If this were deflate encoded, then I would expect the next byte #0x12 to be 0aabbbbb (BFINAL=0 and BTYPE=some compression)
If this were zlib encoded, I would expect the next byte #0x12 to take the form 0aaa1000 bbbccccc
Instead, I see #0x12 EC = 1110 1100 Which fits neither of those.
For my code, I took the uncompress() code and modified it slightly with allocators appropriate to my environment and several different experiments with the window bits (including 15+16, -MAX_WBITS, and MAX_WBITS).
int ZEXPORT unzip (dest, destLen, source, sourceLen)
    Bytef *dest;
    uLongf *destLen;
    const Bytef *source;
    uLong sourceLen;
{
    z_stream stream;
    int err;
    stream.next_in = (Bytef*)source;
    stream.avail_in = (uInt)sourceLen;
    /* Check for source > 64K on 16-bit machine: */
    if ((uLong)stream.avail_in != sourceLen) return Z_BUF_ERROR;
    stream.next_out = dest;
    stream.avail_out = (uInt)*destLen;
    if ((uLong)stream.avail_out != *destLen) return Z_BUF_ERROR;
    stream.zalloc = (alloc_func)my_alloc;
    stream.zfree = (free_func)my_free;
    /*err = inflateInit(&stream);*/
    err = inflateInit2(&stream, 15 + 16);
    if (err != Z_OK) return err;
    err = inflate(&stream, Z_FINISH);
    if (err != Z_STREAM_END) {
        inflateEnd(&stream);
        return err == Z_OK ? Z_BUF_ERROR : err;
    }
    *destLen = stream.total_out;
    err = inflateEnd(&stream);
    return err;
}
How can I correct my decoding of this file?
That should work fine, assuming that my_alloc and my_free do what they need to do. You should verify that you are actually giving unzip() the data that you think you are giving it. The data you give it needs to start with the 1f 8b.
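As an illustration of why the starting offset matters, here is a small sketch using Python's zlib bindings rather than the C API from the question. It assumes a gzip stream produced by zlib itself, which has a plain 10-byte header (the file in the question has a longer header because it stores a file name). The variable names are made up for the example.

```python
import zlib

payload = b"hello gzip " * 100

# Build a gzip (RFC 1952) stream; with no extra fields the header is 10 bytes.
compressor = zlib.compressobj(wbits=15 + 16)
blob = compressor.compress(payload) + compressor.flush()
print(blob[:2].hex(" "))  # the gzip magic bytes

# wbits = 15 + 16 expects the gzip header, so pass the stream from byte 0.
whole = zlib.decompress(blob, wbits=15 + 16)

# Skipping the header by hand leaves raw deflate data, which needs wbits = -15.
inflater = zlib.decompressobj(wbits=-15)
raw = inflater.decompress(blob[10:])
trailer = inflater.unused_data  # CRC-32 and ISIZE that follow the deflate data

# Mixing the two (header skipped, but gzip decoding requested) fails.
try:
    zlib.decompress(blob[10:], wbits=15 + 16)
    mismatch = None
except zlib.error as exc:
    mismatch = str(exc)

print(whole == payload, raw == payload, len(trailer), mismatch)
```

So either hand inflate the whole file with 15 + 16, or skip the header yourself and use negative window bits, but not a mix of the two.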
(Side comment: "unzip" is a lousy name for the function. It does not unzip, since zip is an entirely different format than either gzip or zlib. "gunzip" or "ungzip" would be appropriate.)
You are manually reading the bits in the deflate stream in the wrong order. The least significant bits are first. The low three bits of ec are 100, indicating a non-last dynamic block. 0 for non-last, then 10 for dynamic.
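The bit order can be checked with a few lines of Python. This is only a sketch of the block-header read described above, with made-up variable names:

```python
first_byte = 0xEC  # byte at offset 0x12 in the dump from the question

# Deflate packs header bits starting from the least significant bit.
bfinal = first_byte & 1            # bit 0: 1 only on the last block
btype = (first_byte >> 1) & 0b11   # bits 1-2: block type

BLOCK_TYPES = {0: "stored", 1: "fixed", 2: "dynamic", 3: "reserved"}
print(bfinal, BLOCK_TYPES[btype])
```

Read this way, 0xEC is a perfectly valid start of a raw deflate stream.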
You can use infgen to disassemble a deflate stream. Its output for the 14 bytes provided is this initial portion of a dynamic block:
dynamic
count 286 27 16
code 0 5
code 2 7
code 3 7
code 4 5
code 5 5
code 6 4
code 7 4
code 8 2
code 9 3
code 10 2
code 11 4
code 12 4
code 16 7
code 17 7
lens 4 6 7 7 7 8 8 8 7 8
repeat 3
lens 10

How many combinations does SHA-256 have?

Using an online tool and Wikipedia, I found out that every sha-256 encrypted string is 64 chars long, containing numbers and characters. Hence I assumed that there are 34^36 combinations (2^216, simplified by an algebra calculator).
After doing some research I found out that most people said there are 2^256 combinations. Could someone explain? To make the context clear, I am writing a paper about cryptocurrencies and trying to explain how many different combinations there are to encrypt and how long this could take (therefore how many guesses it could take), and to compare this to the total number of atoms in the universe (roughly 10^85).
SHA-256 produces 256 bits, which is 32 bytes, not characters. Each byte has 256 possible values.
There are 256 bits and each bit has 2 values (0 or 1), thus 2^256.
There are 32 bytes and each byte has 256 values, thus 256^32.
Note: 2^256 == 256^32 ~= 10^77.
The 32 bytes can be encoded many ways, in hexadecimal it would be 64 characters, in Base64 it would be 44 characters.
Total combinations of SHA-256 is
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
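These counts are easy to check in Python. The sketch below hashes an arbitrary example input (the input itself is not significant) and compares the different ways of counting:

```python
import base64
import hashlib

digest = hashlib.sha256(b"example").digest()   # raw 32-byte digest
hex_form = digest.hex()                        # 2 hex characters per byte
b64_form = base64.b64encode(digest).decode("ascii")

print(len(digest), len(hex_form), len(b64_form))

# Counting by bits, by bytes, or by hex characters gives the same total.
total = 2 ** 256
print(total == 256 ** 32 == 16 ** 64)
print(f"{total:,}")
```

The hex and Base64 strings are just different spellings of the same 32 bytes, so the number of possible hashes does not depend on the encoding.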
A sha-256 hash written in hexadecimal has 64 characters, which is 32 hex pairs (bytes), because each byte takes 2 hex characters.
3a 7b d3 e2 36 0a 3d 29 ee a4 36 fc fb 7e 44 c7 35 d1 17 c4 2d 1c 18 35 42 0b 6b 99 42 dd 4f 1b
Above is a hash where the hex pairs are separated so you can count 32.
Hex has 16 available characters (0-9 and a-f), so a two-character hex pair has 16^2 = 256 combinations.
With 32 slots for a hex pair in a sha-256 hash, you use 256^32 to get:
115792089237316195423570985008687907853269984665640564039457584007913129639936
available sha-256 hashes.

wrong output when decoding base64 string

I seem to always get incorrect output when decoding this base64 string in VB.NET (I think it's base64? It really looks like it).
I'm using the FromBase64String function,
and I did it like this:
Dim b64str = "0DDQQL3uAikQBgAAc4cqK4WnSQBg4SAgExEAAF3BAmAILYojRgkBhUrBAgEDRw=="
Dim i As String = System.Text.Encoding.Unicode.GetString(Convert.FromBase64String(b64str))
MsgBox(i)
but I always get this output:
バ䃐⤂ؐ
that doesn't seem right
0DDQQL3uAikQBgAAc4cqK4WnSQBg4SAgExEAAF3BAmAILYojRgkBhUrBAgEDRw==
It looks like Base64, the length is a correct size, the characters belong to the Base64 character set and the trailing "==" is reasonable. Of course it might not be a Base64 encoding.
Base64 decoding results in:
D0 30 D0 40 BD EE 02 29 10 06 00 00 73 87 2A 2B 85 A7 49 00 60 E1 20 20 13 11 00 00 5D C1 02 60 08 2D 8A 23 46 09 01 85 4A C1 02 01 03 47
Now the problem: this is not a character string, it is an array of 8-bit bytes, so it cannot be displayed as characters. The 0x00 bytes will signal the end of a string to the print method, and the non-representable characters may be ignored or displayed with special characters, or multiple bytes may display as multi-byte Unicode characters. The only guaranteed and usual display is in hexadecimal as above.
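For comparison, here is the same decode as a short Python sketch (the question uses VB.NET; this only illustrates treating the result as bytes rather than text):

```python
import base64

b64str = "0DDQQL3uAikQBgAAc4cqK4WnSQBg4SAgExEAAF3BAmAILYojRgkBhUrBAgEDRw=="

raw = base64.b64decode(b64str)   # a bytes object, not a character string
print(len(raw))
print(raw.hex(" ").upper())      # hex dump, one pair per byte

# Forcing the bytes through a text decoder yields arbitrary characters.
as_utf16 = raw.decode("utf-16-le", errors="replace")
print(repr(as_utf16[:4]))
```

Decoding the bytes as UTF-16 is roughly what Encoding.Unicode.GetString does in the question, which is why the output looks like random CJK characters.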
That string can be virtually anything. It might be the output of a hash or encryption algorithm, like sha*. Your mistake is that you assume that it must be base64 because it might be.
It is a valid observation that it might be base64, so it was a perfectly valid thing to run that function, but it is you who has to determine whether based on the results it is base64 or something else, based on particular logic, which was not described in the question.

BER Encoding of a "Choice"

I am trying to parse an LDAP bind request using the Apache Harmony ASN.1/BER classes (could use another library, I just chose that as it has an Apache License).
My question is specifically on the encoding of a "CHOICE" in ASN.1. The RFC that defines the LDAP ASN.1 schema (http://www.rfc-editor.org/rfc/rfc2251.txt) gives the following as part of a bind request:
BindRequest ::= [APPLICATION 0] SEQUENCE {
version INTEGER (1 .. 127),
name LDAPDN,
authentication AuthenticationChoice }
AuthenticationChoice ::= CHOICE {
simple [0] OCTET STRING,
-- 1 and 2 reserved
sasl [3] SaslCredentials }
SaslCredentials ::= SEQUENCE {
mechanism LDAPString,
credentials OCTET STRING OPTIONAL }
How is that CHOICE there actually encoded?
I generated a sample bind request using JXplorer and captured the raw data that was sent. It looks like this:
00000000 30 31 02 01 01 60 2c 02 01 03 04 1b 75 69 64 3d |01...`,.....uid=|
00000010 74 65 73 74 75 73 65 72 2c 64 63 3d 74 65 73 74 |testuser,dc=test|
00000020 2c 64 63 3d 63 6f 6d 80 0a 74 65 73 74 69 6e 67 |,dc=com..testing|
00000030 31 32 33 |123|
The 80 there (at offset 0x27) seems to represent that choice. Fair enough - and I get that (per http://en.wikipedia.org/wiki/Basic_Encoding_Rules#BER_encoding) the high bit is set in order to indicate that it's "context specific" (i.e. defined by this application/protocol). But how would I know if this is a "simple" or "sasl" auth? What indicates which option of the choice is being used? In this case it looks like the next byte (0x0a) is the length of the string - so this could be an OctetString or something of the sort - but I don't see anything here that indicates what the actual type is other than 0x80...
I'm also not sure what the [0] and [3] mean in the CHOICE section above. Is that saying there are four options but only options numbered 0 and 3 are in use?
Below you can see the output of the openssl asn1parse command. The CHOICE members are encoded using so-called context-specific tags, which means the normal tag value is replaced with the one specified in the ASN.1 definition for the respective item in the CHOICE. The tag has value 0, which indicates that the first item in the CHOICE is selected. The first choice item is of type OCTET STRING. The value 0 of the context-specific tag gives you the information about the value type. If there was no context tag, the normal OCTET STRING tag would be used.
0:d=0 hl=2 l= 49 cons: SEQUENCE
2:d=1 hl=2 l= 1 prim: INTEGER :01
5:d=1 hl=2 l= 44 cons: appl [ 0 ]
7:d=2 hl=2 l= 1 prim: INTEGER :03
10:d=2 hl=2 l= 27 prim: OCTET STRING :uid=testuser,dc=test,dc=com
39:d=2 hl=2 l= 10 prim: cont [ 0 ]
The '80'H in the encoded message above is called the "identifier octets" (in general it may be more than one octet). This value of the identifier octet(s) indicates that the selected alternative of the CHOICE is "simple", because the five low-order bits of '80'H are '00000'B, which matches the tag number of the tag of "simple" ([0]).
If the sender had selected the "sasl" alternative, the identifier octet would be 'A3'H instead of '80'H. The '3'H in 'A3'H (the five low-order bits) is the tag number of the tag of "sasl" ([3]). The two highest-order bits of the identifier octet are set to '10'B for both alternatives because both [0] and [3] are "context-specific" tags (this just means that these tags don't contain the APPLICATION keyword or the PRIVATE keyword). The next bit of the identifier octet (the "constructed" bit) is set to '0' for "simple" but is set to '1' for "sasl", because the encoding of "sasl" contains nested tags whereas the encoding of "simple" does not contain any nested tags.
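The same reading of the captured bytes can be reproduced with a tiny tag-length-value walker. This is only a sketch: `read_tlv` is a hypothetical helper that handles single-octet tags and short-form lengths (under 128 bytes), which is all this particular capture needs.

```python
data = bytes.fromhex(
    "3031020101602c020103041b7569643d"
    "74657374757365722c64633d74657374"
    "2c64633d636f6d800a74657374696e67"
    "313233"
)

def read_tlv(buf, pos=0):
    """Return (identifier_octet, value_bytes, next_position)."""
    tag = buf[pos]
    length = buf[pos + 1]
    if (tag & 0x1F) == 0x1F or length & 0x80:
        raise ValueError("multi-octet tags and long-form lengths are not handled here")
    start = pos + 2
    return tag, buf[start:start + length], start + length

_, message, _ = read_tlv(data)                 # LDAPMessage SEQUENCE (0x30)
_, message_id, pos = read_tlv(message)         # messageID INTEGER
bind_tag, bind, _ = read_tlv(message, pos)     # BindRequest [APPLICATION 0] (0x60)
_, version, pos = read_tlv(bind)               # version INTEGER
_, name, pos = read_tlv(bind, pos)             # name LDAPDN (OCTET STRING)
auth_tag, auth_value, _ = read_tlv(bind, pos)  # AuthenticationChoice

# The tag number in the five low-order bits selects the CHOICE alternative.
ALTERNATIVES = {0: "simple", 3: "sasl"}
chosen = ALTERNATIVES[auth_tag & 0x1F]
print(hex(auth_tag), chosen, auth_value)
```

A real parser would dispatch on the full identifier octet, including the class and constructed bits, rather than on the tag number alone.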

How to find the "lexical file" in Wordnet?

If you look at the original Wordnet search and select "Display options: Show Lexical File Info", you'll see an extremely useful classification of words called lexical file. Eg for "filling" we have:
<noun.substance>S: (n) filling, fill (any material that fills a space or container)
<noun.process>S: (n) filling (flow into something (as a container))
<noun.food>S: (n) filling (a food mixture used to fill pastry or sandwiches etc.)
<noun.artifact>S: (n) woof, weft, filling, pick (the yarn woven across the warp yarn in weaving)
<noun.artifact>S: (n) filling ((dentistry) a dental appliance consisting of ...)
<noun.act>S: (n) filling (the act of filling something)
The first thing in brackets is the "lexical file". Unfortunately I have not been able to find a SPARQL endpoint that provides this info.
The latest RDF translation of Wordnet 3.0 points to two things:
Talis SPARQL endpoint. Use eg this query to check there's no such info:
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-chair-noun-1>
W3C's mapping description. Appendix D "Conversion details" describes something useful: wn:classifiedByTopic.
But it's not the same as lexical file, and is quite incomplete. Eg "chair" has nothing, while one of the senses of "completion" is in the topic "American Football"
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-completion-noun-1> ->
<j.1:classifiedByTopic rdf:resource="http://purl.org/vocabularies/princeton/wn30/synset-American_football-noun-1"/>
The question: is there a public Wordnet query API, or a database, that provides the lexical file information?
Using the Python NLTK interface:
from nltk.corpus import wordnet as wn
for synset in wn.synsets('can'):
    print(synset.lexname())  # lexname() is a method in NLTK 3
I don't think you can find it in the RDF/OWL Representation of WordNet. It's in the WordNet distribution though: dict/lexnames. Here is the content of the file as of WordNet 3.0:
00 adj.all 3
01 adj.pert 3
02 adv.all 4
03 noun.Tops 1
04 noun.act 1
05 noun.animal 1
06 noun.artifact 1
07 noun.attribute 1
08 noun.body 1
09 noun.cognition 1
10 noun.communication 1
11 noun.event 1
12 noun.feeling 1
13 noun.food 1
14 noun.group 1
15 noun.location 1
16 noun.motive 1
17 noun.object 1
18 noun.person 1
19 noun.phenomenon 1
20 noun.plant 1
21 noun.possession 1
22 noun.process 1
23 noun.quantity 1
24 noun.relation 1
25 noun.shape 1
26 noun.state 1
27 noun.substance 1
28 noun.time 1
29 verb.body 2
30 verb.change 2
31 verb.cognition 2
32 verb.communication 2
33 verb.competition 2
34 verb.consumption 2
35 verb.contact 2
36 verb.creation 2
37 verb.emotion 2
38 verb.motion 2
39 verb.perception 2
40 verb.possession 2
41 verb.social 2
42 verb.stative 2
43 verb.weather 2
44 adj.ppl 3
For each entry of dict/data.*, the second number is the lexical file info. For example, this filling entry contains the number 13, which is noun.food.
07883031 13 n 01 filling 0 002 # 07882497 n 0000 ~ 07883156 n 0000 | a food mixture used to fill pastry or sandwiches etc.
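If you only have the raw WordNet files, the lookup described above can be sketched in a few lines of Python. The helper names are hypothetical, and the lexnames text is a short excerpt of the table above rather than the whole file:

```python
LEXNAMES_EXCERPT = """\
04 noun.act 1
13 noun.food 1
22 noun.process 1
27 noun.substance 1
"""

def load_lexnames(text):
    """Map lexical file number -> lexical file name."""
    table = {}
    for line in text.splitlines():
        number, name, _syntactic_category = line.split()
        table[int(number)] = name
    return table

def lex_filenum(data_line):
    """The second whitespace-separated field of a dict/data.* entry."""
    return int(data_line.split()[1])

entry = ("07883031 13 n 01 filling 0 002 # 07882497 n 0000 ~ 07883156 n 0000 "
         "| a food mixture used to fill pastry or sandwiches etc.")

lexnames = load_lexnames(LEXNAMES_EXCERPT)
print(lexnames[lex_filenum(entry)])
```

With the full dict/lexnames file loaded instead of the excerpt, the same two helpers cover every entry in dict/data.*.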
It can be done through MIT JWI (MIT Java Wordnet Interface) a Java API to query Wordnet. There's a topic in this link showing how to implement a java class to access lexicographic
This is what worked for me,
Synset[] synsets = database.getSynsets(wordStr);
ReferenceSynset referenceSynset = (ReferenceSynset) synsets[i];
int lexicalCode =referenceSynset.getLexicalFileNumber();
Then use above table to deduce "lexnames" e.g. noun.time
If you're on Windows, chances are it is in your appdata, in the local directory. To get there, open your file browser, go to the address bar at the top, and type in %appdata%.
Next click on Roaming, and then find the nltk_data directory. In there, you will have your corpora directory. The full path is something like:
C:\Users\yourname\AppData\Roaming\nltk_data\corpora
and lexnames will be present under
C:\Users\yourname\AppData\Roaming\nltk_data\corpora\wordnet.