What's the difference between zlib and zlib@openssh.com? - ssh

While debugging ssh, I found that there are two compression methods: zlib and zlib@openssh.com.
debug2:compression ctos: none, zlib@openssh.com,zlib
debug2:compression stoc: none, zlib@openssh.com,zlib
Is there any difference between the two?

RFC 4251 says:
There are two formats for algorithm and method names:
Names that do not contain an at-sign ("@") are reserved to be
assigned by IETF CONSENSUS. Examples include "3des-cbc", "sha-1",
"hmac-sha1", and "zlib" (the doublequotes are not part of the
name). Names of this format are only valid if they are first
registered with the IANA. Registered names MUST NOT contain an
at-sign ("@"), comma (","), whitespace, control characters (ASCII
codes 32 or less), or the ASCII code 127 (DEL). Names are
case-sensitive, and MUST NOT be longer than 64 characters.
Anyone can define additional algorithms or methods by using names
in the format name@domainname, e.g., "ourcipher-cbc@example.com".
The format of the part preceding the at-sign is not specified;
however, these names MUST be printable US-ASCII strings, and MUST
NOT contain the comma character (","), whitespace, control
characters (ASCII codes 32 or less), or the ASCII code 127 (DEL).
They MUST have only a single at-sign in them. The part following
the at-sign MUST be a valid, fully qualified domain name [RFC1034]
controlled by the person or organization defining the name. Names
are case-sensitive, and MUST NOT be longer than 64 characters. It
is up to each domain how it manages its local namespace. It
should be noted that these names resemble STD 11 [RFC0822] email
addresses. This is purely coincidental and has nothing to do with
STD 11 [RFC0822].
In short, the name without an at-sign is an IETF-standardized, IANA-registered method, and the name with an at-sign is an additional method defined by OpenSSH under its own domain name.
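The naming rules quoted above can be sketched in a few lines of Python. This is only an illustration: the function name classify_name and the returned labels are hypothetical, and the domain part is not checked for being a valid fully qualified domain name.

```python
def classify_name(name):
    """Classify an SSH algorithm/method name following the RFC 4251 naming rules.

    Returns ("standard", name, None) for names without an at-sign, or
    ("extension", local_part, domain) for names of the form name@domainname.
    """
    if not name or len(name) > 64:
        raise ValueError("name must be 1 to 64 characters long")
    for ch in name:
        code = ord(ch)
        # Reject whitespace and control characters (32 or less), DEL (127),
        # anything outside US-ASCII, and the comma.
        if code <= 32 or code >= 127 or ch == ",":
            raise ValueError("illegal character in name: %r" % ch)
    if "@" not in name:
        return ("standard", name, None)
    if name.count("@") != 1:
        raise ValueError("extension names must contain exactly one at-sign")
    local_part, domain = name.split("@")
    return ("extension", local_part, domain)


if __name__ == "__main__":
    print(classify_name("zlib"))
    print(classify_name("zlib@openssh.com"))
```

Under these assumptions, "zlib" falls in the IANA-registered namespace and "zlib@openssh.com" is an extension whose namespace belongs to openssh.com.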

From https://www.openssh.com/manual.html
OpenSSH implemented a compression method "zlib@openssh.com" that delays starting compression until after user authentication, to eliminate the risk of pre-authentication attacks against the compression code. It is described in draft-miller-secsh-compression-delayed-00.txt.
So in short, it performs the same zlib compression but starts compressing only after successful authentication, which prevents certain types of attack.

Where are named pdf characters defined like "f_f", "uni00D0" and "a204"?

I'm trying to read the official pdf specification "Document management — Portable document format — Part 1: PDF 1.7" (PDF32000_2008.pdf) as bytes and then interpret them according to that specification.
In Annex D, Character Sets and Encodings, there is a list of all named characters.
When I parse PDF32000_2008.pdf, I also find named characters like "f_f", "uni00D0" and "a204", which are missing from that specification.
My guess is that "f_f" stands for two 'f' characters, which might be printed with a special glyph. Unicode has a "Latin Small Ligature Ff" for 'ff'.
There is also "f_i" in that file, which I expect to mean 'fi', one glyph showing the two characters 'f' and 'i'. However, the PDF specification already has 'fi' as the named character "fi", so what is the point of having two named characters for the same symbol?
I can imagine that "uni00D0" means the Unicode character 'Ð'. However, PDF already defines it as the named character "Eth".
What could "a204" be? Maybe ANSI 204 'Ì', which also already has a named character, "Igrave"?
Why do they also use "a62", which would be just a '>'?
However, my main question is: where can I find a specification for these additional named characters?
Adobe Acrobat understands them, of course, but Gmail does not seem to have a problem with them either, so I guess their meaning must be specified somewhere.

Base64 Encoded String for Filename

I can't think of an OS (Linux, Windows, Unix) where this would cause an issue, but maybe someone here can tell me whether this approach is undesirable.
I would like to use a base64 encoded string as a filename. Something like gH9JZDP3+UEXeZz3+ng7Lw==. Is this likely to cause issues anywhere?
Edit: I will likely keep this to a max of 24 characters
Edit: It looks like one character will cause issues. The function that generates my string produces strings like: J2db3/pULejEdNiB+wZRow==
Notice that this contains a /, which is going to cause issues.
According to this site, / is a valid base64 character, so I will not be able to use a base64-encoded string as a filename.
No. You cannot safely use a standard base64-encoded string as a filename, because / is a valid base64 character and file systems treat it as a path separator or reserved character.
https://base64.guru/learn/base64-characters
Alternatives:
You could use base64 and then replace the unwanted characters, but a better option would be to hex-encode your original string using a function like PHP's bin2hex().
The official RFC 4648 states:
An alternative alphabet has been suggested that would use "~" as the 63rd character. Since the "~" character has special meaning in some file system environments, the encoding described in this section is recommended instead. The remaining unreserved URI character is ".", but some file system environments do not permit multiple "." in a filename, thus making the "." character unattractive as well.
I also found this on Server Fault:
There is no such thing as a "Unix" filesystem. Nor a "Windows" filesystem come to that. Do you mean NTFS, FAT16, FAT32, ext2, ext3, ext4, etc. Each have their own limitations on valid characters in names.
Also, your question title and question refer to two totally different concepts? Do you want to know about the subset of legal characters, or do you want to know what wildcard characters can be used in both systems?
http://en.wikipedia.org/wiki/Ext3 states "all bytes except NULL and '/'" are allowed in filenames.
http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx describes the generic case for valid filenames "regardless of the filesystem". In particular, the following characters are reserved < > : " / \ | ? *
Windows also places restrictions on not using device names for files: CON, PRN, AUX, NUL, COM1, COM2, COM3, etc.
Most commands in Windows and Unix-based operating systems accept * as a wildcard, and both the Windows command line and Unix shells use ? as the single-character wildcard.
And this other one:
Base64 only contains A–Z, a–z, 0–9, +, / and =. So the list of characters not to be used is: all possible characters minus the ones mentioned above.
For special purposes . and _ are possible, too.
Which means that instead of the standard / base64 character, you should use _ or .; both on UNIX and Windows.
Many programming languages allow you to replace all / with _ or ., as it's only a single character and can be accomplished with a simple loop.
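As a rough sketch of both alternatives in Python (the helper names filename_safe_b64 and hex_name are made up for this example): the standard library's base64.urlsafe_b64encode uses the URL and filename safe alphabet from RFC 4648, which substitutes - for + and _ for /, and bytes.hex() gives the hex form.

```python
import base64


def filename_safe_b64(data: bytes) -> str:
    # RFC 4648 "URL and Filename safe" alphabet: "-" replaces "+", "_" replaces "/".
    # The "=" padding is kept; strip it with .rstrip("=") if it is unwanted.
    return base64.urlsafe_b64encode(data).decode("ascii")


def hex_name(data: bytes) -> str:
    # Hex uses only 0-9 and a-f, at the cost of two characters per byte.
    return data.hex()


if __name__ == "__main__":
    raw = base64.b64decode("J2db3/pULejEdNiB+wZRow==")
    print(filename_safe_b64(raw))
    print(hex_name(raw))
```

Note that the hex form of the same 16 bytes is 32 characters long, so it would not fit a 24-character limit.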
On Windows, you should be fine as long as you conform to the Windows naming conventions:
https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#naming-conventions.
As far as I know, / is the only one of the reserved characters that a base64-encoded string can contain, so it has to be replaced.
The other thing that is probably going to be a problem is the length of the file name.

How many white spaces is the best for ssh config

For Ruby, using 2 spaces is the best.
For Python, using 4 spaces is the best.
But for ssh config file, how many spaces is the best?
I found the originally accepted answer a bit confusing so I thought I'd contribute some additional information.
To the original question, ssh config files allow, but do not require, indentation with whitespace (either tabs or spaces). Blank lines and lines beginning with a hash # are ignored.
The config file consists of stanzas, each beginning with the reserved word Host or Match followed by a list of options until the stanza ends at the next Host, Match or end of file.
The options can be specified as name value or name=value. Looking at the OpenSSH release notes, it appears the developers use the name=value format. Leading whitespace is ignored. Unquoted in-line whitespace is also ignored.
The following stanzas (mixing forms with and without the equals sign and whitespace) are equivalent:
Host test1
    Hostname = 192.168.0.100
Host test1
    Hostname 192.168.0.100
Host=test1
Hostname 192.168.0.100
Note that the equal sign is significant when parsing options. Values with embedded equals signs need to be quoted. This contrived example demonstrates what happens without quotes.
Host test1
    Hostname = 192.168.0.100
    UserKnownHostsFile /tmp/name_with=equals /tmp/name2
This will look for known hosts in /tmp/name_with and in /tmp/name2, but not in /tmp/name_with=equals.
Configuration files (for ssh or other programs) do not need indentation.
They contain lines of the form name=value.
Some programs allow spaces around the equals sign; others are stricter and do not accept them.
ssh accepts spaces around the equals sign and ignores them. Use as many as you like, but don't overdo it: keep the file readable.
A small fragment from the documentation:
The file contains keyword-argument pairs, one per line. Lines starting with # and empty lines are interpreted as comments. Arguments may optionally be enclosed in double quotes (") in order to represent arguments containing spaces. Configuration options may be separated by whitespace or optional whitespace and exactly one =; the latter format is useful to avoid the need to quote whitespace when specifying configuration options using the ssh, scp, and sftp -o option.
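To make the rule in that fragment concrete, here is a small Python sketch of how one line could be split into a keyword and its argument. It is a simplified illustration, not OpenSSH's actual parser: parse_option is a hypothetical name, only one pair of surrounding double quotes is handled, and the special treatment of additional equals signs inside values described above is not modelled.

```python
import re

# A keyword, then either optional whitespace around exactly one "=" or plain
# whitespace, then the rest of the line as the argument.
_OPTION = re.compile(r'^\s*([A-Za-z0-9]+)(?:\s*=\s*|\s+)(.*?)\s*$')


def parse_option(line):
    """Return (keyword, argument), or None for comments, blank or unparsable lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # empty lines and lines starting with "#" are comments
    match = _OPTION.match(line)
    if match is None:
        return None
    keyword, argument = match.group(1), match.group(2)
    # Drop one pair of enclosing double quotes, if present.
    if len(argument) >= 2 and argument[0] == '"' and argument[-1] == '"':
        argument = argument[1:-1]
    # Keywords are case-insensitive, so normalise them to lower case.
    return keyword.lower(), argument


if __name__ == "__main__":
    for text in ["Host test1", "    Hostname = 192.168.0.100",
                 "Hostname 192.168.0.100", "Host=test1", "# a comment"]:
        print(repr(text), "->", parse_option(text))
```

With this sketch, the indented and unindented spellings, with or without the equals sign, all produce the same keyword and argument, which is why the amount of leading whitespace is purely a matter of readability.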

Approximate search with openldap

I am trying to write a search that queries our directory server running openldap.
The users are going to be searching using the first or last name of the person they're interested in.
I found a problem with accented characters (like áéíóú), because first and last names are written in Spanish, so while the proper way is Pérez it can be written for the sake of the search as Perez, without the accent.
If I use '(cn=*Perez*)' I get only the non-accented results.
If I use '(cn=*Pérez*)' I get only accented results.
If I use '(cn=~Perez)' I get weird results, or at least nothing I can use: while the results contain both Perez and Pérez occurrences, I also get some results that apparently have nothing to do with the query.
In Spanish this happens quite a lot. Be it laziness or whatever you want to call it, the fact is that for this kind of thing people tend NOT to write the accents, because it is assumed that all these searches work with both options (I guess since Google allows it, everybody assumes it is supposed to work that way).
Other than updating the database and removing all accents and trimming them on the query... can you think of another solution?
You have your ~ and = swapped above. It should be (cn~=Perez). I still don't know how well that will work. Soundex has always been strange. Since many attributes are multi-valued including cn you could store a second value on the attribute that has the extended characters converted to their base versions. You would at least have the original value to still go off of when you needed it. You could also get real fancy and prefix the converted value with something and use the valuesReturnFilter to filter it out from your results.
#Sample object
dn:cn=Pérez,ou=x,dc=y
cn:Pérez
cn:{stripped}Perez
sn:Pérez
#etc.
Then modify your query to use an or expression.
(|(cn=Pérez)(cn={stripped}Perez))
And you would include a valuesReturnFilter that looked like
(!(cn={stripped}*))
See RFC3876 http://www.networksorcery.com/enp/rfc/rfc3876.txt for details. The method for adding a request control varies by what platform/library you are using to access the directory.
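Producing the stripped value and the OR filter could look roughly like this in Python. This is a sketch under a few assumptions: strip_accents and build_filter are hypothetical helper names, the {stripped} prefix is simply the convention from the sample object above, and the search term is assumed to contain no characters that need escaping in a filter.

```python
import unicodedata


def strip_accents(text):
    # Decompose each character (for example "é" becomes "e" plus a combining
    # acute accent), then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_filter(term):
    # Match either the value as typed or the stored accent-free copy.
    # NOTE: the term is not escaped here; real input needs filter escaping.
    return "(|(cn={})(cn={{stripped}}{}))".format(term, strip_accents(term))


if __name__ == "__main__":
    print(strip_accents("P\u00e9rez"))
    print(build_filter("P\u00e9rez"))
    print(build_filter("Perez"))
```

The same strip_accents helper could be used when writing entries, to generate the second cn value, so that stored and searched forms are derived the same way. Be aware that this also turns ñ into n.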
Search filters ("queries") were specified by RFC 2254, which has since been replaced by RFC 4515.
Encoding: RFC 2254 (indirectly) requires the parts of a filter to be OCTET STRINGs, i.e. strings of 8-bit bytes: AttributeValue is an OCTET STRING, and MatchingRuleId and AttributeDescription are LDAPStrings, where LDAPString is itself an OCTET STRING.
The standard on escaping: use "\<two hex digits>" to replace special characters
(https://www.rfc-editor.org/rfc/rfc4515#page-4, examples https://www.rfc-editor.org/rfc/rfc4515#page-5).
Quote:
The <valueencoding> rule ensures that the entire filter string is a
valid UTF-8 string and provides that the octets that represent the
ASCII characters "*" (ASCII 0x2a), "(" (ASCII 0x28), ")" (ASCII
0x29), "\" (ASCII 0x5c), and NUL (ASCII 0x00) are
represented as a backslash "\" (ASCII 0x5c) followed by the two hexadecimal digits
representing the value of the encoded octet.
Additionally, you should probably escape all characters that semantically modify the filter (RFC 4515's grammar gives a list) and, to be safe, use a regex to replace non-ASCII characters with wildcards (*). This will also help you with characters like "é".
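The escaping rule quoted above is small enough to sketch directly. The function name escape_filter_value is hypothetical, and many LDAP libraries ship an equivalent helper that is preferable in real code; this version only covers the five characters named in the quote.

```python
# The five characters RFC 4515 requires to be written as a backslash
# followed by two hex digits inside a filter value.
_ESCAPES = {
    "*": "\\2a",
    "(": "\\28",
    ")": "\\29",
    "\\": "\\5c",
    "\x00": "\\00",
}


def escape_filter_value(value):
    # Replace each special character; everything else passes through unchanged.
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


if __name__ == "__main__":
    user_input = "Perez (Jr.)*"
    print("(cn=*{}*)".format(escape_filter_value(user_input)))
```

Escaping has to happen before the surrounding wildcards are added, otherwise the intended * characters would be escaped as well.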

Can NMEA values contain '*' (asterisks)?

I am trying to create NMEA-compatible proprietary sentences, which may contain arbitrary strings.
The usual format for an NMEA sentence with checksum is:
$GPxxx,val1,val2,...,valn*ck<cr><lf>
where * marks the start of a 2-digit checksum.
My question is: Can any of the value fields contain a * character themselves?
It would seem possible for a parser to wait for the final <cr><lf>, then to look back at the previous 3 characters to find the checksum if present (rather than just waiting for the first * in the sentence). However I don't know if the standard allows it.
Are there other characters which may cause problems?
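The look-back idea can be sketched in Python as follows. Two things here are not stated in this post: the function names are made up, and the checksum is taken to be the usual NMEA 0183 one, the XOR of all characters between $ and the checksum's *, written as two hex digits.

```python
def nmea_checksum(body):
    # XOR of every character between "$" and the "*" that precedes the checksum.
    checksum = 0
    for byte in body.encode("ascii"):
        checksum ^= byte
    return checksum


def split_sentence(line):
    """Return (body, checksum_ok); checksum_ok is None when no checksum is present.

    Only the third character from the end is treated as the checksum marker,
    so a "*" earlier in the body is left alone.
    """
    line = line.rstrip("\r\n")
    if not line.startswith("$"):
        raise ValueError("sentence must start with '$'")
    payload = line[1:]
    if len(payload) >= 3 and payload[-3] == "*":
        try:
            expected = int(payload[-2:], 16)
        except ValueError:
            return payload, None  # "*" not followed by two hex digits
        body = payload[:-3]
        return body, nmea_checksum(body) == expected
    return payload, None


if __name__ == "__main__":
    body = "PSQU,some*text,42"
    sentence = "${}*{:02X}\r\n".format(body, nmea_checksum(body))
    print(sentence.strip(), split_sentence(sentence))
```

This only shows that such a parser is possible. A field that happens to end in * followed by two hex digits is still ambiguous when no checksum is sent.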
The two ASCII characters to be careful with are $, which has to be at the start, and * which precedes the checksum. Anyone else parsing your custom NMEA wouldn't expect to find either of those characters anywhere else. Some parsers, when they hit a $ assume that a new line has started. With serial port communication sometimes characters get lost in transit, and that's why there's a $ start of sentence marker.
If you're going to make your own NMEA commands it is customary to start them with P followed by a 3 character code indicating the manufacturer or company creating the proprietary message, so you could use $PSQU. Note that although it is recommended that NMEA commands are 5 characters long, there are proprietary messages out there by various hardware and software manufacturers that are anywhere from 4 characters to 7 characters long.
Obviously if you're writing your own parser you can do what you like.
This website is rather useful:
http://www.gpsinformation.org/dale/nmea.htm
If you're extending the protocol yourself (based on "proprietary") - then sure, you can put in anything you like. I would stick to ASCII, but go wild within those bounds. (Obviously, you need to come up with your own $GPxxx so as not to clash with existing messages. Perhaps a new header $SQUEL, ...)
By definition, a proprietary message will not be NMEA-compatible.
A standard parser listening to an NMEA stream should ignore anything that doesn't match what it considers 'good' data. That means a message with a checksum error, or any badly corrupted message, which is what your new message would look like to it with some random *s thrown in.
If you are merely writing an existing message, then a * doesn't make sense, and should be ignored, but you run the risk of major issues if the checksum is correct, and the parser doesn't understand the payload.