What is the meaning of 'sanitize-utf8' in syslog-ng configuration flags - syslog-ng

The documentation of syslog-ng says that this flag converts non-utf8 input to a valid 'escaped form'.
Does 'escaped form' mean that the rest of the part that is not utf8 is removed or does it mean that the part that is not utf8 is converted to utf8?
And if you add this flag to syslog-ng.conf, syslog-ng will not run.
The error code is as follows.
Why does the execution of syslog-ng fail?

In case the received message contains invalid UTF-8 sequences, the sanitize-utf8 flag replaces them with a binary-escaped format, for example, \xAB.
The sanitize-utf8 flag is available starting from syslog-ng v3.7.1.

Related

invalid byte sequence for encoding “UTF8”

I am trying to load a 3GB (24 Million rows) csv file to greenplum database using gpload functionality but I keep getting the below error
Error -
invalid byte sequence for encoding "UTF8": 0x8d
I have tried solution provided by Mike but for me, my client_encoding and file encoding are already the same. Both are UNICODE.
Database -
show client_encoding;
"UNICODE"
File -
file my_file_name.csv
my_file_name.csv: UTF-8 Unicode (with BOM) text
I have browsed through Greenplum's documentation as well, which says the encoding of external file and database should match. It is matching in my case yet somehow it is failing.
I have uploaded similar smaller files as well (same UTF-8 Unicode (with BOM) text)
Any help is appreciated !
Posted in another thread - use the iconv command to strip these characters out of your file. Greenplum is instantiated using a character set, UTF-8 by default, and requires that all characters be of the designated character set. You can also choose to log these errors with the LOG ERRORS clause of the EXTERNAL TABLE. This will trap the bad data and allow you to continue up to set LIMIT that you specify during create.
iconv -f utf-8 -t utf-8 -c file.txt
will clean up your UTF-8 file, skipping all the invalid characters.
-f is the source format
-t the target format
-c skips any invalid sequence

Maximum line length supported by logstash?

What is the maximum character length does logstash can read as a single event from a file (single line input, NOT multiline input)? Also does logstash take specific number of spaces/tabs in between a line as newline?
Looks like the file input uses filewatch, which has a 32K limit.
The previous answer is referring to an old version of filewatch where the chunk size was hard-coded to 32K
Recent file-input still uses this as the default chunk size but allows configuration if the line is longer than 32K.
The plugin also prints a warning when a delimiter is not found reading a chunk.

Openfire: Offline UTF-8 encoded messages are saved wrong

We use Openfire 3.9.3. Its MySql database uses utf8_persian_ci collation and in openfire.xml we have:
...<defaultProvider>
<driver>com.mysql.jdbc.Driver</driver>
<serverURL>jdbc:mysql://localhost:3306/openfire?useUnicode=true&amp;characterEncoding=UTF-8</serverURL>
<mysql>
<useUnicode>true</useUnicode>
</mysql> ....
The problem is that offline messages which contain Persian characters (UTF-8 encoded) are saved as strings of question marks. For example سلام (means hello in Persian) is stored and showed like ????.
MySQL does not have proper Unicode support, which makes supporting data in non-Western languages difficult. However, the MySQL JDBC driver has a workaround which can be enabled by adding
?useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8
to the URL of the JDBC driver. You can edit the conf/openfire.xml file to add this value.
Note: If the mechanism you use to configure a JDBC URL is XML-based, you will need to use the XML character literal & to separate configuration parameters, as the ampersand is a reserved character for XML.
Also be sure that your DB and tables have utf8 encoding.

Why does Redis not work with requirepass directive?

I want to set a password to connect to a Redis server.
The appropriate way to do that is using the requirepass directive in the configuration file.
http://redis.io/commands/auth
However, after setting the value, I get this upon restarting Redis:
Stopping redis-server: redis-server.
Starting redis-server: Segmentation fault (core dumped)
failed
Why is that?
The password length is limited to 512 characters.
In redis.h:
#define REDIS_AUTHPASS_MAX_LEN 512
In config.c:
} else if (!strcasecmp(argv[0],"requirepass") && argc == 2) {
if (strlen(argv[1]) > REDIS_AUTHPASS_MAX_LEN) {
err = "Password is longer than REDIS_AUTHPASS_MAX_LEN";
goto loaderr;
}
server.requirepass = zstrdup(argv[1]);
}
Now, the parsing mechanism of the configuration file is quite basic. All the lines are split using the sdssplitargs function of the sds (string management) library. This function interprets specific sequence of characters such as:
single and double quotes
\x hex digits
special characters such as \n, \r, \t, \b, \a
Here the problem is your password contains a single double quote character. The parsing fails because there is no matching double quote at the end of the string. In that case, the sdssplitargs function returns a NULL pointer. The core dump occurs because this pointer is not properly checked in the config.c code:
/* Split into arguments */
argv = sdssplitargs(lines[i],&argc);
sdstolower(argv[0]);
This is a bug that should be filed IMO.
A simple workaround would be to replace the double quote character or any other interpreted characters by an hexadecimal sequence (ie. \x22 for the double quote).
Although not documented, it seems there are limitations to the password value, particularly with the characters included, not the length.
I tried with 160 characters (just digits) and it works fine.
This
9hhNiP8MSHZjQjJAWE6PmvSpgVbifQKCNXckkA4XMCPKW6j9YA9kcKiFT6mE
too. But this
#hEpj6kNkAeYC3}#:M(:$Y,GYFxNebdH<]8dC~NLf)dv!84Z=Tua>>"A(=A<
does not.
So, Redis does not support some or all of the "special characters".
Just nailed this one with:
php: urlencode('crazy&char's^pa$$wor|]');
-or-
js: encodeURIComponent('crazy&char's^pa$$wor|]');
Then it can be used anywhere sent to the redis server via (usually) tcp

Can NMEA values contain '*' (asterisks)?

I am trying to create NMEA-compatible proprietary sentences, which may contain arbitrary strings.
The usual format for an NMEA sentence with checksum is:
$GPxxx,val1,val2,...,valn*ck<cr><lf>
where * marks the start of a 2-digit checksum.
My question is: Can any of the value fields contain a * character themselves?
It would seem possible for a parser to wait for the final <cr><lf>, then to look back at the previous 3 characters to find the checksum if present (rather than just waiting for the first * in the sentence). However I don't know if the standard allows it.
Are there other characters which may cause problems?
The two ASCII characters to be careful with are $, which has to be at the start, and * which precedes the checksum. Anyone else parsing your custom NMEA wouldn't expect to find either of those characters anywhere else. Some parsers, when they hit a $ assume that a new line has started. With serial port communication sometimes characters get lost in transit, and that's why there's a $ start of sentence marker.
If you're going to make your own NMEA commands it is customary to start them with P followed by a 3 character code indicating the manufacturer or company creating the proprietary message, so you could use $PSQU. Note that although it is recommended that NMEA commands are 5 characters long, there are proprietary messages out there by various hardware and software manufacturers that are anywhere from 4 characters to 7 characters long.
Obviously if you're writing your own parser you can do what you like.
This website is rather useful:
http://www.gpsinformation.org/dale/nmea.htm
If you're extending the protocol yourself (based on "proprietary") - then sure, you can put in anything you like. I would stick to ASCII, but go wild within those bounds. (Obviously, you need to come up with your own $GPxxx so as not to clash with existing messages. Perhaps a new header $SQUEL, ...)
By definition, a proprietary message will not be NMEA-compatible.
A standard parser listening to an NMEA stream should ignore anything that doesn't match what it thinks is 'good' data. That means a checksum error, or any massively corrupted message like it would think your new message is with some random *s thrown in.
If you are merely writing an existing message, then a * doesn't make sense, and should be ignored, but you run the risk of major issues if the checksum is correct, and the parser doesn't understand the payload.