vCard 4.0 doesn't display correctly, with labels ignored and wrong character decoding, among other things

I have created a vCard 4.0 file with a text editor according to IETF RFC 6350. It is simple, and looks something like this:
BEGIN:VCARD
VERSION:4.0
KIND:individual
FN:René Descartes
N:Descartes;René;;;
TITLE:Façade Engineer
ADR;
GEO="geo:46.975308,0.699597";
LABEL="Headquarters":
;;29 Rue Descartes;;Descartes;37160;France
TEL;VALUE=uri;TYPE=home:tel:+33247597919
END:VCARD
The file is saved as somename.vcf (with CRLF line endings and in UTF-8) and inspected on my iOS/macOS devices. However, the displayed card has many issues:
Non-ASCII characters are not decoded correctly.
The labels are all wrong.
The URI scheme is prepended to the phone number.
It is as if vCard 4.0 is not supported at all. Or did I make a mistake?
The screenshot is attached below.

Like you suggested, it looks to me like the client does not support vCard version 4. For example, URI-formatted telephone numbers are only supported by version 4, which might explain why the phone number is not rendered properly. Try using a version 3 vCard.
Your ADR property is formatted strangely. I would try putting it all on one line to see if that makes any difference. If your intent is to use line folding, each continuation line must be prefixed with a single space according to the RFC; you are using two spaces.
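For reference, a rough version 3.0 equivalent might look like this (a sketch only: in 3.0, LABEL is a standalone property rather than an ADR parameter, GEO is its own property with a semicolon-separated value, KIND does not exist, and the ADR value sits unfolded on one line):
BEGIN:VCARD
VERSION:3.0
FN:René Descartes
N:Descartes;René;;;
TITLE:Façade Engineer
ADR;TYPE=work:;;29 Rue Descartes;;Descartes;37160;France
LABEL;TYPE=work:Headquarters
GEO:46.975308;0.699597
TEL;TYPE=home:+33 2 47 59 79 19
END:VCARD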

Related

How to request a page title in a foreign language using wikipedia API

I am trying to use a simple GET request:
"https://en.wikipedia.org/w/api.php?action=query&titles=音乐&prop=langlinks&lllimit=500"
but with Chinese characters in the title, Wikipedia can't find the page, even though it exists: https://zh.wikipedia.org/wiki/%E9%9F%B3%E4%B9%90
I have tried URL-encoding the Chinese word, in both simplified and traditional form. I also tried giving it Unicode escapes in ASCII like "\u97f3\u4e50".
Does anyone know how to do this?
I solved it.
There are a few things to remember when doing this:
You need to use the Wikipedia of your target language, so in this case zh.wikipedia.org.
Chinese Wikipedia externally displays the character set of the reader's region (simplified for the mainland, traditional for Taiwan), but internally the stored title depends on who wrote the article. The title in your API query must be in the original character set of the person who created it. So for Music, 音樂 will not work and you must use the simplified 音乐; but for Notebook computer, the simplified 笔记本 will not work and you must use 筆記本. You have no choice but to try both. .NET includes a set of methods for converting between the two character sets.
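A minimal sketch of that strategy in Java (a hypothetical helper, not from the original answer): percent-encode the title, query the target-language wiki, and fall back to the other variant when the API reports the page as missing.
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WikiTitleLookup {
    static String fetch(String title) throws IOException {
        String url = "https://zh.wikipedia.org/w/api.php?action=query"
                + "&titles=" + URLEncoder.encode(title, StandardCharsets.UTF_8.name())
                + "&prop=langlinks&lllimit=500&format=json";
        try (InputStream in = new URL(url).openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String json = fetch("音乐"); // simplified first
        // Crude check: a missing page carries a "missing" key in the response;
        // a real JSON parser would be more robust here.
        if (json.contains("\"missing\"")) {
            json = fetch("音樂"); // fall back to traditional
        }
        System.out.println(json);
    }
}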

How do I make iText 7 diacritic mark stacking work correctly?

I have run into a problem with iText 7 where diacritic marks are painted on top of one another instead of stacking properly when multiple marks are used on a single character. Is there a setting that makes them appear correctly, or is this a bug in iText 7? Any help greatly appreciated. This can be observed if you create a text object in your PDF like below. Obviously, replace the relevant bit with an actual font object, rather than what I have in there.
new Text("ḗ and ṓ are characters that display incorrectly").setFont(<UNICODE COMPATIBLE FONT LIKE CHARIS>);
While Bruno and Benoit correctly pointed out that for advanced typography such as stacking diacritical marks you need the pdfCalligraph module, there is a workaround you can try at your own risk. If your combinations of base glyph and diacritics are real, meaning they occur in real texts in some language or some other known context, then such combinations are most probably present in Unicode and have their own code points. For instance, in the text you provided, they are the U+1E17 and U+1E53 Unicode characters. Some fonts contain such precomposed glyphs, so there is a second option besides showing a base glyph and stacking diacritics on it: showing the combined glyphs directly. For example, ArialUni, shipped with Windows, does contain the above-mentioned glyphs.
To try this approach, you would need the following code to compose known Unicode base glyph + diacritics combinations into single glyphs:
import java.text.Normalizer;

String originalStr = "ḗ and ṓ are characters that display incorrectly";
// NFC normalization composes base characters and combining marks into
// their precomposed (single code point) forms where those exist
String normalizedStr = Normalizer.normalize(originalStr, Normalizer.Form.NFC);
new Text(normalizedStr); // use this normalized Text instance
The result that I got with ArialUni showed the combined glyphs rendered correctly.
But again, as I mentioned, do this at your own risk, because it will only work if the necessary combinations are present both in Unicode and in the font. For fully correct rendering you should still use pdfCalligraph.
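For completeness, here is an end-to-end sketch of the workaround (the font path is an assumption; any font containing the precomposed glyphs should work):
import com.itextpdf.io.font.PdfEncodings;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Text;
import java.text.Normalizer;

public class CombinedGlyphs {
    public static void main(String[] args) throws Exception {
        // Compose the combining marks into single precomposed code points
        String s = Normalizer.normalize(
                "ḗ and ṓ are characters that display incorrectly", Normalizer.Form.NFC);
        // Font path is a placeholder; use any font containing U+1E17 and U+1E53
        PdfFont font = PdfFontFactory.createFont(
                "C:/Windows/Fonts/ARIALUNI.TTF", PdfEncodings.IDENTITY_H);
        Document doc = new Document(new PdfDocument(new PdfWriter("combined.pdf")));
        doc.add(new Paragraph(new Text(s).setFont(font)));
        doc.close(); // also closes the underlying PdfDocument
    }
}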

Special characters in PDF form fields and global and field-based DR

I have a question regarding some weird form field behaviour:
Two PDF documents, both with text field(s) using Helvetica as the font.
Both are filled with values using the same iText logic (cf. below).
The field value (/V) is correct for both PDFs; however, the field appearance is not.
One PDF works fine; the other scrambles special characters like the euro symbol € or German characters like üöäß.
I tried to define a substitute font (as described in the book), but never got € and ß to work.
The only difference I could find is that a /DR dictionary is defined at field level for the non-working PDF (in addition to the global one). But if I remove it, the € sign still doesn't work. Please note that I am not talking about Asian or other exotic Unicode characters here; all of them are part of the standard Helvetica font (as the other PDF proves).
Question(s):
Any ideas how to get the non-working PDF to display the characters correctly?
Or does the PDF violate the PDF spec somehow? (It was created using Acrobat, which makes that unlikely, but not impossible.)
If you suggest replacing the form field font: how can I differentiate between working and non-working PDF files, since I don't want to do that for perfectly valid and working files?
Update: The code is not the problem (I am certain of that, since it's the same code for both); however, for the sake of completeness, here it is:
AcroFields acroFields = stamper.getAcroFields();
try {
    // setField returns false if no field with the given name exists
    boolean successful = acroFields.setField("Mitarbeiter", "öäü߀#");
    if (!successful) {
        // throw some exception
    }
} catch (DocumentException de) {
    // some exception handling
}
I didn't find any clues in the PDF reference about this, but the font that is used for the field doesn't define an encoding. However, an encoding is defined at the level of the resource dictionary (/DR). If you use that encoding, the appearance of the field is created correctly. Note that the ISO specification doesn't say anything about the existence of an /Encoding entry at the level of the resource dictionary.
I've made a small update to iText. You can check the changes in revision 6693. This way, iText will now check if the /DR dictionary has encoding values in case no encoding is defined at the level of the font. With this fix, your form is filled out correctly.
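If you need to detect such files up front (your third question), here is a hypothetical diagnostic sketch using iText 5's low-level API; the file name is a placeholder, and null checks are omitted for brevity:
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;

public class InspectDrEncodings {
    public static void main(String[] args) throws Exception {
        PdfReader reader = new PdfReader("form.pdf");
        // Walk the AcroForm's resource dictionary (/DR) and report,
        // for each font, whether an /Encoding entry is present
        PdfDictionary acroForm = reader.getCatalog().getAsDict(PdfName.ACROFORM);
        PdfDictionary fonts = acroForm.getAsDict(PdfName.DR).getAsDict(PdfName.FONT);
        for (PdfName name : fonts.getKeys()) {
            PdfObject encoding = fonts.getAsDict(name).get(PdfName.ENCODING);
            System.out.println(name + " -> /Encoding = " + encoding); // null = none
        }
        reader.close();
    }
}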

SQL Strip the Font Format (Colour or other)

I have a problem stripping out the formatting in a note table.
Here is an example:
";\red31\green73\blue125;
\viewkind4\uc1\ltrpar\f0\fs20 USEFUL TEXT BODY \cf1\f3
\ltrpar\f0\fs17
"
How do I get rid of this stuff? I want to play it safe and not blindly replace everything after '\'.
Many thanks,
Rick
You're making it quite difficult for yourself by not replacing '\'.
If you look at http://other9.tripod.com/Refs/easy-rtf.html you will see that there are different RTF codes and there is no default size for the codes.
Additionally, it is not like HTML, where a matching "closing" tag is required, which makes it additionally difficult.
The only thing I can think of is to record all possible RTF codes (or use an RTF parser library) so you can recognize whether a \ does or does not start an RTF code.
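If you can move the stripping out of SQL and into application code, one parser-library route is Swing's built-in RTF reader. A rough Java sketch, assuming the stored values are complete RTF documents (the fragment you posted would first need its {\rtf1 ... } wrapper):
import java.io.StringReader;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfToPlainText {
    // Parses the RTF control words and returns only the text body
    public static String strip(String rtf) throws Exception {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(new StringReader(rtf), doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(strip("{\\rtf1\\ansi USEFUL TEXT BODY}"));
    }
}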

vb.net character set

According to MSDN, VB.NET uses this extended character set. In my experience, it actually uses this:
What am I missing? Why does it say it uses the one and uses the other?
Am I doing something wrong?
Is there some sort of conversion tool to the original character set?
This behaviour is defined in the documentation of the Chr command:
The returned value depends on the code page for the current thread, which is contained in the ANSICodePage property of the TextInfo class in the System.Globalization namespace. You can obtain ANSICodePage by specifying System.Globalization.CultureInfo.CurrentCulture.TextInfo.ANSICodePage.
So, the output of Chr for values greater than 127 is system-dependent. If you want reproducible results, create the desired instance of Encoding by calling Encoding.GetEncoding(String), then use Encoding.GetChars(Byte()) to convert your numeric values into characters.
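For example, a minimal sketch of that approach (the choice of CP437, the chart from your question, is just an assumption):
Imports System.Text

Module ChrDemo
    Sub Main()
        ' Decode bytes through an explicit code page instead of relying on Chr,
        ' whose output above 127 depends on the current thread's ANSI code page.
        Dim cp437 As Encoding = Encoding.GetEncoding("IBM437")
        Dim chars() As Char = cp437.GetChars(New Byte() {176}) ' 176 = ░ in CP437
        Console.WriteLine(chars(0))
    End Sub
End Module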
If you go up one level in the chart linked in your question, you will see that they do not claim that this chart is always the output of the Chr command:
The characters that appear in Windows above 127 depend on the selected typeface.
The charts in this section show the default character set for a console application.
Your application is a WinForms application, not a console application. Even in the console, the character set used can be changed (for example, with the chcp command), hence the word "default".
For detailed information about the encodings used in .NET, I recommend the following MSDN article: Character Encoding in the .NET Framework.
The first character set is Code Page 437 (CP437); the second looks like Code Page 1252 (CP1252), also known as Windows Latin-1.
I'd guess VB.Net is simply picking up the default encoding for the PC.
How did you write all this? Because usually, when you use an output stream function, you can specify the encoding that goes with it.
Edit: I know this is not C#, but you can see the idea...
You'd have to set the encoding of your file stream by doing something like this:
Setting the encoding when creating the filestream
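In VB.NET, that idea looks roughly like this (the file name and code page are assumptions):
Imports System.IO
Imports System.Text

Module WriteWithEncoding
    Sub Main()
        ' Pass an explicit Encoding so the bytes on disk are predictable,
        ' regardless of the machine's default code page
        Using writer As New StreamWriter("out.txt", False, Encoding.GetEncoding(1252))
            writer.WriteLine("é ü ß €")
        End Using
    End Sub
End Module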