How to Bold Text - Non-HTML Way? - formatting

Case Scenario
I'm storing some textual data in a database that cannot contain HTML tags. For line breaks, I'm using \r\n, and the output is fine when I view it in a tool known as Service Cloud (FKA RightNow).
Similarly, is there a way in which I can bold a part of text using a similar technique as \r\n?

Where are you storing this text? If this is a field that supports rich text, like standard text, then you can bold the text through rich text encoding. However, you're likely working with a generic text (ASCII characters) field, which does not have formatting. If that's the case, then no, you cannot format it.

Related

Missing presentation forms (glyphs) of some Arabic characters in Unicode

I am working on code that generates PDFs containing Arabic text. For each character, I choose the correct glyph from the presentation forms to display the text correctly. This works fine, but Unicode doesn't contain presentation forms for all Arabic characters.
For example, \u067D ARABIC LETTER TEH WITH THREE DOTS ABOVE DOWNWARDS ٽ. There is no presentation form of this character even though the character has a medial form, as can be seen in this string: لٽط
What is the reason that presentation forms of this and other characters are missing?
Is the character not used in practice?
Can the simple ARABIC LETTER TEH, which contains only one dot above and has presentation forms, be used instead?
Or is it necessary to somehow build this character (e.g. by using \uFBB6 THREE DOTS ABOVE character)?
The Arabic presentation forms should never be used for writing text. They exist only because they were needed for compatibility with older standards long ago. As such, there aren’t presentation forms for all Arabic letters in Unicode, only those necessary for this specific purpose. Many letters were also added long after the presentation forms ceased being relevant altogether. See the FAQ on Arabic for more information.
Arabic text should always be entered and stored using the regular letters (from the blocks Arabic, Arabic Supplement, and Arabic Extended-A). These letters will then automatically assume the correct shape depending on where they are situated in the word (initial, medial, or final) as can be seen in the example string you provided.
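As a rough illustration of the relationship between presentation forms and regular letters, the sketch below uses Python's standard unicodedata module. The helper name to_regular_letters is made up for this example. It relies on the compatibility decompositions that Unicode defines for the presentation forms, so NFKC normalization folds them back to the regular letter. Note that NFKC also rewrites other compatibility characters, so treat this as a demonstration rather than a general-purpose cleanup step.

```python
import unicodedata

def to_regular_letters(text: str) -> str:
    """Fold Arabic presentation forms back to regular letters (hypothetical helper).

    NFKC applies the compatibility decompositions defined by Unicode, which map
    each presentation form to the letter it is a shape of.
    """
    return unicodedata.normalize("NFKC", text)

# U+FE97 and U+FE98 are the initial and medial presentation forms of TEH (U+062A).
for form in ("\uFE97", "\uFE98"):
    base = to_regular_letters(form)
    print(f"U+{ord(form):04X} -> U+{ord(base):04X} {unicodedata.name(base)}")

# U+067D has no presentation forms; it is stored as the regular letter itself
# and the text renderer picks the contextual shape.
print(unicodedata.name("\u067D"))
```

Text stored this way leaves the choice of initial, medial, or final shape to the font and the shaping engine.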
Using the character U+FBB6 ﮶ ARABIC SYMBOL THREE DOTS ABOVE would not be appropriate in this context because it is not a combining mark. It isn’t used to build new characters, but to talk about the symbol itself in isolation. From the code chart for Arabic Presentation Forms-A:
"These are spacing symbols representing Arabic letter diacritics considered in isolation, as for example as in discussions about the Arabic script."
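The difference between a spacing symbol and a combining mark can be checked from the character properties. The following is a minimal sketch using Python's unicodedata module. The helper name is_combining_mark is an assumption for this example, and ARABIC FATHA (U+064E) is included only as a contrasting character that really is a combining mark, not as a way to build the letter.

```python
import unicodedata

def is_combining_mark(ch: str) -> bool:
    """Return True if the character's general category is a mark (Mn, Mc, or Me)."""
    return unicodedata.category(ch).startswith("M")

for ch in ("\uFBB6", "\u064E"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "category:", unicodedata.category(ch),
        "combining class:", unicodedata.combining(ch),
    )
```

A character that is not a mark occupies its own space in the line and does not attach to the preceding letter.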
If the software you are using does not handle Arabic letter joining correctly, then there simply is no Unicode-defined way to enter the medial form of ٽ in your document. You will either have to switch to another framework entirely, or (as a last resort) encode the contextual forms you need as private-use characters in a new font, but I strongly recommend against that solution.

How to export text document containing astral Unicode characters to PDF

I regularly create documents that need Unicode characters above U+FFFF. Unfortunately, OpenOffice and LibreOffice are both unable to correctly export these characters when creating a PDF. The actual data gets mangled by a completely asinine algorithm, while the display just consists of various overlapping question mark boxes.
This is not a font issue. I embed all used fonts in the PDF and all characters below U+FFFF work perfectly fine.
Until now I have been working around this issue by mapping the glyphs I need to a custom PUA font. This solves the display problems, but obviously makes the actual content of the text unsearchable and quite fragile. I haven’t been able to find any settings that might affect the handling of Unicode characters in PDF.
Therefore I have three questions:
Is there a way to make OpenOffice/LibreOffice handle astral characters correctly on PDF export?
If not, is there an external tool that can convert .odt files to PDF while preserving astral characters?
If not, is there another good rich-text editor using a different file format that can deal with astral characters in PDFs?
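For readers unfamiliar with the term, "astral" characters are code points above U+FFFF, which need two 16-bit code units (a surrogate pair) in UTF-16. The sketch below only illustrates that encoding property in Python. The helper name utf16_code_units is made up for this example, and nothing here is meant to describe what OpenOffice or LibreOffice actually does during export.

```python
def utf16_code_units(ch: str) -> list[int]:
    """Return the UTF-16 code units for a string (hypothetical helper)."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# A BMP character fits in one code unit; an astral character needs a surrogate pair.
for ch in ("A", "\U00010400"):
    units = " ".join(f"{u:04X}" for u in utf16_code_units(ch))
    print(f"U+{ord(ch):04X} -> {units}")
```

Software that treats each 16-bit unit as a separate character will split such a pair, which is one common way astral characters end up damaged.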

Check if RichTextBox has formatting (vb.net)

Is there a way to detect if there is formatting or not (i.e. Rich) anywhere in a RichTextBox?
I'd like to write RichTextBox.Rtf if there is, and RichTextBox.Text if there isn't.
I came across a post that suggested checking the SelectionFont after selecting all text, but I have no idea how to accomplish that, or if it would work.
Thanks!
You can check the rtf property of the richtextbox (richtextbox1.rtf) to see the rich text code. Even with no additional formatting, there will be some rich text markup. For example, here's what you get with a richtextbox containing only "test":
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\f0\fs16 test\par
}
You can check to see if there are any additional formatting tags, and if not, it should be plain text. You can also access the plain text (without formatting) with richtextbox1.text.
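The check described above can be sketched as a scan of the RTF string for control words that switch on character formatting. The example is written in Python rather than VB.NET purely to show the logic. The function name has_character_formatting and the list of control words (bold, italic, underline, strikethrough) are assumptions for this sketch; it does not look at fonts, sizes, colours, or paragraph formatting, so a real check would need a longer list.

```python
import re

# Control words that switch on bold, italic, underline, or strikethrough.
# A following letter means a different control word (e.g. \bullet, \ulnone),
# and a trailing 0 (e.g. \b0) switches the attribute off, so both are excluded.
_FORMATTING = re.compile(r"\\(?:b|i|ul|strike)(?![a-zA-Z])(?!0)")

def has_character_formatting(rtf: str) -> bool:
    """Heuristic: True if the RTF turns on any of the listed character attributes."""
    # Drop escaped backslashes (\\) so literal text such as "a\b" is not
    # mistaken for a control word.
    stripped = rtf.replace("\\\\", "")
    return _FORMATTING.search(stripped) is not None

PLAIN = (r"{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}"
         r"\viewkind4\uc1\pard\f0\fs16 test\par}")
BOLD = (r"{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}"
        r"\viewkind4\uc1\pard\f0\fs16 te\b st\b0\par}")

print(has_character_formatting(PLAIN))
print(has_character_formatting(BOLD))
```

The same idea carries over to VB.NET by running an equivalent pattern against the Rtf property and saving Rtf or Text depending on the result.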

How Does a PDF Store Text

I am attempting to gain a better understanding of how a PDF stores text. Generally speaking, when a PDF is created from an application like MS Word (or in my case SQL Server Reporting Services), how is text stored by the PDF? I would hope that the resulting document isn't OCR'ed in this particular scenario the way it would be if the original PDF document had been created from an image.
To get a bit more detailed, I am trying to understand how text extractors for PDFs work. My initial understanding of PDF was that it stored (PostScript) instructions on how to draw the "image" of the document to a page or a printer, and that there was no actual text contained within the document itself. Subsequently, I was thinking that a text extractor might reverse-engineer such instructions to generate the text that the PDF would otherwise generate. I am not confident of this, though.
PDF contains several different types of objects, not only vector or raster drawing instructions. Text in particular is represented by text elements. These include a string of characters that should be drawn at certain positions using a specific font.
Text extraction from PDFs can be a complicated affair because the file format is oriented for page layout. A text element may be an entire paragraph, or a single character. Even a single word may consist of several text elements if different typefaces are mixed. Also, the characters are not necessarily encoded in a standard encoding such as Unicode. They may be encoded in a way specific to a particular font.
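To make this concrete, here is a minimal sketch of what a text extractor does with the simplest possible case. It assumes an already uncompressed page content stream in which text is drawn with the Tj operator and literal (parenthesised) strings; the sample stream and the function name extract_tj_strings are invented for this example. Real files usually compress their streams, also use the TJ array operator and hex strings, and may need a font-specific mapping to turn the bytes into Unicode, none of which is handled here.

```python
import re

# A literal PDF string followed by the Tj (show text) operator.
# Escaped characters such as \( and \) are allowed inside the string.
_TJ = re.compile(r"\(((?:\\.|[^\\()])*)\)\s*Tj")

def extract_tj_strings(content_stream: str) -> list[str]:
    """Return the raw string operands of every Tj operator, in drawing order."""
    return _TJ.findall(content_stream)

# BT/ET delimit a text object, Tf selects font and size, Td moves the text position.
SAMPLE_STREAM = "BT /F1 12 Tf 72 712 Td (Hello) Tj 0 -14 Td (World) Tj ET"

print(extract_tj_strings(SAMPLE_STREAM))
```

Even in this toy stream the two words are separate text elements, so an extractor has to use the positions to decide whether they belong to one line or two.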
If you are lucky enough to deal with Tagged PDF files, such as PDF/UA or the "a" conformance levels of PDF/A (for example PDF/A-1a), text extraction can be a lot easier because text spans are identified as such, and a mapping to Unicode characters is defined.
Wikipedia doesn't have the complete specification but does serve as an introduction: http://en.wikipedia.org/wiki/Portable_Document_Format#Text

Programmatically replace text in PDF

I have PDF files with text that should be replaced. More specifically, the text should be translated and replaced with the translated version.
It's important that the rest of the PDF structure stays intact. Note that the text is available in the PDFs and techniques like OCR are not needed. Also, it would be nice if the font and other text attributes were kept.
Which libraries would you recommend for extracting the text to an easy to edit format (such as CSV) and put the new text back in again?
Assuming you are replacing text with a different language, you will have to choose a different font in most cases, and the font choice is non-trivial. I've used the Foxit libraries to change text or create PDFs with success.