An issue with Microsoft's Direct2D framework.
How is it possible that width > widthIncludingTrailingWhitespace? Shouldn't it be
width == widthIncludingTrailingWhitespace
when there are no trailing spaces, and
width < widthIncludingTrailingWhitespace
when the underlying string contains trailing spaces?
In my case the underlying string is " Info ". It has a trailing space, and yet widthIncludingTrailingWhitespace is zero.
MSDN documentation states:
width
Type: FLOAT
A value that indicates the width of the formatted text, while ignoring
trailing whitespace at the end of each line.
widthIncludingTrailingWhitespace
Type: FLOAT
The width of the formatted text, taking into account the trailing whitespace at the end of each line.
It seems to be an MSDN bug.
The issue occurs only with
IDWriteTextLayout->SetTextAlignment(DWRITE_TEXT_ALIGNMENT_TRAILING)
With
IDWriteTextLayout->SetTextAlignment(DWRITE_TEXT_ALIGNMENT_LEADING)
widthIncludingTrailingWhitespace is calculated correctly.
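To make the expected relationship concrete, here is a toy model in Python. It is not the DirectWrite API: the character advance widths are made-up values and text alignment is ignored entirely. It only illustrates that, per the documented definitions, width should never exceed widthIncludingTrailingWhitespace for a line.

```python
# Toy model of the two DWRITE_TEXT_METRICS width fields for one line of text.
# NOT DirectWrite: advance widths are hypothetical and alignment is ignored.

def line_widths(text, advances, space_advance):
    """Return (width, width_including_trailing_whitespace) for a single line."""
    def advance(ch):
        return space_advance if ch.isspace() else advances[ch]

    # Leading spaces count in both values; only trailing whitespace is dropped.
    width = sum(advance(ch) for ch in text.rstrip())
    width_including_trailing = sum(advance(ch) for ch in text)
    return width, width_including_trailing


ADVANCES = {"I": 3.0, "n": 6.0, "f": 4.0, "o": 6.0}  # hypothetical glyph advances

print(line_widths(" Info ", ADVANCES, space_advance=2.5))  # (21.5, 24.0)
print(line_widths("Info", ADVANCES, space_advance=2.5))    # (19.0, 19.0)
```

In this model the two values differ only by the trailing whitespace, so a result with width > widthIncludingTrailingWhitespace (as observed with DWRITE_TEXT_ALIGNMENT_TRAILING) would be inconsistent with the documented definitions.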
When extracting the positions of the words in this example
http://www.dertour.de/static/agb/2015/sommer/DER_Deutschland_So15.pdf
with iTextSharp 5.5.8, I'm getting 'incorrect' coordinates for some words. For example, on line 17 of the first paragraph: 'gehen oder im Widerspruch zur Reiseaus-'
the x-values of the top-left positions of the words are 118, 217, 296, 350, 524, 587. Only the first value seems correct; the expected values are 118, 208, 277, 320, 487, 540. The x-value of the bottom-right point of the space character between 'gehen' and 'oder' is 208, which seems correct and also seems to be the correct x-position for the word 'oder'. Maybe it has something to do with the fill mode of the paragraph, but I'm not sure what I should do to get the right coordinates.
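A quick check using only the numbers above: subtracting the expected x-values from the extracted ones shows that the error is not one constant shift but grows at every word boundary, which hints at an extra offset being added once per inter-word space.

```python
# x-values (300 dpi) from the question: extracted by iTextSharp vs. expected.
extracted = [118, 217, 296, 350, 524, 587]
expected = [118, 208, 277, 320, 487, 540]

# Error per word, and how much it grows from one word to the next.
errors = [got - want for got, want in zip(extracted, expected)]
growth = [b - a for a, b in zip(errors, errors[1:])]

print(errors)  # [0, 9, 19, 30, 37, 47]
print(growth)  # [9, 10, 11, 7, 10]
```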
I'm using LocationTextExtractionStrategy and convert the word positions to a 300 dpi coordinate system:
public override void RenderText(TextRenderInfo renderInfo)
{
    // For the provided example:
    //   uUnit   = 1
    //   originX = 33.862
    //   originY = 33.555
    //   dpi     = 300
    // The values above were calculated with this code:
    //   PdfNumber userUnit = pageDict.GetAsNumber(PdfName.USERUNIT);
    //   if (userUnit != null)
    //   {
    //       uUnit = userUnit.FloatValue;
    //   }
    //   Rectangle dim = reader.GetPageSize(i);
    //   float originX = dim.Left;
    //   float originY = dim.Bottom;

    // Calculate the pixel coordinates of each character:
    // (PDF user space units - page origin) * dpi * UserUnit / 72.
    List<TextRenderInfo> charInfo = renderInfo.GetCharacterRenderInfos().ToList();
    foreach (TextRenderInfo item in charInfo)
    {
        LineSegment char_segment = item.GetBaseline();
        int char_left = (int)Math.Round((char_segment.GetStartPoint()[0] - originX) * dpi * uUnit / 72.0f);
        // Top comes from the ascent line, bottom from the descent line.
        int char_top = (int)Math.Round((item.GetAscentLine().GetEndPoint()[1] - originY) * dpi * uUnit / 72.0f);
        int char_right = (int)Math.Round((char_segment.GetEndPoint()[0] - originX) * dpi * uUnit / 72.0f);
        int char_bottom = (int)Math.Round((item.GetDescentLine().GetStartPoint()[1] - originY) * dpi * uUnit / 72.0f);
    }
}
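The conversion in the code above is a plain linear mapping from PDF user space (1/72 inch per unit, times UserUnit) to 300 dpi pixels. Below is a Python sketch of the same arithmetic; the helper name and the x-coordinate 62.182 are hypothetical values chosen for illustration. Because the mapping is linear and the first word comes out right, errors in later words suggest the problem lies in the coordinates iTextSharp reports rather than in this conversion.

```python
def to_pixels(coord, origin, dpi=300, user_unit=1.0):
    """Map a PDF user-space coordinate (points) to pixels at the given dpi.

    Python's round() rounds halves to even, like C#'s default Math.Round.
    """
    return round((coord - origin) * dpi * user_unit / 72.0)


origin_x = 33.862  # page left edge from the question
print(to_pixels(62.182, origin_x))  # 118 (62.182 is a hypothetical x value)
```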
This indeed is a bug in iText & iTextSharp:
The lines with the extremely inaccurate x coordinates are those for which a large wordspacing value is set, e.g. your line:
0.2861 Tw T*
[<0047004500480045004E0000>-286<004F0044004500520000>-286<0049004D0000>-231<003700490044004500520053005000520055004300480000>-286<005A005500520000>-286<00320045004900530045004100550053000D>]TJ
(That 0.2861 argument for Tw is large.)
According to the ToUnicode map of the font in question the 0000 at the end of each word maps to the space character. Thus, iText here adds the word spacing value when calculating the x coordinates because according to the PDF specification ISO 32000-1:
Word spacing works the same way as character spacing but shall apply only to the ASCII SPACE character
(First sentence of section 9.3.3 Word Spacing)
Unfortunately, it does not take into account:
Word spacing shall be applied to every occurrence of the single-byte character code 32 in a string when using
a simple font or a composite font that defines code 32 as a single-byte code. It shall not apply to occurrences of
the byte value 32 in multiple-byte codes.
(Last sentence of section 9.3.3 Word Spacing)
At the 0000 above, therefore, word spacing must not be applied even though it is mapped to the space character because
the font encoding in question is purely multi-byte and
even in case of single-byte encoded space characters the word spacing is applied only at the single-byte code 32, not at a code which merely maps to the space character with ASCII code 32.
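The rule can be sketched in Python as follows. This sketch assumes the caller already knows each character code and how many bytes it occupies (for a composite font that comes from the font's encoding CMap, not from the ToUnicode map); the glyph width of 250 and the function names are hypothetical, and this is not iText's code. The displacement formula is the one from ISO 32000-1, section 9.4.4, with the TJ adjustment set to 0.

```python
# Horizontal glyph displacement per ISO 32000-1, 9.4.4 (with Tj = 0):
#   tx = (w0 * Tfs + Tc + Tw) * Th
# where Tw (word spacing) is applied only for the single-byte character code 32.

def word_spacing_applies(code: int, num_bytes: int) -> bool:
    # Not for a byte 32 inside a multi-byte code, and not for codes that
    # merely map to U+0020 via the ToUnicode map.
    return num_bytes == 1 and code == 32


def glyph_advance(width_1000, code, num_bytes, font_size, char_spacing,
                  word_spacing, h_scale=1.0):
    """Displacement in unscaled text space after showing one glyph."""
    w0 = width_1000 / 1000.0  # glyph width (thousandths) -> text space units
    tw = word_spacing if word_spacing_applies(code, num_bytes) else 0.0
    return (w0 * font_size + char_spacing + tw) * h_scale


# Same hypothetical glyph width, Tw = 0.2861 as in the PDF above:
print(glyph_advance(250, 0x20, 1, 10, 0, 0.2861))    # about 2.7861: one-byte code 32
print(glyph_advance(250, 0x0000, 2, 10, 0, 0.2861))  # 2.5: two-byte code, no Tw
```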
Usually this is not a problem during text extraction: PDF generators that encode space characters using multi-byte encodings are usually aware that word spacing does not apply to them and, therefore, don't change the word spacing from its default value of 0, so the iText bug does no harm. The use of word spacing instructions usually indicates that the fonts in use do map the single-byte code 32 to the space character.
Your PDF, on the other hand, does not seem to have been created with that fact in mind: it looks as if the word spacing was set first (0.2861 Tw) and then, after it turned out to make no difference, explicit gaps were added (-286 in the TJ instruction). (Or that was part of the development history of the PDF generator in question.)
Please be aware that positive values in the TJ argument mean a shift to the left, so negative values (like the -286 above) indeed widen or add gaps:
array TJ Show one or more text strings, allowing individual glyph positioning. Each element of array shall be either a string or a number. If the element is a string, this operator shall show the string. If it is a number, the operator shall adjust the text position by that amount; that is, it shall translate the text matrix, Tm . The number shall be expressed in thousandths of a unit of text space (see 9.4.4, "Text Space Details"). This amount shall be subtracted from the current horizontal or vertical coordinate, depending on the writing mode. In the default coordinate system, a positive adjustment has the effect of moving the next glyph painted either to the left or down by the given amount. Figure 46 shows an example of the effect of passing offsets to TJ.
(Table 109 – Text-showing operators in ISO 32000-1)
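Put as arithmetic: per the displacement formula in ISO 32000-1, 9.4.4, a number n in a TJ array contributes -(n/1000) * Tfs * Th to the horizontal position, so the sign flips. A minimal Python sketch (the function name is hypothetical):

```python
def tj_shift(n, font_size, h_scale=1.0):
    """Horizontal translation (unscaled text space) caused by a TJ number n.

    n is in thousandths and is subtracted, so a negative n moves the next
    glyph to the right (a wider gap) and a positive n moves it to the left.
    """
    return -n / 1000.0 * font_size * h_scale


print(tj_shift(-286, font_size=1))  # 0.286: next glyph moves right
print(tj_shift(500, font_size=10))  # -5.0: next glyph moves left
```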
When converting JavaScript floating-point numbers to strings, insignificant trailing decimal zeros are lost.
For example,
var num = 20.0;
alert(num);
displays 20.
How can I stop the insignificant trailing decimal zeros from being removed when the number is formatted as a string?
This has already been answered; please see the reference above.
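The underlying point is not specific to JavaScript: 20.0 is stored simply as the value 20, with no record of how many zeros were written, so the number of decimals has to be requested at formatting time. In JavaScript that is what Number.prototype.toFixed does ((20).toFixed(1) returns "20.0"); the Python sketch below shows the same idea with a format specification (the helper name is hypothetical).

```python
def format_fixed(value: float, decimals: int) -> str:
    """Format value with exactly `decimals` digits after the decimal point."""
    return f"{value:.{decimals}f}"


num = 20.0
print(num == 20)             # True: the ".0" is not part of the value
print(format_fixed(num, 1))  # 20.0
print(format_fixed(num, 2))  # 20.00
```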
My PDF file uses Deflate (FlateDecode) compression. When I inflate the stream, it outputs something like this:
[(Lorem)-21( ipsum)-55( dolor)-14( sit)-55( amet,)-56( consectetur)-8( adipiscing)-14( elit.)-34( Donec)-15( faucibus)-49( lorem)-42( varius2)-56( mauris)-28( porttitor,)-34( et)-28( pellentesque)-1( )]TJ
What do the numbers and brackets mean?
They don't seem to be a character count or spacing.
Does anyone know?
That is an array for showing text (square brackets [] denote array objects); it is followed by the TJ operator. Each number translates the text matrix, i.e. it adjusts the position of the text. Assuming horizontal text, a negative number moves the next glyph to the right.
From 9.4.3 Text-Showing Operators (Please see the spec for more details)
Show one or more text strings, allowing individual glyph positioning.
Each element of array shall be either a string or a number. If the
element is a string, this operator shall show the string. If it is a
number, the operator shall adjust the text position by that amount;
that is, it shall translate the text matrix, Tm. The number shall be
expressed in thousandths of a unit of text space (see 9.4.4, "Text
Space Details"). This amount shall be subtracted from the current
horizontal or vertical coordinate, depending on the writing mode. In
the default coordinate system, a positive adjustment has the effect of
moving the next glyph painted either to the left or down by the given
amount.
The parentheses denote string objects:
String objects shall be written in one of the following two ways:
As a sequence of literal characters enclosed in parentheses ( ) (using
LEFT PARENTHESIS (28h) and RIGHT PARENTHESIS (29h)); see 7.3.4.2,
"Literal Strings."
...
A literal string shall be written as an arbitrary number of characters
enclosed in parentheses. Any characters may appear in a string except
unbalanced parentheses (LEFT PARENTHESIS (28h) and RIGHT PARENTHESIS
(29h)) and the backslash (REVERSE SOLIDUS (5Ch)), which shall be
treated specially as described in this sub-clause. Balanced pairs of
parentheses within a string require no special treatment.
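To see how the pieces fit together, here is a simplified Python sketch that splits the operand of TJ into its strings and numbers. It only handles literal strings (balanced parentheses and backslash escapes, which are kept undecoded) and plain numbers; hex strings such as <0047...> and the rest of the content-stream syntax are out of scope, and parse_tj_array is a hypothetical name.

```python
import re

NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")


def parse_tj_array(src):
    """Split a TJ operand such as "[(Lorem)-21( ipsum)]" into str/float items."""
    items = []
    i = src.index("[") + 1
    while i < len(src):
        ch = src[i]
        if ch == "]":
            return items
        if ch == "(":
            # Literal string: track nesting depth, keep escapes as written.
            depth, j, buf = 1, i + 1, []
            while True:
                c = src[j]
                if c == "\\":
                    buf.append(src[j:j + 2])
                    j += 2
                    continue
                if c == "(":
                    depth += 1
                elif c == ")":
                    depth -= 1
                    if depth == 0:
                        break
                buf.append(c)
                j += 1
            items.append("".join(buf))
            i = j + 1
        elif (m := NUMBER.match(src, i)):
            items.append(float(m.group()))
            i = m.end()
        else:
            i += 1  # whitespace between elements
    raise ValueError("unterminated array")


items = parse_tj_array("[(Lorem)-21( ipsum)-55( dolor)]TJ")
print(items)  # ['Lorem', -21.0, ' ipsum', -55.0, ' dolor']
print("".join(x for x in items if isinstance(x, str)))  # Lorem ipsum dolor
```

Joining only the string items yields the visible text; the spaces live inside the strings themselves, while the small negative numbers are positioning adjustments that nudge the next string slightly to the right.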
I suggest getting the PDF specification and reading it for more details.
What is the mask for "percentage", in a WinForms application (VB.net)?
Per the documentation here: http://msdn.microsoft.com/en-us/library/system.windows.forms.maskedtextbox.mask.aspx
\ Escape. Escapes a mask character,
turning it into a literal. "\\" is the
escape sequence for a backslash.
So the mask for a % sign is \%
Before posting, I made a quick-and-dirty WinForms app, tried it, and it works.
Edit: the next item in the documentation suggests that a plain % sign should also work without the backslash, so I tried that, and it works as well.
All other characters Literals. All
non-mask elements will appear as
themselves within MaskedTextBox.
Literals always occupy a static
position in the mask at run time, and
cannot be moved or deleted by the
user.
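The two rules quoted above can be modelled in a few lines of Python. This is a hypothetical simplification, not the .NET implementation: it only decides whether each mask character is a mask element or a literal, using the special characters listed in the MaskedTextBox.Mask documentation, and it shows why both \% and a plain % end up as the same literal.

```python
# Characters with special meaning in a MaskedTextBox mask (backslash handled separately).
MASK_ELEMENTS = set("09#L?&CAa.,:/$<>|")


def classify_mask(mask):
    """Return (kind, char) pairs, kind being 'mask' or 'literal'."""
    result, i = [], 0
    while i < len(mask):
        ch = mask[i]
        if ch == "\\" and i + 1 < len(mask):
            result.append(("literal", mask[i + 1]))  # escaped character
            i += 2
            continue
        result.append(("mask" if ch in MASK_ELEMENTS else "literal", ch))
        i += 1
    return result


print(classify_mask(r"00\%"))  # [('mask', '0'), ('mask', '0'), ('literal', '%')]
print(classify_mask("00%"))    # same result: % is not a mask element
```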
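If you are using the DevExpress TextEdit control instead, a numeric mask can be used:
textEdit1.Properties.Mask.MaskType = DevExpress.XtraEditors.Mask.MaskType.Numeric;
textEdit1.Properties.Mask.EditMask = "00.00%%";
textEdit1.Properties.Mask.UseMaskAsDisplayFormat = true;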
http://community.devexpress.com/forums/t/59535.aspx
I'm finding that CHAR fields are being padded.
Is there any way to stop this from happening?
I've tried setting the property
SET PROPERTY "sql.enforce_strict_size" FALSE
but it doesn't seem to help.
Indeed, the MySQL docs state that "When CHAR values are retrieved, trailing spaces are removed." This is odd, as other databases seem to always keep the padding (I can confirm that for Oracle). The SQL-92 standard indicates that right-padded spaces are part of a CHAR value, for example in the definition of the CAST function on p. 148. With SV the source value, TV the target value, and LTD the length of the target data type:
ii) If the length in characters of SV is larger than LTD, then
TV is the first LTD characters of SV. If any of the
remaining characters of SV are non-<space> characters, then a
completion condition is raised: warning-string data, right
truncation.
iii) If the length in characters M of SV is smaller than LTD,
then TV is SV extended on the right by LTD-M <space>s.
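Rules ii) and iii) can be sketched in Python for a CHAR(LTD) target; cast_to_char is a hypothetical helper, and a Python warning stands in for the SQL completion condition.

```python
import warnings


def cast_to_char(sv: str, ltd: int) -> str:
    """Cast SV to CHAR(LTD) following rules ii) and iii) of SQL-92 CAST."""
    if len(sv) > ltd:
        # ii) truncate; warn only if a dropped character is not a space
        if sv[ltd:].strip(" "):
            warnings.warn("string data, right truncation")
        return sv[:ltd]
    # iii) pad on the right with LTD - M spaces
    return sv + " " * (ltd - len(sv))


print(repr(cast_to_char("abc", 5)))    # 'abc  '
print(repr(cast_to_char("abc  ", 3)))  # 'abc' (only spaces dropped, no warning)
```

MySQL's documented behavior of removing trailing spaces on retrieval corresponds to applying .rstrip(" ") to such a padded value afterwards, which is why the padding seems to disappear there.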
Maybe that's just another one of MySQL's many oddities and gotchas.
And to answer your question: if you don't want the trailing spaces, you should use VARCHAR instead.
I thought CHAR values are by definition space-padded to fill the field: they are considered fixed-length and are padded with spaces to that length.
The VARCHAR (variable-length character) data type, by contrast, is not space-padded to fill the field.
I could be wrong, though, since I normally work with SQL Server.