I modified numberOfLines in this code.
But I haven't figured out yet what it does.
What is this?
<Text numberOfLines={10}>{this.state.bodyText}</Text>
numberOfLines is used to limit visible text on screen.
From docs: (https://facebook.github.io/react-native/docs/text#numberoflines)
Used to truncate the text with an ellipsis after computing the text layout, including line wrapping, such that the total number of lines does not exceed this number.
This prop is commonly used with ellipsizeMode.
I needed to convert some PDFs back to text. I tried many software and online tools, and the result was always mediocre.
Why is it so difficult, technically speaking?
Let's not assume you are talking about PDFs which merely wrap some bitmap image because it should be clear that in that case you can only resort to OCR with all its restrictions.
Let's instead assume that text is drawn in the PDF at hand.
What is drawn on a PDF page is determined by a sequence of instructions in the content stream of that page. "Text is drawn" on a page means that among those instructions there are some setting the font to use by the instructions to come, some setting the text position and direction to use by the instructions to come, and some actually drawing text given by "string arguments".
Text extraction is the task of taking the sequence of instructions from a content stream and instead of drawing the text as indicated by the font and position setting instructions, to export it in a sensible order using a standard encoding, usually the encoding of the character type of the used programming language / platform.
The first problem is to understand the encoding of the string arguments of those text drawing instructions:
each font can have its own encoding; to extract the text one cannot simply ignore everything but the instructions drawing text and concatenate their string contents, you always have to take the current font into account (some extremely simple text extractors ignore this and, therefore, fail pretty often to return something sensible);
there are a large number of predefined encodings, some reminiscent of encodings you know, e.g. WinAnsiEncoding, many you likely don't know, e.g. Add-RKSJ-H; these encodings may use a constant number of bytes per glyph or they may be mixed-multibyte; so a text extractor must support very many encodings to start with;
encodings also may be completely ad-hoc and arbitrary; in particular in case of embedded subset fonts one often sees ad-hoc encodings generated by dealing out character codes from some starting value whenever one is needed; i.e. the first glyph in a given font used on a page is given the starting value as code, the next, different glyph is given the starting value plus one, the next, different one the starting value plus two, etc; "Hello World" and a starting value of 48 (ASCII value of '0') would result in "01223453627"; these fonts may contain a mapping to Unicode but they are not required to.
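The ad-hoc encoding from the "Hello World" example can be reproduced with a minimal sketch. The function names adhoc_encode and decode_with_map are hypothetical, and the mapping dictionary only stands in for a ToUnicode map; a real PDF creator works on glyph IDs and font programs, not on Python strings.

```python
def adhoc_encode(text, start=48):
    """Deal out codes in order of first use, as a subsetting creator might."""
    codes = {}    # glyph -> assigned character code
    encoded = []
    for glyph in text:
        if glyph not in codes:
            # The next unused code is the starting value plus the number of glyphs seen so far.
            codes[glyph] = start + len(codes)
        encoded.append(chr(codes[glyph]))
    return "".join(encoded), codes


def decode_with_map(encoded, to_unicode):
    """Decode using a code -> character map; unmapped codes become U+FFFD."""
    return "".join(to_unicode.get(ord(c), "\ufffd") for c in encoded)


encoded, codes = adhoc_encode("Hello World")
print(encoded)  # 01223453627

# Stand-in for a ToUnicode map (assumption: the font actually provides one).
to_unicode = {code: glyph for glyph, code in codes.items()}
print(decode_with_map(encoded, to_unicode))  # Hello World
```

Without the mapping, the extractor only sees "01223453627", which is why such fonts defeat extraction when the optional map is missing.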
The next problem is to make sense out of the order of the strings:
the string drawing instructions may occur in an arbitrary order, e.g. "Hello" might be drawn "lo" first, then after moving back "el", then after again moving back "H"; to extract the text one cannot ignore text positioning instructions and simply concatenate text strings, you always have to take the current position into account (some simple text extractors ignore this and, therefore, can fail to return something sensible);
multi-columnar text may present a difficulty, text may be drawn line by line, e.g. first the text of the top line of the first column, then the top line of the second column, then the second line of the first column, then the second line of the second column, etc.; there need not be any hints in the PDF that the text is multi-columnar.
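A rough sketch of position-aware ordering, assuming the extractor has already collected each drawn string together with its start coordinates. The Chunk type, the in_reading_order function and the tolerance value are all hypothetical. Note that this simple approach sorts line by line across the whole page width, so it does not solve the multi-column case described above.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    x: float  # horizontal start of the chunk
    y: float  # baseline height; in PDF the y axis grows upwards


def in_reading_order(chunks, line_tolerance=2.0):
    """Group chunks into lines by baseline, then sort each line left to right."""
    lines = []
    for chunk in sorted(chunks, key=lambda c: -c.y):  # top of the page first
        if lines and abs(lines[-1][0].y - chunk.y) <= line_tolerance:
            lines[-1].append(chunk)
        else:
            lines.append([chunk])
    return ["".join(c.text for c in sorted(line, key=lambda c: c.x))
            for line in lines]


# "Hello" drawn as "lo", then "el", then "H", as in the example above.
chunks = [Chunk("lo", 30, 700), Chunk("el", 10, 700), Chunk("H", 0, 700)]
print(in_reading_order(chunks))  # ['Hello']
```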
Another problem is to recognize formatting or styling artifacts:
spaces between words need not be created by drawing a space glyph, it may also be done by text position changing instructions; text extractors not trying to recognize gaps created by text positioning instructions may return a result without spaces; on the other hand the same technique can be used to draw adjacent glyphs at an optimal distance, aka kerning; text extractors trying to recognize gaps created by text positioning instructions may falsely return spaces where there should be none;
sometimes selected words are printed s p a c e d o u t for extra emphasis; in the extracted text these gaps might be presented as space characters which automatic postprocessing of the text may see as word separators;
usually for bold text one uses a different, bold font program; if that is not at hand, people sometimes get creative and emulate bold by printing the same text twice with a minute offset; with a slightly larger offset (or a different transformation) and a different color a shadow effect can be emulated; if the text extractor does not try to recognize this, you end up having some duplicate characters in the output.
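The trade-off between missed spaces and false spaces can be illustrated with a small heuristic sketch. The function join_glyphs and the threshold space_factor are assumptions for illustration; real extractors typically derive the threshold from font metrics such as the width of the space glyph.

```python
def join_glyphs(glyphs, font_size, space_factor=0.25):
    """Join glyphs of one line, inserting a space where the gap is large.

    glyphs: list of (char, x_start, x_end), already sorted by x_start.
    """
    out = []
    prev_end = None
    for char, x_start, x_end in glyphs:
        # A gap wider than the threshold is treated as a word separator;
        # small or negative gaps are treated as kerning.
        if prev_end is not None and x_start - prev_end > space_factor * font_size:
            out.append(" ")
        out.append(char)
        prev_end = x_end
    return "".join(out)


# "A" and "V" overlap slightly (kerning); the gap before "i" is 4 units wide.
glyphs = [("A", 0.0, 7.0), ("V", 6.2, 13.0), ("i", 17.0, 20.0), ("s", 20.0, 25.0)]
print(join_glyphs(glyphs, font_size=12))                    # AV is
print(join_glyphs(glyphs, font_size=12, space_factor=0.5))  # AVis
```

A threshold that is too high swallows real word gaps, while one that is too low turns kerning or spaced-out emphasis into spurious spaces.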
More problems arise due to incomplete or wrong extra information:
ToUnicode maps of fonts (optional maps from character code to Unicode) may be incomplete or contain errors; for example, there are many questions here on Stack Overflow dealing with incorrect ToUnicode maps for Indian scripts; the text extraction results reflect these errors;
there even are PDFs with contradictory information, e.g. with an error in the ToUnicode map but the correct information in an ActualText entry; this is used by some PDF creators to allow correct copy&paste from some programs (preferring an ActualText entry in such a situation) while injecting errors in the output of other programs (preferring ToUnicode information then).
Yet another problem arises if you expect the text extractor to extract only text eventually visible in the page:
text may be drawn outside the current clipping area or outside the visible page area; text extractors need to keep these in mind;
text may be drawn using the rendering mode "invisible"; text extractors have to keep an eye on the rendering mode;
text may be drawn using the same color as the background; to recognize this, a text extractor cannot look only at the current instruction and a few graphics state details, it has to take into account anything drawn beforehand in the location of the text;
text may be drawn as a clip path; to recognize whether this text is visible in the end, a text extractor must keep track of what is drawn in the text area as long as the clip path is active;
text may be covered by something else later; a text extractor must drop recognized text in such a case; but depending on blend modes and transparency settings these coverings might or might not allow the text to shine through; thus, for a correct result the text extractor must for each glyph keep track of the color it is drawn with, the color of the backdrop, and what all those spiffy effects do with those colors later on; and of course, both glyph color and backdrop color can be interesting, e.g. some shading colors; and the color spaces involved may differ, requiring one to convert back and forth between color spaces; and so on.
Furthermore, text may be drawn where text extractors usually don't look:
some tools hide text from text extraction by putting it into a pattern and filling the page area with that pattern;
similarly there are type 3 fonts; each character in a type 3 font is represented by its own content stream; thus, a tool can draw all text in the content stream of a single type 3 font glyph and then draw that glyph on the page.
...
You surely have meanwhile gotten an idea why text extraction results can be less than optimal. And be assured, the list above is not complete, there still are more complications for text extraction.
I am using a simple TextField wrapped in a Container.
When the user types a long string, I want it to automatically wrap to a new line.
It currently flows off the screen, on a single line. How do I fix this?
Unlimited number of lines
new TextField(..., maxLines: null)
or limited number of lines
new TextField(..., maxLines: 3)
This way it starts scrolling when the content exceeds the height of the input field.
https://docs.flutter.io/flutter/material/TextField/maxLines.html
You have to set the maxLines property to null. It defaults to 1.
Does anyone know if the method getFontSize in TextPosition always returns one and should I only use getFontSizeInPt to get the size of the font?
The problem I have is that getFontSizeInPt sometimes returns different values for the same sized text (I got 12 and 11 returned for text in the same paragraph with the same size).
Does anyone know if the method getFontSize in TextPosition always returns one
It does not always return one.
Please be aware that in the PDF page content descriptions there are several settings which all influence the final text size:
the font size parameter of the font selecting operator Tf;
the text matrix set by the operator Tm;
the current transformation matrix set by the operator cm;
the UserUnit setting of the PDF page.
The final text size is the first value scaled by the text matrix, scaled again by the transformation matrix, and scaled once more by the user unit value.
(Actually there even are some more factors. E.g. if one uses rendering mode 2, fill & stroke, for a faux bold effect, this slightly increases the size, too.)
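The scaling chain described above can be sketched as follows. This is an illustration under stated assumptions, not how PDFBox computes its values: the helper names multiply and effective_font_size are hypothetical, matrices are given in PDF order (a, b, c, d, e, f), and the vertical scale is taken as the length of the transformed unit vector in the y direction. Extra influences such as stroke width in rendering mode 2 are ignored.

```python
import math


def multiply(m1, m2):
    """Multiply two PDF matrices (a, b, c, d, e, f); m1 is applied first."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def effective_font_size(tf_size, text_matrix, ctm, user_unit=1.0):
    """Tf size scaled by the text matrix, the CTM, and the UserUnit."""
    _, _, c, d, _, _ = multiply(text_matrix, ctm)
    vertical_scale = math.hypot(c, d)  # length of the image of (0, 1)
    return tf_size * vertical_scale * user_unit


identity = (1, 0, 0, 1, 0, 0)
# First way: the size is set by Tf only.
print(effective_font_size(12, identity, identity))
# Second way: Tf is 1 and the text matrix does the scaling.
print(effective_font_size(1, (12, 0, 0, 12, 0, 0), identity))
# Mixed: Tf, text matrix and CTM all contribute.
print(effective_font_size(2, (3, 0, 0, 3, 0, 0), (2, 0, 0, 2, 0, 0)))
```

All three calls describe text of the same effective size, although the Tf parameter alone is 12, 1 and 2 respectively.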
TextPosition.getFontSize returns the first value only.
TextPosition.getFontSizeInPt returns something like the first value scaled by the matrices. ("Something like" because at first glance there seems to be another influence in it.)
Different PDF creators use these influences in different ways:
Some PDF creators use only the first value to set the font size and use the matrices only for operations not changing the effective font size, e.g. rotations.
Some PDF creators set the first value to 1 and scale using the matrices.
Some PDF creators fall in between and use both the first value and the scaling operations.
Thus, your PDFs seem to be created by software using the second way.
getFontSizeInPt sometimes returns different values for the same sized text (I got 12 and 11 returned for text in the same paragraph with the same size).
Could you share a sample PDF with that issue? As mentioned above, at first glance there seem to be additional influences which might be incorrect. But there also might be something special about your PDF.
I have an attributed string with "Date: Heading - Description."
The Date, Heading, and Description text values are dynamically generated, and all have different attributed fonts which I've assigned to the appropriate substring.
How should I determine the total width, and then consequently determine the height that this content should fill for a given width?
I've found this SO post but it is from 2010 and a more current solution seems like it would be available.
Additionally, how should the width of an attributed string with content changing text style by substring be determined? The iOS7 sizeWithAttributes method seems to apply the attributes for the entire string.
This is an attributed string, so the place to look is the NSAttributedString documentation. In particular:
https://developer.apple.com/library/ios/documentation/uikit/reference/NSAttributedString_UIKit_Additions/Reference/Reference.html#//apple_ref/occ/instm/NSAttributedString/boundingRectWithSize:options:context:
As to your last paragraph: simply make this a mutable attributed string, change the styles as desired, and proceed as above.
I used CoreText to render text as below:
Another very common typesetting operation is drawing a single line of text to use as a label for a user-interface element.
In Core Text this requires only two lines of code, one to create the line object with an attributed string and another to draw the line into a graphic context.
but it shows how to create an attributes dictionary and use it to create.
Obviously there are 3 paragraphs, and I use the default CTParagraphStyleSetting, so ParagraphSpacing and ParagraphSpacingBefore are set to 0 by default.
But the rendered result shows that the space between paragraphs is far too large.
Any idea how to reduce the paragraph spacing?
This might help:
Technical Q&A QA1698 - How do I work-around an issue where some lines in my Core Text output have extra line spacing?
You can try:
kCTParagraphStyleSpecifierMinimumLineHeight
kCTParagraphStyleSpecifierMaximumLineHeight
kCTParagraphStyleSpecifierLineSpacing