PDF to Text using PDFSharp - vb.net

I wrote the following function to read the text out of a PDF file. It is pretty close, but I'm just not familiar enough with all the op codes to get the line spacing right. For example, I'm currently inserting a new line when I see "ET" but that doesn't seem quite right since it may just be the end of a text run, mid line. Could someone help me adjust the parsing? My goal is something similar to Adobe Reader's "Save as other" --> "Text"
Public Function ReadPDFFile(filePath As String,
Optional maxLength As Integer = 0) As String
Dim sbContents As New StringBuilder
Dim cArrayType As Type = GetType(CArray)
Dim cCommentType As Type = GetType(CComment)
Dim cIntegerType As Type = GetType(CInteger)
Dim cNameType As Type = GetType(CName)
Dim cNumberType As Type = GetType(CNumber)
Dim cOperatorType As Type = GetType(COperator)
Dim cRealType As Type = GetType(CReal)
Dim cSequenceType As Type = GetType(CSequence)
Dim cStringType As Type = GetType(CString)
Dim opCodeNameType As Type = GetType(OpCodeName)
Dim ReadObject As Action(Of CObject) = Sub(obj As CObject)
Dim objType As Type = obj.GetType
Select Case objType
Case cArrayType
Dim arrObj As CArray = DirectCast(obj, CArray)
For Each member As CObject In arrObj
ReadObject(member)
Next
Case cOperatorType
Dim opObj As COperator = DirectCast(obj, COperator)
Select Case System.Enum.GetName(opCodeNameType, opObj.OpCode.OpCodeName)
Case "ET", "Tx"
sbContents.Append(vbNewLine)
Case "Tj", "TJ"
For Each operand As CObject In opObj.Operands
ReadObject(operand)
Next
Case "QuoteSingle", "QuoteDbl"
sbContents.Append(vbNewLine)
For Each operand As CObject In opObj.Operands
ReadObject(operand)
Next
Case Else
'Do Nothing
End Select
Case cSequenceType
Dim seqObj As CSequence = DirectCast(obj, CSequence)
For Each member As CObject In seqObj
ReadObject(member)
Next
Case cStringType
sbContents.Append(DirectCast(obj, CString).Value)
Case cCommentType, cIntegerType, cNameType, cNumberType, cRealType
'Do Nothing
Case Else
Throw New NotImplementedException(obj.GetType().AssemblyQualifiedName)
End Select
End Sub
Using pd As PdfDocument = PdfReader.Open(filePath, PdfDocumentOpenMode.ReadOnly)
For Each page As PdfPage In pd.Pages
ReadObject(ContentReader.ReadContent(page))
If maxLength > 0 And sbContents.Length >= maxLength Then
If sbContents.Length > maxLength Then
sbContents.Remove(maxLength - 1, sbContents.Length - maxLength)
End If
Exit For
End If
sbContents.Append(vbNewLine)
Next
End Using
Return sbContents.ToString
End Function

Your code is ignoring almost all operations which change the line. You do consider ' and " which most often imply a change of line but which in the wild are seldom used.
Inside a text object (BT .. ET) you, therefore, should also look out for
tx ty Td Move to the start of the next line, offset from the start of the current line by (tx, ty).
tx ty TD Move to the start of the next line, offset from the start of the current line by (tx, ty). As a side effect, this operator shall set the leading parameter in the text state.
a b c d e f Tm Set the text matrix, Tm, and the text line matrix, Tlm.
T* Move to the start of the next line.
To interpret ', " and T* correctly, you should also look out for
leading TL Set the text leading, Tl, to leading.
If you find multiple text objects (BT .. ET .. BT .. ET), the second one is not necessarily on a new line. You should look out for the special graphics state operators between them:
a b c d e f cm Modify the current transformation matrix (CTM) by concatenating
the specified matrix
q Save the current graphics state
Q Restore the graphics state
Your code is ignoring all numeric arguments to the operations. You should not ignore them, especially:
You should check the parameters of the operators listed above; e.g. while 0 -20 Td starts a new line 20 units down, 20 0 Td remains on the same line and merely starts drawing text 20 units right of the former line start.
You should check the numeric elements of the array parameter of TJ as they may (or may not!) indicate space between two words.
Your code is assuming the Value of CString instances to already contain Unicode encoded character data. This assumption in general is incorrect, the encoding used in PDF strings drawn in text drawing operations is ruled by the font. Thus, you furthermore should also look out for
font size Tf Set the text font, Tf, to font and the text font size, Tfs, to size. font shall be the name of a font resource in the Font subdictionary of the current resource dictionary.
For details you should first and foremost study the PDF specification ISO-32000-1, especially chapter 9 Text with a solid background from chapter 8 Graphics.

Related

How to change the color of text from a certain char to the end of the line

I have a WinForm application that reads a text file. It reads the whole file line by line and generates a RadioButton when there is a delimiter at the start of the line.
So, "|A Topic" produces a RadioButton called A Topic.
During runtime, a user chooses a RadioButton, the application finds that line in the text file then reads all the text until the next delimiter and puts it into a RichTextBox.
The contents of the text file is actual Visual Basic code and I am looking for a way to delineate comments from the code.
I am trying to use .Find() to locate all of the apostrophes which signify a comment.
With a comment found, how can I expand the selection to the end of that line?
Dim index As Integer = 0
While index < RichTxtOut.Text.LastIndexOf("'")
RichTxtOut.Find("'", index, RichTxtOut.TextLength, RichTextBoxFinds.None)
RichTxtOut.SelectionColor = Color.Green
index = RichTxtOut.Text.IndexOf("'", index) + 1
End While
This makes the apostrophe green, but how do I make the rest of the that line green.
Here is an update code _________________________________________________
Dim indexx As Integer = 0
Dim lines() As String = System.IO.File.ReadAllLines(FILE_NAME)
Dim numOfChars As Integer
While indexx < RichTxtOut.Text.LastIndexOf("'")
numOfChars = 0
RichTxtOut.Find("'", indexx, RichTxtOut.TextLength, RichTextBoxFinds.None)
'count the number of characters after the apostrophe
For li As Integer = 0 To Lines(RichTxtOut.GetLineFromCharIndex(RichTxtOut.Find("'", indexx, RichTxtOut.TextLength, RichTextBoxFinds.None))).Count - 1
numOfChars += 1
Next
RichTxtOut.Select(RichTxtOut.Find("'", indexx, RichTxtOut.TextLength, RichTextBoxFinds.None), numOfChars)
RichTxtOut.SelectionColor = Color.Green
numOfChars = 0
indexx = RichTxtOut.Text.IndexOf("'", indexx) + 1
End While
However It is still not working correctly. It is not finding the correct number of characters to in the line after the apostrophe.
Any suggestions??
RichTextBoxes can be complicated, as they are quite powerful and have lots of ways to do things.
There's an example here that sets colours by adding words one at a time, and setting the selection colour for each one (presumably making use of the fact that the most recently added word is selected). You could add your text one line at a time, and set the colour to green if the first (non-space) character is an apostrophe.
Using the selection is a bit of a hack though. The WPF RTB uses Paragraph and InLine objects, similar to DIVs and SPANs in HTML. You can set a font for each Inline. I'm not sure if the winforms one is the same.

Read part of text file using VB.net

I am new to VB.net and I need help.
What I want to do is read lines from a text file that take place between two specific lines. Within these lines I have to look for a specific one and display the next line, if that makes any sense. The catch is that there are more than one pair of these marker lines that contain the exact lines that I need within them. I hope the explanation is clear enough for you guys! Is that possible?
I am looking for the line under the number 10 in the parts of the text file shown on the screenshot. Since there are a lot of 10s in that file I need to read it on parts in order to get the exact line needed.
The code below is what I have so far thanks to #TimSchmelter in my previous question Read certain line in text file and display the next. Which is actually the bit that looks for a specific line and displays the next one, but it reads all of the lines within that text file.
Dim x1 As Decimal = File.ReadLines("filepath").
SkipWhile(Function(line) Not line.Contains(" 10")).Skip(1).FirstOrDefault()
If x1 >= 0.0 Then TextBox1.Text = x1
Your original code was the best since it allowed you to have better control over what you read and whatnot, so I'm going to use that in my answer.
The LINQ approach will only give you the first instance of what you're looking for, therefore it is better to read though the whole file line-by-line.
First, a structure holding the coordinates of a line:
Public Structure Line
Public Start As PointF 'Start coordinates.
Public [End] As PointF 'End coordinates.
Public Sub New(ByVal Start As PointF, ByVal [End] As PointF)
Me.Start = Start
Me.End = [End]
End Sub
Public Sub New(ByVal x1 As Single, ByVal y1 As Single, ByVal x2 As Single, ByVal y2 As Single)
Me.New(New PointF(x1, y1), New PointF(x2, y2))
End Sub
End Structure
We will use this structure so that we can return a list of all lines we've found.
This will read the file line-by-line and look for our wanted values:
EDIT: I've updated the code to make it properly read the coordinates (i.e. it is no longer bound by the order: 10, 20, 11, 21).
I have also made it only read coordinates located inside AcDbLine/LINE blocks.
Public Function ParseLines(ByVal File As String) As List(Of Line)
'Create a list of Line structures.
Dim Lines As New List(Of Line)
'Create an array holding four items: the coordinates for 10, 20, 11 and 21.
Dim CoordValues As Single() = New Single(4 - 1) {}
'Declare a variable holding the current index in the array:
'CoordIndex = 0 represents the first X-coordinate (10).
'CoordIndex = 1 represents the first Y-coordinate (20).
'CoordIndex = 2 represents the second X-coordinate (11).
'CoordIndex = 3 represents the second Y-coordinate (21).
Dim CoordIndex As Integer = 0
'Declare a variable indicating whether we are currently inside an AcDbLine/LINE block.
Dim InsideAcDbLine As Boolean = False
Using sReader As New StreamReader(File) 'Open a StreamReader to our file.
While Not sReader.EndOfStream 'Keep reading until we've reached the end of the file.
Dim line As String = sReader.ReadLine() 'Read a line.
'Check if we're inside a AcDbLine/LINE block and that we aren't at the end of the file.
If InsideAcDbLine = True AndAlso Not sReader.EndOfStream Then
'Determine if the current line contains 10, 20, 11 or 21.
'Depending on what "line.Trim()" returns it will execute the code of the respective "Case".
'Ex: If "line.Trim()" returns "20" it will execute the code of 'Case "20"', which is 'CoordIndex = 1'.
Select Case line.Trim() 'Trim() removes leading or trailing space characters from the string (i.e. " 10" -> "10").
Case "10" : CoordIndex = 0 'We've reached "10", the following line is coordinate x1.
Case "20" : CoordIndex = 1 'We've reached "20", the following line is coordinate y1.
Case "11" : CoordIndex = 2 '[...] x2.
Case "21" : CoordIndex = 3 '[...] y2.
Case Else : Continue While
' "Continue While" stops execution at this point and
' goes back to the beginning of the loop to: "Dim line As String = ...".
'
' - If "line.Trim()" DOES NOT return 10, 20, 11 or 21 then we
' just want to skip the rest of the code (in other words: ignore the current line).
End Select
'I used colons above for better readability as they can be used to replace line breaks.
'For example the above SHOULD ACTUALLY be:
'
'Select Case line.Trim()
' Case "10"
' CoordIndex = 0
' Case "11"
' (and so on)
'=========================================
Dim nextLine As String = sReader.ReadLine() 'Read the next line.
Dim Coordinate As Single 'Declare a variable for the current coordinate.
'Try parsing the line into a single using
'invariant culture settings (decimal points must be dots: '.').
If Single.TryParse(nextLine.Trim(), NumberStyles.Float, CultureInfo.InvariantCulture, Coordinate) = True Then
CoordValues(CoordIndex) = Coordinate 'Add the coordinate to the array.
If CoordIndex = CoordValues.Length - 1 Then 'Have we reached the end of the array?
InsideAcDbLine = False 'We've found all values we want. Do not look for any more in this block.
Lines.Add(New Line(CoordValues(0), CoordValues(1), CoordValues(2), CoordValues(3))) 'Create a new Line and set its coordinates to the values of 10, 20, 11 and 21.
Array.Clear(CoordValues, 0, CoordValues.Length) 'Set all the items in the array to zero.
End If
End If
End If
'Check if we've reached an "AcDbLine" or "LINE" marker.
'Used in order to determine whether we are inside a AcDbLine/LINE block.
'If we aren't, then we shouldn't look for any coordinates.
Select Case line.Trim()
Case "AcDbLine" : InsideAcDbLine = True 'Start of an AcDbLine/LINE block.
Case "LINE" : InsideAcDbLine = False 'End of an AcDbLine/LINE block.
End Select
End While
End Using
Return Lines 'Return our list of lines.
End Function
This code will only add a new line to the list once it has found four different coordinates, i.e. if it at the end only finds x1, y1 and x2 (but not y2) it will just ignore those values.
Here's how you can use it:
Dim Lines As List(Of Line) = ParseLines("filepath")
'Iterate through every parsed line.
For Each Line As Line In Lines
'Print all parsed lines to console in the format of:
'Line start: {X=x1, Y=y1}
'Line end: {X=x2, Y=y2}
Console.WriteLine("Line start: " & Line.Start.ToString())
Console.WriteLine("Line end: " & Line.End.ToString())
Console.WriteLine()
Next
To get individual coordinates of a line you can do (assuming you're still using the loop above):
Dim StartX As Single = Line.Start.X
Dim StartY As Single = Line.Start.Y
Dim EndX As Single = Line.End.X
Dim EndY As Single = Line.End.Y
If you want to access individual lines without a loop you can do:
Dim Line As Line = Lines(0) '0 is the first found line, 1 is the second (and so on).
Documentation:
List(Of T) class
PointF class
How to: Declare a Structure (Visual Basic)
Arrays in Visual Basic
Single Data Type (Visual Basic)
Single.TryParse() method
Select Case statement

How many .ToString("N0") can I have?

I am getting a warning:
Severity Code Description Project File Line Suppression State
Warning BC42322 Runtime errors might occur when converting 'String' to 'IFormatProvider'.
I have 2 .ToString("N0") in my code in the same Sub. Can I not have 2 in the same sub? They go to different labels but, I am new to VB as well so please do not judge. Thanks!
If Integer.TryParse(input, infantry) Then
Dim hpai = Integer.Parse(frmMainGame.lblHPAI.Text, Globalization.NumberStyles.AllowThousands, Globalization.CultureInfo.InvariantCulture)
frmMainGame.lblHPAI.Text = (hpai - infantry * 2).ToString("N0")
frmMainGame.lblInfantryNumberPlayer.Text -= input.ToString("N0") '<---- One that gets the warning
Else
' handle not an int inputted case
End If
Let's look at this line:
frmMainGame.lblInfantryNumberPlayer.Text -= input.ToString("N0")
You're applying the -= operator to STRINGS. This operator has no meaning for strings. Are you trying to apply some kind of reverse-concatenation? Removing any occurrence of input from within the label text? That code just doesn't make sense at all.
If you're actually trying to do a numeric operation, you need to actually work with numbers... convert the label string to an integer, not the integer to a string.
If Integer.TryParse(input, infantry) Then
Dim hpai = Integer.Parse(frmMainGame.lblHPAI.Text, Globalization.NumberStyles.AllowThousands, Globalization.CultureInfo.InvariantCulture)
frmMainGame.lblHPAI.Text = (hpai - infantry * 2).ToString("N0")
Dim numPlayer = Integer.Parse(frmMainGame.lblInfantryNumberPlayer.Text)
frmMainGame.lblInfantryNumberPlayer.Text = (numplayer - input).ToString("N0") '<---- One that gets the warning
Else
' handle not an int inputted case
End If

Is possible to ignore the TextBox?

I'm creating a program to calculate the average. There are 12 TextBox and I want to create the possibility to leave some fields blank. Now there are only errors and the crash of the program. Is possible to create that?
This is part of code:
ItalianoScritto = (TextBox1.Text)
MatematicaScritto = (TextBox2.Text)
IngleseScritto = (TextBox3.Text)
InformaticaScritto = (TextBox4.Text)
ScienzeScritto = (TextBox5.Text)
FisicaScritto = (TextBox6.Text)
MediaScritto = (ItalianoScritto + MatematicaScritto + IngleseScritto + InformaticaScritto + ScienzeScritto + FisicaScritto) / 6
Label10.Text = Str(MediaScritto)
If i leave blank the textbox1 when I click on the button to calculate the average Vb says Cast not valid from the string "" to type 'Single' and the bar of te textbox1 become yellow
I would do the following:
Iterate over the textboxes and check if you can parse the value into an iteger. If yes, add it to a value list.
Then add all values from that list and divide it by the number of cases.
It is faster than big if-statements and resilient against error
dim TBList as new list(of Textbox)
'add your textboxes to the list here
TbList.add(Textbox1)
...
dim ValList as new List(Of Integer)
for each elem in Tblist
dim value as integer
If integer.tryparse(elem.text,value)=True
ValList.add(Value)
else
'report error or do nothing
end if
next
dim Result as Integer
Dim MaxVal as Integer =0
for each elem in ValList
Maxval +=elem
next
Result = MaxVal / ValList.count
If you need support for point values, just choose double or single instead of Integer.
Also: regardless what you do -CHECK if the values in the textboxes are numbers or not. If you omit the tryparse, somebody will enter "A" and your app will crash and burn
Also: You OPTION STRICT ON!
You just have to check if the TextBox is blank on each one before using the value:
If TextBox7.TextLength <> 0 Then
'Use the value inside
End If
The way to do it depends a lot of your code. You should consider editing your question giving more information (and code) in order to us to help you better.

Insert line break in wrapped cell via code

Is it possible to insert line break in a wrapped cell through VBA code? (similar to doing Alt-Enter when entering data manually)
I have set the cell's wrap text property to True via VBA code, and I am inserting data into it also through VBA code.
Yes. The VBA equivalent of AltEnter is to use a linebreak character:
ActiveCell.Value = "I am a " & Chr(10) & "test"
Note that this automatically sets WrapText to True.
Proof:
Sub test()
Dim c As Range
Set c = ActiveCell
c.WrapText = False
MsgBox "Activcell WrapText is " & c.WrapText
c.Value = "I am a " & Chr(10) & "test"
MsgBox "Activcell WrapText is " & c.WrapText
End Sub
You could also use vbCrLf which corresponds to Chr(13) & Chr(10). As Andy mentions in the comment below, you might be better off using ControlChars.Lf instead though.
Yes there are two ways to add a line feed:
Use the existing constant from VBA (click here for a list of existing vba constants) vbLf in the string you want to add a line feed, as such:
Dim text As String
text = "Hello" & vbLf & "World!"
Worksheets(1).Cells(1, 1) = text
Use the Chr() function and pass the ASCII character 10 in order to add a line feed, as shown bellow:
Dim text As String
text = "Hello" & Chr(10) & "World!"
Worksheets(1).Cells(1, 1) = text
In both cases, you will have the same output in cell (1,1) or A1.
Have a look at these two threads for more information:
What is the difference between a "line feed" and a "carriage return"?
Differences Between vbLf, vbCrLf & vbCr Constants
I know this question is really old, but as I had the same needs, after searching SO and google, I found pieces of answers but nothing usable. So with those pieces and bites I made my solution that I share here.
What I needed
Knowing the column width in pixels
Be able to measure the length of a string in pixels in order to cut it at the dimension of the column
What I found
About the width in pixels of a column, I found this in Excel 2010 DocumentFormat :
To translate the value of width in the file into the column width value at runtime (expressed in terms of pixels), use this calculation:
=Truncate(((256 * {width} + Truncate(128/{Maximum Digit Width}))/256)*{Maximum Digit Width})
Even if it's Excel 2010 format, it's still working in Excel 2016. I'll be able to test it soon against Excel 365.
About the width of a string in pixels, I used the solution proposed by #TravelinGuy in this question, with small corrections for typo and an overflow. By the time I'm writing this the typo is already corrected in his answer, but there is still the overflow problem. Nevertheless I commented his answer so there is everything over there for you to make it works flawlessly.
What I've done
Code three recursive functions working this way :
Function 1 : Guess the approximate place where to cut the sentence so if fits in the column and then call Function 2 and 3 in order to determine the right place. Returns the original string with CR (Chr(10)) characters in appropriate places so each line fits in the column size,
Function 2 : From a guessed place, try to add some more words in the line while this fit in the column size,
Function 3 : The exact opposite of function 2, so it retrieves words to the sentence until it fits in the column size.
Here is the code
Sub SplitLineTest()
Dim TextRange As Range
Set TextRange = FeuilTest.Cells(2, 2)
'Take the text we want to wrap then past it in multi cells
Dim NewText As String
NewText = SetCRtoEOL(TextRange.Value2, TextRange.Font.Name, TextRange.Font.Size, xlWidthToPixs(TextRange.ColumnWidth) - 5) '-5 to take into account 2 white pixels left and right of the text + 1 pixel for the grid
'Copy each of the text lines in an individual cell
Dim ResultArr() As String
ResultArr() = Split(NewText, Chr(10))
TextRange.Offset(2, 0).Resize(UBound(ResultArr) + 1, 1).Value2 = WorksheetFunction.Transpose(ResultArr())
End Sub
Function xlWidthToPixs(ByVal xlWidth As Double) As Long
'Fonction to convert the size of an Excel column width expressed in Excel unit(Range.ColumnWidth) in pixels
'Parameters : - xlWidth : that is the width of the column Excel unit
'Return : - The size of the column in pixels
Dim pxFontWidthMax As Long
'Xl Col sizing is related to workbook default string configuration and depends of the size in pixel from char "0". We need to gather it
With ThisWorkbook.Styles("Normal").Font
pxFontWidthMax = pxGetStringW("0", .Name, .Size) 'Get the size in pixels of the '0' character
End With
'Now, we can make the calculation
xlWidthToPixs = WorksheetFunction.Floor_Precise(((256 * xlWidth + WorksheetFunction.Floor_Precise(128 / pxFontWidthMax)) / 256) * pxFontWidthMax) + 5
End Function
Function SetCRtoEOL(ByVal Original As String, ByVal FontName As String, ByVal FontSize As Variant, ByVal pxAvailW) As String
'Function aiming to make a text fit into a given number of pixels, by putting some CR char between words when needed.
'If some words are too longs to fit in the given width, they won't be cut and will get out of the limits given.
'The function works recursively. Each time it find an End Of Line, it call itself with the remaining text until.
'The recursive process ends whent the text fit in the given space without needing to be truncated anymore
'Parameters : - Original : The text to fit
' - FontName : Name of the font
' - FontSize : Size of the font
' - pxAvailW : Available width in pixels in wich we need to make the text fit
'Return : - The orignal text with CR in place of spaces where the text needs to be cut to fit the width
'If we got a null string, there is nothing to do so we return a null string
If Original = vbNullString Then Exit Function
Dim pxTextW As Long
'If the text fit in, may be it's the original or this is end of recursion. Nothing to do more than returne the text back
pxTextW = pxGetStringW(Original, FontName, FontSize)
If pxTextW < pxAvailW Then
SetCRtoEOL = Original
Exit Function
End If
'The text doesn't fit, we need to find where to cut it
Dim WrapPosition As Long
Dim EstWrapPosition As Long
EstWrapPosition = Len(Original) * pxAvailW / pxTextW 'Estimate the cut position in the string given to a proportion of characters
If pxGetStringW(Left(Original, EstWrapPosition), FontName, FontSize) < pxAvailW Then
'Text to estimated wrap position fits in, we try to see if we can fits some more words
WrapPosition = FindMaxPosition(Original, FontName, FontSize, pxAvailW, EstWrapPosition)
End If
'If WrapPosition = 0, we didn't get a proper place yet, we try to find the previous white space
If WrapPosition = 0 Then
WrapPosition = FindMaxPositionRev(Original, FontName, FontSize, pxAvailW, EstWrapPosition)
End If
'If WrapPosition is still 0, we are facing a too long word for the pxAvailable. We'll cut after this word what ever. (Means we must search for the first white space of the text)
If WrapPosition = 0 Then
WrapPosition = InStr(Original, " ")
End If
If WrapPosition = 0 Then
'Words too long to cut, but nothing more to cut, we return it as is
SetCRtoEOL = Original
Else
'We found a wrap position. We recurse to find the next EOL and construct our response by adding CR in place of the white space
SetCRtoEOL = Left(Original, WrapPosition - 1) & Chr(10) & SetCRtoEOL(Right(Original, Len(Original) - WrapPosition), FontName, FontSize, pxAvailW)
End If
End Function
Function FindMaxPosition(ByVal Text As String, ByVal FontName As String, ByVal FontSize As Variant, ByVal pxAvailW, ByVal WrapPosition As Long) As Long
'Function that finds the maximum number of words fitting in a given space by adding words until it get out of the maximum space
'The function is inteded to work on text with a "guessed" wrap position that fit in the space allowed
'The function is recursive. Each time it guesses a new position and the word still fits in the space, it calls itself with a further WrapPosition
'Parameters : - Text : The text to fit
' - FontName : Name of the font
' - FontSize : Size of the font
' - pxAvailW : Available width in pixels in wich we need to make the text fit
' - WrapPosition : The initial wrap position, positionned someware in the text (WrapPosition < len(Text)) but inside pxAvailW
'Return : - The position were the text must be wraped to put as much words as possible in pxAvailW, but without getting outside of it. If no position can be found, returns 0
Dim NewWrapPosition As Long
Static isNthCall As Boolean
'Find next Whitespace position
NewWrapPosition = InStr(WrapPosition, Text, " ")
If NewWrapPosition = 0 Then Exit Function 'We can't find a wrap position, we return 0
If pxGetStringW(Left(Text, NewWrapPosition - 1), FontName, FontSize) < pxAvailW Then '-1 not to take into account the last white space
'It still fits, we can try on more word
isNthCall = True
FindMaxPosition = FindMaxPosition(Text, FontName, FontSize, pxAvailW, NewWrapPosition + 1)
Else
'It doesnt fit. If it was the first call, we terminate with 0, else we terminate with previous WrapPosition
If isNthCall Then
'Not the first call, we have a position to return
isNthCall = False 'We reset the static to be ready for next call of the function
FindMaxPosition = WrapPosition - 1 'Wrap is at the first letter of the word due to the function call FindMax...(...., NewWrapPosition + 1). The real WrapPosition needs to be minored by 1
Else
'It's the first call, we return 0 | Strictly speaking we can remove this part as FindMaxPosition is already 0, but it make the algo easier to read
FindMaxPosition = 0
End If
End If
End Function
Function FindMaxPositionRev(ByVal Text As String, ByVal FontName As String, ByVal FontSize As Variant, ByVal pxAvailW, ByVal WrapPosition As Long) As Long
'Function working backward of FindMaxPosition. It finds the maximum number of words fitting in a given space by removing words until it fits the given space
'The function is inteded to work on text with a "guessed" wrap position that fit in the space allowed
'The function is recursive. Each time it guesses a new position and the word still doesn't fit in the space, it calls itself with a closer WrapPosition
'Parameters : - Text : The text to fit
' - FontName : Name of the font
' - FontSize : Size of the font
' - pxAvailW : Available width in pixels in wich we need to make the text fit
' - WrapPosition : The initial wrap position, positionned someware in the text (WrapPosition < len(Text)), but outside of pxAvailW
'Return : - The position were the text must be wraped to put as much words as possible in pxAvailW, but without getting outside of it. If no position can be found, returns 0
Dim NewWrapPosition As Long
NewWrapPosition = InStrRev(Text, " ", WrapPosition)
'If we didn't found white space, we are facing a "word" too long to fit pxAvailW, we leave and return 0
If NewWrapPosition = 0 Then Exit Function
If pxGetStringW(Left(Text, NewWrapPosition - 1), FontName, FontSize) >= pxAvailW Then '-1 not to take into account the last white space
'It still doesnt fits, we must try one less word
FindMaxPositionRev = FindMaxPositionRev(Text, FontName, FontSize, pxAvailW, NewWrapPosition - 1)
Else
'It fits, we return the position we found
FindMaxPositionRev = NewWrapPosition
End If
End Function
Known limitations
This code will work as long as the text in the cell has only one font and one font size. Here I assume that the font is not Bold nor Italic, but this can be easily handled by adding few parameters as the function measuring the string length in pixels is already able to do it.
I've made many test and I always got the same result than the autowrap function of Excel worksheet, but it may vary from one Excel version to an other. I assume it works on Excel 2010, and I tested it with success in 2013 and 2016. Fo others I don't know.
If you need to handle cases where fonts type and/or attributs vary inside a given cell, I assume it's possible to achieve it by testing the text in the cell character by character by using the range.caracters property. It should be really slower, but for now, even with texts to split in almost 200 lines, it takes less than one instant so maybe it's viable.
Just do Ctrl + Enter inside the text box