Parsing text line by line across columns in Word document - vba

Using VBA or Interop.Word I would like to simply parse the text in a Word document line by line, regardless of whether the text in that line spans multiple columns. As per the example below, I want:
Line 1 = "Line 1 Line 5"
Line 2 = "Line 2 Line 6"
Line 3 = "Line 3 Line 7"
etc.
I can't find any method, property or object in the Word Object Model that can facilitate this. I tried exporting to PDF and then opening that same file again in Word, but the conversion does not retain the original text line by line and gets very scrambled in places.

As per my comment above: a workaround is to try to get the document recreated using layout mode. In this case the Word file came from an export of an Adobe PDF scan of a printed document, so the workaround only applies in situations like this one.

Related

Convert PDF to txt file where the position of the bold text changes

Given this PDF, is there a way to convert it to a txt file with the following specifications:
the bold answer is printed as the 3rd one
the empty line between question 1 and 2 is removed
The bold answer is placed differently every time, as there are a lot of questions.

Markdown Paragraph

I'm encountering a problem with paragraphs in Markdown.
I use Notepad on Microsoft Windows to create the .md file and Typora to render it.
Line breaks inside a single paragraph are rendered as line breaks.
For example, if my .md file contains the following text
Electric Field inside
a conductor
is zero
Typora renders it as is, with the line breaks, whereas I expected it to be rendered like this:
Electric Field inside a conductor is zero.
i.e. line breaks inside a paragraph should be reflowed into one proper paragraph, not kept as in a code listing. What mistake am I making?
Typora seems to not follow typical Markdown behavior in this regard. As explained in their documentation:
A paragraph is simply one or more consecutive lines of text. In
markdown source code, paragraphs are separated by two or more blank
lines. In Typora, you only need one blank line (press Return once)
to create a new paragraph.
Press Shift + Return to create a single line break. Most other
markdown parsers will ignore single line breaks, so in order to make
other markdown parsers recognize your line break, you can leave two
spaces at the end of the line, or insert <br/>.
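If the file should read as one paragraph in Typora as well, one option is to join the hard-wrapped lines before opening it. Below is a minimal Python sketch of that idea, not anything Typora provides: `unwrap_paragraphs` is a hypothetical helper name, and it assumes plain prose only (it would also flatten lists, code blocks, and intentional breaks marked with two trailing spaces).

```python
def unwrap_paragraphs(text):
    """Join single line breaks inside a paragraph; keep blank lines as paragraph breaks."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            # Non-blank line: it belongs to the paragraph being collected.
            current.append(line.strip())
        elif current:
            # Blank line: the current paragraph is finished.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    # One blank line between paragraphs.
    return "\n\n".join(paragraphs)


source = "Electric Field inside\na conductor\nis zero"
print(unwrap_paragraphs(source))
```

Run on the three-line example from the question, this produces a single line of text, which Typora then shows as one paragraph.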

VB.NET - Appending text to the last line of text file leaves (sometimes) an empty line

The intention of the software I am coding is to append a string to the last line of a text file. The problem I am experiencing is that depending on the situation, sometimes I get an empty line when I try to append text to the last line of a text file.
My text file is very simple. It contains a few lines of text and each line contains a few words. When I open the text file with Windows' Notepad (without editing the content of the text file) and I move the caret to the end of the entire file, sometimes the caret is positioned at the end of the last line with content (text), and sometimes it is positioned at an empty line that doesn't contain even a white space.
When the caret's ending position corresponds to the last char of a line that contains text, I don't experience any problem appending some text programmatically. The programmatically appended line is at a new line and there isn't any empty line between the old last line and the current last line in the text file.
The problem appears when I can move the caret in the text file to the last possible position and that position corresponds to an empty line. In that case, when I try to append a new line programmatically, I can see an empty line before the just added line.
I have tried different code and techniques, but the result is always the same.
The code I am currently using is:
If stringLastLine.Length = 0 Then
    ' Last line is empty: write the text directly.
    Using sw As StreamWriter = New System.IO.StreamWriter(filePath, True)
        sw.Write("Inserted text")
    End Using ' End Using closes the writer, so no explicit sw.Close() is needed
ElseIf stringLastLine.Length > 0 Then
    ' Last line has content: start a new line first.
    Using sw As StreamWriter = New System.IO.StreamWriter(filePath, True)
        sw.Write(Environment.NewLine & "Inserted text")
    End Using
End If
Basically, I check whether the length of the last line is greater than zero. If it is, the line has text content and I write Environment.NewLine followed by the string. If the line has no content (length = 0), I write the string directly, without Environment.NewLine. In the length = 0 case I always get an empty line in between.
Any idea to solve this problem?
How about trimming the whole text so that it has no end-of-line characters at the end? Right now you might have an issue with multiple blank lines at the end of the file:
text = text.TrimEnd({Convert.ToChar(10), Convert.ToChar(13)})
This strips the trailing EOL characters: line feed (10) and carriage return (13).
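To show the whole flow around that trim, here is a minimal sketch in Python rather than VB.NET. `append_line` is a hypothetical helper, and the `"\r\n"` default is an assumption standing in for Environment.NewLine on Windows. It rewrites the whole file instead of appending, which should be fine for a small text file like the one described.

```python
import os
import tempfile


def append_line(path, new_text, eol="\r\n"):
    """Trim trailing EOL characters from the file, then add new_text as the last line."""
    # newline="" turns off newline translation, so CR and LF are read and written as-is.
    with open(path, "r", newline="") as f:
        content = f.read()
    trimmed = content.rstrip("\r\n")
    # Only put a line break in front of the new text if the file has content left.
    separator = eol if trimmed else ""
    with open(path, "w", newline="") as f:
        f.write(trimmed + separator + new_text)


# Demo on a throwaway file that ends with an empty line.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.txt")
    with open(demo_path, "w", newline="") as f:
        f.write("first line\r\nsecond line\r\n\r\n")
    append_line(demo_path, "Inserted text")
    with open(demo_path, "r", newline="") as f:
        print(repr(f.read()))
```

Because the trailing line breaks are removed first, it no longer matters whether the caret ends up on an empty line or on the last character: the new text always lands directly below the last line with content.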

Finding occurrences of a string in a column in an Excel-based text file

I am using VB.NET to count the occurrences of a string in a particular column of an Excel-based text file. The text file is not tab-delimited, but its columns are neatly separated. I have only learnt how to read a file line by line using a StreamReader; I have no idea how to read only the last column of each line and count the occurrences of the specific string I want. Any idea how to do it? You don't necessarily need to provide the code.
If by "an Excel-based text file" you mean that the values are comma-separated, you can read it in line by line, like you already are doing using a stream, and then use Split to separate the line out into an array. Google "vb.net split" to learn how to do this.
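The read-then-split idea can be sketched as follows, in Python rather than VB.NET. `count_in_last_column` and the sample rows are made up for illustration, and the comma delimiter is the same assumption as above. Passing `delimiter=None` splits on runs of whitespace instead, which may fit a file whose columns are aligned with spaces.

```python
def count_in_last_column(lines, target, delimiter=","):
    """Count the lines whose last column equals target."""
    count = 0
    for line in lines:
        # Drop the line ending, then cut the line into columns.
        fields = line.rstrip("\r\n").split(delimiter)
        # fields[-1] is the last column; strip() removes padding around the value.
        if fields and fields[-1].strip() == target:
            count += 1
    return count


# Hypothetical sample data; in practice the lines would come from the file.
rows = ["id,name,status", "1,pump,OK", "2,valve,FAIL", "3,motor,OK"]
print(count_in_last_column(rows, "OK"))

aligned = ["1  pump   OK", "2  valve  FAIL"]
print(count_in_last_column(aligned, "OK", delimiter=None))
```

Since an open file can be iterated line by line, `count_in_last_column(open(path), "OK")` would work the same way on a real file.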

Read Index (Table of Content) of word document

I would like to get the topic headings (say, all lines with the Heading 1 and Heading 2 styles) from a Word document. Using VBA you can parse through every line in the document and check its style, but this seems tedious. I believe there should be an easier way of doing it. Any pointers?
A pointer --->
tempD = ActiveDocument.GetCrossReferenceItems(wdRefTypeHeading)
returns an array containing the text of the headings in the document.