How do I style a Word document exported from a webpage in VB.Net

I'm trying to export text retrieved from a database into a Word document in VB.Net. I have a working example, but I need to figure out how to style some sections of the document appropriately.
I have found a few working examples in Microsoft's online resources (such as this one), which cover some basics:
para.Range.Text = "Quad Chart"
para.Range.Style = "Heading 1"
para.Range.Font.Bold = True
But these don't cover even some of the simplest formatting, such as:
How do you align the text (left, right, center)?
How do you specify leading (line spacing)?
How do you start a list style?
What I'm trying to find is either a straight answer to these or (even better), a definitive list of the commands that would allow most any formatting.
Also, I would prefer not using Spire, which seems to be a common answer.
Thanks!

The VBA object model describes all the classes, their methods and properties that you can use for the marking up of content.
Your suggestion to use styles is strongly recommended as a way of separating your code from the presentation. Create a document template (.dot or .dotx, depending on Word version) and attach this to your documents. Then, when the document is opened, it will inherit layout and presentation from the template and be correctly rendered.
The list creation is a little intricate as you will need to restart the list if you are using numbering.
If you are interested in a completely different approach, you can look at Applying an XSLT Transform in the Microsoft Office Word 2003 XML Software Development Kit. This describes how to generate XML documents and use XSL transforms to describe the presentation. It is more general, but definitely more complex to set up.
Your preferred approach will depend on whether you want to generate native documents with a template, or to require your users to install the transform using the tools in the SDK.

So, you have a few examples. Office VBA is a cut-down version of VB6, so why not record some macros in Word, open the VB editor, and look at the code they generate? It's also the easiest way to navigate the help on the Word object model.

Related

How to create switchable multi-language pdf form?

I want to create a PDF form with a two-language (Chinese/English) UI, with a button or similar control on the form for switching languages. Is there any way to do this, and if so, how?
Thanks!
Thanks for all the replies!
Actually, I have a sample like this:
PDF Sample
There are two checkboxes at the top left of the form, one for the English UI and the other for Chinese. I just want to know how to make a PDF like that sample. (And I don't see any layers in the sample...)
Thanks
mkl's comment (which he should turn into a full answer, really) already hinted at the option to use different page templates residing in the same file.
Another option you could explore is this:
put the two language versions into two different layers (or 'optional content groups' in PDF parlance)
make the visibility of the two layers toggleable
let the user activate the layer he or she needs.
Layer activation can be handled through normal Acrobat Reader user interface elements.
The layer switching can be made accessible via a "button" on the PDF page too -- but that requires additional JavaScript to be embedded in the PDF (something many people are not particularly keen about).
As Kurt proposed, I make my comment on Frank's answer an answer in its own right:
Actually there is a PDF feature seldom used nowadays: page
templates. Thus, those two forms can reside in the same file in
different page templates, and based on some initially present buttons
("English version", ...) the desired form is spawned.
Unfortunately I don't know how to create page templates using some easy-to-use tool. I only came across them in the context of integrated PDF signatures (depending on the signature type, page template instantiation is a document change that does not break the signature) and tested them with low-level tools.
Essentially page templates are PDF objects just like page dictionaries of the normal pages, they are not XFA stuff. They merely are not referenced in the pages tree but instead in the name tree.
There is a JavaScript command which creates a visible page based on such a template. I don't remember which one anymore; I may be able to find out when I'm back in the office next week. This command would have to be bound to the initial language selection button in the file.
The problem will be in switching the static text - PDF does not allow this.
If I were you, I would split the document into two identical forms in the respective languages. You can use bookmarks and links on the first page to navigate to the right part of the document.
Note that it is possible to assign the same field names to the English/Chinese versions of your fields. This will make it easier to process the submitted form data because the processing path would be independent of the chosen language. It will also simplify any JavaScript (validation, summing, etc.) you plan to add.

Detecting tables and images in word document using office interop

I am iterating through the paragraphs in a Word document using the Word interop API. So far I have not had a problem detecting different headings by using the style object. However, I now have a situation where contents inside a table have the same style as those outside it. I need a way to tell when the paragraph in question is actually part of a table.
I have a similar need to figure out when a paragraph is actually an embedded image.
When I physically select a table or image in the Word document, I can see that the tools section above Format changes. When an image is selected it is "Picture Tools", when a table is selected it is "Table Tools", and when a normal paragraph is selected the tools section does not show.
How can I detect this behavior using the Word interop API?
Thanks
Sameer
Though this question is old, I came across it while searching for a similar problem in Office automation. I hope this helps others investigate and expand further.
While looping through the paragraphs of a Word document, Paragraph.Range.Tables.Count indicates whether the paragraph is inside a table:
Paragraph outside a table: Paragraph.Range.Tables.Count = 0
Paragraph inside a table: Paragraph.Range.Tables.Count = 1 (or higher; not checked)
To detect the end of a table (the last paragraph inside it): Paragraph.Next().Range.Tables.Count == 0
(The above logic applies when using the NetOffice assembly, which in turn uses the interop assemblies; I hope it applies directly to the Word interop assembly as well.)
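For .docx files, the same table/image distinction can also be checked without interop by reading the document XML directly. This is a different technique from the Tables.Count check above, shown only as a rough sketch in Python: it assumes the file is a .docx (not a binary .doc), that paragraphs are `w:p` elements, tables are `w:tbl`, and embedded pictures appear as `w:drawing` or `w:pict` inside a paragraph. The function name `classify_paragraphs` is made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def classify_paragraphs(docx_file):
    """Return a list of (text, in_table, has_image) tuples in document order."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    results = []

    def walk(element, in_table):
        for child in element:
            if child.tag == W + "tbl":
                # Everything below a w:tbl element is table content.
                walk(child, True)
            elif child.tag == W + "p":
                text = "".join(t.text or "" for t in child.iter(W + "t"))
                has_image = (
                    child.find(".//" + W + "drawing") is not None
                    or child.find(".//" + W + "pict") is not None
                )
                results.append((text, in_table, has_image))
            else:
                # Containers such as w:body, w:tr, w:tc.
                walk(child, in_table)

    walk(root, False)
    return results
```

Paragraphs nested inside text boxes are not visited by this sketch, so treat the result as a first approximation.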

Programmatically extract content of PowerPoint slides into MS Word-like format?

I'd like to extract all of the information (formatted text, images, etc.) from PowerPoint slides into a flowing, readable (MS Word-style) format.
I'm not interested in keeping the slide concept at all--think of taking class slides from a college course and batch converting them all into one collective study guide.
I can't find a way to do this within PowerPoint (though if you know of one, please share!), and
I don't have experience scripting Office apps. Is this kind of thing easily done? Does this kind of script already exist somewhere?
Clarification:
In an earlier version of this post, I used the word "flowing" to refer to a slide-free (MS Word-like) format. This does not, however, refer to the actual formatting of slide content. So keeping bullet lists, etc. is fine and even desirable.
I don't see this being a simple task. College professors use a format of either "TITLE: BULLET POINTS OR IMAGE" or "EVERY WORD I'M ABOUT TO SAY" for their slides in my experience, and you're just not going to get flowing, readable text from the former no matter what you do. For the latter, you've already got your text, you just have to copy it to another document.
I think you might as well just open the PowerPoint, select all the text, and copy+paste into Word/Publisher/InDesign/your favorite page layout program. You'll have the same effect and the same amount of editing after the fact except without all the hassle of writing a program to do it for you.
Doing a Print operation to a PDF with the N-up options might be a good solution for handouts if that's all you need. You could expand the idea and condense ALL the slide decks into one, get it printed (with N slides per page and the note space next to it) and bound, and voila, instant study guide. I've seen that, and then you get options for note taking.
More power to you if you're doing this just because you can - don't let me stop you. There is much good learning to be had that way. You might want to look into writing a program using the Microsoft.Office.Interop namespace in .NET (starting at http://msdn.microsoft.com/en-us/library/bb772069.aspx ), or perhaps look on CPAN ( http://search.cpan.org/search?mode=all&query=powerpoint ) and do it with Perl! There are lots of ways to do it, but you've got to be up for the challenge.
Text is fairly simple to extract, but what text do you want? The text from the title and body text placeholders only? File, Save As, and choose to save the outline.
The other text on the slide? That can be pulled out to a text file programmatically, but in what order? Suppose you have a complex diagram with text callouts. Extracting the text is going to give you gibberish. There's no obvious/meaningful order to the text other than what the human viewer supplies by noting that "Ah. The arrow next to this bit of text points to the fribulator sub-assembly, so must relate to it in some way." Try doing that in code. ;-)
You could give the author a way to sort the text into reading order so that the code knows what order to extract it in, but that would require a fair amount of work on the part of the author.
If you can be certain that all of the content is in title+bullet form, no worries. Otherwise, you'd have to be able to articulate exactly what you want extracted, in what form and in what order before you could get anywhere with this.
MS Word-style is not only readable, but writeable as well (which was not specified in your requirements). If you want a read-only guide, PDF is your natural choice (either through Acrobat Distiller or LibreOffice). Combine individual Acrobatted presentations with PDFtk, or Acrobat or Foxit and you're good to go without any programming at all.
"Is this kind of thing easily done?" - Yes, your humble servant did a couple of similar scripts ages ago (extracting enhanced metafiles from PowerPoint slides).
"Does this kind of script already exist somewhere?" - Yes. Probably in hundreds of places, but I'm not sure any of them have been posted to the 'Net. All things considered, I think you'd be better off learning some scripting and macro programming on your own, since a ready-made script may not quite fit your needs, and understanding and rewriting it would take more time than coding and debugging from scratch.
Since you mention that title+bullet form is ok, open the file, choose to save as and pick Outline as the save-as type.
I think you could parse the PowerPoint file for formatting, text and pictures. There are .NET namespaces available for such a task. You open the file, parse it, and build a Word file from the results. It is complicated work, as you would have to consider the type of each element and its position, and you would have to use a temporary structure for each slide.
Have a look at this sample code:
http://msdn.microsoft.com/en-us/library/office/gg278331.aspx
How to: Get All the Text in All Slides in a Presentation
Basically, using C# and the Open XML SDK 2.0, it loops through all the slides in the presentation and appends the text from every slide to a string builder. You can write the result out to a text file if you like (modification required).
Recommendation: <25 oct 2012>
For your study guide, maybe you could extract all the text in each slide and dump that text programmatically (by adding that function to the sample code above while it's iterating the slides) into the "Notes" section of each slide. With that, you can print it in Notes Page view. You'll get the entire slide image on the top half of the page, and the actual slide text below it. It sure beats trying to copy and paste all the text from the slide into the notes section. You can even print it 2 slides per page, as small text would not be an issue inside the slide's image, and diagrams would still be more or less visible.
Unfortunately, this method works only for a simple, standard slide format. It's OK if your slides just have a title and a center text box with all the bullet points, but any complex slide layout (maybe text boxes scattered everywhere) will come out in no particular order and will be confusing. At least you can still look at the slide image above to make sense of it :)
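The slide-by-slide text loop described in this answer can also be sketched without the Open XML SDK, because a .pptx file is a ZIP archive whose slides live in ppt/slides/slideN.xml with the text held in DrawingML `a:t` elements. The Python below is only a rough sketch under those assumptions; the function name `extract_slide_text` is made up here, and slides are sorted by file number, which may differ from the order shown in PowerPoint.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace, used by PowerPoint for text
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def extract_slide_text(pptx_file):
    """Return a list of (slide_number, [paragraph_text, ...]) tuples."""
    result = []
    with zipfile.ZipFile(pptx_file) as archive:
        slides = []
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if match:
                slides.append((int(match.group(1)), name))
        # Assumption: file numbering approximates reading order.
        for number, name in sorted(slides):
            root = ET.fromstring(archive.read(name))
            paragraphs = []
            for para in root.iter(A_NS + "p"):
                # A paragraph's text is split across runs; join them.
                text = "".join(t.text or "" for t in para.iter(A_NS + "t"))
                if text.strip():
                    paragraphs.append(text)
            result.append((number, paragraphs))
    return result
```

As noted above, text comes out in the order it is stored in the slide XML, so slides with scattered text boxes will still need manual reordering.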

How to convert a PowerPoint file to wiki markup?

In order to make PowerPoint presentations 1. readable, and 2. searchable, I'd like to somehow convert them to wiki markup (we're using ScrewTurn).
I'm expecting some manual steps.
One idea was to upload a slide as a PDF to Google Docs and make it use its native doc format, and then use Google's HTML in I love wiki, but Google Docs erred when trying to convert the PDF file.
Just an idea; I'm not sure about the feasibility. Maybe you can use VBA to export the contents of the slides to a plain text file and just add some simple wiki markup for titles, sections, and so on, if you don't care much about the fonts, styles, and pictures in it. I guess VBA provides an easy way to traverse and inspect the objects in PowerPoint slides.
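Once titles and bullets have been exported, the "add some simple wiki markup" step could look roughly like the Python sketch below. It assumes MediaWiki-style syntax (`==Title==` for headings, `*` and `**` for nested bullets), which should be checked against the markup your wiki actually accepts; the function name `slides_to_wiki` and the input shape are made up for this illustration.

```python
def slides_to_wiki(slides):
    """Convert slides to wiki text.

    slides: iterable of (title, [(level, text), ...]) where level 1 is a
    top-level bullet. The heading and bullet syntax is an assumption.
    """
    lines = []
    for title, bullets in slides:
        lines.append("==%s==" % title.strip())
        for level, text in bullets:
            # One asterisk per nesting level, at least one.
            lines.append("%s %s" % ("*" * max(1, level), text.strip()))
        lines.append("")  # blank line between slides
    return "\n".join(lines).rstrip() + "\n"
```

For example, `slides_to_wiki([("Intro", [(1, "First"), (2, "Nested")])])` yields a heading line followed by `* First` and `** Nested`.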
You'd need to create an XSLT or use another transform technology such as Linq in .NET to convert PresentationML and DrawingML (assuming PowerPoint 2007 and later) to a different mark up. To be clear, there is nothing easy about doing this - the PowerPoint format is the most complex of all the Office MLs.
You can start by looking at Eric White's blog on Transforming Open XML WordprocessingML to XHtml - this would be one way to do it (Linq). Certainly for the textual portions of DrawingML (which PowerPoint uses for text) there are similarities between that and WordprocessingML. You can also look at the OOXML->ODF converter for inspiration (XSLT).

Creating an Equation Editor 3.0 equation in a Word 2003 document using a macro (or through the API)

I think the title is fully descriptive now. Anyway, I need to generate a Word document from my Delphi application. It needs to choose one of four different equations (with some specific parameters for each document). So far I have managed to create the whole document programmatically except the equation.
Is it possible to create equations programmatically? If so, where is the API documentation from MS? If not, which solution can be used?
Going the VBA route that Brian suggested will only give you the code to open Equation Editor; it won't give you the code for actually creating the equation.
Perhaps the MathType SDK will be of use to you. It's a free download.
Record a macro in a blank document of yourself adding an equation and then save the macro. Opening the macro in the VBA editor will give you the exact VBA code required to programmatically add an equation. If you're using Word's COM API, most of the methods in VBA should have COM counterparts. This technique can be used to discover how to programmatically do anything in Word that you can do in the GUI.