OCR at OneNote using VBA [duplicate] - vba

I need to do the simple Program whcih need to extract text from image using Onenote Interop? Could any one suggest me the appropriate document for my concept please?

Text recognized by OneNote's OCR is stored in the one:OCRText element in the XML file structure in OneNote. e.g.
<one:Page ...>
...
<one:Image ...>
...
<one:OCRData lang="en-US">
<one:OCRText><![CDATA[This is some sampletext]]></one:OCRText>
</one:OCRData>
</one:Image>
</one:Page>
You can see this XML using a program called OMSPY (it shows you the XML behind OneNote pages) - http://blogs.msdn.com/b/johnguin/archive/2011/07/28/onenote-spy-omspy-for-onenote-2010.aspx
To extract the text you would use the OneNote COM interop (as you pointed out). e.g.
//Instantialize OneNote
ApplicationClass onApp = new ApplicationClass();
//Get the XMl from the selected page
string xml = "";
onApp.GetPageContent("put the page id here", out xml);
//Put it into an XML document (from System.XML.Linq)
XDocument xDoc = XDocument.Parse(xml);
//OneNote's Namespace - for OneNote 2010
XNamespace one = "http://schemas.microsoft.com/office/onenote/2010/onenote";
//Get all the OCRText from the page
string[] OCRText = xDoc.Descendants(one + "OCRText").Select(x => x.Value).ToArray();
See the "Application Interface" docs on MSDN for more info - http://msdn.microsoft.com/en-us/library/gg649853.aspx

Related

Append to meeting invite body with or without Redemption

We are developing an Outlook VSTO add-in.
Right now I am trying to append some information to a meeting invite the user is in the process of composing. I want the content to appear in the body like what clicking the Teams-meeting button would do, where formatted text and links are appended to the end of the body.
Since the content is HTML and the Outlook Object Model does not expose an HTMLBody property for AppointmentItems, I try to set it via Redemption:
// Dispose logic left out for clarity but everything except outlookApplication and outlookAppointment is disposed after use
Application outlookApplication = ...;
AppointmentItem outlookAppointment = ...; // taken from the open inspector
NameSpace outlookSession = outlookApplication.Session;
RDOSession redemptionSession = RedemptionLoader.new_RDOSession();
redemptionSession.MAPIOBJECT = outlookSession.MAPIOBJECT;
var rdoAppointment = (RDOAppointmentItem)redemptionSession.GetRDOObjectFromOutlookObject(outlookAppointment);
string newBody = transform(rdoAppointment.HTMLBody); // appends content right before HTML </body> tag
rdoAppointment.BodyFormat = (int)OlBodyFormat.olFormatHTML;
rdoAppointment.HTMLBody = newBody;
Problem
The Outlook inspector window is not updating with the appended content. If I try to run the code again, I can see the appended content in the debugger, but not in Outlook.
Things I have tried:
Saving the RDOAppointmentItem
Also adding the content to Body property
Using SafeAppointmentItem instead of RDOAppointmentItem; didn't work because HTMLBody is a read-only property there
Setting PR_HTML via RDOAppointment.Fields
Paste the HTML via WordEditor (see below)
Attempt to use WordEditor
Per suggestion I also attempted to insert the HTML via WordEditor:
// Dispose logic left out for clarity but everything except inspector is disposed after use
string htmlSnippet = ...;
Clipboard.SetText(htmlSnippet, TextDataFormat.Html);
Inspector inspector = ...;
Document wordDoc = inspector.WordEditor;
Range range = wordDoc.Content;
range.Collapse(WdCollapseDirection.wdCollapseEnd);
object placement = WdOLEPlacement.wdInLine;
object dataType = WdPasteDataType.wdPasteHTML;
range.PasteSpecial(Placement: ref placement, DataType: ref dataType);
... but I simply receive the error System.Runtime.InteropServices.COMException (0x800A1066): Kommandoen lykkedes ikke. (= "Command failed").
Instead of PasteSpecial I also tried using PasteAndFormat:
range.PasteAndFormat(WdRecoveryType.wdFormatOriginalFormatting);
... but that also gave System.Runtime.InteropServices.COMException (0x800A1066): Kommandoen lykkedes ikke..
What am I doing wrong here?
EDIT: If I use Clipboard.SetText(htmlSnippet, TextDataFormat.Text); and then use plain range.Paste();, the HTML is inserted at the end of the document as intended (but with the HTML elements inserted literally, so not useful). So the general approach seems to be okay, I just can't seem to get Outlook / Word to translate the HTML.
Version info
Outlook 365 MSO 32-bit
Redemption 5.26
Since the appointment is being displayed, work with the Word Object Model - Inspector.WordEditor returns the Document Word object.
Per Dmitrys suggestion, here is a working solution that:
Shows the inserted content in the inspector window.
Handles HTML content correctly with regards to both links and formatting (as long as you stay within the limited capabilities of Words HTML engine).
using System;
using System.IO;
using System.Text;
using Outlook = Microsoft.Office.Interop.Outlook;
using Word = Microsoft.Office.Interop.Word;
namespace VSTO.AppendHtmlExample
{
public class MyExample
{
public void AppendAsHTMLViaFile(string content)
{
// TODO: Remember to release COM objects range and wordDoc and delete output file in a finally clause
Outlook.Inspector inspector = ...;
string outputFolderPath = ...;
string outputFilePath = Path.Combine(outputFolderPath, "append.html");
Word.Document wordDoc = inspector.WordEditor;
File.WriteAllText(outputFilePath, $"<html><head><meta charset='utf-8'/></head><body>{content}</body></html>", Encoding.UTF8);
Word.Range range = wordDoc.Content;
range.Collapse(Word.WdCollapseDirection.wdCollapseEnd);
object confirmConversions = false;
object link = false;
object attachment = false;
range.InsertFile(fileName,
ConfirmConversions: ref confirmConversions,
Link: ref link,
Attachment: ref attachment);
}
}
}

Custom Property for PDF File

I would like to add total page count as a custom property like ms word does as follow.
Is it possible to similar things for pdf? I am also using aspose for file conversion. I converter many kind of file types to pdf but if we want to show also the document's pagen count as the custom property.
This is from the Aspose Newsgroup:
//open document
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document("Test.pdf");
//set properties
pdfDocument.Metadata["xmp:Nickname"] = "Nickname";
pdfDocument.Metadata["xmp:CreatorTool"] = "Test Value";
pdfDocument.Metadata["xmp:CustomProperty"] = "Custom Value";
//Convert to PDF/A compliant document
pdfDocument.Validate("log.xml", PdfFormat.PDF_A_1A);
pdfDocument.Convert("log.xml", PdfFormat.PDF_A_1A, ConvertErrorAction.Delete);
//save output document
pdfDocument.Save("output_XMP.pdf");
Is this what you are looking for?

Dynamic XFA PDF Forms

I have a number of Dynamic XFA PDFs that were created with Livecycle Designer. These PDFs are used as templates for various individuals to complete. When a user requests a template we have to write a submit button with url pointing back to a .net application for processing and update some of the fields with information from the database into the PDF.
Can I use iText(Sharp) with .net to update the dynamic xfa pdf to write into it a submit button and update fields and then use iText(Sharp) to process the returning form.
We are doing this now with Acroforms but need to do the same for Dynamic XFA forms as well. I can’t find any confirmation information that this is possible. If it is possible does anyone have any code that they can share to show me how to do this?
You can also achieve that putting the pdf's content inside an XmlDocument and working as you were working with an XML.
This is the code I use to replace some placeholders written in textboxes.
PdfReader pdfReader = new PdfReader(path_pdf);
using (PdfStamper pdfStamp = new PdfStamper(pdfReader, new FileStream(temp_path, FileMode.Create), '\0', true))
{
pdfStamp.ViewerPreferences = PdfWriter.AllowModifyContents;
XmlDocument xmlDocument = pdfReader.AcroFields.Xfa.DomDocument;
string pdfContent = xmlDocument.InnerXml;
string newpdfContent = pdfContent
.Replace("$CONTENT_TO_REPLACE_1$", "some_content")
.Replace("$CONTENT_TO_REPLACE_2$", "some_other_content")
xmlDocument.InnerXml = newpdfContent;
Stream stream = GenerateStreamFromString(newpdfContent);
pdfStamp.AcroFields.Xfa.FillXfaForm(stream);
pdfStamp.AcroFields.Xfa.DomDocument = xmlDocument;
pdfStamp.Close();
}

docx4j No xpathStorageItemId found

I am using docx4j to bind a custom XML part to a word document, but I get an error message that I cannot resolve. I have done these steps:
Inserted content controls into a Word document.
Create an XML file with data.
Use the Word 2007 Content Control Toolkit to bind the cuxtom XML parts to the content controls.
When I open the document the binding is successful and the XML data appear in the document. However, when I run Java code to bind the docx file to the xml file I get an error:
org.docx4j.openpackaging.exceptions.Docx4JException: No xpathStorageItemId found, does the document contain content controls that are bound?
The code I am using to make the binding is from the docx4j examples
try{
WordprocessingMLPackage wordMLPackage = Docx4J.load(new File(input_DOCX));
File inputFile = new File(input_XML);
FileInputStream xmlStream = new FileInputStream(inputFile);
Docx4J.bind(wordMLPackage, xmlStream, Docx4J.FLAG_BIND_INSERT_XML & Docx4J.FLAG_BIND_BIND_XML);
Docx4J.save(wordMLPackage, new File(OUTPUT_DOCX), Docx4J.FLAG_NONE);
System.out.println("Saved: " + OUTPUT_DOCX);
}catch(IOException ex){
ex.printStackTrace();
}catch(Docx4JException ex){
ex.printStackTrace();
}
There are some content controls on the document that are not associated with a custom XML part. Could that be the cause of the error? Must all content controls have an XPATH?

Create PDF file with images in WinRT

How I can create PDF files from a list of image in WinRT. I found something similar for windows phone 8 here("Converting list of images to pdf in windows phone 8") But I am looking for a solution for windows 8. If anyone of having knowledge about this please share your thoughts with me.
Try http://jspdf.com/
This should work on WinJS, but I haven't tested it. In a XAML app you can try to host a web browser control with a jsPDF-enabled page.
ComponentOne has now released the same PDF library that they had in Windows Phone for Windows Runtime. Tho it's not open source, of course.
Amyuni PDF Creator for WinRT (a commercial library) could be used for this task. You can create a new PDF file by creating a new instance of the class AmyuniPDFCreator.IacDocument, then add new pages with the method AddPage, and add pictures to each page by using the method IacPage.CreateObject.
The code in C# for adding a picture to a page will look like this:
public IacDocument CreatePDFFromImage()
{
IacDocument pdfDoc = new IacDocument();
// Set the license key
pdfDoc.SetLicenseKey("Amyuni Tech.", "07EFCD0...C4FB9CFD");
IacPage targetPage = pdfDoc.GetPage(1); // A new document will always be created with an empty page
// Adding a picture to the current page
using (Stream imgStrm = await Windows.ApplicationModel.Package.Current.InstalledLocation.OpenStreamForReadAsync("pdfcreatorwinrt.png"))
{
IacObject oPic = page.CreateObject(IacObjectType.acObjectTypePicture, "MyPngPicture");
BinaryReader binReader = new BinaryReader(imgStrm);
byte[] imgData = binReader.ReadBytes((int)imgStrm.Length);
// The "ImageFileData" attribute has not been added yet to the online documentation
oPic.AttributeByName("ImageFileData").Value = imgData;
oPic.Coordinates = new Rect(100, 2000, 1200, 2000);
}
return pdfDoc;
}
Disclaimer: I currently work as a developer of the library
For an "open source" alternative it might be better for you to rely on a web service that creates the PDF file using one of the many open source libraries available.
I think this may help you if you want to convert an image (.jpg) file to a PDF file.
Its working in my lab.
string source = "image.jpg";
string destinaton = "image.pdf";
PdfDocument doc = new PdfDocument();
doc.Pages.Add(new PdfPage());
XGraphics xgr = XGraphics.FromPdfPage(doc.Pages[0]);
XImage img = XImage.FromFile(source);
xgr.DrawImage(img, 0, 0);
doc.Save(destinaton);
doc.Close();
Thanks.