I have data in an Excel spreadsheet with values like this:
0.69491375
0.31220394
The cells are formatted as Percentage, and set to display two decimal places. So they appear in Excel as:
69.49%
31.22%
I have a C# program that parses this data off the Clipboard.
var dataObj = Clipboard.GetDataObject();
var format = DataFormats.CommaSeparatedValue;
if (dataObj != null && dataObj.GetDataPresent(format))
{
var csvData = dataObj.GetData(format);
// do something
}
The problem is that csvData contains the display values from Excel, i.e. '69.49%' and '31.22%'. It does not contain the full-precision values with the extra decimal places.
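To illustrate why the display-based formats can't give the raw value back: a Percentage format with two decimals rounds the stored value before it is written out, so parsing the display string only recovers the rounded number. A minimal sketch in Python (`percent_display` and `parse_percent` are hypothetical helpers, and Python's handling of exact rounding ties may differ from Excel's):

```python
def percent_display(value: float, decimals: int = 2) -> str:
    """Format a fraction the way a Percentage cell with `decimals` places displays it."""
    return f"{value:.{decimals}%}"


def parse_percent(text: str) -> float:
    """Turn a display string such as '69.49%' back into a fraction."""
    return float(text.rstrip("%")) / 100


raw = 0.69491375
shown = percent_display(raw)        # '69.49%', what the CSV/text formats contain
recovered = parse_percent(shown)    # roughly 0.6949
print(shown, recovered, raw - recovered)  # the difference (about 1.4e-05) cannot be recovered
```

The information is lost at formatting time, which is why the search has to be for a clipboard format that carries the stored value rather than the displayed one.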
I have tried various other DataFormats values, such as the ones below, but the data only ever contains the display value from Excel:
DataFormats.Dif
DataFormats.Rtf
DataFormats.UnicodeText
etc.
As a test, I installed LibreOffice Calc and copy/pasted the same cells from Excel into Calc. Calc retains the full precision of the raw data.
So clearly Excel puts this data somewhere that other programs can access. How can I access it from my C# application?
Edit - Next steps.
I've downloaded the LibreOffice Calc source code and will have a poke around to see if I can find out how they get the full contents of the copied data from Excel.
I also called GetFormats() on the data object returned from the clipboard and got a list of 24 different data formats, some of which are not among the predefined DataFormats fields. These include formats like Biff12, Biff8, Biff5 and Format129, among others that are unfamiliar to me, so I'll investigate these and respond if I make any discoveries...
Not a complete answer either, but some further insights into the problem:
When you copy a single Excel cell, what ends up on the clipboard is a complete Excel workbook that contains a single spreadsheet, which in turn contains a single cell:
using System.IO;
using System.Windows.Forms;
using System.Xml.Linq;

var dataObject = Clipboard.GetDataObject();
var mstream = (MemoryStream)dataObject.GetData("XML Spreadsheet");
// Note: for some reason we need to ignore the last byte (presumably a
// trailing null terminator), otherwise an exception will occur...
mstream.SetLength(mstream.Length - 1);
var xml = XElement.Load(mstream);
Now, when you dump the content of the XElement to the console, you can see that you do indeed get a complete Excel workbook. The "XML Spreadsheet" format also contains the internal, full-precision representation of the numbers stored in the cells, so I guess you could use LINQ to XML or similar to fetch the data you need:
XNamespace ssNs = "urn:schemas-microsoft-com:office:spreadsheet";
var numbers = xml.Descendants(ssNs + "Data").
Where(e => (string)e.Attribute(ssNs + "Type") == "Number").
Select(e => (double)e);
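The same extraction, sketched in Python with the standard library, may help to see what the LINQ query is doing: it walks every `Data` element in the SpreadsheetML namespace and keeps those whose `ss:Type` is `Number`. This assumes you already have the raw "XML Spreadsheet" bytes from the clipboard; stripping trailing NUL bytes is a defensive assumption mirroring the extra byte cut off in the C# snippet, and `numbers_from_xml_spreadsheet` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by the "XML Spreadsheet" (SpreadsheetML 2003) clipboard format
SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def numbers_from_xml_spreadsheet(data: bytes) -> List[float]:
    """Return the values of all ss:Type="Number" cells, in document order."""
    # Drop trailing NUL bytes (assumption: this is the extra byte that breaks parsing)
    root = ET.fromstring(data.rstrip(b"\x00"))
    return [
        float(elem.text)
        for elem in root.iter(f"{{{SS_NS}}}Data")
        if elem.get(f"{{{SS_NS}}}Type") == "Number"
    ]


# Hand-written sample in the shape of the clipboard data (not captured from Excel)
sample = b"""<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1"><Table>
  <Row><Cell><Data ss:Type="Number">0.69491375</Data></Cell></Row>
  <Row><Cell><Data ss:Type="String">label</Data></Cell></Row>
 </Table></Worksheet>
</Workbook>\x00"""

print(numbers_from_xml_spreadsheet(sample))  # [0.69491375]
```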
I've also tried to read the Biff formats using Excel Data Reader, but the resulting DataSets always came out empty...
The BIFF formats are covered by an open specification from Microsoft (note that I say specification, not standard). Give this a read to get an idea of what is going on.
The BIFF formats you see correspond to specific Excel file formats: BIFF5 is XLS from Excel 5.0 and 95, BIFF8 is XLS from Excel 97 to 2003, and BIFF12 is XLSB, the binary workbook format introduced with Excel 2007 (later versions such as Excel 2010 produce it too). There is some documentation here and also here (from OpenOffice) that may help you make sense of the binary data...
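To give a feel for the BIFF8 layout: the workbook stream is a sequence of records, each starting with a little-endian 16-bit record type and a 16-bit payload size, and a plain numeric cell is stored in a NUMBER record (type 0x0203) as row, column, XF index and an 8-byte IEEE 754 double, i.e. the full-precision value. The Python sketch below assumes you already have the raw BIFF8 workbook stream bytes (the clipboard "Biff8" data may be wrapped in an OLE compound file that has to be opened first). It only decodes NUMBER records; numbers stored as RK/MULRK records, CONTINUE records and formulas are ignored, so it is an illustration rather than a parser.

```python
import struct
from typing import Dict, Iterator, Tuple

NUMBER = 0x0203  # BIFF8 NUMBER record: row, column, XF index, 8-byte IEEE 754 double


def iter_records(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (record type, payload) pairs; each record header is two little-endian uint16s."""
    pos = 0
    while pos + 4 <= len(stream):
        rec_type, size = struct.unpack_from("<HH", stream, pos)
        pos += 4
        yield rec_type, stream[pos:pos + size]
        pos += size


def number_cells(stream: bytes) -> Dict[Tuple[int, int], float]:
    """Map (row, column) to the raw double stored in each NUMBER record."""
    cells = {}
    for rec_type, payload in iter_records(stream):
        if rec_type == NUMBER and len(payload) >= 14:
            row, col, _xf_index, value = struct.unpack_from("<HHHd", payload)
            cells[(row, col)] = value
    return cells


# Hand-built example: a 4-byte filler record we ignore, then a NUMBER record for cell A1
demo = (struct.pack("<HH", 0x0809, 4) + b"\x00" * 4
        + struct.pack("<HH", NUMBER, 14) + struct.pack("<HHHd", 0, 0, 15, 0.69491375))
print(number_cells(demo))  # {(0, 0): 0.69491375}
```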
Anyway, some work has been done in the past to parse these documents in C++, Java, VB and, to your taste, C#. For example this BIFF12 Reader, the project NExcel, and ExcelLibrary, to cite a few.
In particular, NExcel will let you pass a stream, which you can create from the clipboard data, and then query NExcel to get the data. If you are going to work from the source code, I think ExcelLibrary is much more readable.
You can get the stream like this:
var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
var format = "Biff8"; // one of the names reported by GetFormats(), e.g. Biff8 or Biff12
var stream = (System.IO.Stream)dataobject.GetData(format);
Reading from the stream with NExcel would then be something like this:
var wb = getWorkbook(stream);
var sheet = wb.Sheets[0];
var somedata = sheet.getCell(0, 0).Contents;
I guess the actual Office libraries from Microsoft would work too.
I know this is not the whole story; please share how it goes. I will try it myself if I get a chance.
Related
I am looking for a way to convert some Excel workbooks into PDF files automatically using R.
I have seen people suggesting the RDCOMClient option, but it doesn't work from my company's PC. The problem I am encountering is that my spreadsheets contain tables, plots and images.
Is there a way I can print the entire worksheets and then save them into a pdf file?
I also tried using the loadWorkbook() function to import the files, but I cannot find a way to save them.
I thought about creating a pdf file with the pdf() function but I only managed to save the tables through the grid.table() function.
Does anyone have better ideas?
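You can use the following approach:
library(RDCOMClient)
# start Excel through COM and open the workbook
xlApp <- COMCreate("Excel.Application")
path_Excel_File <- "C:\\...\\excel_File.xlsx"
xlWbk <- xlApp$Workbooks()$Open(path_Excel_File)
# export the workbook as a fixed-format file (Type = 0 is PDF)
xlWbk$ExportAsFixedFormat(Type = 0, FileName = "C:\\...\\pdf_File.pdf")
# close the workbook without saving and shut Excel down
xlWbk$Close(FALSE)
xlApp$Quit()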
Type = 0 is a PDF, Type = 1 is XPS, see https://learn.microsoft.com/en-us/office/vba/api/excel.xlfixedformattype
I want to split a multi-page MS Publisher 2010 document into a set of separate documents, one per page.
The starting document is from a mail-merge, and I am trying to produce a set of numbered and named tickets as PDFs to send to people for an event (this is for a charity). The mail-merge seems to work fine and I can save the merged document and it looks OK with e.g. a list of fifty people giving me a 50-page document.
Ideally the result would be a set of PDFs.
I have tried to create some simple VBA code to do this, but it is not working consistently. If I try the very simple macro below, I get the correct number of documents, but only perhaps 1 or 2 documents with the correct contents out of every five. Most of the documents are completely empty.
Sub splitter()
Dim i As Integer
Dim Source As Document
Dim Target As Document
Set Source = ActiveDocument
For i = 1 To Source.Pages.Count
Set Target = Documents.Add
Source.Pages(i).Shapes.Range.Copy
Target.Pages(1).Shapes.Paste
Target.SaveAs Filename:="C:\Temp\Ticket_" & i
Target.Close
Set Target = Nothing
Next i
End Sub
I did sometimes get an error that the clipboard is busy, but not always.
Another approach might be to start with the master document, loop over the separate documents, fill in the personal details for each person's ticket and produce the PDFs directly. But that seems more complex, and I am not a VB programmer (but I've been doing C++ etc. for 20+ years, so I can program :-) )
A final annoyance is that it seems to open a new Publisher window for each document. It then takes a while to close 50+ copies of Publisher, and the laptop starts to crawl...
Please advise how best to get round these issues. I am probably missing something trivial, being a relative VB(A) newbie.
Thanks in advance for any suggestions
Try coding something like this:
1. Open the Publisher application (CreateObject()?)
2. Open the Publisher document (doc.Open(filename))
3. Store the total number of pages in a global variable (doc.Pages.Count)
4. Close the document (doc.Close())
5. Loop the following for each page X:
   a. Copy the pub file and rename it to name & "page" & X
   b. Open the new pub file
   c. Remove all pages except page X from the pub file
   d. doc.Save()
   e. doc.Close()
Copying files with VBA is easy, but copying pages in Publisher VBA is quite a hassle, so this should be easier to achieve.
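The file-copying half of this plan, and the order in which pages should be removed, can be worked out outside Publisher. A minimal Python sketch: `plan_split` and `pages_to_delete` are hypothetical helpers, the actual page removal would still happen in Publisher VBA, and deleting highest-numbered pages first assumes the remaining pages renumber after each deletion, as indexed Office collections generally do.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def pages_to_delete(total_pages: int, keep: int) -> List[int]:
    """1-based page numbers to delete, highest first, so earlier deletions
    don't shift the numbers of pages that still have to be deleted."""
    return [page for page in range(total_pages, 0, -1) if page != keep]


def plan_split(source: Path, total_pages: int, out_dir: Path) -> List[Tuple[Path, List[int]]]:
    """Copy `source` once per page (name & "page" & X) and pair each copy
    with the pages that must be removed from it."""
    out_dir.mkdir(parents=True, exist_ok=True)
    plan = []
    for page in range(1, total_pages + 1):
        target = out_dir / f"{source.stem}page{page}{source.suffix}"
        shutil.copyfile(source, target)
        plan.append((target, pages_to_delete(total_pages, page)))
    return plan
```

Because each copy is opened, trimmed and closed on its own, this also avoids copying shapes through the clipboard, which the question reports as unreliable.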
I'm currently presented with the following issue:
I need to open a CSV file with the Excel Interop Classes (15.0). This is done using the following code:
Dim app As New Excel.Application
Dim workbook As Excel.Workbook = app.Workbooks.Open(sFileNameCSV, Format:=4, Local:=True)
Unfortunately this converts some of the data into formulas (e.g. text starting with a hyphen [- this is an example] or phone numbers in international format [+41-555-123-45-67]) resulting in either a #NAME? error or a calculated result in case of the phone number.
After some searching on the web and on SO, I tried the following things, with no luck:
Saving the CSV File as a .txt file
Using the OpenText() Method instead of the Open() method
Combination of the two above
Is there any solution to this issue without having to change the CSV file data itself and still using the Interop classes, like disabling formulas altogether? Or am I just missing a param in the Open() / OpenText() functions?
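Before choosing a workaround, it can help to know exactly which rows and columns are affected. A small Python sketch that scans the CSV without modifying it and flags fields starting with a character Excel treats as the start of a formula: the question reports this for "-" and "+", and "=" and "@" are commonly listed as behaving the same way. Plain numbers such as "-5" are skipped because Excel reads them as numbers. `formula_like_fields` is a hypothetical helper, and the delimiter is a parameter because a `Local:=True` open may use a semicolon.

```python
import csv
from typing import Iterable, List, Tuple

# Leading characters that make Excel treat a cell value as a formula
FORMULA_LEADERS = ("=", "+", "-", "@")


def _is_plain_number(text: str) -> bool:
    try:
        float(text)
    except ValueError:
        return False
    return True


def formula_like_fields(lines: Iterable[str], delimiter: str = ",") -> List[Tuple[int, int, str]]:
    """Return 0-based (row, column, value) for fields Excel is likely to read as formulas."""
    hits = []
    for row_no, record in enumerate(csv.reader(lines, delimiter=delimiter)):
        for col_no, value in enumerate(record):
            if value.startswith(FORMULA_LEADERS) and not _is_plain_number(value):
                hits.append((row_no, col_no, value))
    return hits
```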
I have this txt file with the following layout:
National_Insurence_Number;Name;Surname;Hours_Worked;Price_Per_Hour
For example:
aa-12-34-56-a;Peter;Smith;36;12
This data has been written to the txt file through a VB form, which works totally fine. The problem comes on another form, which I expect to do the following:
The user will enter the employee's NI Number into a text box.
The program will then search the file for that NI Number and, if found,
it will fill in the appropriate text boxes with its data.
(Then the program calculates tax and national insurance, which I got working fine.)
So basically the problem is telling the program to search for that NI number and put each ";"-delimited field into its corresponding text box.
Thanks for all.
You just need to parse the file like a CSV. You can use Microsoft.VisualBasic.FileIO.TextFieldParser to do this, or you can use CsvHelper - https://github.com/JoshClose/CsvHelper
I've used CsvHelper in the past and it works great: it allows you to create a class with the structure of the records in your data file, then imports the data into a list of those objects for searching.
If you want to go the TextFieldParser way, you can look here for more info:
Parse Delimited CSV in .NET
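' Returns the fields of the line whose NI number (first field) matches niNumber, or Nothing if not found.
Function FindRecord(fileName As String, niNumber As String) As String()
    Using afile As New FileIO.TextFieldParser(fileName)
        afile.TextFieldType = FileIO.FieldType.Delimited
        afile.Delimiters = New String() {";"}
        afile.HasFieldsEnclosedInQuotes = True
        ' parse the actual file, one line at a time
        Do While Not afile.EndOfData
            Try
                Dim currentRecord As String() = afile.ReadFields() ' this array holds one line of data
                If currentRecord.Length > 0 AndAlso
                   String.Equals(currentRecord(0), niNumber, StringComparison.OrdinalIgnoreCase) Then
                    Return currentRecord
                End If
            Catch ex As FileIO.MalformedLineException
                ' skip lines that cannot be parsed
            End Try
        Loop
    End Using
    Return Nothing
End Function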
I'd recommend using CsvHelper though: the documentation is pretty good, and working with objects is much easier than working with raw string data.
Once you have found the record, you can set the text of each text box on your form manually or use a BindingSource.
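The search itself is small once the file is parsed as delimited text. A Python sketch of the same logic, using the column order from the question; case-insensitive matching of the NI number and the name `find_employee` are assumptions, not part of the original answer. Each value in the returned mapping would go into the text box for that column.

```python
import csv
from typing import Dict, Iterable, Optional

# Column order taken from the question's file layout
FIELDS = ["National_Insurence_Number", "Name", "Surname", "Hours_Worked", "Price_Per_Hour"]


def find_employee(lines: Iterable[str], ni_number: str) -> Optional[Dict[str, str]]:
    """Return the record whose first field matches ni_number (case-insensitive), else None."""
    target = ni_number.strip().lower()
    for record in csv.reader(lines, delimiter=";"):
        if record and record[0].strip().lower() == target:
            return dict(zip(FIELDS, record))
    return None


# Usage with a real file (hypothetical file name):
# with open("employees.txt", newline="") as f:
#     employee = find_employee(f, "AA-12-34-56-A")
```

A header line, if present, is harmless here because it will never match an NI number.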
I am looking for mail merge alternatives in my VB.NET app. I have used the mail merge feature of Word and find that it is quite buggy when dealing with a large volume of documents. I am looking at alternative methods of generating the merge and have come across Open XML, which I think will probably be the answer I am looking for. I have come to understand that the merge will be entirely code-driven in VB.NET. I have started playing around with the following code:
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Wordprocessing

Dim splitChar As Char() = {" "c} ' field instructions are space-separated
Using wordDoc As WordprocessingDocument = WordprocessingDocument.Open("C:\Users\JasonB\Documents\test.docx", True)
    'for each simplefield (mergefield)
    For Each field In wordDoc.MainDocumentPart.Document.Body.Descendants(Of SimpleField)()
        'get the field instruction words, e.g. "MERGEFIELD", "FirstName"
        Dim instruction As String() = field.Instruction.Value.Split(splitChar, StringSplitOptions.RemoveEmptyEntries)
        'if mergefield
        If instruction.Length > 1 AndAlso instruction(0).Equals("mergefield", StringComparison.OrdinalIgnoreCase) Then
            Dim fieldname As String = instruction(1) ' name of the column to look up in the data row
            For Each fieldtext In field.Descendants(Of Text)()
                fieldtext.Text = "I AM TESTING"
            Next
        End If
    Next
    wordDoc.MainDocumentPart.Document.Save()
End Using ' disposes the document
Now this works great and all, but I am realizing that I need to create as many documents as I will have datarows (assuming I use a datatable to handle the data).
One suggestion I found was to loop through each datarow, take my document template, save it to a folder and insert the datarow data. However, this could mean that I end up with 12,000 documents in a single folder that need to be joined later and converted to PDF.
Is there another option? The other thing that stood out to me is to create a new Word document, duplicate the XML from the template, and then replace the values. I don't know, however, whether there is a "simpler" way of doing this. Thanks.
If you don't want to save all 12,000 documents to file you should be able to process, convert and email them one at a time using temporary files.
Converting the DOCX to PDF in .NET might be an issue, but it looks like it's possible using Word Automation (Saving Word DOCX files as PDF).
The bottom line is you don't need to generate all documents before emailing them if you perform the process one document at a time. You can use SmtpClient in VB.NET to email the PDF after it is generated.
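The one-document-at-a-time idea can be sketched as a loop where each row's document lives in its own temporary folder that is deleted as soon as the PDF has been sent. In this Python sketch, `render_docx`, `convert_to_pdf` and `send` are hypothetical callbacks standing in for the Open XML merge, the Word automation conversion and the SMTP step; `convert_to_pdf` is assumed to write its PDF next to the input file so it is cleaned up too.

```python
import os
import tempfile
from typing import Callable, Iterable, Mapping

Row = Mapping[str, str]


def merge_one_at_a_time(
    rows: Iterable[Row],
    render_docx: Callable[[Row], bytes],
    convert_to_pdf: Callable[[str], str],
    send: Callable[[str, Row], None],
) -> int:
    """Render, convert and send each row's document before moving on, so at most
    one merged document exists on disk at a time. Returns the number of rows processed."""
    count = 0
    for row in rows:
        # Each row gets its own temporary folder, deleted when the block exits
        with tempfile.TemporaryDirectory() as tmp:
            docx_path = os.path.join(tmp, "merged.docx")
            with open(docx_path, "wb") as f:
                f.write(render_docx(row))
            pdf_path = convert_to_pdf(docx_path)  # e.g. Word automation
            send(pdf_path, row)                   # e.g. an SMTP client
        count += 1
    return count
```

With 12,000 rows this never holds more than one merged document on disk, at the cost of not being able to re-send a document without regenerating it.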
In terms of creating the document I have seen reports generated where a simple string replace is used to replace a string such as '%FIRSTNAME%' with the person's name and so on. This isn't necessarily the best approach but can work quite well. This way you can create your template in Word or OpenOffice and then edit it in .NET using OpenXML.
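Since a .docx file is a ZIP package whose main text lives in `word/document.xml`, the string-replace approach can be sketched with nothing but the Python standard library. This is a minimal sketch, assuming `%KEY%`-style placeholders as in the answer; `fill_placeholders` is a hypothetical helper. Word often splits typed text across several runs (for example after spell-checking or editing), in which case a placeholder is not contiguous in the XML and a plain replace will miss it; headers and footers live in other parts and are not handled here.

```python
import io
import zipfile
from typing import Mapping
from xml.sax.saxutils import escape


def fill_placeholders(template_docx: bytes, values: Mapping[str, str]) -> bytes:
    """Return a copy of the .docx with %KEY% placeholders in word/document.xml replaced."""
    src = zipfile.ZipFile(io.BytesIO(template_docx))
    out_buf = io.BytesIO()
    with src, zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as out:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")
                for key, value in values.items():
                    # escape() keeps characters such as & and < from breaking the XML
                    xml = xml.replace(f"%{key}%", escape(value))
                data = xml.encode("utf-8")
            out.writestr(item, data)  # copy every part, in the original order
    return out_buf.getvalue()
```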