We are using Google Sheets as a data source for Data Studio and other BI products (Tableau, Domo). We create new sheets using the Google Drive API and set the mime type to have the CSV file converted to a Sheet automatically. That all works fine. But when we overwrite an existing sheet with newer data through the Drive API, it deletes the original worksheet and creates a new one (not the sheet itself but rather the one and only worksheet in the sheet). This breaks the connection to the data source for Data Studio (it needs the worksheet ID to stay the same). How do we do the same thing using the Google Sheets API? Is this the strategy?
Truncate the data in the existing worksheet (batchClear).
Write new data to the existing worksheet starting at cell 0,0 (batchUpdate) and writing X rows at a time in a loop (no file upload through the Sheets API...).
The new data has the same headers but may have fewer rows so just overwriting without clearing will not work. But if there is a way to do this all as one batchUpdate, please let me know.
Answered my own question. It is easy enough to overwrite a sheet with new data in one batchUpdate call. You just need to send along an UpdateSheetPropertiesRequest that identifies the sheet and sets the new column and row count. Batch that together with an UpdateCellsRequest that has the new data. Here is some Groovy code to do just that:
// read CSV file into memory as list of RowData records
def numCols, numRows, rows
(numCols, numRows, rows) = makeRows(url)
// overwrite sheet with data
def requests = []
// update sheet properties
requests << new Request()
.setUpdateSheetProperties(new UpdateSheetPropertiesRequest()
.setFields('gridProperties(rowCount,columnCount)')
.setProperties(new SheetProperties()
.setSheetId(sheetId)
.setGridProperties(new GridProperties()
.setColumnCount(numCols)
.setRowCount(numRows))))
// overwrite sheet data
requests << new Request()
.setUpdateCells(new UpdateCellsRequest()
.setStart(new GridCoordinate()
.setColumnIndex(0)
.setRowIndex(0)
.setSheetId(sheetId))
.setRows(rows)
.setFields('*'))
// batch those requests
def body = new BatchUpdateSpreadsheetRequest()
.setRequests(requests)
def resp = sheets.spreadsheets()
.batchUpdate(id, body)
.execute()
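For anyone not using the Java/Groovy client, here is a rough Python sketch of the same two-request body, built as plain dictionaries. The function name `build_overwrite_requests` and the sample CSV are made up for illustration, every cell is sent as a string for simplicity, and actually sending the body (client library, authentication) is left out. It assumes the CSV is not empty.

```python
import csv
import io


def build_overwrite_requests(csv_text, sheet_id):
    """Build batchUpdate requests that resize a worksheet and overwrite its cells."""
    table = list(csv.reader(io.StringIO(csv_text)))
    num_rows = len(table)
    num_cols = max((len(r) for r in table), default=0)

    # One RowData per CSV line; every value is sent as a plain string here.
    rows = [
        {"values": [{"userEnteredValue": {"stringValue": cell}} for cell in r]}
        for r in table
    ]

    return [
        # Resize the grid so no stale rows or columns survive from the old data.
        {
            "updateSheetProperties": {
                "properties": {
                    "sheetId": sheet_id,
                    "gridProperties": {"rowCount": num_rows, "columnCount": num_cols},
                },
                "fields": "gridProperties(rowCount,columnCount)",
            }
        },
        # Write the new data starting at the top-left cell.
        {
            "updateCells": {
                "start": {"sheetId": sheet_id, "rowIndex": 0, "columnIndex": 0},
                "rows": rows,
                "fields": "*",
            }
        },
    ]


# Hypothetical sample data; sheet_id=0 is only a placeholder.
body = {"requests": build_overwrite_requests("name,score\nann,1\nbob,2\n", sheet_id=0)}
```

The resulting `body` has the same shape as the BatchUpdateSpreadsheetRequest above, so the worksheet ID stays untouched and only its size and contents change.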
I have data in an Excel spreadsheet with values like this:
0.69491375
0.31220394
The cells are formatted as Percentage, and set to display two decimal places. So they appear in Excel as:
69.49%
31.22%
I have a C# program that parses this data off the Clipboard.
var dataObj = Clipboard.GetDataObject();
var format = DataFormats.CommaSeparatedValue;
if (dataObj != null && dataObj.GetDataPresent(format))
{
var csvData = dataObj.GetData(format);
// do something
}
The problem is that csvData contains the display values from Excel, i.e. '69.49%' and '31.22%'. It does not contain the full precision of the extra decimal places.
I have tried using the various different DataFormats values, but the data only ever contains the display value from Excel, e.g.:
DataFormats.Dif
DataFormats.Rtf
DataFormats.UnicodeText
etc.
As a test, I installed LibreOffice Calc and copy/pasted the same cells from Excel into Calc. Calc retains the full precision of the raw data.
So clearly Excel puts this data somewhere that other programs can access. How can I access it from my C# application?
Edit - Next steps.
I've downloaded the LibreOffice Calc source code and will have a poke around to see if I can find out how they get the full context of the copied data from Excel.
I also did a GetFormats() call on the data object returned from the clipboard and got a list of 24 different data formats, some of which are not in the DataFormats enum. These include formats like Biff12, Biff8, Biff5, Format129 among other formats that are unfamiliar to me, so I'll investigate these and respond if I make any discoveries...
Also not a complete answer either, but some further insights into the problem:
When you copy a single Excel cell then what will end up in the clipboard is a complete Excel workbook which contains a single spreadsheet which in turn contains a single cell:
var dataObject = Clipboard.GetDataObject();
var mstream = (MemoryStream)dataObject.GetData("XML Spreadsheet");
// Note: For some reason we need to ignore the last byte otherwise
// an exception will occur...
mstream.SetLength(mstream.Length - 1);
var xml = XElement.Load(mstream);
Now, when you dump the content of the XElement to the console you can see that you indeed get a complete Excel Workbook. Also the "XML Spreadsheet" format contains the internal representation of the numbers stored in the cell. So I guess you could use Linq-To-Xml or similar to fetch the data you need:
XNamespace ssNs = "urn:schemas-microsoft-com:office:spreadsheet";
var numbers = xml.Descendants(ssNs + "Data").
Where(e => (string)e.Attribute(ssNs + "Type") == "Number").
Select(e => (double)e);
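To show the same extraction outside of .NET, here is a small Python sketch that pulls the numeric cells out of an "XML Spreadsheet" document. The `SAMPLE` string is a hand-written, trimmed-down stand-in for what Excel puts on the clipboard (an assumption, not a real capture), and `extract_numbers` is just an illustrative name.

```python
import xml.etree.ElementTree as ET

SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def extract_numbers(xml_text):
    """Return the full-precision values of all <Data ss:Type="Number"> cells."""
    root = ET.fromstring(xml_text)
    numbers = []
    for data in root.iter(f"{{{SS_NS}}}Data"):
        # The Type attribute lives in the same spreadsheet namespace.
        if data.get(f"{{{SS_NS}}}Type") == "Number":
            numbers.append(float(data.text))
    return numbers


# Hand-written sample, assumed to resemble the clipboard payload.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1">
  <Table>
   <Row><Cell><Data ss:Type="Number">0.69491375</Data></Cell></Row>
   <Row><Cell><Data ss:Type="Number">0.31220394</Data></Cell></Row>
   <Row><Cell><Data ss:Type="String">label</Data></Cell></Row>
  </Table>
 </Worksheet>
</Workbook>"""

print(extract_numbers(SAMPLE))  # [0.69491375, 0.31220394]
```

The point is only that the stored value, not the "69.49%" display text, is what sits in the Data element.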
I've also tried to read the Biff formats using the Excel Data Reader however the resulting DataSets always came out empty...
The BIFF formats are an open specification by Microsoft. (Note, that I say specification not standard). Give a read to this to get an idea of what is going on.
The BIFF formats you see correspond to particular Excel file formats. BIFF5 is XLS from Excel 5.0 and 95, BIFF8 is XLS from Excel 97 to 2003, and BIFF12 is XLSB, which was introduced with Excel 2007 (I guess Excel 2010 produces it too). There is some documentation here and also here (from OpenOffice) that may help you make sense of the binary there...
Anyway, some work has been done in the past to parse these documents in C++, Java, VB and, for your taste, in C#. For example this BIFF12 Reader, the project NExcel, and ExcelLibrary, to cite a few.
In particular, NExcel will let you pass a stream, which you can create from the clipboard data, and then query NExcel to get the data. If you are going to read the source code, then I think ExcelLibrary is much more readable.
You can get the stream like this:
var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
var stream = (System.IO.Stream)dataobject.GetData(format);
And reading from the stream with NExcel would be something like this:
var wb = getWorkbook(stream);
var sheet = wb.Sheets[0];
var somedata = sheet.getCell(0, 0).Contents;
I guess the actual Office libraries from Microsoft would work too.
I know this is not the whole story, so please share how it goes. I will try it myself if I get a chance.
I have two excel files in separate instances of excel. I want to take data from one instance of excel to another. This seemed simple as I know that path of the file that I want to pull data from. However, the file I want to pull data from is used by a separate program where it opens up the file I want to pull from (a template), populates it, but does not save it. So each time this external program is running it is using the file I want to pull data from but since it never saves it I am having a hard time pulling data from the template file. I have used the getObject() function which successfully pulls data from the file as I know the file path but the fields are of course empty as when the external program used the file, it only filled in the data but never saves it. How can I do what I am asking?
Building on Scott's suggestion:
Since you know the full path and name of the other workbook, use GetObject to reference it
Use .SaveCopyAs to save it
Open the saved copy in the local instance
This code goes in the file running in your instance of Excel
Sub Demo()
Dim wbRemote As Workbook
Dim wbLocal As Workbook
' Get reference to the workbook running in the other instance of Excel
Set wbRemote = GetObject("C:\data\Temp\TemplateBook.xlsx")
' Save a copy
wbRemote.SaveCopyAs "C:\data\Temp\Temp.xlsx"
' remove reference
Set wbRemote = Nothing
' open copy in this instance
Set wbLocal = Application.Workbooks.Open("C:\data\Temp\Temp.xlsx")
' work with object wbLocal
' ...
End Sub
I'm using System.IO.Packaging to build simple Excel files. One of our customers would like to have an autorun macro that updates data and recalcs the sheet.
Pulling apart existing sheets I can see that all you need to do is add the vbaProject.bin file and change a few types in the _rels. So I made the macro in one file, extracted the vbaProject.bin, copied it into another file, and presto, there's the macro.
I know how to add package parts when they are in XML format, like the sheets or the workbook itself, but I've never added a binary file and I can't figure it out. Has anyone done this before?
Ok I got it. Following TnTinMn's suggestion:
Open a new workbook and type in your macro. Change the extension to
zip, open it, open the xl folder and copy out the vbaProject.bin
to somewhere easy to find.
In your .Net code, make a new Part and add it to the Package as
'xl/vbaProject.bin'. Copy over byte-for-byte from the
vbaProject.bin you extracted above. It will be compressed as you
add the bytes.
Then you have to add a relationship to the workbook that points to
your new file. You can find those relationships in
xl/_rels/workbook.xml.rels.
You also have to add a content type entry at the root of the
document, which goes into [Content_Types].xml. This happens automatically when you use the ContentType parameter of CreatePart.
And finally, change the extension to .xlsm or .xltm
I'm extracting the following from many places in my code, so this is pseudo...
'the package...
Dim xlPackage As Package = Package.Open(WBStream, FileMode.Create)
'start with the workbook, we need the object before we physically insert it
Dim xlPartUri As URI = PackUriHelper.CreatePartUri(New Uri(GetAbsoluteTargetUri("/", "xl/workbook.xml"), UriKind.Relative)) 'the URI is relative to the outermost part of the package, /
Dim xlPart As PackagePart = xlPackage.CreatePart(xlPartUri, "application/vnd.ms-excel.sheet.macroEnabled.main+xml", CompressionOption.Normal)
'add an entry in the root _rels folder pointing to the workbook
xlPackage.CreateRelationship(xlPartUri, TargetMode.Internal, "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument", "xlWorkbook") 'it turns out the ID can be anything unique
'now that we have the WB part, we can make our macro relative to it
Dim xlMacroUri As Uri = PackUriHelper.CreatePartUri(New Uri(GetAbsoluteTargetUri("/xl/workbook.xml", "vbaProject.bin"), UriKind.Relative))
'create the macro part at its own URI, not at the workbook's
Dim xlMacroPart As PackagePart = xlPackage.CreatePart(xlMacroUri, "application/vnd.ms-office.vbaProject", CompressionOption.Normal)
'time we link the vba to the workbook
xlPart.CreateRelationship(xlMacroUri, TargetMode.Internal, "http://schemas.microsoft.com/office/2006/relationships/vbaProject", "rIdMacro") 'the ID on the macro can be anything as well
'copy over the data from the macro file
Using MacroStream As New FileStream("C:\yourdirectory\vbaProject.bin", FileMode.Open, FileAccess.Read)
MacroStream.CopyTo(xlMacroPart.GetStream())
End Using
'
'now write data into the main workbook any way you like, likely using new Parts to add Sheets
I'm new to the site and I'm learning VBA.
Basically, I wrote code that loops through the Excel files in a folder and processes some data, which is then written to a single common Excel file, with the name of the processed file in column A and all the data I want to record in the following cells.
Since I'm working with a lot of XLS files and the folder is constantly updated with new files, I was wondering what the easiest way is to go through the files again when the macro starts while skipping the already processed ones, so that only the new ones are recorded.
Thanks in advance
Add a function that checks if your file is already processed. Assuming that you have the list of processed files in column A of 1st Worksheet:
Function FileAlreadyProcessed(filename As String) As Boolean
Dim r As Range, matchRes As Variant
Set r = ThisWorkbook.Sheets(1).Range("A:A")
matchRes = Application.Match(filename, r, 0)
FileAlreadyProcessed = (Not IsError(matchRes))
End Function
This function searches column A for the filename. If it is found, the function returns True, otherwise False. So add a check in your loop:
If Not FileAlreadyProcessed(fileName) Then
    ' ... do your processing
End If
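The same skip logic, sketched in Python to make the idea explicit: collect the names already recorded, then keep only the files that are not in that set. The names `files_to_process`, `folder_files` and `processed_names` are made up for illustration. The comparison is case-insensitive because Excel's MATCH ignores case when comparing text.

```python
def files_to_process(folder_files, processed_names):
    """Return the files in the folder that have not been recorded yet."""
    # Case-insensitive lookup set, mirroring how MATCH compares text.
    seen = {name.casefold() for name in processed_names}
    return [name for name in folder_files if name.casefold() not in seen]


# Hypothetical file names, for illustration only.
print(files_to_process(["a.xls", "B.xls", "c.xls"], ["A.XLS", "b.xls"]))  # ['c.xls']
```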
I have an Excel file with a header row that I want to delete. In this file the header row consists of cells A1 to W1 merged into one. This causes a problem when I try to read the file because I am expecting column names. Proper column names exist in the second row of the file, which is why I want to delete the first.
To accomplish this I thought I'd be able to use the 'Excel Source' item in SSIS since it supports a SQL option to write a query. What I want to do is something like this:
SELECT * from ExcelFile WHERE Row > 1
My file only has data in columns A thru W.
I don't know what syntax I can use in the query to do this. The query builder that is in the Excel Source item will allow me to do many things with columns but I don't see an option for doing anything with rows. Searching online and using the help didn't get me anywhere.
None of these solutions will work because the Excel driver will be confused by the merged first line. You won't be able to use any driver features such as skip first row to do this. You need to run some script to open the Excel file and delete the row manually.
There is some basic sample script at this site:
http://www.sqlservercentral.com/Forums/Topic1327014-1292-1.aspx
The code below is adapted from the code written by snsingh at that site.
You would obviously want to use connection manager properties, not hard-coded paths.
Excel needs to be installed on the SSIS Server for it to work - this is the only way to use Excel automation.
Dim filename As String
Dim appExcel As Object
Dim newBook As Object
Dim oSheet1 As Object
appExcel = CreateObject("Excel.Application")
filename = "C:\test.xls"
appExcel.DisplayAlerts = False
newBook = appExcel.Workbooks.Open(filename)
oSheet1 = newBook.worksheets("Sheet1")
oSheet1.Range("A1").Entirerow.Delete()
newBook.SaveAs(filename, FileFormat:=56)
appExcel.Workbooks.Close()
appExcel.Quit()
You don't need any special query syntax.
Go to the Control Flow and pull in a Data Flow Task.
Add an Excel source and add a connection manager that points to the Excel sheet.
Open your connection manager and check the box that says column names are in the first row.
That's it. Then add your destination.