grab and filter from more than 255 columns from a huge closed workbook - sql

I have a huge workbook (0.6 million rows and 315 columns) whose column names I need to grab into an array. Because of its size, I don't want to open and close the workbook just to copy the first row of the range. Also, I only want to grab the columns from the first row whose names begin with the word "Global ".
Can anyone help with a short code example of how to go about this? Please note I have tried ADOX and ADO, but both hit the 255-column limit. I also don't want to open the workbook, just pull the required "Global " columns from the 315 columns into an array.
Any help is most appreciated.

You can copy the first row of your target by opening a new workbook, and in A1 use this formula:
='C:\PATH_TO_TARGET\[TARGET_FILE_NAME.xlsx]WORKSHEET_NAME'!A1
Note that PATH+FILENAME+WORKSHEET is enclosed in single quotes, the FILENAME is enclosed in square brackets, and an exclamation mark separates the sheet from the cell reference.
Then copy/paste or fill right to get the next 314 columns. Note: this formula will return zero for empty target cells.
Once you have the column headings, you can copy them and use Paste Special > Values if you want to break the links to the closed workbook.
Hope that helps
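If filling 315 formulas by hand is awkward, the formula strings can also be generated. The following is only a sketch: it reuses the placeholder path, file name and sheet name from the formula above, and `column_letter` and `external_ref` are hypothetical helper names, not part of Excel or any library.

```python
def column_letter(n):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def external_ref(folder, filename, sheet, cell):
    """Build an external-reference formula: quoted path, [file] in brackets, then !cell."""
    return "='{}[{}]{}'!{}".format(folder, filename, sheet, cell)


# One formula per column of row 1, for columns 1 to 315.
formulas = [
    external_ref("C:\\PATH_TO_TARGET\\", "TARGET_FILE_NAME.xlsx",
                 "WORKSHEET_NAME", column_letter(i) + "1")
    for i in range(1, 316)
]
```

The resulting strings could then be pasted into row 1 of the new workbook, one per column.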

You could use the Python programming language.
While it does not work with XLSX files natively, you just have to install the external openpyxl module from here: https://pypi.python.org/pypi/openpyxl
(You will also have to install Python, of course; just download it from www.python.org.)
It will make working with your data in an interactive Python session a piece of cake, and the time to open the workbook without having to load the Excel interface should be a fraction of what you are expecting. (I think it will have to fit in your memory, though).
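But this is all I had to type in an interactive Python session to open a workbook and retrieve the column names that start with "bl":
import openpyxl
a = openpyxl.load_workbook("bla.xlsx")
# In newer openpyxl releases .rows is a generator, so take its first item rather than indexing rows[0].
first_row = next(iter(a.worksheets[0].rows))
# Empty or numeric cells have no startswith method, so keep text cells only.
[cell.value for cell in first_row if isinstance(cell.value, str) and cell.value.startswith("bl")]
output:
Out[8]: ['bla', 'ble', 'bli', 'blo', 'blu']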
The last input line requires one to know Python to be understood, so here is a summary of what happens: Python is a language very fond of working with sequences, and the openpyxl library gives you your workbook as just that:
an object which is a sequence of worksheets, each worksheet having a rows attribute which holds a sequence of all rows in the sheet, and each row being a sequence of cells. Each cell has a value attribute which is the text within it.
The inline for statement is the compact form, but it could be written as a multi-line statement:
In [10]: for cell in next(iter(a.worksheets[0].rows)):
   ....:     if isinstance(cell.value, str) and cell.value.startswith("bl"):
   ....:         print(cell.value)
   ....:
bla
ble
bli
blo
blu
Keep in mind that by exploring Python a bit deeper, you can programmatically manipulate your data in a way that will be easier than working interactively, given a data set this size. You can even use Python itself to load selected contents into an SQL database (including its built-in, single-file database, sqlite), where sophisticated indexes and queries can make working with your data a breeze.
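If installing openpyxl is not an option, the header row can in principle be read with the standard library alone, since an .xlsx file is a zip archive of XML parts. What follows is a rough sketch under several assumptions, not a robust reader: the first worksheet is assumed to live at `xl/worksheets/sheet1.xml`, the first `<row>` element is assumed to be the header row, and empty header cells (which Excel may omit from the XML) are not accounted for. The function names are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def read_shared_strings(zf):
    """Return the workbook's shared-string table as a list (empty if absent)."""
    if "xl/sharedStrings.xml" not in zf.namelist():
        return []
    strings = []
    with zf.open("xl/sharedStrings.xml") as fh:
        for _, elem in ET.iterparse(fh):
            if elem.tag == NS + "si":
                # An <si> holds one <t>, or several text runs with a <t> each.
                strings.append("".join(t.text or "" for t in elem.iter(NS + "t")))
                elem.clear()
    return strings


def cell_text(c, shared):
    """Resolve one <c> element to its value (shared, inline, or raw)."""
    kind = c.get("t")
    if kind == "inlineStr":
        return "".join(t.text or "" for t in c.iter(NS + "t"))
    v = c.find(NS + "v")
    if v is None or v.text is None:
        return None
    return shared[int(v.text)] if kind == "s" else v.text


def header_row(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Stream the sheet XML and stop as soon as the first row is complete."""
    with zipfile.ZipFile(xlsx) as zf:
        shared = read_shared_strings(zf)
        with zf.open(sheet_part) as fh:
            for _, elem in ET.iterparse(fh):
                if elem.tag == NS + "row":
                    return [cell_text(c, shared) for c in elem.iter(NS + "c")]
    return []


def global_columns(xlsx, prefix="Global "):
    """Keep only the text headers that start with the given prefix."""
    return [h for h in header_row(xlsx)
            if isinstance(h, str) and h.startswith(prefix)]
```

Calling `global_columns("bla.xlsx")` would return the matching header names as a list. Because parsing stops after the first row, the 0.6 million data rows are never read, although the shared-string table is still loaded in full.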

Related

SUMIFS returns 0 using dynamic criteria, with criteria range and sum range on another sheet

Anyone,
I've chatted with and called excel customer service with no luck. I used the formula builder (please see attached screenshot) to make sure each element of the formula is correct and returns the value for the criteria I'm trying to reference.
Everything is accurate, but it returns a value of 0. When I do the same thing in the actual sheet the data is stored in (and click a criteria cell within the criteria range) it returns the accurate value?! I'm not sure why it won't work on the other sheet. The values I am using to select are dynamic and change with a drop down. I have another, advanced, workbook (I did not create) that does the same thing and completes an even more complicated formula, but actually works so I'm not sure why this is returning a 0 value.
Photos and code/syntax: Dynamic Selection, Example 2 of it working, Example 1 of it working, Formula Builder, CountIFs, Advanced Spreadsheet working, VLOOKUP
=SUMIFS('GFEBS Pull'!Q:Q,'GFEBS Pull'!G:G,FMCOP!$C$20,'GFEBS Pull'!H:H,FMCOP!B23)
or:
=SUMIFS('GFEBS Pull'!Q:Q,'GFEBS Pull'!G:G,'FMCOP'!$C$20,'GFEBS Pull'!H:H,'FMCOP'!B23)
When I type ' around FMCOP sheet name, they disappear? I've also tried to lock the columns on the 'GFEBS Pull' sheet with no luck. Cell B23 is not locked because I'm going to copy the formula down to reference other cells. Any help is appreciated!
In this screenshot you can clearly see that both FMCOP!C20 and FMCOP!B23 have leading spaces; e.g. " HHC".
Since " HHC" will never match "HHC", fix the data returned from 'the lower table in the same screenshot'.
A Text-to-Columns, Fixed Width, Finish should do this. Alternatively, you could adjust the original formula like this:
=SUMIFS('GFEBS Pull'!Q:Q, 'GFEBS Pull'!G:G, TRIM(FMCOP!$C$20), 'GFEBS Pull'!H:H, TRIM(FMCOP!B23))
I would caution against the latter 'bandaid' fix. Fix the original data; do not apply bandaids on-the-fly.
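To see why the leading space produces 0, here is a rough Python model of the matching. It is only an illustration: the sample amounts and labels are made up, `sumifs` is a hypothetical helper that mimics exact text matching only (no wildcards, no case-insensitivity), and Python's `strip()` removes leading and trailing spaces whereas Excel's TRIM also collapses repeated inner spaces.

```python
def sumifs(sum_range, *pairs):
    """Sum the entries of sum_range whose row matches every (range, wanted) pair."""
    total = 0
    for i, amount in enumerate(sum_range):
        if all(rng[i] == wanted for rng, wanted in pairs):
            total += amount
    return total


# Made-up sample data standing in for columns Q, G and H of 'GFEBS Pull'.
amounts = [100, 250, 40]
units = ["HHC", "HHC", "A CO"]
types = ["Travel", "Supplies", "Travel"]

# Criteria as they arrive from FMCOP, with a leading space.
raw_unit, raw_type = " HHC", " Travel"

untrimmed = sumifs(amounts, (units, raw_unit), (types, raw_type))
trimmed = sumifs(amounts, (units, raw_unit.strip()), (types, raw_type.strip()))
```

With this data `untrimmed` is 0 because no row equals " HHC", while `trimmed` picks up the single matching row.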

Trouble with Copying VBA Code

I've been working on an independent project for a client of mine. They wanted a button that, when clicked, opens a user form offering a variety of macro-related options to choose from: a drop-down list, checkbox, option button, etc.
I created a test formula and submitted it to the client; they enjoyed it thoroughly and decided to send me a file so I could 'copy & paste' my original code into their Excel file.
The problem is that, because I'm a bit inexperienced with VBA, once I click the button the user form doesn't show up.
Below is a Dropbox link to the original file I created with its original code, as well as the file that I am trying to copy it into.
Any help would be welcome and appreciated.
Link to dropbox: https://www.dropbox.com/sh/l1t37lz8uritrua/AAAdWPGvw0GDZ6hW4SwmbBdRa?dl=0
OriginalProject.xlsm has a form named honor_roll_form which contains 100 lines of code.
CopyOfOriginal.xlsm has a form named UserForm1 which contains no useful code.
I do not believe there is any method of directly copying user forms from one workbook to another. Instead:
1) Within VB Editor of OriginalProject.xlsm, select honor_roll_form.
2) Click File then Export File and save the form on your desktop or wherever you like.
3) You will now have two files on your desktop; one with an extension of frm and one with an extension of frx.
4) Within VB Editor of CopyOfOriginal.xlsm, click File then Import File.
5) Import honor_roll_form.frm
When I try clicking button "Honor Roll", I get "Method or data member not found" for project1Box. I will investigate after dinner (18:57 here) unless you tell me you already know why I am getting this error.
Extra comments in response to request from OP
It is late here but I have started looking down sub execute_button_Click within the second CopyOfOriginal.xlsm. I will comment on what I see even if it is not directly relevant to the non-execution of the macro.
If you open the VB Editor and look on the left you will see the Project Explorer. Near the top you will see:
Microsoft Excel Objects
Sheet1 (Sheet1)
I have always found this confusing. The first “Sheet1” is Excel’s Id for the worksheet and cannot be changed. The second “Sheet1” is the default name for the worksheet which can be changed. You can write Sheet1.Range("A1") or Worksheets("Sheet1").Range("A1"). That is: you can reference a worksheet by its Id or its name. You have named a variable of type Worksheet as Sheet1. Using Excel’s names as variable names can lead to bizarre errors so it is important to avoid doing anything like this.
It is better to always use meaningful names. At the moment, you know what Sheet1 means, but if you come back to this macro in six or twelve months, will you remember? I would use a variable as you have but I would name it WshtCis208 or WshtVBAProg or something similar.
Set ID = Range(Sheet1.Cells(2, 1), Sheet1.Cells(52, 1)) could be written as:
With WshtCis208
    ' The leading dot qualifies Range with the same worksheet as the cells.
    Set ID = .Range(.Cells(2, 1), .Cells(52, 1))
End With
Using With statements produces faster code and, almost always, code that is easier to read.
“52” is the current bottom row for this table. Will you amend the macro for them every time they add or remove a student? There are several techniques for finding the last row, none of which is perfect in every situation. The technique that is the most convenient most of the time is:
Const ColCis208Id as Long = 1
Const ColCis208MidTermExam as Long = 5
Dim RowCis208Last as Long
RowCis208Last = .Cells(.Rows.Count, ColCis208Id).End(xlUp).Row
At the moment, column 1 is the Id column. It is perhaps unlikely that the Id column will move, but it is very likely that some of the other columns will move when some new column is identified as useful. Do you want to scan the code trying to decide which 5s refer to the MidtermExam column when a Project3 column is added?
Constants allow you to name literals that might change. It makes your code easier to read and saves so much pain when a value changes.
.Rows.Count gives the number of rows in a worksheet for the current version of Excel so .Cells(.Rows.Count, ColCis208Id) identifies the bottom cell of column 1. End(xlUp).Row says go up until you hit a cell with a value and returns its row number. It is the VBA equivalent of Ctrl+Up.
The next statement subjectCount = … fails because projectBox does not exist on the form. You have changed the captions but not the names.
As far as I can see the form fails to execute because you have started updating it but have not finished.
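The last-row technique described above (start at the bottom cell of the column and go up until a cell with a value is hit) can be modelled outside Excel. This is only an illustrative Python sketch: the column is assumed to be a plain list of cell values with index 0 standing for row 1, and `last_used_row` is a hypothetical name. Unlike End(xlUp), which lands on row 1 for an empty column, this sketch returns 0 in that case.

```python
def last_used_row(column):
    """Return the 1-based row of the last non-empty cell, or 0 if there is none."""
    # Walk upwards from the bottom, as End(xlUp) does from the last cell.
    for row in range(len(column), 0, -1):
        if column[row - 1] not in (None, ""):
            return row
    return 0


# Made-up Id column: a header, three ids, then unused cells.
id_column = ["Id", 101, 102, 103, None, None, ""]
row_last = last_used_row(id_column)
```

Here `row_last` is 4, so a loop over the students would run from row 2 to row 4 instead of a hard-coded 52.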

Automating the mathematical tool

I have developed a calculator in a "TOOL.xlsx" file. It takes 4 inputs and returns 2 outputs. The calculation is performed on the calculator sheet in the "TOOL.xlsx" workbook. The 4 inputs correspond to 1 data set. I have another Excel file named "DATA.xlsx" that contains around 20,000 data sets (4 inputs per data set), and it also has an output column that collects the output. I would like to automate the interaction between "DATA.xlsx" and "TOOL.xlsx" so that the inputs for "TOOL.xlsx" are automatically pulled from "DATA.xlsx" and the output column in "DATA.xlsx" is filled with the outputs of "TOOL.xlsx". I would really appreciate any help because I am really stuck here. PS: I am new to VBA.
screenshot that illustrates my problem
I would need to see the formulas behind the input/output cells;
those formulas need to be put in the second sheet.
To give it a try, use
=SheetName!CellAddress in the second sheet's input/output cells

Excel VBA; Search rows grabbing values and pasting them into another worksheet

I have two workbooks;
(WB1) with two sheets; "Input" and "Output"
and
(MacroWB) with the macro and a "Column Header" list.
Example file: "Messy" sheet = input, "Organized" = output
https://drive.google.com/file/d/0B-leh2Ii2uh9bDBFbDBHbGcxbUU/view?usp=sharing
I need help coding a macro to do the following:
1) Create a loop to go through each row of the "Input" sheet searching for values matching cells in the "Column Header" list.
2) When a matching value is found; take the data from the cell immediately to its right (in the "Input" sheet) and paste it into the corresponding column of the "Output" worksheet.
3) Once every "Column Header" item has been searched/pasted for that row; move to the next row of the "Input" sheet. Rinse and repeat until all rows of the "Input" sheet have been searched/pasted.
Here is an example, the letters are to be column headers and the numbers are to be copied to the appropriate "Output" sheet column.
https://drive.google.com/file/d/0B-leh2Ii2uh9TXRGTnFDRU1jY0U/view?usp=sharing
Keep in mind that the actual data file has ~50 columns and ~3000 rows.
Also that the data is not all Letter/Numbers like the table above, it is more like the data in the linked .xlsx file.
If there is anything I haven't been clear about, please ask and I will try my best to clarify. Also I may be WAY over thinking this, if so.. please let me know.
THANK YOU ANYONE THAT CAN GET ME GOING IN THE CORRECT DIRECTION!!!
-Joe
Skip the VBA and use Text to Columns on the Data tab. I'm always copying HTML and it works 99% of the time. If the HTML is pretty and properly formatted you may get away with using the fixed width option; otherwise go for delimited and choose "tab". If tab doesn't work, try using spaces, assuming that your cells don't contain spaces.
The other option, which has worked for me on the rare occasions that Text to Columns doesn't, is simply pasting the text into Word, saving it as RTF, and then opening that in Notepad++ (which everyone should have). Copy from Notepad++ to Excel and that usually fixes the problem.
EDIT: If you right-click before pasting and click "Paste Special", this regularly helps with HTML pasting.
In your sample file, I used the following formula in A2 of Organized sheet (assumed 50 as max columns in Messy):
=IFERROR(OFFSET(Messy!A1,0,MATCH(Organized!A$1,OFFSET(Messy!A1,0,0,1,50), )),"")
Dragging it to H11 produced the following result:
The sample data is not complete, and some 'tags' in Messy sheet are not consistent (SiteID vs SITE_ID), but it should help you get started.
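The lookup that the question describes (find a header tag in a row, take the cell to its right, place it under that header) can be sketched in a few lines. This is a Python illustration with made-up sample rows, not a translation of the formula above; `organize` is a hypothetical name, and it assumes tags match the header text exactly, so inconsistent tags such as SiteID vs SITE_ID would need normalising first.

```python
def organize(messy_rows, headers):
    """For each input row, collect the value to the right of each header tag."""
    organized = []
    for row in messy_rows:
        out = []
        for header in headers:
            value = ""  # left blank when the tag is missing from this row
            # Stop one cell early: a tag in the last cell has nothing to its right.
            for i in range(len(row) - 1):
                if row[i] == header:
                    value = row[i + 1]
                    break
            out.append(value)
        organized.append(out)
    return organized


# Made-up sample: letters are tags, numbers are the values to move.
headers = ["A", "B", "C"]
messy = [
    ["B", 2, "A", 1],
    ["C", 3, "A", 4, "B", 5],
]
result = organize(messy, headers)
```

Each output row lines up with `headers`, so the first row becomes `[1, 2, ""]`. Note that a data value that happens to equal a header name would be mistaken for a tag.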

Convert xls File to csv, but extra rows added?

So, I am trying to convert some xls files to csv, and everything works great, except for one part. The SaveAs function in the Excel interop seems to export all of the rows (including blank ones). I can see these rows when I look at the file using Notepad. (All of the rows I expect, 15 rows with two single quotes, then the rest are just blank.) I then have a stored procedure that takes this csv and imports it into the desired table (this works on spreadsheets that have been manually converted to csv, e.g. open, File --> Save As, etc.).
Here is the line of code I am using for my SaveAs. I have tried xlCSV, xlCSVWindows, and xlCSVMSDOS as my file format, but they all do the same thing.
wb.SaveAs(aFiles(i).Replace(".xls", "B.csv"), Excel.XlFileFormat.xlCSVMSDOS, , , , False) 'saves a copy of the spreadsheet as a csv
So, is there some additional step/setting I need to keep the extraneous rows from showing up in the csv?
Note that if I open this newly created csv, and then click Save As, and choose csv, my procedure likes it again.
When you create a CSV from a Workbook, the CSV is generated based upon your UsedRange. Since the UsedRange can be expanded simply by having formatting applied to a cell (without any contents) this is why you are getting blank rows. (You can also get blank columns due to this issue.)
When you open the generated CSV all of those no-content cells no longer contribute to the UsedRange due to having no content or formatting (since only values are saved in CSVs).
You can correct this issue by updating your used range before the save. Here's a brief sub I wrote in VBA that would do the trick. This code would make you lose all formatting, but I figured that wasn't important since you're saving to a CSV anyway. I'll leave the conversion to VB.Net up to you.
Sub CorrectUsedRange()
    Dim values
    Dim usedRangeAddress As String
    Dim r As Range
    'Get UsedRange Address prior to deleting Range
    usedRangeAddress = ActiveSheet.UsedRange.Address
    'Store values of cells to array.
    values = ActiveSheet.UsedRange.Value
    'Delete all cells in the sheet
    ActiveSheet.Cells.Delete
    'Restore values to their initial locations
    Range(usedRangeAddress).Value = values
End Sub
Tested your code with VBA and Excel2007 - works nice.
However, I could replicate it somewhat, by formatting an empty cell below my data-cells to bold. Then I would get empty single quotes in the csv. BUT this was also the case, when I used SaveAs.
So, my suggestion would be to clear all non-data cells, then to save your file. This way you can at least exclude this point of error.
I'm afraid that may not be enough. It seems there's an Excel bug that makes even deleting the non-data cells insufficient to prevent them from being written out as empty cells when saving as csv.
http://answers.microsoft.com/en-us/office/forum/office_2010-excel/excel-bug-save-as-csv-saves-previously-deleted/2da9a8b4-50c2-49fd-a998-6b342694681e
Another way, without a script: hit Ctrl+End. If that ends up in a row AFTER your real data, then select the rows from the first one after your data down to at least the row this ends up on, right-click, and choose "Clear Contents".
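If the workbook itself cannot be cleaned up, a different approach from the ones above is to post-process the exported csv and drop the blank trailing rows before the import runs. A minimal Python sketch, assuming the blank rows consist only of empty or whitespace fields and that the file fits in memory; `drop_trailing_blank_rows` is a hypothetical helper, and the sample text is made up.

```python
import csv
import io


def drop_trailing_blank_rows(rows):
    """Return the rows with any all-empty rows at the end removed."""
    rows = list(rows)
    # A row counts as blank when every field is empty or whitespace.
    while rows and all(cell.strip() == "" for cell in rows[-1]):
        rows.pop()
    return rows


# Made-up export: two data rows followed by blank rows from an inflated UsedRange.
exported = "a,b\r\n1,2\r\n,\r\n,\r\n"
cleaned_rows = drop_trailing_blank_rows(csv.reader(io.StringIO(exported)))

# Write the cleaned rows back out as csv text.
out = io.StringIO()
csv.writer(out).writerows(cleaned_rows)
cleaned = out.getvalue()
```

For real files the same function could be fed from `csv.reader(open(path, newline=""))`. Blank rows in the middle of the data are left alone on purpose.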