openpyxl: How does one define the print area from a variable or other input, like max rows?

I am very new to Python in general, but I have some code that needs to define the print area for multiple Excel files. The code steps through thousands of files in a directory; they are all relatively similar but vary in row count.
I am trying to do something akin to:
worksheet.print_area = "['A1' : 'F' + str(max_rows)]"
The code above is not working because it is not a valid coordinate range; I am guessing print_area requires an exact range like A1:F45. But I need that last number to change depending on the max rows within the sheet.
Any and all help is appreciated.
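The attempt above puts the whole expression inside one string literal, so Python never evaluates 'F' + str(max_rows); the range text has to be built outside the quotes. A minimal sketch, assuming openpyxl's Worksheet.print_area accepts a range string such as "A1:F45" and that Worksheet.max_row (the last used row) is the row count wanted. The helper names, the fixed last column F and the folder argument are placeholders:

```python
from pathlib import Path


def print_area_for(max_row, last_col="F", first_cell="A1"):
    """Build an A1-style range such as 'A1:F45' from a row count."""
    return f"{first_cell}:{last_col}{max_row}"


def set_print_areas(folder, last_col="F"):
    """Hypothetical helper: set the print area on the active sheet of every .xlsx in folder."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    for path in Path(folder).glob("*.xlsx"):
        wb = load_workbook(path)
        ws = wb.active
        # max_row is the last row openpyxl sees as used on this sheet
        ws.print_area = print_area_for(ws.max_row, last_col)
        wb.save(path)
```

If the last column also varies, openpyxl.utils.get_column_letter(ws.max_column) could supply last_col. Keep in mind that max_row can include rows that only carry formatting, so it may reach past the visible data.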

Related

VBA Excel wildcard string search, copy and paste to new cell

I am looking for help in trying to do the following.
I have a large amount of data in Excel (100K+ rows) with many columns. In one of the columns (let's say column A), each cell holds a string of file path info indicating a Java installation. I need to search within that string for text that starts and ends the same way but has different data in between, so a wildcard is needed to identify it. After identifying that text in each cell of the specified column (column A), I need to copy and paste it into the column just to the right of the data (column B) on the same worksheet, for each row. It would look something like this:
Example:
| COLUMN A (original data string)          | COLUMN B (copy & paste to here) |
| C:\app\Java\jre7\bin\...                 | jre7                            |
| C:\Program Files (x86)\Java\jre6\bin\... | jre6                            |
| C:\app\JAVA\jdk1.7.0._21\jre\bin\...     | jdk1.7.0._21                    |
| C:\JAVA\JDK-1_5_0_16_i586\jre\bin\...    | JDK-1_5_0_16_i586               |
The wildcard search would need to pick up anything between a \ and the next \ that starts with the letter j, followed by two letters that could be re or dk, and then allow characters or numbers of any length after that. It would be something like \j??*#*\: always starting with the same first two characters, followed by two letters, then possibly another character or nothing at all before a version number, then no characters or many after the version number.
As you can see from the examples, I am trying to pick up the version info with version numbers in it, and I do not want to get directories named only jre or jdk, since most of my data have these directories listed somewhere in the file path string.
Then copying this info and pasting it into column B, as shown in the example, is what I am trying to do.
Any help would be greatly appreciated as this is a manual process that would benefit greatly from automating.
There's actually a really quick way to do this using formulas. You could combine all of these into one formula if you wish, but I spread it into four simple formulas, with the fourth giving you the answer.
Assuming the first string is in cell A1:
B1: =SEARCH("java",A1)
C1: =FIND("\",A1,B1)
D1: =FIND("\",A1,C1+1)
E1: =MID(A1,C1+1,D1-C1-1)
B1 finds where "java" starts (SEARCH is case-insensitive, so JAVA matches too), C1 finds the backslash after it, D1 finds the next backslash, and E1 returns the text between C1 and D1.
E1 will have your answer. Autofill down the columns, then Copy > Paste Special > Values in column E and delete columns B-D.
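The formulas rely on the version folder coming directly after the Java folder, which holds for all four examples. If the data ever departs from that, the wildcard rule from the question can be written as a regular expression instead. A sketch using Python's standard re module, under one reading of the rule (a segment between backslashes that starts with j, then re or dk, and contains at least one digit); the function name is hypothetical:

```python
import re

# A path segment between backslashes that starts with j + "re" or "dk"
# and contains at least one digit, so bare "\jre\" or "\jdk\" is skipped.
JAVA_DIR = re.compile(r"\\(j(?:re|dk)[^\\]*\d[^\\]*)\\", re.IGNORECASE)


def java_version_dir(path):
    """Return the first matching directory name, or None if there is none."""
    match = JAVA_DIR.search(path)
    return match.group(1) if match else None
```

Matching is case-insensitive, so JDK-1_5_0_16_i586 is found too, and the captured text keeps its original case.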

Grab and filter more than 255 columns from a huge closed workbook

I have a huge workbook (0.6 million rows and 315 columns) whose column names I need to grab into an array. Because of its size, I don't want to open and close the workbook just to copy the first row of the range. Also, I only want to grab the columns from the first row whose names begin with "Global ".
Can anyone help with a short code example of how to go about this? Please note I have tried ADOX, ADO, etc., but both hit the 255-column limitation. I also don't want to open the workbook, just pull the required "Global " columns from the 315 columns into an array.
Any help is much appreciated.
You can copy the first row of your target by opening a new workbook, and in A1 use this formula:
='C:\PATH_TO_TARGET\[TARGET_FILE_NAME.xlsx]WORKSHEET_NAME'!A1
Note that PATH+FILENAME+WORKSHEET is enclosed in single quotes, the FILENAME is enclosed in square brackets, and an exclamation mark separates the cell reference.
Then copy/paste or fill right to get the next 314 columns. Note: this formula will return zero for empty target cells.
Once you have the column headings, you can Copy > Paste Special > Values if you want to break the links to the closed workbook.
Hope that helps
You could use the Python programming language.
While Python does not natively work with XLSX files, you just have to install the external openpyxl module from here: https://pypi.python.org/pypi/openpyxl
(You will also have to install Python, of course; just download it from www.python.org.)
It will make working with your data in an interactive Python session a piece of cake, and opening the workbook without loading the Excel interface should take a fraction of the time you are expecting. (I think the workbook will have to fit in memory, though.)
This is all I had to type in an interactive Python 2 session to open a workbook and retrieve the column names that start with "bl":
import openpyxl
a = openpyxl.load_workbook("bla.xlsx")
[cell.value for cell in a.worksheets[0].rows[0] if cell.value.startswith("bl")]
output:
Out[8]: [u'bla', u'ble', u'bli', u'blo', u'blu']
The last input line requires some knowledge of Python to understand, so here is a summary of what happens: Python is very fond of working with sequences, and the openpyxl library presents your workbook as just that:
an object that is a sequence of worksheets, each worksheet having a rows attribute that holds a sequence of all the rows in the sheet, and each row being a sequence of cells. Each cell has a value attribute holding its contents (here, the header text).
The inline for expression (a list comprehension) is the compact form, but it could also be written as a multi-line statement:
In [10]: for cell in a.worksheets[0].rows[0]:
   ....:     if cell.value.startswith("bl"):
   ....:         print cell.value
   ....:
bla
ble
bli
blo
blu
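The transcript above comes from an old openpyxl release in which rows was an indexable tuple; in current releases rows is a generator, so rows[0] fails, and an empty header cell (None) would make .startswith raise. A sketch of the same idea for a current openpyxl (assumed 2.6 or later, for values_only), which reads only the first row in read-only mode so the 0.6-million-row sheet is not loaded up front. The function names and file name are hypothetical:

```python
def global_columns(header, prefix="Global "):
    """Return (column_number, name) pairs whose header starts with prefix.

    Column numbers are 1-based, like Excel's. Empty (None) and non-text
    header cells are skipped instead of raising AttributeError.
    """
    return [
        (number, name)
        for number, name in enumerate(header, start=1)
        if isinstance(name, str) and name.startswith(prefix)
    ]


def read_header(path):
    """Read only the first row of the first sheet, in read-only mode."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    wb = load_workbook(path, read_only=True)
    try:
        ws = wb.worksheets[0]
        return next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
```

Usage would be global_columns(read_header("bla.xlsx")); the column numbers it returns can then select the matching columns in later iter_rows calls.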
Keep in mind that by exploring Python a bit deeper, you can manipulate your data programmatically, which will be easier than working interactively given a data set this size. You can even use Python itself to load selected contents into an SQL database (including Python's built-in, single-file database, sqlite), where sophisticated indexes and queries can make working with your data a breeze.
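A sketch of that last idea using only the standard sqlite3 module. In practice the header and rows could come from openpyxl (for example ws.iter_rows(min_row=2, values_only=True) for the data rows), but here they are plain Python sequences; the table and function names are hypothetical:

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier so headers with spaces (e.g. 'Global X') are allowed."""
    return '"' + str(name).replace('"', '""') + '"'


def load_rows(conn, table, header, rows):
    """Create `table` with one untyped column per header name and insert `rows`."""
    columns = ", ".join(quote_ident(h) for h in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows
    )
    conn.commit()
```

Connecting with sqlite3.connect("data.db") instead of ":memory:" keeps the table in a single file, and CREATE INDEX on the columns you filter by then makes repeated queries fast without reopening the workbook.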

Find Lowest Value in CSV File Below A Certain Cell and Output to Excel Cell

I am looking to take data from a CSV file and check for the lowest value under a certain heading.
I would then like to write that value to a certain cell in an Excel file.
As an example:
|18| Distance (ft) | DTF-RL   |
|19| 69.63636364   | 31.05373 |
|20| 69.81818182   | 30.85291 |
|21| 70.61818182   | 31.85291 |
The value of interest would be 30.85291, the lowest value in the range.
I would like to write that value to a specific cell within an Excel document.
I would then like to write the value next to it (the distance) to another cell within the same Excel document.
I have very little programming experience and want to automate a repetitive task. I have no code as of yet; any direction would be much appreciated.
How can I implement this?
Thanks in advance.
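One way to approach this in Python, sketched with the standard csv module for the search and openpyxl for the write. Assumptions: the heading row contains "Distance (ft)" and "DTF-RL" spelled exactly like that (adjust to the real file), the numbers sit in the rows below it, and the target cells B2 and C2 are placeholders:

```python
import csv


def find_lowest(lines, value_heading="DTF-RL", partner_heading="Distance (ft)"):
    """Return (lowest value, partner value on the same row) below the heading row."""
    reader = csv.reader(lines)
    # Skip any preamble until the row holding both headings.
    for row in reader:
        cells = [cell.strip() for cell in row]
        if value_heading in cells and partner_heading in cells:
            v_idx = cells.index(value_heading)
            p_idx = cells.index(partner_heading)
            break
    else:
        raise ValueError("heading row not found")

    best = None
    # The reader continues from the row after the headings.
    for row in reader:
        try:
            value, partner = float(row[v_idx]), float(row[p_idx])
        except (IndexError, ValueError):
            continue  # skip blank or non-numeric rows
        if best is None or value < best[0]:
            best = (value, partner)
    if best is None:
        raise ValueError("no numeric rows below the heading row")
    return best


def write_result(xlsx_path, value, partner, value_cell="B2", partner_cell="C2"):
    """Write both numbers into the active sheet of an existing workbook."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    wb = load_workbook(xlsx_path)
    ws = wb.active
    ws[value_cell] = value
    ws[partner_cell] = partner
    wb.save(xlsx_path)
```

Usage would be to open the CSV with open(path, newline="") and pass the file object to find_lowest, then call write_result with the workbook path and the two numbers it returns (file names are up to you).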

Column references in formulas

I am a little stuck at the moment. I am working on an array of data and need to find a way to input column numbers into formulas.
- I have used the MATCH function to find the corresponding column number for a value,
e.g. "XYZ" matched with column 3, which is equivalent to C1:Cxxxxxx.
- Now, for inputting C1:Cxxxxxx into a formula to get data for that particular column, I would like to reference the "column 3" part directly, because I plan on using this workbook in the future and the column needed for the calculation may or may not be column 3 next time.
- Is there any way to use a formula to tell Excel which column to use in an equation?
In a little more detail, I have the formula
=AND(Sheet3!$C$1:$C$250000=$A$4,Sheet3!$B$1:$B$250000=$B$4)
instead of specifying to use column C, is there a way to use a formula to tell it to use C?
EDIT: additional info:
"I am basically running the equivalent of a SQL WHERE clause where foo and bar are true, and I want Excel to spit out a concatenated list of all baz values where foo and bar are true. Ideally it would ONLY return the baz values that are true, and then I would concatenate them together separately. The way I have it now, the expression tests every row separately to see if it is true; if there are 18K rows, there will be 18K separate tests. It works, but it's not very clean. The goal is to automate as much as possible. *I do not want to have to go in and change the column references every time I add a new data array.*"
Thanks
You can use INDEX. For example, if you have 26 possible columns from A to Z, this formula will give you your column C range (which you can use in another formula):
=INDEX(Sheet3!$A$1:$Z$250000,0,3)
The 0 indicates that you want the whole column, and the 3 indicates which column. If you want, the 3 can be generated by another formula, such as a MATCH function.
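The INDEX/MATCH combination is what removes the hard-coded column: the column is found by its header each time. The same lookup in Python, as an illustration only (the rows-as-lists layout and the function name are assumptions), in case the data is ever processed outside Excel:

```python
def column_by_header(rows, header_name):
    """Return every value below the header cell that equals header_name.

    rows is a list of rows (lists) with the header names in rows[0].
    list.index plays the role of MATCH(header_name, header_row, 0) and
    raises ValueError where MATCH would give #N/A; the comprehension
    plays the role of INDEX(range, 0, column).
    """
    col = rows[0].index(header_name)  # 0-based here; MATCH would return col + 1
    return [row[col] for row in rows[1:]]
```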
Note: be careful with AND in
=AND(Sheet3!$C$1:$C$250000=$A$4,Sheet3!$B$1:$B$250000=$B$4)
AND returns only a single result, not an array. If you want an array, you might need to use * like this:
=(Sheet3!$C$1:$C$250000=$A$4)*(Sheet3!$B$1:$B$250000=$B$4)
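The difference is easy to see outside Excel. A Python analogy (not Excel's own evaluation; the function names and sample values are illustrative): AND behaves like all() over both whole ranges and collapses to one value, while multiplying the two comparisons keeps one 1 or 0 per row:

```python
def and_style(col_c, col_b, a4, b4):
    """Like =AND(C=A4, B=B4) over whole ranges: a single TRUE/FALSE."""
    return all(c == a4 for c in col_c) and all(b == b4 for b in col_b)


def product_style(col_c, col_b, a4, b4):
    """Like =(C=A4)*(B=B4): one 1 or 0 per row."""
    return [int(c == a4) * int(b == b4) for c, b in zip(col_c, col_b)]
```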
You could use ADDRESS to generate the text; you then need INDIRECT, because you are passing a string rather than a range to the formula:
=AND(INDIRECT(ADDRESS(1,3,,,"Sheet3") & ":" & ADDRESS(250000,3))=$A$4
,INDIRECT(ADDRESS(1,2,,,"Sheet3") & ":" & ADDRESS(250000,2))=$B$4)
Obviously, replace the 3s and 2s in the ADDRESS formulas with the MATCH function you used to get the column number. The above assumes the column for $B$1:$B$250000 is also found using MATCH; otherwise it is just:
=AND(INDIRECT(ADDRESS(1,3,,,"Sheet3") & ":" & ADDRESS(250000,3))=$A$4
,Sheet3!$B$1:$B$250000=$B$4)
Note a couple of things:
- You only need to use "Sheet3" in the first part of the INDIRECT.
- Arguments 3 and 4 of ADDRESS are left as default, which means it returns an absolute reference ($C$1) in A1 style, as opposed to R1C1.
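The only non-trivial part of what ADDRESS produces is the column letters for a column number (3 to C, 27 to AA). A small Python sketch of that conversion and of the resulting absolute range text, for illustration; the function names are hypothetical, and sheet names containing spaces would need single quotes, which this sketch does not add:

```python
def column_letters(n):
    """Convert a 1-based column number to letters: 1 -> 'A', 27 -> 'AA'."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)  # base-26 digits with no zero digit
        letters = chr(ord("A") + rem) + letters
    return letters


def absolute_range(col, first_row, last_row, sheet=None):
    """Build text like 'Sheet3!$C$1:$C$250000' from a column number."""
    letters = column_letters(col)
    ref = f"${letters}${first_row}:${letters}${last_row}"
    return f"{sheet}!{ref}" if sheet else ref
```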
EDIT
Given the additional info, maybe an advanced filter would get you close to what you want. Good tutorial here. Set it up according to the tutorial to familiarise yourself with it, and then you can use some basic code to set it up automatically when you drop in a new dataset:
Paste in the dataset, then use VBA to get the range the dataset uses and apply the filter with something like:
Range("A6:F480").AdvancedFilter Action:=xlFilterInPlace, CriteriaRange:= _
Sheets("Sheet1").Range("A1:B3"), Unique:=False
You can also copy the results into a new table, though this has to be in the same sheet as the original data. My suggestion would be to paste your data into hidden columns on the left, put space for your criteria in rows 1:5 of the visible columns, and then have a button that gets the used range for your data, applies the filter and copies the data below the criteria:
Range("A6:F480").AdvancedFilter Action:=xlFilterCopy, _
    CriteriaRange:=Range("H1:M3"), CopyToRange:=Range("H6"), Unique:=False
The button would need to clear the destination cells first, you'd need to make sure you have enough hidden columns, etc., but it's all possible. Hope this helps.
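The question's EDIT describes a SQL-style WHERE foo AND bar followed by concatenating baz. Outside Excel, that is a filter plus a join. A Python sketch, with the column names foo, bar and baz taken from the question and a dict-per-row layout assumed; because columns are looked up by header name, a new data array with its columns in a different order needs no change:

```python
def to_records(header, rows):
    """Pair each data row with the header names, so columns are found by name."""
    return [dict(zip(header, row)) for row in rows]


def concat_where(records, criteria, result_field, sep=", "):
    """Join result_field from records matching every criterion (like WHERE ... AND ...)."""
    hits = [
        str(record[result_field])
        for record in records
        if all(record.get(field) == wanted for field, wanted in criteria.items())
    ]
    return sep.join(hits)
```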

Is there a way to execute VBA code contained in a cell?

I'm building a macro in Excel to run rules against a set of data and output whether each row passes or fails the rules. I want to be able to add, remove, or alter the rules without altering the macro. As such, I have a DATA worksheet and a RULES worksheet, and the macro generates the OUTPUT worksheet and then populates it.
RULES is set up so that each rule is listed on a separate row. For this to work, I need to be able to enter the actual VBA code for the rule on RULES, and then have the macro look at that column on RULES and execute the code in the cell.
Simplified example of my setup:
DATA has: ID, Dividend1, Dividend2, Divisor. There are n rows on DATA.
An example of a row on DATA might be ID="123", Dividend1=5, Dividend2=7, Divisor=35.
RULES has: Name, Formula, Threshold. For simplicity's sake, there is only one rule.
Let's set the rule as Name="Example", Formula=[see below], Threshold="0.15" (Threshold is used for conditional formatting in the macro; in this example it is unused).
I'm going to use pseudocode for Formula just to avoid explaining some irrelevant particulars of my macro so far. RULES.Formula should contain a line of VBA code that carries out:
If CurrentDATARow.Dividend1 = Empty Then
CurrentDATARow.Dividend2 / CurrentDATARow.Divisor
Else
CurrentDATARow.Dividend1 / CurrentDATARow.Divisor
End If
So, all of this explanation just to give context to this question: What can I do in the VBA of the macro to make it read the contents of RULES.Formula and make it execute that code inline with the rest of the macro?
If you have (say)
IF({dividend1}="",{dividend2}/{divisor},{dividend1}/{divisor})
stored in a "rule" cell (note: do not include the "="), you can use Replace() to replace the placeholders with the relevant cell addresses for each cell in the row you're checking.
Then use something like
Dim val
val=Sheet1.Evaluate(yourformulastring) 'evaluate vs. specific worksheet
If Not IsError(val) Then
'check against thresholds etc
End If
If the evaluation results in an error, you can test for it with IsError(val) as shown; otherwise it returns the result of the formula, which you can test against your "threshold" value cell. If you set background colors on your threshold cells, you can color each row according to which threshold was exceeded.
NOTE: without a worksheet qualifier, Evaluate calculates the formula based on the ActiveSheet, so if you don't use the qualifier, make sure the right sheet is active when this runs.
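The Replace() step amounts to plain text substitution. A Python sketch of the same substitution for one data row, assuming DATA keeps ID, Dividend1, Dividend2 and Divisor in columns A to D (so the placeholders map to B, C and D); the resulting string is what the VBA above would hand to Evaluate. The function name and the mapping are hypothetical:

```python
def build_formula(template, row, columns):
    """Replace {placeholder} tokens with cell addresses for one data row.

    columns maps a placeholder name to its column letter, e.g. {"dividend1": "B"}.
    Mirrors calling VBA's Replace() once per placeholder.
    """
    formula = template
    for name, letter in columns.items():
        formula = formula.replace("{" + name + "}", f"{letter}{row}")
    return formula
```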
You could store your tests/rules as Excel worksheet formulas in named ranges. Then you just call them from the cells.
See Ozgrid: Named Formulas.
If you give us some example data and the type of calculations or rules I can give you a couple of examples.