Sorting Excel with openpyxl - openpyxl

My English is not very good, sorry about that.
I am trying to sort a sheet using openpyxl and Python. I have read the documentation, but I don't quite understand this page. I want to sort my Excel file by several columns: first by column A, then by column B, then by column C, and finally by column F. My problem is how to use add_sort_condition and add_filter_column; the simplest approach would be great. An example would help me a lot!
import openpyxl

wb_read = openpyxl.load_workbook(filename)
sheet_read = wb_read.active  # replaces the deprecated get_active_sheet()
sheet_read.auto_filter.ref = 'A3:Q40000'
# Attempts so far:
# sheet_read.auto_filter.add_filter_column(2, ['id'], blank=False)  # didn't work
# sheet_read.auto_filter.add_sort_condition('A')  # how and where do I use the condition?
# sheet_read.auto_filter.add_sort_condition('A3:Q40000')  # doesn't seem to work either
wb_read.save('sorted.xlsx')
By the way, if I want to sort, do I have to use auto_filter?
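As far as I know, openpyxl's auto_filter only stores the filter and sort settings in the file; it does not reorder the cells itself, so the rows have to be sorted in Python and written back. Below is a minimal sketch of that approach, not a definitive implementation. It assumes the header is in row 3 and the data starts in row 4 (guessed from the 'A3:Q40000' range), that the sort columns contain no empty cells or mixed types, and that the sheet holds plain values rather than formulas or merged cells. The names sort_rows and sort_sheet are hypothetical helpers, not part of openpyxl.

```python
from operator import itemgetter

# Zero-based indexes of the sort columns A, B, C and F, in priority order.
SORT_COLUMNS = (0, 1, 2, 5)


def sort_rows(rows, columns=SORT_COLUMNS):
    """Return the rows sorted by the given columns, first column first."""
    return sorted(rows, key=itemgetter(*columns))


def sort_sheet(path_in, path_out, first_data_row=4):
    """Sort the data rows of the active sheet and save to a new file."""
    import openpyxl  # imported here so sort_rows works without openpyxl

    wb = openpyxl.load_workbook(path_in)
    ws = wb.active
    # Read every data row as a tuple of plain cell values.
    rows = list(ws.iter_rows(min_row=first_data_row, values_only=True))
    # Write the sorted values back over the same cells.
    for r, row in enumerate(sort_rows(rows), start=first_data_row):
        for c, value in enumerate(row, start=1):
            ws.cell(row=r, column=c, value=value)
    wb.save(path_out)
```

Calling sort_sheet(filename, 'sorted.xlsx') would then write the sorted copy. Only the cell values move in this sketch; any per-cell styling stays where it was.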

Related

Pandas Dataframe unable to reference column name in Plotly Dash Dropdown when set to reference in google sheets column

I am currently working on a Plotly Dash web application and creating a dropdown that references a column from a pandas DataFrame I am reading in from a CSV file.
The issue is that the dropdown cannot read the column, and I have found this is because the column is actually a reference to another sheet, i.e. =RawData!A1.
I have managed to print the column, so I know it exists in the DataFrame and all the data prints correctly. However, Plotly Dash does not populate the dropdown with the labels and values. My current line of code is:
options=[{'label': i, 'value': i} for i in df.CategoryName.unique()],
CategoryName in Google Sheets refers to =RawData!A1.
What I have tested:
Amended my sheet name to read directly from my RawData sheet, and it works fine. This is not the solution I want, though; it led me to see that the issue was with reading from the referenced column.
Attempted using the column index instead:
options=[{'label': i, 'value': i} for i in df.iloc[:,1].unique()],
Again, this worked for printing but not for populating the dropdown in Plotly Dash.
Any advice will be greatly appreciated!
Adding some data cleaning in pandas to remove the rows at the bottom of my dataset fixed my issue.
I dropped the rows where my CategoryName column is NaN, and after that my dropdown worked:
df = df[df['CategoryName'].notna()]
The reason it worked makes sense: I set up my copy formula as =RawData!A:A, and my RawData currently comprises only 123 rows, so by row 125 the reference sheet was referencing blank cells. That caused the dropdown to raise an error about a reference to something that does not exist. A very funny error, but logical at the same time. Not sure if this will help many people, but hopefully it will assist somebody!
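The same clean-up can also be done while building the options list. The following is only a sketch: dropdown_options is a hypothetical helper name, and it assumes the missing entries arrive as None or NaN, which is how pandas normally represents blank cells.

```python
def dropdown_options(values):
    """Build label/value dicts from unique, non-missing values.

    Keeps the first-seen order and skips None and NaN entries.
    """
    seen = set()
    options = []
    for value in values:
        # NaN is the only value that is not equal to itself.
        if value is None or value != value:
            continue
        if value not in seen:
            seen.add(value)
            options.append({'label': value, 'value': value})
    return options


# Hypothetical sample column with trailing blanks, as in the question.
sample = ['Fruit', 'Dairy', 'Fruit', float('nan'), None]
print(dropdown_options(sample))
```

With a DataFrame this could be used as options=dropdown_options(df['CategoryName']), since iterating over a pandas Series yields its values.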

Matching against ID and copying cells of other column as a block to another sheet

I got a Sheet1 looking like this:
What I want to do is copy the amounts where the ID matches the one I am looking for (here 3) into another sheet.
So my result in Sheet 2 would look like this:
Any ideas on how to do that? I think Excel VBA is necessary (it would be awesome if it's not, though), and I also thought about using an SQL expression in VBA.
Thanks
When the IDs are in B3:B10, the amounts are in D3:D10, and the value to find is in B13, this array formula returns multiple matches:
{=IFERROR(INDEX(D$3:D$10, SMALL(IF(B$3:B$10=$B$13, ROW(B$3:B$10)-MIN(ROW(B$3:B$10))+1), ROW(E$3:E$10)-2)), "")}
Note that the last range, E$3:E$10, can be in any column. This method is from this article, which explains the formula very clearly.
Also, as I commented, in Google Sheets you can achieve this with the simpler query() function:
=query($B:$D, "select D where B = "&$B$13)
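For readers who prefer to see the matching logic spelled out, here is the same idea as a small Python sketch. The function name amounts_for_id and the sample data are made up for illustration; they are not taken from the question's sheet.

```python
def amounts_for_id(rows, wanted_id):
    """Return the amounts of all rows whose ID equals wanted_id, in sheet order."""
    return [amount for row_id, amount in rows if row_id == wanted_id]


# Hypothetical sample data: (ID, amount) pairs standing in for Sheet1.
sheet1 = [(1, 10), (3, 25), (2, 7), (3, 40), (3, 5)]
matches = amounts_for_id(sheet1, 3)
print(matches)  # [25, 40, 5]
```

Like the array formula, this returns every match as one block rather than stopping at the first hit.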

How do I get more information from an additional column if the first one does not have it?

I would just like to say that I'm very new to VBA and more complicated formulas, so all help will be appreciated! Thank you!
To clarify the title a bit more: I currently have a macro that writes a formula to pull information from another worksheet. Here is an example of the formula that is working:
=IF(A2 = ""No Specific Program"", A2,IF(F2 = """",""No PIN"",IFERROR(VLOOKUP(CONCATENATE(A2,F2),....!$C:$I,4,FALSE),""NO DATA"")))
This is the original formula that I'm using to get the information. Column A is my Parts Owned by Program, and column B is the actual Program. When I run the macro it gives me most of the information, but when it runs into "No Specific Program" in column A, it returns "No Specific Program" even when column B shows which program the part is in.
Also, for reference, the F2 it is concatenating is a PIN number, which helps determine who owns the part.
I've been stumped trying to get this to work. I've tried placing THEN and ELSE within the statement, and it just returns FALSE.
EDIT:
The code above works; the problem appears when I use this version of the code:
=IF(A2 = ""No Specific Program"",THEN,IF(F2 = """",""No PIN"",IFERROR(VLOOKUP(CONCATENATE(A2,F2),'.....'!$C:$I,4,FALSE),""NO DATA"",Else,IF(A2 = ""No Specific Program"",THEN,IF(F2 = """",""No PIN"",IFERROR(VLOOKUP(CONCATENATE(A2,F2),'.....'!$C:$I,4,FALSE),""NO DATA"")))))
I get FALSE or errors when I try different variations. Here is an example of the columns. Column A is where the original formula reads from, but it says No Specific Program while column B shows the program. So I'm trying to get the formula to read column B as well as column A to capture all the information I need:
Columns Example
EDIT:
It starts breaking after the ELSE statement.
Edit:
=IF(A2 = ""No Specific Program"",
IF(F2 = """",""No PIN"",IFERROR(VLOOKUP(CONCATENATE(B2,F2),'\NW\Data\TechIntegration\Sustaining Team\Data Mining\DataMining[GAD_PIN_TABLE.xlsx]Sheet1'!$C:$I,5,FALSE),""NO GAD DATA"",
IF(F2 = """",""No PIN"",IFERROR(VLOOKUP(CONCATENATE(A2,F2),'\NW\Data\TechIntegration\Sustaining Team\Data Mining\DataMining[GAD_PIN_TABLE.xlsx]Sheet1'!$C:$I,5,FALSE),""NO GAD DATA"")))))
Just trying to make it easier to see the formula.
I'd split this up to make it simpler to follow: use a holding cell, then refer to that in your top formula (the one you already know works).
Put this in another column, say Z, then change every reference to A2 in your working formula to Z2:
=IF(A2=""No Specific Program"",IF(B2=""No Specific Program"",""No Specific Program"",B2),A2)
This will only give you "No Specific Program" if both A2 and B2 contain "No Specific Program", which I think is what you're after. For the second example in the columns example link, it will return NG.
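If it helps to see the fallback outside Excel, the holding-cell logic can be sketched in Python like this. The name effective_program is a hypothetical helper chosen for illustration.

```python
NO_PROGRAM = "No Specific Program"


def effective_program(col_a, col_b):
    """Mirror the holding-cell formula.

    Use column A unless it holds the placeholder, in which case
    fall back to column B.
    """
    if col_a == NO_PROGRAM:
        # Still the placeholder when both columns hold it.
        return col_b
    return col_a


print(effective_program(NO_PROGRAM, "NG"))
print(effective_program("NG", NO_PROGRAM))
print(effective_program(NO_PROGRAM, NO_PROGRAM))
```

The nested IF in the formula collapses to a single check here, because returning column B already yields the placeholder when both columns contain it.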

grab and filter from more than 255 columns from a huge closed workbook

I have a huge workbook (0.6 million rows and 315 columns) whose column names I need to grab into an array. Due to the huge size, I don't want to open and close the workbook to copy the first row of the range. Also, I only want to grab the columns from the first row that begin with the word "Global ".
Can anyone help with a short code example of how to go about doing this? Please note that I have tried ADOX, ADO, etc., but both show the 255-column limitation. I also don't want to open the workbook, just pull the required "Global " columns from the 315 columns into an array.
Any help is most appreciated.
You can copy the first row of your target by opening a new workbook, and in A1 use this formula:
='C:\PATH_TO_TARGET\[TARGET_FILE_NAME.xlsx]WORKSHEET_NAME'!A1
Note that PATH+FILENAME+WORKSHEET is enclosed in single quotes, the FILENAME is enclosed in square brackets, and an exclamation mark separates it from the cell reference.
Then copy/paste or fill right to get the next 314 columns. Note: this formula will return zero for empty target cells.
Once you have the column headings, you can copy them and use Paste Special > Values if you want to break the links to the closed workbook.
Hope that helps
You could use the Python programming language.
While it does not natively work with XLSX files, you just have to install the external openpyxl module from here: https://pypi.python.org/pypi/openpyxl
(You will also have to install Python, of course; just download it from www.python.org)
It will make working with your data in an interactive Python session a piece of cake, and the time to open the workbook without having to load the Excel interface should be a fraction of what you are expecting. (I think it will have to fit in your memory, though).
But this is all I had to type in an interactive Python 2 session to open a workbook and retrieve the column names that start with "bl":
import openpyxl
a = openpyxl.load_workbook("bla.xlsx")
[cell.value for cell in a.worksheets[0].rows[0] if cell.value.startswith("bl")]
output:
Out[8]: [u'bla', u'ble', u'bli', u'blo', u'blu']
The last input line requires one to know Python to be understood, so here is a summary of what happens: Python is a language very fond of working with sequences, and the openpyxl library gives you your workbook as just that:
an object which is a sequence of worksheets, each worksheet having a rows attribute which is a sequence of all rows in the sheet, and each row being a sequence of cells. Each cell has a value attribute which holds its content.
The inline for statement is the compact form, but it could be written as a multi-line statement:
In [10]: for cell in a.worksheets[0].rows[0]:
   ....:     if cell.value.startswith("bl"):
   ....:         print cell.value
   ....:
bla
ble
bli
blo
blu
Keep in mind that by exploring Python a bit deeper, you can programmatically manipulate your data in a way that will be easier than doing it interactively, given a data set this size. You can even use Python itself to load selected contents into an SQL database (including its built-in, single-file database, sqlite), where sophisticated indexes and queries can make working with your data a breeze.
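The session above is from Python 2 with an older openpyxl. In recent openpyxl releases, as far as I know, the rows attribute is a generator and cannot be indexed with [0]. A hedged sketch for Python 3 follows, using read-only mode so the whole sheet is not loaded into memory. The names select_headers and read_header_row are hypothetical helpers, and the "Global " prefix comes from the question.

```python
def select_headers(headers, prefix="Global "):
    """Keep only the text headers that start with the given prefix."""
    return [h for h in headers if isinstance(h, str) and h.startswith(prefix)]


def read_header_row(path):
    """Return the values of row 1 of the first worksheet."""
    import openpyxl  # imported here so select_headers works without openpyxl

    # read_only=True streams the sheet instead of loading every cell.
    wb = openpyxl.load_workbook(path, read_only=True)
    try:
        ws = wb.worksheets[0]
        rows = ws.iter_rows(min_row=1, max_row=1, values_only=True)
        return list(next(rows))
    finally:
        wb.close()
```

Usage would be select_headers(read_header_row("bla.xlsx")). The isinstance check skips empty or numeric header cells, which would otherwise make startswith fail.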

How do you use the value of a named cell in a macro in a different sheet, same workbook?

I'm using Excel 2003 and am totally confused; I'm just not getting it. Can anyone help? I need to compare that named value with a number of column headings on the active sheet and then insert a column to the left of the column holding the matching text. I'm sure that bit is very hard, but I'm not even able to get started here...
Well, to get the column of your named value (which I presume is a named range...), you would use:
ActiveWorkbook.Sheets("mySheet").Range("myRange").Column
So you could do something like:
myNamedRange = ActiveWorkbook.Sheets("mySheet").Range("myNamedRange").Value
myCol = ActiveWorkbook.Sheets("myMainSheet").Rows("1:1").Find(myNamedRange).Column
ActiveWorkbook.Sheets("myMainSheet").Cells(1, myCol).EntireColumn.Insert