Pandas Dataframe unable to reference column name in Plotly Dash Dropdown when set to reference in google sheets column - pandas

I am building a Plotly Dash web application and creating a dropdown that references a column from a pandas dataframe I read in from a CSV file.
The issue is that the dropdown cannot read the column, and I have seen that this is because the column is actually a reference to another sheet, i.e. =RawData!A1.
I have managed to print the column, so I know it exists in the dataframe and all the data prints correctly. However, Plotly Dash does not populate the dropdown with the labels and values. My current line of code is:
options=[{'label': i, 'value': i} for i in df.CategoryName.unique()],
The CategoryName column in Google Sheets refers to =RawData!A1
What I have tested:
Amended my sheet name to read directly from my RawData sheet, and it works fine. This is not a solution that I want though; it led me to see that the issue was with reading from the referenced column.
Attempted using column index instead:
options=[{'label': i, 'value': i} for i in df.iloc[:,1].unique()],
Again, this worked for printing but not for populating the dropdown in Plotly Dash.
Any advice will be greatly appreciated!

Adding some data cleaning in pandas to remove the rows at the bottom of my dataset fixed my issue.
I dropped the rows where my CategoryName column is NaN, and with that my dropdown worked:
df = df[df['CategoryName'].notna()]
The reason it worked makes sense: I set up my copy formula as =RawData!A:A, and my RawData sheet currently comprises only 123 rows, so by row 125 the reference sheet was pointing at blank cells. That caused the dropdown to error out on a reference to something that does not exist. A funny error, but logical at the same time. Not sure if this will help many people, but hopefully it will assist somebody!
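The fix above can be reproduced without Dash or Google Sheets. The sketch below is only an illustration: the sample values are made up, and the trailing NaN rows stand in for what the blank referenced cells presumably become once the CSV is read by pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV export: the trailing NaN rows mimic
# =RawData!A:A pointing past the last populated row of RawData.
df = pd.DataFrame({"CategoryName": ["Fruit", "Dairy", "Fruit", np.nan, np.nan]})

# Drop the empty rows first, then build the label/value pairs.
clean = df[df["CategoryName"].notna()]
options = [{"label": c, "value": c} for c in clean["CategoryName"].unique()]

print(options)
```

Without the notna() filter, unique() would also return NaN, and the options list would carry an entry with a NaN label and value.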

Related

Error in appending first row of a dataframe to a csv file template

I am trying to append a dataframe to an existing CSV template file.
The problem is with the first row of data only: instead of being written on the second row, below the column headings, it is written on the same line as the headings, with the last column heading merged with the first row's data. I have been unable to resolve it and am seeking help.
import shutil, pandas as pd
original_File = 'File1.csv'
target_File = "File2.csv"
shutil.copyfile(original_File, target_File)
FinalDF = pd.DataFrame()
FinalDF["Item"]= None
FinalDF["Value"] = None
FinalDF.loc["0"] = ["Book",300]
FinalDF.loc["1"] = ["Calculator",1000]
print(FinalDF)
FinalDF.to_csv('File2.csv',mode='a', index=False, header=False)
The issue was with the template file. When the template CSV file was opened on a MacBook, the cursor stayed active in the last column heading cell even after the file was closed.
When the dataframe is appended to File2, which is a replica of File1, the first row continues from the last column heading cell, but the succeeding rows are appended normally.
Hence I opened the same CSV file on a Windows system and, to be on the safe side, deleted all extra columns and rows in the template file and saved it.
After that I no longer faced this issue.
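A first appended row glued to the header line is also the symptom of a file whose last line has no newline at the end. The answer does not confirm that this was the cause here, so treat the following as a sketch under that assumption. The helper name ensure_trailing_newline is hypothetical, and the sample uses the standard csv module rather than pandas to keep it self-contained.

```python
import csv
import os
import tempfile


def ensure_trailing_newline(path):
    """Append a newline to the file at `path` if its last byte is not one."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return  # empty file, nothing to fix
        f.seek(-1, os.SEEK_END)
        if f.read(1) != b"\n":
            f.seek(0, os.SEEK_END)
            f.write(b"\n")


# Build a template whose header line lacks a final newline.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "File2.csv")
with open(target, "w", newline="") as f:
    f.write("Item,Value")

ensure_trailing_newline(target)

# Append rows the same way to_csv(mode='a', header=False) would.
with open(target, "a", newline="") as f:
    writer = csv.writer(f, lineterminator="\n")
    writer.writerows([["Book", 300], ["Calculator", 1000]])

with open(target, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Calling the helper on the copied template before FinalDF.to_csv(..., mode='a') would guard against the same problem regardless of which editor last saved the file.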

iterating through a dataframe column to see if individual cell value is in a list of file paths

I have a list of shapefile paths (sf_list) and I'm using a nested for loop to iterate through the list and a dataframe (df) to see if a value in the row of a column (name) of that dataframe is in that path, and if it is, append that list value to that row of the dataframe in a new column of the dataframe (sf_path). This is what I have now:
for sf in sf_list:
    for row in df.iterrows():
        df.loc[df['name'].isin(sf),'sf_path'] = [sf]
The script runs, but the new column is empty. The list is populated with all of the paths I need, and the column of that dataframe contains specific text that appears in the path I want to populate in that row of the new column. Any direction appreciated.
UPDATE:
Alright now I have:
for sf in sf_list:
    for row in dlrules_df.iterrows():
        dlrules_df.loc[dlrules_df['dl_foldername'] in sf, 'sf_path'] = sf
Error returned:
TypeError: 'in <string>' requires string as left operand, not Series
Can you give this a try? apply isn't recommended, but it has become quite a habit for me. I would like to spend more time on a more efficient solution, but it's already bedtime here and this came off the top of my head.
sf = [list_of_folder_paths]
dlrules_df.loc[:, 'dl_foldername'].apply(lambda x: sf[sf.index(x)] if x in sf else None)
PS: Not tested, so it may break somewhere but I hope it gives you some idea.
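One thing to watch in the untested snippet above: on a list, x in sf checks whether x equals a whole element, not whether it occurs inside one of the paths, and the result of apply is not assigned anywhere. A minimal sketch of the substring version follows. The sample paths and folder names are made up, and first_path_containing is a hypothetical helper, not something from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for sf_list and dlrules_df.
sf_list = [
    "/data/roads_2020/roads.shp",
    "/data/rivers_2020/rivers.shp",
]
dlrules_df = pd.DataFrame(
    {"dl_foldername": ["rivers_2020", "roads_2020", "lakes_2020"]}
)


def first_path_containing(name, paths):
    """Return the first path that contains `name` as a substring, else None."""
    return next((p for p in paths if name in p), None)


# Extra keyword arguments to Series.apply are forwarded to the function.
dlrules_df["sf_path"] = dlrules_df["dl_foldername"].apply(
    first_path_containing, paths=sf_list
)
print(dlrules_df)
```

Rows whose folder name appears in no path end up with a missing value in sf_path, which makes unmatched entries easy to spot with isna().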

How to append to an existing sheet and empty a dataFrame for new data

This might be a noob question, but how can I append to a sheet instead of overwriting the existing data? And how can I then empty a dataFrame so it can be populated with new data?
Basically, I am reading in a file, populating a dataFrame, writing it to a sheet, and then emptying the dataFrame so it is ready to read in new data.
I am stuck at emptying the dataFrame:
avgs = avgs.drop(['Period start','Period end','zone','usid','site id','rank','Total LCQI Impact','LTE BLOCK Impact','LTE DROP Impact','LTE TPUT Impact','engineer notes'],axis=1)
And appending to the sheet.
avgs.to_excel("pandas_out.xlsx",merge_cells=False) ## need to append to file
You can use avgs = pd.DataFrame() to empty the existing dataframe. If you want to preserve the column names, try avgs = pd.DataFrame(columns=avgs.columns) instead.
As for appending to the sheet, there are many ways to do it, but they all follow the same steps: first pd.read_excel(), then append the new rows, and finally df.to_excel() again. For ways to append, refer to pd.concat(); pd.Series.append() and pd.DataFrame.append() also did this in older pandas but were removed in pandas 2.0.
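The read, append, write cycle described above can be sketched in memory. The column names and values below are made up for illustration, and the sheet variable stands in for what pd.read_excel() would return from the existing workbook.

```python
import pandas as pd

# First batch; `sheet` stands in for the rows already stored in the workbook.
avgs = pd.DataFrame({"site": ["A", "B"], "avg": [1.5, 2.5]})
sheet = avgs.copy()

# Empty the working frame while preserving its column names.
emptied = pd.DataFrame(columns=avgs.columns)
print(len(emptied), list(emptied.columns))

# The next batch takes the place of the emptied frame and is then
# appended to the existing sheet rows.
avgs = pd.DataFrame({"site": ["C"], "avg": [3.5]})
sheet = pd.concat([sheet, avgs], ignore_index=True)
print(sheet)
```

Writing the combined frame back would then be something like sheet.to_excel("pandas_out.xlsx", index=False), which needs an Excel engine such as openpyxl installed.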

grab and filter from more than 255 columns from a huge closed workbook

I have a huge workbook (0.6 million rows and 315 columns) whose column names I need to grab into an array. Due to the huge size, I don't want to open and close the workbook to copy the 1st row of the range. Also, I only want to grab the columns from the 1st row that begin with the word "Global ".
Can anyone help with a short code example of how to go about doing this? Please note I have tried ADOX, ADO etc., but both hit the 255-column limitation. I also don't want to open the workbook, only pull the required "Global " columns from the 315 columns into an array.
Any help is most appreciated.
You can copy the first row of your target by opening a new workbook, and in A1 use this formula:
='C:\PATH_TO_TARGET\[TARGET_FILE_NAME.xlsx]WORKSHEET_NAME'!A1
Note that PATH+FILENAME+WORKSHEET is enclosed in single quotes, the FILENAME is enclosed in square brackets, and an exclamation mark separates it from the cell reference.
Then copy/paste or fill right to get the next 314 columns. Note: this formula will return zero for empty target cells.
Once you have the column headings, you can copy/paste_special_values if you want to destroy the links to the closed workbook.
Hope that helps
You could use the Python programming language.
While it does not work with XLSX files out of the box, you just have to install the external openpyxl module from here: https://pypi.python.org/pypi/openpyxl
(You will also have to install Python, of course. Just download it from www.python.org)
It will make working with your data in an interactive Python session a piece of cake, and the time to open the workbook without having to load the Excel interface should be a fraction of what you are expecting. (I think it will have to fit in your memory, though.)
This is all I had to type in an interactive Python 2 session to open a workbook and retrieve the column names that start with "bl":
import openpyxl
a = openpyxl.load_workbook("bla.xlsx")
[cell.value for cell in a.worksheets[0].rows[0] if cell.value.startswith("bl")]
output:
Out[8]: [u'bla', u'ble', u'bli', u'blo', u'blu']
The last input line requires one to know Python to be understood, so here is a summary of what happens: Python is a language very fond of working with sequences, and the openpyxl library gives you your workbook as just that:
an object which is a sequence of worksheets, each worksheet having a rows attribute which holds a sequence of all rows in the sheet, and each row being a sequence of cells. Each cell has a value attribute which is the text within it.
The inline for statement is the compact form, but it could be written as a multi-line statement:
In [10]: for cell in a.worksheets[0].rows[0]:
   ....:     if cell.value.startswith("bl"):
   ....:         print cell.value
   ....:
bla
ble
bli
blo
blu
Keep in mind that by exploring Python a bit deeper, you can manipulate your data programmatically, which will be easier than working interactively given a data set this size. You can even use Python itself to load selected contents into an SQL database (including its built-in, single-file database, sqlite), where sophisticated indexes and queries can make working with your data a breeze.
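The session above is Python 2 with an older openpyxl. As far as I know, current openpyxl releases expose rows as a generator, so rows[0] no longer works, and cell.value.startswith fails on empty or numeric header cells. The sketch below is an assumption-laden update rather than the answerer's code: the function names are hypothetical, and it relies on load_workbook(read_only=True) and iter_rows(values_only=True) to stream just the first row instead of loading the whole sheet.

```python
def headers_starting_with(headers, prefix):
    """Keep header values that are strings beginning with `prefix`."""
    return [h for h in headers if isinstance(h, str) and h.startswith(prefix)]


def read_header_row(path, sheet_index=0):
    """Read only the first row of a worksheet, streaming the file."""
    import openpyxl  # third-party; imported here so the filter works without it

    wb = openpyxl.load_workbook(path, read_only=True)
    try:
        ws = wb.worksheets[sheet_index]
        first_row = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
        return list(first_row)
    finally:
        wb.close()


# Hypothetical header values, shaped like what read_header_row() returns.
sample = ["Global Sales", "Region", None, "Global Cost", 42]
print(headers_starting_with(sample, "Global "))
```

With a real file, the two steps chain as headers_starting_with(read_header_row("bla.xlsx"), "Global "), where the file name is the one from the session above.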

Inconsistent recognition of ranges (No errors thrown)

This is code from Excel 2010. All of the code resides within the workbook itself. Information is gathered using internal forms and the code is run.
I have code that retrieves data from a spreadsheet and populates an object with that data. The row number is dynamic as it is dependent on the form input. The column is by the header, not the column number. The following code works perfectly but for two anomalies:
cTank.RowForTankSpecs = rNum
cTank.MP = .Cells(rNum, Range("MP").Column).Value
cTank.Form = .Cells(rNum, Range("formName").Column).Value
cTank.TankProcess = .Cells(rNum, Range("Process").Column).Value
cTank.Location = .Cells(rNum, Range("Location").Column).Value
cTank.TankName = .Cells(rNum, Range("Tanks").Column).Value
cTank.tankID = .Cells(rNum, Range("TankID").Column).Value
First:
The cTank.TankName is retrieving information from a column named "Tanks". That column does not exist. The actual column header is "Tank". But, it is retrieving the correct information. If I change the name to what it really is (Tank), it does not work.
Second:
When the cTank.TankID line is executed, I get the following error on the Range("TankID"):
Runtime Error 1004: Method 'Range' of object '_Global' failed
This one has the appropriate header (column header), but it is not recognizing the range.
I have tried simple things such as changing the order of the code, but it doesn't help. As earlier stated, the other lines work. Later in the program, information is gathered in the same manner but using another worksheet from the same workbook, and none of them are working. I've double checked that strings are strings and integers are integers, etc. I've double checked the column headers match the range names. Nothing seems to jump out at me.
I would appreciate any input you may have on the situation.
Thanks in advance.
Steve
OK. Being pretty sure my code was correct, I went to the spreadsheet itself. For some reason it was recognizing only certain columns, and it was recognizing one of them incorrectly. So I started selecting the columns that worked and the columns that didn't. What I noticed was that for the columns that were being recognized, the column header was displayed where the cell location is normally displayed (the Name Box), whereas for the columns that were not being recognized, the cell location of the header (i.e. A1, A2, etc.) was displayed and not the header title itself. The incorrect label was showing up for one of them. As it turns out, the mislabeled column was one that I had used for a form dropdown menu. I checked the Name Manager, and the ones that were working were listed there. So, using the Name Manager, I added named ranges using the headers. Now, when I select the columns, the column header (named range) appears in that box, and the code works.
Thanks guys for your input. I really appreciate it.
Two things you can do:
Do not use Range; since you appear to be using names, use Names("Yourname").RefersToRange.
OR
Make sure your names are set up correctly using the Name Manager on the Formulas ribbon.