Create a spreadsheet with multiple pivoted tabs using Pandas

The following code successfully creates a spreadsheet from a pivot table:
pivot_table_name.to_excel('spreadsheet_file_name.xlsx')
But now I need to be able to create more than one tab. Other posts have recommended using code like this:
for row in dataframe_to_rows('table_name', index=False, header=True):
    ws.append(row)
But that is for a "normal" table/dataframe, i.e. not pivoted, so if I try that I will of course get the expected:
AttributeError: 'str' object has no attribute '_data'
Is there a way to do this using Pandas?
Also, in case it is helpful in any way: I am currently using Papermill to execute multiple scripts, so if separate processing is required that is not a problem. Not everything needs to happen in the same script.

If you're trying to add multiple sheets in the same workbook, this should work.
with pd.ExcelWriter('spreadsheet_file_name.xlsx') as writer:
    pivot_table_name.to_excel(writer, sheet_name='Sheet_name_1')
    pivot_table_name2.to_excel(writer, sheet_name='Sheet_name_2')
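Since pivot_table returns an ordinary DataFrame, the same pattern covers the pivoted case. A self-contained sketch (the sample frame, column names, and sheet names here are made up):

import pandas as pd

# Hypothetical sample data standing in for the real pivot sources
df = pd.DataFrame({
    'region': ['N', 'N', 'S', 'S'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [10, 20, 30, 40],
})

pivot1 = df.pivot_table(index='region', columns='product', values='sales')
pivot2 = df.pivot_table(index='product', values='sales', aggfunc='sum')

# One workbook, one sheet per pivot table
with pd.ExcelWriter('spreadsheet_file_name.xlsx') as writer:
    pivot1.to_excel(writer, sheet_name='Sheet_name_1')
    pivot2.to_excel(writer, sheet_name='Sheet_name_2')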

Related

Robotframework - how to fix this deprecated functionality ${row.find_elements_by_tag_name('td')}

At some point there was an update to some functionality, and now the following doesn't work in Robot Framework anymore:
${row.find_elements_by_tag_name('td')}
I can't seem to fix this and can't find much support.
I believe the new way is done using By.TAG_NAME or something like that.
For context, I'm getting the table rows (tr) and storing them in a list; for each tr I want to get the table cells (td), so I can build a list per tr but still have access to the td within.
This is done so I can check that a CSV file matches the table when exported.
Below is the section of code
@{severity_list}=    Create List
FOR    ${page}    IN RANGE    99999
    @{rows}=    Get Alarms Table Rows
    FOR    ${row}    IN    @{rows}
        @{elements}=    Set Variable    ${row.find_elements_by_tag_name('td')}
        # Train Severity List
        @{train_severity}=    Create List
        Append To List    ${train_severity}    ${elements[2].text}
        Append To List    ${train_severity}    ${elements[3].text}
        # Get Severity value
        Append To List    ${severity_list}    ${train_severity}
    END
END
ERROR Message: Resolving variable '${row.find_elements_by_tag_name('td')}' failed:
AttributeError: 'WebElement' object has no attribute 'find_elements_by_tag_name'
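For what it's worth, the find_elements_by_* helpers were removed in Selenium 4 in favour of find_elements with a By locator. A minimal Python sketch of the replacement call (the driver setup and URL are placeholders):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://example.com')  # placeholder URL

# Selenium 4 replacement for row.find_elements_by_tag_name('td')
for row in driver.find_elements(By.TAG_NAME, 'tr'):
    cells = row.find_elements(By.TAG_NAME, 'td')
    print([cell.text for cell in cells])

driver.quit()

Since By.TAG_NAME is just the string 'tag name', the equivalent inside Robot Framework's extended variable syntax would be ${row.find_elements('tag name', 'td')}.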

Correct Malformed CSV and pull corrected data back into a dataframe

UPDATE BELOW.....
We have automated CSV data dumping into our backend, and it looks like there are some malformed items buried in the data. There is a job family title that errantly has a \n between two words, which is wrecking our data. So that's the problem.
I want to read in the csv as wholetext, regexp_replace the title with the correction, then load this fixed wholetext into a new dataframe as if I had loaded a correct csv to start with. Here's the madness of where I'm at right now: Lol.
# Import the functions I need
from pyspark.sql.functions import col, regexp_replace
# Looks like there is a job family title with an issue. There's a carriage return / line feed between two words messing up the csv
# This needs to be patched before we actually pull the data into the dataframes to begin work
data_requisitions_patch0 = spark.read.text('abfss://container@somethingcool.dfs.core.windows.net/Data/brokencsv.csv', wholetext=True)
data_requisitions_patch0.collect()
data_requisitions_new = data_requisitions_patch0
# print(data_requisitions_patch0)
# data_requisitions_patch0.printSchema()
# data_requisitions_patch0.show()
data_requisitions_patch1 = data_requisitions_patch0 \
    .withColumn("value", regexp_replace(col('value'), 'Job - Starting\n', 'Job - Starting'))
data_requisitions_patch1.collect()
print('patch0')
data_requisitions_new.count()
print('patch1')
data_requisitions_patch1.count()
# print('Patch0 dataframe: ' + data_requisitions_patch0.count())
# print('Patch0 dataframe: ' + data_requisitions_patch1.count())
# data_requisitions_test0 = spark(data_requisitions_patch1, header=True)
# data_requisitions_test1 = spark.read.csv('abfss://container@somethingcool.dfs.core.windows.net/Data/brokencsv.csv', header=True)
# data_requisitions_test0.count()
# data_requisitions_test0.printSchema()
# data_requisitions_test1.count()
# data_requisitions_test1.printSchema()
It's obviously a mess right now. I'm trying to troubleshoot whether the regexp_replace is working, but not having much luck. Then it occurred to me that I have a single-row, single-column dataframe. Now I'm attempting to figure out how to take the dataframe post-'patch' and turn that back into a normal csv'ed dataframe, like everything was ok to begin with.
I left in all my testing nonsense; the thought was that you might see where my head is at... Unsure if that was helpful or not. Links have been faked, obviously.
First off: am I going in the right direction? No part of this is really working. I can't even get the counts to work. test1.count() does return... but test0.count() doesn't? I don't even really care about the counts; that's just me trying to figure out why it's not working.
Secondly: Malformed csv -> wholetext dataframe -> regexp fix the problem -> fixed dataframe with correct headers, rows, like normal.
How off am I?
=======
UPDATE
Made some great progress: I ended up splitting the wholetext dataframe on \n line feeds and exploding that into rows. That works great. Now the dataframe has exactly as many rows as it's supposed to have. Now working on trying to figure out how to re-map the columns to get those created correctly.
Thoughts are to take in the header row and try to use that as a map? I don't know, still researching.
I wasn't approaching this right... Was handling this like a typical C# project, pull data from the db and process. But this doesn't really deal well with that. Ended up putting the processed data into the dataframe itself and ran my if checks from contained columns. Works fantastic, and it's a lot faster than trying to extract the data to do the checks.
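For anyone following along, a minimal PySpark sketch of the pipeline described above: read as wholetext, patch the stray line feed, split and explode back into one row per line, then re-parse as CSV. The path and the broken title are placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, regexp_replace, split

spark = SparkSession.builder.getOrCreate()

# Read the whole file as a single row, then patch the errant line feed
whole = spark.read.text('/tmp/brokencsv.csv', wholetext=True)  # placeholder path
patched = whole.withColumn('value', regexp_replace(col('value'), 'Job - Starting\n', 'Job - Starting'))

# Split the patched text on line feeds and explode into one row per CSV line
lines = patched.select(explode(split(col('value'), '\n')).alias('line')).filter(col('line') != '')

# Re-parse the repaired lines as CSV, taking the header from the first line
fixed = spark.read.csv(lines.rdd.map(lambda r: r['line']), header=True)
fixed.show()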

pyspark gives error for describe() method when the column name has a '.' in it

I am importing data into a pyspark dataframe from a CSV file whose header names have . in them, e.g., x.name. It displays correctly. However, when I try to view its summary using the describe method, I get an error. It looks like whenever the column name has a . in it I get this error; otherwise it works fine. Could someone please suggest how to resolve this?
I tried pandas and it works without any issues.
Thank you very much.
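A common workaround (sketched here with a made-up frame) is to rename the columns so the dots go away before calling describe:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Made-up frame with a dotted column name
df = spark.createDataFrame([(1, 10.0), (2, 20.0)], ['x.name', 'value'])

# Dots in column names trip up describe(), so replace them first
safe = df.toDF(*[c.replace('.', '_') for c in df.columns])
safe.describe().show()

Alternatively, a dotted name can be referenced directly by wrapping it in backticks, e.g. df.select('`x.name`').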

Creating a feature class in ArcGIS 10

I am trying to create a feature class from another feature class as below:
arcpy.CreateFeatureclass_management(path, name, "POLYGON")
In ArcGIS, it creates the fields Shape, Shape_Length, and Shape_Area. I added additional fields to the new feature class.
cursor = arcpy.da.SearchCursor("old featureclass", ["SHAPE@", "*"])
insert = arcpy.da.InsertCursor("new featureclass", ["*"])
for i in cursor:
    insert.insertRow(i)
I am getting an error:
Sequence size must match size of the row
This is because the new feature class has the additional fields I mentioned above. Then I tried:
for i in cursor:
    # append Shape_Length and Shape_Area to the row (pseudocode)
    insert.insertRow(newlyarray)
It worked fine, but the Shape_Area and Shape_Length are returning zero. I've also tried to calculate the field area and that didn't work either.
Can someone please help me with this issue? The geometry is a polygon, but the shape area and shape length won't populate based on the pre-existing shape.
I think what you want to do is:
Get the fields list from the old shapefile,
Add the desired fields to the new file,
then iterate over each field (or just the ones you want to copy) of each row from the old file and copy the value into its corresponding field in the new file.
Then you can add your brand new fields whenever you like, and this copying function won't be affected if it refers to fields by name.
Another method would be to literally copy the old shapefile using the copy tool, then edit the copied file.
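A minimal arcpy sketch of that copy-by-name approach (the feature class paths are placeholders, and the new feature class is assumed to already have matching attribute fields); passing the geometry through the SHAPE@ token lets the geodatabase recalculate Shape_Length and Shape_Area on insert:

import arcpy

src = r'C:\data\example.gdb\old_featureclass'  # placeholder paths
dst = r'C:\data\example.gdb\new_featureclass'

# Attribute fields to copy, skipping the auto-maintained ones
skip = {'OBJECTID', 'Shape', 'Shape_Length', 'Shape_Area'}
names = [f.name for f in arcpy.ListFields(src) if f.name not in skip]

# SHAPE@ carries the full geometry; area and length are derived from it
fields = ['SHAPE@'] + names
with arcpy.da.SearchCursor(src, fields) as s_cur:
    with arcpy.da.InsertCursor(dst, fields) as i_cur:
        for row in s_cur:
            i_cur.insertRow(row)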

Openrefine not working as expected

I'm very new to OpenRefine, so please bear with me if I have made a simple mistake.
I'm parsing an HTML website to gather some data.
Everything went fine with fetching the individual pages, but now the parsing of the HTML fails.
I'm creating a new column, based on the one holding all the page's HTML. I'm trying to get to the data in a specific DIV[20].
In the"create column based on this column" window it gives me a preview when using value.parseHtml().select("DIV")[20] , which results in exactly what i need... executing it gives me nothing but blank cells.
it even tells me that it is "filling 0 rows with grel:value.parseHtml().select("DIV")[20]"
Any clue what i'm doing wrong here?
You just need to finalize with .toString() to output the parsed element as a string, e.g. value.parseHtml().select("DIV")[20].toString().
This is explained on our wiki here: https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML#extract-html-attributes-text-links-with-integrated-grel-commands
I also updated the select() function with that example: https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#selectelement-e-string-s