Create Dynamic Excel path to read in Python - pandas

Happy NYE! Hopefully this is the last question of the year from me. I am trying to create a dynamic Excel path so that Python can read various Excel files based on the date, type, and other parameters.
I have created the parameters below, which I can change based on the file I am trying to retrieve:
date = input()  # e.g. '201912'
type = input()  # e.g. 'abc'
I am trying to figure out a way to incorporate these into the file path I am asking Python to read, so that it returns the same result as below:
import pandas as pd
sheet=pd.ExcelFile(r'C:\Information\Management\201912\abc\abc Template 201912.xlsm')
I have tried a few different ways but can't seem to get it to work. Any suggestions?
Thank you very much!

You can format it with the % sign like in Python 2 (not recommended):
path = r'C:\Information\Management\%s\%s\%s Template %s.xlsm' % (date, type, type, date)
The Pythonic way in Python 3 is to use str.format:
path = r'C:\Information\Management\{}\{}\{} Template {}.xlsm'.format(date, type, type, date)
Or with named parameters:
path = r'C:\Information\Management\{date}\{type}\{type} Template {date}.xlsm'.format(date=date, type=type)
Starting with Python 3.6, you can also use f-strings:
path = f'C:\\Information\\Management\\{date}\\{type}\\{type} Template {date}.xlsm'
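sheet = pd.ExcelFile(path)  # same call as in the question; pd.read_excel(path) would load the first sheet into a DataFrame instead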
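As an alternative to string formatting, the path can also be assembled with pathlib. This is only a minimal sketch: the function name build_template_path and the parameter name file_type (used instead of type, which shadows a Python built-in) are my own choices, not part of the question. PureWindowsPath is used so that the backslash separators come out the same on any operating system.

```python
from pathlib import PureWindowsPath


def build_template_path(date: str, file_type: str) -> PureWindowsPath:
    """Build the workbook path from the date and type parameters."""
    # Base directory taken from the question.
    base = PureWindowsPath(r'C:\Information\Management')
    # Each "/" appends one path component; the file name itself is an f-string.
    return base / date / file_type / f'{file_type} Template {date}.xlsm'


path = build_template_path('201912', 'abc')
print(path)
```

The resulting object can be converted with str(path) before handing it to pd.ExcelFile.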

Related

GridFs read PDF

I am trying to build a financial dashboard with Flask and pymongo. The starting point is a flask form which saves data in a MongoDB database. One of the fields in the form is a FileField (wtforms) which allows the upload of a PDF, which is then stored in MongoDB with GridFS.
I can save the PDF and see the resulting entries in the .files and .chunks collections. Next I would like to build a function that retrieves the PDFs and analyses them with some basic NLP, but I am struggling to get meaningful data out of them.
When I do:
storage = gridfs.GridFS(db, collection)
data = storage.get('some id')
a = data.read()
The result is binary data. If I continue with:
with open(data, 'rb') as f:
    b = f.read()
The result is "ValueError: embedded null byte" or sometimes an empty byte string.
Any help on this?
To follow up on the above, I found a solution for myself that consists of two separate functions:
(1) When the form is submitted, and before the files are uploaded to MongoDB, I apply a function based on pdfminer that extracts the text content of the PDF and transforms it into a list of sentences using NLTK. I then store this list in the .files collection via storage.put(file, sent_list=sent_list), where sent_list is the variable holding the list of sentences.
Whenever I wish to run NLP operations on the file, I just read the sent_list field from MongoDB.
(2) To display the stored PDF in its original form, I added the following function as a separate route:
storage = GridFS(db, collection)
data = storage.get_last_version(filename)
response = make_response(data.read())
extension = data.filename.split('.')[-1]
response.headers['Content-Type'] = f'application/{extension}'
response.headers['Content-Disposition'] = f'inline; filename={data.filename}'
return response
Function (2) opens a new tab in my Flask app showing the PDF in its original format.
I hope this helps anyone coming across a similar problem in the future.
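On the error in the question: open() expects a filesystem path, not the file's contents, which is why passing the PDF data to it fails. If a library needs a file-like object, the bytes returned by data.read() can be wrapped in io.BytesIO instead. A minimal sketch, in which the placeholder bytes stand in for the real data.read() result and the helper name as_file_object is made up for illustration:

```python
import io


def as_file_object(pdf_bytes: bytes) -> io.BytesIO:
    """Wrap raw bytes in an in-memory, seekable file-like object."""
    return io.BytesIO(pdf_bytes)


# Placeholder bytes standing in for the result of data.read() on a GridFS file.
pdf_bytes = b'%PDF-1.4\n\x00placeholder content\n%%EOF'

stream = as_file_object(pdf_bytes)
header = stream.read(5)  # read the first five bytes of the stream
stream.seek(0)           # rewind so a parser can read from the start
```

The stream can then be passed to any parser that accepts a binary file object, so nothing has to be written to disk first.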

How do I save a pdf with a custom file name

I'm new to Python and can't figure out how to save my PDF file with a custom name.
What I want to do is something like:
Name = input('What is your name: ')
pdf.output('Name.pdf')
As I expected, the name of the PDF file created is now: Name
I tried searching for it, but I think I'm using the wrong words since I can't find a solution to my problem. I might be explaining it badly. If anyone has an answer, it would be much appreciated :)
Sir, in your question:
pdf.output('Name.pdf')
Here you are passing 'Name.pdf' as a string literal to the output function. If you want to pass the Name variable, do not put quotes around it. Your final code should look something like this:
name = input("Enter your name: ")
name = name + ".pdf"  # adds the .pdf extension to the name entered
pdf.output(name)  # note that there are no quotes around the name variable
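The same idea can be written with an f-string. This is just a sketch: the helper pdf_filename is a hypothetical name, and stripping whitespace and rejecting empty input are my own additions rather than anything the question asked for.

```python
def pdf_filename(name: str) -> str:
    """Turn a user-supplied name into a PDF file name."""
    cleaned = name.strip()  # drop accidental leading/trailing spaces
    if not cleaned:
        raise ValueError('name must not be empty')
    # The variable is interpolated into the string; only ".pdf" is literal text.
    return f'{cleaned}.pdf'


print(pdf_filename('Alice'))
```

The returned string would then be passed to the output call, e.g. pdf.output(pdf_filename(name)).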

Is there a way to store data in hdf5 file in ScriptJob in pyiron?

I have my own Monte Carlo code (which is not part of pyiron), which I launch via ScriptJob in pyiron. Currently, I store the output data in a file, but since the script job is a pyiron object and an hdf5 is created, I would love to store the data there. So, I'd love to have something like:
script_job = pr.create_job('ScriptJob', 'job')
script_job.script_path = 'monte_carlo.ipynb'
script_job.run()
script_job['user/output/'] # This returns the output of what I store in monte_carlo.ipynb
Is there a way to do something inside monte_carlo.ipynb to make this happen?
You can summarise your output in a dictionary named output_dict and then use:
from pyiron import Notebook
Notebook().store_custom_output_dict(output_dict)

Groovy Script for PhpStorm Live Templates give suggested box?

So I have a little silly problem. I have a groovy script that reads all files in a folder and then manipulates the files in such a way to output the file names for the user to select the correct one in the live template variable. My problem is that the auto suggestion list only displays 1 item and not multiple items to select from in the IDE.
Here is the live template setup:
This is the output:
This is what I want (without using enum()):
This is the piece of code:
groovyScript("import static groovy.io.FileType.FILES;def curPath = _editor.getVirtualFile().getPath().split('/src/')[0];def dir = new File(curPath+'/src/partials');def files = [];dir.traverse(type: FILES, maxDepth: 1) { files.add(it.toString().replace('/src/partials/','').replace(curPath,'').replace('.html','')) }; return files;",methodParameters())
Please help, since Google searches have not turned up any proper answers.
As of IntelliJ IDEA 2018.3, the groovyScript() feature does not support generating a list of suggestions. It can only be used to calculate a single suggestion which is then inserted into the editor.

How do I add a pandas object (e.g. DataFrame) to a group within an HDF file?

Suppose I have an HDF5 file (myHDF.h5) with a hierarchy of groups, something like:
/root/groupA
/groupB
Now I want to add a DataFrame (myFrame) to groupA (along with some other objects such as dictionaries). How do I do that? If I open myHDF.h5 with pandas.io.HDFStore:
store = pandas.io.HDFStore('myHDF.h5')
and then try:
store['groupA']['myFrame'] = myFrame
I get:
AttributeError: Attribute 'pandas_type' does not exist in node: '/groupA'
What is the proper way to do this?
This is supported as of pandas version 0.10.0 through hierarchical keys:
http://pandas.pydata.org/pandas-docs/stable/io.html#hierarchical-keys
Currently pandas does not support hierarchical paths as you specified.
There is an open GitHub issue about this: https://github.com/pydata/pandas/issues/13
I'm not sure when we will get around to adding this feature. I would more than welcome a pull request if you're interested in completing the skeleton code that's in the issue discussion.