Auto-send inputs to another python script - pandas

I have a python script that accepts 2 user inputs before it goes away and does something. I'm in a position where I now need to run the script multiple times, with different values for either of the 2 inputs. I'd rather not have to keep re-running the script and manually enter the values each time as I have over 100 different combinations.
I believe this can be done fairly fluidly in python.
Here is an example: input.py
import pandas as pd
# Variables for user input
print("Please enter your name: ")
UserInput = input()
name = str(UserInput)
print("Now enter your Job title: ")
UserInput2 = input()
job = str(UserInput2)
# Create filename with name and current date
currenttime = pd.to_datetime('today').strftime('%d%m%Y')
filename = name + '_' + currenttime + ".csv"
print('\nHi ' + name + '. ')
print('Hope your job as a ' + job + ' is going well.')
print("I've saved your inputs to " + filename)
print('Speak to you soon. Bye.')
# Create DataFrame, append inputs and save to csv
series = pd.Series({'Name ': name, 'Job Title ': job})
df = pd.DataFrame([series])
df.to_csv(filename, index=False)
Below is my attempt at auto-sending inputs to input.py using the subprocess module (which might not be the correct approach). The file calls input.py successfully, but I cannot figure out what other arguments I need to add to send the specified values.
Code used: auto.py
import subprocess
subprocess.run(['python', 'input.py'])
# These don't work
#subprocess.run(['python', 'input.py', 'Tim', 'Doctor'])
#subprocess.run(['python', 'input.py'], input=('Tim', 'Doctor'))
#subprocess.run(['python', 'input.py'], stdin=('Tim', 'Doctor'))
Ideally I want to have a list of the different inputs I want to send to the script so that it loops and adds the second batch of inputs, loops again then third batch and so on.
I'm unsure if I'm using the correct subprocess method.
Any help would be greatly appreciated.

Use something like this in your input.py
import pandas as pd
import sys
# Variables from the command line (sys.argv[0] is the script name, so the arguments start at index 1)
UserInput = sys.argv[1]
name = str(UserInput)
UserInput2 = sys.argv[2]
job = str(UserInput2)
# Create filename with name and current date
currenttime = pd.to_datetime('today').strftime('%d%m%Y')
filename = name + '_' + currenttime + ".csv"
print('\nHi ' + name + '. ')
print('Hope your job as a ' + job + ' is going well.')
print("I've saved your inputs to " + filename)
print('Speak to you soon. Bye.')
# Create DataFrame, append inputs and save to csv
series = pd.Series({'Name ': name, 'Job Title ': job})
df = pd.DataFrame([series])
df.to_csv(filename, index=False)
auto.py will be
import subprocess
subprocess.run(['python', 'input.py','Tim', 'Doctor'])
Let me know if that helps and I can adapt it to your needs.
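If input.py has to stay as it is, the two prompts can also be answered through standard input instead of command-line arguments. The following is a minimal sketch, assuming the script reads exactly two lines with input(); the helper name run_with_inputs and the list of combinations are hypothetical and not part of the original code.

```python
import subprocess
import sys


def run_with_inputs(script_path, combos):
    """Run script_path once per (name, job) pair, answering its prompts via stdin."""
    outputs = []
    for name, job in combos:
        completed = subprocess.run(
            [sys.executable, script_path],  # same interpreter as the caller
            input=f"{name}\n{job}\n",       # one line per input() call
            text=True,                      # send and receive str, not bytes
            capture_output=True,
            check=True,                     # raise if the script exits with an error
        )
        outputs.append(completed.stdout)
    return outputs
```

A call such as run_with_inputs('input.py', [('Tim', 'Doctor'), ('Ann', 'Pilot')]) would run the script twice, once per pair. The same loop works with the sys.argv approach by passing the name and job as extra list items instead of using input=.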

openpyxl: Is it possible to load a workbook (with data_only=False), work on it, save it and reopen the saved file (with data_only=True)?

Basically the title. The thing is that I got an Excel file already (with a lot of formulas) and I have to use it as a template, but I have to copy certain column and paste it in another column.
Since I have to make some graphs in between I need the numeric data of the excel file so my plan is the following:
1.- load the file with data_only = False.
2.- Make the for loops needed to copy and paste info from one worksheet to another.
3.- Save the copied data as another Excel file.
4.- Open the new Excel created file, this time with data_only = True, so I can work with the numeric values.
The problem is that data_only does not seem to work on the newly created file: when I build a list that filters out NoneType values and strings from a column that should contain numerical values, I get an empty list.
#I made the following
wb = load_workbook('file_name.xlsx', data_only = True)
S1 = wb['Sheet 1']
S2 = wb['Sheet 2']
#Determination of min and max cols and rows
col_min = S1.min_column
col_max = S1.max_column
row_min = S1.min_row
row_max = S1.max_row
for i in range(row_min + 2, row_max + 1):
    for j in range(col_min + Value, Value + 2):
        S2.cell(row = i+6, column = j+10-Value).value = S1.cell(row = i, column = j).value
Transition_file = wb.save('transition.xlsx')
wb1 = load_workbook('transition.xlsx', data_only = True) #To obtain only numerical values
S2 = wb1['Sheet 2'] #Re define my Sheet 2 values

Getting "list assignment out of range" error when setting variable through exec or locals()

I am trying to change a variable by referring to it via a string.
The problem arises when I try to replace a variable that was previously an array of length 1 with a longer one ("list assignment out of range" is thrown when trying to assign a value to the second array element).
I tried using locals()[variable] = [-1, -1, -1] as well as exec(variable + ' = [-1, -1, -1]') to set the variable beforehand.
My code for reference:
exec('tmp = ' + variable) # save variable in case input gets reset
exec(variable + " = [-1, -1, -1]") # replace with 3-length array
try:
    # set the first array element according to user input
    exec(variable + '[0] = float(input("Enter lower bound for " + variable + ": "))')
except ValueError: # in case user is dumb
    print("Invalid input")
    exec(variable + ' = tmp')
    continue
try:
    # set the second array element according to user input (this line throws the error)
    exec(variable + '[1] = float(input("Enter upper bound for " + variable + ": "))')
    if eval(variable + '[1] < ' + variable + '[0]'): # more user safeguards
        print("Value can't be lower than " + str(aq_p[0]))
        exec(variable + ' = tmp')
        continue
except ValueError:
    print("Invalid input")
    exec(variable + ' = tmp')
    continue
For some context: It's about setting up a physics simulation where a dozen parameters have to be set (and later versions may include other parameters) which is why I want to keep the code as general as possible instead of hard-coding the input for every parameter. I wanted to avoid using a dictionary with all the parameters in it, as that may complicate other things and was hoping to be able to use locals() similarly.
As a side note: The continues in the code are for a larger loop which is too long to include here.
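One possible explanation, assuming the snippet runs inside a function: in CPython, exec() cannot rebind a function's local variables, so the variable keeps its old one-element list and the assignment to index 1 fails. The sketch below demonstrates that behaviour and shows a dictionary-based alternative; the names rebinding_with_exec, set_bounds and params are hypothetical.

```python
def rebinding_with_exec():
    """Show that exec() does not rebind a local variable inside a function."""
    bounds = [0.0]
    exec("bounds = [-1, -1, -1]")  # the assignment lands in a throwaway namespace
    return bounds                  # still the original one-element list


def set_bounds(params, name, lower, upper):
    """Store [lower, upper] under params[name] after checking their order.

    params is a hypothetical dict holding all simulation parameters.
    """
    if upper < lower:
        raise ValueError(f"Upper bound for {name} can't be lower than {lower}")
    params[name] = [lower, upper]
    return params
```

Looking parameters up by name in a dict keeps the code general without exec, and a failed validation leaves the previous value untouched, so no tmp backup is needed.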

Fastest way to read a large excel file into databricks

I have been having some issues reading large Excel files into Databricks using PySpark and pandas. Spark seems to be really fast with csv and txt files, but not with Excel.
For example:
df2=pd.read_excel(excel_file, sheetname=sheets,skiprows = skip_rows).astype(str)
df = spark.read.format("com.crealytics.spark.excel").option("dataAddress", "\'" + sheet + "\'" + "!A1").option("useHeader","false").option("maxRowsInMemory",1000).option("inferSchema","false").load(filePath)
We have found the fastest way to read in an excel file to be one which was written by a contractor:
from openpyxl import load_workbook
import csv
import sys
import time
excel_file = "/dbfs/{}".format(path)
sheets = []
workbook = load_workbook(excel_file,read_only=True,data_only=True)
all_worksheets = workbook.get_sheet_names()
for worksheet_name in workbook.get_sheet_names():
    print("Export " + worksheet_name + " ...")
    try:
        worksheet = workbook.get_sheet_by_name(worksheet_name)
    except KeyError:
        print("Could not find " + worksheet_name)
        sys.exit(1)
    with open("/dbfs/{}/{}.csv".format(tempDir, worksheet_name), 'w') as your_csv_file:
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        headerDone = False
        for row in worksheet.iter_rows():
            lrow = []
            # First column holds the sheet name (header text on the first row)
            if headerDone == True:
                lrow.append(worksheet_name)
            else:
                lrow.append("worksheet_name")
                headerDone = True
            for cell in row:
                lrow.append(cell.value)
            wr.writerow(lrow)
#Sometimes python gets a bit ahead of itself and
#tries to do this before it's finished writing the csv
#and fails
retryCount = 0
retryMax = 20
while retryCount < retryMax:
    try:
        df2 = spark.read.format("csv").option("header", "true").load(tempDir)
        if df2.count() == 0:
            print("Retrying load from CSV")
            retryCount = retryCount + 1
            time.sleep(10)
        else:
            retryCount = retryMax
    except:
        print("Threw an error trying to read the file")
        retryCount = retryCount + 1 # count failed attempts too, so the loop always ends
The reason it is fast is that it only stores one row of the Excel sheet in memory at a time as it loops. I tried appending the rows to a list instead, but this made it very slow.
The issue with the above method is that writing to csv and re-reading it doesn't seem like the most robust approach. It's possible that the csv could be read partway through being written, in which case it would still load but data could be lost.
Is there any other way of making this fast, such as using Cython, so that the rows can be appended to a list without incurring a memory penalty and loaded directly into Spark via createDataFrame?

Why is the file being overwritten?

I'm designing a new server application, which includes a subroutine that parses the input into the console window, for example
LogAlways("--- CPU detection ---")
will be written as:
[net 21:8:38.939] --- CPU detection ---
This is the subroutine:
Public Sub LogAlways(ByVal input As String)
    Dim dm As String = "[net " + Date.Now.Hour.ToString + ":" + Date.Now.Minute.ToString + ":" + Date.Now.Second.ToString + "." + Date.Now.Millisecond.ToString + "] "
    Console.WriteLine(dm + input)
    Dim fName As String = Application.StartupPath() + "\LogBackups\" + Date.Now.Day.ToString + Date.Now.Month.ToString + "" + Date.Now.Year.ToString + ".log"
    Dim stWt As New Global.System.IO.StreamWriter(fName)
    stWt.Write(dm + input)
    stWt.Close()
End Sub
This works, but only the last line of my desired output is written to the file.
Why is this happening, and how can I make it so that it does not overwrite the log file?
This is using the Wildfire Server API.
This is not a duplicate, as the destination question has a different answer which would otherwise not answer this question.
This occurs because the StreamWriter has not been told to append its output to the end of the file. The constructor has an overload that takes an append parameter, which must be set to True; Visual Studio lists it among the StreamWriter overloads.
To declare it correctly:
Dim stWt As New Global.System.IO.StreamWriter(fName, True)
or in the subroutine:
Public Sub LogAlways(ByVal input As String)
    Dim dm As String = "[net " + Date.Now.Hour.ToString + ":" + Date.Now.Minute.ToString + ":" + Date.Now.Second.ToString + "." + Date.Now.Millisecond.ToString + "] "
    Console.WriteLine(dm + input)
    Dim fName As String = Application.StartupPath() + "\LogBackups\" + Date.Now.Day.ToString + Date.Now.Month.ToString + "" + Date.Now.Year.ToString + ".log"
    Dim stWt As New Global.System.IO.StreamWriter(fName, True)
    stWt.Write(dm + input)
    stWt.Close()
End Sub
This requires the following Imports:
System.IO
System.Windows.Forms
It will now correctly write to the end of the file. Note, however, that opening and closing the file on every call may cause issues, so a queuing system may be better:
Desired log output is inserted into a single-dimensional array
A Timer dumps this array to the log file every five to ten seconds, say
When this is done, the array is cleared
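The queuing idea above can be sketched as follows. This is an illustration in Python rather than VB.NET, and only one way to do it: the class name BufferedLog, its methods and the five-second default are assumptions, not part of the original subroutine.

```python
import threading
from datetime import datetime


class BufferedLog:
    """Collect log lines in memory and append them to a file in batches."""

    def __init__(self, path):
        self.path = path
        self._lines = []
        self._lock = threading.Lock()

    def log_always(self, message):
        # Timestamp with millisecond precision, similar to the original prefix
        stamp = datetime.now().strftime("%H:%M:%S.%f")[:-3]
        line = f"[net {stamp}] {message}"
        print(line)
        with self._lock:
            self._lines.append(line)

    def flush(self):
        # Swap the buffer out under the lock, then write without holding it
        with self._lock:
            pending, self._lines = self._lines, []
        if pending:
            # Mode "a" appends, the counterpart of StreamWriter(fName, True)
            with open(self.path, "a", encoding="utf-8") as handle:
                handle.write("\n".join(pending) + "\n")

    def start(self, interval_seconds=5.0):
        """Flush every interval_seconds on a daemon timer thread."""
        def tick():
            self.flush()
            self.start(interval_seconds)

        timer = threading.Timer(interval_seconds, tick)
        timer.daemon = True
        timer.start()
        return timer
```

Calling flush() once more at shutdown writes whatever is still buffered, since a daemon timer does not run after the program exits.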

Inserting a word within a given string

I want the user to input a phrase such as "Python" and have the program put the word "test" in the middle. So it would print "Pyttesthon".
After I input the phrase however, I am not sure which function to use.
You can just concatenate strings like so:
stringToInsert = "test"
oldString = "Python"
# Integer division (//) keeps the slice index an int in Python 3
newString = oldString[0:len(oldString)//2] + stringToInsert + oldString[len(oldString)//2:]
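The same slicing can be wrapped in a small reusable function. This is a minimal sketch; the function name insert_in_middle is hypothetical, and for odd-length strings the inserted word lands just before the middle character.

```python
def insert_in_middle(text, word):
    """Return text with word inserted at its midpoint."""
    middle = len(text) // 2  # integer division, rounds down for odd lengths
    return text[:middle] + word + text[middle:]
```

To use it with user input, pass the result of input() as the first argument, for example print(insert_in_middle(input("Enter a phrase: "), "test")).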