Python - Change variable value on exit for next session

I want to change the value of a variable on exit so that on the next run, it remains what was last set. This is a short version of my current code:
def example():
    x = 1
    while True:
        x = x + 1
        print x
On 'KeyboardInterrupt', I want the last value set in the while loop to be a global variable. On running the code next time, that value should be the 'x' in line 2. Is it possible?

This is a bit hacky, but hopefully it gives you an idea that you can better implement in your current situation (pickle/cPickle is what you should use if you want to persist more robust data structures - this is just a simple case):
import sys

def example():
    x = 1
    # Wrap in a try/except to catch the interrupt
    try:
        while True:
            x = x + 1
            print x
    except KeyboardInterrupt:
        # On interrupt, write to a simple file and exit
        with open('myvar', 'w') as f:
            f.write(str(x))
        sys.exit(0)

# Not sure of your implementation (probably not this :) ), but
# prompt to run the function
resp = raw_input('Run example (y/n)? ')
if resp.lower() == 'y':
    example()
else:
    # If the function isn't to be run, read the variable
    # Note that this will fail if you haven't already written
    # it, so you will have to make adjustments if necessary
    with open('myvar', 'r') as f:
        myvar = f.read()
    print int(myvar)
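The same idea with pickle, which is mentioned above for more robust data structures, might look like the following minimal sketch (the 'state.pkl' file name is just an example; here the saved value is also restored at the start of the next run):

import os
import pickle

STATE_FILE = 'state.pkl'  # hypothetical file name

def load_x(default=1):
    # Restore the last saved value if a previous run left one behind
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE, 'rb') as f:
            return pickle.load(f)
    return default

def example():
    x = load_x()
    try:
        while True:
            x = x + 1
            print(x)
    except KeyboardInterrupt:
        # Save the current value for the next run
        with open(STATE_FILE, 'wb') as f:
            pickle.dump(x, f)

example()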

You could save any variables that you want to persist to a text file, then read them back into the script the next time it runs.
Here is a link for reading and writing to text files.
http://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
Hope it helps!

Related

Does Snakemake have states during a workflow

Does Snakemake support state in its pipelines, meaning the current run can be changed according to, e.g., the last 10 runs?
For example: data is being processed, and if the current value is greater than X and at least 5 of the last 10 values were also greater than X, then I want the workflow to branch differently; otherwise it should continue normally.
You could potentially use a slightly hacky workaround with checkpoints and multiple snakefiles to achieve conditional execution of rules.
For example, a first snakefile that includes a checkpoint, i.e. a rule that waits for the execution of the previous rules and only gets evaluated then. Here you could check your conditions against the current run and previous results. For the example code I'm just using a random number to determine what the checkpoint does.
rule all:
    input:
        "random_number.txt",
        "next_step.txt"

rule random_number:
    output: "random_number.txt"
    run:
        import numpy as np
        r = np.random.choice([0, 1])
        with open(output[0], 'w') as fh:
            fh.write(f"{r}")

checkpoint next_rule:
    output: "next_step.txt"
    run:
        # read random number
        with open("random_number.txt", 'r') as rn:
            num = int(rn.read())
        print(num)
        if num == 0:
            with open(output[0], 'w') as fh:
                fh.write("case_a")
        elif num == 1:
            with open(output[0], 'w') as fh:
                fh.write("case_b")
        else:
            exit(1)
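To implement the condition from the question rather than a random choice, the checkpoint could instead read a history of previous values and count how many of the last 10 exceed the threshold. A rough sketch, assuming previous runs appended their values to a "history.txt" file and that X is the threshold from the question:

checkpoint next_rule:
    output: "next_step.txt"
    run:
        X = 42  # hypothetical threshold
        # Hypothetical history file written by earlier runs, one value per line
        with open("history.txt") as fh:
            values = [float(line) for line in fh if line.strip()]
        current = values[-1]
        last_10 = values[-11:-1]  # the 10 values before the current one
        n_greater = sum(1 for v in last_10 if v > X)
        with open(output[0], 'w') as fh:
            if current > X and n_greater >= 5:
                fh.write("case_a")  # branch differently
            else:
                fh.write("case_b")  # continue normally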
Then you could have a second snakefile with a conditional rule all, i.e. a list of output files that depends on the result of the first pipeline.
with open("next_step.txt", 'r') as fh:
case = fh.read()
outputs = []
if case == "case_a":
outputs = ["output_a_0.txt", "output_a_1.txt"]
if case == "case_b":
outputs = ["output_b_0.txt", "output_b_1.txt"]
rule all:
input:
outputs
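You would then run the two snakefiles one after the other, e.g. something like snakemake -s first.smk followed by snakemake -s second.smk (the file names are placeholders), so that the second rule all is only evaluated once next_step.txt exists.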

Pyinstaller, Multiprocessing, and Pandas - No such file/directory [duplicate]

Python v3.5, Windows 10
I'm using multiple processes and trying to capture user input. From everything I've found, odd things happen when using input() with multiple processes. After 8+ hours of trying, nothing I implemented worked; I'm sure I'm doing it wrong but I can't for the life of me figure it out.
The following is a very stripped-down program that demonstrates the issue. It works fine when I run it within PyCharm, but when I use PyInstaller to create a single executable it fails. The program gets stuck in a loop, constantly asking the user to enter something, as shown below.
From what I've read, I'm pretty sure it has to do with how Windows takes in standard input. I've also tried passing the user input variables to the functions as Queue() items, but I hit the same issue. I read that you should put input() in the main Python process, so I did that under if __name__ == '__main__':
from multiprocessing import Process
import time

def func_1(duration_1):
    while duration_1 >= 0:
        time.sleep(1)
        print('Duration_1: %d %s' % (duration_1, 's'))
        duration_1 -= 1

def func_2(duration_2):
    while duration_2 >= 0:
        time.sleep(1)
        print('Duration_2: %d %s' % (duration_2, 's'))
        duration_2 -= 1

if __name__ == '__main__':
    # func_1 user input
    while True:
        duration_1 = input('Enter a positive integer.')
        if duration_1.isdigit():
            duration_1 = int(duration_1)
            break
        else:
            print('**Only positive integers accepted**')
            continue

    # func_2 user input
    while True:
        duration_2 = input('Enter a positive integer.')
        if duration_2.isdigit():
            duration_2 = int(duration_2)
            break
        else:
            print('**Only positive integers accepted**')
            continue

    p1 = Process(target=func_1, args=(duration_1,))
    p2 = Process(target=func_2, args=(duration_2,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
You need to use multiprocessing.freeze_support() when you produce a Windows executable with PyInstaller.
Straight out of the docs:
multiprocessing.freeze_support()
Add support for when a program which uses multiprocessing has been frozen to produce a Windows executable. (Has been tested with py2exe, PyInstaller and cx_Freeze.)
One needs to call this function straight after the if __name__ == '__main__' line of the main module. For example:
from multiprocessing import Process, freeze_support

def f():
    print('hello world!')

if __name__ == '__main__':
    freeze_support()
    Process(target=f).start()
If the freeze_support() line is omitted then trying to run the frozen executable will raise RuntimeError.
Calling freeze_support() has no effect when invoked on any operating system other than Windows. In addition, if the module is being run normally by the Python interpreter on Windows (the program has not been frozen), then freeze_support() has no effect.
In your example you also have unnecessary code duplication (two nearly identical worker functions and two identical input loops) that you should tackle.
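For instance, a minimal reworking of the posted program with freeze_support() added and the duplication factored out might look like this (a sketch under those assumptions, not a drop-in replacement):

from multiprocessing import Process, freeze_support
import time

def countdown(name, duration):
    # One worker replaces func_1/func_2, which only differed in their label
    while duration >= 0:
        time.sleep(1)
        print('%s: %d s' % (name, duration))
        duration -= 1

def ask_positive_int(prompt):
    # One input loop replaces the two duplicated while True blocks
    while True:
        value = input(prompt)
        if value.isdigit():
            return int(value)
        print('**Only positive integers accepted**')

if __name__ == '__main__':
    freeze_support()  # needed when frozen into a Windows executable
    duration_1 = ask_positive_int('Enter a positive integer.')
    duration_2 = ask_positive_int('Enter a positive integer.')
    p1 = Process(target=countdown, args=('Duration_1', duration_1))
    p2 = Process(target=countdown, args=('Duration_2', duration_2))
    p1.start()
    p2.start()
    p1.join()
    p2.join()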

What is the intuition behind the Iterator.get_next method?

The name of the method get_next() is a little bit misleading. The documentation says
Returns a nested structure of tf.Tensors representing the next element.
In graph mode, you should typically call this method once and use its result as the input to another computation. A typical loop will then call tf.Session.run on the result of that computation. The loop will terminate when the Iterator.get_next() operation raises tf.errors.OutOfRangeError. The following skeleton shows how to use this method when building a training loop:
dataset = ...  # A `tf.data.Dataset` object.
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

# Build a TensorFlow graph that does something with each element.
loss = model_function(next_element)
optimizer = ...  # A `tf.compat.v1.train.Optimizer` object.
train_op = optimizer.minimize(loss)

with tf.compat.v1.Session() as sess:
    try:
        while True:
            sess.run(train_op)
    except tf.errors.OutOfRangeError:
        pass
Python also has a function called next, which needs to be called every time we need the next element of the iterator. However, according to the documentation of get_next() quoted above, get_next() should be called only once and its result should be evaluated by calling the run method of the session. This is a little unintuitive, because I was used to Python's built-in function next. In this script, get_next() is also called only once, and the result of that call is evaluated at every step of the computation.
What is the intuition behind get_next() and how is it different from next()? I think that the next element of the dataset (or feedable iterator), in the second example I linked above, is retrieved every time the result of the first call to get_next() is evaluated by calling the run method, but this is a little unintuitive. I don't understand why we do not need to call get_next() at every step of the computation (to get the next element of the feedable iterator), even after reading the note in the documentation:
NOTE: It is legitimate to call Iterator.get_next() multiple times, e.g. when you are distributing different elements to multiple devices in a single step. However, a common pitfall arises when users call Iterator.get_next() in each iteration of their training loop. Iterator.get_next() adds ops to the graph, and executing each op allocates resources (including threads); as a consequence, invoking it in every iteration of a training loop causes slowdown and eventual resource exhaustion. To guard against this outcome, we log a warning when the number of uses crosses a fixed threshold of suspiciousness.
In general, it is not clear how the Iterator works.
The idea is that get_next adds some operations to the graph such that, every time you evaluate them, you get the next element in the dataset. On each iteration, you just need to run the operations that get_next made; you do not need to create them over and over again.
Maybe a good way to get an intuition is to try to write an iterator yourself. Consider something like the following:
import tensorflow as tf
tf.compat.v1.disable_v2_behavior()

# Make an iterator, returns next element and initializer
def iterator_next(data):
    data = tf.convert_to_tensor(data)
    i = tf.Variable(0)
    # Check we are not out of bounds
    with tf.control_dependencies([tf.assert_less(i, tf.shape(data)[0])]):
        # Get next value
        next_val_1 = data[i]
    # Update index after the value is read
    with tf.control_dependencies([next_val_1]):
        i_updated = tf.compat.v1.assign_add(i, 1)
    with tf.control_dependencies([i_updated]):
        next_val_2 = tf.identity(next_val_1)
    return next_val_2, i.initializer

# Test
with tf.compat.v1.Graph().as_default(), tf.compat.v1.Session() as sess:
    # Example data
    data = tf.constant([1, 2, 3, 4])
    # Make operations that give you the next element
    next_val, iter_init = iterator_next(data)
    # Initialize iterator
    sess.run(iter_init)
    # Iterate until exception is raised
    while True:
        try:
            print(sess.run(next_val))
        # assert throws InvalidArgumentError
        except tf.errors.InvalidArgumentError:
            break
Output:
1
2
3
4
Here, iterator_next gives you something comparable to what get_next in an iterator would give you, plus an initializer operation. Every time you run next_val you get a new element from data; you don't need to call the function every time (which is how next works in Python), you call it once and then evaluate its result multiple times.
EDIT: The function iterator_next above could also be simplified to the following:
def iterator_next(data):
    data = tf.convert_to_tensor(data)
    # Start from -1
    i = tf.Variable(-1)
    # First increment i
    i_updated = tf.compat.v1.assign_add(i, 1)
    with tf.control_dependencies([i_updated]):
        # Check i is not out of bounds
        with tf.control_dependencies([tf.assert_less(i, tf.shape(data)[0])]):
            # Get next value
            next_val = data[i]
    return next_val, i.initializer
Or even simpler:
def iterator_next(data):
    data = tf.convert_to_tensor(data)
    i = tf.Variable(-1)
    i_updated = tf.compat.v1.assign_add(i, 1)
    # Using i_updated directly as a value is equivalent to using i with
    # a control dependency to i_updated
    with tf.control_dependencies([tf.assert_less(i_updated, tf.shape(data)[0])]):
        next_val = data[i_updated]
    return next_val, i.initializer

Python Pandas: Read in External Dataset Into Dataframe Only If Conditions Are Met Using Function Call

Let's say I have an Excel file called "test.xlsx" on my local machine. I can read this data set in using traditional code.
df_test = pd.read_excel('test.xlsx')
However, I want to conditionally read that data set in if a condition is met ... if another condition is met I want to read in a different dataset.
Below is the code I tried using a function:
def conditional_run(x):
    if x == 'fleet':
        eval('''df_test = pd.read_excel('test.xlsx')''')
    elif x != 'fleet':
        eval('''df_test2 = pd.read_excel('test_2.xlsx')''')

conditional_run('fleet')
Below is the error I get:
File "<string>", line 1
df_test = pd.read_excel('0Day Work Items Raw Data.xlsx')
^
SyntaxError: invalid syntax
There probably isn't a reason to use eval in this case; it also fails here because eval only accepts expressions, and an assignment like df_test = pd.read_excel(...) is a statement, hence the SyntaxError. It should be sufficient to conditionally choose the file name and then read it. For example:
def conditional_run(x):
    if x == 'fleet':
        file = "test.xlsx"
    elif x != 'fleet':
        file = "test_2.xlsx"
    df_test = pd.read_excel(file)
    return df_test

conditional_run('fleet')
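If more cases are expected later, one small variation is to map each condition to a file name in a dictionary (the file names and the 'other' key below are just illustrative):

import pandas as pd

FILES = {
    'fleet': 'test.xlsx',
    'other': 'test_2.xlsx',  # hypothetical second dataset
}

def conditional_run(x):
    # Fall back to the second file for anything that is not 'fleet'
    file = FILES.get(x, FILES['other'])
    return pd.read_excel(file)

df_test = conditional_run('fleet')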

Any idea where the additional printed None is coming from?

Hello, I am experimenting with local vs. global variables and wrote the code below. When I run it, I get the output shown. I thought the code would only produce 2 lines of output from the various print statements, but I am getting a "None" as well. Can you please let me know where this "None" is coming from? Thanks!
CODE:
x = 'global X'

def test():
    global x
    x = 'local x'
    print(x)

print(test())
print(x)
Output:
local x
None
local x
In your test() function, you have a print.
The first 'local x' of your output is the one printed inside the test function; then print(test()) tries to print the return value of the test function, which is nothing, so it prints None. You could simply return x instead of printing it, like this:
def test():
    global x
    x = 'local x'
    return x
This way it won't print None.
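With that change, the same calls print the returned value instead of None:

x = 'global X'

def test():
    global x
    x = 'local x'
    return x

print(test())  # local x
print(x)       # local x (test() reassigned the global)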