Itertools for containers - iterator

Consider the following interactive example:
>>> l=imap(str,xrange(1,4))
>>> list(l)
['1', '2', '3']
>>> list(l)
[]
Does anyone know of an existing implementation of imap (and the other itertools functions) for which the second list(l) returns the same result as the first? I don't want the regular map, because building the entire output in memory is wasteful for larger ranges.
I want something that basically does this:
class cmap:
    def __init__(self, function, *iterators):
        self._function = function
        self._iterators = iterators
    def __iter__(self):
        return itertools.imap(self._function, *self._iterators)
    def __len__(self):
        return min(map(len, self._iterators))
But it would be a waste of time to do this manually for all of itertools if someone has already done it.
P.S.
Do you think containers are more zen than iterators? With an iterator, something like
for i in iterator:
    do something
implicitly empties the iterator, while with a container you have to remove elements explicitly.

You do not have to build such an object for each type of container. Basically, you have the following:
mkimap = lambda: imap(str,xrange(1,4))
list(mkimap())
list(mkimap())
Now you only need a nice wrapper object to avoid the "ugly" function calls. It could work this way:
class MultiIter(object):
    def __init__(self, f, *a, **k):
        if a or k:
            self.create = lambda: f(*a, **k)
        else: # optimize
            self.create = f
    def __iter__(self):
        return self.create()
l = MultiIter(lambda: imap(str, xrange(1,4)))
# or
l = MultiIter(imap, str, xrange(1,4))
# or even
@MultiIter
def l():
    return imap(str, xrange(1,4))
# and then
print list(l)
print list(l)
(untested, hope it works, but you should get the idea)
For your 2nd question: Iterators and containers both have their uses. You should take whatever best fits your needs.

You may be looking for itertools.tee()
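As a rough illustration of that suggestion, here is a minimal sketch written for Python 3 (an assumption; there the built-in map and range are already lazy and stand in for imap and xrange). tee splits one single-use iterator into independent ones. Note that tee buffers items one copy has consumed and another has not, so exhausting one copy before starting the other still holds everything in memory.

```python
from itertools import tee

# Python 3: map/range are lazy, like imap/xrange in Python 2.
source = map(str, range(1, 4))

# Split the single-use iterator into two independent iterators.
first, second = tee(source, 2)

first_pass = list(first)    # consumes the first copy
second_pass = list(second)  # the second copy still yields every item
print(first_pass)
print(second_pass)
```

After calling tee, the original iterator (source here) should not be used directly any more, because advancing it would make the copies skip items.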

Iterators are my favorite topic ;)
from itertools import imap
class imap2(object):
    def __init__(self, f, *args):
        self.g = imap(f,*args)
        self.lst = []
        self.done = False
    def __iter__(self):
        while True:
            try: # try to get something from g
                x = next(self.g)
            except StopIteration:
                if self.done:
                    # give the old values
                    for x in self.lst:
                        yield x
                else:
                    # g was consumed for the first time
                    self.done = True
                return
            else:
                self.lst.append(x)
                yield x
l=imap2(str,xrange(1,4))
print list(l)
print list(l)

Related

Pandas: which “function names” can be used? (how are they looked up?)

When using pandas you can in certain cases pass names of functions as strings instead of actual references to those functions. For example: df.transform('round').
In the pandas docs they call these strings "function names".
I discovered that the lookup mechanism here doesn't look at the current namespace:
import pandas as pd
sales = pd.DataFrame(data={
    "price": [23.12, 22.34, 12.56, 27.78, 11.9],
})
display(sales)
def new_price(price):
    return price * 1.1
display(sales.transform('round')) # Works
display(sales.transform(new_price)) # Works
display(sales.transform('new_price')) # Does not work
My question: is there a list of these function names that you can use in cases like this?
This is the relevant code from the pandas source:
class Apply(metaclass=abc.ABCMeta):
    ...
    def _try_aggregate_string_function(self, obj, arg: str, *args, **kwargs):
        """
        if arg is a string, then try to operate on it:
        - try to find a function (or attribute) on ourselves
        - try to find a numpy function
        - raise
        """
        assert isinstance(arg, str)
        f = getattr(obj, arg, None)
        if f is not None:
            if callable(f):
                return f(*args, **kwargs)
            # people may try to aggregate on a non-callable attribute
            # but don't let them think they can pass args to it
            assert len(args) == 0
            assert len([kwarg for kwarg in kwargs if kwarg not in ["axis"]]) == 0
            return f
        f = getattr(np, arg, None)
        if f is not None and hasattr(obj, "__array__"):
            # in particular exclude Window
            return f(obj, *args, **kwargs)
        raise AttributeError(
            f"'{arg}' is not a valid function for '{type(obj).__name__}' object"
        )
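It basically looks for an attribute with that name on the object being operated on (obj) and, failing that, for a numpy function with that name. Names defined in your own namespace, such as new_price, are never consulted.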
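To make that lookup order concrete, here is a small standard-library sketch. It is not pandas code: Column and call_by_name are hypothetical names invented for this illustration, and the math module merely stands in for numpy as the fallback namespace.

```python
import math

class Column:
    """Hypothetical stand-in for a pandas object; not a pandas class."""
    def __init__(self, values):
        self.values = list(values)

    def round(self):
        return Column(round(v) for v in self.values)

def call_by_name(obj, name, fallback_module=math):
    """Resolve a string name the way the quoted source does, in simplified form."""
    # 1. Try an attribute of that name on the object itself.
    f = getattr(obj, name, None)
    if f is not None:
        return f() if callable(f) else f
    # 2. Fall back to a function of that name in the fallback module
    #    (math here, as a stand-in for numpy).
    f = getattr(fallback_module, name, None)
    if f is not None:
        return Column(f(v) for v in obj.values)
    # 3. Nothing else is searched, in particular not the caller's namespace.
    raise AttributeError("%r is not a valid function name" % name)

def new_price(price):
    return price * 1.1

prices = Column([23.12, 22.34])
rounded = call_by_name(prices, "round").values  # found on the object
floored = call_by_name(prices, "floor").values  # found in the fallback module
try:
    call_by_name(prices, "new_price")           # defined here, but never looked up
    found = True
except AttributeError:
    found = False
```

Under this reading, the usable names are simply the attributes of the object plus the functions of the fallback module, which is why no fixed list exists.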

scipy.optimize.minimize with general array indexing

I want to solve an optimization problem with the method 'COBYLA' in scipy.optimize.minimize as follows:
test = spopt.minimize(testobj, x_init, method='COBYLA', constraints=cons1)
y = test.x
print 'solution x =', y
However, since the program is quite large, a scalable way to write the objective function (and the constraints) is to use a general index for the arguments. For example, if I could use x['parameter1'] or x.param1 instead of x[0], then the program would be easier to read and debug. I tried both writing x as an object or a pandas Series with general indexing like x['parameter1'], as follows:
def testobj(x):
    return x['a']**2 + x['b'] + 1
def testcon1(x):
    return x['a']
def testcon2(x):
    return x['b']
def testcon3(x):
    return 1 - x['a'] - x['b']
x_init = pd.Series([0.1, 0.1])
x_init.index = ['a','b']
cons1 = ({'type': 'ineq', 'fun': testcon1}, \
         {'type': 'ineq', 'fun': testcon2}, \
         {'type': 'ineq', 'fun': testcon3})
but whenever I pass that into the minimize routine, it throws an error:
return x['a']**2 + x['b'] + 1
ValueError: field named a not found
It works perfectly if I use a normal numpy array. Perhaps I'm not doing it right, but is it a limitation of the minimize function that I have to use a numpy array and not any other data structure? The scipy documentation on this topic mentions that the initial guess has to be an ndarray, but I'm curious how the routine accesses the arguments, because for a pandas Series, x[0] and x['a'] are equivalent.
As you note, scipy optimize uses numpy arrays as input, not pandas Series. When you initialize with a pandas series, it effectively converts it to an array and so you cannot access the fields by name anymore.
Probably the easiest way to go is to just create a function which re-wraps the parameters each time you call them; for example:
def make_series(params):
    return pd.Series(params, index=['a', 'b'])
def testobj(x):
    x = make_series(x)
    return x['a']**2 + x['b'] + 1
def testcon1(x):
    x = make_series(x)
    return x['a']
def testcon2(x):
    x = make_series(x)
    return x['b']
def testcon3(x):
    x = make_series(x)
    return 1 - x['a'] - x['b']
x_init = make_series([1, 1])
test = spopt.minimize(testobj, x_init, method='COBYLA', constraints=cons1)
print('solution x =', test.x)
# solution x = [ 1.38777878e-17 0.00000000e+00]

return a list from class object

I am using the multiprocessing module to generate 35 dataframes, which I expect will save time. The problem is that the class does not return anything; I expect the list of dataframes to be returned from self.dflist.
Here is how the dfnames list is created:
urls=[]
fnames=[]
dfnames=[]
for x in xrange(100,3600,100):
    y = str(x)
    i = y.zfill(4)
    filename='DCHB_Town_Release_'+i+'.xlsx'
    url = "http://www.censusindia.gov.in/2011census/dchb/"+filename
    urls.append(url)
    fnames.append(filename)
    dfnames.append((filename, 'DCHB_Town_Release_'+i))
This is the class that uses the dfnames generated by above code.
import pandas as pd
import multiprocessing
class mydf1():
    def __init__(self, dflist, jobs, dfnames):
        self.dflist=list()
        self.jobs=list()
        self.dfnames=dfnames
    def dframe_create(self, filename, dfname):
        print 'abc', filename, dfname
        dfname=pd.read_excel(filename)
        self.dflist.append(dfname)
        print self.dflist
        return self.dflist
    def mp(self):
        for f,d in self.dfnames:
            p = multiprocessing.Process(target=self.dframe_create, args=(f,d))
            self.jobs.append(p)
            p.start()
        #return self.dflist
        for j in self.jobs:
            j.join()
            print '%s.exitcode = %s' % (j.name, j.exitcode)
This class when called like this...
dflist=[]
jobs=[]
x=mydf1(dflist, jobs, dfnames)
y=x.mp()
It prints self.dflist correctly, but does not return anything.
I can collect all dataframes sequentially. But in order to save time, I need to use multiple processes simultaneously to generate the dataframes and add them to a list.
In your case I'd prefer to write as little code as possible and use Pool:
import pandas as pd
import logging
import multiprocessing
def dframe_create(filename):
    try:
        return pd.read_excel(filename)
    except Exception as e:
        logging.error("Something went wrong: %s", e, exc_info=1)
        return None
p = multiprocessing.Pool()
# pass the plain filenames; dfnames holds (filename, dfname) tuples
excel_files = p.map(dframe_create, fnames)
for f in excel_files:
    if f is not None:
        print 'Ready to work'
    else:
        print ':('
Prints the self.dflist correctly. But does not return anything.
That's because you don't have a return statement in the mp method, e.g.
def mp(self):
    ...
    return self.dflist
It's not entirely clear what your issue is. However, you have to take some care here: you can't just share ordinary objects/lists across processes, because each child process works on its own copy. That's why multiprocessing provides special proxy objects, created through a Manager, which synchronize modifications to a list. That way you don't get tripped up when two processes try to make a change at the same time (and you only get one update).
That is, you have to use a multiprocessing Manager list.
manager = multiprocessing.Manager()
class mydf1():
    def __init__(self, dflist, jobs, dfnames):
        self.dflist = manager.list() # perhaps should be manager.list(dflist or ())
        self.jobs = list()
        self.dfnames = dfnames
However you have a bigger problem: the whole point of multiprocessing is that the processes may run/finish out of order, so keeping two lists like this is doomed to fail. You should use a Manager dict (manager.dict()); that way each DataFrame is saved unambiguously under its name.
class mydf1():
    def __init__(self, dflist, jobs, dfnames):
        self.dfdict = manager.dict()
        ...
    def dframe_create(self, filename, dfname):
        print 'abc', filename, dfname
        df = pd.read_excel(filename)
        self.dfdict[dfname] = df
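One way to keep each result tied to its name without shared state is to rely on the fact that a pool's map() returns results in input order. The sketch below is written for Python 3 and uses hypothetical names (load_all, fake_loader). It uses multiprocessing.pool.ThreadPool only so that it runs anywhere without a __main__ guard; a multiprocessing.Pool offers the same map() method and could be passed in instead, provided the loader is a picklable module-level function.

```python
from multiprocessing.pool import ThreadPool

def load_all(names, loader, pool):
    """Run loader(name) for every name on the pool and key the results by name."""
    names = list(names)
    results = pool.map(loader, names)  # map() returns results in input order
    return dict(zip(names, results))

def fake_loader(name):
    # Hypothetical stand-in for pd.read_excel(name).
    return name.upper()

pool = ThreadPool(2)
try:
    frames = load_all(["a.xlsx", "b.xlsx"], fake_loader, pool)
finally:
    pool.close()
    pool.join()
print(frames)
```

Because the results come back as the return value of map(), no Manager list or dict is needed in this variant.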

"Pythonic" way to "reset" an object's variables?

("variables" here refers to "names", I think; I'm not completely sure about the definition Pythonistas use)
I have an object and some methods. These methods all need and all change the object's variables. What is the most Pythonic way, respecting OOP techniques, to let the methods use the object's variables while keeping the original values available for the other methods?
Should I copy the object every time a method is called? Should I save the original values and have a reset() method to restore them every time a method needs them? Or is there an even better way?
EDIT: I was asked for pseudocode. Since I am more interested in understanding the concept rather than just specifically solving the problem I am encountering I am going to try give an example:
class Player():
    games = 0
    points = 0
    fouls = 0
    rebounds = 0
    assists = 0
    turnovers = 0
    steals = 0
    def playCupGame(self):
        # simulates a game and then assigns values to the variables, accordingly
        self.points = K #just an example
    def playLeagueGame(self):
        # simulates a game and then assigns values to the variables, accordingly
        self.points = Z #just an example
        self.rebounds = W #example again
    def playTrainingGame(self):
        # simulates a game and then assigns values to the variables, accordingly
        self.points = X #just an example
        self.rebounds = Y #example again
The above is my class for a Player object (for the example assume he is a basketball one). This object has three different methods that all assign values to the players' statistics.
So, let's say the team has two league games and then a cup game. I'd have to make these calls:
p.playLeagueGame()
p.playLeagueGame()
p.playCupGame()
It's obvious that when the second and the third calls are made, the previously changed statistics of the player need to be reset. For that, I can either write a reset method that sets all the variables back to 0, or copy the object for every call I make. Or do something completely different.
That's where my question lies: what's the best approach, Python- and OOP-wise?
UPDATE: I suspect that I have overcomplicated this and can easily solve my problem by using local variables in the functions. However, what happens if I have a function inside another function? Can I use locals of the outer one inside the inner one?
Not sure if it's "Pythonic" enough, but you can define a "resettable" decorator for the __init__ method that creates a copy of the object's __dict__ and adds a reset() method that switches the current __dict__ back to the original one.
Edit - Here's an example implementation:
def resettable(f):
    import copy
    def __init_and_copy__(self, *args, **kwargs):
        f(self, *args, **kwargs)
        self.__original_dict__ = copy.deepcopy(self.__dict__)
        def reset(o = self):
            o.__dict__ = o.__original_dict__
        self.reset = reset
    return __init_and_copy__
class Point(object):
    @resettable
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __str__(self):
        return "%d %d" % (self.x, self.y)
class LabeledPoint(Point):
    @resettable
    def __init__(self, x, y, label):
        self.x = x
        self.y = y
        self.label = label
    def __str__(self):
        return "%d %d (%s)" % (self.x, self.y, self.label)
p = Point(1, 2)
print p # 1 2
p.x = 15
p.y = 25
print p # 15 25
p.reset()
print p # 1 2
p2 = LabeledPoint(1, 2, "Test")
print p2 # 1 2 (Test)
p2.x = 3
p2.label = "Test2"
print p2 # 3 2 (Test2)
p2.reset()
print p2 # 1 2 (Test)
Edit2: Added a test with inheritance
I'm not sure about "pythonic", but why not just create a reset method in your object that does whatever resetting is required? Call this method as part of your __init__ so you're not duplicating the data (ie: always (re)initialize it in one place -- the reset method)
I would create a default dict as a data member with all of the default values, then do __dict__.update(self.default) during __init__ and then again at some later point to pull all the values back.
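A minimal sketch of that idea, assuming a class-level dictionary named defaults (the attribute names are illustrative, loosely borrowed from the question's Player class):

```python
class Player(object):
    # All default values live in one place.
    defaults = {"points": 0, "rebounds": 0, "fouls": 0}

    def __init__(self):
        self.reset()

    def reset(self):
        # Copy the defaults into the instance, overwriting current values.
        self.__dict__.update(self.defaults)

p = Player()
p.points = 30
p.rebounds = 12
p.reset()
print(p.points, p.rebounds)
```

update() copies references only, so mutable defaults such as lists would be shared between instances; copy.deepcopy(self.defaults) avoids that if it matters.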
More generally, you can use a __setattr__ hook to keep track of every variable that has been changed and later use that data to reset them.
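As one possible reading of the __setattr__ suggestion, the sketch below records the first old value of every attribute that gets changed and restores those values on reset(). Tracked and _MISSING are hypothetical names chosen for this example.

```python
_MISSING = object()  # marks attributes that did not exist before being set

class Tracked(object):
    def __init__(self, **initial):
        # Bypass the hook while setting up the bookkeeping and initial state.
        object.__setattr__(self, "_original", {})
        for name, value in initial.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        # Remember the value an attribute had before its first change.
        if name not in self._original:
            self._original[name] = getattr(self, name, _MISSING)
        object.__setattr__(self, name, value)

    def reset(self):
        # Restore every changed attribute, deleting ones that were newly added.
        for name, old in self._original.items():
            if old is _MISSING:
                object.__delattr__(self, name)
            else:
                object.__setattr__(self, name, old)
        self._original.clear()

p = Tracked(points=0, fouls=0)
p.points = 12
p.points = 20
p.extra = 1
p.reset()
```

Only rebinding an attribute is tracked; in-place mutation of a list or dict stored on the object would go unnoticed by this hook.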
Sounds like you want to know if your class should be an immutable object. The idea is that, once created, an immutable object can't/shouldn't/wouldn't be changed.
In Python, instances of built-in types like int or tuple are immutable, enforced by the language:
>>> a=(1, 2, 3, 1, 2, 3)
>>> a[0] = 9
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
As another example, every time you add two integers a new instance is created:
>>> a=5000
>>> b=7000
>>> d=a+b
>>> d
12000
>>> id(d)
42882584
>>> d=a+b
>>> id(d)
42215680
The id() function returns the identity of the int object 12000 (in CPython, its memory address). Every time we compute a+b here, a new 12000 object is created.
User defined immutable classes must be enforced manually, or simply done as a convention with a source code comment:
class X(object):
    """Immutable class. Don't change instance variables values!"""
    def __init__(self, *args):
        self._some_internal_value = ...
    def some_operation(self, arg0):
        new_instance = X(arg0 + ...)
        new_instance._some_internal_operation(self._some_internal_value, 42)
        return new_instance
    def _some_internal_operation(self, a, b):
        """..."""
Either way, it's OK to create a new instance for every operation.
See the Memento Design Pattern if you want to restore previous state, or the Proxy Design Pattern if you want the object to seem pristine, as if just created. In any case, you need to put something between what's referenced and its state.
Please comment if you need some code, though I'm sure you'll find plenty on the web if you use the design pattern names as keywords.
# The Memento design pattern
class Scores(object):
    ...
class Player(object):
    def __init__(self,...):
        ...
        self.scores = None
        self.history = []
        self.reset()
    def reset(self):
        if (self.scores):
            self.history.append(self.scores)
        self.scores = Scores()
It sounds like overall your design needs some reworking. What about a PlayerGameStatistics class that would keep track of all that, and either a Player or a Game would hold a collection of these objects?
Also the code you show is a good start, but could you show more code that interacts with the Player class? I'm just having a hard time seeing why a single Player object should have PlayXGame methods -- does a single Player not interact with other Players when playing a game, or why does a specific Player play the game?
A simple reset method (called in __init__ and re-called when necessary) makes a lot of sense. But here's a solution that I think is interesting, if a bit over-engineered: create a context manager. I'm curious what people think about this...
from contextlib import contextmanager
@contextmanager
def resetting(resettable):
    try:
        resettable.setdef()
        yield resettable
    finally:
        resettable.reset()
class Resetter(object):
    def __init__(self, foo=5, bar=6):
        self.foo = foo
        self.bar = bar
    def setdef(self):
        self._foo = self.foo
        self._bar = self.bar
    def reset(self):
        self.foo = self._foo
        self.bar = self._bar
    def method(self):
        with resetting(self):
            self.foo += self.bar
            print self.foo
r = Resetter()
r.method() # prints 11
r.method() # still prints 11
To over-over-engineer, you could then create a @resetme decorator:
def resetme(f):
    def rf(self, *args, **kwargs):
        with resetting(self):
            # pass the wrapped method's return value through
            return f(self, *args, **kwargs)
    return rf
So that instead of having to explicitly use with you could just use the decorator:
@resetme
def method(self):
    self.foo += self.bar
    print self.foo
I liked (and tried) the top answer from PaoloVictor. However, I found that it "reset" itself, i.e., if you called reset() a 2nd time it would throw an exception.
I found that it worked repeatably with the following implementation
def resettable(f):
    import copy
    def __init_and_copy__(self, *args, **kwargs):
        f(self, *args, **kwargs)
        def reset(o = self):
            o.__dict__ = o.__original_dict__
            o.__original_dict__ = copy.deepcopy(self.__dict__)
        self.reset = reset
        self.__original_dict__ = copy.deepcopy(self.__dict__)
    return __init_and_copy__
It sounds to me like you need to rework your model to at least include a separate "PlayerGameStats" class.
Something along the lines of:
import collections
# namedtuple needs the type name as its first argument
PlayerGameStats = collections.namedtuple("PlayerGameStats", "points fouls rebounds assists turnovers steals")
class Player():
    def __init__(self):
        self.cup_games = []
        self.league_games = []
        self.training_games = []
    def playCupGame(self):
        # simulates a game and then assigns values to the variables, accordingly
        stats = PlayerGameStats(points, fouls, rebounds, assists, turnovers, steals)
        self.cup_games.append(stats)
    def playLeagueGame(self):
        # simulates a game and then assigns values to the variables, accordingly
        stats = PlayerGameStats(points, fouls, rebounds, assists, turnovers, steals)
        self.league_games.append(stats)
    def playTrainingGame(self):
        # simulates a game and then assigns values to the variables, accordingly
        stats = PlayerGameStats(points, fouls, rebounds, assists, turnovers, steals)
        self.training_games.append(stats)
And to answer the question in your edit, yes nested functions can see variables stored in outer scopes. You can read more about that in the tutorial: http://docs.python.org/tutorial/classes.html#python-scopes-and-namespaces
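A short sketch of that scoping rule, written for Python 3 (an assumption; the nonlocal statement does not exist in Python 2, where an inner function can read an outer local but cannot rebind it). The function names are made up for the illustration.

```python
def simulate_game():
    points = 0  # local variable of the outer function

    def score(n):
        nonlocal points  # required only because we rebind the outer name
        points += n

    def current():
        return points    # merely reading the outer local needs no declaration

    score(2)
    score(3)
    return current()

total = simulate_game()
print(total)
```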
Thanks for the nice input, as I had a similar problem. I'm solving it with a hook on the __init__ method, since I'd like to be able to reset to whatever initial state an object had. Here's my code:
import copy
_tool_init_states = {}
def wrap_init(init_func):
    def init_hook(inst, *args, **kws):
        if inst not in _tool_init_states:
            # if there is a class hierarchy, only the outer scope does work
            _tool_init_states[inst] = None
            res = init_func(inst, *args, **kws)
            _tool_init_states[inst] = copy.deepcopy(inst.__dict__)
            return res
        else:
            return init_func(inst, *args, **kws)
    return init_hook
def reset(inst):
    inst.__dict__.clear()
    inst.__dict__.update(
        copy.deepcopy(_tool_init_states[inst])
    )
class _Resettable(type):
    """Wraps __init__ to store object _after_ init."""
    def __new__(mcs, *more):
        mcs = super(_Resettable, mcs).__new__(mcs, *more)
        mcs.__init__ = wrap_init(mcs.__init__)
        mcs.reset = reset
        return mcs
class MyResettableClass(object):
    __metaclass__ = _Resettable
    def __init__(self):
        self.do_whatever = "you want,"
        self.it_will_be = "resetted by calling reset()"
To update the initial state, you could build some method like reset(...) that writes data into _tool_init_states. I hope this helps somebody. If this is possible without a metaclass, please let me know.
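On the question of doing this without a metaclass, one option might be a class decorator. The following is only a sketch under that assumption, written for Python 3; resettable and _initial_state are names chosen for the example, and only the simple single-class case is exercised here.

```python
import copy
import functools

def resettable(cls):
    """Class decorator: snapshot instance state after __init__ and add reset()."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def __init__(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        # Snapshot taken after __init__ has finished.
        self._initial_state = copy.deepcopy(self.__dict__)

    def reset(self):
        snapshot = self._initial_state
        self.__dict__.clear()
        self.__dict__.update(copy.deepcopy(snapshot))
        # Keep the snapshot so reset() can be called repeatedly.
        self._initial_state = snapshot

    cls.__init__ = __init__
    cls.reset = reset
    return cls

@resettable
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.x = 15
p.reset()
first_reset = (p.x, p.y)
p.y = 99
p.reset()
second_reset = (p.x, p.y)
```

Unlike the metaclass version, subclasses are not wrapped automatically; each class whose __init__ should be snapshotted would need the decorator itself.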

Specify action to be performed at the end of many functions

I have a python object in which a bunch of functions need to perform the same action at the end of execution, just before the return statement. For example:
class MyClass(object):
    def __init__(self):
        pass
    def update_everything(self):
        '''update everything'''
        pass
    def f1(self):
        #do stuff
        self.update_everything()
        return result
    def f2(self):
        #do stuff
        self.update_everything()
        return result
    def f3(self):
        #do stuff
        self.update_everything()
        return result
What is the best (pythonic?) way to do this, except for the explicit calls at the end of each function?
I think that any solution to your problem would be unpythonic, because (as Tim Peters says in the Zen of Python (import this)):
Explicit is better than implicit.
Yes, using a decorator is actually more code, but it does have the advantage that you can see that a method updates everything at a glance. It's a different kind of explicitness ;-)
def update_after(m):
    """ calls self.update_everything() after method m """
    def decorated(self, *args, **kwargs):
        r = m(self, *args, **kwargs)
        self.update_everything()
        return r
    return decorated
class MyClass(object):
    def __init__(self):
        pass
    def update_everything(self):
        '''update everything'''
        pass
    @update_after
    def f1(self):
        #do stuff
        return result
    @update_after
    def f2(self):
        #do stuff
        return result
    @update_after
    def f3(self):
        #do stuff
        return result
Maybe the other way round?
class MyClass(object):
    def update(self, func):
        value = func()
        # do something common
        return value
    def f1(self):
        # do stuff
        return result
    def f2(self):
        # do stuff
        return result
my_object = MyClass()
my_object.update(my_object.f1)
Edit:
You could also write it in such a way that update accepts a string naming one of the object's methods. This would prevent running other objects' methods.
my_object.update('f1')
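A minimal sketch of that string-based variant, using getattr to resolve the name on the object itself. The updates counter and the return value of f1 are hypothetical, added only so the effect is observable.

```python
class MyClass(object):
    def __init__(self):
        self.updates = 0  # hypothetical counter, just to observe the common step

    def update_everything(self):
        self.updates += 1

    def update(self, method_name, *args, **kwargs):
        # Resolve the name on this object only; unknown names raise AttributeError.
        method = getattr(self, method_name)
        value = method(*args, **kwargs)
        self.update_everything()  # the common action, run after every call
        return value

    def f1(self):
        return "f1 result"

my_object = MyClass()
result = my_object.update("f1")
```

Because the lookup goes through getattr on self, a misspelled or foreign method name fails immediately instead of silently running something else.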