I am trying to use numpy's SeedSequence to seed RNGs in different processes. However, I am not sure whether I should use ss.generate_state or ss.spawn:
import concurrent.futures
import numpy as np

def worker(seed):
    rng = np.random.default_rng(seed)
    return rng.random(1)

num_repeats = 1000

ss = np.random.SeedSequence(243799254704924441050048792905230269161)
with concurrent.futures.ProcessPoolExecutor() as pool:
    result1 = np.hstack(list(pool.map(worker, ss.generate_state(num_repeats))))

ss = np.random.SeedSequence(243799254704924441050048792905230269161)
with concurrent.futures.ProcessPoolExecutor() as pool:
    result2 = np.hstack(list(pool.map(worker, ss.spawn(num_repeats))))
What are the differences between the two approaches and which should I use?
Using ss.generate_state is ~10% faster for the basic example above, likely because we are serializing integers instead of SeedSequence objects.
Well, SeedSequence is intended to generate good-quality seeds from not-so-good seeds.
Performance
generate_state is much faster than spawn.
The time spent transferring (pickling) the objects or the states to the worker processes should not be the main reason for the difference you notice. You can test this without any multiprocessing at all:
%%timeit
ss.generate_state(num_repeats)

runs roughly 100 times faster than

%%timeit
ss.spawn(num_repeats)
Seed size
In map(worker, ss.generate_state(num_repeats)) the RNGs are seeded with integers, while in map(worker, ss.spawn(num_repeats)) the RNGs are seeded with SeedSequence objects, which can potentially give a higher-quality initialization: a SeedSequence can be used internally to generate a state vector with as many bits as required to completely initialize the RNG. To be honest, I would expect the generator to do some expansion on a plain integer seed as well, rather than simply padding it with zeros, for example.
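As a small illustration (the seed value 12345 is arbitrary), generate_state produces 32-bit words by default, and a wider dtype can be requested:

import numpy as np

ss = np.random.SeedSequence(12345)
print(ss.generate_state(4))                   # four uint32 words
print(ss.generate_state(4, dtype=np.uint64))  # four uint64 words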
Repeated use
The most important difference is that generate_state gives the same result if called multiple times, whereas spawn gives different results on each call.
For illustration, check the following example:
ss = np.random.SeedSequence(243799254704924441050048792905230269161)
print('With generate_state')
print(np.hstack([worker(s) for s in ss.generate_state(5)]))
print(np.hstack([worker(s) for s in ss.generate_state(5)]))
print('With spawn')
print(np.hstack([worker(s) for s in ss.spawn(5)]))
print(np.hstack([worker(s) for s in ss.spawn(5)]))
With generate_state
[0.6625651 0.17654256 0.25323331 0.38250588 0.52670541]
[0.6625651 0.17654256 0.25323331 0.38250588 0.52670541]
With spawn
[0.06988312 0.40886412 0.55733136 0.43249601 0.53394111]
[0.64885573 0.16788206 0.12435154 0.14676836 0.51876499]
As you can see, the arrays generated by seeding different RNGs with generate_state are identical, not only immediately after construction but every time the method is called. spawn should also give you reproducible results from a newly constructed SeedSequence (I am using numpy 1.19.2); however, if you run the same code twice on the same instance, the second run will produce different seeds.
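A minimal sketch of that reproducibility property, reusing the entropy value from the question:

import numpy as np

entropy = 243799254704924441050048792905230269161
children_a = np.random.SeedSequence(entropy).spawn(3)
children_b = np.random.SeedSequence(entropy).spawn(3)

# Spawning from a freshly constructed SeedSequence is deterministic,
# so the children (and hence their spawn keys) match pairwise.
assert all(a.spawn_key == b.spawn_key for a, b in zip(children_a, children_b))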
Related
I get a memory error when using numpy.arange with large numbers. My code is as follows:
import numpy as np

list = np.arange(0, 10**15, 10**3)

profit_list = []
for diff in list:
    x = do_some_calculation
    profit_list.append(x)
What can be a replacement so I can avoid getting the memory error?
If you replace list¹ with a lazy range, that is, you do
for diff in range(0, 10**15, 10**3):
    x = do_some_calculation
    profit_list.append(x)
then that will no longer cause MemoryErrors, as you no longer instantiate the full array. In this case, though, profit_list will probably be causing issues instead, as you are trying to add 10^12 items to it. Again, you can probably get around that by not storing the values explicitly, but rather yielding them as you need them, using generators.
¹: Side note: Don't use list as a variable name as it shadows a built-in.
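For completeness, a minimal sketch of that generator approach; do_some_calculation(diff) stands in for whatever computation you actually run:

def profits():
    for diff in range(0, 10**15, 10**3):
        # yield each result lazily instead of accumulating 10**12 items
        yield do_some_calculation(diff)

for p in profits():
    ...  # consume each value as it is produced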
I am in the midst of trying to make the leap from Matlab to numpy, but I desperately need speed in my FFTs. Now I know of pyfftw, but I don't know that I am using it properly. My approach goes something like this:
import numpy as np
import pyfftw
import timeit

pyfftw.interfaces.cache.enable()

def wrapper(func, *args):
    def wrapped():
        return func(*args)
    return wrapped

def my_fft(v):
    global a
    global fft_object
    a[:] = v
    return fft_object()

def init_cond(X):
    return my_fft(2.*np.cosh(X)**(-2))

def init_cond_py(X):
    return np.fft.fft(2.*np.cosh(X)**(-2))

K = 2**16
Llx = 10.
KT = 2*K
dx = Llx/np.float64(K)
X = np.arange(-Llx, Llx, dx)

global a
global b
global fft_object
a = pyfftw.n_byte_align_empty(KT, 16, 'complex128')
b = pyfftw.n_byte_align_empty(KT, 16, 'complex128')
fft_object = pyfftw.FFTW(a, b)

wrapped = wrapper(init_cond, X)
print(min(timeit.repeat(wrapped, repeat=100, number=1)))

wrapped_two = wrapper(init_cond_py, X)
print(min(timeit.repeat(wrapped_two, repeat=100, number=1)))
I appreciate that there are builder functions and also standard interfaces to the scipy and numpy fft calls through pyfftw. These have all behaved very slowly though. By first creating an instance of the fft_object and then using it globally, I have been able to get speeds as fast or slightly faster than numpy's fft call.
That being said, I am working under the assumption that wisdom is implicitly being stored. Is that true? Do I need to make that explicit? If so, what is the best way to do that?
Also, I think timeit is completely opaque. Am I using it properly? Is it storing wisdom as I call repeat? Thanks in advance for any help you might be able to give.
In an interactive (ipython) session, I think the following is what you want to do (timeit is very nicely handled by ipython):
In [1]: import numpy as np
In [2]: import pyfftw
In [3]: K = 2**16
In [4]: Llx = 10.
In [5]: KT = 2*K
In [6]: dx = Llx/np.float64(K)
In [7]: X = np.arange(-Llx,Llx,dx)
In [8]: a = pyfftw.n_byte_align_empty(KT, 16, 'complex128')
In [9]: b = pyfftw.n_byte_align_empty(KT, 16, 'complex128')
In [10]: fft_object = pyfftw.FFTW(a,b)
In [11]: a[:] = 2.*np.cosh(X)**(-2)
In [12]: timeit np.fft.fft(a)
100 loops, best of 3: 4.96 ms per loop
In [13]: timeit fft_object(a)
100 loops, best of 3: 1.56 ms per loop
In [14]: np.allclose(fft_object(a), np.fft.fft(a))
Out[14]: True
Have you read the tutorial? What don't you understand?
I would recommend using the builders interface to construct the FFTW object. Have a play with the various settings, most importantly the number of threads.
The wisdom is not stored by default. You need to extract it yourself.
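For example, a hedged sketch of persisting wisdom manually with pyfftw.export_wisdom and pyfftw.import_wisdom (the pickle file name is just an example):

import pickle
import pyfftw

# export_wisdom returns a tuple of byte strings describing the accumulated plans
wisdom = pyfftw.export_wisdom()
with open('fftw_wisdom.pickle', 'wb') as f:
    pickle.dump(wisdom, f)

# ...and in a later session, reload it before creating FFTW objects
with open('fftw_wisdom.pickle', 'rb') as f:
    pyfftw.import_wisdom(pickle.load(f))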
All your globals are unnecessary - the objects you want to change are mutable, so you can modify them without global declarations. fft_object always points to the same thing, so there is no problem with it not being a global. Ideally, you simply don't want that loop over ii. I suggest working out how to structure your arrays so that you can do all your operations in a single call.
Edit:
[edit edit: I wrote the following paragraph with only a cursory glance at your code, and clearly with it being a recursive update, vectorising is not an obvious approach without some serious cunning. I have a few comments on your implementation at the bottom though]
I suspect your problem is a more fundamental misunderstanding of how to best use a language like Python (or indeed Matlab) for numerical processing. The core tenet is vectorise as much as possible. By this, I mean roll up your python calls to be as few as possible. I can't see how to do that with your example unfortunately (though I've only thought about it for 2 mins). If that's still failing, think about cython - though make sure you really want to go down that route (i.e. you've exhausted the other options).
Regarding the globals: Don't do it that way. If you want to create an object with state, use a class (that is what they are for) or perhaps a closure in your case. The global is almost never what you want (I think I have one at least vaguely legit use for it in all my writing of python, and that's in the cache code in pyfftw). I suggest reading this nice SO question. Matlab is a crappy language - one of the many reasons for this is its crap scoping facilities which tend to lead to bad habits.
You only need global if you want to modify a reference globally. I suggest reading a bit more about the Python scoping rules and what variables really are in python.
FFTW objects carry with them all the arrays you need, so you don't need to pass them around separately. Using the call interface carries almost no overhead (particularly if you disable the normalisation), either for setting or returning the values - if you're at that level of optimisation, I strongly suspect you've hit the limit (I'd caveat that this may not quite be true for many, many very small FFTs, but at that point you want to rethink your algorithm to vectorise the calls to FFTW). If you find a substantial overhead in updating the arrays every time (using the call interface), this is a bug and you should submit it as such (and I'd be pretty surprised).
Bottom line, don't worry about updating the arrays on every call. This is almost certainly not your bottleneck, though make sure you're aware of the normalisation and disable it if you wish (it might slow things down slightly compared to raw accessing of the update_arrays() and execute() methods).
Your code makes no use of the cache. The cache is only used when you're using the interfaces code, and reduces the Python overhead in creating new FFTW objects internally. Since you're handling the FFTW object yourself, there is no reason for a cache.
The builders code is a less constrained interface for getting an FFTW object. I almost always use the builders now (it's much more convenient than creating an FFTW object from scratch). The cases in which you want to create an FFTW object directly are pretty rare and I'd be interested to know what they are.
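As an illustration, a minimal sketch of the builders interface; the array size and thread count are arbitrary example values:

import numpy as np
import pyfftw

a = pyfftw.empty_aligned(2**17, dtype='complex128')
fft = pyfftw.builders.fft(a, threads=4)  # returns a planned, callable FFTW object

a[:] = np.random.standard_normal(len(a)) + 1j*np.random.standard_normal(len(a))
result = fft()  # execute the plan on the current contents of a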
Comments on the algorithm implementation:
I'm not familiar with the algorithm you're implementing. However, I have a few comments on how you've written it at the moment.
You're computing nl_eval(wp) on every loop, but as far as I can tell that's just the same as nl_eval(w) from the previous loop, so you don't need to compute it twice (but this comes with the caveat that it's pretty hard to see what's going on when you have globals everywhere, so I might be missing something).
Don't bother with the copies in my_fft or my_ifft. Simply do fft_object(u) (2.29 ms versus 1.67 ms on my machine for the forward case). The internal array update routine makes the copy unnecessary. Also, as you've written it, you're copying twice: c[:] means "copy into the array c", and the array you're copying into c is v.copy(), i.e. a copy of v (so two copies in total).
More sensible (and probably necessary) is copying the output into holding arrays (since that avoids clobbering interim results on calls to the FFTW object), though make sure your holding arrays are properly aligned. I'm sure you've noted this is important but it's rather more understandable to copy the output.
You can move all your scalings together. The 3 in the computation of wn can be moved inside my_fft in nl_eval. You can also combine this with the normalisation constant from the ifft (and turn that normalisation off in pyfftw).
Take a look at numexpr for the basic array operations. It can offer quite a bit of speed-up over vanilla numpy.
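For instance, a rough sketch of what numexpr does for the kind of elementwise expression in the question (the array size is arbitrary):

import numpy as np
import numexpr as ne

x = np.linspace(-10., 10., 2**20)
slow = 2.*np.cosh(x)**(-2)           # plain numpy: several temporary arrays
fast = ne.evaluate("2./cosh(x)**2")  # same result, fused into a single pass over the data
assert np.allclose(slow, fast)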
Anyway take what you will from all that. No doubt I've missed something or said something incorrect, so please accept it with as much humility as I can offer. It's worth spending a little time working out how Python ticks compared to Matlab (in fact, just forget the latter).
I am running a wavelet transform (cmor) to estimate the damping and frequencies present in a signal. cmor has two parameters that I can change to get more accurate results: center frequency (Fc) and bandwidth frequency (Fb). If I construct a signal with a few known frequencies and dampings, then I can measure the error of my estimation (fig 2). But in the actual case I have a signal whose frequencies and dampings I don't know, so I can't measure the error. So a friend here suggested that I reconstruct the signal and measure the error as the difference between the original and reconstructed signal, e(t) = |x(t) − x̂(t)|.
So my questions are:
Does anyone know a better function for measuring the error between the reconstructed and original signal than e(t) = |x(t) − x̂(t)|?
Can I use a GA to search for Fb and Fc, or do you know a better search method?
I hope this picture shows what I mean; the actual case is the last one, the others are for explanation.
Thanks in advance
You say you don't know the error until after running the wavelet transform, but that's fine. You just run a wavelet transform for every individual the GA produces. Those individuals with lower errors are considered fitter and survive with greater probability. This may be very slow, but conceptually at least, that's the idea.
Let's define a Chromosome datatype containing an encoded pair of values, one for the frequency and another for the damping parameter. Don't worry too much about how they're encoded for now; just assume it's an array of two doubles if you like. All that's important is that you have a way to get the values out of the chromosome. For now, I'll just refer to them by name, but you could represent them in binary, as an array of doubles, etc. The other member of the Chromosome type is a double storing its fitness.
We can obviously generate random frequency and damping values, so let's create say 100 random Chromosomes. We don't know how to set their fitness yet, but that's fine. Just set it to zero at first. To set the real fitness value, we're going to have to run the wavelet transform once for each of our 100 parameter settings.
for Chromosome chr in population
    chr.fitness = run_wavelet_transform(chr.frequency, chr.damping)
end
Now we have 100 possible wavelet transforms, each with a computed error, stored in our set called population. What's left is to select fitter members of the population, breed them, and allow the fitter members of the population and offspring to survive into the next generation.
while not done
    offspring = new_population()
    while count(offspring) < N
        parent1, parent2 = select_parents(population)
        child1, child2 = do_crossover(parent1, parent2)
        mutate(child1)
        mutate(child2)
        child1.fitness = run_wavelet_transform(child1.frequency, child1.damping)
        child2.fitness = run_wavelet_transform(child2.frequency, child2.damping)
        offspring.add(child1)
        offspring.add(child2)
    end while
    population = merge(population, offspring)
end while
There are a bunch of different ways to do the individual steps like select_parents, do_crossover, mutate, and merge here, but the basic structure of the GA stays pretty much the same. You just have to run a brand new wavelet decomposition for every new offspring.
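To make that concrete, here is a hedged Python sketch of the same loop. run_wavelet_transform(fc, fb) is a hypothetical function returning the reconstruction error for a parameter pair, and all the GA settings (population size, truncation selection, blend crossover, Gaussian mutation) are just example choices; fitness is recomputed each generation for brevity.

import random

def ga_search(run_wavelet_transform, pop_size=100, generations=50,
              fc_range=(0.5, 5.0), fb_range=(0.5, 5.0), mutation_sigma=0.1):
    def clamp(value, lo, hi):
        return max(lo, min(hi, value))

    # Random initial population of (Fc, Fb) pairs.
    population = [(random.uniform(*fc_range), random.uniform(*fb_range))
                  for _ in range(pop_size)]

    for _ in range(generations):
        # Lower reconstruction error means a fitter individual.
        scored = sorted(population, key=lambda ind: run_wavelet_transform(*ind))
        parents = scored[:pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = random.sample(parents, 2)
            fc = (p1[0] + p2[0]) / 2 + random.gauss(0, mutation_sigma)  # blend crossover
            fb = (p1[1] + p2[1]) / 2 + random.gauss(0, mutation_sigma)  # plus Gaussian mutation
            children.append((clamp(fc, *fc_range), clamp(fb, *fb_range)))
        population = parents + children

    return min(population, key=lambda ind: run_wavelet_transform(*ind))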
Hey guys, I know that there are a million questions on random numbers, but exactly because of that I searched a lot and couldn't find something similar to mine - which is not to imply it's not there. In any case, pardon me if I am repeating a question; just point me to it if that's the case.
So, I wanna do something simple in the most efficient way.
I want to generate all the integers in the range [0, N] in random order, one by one, such that there are no repetitions.
I know I can do this by inserting everything into a list, shuffling it, taking the head, and then removing the head from the list. But then I will have shuffled my list of length N, N-1 times.
Any better / faster idea?
You can just do one shuffle, and then step through the list.
I'd recommend a Fisher-Yates shuffle.
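A minimal sketch of that approach (in practice random.shuffle does the same thing in place):

import random

def fisher_yates(n):
    values = list(range(n))
    # Walk backwards, swapping each position with a random earlier (or same) position.
    for i in range(n - 1, 0, -1):
        j = random.randint(0, i)
        values[i], values[j] = values[j], values[i]
    return values

for value in fisher_yates(10):
    print(value)  # each of 0..9 exactly once, in random order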
This question has been asked a few times, and in each case the correct answer given is to shuffle an array (either the original, or an array of indices), however this isn't a satisfactory answer in cases where the number of possible indices is prohibitively large (either it's huge, or memory is tight, or you simply crave maximum efficiency for whatever reason).
As such I want to add an alternative for the sake of completeness. Now, this isn't truly random, so if that's what you need then do not use this, however, if your goal is simply "good enough" with minimal memory requirements then the following pseudo-code may be of interest:
function init:
    start = random [0, length)        // Pick a fully random starting index
    stride = random [1, length - 1)   // Pick a random step size
    next_index = start

function advance_next_index:
    next_index = (next_index + stride) % length
    if next_index is equal to start then
        start = (start + 1) % length
        next_index = start
Here's an example of how to implement a re-usable function for grabbing pseudo-random values:
counter = length

function pseudo_random:
    if counter is equal to length then
        init()
        counter = 0
    advance_next_index()
    counter = counter + 1
    return next_index
Quite simply, pseudo_random will call init once every length iterations, re-randomising the pattern of results produced by advance_next_index, and ensuring that within every run of length values there is not a single duplicate (strictly speaking this only holds when stride is coprime with length).
To reiterate: this isn't a particularly random algorithm, so it must not be used in situations where true randomness is required. However, the results are random enough for some basic, non-critical tasks, and it has a tiny memory footprint. For example, if you just want to randomise some behaviour in a game to avoid it becoming repetitive, or the data set is large and never exposed to the user (in which case it is effectively random to them, since it would take a long time to piece together the order and somehow exploit it).
If anyone knows of any better algorithms with similar properties then please share!
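For what it's worth, a hedged Python sketch of the same idea; here the stride is deliberately chosen coprime with length so that every index really is visited exactly once per cycle (the class and method names are mine, not from any library):

import math
import random

class StrideSequence:
    """Cheap pseudo-random walk over range(length) using O(1) memory.

    Not truly random: a random start and a random stride coprime with
    length are picked, so each block of `length` calls yields every
    index exactly once before re-initialising. Assumes length >= 2.
    """
    def __init__(self, length):
        self.length = length
        self.counter = 0
        self._reinit()

    def _reinit(self):
        self.start = random.randrange(self.length)
        self.stride = random.randrange(1, self.length)
        while math.gcd(self.stride, self.length) != 1:
            self.stride = random.randrange(1, self.length)
        self.index = self.start

    def next_value(self):
        if self.counter == self.length:
            self._reinit()
            self.counter = 0
        self.index = (self.index + self.stride) % self.length
        self.counter += 1
        return self.index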
The table in question contains roughly ten million rows.
for event in Event.objects.all():
    print(event)
This causes memory usage to increase steadily to 4 GB or so, at which point the rows print rapidly. The lengthy delay before the first row printed surprised me – I expected it to print almost instantly.
I also tried Event.objects.iterator() which behaved the same way.
I don't understand what Django is loading into memory or why it is doing this. I expected Django to iterate through the results at the database level, which would mean the results would be printed at roughly a constant rate (rather than all at once after a lengthy wait).
What have I misunderstood?
(I don't know whether it's relevant, but I'm using PostgreSQL.)
Nate C was close, but not quite.
From the docs:
You can evaluate a QuerySet in the following ways:
Iteration. A QuerySet is iterable, and it executes its database query the first time you iterate over it. For example, this will print the headline of all entries in the database:
for e in Entry.objects.all():
    print(e.headline)
So your ten million rows are retrieved, all at once, when you first enter that loop and get the iterating form of the queryset. The wait you experience is Django loading the database rows and creating objects for each one, before returning something you can actually iterate over. Then you have everything in memory, and the results come spilling out.
From my reading of the docs, iterator() does nothing more than bypass QuerySet's internal caching mechanisms. I think it might make sense for it to do a one-by-one thing, but that would conversely require ten million individual hits on your database. Maybe not all that desirable.
Iterating over large datasets efficiently is something we still haven't gotten quite right, but there are some snippets out there you might find useful for your purposes:
Memory Efficient Django QuerySet iterator
batch querysets
QuerySet Foreach
It might not be the fastest or most efficient, but as a ready-made solution why not use Django core's Paginator and Page objects, documented here:
https://docs.djangoproject.com/en/dev/topics/pagination/
Something like this:
from django.core.paginator import Paginator
from djangoapp.models import model

paginator = Paginator(model.objects.all(), 1000)  # chunks of 1000; change this to the desired chunk size

for page in range(1, paginator.num_pages + 1):
    for row in paginator.page(page).object_list:
        # here you can do whatever you want with the row
        pass
    print("done processing page %s" % page)
Django's default behavior is to cache the whole result of the QuerySet when it evaluates the query. You can use the QuerySet's iterator method to avoid this caching:
for event in Event.objects.all().iterator():
    print(event)
https://docs.djangoproject.com/en/stable/ref/models/querysets/#iterator
The iterator() method evaluates the queryset and then reads the results directly without doing caching at the QuerySet level. This method results in better performance and a significant reduction in memory when iterating over a large number of objects that you only need to access once. Note that caching is still done at the database level.
Using iterator() reduces memory usage for me, but it is still higher than I expected. Using the paginator approach suggested by mpaf uses much less memory, but is 2-3x slower for my test case.
from django.core.paginator import Paginator

def chunked_iterator(queryset, chunk_size=10000):
    paginator = Paginator(queryset, chunk_size)
    for page in range(1, paginator.num_pages + 1):
        for obj in paginator.page(page).object_list:
            yield obj

for event in chunked_iterator(Event.objects.all()):
    print(event)
For large numbers of records, a database cursor performs even better. You do need raw SQL in Django; the Django cursor is something different from an SQL cursor.
The LIMIT/OFFSET method suggested by Nate C might be good enough for your situation. For large amounts of data it is slower than a cursor, because it has to run the same query over and over again and has to jump over more and more results.
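Since the question mentions PostgreSQL, here is a hedged sketch of a server-side cursor using psycopg2 directly; the connection string, cursor name, and table name are placeholders:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")
with conn.cursor(name="event_stream") as cur:  # naming the cursor makes it server-side
    cur.itersize = 2000                        # rows fetched per round trip to the server
    cur.execute("SELECT * FROM myapp_event")
    for row in cur:
        pass  # process one row at a time without loading the whole table
conn.close()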
Django doesn't have a good solution for fetching a large number of items from the database.
import gc

# Get the events in reverse order
eids = Event.objects.order_by("-id").values_list("id", flat=True)

for index, eid in enumerate(eids):
    event = Event.objects.get(id=eid)
    # do necessary work with event
    if index % 100 == 0:
        gc.collect()
        print("completed 100 items")
values_list can be used to fetch all the ids from the database, and then each object can be fetched separately. Over time, large objects accumulate in memory and won't be garbage collected until the for loop exits; the code above performs manual garbage collection after every 100th item is consumed.
This is from the docs:
http://docs.djangoproject.com/en/dev/ref/models/querysets/
No database activity actually occurs until you do something to evaluate the queryset.
So when print(event) is first run, the query fires (which is a full table scan, given your command) and loads the results. You're asking for all the objects, and there is no way to get the first object without getting all of them.
But if you do something like:
Event.objects.all()[300:900]
http://docs.djangoproject.com/en/dev/topics/db/queries/#limiting-querysets
Then it will add offsets and limits to the SQL internally.
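A minimal sketch of batching with that slicing behaviour (the batch size is arbitrary):

batch_size = 1000
total = Event.objects.count()
for offset in range(0, total, batch_size):
    # each slice becomes a LIMIT/OFFSET query instead of one giant result set
    for event in Event.objects.all()[offset:offset + batch_size]:
        print(event)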
A massive amount of memory gets consumed before the queryset can be iterated, because all the database rows for the whole query get processed into objects at once, and that can be a lot of processing depending on the number of rows.
You can chunk up your queryset into smaller, digestible bits. I call this pattern "spoonfeeding". Here's an implementation with a progress bar that I use in my management commands; first pip3 install tqdm:
from tqdm import tqdm

def spoonfeed(qs, func, chunk=1000, start=0):
    """
    Chunk up a large queryset and run func on each item.

    Works with automatic primary key fields.

    chunk -- how many objects to take on at once
    start -- PK to start from

    >>> spoonfeed(Spam.objects.all(), nom_nom)
    """
    end = qs.order_by('pk').last()
    progressbar = tqdm(total=qs.count())
    if not end:
        return
    while start < end.pk:
        for o in qs.filter(pk__gt=start, pk__lte=start + chunk):
            func(o)
            progressbar.update(1)
        start += chunk
    progressbar.close()
To use this you write a function that does operations on your object:
def set_population(town):
    town.population = calculate_population(...)
    town.save()
and then run that function on your queryset:
spoonfeed(Town.objects.all(), set_population)
Here is a solution including len and count:
class GeneratorWithLen(object):
    """
    Generator that includes len and count for given queryset
    """
    def __init__(self, generator, length):
        self.generator = generator
        self.length = length

    def __len__(self):
        return self.length

    def __iter__(self):
        return self.generator

    def __getitem__(self, item):
        return self.generator.__getitem__(item)

    def next(self):
        return next(self.generator)

    def count(self):
        return self.__len__()


def batch(queryset, batch_size=1024):
    """
    Returns a generator that does not cache results on the QuerySet
    Aimed to use with expected HUGE/ENORMOUS data sets, no caching, no memory used more than batch_size

    :param batch_size: Size for the maximum chunk of data in memory
    :return: generator
    """
    total = queryset.count()

    def batch_qs(_qs, _batch_size=batch_size):
        """
        Returns a (start, end, total, queryset) tuple for each batch in the given
        queryset.
        """
        for start in range(0, total, _batch_size):
            end = min(start + _batch_size, total)
            yield (start, end, total, _qs[start:end])

    def generate_items():
        queryset.order_by()  # Clearing... ordering by id if PK autoincremental
        for start, end, total, qs in batch_qs(queryset):
            for item in qs:
                yield item

    return GeneratorWithLen(generate_items(), total)
Usage:
events = batch(Event.objects.all())
len(events) == events.count()
for event in events:
    # Do something with the Event
    pass
There are a lot of outdated results here. Not sure when it was added, but Django's QuerySet.iterator() method uses a server-side cursor with a chunk size, to stream results from the database. So if you're using postgres, this should now be handled out of the box for you.
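For reference, on newer Django versions (2.0 and later) iterator() also accepts a chunk_size argument; a minimal sketch:

# Streams results from the database; on PostgreSQL this uses a server-side cursor.
for event in Event.objects.all().iterator(chunk_size=2000):
    print(event)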
I usually use a raw MySQL query instead of the Django ORM for this kind of task.
MySQL supports a streaming mode, so we can loop through all records safely and quickly without running out of memory.
import MySQLdb

db_config = {}  # configure your db here
connection = MySQLdb.connect(
    host=db_config['HOST'], user=db_config['USER'],
    port=int(db_config['PORT']), passwd=db_config['PASSWORD'], db=db_config['NAME'])

cursor = MySQLdb.cursors.SSCursor(connection)  # SSCursor for streaming mode
cursor.execute("SELECT * FROM event")
while True:
    record = cursor.fetchone()
    if record is None:
        break
    # Do something with record here

cursor.close()
connection.close()
Ref:
Retrieving million of rows from MySQL
How does MySQL result set streaming perform vs fetching the whole JDBC ResultSet at once