Ways to write a number as a sum - sum

I want to write for example number 10 all the possible ways as a sum of units 3, 4, ..., n-1 (not 2)
for example i can write 10 as
10=10(units)
10=7 + 1*3 or
10=4 + 2*3 or
10=3 + 1*3 + 1*4 or
10=2 + 2*4 or ....
I dont care about the number of combinations!
I want the combinations!!! I mean an algoithm to output like this
un 3 4 5 6 7 8 9 10
-------------------
10 0 0 0 0 0 0 0 0
07 1 0 0 0 0 0 0 0
04 2 0 0 0 0 0 0 0
03 1 1 0 0 0 0 0 0
02 0 2 0 0 0 0 0 0 etc
any response is welcomed!!!

start with zero
try to add the largest possible number
store the result, if you arrive at the target sum
else if you ended with less than the target sum, go to (2)
else (if you ended with more than the target sum)
undo the last adding step, try to add one less instead. If you don't have any more smaller numbers to try, undo one more addition in past
if you run out of all possible additions, you already found all combinations and can exit.
Example:
target = 10
0
0 + 9 -> not there yet
0 + 9 + 9 -> too much, try something less
0 + 9 + 8 -> too much, try less
...
...
...
0 + 9 + 3 -> too much, cant try any less
0 + 8 -> not there yet
0 + 8 + 8 -> too much
...
...
...
0 + 8 + 3 -> too much, cant try any less
0 + 7 -> not there yet
0 + 7 + 7 -> too much
0 + 7 + 6 -> too much
...
...
...
0 + 7 + 3 -> thats it! save!
0 + 6 -> not yet
0 + 6 + 6 -> too much...
I hope you get the idea... :)
It will be much much faster, than trying all the possible combinations and selecting only the ones that sum to 10. This approach is called backtracking (https://en.wikipedia.org/wiki/Backtracking), because you try to go in some direction (adding the largest number), but when you fail (sum is too big), you track your progress back and then try something different.
You can then speed it up by trying the possibilities in more clever way (0 + 7 + 7 is too much by 4, so don't even try 0 + 7 + 6, but skip straight to 0 + 7 + 3).

Related

SQLite computed column in WHERE clause seems to be working(?)

The order in which DMBS execute queries is:
FROM -> JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT
I was surprised to find out that this query works well on SQLite:
SELECT *, duration / 60 AS length_in_hours
FROM movies
WHERE length_in_hours > 10;
Why is this query working, while in some other cases it seems to fail?
OK, So I run EXPLAIN to see what's going on here.
.eqp full
PRAGMA vdbe_trace=true;
EXPLAIN
SELECT substr(name, 1, 1) AS initial
FROM names
WHERE initial = 'D';
The query results are:
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 11 0 0 Start at 11
1 OpenRead 0 7 0 2 0 root=7 iDb=0; names
2 Rewind 0 10 0 0
3 Column 0 1 2 0 r[2]=names.name
4 Function 6 2 1 substr(3) 0 r[1]=func(r[2..4])
5 Ne 5 9 1 80 if r[1]!=r[5] goto 9
6 Column 0 1 7 0 r[7]=names.name
7 Function 6 7 6 substr(3) 0 r[6]=func(r[7..9])
8 ResultRow 6 1 0 0 output=r[6]
9 Next 0 3 0 1
10 Halt 0 0 0 0
11 Transaction 0 0 10 0 1 usesStmtJournal=0
12 Integer 1 3 0 0 r[3]=1
13 Integer 1 4 0 0 r[4]=1
14 String8 0 5 0 D 0 r[5]='D'
15 Integer 1 8 0 0 r[8]=1
16 Integer 1 9 0 0 r[9]=1
17 Goto 0 1 0 0
In addr 0, the Init opcode sends us to addr 11 which open new Transaction.
Right after that SQLite allocate the integer 1 FOUR TIMES (addr 12-13, 15-16). That's where I started to suspect SQLite may artificially duplicate the expression before the AS into the WHERE clause.
In opcodes 3-5, which happen before the SELECT (opcodes 6-8), we can see that SQLite duplicated our substr into the WHERE clause.

Code if then statement by only using $ utility

How can I code this 'if' conditions in GAMS?
Set j/1*10/
S/1*6/;
Parameter
b(s,j) export this from excel
U(s,j) export from excel
M(s)/1 100,2 250,3 140,4 120,5 132/ export from excel
;
table b(s,j)
1 2 3 4 5 6 7 8 9 10
1 3 40 23 12 9 52 9 14 89 33
2 0 0 42 0 11 32 11 15 3 7
3 10 20 12 9 5 30 14 5 14 5
4 0 0 0 9 0 3 8 0 13 5
5 0 10 11 32 11 0 3 1 12 1
6 12 20 2 9 15 3 14 5 14 5
;
u(s,j)=0;
u(s,j)$(b(s,j))=1;
Variable delta(j); "binary"
After solving a model I got the value of delta ( suppose delta(1)=1, delta(5)=1). Then Set A is
A(j)$(delta.l(j)=1)=Yes; (A={1,5})
I want to calculate parameter R(s) according to the following :
If there is no j in A(j) s.t. j in u(s,j) then R(s)=M(s)
Else if there is a j in A(j) s.t. j in u(s,j) then R(s)=min{b(s,j): j in A(j) , j in u(s,j) }
Then R(1)=3, R(2)=11,R(3)=5, R(4)=120, R(5)=11,R(6)=12.
Is it possible to code this ' if then ' statement only by $ utility?
Thanks
Following on from the comments, I think this should work for you.
(Create a parameter that mimics your variable delta just for demonstration:)
parameter delta(j);
delta('1') = 1;
delta('5') = 1;
With loop and if/else:
Create parameter R(s). Then, looping over s , pick the minimum of b(s,A) across set A where b(s,A) is defined if the sum of b(s,A) is not zero (i.e. if one of the set is non-zero. Else, set R(s) equal to M(s).
Note, the loop is one solution to the issue you were having with mixed dimensions. And the $(b(s,A)) needs to be on the first argument of smin(.), not on the second argument.
parameter R(s);
loop(s,
if (sum(A, b(s,A)) ne 0,
R(s) = smin(A$b(s,A), b(s,A));
else
R(s) = M(s);
);
);
With $ command only (#Lutz in comments):
R(s)$(sum(A, b(s,A)) <> 0) = smin(A$b(s,A), b(s,A));
R(s)$(sum(A, b(s,A)) = 0) = M(s);
Gives:
---- 56 PARAMETER R
1 3.000, 2 11.000, 3 5.000, 4 120.000, 5 11.000, 6 12.000

Python particles simulator: out-of-core processing

Problem description
In writing a Monte Carlo particle simulator (brownian motion and photon emission) in python/numpy. I need to save the simulation output (>>10GB) to a file and process the data in a second step. Compatibility with both Windows and Linux is important.
The number of particles (n_particles) is 10-100. The number of time-steps (time_size) is ~10^9.
The simulation has 3 steps (the code below is for an all-in-RAM version):
Simulate (and store) an emission rate array (contains many almost-0 elements):
shape (n_particles x time_size), float32, size 80GB
Compute counts array, (random values from a Poisson process with previously computed rates):
shape (n_particles x time_size), uint8, size 20GB
counts = np.random.poisson(lam=emission).astype(np.uint8)
Find timestamps (or index) of counts. Counts are almost always 0, so the timestamp arrays will fit in RAM.
# Loop across the particles
timestamps = [np.nonzero(c) for c in counts]
I do step 1 once, then repeat step 2-3 many (~100) times. In the future I may need to pre-process emission (apply cumsum or other functions) before computing counts.
Question
I have a working in-memory implementation and I'm trying to understand what is the best approach to implement an out-of-core version that can scale to (much) longer simulations.
What I would like it exist
I need to save arrays to a file, and I would like to use a single file for a simulation. I also need a "simple" way to store and recall a dictionary of simulation parameter (scalars).
Ideally I would like a file-backed numpy array that I can preallocate and fill in chunks. Then, I would like the numpy array methods (max, cumsum, ...) to work transparently, requiring only a chunksize keyword to specify how much of the array to load at each iteration.
Even better, I would like a Numexpr that operates not between cache and RAM but between RAM and hard drive.
What are the practical options
As a first option
I started experimenting with pyTables, but I'm not happy with its complexity and abstractions (so different from numpy). Moreover my current solution (read below) is UGLY and not very efficient.
So my options for which I seek an answer are
implement a numpy array with required functionality (how?)
use pytable in a smarter way (different data-structures/methods)
use another library: h5py, blaze, pandas... (I haven't tried any of them so far).
Tentative solution (pyTables)
I save the simulation parameters in '/parameters' group: each parameter is converted to a numpy array scalar. Verbose solution but it works.
I save emission as an Extensible array (EArray), because I generate the data in chunks and I need to append each new chunk (I know the final size though). Saving counts is more problematic. If a save it like a pytable array it's difficult to perform queries like "counts >= 2". Therefore I saved counts as multiple tables (one per particle) [UGLY] and I query with .get_where_list('counts >= 2'). I'm not sure this is space-efficient, and
generating all these tables instead of using a single array, clobbers significantly the HDF5 file. Moreover, strangely enough, creating those tables require creating a custom dtype (even for standard numpy dtypes):
dt = np.dtype([('counts', 'u1')])
for ip in xrange(n_particles):
name = "particle_%d" % ip
data_file.create_table(
group, name, description=dt, chunkshape=chunksize,
expectedrows=time_size,
title='Binned timetrace of emitted ph (bin = t_step)'
' - particle_%d' % particle)
Each particle-counts "table" has a different name (name = "particle_%d" % ip) and that I need to put them in a python list for easy iteration.
EDIT: The result of this question is a Brownian Motion simulator called PyBroMo.
Dask.array can perform chunked operations like max, cumsum, etc. on an on-disk array like PyTables or h5py.
import h5py
d = h5py.File('myfile.hdf5')['/data']
import dask.array as da
x = da.from_array(d, chunks=(1000, 1000))
X looks and feels like a numpy array and copies much of the API. Operations on x will create a DAG of in-memory operations which will execute efficiently using multiple cores streaming from disk as necessary
da.exp(x).mean(axis=0).compute()
http://dask.pydata.org/en/latest/
conda install dask
or
pip install dask
See here for how to store your parameters in the HDF5 file (it pickles, so you can store them how you have them; their is a 64kb limit on the size of the pickle).
import pandas as pd
import numpy as np
n_particles = 10
chunk_size = 1000
# 1) create a new emission file, compressing as we go
emission = pd.HDFStore('emission.hdf',mode='w',complib='blosc')
# generate simulated data
for i in range(10):
df = pd.DataFrame(np.abs(np.random.randn(chunk_size,n_particles)),dtype='float32')
# create a globally unique index (time)
# http://stackoverflow.com/questions/16997048/how-does-one-append-large-amounts-of-
data-to-a-pandas-hdfstore-and-get-a-natural/16999397#16999397
try:
nrows = emission.get_storer('df').nrows
except:
nrows = 0
df.index = pd.Series(df.index) + nrows
emission.append('df',df)
emission.close()
# 2) create counts
cs = pd.HDFStore('counts.hdf',mode='w',complib='blosc')
# this is an iterator, can be any size
for df in pd.read_hdf('emission.hdf','df',chunksize=200):
counts = pd.DataFrame(np.random.poisson(lam=df).astype(np.uint8))
# set the index as the same
counts.index = df.index
# store the sum across all particles (as most are zero this will be a
# nice sub-selector
# better maybe to have multiple of these sums that divide the particle space
# you don't have to do this but prob more efficient
# you can do this in another file if you want/need
counts['particles_0_4'] = counts.iloc[:,0:4].sum(1)
counts['particles_5_9'] = counts.iloc[:,5:9].sum(1)
# make the non_zero column indexable
cs.append('df',counts,data_columns=['particles_0_4','particles_5_9'])
cs.close()
# 3) find interesting counts
print pd.read_hdf('counts.hdf','df',where='particles_0_4>0')
print pd.read_hdf('counts.hdf','df',where='particles_5_9>0')
You can alternatively, make each particle a data_column and select on them individually.
and some output (pretty active emission in this case :)
0 1 2 3 4 5 6 7 8 9 particles_0_4 particles_5_9
0 2 2 2 3 2 1 0 2 1 0 9 4
1 1 0 0 0 1 0 1 0 3 0 1 4
2 0 2 0 0 2 0 0 1 2 0 2 3
3 0 0 0 1 1 0 0 2 0 3 1 2
4 3 1 0 2 1 0 0 1 0 0 6 1
5 1 0 0 1 0 0 0 3 0 0 2 3
6 0 0 0 1 1 0 1 0 0 0 1 1
7 0 2 0 2 0 0 0 0 2 0 4 2
8 0 0 0 1 3 0 0 0 0 1 1 0
10 1 0 0 0 0 0 0 0 0 1 1 0
11 0 0 1 1 0 2 0 1 2 1 2 5
12 0 2 2 4 0 0 1 1 0 1 8 2
13 0 2 1 0 0 0 0 1 1 0 3 2
14 1 0 0 0 0 3 0 0 0 0 1 3
15 0 0 0 1 1 0 0 0 0 0 1 0
16 0 0 0 4 3 0 3 0 1 0 4 4
17 0 2 2 3 0 0 2 2 0 2 7 4
18 0 1 2 1 0 0 3 2 1 2 4 6
19 1 1 0 0 0 0 1 2 1 1 2 4
20 0 0 2 1 2 2 1 0 0 1 3 3
22 0 1 2 2 0 0 0 0 1 0 5 1
23 0 2 4 1 0 1 2 0 0 2 7 3
24 1 1 1 0 1 0 0 1 2 0 3 3
26 1 3 0 4 1 0 0 0 2 1 8 2
27 0 1 1 4 0 1 2 0 0 0 6 3
28 0 1 0 0 0 0 0 0 0 0 1 0
29 0 2 0 0 1 0 1 0 0 0 2 1
30 0 1 0 2 1 2 0 2 1 1 3 5
31 0 0 1 1 1 1 1 0 1 1 2 3
32 3 0 2 1 0 0 1 0 1 0 6 2
33 1 3 1 0 4 1 1 0 1 4 5 3
34 1 1 0 0 0 0 0 3 0 1 2 3
35 0 1 0 0 1 1 2 0 1 0 1 4
36 1 0 1 0 1 2 1 2 0 1 2 5
37 0 0 0 1 0 0 0 0 3 0 1 3
38 2 5 0 0 0 3 0 1 0 0 7 4
39 1 0 0 2 1 1 3 0 0 1 3 4
40 0 1 0 0 1 0 0 4 2 2 1 6
41 0 3 3 1 1 2 0 0 2 0 7 4
42 0 1 0 2 0 0 0 0 0 1 3 0
43 0 0 2 0 5 0 3 2 1 1 2 6
44 0 2 0 1 0 0 1 0 0 0 3 1
45 1 0 0 2 0 0 0 1 4 0 3 5
46 0 2 0 0 0 0 0 1 1 0 2 2
48 3 0 0 0 0 1 1 0 0 0 3 2
50 0 1 0 1 0 1 0 0 2 1 2 3
51 0 0 2 0 0 0 2 3 1 1 2 6
52 0 0 2 3 2 3 1 0 1 5 5 5
53 0 0 0 2 1 1 0 0 1 1 2 2
54 0 1 2 2 2 0 1 0 2 0 5 3
55 0 2 1 0 0 0 0 0 3 2 3 3
56 0 1 0 0 0 2 2 0 1 1 1 5
57 0 0 0 1 1 0 0 1 0 0 1 1
58 6 1 2 0 2 2 0 0 0 0 9 2
59 0 1 1 0 0 0 0 0 2 0 2 2
60 2 0 0 0 1 0 0 1 0 1 2 1
61 0 0 3 1 1 2 0 0 1 0 4 3
62 2 0 1 0 0 0 0 1 2 1 3 3
63 2 0 1 0 1 0 1 0 0 0 3 1
65 0 0 1 0 0 0 1 5 0 1 1 6
.. .. .. .. .. .. .. .. .. .. ... ...
[9269 rows x 12 columns]
PyTable Solution
Since functionality provided by Pandas is not needed, and the processing is much slower (see notebook below), the best approach is using PyTables or h5py directly. I've tried only the pytables approach so far.
All tests were performed in this notebook:
Python particles simulator: numpy out-of-core processing
Introduction to pytables data-structures
Reference: Official PyTables Docs
Pytables allows store data in HDF5 files in 2 types of formats: arrays and tables.
Arrays
There are 3 types of arrays Array, CArray and EArray. They all allow to store and retrieve (multidimensional) slices with a notation similar to numpy slicing.
# Write data to store (broadcasting works)
array1[:] = 3
# Read data from store
in_ram_array = array1[:]
For optimization in some use cases, CArray is saved in "chunks", whose size can be chosen with chunk_shape at creation time.
Array and CArray size is fixed at creation time. You can fill/write the array chunk-by-chunk after creation though. Conversely EArray can be extended with the .append() method.
Tables
The table is a quite different beast. It's basically a "table". You have only 1D index and each element is a row. Inside each row there are the "columns" data types, each columns can have a different type. It you are familiar with numpy record-arrays, a table is basically an 1D record-array, with each element having many fields as the columns.
1D or 2D numpy arrays can be stored in tables but it's a bit more tricky: we need to create a row data type. For example to store an 1D uint8 numpy array we need to do:
table_uint8 = np.dtype([('field1', 'u1')])
table_1d = data_file.create_table('/', 'array_1d', description=table_uint8)
So why using tables? Because, differently from arrays, tables can be efficiently queried. For example, if we want to search for elements > 3 in a huge disk-based table we can do:
index = table_1d.get_where_list('field1 > 3')
Not only it is simple (compared with arrays where we need to scan the whole file in chunks and build index in a loop) but it is also very extremely fast.
How to store simulation parameters
The best way to store simulation parameters is to use a group (i.e. /parameters), convert each scalar to numpy array and store it as CArray.
Array for "emission"
emission is the biggest array that is generated and read sequentially. For this usage pattern A good data structure is EArray. On "simulated" data with ~50% of zeros elements blosc compression (level=5) achieves 2.2x compression ratio. I found that a chunk-size of 2^18 (256k) has the minimum processing time.
Storing "counts"
Storing also "counts" will increase the file size by 10% and will take 40% more time to compute timestamps. Having counts stored is not an advantage per-se because only the timestamps are needed in the end.
The advantage is that recostructing the index (timestamps) is simpler because we query the full time axis in a single command (.get_where_list('counts >= 1')). Conversely, with chunked processing, we need to perform some index arithmetics that is a bit tricky, and maybe a burden to maintain.
However the the code complexity may be small compared to all the other operations (sorting and merging) that are needed in both cases.
Storing "timestamps"
Timestamps can be accumulated in RAM. However, we don't know the arrays size before starting and a final hstack() call is needed to "merge" the different chunks stored in a list. This doubles the memory requirements so the RAM may be insufficient.
We can store as-we-go timestamps to a table using .append(). At the end we can load the table in memory with .read(). This is only 10% slower than all-in-memory computation but avoids the "double-RAM" requirement. Moreover we can avoid the final full-load and have minimal RAM usage.
H5Py
H5py is a much simpler library than pytables. For this use-case of (mainly) sequential processing seems a better fit than pytables. The only missing feature is the lack of 'blosc' compression. If this results in a big performance penalty remains to be tested.
Use OpenMM to simulate particles (https://github.com/SimTk/openmm) and MDTraj (https://github.com/rmcgibbo/mdtraj) to handle trajectory IO.
The pytables vs pandas.HDFStore tests in the accepted answer is completely misleading:
The first critical error is the timing did not apply os.fsync after
flush, which make the speed test unstable. So sometime, the pytables
function is accidentally much faster.
The 2nd problem is the pytables and pandas versions have completely
different shapes due to misunderstanding the pytables.EArray's
shape arg. The author try to append column into pytables version but
append row into pandas version.
The 3rd problem is the author used different chunkshape during
comparison.
The author also forgot to disable the table index generation during store.append() which is a time consuming process.
The follow table showed the performance results from his version and my fixes.
tbold is his pytables version, pdold is his pandas version. tbsync and pdsync are his version with fsync() after flush() and also disable the table index generation during append. the tbopt and pdopt are my optimized version, with blosc:lz4 and complevel 9.
| name | dt | data size [MB] | comp ratio % | chunkshape | shape | clib | indexed |
|:-------|-----:|-----------------:|---------------:|:-------------|:--------------|:----------------|:----------|
| tbold | 5.11 | 300.00 | 84.63 | (15, 262144) | (15, 5242880) | blosc[5][1] | False |
| pdold | 8.39 | 340.00 | 39.26 | (1927,) | (5242880,) | blosc[5][1] | True |
| tbsync | 7.47 | 300.00 | 84.63 | (15, 262144) | (15, 5242880) | blosc[5][1] | False |
| pdsync | 6.97 | 340.00 | 39.27 | (1927,) | (5242880,) | blosc[5][1] | False |
| tbopt | 4.78 | 300.00 | 43.07 | (4369, 15) | (5242880, 15) | blosc:lz4[9][1] | False |
| pdopt | 5.73 | 340.00 | 38.53 | (3855,) | (5242880,) | blosc:lz4[9][1] | False |
The pandas.HDFStore uses pytables under the hood. Thus if we use them correctly, they should have no difference at all.
We can see the pandas version has larger data size. This is because the pandas use pytables.Table instead of EArray. And the pandas.DataFrame always have an index column. The first column of the Table object is this DataFrame index which require some extra space to save. This only affect IO performance a little but provide more features such as out-of-core query. So I still recommend pandas here. #MRocklin also mentioned a nicer out-of-core package dask, if most features you used are just array operations instead of table-like query. But the IO performance won't have distinguishable difference.
h5f.root.emission._v_attrs
Out[82]:
/emission._v_attrs (AttributeSet), 15 attributes:
[CLASS := 'GROUP',
TITLE := '',
VERSION := '1.0',
data_columns := [],
encoding := 'UTF-8',
index_cols := [(0, 'index')],
info := {1: {'names': [None], 'type': 'RangeIndex'}, 'index': {}},
levels := 1,
metadata := [],
nan_rep := 'nan',
non_index_axes := [(1, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])],
pandas_type := 'frame_table',
pandas_version := '0.15.2',
table_type := 'appendable_frame',
values_cols := ['values_block_0']]
Here is the functions:
def generate_emission(shape):
"""Generate fake emission."""
emission = np.random.randn(*shape).astype('float32') - 1
emission.clip(0, 1e6, out=emission)
assert (emission >=0).all()
return emission
def test_puretb_earray(outpath,
n_particles = 15,
time_chunk_size = 2**18,
n_iter = 20,
sync = True,
clib = 'blosc',
clevel = 5,
):
time_size = n_iter * time_chunk_size
data_file = pytb.open_file(outpath, mode="w")
comp_filter = pytb.Filters(complib=clib, complevel=clevel)
emission = data_file.create_earray('/', 'emission', atom=pytb.Float32Atom(),
shape=(n_particles, 0),
chunkshape=(n_particles, time_chunk_size),
expectedrows=n_iter * time_chunk_size,
filters=comp_filter)
# generate simulated emission data
t0 =time()
for i in range(n_iter):
emission_chunk = generate_emission((n_particles, time_chunk_size))
emission.append(emission_chunk)
emission.flush()
if sync:
os.fsync(data_file.fileno())
data_file.close()
t1 = time()
return t1-t0
def test_puretb_earray2(outpath,
n_particles = 15,
time_chunk_size = 2**18,
n_iter = 20,
sync = True,
clib = 'blosc',
clevel = 5,
):
time_size = n_iter * time_chunk_size
data_file = pytb.open_file(outpath, mode="w")
comp_filter = pytb.Filters(complib=clib, complevel=clevel)
emission = data_file.create_earray('/', 'emission', atom=pytb.Float32Atom(),
shape=(0, n_particles),
expectedrows=time_size,
filters=comp_filter)
# generate simulated emission data
t0 =time()
for i in range(n_iter):
emission_chunk = generate_emission((time_chunk_size, n_particles))
emission.append(emission_chunk)
emission.flush()
if sync:
os.fsync(data_file.fileno())
data_file.close()
t1 = time()
return t1-t0
def test_purepd_df(outpath,
n_particles = 15,
time_chunk_size = 2**18,
n_iter = 20,
sync = True,
clib='blosc',
clevel=5,
autocshape=False,
oldversion=False,
):
time_size = n_iter * time_chunk_size
emission = pd.HDFStore(outpath, mode='w', complib=clib, complevel=clevel)
# generate simulated data
t0 =time()
for i in range(n_iter):
# Generate fake emission
emission_chunk = generate_emission((time_chunk_size, n_particles))
df = pd.DataFrame(emission_chunk, dtype='float32')
# create a globally unique index (time)
# http://stackoverflow.com/questions/16997048/how-does-one-append-large-
# amounts-of-data-to-a-pandas-hdfstore-and-get-a-natural/16999397#16999397
try:
nrows = emission.get_storer('emission').nrows
except:
nrows = 0
df.index = pd.Series(df.index) + nrows
if autocshape:
emission.append('emission', df, index=False,
expectedrows=time_size
)
else:
if oldversion:
emission.append('emission', df)
else:
emission.append('emission', df, index=False)
emission.flush(fsync=sync)
emission.close()
t1 = time()
return t1-t0
def _test_puretb_earray_nosync(outpath):
return test_puretb_earray(outpath, sync=False)
def _test_purepd_df_nosync(outpath):
return test_purepd_df(outpath, sync=False,
oldversion=True
)
def _test_puretb_earray_opt(outpath):
return test_puretb_earray2(outpath,
sync=False,
clib='blosc:lz4',
clevel=9
)
def _test_purepd_df_opt(outpath):
return test_purepd_df(outpath,
sync=False,
clib='blosc:lz4',
clevel=9,
autocshape=True
)
testfns = {
'tbold':_test_puretb_earray_nosync,
'pdold':_test_purepd_df_nosync,
'tbsync':test_puretb_earray,
'pdsync':test_purepd_df,
'tbopt': _test_puretb_earray_opt,
'pdopt': _test_purepd_df_opt,
}

Generating certain sequence of numbers with defined ranges

How can I use a collection of nested for loops (or any other type) to produce a sequence like this with these variables:
length is how many digts to go to
max is the maximum number
min is the minimum number
Lets say for this case:
length = 2
max = 3
min = 1
it would produce:
11
12
13
21
22
23
31
32
33
This works "ok" for only length = 1, but not really, since I still have annoying 0's at the start
For i = 1 To length
For ii = 0 To i
For iii = 1 To 5
Console.WriteLine(Str(ii) + Str(iii))
Next
Next
Next
As this looks like a homework problem, I am going to attempt to help you think through this problem without actually giving you the answer in code.
Let's think through this problem...
You have the range 1-3. So your first sequence is easy:
1, 2, 3
Now you want to produce a sequence from 11 to 13. What's the change, or difference, between 1 through 3 and 11 through 13? The answer is you've added 10.
The same is true for 21 through 23 - you've added 10 again.
So, what you want to do is iterate from 1 through 3.
Then, iterate from 1 through 3 this time adding 10.
Then, iterate from 1 through 3 this time adding 20.
Thinking about this, you are essentially doing this:
1
2
3
10 + 1
10 + 2
10 + 3
10 + 10 + 1
10 + 10 + 2
10 + 10 + 3
etc
Or, you could also think about it like this:
(0 * 10) + 1
(0 * 10) + 2
(0 * 10) + 3
(1 * 10) + 1
(1 * 10) + 2
(1 * 10) + 3
(2 * 10) + 1
(2 * 10) + 2
(2 * 10) + 3
etc
Can you see a pattern forming?

Check if a number is divisible by 3 [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Improve this question
Write code to determine if a number is divisible by 3. The input to the function is a single bit, 0 or 1, and the output should be 1 if the number received so far is the binary representation of a number divisible by 3, otherwise zero.
Examples:
input "0": (0) output 1
inputs "1,0,0": (4) output 0
inputs "1,1,0,0": (6) output 1
This is based on an interview question. I ask for a drawing of logic gates but since this is stackoverflow I'll accept any coding language. Bonus points for a hardware implementation (verilog etc).
Part a (easy): First input is the MSB.
Part b (a little harder): First input is the LSB.
Part c (difficult): Which one is faster and smaller, (a) or (b)? (Not theoretically in the Big-O sense, but practically faster/smaller.) Now take the slower/bigger one and make it as fast/small as the faster/smaller one.
There's a fairly well-known trick for determining whether a number is a multiple of 11, by alternately adding and subtracting its decimal digits. If the number you get at the end is a multiple of 11, then the number you started out with is also a multiple of 11:
47278 4 - 7 + 2 - 7 + 8 = 0, multiple of 11 (47278 = 11 * 4298)
52214 5 - 2 + 2 - 1 + 4 = 8, not multiple of 11 (52214 = 11 * 4746 + 8)
We can apply the same trick to binary numbers. A binary number is a multiple of 3 if and only if the alternating sum of its bits is also a multiple of 3:
4 = 100 1 - 0 + 0 = 1, not multiple of 3
6 = 110 1 - 1 + 0 = 0, multiple of 3
78 = 1001110 1 - 0 + 0 - 1 + 1 - 1 + 0 = 0, multiple of 3
109 = 1101101 1 - 1 + 0 - 1 + 1 - 0 + 1 = 1, not multiple of 3
It makes no difference whether you start with the MSB or the LSB, so the following Python function works equally well in both cases. It takes an iterator that returns the bits one at a time. multiplier alternates between 1 and 2 instead of 1 and -1 to avoid taking the modulo of a negative number.
def divisibleBy3(iterator):
multiplier = 1
accumulator = 0
for bit in iterator:
accumulator = (accumulator + bit * multiplier) % 3
multiplier = 3 - multiplier
return accumulator == 0
Here... something new... how to check if a binary number of any length (even thousands of digits) is divisible by 3.
-->((0))<---1--->()<---0--->(1) ASCII representation of graph
From the picture.
You start in the double circle.
When you get a one or a zero, if the digit is inside the circle, then you stay in that circle. However if the digit is on a line, then you travel across the line.
Repeat step two until all digits are comsumed.
If you finally end up in the double circle then the binary number is divisible by 3.
You can also use this for generating numbers divisible by 3. And I wouldn't image it would be hard to convert this into a circuit.
1 example using the graph...
11000000000001011111111111101 is divisible by 3 (ends up in the double circle again)
Try it for yourself.
You can also do similar tricks for performing MOD 10, for when converting binary numbers into base 10 numbers. (10 circles, each doubled circled and represent the values 0 to 9 resulting from the modulo)
EDIT: This is for digits running left to right, it's not hard to modify the finite state machine to accept the reverse language though.
NOTE: In the ASCII representation of the graph () denotes a single circle and (()) denotes a double circle. In finite state machines these are called states, and the double circle is the accept state (the state that means its eventually divisible by 3)
Heh
State table for LSB:
S I S' O
0 0 0 1
0 1 1 0
1 0 2 0
1 1 0 1
2 0 1 0
2 1 2 0
Explanation: 0 is divisible by three. 0 << 1 + 0 = 0. Repeat using S = (S << 1 + I) % 3 and O = 1 if S == 0.
State table for MSB:
S I S' O
0 0 0 1
0 1 2 0
1 0 1 0
1 1 0 1
2 0 2 0
2 1 1 0
Explanation: 0 is divisible by three. 0 >> 1 + 0 = 0. Repeat using S = (S >> 1 + I) % 3 and O = 1 if S == 0.
S' is different from above, but O works the same, since S' is 0 for the same cases (00 and 11). Since O is the same in both cases, O_LSB = O_MSB, so to make MSB as short as LSB, or vice-versa, just use the shortest of both.
Here is an simple way to do it by hand.
Since 1 = 22 mod 3, we get 1 = 22n mod 3 for every positive integer.
Furthermore 2 = 22n+1 mod 3. Hence one can determine if an integer is divisible by 3 by counting the 1 bits at odd bit positions, multiply this number by 2, add the number of 1-bits at even bit posistions add them to the result and check if the result is divisible by 3.
Example: 5710=1110012.
There are 2 bits at odd positions, and 2 bits at even positions. 2*2 + 2 = 6 is divisible by 3. Hence 57 is divisible by 3.
Here is also a thought towards solving question c). If one inverts the bit order of a binary integer then all the bits remain at even/odd positions or all bits change. Hence inverting the order of the bits of an integer n results is an integer that is divisible by 3 if and only if n is divisible by 3. Hence any solution for question a) works without changes for question b) and vice versa. Hmm, maybe this could help to figure out which approach is faster...
You need to do all calculations using arithmetic modulo 3. This is the way
MSB:
number=0
while(!eof)
n=input()
number=(number *2 + n) mod 3
if(number == 0)
print divisible
LSB:
number = 0;
multiplier = 1;
while(!eof)
n=input()
number = (number + multiplier * n) mod 3
multiplier = (multiplier * 2) mod 3
if(number == 0)
print divisible
This is general idea...
Now, your part is to understand why this is correct.
And yes, do homework yourself ;)
The idea is that the number can grow arbitrarily long, which means you can't use mod 3 here, since your number will grow beyond the capacity of your integer class.
The idea is to notice what happens to the number. If you're adding bits to the right, what you're actually doing is shifting left one bit and adding the new bit.
Shift-left is the same as multiplying by 2, and adding the new bit is either adding 0 or 1. Assuming we started from 0, we can do this recursively based on the modulo-3 of the last number.
last | input || next | example
------------------------------------
0 | 0 || 0 | 0 * 2 + 0 = 0
0 | 1 || 1 | 0 * 2 + 1 = 1
1 | 0 || 2 | 1 * 2 + 0 = 2
1 | 1 || 0 | 1 * 2 + 1 = 0 (= 3 mod 3)
2 | 0 || 1 | 2 * 2 + 0 = 1 (= 4 mod 3)
2 | 1 || 2 | 2 * 2 + 1 = 2 (= 5 mod 3)
Now let's see what happens when you add a bit to the left. First, notice that:
22n mod 3 = 1
and
22n+1 mod 3 = 2
So now we have to either add 1 or 2 to the mod based on if the current iteration is odd or even.
last | n is even? | input || next | example
-------------------------------------------
d/c | don't care | 0 || last | last + 0*2^n = last
0 | yes | 1 || 0 | 0 + 1*2^n = 1 (= 2^n mod 3)
0 | no | 1 || 0 | 0 + 1*2^n = 2 (= 2^n mod 3)
1 | yes | 1 || 0 | 1 + 1*2^n = 2
1 | no | 1 || 0 | 1 + 1*2^n = 0 (= 3 mod 3)
1 | yes | 1 || 0 | 2 + 1*2^n = 0
1 | no | 1 || 0 | 2 + 1*2^n = 1
input "0": (0) output 1
inputs "1,0,0": (4) output 0
inputs "1,1,0,0": (6) output 1
shouldn't this last input be 12, or am i misunderstanding the question?
Actually the LSB method would actually make this easier. In C:
MSB method:
/*
Returns 1 if divisble by 3, otherwise 0
Note: It is assumed 'input' format is valid
*/
int is_divisible_by_3_msb(char *input) {
unsigned value = 0;
char *p = input;
if (*p == '1') {
value &= 1;
}
p++;
while (*p) {
if (*p != ',') {
value <<= 1;
if (*p == '1') {
ret &= 1;
}
}
p++;
}
return (value % 3 == 0) ? 1 : 0;
}
LSB method:
/*
Returns 1 if divisble by 3, otherwise 0
Note: It is assumed 'input' format is valid
*/
int is_divisible_by_3_lsb(char *input) {
unsigned value = 0;
unsigned mask = 1;
char *p = input;
while (*p) {
if (*p != ',') {
if (*p == '1') {
value &= mask;
}
mask <<= 1;
}
p++;
}
return (value % 3 == 0) ? 1 : 0;
}
Personally I have a hard time believing one of these will be significantly different to the other.
I think Nathan Fellman is on the right track for part a and b (except b needs an extra piece of state: you need to keep track of if your digit position is odd or even).
I think the trick for part C is negating the last value at each step. I.e. 0 goes to 0, 1 goes to 2 and 2 goes to 1.
A number is divisible by 3 if the sum of it's digits is divisible by 3.
So you can add the digits and get the sum:
if the sum is greater or equal to 10 use the same method
if it's 3, 6, 9 then it is divisible
if the sum is different than 3, 6, 9 then it's not divisible