Saving with numpy savetxt: array elements as columns

I am pretty new to Python and trying to kick my Matlab addiction. I am converting a lot of my lab's machine vision code over to Python, but I am stuck on one aspect of the saving. At each line of the code we save 6 variables in an array. I'd like these to be entered as the 6 columns of a txt file with numpy.savetxt. Each iteration of the tracking loop would then add similar variables for that given frame as the next row in the txt file.
But I keep getting a single column that just grows with every loop. I've attached a simple example to show my problem. As it loops through, a variable called output is generated. I would like this to be the three columns of the txt file, and each iteration of the loop to be a new row. Is there an easy way to do this?
import numpy as np

dataFile_Path = "dataFile.txt"
dataFile_id = open(dataFile_Path, 'w+')
for x in range(0, 9):
    variable = np.array([2, 3, 4])
    output = x*variable + 1
    output.astype(float)
    print(output)
    np.savetxt(dataFile_id, output, fmt="%d")
dataFile_id.close()

In [160]: for x in range(0, 9):
     ...:     variable = np.array([2,3,4])
     ...:     output = x*variable+1
     ...:     output.astype(float)
     ...:     print(output)
     ...:
[1 1 1]
[3 4 5]
[5 7 9]
[ 7 10 13]
[ 9 13 17]
[11 16 21]
[13 19 25]
[15 22 29]
[17 25 33]
So you are writing one row at a time. savetxt is normally used to write a 2d array with one call.
Notice that the print still shows integers: astype returns a new array; it does not change the array in place.
And because you are giving it 1d arrays, it writes each one as a column, one element per line:
In [177]: f = open('txt', 'bw+')
In [178]: for x in range(0, 9):
     ...:     variable = np.array([2,3,4])
     ...:     output = x*variable+1
     ...:     np.savetxt(f, output, fmt='%d')
     ...:
In [179]: f.close()
In [180]: cat txt
1
1
1
3
4
5
5
7
9
If instead I give savetxt a 2d array (shape (1,3)), it writes a full row per call:
In [181]: f = open('txt', 'bw+')
In [182]: for x in range(0, 9):
     ...:     variable = np.array([2,3,4])
     ...:     output = x*variable+1
     ...:     np.savetxt(f, [output], fmt='%d')
     ...:
In [183]: f.close()
In [184]: cat txt
1 1 1
3 4 5
5 7 9
7 10 13
9 13 17
11 16 21
13 19 25
15 22 29
17 25 33
But a better approach is to construct the 2d array, and write that with one savetxt call:
In [185]: output = np.array([2,3,4])*np.arange(9)[:,None]+1
In [186]: output
Out[186]:
array([[ 1,  1,  1],
       [ 3,  4,  5],
       [ 5,  7,  9],
       [ 7, 10, 13],
       [ 9, 13, 17],
       [11, 16, 21],
       [13, 19, 25],
       [15, 22, 29],
       [17, 25, 33]])
In [187]: np.savetxt('txt', output, fmt='%10d')
In [188]: cat txt
         1          1          1
         3          4          5
         5          7          9
         7         10         13
         9         13         17
        11         16         21
        13         19         25
        15         22         29
        17         25         33
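Applied back to the question's tracking loop, the cleanest pattern is to collect each frame's values in a plain Python list and make one savetxt call at the end. A minimal sketch, where the loop body stands in for the real per-frame computation:

```python
import numpy as np

rows = []                          # one entry per frame/iteration
for x in range(9):                 # stand-in for the tracking loop
    variable = np.array([2, 3, 4])
    output = x * variable + 1      # the per-frame values
    rows.append(output)

data = np.array(rows)              # shape (9, 3): one row per frame
np.savetxt("dataFile.txt", data, fmt="%d")
```

This avoids holding an open file handle across the loop and makes the row/column layout explicit in the array's shape.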

Related

Randomness of np.random.shuffle

I have two arrays (i and j) that are exactly the same. I shuffle them with a specified random seed.
import numpy as np
np.random.seed(42)
i = np.array([0, 1, 2, 3, 4, 5, 6, 7])
j = np.array([0, 1, 2, 3, 4, 5, 6, 7])
np.random.shuffle(i)
np.random.shuffle(j)
print(i, j)
# [1 5 0 7 2 4 3 6] [3 7 0 4 5 2 1 6]
I expected them to be the same after shuffling, but they are not.
Do you have any ideas about how to get the same results (like the example below) after shuffling?
# [1 5 0 7 2 4 3 6] [1 5 0 7 2 4 3 6]
Many thanks in advance!
Calling seed() sets the state of a global random number generator. Each call of shuffle continues with the same global random number generator, so the results are different, as they should be. If you want them to be the same, reset the seed before each call of shuffle.
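A minimal sketch of that fix: reseed the global generator before each shuffle so both calls consume an identical random stream.

```python
import numpy as np

i = np.array([0, 1, 2, 3, 4, 5, 6, 7])
j = np.array([0, 1, 2, 3, 4, 5, 6, 7])

np.random.seed(42)
np.random.shuffle(i)   # first shuffle from a freshly seeded generator

np.random.seed(42)     # reset to the same state
np.random.shuffle(j)   # consumes the same random numbers as the first call

print(i, j)  # [1 5 0 7 2 4 3 6] [1 5 0 7 2 4 3 6]
```

An equivalent approach is to draw one permutation and index both arrays with it: perm = np.random.permutation(len(i)); i, j = i[perm], j[perm].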

Reading text file with multiple blocks separated by #

I have a text file with multiple blocks separated by #. The number of rows in each block is different. I would like to integrate a variable for each block. The text file looks like the following:
# a b c
### grid 1
1 2 3
2 3 4
3 4 5
### grid 2
11 12 13
12 13 14
13 14 15
### grid 3
21 22 23
22 23 24
23 24 25
24 25 26
I would like to integrate a*c for each block. Using block one as an example, the result should be 1*3 + 2*4 + 3*5. Any ideas how to implement this using numpy or pandas?
Once you have loaded a block into memory, you'll get an array like:
In [115]: arr = np.arange(1,4)+np.arange(0,3)[:,None]
In [116]: arr
Out[116]:
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5]])
the sum of products is then easy:
In [117]: np.dot(arr[:,0], arr[:,2])
Out[117]: 26
In [118]: 1*3+2*4+3*5
Out[118]: 26
I found an answer from @Fred Foo, which reads the file quite nicely:
from itertools import groupby
import numpy as np

def contains_data(ln):
    # just an example; there are smarter ways to do this
    return ln[0] not in "#\n"

with open("example") as f:
    datasets = [[ln.split() for ln in group]
                for has_data, group in groupby(f, contains_data)
                if has_data]

dim1 = len(datasets)
cooling_intgrl = np.zeros(dim1)
for i in range(dim1):
    block = np.array(datasets[i]).astype(float)
    length = block[:, 0]
    cooling = block[:, 2]
    result = np.dot(length, cooling)
    cooling_intgrl[i] = result
This works very well for me.
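Putting the pieces together on the question's sample data, here is a self-contained sketch that reads the blocks from an in-memory string instead of a file:

```python
from io import StringIO
from itertools import groupby
import numpy as np

text = """# a b c
### grid 1
1 2 3
2 3 4
3 4 5
### grid 2
11 12 13
12 13 14
13 14 15
"""

def contains_data(ln):
    return ln[0] not in "#\n"   # data lines don't start with '#' or newline

# group consecutive lines into blocks, keeping only the data groups
blocks = [[ln.split() for ln in group]
          for has_data, group in groupby(StringIO(text), contains_data)
          if has_data]

# integrate a*c (column 0 times column 2) within each block
integrals = [np.dot(np.array(b, float)[:, 0], np.array(b, float)[:, 2])
             for b in blocks]
print(integrals)  # block one: 1*3 + 2*4 + 3*5 = 26
```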

How to convert a pandas column containing a list into a dataframe

I have a pandas dataframe.
One of its columns contains a list of 60 elements, constant across its rows.
How do I convert each of these lists into a row of a new dataframe?
Just to be clearer: say A is the original dataframe with n rows. One of its columns contains a list of 60 elements.
I need to create a new dataframe of shape n x 60.
My attempt:
def expand(x):
    return pd.DataFrame(np.array(x)).reshape(-1, len(x))
df["col"].apply(lambda x: expand(x))
It gives funny results...
The weird thing is that if I call the function expand on a single row, it does exactly what I expect from it:
expand(df["col"][0])
To ChootsMagoots: This is the result when I try to apply your suggestion. It does not work.
Sample data
df = pd.DataFrame()
df['col'] = np.arange(4*5).reshape(4,5).tolist()
df
Output:
col
0 [0, 1, 2, 3, 4]
1 [5, 6, 7, 8, 9]
2 [10, 11, 12, 13, 14]
3 [15, 16, 17, 18, 19]
Now extract a DataFrame from col:
df.col.apply(pd.Series)
Output:
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
Try this:
new_df = pd.DataFrame(df["col"].tolist())
This is a little frankensteinish, but you could also round-trip through a CSV file (note header=None, since savetxt writes no header row):
import numpy as np
np.savetxt('outfile.csv', np.array(df['col'].tolist()), delimiter=',')
new_df = pd.read_csv('outfile.csv', header=None)
You can try this as well:
newCol = pd.Series(yourList)
df['colD'] = newCol.values
The above code:
1. Creates a pandas Series.
2. Assigns its values to a new column of the original dataframe.
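For completeness, a runnable version of that last variant; df and yourList are hypothetical stand-ins for your own dataframe and list:

```python
import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 3]})
yourList = ['x', 'y', 'z']       # hypothetical data, one item per row of df

newCol = pd.Series(yourList)     # 1. create a pandas Series
df['colD'] = newCol.values       # 2. assign its values as a new column

print(df['colD'].tolist())  # ['x', 'y', 'z']
```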

Pandas DataFrame.update with MultiIndex label

Given a DataFrame A with a MultiIndex and a DataFrame B with a one-dimensional index, how can I update column values of A with new values from B, matching the index of B against the second index level of A?
Test data:
begin = [10, 10, 12, 12, 14, 14]
end = [10, 11, 12, 13, 14, 15]
values = [1, 2, 3, 4, 5, 6]
values_updated = [10, 20, 3, 4, 50, 60]

multiindexed = pd.DataFrame({'begin': begin,
                             'end': end,
                             'value': values})
multiindexed.set_index(['begin', 'end'], inplace=True)

singleindexed = pd.DataFrame.from_dict(dict(zip([10, 11, 14, 15],
                                                [10, 20, 50, 60])),
                                       orient='index')
singleindexed.columns = ['value']
And the desired result should be
value
begin end
10 10 10
11 20
12 12 3
13 4
14 14 50
15 60
Now I was thinking about a variant of
multiindexed.update(singleindexed)
I searched the docs of DataFrame.update, but could not find anything w.r.t. index handling.
Am I missing an easier way to accomplish this?
You can use loc with an IndexSlice to select the matching rows of multiindexed, and then assign the new values from singleindexed.values:
print(singleindexed.index)
Int64Index([10, 11, 14, 15], dtype='int64')

print(singleindexed.values)
[[10]
 [20]
 [50]
 [60]]

idx = pd.IndexSlice
print(multiindexed.loc[idx[:, singleindexed.index], :])
           value
begin end
10    10       1
      11       2
14    14       5
      15       6

multiindexed.loc[idx[:, singleindexed.index], :] = singleindexed.values
print(multiindexed)
           value
begin end
10    10      10
      11      20
12    12       3
      13       4
14    14      50
      15      60
See Using slicers in the pandas docs.
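Another route, closer to the DataFrame.update idea from the question, is to temporarily move the begin level out of the index so that update can align on the remaining end level. A sketch built on the question's test data:

```python
import pandas as pd

multiindexed = pd.DataFrame({'begin': [10, 10, 12, 12, 14, 14],
                             'end':   [10, 11, 12, 13, 14, 15],
                             'value': [1, 2, 3, 4, 5, 6]})
multiindexed.set_index(['begin', 'end'], inplace=True)

singleindexed = pd.DataFrame({'value': [10, 20, 50, 60]},
                             index=[10, 11, 14, 15])

# move 'begin' to a column so the index is just 'end',
# let update() align on that index, then restore the MultiIndex
tmp = multiindexed.reset_index('begin')
tmp.update(singleindexed)
result = tmp.set_index('begin', append=True).swaplevel()

print(result['value'].tolist())
```

Unlike a positional assignment, this relies on label alignment, so it still works if singleindexed's rows are unsorted.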

Aggregating a time series in Pandas given a window size

Let's say I have this data:
a = pandas.Series([1,2,3,4,5,6,7,8])
a
Out[313]:
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
dtype: int64
I would like to aggregate the data, grouping it n rows at a time and summing each group. So if n=2 the new series would look like {3, 7, 11, 15}.
Try this:
In [39]: a.groupby(a.index//2).sum()
Out[39]:
0 3
1 7
2 11
3 15
dtype: int64
In [41]: a.index//2
Out[41]: Int64Index([0, 0, 1, 1, 2, 2, 3, 3], dtype='int64')
n=3
In [42]: n=3
In [43]: a.groupby(a.index//n).sum()
Out[43]:
0 6
1 15
2 15
dtype: int64
In [44]: a.index//n
Out[44]: Int64Index([0, 0, 0, 1, 1, 1, 2, 2], dtype='int64')
You can use the pandas rolling sum and take every n-th value. If n is your interval:

sums = list(a.rolling(n).sum()[n-1::n])

# Optional !!!
rem = len(a) % n
if rem != 0:
    sums.append(a[-rem:].sum())

The first line handles everything when the data divides evenly into groups of n; otherwise we can also append the sum of the remaining rows (depending on your preference).
For example, in the above case with n=3 you may want either {6, 15, 15} or just {6, 15}. The code above gives the former; skipping the optional part gives just {6, 15}.
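A quick self-contained check that the groupby and rolling approaches agree on the question's data (both assume the default RangeIndex):

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
n = 2

# integer-divide the positional index to form group labels 0,0,1,1,...
by_group = a.groupby(a.index // n).sum().tolist()

# rolling window sum, keeping every n-th value starting at position n-1
by_rolling = a.rolling(n).sum()[n - 1::n].astype(int).tolist()

print(by_group, by_rolling)  # [3, 7, 11, 15] [3, 7, 11, 15]
```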