How to achieve 'inner' zip? - numpy

The shape of the two arrays x and y is (a,b). How do I get a combined array of shape (a,b,2)?
My current solution is
z = np.zeros((a,b,2))
z[:,:,0] = x
z[:,:,1] = y
Is it possible to achieve this without creating a new array?

You can use np.dstack:
In [2]: import numpy as np
In [3]: a = np.random.normal(size=(4,6))
In [4]: b = np.random.normal(size=(4,6))
In [5]: np.dstack((a,b)).shape
Out[5]: (4, 6, 2)
And a comparison:
In [10]: d = np.dstack((a,b))
In [11]: c = np.zeros((4,6,2))
In [12]: c[:,:,0] = a
In [13]: c[:,:,1] = b
In [14]: np.allclose(c,d)
Out[14]: True
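Since NumPy 1.10 there is also np.stack, which makes the new axis explicit; note that, like dstack, it necessarily allocates a new array, so there is no way around creating one (a minimal sketch):

```python
import numpy as np

a = np.random.normal(size=(4, 6))
b = np.random.normal(size=(4, 6))

# stack along a new trailing axis: (4, 6) + (4, 6) -> (4, 6, 2)
z = np.stack((a, b), axis=-1)
print(z.shape)  # (4, 6, 2)
```

For 2-D inputs this is equivalent to the dstack call above, but it generalizes more predictably to other dimensionalities.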

How can I delete the index from the data?

I was trying to use re.sub() on my data, but it keeps raising a TypeError
(TypeError: expected string or bytes-like object).
This (example) is the data that I'm using:
I was trying to do:
import re
example_sub = re.sub('\n', ' ', example)
example_sub
I tried to resolve it by removing the index using reset_index(), but it didn't work.
What should I do?
Thank you!
You can use pandas.Series.str.replace:
>>> import pandas as pd
>>> df = pd.DataFrame({"a": ["a\na", "b\nb", "c\nc\nc\nc\n"]})
>>> df.a.str.replace("\n", " ")
0         a a
1         b b
2    c c c c
Name: a, dtype: object
For more complex substitutions you can pass a compiled regex pattern (newer pandas versions require regex=True when the pattern is compiled):
>>> import re
>>> import pandas as pd
>>> df = pd.DataFrame({"a": ["a\na", "b\nb", "c\nc\nc\nc\n"]})
>>> pattern = re.compile(r"\n")
>>> df.a.str.replace(pattern, " ", regex=True)
0         a a
1         b b
2    c c c c
Name: a, dtype: object
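As for the original TypeError: re.sub expects a single string, not a whole Series or DataFrame, which is why it complained about "expected string or bytes-like object". If you want to keep using re.sub, a sketch that applies it element-wise with Series.map:

```python
import re
import pandas as pd

df = pd.DataFrame({"a": ["a\na", "b\nb", "c\nc\nc\nc\n"]})

# re.sub only accepts one string at a time, so map it over each element
cleaned = df["a"].map(lambda s: re.sub(r"\n", " ", s))
print(cleaned.tolist())  # ['a a', 'b b', 'c c c c ']
```

str.replace is the more idiomatic pandas spelling, but the map version makes it clear why the direct re.sub call failed.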

Pandas: Memory error when using apply to split single column array into columns

I am wondering if anybody has a quick fix for a memory error that appears when doing the same thing as in the example below, but on larger data.
Example:
import pandas as pd
import numpy as np
nRows = 2
nCols = 3
df = pd.DataFrame(index=range(nRows), columns=range(1))
df2 = df.apply(lambda row: [np.random.rand(nCols)], axis=1)
df3 = pd.concat(df2.apply(pd.DataFrame, columns=range(nCols)).tolist())
It is when creating df3 that I get the memory error.
The DataFrames in the example:
df
     0
0  NaN
1  NaN
df2
0    [[0.6704675101784022, 0.41730480236712697, 0.5...
1    [[0.14038693859523377, 0.1981014890848788, 0.8...
dtype: object
df3
          0         1         2
0  0.670468  0.417305  0.558690
0  0.140387  0.198101  0.800745
First, working with lists inside pandas columns is not a good idea; avoid it if you can.
I believe you can simplify your code a lot:
nRows = 2
nCols = 3
np.random.seed(2019)
df3 = pd.DataFrame(np.random.rand(nRows, nCols))
print (df3)
          0         1         2
0  0.903482  0.393081  0.623970
1  0.637877  0.880499  0.299172
Here's an example with a solution to the problem. (Note that in this example the column holds arrays rather than lists; this I cannot avoid, since my original problem comes with lists or arrays in a column.)
import pandas as pd
import numpy as np
import time
np.random.seed(1)
nRows = 25000
nCols = 10000
numberOfChunks = 5
df = pd.DataFrame(index=range(nRows), columns=range(1))
df2 = df.apply(lambda row: np.random.rand(nCols), axis=1)
chunkSize = int(round(nRows / float(numberOfChunks)))
for start, stop in zip(np.arange(0, nRows, chunkSize),
                       np.arange(chunkSize, nRows + chunkSize, chunkSize)):
    df2tmp = df2.iloc[start:stop]
    if start == 0:
        df3 = pd.DataFrame(df2tmp.tolist(), index=df2tmp.index).astype('float16')
        continue
    df3tmp = pd.DataFrame(df2tmp.tolist(), index=df2tmp.index).astype('float16')
    df3 = pd.concat([df3, df3tmp])
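If the per-row arrays all have the same length, another option is to skip the per-chunk DataFrames entirely and stack the arrays once with np.vstack before wrapping them in a single frame. A sketch with smaller, illustrative sizes (df2 here stands in for the Series of equal-length arrays from the code above):

```python
import numpy as np
import pandas as pd

np.random.seed(1)
nRows = 1000
nCols = 50

# df2 stands in for the Series of equal-length arrays built above
df2 = pd.Series([np.random.rand(nCols) for _ in range(nRows)])

# stack the 1-D arrays into one (nRows, nCols) block, then build one frame
df3 = pd.DataFrame(np.vstack(df2.to_numpy()), index=df2.index).astype('float16')
print(df3.shape)  # (1000, 50)
```

This builds only one intermediate 2-D array instead of one small DataFrame per chunk, which tends to be gentler on memory.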

How to develop a function

I am completely new to Python and I am learning it. I have written the following code but I couldn't turn it into a function. Can somebody help me please?
import pandas as pd
import numpy as np
f = open('1.csv', 'r')
df = pd.read_csv(f, usecols=[0], sep="\t", index_col=False)
Primary_List = df.values.tolist()
x = 0
y = len(Primary_List)
for i in range(x, y):
    x = i
    MyMatrix = Primary_List[x:x + 10]
    print(MyMatrix)
You could create a function that takes the filename as a parameter; then you can use the same code to read and print many CSV files.
def createMatrix(filename):
    f = open(filename, 'r')
    df = pd.read_csv(f, usecols=[0], sep="\t", index_col=False)
    Primary_List = df.values.tolist()
    x = 0
    y = len(Primary_List)
    for i in range(x, y):
        x = i
        MyMatrix = Primary_List[x:x + 10]
    return MyMatrix
print(createMatrix('1.csv'))
print(createMatrix('2.csv'))
print(createMatrix('3.csv'))
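A variant of the same idea, assuming you actually want every sliding window rather than just one: pass the path straight to pd.read_csv (no manual open is needed) and yield each window. The create_windows name and the window_size parameter are mine, not from the original code:

```python
import pandas as pd

def create_windows(filename, window_size=10):
    # read_csv accepts a file path (or buffer) directly
    df = pd.read_csv(filename, usecols=[0], sep="\t", index_col=False)
    primary_list = df.values.tolist()
    # yield each window of up to `window_size` consecutive rows
    for i in range(len(primary_list)):
        yield primary_list[i:i + window_size]
```

You would then iterate over it, e.g. `for window in create_windows('1.csv'): print(window)`.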

Assign a list with a missing value to a Pandas Series in Python

Something weird happens when I try to assign a list containing a missing value (np.nan) to a slice of a pandas Series.
Below is the code to reproduce it.
import numpy as np
import pandas as pd
S = pd.Series(0, index = list('ABCDE'))
>>> S
A    0
B    0
C    0
D    0
E    0
dtype: int64
ind = [True, False, True, False, True]
x = [1, np.nan, 2]
>>> S[ind]
A    0
C    0
E    0
dtype: int64
Assign x to S[ind]
S[ind] = x
Something weird shows up in S:
>>> S
A    1.0
B    0.0
C    2.0
D    0.0
E    NaN
dtype: float64
I am expecting S to be:
>>> S
A    1.0
B    0.0
C    NaN
D    0.0
E    2.0
dtype: float64
Can anyone give an explanation for this?
You can try this:
S[S[ind].index] = x
or
S[S.index[ind]] = x
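The second form works because S.index[ind] first turns the boolean mask into concrete labels ('A', 'C', 'E'), and label-based assignment then pairs the three values with the three labels positionally, NaN included; the mask-based S[ind] = x goes through a different setitem path that evidently does not preserve that pairing, hence the surprising result above. A quick check of the label-based version (using a float Series so the NaN fits the dtype without upcasting):

```python
import numpy as np
import pandas as pd

# use a float Series so that NaN fits the dtype without upcasting
S = pd.Series(0.0, index=list('ABCDE'))
ind = [True, False, True, False, True]
x = [1, np.nan, 2]

# resolve the mask to concrete labels first, then assign value-by-label
S[S.index[ind]] = x
print(S)
```

This prints 1.0 at 'A', NaN at 'C', and 2.0 at 'E', matching the expected output.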

How do you filter out rows with NaN in a pandas DataFrame

I have a few entries in a pandas DataFrame that are NaN. How would I remove any row containing a NaN?
Just use df.dropna():
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.randn(5, 2))
In [4]: df.iloc[0, 1] = np.nan
In [5]: df.iloc[4, 0] = np.nan
In [6]: print(df)
          0         1
0  2.264727       NaN
1  0.229321  1.615272
2 -0.901608 -1.407787
3 -0.198323  0.521726
4       NaN  0.692340
In [7]: df2 = df.dropna()
In [8]: print(df2)
          0         1
1  0.229321  1.615272
2 -0.901608 -1.407787
3 -0.198323  0.521726
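dropna also takes a few useful options when you don't want to drop on any NaN at all; for example, subset limits the check to particular columns and how='all' drops only rows that are entirely NaN (a short sketch):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

# drop rows only where column "a" is NaN
by_col = df.dropna(subset=["a"])

# drop rows only where every value in the row is NaN
all_nan = df.dropna(how="all")

print(len(by_col), len(all_nan))  # 2 2
```

Both calls return new DataFrames; pass the result back to your variable (or use the columns you need) rather than relying on in-place modification.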