Deleting Chained Duplicates - numpy

Let's say I have a list:
lits = [1, 1, 1, 2, 0, 0, 0, 0, 3, 3, 1, 4, 5, 2, 2, 2, 0, 0, 0]
and I need this to become [1, 1, 2, 0, 0, 3, 3, 1, 4, 5, 2, 2, 0, 0]
(delete duplicates, but only within a chain of consecutive duplicates). I'm going to do this on a huge HDF5 file, with pandas and numpy, so I'd rather not use a for loop iterating over all elements. My current code is:
table = table.drop_duplicates(cols='[SPEED OVER GROUND [kts]]', take_last=True)
Is there a modification I can make to this code?

In pandas you can build a boolean mask, selecting a row only if it differs from either the preceding or the succeeding value:
>>> df = pd.DataFrame({'lits': lits})
>>> df[(df.lits != df.lits.shift(1)) | (df.lits != df.lits.shift(-1))]
    lits
0      1
2      1
3      2
4      0
7      0
8      3
9      3
10     1
11     4
12     5
13     2
15     2
16     0
18     0
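If you prefer to stay in plain NumPy, here is a sketch of the same boolean-mask idea (my own addition, not part of the original answer; names are only illustrative). An element is dropped only when it equals both its predecessor and its successor:
import numpy as np
a = np.array(lits)
keep = np.ones(a.size, dtype=bool)
# keep the first and last element of every run of equal values
keep[1:-1] = (a[1:-1] != a[:-2]) | (a[1:-1] != a[2:])
a[keep]
# array([1, 1, 2, 0, 0, 3, 3, 1, 4, 5, 2, 2, 0, 0])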

Related

Check every 4 values and change values accordingly in an np array

Hi there, I have a NumPy array of zeros and ones. I would like to check every group of 4 values, and if there is at least one 1 in the group, set all four values to 1; otherwise leave them all at zero.
Do you know how to do it? Thanks, here is a sample:
np = [ 0 0 0 0 1 1 1 1 0 0 1 0 0 0 0 0 ]
np_corrected = [ 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 ]
Many thanks, hope the question is now clear!
Probably not the shortest solution, but definitely working and fast:
1. reshape
2. create a mask
3. apply the mask and get the result:
import numpy as np
array = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0])
array
groups = array.reshape(-1, 4) # group every 4 elements into new columns
groups
mask = groups.sum(axis=1)>0 # identify groups with at least one '1'
mask
np.logical_or(groups.T, mask).T.astype(int).flatten()
# swap rows and columns in groups, apply mask, swap back,
# replace True/False with 1/0 and restore original shape
Returns (in Jupyter notebook):
array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0])
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [0, 0, 1, 0],
       [0, 0, 0, 0]])
array([False,  True,  True, False])
array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
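A shorter variant of the same idea (my own sketch, not part of the original answer; it assumes the array length is a multiple of 4): compute one flag per group with any() and broadcast it back with np.repeat:
import numpy as np
array = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0])
mask = array.reshape(-1, 4).any(axis=1)   # one flag per group of 4
np.repeat(mask, 4).astype(array.dtype)    # broadcast each flag back over its group
# array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0])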

How to change a value based on criteria in pandas

I have the following problem. I have this df:
d = {'id': [1, 1, 2, 2, 3], 'value': [0, 1, 0, 0, 1]}
df = pd.DataFrame(data=d)
I would like to have a new column where the value is 1 if any row with the same id also has value 1. See the desired output:
d = {'id': [1, 1, 2, 2, 3], 'value': [0, 1, 0, 0, 1], 'newvalue': [1, 1, 0, 0, 1]}
df = pd.DataFrame(data=d)
How can I do it please?
If you need to set 0/1 by a condition (here: the group contains at least one 1), use GroupBy.transform with any for the mask and cast to integer to map True/False to 1/0:
df['newvalue'] = df['value'].eq(1).groupby(df['id']).transform('any').astype(int)
Alternative:
df['newvalue'] = df['id'].isin(df.loc[df['value'].eq(1), 'id']).astype(int)
Or, if only 0/1 values are possible, simplify the solution by taking the maximum value per group:
df['newvalue'] = df.groupby('id')['value'].transform('max')
print (df)
   id  value  newvalue
0   1      0         1
1   1      1         1
2   2      0         0
3   2      0         0
4   3      1         1

How to efficiently perform closest neighbor interpolation with Numpy

I have an image that, for the sake of this issue, is just a numpy array. I want to filter the image to remove noise in the form of isolated transparent pixels (more generally, I would also like to remove lines, but that is the next problem).
Let's set up a small example:
a = np.ones((10, 10), np.uint8)
a[5,5] = 0 # isolated hole
a[5,6] = 2 # the neighbour to clone
Leading to this matrix:
[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 0, 2, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
Well, the problem is that when there is a 0 in the matrix, I want it to be replaced by the closest neighbour (2 in this case; though an average of the four neighbours, if easy to implement, would be even better).
Can this be done without explicit loops?
Detect the 0 entries in a using boolean array indexing. To calculate the described average, use scipy.signal.convolve2d with mode='same' and the following kernel:
kernel = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]) / 4
Finally, replace the found 0 entries in a with the corresponding entries from the convolution result. See this code snippet:
import numpy as np
from scipy.signal import convolve2d
a = np.ones((10, 10), np.uint8)
a[5, 5] = 0
a[5, 6] = 2
print(a)
kernel = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]) / 4
b = convolve2d(a, kernel, mode='same')
a[a == 0] = b[a == 0]
print(a)
The result is:
[[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 2 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]]
Since you initialized a as np.uint8, the actual replacement value 1.25 is truncated to 1. If you initialize a as np.float32, you'll see that 1.25 is correctly placed there.
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.9.1
NumPy: 1.20.1
SciPy: 1.6.0
----------------------------------------
EDIT: The hardcoded kernel won't work correctly for corner and border pixels if you want the average of only the two or three neighbours that actually exist there. One option: set up the kernel without the division, store three different convolution results divided by 2, 3, and 4, and pick the correct value depending on whether the pixel is a corner, border, or regular pixel; a variation of that idea is sketched below.
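A sketch of one way to handle the borders (my own variation, not the answer's code): instead of three pre-divided results, convolve an all-ones array with the same un-normalized kernel to count how many neighbours exist at each position, then divide by that count:
import numpy as np
from scipy.signal import convolve2d
a = np.ones((10, 10), np.float32)
a[5, 5] = 0
a[5, 6] = 2
kernel = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])               # un-normalized neighbour kernel
neighbour_sum = convolve2d(a, kernel, mode='same')                 # sum of the existing neighbours
neighbour_cnt = convolve2d(np.ones_like(a), kernel, mode='same')   # 2, 3 or 4, depending on position
b = neighbour_sum / neighbour_cnt                                  # proper average at corners and borders too
a[a == 0] = b[a == 0]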

A question about numpy ndarray transformation

Is there any simple way to change this array
[[ 3 4 0 1 2]
[ 8 9 5 6 7]
[13 14 10 11 12]]
into:
[[ 0 0 0 1 2]
[ 0 0 5 6 7]
[ 0 0 10 11 12]]
?
Edit: maximum supported dimension for an ndarray is 32, found 306 for transpose
Use Slicing:
>>> a[:, :2] = 0
>>> a
array([[ 0,  0,  0,  1,  2],
       [ 0,  0,  5,  6,  7],
       [ 0,  0, 10, 11, 12]])
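For completeness, a self-contained version of the example (my own sketch, assuming a is the array shown in the question):
import numpy as np
a = np.array([[ 3,  4,  0,  1,  2],
              [ 8,  9,  5,  6,  7],
              [13, 14, 10, 11, 12]])
a[:, :2] = 0   # zero out the first two columns of every row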

How to get indices of unique values in a TensorFlow tensor?

Suppose I have an input 1D tensor, and I want to get the indices of the unique elements in it.
Input 1D tensor:
[ 1 3 0 0 0 3 5 6 8 9 12 2 5 7 0 11 6 7 0 0]
Expected output:
Values: [1, 3, 0, 5, 6, 8, 9, 12, 2, 7, 11]
indices: [0, 1, 2, 6, 7, 8, 9, 10, 11, 13, 15]
Here is my strategy now.
input = [ 1, 3, 0, 0, 0, 3, 5, 6, 8, 9, 12, 2, 5, 7, 0, 11, 6, 7, 0, 0,]
unique_value_in_input, _ = tf.unique(input) # [1 3 0 5 6 8 9 12 2 7 11]
number_of_unique_value = tf.shape(unique_value_in_input)[0] #11
y = tf.reshape(unique_value_in_input, (number_of_unique_value, 1)) #[[1], [3], [0], [5], [6], [8], [9], ..]
input_matrix = tf.tile(input, [number_of_unique_value]) # repeat the tensor for tf.equal()
input_matrix = tf.reshape(input_matrix, [number_of_unique_value, -1])
cols = tf.where(tf.equal(input_matrix, y))[:,-1] #[[ 0 0] [ 1 1] [ 1 5] [ 2 6] [ 2 12] ...]
Since values repeat in the input, the tf.where() step gives me multiple matches (duplicated True entries) per unique value.
Is there any function I can use for this?
You should be able to do the following to get the desired output. For each unique value, compare it against the input to get a boolean tensor, and take tf.argmax of that tensor; argmax returns the first maximum, i.e. the index of the first occurrence.
import tensorflow as tf

input = tf.constant([1, 3, 0, 0, 0, 3, 5, 6, 8, 9, 12, 2, 5, 7, 0, 11, 6, 7, 0, 0], tf.int64)
unique_vals, _ = tf.unique(input)
res = tf.map_fn(
    lambda x: tf.argmax(tf.cast(tf.equal(input, x), tf.int64)),
    unique_vals)

with tf.Session() as sess:
    print(sess.run(res))
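An alternative sketch (my own, not from the original answer) that avoids the per-element map_fn: use the index output of tf.unique together with tf.unsorted_segment_min, so the first position of each unique value is the minimum position within its group:
import tensorflow as tf

input = tf.constant([1, 3, 0, 0, 0, 3, 5, 6, 8, 9, 12, 2, 5, 7, 0, 11, 6, 7, 0, 0], tf.int64)
unique_vals, idx = tf.unique(input)        # idx maps each element to its unique value
positions = tf.range(tf.shape(input)[0])   # 0, 1, ..., n-1
first_indices = tf.unsorted_segment_min(positions, idx, tf.shape(unique_vals)[0])

with tf.Session() as sess:
    print(sess.run([unique_vals, first_indices]))
# expected: values [1 3 0 5 6 8 9 12 2 7 11], indices [0 1 2 6 7 8 9 10 11 13 15]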