Remove duplicate index in 2D array

I have this 2D numpy array here:

arr = np.array([[1,2],
                [2,2],
                [3,2],
                [4,2],
                [5,3]])

I would like to delete every row whose value at index 1 (the second column) duplicates the value in the previous row, and get an output like so:

np.array([[1,2],
          [5,3]])

However, when I run my code it raises an error. Here is my code:

for x in range(0, len(arr)):
    if arr[x][1] == arr[x-1][1]:
        arr = np.delete(arr, x, 0)

>>> IndexError: index 3 is out of bounds for axis 0 with size 2
Answer

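The loop fails because range(0, len(arr)) is computed once from the original length, while each np.delete call returns a shorter array, so x eventually runs past the end. Rather than trying to delete from the array, you can use np.unique to find the indices of the first occurrences of the unique values in the second column and use those indices to pull the rows out: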

import numpy as np   

arr = np.array([[1,2],
                [2,2],
                [3,2],
                [4,2],
                [5,3]])

u, i = np.unique(arr[:,1], return_index=True)

arr[i]
# array([[1, 2],
#        [5, 3]])
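
Two details of the np.unique approach may matter for other inputs: the returned indices follow the sorted unique values rather than the original row order, and a value that reappears later (not directly after itself) is still dropped. The sketch below shows one way to handle each case. The helper names first_occurrences and drop_consecutive_duplicates are made up for illustration and are not NumPy functions; the col parameter is likewise an assumption.

```python
import numpy as np

def first_occurrences(a, col=1):
    """Keep the first row for each distinct value in `col`, in original row order."""
    _, first_idx = np.unique(a[:, col], return_index=True)
    # np.unique orders first_idx by sorted value; sort it to restore row order.
    return a[np.sort(first_idx)]

def drop_consecutive_duplicates(a, col=1):
    """Drop a row only when its `col` value equals that of the row directly above."""
    values = a[:, col]
    keep = np.ones(len(a), dtype=bool)      # the first row is always kept
    keep[1:] = values[1:] != values[:-1]    # compare each row with its predecessor
    return a[keep]

arr = np.array([[1, 2],
                [2, 2],
                [3, 2],
                [4, 2],
                [5, 3]])

print(first_occurrences(arr))
print(drop_consecutive_duplicates(arr))
```

For the array in the question both helpers return the same two rows. They differ on input such as [[1,2],[2,3],[3,2]], where the value 2 comes back after a 3: first_occurrences drops the last row, while drop_consecutive_duplicates keeps all three.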