Python List Manipulation — First Duplicate
The problem: Given a certain array as a parameter, find the first duplicated element in it. The first repeated element in the array should be return. Sound pretty simple! lets see.
Assumption:
- The given array is integer type .
- The element in the array have to be between 1 and N. This mean not 0 ot negative numbers. [1…….N]
- The array is not sorted
To visualize the use cases of this method, The array above mark the first appearance of the a duplicate number with green, while the other duplicate in yellow since they are irrelevant and our focus here to find the first duplicate.
In a real life number the array usually longer than the given example, but for the matter of abstraction we will work with the given example. so in the first array we need to return 2 and similarly in the second array we should return 1 and return 3 for the third array. in another words we are looking for the second occurrence of a number.
Solution
The easiest and most straight forward yet the worst solution is the “Brute Force Algorithm”, This approach usually don’t pay any attention to the performance, instead the algorithm will iterate through all probabilities till it solve the problem. So, we can loop through the whole list element by element to ask if that specific element is duplicated anywhere in the list.
You apply the same process for all elements till you find your first duplicate, keep in mind when you compare you start from n+1 since you don’t need to check the already checked elements. With the nested for loop we can achieve our result, so we can return the the element when duplicated or -1 when not found.
If you have encounter any nested loop like this before you know for sure that the time complexity for this solution will cost it heavily. it’s going to have a time complicity if o(n²), this is not acceptable consider the salability of this data set we apply our algorithm to, imagine if you only have 100 elements, then you need a time of 100² which is 10000, so we have to come up with better solution that solve the problem more efficiently.
We should at least bring the time complexity to o(n), so we need to discard the comparison of each element in a nested loop, instead we should consider another data structure that provide easy index and no time accessibility. Hash table or Hash set is considered as space complexity that can help us optimize our time complicity. You use the hash table to store elements already visited, so for each element in the array we check if it is already in the hash-table, then we’ve found a duplicate and we return the current element, otherwise we add the item to the hash-table. So basically a set
uses a hashtable as its underlying data structure.
This is great, we reduce the time complexity to o(n) from o(n²), However it take some space, your homework to think of some other solution achieve o(n) with no space.