Software IT-Consulting und Coaching

Find duplicated items in 2 Collections with Lambda expressions

Imagine a case where two Collections need to be searched for items that are contained in both. The items themselves should not be compared, only one property of each item. Of course, we are not able to override equals and hashCode, because we have to deal with domain objects.

class Item {
  private String id;
  private String attribute;
  // getters ...
}

We have two Collections, existingItems and additionalItems, in which duplicates shall be found by comparing Item.getId.

First, both collections need to be joined together:

Stream<Item> merged = Stream.concat(existingItems.stream(), additionalItems.stream());

Now we group them by id, count the items per id, and keep only those ids that have more than one related item.

Stream<Map.Entry<String, Long>> duplicates = merged
    .collect(Collectors.groupingBy(Item::getId, Collectors.counting()))
    .entrySet().stream()
    .filter(e -> e.getValue() > 1);

The result is a Stream of entries, each mapping an id to its number of occurrences. The detour via entrySet() is needed because Collectors.groupingBy produces a Map.
To get only the duplicated ids, map each entry back to its key:

Stream<String> ids = duplicates.map(Map.Entry::getKey);
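To make the intermediate result concrete, here is a minimal runnable sketch of the two steps with sample data. The class name CountByIdDemo, the Item constructor, and the helper names countById and duplicateIds are assumptions made for this demo and are not part of the original snippets. The ids are sorted only to get a stable output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CountByIdDemo {

    // Same shape as the Item above; the constructor is added for the demo.
    static class Item {
        private final String id;
        private final String attribute;

        Item(String id, String attribute) {
            this.id = id;
            this.attribute = attribute;
        }

        String getId() { return id; }
        String getAttribute() { return attribute; }
    }

    // Counts how often each id occurs across both collections.
    static Map<String, Long> countById(List<Item> existingItems, List<Item> additionalItems) {
        return Stream.concat(existingItems.stream(), additionalItems.stream())
                .collect(Collectors.groupingBy(Item::getId, Collectors.counting()));
    }

    // Keeps only the ids that were counted more than once.
    static List<String> duplicateIds(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> existingItems = Arrays.asList(new Item("1", "a"), new Item("2", "b"));
        List<Item> additionalItems = Arrays.asList(new Item("2", "c"), new Item("3", "d"));

        Map<String, Long> counts = countById(existingItems, additionalItems);
        System.out.println(counts.get("2"));      // 2
        System.out.println(duplicateIds(counts)); // [2]
    }
}
```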

Putting it together:

Stream<String> ids = Stream.concat(existingItems.stream(), additionalItems.stream())
    .collect(Collectors.groupingBy(Item::getId, Collectors.counting()))
    .entrySet().stream()
    .filter(e -> e.getValue() > 1)
    .map(Map.Entry::getKey);
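One thing to keep in mind: counting over the merged stream also reports an id that occurs twice within a single collection, even if the other collection does not contain it. If only ids present in both collections are wanted, one possible alternative (a different technique than the grouping shown above) is to collect the ids of one collection into a Set and filter the other collection against it. The following is a sketch under that assumption; the class name InBothDemo, the method idsInBoth, and the Item constructor are made up for illustration, and ids are assumed to be non-null.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class InBothDemo {

    // Same shape as the Item above; the constructor is added for the demo.
    static class Item {
        private final String id;
        private final String attribute;

        Item(String id, String attribute) {
            this.id = id;
            this.attribute = attribute;
        }

        String getId() { return id; }
        String getAttribute() { return attribute; }
    }

    // Returns the ids that occur in both collections, ignoring repeats inside one collection.
    static Set<String> idsInBoth(Collection<Item> existingItems, Collection<Item> additionalItems) {
        Set<String> existingIds = existingItems.stream()
                .map(Item::getId)
                .collect(Collectors.toSet());

        return additionalItems.stream()
                .map(Item::getId)
                .filter(existingIds::contains)
                // TreeSet is used only to get a sorted, stable result
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        // id "1" appears twice, but only within existingItems
        List<Item> existingItems = Arrays.asList(
                new Item("1", "a"), new Item("1", "x"), new Item("2", "b"));
        List<Item> additionalItems = Arrays.asList(new Item("2", "c"), new Item("3", "d"));

        System.out.println(idsInBoth(existingItems, additionalItems)); // [2]
    }
}
```

With the grouping approach the same input would report both "1" and "2", so which variant fits depends on whether repeats inside one collection count as duplicates.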

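If the items of the found duplicates shall be retained, then groupingBy must be called without a downstream collector. Each map value is then a List of items, so the filter has to check the size of that list. Note that merged must be a freshly created stream here, because a stream can be consumed only once:

Stream<Map.Entry<String, List<Item>>> duplicates = merged
    .collect(Collectors.groupingBy(Item::getId))
    .entrySet().stream()
    .filter(e -> e.getValue().size() > 1);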
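As a runnable illustration of retaining the items, the sketch below collects the filtered entries into a Map from id to the items sharing that id. The class name DuplicateItemsDemo, the method duplicatesById, and the Item constructor are assumptions for this demo only.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DuplicateItemsDemo {

    // Same shape as the Item above; the constructor is added for the demo.
    static class Item {
        private final String id;
        private final String attribute;

        Item(String id, String attribute) {
            this.id = id;
            this.attribute = attribute;
        }

        String getId() { return id; }
        String getAttribute() { return attribute; }
    }

    // Groups all items by id and keeps only the groups with more than one item.
    static Map<String, List<Item>> duplicatesById(Collection<Item> existingItems,
                                                  Collection<Item> additionalItems) {
        return Stream.concat(existingItems.stream(), additionalItems.stream())
                .collect(Collectors.groupingBy(Item::getId))
                .entrySet().stream()
                .filter(e -> e.getValue().size() > 1)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<Item> existingItems = Arrays.asList(new Item("1", "a"), new Item("2", "b"));
        List<Item> additionalItems = Arrays.asList(new Item("2", "c"), new Item("3", "d"));

        Map<String, List<Item>> duplicates = duplicatesById(existingItems, additionalItems);

        // Attributes of all items that share the id "2"
        List<String> attributes = duplicates.get("2").stream()
                .map(Item::getAttribute)
                .collect(Collectors.toList());
        System.out.println(attributes); // [b, c]
    }
}
```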