Imagine a case where items contained in both of two Collections need to be filtered out. However, it is not the items themselves that should be compared, but only a property within each item. Of course, we are not able to override equals and hashCode, because we have to deal with domain objects.
class Item {
    private String id;
    private String attribute;
    // getters ...
}
We have two Collections, existingItems and additionalItems, where duplicates, determined by Item.getId, shall be filtered out.
First, both collections need to be joined together:
Stream<Item> merged = Stream.concat(existingItems.stream(), additionalItems.stream());
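To make the join concrete, here is a small sketch. The Item constructor, its getters, the class name ConcatDemo and the sample data are hypothetical stand-ins, not part of the original domain class. It shows that Stream.concat yields the elements of the first stream followed by those of the second, and that the resulting Stream can be consumed only once, which matters when merged is reused later.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConcatDemo {
    // Hypothetical minimal stand-in for the Item domain class above.
    public static class Item {
        private final String id;
        private final String attribute;

        public Item(String id, String attribute) {
            this.id = id;
            this.attribute = attribute;
        }

        public String getId() { return id; }
        public String getAttribute() { return attribute; }
    }

    // Lazily joins both collections into one single-use Stream.
    public static Stream<Item> merge(List<Item> existingItems, List<Item> additionalItems) {
        return Stream.concat(existingItems.stream(), additionalItems.stream());
    }

    public static void main(String[] args) {
        List<Item> existingItems = List.of(new Item("1", "a"), new Item("2", "b"));
        List<Item> additionalItems = List.of(new Item("2", "c"), new Item("3", "d"));

        Stream<Item> merged = merge(existingItems, additionalItems);
        // All elements of the first stream come before those of the second.
        List<String> ids = merged.map(Item::getId).collect(Collectors.toList());
        System.out.println(ids); // [1, 2, 2, 3]

        // Any further operation on the same Stream fails.
        try {
            merged.count();
        } catch (IllegalStateException e) {
            System.out.println("merged has already been consumed");
        }
    }
}
```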
Now we group them by id and keep only those ids that have more than one related item.
Stream<Map.Entry<String, Long>> duplicates = merged
        .collect(Collectors.groupingBy(Item::getId, Collectors.counting()))
        .entrySet().stream()
        .filter(e -> e.getValue() > 1);
As a result, a Stream of entries mapping each id to its number of occurrences is exposed, because Collectors.groupingBy produces a Map whose entrySet is then streamed.
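The intermediate Map can be inspected directly. The following sketch assumes the same hypothetical Item stand-in and sample data; the class and method names are illustrative only. It prints the id-to-count Map produced by groupingBy with counting() and the entries that survive the filter.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CountByIdDemo {
    // Hypothetical minimal stand-in for the Item domain class above.
    public static class Item {
        private final String id;
        private final String attribute;

        public Item(String id, String attribute) {
            this.id = id;
            this.attribute = attribute;
        }

        public String getId() { return id; }
        public String getAttribute() { return attribute; }
    }

    // Maps every id to the number of items carrying it, across both collections.
    public static Map<String, Long> countById(List<Item> existingItems, List<Item> additionalItems) {
        return Stream.concat(existingItems.stream(), additionalItems.stream())
                .collect(Collectors.groupingBy(Item::getId, Collectors.counting()));
    }

    // Keeps only the ids that occur more than once, together with their counts.
    public static Map<String, Long> duplicateCounts(Map<String, Long> countsById) {
        return countsById.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<Item> existingItems = List.of(new Item("1", "a"), new Item("2", "b"));
        List<Item> additionalItems = List.of(new Item("2", "c"), new Item("3", "d"));

        Map<String, Long> countsById = countById(existingItems, additionalItems);
        // groupingBy guarantees no iteration order; TreeMap only makes the print stable.
        System.out.println(new TreeMap<>(countsById)); // {1=1, 2=2, 3=1}
        System.out.println(duplicateCounts(countsById)); // {2=2}
    }
}
```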
To get only the duplicated ids, just map each entry back to its key:
Stream<String> ids = duplicates.map(Map.Entry::getKey);
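Mapping entries to their keys works on any id-to-count Map, independent of Item. In this sketch the counts are hypothetical values of the shape groupingBy(Item::getId, counting()) would produce, and the sorted() call is added only to make the output deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EntryKeysDemo {
    // Turns id -> count entries into the list of ids that occur more than once.
    public static List<String> duplicateIds(Map<String, Long> countsById) {
        return countsById.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                // Sorted only to make the output deterministic.
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical counts, as groupingBy(Item::getId, counting()) would produce them.
        Map<String, Long> countsById = Map.of("1", 1L, "2", 2L, "3", 3L);
        System.out.println(duplicateIds(countsById)); // [2, 3]
    }
}
```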
Putting it together:
Stream<String> ids = Stream.concat(existingItems.stream(), additionalItems.stream())
        .collect(Collectors.groupingBy(Item::getId, Collectors.counting()))
        .entrySet().stream()
        .filter(e -> e.getValue() > 1)
        .map(Map.Entry::getKey);
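As a runnable sketch of the combined pipeline, the version below collects the ids into a Set and then, as a hypothetical follow-up step not shown in the text above, uses that Set to drop the matching items from additionalItems. The Item stand-in, method names and sample data are assumptions for illustration.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DuplicateIdsDemo {
    // Hypothetical minimal stand-in for the Item domain class above.
    public static class Item {
        private final String id;
        private final String attribute;

        public Item(String id, String attribute) {
            this.id = id;
            this.attribute = attribute;
        }

        public String getId() { return id; }
        public String getAttribute() { return attribute; }
    }

    // The pipeline from the text, collected into a Set instead of left as a Stream.
    public static Set<String> duplicateIds(Collection<Item> existingItems, Collection<Item> additionalItems) {
        return Stream.concat(existingItems.stream(), additionalItems.stream())
                .collect(Collectors.groupingBy(Item::getId, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    // Hypothetical follow-up step: keep only additional items whose id is not a duplicate.
    public static List<Item> withoutDuplicates(Collection<Item> additionalItems, Set<String> duplicateIds) {
        return additionalItems.stream()
                .filter(item -> !duplicateIds.contains(item.getId()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> existingItems = List.of(new Item("1", "a"), new Item("2", "b"));
        List<Item> additionalItems = List.of(new Item("2", "c"), new Item("3", "d"));

        Set<String> ids = duplicateIds(existingItems, additionalItems);
        System.out.println(ids); // [2]

        List<Item> remaining = withoutDuplicates(additionalItems, ids);
        remaining.forEach(item -> System.out.println(item.getId() + " " + item.getAttribute())); // 3 d
    }
}
```

Note that an id occurring twice within additionalItems alone is also reported as a duplicate by this pipeline, so both of those items would be dropped as well.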
If the items of the found duplicates shall be retained, then groupingBy must be called without a downstream collector. Each id is then mapped to a List of its items, so the filter has to check the size of that List:
// merged must be a fresh stream here: a Stream can be consumed only once
Stream<Map.Entry<String, List<Item>>> duplicates = merged
        .collect(Collectors.groupingBy(Item::getId))
        .entrySet().stream()
        .filter(e -> e.getValue().size() > 1);
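A runnable sketch of the item-retaining variant follows, again assuming the hypothetical Item stand-in and sample data. It collects the surviving entries back into a Map from id to the List of items sharing that id; within each group the items keep their encounter order from the concatenated stream.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DuplicateItemsDemo {
    // Hypothetical minimal stand-in for the Item domain class above.
    public static class Item {
        private final String id;
        private final String attribute;

        public Item(String id, String attribute) {
            this.id = id;
            this.attribute = attribute;
        }

        public String getId() { return id; }
        public String getAttribute() { return attribute; }
    }

    // groupingBy without a downstream collector puts each group into a List.
    public static Map<String, List<Item>> duplicatesWithItems(Collection<Item> existingItems,
                                                              Collection<Item> additionalItems) {
        return Stream.concat(existingItems.stream(), additionalItems.stream())
                .collect(Collectors.groupingBy(Item::getId))
                .entrySet().stream()
                // The value is a List now, so its size is compared instead of a count.
                .filter(e -> e.getValue().size() > 1)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<Item> existingItems = List.of(new Item("1", "a"), new Item("2", "b"));
        List<Item> additionalItems = List.of(new Item("2", "c"), new Item("3", "d"));

        duplicatesWithItems(existingItems, additionalItems).forEach((id, items) ->
                System.out.println(id + " -> " + items.stream()
                        .map(Item::getAttribute)
                        .collect(Collectors.toList()))); // 2 -> [b, c]
    }
}
```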