Calculate Word Frequency Count In Java 8

Introduction:

In this quick tutorial, we’ll look at ways in which we can create a word frequency map in Java 8.

Problem Definition:

Let’s say we have been given a list of names:

List<String> names = Arrays.asList("Sam", "James", "Selena", "James", "Joe", "Sam", "James");

We wish to print the frequency map specifying the frequency count of each name in the list:

{Joe=1, James=3, Selena=1, Sam=2}

Word Frequency Map In Java 8:

Java 8 Streams give us an easy and fairly straightforward solution to this problem. Our code would look similar to:

Map<String, Long> frequencyMap = names.stream()
  .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

The idea here is to:

  1. Open a stream() over our list of names
  2. Collect the outcome in a Map using Collectors.groupingBy(), with each unique word treated as a key and its occurrence count as the value
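
The two steps above can be put together into a small runnable program. This is only a minimal sketch: the class name WordFrequencyDemo and the helper method countWords() are hypothetical names chosen for illustration, and the list is built with Arrays.asList().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; the name is an assumption made for this sketch.
class WordFrequencyDemo {

    // Groups equal words together and counts the members of each group.
    static Map<String, Long> countWords(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Sam", "James", "Selena", "James", "Joe", "Sam", "James");
        // groupingBy() makes no promise about the order of the entries,
        // so the keys may print in any order.
        System.out.println(countWords(names));
    }
}
```

Since the order of the entries is not guaranteed, it is safer to look up individual keys than to rely on the printed order.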

If you’re new to Java 8 Streams, we recommend you check out our article on Java 8 Streams.

Also, if we specifically want a Map<String, Integer>, i.e. one with the counts stored as Integer values, we can use:

Map<String, Integer> frequencyMap = names.stream()
  .collect(Collectors.groupingBy(Function.identity(), Collectors.summingInt(val -> 1)));

Collectors.summingInt() sums the int values produced by a given mapper function. Our mapper function simply returns 1 for every element, so each occurrence of a word adds 1 to that word's total.
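
To see the Integer-valued variant end to end, here is a short sketch under the same assumptions as before. The class name IntegerFrequencyDemo and the method countWordsAsInt() are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; the name is an assumption made for this sketch.
class IntegerFrequencyDemo {

    // Every element contributes 1 to the sum kept for its group,
    // so the final sum per key equals the number of occurrences.
    static Map<String, Integer> countWordsAsInt(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.summingInt(val -> 1)));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Sam", "James", "Selena", "James", "Joe", "Sam", "James");
        Map<String, Integer> frequencyMap = countWordsAsInt(names);
        System.out.println(frequencyMap.get("James")); // 3
        System.out.println(frequencyMap.get("Sam"));   // 2
    }
}
```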

Word Frequency Map In Java 8 (Case-Insensitive):

Let’s further improve our solution and create a frequency map that ignores letter casing. So, for a list of names:

List<String> names = Arrays.asList("Sam", "james", "Selena", "JAMes", "Joe", "sam", "JamES");

Our solution should now ignore letter-casing and return:

{joe=1, selena=1, james=3, sam=2}

We can achieve this with a very simple tweak to the above solution: we transform each word in the stream to its lowercase version before performing the grouping operation:

Map<String, Long> frequencyMap = names.stream()
  .map(String::toLowerCase)
  .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

We can further tweak this solution to suit our specific requirements, for example by choosing a different map implementation or a different downstream collector.
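
As one possible tweak, the sketch below keeps the keys in alphabetical order by passing TreeMap::new as the map factory to the three-argument form of Collectors.groupingBy(). It also lowercases with Locale.ROOT so that the result does not depend on the default locale of the machine. Both choices are assumptions made for illustration and are not part of the solution above; the names SortedFrequencyDemo and countIgnoringCase() are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; the name is an assumption made for this sketch.
class SortedFrequencyDemo {

    // Lowercases every word, then counts occurrences into a TreeMap,
    // which keeps its keys in natural (alphabetical) order.
    static Map<String, Long> countIgnoringCase(List<String> words) {
        return words.stream()
            .map(word -> word.toLowerCase(Locale.ROOT))
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Sam", "james", "Selena", "JAMes", "Joe", "sam", "JamES");
        System.out.println(countIgnoringCase(names)); // {james=3, joe=1, sam=2, selena=1}
    }
}
```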

Conclusion:

In this mini-tutorial, we learned how to create a word frequency map in Java 8.

Prior to Java 8, writing a method for calculating a word frequency count usually required us to write around 5-7 lines of code. The idea was to insert new keys into the map and keep on incrementing the counters for any word repetitions.
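
For comparison, a pre-Java 8 version might have looked roughly like the sketch below. It is a reconstruction of the general idea described above, not code from any particular project, and the names LegacyWordFrequency and countWords() are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; the name is an assumption made for this sketch.
class LegacyWordFrequency {

    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> frequencyMap = new HashMap<String, Integer>();
        for (String word : words) {
            Integer count = frequencyMap.get(word);
            if (count == null) {
                // First time we see this word: insert a new key.
                frequencyMap.put(word, 1);
            } else {
                // Repeated word: increment the existing counter.
                frequencyMap.put(word, count + 1);
            }
        }
        return frequencyMap;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Sam", "James", "Selena", "James", "Joe", "Sam", "James");
        System.out.println(countWords(names).get("James")); // 3
    }
}
```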

The Java 8 Streams API makes this solution a lot more concise, practically a one-liner. We hope this helps you appreciate the beauty and power of the Java 8 Streams API.
