Scala Tutorial - Learn How To Use Par Function For Scala 2.13.0 And Above

By Nadim Bahadoor | Last updated: February 5, 2020 at 12:28 pm

Overview

In the previous tutorial, we introduced the par function to run parallel computations on collection data structures in Scala. The par function is short for parallel and provides a convenient way of accessing Scala's Parallel Collections. These are designed from the ground up to work with both Scala's Mutable and Immutable collection data structures: when you call the par function on either an Immutable or a Mutable collection, you get back an equivalent Parallel Collection. These include ParVector, ParHashMap, ParHashSet, ParRange, ParArray and ParTrieMap, which split their work so that it can run across multiple CPU cores.


However, as of Scala 2.13.0, the par function has undergone some important changes. The Parallel Collections have moved to their own repository and module, and are no longer part of the Scala standard library. Therefore, do not forget to add the corresponding artifact to your build.sbt file, as shown below.


libraryDependencies += "org.scala-lang.modules" %% "scala-parallel-collections" % "0.2.0"
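If the same project must also compile on Scala 2.12 or earlier, where par still ships with the standard library, one common approach is to add the module only when building with 2.13. The build.sbt fragment below is a sketch of that idea, along the lines of the pattern shown in the scala-parallel-collections README; the version number is the one used in this tutorial and you may need a different one for your setup.

```scala
// build.sbt sketch: add the Parallel Collections module only for Scala 2.13.x
libraryDependencies ++= {
  CrossVersion.partialVersion(scalaVersion.value) match {
    // Scala 2.13: the Parallel Collections live in a separate module
    case Some((2, minor)) if minor >= 13 =>
      Seq("org.scala-lang.modules" %% "scala-parallel-collections" % "0.2.0")
    // Scala 2.12 and earlier: par is still part of the standard library
    case _ => Seq.empty
  }
}
```

Keep in mind that the scala.collection.parallel.CollectionConverters._ import used in Step 2 comes from the module, so source code shared between 2.12 and 2.13 typically needs version-specific source directories or a compatibility shim as well.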

For illustration purposes, we will use the par function to run some transformations over a Sequence of Donut names of type String. In addition, we will work out a basic parallel computation that sums all Donut prices of type Double from a ParVector[Double]. Keep in mind that the par function and, for that matter, the Parallel Collections pay off when you have large data sets. For a small sample such as the one used in this section, the overhead of splitting the work across threads will typically outweigh any speed-up.
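To make the point about data size a little more concrete, here is a minimal sketch that sums the squares of one million numbers, once sequentially and once in parallel. It only checks that both versions agree; whether the parallel version is actually faster depends on your machine and core count, so treat any timings you add yourself as indicative only. It assumes the scala-parallel-collections dependency shown above is on the classpath.

```scala
import scala.collection.parallel.CollectionConverters._

// A much larger data set than the donut examples: one million Long values
val numbers: Vector[Long] = Vector.range(1L, 1000001L)

// Sequential map followed by sum
val sequentialSum: Long = numbers.map(n => n * n).sum

// The same computation on the equivalent ParVector, split across cores
val parallelSum: Long = numbers.par.map(n => n * n).sum

println(s"sequential = $sequentialSum, parallel = $parallelSum")
```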


Steps

1. How to initialize a Sequence of donuts

We initialize a Sequence containing elements of type String to represent Donut names. Note that the names do not yet contain the word ‘Donut’; we append it in Step 3.

println("Step 1: How to initialize an Immutable Sequence of various donut flavours")
val donuts = Seq("Plain", "Strawberry", "Glazed")
println(s"Elements of donuts immutable Sequence = $donuts")

You should see the following output when you run your Scala application in IntelliJ:


Step 1: How to initialize an Immutable Sequence of various donut flavours 
Elements of donuts immutable Sequence = List(Plain, Strawberry, Glazed) 
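The output shows List(...) because, in Scala 2.13, Seq is an alias for scala.collection.immutable.Seq and its default implementation is a List. A quick check, as a small standalone sketch:

```scala
import scala.collection.immutable

val donuts = Seq("Plain", "Strawberry", "Glazed")

// In Scala 2.13, scala.Seq is an alias for immutable.Seq, so this assignment compiles
val immutableDonuts: immutable.Seq[String] = donuts

// The default Seq implementation is a List
println(donuts.isInstanceOf[List[_]])
```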


2. Convert the Immutable donuts Sequence into a Parallel Collection 

Using the par function, you can convert the Donut Sequence into an equivalent parallel ParSeq with very little effort. Note that ParSeq is a trait which generalizes the behavior of parallel, ordered, index-based sequences. One of its concrete implementations is ParVector, as shown in the output: ParVector(Plain, Strawberry, Glazed). From Scala 2.13.0 onwards, you need to import scala.collection.parallel.CollectionConverters._ to gain access to the .par function.


println("\nStep 2: Convert the Immutable donuts Sequence into a Parallel Collection")
import scala.collection.parallel.ParSeq
// Required from Scala 2.13.0 onwards to make .par available on standard collections
import scala.collection.parallel.CollectionConverters._
val donutsParallel: ParSeq[String] = donuts.par
println(s"Elements of donutsParallel = $donutsParallel")
 
You should see the following output when you run your Scala application in IntelliJ:


Step 2: Convert the Immutable donuts Sequence into a Parallel Collection 
Elements of donutsParallel = ParVector(Plain, Strawberry, Glazed) 
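A parallel collection can be turned back into an ordinary sequential one with seq, for instance when a later step should run on a single thread or an API expects a regular collection. A minimal sketch reusing the donut names; note that converting a List with par copies its elements into a new ParVector, so switching back and forth at every step has a cost.

```scala
import scala.collection.parallel.ParSeq
import scala.collection.parallel.CollectionConverters._

val donuts = Seq("Plain", "Strawberry", "Glazed")

// Sequential -> parallel
val donutsParallel: ParSeq[String] = donuts.par

// Parallel -> sequential again: seq returns a regular, non-parallel collection
val donutsSequential: scala.collection.Seq[String] = donutsParallel.seq

println(s"Elements of donutsSequential = $donutsSequential")
```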

3. How to use a Scala Parallel Collection

With the Parallel Collection from Step 2 at hand, we can call the map function to append the word Donut to each Donut name. Parallel Collections provide most of the functions that are available on the regular Immutable and Mutable collections, and map is just one example. What is important to note here is that the ParSeq collection splits its elements into chunks and runs the map transformation on them in parallel, while still returning the results in the original order.


println("\nStep 3: How to use a Scala Parallel Collection")
// Append " Donut" to each name; the map runs in parallel over the ParSeq
val donutsParSeq: ParSeq[String] = donutsParallel.map(d => s"$d Donut")
println(s"Elements of donutsParSeq collection = $donutsParSeq")

You should see the following output when you run your Scala application in IntelliJ:


Step 3: How to use a Scala Parallel Collection 
Elements of donutsParSeq collection = ParVector(Plain Donut, Strawberry Donut, Glazed Donut) 
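Two behaviours are worth seeing side by side. Transformations such as map and filter return their results in the original element order, even though the work is split across threads. Side-effecting operations such as foreach, on the other hand, run in whatever order the threads happen to reach the elements. A minimal sketch, with a name-length filter chosen purely for illustration:

```scala
import scala.collection.parallel.CollectionConverters._

val donutsPar = Seq("Plain", "Strawberry", "Glazed").par

// map and filter keep the original element order in their results
val withSuffix = donutsPar.map(d => s"$d Donut")
val longNames  = withSuffix.filter(_.length > 11)

println(s"withSuffix = $withSuffix")
println(s"longNames = $longNames")

// foreach with a side effect: the print order may change from run to run
donutsPar.foreach(d => println(s"Processing $d"))
```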


4. Explicitly create a Parallel Collection

As a side note, Parallel Collections can be instantiated in the same way as an Immutable, or Mutable, collection. The code below explicitly creates a ParVector of type Double to represent some Donut prices.

println("\nStep 4: Explicitly create a Parallel Collection")
import scala.collection.parallel.immutable.ParVector
val donutsParVector = ParVector[Double](1.50, 2.0, 2.50)
println(s"Elements of donutParVector = $donutsParVector")

You should see the following output when you run your Scala application in IntelliJ:


Step 4: Explicitly create a Parallel Collection
Elements of donutParVector = ParVector(1.5, 2.0, 2.5) 
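Other parallel collection types can be created explicitly through their companion objects in the same way. The sketch below uses the same prices to build a mutable ParArray next to the immutable ParVector, and compares both with a ParVector obtained by calling par on a regular Vector.

```scala
import scala.collection.parallel.CollectionConverters._
import scala.collection.parallel.immutable.ParVector
import scala.collection.parallel.mutable.ParArray

// Immutable parallel vector, created explicitly via its companion object
val pricesParVector = ParVector[Double](1.50, 2.0, 2.50)

// Mutable parallel array, created the same way
val pricesParArray = ParArray[Double](1.50, 2.0, 2.50)

// Equivalent ParVector obtained by converting a regular Vector with par
val pricesFromVector = Vector(1.50, 2.0, 2.50).par

println(s"pricesParArray = $pricesParArray")
println(s"pricesFromVector = $pricesFromVector")
```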


5. Find the sum of all Donut prices in parallel

Generally speaking, you would make use of a Parallel Collection when you need to run MapReduce-style computations over your data points. To illustrate this, we'll call the fold function with an initial value of 0.0, that is fold(0.0), and then pass in the (_ + _) function, which adds each Donut price to the accumulated value. The use of the fold function here should feel no different from what we described earlier in the fold tutorial, with one caveat: on a Parallel Collection, the initial value may be applied once for every chunk the collection is split into, so it must be a neutral element for the operation, such as 0.0 for addition.


Nonetheless, this operation runs in parallel because we are calling it on a ParVector. An important point to consider with parallel computations is that you do not control the order in which the chunks are processed. Hence, the operation you pass to fold must be associative, so that your expression produces the same result no matter how it is split and fanned out to multiple cores. Commutativity is not required, because the partial results are combined in the original element order. Strictly speaking, floating-point addition is only approximately associative, so for large collections of Double values a parallel sum can differ slightly from a sequential one.

println("\nStep 5: Find the sum of all Donut prices in parallel")
val donutsPricesSum = donutsParVector.fold(0.0)(_ + _)
println(s"The sum of all Donut prices = $donutsPricesSum")

You should see the following output when you run your Scala application in IntelliJ:


Step 5: Find the sum of all Donut prices in parallel 
The sum of all Donut prices = 6.0 
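The two requirements for fold on a Parallel Collection, a neutral initial value and an associative operation, can be explored directly. The sketch below contrasts safe calls (fold, reduce and aggregate with addition) with unsafe ones. The unsafe results depend on how many chunks the collection is split into, so on a collection this small you may well see the "expected" values every time; they are printed rather than relied upon.

```scala
import scala.collection.parallel.immutable.ParVector

val donutsParVector = ParVector[Double](1.50, 2.0, 2.50)

// Safe: 0.0 is the neutral element of addition, and + is associative
val sumWithFold      = donutsParVector.fold(0.0)(_ + _)
val sumWithReduce    = donutsParVector.reduce(_ + _)
// aggregate: the first function folds within a chunk, the second combines chunks
val sumWithAggregate = donutsParVector.aggregate(0.0)(_ + _, _ + _)

// Unsafe: 1.0 is not neutral for addition; it may be added once per chunk,
// so the result is at least 7.0 and can be larger depending on the split
val foldWithBadZero = donutsParVector.fold(1.0)(_ + _)

// Unsafe: subtraction is not associative, so the result may vary with the split
val foldWithSubtraction = donutsParVector.fold(0.0)(_ - _)

println(s"fold = $sumWithFold, reduce = $sumWithReduce, aggregate = $sumWithAggregate")
println(s"fold(1.0) = $foldWithBadZero, fold with subtraction = $foldWithSubtraction")
```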

This concludes our tutorial on how to use the par function for Scala 2.13.0 and above, and I hope you've found it useful!


Stay in touch via Facebook and Twitter for upcoming tutorials!


Don't forget to like and share this page :)

Summary

In this tutorial, we went over the following:

  • How to initialize a Sequence of donuts
  • Convert the Immutable donuts Sequence into a Parallel Collection
  • How to use a Scala Parallel Collection
  • Explicitly create a Parallel Collection
  • Find the sum of all Donut prices in parallel

Tip

  • As a reminder, the par function is applicable to both Immutable and Mutable collection data structures.
  • Review the tutorials on Mutable and Immutable collection data structures in Scala.
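To see the first tip in action, the sketch below calls par on an immutable Vector, an immutable Range and a mutable Array, and prints the concrete parallel type returned in each case. It assumes the scala-parallel-collections dependency and the CollectionConverters import described earlier in this tutorial.

```scala
import scala.collection.parallel.CollectionConverters._
import scala.collection.parallel.immutable.{ParRange, ParVector}
import scala.collection.parallel.mutable.ParArray

// Immutable collections map to immutable parallel counterparts
val parVector = Vector("Plain", "Strawberry", "Glazed").par
val parRange  = (1 to 10).par

// A mutable Array maps to a mutable ParArray
val parArray  = Array(1.5, 2.0, 2.5).par

println(parVector.getClass.getSimpleName)
println(parRange.getClass.getSimpleName)
println(parArray.getClass.getSimpleName)
```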

Source Code

The source code is available on the allaboutscala GitHub repository.


What's Next

In the next tutorial, I will show you how to use the partition function.

Nadim Bahadoor
Technology and Finance Consultant with over 14 years of hands-on experience building large scale systems in the Financial (Electronic Trading Platforms), Risk, Insurance and Life Science sectors. I am self-driven and passionate about Finance, Distributed Systems, Functional Programming, Big Data, Semantic Data (Graph) and Machine Learning.