Scala Tutorial - Data Structures
Overview
This chapter is primarily split into two sections, namely, Scala’s Immutable Collections, and Mutable Collections, respectively.
Prior to delving deeper into the various collection types that Scala provides out-of-the-box, we start by reviewing the most commonly used data structures in the field of computer science, as these relate to day-to-day programming. This review is quite primordial so as you can easily understand the characteristics of particular data structures from either, the Immutable, or the Mutable, collection. For instance, Scala does not limit itself to providing the basic data structures, such as, a List, a Set, or a Map. It further combines the features of the above-mentioned into additional types, such as, a ListSet, or a ListMap.
The section on the Immutable Collection obviously presents step-by-step examples in using Scala’s immutable data structures. Likewise, the part on the Mutable Collection naturally introduces Scala’s mutable data structures one step at a time.
More advanced functions such as aggregate, fold, reduce, map, flatMap etc on the Immutable List will be discussed in Chapter 8 on Collection Functions.
Steps
1. What is a data structure?
In the field of computing in general, a Data Structure facilitates the storing, organization, and retrieval of your data. It is important to note that Scala makes a clear distinction on immutable and mutable collection data types. Immutable collections reside in the scala.collection.immutable package, and mutable collections are organized under the scala.collection.mutable package, respectively. Without any surprises, Scala favors immutable collections, and adds them in scope without requiring any explicit import statements. Mutation is however not always as bad as we think - there are problem sets that are inherently mutable in nature. Generally speaking, however, mutations make it difficult to, say, fan-out, or run computations in parallel.
Before we proceed with showing how to use Scala’s Immutable and Mutable collections, it would be good to first run through a quick review of the common data structures. The Scala programming language provides a wide range of data structures out-of-the-box, which we will see shortly and, broadly speaking, these can be grouped into an array, a map, a list, a tree, a set and a queue.
2. An Array data structure
An Array is a data structure where each element is stored in a continuous sequence and is referenced by an index. Arrays are great for random access, and thanks to the index facility, it forms the basis of other data structures, such as, lists, queues, stacks and trees. Consider the simple array below which stores the letters a, b, c and d in their corresponding indices of 0, 1, 2, and 3.
Array elements: [a] [b] [c] [d]
Array index: 0 1 2 3
3. A List data structure
A List represents elements in sequence similar to an array, but each element points to the next, or the previous element, as opposed to being constrained by an index. Consequently, lists are great for dynamic sizing and they obviously do not fair well with random access of elements. In contrast, lists are ideally suited for Last In First Out (LIFO) access pattern. Therefore, a list can be used to implement a stack, where the last element added to the list will be the first one to be retrieved. The diagram below shows a Singly Linked List, where the numeral 12 points to 99 which in turns points to 37.
4. A Map data structure
A Map combines the semantics of arrays and lists, whereby a given key points to a corresponding element. Map data structures will typically make use of a hashing function in order to distribute and map keys to their values. With the key-to-value relationship, maps are significantly more adequate for fast lookups.
5. A Set data structure
A Set is a special data structure which enforces the uniqueness of data points. You would commonly use a set to store referential data, such as, a list of user logins, or user ids that should not be duplicated. The use of sets have obvious roots in mathematics252 to facilitate operations, such as, union, intersection, and set difference.
6. A Queue data structure
A Queue stores elements in sequence similar to an array. It, however, facilitates First In First Out (FIFO) behaviour, whereby the first element which was added to the queue will be the first one to be removed in an operation commonly referred to as dequeue. Conversely, additional elements are added to the end of the queue data structure using an enqueue operation. Queues are obviously useful whenever you require a FIFO semantic, such as, serving the first customer in line at a shop.
7. A Tree data structure
A Tree is a hierarchical data structure that store elements at a root node, and branches off into subsequent child nodes. The use of trees are especially convenient in the applications of indexing, searching, as well as in data science for predictive analysis.
Summary
In this tutorial, we went over the following:
- What is a data structure?
- An Array data structure
- A List data structure
- A Map data structure
- A Set data structure
- A Queue data structure
- A Tree data structure
Tip
- Don't forget to review Chapter 8 on Collection Functions where we will go over the rich suite of functions exposed on collections such as aggregate, collect, fold, reduce, map, flatMap, scan, partition etc
- Scala's documentation on immutable List.
Source Code
The source code is available on the allaboutscala GitHub repository.
What's Next
In the next tutorial, I will show you how to use Scala's Immutable ListSet.