Iteratees were a pretty hard concept for me to grasp. Thanks to a nice article, http://mandubian.com/2012/08/27/understanding-play2-iteratees-for-normal-humans/, I managed to understand what they are and how they work, but even then it wasn’t clear to me why one would need them - the features they offer seem achievable with simpler tools like Scala lazy Streams (http://scala-lang.org/api/current/#scala.collection.immutable.Stream) and RxScala observables (http://reactivex.io/documentation/observable.html):
- Backpressure (producing data only as fast as the consumer can process it) - lazy Streams do exactly this: an element of a Stream isn’t evaluated until someone attempts to retrieve it. RxScala observables seem to lack this feature at the moment.
- Ability to stop processing before the input ends - for some cases lazy Streams have ready-made methods like `collectFirst` etc. which return a result without iterating over the full data set. Observables can be unsubscribed from to stop processing. Iteratees, as far as I can see, always require rather complex custom code.
- Composition - both lazy Streams and observables allow composing processing steps in a monad-like way.
- Asynchronous, non-blocking processing - observables are non-blocking as well. Lazy Streams lack this, and it can only be partly emulated by wrapping them into a `Future`.
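To make the first two points concrete, here is a minimal sketch that uses only the Scala standard library. The names `LazyStreamDemo`, `evaluated`, `source` and `firstAbove` are made up for illustration, and the side effect inside `map` exists only to record which elements actually get computed. (`Stream` is deprecated in favour of `LazyList` since Scala 2.13, but it is still available.)

```scala
import scala.collection.mutable.ArrayBuffer

object LazyStreamDemo {
  // Records which source elements were actually computed.
  val evaluated = ArrayBuffer.empty[Int]

  // An infinite source: element n is computed (and logged) only on demand.
  def source: Stream[Int] =
    Stream.from(1).map { n => evaluated += n; n * 10 }

  // collectFirst stops pulling elements as soon as one matches.
  def firstAbove(limit: Int): Option[Int] =
    source.collectFirst { case x if x > limit => x }
}
```

Calling `LazyStreamDemo.firstAbove(25)` on this infinite source returns `Some(30)` after computing only the first three elements, so the consumer both sets the pace and stops early.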
Lazy Streams and observables seem much simpler to understand than iteratees, which raises a natural question: what features do iteratees provide that make them worth learning?
I’ve implemented 2 pretty simple tasks (printing all elements and calculating the sum of all elements of a Seq) with each technology in order to feel the difference. The full code is available here: https://github.com/paul-lysak/misc_learning/blob/master/iteratee/play-iteratee/test/IterateeSpec.scala .
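Given some sample data, let’s look at all the implementations and then compare them:

    val data = Seq[Int](10, 20, 30, 40, 50)

Iteratee implementation:

    import play.api.libs.iteratee.{Enumerator, Iteratee}
    import scala.concurrent.Await
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.DurationInt

    val itPrint = Iteratee.foreach[Int](a => println("element="+a))
    val itSum = Iteratee.fold[Int, Int](0)(_ + _)
    val en = Enumerator(data: _*)
    val fp1 = en.run(itPrint)
    // Wait until printing has finished before starting the sum
    Await.ready(fp1, DurationInt(10).seconds)
    val fs1 = en.run(itSum)
    fs1.foreach(s => println("sum="+s))
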
Lazy Stream implementation:

    val str = data.toStream
    str.foreach(a => println("elStr="+a))
    val s = str.fold(0)(_ + _)
    println("sumStr="+s)

RxScala Observable implementation:

    import rx.lang.scala.Observable

    val o = Observable.from(data)
    val o1 = Observable.from(data)
    o.subscribe(a => println("rxItem="+a))
    val so = o1.foldLeft(0)(_ + _)
    so.subscribe(a => println("rxSum="+a))

At its core, the trait `play.api.libs.iteratee.Iteratee` just defines a reaction to 3 possible events (next item, empty input, end of input), but it does so in a pretty complex way, so constructing one manually is tedious and error-prone. Luckily, the `Iteratee` companion object contains a couple of utility methods that hide most of the complexity and make Iteratee construction almost as simple as calling `map` on regular collections - see the examples in the code above. But still - what makes Iteratees special? Here is what I can say:
- Unlike with Streams and Observables, the iteration logic is fully decoupled from the data source. That means you can define an Iteratee before defining the data source. I would call this the killer feature of Iteratees - both Stream and Observable require that the corresponding stream or observable already exists before you can call foreach/map/fold/etc. on it.
- Ability to reduce thread consumption. Iteratee construction methods come in 2 flavours - blocking and non-blocking. For example, the `Iteratee` object has `def fold[E, A](state: A)(f: (A, E) => A)` and `def foldM[E, A](state: A)(f: (A, E) => Future[A])`. The first one (`fold`) is blocking - despite the fact that `en.run` returns a `Future` when called with such an `Iteratee` and doesn’t block the current thread, 1 thread from the `ExecutionContext` will be busy 100% of the time with the Iteratee until that `Future` completes - no matter what job the Iteratee is doing. The second method (`foldM`) is non-blocking - as a reaction to a new element it may start a slow I/O operation and return a `Future` that will be completed when the I/O ends. Thus a thread is used only for actual CPU work: sending the I/O operation or processing its result. That’s a clear advantage compared to a lazy Stream (which could be wrapped in a Future to partly emulate asynchronous behavior), but observables work this way too.
- Remainder handling - if there’s an error while processing some element, or the Iteratee decides to stop before reaching the end of the input, there are means to get the failed element and the remaining part of the input. So we may retry the operation or go on processing with another Iteratee. That may be a nice advantage in some special cases compared to both lazy Streams and observables.
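The decoupling and remainder points are easier to see in a toy model. The sketch below is not Play’s API: `Input`, `Step`, `Done`, `Cont`, `sumUntil` and `feed` are simplified, synchronous, hypothetical stand-ins that only mirror the three events mentioned above (next item, empty input, end of input). It covers only the early-stop case, not error handling.

```scala
import scala.annotation.tailrec

// Toy model of the iteratee idea - NOT Play's API; all names are hypothetical.
sealed trait Input[+E]
object Input {
  final case class El[E](e: E) extends Input[E] // next item
  case object Empty extends Input[Nothing]      // no data available right now
  case object EOF extends Input[Nothing]        // end of input
}

sealed trait Step[E, A]
final case class Done[E, A](result: A) extends Step[E, A]
final case class Cont[E, A](next: Input[E] => Step[E, A]) extends Step[E, A]

object ToyIteratee {
  // Consumer defined with no data source in sight:
  // sums items and stops as soon as the running total reaches `limit`.
  def sumUntil(limit: Int, acc: Int = 0): Step[Int, Int] =
    if (acc >= limit) Done[Int, Int](acc)
    else Cont[Int, Int] {
      case Input.El(e) => sumUntil(limit, acc + e)
      case Input.Empty => sumUntil(limit, acc)
      case Input.EOF   => Done[Int, Int](acc)
    }

  // Producer side: feeds items one by one and returns the result
  // together with whatever part of the input was left unconsumed.
  @tailrec
  def feed[E, A](items: List[E], step: Step[E, A]): (A, List[E]) = step match {
    case Done(result) => (result, items) // consumer stopped: hand back the remainder
    case Cont(next) => items match {
      case head :: tail => feed(tail, next(Input.El(head)))
      case Nil => next(Input.EOF) match {
        case Done(result) => (result, Nil)
        case Cont(_)      => sys.error("consumer did not finish on EOF")
      }
    }
  }
}
```

With `val consumer = ToyIteratee.sumUntil(60)` defined first and the data supplied later, `ToyIteratee.feed(List(10, 20, 30, 40, 50), consumer)` yields `(60, List(40, 50))`: the consumer stopped after three elements and the remainder is available for another consumer.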
Conclusion: Iteratees seem to have a richer feature set than similar tools, but they are harder to use. There are 2 cases in which I would definitely stick with Iteratees:
- When asynchronous handling (slow I/O for each element) and backpressure are required at the same time
- When iteration logic should be strongly decoupled from the data source
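
To illustrate what the first case asks for, here is a rough sketch of a `foldM`-style fold written with plain Scala Futures rather than Play’s Iteratees. `AsyncFoldSketch`, `slowLookup`, `foldAsync` and `demo` are hypothetical names, and `slowLookup` merely stands in for a slow, non-blocking I/O call.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AsyncFoldSketch {
  // Hypothetical stand-in for slow I/O; here it just doubles the value
  // on the execution context.
  def slowLookup(x: Int)(implicit ec: ExecutionContext): Future[Int] =
    Future(x * 2)

  // foldM-style fold: `f` returns a Future, and the next element is pulled
  // from the source only after the previous step has completed
  // (a simple form of backpressure).
  def foldAsync[E, A](source: Iterator[E], state: A)(f: (A, E) => Future[A])(
      implicit ec: ExecutionContext): Future[A] =
    if (!source.hasNext) Future.successful(state)
    else f(state, source.next()).flatMap(next => foldAsync(source, next)(f))

  // Demo: sum the doubled elements, blocking with Await only at the very end.
  def demo(): Int = {
    import ExecutionContext.Implicits.global
    val result = foldAsync(Iterator(10, 20, 30, 40, 50), 0) { (acc, e) =>
      slowLookup(e).map(acc + _)
    }
    Await.result(result, 10.seconds)
  }
}
```

No thread waits while a step is pending, and the source is never asked for an element before the previous one has been handled. Iteratees package this combination up together with the decoupling described above.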