Chong's

Decision support, machine learning and SAP technical architecture

Let’s talk about the Stream API

Streams are already an old story: they have been around since Java 8. We can now easily use the Stream API on (almost?) any Collection.

What is the Stream API?

The Stream API is an API, introduced in Java 8, that makes bulk data operations easier.

When should we use it?

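When you have to read a bunch of data from one collection (or several) and you only want to read it, rather than change it in the original collection.

For example: there are 1000 dogs in a list, and you have to copy all the pugs to a new list. That's a good time to use the Stream API. Some pseudocode (despite its name, allDogArray is assumed to be a List<Dog> here; for a plain array you would start from Arrays.stream(allDogArray) instead):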

List<Dog> pugList = allDogArray.stream()
           .filter(dog -> isPug(dog))
           .collect(Collectors.toList());

If you specifically need an ArrayList, use .collect(Collectors.toCollection(ArrayList::new)) instead.

It’s the same as:


List<Dog> pugList = new ArrayList<>();

for(Dog dog: allDogArray){
          if(isPug(dog)) pugList.add(dog);
}

Not a big deal, right? But if you want to do something more complicated, the Stream API makes it easy.

Now I want a list of the IDs of all the pugs, sorted by weight.


List<Dog> pugList = new ArrayList<>();

for(Dog dog: allDogArray){
          if(isPug(dog)) pugList.add(dog);
}//save all pugs to a list

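Collections.sort(pugList, new Comparator<Dog>() {
          @Override
          public int compare(Dog d1, Dog d2) {
                    // getWeight() is assumed to return a Double (or another Comparable)
                    return d1.getWeight().compareTo(d2.getWeight());
          }
}); // sort pugs by weight

// assuming getId() returns an Integer
List<Integer> pugIds = new ArrayList<>();
for (Dog d : pugList) {
          pugIds.add(d.getId());
} // save ids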

And with the Stream API:

// assuming getId() returns an Integer
List<Integer> pugIds = allDogArray.stream()
           .filter(dog -> isPug(dog))
           .sorted(Comparator.comparing(Dog::getWeight))
           .map(pug -> pug.getId())
           .collect(Collectors.toList());

Not bad!
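
To check that the loop version and the stream version really give the same result, here is a small self-contained sketch. The Dog class in it is a hypothetical minimal one (an Integer id, a breed name and a Double weight), and isPug, sampleDogs, withLoop and withStream are names I made up for the illustration; none of them come from a real library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class PugIds {

    // Hypothetical minimal Dog, just enough for this example.
    static final class Dog {
        private final Integer id;
        private final String breed;
        private final Double weight;

        Dog(int id, String breed, double weight) {
            this.id = id;
            this.breed = breed;
            this.weight = weight;
        }

        Integer getId() { return id; }
        String getBreed() { return breed; }
        Double getWeight() { return weight; }
    }

    // Assumption: a dog is a pug when its breed name is "pug".
    static boolean isPug(Dog dog) {
        return "pug".equals(dog.getBreed());
    }

    // Classic version: filter, sort, then extract the ids, in three steps.
    static List<Integer> withLoop(List<Dog> allDogs) {
        List<Dog> pugs = new ArrayList<>();
        for (Dog dog : allDogs) {
            if (isPug(dog)) pugs.add(dog);
        }
        pugs.sort(Comparator.comparing(Dog::getWeight));
        List<Integer> ids = new ArrayList<>();
        for (Dog d : pugs) {
            ids.add(d.getId());
        }
        return ids;
    }

    // Stream version: the same three steps as one pipeline.
    static List<Integer> withStream(List<Dog> allDogs) {
        return allDogs.stream()
                .filter(PugIds::isPug)
                .sorted(Comparator.comparing(Dog::getWeight))
                .map(Dog::getId)
                .collect(Collectors.toList());
    }

    static List<Dog> sampleDogs() {
        return Arrays.asList(
                new Dog(1, "pug", 10.0),
                new Dog(2, "husky", 18.0),
                new Dog(3, "pug", 9.0),
                new Dog(4, "boxer", 15.0),
                new Dog(5, "pug", 11.0));
    }

    public static void main(String[] args) {
        System.out.println(withLoop(sampleDogs()));   // [3, 1, 5]
        System.out.println(withStream(sampleDogs())); // [3, 1, 5]
    }
}
```

Neither version modifies the list returned by sampleDogs(); both only read from it.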

 

Stream

To make a stream, you should open your Netflix... no, I mean you need a collection, an array, or some other bunch of data. The official Java documentation lists several ways to obtain a stream:

  • From a Collection via the stream() and parallelStream() methods;
  • From an array via Arrays.stream(Object[]);
  • From static factory methods on the stream classes, such as Stream.of(Object[]), IntStream.range(int, int) or Stream.iterate(Object, UnaryOperator);
  • The lines of a file can be obtained from BufferedReader.lines();
  • Streams of file paths can be obtained from methods in Files;
  • Streams of random numbers can be obtained from Random.ints();
  • Numerous other stream-bearing methods in the JDK, including BitSet.stream(), Pattern.splitAsStream(java.lang.CharSequence), and JarFile.stream().
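
A few of the sources listed above, as a minimal sketch. The class name StreamSources and its method names are only placeholders for this example; the stream factory calls themselves are the standard JDK ones.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class StreamSources {

    // From a Collection via stream()
    static List<String> fromCollection() {
        List<String> breeds = Arrays.asList("pug", "boxer", "husky");
        return breeds.stream().collect(Collectors.toList());
    }

    // From an array via Arrays.stream(...)
    static int fromArray() {
        int[] weights = {10, 11, 9};
        return Arrays.stream(weights).sum();
    }

    // From a static factory: Stream.iterate is infinite, so it needs limit(...)
    static List<Integer> fromIterate() {
        return Stream.iterate(1, n -> n * 2)
                .limit(5)
                .collect(Collectors.toList());
    }

    // From a static factory: IntStream.range excludes the upper bound
    static int fromRange() {
        return IntStream.range(0, 5).sum(); // 0 + 1 + 2 + 3 + 4
    }

    // From another JDK method: Pattern.splitAsStream
    static List<String> fromPattern() {
        return Pattern.compile(",")
                .splitAsStream("a,b,c")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromCollection()); // [pug, boxer, husky]
        System.out.println(fromArray());      // 30
        System.out.println(fromIterate());    // [1, 2, 4, 8, 16]
        System.out.println(fromRange());      // 10
        System.out.println(fromPattern());    // [a, b, c]
    }
}
```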

Once you have the stream, you can add operations to it. There are two kinds of operations: intermediate and terminal.

I like to picture a stream as a fountain pen... no, I mean a fountain pump. The stream pushes the data out of the data structure, and we use intermediate operations to shape the flow without touching the original data structure. You can of course chain as many intermediate operations as you like. Once everything is set up, it’s time for the terminal operation to play: it is the terminal operation that actually starts the flow, and once it has run, the stream is consumed and cannot be reused. The fountain pump has done its job!
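
The pump picture can be checked with a small experiment. This is only a sketch: the class LazyPump, the seen list and the helper names are made up for the illustration. It shows that an intermediate operation alone does nothing, that the terminal operation starts the flow, that the source list is left untouched, and that a second terminal operation on the same stream throws an IllegalStateException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPump {

    // Records which elements the filter has actually looked at.
    static final List<Integer> seen = new ArrayList<>();

    // Only an intermediate operation is attached here: nothing flows yet.
    static Stream<Integer> buildPipeline(List<Integer> source) {
        return source.stream().filter(n -> {
            seen.add(n);
            return n % 2 == 0;
        });
    }

    // Tries to run a terminal operation on a stream that was already used.
    static boolean reuseFails(Stream<Integer> used) {
        try {
            used.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4);
        Stream<Integer> pipeline = buildPipeline(source);
        System.out.println("seen before terminal op: " + seen); // []

        List<Integer> evens = pipeline.collect(Collectors.toList());
        System.out.println("evens: " + evens);                  // [2, 4]
        System.out.println("seen after terminal op: " + seen);  // [1, 2, 3, 4]
        System.out.println("source: " + source);                // [1, 2, 3, 4]
        System.out.println("reuse fails: " + reuseFails(pipeline)); // true
    }
}
```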

Intermediate operation

Intermediate operation    Overview
filter                    keeps only the elements that match a given predicate
map                       transforms each element: one element in, one element out
flatMap                   transforms each element into a stream: one element in, zero or more out
peek                      performs a side effect on each element as it passes by
distinct                  lets each distinct element through only once
sorted                    sorts the elements, in natural order or by a given Comparator
skip                      discards the first n elements
limit                     lets at most n elements through
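
Here is a short sketch that chains several of the intermediate operations from the table. The class and method names (IntermediateOps and so on) are placeholders of mine; the comments show the state of the flow after each step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IntermediateOps {

    static List<Integer> distinctSortedSkipLimit() {
        return Stream.of(5, 3, 5, 1, 4, 3, 2)
                .distinct() // 5, 3, 1, 4, 2
                .sorted()   // 1, 2, 3, 4, 5
                .skip(1)    // 2, 3, 4, 5
                .limit(3)   // 2, 3, 4
                .collect(Collectors.toList());
    }

    // map: one element in, one element out
    static List<Integer> mapToLengths() {
        return Stream.of("pug", "boxer", "husky")
                .map(String::length)
                .collect(Collectors.toList());
    }

    // flatMap: one element in, zero or more elements out
    static List<String> flatMapToWords() {
        return Stream.of("a b", "c", "d e f")
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    // peek: a side effect on each element while it passes by
    static List<String> peekTrace() {
        List<String> trace = new ArrayList<>();
        Stream.of("a", "b")
                .peek(trace::add)          // records the element before map
                .map(String::toUpperCase)
                .forEach(trace::add);      // records the element after map
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(distinctSortedSkipLimit()); // [2, 3, 4]
        System.out.println(mapToLengths());            // [3, 5, 5]
        System.out.println(flatMapToWords());          // [a, b, c, d, e, f]
        System.out.println(peekTrace());               // [a, A, b, B]
    }
}
```

The peekTrace result shows that elements travel through the whole pipeline one at a time: "a" goes through peek and map before "b" is even looked at.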

Terminal operation

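Terminal operation                  Overview
collect / toArray                   gathers the elements of the flow into a collection or an array
forEach / forEachOrdered            performs a side effect on each element as it passes
count                               counts the elements
reduce                              higher-order function that folds all elements into one value, like fold in Haskell
min / max                           an application of reduce: finds the smallest / largest element
anyMatch / allMatch / noneMatch     like their names: test whether any / all / none of the elements match a predicate
findFirst / findAny                 like their names: return the first element / any element, as an Optional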
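
And the same kind of sketch for the terminal operations. The class TerminalOps, the WEIGHTS list and the method names are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class TerminalOps {

    // Hypothetical sample data: the weights of five pugs.
    static final List<Integer> WEIGHTS = Arrays.asList(10, 11, 9, 11, 10);

    // reduce: fold all elements into one value, starting from 0
    static int sumByReduce() {
        return WEIGHTS.stream().reduce(0, Integer::sum);
    }

    // max: a reduction that needs a Comparator on an object stream
    static int maxWeight() {
        return WEIGHTS.stream().max(Comparator.naturalOrder()).orElse(-1);
    }

    static long countAboveTen() {
        return WEIGHTS.stream().filter(w -> w > 10).count();
    }

    static boolean anyBelowTen() {
        return WEIGHTS.stream().anyMatch(w -> w < 10);
    }

    static boolean allBelowTwelve() {
        return WEIGHTS.stream().allMatch(w -> w < 12);
    }

    static boolean noneNegative() {
        return WEIGHTS.stream().noneMatch(w -> w < 0);
    }

    // findFirst returns an Optional, because the stream may be empty
    static Optional<Integer> firstAboveTen() {
        return WEIGHTS.stream().filter(w -> w > 10).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(sumByReduce());    // 51
        System.out.println(maxWeight());      // 11
        System.out.println(countAboveTen());  // 2
        System.out.println(anyBelowTen());    // true
        System.out.println(allBelowTwelve()); // true
        System.out.println(noneNegative());   // true
        System.out.println(firstAboveTen());  // Optional[11]
    }
}
```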

Examples

filter

IntStream.rangeClosed(1, 100)
           .filter(e -> (e & 1) == 0)      // keep the element if its lowest bit is 0, i.e. if it is even
           .forEach(System.out::println);  // prints every even number between 1 and 100

map

IntStream.rangeClosed(1, 100)
           .filter(e -> (e & 1) == 0)      // if the element is even, let it pass
           .map(e -> e << 1)               // then double it: value = value * 2
           .forEach(System.out::println);  // print the value

Performance

So what about the Stream API’s performance?

In my personal experience, a stream is slower than a plain for loop, but the two can be very close. If your data is made of primitive types, go with the for loop: it is much faster than the stream. If your data is made of objects, the stream’s performance is very close to the for loop’s.

And the Stream API is multi-core friendly: a pipeline can be run in parallel with parallelStream() or parallel().
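
As a minimal sketch of the multi-core side, here is the same sum computed three ways: with a for loop, with a sequential LongStream and with a parallel one. The class ParallelSum and its method names are mine. The sketch only checks that the three results agree; it measures nothing, so if you want numbers for your own data you should benchmark with your own workload.

```java
import java.util.stream.LongStream;

class ParallelSum {

    // Plain for loop over primitives
    static long loopSum(long n) {
        long sum = 0;
        for (long i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Sequential primitive stream
    static long sequentialSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    // Same pipeline, split across the available cores
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println(loopSum(n));       // 500000500000
        System.out.println(sequentialSum(n)); // 500000500000
        System.out.println(parallelSum(n));   // 500000500000
    }
}
```

Going parallel is a one-word change here because summing is associative and the pipeline has no side effects; a pipeline that mutates shared state would not be safe to parallelize this way.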

So, it's your turn to play.


package lambda;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class Dog {

	// dog types
	private enum Type {
		PUG, BOXER, LABRADOR, HUSKY
	}

	final String name;
	final double weight;
	final Type type;

	private Dog(String name, double weight, Type type) {
		this.name = name;
		this.weight = weight;
		this.type = type;
	}

	public static Dog newPug(String name, double weight) {
		return new Dog(name, weight, Type.PUG);
	}

	public static Dog newBoxer(String name, double weight) {
		return new Dog(name, weight, Type.BOXER);
	}

	public static Dog newLabrador(String name, double weight) {
		return new Dog(name, weight, Type.LABRADOR);
	}

	public static Dog newHusky(String name, double weight) {
		return new Dog(name, weight, Type.HUSKY);
	}

	@Override
	public String toString() {
		return "Dog [name=" + name + ", weight=" + weight + ", type=" + type + "]";
	}

	@Override
	public int hashCode() {
		final int prime = 31;
		int result = 1;
		result = prime * result + ((name == null) ? 0 : name.hashCode());
		result = prime * result + ((type == null) ? 0 : type.hashCode());
		long temp;
		temp = Double.doubleToLongBits(weight);
		result = prime * result + (int) (temp ^ (temp >>> 32));
		return result;
	}

	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		Dog other = (Dog) obj;
		if (name == null) {
			if (other.name != null)
				return false;
		} else if (!name.equals(other.name))
			return false;
		if (type != other.type)
			return false;
		if (Double.doubleToLongBits(weight) != Double.doubleToLongBits(other.weight))
			return false;
		return true;
	}


	public static void main(String[] args) {
		

		List<Dog> dogs = new ArrayList<>();

		dogs.add(Dog.newBoxer("boxer1", 15));
		dogs.add(Dog.newBoxer("boxer2", 16));
		dogs.add(Dog.newBoxer("boxer3", 13));
		dogs.add(Dog.newBoxer("boxer4", 14));
		dogs.add(Dog.newBoxer("boxer5", 15));

		dogs.add(Dog.newHusky("husky1", 18));
		dogs.add(Dog.newHusky("husky2", 19));
		dogs.add(Dog.newHusky("husky3", 17));
		dogs.add(Dog.newHusky("husky4", 18));
		dogs.add(Dog.newHusky("husky5", 20));

		dogs.add(Dog.newLabrador("labrador1", 17));
		dogs.add(Dog.newLabrador("labrador2", 18));
		dogs.add(Dog.newLabrador("labrador3", 19));
		dogs.add(Dog.newLabrador("labrador4", 20));
		dogs.add(Dog.newLabrador("labrador5", 17));

		dogs.add(Dog.newPug("pug1", 10));
		dogs.add(Dog.newPug("pug2", 11));
		dogs.add(Dog.newPug("pug3", 9));
		dogs.add(Dog.newPug("pug4", 11));
		dogs.add(Dog.newPug("pug5", 10));

		Collections.shuffle(dogs);

		System.out.println(dogs.size());
		
		

		/*
		 * Example 1
		 *
		 * map/mapToDouble/forEach
		 */

		System.out.println("======Example1======");
		// size
		System.out.printf("count %d%n", dogs.stream().count());

		// max weight
		System.out.printf("max %f%n", dogs.stream().mapToDouble(dog -> dog.weight).max().orElse(-1));
		// min weight
		System.out.printf("min %f%n", dogs.stream().mapToDouble(dog -> dog.weight).min().orElse(-1));
		// average weight
		System.out.printf("average %f%n", dogs.stream().mapToDouble(dog -> dog.weight).average().orElse(-1));
		// ^^^^^ dogs.stream() makes a stream of objects, and mapToDouble
		// converts each object to a double, giving a DoubleStream.

		// You can of course use map instead of mapToDouble; the advantage
		// of mapToDouble is that the resulting DoubleStream offers .max(),
		// .min(), .average(), etc. without needing a Comparator.
		// With map you have to do the reduction yourself; here is an example:
		System.out.printf("max by reduce %f%n", dogs.stream().map(dog -> {
			return dog.weight;
		}).reduce((x, y) -> {
			return x > y ? x : y;
		}).orElse(-1.0));

		dogs.stream().forEach(System.out::println);
		// same as
		// dogs.stream().forEach(dog -> System.out.println(dog));

		/*
		 * Example 2 filter/collect
		 */

		System.out.println("======Example2======");
		// all pugs in a List
		List<Dog> pugs = dogs.stream().filter(dog -> dog.type.equals(Dog.Type.PUG)).collect(Collectors.toList());

		// all labradors in an ArrayList
		List<Dog> labs = dogs.stream().filter(dog -> dog.type.equals(Dog.Type.LABRADOR))
				.collect(Collectors.toCollection(ArrayList::new));

		// all boxers in a Set
		Set<Dog> boxs = dogs.stream().filter(dog -> dog.type.equals(Dog.Type.BOXER)).collect(Collectors.toSet());

		// all huskies in a Map, keyed by name (names must be unique, or toMap throws)
		Map<String, Dog> husk = dogs.stream().filter(dog -> dog.type.equals(Dog.Type.HUSKY))
				.collect(Collectors.toMap(dog -> (dog.name), item -> item));

		System.out.println(pugs);
		System.out.println(labs);
		System.out.println(boxs);
		System.out.println(husk);

		/*
		 * Example 3 filter/collect/sorted
		 */

		System.out.println("======Example3======");
		// sort them by name

		// all pugs in a List
		pugs = dogs.stream().filter(dog -> dog.type.equals(Dog.Type.PUG))
				.sorted((x, y) -> x.name.compareToIgnoreCase(y.name)).collect(Collectors.toList());

		// all labradors in an ArrayList
		labs = dogs.stream().filter(dog -> dog.type.equals(Dog.Type.LABRADOR))
				.sorted((x, y) -> x.name.compareToIgnoreCase(y.name)).collect(Collectors.toCollection(ArrayList::new));

		System.out.println(pugs);
		System.out.println(labs);

		
		// sort them by weight

		// all pugs in a List
		pugs = dogs.stream().filter(dog -> dog.type.equals(Dog.Type.PUG))
				.sorted((x, y) -> Double.compare(x.weight, y.weight)).collect(Collectors.toList());

		// all labradors in an ArrayList
		labs = dogs.stream().filter(dog -> dog.type.equals(Dog.Type.LABRADOR))
				.sorted((x, y) -> Double.compare(x.weight, y.weight)).collect(Collectors.toCollection(ArrayList::new));

		System.out.println(pugs);
		System.out.println(labs);
		
		/*
		 * Example 4
		 * flatMap
		 */

		// now I'm going to merge all these collections into one list

		List<Collection<Dog>> collections = Arrays.asList(pugs,labs, boxs, husk.values());

		// these two pipelines are equivalent (their results are discarded here)
		collections.stream().flatMap(collection -> collection.stream()).collect(Collectors.toList());
		collections.stream().flatMap(Collection::stream).collect(Collectors.toList());

		// now they are back in a single list again
		System.out.println(collections.stream().flatMap(collection -> collection.stream()).collect(Collectors.toList()));
		System.out.printf("%d%n", collections.stream().flatMap(collection -> collection.stream()).collect(Collectors.toList()).size());
		// The difference between map and flatMap: map() turns each element into exactly one value,
		// while flatMap() turns each element into a stream of zero or more values and concatenates
		// those streams. In my example, flatMap(Collection::stream) turns each collection into a
		// stream of its dogs, and collect gathers the dogs of all those streams into one list.
		
		
	}

}
