Test3
This is the page of Test3.
The goal of this test is to see how stable the elapsed times are for a job that has only mappers, where the mappers also produce output that is written to HDFS. We check whether a pause between runs influences the elapsed times of the runs.
##Setup
Input data: 8 GB of data in 8 text files, 1 GB each. The block size is 256 MB. Each file contains rows that are 16 bytes long. That is 16777216 rows (records). TextInputFormat is used for the mappers.
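The sizes above can be cross-checked with a little arithmetic. This is a minimal sketch, assuming that "MB" and "GB" mean binary units (2**20 and 2**30 bytes) and that the 16 bytes per row include the line terminator; neither is stated on this page. Under these assumptions the quoted figure of 16777216 rows matches the number of rows in one 256 MB block.

```python
# Sizing arithmetic for the input described above.
# Assumption: binary units (1 MB = 2**20 bytes, 1 GB = 2**30 bytes).
MB = 2**20
GB = 2**30

num_files = 8
file_size = 1 * GB
block_size = 256 * MB
row_size = 16  # bytes per row, as stated in the setup

blocks_per_file = file_size // block_size
rows_per_block = block_size // row_size
rows_per_file = file_size // row_size
total_rows = num_files * rows_per_file

print(blocks_per_file)  # 4
print(rows_per_block)   # 16777216
print(rows_per_file)    # 67108864
print(total_rows)       # 536870912
```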
Mapper: The mapper sends each input record to the output. For each input record, one output record is created.
Reducers: There are no reducers.
Test and runs: 32 map tasks ran in each mrrun, which is fewer than the available task slots (19*4 = 76).
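The task count follows from the input layout. The sketch below assumes one map task per HDFS block, and it reads "19*4" as 19 worker nodes with 4 map slots each; that reading is an assumption, since the page only gives the product.

```python
import math

num_files = 8
blocks_per_file = 4   # 1 GB file / 256 MB block size
nodes = 19            # assumption: 19 worker nodes
slots_per_node = 4    # assumption: 4 map slots per node

# Assumption: one input split, and so one map task, per block.
map_tasks = num_files * blocks_per_file
available_slots = nodes * slots_per_node

# Number of scheduling "waves" needed to run every map task.
waves = math.ceil(map_tasks / available_slots)

print(map_tasks, available_slots, waves)  # 32 76 1
```

With a single wave, all map tasks can start at once, so the elapsed time of a run is not stretched by tasks waiting for a free slot.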
10 mrruns (run21...run30) were used in the mrtest (test3). The delay between the finish of the previous run and the start of the current run is 10 minutes.
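A run schedule like this could be driven by a small loop. This is only a sketch and not the tooling actually used for the test: `run_test` and `run_job` are hypothetical names, and `run_job` stands in for whatever submits one MapReduce job and blocks until it finishes.

```python
import time

def run_test(run_ids, run_job, delay_seconds, sleep=time.sleep):
    """Run each job in turn, pausing between the end of one run and the start of the next."""
    elapsed = {}
    for i, run_id in enumerate(run_ids):
        if i > 0:
            sleep(delay_seconds)  # pause after the previous run has finished
        start = time.monotonic()
        run_job(run_id)           # hypothetical: submits the job and waits for it
        elapsed[run_id] = time.monotonic() - start
    return elapsed

# The ten run names used in this test: run21 ... run30.
run_ids = [f"run{n}" for n in range(21, 31)]

# Example call with a 10 minute delay (run_job must be supplied by the caller):
# elapsed = run_test(run_ids, run_job, delay_seconds=600)
```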
##Observations
We can see from the picture below that the elapsed times are higher when we need to write data to HDFS.
It seems that the elapsed times have no peaks since we removed node06 from the cluster (see the pictures of Test1). But the variance in elapsed times is bigger in Test3 than in Test1. Still, the variance is roughly the same as in Test2, so the delay between the runs does not appear to affect the results.
Here we can see the mean elapsed times and standard deviation (also min and max) for the mrruns.
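Summary statistics of this kind can be computed as sketched below. The helper name `summarize` is hypothetical, the input values are placeholders rather than measurements from Test3, and the sample standard deviation is an assumption, since the page does not say which variant was used.

```python
import statistics

def summarize(elapsed_seconds):
    """Return mean, standard deviation, min and max of a list of elapsed times."""
    return {
        "mean": statistics.mean(elapsed_seconds),
        "stdev": statistics.stdev(elapsed_seconds),  # sample standard deviation
        "min": min(elapsed_seconds),
        "max": max(elapsed_seconds),
    }

# Placeholder values only, NOT measurements from Test3.
example = [100.0, 102.0, 98.0, 101.0, 99.0]
summary = summarize(example)
print(summary)
```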
Here we have the megabytes processed per second for each run.
And here is the number of input records processed per second for each run.
We can see that it is much less than in Test1.
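Both rates can be derived from the elapsed time of a run. The sketch below assumes they are computed as total input divided by elapsed time, with binary units and the 8 GB input of 16-byte rows from the setup; how the plotted values were actually computed is not stated here. The function name `throughput` is hypothetical.

```python
def throughput(elapsed_seconds, total_bytes=8 * 2**30, row_size=16):
    """Return (megabytes per second, input records per second) for one run."""
    mbytes_per_sec = total_bytes / 2**20 / elapsed_seconds
    records_per_sec = total_bytes / row_size / elapsed_seconds
    return mbytes_per_sec, records_per_sec

# Placeholder elapsed time of 128 seconds, NOT a measurement from Test3.
mb_s, rec_s = throughput(128.0)
print(mb_s, rec_s)  # 64.0 4194304.0
```

Because the input size is fixed across runs, both rates are inversely proportional to the elapsed time, so a longer run shows up as a proportionally lower rate.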