gzoli edited this page May 2, 2014 · 3 revisions

This page describes how we test MR jobs using R.

MRRun

Each instance of an MR job is called an MRRun. An MRRun has its startTime, finishTime, tasks, mappers, reducers, etc. All the data that we can get from the HistoryServer about a job is stored in the MRRun so that we can analyze it later. The data is retrieved from the HistoryServer using R functions (see this). It is then stored in a .RData file, so that it can be processed when the HistoryServer or the job is no longer available. An MRRun has a name (Run1, Run2, ...) and also stores the original jobId to keep the connection with the data stored in the HistoryServer.
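
As a rough illustration of the idea above, the sketch below builds one MRRun as a plain R list, saves it to a .RData file, and loads it back without touching the HistoryServer. The constructor `new_mr_run`, the list layout, the placeholder job ID, and the assumption that times are epoch milliseconds are all hypothetical; the actual retrieval functions and MRRun structure are not shown on this page.

```r
# Hypothetical constructor: the real MRRun layout is not shown on this page.
# Field names follow the ones mentioned in the text.
new_mr_run <- function(name, job_id, start_time, finish_time,
                       mappers = 0L, reducers = 0L, tasks = list()) {
  list(
    name = name,              # run label, e.g. "Run1"
    jobId = job_id,           # original job ID, links back to the HistoryServer
    startTime = start_time,   # assumed to be epoch milliseconds
    finishTime = finish_time, # assumed to be epoch milliseconds
    mappers = mappers,
    reducers = reducers,
    tasks = tasks
  )
}

# Placeholder values standing in for data fetched from the HistoryServer.
run1 <- new_mr_run("Run1", "job_placeholder_0001",
                   start_time = 1399017600000,
                   finish_time = 1399017660000,
                   mappers = 4L, reducers = 1L)

# Store the run so it can be analyzed after the HistoryServer data is gone.
rdata_file <- file.path(tempdir(), "Run1.RData")
save(run1, file = rdata_file)

# Later: load the run into a separate environment and work on it offline.
restored <- new.env()
load(rdata_file, envir = restored)
elapsed_ms <- restored$run1$finishTime - restored$run1$startTime
```

Loading into a separate environment avoids overwriting objects of the same name in the current workspace, which can matter when several stored runs are inspected in one session.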

MRTest

An MRTest is a collection of MRRuns. We can create statistics based on multiple MR job runs. One MRRun can be part of several MRTests. An MRTest has a name as well, like Test1, Test2, ... The actual tests can be found here.
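
The relationship between runs and tests could be sketched as follows: an MRTest is modelled as a named list of runs, the same run appears in two tests, and a simple statistic is computed across the runs of one test. The helpers `make_run` and `run_durations`, the list layout, and all values are hypothetical placeholders, not the project's actual code or data.

```r
# Hypothetical minimal run record; only the fields needed for this sketch.
make_run <- function(name, job_id, start_time, finish_time) {
  list(name = name, jobId = job_id,
       startTime = start_time, finishTime = finish_time)
}

# Placeholder runs (times in milliseconds, values invented for illustration).
runs <- list(
  make_run("Run1", "job_placeholder_0001", 0, 60000),
  make_run("Run2", "job_placeholder_0002", 100000, 190000),
  make_run("Run3", "job_placeholder_0003", 200000, 275000)
)
names(runs) <- vapply(runs, function(r) r$name, character(1))

# An MRTest is a named collection of runs; Run2 belongs to both tests.
test1 <- list(name = "Test1", runs = runs[c("Run1", "Run2")])
test2 <- list(name = "Test2", runs = runs[c("Run2", "Run3")])

# Duration of every run in a test, as a named numeric vector.
run_durations <- function(mr_test) {
  vapply(mr_test$runs, function(r) r$finishTime - r$startTime, numeric(1))
}

# Statistics across the runs of one test.
test1_durations <- run_durations(test1)
test1_mean <- mean(test1_durations)
test1_sd <- sd(test1_durations)
```

Keeping the runs in one named list and selecting from it by name is one simple way to let a run take part in several tests without duplicating how it is defined.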
