MRRuns and MRTests
This page describes how we test MR jobs using R.
Each instance of an MR job is called an MRRun. An MRRun has its startTime, finishTime, tasks, mappers, reducers, etc. All the data that we can get out of the HistoryServer about a job is kept in the MRRun so that we can analyze it later. The data is retrieved from the HistoryServer using R functions (see this). The data is then stored in a .RData file, so that it can still be processed when the HistoryServer or the job is no longer available. An MRRun has a name (Run1, Run2, ...) and also stores the original jobId, to keep the connection with the data stored in the HistoryServer.
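As a rough illustration of the idea above, the sketch below represents an MRRun as a plain R list, saves it to a .RData file, and loads it back for offline analysis. The constructor `make_mr_run`, the exact field layout, the placeholder jobId, and the timestamps are assumptions made for this example; they are not the project's actual functions or data, and retrieval from the HistoryServer is not shown.

```r
# Hypothetical MRRun record: a plain list holding a few of the fields
# mentioned above. Names and values are illustrative only.
make_mr_run <- function(name, job_id, start_time, finish_time) {
  list(
    name       = name,
    jobId      = job_id,
    startTime  = start_time,
    finishTime = finish_time
  )
}

run1 <- make_mr_run(
  name        = "Run1",
  job_id      = "job_0000000000000_0001",  # placeholder, not a real job
  start_time  = as.POSIXct("2015-01-01 10:00:00", tz = "UTC"),
  finish_time = as.POSIXct("2015-01-01 10:05:30", tz = "UTC")
)

# Persist the run so it can be analyzed without the HistoryServer.
rdata_path <- file.path(tempdir(), "Run1.RData")
save(run1, file = rdata_path)

# Later: load the stored run into a separate environment and use it.
restored <- new.env()
load(rdata_path, envir = restored)
elapsed_secs <- as.numeric(
  difftime(restored$run1$finishTime, restored$run1$startTime, units = "secs")
)
```

Loading into a separate environment avoids overwriting objects of the same name in the current session.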
An MRTest is a collection of MRRuns. It lets us create statistics based on multiple runs of an MR job. One MRRun can be part of several MRTests. An MRTest has a name as well, like Test1, Test2, ... The actual tests can be found here.
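The relationship between MRTests and MRRuns can be sketched as follows: each MRTest only references runs by name, so the same run can appear in several tests, and statistics are computed over the referenced runs. The helpers `make_mr_test` and `summarize_test`, the `elapsedSecs` field, and all numbers are made up for this example and are not taken from the actual tests.

```r
# Hypothetical runs, reduced to a name and an elapsed time in seconds.
# The values are invented for illustration.
runs <- list(
  Run1 = list(name = "Run1", elapsedSecs = 100),
  Run2 = list(name = "Run2", elapsedSecs = 120),
  Run3 = list(name = "Run3", elapsedSecs = 140)
)

# An MRTest is modelled here as a name plus the names of its runs.
make_mr_test <- function(name, run_names) {
  list(name = name, runNames = run_names)
}

# Run2 belongs to both tests.
test1 <- make_mr_test("Test1", c("Run1", "Run2"))
test2 <- make_mr_test("Test2", c("Run2", "Run3"))

# Compute simple statistics over the runs that a test refers to.
summarize_test <- function(test, runs) {
  elapsed <- vapply(runs[test$runNames], function(r) r$elapsedSecs, numeric(1))
  c(mean = mean(elapsed), sd = sd(elapsed))
}

stats1 <- summarize_test(test1, runs)
stats2 <- summarize_test(test2, runs)
```

Referencing runs by name rather than copying them keeps a single stored copy of each MRRun, which matches the idea that one run can be shared between tests.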