EPiC MapReduce main class.
Inherits Configured and Tool.
Public Member Functions
int | run (String[] args) throws Exception
    Run the job.
Static Public Member Functions
static void | main (String[] args) throws Exception
    Entry point of the class.
Static Package Functions
static DataInput | getInput (Path[] files, String name)
    Returns the data input interface for the file, among the given file paths, whose name ends with the specified pattern.
EPiC MapReduce main class.
Based on the parameters provided on the command line, the appropriate MapReduce class is executed. Currently, there are two implementations of the counting job that can be called via Count.
Usage of Count from the command line:
hadoop jar <JARFILE> mapred.Count [options] <input> <output>
Options passed to Hadoop must be prefixed with "-D". The following options are supported:
-Dparamfile=params.txt
-Drequest=request
Calls MapRedEpic to execute the query, applying a variant of the approach presented in the EPiC paper. Specifically, based on the provided request, which contains the encrypted coefficients of the queried indicator polynomial, the Mappers evaluate the indicator polynomial for each record by multiplying the monomials by the coefficients and adding the products together. In the last step, the Reducer adds the results from the Mappers (each now the value of the indicator polynomial evaluated over the corresponding subset) to obtain the final result, which is returned to the user.
The approach presented in the EPiC paper is implemented in MapRedEpicReducerEvaluate, in which the Mappers compute the monomials without multiplying them by the coefficients. In the final step, the Reducer adds the results from the Mappers together and then multiplies the sums by the given coefficients to yield the final result.
An older EPiC approach (see MapRedNotSendCoeff) keeps the coefficients on the user side. The Mappers and Reducers only compute the monomials and add them together, then return the sums to the user, who is responsible for multiplying them by the precomputed coefficients to obtain the counting result. This, however, requires much more communication to download the results and is therefore impractical.
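The two evaluation orders described above (multiplying by the coefficients per record versus multiplying once after summing the monomials) give the same count because the sum is linear in the coefficients. The following is a minimal plaintext sketch of that equivalence only. It assumes a univariate polynomial over a toy prime modulus and unencrypted coefficients; the class name, method names, modulus, and data are hypothetical and are not taken from MapRedEpic or MapRedEpicReducerEvaluate.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class IndicatorCountSketch {
    // Toy prime modulus (assumption; the real scheme works on encrypted values).
    static final BigInteger P = BigInteger.valueOf(101);

    // Indicator polynomial for the value 1 over the domain {0, 1, 2}:
    // I(x) = 2x - x^2, so I(0) = 0, I(1) = 1, I(2) = 0.
    static final BigInteger[] COEFF = {
        BigInteger.ZERO, BigInteger.valueOf(2), BigInteger.valueOf(-1).mod(P)
    };

    // Variant 1: evaluate the full polynomial per record, then add the values.
    static BigInteger mapperEvaluates(int[] records) {
        BigInteger total = BigInteger.ZERO;
        for (int r : records) {
            BigInteger x = BigInteger.valueOf(r);
            BigInteger value = BigInteger.ZERO;
            for (int j = 0; j < COEFF.length; j++) {
                BigInteger monomial = x.modPow(BigInteger.valueOf(j), P);
                value = value.add(COEFF[j].multiply(monomial));
            }
            total = total.add(value).mod(P);
        }
        return total;
    }

    // Variant 2: add the monomials over all records first,
    // then multiply each sum by its coefficient once at the end.
    static BigInteger reducerEvaluates(int[] records) {
        BigInteger[] monomialSums = new BigInteger[COEFF.length];
        Arrays.fill(monomialSums, BigInteger.ZERO);
        for (int r : records) {
            BigInteger x = BigInteger.valueOf(r);
            for (int j = 0; j < COEFF.length; j++) {
                monomialSums[j] = monomialSums[j].add(x.modPow(BigInteger.valueOf(j), P)).mod(P);
            }
        }
        BigInteger result = BigInteger.ZERO;
        for (int j = 0; j < COEFF.length; j++) {
            result = result.add(COEFF[j].multiply(monomialSums[j]));
        }
        return result.mod(P);
    }

    public static void main(String[] args) {
        int[] records = {1, 0, 2, 1, 1, 2}; // the value 1 occurs three times
        System.out.println(mapperEvaluates(records));  // prints 3
        System.out.println(reducerEvaluates(records)); // prints 3
    }
}
```

In this toy setting the second variant performs only one multiplication per coefficient, while the first performs one per coefficient per record; how that trade-off plays out with encrypted coefficients is not shown here.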
-Dmapred=epic
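To illustrate how "-D" options such as those listed above separate from the positional input and output arguments, here is a simplified, standard-library-only stand-in for the generic option handling that Hadoop performs for a Tool. It is a sketch under assumptions: it handles only the "-Dkey=value" form, and the class name, method name, and the "in" and "out" arguments are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DOptionSketch {
    // Collects -Dkey=value pairs into a map; every other argument is
    // appended to 'positional' in its original order.
    static Map<String, String> parseProperties(String[] args, List<String> positional) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                props.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                positional.add(arg);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical command line: three options followed by input and output.
        String[] example = {
            "-Dparamfile=params.txt", "-Drequest=request", "-Dmapred=epic", "in", "out"
        };
        List<String> rest = new ArrayList<>();
        Map<String, String> props = parseProperties(example, rest);
        System.out.println("options:    " + props);
        System.out.println("positional: " + rest);
    }
}
```

In an actual Hadoop job the options would normally be read from the job's Configuration rather than parsed by hand; the sketch only shows the split between options and positional arguments.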
static DataInput mapred.Count.getInput (Path[] files, String name) [inline, static, package]
Returns the data input interface for the file, among the given file paths, whose name ends with the specified pattern.
This method should be used inside the package only.
files | set of files. |
name | pattern that the name of the wanted file ends with. |
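The lookup described above can be pictured with the following sketch. It is not the actual body of getInput: it assumes the match is a plain suffix test on the path string, that the first match wins, and that null is returned when nothing matches. It also uses java.nio.file.Path in place of Hadoop's Path so that it runs without Hadoop on the classpath; the class and method names are hypothetical.

```java
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class GetInputSketch {
    // Returns the first path whose string form ends with 'name',
    // or null if no path matches (assumed behaviour).
    static Path selectBySuffix(Path[] files, String name) {
        for (Path file : files) {
            if (file.toString().endsWith(name)) {
                return file;
            }
        }
        return null;
    }

    // Opens the selected file as a DataInput; returns null if nothing matched.
    static DataInput openInput(Path[] files, String name) throws IOException {
        Path match = selectBySuffix(files, name);
        return match == null ? null : new DataInputStream(Files.newInputStream(match));
    }
}
```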
static void mapred.Count.main (String[] args) throws Exception [inline, static]
Entry point of the class.
args | command-line arguments provided to the class. |
Exception | if errors occur. |
int mapred.Count.run (String[] args) throws Exception [inline]
Run the job.
args | argument list for the running job. |
Exception | if errors occur. |