@InterfaceAudience.Public @InterfaceStability.Stable public class DBInputFormat<T extends DBWritable> extends InputFormat<LongWritable,T> implements Configurable
DBInputFormat emits LongWritables containing the record number as key and DBWritables as value. The SQL query and the input class can be specified using one of the two setInput methods.
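Since the values emitted by DBInputFormat must implement DBWritable (and, to move between tasks, Writable), a typical job defines a small record class. The sketch below is illustrative, not part of the Hadoop API: the class name MyRecord and the columns f1 (a long) and f2 (a string) are assumptions.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Hypothetical value class for DBInputFormat; names are illustrative.
public class MyRecord implements Writable, DBWritable {
    private long f1;
    private String f2;

    public MyRecord() {}                      // required no-arg constructor

    public MyRecord(long f1, String f2) {
        this.f1 = f1;
        this.f2 = f2;
    }

    public long getF1() { return f1; }
    public String getF2() { return f2; }

    // DBWritable: populate fields from the current row of the ResultSet.
    @Override
    public void readFields(ResultSet rs) throws SQLException {
        f1 = rs.getLong(1);
        f2 = rs.getString(2);
    }

    // DBWritable: bind fields to a PreparedStatement (used on the output side).
    @Override
    public void write(PreparedStatement ps) throws SQLException {
        ps.setLong(1, f1);
        ps.setString(2, f2);
    }

    // Writable: Hadoop's wire serialization between tasks.
    @Override
    public void readFields(DataInput in) throws IOException {
        f1 = in.readLong();
        f2 = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(f1);
        out.writeUTF(f2);
    }
}
```

The ResultSet/PreparedStatement pair is what DBInputFormat (and DBOutputFormat) call against the database; the DataInput/DataOutput pair is ordinary Hadoop serialization.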
| Modifier and Type | Field and Description |
|---|---|
| protected String | conditions |
| protected Connection | connection |
| protected DBConfiguration | dbConf |
| protected String | dbProductName |
| protected String[] | fieldNames |
| protected String | tableName |
| Constructor and Description |
|---|
| DBInputFormat() |
| Modifier and Type | Method and Description |
|---|---|
| protected void | closeConnection() |
| Connection | createConnection() |
| protected RecordReader<LongWritable,T> | createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split, Configuration conf) |
| RecordReader<LongWritable,T> | createRecordReader(InputSplit split, TaskAttemptContext context): Create a record reader for a given split. |
| Configuration | getConf(): Return the configuration used by this object. |
| Connection | getConnection() |
| protected String | getCountQuery(): Returns the query for getting the total number of rows; subclasses can override this for custom behaviour. |
| DBConfiguration | getDBConf() |
| String | getDBProductName() |
| List<InputSplit> | getSplits(JobContext job): Logically split the set of input files for the job. |
| void | setConf(Configuration conf): Set the configuration to be used by this object. |
| static void | setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery): Initializes the map-part of the job with the appropriate input settings. |
| static void | setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames): Initializes the map-part of the job with the appropriate input settings. |
protected String dbProductName
protected String conditions
protected Connection connection
protected String tableName
protected String[] fieldNames
protected DBConfiguration dbConf
public void setConf(Configuration conf)
Set the configuration to be used by this object.
Specified by: setConf in interface Configurable
Parameters: conf - configuration to be used

public Configuration getConf()
Return the configuration used by this object.
Specified by: getConf in interface Configurable

public DBConfiguration getDBConf()
public Connection getConnection()
public Connection createConnection()
public String getDBProductName()
protected RecordReader<LongWritable,T> createDBRecordReader(org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit split, Configuration conf) throws IOException
Throws: IOException

public RecordReader<LongWritable,T> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException
Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.
Specified by: createRecordReader in class InputFormat<LongWritable,T extends DBWritable>
Parameters: split - the split to be read; context - the information about the task
Throws: IOException, InterruptedException

public List<InputSplit> getSplits(JobContext job) throws IOException
Logically split the set of input files for the job. Each InputSplit is then assigned to an individual Mapper for processing.
Note: The split is a logical split of the inputs; the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.
Specified by: getSplits in class InputFormat<LongWritable,T extends DBWritable>
Parameters: job - job configuration
Returns: the InputSplits for the job
Throws: IOException

protected String getCountQuery()
Returns the query for getting the total number of rows; subclasses can override this for custom behaviour.
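The count query drives split computation, since the format must know the total row count before dividing it among mappers. When no explicit count query is configured, the query is derived from the table name and optional conditions. The sketch below shows that query shape only; buildCountQuery is a hypothetical helper for illustration, not a Hadoop method.

```java
// Sketch of the default count-query shape used for computing splits.
// Assumption: with no configured count query, the format counts rows in
// the input table, restricted by the conditions clause when one is set.
public class CountQuerySketch {
    public static String buildCountQuery(String tableName, String conditions) {
        StringBuilder query = new StringBuilder("SELECT COUNT(*) FROM ");
        query.append(tableName);
        if (conditions != null && !conditions.isEmpty()) {
            query.append(" WHERE ").append(conditions);
        }
        return query.toString();
    }
}
```

A subclass overriding getCountQuery() can substitute anything cheaper, e.g. an approximate count from database statistics, as long as it returns a single numeric row.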
public static void setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames)
Initializes the map-part of the job with the appropriate input settings.
Parameters:
job - The map-reduce job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields
tableName - The table to read data from
conditions - The condition used to select data, e.g. '(updated > 20070101 AND length > 0)'
orderBy - the fieldNames in the orderBy clause
fieldNames - The field names in the table
See Also: setInput(Job, Class, String, String)

public static void setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery)
Initializes the map-part of the job with the appropriate input settings.
Parameters:
job - The map-reduce job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields
inputQuery - the input query to select fields. Example: "SELECT f1, f2, f3 FROM Mytable ORDER BY f1"
inputCountQuery - the input query that returns the number of records in the table. Example: "SELECT COUNT(f1) FROM Mytable"
See Also: setInput(Job, Class, String, String, String, String...)

protected void closeConnection()
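A driver wires the two setInput overloads together with DBConfiguration.configureDB, which must run before setInput so the JDBC settings are in place. The sketch below is a minimal, hypothetical driver: the JDBC URL, credentials, driver class, table, and the MyRecord value class (assumed to implement DBWritable and Writable) are all illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;

// Hypothetical driver sketch; connection details and names are illustrative.
public class DBInputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // JDBC driver class and connection settings must be configured first.
        DBConfiguration.configureDB(conf,
            "com.mysql.cj.jdbc.Driver",          // assumed driver on the classpath
            "jdbc:mysql://localhost/mydb", "user", "password");

        Job job = Job.getInstance(conf, "db-input-example");
        job.setInputFormatClass(DBInputFormat.class);

        // Table-based form: table, conditions, orderBy, field names.
        DBInputFormat.setInput(job, MyRecord.class,
            "Mytable", "length > 0", "f1", "f1", "f2");

        // Or the query-based form: a data query plus a count query for splits.
        // DBInputFormat.setInput(job, MyRecord.class,
        //     "SELECT f1, f2 FROM Mytable ORDER BY f1",
        //     "SELECT COUNT(f1) FROM Mytable");
    }
}
```

The two overloads are mutually exclusive per job; the query-based form gives full control over the SQL, while the table-based form lets the format build the select and count queries itself.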
Copyright © 2023 Apache Software Foundation. All rights reserved.