te String filename;
private String topic;
定义实际读取数据的类
@Override
public Class<? extends Task> taskClass() {
return FileStreamSourceTask.class;
}
在FileStreamSourceTask
下面定义该类。接下来,我们添加一些标准的生命周期方法,start()
和stop()
@Override
public void start(Map<String, String> props) {
// The complete version includes error handling as well.
filename = props.get(FILE_CONFIG);
topic = props.get(TOPIC_CONFIG);
}
@Override
public void stop() {
// Nothing to do since no background monitoring is required.
}
最后,实施的真正核心在于taskConfigs()
@Override
public List<Map<String, String>> taskConfigs(int maxTasks) {
ArrayList<Map<String, String>> configs = new ArrayList<>();
// Only one input stream makes sense.
Map<String, String> config = new HashMap<>();
if (filename != null)
config.put(FILE_CONFIG, filename);
config.put(TOPIC_CONFIG, topic);
configs.add(config);
return configs;
}
任务示例:
源任务
实现SourceTask
创建FileStreamSourceTask继承SourceTask
public class FileStreamSourceTask extends SourceTask {
String filename;
InputStream stream;
String topic;
@Override
public void start(Map<String, String> props) {
filename = props.get(FileStreamSourceConnector.FILE_CONFIG);
stream = openOrThrowError(filename);
topic = props.get(FileStreamSourceConnector.TOPIC_CONFIG);
}
@Override
public synchronized void stop() {
stream.close();
}
接下来,我们实现任务的主要功能,即poll()
从输入系统获取事件并返回以下内容的方法List
:
@Override
public List<SourceRecord> poll() throws InterruptedException {
try {
ArrayList<SourceRecord> records = new ArrayList<>();
while (streamValid(stream) && records.isEmpty()) {
LineAndOffset line = readToNextLine(stream);
if (line != null) {
Map<String, Object> sourcePartition = Collections.singletonMap("filename", filename);
Map<String, Object> sourceOffset = Collections.singletonMap("position", streamOffset);
records.add(new SourceRecord(sourcePartition, sourceOffset, topic, Schema.STRING_SCHEMA, line));
} else {
Thread.sleep(1);
}
}
return records;
} catch (IOException e) {
// Underlying stream was killed, probably as a result of calling stop. Allow to return
// null, and driving thread will handle any shutdown if necessary.
}
return null;
}
接收任务
不像SourceConnector
和SinkConnector
,SourceTask
并SinkTask
有非常不同的接口,因为SourceTask
采用的是拉接口,并SinkTask
使用推接口。两者共享公共生命周期方法,但SinkTask
完全不同:
public abstract class SinkTask implements Task {
public void initialize(SinkTaskContext context) {
this.context = context;
}
public abstract void put(Collection<SinkRecord> records);
public void flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
}
这是一个简单的例子,它们有简单的结构化数据 - 每一行只是一个字符串。几乎所有实用的连接器都需要具有更复杂数据格式的模式。要创建更复杂的数据,您需要使用Kafka Connect data
API。
Schema schema = SchemaBuilder.struct().name(NAME)
.field("name", Schema.STRING_SCHEMA)
.field("age", Schema.INT_SCHEMA)
.field("admin", new SchemaBuilder.boolean().defaultValue(false).build())
.build();
Struct struct = new Struct(schema)
.put("name", "Barbara Liskov")
.put("age", 75);
更多Kafka相关技术文章:
什么是Kafka?
Kafka监控工具汇总
Kafka快速入门
Kafka核心之Consumer
Kafka核心之Producer
更多实时计算,Flink,Kafka等相关技术博文,欢迎关注实时流式计算