Implementing an HTTP Storage Plugin in Drill (Part 2)

y of string
    HttpScanSpec spec = new HttpScanSpec(tableName); // will be passed to getPhysicalScan
    return new DynamicDrillTable(plugin, schemaName, null, spec);
}

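HttpScanSpec holds the parameters of a query. Here it stores the table name, which is the query sent to the HTTP service, for example /e/api:search?q=avi&p=2. It is wrapped in the JSONOptions that Drill passes to AbstractStoragePlugin.getPhysicalScan: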

public AbstractGroupScan getPhysicalScan(String userName, JSONOptions selection) throws IOException {
    // Deserialize the HttpScanSpec that was stored in the DynamicDrillTable
    HttpScanSpec spec = selection.getListWith(new ObjectMapper(), new TypeReference<HttpScanSpec>() {});
    return new HttpGroupScan(userName, httpConfig, spec);
}

We will see what HttpGroupScan is used for further down.
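To make the role of the table name more concrete, here is a minimal sketch of how a table name such as /e/api:search?q=avi&p=2 could be split into a path and its parameters before the HTTP request is made. TableNameParts is a hypothetical helper that I made up for illustration; it is not part of Drill, and the plugin described here may handle the table name differently.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a Drill class.
final class TableNameParts {
    final String path;
    final Map<String, String> params;

    private TableNameParts(String path, Map<String, String> params) {
        this.path = path;
        this.params = params;
    }

    // Splits a table name like /e/api:search?q=avi&p=2 into path and parameters.
    // Parameter values are kept raw (still percent-encoded if they were encoded).
    static TableNameParts parse(String tableName) {
        URI uri = URI.create(tableName);
        Map<String, String> params = new LinkedHashMap<>();
        String query = uri.getRawQuery();
        if (query != null && !query.isEmpty()) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    params.put(pair, ""); // parameter without a value
                } else {
                    params.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return new TableNameParts(uri.getPath(), params);
    }

    public static void main(String[] args) {
        TableNameParts parts = parse("/e/api:search?q=avi&p=2");
        System.out.println(parts.path + " " + parts.params);
    }
}
```

A LinkedHashMap keeps the parameters in the order they appear in the table name, which makes the resulting request easier to compare with the original query.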

AbstractRecordReader

AbstractRecordReader does the actual work of reading data and returning it to Drill. A BatchCreator is what creates the AbstractRecordReader.

public class HttpScanBatchCreator implements BatchCreator<HttpSubScan> {

  @Override
  public CloseableRecordBatch getBatch(FragmentContext context,
      HttpSubScan config, List<RecordBatch> children) throws ExecutionSetupException {
    List<RecordReader> readers = Lists.newArrayList();
    readers.add(new HttpRecordReader(context, config));
    return new ScanBatch(config, context, readers.iterator());
  }
}

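Since AbstractRecordReader does the actual reading, it has to know the query to send to the HTTP service. That query starts out in HttpScanSpec and is then handed to HttpGroupScan, and as we will see in a moment, HttpGroupScan in turn passes it on to HttpSubScan.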

Drill also scans for BatchCreator implementations automatically, so there is no need to worry about how HttpScanBatchCreator gets picked up.

The implementation of HttpSubScan is fairly simple. Its main job is to hold the HttpScanSpec:

public class HttpSubScan extends AbstractBase implements SubScan // must implement SubScan

Back to HttpGroupScan. The method it must implement:

public SubScan getSpecificScan(int minorFragmentId) { // pass to HttpScanBatchCreator
    return new HttpSubScan(config, scanSpec); // eventually handed to HttpScanBatchCreator.getBatch
}
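The hand-off described so far can be traced end to end with a small sketch. The nested classes below are simplified stand-ins that I wrote for illustration; they only carry the query along and are not the real Drill or plugin classes, which have many more methods and constructor arguments.

```java
// Simplified stand-ins for illustration only; not the real Drill classes.
final class HandOffSketch {

    static final class ScanSpec {
        private final String tableName;
        ScanSpec(String tableName) { this.tableName = tableName; }
        String getTableName() { return tableName; }
    }

    static final class SubScan {
        private final ScanSpec scanSpec;
        SubScan(ScanSpec scanSpec) { this.scanSpec = scanSpec; }
        ScanSpec getScanSpec() { return scanSpec; }
    }

    static final class GroupScan {
        private final ScanSpec scanSpec;
        GroupScan(ScanSpec scanSpec) { this.scanSpec = scanSpec; }

        // Mirrors getSpecificScan: the sub scan carries the same spec.
        SubScan getSpecificScan(int minorFragmentId) {
            return new SubScan(scanSpec);
        }
    }

    static final class RecordReader {
        private final String query;
        RecordReader(SubScan subScan) {
            this.query = subScan.getScanSpec().getTableName();
        }
        String getQuery() { return query; }
    }

    // Follows the query through each stage and returns what the reader sees.
    static String trace(String tableName) {
        ScanSpec spec = new ScanSpec(tableName);          // built in getTable
        GroupScan groupScan = new GroupScan(spec);        // built in getPhysicalScan
        SubScan subScan = groupScan.getSpecificScan(0);   // built in getSpecificScan
        RecordReader reader = new RecordReader(subScan);  // built in getBatch
        return reader.getQuery();
    }

    public static void main(String[] args) {
        System.out.println(trace("/e/api:search?q=avi&p=2"));
    }
}
```

The point of the sketch is only that the table name arrives at the reader unchanged, after passing through three intermediate objects.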

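The query finally arrives at HttpRecordReader. The methods this class has to implement include setup and next, which makes it somewhat like an iterator: setup fetches the data, and next converts it into the form Drill expects. VectorContainerWriter and JsonReader can be used for the conversion. This is where Drill's well-known vector data format, that is, columnar storage, comes in.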
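The setup/next contract can be sketched without any Drill dependencies. BatchingReader below is a hypothetical stand-in: setup prepares the rows (a fixed list stands in for the HTTP response), and next fills one batch and returns how many rows it holds, with 0 meaning the reader is exhausted. As far as I understand, Drill's own next follows the same "number of records in this batch" convention, but the real method writes into value vectors rather than a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ReaderSketch {

    // Hypothetical simplified reader; not Drill's AbstractRecordReader.
    static final class BatchingReader {
        private final List<String> source;   // stands in for rows from the HTTP service
        private final int batchSize;
        private final List<String> batch = new ArrayList<>();
        private Iterator<String> rows;

        BatchingReader(List<String> source, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.source = source;
            this.batchSize = batchSize;
        }

        // Fetch the data once, before the first call to next().
        void setup() {
            rows = source.iterator();
        }

        // Fill the next batch and return its size; 0 means no more data.
        int next() {
            batch.clear();
            while (rows.hasNext() && batch.size() < batchSize) {
                batch.add(rows.next());
            }
            return batch.size();
        }

        List<String> currentBatch() {
            return batch;
        }
    }

    // Drives the reader the way a scan operator would: setup once, then next until 0.
    static int readAll(BatchingReader reader) {
        reader.setup();
        int total = 0;
        int count;
        while ((count = reader.next()) > 0) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        BatchingReader reader =
            new BatchingReader(Arrays.asList("r1", "r2", "r3", "r4", "r5"), 2);
        System.out.println(readAll(reader));
    }
}
```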

Summary

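The above covers creating the plugin itself and passing the query along during a query. The columns in a statement such as select title, name are passed to HttpGroupScan.clone, but I do not deal with them here. With these pieces in place, you can query data from an HTTP service through Drill.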

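As for the where filter in select * from xx where xx, Drill itself filters the data after it has been fetched. If you want to build a native filter the way the mongo plugin builds a MongoDB filter, you need to implement a StoragePluginOptimizerRule.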

My motivation for this HTTP storage plugin was that the query sent to the HTTP service might need to be built dynamically, for example:

select name, length from http.`/e/api:search` where $p=2 and $q='avi' # p=2&q=avi is built dynamically; the values can come from other query results
select name, length from http.`/e/api:search?q=avi&p=2` where length > 0  # this one is static

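The first query relies on a StoragePluginOptimizerRule, which collects all the filters in the where clause and turns them into the query for the HTTP service. The implementation here is still incomplete, though.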
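The last step of that idea, turning the collected filters into a query string, can be sketched on its own. FilterQuerySketch is a hypothetical helper and does not use Drill's rule API. It assumes the filters have already been collected as name/value pairs, and that names starting with $ (as in $p and $q above) are service parameters while everything else is left for Drill to filter.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of Drill.
// Uses URLEncoder.encode(String, Charset), which requires Java 10 or later.
final class FilterQuerySketch {

    // Turns equality filters such as $p=2 and $q='avi' into "p=2&q=avi".
    static String toQueryString(Map<String, String> filters) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            String name = filter.getKey();
            if (!name.startsWith("$")) {
                continue; // assumed to be an ordinary column; Drill filters it later
            }
            joiner.add(URLEncoder.encode(name.substring(1), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(filter.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // Appends the query string to the table name when there is one.
    static String buildRequest(String path, Map<String, String> filters) {
        String query = toQueryString(filters);
        return query.isEmpty() ? path : path + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> filters = new LinkedHashMap<>();
        filters.put("$p", "2");
        filters.put("$q", "avi");
        filters.put("length", "0");
        System.out.println(buildRequest("/e/api:search", filters));
    }
}
```

With the filters from the first query, the sketch produces the same request that the second query spells out statically, apart from the parameter order.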

Overall, because the Drill project is still relatively new, extending it is fairly difficult, especially the plan optimization part.
