ier name of column qualifier
   * @param compareOp operator
   * @param comparator Comparator to use.
   */
  public SingleColumnValueFilter(final byte [] family, final byte [] qualifier,
      final CompareOp compareOp, final ByteArrayComparable comparator) {
    this.columnFamily = family;
    this.columnQualifier = qualifier;
    this.compareOp = compareOp;
    this.comparator = comparator;
  }

  @Override
  public ReturnCode filterKeyValue(Cell c) {
    if (this.matchedColumn) {
      // We already found and matched the single column, all keys now pass
      return ReturnCode.INCLUDE;
    } else if (this.latestVersionOnly && this.foundColumn) {
      // We found but did not match the single column, skip to next row
      return ReturnCode.NEXT_ROW;
    }
    if (!CellUtil.matchingColumn(c, this.columnFamily, this.columnQualifier)) {
      return ReturnCode.INCLUDE;
    }
    foundColumn = true;
    if (filterColumnValue(c.getValueArray(), c.getValueOffset(), c.getValueLength())) {
      return this.latestVersionOnly? ReturnCode.NEXT_ROW: ReturnCode.INCLUDE;
    }
    this.matchedColumn = true;
    return ReturnCode.INCLUDE;
  }

  private boolean filterColumnValue(final byte [] data, final int offset,
      final int length) {
    int compareResult = this.comparator.compareTo(data, offset, length);
    switch (this.compareOp) {
    case LESS:
      return compareResult <= 0;
    case LESS_OR_EQUAL:
      return compareResult < 0;
    case EQUAL:
      return compareResult != 0;
    case NOT_EQUAL:
      return compareResult == 0;
    case GREATER_OR_EQUAL:
      return compareResult > 0;
    case GREATER:
      return compareResult >= 0;
    default:
      throw new RuntimeException("Unknown Compare op " + compareOp.name());
    }
  }

  public boolean filterRow() {
    // If column was found, return false if it was matched, true if it was not
    // If column not found, return true if we filter if missing, false if not
    return this.foundColumn? !this.matchedColumn: this.filterIfMissing;
  }
}
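In HBase, filterKeyValue is called for every column of every row. SCVFilter's implementation of this method works as follows:
1. If the target column has already been found and its value satisfied the condition, return INCLUDE immediately, meaning this column of the row is added to the result set.
2. Otherwise, if latestVersionOnly is true (latestVersionOnly controls whether only the latest version of the data is checked, and it is usually true) and the target column has already been found but its value did not satisfy the condition, return NEXT_ROW, meaning the row is discarded.
3. If the current column is not the target column, return INCLUDE. Otherwise set foundColumn to true, recording that the target column has been found.
4. If the current column's value does not satisfy the condition, return NEXT_ROW when latestVersionOnly is true, which skips the remaining columns of the current row and jumps straight to the next row. When latestVersionOnly is false, return INCLUDE.
5. If the current column's value satisfies the condition, set matchedColumn to true, recording that the target column has been found and that its value matches. The next time a column of this row enters the method, it returns at step 1, which makes matching more efficient.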
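Now look at filterRow. It is called after filterKeyValue, and only once per row. In SCVFilter its logic is simple:
1. If the target column was found: return false when its value satisfied the condition, meaning the row is added to the result set; return true when it did not, meaning the row is filtered out.
2. If the target column was not found, return the value of filterIfMissing.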
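The per-row bookkeeping described above can be sketched as a small toy model. This is only an illustration under stated assumptions: it is not the HBase class, it assumes latestVersionOnly is true and an EQUAL comparison on string values, and the names ScvfRowModel, offer, and isRowDropped are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of SCVFilter's per-row state. NOT the HBase API; all names here
// are hypothetical and the comparison is simplified to String equality.
class ScvfRowModel {
    private final String column;          // the column being watched
    private final String expected;        // value the column must equal
    private final boolean filterIfMissing;
    private boolean foundColumn;
    private boolean matchedColumn;

    ScvfRowModel(String column, String expected, boolean filterIfMissing) {
        this.column = column;
        this.expected = expected;
        this.filterIfMissing = filterIfMissing;
    }

    // Plays the role of filterKeyValue for one cell (latest version only).
    void offer(String cellColumn, String cellValue) {
        if (foundColumn || !column.equals(cellColumn)) {
            return; // already decided for this row, or not the watched column
        }
        foundColumn = true;
        matchedColumn = expected.equals(cellValue);
    }

    // Plays the role of filterRow: true means the row is dropped.
    boolean filterRow() {
        return foundColumn ? !matchedColumn : filterIfMissing;
    }

    // Feeds every cell of one row through the model and returns the verdict.
    static boolean isRowDropped(Map<String, String> row, String column,
                                String expected, boolean filterIfMissing) {
        ScvfRowModel model = new ScvfRowModel(column, expected, filterIfMissing);
        row.forEach(model::offer);
        return model.filterRow();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "a");
        row.put("isDeleted", "false");
        System.out.println(isRowDropped(row, "isDeleted", "false", false)); // false: kept
        row.put("isDeleted", "true");
        System.out.println(isRowDropped(row, "isDeleted", "false", false)); // true: dropped
        row.remove("isDeleted");
        System.out.println(isRowDropped(row, "isDeleted", "false", true));  // true: column missing
    }
}
```

The last call shows the filterIfMissing branch: when the watched column never appears in the row, the verdict is whatever filterIfMissing says.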
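Conjecture:
Could it be that, because PageFilter was added ahead of SCVFilter, PageFilter's filterRow is called first when the first row is evaluated and increments PageFilter's counter, but the row is then filtered out when SCVFilter's filterRow runs? When the next row is checked, PageFilter's counter has already reached the pageSize we set, so every remaining row is filtered out and the result contains no data.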
Verification:
In the FilterList, add SCVFilter first and PageFilter second:
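Scan scan = initScan(xxx);
// The no-argument FilterList uses MUST_PASS_ALL: a row must pass every filter
FilterList filterList = new FilterList();
scan.setFilter(filterList);
// Value filter first, so rows it rejects are dropped before PageFilter counts them
filterList.addFilter(new SingleColumnValueFilter(FAMILY, ISDELETED, CompareFilter.CompareOp.EQUAL, Bytes.toBytes(false)));
filterList.addFilter(new PageFilter(1));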
The result is the value of row 2, as we expected.
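The effect of filter order can be reproduced with a simplified model. This is a sketch under assumptions, not HBase code: it assumes the list asks each filter's filterRow in insertion order and stops at the first filter that drops the row, that the page counter increments every time it is consulted, and that row 1 is the deleted row. RowFilter, PageModel, NotDeletedModel, and countReturnedRows are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for an ordered "must pass all" filter list.
// NOT the HBase API; every type and method here is hypothetical.
class FilterOrderDemo {
    interface RowFilter {
        boolean filterRow(boolean deleted);               // true = drop this row
        default boolean filterAllRemaining() { return false; }
    }

    // Stand-in for PageFilter: counts every row it is asked about.
    static class PageModel implements RowFilter {
        private final int pageSize;
        private int rowsAccepted;
        PageModel(int pageSize) { this.pageSize = pageSize; }
        public boolean filterRow(boolean deleted) {
            rowsAccepted++;
            return rowsAccepted > pageSize;
        }
        public boolean filterAllRemaining() { return rowsAccepted >= pageSize; }
    }

    // Stand-in for the SCVFilter on isDeleted == false.
    static class NotDeletedModel implements RowFilter {
        public boolean filterRow(boolean deleted) { return deleted; }
    }

    // Runs the rows through the filters in list order and counts survivors.
    static int countReturnedRows(List<RowFilter> filters, boolean[] rowsDeleted) {
        int returned = 0;
        for (boolean deleted : rowsDeleted) {
            boolean done = false;
            for (RowFilter f : filters) {
                if (f.filterAllRemaining()) { done = true; }
            }
            if (done) { break; }                           // page is full, stop scanning
            boolean dropped = false;
            for (RowFilter f : filters) {
                if (f.filterRow(deleted)) { dropped = true; break; } // later filters are not consulted
            }
            if (!dropped) { returned++; }
        }
        return returned;
    }

    public static void main(String[] args) {
        boolean[] rows = {true, false, false};             // row 1 deleted, rows 2 and 3 not
        List<RowFilter> pageFirst = Arrays.asList(new PageModel(1), new NotDeletedModel());
        List<RowFilter> pageLast = Arrays.asList(new NotDeletedModel(), new PageModel(1));
        System.out.println("PageFilter first: " + countReturnedRows(pageFirst, rows)); // 0
        System.out.println("PageFilter last:  " + countReturnedRows(pageLast, rows));  // 1
    }
}
```

With the page counter first, row 1 consumes the only page slot and is then rejected by the value check, so nothing is returned. With the page counter last, row 1 is rejected before it is counted and row 2 fills the page.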
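Conclusion
When combining PageFilter with other filters, it is best to add PageFilter at the end of the FilterList; otherwise the number of results may be smaller than you expect. (In fact, PageFilter can normally return more results than the configured value, because the server cluster's P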