Obtain the MongoDB Hadoop Connector. You can either build it or download the jars. For Hive, you'll need the "core" jar and the "hive" jar.
Get a JAR for the MongoDB Java Driver. The connector requires at least version 3.0.0 of the driver "uber" jar (called "mongo-java-driver.jar").
In your Hive script, use ADD JAR commands to include these JARs (core, hive, and the Java driver), e.g., ADD JAR /path-to/mongo-hadoop-hive-<version>.jar;.
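If you load the jars in several scripts, you can generate the ADD JAR lines instead of typing them. Below is a minimal Java sketch. The class and method names (`AddJarStatements`, `build`) are hypothetical. The directory and jar names in `main` come from the Hive session later in this post, so adjust them to your own versions.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds one Hive "ADD JAR" statement per jar so that
// none of the required jars (core, hive, Java driver) is forgotten.
class AddJarStatements {
    static String build(String libDir, List<String> jarNames) {
        // Drop a trailing slash so the generated paths never contain "//".
        String dir = libDir.endsWith("/") ? libDir.substring(0, libDir.length() - 1) : libDir;
        return jarNames.stream()
                .map(jar -> "ADD JAR " + dir + "/" + jar + ";")
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(build("/home/hadoop/opt/hive/lib", List.of(
                "mongo-hadoop-core-1.5.1.jar",
                "mongo-hadoop-hive-1.5.1.jar",
                "mongo-java-driver-3.2.1.jar")));
    }
}
```

Paste the printed lines into the Hive CLI or at the top of a Hive script.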
Requirements
Supported Hadoop and Hive versions
As of August 2013, only Hive versions <= 0.10 are stable. Mongo-Hadoop currently supports Hive versions >= 0.9. Some classes and functions are deprecated in Hive 0.11, but they’re still functional.
Hadoop versions greater than 0.20.x are supported. CDH4 is supported, but CDH3 with its native Hive 0.7 is not. However, CDH3 is compatible with newer versions of Hive, so CDH3 can be used with Mongo-Hadoop if you install a newer, non-native Hive version on it.
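These minimums must be compared numerically. As plain strings, "0.10" sorts before "0.9", so checking them by hand is easy to get wrong. Here is a small sketch of such a check. `VersionCheck` and its methods are hypothetical, and ignoring suffixes such as `-cdh4` or `-bin` is an assumption of this example. The 3.0.0 driver minimum comes from the requirement stated above.

```java
// Hypothetical helper: numeric comparison of dotted versions like "0.9" or "3.2.1".
class VersionCheck {
    // Negative, zero or positive, like Comparator.compare; missing parts count as 0.
    static int compare(String a, String b) {
        int[] x = parse(a), y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    private static int[] parse(String v) {
        // Keep only the leading "digits and dots" part, e.g. "1.2.1-bin" -> "1.2.1".
        String core = v.split("[^0-9.]", 2)[0];
        String[] parts = core.split("\\.");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = parts[i].isEmpty() ? 0 : Integer.parseInt(parts[i]);
        }
        return out;
    }

    static boolean hiveSupported(String hiveVersion) {
        return compare(hiveVersion, "0.9") >= 0;      // Mongo-Hadoop supports Hive >= 0.9
    }

    static boolean driverSupported(String driverVersion) {
        return compare(driverVersion, "3.0.0") >= 0;  // connector needs driver >= 3.0.0
    }
}
```

For example, `VersionCheck.hiveSupported("1.2.1")` is true, which covers the Hive 1.2.1 install shown in the log below.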
1. The versions must match what the connector requires. You can download the jars from http://mvnrepository.com/. For Hive you only need these three:
mongo-hadoop-core-1.5.1.jar
mongo-hadoop-hive-1.5.1.jar
mongo-java-driver-3.2.1.jar
2. Copy the jars into ${HADOOP_HOME}/lib and ${HIVE_HOME}/lib, then start Hive and add the jars:
[hadoop@DEV21 ~]$ hive
Logging initialized using configuration in jar:file:/home/hadoop/opt/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> add jar /home/hadoop/opt/hive/lib/mongo-hadoop-core-1.5.1.jar;
(Add all three jars this way. Only the first one is shown here.)
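You can also script step 2. The sketch below shows one way to do it with `java.nio.file`. `CopyJars` is a hypothetical helper. It assumes the three jars were downloaded into a single directory, passed as the first argument, and that `HADOOP_HOME` and `HIVE_HOME` are set in the environment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper for step 2: copy every jar into every target lib directory.
class CopyJars {
    static void copyAll(Path sourceDir, List<String> jars, List<Path> targetDirs) throws IOException {
        for (Path target : targetDirs) {
            Files.createDirectories(target);   // does nothing if the directory exists
            for (String jar : jars) {
                // Overwrite an older copy of the same jar instead of failing.
                Files.copy(sourceDir.resolve(jar), target.resolve(jar),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    // Resolves $VAR/lib, failing loudly if the variable is not set.
    private static Path libOf(String envVar) {
        String home = System.getenv(envVar);
        if (home == null) {
            throw new IllegalStateException(envVar + " is not set");
        }
        return Paths.get(home, "lib");
    }

    // args[0]: directory containing the downloaded jars
    public static void main(String[] args) throws IOException {
        copyAll(Paths.get(args[0]),
                List.of("mongo-hadoop-core-1.5.1.jar",
                        "mongo-hadoop-hive-1.5.1.jar",
                        "mongo-java-driver-3.2.1.jar"),
                List.of(libOf("HADOOP_HOME"), libOf("HIVE_HOME")));
    }
}
```

On a cluster, the jars must be present on every node that runs Hive or Hadoop tasks. This sketch only copies them on the local machine.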
SELECT id, fid
FROM temp.ldc_test_mongo LATERAL VIEW explode(fav_id) favids AS fid;
-- Access fields of a struct column
SELECT id, info.github FROM temp.ldc_test_mongo;
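The following plain-Java sketch shows concretely what the first query's `LATERAL VIEW explode(fav_id)` does: each input row is repeated once per element of its array. The `Row` and `Exploded` records are hypothetical stand-ins. The post does not show the table DDL, so treating `id` and the `fav_id` elements as strings is an assumption. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of: SELECT id, fid FROM t LATERAL VIEW explode(fav_id) favids AS fid
class ExplodeSketch {
    record Row(String id, List<String> favId) {}      // hypothetical input row
    record Exploded(String id, String fid) {}         // one output row per array element

    static List<Exploded> lateralViewExplode(List<Row> rows) {
        List<Exploded> out = new ArrayList<>();
        for (Row row : rows) {
            // A NULL or empty array produces no output rows (plain LATERAL VIEW).
            if (row.favId() == null) continue;
            for (String fid : row.favId()) {
                out.add(new Exploded(row.id(), fid));
            }
        }
        return out;
    }
}
```

With rows `("1", [a, b])` and `("2", [])`, the result is `("1", a)` and `("1", b)`. Row 2 disappears. Hive's `LATERAL VIEW OUTER` keeps such rows, with NULL for `fid`.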
// Deserialize according to the data type. Complex types loop over their elements,
// so every path eventually ends in an operation on a primitive type.
public Object deserializeField(final Object value, final TypeInfo valueTypeInfo, final String ext) {
    if (value != null) {
        switch (valueTypeInfo.getCategory()) {
            case LIST:
                return deserializeList(value, (ListTypeInfo) valueTypeInfo, ext);
            case MAP:
                return deserializeMap(value, (MapTypeInfo) valueTypeInfo, ext);
            case PRIMITIVE:
                return deserializePrimitive(value, (PrimitiveTypeInfo) valueTypeInfo);
            case STRUCT:
                // Supports both struct and map, but should use struct
                return deserializeStruct(value, (StructTypeInfo) valueTypeInfo, ext);
            case UNION:
                // Mongo also has no union
                LOG.warn("BSONSerDe does not support unions.");
                return null;
            default:
                // Must be an unknown (a Mongo specific type)
                return deserializeMongoType(value);
        }
    }
    return null;
}
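A Hive-free sketch isolates the recursion described in the comment above. `DispatchSketch`, `Kind` and `TypeDesc` are hypothetical and are not Hive's `TypeInfo` API. The sketch models only LIST, MAP and two primitive kinds, and it assumes map keys are strings, as BSON field names are.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of deserializeField: complex kinds recurse into their
// elements until only primitive conversions remain.
class DispatchSketch {
    enum Kind { LIST, MAP, INT, STRING }

    // Hypothetical type descriptor; `element` is the LIST element type or MAP value type.
    static final class TypeDesc {
        final Kind kind;
        final TypeDesc element;
        TypeDesc(Kind kind, TypeDesc element) { this.kind = kind; this.element = element; }
        static TypeDesc of(Kind kind) { return new TypeDesc(kind, null); }
    }

    static Object deserialize(Object value, TypeDesc type) {
        if (value == null) return null;
        switch (type.kind) {
            case LIST: {
                List<Object> out = new ArrayList<>();
                for (Object e : (List<?>) value) out.add(deserialize(e, type.element));
                return out;
            }
            case MAP: {
                Map<String, Object> out = new LinkedHashMap<>();
                for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                    out.put(e.getKey().toString(), deserialize(e.getValue(), type.element));
                }
                return out;
            }
            case INT:
                return ((Number) value).intValue();   // e.g. a BSON double 3.0 becomes 3
            case STRING:
                return value.toString();
            default:
                throw new IllegalArgumentException("unsupported kind: " + type.kind);
        }
    }
}
```

For example, a list of mixed numbers `[1.0, 2L, 3]` deserialized as LIST of INT yields `[1, 2, 3]`, mirroring how the real SerDe narrows numbers through `Number`.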
// Convert the value to the matching Java primitive (boxed) type for storage.
private Object deserializePrimitive(final Object value, final PrimitiveTypeInfo valueTypeInfo) {
    switch (valueTypeInfo.getPrimitiveCategory()) {
        case BINARY:
            return value;
        case BOOLEAN:
            return value;
        case DOUBLE:
            return ((Number) value).doubleValue();
        case FLOAT:
            return ((Number) value).floatValue();
        case INT:
            return ((Number) value).intValue();
        case LONG:
            return ((Number) value).longValue();
        case SHORT:
            return ((Number) value).shortValue();
        case STRING:
            return value.toString();
        case TIMESTAMP:
            if (value instanceof Date) {
                return new Timestamp(((Date) value).getTime());
            } else if (value instanceof BSONTimestamp) {
                // BSONTimestamp.getTime() is in seconds; Timestamp expects milliseconds
                return new Timestamp(((BSONTimestamp) value).getTime() * 1000L);
            } else if (value instanceof String) {
                return Timestamp.valueOf((String) value);
            } else {
                return value;
            }
        default:
            return deserializeMongoType(value);
    }
}
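The TIMESTAMP branch is the least obvious one. A `java.util.Date` already carries milliseconds, whereas `BSONTimestamp.getTime()` returns seconds since the epoch as an `int`, hence the `* 1000L`. The sketch below reproduces just those conversions without the BSON classes. `TimestampSketch` is a hypothetical name, and it takes the seconds as a plain `int`.

```java
import java.sql.Timestamp;
import java.util.Date;

// Standalone model of the TIMESTAMP conversions in deserializePrimitive.
class TimestampSketch {
    static Timestamp fromDate(Date d) {
        return new Timestamp(d.getTime());      // Date already holds milliseconds
    }

    static Timestamp fromEpochSeconds(int seconds) {
        return new Timestamp(seconds * 1000L);  // widen to long before multiplying
    }

    static Timestamp fromString(String s) {
        return Timestamp.valueOf(s);            // format: yyyy-[m]m-[d]d hh:mm:ss[.f...]
    }
}
```

Multiplying by `1000L` rather than `1000` widens the product to `long` first. An `int` product would overflow for any time more than about 24.8 days after the epoch.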