Analyzing Tens of Millions of Records with MongoDB (Part 2)
2014-11-24 00:43:58
Tags: MongoDB, tens of millions, data, analysis
SELFCTLB","SECDUPBD","SECDLWBD"
3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
3070803,1963,1096,,"US",
"IL",,1,,2,6,63,,9,,0.3704,,,,,,,
3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,
3070805,1963,1096,,"US","CA",,1,,2,6,63,,1,,0,,,,,,,
3070806,1963,1096,,"US","PA",,1,,2,6,63,,0,,,,,,,,,
3070807,1963,1096,,"US","OH",,1,,623,3,39,,3,,0.4444,,,,,,,
3070808,1963,1096,,"US","IA",,1,,623,3,39,,4,,0.375,,,,,,,
3070809,1963,1096,,,,1,,4,6,65,,0,,,,,,,,,
mongoimport -d TYK -c guest --type csv --file d:\text.csv --headerline
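The file has 11 lines in total: the first line is the header, and the rest hold 9 records. Record 3 is deliberately broken in the middle, and the two middle values "US","AZ" have been removed from record 9. Since CSV treats each line as one record, the import should now produce 10 records.
Result: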
> db.guest.find({}, {"PATENT" : 1, "_id" : 1})
{ "_id" : ObjectId("52692c2a0b082a1bbb727d86"), "PATENT" : 3070801 }
{ "_id" : ObjectId("52692c2a0b082a1bbb727d87"), "PATENT" : 3070802 }
{ "_id" : ObjectId("52692c2a0b082a1bbb727d88"), "PATENT" : 3070803 }
{ "_id" : ObjectId("52692c2a0b082a1bbb727d89"), "PATENT" : "IL" }
{ "_id" : ObjectId("52692c2a0b082a1bbb727d8a"), "PATENT" : 3070804 }
{ "_id" : ObjectId("52692c2a0b082a1bbb727d8b"), "PATENT" : 3070805 }
{ "_id" : ObjectId("52692c2a0b082a1bbb727d8c"), "PATENT" : 3070806 }
{ "_id" : ObjectId("52692c2a0b082a1bbb727d8d"), "PATENT" : 3070807 }
{ "_id" : ObjectId("52692c2a0b082a1bbb727d8e"), "PATENT" : 3070808 }
{ "_id" : ObjectId("52692c2a0b082a1bbb727d8f"), "PATENT" : 3070809 }
> db.guest.count()
10
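Exactly 10 documents. So mongoimport does not filter out malformed rows: the second half of the broken record was imported as a document of its own, with "IL" as its PATENT value.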
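Since the import itself does not reject such rows, one option is to check the file before importing. The following is only a minimal sketch, not part of the original experiment: it compares each line's field count with the header's and reports the lines that differ. The function names, the shortened header, and the sample rows are hypothetical, and the parser is simplified (it does not handle quoted values that span several lines).

```javascript
// Split one CSV line into fields, honouring double-quoted values.
// Simplified: a quoted value must start and end on the same line.
function splitCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted value
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Return the 1-based numbers of the lines whose field count
// differs from that of the header (line 1).
function findMalformedLines(csvText) {
  const lines = csvText.split(/\r?\n/);
  const expected = splitCsvLine(lines[0]).length;
  const malformed = [];
  for (let i = 1; i < lines.length; i++) {
    if (lines[i] === "") continue; // ignore blank lines such as a trailing newline
    if (splitCsvLine(lines[i]).length !== expected) malformed.push(i + 1);
  }
  return malformed;
}

// Hypothetical, shortened sample that mimics the two defects tested above:
// one record broken across two lines and one record with values missing.
const sample = [
  "PATENT,YEAR,COUNTRY,STATE,COUNT",
  '3070801,1963,"BE","",1',
  '3070803,1963,"US",',
  '"IL",1',
  '3070804,1963,"US","OH",1',
  "3070809,1963,1",
].join("\n");

console.log(findMalformedLines(sample)); // [ 3, 4, 6 ]
```

Lines flagged this way can be repaired or dropped before running mongoimport, so that fragments like the "IL" document never reach the collection.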
2. Try again with the file saved as UTF-8 with BOM. The test data is the same as above.
> db.guest.find({}, {"PATENT" : 1, "_id" : 1})
{ "_id" : ObjectId("52692d730b082a1bbb727d90"), "PATENT" : 3070801 }
{ "_id" : ObjectId("52692d730b082a1bbb727d91"), "PATENT" : 3070802 }
{ "_id" : ObjectId("52692d730b082a1bbb727d92"), "PATENT" : 3070803 }
{ "_id" : ObjectId("52692d730b082a1bbb727d93"), "PATENT" : "IL" }
{ "_id" : ObjectId("52692d730b082a1bbb727d94"), "PATENT" : 3070804 }
{ "_id" : ObjectId("52692d730b082a1bbb727d95"), "PATENT" : 3070805 }
{ "_id" : ObjectId("52692d730b082a1bbb727d96"), "PATENT" : 3070806 }
{ "_id" : ObjectId("52692d730b082a1bbb727d97"), "PATENT" : 3070807 }
{ "_id" : ObjectId("52692d730b082a1bbb727d98"), "PATENT" : 3070808 }
{ "_id" : ObjectId("52692d730b082a1bbb727d99"), "PATENT" : 3070809 }
> db.guest.count()
10
The result is the same as above: the key "PATENT" did not pick up a stray leading character from the BOM.
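For context, here is why the BOM was worth testing. The sketch below is an illustration only and says nothing about how mongoimport handles the BOM internally: in a tool that does not strip it, the BOM (U+FEFF) stays glued to the first header name, so the key no longer equals "PATENT". The helper name stripBom and the two-column header are hypothetical.

```javascript
// Remove a leading byte order mark (U+FEFF) from already-decoded text.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Hypothetical header line as it looks after decoding a UTF-8 file with BOM.
const headerWithBom = "\uFEFFPATENT,YEAR";

// Without stripping, the first key carries an invisible extra character.
const rawKey = headerWithBom.split(",")[0];
console.log(rawKey === "PATENT"); // false
console.log(rawKey.length);       // 7

// After stripping, the key matches what queries expect.
const cleanKey = stripBom(headerWithBom).split(",")[0];
console.log(cleanKey === "PATENT"); // true
```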
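3. The mongoimport command explained
mongoimport -d TYK -c guest --type csv --file d:\text.csv --headerline

-d  database
-c  collection
--type  data format
--file  file path
--headerline  uses the first line of the file as the field names (keys); alternatively, -f specifies the keys explicitly, e.g. "-f Name,age"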
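II. Statistical analysis
1. Counting by gender
Because the data is not normalized, first check how many different ways gender is represented.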
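Conceptually, the distinct command used next collects every different value stored under one key. The sketch below imitates that on an in-memory array so it can run without a server; the function name distinctValues and the sample documents are hypothetical, and it is not how the server implements the command.

```javascript
// Collect the different values stored under `key`, in first-seen order.
// Documents that lack the key are skipped.
function distinctValues(docs, key) {
  const seen = new Set();
  for (const doc of docs) {
    if (key in doc) seen.add(doc[key]);
  }
  return Array.from(seen);
}

// Hypothetical documents with inconsistent gender encodings.
const sampleDocs = [
  { Name: "a", Gender: "M" },
  { Name: "b", Gender: "F" },
  { Name: "c", Gender: "M" },
  { Name: "d", Gender: "0" },
  { Name: "e", Gender: "" },
  { Name: "f" },
  { Name: "g", Gender: "F" },
];

console.log(distinctValues(sampleDocs, "Gender")); // [ 'M', 'F', '0', '' ]
```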
db.runCommand({"distinct" : "guestHouse", "key" : "Gender"})
{
        "values" : [
                "M",
                "F",
                "0",
                " ",
                "1",
                "",
                "19790522",
                "#0449",
                "#M",
                "
                "N"
        ],
        "stats" : {
                "n" : 20048891,
                "nscanned" : 20048891,
                "nscannedObjects" : 20048891,
                "timems" : 377764,