
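MongoDB pagination in Java: a Java implementation of paginated queries over hundreds of millions of records

MongoDB: multithreaded paginated queries over tens of millions of records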

1. Setting up the environment

1.1 Download MongoDB

1.2 Start MongoDB

C:\mongodb\bin\mongod --dbpath D:\mongodb\data

1.3 Download Robo 3T, a GUI tool for MongoDB

2. Preparing the data

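Maven dependency:

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>3.6.1</version>
</dependency>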

Run the following Java code to insert the test data:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;

// The class name is arbitrary; the original listing showed only the main method.
public class InsertData {
    public static void main(String[] args) {
        try {
            /**** Connect to MongoDB ****/
            // Since 2.10.0, uses MongoClient
            MongoClient mongo = new MongoClient("localhost", 27017);

            /**** Get database ****/
            // If the database doesn't exist, MongoDB will create it for you
            DB db = mongo.getDB("www");

            /**** Get collection / table from 'www' ****/
            // If the collection doesn't exist, MongoDB will create it for you
            DBCollection table = db.getCollection("person");

            /**** Insert ****/
            // Create a document to store key and value
            BasicDBObject document = null;
            for (int i = 0; i < 100000000; i++) {
                document = new BasicDBObject();
                document.put("name", "mkyong" + i);
                document.put("age", 30);
                document.put("sex", "f");
                table.insert(document);
            }

            /**** Done ****/
            System.out.println("Done");
            mongo.close();
        } catch (MongoException e) {
            // The 3.x driver's MongoClient constructors no longer throw
            // UnknownHostException, so only MongoException is caught here.
            e.printStackTrace();
        }
    }
}
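The loop above sends one document per call. As a rough sketch of an alternative, the documents could be grouped into batches first. The example below uses only the standard library and plain maps in place of BasicDBObject; the class and method names (BatchSketch, buildBatch, batchCount) and the batch size are made up for illustration and are not part of the original code. With the legacy driver, each batch would presumably be converted to DBObject instances and passed to the list overload of DBCollection.insert.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchSketch {
    /** Builds documents for indexes [from, to) with the same fields as the insert loop. */
    public static List<Map<String, Object>> buildBatch(int from, int to) {
        List<Map<String, Object>> batch = new ArrayList<>(to - from);
        for (int i = from; i < to; i++) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("name", "mkyong" + i);
            doc.put("age", 30);
            doc.put("sex", "f");
            batch.add(doc);
        }
        return batch;
    }

    /** Number of batches needed to cover `total` documents (ceiling division). */
    public static int batchCount(int total, int batchSize) {
        return (total + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int total = 25;      // tiny numbers, for illustration only
        int batchSize = 10;  // hypothetical batch size
        for (int from = 0; from < total; from += batchSize) {
            int to = Math.min(from + batchSize, total);
            List<Map<String, Object>> batch = buildBatch(from, to);
            // A real program would hand this batch to the driver in a single call.
            System.out.println("batch " + from + ".." + (to - 1) + " size=" + batch.size());
        }
    }
}
```

Whether batching helps, and which batch size works best, depends on the server and network, so it is worth measuring before relying on it.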

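3. Paginated query

The traditional limit-based pagination becomes slow once the data volume is large, so it is not a good fit here. Looking for another approach, I borrowed the idea used by logstash-input-mongodb, which remembers the last _id it has seen and asks only for documents with a greater _id: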

public
def get_cursor_for_collection(mongodb, mongo_collection_name, last_id_object, batch_size)
  collection = mongodb.collection(mongo_collection_name)
  # Need to make this sort by date in object id then get the first of the series
  # db.events_20150320.find().limit(1).sort({ts:1})
  return collection.find({:_id => {:$gt => last_id_object}}).limit(batch_size)
end

collection_name = collection[:name]
@logger.debug("collection_data is: #{@collection_data}")
last_id = @collection_data[index][:last_id]
#@logger.debug("last_id is #{last_id}", :index => index, :collection => collection_name)
# get batch of events starting at the last_place if it is set
last_id_object = last_id
if since_type == 'id'
  last_id_object = BSON::ObjectId(last_id)
elsif since_type == 'time'
  if last_id != ''
    last_id_object = Time.at(last_id)
  end
end
cursor = get_cursor_for_collection(@mongodb, collection_name, last_id_object, batch_size)
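The comment in the Ruby code about sorting "by date in object id" relies on the ObjectId layout: the first 4 bytes are a big-endian timestamp in seconds, so ascending _id order roughly follows creation time. The sketch below decodes that timestamp from the hex form using only the standard library. The class name ObjectIdTime and the method timestampSeconds are made up for this illustration; the sample id is the one used as the starting point in the Java implementation.

```java
import java.time.Instant;

public class ObjectIdTime {
    /** Returns the creation time (seconds since the Unix epoch) encoded in a 24-character hex ObjectId. */
    public static long timestampSeconds(String hexObjectId) {
        if (hexObjectId == null || hexObjectId.length() != 24) {
            throw new IllegalArgumentException("expected 24 hex characters");
        }
        // The first 4 bytes (8 hex characters) hold the timestamp, big-endian.
        return Long.parseLong(hexObjectId.substring(0, 8), 16);
    }

    public static void main(String[] args) {
        long seconds = timestampSeconds("5bda8f66ef2ed979bab041aa");
        System.out.println(seconds);                        // 1541050214
        System.out.println(Instant.ofEpochSecond(seconds)); // 2018-11-01T05:30:14Z
    }
}
```

The driver's own ObjectId class exposes the same information, so this decoding is only meant to show why paging on _id behaves like paging by insertion time.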

The same idea implemented in Java:

import java.util.List;

import org.bson.types.ObjectId;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;

public class Test {
    public static void main(String[] args) {
        int pageSize = 50000;
        try {
            /**** Connect to MongoDB ****/
            // Since 2.10.0, uses MongoClient
            MongoClient mongo = new MongoClient("localhost", 27017);

            /**** Get database ****/
            // If the database doesn't exist, MongoDB will create it for you
            DB db = mongo.getDB("www");

            /**** Get collection / table from 'www' ****/
            // If the collection doesn't exist, MongoDB will create it for you
            DBCollection table = db.getCollection("person");
            DBCursor dbObjects;
            Long cnt = table.count();
            // System.out.println(table.getStats());
            Long page = getPageSize(cnt, pageSize);
            ObjectId lastIdObject = new ObjectId("5bda8f66ef2ed979bab041aa");
            for (Long i = 0L; i < page; i++) {
                Long start = System.currentTimeMillis();
                dbObjects = getCursorForCollection(table, lastIdObject, pageSize);
                // The cursor is lazy: the query only runs when results are fetched,
                // so toArray() is called before the elapsed time is printed.
                List<DBObject> objs = dbObjects.toArray();
                System.out.println("Query " + (i + 1) + " took "
                        + (System.currentTimeMillis() - start) / 1000 + " s");
                if (objs.isEmpty()) {
                    break; // nothing left after lastIdObject
                }
                lastIdObject = (ObjectId) objs.get(objs.size() - 1).get("_id");
            }
            mongo.close();
        } catch (MongoException e) {
            // The 3.x driver's MongoClient constructors no longer throw UnknownHostException.
            e.printStackTrace();
        }
    }

    public static DBCursor getCursorForCollection(DBCollection collection, ObjectId lastIdObject, int pageSize) {
        DBCursor dbObjects = null;
        if (lastIdObject == null) {
            // TODO: sort and take the first document; otherwise data may be lost
            lastIdObject = (ObjectId) collection.findOne().get("_id");
        }
        BasicDBObject query = new BasicDBObject();
        query.append("_id", new BasicDBObject("$gt", lastIdObject));
        BasicDBObject sort = new BasicDBObject();
        sort.append("_id", 1);
        dbObjects = collection.find(query).limit(pageSize).sort(sort);
        return dbObjects;
    }

    // Number of pages needed for cnt records (rounds up)
    public static Long getPageSize(Long cnt, int pageSize) {
        return cnt % pageSize == 0 ? cnt / pageSize : cnt / pageSize + 1;
    }
}
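To see the paging logic without a running MongoDB instance, the sketch below simulates it with an in-memory sorted set standing in for the _id index. It is only an illustration: the class name KeysetPagingSketch, the methods pageCount and nextPage, and the use of Long keys instead of ObjectId are all assumptions of this example. Each page is "everything strictly greater than the last key seen, in ascending order, up to pageSize", which mirrors the $gt query with sort and limit above. Starting from null returns the very first key as well, which sidesteps the lost-first-document issue the TODO mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class KeysetPagingSketch {
    /** Ceiling division: number of pages needed for cnt records. */
    public static long pageCount(long cnt, int pageSize) {
        return (cnt + pageSize - 1) / pageSize;
    }

    /** Returns up to pageSize ids strictly greater than lastId, ascending; null lastId starts at the beginning. */
    public static List<Long> nextPage(NavigableSet<Long> ids, Long lastId, int pageSize) {
        NavigableSet<Long> rest = (lastId == null) ? ids : ids.tailSet(lastId, false);
        List<Long> page = new ArrayList<>(pageSize);
        for (Long id : rest) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<Long> ids = new TreeSet<>();
        for (long i = 1; i <= 25; i++) {
            ids.add(i * 10); // 25 sample keys: 10, 20, ..., 250
        }
        int pageSize = 10;
        Long lastId = null;
        int pages = 0;
        while (true) {
            List<Long> page = nextPage(ids, lastId, pageSize);
            if (page.isEmpty()) {
                break; // no keys left after lastId
            }
            pages++;
            lastId = page.get(page.size() - 1); // remember where this page ended
            System.out.println("page " + pages + ": " + page.get(0) + " .. " + lastId);
        }
        System.out.println(pages == pageCount(ids.size(), pageSize)); // true
    }
}
```

Because each page is located by key rather than by offset, the cost of fetching a page does not grow with the number of pages already read, provided the key is indexed (as _id always is).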

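4. Lessons learned

1. I accidentally left out a $ sign, which made the query return no data, and I wasted some time tracking down the cause. The correct form is:

query.append("_id",new BasicDBObject("$gt",lastIdObject));

2. Creating indexes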

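Create an ordinary single-field index: db.collection.ensureIndex({field:1/-1}); 1 means ascending, -1 means descending. Note that ensureIndex has been deprecated since MongoDB 3.0 in favor of createIndex, which takes the same key specification.

Example: db.articles.ensureIndex({title:1}) // Note: in my case, wrapping the field name in double quotes caused index creation to fail

List the current indexes: db.collection.getIndexes();

Example:

db.articles.getIndexes();

Drop a single index: db.collection.dropIndex({field:1/-1});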

3. Execution plan

db.student.find({"name":"dd1"}).explain()


References:

[1] https://github.com/phutchins/logstash-input-mongodb/blob/master/lib/logstash/inputs/mongodb.rb

[2] https://www.cnblogs.com/yxlblogs/p/4930308.html

[3] https://docs.mongodb.com/manual/reference/method/db.collection.ensureIndex/
