  • Java implementation of paginated MongoDB queries at the 100-million-record scale

    1. Prepare the environment

      1.1 Download MongoDB

      1.2 Start MongoDB

         C:\mongodb\bin\mongod --dbpath D:\mongodb\data

      1.3 Download Robo 3T, a GUI tool for MongoDB

    2. Prepare the data

      

            <dependency>
                <groupId>org.mongodb</groupId>
                <artifactId>mongo-java-driver</artifactId>
                <version>3.6.1</version>
            </dependency>

    Run the following Java code to insert the test data:

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;
    import com.mongodb.MongoException;

    // Class name chosen for this listing; the original shows only main()
    public class InsertData {

        public static void main(String[] args) {

            /**** Connect to MongoDB ****/
            // Since 2.10.0, uses MongoClient.
            // With the 3.x driver the constructor no longer throws UnknownHostException.
            MongoClient mongo = new MongoClient("localhost", 27017);

            try {

                /**** Get database ****/
                // if the database doesn't exist, MongoDB will create it for you
                DB db = mongo.getDB("www");

                /**** Get collection / table from 'www' ****/
                // if the collection doesn't exist, MongoDB will create it for you
                DBCollection table = db.getCollection("person");

                /**** Insert ****/
                // create 100 million documents, one insert call per document
                for (int i = 0; i < 100000000; i++) {
                    BasicDBObject document = new BasicDBObject();
                    document.put("name", "mkyong" + i);
                    document.put("age", 30);
                    document.put("sex", "f");
                    table.insert(document);
                }

                /**** Done ****/
                System.out.println("Done");

            } catch (MongoException e) {
                e.printStackTrace();
            } finally {
                // release the connection pool
                mongo.close();
            }

        }

    }
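
Inserting one document per call means one round trip to the server per document, which adds up over 100 million inserts. The legacy driver's DBCollection.insert also accepts a List, so the loop can hand over documents in batches. The sketch below shows only the batching logic in plain Java, with no driver involved; the class name, the forEachBatch helper and the batch size are illustrative assumptions, and the sink stands in for a call such as table.insert(batch).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

class BatchInsertSketch {

    /**
     * Builds items 0..total-1 with makeItem and hands them to sink
     * in lists of at most batchSize. Returns the number of batches.
     */
    static <T> int forEachBatch(int total, int batchSize,
                                IntFunction<T> makeItem, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> buffer = new ArrayList<>(batchSize);
        for (int i = 0; i < total; i++) {
            buffer.add(makeItem.apply(i));
            if (buffer.size() == batchSize) {
                sink.accept(buffer);                 // full batch
                buffer = new ArrayList<>(batchSize); // start a fresh list
                batches++;
            }
        }
        if (!buffer.isEmpty()) {                     // last, partial batch
            sink.accept(buffer);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in sink that only reports the batch size
        int batches = forEachBatch(25, 10, i -> "mkyong" + i,
                batch -> System.out.println(batch.size() + " documents"));
        System.out.println(batches + " batches");
    }
}
```

A fresh list is created for every batch so that a sink which keeps a reference to the list it receives is not affected by later iterations.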

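    3. Paginated queries

     The traditional limit-based approach (skip plus limit) becomes slow once the data volume is large, because the server still has to walk past every skipped document, so it is not a good fit here. Looking for another way, I borrowed the idea used by logstash-input-mongodb: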

      public
      def get_cursor_for_collection(mongodb, mongo_collection_name, last_id_object, batch_size)
        collection = mongodb.collection(mongo_collection_name)
        # Need to make this sort by date in object id then get the first of the series
        # db.events_20150320.find().limit(1).sort({ts:1})
        return collection.find({:_id => {:$gt => last_id_object}}).limit(batch_size)
      end
    
              collection_name = collection[:name]
              @logger.debug("collection_data is: #{@collection_data}")
              last_id = @collection_data[index][:last_id]
              #@logger.debug("last_id is #{last_id}", :index => index, :collection => collection_name)
              # get batch of events starting at the last_place if it is set
    
    
              last_id_object = last_id
              if since_type == 'id'
                last_id_object = BSON::ObjectId(last_id)
              elsif since_type == 'time'
                if last_id != ''
                  last_id_object = Time.at(last_id)
                end
              end
              cursor = get_cursor_for_collection(@mongodb, collection_name, last_id_object, batch_size)
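
The idea in the excerpt above is to remember the last _id that was read and to ask for the next batch with _id greater than it, instead of skipping a growing number of documents. The following sketch illustrates that loop in plain Java, without MongoDB: a TreeSet of Long values stands in for the _id index, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class KeysetPagingSketch {

    /**
     * Returns up to batchSize ids strictly greater than lastId, ascending.
     * A null lastId starts from the beginning.
     */
    static List<Long> nextBatch(NavigableSet<Long> ids, Long lastId, int batchSize) {
        // tailSet(lastId, false) seeks straight to the first id > lastId,
        // comparable to an index seek on _id
        NavigableSet<Long> remaining = (lastId == null) ? ids : ids.tailSet(lastId, false);
        List<Long> batch = new ArrayList<>(batchSize);
        for (Long id : remaining) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(id);
        }
        return batch;
    }

    public static void main(String[] args) {
        NavigableSet<Long> ids = new TreeSet<>();
        for (long i = 1; i <= 10; i++) {
            ids.add(i * 10);                       // ids 10, 20, ..., 100
        }

        Long lastId = null;
        List<Long> batch;
        while (!(batch = nextBatch(ids, lastId, 4)).isEmpty()) {
            System.out.println(batch);
            lastId = batch.get(batch.size() - 1);  // remember where this batch ended
        }
    }
}
```

The loop stops when a batch comes back empty, so it does not need the total document count up front.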

    Java implementation:

    import java.util.List;
    
    import org.bson.types.ObjectId;
    
    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;
    import com.mongodb.MongoClient;
    import com.mongodb.MongoException;
    
    public class Test {
    
        public static void main(String[] args) {
            int pageSize = 50000;
    
            /**** Connect to MongoDB ****/
            // Since 2.10.0, uses MongoClient.
            // With the 3.x driver the constructor no longer throws UnknownHostException.
            MongoClient mongo = new MongoClient("localhost", 27017);
    
            try {
    
                /**** Get database ****/
                // if the database doesn't exist, MongoDB will create it for you
                DB db = mongo.getDB("www");
    
                /**** Get collection / table from 'www' ****/
                // if the collection doesn't exist, MongoDB will create it for you
                DBCollection table = db.getCollection("person");
                long cnt = table.count();
                //System.out.println(table.getStats());
                long pages = getPageCount(cnt, pageSize);
                // _id to resume after; this value comes from the author's data set.
                // Use null to start from the first document.
                ObjectId lastIdObject = new ObjectId("5bda8f66ef2ed979bab041aa");
    
                for (long i = 0; i < pages; i++) {
                    long start = System.currentTimeMillis();
                    DBCursor dbObjects = getCursorForCollection(table, lastIdObject, pageSize);
                    // The cursor is lazy: the query only runs when results are fetched,
                    // so toArray() has to be inside the timed section
                    List<DBObject> objs = dbObjects.toArray();
                    System.out.println("Query " + (i + 1) + " took " + (System.currentTimeMillis() - start) / 1000 + " s");
                    if (objs.isEmpty()) {
                        break; // no documents left after lastIdObject
                    }
                    lastIdObject = (ObjectId) objs.get(objs.size() - 1).get("_id");
                }
    
            } catch (MongoException e) {
                e.printStackTrace();
            } finally {
                mongo.close();
            }
    
        }
        
        public static DBCursor getCursorForCollection(DBCollection collection, ObjectId lastIdObject, int pageSize) {
            BasicDBObject query = new BasicDBObject();
            if (lastIdObject != null) {
                // only documents after the last one already read
                query.append("_id", new BasicDBObject("$gt", lastIdObject));
            }
            // with no lastIdObject the query is unfiltered, so the first document is not lost
            BasicDBObject sort = new BasicDBObject("_id", 1);
            return collection.find(query).sort(sort).limit(pageSize);
        }
        
        // number of pages needed for cnt documents (rounds up)
        public static long getPageCount(long cnt, int pageSize) {
            return cnt % pageSize == 0 ? cnt / pageSize : cnt / pageSize + 1;
        }
    
    }
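
Paging on _id gives a roughly chronological order because an ObjectId starts with a 4-byte timestamp (seconds since the Unix epoch). The sketch below decodes that prefix from the hex string used above in plain Java, without the driver; the class and method names are illustrative. With the driver on the classpath, ObjectId.getTimestamp() should expose the same value.

```java
import java.time.Instant;

class ObjectIdTimestampSketch {

    /**
     * Reads the leading 4 bytes (8 hex characters) of a 24-character
     * ObjectId hex string as seconds since the Unix epoch.
     */
    static long epochSeconds(String objectIdHex) {
        if (objectIdHex == null || objectIdHex.length() != 24) {
            throw new IllegalArgumentException("expected a 24-character hex string");
        }
        return Long.parseLong(objectIdHex.substring(0, 8), 16);
    }

    public static void main(String[] args) {
        long seconds = epochSeconds("5bda8f66ef2ed979bab041aa");
        // prints 2018-11-01T05:30:14Z
        System.out.println(Instant.ofEpochSecond(seconds));
    }
}
```

The order is only approximate across clients, since ids generated within the same second on different machines are not ordered by insertion time. The paging loop itself relies only on _id being unique and sortable.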

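    4. Lessons learned

      1. I accidentally left out a $ sign, so the query returned no data, and I wasted some time tracking down the cause. The correct line is: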

    query.append("_id",new BasicDBObject("$gt",lastIdObject));
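
A missing $ does not raise an error: the key is then treated as an ordinary field name inside an embedded document to compare against, so the query simply matches nothing. One way to catch the typo early is to check the keys of a condition before sending the query. This is a minimal sketch in plain Java on a Map; the class name, the isOperatorCondition helper and the operator list (comparison operators only) are assumptions made for illustration. BasicDBObject is itself a Map<String, Object>, so the same check should apply to it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class OperatorCheckSketch {

    // Comparison operators only; extend the set as needed
    static final Set<String> COMPARISON_OPERATORS = new HashSet<>(
            Arrays.asList("$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"));

    /** Returns true only if every key of the condition is a known comparison operator. */
    static boolean isOperatorCondition(Map<String, Object> condition) {
        if (condition.isEmpty()) {
            return false;
        }
        for (String key : condition.keySet()) {
            if (!COMPARISON_OPERATORS.contains(key)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> good = new LinkedHashMap<>();
        good.put("$gt", "5bda8f66ef2ed979bab041aa");

        Map<String, Object> typo = new LinkedHashMap<>();
        typo.put("gt", "5bda8f66ef2ed979bab041aa");      // missing $

        System.out.println(isOperatorCondition(good));   // true
        System.out.println(isOperatorCondition(typo));   // false
    }
}
```
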
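      2. Creating indexes

      Create an ordinary single-field index: db.collection.ensureIndex({field: 1/-1});  1 is ascending, -1 is descending. Since MongoDB 3.0, ensureIndex is a deprecated alias of createIndex.
      Example: db.articles.ensureIndex({title:1}) // the field name is written here without "" double quotes
      View the current indexes: db.collection.getIndexes();
      Example:
      db.articles.getIndexes();
      Drop a single index: db.collection.dropIndex({field: 1/-1});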

      3. Execution plans

       db.student.find({"name":"dd1"}).explain()

     References:

    [1] https://github.com/phutchins/logstash-input-mongodb/blob/master/lib/logstash/inputs/mongodb.rb

    [2] https://www.cnblogs.com/yxlblogs/p/4930308.html

    [3] https://docs.mongodb.com/manual/reference/method/db.collection.ensureIndex/

  • Original article: https://www.cnblogs.com/davidwang456/p/9890377.html