Introduction to Indexes:
1> Provide high performance
2> Provide efficient execution to queries
3> Prevent collection scans
4> Create indexes to support common and user-facing queries
5> Store a small portion of the collection's data set
6> Store the values of specific fields, ordered by value
7> Can return sorted results directly from index
8> Can return results without scanning any documents
9> Are all B-tree indexes
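To make the points above concrete, here is a minimal sketch of why an ordered index avoids a collection scan. It is a simplified model, not MongoDB's implementation: a sorted array stands in for the B-tree, and the sample documents and the helper names (buildIndex, lowerBound, findIds) are assumptions made for illustration.

```javascript
// Simplified model: an index is a sorted list of [value, docId] entries.
// (Real MongoDB indexes are B-trees; a sorted array stands in for one here.)
const docs = [
  { _id: 1, Title: "The Raven", Author: "Poe" },
  { _id: 2, Title: "Annabel Lee", Author: "Poe" },
  { _id: 3, Title: "Moby-Dick", Author: "Melville" },
];

// Store only the indexed field and the document id, ordered by value.
function buildIndex(collection, field) {
  return collection
    .map((d) => [d[field], d._id])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));
}

// Binary search for the first entry whose value is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Collect matching ids by walking the index from the first match onward,
// instead of inspecting every document in the collection.
function findIds(index, target) {
  const ids = [];
  for (let i = lowerBound(index, target); i < index.length && index[i][0] === target; i++) {
    ids.push(index[i][1]);
  }
  return ids;
}

const authorIndex = buildIndex(docs, "Author");
const poeIds = findIds(authorIndex, "Poe");

// Sorted results come straight from the index order, with no extra sort step.
const sortedTitles = buildIndex(docs, "Title").map(([title]) => title);

console.log(poeIds);
console.log(sortedTitles);
```

Because the index holds the field value itself, a query that only needs Title could be answered from sortedTitles without touching the documents at all.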
Index Concepts:
1> Multiple index types
2> All stored as B-tree indexes
3> Values are stored in sorted order, ascending or descending
4> Support different types of data and queries
Index Types:
1> Single field:
- Includes data from only a single field
- Fields can be at the top level or in sub-documents
2> Compound:
- Includes more than one field
3> Multikey:
- References an array and records a match if a query matches any value in the array
4> Geospatial:
- Support location-based searches
- Data is stored as either GeoJSON objects or coordinate pairs
5> Text:
- Supports searches on strings of data
6> Hashed:
- Maintain entries with hashes of values of the indexed field
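The multikey behaviour described above can be sketched as follows. This is an illustrative model only: the books collection, the tags field, and the helper names are hypothetical, and a sorted array again stands in for the B-tree.

```javascript
// Hypothetical documents; "tags" is an array field.
const books = [
  { _id: 1, tags: ["poetry", "gothic"] },
  { _id: 2, tags: ["novel"] },
  { _id: 3, tags: ["gothic", "novel"] },
];

// A multikey index holds one entry per array element, kept in sorted order.
function buildMultikeyIndex(collection, field) {
  const entries = [];
  for (const doc of collection) {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const v of values) entries.push([v, doc._id]);
  }
  return entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));
}

// A document matches if any one of its array values equals the query value.
function lookup(index, value) {
  return index.filter(([v]) => v === value).map(([, id]) => id);
}

const tagIndex = buildMultikeyIndex(books, "tags");
const gothicIds = lookup(tagIndex, "gothic");

console.log(tagIndex.length);
console.log(gothicIds);
```

Three documents produce five index entries here, which is why multikey indexes on large arrays can grow much faster than the collection itself.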
Index Properties:
- Each index can have different properties
- Properties include:
- TTL
- Unique
- Sparse
TTL Indexes:
- Used to automatically remove documents after a certain time
- Good for event logs, session information, etc.
- Limitations:
- Compound indexes not supported
- Indexed field must be a date, or an array of dates
- If the field is an array with multiple date values, the lowest (earliest) is used for expiration
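The expiration rule can be sketched in plain JavaScript. In the shell, a TTL index is created by passing the expireAfterSeconds option to createIndex; the functions below (ttlDate, isExpired) and the createdAt field are hypothetical names that only model the decision the server makes, under the limitations listed above.

```javascript
// Returns the date the TTL rule would use, or null if the document never expires.
function ttlDate(doc, field) {
  const value = doc[field];
  const dates = (Array.isArray(value) ? value : [value]).filter((v) => v instanceof Date);
  if (dates.length === 0) return null; // missing or non-date values are ignored
  // With several dates in an array, the lowest (earliest) one is used.
  return new Date(Math.min(...dates.map((d) => d.getTime())));
}

// A document is expired once its TTL date plus the configured delay has passed.
function isExpired(doc, field, expireAfterSeconds, now) {
  const base = ttlDate(doc, field);
  return base !== null && base.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const now = new Date("2024-01-01T01:30:00Z");
const oldSession = { createdAt: new Date("2024-01-01T00:00:00Z") };
const freshSession = { createdAt: new Date("2024-01-01T01:00:00Z") };
const arraySession = {
  createdAt: [new Date("2024-01-01T01:00:00Z"), new Date("2024-01-01T00:10:00Z")],
};
const notADate = { createdAt: "yesterday" };

console.log(isExpired(oldSession, "createdAt", 3600, now));
console.log(isExpired(freshSession, "createdAt", 3600, now));
console.log(isExpired(arraySession, "createdAt", 3600, now));
console.log(isExpired(notADate, "createdAt", 3600, now));
```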
Unique Indexes:
- Separate documents containing duplicate values for the indexed field are rejected in a collection
- When used on a compound index, uniqueness is on the combination of values
- Only one document in a collection can have NULL stored as the indexed field
- Cannot be specified on a hashed index
Sparse Indexes:
- Contain entries only for documents that have the indexed field, even if its value is NULL
- Is "Sparse" because not all documents in a collection are included
- Can have a sparse and unique index
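How the unique and sparse properties interact can be sketched with a small model. The IndexModel class and the email field are hypothetical; the model assumes, as the notes above state, that a missing field is treated as NULL by a non-sparse index and skipped entirely by a sparse one.

```javascript
// Toy model of an index with the "unique" and "sparse" properties.
class IndexModel {
  constructor(field, { unique = false, sparse = false } = {}) {
    this.field = field;
    this.unique = unique;
    this.sparse = sparse;
    this.entries = []; // [value, docId]
  }

  // Returns true if the document was indexed, false if it was skipped.
  insert(doc) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, this.field);
    if (!hasField && this.sparse) return false; // sparse: no entry at all
    const value = hasField ? doc[this.field] : null; // missing field indexed as null
    if (this.unique && this.entries.some(([v]) => v === value)) {
      throw new Error(`duplicate key: ${this.field}=${value}`);
    }
    this.entries.push([value, doc._id]);
    return true;
  }
}

// Unique only: the second document without the field collides on null.
const uniqueIdx = new IndexModel("email", { unique: true });
uniqueIdx.insert({ _id: 1 });
let uniqueRejected = false;
try {
  uniqueIdx.insert({ _id: 2 });
} catch (e) {
  uniqueRejected = true;
}

// Unique and sparse: documents without the field are simply left out.
const sparseUniqueIdx = new IndexModel("email", { unique: true, sparse: true });
const indexedFirst = sparseUniqueIdx.insert({ _id: 1 });
const indexedSecond = sparseUniqueIdx.insert({ _id: 2 });

console.log(uniqueRejected, indexedFirst, indexedSecond, sparseUniqueIdx.entries.length);
```

Combining the two properties is a common way to enforce uniqueness on an optional field.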
Query Plans:
- Processes queries using the most efficient query plan
- Based on the available indexes
- Can specify the indexes used with Index Filters
Query Optimization: The query optimizer
- Executes the query using the candidate indexes in parallel
- Records matches and determines which index is the best for execution of the query
- The index becomes part of the query plan
- Query plan is cached for future executions of the same query
- Query optimizer re-evaluates the plan after certain events occur
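The select-then-cache cycle above can be sketched as follows. This is a rough model and not the server's actual scoring: each candidate's trial function is a hypothetical stand-in that returns how many index entries a trial run examined, and the query shape is reduced to the set of field names.

```javascript
// Cache of winning plans, keyed by query shape.
const planCache = new Map();

// Query shape: the field names only; the values being searched for are ignored.
function queryShape(query) {
  return Object.keys(query).sort().join(",");
}

// Try every candidate plan once, keep the cheapest, and cache it for the shape.
function choosePlan(query, candidates) {
  const shape = queryShape(query);
  if (planCache.has(shape)) return { plan: planCache.get(shape), fromCache: true };
  let best = null;
  for (const c of candidates) {
    const examined = c.trial(query);
    if (best === null || examined < best.examined) best = { name: c.name, examined };
  }
  planCache.set(shape, best.name);
  return { plan: best.name, fromCache: false };
}

// Events such as creating or dropping an index force re-evaluation.
function invalidatePlans() {
  planCache.clear();
}

// Hypothetical trial costs for three indexes on the test collection.
const candidates = [
  { name: "Author_1", trial: () => 40 },
  { name: "Title_1", trial: () => 3 },
  { name: "Title_1_Author_1", trial: () => 1 },
];

const first = choosePlan({ Author: "Poe", Title: "The Raven" }, candidates);
const second = choosePlan({ Author: "Melville", Title: "Moby-Dick" }, candidates);
invalidatePlans();
const third = choosePlan({ Author: "Poe", Title: "The Raven" }, candidates);

console.log(first, second, third);
```

The second call reuses the cached plan because it has the same shape as the first, even though the values differ.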
Create simple index Demo:
db.test.find()
db.test.createIndex({Title:1})
db.test.createIndex({Title:1, Author:1})
Drop one index: db.test.dropIndex({Title:1})
Drop all indexes: db.test.dropIndexes()
Creating more complex indexes:
db.test.find()
db.test.createIndex({Title:1},{unique:true})
db.test.createIndex({Title:1, Author:1}, {unique:true})
db.test.createIndex({Title:"text"})
db.test.createIndex({price:"hashed"})
Creating query plans:
db.test.find()
db.test.find({Author:"Poe"}).explain("executionStats"), runs the query and reports the winning plan along with its execution statistics.
db.test.find({Title:"The Raven"}).explain("executionStats")
db.test.getIndexes()
db.test.find({Author:"Poe", Title:"The Raven"}).explain("executionStats")
db.test.find({Author:"Poe"}).explain("queryPlanner")
db.test.find({Author:"Poe"}).explain("allPlansExecution")
MongoDB replication:
- Provides redundancy
- Increases data availability
- Replica set is a group of instances
- primary
- secondary
- optional arbiter
- Use asynchronous replication
- Support automatic failover
Replica Set Architectures
- Affects the capacity and capability
- Three-member replica set is common
- Provides redundancy and fault tolerance
- Complexity should be avoided
Strategies for Designing Replica Sets
- Determine the number of members required
- Deploy odd numbers
- Consider fault tolerance
- Support dedicated functions using hidden members
- Load balance
- Foresee added demand and adjust
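The advice to deploy odd numbers of members follows from simple arithmetic: electing a primary needs a majority of voting members, so fault tolerance is the number of members that can be lost while a majority remains. A minimal sketch (the function names are illustrative):

```javascript
// Votes needed to elect a primary: more than half of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Members that can become unavailable while a majority is still reachable.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5, 6, 7]) {
  console.log(`members=${n} majority=${majority(n)} faultTolerance=${faultTolerance(n)}`);
}
```

Going from three members to four raises the majority from two to three but leaves fault tolerance at one, so the extra even-numbered member adds cost without adding resilience.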
Planning Deployment
Replica Set High Availability
- Uses automatic failover
- Secondary becomes primary
- Usually does not require manual intervention
- Replica set holds an election
Replica Set Elections
- Used to determine which member becomes the primary
- Each time primary becomes unavailable
- Each time a replica set is initiated
Rollbacks During Failover
- Reverts writes on a former primary
- Required only if the primary had accepted writes that the secondaries had NOT successfully replicated
- Used to maintain database consistency
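The rollback condition can be illustrated by comparing two operation histories. This is a simplified sketch: each oplog is modelled as an array of write labels, and findRollback is a hypothetical helper, not a MongoDB function.

```javascript
// Compare the former primary's history with the new primary's history.
// Everything the former primary wrote after the last common entry was never
// replicated, so it is what a rollback would revert.
function findRollback(formerPrimaryOplog, newPrimaryOplog) {
  let common = 0;
  while (
    common < formerPrimaryOplog.length &&
    common < newPrimaryOplog.length &&
    formerPrimaryOplog[common] === newPrimaryOplog[common]
  ) {
    common++;
  }
  return { commonPoint: common, rolledBack: formerPrimaryOplog.slice(common) };
}

// w3 and w4 were accepted by the old primary but never reached a secondary.
const diverged = findRollback(["w1", "w2", "w3", "w4"], ["w1", "w2", "w5"]);

// Here every write had been replicated before the failover: nothing to revert.
const clean = findRollback(["w1", "w2"], ["w1", "w2", "w3"]);

console.log(diverged);
console.log(clean);
```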
Read and Write Semantics
- By default, reads go to the primary and are consistent with the last write operation
- User can configure read preferences
- Reading from secondaries may return stale data (eventual consistency)
Write Concern
- Controls how many members must acknowledge a write before it is reported as successful
- Requiring acknowledgment from all members guarantees consistency of reads from secondary members
- Default only confirms write operations on the primary
- Can override
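The effect of overriding the default can be sketched with a small model. The isAcknowledged function and the memberProgress array are hypothetical: each number is the position of the last operation a member has applied, with the primary listed first, and w is either a member count or the string "majority".

```javascript
// Decide whether a write at position writeIndex satisfies write concern w.
function isAcknowledged(writeIndex, memberProgress, w) {
  const totalMembers = memberProgress.length;
  const needed = w === "majority" ? Math.floor(totalMembers / 2) + 1 : w;
  // Count the members (primary included) that already hold the write.
  const have = memberProgress.filter((p) => p >= writeIndex).length;
  return have >= needed;
}

// Primary is at operation 10; the two secondaries lag at 9 and 7.
const progress = [10, 9, 7];

console.log(isAcknowledged(10, progress, 1));
console.log(isAcknowledged(10, progress, "majority"));
console.log(isAcknowledged(9, progress, "majority"));
console.log(isAcknowledged(9, progress, 3));
```

A stricter write concern trades latency for durability: the write at position 10 is confirmed immediately under the default, but a majority concern has to wait for a secondary to catch up.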
MongoDB: Deploying a Replica Set
- Essentially you are starting mongod as a replica set, changing the current standalone into a replica set.
- mongod --port 27017 --dbpath /srv/mongodb/db0 --replSet re0
- Then initialize replica sets with rs.initiate()
- Verify the replica set
- rs.conf()
- Add members
- rs.add("<hostname><:port>"); the examples below omit the port, so every member is assumed to listen on the default port 27017
- rs.add("mongodb1.example.net")
- rs.add("mongodb2.example.net")
- The new replica set will elect a primary
- A replica set can have a maximum of seven voting members. To add a member to a replica set that already has seven votes, you must either add the member as a non-voting member or remove a vote from an existing member.
- To verify
- rs.status()
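As an alternative to adding members one by one, rs.initiate() also accepts a configuration document. The sketch below only builds such a document in plain JavaScript; buildReplSetConfig is a hypothetical helper, mongodb0.example.net is an assumed hostname for the first member, and giving non-voting members priority 0 is an assumption about what the server version in use requires.

```javascript
const MAX_VOTING_MEMBERS = 7;

// Build a replica set configuration document of the form
// { _id: <set name>, members: [{ _id: <n>, host: <host> }, ...] }.
function buildReplSetConfig(setName, hosts) {
  const members = hosts.map((host, i) => {
    const member = { _id: i, host };
    // Beyond seven voting members, any further member must be non-voting.
    if (i >= MAX_VOTING_MEMBERS) {
      member.votes = 0;
      member.priority = 0; // assumption: non-voting members cannot become primary
    }
    return member;
  });
  return { _id: setName, members };
}

const config = buildReplSetConfig("re0", [
  "mongodb0.example.net", // hypothetical host of the first member
  "mongodb1.example.net",
  "mongodb2.example.net",
]);

console.log(JSON.stringify(config, null, 2));
```

In the mongo shell, a document like this could be passed as rs.initiate(config) in place of the separate rs.add() calls.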
MongoDB: Managing the Replica Set Oplog
- The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases
- MongoDB applies database operations on the primary and then records the operations on the primary's oplog.
- The secondary members then copy and apply these operations in an asynchronous process.
- All replica set members contain a copy of the oplog, in the local.oplog.rs collection.
- To facilitate replication, all replica set members send heartbeats to all other members.
- Any member can import oplog entries from any other member
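The rolling-record behaviour of a capped oplog can be modelled in a few lines. This is an illustrative sketch: CappedOplog and its methods are hypothetical names, the cap is counted in entries rather than bytes, and timestamps are simple counters.

```javascript
// Toy model of the oplog: a fixed-size log that discards its oldest entries.
class CappedOplog {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = [];
    this.nextTs = 1;
  }

  // The primary records each applied operation; the oldest entry rolls off
  // once the cap is exceeded.
  append(op) {
    const entry = { ts: this.nextTs++, op };
    this.entries.push(entry);
    if (this.entries.length > this.maxEntries) this.entries.shift();
    return entry;
  }

  // A secondary asks for everything newer than the last entry it applied.
  since(ts) {
    return this.entries.filter((e) => e.ts > ts);
  }

  oldestTs() {
    return this.entries.length ? this.entries[0].ts : null;
  }
}

const oplog = new CappedOplog(3);
for (const op of ["insert a", "insert b", "update a", "delete b", "insert c"]) {
  oplog.append(op);
}

// A secondary that applied up to ts=3 can catch up from the oplog.
const toApply = oplog.since(3).map((e) => e.op);

// A secondary that only applied ts=1 has fallen off the end of the log:
// the entry it needs next (ts=2) has already been discarded.
const lastApplied = 1;
const tooStale = oplog.oldestTs() > lastApplied + 1;

console.log(toApply);
console.log(tooStale);
```

A member that is too stale in this sense cannot catch up from the oplog alone and has to copy the data set again from another member.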
MongoDB: Replica Set Data Synchronization
- Initial sync copies all the data from one member of the replica set to another member
- Clones all databases
- Applies all changes to the data set
- 2 methods
- Restart the mongod with an empty data directory and let MongoDB's normal initial syncing feature restore the data. This is the simpler option but may take longer to replace the data.
- Restart the machine with a copy of a recent data directory from another member in the replica set. This procedure can replace the data more quickly but requires more manual steps.
Security Introduction
- Users should only have access to data they require
- MongoDB offers:
- Authentication mechanisms
- Role-based access control (authorization)
- Auditing
- Encryption
Security Checklist, ensure to:
- Enable authentication and specify the mechanism
- Configure role-based access control - principle of least privilege
- Encrypt communication using SSL
- Limit network exposure
- Audit system activity
- Encrypt and protect data
- Run processes using a dedicated user
- Use secure configuration options
Authentication Overview
- Verifies the identity of the user
- All clients are required to authenticate themselves
- Client users are created in specific databases
- User name and database together identify a unique user
- All user information is stored in the admin database
Authentication Mechanisms
- Supports:
- Challenge and response (MONGODB-CR) - default:
- Verifies users using name, password, and database
- x.509 certificate:
- For use with a secure SSL connection
- Kerberos:
- Must configure a Kerberos deployment and add the user principal to MongoDB
- Use authenticationMechanisms to specify the method
MongoDB: Enabling Authentication and Specifying an Authentication Mechanism
- mongod --auth --setParameter authenticationMechanisms=GSSAPI --service
- Other accepted values for authenticationMechanisms include MONGODB-CR and PLAIN
- Configure a Kerberos service principal for each mongod and mongos instance in your MongoDB deployment
- Generate and distribute keytab files for each MongoDB component (i.e. mongod and mongos) in your deployment. Ensure that you only transmit keytab files over secure channels
- Optional. Start the mongod instance without auth and create users inside of MongoDB that you can use to bootstrap your deployment
- Start mongod and mongos with the KRB5_KTNAME environment variable as well as a number of required run time options
- If you did not create Kerberos user accounts, you can use the localhost exception to create users at this point until you create the first user on the admin database
- Authenticate clients, including the mongo shell using Kerberos.
mongo --host mongoserver.xyz.com --authenticationMechanism=GSSAPI --authenticationDatabase='$external' --username someuser@xyz.com
Enabling Client Access Control
- db.createUser({user:"OurNewAdmin", pwd:"password", roles:[{role:"userAdmin", db:"test"}]})
- mongod --auth --config C:/mongodb/mongod.config
Configuring Role-based Access Control
- db.getUsers()
- db.getUser("jdoe")
- db.removeUser("jdoe")
- db.dropUser("OurNewAdmin")
MongoDB: Configuring System Events Auditing
- mongod --dbpath C:/mongodb/data/db/ --auditDestination syslog
- mongod --dbpath data/db --auditDestination console
Use this for debugging purposes, when you want to output the system event logs to the screen.
- mongod --dbpath data/db --auditDestination file --auditFormat JSON --auditPath data/db/
Configuring Support for SSL
- Before you can use SSL, you must have a .pem file containing a public key certificate and its associated private key.
- Self-signed cert on Unix/Linux, run from /etc/ssl/: openssl req -newkey rsa:2048 -new -x509 -days 365 -nodes -out mongodb-cert.crt -keyout
- Create the .pem file
- mongod --sslMode requireSSL --sslPEMKeyFile <pem>
- You may also specify these options in the configuration file, as in the following example:
- sslMode = requireSSL
- sslPEMKeyFile = /etc/ssl/mongodb.pem