Tuesday, 24 September 2024

MongoDB

https://www.w3schools.com/mongodb/mongodb_aggregations_out.php
https://www.geeksforgeeks.org/mongodb-regex/
https://www.mongodb.com/docs/manual/

MongoDB is a document-oriented NoSQL database system that provides high scalability, flexibility, and performance.
Unlike standard relational databases, MongoDB stores data in a JSON document structure form.
mongoDB is key value pair, documents , graph
to improve performance we can add new server in no sql databases
but in regular ones we need to improve hardware..

Relational Database Management System(RDBMS) is not the correct choice when it comes to handling big data by the virtue of their design since they are not horizontally scalable.
If the database runs on a single server, then it will reach a scaling limit.
NoSQL databases are more scalable and provide superior performance.
What’s called Table in RDBMS is called a Collection in MongoDB. Similarly, a Row is called a Document and a Column is called a Field.

MongoDB is such a NoSQL database that scales by adding more and more servers and increases productivity with its flexible document model.
load balancing to manage high traffic
regular expresion based queries
We use a 1 to include a field and 0 to exclude a field.
The sort order can be chosen by using 1 or -1. 1 is ascending and -1 is descending.
Using $match early in the pipeline can improve performance since it limits the number of documents the next stages must process.

$lookup
This aggregation stage performs a left outer join to a collection in the same database.
There are four required fields:
from: The collection to use for lookup in the same database
localField: The field in the primary collection that can be used as a unique identifier in the from collection.
foreignField: The field in the from collection that can be used as a unique identifier in the primary collection.
as: The name of the new field that will contain the matching documents from the from collection.

cursor
var mycursor = db.student.find({studentId:{$gt:1}});
> while(mycursor.hasNext()){
... print(tojson(mycursor.next()));
... }

toArray
var mycursor = db.student.find().pretty()
var docs = mycursor.toArray()
var resultdoc = docs[0]
resultdoc

i: To match both lower case and upper case pattern in the string.

use single quote

first command -- mongo in terminal

use db_name
--creates db

You can create a collection using the createCollection() database method.
db.posts.insertOne({"title": "Post 1"}
This will create the "posts" collection if it does not already exist.

db.posts.updateOne( { title: "Post Title 1" }, { $set: { likes: 2 } } )

----

MongoDB
++++++++

According to International Data Corporation around 2.4 Quintillion bytes of data is generated everyday and more than 90% of data we have today is from last two years only.

Data is generated from mobile devices , social networks and applications.

Big Data
++++++++++

Data volume - zettabytes of data (1024 to power of 8) generated daily
Data velocity - speed of generation , storage and processing is very fast
Data variety - Structured (RDBMS)
Semi-structured (XML,HTML)
Unstructured data (docs , social)

NO SQL DBs
++++++++++++++++

Non - relational - document oriented / column oriented / graph based etc

Distributed and Horizontally scalable - more clusters with low cost hardware

Open Source - wide community support

RDBMS - ACID

NOSQL - CAP

C - Consistency - all nodes of cluster return same data
A - Availability - reads/writes are never declined
P - Partition Tolerance - works across network partitions

At most , only two of CAP can be satisfied.

CP - MongoDB
AP - Cassandra
CA - MySQL

MongoDB
++++++++
Created by 10gen , written in C++.

Key Features
++++++++++++++++
High Performance
Rich Query Language
High Availability
Horizontally Scalable
Support of multiple storage engines

Installation
++++++++++++++

7.0 +

UI : VSCode
Azure Setup

Terminal
- Internal Certification

Common Commands
+++++++++++++++++

1. db - Shows the current DB. test db is the default DB.
2. show dbs - Shows the list of DBs under the server
3. use <> - use EmpDB - Create the DB or switch to that DB
4. db.createCollection("employees") - Create a table in the current DB
5. show collections/show tables -- show the tables in current db

Drop
+++++
1. db.employees.drop() - drop the table
2. db.dropDatabase() - drop the DB

CRUD Operations
++++++++++++++++

1.insert - can insert one or more records - not recommended
2.insertOne - one record only
3.insertMany - multiple records

db.employees.insertOne/insertMany/insert - it will create the table if not created using createCollection and then also insert the record

use EmpDB

db.employees.insertOne({
empId:1,
empName : "test",
dept : "ETA",
salary : 30000
})

{
"acknowledged" : true,
"insertedId" : ObjectId("66eadbc9abec11642808ce38")
}

insertedId - _id - primary key

db.employees.insertOne({
"empId":2,
"empName" : "Lakshmi",
dept : "ETA",
salary : 30000
})

db.employees.find() - See the data / read the data

db.employees.insertOne({
_id: 1001,
"empId":2,
"empName" : "Lakshmi",
dept : "ETA",
salary : 30000
})
{ "acknowledged" : true, "insertedId" : 1001 }

db.employees.insertMany([{
_id: 1002,
"empId":3,
"empName" : "John Doe",
dept : "ETA",
salary : 30000
},{
_id: 1003,
"empId":4,
"empName" : "Peter",
dept : "ETA",
salary : 30000
}])
{ "acknowledged" : true, "insertedIds" : [ 1002, 1003 ] }

Insert data from JSON file

db.employees.find().pretty()

Read Operations
++++++++++++++++
--where cond
db.employees.find({filter cond},{select cond})

db.employees.find({department:"ETA"})

--select cond
db.employees.find({department:"ETA"},{empName:1,department:1})
1 - show the col
0 - hide the col

--no _id
db.employees.find({department:"ETA"},{empName:1,department:1,_id:0})

You can have all 0's or all 1's except for _id field.You cannot have a combination of 1s and 0s.
db.employees.find({department:"ETA"},{empName:1,department:0,_id:0})
db.employees.find({department:"ETA"},{empName:0,department:0,_id:1})

findOne()
++++++++++
db.employees.findOne({department:"ETA"})

Read Operators
++++++++++++++++

$gt,$lt,$gte,$lte,$in,$nin,$eq,$ne etc

---all employees who have salary > 40000
db.employees.find({salary:{$gt:40000}},{empName:1,salary:1,_id:0})

--all emp whose dept is not ETA
db.employees.find({department:{$ne:"ETA"}},{empName:1,department:1,_id:0})

--all emp whose dept is ETA or IVS
db.employees.find({department:{$in:["ETA","IVS"]}},{empName:1,department:1,_id:0})

--all emp not in dept ETA or IVS but having a dept
db.employees.find({department:{$nin:["ETA","IVS"],$exists:true}},{empName:1,department:1,_id:0})

----------------------------------------
----------------------------------------
----------------------------------------

$size - Array Operator - No. of elem within the array
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
db.employees.find({experience:{$size:2}},{empName:1,experience:1,_id:0})

$regex - Regular Expression like in JS
+++++++

--Fetch all emp whose name starts with J
db.employees.find({empName:{$regex:"^J+"}},{empName:1,_id:0})

--Case insensitive search $options : i
+++++++++++++++++++++++++++++++++++++++
db.employees.find({empName:{$regex:"^J+",$options:"i"}},{empName:1,_id:0})

--Fetch all empName start with J or E
$in , $ regex ---cannot be used together
So, write regex using //

db.employees.find({empName:{$in:[/^J+/i,/^E+/i]}},{empName:1,_id:0})

--$and or $or
+++++++++++++++
--Fetch emp who work for ETA or IVS and salary > 30000

db.employees.find({$and:[{
department:{$in:["ETA","IVS"]}
},{
salary:{$gt:30000}
}]},{empName:1,department:1,salary:1,_id:0})

--Fetch emp who work for ETA or IVS or salary > 30000
db.employees.find({$or:[{
department:{$in:["ETA","IVS"]}
},{
salary:{$gt:30000}
}]},{empName:1,department:1,salary:1,_id:0})

$not $gt <> $lte
++++++++++++++++++

--Fetch those emp whose machineid is not greator than 1002
db.employees.find({machineId:{$not:{$gt:1002}}},{empName:1,machineId:1,_id:0})

db.employees.find({machineId:{$exists:true,$not:{$gt:1002}}},{empName:1,machineId:1,_id:0})

db.employees.find({machineId:{$lte:1002}},{empName:1,machineId:1,_id:0})

$elemMatch(Story)
++++++++++

---role : TL

db.employees.find({"experience.role":"Technology Lead"},{empName:1,experience:1,_id:0})

--role:TL exp > 1 year
$and , $gt

db.employees.find({$and:[{},{}]},{})

db.employees.find({$and:[{
"experience.role":"Technology Lead"
},{
"experience.roleExperience":{$gt:1.0}
}]},{empName:1,experience:1,_id:0})//not expected output

$elemMatch
//right output
db.employees.find({
experience:{$elemMatch:{role:"Technology Lead",roleExperience:{$gt:1.0}}}
},{empName:1,experience:1,_id:0})

db.employees.find({
experience:{$elemMatch:{role:"Technology Lead",roleExperience:{$gt:1.0}}}
},{empName:1,"experience.role":1,_id:0})

$all is like $and but for array elemMatch (Check this out in Mongo Notes)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

--Update Queries
++++++++++++++++++++++++++++++++

Read Query - colname :{operator : value}
Update Query - operator :{colname : value}

1.update - not recommended . By default it acts like updateOne and with
{multi:true} it acts like updateMany.
2.updateOne
3.updateMany

--Update sal of Chris to 40000
db.employees.updateOne({where cond},{update cond})

db.employees.updateOne({empId:14},{$set:{salary:40000}})
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }

db.employees.updateMany({department:"ETA"},{$set:{salary:42000}})

db.employees.update({department:"ETA"},{$set:{salary:44000}})

db.employees.update({department:"ETA"},{$set:{salary:46000}},{multi:true})

Update Operators
+++++++++++++++++

$inc , $mul , $max , $min etc

--Increase the sal of ETA emp by 2000 rs
db.employees.updateMany({department:"ETA"},{$inc:{salary:2000}})

--Decrease the sal of ETA emp by 2000 rs
db.employees.updateMany({department:"ETA"},{$inc:{salary:-2000}})

--Increase by 10 percent
$mul
db.employees.updateMany({department:"ETA"},{$mul:{salary:1.1}})

//Little confusing

--Each emp must have a min salary of 30000 or more
++++++++++++++++++++++++++++++++++++++++++++++++++++++
$max

db.employees.updateMany({},{$max :{salary:30000}})

--Each emp must have a max salary of 50000 or less
++++++++++++++++++++++++++++++++++++++++++++++++++++++
$min

db.employees.updateMany({},{$min :{salary:50000}})

$unset
+++++++

--Delete department and machine Id for emp id 14
db.employees.updateOne({empId:14},{$unset:{department:"",machineId:""}})

db.employees.updateOne({},{$unset:{department:"",machineId:""}})

--Array Update Operations
+++++++++++++++++++++++++++++++++++++++++++

--Adding a new value to existing array $push

db.employees.updateOne({empId:14},{$push:{experience:{
role:"Senior Systems Engineer",roleExperience:0.1}}})

--Add again , duplicate created
db.employees.updateOne({empId:14},{$push:{experience:{
role:"Senior Systems Engineer",roleExperience:0.1}}})

--To avoid duplicate use $addToSet
db.employees.updateOne({empId:14},{$addToSet:{experience:{
role:"Senior Systems Engineer",roleExperience:0.1}}})

--Add two array values
++++++++++++++++++++++++++++++++
$push - It will create a sub array
db.employees.updateOne({empId:14},{
$push:{experience:[{role:"Technology Analyst",roleExperience:0.1},
{role:"Technology Lead",roleExperience:0.1}]
}})

To avoid that $each
Use with the $push operator to append multiple values to an array .
db.employees.updateOne({empId:14},{
$push:{experience:{$each:[{role:"Technology Analyst",roleExperience:0.1},
{role:"Technology Lead",roleExperience:0.1}]
}}})

--Add at a specific position
$position
db.employees.updateOne({empId:14},{
$push:{experience:{$each:[{role :"Systems Engineer Trainee",roleExperience:0.6}],$position:0}}}
)

$- update only the role or roleExp
db.employees.updateOne({$and:[{empId:14},
{"experience.role":"Systems Engineer Trainee"}]},{
$set:{"experience.$.roleExperience":0.9}})

---------

Update Queries [contd]
++++++++++++++++++++++++++
$pop - Remove first or last ele from the array
-1 - First element
1 - Last Element

db.employees.updateOne({empId:14},{$pop:{experience:1}})

db.employees.updateOne({empId:14},{$pop:{experience:-1}})

$pull - Remove ele from array by matching condition

db.employees.updateOne({empId:14},{$pull:{experience:{role:"Senior Systems Engineer",roleExperience:0.1}}})

$pullAll - Remove multiple matching elements(multiple conditions)
---Refer to notes

Delete Commands
++++++++++++++++

1. deleteOne
2. deleteMany
3. remove - By default acts like deleteMany but with {justOne:true} it acts like deleteOne.

db.employees.deleteOne({empId:14})

db.employees.deleteMany({department:"ETA"})

db.employees.remove({department:"DX"},{justOne:true})

Aggregations
+++++++++++++

count , sum , avg , min , max etc

Aggregation Pipeline - the output of one stage is the input to the next stage
(Chaining)

1.countDocuments
+++++++++++++++++++++++++
--Count the no. of rows in the table
db.employees.countDocuments({})

db.employees.countDocuments({department :"ETA"})

db.employees.countDocuments({department :"DX"})

db.employees.countDocuments({},{skip:1,limit:2})

db.employees.countDocuments({},{skip:13,limit:2})

2. distinct
++++++++++++++++
db.employees.distinct("department")

db.employees.distinct("experience.role")

3.$match and $project
++++++++++++++++++++++++++++++++

$match - where cond of find()
$project - select cond of find()

--Fetch all emp who belong to any department
db.employees.aggregate([{stage1},{stage2}......])//basic syntax

db.employees.aggregate([{$match :{department:{$exists:true}}},
{$project:{empName:1,department:1,_id:0}},{$sort:{empName:1}}])

Q : sort for lowercase also

$group
++++++++

db.employees.aggregate([{$group:{_id:"$department"}}])

db.employees.aggregate([{$match:{department:{$exists:true}}},{$group:{_id:"$department"}}])

db.employees.aggregate([{$group:{_id:"$experience.role"}}])

db.employees.aggregate([{$group:{_id:{dept:"$department",sal:"$salary"}}}])

$count
++++++

db.employees.aggregate([{$count:"noofemp"}])

db.employees.aggregate([{$group:{_id:"$department"}},{$count:"noofdepts"}])

$sort,$limit,$out,$skip
++++++++++++++++++++++++++++++
db.employees.aggregate([{$sort:{salary:-1}},{$project:{empName:1,salary:1,_id:0}}])

db.employees.aggregate([{$sort:{salary:-1}},{$limit:3},{$project:{empName:1,salary:1,_id:0}}])

db.employees.aggregate([{$sort:{salary:-1}},{$limit:3},{$project:{empName:1,salary:1,_id:0}},{$out:"Top3Emp"}])

show collections/show tables

db.Top3Emp.find()
Q : $out can take anything apart from table
Only 3rd highest paid emp
db.employees.aggregate([{$sort:{salary:-1}},{$limit:3},{$skip:2},{$project:{empName:1,salary:1,_id:0}}])

Q : is there any other way to get top three employees and what if there are more than one employee with same salary?

$unwind
+++++++++++++++++++++

db.employees.aggregate([{$unwind:"$experience"}])

$sum, $avg , $min , $max
++++++++++++++++++++++++++++++
Pre-req : group

--Group by dept count of emp in each dept and avg salary

db.employees.aggregate([{
$group:{_id:"$department",
"noofemp":{$sum:1},
"avgsal":{$avg:"$salary"}}
}])

--Same way min and max (Refer notes)

$concat

db.employees.aggregate([{$project:{empDept:{$concat:["$empName","-","$department"]},_id:0}}])

$ifNull
db.employees.aggregate([{$project:{empDept:{$concat:[{$ifNull:["$empName","NA"]},"-",{$ifNull:["$department","NA"]}]},_id:0}}])

Date Queries
++++++++++++++++++++++

$year $month

$dateToString
date :
format :
timezone :
onNull :

%Y - Year
%m - Month
%d - date
%H - hours
%M - minutes
%S - Seconds
%L - Milli seconds

db.employees.aggregate([{$project:{birthYear:{$year:"$dateOfBirth"}}}])

db.employees.aggregate([{
$project:{birthdate:{$dateToString:{
date:"$dateOfBirth",
format:"%m-%d"}}}
}])

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++===

Indexing
+++++++++

db.employees.find({salary:{$gte:40000,$lte:50000}}) //Sample Query

db.employees.find({salary:{$gte:40000,$lte:50000}}).explain("executionStats")

db.employees.createIndex({salary:1})
-1:desc
1 :asc

db.employees.find({salary:{$gte:40000,$lte:50000}}).explain("executionStats")

db.employees.getIndexes()

> db.employees.dropIndex({salary:-1})
{ "nIndexesWas" : 2, "ok" : 1 }
> db.employees.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]

Types of Indexes
++++++++++++++++++++++++++++++

1. Single Field Index
2.Compound Index
db.employees.createIndex({salary:1,department:1})

3. Text Index
db.employees.createIndex({department:"text"})

db.employees.createIndex({"experience.role":1})

BulkWrite
++++++++++
--Refer to Notes

No comments: