BSON Format in MongoDB


BSON (Binary JSON) is a binary-encoded serialization format that MongoDB uses to store documents and to exchange them between clients and servers. It is designed to be lightweight, fast to traverse, and efficient to encode and decode.

Like JSON, BSON represents objects (documents) as key-value pairs. It extends JSON with more data types, a native encoding for binary data, and length information for each document.

Key Characteristics of BSON

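  1. Binary Encoding: BSON is a binary format with explicit type and length information, so it can be parsed and traversed without the text scanning that JSON requires. It is not always smaller than the equivalent JSON, since length prefixes and fixed-width numbers add bytes, but it is designed to be fast to encode and decode in both the database and application layers.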

  2. Support for Rich Data Types: In addition to the standard JSON types (string, number, Boolean, null, array, object), BSON supports several other data types, such as:

    • Dates: Date values are encoded as 64-bit integers.
    • Binary Data: BSON can store raw binary data.
    • Object IDs: A special 12-byte type used for document IDs (_id field).
    • 32-bit and 64-bit Integers: BSON distinguishes 32-bit from 64-bit integers, whereas JSON has a single number type.
    • Floating Point: BSON stores 64-bit IEEE 754 doubles and also offers a 128-bit decimal type (Decimal128).
    • Embedded Documents and Arrays: Complex nested structures can be represented within a document.
  3. Efficient Traversal: Every BSON document, and every string, binary value, embedded document, and array within it, is prefixed with its length. A reader can therefore skip over fields it does not need instead of scanning them byte by byte, which speeds up field lookup during query processing.
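
To make the binary layout concrete, here is a minimal sketch that builds a one-field document by hand with Python's standard `struct` module. The helper names `encode_string_element` and `encode_document` are illustrative only and are not part of any MongoDB driver. It handles only string values.

```python
import struct

def encode_string_element(key: str, value: str) -> bytes:
    """Encode one BSON string element: type byte, key, length-prefixed value."""
    key_bytes = key.encode("utf-8") + b"\x00"      # keys are NUL-terminated
    value_bytes = value.encode("utf-8") + b"\x00"  # string values end in NUL too
    # 0x02 is the BSON type code for a UTF-8 string.
    # The little-endian int32 length counts the trailing NUL byte.
    return b"\x02" + key_bytes + struct.pack("<i", len(value_bytes)) + value_bytes

def encode_document(elements: bytes) -> bytes:
    """Wrap encoded elements with the int32 total size and the 0x00 terminator."""
    total_size = 4 + len(elements) + 1
    return struct.pack("<i", total_size) + elements + b"\x00"

doc = encode_document(encode_string_element("hello", "world"))
json_text = '{"hello": "world"}'

print(doc.hex(" "))
print(len(doc), "bytes as BSON,", len(json_text), "bytes as JSON text")
```

For this tiny document the BSON form is 22 bytes and the JSON text is 18, which illustrates that BSON trades some space for type and length information that makes traversal cheap.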


BSON Data Types

  1. String: Standard UTF-8 strings.

    { "name": "Alice" }
  2. Object: Embedded documents or objects, which can contain more key-value pairs.

    { "address": { "city": "New York", "state": "NY" } }
  3. Array: Arrays of values, which can contain mixed types.

    { "scores": [85, 90, 78] }
  4. Binary Data: Raw bytes, stored with a length prefix and a one-byte subtype that indicates how to interpret them (for example generic binary or UUID).

    { "file": <binary> }
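  5. ObjectId: A 12-byte unique identifier that MongoDB uses as the default value of the _id field. In current MongoDB versions it consists of a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value generated once per process, and a 3-byte incrementing counter. Older versions split the middle five bytes into a machine identifier and a process ID.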

    { "_id": ObjectId("507f191e810c19729de860ea") }
  6. Boolean: True or false values.

    { "active": true }
  7. Date: BSON includes a date type for storing timestamps as 64-bit integers representing milliseconds since the Unix epoch.

    { "created_at": ISODate("2023-09-01T00:00:00Z") }
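  8. 32-bit Integer: A signed 32-bit value. Shells and drivers differ in how they map a bare numeric literal, so NumberInt(25) can be used in the shell to make the type explicit.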

    { "age": 25 }
  9. 64-bit Integer: A signed 64-bit value, for integers outside the 32-bit range.

    { "bigNumber": NumberLong("9223372036854775807") }
  10. Null: Represents null values.

    { "middle_name": null }
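
The ObjectId and Date descriptions above can be checked with a few lines of standard-library Python. This is a sketch rather than driver code: the function names `objectid_timestamp` and `date_to_bson_millis` are made up for illustration, and the inputs are the example values used in the list.

```python
from datetime import datetime, timedelta, timezone

def objectid_timestamp(hex_id: str) -> datetime:
    """Return the creation time stored in the first 4 bytes of an ObjectId."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    # The timestamp is a big-endian count of seconds since the Unix epoch.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def date_to_bson_millis(moment: datetime) -> int:
    """Return milliseconds since the Unix epoch, the int64 payload of a BSON Date."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (moment - epoch) // timedelta(milliseconds=1)

created = objectid_timestamp("507f191e810c19729de860ea")
print(created.isoformat())   # 2012-10-17T20:46:22+00:00

millis = date_to_bson_millis(datetime(2023, 9, 1, tzinfo=timezone.utc))
print(millis)                # 1693526400000
```

Because the timestamp occupies the leading bytes, ObjectIds generated later sort after earlier ones at one-second granularity.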

Example of a BSON Document

Let's consider a typical BSON document in a MongoDB collection:

{
  "_id": ObjectId("507f191e810c19729de860ea"),
  "name": "John Doe",
  "age": 29,
  "email": "johndoe@example.com",
  "created_at": ISODate("2024-09-01T08:00:00Z"),
  "is_active": true,
  "scores": [88, 92, 79],
  "profile_pic": <binary data>,
  "address": { "street": "123 Main St", "city": "New York", "state": "NY", "zip": 10001 }
}

This document contains a variety of data types, including an ObjectId for the _id field, a Date for created_at, and an embedded document (address).

BSON Object Size

Each BSON document begins with a 32-bit integer that gives its total size in bytes, including the size field itself. This lets MongoDB find the end of a document, and step to the next one, without parsing its contents.
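
As an illustration of skipping by size, the following sketch walks a buffer of back-to-back BSON documents using only the size field. `split_documents` is a hypothetical helper name, and the sample bytes are hand-written documents (`{"hello": "world"}` and an empty document).

```python
import struct

def split_documents(buffer: bytes):
    """Yield each BSON document from a buffer of concatenated documents."""
    offset = 0
    while offset < len(buffer):
        if offset + 4 > len(buffer):
            raise ValueError("truncated size field")
        # The first 4 bytes are a little-endian int32 holding the total size.
        (size,) = struct.unpack_from("<i", buffer, offset)
        # The smallest valid document is 5 bytes: the size plus the terminator.
        if size < 5 or offset + size > len(buffer):
            raise ValueError("corrupt or truncated document")
        yield buffer[offset:offset + size]
        offset += size  # jump straight to the next document

hello = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
empty = b"\x05\x00\x00\x00\x00"

sizes = [len(d) for d in split_documents(hello + empty + hello)]
print(sizes)   # [22, 5, 22]
```

None of the field contents are inspected here; the size prefix alone is enough to locate every document boundary.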

MongoDB has a limit of 16 MB per document, including all its embedded data, arrays, and metadata.
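
An application can check an encoded document against this limit before sending it. The sketch below assumes the limit is 16 × 1024 × 1024 bytes and reads the size from the document's own prefix; the constant and function names are illustrative, not driver APIs.

```python
import struct

# Assumed value of the 16 MB limit, in bytes (16 * 1024 * 1024 = 16,777,216).
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024

def declared_size(document: bytes) -> int:
    """Read the total size that a BSON document declares in its first 4 bytes."""
    (size,) = struct.unpack_from("<i", document, 0)
    return size

def fits_size_limit(document: bytes) -> bool:
    """Return True if the declared size is within the per-document limit."""
    return declared_size(document) <= MAX_BSON_DOCUMENT_SIZE

small = b"\x05\x00\x00\x00\x00"                       # empty document, 5 bytes
oversized_header = struct.pack("<i", MAX_BSON_DOCUMENT_SIZE + 1)

print(fits_size_limit(small))             # True
print(fits_size_limit(oversized_header))  # False
```

Drivers generally perform an equivalent check themselves, so this is mainly useful for validating data earlier in a pipeline.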

Use of BSON in MongoDB

  1. Storage: All data stored in MongoDB is in BSON format. This allows for efficient retrieval and query processing.
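  2. Transport: Messages in the MongoDB wire protocol carry commands and documents encoded as BSON, so data travels between clients and servers in the same format in which it is stored.
  3. Indexing: Because every BSON value carries an explicit type, MongoDB can define a consistent comparison order across types and build indexes on document fields, including fields inside embedded documents.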
  4. Serialization/Deserialization: When interacting with MongoDB, drivers convert application data into BSON format (serialization) before sending it to the database and convert BSON back into native application data (deserialization) when retrieving data.
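
The serialization round trip that drivers perform can be sketched in pure Python for a small subset of types. This is a teaching sketch, not how any particular driver is implemented: `encode` and `decode` are hypothetical names, arrays, dates, binary data, and ObjectIds are deliberately left out, and integers are mapped to 32-bit when they fit and to 64-bit otherwise, which is an assumption about mapping policy.

```python
import struct

def encode(doc: dict) -> bytes:
    """Serialize a dict of bool/int/float/str/None/dict values to BSON bytes."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"
        if isinstance(value, bool):       # test bool first: bool is a subclass of int
            body += b"\x08" + name + (b"\x01" if value else b"\x00")
        elif isinstance(value, int):
            if -2**31 <= value < 2**31:   # fits in a signed 32-bit integer
                body += b"\x10" + name + struct.pack("<i", value)
            else:                         # otherwise use the 64-bit integer type
                body += b"\x12" + name + struct.pack("<q", value)
        elif isinstance(value, float):
            body += b"\x01" + name + struct.pack("<d", value)
        elif isinstance(value, str):
            text = value.encode("utf-8") + b"\x00"
            body += b"\x02" + name + struct.pack("<i", len(text)) + text
        elif isinstance(value, dict):     # embedded document, encoded recursively
            body += b"\x03" + name + encode(value)
        elif value is None:
            body += b"\x0a" + name
        else:
            raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

def decode(data: bytes) -> dict:
    """Deserialize BSON bytes produced by encode() back into a dict."""
    (size,) = struct.unpack_from("<i", data, 0)
    doc, pos = {}, 4
    while pos < size - 1:                 # stop at the trailing 0x00 terminator
        type_code = data[pos]
        name_end = data.index(b"\x00", pos + 1)
        key = data[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if type_code == 0x08:
            doc[key] = data[pos] != 0
            pos += 1
        elif type_code == 0x10:
            doc[key] = struct.unpack_from("<i", data, pos)[0]
            pos += 4
        elif type_code == 0x12:
            doc[key] = struct.unpack_from("<q", data, pos)[0]
            pos += 8
        elif type_code == 0x01:
            doc[key] = struct.unpack_from("<d", data, pos)[0]
            pos += 8
        elif type_code == 0x02:
            (length,) = struct.unpack_from("<i", data, pos)
            doc[key] = data[pos + 4:pos + 4 + length - 1].decode("utf-8")
            pos += 4 + length
        elif type_code == 0x03:
            (sub_size,) = struct.unpack_from("<i", data, pos)
            doc[key] = decode(data[pos:pos + sub_size])
            pos += sub_size
        elif type_code == 0x0A:
            doc[key] = None
        else:
            raise ValueError(f"unsupported type code in this sketch: {type_code:#04x}")
    return doc

original = {
    "name": "John Doe", "age": 29, "big": 2**40, "score": 88.5,
    "is_active": True, "middle_name": None,
    "address": {"city": "New York", "zip": 10001},
}
restored = decode(encode(original))
print(restored == original)   # True
```

In practice an application would rely on the driver's own BSON library, which also covers the remaining types and validates its input.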