Databases

A database is a collection of data that is stored permanently and typically has an interface allowing user interaction. Databases allow us to store and retrieve information as needed rather than having to recreate the data anew each time we run an application.

The terms data and information are often used interchangeably, but it is important to to distinguish that information is data that have been processed in a way to be useful to the person using it. You can have all the data in the world, but it is useless unless you are able to aggregate it in a way that makes it useful.

There are two broad categories of databases in the development world: SQL (relational) and NoSQL (non-relational).

This section covers the following topics:


Relational Databases

  • Relational databases are tabular (table-like) and not flexible regarding design. Each record contains the same exact fields, whether those fields contain data or not and is connected (joined) to other tables in the database.

    USERS TABLE

    id name age city
    1 Tim 57 NYC
    2 Ira 24 Missoula
    3 Sue 40 Boulder

    COMMENTS TABLE

    id text
    1 "lol"
    2 "Come visit Montana!"
    3 "I love puppies!!!"
    4 "Seriously Montana is great!"

    USER/COMMENTS JOIN TABLE

    userId commentId
    1 3
    2 2
    2 4
    3 1

A relational database stores not only the data, but the relationship between the data in one table to the data in another table.

Before jumping right in, you should give some thought to how your database is going to be organized. A tool to help do this is an entity-relationship model.

An entity-relationship model (E-R model) maps out the relationships between entities or items. It states which data is to be stored and the relationship between them and is usually expressed in an entity-relationship diagram (E-R diagram or ERD). (Quassnoi, 2016)

e-r diagram

When designing databases, you often hear the term "normalization" and the typical definitions given are often hard to comprehend. Simply stated, normalization is the process of cleaning up the structure of the data in order to reduce redundancy and dependency of the data.

Consider two tables, one a list of car manufacturers and one a list of car models. Rather than have the manufacturer data in BOTH tables, you would have a field in the car model table that had a relationship to the manufacturer table. This allows us to store that data in only one table, but use it in another through the relationship.

When you normalize your database, you are reviewing the data for these kinds of relationships and adjusting your structure to be more efficient.

This is a simplified description of the process and there is a lot more to normalization than just this, but it gives you an idea of the overall concept.

MySQL vs PostgreSQL

Two database options that are used frequently are MySQL and PostgreSQL. Both are open source. MySQL is known for its speed and ease of use while PostgreSQL is known for its advanced features. For a list of differences between the two, see this article: PostgreSQL vs. MySQL

PostgreSQL

PostgreSQL is an open source, object-related database system that runs on all major operating systems. It is highly recommended by the authors of Two Scoops of Django.

PostgreSQL uses a client/server model. A PostgreSQL session consists of the following cooperating processes (programs):

  • A server process, which manages the database files, accepts connections to the database from client applications, and performs database actions on behalf of the clients. The database server program is called postgres.

  • The user's client (frontend) application that wants to perform database operations.

Installation

Use Homebrew to install PostgreSQL on the MacOS using the following command:

$ brew install postgresql

Start Postgres running and set it to start every time your computer starts up:

$ pg_ctl -D /usr/local/var/postgres start && brew services start postgresql

Stop Postgres running:

pg_ctl -D /usr/local/var/postgres stop -s -m fast

Check the version:

$ postgres -V

Configuration

Enter the postgres command line with the psql command.

MacBook-Pro: ~ $ psql postgres
psql (9.5.4)
Type "help" for help.

postgres=#

Once in the psql command line, we can use commands to configure our environment.

The first thing you need to do is set the password for the default account (by default, it has no password.) We can use the following commands to do so:

postgres=# \password postgres

You will be prompted to enter the password and then confirm it.

To see a list of users, use the command \du:

postgres=# \du

postgres users

To see a list of databases, use the command \list:

postgres=# \list

postgres dbs

To exit from the postgres command line, use the command \q.

(Sears, 2016)

Creating a Database

There are multiple ways to create a PostgreSQL database. Two of them are:

1) THE CREATEDB COMMAND

You can use the createdb command from the command line to create the database. With this command, you can specify the owner, which should be one of the existing user roles configured in PostgreSQL already.

createdb <<name of database>> --owner=<<user role>>
2) USE THE PGADMIN APPLICATION
  1. Open the pgAdmin app and drill down into the Server until the Databases icon shows.

  2. Follow the prompts to assign a name, an owner and add a comment to identify the purpose of the database and any other information necessary. Hit Save to create the database.

Other Commands

ps auxw | grep post – Show what postgreSQL processes are running.

brew services restart postgresql - (MacOS) Will restart the postgreSQL service.

Troubleshooting

Occasionally on my Mac, I have had trouble with PostgreSQL getting hung up. Below are some error messages I've seen and what I've done to fix them.

Error Message Postgres Description Solution Found (macOS)
The data directory contains an old postmaster.pid file / The data directory contains an unreadable postmaster.pid file PostgreSQL puts a file named postmaster.pid in the data directory to store the process id of the PostgreSQL server process. If PostgreSQL crashes, this file can contain an old pid that confuses PostgreSQL. You can fix this issue by deleting the postmaster.pid file. However, you must make sure that PostgreSQL is really not running. Open Activity Monitor and make sure that there are no processes named ‘postgres’ or ‘postmaster’. Find the postmaster.pid file in this directory (default): ~/Library/Application Support/Postgres/var[-ver]/ and delete it. Restart Postgres.

MySQL

MySQL is Dual Licensed. This means it is open source for general use or you can purchase a license for commercial use.

Things to know about MySQL:

  • Database names are case-sensitive.

Creating a MySql Database

  1. Connect to the server

    $ mysql -u [host] -p
    Enter password:
    Welcome to the mySql monitor. Commands end with ; or \g.
    ...
    
    mysql>
    
  2. Show the existing databases.

    mysql> SHOW DATABASES;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | mysql              |
    | performance_schema |
    | sys                |
    +--------------------+
    4 rows in set (0.00 sec)
    
    mysql>
    
  3. Create a database.

    mysql> CREATE DATABASE [database name] CHARACTER SET utf8;
    Query OK, 1 row affected (.0.. sec)
    
    mysql>
    
  4. Exit the shell.

    mysql> QUIT
    bye
    
    $
    

Deleting a MySQL database

mysql> DROP DATABASE [database name];

Other Useful Commands

  • Add time zones to the MySQL server (one time set up). Note: There is no space between the option -p and the password.
    $ mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql - [host] -p[password] mysql
    
    mysql: [Warning] Using a password on the command line interface can be insecure.
    
    $
    

Non-relational Databases

Non-relational databases are highly flexible and do not have to be structured consistently. They are more like a document than a table. NoSQL databases run more efficiently on clusters and allow for higher operational speed.

The advantages of using documents are:

  • Documents (i.e. objects) correspond to native data types in many programming languages.
  • Embedded documents and arrays reduce need for expensive joins.
  • Dynamic schema supports fluent polymorphism.
    // ==========================
    // A NON-RELATIONAL DATABASE:
    // ==========================
    {
      name: "Ira",
      age: 24,
      city: Missoula,
      comments: [
        {text: "Come visit Montana!"},
        {text: "Seriously Montana is great!"},
        {text: "Why does no one care about Montana???"}
      ],
      favColor: "purple"
    }
    
    {
      name: "Tammy",
      age: 24,
      city: Missoula,
      comments: [
        {text: "Why does no one care about Montana???"}
      ],
      favFood: "Ribeye"
    }
    

MongoDB

MongoDB is an open-source, NoSQL (non-relational) document database that provides high performance, high availability, and automatic scaling. It is one of the most popular databases used when working with NodeJS and has tools that integrate nicely within an Express app. (Steele, The Web Developer Bootcamp, 2017)

Key features include:

  • High performance data persistence
    • Support for embedded data models reduces I/O activity on database system.
    • Indexes support faster queries and can include keys from embedded documents and arrays.
  • Rich query language to support read and write (CRUD) operations
    • Data Aggregation
    • Text Search and Geospatial Queries
  • High Availability via MongoDB's replication facility
    • automatic failover
    • data redundancy
  • Horizontal Scalability
    • Sharding distributes data across a cluster of machines.
    • MongoDB 3.4 supports creating zones of data based on the shard key.
  • Support for multiple storage engines
    • WiredTiger Storage Engine
    • MMAPv1 Storage Engine
    • Pluggable storage engine API that allows third parties to develop storage engines for MongoDB

Data Structure

A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.

Document – A record where data is structured as a field and its value. A field is similar to a column in relational databases. The value of fields may include other documents, arrays, and arrays of documents.

Structure:

{
field1: value1,
field2: value2,
field3: value3,
...
fieldN: valueN
}

Example:

var mydoc = {
    _id: ObjectId("5099803df3f4948bd2f98391"),
    name: {first: "Alan", last: "Turing"},
    birth: new Date('Jun 23, 1912'),
    death: new Date('Jun 07, 1954'),
    contribs: ["Turing machine", "Turing test", "Turingery" ],
    views : NumberLong(1250000)
    }

Notes about Document structure:

  • Field names are strings.

    • The field name _id is reserved for use as a primary key; its value must be unique in the collection, is immutable, and may be of any type other than an array.
    • The field names cannot start with the dollar sign ($) character.
    • The field names cannot contain the dot (.) character.
    • The field names cannot contain the null character.
  • For collections that are indexed, the values for the indexed fields have a Maximum Index Key Length limit.

  • MongoDB uses the dot notation to access the elements of an array and to access the fields of an embedded document.

Collection – A group of documents (like a table) that exists within a single database.

Database – A container that holds a group of collections.

Mongod – (Mongo daemon) This is the MongoDB database server. It starts the MongoDB server as a daemon and manages data requests, formats, and background operations.

Workflow

When using MongoDB, you must follow the process below:

  • Start the Mongo server (mongod) from an admin command prompt;
  • In a separate terminal window, start the Mongo shell using the command: mongo
  • Specify which database to use with the command: use <databaseName>
  • Specify the collection within the database in the commands when manipulating data with the command: db.collection.command()
  • Exit the shell when done using CTRL-C or quit()

Installation Locally

For installation on the Mac OS, follow these instructions: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-os-x/

For installation on Windows, use these: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-windows/

Commands

Below are some common commands used in the MongoDB shell. (The examples shown are from Cloud9.) You can find more commands and details online in the MondoDB Command Reference.

  • mongod - (stands for mongo daemon) starts the mongo daemon and is run in the core directory (one level up from the main /webdevboot folder). The Mongo daemon must be running for mongo to work.

    $ ./mongod
    
  • mongo - starts the mongo shell and is run in the regular terminal window where we run all of our other commands. This is actually a nice place to test things outside of the regular code.

    kruizerchick:~/workspace $ mongo
    MongoDB shell version 2.6.12
    connecting to: test
    Server has startup warnings:
    2017-04-16T15:17:35.688+0000 ** WARNING: --rest is specified without --httpinterface,
    2017-04-16T15:17:35.688+0000 **          enabling http interface
    
  • help - gives you a list of mongo shell commands

  • show dbs - shows the databases that have been set up (we use separate databases for each app that we create)

    > show dbs
    admin  (empty)
    local  0.078GB
    
  • use - tells mongo which database to use; if the specified database doesn't exit, it creates a new database. After creating a new database, it won't show in the list until it actually has data in it.

    > use demo
    switched to db demo
    

    Note: Before you can use any of the commands regarding this database, you must point the system to this database with the use command.

  • insert db.collection.insert(Object) - inserts data into the database

    > db.dogs.insert({name: "Rusty", breed: "Mutt"})
    WriteResult({ "nInserted" : 1 })
    
  • show collections - displays a list of the different collections with a database.

    > show collections
    dogs
    system.indexes
    
  • find db.collection.find(query) - lists all of the objects in the specified collection that match the specified parameter or, if no parameter specified, all objects in the collection

  • update db.collection.update(filter, update) - updates information on the specified object in the collection; Caution: The default update overwrites the entire object. To update specific fields, use an update operator in the update parameter.

    > db.dogs.update({name: "Lulu"}, {breed: "Labradoodle})
    
    • $set operator - replaces the value of the item field.
      > db.dogs.update({name: "Rusty"}, {$set: {name: "Tater", isCute: true}})
      
  • remove db.collection.remove(filter) - will remove every object that matches the specified filter

    > db.dogs.remove({breed: "Mutt"})
    
  • drop() - db.collection.drop() - Removes all objects in the database.

    > db.campgrounds.drop()
    

Configuration

You can modify the MongoDB configuration settings by adjusting the configuration files for either mongod or mongos (or both). This file utilizes command line options with the added ability to add comments.

A sample configuration file named ‘mongod’ might look like this:

Folder structure:

|-- project/
    |-- data/
    |-- todos_api/
    mongod

File: mongod

mongod --dbpath=./data --nojournal "s@"

See MongoDB documents under Configuration File Options for details.

. .
Back to top Forward to Best Practices ==>