
Seeding Data Into MongoDB Running On Docker

Learn how to seed data into a running Docker container in a simple and flexible way.

Modern applications are, at least to some extent, data-rich. In practice, this means applications often have features like a Twitter-style feed, aggregated statistics, friends/followers, and many other features that rely on complex, inter-related data.

It's this data that provides the vast majority of an application's value. Twitter would be quite useless if all you could do was post and see a handful of other people's tweets.

The biggest pitfall developers fall into is reusing the production database for development.

Due to the complexity of this data, during the early development process of an application, it's tempting to use the same database for production as for development.

The thinking is, “if we need the most realistic data, what’s more realistic than actual user-generated data?”

There are a few serious reasons why you should strongly consider not going that route:

- You risk corrupting or deleting (“nuking”) real user data.
- Development traffic can put unexpected load on production, effectively DDOSing yourself.
- Depending on an external database makes debugging and isolating issues much harder.

The combination of programmatically generated data with a locally running database prevents any of those problems from causing significant issues: even if you do nuke the database or DDOS yourself, it’s a trivial task to refresh your development environment or regenerate the data you need.

By reducing the number of external dependencies during development, we increase system consistency, which makes debugging and isolating issues easier.

But the additional value gained from data generation will depend on two major factors: how realistic the generated data is, and how closely the way it is used mirrors real application usage.

So, in order to have a setup that maximizes your chances of catching bugs and testing the real quality of the software being developed, the data powering the application and its usage must follow those two guidelines.

For this article, we will create a simple React application that renders a list of employees. Specifically, for each employee it will display their name, job title, and department.

Then, since the front-end needs to display the list of employees, the API will return an array of objects, one per user stored in the DB, each containing the aforementioned properties.

Since we’re focusing on the database side, we’ll breeze through the rest of the application, but in a future article, I’ll dive deeper into how to build complex orchestrations with Docker Compose.

For our front-end client, we’ll be utilizing React to display the data stored on the MongoDB database. The client will make requests using Axios to the Node API.

To get started, we will utilize Create React App to set up the baseline application that we’ll make a few changes to.

You can create a CRA application with the following command from the project root:
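Assuming you want the client to live in a folder named `client`, the command is:

```shell
npx create-react-app client
```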

Then, we will have to download the dependencies that we’ll need for the React application. For our purposes, we’re only going to need Axios and Material-UI.

You can download them both with the following command (make sure you’re in the client directory, not the project root):
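From the client directory, both can be installed in one go (package names are from the Material-UI v4 era):

```shell
npm install axios @material-ui/core
```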

For our purposes, we will only be making changes to the App.js file, which in the starter project is the main component that displays content.

This is what that file should look like at the start:
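The CRA starter’s App.js looks roughly like this (the exact content varies slightly by Create React App version):

```jsx
import React from 'react';
import logo from './logo.svg';
import './App.css';

function App() {
  return (
    <div className="App">
      <header className="App-header">
        <img src={logo} className="App-logo" alt="logo" />
        <p>
          Edit <code>src/App.js</code> and save to reload.
        </p>
        <a
          className="App-link"
          href="https://reactjs.org"
          target="_blank"
          rel="noopener noreferrer"
        >
          Learn React
        </a>
      </header>
    </div>
  );
}

export default App;
```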

The changes we will make to this file are, in order:

1. Import Axios and the Material-UI components we need.
2. Fetch the list of employees from the API when the component mounts and store it in state.
3. Render a card for each employee that displays their information.

After the three steps, your App.js file will look something like this:
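One possible version after those steps (a sketch: the field names `name`, `title`, and `department`, the `/api/employees` path, and the hook-based approach are assumptions based on the rest of the article):

```jsx
import React, { useEffect, useState } from 'react';
import axios from 'axios';
import { Card, CardContent, Typography } from '@material-ui/core';
import './App.css';

function App() {
  const [employees, setEmployees] = useState([]);

  // Fetch the employee list from the API when the component mounts
  useEffect(() => {
    axios
      .get('/api/employees')
      .then((res) => setEmployees(res.data))
      .catch((err) => console.error(err));
  }, []);

  return (
    <div className="App">
      {employees.map((employee) => (
        <Card key={employee._id} style={{ margin: 16 }}>
          <CardContent>
            <Typography variant="h5">{employee.name}</Typography>
            <Typography color="textSecondary">{employee.title}</Typography>
            <Typography color="textSecondary">{employee.department}</Typography>
          </CardContent>
        </Card>
      ))}
    </div>
  );
}

export default App;
```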

The styling and components we used will result in cards looking like this (note that the black background is the CRA default background, not part of the actual card):

You’ll be able to see it for yourself once we have wired up the API and implemented the data generation.

Next up is the client’s development Dockerfile. We just have to install the necessary dependencies into the image and then run the development server as normal.
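A minimal sketch of that Dockerfile (the Node base image tag is an assumption):

```dockerfile
FROM node:alpine

WORKDIR /app

# Copy package.json first so Docker can cache the install layer
COPY package.json .
RUN npm install

COPY . .

CMD ["npm", "start"]
```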

On the API, we’ll have a single unauthenticated route named /employees which will return an array of objects containing the properties we defined above.

The folder structure for the api will ultimately end up looking like this:
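Based on the files discussed below, the api folder ends up roughly like this:

```text
api/
├── Dockerfile.dev
├── index.js
├── models/
│   └── User.js
└── package.json
```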

The User.js model will contain a simple Mongoose model which we’ll use to interface with the database when querying for the list of employees.

Then, we will have to download the necessary dependencies to quickly make a web server and integrate it with a MongoDB server. Specifically, we’ll utilize Express, Mongoose, and Nodemon.

The first two we’ll download as regular dependencies with the following command (make sure you’re in the api directory and not in the project root):
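From the api directory:

```shell
npm install express mongoose
```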

Then we will install nodemon as a development dependency:
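```shell
npm install --save-dev nodemon
```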

Once you have your dependencies downloaded make sure to add the ‘nodemon’ prefix to your npm start script. Your “start” script in the package.json should look like this:
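Something like this (the entry file name is an assumption):

```json
"scripts": {
  "start": "nodemon index.js"
}
```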

First, let’s build out the User Mongoose model. In the User.js file in the models folder, the model can be created like this:

User.js
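A minimal sketch of the model, assuming the name/title/department fields described earlier. The third argument to `mongoose.model` pins the collection name so it matches the collection the seed script imports into:

```javascript
// models/User.js
const mongoose = require('mongoose');

const userSchema = new mongoose.Schema({
  name: String,
  title: String,
  department: String,
});

// Register the model with Mongoose under the 'employees' collection
module.exports = mongoose.model('User', userSchema, 'employees');
```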

The ‘mongoose.model’ function registers the model with Mongoose, as long as we require the file in our index.js.

Then, in our index.js file, we require the User model, create a basic Express server, and define our single route: GET /employees.

index.js
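A sketch of that server. The connection string’s `mongodb` hostname (the database service’s name in docker-compose.yml), the `development` database name, and port 5000 are all assumptions:

```javascript
// index.js
const express = require('express');
const mongoose = require('mongoose');
require('./models/User');

const User = mongoose.model('User');
const app = express();

// 'mongodb' is the service name given to the database in docker-compose.yml
mongoose.connect('mongodb://mongodb:27017/development', {
  useNewUrlParser: true,
  useUnifiedTopology: true,
});

// Our single unauthenticated route: returns every employee in the DB
app.get('/employees', async (req, res) => {
  const employees = await User.find({});
  res.json(employees);
});

app.listen(5000, () => console.log('API listening on port 5000'));
```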

API Dockerfile

The API Dockerfile will look exactly the same as the client Dockerfile, since we’ve updated the package.json file to abstract away the functionality the API needs.

Dockerfile.dev
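As noted, it mirrors the client’s:

```dockerfile
FROM node:alpine

WORKDIR /app

COPY package.json .
RUN npm install

COPY . .

CMD ["npm", "start"]
```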

From the project root, create a folder named nginx, which will contain the configuration of an NGINX server that will route the requests either to the React application or the Nodejs API.

The following is the NGINX configuration, which you should name nginx.conf. It defines the upstream servers for the client and the API.

nginx.conf
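A sketch of that configuration; the service names (`client`, `api`) and ports (3000, 5000) are assumptions that must match the docker-compose.yml services:

```nginx
upstream client {
    server client:3000;
}

upstream api {
    server api:5000;
}

server {
    listen 80;

    # Everything else goes to the React dev server
    location / {
        proxy_pass http://client;
    }

    # WebSocket connection used by CRA's hot reloading
    location /sockjs-node {
        proxy_pass http://client;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
    }

    # Strip the /api prefix and forward to the Node API
    location /api {
        rewrite /api/(.*) /$1 break;
        proxy_pass http://api;
    }
}
```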

The blocks for sockjs-node are there to allow for the WebSocket connection that CRA utilizes during development.

We also need to create a Dockerfile for the NGINX server that uses our config file to override the default. Make sure to create the Dockerfile in the same folder as the config file.
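A minimal version:

```dockerfile
FROM nginx

# Override the default NGINX config with ours
COPY ./nginx.conf /etc/nginx/conf.d/default.conf
```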

We won’t be going too deeply into how Compose works in this article, but suffice it to say that it ties together the individual containers we defined above.

docker-compose.yml
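A sketch of the Compose file; the service names, published port, and volume layout are assumptions chosen to match the NGINX config above:

```yaml
version: '3'
services:
  nginx:
    build:
      context: ./nginx
    ports:
      - '8080:80'
    depends_on:
      - client
      - api
  client:
    build:
      context: ./client
      dockerfile: Dockerfile.dev
    volumes:
      - ./client:/app
      - /app/node_modules
  api:
    build:
      context: ./api
      dockerfile: Dockerfile.dev
    volumes:
      - ./api:/app
      - /app/node_modules
  mongodb:
    image: mongo
  dbseed:
    build:
      context: ./mongo
      dockerfile: Dockerfile.dev
    depends_on:
      - mongodb
```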

Towards the bottom of the docker-compose.yml file, you’ll see the services for the MongoDB Database and the container that will seed the aforementioned database.

Now that we’ve finished defining the foundations of the application, we will move on to creating the mongo directory, where we will define the Dockerfile for the dbseed service and the scripts for generating data.

Now that we have all of the infrastructure set up, let's move on to seeding our MongoDB database with dynamically generated data.

Before defining the database-seeding container, we’ll first focus on the actual generation of development data.

The folder structure of the data generation script and DB seeding container will match the following:
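One plausible layout (the exact file names are assumptions):

```text
mongo/
├── Dockerfile.dev
├── init.sh
└── data-gen/
    ├── index.js
    └── employees.js
```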

For this we’ll use two libraries: one to generate fake data (such as faker) and one to handle command-line arguments in JS files (such as yargs). They make both tasks incredibly simple.

We will have two main files for data generation: an index.js file, which will serve as the entry point, and an employees.js file, which will hold all of the data generation functions needed for employees.

index.js
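A sketch of the entry point, assuming the employee count is passed as a CLI argument and the output file is named employees.json:

```javascript
// index.js — entry point for data generation
const fs = require('fs');
const { generateEmployees } = require('./employees');

// Number of employees to generate, e.g. `node index.js 50`
const count = parseInt(process.argv[2], 10) || 20;

const employees = generateEmployees(count);

// mongoimport with --jsonArray expects a single JSON array
fs.writeFileSync('employees.json', JSON.stringify(employees, null, 2));
console.log(`Wrote ${employees.length} employees to employees.json`);
```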

employees.js
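A sketch of the generation functions, assuming the classic `faker` package and its `name`/`commerce` helpers:

```javascript
// employees.js — data generation functions for employees
const faker = require('faker');

function generateEmployee() {
  return {
    name: faker.name.findName(),
    title: faker.name.jobTitle(),
    department: faker.commerce.department(),
  };
}

function generateEmployees(count) {
  return Array.from({ length: count }, () => generateEmployee());
}

module.exports = { generateEmployee, generateEmployees };
```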

This script is then called by init.sh, a simple bash script that runs the mongoimport CLI command.

init.sh
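A sketch, assuming the file paths above and the `mongodb` service name (real setups may also need a loop that waits for the database to accept connections):

```shell
#!/bin/bash
# Generate the data, then import it into the mongodb service.
node data-gen/index.js 50

mongoimport --host mongodb \
  --db development \
  --collection employees \
  --type json \
  --file employees.json \
  --jsonArray
```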

Now that we’ve defined the scripts to generate and import the data, we can define the Dockerfile that will be utilized by Docker Compose.

Dockerfile.dev
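One possible sketch. The seeding image needs both node (to generate the data) and mongoimport (to load it), so this version starts from the mongo image and installs Node.js on top; the package.json is assumed to list the data-generation dependencies (e.g. faker):

```dockerfile
FROM mongo

# Install Node.js inside the (Debian-based) mongo image
RUN apt-get update && apt-get install -y nodejs npm

WORKDIR /seed

COPY package.json .
RUN npm install

COPY . .

CMD ["bash", "init.sh"]
```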

When generating development data for a MongoDB database, there are three primary concerns that must be considered:

- How the data will be imported (mongoimport vs. mongorestore)
- Developer-dependent data
- Data that spans multiple collections

Within this article, we only have to deal with the first one, but we will cover and discuss the other two as well.

The first concern is how the data gets imported. mongorestore is faster than mongoimport, but it only works with BSON data; this is what lets it run faster and preserve the metadata BSON provides, since unlike mongoimport, mongorestore doesn’t have to convert the data from JSON into BSON.

Why not go with mongorestore?

Mongorestore is faster and preserves BSON metadata. But I’d still advise utilizing mongoimport for development data because of the simplicity it provides.

Due to the flexibility of data it can receive, mongoimport is significantly easier to use compared to mongorestore. Unlike its faster alternative, mongoimport can directly import both JSON and CSV.

This allows us to write a simple script that generates an array of JSON objects, which can then be easily imported like so:
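For example, assuming the generated file is named employees.json and the default local connection:

```shell
mongoimport --db development --collection employees --type json --jsonArray --file employees.json
```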

There may be times when the generated development data should be related to developer-specific information.

For example, the developer has a specific logon (username and userId) and the generated data is user-specific.

Hence, in order for the developer to have the data generated for their specific account, there should be an optional JSON file that is only defined locally.

We can achieve this by creating a JSON file in the same folder as the data generation scripts. For example:

localuser.json
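For example (the field values here are placeholders):

```json
{
  "name": "Jane Developer",
  "userId": "jane-dev-123"
}
```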

Which can then be imported and used by the general data generation script as such:
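A sketch of that, again assuming the classic `faker` package:

```javascript
// employees.js
const faker = require('faker');
// Only defined locally — not checked into version control
const localUser = require('./localuser.json');

function generateEmployees(count) {
  const employees = Array.from({ length: count }, () => ({
    name: faker.name.findName(),
    title: faker.name.jobTitle(),
    department: faker.commerce.department(),
  }));

  // Create one employee based on the developer's own data
  employees.push({
    name: localUser.name,
    title: faker.name.jobTitle(),
    department: faker.commerce.department(),
  });

  return employees;
}

module.exports = { generateEmployees };
```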

Here you can see how we import the local user and then create an employee based on the provided data.

In this situation, we could also use destructuring to provide an easier way to override the generated data with an arbitrary number of properties. Like this:
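A self-contained sketch of that spread-based override (plain-JS random helpers stand in for the faker calls so the snippet runs on its own):

```javascript
// Plain-JS stand-ins for the faker calls used elsewhere
const pick = (arr) => arr[Math.floor(Math.random() * arr.length)];
const randomName = () => pick(['Ada Lovelace', 'Alan Turing', 'Grace Hopper']);
const randomTitle = () => pick(['Engineer', 'Designer', 'Manager']);
const randomDepartment = () => pick(['R&D', 'Sales', 'Support']);

// Any properties in `overrides` win over the generated ones,
// because the spread comes last in the object literal
function generateEmployee(overrides = {}) {
  return {
    name: randomName(),
    title: randomTitle(),
    department: randomDepartment(),
    ...overrides,
  };
}

const localUser = { name: 'Jane Developer', title: 'Staff Engineer' };
const employee = generateEmployee(localUser);
console.log(employee.name);  // 'Jane Developer'
console.log(employee.title); // 'Staff Engineer'
```

Because the spread is last, any number of properties can be overridden without touching the generator itself.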

But do note that the JSON key must match the properties defined in the ‘employee’ object. So to override the title and name property the localuser.json must look like this:
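For example:

```json
{
  "name": "Jane Developer",
  "title": "Staff Engineer"
}
```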

Let’s say that the company all of our employees are a part of gives each employee a computer. In that case, we would want to keep track of each computer the company owns and of the employee who currently has it.

Its schema would look a bit like this (ignore the overly simplistic example):
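A hedged sketch of such a Mongoose schema (the field names are assumptions):

```javascript
// models/Computer.js
const mongoose = require('mongoose');

const computerSchema = new mongoose.Schema({
  serialNumber: String,
  // Name of the employee who currently has the machine
  employeeName: String,
});

module.exports = mongoose.model('Computer', computerSchema);
```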

Hence, if we wanted to generate data for the computers the company owns we would have to utilize the names of the employees we generated.

This inter-collection example uses a computer schema that isn’t how it would actually be done in real life. It would probably make more sense as an embedded document within an employee document; the separate collection is used purely for simplicity’s sake.

We can do this by simply passing down the array of employees generated to the function that generates the computers.

This would look roughly like this:
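A self-contained sketch (plain-JS helpers stand in for faker; the serial-number format is made up for illustration):

```javascript
const pick = (arr) => arr[Math.floor(Math.random() * arr.length)];

function generateEmployees(count) {
  const names = ['Ada Lovelace', 'Alan Turing', 'Grace Hopper', 'Katherine Johnson'];
  return Array.from({ length: count }, (_, i) => ({
    name: names[i % names.length],
    title: pick(['Engineer', 'Designer', 'Manager']),
  }));
}

// Takes the already-generated employees so each computer can
// reference the name of the employee who currently has it
function generateComputers(employees) {
  return employees.map((employee, i) => ({
    serialNumber: `SN-${1000 + i}`,
    employeeName: employee.name,
  }));
}

const employees = generateEmployees(4);
const computers = generateComputers(employees);
console.log(computers.length); // 4
```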

Here, generateComputers is a function similar to generateEmployees, but it takes an extra parameter holding the data that belongs to a separate collection.

Congrats! Everything you need has now been hooked together, and the data should be in the database, with all of the names, titles, departments, etc. (except for the ones specified in localuser.json) randomly generated.

The final big-picture folder structure of the application should look roughly like this:
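Putting the pieces from each section together:

```text
.
├── api/
├── client/
├── mongo/
├── nginx/
└── docker-compose.yml
```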

Valencian Digital helps businesses and startups stay ahead of the curve through cutting-edge technology that integrates seamlessly into their business workflows.
