How to Use Monstache to Sync MongoDB and Elasticsearch in Real Time

Noc Folio3

4 years ago

Let it Sync! Using Monstache to sync MongoDB and Elasticsearch in real time.

Many of us have heard of Elasticsearch. It is an open-source NoSQL search engine that is commonly used to search and analyze data. That said, Elasticsearch is not recommended as a primary database, so we always need a separate database alongside it, and the two have to be kept in sync!

To sync Elasticsearch with relational databases there are tools such as JDBC and Logstash, along with many tutorials and articles on how to integrate them. Elasticsearch does not, however, provide the JDBC support that MongoDB would need, which leaves us with very few tools that can sync MongoDB and Elasticsearch:

Mongo Connector:
According to its GitHub:
“mongo-connector creates a pipeline from a MongoDB cluster to one or more target systems, such as Solr, Elasticsearch, or another MongoDB cluster.”
The drawback of Mongo Connector is that it does not support Elasticsearch 6+ very well. On top of that, its repository has not been updated for more than a year.
Transporter:
Transporter is a good tool for exporting data from MongoDB to Elasticsearch, but it does not provide real-time syncing.
Mongoosastic:
According to its GitHub:
“Mongoosastic is a mongoose plugin that can automatically index your models into elasticsearch.”
However, it is only useful when changes to MongoDB are made through the server; changes made directly in MongoDB will not be reflected in Elasticsearch.

Sync in real-time with Monstache!

Monstache is a sync daemon written in Go that syncs MongoDB collections into Elasticsearch in real time. With monstache you can index entire MongoDB collections into Elasticsearch, and after that initial indexing it keeps everything in sync.

How does it sync in real time?

Monstache reads the oplog of the MongoDB deployment it is connected to, and syncs every operation performed on MongoDB to Elasticsearch.
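
To make the idea concrete, here is a minimal sketch of how an oplog-driven sync could decide what to do with each entry. It is an illustration only, not monstache's actual code: the OplogEntry type and the esAction function are hypothetical names, and a real oplog entry carries more fields than the two shown. The op codes ("i" for insert, "u" for update, "d" for delete) are the ones MongoDB writes to its oplog.

```go
package main

import "fmt"

// OplogEntry is a simplified, hypothetical view of a MongoDB oplog record.
// Real entries carry more fields (timestamp, document body, and so on).
type OplogEntry struct {
	Op string // "i" = insert, "u" = update, "d" = delete
	NS string // namespace in "database.collection" form
}

// esAction names the Elasticsearch operation a sync tool could issue for
// an oplog entry. It returns "" for entries this sketch ignores, such as
// "n" (no-op) and "c" (command) entries.
func esAction(e OplogEntry) string {
	switch e.Op {
	case "i", "u":
		return "index"
	case "d":
		return "delete"
	default:
		return ""
	}
}

func main() {
	entries := []OplogEntry{
		{Op: "i", NS: "db.collection"},
		{Op: "u", NS: "db.collection"},
		{Op: "d", NS: "db.collection"},
		{Op: "n", NS: ""},
	}
	for _, e := range entries {
		if action := esAction(e); action != "" {
			fmt.Printf("%s on %s -> %s\n", e.Op, e.NS, action)
		}
	}
}
```

Because every write ends up in the oplog no matter which client made it, this approach also catches changes made directly in MongoDB, which is exactly where Mongoosastic falls short.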

You will need to ensure that MongoDB is configured to produce an oplog by deploying a replica set. If you haven’t already done so, follow the 5-step procedure to initiate and validate your replica set. For local testing, your replica set may contain a single member.

Before we move forward, let me mention some of the features of monstache, as listed in its documentation:

Supports up to and including the latest versions of Elasticsearch and MongoDB
Single binary with a light footprint
Support for MongoDB change streams and aggregation pipelines
Pre-built Docker containers
Optionally filter the set of collections to sync
Direct read mode to do a full sync of collections in addition to tailing the oplog
Transform and filter documents before indexing using Golang plugins or JavaScript (what I like the most!)

Getting Started

Prerequisites: a Golang setup on your system (Windows, Mac or Linux)

Please note that you don’t need to download Monstache separately; all you need to do is check out whichever Monstache tag (version) suits you. Trust me, that’s the best way to run monstache. You will see why later in this write-up.

If you want to see which monstache version is suitable for you, visit “Which Version Should I Use”.

Now let’s start with running monstache.

First off, clone the monstache repo to your system.
Check out the appropriate tag (version) for your MongoDB and Elasticsearch versions, as mentioned above in “Which Version Should I Use”. For me that would be:
$ git checkout v6.4.3
Run go install from inside the repo, which builds and installs the monstache binary.
After you have run the above command, you have successfully set up monstache on your PC. Just run monstache -v to verify it.
Now that monstache is set up, let’s start with the monstache.toml file, which is the configuration file for monstache. A few necessary settings are:
1. mongo-url = "your-mongo-db-connection-string"
2. elasticsearch-urls = ["your-elastic-search-url"]
3. replay: When replay is true, monstache replays all events from the beginning of the MongoDB oplog and syncs them to Elasticsearch.
4. resume: When resume is true, monstache writes the timestamp of the MongoDB operations it has successfully synced to Elasticsearch to the collection monstache.monstache. It also reads that timestamp from that collection when it starts, in order to replay events it might have missed while monstache was stopped.
5. resume-name: monstache uses the value of resume-name as an id when storing and retrieving timestamps to and from the MongoDB collection monstache.monstache. The default value for this option is the string default.
6. namespace-regex: When namespace-regex is given, this regex is tested against the namespace, database.collection, of any insert, update or delete in MongoDB. Only operations whose namespace matches are synced.
7. direct-read-namespaces: This option allows you to directly copy collections from MongoDB to Elasticsearch. You need this option if you want to index all of your existing data from MongoDB into Elasticsearch.
8. mapping: used to override the default index and type. See the section Index Mapping for more information.
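
Since monstache is written in Go, a namespace-regex can be tried out with Go’s standard regexp package before it goes into the config. The snippet below is only a sketch of the matching step; matchesNamespace is a hypothetical helper, not a function from monstache.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesNamespace reports whether a "database.collection" namespace
// matches the given pattern. Hypothetical helper for trying out a
// namespace-regex value; it is not part of monstache.
func matchesNamespace(pattern, namespace string) (bool, error) {
	return regexp.MatchString(pattern, namespace)
}

func main() {
	pattern := `^db\.collection$`
	namespaces := []string{
		"db.collection",         // exact match
		"db.collection_archive", // rejected by the trailing $
		"dbxcollection",         // rejected because the dot is escaped
		"other.collection",      // rejected by the leading ^
	}
	for _, ns := range namespaces {
		ok, err := matchesNamespace(pattern, ns)
		if err != nil {
			fmt.Println("invalid pattern:", err)
			return
		}
		fmt.Printf("%-22s sync=%v\n", ns, ok)
	}
}
```

Note the escaped dot and the ^ and $ anchors: without them, a pattern such as db.collection would also match dbxcollection and db.collection_archive.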

There are many more configuration options for monstache, depending on your needs. You can find them in the official documentation.

Our monstache.toml file will look something like this:

mapper-plugin-path = "/app/plugin.so" # plugin path

mongo-url = "mongodb-connection-string"

elasticsearch-urls = ["http://es7:9200"]

elasticsearch-max-conns = 10

replay = false

resume = true

enable-oplog = true

resume-name = "my-resume-name"

namespace-regex = '^db\.collection$' # the namespace I want to sync

direct-read-namespaces = ["db.collection"] # copy all existing data from this namespace to es

index-as-update = true # upsert docs

verbose = true # logs enabled

exit-after-direct-reads = false # don't exit after copying from db to es

[[mapping]]
namespace = "db.collection" # the db collection I want to sync from
index = "my-es-index-name" # the index I want to sync data into
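
To illustrate what the [[mapping]] table does, here is a small sketch of the lookup it implies. It assumes, based on the Index Mapping section of the documentation, that without a mapping the index name falls back to the namespace in lowercase. The Mapping type and the indexFor function are hypothetical names for illustration and are not taken from monstache’s source.

```go
package main

import (
	"fmt"
	"strings"
)

// Mapping mirrors one [[mapping]] table from monstache.toml.
// Hypothetical type, used only for this illustration.
type Mapping struct {
	Namespace string
	Index     string
}

// indexFor returns the index a namespace would be written to: the mapped
// index when a mapping exists, otherwise the lowercased namespace
// (assumed default, see the lead-in).
func indexFor(namespace string, mappings []Mapping) string {
	for _, m := range mappings {
		if m.Namespace == namespace && m.Index != "" {
			return m.Index
		}
	}
	return strings.ToLower(namespace)
}

func main() {
	mappings := []Mapping{
		{Namespace: "db.collection", Index: "my-es-index-name"},
	}
	fmt.Println(indexFor("db.collection", mappings)) // mapped namespace
	fmt.Println(indexFor("db.Other", mappings))      // falls back to the default
}
```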

Now, what about the plugin?

Monstache supports middleware between MongoDB and Elasticsearch, which makes it possible to manipulate and filter documents on their way from MongoDB to Elasticsearch. Middleware may be written either in JavaScript or in Golang as a plugin. We will discuss Golang here because it is the recommended option.

Below is a simple plugin example:

package main

import (
    "strings"

    "github.com/rwynn/monstache/monstachemap"
)

// Map is the function monstache calls for each document.
// This plugin converts top-level string values to uppercase.
func Map(input *monstachemap.MapperPluginInput) (output *monstachemap.MapperPluginOutput, err error) {
    doc := input.Document
    for k, v := range doc {
        // Only string values are changed; other types pass through as-is.
        if s, ok := v.(string); ok {
            doc[k] = strings.ToUpper(s)
        }
    }
    output = &monstachemap.MapperPluginOutput{Document: doc}
    return
}
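
Because a plugin only loads inside monstache, it can be handy to keep the transformation in a plain function and try it on a sample document first. Below is a minimal sketch of that idea using only the standard library; upperStrings is a hypothetical helper that mirrors the loop in the plugin above and is not part of monstache.

```go
package main

import (
	"fmt"
	"strings"
)

// upperStrings uppercases every top-level string value in doc, in place,
// mirroring the loop in the plugin. Nested documents, arrays and
// non-string values are left untouched.
func upperStrings(doc map[string]interface{}) map[string]interface{} {
	for k, v := range doc {
		if s, ok := v.(string); ok {
			doc[k] = strings.ToUpper(s)
		}
	}
	return doc
}

func main() {
	doc := map[string]interface{}{
		"name":  "monstache",
		"count": 3,
		"tags":  []interface{}{"go", "sync"},
	}
	fmt.Println(upperStrings(doc))
}
```

The plugin’s Map function could then simply call such a helper on input.Document, which keeps the part that depends on monstache very thin.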

The main question that arises is how to build the plugin: it must be built against the same version of monstache that you want to use it with.

And… this is the reason why we don’t download monstache separately and instead check it out from Git: we are going to use this cloned repository to build our plugin as well.

If you install monstache separately and try to use the plugin with it, you will get this error:

Plugin package error: built with a different version of package github.com/globalsign/mgo/internal/json

Even if your monstache version is the same as the plugin. You. will. get. this. Error.

So the plugin build steps are:

cd into the cloned monstache repository (at the tag you checked out)
Type:

go build -buildmode=plugin -o path/to/save/myplugin.so /path/to/myplugin.go

Congratulations! You have built your plugin.

Now you can finally run monstache using:

monstache -f path/to/monstache.toml

Note: we have already given the plugin’s path inside the toml file.

Conclusion:

Monstache has been a lifesaver for us and has made keeping everything synced a piece of cake. This article covers only a small part of monstache; as a whole, it is a very powerful tool.

All the information here about monstache was collected from the official Monstache documentation.