Some background:

One of the features of our project is a script management system. We initially built it with nodes in Drupal, plus some third-party modules. It achieved our goals, and users could use it to get their jobs done. But we all know that node operations in Drupal are expensive and a bit slow, and users had to bounce from one page to another and back again. As the data grew, I decided it was time to rebuild the feature and give users a more graceful way to work. I spent several days building a Node.js backend that exposes some HTTP APIs, made a modern asynchronous user interface with Backbone.js, and stored the data in MongoDB (an old choice). The new system was not difficult to develop, and it is well tested. Then I had to migrate the old data from the MySQL/Drupal system into the new Node.js/MongoDB system.

Some choices:

When you face a problem, there are always several solutions. I had two choices for my migration task: write a script at the database level that fetches data from MySQL and inserts it into MongoDB; or generate the data from the higher-level Drupal API and feed it to the new system via the new HTTP API. The first option would mean a lot more code to write, but the script would run faster. Since this is a one-time task, I chose the second way: write less code.

So I spent 2 days on coding and testing the migration script…

Here are some notes from my experience with the data migration:

How to develop a script in a Drupal system

A Drupal script runs like a normal web page, but it is started from the command line and takes longer to finish. You can run a Drupal script like this:

cd /your/drupal/site/location
nohup php drupal_script.php > output.log &

This way, the Drupal script runs in the background and is not interrupted when you log out. The script's output is written to the file output.log.

A drupal_script.php should be written like this:

How to communicate with the HTTP API from Drupal

Drupal provides the function drupal_http_request, which makes it easy to send HTTP requests. I had to convert the JSONP result into a PHP array.
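
The conversion itself was done in PHP on the response returned by drupal_http_request. As a rough sketch of the same idea in JavaScript (the language of the new backend), the JSONP padding can be stripped so that the payload parses as plain JSON. The function name unwrapJsonp and the callback name cb are hypothetical and are not taken from the original script.

```javascript
// Hypothetical helper: strip JSONP padding such as `cb({...});` and parse the JSON inside.
function unwrapJsonp(body) {
  const text = String(body).trim();
  // Match `name( ... )` with an optional trailing semicolon; capture what is inside the parentheses.
  const match = text.match(/^[\w$.]+\s*\(([\s\S]*)\)\s*;?$/);
  // If there is no padding, assume the body is already plain JSON.
  const json = match ? match[1] : text;
  return JSON.parse(json);
}
```

For example, unwrapJsonp('cb({"id": 7});') returns an object whose id property is 7, and a body that is already plain JSON is parsed unchanged.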

How to bypass the auth layer of the new system

I copied the API code from the new system and made a minor change: a new UID parameter. The new system then recognizes the user from that parameter in the GET request alone, without an auth token or cookie.

req.uid = parseInt(req.query.cid, 10);
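
A minimal sketch of how that one-line change might sit inside a handler, assuming the new system follows the Express-style (req, res, next) middleware convention; the post does not show the surrounding code. The name uidFromQuery and the 400 response are assumptions for illustration, and a bypass like this only makes sense in a temporary copy of the API used for the migration.

```javascript
// Hypothetical Express-style middleware: identify the user from a query parameter
// instead of an auth token or cookie.
function uidFromQuery(req, res, next) {
  // The original snippet reads the parameter named `cid`.
  const uid = parseInt(req.query.cid, 10);
  if (Number.isNaN(uid)) {
    // Reject requests that do not carry a usable numeric id.
    res.statusCode = 400;
    res.end('missing or invalid user id');
    return;
  }
  req.uid = uid;
  next();
}
```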

Watch the migration progress

Since the output of the Drupal script goes to output.log, we can watch that log
while the script is running (for example with tail -f output.log), and we can also watch the logs generated by Node.js.

Lessons learned

The way I migrated the data is not very fast: only ~50 records per second. But it took us just ~1 hour to run the script, and there was less code to write. I did not have to write tons of code or spend lots of time testing it.
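
Those two figures allow a quick back-of-the-envelope check before starting a long run. The sketch below divides a record count by the measured throughput. The count of 180,000 records is not stated in the post; it is an assumption derived from ~50 records per second sustained for ~1 hour, and the function name estimateDuration is hypothetical.

```javascript
// Estimate how long a migration will take from the record count and the measured throughput.
function estimateDuration(recordCount, recordsPerSecond) {
  if (!(recordsPerSecond > 0)) {
    throw new RangeError('recordsPerSecond must be positive');
  }
  const seconds = recordCount / recordsPerSecond;
  return { seconds, minutes: seconds / 60, hours: seconds / 3600 };
}

// Assumed figure: 180,000 records, i.e. roughly 50 records/s for about one hour.
const estimate = estimateDuration(180000, 50);
console.log(estimate.hours); // 1
```

Measuring the throughput on a small batch first gives the second argument; the estimate is only as good as that measurement.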

  • Stop all other operations on the server while migrating data.
  • Skip over errors and limit their impact: log each failure, carry on, and deal with the failed records afterwards.
  • MongoDB uses a lot of CPU during these operations.
  • A long-running PHP script uses more and more memory; keep an eye on it.
  • Build the system FAST, then make it run fast.
  • Test your code on a small scope first, then on a larger one.
  • Estimate in advance how long the script will take to run.
  • Last, Drupal is great for front ends and prototyping, but not for APIs; Node.js is FAST and fast.
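
The advice above about skipping errors and testing on a small scope can be sketched as a loop that logs each failure and keeps going. The original migration script was written in PHP; this JavaScript version only illustrates the pattern. The names migrateAll and sendRecord are hypothetical, and sendRecord stands for an assumed async function that posts one record to the new HTTP API.

```javascript
// Hypothetical sketch: migrate records one by one, logging failures instead of aborting.
async function migrateAll(records, sendRecord, log = console.error) {
  const failed = [];
  for (const record of records) {
    try {
      await sendRecord(record);
    } catch (err) {
      // Keep the failure local: log it, remember the record, and continue.
      log(`record ${record.id} failed: ${err.message}`);
      failed.push(record);
    }
  }
  return { total: records.length, failed };
}
```

Calling it first with a small slice, such as migrateAll(records.slice(0, 100), sendRecord), is a cheap way to test on a small scope before the full run, and the returned failed list can be retried once the cause is understood.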

