Monday, February 26, 2007

MySQL - Real Database Or Not?

I hold these truths to be self-evident:
  • MySQL is not a real database
  • MySQL is one of the most successful database systems out there

Matt Asay has an interesting blog on his site regarding the claim that MySQL is not a real database. I've been arguing for a while that MySQL is the proof that most systems are not database-centric in nature. In my blog entry, Centricity, I define the difference between application-centric and database-centric systems.

As someone who spent their career in databases, I can very well understand why people argue that MySQL isn't a real database. The operational issues (re-issuing SQL statements as part of a recovery) and the non-deterministic behavior when certain, invalid operations are attempted make it easy to argue that MySQL doesn't fit with many of the large-scale operational requirements of enterprise IT organizations. MySQL, I'm sure, will "fix" many of these issues as time goes on but, for now, there are valid issues that prevent MySQL from tackling some of the large database-centric problems.

On the other hand, though, MySQL's popularity absolutely proves that most systems that are deployed on a database are not database-centric. MySQL's success, I believe, is related directly to the ease with which applications can be developed. This has led to many application templates being developed on top of MySQL which, in turn, has made it easier to develop even more applications. Generally speaking, application developers don't care about how recovery of a down database is performed.

IT managers do care, though, about how long it takes to recover a down or corrupted database. As I pointed out in my Centricity blog, the problem with Oracle is that they want you to pay the database-centric premium for all your database deployments and that premium is very large, indeed.

7 comments:

Eric Bergen said...
This comment has been removed by the author.
Eric Bergen said...

What are you referring to when you say, "re-issuing SQL statements as part of a recovery"? There are seveal backup/recovery methods for MySQL that don't involve replaying SQL statements.

Dave Dargo said...

I would be very happy to be educated/corrected or pulled up short on this one as I'm not a MySQL expert, internals or otherwise. My understanding is that when a roll forward operation is performed in MySQL it uses the binlog to know which SQL statements to "replay." Is this not correct?

The databases I've worked with keep track of physical changes to blocks and then re-apply those changes back at the block level when performing roll forward operations.

How does this work in MySQL?

Thanks,
Dave

Mark Callaghan said...

It is real. In recent MySQL releases the binlog is consulted on crash recovery to determine the outcome for pending transactions so that the binlog and storage engines are kept in sync, but the only data provided by the binlog is a list of transactions that have committed. Otherwise, crash recovery is up to the storage engine. My favorite, InnoDB, is an amazing piece of software. It does physical logging and has some features that Oracle does not: adaptive hash index, doublewrite buffer and insert buffer.

Ronald Bradford said...

Dave,

As you state you have a career in databases, I would like to see your full criteria for determining what you consider "a real database". To ask the question as you have in the title of this article, I feel you need to provide more information.

With this criteria, I would then ask to determine which "databases" as listed at http://en.wikipedia.org/wiki/Database
indeed satisfy your criteria to be called a "real database", as well as seek to know if additional criteria not within this definition of a database can be considered valid.

This is further complicated, as there may be circumstances where products that now satisfy your criteria previously didn't within their lifecycle.

Ronald

Dave Dargo said...

I'd hate to put myself in a position of judging whether or not something is a real database. Instead, allow me to describe my context and, what I believe, is many people's context for this discussion.

A "real" database in this context is one that can be used for general purpose storage of data in a reliable, secure, recoverable manner. Part of recoverability requires that the database can recover from any situation. A practical requirement would mean that recovery happens in as short a time period as possible. A database that simply replays SQL statements is unlikely to meet the requirements for recoverability in an enterprise-class operation.

Similarly, a database that is non-deterministic when a user attempts to do something that is "incorrect" is unlikely to meet the requirements of reliability for an enterprise.

Such a product may meet the general requirements or general definition of a data store but it would not meet the requirements of the context set by most enterprises for database operations.

As you highlight, there are many products that started out unable to meet the requirements of the enterprise but which matured over time and easily satisfy the requirements of even the most demanding enterprise today.

Similarly, the requirements that enterprises place on their databases has also been an evolving standard.

Dave

Anonymous said...

I have always said everything is a database. I don't assume a database needs to do more than preserve data for a moment. Probably because dBase was an early love and at its heyday quite the product; yet required so much programming. So your term real database probably should of been Enterprise grade... or 2007 Enterprise grade to peg the time domain ... or the old term not thought of fondly... 4gl (since it does so much without coding). I thought the Japanese were going to take over the world in programming cause of their use of 4gl... or was that 5.