Very simply put, database management is knowing how to access, configure, debug, and manage your database. For some website owners, this isn’t really a concern. Even developers rarely need to poke around with the database, and for many of us, it’s just a piece of software that someone configured once and, as many of us like to say, “We’ll never need to touch it again.” This is a typical understanding of database management, especially in the context of WordPress, with the real ins-and-outs of the entire foundation of your website going largely ignored or, at the very least, just not really being thought about.
Now is a good time to stop and explain exactly what a database is. Ready? Wait for it… it’s a base… for your data.
Okay, terrible jokes aside, a database is where all of the persistent data for your website live. Depending on how and where your website is located, the database may be on its own server, or it may be sharing the server with your web server. It could be a Relational Database Management System (RDBMS), such as MySQL, Oracle, or MS SQL, to name a few, or it could be a NoSQL database such as CouchDB or Cassandra. This particular article will deal with RDBMS and MySQL/MariaDB in particular since it’s the most widely used RDBMS for WordPress.
Speaking of SQL, what is it? SQL stands for Structured Query Language, and it’s how both humans and programs communicate with the database. Within SQL are all the commands you need to get, modify, create, and delete information. The other good thing about SQL is that it’s a standard adopted by most RDBMSes on the market.
For instance, I can write a simple query that will work in MySQL, PostgreSQL, MS SQL, and Oracle, with little or no tweaks. A “gotcha” here is that all of these RDBMSes implement their own extensions on top of the SQL standards they support, and while most are SQL or ACID compliant, the level of compliance varies between products and even between versions of the same product. RDBMS-specific features include things like Flashback queries in Oracle or tsvector queries in PostgreSQL. Generally, when looking up SQL documentation, it’s best to try and find solutions specific to your RDBMS.
So, why is it important to learn about your database? For starters, we’re in a world where things go wrong all the time. “It’s easy—just one button and you’re done,” or, “It’s a DNS switch, nothing major,” or, “Just push the patch live and see what happens. I’m sure it won’t do any harm.”
I estimate around $10,421 in nickels for every time I’ve heard those phrases and something went wrong.* When things go south, understanding even how to simply access the database can be crucial to getting your site back in working order. Other reasons you may want to really know how things work in your database include:
- You decide to move to a new hosting provider and you need backups
- There’s a change that would be tedious through your CMS but trivial in the database
- Troubleshooting odd behavior or data
- Monitoring your database when identifying sluggish behavior in your website
- Making new friends**
- When you’re developing a new feature and want to understand the relationship of data in your application
There are plenty of other reasons you may want to look around in the database, and once you are familiar with what it does and how it works, you can feel (mostly) confident that you can peruse the raw data without breaking anything. For some tasks myself, I find it much easier to write a quick SQL query that gets the data I need instead of digging through WordPress trying to find it. This is especially true if you work on several projects throughout your day or week and need to do a lot of context switching.
There are plenty of tools out there to access your database, and I’ll outline a few here:
- Sequel Pro is pretty much the de facto SQL client for MacOS. It’s free, open-source, and very reliable
- Navicat is a paid product for accessing various RDBMS servers and has products for Windows, MacOS, and Linux
- HeidiSQL is a free, open-source SQL client for Windows
- phpMyAdmin is another open-source SQL administration client with a twist – it runs on your web server instead of being a standalone application
- The command-line, which most, if not all, SQL servers provide a set of command-line tools to interface with your database. It takes more knowledge of SQL to get up and running, and to be able to do anything, but some people prefer the simplicity of the terminal over sometimes-frilly GUI clients, like the ones above.
Once you have a SQL client installed, you can set up a connection to your SQL server. Depending on where your site is hosted, you may need to contact your site administrator to request this information. If you have a development server (and you should), then pointing your client to the dev server will give you the freedom to poke around without worrying about destroying your site in the process. Which leads us to the next topic…
Any time you are dealing with a database, whether you want to alter a single row or try out some new tricks, take a backup. All of the above programs offer options to export your data to a SQL file. Backups will save you in a crisis, and give you peace of mind while digging into your database. As a rule of thumb for myself, I won’t do any operation on live production data unless I have a backup. Sometimes it’s easy to write a query with two parameters swapped, or missing a clause to limit the query’s scope, and a backup is about the only “undo” button you have in those situations.
Further, your hosting provider should provide a means of taking regular backups. If this isn’t enabled by default, be sure to set it up; if it’s not available, ask your provider for tips on how to connect to your database so that you can take your own backups. If your site is important enough to you, however, and your provider doesn’t provide automatic backups, it may be worth looking to host your site elsewhere.
For WordPress, you can read more about backups on the codex entry for Database Backups.
I’m only briefly going to touch on caching, since it’s a bit technical to set up and beyond the scope of this post, that said: caching is great, especially if you have a site with a large volume of content or even just lots of users.
In the WordPress world, plugins like W3 Total Cache or Rocket Cache can provide caching solutions to improve your site’s performance and take some of the strain off of your database. Another important note about the WordPress database and caching is knowing what functions are cached by WordPress and which cause database hits. Some methods in WordPress force a fresh lookup of data, which can slow down your application. These methods usually have counterparts that are cached and should usually be used instead. An example of one such function is wp_get_object_terms, which suggests using get_the_terms() instead unless you absolutely need fresh data (caching in this sense refers to data cached in the current session, so a fresh page load would still pull uncached data on the first call of wp_get_object_terms). Always be sure to carefully read the documentation to ensure the method you’re using is proper and performant for the task.
Most of what I’ve covered so far is pretty WordPress-agnostic, so now I’m going to touch on the two together. WordPress was written for MySQL, and although there is some support for alternatives, odds are you’re either running MySQL or MariaDB (a fork of MySQL) if you’re running WordPress. MySQL is well-documented, and whilst MariaDB is a fork, it has its own documentation which covers the same ins-and-outs as MySQL, while providing information specific to MariaDB.
You may have configured your database when you first installed WordPress, and you can find out about your database credentials by looking at your project’s “wp-config.php”, where you can find the DB_NAME, DB_USER, DB_HOST, and DB_PASSWORD constants. More reading on this can be found on the WordPress Codex.
The database is at the core of WordPress. It’s where your posts, tags, users, theme settings, everything is kept. When you start looking at how the database is structured, you can see why WordPress is built on a Relational Database Management System: Posts have IDs, which connect to the wp_postmeta table (on the post_id column), and taxonomy terms are related via the object_id column on the “wp_term_relationships” table. Meanwhile, the comments table is linked back to posts, and even attachments (which are posts as far as the database is concerned) reference their parent post through relational data.
Side note: if you’ve been doing Database Things™ for a while now, you’re probably wondering why WordPress doesn’t really make use of things like foreign key constraints or other key-based relational data. The reason mostly seems to be from the fact that WordPress defaulted to the MyISAM engine for its tables. While offering better performance for things like largely static record reads (what you expect on a blog), it lacked certain features such as Foreign Key Constraints.
For multisite, you’ll notice that most of the tables in standard WordPress are “duplicated” in a sense for the subsites, with prefixes containing the site’s ID. So if your blog “food.coolblogs.com” is site ID 3, its tables will be prefixed with wp_3_. Note that some tables, like wp_users, are shared across all sites, with WordPress using meta keys to differentiate user roles across sites.
- Backups – I may have touched on this previously, but it never hurts to reiterate: always take a backup before touching the database
- Transactions – Typical WordPress installations using MyISAM engines do not support transactions, so be careful when executing SQL queries in WordPress and test your queries on your local machine or a dev instance before running them in production
- Filters – Practically every part of the queries that get built by WordPress internally for WP_Query can be filtered and adjusted as needed:
- posts_fields – Filters the SELECT clause of the query
- posts_search – Filters the search SQL that is used in the WHERE clause of WP_Query
- posts_search_orderby – Filters the ORDER BY used when ordering search results
- posts_where – Filters the WHERE clause of the query
- posts_join – Filters the JOIN clause of the query
- Serialized data – WordPress makes heavy usage of serialized data to store objects in the database. If you are ever doing large data operations directly on the database, such as replacing one domain with another in a column, it’s important to ensure the integrity of the serialized data afterward. There are several plugins and code snippets on the web that can help you fix things up again. WP Engine also has a blog post explaining serialization and potential gotchas. (Disclaimer: While we work with WP Engine on several projects, I found that blog post organically via Google.)
- Monitor your performance – Plugins like Query Monitor can give you an under-the-hood look at what WordPress is doing when it works with the database, and can provide feedback that can be used to tweak for performance, reveal bottlenecks, or uncover bugs in custom query implementations. Similarly, you can use command-line tools like mytop to get an overview of the load on your database server.
For further reading on WordPress and the Database, check out the codex page for the Database Description.
There’s a lot to learn about WordPress and the database, and if you’re willing, you can get an in-depth understanding of how the pieces fit and better manage your data. There’s a certain feeling of empowerment in understanding the database, and being able to know that you don’t need to panic when something goes wrong makes dealing with fires a lot less stressful.
If like most of us you’re busy and don’t have time to dig into all of this, then the fine folks here at WebDevStudios can help you manage and optimize your database.
* These numbers are best-guess, and I’ve probably actually heard those phrases far more than 2000 times.
** A SQL query walks into a bar, finds two tables and asks, “May I JOIN you?”