In my previous post, we looked at an overview of the two technologies we are combining to create a SQL Server reporting server: Indexed Views and Replication. As we saw, replication allows us to move our data from one server to another and indexed views allow us to prepare our data to be used by reports ahead of time, dramatically increasing speeds. Now that we have an idea of what each of these two technologies does, let’s take a look at how they come together. In this explanation, we’re assuming that both transactional replication and our indexed view have already been set up. We’ll cover how that is done in later posts.
Generally, if we are trying to set up a reporting server, it is because our primary OLTP (Online Transaction Processing) database isn’t delivering the data fast enough or we want to scale out by moving part of our workload. Our OLTP server is where we will begin tracking our data, where replication will pick it up and eventually move it to our reporting server.
Continuing with our widgets example, imagine that a shipment has come into our warehouse and someone updates the database with a new inventory count for one of our widgets, a sonic screwdriver. In a normal OLTP database, this data gets updated in memory, written to the logs and eventually written to the data file. With our reporting server setup, this all still happens, but some additional processes are also at play.
A Log Reader Agent will watch the log of our OLTP server (which is called the Publisher in replication, because it is the source of our data) for any committed changes to data that we are replicating. When it sees that our sonic screwdriver has had its inventory count changed, it will copy that transaction, and any others that are relevant to replication articles, in a batch to the distribution database.
The distribution database is a completely separate database which can run on the same instance as the publisher, the subscriber’s instance (our reporting server) or on its own instance. It houses the guts of the transaction process and acts as a staging point for any data moving between publishers and subscribers. The update for our sonic screwdriver inventory will wait there until the Distribution Agent moves it to the subscriber.
The distribution agent can run in one of two places, depending how you’ve set up your system. It will run either at the distributor, called a push subscription (data is pushed from the distributor to the subscriber), or at the subscriber, called a pull subscription (data is pulled from the distributor to the subscriber). In either case, it will move data from the distribution database to the subscriber database. In our example, the distribution agent will move the update of our inventory count for our sonic screwdriver from the distribution database to the inventory table in the reporting database. This is the last step along the way that transactional replication handles. From here, our indexed view takes over.
At this point in the process, our inventory count has been updated by a user on our OLTP server, that transaction has been written to the logs where it was picked up by the log reader agent, which brought it to our distribution database. From the distribution database, it was pushed or pulled to our subscriber database by the distribution agent, where it is written into the replicated tables. As I stated in my last post, once we create an index on a view, that data will actually physically exist within our database. However, the underlying data will still exist in its respective tables.
When a table is updated, all of its indexes are updated as well (and in the case of a table with a clustered index, the index actually physically IS the table). This includes the indexes for indexed views. As you’ll see when we create our example indexed view, the data is processed by our view definition and saved in that state, and that is done as the data is being inserted. Think of it as running the new data through our report’s query definition and saving the results into an index. This means that when the distribution agent moves the data to the subscriber (our report database) and inserts it, our indexed view is also updated. This means that our reports are instantly up to date the moment that data reaches our reporting server, and ready to be read for that report your boss needs immediately! This is really the beauty of indexed views: having the data already processed and saved can increase report speeds dramatically. Reports can go from running for several minutes down to less than a few seconds.
Keep in mind that updating these indexes as the data is being inserted is extra overhead for our server, as extra pages need to be written. This is one place where having a separate reporting server really shines. We don’t have to worry so much about this write overhead because it is abstracted away from our primary OLTP server. Data is written quickly to the OLTP server and then carried over to our reporting server via replication, protecting our write heavy OLTP server from that overhead. The additional write overhead of the indexed views may slow the replication inserts slightly, but it is generally negligible and reports will be up to date within seconds of the data being updated on the OLTP server, in my experience.
In my next post, we will begin looking at how we actually set up our reporting server.