DiamondStream builds and runs applications for the travel and entertainment industry, with a special focus on the casino business. Recently, we moved both our product development and services business from traditional Microsoft SQL Server to Redshift, the multi-node, columnar database hosted by Amazon Web Services (AWS). As part of that decision, we also moved from an on-premises solution to the cloud. The cloud offered us more flexibility, better scalability and lower cost. Most medium-sized businesses with BI and analytics needs should find the story interesting.

Our Situation

At DiamondStream we do two things: build and operate analytics software-as-a-service applications for the casino industry, and provide services for Travel, Entertainment and eCommerce companies that wish to build their own internal applications. Philosophically, we have always focused our recruitment and people investment on insight and client value-added vs. infrastructure provision and maintenance.

For this reason, we chose Birst for our data storage and analytics needs when we started the company. It provided us with an integrated business intelligence stack that sat on top of Microsoft SQL Server in their proprietary cloud. This helped us get up and running fast, limited our need to manage infrastructure and kept our capital and operating costs low.

The applications we built on this platform – Casino Share Intelligence and ACE, our campaign management tool for acquisition – worked effectively to manage terabytes of data for hundreds of users. As time went on, however, we faced increasingly complex data integration challenges as we added casino loyalty club data, social media data and social gaming data to our core payments and proprietary data sources. Our data sizes were set to grow rapidly, and we had to work hard to keep up with performance requirements. In addition, we needed to keep track of an expanding set of partnerships and 3rd-party relationships, which created a challenging master data problem.

Moving to the AWS Cloud Environment

To meet these challenges, we made three major infrastructure decisions:

  1. Move from the Birst cloud to a Birst implementation on AWS
  2. Migrate from Microsoft SQL Server to Redshift, AWS’s new multi-node, columnar database, and
  3. Add a master data management (MDM) tool from Semarchy to ensure we had one version of the truth with respect to patron data that we could keep track of seamlessly

Our move to AWS remained consistent with our philosophy of keeping our people focused on our value-added rather than infrastructure build. At the same time, however, it gave us significantly improved flexibility to incorporate pre-processing steps in our application for geo-spatial analysis and master data management. It also enabled us to push reports to end-users with fewer manual steps using the AWS Simple Workflow toolkit. In fact, the MDM system we chose ran on a managed service version of Oracle in the AWS environment.

The move to Redshift also significantly improved dashboard query performance. Before building any aggregates, we benchmarked our existing hardware against a two-node Redshift cluster that cost a quarter as much per month. Redshift ran our queries ~200% faster than the traditional SQL Server we had been using.

[Chart: dashboard query performance, existing SQL Server hardware vs. two-node Redshift cluster]

Given the ease with which you can spin up additional nodes, we could easily test the cost/benefit tradeoffs of simply adding processing capacity when needed vs. building additional aggregates, which took significant time and labor. We also adopted a strategy of buying only the capacity we needed for our processing demand at the time.

We learned a few things as well. Due to its parallel nature, Redshift prefers to ingest load files in smaller chunks than SQL Server does. We needed to parallelize the process of breaking up the source files to ensure loading did not become a bottleneck.
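The idea can be sketched as follows: split a large extract into several gzipped part files sharing a common name prefix, so that a single Redshift COPY pointed at that prefix can load the parts across the cluster's slices in parallel. This is a minimal illustration, not our production loader; the file names, part count and S3 prefix are all hypothetical.

```python
# Sketch: break one large delimited extract into gzipped parts so a single
# Redshift COPY (against a shared S3 prefix) can load them in parallel.
# File names, part count, and the S3 prefix below are illustrative assumptions.
import gzip
import os

def split_for_copy(src_path, out_dir, parts=8):
    """Round-robin source lines into `parts` gzipped part files."""
    os.makedirs(out_dir, exist_ok=True)
    outs = [gzip.open(os.path.join(out_dir, f"extract_part{i:02d}.gz"), "wt")
            for i in range(parts)]
    try:
        with open(src_path) as src:
            for n, line in enumerate(src):
                outs[n % parts].write(line)
    finally:
        for f in outs:
            f.close()
    return [os.path.join(out_dir, f"extract_part{i:02d}.gz")
            for i in range(parts)]

# After uploading the parts to S3, one COPY against the common prefix
# loads all matching files in parallel, along the lines of:
#   COPY facts FROM 's3://my-bucket/loads/extract_part'
#   CREDENTIALS '...' GZIP DELIMITER '|';
```

Because the splitting itself is embarrassingly parallel, several extracts can be chunked concurrently, which is what kept the load step from becoming the bottleneck.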

We also gained a big advantage in scalability. Many of the pre-processing steps needed processing capacity only intermittently, so we used that capacity only when needed. For the Redshift database itself, we developed an architecture that would add nodes only when demand required them.

All of these changes enabled us to cut our baseline software and hosting costs in half, while improving our flexibility, performance and scalability. Most importantly, we have set up the capabilities we needed to dramatically grow our business while staying focused on our core business value.