Skip to main content
eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations bannerUC Berkeley

Scale-Independent Relational Query Processing

Abstract

An increasingly common pattern is for newly-released web applications to succumb to a "Success Disaster". In this scenario, overloaded database machines and resultant high response times destroy a previously good user experience, just as a site is becoming popular. Unfortunately, the data independence provided by a traditional relational database system, while useful for agile development, only exacerbates the problem by hiding potentially expensive queries under simple declarative expressions. The disconnect between expression complexity and runtime cost often leads developers to mistrust the suitability of relational database systems for their web applications in the long term. As a result, developers of these applications are increasingly abandoning relational systems in favor of imperative code written against distributed "NoSQL" key/value stores, losing the many benefits of data independence in the process.

While some claim that scalability issues are inherent in the use of the relational model, this thesis challenges that notion by extending standard data independence with the notion of scale independence. In contrast to traditional relational databases, a scale-independent system is capable of providing predictable response time for all of the queries in an application, even as the amount of data grows by orders of magnitude. This predictability is achieved by compile-time enforcement of strict upper bounds on the number of operations that will be performed for all queries. Coupled with a service level objective (SLO) compliance prediction model and a scalable storage architecture, these upper bounds make it easy for developers to write {em success-tolerant} applications that support an arbitrarily large number of users while still providing acceptable performance.

Statically bounding the amount of work required to execute a query can be easy for some queries, such as those that perform a lookup of a single record by primary key. However, such simple queries are generally insufficient for the construction of complex, real-world applications. Therefore, to enable successful development of such applications, this thesis defines four levels of scale-independent execution that greatly expand the space of queries that can be guaranteed to be scalable a priori. Each scale-independent execution level leverages increasingly sophisticated techniques, ranging from extra cardinality information in the schema to incremental precomputation, to ensure that the performance of the application will not change as the amount of stored data grows. Furthermore, developers can use the levels to reason about the resource requirements of each query that is run by their application.

In addition to presenting the theory of scale independence, this thesis describes PIQL, a actual implementation of a scalable relational engine. Using the PIQL system, I present an empirical analysis of scale independence that includes all the queries from the TPC-W benchmark and validates PIQL's ability to maintain nearly constant high-quantile query and update latency, even as an application scales to hundreds of machines.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View