Skip to main content
eScholarship
Open Access Publications from the University of California

UC Irvine

UC Irvine Electronic Theses and Dissertations bannerUC Irvine

Scalable Fault-Tolerant Elastic Data Ingestion in AsterixDB

Creative Commons 'BY-ND' version 4.0 license
Abstract

In this thesis, we develop the support for continuous data ingestion in AsterixDB, an open-source Big Data Management System (BDMS) that provides a platform for storage and analysis of large volumes of semi-structured data. Data feeds are a mechanism for having continuous data arrive into a BDMS from external sources and incrementally populate a persisted dataset and associated indexes. The need to persist and index "fast-flowing'' high-velocity data (and support ad hoc analytical queries) is ubiquitous. However, the state of the art today involves 'gluing' together different systems. AsterixDB is different in being a unified system with "native support'' for data ingestion.

We discuss the challenges and present the design and implementation of the concepts involved in modeling and managing data feeds in AsterixDB. AsterixDB allows the runtime behavior, allocation of resources, and the offered degree of robustness to be customized (by associating an ingestion policy) to suit the application(s) that wish to consume the ingested data. Results from experiments that evaluate the scalability and fault-tolerance of the AsterixDB data feeds facility are reported. We include an evaluation of the built-in ingestion policies and study their effect as well on throughput and latency. An evaluation and comparison with a 'glued' together system formed from popular engines - Storm (for streaming) and MongoDB (for persistence) - is also included.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View