Columnar Databases

  • Columnar Storage – an explanation of what columnar storage is and when you might want it.
  • Actian Vector – column-oriented analytic database.
  • C-Store – column-oriented DBMS.
  • MonetDB – column store database.
  • Parquet – columnar storage format for Hadoop.
  • Pivotal Greenplum – purpose-built, dedicated analytic data warehouse that offers a columnar engine as well as a traditional row-based one.
  • Vertica – designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses.
  • Google BigQuery – Google’s cloud offering backed by their pioneering work on Dremel.
  • Amazon Redshift – Amazon’s cloud offering, also based on a columnar datastore backend.
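
To illustrate what the columnar entries above have in common, here is a minimal sketch of row-oriented versus column-oriented layout. It is a toy in plain Python, not how any of the listed systems is implemented; the sample records and the helper name to_columns are hypothetical.

```python
# Hypothetical sample records; field names and values are illustrative only.
rows = [
    {"id": 1, "region": "eu", "amount": 10.0},
    {"id": 2, "region": "us", "amount": 25.5},
    {"id": 3, "region": "eu", "amount": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict mapping column name to a list of values."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: each whole record is visited just to read one field.
row_total = sum(record["amount"] for record in rows)

# Column layout: only the "amount" column is scanned.
column_total = sum(columns["amount"])
```

Both totals are the same; the difference is that an aggregate over one field only has to read that field's values in the columnar layout, which is why analytic workloads tend to favor it.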

Graph Data Model

  • Apache Giraph – implementation of Pregel, based on Hadoop.
  • Apache Spark Bagel – implementation of Pregel, part of Spark.
  • ArangoDB – multi-model distributed database.
  • Facebook TAO – distributed data store widely used at Facebook to store and serve the social graph.
  • GCHQ Gaffer – framework that makes it easy to store large-scale graphs in which the nodes and edges have statistics.
  • Google Cayley – open-source graph database.
  • Google Pregel – graph processing framework.
  • GraphLab PowerGraph – a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API.
  • GraphX – resilient distributed graph system on Spark.
  • Gremlin – graph traversal language.
  • Infovore – RDF-centric Map/Reduce framework.
  • Intel GraphBuilder – tools to construct large-scale graphs on top of Hadoop.
  • MapGraph – massively parallel graph processing on GPUs.
  • Neo4j – graph database written entirely in Java.
  • OrientDB – document and graph database.
  • Phoebus – framework for large scale graph processing.
  • Titan – distributed graph database, built over Cassandra.
  • Twitter FlockDB – distributed graph database.
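
Several entries above (Giraph, Bagel, Pregel) follow the vertex-centric, superstep-based model. The following is a single-process sketch of that idea, propagating the maximum value through a graph. The function name pregel_max and the sample graph are hypothetical, and the sketch leaves out everything that makes the real systems distributed (partitioning, combiners, fault tolerance).

```python
def pregel_max(values, edges):
    """Propagate the largest vertex value to every connected vertex.

    values: dict mapping vertex -> initial number
    edges:  iterable of (a, b) pairs, treated as undirected
    Returns (final values, number of supersteps run).
    """
    neighbours = {v: set() for v in values}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    state = dict(values)

    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = {v: [] for v in values}
    for v in state:
        for n in neighbours[v]:
            inbox[n].append(state[v])
    supersteps = 1

    # Later supersteps: a vertex stays active only if a message raised its value.
    while any(inbox.values()):
        outbox = {v: [] for v in values}
        for v, messages in inbox.items():
            if messages and max(messages) > state[v]:
                state[v] = max(messages)
                for n in neighbours[v]:
                    outbox[n].append(state[v])
        inbox = outbox
        supersteps += 1

    return state, supersteps


# Hypothetical chain graph a - b - c - d.
final, steps = pregel_max(
    {"a": 3, "b": 6, "c": 2, "d": 1},
    [("a", "b"), ("b", "c"), ("c", "d")],
)
```

The computation halts when a superstep delivers no messages, which mirrors the "vote to halt" idea in this model: vertices that learn nothing new stop sending.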

Key-value Data Model

  • Aerospike – flash-optimized, in-memory NoSQL database. Open source, with “Server code in ‘C’ (not Java or Erlang) precisely tuned to avoid context switching and memory copies.”
  • Amazon DynamoDB – distributed key/value store, implementation of the Dynamo paper.
  • Edis – protocol-compatible server replacement for Redis.
  • ElephantDB – distributed database specialized in exporting data from Hadoop.
  • EventStore – distributed time series database.
  • LinkedIn Krati – simple persistent data store with very low latency and high throughput.
  • LinkedIn Voldemort – distributed key/value storage system.
  • Oracle NoSQL Database – distributed key-value database by Oracle Corporation.
  • Redis – in-memory key-value datastore.
  • Riak – a decentralized datastore.
  • Storehaus – library to work with asynchronous key-value stores, by Twitter.
  • Tarantool – an efficient NoSQL database and a Lua application server.
  • TreodeDB – key-value store that’s replicated and sharded and provides atomic multirow writes.

Key Map Data Model

  • Apache Accumulo – distributed key/value store, built on Hadoop.
  • Apache Cassandra – column-oriented distributed datastore, inspired by BigTable.
  • Apache HBase – column-oriented distributed datastore, inspired by BigTable.
  • Facebook HydraBase – evolution of HBase made by Facebook.
  • Google BigTable – column-oriented distributed datastore.
  • Google Cloud Datastore – fully managed, schemaless database for storing non-relational data over BigTable.
  • Hypertable – column-oriented distributed datastore, inspired by BigTable.
  • InfiniDB – accessed through a MySQL interface; uses massively parallel processing to parallelize queries.
  • Tephra – transactions for HBase.
  • Twitter Manhattan – real-time, multi-tenant distributed database for Twitter scale.

Document Data Model

  • Actian Versant – commercial object-oriented database management system.
  • Crate Data – open source, massively scalable data store that requires zero administration.
  • Facebook Apollo – Facebook’s Paxos-like NoSQL database.
  • jumboDB – document oriented datastore over Hadoop.
  • LinkedIn Espresso – horizontally scalable document-oriented NoSQL data store.
  • MarkLogic – schema-agnostic enterprise NoSQL database technology.
  • MongoDB – document-oriented database system.
  • RavenDB – transactional, open-source document database.
  • RethinkDB – document database that supports queries like table joins and group by.

Distributed Filesystem

Distributed Programming

  • AddThis Hydra – distributed data processing and storage system originally developed at AddThis.
  • AMPLab SIMR – run Spark on Hadoop MapReduce v1.
  • Apache Crunch – a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce.
  • Apache DataFu – collection of user-defined functions for Hadoop and Pig developed by LinkedIn.
  • Apache Flink – high-performance runtime with automatic program optimization.
  • Apache Gora – framework for in-memory data model and persistence.
  • Apache Hama – BSP (Bulk Synchronous Parallel) computing framework.
  • Apache MapReduce – programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
  • Apache Pig – high level language to express data analysis programs for Hadoop.
  • Apache REEF – retainable evaluator execution framework to simplify and unify the lower layers of big data systems.
  • Apache S4 – framework for stream processing, implementation of S4.
  • Apache Spark – framework for in-memory cluster computing.
  • Apache Spark Streaming – framework for stream processing, part of Spark.
  • Apache Storm – framework for stream processing by Twitter, also available on YARN.
  • Apache Samza – stream processing framework, based on Kafka and YARN.
  • Apache Tez – application framework for executing a complex DAG (directed acyclic graph) of tasks, built on YARN.
  • Apache Twill – abstraction over YARN that reduces the complexity of developing distributed applications.
  • Cascalog – data processing and querying library.
  • Cheetah – high-performance, custom data warehouse on top of MapReduce.
  • Concurrent Cascading – framework for data management/analytics on Hadoop.
  • Damballa Parkour – MapReduce library for Clojure.
  • Datasalt Pangool – alternative MapReduce paradigm.
  • DataTorrent StrAM – real-time engine designed to enable distributed, asynchronous, real-time in-memory big-data computations in as unblocked a way as possible, with minimal overhead and impact on performance.
  • Facebook Corona – Hadoop enhancement which removes single point of failure.
  • Facebook Peregrine – MapReduce framework.
  • Facebook Scuba – distributed in-memory datastore.
  • Google Dataflow – create data pipelines to ingest, transform and analyze data.
  • Google MapReduce – MapReduce framework.
  • Google MillWheel – fault tolerant stream processing framework.
  • JAQL – declarative programming language for working with structured, semi-structured and unstructured data.
  • Kite – set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.
  • Metamarkets Druid – framework for real-time analysis of large datasets.
  • Netflix PigPen – MapReduce for Clojure that compiles to Apache Pig.
  • Nokia Disco – MapReduce framework developed by Nokia.
  • Pinterest Pinlater – asynchronous job execution system.
  • Pydoop – Python MapReduce and HDFS API for Hadoop.
  • Rackerlabs Blueflood – multi-tenant distributed metric processing system.
  • Stratosphere – general purpose cluster computing framework.
  • Streamdrill – useful for counting activities in event streams over different time windows and finding the most active ones.
  • Tuktu – easy-to-use platform for batch and streaming computation, built using Scala, Akka and Play!
  • Twitter Scalding – Scala library for MapReduce jobs, built on Cascading.
  • Twitter Summingbird – streaming MapReduce with Scalding and Storm, by Twitter.
  • Twitter TSAR – TimeSeries AggregatoR by Twitter.
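
Many of the entries above build on or wrap the MapReduce programming model. As a rough illustration, here is a single-process word count split into map, shuffle and reduce phases. The function names are hypothetical and this is not the API of Hadoop or any listed library; a real framework runs each phase in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over an iterable of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input documents.
counts = word_count(["big data big tools", "data tools"])
```

Because the map and reduce functions are pure and operate on one record or one key at a time, a framework can spread them over many machines; libraries such as Scalding, Cascalog or Pydoop mainly differ in the language and abstractions they put on top of that model.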