Data Visualization

  • Airpal – Web UI for PrestoDB.
  • Arbor – graph visualization library using web workers and jQuery.
  • Banana – visualize logs and time-stamped data stored in Solr. Port of Kibana.
  • Bokeh – A powerful Python interactive visualization library that targets modern web browsers for presentation, with the goal of providing elegant, concise construction of novel graphics in the style of D3.js, but also delivering this capability with high-performance interactivity over very large or streaming datasets.
  • C3 – D3-based reusable chart library.
  • CartoDB – open-source or freemium hosting for geospatial databases with powerful front-end editing capabilities and a robust API.
  • Chart.js – open source HTML5 Charts visualizations.
  • Chartist.js – another open source HTML5 Charts visualization.
  • Crossfilter – JavaScript library for exploring large multivariate datasets in the browser. Works well with dc.js and d3.js.
  • Cubism – JavaScript library for time series visualization.
  • Cytoscape – JavaScript library for visualizing complex networks.
  • DC.js – Dimensional charting built to work natively with crossfilter rendered using d3.js. Excellent for connecting charts/additional metadata to hover events in D3.
  • D3 – JavaScript library for manipulating documents.
  • D3.compose – Compose complex, data-driven visualizations from reusable charts and components.
  • D3Plus – A fairly robust set of reusable charts and styles for d3.js.
  • Echarts – Baidu's enterprise charts.
  • Envisionjs – dynamic HTML5 visualization.
  • FnordMetric – write SQL queries that return SVG charts rather than tables.
  • Freeboard – open source real-time dashboard builder for IoT and other web mashups.
  • Gephi – An award-winning open-source platform for visualizing and manipulating large graphs and network connections. It’s like Photoshop, but for graphs. Available for Windows, Mac OS X, and Linux.
  • Google Charts – simple charting API.
  • Grafana – Graphite dashboard frontend, editor and graph composer.
  • Graphite – scalable Realtime Graphing.
  • Highcharts – simple and flexible charting API.
  • IPython – provides a rich architecture for interactive computing.
  • Kibana – visualize logs and time-stamped data.
  • Matplotlib – plotting with Python.
  • MetricsGraphics.js – a library built on top of D3 that is optimized for time-series data.
  • NVD3 – chart components for d3.js.
  • Peity – Progressive SVG bar, line and pie charts.
  • Plot.ly – Easy-to-use web service that allows for rapid creation of complex charts, from heatmaps to histograms. Upload data to create and style charts with Plotly’s online spreadsheet. Fork others’ plots.
  • Plotly.js – the open source JavaScript graphing library that powers Plotly.
  • Recline – simple but powerful library for building data applications in pure JavaScript and HTML.
  • Redash – open-source platform to query and visualize data.
  • Sigma.js – JavaScript library dedicated to graph drawing.
  • Vega – a visualization grammar.
  • Zeppelin – a notebook-style collaborative data analysis tool.
  • Zing Charts – JavaScript charting library for big data.

Business Intelligence

  • BIME Analytics – business intelligence platform in the cloud.
  • Chartio – lean business intelligence platform to visualize and explore your data.
  • datapine – self-service business intelligence tool in the cloud.
  • Jaspersoft – powerful business intelligence suite.
  • Jedox Palo – customisable Business Intelligence platform.
  • Microsoft – business intelligence software and platform.
  • Microstrategy – software platforms for business intelligence, mobile intelligence, and network applications.
  • Pentaho – business intelligence platform.
  • Qlik – business intelligence and analytics platform.
  • Saiku – open source analytics platform.
  • SpagoBI – open source business intelligence platform.
  • Tableau – business intelligence platform.
  • Zoomdata – Big Data Analytics.
  • Jethrodata – Interactive Big Data Analytics.

Embedded Databases

  • Actian PSQL – ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.
  • BerkeleyDB – a software library that provides a high-performance embedded database for key/value data.
  • HanoiDB – Erlang LSM BTree Storage.
  • LevelDB – a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
  • LMDB – ultra-fast, ultra-compact key-value embedded data store developed by Symas.
  • RocksDB – embeddable persistent key-value store for fast storage based on LevelDB.

PostgreSQL forks and evolutions

  • HadoopDB – hybrid of MapReduce and DBMS.
  • IBM Netezza – high-performance data warehouse appliances.
  • Postgres-XL – Scalable Open Source PostgreSQL-based Database Cluster.
  • RecDB – Open Source Recommendation Engine Built Entirely Inside PostgreSQL.
  • Stado – open source MPP database system solely targeted at data warehousing and data mart applications.
  • Yahoo Everest – multi-petabyte database / MPP derived from PostgreSQL.

MySQL forks and evolutions

  • Amazon RDS – MySQL databases in Amazon’s cloud.
  • Drizzle – evolution of MySQL 6.0.
  • Google Cloud SQL – MySQL databases in Google’s cloud.
  • MariaDB – enhanced, drop-in replacement for MySQL.
  • MySQL Cluster – MySQL implementation using NDB Cluster storage engine.
  • Percona Server – enhanced, drop-in replacement for MySQL.
  • ProxySQL – High Performance Proxy for MySQL.
  • TokuDB – storage engine for MySQL and MariaDB.
  • WebScaleSQL – collaboration among engineers from several companies that face similar challenges in running MySQL at scale.

Search engine and framework

  • Apache Lucene – Search engine library.
  • Apache Solr – Search platform for Apache Lucene.
  • ElasticSearch – Search and analytics engine based on Apache Lucene.
  • Enigma.io – Freemium robust web application for exploring, filtering, analyzing, searching and exporting massive datasets scraped from across the Web.
  • Facebook Unicorn – social graph search platform.
  • Google Caffeine – continuous indexing system.
  • Google Percolator – continuous indexing system.
  • TeraGoogle – large search index.
  • HBase Coprocessor – implementation of Percolator, part of HBase.
  • Lily HBase Indexer – quickly and easily search for any content stored in HBase.
  • LinkedIn Bobo – faceted search implementation written purely in Java, an extension to Apache Lucene.
  • LinkedIn Cleo – flexible software library for enabling rapid development of partial, out-of-order and real-time typeahead search.
  • LinkedIn Galene – search architecture at LinkedIn.
  • LinkedIn Zoie – realtime search/indexing system written in Java.
  • Sphinx Search Server – fulltext search engine.

Applications

  • Adobe Spindle – Next-generation web analytics processing with Scala, Spark, and Parquet.
  • Apache Kiji – framework to collect and analyze data in real-time, based on HBase.
  • Apache Nutch – open source web crawler.
  • Apache OODT – capturing, processing and sharing of data for NASA’s scientific archives.
  • Apache Tika – content analysis toolkit.
  • Countly – open source mobile and web analytics platform, based on Node.js & MongoDB.
  • Domino – Run, scale, share, and deploy models — without any infrastructure.
  • Eclipse BIRT – Eclipse-based reporting system.
  • Eventhub – open source event analytics platform.
  • Hermes – asynchronous message broker built on top of Kafka.
  • HIPI Library – API for performing image processing tasks on Hadoop’s MapReduce.
  • Hunk – Splunk analytics for Hadoop.
  • Imhotep – Large scale analytics platform by Indeed.
  • MADlib – data-processing library for analyzing data inside an RDBMS.
  • Kylin – open source Distributed Analytics Engine from eBay.
  • PivotalR – R on Pivotal HD / HAWQ and PostgreSQL.
  • Qubole – auto-scaling Hadoop cluster, built-in data connectors.
  • Sense – Cloud Platform for Data Science and Big Data Analytics.
  • Snowplow – enterprise-strength web and event analytics, powered by Hadoop, Kinesis, Redshift and Postgres.
  • SparkR – R frontend for Spark.
  • Splunk – analyzer for machine-generated data.
  • Sumo Logic – cloud based analyzer for machine-generated data.
  • Talend – unified open source environment for YARN, Hadoop, HBase, Hive, HCatalog & Pig.
  • Warp – query by example tool for big data (OS X app).

System Deployment

  • Apache Ambari – operational framework for Hadoop management.
  • Apache Bigtop – system deployment framework for the Hadoop ecosystem.
  • Apache Helix – cluster management framework.
  • Apache Mesos – cluster manager.
  • Apache Slider – YARN application to deploy existing distributed applications on YARN.
  • Apache Whirr – set of libraries for running cloud services.
  • Apache YARN – Cluster manager.
  • Brooklyn – library that simplifies application deployment and management.
  • Buildoop – similar to Apache Bigtop, based on the Groovy language.
  • Cloudera HUE – web application for interacting with Hadoop.
  • Facebook Prism – multi-datacenter replication system.
  • Google Borg – job scheduling and monitoring system.
  • Google Omega – job scheduling and monitoring system.
  • Hortonworks HOYA – application that can deploy HBase cluster on YARN.
  • Marathon – Mesos framework for long-running services.