Big Data

In market research, information is gathered, studied, and interpreted by information specialists. The volume of information collected keeps growing; according to some studies, it doubles roughly every year. As technology became an unavoidable part of our lives, researchers developed text analytics, a technology designed to search for answers to very specific questions across multiple databases on the internet. As the volume of information continues to grow, bigger, better, and more dependable tools are needed to manage trillions of gigabytes of it. Big data tools let an information analyst search for answers in collections of databases that are too large and too complex to analyze by traditional means.

This new technology can mine billions of gigabytes of information in search of valuable answers. It can examine information written in various programming languages, from various sites, and in various formats. Big data answers questions of what percentage, how many, how much, and even how often. It also goes deeper, into the whys, whens, and whos that market researchers ask about. It picks up customer opinions expressed in many forms across the web (social networks, keywords entered in search engines, applications used or ignored, and much more), whether they came from a personal computer or a smartphone, and sifts through all of it so that sentiment analysis can be performed.
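
As a rough illustration of the sentiment analysis mentioned above, here is a minimal lexicon-based sketch in Python. It is not how any particular big data product works: the word lists and the function names (`sentiment_score`, `label`, `summarize`) are hypothetical and chosen only for this example. Production systems rely on far larger lexicons or on trained models.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only; real lexicons are much larger.
POSITIVE = {"good", "great", "love", "fast", "reliable"}
NEGATIVE = {"bad", "slow", "hate", "broken", "expensive"}


def sentiment_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def label(score: int) -> str:
    """Turn a numeric score into a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(comments):
    """Return the share of comments carrying each sentiment label."""
    total = len(comments)
    if total == 0:
        return {}
    counts = Counter(label(sentiment_score(comment)) for comment in comments)
    return {name: counts[name] / total
            for name in ("positive", "negative", "neutral")}
```

Calling `summarize` on a list of customer comments returns the fraction that lean positive, negative, or neutral. That is the kind of "what percentage" answer the paragraph above describes.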

Big data makes unstructured data easier to examine so that useful information can be extracted. It prepares market researchers for the future, and many opportunities have grown up around it. Big data companies will need people with expertise in mathematics, data science, and programming to maintain and manage this mammoth technology and make sure it does its job. New trends are identified by evaluating the patterns that big data reveals, which means many entrepreneurs may be able to work out which business to venture into.

Data Visualization

  • Airpal – Web UI for PrestoDB.
  • Arbor – graph visualization library using web workers and jQuery.
  • Banana – visualize logs and time-stamped data stored in Solr. Port of Kibana.
  • Bokeh – A powerful Python interactive visualization library that targets modern web browsers for presentation, with the goal of providing elegant, concise construction of novel graphics in the style of D3.js, but also delivering this capability with high-performance interactivity over very large or streaming datasets.
  • C3 – D3-based reusable chart library
  • CartoDB – open-source or freemium hosting for geospatial databases with powerful front-end editing capabilities and a robust API.
  • Chart.js – open source HTML5 Charts visualizations.
  • Chartist.js – another open source HTML5 Charts visualization.
  • Crossfilter – JavaScript library for exploring large multivariate datasets in the browser. Works well with dc.js and d3.js.
  • Cubism – JavaScript library for time series visualization.
  • Cytoscape – JavaScript library for visualizing complex networks.
  • DC.js – Dimensional charting built to work natively with crossfilter rendered using d3.js. Excellent for connecting charts/additional metadata to hover events in D3.
  • D3 – JavaScript library for manipulating documents based on data.
  • D3.compose – Compose complex, data-driven visualizations from reusable charts and components.
  • D3Plus – A fairly robust set of reusable charts and styles for d3.js.
  • Echarts – Baidu’s enterprise charts.
  • Envisionjs – dynamic HTML5 visualization.
  • FnordMetric – write SQL queries that return SVG charts rather than tables
  • Freeboard – open source real-time dashboard builder for IoT and other web mashups.
  • Gephi – An award-winning open-source platform for visualizing and manipulating large graphs and network connections. It’s like Photoshop, but for graphs. Available for Windows and Mac OS X.
  • Google Charts – simple charting API.
  • Grafana – graphite dashboard frontend, editor and graph composer.
  • Graphite – scalable Realtime Graphing.
  • Highcharts – simple and flexible charting API.
  • IPython – provides a rich architecture for interactive computing.
  • Kibana – visualize logs and time-stamped data
  • Matplotlib – plotting with Python.
  • MetricsGraphics.js – a library built on top of D3 that is optimized for time-series data.
  • NVD3 – chart components for d3.js.
  • Peity – Progressive SVG bar, line and pie charts.
  • Plot.ly – Easy-to-use web service that allows for rapid creation of complex charts, from heatmaps to histograms. Upload data to create and style charts with Plotly’s online spreadsheet. Fork others’ plots.
  • Plotly.js – the open source JavaScript graphing library that powers Plotly.
  • Recline – simple but powerful library for building data applications in pure Javascript and HTML.
  • Redash – open-source platform to query and visualize data.
  • Sigma.js – JavaScript library dedicated to graph drawing.
  • Vega – a visualization grammar.
  • Zeppelin – a notebook-style collaborative data analysis tool.
  • Zing Charts – JavaScript charting library for big data.

Business Intelligence

  • BIME Analytics – business intelligence platform in the cloud.
  • Chartio – lean business intelligence platform to visualize and explore your data.
  • datapine – self-service business intelligence tool in the cloud.
  • Jaspersoft – powerful business intelligence suite.
  • Jedox Palo – customisable Business Intelligence platform.
  • Microsoft – business intelligence software and platform.
  • Microstrategy – software platforms for business intelligence, mobile intelligence, and network applications.
  • Pentaho – business intelligence platform.
  • Qlik – business intelligence and analytics platform.
  • Saiku – open source analytics platform.
  • SpagoBI – open source business intelligence platform.
  • Tableau – business intelligence platform.
  • Zoomdata – Big Data Analytics.
  • Jethrodata – Interactive Big Data Analytics.

Embedded Databases

  • Actian PSQL – ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.
  • BerkeleyDB – a software library that provides a high-performance embedded database for key/value data.
  • HanoiDB – Erlang LSM BTree Storage.
  • LevelDB – a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
  • LMDB – ultra-fast, ultra-compact key-value embedded data store developed by Symas.
  • RocksDB – embeddable persistent key-value store for fast storage based on LevelDB.

PostgreSQL forks and evolutions

  • HadoopDB – hybrid of MapReduce and DBMS.
  • IBM Netezza – high-performance data warehouse appliances.
  • Postgres-XL – Scalable Open Source PostgreSQL-based Database Cluster.
  • RecDB – Open Source Recommendation Engine Built Entirely Inside PostgreSQL.
  • Stado – open source MPP database system solely targeted at data warehousing and data mart applications.
  • Yahoo Everest – multi-petabyte database / MPP derived from PostgreSQL.

MySQL forks and evolutions

  • Amazon RDS – MySQL databases in Amazon’s cloud.
  • Drizzle – evolution of MySQL 6.0.
  • Google Cloud SQL – MySQL databases in Google’s cloud.
  • MariaDB – enhanced, drop-in replacement for MySQL.
  • MySQL Cluster – MySQL implementation using NDB Cluster storage engine.
  • Percona Server – enhanced, drop-in replacement for MySQL.
  • ProxySQL – High Performance Proxy for MySQL.
  • TokuDB – storage engine for MySQL and MariaDB.
  • WebScaleSQL – collaboration among engineers from several companies that face similar challenges in running MySQL at scale.

Search engines and frameworks

  • Apache Lucene – Search engine library.
  • Apache Solr – Search platform for Apache Lucene.
  • ElasticSearch – Search and analytics engine based on Apache Lucene.
  • Enigma.io – Freemium robust web application for exploring, filtering, analyzing, searching and exporting massive datasets scraped from across the Web.
  • Facebook Unicorn – social graph search platform.
  • Google Caffeine – continuous indexing system.
  • Google Percolator – continuous indexing system.
  • TeraGoogle – large search index.
  • HBase Coprocessor – implementation of Percolator, part of HBase.
  • Lily HBase Indexer – quickly and easily search for any content stored in HBase.
  • LinkedIn Bobo – faceted search implementation written purely in Java, an extension to Apache Lucene.
  • LinkedIn Cleo – flexible software library for enabling rapid development of partial, out-of-order and real-time typeahead search.
  • LinkedIn Galene – search architecture at LinkedIn.
  • LinkedIn Zoie – realtime search/indexing system written in Java.
  • Sphinx Search Server – fulltext search engine.
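
Several of the entries above build on Apache Lucene, whose core data structure is an inverted index: a mapping from each term to the documents that contain it. The Python sketch below illustrates only that idea. It is not Lucene's API, and the function names (`tokenize`, `build_index`, `search`) are hypothetical. Real engines add ranking, compression, and on-disk segment storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)
```

A lookup touches only the posting sets of the query terms rather than scanning every document, which is why this structure scales to large collections.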

Applications

  • Adobe Spindle – Next-generation web analytics processing with Scala, Spark, and Parquet.
  • Apache Kiji – framework to collect and analyze data in real-time, based on HBase.
  • Apache Nutch – open source web crawler.
  • Apache OODT – capturing, processing and sharing of data for NASA’s scientific archives.
  • Apache Tika – content analysis toolkit.
  • Countly – open source mobile and web analytics platform, based on Node.js & MongoDB.
  • Domino – Run, scale, share, and deploy models — without any infrastructure.
  • Eclipse BIRT – Eclipse-based reporting system.
  • Eventhub – open source event analytics platform.
  • Hermes – asynchronous message broker built on top of Kafka.
  • HIPI Library – API for performing image processing tasks on Hadoop’s MapReduce.
  • Hunk – Splunk analytics for Hadoop.
  • Imhotep – Large scale analytics platform by Indeed.
  • MADlib – data-processing library for analyzing data inside an RDBMS.
  • Kylin – open source Distributed Analytics Engine from eBay.
  • PivotalR – R on Pivotal HD / HAWQ and PostgreSQL.
  • Qubole – auto-scaling Hadoop cluster, built-in data connectors.
  • Sense – Cloud Platform for Data Science and Big Data Analytics.
  • Snowplow – enterprise-strength web and event analytics, powered by Hadoop, Kinesis, Redshift and Postgres.
  • SparkR – R frontend for Spark.
  • Splunk – analyzer for machine-generated data.
  • Sumo Logic – cloud based analyzer for machine-generated data.
  • Talend – unified open source environment for YARN, Hadoop, HBASE, Hive, HCatalog & Pig.
  • Warp – query by example tool for big data (OS X app)