Apache Hive - Apache Hive is an open-source data warehouse system for querying and analyzing large datasets stored in Hadoop files. Hadoop is a framework for storing and processing large datasets in a distributed computing environment.
Hive has three main functions: data summarization, query and analysis. It supports queries expressed in a SQL-like language called HiveQL, which Hive automatically translates into MapReduce jobs executed on Hadoop. In addition, HiveQL allows custom MapReduce scripts to be plugged into queries. Hive also supports data serialization/deserialization and increases flexibility in schema design through a system catalog called Hive-Metastore.
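As a rough illustration of the translation described above, the toy Python sketch below mimics how a query such as SELECT url, COUNT(*) FROM logs GROUP BY url could be split into map, shuffle and reduce steps. The table name, column names and sample rows are hypothetical, and the code only imitates the general MapReduce pattern; it is not Hive's actual query planner or execution engine.

```python
from collections import defaultdict

# Hypothetical web-log rows: (url, http_status). Not real data.
rows = [
    ("/home", 200),
    ("/about", 200),
    ("/home", 404),
    ("/home", 200),
]

def map_phase(rows):
    # Emit one (key, 1) pair per row, keyed by the GROUP BY column.
    for url, _status in rows:
        yield url, 1

def shuffle(pairs):
    # Collect all values that share a key, as happens between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each key's values; COUNT(*) becomes a sum of ones.
    return {key: sum(values) for key, values in groups.items()}

hits_per_url = reduce_phase(shuffle(map_phase(rows)))
print(hits_per_url)
```

In a real cluster the map and reduce steps would run in parallel on many machines over file splits; here they run in a single process purely to show the shape of the computation.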
According to the Apache Hive wiki, "Hive is not designed for OLTP workloads and does not offer real-time queries or row-level updates. It is best used for batch jobs over large sets of append-only data (like web logs)."
Hive supports text files (also called flat files), SequenceFiles (flat files consisting of binary key/value pairs) and RCFiles (Record Columnar Files, which store the columns of a table in the manner of a columnar database).
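To make the difference between row-oriented and columnar storage concrete, the following minimal Python sketch regroups a few records column by column. The field names and values are invented for illustration, and the layout is a simplification: it does not reproduce the actual RCFile on-disk format, which also involves details such as row groups and compression.

```python
# Hypothetical records in row-oriented form (one dict per row).
rows = [
    {"user": "ann", "country": "NL", "bytes": 512},
    {"user": "bob", "country": "US", "bytes": 2048},
    {"user": "cy", "country": "NL", "bytes": 128},
]

def to_columnar(rows):
    # Regroup the same values so each column is stored contiguously.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columnar(rows)

# A query that needs only one column can read just that list
# instead of scanning every field of every row.
total_bytes = sum(columns["bytes"])
print(columns["country"], total_bytes)
```

The point of the columnar arrangement is that an aggregate over a single column touches only that column's values, which is why such layouts suit analytical queries.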
Related glossary terms: In-Memory Data Grid (IMDG), Apache HBase
[Category=Data Management ]
Source: WhatIs.com, 05 July 2013 09:13:37, http://whatis.techtarget.com/glossary/Data-and-Data-Management
Hive - A SQL-like query and data warehouse engine.
[Category=Big Data ]
Source: DataInformed, 31 October 2013 09:06:30, http://data-informed.com/glossary-of-big-data-terms/
Data Quality Glossary. A free resource from GRC Data Intelligence. For comments, questions or feedback: dqglossary@grcdi.nl