Programming Hive

By Edward Capriolo, Dean Wampler

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

  • Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
  • Customize data formats and storage options, from files to external databases
  • Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query methods
  • Gain best practices for creating user defined functions (UDFs)
  • Learn Hive patterns you should use and anti-patterns you should avoid
  • Integrate Hive with other data processing programs
  • Use storage handlers for NoSQL databases and other datastores
  • Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce


Preview of Programming Hive PDF

Best Computers books

UML: A Beginner's Guide

Essential skills for first-time programmers! This easy-to-use book explains the fundamentals of UML. You'll learn to read, draw, and use this visual modeling language to create clear and effective blueprints for software development projects. The modular approach of this series, including drills, sample projects, and mastery checks, makes it easy to learn to use this powerful modeling language at your own pace.

The Linux Programmer's Toolbox

Master the Linux tools that will make you a more productive, effective programmer. The Linux Programmer's Toolbox helps you tap into the vast collection of open source tools available for GNU/Linux. Author John Fusco systematically describes the most useful tools available on most GNU/Linux distributions, using concise examples that you can easily modify to meet your needs.

Advanced Visual Basic 2010 (5th Edition)

In the fifth edition, Advanced Visual Basic 2010 helps those who are familiar with the fundamentals of Visual Basic 2010 programming harness its power for more advanced uses. Coverage of sophisticated tools and techniques used in the industry today includes multiple database, ASP.NET, LINQ, WPF, and Web Services topics.

Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference (Addison-Wesley Data & Analytics)

Master Bayesian inference through practical examples and computation, without advanced mathematical analysis. Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background.

Extra resources for Programming Hive

Sample text

Also, the feature is relatively new, so it doesn’t have a lot of options yet. However, the indexing process is designed to be customizable with plug-in Java code, so teams can extend the feature to meet their needs. Indexing is also a good alternative to partitioning when the logical partitions would actually be too numerous and small to be useful. Indexing can aid in pruning some blocks from a table as input for a MapReduce job. Not all queries can benefit from an index; Hive’s EXPLAIN syntax can be used to determine whether a given query is aided by an index.

Indexes in Hive, like those in relational databases, need to be evaluated carefully. Maintaining an index requires extra disk space, and building an index has a processing cost. The user must weigh these costs against the benefits they offer when querying a table.

Creating an Index

Let’s create an index for our managed, partitioned employees table we described in “Partitioned, Managed Tables.” Here is the table definition we used previously, for reference:

    CREATE TABLE employees (
      name         STRING,
      salary       FLOAT,
      subordinates ARRAY<STRING>,
      deductions   MAP<STRING, FLOAT>,
      address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
    )
    PARTITIONED BY (country STRING, state STRING);

Let’s index on the country partition only:

    CREATE INDEX employees_index
    ON TABLE employees (country)
    AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    WITH DEFERRED REBUILD
    IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
    IN TABLE employees_index_table
    PARTITIONED BY (country, name)
    COMMENT 'Employees indexed by country and name.';

In this case, we did not partition the index table to the same level of granularity as the original table. We could choose to do so. If we omitted the PARTITIONED BY clause completely, the index would span all partitions of the original table.

The AS ... clause specifies the index handler, a Java class that implements indexing. Hive ships with a few representative implementations; the CompactIndexHandler shown was in the first release of this feature. Third-party implementations can optimize certain scenarios, support specific file formats, and more. We’ll provide more information on implementing your own index handler in “Implementing a Custom Index Handler.”

We’ll discuss the meaning of WITH DEFERRED REBUILD in the next section.

It’s not a requirement for the index handler to save its data in a new table, but if it does, the IN TABLE ... clause is used. It supports many of the options available when creating other tables. Specifically, the example doesn’t use the optional ROW FORMAT, STORED AS, STORED BY, LOCATION, and TBLPROPERTIES clauses that we discussed in Chapter 4. All would appear before the final COMMENT clause shown.

Currently, indexing external tables and views is supported except for data residing in S3.

Bitmap Indexes

Hive v0.8.0 adds a built-in bitmap index handler. Bitmap indexes are commonly used for columns with few distinct values. Here is our previous example rewritten to use the bitmap index handler:

    CREATE INDEX employees_index
    ON TABLE employees (country)
    AS 'BITMAP'
    WITH DEFERRED REBUILD
    IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
    IN TABLE employees_index_table
    PARTITIONED BY (country, name)
    COMMENT 'Employees indexed by country and name.';
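As a rough sketch of what usually follows index creation: an index created WITH DEFERRED REBUILD starts out empty and has to be populated explicitly. The statements below are not part of the excerpt. They assume a Hive release that still ships index support (indexing was removed in Hive 3.0) and reuse the employees table and employees_index names from the example; the partition value 'US' and the sample query are hypothetical.

```sql
-- Populate the index for a single partition of the base table.
-- 'US' is a hypothetical partition value used only for illustration.
ALTER INDEX employees_index
ON TABLE employees
PARTITION (country = 'US')
REBUILD;

-- List the indexes defined on the table, with column headers.
SHOW FORMATTED INDEX ON employees;

-- Inspect the plan of a query that filters on the indexed column.
-- Whether the plan uses the index depends on the Hive version
-- and optimizer settings.
EXPLAIN
SELECT name, salary
FROM employees
WHERE country = 'US';

-- Drop the index (and its index table) when it is no longer needed.
DROP INDEX IF EXISTS employees_index ON TABLE employees;
```

Rebuilding one partition at a time keeps each rebuild job small; omitting the PARTITION clause would rebuild the index for all partitions.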

