Article list
flatten
players = load 'baseball' as (name:chararray, team:chararray, position:bag{t:(p:chararray)}, bat:map[]);
pos = foreach players generate name, flatten(position) as position;
bypos = group pos by position;
Jorge Posada,New York Yankees,{(Catcher),(Designated_hitter)},... ==> Jorge Pos ...
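To make the flatten-then-group step in the excerpt above concrete, here is a minimal Python sketch of the same data flow. It is not Pig itself, only an illustration of the semantics. The sample row is the Jorge Posada record from the excerpt, reduced to name and position; the function names `flatten_position` and `group_by_position` are hypothetical.

```python
from collections import defaultdict

# Sample record from the excerpt; only the name and the position bag are modeled.
players = [
    ("Jorge Posada", ["Catcher", "Designated_hitter"]),
]

def flatten_position(records):
    """Emit one (name, position) row per element of the position bag."""
    return [(name, pos) for name, positions in records for pos in positions]

def group_by_position(rows):
    """Collect the rows that share a position, like `group pos by position`."""
    groups = defaultdict(list)
    for name, pos in rows:
        groups[pos].append((name, pos))
    return dict(groups)

pos = flatten_position(players)
bypos = group_by_position(pos)
print(pos)
print(bypos)
```

In this sketch a record with an empty position bag yields no rows at all, so a player without positions would not appear in any group.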

ZooKeeper: Install

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Download from the official website: http://zookeeper.apache.org/
#tar -zxf zookeeper-3.4.6.tar.gz
#cd zookeeper-3.4.6
----conf/zoo.cfg ...

HBase: Install

ENV: hadoop2.3.0  ubuntu12.04 64  hue3.5.0  pig0.12.0  hive0.12.0  oozie4.0.0
Install
Download the tarball from http://mirrors.cnnic.cn/apache/hbase/hbase-0.98.0/
#tar -xzvf hbase-0.98.0-hadoop2-bin.tar.gz
#cd hbase-0.98.0-hadoop2
Configure
Note: the following conf is suited to fully- ...
Relational Operations
foreach
foreach takes a set of expressions and applies them to every record in the data pipeline.
A = load 'input' as (user:chararray, id:long, address:chararray, phone:chararray, preferences:map[]);
B = foreach A generate user, id;
prices = load 'NYSE_daily' as (exc ...
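As a rough illustration of what `B = foreach A generate user, id;` does in the excerpt above, the following Python sketch projects two fields out of every record. It only mimics the semantics and is not Pig code. The sample row and the helper name `foreach_generate` are hypothetical; the field names follow the schema in the excerpt.

```python
# Hypothetical sample row shaped like the schema in the excerpt.
A = [
    {"user": "alice", "id": 1, "address": "1 Main St",
     "phone": "555-0100", "preferences": {}},
]

def foreach_generate(rows, *fields):
    """Apply the same projection to every record, keeping only `fields`."""
    return [tuple(row[f] for f in fields) for row in rows]

# Equivalent in spirit to: B = foreach A generate user, id;
B = foreach_generate(A, "user", "id")
print(B)
```

The point of the sketch is that the expression list is evaluated once per record, so the output has as many rows as the input but only the generated fields.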
Relation and Field
Pig Latin is a dataflow language. Each processing step results in a new data set, or relation.
A = load 'NYSE_dividends' as (exchange, symbol, date, dividends);
-- A is a relation; exchange, symbol, date and dividends are all fields
Case Sensitivity
Keywords in Pig Latin a ...

Pig: Data Model

    Blog category:
  • Pig
Data types
Nulls
In Pig, a null data element means the value is unknown, which is completely different from the concept of null in C, Java, Python, etc.
Schemas
dividends = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividend:float);
dividends ...
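The "unknown value" reading of null in the excerpt above can be sketched in Python, using `None` to stand in for a Pig null. This assumes the SQL-like rule that arithmetic and comparisons involving a null produce a null, and that a filter keeps a record only when its condition is definitely true. The helper names and the sample values are hypothetical.

```python
def pig_add(a, b):
    """Arithmetic with an unknown operand yields an unknown result."""
    if a is None or b is None:
        return None
    return a + b

def pig_gt(a, b):
    """A comparison with an unknown operand is itself unknown, not False."""
    if a is None or b is None:
        return None
    return a > b

def pig_filter(values, threshold):
    """Keep only the values for which the comparison is definitely true."""
    return [v for v in values if pig_gt(v, threshold) is True]

dividends = [0.5, None, 0.1]  # hypothetical sample values
print(pig_add(None, 1))
print(pig_filter(dividends, 0.2))
```

Under these assumptions the null record is dropped by the filter, and it would also be dropped by the negated condition, which is what distinguishes "unknown" from a plain false value.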

Pig: Grunt Usage

    Blog category:
  • Pig
Grunt is Pig's interactive shell.
Start
#pig -x local    //interact with the local file system
#pig             //interact with the Hadoop cluster
Exit
grunt>quit; or CTRL+D
Note: Grunt provides command-line history and editing, as well as Tab completion. It does not provide file ...

Pig: Basic Usage

    Blog category:
  • Pig
Running Locally
#pig -x local average_dividend.pig
Running on Hadoop Cluster
#pig -e fs -mkdir /user/username      //username is the user who runs pig
#pig -e fs -copyFromLocal NYSE_dividends NYSE_dividends  //put test data in the /user/username dir
#pig average_dividend.pig
#pig -e cat averag ...
Database
hive>show databases;
hive>show databases like 'h.*';
hive>describe database mydb;
hive>describe database extended mydb;
hive>create database mydb;
hive>create database if not exists mydb;
hive>create database mydb location '/my/preferred/directory';
hive> ...
HiveConf: the Java class for the current Hive configuration options
Metastore Conf
All the metadata for Hive tables and partitions is stored in the Hive Metastore. There are 3 different ways to set up a metastore server using different Hive configurations:
Embedded Metastore
An embedded metastore is ma ...
Install hive
1. download hive-0.12.0.bin.tar.gz
2. #tar -xzvf hive-0.12.0.bin.tar.gz
3. add the bin dir to PATH in ~/.bashrc
4. #source ~/.bashrc
The dir structure of hive-0.12.0.bin looks like the following:
lib/ : contains JARs, which implement a particular subset of Hive's functionali ...
ENV: ubuntu 12.04/13.01  hadoop2.x.0  hue3.5.0  pig0.12.0  hive0.12.0  sqoop1.99.3  oozie4.0.0  hbase0.98.0
- Prepare env -----------------
#sudo apt-get update
#sudo apt-get install libxml2-dev
#sudo apt-get install libxslt-dev
#sudo apt-get install libsasl2-dev
#sudo a ...
Primitive types: TINYINT  SMALLINT  INT  BIGINT  BOOLEAN  FLOAT  DOUBLE  STRING  TIMESTAMP  BINARY
Collection Data Types:
Example:
CREATE TABLE employees (
  name STRING,
  salary FLOAT,
  subordinates ARRAY<STRING>,
  deductions ...

Pig: Install and Rebuild

    Blog category:
  • Pig
ENV: Hadoop2.3.0  pig0.12
Hadoop is running and Pig's Grunt shell works well, but when loading data and dumping it to the screen:
#actor = load '/test/actor' using PigStorage(',') as (id, name, addr, time);
#dump actor;
the error is:
Backend error message during job submission----------------------------------- ...
Sqoop2 Install
1. install server
download the tarball from the official website
#tar -xzvf sqoop-1.99.3-bin-hadoop200.tar.gz
Assume that the server and client are installed on the same host: 192.168.122.1
configure server
related configuration files in dir /path/to/sqoop-1.99.3-bin-hadoop ...