http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-2.html
This is a follow-up post describing the implementation details of the Hadoop Applier, and the steps to configure and install it. The Hadoop Applier integrates MySQL with Hadoop, providing real-time replication of INSERTs to HDFS, so the data can be consumed by data stores working on top of Hadoop. You can learn more about the design rationale and prerequisites in the previous post.
Design and Implementation:
Hadoop Applier replicates rows inserted into a table in MySQL to the Hadoop Distributed File System (HDFS). It uses an API provided by libhdfs, a C library to manipulate files in HDFS. The Applier:
- fetches the row insert events occurring on the master
- decodes these events and extracts the data inserted into each field of the row
- uses content handlers to get the data in the required format and appends it to a text file in HDFS
Schema equivalence is a simple mapping:
Databases are mapped as separate directories, with their tables as sub-directories within them. Data inserted into each table is written to text files (named datafile1.txt) in HDFS. Data can be in comma-separated format, or any other delimiter can be used; the delimiter is configurable via command-line arguments.
The file in which the data is stored is named datafile1.txt here; you can name it anything you want. The working directory where this datafile goes is base_dir/db_name.db/tb_name.
- Connect to the MySQL master using the interfaces to the binary log:
#include "binlog_api.h"
- Register content handlers:
/*
Table_index is a sub class of the Content_handler class in the Binlog API
*/
class Applier : public mysql::Content_handler
- Start an event loop and wait for the events to occur on the master:
while (true)
- Each row event contains multiple rows and fields. Iterate one row at a time using Row_iterator:
mysql::Row_event_set rows(rev, ti_it->second);
- Get the field data separated by field delimiters and row delimiters:
mysql::Row_of_fields::const_iterator field_it= fields.begin();
- Connect to HDFS:
hdfsFS m_fs= hdfsConnect(host.c_str(), port);
- Create the directory structure in HDFS and open the datafile for appending:
hdfsSetWorkingDirectory(m_fs, (stream_dir_path.str()).c_str());
writeFile= hdfsOpenFile(m_fs, write_path, O_WRONLY|O_APPEND, 0, 0, 0);
Follow these steps to install and run the Applier. Depending on your Hadoop layout, you may first need to patch cmake_modules/FindHDFS.cmake so that cmake can find the libhdfs headers and libraries:
--- a/cmake_modules/FindHDFS.cmake
+++ b/cmake_modules/FindHDFS.cmake
@@ -11,6 +11,7 @@ exec_program(hadoop ARGS version OUTPUT_VARIABLE Hadoop_VERSION
# currently only looking in HADOOP_HOME
find_path(HDFS_INCLUDE_DIR hdfs.h PATHS
$ENV{HADOOP_HOME}/include/
+ $ENV{HADOOP_HOME}/src/c++/libhdfs/
# make sure we don't accidentally pick up a different version
NO_DEFAULT_PATH
)
@@ -26,9 +27,9 @@ endif()
message(STATUS "Architecture: ${arch_hint}")
if ("${arch_hint}" STREQUAL "x64")
- set(HDFS_LIB_PATHS $ENV{HADOOP_HOME}/lib/native)
+ set(HDFS_LIB_PATHS $ENV{HADOOP_HOME}/c++/Linux-amd64-64/lib)
else ()
- set(HDFS_LIB_PATHS $ENV{HADOOP_HOME}/lib/native)
+ set(HDFS_LIB_PATHS $ENV{HADOOP_HOME}/c++/Linux-i386-32/lib)
endif ()
message(STATUS "HDFS_LIB_PATHS: ${HDFS_LIB_PATHS}")
- Download a copy of Hadoop Applier from http://labs.mysql.com.
- The 'mysqlclient' library is required to be installed in the default library paths. You can either download and install it (you can get a copy here), or set the environment variable $MYSQL_DIR to point to the parent directory of the MySQL source code. Make sure to run cmake on the MySQL source directory.
$ export MYSQL_DIR=/usr/local/mysql-5.6
- Run the 'cmake' command on the parent directory of the Hadoop Applier source. This will generate the necessary Makefiles. Make sure to set the cmake option ENABLE_DOWNLOADS=1, which will install Google Test, required to run the unit tests.
$ cmake . -DENABLE_DOWNLOADS=1
- Run 'make' and 'make install' to build and install. This will install the library 'libreplication', which is used by Hadoop Applier.
Make sure the Hadoop binaries are on your PATH:
$ export PATH=$HADOOP_HOME/bin:$PATH
| Option | Use |
| --- | --- |
| -r, --field-delimiter=DELIM | Use DELIM instead of Ctrl-A as the field delimiter. DELIM can be a string or an ASCII value in the format '\nnn'. Escape sequences are not allowed. By default, the fields in a row are separated by Ctrl-A. |
| -w, --row-delimiter=DELIM | Use DELIM instead of line feed as the row delimiter. DELIM can be a string or an ASCII value in the format '\nnn'. Escape sequences are not allowed. By default, the rows of a table are separated by a line feed (\n). |
| -d, --databases=DB_LIST | Import entries for the listed databases, optionally including only the specified tables. DB_LIST is made up of one database name, or many names separated by commas. Each database name can be optionally followed by table names; the table names must follow the database name, separated by hyphens. |
| -f, --fields=LIST | Import entries for only the listed fields of a table. As with the cut command, LIST is made up of one range, or many ranges separated by commas. Each range is one of: N (the N'th byte, character, or field, counted from 1); N- (from the N'th byte, character, or field to the end of the line); N-M (from the N'th to the M'th byte, character, or field, inclusive); -M (from the first to the M'th byte, character, or field, inclusive). |
| -h, --help | Display help. |
| SQL Query | Hive Query |
| --- | --- |
| CREATE TABLE t (i INT); | CREATE TABLE t (time_stamp INT, i INT) [ROW FORMAT DELIMITED] STORED AS TEXTFILE; |
Now, when any row is inserted into a table in a MySQL database, a corresponding entry is made in the Hive table. Watch the demo to get a better understanding. The demo has no audio and is meant to be followed in conjunction with this blog. You can also create an external table in Hive and load data into it; it's your choice!