http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-1.html
MySQL replication enables data to be replicated from one MySQL database server (the master) to one or more MySQL database servers (the slaves). But imagine how many more use cases could be served if the slave (to which data is replicated) were not restricted to being a MySQL server, and could instead be any other database server or platform, with replication events applied in real time!
This is what the new Hadoop Applier empowers you to do.
An example of such a slave could be a data warehouse system such as Apache Hive, which uses HDFS as a data store. If you have a Hive metastore associated with HDFS (Hadoop Distributed File System), the Hadoop Applier can populate Hive tables in real time. Data is exported from MySQL to text files in HDFS, and therefore into Hive tables. It is as simple as running a 'CREATE TABLE' HiveQL statement on Hive to define a table structure similar to that on MySQL (and yes, you can use any row and column delimiters you want), and then running the Hadoop Applier to start real-time data replication.
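As a sketch of that 'CREATE TABLE' step (the table name, columns, and delimiters here are illustrative, not taken from this post), a Hive table mirroring a MySQL table `employees (id INT, name VARCHAR(64))` might look like:

```sql
-- Hypothetical Hive table mirroring a MySQL table `employees`.
-- The field/line delimiters must match whatever the Hadoop Applier
-- is configured to write into the HDFS text files.
CREATE TABLE employees (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```

Because the table is stored as plain text files, any delimited rows the Applier writes into the table's warehouse directory become visible to Hive queries with no extra load step.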
The motivation for developing the Hadoop Applier is that, currently, there is no tool available to perform this real-time transfer. Existing solutions for importing data into HDFS include Apache Sqoop, which is well proven and enables batch transfers, but which consequently requires periodic re-imports to keep the data up to date. It reads the source MySQL database via a JDBC connector or a fastpath connector and performs a bulk data transfer, which can create overhead on your operational systems and slow down other queries. And when there are only a few changes to the database relative to the size of the data, Sqoop may take too long to load it.
The Hadoop Applier, on the other hand, reads from the binary log and inserts data in real time, applying events as they happen on the MySQL server; other queries can therefore continue to execute without any effect on their speed. No bulk transfers required! The Hadoop Applier transfers only the changes and inserts them, which is much faster.
Hadoop Applier can thus be a solution when you need to rapidly acquire new data from MySQL for real-time processing within Hadoop.
Introducing The Applier:
It replicates events from the MySQL binary log to provide real-time integration of MySQL with Hadoop and the related frameworks that work on top of HDFS. There are many use cases for integrating unstructured data stored in Apache Hadoop with structured data from relational databases such as MySQL.
The Hadoop Applier provides real-time connectivity between MySQL and Hadoop/HDFS (Hadoop Distributed File System), which can be used for big data analytics: for purposes such as sentiment analysis, marketing campaign analysis, customer churn modeling, fraud detection, risk modelling, and many more. You can read more about the role of the Hadoop Applier in big data in the blog by Mat Keep. Many widely used systems, such as Apache Hive, use HDFS as a data store.
The diagram below represents the integration:
Replication via the Hadoop Applier happens by reading binary log events and writing them into a file in HDFS (Hadoop Distributed File System) as soon as they happen on the MySQL master. “Events” describe database changes such as table creation operations or changes to table data.
As soon as an insert query is executed on the MySQL master, it is passed to the Hadoop Applier, and the data is written into a text file in HDFS. Once the data is in HDFS files, other Hadoop ecosystem platforms and databases can consume it for their own applications.
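To illustrate the flow (the table, values, and file path below are assumptions for the sake of example, not taken from this post), a single insert on the master ends up as one delimited line of text in the table's data file in HDFS:

```sql
-- On the MySQL master, given a hypothetical table
-- employees (id INT, name VARCHAR(64)):
INSERT INTO employees (id, name) VALUES (42, 'alice');

-- The Hadoop Applier reads the corresponding binary log event and
-- appends a matching delimited row to the table's text file in HDFS,
-- e.g. (path and comma delimiter are assumptions):
--   /user/hive/warehouse/employees/datafile1.txt gains the line:
--   42,alice
```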
Hadoop Applier can be downloaded from http://labs.mysql.com/
Prerequisites:
These are the packages you require in order to run Hadoop Applier on your machine:
- Hadoop Applier package from http://labs.mysql.com
- Hadoop 1.0.4 (that is what I used for the demo in the next post)
- Java version 6 or later (since Hadoop is written in Java)
- libhdfs (it comes precompiled with Hadoop distros, at ${HADOOP_HOME}/libhdfs/libhdfs.so)
- cmake 2.6 or greater
- libmysqlclient 5.6
- gcc 4.6.3
- MySQL Server 5.6
- FindHDFS.cmake (a cmake file to find the libhdfs library while compiling; you can get a copy online)
- FindJNI.cmake (optional; check if you already have one: $ locate FindJNI.cmake)
To use the Hadoop Applier with Hive, you will also need to install Hive, which you can download here.
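Once the Applier is writing into the warehouse directory backing a Hive table, the replicated rows are immediately queryable. A hypothetical query against an `employees` table replicated this way (the table name is an assumption used for illustration):

```sql
-- The Hive table is backed by the HDFS text files the Hadoop Applier
-- writes, so newly replicated rows appear in query results without
-- any batch load or re-import step.
SELECT COUNT(*) FROM employees;
```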
Please use the comments section of this blog to share your opinion on Hadoop Applier, and let us know more about your requirements.