flume.conf
a1.sinks.hbase-sink1.channel = ch1
a1.sinks.hbase-sink1.type = hbase
a1.sinks.hbase-sink1.table = users
a1.sinks.hbase-sink1.columnFamily = info
a1.sinks.hbase-sink1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.hbase-sink1.serializer.regex = ^(.+)\t(.+)\t(.+)$
a1.sinks.hbase-sink1.serializer.colNames = ROW_KEY,name,email
a1.sinks.hbase-sink1.serializer.rowKeyIndex = 0
a1.sinks.hbase-sink1.serializer.depositHeaders = true
Note
A: To use a custom row key, set rowKeyIndex=0 and make ROW_KEY the first entry of colNames; in the JSON data you POST, the row key must then be the first field of the body.
B: If you also want to store the headers of your JSON POST as columns, you must set depositHeaders=true.
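To make these mapping rules concrete, here is a minimal Python sketch (an illustration of what RegexHbaseEventSerializer does with the settings above, not the serializer itself): the regex splits the tab-separated body, rowKeyIndex=0 promotes the first group to the row key, and depositHeaders=true stores the event headers as extra columns.

import re

# Mirrors the serializer settings from flume.conf above (illustration only).
REGEX = re.compile(r"^(.+)\t(.+)\t(.+)$")
COL_NAMES = ["ROW_KEY", "name", "email"]
ROW_KEY_INDEX = 0
DEPOSIT_HEADERS = True

def map_event(body, headers):
    """Return (row_key, {column: value}) the way the serializer would."""
    groups = REGEX.match(body).groups()
    row_key = groups[ROW_KEY_INDEX]                # first field becomes the row key
    cells = {"info:" + col: val                    # remaining fields -> info:<colName>
             for col, val in zip(COL_NAMES, groups)
             if col != "ROW_KEY"}
    if DEPOSIT_HEADERS:                            # headers stored as columns too;
        # note info:name is written by both body and header (see the final note)
        cells.update({"info:" + k: v for k, v in headers.items()})
    return row_key, cells

print(map_event("9\tZhangZiYi\tzy@163.com",
                {"userId": "9", "name": "ZhangZiYi", "phoneNumber": "1522323222"}))
# -> ('9', {'info:name': 'ZhangZiYi', 'info:email': 'zy@163.com',
#           'info:userId': '9', 'info:phoneNumber': '1522323222'})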
a1.sources.http-source1.channels = ch1
a1.sources.http-source1.type = http
a1.sources.http-source1.bind = 0.0.0.0
a1.sources.http-source1.port = 5140
a1.sources.http-source1.handler = org.apache.flume.source.http.JSONHandler

a1.channels = ch1
# a channel type is required; a memory channel is assumed here
a1.channels.ch1.type = memory
a1.sources = http-source1
a1.sinks = hbase-sink1
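With the source, channel, and sink wired to agent a1, the agent can be started with the stock flume-ng launcher (the file name flume.conf and the conf directory are assumptions; adjust them to your layout):

#flume-ng agent --conf conf --conf-file flume.conf --name a1 -Dflume.root.logger=INFO,console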
HBase
#hbase shell
>create 'users', 'info'
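The table can also be created programmatically; below is a minimal sketch using the third-party happybase client (it assumes the HBase Thrift service is running, e.g. started with hbase thrift start, and the host is an assumption):

import happybase

connection = happybase.Connection('192.168.10.204')  # Thrift server host (assumed)
if b'users' not in connection.tables():
    # one column family named 'info', default options
    connection.create_table('users', {'info': dict()})
print(connection.tables())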
Curl post json
curl -i -H 'content-type: application/json' -X POST -d '[{"headers":{"userId":"9","name":"ZhangZiYi","phoneNumber":"1522323222"}, "body":"9\tZhangZiYi\tzy@163.com"}]' http://192.168.10.204:5140
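The same event can be sent from Python with the requests library; a sketch using the host/port from the source config above:

import requests

# JSONHandler expects a JSON array of events, each with "headers" and "body".
events = [{
    "headers": {"userId": "9", "name": "ZhangZiYi", "phoneNumber": "1522323222"},
    "body": "9\tZhangZiYi\tzy@163.com",  # tab-separated, matching the sink regex
}]
resp = requests.post("http://192.168.10.204:5140", json=events)
print(resp.status_code)  # the HTTP source replies 200 OK when the events are accepted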
HBase result
>scan 'users'
Note: the name column gets its content twice, once from the JSON headers and once from the body, so the second write simply overwrites the first with the same value. Actually, you can specify different column names (e.g. a header key such as userName) to save the same content in different cells.
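For a scripted check instead of the shell, a happybase scan sketch (same Thrift assumption as in the table-creation example):

import happybase

connection = happybase.Connection('192.168.10.204')
table = connection.table('users')
for row_key, cells in table.scan():
    # cells maps b'info:<qualifier>' to values, e.g. b'info:email' -> b'zy@163.com'
    print(row_key, cells)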
References
http://flume.apache.org/FlumeUserGuide.html#hbasesinks
http://thunderheadxpler.blogspot.jp/2013/09/bigdata-apache-flume-hdfs-and-hbase.html