https://storm.apache.org/documentation/Trident-tutorial.html
The Trident data model is the TridentTuple which is a named list of values. During a topology, tuples are incrementally built up through a sequence of operations. Operations generally take in a set of input fields and emit a set of "function fields". The input fields are used to select a subset of the tuple as input to the operation, while the "function fields" name the fields the operation emits.
Consider this example. Suppose you have a stream called "stream" that contains the fields "x", "y", and "z". To run a filter MyFilter that takes in "y" as input, you would say:
stream.each(new Fields("y"), new MyFilter())
Suppose the implementation of MyFilter is this:
public class MyFilter extends BaseFilter {
    public boolean isKeep(TridentTuple tuple) {
        return tuple.getInteger(0) < 10;
    }
}
This will keep all tuples whose "y" field is less than 10. The TridentTuple given as input to MyFilter will only contain the "y" field. Note that Trident is able to project a subset of a tuple extremely efficiently when selecting the input fields: the projection is essentially free.
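The project-then-filter behavior can be sketched in plain Java. This is an illustration of the semantics only, not the Storm API: the Map-based tuple and the method names are invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FilterSketch {
    // Stands in for MyFilter.isKeep: the filter only ever sees the projected field.
    static boolean isKeep(int y) {
        return y < 10;
    }

    // Apply the filter to a batch, projecting "y" as the single input field.
    static List<Map<String, Integer>> filterStream(List<Map<String, Integer>> stream) {
        return stream.stream()
                .filter(t -> isKeep(t.get("y")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> stream = List.of(
                Map.of("x", 1, "y", 5, "z", 7),
                Map.of("x", 2, "y", 42, "z", 8));
        System.out.println(filterStream(stream)); // only the y=5 tuple survives
    }
}
```

Note that the surviving tuple still carries all of its fields ("x", "y", "z"); the projection only controls what the filter sees as input.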
Let's now look at how "function fields" work. Suppose you had this function:
public class AddAndMultiply extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        int i1 = tuple.getInteger(0);
        int i2 = tuple.getInteger(1);
        collector.emit(new Values(i1 + i2, i1 * i2));
    }
}
This function takes two numbers as input and emits two new values: their sum and their product. Suppose you had a stream with the fields "x", "y", and "z". You would use this function like this:
stream.each(new Fields("x", "y"), new AddAndMultiply(), new Fields("added", "multiplied"));
The output of functions is additive: the fields are added to the input tuple. So the output of this each call would contain tuples with the five fields "x", "y", "z", "added", and "multiplied". "added" corresponds to the first value emitted by AddAndMultiply, while "multiplied" corresponds to the second value.
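The additive semantics can be sketched in plain Java. Again, this is an illustration, not the Storm API; the Map-based tuple is invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class EachSketch {
    // Stands in for stream.each(new Fields("x", "y"), new AddAndMultiply(),
    //                           new Fields("added", "multiplied")).
    static Map<String, Integer> each(Map<String, Integer> tuple) {
        int i1 = tuple.get("x");
        int i2 = tuple.get("y");
        Map<String, Integer> out = new HashMap<>(tuple); // the input fields are kept
        out.put("added", i1 + i2);        // first value emitted by the function
        out.put("multiplied", i1 * i2);   // second value emitted by the function
        return out;
    }

    public static void main(String[] args) {
        // The result carries all five fields: x, y, z, added, multiplied.
        System.out.println(each(Map.of("x", 2, "y", 3, "z", 9)));
    }
}
```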
With aggregators, on the other hand, the function fields replace the input tuples. So if you had a stream containing the fields "val1" and "val2", and you did this:
stream.aggregate(new Fields("val2"), new Sum(), new Fields("sum"))
The output stream would only contain a single tuple with a single field called "sum", representing the sum of all "val2" fields in that batch.
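The "replace" semantics of an aggregator can be sketched in plain Java (an illustration only, not the Storm API): the whole batch collapses to one value, and the input fields are discarded rather than carried along.

```java
import java.util.List;

public class AggregateSketch {
    // Stands in for stream.aggregate(new Fields("val2"), new Sum(), new Fields("sum")):
    // the batch of "val2" values becomes a single "sum" value, and nothing else survives.
    static int sum(List<Integer> val2s) {
        return val2s.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 5))); // 8
    }
}
```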
With grouped streams, the output will contain the grouping fields followed by the fields emitted by the aggregator. For example:
stream.groupBy(new Fields("val1"))
.aggregate(new Fields("val2"), new Sum(), new Fields("sum"))
In this example, the output will contain the fields "val1" and "sum".
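The grouped-aggregation semantics can be sketched in plain Java. This is an illustration of the behavior, not the Storm API; the Tuple record is invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupBySketch {
    // A tuple with the two fields of the example stream.
    record Tuple(String val1, int val2) {}

    // Stands in for stream.groupBy(new Fields("val1"))
    //                     .aggregate(new Fields("val2"), new Sum(), new Fields("sum")):
    // one output per group, carrying the grouping field and the aggregated value.
    static Map<String, Integer> groupAndSum(List<Tuple> batch) {
        return batch.stream().collect(Collectors.groupingBy(
                Tuple::val1,                        // the grouping field "val1"
                Collectors.summingInt(Tuple::val2)  // Sum over "val2" within each group
        ));
    }

    public static void main(String[] args) {
        List<Tuple> batch = List.of(
                new Tuple("a", 1), new Tuple("a", 2), new Tuple("b", 5));
        System.out.println(groupAndSum(batch)); // sums: a -> 3, b -> 5
    }
}
```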