High-Quality PPT Online

Blog category:

ML

http://www.slideshare.net/yuhuang/large-scale-machine-learning-for-big-data

2015-04-24 15:33
Views: 350
Comments (0)
Category: Open Source Software

Spark: Spark Streaming

Spark Streaming uses a “micro-batch” architecture, where the streaming computation is treated as a continuous series of batch computations on small batches of data. Spark Streaming receives data from various input sources and groups it into small batches. New batches are created at regular time int ...
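The micro-batch idea in the excerpt above can be sketched in plain Java. This is not the Spark Streaming API; it only illustrates grouping incoming records into fixed-interval batches. The `Event` class, the `toBatches` method, and the 1000 ms interval are hypothetical names and values chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of micro-batching; this is NOT the Spark Streaming API.
class MicroBatchSketch {

    // Hypothetical event type: an arrival timestamp in milliseconds plus a payload.
    static class Event {
        final long timestampMs;
        final String payload;

        Event(long timestampMs, String payload) {
            this.timestampMs = timestampMs;
            this.payload = payload;
        }
    }

    // Groups events into consecutive batches of batchIntervalMs,
    // keyed by the start time of the batch each event falls into.
    static Map<Long, List<String>> toBatches(List<Event> events, long batchIntervalMs) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (Event e : events) {
            long batchStart = (e.timestampMs / batchIntervalMs) * batchIntervalMs;
            batches.computeIfAbsent(batchStart, k -> new ArrayList<>()).add(e.payload);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(100, "a"), new Event(900, "b"),
            new Event(1200, "c"), new Event(2500, "d"));
        // Each batch would then be processed as one small batch job.
        toBatches(events, 1000).forEach((start, batch) ->
            System.out.println("batch starting at " + start + " ms: " + batch));
    }
}
```

In a real streaming system the batches are cut as time passes rather than from a finished list; the sketch only shows the grouping rule.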

2015-04-22 16:02
Views: 376
Comments (0)
Category: Open Source Software

Spark: cluster architecture

Blog category:

Spark

In distributed mode, Spark uses a master/slave architecture with one central coordinator and many distributed workers. The central coordinator is called the driver. The driver communicates with a potentially large number of distributed workers called executors. The driver runs in its own Java p ...
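As a rough single-machine analogy of the driver/executor split described above, the sketch below uses a thread pool in place of executors. It is not Spark code: the class name, `distributedSum`, and the use of `ExecutorService` are assumptions made only to illustrate one coordinator handing out tasks and merging the partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Single-JVM analogy only: threads stand in for executors, the calling code for the driver.
class DriverExecutorSketch {

    // The "driver" submits one task per partition and merges the partial sums.
    static int distributedSum(List<List<Integer>> partitions, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (List<Integer> partition : partitions) {
                // Each task computes the sum of its own partition.
                partials.add(pool.submit(
                    () -> partition.stream().mapToInt(Integer::intValue).sum()));
            }
            int total = 0;
            for (Future<Integer> partial : partials) {
                total += partial.get(); // wait for the worker's result
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(3, 4, 5));
        System.out.println("total = " + distributedSum(partitions, 2));
    }
}
```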

2015-04-22 10:51
Views: 499
Comments (0)
Category: Open Source Software

Spark: deploy cluster in standalone mode

Blog category:

Spark

Hosts: 192.168.0.135, 192.168.0.136, 192.168.0.137
master: 137
workers: 135, 136
1. Install Spark on all hosts in the /opt dir
2. Set up SSH remote access
137# ssh-keygen
137# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.0.135
137# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.0.136
3. Conf ...

2015-04-20 12:32
Views: 555
Comments (0)
Category: Open Source Software

Spark: Cluster Mode Overview

Blog category:

Spark

https://spark.apache.org/docs/latest/cluster-overview.html
This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved. Read through the application submission guide to submit applications to a cluster.
Components
Spark application ...

2015-04-20 10:15
Views: 537
Comments (0)
Category: Open Source Software

Flume: avro source and sink

Blog category:

Flume

To pass data across multiple agents or hops, the sink of the previous agent and the source of the current hop both need to be of type avro, with the sink pointing to the hostname (or IP address) and port of the source.
Hop 1:
a1.channels.ch1.type = memory
a1.sources.avro-source1.channel ...

2015-04-17 11:12
Views: 739
Comments (0)
Category: Open Source Software

Flume: hbase sink

Blog category:

Flume

flume.conf
a1.sinks.hbase-sink1.channel = ch1
a1.sinks.hbase-sink1.type = hbase
a1.sinks.hbase-sink1.table = users
a1.sinks.hbase-sink1.columnFamily = info
a1.sinks.hbase-sink1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.hbase-sink1.serializer.regex = ^(.+)\t(.+)\t( ...

2015-04-16 17:04
Views: 2869
Comments (0)
Category: Open Source Software

Kite: Morphlines Introduction

Blog category:

Kite

http://kitesdk.org/docs/1.0.0/morphlines/
http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/

2015-04-13 11:09
Views: 487
Comments (0)
Category: Open Source Software

Kite: A Data API for Hadoop

Blog category:

Kite

http://kitesdk.org/docs/current/

2015-04-13 11:04
Views: 476
Comments (0)
Category: Open Source Software

Neo4j: fulltext search

Blog category:

Neo4j

Model
@Indexed(indexType = IndexType.FULLTEXT, indexName = "TaskTile")
private String title;
Repository
@Query("START n=node:TaskTile({0}) return n")
Iterable<Task> findTasksByTitle(String query);
Query string parameter
title:*software*
In ...

2015-04-08 15:03
Views: 1006
Comments (0)
Category: Open Source Software

Neo4j: custom parsing of a string to a POJO with Jackson JSON

Blog category:

Neo4j

In my project, I provide Neo4j extensions to clients, which send a JSON string to the extension, and Jackson automatically parses the JSON string into my POJO model. However, I want to simplify the JSON string sent by the client. The POJO model class looks like: @NodeEntity @XmlRootElement @JsonAutoDetect @JsonIgnor ...

2015-04-02 16:23
Views: 936
Comments (0)
Category: Open Source Software

Neo4j: Index date type field

Blog category:

Neo4j

I want to search for nodes by date and time. In the model:
@Indexed private int startDate;
@Indexed private int startTime;
@Indexed private int endDate;
@Indexed private int endTime;
One issue to note is that you should write your own custom deserializer JSON t ...
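The excerpt stores dates and times in `int` fields but does not show how the values are encoded. One common choice, assumed here and not confirmed by the post, is `yyyyMMdd` for dates and `HHmm` for times, so that numeric order matches chronological order and range comparisons work. The class and method names below are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalTime;

// Assumed encoding for illustration only; the original post does not specify one.
class DateIntSketch {

    // Encodes a date as yyyyMMdd, so a later date always gives a larger int.
    static int encodeDate(LocalDate date) {
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    // Encodes a time of day as HHmm (seconds are dropped).
    static int encodeTime(LocalTime time) {
        return time.getHour() * 100 + time.getMinute();
    }

    public static void main(String[] args) {
        int startDate = encodeDate(LocalDate.of(2015, 4, 1));
        int startTime = encodeTime(LocalTime.of(14, 45));
        System.out.println("startDate=" + startDate + " startTime=" + startTime);
    }
}
```

A custom JSON deserializer could call helpers like these when converting a date string sent by a client into the indexed `int` fields.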

2015-04-01 14:45
Views: 560
Comments (0)
Category: Open Source Software

Neo4j: Geo Spatial Query

Blog category:

Neo4j

I followed the Spring Data Neo4j reference guide to add geospatial support to my project, like this:
<dependency>
  <groupId>org.neo4j</groupId>
  <artifactId>neo4j-spatial</artifactId>
  <version>0.13-neo4j-2.0.1</version>
</dependency>
...

2015-04-01 11:30
Views: 634
Comments (0)
Category: Open Source Software

Java: Enum

Blog category:

Java

http://www.tuicool.com/articles/YvQZFf

2015-03-31 19:03
Views: 395
Comments (0)
Category: Open Source Software

Neo4j: Streaming JSON responses

Blog category:

Neo4j

When a query returns a huge result, a large amount of memory is used to build the result JSON string, which hurts server performance. One solution is to use streaming JSON responses.
@Path("/fof/{userName}")
@GET
@Produces(MediaType.APPLICATION_JSON)
publ ...
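The idea of streaming a JSON response can be sketched with the standard library alone: write each element to the output stream as it is produced instead of first building the whole JSON string in memory. This is a minimal sketch, not the post's code; `StreamingJsonSketch`, `writeJsonArray`, and `escape` are hypothetical names. In a JAX-RS resource, logic like this would typically sit inside a `StreamingOutput` callback, which is omitted here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

class StreamingJsonSketch {

    // Writes the names as a JSON array of strings, one element at a time.
    static void writeJsonArray(Iterator<String> names, OutputStream out) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write('[');
            boolean first = true;
            while (names.hasNext()) {
                if (!first) {
                    writer.write(',');
                }
                writer.write('"');
                writer.write(escape(names.next()));
                writer.write('"');
                first = false;
            }
            writer.write(']');
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Minimal escaping of backslashes and quotes only; a JSON library handles more cases.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // A ByteArrayOutputStream stands in for the HTTP response stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeJsonArray(Arrays.asList("ann", "bob").iterator(), out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Because the source is an `Iterator`, the results can come from a lazy query cursor, so only a small buffer is held at any time.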

2015-03-31 16:33
Views: 699
Comments (0)
Category: Open Source Software
