Article List

High-Quality PPT Online

    Blog category:
  • ML
http://www.slideshare.net/yuhuang/large-scale-machine-learning-for-big-data

Spark: Spark Streaming

Spark Streaming uses a “micro-batch” architecture, where the streaming computation is treated as a continuous series of batch computations on small batches of data. Spark Streaming receives data from various input sources and groups it into small batches. New batches are created at regular time int ...
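To make the micro-batch idea concrete, here is a minimal plain-Java sketch. It is not Spark code. It assigns each timestamped event to the fixed-length time window it falls into, which is how a stream gets cut into small batches. The `Event` record and the `batch` method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not Spark's API): cut a stream of timestamped events
// into fixed-interval batches, keyed by the start time of each batch window.
class MicroBatcher {
    // Hypothetical input record: arrival time in milliseconds plus a payload.
    record Event(long timestampMillis, String payload) {}

    // Each event goes into the window [k * interval, (k + 1) * interval).
    static Map<Long, List<String>> batch(List<Event> events, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        Map<Long, List<String>> batches = new TreeMap<>(); // ordered by window start
        for (Event e : events) {
            long batchStart = Math.floorDiv(e.timestampMillis(), intervalMillis) * intervalMillis;
            batches.computeIfAbsent(batchStart, k -> new ArrayList<>()).add(e.payload());
        }
        return batches;
    }
}
```

In Spark Streaming itself, the interval is the batch duration passed when the streaming context is created (for example `Durations.seconds(1)`), and each batch is processed as an RDD by an ordinary batch job.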
In distributed mode, Spark uses a master/slave architecture with one central coordinator and many distributed workers. The central coordinator is called the driver. The driver communicates with a potentially large number of distributed workers called executors. The driver runs in its own Java p ...
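The driver/executor split can be loosely illustrated inside a single JVM. This is only an analogy and uses no Spark API. A "driver" method splits the work into one task per data partition. A thread pool stands in for the executors. The driver then collects and combines the partial results. The class name and the sum-of-squares job are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Single-JVM analogy only: the calling thread plays the driver, and a fixed
// thread pool plays the role of the distributed executors.
class DriverExecutorSketch {
    static long sumOfSquares(List<List<Integer>> partitions, int numExecutors) {
        ExecutorService executors = Executors.newFixedThreadPool(numExecutors);
        try {
            // The driver submits one task per partition.
            List<Future<Long>> results = new ArrayList<>();
            for (List<Integer> partition : partitions) {
                results.add(executors.submit(() -> {
                    long partial = 0;
                    for (int x : partition) partial += (long) x * x;
                    return partial;
                }));
            }
            // The driver gathers the partial results and combines them.
            long total = 0;
            for (Future<Long> f : results) {
                try {
                    total += f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return total;
        } finally {
            executors.shutdown();
        }
    }
}
```

In real Spark, the executors are separate JVM processes on worker nodes, and the driver ships serialized tasks to them over the network. The thread pool only mimics the scheduling shape.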
Hosts: 192.168.0.135, 192.168.0.136, 192.168.0.137 (master: 137; workers: 135, 136)
1. Install Spark on all hosts in the /opt directory.
2. Set up SSH remote access from the master (137):
   137# ssh-keygen
   137# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.0.135
   137# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.0.136
3. Conf ...
https://spark.apache.org/docs/latest/cluster-overview.html   This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved. Read through the application submission guide to submit applications to a cluster. Components Spark application ...
In order to flow the data across multiple agents or hops, the sink of the previous agent and the source of the current hop need to be of type avro, with the sink pointing to the hostname (or IP address) and port of the source.

Hop 1:
a1.channels.ch1.type = memory
a1.sources.avro-source1.channel ...
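The wiring rule (an Avro sink on the previous agent whose `hostname` and `port` match the Avro source of the next agent) can be shown by building both sides as `java.util.Properties`. The property keys (`type`, `channel`/`channels`, `hostname`, `bind`, `port`) are Flume's standard Avro sink and source settings. The agent names `upstream` and `downstream`, the component names, and the host and port values are hypothetical. This is a sketch of the matching requirement, not a replacement for writing `flume.conf` by hand.

```java
import java.util.Properties;

// Hypothetical helper: produce matching Avro sink (previous hop) and
// Avro source (current hop) settings so the two agents are linked.
class FlumeHopConfig {
    // Previous agent: an Avro sink that sends to the next hop's host and port.
    static Properties previousHop(String targetHost, int port) {
        Properties p = new Properties();
        p.setProperty("upstream.sinks.avro-sink.type", "avro");
        p.setProperty("upstream.sinks.avro-sink.channel", "ch1");
        p.setProperty("upstream.sinks.avro-sink.hostname", targetHost);
        p.setProperty("upstream.sinks.avro-sink.port", Integer.toString(port));
        return p;
    }

    // Current agent: an Avro source listening on the same port.
    static Properties currentHop(String bindAddress, int port) {
        Properties p = new Properties();
        p.setProperty("downstream.sources.avro-source.type", "avro");
        p.setProperty("downstream.sources.avro-source.channels", "ch1");
        p.setProperty("downstream.sources.avro-source.bind", bindAddress);
        p.setProperty("downstream.sources.avro-source.port", Integer.toString(port));
        return p;
    }
}
```

Note that a sink takes a single `channel`, while a source takes a list of `channels`.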

Flume: hbase sink

flume.conf
a1.sinks.hbase-sink1.channel = ch1
a1.sinks.hbase-sink1.type = hbase
a1.sinks.hbase-sink1.table = users
a1.sinks.hbase-sink1.columnFamily = info
a1.sinks.hbase-sink1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.hbase-sink1.serializer.regex = ^(.+)\t(.+)\t( ...
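`RegexHbaseEventSerializer` splits each event body with the configured regex and writes capture group *i* into the *i*-th column listed in `serializer.colNames`. The plain-Java sketch below illustrates that mapping with `java.util.regex`. It is not Flume's own code. Because the regex above is cut off, the three-column pattern and the column names `name`, `email`, `city` are hypothetical. Returning an empty map for non-matching lines is also just this sketch's choice.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of regex-to-column mapping: capture group i fills column i.
class RegexColumnMapper {
    static Map<String, String> toColumns(String body, Pattern regex, List<String> colNames) {
        Matcher m = regex.matcher(body);
        if (!m.matches()) {
            return Map.of(); // sketch's choice: skip lines that do not match
        }
        if (m.groupCount() != colNames.size()) {
            throw new IllegalArgumentException(
                "regex has " + m.groupCount() + " groups but " + colNames.size() + " column names");
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < colNames.size(); i++) {
            row.put(colNames.get(i), m.group(i + 1)); // groups are 1-based
        }
        return row;
    }
}
```

For example, the tab-separated line `alice<TAB>alice@example.com<TAB>Paris` with the pattern `^(.+)\t(.+)\t(.+)$` fills `name`, `email`, and `city` in that order.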
http://kitesdk.org/docs/1.0.0/morphlines/
http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/
http://kitesdk.org/docs/current/

Neo4j: fulltext search

Model:
@Indexed(indexType = IndexType.FULLTEXT, indexName = "TaskTile")
private String title;

Repository:
@Query("START n=node:TaskTile({0}) return n")
Iterable<Task> findTasksByTitle(String query);

Query string parameter: title:*software*

In ...
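The repository method receives a raw Lucene query string such as `title:*software*`, so user input should be escaped before it is embedded. Otherwise characters like `+` or `:` change the query's meaning. Below is a minimal sketch of a hypothetical helper `titleContains` that escapes Lucene's special characters with a backslash and wraps the keyword in wildcards. It assumes the field is named `title`, as in the example above. It does not handle whitespace inside the keyword.

```java
// Hypothetical helper: build the fulltext query passed to findTasksByTitle,
// escaping Lucene query-syntax characters in the user's keyword.
class FulltextQuery {
    // Characters with special meaning in Lucene's classic query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    static String titleContains(String keyword) {
        StringBuilder sb = new StringBuilder("title:*");
        for (char c : keyword.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // escape the special character
            }
            sb.append(c);
        }
        return sb.append('*').toString();
    }
}
```

Usage: `findTasksByTitle(FulltextQuery.titleContains("software"))` passes `title:*software*`. An input such as `c++` becomes `title:*c\+\+*`.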
In my project, I provide Neo4j extensions to clients, which send a JSON string to the extension; Jackson automatically parses the JSON string into my POJO model. However, I want to simplify the JSON string sent by the client. The POJO model class looks like this:
@NodeEntity
@XmlRootElement
@JsonAutoDetect
@JsonIgnor ...
I want to search for nodes by date and time. In the model:
@Indexed private int startDate;
@Indexed private int startTime;
@Indexed private int endDate;
@Indexed private int endTime;

One issue to note is that you should write your own custom deserializer JSON t ...
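Storing dates and times as `int` fields only supports range searches if the encoding sorts chronologically. The sketch below assumes a convention the post does not state: dates as `yyyyMMdd` (e.g. 20140307) and times as `HHmm` (e.g. 905 for 09:05). It shows the conversion from `java.time` values and a combined `yyyyMMddHHmm` key for checking whether an interval covers a given moment. All names are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Assumption (not from the post): dates are stored as yyyyMMdd and times as
// HHmm, so plain integer comparison matches chronological order.
class IntDateTime {
    static int toIntDate(LocalDate d) {
        return d.getYear() * 10000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    static int toIntTime(LocalTime t) {
        return t.getHour() * 100 + t.getMinute();
    }

    // Combine date and time into one sortable number: yyyyMMddHHmm.
    static long toKey(int date, int time) {
        return date * 10000L + time;
    }

    // True if the moment lies within [start, end], inclusive on both ends.
    static boolean covers(int startDate, int startTime, int endDate, int endTime, LocalDateTime at) {
        long key = toKey(toIntDate(at.toLocalDate()), toIntTime(at.toLocalTime()));
        return toKey(startDate, startTime) <= key && key <= toKey(endDate, endTime);
    }
}
```

Comparing the combined key avoids a subtle bug. Testing `startTime <= t <= endTime` on its own is wrong for intervals that span more than one day.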
I followed the Spring Data Neo4j reference guide to import the geospatial functionality into my project, like this:
<dependency>
  <groupId>org.neo4j</groupId>
  <artifactId>neo4j-spatial</artifactId>
  <version>0.13-neo4j-2.0.1</version>
</dependency>
...

Java: Enum

    Blog category:
  • Java
http://www.tuicool.com/articles/YvQZFf
To improve server performance when a query returns a huge result, which would otherwise take a lot of memory to build the result JSON string, one solution is to use streaming JSON responses.
@Path("/fof/{userName}")
@GET
@Produces(MediaType.APPLICATION_JSON)
publ ...
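The point of streaming is to write each element to the output as it is produced, instead of building the whole JSON string in memory first. Below is a minimal stdlib-only sketch of a JSON array writer, assuming the result is a sequence of strings (e.g. user names). The class and method names are hypothetical. In JAX-RS, a method like `writeArray` would typically be called from inside a `StreamingOutput` implementation's `write(OutputStream)` method, and that `StreamingOutput` would be returned as the response entity.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Writes a JSON array of strings element by element, so memory use does not
// grow with the size of the result.
class JsonArrayStreamer {
    static void writeArray(Iterator<String> names, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write('[');
        boolean first = true;
        while (names.hasNext()) {
            if (!first) w.write(',');
            first = false;
            writeJsonString(names.next(), w);
        }
        w.write(']');
        w.flush(); // flush, but leave closing the stream to the caller/container
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    static void writeJsonString(String s, Writer w) throws IOException {
        w.write('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"' -> w.write("\\\"");
                case '\\' -> w.write("\\\\");
                case '\n' -> w.write("\\n");
                default -> {
                    if (c < 0x20) w.write(String.format("\\u%04x", (int) c));
                    else w.write(c);
                }
            }
        }
        w.write('"');
    }
}
```

A production service would more likely use a streaming JSON library (for example Jackson's `JsonGenerator`) instead of hand-written escaping. The hand-rolled version just keeps the sketch dependency-free.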