I had already integrated carrot2 with solr-4.x successfully, using my customized Chinese tokenizer.
But I ran into errors when I followed my earlier series of blog posts (http://ylzhj02.iteye.com/blog/2152348) to adapt carrot2 to solr-5.1.0.
The error is:
org.carrot2.util.factory.FallbackFactory; Tokenizer for Chinese Simplified (zh_cn) is not available. This may degrade clustering quality of Chinese Simplified content. Cause: java.lang.NoSuchMethodError: org.apache.lucene.analysis.Tokenizer.<init>(Ljava/io/Reader;)V
The reason is that solr-5.1.0 uses lucene 5.1.0, whereas carrot2-3.10.0 was built against lucene 4.6.0, so the jars are incompatible: the Tokenizer(Reader) constructor that the old adapter calls no longer exists in lucene 5.x.
So the solution is to download the latest version of carrot2 (3.11.0 at the time of writing), which is built against lucene 5.1.0:
#git clone git://github.com/carrot2/carrot2.git
#cd carrot2
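Before patching anything, it can help to confirm the mismatch by comparing the lucene jars on both sides. The following is only a sketch: artifact_versions is a helper name made up for this post, it assumes jars follow the usual <artifact>-<version>.jar naming, and the two paths in the usage comment are assumptions about where Solr unpacks its webapp and where the carrot2 checkout keeps its libraries.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct versions of an artifact's jars
# found anywhere under a directory, one version per line.
# Usage: artifact_versions <dir> <artifact>
artifact_versions() {
    # Match only files named <artifact>-<digit...>.jar, then strip the
    # directory part, the "<artifact>-" prefix and the ".jar" suffix.
    find "$1" -type f -name "$2-[0-9]*.jar" 2>/dev/null \
        | sed -e 's|.*/||' -e "s|^$2-||" -e 's|\.jar$||' \
        | sort -u
}

# Example calls (paths are assumptions, adjust to your installation):
#   artifact_versions /opt/solr/server/solr-webapp/webapp lucene-core
#   artifact_versions ./lib lucene-core
```

If the two calls print different major versions, a NoSuchMethodError like the one above is to be expected.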
step 1:
#vi core/carrot2-util-text/src/org/carrot2/text/linguistic/DefaultTokenizerFactory.java
Add the import:
import org.carrot2.text.linguistic.lucene.InokChineseTokenizerAdapter;
and change
map.put(LanguageCode.CHINESE_SIMPLIFIED,
    new NewClassInstanceFactory<ITokenizer>(ChineseTokenizerAdapter.class));
to
map.put(LanguageCode.CHINESE_SIMPLIFIED,
    new NewClassInstanceFactory<ITokenizer>(InokChineseTokenizerAdapter.class));
step 2:
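Write the adapter class (the ITokenizer implementation that wraps the custom Chinese tokenizer) and copy it into the carrot2 source tree:
#vi chineseTokenizer/InokChineseTokenizerAdapter.java
#cp chineseTokenizer/InokChineseTokenizerAdapter.java ./core/carrot2-util-text/src/org/carrot2/text/linguistic/lucene/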
step 3:
#mkdir lib/org.lionsoul.jcseg
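Put the jcseg jar, its license and two descriptor files into it, so that the directory looks like this:
lib/org.lionsoul.jcseg
├── build.properties
├── jcseg-core-1.9.6.jar
├── jcseg.LICENSE
└── META-INF
    └── MANIFEST.MF
The contents of the two descriptor files are as follows.
build.properties:
bin.includes = META-INF/,\
               jcseg-core-1.9.6.jar,\
               jcseg.LICENSE
META-INF/MANIFEST.MF: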
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: Jcseg Tokenizer
Bundle-SymbolicName: org.lionsoul.jcseg
Bundle-Version: 1.9.6
Bundle-ClassPath: jcseg-core-1.9.6.jar
Bundle-Vendor: INokNok Inc.
Bundle-RequiredExecutionEnvironment: JavaSE-1.6
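A mismatch between Bundle-ClassPath and the files actually present is easy to miss. As a rough sanity check, the sketch below verifies that every Bundle-ClassPath entry exists in the bundle directory. check_bundle_classpath is a hypothetical helper, not part of carrot2 or ant, and it assumes the header fits on a single line, as it does in the manifest above.

```shell
#!/bin/sh
# Hypothetical helper: check that each Bundle-ClassPath entry listed in
# <dir>/META-INF/MANIFEST.MF exists inside <dir>.
# Returns 0 if everything is present, 1 otherwise.
# Assumption: the Bundle-ClassPath header is not wrapped across lines.
check_bundle_classpath() {
    manifest="$1/META-INF/MANIFEST.MF"
    if [ ! -f "$manifest" ]; then
        echo "missing: $manifest"
        return 1
    fi
    status=0
    # Extract the header value, drop any CR, split the entries on commas.
    for entry in $(sed -n 's/^Bundle-ClassPath:[[:space:]]*//p' "$manifest" \
                       | tr -d '\r' | tr ',' ' '); do
        if [ ! -e "$1/$entry" ]; then
            echo "missing: $1/$entry"
            status=1
        fi
    done
    return $status
}

# Example call, run from the carrot2 checkout:
#   check_bundle_classpath lib/org.lionsoul.jcseg
```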
step 4:
Modify build.xml so that the jcseg jar is picked up by the build. The lines that mention jcseg are the ones to add:
<patternset id="lib.test">
  <include name="core/**/*.jar" />
  <include name="lib/**/*.jar" />
  <include name="lib/org.lionsoul.jcseg/*.jar" />
  <exclude name="lib/org.slf4j/slf4j-nop*" />
  <include name="applications/carrot2-dcs/**/*.jar" />
  <include name="applications/carrot2-webapp/lib/*.jar" />
  <include name="applications/carrot2-benchmarks/lib/*.jar" />
</patternset>

<patternset id="lib.core">
  <include name="lib/**/*.jar" />
  <include name="lib/org.lionsoul.jcseg/*.jar" />
  <include name="core/carrot2-util-matrix/lib/*.jar" />
  <patternset refid="lib.core.excludes" />
</patternset>

<patternset id="lib.core.mini">
  <include name="lib/**/mahout-*.jar" />
  <include name="lib/**/jcseg*.jar" />
  <include name="lib/**/mahout.LICENSE" />
  <include name="lib/**/colt.LICENSE" />
  <include name="lib/**/commons-lang*" />
  <include name="lib/**/guava*" />
  <include name="lib/**/jackson*" />
  <include name="lib/**/lucene-snowball*" />
  <include name="lib/**/lucene.LICENSE" />
  <include name="lib/**/hppc-*.jar" />
  <include name="lib/**/hppc*.LICENSE" />

  <include name="lib/**/slf4j-api*.jar" />
  <include name="lib/**/slf4j-nop*.jar" />
  <include name="lib/**/slf4j.LICENSE" />

  <include name="lib/**/attributes-binder-*.jar" />
</patternset>

<target name="core" depends="jar, jar.src, lib-no-jar.flattened" description="Builds Carrot2 Java API JAR with dependencies">
  <delete dir="${api.dir}" failonerror="false" />
  <mkdir dir="${api.dir}" />
  <mkdir dir="${api.dir}/lib" />
  <mkdir dir="${api.dir}/examples" />
  <mkdir dir="${api.dir}/resources" />

  <patternset id="carrot2.required">
    <include name="**/jcseg*" />
    <include name="**/commons-lang*" />
step 6:
#ant jar
#scp tmp/jar/carrot2-core-3.11.0-SNAPSHOT.jar root@192.168.0.135:/opt/solr/contrib/clustering/lib
Then restart the Solr server and test clustering.
-----------------------------
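An error occurs:
org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: com/carrotsearch/hppc/ObjectHashSet
The new carrot2 jar needs hppc-0.7.1, but the clustering contrib still has hppc-0.5.2, which does not contain this class.
Solution: copy the newer jar over
#scp lib/com.carrotsearch.hppc/hppc-0.7.1.jar root@192.168.0.135:/opt/solr/contrib/clustering/lib/
and then, on the Solr server, remove the old one:
#rm -f /opt/solr/contrib/clustering/lib/hppc-0.5.2.jar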
------
Another error is:
java.lang.RuntimeException: java.lang.IllegalAccessError: class com.carrotsearch.hppc.ObjectHashSet cannot access its superclass com.carrotsearch.hppc.AbstractObjectCollection
The reason is that an old hppc-0.5.2.jar is still packaged inside /opt/solr/server/webapps/solr.war, so classes from the two hppc versions get mixed.
So the solution is to replace it in the unpacked webapp and rebuild the war:
#cd /opt/solr/server/solr-webapp/webapp
#rm -f WEB-INF/lib/hppc-0.5.2.jar
#cp /opt/solr/contrib/clustering/lib/hppc-0.7.1.jar WEB-INF/lib
#jar cf solr.war ./
#mv solr.war /opt/solr/server/webapps
Restart Solr, and the error disappears.
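Both errors above trace back to the old hppc-0.5.2.jar still being on Solr's classpath. A quick way to spot such clashes before restarting is to group jar file names by artifact and report every artifact that shows up in more than one version. This is a sketch only: conflicting_artifacts is a made-up helper name, and it assumes the <artifact>-<version>.jar naming convention, with the version starting with a digit.

```shell
#!/bin/sh
# Hypothetical helper: list artifacts that occur in more than one version
# anywhere under a directory. Each output line has the form
#   artifact: version version ...
# Usage: conflicting_artifacts <dir>
conflicting_artifacts() {
    find "$1" -type f -name '*.jar' 2>/dev/null \
        | sed -e 's|.*/||' -e 's|\.jar$||' \
        | sed -n 's|^\(.*\)-\([0-9][0-9A-Za-z.-]*\)$|\1 \2|p' \
        | sort -u \
        | awk '{ count[$1]++; versions[$1] = versions[$1] " " $2 }
               END { for (a in count) if (count[a] > 1) print a ":" versions[a] }' \
        | sort
}

# Example call on the Solr server (path taken from this post):
#   conflicting_artifacts /opt/solr
```

The same version found in several directories is reported only once, so an artifact is listed only when the versions really differ. To see where the offending files live, a plain find for the artifact name (for example find /opt/solr -name 'hppc-*.jar') is enough.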