1. The prebuilt PredictionIO 0.11.0-incubating download has an HDFS configuration bug
Running the pio status command fails with the following error:
2017-05-31 13:08:46,819 ERROR org.apache.predictionio.data.storage.Storage$ [main] - Error initializing storage client for source HDFS
java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
    at org.apache.predictionio.data.storage.hdfs.StorageClient.<init>(StorageClient.scala:32)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:223)
    at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:254)
    at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:215)
    at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:215)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:215)
    at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:284)
    at org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:269)
    at org.apache.predictionio.data.storage.Storage$.getModelDataModels(Storage.scala:411)
    at org.apache.predictionio.data.storage.Storage$.verifyAllDataObjects(Storage.scala:350)
    at org.apache.predictionio.tools.commands.Management$.status(Management.scala:156)
    at org.apache.predictionio.tools.console.Pio$.status(Pio.scala:144)
    at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:663)
    at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:611)
    at scala.Option.map(Option.scala:145)
    at org.apache.predictionio.tools.console.Console$.main(Console.scala:611)
    at org.apache.predictionio.tools.console.Console.main(Console.scala)
2017-05-31 13:08:46,826 ERROR org.apache.predictionio.tools.commands.Management$ [main] - Unable to connect to all storage backends successfully.
This is a known bug in the code: https://issues.apache.org/jira/browse/PIO-91
It can be worked around by downloading the latest source code from GitHub and building it yourself.
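The trace shows the failure happening while PIO creates the storage client for the source named HDFS, reached through `getModelDataModels`, so a setup is presumably only exposed when some repository in pio-env.sh is routed to an HDFS source. The sketch below is one hedged way to check that. It assumes pio-env.sh has already been sourced and follows the template's `PIO_STORAGE_REPOSITORIES_<REPO>_SOURCE` / `PIO_STORAGE_SOURCES_<SOURCE>_TYPE` naming; the function name `repos_using_hdfs` is hypothetical and not part of PIO.

```shell
# Sketch: list the PIO storage repositories whose source has TYPE "hdfs".
# Assumes pio-env.sh has already been sourced into this shell and uses the
# template's variable naming. The function name is hypothetical (not part of PIO).
repos_using_hdfs() {
  local repo src src_type
  for repo in METADATA EVENTDATA MODELDATA; do
    # POSIX sh has no ${!var}, so read PIO_STORAGE_REPOSITORIES_<REPO>_SOURCE via eval.
    # Source names are assumed to be plain identifiers such as HDFS.
    eval "src=\${PIO_STORAGE_REPOSITORIES_${repo}_SOURCE:-}"
    [ -n "$src" ] || continue   # this repository is not configured
    # Look up PIO_STORAGE_SOURCES_<SOURCE>_TYPE for the source it points at.
    eval "src_type=\${PIO_STORAGE_SOURCES_${src}_TYPE:-}"
    if [ "$src_type" = "hdfs" ]; then
      echo "$repo -> $src"
    fi
  done
}
```

For example, after `. conf/pio-env.sh`, calling `repos_using_hdfs` prints `MODELDATA -> HDFS` when the model-data repository uses a source named HDFS whose TYPE is `hdfs`, and prints nothing when no repository uses an HDFS source.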
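2. Building PredictionIO from source
The ElasticSearch version deserves special mention here because, once the build succeeds, we need to edit the pio-env.sh configuration file in the conf directory, and that step needs particular care: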
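3. ElasticSearch version support in 0.11.0-incubating
The code of this PIO version contains two sets of support code, one for ElasticSearch 1 and one for ElasticSearch 5.
ElasticSearch 2 may end up going through the ES5 code, which can cause problems; for example, that code uses the "keyword" field type, which only exists from ES5 on and is missing in ES2.
Therefore, 0.11.0-incubating is not recommended together with ElasticSearch 2.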
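The compatibility advice above can be encoded as a small guard, for instance at the top of a deployment script. This is a minimal sketch: the function name `es_ok_for_pio_0110` is hypothetical, and it only reflects the rule stated here (ES1 and ES5 have support code, ES2 is not recommended), not any check that PIO itself performs.

```shell
# Sketch: encode the ES compatibility advice for PIO 0.11.0-incubating.
# Argument: an ElasticSearch version string, e.g. "1.7.6" (illustrative).
# Returns 0 for ES1/ES5; returns 1 with a warning on stderr otherwise.
# The function name is hypothetical.
es_ok_for_pio_0110() {
  case "$1" in
    1.*|5.*)
      return 0 ;;
    2.*)
      echo "ES $1: not recommended; PIO may use its ES5 code, e.g. the \"keyword\" type ES2 lacks" >&2
      return 1 ;;
    *)
      echo "ES $1: not covered by the ES1/ES5 support code in this PIO version" >&2
      return 1 ;;
  esac
}
```

Calling `es_ok_for_pio_0110 2.4.1` returns non-zero and prints a warning on stderr, while `1.x` and `5.x` version strings return 0.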
Also note that pio-env.sh needs slightly different settings depending on the ES version; getting them wrong causes problems too.
With ES1, the default port is 9300:
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=<some-elasticsearch-node>,<some-other-elasticsearch-node>,...
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,9300,9300
With ES5, the default port is 9200:
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=<some-elasticsearch-node>,<some-other-elasticsearch-node>,...
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,9200,9200
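Since ES1 and ES5 expect different ports here (9300 is ElasticSearch's default transport port, 9200 its default HTTP port), a quick check of the two variables can catch a pio-env.sh filled in for the wrong version. The sketch below assumes pio-env.sh has been sourced, that HOSTS and PORTS should have the same number of comma-separated entries (as in the examples above), and that ElasticSearch listens on its default port; `check_es_ports` and `count_items` are hypothetical helper names.

```shell
# Sketch: sanity-check the ElasticSearch HOSTS/PORTS settings in pio-env.sh.
# Assumptions (not from PIO itself): pio-env.sh has been sourced into this shell,
# both lists should have the same length, and ES listens on its default port.
# Function names are hypothetical.

# Print the number of comma-separated entries in $1.
count_items() {
  local n=0 item
  for item in $(printf '%s' "$1" | tr ',' ' '); do
    n=$((n + 1))
  done
  echo "$n"
}

# Usage: check_es_ports <ES major version: 1 or 5>
check_es_ports() {
  local major="$1" expected port bad=0
  case "$major" in
    1) expected=9300 ;;   # ES1 default used above
    5) expected=9200 ;;   # ES5 default used above
    *) echo "no default port known for ES$major" >&2; return 1 ;;
  esac

  # HOSTS and PORTS must list the same number of entries.
  if [ "$(count_items "${PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS:-}")" -ne \
       "$(count_items "${PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS:-}")" ]; then
    echo "HOSTS and PORTS have a different number of entries" >&2
    return 1
  fi

  # Every port should be the default for this ES major version.
  for port in $(printf '%s' "${PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS:-}" | tr ',' ' '); do
    if [ "$port" != "$expected" ]; then
      echo "port $port is not the ES$major default $expected" >&2
      bad=1
    fi
  done
  return "$bad"
}
```

A non-zero return does not necessarily mean the configuration is wrong: if ElasticSearch has been set up to listen on non-default ports, `expected` would need adjusting accordingly.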
Because the Universal Recommender mainly supports ElasticSearch 1, we ultimately decided to use ElasticSearch 1.