hadoop streaming - How to call a Python UDF with streaming_python in Pig on CDH 4.4?


I am working on some Python UDFs for my Pig project. As a first approach, I prefer CPython over Jython because I want to be able to use mature Python libraries such as numpy without friction.

There is a code snippet in my pyudf.py file:

  from pig_util import outputSchema
  import numpy as np

  @outputSchema("t:double")
  def std(input):
      # input is a bag, i.e. a list of one-field tuples
      input2 = [float(t[0]) for t in input]
      return np.std(input2)

In the Pig script, I first register the Python module with the command:

  REGISTER 'pyudf.py' USING streaming_python AS myfuncs;

Then I can call the UDF in the rest of the script:

  myfuncs.std(..)

The whole workflow runs without errors in Apache Pig (1.2.1) on my desktop. But Pig complains about streaming_python when I run the same code on our CDH platform:

  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Could not load ScriptEngine: streaming_python for streaming_python (Supported langs: [javascript, jython, groovy, jruby]) : java.lang.ClassNotFoundException: streaming_python

Does anyone know how to make streaming_python work on CDH, or of any other solution?
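For reference, one fallback that might work is sketched below. It assumes the Pig on CDH really only offers Jython for Python UDFs, as the error message suggests, and it drops the numpy dependency so the same file could be registered with `USING jython` instead. The stand-in `outputSchema` decorator in the `except` branch is a hypothetical substitute for running the file outside Pig, and whether the `pig_util` import is available under Jython is also an assumption.

```python
import math

try:
    # Supplied by Pig when the script runs as a UDF.
    from pig_util import outputSchema
except ImportError:
    # Hypothetical stand-in so the file can be imported and tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("t:double")
def std(input):
    # The bag arrives as a list of one-field tuples; take the first field.
    values = [float(t[0]) for t in input]
    if not values:
        return None
    mean = sum(values) / len(values)
    # Population standard deviation, the same quantity np.std returns by default.
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)
```

Called locally, `std([(1.0,), (2.0,), (3.0,), (4.0,)])` returns about 1.118, which matches `np.std` on the same values. This only helps for UDFs simple enough to write without numpy.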

Thanks in advance!

Jemin - Checkmax 35@gmail.com

