Pentaho

 PMI - Engines unavailable (Python Scikit-Learn)

Ingo Levin posted 04-12-2018 14:32

Hi, I am on Windows 8.1 and have Miniconda3 installed with Python 3.6.

I can verify this on the command line.

C:\Program Files\pdi-ce-8.0.0.0-28\data-integration>python -c "import sys; print (sys.executable);"
>> C:\Users\ingo\Miniconda3\python.exe

C:\Program Files\pdi-ce-8.0.0.0-28\data-integration>python -V
>>Python 3.6.0 :: Continuum Analytics, Inc.

In that environment I have numpy, sklearn, pandas and matplotlib installed. They all appear in the list of modules returned when I run:

python -c "help(\"modules\")"
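A module appearing in that list only shows that it can be found, not which version string it reports. As a minimal sketch (the helper name `report_versions` and the library list are illustrative assumptions, not part of PMI or wekaPython), the following prints the interpreter in use and the `__version__` of each library mentioned in this thread:

```python
import importlib
import sys

# Libraries named in this thread; adjust the list as needed (assumption).
REQUIRED = ["numpy", "scipy", "sklearn", "pandas", "matplotlib"]


def report_versions(names):
    """Return {name: version string}, or None for a module that fails to import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any failure during import is recorded as None.
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back to a marker.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name, version in report_versions(REQUIRED).items():
        print(name, "->", version if version is not None else "IMPORT FAILED")
```

Running it with the same `python` that is on the PATH shows the exact version strings a checker script would see.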

Yet when I start PDI and try to use a PMI step, it reports an error saying that Python scikit-learn and R are not installed.

What am I doing wrong?

[Image: pastedimage_6]


David Huh

Can you try installing R packages as well?

Link to install doc for your reference: https://community.hitachivantara.com/servlet/JiveServlet/downloadBody/1010952-102-1-293961/PMI_Installation__Windows.pdf

Link to other references: "PMI Installation, Developer Guide and Sample/Demos"

Kenneth Wood

Can you share the environment variables you've defined for PMI? Also, how are you launching "spoon"?

Mark Hall

Do you have scipy installed? PMI also requires this. You can run

python ~/wekafiles/packages/wekaPython/resources/py/pyCheck.py

 

(Adapt the above for Windows with respect to pointing to your home directory and backslashes etc.).
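One way to sidestep the home directory and backslash differences is to let Python build the path. This is only a sketch: the helper name `pycheck_path` is made up for illustration, and it assumes the script sits in the default wekafiles location under the user's home directory, as in the command above.

```python
import subprocess
import sys
from pathlib import Path


def pycheck_path(home=None):
    """Build the pyCheck.py path under the given (or the current user's) home directory."""
    base = Path(home) if home is not None else Path.home()
    return base.joinpath(
        "wekafiles", "packages", "wekaPython", "resources", "py", "pyCheck.py"
    )


if __name__ == "__main__":
    script = pycheck_path()
    if script.is_file():
        # Run the check with the same interpreter that executes this snippet.
        result = subprocess.run(
            [sys.executable, str(script)],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        print(result.stdout or "(no output)")
    else:
        print("not found:", script)
```

Because `pathlib` uses the separator of the platform it runs on, the same snippet works on Windows and on Unix-like systems.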

 

This is what it checks for:

def check_libraries():
    check_min_python()
    isPython3 = sys.version_info >= (3, 0)
    if isPython3:
        check_library('io')
    else:
        check_library('StringIO')
    check_library('math')
    check_library('traceback')
    check_library('socket')
    check_library('struct')
    check_library('os')
    check_library('json')
    check_library('base64')
    check_library('pickle')
    check_library('scipy')
    check_library('sklearn')
    check_library('matplotlib')
    check_library('numpy')

If there is no output, then you have all the required Python libraries.

 

Cheers,

Mark.

Ingo Levin


Hi Mark,

the script returns the following:

C:\Users\ingo>python wekafiles\packages\wekaPython\resources\py\pyCheck.py
>>A problem occurred when trying to import pandas

So, this clearly seems to be the cause of my problem. But the thing is, I do have scipy and pandas correctly installed and I can import pandas without problems in an interactive python shell session for example...

How can I troubleshoot this?

I haven't set any specific env variables for a specific python home. It's just the one referenced in the system-wide %PATH%. I have a few more other conda (python) environments, but that shouldn't matter as they are not in the PATH.

My full list of installed modules:

>>help ("modules")

Crypto              brain_stdlib        mmsystem            sspi
IPython             builtins            modulefinder        sspicon
OleFileIO_PL        bz2                 mpl_toolkits        stat
OpenSSL             cProfile            msilib              statistics
PIL                 calendar            msvcrt              storemagic
PyQt5               certifi             multiprocessing     string
TBB                 cffi                nbconvert           stringprep
__future__          cgi                 nbformat            struct
_ast                cgitb               netbios             subprocess
_asyncio            chardet             netrc               sunau
_bisect             chunk               nntplib             symbol
_blake2             clyent              notebook            sympyprinting
_bootlocale         cmath               nt                  symtable
_bz2                cmd                 ntpath              sys
_cffi_backend       code                ntsecuritycon       sysconfig
_codecs             codecs              nturl2path          tabnanny
_codecs_cn          codeop              numbers             tarfile
_codecs_hk          collections         numexpr             tbb
_codecs_iso2022     colorama            numpy               telnetlib
_codecs_jp          colorsys            numpydoc            tempfile
_codecs_kr          commctrl            odbc                test
_codecs_tw          compileall          olefile             test_path
_collections        concurrent          opcode              test_pycosat
_collections_abc    conda               operator            testpath
_compat_pickle      conda_env           optparse            tests
_compression        configparser        os                  textwrap
_csv                contextlib          pandas              this
_ctypes             copy                pandocfilters       threading
_ctypes_test        copyreg             parser              time
_datetime           crypt               path                timeit
_decimal            cryptography        pathlib             timer
_dummy_thread       csv                 pdb                 tkinter
_elementtree        ctypes              pep8                token
_functools          curses              perfmon             tokenize
_hashlib            cwp                 pickle              tornado
_heapq              cycler              pickleshare         trace
_imp                cythonmagic         pickletools         traceback
_io                 daal                pip                 tracemalloc
_json               datetime            pipes               traitlets
_license            dateutil            pkg_resources       tty
_locale             dbi                 pkgutil             turtle
_lsprof             dbm                 platform            turtledemo
_lzma               dde                 plistlib            types
_markupbase         decimal             poplib              typing
_md5                decorator           posixpath           unicodedata
_msi                difflib             pprint              unittest
_multibytecodec     dis                 profile             untitled0
_multiprocessing    distutils           prompt_toolkit      urllib
_nsis               doctest             pstats              urllib3
_opcode             docutils            psutil              uu
_operator           dummy_threading     pty                 uuid
_osx_support        easy_install        py_compile          venv
_overlapped         email               pyasn1              warnings
_pickle             encodings           pyclbr              wave
_pydecimal          ensurepip           pycosat             wcwidth
_pyio               entrypoints         pycparser           weakref
_random             enum                pydoc               webbrowser
_sha1               errno               pydoc_data          wheel
_sha256             faulthandler        pyexpat             widgetsnbextension
_sha3               filecmp             pyflakes            win2kras
_sha512             fileinput           pygments            win32api
_signal             fnmatch             pylab               win32clipboard
_sitebuiltins       formatter           pylint              win32com
_socket             fractions           pyparsing           win32con
_sqlite3            ftplib              pythoncom           win32console
_sre                functools           pytz                win32cred
_ssl                gc                  pywin               win32crypt
_stat               genericpath         pywin32_testutil    win32cryptcon
_string             getopt              pywintypes          win32event
_strptime           getpass             qtawesome           win32evtlog
_struct             gettext             qtconsole           win32evtlogutil
_symtable           glob                qtpy                win32file
_system_path        gzip                queue               win32gui
_testbuffer         hashlib             quopri              win32gui_struct
_testcapi           heapq               random              win32help
_testconsole        hmac                rasutil             win32inet
_testimportmultiple html                re                  win32inetcon
_testmultiphase     html5lib            regcheck            win32job
_thread             http                regutil             win32lz
_threading_local    idlelib             reprlib             win32net
_tkinter            idna                requests            win32netcon
_tracemalloc        imagesize           rlcompleter         win32pdh
_warnings           imaplib             rmagic              win32pdhquery
_weakref            imghdr              rope                win32pdhutil
_weakrefset         imp                 ruamel_yaml         win32pipe
_win32sysloader     importlib           run                 win32print
_winapi             inspect             runpy               win32process
_winxptheme         io                  sched               win32profile
abc                 ipaddress           scipy               win32ras
adodbapi            ipykernel           secrets             win32rcparser
afxres              ipython_genutils    select              win32security
aifc                ipywidgets          selectors           win32service
alabaster           isapi               servicemanager      win32serviceutil
anaconda_navigator  isort               setuptools          win32timezone
antigravity         itertools           shelve              win32trace
argparse            jedi                shlex               win32traceutil
array               jinja2              shutil              win32transaction
asn1crypto          json                signal              win32ts
ast                 jsonschema          simplegeneric       win32ui
astroid             jupyter             sip                 win32uiole
asynchat            jupyter_client      sipconfig           win32verstamp
asyncio             jupyter_console     sipdistutils        win32wnet
asyncore            jupyter_core        site                win_inet_pton
atexit              keyword             six                 win_unicode_console
audioop             lazy_object_proxy   sklearn             winerror
autoreload          lib2to3             smtpd               winioctlcon
babel               linecache           smtplib             winnt
backports           locale              sndhdr              winperf
base64              logging             snowballstemmer     winreg
bdb                 lzma                socket              winsound
binascii            macpath             socketserver        winxpgui
binhex              macurl2path         socks               winxptheme
binstar_client      mailbox             sockshandler        wrapt
bisect              mailcap             sphinx              wsgiref
bleach              markupsafe          spyder              xdrlib
brain_builtin_inference marshal             spyder_breakpoints  xml
brain_dateutil      math                spyder_io_dcm       xmlrpc
brain_gi            matplotlib          spyder_io_hdf5      xxsubtype
brain_mechanize     menuinst            spyder_profiler     yaml
brain_nose          mimetypes           spyder_pylint       zipapp
brain_numpy         mistune             sqlite3             zipfile
brain_pytest        mkl_fft             sre_compile         zipimport
brain_qt            mkl_random          sre_constants       zlib
brain_six           mmap                sre_parse           zmq
brain_ssl           mmapfile            ssl

Enter any module name to get more help.  Or, type "modules spam" to search
for modules whose name or summary contain the string "spam".

Ingo Levin

Hi Ken Wood, I haven't defined any specific env variables for PMI. I am not (yet) using R, so per the docs all I should need is my Python executable available in the system path, which it is.

C:\Users\ingo>echo %PATH%

C:\Program Files\Microsoft MPI\Bin\;C:\Program Files\PHP\v7.0;C:\Program Files (x86)\Intel\iCLS Client\;C:\Program Files\Intel\iCLS Client\;C:\ProgramData\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\DAL;C:\Program Files\Intel\Intel(R) Management Engine Components\DAL;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\IPT;C:\Program Files\Intel\Intel(R) Management Engine Components\IPT;C:\Program Files (x86)\PuTTY\;C:\Program Files (x86)\Gartle\SaveToDB\;;C:\Android;C:\Users\ingo\Miniconda3;C:\Users\ingo\Miniconda3\Scripts;C:\Users\ingo\Miniconda3\Library\bin;C:\adb;C:\Program Files (x86)\Skype\Phone\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\;C:\Users\ingo\Miniconda3;C:\Users\ingo\Miniconda3\Scripts;C:\Users\ingo\Miniconda3\Library\bin

I am starting spoon via the shipped Spoon.bat. It's my local laptop, I'm just using the defaults and not setting explicit _PENTAHO_JAVA_HOME, KETTLE_HOME, etc.

Should I?

Ingo Levin

Not yet. I was about to try that when I ran out of time.

I think the issue with python is that pandas does not get imported correctly - see my comment to Mark Hall. Trying to get that sorted out first, then will get back onto R.

Ingo Levin

Mark Hall

I found out why the pyCheck script is throwing the error 'A problem occurred when trying to import pandas'.

My installed pandas version is 0.22.0+0.ga00154d.dirty, which is later than the required minimum version 0.7.0.

FYI - I am using the Intel Python distro for the faster performance, not the standard Anaconda channel.

The problem is that the pyCheck script essentially compares len(__version__.split('.')) for the two version strings, and if they don't have the same length it immediately throws said error.

len('0.22.0+0.ga00154d.dirty'.split('.'))

Out[13]: 5

while

len('0.7.0'.split('.'))

Out[14]: 3

I think this is a bug in the pyCheck script.

I have 0.22.x installed which satisfies the min requirement 0.7.0, so there should not be an error.
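For illustration, here is one way such a minimum-version check could tolerate suffixes like '+0.ga00154d.dirty'. This is only a sketch under my own assumptions, not the code in pyCheck.py; the helper names `leading_ints` and `meets_minimum` are made up. It compares only the leading numeric components, as integers:

```python
import re


def leading_ints(version, width=3):
    """Extract up to `width` leading numeric components, padded with zeros."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'ga00154d'
        parts.append(int(match.group()))
        if len(parts) == width:
            break
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(actual, minimum):
    """True if `actual` is at least `minimum`, comparing numeric parts only."""
    return leading_ints(actual) >= leading_ints(minimum)


print(meets_minimum("0.22.0+0.ga00154d.dirty", "0.7.0"))  # True
print(meets_minimum("0.6.1", "0.7.0"))                    # False
```

Comparing tuples of integers also avoids the trap that '22' sorts before '7' when the components are compared as strings.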

SOLUTION:

I commented out these two lines in the pyCheck.py script. Now the Python scikit-learn engine is available when I start PDI.

def check_min_pandas():
    min_pandas = pandas_version_min.split('.')
    try:
        import pandas
        actual_pandas = pandas.__version__.split('.')
#        if len(actual_pandas) is not len(min_pandas):
#            raise Exception()
        result = check_min_version(min_pandas, actual_pandas)
        if result:
            append_to_results(
                'Installed pandas does not meet the minimum requirement: version ' + pandas_version_min)
    except:
        append_to_results('A problem occurred when trying to import pandas')

Mark Hall

Cool! Thanks Ingo. I'll incorporate this fix into the next release of the wekaPython package and the PDI CPython script executor step.

Cheers,

Mark.

Data Conversion
Attachment: 61404.pdf (467 KB)