Ingo Levin posted 04-12-2018 14:32

Hi, I am on Windows 8.1 and have Miniconda3 installed with Python 3.6.

I can verify this on the command line.

C:\Program Files\pdi-ce-\data-integration>python -c "import sys; print (sys.executable);"

>> C:\Users\ingo\Miniconda3\python.exe

C:\Program Files\pdi-ce-\data-integration>python -V

>>Python 3.6.0 :: Continuum Analytics, Inc.

In it I have numpy, sklearn, pandas and matplotlib installed. They are on the list of returned modules when I run:

python -c "help(\"modules\")"

Yet when I start PDI and try to use a PMI step, it throws an error saying that Python scikit-learn and R are not installed.

What am I doing wrong??


Can you try installing R packages as well?

Link to install doc for your reference:

Link to other references: "PMI Installation, Developer Guide and Sample/Demos"

Can you provide your environmental variables that you've defined for PMI? Also, how are you launching "spoon"?

Do you have scipy installed? PMI also requires this. You can run

python ~/wekafiles/packages/wekaPython/resources/py/


(Adapt the above for Windows: point it at your home directory and use backslashes.)


This is what it checks for:

def check_libraries():


    isPython3 = sys.version_info >= (3, 0)

    if isPython3:

















If there is no output, then you should have all the required python libraries.
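As a rough illustration of this kind of check, here is a minimal sketch that tries to import each library and reports the ones that fail. The module list is an assumption based on the libraries named in this thread, and `find_missing` is a hypothetical helper, not a function from the wekaPython script.

```python
import importlib

# Assumed list, taken from the libraries mentioned in this thread;
# the real check script may test a different set.
REQUIRED_MODULES = ["numpy", "scipy", "pandas", "sklearn", "matplotlib"]


def find_missing(modules):
    """Return the names of modules that fail to import, in input order."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception:  # ImportError, or anything else raised during import
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Like the check described above, print nothing when everything imports.
    for name in find_missing(REQUIRED_MODULES):
        print("Library '%s' is not installed or could not be imported" % name)
```

Run it with the same python executable that is first on the PATH, since that is the one being checked.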




Hi Mark, the script returns the following:

C:\Users\ingo>python wekafiles\packages\wekaPython\resources\py\

>> A problem occurred when trying to import pandas

So, this clearly seems to be the cause of my problem. But the thing is, I do have scipy and pandas correctly installed, and I can import pandas without problems in an interactive python shell session, for example...

How can I troubleshoot this?

I haven't set any specific env variables for a specific python home. It's just the one referenced in the system-wide %PATH%. I have a few other conda (python) environments, but that shouldn't matter as they are not in the PATH.

Hi Ken Wood, I haven't defined any specific env variables for PMI. I am not (yet) using R, so per the docs all I should need is my python executable available in the system path, which it is.

C:\Users\ingo>echo %PATH%

C:\Program Files\Microsoft MPI\Bin\;C:\Program Files\PHP\v7.0;C:\Program Files (

x86)\Intel\iCLS Client\;C:\Program Files\Intel\iCLS Client\;C:\ProgramData\Oracl


ws\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Intel\Intel(R) Manage

ment Engine Components\DAL;C:\Program Files\Intel\Intel(R) Management Engine Com

ponents\DAL;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\I

PT;C:\Program Files\Intel\Intel(R) Management Engine Components\IPT;C:\Program F

iles (x86)\PuTTY\;C:\Program Files (x86)\Gartle\SaveToDB\;;C:\Android;C:\Users\i


\bin;C:\adb;C:\Program Files (x86)\Skype\Phone\;C:\Program Files\Microsoft SQL S

erver\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Serve

r\140\Tools\Binn\;C:\Program Files\Microsoft SQL Server\140\Tools\Binn\;C:\Progr

am Files\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL

Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Ser

ver\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\Man



I am starting spoon via the shipped Spoon.bat. It's my local laptop, I'm just using the defaults and not setting explicit _PENTAHO_JAVA_HOME, KETTLE_HOME, etc.

Should I?

Not yet. Was about to try that when I ran out of time.

I think the issue with python is that pandas does not get imported correctly - see my comment to Mark Hall. Trying to get that sorted out first, then will get back onto R.
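One way to narrow down an import problem like this is to print which interpreter is running and where a module is loaded from. The sketch below is only an illustration; `describe_module` is a hypothetical helper name, and it assumes the module exposes the usual `__version__` and `__file__` attributes.

```python
import importlib
import sys


def describe_module(name):
    """Return (version, file) for an importable module, or None if import fails."""
    try:
        module = importlib.import_module(name)
    except Exception:
        return None
    # Either attribute may be absent, so fall back to None instead of raising.
    return (getattr(module, "__version__", None), getattr(module, "__file__", None))


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("pandas:", describe_module("pandas"))
```

If the interpreter path or the pandas location differs from what the interactive shell reports, a different environment is being picked up.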

Mark Hall

I found out why the pyCheck script is throwing the error 'A problem occurred when trying to import pandas'.

My installed pandas version is 0.22.0+0.ga00154d.dirty, which is later than the required minimum version 0.7.0.

FYI - I am using the Intel Python distro for the faster performance, not the standard Anaconda channel.

The problem is that the pyCheck script essentially compares len(__version__.split('.')) for both version strings, and if they don't have the same length, it immediately throws said error.


Out[13]: 5



Out[14]: 3

I think this is a bug in the pyCheck script.

I have 0.22.x installed which satisfies the min requirement 0.7.0, so there should not be an error.
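The mismatch can be reproduced without pandas installed, using only the two version strings quoted in this post. `component_count` is a hypothetical helper written for this illustration.

```python
def component_count(version):
    """Number of dot-separated components in a version string."""
    return len(version.split("."))


installed = "0.22.0+0.ga00154d.dirty"  # pandas.__version__ reported above
minimum = "0.7.0"                      # minimum version quoted above

print(component_count(installed))  # 5
print(component_count(minimum))    # 3
```

The local build suffix adds two extra dot-separated pieces, so the counts differ even though the release itself is newer than the minimum.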


I commented out these two lines in the script. Now the Python scikit-learn engine is available when I start PDI.

def check_min_pandas():
    min_pandas = pandas_version_min.split('.')
    try:
        import pandas
        actual_pandas = pandas.__version__.split('.')
        # Commented out: the component-count check rejected valid versions
#        if len(actual_pandas) is not len(min_pandas):
#            raise Exception()
        result = check_min_version(min_pandas, actual_pandas)
        if result:
            append_to_results(
                'Installed pandas does not meet the minimum requirement: version ' + pandas_version_min)
    except Exception:
        append_to_results('A problem occurred when trying to import pandas')

Cool! Thanks Ingo. I'll incorporate this fix into the next release of the wekaPython package and the PDI CPython script executor step.
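For reference, a version check that tolerates build suffixes could look something like the sketch below. This is not the fix that went into wekaPython; `release_tuple` and `meets_minimum` are hypothetical names, and the approach assumes only the leading numeric part of the version string matters.

```python
import re


def release_tuple(version):
    """Extract the leading numeric release, e.g. '0.22.0+0.g...' -> (0, 22, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no numeric release in %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(actual, minimum):
    """True if actual >= minimum, padding the shorter tuple with zeros."""
    a, m = release_tuple(actual), release_tuple(minimum)
    width = max(len(a), len(m))
    a += (0,) * (width - len(a))
    m += (0,) * (width - len(m))
    return a >= m


print(meets_minimum("0.22.0+0.ga00154d.dirty", "0.7.0"))  # True
```

Comparing integer tuples also avoids the string-comparison trap where '22' sorts before '7'.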



