I have been working with STAF and STAX. My objective is to read a JSON file using STAF and STAX, and return a testcase result of PASS or FAIL. I tried updating my STAF to the latest version, with the latest Python version.
Python Version Detail
20130408-15:38:19
Python Version : 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)]
Here is my Code:
try:
    import simplejson as json
except ImportError:
    import json

title = []
album = []
slist = []

json_data = open('d:\Json_File.txt')
data = json.load(json_data)
for i in range(data["result"].__len__()):
    title = data["result"][i]["Title"]
    album = data["result"][i]["Album"]
    slist = data["result"][i]["Title"] + ' [' + data["result"][i]["Album"] + '] \n'
It is giving the error below:
20130408-11:32:26 STAXPythonEvaluationError signal raised. Terminating job.
===== XML Information =====
File: new13.xml, Machine: local://local
Line 15: Error in element type "script".
===== Python Error Information =====
com.ibm.staf.service.stax.STAXPythonEvaluationException:
Traceback (most recent call last):
File "<pyExec string>", line 1, in <module>
ImportError: No module named simplejson
===== Call Stack for STAX Thread 1 =====[
function: main (Line: 7, File: C:\STAF\services\stax\samples\new13.xml, Machine: local://local)
sequence: 1/2 (Line: 14, File: C:\STAF\services\stax\samples\new13.xml, Machine: local://local)
]
What's the process to include a JSON module in STAF/STAX?
STAX uses Jython (an implementation of Python written in Java), not CPython, to execute code within a <script> element in a STAX job. As I said, I was using the latest version of STAX, v3.5.4, which provides an embedded Jython 2.5.2 (implementing the same set of language features as Python 2.5) to execute code within a <script> element.
Note: Jython 2.5.2 does not include a JSON module, since the json module (based on simplejson) was only added to the standard library in Python 2.6.
Appendix F: "Jython and CPython Differences" in the STAX User's Guide covers some of the differences between Jython and Python (aka CPython). Installing Python 2.7 or later on your system has no effect on the fact that STAX uses Jython 2.5.2 to execute code within a <script> element in a STAX job. However, simplejson can be run via Jython: I added the directory containing the simplejson module to sys.path within my STAX job and then imported simplejson. For example:
<script>
  myPythonDir = 'C:/simplejson'
  import sys
  pythonpath = sys.path
  # Append myPythonDir to sys.path if not already present
  if myPythonDir not in pythonpath:
      sys.path.append(myPythonDir)
  import simplejson as json
</script>
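Once simplejson (or the stdlib json module) is importable, the parsing loop from the question can be written more idiomatically by iterating over the "result" list directly. This is a sketch; the data here is inlined as a string for illustration, in the same shape as the question's Json_File.txt (in the STAX job you would json.load the opened file instead):

```python
try:
    import simplejson as json   # third-party module, as above
except ImportError:
    import json                 # stdlib json (Python 2.6+)

# Sample text in the same shape as the question's Json_File.txt:
# a "result" list of objects with "Title" and "Album" keys.
sample = ('{"result": [{"Title": "Song A", "Album": "Album A"},'
          ' {"Title": "Song B", "Album": "Album B"}]}')

data = json.loads(sample)

# Collect one formatted line per entry instead of overwriting
# a single variable on each pass through the loop.
slist = []
for entry in data["result"]:
    slist.append('%s [%s] \n' % (entry["Title"], entry["Album"]))

print(slist)
```

Iterating over the list directly also avoids the `.__len__()` call, which `len(...)` or plain iteration replaces.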
Or, if you want to use a Python 2.7 or later installation on your system (which includes the json module), you can run a Python script (that uses json) via your STAX job using a <process> element.
For example, to use Python 2.7 (if installed in C:\Python2.7) to run a Python script named YourPythonScript.py in C:\tests:
<process>
  <location>'local'</location>
  <command mode="'shell'">'C:/Python2.7/bin/python.exe YourPythonScript.py'</command>
  <workdir>'C:/tests'</workdir>
</process>
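As a sketch of what YourPythonScript.py might look like for the original goal (reading a JSON file and reporting PASS or FAIL), the script can exit with 0 for PASS and non-zero for FAIL, which the STAX job can then check via the process return code. The pass criterion here (every "result" entry has both "Title" and "Album" keys) is a hypothetical example, not from the question:

```python
import json
import sys

def check(path):
    # Hypothetical pass criterion: every entry in the "result"
    # list must carry both "Title" and "Album" keys.
    f = open(path)
    try:
        data = json.load(f)
    finally:
        f.close()
    ok = all('Title' in e and 'Album' in e for e in data['result'])
    # 0 = PASS, 1 = FAIL, following shell exit-code conventions.
    return 0 if ok else 1

if __name__ == '__main__' and len(sys.argv) > 1:
    # STAX can inspect the process RC: 0 = PASS, non-zero = FAIL.
    sys.exit(check(sys.argv[1]))
```

Passing the JSON file path as a command-line argument keeps the script reusable across testcases.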
I have little idea about STAF/STAX, but going by what the error says, the simplejson module is not available. Rewrite the import as follows:
try:
    import simplejson as json
except ImportError:
    import json
This falls back to the json module (available in Python 2.6+) in case the simplejson import fails.
Related
I am still not the most sophisticated Python user, but I cannot overcome this probably simple problem. I have a script that works perfectly in the Spyder interface. I would like to make it a recurring task by creating a .bat file. The .bat file, which in turn opens a cmd window, does not import pandas_datareader, and the script gets stuck and aborts.
import pandas_datareader.data as web
The line above creates the error below. It's a lengthy text:
File "C:\Users\myself\anaconda3\lib\site-packages\pandas_datareader\__init__.py", line 2, in <module>
    from .data import (
File "C:\Users\myself\anaconda3\lib\site-packages\pandas_datareader\data.py", line 9, in <module>
    from pandas.util._decorators import deprecate_kwarg
File "C:\Users\myself\anaconda3\lib\site-packages\pandas\__init__.py", line 17, in <module>
    "Unable to import required dependencies:\n" + "\n".join(missing_dependencies)
ImportError: Unable to import required dependencies:
numpy:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy c-extensions failed.
- Try uninstalling and reinstalling numpy.
- If you have already done that, then:
1. Check that you expected to use Python3.7 from "C:\Users\myself\anaconda3\python.exe",
and that you have no directories in your PATH or PYTHONPATH that can
interfere with the Python and numpy version "1.17.0" you're trying to use.
2. If (1) looks fine, you can open a new issue at
https://github.com/numpy/numpy/issues. Please include details on:
- how you installed Python
- how you installed numpy
- your operating system
- whether or not you have multiple versions of Python installed
- if you built from source, your compiler versions and ideally a build log
- If you're working with a numpy git repository, try `git clean -xdf`
(removes all files not under version control) and rebuild numpy.
Note: this error has many possible causes, so please don't comment on
an existing issue about this - open a new one instead.
Original error was: DLL load failed: The specified module could not be found.
I am trying to get pandas working, based on the documentation.
Under the list of Supported Libraries for Python Shell Jobs they mention:
pandas (required to be installed via the python setuptools configuration, setup.py)
I have tried this with a setup file:
from setuptools import setup

setup(
    name="dependecy_package",
    version="0.1",
    packages=['pandas', 'shapely', 'psycopg2', 's3fs'],
    package_dir={'': '/home/user/.local/lib/python3.6/site-packages'}
)
I uploaded the generated egg file to S3 and then referenced it as part of the run-job settings. However, I get this error on startup:
ImportError: C extension: No module named 'pandas._libs' not built.
If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.
How do I fix this?
This has something to do with the IntelliJ IDEA 2017.1.1 IDE. I do not get the following issue when executing my code via the command line.
===========================================================================
Python version: 3.6.1
xarray version: 0.9.6
pandas version: 0.20.3
numpy version: 1.12.1
I, for the first time, would like to use xarray.
I imported the module (no problem here) and then, without even using the module, ran my code. For example:
import xarray as xr

def something():
    print("doing something...")

something()
This immediately throws an exception when I run it:
Exception ignored in: <generator object <genexpr> at 0x05A287B0>
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\pyparsing.py", line 160, in <module>
    _generatorType = type((y for y in range(1)))
SystemError: error return without exception set
If I delete the import xarray as xr line and rerun the code, I get no exception.
From the exception message, it looks like something called pyparsing.py is involved.
Any ideas?
pyparsing is probably installed as a dependency of some other package. I have run the pyparsing unit tests on both Python 3.6.1 and 3.6.2 (as well as most other popular Python versions back to 2.6) without any error.
I suspect that something in your environment is defining range to be something other than the normal builtin range method, and this is then causing the pyparsing code to fail.
I will fix this in pyparsing, to replace range(1) with just an empty list, which should give the same results for pyparsing, but without the susceptibility to being overwritten by a monkeypatch to range.
In the meantime, try explicitly importing pyparsing before importing xarray, or anything else for that matter. A simple import pyparsing should do.
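The fix described, replacing range(1) with an empty list, works because both forms produce the exact same generator type object, so pyparsing's behavior is unchanged while the dependency on the builtin range disappears:

```python
# pyparsing computed the generator type at import time roughly as:
gen_type_via_range = type((y for y in range(1)))

# The fix: iterate over an empty list instead, which yields the
# same type object without touching the builtin range at all.
gen_type_via_list = type((y for y in []))

# Both expressions evaluate to the one shared generator type.
assert gen_type_via_range is gen_type_via_list
print(gen_type_via_list.__name__)
```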
I'm spark-submitting a Python file that imports numpy, but I'm getting a "no module named numpy" error.
$ spark-submit --py-files projects/other_requirements.egg projects/jobs/my_numpy_als.py
Traceback (most recent call last):
File "/usr/local/www/my_numpy_als.py", line 13, in <module>
from pyspark.mllib.recommendation import ALS
File "/usr/lib/spark/python/pyspark/mllib/__init__.py", line 24, in <module>
import numpy
ImportError: No module named numpy
I was thinking I would pull in an egg for numpy via --py-files, but I'm having trouble figuring out how to build that egg. Then it occurred to me that pyspark itself uses numpy; it would be silly to pull in my own copy.
Any idea on the appropriate thing to do here?
It looks like Spark is using a version of Python that does not have numpy installed. It could be because you are working inside a virtual environment.
Try this:
import os
import sys

# The following specifies a Python version for PySpark. Here we
# use the currently running Python executable.
# This is handy when we are using a virtualenv, for example, because
# otherwise Spark would choose the default system Python version.
os.environ['PYSPARK_PYTHON'] = sys.executable
I got this to work by installing numpy on all the EMR nodes via a small bootstrap script that contains the following (among other things):
#!/bin/bash -xe
sudo yum install python-numpy python-scipy -y
Then configure the bootstrap script to be executed when you start your cluster by adding the following option to the aws emr command (the example below passes an argument to the bootstrap script):
--bootstrap-actions Path=s3://some-bucket/keylocation/bootstrap.sh,Name=setup_dependencies,Args=[s3://some-bucket]
This can be used when setting up a cluster automatically from DataPipeline as well.
Sometimes, when you import certain libraries, your namespace is polluted with functions that shadow the builtins. Functions such as min, max and sum are especially prone to this pollution. Whenever in doubt, locate calls to these functions and replace them with __builtin__.sum etc. Doing so will sometimes be faster than locating the pollution source.
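A minimal illustration of that kind of shadowing (module names here are Python 3, where the __builtin__ module mentioned above is spelled builtins):

```python
import builtins

data = [1, 2, 3]

# A star-import can rebind names like sum in the current namespace.
# Simulate that pollution with a local redefinition:
def sum(values):
    return 'polluted'

# The shadowed name wins in this namespace...
polluted_result = sum(data)

# ...but the real builtin is still reachable explicitly.
real_result = builtins.sum(data)

print(polluted_result, real_result)
```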
Make sure your spark-env.sh has PYSPARK_PYTHON pointing to the correct Python executable. Add export PYSPARK_PYTHON=/your_python_exe_path to the conf/spark-env.sh file.
I'm trying to use Jython (embedded) in a Jetty server (all through Maven) to invoke a simple Python script.
My script works fine as long as I don't try to use any of the standard library modules, such as logging. Whenever I try to import one of them, it fails with an ImportError.
The exception I get is:
File "<string>", line 1, in <module>
File "c:\home\work\sample\content\helloworld\helloworld.py", line 10, in <module>
import logging
File "c:\home\work\sample\content\Lib\logging\__init__.py", line 29, in <module>
import sys, os, types, time, string, cStringIO, traceback
File "c:\home\work\sample\content\Lib\os.py", line 119, in <module>
raise ImportError, 'no os specific module found'
ImportError: no os specific module found
at org.python.core.PyException.fillInStackTrace(PyException.java:70)
at java.lang.Throwable.<init>(Throwable.java:181)
at java.lang.Exception.<init>(Exception.java:29)
at java.lang.RuntimeException.<init>(RuntimeException.java:32)
at org.python.core.PyException.<init>(PyException.java:46)
at org.python.core.PyException.doRaise(PyException.java:200)
at org.python.core.Py.makeException(Py.java:1159)
at org.python.core.Py.makeException(Py.java:1163)
at os$py.f$0(c:\home\work\sample\content\Lib\os.py:692)
at os$py.call_function(c:\home\work\sample\content\Lib\os.py)
at org.python.core.PyTableCode.call(PyTableCode.java:165)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.imp.createFromCode(imp.java:325)
at org.python.core.imp.createFromPyClass(imp.java:144)
at org.python.core.imp.loadFromSource(imp.java:504)
at org.python.core.imp.find_module(imp.java:410)
at org.python.core.imp.import_next(imp.java:620)
at org.python.core.imp.import_first(imp.java:650)
at org.python.core.imp.import_name(imp.java:741)
at org.python.core.imp.importName(imp.java:791)
at org.python.core.ImportFunction.__call__(__builtin__.java:1236)
at org.python.core.PyObject.__call__(PyObject.java:367)
at org.python.core.__builtin__.__import__(__builtin__.java:1207)
at org.python.core.__builtin__.__import__(__builtin__.java:1190)
at org.python.core.imp.importOne(imp.java:802)
at logging$py.f$0(c:\home\work\sample\content\Lib\logging\__init__.py:1372)
at logging$py.call_function(c:\home\work\sample\content\Lib\logging\__init__.py)
at org.python.core.PyTableCode.call(PyTableCode.java:165)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.imp.createFromCode(imp.java:325)
at org.python.core.imp.createFromPyClass(imp.java:144)
at org.python.core.imp.loadFromSource(imp.java:504)
at org.python.core.imp.find_module(imp.java:410)
at org.python.core.imp.import_next(imp.java:620)
at org.python.core.imp.import_first(imp.java:650)
at org.python.core.imp.import_name(imp.java:741)
at org.python.core.imp.importName(imp.java:791)
at org.python.core.ImportFunction.__call__(__builtin__.java:1236)
at org.python.core.PyObject.__call__(PyObject.java:367)
at org.python.core.__builtin__.__import__(__builtin__.java:1207)
at org.python.core.__builtin__.__import__(__builtin__.java:1190)
at org.python.core.imp.importOne(imp.java:802)
at helloworld.helloworld$py.f$0(c:\home\work\sample\content\helloworld\helloworld.py:19)
at helloworld.helloworld$py.call_function(c:\home\work\sample\content\helloworld\helloworld.py)
at org.python.core.PyTableCode.call(PyTableCode.java:165)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.imp.createFromCode(imp.java:325)
at org.python.core.imp.createFromPyClass(imp.java:144)
at org.python.core.imp.loadFromSource(imp.java:504)
at org.python.core.imp.find_module(imp.java:410)
at org.python.core.PyModule.impAttr(PyModule.java:109)
at org.python.core.imp.import_next(imp.java:622)
at org.python.core.imp.import_name(imp.java:761)
at org.python.core.imp.importName(imp.java:791)
at org.python.core.ImportFunction.__call__(__builtin__.java:1236)
at org.python.core.PyObject.__call__(PyObject.java:367)
at org.python.core.__builtin__.__import__(__builtin__.java:1207)
at org.python.core.imp.importFromAs(imp.java:869)
at org.python.core.imp.importFrom(imp.java:845)
at org.python.pycode._pyx1.f$0(<string>:1)
at org.python.pycode._pyx1.call_function(<string>)
at org.python.core.PyTableCode.call(PyTableCode.java:165)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1197)
at org.python.core.Py.exec(Py.java:1241)
at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:138)
My script looks like:
from java.util import Random
from java.util import Date
import sys

print(sys.path)
print(sys.builtin_module_names)

import logging
logging.basicConfig(level=logging.WARNING)
logger1 = logging.getLogger('aaa')
logger1.warning('************* This message comes from one module')

def say_hello():
    return 'hello world1'
I've tried the following so far, but nothing has worked:
Including the zip of the 'Lib' directory in my classpath
Hard-coding the 'Lib' path when I set up the interpreter
If I do it directly from the interactive Jython shell the script works fine (and a logging message appears).
Thanks.
KJQ
I think for now I've found an answer to my own question.
Basically, I knew it had something to do with my paths but could not figure out how to set them.
I ended up creating a "standalone" version of the Jython jar through the installer (it includes the /Lib directory) and using that.