How to load python modules in Rstudio instance

How to load python modules in Rstudio instance - module

Reproducible example.
From bash try:
$ R
> library(rPython)
> python.exec("help('modules')")
I get loads, e.g.
blaze itertools scipy
blz itsdangerous see
Now try the same in RStudio:
> library(rPython)
> python.exec("help('modules')")
Crash!
Update
I've tried this on RStudio 0.99.451 (old dev version) and 0.99.467 (current latest - https://www.rstudio.com/products/rstudio/download/ ).

Related

Pipeline keeps failing at test and lint stage

I am pretty new to coding so I am sorry if I'm not giving enough context. I am trying to package and deploy a model in Python to a repo but keep getting the follow error when I push to GitLab
ERROR: No matching distribution found for numpy==1.23.1
$ if [[ "${COVERAGE_ARGS}" == "" ]]; then
$ dir_loc=`pwd`
$ COVERAGE_ARGS="--cov=${dir_loc}"
$ fi
$ pytest --junitxml=reports/report.xml --cov-config=${COV_CONFIG} ${COVERAGE_ARGS} ${EXTRA_ARGS}
============================= test session starts ==============================
platform linux -- Python 3.7.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /builds/Ahs1yUQY/1/dse/dscp/dse-dscp-python-hoffinator
plugins: cov-3.0.0, requests-mock-1.9.3
collected 0 items / 1 error
==================================== ERRORS ====================================
______________________ ERROR collecting test/test_main.py ______________________
ImportError while importing test module '/builds/Ahs1yUQY/1/dse/dscp/dse-dscp-python-hoffinator/test/test_main.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib64/python3.7/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
test/test_main.py:5: in <module>
from src.hoffinator import hoffinator_functions
src/hoffinator/hoffinator_functions.py:3: in <module>
import numpy as np
E ModuleNotFoundError: No module named 'numpy'
- generated xml file: /builds/Ahs1yUQY/1/dse/dscp/dse-dscp-python-hoffinator/reports/report.xml -
---------- coverage: platform linux, python 3.7.10-final-0 -----------
My requirements.txt for dependencies is as follows
ccplatlogging
boto3~=1.20.33
email-validator~=1.1.3
appnope==0.1.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.0.5
attrs==21.4.0
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.1
cffi==1.15.1
cycler==0.11.0
debugpy==1.6.2
decorator==5.1.1
defusedxml==0.7.1
entrypoints==0.4
executing==0.8.3
fastjsonschema==2.16.1
fonttools==4.34.4
iniconfig==1.1.1
ipykernel==6.15.1
ipython==7.34.0
ipython-genutils==0.2.0
ipywidgets==7.7.1
jedi==0.18.1
Jinja2==3.1.2
joblib==1.1.0
jsonschema==4.7.2
jupyter==1.0.0
jupyter-client==7.3.4
jupyter-console==6.4.4
jupyter-core==4.11.1
jupyterlab-pygments==0.2.2
jupyterlab-widgets==1.1.1
kiwisolver==1.4.4
MarkupSafe==2.1.1
matplotlib==3.5.2
matplotlib-inline==0.1.3
mistune==0.8.4
nbclient==0.6.6
nbconvert==6.5.0
nbformat==5.4.0
nest-asyncio==1.5.5
notebook==6.4.12
numpy==1.23.1
packaging==21.3
pandas==1.4.3
pandocfilters==1.5.0
parso==0.8.3
patsy==0.5.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.2.0
pluggy==1.0.0
ppscore==1.2.0
prometheus-client==0.14.1
prompt-toolkit==3.0.30
psutil==5.9.1
ptyprocess==0.7.0
pure-eval==0.2.2
py==1.11.0
pycparser==2.21
pycryptodome==3.15.0
Pygments==2.12.0
pyodbc==4.0.34
pyparsing==3.0.9
pyrsistent==0.18.1
pytest==7.1.2
python-dateutil==2.8.2
pytz==2022.1
pyzmq==23.2.0
qtconsole==5.3.1
QtPy==2.1.0
scikit-learn==0.24.2
scipy==1.8.1
seaborn==0.11.2
Send2Trash==1.8.0
six==1.16.0
soupsieve==2.3.2.post1
stack-data==0.3.0
statsmodels==0.13.2
teradatasql==17.20.0.0
terminado==0.15.0
threadpoolctl==3.1.0
tinycss2==1.1.1
tomli==2.0.1
tornado==6.2
traitlets==5.3.0
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.6.1
I do not understand why numpy is not found when it is cleary in the .txt. Is it a version issue?

Your console output shows that you are using Python 3.7.10. If you take a look at numpy release notes for 1.23.1 found here you’ll notice that it states:
The Python version supported for this release are 3.8-3.10.
You will need to either use a newer version of Python or use an older version of numpy.

Getting Error with Pytube: signature = cipher.get_signature(js, stream['s']) KeyError: 's'

I have been this error when running my Pytube script:
signature = cipher.get_signature(js, stream['s'])
KeyError: 's'
My code is like this:
url = 'https://www.youtube.com/watch?v='
train_List = []
i = 0
while i < len(my_list):
if len(my_list[i]) > 6:
urls = url + my_list[i]
train_List.append(urls)
yt=YouTube(train_List[i])
t=yt.streams.filter(only_audio=True).all()
t[0].download('/pathtofolder')
i+=1
I also tried:
t=yt.streams.filter(file_extension='mp4').all()
I altered the cipher.py and helper.py files per the recommendation here: https://github.com/nficano/pytube/issues/353#issuecomment-455116197
but it hasn't fixed the problem. After making the alterations, I got the error noted above.
Next, I ran "pip install pytube --upgrade" per some other recommendations. Still getting the KeyError after it downloads a few audio files.
I've also implemented this in mixins.py, per the github issues:
if ('signature=' in url) or ('&sig=' in url) or ('&lsig=' in url):
but now it's hanging after 3 uploads.
Does anyone have a fix for this?

I fixed this issue (for me, at least) by changing line 49 in mixins.py to this:
signature = cipher.get_signature(js, stream['url'])
instead of
signature = cipher.get_signature(js, stream['s'])
and then changing lines 55-63 to
logger.debug(
'finished descrambling signature for itag=%s\n%s',
stream['itag'], pprint.pformat(
{
'url': stream['url'],
'signature': signature,
}, indent=2,
),
)

Have the same problem. If I try a second time it works most of the time...
url = 'https://www.youtube.com/watch?v=gQrkvZeE3Uc'
yt = YouTube(url)
yt.streams.filter(progressive=True, subtype='mp4').order_by('resolution').desc().last().download()
Thanks for any help

You have to fix the cipher.py file in lib/python3.6/site-packages/pytube:
In cipher.py, change pattern starting at line 38 to this:
pattern = [
r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\(',
r'yt\.akamaized\.net/\)\s*\|\|\s*.*?\s*c\s*&&\s*d\.set\([^,]+\s*,\s*(?:encodeURIComponent\s*\()?(?P<sig>[a-zA-Z0-9$]+)\(',
r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*(?:encodeURIComponent\s*\()?\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\bc\s*&&\s*d\.set\([^,]+\s*,\s*\([^)]*\)\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
]

One of the ways to solve it: pip install git+https://github.com/nficano/pytube.git
Other than that, check this thread

I did pip uninstall pytube.
Checked my python version with python --version, turned out to be the Anaconda 3.6.8 version.
So I did pip3 install pytube, pip3 being for Python 3 I believe.
It works no problems now.

had same issue, based on other answers that installed it from sources, just checked my current pytube version, which was 9.5.0 and see that the last released is 9.5.2, so I just run pip install --upgrade pytube and worked perfectly

ipython - pandasql.sqldf doesn't return an error

After importing
import pandas
import pandasql
and running:
q = """
select min(cast (maxtempi as integer))
from weather_data
where min(cast (maxtempi as integer)) >55
"""
print pandasql.sqldf(q.lower(), locals())
None
is returned and no result sets or error. Obviously the error is in the where clause.
How do I print an error from pandasql.sqldf?

Normal, it should be fine and no error occurs.
please also notice and check followings;
(1)is weather_data an instance of DataFrame?
(2)is pandasql installed? If you use PyCharm, please also restart your PyCharm to let new installed pandasql package works

How can I inspect a Hadoop SequenceFile for which I lack full schema information?

I have a compressed Hadoop SequenceFile from a customer which I'd like to inspect. I do not have full schema information at this time (which I'm working on separately).
But in the interim (and in the hopes of a generic solution), what are my options for inspecting the file?
I found a tool forqlift: http://www.exmachinatech.net/01/forqlift/
And have tried 'forqlift list' on the file. It complains that it can't load classes for the custom subclass Writables included. So I will need to track down those implementations.
But is there any other option available in the meantime? I understand that most likely I can't extract the data, but is there some tool for scanning how many key values and of what type?

From shell:
$ hdfs dfs -text /user/hive/warehouse/table_seq/000000_0
or directly from hive (which is much faster for small files, because it is running in an already started JVM)
hive> dfs -text /user/hive/warehouse/table_seq/000000_0
works for sequence files.

Check the SequenceFileReadDemo class in the 'Hadoop : The Definitive Guide'- Sample Code. The sequence files have the key/value types embedded in them. Use the SequenceFile.Reader.getKeyClass() and SequenceFile.Reader.getValueClass() to get the type information.

My first thought would be to use the Java API for sequence files to try to read them. Even if you don't know which Writable is used by the file, you can guess and check the error messages (there may be a better way that I don't know).
For example:
private void readSeqFile(Path pathToFile) throws IOException {
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
SequenceFile.Reader reader = new SequenceFile.Reader(fs, pathToFile, conf);
Text key = new Text(); // this could be the wrong type
Text val = new Text(); // also could be wrong
while (reader.next(key, val)) {
System.out.println(key + ":" + val);
}
}
This program would crash if those are the wrong types, but the Exception should say which Writable type the key and value actually are.
Edit:
Actually if you do less file.seq usually you can read some of the header and see what the Writable types are (at least for the first key/value). On one file, for example, I see:
SEQ^F^Yorg.apache.hadoop.io.Text"org.apache.hadoop.io.BytesWritable

I'm not a Java or Hadoop programmer, so my way of solving problem could be not the best one, but anyway.
I spent two days solving the problem of reading FileSeq locally (Linux debian amd64) without installation of hadoop.
The provided sample
while (reader.next(key, val)) {
System.out.println(key + ":" + val);
}
works well for Text, but didn't work for BytesWritable compressed input data.
What I did?
I downloaded this utility for creating (writing SequenceFiles Hadoop data)
github_com/shsdev/sequencefile-utility/archive/master.zip
, and got it working, then modified for reading input Hadoop SeqFiles.
The instruction for Debian running this utility from scratch:
sudo apt-get install maven2
sudo mvn install
sudo apt-get install openjdk-7-jdk
edit "sudo vi /usr/bin/mvn",
change `which java` to `which /usr/lib/jvm/java-7-openjdk-amd64/bin/java`
Also I've added (probably not required)
'
PATH="/home/mine/perl5/bin${PATH+:}${PATH};/usr/lib/jvm/java-7-openjdk-amd64/"; export PATH;
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
export JAVA_VERSION=1.7
'
to ~/.bashrc
Then usage:
sudo mvn install
~/hadoop_tools/sequencefile-utility/sequencefile-utility-master$ /usr/lib/jvm/java-7-openjdk-amd64/bin/java -jar ./target/sequencefile-utility-1.0-jar-with-dependencies.jar
-- and this doesn't break the default java 1.6 installation that is required for FireFox/etc.
For resolving error with FileSeq compatability (e.g. "Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"), I used the libs from the Hadoop master server as is (a kind of hack):
scp root#10.15.150.223:/usr/lib/libhadoop.so.1.0.0 ~/
sudo cp ~/libhadoop.so.1.0.0 /usr/lib/
scp root#10.15.150.223:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/amd64/server/libjvm.so ~/
sudo cp ~/libjvm.so /usr/lib/
sudo ln -s /usr/lib/libhadoop.so.1.0.0 /usr/lib/libhadoop.so.1
sudo ln -s /usr/lib/libhadoop.so.1.0.0 /usr/lib/libhadoop.so
One night drinking coffee, and I've written this code for reading FileSeq hadoop input files (using this cmd for running this code "/usr/lib/jvm/java-7-openjdk-amd64/bin/java -jar ./target/sequencefile-utility-1.3-jar-with-dependencies.jar -d test/ -c NONE"):
import org.apache.hadoop.io.*;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.ValueBytes;
import java.io.DataOutputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
Path file = new Path("/home/mine/mycompany/task13/data/2015-08-30");
reader = new SequenceFile.Reader(fs, file, conf);
long pos = reader.getPosition();
logger.info("GO from pos "+pos);
DataOutputBuffer rawKey = new DataOutputBuffer();
ValueBytes rawValue = reader.createValueBytes();
int DEFAULT_BUFFER_SIZE = 1024 * 1024;
DataOutputBuffer kobuf = new DataOutputBuffer(DEFAULT_BUFFER_SIZE);
kobuf.reset();
int rl;
do {
rl = reader.nextRaw(kobuf, rawValue);
logger.info("read len for current record: "+rl+" and in more details ");
if(rl >= 0)
{
logger.info("read key "+new String(kobuf.getData())+" (keylen "+kobuf.getLength()+") and data "+rawValue.getSize());
FileOutputStream fos = new FileOutputStream("/home/mine/outb");
DataOutputStream dos = new DataOutputStream(fos);
rawValue.writeUncompressedBytes(dos);
kobuf.reset();
}
} while(rl>0);
I've just added this chunk of code to file src/main/java/eu/scape_project/tb/lsdr/seqfileutility/SequenceFileWriter.java just after the line
writer = SequenceFile.createWriter(fs, conf, path, keyClass,
valueClass, CompressionType.get(pc.getCompressionType()));
Thanks to these sources of info:
Links:
If using hadoop-core instead of mahour, then will have to download asm-3.1.jar manually:
search_maven_org/remotecontent?filepath=org/ow2/util/asm/asm/3.1/asm-3.1.jar
search_maven_org/#search|ga|1|asm-3.1
The list of avaliable mahout repos:
repo1_maven_org/maven2/org/apache/mahout/
Intro to Mahout:
mahout_apache_org/
Good resource for learning interfaces and sources of Hadoop Java classes (I used it for writing my own code for reading FileSeq):
http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.action/0.2.7/org/apache/hadoop/io/BytesWritable.java
Sources of project tb-lsdr-seqfilecreator that I used for creating my own project FileSeq reader:
www_javased_com/?source_dir=scape/tb-lsdr-seqfilecreator/src/main/java/eu/scape_project/tb/lsdr/seqfileutility/ProcessParameters.java
stackoverflow_com/questions/5096128/sequence-files-in-hadoop - the same example (read key,value that doesn't work)
https://github.com/twitter/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/RawSequenceFileRecordReader.java - this one helped me (I used reader.nextRaw the same as in nextKeyValue() and other subs)
Also I've changed ./pom.xml for native apache.hadoop instead of mahout.hadoop, but probably this is not required, because the bugs for read->next(key, value) are the same for both so I had to use read->nextRaw(keyRaw, valueRaw) instead:
diff ../../sequencefile-utility/sequencefile-utility-master/pom.xml ./pom.xml
9c9
< <version>1.0</version>
---
> <version>1.3</version>
63c63
< <version>2.0.1</version>
---
> <version>2.4</version>
85c85
< <groupId>org.apache.mahout.hadoop</groupId>
---
> <groupId>org.apache.hadoop</groupId>
87c87
< <version>0.20.1</version>
---
> <version>1.1.2</version>
93c93
< <version>1.1</version>
---
> <version>1.1.3</version>

I was just playing with Dumbo. When you run a Dumbo job on a Hadoop cluster, the output is a sequence file. I used the following to dump out an entire Dumbo-generated sequence file as plain text:
$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar \
-input totals/part-00000 \
-output unseq \
-inputformat SequenceFileAsTextInputFormat
$ bin/hadoop fs -cat unseq/part-00000
I got the idea from here.
Incidentally, Dumbo can also output plain text.

Following the anwer of Praveen Sripati, here a small example of SequenceFileReadDemo.java from Hadoop the Definitive Guide by Tom White.
Data are in HDFS, in this position : user/hduser/output-hashsort/ and the file is
part-r-00001
In eclipse, in the Arguments folder I've written this string :
and this is part of the output, with the debugger

Python file location conventions and Import errors

I'm new to Python and I simply don't know how to handle this specific problem:
I'm trying to run an executable (named ros_M5e.py) that's located in the directory /opt/ros/diamondback/stacks/hrl/hrl_rfid/src/hrl_rfid/ros_M5e.py (annoyingly long filepath, I know, but necessary). However, within the ros_M5e.py file there is a call to another file that is further up the file path: from hrl.hrl_rfid.msg import RFIDread. The directory msg indeed is located at /opt/ros/diamondback/stacks/hrl/hrl_rfid/ and it does indeed contain the file RFIDread. However, whenever I try to execute ros_M5e.py I get this error:
Traceback (most recent call last):
File "/opt/ros/diamondback/stacks/hrl/hrl_rfid/src/hrl_rfid/ros_M5e.py", line 37, in <module>
from hrl.hrl_rfid.msg import RFIDread
ImportError: No module named hrl.hrl_rfid.msg
Would someone with some expertise please shine some light on this problem for me? It seems like just a rudimentary file location problem, but I just don't know the appropriate Python conventions to fix it. I've tried putting the ros_M5e.py file in the same directory as the files it calls and changing the filepaths but to no avail.
Thanks a lot,
Khiya

Sure, I can help you get it up and running.
From the StackOverflow posting, it would seem that you're checking out the stack to /opt/ros/diamondback. This is no good, as it is a system path. You need to install into your local path. The reason for "readonly" on the repository is that you do not have permissions to make changes to the code -- it will still work just fine for you on your local machine. I spent a fair amount of time showing how to use this package (at least the python version) here:
http://www.ros.org/wiki/hrl_rfid
I'll try to do a quick run-through for installing it.... Run the following commands:
cd
mkdir sandbox
cd sandbox/
svn checkout http://gt-ros-pkg.googlecode.com/svn/trunk/hrl/hrl_rfid hrl_rfid (double-check that this checkout works OK!)
Add the following line to the bottom of your bashrc to tell ROS where to find the new package. (You may use "gedit ~/.bashrc")
export ROS_PACKAGE_PATH=$ROS_PACKAGE_PATH:$HOME/sandbox/hrl_rfid
Now execute the following:
roscd hrl_rfid (did you end up in the correct directory?)
rosmake hrl_rfid (did it make without errors?)
roscd hrl_rfid/src/hrl_rfid
At this point everything is actually installed correctly. By default, ros_M5e.py assumes that the reader is located at "/dev/robot/RFIDreader". Unless you've already altered udev rules, this will not be the case on your machine. I suggest running through the code:
http://www.ros.org/wiki/hrl_rfid
using iPython (a command-line python prompt that will let you execute python commands one at a time) to make sure everything is working (replace /dev/ttyUSB0 with whatever device your RFID reader is connected as):
import lib_M5e as M5e
r = M5e.M5e( '/dev/ttyUSB0', readPwr = 3000 )
r.ChangeAntennaPorts( 1, 1 )
r.QueryEnvironment()
r.TrackSingleTag( 'test_tag_id_' )
r.ChangeTagID( 'test_tag_id_' )
r.QueryEnvironment()
r.TrackSingleTag( 'test_tag_id_' )
r.ChangeAntennaPorts( 2, 2 )
r.QueryEnvironment()
This means that the underlying library is working just fine. Next, test ROS (make sure "roscore" is running!), by putting this in a python file and executing:
import lib_M5e as M5e
def P1(r):
r.ChangeAntennaPorts(1,1)
return 'AntPort1'
def P2(r):
r.ChangeAntennaPorts(2,2)
return 'AntPort2'
def PrintDatum(data):
ant, ids, rssi = data
print data
r = M5e.M5e( '/dev/ttyUSB0', readPwr = 3000 )
q = M5e.M5e_Poller(r, antfuncs=[P1, P2], callbacks=[PrintDatum])
q.query_mode()
t0 = time.time()
while time.time() - t0 < 3.0:
time.sleep( 0.1 )
q.track_mode( 'test_tag_id_' )
t0 = time.time()
while time.time() - t0 < 3.0:
time.sleep( 0.1 )
q.stop()
OK, everything works now. You can make your own node that is tuned to your setup:
#!/usr/bin/python
import ros_M5e as rm
def P1(r):
r.ChangeAntennaPorts(1,1)
return 'AntPort1'
def P2(r):
r.ChangeAntennaPorts(2,2)
return 'AntPort2'
ros_rfid = rm.ROS_M5e( name = 'my_rfid_server',
readPwr = 3000,
portStr = '/dev/ttyUSB0',
antFuncs = [P1, P2],
callbacks = [] )
rospy.spin()
ros_rfid.stop()
Or, ping me back and I can tweak ros_M5e.py to take an optional "portStr" -- though I recommend making your own so that you can name your antennas sensibly. Also, I highly recommend setting udev rules to ensure that the RFID reader always gets assigned to the same device: http://www.hsi.gatech.edu/hrl-wiki/index.php/Linux_Tools#udev_for_persistent_device_naming
BUS=="usb", KERNEL=="ttyUSB*", SYSFS{idVendor}=="0403", SYSFS{idProduct}=="6001", SYSFS{serial}=="ftDXR6FS", SYMLINK+="robot/RFIDreader"
If you do not do this... there is no guarantee that the reader will always be enumerated at /dev/ttyUSBx.
Let me know if you have any further problems.
~Travis Deyle (Hizook.com)
PS -- Did you modify ros_M5e.py to "from hrl.hrl_rfid.msg import RFIDread"? In the repo, it is "from hrl_rfid.msg import RFIDread". The latter is correct. As long as you have your ROS_PACKAGE_PATH correctly defined, and you've run rosmake on the package, then the import statement should work just fine. Also, I would not recommend posting ROS-related questions to StackOverflow. Very few people on here are going to be familiar with the ROS ecosystem (which is VERY complex). Please post questions here instead:
http://answers.ros.org/
http://code.google.com/p/gt-ros-pkg/issues/list

You need to make sure that following are true:
Directory /opt/ros/diamondback/stacks/ is in your python path.
/opt/ros/diamondback/stacks/hr1 contains __init__.py
/opt/ros/diamondback/stacks/hr1/hr1_rfid contians __init__.py
/opt/ros/diamondback/stacks/hr1/hr1_rfid/msg contians __init__.py
As the asker explained in comments that the RFIDRead does not have .py extension, so here is how that can be imported.
import imp
imp.load_source('RFIDRead', '/opt/ros/diamondback/stacks/hr1/hr1_rfid/msg/RFIDRead.msg')
Check out imp documentation for more information.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to load python modules in Rstudio instance - module

Related

Pipeline keeps failing at test and lint stage

Getting Error with Pytube: signature = cipher.get_signature(js, stream['s']) KeyError: 's'

ipython - pandasql.sqldf doesn't return an error

How can I inspect a Hadoop SequenceFile for which I lack full schema information?

Python file location conventions and Import errors

Categories

Resources