In Google Colab I get "IOPub data rate exceeded" - google-colaboratory

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
--NotebookApp.iopub_data_rate_limit.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

An IOPub error usually occurs when you try to print a large amount of data to the console. Check your print statements - if you're trying to print a file that exceeds 10MB, it's likely the cause of the error. Try to read smaller portions of the file/data.
I faced this issue while reading a file from Google Drive to Colab.
I used this link https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb
and the problem was in this block of code:
# Download the file we just uploaded.
#
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz
file_id = 'target_file_id'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
# Remove this print statement - it was the source of the IOPub error:
# print('Downloaded file contents are: {}'.format(downloaded.read()))
I had to remove the last print statement, since it exceeded the 10MB limit in the notebook: print('Downloaded file contents are: {}'.format(downloaded.read()))
Your file is still downloaded, and you can then read it in smaller chunks or read only a portion of the file.
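For example, here is a minimal sketch of reading only a portion of the downloaded data (it assumes the downloaded io.BytesIO object from the block above; the sizes are arbitrary):
downloaded.seek(0)
preview = downloaded.read(1024)   # read only the first 1 KB
print(preview[:200])              # and print an even smaller slice of it

# Or walk the whole file in fixed-size chunks without printing them:
downloaded.seek(0)
chunk_size = 1024 * 1024          # 1 MB per chunk
while True:
    chunk = downloaded.read(chunk_size)
    if not chunk:
        break
    # process `chunk` here instead of printing it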

The above answer is correct: I just commented out the print statement and the error went away. I'm keeping this here in case someone finds it useful. If you are reading a CSV file from Google Drive, just import pandas and call pd.read_csv(downloaded); it will work just fine.
import io
import pandas as pd
from googleapiclient.http import MediaIoBaseDownload

file_id = 'FILEID'
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
df = pd.read_csv(downloaded)
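If the CSV itself is very large, pandas can also read it in chunks; a small hedged variant of the same idea (the chunk size is arbitrary):
downloaded.seek(0)
for chunk in pd.read_csv(downloaded, chunksize=100000):
    # each `chunk` is a DataFrame with up to 100000 rows;
    # process it here instead of printing the whole file
    pass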

Maybe this will help (via sv1997):
IOPub Error on Google Colaboratory in Jupyter Notebook
The IOPub error occurs in Colab because you are trying to display very large output on the console itself (e.g. via print() statements).
The IOPub error is therefore most likely related to a print call.
So delete or comment out the print statement; that may resolve the error.
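As a side note, the error message itself names the config variable that controls this limit. If you run the Jupyter notebook server yourself (this does not apply to Colab's hosted runtime, where you don't control the server flags), you can raise it at startup, e.g.:
jupyter notebook --NotebookApp.iopub_data_rate_limit=1e10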

%cd darknet
!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
!sed -i 's/GPU=0/GPU=1/' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile
!sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
!apt update
!apt-get install libopencv-dev
It's important to update your Makefile, and also to keep your input file name correct.
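A quick hedged check that the sed edits above actually took effect (grep is assumed to be available in the Colab VM, which it normally is):
!grep -E '^(OPENCV|GPU|CUDNN|CUDNN_HALF)=' Makefile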

Related

How to save an updated fits file with headers in correct places?

I want to edit the data in my FITS file using astropy and then save it back to the original file. Below are my code and the error message. Please ignore any redundant lines - obviously I opened the file twice, but I still get the error after deleting that line.
file_list = sorted(glob.glob('*.fits'))  # read in my three fits files
hdudata = np.full((3,720,1440), 0)  # a test list to store the data
for im in range(len(file_list)):
    hdu_list = fits.open(file_list[im])
    hdudata[im] = hdu_list[0].data  # read in the data from fits file
    if im == 2:  # I only want to change the last image
        with fits.open(file_list[im], mode='update') as hdus:
            hdu = hdus[0]
            hdu.data = (hdudata[im-1] + hdudata[im])/2.  # basically add two images
                                                         # and take the average
            hdu.close()  # this is required otherwise an error message pops up saying
                         # the next line cannot proceed as the file is being run
            hdu.flush()  # the error line
VerifyError:
Verification reported errors:
HDU 0:
'NAXIS1' card at the wrong place (card 4).
'NAXIS2' card at the wrong place (card 5).
'EXTEND' card at the wrong place (card 6).
Note: astropy.io.fits uses zero-based indexing.
I have only accessed and changed the data, so why is the error occurring in my header? I had no problem reading the headers (though I didn't include that in the code above), so why does it fail when saving?
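For reference, a minimal sketch of updating FITS data in place with astropy (this is only an illustrative pattern under the question's assumptions, not the asker's fixed code; the context manager flushes the changes on exit, so no explicit close/flush calls on the HDU are needed):
import glob
from astropy.io import fits

file_list = sorted(glob.glob('*.fits'))
prev_data = fits.getdata(file_list[1])               # data from the second image
with fits.open(file_list[2], mode='update') as hdus:  # open only the file to modify
    hdus[0].data = (prev_data + hdus[0].data) / 2.0    # average the two images
    # changes are written back to disk when the with-block exits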

Using pandas to open Excel files stored in GCS from command line

The following code snippet is from a Google tutorial; it simply prints the names of the files in a given GCS bucket:
from google.cloud import storage

def list_blobs(bucket_name):
    """Lists all the blobs in the bucket."""
    # bucket_name = "your-bucket-name"
    storage_client = storage.Client()
    # Note: Client.list_blobs requires at least package version 1.17.0.
    blobs = storage_client.list_blobs(bucket_name)
    for blob in blobs:
        print(blob.name)

list_blobs('sn_project_data')
Now from the command line I can run:
$ python path/file.py
And in my terminal the files in said bucket are printed out. Great, it works!
However, this isn't quite my goal. I'm looking to open a file and act upon it. For example:
df = pd.read_excel(filename)
print(df.iloc[0])
However, when I pass the path to the above, the error returned reads "invalid file path." So I'm sure there is some sort of GCP specific function call to actually access these files...
What command(s) should I run?
Edit: This video https://www.youtube.com/watch?v=ED5vHa3fE1Q shows a trick for opening files, but it relies on StringIO in the process. Since that doesn't work for Excel files, it's not an effective solution here.
read_excel() does not support a Google Cloud Storage file path as of now, but it can read data passed in as bytes.
pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None,
                  usecols=None, squeeze=False, dtype=None, engine=None,
                  converters=None, true_values=None, false_values=None,
                  skiprows=None, nrows=None, na_values=None, keep_default_na=True,
                  na_filter=True, verbose=False, parse_dates=False,
                  date_parser=None, thousands=None, comment=None, skipfooter=0,
                  convert_float=True, mangle_dupe_cols=True, storage_options=None)
Parameters: io : str, bytes, ExcelFile, xlrd.Book, path object, or file-like object
What you can do is use the blob object and call download_as_bytes() to get the object's contents as bytes.
Download the contents of this blob as a bytes object.
For this example I just used a random sample xlsx file and read the 1st sheet:
from google.cloud import storage
import pandas as pd
bucket_name = "your-bucket-name"
blob_name = "SampleData.xlsx"
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(blob_name)
data_bytes = blob.download_as_bytes()
df = pd.read_excel(data_bytes)
print(df)
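If your pandas version complains about being handed raw bytes for io, wrapping them in a file-like object is a safe fallback (a small hedged variant of the same approach):
import io
df = pd.read_excel(io.BytesIO(data_bytes))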

Display log into console with tf_logging

I'm playing with slim.learning.train and want to display the log in the console. According to the source code, this is done using the tf_logging module:
if 'should_log' in train_step_kwargs:
    if sess.run(train_step_kwargs['should_log']):
        logging.info('global step %d: loss = %.4f (%.2f sec)',
                     np_global_step, total_loss, time_elapsed)
I can run my training loop, but there are no logs in the console. How can I enable them?
tf.logging.info() goes to stderr, not stdout, as mentioned here:
https://groups.google.com/a/tensorflow.org/forum/#!msg/discuss/SO_JRts-VIs/JG1x8vOLDAAJ
I can verify this from personal experience, having spent several hours twiddling subprocess.Popen()'s parameters without capturing the logging. A simple change to stderr=PIPE worked in my specific example, which you should be able to generalize to your problem.
from subprocess import Popen, PIPE

training_results = Popen(cmds, shell=False, stderr=PIPE, bufsize=1, executable="python")
for line in iter(training_results.stderr.readline, b''):
    line = line.decode()
    print(line, end='')
    if line.startswith("INFO:tensorflow:Final test accuracy"):
        tf_final_acc = line
training_results.wait()  # wait for the subprocess to exit
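If the goal is simply to see TensorFlow's INFO-level messages (such as slim's per-step loss) in the console rather than capturing them from a subprocess, raising the logging verbosity is usually enough; a minimal sketch assuming the TF 1.x API that slim uses:
import tensorflow as tf

# tf.logging defaults to WARN, so INFO messages are suppressed;
# raise the verbosity before calling slim.learning.train.
tf.logging.set_verbosity(tf.logging.INFO)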

Using the BigQuery Connector with Spark

I can't get the Google example to work:
https://cloud.google.com/hadoop/examples/bigquery-connector-spark-example
PySpark
There are a few mistakes in the code, I think. For example:
'# Output Parameters
'mapred.bq.project.id': '',
Should be: 'mapred.bq.output.project.id': '',
and
'# Write data back into new BigQuery table.
'# BigQueryOutputFormat discards keys, so set key to None.
(word_counts
.map(lambda pair: None, json.dumps(pair))
.saveAsNewAPIHadoopDataset(conf))
will give an error message. If I change it to:
(word_counts
.map(lambda pair: (None, json.dumps(pair)))
.saveAsNewAPIHadoopDataset(conf))
I get the error message:
org.apache.hadoop.io.Text cannot be cast to com.google.gson.JsonObject
And whatever I try, I cannot make this work.
A dataset is created in BigQuery with the name I gave it in the 'conf', plus a trailing '_hadoop_temporary_job_201512081419_0008', and a table is created with '_attempt_201512081419_0008_r_000000_0' on the end. But they are always empty.
Can anybody help me with this?
Thanks
We are working to update the documentation because, as you noted, the docs are incorrect in this case. Sorry about that! While we're working to update the docs, I wanted to get you a reply ASAP.
Casting problem
The most important problem you mention is the casting issue. Unfortunately, PySpark cannot use the BigQueryOutputFormat to create Java GSON objects. The solution (workaround) is to save the output data into Google Cloud Storage (GCS) and then load it manually with the bq command.
Code example
Here is a code sample which exports to GCS and loads the data into BigQuery. You could also use subprocess and Python to execute the bq command programmatically.
#!/usr/bin/python
"""BigQuery I/O PySpark example."""
import json
import pprint
import pyspark

sc = pyspark.SparkContext()

# Use the Google Cloud Storage bucket for temporary BigQuery export data used
# by the InputFormat. This assumes the Google Cloud Storage connector for
# Hadoop is configured.
bucket = sc._jsc.hadoopConfiguration().get('fs.gs.system.bucket')
project = sc._jsc.hadoopConfiguration().get('fs.gs.project.id')
input_directory = 'gs://{}/hadoop/tmp/bigquery/pyspark_input'.format(bucket)

conf = {
    # Input Parameters
    'mapred.bq.project.id': project,
    'mapred.bq.gcs.bucket': bucket,
    'mapred.bq.temp.gcs.path': input_directory,
    'mapred.bq.input.project.id': 'publicdata',
    'mapred.bq.input.dataset.id': 'samples',
    'mapred.bq.input.table.id': 'shakespeare',
}

# Load data in from BigQuery.
table_data = sc.newAPIHadoopRDD(
    'com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat',
    'org.apache.hadoop.io.LongWritable',
    'com.google.gson.JsonObject',
    conf=conf)

# Perform word count.
word_counts = (
    table_data
    .map(lambda (_, record): json.loads(record))
    .map(lambda x: (x['word'].lower(), int(x['word_count'])))
    .reduceByKey(lambda x, y: x + y))

# Display 10 results.
pprint.pprint(word_counts.take(10))

# Stage data formatted as newline-delimited JSON in Google Cloud Storage.
output_directory = 'gs://{}/hadoop/tmp/bigquery/pyspark_output'.format(bucket)
partitions = range(word_counts.getNumPartitions())
output_files = [output_directory + '/part-{:05}'.format(i) for i in partitions]

(word_counts
 .map(lambda (w, c): json.dumps({'word': w, 'word_count': c}))
 .saveAsTextFile(output_directory))

# Manually clean up the input_directory, otherwise there will be BigQuery export
# files left over indefinitely.
input_path = sc._jvm.org.apache.hadoop.fs.Path(input_directory)
input_path.getFileSystem(sc._jsc.hadoopConfiguration()).delete(input_path, True)

print """
###########################################################################
# Finish uploading data to BigQuery using a client e.g.
bq load --source_format NEWLINE_DELIMITED_JSON \
   --schema 'word:STRING,word_count:INTEGER' \
   wordcount_dataset.wordcount_table {files}
# Clean up the output
gsutil -m rm -r {output_directory}
###########################################################################
""".format(
    files=','.join(output_files),
    output_directory=output_directory)
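As mentioned above, the bq load step could also be driven from Python with subprocess; a minimal hedged sketch (it assumes the bq and gsutil CLIs are installed and authenticated, and reuses output_files and output_directory from the script above):
import subprocess

subprocess.check_call([
    'bq', 'load',
    '--source_format', 'NEWLINE_DELIMITED_JSON',
    '--schema', 'word:STRING,word_count:INTEGER',
    'wordcount_dataset.wordcount_table',
    ','.join(output_files),
])
# Clean up the staged output afterwards.
subprocess.check_call(['gsutil', '-m', 'rm', '-r', output_directory])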

Python file location conventions and Import errors

I'm new to Python and I simply don't know how to handle this specific problem:
I'm trying to run an executable (named ros_M5e.py) located at /opt/ros/diamondback/stacks/hrl/hrl_rfid/src/hrl_rfid/ros_M5e.py (annoyingly long filepath, I know, but necessary). However, within the ros_M5e.py file there is an import from a module further up the file path: from hrl.hrl_rfid.msg import RFIDread. The msg directory is indeed located at /opt/ros/diamondback/stacks/hrl/hrl_rfid/ and it does indeed contain the file RFIDread. However, whenever I try to execute ros_M5e.py I get this error:
Traceback (most recent call last):
File "/opt/ros/diamondback/stacks/hrl/hrl_rfid/src/hrl_rfid/ros_M5e.py", line 37, in <module>
from hrl.hrl_rfid.msg import RFIDread
ImportError: No module named hrl.hrl_rfid.msg
Would someone with some expertise please shine some light on this problem for me? It seems like just a rudimentary file location problem, but I just don't know the appropriate Python conventions to fix it. I've tried putting the ros_M5e.py file in the same directory as the files it calls and changing the filepaths but to no avail.
Thanks a lot,
Khiya
Sure, I can help you get it up and running.
From the StackOverflow posting, it would seem that you're checking out the stack to /opt/ros/diamondback. This is no good, as it is a system path. You need to install into your local path. The reason for "readonly" on the repository is that you do not have permissions to make changes to the code -- it will still work just fine for you on your local machine. I spent a fair amount of time showing how to use this package (at least the python version) here:
http://www.ros.org/wiki/hrl_rfid
I'll try to do a quick run-through for installing it.... Run the following commands:
cd
mkdir sandbox
cd sandbox/
svn checkout http://gt-ros-pkg.googlecode.com/svn/trunk/hrl/hrl_rfid hrl_rfid (double-check that this checkout works OK!)
Add the following line to the bottom of your bashrc to tell ROS where to find the new package. (You may use "gedit ~/.bashrc")
export ROS_PACKAGE_PATH=$ROS_PACKAGE_PATH:$HOME/sandbox/hrl_rfid
Now execute the following:
roscd hrl_rfid (did you end up in the correct directory?)
rosmake hrl_rfid (did it make without errors?)
roscd hrl_rfid/src/hrl_rfid
At this point everything is actually installed correctly. By default, ros_M5e.py assumes that the reader is located at "/dev/robot/RFIDreader". Unless you've already altered udev rules, this will not be the case on your machine. I suggest running through the code:
http://www.ros.org/wiki/hrl_rfid
using IPython (a command-line Python prompt that lets you execute Python commands one at a time) to make sure everything is working (replace /dev/ttyUSB0 with whatever device your RFID reader is connected as):
import lib_M5e as M5e
r = M5e.M5e( '/dev/ttyUSB0', readPwr = 3000 )
r.ChangeAntennaPorts( 1, 1 )
r.QueryEnvironment()
r.TrackSingleTag( 'test_tag_id_' )
r.ChangeTagID( 'test_tag_id_' )
r.QueryEnvironment()
r.TrackSingleTag( 'test_tag_id_' )
r.ChangeAntennaPorts( 2, 2 )
r.QueryEnvironment()
This means that the underlying library is working just fine. Next, test ROS (make sure "roscore" is running!) by putting this in a Python file and executing it:
import time
import lib_M5e as M5e

def P1(r):
    r.ChangeAntennaPorts(1, 1)
    return 'AntPort1'

def P2(r):
    r.ChangeAntennaPorts(2, 2)
    return 'AntPort2'

def PrintDatum(data):
    ant, ids, rssi = data
    print data

r = M5e.M5e('/dev/ttyUSB0', readPwr=3000)
q = M5e.M5e_Poller(r, antfuncs=[P1, P2], callbacks=[PrintDatum])

q.query_mode()
t0 = time.time()
while time.time() - t0 < 3.0:
    time.sleep(0.1)

q.track_mode('test_tag_id_')
t0 = time.time()
while time.time() - t0 < 3.0:
    time.sleep(0.1)

q.stop()
OK, everything works now. You can make your own node that is tuned to your setup:
#!/usr/bin/python
import rospy
import ros_M5e as rm

def P1(r):
    r.ChangeAntennaPorts(1, 1)
    return 'AntPort1'

def P2(r):
    r.ChangeAntennaPorts(2, 2)
    return 'AntPort2'

ros_rfid = rm.ROS_M5e(name='my_rfid_server',
                      readPwr=3000,
                      portStr='/dev/ttyUSB0',
                      antFuncs=[P1, P2],
                      callbacks=[])

rospy.spin()
ros_rfid.stop()
Or, ping me back and I can tweak ros_M5e.py to take an optional "portStr" -- though I recommend making your own so that you can name your antennas sensibly. Also, I highly recommend setting udev rules to ensure that the RFID reader always gets assigned to the same device: http://www.hsi.gatech.edu/hrl-wiki/index.php/Linux_Tools#udev_for_persistent_device_naming
BUS=="usb", KERNEL=="ttyUSB*", SYSFS{idVendor}=="0403", SYSFS{idProduct}=="6001", SYSFS{serial}=="ftDXR6FS", SYMLINK+="robot/RFIDreader"
If you do not do this... there is no guarantee that the reader will always be enumerated at /dev/ttyUSBx.
Let me know if you have any further problems.
~Travis Deyle (Hizook.com)
PS -- Did you modify ros_M5e.py to "from hrl.hrl_rfid.msg import RFIDread"? In the repo, it is "from hrl_rfid.msg import RFIDread". The latter is correct. As long as you have your ROS_PACKAGE_PATH correctly defined, and you've run rosmake on the package, then the import statement should work just fine. Also, I would not recommend posting ROS-related questions to StackOverflow. Very few people on here are going to be familiar with the ROS ecosystem (which is VERY complex). Please post questions here instead:
http://answers.ros.org/
http://code.google.com/p/gt-ros-pkg/issues/list
You need to make sure that the following are true (see the sketch after this list):
The directory /opt/ros/diamondback/stacks/ is in your Python path.
/opt/ros/diamondback/stacks/hrl contains __init__.py
/opt/ros/diamondback/stacks/hrl/hrl_rfid contains __init__.py
/opt/ros/diamondback/stacks/hrl/hrl_rfid/msg contains __init__.py
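A minimal hedged sketch of what that setup amounts to in plain Python (paths taken from the question; in practice the PYTHONPATH / ROS_PACKAGE_PATH would usually be set in the shell instead):
import sys

# Make the stacks directory importable so that `hrl` is treated as a package.
sys.path.append('/opt/ros/diamondback/stacks')

# This works only if hrl/, hrl/hrl_rfid/ and hrl/hrl_rfid/msg/ each contain
# an __init__.py and the msg package actually exposes RFIDread as Python code.
from hrl.hrl_rfid.msg import RFIDread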
As the asker explained in the comments, RFIDRead does not have a .py extension, so here is how it can be imported:
import imp
imp.load_source('RFIDRead', '/opt/ros/diamondback/stacks/hrl/hrl_rfid/msg/RFIDRead.msg')
Check out the imp documentation for more information.