I can type in the following code in the terminal, and it works:
for i in range(5):
    print(i)
And it will print:
0
1
2
3
4
as expected. However, I tried to write a script that does a similar thing:
print(current_chunk.data)
read_chunk(file, current_chunk)
numVerts, numFaces, numEdges = current_chunk.data
print(current_chunk.data)
print(numVerts)
for vertex in range(numVerts):
    print("Hello World")
current_chunk.data is gained from the following method:
def read_chunk(file, chunk):
    line = file.readline()
    while line.startswith('#'):
        line = file.readline()
    chunk.data = line.split()
The output for this is:
['OFF']
['490', '518', '0']
490
Traceback (most recent call last):
File "/home/leif/src/install/linux2/.blender/scripts/io/import_scene_off.py", line 88, in execute
load_off(self.properties.path, context)
File "/home/leif/src/install/linux2/.blender/scripts/io/import_scene_off.py", line 68, in load_off
for vertex in range(numVerts):
TypeError: 'str' object cannot be interpreted as an integer
So, why isn't it spitting out Hello World 490 times? Or is the 490 being thought of as a string?
I opened the file like this:
def load_off(filename, context):
file = open(filename, 'r')
'490' is a string. Try int('490').
Sigh, never mind, it did turn out to be evaluated as a string. Changing the for loop to
for vertex in range(int(numVerts)):
    print("Hello World")
fixed the problem.
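For what it's worth, every field that comes back from split() is a string, so the other two counts can be converted at the same time. A small variation on the fix above, assuming the OFF header always carries exactly three counts:
numVerts, numFaces, numEdges = (int(x) for x in current_chunk.data)
for vertex in range(numVerts):
    print("Hello World")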
Data which, for me, generates an exception instead of invoking the 'on_bad_lines' handler is at:
https://opencalaccess.org/misc/NAMES_CD.TSV
I have this:
bad_lines = list()

def bad_line_finder(x):
    bad_lines.append(str(x))
    return None

for file in os.listdir(dir):
    bad_lines = list()
    try:
        for df in pd.read_csv(f"{dir}/{file}",
                              sep='\t',
                              on_bad_lines=bad_line_finder,
                              engine='python',
                              chunksize=1000):
            print(f"\n{target}")
            df.info()
            print(f"Bad Lines: {bad_lines}")
            bad_lines = list()
    except:
        print("EXCEPTION:")
        traceback.print_exc()
and this works great. There are errors in the files, and the handler deals with them so that I can keep track of them. Except, why do I still see this:
EXCEPTION:
Traceback (most recent call last):
File "/home/ray/Projects/opencalaccess-data/import.py", line 41, in <module>
for df in pd.read_csv(f"{dir}/{file}",
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1698, in __next__
return self.get_chunk()
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1810, in get_chunk
return self.read(nrows=size)
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/python_parser.py", line 250, in read
content = self._get_lines(rows)
File "/home/ray/Projects/opencalaccess-data/.venv/lib/python3.10/site-packages/pandas/io/parsers/python_parser.py", line 1114, in _get_lines
new_rows.append(next(self.data))
_csv.Error: ' ' expected after '"'
What is the "on_bad_lines" option doing if it does not handle all of the bad lines? Which of them will it handle and which will it not?
This is a government data source. There are format errors in the data that cannot be corrected by the agency, because they constitute the official record. So, I must fix them myself. But which of them throw exceptions and which do not?
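For what it's worth, the _csv.Error above appears to come from Python's csv tokenizer itself (a stray quote character mid-field), which raises before pandas can classify the row as a "bad line", so the on_bad_lines callable never sees it. One possible workaround sketch, assuming the quote characters in this data are not meaningful, is to turn quote handling off entirely with csv.QUOTE_NONE:
import csv
import pandas as pd

# Hypothetical local copy of the TSV; QUOTE_NONE makes '"' ordinary data instead of a quote delimiter.
for df in pd.read_csv("NAMES_CD.TSV",
                      sep='\t',
                      quoting=csv.QUOTE_NONE,
                      on_bad_lines=bad_line_finder,
                      engine='python',
                      chunksize=1000):
    df.info()
With quoting disabled, only rows with the wrong number of fields should remain for the on_bad_lines handler to collect.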
I've been fighting an unexpected behavior when attempting to construct a subclass of numpy ndarray within a map call to a pyspark RDD. Specifically, the attribute that I added within the ndarray subclass appears to be stripped from the resulting RDD.
The following snippets contain the essence of the issue.
import numpy as np

class MyArray(np.ndarray):
    def __new__(cls, shape, extra=None, *args):
        obj = super().__new__(cls, shape, *args)
        obj.extra = extra
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.extra = getattr(obj, "extra", None)

def shape_to_array(shape):
    rval = MyArray(shape, extra=shape)
    rval[:] = np.arange(np.product(shape)).reshape(shape)
    return rval
If I invoke shape_to_array directly (not under pyspark), it behaves as expected:
x = shape_to_array((2,3,5))
print(x.extra)
outputs:
(2, 3, 5)
But, if I invoke shape_to_array via a map to an RDD of inputs, it goes wonky:
from pyspark.sql import SparkSession
sc = SparkSession.builder.appName("Steps").getOrCreate().sparkContext
rdd = sc.parallelize([(2,3,5),(2,4),(2,5)])
result = rdd.map(shape_to_array).cache()
print(result.map(lambda t:type(t)).collect())
print(result.map(lambda t:t.shape).collect())
print(result.map(lambda t:t.extra).collect())
Outputs:
[<class '__main__.MyArray'>, <class '__main__.MyArray'>, <class '__main__.MyArray'>]
[(2, 3, 5), (2, 4), (2, 5)]
22/10/15 15:48:02 ERROR Executor: Exception in task 7.0 in stage 2.0 (TID 23)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 686, in main
process()
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 678, in process
serializer.dump_stream(out_iter, outfile)
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 273, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/usr/local/Cellar/apache-spark/3.3.0/libexec/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper
return f(*args, **kwargs)
File "/var/folders/w7/42_p7mcd1y91_tjd0jzr8zbh0000gp/T/ipykernel_94831/2519313465.py", line 1, in <lambda>
AttributeError: 'MyArray' object has no attribute 'extra'
What happened to the extra attribute of the MyArray instances?
Thanks much for any/all suggestions
EDIT: A bit of additional info. If I add logging inside the shape_to_array function just before the return, I can verify that the extra attribute does exist on the MyArray object being returned. But when I access the MyArray elements of the RDD from the main driver, the attribute is gone.
After a night of sleeping on this, I remembered that I have often had issues with pyspark RDDs where the error message had to do with the return type not working with pickle.
I wasn't getting that error message this time because numpy.ndarray does work with pickle. BUT... the __reduce__ and __setstate__ methods of numpy.ndarray know nothing about the added extra attribute on the MyArray subclass. This is where extra was being stripped.
Adding the following two methods to MyArray solved everything.
def __reduce__(self):
    # ndarray.__reduce__ returns a 3-tuple; append extra to its state element.
    mthd, cls, args = super().__reduce__()
    return mthd, cls, args + (self.extra,)

def __setstate__(self, args):
    # Peel extra off the end, then let ndarray restore its own state.
    super().__setstate__(args[:-1])
    self.extra = args[-1]
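As a quick sanity check outside of Spark, a plain pickle round trip now preserves the attribute, which is what the Spark serializer relies on:
import pickle

a = shape_to_array((2, 3))
b = pickle.loads(pickle.dumps(a))
print(b.extra)   # (2, 3)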
Thank you to anyone who took some time to think about my question.
I have the following Python function:
def bet():
    my_bet = -1
    while my_bet < 0 or my_bet > 49:
        print("before input")
        my_bet = input("Enter a bet: ")
        print("after input")
        try:
            my_bet = int(my_bet)
            if my_bet < 0 or my_bet > 49:
                print("Bet must be between 0 and 49")
        except:
            print("Need a number between 0 and 49")
            continue
    return my_bet
When I run this under Windows 10, I am seeing the following behavior:
>>> bet()
before input
Enter a bet: ddd
after input
Need a number between 0 and 49
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in bet
TypeError: '<' not supported between instances of 'str' and 'int'
I have also tried moving the if statement that follows the conversion to int into the else clause of the try statement, with exactly the same result. It looks like the conversion runs, raises the exception, and the except block runs, but then, instead of continuing the loop, it executes the rest of the try block. According to the Python documentation, I would not expect the rest of the try block to run after the exception is raised. What am I doing wrong?
Thanks Vorspring durch Technik for pointing out what should have been obvious: after the continue, the next thing executed is the while loop's test, which performs the same comparison as the if statement, but on the still-unconverted string.
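For completeness, here is one way to restructure the loop so the unconverted string never reaches a comparison (a sketch, not the original code):
def bet():
    while True:
        raw = input("Enter a bet: ")
        try:
            my_bet = int(raw)
        except ValueError:
            print("Need a number between 0 and 49")
            continue
        if 0 <= my_bet <= 49:
            return my_bet
        print("Bet must be between 0 and 49")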
I am new to PhraseMatcher and want to extract some keywords from my emails.
Everything is working well except that I can't get the name of an added matcher.
This is my code below:
def main():
    patterns_months = 'phraseMatcher/months.txt'
    text_loc = 'phraseMatcher/text.txt'
    nlp = spacy.blank('en')
    nlp.vocab.lex_attr_getters = {}
    phrases_months = read_gazetter(patterns_months)
    txts = read_text(text_loc, n=n)
    months = [nlp(text) for text in phrases_months]
    matcher = PhraseMatcher(nlp.vocab)
    matcher.add('MONTHS', None, *months)
    print(nlp.vocab.strings['MONTHS'])
    for txt in txts:
        doc = nlp(txt)
        matches = matcher(doc)
        for match_id, start, end in matches:
            span = doc[start: end]
            label = nlp.vocab.strings[match_id]
            print(label, span.text, start, end)
The result:
12298211501233906429 <--- this is from print(nlp.vocab.strings['MONTHS'])
Traceback (most recent call last):
File "D:/workspace/phraseMatcher/venv/phraseMatcher.py", line 71, in <module>
plac.call(main)
File "D:\workspace\phraseMatcher\venv\lib\site-packages\plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "D:\workspace\phraseMatcher\venv\lib\site-packages\plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "D:/workspace/phraseMatcher/venv/phraseMatcher.py", line 47, in main
label = nlp.vocab.strings[match_id]
File "strings.pyx", line 117, in spacy.strings.StringStore.__getitem__
KeyError: "[E018] Can't retrieve string for hash '18446744072093410045'."
spaCy version: 2.0.12
Platform: Windows-7-6.1.7601-SP1
Python version: 3.7.0
I can't find what I did wrong. It seems simple, and I have already read this related question:
Using PhraseMatcher in SpaCy to find multiple match types
Help me, thanks in advance.
When checking across the different solutions available on the net, most people (including datitran) pointed out that it might be a missing class or a misspelled class in the train CSV file. I am not able to figure that out, since the labelling was done with labelImg, which saves the classes as XML, and xml_to_csv.py converts that to a CSV. I am not sure at what point I could have missed or misspelled a class.
Here's the error I am dealing with:
(OT) nisxxxxx#xxxxxxxx:~/Desktop/OD/models/research/object_detection$ python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
Traceback (most recent call last):
File "generate_tfrecord.py", line 192, in <module>
tf.app.run()
File "/home/nisxxxxx/Desktop/test_OD/OT/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "generate_tfrecord.py", line 184, in main
tf_example = create_tf_example(group, path)
File "generate_tfrecord.py", line 173, in create_tf_example
'image/object/class/label': dataset_util.int64_list_feature(classes),
File "/home/nishanth/Desktop/test_OD/models/research/object_detection/utils/dataset_util.py", line 26, in int64_list_feature
return tf.train.Feature(int64_list=tf.train.Int64List(value=value))
TypeError: None has type NoneType, but expected one of: int, long
Has anyone been able to solve this problem?
I am not sure how many classes you have used, but after the final else, try returning 0 instead of None. For example:
if row_label == 'red':
    return 1
elif row_label == 'orange':
    return 2
elif row_label == 'blue':
    return 3
else:
    return 0
Just change the label name to whatever you used when labelling the images with the labelImg tool.
def class_text_to_int(row_label):
    if row_label == 'raccon':
        return 1
    else:
        None
Instead of 'raccon', put your own label name, for example 'car'.
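As a side note, since the root cause here is a label in the CSV that doesn't match any branch, a sketch of class_text_to_int that fails loudly on unknown labels can make the offending row easy to find (the label names below are just placeholders for your own classes):
def class_text_to_int(row_label):
    # Map of the labels actually used in labelImg; adjust to your own classes.
    label_map = {'red': 1, 'orange': 2, 'blue': 3}
    try:
        return label_map[row_label]
    except KeyError:
        # Raising here names the unexpected label instead of silently returning None.
        raise ValueError("Unknown label in CSV: {!r}".format(row_label))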