Pig script hangs at GROUP BY statement - apache-pig

The Pig script that I have written runs locally on 4 GB of data, but when I try to run it on an EMR cluster it hangs at a particular GROUP BY statement. Below is the error that I am getting:
at org.apache.tez.dag.app.dag.impl.VertexImpl$NoOpVertexManager.onVertexStateUpdated(VertexImpl.java:4528)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStateUpdate.invoke(VertexManager.java:564)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.tez.dag.app.dag.impl.AMUserCodeException: Fail to initialize Edge,EdgeInfo: sourceVertexName=scope-325, destinationVertexName=scope-329
at org.apache.tez.dag.app.dag.impl.Edge.initialize(Edge.java:174)
at org.apache.tez.dag.app.dag.impl.Edge.setEdgeProperty(Edge.java:196)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setParallelismWrapper(VertexImpl.java:1724)
... 16 more
Caused by: java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager$CustomShuffleEdgeManager.initialize(ShuffleVertexManager.java:251)
at org.apache.tez.dag.app.dag.impl.Edge.initialize(Edge.java:171)
... 18 more
]
Vertex killed, vertexName=scope-346, vertexId=vertex_1509345097826_0006_1_11, diagnostics=[Vertex received Kill in NEW state., Vertex vertex_1509345097826_0006_1_11 [scope-346] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=scope-344, vertexId=vertex_1509345097826_0006_1_10, diagnostics=[Vertex received Kill in NEW state., Vertex vertex_1509345097826_0006_1_10 [scope-344] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=scope-334, vertexId=vertex_1509345097826_0006_1_09, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1509345097826_0006_1_09 [scope-334] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=scope-317, vertexId=vertex_1509345097826_0006_1_02, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1509345097826_0006_1_02 [scope-317] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=scope-316, vertexId=vertex_1509345097826_0006_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1509345097826_0006_1_01 [scope-316] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=scope-315, vertexId=vertex_1509345097826_0006_1_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:68, Vertex vertex_1509345097826_0006_1_00 [scope-315] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:6 killedVertices:6

What Pig code are you running? Please provide complete information, along with the data and the full Pig script.

Related

How to grep multiple strings from a file and print them group-wise

I'm trying to grep a list of errors from a host log file. Since it is a huge file, the output contains a lot of data and it is hard to see which errors are repeated and how often they are logged.
0x45bae19d6bc0 IO type 16648 (READ) isOrdered:NO isSplit:NO isEncr:NO since 7990 msec status I/O error
Throttled: 82 IO failed on disk e3d17cdb-3190-9e21-ea45-4cff39420501, Wake up 0x45ba3a34f9c0 with status I/O error
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10432 microseconds to 5392073 microseconds.
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10444 microseconds to 10822733 microseconds.
naa.5000c500bb7a661f performance has improved. I/O latency reduced from 10822733 microseconds to 2163435 microseconds.
naa.5000c500bb7a661f performance has improved. I/O latency reduced from 2163435 microseconds to 426054 microseconds.
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10465 microseconds to 925119 microseconds.
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10469 microseconds to 1904014 microseconds.
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10472 microseconds to 3936215 microseconds.
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10479 microseconds to 8517984 microseconds.
cpu3:2099278)Migrate: 448: Error reading from pending connection: Failure
cpu3:2099278)Migrate: 448: Error reading from pending connection: Failure
cpu3:2099278)Migrate: 448: Error reading from pending connection: Failure
cpu3:2099278)Migrate: 448: Error reading from pending connection: Failure
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10490 microseconds to 17358740 microseconds.
0x45bae0fefe40 IO type 16648 (READ) isOrdered:NO isSplit:NO isEncr:NO since 48543 msec status I/O error
Throttled: 82 IO failed on disk e3d17cdb-3190-ea45-4cff39420501, Wake up 0x45da36318840 with status I/O error
naa.5000c500ba661f performance has improved. I/O latency reduced from 17358740 microseconds to 3372968 microseconds.
naa.5000c500bb7a661f performance has improved. I/O latency reduced from 3372968 microseconds to 674458 microseconds.
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10677 microseconds to 1353205 microseconds.
naa.5000c500bb7a661f performance has improved. I/O latency reduced from 1353205 microseconds to 268942 microseconds.
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10682 microseconds to 419051 microseconds.
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10682 microseconds to 872847 microseconds.
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10684 microseconds to 1770518 microseconds.
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10687 microseconds to 3640051 microseconds.
0x45dae4fe25c0 IO type 16648 (READ) isOrdered:NO isSplit:NO isEncr:NO since 15991 msec status I/O error
Throttled: 82 IO failed on disk e3d17cdb-3190--ea45-4cff39420501, Wake up 0x45da362677c0 with status I/O error
0x45dae4fe2340 IO type 16648 (READ) isOrdered:NO isSplit:NO isEncr:NO since 24806 msec status I/O error
cpu3:2099278)Migrate: 448: Error reading from pending connection: Failure
cpu3:2099278)Migrate: 448: Error reading from pending connection: Failure
cpu10:36926358)MemSchedAdmit: 471: Admission failure in path: vm.36926352/vmmanon.36926352
cpu23:36926381)MemSchedAdmit: 471: Admission failure in path: vm.36926375/vmmanon.36926375
Throttled: 82 IO failed on disk e3d17cdb-3190-9e21-ea45-4cff39420501, Wake up 0x45ba3abe8880 with status I/O error
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10696 microseconds to 7557465 microseconds.
Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10711 microseconds to 15202991 microseconds.
naa.5000c500bb7a661f performance has improved. I/O latency reduced from 15202991 microseconds to 2944264 microseconds.
naa.5000c500bb7a661f performance has improved. I/O latency reduced from 2944264 microseconds to 577176 microseconds.
naa.5000c500bb7a661f performance has improved. I/O latency reduced from 577176 microseconds to 112712 microseconds.
I'm expecting the following output. I've searched a lot of places and didn't find a suitable solution; I'm hoping it may be possible with awk and sed.
egrep -i "latency|I/O error|Failure" error.log
Failure
cpu3:2099278)Migrate: 448: Error reading from pending connection: Failure
cpu3:2099278)Migrate: 448: Error reading from pending connection: Failure
cpu3:2099278)Migrate: 448: Error reading from pending connection: Failure
cpu3:2099278)Migrate: 448: Error reading from pending connection: Failure
IO Errors
cpu5:2098752)WARNING: LSOM: RCIOCompletionLoop:93: Throttled: 82 IO failed on disk e3d17cdb-3190-9e21-ea45-4cff39420501, Wake up 0x45da362677c0 with status I/O error
cpu6:2097866)LSOMCommon: IORETRYCompleteIO:470: Throttled: 0x45dae4fe2340 IO type 16648 (READ) isOrdered:NO isSplit:NO isEncr:NO since 24806 msec status I/O error
cpu2:2098752)WARNING: LSOM: RCIOCompletionLoop:93: Throttled: 82 IO failed on disk e3d17cdb-3190-9e21-ea45-4cff39420501, Wake up 0x45ba3abe8880 with status I/O error
cpu9:2099365 opID=add9908b)WARNING: ScsiDeviceIO: 12028: READ CAPACITY on device “naa.5000c500bb7a661f” from Plugin “HPP” failed. I/O error
LAtency
cpu5:2097866)WARNING: ScsiDeviceIO: 1596: Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10682 microseconds to 419051 microseconds.
cpu19:2097867)WARNING: ScsiDeviceIO: 1596: Device naa.5000c500bb7a661f performance has deteriorated. I/O latency increased from average value of 10682 microseconds to 872847 microseconds
Assumptions:
if multiple patterns match a single line we'll display the line in each of the output groups
group headings are exact reprints of the search patterns (ie, won't be reformatting the group headers as is done in the question where search pattern I/O error becomes group heading IO Errors)
there is no requirement to match on only whole words (eg, failure will match on failure, failures, nonfailures, stufffailuresXYZ)
within an output group we wish to maintain the input ordering of the rows
The question's current input and expected output don't match, so until that is fixed we'll use a small(er) set of input data for demonstration purposes:
$ cat test.log
you can ignore this line
you should match this line on abcLaTeNcYxyz
yeah, match this line on Failures and throttled
you can ignore this line
more matches for i/o error and latency
single match on I/O error
couple more matches on failures
couple more matches on failure
ignore this line, too
Adding a non-matching string (no-match) to the mix:
$ patterns='latency|I/O error|Failure|throttled|no-match'
One GNU awk idea (for array of arrays and PROCINFO["sorted_in"]):
awk -v plist="${patterns}" '
BEGIN { IGNORECASE=1
        delete groups
        n=split(plist,arr,"|")                    # break plist up into components
        for (i=1;i<=n;i++) {
            ptns[arr[i]]                          # assign as indices of ptns[] array for easier processing
            groups[arr[i]][0]                     # placeholder to allow us to print an empty group
        }
      }
      { for (ptn in ptns)                         # loop through list of patterns and ...
            if ($0 ~ ptn)                         # if found then ...
                groups[ptn][c++]=$0               # save in groups[] array
      }
END   { PROCINFO["sorted_in"]="#ind_str_asc"
        for (ptn in ptns) {
            printf "\n######### %s\n\n", ptn
            PROCINFO["sorted_in"]="#ind_num_asc"  # sort the c++ values in ascending order => maintain input ordering
            for (i in groups[ptn])
                if (groups[ptn][i] != "")
                    print groups[ptn][i]
        }
      }
' test.log
This generates:
######### Failure
yeah, match this line on Failures and throttled
couple more matches on failures
couple more matches on failure
######### I/O error
more matches for i/o error and latency
single match on I/O error
######### latency
you should match this line on abcLaTeNcYxyz
more matches for i/o error and latency
######### no-match
######### throttled
yeah, match this line on Failures and throttled

failed to alloc X bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory

I am trying to run a TensorFlow project and I am encountering memory problems on the university HPC cluster. I have to run a prediction job for hundreds of inputs with differing lengths. We have GPU nodes with different amounts of VRAM, so I am trying to set up the scripts in a way that will not crash for any combination of GPU node and input length.
After searching the net for solutions, I played around with TF_FORCE_UNIFIED_MEMORY, XLA_PYTHON_CLIENT_MEM_FRACTION, XLA_PYTHON_CLIENT_PREALLOCATE, and TF_FORCE_GPU_ALLOW_GROWTH, and also with TensorFlow's set_memory_growth. As I understand it, with unified memory I should be able to use more memory than the GPU itself has.
This was my final solution (only relevant parts):
import os

os.environ['TF_FORCE_UNIFIED_MEMORY']='1'
os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION']='2.0'
#os.environ['XLA_PYTHON_CLIENT_PREALLOCATE']='false'
os.environ['TF_FORCE_GPU_ALLOW_GROWTH']='true'  # as I understand, this is redundant with the set_memory_growth part :)

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            print(gpu)
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
and I submit it on the cluster with --mem=30G (slurm job scheduler) and --gres=gpu:1.
And this is the error my code crashes with. As I understand it, it does try to use unified memory but fails for some reason.
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5582 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:02:00.0, compute capability: 3.5)
2021-08-24 09:22:02.053935: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 12758286336 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:03.738635: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 11482457088 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:05.418059: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 10334211072 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:07.102411: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 9300789248 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:08.784349: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 8370710016 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:10.468644: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 7533638656 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:12.150588: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 6780274688 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:23:10.326528: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.33GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Traceback (most recent call last):
File "scripts/script.py", line 654, in <module>
prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed), "cpu")
File "env/lib/python3.7/site-packages/alphafold/model/model.py", line 134, in predict
result, recycles = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
File "env/lib/python3.7/site-packages/jax/_src/traceback_util.py", line 183, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "env/lib/python3.7/site-packages/jax/_src/api.py", line 402, in cache_miss
donated_invars=donated_invars, inline=inline)
File "env/lib/python3.7/site-packages/jax/core.py", line 1561, in bind
return call_bind(self, fun, *args, **params)
File "env/lib/python3.7/site-packages/jax/core.py", line 1552, in call_bind
outs = primitive.process(top_trace, fun, tracers, params)
File "env/lib/python3.7/site-packages/jax/core.py", line 1564, in process
return trace.process_call(self, fun, tracers, params)
File "env/lib/python3.7/site-packages/jax/core.py", line 607, in process_call
return primitive.impl(f, *tracers, **params)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 608, in _xla_call_impl
*unsafe_map(arg_spec, args))
File "env/lib/python3.7/site-packages/jax/linear_util.py", line 262, in memoized_fun
ans = call(fun, *args)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 758, in _xla_callable
compiled = compile_or_get_cached(backend, built, options)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 76, in compile_or_get_cached
return backend_compile(backend, computation, compile_options)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 373, in backend_compile
return backend.compile(built_c, compile_options=options)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: Resource exhausted: Out of memory while trying to allocate 4649385984 bytes.
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "scripts/script.py", line 654, in <module>
prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed), "cpu")
File "env/lib/python3.7/site-packages/alphafold/model/model.py", line 134, in predict
result, recycles = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 373, in backend_compile
return backend.compile(built_c, compile_options=options)
RuntimeError: Resource exhausted: Out of memory while trying to allocate 4649385984 bytes.
I would be glad for any ideas on how to get it to work and use all the available memory.
Thank you!
It looks like your GPU doesn't fully support unified memory. The support is limited, and in practice the GPU holds all the data in its own memory.
See this article for the description: https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
In particular:
On systems with pre-Pascal GPUs like the Tesla K80, calling cudaMallocManaged() allocates size bytes of managed memory on the GPU device that is active when the call is made. Internally, the driver also sets up page table entries for all pages covered by the allocation, so that the system knows that the pages are resident on that GPU.
And:
Since these older GPUs can’t page fault, all data must be resident on the GPU just in case the kernel accesses it (even if it won’t).
And your GPU is Kepler-based, according to TechPowerUp: https://www.techpowerup.com/gpu-specs/geforce-gtx-titan-black.c2549
As far as I know, TensorFlow should also issue a warning about that. Something like:
Unified memory on GPUs with compute capability lower than 6.0 (pre-Pascal class GPUs) does not support oversubscription.
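If you want to verify this from within the script before relying on unified memory, here is a minimal sketch (assuming TensorFlow 2.3+, where tf.config.experimental.get_device_details is available; the (6, 0) threshold is the Pascal cutoff quoted above):
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Returns a dict that may contain 'compute_capability' as a tuple, e.g. (3, 5)
    details = tf.config.experimental.get_device_details(gpus[0])
    cc = details.get('compute_capability')
    if cc is not None and cc >= (6, 0):
        print("Pascal or newer: unified-memory oversubscription is possible")
    else:
        print("Pre-Pascal GPU: managed allocations must fit in GPU memory")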
This answer will probably be useful for you. The nvidia_smi Python module has some useful tools, such as checking the GPU's total memory. Here I reproduce the code from the answer I mentioned earlier.
import nvidia_smi
nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
print("Total memory:", info.total)
nvidia_smi.nvmlShutdown()
I think this should be your starting point. A simple solution would be to set the batch size according to the GPU memory. If you only want to get predictions, then apart from the batch_size there is usually nothing else that is very memory intensive. Also, if any preprocessing is done on the GPU, I would recommend moving it to the CPU. A rough sketch of sizing the batch from the reported memory is shown below.
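For illustration, a minimal sketch of that idea using the same nvidia_smi module as above; bytes_per_sample and the safety margin are made-up numbers that you would need to measure for your own model:
import nvidia_smi

nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
nvidia_smi.nvmlShutdown()

# Hypothetical per-sample footprint; profile your own model to get a real value.
bytes_per_sample = 50 * 1024 * 1024
safety_fraction = 0.8  # leave headroom for weights, workspace, fragmentation

batch_size = max(1, int(info.total * safety_fraction) // bytes_per_sample)
print("Suggested batch size:", batch_size)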

Google Colab Pro crashed while allocating large memory

I'm trying to use a Colab Pro GPU (max 25 GB memory) for training a sequential model.
Based on the instructions found here, I'm setting the memory limit to 22 GB. Below are my code and logs.
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
mem_limit = 22000
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=mem_limit)])
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)
Per this log, it seems to be setting the cap
Dec 22, 2020, 7:57:15 PM WARNING 2020-12-23 01:57:15.673093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22000 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
Dec 22, 2020, 7:57:15 PM WARNING 2020-12-23 01:57:15.673030: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
However, when executing a statement, it invariably attempts to allocate 37 GB of memory and the runtime crashes. Here is the log:
Dec 22, 2020, 8:01:01 PM INFO KernelRestarter: restarting kernel (1/5), keep random ports
Dec 22, 2020, 8:00:47 PM WARNING tcmalloc: large alloc 37200994304 bytes == 0x7f48b828a000 # 0x7f5249f5a001 0x7f52414564ff 0x7f52414a6ab8 0x7f52414aabb7 0x7f5241549003 0x50a4a5 0x50cc96 0x507be4 0x509900 0x50a2fd 0x50cc96 0x507be4 0x5161c5 0x50a12f 0x50beb4 0x507be4 0x509900 0x50a2fd 0x50beb4 0x507be4 0x509900 0x50a2fd 0x50cc96 0x507be4 0x508ec2 0x594a01 0x59fd0e 0x50d256 0x507be4 0x509900 0x50a2fd
My dataset is large and will possibly require more than 128 GB of memory. Is there a way to limit the amount of memory used by TF? I'm fine with a longer execution time, if it comes to that.
Thanks in advance.
I have had the same issue and had to change my TF code. Setting the maximum GPU memory does not mean that TF will figure out a way to run your code without trying to allocate more than what you have specified. That works for what I would call "units" of allocation, but if one single operation is gigantic, it will blow up.
So, let's suppose that you have a massive matrix multiplication that can't fit on the GPU. Colab will crash.
Based on my limited experience, you have 2 options:
Change your settings not to use the GPU (and bear the performance hit)
Change your code (see the sketch below)
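To illustrate the second option, here is a minimal sketch (not your actual model) that splits one oversized matrix multiplication into row blocks so that no single kernel has to materialize the whole result at once; the shapes and chunk size are arbitrary:
import tensorflow as tf

def chunked_matmul(a, b, chunk_rows=1024):
    # Multiply a (n x k) by b (k x m) one block of rows at a time,
    # so the largest temporary tensor is only chunk_rows x m.
    parts = []
    for start in range(0, a.shape[0], chunk_rows):
        parts.append(tf.matmul(a[start:start + chunk_rows], b))
    return tf.concat(parts, axis=0)

a = tf.random.normal([8192, 4096])
b = tf.random.normal([4096, 4096])
print(chunked_matmul(a, b).shape)  # (8192, 4096)
The same pattern applies to any operation whose output dominates memory; alternatively, wrapping the offending call in "with tf.device('/CPU:0'):" trades speed for host memory.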

scylladb : scylla_io_setup script not showing Recommended --max-io-requests param in output

I am running the scylla_io_setup script inside a Docker container. I have mounted the /var/lib/scylla/ data directory on a 1 TB XFS SSD. scylla_io_setup is not showing the recommended --max-io-requests parameter in the output. The following is the output of the script.
[root@ip /]# ./usr/lib/scylla/scylla_io_setup
tuning /sys/devices/virtual/block/dm-4
tuning: /sys/devices/virtual/block/dm-4/queue/nomerges 2
warning: unable to tune /sys/devices/virtual/block/dm-4/queue/nomerges to 2
tuning /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:2:4/0:2:4:0/block/sde
tuning: /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:2:4/0:2:4:0/block/sde/queue/nomerges 2
warning: unable to tune /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:2:4/0:2:4:0/block/sde/queue/nomerges to 2
tuning /sys/devices/virtual/block/dm-4
tuning /sys/devices/virtual/block/dm-4
tuning /sys/devices/virtual/block/dm-4
tuning /sys/devices/virtual/block/dm-4
WARNING: unable to mbind shard memory; performance may suffer:
WARN 2020-01-29 10:26:47,892 [shard 0] seastar - Unable to set SCHED_FIFO scheduling policy for timer thread; latency impact possible. Try adding CAP_SYS_NICE
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
WARNING: unable to mbind shard memory; performance may suffer:
INFO 2020-01-29 10:26:48,161 [shard 0] iotune - /var/lib/scylla/saved_caches passed sanity checks
WARN 2020-01-29 10:26:48,161 [shard 0] iotune - Scheduler for /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:2:4/0:2:4:0/block/sde/queue/scheduler set to deadline. It is recommend to set it to noop before evaluation so as not to skew the results.
WARN 2020-01-29 10:26:48,161 [shard 0] iotune - nomerges for /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:2:4/0:2:4:0/block/sde/queue/nomerges set to 0. It is recommend to set it to 2 before evaluation so that merges are disabled. Results can be skewed otherwise.
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 188 MB/s
Measuring sequential read bandwidth: 424 MB/s
Measuring random write IOPS: 23843 IOPS
Measuring random read IOPS: 66322 IOPS
Writing result to /etc/scylla.d/io_properties.yaml
Writing result to /etc/scylla.d/io.conf
Was the result written to io.conf?

Error on TensorFlow training after upgrading RAM

I've trained a custom object detector before with 4 GB of RAM and a GTX 1050 Ti; it ran without any error, even though it was extremely slow. Now I've upgraded to 8 GB of RAM and I am getting this error.
2017-12-27 18:21:27.928811: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2017-12-27 18:21:27.928841: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
Aborted (core dumped)
Rerunning the training script after the first attempt:
2017-12-27 18:35:26.519028: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:638] failed to record completion event; therefore, failed to create inter-stream dependency
2017-12-27 18:35:26.519053: I tensorflow/stream_executor/stream.cc:4624] stream 0xc757670 did not memcpy device-to-host; source: 0x10208489d00
2017-12-27 18:35:26.519064: E tensorflow/stream_executor/stream.cc:306] Error recording event in stream: error recording CUDA event on stream 0x1201fa60: CUDA_ERROR_LAUNCH_FAILED; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2017-12-27 18:35:26.519076: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED
2017-12-27 18:35:26.519084: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
Aborted (core dumped)