Problem
I re-installed my server's operating system. Before that, Remote-SSH worked normally; now I can no longer use Remote-SSH to connect to the server, although connecting with plain ssh still works.
I suspect it manages to log in to the system, but then breaks down at some later step.
The error log is below:
Welcome to Ubuntu 20.04 LTS (GNU/Linux 5.4.0-77-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Tue 14 Sep 2021 09:56:58 PM CST
System load: 0.07 Processes: 117
Usage of /: 6.5% of 59.00GB Users logged in: 1
Memory usage: 10% IPv4 address for eth0: 10.0.12.2
Swap usage: 0%
* Super-optimized for small spaces - read how we shrank the memory
footprint of MicroK8s to make it the smallest full K8s around.
https://ubuntu.com/blog/microk8s-memory-optimisation
ready: 6425958cce28
Linux 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021
6425958cce28: running
bash: line 1: _exitcode: command not found
bash: line 2: syntax error near unexpected token `elif'
bash: line 2: ` elif [[ $ALLOW_CLIENT_DOWNLOAD == "1" ]]; then'
-sh: 4: function: not found
-sh: 69: [[: not found
-sh: 90: [[: not found
-sh: 155: Syntax error: "(" unexpected (expecting "then")
Transferred: sent 17180, received 4016 bytes, in 0.5 seconds
Bytes per second: sent 35433.6, received 8283.0
local-server-1> ssh child died, shutting down
[21:56:58.587] Failed to parse remote port from server output
[21:56:58.588] Resolver error: Error:
at Function.Create (/Users/luther/.vscode/extensions/ms-vscode-remote.remote-ssh-0.65.7/out/extension.js:1:64659)
at Object.t.handleInstallOutput (/Users/luther/.vscode/extensions/ms-vscode-remote.remote-ssh-0.65.7/out/extension.js:1:63302)
at Object.e [as tryInstallWithLocalServer] (/Users/luther/.vscode/extensions/ms-vscode-remote.remote-ssh-0.65.7/out/extension.js:1:387573)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
at async /Users/luther/.vscode/extensions/ms-vscode-remote.remote-ssh-0.65.7/out/extension.js:1:294473
at async Object.t.withShowDetailsEvent (/Users/luther/.vscode/extensions/ms-vscode-remote.remote-ssh-0.65.7/out/extension.js:1:406463)
at async /Users/luther/.vscode/extensions/ms-vscode-remote.remote-ssh-0.65.7/out/extension.js:1:386112
at async E (/Users/luther/.vscode/extensions/ms-vscode-remote.remote-ssh-0.65.7/out/extension.js:1:382710)
at async Object.t.resolveWithLocalServer (/Users/luther/.vscode/extensions/ms-vscode-remote.remote-ssh-0.65.7/out/extension.js:1:385728)
at async Object.t.resolve (/Users/luther/.vscode/extensions/ms-vscode-remote.remote-ssh-0.65.7/out/extension.js:1:295870)
at async /Users/luther/.vscode/extensions/ms-vscode-remote.remote-ssh-0.65.7/out/extension.js:127:110656
[21:56:58.592] ------
Tried
I tried deleting the known_hosts file on the host and re-installing the Remote-SSH extension, but neither worked.
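(For reference, a single stale host key can be removed without deleting the whole known_hosts file; this assumes a stock OpenSSH client, and the hostname is a placeholder:)
ssh-keygen -R my-server-hostname-or-ip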
I am pretty new to Remote-SSH, so a more detailed solution would be appreciated.
Thanks :)
I downgraded Remote-SSH, then changed my default shell to zsh and upgraded Remote-SSH again. It started installing the '.vscode-server' directory again and, somewhat magically, it worked.
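For anyone hitting the same symptom: the "-sh: ...: [[: not found" lines in the log above indicate the server-install script was being run by a plain POSIX sh, which is consistent with the fix being a change of login shell. A sketch of the equivalent steps on the remote machine (assuming zsh, or bash, is installed there):
chsh -s "$(which zsh)"      # switch the login shell (bash works too); log out and back in
rm -rf ~/.vscode-server     # remove the old server install so Remote-SSH re-installs it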
I implemented model prediction in an Oozie workflow and got the error "Container is running beyond memory limits" on step 3, i.e. model1.predict_proba. table1 has 27 million records. It runs fine in a Jupyter notebook, but I get this error under Oozie. Can someone please help?
d1 = sqlContext.sql("SELECT * FROM table1").toPandas()
xyz= d1.drop(['abc'], axis = 1)
modelprob = model1.predict_proba(xyz)[:,1]
Error: YARN logs
Application application_1547693435775_8741566 failed 2 times due to AM Container for appattempt_1547693435775_8741566_000002 exited with exitCode: -104
For more detailed output, check application tracking page:https://xyz
Diagnostics: Container [pid=224941,containerID=container_e167_1547693435775_8741566_02_000002] is running beyond physical memory limits. Current usage: 121.2 GB of 121 GB physical memory used; 226.9 GB of 254.1 GB virtual memory used. Killing container.
2019-04-15 22:43:36,231 [dispatcher-event-loop-10] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_5_piece0 on xyz.corp.intranet:34252 in memory (size: 5.6 KB, free: 6.2 GB)
2019-04-15 22:43:36,231 [dispatcher-event-loop-35] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_5_piece0 on xyz1.corp.intranet:38363 in memory (size: 5.6 KB, free: 6.2 GB)
2019-04-15 22:43:36,242 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned accumulator 4
2019-04-15 22:43:36,245 [dispatcher-event-loop-51] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_2_piece0 on xyz3 in memory (size: 53.5 KB, free: 52.8 GB)
2019-04-15 22:43:36,245 [dispatcher-event-loop-51] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_2_piece0 on xyz4.corp.intranet:46309 in memory (size: 53.5 KB, free: 6.2 GB)
2019-04-15 22:43:36,248 [dispatcher-event-loop-9] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_2_piece0 on xyz5.corp.intranet:44850 in memory (size: 53.5 KB, free: 6.2 GB)
2019-04-15 22:45:48,103 [SIGTERM handler] INFO org.apache.spark.deploy.yarn.ApplicationMaster - Final app status: FAILED, exitCode: 16
2019-04-15 22:45:48,106 [SIGTERM handler] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - RECEIVED SIGNAL 15: SIGTERM
2019-04-15 22:45:48,124 [Thread-5] INFO org.apache.spark.SparkContext - Invoking stop() from shutdown hook
Below are the SparkConf parameters:
sconf = SparkConf() \
    .setAppName("xyz model") \
    .set("spark.driver.memory", "8g") \
    .set("spark.executor.memory", "12g") \
    .set("spark.yarn.am.memory", "8g") \
    .set("spark.dynamicAllocation.enabled", "true") \
    .set("spark.dynamicAllocation.minExecutors", "20") \
    .set("spark.dynamicAllocation.maxExecutors", "60") \
    .set("spark.shuffle.service.enabled", "true") \
    .set("spark.kryoserializer.buffer.max.mb", "2047") \
    .set("spark.shuffle.blockTransferService", "nio") \
    .set("spark.driver.maxResultSize", "4g") \
    .set("spark.rpc.message.maxSize", "330") \
    .setMaster("yarn-cluster")
sc = SparkContext(conf=sconf)
Below are the sparkopts (spark-submit) parameters:
sparkopts=--executor-memory 115g --num-executors 60 --driver-memory 110g --executor-cores 16 --driver-cores 2 --conf "spark.dynamicAllocation.enabled=true" --conf "spark.kryoserializer.buffer.max=2047m" --conf "spark.driver.maxResultSize=4096m" --conf spark.yarn.executor.memoryOverhead=8000 --conf "spark.network.timeout=10000000" --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops -XX:PermSize=2048M -XX:MaxPermSize=2048M -XX:+UseG1GC" --conf "spark.broadcast.compress=true" --conf "spark.broadcast.blockSize=128m" --conf "spark.serializer.objectStreamReset=2" --conf spark.executorEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python --files ${xyz}/hive-site.xml --files ${xyz}/yarn-site.xml
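For context, toPandas() in the first line pulls all 27 million rows onto the driver before predict_proba even runs, which is the usual cause of this kind of container kill. A minimal sketch of keeping the prediction distributed instead (this is an assumption, not part of the original workflow; it presumes model1 is a picklable scikit-learn-style estimator and that sc, sqlContext and model1 are defined as above):
import pandas as pd

df = sqlContext.sql("SELECT * FROM table1")
all_cols = df.columns                           # plain Python list, safe to ship to executors
feature_cols = [c for c in all_cols if c != 'abc']
bc_model = sc.broadcast(model1)                 # send the fitted model to the executors once

def predict_partition(rows):
    # Score one partition at a time instead of collecting the whole table on the driver.
    pdf = pd.DataFrame(list(rows), columns=all_cols)
    if pdf.empty:
        return iter([])
    probs = bc_model.value.predict_proba(pdf[feature_cols])[:, 1]
    return iter([float(p) for p in probs])

probabilities = df.rdd.mapPartitions(predict_partition)
# e.g. probabilities.saveAsTextFile(...) or zip back with an id column, as needed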
I am getting 416 errors while creating buckets using S3 or Swift. How can I solve this?
swift -A http://ceph-4:7480/auth/1.0 -U testuser:swift -K 'BKtVrq1...' upload testas testas
Warning: failed to create container 'testas': 416 Requested Range Not Satisfiable: InvalidRange
Object PUT failed: http://ceph-4:7480/swift/v1/testas/testas 404 Not Found b'NoSuchBucket'
An S3 test from Python (boto) fails as well:
File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 621, in create_bucket
response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 416 Requested Range Not Satisfiable
<?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidRange</Code><BucketName>mybucket</BucketName><RequestId>tx00000000000000000002a-005a69b12d-1195-default</RequestId><HostId>1195-default-default</HostId></Error>
Here is my ceph status:
cluster:
id: 1e4bd42a-7032-4f70-8d0c-d6417da85aa6
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-2,ceph-3,ceph-4
mgr: ceph-1(active), standbys: ceph-2, ceph-3, ceph-4
osd: 3 osds: 3 up, 3 in
rgw: 2 daemons active
data:
pools: 7 pools, 296 pgs
objects: 333 objects, 373 MB
usage: 4398 MB used, 26309 MB / 30708 MB avail
pgs: 296 active+clean
I am using a Ceph Luminous build with BlueStore:
ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
User created:
sudo radosgw-admin user create --uid="testuser" --display-name="First User"
sudo radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --access=full
sudo radosgw-admin key create --subuser=testuser:swift --key-type=swift --gen-secret
Logs on osd:
2018-01-25 12:19:45.383298 7f03c77c4700 1 ====== starting new request req=0x7f03c77be1f0 =====
2018-01-25 12:19:47.711677 7f03c77c4700 1 ====== req done req=0x7f03c77be1f0 op status=-34 http_status=416 ======
2018-01-25 12:19:47.711937 7f03c77c4700 1 civetweb: 0x55bd9631d000: 192.168.109.47 - - [25/Jan/2018:12:19:45 +0200] "PUT /mybucket/ HTTP/1.1" 1 0 - Boto/2.38.0 Python/2.7.12 Linux/4.4.0-51-generic
Linux ubuntu, 4.4.0-51-generic
Set the default pg_num and pgp_num to a lower value (8, for example), or set mon_max_pg_per_osd to a higher value in ceph.conf.
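For example, in ceph.conf (example values only; adjust for your cluster and restart the mon and radosgw daemons afterwards):
[global]
osd pool default pg num = 8
osd pool default pgp num = 8
# ...or keep the pool defaults and raise the per-OSD placement-group limit instead:
mon max pg per osd = 800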
I've installed TensorFlow into a fresh virtual environment on OSX 10.12, following https://www.tensorflow.org/install/install_mac#installing_with_virtualenv
In one attempt I installed both of these in a fresh virtualenv:
$ pip3 install --upgrade tensorflow
$ pip3 install --upgrade tensorflow-gpu
In another attempt I only installed this in a fresh virtualenv:
$ pip3 install --upgrade tensorflow-gpu
I think this is the error:
Library not loaded: #rpath/libcudnn.5.dylib
The installation provided libcudnn.6.dylib but not libcudnn.5.dylib
I believe my question is distinct from these two
Failed to load the native TensorFlow runtime. Reason : Image not found. What am I doing wrong?
Tensorflow error [image not found]
as their errors involve libcudart:
Library not loaded: #rpath/libcudart.8.0.dylib
Trace:
(tensorflow) Dione:tensorflow peterbecich$ python3
Python 3.4.6 (default, Apr 23 2017, 17:16:17)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> Traceback (most recent call last):
File "/Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/Users/peterbecich/tensorflow/lib/python3.4/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
ImportError: dlopen(/Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 10): Library not loaded: #rpath/libcudnn.5.dylib
Referenced from: /Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
Reason: image not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import *
File "/Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/__init__.py", line 51, in <module>
from tensorflow.python import pywrap_tensorflow
File "/Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/Users/peterbecich/tensorflow/lib/python3.4/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
ImportError: dlopen(/Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 10): Library not loaded: #rpath/libcudnn.5.dylib
Referenced from: /Users/peterbecich/tensorflow/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
Reason: image not found
Failed to load the native TensorFlow runtime.
Paths set in CUDA setup:
(tensorflow) Dione:~ peterbecich$ echo $LD_LIBRARY_PATH
/usr/local/cuda/lib/:
(tensorflow) Dione:~ peterbecich$ echo $DYLD_LIBRARY_PATH
/usr/local/cuda/lib:/usr/local/cuda/:/usr/local/cuda/extras/CUPTI/lib
The Clang version is from outdated Command Line Tools, due to this: https://github.com/caffe2/caffe2/issues/318#issuecomment-295555763
Dione:~ peterbecich$ clang --version
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin16.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
nvcc version:
(tensorflow) Dione:~ peterbecich$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:46_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
pip in virtual environment:
(tensorflow) Dione:~ peterbecich$ pip3 --version
pip 9.0.1 from /Users/peterbecich/tensorflow/lib/python3.4/site-packages (python 3.4)
libcudnn is a symbolic link in /usr/local/cuda/lib. Interestingly, the missing file is libcudnn.5.dylib, while libcudnn.6.dylib is available:
/ssh:Dione.local:/usr/local/cuda/lib:
total 672
drwxr-xr-x 83 peterbecich wheel 2.8K May 24 12:32 .
drwxr-xr-x 17 peterbecich wheel 578B May 24 11:28 ..
lrwxr-xr-x 1 root wheel 50B Jan 11 17:33 libcublas.8.0.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcublas.8.0.dylib
lrwxr-xr-x 1 root wheel 46B Jan 11 17:33 libcublas.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcublas.dylib
lrwxr-xr-x 1 root wheel 49B Jan 11 17:33 libcublas_device.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcublas_device.a
lrwxr-xr-x 1 root wheel 49B Jan 11 17:33 libcublas_static.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcublas_static.a
-rwxr-xr-x 1 root wheel 13K Jan 11 17:31 libcuda.dylib
lrwxr-xr-x 1 root wheel 45B Jan 11 17:33 libcudadevrt.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudadevrt.a
lrwxr-xr-x 1 root wheel 50B Jan 11 17:33 libcudart.8.0.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart.8.0.dylib
lrwxr-xr-x 1 root wheel 46B Jan 11 17:33 libcudart.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart.dylib
lrwxr-xr-x 1 root wheel 49B Jan 11 17:33 libcudart_static.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart_static.a
lrwxr-xr-x 1 root wheel 47B May 24 12:32 libcudnn.6.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn.6.dylib
lrwxr-xr-x 1 root wheel 45B May 24 12:32 libcudnn.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn.dylib
lrwxr-xr-x 1 root wheel 48B May 24 12:32 libcudnn_static.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn_static.a
lrwxr-xr-x 1 root wheel 49B Jan 11 17:33
.
.
.
Real location of libcudnn:
/ssh:Dione.local:/Developer/NVIDIA/CUDA-8.0/lib:
total 2825680
.
.
.
lrwxr-xr-x 1 root wheel 19B Jan 11 17:32 libcudart.dylib -> libcudart.8.0.dylib
-rw-r--r-- 1 root wheel 598K Jan 11 17:32 libcudart_static.a
-rwxr-xr-x# 1 peterbecich staff 144M Apr 12 14:12 libcudnn.6.dylib
lrwxr-xr-x# 1 peterbecich staff 16B Apr 12 14:16 libcudnn.dylib -> libcudnn.6.dylib
.
.
deviceQuery from the Nvidia CUDA samples:
(tensorflow) Dione:release peterbecich$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 750M"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147024896 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 926 MHz (0.93 GHz)
Memory Clock rate: 2508 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GT 750M
Result = PASS
Thanks very much for your time in reviewing this.
Creating symbolic links from libcudnn.5.dylib to the available libcudnn.6.dylib solved it. I put links in both /Developer/NVIDIA/CUDA-8.0/lib and /usr/local/cuda/lib:
16 May 24 14:12 libcudnn.5.dylib -> libcudnn.6.dylib
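For reference, the links can be created along these lines (a sketch using the paths from the listings above; sudo is required):
cd /Developer/NVIDIA/CUDA-8.0/lib && sudo ln -s libcudnn.6.dylib libcudnn.5.dylib
cd /usr/local/cuda/lib && sudo ln -s libcudnn.6.dylib libcudnn.5.dylib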
Successful import:
Dione:tensorflow peterbecich$ source ~/tensorflow/bin/activate
(tensorflow) Dione:tensorflow peterbecich$ python3
Python 3.4.6 (default, Apr 23 2017, 17:16:17)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>>
Creating the symlink fixed the import for me, but I still ran into problems later when training a model:
2017-09-11 21:57:26.922561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
2017-09-11 21:57:28.920558: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Loaded runtime CuDNN library: 6021 (compatibility version 6000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
I removed the symlink and reinstalled cuDNN v5.1 from https://developer.nvidia.com/cudnn.
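For reference, installing from the downloaded cuDNN archive is roughly the following (a sketch assuming the macOS CUDA 8.0 layout used earlier in this question; the archive name is an example and depends on the version you download):
tar -xzvf cudnn-8.0-osx-x64-v5.1.tgz                               # example archive name
sudo cp cuda/include/cudnn.h /Developer/NVIDIA/CUDA-8.0/include/
sudo cp cuda/lib/libcudnn* /Developer/NVIDIA/CUDA-8.0/lib/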
Why does this script not work on FreeBSD? I ran the script on CentOS and Debian and everything was fine, but on FreeBSD (10.2) I encounter the following error:
awk: syntax error at source line 1
context is
match($0, "^listen >>> queue:[[:space:]]+(.*)", <<<
awk: bailing out at source line 1
As an example, here is some output of the php-fpm status page:
pool: www
process manager: ondemand
start time: 29/Feb/2016:15:18:54 +0200
start since: 2083770
accepted conn: 1467128
listen queue: 0
max listen queue: 129
listen queue len: 128
idle processes: 1
active processes: 2
total processes: 3
max active processes: 64
max children reached: 1
slow requests: 0
On CentOS and Debian, when I run:
/path/to/script/php-fpm-check.sh "idle processes" http://127.0.0.1/status
I get 1, but on FreeBSD I get the error mentioned above.
The three-argument form of match() is a GNU awk extension (docs). You'll have to find another way to capture the match (perhaps using the RSTART and RLENGTH variables set as a side effect of match()), or install gawk on your FreeBSD system.
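For example, a portable rewrite of that match using RSTART/RLENGTH (a sketch: "idle processes" is the metric from the example run above, and fetching the status page with curl is an assumption about how the script works):
curl -s http://127.0.0.1/status | awk -v metric="idle processes" '
    match($0, "^" metric ":[[:space:]]+") {
        print substr($0, RSTART + RLENGTH)   # everything after "metric:" is the value
    }'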