CentOS-7 + Python 3.9 - Minimal Docker Image - docker-image

There are plenty of articles on shrinking Docker images, but images that combine CentOS 7 with Python 3.9 are still huge. The reason is that Python has to be built from source.
Versions:
CentOS : 7 (AltArch)
Python : 3.9.9
DockerHub Stats:
CentOS 7 Image Size : 74 MB
Python 3.9.9 Image Size: 338 MB
Total : 412 MB
CentOS-7 + Python-3.9 (Installation from Source Code) Image size = ~1GB
Is there a better way to get the same result with a smaller image?

Conda package bug? binary incompatibility

I'm working in a remote Jupyter notebook on a system where I don't have root access, or even a shell in which to make many adjustments. I can retrieve packages from Conda's archive and install them from notebook cells like this:
!conda install /path/to/package-vvv.tar.bz2
I've run into situations where I guess wrong on the version number and install something that is incompatible. The error messages are like the one I reproduce below: a binary incompatibility in numpy or mkl.
Now I'm retracing the problem on an Ubuntu 20.10 notebook where I have admin access, and I have a reproducible example to share.
Create an environment with the same versions of python, numpy, and pandas as on the remote machine:
$ conda create -n cenv-py368 python=3.6.8 pandas=1.1.2 numpy=1.15.4
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.5.12
latest version: 4.9.2
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: /home/pauljohn/LinuxDownloads/miniconda3/envs/cenv-py368
added / updated specs:
- numpy=1.15.4
- pandas=1.1.2
- python=3.6.8
The following packages will be downloaded:
package | build
---------------------------|-----------------
libffi-3.2.1 | hf484d3e_1007 52 KB
python-3.6.8 | h0371630_0 34.4 MB
libgcc-ng-9.1.0 | hdf63c60_0 8.1 MB
libstdcxx-ng-9.1.0 | hdf63c60_0 4.0 MB
blas-1.0 | mkl 6 KB
_libgcc_mutex-0.1 | main 3 KB
------------------------------------------------------------
Total: 46.6 MB
The following NEW packages will be INSTALLED:
_libgcc_mutex: 0.1-main
blas: 1.0-mkl
ca-certificates: 2021.1.19-h06a4308_0
certifi: 2020.12.5-py36h06a4308_0
intel-openmp: 2020.2-254
libedit: 3.1.20191231-h14c3975_1
libffi: 3.2.1-hf484d3e_1007
libgcc-ng: 9.1.0-hdf63c60_0
libgfortran-ng: 7.3.0-hdf63c60_0
libstdcxx-ng: 9.1.0-hdf63c60_0
mkl: 2020.2-256
mkl-service: 2.3.0-py36he8ac12f_0
mkl_fft: 1.2.0-py36h23d657b_0
mkl_random: 1.1.1-py36h0573a6f_0
ncurses: 6.2-he6710b0_1
numpy: 1.15.4-py36h7e9f1db_0
numpy-base: 1.15.4-py36hde5b4d6_0
openssl: 1.1.1i-h27cfd23_0
pandas: 1.1.2-py36he6710b0_0
pip: 20.3.3-py36h06a4308_0
python: 3.6.8-h0371630_0
python-dateutil: 2.8.1-pyhd3eb1b0_0
pytz: 2021.1-pyhd3eb1b0_0
readline: 7.0-h7b6447c_5
setuptools: 52.0.0-py36h06a4308_0
six: 1.15.0-pyhd3eb1b0_0
sqlite: 3.33.0-h62c20be_0
tk: 8.6.10-hbc83047_0
wheel: 0.36.2-pyhd3eb1b0_0
xz: 5.2.5-h7b6447c_0
zlib: 1.2.11-h7b6447c_3
Proceed ([y]/n)? y
Downloading and Extracting Packages
libffi-3.2.1 | 52 KB | ##################################### | 100%
python-3.6.8 | 34.4 MB | ##################################### | 100%
libgcc-ng-9.1.0 | 8.1 MB | ##################################### | 100%
libstdcxx-ng-9.1.0 | 4.0 MB | ##################################### | 100%
blas-1.0 | 6 KB | ##################################### | 100%
_libgcc_mutex-0.1 | 3 KB | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate cenv-py368
#
# To deactivate an active environment, use
#
# $ conda deactivate
Activate that environment.
Install, for example, the package called "fastparquet":
(cenv-py368) $ conda install fastparquet
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.5.12
latest version: 4.9.2
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: /home/pauljohn/LinuxDownloads/miniconda3/envs/cenv-py368
added / updated specs:
- fastparquet
The following packages will be downloaded:
package | build
---------------------------|-----------------
pyparsing-2.4.7 | pyhd3eb1b0_0 59 KB
packaging-20.9 | pyhd3eb1b0_0 35 KB
------------------------------------------------------------
Total: 95 KB
The following NEW packages will be INSTALLED:
fastparquet: 0.5.0-py36h6323ea4_1
libllvm10: 10.0.1-hbcb73fb_5
llvmlite: 0.34.0-py36h269e1b5_4
numba: 0.51.2-py36h0573a6f_1
packaging: 20.9-pyhd3eb1b0_0
pyparsing: 2.4.7-pyhd3eb1b0_0
thrift: 0.11.0-py36hf484d3e_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
pyparsing-2.4.7 | 59 KB | ##################################### | 100%
packaging-20.9 | 35 KB | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Observe that the import fails:
(cenv-py368) $ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fastparquet
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pauljohn/LinuxDownloads/miniconda3/envs/cenv-py368/lib/python3.6/site-packages/fastparquet/__init__.py", line 5, in <module>
from .core import read_thrift
File "/home/pauljohn/LinuxDownloads/miniconda3/envs/cenv-py368/lib/python3.6/site-packages/fastparquet/core.py", line 9, in <module>
from . import encoding
File "/home/pauljohn/LinuxDownloads/miniconda3/envs/cenv-py368/lib/python3.6/site-packages/fastparquet/encoding.py", line 13, in <module>
from .speedups import unpack_byte_array
File "fastparquet/speedups.pyx", line 1, in init fastparquet.speedups
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject
>>> AA
Do you agree I found a bug?
Seems like either Conda should work, or it should say there is no compatible version of fastparquet.
That error usually indicates that the installed NumPy is older than the version the importing library (here, fastparquet) was compiled against. Try updating Python to 3.7 or 3.8; Python 3.6 and NumPy 1.15 are no longer within the recommended versions (see the support table linked below). Updating Python to 3.7+ should also update NumPy, which a plain conda update ... usually does not do. Some recipes pin NumPy to >= some minimum version; this one apparently did not.
https://numpy.org/neps/nep-0029-deprecation_policy.html#support-table
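As a rough illustration of the check that a correctly pinned recipe would have encoded, the sketch below compares an installed NumPy version string against a minimum. The minimum used here ("1.17") is purely hypothetical; the thread does not say which NumPy the fastparquet 0.5.0 build was actually compiled against, and the helper names are made up for this example.

```python
def version_tuple(version):
    """Turn a string such as '1.15.4' into (1, 15, 4).

    Parsing stops at the first piece that does not start with a digit,
    and trailing non-digits in a piece are ignored ('0rc1' becomes 0).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Return True if `installed` is the same as or newer than `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


# ASSUMPTION: hypothetical minimum, not taken from the fastparquet recipe.
ASSUMED_MINIMUM_NUMPY = "1.17"

# In a live environment the first argument would be numpy.__version__.
print(is_at_least("1.15.4", ASSUMED_MINIMUM_NUMPY))  # False
print(is_at_least("1.19.2", ASSUMED_MINIMUM_NUMPY))  # True
```

Running a check like this before importing a compiled extension gives a readable message instead of the "numpy.ufunc size changed" ValueError, but it only helps if the real minimum is known.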
It is a flaw in the preparation of some Python libraries you are importing. When the authors of a package like fastparquet do not correctly set the minimum compatible version of numpy or python for their package, the Conda environment reconciliation has no way to know that the package is incorrect. Conda offers up the package as a solution, although in fact it is not.
In a larger sense, this is a flaw in the way Conda finds compatible packages. Perhaps it is working as intended, so it is not a bug. But it is a flaw, in the sense that when the user pegs numpy=1.15, then the correct answer from Conda should be "there is no compatible package". However, because Conda relies on the version dependencies of contributed packages, it is not able to do so.
I've not encountered the same problem with packaging for Red Hat or Debian Linux systems; they tend to report "nothing" rather than provide an inaccurate match.

Nvidia GeForce 210 compute issue on Ubuntu 18.04

I am using Ubuntu 18.04 (dual-booted with Windows).
nvidia-smi
This is the output I got when I ran the above command in my Ubuntu 18.04 terminal:
Fri Oct 9 09:33:56 2020
+------------------------------------------------------+
| NVIDIA-SMI 340.108 Driver Version: 340.108 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 210 Off | 0000:01:00.0 N/A | N/A |
| 35% 52C P8 N/A / N/A | 368MiB / 1023MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
Before that, I followed these steps to install required driver on my system:
sudo add-apt-repository --remove ppa:graphics-drivers/ppa
sudo apt-get purge nvidia*
sudo apt autoremove
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo shutdown -r now
When I tried to run the Geekbench 5 compute benchmark, it stopped while running Histogram Equalization. This is the output of ./geekbench5 --compute OpenCL, run from the folder where I extracted Geekbench 5:
[1009/092949:FATAL:src/halogen/cuda/cuda_library.cpp(1481)] Failed to load
cuDevicePrimaryCtxRetain: /usr/lib/x86_64-linux-gnu/libcuda.so.1: undefined symbol: cuDevicePrimaryCtxRetain
[1009/092949:FATAL:src/halogen/cuda/cuda_library.cpp(1481)] Failed to load cuDevicePrimaryCtxRetain: /usr/lib/x86_64-linux-gnu/libcuda.so.1: undefined symbol: cuDevicePrimaryCtxRetain
Geekbench 5.2.4 Tryout : https://www.geekbench.com/
Geekbench 5 is in tryout mode.
Geekbench 5 requires an active Internet connection when in tryout mode, and
automatically uploads test results to the Geekbench Browser. Other features
are unavailable in tryout mode.
Buy a Geekbench 5 license to enable offline use and remove the limitations of
tryout mode.
If you would like to purchase Geekbench you can do so online:
https://store.primatelabs.com/v5
If you have already purchased Geekbench, enter your email address and license
key from your email receipt with the following command line:
./geekbench5 -r <email address> <license key>
Running Gathering system information
System Information
Operating System Ubuntu 18.04.5 LTS 4.15.0-118-generic x86_64
Model To be filled by O.E.M. To be filled by O.E.M.
Motherboard O.E.M Intel H81
BIOS American Megatrends Inc. 4.6.5
Processor Information
Name Intel Core i5-4460
Topology 1 Processor, 4 Cores
Identifier GenuineIntel Family 6 Model 60 Stepping 3
Base Frequency 3.20 GHz
L1 Instruction Cache 32.0 KB x 2
L1 Data Cache 32.0 KB x 2
L2 Cache 256 KB x 2
L3 Cache 6.00 MB
Memory Information
Size 7.75 GB
OpenCL Information
Platform Vendor NVIDIA Corporation
Platform Name NVIDIA CUDA
Device Vendor NVIDIA Corporation
Device Name GeForce 210
Device Driver Version 340.108
Maximum Frequency 1.23 GHz
Compute Units 2
Device Memory 1024 MB
OpenCL
Running Sobel
Running Canny
Running Stereo Matching
Running Histogram Equalization
[1009/093329:ERROR:src/interface/console/consolemain.cpp(808)] Geekbench encountered an internal error and cannot continue. Please contact support@primatelabs.com for assistance.
Internal error message: clCreateImage returned -40.
Also, when I ran the Geekbench 5 compute benchmark on Windows 10 (same machine, using the GUI), it stalled at Histogram Equalization.
I have no idea why this is happening. Is anything wrong with my GPU, the driver, or something else? I searched online, reinstalled the driver, and rebooted the system, but the results are the same. Can someone please help?
Your driver installation is fine, but your GPU is 11 years old and does not support some of the more recent features of the OpenCL standard. The error code -40 returned by clCreateImage is CL_INVALID_IMAGE_SIZE, which means that the image size Geekbench uses for one of its benchmarks is not supported by your GPU. This causes the benchmark to crash. An older version of Geekbench might still work.
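To make the "-40" in the log easier to read, here is a small lookup sketch that maps a handful of OpenCL status codes to their symbolic names. The numeric values come from the standard OpenCL headers; the table is deliberately partial, and the function name is made up for this example.

```python
# Partial table of OpenCL status codes (values as defined in CL/cl.h).
OPENCL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -10: "CL_IMAGE_FORMAT_NOT_SUPPORTED",
    -30: "CL_INVALID_VALUE",
    -38: "CL_INVALID_MEM_OBJECT",
    -39: "CL_INVALID_IMAGE_FORMAT_DESCRIPTOR",
    -40: "CL_INVALID_IMAGE_SIZE",
}


def describe_cl_error(code):
    """Return the symbolic name for an OpenCL status code, if it is in the table."""
    return OPENCL_ERROR_NAMES.get(code, "unknown OpenCL error %d" % code)


# The code reported by Geekbench for clCreateImage:
print(describe_cl_error(-40))  # CL_INVALID_IMAGE_SIZE
```

CL_INVALID_IMAGE_SIZE is returned when the requested image dimensions exceed what the device supports, which fits the explanation that the benchmark asks for an image this GPU cannot create.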

kaggle api not showing everything

!kaggle competitions files -c planet-understanding-the-amazon-from-space
I ran the command above, but it does not show all the files. It only shows:
!kaggle competitions files -c planet-understanding-the-amazon-from-space
Warning: Looks like you're using an outdated API Version, please consider updating (server 1.5.6 /
client 1.5.4)
name size creationDate
------------------------------------------------- ----- -------------------
train_v2.csv/train_v2.csv 1MB 2019-12-15 22:14:13
sample_submission_v2.csv/sample_submission_v2.csv 3MB 2019-12-15 22:14:13
test_v2_file_mapping.csv/test_v2_file_mapping.csv 600KB 2019-12-15 22:14:13
Kaggle-planet-test-tif.torrent 2MB 2019-12-15 22:14:13
Kaggle-planet-train-tif.torrent 1MB 2019-12-15 22:14:13
It should show all the files listed on the competition's data page. As it is, I cannot download the train jpg tar file:
https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data
My colab: https://colab.research.google.com/drive/19hLo3NN_NY6tbXjEpWOI5y_bEVujikTF

Container is running beyond memory limits - RECEIVED SIGNAL 15: SIGTERM

I implemented model prediction in an Oozie workflow and got the error "Container is running beyond memory limits" on step 3, i.e. model1.predict_proba. table1 has 27 million records. It runs fine in a Jupyter notebook, but fails with this error in Oozie. Can someone please help?
d1 = sqlContext.sql("SELECT * FROM table1").toPandas()
xyz= d1.drop(['abc'], axis = 1)
modelprob = model1.predict_proba(xyz)[:,1]
Error (YARN logs):
Application application_1547693435775_8741566 failed 2 times due to AM Container for appattempt_1547693435775_8741566_000002 exited with exitCode: -104
For more detailed output, check application tracking page:https://xyz
Diagnostics: Container [pid=224941,containerID=container_e167_1547693435775_8741566_02_000002] is running beyond physical memory limits. Current usage: 121.2 GB of 121 GB physical memory used; 226.9 GB of 254.1 GB virtual memory used. Killing container.
2019-04-15 22:43:36,231 [dispatcher-event-loop-10] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_5_piece0 on xyz.corp.intranet:34252 in memory (size: 5.6 KB, free: 6.2 GB)
2019-04-15 22:43:36,231 [dispatcher-event-loop-35] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_5_piece0 on xyz1.corp.intranet:38363 in memory (size: 5.6 KB, free: 6.2 GB)
2019-04-15 22:43:36,242 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned accumulator 4
2019-04-15 22:43:36,245 [dispatcher-event-loop-51] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_2_piece0 on xyz3 in memory (size: 53.5 KB, free: 52.8 GB)
2019-04-15 22:43:36,245 [dispatcher-event-loop-51] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_2_piece0 on xyz4.corp.intranet:46309 in memory (size: 53.5 KB, free: 6.2 GB)
2019-04-15 22:43:36,248 [dispatcher-event-loop-9] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_2_piece0 on xyz5.corp.intranet:44850 in memory (size: 53.5 KB, free: 6.2 GB)
2019-04-15 22:45:48,103 [SIGTERM handler] INFO org.apache.spark.deploy.yarn.ApplicationMaster - Final app status: FAILED, exitCode: 16
2019-04-15 22:45:48,106 [SIGTERM handler] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - RECEIVED SIGNAL 15: SIGTERM
2019-04-15 22:45:48,124 [Thread-5] INFO org.apache.spark.SparkContext - Invoking stop() from shutdown hook
Below are the SparkConf parameters:
from pyspark import SparkConf, SparkContext
sconf = (
    SparkConf()
    .setAppName("xyz model")
    .set("spark.driver.memory", "8g")
    .set("spark.executor.memory", "12g")
    .set("spark.yarn.am.memory", "8g")
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.dynamicAllocation.minExecutors", "20")  # opening quote was missing
    .set("spark.dynamicAllocation.maxExecutors", "60")
    .set("spark.shuffle.service.enabled", "true")
    .set("spark.kryoserializer.buffer.max.mb", "2047")
    .set("spark.shuffle.blockTransferService", "nio")
    .set("spark.driver.maxResultSize", "4g")
    .set("spark.rpc.message.maxSize", "330")
    .setMaster("yarn-cluster")
)
sc = SparkContext(conf=sconf)
Below are the sparkopts parameters:
sparkopts=--executor-memory 115g --num-executors 60 --driver-memory 110g --executor-cores 16 --driver-cores 2 --conf "spark.dynamicAllocation.enabled=true" --conf "spark.kryoserializer.buffer.max=2047m" --conf "spark.driver.maxResultSize=4096m" --conf spark.yarn.executor.memoryOverhead=8000 --conf "spark.network.timeout=10000000" --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops -XX:PermSize=2048M -XX:MaxPermSize=2048M -XX:+UseG1GC" --conf "spark.broadcast.compress=true" --conf "spark.broadcast.blockSize=128m" --conf "spark.serializer.objectStreamReset=2" --conf spark.executorEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python --files ${xyz}/hive-site.xml --files ${xyz}/yarn-site.xml
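The 121 GB limit in the YARN diagnostic is consistent with --driver-memory 110g plus Spark's default memory overhead on YARN, which is the larger of 384 MiB and 10% of the requested memory. The sketch below works through that arithmetic. It assumes yarn-cluster mode (so the driver runs inside the AM container) and that no explicit driver overhead is set, which matches the sparkopts shown; the function and constant names are made up for this example.

```python
MIN_OVERHEAD_MIB = 384   # Spark's floor for the default overhead on YARN
OVERHEAD_PERCENT = 10    # default overhead as a percentage of requested memory


def yarn_container_mib(heap_mib, overhead_mib=None):
    """Estimate the container size YARN enforces for a given JVM heap.

    If no explicit overhead is supplied, use the default rule:
    max(384 MiB, 10% of the heap).
    """
    if overhead_mib is None:
        overhead_mib = max(MIN_OVERHEAD_MIB, heap_mib * OVERHEAD_PERCENT // 100)
    return heap_mib + overhead_mib


# Driver / AM container: --driver-memory 110g, default overhead.
driver_container = yarn_container_mib(110 * 1024)
print(driver_container / 1024)    # 121.0 (GiB), the limit in the diagnostic

# Executor container: --executor-memory 115g with an explicit 8000 MiB overhead.
executor_container = yarn_container_mib(115 * 1024, overhead_mib=8000)
print(executor_container / 1024)  # 122.8125 (GiB)
```

Since the container that was killed is the AM container and its limit matches the driver's size, the memory is most likely being consumed on the driver, which is where toPandas() collects the entire table before predict_proba runs.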

hashcat benchmark not starting at all

I'm trying to run hashcat on Windows 10 using my AMD RX 580 GPU.
What I have done so far:
Uninstalled all AMD drivers and rebooted
Ran Driver Fusion, cleaned out all AMD display drivers, and rebooted
Manually removed OpenCL.dll from C:\windows\system32\ and c:\windows\syswow64 and rebooted
Installed the AMD Crimson edition drivers; tried 17.8.2, 17.11.1, and 17.11.4
My problem:
When I try to start a benchmark for a WPA2 hash with .\hashcat64.exe -m 2500 -b, it starts and then quits without printing any result:
PS D:\crack\hashcat-5.1.0> .\hashcat64.exe -m 2500 -b
hashcat (v5.1.0) starting in benchmark mode...
Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.
* Device #2: Not a native Intel OpenCL runtime. Expect massive speed loss.
You can use --force to override, but do not report related errors.
OpenCL Platform #1: Advanced Micro Devices, Inc.
================================================
* Device #1: Ellesmere, 3264/4096 MB allocatable, 36MCU
* Device #2: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, skipped.
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4096)
PS D:\crack\hashcat-5.1.0>
Output from hashcat -I:
PS D:\crack\hashcat-5.1.0> .\hashcat64.exe -I
hashcat (v5.1.0) starting...
OpenCL Info:
Platform ID #1
Vendor : Advanced Micro Devices, Inc.
Name : AMD Accelerated Parallel Processing
Version : OpenCL 2.0 AMD-APP (2442.8)
Device ID #1
Type : GPU
Vendor ID : 1
Vendor : Advanced Micro Devices, Inc.
Name : Ellesmere
Version : OpenCL 2.0 AMD-APP (2442.8)
Processor(s) : 36
Clock : 1366
Memory : 3264/4096 MB allocatable
OpenCL Version : OpenCL C 2.0
Driver Version : 2442.8
Device ID #2
Type : CPU
Vendor ID : 128
Vendor : GenuineIntel
Name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Version : OpenCL 1.2 AMD-APP (2442.8)
Processor(s) : 8
Clock : 3599
Memory : 6131/24526 MB allocatable
OpenCL Version : OpenCL C 1.2
Driver Version : 2442.8 (sse2,avx)
PS D:\crack\hashcat-5.1.0>
The same happens if I try to start a mask attack.
Other benchmarks seem to work, but when it reaches WPA2 it just returns me to the command prompt:
PS D:\crack\hashcat-5.1.0> .\hashcat64.exe -b
hashcat (v5.1.0) starting in benchmark mode...
Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.
* Device #2: Not a native Intel OpenCL runtime. Expect massive speed loss.
You can use --force to override, but do not report related errors.
OpenCL Platform #1: Advanced Micro Devices, Inc.
================================================
* Device #1: Ellesmere, 3264/4096 MB allocatable, 36MCU
* Device #2: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, skipped.
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 0 - MD5
Speed.#1.........: 12381.1 MH/s (96.87ms) @ Accel:256 Loops:512 Thr:256 Vec:1
Hashmode: 100 - SHA1
Speed.#1.........: 4268.9 MH/s (70.02ms) @ Accel:256 Loops:128 Thr:256 Vec:1
Hashmode: 1400 - SHA2-256
Speed.#1.........: 1870.1 MH/s (80.00ms) @ Accel:256 Loops:64 Thr:256 Vec:1
Hashmode: 1700 - SHA2-512
Speed.#1.........: 461.1 MH/s (81.22ms) @ Accel:128 Loops:32 Thr:256 Vec:1
Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4096)
PS D:\crack\hashcat-5.1.0>
Any idea what could cause this behavior?
It looks like it was a crash in the OpenCL runtime. I installed the newest Adrenalin drivers (19.3.2), but then ran into another issue:
PS D:\crack\hashcat-5.1.0> .\hashcat64.exe -b -m 2500
hashcat (v5.1.0) starting in benchmark mode...
Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.
OpenCL Platform #1: Advanced Micro Devices, Inc.
================================================
* Device #1: Ellesmere, 3264/4096 MB allocatable, 36MCU
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4096)
* Device #1: ATTENTION! OpenCL kernel self-test failed.
Your device driver installation is probably broken.
See also: https://hashcat.net/faq/wrongdriver
Speed.#1.........: 179.3 kH/s (87.14ms) @ Accel:128 Loops:64 Thr:256 Vec:1
Started: Mon Mar 18 19:05:55 2019
Stopped: Mon Mar 18 19:06:26 2019
PS D:\crack\hashcat-5.1.0>
Then I used the --self-test-disable option and it started. I have also created a post in the hashcat forum, in case anyone is interested: https://hashcat.net/forum/thread-8226-post-44141.html#pid44141