Newbie here. I have a large finite analysis code that needs to run on high-performance computing clusters. People keep telling me that the Intel compiler usually gives better speed (I used gcc before), and I found that to be true on our Intel clusters. But we recently got a new AMD cluster, and I am confused about which icpc compiler options to use to optimize the program.
Basically, I have two questions:
Question 1
Here is the cluster with AMD chips:
processor : 63
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD Opteron(tm) Processor 6378
stepping : 0
cpu MHz : 2399.837
cache size : 2048 KB
physical id : 2
siblings : 16
core id : 7
cpu cores : 8
apicid : 79
initial apicid : 79
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr tbm topoext perfctr_core cpb npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bogomips : 4799.73
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate [9] [10]
I do not know exactly which options I should use. When I compile a small program with icpc hello.cpp -O3 -xP and run it, I get this error:
$ /usr/bin/time -p ./a.out
Fatal Error: This program was not built to run on the processor in your system.
The allowed processors are: Intel(R) Pentium(R) 4 and compatible Intel processors with Intel(R) Streaming SIMD Extensions 3 (Intel(R) SSE3) instruction support.
real 0.00
user 0.00
sys 0.00
Question 2
If I want the binaries to run on both the Intel cluster and the AMD cluster, should I use different options to compile the code?
Intel compilers don't always work with AMD chips, especially with certain flags like -xP (now -xSSE3). Specifically, the documentation says that -xSSE3/-xP tells the compiler: "May generate Intel® SSE3, SSE2, and SSE instructions for Intel® processors. Optimizes for Intel processors that support Intel® SSE3 instructions. For OS X* systems, this value is only supported on IA-32 architecture. This replaces value P, which is deprecated."
That document also has this quote: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
You can try to optimize with icc and icpc, but I'm not sure it will work on AMD chips. For compilers other than gcc you could look at clang, PGI, or Cray compilers (if you are on a Cray system).
If you're trying to create binaries for both architectures, I'm not sure you'll be able to do heavy optimizations due to differences in cache line size and other architecture specific settings.
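One way to approach Question 2 is to compare the flags lines from /proc/cpuinfo on both clusters and target only the SIMD extensions they share. The following is a minimal sketch rather than a definitive recipe: the helper names are hypothetical, the list of SIMD flags is a subset chosen for illustration, and the Intel sample line is made up (only the AMD flags are taken from the output in the question).

```python
# Hypothetical helpers: find the SIMD extensions two CPUs have in common,
# given the text of /proc/cpuinfo from each machine.

# Flags of interest, oldest to newest. In Linux's naming, "pni" is SSE3.
SIMD_FLAGS = ["sse", "sse2", "pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2"]


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of the text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def common_simd(cpuinfo_a, cpuinfo_b):
    """List the SIMD flags (in SIMD_FLAGS order) present on both CPUs."""
    shared = cpu_flags(cpuinfo_a) & cpu_flags(cpuinfo_b)
    return [flag for flag in SIMD_FLAGS if flag in shared]


# Abbreviated samples. The AMD flags come from the Opteron 6378 output above;
# the Intel line is an invented example, not taken from a real machine.
amd = "vendor_id : AuthenticAMD\nflags : fpu sse sse2 pni ssse3 sse4_1 sse4_2 avx fma4 xop\n"
intel = "vendor_id : GenuineIntel\nflags : fpu sse sse2 pni ssse3 sse4_1 sse4_2\n"

print(common_simd(amd, intel))
```

On a real system the two inputs would be the contents of /proc/cpuinfo read on a compute node of each cluster. The result only indicates which instruction sets both CPUs report; whether a given compiler option produces code that runs on both still has to be tested.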
I compile a macOS DriverKit system extension as a Universal library so that it contains both x86_64 and arm64. On Apple Silicon computer A the driver starts when I attach the USB device. On Apple Silicon computer B I can see kernel: exec_mach_imgact: disallowing arm64 platform driverkit binary "com.example.driver", should be arm64e being printed in Console.app when the USB device is attached. I've looked at the source code where this happens, but I cannot figure out what the problem is.
If I compile it for arm64e, then I get exec_mach_imgact: not running binary "com.example.driver" built against preview arm64e on computer A, but it starts on computer B.
Neither computer has -arm64e_preview_abi set in the boot-args.
If I create a new Xcode (12.4) project on each machine and build Release, then on computer A otool -fvv com.example.driver gives
Fat headers
fat_magic FAT_MAGIC
nfat_arch 2
architecture x86_64
cputype CPU_TYPE_X86_64
cpusubtype CPU_SUBTYPE_X86_64_ALL
capabilities 0x0
offset 16384
size 73856
align 2^14 (16384)
architecture arm64
cputype CPU_TYPE_ARM64
cpusubtype CPU_SUBTYPE_ARM64_ALL
capabilities 0x0
offset 98304
size 73856
align 2^14 (16384)
On computer B the same command gives
Fat headers
fat_magic FAT_MAGIC
nfat_arch 2
architecture x86_64
cputype CPU_TYPE_X86_64
cpusubtype CPU_SUBTYPE_X86_64_ALL
capabilities 0x0
offset 16384
size 73280
align 2^14 (16384)
architecture arm64
cputype CPU_TYPE_ARM64
cpusubtype CPU_SUBTYPE_ARM64_ALL
capabilities 0x0
offset 98304
size 73296
align 2^14 (16384)
How can I make the driver start on both machines?
Dexts should indeed be arm64 and x86_64 (but as pmdj explains, system binaries are still arm64e.)
As hinted by the name of (and need for) the -arm64e_preview_abi boot-arg, arm64e is currently only exposed as a developer preview, to allow for testing.
However, you shouldn't get the disallowing arm64 error: did you set other interesting boot-args on computer B? (in particular, amfi= may be relevant)
My experience so far indicates that arm64e is the correct and only correct Apple Silicon architecture to use for dexts.
For one, there's the "disallowing arm64 platform" error, and also Apple's own DriverKit based drivers are built for arm64e:
% otool -fvv /System/Library/DriverExtensions/com.apple.AppleUserHIDDrivers.dext/com.apple.AppleUserHIDDrivers
Fat headers
fat_magic FAT_MAGIC
nfat_arch 2
architecture x86_64
cputype CPU_TYPE_X86_64
cpusubtype CPU_SUBTYPE_X86_64_ALL
capabilities 0x0
offset 16384
size 96208
align 2^14 (16384)
architecture arm64e
cputype CPU_TYPE_ARM64
cpusubtype CPU_SUBTYPE_ARM64E
capabilities CPU_SUBTYPE_ARM64E_PTRAUTH_VERSION 0
offset 114688
size 95312
align 2^14 (16384)
That leaves the question of why your arm64e build isn't working. The "built against preview arm64e" error suggests the problem isn't with the computer but the binary. Are you using identical binaries on the 2 systems? Perhaps one has SIP disabled, so it's more permissive of badly built binaries?
Have you tried a "hello world" style dext, in a freshly created project on the latest version of Xcode? Check that runs natively on both machines. Once that's working, compare Xcode's compiler and linker command lines with those from your build script - or if you're also using Xcode, compare your target's build settings with the "clean" one.
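When comparing binaries across machines, it can help to reduce each otool -fvv dump to just its architecture slices. The following is a minimal sketch under the assumption that the output has the "architecture" / "cpusubtype" line layout shown in this thread; the function name is hypothetical.

```python
def fat_architectures(otool_output):
    """Return (architecture, cpusubtype) pairs from `otool -fvv` text."""
    archs = []
    current = None
    for line in otool_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "architecture":
            # Start of a new slice; remember its name until we see the subtype.
            current = fields[1]
        elif fields[0] == "cpusubtype" and current is not None:
            archs.append((current, fields[1]))
            current = None
    return archs


# Sample shortened from the AppleUserHIDDrivers output quoted above.
sample = """Fat headers
fat_magic FAT_MAGIC
nfat_arch 2
architecture x86_64
cputype CPU_TYPE_X86_64
cpusubtype CPU_SUBTYPE_X86_64_ALL
architecture arm64e
cputype CPU_TYPE_ARM64
cpusubtype CPU_SUBTYPE_ARM64E
"""

print(fat_architectures(sample))
```

Running this on the output from each machine gives two short lists that are easy to compare for an arm64 versus arm64e mismatch.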
I was trying to retrain an object detection model for the Google Coral accelerator, following the link below:
https://coral.ai/docs/edgetpu/retrain-detection/#prerequisites
The host system runs Linux Mint with a Docker environment:
CPU : Intel(R) Core(TM) i3-5005U CPU @ 2.00GHz
Graphics: Card: Intel HD Graphics 5500
OS : Linux Mint 19 Tara
Memory Size: 8G
But after starting the training job
root@beaa5d65a1d5:/tensorflow/models/research# ./retrain_detection_model.sh --num_training_steps ${NUM_TRAINING_STEPS} --num_eval_steps ${NUM_EVAL_STEPS}
the process is killed by the OOM killer:
./retrain_detection_model.sh: line 45: 86 Killed
python object_detection/model_main.py
--pipeline_config_path="${CKPT_DIR}/pipeline.config" --model_dir="${TRAIN_DIR}" --num_train_steps="${num_training_steps}" --num_eval_steps="${num_eval_steps}"
Any help is appreciated!
This is an out-of-memory issue caused by a hardware limitation. There are two things you can do: add more RAM, or add swap space (which uses storage as RAM). Going with the latter will be very slow, though.
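To see how much RAM and swap the host offers before starting the job, one option is to read /proc/meminfo. The following is a minimal sketch: the function names are hypothetical and the sample values are invented for illustration, not measured on the machine in the question.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def ram_plus_swap_gib(meminfo):
    """Total RAM plus swap in GiB (1 GiB = 1024 * 1024 kB)."""
    total_kb = meminfo.get("MemTotal", 0) + meminfo.get("SwapTotal", 0)
    return total_kb / (1024 * 1024)


# Invented sample: 8 GiB of RAM and 4 GiB of swap.
sample = (
    "MemTotal:        8388608 kB\n"
    "MemAvailable:    2097152 kB\n"
    "SwapTotal:       4194304 kB\n"
)

info = parse_meminfo(sample)
print(ram_plus_swap_gib(info))
```

On a real host the input would be the contents of /proc/meminfo; comparing the total before and after adding swap shows how much headroom the training process gained.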
I have an arm board and would like to run mono on it...
~ # cat /proc/cpuinfo
Processor : ARM926EJ-S rev 5 (v5l)
BogoMIPS : 131.48
Features : swp half fastmult edsp java
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant : 0x0
CPU part : 0x926
CPU revision : 5
My cross compiler is gcc 3.3.4 with uClibc.
Does anyone have any idea whether this is even enough to cross-compile mono? It is an ARMv5, so I am not sure.
Has anyone succeeded in building mono for a similar configuration? Can you please give me configuration instructions for cross compilation?
Thank you!
I usually don't use OS X because I run a Linux partition. I've needed to access Xcode, so I've been using it again. It just shuts off all of a sudden and reboots. It seems to happen more often with my printer plugged into USB. I'm also running Mint and Blag on this machine, and I have never had any issues on those OSs, even with the printer plugged in. Can anyone make heads or tails of this stack?
Interval Since Last Panic Report: 17556 sec
Panics Since Last Report: 2
Anonymous UUID: 6DA1700F-1927-B6CB-7922-15F95FA2CC21
Wed Sep 11 18:42:32 2013
panic(cpu 0 caller 0xffffff80292ac90b): Releasing non-exclusive RW lock without a reader refcount!
Backtrace (CPU 0), Frame : Return Address
0xffffff8119993820 : 0xffffff802921d626
0xffffff8119993890 : 0xffffff80292ac90b
0xffffff81199938b0 : 0xffffff802931c09c
0xffffff8119993ae0 : 0xffffff8029308fab
0xffffff8119993b90 : 0xffffff80292fbb49
0xffffff8119993c40 : 0xffffff80292fc314
0xffffff8119993f50 : 0xffffff80295e182a
0xffffff8119993fb0 : 0xffffff80292ced33
BSD process name corresponding to current thread: installd
Mac OS version:
12C60
Kernel version:
Darwin Kernel Version 12.2.0: Sat Aug 25 00:48:52 PDT 2012; root:xnu-2050.18.24~1/RELEASE_X86_64
Kernel UUID: 69A5853F-375A-3EF4-9247-478FD0247333
Kernel slide: 0x0000000029000000
Kernel text base: 0xffffff8029200000
System model name: Macmini5,1 (Mac-8ED6AF5B48C039E1)
System uptime in nanoseconds: 508482133713
last loaded kext at 31448390032: com.apple.driver.AppleHWSensor 1.9.5d0 (addr 0xffffff7faaa92000, size 36864)
last unloaded kext at 102515839551: com.apple.driver.AppleUSBUHCI 5.2.5 (addr 0xffffff7fa9b17000, size 65536)
loaded kexts:
com.apple.driver.AppleHWSensor 1.9.5d0
com.apple.driver.AudioAUUC 1.60
com.apple.iokit.IOBluetoothSerialManager 4.0.9f33
com.apple.driver.ApplePlatformEnabler 2.0.5d4
com.apple.driver.AGPM 100.12.69
com.apple.driver.AppleMikeyHIDDriver 122
com.apple.filesystems.autofs 3.0
com.apple.driver.AppleHDA 2.3.1f2
com.apple.driver.AppleMikeyDriver 2.3.1f2
com.apple.iokit.BroadcomBluetoothHCIControllerUSBTransport 4.0.9f33
com.apple.driver.AppleUpstreamUserClient 3.5.10
com.apple.driver.AppleMCCSControl 1.0.33
com.apple.driver.AppleSMCPDRC 1.0.0
com.apple.iokit.IOUserEthernet 1.0.0d1
com.apple.Dont_Steal_Mac_OS_X 7.0.0
com.apple.driver.ApplePolicyControl 3.2.11
com.apple.driver.ACPI_SMC_PlatformPlugin 1.0.0
com.apple.driver.AppleLPC 1.6.0
com.apple.driver.AppleIntelHD3000Graphics 8.0.0
com.apple.driver.AppleIntelSNBGraphicsFB 8.0.0
com.apple.driver.AppleIRController 320.15
com.apple.AppleFSCompression.AppleFSCompressionTypeDataless 1.0.0d1
com.apple.AppleFSCompression.AppleFSCompressionTypeZlib 1.0.0d1
com.apple.BootCache 34
com.apple.driver.XsanFilter 404
com.apple.iokit.IOAHCIBlockStorage 2.2.2
com.apple.driver.AppleUSBHub 5.2.5
com.apple.driver.AppleFWOHCI 4.9.6
com.apple.driver.AirPort.Brcm4331 602.15.22
com.apple.iokit.AppleBCM5701Ethernet 3.2.5b3
com.apple.driver.AppleUSBEHCI 5.4.0
com.apple.driver.AppleAHCIPort 2.4.1
com.apple.driver.AppleSDXC 1.2.2
com.apple.driver.AppleEFINVRAM 1.6.1
com.apple.driver.AppleACPIButtons 1.6
com.apple.driver.AppleRTC 1.5
com.apple.driver.AppleHPET 1.7
com.apple.driver.AppleSMBIOS 1.9
com.apple.driver.AppleACPIEC 1.6
com.apple.driver.AppleAPIC 1.6
com.apple.driver.AppleIntelCPUPowerManagementClient 196.0.0
com.apple.nke.applicationfirewall 4.0.39
com.apple.security.quarantine 2
com.apple.driver.AppleIntelCPUPowerManagement 196.0.0
com.apple.iokit.IOSerialFamily 10.0.6
com.apple.kext.triggers 1.0
com.apple.driver.DspFuncLib 2.3.1f2
com.apple.iokit.IOAudioFamily 1.8.9fc10
com.apple.kext.OSvKernDSPLib 1.6
com.apple.iokit.IOFireWireIP 2.2.5
com.apple.iokit.AppleBluetoothHCIControllerUSBTransport 4.0.9f33
com.apple.driver.AppleUSBMergeNub 5.2.5
com.apple.driver.AppleSMBusController 1.0.10d0
com.apple.iokit.IOSurface 86.0.3
com.apple.iokit.IOBluetoothFamily 4.0.9f33
com.apple.driver.AppleGraphicsControl 3.2.11
com.apple.driver.IOPlatformPluginLegacy 1.0.0
com.apple.driver.IOPlatformPluginFamily 5.2.0d16
com.apple.driver.AppleSMC 3.1.4d2
com.apple.driver.AppleSMBusPCI 1.0.10d0
com.apple.iokit.IONDRVSupport 2.3.5
com.apple.driver.AppleHDAController 2.3.1f2
com.apple.iokit.IOHDAFamily 2.3.1f2
com.apple.iokit.IOGraphicsFamily 2.3.5
com.apple.iokit.IOSCSIArchitectureModelFamily 3.5.1
com.apple.driver.AppleThunderboltDPOutAdapter 1.8.5
com.apple.driver.AppleThunderboltDPInAdapter 1.8.5
com.apple.driver.AppleThunderboltDPAdapterFamily 1.8.5
com.apple.driver.AppleThunderboltPCIDownAdapter 1.2.5
com.apple.iokit.IOUSBHIDDriver 5.2.5
com.apple.driver.AppleUSBComposite 5.2.5
com.apple.iokit.IOUSBUserClient 5.2.5
com.apple.driver.AppleThunderboltNHI 1.6.0
com.apple.iokit.IOThunderboltFamily 2.1.1
com.apple.iokit.IOFireWireFamily 4.5.5
com.apple.iokit.IO80211Family 500.15
com.apple.iokit.IOEthernetAVBController 1.0.2b1
com.apple.iokit.IONetworkingFamily 3.0
com.apple.iokit.IOAHCIFamily 2.2.1
com.apple.iokit.IOUSBFamily 5.4.0
com.apple.driver.AppleEFIRuntime 1.6.1
com.apple.iokit.IOHIDFamily 1.8.0
com.apple.iokit.IOSMBusFamily 1.1
com.apple.security.sandbox 220
com.apple.kext.AppleMatch 1.0.0d1
com.apple.security.TMSafetyNet 7
com.apple.driver.DiskImages 344
com.apple.iokit.IOStorageFamily 1.8
com.apple.driver.AppleKeyStore 28.21
com.apple.driver.AppleACPIPlatform 1.6
com.apple.iokit.IOPCIFamily 2.7.2
com.apple.iokit.IOACPIFamily 1.4
com.apple.kec.corecrypto 1.0
Model: Macmini5,1, BootROM MM51.0077.B10, 2 processors, Intel Core i5, 2.3 GHz, 8 GB, SMC 1.76f0
Graphics: Intel HD Graphics 3000, Intel HD Graphics 3000, Built-In, 512 MB
Memory Module: BANK 0/DIMM0, 4 GB, DDR3, 1333 MHz, 0x802C, 0x31364A54463531323634485A2D3147344D31
Memory Module: BANK 1/DIMM0, 4 GB, DDR3, 1333 MHz, 0x802C, 0x31364A54463531323634485A2D3147344D31
AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0xE4), Broadcom BCM43xx 1.0 (5.106.98.81.22)
Bluetooth: Version 4.0.9f33 10885, 2 service, 18 devices, 1 incoming serial ports
Network Service: Wi-Fi, AirPort, en1
Serial ATA Device: APPLE HDD HTS547550A9E384, 500.11 GB
USB Device: hub_device, 0x0424 (SMSC), 0x2513, 0xfd100000 / 2
USB Device: USB Receiver, 0x046d (Logitech Inc.), 0xc52b, 0xfd130000 / 4
USB Device: IR Receiver, apple_vendor_id, 0x8242, 0xfd110000 / 3
USB Device: hub_device, 0x0424 (SMSC), 0x2513, 0xfa100000 / 2
USB Device: HP LaserJet 3050, 0x03f0 (Hewlett Packard), 0x3217, 0xfa130000 / 4
USB Device: BRCM20702 Hub, 0x0a5c (Broadcom Corp.), 0x4500, 0xfa110000 / 3
USB Device: Bluetooth USB Host Controller, apple_vendor_id, 0x8281, 0xfa113000 / 7
I updated to 10.8.4 and the problem was solved. This solution came from Apple Support; apparently 10.8.2 commonly has this issue. Although I dislike forced updates, this solution worked.
I am trying to start the WP8 emulator, but it always fails with a generic error.
I used coreinfo to show the system info:
Intel(R) Core(TM) i3 CPU 530 @ 2.93GHz
Intel64 Family 6 Model 37 Stepping 2, GenuineIntel
HTT - Hyperthreading enabled
HYPERVISOR * Hypervisor is present
VMX - Supports Intel hardware-assisted virtualization
SVM - Supports AMD hardware-assisted virtualization
EM64T * Supports 64-bit mode
SMX - Supports Intel trusted execution
SKINIT - Supports AMD SKINIT
NX * Supports no-execute page protection
SMEP - Supports Supervisor Mode Execution Prevention
SMAP - Supports Supervisor Mode Access Prevention
PAGE1GB - Supports 1 GB large pages
PAE * Supports > 32-bit physical addresses
PAT * Supports Page Attribute Table
PSE * Supports 4 MB pages
PSE36 * Supports > 32-bit address 4 MB pages
PGE * Supports global bit in page tables
SS * Supports bus snooping for cache operations
VME * Supports Virtual-8086 mode
RDWRFSGSBASE - Supports direct GS/FS base access
FPU * Implements i387 floating point instructions
MMX * Supports MMX instruction set
MMXEXT - Implements AMD MMX extensions
3DNOW - Supports 3DNow! instructions
3DNOWEXT - Supports 3DNow! extension instructions
SSE * Supports Streaming SIMD Extensions
SSE2 * Supports Streaming SIMD Extensions 2
SSE3 * Supports Streaming SIMD Extensions 3
SSSE3 * Supports Supplemental SIMD Extensions 3
SSE4.1 * Supports Streaming SIMD Extensions 4.1
SSE4.2 * Supports Streaming SIMD Extensions 4.2
AES - Supports AES extensions
AVX - Supports AVX intruction extensions
FMA - Supports FMA extensions using YMM state
MSR * Implements RDMSR/WRMSR instructions
MTRR * Supports Memory Type Range Registers
XSAVE - Supports XSAVE/XRSTOR instructions
OSXSAVE - Supports XSETBV/XGETBV instructions
RDRAND - Supports RDRAND instruction
RDSEED - Supports RDSEED instruction
CMOV * Supports CMOVcc instruction
CLFSH * Supports CLFLUSH instruction
CX8 * Supports compare and exchange 8-byte instructions
CX16 * Supports CMPXCHG16B instruction
BMI1 - Supports bit manipulation extensions 1
BMI2 - Supports bit maniuplation extensions 2
ADX - Supports ADCX/ADOX instructions
DCA - Supports prefetch from memory-mapped device
F16C - Supports half-precision instruction
FXSR * Supports FXSAVE/FXSTOR instructions
FFXSR - Supports optimized FXSAVE/FSRSTOR instruction
MONITOR - Supports MONITOR and MWAIT instructions
MOVBE - Supports MOVBE instruction
ERMSB - Supports Enhanced REP MOVSB/STOSB
PCLULDQ - Supports PCLMULDQ instruction
POPCNT * Supports POPCNT instruction
SEP * Supports fast system call instructions
LAHF-SAHF * Supports LAHF/SAHF instructions in 64-bit mode
HLE - Supports Hardware Lock Elision instructions
RTM - Supports Restricted Transactional Memory instructions
DE * Supports I/O breakpoints including CR4.DE
DTES64 - Can write history of 64-bit branch addresses
DS * Implements memory-resident debug buffer
DS-CPL - Supports Debug Store feature with CPL
PCID - Supports PCIDs and settable CR4.PCIDE
INVPCID - Supports INVPCID instruction
PDCM - Supports Performance Capabilities MSR
RDTSCP * Supports RDTSCP instruction
TSC * Supports RDTSC instruction
TSC-DEADLINE - Local APIC supports one-shot deadline timer
TSC-INVARIANT * TSC runs at constant rate
xTPR - Supports disabling task priority messages
EIST - Supports Enhanced Intel Speedstep
ACPI - Implements MSR for power management
TM - Implements thermal monitor circuitry
TM2 - Implements Thermal Monitor 2 control
APIC * Implements software-accessible local APIC
x2APIC * Supports x2APIC
CNXT-ID - L1 data cache mode adaptive or BIOS
MCE * Supports Machine Check, INT18 and CR4.MCE
MCA * Implements Machine Check Architecture
PBE - Supports use of FERR#/PBE# pin
PSN - Implements 96-bit processor serial number
PREFETCHW * Supports PREFETCHW instruction
Logical to Physical Processor Map:
* Physical Processor 0
Logical Processor to Socket Map:
* Socket 0
Logical Processor to NUMA Node Map:
* NUMA Node 0
Logical Processor to Cache Map:
* Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
* Instruction Cache 0, Level 1, 32 KB, Assoc 4, LineSize 64
* Unified Cache 0, Level 2, 256 KB, Assoc 8, LineSize 64
* Unified Cache 1, Level 3, 4 MB, Assoc 16, LineSize 64
Logical Processor to Group Map:
* Group 0
Which hardware do I need to change?
Any comments are welcome.
Run the following to check whether EPT (Intel extended page tables, also called SLAT) is available:
c:> coreinfo -v
This flag will show only the virtualization features of your CPU.
For example, my output:
Coreinfo v3.2 - Dump information on system CPU and memory topology
Copyright (C) 2008-2012 Mark Russinovich
Sysinternals - www.sysinternals.com
Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz
Intel64 Family 6 Model 23 Stepping 10, GenuineIntel
HYPERVISOR - Hypervisor is present
VMX * Supports Intel hardware-assisted virtualization
EPT - Supports Intel extended page tables (SLAT)
(An asterisk means your CPU has the feature; a dash means it does not.)
In this case, my machine does NOT support the ability to run the WP8 emulator because I do not have EPT (SLAT) capability. You generally need an Intel i series processor.
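The check described above can also be automated by parsing the coreinfo text. The following is a minimal sketch, assuming the "NAME * description" / "NAME - description" layout shown in this thread; the function name is hypothetical, and banner lines that happen to contain a lone dash as their second token would also be picked up as absent "features".

```python
def coreinfo_features(output):
    """Map feature name -> True ('*', present) or False ('-', absent)."""
    features = {}
    for line in output.splitlines():
        fields = line.split()
        # Feature lines look like: NAME <* or -> description...
        if len(fields) >= 2 and fields[1] in ("*", "-"):
            features[fields[0]] = fields[1] == "*"
    return features


# Sample taken from the virtualization lines quoted above.
sample = """HYPERVISOR - Hypervisor is present
VMX * Supports Intel hardware-assisted virtualization
EPT - Supports Intel extended page tables (SLAT)
"""

features = coreinfo_features(sample)
print("EPT available:", features.get("EPT", False))
```

A missing EPT line is treated the same as a dash here, which matches the conservative reading that SLAT cannot be assumed unless coreinfo reports it with an asterisk.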
You may have other virtualization software, such as VMware, installed: the output shows that a hypervisor is present.
Your CPU (Intel64 Family 6 Model 37 Stepping 2) is a 5300 aka Clovertown aka Core-Based-Xeon CPU.
Win8 Hyper-V needs SLAT (Second Level Address Translation) or, in Intel terms, EPT (Extended Page Tables) support, which is only available on Nehalem-based Xeon (or newer) CPUs.