I am trying to start the WP8 emulator, but it always fails with a generic error.
I used coreinfo to show the system info:
Intel(R) Core(TM) i3 CPU 530 @ 2.93GHz
Intel64 Family 6 Model 37 Stepping 2, GenuineIntel
HTT - Hyperthreading enabled
HYPERVISOR * Hypervisor is present
VMX - Supports Intel hardware-assisted virtualization
SVM - Supports AMD hardware-assisted virtualization
EM64T * Supports 64-bit mode
SMX - Supports Intel trusted execution
SKINIT - Supports AMD SKINIT
NX * Supports no-execute page protection
SMEP - Supports Supervisor Mode Execution Prevention
SMAP - Supports Supervisor Mode Access Prevention
PAGE1GB - Supports 1 GB large pages
PAE * Supports > 32-bit physical addresses
PAT * Supports Page Attribute Table
PSE * Supports 4 MB pages
PSE36 * Supports > 32-bit address 4 MB pages
PGE * Supports global bit in page tables
SS * Supports bus snooping for cache operations
VME * Supports Virtual-8086 mode
RDWRFSGSBASE - Supports direct GS/FS base access
FPU * Implements i387 floating point instructions
MMX * Supports MMX instruction set
MMXEXT - Implements AMD MMX extensions
3DNOW - Supports 3DNow! instructions
3DNOWEXT - Supports 3DNow! extension instructions
SSE * Supports Streaming SIMD Extensions
SSE2 * Supports Streaming SIMD Extensions 2
SSE3 * Supports Streaming SIMD Extensions 3
SSSE3 * Supports Supplemental SIMD Extensions 3
SSE4.1 * Supports Streaming SIMD Extensions 4.1
SSE4.2 * Supports Streaming SIMD Extensions 4.2
AES - Supports AES extensions
AVX - Supports AVX intruction extensions
FMA - Supports FMA extensions using YMM state
MSR * Implements RDMSR/WRMSR instructions
MTRR * Supports Memory Type Range Registers
XSAVE - Supports XSAVE/XRSTOR instructions
OSXSAVE - Supports XSETBV/XGETBV instructions
RDRAND - Supports RDRAND instruction
RDSEED - Supports RDSEED instruction
CMOV * Supports CMOVcc instruction
CLFSH * Supports CLFLUSH instruction
CX8 * Supports compare and exchange 8-byte instructions
CX16 * Supports CMPXCHG16B instruction
BMI1 - Supports bit manipulation extensions 1
BMI2 - Supports bit maniuplation extensions 2
ADX - Supports ADCX/ADOX instructions
DCA - Supports prefetch from memory-mapped device
F16C - Supports half-precision instruction
FXSR * Supports FXSAVE/FXSTOR instructions
FFXSR - Supports optimized FXSAVE/FSRSTOR instruction
MONITOR - Supports MONITOR and MWAIT instructions
MOVBE - Supports MOVBE instruction
ERMSB - Supports Enhanced REP MOVSB/STOSB
PCLULDQ - Supports PCLMULDQ instruction
POPCNT * Supports POPCNT instruction
SEP * Supports fast system call instructions
LAHF-SAHF * Supports LAHF/SAHF instructions in 64-bit mode
HLE - Supports Hardware Lock Elision instructions
RTM - Supports Restricted Transactional Memory instructions
DE * Supports I/O breakpoints including CR4.DE
DTES64 - Can write history of 64-bit branch addresses
DS * Implements memory-resident debug buffer
DS-CPL - Supports Debug Store feature with CPL
PCID - Supports PCIDs and settable CR4.PCIDE
INVPCID - Supports INVPCID instruction
PDCM - Supports Performance Capabilities MSR
RDTSCP * Supports RDTSCP instruction
TSC * Supports RDTSC instruction
TSC-DEADLINE - Local APIC supports one-shot deadline timer
TSC-INVARIANT * TSC runs at constant rate
xTPR - Supports disabling task priority messages
EIST - Supports Enhanced Intel Speedstep
ACPI - Implements MSR for power management
TM - Implements thermal monitor circuitry
TM2 - Implements Thermal Monitor 2 control
APIC * Implements software-accessible local APIC
x2APIC * Supports x2APIC
CNXT-ID - L1 data cache mode adaptive or BIOS
MCE * Supports Machine Check, INT18 and CR4.MCE
MCA * Implements Machine Check Architecture
PBE - Supports use of FERR#/PBE# pin
PSN - Implements 96-bit processor serial number
PREFETCHW * Supports PREFETCHW instruction
Logical to Physical Processor Map:
* Physical Processor 0
Logical Processor to Socket Map:
* Socket 0
Logical Processor to NUMA Node Map:
* NUMA Node 0
Logical Processor to Cache Map:
* Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
* Instruction Cache 0, Level 1, 32 KB, Assoc 4, LineSize 64
* Unified Cache 0, Level 2, 256 KB, Assoc 8, LineSize 64
* Unified Cache 1, Level 3, 4 MB, Assoc 16, LineSize 64
Logical Processor to Group Map:
* Group 0
Which hardware do I need to change?
Your comments are welcome.
You should run the following to check whether EPT (Extended Page Tables, Intel's SLAT implementation) is available:
c:> coreinfo -v
This flag will show only the virtualization features of your CPU.
For example, my output:
Coreinfo v3.2 - Dump information on system CPU and memory topology
Copyright (C) 2008-2012 Mark Russinovich
Sysinternals - www.sysinternals.com
Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz
Intel64 Family 6 Model 23 Stepping 10, GenuineIntel
HYPERVISOR - Hypervisor is present
VMX * Supports Intel hardware-assisted virtualization
EPT - Supports Intel extended page tables (SLAT)
( asterisk means your CPU has it )
In this case, my machine does NOT support the ability to run the WP8 emulator because I do not have EPT (SLAT) capability. You generally need an Intel i series processor.
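If you save the Coreinfo output to a file, the flags can also be checked programmatically. The following is only a minimal sketch: the function name `parse_coreinfo_flags` and the sample text are illustrative assumptions, and it relies solely on the "NAME, then * or -, then description" line layout shown above.

```python
import re

# Matches lines such as "EPT - Supports Intel extended page tables (SLAT)".
# Group 1 is the feature name, group 2 is "*" (present) or "-" (absent).
# Note: a banner line like "Sysinternals - www..." also fits this pattern and
# yields a harmless extra entry, so look up specific feature names only.
_FLAG_LINE = re.compile(r"^(\S+)\s+([*-])\s+\S")


def parse_coreinfo_flags(text):
    """Return {feature_name: bool} for every feature line in Coreinfo output."""
    flags = {}
    for line in text.splitlines():
        match = _FLAG_LINE.match(line.strip())
        if match:
            flags[match.group(1)] = match.group(2) == "*"
    return flags


# Sample lines copied from the output quoted above.
SAMPLE = """\
Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz
HYPERVISOR - Hypervisor is present
VMX * Supports Intel hardware-assisted virtualization
EPT - Supports Intel extended page tables (SLAT)
"""

flags = parse_coreinfo_flags(SAMPLE)
print("VMX:", flags.get("VMX"), "EPT (SLAT):", flags.get("EPT"))
```

For this sample, VMX is reported as present and EPT as absent, which matches the reading given above.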
You may have another virtualization software like VMware installed, the output shows that a hypervisor is present.
Your CPU (Intel64 Family 6 Model 37 Stepping 2) is a 5300 aka Clovertown aka Core-Based-Xeon CPU.
Win8 Hyper-V needs SLAT (Second Level Address Translation) or, in Intel terms, EPT (Extended Page Tables) support,
which is only available on Nehalem-based Xeon (or newer) CPUs.
I am running the latest Proxmox (6.3-3 at this time, fully updated) and attempting to pass through the onboard GPU on my Core i7 4770 CPU to a Windows 10 VM. I have already enabled IOMMU on the system and also told GRUB not to let the system claim the device by adding intel_iommu=on video=efifb:off to the GRUB kernel options. I've verified that IOMMU is actually available by checking dmesg:
# dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
[ 0.007556] ACPI: DMAR 0x00000000D88C33C8 0000B8 (v01 INTEL HSW 00000001 INTL 00000001)
[ 0.083595] DMAR: IOMMU enabled
[ 0.180445] DMAR: Host address width 39
[ 0.180446] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.180449] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.180449] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.180451] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008020660462 ecap f010da
[ 0.180452] DMAR: RMRR base: 0x000000d8842000 end: 0x000000d884efff
[ 0.180452] DMAR: RMRR base: 0x000000db000000 end: 0x000000df1fffff
[ 0.180454] DMAR-IR: IOAPIC id 8 under DRHD base 0xfed91000 IOMMU 1
[ 0.180454] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[ 0.180455] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.180831] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.874497] DMAR: No ATSR found
[ 0.874527] DMAR: dmar0: Using Queued invalidation
[ 0.874531] DMAR: dmar1: Using Queued invalidation
[ 1.026818] DMAR: Intel(R) Virtualization Technology for Directed I/O
I've also added the iGPU (and associated audio device) to blacklist to prevent the host OS from claiming it:
# cat /etc/modprobe.d/blacklist.conf
blacklist snd_hda_intel
blacklist snd_hda_codec_hdmi
blacklist i915
# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=8086:0412 disable_vga=1
Finally, I set up a new Windows 10 VM on my host with the q35 chipset and UEFI (OVMF) BIOS, as this is apparently the most "compatible" way to pass through hardware. I've also plugged an external screen into the HDMI port of my Proxmox host. As I understand it, this screen should come to life when the VM boots. The qemu config file of the VM is below:
agent: 1
balloon: 0
bios: ovmf
boot: order=virtio0;ide2;net0
cores: 4
efidisk0: local-1tb-nvme-thinpool:vm-118-disk-1,size=4M
hostpci0: 00:02,pcie=1,x-vga=1
ide2: none,media=cdrom
machine: q35
memory: 4096
name: VFIOtest
net0: virtio=52:D7:02:CA:B6:2E,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=cd9d41e9-d8c2-465e-94dc-798aa8e517e2
sockets: 1
virtio0: local-1tb-nvme-thinpool:vm-118-disk-0,backup=0,discard=on,size=60G
vmgenid: 2cb8ce5e-5dda-4870-9cf3-774bb025057f
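As a sanity check on a config like the one above, here is a small Python sketch that parses the "key: value" lines and splits the comma-separated options of the hostpci0 entry. The helper names `parse_vm_config` and `parse_options` are made up for illustration. It only confirms that the settings are what you intended and does not diagnose the passthrough itself.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines of a VM config file into a dict."""
    config = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as "00:02" stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            config[key.strip()] = value.strip()
    return config


def parse_options(value):
    """Split 'a,b=1,c=2' into positional values and key=value options."""
    positional, options = [], {}
    for part in value.split(","):
        name, sep, val = part.partition("=")
        if sep:
            options[name] = val
        else:
            positional.append(part)
    return positional, options


# A few lines copied from the config quoted above.
SAMPLE = """\
bios: ovmf
hostpci0: 00:02,pcie=1,x-vga=1
machine: q35
"""

config = parse_vm_config(SAMPLE)
devices, opts = parse_options(config["hostpci0"])
print(config["bios"], config["machine"], devices, opts)
```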
Once I've done that I can boot the VM. As soon as I boot the VM, the screen goes to standby indicating no signal. I can however then RDP into the system and I see that the Intel HD Graphics 4600 is visible in device manager. So I installed the latest drivers from the Intel website. Unfortunately, the device will not start and shows an exclamation mark next to it. The Device Status shows
Windows has stopped this device because it has reported problems. (Code 43)
Unfortunately, the Code 43 error just means something is wrong; it isn't very specific about the cause.
Not too sure what to try from this point on - any assistance on where to continue fixing this would be useful.
Code 43 is an NVIDIA-specific error; you will need a way to mask the true CPU by using the FancyId parameter. Here is a link to a video that covers some of the process revolving around the error you are seeing.
Can you edit the original post to contain your grub config file? There are some more recent changes to Proxmox 6.3 that might need to be reconfigured; there are almost no articles about setting up passthrough on 6.3.
I found it came down to setting the CPU model during VM creation. Changing it after VM creation does nothing so something must be set during creation. None of the other guides worked for me so I solved the problem and made my own guide https://elijahliedtke.medium.com/home-lab-guides-proxmox-6-pci-e-passthrough-with-nvidia-43ccfb9424de
I was trying to retrain an object detection model for the Google Coral accelerator as per the link below:
https://coral.ai/docs/edgetpu/retrain-detection/#prerequisites
The host system is Linux Mint with a Docker environment:
CPU : Intel(R) Core(TM) i3-5005U CPU @ 2.00GHz
Graphics: Card: Intel HD Graphics 5500
OS : Linux Mint 19 Tara
Memory Size: 8G
But after starting the training job with
root@beaa5d65a1d5:/tensorflow/models/research# ./retrain_detection_model.sh --num_training_steps ${NUM_TRAINING_STEPS} --num_eval_steps ${NUM_EVAL_STEPS}
the process is killed by the OOM killer:
./retrain_detection_model.sh: line 45: 86 Killed python object_detection/model_main.py --pipeline_config_path="${CKPT_DIR}/pipeline.config" --model_dir="${TRAIN_DIR}" --num_train_steps="${num_training_steps}" --num_eval_steps="${num_eval_steps}"
Any help is appreciated!
This is an out-of-memory issue caused by a hardware limitation. You can either add more RAM or add swap space (using storage as RAM), although the latter will be very slow.
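To see how much headroom there is before starting the job, a rough Python sketch like the one below can parse /proc/meminfo-style text. The function names and the sample numbers are illustrative assumptions, not measurements from the system in the question.

```python
def parse_meminfo(text):
    """Return {field: kibibytes} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def headroom_gib(info):
    """Available RAM plus free swap, in GiB."""
    kib = info.get("MemAvailable", 0) + info.get("SwapFree", 0)
    return kib / (1024 * 1024)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
SAMPLE = """\
MemTotal:        8048576 kB
MemAvailable:    2097152 kB
SwapTotal:       4194304 kB
SwapFree:        4194304 kB
"""

info = parse_meminfo(SAMPLE)
print(f"RAM + swap headroom: {headroom_gib(info):.1f} GiB")
```

On a real Linux host you would pass the contents of /proc/meminfo instead of SAMPLE, for example `parse_meminfo(open("/proc/meminfo").read())`.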
I'm using Tensorflow on a cluster and I want to tell Tensorflow to run only on one single core (even though there are more available).
Does someone know if this is possible?
To run Tensorflow on one single CPU thread, I use:
import tensorflow as tf

# TensorFlow 1.x API: one thread within an op and one thread across ops.
session_conf = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1)
sess = tf.Session(config=session_conf)
device_count limits the number of CPUs being used, not the number of cores or threads.
tensorflow/tensorflow/core/protobuf/config.proto says:
message ConfigProto {
// Map from device type name (e.g., "CPU" or "GPU" ) to maximum
// number of devices of that type to use. If a particular device
// type is not found in the map, the system picks an appropriate
// number.
map<string, int32> device_count = 1;
On Linux you can run sudo dmidecode -t 4 | egrep -i "Designation|Intel|core|thread" to see how many CPUs/cores/threads you have, e.g. the following has 2 CPUs, each of them has 8 cores, each of them has 2 threads, which gives a total of 2*8*2=32 threads:
fra@s:~$ sudo dmidecode -t 4 | egrep -i "Designation|Intel|core|thread"
Socket Designation: CPU1
Manufacturer: Intel
HTT (Multi-threading)
Version: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Multi-Core
Hardware Thread
Socket Designation: CPU2
Manufacturer: Intel
HTT (Multi-threading)
Version: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Multi-Core
Hardware Thread
Tested with Tensorflow 0.12.1 and 1.0.0 with Ubuntu 14.04.5 LTS x64 and Ubuntu 16.04 LTS x64.
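The thread arithmetic above can also be done from saved dmidecode output. This is a small sketch assuming the "Thread Count: N" lines shown above; `total_hardware_threads` is a name made up for this example.

```python
import re


def total_hardware_threads(dmidecode_text):
    """Sum the 'Thread Count' value reported for each socket."""
    counts = re.findall(r"^\s*Thread Count:\s*(\d+)", dmidecode_text, re.MULTILINE)
    return sum(int(c) for c in counts)


# Abbreviated from the two-socket output quoted above.
SAMPLE = """\
Socket Designation: CPU1
Core Count: 8
Thread Count: 16
Socket Designation: CPU2
Core Count: 8
Thread Count: 16
"""

print(total_hardware_threads(SAMPLE))  # 16 + 16 = 32
```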
Yes, it is possible using thread affinity. Thread affinity lets you decide which CPU core(s) a specific thread may run on. On Linux you can use "taskset" or "numactl". You can also use https://man7.org/linux/man-pages/man2/sched_setaffinity.2.html and https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html
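From Python on Linux, the same system call is exposed as os.sched_setaffinity. The sketch below pins the current process to one of the CPUs it is already allowed to use. `pin_to_single_core` is a hypothetical helper name, and the idea of calling it before importing TensorFlow (so that threads created afterwards inherit the mask) is an assumption about how you would combine the two, not something tested here with TensorFlow.

```python
import os


def pin_to_single_core():
    """Restrict the calling process to one of its currently allowed CPUs."""
    allowed = os.sched_getaffinity(0)  # 0 means "the caller"
    cpu = min(allowed)                 # pick the lowest allowed CPU id
    os.sched_setaffinity(0, {cpu})
    return cpu


cpu = pin_to_single_core()
print("pinned to CPU", cpu, "->", os.sched_getaffinity(0))
# Assumption: import tensorflow only after this point, so that the threads
# it creates start with the restricted affinity mask.
```

TensorFlow may still create many threads, but with this mask the scheduler can only run them on the one chosen core.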
The following code will not instruct/direct Tensorflow to run only on one single core.
TensorFlow 1
session_conf = tf.ConfigProto(
intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1)
sess = tf.Session(config=session_conf)
TensorFlow 2
import os
# reduce number of threads
os.environ['TF_NUM_INTEROP_THREADS'] = '1'
os.environ['TF_NUM_INTRAOP_THREADS'] = '1'
import tensorflow
This will generate in total at least N threads, where N is the number of cpu cores. Most of the time only one thread will be running while others are in sleeping mode.
Sources:
https://github.com/tensorflow/tensorflow/issues/42510
https://github.com/tensorflow/tensorflow/issues/33627
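One way to observe the thread count yourself on Linux is to read the "Threads:" line of /proc/self/status after TensorFlow has been imported. A minimal sketch follows; `thread_count` is a made-up helper name and the sample value is hypothetical.

```python
def thread_count(status_text):
    """Return the integer from the 'Threads:' line of /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' line found")


# Hypothetical excerpt; real files contain many more fields.
SAMPLE = "Name:\tpython\nThreads:\t9\n"
print(thread_count(SAMPLE))
```

In a live process you would call `thread_count(open("/proc/self/status").read())` before and after running a TensorFlow op and compare the two numbers.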
You can restrict the number of devices of a certain type that TensorFlow uses by passing the appropriate device_count in a ConfigProto as the config argument when creating your session. For instance, you can restrict the number of CPU devices as follows:
import tensorflow as tf

# TensorFlow 1.x API: expose at most one CPU device to the session.
config = tf.ConfigProto(device_count={'CPU': 1})
sess = tf.Session(config=config)
with sess.as_default():
    print(tf.constant(42).eval())
I usually don't use OS X because I run a Linux partition. I've needed access to Xcode, so I've been using it again. It suddenly shuts off and reboots, and it seems to happen more often with my printer plugged into USB. I also run Mint and Blag on this machine and have never had any issues on those OSs, even with the printer plugged in. Can anyone make heads or tails of this stack?
Interval Since Last Panic Report: 17556 sec
Panics Since Last Report: 2
Anonymous UUID: 6DA1700F-1927-B6CB-7922-15F95FA2CC21
Wed Sep 11 18:42:32 2013
panic(cpu 0 caller 0xffffff80292ac90b): Releasing non-exclusive RW lock without a reader refcount!
Backtrace (CPU 0), Frame : Return Address
0xffffff8119993820 : 0xffffff802921d626
0xffffff8119993890 : 0xffffff80292ac90b
0xffffff81199938b0 : 0xffffff802931c09c
0xffffff8119993ae0 : 0xffffff8029308fab
0xffffff8119993b90 : 0xffffff80292fbb49
0xffffff8119993c40 : 0xffffff80292fc314
0xffffff8119993f50 : 0xffffff80295e182a
0xffffff8119993fb0 : 0xffffff80292ced33
BSD process name corresponding to current thread: installd
Mac OS version:
12C60
Kernel version:
Darwin Kernel Version 12.2.0: Sat Aug 25 00:48:52 PDT 2012; root:xnu-2050.18.24~1/RELEASE_X86_64
Kernel UUID: 69A5853F-375A-3EF4-9247-478FD0247333
Kernel slide: 0x0000000029000000
Kernel text base: 0xffffff8029200000
System model name: Macmini5,1 (Mac-8ED6AF5B48C039E1)
System uptime in nanoseconds: 508482133713
last loaded kext at 31448390032: com.apple.driver.AppleHWSensor 1.9.5d0 (addr 0xffffff7faaa92000, size 36864)
last unloaded kext at 102515839551: com.apple.driver.AppleUSBUHCI 5.2.5 (addr 0xffffff7fa9b17000, size 65536)
loaded kexts:
com.apple.driver.AppleHWSensor 1.9.5d0
com.apple.driver.AudioAUUC 1.60
com.apple.iokit.IOBluetoothSerialManager 4.0.9f33
com.apple.driver.ApplePlatformEnabler 2.0.5d4
com.apple.driver.AGPM 100.12.69
com.apple.driver.AppleMikeyHIDDriver 122
com.apple.filesystems.autofs 3.0
com.apple.driver.AppleHDA 2.3.1f2
com.apple.driver.AppleMikeyDriver 2.3.1f2
com.apple.iokit.BroadcomBluetoothHCIControllerUSBTransport 4.0.9f33
com.apple.driver.AppleUpstreamUserClient 3.5.10
com.apple.driver.AppleMCCSControl 1.0.33
com.apple.driver.AppleSMCPDRC 1.0.0
com.apple.iokit.IOUserEthernet 1.0.0d1
com.apple.Dont_Steal_Mac_OS_X 7.0.0
com.apple.driver.ApplePolicyControl 3.2.11
com.apple.driver.ACPI_SMC_PlatformPlugin 1.0.0
com.apple.driver.AppleLPC 1.6.0
com.apple.driver.AppleIntelHD3000Graphics 8.0.0
com.apple.driver.AppleIntelSNBGraphicsFB 8.0.0
com.apple.driver.AppleIRController 320.15
com.apple.AppleFSCompression.AppleFSCompressionTypeDataless 1.0.0d1
com.apple.AppleFSCompression.AppleFSCompressionTypeZlib 1.0.0d1
com.apple.BootCache 34
com.apple.driver.XsanFilter 404
com.apple.iokit.IOAHCIBlockStorage 2.2.2
com.apple.driver.AppleUSBHub 5.2.5
com.apple.driver.AppleFWOHCI 4.9.6
com.apple.driver.AirPort.Brcm4331 602.15.22
com.apple.iokit.AppleBCM5701Ethernet 3.2.5b3
com.apple.driver.AppleUSBEHCI 5.4.0
com.apple.driver.AppleAHCIPort 2.4.1
com.apple.driver.AppleSDXC 1.2.2
com.apple.driver.AppleEFINVRAM 1.6.1
com.apple.driver.AppleACPIButtons 1.6
com.apple.driver.AppleRTC 1.5
com.apple.driver.AppleHPET 1.7
com.apple.driver.AppleSMBIOS 1.9
com.apple.driver.AppleACPIEC 1.6
com.apple.driver.AppleAPIC 1.6
com.apple.driver.AppleIntelCPUPowerManagementClient 196.0.0
com.apple.nke.applicationfirewall 4.0.39
com.apple.security.quarantine 2
com.apple.driver.AppleIntelCPUPowerManagement 196.0.0
com.apple.iokit.IOSerialFamily 10.0.6
com.apple.kext.triggers 1.0
com.apple.driver.DspFuncLib 2.3.1f2
com.apple.iokit.IOAudioFamily 1.8.9fc10
com.apple.kext.OSvKernDSPLib 1.6
com.apple.iokit.IOFireWireIP 2.2.5
com.apple.iokit.AppleBluetoothHCIControllerUSBTransport 4.0.9f33
com.apple.driver.AppleUSBMergeNub 5.2.5
com.apple.driver.AppleSMBusController 1.0.10d0
com.apple.iokit.IOSurface 86.0.3
com.apple.iokit.IOBluetoothFamily 4.0.9f33
com.apple.driver.AppleGraphicsControl 3.2.11
com.apple.driver.IOPlatformPluginLegacy 1.0.0
com.apple.driver.IOPlatformPluginFamily 5.2.0d16
com.apple.driver.AppleSMC 3.1.4d2
com.apple.driver.AppleSMBusPCI 1.0.10d0
com.apple.iokit.IONDRVSupport 2.3.5
com.apple.driver.AppleHDAController 2.3.1f2
com.apple.iokit.IOHDAFamily 2.3.1f2
com.apple.iokit.IOGraphicsFamily 2.3.5
com.apple.iokit.IOSCSIArchitectureModelFamily 3.5.1
com.apple.driver.AppleThunderboltDPOutAdapter 1.8.5
com.apple.driver.AppleThunderboltDPInAdapter 1.8.5
com.apple.driver.AppleThunderboltDPAdapterFamily 1.8.5
com.apple.driver.AppleThunderboltPCIDownAdapter 1.2.5
com.apple.iokit.IOUSBHIDDriver 5.2.5
com.apple.driver.AppleUSBComposite 5.2.5
com.apple.iokit.IOUSBUserClient 5.2.5
com.apple.driver.AppleThunderboltNHI 1.6.0
com.apple.iokit.IOThunderboltFamily 2.1.1
com.apple.iokit.IOFireWireFamily 4.5.5
com.apple.iokit.IO80211Family 500.15
com.apple.iokit.IOEthernetAVBController 1.0.2b1
com.apple.iokit.IONetworkingFamily 3.0
com.apple.iokit.IOAHCIFamily 2.2.1
com.apple.iokit.IOUSBFamily 5.4.0
com.apple.driver.AppleEFIRuntime 1.6.1
com.apple.iokit.IOHIDFamily 1.8.0
com.apple.iokit.IOSMBusFamily 1.1
com.apple.security.sandbox 220
com.apple.kext.AppleMatch 1.0.0d1
com.apple.security.TMSafetyNet 7
com.apple.driver.DiskImages 344
com.apple.iokit.IOStorageFamily 1.8
com.apple.driver.AppleKeyStore 28.21
com.apple.driver.AppleACPIPlatform 1.6
com.apple.iokit.IOPCIFamily 2.7.2
com.apple.iokit.IOACPIFamily 1.4
com.apple.kec.corecrypto 1.0
Model: Macmini5,1, BootROM MM51.0077.B10, 2 processors, Intel Core i5, 2.3 GHz, 8 GB, SMC 1.76f0
Graphics: Intel HD Graphics 3000, Intel HD Graphics 3000, Built-In, 512 MB
Memory Module: BANK 0/DIMM0, 4 GB, DDR3, 1333 MHz, 0x802C, 0x31364A54463531323634485A2D3147344D31
Memory Module: BANK 1/DIMM0, 4 GB, DDR3, 1333 MHz, 0x802C, 0x31364A54463531323634485A2D3147344D31
AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0xE4), Broadcom BCM43xx 1.0 (5.106.98.81.22)
Bluetooth: Version 4.0.9f33 10885, 2 service, 18 devices, 1 incoming serial ports
Network Service: Wi-Fi, AirPort, en1
Serial ATA Device: APPLE HDD HTS547550A9E384, 500.11 GB
USB Device: hub_device, 0x0424 (SMSC), 0x2513, 0xfd100000 / 2
USB Device: USB Receiver, 0x046d (Logitech Inc.), 0xc52b, 0xfd130000 / 4
USB Device: IR Receiver, apple_vendor_id, 0x8242, 0xfd110000 / 3
USB Device: hub_device, 0x0424 (SMSC), 0x2513, 0xfa100000 / 2
USB Device: HP LaserJet 3050, 0x03f0 (Hewlett Packard), 0x3217, 0xfa130000 / 4
USB Device: BRCM20702 Hub, 0x0a5c (Broadcom Corp.), 0x4500, 0xfa110000 / 3
USB Device: Bluetooth USB Host Controller, apple_vendor_id, 0x8281, 0xfa113000 / 7
Updating to 10.8.4 solved the problem. This solution came from Apple Support; apparently 10.8.2 commonly has this issue. Although I dislike forced updates, it worked.
Newbie here. I have a big finite analysis code that needs to be run with high-performance computing. People keep telling me that the Intel compiler usually gives better speed (I used gcc before), and I found that this is true on our Intel clusters. But we recently got a new AMD cluster, and I am confused about which icpc compiler options to use to optimize the program.
Basically, I have two questions:
Question 1
Here is the cluster with AMD chips:
processor : 63
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD Opteron(tm) Processor 6378
stepping : 0
cpu MHz : 2399.837
cache size : 2048 KB
physical id : 2
siblings : 16
core id : 7
cpu cores : 8
apicid : 79
initial apicid : 79
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr tbm topoext perfctr_core cpb npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bogomips : 4799.73
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate [9] [10]
I do not know exactly which options I should use. When I compile a small program with icpc hello.cpp -O3 -xP and run it, I get this error:
$ /usr/bin/time -p ./a.out
Fatal Error: This program was not built to run on the processor in your system.
The allowed processors are: Intel(R) Pentium(R) 4 and compatible Intel processors with Intel(R) Streaming SIMD Extensions 3 (Intel(R) SSE3) instruction support.
real 0.00
user 0.00
sys 0.00
Question 2
If I want the binaries to be usable on both the Intel cluster and the AMD cluster, should I compile the code with different options?
Intel compilers don't always work with AMD chips, especially with certain flags like -xP (now -xSSE3, see here). Specifically, -xSSE3/-xP tells the compiler: "May generate Intel® SSE3, SSE2, and SSE instructions for Intel® processors. Optimizes for Intel processors that support Intel® SSE3 instructions. For OS X* systems, this value is only supported on IA-32 architecture. This replaces value P, which is deprecated."
That document also has this quote: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
You can try to optimize with icc and icpc, but I'm not sure it will work on AMD chips. For compilers other than gcc you could look at clang, PGI, or Cray compilers (if you are on a Cray system).
If you're trying to create binaries for both architectures, I'm not sure you'll be able to do heavy optimizations due to differences in cache line size and other architecture specific settings.
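Before choosing options for a binary shared between the two clusters, it may help to see which instruction-set flags both kinds of node report in /proc/cpuinfo. The sketch below is only an illustration. The AMD line is abbreviated from the flags in the question, the Intel line is entirely hypothetical, and `cpu_flags` is a made-up helper name. Which compiler option corresponds to each flag is left to the compiler documentation.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return set(value.split())
    return set()


# Abbreviated from the AMD Opteron 6378 flags quoted in the question.
AMD_SAMPLE = "flags : fpu sse sse2 pni ssse3 sse4_1 sse4_2 avx fma4 xop\n"
# Hypothetical Intel node; replace with the real /proc/cpuinfo contents.
INTEL_SAMPLE = "flags : fpu sse sse2 pni ssse3 sse4_1 sse4_2 avx avx2\n"

# "pni" is the name Linux uses for SSE3.
wanted = ["sse2", "pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2"]
common = cpu_flags(AMD_SAMPLE) & cpu_flags(INTEL_SAMPLE)
print([flag for flag in wanted if flag in common])
```

With these sample lines, everything up to AVX is common to both, while AVX2 appears only on the hypothetical Intel node and XOP/FMA4 only on the AMD one.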