Any help figuring out how to do this would be great: how much time has each CPU core spent in the C0 power state over the past second?
This is for a Mac app, so Objective-C, Cocoa, and C are all usable.
OS X doesn't have any APIs that expose the C-state of the CPU. However, it seems you can get at this using the MWAIT/MONITOR instructions on Intel CPUs. Intel mentions that you can track C-state residency with this technique in section 14.4 of the reference manual:
Software should use CPUID to discover if a target processor supports the enumeration of MWAIT extensions. If CPUID.05H.ECX[Bit 0] = 1, the target processor supports MWAIT extensions and their enumeration (see Chapter 3, “Instruction Set Reference, A-M,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A).
If CPUID.05H.ECX[Bit 1] = 1, the target processor supports using interrupts as break-events for MWAIT, even when interrupts are disabled. Use this feature to measure C-state residency as follows:
Software can write to bit 0 in the MWAIT Extensions register (ECX) when issuing an MWAIT to enter into a processor-specific C-state or sub C-state.
When a processor comes out of an inactive C-state or sub C-state, software can read a timestamp before an interrupt service routine (ISR) is potentially executed.
You can find more info about the MWAIT instruction in the same manual. Good luck!
Intel's Reference Manual
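As a hedged sketch of just the CPUID feature check the manual describes (using GCC/Clang's cpuid.h helper; note that MONITOR/MWAIT themselves only execute at ring 0, so the residency trick ultimately needs kernel-mode code):

#include <cpuid.h>
#include <stdio.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x05, &eax, &ebx, &ecx, &edx))
        return 1;                              /* CPUID leaf 05H not supported */
    /* CPUID.05H:ECX[bit 0]: MWAIT extensions are enumerable */
    printf("MWAIT extensions: %s\n", (ecx & 1) ? "yes" : "no");
    /* CPUID.05H:ECX[bit 1]: interrupts break MWAIT even when disabled */
    printf("interrupt break-events: %s\n", (ecx & 2) ? "yes" : "no");
    return 0;
}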
For getting the C0 percentage you should do the following:
Read the following MSRs at the start and end of the period you measure:
0x3FC (core C3), 0x3FD (core C6), 0x3FE (core C7), 0x10 (TSC)
Then do the following calculation:
Cx_ticks = (c3_after - c3_before) + (c6_after - c6_before) + (c7_after - c7_before)
total_ticks = tsc_after - tsc_before
Cx_percentage = 100 * Cx_ticks / total_ticks
C0_percentage = 100 - Cx_percentage
You can find more information in this document (go to Vol. 3C 35-95)
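As a hedged sketch of the whole procedure: OS X ships no user-visible MSR device, so this assumes a Linux-style /dev/cpu/N/msr interface (on a Mac you would need a kernel extension that wraps rdmsr); the MSR numbers are the ones listed above:

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define MSR_TSC     0x10
#define MSR_CORE_C3 0x3FC
#define MSR_CORE_C6 0x3FD
#define MSR_CORE_C7 0x3FE

static uint64_t rdmsr(int fd, uint32_t reg) {
    uint64_t v = 0;
    pread(fd, &v, sizeof v, reg);   /* /dev/cpu/N/msr: file offset = MSR number */
    return v;
}

int main(void) {
    int fd = open("/dev/cpu/0/msr", O_RDONLY);   /* core 0; needs root */
    if (fd < 0) return 1;
    uint64_t c3a = rdmsr(fd, MSR_CORE_C3), c6a = rdmsr(fd, MSR_CORE_C6),
             c7a = rdmsr(fd, MSR_CORE_C7), tsca = rdmsr(fd, MSR_TSC);
    sleep(1);                                    /* the measurement window */
    uint64_t c3b = rdmsr(fd, MSR_CORE_C3), c6b = rdmsr(fd, MSR_CORE_C6),
             c7b = rdmsr(fd, MSR_CORE_C7), tscb = rdmsr(fd, MSR_TSC);
    double cx    = (double)((c3b - c3a) + (c6b - c6a) + (c7b - c7a));
    double total = (double)(tscb - tsca);
    printf("C0 residency: %.1f%%\n", 100.0 * (1.0 - cx / total));
    close(fd);
    return 0;
}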
I am creating a primitive virtual machine, inspired by LC-3 VMs but a 32-bit version. I am feeding the machine a set of instructions. After executing the first instruction, how will the PC know the location of the second instruction?
Is there a particular method of storing the instructions in memory in a systematic way, so that the PC knows the address of the next instruction?
Example: all instructions are stored linearly, as in memory[0] = instruction1, memory[1] = instruction2, etc.
Thank you for the help.
It depends on whether your processor architecture is RISC or CISC. In the context you asked about, a CISC processor has instructions of varying size, say 1 to 14 bytes, as on Intel processors. On RISC, every instruction has a fixed size, say 4 bytes, as on ARM processors. All the instructions of a program are stored in sequence in main memory, and they are read from main memory in sequence; it is the processor's control unit that decides how much to increment the PC.
So in a CISC architecture, a single 8-byte read from main memory can contain up to eight 1-byte instructions, e.g. a run of 'inc ax' instructions on an Intel processor. After sending the first one for decode, the control unit increments the PC by 1. At the other extreme, there could be an instruction like 'add REG, [BASE+INDEX+OFFSET]', which can take 13 bytes to store everything it carries (opcode + register id + base address + index + offset). Such an instruction needs two memory reads to fetch in full, and after sending it for decode the control unit increments the PC by 13.
For RISC it is simple: increment the PC by the fixed instruction size (2, 4, ...).
The only exception is a branch, in which case the PC is overwritten, usually at the execute stage.
Instructions and data are generally grouped (segmented, in some processor architectures) and stored separately. A code segment ends with some kind of return or exit instruction. If the PC is set to a memory address where data is stored, the control unit will process that data as instructions; after all, both data and instructions are just sequences of bits, and the control unit cannot tell them apart. It is usually the role of the OS, or of the programmer (if there is no OS, as on microcontrollers), to prevent such an anomaly.
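To make the fetch sequence concrete, here is a minimal sketch of a fixed-width fetch-decode loop for a 32-bit VM like the one described, with memory[0] = instruction1, memory[1] = instruction2, and so on; the opcode layout (6-bit opcode, HALT, absolute branch) is purely hypothetical:

#include <stdint.h>

#define MEM_WORDS 4096

uint32_t memory[MEM_WORDS];  /* memory[0] = instruction1, memory[1] = instruction2, ... */
uint32_t pc = 0;             /* program counter: indexes whole 32-bit words, not bytes */

void run(void) {
    for (;;) {
        uint32_t instr  = memory[pc++];   /* fetch; PC now points at the next word */
        uint32_t opcode = instr >> 26;    /* hypothetical 6-bit opcode field */
        if (opcode == 0x00)               /* hypothetical HALT */
            return;
        if (opcode == 0x01)               /* hypothetical absolute branch: overwrite the PC */
            pc = instr & 0x03FFFFFF;
        /* ... decode and execute the remaining opcodes here ... */
    }
}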
The STM32F3DISCOVERY board's data brief indicates it features:
STM32F303VCT6 microcontroller featuring 256‑Kbyte Flash memory and 48‑Kbyte RAM in an LQFP100 package
However, the reference manual (RM0316) for STM32F303x6 et al. indicates only 16 Kbytes of SRAM (section 3.3) and 64 Kbytes of Flash memory (section 4.1) for STM32F303x6. The 256 and 48 Kbyte values match up with the STM32F303xB/C, which is also what is linked to on the board's data brief under Table 1's "Target STM32", even though it says "STM32F303VCT6".
I don't understand why there appears to be a discrepancy. Am I missing or misunderstanding something?
The latest version on the ST website seems right: Reference Manual.
For the STM32F303VC:
STM32F303xB/C and STM32F358xC devices feature up to 48 Kbytes of static SRAM.
For the STM32F303x6:
STM32F303x6/8 and STM32F328x8 devices feature the same memory but only up to 16 Kbytes of static SRAM
Maybe you have an old copy with a typo?
My problem was that I did not realize x was a single-letter substitution. I thought that in STM32F303x6 the x replaced VCT. It does not: STM32F303VCT6 matches STM32F303xC (the x replaces the V), and the T6 denotes the package and temperature range of a specific product in that line.
In an attempt to understand C memory alignment, or whatever the term is (data structure alignment?), I'm trying to write code that results in an alignment error. The original reason that brought me to learning about this is that I'm writing parsing code that reads binary data received over the network. The data contains some uint32s, uint64s, floats, and doubles, and I'd like to make sure they are never corrupted by errors in my parsing code.
An unsuccessful attempt at causing a problem through misalignment:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    uint32_t integer = 1027;
    uint8_t *pointer = (uint8_t *)&integer;
    uint8_t *bytes = malloc(5);
    bytes[0] = 23; // extra byte to misalign the uint32_t data
    bytes[1] = pointer[0];
    bytes[2] = pointer[1];
    bytes[3] = pointer[2];
    bytes[4] = pointer[3];
    uint32_t integer2 = *(uint32_t *)(bytes + 1); // misaligned read
    printf("integer: %u\ninteger2: %u\n", integer, integer2);
    free(bytes);
    return 0;
}
On my machine both integers print the same. (MacBook Pro with a 64-bit Intel processor. I'm not sure what exactly determines alignment behaviour: is it the architecture? The exact CPU model? Or maybe the compiler? I use Xcode, so Clang.)
I guess my processor/machine/setup supports unaligned reads, so it handles the above code without any problems.
What would a case look like where parsing of, say, a uint32_t fails because the code doesn't take alignment into account? Is there a way to make it fail on a modern 64-bit Intel system? Or am I safe from alignment errors when using simple data types like integers and floats (no structs)?
Edit: if anyone's reading this later, I found a similar question with interesting info: Mis-aligned pointers on x86
Normally, the x86 architecture doesn't have alignment requirements (except for some SIMD instructions such as movdqa).
However, since you're trying to write code to cause such an exception ...
There is an alignment-check bit that can be set in the x86 flags register. If you turn it on, an unaligned access will generate an exception, which shows up (under Linux at least) as a bus error (i.e. SIGBUS).
See my answer here: any way to stop unaligned access from c++ standard library on x86_64? for details and some sample programs that generate the exception.
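For reference, a minimal sketch of that technique; the assumptions are x86-64, GCC/Clang inline asm, and an OS that sets CR0.AM so the AC flag is honoured (Linux does, where this dies with SIGBUS):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Set the AC (alignment check) flag, bit 18 of RFLAGS. */
    __asm__ volatile ("pushfq\n\t"
                      "orl $0x40000, (%%rsp)\n\t"
                      "popfq" ::: "memory", "cc");

    char buf[8] __attribute__((aligned(8))) = {0};
    volatile uint32_t *p = (volatile uint32_t *)(buf + 1); /* misaligned on purpose */
    uint32_t v = *p;    /* with AC set, this 4-byte read faults */
    printf("%u\n", v);  /* never reached on Linux */
    return 0;
}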
I have a Cyber Robot CYBER 310 and a Sciento CS-113 robotic arm with no documentation. Both use a parallel port.
How could I program those?
For the Cyber one, I found this:
Nothing at all on the Sciento one.
Any pointers or examples in Python/Java/C/whatever appreciated.
[update] This page contains some information, but I'm still lost: http://www.anf.nildram.co.uk/beebcontrol/arms/cyber/software.html
I am not entirely sure I understand what the question is.
Are you unfamiliar with programming the parallel port?
My memory of it is hazy, but IIRC it's pretty simple. It's a "dumb" interface, so you simply need to write to it.
If you are running under linux then there are some great resources on it:
Linux Device Drivers: Chapter 9: An Overview of the Parallel port - Talks a bit about parallel port programming and goes on to talk about writing device drivers for it. A bit overkill I think for your application, but the entire book is fascinating, and enlightening.
Linux I/O port programming - essentially you can write to /dev/port, or include asm/io.h and use inb() and outb(). (I haven't done this in a while, but I'm sure that once you have it narrowed down to something specific, there will be a multitude of answers out there.)
If you are on Windows or Mac, I'd still suggest reading the above so you know what you are trying to do (the concepts are straightforward, in my opinion), then search for the Windows/Mac equivalent.
Now for what I assume is the crux of the question: what do you write to the ports?
For the Cyber 310 you have the pin layouts, although there seem to be multiple different pin layouts if you browse the site you listed, and if we follow anf.nildram.co.uk we can find some PIC assembly that shows how to rotate the base.
I had never touched PIC assembly before today, but with some help from the internet and the comments, I think we can translate what it is trying to do (I snipped out the relevant portion, as most of it is timing and looping):
; 6: Symbol prf = PORTA.0
; The address of 'prf' is 0x5,0
; 7: Symbol strobe = PORTA.1
; The address of 'strobe' is 0x5,1
; 8: Symbol base = PORTB.0
; The address of 'base' is 0x6,0
; 9: Symbol shoulder = PORTB.1
; The address of 'shoulder' is 0x6,1
...
; 16: main:
L0001:
; 17: base = 1
BSF 0x06,0 // set bit 0 at address 0x06, i.e. set the base bit to 1
; 18: strobe = 1
BSF 0x05,1 // set strobe bit to 1
; 19: strobe = 0
BCF 0x05,1 // set strobe bit to 0
; 20: While a <> 730 // now we loop 729 more times
So it appears, from my naive perspective, that to rotate the arm you need to set the motor bits (grabbed from your pinout) then set and clear strobe.
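If that reading is right, the PC-side equivalent is a few port writes. Here's a hedged sketch for Linux, assuming the arm's motor-select bits sit on the standard LPT1 data register at 0x378 and strobe is the control-register pin; the addresses, bit positions, and timing are all guesses to check against your pinout (ioperm needs root):

#include <stdio.h>
#include <unistd.h>
#include <sys/io.h>   /* ioperm, inb, outb: x86 Linux only */

#define DATA    0x378  /* assumed LPT1 data register: motor select bits */
#define CONTROL 0x37A  /* assumed control register: bit 0 is the strobe pin */

int main(void) {
    if (ioperm(DATA, 3, 1) != 0) { perror("ioperm"); return 1; }
    for (int i = 0; i < 730; i++) {           /* same 730 pulses as the PIC loop */
        outb(0x01, DATA);                     /* select the "base" motor bit */
        outb(inb(CONTROL) | 0x01, CONTROL);   /* toggle strobe one way... */
        outb(inb(CONTROL) & ~0x01, CONTROL);  /* ...and back (note: a PC's strobe
                                                 pin is inverted in hardware) */
        usleep(1000);                         /* pacing: pure guesswork */
    }
    return 0;
}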
Let me know if I am completely off base, this is a fascinating project.
Chris is right about the parallel port being a dumb interface. The parallel port has an address to which you can output an 8-bit binary number whose bits match the digital outputs' positions.
I found this to be a really good example of programming the Parallel port using C#.
http://www.codeproject.com/Articles/4981/I-O-Ports-Uncensored-1-Controlling-LEDs-Light-Emit
To match your project to his example: C0 is strobe, and your digital outputs from left to right match his D0-D6.
Seems like a really fun project. Have fun.
I'm trying to write a routine that logically bit-shifts all elements of a vector n positions to the right, as efficiently as possible, for the following vector types: BYTE->BYTE, WORD->WORD, DWORD->DWORD, and WORD->BYTE (assuming only 8 bits are present in the result). I would like three routines for each type, depending on the processor (SSE2 supported, only MMX supported, only the standard instruction set supported). So I need 12 functions in total.
I have already worked out how to save and restore the registers I need, how to write a loop, how to copy data into general-purpose or MMX registers, and how to shift logically by one position.
Because I'm not familiar with assembly language, that's about it.
Which registers should I use for each instruction set?
How can I make the best use of the L1 cache, given that the vector (an image) is large?
How do I find the next element of the vector (a pointer kind of thing)? I know I can mov from an address, and I assume I have to increment the address by 1, 2, or 4 depending on the data type?
Although I have all the ideas, writing the code is a bit difficult at this point.
Thank you.
Arnaud.
Edit:
Here is what I'm trying to do in MMX for a shift by 1 on a DWORD vector:
__asm("push mm"); // backup register
__asm("push cx"); // backup register
__asm("mov %cx, length"); // initialize loop
__asm("loopstart_shift1:"); // start label
__asm("movd %xmm0, r/m32"); // get 32 bits data
__asm("psrlq %xmm0, 1"); // right shift 32 bits data logically (stuffs 0 on the left) by 1
__asm("mov r/m32,%xmm0"); // set 32 bits data
__asm("dec %cx"); // decrement index
__asm("cmp %cx,0");
__asm("jnz loopstart_shift1");
__asm("pop cx"); // restore register
__asm("pop mm"); // restore register
__asm("emms"); // leave MMX state
I strongly suggest you pause and take a look at using intrinsics with C or C++ instead of trying to write raw asm. That way the C/C++ compiler will take care of all the register allocation, instruction scheduling, and general housekeeping, and you can focus on the important parts; e.g. instead of using psrlq directly, see _m_psrlq in mmintrin.h. (Better yet, look at using the 128-bit SSE intrinsics.)
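For example, a hedged sketch of the DWORD->DWORD case with SSE2 intrinsics (shift_right_dwords is just an illustrative name; the compiler picks the registers for you):

#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>
#include <stddef.h>

/* Logically shift every 32-bit element of v right by k bits. */
void shift_right_dwords(uint32_t *v, size_t n, int k) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {                        /* 4 DWORDs per 128-bit register */
        __m128i x = _mm_loadu_si128((__m128i *)(v + i));
        x = _mm_srli_epi32(x, k);                       /* zero-filling right shift */
        _mm_storeu_si128((__m128i *)(v + i), x);
    }
    for (; i < n; i++)                                  /* scalar tail */
        v[i] >>= k;
}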
Sounds like you'd benefit from using, or at least looking into, BitMagic's source. It's entirely intrinsics-based too, which makes it far more portable (though from the looks of it you're using GCC, so you might need an MSVC-to-GCC intrinsics mapping).