How can I check my CPI calculation is correct?

I got a CPI of 20.15 on the exercise below, but this value seems too high to me. How can I check whether it is correct?
A processor with a 7.5 GHz clock frequency runs a program with 8000 million instructions (8*10^9) in 21.5 seconds.
What is the average CPI, assuming that the program above is representative of the average kinds of programs that run on this computer?
I've tried it in many different ways but I keep getting a CPI of 20.15. Is this correct?

Instructions per second = (8000 ÷ 21.5) million ≈ 372.1 million
Clock cycles per second = 7.5 GHz = 7500 million
So, CPI = 7500 ÷ 372.1 = 161.25 ÷ 8 = 20.15625 ≈ 20.16.
So yes, your calculation is right; 20.15 is just the truncated form of 20.15625.
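The same check can be done directly from the definition CPI = total cycles ÷ total instructions, using the numbers quoted in the question:

```python
# Sanity check: CPI = (clock cycles executed) / (instructions executed)
clock_hz = 7.5e9          # 7.5 GHz clock frequency
instructions = 8e9        # 8000 million instructions
time_s = 21.5             # execution time in seconds

cycles = clock_hz * time_s
cpi = cycles / instructions
print(cpi)                # 20.15625
```

Since 20.15625 rounds to 20.16 (and truncates to 20.15), the result matches.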

Related

Calculating total number of program instructions?

So if a program takes 5.7 seconds to execute on a processor with a clock frequency of 1.8 GHz, where each instruction takes 7 clock cycles, what is the total number of instructions in the program?
I thought I could calculate it like this:
Total number of clock cycles = 5.7 seconds * 1.8 GHz = 10,260,000,000 cycles.
Then divide the total number of cycles by the number of cycles per instruction: 10,260,000,000 / 7, and we get 1,466,428,571.
But apparently this is wrong? It's part of a quiz and I got this question wrong; I wonder why that is.
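The method (total cycles = time × frequency, instructions = cycles ÷ CPI) is the standard one; redoing the division shows the quotient quoted above is slightly off, which may be all the quiz objected to:

```python
cpi = 7                            # clock cycles per instruction
cycles = round(5.7 * 1.8e9)        # 10,260,000,000 total clock cycles
instructions, remainder = divmod(cycles, cpi)
print(instructions)                # 1465714285 -- not 1,466,428,571
```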

GTX 970 bandwidth calculation

I am trying to calculate the theoretical bandwidth of the GTX 970, as per the specs given at:
http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-970/specifications
Memory clock = 7 Gbps
Memory bus width = 256 bits
Bandwidth = 7 × 256 × 2 / 8 (×2 because it is DDR)
= 448 GB/s
However, in the specs it is given as 224 GB/s.
Why is there a factor of 2 difference? Am I making a mistake? If so, please correct me.
Thanks
The 7 Gbps seems to be the effective clock, i.e. including the data rate. Also note that the field explanation for this Wikipedia list says that "All DDR/GDDR memories operate at half this frequency, except for GDDR5, which operates at one quarter of this frequency", which suggests that all GDDR5 chips are in fact quad data rate, despite the DDR abbreviation.
Finally, let me point out this note from Wikipedia, which disqualifies the trivial effective clock * bus width formula:
For accessing its memory, the GTX 970 stripes data across 7 of its 8 32-bit physical memory lanes, at 196 GB/s. The last 1/8 of its memory (0.5 GiB on a 4 GiB card) is accessed on a non-interleaved solitary 32-bit connection at 28 GB/s, one seventh the speed of the rest of the memory space. Because this smaller memory pool uses the same connection as the 7th lane to the larger main pool, it contends with accesses to the larger block reducing the effective memory bandwidth not adding to it as an independent connection could.
The clock rate reported is an "effective" clock rate that already accounts for transfers on both the rising and falling edges, so multiplying by 2 again for DDR double-counts that factor. That is where your extra factor of 2 comes from.
Some discussion on devtalk here: https://devtalk.nvidia.com/default/topic/995384/theoretical-bandwidth-vs-effective-bandwidth/
In fact, your formula is correct, but the memory clock you used is wrong. The GeForce GTX 970's actual memory clock is 1753 MHz (see https://www.techpowerup.com/gpu-specs/geforce-gtx-970.c2620); the 7 Gbps figure is that clock times GDDR5's four transfers per cycle.
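Both routes to the published number can be checked with a few lines, using the 1753 MHz clock and the 7 Gbps effective rate quoted in the answers above:

```python
bus_width_bits = 256

# Route 1: effective data rate (7 Gbps per pin) -- no extra DDR factor,
# because the effective rate already includes it:
effective_gbps = 7
bandwidth_1 = effective_gbps * bus_width_bits / 8        # 224.0 GB/s

# Route 2: real memory clock (1753 MHz) times GDDR5's 4 transfers/clock:
clock_mhz = 1753
bandwidth_2 = clock_mhz * 4 * bus_width_bits / 8 / 1000  # about 224.4 GB/s

print(bandwidth_1, bandwidth_2)
```

The small gap between 224.0 and 224.4 is just NVIDIA rounding 7012 MT/s down to "7 Gbps" in the spec sheet.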

Is there a relation between single and double precision in NVIDIA Tesla?

In the Tesla K20 the peak single-precision floating-point performance is about 3.52 TFlops but the double-precision peak is 1.17 TFlops, so the ratio is 3. The Tesla K20X has 3.95 and 1.31 TFlops, and the Tesla K40 has 4.29 and 1.43 TFlops, so the ratio seems to repeat. My question is whether there is a reason for the ratio to be 3 and not 2, which would seem more logical to me given the difference between single and double precision. I am learning about GPUs and GPGPU, so I don't know very much about this.
On the second page of this PDF there is a specs table:
NVIDIA-Tesla-Kepler-Family-Datasheet.pdf
The models you listed are all based on Kepler architecture, which has peak double precision rate equal to 1/3 of peak single precision rate. This is the way NVIDIA has built this piece of hardware. For comparison, Fermi, which is the previous hardware generation, had the ratio of 1/2 between peak double and single precision rate.
You may refer to NVIDIA documentation for instruction throughput, by instruction type and hardware generation:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#maximize-instruction-throughput
You will notice that consumer-grade products (GeForce GTX) typically have much lower double-to-single precision rate - 1/8, 1/12, 1/24, and even 1/32, depending on hardware version.
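The 1/3 ratio the answer describes can be confirmed from the figures quoted in the question:

```python
# Peak single/double precision figures (TFlops) quoted above for Kepler Teslas.
specs = {"K20": (3.52, 1.17), "K20X": (3.95, 1.31), "K40": (4.29, 1.43)}
for model, (sp, dp) in specs.items():
    print(model, round(sp / dp, 2))   # each ratio comes out close to 3
```

All three quotients land within rounding error of 3, consistent with Kepler's 1/3 double-to-single rate.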

I/O Disk Drive Calculations

So I am studying for an upcoming exam, and one of the questions involves calculating various disk drive properties. I have spent a fair while researching sample questions and formulas, but because I'm a bit unsure of what I have come up with, I was wondering if you could help confirm my formulas / answers?
Information Provided:
Rotation Speed = 6000 RPM
Surfaces = 6
Sector Size = 512 bytes
Sectors / Track = 500 (average)
Tracks / Surface = 1,000
Average Seek Time = 8ms
One Track Seek Time = 0.4 ms
Maximum Seek Time = 10ms
Questions:
Calculate the following
(i) The capacity of the disk
(ii) The maximum transfer rate for a single track
(iii) Calculate the amount of cylinder skew needed (in sectors)
(iv) The Maximum transfer rate (in bytes) across cylinders (with cylinder skew)
My Answers:
(i) Sector Size x Sectors per Track x Tracks per Surface x No. of surfaces
512 x 500 x 1000 x 6 = 1,536,000,000 bytes
(ii) Sectors per Track x Sector Size x Rotation Speed per sec
500 x 512 x (6000/60) = 25,600,000 bytes per sec
(iii) (Track-to-track seek time / Time for 1 rotation) x Sectors per Track + 4
At 6000 RPM one rotation takes 10 ms, so: (0.4 ms / 10 ms) x 500 + 4 = 24
(iv) Really unsure about this one to be honest, any tips or help would be much appreciated.
I'm fairly sure a similar question will appear on my paper, so it really would be a great help if any of you could confirm my formulas and derived answers for this sample question. Also, if anyone could provide a bit of help with that last question it would be great.
Thanks.
(iv) The Maximum transfer rate (in bytes) across cylinders (with cylinder skew)
500 sectors/track (1 rotation = 500 sectors) x 512 bytes/sector x 6 (reading across all 6 heads maximum)
1 rotation yields 1,536,000 bytes across 6 heads.
You are doing 6000 RPM, so that is 6000/60 = 100 rotations per second,
so: 153,600,000 bytes per second (divide by 1 million: 153.6 megabytes per second).
It takes 1/100th of a second, or 10 ms, to read in a track,
then you need a 0.4 ms shift of the heads to read the next track.
10.0/10.4 gives you a 96.2 percent effective read rate moving the heads perfectly,
so you would be able to read at 96.2% of 153.6, or about 147.7 MB/s, optimally after the first seek,
where 1 MB = 1,000,000 bytes.
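All four parts can be reproduced from the given parameters; the +4 sector allowance in (iii) is taken from the formula used in the question:

```python
# Disk parameters from the question.
sector_bytes = 512
sectors_per_track = 500
tracks_per_surface = 1000
surfaces = 6
rpm = 6000
track_seek_ms = 0.4

# (i) capacity = bytes/sector x sectors/track x tracks/surface x surfaces
capacity = sector_bytes * sectors_per_track * tracks_per_surface * surfaces

# (ii) single-track rate = bytes/track x rotations/second
rot_per_sec = rpm / 60
single_track_rate = sectors_per_track * sector_bytes * rot_per_sec

# (iii) cylinder skew in sectors (+4 allowance as in the question's formula)
rotation_ms = 1000 / rot_per_sec
skew = (track_seek_ms / rotation_ms) * sectors_per_track + 4

# (iv) all 6 heads at once, derated by the 0.4 ms head switch per track
full_rate = single_track_rate * surfaces
effective_rate = full_rate * rotation_ms / (rotation_ms + track_seek_ms)

print(capacity, single_track_rate, round(skew), round(effective_rate))
```

This gives 1,536,000,000 bytes, 25,600,000 B/s, a skew of 24 sectors, and an effective cross-cylinder rate of about 147.7 MB/s, matching the worked answers above.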

Calculate size in Hex Bytes

What is the proper way to calculate the size in hex bytes of a code segment? I am given:
IP = 0848 CS = 1488 DS = 1808 SS = 1C80 ES = 1F88
The practice exercise I am working on asks what the size (in hex bytes) of the code segment is and gives these choices:
A. 3800 B. 1488 C. 0830 D. 0380 E. none of the above
The correct answer is A. 3800, but I haven't a clue as to how to calculate this.
How to calculate the length:
Note CS. Find the segment register whose value is nearest to it but greater.
Take the difference between the two, and multiply by 0x10 (read: tack on a 0).
In your example, DS is the nearest: 0x1808 - 0x1488 == 0x380, and 0x380 x 0x10 = 0x3800.
BTW, this only works on the 8086 and other, similarly boneheaded CPUs, and in real mode on x86. In protected mode on x86 (which is to say, unless you're writing a boot sector or a simple DOS program), the value of the segment register has very little to do with the size of the segment, and thus the stuff above simply doesn't apply.
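The real-mode arithmetic above is short enough to verify directly; each segment register counts 16-byte paragraphs, so the difference is scaled by 0x10:

```python
# Segment registers from the question (real-mode x86, 1 paragraph = 16 bytes).
CS, DS = 0x1488, 0x1808   # DS is the next-higher segment register after CS

size = (DS - CS) * 0x10   # difference in paragraphs, times 16 bytes each
print(hex(size))          # 0x3800
```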