Microcontroller to interface with Led Driver - embedded

My knowledge of Micro-controllers is fairly limited at this point, but here goes.
I have an Led Driver PT6959 which I'm trying to interface with. Data is read serially by the Driver IC on the input CLK rising edge once the STB input line goes low.
My question is, how do I know what the input CLK frequency should be?
Does it matter? Or should it be the same as the Led Driver OSC Pin frequency?
I've read the datasheet but can't find any reference to specifying an input CLK frequency.

If your microcontroller has a SPI port, connect as follows:
DIN <-- SPI-MOSI
CLK <-- SPI-CLK
STB <-- CS (often just a GPIO rather than a dedicated SPI chipselect)
The SPI peripheral will then do most of the work for you. Most SPI peripherals allow different combinations of polarity and phase known as modes:
Mode CPOL CPHA
0 0 0
1 0 1
2 1 0
3 1 1
The PT6959 operates in mode 3.
The clock rate is probably not be critical. If you are bit-banging it rather than using SPI, it need not even be periodic or fixed - it is the state of DIN on rising and falling edges that is critical - not the frequency. The device will have some maximum rate - the data sheet specifies this in terms of minimum mark/space widths of >=400ns, assuming a 50% mark:space, that would correspond to 1.25MHz, but there is little benefit in operating at the maximum speed.

I have found it finally here the bigger datasheet fourteen (14) pages, not three.
So the time constraints for this signal as below,
PW CLK (Clock Pulse Width) ≥ 400ns
t setup (Data Setup Time) ≥ 100ns
t CLK-STB (Clock - Strobe Time) ≥ 1μs
t TZH (Rise Time) ≤ 1μs
t TZL < 1μs
V1.7
PW STB (Strobe Pulse Width) ≥ 1μs
t hold (Data Hold Time) ≥ 100ns
t THZ (Fall Time) ≤ 10μ
fosc=Oscillation Frequency
t TLZ < 10μs
As you can see the minimum clock pulse can be as 400ns which means the maximum clock frequency can found as 1/(2x400x10-9) = 1250000Hz (1.25Mhz)
Other calculations you can do the same way. But, yes, it is everything needed better covered at time-diagrams, which are given in the document above. I place them here just in case the link can die one day.

Related

Long latency instruction

I would like a long-latency single-uop x861 instruction, in order to create long dependency chains as part of testing microarchitectural features.
Currently I'm using fsqrt, but I'm wondering is there is something better.
Ideally, the instruction will score well on the following criteria:
Long latency
Stable/fixed latency
One or a few uops (especially: not microcoded)
Consumes as few uarch resources as possible (load/store buffers, page walkers, etc)
Able to chain (latency-wise) with itself
Able to chain input and out with GP registers
Doesn't interfere with normal OoO execution (beyond whatever ROB, RS, etc, resources it consumes)
So fsqrt is OK in most senses, but the latency isn't that long and it seems hard to chain with GP regs.
1 On modern Intel x86 in particular, with bonus points if it also works well on AMD Zen*.
Mainstream Intel CPUs don't have any very long latency single-uop integer instructions. There are integer ALUs for 1-cycle latency uops on all ALU ports, and a 3-cycle-latency pipelined ALU on port 1. I think AMD is similar.
The div/sqrt unit is the only truly high-latency ALU, but integer div/idiv are microcoded on Intel so yes, use FP where div/sqrt are typically single-uop instructions.
AMD's integer div / idiv are 2-uop instructions (presumably to write the 2 outputs), with data-dependent latency.
Also, AMD Bulldozer/Piledriver (where 2 integer cores share a SIMD/FP unit) has pretty high latency for movd xmm, r32 (10c 2 uops) and movd r32, xmm (8c 1 uop). Steamroller shortens that by 1c each. Ryzen has 3-cycle 1 uop in either direction.
movd to/from XMM regs is cheap on Intel: single-uop with 1-cycle (Broadwell and earlier) or 2-cycle latency (Skylake). (https://agner.org/optimize/)
sqrtss has fixed latency (on IvB and later), other than maybe with subnormal inputs. If your chain-with-integer involves just movd xmm, r32 of an arbitrary integer bit-pattern, you might want to set DAZ/FTZ to remove the possibility of FP assists. NaN inputs are fine; that doesn't cause a slowdown for SSE/AVX math, only x87.
Other CPUs (Sandybridge and earlier, and all AMD) have variable-latency sqrtss so you probably want to control the starting bit-pattern there.
Same goes if you want to use sqrtsd for higher latency per uop than sqrtss. It's still variable latency even on Skylake. (15-16 cycles).
You can assume that the latency is a pure function of the input bit-pattern, so starting a chain of sqrtss instructions with the same input every time will give the same sequence of latencies. Or with a starting input of 0.0, 1.0, +inf, or NaN, you'll get the same latency for every uop in the sequence.
(Simple inputs like 1.0 and 0.0 (few significant figures in the input and output) presumably run with the lowest latency. sqrt(1.0) = 1.0 and sqrt(0) = 0, so these are self-perpetuating. Same for sqrt(NaN) = NaN)
You might use and reg, 0 or other non-dep-breaking zeroing as part of your chain to control the input bit-pattern. Or perhaps or reg, -1 to create NaN. Then you can get fixed latency on Sandybridge or earlier, and on AMD including Zen.
Or perhaps pinsrw xmm0, eax, 7 (2 uops for port 5 on Intel) to only modify the high qword of an XMM, leaving the bottom as known 0.0 or 1.0. Probably cheaper to just and with 0 and use movd, unless port-5 pressure is a non-issue.
To create a throughput bottleneck (not latency), your best bet on Skylake is vsqrtpd ymm - 1 uop for p0, latency = 15-16, throughput = 9-12.
On Broadwell and earlier, it was 3 uops (2p0 p15), but Skylake I think widened the SIMD divider (in preparation for AVX512 I guess).
vsqrtss might be somewhat better than fsqrt since it at least satisfies relatively easy chaining with GP registers (since GP <-> vector is just a movd away).

Does multiply by 1.0 take less time than usual multiplications

Does x86(64 too) processor optimise away the multiplication if one of the operands of multiplication happens to be 1.0?
PS:I do not mean compiler optimising a constant multiplication of 1.0.
That's not something I've seen mentioned in docs about instruction latencies or microarchitectures of Intel or AMD CPUs.
I suspect it doesn't happen, because variable latency would interfere with pipelined execution units. (multiple results coming out of the same execution unit in the same clock cycle = extra complexity). Also, there are probably other bits of logic (uop scheduling / queueing, result forwarding networks) that are designed around every uop having known latency. (except for special cases like division / sqrt).
IIRC, one analyst, maybe Agner Fog or David Kanter, suggested that some uops might have been possible to implement with 2 cycle latency, but instead take 3 cycles to match the other uops that their execution port can handle. So constant latency for operations appears to be a big deal for Intel CPU designs, to the extent that it was worth making an operation slower.
Note that we're only talking about latency here. If your multiply isn't part of a loop-carried dependency chain, or you have enough independent multiplies, you can keep the multiplier(s) going with one operation per clock.
Haswell CPUs can sustain a throughput of 2 FP vector multiplies per clock. (256b vectors of 4 doubles or 8 floats). Latency = 5 clock cycles for the result to be ready, regardless of input. Or 1 vector integer multiply per clock. (The vector multiply ALU is on port 0. The vector FP multipliers are on port 0 and port 1).
Avoid multiplying when you can, it leads to long dependency chains. (Usually this comes up for integer multiplies to calculate loop indices. Compilers do a lot better when you write your loop to increment the counter by 16, instead of multiplying i++ by 16 as an array index.)

How to achive higher PWM frequency?

I am using the libraries provided by C18 compiler to open and set the duty cycle for PWM usage. I noticed that the max PWM frequency I can get with 100% duty-cycle is about 13.5 KHz. The lower the duty-cycle the higher the PWM frequency. How can I achieve a higher PWM frequency with still 100% duty-cycle? Is it possible to at least get more than 13.5 KHz? I just can't figure out what I missing, maybe someone can help here, and I am using PIC18F87J1.
Here is the C18 C Compiler Libraries
Here is PIC18F87J1 datasheet
Here is a snippet of the code I am using regarding PWM.
TRISCbits.TRISC1 = 0;
OpenTimer2(TIMER_INT_OFF & T2_PS_1_1 & T2_POST_1_1);
OpenPWM2(0x03ff);
SetDCPWM2(255);
Your help is appreciated, thanks!
For a start, you have the parameters to the functions reversed. Open() takes a char value less than 256, and Set() takes a 10-bit number.
That said, you have chosen the largest value (255) which gives the lowest frequency. As the datasheet explains, the Open() function takes a value for the period as the parameter. A higher frequency is equivalent to a shorter period, and vice versa.
Lastly, why would you want a duty cycle of 100%? That is the same as having the pin always high. In that case, frequency doesn't matter at all. Just turn the pin on and don't use PWM at all.
You haven't said what you are driving with this PWM, but generally speaking, having the frequency too high can cause problems. It can produce radio interference, overheat, and so on.
Your question indicates you misunderstand the purpose of PWM and what the terms refers to, so here is a tl;dr.
PWM simulates a voltage between 0 and Vcc by rapidly turning a pin high and low. The simulated voltage is proportional to the time_high/(time_high + time_low). The percent of time the pin is at Vcc is called the duty cycle. (So 100% duty is always on, giving Vcc volts. 0% duty is always off, giving 0 V.)
The rate at which this on/off cycle repeats is called the PWM frequency. If the frequency is too small (the period too long) the load will see the pin voltage fluctuating. The goal is to run the PWM fast enough to smooth out the voltage the load sees, but not so fast as to cause other problems. The available frequencies are appropriate for most applications. Also note that setting the frequency high (period small) will also reduce the accuracy of the duty cycle. This is explained in the datasheet. The reason is basically that the duty cycle must ultimately be converted to clock ticks on versus clock ticks off. The faster the frequency, the fewer ways to divide the clock ticks in each cycle.

How to get UART to work in PIC32 with correct clock frequency and baud rate?

I am working on UART with pic32mx5xx. All I need is to send a message from pic to terminal (Putty), but it is not working as I would get invalid characters appearing. The baud rate is set to 19200, how do I calculate the clock frequency?
Is it true that the clock frequency of the UART is 16 times the baud rate. If I do the math the clock frequency should be 307200, but this is doesn't seem right.
Can someone help me understand how baud rate and clock frequency relate to each other ? Also how to calculate both?
Thanks!
The baud rate generator has a free-running 16-bit timer. To get the desired baud rate, you must configure its period register UxBRG and prescaler BRGH.
When BRGH is set to 0 (default), the timer is incremented every 16th cycle of peripheral bus clock.
When BRGH is 1, the timer increments every 4th cycle.
It is usually better to set BRGH to 1 to get a smaller baud rate error, as long as the UxBRG value doesn't grow too large to fit into the 16-bit register (on slower baud rates).
The value in the period register UxBRG determines the duration of one pulse on the data line in baud rate generator's timer increments.
See the formulas in section 21.3 - UART Baud Rate Generator in the reference manual to learn how to calculate a proper value for UxBRG.
To compute the period of the 16-bit baud rate generator timer to achieve the desired baud rate:
When BRGH = 0:
UxBRG = FPB / (16 * BAUDRATE) - 1
When BRGH = 1:
UxBRG = FPB / (4 * BAUDRATE) - 1
Where FPB is the peripheral bus clock frequency.
For example, if FPB = 20 MHz and BRGH = 1 and the desired baud rate 19200, you would calculate:
UxBRG = 20000000 / (4 * 19200) - 1
= 259
If you are using some of the latest development libraries and code examples from Microchip you may find that there are already UART methods in the libraries that will set up the PIC for your needs. If you dig deep into the new compiler directory structures you will find help files in the microsoft format (no fear, if you are on a Unix type computer there are Unix utilities that read these types of files.). There you can drill down into the help to find the documentation of various ready made methods you can call from your program to configure the PIC's hardware. Buyer Beware, the code is not that mature. For instance I was working on a PIC project that needed to sample two analog signals. The PIC hardware A/D converter was very complex. But it was clear the ready made code only covered about 10% of that PIC's abilities.
-good luck

Calculating the maximum physical rate (Nyquist performance limitation) of an ADC onboard a microcontroller

I'm trying to evaluate the maximum physical rate (Nyquist performance limit) of the A/Ds integrated on board various PIC microcontrollers.
However, to do the calculation requires parameters that I'm not finding explicitly stated in the datasheets, specifically Tacq, Fosc, TAD, and divisor parameters.
I've proceeded by making some assumptions but would be helpful to have a sanity check -- am I doing the maximum physical rate calculations correctly?
For illustration purposes only, I've taken the simplest possible PIC10F220 that has an ADC. This is to focus specifically on the interpretation of Tacq, Fosc, TAD, and divisor parameters, and not to suggest that any practical functionality could be implemented on this very basic chip. (This is to Clifford's points in the comments below.)
Calculation:
Nyquist Performance Analysis of PIC10F220
- Runs at clock speed of 8MHz.
- Has an instruction cycle of 0.5us [4 clock steps per instruction]
So:
- Get Tacq = 6.06 us [acquisition time for ADC, assuming chip temp. = 50*C]
[from datasheet p34]
- Set Fosc := 8MHz [? should this be internal clock speed ?]
- Set divisor := 4 [? assuming this is 4 from 4 clock steps per CPU instruction ?]
- This gives TAD = 0.5us [TAD = 1/(Fosc/divisor) ]
- Get conversion time is 13*TAD [from datasheet p31]
- This gives conversion time 6.5 us
- So ADC duration is 12.56 us [? Tacq + 13*TAD]
Assuming 10 instructions for a simple load/store/threshold done in real-time before the next sample (this is just a stub -- the point is the rest of the calculation):
- This adds another 5 us [0.5 us per instruction]
- To give total ADC and handling time of 17.56 us [ 12.56us + 1us + 4us ]
- before the sampling loop repeats [? Again Tacq ? + 13*TAD + handling ]
- If this is correct, then the max sampling rate is 56.9 ksps [ 1/ total time ]
- So the Nyquist frequency for this sampling rate is 28 kHz. [1/2 sampling rate]
Which means the (theoretical) performance of this system --- chip's A/D with the hypothetical real-time handling code --- is for signals that are bandlimited to 28 kHz.
Is this a correct assignment / interpretation of the data sheet in obtaining Tacq, Fosc, TAD, and divisor parameters and using them to obtain the maximum physical rate, or Nyquist performance limit, of this chip?
Thanks,
You're not going to be able to do much processing in 8 instructions, but assuming you're just doing something simple like storing the incoming samples to a buffer, or detecting a threshold, then your analysis looks good.
The actual chips I'm considering for the design are the dsPIC33FJ128MC804 (with 16b A/D) or dsPIC30F3014 (with 12b A/D).
That is an important distinction; the dsPIC ADC supports ping-pong DMA transfers of multiple channels simultaneously, so can minimise the effective software overhead per sample. That makes the calculation a somewhat different one. You need to determine from the sample rate and the DMA buffer size the time between sample buffer interrupts; that is how much processing time you have to deal with each buffer. If you are using Microchip's DSP library, it gives precise cycle time formulae for each algorithm, and block processing is considerably more efficient that sample-by-sample processing.
My last project was on a dsPIC33 with two channels sampled at 48KHz and 32word sample buffers (giving 667us to process each pair of buffers). The software processing was therefore entirely independent of the sampling since by using DMA they take place simultaneously.