What does TDO on 4th bit in ICSP SendCommand header mean? (PIC32MX, ICSP 2-wire 4-phase) - embedded

I'm trying to implement the flash programming specification for PIC32MX. I'm working with a PIC32MX512L and a PIC32MX512H. The PIC32MX512L must eventually transfer a program to the PIC32MX512H over its two wires PGEC2 and PGED2.
Right now I'm trying to execute the check device operation. As specified, I'm entering programming mode by MCLR-juggling and executing SetMode (6'b011111) on TMS while TDI stays low. The TAP controller replies with zeroes (every TDO is low).
After that I must execute SendCommand( MTAP_SW_MTAP ) to select the MTAP controller. The sequence to be shifted is
(header) 01 01 00 00_ | (data) 00 00 10 00 00 | (most sign. bit) 01 | (footer) 01 00
The first bit of each pair is TDI and the second is TMS. I write TDI on the first clock, TMS on the second clock, and read TDO during the third and fourth clocks. The sequence is fed from left to right. Shifted bits hold their value during each falling clock edge.
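The framing described above can be sketched in a few lines of Python. This is only an illustration of the pair ordering, not the asker's implementation: the function names are made up, and the 5-bit command width and the value 0x04 in the usage note are assumptions, so adjust `nbits` if your sequence differs.

```python
def send_command_pairs(command, nbits=5):
    """Return the (TDI, TMS) pairs for one SendCommand, in shift order.

    nbits is an assumed command width; it is a parameter on purpose.
    """
    header = [(0, 1), (0, 1), (0, 0), (0, 0)]           # TMS = 1, 1, 0, 0
    bits = [(command >> i) & 1 for i in range(nbits)]   # least significant bit first
    body = [(b, 0) for b in bits[:-1]]                  # all but the last bit: TMS low
    msb = [(bits[-1], 1)]                               # most significant bit: TMS high
    footer = [(0, 1), (0, 0)]                           # TMS = 1, 0
    return header + body + msb + footer


def four_phase(pairs):
    """Expand each pair into the four clock phases used in the question."""
    phases = []
    for tdi, tms in pairs:
        phases.append(("write TDI", tdi))
        phases.append(("write TMS", tms))
        phases.append(("read TDO", None))
        phases.append(("read TDO", None))
    return phases
```

Calling `send_command_pairs(0x04)` yields 11 pairs, and `four_phase` turns them into 44 clock phases. Note that the last pair of the header is the point where the first TDO bit of the shift can already appear.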
The issue
After shifting the first 4 pairs, the TDO line goes high on the fourth pair (on the third clock) and low again at the end of that 4-phase group (on the fourth clock). I've marked this spot with an underscore in the sequence above. After that the controller ignores any further commands. On the next SendCommand( MTAP_COMMAND ), TDO stays low, and later for XferData( MCHP_STATUS ) TDO still stays low, no matter how often I send the command.
I've done a small screenshot from my oscilloscope. The blue line is the clock, the green one is the data. The hop on the right is what I mean.
The question
Does anyone know what the TAP controller is trying to tell me with that TDO high on the fourth phase?
Thank you in advance!

Well, I've fixed it. Generally the last TDO of the prologue is the first (least significant) bit of the output. For SendCommand it has no meaning, but for XferData and XferFastData it is important.
For XferFastData it is the PrAcc bit according to the spec. If the bit is zero, you should repeat the whole operation. But beware: the MCU implementation doesn't follow the spec. If you really restart the whole operation for FastData when PrAcc is zero, it won't work. Instead just ignore the bit and proceed with writing. I found this out eventually by trial and error and by comparing my XferFastData implementation against pic32prog.

Related

How to test Bit-Banged communication's assembly routines

For one MCU I have written some assembly routines performing RX and TX of a proprietary protocol (UART-based) in a bit-bang fashion. How can I test them?
TX might be tested by sending data, and at the same time, with the help of a logic analyzer, checking that all the sampled timings are correct (manually or with some scripts).
RX on the other hand is more difficult. On one hand I can check if I'm receiving what someone else is sending, but on the other hand how do I know that the RX sampling is happening correctly (timing-wise)?
For example, my RX routine may return the correct data by sampling at the edge of the "bit window" instead of the middle.
I thought about toggling a "debug pin" to indicate when the sampling is actually happening, but this introduces delays in the sampling procedure, hence I wouldn't be testing my original routine.
Some things worth clarifying after reading comments:
I know that hardware UART is better (it depends, though), but I can't use it. This is not a matter of "have you tried this ...?";
I know how to do the bit banging (I have already written the assembly routines);
I can't connect TX to RX because I'm only using 1 wire (the communication is half-duplex);
I'm asking how to test the RX sampling timings, not how to implement UART.
"I thought about toggling a "debug pin" to indicate when the sampling is actually happening, but this introduces delays in the sampling procedure, hence I wouldn't be testing my original routine."
Test with the instrumentation code, and then leave the instrumentation - or near-equivalent code that doesn't actually twiddle hardware - in place.
You'll need something to send data to the MCU, perhaps a second MCU. I've worked on similar code for both 6502 and Z80 for old 8 bit Atari peripherals. These are half duplex protocols, so whenever the device is idle, it's polling for a start bit. After detecting a start bit, it delays 1.5 bit times, then receives 8 bits, with 1 bit time between bits. Both receiving and sending of data routines are coded to get exact cycle counts for timing. These were old devices, and even the fastest bit rate was relatively slow at 19 microseconds per bit ~= 52600 baud.
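As a rough illustration of the sampling scheme just described (wait 1.5 bit times after the start-bit edge, then one bit time per bit), the ideal sample instants can be computed like this. The function name is hypothetical, and the 19 microsecond bit time is simply the figure quoted above.

```python
def sample_times_us(bit_time_us, data_bits=8):
    """Ideal sample instants, in microseconds after the start-bit edge.

    The first data bit is sampled 1.5 bit times after the edge (middle of
    bit 0); each later bit follows exactly one bit time after the previous.
    """
    return [(1.5 + i) * bit_time_us for i in range(data_bits)]


BIT_TIME_US = 19.0                     # figure quoted in the answer
times = sample_times_us(BIT_TIME_US)   # first sample at 28.5 us
baud = 1_000_000 / BIT_TIME_US         # roughly 52.6 kbaud
```

Comparing logic-analyzer timestamps of a debug pin (or of a TX routine with identical cycle counts) against this list shows how far each real sample sits from the middle of its bit.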
The question has been updated. If the input and output instructions take the exact same time to run (cycle count), you could modify the receive code to transmit data to verify the bit time, and confirm exactly how fast the processor is running.
For the timing regarding sensing the start bit and doing a 1.5 bit time wait, you'd have to calculate the minimum and maximum number of cycles to sense the start bit. The maximum cycle count would be an input instruction that just misses the trailing edge of the start bit, the test instruction, and the loop back to the input, followed by another test and then a fall through the loop to continue the receive. The minimum cycle count would be an input that just barely catches the leading edge of the start bit, does a test, then falls through the loop. Then the remainder of the receive code needs to sample as close as possible to the middle of the data bit periods.
Here is an example of code for a 4 MHz Z80 that receives data at 19 microseconds == 76 cycles per data bit. The comments include the cycle count for each instruction. The ideal wait time from start bit to 1st data bit is 114 cycles. The min,max cycle time for the start bit loop is 20,50 cycles. An additional delay plus the input of the first data bit, 79 cycles in total, is used, so the min,max cycle time from sensing the start bit to receiving the 1st data bit is 99,129 cycles, within the min,max bounds of 76,152 cycles. The remaining data bits are read at exactly 76 cycles per bit.
        LD      E,0             ;SET UP
;                               ; START BIT TO DATA BIT=114
NRXF0:  LD      A,(FBS)         ;(13) WAIT FOR START BIT
        AND     FBSRXD          ;(7)
        JP      NZ,NRXF0        ;(10)
;                               ; NOTE: 20 MIN, 50 MAX, 35 AVG
        EX      (SP),HL         ;(19) DELAY
        EX      (SP),HL         ;(19)
        LD      A,(HL)          ;(7)
NRXF1:  LD      A,(HL)          ;(7)
        LD      A,(HL)          ;(7)
        LD      D,8             ;(7) 8 BITS PER BYTE
;                               ; 76 CYCLES PER DATA BIT
NRXF2:  LD      A,(FBS)         ;(13) GET DATA BIT
        AND     FBSRXD          ;(7)
        ADD     A,0FFH          ;(7)
        RR      C               ;(8)
        PUSH    BC              ;(11) DELAY
        POP     BC              ;(10)
        NOP                     ;(4)
        DEC     D               ;(4) LP TIL BYTE DONE
        JR      NZ,NRXF2        ;(12/7)
        RET     NZ              ;(5) DELAY
NRXF4:  LD      A,(FBS)         ;(13) WAIT FOR NEXT START BIT
        AND     FBSRXD          ;(7)
        JP      NZ,NRXF4        ;(10)
;                               ; START BIT TO DATA BIT=114
        LD      (HL),C          ;(7) STORE BYTE
        LD      A,C             ;(4) DO CKSUM
        ADD     A,E             ;(4)
        ADC     A,0             ;(7)
        LD      E,A             ;(4)
        INC     HL              ;(6) ADV ADR
        DJNZ    NRXF1           ;(13/8) LP IF MORE BYTES
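The cycle arithmetic behind the listing can be checked with a short Python sketch. The constant and function names are invented for illustration; the cycle counts are the ones given in the listing's comments, and the 20/50 loop bounds are taken from its note rather than derived here.

```python
CYCLES_PER_BIT = 76                 # 19 us per bit at 4 MHz (0.25 us per cycle)
IDEAL_FIRST_SAMPLE = CYCLES_PER_BIT * 3 // 2   # 1.5 bit times after the start edge

LOOP_MIN, LOOP_MAX = 20, 50         # start-bit polling loop, per the listing's note

# EX (SP),HL twice, LD A,(HL) three times, LD D,8, then the LD A,(FBS) that samples
DELAY_TO_FIRST_SAMPLE = 19 + 19 + 7 + 7 + 7 + 7 + 13

# One pass of the NRXF2 loop with the JR taken
DATA_BIT_LOOP = 13 + 7 + 7 + 8 + 11 + 10 + 4 + 4 + 12


def first_sample_window():
    """Earliest and latest cycle, after the start edge, of the first data sample."""
    return LOOP_MIN + DELAY_TO_FIRST_SAMPLE, LOOP_MAX + DELAY_TO_FIRST_SAMPLE


def inside_first_data_bit(cycles):
    """True if a sample at this cycle count still falls within data bit 0."""
    return CYCLES_PER_BIT < cycles < 2 * CYCLES_PER_BIT


earliest, latest = first_sample_window()
error_cycles = (earliest - IDEAL_FIRST_SAMPLE, latest - IDEAL_FIRST_SAMPLE)
```

With these numbers the first sample lands between 99 and 129 cycles, that is 15 cycles (3.75 microseconds) either side of the ideal 114, and every later bit is exactly 76 cycles after the previous one.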

measuring time between two rising edges in beaglebone

I am reading a sensor's output as a square wave (0-5 V) on an oscilloscope. Now I want to measure its frequency with a BeagleBone, so I need to measure the time between two rising edges. However, I don't have any experience working with the BeagleBone. Can you give some advice or sample code for measuring the time between rising edges?
How deterministic do you need this to be? If you can tolerate some inaccuracy, you can probably do it on the main Linux OS; if you want to be fancy pants, this seems like a potential use case for the BBB's PRUs (which I unfortunately haven't used, so take this with substantial amounts of salt). I would expect you'd be able to write PRU code that sits in an infinite outer loop and, inside that loop, loops until the pin shows 0, then loops until the pin shows 1 (this is the first rising edge), then counts until either the pin shows 0 again (the falling edge) or, with another loop, until the next rising edge. Either way, you could take the counter value and convert it directly into time: the PRU is stated as having a fixed execution time for each instruction and runs at 200 MHz (5 ns per instruction). Assuming your loop is something like
#starting with pin low
inner loop 1:
registerX = loadPin
increment counter
jump if zero registerX to inner loop 1
# pin is now high
inner loop 2:
registerX = loadPin
increment counter
jump if one registerX to inner loop 2
# pin is now low again
That should take 3 instructions per counter increment, so you can get the time as 3 * counter * 5 ns.
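On the Linux side, turning the counter into a period and a frequency is a one-line calculation. The sketch below assumes a 200 MHz PRU clock (one instruction every 5 ns), three instructions per counter increment as in the loop above, and a counter that covers one full period of the signal; the names are made up for illustration.

```python
PRU_CLOCK_HZ = 200_000_000       # assumed PRU clock: 5 ns per instruction
INSTRUCTIONS_PER_COUNT = 3       # load pin, increment, conditional jump


def counter_to_seconds(counter):
    """Convert a loop counter value into elapsed time in seconds."""
    return counter * INSTRUCTIONS_PER_COUNT / PRU_CLOCK_HZ


def counter_to_hz(counter):
    """Frequency of the signal, if the counter spans exactly one period."""
    return 1.0 / counter_to_seconds(counter)
```

For example, a counter value of 1000 corresponds to 15 microseconds, so the timing resolution of this loop is 15 ns per count. Check the result against a known reference signal before trusting it.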
As suggested by Foon in his answer, the PRUs are a good fit for this task (although depending on your requirements it may be fine to use the ARM processor and standard GPIO). Please note that (as far as I know) both the regular GPIOs and the PRU inputs are based on 3.3V logic, and connecting a 5V signal might fry your board! You will need an additional component or circuit to convert from 5V to 3.3V.
I've written a basic example that measures timing between rising edges on the header pin P8.15 for my own purpose of measuring an engine's rpm. If you decide to use it, you should check the timing results against a known reference. It's about right but I haven't checked it carefully at all. It is implemented using PRU assembly and uses the pypruss python module to simplify interfacing.

usbmon, the usb spec and endianness/byte-order

I am trying to decipher a trace of USB I/O traffic produced by usbmon and am having some issues getting my head around the endianness. For the sake of example, here are two lines from the trace I am working with:
ffff8800650e7000 433121059 S Ci:2:000:0 s 80 06 0100 0000 0040 64 <
ffff8800650e7000 433121661 C Ci:2:000:0 0 18 = 12010002 00000040 da0b8781 00010102 0301
I initially had no suspicion whatsoever of anything other than big-endianness in the trace, but then I saw da0b8781 in the second line, which corresponds to the identity of the USB device I am tracing which has a vendor ID of 0x0bda and product ID of 0x8187 (note the reversal of byte-order in the trace).
So at this point I thought that maybe within a given field of a usbmon trace, the bytes were always in reverse byte order and should be interpreted as such. But to the contrary, let's examine a small part near the end of the first trace line, ... 0040 64
0040 is a hex field representing the maximum accepted response size. 64 is a decimal field that should represent exactly the same thing. 0x0040 = 64 decimal, without switching the byte order to 0x4000, which would then != 64 decimal. So it's at this point I started to get a bit uncertain about the byte-order of the different parts of the usbmon trace.
Next I thought, maybe it's just the data portions of the usbmon trace that are in reverse byte order. So I thought perhaps I should really be reading
...12010002 00000040 da0b8781 00010102 0301
as
1030 20101000 1878b0ad 04000000 20001021...
Nope, that doesn't seem to be right either. The USB Specification states that the vendor Id (0x0bda in my case) should be at byte offset 8 for this particular string. If we leave the above string in its original order, then the vendor Id does start at byte offset 8 (12010002 00000040 consumes the first 8 bytes), but if we reverse it as I have above, then it starts at byte offset 6 (1030 20101000 only consumes the first 6 bytes).
So my best guess now is that usbmon displays everything big-endian, except that it switches to reverse byte order within each 4-byte word, but for data only. Can anyone offer some clarification on whether this is correct, or whether there may be something else I'm missing?
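One way to test such a guess is to decode the data words directly. The sketch below assumes, following the USB specification, that multi-byte descriptor fields are little-endian, and that usbmon prints the data as a plain byte stream merely split into 4-byte groups for readability. The field names follow the standard device descriptor layout; the function name is hypothetical.

```python
import struct

# Data words copied from the second trace line in the question
DATA_WORDS = "12010002 00000040 da0b8781 00010102 0301"

FIELDS = (
    "bLength", "bDescriptorType", "bcdUSB", "bDeviceClass",
    "bDeviceSubClass", "bDeviceProtocol", "bMaxPacketSize0",
    "idVendor", "idProduct", "bcdDevice",
    "iManufacturer", "iProduct", "iSerialNumber", "bNumConfigurations",
)


def parse_device_descriptor(words):
    """Treat the words as a byte stream and unpack little-endian fields."""
    raw = bytes.fromhex(words.replace(" ", ""))
    values = struct.unpack("<BBHBBBBHHHBBBB", raw)   # 18 bytes in total
    return dict(zip(FIELDS, values))


descriptor = parse_device_descriptor(DATA_WORDS)
vendor = hex(descriptor["idVendor"])     # '0xbda'
product = hex(descriptor["idProduct"])   # '0x8187'
```

Under these assumptions nothing is reversed per word: the bytes are in wire order, the vendor ID sits at byte offset 8, and only the interpretation of each 16-bit field is little-endian. Setup-packet fields such as 0040 are printed as already-decoded values, which is why they read naturally.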
May be a bit late for you but I've tried usbmon (and found it OK)
you may want to take a look at evtest
http://www.freedesktop.org/wiki/Evtest

Why does COBOL have both `SECTION` and `PARAGRAPH`?

Can anybody explain why the designers of COBOL created both SECTIONs and PARAGRAPHs? These have been around since the initial release of COBOL so I suspect the real reason for their existence has long since gone away (similar to things like NEXT SENTENCE which are still in the language specification for backward compatibility but no longer required since the introduction of explicit scope terminators).
My guess is that SECTION may have been introduced to support program overlays. SECTION has an optional PRIORITY number associated with it to identify the program overlay it is part of. However, most modern implementations of COBOL ignore or have dropped PRIORITY numbers (and overlays).
Currently, I see that SECTIONs are still required in the DECLARATIVE part of the PROCEDURE DIVISION, but can find no justification for this. I see no semantic difference between SECTION and PARAGRAPH other than PARAGRAPH is subordinate to SECTION.
Some COBOL shops ban the use of SECTION in favour of PARAGRAPH (seems common in North America). Others ban PARAGRAPH in favour of SECTION (seems common in Europe). Still others have guidelines as to when each is appropriate. All of this seems highly arbitrary to me - which begs the question: Why were they put into the language specification in the first place? And, do they have any relevance today?
If you answer this question, it would be great if you could also point to a reference to support your answer.
Thanks
No references on this, since I heard it passed on to me from one of the old timers in my shop but...
In the old COBOL compilers, at least for IBM and Unisys, sections were able to be loaded into memory one at a time. Back in the good old days when memory was scarce, a program that was too large to be loaded into memory all at once was able to be modularized for memory usage using sections. Having both sections and paragraphs allowed the programmer to decide which code parts were loaded into memory together if they couldn't all be loaded at once - you'd want two parts of the same perform loop loaded together for efficiency's sake. Nowadays it's more or less moot.
My shop uses paragraphs only, prohibits GOTO and requires exit paragraphs, so all our PERFORMS are PERFORM 100-PARAGRAPH THRU 100-EXIT or something similar - which seems to make the paragraphs more like sections to me. But I don't think that there's really much of a difference now.
I learned COBOL around 1978, on an ICL 2903. I have a vague memory that the SECTION headers could be assigned a number range, which meant that those SECTION headers could be swapped in and out of memory, when the program was too large for memory.
I know this is an old question, but the OP asked for documentation on the original justification for having SECTION as well as PARAGRAPH in COBOL.
You can't get much more "original" than the CODASYL Journal documentation.
In section 8 of the Journal's specification for the language:
"COBOL segmentation is a facility that provides a means by which the
user may communicate with the compiler to specify object program
overlay requirements"
( page 331, section 8.1 "Segmentation - General Description")
"Although it is not mandatory, the Procedure Division for a source
program is usually written as a consecutive group of sections, each of
which is composed of a series of closely related operations that are
designed to collectively perform a particular function. However, when
segmentation is used, the entire Procedure Division must be in
sections. In addition, each section must be classified as belonging
either to the fixed portion or to one of the independent segments of
the object program. Segmentation in no way affects the need for
qualification of procedure-names to insure uniqueness."
(p 331, section 8.1.2.1 "Program Segments")
In her book on comparative programming languages ("Programming Languages: History and Fundamentals", 1969) Jean Sammet (who sat on the CODASYL committee, representing Sylvania Electric) states:
".. The storage allocation is handled automatically by the compiler.
The prime unit for allocating executable code is a group of sections
called a segment. The programmer combines sections by specifying a
priority number with each section's name. ... The compiler is required
to see that the proper control transfers are provided so that control
among segments which are not stored simultaneously can take place.
..."
(p 369 - 371 V.3 COBOL)
Well, the simplest of the reasons is that SECTIONs provide you with "modularity", just as functions do in C, a necessity in "structured" programs. You will notice that code written using SECTIONs appears far more readable than code written just in paragraphs, because every section has to have an "EXIT": a sole and very explicit exit point from a SECTION (the exit point of a paragraph is far more vague and implicit, i.e. it runs until a new paragraph declaration is found). Consider this example and you may be tempted to use sections in your code:
      *==================
       MAINLINE SECTION.
      *==================
           PERFORM SEC-A
           PERFORM SEC-B
           PERFORM SEC-C
           GOBACK.
      *==================
       MAINLINE-EXIT.
      *==================
           EXIT.
      *==================
       SEC-A SECTION.
      *==================
           .....
           .....
           .....
           .....
           IF <cond>
               go to A-EXIT
           end-if
           .....
           .....
           .....
           .....
           .
      *==================
       A-EXIT.
      *==================
           EXIT.
Don't think you would have this sort of privilege when writing your code in paragraphs. You might have had to write a huge ELSE statement to cover the statements you didn't want to execute when a certain condition is reached (consider that set of statements running across 2-3 pages... a further set of IF / ELSE would cramp you up for indentation). Of course, you'll have to use "GO TO" to achieve this, but you can always direct your professionals not to use GO TOs except when exiting, which is a fair deal, I think.
So, whilst I also agree that anything that can be written using SECTIONs can also be written using paragraphs (with little or no tweaks), my personal choice would be to go for an implementation that can make the job of my developers a little easier in future!
Cobol was developed at the end of the 1950s. As the full name suggests, it was developed for business programming, as a language more relevant to business purposes than the existing "scientific" or "technical" languages. There were very few "languages" anyway; the alternative was "machine code" (specific, of course, to a particular architecture; I nearly said "specific chip", before thinking of vacuum tubes), which might have to be set through physical switches or dials on some machines, with an "Assembler" if you were lucky. Cobol was very advanced for its day, for its purpose.
The intention was for programs written in Cobol to be much more like English-language than just a set of "codes" which mean something to the initiated.
If you look at some of the nomenclature relating to the language - paragraph, sentence, verb, clause - it is deliberately following the patterns ascribed to the English language.
SECTION doesn't quite fit into this, until you relate things to a formal business document.
Both SECTIONs and paragraphs also appear outside the PROCEDURE DIVISION. As in written English, paragraphs can exist on their own, or can be a part of a SECTION.
SECTIONs may have a priority-number which relates to the "segmentation feature". This used to include "overlaying" of SECTIONs to afford a primitive level of memory management. This is a "computing feature" rather than an English-language one :-) The "segmentation feature" does have something of a remaining effect, but I've never seen it actually used.
Without DECLARATIVES (which I don't use, and have just noticed the manual to be unclear upon) then it is "choice" as to whether SECTIONs or paragraphs are used for PERFORM.
If GO TO is used, rationally, "equivalence" can be achieved with PERFORM ... THRU .... If not, and there is no gratuitous use of PERFORM ... THRU ..., then there is equivalence already.
Comparisons to "structured" code and modern languages are "reading history backwards" or just outlining a particular "practice". From the reputation attained by "spaghetti code" and ALTER ... TO PROCEED TO ... it may well be that for 20 years it was "common" to not do much with PERFORM unless you needed the "memory management", but I have no references or knowledge to back this up.
SECTIONs allow duplicate paragraph-names, otherwise paragraph-names must be unique.
I can't put a specific finger on one over the other all the time.
If using GO TO, I'd use SECTIONs. If not, paragraphs. With DECLARATIVES I'd use SECTIONs. If using SECTIONs I'd start PROCEDURE DIVISION with a SECTION to avoid a diagnostic message.
Local standards may dictate, but not necessarily on a "modern" (or even "rational") basis. Much is "known" but actually misunderstood about SECTIONs and paragraphs, in my experience.
For performance (where masses of data is being processed, and I mean masses) then a PERFORM of one SECTION rather than multiple individual paragraphs would see improvements. The effect would be the same with PERFORM ... THRU ..., but I prefer not to recommend it.
GO TO outside the range of a PERFORM 1) is bad and 2) can lose out on "optimization". It shouldn't be a problem except when you GO TO an abend/exception routine and do not expect any logical return. If an "immediate" exit like this is felt to be necessary, then it is better done with a PERFORM despite the "counter-intuitive" aspect (so document it).
For one thing, paragraph names must be unique unless they are in separate sections, so sections allow for "namespacing" of paragraphs.
If I recall correctly, the only reason you must use a SECTION is for DECLARATIVES. Aside from that they are optional and primarily useful for grouping paragraphs. I think it's common (relatively speaking, anyway) to require that PERFORM be used on paragraphs only when they are in the same section.
A section can have several paragraphs in it. When you PERFORM a section, it executes all the paragraphs in the section. Within the section you can use PERFORM or GOTO to branch to the paragraphs within the section.
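The two points made above (paragraph names only need to be unique within a section, and PERFORMing a section runs every paragraph in it) can be pictured with a toy model in Python. This is an analogy only, not COBOL semantics: it ignores GO TO, fall-through and PERFORM ... THRU, and all names in it are invented.

```python
def make_paragraph(label):
    """Build a stand-in paragraph that just records that it ran."""
    def body(log):
        log.append(label)
    return body


# Sections act as namespaces: both contain paragraphs named WORK and DONE.
PROGRAM = {
    "SEC-A": {"WORK": make_paragraph("SEC-A/WORK"),
              "DONE": make_paragraph("SEC-A/DONE")},
    "SEC-B": {"WORK": make_paragraph("SEC-B/WORK"),
              "DONE": make_paragraph("SEC-B/DONE")},
}


def perform_section(program, section, log):
    """Model of PERFORM on a section: run all its paragraphs in source order."""
    for body in program[section].values():
        body(log)


def perform_paragraph(program, section, paragraph, log):
    """Model of PERFORM on one paragraph, qualified by its section."""
    program[section][paragraph](log)


log = []
perform_section(PROGRAM, "SEC-A", log)            # runs WORK, then DONE
perform_paragraph(PROGRAM, "SEC-B", "WORK", log)  # runs only SEC-B's WORK
```

After these two calls `log` holds `SEC-A/WORK`, `SEC-A/DONE` and `SEC-B/WORK`, in that order.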
I will do the best I can to answer this. If your only coding exposure is x86 or ARM then you will have significant difficulty. Yes those chips sell a lot but that doesn't mean they are good, just cheap enough people don't mind throwing them away.
Much of this information can be found in "The Minimum You Need to Know to Be an OpenVMS Application Developer." You will find it is one of the scant few titles on Dr. Dobb's recommended reading list for all developers. Yes, I wrote it. It is also the book recommended by HP OpenVMS Engineering group for developers looking to learn the platform.
My COBOL on that platform mostly happened during the 1980s when it was VAX/VMS. Then it became OpenVMS; Alpha/OpenVMS; Itanium/OpenVMS; and soon to be x86/OpenVMS. On a real computer with a real operating system, sections have meaning. Every section created a PSECT. In linker terms that was short for Program SECtion. Based on what the section was, various load attributes were set. Each PSECT would be loaded into one or more 512-byte memory pages. Memory pages were designed to be the exact same size as a disk block. VMS stood for Virtual Memory System. IBM had several of their own operating systems which, under the hood, were different, but they too were true virtual memory systems. This wasn't "overlay linking." That's an x86 term and came about due to severe architectural flaws. Read up on Compact, Small, Medium, and Large "memory models" from the 286 days on forward. Also read up on EMS and XMS memory paging. Oiy was THAT fun!
Here is one of the numerous programs found in that book.
IDENTIFICATION DIVISION.
PROGRAM-ID. COB_ZILL_DUE_REPORT_SUB.
AUTHOR. Roland Hughes.
DATE-WRITTEN. 2005-02-08.
DATE-COMPILED. TODAY.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT DRAW-STATS
ASSIGN TO 'DRAWING_STATS'
ORGANIZATION IS INDEXED
ACCESS MODE IS SEQUENTIAL
RECORD KEY IS ELM_NO IN DSTATS-REC
LOCK MODE IS AUTOMATIC
FILE STATUS IS D-STAT.
SELECT MEGA-STATS
ASSIGN TO 'MEGA_STATS'
ORGANIZATION IS INDEXED
ACCESS MODE IS SEQUENTIAL
RECORD KEY IS ELM_NO IN MSTATS-REC
LOCK MODE IS AUTOMATIC
FILE STATUS IS M-STAT.
SELECT SORT-FILE ASSIGN TO 'TMP.SRT'.
SELECT SORTED-FILE ASSIGN TO DISK.
SELECT RPT-FILE ASSIGN TO 'ZILL_DUE.RPT'.
DATA DIVISION.
FILE SECTION.
FD DRAW-STATS
IS GLOBAL
LABEL RECORDS ARE STANDARD.
COPY 'CDD_RECORDS.ZILLIONARE_STATS_RECORD' FROM DICTIONARY
REPLACING ZILLIONARE_STATS_RECORD BY DSTATS-REC.
FD MEGA-STATS
IS GLOBAL
LABEL RECORDS ARE STANDARD.
COPY 'CDD_RECORDS.ZILLIONARE_STATS_RECORD' FROM DICTIONARY
REPLACING ZILLIONARE_STATS_RECORD BY MSTATS-REC.
FD RPT-FILE
LABEL RECORDS ARE OMITTED.
01 RPT-DTL PIC X(80).
SD SORT-FILE.
COPY 'CDD_RECORDS.ZILLIONARE_STATS_RECORD' FROM DICTIONARY
REPLACING ZILLIONARE_STATS_RECORD BY SORT-REC.
FD SORTED-FILE
VALUE OF ID IS SORTED-FILE-NAME.
COPY 'CDD_RECORDS.ZILLIONARE_STATS_RECORD' FROM DICTIONARY
REPLACING ZILLIONARE_STATS_RECORD BY SORTED-REC.
Data declarations
WORKING-STORAGE SECTION.
01 CONSTANTS.
05 SORT-FILE-NAME PIC X(7) VALUE 'TMP.SRT'.
05 SORTED-FILE-NAME PIC X(8) VALUE 'STAT.SRT'.
01 STATUS-VARIABLES.
05 M-STAT PIC X(2).
05 D-STAT PIC X(2).
05 EOF-FLAG PIC X.
88 IT-IS-END-OF-FILE VALUE 'Y'.
01 STUFF.
05 TODAYS-DATE.
10 TODAY_YYYY PIC X(4).
10 TODAY_MM PIC X(2).
10 TODAY_DD PIC X(2).
05 TODAYS-DATE-FORMATTED.
10 FMT_MM PIC Z9.
10 FILLER PIC X VALUE '/'.
10 FMT_DD PIC 99.
10 FILLER PIC X VALUE '/'.
10 FMT_YYYY PIC 9(4).
05 FLT-1 COMP-2.
05 WORK-STR PIC X(65).
01 REPORT-DETAIL.
05 ELM-NO-DTL PIC Z9.
05 FILLER PIC X(3).
05 HIT-COUNT-DTL PIC ZZZ9.
05 FILLER PIC X(3).
05 SINCE-LAST-DTL PIC ZZZ9.
05 FILLER PIC X(5).
05 PCT-HITS-DTL PIC Z9.999.
05 FILLER PIC X(4).
05 AVE-BTWN-DTL PIC ZZ9.999.
01 REPORT-HDR1.
05 THE-DATE PIC X(12).
05 FILLER PIC X(20).
05 PAGE-TITLE PIC X(17).
01 REPORT-HDR2.
05 FILLER PIC X(33).
05 GROUP-TITLE PIC X(20).
01 REPORT-HDR3.
05 HDR3-TXT PIC X(40) VALUE
'No Hits Since Pct_hits Ave_btwn'.
01 REPORT-HDR4.
05 HDR4-TXT PIC X(40) VALUE
'-- ---- ----- -------- --------'.
PROCEDURE DIVISION.
A000-MAIN.
PERFORM B000-HSK.
SORT SORT-FILE
ON DESCENDING KEY SINCE_LAST IN SORT-REC
INPUT PROCEDURE IS S000-DSTAT-INPUT
GIVING SORTED-FILE.
PERFORM B010-REPORT-DRAWING-NUMBERS.
STRING SORT-FILE-NAME, ';*' DELIMITED BY SIZE INTO WORK-STR.
CALL 'LIB$DELETE_FILE' USING BY DESCRIPTOR WORK-STR.
STRING SORTED-FILE-NAME, ';*' DELIMITED BY SIZE INTO WORK-STR.
CALL 'LIB$DELETE_FILE' USING BY DESCRIPTOR WORK-STR.
*
* Set up for second part of report
*
MOVE SPACES TO RPT-DTL.
WRITE RPT-DTL BEFORE ADVANCING PAGE.
MOVE SPACES TO EOF-FLAG.
MOVE ' Mega Drawing Numbers' TO GROUP-TITLE.
SORT SORT-FILE
ON DESCENDING KEY SINCE_LAST IN SORT-REC
INPUT PROCEDURE IS S001-MSTAT-INPUT
GIVING SORTED-FILE.
PERFORM B010-REPORT-DRAWING-NUMBERS.
STRING SORT-FILE-NAME, ';*' DELIMITED BY SIZE INTO WORK-STR.
CALL 'LIB$DELETE_FILE' USING BY DESCRIPTOR WORK-STR.
STRING SORTED-FILE-NAME, ';*' DELIMITED BY SIZE INTO WORK-STR.
CALL 'LIB$DELETE_FILE' USING BY DESCRIPTOR WORK-STR.
CLOSE RPT-FILE.
CALL 'LIB$SPAWN' USING BY DESCRIPTOR 'EDIT/READ ZILL_DUE.RPT'.
EXIT PROGRAM.
Paragraph to initialize our data and files.
B000-HSK.
CALL 'COB_FILL_IN_LOGICALS'.
MOVE SPACES TO STATUS-VARIABLES.
ACCEPT TODAYS-DATE FROM DATE YYYYMMDD.
MOVE TODAY_YYYY TO FMT_YYYY.
MOVE TODAY_DD TO FMT_DD.
MOVE TODAY_MM TO FMT_MM.
OPEN OUTPUT RPT-FILE.
MOVE SPACES TO REPORT-HDR1.
MOVE TODAYS-DATE-FORMATTED TO THE-DATE.
MOVE 'Due Number Report' to PAGE-TITLE.
MOVE SPACES TO REPORT-HDR2.
MOVE 'Drawing Numbers' TO GROUP-TITLE.
Paragraph to process the sorted selection file and
create the portion of the report relating to drawing
numbers.
B010-REPORT-DRAWING-NUMBERS.
MOVE SPACES TO EOF-FLAG.
OPEN INPUT SORTED-FILE.
READ SORTED-FILE
AT END SET IT-IS-END-OF-FILE TO TRUE.
PERFORM C010-DRAWING-HEADINGS.
PERFORM UNTIL IT-IS-END-OF-FILE
MOVE SPACES TO REPORT-DETAIL
MOVE ELM_NO IN SORTED-REC TO ELM-NO-DTL
MOVE HIT_COUNT IN SORTED-REC TO HIT-COUNT-DTL
MOVE SINCE_LAST IN SORTED-REC TO SINCE-LAST-DTL
MOVE PCT_HITS IN SORTED-REC TO PCT-HITS-DTL
MOVE AVE_BTWN IN SORTED-REC TO AVE-BTWN-DTL
MOVE REPORT-DETAIL TO RPT-DTL
WRITE RPT-DTL BEFORE ADVANCING 1 LINE
READ SORTED-FILE
AT END SET IT-IS-END-OF-FILE TO TRUE
END-READ
END-PERFORM.
CLOSE SORTED-FILE.
Paragraph to print headings for the main drawing numbers
Which are due.
C010-DRAWING-HEADINGS.
MOVE SPACES TO RPT-DTL.
MOVE REPORT-HDR1 TO RPT-DTL.
WRITE RPT-DTL BEFORE ADVANCING 2 LINES.
MOVE SPACES TO RPT-DTL.
MOVE REPORT-HDR2 TO RPT-DTL.
WRITE RPT-DTL BEFORE ADVANCING 1 LINE.
MOVE SPACES TO RPT-DTL.
MOVE REPORT-HDR3 TO RPT-DTL.
WRITE RPT-DTL BEFORE ADVANCING 1 LINE.
MOVE SPACES TO RPT-DTL.
MOVE REPORT-HDR4 TO RPT-DTL.
WRITE RPT-DTL BEFORE ADVANCING 1 LINE.
Paragraph to filter due numbers into sort file.
Creates a floating point temporary to compare against
floating point value from input file. When greater
record is released to the sort file.
S000-DSTAT-INPUT.
OPEN INPUT DRAW-STATS.
READ DRAW-STATS NEXT
AT END SET IT-IS-END-OF-FILE TO TRUE.
PERFORM UNTIL IT-IS-END-OF-FILE
MOVE SINCE_LAST IN DSTATS-REC TO FLT-1
IF FLT-1 >= AVE_BTWN IN DSTATS-REC
MOVE DSTATS-REC TO SORT-REC
RELEASE SORT-REC
END-IF
READ DRAW-STATS
AT END SET IT-IS-END-OF-FILE TO TRUE
END-READ
END-PERFORM.
CLOSE DRAW-STATS.
Paragraph to filter due numbers into sort file.
Creates a floating point temporary to compare against
floating point value from input file. When greater
record is released to the sort file.
S001-MSTAT-INPUT.
OPEN INPUT MEGA-STATS.
READ MEGA-STATS NEXT
AT END SET IT-IS-END-OF-FILE TO TRUE.
PERFORM UNTIL IT-IS-END-OF-FILE
MOVE SINCE_LAST IN MSTATS-REC TO FLT-1
IF FLT-1 >= AVE_BTWN IN MSTATS-REC
MOVE MSTATS-REC TO SORT-REC
RELEASE SORT-REC
END-IF
READ MEGA-STATS
AT END SET IT-IS-END-OF-FILE TO TRUE
END-READ
END-PERFORM.
CLOSE MEGA-STATS.
END PROGRAM COB_ZILL_DUE_REPORT_SUB.
Sorry for the way the "code" feature works in this editor.
Certain sections have to exist. Your program cannot do I-O without an INPUT-OUTPUT SECTION. This is where you map names to physical storage.
If you have an INPUT-OUTPUT SECTION then you have to have a FILE SECTION. This is where you define the record layout(s) of each named file. LABEL RECORDS are always STANDARD when dealing with disk data files and OMITTED when writing report text files. There are a few other clauses I don't remember. Please note the SD included in all of those FD statements. FD is File Definition and SD is Sort Definition.
If you are going to have any local variables you have to have a WORKING-STORAGE SECTION. You cannot declare variables on the fly; they all have to be declared here. This PSECT gets a DATA segment attribute among other things. If you call some service or something and it has a bad address and attempts to execute code within this PSECT, the operating system will shoot your application out of the saddle.
All PSECTs created after PROCEDURE DIVISION are flagged EXEC, write protected. If you try to overwrite anything in here during execution the operating system will shoot your program out of the saddle. Any other program attempting to write here will also be shot out of the saddle.
Scan down to the SORT SORT-FILE in A000-MAIN. The COBOL sort routine is amazing. Notice that I provided an INPUT PROCEDURE and it is a paragraph. On IBM mainframes running ROSCOE back in the day this had to be an INPUT SECTION. They needed different attributes on the PSECT so the system sort routine could read/write.
Here is a snippet from another program in that book.
*
* FMS definitions
*
COPY 'COBFDVDEF' OF 'MEGA_TEXT_LIB'.
LINKAGE SECTION.
01 FMS-STUFF.
05 FMSSTATUS PIC S9(9) COMP.
05 RMSSTATUS PIC S9(9) COMP.
05 TCA PIC X(12).
05 WORKSPACE PIC X(12).
PROCEDURE DIVISION USING FMS-STUFF.
The linkage section creates a PSECT of sharable memory. When you call external routines which return values, they need to be here. You must also grant your PROCEDURE DIVISION access to various things it needs in the linkage section.
As you can see from this snippet later in the code
       B010-USER-INPUT.
           PERFORM C000-FORWARD-LOAD
           CALL 'FDV$PUTAL' USING BY DESCRIPTOR SCREEN-REC.
           MOVE SPACES TO WORK-STR.
           CALL 'FDV$GETAL' USING BY DESCRIPTOR WORK-STR
                                  BY REFERENCE TERMINATOR.
           EVALUATE TERMINATOR
               WHEN FDV$K_FK_E6    SET LOAD-FORWARD TO TRUE
               WHEN FDV$K_FK_E5    SET LOAD-REVERSE TO TRUE
               WHEN FDV$K_FK_F10   SET WE-ARE-DONE TO TRUE
           END-EVALUATE.
you can pass any local variable you wish as long as you pass it correctly. It's the writing which needs special PSECT attributes.
It's late and I'm tired, but I seem to remember you could have USING clauses on SECTION declarations in the PROCEDURE DIVISION. The on-line documentation available for COBOL, at least that indexed by Google, really is quite worthless. If you want more detailed information, search for a circa-1980s COBOL textbook. It won't have any of the new stuff but it will answer many questions.
Here's a kind of bad tutorial on COBOL structure.
We use COBOL SECTION coding in all of our 37K MVS batch COBOL programs. We use this technique to get much faster run times and significantly reduced CPU overhead. This COBOL coding technique is very similar to high performance batch assembler.
Call it High Performance Functionally Structured COBOL programming
Once a SECTION is defined, every PERFORM xxxxx will return at the next coded SECTION, not at the next paragraph in the SECTION. If paragraphs are coded ahead of the first SECTION then they can be executed normally. (But we don't allow this)
Using a SECTION has higher overhead than using and PERFORMing only paragraphs, UNLESS you use GOTO logic to bypass code that should be conditionally executed. Our rule is that a GOTO can only point to a Tag-Line (a paragraph) in the same SECTION. All paragraphs in a SECTION must be a sub-function of the SECTION's function. The EXIT instruction is an assembler NOP instruction. It allows a Tag-Line to be placed before the next SECTION, giving a fast exit/return.
Executing a PERFORM xxxx THRU yyyy has more CPU overhead than executing a SECTION without the GOTOs.
WARNING: Executing a PERFORM xxxx Tag-Line in a SECTION will fall thru all the code in the SECTION until the next SECTION is encountered. A GOTO Tag-Line outside of the current SECTION will fall thru all the code in the new landing SECTION until the next SECTION is encountered. (But we don't allow this)

What are the most hardcore optimisations you've seen?

I'm not talking about algorithmic stuff (eg use quicksort instead of bubblesort), and I'm not talking about simple things like loop unrolling.
I'm talking about the hardcore stuff. Like Tiny Teensy ELF, The Story of Mel; practically everything in the demoscene, and so on.
I once wrote a brute force RC5 key search that processed two keys at a time, the first key used the integer pipeline, the second key used the SSE pipelines and the two were interleaved at the instruction level. This was then coupled with a supervisor program that ran an instance of the code on each core in the system. In total, the code ran about 25 times faster than a naive C version.
In one (here unnamed) video game engine I worked with, they had rewritten the model-export tool (the thing that turns a Maya mesh into something the game loads) so that instead of just emitting data, it would actually emit the exact stream of microinstructions that would be necessary to render that particular model. It used a genetic algorithm to find the one that would run in the minimum number of cycles. That is to say, the data format for a given model was actually a perfectly-optimized subroutine for rendering just that model. So, drawing a mesh to the screen meant loading it into memory and branching into it.
(This wasn't for a PC, but for a console that had a vector unit separate and parallel to the CPU.)
In the early days of DOS, when we used floppy discs for all data transport, there were viruses as well. One common way for viruses to infect different computers was to copy a virus bootloader into the bootsector of an inserted floppydisc. When the user inserted the floppydisc into another computer and rebooted without remembering to remove the floppy, the virus was run and infected the harddrive bootsector, thus permanently infecting the host PC. A particularly annoying virus I was infected by was called "Form". To battle it I wrote a custom floppy bootsector that had the following features:
Validate the bootsector of the host harddrive and make sure it was not infected.
Validate the floppy bootsector and make sure that it was not infected.
Code to remove the virus from the harddrive if it was infected.
Code to duplicate the antivirus bootsector to another floppy if a special key was pressed.
Code to boot the harddrive if all was well and no infections were found.
This was done in the program space of a bootsector, about 440 bytes :)
The biggest problem for my mates was the very cryptic messages displayed because I needed all the space for code. It was like "FFVD RM?", which meant "FindForm Virus Detected, Remove?"
I was quite happy with that piece of code. The optimization was program size, not speed. Two quite different optimizations in assembly.
My favorite is the floating point inverse square root via integer operations. This is a cool little hack on how floating point values are stored and can execute faster (even doing a 1/result is faster than the stock-standard square root function) or produce more accurate results than the standard methods.
In C/C++ the code is: (sourced from Wikipedia)
float InvSqrt (float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i>>1); // Now this is what you call a real magic number
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}
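The pointer casts in the snippet above (`*(int*)&x`) read a float through an int pointer, which C's strict-aliasing rules do not permit, so an optimising compiler is allowed to miscompile it. A minimal sketch of the same computation using `memcpy` for the bit copy follows. It assumes `float` is a 32-bit IEEE 754 value, and the name `inv_sqrt_approx` is made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for x > 0.
   Assumes float is a 32-bit IEEE 754 single-precision value. */
float inv_sqrt_approx(float x)
{
    assert(x > 0.0f);                 /* the trick is only meaningful for positive input */

    float xhalf = 0.5f * x;
    uint32_t i;

    memcpy(&i, &x, sizeof i);         /* copy the float's bits into an integer */
    i = 0x5f3759dfu - (i >> 1);       /* magic constant minus half the bit pattern */
    memcpy(&x, &i, sizeof x);         /* copy the bits back into a float */

    x = x * (1.5f - xhalf * x * x);   /* one Newton-Raphson refinement step */
    return x;
}
```

With one refinement step the result is close to, but not exactly, the true value: `inv_sqrt_approx(4.0f)` comes out slightly below 0.5. Compilers typically turn the two small `memcpy` calls into plain register moves, so this form should not cost anything over the cast version.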
A Very Biological Optimisation
Quick background: Triplets of DNA nucleotides (A, C, G and T) encode amino acids, which are joined into proteins, which are what make up most of most living things.
Ordinarily, each different protein requires a separate sequence of DNA triplets (its "gene") to encode its amino acids -- so e.g. 3 proteins of lengths 30, 40, and 50 would require 90 + 120 + 150 = 360 nucleotides in total. However, in viruses, space is at a premium -- so some viruses overlap the DNA sequences for different genes, using the fact that there are 6 possible "reading frames" to use for DNA-to-protein translation (namely starting from a position that is divisible by 3; from a position that leaves remainder 1 when divided by 3; or from a position that leaves remainder 2; and the same again, but reading the sequence in reverse).
For comparison: Try writing an x86 assembly language program where the 300-byte function doFoo() begins at offset 0x1000... and another 200-byte function doBar() starts at offset 0x1001! (I propose a name for this competition: Are you smarter than Hepatitis B?)
That's hardcore space optimisation!
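To make the six reading frames concrete, here is a small sketch in C. The function names and the example sequence are invented for illustration. One detail the summary above glosses over: the three "reverse" frames are normally read on the complementary strand, so the sketch builds the reverse complement rather than just reversing the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Complement of one nucleotide; 'N' for anything unrecognised. */
static char complement(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return 'N';
    }
}

/* Writes the reverse complement of seq (length n) into out.
   out must have room for n + 1 bytes; the result is NUL-terminated. */
static void reverse_complement(const char *seq, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = complement(seq[n - 1 - i]);
    out[n] = '\0';
}

/* Copies the k-th codon of reading frame `frame` (0, 1 or 2) into out.
   Returns 1 on success, 0 if that codon would run past the end of seq. */
static int codon_at(const char *seq, size_t n, unsigned frame, size_t k,
                    char out[4])
{
    assert(frame < 3);
    size_t start = frame + 3 * k;
    if (start + 3 > n)
        return 0;
    memcpy(out, seq + start, 3);
    out[3] = '\0';
    return 1;
}
```

For the made-up sequence `ATGCGTAA`, frames 0, 1 and 2 start with the codons `ATG`, `TGC` and `GCG`. Calling `reverse_complement` first and then `codon_at` on the result gives the other three frames. The same nucleotides therefore take part in up to six different codon streams, which is what lets overlapping genes share storage.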
UPDATE: Links to further info:
Reading Frames on Wikipedia suggests Hepatitis B and "Barley Yellow Dwarf" virus (a plant virus) both overlap reading frames.
Hepatitis B genome info on Wikipedia. Seems that different reading-frame subunits produce different variations of a surface protein.
Or you could google for "overlapping reading frames"
Seems this can even happen in mammals! Extensively overlapping reading frames in a second mammalian gene is a 2001 scientific paper by Marilyn Kozak that talks about a "second" gene in rat with "extensive overlapping reading frames". (This is quite surprising as mammals have a genome structure that provides ample room for separate genes for separate proteins.) Haven't read beyond the abstract myself.
I wrote a tile-based game engine for the Apple IIgs in 65816 assembly language a few years ago. This was a fairly slow machine and programming "on the metal" is a virtual requirement for coaxing out acceptable performance.
In order to quickly update the graphics screen one has to map the stack to the screen in order to use some special instructions that allow one to update 4 screen pixels in only 5 machine cycles. This is nothing particularly fantastic and is described in detail in IIgs Tech Note #70. The hard-core bit was how I had to organize the code to make it flexible enough to be a general-purpose library while still maintaining maximum speed.
I decomposed the graphics screen into scan lines and created a 246 byte code buffer to insert the specialized 65816 opcodes. The 246 bytes are needed because each scan line of the graphics screen is 80 words wide and 1 additional word is required on each end for smooth scrolling. The Push Effective Address (PEA) instruction takes up 3 bytes, so 3 * (80 + 1 + 1) = 246 bytes.
The graphics screen is rendered by jumping to an address within the 246 byte code buffer that corresponds to the right edge of the screen and patching in a BRanch Always (BRA) instruction into the code at the word immediately following the left-most word. The BRA instruction takes a signed 8-bit offset as its argument, so it just barely has the range to jump out of the code buffer.
Even this isn't too terribly difficult, but the real hard-core optimization comes in here. My graphics engine actually supported two independent background layers and animated tiles by using different 3-byte code sequences depending on the mode:
Background 1 uses a Push Effective Address (PEA) instruction
Background 2 uses a Load Indirect Indexed (LDA ($00),y) instruction followed by a push (PHA)
Animated tiles use a Load Direct Page Indexed (LDA $00,x) instruction followed by a push (PHA)
The critical restriction is that both of the 65816 registers (X and Y) are used to reference data and cannot be modified. Further the direct page register (D) is set based on the origin of the second background and cannot be changed; the data bank register is set to the data bank that holds pixel data for the second background and cannot be changed; the stack pointer (S) is mapped to graphics screen, so there is no possibility of jumping to a subroutine and returning.
Given these restrictions, I had the need to quickly handle cases where a word that is about to be pushed onto the stack is mixed, i.e. half comes from Background 1 and half from Background 2. My solution was to trade memory for speed. Because all of the normal registers were in use, I only had the Program Counter (PC) register to work with. My solution was the following:
Define a code fragment to do the blend in the same 64K program bank as the code buffer
Create a copy of this code for each of the 82 words
There is a 1-1 correspondence, so the return from the code fragment can be a hard-coded address
Done! We have a hard-coded subroutine that does not affect the CPU registers.
Here are the actual code fragments
code_buff:    PEA $0000        ; rightmost word (16-bits = 4 pixels)
              PEA $0000        ; background 1
              PEA $0000        ; background 1
              PEA $0000        ; background 1
              LDA (72),y       ; background 2
              PHA
              LDA (70),y       ; background 2
              PHA
              JMP word_68      ; mix the data
word_68_rtn:  PEA $0000        ; more background 1
              ...
              PEA $0000
              BRA *+40         ; patched exit code
              ...
word_68:      LDA (68),y       ; load data for background 2
              AND #$00FF       ; mask
              ORA #$AB00       ; blend with data from background 1
              PHA
              JMP word_68_rtn  ; jump back
word_66:      LDA (66),y
              ...
The end result was a near-optimal blitter that has minimal overhead and cranks out more than 15 frames per second at 320x200 on a 2.5 MHz CPU with a 1 MB/s memory bus.
Michael Abrash's "Zen of Assembly Language" had some nifty stuff, though I admit I don't recall specifics off the top of my head.
Actually it seems like everything Abrash wrote had some nifty optimization stuff in it.
The Stalin Scheme compiler is pretty crazy in that aspect.
I once saw a switch statement with a lot of empty cases, a comment at the head of the switch said something along the lines of:
Added case statements that are never hit because the compiler only turns the switch into a jump-table if there are more than N cases
I forget what N was. This was in the source code for Windows that was leaked in 2004.
I've gone to the Intel (or AMD) architecture references to see what instructions there are. movsx - move with sign extension is awesome for moving little signed values into big spaces, for example, in one instruction.
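For readers who don't write assembly, this is roughly what sign extension looks like from C. It is only a sketch: the function names are made up, and whether a given compiler emits `movsx` for the cast depends on the target and optimisation settings, though x86 compilers commonly do.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a signed 8-bit value to 32 bits. The sign bit is copied into
   the upper 24 bits; x86 compilers commonly emit a single movsx here. */
static int32_t widen_signed(int8_t small)
{
    return (int32_t)small;
}

/* For contrast: zero extension (movzx on x86) fills the upper bits with 0. */
static uint32_t widen_unsigned(uint8_t small)
{
    return (uint32_t)small;
}

/* Usage example: the same bit pattern 0xFB widens to two different values. */
static void widen_demo(void)
{
    assert(widen_signed((int8_t)-5) == -5);   /* 0xFB -> 0xFFFFFFFB */
    assert(widen_unsigned(0xFBu) == 251u);    /* 0xFB -> 0x000000FB */
}
```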
Likewise, if you know you only use 16-bit values but can access all of EAX, EBX, ECX, EDX, etc., then you have 8 very fast locations for values: just rotate the registers by 16 bits to access the other values.
The EFF DES cracker, which used custom-built hardware to generate candidate keys (the hardware they made could prove a key isn't the solution, but could not prove a key was the solution), which were then tested with more conventional code.
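The register trick can be modelled in C with a 32-bit integer standing in for a register such as EAX. This is a sketch with invented helper names; in assembly the rotation would be a single `rol`/`ror` by 16, and the low half is what the 16-bit register name (e.g. AX) addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit values into one 32-bit "register". */
static uint32_t pack16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Rotate by 16 bits: swaps the upper and lower halves. */
static uint32_t rot16(uint32_t reg)
{
    return (reg << 16) | (reg >> 16);
}

/* The directly addressable half (what AX is to EAX). */
static uint16_t low16(uint32_t reg)
{
    return (uint16_t)(reg & 0xFFFFu);
}

/* Usage example: two values share one register, one rotate apart. */
static void rot16_demo(void)
{
    uint32_t reg = pack16(0x1234u, 0xABCDu);
    assert(low16(reg) == 0xABCDu);   /* first value is visible */
    reg = rot16(reg);
    assert(low16(reg) == 0x1234u);   /* second value after one rotate */
    reg = rot16(reg);
    assert(low16(reg) == 0xABCDu);   /* rotating again restores the first */
}
```

Four general-purpose registers used this way give the eight 16-bit slots the answer mentions, at the cost of one rotate whenever the "hidden" half is needed.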
The FSG 2.0 packer made by a Polish team, specifically made for packing executables made with assembly. If packing assembly isn't impressive enough (what's supposed to be almost as low as possible) the loader it comes with is 158 bytes and fully functional. If you try packing any assembly made .exe with something like UPX, it will throw a NotCompressableException at you ;)