I am using Yosys to synthesize with the synth_ice40 command, which calls ABC as well. In my Verilog code I have used (* keep *) wire wire_1; and Yosys does not optimize this wire away, but when it comes to ABC, it removes these unused wires. Is there anything equivalent to (* keep *) that could be used in Verilog and that ABC will not optimize away?
Any help would be appreciated.
Thanks,
Log:
2.44.2. Continuing TECHMAP pass.
No more expansions possible.
Removed 0 unused cells and 3 unused wires.
2.45. Executing OPT_LUT pass (optimize LUTs).
Discovering LUTs.
Number of LUTs: 1
2-LUT 1
Eliminating LUTs.
Number of LUTs: 1
2-LUT 1
Combining LUTs.
Number of LUTs: 1
2-LUT 1
Eliminated 0 LUTs.
Combined 0 LUTs.
<suppressed ~2 debug messages>
2.46. Executing TECHMAP pass (map to technology primitives).
2.46.1. Executing Verilog-2005 frontend: /usr/local/bin/../share/yosys/ice40/cells_map.v
Parsing Verilog input from `/usr/local/bin/../share/yosys/ice40/cells_map.v' to AST representation.
Generating RTLIL representation for module `$lut'.
Successfully finished Verilog frontend.
2.46.2. Continuing TECHMAP pass.
Using template $paramod$lut\WIDTH=2\LUT=4'1000 for cells of type $lut.
No more expansions possible.
<suppressed ~14 debug messages>
Removed 0 unused cells and 2 unused wires.
I've been testing yosys for some use cases.
Version: Yosys 0.7+200 (git sha1 155a80d, gcc-6.3 6.3.0 -fPIC -Os)
I wrote a simple block which converts gray code to binary:
module gray2bin (gray, bin);
parameter WDT = 3;
input [WDT-1:0] gray;
output [WDT-1:0] bin;
assign bin = {gray[WDT-1], bin[WDT-1:1]^gray[WDT-2:0]};
endmodule
This is acceptable and valid Verilog code, and there is no loop in it.
It passes compilation and synthesis without any warnings in other tools.
But when I run the following commands in Yosys:
read_verilog gray2bin.v
scc
I get that a logic loop was found:
Found an SCC: $xor$gray2bin.v:11$1
Found 1 SCCs in module gray2bin.
Found 1 SCCs.
The following code, which is equivalent, passes the check:
module gray2bin2 (
gray,
bin
);
parameter WDT = 3;
input [WDT-1:0] gray;
output [WDT-1:0] bin;
assign bin[WDT-1] = gray[WDT-1];
genvar i;
generate
for (i = WDT-2; i>=0; i=i-1) begin : gen_serial_xor
assign bin[i] = bin[i+1]^gray[i];
end
endgenerate
endmodule
Am I missing a flag or synthesis option of some kind?
Using word-wide operators, this circuit clearly has a loop (diagram generated with yosys -p 'prep; show' gray2bin.v):
You have to synthesize the circuit to a gate-level representation to get a loop-free version (generated with yosys -p 'synth; splitnets -ports; show' gray2bin.v; the call to splitnets is just there for better visualization):
The answer given by CliffordVienna indeed gives a solution, but I also want to clarify that it is not suitable for all purposes.
My analysis was done for the purpose of formal verification. When I replaced prep with synth to resolve the falsely identified logic loops, my formal code got optimized: wires I had created that were driven only by assume properties were removed, which made many assertions redundant.
It is not correct to reduce any logic when the purpose is behavioral verification.
Therefore, if the purpose is to prepare a verification database, I suggest not using the synth command, but rather a subset of the commands that synth executes.
You can find those commands under:
http://www.clifford.at/yosys/cmd_synth.html
In general, I've used all the commands specified in the above link that do not optimize logic:
hierarchy -check
proc
check
wreduce
alumacc
fsm
memory -nomap
memory_map
techmap
abc -fast
hierarchy -check
stat
check
And everything works as expected.
I am trying to see whether Yosys fits my requirements or not.
What I want to do is find an operation in Verilog code (e.g. temp = 16*val1 + 8*val2) and replace it with another operation like temp = (val1 << 4) + (val2 << 3).
Which parts of Yosys do I need to learn and use? If anyone knows the set of commands to use, please let me know so I can shorten my learning curve.
Thanks.
For example consider the following verilog input (test.v):
module test(input [7:0] val1, val2, output [7:0] temp);
assign temp = 16*val1 + 8*val2;
endmodule
The command yosys -p 'prep; opt -full; show' test.v will produce the following circuit diagram:
And the output written to the console contains this:
3.1. Executing OPT_EXPR pass (perform const folding).
Replacing multiply-by-16 cell `$mul$test.v:2$1' in module `\test' with shift-by-4.
Replacing multiply-by-8 cell `$mul$test.v:2$2' in module `\test' with shift-by-3.
Replacing $shl cell `$mul$test.v:2$1' (B=3'100, SHR=-4) in module `test' with fixed wiring: { \val1 [3:0] 4'0000 }
Replacing $shl cell `$mul$test.v:2$2' (B=2'11, SHR=-3) in module `test' with fixed wiring: { \val2 [4:0] 3'000 }
The two lines reading Replacing multiply-by-* cell are the transformation you mentioned. The two lines after that replace the constant shift operations with wiring, using {val1[3:0], 4'b0000} and {val2[4:0], 3'b000} as inputs for the adder.
This is done in the opt_expr pass. See passes/opt/opt_expr.cc for its source code to see how it's done.
I'm trying to figure out how to run LLVM's built-in loop vectorizer. I have a small program containing an extremely simple loop (I had some output at one point which is why stdio.h is still being included despite never being used):
#include <stdio.h>

unsigned NUM_ELS = 10000;

int main() {
  int A[NUM_ELS];

  #pragma clang loop vectorize(enable)
  for (int i = 0; i < NUM_ELS; ++i) {
    A[i] = i*2;
  }

  return 0;
}
As you can see, it does nothing useful at all; I just need the for loop to be vectorizable. I'm compiling it to LLVM bitcode with
clang -emit-llvm -O0 -c loop1.c -o loop1.bc
llvm-dis -f loop1.bc
Then I'm applying the vectorizer with
opt -loop-vectorize -force-vector-width=4 -S -debug loop1.ll
However, the debug output gives me this:
LV: Checking a loop in "main" from loop1.bc
LV: Loop hints: force=? width=4 unroll=0
LV: Found a loop: for.cond
LV: SCEV could not compute the loop exit count.
LV: Not vectorizing: Cannot prove legality.
I've dug around in the LLVM source a bit, and it looks like SCEV comes from the ScalarEvolution pass, which has the task of (among other things) computing the backedge-taken count of the loop, which in this case (if I'm not mistaken) should be the trip count minus one, i.e. 9,999. I've run this pass on a much larger benchmark and it gives me the exact same error at every loop, so I'm guessing it isn't the loop itself, but that I'm not giving it enough information.
I've spent quite a bit of time combing through the documentation and Google results to find an example of a full opt command using this transformation, but have been unsuccessful so far; I'd appreciate any hints as to what I may be missing (I'm new to vectorizing code so it could be something very obvious).
Thank you,
Stephen
Vectorization depends on a number of other optimizations that need to run before it. They are not run at all at -O0, so you cannot expect your code to be 'just' vectorized there.
Adding -O2 before -loop-vectorize on the opt command line would help here (make sure your A array is external or used somehow, otherwise everything will be optimized away).
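As a concrete illustration of that suggestion, here is one way to keep A live; this is only a sketch, and printing the last element is just one arbitrary use of the array:
#include <stdio.h>

unsigned NUM_ELS = 10000;

int main() {
  int A[NUM_ELS];

  #pragma clang loop vectorize(enable)
  for (int i = 0; i < NUM_ELS; ++i) {
    A[i] = i * 2;
  }

  /* Use the array so the whole loop is not dead code that the -O2
     pipeline deletes before the vectorizer ever sees it. */
  printf("%d\n", A[NUM_ELS - 1]);
  return 0;
}
Then recompile as before and run the vectorizer with the optimization level in front of it, as suggested: opt -O2 -loop-vectorize -force-vector-width=4 -S -debug loop1.ll.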
I'm working on a project that has tight boot time requirements. The targeted architecture is an IA-32 based processor running in 32 bit protected mode. One of the areas identified that can be improved is that the current system dynamically initializes the processor's IDT (interrupt descriptor table). Since we don't have any plug-and-play devices and the system is relatively static, I want to be able to use a statically built IDT.
However, this is proving to be troublesome on IA-32, since the 8-byte interrupt gate descriptor splits the ISR address: the low 16 bits of the ISR address go in the first 2 bytes of the descriptor, some other bits fill the next 4 bytes, and the upper 16 bits of the ISR address go in the last 2 bytes.
I wanted to use a const array to define the IDT and then simply point the IDT register at it like so:
typedef struct s_myIdt {
unsigned short isrLobits;
unsigned short segSelector;
unsigned short otherBits;
unsigned short isrHibits;
} myIdtStruct;
myIdtStruct myIdt[256] = {
    { (unsigned short)myIsr0, 1, 2, (unsigned short)(myIsr0 >> 16) },
    { (unsigned short)myIsr1, 1, 2, (unsigned short)(myIsr1 >> 16) },
    /* etc. */
};
Obviously this won't work, as it is illegal in C: the myIsr addresses are not compile-time constants. Their values are resolved by the linker (which can do only a limited amount of math), not by the compiler.
Any recommendations or other ideas on how to do this?
You ran into a well-known x86 wart. I don't believe the linker can stuff the address of your ISR routines into the swizzled form expected by the IDT entry.
If you are feeling ambitious, you could create an IDT builder script that takes something like the following (Linux-based) approach. I haven't tested this scheme and it probably qualifies as a nasty hack anyway, so tread carefully.
Step 1: Write a script to run 'nm' and capture the stdout.
Step 2: In your script, parse the nm output to get the memory address of all your interrupt service routines.
Step 3: Output a binary file, 'idt.bin', that has the IDT bytes all set up and ready for the LIDT instruction. Your script obviously outputs the ISR addresses in the correct swizzled form.
Step 4: Convert this raw binary into an ELF object with objcopy:
objcopy -I binary -O elf32-i386 idt.bin idt.elf
Step 5: Now the idt.elf file contains your IDT binary, with symbols something like this:
> nm idt.elf
000000000000000a D _binary_idt_bin_end
000000000000000a A _binary_idt_bin_size
0000000000000000 D _binary_idt_bin_start
Step 6: Relink your binary, including idt.elf. In your assembly stubs and linker scripts you can refer to the symbol _binary_idt_bin_start as the base of the IDT. For example, your linker script can place the symbol _binary_idt_bin_start at any address you like.
Be careful that relinking with the IDT section doesn't move anything else in your binary, e.g. your ISR routines. Manage this in your linker script (.ld file) by putting the IDT into its own dedicated section.
---EDIT---
From comments, there seems to be confusion about the problem. The 32-bit x86 IDT expects the address of the interrupt service routine to be split into two different 16-bit words, like so:
 31           16 15            0
+---------------+---------------+
| Address 31-16 |               |
+---------------+---------------+
|               | Address 15-0  |
+---------------+---------------+
A linker is thus unable to plug in the ISR address as a normal relocation. So, at boot time, software must construct this split format, which slows booting.
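For completeness, here is what the builder from steps 1 through 3 might look like in C. This is only a sketch under stated assumptions: the ISR symbols are assumed to be named isr0 through isr255 in a kernel.elf image, the kernel code segment selector is assumed to be 0x08, and each entry is assumed to be a present, ring-0, 32-bit interrupt gate (type/attribute byte 0x8E); all of these names and values are placeholders for whatever your system actually uses.
/* idtgen.c -- sketch of the IDT builder from steps 1-3 (runs on the Linux build host).
 * Assumptions (adjust for your system): the ISRs are named isr0..isr255
 * in kernel.elf, the kernel code segment selector is 0x08, and every
 * entry is a present, ring-0, 32-bit interrupt gate (type/attr 0x8E). */
#include <stdio.h>
#include <stdint.h>

#define NUM_VECTORS 256

struct idt_entry {         /* the 8-byte gate descriptor layout */
    uint16_t isr_lo;       /* ISR address bits 15..0            */
    uint16_t selector;     /* code segment selector             */
    uint16_t flags;        /* reserved byte + type/attr byte    */
    uint16_t isr_hi;       /* ISR address bits 31..16           */
};

int main(void) {
    static struct idt_entry idt[NUM_VECTORS]; /* zero-initialized */
    char line[256];

    /* Step 1: run nm and capture its stdout. */
    FILE *nm = popen("nm kernel.elf", "r");
    if (!nm) { perror("nm"); return 1; }

    /* Step 2: pull out the address of every interrupt service routine. */
    while (fgets(line, sizeof line, nm)) {
        unsigned long addr;
        char type, name[128];
        int vec;
        if (sscanf(line, "%lx %c %127s", &addr, &type, name) != 3)
            continue;
        if (sscanf(name, "isr%d", &vec) == 1 && vec >= 0 && vec < NUM_VECTORS) {
            idt[vec].isr_lo   = (uint16_t)(addr & 0xFFFFu);
            idt[vec].selector = 0x08;   /* assumed kernel code segment   */
            idt[vec].flags    = 0x8E00; /* assumed 32-bit interrupt gate */
            idt[vec].isr_hi   = (uint16_t)((addr >> 16) & 0xFFFFu);
        }
    }
    pclose(nm);

    /* Step 3: write the raw table, ready for objcopy and LIDT. */
    FILE *out = fopen("idt.bin", "wb");
    if (!out) { perror("idt.bin"); return 1; }
    fwrite(idt, sizeof idt, 1, out);
    fclose(out);
    return 0;
}
Compile and run this on the build host, then feed the resulting idt.bin to the objcopy command from step 4.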
I'm working on an assembler for fun, written in C, flex, and bison. I'd like to add macros, includes, and repeating blocks, and was thinking of doing this with a separate preprocessing-stage parser.
My question is, how might I keep track of original source line numbers (and filenames)? This is for producing useful error messages, pretty printing, and generating debug information.
yylineno in the second parser, after preprocessing is complete, will presumably be offset by macro expansion and so on.
You can add
;#file filename.asm
;#line 5
to the preprocessed assembler, so that
foo:
PUSHREG(A,B,C)
;--10 lines of code
POPREG(A,B,C)
set PC,POP
turns into
foo:
;#file functionmacros.asm
;#line 10
set push,A
set push,B
set push,C
;#file yourfile.asm
;#line 5
;--10 lines of code
;#file functionmacros.asm
;#line 30
set C,POP
set B,POP
set A,POP
;#file yourfile.asm
;#line 16
set PC,POP
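In the second-pass lexer you can then consume these directives and overwrite the position flex is tracking. Below is a minimal flex sketch (actions in plain C); the directive spellings match the example above, while current_file, report_error, and the catch-all ECHO rule are hypothetical placeholders to be wired into your real scanner.
/* line_directives.l -- sketch of honoring the ;#file / ;#line directives
 * in the second-pass lexer. current_file, report_error, and the catch-all
 * ECHO rule are placeholders for your real scanner. */
%option yylineno noyywrap

%{
#include <stdio.h>

extern int yylineno;
static char current_file[256] = "<input>";

/* Error messages point at the original source, not the preprocessed file. */
static void report_error(const char *msg)
{
    fprintf(stderr, "%s:%d: error: %s\n", current_file, yylineno, msg);
}
%}

%%

    /* ";#file foo.asm" -- remember which original file we are in. */
^";#file"[ \t]+[^ \t\n]+.*\n   { sscanf(yytext, ";#file %255s", current_file); }

    /* ";#line 5" -- the next input line corresponds to line 5 of
       current_file, so override the count flex keeps for us. */
^";#line"[ \t]+[0-9]+.*\n      { int n; if (sscanf(yytext, ";#line %d", &n) == 1) yylineno = n; }

    /* Any other ;# directive is a preprocessor bug worth reporting. */
^";#"[^\n]*\n                  { report_error("unrecognized ;# directive"); }

.|\n                           { ECHO; /* the real token rules go here */ }

%%

int main(void) { return yylex(); }
Any later diagnostic can then be printed as current_file plus yylineno, so errors point at the original source rather than the macro-expanded intermediate.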