Is there an option to synthsise some code into verilog built-in primitives? - yosys

I thought that techmap without any argument will do it but it didn't.
probably I missunderstand what 'logical synthsis' means.
basic example:
AND_GATE.v:
module AND_GATE( input A, input B, output X);
assign X = A & B;
endmodule
yosys> read_verilog AND_GATE.v
yosys> synth
....................
Number of wires: 3
Number of wire bits: 3
Number of public wires: 3
Number of public wire bits: 3
Number of memories: 0
Number of memory bits: 0
Number of processes: 0
Number of cells: 1
$_AND_ 1
yosys> abc -g AND,NAND,OR,NOR,XOR,XNOR
........................
3.1.2. Re-integrating ABC results.
ABC RESULTS: AND cells: 1
ABC RESULTS: internal signals: 0
ABC RESULTS: input signals: 2
ABC RESULTS: output signals: 1
Removing temp directory.
yosys> clean
Removed 0 unused cells and 3 unused wires.
yosys> write_verilog net.v
net.v
module AND_GATE(A, B, X);
(* src = "AND_GATE.v:1" *)
input A;
(* src = "AND_GATE.v:1" *)
input B;
(* src = "AND_GATE.v:1" *)
output X;
assign X = B & A;
endmodule

Using something like synth; abc -g AND,NAND,OR,NOR,XOR,XNOR will map to a basic set of gates equivalent to the Verilog primitives - techmap on its own won't get you far away either - but the Yosys verilog backend doesn't have an option to use built-in primitives, it always writes the gates as their expression.

Related

Cyclomatic complexity - draw control flow graph for this java statement

Can anyone help with this?
while (x > level)
x = x – 1;
x = 0
Cyclomatic complexity can be computed using the formula provided here.
Cyclomatic complexity = E - N + P
where,
E = number of edges in the flow graph.
N = number of nodes in the flow graph.
P = number of nodes that have exit points
For your case, the graph should look like this:
--------------- ----------
| x > level |----- NO ------>| x = x-1|
|-------------| ----|-----
| |---------------------
|
Yes
|
-------|----------
| End while (if) |
-------|----------
|
|
---------
| x = 0 |
----------
(not an ASCII art person)
So, the cyclomatic complexity should be:
E = 4, N = 4, P = 2 => Complexity = 4 - 4 + 2 = 2
[edit]
Ira Baxter points out very well how to simplify this computation for languages like Java, C#, C++ etc. However, identifying the conditionals must be carefully performed, as shown here:
- Start with a count of one for the method.
- Add one for each of the following flow-related elements that are found in the method.
Returns - Each return that isn't the last statement of a method.
Selection - if, else, case, default.
Loops - for, while, do-while, break, and continue.
Operators - &&, ||, ?, and :
Exceptions - catch, finally, throw, or throws clause.

Is both have the same meaning?

In Verilog code
case ({Q[0], Q_1})
2'b0_1 :begin
A<=sum[7]; Q<=sum; Q_1<=Q;
end
2'b1_0 : begin
A<=difference[7]; Q<=difference; Q_1<=Q;
end
default: begin
A<=A[7]; Q<=A; Q_1<=Q;
end
endcase
is above code is same as below code
case ({Q[0], Q_1})
2'b0_1 : {A, Q, Q_1} <= {sum[7], sum, Q};
2'b1_0 : {A, Q, Q_1} <= {difference[7], difference, Q};
default: {A, Q, Q_1} <= {A[7], A, Q};
endcase
If yes then why i am getting different result?
Edit:-A, Q, sum and difference are all 8-bit values and Q_1 is a 1-bit value.
No, these are not the same. The concatenation operator ({ ... }) allows you to create vectors from several different signals, allowing you to both use these vectors and assign to these vectors, resulting in the assignment of the component signals will the appropriate bits from the result. From your previous question (Please Explain these verilog code?), I see that A, Q, sum and difference are all 8-bit values and Q_1 is a 1-bit value. Lets examine the first assignment (noting that the other three work the same way):
{A, Q, Q_1} <= {sum[7], sum, Q};
If we look at the right-hand side, we can see that the result of the concatenation is a 17-bit vector, as sum[7] is 1 bit (the MSb of sum), sum is 8 bits, and Q is 8 bits (1 + 8 + 8 = 17). Lets say sum = 8'b10100101 and Q = 8'b00110110, what would {sum[7], sum, Q} look like? Well, its the concatenation of the values from sum and Q so it would be 17'b1_10100101_00110110, the first bit coming from sum[7], the next 8 bits from sum and the final 8 bits from Q.
Now we have to assign this 17-bit value to the left hand side. On the left, we have {A, Q, Q_1}, which is also 17 bits (A is 8 bits, Q is 8 bits and Q_1 is 1 bit). However, we have to assign the bits from our 17-bit value we got above to the proper signals that make up this new 17-bit vector, that means the 8 most significant bits go into A, the next 8 bits go into Q and the least significant bit go into Q_1. So, if we take our value from above (17'b1_10100101_00110110), and split it up this way (17'b11010010_10011011_0), we see A = 8'b11010010, Q = 8'b10011011 and Q_1 = 1'b0. Thus, this is not the same as assigning A = sum[7], Q = sum and Q_1 = Q (this would result in A = 8'b00000001, Q = 8'b10100101, Q_1 = 1'b0, with many bits of Q being lost and A having 7 extra bits).
However, this doesnt mean we cant split up the left-hand side concatenation, it would just look like this:
A <= {sum[7], sum[7:1]};
Q <= {sum[0], Q[7:1]};
Q_1 <= Q[0];
Yes, they are same. For example try this small code and check, the output is same :
module test;
wire A,B,C;
reg p,q,r;
initial
begin
p=1; q=1; r=0;
end
assign {A,B,C} = {p,q,r};
initial #1 $display("%b %b %b",A,B,C);
endmodule
In general if you want to understand concatenation operator, you can refer here
Edit : I have assumed A and p , B and q, C and r of same length.

correct variable definition in fortran routine

I want to run m*n matrix to do QR decomposition in fortran
PROGRAM SUBDEM
INTEGER key, n, m, loopA
REAL resq
REAL A(3,2)
REAL B(3)
REAL X(2)
key = 0
n = 2
m = 3
resq = 0
CALL QR(m, n, A, B, X, resq)
END
The QR routine is:
subroutine QR(m, n, a, b, x, resq)
implicit double precision (a-h, o-z)
dimension a(m,n),b(m),x(n)
double precision sum, dot
resq=-2.0
if (m .lt. n) then
return
endif
resq=-1.0
! Loop ending on 1800 rotates a into upper triangular form.
do 1800 j=1, n
! Find constants for rotation and diagonal entry.
sq=0.0
do 1100 i=j, m
sq=a(i,j)**2 + sq
1100 continue
if (sq .eq. 0.0) then
return
endif
qv1=-sign(sqrt(sq), a(j,j))
u1=a(j,j) - qv1
a(j,j)=qv1
j1=j + 1
! Rotate remaining columns of sub-matrix.
do 1400 jj=j1, n
dot=u1*a(j,jj)
do 1200 i=j1, m
dot=a(i,jj)*a(i,j) + dot
1200 continue
const=dot/abs(qv1*u1)
do 1300 i=j1, m
a(i,jj)=a(i,jj) - const*a(i,j)
1300 continue
a(j,jj)=a(j,jj) - const*u1
1400 continue
! Rotate b vector.
dot=u1*b(j)
do 1600 i=j1, m
dot=b(i)*a(i,j) + dot
1600 continue
const=dot/abs(qv1*u1)
b(j)=b(j) - const*u1
do 1700 i=j1, m
b(i)=b(i) - const*a(i,j)
1700 continue
1800 continue
! Solve triangular system by back-substitution.
do 2200 ii=1, n
i=n-ii+1
sum=b(i)
do 2100 j=i+1, n
sum=sum - a(i,j)*x(j)
2100 continue
if (a(i,i).eq. 0.0) then
return
endif
x(i)=sum/a(i,i)
2200 continue
! Find residual in overdetermined case.
resq=0.0
do 2300 i=n+1, m
resq=b(i)**2 + resq
2300 continue
return
end subroutine
However I am getting:
Error 1 error #6633: The type of the actual argument differs from the
type of the dummy argument. [A] Error 2 error #6633: The type of
the actual argument differs from the type of the dummy argument.
[B] Error 3 error #6633: The type of the actual argument differs
from the type of the dummy argument. [X] Error 4 error #6633: The
type of the actual argument differs from the type of the dummy
argument. [RESQ]
What Am I doing wrong?
I will sum the comments of #george and #M.S.B.
The types and kinds of dummy arguments of procedures must match the actual arguments used by the calling code. The compiler can check this when having an explicit interface, when the procedure is in a module, for example, and some compilers can do that also for external procedures, which is your case.
Placing the subroutines in modules is the preferred way for making this check possible at all conditions and all compilers.
By using implicit double precision (a-h, o-z) you declare all variables with name beginning with a-h or o-z as being double precision. In your main program you are using real for calling the procedure. This is the error, the types have to match.
It is strongly discouraged to use any other form of the implicit other than implicit none, which is the form that should be present at the beginning of every compilation unit (program, module, external procedures).

Reading a known number of variable from a file when one of the variables are missing in input file

I already checked similar posting. The solution is given by M. S. B. here Reading data file in Fortran with known number of lines but unknown number of entries in each line
So, the problem I am having is that from text file I am trying to read inputs. In one line there is supposed to be 3 variables. But sometimes the input file may have 2 variables. In that case I need to make the last variable zero. I tried using READ statement with IOSTAT but if there is only two values it goes to the next line and reads the next available value. I need to make it stop in the 1st line after reading 2 values when there is no 3rd value.
I found one way to do that is to have a comment/other than the type I am trying to read (in this case I am reading float while a comment is a char) which makes a IOSTAT>0 and I can use that as a check. But if in some cases I may not have that comment. I want to make sure it works even than.
Part of the code
read(15,*) x
read(15,*,IOSTAT=ioerr) y,z,w
if (ioerr.gt.0) then
write(*,*)'No value was found'
w=0.0;
goto 409
elseif (ioerr.eq.0) then
write(*,*)'Value found', w
endif
409 read(15,*) a,b
read(15,*) c,d
INPUT FILE is of the form
-1.000 abcd
12.460 28.000 8.00 efg
5.000 5.000 hijk
20.000 21.000 lmno
I need to make it work even when there is no "8.00 efg"
for this case
-1.000 abcd
12.460 28.000
5.000 5.000 hijk
20.000 21.000 lmno
I can not use the string method suggested by MSB. Is there any other way?
I seem to remember trying to do something similar in the past. If you know that the size of a line of the file won't exceed a certain number, you might be able to try something like:
...
character*(128) A
read(15,'(A128)') A !This now holds 1 line of text, padded on the right with spaces
read(A,*,IOSTAT=ioerror) x,y,z
if(IOSTAT.gt.0)then
!handle error here
endif
I'm not completely sure how portable this solution is from one compiler to the next and I don't have time right now to read up on it in the f77 standard...
I have a routine that counts the number of reals on a line. You could adapt this to your purpose fairly easily I think.
subroutine line_num_columns(iu,N,count)
implicit none
integer(4),intent(in)::iu,N
character(len=N)::line
real(8),allocatable::r(:)
integer(4)::r_size,count,i,j
count=0 !Set to zero in case of premature return
r_size=N/5 !Initially try out this max number of reals
allocate(r(r_size))
read(iu,'(a)') line
50 continue
do i=1,r_size
read(line,*,end=99) (r(j),j=1,i) !Try reading i reals
count=i
!write(*,*) count
enddo
r_size=r_size*2 !Need more reals
deallocate(r)
allocate(r(r_size))
goto 50
return
99 continue
write(*,*) 'I conclude that there are ',count,' reals on the first line'
end subroutine line_num_columns
If a Fortran 90 solution is fine, you can use the following procedure to parse a line with multiple real values:
subroutine readnext_r1(string, pos, value)
implicit none
character(len=*), intent(in) :: string
integer, intent(inout) :: pos
real, intent(out) :: value
integer :: i1, i2
i2 = len_trim(string)
! initial values:
if (pos > i2) then
pos = 0
value = 0.0
return
end if
! skip blanks:
i1 = pos
do
if (string(i1:i1) /= ' ') exit
i1 = i1 + 1
end do
! read real value and set pos:
read(string(i1:i2), *) value
pos = scan(string(i1:i2), ' ')
if (pos == 0) then
pos = i2 + 1
else
pos = pos + i1 - 1
end if
end subroutine readnext_r1
The subroutine reads the next real number from a string 'string' starting at character number 'pos' and returns the value in 'value'. If the end of the string has been reached, 'pos' is set to zero (and a value of 0.0 is returned), otherwise 'pos' is incremented to the character position behind the real number that was read.
So, for your case you would first read the line to a character string:
character(len=1024) :: line
...
read(15,'(A)') line
...
and then parse this string
real :: y, z, w
integer :: pos
...
pos = 1
call readnext_r1(line, pos, y)
call readnext_r1(line, pos, z)
call readnext_r1(line, pos, w)
if (pos == 0) w = 0.0
where the final 'if' is not even necessary (but this way it is more transparent imho).
Note, that this technique will fail if there is a third entry on the line that is not a real number.
I know the following simple solution:
w = 0.0
read(15,*,err=600)y, z, w
goto 610
600 read(15,*)y, z
610 do other stuff
But it contains "goto" operators
You might be able to use the wonderfully named colon edit descriptor. This allows you to skip the rest of a format if there are no further items in the I/O list:
Program test
Implicit None
Real :: a, b, c
Character( Len = 10 ) :: comment
Do
c = 0.0
comment = 'No comment'
Read( *, '( 2( f7.3, 1x ), :, f7.3, a )' ) a, b, c, comment
Write( *, * ) 'I read ', a, b, c, comment
End Do
End Program test
For instance with gfortran I get:
Wot now? gfortran -W -Wall -pedantic -std=f95 col.f90
Wot now? ./a.out
12.460 28.000 8.00 efg
I read 12.460000 28.000000 8.0000000 efg
12.460 28.000
I read 12.460000 28.000000 0.00000000E+00
^C
This works with gfortran, g95, the NAG compiler, Intel's compiler and the Sun/Oracle compiler. However I should say I'm not totally convinced I understand this - if c or comment are NOT read are they guaranteed to be 0 and all spaces respectively? Not sure, need to ask elsewhere.

Tweaking MIT's bitcount algorithm to count words in parallel?

I want to use a version of the well known MIT bitcount algorithm to count neighbors in Conway's game of life using SSE2 instructions.
Here's the MIT bitcount in c, extended to count bitcounts > 63 bits.
int bitCount(unsigned long long n)
{
unsigned long long uCount;
uCount = n – ((n >> 1) & 0×7777777777777777)
- ((n >> 2) & 0×3333333333333333)
- ((n >> 3) & 0×1111111111111111);
return ((uCount + (uCount >> 4))
& 0x0F0F0F0F0F0F0F0F) % 255;
}
Here's a version in Pascal
function bitcount(n: uint64): cardinal;
var ucount: uint64;
begin
ucount:= n - ((n shr 1) and $7777777777777777)
- ((n shr 2) and $3333333333333333)
- ((n shr 3) and $1111111111111111);
Result:= ((ucount + (count shr 4))
and $0F0F0F0F0F0F0F0F) mod 255;
end;
I'm looking to count the bits in this structure in parallel.
32-bit word where the pixels are laid out as follows.
lo-byte lo-byte neighbor
0 4 8 C 048C 0 4 8 C
+---------------+
1|5 9 D 159D 1|5 9 D
| |
2|6 A E 26AE 2|6 A E
+---------------+
3 7 B F 37BF 3 7 B F
|-------------| << slice A
|---------------| << slice B
|---------------| << slice C
Notice how this structure has 16 bits in the middle that need to be looked up.
I want to calculate neighbor counts for each of the 16 bits in the middle using SSE2.
In order to do this I put slice A in XMM0 low-dword, slice B in XXM0-dword1 etc.
I copy XMM0 to XMM1 and I mask off bits 012-456-89A for bit 5 in the low word of XMM0, do the same for word1 of XMM0, etc. using different slices and masks to make sure each word in XMM0 and XMM1 holds the neighbors for a different pixel.
Question
How do I tweak the MIT-bitcount to end up with a bitcount per word/pixel in each XMM word?
Remarks
I don't want to use a lookup table, because I already have that approach and I want to
test to see if SSE2 will speed up the process by not requiring memory accesses to the lookup table.
An answer using SSE assembly would be optimal, because I'm programming this in Delphi and I'm thus using x86+SSE2 assembly code.
The MIT algorithm would be tough to implement in SSE2, since there is no integer modulus instruction which could be used for the final ... % 255 expression. Of the various popcnt methods out there, the one that lends itself to SSE most easily and efficiently is probably the first one in Chapter 5 of "Hackers Delight" by Henry S. Warren, which I have implemented here in C using SSE intrinsics:
#include <stdio.h>
#include <emmintrin.h>
__m128i _mm_popcnt_epi16(__m128i v)
{
v = _mm_add_epi16(_mm_and_si128(v, _mm_set1_epi16(0x5555)), _mm_and_si128(_mm_srli_epi16(v, 1), _mm_set1_epi16(0x5555)));
v = _mm_add_epi16(_mm_and_si128(v, _mm_set1_epi16(0x3333)), _mm_and_si128(_mm_srli_epi16(v, 2), _mm_set1_epi16(0x3333)));
v = _mm_add_epi16(_mm_and_si128(v, _mm_set1_epi16(0x0f0f)), _mm_and_si128(_mm_srli_epi16(v, 4), _mm_set1_epi16(0x0f0f)));
v = _mm_add_epi16(_mm_and_si128(v, _mm_set1_epi16(0x00ff)), _mm_and_si128(_mm_srli_epi16(v, 8), _mm_set1_epi16(0x00ff)));
return v;
}
int main(void)
{
__m128i v0 = _mm_set_epi16(7, 6, 5, 4, 3, 2, 1, 0);
__m128i v1;
v1 = _mm_popcnt_epi16(v0);
printf("v0 = %vhd\n", v0);
printf("v1 = %vhd\n", v1);
return 0;
}
Compile and test as follows:
$ gcc -Wall -msse2 _mm_popcnt_epi16.c -o _mm_popcnt_epi16
$ ./_mm_popcnt_epi16
v0 = 0 1 2 3 4 5 6 7
v1 = 0 1 1 2 1 2 2 3
$
It looks like around 16 arithmetic/logical instructions so it should run at around 16 / 8 = 2 clocks per point.
You can easily convert this to raw assembler if you need to - each intrinsic maps to a single instruction.