Yosys synthesis - is this optimum?

I'm using yosys to synthesize simple circuits and show how the result varies with the cell library.
However, it looks like the result is not well optimized.
I'm using the library vsclib013.lib downloaded from: http://www.vlsitechnology.org/synopsys/vsclib013.lib
E.g. I synthesize an adder composed of 4 full adders. Since I do not use Carry_in and Carry_out, I expect a half adder (a two-input XOR) to be synthesized for the LSB adder.
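(To spell out the expectation: with Carry_in tied to '0', the full-adder equations for the LSB collapse to Sum(0) = A(0) xor B(0) and Carry(1) = A(0) and B(0), which is exactly a half adder.)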
The result of the synthesis is the following.
Number of cells 12
cgi2v0x05 4
iv1v0x05 4
xor3v1x05 4
It uses four three-input XOR cells.
This is also clear from the graph of the circuit, obtained with the yosys command 'show'.
The circuit is simply composed of four identical full adders, and there is no optimization for Carry_in being equal to '0' or for Carry_out being left unconnected.
The script I used to synthesize is:
ghdl TOP_ENTITY
hierarchy -check -top TOP_ENTITY
proc; opt; memory; opt; fsm; opt
techmap; opt
read_liberty -lib vsclib013.lib
dfflibmap -liberty vsclib013.lib
abc -liberty vsclib013.lib -D 1000 -constr constraint_file_vsclib013.txt
splitnets -ports; opt
clean
write_verilog TOP_ENTITY.v
flatten
show -stretch -format pdf -lib TOP_ENTITY.v
Thank you for any suggestion to improve the synthesis.

Thanks for your answer.
After some trial and error I obtained good results by simply using flatten; presumably flattening the hierarchy lets the optimizer see across the four full-adder instances, so the constant Carry_in and the unconnected Carry_out can be optimized away.
I also added -full to the opt commands for (hopefully) good measure.
Now, my working script is like this:
ghdl TOP_ENTITY
hierarchy -check -top TOP_ENTITY
flatten
proc; opt -full; memory; opt -full; fsm; opt -full
techmap; opt -full
read_liberty -lib vsclib013.lib
dfflibmap -liberty vsclib013.lib
abc -liberty vsclib013.lib -D 1000 -constr constraint_file_vsclib013.txt
splitnets -ports; opt -full
clean -purge
write_verilog TOP_ENTITY.v
flatten
show -stretch -format pdf -lib TOP_ENTITY.v
I also added the -purge option to the clean command to get a nicer printed schematic.

How to correctly use ogr2ogr and gdal_rasterize to rasterize a smaller area of a large GeoPackage vector?

I am using gdal_rasterize and ogr2ogr with the goal of getting a partial raster from a .gpkg file.
With the first command I want to clip a smaller area of the large map:
ogr2ogr -spat xmin ymin xmax ymax out.gpkg in.gpkg
This results in a file for which the command ogrinfo out.gpkg gives the expected output, listing the layer numbers and names.
Then trying to rasterize this new file with:
gdal_rasterize out.gpkg -burn 255 -ot Byte -ts 250 250 -l anylayer out.tif
results in ERROR 1: Cannot get layer extent, with any of the layer names given by ogrinfo.
Using the same command on the original in.gpkg doesn't give errors and results in the expected .tiff raster.
ogr2ogr --version GDAL 2.4.2, released 2019/06/28
gdal_rasterize --version GDAL 2.4.2, released 2019/06/28
This process should eventually be implemented with the GDAL C++ API.
Are the commands somehow invalid as given? If so, how?
Should the whole process be done differently? If so, how?
What does the ERROR 1: Cannot get layer extent mean?
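Since the plan is to eventually implement this with the GDAL C++ API, here is a minimal sketch of the same two steps through the GDAL Python bindings (the extent values and layer name are placeholders taken from the commands above; this only mirrors the workflow, it is not a confirmed fix for the error):

from osgeo import gdal

xmin, ymin, xmax, ymax = 0.0, 0.0, 1000.0, 1000.0  # placeholder extent

# Step 1: clip a spatial subset of the large GeoPackage (equivalent to the ogr2ogr call).
gdal.VectorTranslate(
    'out.gpkg', 'in.gpkg',
    options=gdal.VectorTranslateOptions(spatFilter=[xmin, ymin, xmax, ymax]))

# Step 2: rasterize one layer of the clipped file (equivalent to the gdal_rasterize call).
gdal.Rasterize(
    'out.tif', 'out.gpkg',
    options=gdal.RasterizeOptions(
        burnValues=[255], outputType=gdal.GDT_Byte,
        width=250, height=250, layers=['anylayer']))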

Horizontal artifacts between geotiff edges using gdalbuildvrt and gdaltranslate

I am trying to merge a number of geotiffs into one large geotiff with overviews; however, the final merged geotiff shows a number of horizontal artifacts around the edges of the original merged geotiffs (see here for an example).
I create the merged file using the following code:
# Produce combined VRT
string = 'gdalbuildvrt -srcnodata "0 0 0 0" -hidenodata -r bilinear %s -overwrite %s' % (tmp_vrt, GDal_merge_string)
os.system(string)
# Convert VRT to GeoTIFF
string = 'gdal_translate -b 1 -b 2 -b 3 -mask 4 --config GDAL_TIFF_INTERNAL_MASK YES -of GTiff %s %s' % (tmp_vrt, tmp_fname)
os.system(string)
I have a hunch that this might have to do with using gdal_translate on a VRT, as the errors occur on the edges of the original geotiffs, and in this case it might be related or similar to the issue found in this post.
This code is using VRTs to combine the geotiffs for speed purposes, but perhaps it might be better to just merge these with gdalwarp?
Edit: I have reduced the number of flags and left out the overviews in the code above, as suggested in the comment below by Benjamin. The error seems to be produced by the code above; I think the issue may lie in the masking process. I guess at some point in the process of stacking the bands the inputs are distorted. Is it generally inadvisable to run gdal_translate on VRTs?
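For reference, the gdalwarp alternative mentioned above could look roughly like this through the Python bindings (file names are placeholders; this is only a sketch of the alternative, not a verified fix for the artifacts):

from osgeo import gdal

input_tifs = ['tile_1.tif', 'tile_2.tif', 'tile_3.tif']  # placeholder input tiles

# Merge the tiles directly with gdalwarp instead of going through a VRT,
# treating '0 0 0 0' as the source nodata value.
gdal.Warp(
    'merged.tif', input_tifs,
    options=gdal.WarpOptions(
        format='GTiff',
        srcNodata='0 0 0 0',
        resampleAlg='bilinear'))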

How to count the number of CPU clock cycles between the start and end of a benchmark in gem5?

I'm interested in all of the following cases:
full system userland benchmark. Maybe the m5 guest tool has a way to do it?
bare metal benchmark. When gem5 exits it dumps the stats automatically, so the main question is how to skip the cycles for the bootloader and go straight to the benchmark itself.
Is there a way besides modifying the benchmark source with instrumentation instructions? How to write those instrumentation instructions in detail?
syscall emulation benchmark. I think gem5 just outputs the stats.txt at the end of the run, and then you can just grep for system.cpu.numCycles, but I have to confirm it; currently blocked on: How to solve "FATAL: kernel too old" when running gem5 in syscall emulation SE mode?
I want to use this to:
learn how CPUs work
learn how to optimize assembly code or compiler settings to run optimally on a given CPU
m5 tool
A good approximation is to run, ideally from a shell script that is the /init program:
m5 resetstats
run-benchmark
m5 dumpstats
Then on host:
grep -E '^system.cpu.numCycles ' m5out/stats.txt
Gives something like:
system.cpu.numCycles 33942872680 # number of cpu cycles simulated
Note that if you replay from a m5 checkpoint with a different CPU, e.g.:
--restore-with-cpu=HPI --caches
then you need to grep for a different identifier:
grep -E '^system.switch_cpus.numCycles ' m5out/stats.txt
resetstats zeroes out the cumulative stats, and dumpstats dumps what has been collected during the benchmark.
This is not perfect since there is some time between the exec syscall for m5 dumpstats finishing and the benchmark starting, but if the benchmark is long enough, this shouldn't matter.
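If you want to script the host-side extraction, a minimal sketch (my own helper, not part of gem5) that handles both the system.cpu and the system.switch_cpus spellings could look like this:

import re

def num_cycles(stats_path='m5out/stats.txt'):
    # Match system.cpu.numCycles or system.switch_cpus.numCycles.
    pattern = re.compile(r'^system\.(?:switch_)?cpus?\.numCycles\s+(\d+)')
    with open(stats_path) as f:
        for line in f:
            match = pattern.match(line)
            if match:
                return int(match.group(1))
    return None

print(num_cycles())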
http://arm.ecs.soton.ac.uk/wp-content/uploads/2016/10/gem5_tutorial.pdf also proposes a few more heuristics:
#!/bin/sh
# Wait for system to calm down
sleep 10
# Take a checkpoint in 100000 ns
m5 checkpoint 100000
# Reset the stats
m5 resetstats
run-benchmark
# Exit the simulation
m5 exit
m5 exit also works, since gem5 dumps the stats when it finishes.
Instrumentation instructions
Sometimes it seems inevitable that you have to modify the benchmark source code a bit with those instructions in order to:
skip initialization and go directly to steady state
evaluate individual main loop runs
You can of course deduce those instructions from the gem5 m5 tool source code, but here are some very easy to re-use one-line copy-pastes for arm and aarch64, e.g. for aarch64:
/* resetstats */
__asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x40 << 16);" : : : "x0", "x1");
/* dumpstats */
__asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x41 << 16);" : : : "x0", "x1");
The m5 tool uses the same mechanism under the hood, but by adding the instructions directly into the source we avoid the syscall, which is therefore more precise and representative (at the cost of more manual work).
However, to ensure that the assembly is not reordered around your ROI by the compiler, you might want to use the techniques mentioned at: Enforcing statement order in C++
Address monitoring
Another technique that can be used is to monitor addresses of interest instead of adding magic instructions to the source.
E.g., if you know that a benchmark starts with PC == 0x400, it should be possible to do something when that address is hit.
To find the addresses of interest, you would for example use readelf or gdb or tracing, and if running full system on top of Linux, ensure that ASLR is turned off.
This technique would be the least intrusive one, but the setup is harder, and to be honest I haven't done it yet. One day, one day.

Increasing number of predictions in Inception for Tensorflow

I am going through the training tutorial on retraining Inception's final layer, after having installed TensorFlow for Ubuntu with regular CPU support. I successfully made the flower example work; however, after switching to a new set of categories with ten sub-folders, I cannot make Inception produce ten scores for each input image rather than the default five. My current command line to run a test image looks like this, working with categories labelled 0-9.
bazel build tensorflow/examples/label_image:label_image && \
bazel-bin/tensorflow/examples/label_image/label_image \
--graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt \
--output_layer=final_result --input_layer=Mul \
--image=$HOME/Input/Example.jpg
Which produces as a result
5 (4): 0.642959
3 (2): 0.243444
9 (8): 0.0513504
4 (5): 0.0231318
6 (7): 0.0180509
However, I cannot find anything in the programs that Inception runs that would reconfigure how many output scores are produced, so that all ten of my categories get scores rather than just five. How do I change this?
I tried with 8 categories and was able to get results for all of them.
If your code has the line below:
top_k = predictions[0].argsort()[-5:][::-1]
change it to
top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
If your code contains predictions = np.squeeze(predictions), then use predictions instead of predictions[0].
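Put together, a self-contained illustration of that change (with made-up scores standing in for the network output and placeholder labels, since the actual graph and labels file are not reproduced here):

import numpy as np

# Stand-in for the softmax output of the retrained graph: one score per category.
predictions = np.array([[0.02, 0.24, 0.01, 0.05, 0.64, 0.005, 0.01, 0.005, 0.015, 0.005]])
predictions = np.squeeze(predictions)

# Rank every category instead of only the top 5.
top_k = predictions.argsort()[-len(predictions):][::-1]

labels = [str(i) for i in range(len(predictions))]  # placeholder for output_labels.txt
for node_id in top_k:
    print('%s (score = %.5f)' % (labels[node_id], predictions[node_id]))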
I ran this using the following command instead of bazel, and I found it easier:
python /path_to_file/label_image.py /path_to_image/image.jpeg
First make sure that the graph is created after you run retrain.py and that it is in the correct location (the default is inside /tmp/).

ghostscript downsampling of pdf images, downsample factor error

I issue the following command:
gs \
-o downsampled.pdf \
-sDEVICE=pdfwrite \
-dDownsampleColorImages=true \
-dColorImageResolution=180 \
-dColorImageDownsampleThreshold=1.0 \
And get the following errors:
Subsample filter does not support non-integer downsample factor (1.994360)
Failed to initialise downsample filter, downsampling aborted
(on some pages)
and:
Subsample filter does not support non-integer downsample factor (2.000029)
Failed to initialise downsample filter, downsampling aborted
Originally I tried to downsample to 150dpi, which gave the error with factor (2.40????), i.e. multiple errors where the last few digits differ between pages. So I guessed that the images are approximately 150*2.4 = 360 dpi and tried downsampling to 180 instead. But it seems the images are all slightly off?
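As a quick sanity check of that guess (assuming the embedded images are around 359 dpi, a value inferred only from the reported factors, not read from the file):

image_dpi = 359
for target_dpi in (150, 180):
    print(target_dpi, image_dpi / target_dpi)
# 150 -> ~2.393 and 180 -> ~1.994, consistent with the non-integer factors in the errors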
Is there a way to specify the factor instead of the dpi?
Is there a way to "round" the factor?
No, there is no way to specify the factor (these are the Adobe-specified distiller params; we are currently limited to those). You cannot specify an approximation for rounding either, without modifying the source code.
You can use a different downsampling algorithm.
[much later]
In fact I just checked the current code, and you must be using an old version of Ghostscript.
The current default downsampling filter is the Bicubic filter, and if you do force the Subsample filter, then the code checks to see if the downsample factor requested is an integer.
If the factor is not an integer but is within 0.1 of an integer, then it forces the factor to the nearest integer.
If it's outside 0.1 of an integer factor, then it aborts the Subsample filter and switches to Bicubic.
I'd recommend upgrading.
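That rule can be summarised with a small sketch (my own illustration of the behaviour described above, not the actual Ghostscript source):

def subsample_factor(requested):
    # Snap to the nearest integer if we are within 0.1 of it, otherwise
    # give up on Subsample and fall back to Bicubic (returned as None here).
    nearest = round(requested)
    if abs(requested - nearest) <= 0.1:
        return float(nearest)
    return None

print(subsample_factor(1.994360))  # -> 2.0, close enough to be snapped
print(subsample_factor(2.4))       # -> None, falls back to Bicubic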
[later edit]
So avoiding the bogus ColorDownsampleOption, the problem is actually not colour images at all, it's monochrome images, or more precisely in your case, imagemasks.
I set up this command line:
gs \
-sDEVICE=pdfwrite \
-sOutputFile=pdfwrite.pdf \
-dDownsampleColorImages=true \
-dDownsampleGrayImages=true \
-dDownsampleMonoImages=true \
-dColorImageDownsampleThreshold=1 \
-dGrayImageDownsampleThreshold=1 \
-dMonoImageDownsampleThreshold=1 \
-dColorImageDownsampleType=/Bicubic \
-dGrayImageDownsampleType=/Bicubic \
-dMonoImageDownsampleType=/Bicubic \
-dColorImageResolution=72 \
-dGrayImageResolution=72 \
-dMonoImageResolution=100 "gs sample.pdf"
And that produces an error message that the only filter available for monochrome images is Subsample, followed by the error messages you quote about the imprecise factor.
I guess basically this makes my point that an example file is pretty much vital in order to investigate problems.
So there is a problem there, and I will look into it; obviously for monochrome images it should be clamped to the nearest integer resolution, since no other filter is possible. However, Gray and Colour images do work as expected.
Reporting a bug, as I suggested in an earlier comment, would probably have got to this point much sooner. I'd still suggest you do that, so that this is not overlooked.
You may be interested to note that, for me, the resulting file when I don't downsample monochrome images, but do downsample the others as per the command line above, is 785KB, the original being 2.5MB.