I was curious whether a ready-made script exists that would provide a starting point for an ultimate code size tracker tool. To start with, I'd like to be able to graph size with various optimisation options for a number of cross-compiler targets, and I'm quite tempted to put this on a revision timeline later as well.
So, given the output from the size command:
text data bss dec hex filename
1634 0 128 1762 6e2 csv_data.o (ex libs/libxyz.a)
28 0 0 28 1c csv_data_layer.o (ex libs/libxyz.a)
1063 0 0 1063 427 http_parser.o (ex libs/libxyz.a)
1312 0 1024 2336 920 http_queries.o (ex libs/libxyz.a)
8 36 0 44 2c transport.o (ex libs/libxyz.a)
1748 0 3688 5436 153c transport_layer.o (ex libs/libxyz.a)
8 0 0 8 8 misc_allocator.o (ex libs/libxyz.a)
847 108 1 956 3bc misc_err.o (ex libs/libxyz.a)
0 4 0 4 4 misc_globals.o (ex libs/libxyz.a)
273 0 0 273 111 misc_helpers.o (ex libs/libxyz.a)
71 0 4 75 4b misc_printf.o (ex libs/libxyz.a)
1044 0 44 1088 440 misc_time.o (ex libs/libxyz.a)
3724 0 0 3724 e8c xyz.o (ex libs/libxyz.a)
627 0 0 627 273 dummy.o (ex libs/libxyz.a)
8 16 0 24 18 dummy_layer.o (ex libs/libxyz.a)
12395 164 4889 17448 4428 (TOTALS)
Most of the values differ when the library is compiled with various optimisation flags (e.g. -Os, -O0, -O1, -O2) and a variety of cross-compilers (e.g. AVR, MSP430, ARMv6, i386), so I'd like to make a combined graph or set of graphs using gnuplot, d3.js, matplotlib or any other package. Has anyone seen a ready-made script which would help with this, even partially (e.g. one that at least converts the above tabular format to CSV, JSON or XML), or some study paper that presents a decent visualisation example? I have to admit, it's rather hard to find this using a web search engine.
Here is a possible visualization of the data as a bar chart using gnuplot. This is of course not the ultimate visualization, but it should be a good starting point.
set style data histogram
set style histogram rowstacked
set style fill solid 1.0 border lc rgb "white"
set xtics rotate 90
set key outside reverse Left
set bmargin 8
plot 'file.dat' using (!(stringcolumn(6) eq "(TOTALS)") ? column(1) : 1/0):xtic(6) title columnheader(1), \
for [i=2:5] '' using (!(stringcolumn(6) eq "(TOTALS)") ? column(i) : 1/0) title columnheader(i)
With the settings set terminal pngcairo size 1000,800, this gives
You must also decide which columns you want to use, because plotting every column for every file for every compiler will be quite messy. Maybe you want to plot only the total size (the dec column):
set style data histogram
set style histogram clustered
set style fill solid 1.0 noborder
set xtics rotate 90
set key outside reverse Left
set bmargin 8
plot 'file.dat' using (!(stringcolumn(6) eq "(TOTALS)") ? $4 : 1/0):xtic(6) title 'i386', \
'' using (!(stringcolumn(6) eq "(TOTALS)") ? $4*1.2 : 1/0) title 'ARMv6',\
'' using (!(stringcolumn(6) eq "(TOTALS)") ? $4*0.7 : 1/0) title 'AVR'
Which gives you:
Note that the lengthy using statements are only there to skip the last line with the (TOTALS) row. Alternatively, you could remove this last line with head (negative line counts need GNU head), either when generating the data files or on-the-fly like this:
plot '< head -n -1 file.dat' using 4:xtic(6) title 'i386', \
'' using ($4*1.2) title 'ARMv6',\
'' using ($4*0.7) title 'AVR'
Of course, for your real data you would have one file per target and plot the dec column of each directly, something like
plot '< head -n -1 file-i386.dat' using 4:xtic(6) title 'i386', \
'< head -n -1 file-armv6.dat' using 4 title 'ARMv6',\
'< head -n -1 file-avr.dat' using 4 title 'AVR'
I hope this gives you an idea of the different visualization possibilities. Which one is appropriate is something you must decide for yourself.
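For the conversion step the question asks about (tabular size output to CSV), a minimal Python sketch could look like the following. The function names, the target/opt columns and the sample values "avr" and "-Os" are assumptions made for illustration; only the column layout comes from the size output shown in the question.

```python
import csv
import io


def parse_size_output(text):
    """Parse Berkeley-format `size` output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header line and anything too short to be a data row.
        if len(parts) < 6 or not parts[0].isdigit():
            continue
        name = parts[5]
        if name == "(TOTALS)":
            continue  # the totals can be recomputed from the rows
        text_sz, data_sz, bss_sz, dec_sz = (int(p) for p in parts[:4])
        rows.append({"file": name, "text": text_sz, "data": data_sz,
                     "bss": bss_sz, "dec": dec_sz})
    return rows


def to_csv(rows, target, optflag):
    """Render rows as CSV, tagging each row with target and optimisation flag."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["target", "opt", "file", "text", "data", "bss", "dec"])
    for r in rows:
        writer.writerow([target, optflag, r["file"], r["text"],
                         r["data"], r["bss"], r["dec"]])
    return out.getvalue()


SAMPLE = """\
text data bss dec hex filename
1634 0 128 1762 6e2 csv_data.o (ex libs/libxyz.a)
28 0 0 28 1c csv_data_layer.o (ex libs/libxyz.a)
12395 164 4889 17448 4428 (TOTALS)
"""

rows = parse_size_output(SAMPLE)
print(to_csv(rows, "avr", "-Os"), end="")
```

Running this once per compiler and flag combination and concatenating the results gives one long table, which should be easy to feed to gnuplot, matplotlib or d3.js.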
I have to write multilingual text to a PDF using C++. I have the Unicode values as well as the glyph ID values, with their advances and displacements, for the input string.
But I need to know how to position a dependent glyph relative to its independent base glyph.
Given the advance and displacement values from FreeType / HarfBuzz, how should I put these values into the PDF content stream along with the glyph IDs?
I have tried the output values of FreeType and HarfBuzz. The individual glyphs print properly, but the positioning of a dependent glyph relative to its base glyph is still wrong, even when I use the advance and displacement values from their outputs.
I just need the logic for using the output values in the content stream to produce a properly readable word/letter.
Example:
Text = Tamil letter + Hindi letter.
I need to print this output: (image: proper output)
But currently I am only able to print this: (image: improper output)
Tamil combined letter:
வ = U+0BB5 TAMIL LETTER VA = base glyph
ா = U+0BBE TAMIL VOWEL SIGN AA = dependent glyph
HarfBuzz run:
hb-shape.exe -O json -u u+0bb5,u+0bbe --no-glyph-names "C:\\Windows\\Fonts\\Nirmala.ttf"
gid output:
[{"g":2953,"cl":0,"dx":0,"dy":0,"ax":2111,"ay":0},{"g":2959,"cl":0,"dx":0,"dy":0,"ax":1453,"ay":0}]
Hindi combined letter:
म = U+092E DEVANAGARI LETTER MA = base glyph
ि = U+093F DEVANAGARI VOWEL SIGN I = dependent glyph
HarfBuzz run:
hb-shape.exe -O json -u u+092e,u+093f --no-glyph-names "C:\\Windows\\Fonts\\Nirmala.ttf"
gid output:
[{"g":302,"cl":0,"dx":0,"dy":0,"ax":532,"ay":0},{"g":273,"cl":0,"dx":0,"dy":0,"ax":1379,"ay":0}]
I substituted these output values into the formula from the PDF documentation (image: PDF doc formula),
assuming unity for all variables except width and advance,
and obtaining the width values using FreeType.
Computed glyph advance values for the four glyphs, in order:
tx = 1769
tx = 1132
tx = 1586
tx = 1448
If I provide these values in the content stream in the order as
<glyph id 1> tx 1 <glyph id 2> tx 2 <glyph id 3> tx 3 <glyph id 4> tx 4
Content stream:
/OC /oc2 BDC q BT /FXF1 1 Tf 70.866142 0.000000 0.000000 70.866142 28.346457 141.732285 Tm[<0B89>-1769<0B8F>-1132<0111>-1586<012E>-1448]TJ ET Q EMC
The PDF documentation says a positive (+ve) advance value moves the text towards the left.
Is it the other way round?
Or should the difference of the advances be used instead?
Additional PDF objects:
Font descriptor object, base font object, font object.
I have also tried using only the advance values and only the computed values.
The only problem is the horizontal and vertical space within combined glyphs, which also affects the spacing between subsequent glyphs.
None of these renders the glyphs legibly, at least not in a generalised programmatic manner.
From my reading of @mkl's answers in various places on Stack Overflow, I suspect each glyph needs its own transformation matrix or Td. But is it really that complex?
To my mind, this should be easy to render.
If an individual transformation matrix or Td is needed, how do I compute the values to supply for them?
Any help and guidance is welcome and much appreciated.
Thank you.
It helps to work out the PDF as plain text, which you can write and save in Notepad.
Here I am altering a batch.cmd (work in progress :-) to test that my compiler handles the changes as text, but you can use raw PDF in an editor too. Beware: after cut and paste, a value or two may need changing. I also don't yet know how to easily reference non-Latin fonts (the next hurdle after images, which are almost done), so I used the "Symbol" font to illustrate those positioning modifications.
Note that for specific queries @mkl is the expert; I simply program by examples that function, not by the book.
%PDF-1.0
%µ¶µ¶
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj
2 0 obj<</Type/Pages/Count 1/Kids[3 0 R]>>endobj
3 0 obj<</Type/Page/Parent 2 0 R/MediaBox [0 0 594 792]/Resources<</Font<< /F1 4 0 R /F2 5 0 R>>>>/Contents 6 0 R>>endobj
4 0 obj<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>endobj
5 0 obj<</Type/Font/Subtype/Type1/BaseFont/Symbol>>endobj
%Comment the following /Length 0999 is a dummy value it should be altered to equal decimal stream length, but most readers will ignore or work around invalid
6 0 obj<</Length 1326>>
stream
q
BT /F1 20 Tf 072 740 Td (20 units (default units usually = pts) high Headline) Tj ET
BT /F1 16 Tf 036 700 Td (All text is "Body" text. (no heads or tails)) Tj ET
BT /F1 10 Tf 004 780 Td (Text can be any order see "Body" text above. (Printed by Filename="C:\Users\K\Downloads\Programming\CMDaPDF\MAKE2PDF.cmd") spot the escape errors) Tj ET
BT /F1 12 Tf 036 675 Td (Here # 12 units high you must include just enough text for parts of a line. PDF has no page feeds no wrapping,) Tj 0 -20 Td (nor \\new line feed, no ¶aragraphs) Tj 86 -15 Td (nor carriage \r\\return. \n\r ) Tj 100 5 Td ( It is not \007\010\011\012\\tabular, each page is one row of multiple pages,) Tj 50 -15 Td (each page is one text column wide .[ ×] no yes check) Tj 0 -10 Td (each row is one text column wide .[x] no is yes) Tj 0 -10 Td (each row is one text column wide . · bullet point OK) Tj ET
BT +0.50 Tc -1.4 Tw 999 TL /F1 1 Tf 15 001 10. 30 200.000 440.000 Tm [(Jane A)600(usten)] TJ ET
BT +0.50 Tc 0.00 Tw 000 TL /F2 1 Tf 15 000 000 15 200.000 430.000 Tm [(Ja)-1000(ne Austen)] TJ ET
BT -1.20 Tc 0.00 Tw 999 TL /F2 1 Tf 15 000 000 15 200.000 420.000 Tm [(J)-1200(a)800(ne Austen)] TJ ET
BT +0.00 Tc 0.00 Tw 000 TL /F2 1 Tf 15 000 000 15 200.000 410.000 Tm [(Jane A)100(us)-500(ten)] TJ ET
Q
endstream
xref
0 7
0000000000 65535 f
0000000019 00000 n
0000000065 00000 n
0000000117 00000 n
0000000242 00000 n
0000000306 00000 n
0000000527 00000 n
trailer<</Size 7/Root 1 0 R>>
startxref
1903
%%EOF
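On the question of what numbers go into the TJ array: per the PDF text-positioning formula, each number is in thousandths of a text space unit and is subtracted from the displacement that the glyph's own width already produces. So writing the full advance as a negative number adds it on top of the glyph width. A rough Python sketch of the "difference" approach follows. The function name tj_array, the pdf_widths table (what the PDF font's width array would declare) and upem = 2048 are assumptions for illustration and are not taken from Nirmala.ttf; the glyph records are the Tamil ones from the question. Nonzero dx/dy offsets are not handled here and would need separate treatment.

```python
def tj_array(glyphs, pdf_widths, upem):
    """Build a TJ operand from HarfBuzz-style glyph records.

    glyphs:     list of dicts with 'g' (glyph id) and 'ax' (advance in font units)
    pdf_widths: glyph id -> width declared by the PDF font (1/1000 text space units)
    upem:       font units per em
    """
    parts = []
    for rec in glyphs:
        gid = rec["g"]
        parts.append("<%04X>" % gid)
        wanted = rec["ax"] * 1000.0 / upem   # desired advance in 1/1000 units
        adjust = pdf_widths[gid] - wanted    # TJ numbers are subtracted
        if abs(adjust) > 0.5:
            parts.append("%d" % round(adjust))
    return "[" + "".join(parts) + "] TJ"


# Glyph records from the hb-shape output in the question (Tamil VA + AA).
tamil = [{"g": 2953, "ax": 2111}, {"g": 2959, "ax": 1453}]
# Hypothetical widths, as if the PDF font only declared a default width of 1000.
widths = {2953: 1000, 2959: 1000}

print(tj_array(tamil, widths, upem=2048))
```

If the PDF font's declared widths already match the HarfBuzz advances, every adjustment comes out as zero and plain glyph IDs are enough.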
This is a red box:
162 86 m 162 286 l 362 286 l 362 86 l h
1 0 0 rg f
How can I add partial transparency to it?
I've read the transparency section of the PDF spec, but I can only seem to find models and formulas, not how to actually add alpha to a fill.
As the OP indicated, there is a whole section in the PDF specification on the topic of Transparency. This is due to a multitude of ways to apply transparency. The most appropriate way for the OP's context is explained in the following section:
11.6.4.4 Constant Shape and Opacity
The current alpha constant parameter in the graphics state (see “Graphics State”) shall be two scalar values—one for strokes and one for all other painting operations—to be used for the constant shape (f_k) or
constant opacity (q_k) component in the colour compositing formulas.
NOTE 1 This parameter is analogous to the current colour used when painting elementary objects.
The nonstroking alpha constant shall also be applied when painting a transparency group’s results onto its backdrop.
The stroking and nonstroking alpha constants shall be set, respectively, by the CA and ca entries in a graphics state parameter dictionary (see “Graphics State Parameter Dictionaries”). As described previously for the soft mask, the alpha source flag in the graphics state shall determine whether the alpha constants are interpreted as shape values (true) or opacity values (false).
Thus, you first have to define an appropriate graphics state parameter dictionary in the page resources, e.g.:
/Resources<</ExtGState<<
/GS1 <</ca 0.5>>
>>>>
Now you can use these named graphics state parameters in your content stream:
/GS1 gs
1 0 0 rg
162 86 m
162 286 l
362 286 l
362 86 l
h
f
If drawn upon a green lattice, the result looks like this:
By the way, there was an error in the OP's original content stream fragment
162 86 m 162 286 l 362 286 l 362 86 l h
1 0 0 rg f
The color setting operation here is between the path definition (162 ... l h) and the path filling operation (f). This is invalid; compare Figure 9 – Graphics Objects in the specification: after path construction (and an optional clipping path operator), the path painting operation must follow immediately. (Numerous PDF viewers do accept the invalid operation order, but it's invalid nonetheless.)
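To try the ExtGState approach end to end, a complete one-page file can be assembled programmatically so that the stream length and cross-reference offsets come out right. The following is only a sketch: build_pdf is a hypothetical helper, and the MediaBox and object layout are assumptions; the /GS1 dictionary and the content stream are the ones shown above. The header says 1.4 because that is the PDF version that introduced transparency.

```python
def build_pdf(objects):
    """Serialize objects numbered 1..n and append a matching xref table."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "num 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<</Size %d/Root 1 0 R>>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


# Content stream: select the graphics state, set the colour, then build and
# fill the path (colour is set before path construction starts).
content = (b"/GS1 gs\n"
           b"1 0 0 rg\n"
           b"162 86 m 162 286 l 362 286 l 362 86 l h\n"
           b"f\n")

objects = [
    b"<</Type/Catalog/Pages 2 0 R>>",
    b"<</Type/Pages/Count 1/Kids[3 0 R]>>",
    b"<</Type/Page/Parent 2 0 R/MediaBox[0 0 500 400]"
    b"/Resources<</ExtGState<</GS1<</ca 0.5>>>>>>/Contents 4 0 R>>",
    b"<</Length %d>>\nstream\n" % len(content) + content + b"endstream",
]

pdf = build_pdf(objects)
print(len(pdf), "bytes")
```

Writing pdf to a file opened in binary mode should give a document showing the half-transparent red box on a white page.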
The alpha value for the upcoming operations need not be constant. Instead it can e.g. be governed by a mask with, say, a radial shading.
Indeed, if you define the graphics state parameters like this:
/Resources<</ExtGState<<
/GS1 << /SMask<</Type/Mask/S/Luminosity/G 1 0 R >> >>
>> >>
and the object 1 0 is this XObject:
1 0 obj
<<
/Group<</CS/DeviceGray/S/Transparency>>
/Type/XObject
/Resources<</Shading<<
/Sh1<<
/Coords[262 186 10 262 186 190]
/ColorSpace/DeviceRGB
/ShadingType 3
/Extend[true true]
/Function <</Domain[0 1]/FunctionType 2/N 1/C1[0 0 0]/C0[1 1 1]>>
>>
>>>>
/Subtype/Form
/BBox[0 0 500 400]
/Matrix [1 0 0 1 0 0]
/Length 10
/FormType 1
>>stream
/Sh1 sh
endstream
you get for the above content stream fragment drawn upon a green lattice:
My data file has this content
# data file for use with gnuplot
# Report 001
# Data as of Tuesday 03-Sep-2013
total 1976
case1 522 278 146 65 26 7
case2 120 105 15 0 0 0
case3 660 288 202 106 63 1
I am making a histogram from the case... lines using the script below - and that works. My question is: how can I load the grand total value 1976 (next to the word 'total') from the data file and either (a) store it into a variable or (b) use it directly in the title of the plot?
This is my gnuplot script:
reset
set term png truecolor
set terminal pngcairo size 1024,768 enhanced font 'Segoe UI,10'
set output "output.png"
set style fill solid 1.00
set style histogram rowstacked
set style data histograms
set xlabel "Case"
set ylabel "Frequency"
set boxwidth 0.8
plot for [i=3:7] 'mydata.dat' every ::1 using i:xticlabels(1) with histogram \
notitle, '' every ::1 using 0:2:2 \
with labels \
title "My Title"
For the benefit of others trying to label histograms, in my data file, the column after the case label represents the total of the rest of the values on that row. Those total numbers are displayed at the top of each histogram bar. For example for case1, 522 is the total of (278 + 146 + 65 + 26 + 7).
I want to display the grand total somewhere on my chart, say as the second line of the title or in a label. I can get a variable into sprintf into the title, but I have not figured out syntax to load a "cell" value ("cell" meaning row column intersection) into a variable.
Alternatively, if someone can tell me how to use the sum function to total up 522+120+660 (read from the data file, not as constants!) and store that total in a variable, that would obviate the need to have the grand total in the data file, and that would also make me very happy.
Many thanks.
Let's start with extracting a single cell at (row,col). If it is a single value, you can use the stats command to extract it. The row and col are specified with every and using, as in a plot command. In your case, to extract the total value, use:
# extract the 'total' cell
stats 'mydata.dat' every ::::0 using 2 nooutput
total = int(STATS_min)
To sum up all values in the second column, use:
stats 'mydata.dat' every ::1 using 2 nooutput
total2 = int(STATS_sum)
And finally, to sum up all values in columns 3:7 in all rows (i.e. the same as the previous command, but without using the saved totals), use:
# sum all values from columns 3:7 from all rows
stats 'mydata.dat' every ::1 using (sum[i=3:7] column(i)) nooutput
total3 = int(STATS_sum)
These commands require gnuplot 4.6 or later to work.
So, your plotting script could look like the following:
reset
set terminal pngcairo size 1024,768 enhanced
set output "output.png"
set style fill solid 1.00
set style histogram rowstacked
set style data histograms
set xlabel "Case"
set ylabel "Frequency"
set boxwidth 0.8
# extract the 'total' cell
stats 'mydata.dat' every ::::0 using 2 nooutput
total = int(STATS_min)
plot for [i=3:7] 'mydata.dat' every ::1 using i:xtic(1) notitle, \
'' every ::1 using 0:(s = sum [i=3:7] column(i), s):(sprintf('%d', s)) \
with labels offset 0,1 title sprintf('total %d', total)
which gives the following output:
For linux and similar.
If you don't know the row number where your data is located, but you know it is in the n-th column of a row where the value of the m-th column is x, you can define a function
get_data(m,x,n,filename)=system('awk "\$'.m.'==\"'.x.'\"{print \$'.n.'}" '.filename)
and then use it, for example, as
y = get_data(1,"case2",4,"datafile.txt")
using data provided by user424855
print y
should return 15
It's not clear to me where your "grand total" of 1976 comes from. If I calculate 522+120+660 I get 1302 not 1976.
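The arithmetic can be checked outside gnuplot with a few lines of Python. This is just a cross-check sketch; read_cases is a made-up helper name and the data is copied from the question.

```python
DATA = """\
# data file for use with gnuplot
total 1976
case1 522 278 146 65 26 7
case2 120 105 15 0 0 0
case3 660 288 202 106 63 1
"""


def read_cases(text):
    """Return (grand_total, {case: [row_total, v1, v2, ...]})."""
    grand_total = None
    cases = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if fields[0] == "total":
            grand_total = int(fields[1])
        else:
            cases[fields[0]] = [int(v) for v in fields[1:]]
    return grand_total, cases


grand_total, cases = read_cases(DATA)
for name, (row_total, *parts) in cases.items():
    print(name, row_total, sum(parts))  # stored row total vs. recomputed sum

summed = sum(row[0] for row in cases.values())
print(grand_total, summed)
```

Each stored row total matches the sum of its row, but the stored grand total (1976) does not match the sum of the row totals (1302).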
Anyway, here is a solution which works even without stats and sum which were not available in gnuplot 4.4.0.
In the data you don't necessarily need the "grand total" or the sum of each row, because gnuplot can calculate this for you. This is done by (not) plotting the file as a matrix, while at the same time accumulating the row sums in the string variable S0 and the total sum in the variable Total. There will be a warning (warning: matrix contains missing or undefined values), which you can ignore. The labels are added by plotting '+' ... with labels, extracting the desired values from the S0 string.
Data: SO18583180.dat
So, the reduced input data looks like this:
# data file for use with gnuplot
# Report 001
# Data as of Tuesday 03-Sep-2013
case1 278 146 65 26 7
case2 105 15 0 0 0
case3 288 202 106 63 1
Script: (works for gnuplot>=4.4.0, March 2010 and gnuplot 5.x)
### histogram with sums and total sum
reset
FILE = "SO18583180.dat"
set style histogram rowstacked
set style data histograms
set style fill solid 0.8
set xlabel "Case"
set ylabel "Frequency"
set boxwidth 0.8
set key top left noautotitle
set grid y
set xrange [0:2]
set offsets 0.5,0.5,0,0
Total = 0
S0 = ''
addSums(v) = S0.sprintf(" %g",(M=$2,(N=$1+1)==1?S1=0:0,S1=S1+v))
plot for [i=2:6] FILE u i:xtic(1) notitle, \
'' matrix u (S0=addSums($3),Total=Total+$3,NaN) w p, \
'+' u 0:(real(S2=word(S0,int($0*N+N)))):(S2) every ::::M w labels offset 0,0.7 title sprintf("Total: %g",Total)
### end of script
Result: (created with gnuplot 4.4.0, Windows terminal)
How do I calculate leading in a PDF document?
For example:
48 0 0 48 72 677.28 Tm
(Hello World) Tj
0 -1.1075 TD
This renders the text Hello World at 48pt/57.6pt (120% line height) in Times-Roman.
According to the PDF Reference manual, "the leading parameter is measured in unscaled text space units. It specifies the vertical distance between the baselines of adjacent lines of text... The number is expressed in thousandths of a unit of text space."
Can someone please explain how 1.1075 and 57.6 are related?
Your PDF commands are in the wrong order. I suppose you mean:
48 0 0 48 72 677.28 Tm
0 -1.1075 TD
(Hello World) Tj
The Tm command sets the text coordinate system:
Scale by 48 in x and by 48 in y
Start position (72, 677.28)
The TD command then moves to the start of the next line, offset by -1.1075 text space units in y, and as a side effect sets the leading to 1.1075 text space units. A text space unit in this example is a user space unit multiplied by 48, as set by the Tm command.
Your PDF code can be simplified. This draws the text at the same position:
48 0 0 48 72 624.12 Tm
(Hello World) Tj
Explanation: 677.28 - (1.1075*48) = 624.12
You should always remember that PDF is a language. To calculate the real coordinates you have to parse all previous commands.
There may be something like this before your commands:
10 0 0 10 0 0 cm
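As a rough illustration of the scaling involved, the baseline distance produced by a TD offset can be computed by multiplying through the vertical scale factors. This is a sketch that assumes pure scaling (no rotation or skew); leading_in_user_space and its parameter names are made up for the example.

```python
def leading_in_user_space(ty, tm_scale_y, ctm_scale_y=1.0):
    """Baseline-to-baseline distance produced by `0 ty TD`.

    ty:          vertical TD operand, in unscaled text space units
    tm_scale_y:  vertical scale of the text matrix (the 4th operand of Tm)
    ctm_scale_y: vertical scale of any enclosing cm (1.0 if there is none)
    """
    return abs(ty) * tm_scale_y * ctm_scale_y


# The example from the question: roughly 53.16 units between baselines.
print(leading_in_user_space(-1.1075, 48))

# The same text under an enclosing `10 0 0 10 0 0 cm`: ten times as much.
print(leading_in_user_space(-1.1075, 48, 10))

# Going the other way: the TD operand needed for a 57.6 unit leading at scale 48.
print(57.6 / 48)
```

By this arithmetic, -1.1075 at a scale of 48 corresponds to about 53.16 units, while a 57.6 unit leading (120% of 48) would need an operand of about -1.2.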
The leading is usually set in the PDF by the command TL, just like this:
12 TL
(El ingenioso hidalgo don Quijote de la Mancha)'
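That 12 sets a leading of 12 unscaled text space units (12 points if nothing scales the text space), which stays in effect until another TL is found. The ' operator then moves to the next line using that leading and shows the string.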
I hope it helps you. I think this is the easiest way to do it :)
If you look at the original Wordnet search and select "Display options: Show Lexical File Info", you'll see an extremely useful classification of words called lexical file. Eg for "filling" we have:
<noun.substance>S: (n) filling, fill (any material that fills a space or container)
<noun.process>S: (n) filling (flow into something (as a container))
<noun.food>S: (n) filling (a food mixture used to fill pastry or sandwiches etc.)
<noun.artifact>S: (n) woof, weft, filling, pick (the yarn woven across the warp yarn in weaving)
<noun.artifact>S: (n) filling ((dentistry) a dental appliance consisting of ...)
<noun.act>S: (n) filling (the act of filling something)
The first thing in brackets is the "lexical file". Unfortunately, I have not been able to find a SPARQL endpoint that provides this info.
The latest RDF translation of Wordnet 3.0 points to two things:
Talis SPARQL endpoint. Use eg this query to check there's no such info:
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-chair-noun-1>
W3C's mapping description. Appendix D "Conversion details" describes something useful: wn:classifiedByTopic.
But it's not the same as lexical file, and is quite incomplete. Eg "chair" has nothing, while one of the senses of "completion" is in the topic "American Football"
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-completion-noun-1> ->
<j.1:classifiedByTopic rdf:resource="http://purl.org/vocabularies/princeton/wn30/synset-American_football-noun-1"/>
The question: is there a public Wordnet query API, or a database, that provides the lexical file information?
Using the Python NLTK interface:
from nltk.corpus import wordnet as wn
for synset in wn.synsets('can'):
    # lexname() returns the lexical file name, such as 'noun.artifact'
    print(synset.lexname())
I don't think you can find it in the RDF/OWL Representation of WordNet. It's in the WordNet distribution though: dict/lexnames. Here is the content of the file as of WordNet 3.0:
00 adj.all 3
01 adj.pert 3
02 adv.all 4
03 noun.Tops 1
04 noun.act 1
05 noun.animal 1
06 noun.artifact 1
07 noun.attribute 1
08 noun.body 1
09 noun.cognition 1
10 noun.communication 1
11 noun.event 1
12 noun.feeling 1
13 noun.food 1
14 noun.group 1
15 noun.location 1
16 noun.motive 1
17 noun.object 1
18 noun.person 1
19 noun.phenomenon 1
20 noun.plant 1
21 noun.possession 1
22 noun.process 1
23 noun.quantity 1
24 noun.relation 1
25 noun.shape 1
26 noun.state 1
27 noun.substance 1
28 noun.time 1
29 verb.body 2
30 verb.change 2
31 verb.cognition 2
32 verb.communication 2
33 verb.competition 2
34 verb.consumption 2
35 verb.contact 2
36 verb.creation 2
37 verb.emotion 2
38 verb.motion 2
39 verb.perception 2
40 verb.possession 2
41 verb.social 2
42 verb.stative 2
43 verb.weather 2
44 adj.ppl 3
For each entry of dict/data.*, the second number is the lexical file info. For example, this filling entry contains the number 13, which is noun.food.
07883031 13 n 01 filling 0 002 @ 07882497 n 0000 ~ 07883156 n 0000 | a food mixture used to fill pastry or sandwiches etc.
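Reading that number programmatically only takes a few lines. Below is a minimal Python sketch, assuming the field layout of the dict/data.* files (synset offset, lexical file number, part of speech, a hexadecimal word count, then word and lex_id pairs). The parse_data_line name is made up, and LEXNAMES is only an excerpt of the table above.

```python
# Excerpt of dict/lexnames (number -> name); the full table has 45 entries.
LEXNAMES = {4: "noun.act", 6: "noun.artifact", 13: "noun.food",
            22: "noun.process", 27: "noun.substance"}


def parse_data_line(line):
    """Extract offset, part of speech, lexical file name and words from one entry."""
    fields = line.split()
    lex_filenum = int(fields[1])   # two-digit decimal lexical file number
    w_cnt = int(fields[3], 16)     # word count is a two-digit hexadecimal number
    words = [fields[4 + 2 * i] for i in range(w_cnt)]  # skip the lex_id fields
    return {
        "offset": fields[0],
        "pos": fields[2],
        "lexname": LEXNAMES.get(lex_filenum, "unknown(%d)" % lex_filenum),
        "words": words,
    }


LINE = ("07883031 13 n 01 filling 0 002 @ 07882497 n 0000 "
        "~ 07883156 n 0000 | a food mixture used to fill pastry or sandwiches etc.")

entry = parse_data_line(LINE)
print(entry["words"], entry["lexname"])
```

In practice the LEXNAMES dictionary would be filled by reading dict/lexnames itself rather than typed in by hand.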
It can be done through MIT JWI (MIT Java Wordnet Interface) a Java API to query Wordnet. There's a topic in this link showing how to implement a java class to access lexicographic
This is what worked for me,
Synset[] synsets = database.getSynsets(wordStr);
ReferenceSynset referenceSynset = (ReferenceSynset) synsets[i];
int lexicalCode =referenceSynset.getLexicalFileNumber();
Then use the table above to look up the lexname, e.g. noun.time
If you're on Windows, chances are it is in your AppData directory. To get there, open your file browser, click the address bar at the top, and type in %appdata%
This normally opens the Roaming folder (if you land in AppData itself, click on Roaming). In there, find the nltk_data directory, which contains your corpora directory. The full path is something like:
C:\Users\yourname\AppData\Roaming\nltk_data\corpora
and lexnames will be present under
C:\Users\yourname\AppData\Roaming\nltk_data\corpora\wordnet.