Alphabetical file numbering system - sequence

Frequently I wish to output a number of files with names like file-1, file-2, and so on. However, if there are 10 or more files, the alphabetical ordering of the files will not be the same as the numerical ordering, since file-10 falls between file-1 and file-2.
If I know in advance how many files there will be, I can pad the lower numbers with 0s. But I would like to do this in a "streaming" fashion, without knowing in advance how many files there will be.
That is, I want an infinite sequence of strings S(n) such that:
S(i) lexicographically precedes S(i+1) for any i
S(i+1) can be computed in O(log(i)) time, given S(i)
The maximum length of strings S(i) for i<=n is in O(log(n))
Is there any sequence that satisfies these conditions?

Here is a partial solution. For simplicity, this only uses the symbols 0 and 1.
S(1) is the string 1. To generate S(i+1) from S(i):
If S(i) consists of all 1s, then i+1 is a power of 2. Append log2(i+1) 0s to S(i).
Otherwise, increment the binary number represented by S(i).
The sequence begins:
1, 10, 11, 1100, 1101, 1110, 1111, 1111000, 1111001, ...
This produces a sequence of strings whose lengths grow as O((log(i))^2), which is not quite what I would like. Also the resulting strings are not very human-readable.
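For reference, the partial solution above can be sketched in Fortran as follows. This is only an illustration of the rule as stated: the names partial_sequence and next_name are invented here, and the string is held in a deferred-length character variable (Fortran 2003 or later). It prints the first nine terms so they can be compared with the list above.

```fortran
program partial_sequence
  implicit none
  character(len=:), allocatable :: s
  integer :: i

  s = '1'                       ! S(1)
  do i = 1, 9
    print '(a)', s
    s = next_name(s, i)         ! S(i+1) from S(i)
  end do

contains

  ! Return S(i+1) given S(i), following the two rules of the partial solution.
  function next_name(cur, i) result(nxt)
    character(len=*), intent(in) :: cur
    integer, intent(in) :: i
    character(len=:), allocatable :: nxt
    integer :: k, nzeros

    if (verify(cur, '1') == 0) then
      ! All 1s: i+1 is a power of 2, so append log2(i+1) zeros.
      nzeros = 0
      k = i + 1
      do while (k > 1)
        k = k / 2
        nzeros = nzeros + 1
      end do
      nxt = cur // repeat('0', nzeros)
    else
      ! Binary increment: turn trailing 1s into 0s, then the next 0 into 1.
      nxt = cur
      k = len(nxt)
      do while (nxt(k:k) == '1')
        nxt(k:k) = '0'
        k = k - 1
      end do
      nxt(k:k) = '1'
    end if
  end function next_name

end program partial_sequence
```

The increment step touches only the trailing digits, and the append step keeps the old string as a prefix of the new one, which is why each term sorts after the previous one.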

Remove decimal separator with output format on real values in Fortran [duplicate]

So I have some code that does essentially this:
REAL, DIMENSION(31) :: month_data
INTEGER :: no_days
no_days = get_no_days()
month_data = [fill array with some values]
WRITE(1000,*) (month_data(d), d=1,no_days)
So I have an array with values for each month, in a loop I fill the array with a certain number of values based on how many days there are in that month, then write out the results into a file.
It took me quite some time to wrap my head around the whole 'write out an array in one go' aspect of WRITE, but this seems to work.
However this way, it writes out the numbers in the array like this (example for January, so 31 values):
0.00000 10.0000 20.0000 30.0000 40.0000 50.0000 60.0000
70.0000 80.0000 90.0000 100.000 110.000 120.000 130.000
140.000 150.000 160.000 170.000 180.000 190.000 200.000
210.000 220.000 230.000 240.000 250.000 260.000 270.000
280.000 290.000 300.000
So it prefixes a lot of spaces (presumably to make columns line up even when there are larger values in the array), and it wraps lines to make it not exceed a certain width (I think 128 chars? not sure).
I don't really mind the extra spaces (although they inflate my file sizes considerably, so it would be nice to fix that too...) but the line breaks screw up my other tooling. I've tried reading several Fortran manuals, but while some of them mention 'output formatting', I have yet to find one that mentions newlines or columns.
So, how do I control how arrays are written out when using the syntax above in Fortran?
(also, while we're at it, how do I control the number of decimal digits? I know these are all integer values so I'd like to leave out any decimals altogether, but I can't change the data type to INTEGER in my code because of reasons).
You probably want something similar to
WRITE(1000,'(31(F6.0,1X))') (month_data(d), d=1,no_days)
Explanation:
The use of * as the format specification is called list directed I/O: it is easy to code, but you are giving away all control over the format to the processor. In order to control the format you need to provide explicit formatting, via a label to a FORMAT statement or via a character variable.
Use the F edit descriptor for real variables in decimal form. Its syntax is Fw.d, where w is the total width of the field (including any sign and the decimal point) and d is the number of digits after the decimal point. F6.0 therefore means a field 6 characters wide with no fractional digits; the decimal point itself is still printed.
Spaces can be added with the X control edit descriptor.
Repetitions of edit descriptors can be indicated with the number of repetitions before a symbol.
Groups can be created with (...), and they can be repeated if preceded by a number of repetitions.
No more items are printed beyond the last provided variable, even if the format specifies how to print more items than the ones actually provided - so you can ask for 31 repetitions even if for some months you will only print data for 30 or 28 days.
Besides,
New lines could be added with the / control edit descriptor; e.g., if you wanted to print the data with 10 values per row, you could do
WRITE(1000,'(4(10(F6.0,:,1X),/))') (month_data(d), d=1,no_days)
Note the : control edit descriptor in this second example: it indicates that, if there are no more items to print, nothing else should be printed - not even spaces corresponding to control edit descriptors such as X or /. While it could have been used in the previous example, it is more relevant here, in order to ensure that, if no_days is a multiple of 10, there isn't an empty line after the 3 rows of data.
If you want to remove the decimal point completely, you would instead need to print the nearest integers, using the nint intrinsic and the Iw (integer) descriptor:
WRITE(1000,'(31(I6,1X))') (nint(month_data(d)), d=1,no_days)
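To try the three formats side by side, here is a small self-contained sketch. It writes to standard output instead of unit 1000, and the month length and the sample data are made up for illustration.

```fortran
program month_output
  implicit none
  real :: month_data(31)
  integer :: d, no_days

  no_days = 30                                  ! hypothetical month length
  month_data = [(10.0 * (d - 1), d = 1, 31)]    ! made-up sample values

  ! All values on one line, no fractional digits (the decimal point is kept).
  write (*, '(31(F6.0,1X))') (month_data(d), d = 1, no_days)

  ! Ten values per row; ':' stops output as soon as the list is exhausted.
  write (*, '(4(10(F6.0,:,1X),/))') (month_data(d), d = 1, no_days)

  ! Nearest integers, so no decimal point at all.
  write (*, '(31(I6,1X))') (nint(month_data(d)), d = 1, no_days)
end program month_output
```

Changing no_days to 28 or 31 shows that the same format strings cope with every month length.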

Fortran doesn't end when it obtains an unexpected value?

I've got a program which computes several variables and then writes them to the output file.
Is it possible that, when my program can't get a correct result for my formula, it doesn't terminate?
To clarify what I do, here is the part of my code where the variables of interest are computed:
dx=x(1,i)-x(nk,i)
dy=y(1,i)-y(nk,i)
dz=z(1,i)-z(nk,i)
call PBC(dx,dy,dz)
r2i=dx*dx+dy*dy+dz*dz
r2=r2+r2i
r2g0=0.0d0
r2gx=0.0d0
dx=x(1,i)-x(2,i)
call PBC(dx,dy,dz)
rspani=dsqrt(dx*dx)
do ii=1,nk-1
rx=x(ii,i)
ry=y(ii,i)
rz=z(ii,i)
do jj=ii+1,nk
dx=x(jj,i)-rx
dy=y(jj,i)-ry
dz=z(jj,i)-rz
call PBC(dx,dy,dz)
r21=dx*dx+dy*dy+dz*dz
r21x=dx*dx
r2g=r2g+r21
r2gx=r2gx+r21x
r2g0=r2g0+r21
rh=rh+1.0d0/dsqrt(r21)
rh1=rh1+1.0d0
ir21=dnint(dsqrt(r21)/dr)
p(ir21)=p(ir21)+2.0D0
dxs=dsqrt(r21x)
if(dxs.gt.rspani) rspani=dxs
end do
and then I just write these variables to the output:
write(12,870)r2i,sqrt(r2i),r2g0,r2gx/(nk*nk)
870 FORMAT(3(f15.7,3x),f15.7)
The x, y, z are actually generated via a random number generator.
The problem is that my output contains correct values for, let's say, 457 lines, and then one line is just "*********" (when I view it in mc). After that the output continues with correct values, but about 12 steps of the do loop that computes these variables are missing.
So my questions are basic:
Is it possible that my program can't compute correct numbers, and that's why the result is not written to the output?
or
Could this have been caused by wrong output formatting, or something related to formatting?
Thank you for any suggestions.
********* is almost certainly the result of trying to write a number too large for the field specified in a format string.
For example, a field specified as f15.7 will take 1 spot for the decimal point, 1 spot for a leading sign (- will always be printed if required, + may be printed if options are set), 7 for the fractional digits, leaving 6 digits for the whole part of the number. There may therefore be cases where the program won't fit the number into the field and will print 15 *s instead.
Programs compiled with an up-to-date Fortran compiler will write a string such as NaN or -Inf if they encounter a floating-point number that represents one of the IEEE special values.
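The field-overflow behaviour is easy to reproduce in isolation. The following is a minimal sketch with made-up values, not the asker's data; the last line uses the ES edit descriptor only as one possible way to keep large magnitudes readable.

```fortran
program field_overflow
  implicit none
  double precision :: small, big

  small = 123456.789d0       ! six integer digits: fits in f15.7
  big   = 123456789.0d0      ! nine integer digits: too wide for f15.7

  write (*, '(f15.7)') small
  write (*, '(f15.7)') big   ! field cannot hold the value, so it is filled with asterisks
  write (*, '(es15.7)') big  ! exponent form fits regardless of magnitude
end program field_overflow
```

If the asterisks only appear for occasional outliers, widening the field (for example f20.7) is usually enough.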

What does "hyphenation vector" mean?

The Hyphen library seems to be a very popular and free way to have hyphenation in your app.
What does hyphenation vector mean?
I am running the example attached to the library source code.
Example output:
hibernate // input word
030412000 // output hyphenation vector
hi=ber=nate // hyphen points
- hi=bernate
- hiber=nate
Odd numbers in the vector indicate hyphenation points. But what do all of those values mean?
László Németh describes the algorithm in OpenOffice's documentation in full detail.
The library uses the algorithm developed by Frank M. Liang ("Word Hy-phen-a-tion by Com-pu-ter"): all letters in digrams, trigrams, and longer patterns are assigned numerical values that indicate whether a position is a 'usual' place (an odd number) or an 'unusual' place (an even number) for a hyphen to occur. The higher the number, the greater its importance: a pattern will almost never be broken at a larger even number, and almost always at a larger odd number. The number sequences are determined statistically from a corpus of pre-hyphenated words.
Note that the numbers are for positions between two characters. A better notation would have been
h i b e r n a t e
 0 3 0 4 1 2 0 0 (0)
(where the last 0 is obsolete).
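Reading the vector this way, the hyphen points can be recovered by inserting a mark after every character whose digit is odd. Here is a minimal Fortran sketch that uses the word and vector from the example output above; the program and variable names are invented for illustration, and this is not how the Hyphen library itself is called.

```fortran
program show_hyphens
  implicit none
  character(len=*), parameter :: word = 'hibernate'
  character(len=*), parameter :: vec  = '030412000'
  character(len=:), allocatable :: marked
  integer :: i, level

  marked = ''
  do i = 1, len(word)
    marked = marked // word(i:i)
    ! The digit at position i refers to the gap after character i.
    level = iachar(vec(i:i)) - iachar('0')
    if (mod(level, 2) == 1 .and. i < len(word)) marked = marked // '='
  end do

  print '(a)', marked     ! prints hi=ber=nate
end program show_hyphens
```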

Fortran 90: How to correctly read an integer among other real

I have written a Fortran 90 program to filter the text output of another program and convert it to CSV form. The file contains a table with columns of various types (character, real, integer). There is a column that generally contains decimal values (probability values). But in some rows, where the value should be the decimal "1.000", the value is actually the integer "1".
I use the "F5.3" specifier to read this column and I have the same format statement for every row of the table. So, when the code finds "1", it reads ".001", because it does not find a decimal point.
What ways could I use to correctly (and generally) read integers among other decimals?
Could I specify "unformatted" input only for a number of "spaces"?
On input, the data edit descriptor Fw.d is normally used with d set to zero (d cannot be omitted). If the input field contains a decimal point, that point overrides d. A nonzero d matters only when the field has no decimal point, which is the rare case where floating-point data is stored as scaled integers, or where you want a unit conversion from the integer values.
You could try using list-directed input: use a * instead of a format specifier. This would apply to the entire read, not to selected items. Or you could read the lines into a string and test their contents to decide how to read them. If the sub-string has a decimal point: read (string(M:N), '(F5.3)') value. If it doesn't, use a different format, e.g., read it as F5.0.
P.S. "unformatted" is reading binary data without conversion ... it is a direct copy of the data from the file to the data item. "list-directed" is the Fortran term for reading and converting data without using a format specification.
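The string-inspection approach can be sketched as below. The field contents are invented for illustration, and each field is assumed to have already been cut out of its line as a 5-character string.

```fortran
program read_mixed_column
  implicit none
  character(len=5) :: fields(3)
  real :: value
  integer :: k

  fields = ['0.750', '1.000', '    1']   ! hypothetical column contents

  do k = 1, size(fields)
    if (index(fields(k), '.') > 0) then
      ! A decimal point is present, so it decides where the fraction starts.
      read (fields(k), '(F5.3)') value
    else
      ! No decimal point: use d = 0 so the digits are not scaled by 10**(-3).
      read (fields(k), '(F5.0)') value
    end if
    print '(a, " -> ", f5.3)', fields(k), value
  end do
end program read_mixed_column
```

Because an explicit decimal point overrides d, reading every field with F5.0 should give the same values without the branch.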
Well, here's something new to me: F90 allows a mix of comma and space delimiters in a simple list-directed read:
read(unit,*)v1,v2,v3,v4
with input
1.222 2 , 3.14 , 4
yields
1.222000 2.000000 3.140000 4.000000

maximum 'string' length in fortran

Does Fortran have a maximum 'string' length?
I am going to be reading lines from a file which could have very long lines. The one I am looking at now has around 1.3k characters per line, but it is possible that others may have many more. I am reading each line from the file into a character*5000 variable, but if I get longer lines in the future, is it bad to make it a character*5000000 variable? Is there a max? Is there a better way to solve this problem than making a very large character variable?
Since the usual Fortran IO is record based, reading lines into strings implies knowing the maximum string length. Another possible design: use stream IO and Fortran will ignore the record boundaries. Read the file in fixed-length chunks that are shorter than the longest lines. The complication is handling items split across chunk boundaries. The practicality depends on details not given in the question.
P.S. From "The Fortran 2003 Handbook" by Adams et al.: "The maximum length permitted for character strings is processor-dependent." -- meaning compiler dependent.
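A related way to avoid guessing a maximum length is to read each record in pieces. Note that the sketch below uses non-advancing formatted input rather than the stream I/O described above, so the runtime still reports where each record ends. The file name input.txt and the chunk size of 256 are assumptions made for illustration, and the code needs a Fortran 2008 compiler (for newunit).

```fortran
program read_long_lines
  implicit none
  character(len=256) :: chunk               ! chunk size is arbitrary
  character(len=:), allocatable :: line
  integer :: u, io, n

  open (newunit=u, file='input.txt', status='old', action='read')  ! hypothetical file
  line = ''
  do
    ! Read at most one chunk without moving on to the next record.
    read (u, '(a)', advance='no', iostat=io, size=n) chunk
    if (io == 0) then
      line = line // chunk                  ! chunk filled, record continues
    else if (is_iostat_eor(io)) then
      line = line // chunk(1:n)             ! last piece of this record
      print '(a, i0)', 'line length: ', len(line)
      line = ''
    else
      exit                                  ! end of file or read error
    end if
  end do
  close (u)
end program read_long_lines
```

The line grows only as far as each record requires, so the only remaining limit is the processor-dependent maximum mentioned above.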
The maximum will be implementation dependent. For your case, I can think of something along these lines:
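character(:), allocatable :: ch
integer :: l, io
l = 5
do
allocate(character(len=l) :: ch)
read(unit,'(a)',iostat=io) ch
! Stop on an error or end of file, or once the tail is blank (the line fit).
if (io /= 0 .or. ch(l-4:l) == ' ') exit
deallocate(ch)
backspace(unit) ! go back so the same record is re-read with a larger buffer
l = l * 2
end do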
Obviously this will not work with pad='no', or if you expect long runs of spaces in your records.