Is there a way to create a multi-line scalar in YAML that preserves leading spaces on all lines?
For example, this text:
 * apples
 * oranges
 * lemons
intentionally starts each line with a space. What if I want to have exactly the above
string (i.e. " * apples\n * oranges\n * lemons") defined somewhere in a YAML document? Is it possible to do this without losing the spaces and
in the most readable way?
The usual way using the | does not work as it strips the spaces before the asterisks:
data:
  maybe_a_bulleted_list: |
     * apples
     * oranges
     * lemons
but if I could just say that the next block is indented by four spaces instead of five (subtracting the previous indentation, which is already known), it would be almost what I need.
Is there a way to achieve this with elegant YAML?
Background: I'm trying to create a human-editable file that contains small sections of free text that will end up converted to HTML. I would like to use Markdown for this, but since the sections are going to be only simple (un)ordered lists, I'm thinking about supporting both Markdown and TracWiki. The problem is that the TracWiki format, to work properly, needs each bullet list item to start with a space (otherwise multi-line items don't fold properly).
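You should look into the block indentation indicator in the YAML spec (the trailing + is the separate block chomping indicator, which keeps trailing line breaks; use - instead if the string must not end with a newline).
The indicator is counted from the indentation of the parent key (2 spaces here), so 2 sets the content indentation to four spaces and the fifth space on each line is preserved. In your case it will look like:
data:
  maybe_a_bulleted_list: |2+
     * apples
     * oranges
     * lemons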
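As a rough illustration of what the indentation indicator does, here is a minimal Python model. It is not a YAML parser: `strip_block_indent` is a hypothetical helper, and it assumes the YAML 1.2 rule that the content indentation of a block scalar is the parent node's indentation plus the indicator.

```python
def strip_block_indent(lines, parent_indent, indicator):
    """Model of a literal block scalar with an explicit indentation indicator.

    Hypothetical helper for illustration only; a real YAML parser also
    handles chomping, blank lines, tabs and error reporting.
    """
    content_indent = parent_indent + indicator
    stripped = []
    for line in lines:
        # Every content line must start with at least content_indent spaces.
        if line[:content_indent].strip():
            raise ValueError(
                f"line is indented less than {content_indent} spaces: {line!r}"
            )
        stripped.append(line[content_indent:])
    return "\n".join(stripped)


# The key sits at 2 spaces and the text at 5, so an indicator of 2
# (content indentation 4) leaves exactly one leading space per line.
block = ["     * apples", "     * oranges", "     * lemons"]
result = strip_block_indent(block, parent_indent=2, indicator=2)
print(repr(result))
```

In this model an indicator of 4 would reject the same lines, because the content indentation would then be 6 while the text is only indented by 5.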
So I have some code that does essentially this:
REAL, DIMENSION(31) :: month_data
INTEGER :: no_days, d
no_days = get_no_days()
! ... fill month_data(1:no_days) with some values ...
WRITE(1000,*) (month_data(d), d=1,no_days)
So I have an array with values for each month, in a loop I fill the array with a certain number of values based on how many days there are in that month, then write out the results into a file.
It took me quite some time to wrap my head around the whole 'write out an array in one go' aspect of WRITE, but this seems to work.
However this way, it writes out the numbers in the array like this (example for January, so 31 values):
0.00000 10.0000 20.0000 30.0000 40.0000 50.0000 60.0000
70.0000 80.0000 90.0000 100.000 110.000 120.000 130.000
140.000 150.000 160.000 170.000 180.000 190.000 200.000
210.000 220.000 230.000 240.000 250.000 260.000 270.000
280.000 290.000 300.000
So it prefixes a lot of spaces (presumably to make columns line up even when there are larger values in the array), and it wraps lines to make it not exceed a certain width (I think 128 chars? not sure).
I don't really mind the extra spaces (although they inflate my file sizes considerably, so it would be nice to fix that too...) but the line breaks screw up my other tooling. I've tried reading several Fortran manuals, but while some of them mention 'output formatting', I have yet to find one that mentions newlines or columns.
So, how do I control how arrays are written out when using the syntax above in Fortran?
(Also, while we're at it, how do I control the number of decimal digits? I know these are all integer values, so I'd like to leave out the decimals altogether, but I can't change the data type to INTEGER in my code because of reasons.)
You probably want something similar to
WRITE(1000,'(31(F6.0,1X))') (month_data(d), d=1,no_days)
Explanation:
The use of * as the format specification is called list-directed I/O: it is easy to code, but you are giving away all control over the format to the processor. In order to control the format you need to provide explicit formatting, either via the label of a FORMAT statement or via a character expression, as in the example above.
Use the F edit descriptor for real variables in decimal form. Its syntax is Fw.d, where w is the total width of the field (including any sign and the decimal point) and d is the number of digits after the decimal point. F6.0 therefore means a field 6 characters wide with no digits after the decimal point; the decimal point itself is still printed.
Spaces can be added with the X control edit descriptor.
Repetitions of edit descriptors can be indicated with the number of repetitions before a symbol.
Groups can be created with (...), and they can be repeated if preceded by a number of repetitions.
No more items are printed beyond the last provided variable, even if the format specifies how to print more items than the ones actually provided - so you can ask for 31 repetitions even if for some months you will only print data for 30 or 28 days.
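To make the roles of w and d concrete, here is a small Python sketch that mimics an Fw.d field. It is only an analogy, not Fortran: `fortran_f` is a hypothetical helper, and details such as the rounding of ties or an optional leading zero can differ between Fortran processors.

```python
def fortran_f(value, w, d):
    """Approximate Fortran's Fw.d: w characters in total, d digits after the point."""
    # The '#' flag keeps the decimal point even when d is 0, as F6.0 does.
    text = f"{value:#.{d}f}"
    if len(text) > w:
        # Fortran fills the field with asterisks when the value does not fit.
        return "*" * w
    return text.rjust(w)


print(repr(fortran_f(10.0, 6, 0)))
print(repr(fortran_f(3.14159, 8, 3)))
print(repr(fortran_f(1234567.0, 6, 0)))
```

The last call shows why w has to leave room for the decimal point and any sign: a seven-digit value plus its point does not fit into six characters.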
Besides,
New lines could be added with the / control edit descriptor; e.g., if you wanted to print the data with 10 values per row, you could do
WRITE(1000,'(4(10(F6.0,:,1X),/))') (month_data(d), d=1,no_days)
Note the : control edit descriptor in this second example: it indicates that, if there are no more items to print, nothing else should be printed - not even spaces corresponding to control edit descriptors such as X or /. While it could have been used in the previous example, it is more relevant here, in order to ensure that, if no_days is a multiple of 10, there isn't an empty line after the 3 rows of data.
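The layout produced by this nested format can be sketched in Python as follows. This is an analogy under stated assumptions, not a translation of the Fortran runtime: `format_rows` is a hypothetical helper, and the fields are formatted with Python's rules.

```python
def format_rows(values, per_row=10, width=6):
    """Lay out values like '(4(10(F6.0,:,1X),/))': fixed-width fields,
    one space between fields, a new line after every `per_row` values."""
    rows = []
    for start in range(0, len(values), per_row):
        chunk = values[start:start + per_row]
        # join() puts separators only *between* items, which is what the
        # ':' descriptor achieves by stopping once the list is exhausted.
        rows.append(" ".join(f"{v:#{width}.0f}" for v in chunk))
    return "\n".join(rows)


# Same sample data as in the question: 0, 10, 20, ... for each day.
month_data = [10.0 * day for day in range(31)]
print(format_rows(month_data[:30]))
```

With 30 values this prints exactly three rows and no empty fourth line; with 31 values the fourth row holds a single field.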
If you want to remove the decimal point completely, you would instead need to print the nearest integers, using the nint intrinsic and the Iw (integer) edit descriptor:
WRITE(1000,'(31(I6,1X))') (nint(month_data(d)), d=1,no_days)
I want to make a search/replace macro in Word for text that is spread over 2 or 3 rows, like this
"art. 2
2
pct. 22 din"
and converts it to this
"art. 2<sup>2</sup>
pct. 22 din"
Instead of art. I can have other words too, like lit., pct. or alin., and the numbers are always different.
I tried the following wildcard replace, but it doesn't work:
search: "(art. )([0-9]{1;})(^13)([0-9]{1;})(^13)"
replace: "\1\2<sup>\3</sup>^p"
If I type only (art. )([0-9]{1;})(^13) in the search field it works, but if I add the rest it doesn't find anything.
Probably a bit late to be helpful, but I think this will do the trick, if I understood the horrible Word syntax correctly (I used this guide to check).
Search: ([a-z]@. )([0-9]@)^13([0-9]@)^13
With your replace as before.
First line match:
([a-z]@. ) - Any lower case letter occurring one or more times (@ is Word's "one or more" operator), followed by a dot and a space.
([0-9]@) - Any digit occurring one or more times.
^13 - Paragraph mark.
Second line match:
([0-9]@) - Any digit occurring one or more times.
^13 - Paragraph mark.
If you find yourself doing stuff like this a lot, it might be worth using something that supports more common regular expressions (e.g. Notepad++ or similar). You might find the syntax a little bit more readable, and it would have the added bonus of teaching you something you can apply across many other environments.
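For comparison, here is roughly what the same replacement looks like with a more common regular expression flavour, using Python's re module. This is a sketch that assumes the document has been exported as plain text with "\n" as the paragraph break; `add_superscripts` is a hypothetical helper name.

```python
import re

# Abbreviation (lower-case letters, a dot, a space), a number, a line break,
# then a second number on its own line followed by a line break.
PATTERN = re.compile(r"([a-z]+\. )([0-9]+)\n([0-9]+)\n")


def add_superscripts(text):
    """Fold the number on the second line into a <sup> tag on the first."""
    return PATTERN.sub(r"\1\2<sup>\3</sup>\n", text)


sample = "art. 2\n2\npct. 22 din"
print(add_superscripts(sample))
```

Because the first group accepts any run of lower-case letters, the same pattern also covers lit., pct. and alin. without listing them.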
I'm a bit new to regexp and am looking for some help understanding some of its capabilities. I am currently trying to select sets of data that start with a word followed by a space and then one of several possible words.
Example 1:
I am basically looking to select data such as Product1 green, Product1 red, Product1 blue (green, red or blue basically) but not:
xyz Product1, Product1 black, Product1 white, Product1 garbage red.
I have tried the following queries without much success:
Where regexp_like(item, 'Product1 [green | red | blue]');
Where regexp_like(item, 'Product1 [green, red, blue]');
Where regexp_like(item, '^Product1 [green, red, blue]');
Hypothetically, does anybody know of a way I could also implement an 'AND', for example selecting items which contain the words green and red in the same attribute?
Example 2:
Similar situation, but trying to match a word after a punctuation
Where regexp_like (job, 'Commerce [[:punct:]] .*');
With this query I am looking to select jobs which have
Commerce - test
Commerce : abcdefg
These queries are not working as I would expect them to and I'm not able to quite figure out why. I am assuming I have misunderstood the construct of these regular expressions.
Any help / explanations would be greatly appreciated!
For the first, try the following
WHERE REGEXP_LIKE(ITEM, '^Product1.*(green|red|blue)')
or
WHERE REGEXP_LIKE(ITEM, '^Product1 (green|red|blue)')
or
WHERE REGEXP_LIKE(ITEM, '^Product1 +(green|red|blue)')
depending on what you expect after the Product1 - the first case allows zero or more characters of any kind, the second requires that there be a single space after Product1, and the third requires one or more blanks after Product1.
Not sure where you're going exactly on the second one. If you really want strings that begin with 'Commerce', followed by a space, followed by a punctuation character, another space, and then anything, try
WHERE REGEXP_LIKE(JOB, '^Commerce [[:punct:]] .*');
If instead of a punctuation character you're looking for either ':' or '-', try
WHERE REGEXP_LIKE(JOB, '^Commerce [:-] .*');
I'm no great expert on regular expressions but I'll try to offer some explanations:
^ requires that the following element be at the beginning of the string. Thus, in the first case ^Product1 means "'Product1' must be at the start of the string".
In regular expressions parentheses are used to group expressions, so in the first case (green|red|blue) are grouped together.
| is a logical OR, so (green|red|blue) means "must be one of 'green' or 'red' or 'blue'".
Square brackets are used for character classes. You can use either predefined classes, such as [:punct:] or [:space:] (which must themselves be written inside square brackets, as in [[:punct:]]), or you can make up your own as in [:-]. During regular expression interpretation a square bracket character class, no matter how long, represents a single character in the string being matched. So in the regular expression ^Commerce [:-] .* the character class [:-] means "look for either a colon or a dash". If you want to indicate that you expect multiple occurrences of characters in the class, one after another, use one of the repetition operators (* or +) after the class - so [abc]* would match all of abcabcabc.
Also keep in mind that in a regular expression every character means something, so you can't use whitespace to make regular expressions more legible because the whitespace becomes something that will be looked for when the expression is interpreted.
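The "one character class matches one character" point can be tried out quickly in Python. Note that this is a different regex flavour from Oracle's: Python's re module has no [[:punct:]], so the sketch below builds an approximate equivalent from string.punctuation (ASCII only, which is an assumption of this example).

```python
import re
import string

# Python's re module has no [[:punct:]], so build an equivalent class from
# string.punctuation (ASCII punctuation only; an assumption of this sketch).
PUNCT = "[" + re.escape(string.punctuation) + "]"
ANY_PUNCT = re.compile(r"^Commerce " + PUNCT + r" .*")
COLON_OR_DASH = re.compile(r"^Commerce [:-] .*")

jobs = ["Commerce - test", "Commerce : abcdefg", "Commerce / misc", "Commerce x test"]
for job in jobs:
    # Each class consumes exactly one character between the two spaces.
    print(job, bool(ANY_PUNCT.search(job)), bool(COLON_OR_DASH.search(job)))
```

"Commerce / misc" matches only the general punctuation pattern, and "Commerce x test" matches neither, because x is not in either class.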
Share and enjoy.
Edit
Didn't notice your question about AND earlier. A simple way to AND together multiple expressions is to just put them one after another. To look for (green|red|blue), followed by a space, followed by (green|red|blue) a simple expression would be
WHERE REGEXP_LIKE(ITEM, '^Product1 (green|red|blue) (green|red|blue)')
If potentially multiple spaces were to be allowed between the colors
WHERE REGEXP_LIKE(ITEM, '^Product1 (green|red|blue) +(green|red|blue)')
could be used.
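Here is a small Python sketch to experiment with, run against the sample values from the question. Two things in it go beyond the expressions above and are assumptions of this example: the trailing $ (so that nothing may follow the colour) and the treatment of AND as two independent searches, so that the words may appear in any order. In Oracle the second idea would presumably be written as two REGEXP_LIKE conditions joined with AND.

```python
import re

# Exactly "Product1", one space, one of the three colours, and nothing else.
ONE_COLOUR = re.compile(r"^Product1 (green|red|blue)$")


def has_green_and_red(item):
    """AND of two independent conditions: both words occur, in any order."""
    return bool(re.search(r"green", item)) and bool(re.search(r"red", item))


items = ["Product1 green", "Product1 red", "Product1 blue", "xyz Product1",
         "Product1 black", "Product1 white", "Product1 garbage red"]
matching = [item for item in items if ONE_COLOUR.search(item)]
print(matching)
print(has_green_and_red("Product1 red green"))
```

Only the first three items survive the filter; "Product1 garbage red" is rejected because exactly one space must separate Product1 from the colour.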
Resistance is useless.
In my SQL table Image, when I run the search query
SELECT * FROM Image WHERE platename LIKE 'WDD 666'
it returns no results (searching on other columns works without problems).
All the column data was inserted by C# code. (If I enter the data manually, the search works.)
Now I suspect that the characters in WDD 666 aren't from the English alphabet. Is this possible?
In C#,
the plate number was generated as a string by a Tesseract wrapper.
What should I do to search for the plate number?
Thanks in advance and sorry for my bad English.
Since your case matches, I'm going to rule out case sensitivity.
There may be leading or trailing blank spaces. Try this:
SELECT * FROM Image WHERE platename LIKE '%WDD 666%'
Try running this command:
SELECT '*'+plateName+'*', len(plateName)
FROM image
I suspect platename has some non-printable characters in the field.
It appears to be a CR/LF at the end of the data. You can use
UPDATE image SET plateName = replace(plateName,char(13)+char(10),'')
WHERE plateName like '%'+char(13)+char(10)+'%'
If you get a positive row count, you'll know there was CR/LF data and it was removed. If you run the select afterwards, your lengths should be 7 and 8 based on your sample data.
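The whole diagnosis can be reproduced in a few lines of Python. This is a sketch under stated assumptions: an in-memory SQLite database stands in for the real server (so concatenation is written with || instead of +), and the stored value with its trailing CR/LF is made up to match the suspected problem.

```python
import sqlite3

# SQLite stands in for the real server here; table and column names follow
# the question, and the trailing CR/LF is the suspected culprit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Image (platename TEXT)")
conn.execute("INSERT INTO Image VALUES (?)", ("WDD 666\r\n",))


def count_matches():
    query = "SELECT COUNT(*) FROM Image WHERE platename LIKE 'WDD 666'"
    return conn.execute(query).fetchone()[0]


before = count_matches()
# repr() exposes characters that are invisible in a normal result grid.
stored = conn.execute("SELECT platename FROM Image").fetchone()[0]
print(repr(stored), len(stored))

# Same idea as the UPDATE above, written with SQLite's || concatenation.
conn.execute(
    "UPDATE Image SET platename = replace(platename, char(13) || char(10), '')"
)
after = count_matches()
print(before, after)
```

On the C# side it may be simpler to trim the OCR result before inserting it, so the stray line break never reaches the table.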
I am using PDFBox to extract text from PDF.
The PDF has a tabular structure, which is quite simple, and the columns are very widely spaced from each other.
This works really well, except that all kinds of horizontal space gets converted into a single space character, so that I cannot tell columns apart anymore (space within words in a column looks just like space between columns).
I appreciate that a general solution is very hard, but in this case the columns are really far apart so that having a simple differentiation between "long spaces" and "space between words" would be enough.
Is there a way to tell PDFBox to turn horizontal whitespace of more than x inches into something other than a single space? A proportional approach (x inches become y spaces) would also work.
The pdftotext C library/tool has a '-layout' switch that tries to preserve the layout. Basically, if I can emulate that with PDFBox, that would be perfect.
There does not seem to be a setting for this, but I was able to modify the source for the PDFTextStripper tool to output a column separator (|) when a "long" space was encountered. In the code where it was building the output line it is possible to look at the x positions of the current and previous letter, and if it is large enough, do something special. PDFTextStripper has lots of protected methods, but turned out to be not really all that extensible. I ended up having to copy the whole class to change a private method.
Looking at the code in there, I call myself lucky that with the particular PDF, this simple approach was successful. A more general solution seems very tricky.
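The gap check itself is simple once the glyph positions are available. Here is a minimal Python sketch of the idea; it does not use the PDFBox API at all, and the (x, width, char) tuples, the threshold and the helper name `join_with_column_marks` are assumptions made for illustration.

```python
def join_with_column_marks(glyphs, max_gap, separator="|"):
    """Rebuild one text line from positioned glyphs.

    `glyphs` is a list of (x, width, char) tuples sorted by x, in whatever
    unit the extractor reports. A separator is emitted whenever the gap
    between the right edge of one glyph and the left edge of the next
    exceeds `max_gap`.
    """
    out = []
    previous_right = None
    for x, width, char in glyphs:
        if previous_right is not None and x - previous_right > max_gap:
            out.append(separator)
        out.append(char)
        previous_right = x + width
    return "".join(out)


# Hypothetical positions: "ab" near x=100, "cd" far to the right near x=250.
line = [(100, 6, "a"), (106, 6, "b"), (250, 6, "c"), (256, 6, "d")]
print(join_with_column_marks(line, max_gap=20))
```

Choosing max_gap is the fragile part: it has to be larger than any space between words within a column and smaller than the narrowest gap between columns.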
PDF text extraction is difficult.
If the text was output as one big string separated by spaces, such as:
PDFTextOut("Column 1        Column 2        Column 3");
and you are using a fixed-width font such as Courier, then you could theoretically calculate the number of spaces between items of text because each character is the same width. If the font is proportional, such as Arial, then the calculation is harder.
In reality most PDFs are generated by individually placing each piece of text directly into its position. Therefore, there is technically no space character or any other character between columns. The text is just placed at an absolute position on the page.
PDFMoveTo(100,100);
PDFTextOut("Column 1");
PDFMoveTo(250,100);
PDFTextOut("Column 2");
In order to perform data extraction on PDF documents you have to do a little bit more work to find and match column data by using pixel locations as you have mentioned and by making some assumptions and having a little bit of luck.
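A minimal Python sketch of that matching step might look like the following. Everything in it is an assumption made for illustration: the fragments are (x, y, text) tuples already obtained from some extractor, they arrive in reading order, the left edge of every column is known in advance, and rows are returned in ascending y order.

```python
from collections import defaultdict


def fragments_to_rows(fragments, column_starts, y_tolerance=2.0):
    """Arrange (x, y, text) fragments into rows and columns.

    `column_starts` lists the assumed left edge of each column; a fragment
    goes into the last column whose left edge is not to the right of it.
    Fragments whose y values round to the same multiple of `y_tolerance`
    are treated as one row.
    """
    rows = defaultdict(lambda: [""] * len(column_starts))
    for x, y, text in fragments:
        row_key = round(y / y_tolerance)
        column = max(
            (i for i, start in enumerate(column_starts) if start <= x),
            default=0,
        )
        cell = rows[row_key][column]
        rows[row_key][column] = f"{cell} {text}".strip()
    return [rows[key] for key in sorted(rows)]


# Positions taken from the PDFMoveTo example, plus one hypothetical data row.
fragments = [(100, 100, "Column 1"), (250, 100, "Column 2"),
             (100, 120, "a"), (250, 120, "b")]
table = fragments_to_rows(fragments, column_starts=[100, 250])
print(table)
```

Real documents usually need the column edges to be estimated from the data rather than supplied by hand, which is where the assumptions and the luck come in.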