Simple question, I hope: I have a PDF and want to detect the co-ordinates of specific word(s) or placeholder text. I then intend to use iTextSharp to stamp a replacement bit of text on top at the co-ordinates found.
Can anyone recommend anything please?
Thanks
As answered in the comments, one could use iText to perform such a task. Maybe there are some better solutions, however, I doubt it. The cause of the mentioned issue, i.e. "[itextsharp] sometimes give co-ords of the start of the sentence the search text is in", is that sometimes glyphs are so close, that their boxes overlap, hence I don't see how it could be handled as you want.
So you can do the following:
Extend the LocationTextExtractionStrategy class and override eventOccurred, for example as follows:
    @Override
    public void eventOccurred(IEventData data, EventType type) {
        if (type.equals(EventType.RENDER_TEXT)) {
            TextRenderInfo renderInfo = (TextRenderInfo) data;
            // Obtain all the necessary information from renderInfo, for example
            LineSegment segment = renderInfo.getBaseline();
            // ...
        }
    }
Pass an instance of such an extended class to PdfTextExtractor.getTextFromPage as follows:
    PdfTextExtractor.getTextFromPage(pdfDocument.getPage(1), new ExtendedLocationTextExtractionStrategy());
Once text is found, the event will be triggered.
There are some difficulties in such a solution, of course, because the text you want to find and write above could be present in the PDF not as "Text", but as "T", "ex", "t", or even "t", "x", "e", "T". However, since you use iText, you may want to harness the advantages of one of its products, pdfSweep. This product aims to completely remove unnecessary content from the PDF, with such content being specified either as locations (which you want to obtain, so that is not an option) or as regexes.
This is how to create such a regex strategy, which finds all "Dolor" and "dolor" instances in the document and completely removes them (from all the streams, so that they are neither shown in a PDF viewer nor found in the underlying PDF objects):
RegexBasedCleanupStrategy strategy = new RegexBasedCleanupStrategy("(D|d)olor").setRedactionColor(ColorConstants.GREEN);
This is how to use it:
PdfAutoSweep autoSweep = new PdfAutoSweep(strategy);
autoSweep.cleanUp(pdf); // a PdfDocument instance
And this is how to write some text on the location, at which the unnecessary text was present:
    for (IPdfTextLocation location : strategy.getResultantLocations()) {
        Rectangle rect = location.getRectangle();
        // do something, for example, write some text
    }
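To illustrate the chunking difficulty mentioned above (a word arriving as "T", "ex", "t", possibly out of order), here is a minimal Python sketch that uses no PDF library at all. The chunk tuples, the function name find_word_span, and the assumption that glyphs are evenly spaced within a chunk are all hypothetical; real positions would come from the render-info callbacks.

```python
from typing import List, Optional, Tuple

# Hypothetical chunk record: (text, x_start, x_end) on a single baseline.
Chunk = Tuple[str, float, float]


def find_word_span(chunks: List[Chunk], word: str) -> Optional[Tuple[float, float]]:
    """Return (x_start, x_end) of the first occurrence of word, or None."""
    # Sort by horizontal position so out-of-order chunks read left to right.
    ordered = sorted(chunks, key=lambda c: c[1])
    text = "".join(c[0] for c in ordered)
    start = text.find(word)
    if start < 0:
        return None
    end = start + len(word)  # exclusive index into the joined text

    # Map every character back to an x range. Assumption: glyphs are
    # evenly spaced inside each chunk, which real fonts do not guarantee.
    spans = []
    for chunk_text, x0, x1 in ordered:
        if not chunk_text:
            continue
        step = (x1 - x0) / len(chunk_text)
        for i in range(len(chunk_text)):
            spans.append((x0 + i * step, x0 + (i + 1) * step))
    return spans[start][0], spans[end - 1][1]


# "Text" delivered as three chunks in the wrong order.
chunks = [("t", 30.0, 35.0), ("T", 0.0, 10.0), ("ex", 10.0, 30.0)]
print(find_word_span(chunks, "ext"))  # (10.0, 35.0)
```

The sketch only handles one line of text; a real implementation would also have to group chunks by baseline before joining them.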
I've got a QTextDocument read from an HTML file; given a QString of HTML data named topicFileData, I do topicFileTextDocument.setHtml(topicFileData);. I then want to strip off all of the color information, making the whole document just use the default foreground and background brush. (I do not want to explicitly set the text to be black text on a white background; I want to remove the color information from the document.) (Background info: the reason I need to do this is that there are spans within the document that are erroneously set with a black foreground color, rather than just having no color information set, and that causes those spans to display as black-on-black when my app is running in "dark mode", when Qt changes the default text background brush to be black instead of white.)
Here's what I tried:
QTextCursor tc(&topicFileTextDocument);
tc.select(QTextCursor::Document);
QTextCharFormat noColorFormat;
noColorFormat.clearForeground();
noColorFormat.clearBackground();
tc.mergeCharFormat(noColorFormat);
This does not work, unfortunately; it looks like mergeCharFormat() does not understand that I want the clearForeground() and clearBackground() actions to be merged in to strip off those attributes.
I can do tc.setCharFormat(noColorFormat); instead, of course, and that does strip off the color attributes correctly; but it also obliterates all of the other character format info (font, etc.), which is not acceptable.
So, ideally I'd like to find an API that lets me explicitly remove a given text attribute from a QTextDocument. Alternatively, I guess I need to loop through all the spans of the QTextDocument one by one, get the char format of the current span, remove the color attributes from the format, and set the modified format back onto the span. That would be fine; but I have no idea how to loop over spans in that way. Thanks for any help.
Instead of creating a new instance of QTextCharFormat, update the current format and reapply it on the QTextEdit:
default = QTextCharFormat()
charFormat = self.textCursor().charFormat()
charFormat.setBackground(default.background())
charFormat.setForeground(default.foreground())
self.textCursor().mergeCharFormat(charFormat)
A sub-optimal solution that I have found as a workaround is to actually edit the HTML data string before I create the QTextDocument, using a regex:
topicFileData.replace(QRegularExpression("(;? ?color: ?#[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f])"), "");
This works for my situation, because all of the colors in my HTML file are set with color: #XXXXXX style attributes that can be stripped out of the HTML itself. This is fragile, however; colors specified in other ways would not be stripped, and if the body text of the HTML document happened to contain text that matched the regex, the regex would modify it and thus corrupt the content of the document. So I don't recommend this solution, and I won't be accepting it. If somebody can offer a better solution that would be preferable.
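The fragility described above is easy to reproduce outside Qt. The following Python sketch uses the same pattern (written with a {6} quantifier instead of six repeated character classes); the sample HTML strings are made up for illustration.

```python
import re

# Same pattern as the workaround above, with a {6} quantifier.
COLOR_RE = re.compile(r";? ?color: ?#[0-9a-f]{6}")


def strip_colors(html: str) -> str:
    """Remove 'color: #xxxxxx' declarations by plain text substitution."""
    return COLOR_RE.sub("", html)


# Intended case: the inline color declaration is removed.
print(strip_colors('<span style="font-weight: bold; color: #000000">hi</span>'))
# <span style="font-weight: bold">hi</span>

# Missed case: a named color does not match the pattern.
print(strip_colors('<span style="color: black">hi</span>'))
# <span style="color: black">hi</span>

# Corrupting case 1: body text that looks like CSS is edited too.
print(strip_colors("<p>Use color: #ff0000 for errors</p>"))
# <p>Use for errors</p>

# Corrupting case 2: background-color is cut in half.
print(strip_colors('<p style="background-color: #ffffff">x</p>'))
# <p style="background-">x</p>
```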
IntelliJ keeps replacing final static variable names with random strings. I get that this idea comes from code style/best practices etc., but I find this very annoying. How do I disable this?
E.g. if I create this variable:
private static final Logger logger
= LoggerFactory.getLogger(AuthController.class);
As soon as I type "logger" and press space/enter, it replaces the name "logger" with some random string e.g. "asdoiasdk"; in the editor it then looks like:
private static final Logger asdoiasdk
= LoggerFactory.getLogger(AuthController.class);
Screenshots below:
I start adding a private static final variable called "logger" - the screenshot shows that state just after I typed the variable name but haven't added " = " yet:
Then I press space and equals (" = ") and the variable name changes to this random string "jmeecp":
I found out why and after having spent a day on this, I'll add the reason in the hope that some other poor soul will benefit from this. Basically, the issue happens if you are using "Fantasque Sans Mono" as your editor font. I think it doesn't play well with the highlighting applied by IntelliJ for typos. E.g.: in the below example, the word "REQUESTSTART" is a typo and thus highlighted by IntelliJ (this font in this screenshot is "Droid Sans Mono"):
When I change the font to "Fantasque Sans Mono", the issue surfaces:
There is a pattern between the original string and the one shown with this font: the ASCII code seems to go back two positions, e.g. R->P, E->C and so on. Very interesting.
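The two-position shift can be checked against the strings quoted in this post with a few lines of Python. This only reproduces the observed mapping; it says nothing about why the font renders this way, and the function name is made up.

```python
def shift_back_two(text: str) -> str:
    """Replace every character with the one two code points earlier."""
    return "".join(chr(ord(ch) - 2) for ch in text)


# The variable name from the question turns into the "random" string seen there.
print(shift_back_two("logger"))  # jmeecp

# The typo-highlighted word from the screenshot description.
print(shift_back_two("REQUESTSTART"))  # PCOSCQRQR?PR
```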
Edit 2019-04-15: See this thread for the workaround that resolved the issue for me.
I'm currently working on converting a VBA AutoCAD-application over to VB.NET, and the current command I'm working on is creating a simple leader with code like this:
Set leaderObj = ThisDrawing.ModelSpace.AddLeader(points, blockRefObj, leaderType)
leaderObj.ArrowheadType = acArrowDotSmall
leaderObj.ArrowheadSize = 2.5 * varDimscale
leaderObj.DimensionLineColor = acWhite
I've been able to create the Leader-line in .NET using
Dim l = New Leader()
For Each point In jig.LeaderPoints
l.AppendVertex(point)
Next
l.Dimldrblk = arrId
The arrId I got from using the function found here, but I've been unable to figure out how to set the color of the leader to white (it shows up as red by default), and also how to set the size of the arrowhead. If anyone could help me out with this I would be most grateful.
Ok, after a lot of trial and error, I figured out that the solution was rather simple. I didn't have to override any dimension styles (I honestly don't even know what those are; I only had a short beginners' course in AutoCAD before being handed this project). I simply had to set a couple of obscure properties on the Leader object. For future reference, and for anyone else trying to do the same, here are the properties I ended up using:
leader.Dimclrd
The color of the leader-line. Stands for something like "dimension line color".
leader.Dimasz
The scale of the leader-head.
Since it is of type BlockReference, it should have a color property, and that property should be an Autodesk.AutoCAD.Colors.Color or an Integer. Also, the reason you are getting the object for read is that in your transaction you are opening the database with
OpenMode.ForRead
And that is correct. But to edit the object in the database, you must retrieve the object like below
var obj = theTransaction.GetObject(theObjectId, OpenMode.ForWrite) as BlockReference;
This is done inside of the
using(var trans = TransactionManager.StartTransaction()){}
I'm typing this on a cell phone, so check the camel case and syntax because I write in C#, but it should be pretty close.
You may also want to see whether there is a scale property for changing the size.
Hopefully this will move you in the right direction.
Let me know if you have any problems. :)
I would really like to see a proportional font IDE, even if I have to build it myself (perhaps as an extension to Visual Studio). What I basically mean is MS Word style editing of code that sort of looks like the typographical style in The C++ Programming Language book.
I want to set tab stops for my indents and lining up function signatures and rows of assignment statements, which could be specified in points instead of fixed character positions. I would also like bold and italics. Various font sizes and even style sheets would be cool.
Has anyone seen anything like this out there or know the best way to start building one?
I'd still like to see a popular editor or IDE implement elastic tabstops.
Thinking with Style suggests using your favorite text-manipulation software, such as Word or Writer. Create your program code in rich XML and extract the compiler-relevant sections with XSLT. The "Office" software will provide all the advanced text-manipulation and formatting features.
I expected you'd get down-modded and picked on for that suggestion, but there's some real sense to the idea.
The main advantage of the traditional 'non-proportional' font requirement in code editors is to ease the burden of performing code formatting.
But with all of the interactive automatic formatting that occurs in modern IDEs, it's really possible that a proportional font could improve the readability of the code (rather than hampering it, as I'm sure many purists would expect).
A character called Roedy Green (famous for his 'how to write unmaintainable code' articles) wrote about a theoretical editor/language, based on Java and called Bali. It didn't include non-proportional fonts exactly, but it did include the idea of having non-uniform font-sizes.
Also, this short Joel Spolsky post points to a solution, elastic tab stops (as mentioned by another commenter), that would help with the support of proportional (and variable-sized) fonts.
@Thomas Owens
I don't find code formatted like that easier to read.
That's fine, it is just a personal preference and we can disagree. Format it the way you think is best and I'll respect it. I frequently ask myself 'how should I format this or that thing?' My answer is always to format it to improve readability, which I admit can be subjective.
Regarding your sample, I just like having that nicely aligned column on the right-hand side; it's sort of a quick "index" into the code on the left. Having said that, I would probably avoid commenting every line like that anyway, because the code itself shouldn't need that much explanation. And if it does, I tend to write a paragraph above the code.
But consider this example from the original poster. It's easier to spot the comments in the second one, in my opinion.
    for (size_type i = 0; i<v.size(); i++) { // rehash:
        size_type ii = hash(v[i].key)%b.size(); // hash
        v[i].next = b[ii]; // link
        b[ii] = &v[i];
    }

    for (size_type i = 0; i<v.size(); i++) {     // rehash:
        size_type ii = hash(v[i].key)%b.size();  // hash
        v[i].next = b[ii];                       // link
        b[ii] = &v[i];
    }
@Thomas Owens
But do people really line comments up like that? ... I never try to line up declarations or comments or anything, and the only place I've ever seen that is in textbooks.
Yes people do line up comments and declarations and all sorts of things. Consistently well formatted code is easier to read and code that is easier to read is easier to maintain.
I wonder why nobody actually answers your question, and why the accepted answer doesn't really have anything to do with your question. But anyway...
a proportional font IDE
In Eclipse you can choose any font on your system.
set tab stops for my indents
In Eclipse you can configure the automatic indentation, including setting it to "tabs only".
lining up function signatures and rows of assignment statements
In Eclipse, automatic indentation does that.
which could be specified in points instead of fixed character positions.
Sorry, I don't think Eclipse can help you there. But it is open source. ;-)
bold and italics
Eclipse has that.
Various font sizes and even style sheets would be cool
I think Eclipse only uses one font and font-size for each file type (for example Java source file), but you can have different "style sheets" for different file types.
When I last looked at Eclipse (some time ago now!) it allowed you to choose any installed font to work in. Not so sure whether it supported the notion of indenting using tab stops.
It looked cool, but the code was definitely harder to read...
Soeren: That's kind of neat, IMO. But do people really line comments up like that? For my end of line comments, I always use a single space then // or /* or equivalent, depending on language I'm using. I never try to line up declarations or comments or anything, and the only place I've ever seen that is in textbooks.
@Brian Ensink: I don't find code formatted like that easier to read.
    int var1 = 1 //Comment
    int longerVar = 2 //Comment
    int anotherVar = 4 //Comment
versus
    int var2       = 1 //Comment
    int longerVar  = 2 //Comment
    int anotherVar = 4 //Comment
I find the first lines easier to read than the second lines, personally.
The indentation part of your question is being done today in a real product, though possibly with an even greater level of automation than you imagined. The product I mention is an XSLT IDE, but the same formatting principles would work with most (but not all) conventional code syntaxes.
This really has to be seen in video to get the sense of it all (sorry about the music back-track). There's also a light XML editor spin-off product, XMLQuire, that serves as a technology demonstrator.
The screenshot below shows XML formatted with quite complex formatting rules in this XSLT IDE, where all indentation is performed word-processor style, using the left margin - not space or tab characters.
To emphasise this formatting concept, all characters have been highlighted to show where the left-margin extends to keep indentation. I use the term Virtual Formatting to describe this - it's not like elastic tab stops, because there simply are no tabs, just margin information which is part of the 'paragraph' formatting (RTF codes are used here). The parser reformats continuously, in the same pass as syntax coloring.
A proportional font hasn't been used here, but it could have been quite easily - because the indentation is set in TWIPS. The editing experience is quite compelling because, as you refactor the code (XML in this case), perhaps through drag and drop, or by extending the length of an attribute value, the indentation just re-flows itself to fit - there's no tab-key or 'reformat' button to press.
So, the indentation is there, but the font work is a more complex problem. I've experimented with this, but found that if fonts are re-selected as you type, the horizontal shifting of the code is too distracting - there would need to be a user-initiated 'format fonts' command probably. The product also has Ink/Handwriting technology built-in for annotating code, but I've yet to exploit this in the live release.
Folks are all complaining about comments not lining up.
Seems to me that there's a very simple solution: define the unit space as the widest character in the font. Now, proportionally space all characters except the space. The space takes up as much room as needed to line up the next character where it would be if all preceding characters on the line were the widest in the font.
ie:
iiii_space_Foo
xxxx_space_Foo
would line up the "Foo", with the space after the "i" being much wider than after the "x".
So call it elastic spaces rather than tab stops.
If you're a smart editor, treat comments specially, but that's just gravy.
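The elastic-space rule above can be sketched in Python as follows. The glyph widths are invented numbers standing in for real font metrics, and layout is a hypothetical helper, so treat this as an illustration of the arithmetic rather than a working editor feature.

```python
# Hypothetical glyph widths in points; real values would come from font metrics.
WIDTHS = {"i": 3.0, "x": 6.0, "F": 7.0, "o": 6.0, "W": 10.0}
UNIT = max(WIDTHS.values())  # the widest glyph defines one "column"


def layout(line: str):
    """Return a list of (char, x_left) for every non-space character."""
    x = 0.0
    placed = []
    for column, ch in enumerate(line):
        if ch == " ":
            # Elastic space: jump to where the next character would start
            # if every preceding character were as wide as the widest glyph.
            x = (column + 1) * UNIT
        else:
            placed.append((ch, x))
            x += WIDTHS[ch]
    return placed


# "Foo" starts at the same x in both lines, although "iiii" is narrower than "xxxx".
print(layout("iiii Foo")[4])  # ('F', 50.0)
print(layout("xxxx Foo")[4])  # ('F', 50.0)
```

With these numbers the space after "iiii" is 38 points wide and the space after "xxxx" is 26 points wide, which is the stretching effect described above.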
Let me recall arguments about using the 'var' keyword in C#. People hated it, and thought it would make code less clear. For example, you couldn't know the type in something like:
    var x = GetResults("Main");
    foreach(var y in x)
    {
        WriteResult(y);
    }
Their argument was that you couldn't see whether x was an array, a List or any other IEnumerable, or what the type of y was. In my opinion the lack of clarity did not arise from using var, but from picking unclear variable names. Why not just type:
    var electionResults = GetRegionalElectionResults("Main");
    foreach(var result in electionResults)
    {
        Write(result); // you can see what you're writing!!
    }
"But you still cannot see the type of electionResults!" - does it really matter? If you want to change the return type of GetRegionalElectionResults, you can do so. Any IEnumerable will do.
Fast forward to now. People want to align comments and similar code:
    int var2       = 1;  //The number of days since startup, including the first
    int longerVar  = 2;  //The number of free days per week
    int anotherVar = 38; //The number of working hours per week
So without the comment everything is unclear. And if you don't align the values, you cannot separate them from the variables. But do you? What about this (ignore the bullets please)
int daysSinceStartup = 1; // including first
int freeDaysPerWeek = 2;
int workingHoursPerWeek = 38;
If you need a comment on EVERY LINE, you're doing something wrong. "But you still need to align the VALUES" - do you? what does 38 have to do with 2?
In C# most code blocks can easily be aligned using only tabs (or actually, multiples of four spaces):
    var regionsWithIncrease =
        from result in GetRegionalElectionResults()
        where result.TotalCount > result.PreviousTotalCount &&
              result.PreviousTotalCount > 0 // just new regions
        select result.Region;
    foreach (var region in regionsWithIncrease)
    {
        Write(region);
    }
You should never use line-to-line comments and you should rarely need to vertically align things. Rarely, not never. So I understand if some of you guys prefer a monospaced font. I prefer the readability of Noto Sans or Source Sans Pro. These fonts are freely available from Google, and resemble Calibri, but are designed for programming and thus have all the necessary characteristics:
Big : ; . , so you can clearly see the difference
Clearly distinct 0Oo and distinct Il|
The major problem with proportional fonts is they destroy the vertical alignment of the code and this is a fairly major loss when it comes to writing code.
The vertical alignment makes it possible to manipulate rectangular blocks of code that span multiple lines by allowing block operations like cut, copy, paste, delete and indent, unindent etc to be easily performed.
As an example consider this snippet of code:
a1 = a111;
B2 = aaaa;
c3 = AAAA;
w4 = wwWW;
W4 = WWWW;
In a mono-spaced font the = and the ; all line up.
Now if this text is loaded into Word and displayed using a proportional font, the text effectively turns into this:
NOTE: Extra white space added to show how the = and ; no longer line up:
a1 = a1 1 1;
B2 = aaaa;
c3 = A A A A;
w4 = w w W W;
W4 = W W W W;
With the vertical alignment gone those nice blocks of code effectively disappear.
Also, because the cursor is no longer guaranteed to move vertically (i.e. the column number is not always constant from one line to the next), it becomes more difficult to write throwaway macro scripts designed to manipulate similar-looking lines.
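As a rough illustration of why block operations lean on fixed columns, here is a small Python sketch of a rectangular cut over the snippet above. cut_columns is a hypothetical helper; it works on character indices, which map to the same screen position on every line only when the font is monospaced.

```python
def cut_columns(lines, start, end):
    """Remove columns [start, end) from every line; return (kept, removed)."""
    kept = [line[:start] + line[end:] for line in lines]
    removed = [line[start:end] for line in lines]
    return kept, removed


code = ["a1 = a111;", "B2 = aaaa;", "c3 = AAAA;"]

# Columns 2 and 3 hold " =" on every line, so one rectangle covers them all.
kept, removed = cut_columns(code, 2, 4)
print(removed)  # [' =', ' =', ' =']
print(kept)     # ['a1 a111;', 'B2 aaaa;', 'c3 AAAA;']
```

In a proportional font the same character indices would still be cut, but the selection would no longer look like a rectangle on screen, which is the loss described above.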