Remove a sentence in a PDF with PDFBox

I am working on watermark removal and faced the problem of how to remove a sentence from a PDF file. My idea was that while processing the text-showing operators (TJ, Tj, '), I would record the order of those operators (a show index). Once the sentence to be removed is found, I would look up the index of its operator, reprocess the content stream, and delete it.
The answer at https://stackoverflow.com/questions/58475104/filter-out-all-text-above-a-certain-font-size-from-pdf introduces the PdfContentStreamEditor, but I could not get further with it.
BT
  Tj showIdx2
  TJ showIdx2
  ...
ET
BT
  Tj showIdx3
  TJ showIdx4
  ...
ET
...
The sample PDF file: https://github.com/zhongguogu/PDFBOX/blob/master/pdf/watermark.pdf
The content in the page header is "本报告仅供-中庚基金管理有限公司-中庚报告邮箱使用 p2".

According to Google translate that line says that "this report is only for-Zhong Geng Fund Management Co., Ltd.-Zhong Geng Report Mailbox". This quite likely means that the report indeed was for Zhong Geng eyes only. But let's assume they decided to publish those reports more widely and you have the task of removing that soft restriction.
You mentioned the PdfContentStreamEditor from this answer.
Indeed you can use it similarly to how it has been used in this answer where a string "[QR]" was to be removed from underneath some QR codes:
PDDocument document = ...
for (PDPage page : document.getDocumentCatalog().getPages()) {
    PdfContentStreamEditor editor = new PdfContentStreamEditor(document, page) {
        final StringBuilder recentChars = new StringBuilder();

        @Override
        protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, Vector displacement)
                throws IOException {
            // collect the Unicode text drawn by the current text showing instruction
            String string = font.toUnicode(code);
            if (string != null)
                recentChars.append(string);

            super.showGlyph(textRenderingMatrix, font, code, displacement);
        }

        @Override
        protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
            String recentText = recentChars.toString();
            recentChars.setLength(0);
            String operatorString = operator.getName();

            // drop the instruction entirely if it shows exactly the text to remove
            if (TEXT_SHOWING_OPERATORS.contains(operatorString) && "本报告仅供-中庚基金管理有限公司-中庚报告邮箱使用 p2".equals(recentText))
            {
                return;
            }

            super.write(contentStreamWriter, operator, operands);
        }

        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    };
    editor.processPage(page);
}
document.save("watermark-RemoveByText.pdf");
(RemoveText test testRemoveByText)
Beware, though, this only works if the text to remove is drawn using one text showing instruction only and that instruction only draws the text to remove.
If instead the text to replace is drawn using multiple instructions following each other, you have to start collecting instructions as long as you have a potential match instead of dropping them immediately. As soon as the potential match turns out not to be a match after all, you'll have to super.write the collected instructions.
And if instead the text to replace is only part of what a single instruction draws, you'll have to doctor around with that instruction. Depending on the script and how heavily it uses ligatures and similar features, this may be very difficult.
And the most complex situations may require you to collect all instructions as they come in, analyze them as a whole, adapt the identified instructions, and then forward the manipulated collection to super.write.
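For the multi-instruction case described above, a minimal sketch of the buffering idea could look like the following. It assumes the PdfContentStreamEditor helper class from the linked answer, puts the phrase from the question into a TARGET constant, and for brevity ignores the non-text operators (Td, Tf, ...) that may appear between the held-back text showing instructions; it illustrates the approach and is not tested code.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

public class BufferingTextRemover extends PdfContentStreamEditor {
    static final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    static final String TARGET = "本报告仅供-中庚基金管理有限公司-中庚报告邮箱使用 p2";

    final StringBuilder recentChars = new StringBuilder();    // text of the current instruction
    final StringBuilder collectedText = new StringBuilder();  // text of all held-back instructions
    final List<Operator> heldOperators = new ArrayList<>();
    final List<List<COSBase>> heldOperands = new ArrayList<>();

    public BufferingTextRemover(PDDocument document, PDPage page) {
        super(document, page);
    }

    @Override
    protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, Vector displacement)
            throws IOException {
        String string = font.toUnicode(code);
        if (string != null)
            recentChars.append(string);
        super.showGlyph(textRenderingMatrix, font, code, displacement);
    }

    @Override
    protected void write(ContentStreamWriter writer, Operator operator, List<COSBase> operands) throws IOException {
        String recentText = recentChars.toString();
        recentChars.setLength(0);

        if (TEXT_SHOWING_OPERATORS.contains(operator.getName())) {
            if (TARGET.startsWith(collectedText.toString() + recentText)) {
                // still a potential match: hold the instruction back instead of writing it
                collectedText.append(recentText);
                heldOperators.add(operator);
                heldOperands.add(new ArrayList<>(operands));
                if (collectedText.toString().equals(TARGET))
                    resetBuffer();                 // full match: silently drop everything held back
                return;
            }
            // not a match after all: write the held instructions
            // (restarting the match attempt with the current instruction is omitted for brevity)
            flushBuffer(writer);
        }
        super.write(writer, operator, operands);
    }

    void flushBuffer(ContentStreamWriter writer) throws IOException {
        for (int i = 0; i < heldOperators.size(); i++)
            super.write(writer, heldOperators.get(i), heldOperands.get(i));
        resetBuffer();
    }

    void resetBuffer() {
        collectedText.setLength(0);
        heldOperators.clear();
        heldOperands.clear();
    }
}

Note that in this sketch any operators between the buffered text showing instructions (positioning, font changes, ...) are still written immediately, so the instruction ordering changes slightly when a held-back prefix is later flushed; as before, the editor is applied per page via processPage(page).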


Optaplanner: NullPointerException when calling scoreDirector.beforeVariableChanged in a simple custom move

I am building a Capacitated Vehicle Routing Problem with Time Windows, but with one small difference compared to the one provided in the examples from the documentation: I don't have a depot. Instead, each order has a pickup step and a delivery step, in two different locations.
(like in the Vehicle Routing example from the documentation, the previousStep planning variable has the CHAINED graph type, and its valueRangeProviderRefs includes both Drivers, and Steps)
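A declaration of that kind follows the documented chained-variable pattern, roughly like the sketch below (the "driverRange"/"stepRange" value range ids and the exact class layout are illustrative, not my actual code):

import org.optaplanner.core.api.domain.entity.PlanningEntity;
import org.optaplanner.core.api.domain.variable.PlanningVariable;
import org.optaplanner.core.api.domain.variable.PlanningVariableGraphType;

// Sketch only: StepList plays the role of the documentation's Standstill, i.e. the
// common type of the chain anchors (Drivers) and the chained Steps.
@PlanningEntity
public class Step extends StepList {

    @PlanningVariable(graphType = PlanningVariableGraphType.CHAINED,
            valueRangeProviderRefs = {"driverRange", "stepRange"})
    private StepList previousStep;

    public StepList getPreviousStep() {
        return previousStep;
    }

    public void setPreviousStep(StepList previousStep) {
        this.previousStep = previousStep;
    }
}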
This difference adds a couple of constraints:
the pickup and delivery steps of a given order must be handled by the same driver
the pickup must be before the delivery
After experimenting with constraints, I have found that it would be more efficient to implement two types of custom moves:
assign both steps of an order to a driver
rearrange the steps of a driver
I am currently implementing that first custom move. My solver's configuration looks like this:
SolverFactory<RoutingProblem> solverFactory = SolverFactory.create(
    new SolverConfig()
        .withSolutionClass(RoutingProblem.class)
        .withEntityClasses(Step.class, StepList.class)
        .withScoreDirectorFactory(new ScoreDirectorFactoryConfig()
            .withConstraintProviderClass(Constraints.class)
        )
        .withTerminationConfig(new TerminationConfig()
            .withSecondsSpentLimit(60L)
        )
        .withPhaseList(List.of(
            new LocalSearchPhaseConfig()
                .withMoveSelectorConfig(CustomMoveListFactory.getConfig())
        ))
);
My CustomMoveListFactory looks like this (I plan on migrating it to a MoveIteratorFactory later, but for the moment this is easier to read and write):
public class CustomMoveListFactory implements MoveListFactory<RoutingProblem> {

    public static MoveListFactoryConfig getConfig() {
        MoveListFactoryConfig result = new MoveListFactoryConfig();
        result.setMoveListFactoryClass(CustomMoveListFactory.class);
        return result;
    }

    @Override
    public List<? extends Move<RoutingProblem>> createMoveList(RoutingProblem routingProblem) {
        List<Move<RoutingProblem>> moves = new ArrayList<>();

        // 1. Assign moves
        for (Order order : routingProblem.getOrders()) {
            Driver currentDriver = order.getDriver();
            for (Driver driver : routingProblem.getDrivers()) {
                if (!driver.equals(currentDriver)) {
                    moves.add(new AssignMove(order, driver));
                }
            }
        }

        // 2. Rearrange moves
        // TODO

        return moves;
    }
}
And finally, the move itself looks like this (never mind the undo or the isDoable for the moment):
@Override
protected void doMoveOnGenuineVariables(ScoreDirector<RoutingProblem> scoreDirector) {
    assignStep(scoreDirector, order.getPickupStep());
    assignStep(scoreDirector, order.getDeliveryStep());
}

private void assignStep(ScoreDirector<RoutingProblem> scoreDirector, Step step) {
    StepList beforeStep = step.getPreviousStep();
    Step afterStep = step.getNextStep();

    // 1. Insert step at the end of the driver's step list
    StepList lastStep = driver.getLastStep();
    scoreDirector.beforeVariableChanged(step, "previousStep"); // NullPointerException here
    step.setPreviousStep(lastStep);
    scoreDirector.afterVariableChanged(step, "previousStep");

    // 2. Remove step from current chained list
    if (afterStep != null) {
        scoreDirector.beforeVariableChanged(afterStep, "previousStep");
        afterStep.setPreviousStep(beforeStep);
        scoreDirector.afterVariableChanged(afterStep, "previousStep");
    }
}
The idea being that at no point am I doing an invalid chained-list manipulation.
However, as the title and the code comment indicate, I am getting a NullPointerException when I call scoreDirector.beforeVariableChanged. None of my variables are null (I've printed them to make sure). The NullPointerException doesn't occur in my code, but deep inside OptaPlanner's inner workings, which makes it difficult for me to fix:
Exception in thread "main" java.lang.NullPointerException
at org.drools.core.common.NamedEntryPoint.update(NamedEntryPoint.java:353)
at org.drools.core.common.NamedEntryPoint.update(NamedEntryPoint.java:338)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.update(StatefulKnowledgeSessionImpl.java:1579)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.update(StatefulKnowledgeSessionImpl.java:1551)
at org.optaplanner.core.impl.score.stream.drools.DroolsConstraintSession.update(DroolsConstraintSession.java:49)
at org.optaplanner.core.impl.score.director.stream.ConstraintStreamScoreDirector.afterVariableChanged(ConstraintStreamScoreDirector.java:137)
at org.optaplanner.core.impl.domain.variable.inverserelation.SingletonInverseVariableListener.retract(SingletonInverseVariableListener.java:96)
at org.optaplanner.core.impl.domain.variable.inverserelation.SingletonInverseVariableListener.beforeVariableChanged(SingletonInverseVariableListener.java:46)
at org.optaplanner.core.impl.domain.variable.listener.support.VariableListenerSupport.beforeVariableChanged(VariableListenerSupport.java:170)
at org.optaplanner.core.impl.score.director.AbstractScoreDirector.beforeVariableChanged(AbstractScoreDirector.java:430)
at org.optaplanner.core.impl.score.director.AbstractScoreDirector.beforeVariableChanged(AbstractScoreDirector.java:390)
at test.optaplanner.solver.AssignMove.assignStep(AssignMove.java:98)
at test.optaplanner.solver.AssignMove.doMoveOnGenuineVariables(AssignMove.java:85)
at org.optaplanner.core.impl.heuristic.move.AbstractMove.doMove(AbstractMove.java:35)
at org.optaplanner.core.impl.heuristic.move.AbstractMove.doMove(AbstractMove.java:30)
at org.optaplanner.core.impl.score.director.AbstractScoreDirector.doAndProcessMove(AbstractScoreDirector.java:187)
at org.optaplanner.core.impl.localsearch.decider.LocalSearchDecider.doMove(LocalSearchDecider.java:132)
at org.optaplanner.core.impl.localsearch.decider.LocalSearchDecider.decideNextStep(LocalSearchDecider.java:116)
at org.optaplanner.core.impl.localsearch.DefaultLocalSearchPhase.solve(DefaultLocalSearchPhase.java:70)
at org.optaplanner.core.impl.solver.AbstractSolver.runPhases(AbstractSolver.java:98)
at org.optaplanner.core.impl.solver.DefaultSolver.solve(DefaultSolver.java:189)
at test.optaplanner.OptaPlannerService.testOptaplanner(OptaPlannerService.java:68)
at test.optaplanner.App.main(App.java:13)
Is there something I did wrong? It seems I am following the documentation for custom moves fairly closely, apart from the fact that I am using exclusively Java code instead of Drools.
The initial solution I feed to the solver has all of the steps assigned to a single driver. There are 15 drivers and 40 orders.
In order to bypass this error, I have tried a number of different things:
remove the shadow variable annotation, turn Driver into a problem fact, and handle the nextStep field myself => this makes no difference
use Simulated Annealing + First Fit Decreasing construction heuristics, and start with steps not assigned to any driver (this was inspired by looking up the example here, which is more complete than the one from the documentation) => the NullPointerException appears on afterVariableChanged instead, but it still appears.
a number of other things which were probably not very smart
But without a more helpful error message, I can't think of anything else to try.
Thank you for your help

How to check whether there is content in a QTextEdit in Qt?

I have written a notepad application that looks like Notepad on Windows. How can I make the Find action disabled when the QTextEdit is empty, but enabled when there is something in it?
The myTextEdit->toPlainText().isEmpty() approach does not seem very efficient: toPlainText() has to convert the full QTextEdit contents into a new QString buffer, which is expensive if the QTextEdit contains a large amount of text.
I suggest using myTextEdit->document()->isEmpty() instead, which queries the underlying QTextDocument, i.e. the original data structure.
In my use case, the QTextEdit contains an error log, and before appending a line I check if the text is empty; if not, I insert a newline(*). Converting the log buffer to a QString each time a line is appended would be a bad idea.
(*) I cannot insert a newline together with each log entry, because the entries themselves are comma-separated lists. Roughly speaking I have a newEntry(...) and a newLine(...) function, and newEntry does not know if newLine or newEntry will be called next.
You connect a function that enables/disables the action based on the text edit's toPlainText() to the textChanged() signal of the text edit.
For example:
void MyWidget::someSetupMethod()
{
    // ... some code that sets up myTextEdit and myFindAction here

    connect(myTextEdit, &QTextEdit::textChanged, myFindAction, [myTextEdit, myFindAction]() {
        myFindAction->setEnabled(!myTextEdit->toPlainText().isEmpty());
    });

    // ...
}
or, if you cannot or do not want to use C++11, something like
void MyWidget::someSetupMethod()
{
    // ... some code that sets up m_myTextEdit and m_myFindAction here

    connect(m_myTextEdit, &QTextEdit::textChanged, this, &MyWidget::updateFindAction);

    // ...
}

void MyWidget::updateFindAction()
{
    m_myFindAction->setEnabled(!m_myTextEdit->toPlainText().isEmpty());
}

Found 'UR'-anomaly for variable

I have this Sonar error (Major):
Found 'UR'-anomaly for variable 'language' (lines '83'-'85')
in this function:
public void saveAll(List<Language> languages){
    //Found 'UR'-anomaly for variable 'country' (lines '83'-'85').
    //Code Smell Major Open Not assigned 20min effort Comment
    for (Language language: languages) {
        save(language);
    }
}
How can I fix this major error? Thanks in advance.
Edit:
Found even more information in this other SO post. While that one is more PMD-centric, the background information may be of interest to you:
Java for each loop being flagged as UR anomaly by PMD.
This is a rule from PMD it seems. Definition:
The dataflow analysis tracks local definitions, undefinitions and references to variables on different paths on the data flow. From those informations there can be found various problems.
1. UR-Anomaly: There is a reference to a variable that was not defined before. This is a bug and leads to an error.
2. DU-Anomaly: A recently defined variable is undefined. These anomalies may appear in normal source text.
3. DD-Anomaly: A recently defined variable is redefined. This is ominous but don't have to be a bug.
There is an open bug report for this:
https://sourceforge.net/p/pmd/bugs/1190/
In their example they report it for arrays, but somebody commented that it also happens for collections.
Example:
public static void main(final String[] args) {
    for (final String string : args) {
        string.getBytes(); // UR Anomaly
    }
    for (int i = 0; i < args.length; i++) {
        args[i].getBytes();
    }
}
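For contrast, here is a small constructed example (not from the PMD documentation) of the kind of pattern the anomaly definitions quoted above actually describe, in this case a DD anomaly:

public static int ddAnomalyExample(boolean flag) {
    int value = 0;          // definition
    value = flag ? 1 : 2;   // redefinition before the first value is ever read: a DD anomaly as defined above
    return value;
}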
In our Sonar setup we don't use this rule. Based on the available information, you may wish not to use it in yours either.

How to create suggestion messages with ANTLR?

I want to create an interactive version of the ANTLR calculator example, which tells the user what to type next. For instance, in the beginning, the ID, INT, NEWLINE, and WS tokens are possible. Ignoring WS, a suggestion message could be:
Type an identifier, a number, or newline.
After parsing a number, the message should be
Type +, -, *, or newline.
and so on. How to do this?
Edit
What I have tried so far:
private void accept(String sentence) {
    ANTLRInputStream is = new ANTLRInputStream(sentence);
    OperationLexer l = new OperationLexer(is);
    CommonTokenStream cts = new CommonTokenStream(l);
    final OperationParser parser = new OperationParser(cts);
    parser.addParseListener(new OperationBaseListener() {

        @Override
        public void enterEveryRule(ParserRuleContext ctx) {
            ATNState state = parser.getATN().states.get(parser.getState());
            System.out.print("RULE " + parser.ruleNames[state.ruleIndex] + " ");
            IntervalSet following = parser.getATN().nextTokens(state, ctx);
            for (Integer token : following.toList()) {
                System.out.print(parser.tokenNames[token] + " ");
            }
            System.out.println();
        }
    });
    parser.prog();
}
This prints the right suggestion for the first token, but for all other tokens it prints the current token. I guess capturing the state in enterEveryRule() is too early.
Accurately gathering this information in an LL(k) parser, where k>1, requires a thorough understanding of the parser internals. Several years ago, I faced this problem with ANTLR 3, and found the only real solution was so complex that it resulted in me becoming a co-author of ANTLR 4 specifically so I could handle this issue.
ANTLR (including ANTLR 4) disambiguates the parse tree during the parsing phase, which means that if your grammar is not LL(1), by the time you perform this analysis on the parse tree you have already lost information necessary to be accurate. You'll need to write your own version of ParserATNSimulator (or a custom interpreter which wraps it) which does not lose that information.

Lucene stop phrases filter

I'm trying to write a filter for Lucene, similar to StopWordsFilter (thus implementing TokenFilter), but I need to remove phrases (sequence of tokens) instead of words.
The "stop phrases" are represented themselves as a sequence of tokens: punctuation is not considered.
I think I need to do some kind of buffering of the tokens in the token stream, and when a full phrase is matched, I discard all tokens in the buffer.
What would be the best approach to implement a "stop phrases" filter given a stream of words like Lucene's TokenStream?
In this thread I was given a solution: use Lucene's CachingTokenFilter as a starting point:
That solution was actually the right way to go.
EDIT: I fixed the dead link. Here is a transcript of the thread.
MY QUESTION:
I'm trying to implement a "stop phrases filter" with the new TokenStream API.
I would like to be able to peek N tokens ahead, see if the current token plus the N subsequent tokens match a "stop phrase" (the set of stop phrases is saved in a HashSet), then discard all these tokens when they match a stop phrase, or keep them all if they don't match.
For this purpose I would like to use captureState() and then restoreState() to get back to the starting point of the stream.
I tried many combinations of these APIs. My last attempt is in the code below, which doesn't work.
static private HashSet<String> m_stop_phrases = new HashSet<String>();
static private int m_max_stop_phrase_length = 0;
...
public final boolean incrementToken() throws IOException {
    if (!input.incrementToken())
        return false;

    Stack<State> stateStack = new Stack<State>();
    StringBuilder match_string_builder = new StringBuilder();
    int skippedPositions = 0;
    boolean is_next_token = true;

    while (is_next_token && match_string_builder.length() < m_max_stop_phrase_length) {
        if (match_string_builder.length() > 0)
            match_string_builder.append(" ");
        match_string_builder.append(termAtt.term());
        skippedPositions += posIncrAtt.getPositionIncrement();
        stateStack.push(captureState());
        is_next_token = input.incrementToken();

        if (m_stop_phrases.contains(match_string_builder.toString())) {
            // Stop phrase is found: skip the number of tokens
            // without restoring the state
            posIncrAtt.setPositionIncrement(posIncrAtt.getPositionIncrement() + skippedPositions);
            return is_next_token;
        }
    }

    // No stop phrase found: restore the stream
    while (!stateStack.empty())
        restoreState(stateStack.pop());

    return true;
}
Which is the correct direction I should look into to implement my "stop phrases" filter?
CORRECT ANSWER:
restoreState only restores the token contents, not the complete stream, so you cannot roll back the token stream (and this was also not possible with the old API). The while loop at the end of your code is not working as you expect because of this. You may use CachingTokenFilter, which can be reset and consumed again, as a source for further work.
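To make that suggestion a bit more concrete, here is a minimal sketch of the reset-and-consume-again pattern (assuming a recent Lucene with the CharTermAttribute API; it only demonstrates the two-pass consumption, not a complete TokenFilter, and the stop-phrase matching itself is left out):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StopPhraseSketch {

    // First pass: consume every token once; CachingTokenFilter stores them internally,
    // so the full term sequence can afterwards be matched against the stop phrases.
    static List<String> collectTerms(CachingTokenFilter cached) throws IOException {
        CharTermAttribute term = cached.addAttribute(CharTermAttribute.class);
        List<String> terms = new ArrayList<>();
        cached.reset();                      // the first reset() is forwarded to the wrapped stream
        while (cached.incrementToken()) {
            terms.add(term.toString());
        }
        return terms;
    }

    // Second pass: replay the cached tokens and keep only the positions that were
    // not identified as part of a stop phrase during the analysis of the first pass.
    static List<String> replayWithout(CachingTokenFilter cached, Set<Integer> droppedPositions) throws IOException {
        CharTermAttribute term = cached.addAttribute(CharTermAttribute.class);
        List<String> kept = new ArrayList<>();
        cached.reset();                      // later reset() calls just rewind over the cache
        int position = 0;
        while (cached.incrementToken()) {
            if (!droppedPositions.contains(position++)) {
                kept.add(term.toString());   // a real filter would forward the token's attributes instead
            }
        }
        return kept;
    }
}

In a real filter the second pass would live inside incrementToken() of a custom TokenFilter that wraps the CachingTokenFilter and simply skips the cached tokens belonging to a matched stop phrase.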
You'll really have to write your own Analyzer, I should think, since whether or not some sequence of words is a "phrase" is dependent on cues, such as punctuation, that are not available after tokenization.