Regex grammar for action mapping in Struts 1.x to map URLs?

I know the documentation says only (*) and escape sequences are supported. I want to differentiate between /somepath/category/subcategory and /somepath/category/article.html,
and route each to a different action handler. I want to use wildcard characters in the path mappings.
What is the best approach to do this?
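For reference, Struts 1.2+ wildcard mappings let * match within a single path segment and ** match across slashes, with {1}, {2}, ... substituting the matched pieces into the mapping's attributes. A minimal, hypothetical struts-config.xml sketch (the action class names are placeholders, and it assumes a path-based servlet mapping so the .html suffix is actually visible to Struts):

<!-- Hypothetical fragment; com.example.* classes are placeholders. -->
<action path="/somepath/*/*.html"
        type="com.example.ArticleAction"
        parameter="{1}/{2}"/>
<action path="/somepath/*/*"
        type="com.example.SubcategoryAction"
        parameter="{1}/{2}"/>

When both patterns match a request, which mapping wins depends on their order in struts-config.xml, so verify the precedence rules in the wildcard section of the docs for your Struts version.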

Related

Add custom punctuation to a spaCy model

How do you add custom punctuation (e.g. an asterisk) to the infix list in a tokenizer and have it recognized by nlp.explain as punctuation? I would like to add characters that are not currently recognized as punctuation (but that are already among the registered infixes) to the punctuation list, so that the Matcher can use them when matching {'IS_PUNCT': True}.
An answer to a similar issue was provided here:
How can I add custom signs to spaCy's punctuation functionality?
The only problem is that I am unable to package the newly recognized punctuation with the model. A side note: the tokenizer already recognizes infixes with the desired punctuation, so all that is left is propagating this to the Matcher.
The lexeme attribute IS_PUNCT is completely separate from any of the tokenizer settings. In a packaged pipeline, you'd either create a custom language (https://spacy.io/usage/linguistic-features#language-subclass) or run the customization in an nlp.before_creation callback (https://spacy.io/usage/training#custom-code-nlp-callbacks).
Be aware that modifying EnglishDefaults affects all English pipelines loaded in the same script, so the custom language option is cleaner (in particular if you're distributing this model for general use), but also slightly more work to implement.
On the other hand, if you're just using the Matcher, it might be easier to use a REGEX pattern to match the tokens you want instead of customizing IS_PUNCT.
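As a minimal sketch of that REGEX alternative (assuming spaCy 3.x with en_core_web_sm installed; the characters * and ^ are just example "custom punctuation"):

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# Match tokens made up solely of the custom characters (here * and ^),
# sidestepping IS_PUNCT entirely.
matcher.add("CUSTOM_PUNCT", [[{"TEXT": {"REGEX": r"^[*^]+$"}}]])

doc = nlp("a * b ^ c")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)  # prints the matched * and ^ tokens

Because the pattern keys on the token text rather than a lexeme flag, it ships with the pipeline without any language customization.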

Escaping `/**` in asciidoc

My goal
I am trying to submit a fix to the Filebeat documentation, written in asciidoc.
Source
Currently it is not possible to recursively fetch all files in all subdirectories of a directory. However,
the /** pattern can be used to fetch all files from a predefined level of subdirectories. For example,
/var/log/**/*.log and /var/log/**/**/*.log fetch all .log files from the subdirectories and sub-subdirectories of /var/log, respectively. Note that neither fetches log files from the /var/log folder itself.
Result
A screenshot from asciidoclive.com shows the rendered output.
My problem
The /** becomes / in the output, and the words after it are unintentionally marked in bold.
My question
How do I properly escape /** in asciidoc?
This is tricky, ’cause escaping is de facto broken in Asciidoctor. :(
I know about two possible workarounds:
use a passthrough: +++/**+++,
or define an attribute for the asterisk, e.g. :star: *, and write /{star}{star}.
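A minimal sketch of both workarounds applied to the sentence in question (assuming Asciidoctor; the :star: attribute name is arbitrary):

// Workaround 1: inline passthrough
the +++/**+++ pattern can be used to fetch all files

// Workaround 2: attribute reference
:star: *

the /{star}{star} pattern can be used to fetch all files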
Yes, it sucks, and it's not easily fixable in Asciidoctor. The best option would be to implement a new AsciiDoc parser based on a proper grammar, etc.; Asciidoctor is unfortunately in very bad shape in terms of code quality. :( (If anyone would be willing to sponsor such a project, please contact me.)

mod_rewrite does not encode special characters even if NE flag is not supplied?

From the Apache documentation, I see the following description of the NE flag:
https://httpd.apache.org/docs/2.2/rewrite/flags.html#flag_ne
By default, special characters, such as & and ?, for example, will be converted to their hexcode equivalent. Using the [NE] flag prevents that from happening.
RewriteRule ^/anchor/(.+) /bigpage.html#$1 [NE,R]
The above example will redirect /anchor/xyz to /bigpage.html#xyz. Omitting the [NE] will result in the # being converted to its hexcode equivalent, %23, which will then result in a 404 Not Found error condition.
However, I have seen tons of examples where you simply put a RewriteRule like this:
RewriteRule ^(.*)$ http://www.mydomain.com/?foo=bar&jee=lee [L,R]
And if you examine the final request sent to the server after the redirect, it's just the same plain string without any URI encoding. Experimenting further, it seems that URI encoding only happens inside mod_rewrite when the source string has a special character inside the query-string part; say the source is originaldomain.com/?foo%5d=6.
Then mod_rewrite will rewrite it to mydomain.com/?foo%255d=6, encoding the "%" as "%25", if NE is not supplied. But note that if I omit the "?" in my original request, the encoding does not happen.
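One way to observe this directly is to inspect the Location header that mod_rewrite sends back on the redirect, for example with curl (hostname as in the example above; -sI sends a HEAD request and prints only the response headers):

curl -sI 'http://originaldomain.com/?foo%5d=6' | grep -i '^location'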
So that makes me confused about what's described on most sites and in the documentation, unless I am understanding this concept in a totally wrong way.
I would also be curious to learn, in general, what rule of thumb browsers and mod_rewrite use to decide whether to encode certain characters. It seems to me that a browser tends not to encode anything unless it finds it hard, or nonsensical, to send what was typed; is that correct? It would also be really nice if someone could walk through the complete workflow of when and where all the encoding and decoding happen, from typing the domain in the browser to actually getting the page rendered.
The general "rule of thumb" and "complete workflow as to when and where all the encoding and decoding happen" in regard to URIs can be found in RFC3986:
The generic syntax uses the slash ("/"), question mark ("?"), and
number sign ("#") characters to delimit components that are
significant to the generic parser's hierarchical interpretation of an
identifier.
In short, the # symbol, as used by most browsers, is treated as a fragment reference within a page. For instance, you can link to an element id on a page with:
http://www.example.com/mypage.html#some_div_id
Because of this, Apache doesn't expect the fragment to appear on the server side of things. Therefore, by default, it URL-encodes (in Apache's terminology, escapes) the hash symbol in order to pass it forward when you're doing a rewrite. (It's trying to protect you from yourself, per the RFC.)
The [NE] or noescape flag simply prevents that default URL encoding from taking place.
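A sketch of the two behaviors side by side, based on the documentation example above (the /anchor-escaped path is just a hypothetical name to keep the rules distinct):

RewriteEngine On

# Without NE: the '#' in the substitution is escaped to %23 in the
# Location header, so the client requests the literal path
# /bigpage.html%23xyz and typically gets a 404.
RewriteRule ^/anchor-escaped/(.+) /bigpage.html#$1 [R]

# With NE: the '#' passes through unescaped, so the client requests
# /bigpage.html and scrolls to the "xyz" fragment.
RewriteRule ^/anchor/(.+) /bigpage.html#$1 [NE,R]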
Also according to the RFC:
2.2. Reserved Characters
URIs include components and subcomponents that are delimited by
characters in the "reserved" set. These characters are called
"reserved" because they may (or may not) be defined as delimiters by
the generic syntax, by each scheme-specific syntax, or by the
implementation-specific syntax of a URI's dereferencing algorithm.
If data for a URI component would conflict with a reserved
character's purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.
Additionally from section 1.2.3
As relative references can only be used within the context of a
hierarchical URI, designers of new URI schemes should use a syntax
consistent with the generic syntax's hierarchical components unless
there are compelling reasons to forbid relative referencing within
that scheme.

Best Practice for integrating HTMLPurifier in Zend Framework 2

What is or might be the best practice for integrating HTMLPurifier into Zend Framework 2?
The goal is to filter Zend Form elements as well as input fields that were not generated with Zend Form.
How would you do this?
From my point of view, I would say you can create a new filter. If Purifier were already a 'part' of Zend, it might be named Zend\Filter\HtmlPurifier. That's where I see it fitting best.
You could also make it a Validator (Zend\Validator\HtmlPurifier) so you can tell if a piece of text 'passes' or not. Depends on what you want to do.
If you want to reject bad input, use the validator path. If you want to filter out bad input, use the filter path.
After you've made your Zend filter/validator, use it like you would any other filter/validator.
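A minimal sketch of the filter path (the Application\Filter namespace and class name are assumptions, not an official ZF2 component; it presumes the HTMLPurifier library is on your autoload path):

<?php
namespace Application\Filter;

use Zend\Filter\FilterInterface;

// Hypothetical wrapper exposing HTMLPurifier as a standard ZF2 filter.
class HtmlPurifier implements FilterInterface
{
    protected $purifier;

    public function __construct(\HTMLPurifier $purifier = null)
    {
        // Fall back to a default-configured purifier if none is injected.
        $this->purifier = $purifier ?: new \HTMLPurifier();
    }

    public function filter($value)
    {
        return $this->purifier->purify($value);
    }
}

It can then be attached to a form element's input filter like any other ZF2 filter, or invoked directly on raw input via its filter() method.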

ANTLR and content assist in Eclipse

I have a project in Eclipse where I have an editor for a custom language. I am using ANTLR to generate the compiler for it. What I need is to add content assist to the editor.
The input is source code in the custom language, plus the position of the character where the user requested content assist. The source code is incomplete most of the time, as the user can ask for content assist at any point. What I need is to calculate the list of tokens that would be valid at the given position.
It is possible to write custom code to do the calculation, but that code would have to be kept in sync with the grammar manually. I figure the parser is doing something similar: it has to be able to determine, in a given context, which tokens are acceptable. Is it possible to "reuse" that? What is the best practice for creating content assist anyway?
Thanks,
Balint
Have a look at Xtext. Xtext uses ANTLR 3 under the hood and provides content assist for the ANTLR-based languages it generates. Look especially into the package org.eclipse.xtext.ui.editor.contentassist.
You may consider redefining your grammar with Xtext, which would give you content assist out of the box. It is not possible to reuse the plain ANTLR grammar of a custom language directly.
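For a flavor of what the customization looks like once a grammar lives in Xtext: Xtext generates an abstract proposal provider from the grammar, and you override its complete<Rule>_<Feature> methods to contribute proposals. A hypothetical sketch for a language called MyDsl (all names, including the generated AbstractMyDslProposalProvider base class and the Greeting rule, are placeholders):

package org.example.mydsl.ui.contentassist;

import org.eclipse.emf.ecore.EObject;
import org.eclipse.xtext.Assignment;
import org.eclipse.xtext.ui.editor.contentassist.ContentAssistContext;
import org.eclipse.xtext.ui.editor.contentassist.ICompletionProposalAcceptor;

public class MyDslProposalProvider extends AbstractMyDslProposalProvider {

    @Override
    public void completeGreeting_Name(EObject model, Assignment assignment,
            ContentAssistContext context, ICompletionProposalAcceptor acceptor) {
        // Offer a fixed completion; a real implementation would inspect
        // "model" (the AST node at the cursor) to compute valid proposals.
        acceptor.accept(createCompletionProposal("Hello", context));
    }
}

The key point is that Xtext derives the "which tokens are valid here" computation from the grammar itself, which is exactly the synchronization problem described in the question.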