Detecting text encapsulation in VB.NET - vb.net

In my program I need to make various word replacements but I don't want to replace the word if it's encapsulated. Here's an example:
This is an example. Here is the {definition of example}: A small part
or quantity intended to show what the whole is like. Examples are great.
I would like it to turn that into this:
This is an replacement words. Here is the {definition of example}: A small part
or quantity intended to show what the whole is like. replacement words are great.
Sorry, I don't have any code to show progress yet aside from my current code to make replacements and that won't help with this problem.
Thanks!
Phil

One way to accomplish this is using Regular Expressions.
A great tool for experimenting with and learning regular expressions is Expresso (it provides detailed explanations of the regexes you enter).
The 30 Minute Regex Tutorial
Using Regular Expressions with The Microsoft .NET Framework
Regular expressions - An introduction
Regex Class
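To make the regular-expression idea concrete, here is a minimal sketch in Python (not VB.NET). It assumes that "encapsulated" means "inside a non-nested {...} span", and that the word should match case-insensitively with an optional plural "s", as the example output suggests. The function name replace_outside_braces is made up for illustration. The pattern matches either a whole braced span or the bare word, and the callback only swaps the bare word.

```python
import re

def replace_outside_braces(text, word, replacement):
    # Alternation: group 1 captures a whole {...} span (no nesting assumed);
    # otherwise the pattern matches the target word with an optional plural "s".
    pattern = re.compile(
        r"(\{[^{}]*\})|\b" + re.escape(word) + r"s?\b",
        re.IGNORECASE,
    )

    def swap(match):
        # Braced spans are returned untouched; bare words are replaced.
        if match.group(1) is not None:
            return match.group(1)
        return replacement

    return pattern.sub(swap, text)

sample = ("This is an example. Here is the {definition of example}: A small part "
          "or quantity intended to show what the whole is like. Examples are great.")
result = replace_outside_braces(sample, "example", "replacement words")
print(result)
```

In .NET the same shape should be possible with the Regex.Replace overload that takes a MatchEvaluator, which plays the role of the swap callback here.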

Related

How to use the Alignment API to generate an Alignment Format file?

I am going to take part in the Instance Matching track of OAEI, and I need to produce my results in the Alignment Format. To do that, I worked through the official tutorial (link: http://alignapi.gforge.inria.fr/tutorial/tutorial1/index.html).
But there are many differences between the method taught and the method I want. In other words, I can't understand the API.
This is my situation:
I have two RDF files, person11.rdf and person12.rdf (the PR dataset, available at http://oaei.ontologymatching.org/2010/im/index.html), and each file holds information about many persons. I want to find the coreferent entities, and the results must be printed in the Alignment Format. I can find the results using SPARQL, but I don't know how to print them in the Alignment Format.
So, I have three questions:
First, if I want to generate an Alignment Format file, is the method taught in the tutorial the only way?
Second, can you show me your method (code would be better) for generating the Alignment Format file? Maybe I have been wrong from the beginning; can you give me some suggestions?
Third, if you have attended OAEI or know something about Instance Matching, can you give me some advice? I want to find the coreferent entities.
Thank you!
First question: I guess that the "mentioned method" is the one in tutorial1. It is not the appropriate one, since you have to write a program to output the alignment format and tutorial1 is a command-line interface tutorial. In this case, you had better look at http://alignapi.gforge.inria.fr/tutorial/tutorial2/index.html
Then, there are basically two ways to do it:
The advised one (for several reasons, and for participating in OAEI) is to follow these tutorials: create an empty alignment in your program, create the correspondences from the results of your SPARQL query, and render it. Everything is covered by the tutorials except the part concerning your SPARQL queries. This assumes that you are programming in Java.
The non-advised solution (primarily non-advised because you will have to debug your own renderer) is to write, in any programming language you want, a program that outputs the format (which corresponds to what you cite).
Think about it: how would you expect the Alignment API to know the results of your SPARQL query? If you come up with a nice solution, contact the API developers; they may integrate it so that others can benefit.
Second question: I cannot do better than what is above.
Third question: too general. Read the OAEI results (http://oaei.ontologymatching.org) and look at the code of others.
Good luck!

Processing text of SQL script

I want to develop a tool that prettifies SQL scripts: make all keywords and commands (SELECT, JOIN, FROM, etc.) upper/lower case, add square brackets, and a couple of other things (yes, ). I'm going to implement it as an extension for my IDE or as an external tool; I haven't decided yet.
I was going to split the script on spaces, brackets, commas and periods to get separate words, and then check each word against the list of keywords. If it matches, capitalize or lowercase the word depending on the settings. If not, leave it as it was.
But then I thought that there may be other solutions.
I thought about using RegEx (unfortunately I don't know much about it). I suppose it would work more efficiently, and would therefore be preferable.
Is RegEx the best way to achieve my goal? Or is my initial approach also appropriate?
Are there other ways?
P.S. I know that similar tools may already exist out there, and I would appreciate it if you shared them. But I want to implement my own tool for self-education reasons.
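The two approaches described above are not far apart, and a rough Python sketch shows one way to combine them: a single regular expression tokenizes the script, and each word is then checked against a keyword set. The KEYWORDS set is a deliberately tiny, hypothetical list, and the function name recase_keywords is invented for this illustration. The pattern first skips spans that must not be touched (single-quoted strings, -- line comments, [bracketed] identifiers), which is the part a naive split on spaces and commas would get wrong.

```python
import re

# Hypothetical, incomplete keyword list; a real tool would load the full set.
KEYWORDS = {"select", "from", "join", "inner", "left", "where", "on",
            "and", "or", "group", "order", "by"}

# Alternative 1 ("skip"): string literals, line comments, bracketed identifiers.
# Alternative 2 ("word"): any other run of word characters.
TOKEN = re.compile(
    r"(?P<skip>'(?:[^']|'')*'|--[^\n]*|\[[^\]]*\])|(?P<word>\b\w+\b)"
)

def recase_keywords(sql, upper=True):
    def fix(match):
        word = match.group("word")
        if word is not None and word.lower() in KEYWORDS:
            return word.upper() if upper else word.lower()
        # Skipped spans and non-keyword words are returned unchanged.
        return match.group(0)
    return TOKEN.sub(fix, sql)

print(recase_keywords("select name from users where note = 'select from' -- from here"))
```

This is only a sketch: block comments, double-quoted identifiers and dialect-specific syntax are not handled, and a full SQL lexer would be the more robust choice once the tool grows.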

Best way to index text consisting of multilingual words in Elasticsearch

I'm new to Elasticsearch. The documentation on the official site only covers the basics and does not contain specific examples. Because it seems a little disorganized to me, I can't figure out how to get started on what I want to do.
I have crawled a lot of torrents, and they are published in many different languages.
I see that Elasticsearch has analysis to deal with input text, but I don't understand the workflow. From what I have tried, Elasticsearch does not run every analyzer over the input data.
It seems I have to assign an analyzer to each text.
Take a text such as: no game no life 游戏人生 ノーゲーム・ノーライフ. It contains three languages. How can I know which three analyzers I have to use? It also seems too heavy to run every analyzer over this text.
I have seen an article, Three Principles for Multilingal Indexing in Elasticsearch, that talks about this. However, I am a beginner and a non-native English speaker, so it is hard to understand without an example.
Please give me some guidance.
Thank you.
I would probably create two fields (or more, one per expected language) and apply a different, language-dependent analyzer to each of them. Then, when you search, you would search across all of those fields.
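As a rough illustration of that idea, the sketch below builds the request bodies as Python dictionaries. It uses multi-fields (sub-fields of one source field) as one way to get "several fields with different analyzers"; the field name title and the sub-field names en and cjk are assumptions, while english and cjk are built-in Elasticsearch analyzers. The typeless mappings layout assumes Elasticsearch 7 or later.

```python
import json

# Index body: one source field, indexed three ways.
# "title", "en" and "cjk" are hypothetical names chosen for this example.
index_body = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "standard",
                "fields": {
                    "en": {"type": "text", "analyzer": "english"},
                    "cjk": {"type": "text", "analyzer": "cjk"},
                },
            }
        }
    }
}

def build_query(text):
    # Search the base field and every language-specific sub-field at once.
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "most_fields",
                "fields": ["title", "title.en", "title.cjk"],
            }
        }
    }

print(json.dumps(index_body, indent=2))
print(json.dumps(build_query("no game no life 游戏人生"), ensure_ascii=False))
```

The first body would be sent when creating the index and the second as a search request; each sub-field is analyzed independently, so no language detection is needed at index time, at the cost of a larger index.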

Code related web searches

Is there a way to search the web which does NOT remove punctuation? For example, I want to search for window.window->window (Yes, I actually do, this is a structure in mozilla plugins). I figure that this HAS to be a fairly rare string.
Unfortunately, Google, Bing, AltaVista, Yahoo, and Excite all strip the punctuation and just show anything with the word "window" in it. And according to Google, on their site, at least, there is NO WAY AROUND IT.
In general, searching for chunks of code must be hard for this reason... anyone have any hints?
Google Code Search ("window.window->window", but it doesn't seem to return any relevant results for this query)
There are similar tools all over the internet, like codase or koders, but I'm not sure they let you search for exactly this string. Anyway, they might be useful to you, so I think they're worth mentioning.
edit: It is very unlikely you'll find a general-purpose search engine that will let you search for something like "window.window->window", because most search engines do some processing on a document before storing it. For instance, they might represent it internally as vectors of words (a vector space model) and use that to do the search, not the actual original string. Creating such a vector involves first cutting the document up according to punctuation and other criteria. This is a very complex and interesting subject which I can't tell you much more about. My bad memory did a pretty good job since I studied it at school!
BTW, they might do the same kind of processing on your query too. You might want to read about tf-idf, which is probably light years from what Google and its friends are doing but can give you a hint about what happens to your query.
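To illustrate why the punctuation disappears, here is a toy Python sketch of that kind of processing. The tokenizer (lowercase, keep only runs of letters and digits) and the plain tf-idf weighting are simplifying assumptions, not a description of what any real search engine does.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed indexing step: lowercase, then keep only runs of letters/digits.
    # Punctuation such as "." and "->" acts purely as a separator.
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf(docs):
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = ["window.window->window", "open the window", "close the door"]
print(tokenize(docs[0]))
print(tf_idf(docs))
```

After tokenization, the query "window.window->window" is indistinguishable from "window window window", which is why every page containing the word "window" matches.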
As you discovered, there is no way to do that by itself in the main Google engine. However, if you are looking for information about Mozilla, then the best bet would be to structure your query more like this:
"window.window->window" +Mozilla
OR +XUL
+ Another search string related to what you are
trying to do.
SymbolHound is a web search engine that does not remove punctuation from queries. There is an option to search source code repositories (like the now-discontinued Google Code Search), but it also has the option to search the Internet for special characters (primarily programming-related sites such as StackOverflow).
Try it here: http://www.symbolhound.com
-Tom (co-founder)

Tricky replacements in vb.net

I want to replace the text "Sin(...)" in a string with the computed value of Math.Sin(...), where ... is, well, anything.
My problem: The string can have more than one right parenthesis. Also, since it is performing mathematical operations, it would have to know how to do the innermost ones first.
Obviously, there is not a built in method for computing mathematical equations (well, nothing that is SUPPOSED to be used for that), as noted in a previous question of mine.
This is very tricky, can anyone help?
You want to tokenize your input text, and then parse your tokens. As Alex Martelli points out, CodeProject also has a good example of something similar (which I was able to find in <10 seconds with Google).
There's a good example of an expression parser and evaluator in vb.net here, I imagine you can study and modify those sources (it also offers clear text explanations).
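For a feel of what "tokenize, then parse" looks like, here is a small recursive-descent sketch in Python (not VB.NET). The grammar, the Parser class and the two-entry FUNCTIONS table are all assumptions made for this illustration; it is not the CodeProject code. Because each level of parentheses triggers a recursive call, the innermost expressions are evaluated first and nested right parentheses are matched correctly.

```python
import math
import re

# A token is a number, a function name, or any single non-space symbol.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z]+)|(\S))")

def tokenize(text):
    tokens = []
    for number, name, symbol in TOKEN.findall(text):
        if number:
            tokens.append(("num", float(number)))
        elif name:
            tokens.append(("name", name.lower()))
        else:
            tokens.append(("op", symbol))
    return tokens

class Parser:
    FUNCTIONS = {"sin": math.sin, "cos": math.cos}  # hypothetical subset

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("end", None)

    def take(self, kind, value=None):
        token = self.peek()
        if token[0] != kind or (value is not None and token[1] != value):
            raise ValueError(f"unexpected token {token!r}")
        self.pos += 1
        return token[1]

    def expression(self):  # expression := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            if self.take("op") == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):  # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            if self.take("op") == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):  # factor := number | name '(' expr ')' | '-' factor | '(' expr ')'
        kind, value = self.peek()
        if kind == "num":
            return self.take("num")
        if kind == "name":
            func = self.FUNCTIONS[self.take("name")]
            self.take("op", "(")
            argument = self.expression()  # recursion handles nested parentheses
            self.take("op", ")")
            return func(argument)
        if (kind, value) == ("op", "-"):
            self.take("op")
            return -self.factor()
        self.take("op", "(")
        result = self.expression()
        self.take("op", ")")
        return result

def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.expression()
    parser.take("end")  # reject trailing garbage
    return result

print(evaluate("2 * Sin((1 + 2) * (3 - 1))"))
```

To substitute values back into a larger string, one option is to evaluate the whole expression this way rather than patching "Sin(...)" fragments with string replacement.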