Using ssplit options for CoreNLP - tokenize

According to the documentation, I can use options such as ssplit.isOneSentence for parsing my document into sentences. How exactly do I do this though, given a StanfordCoreNLP object?
Here's my code -
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation(doc);
pipeline.annotate(document);
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
At what point do I add this option and where?
Something like this?
pipeline.ssplit.boundaryTokenRegex = '"'
I'd also like to know how to use it for the specific option boundaryTokenRegex.
EDIT:
I think this seems more appropriate -
props.put("ssplit.boundaryTokenRegex", "\"");
But I still have to verify.

To make sentences end at any instance of a double quote ("), set one of these properties before you construct the StanfordCoreNLP pipeline:
props.setProperty("ssplit.boundaryMultiTokenRegex", "/''/");
or
props.setProperty("ssplit.boundaryMultiTokenRegex", "/\"/");
depending on how the quote token is stored (CoreNLP normalizes it to the former).
And if you want to split at both opening and closing quotes:
props.setProperty("ssplit.boundaryMultiTokenRegex", "/''|``/");
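To see what the last pattern does, here is a toy simulation in plain Python. It is not CoreNLP, only a sketch that assumes the tokens have already been PTB-normalized (so a double quote appears as `` or '') and that a boundary token stays at the end of the sentence it closes, the way a period does. The name split_at_boundaries is hypothetical.

```python
import re
from typing import List

# Same alternation as the boundaryMultiTokenRegex value above, without the
# TokensRegex /.../ delimiters; fullmatch means the whole token must match.
QUOTE_TOKEN = re.compile(r"''|``")


def split_at_boundaries(tokens: List[str]) -> List[List[str]]:
    """Toy sentence splitter: end a sentence after every quote token."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if QUOTE_TOKEN.fullmatch(tok):
            # Assumption: the boundary token stays in the sentence it closes.
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


print(split_at_boundaries(["He", "said", "``", "hi", "''", "and", "left", "."]))
```

Both the opening `` and the closing '' end a sentence here, which is why the alternation is needed when you want to split at either quote.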

How can I access value in sequence type?

There are the following attributes in client_output
weights_delta = attr.ib()
client_weight = attr.ib()
model_output = attr.ib()
client_loss = attr.ib()
After that, I turned client_output into a sequence with a = tff.federated_collect(client_output) and then computed round_model_delta = tff.federated_map(selecting_fn, a), where selecting_fn is declared as:
@tff.tf_computation()  # append
def selecting_fn(a):
    # TODO
    return round_model_delta
In the process of averaging on the server, I want to average weights_delta over only some of the clients, namely those with a small loss value. So I tried to access it via a.weights_delta, but it doesn't work.
tff.federated_collect returns a tff.SequenceType placed at tff.SERVER, which you can manipulate the same way a client dataset is usually handled in a method decorated with tff.tf_computation.
Note that you have to use the tff.federated_collect operator in the scope of a tff.federated_computation. What you probably want to do[*] is pass it into a tff.tf_computation, using the tff.federated_map operator. Once inside the tff.tf_computation, you can think of it as a tf.data.Dataset object and everything in the tf.data module is available.
[*] I am guessing. More detailed explanation of what you would like to achieve would be helpful.
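As a rough illustration of the selection logic only, here is a plain-Python sketch with no TensorFlow or TFF involved. It assumes weights_delta is a flat list of floats, that "small loss" means client_loss at or below a threshold max_loss, and that the average is weighted by client_weight as in federated averaging; max_loss and the ClientOutput dataclass are hypothetical stand-ins for the attr.ib() structure. Inside a real tff.tf_computation you would express the same steps with TensorFlow ops on the sequence (for example tf.data.Dataset.filter followed by Dataset.reduce) rather than with Python lists.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class ClientOutput:
    # Hypothetical stand-in for the attr.ib() structure in the question;
    # weights_delta is simplified to a flat list of floats.
    weights_delta: List[float]
    client_weight: float
    model_output: Any
    client_loss: float


def selecting_fn(outputs: Sequence[ClientOutput], max_loss: float) -> List[float]:
    """Weighted average of weights_delta over clients whose loss is <= max_loss."""
    selected = [o for o in outputs if o.client_loss <= max_loss]
    if not selected:
        raise ValueError("no client has a loss at or below max_loss")
    total_weight = sum(o.client_weight for o in selected)
    avg = [0.0] * len(selected[0].weights_delta)
    for o in selected:
        for i, delta in enumerate(o.weights_delta):
            # Each selected client contributes in proportion to its client_weight.
            avg[i] += o.client_weight * delta / total_weight
    return avg
```

A threshold was chosen over "the k smallest losses" because a filter-then-reduce maps directly onto a single pass over a sequence, whereas top-k needs sorting.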

Extract IP Addresses from JSON file using VB.net

Is there a way to extract only IPv4 addresses from a JSON file using VB.NET?
For example, when I open a JSON file from VB, I would like to filter only the IPv4 addresses out of text such as this: https://pastebin.com/raw/S7Vnnxqa
and I expect results like this: https://pastebin.com/raw/8L8Ckrwi. I found this website that offers a tool to do it, https://www.toolsvoid.com/extract-ip-addresses/; I'm posting the link so you can see what I mean, but I don't want to use an external tool. I want to do the conversion directly in VB. Thanks in advance for your help.
Your "text" is JSON. Load it using the JSON parser of your choice (google VB.NET parse JSON), loop over the matches array and read the IP address from the http.host property of each element.
Here is an example of how to do it using the Newtonsoft.Json package (see it working here on DotNetFiddle):
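' Requires the Newtonsoft.Json package: Imports Newtonsoft.Json.Linq
' Assume that the variable myJsonString contains the original string
Dim myJObject = JObject.Parse(myJsonString)
' Each element of "matches" has an "http" object whose "host" property holds the IP
For Each match In myJObject("matches")
    Console.WriteLine(match("http")("host"))
Next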
Output:
62.176.84.198
197.214.169.59
46.234.76.75
122.136.141.67
219.73.94.83
2402:800:621b:33f1:d1e3:5544:4fcf:526e
178.136.75.125
188.167.212.252
...
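If you want to extract only IPv4 and not IPv6 addresses, you can use a regular expression to check whether each host matches:
' Requires: Imports System.Text.RegularExpressions
Dim IPV4Regex = New Regex("^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$")
For Each match In myJObject("matches")
    Dim ip = match("http")("host").ToString()
    ' Keep only hosts that are a complete dotted-quad IPv4 address
    If IPV4Regex.IsMatch(ip) Then
        Console.WriteLine(ip)
    End If
Next
Output: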
62.176.84.198
197.214.169.59
46.234.76.75
122.136.141.67
219.73.94.83
178.136.75.125
188.167.212.252
...
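For comparison, the same structured approach can be sketched in Python with only the standard library: json for parsing, and the ipaddress module to decide whether a host is an IPv4 literal instead of a hand-written regex. It assumes the same layout the VB code relies on (a top-level "matches" array whose elements carry http.host); extract_ipv4_hosts is a hypothetical name.

```python
import ipaddress
import json
from typing import List


def extract_ipv4_hosts(json_text: str) -> List[str]:
    """Return every matches[*].http.host value that is an IPv4 address."""
    data = json.loads(json_text)
    result: List[str] = []
    for match in data.get("matches", []):
        host = (match.get("http") or {}).get("host")
        if not isinstance(host, str):
            continue  # element without an http.host string
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            continue  # a hostname or other non-IP text
        if addr.version == 4:  # drop IPv6 hosts such as 2402:800:...
            result.append(host)
    return result
```

Letting a real address parser do the validation avoids maintaining the octet-range regex by hand.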
Of course, it's always recommended to parse the input data in a structured way, to avoid surprises such as false positives. But if you just want to match anything that looks like an IP address, regardless of the input format (even if you just type hello 1.2.3.4 world in the textbox), then you could use only the regular expression and skip the structured approach (see it working here on DotNetFiddle):
Dim IPV4RegexWithWordBoundary = New Regex("\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b")
Dim match = IPV4RegexWithWordBoundary.Match(myJsonString)
Do While match.Success
Console.WriteLine(match.Value)
match = match.NextMatch()
Loop
Here I modified the regular expression to use \b...\b instead of ^...$, so that it matches at word boundaries instead of at the start/end of the string. Note, however, that now we get each IP address twice with the input that you provided, because the addresses appear more than once in it:
62.176.84.198
62.176.84.198
197.214.169.59
197.214.169.59
46.234.76.75
46.234.76.75
...
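If the duplicates are unwanted, one option is to keep only the first occurrence of each address. Here is a Python sketch of that idea using the same word-boundary pattern; unique_ipv4_matches is a hypothetical helper, and in VB.NET the equivalent would be to track the values already printed in a HashSet(Of String).

```python
import re
from typing import List

# Same pattern as IPV4RegexWithWordBoundary in the VB.NET code.
IPV4_WORD = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
)


def unique_ipv4_matches(text: str) -> List[str]:
    """All IPv4-looking substrings, each reported once, in first-seen order."""
    seen = set()
    result: List[str] = []
    for m in IPV4_WORD.finditer(text):
        ip = m.group(0)
        if ip not in seen:
            seen.add(ip)
            result.append(ip)
    return result
```

Because of the \b anchors, an address glued to letters (as in hello1.2.3.4world) is not matched; it needs a non-word character such as a space or quote on each side.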

scilab : index in variable name loop

I would like to read some images with Scilab, and I use the function imread like this:
im01=imread('kodim01t.jpg');
im02=imread('kodim02t.jpg');
im03=imread('kodim03t.jpg');
im04=imread('kodim04t.jpg');
im05=imread('kodim05t.jpg');
im06=imread('kodim06t.jpg');
im07=imread('kodim07t.jpg');
im08=imread('kodim08t.jpg');
im09=imread('kodim09t.jpg');
im10=imread('kodim10t.jpg');
I would like to know if there is a way to do something like the following in order to optimize the code:
for i = 1:5
im&i=imread('kodim0&i.jpg');
end
Thanks in advance.
I see two possible solutions: using execstr, or using some kind of list/matrix.
Execstr
First create a string containing the command to execute with msprintf, then execute it with execstr. Note that in the msprintf conversion, the %02d format specifier (described here) inserts the right number of leading zeros.
for i = 1:10
    // Builds e.g. "im01=imread('kodim01t.jpg');" (quotes inside a Scilab string are doubled)
    cmd = msprintf('im%02d=imread(''kodim%02dt.jpg'');', i, i);
    execstr(cmd);
end
List/Matrix
This is probably the more intuitive option, using an indexable container such as a list.
// This list could be generated using msprintf from the example above
file_names_list = list("kodim01t.jpg", "kodim02t.jpg", "kodim03t.jpg");
// Create an empty list to contain the images
opened_images = list();
for i = 1:length(file_names_list)
    // Open the image and insert it at the end of the list (Scilab indexes with parentheses)
    opened_images($+1) = imread(file_names_list(i));
end
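The container approach is not specific to Scilab. As a sketch of the same idea in Python, the zero-padded names come from a format specifier (playing the role of %02d in msprintf), and the loaded images go into a single dictionary instead of ten separate variables. load_image is a hypothetical placeholder for whatever actually reads the file.

```python
from typing import Callable, Dict, List


def kodim_file_names(count: int) -> List[str]:
    # "{:02d}" pads to two digits, like %02d: kodim01t.jpg ... kodim10t.jpg
    return [f"kodim{i:02d}t.jpg" for i in range(1, count + 1)]


def load_all(names: List[str], load_image: Callable[[str], object]) -> Dict[str, object]:
    # One container keyed by file name replaces im01, im02, ... variables.
    return {name: load_image(name) for name in names}
```

Looking an image up by name (or by position in the list) then replaces building variable names at run time.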

Lucene.net PerFieldAnalyzerWrapper

I've read up on how to use the per-field analyzer wrapper, but can't get it to work with a custom analyzer of mine. I can't even get the analyzer's constructor to run, which makes me believe I'm actually calling the per-field analyzer incorrectly.
Here's what I'm doing:
Create the per field analyzer:
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("<special field>", dta);
Add all the fields to the document as usual, including a special field that we analyze differently.
And add document using the analyzer like this:
iw.AddDocument(doc, perFieldAnalyzer);
Am I on the right track?
The problem was related to my reliance on the built-in Lucene helper classes of our CMS (Kentico). Basically, with those classes you need to specify the custom analyzer at the index level through the CMS, and I did not wish to do that. So I ended up using Lucene.net directly almost everywhere, gaining the flexibility to use any custom analyzer I want.
I also made some changes to how I structure the data and ended up using the tried-and-true KeywordAnalyzer to analyze document tags. Previously I was trying to do some custom tokenization magic on comma-separated values like [tag1, tag2, tag with many parts] and could not get it working reliably with multi-part tags. I still kept that field, but started adding multiple "tag" fields to the document, each storing one tag. So now I have N "tag" fields for N tags, each analyzed as a keyword, meaning each tag (one word or many) is a single token.
I think I overthought my initial approach.
Here is what I ended up with.
On Indexing:
KeywordAnalyzer ka = new KeywordAnalyzer();
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("documenttags_t", ka);
// Some procedure to compile all documents by reading from the DB and putting them into Lucene docs
foreach(var doc in docs)
{
iw.AddDocument(doc, perFieldAnalyzer);
}
On Searching:
KeywordAnalyzer ka = new KeywordAnalyzer();
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("documenttags_t", ka);
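string baseQuery = "documenttags_t:\"" + tagName + "\"";
// _parser is assumed to be a QueryParser built with perFieldAnalyzer, so the quoted tag stays one token
Query query = _parser.Parse(baseQuery);
var results = _searcher.Search(query, sortBy);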
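To make the per-field idea concrete, here is a toy model in Python. It is not Lucene.net's API, just a sketch of the dispatch PerFieldAnalyzerWrapper performs (use the analyzer registered for a field, otherwise the default) and of why a keyword analyzer keeps a multi-word tag as one token. PerFieldAnalyzer and simple_analyzer are hypothetical; simple_analyzer is only a rough stand-in for whatever srchInfo.GetAnalyzer(true) returns.

```python
import re
from typing import Callable, Dict, List

Analyzer = Callable[[str], List[str]]


def simple_analyzer(text: str) -> List[str]:
    # Rough stand-in for a default analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_analyzer(text: str) -> List[str]:
    # Like KeywordAnalyzer: the whole field value becomes a single token.
    return [text]


class PerFieldAnalyzer:
    """Toy version of the PerFieldAnalyzerWrapper idea (not the Lucene.net API)."""

    def __init__(self, default: Analyzer) -> None:
        self._default = default
        self._per_field: Dict[str, Analyzer] = {}

    def add_analyzer(self, field: str, analyzer: Analyzer) -> None:
        self._per_field[field] = analyzer

    def analyze(self, field: str, text: str) -> List[str]:
        # Fields without a registered analyzer fall back to the default.
        return self._per_field.get(field, self._default)(text)
```

With one documenttags_t field per tag, "tag with many parts" is indexed as a single token, so the quoted query has to be analyzed the same way to match it; that is why the wrapper is used on both the indexing and the searching side.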

RegEx help - finding / returning a code

I must admit it's been a few years since my RegEx class and since then, I have done little with them. So I turn to the brain power of SO. . .
I have an Excel spreadsheet (2007) with some data. I want to search one of the columns for a pattern (here's the RegEx part). When I find a match I want to copy a portion of the found match to another column in the same row.
A sample of the source data is included below. Each line represents a cell in the source.
I'm looking for a regex that matches "abms feature = XXX", where XXX is a variable-length word with no spaces in it (I think it's all alphabetic characters). Once I find a match, I want to discard the "abms feature = " portion of the match and place the code (the XXX part) into another column.
I can handle the excel coding part. I just need help with the regex.
If you can provide a solution to do this entirely within Excel - no coding required, just using native excel formula and commands - I would like to hear that, too.
Thanks!
###################################
Structure
abms feature = rl
abms feature = sta
abms feature = pc, pcc, pi, poc, pot, psc, pst, pt, radp
font = 5 abms feature = equl, equr
abms feature = bl
abms feature = tl
abms feature = prl
font = 5
###################################
I am still learning about regex myself, but I have found this site useful for getting ideas or comparing against what I came up with; it might help you in the future:
http://regexlib.com/
Try this regular expression:
abms feature = (\w+)
Here is an example of how to extract the value from the capture group:
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        // @"..." is a C# verbatim string, so \w reaches the regex engine unescaped
        Regex regex = new Regex(@"abms feature = (\w+)",
            RegexOptions.Compiled |
            RegexOptions.CultureInvariant |
            RegexOptions.IgnoreCase);
        Match match = regex.Match("abms feature = XXX");
        if (match.Success)
        {
            // Groups[1] is the first capture group: the code after "abms feature = "
            Console.WriteLine(match.Groups[1].Value);
        }
    }
}
(?<=^abms feature = )[a-zA-Z]*
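This assumes you're not doing anything with the words after the commas. Because of the ^ anchor, it also only matches cells that start with "abms feature", so a cell such as "font = 5 abms feature = equl, equr" is skipped.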
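Here is a Python sketch that runs the capture-group idea over cells like the sample data. first_code uses the same pattern as the C# answer, so it also finds the code in "font = 5 abms feature = equl, equr"; all_codes is a hypothetical extension for when you do want the comma-separated codes, assuming they are word characters separated by commas.

```python
import re
from typing import List, Optional

# Capture-group pattern from the C# answer above.
ABMS = re.compile(r"abms feature = (\w+)", re.IGNORECASE)
# Hypothetical extension: a word followed by any ", word" continuations.
ABMS_LIST = re.compile(r"abms feature = (\w+(?:\s*,\s*\w+)*)", re.IGNORECASE)


def first_code(cell: str) -> Optional[str]:
    """The first code after 'abms feature = ', or None if the cell has none."""
    m = ABMS.search(cell)
    return m.group(1) if m else None


def all_codes(cell: str) -> List[str]:
    """Every comma-separated code after 'abms feature = '."""
    m = ABMS_LIST.search(cell)
    return [code.strip() for code in m.group(1).split(",")] if m else []
```

Cells without the marker, such as "font = 5", yield None or an empty list, which makes it easy to leave the target column blank.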