What is the significance of the bracketed answers versus the typed answers when setting up a keystore? Please see the example below, where some of the bracketed values differ from the entered responses. Does this happen if the end user has forgotten the keystore password? I saw this in some notes from a previous version's installation and was confused about why there is a difference. This happened when we set up a keystore. I assume that there was an old keystore whose password was forgotten.
Does this mean [old response]: new response?
What is your first and last name?
i. [Security Department]: Security Department
e. What is the name of your organizational unit?
i. [Security Department]: Security Department
f. What is the name of your organization?
i. [XYZ]: ABC Corporation
g. What is the name of your City or Locality?
i. [Columbus]: Pittsburgh
h. What is the name of your State or Province?
i. [PA]: PA
i. What is the two-letter country code for this unit?
i. [US]: US
j. Is CN=Security Department, OU=Physical Security, O=ABC Corporation, L=Pittsburgh, ST=PA, C=US correct?
There is a difference in responses.
I just want to extract all instances of a sentence that:
starts with a title (i.e. Mr, Miss, Ms or Dr)
contains the word "asked"
ends with .
I tried the regex below but got back an empty list. Thank you.
import re
text_list="26 Mr Kwek Hian Chuan Henry asked the Minister for the Environment and Water Resources whether Singapore will stay the course on fighting climate change and meet our climate change commitments despite the current upheavals in the energy market and the potential long-term economic impact arising from the COVID-19 situation. We agree with the Panel and will instead strengthen regulations to safeguard the safety of path users. With regard to Ms Rahayu Mahzam's suggestion of tapping on the Small Claims Tribunal for personal injury claims up to $20,000, we understand that the Tribunal does not hear personal injury claims. Mr Gan Thiam Poh, Ms Rahayu Mahzam and Mr Melvin Yong have asked about online retailers of PMDs. Mr Melvin Yong asked about the qualifications and training of OEOs."
asked_regex=re.compile(r'^(Mr|Miss|Ms|Dr)(.|\n){1,}(asked)(.|\n){1,}\.$')
asked=re.findall(asked_regex, text_list)
Desired Output:
["Mr Kwek Hian Chuan Henry asked the Minister for the Environment and Water Resources whether Singapore will stay the course on fighting climate change and meet our climate change commitments despite the current upheavals in the energy market and the potential long-term economic impact arising from the COVID-19 situation. ",
"Mr Gan Thiam Poh, Ms Rahayu Mahzam and Mr Melvin Yong have asked about online retailers of PMDs.",
"Mr Melvin Yong asked about the qualifications and training of OEOs."]
try this regex pattern:
import re
text_list="26 Mr Kwek Hian Chuan Henry asked the Minister for the Environment and Water Resources whether Singapore will stay the course on fighting climate change and meet our climate change commitments despite the current upheavals in the energy market and the potential long-term economic impact arising from the COVID-19 situation. We agree with the Panel and will instead strengthen regulations to safeguard the safety of path users. With regard to Ms Rahayu Mahzam's suggestion of tapping on the Small Claims Tribunal for personal injury claims up to $20,000, we understand that the Tribunal does not hear personal injury claims. Mr Gan Thiam Poh, Ms Rahayu Mahzam and Mr Melvin Yong have asked about online retailers of PMDs. Mr Melvin Yong asked about the qualifications and training of OEOs."
asked_regex=re.compile(r'(Mr|Miss|Ms|Dr)[^\.]*asked[^\.]*\.')
asked=re.findall(asked_regex, text_list)
(Mr|Miss|Ms|Dr)
This matches a sentence that starts with Mr, Miss, Ms or Dr anywhere in the text (your pattern, anchored with ^, would only match at the start of the string).
[^\.]*asked[^\.]*
This part accepts any text that contains the word asked, with no full stop (.) before or after it.
\.
This checks that the sentence ends with a full stop (.).
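The regex itself is right; it doesn't work with findall because the pattern contains a capturing group, and when a pattern has groups re.findall returns the groups' contents (here just the title) rather than the whole match. Making the group non-capturing, (?:Mr|Miss|Ms|Dr), or using re.finditer with match.group() gives the full sentences.
Here is the code that regex101.com generated based on the pattern, and it works.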
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"(Mr|Miss|Ms|Dr)[^\.]*asked[^\.]*\."
test_str = "26 Mr Kwek Hian Chuan Henry asked the Minister for the Environment and Water Resources whether Singapore will stay the course on fighting climate change and meet our climate change commitments despite the current upheavals in the energy market and the potential long-term economic impact arising from the COVID-19 situation. We agree with the Panel and will instead strengthen regulations to safeguard the safety of path users. With regard to Ms Rahayu Mahzam's suggestion of tapping on the Small Claims Tribunal for personal injury claims up to $20,000, we understand that the Tribunal does not hear personal injury claims. Mr Gan Thiam Poh, Ms Rahayu Mahzam and Mr Melvin Yong have asked about online retailers of PMDs. Mr Melvin Yong asked about the qualifications and training of OEOs."
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
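The difference between findall and finditer here comes down to capture groups. As a minimal sketch (the shortened sample text is made up for illustration and is not the original input), compare the same pattern with a capturing and a non-capturing group:

```python
import re

# Shortened, illustrative stand-in for the original text.
text = (
    "26 Mr Kwek asked the Minister whether Singapore will stay the course. "
    "We agree with the Panel. "
    "With regard to Ms Rahayu Mahzam's suggestion, the Tribunal does not hear such claims. "
    "Mr Gan Thiam Poh and Mr Melvin Yong have asked about online retailers of PMDs. "
    "Mr Melvin Yong asked about the training of OEOs."
)

# Capturing group: findall returns only what the group matched (the title).
titles_only = re.findall(r"(Mr|Miss|Ms|Dr)[^.]*asked[^.]*\.", text)

# Non-capturing group: findall returns the whole match (the sentence).
sentences = re.findall(r"(?:Mr|Miss|Ms|Dr)[^.]*asked[^.]*\.", text)

print(titles_only)
for sentence in sentences:
    print(sentence)
```

Note that the pattern is case-sensitive and has no word boundaries, so a word such as "Drive" would also count as a title; adding \b around the alternation is one way to tighten it.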
I am using spaCy to do POS tagging and lemmatization. I believe the best practice is to disable unneeded components to maximize performance. However, having disabled several components, it now seems that every token's POS is noun!
It seems the tok2vec component is required for POS tagging. Is that correct, and if so, is this explained anywhere?
Additionally, is there a better way to optimize spaCy pipelines besides removing components?
import spacy
txt = '''ex-4.1 2 d879007dex41.htm ex-4.1 ex-4.1 exhibit 4.1 amendment no. 6 to note amendment no. 6 to note (this " amendment "), dated and effective as of january 30, 2020, is made by and between the u.s. small business administration (" sba "), an agency of the united states, and its successors and assigns, and freshstart venture capital corporation (the " licensee "), a small business investment borrower, licensed under the small business investment act of 1958, as amended, whose principal office is located at 437 madison avenue, new york, ny 10022. recitals whereas , the licensee issued that certain note, effective as of march 1, 2017 in the principal amount of $34,024,755.58 (thirty-four million twenty-four thousand seven hundred fifty-five and 58/100 dollars) in favor of sba (the " existing note "). whereas , sba and the licensee have agreed, subject to the terms and conditions of this amendment, that the existing note be amended to reflect certain agreed upon revisions to the terms of the existing note. now therefore, sba and the licensee hereby agree, in consideration of the mutual premises and mutual obligations set forth herein, that the existing note is hereby amended as follows: section 1. defined terms . except as otherwise indicated herein, all words and terms defined in the existing note shall have the same meanings when used herein. section 2. amendments . a. in the last sentence of the second paragraph of the existing note the phrase, "february 1, 2020" is hereby deleted in its entirety and replaced with the following: "april 1, 2020" b. in the third paragraph of the existing note the phrase, "february 1, 2020" is hereby deleted in its entirety and replaced with the following: "april 1, 2020" section 3. representations and warranties . 
each party hereby represents and warrants to the other party that it is in compliance with all the terms and provisions set forth in the existing note on its part to be observed or performed and hereby confirms and reaffirms each of its representations and warranties contained in the existing note. section 4. limited effect . except as expressly amended and modified by this amendment, the existing note shall continue to be, and shall remain, in full force and effect in accordance with its terms (and as duly amended). 1 section 5. counterparts . this amendment may be executed by each of the parties hereto on any number of separate counterparts, each of which shall be an original and all of which taken together shall constitute one and the same instrument. delivery of an executed signature page of this amendment in portable document format (pdf) or by facsimile transmission shall be effective as delivery of an executed original counterpart of this amendment. section 6. governing law . pursuant to section 101.106(b) of part 13 of the code of federal regulations, this amendment is to be construed and enforced in accordance with the act, the regulations and other federal law, and in the absence of applicable federal law, then by applicable new york law to the extent it does not conflict with the act, the regulations or other federal law. [signatures appear on next page] 2 in witness whereof, the parties have caused this amendment to be executed by their respective officers thereunto duly authorized, as of the date first above written. freshstart venture capital corporation by: /s/ thomas j. munson name: thomas j. munson title: svp u.s. small business administration by: /s/ thomas g. morris name: thomas g. morris title: director, o/l & acting deputy a/a oii 3'''
nlp = spacy.load('en_core_web_sm')
nlp.disable_pipe("parser")
nlp.disable_pipe("tok2vec") # it seems this is needed in fact?
nlp.disable_pipe("ner")
nlp.enable_pipe("senter")
nlp.max_length = 5000000
doc = nlp(txt)
print(nlp.pipe_names)
for token in doc:
    print(token.text, token.pos_, token.lemma_)
NER is not required for POS tagging. Assuming you are actually using the above code, disabling tok2vec is the issue, as that component is required for POS tagging.
For advice on making spaCy faster, please see the spaCy speed FAQ. Besides disabling components you aren't using, another thing you can do is use nlp.pipe to batch requests.
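As a rough sketch of both suggestions, assuming spaCy v3 and the component names that en_core_web_sm ships with (tok2vec, tagger, attribute_ruler, lemmatizer, parser, ner, plus a disabled senter), one could drop only the parser and NER and process texts in batches. The helper names load_tagging_pipeline and tag_texts are hypothetical, not part of spaCy:

```python
def load_tagging_pipeline(model_name="en_core_web_sm"):
    """Load a pipeline without parser and NER, keeping what POS tagging needs."""
    import spacy  # imported lazily so tag_texts can be used with any pipeline object

    # tok2vec feeds the tagger, so it stays; only parser and ner are left out.
    nlp = spacy.load(model_name, exclude=["parser", "ner"])
    # With the parser gone, senter can supply sentence boundaries (as in the question).
    nlp.enable_pipe("senter")
    return nlp


def tag_texts(nlp, texts, batch_size=50):
    """Yield one list of (text, pos, lemma) tuples per document, processed in batches."""
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield [(token.text, token.pos_, token.lemma_) for token in doc]
```

Calling tag_texts(load_tagging_pipeline(), list_of_texts) would then run the reduced pipeline over many documents at once; the batch_size of 50 is an arbitrary starting value, not a recommendation from the spaCy docs.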
I am working for a big manufacturer that is a supplier to Amazon. We are currently in testing mode with them for EDI. We are using AS2 and the EDIFACT standard, as required by Amazon. Regarding INVOIC messages, Amazon insists on a specific payer address in the NAD IV segment: the company name of Amazon Germany, which is about 41 characters long. We have the exact payer address stored in SAP, but once we make the EDI transfer, the payer name is cut to 35 characters.
What we can transmit:
NAD+IV+5450534005838::9++AMAZON EU SARL:NIEDERLASSUNG DEUTSC+Marcel-Breuer-Str. 12+MUENCHEN++80807+DE'
What AMAZON expects:
NAD+IV+5450534005838::9++AMAZON EU SARL:NIEDERLASSUNG DEUTSCHLAND+MARCEL-BREUER-STR. 12+MUENCHEN++80807+DE
As a result, Amazon rejects our invoices after transmission as long as there is no exact match. It is insane, as Amazon itself provides documentation where the field limit is stated.
However, we do not get a qualified response through their Vendor Central. (Everyone working with Amazon knows what I mean.)
Does anybody have experience with EDI setup with Amazon, their requirements and this specific field limitation?
We have tried to use an abbreviation of the company name, but this is not accepted. The billing address cannot be changed.
Changing the field length in code is not possible at the moment.
The NAD segment has several name and address fields in composite C080 (five of them in release D96A, in fact). You can store the required name across those fields, not just in the first one. The colon in your message example is not part of the name; it is a separator for fields in a composite and is part of the EDIFACT syntax. The plus sign separates fields and composites, the colon separates fields within a composite.
Dissecting the expected NAD segment, it looks like this:
NAD                        Segment name
IV                         Field 3035, Party Qualifier
5450534005838              Composite C082, Field 3039, Party Identification
(empty)                    Composite C058
AMAZON EU SARL             Composite C080, Field 3036 (first occurrence), Party Name
NIEDERLASSUNG DEUTSCHLAND  Composite C080, Field 3036 (second occurrence), Party Name
MARCEL-BREUER-STR. 12      Composite C059, Field 3042, Street
MUENCHEN                   Field 3164, City Name
(empty)                    Field 3229
80807                      Field 3251, Postcode
DE                         Field 3207, Country coded
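The same dissection can be sketched in a few lines of Python. This is only an illustration: the helper parse_segment is hypothetical, it assumes the default separators (+ and :) and the ' terminator, and it ignores the ? release character that a real EDIFACT parser must handle.

```python
MAX_NAME_LEN = 35  # per-field limit for party name (3036), as discussed above


def parse_segment(segment):
    """Split a segment into data elements, each a list of component values."""
    body = segment.strip().rstrip("'")
    return [element.split(":") for element in body.split("+")]


expected = (
    "NAD+IV+5450534005838::9++AMAZON EU SARL:NIEDERLASSUNG DEUTSCHLAND"
    "+MARCEL-BREUER-STR. 12+MUENCHEN++80807+DE'"
)

elements = parse_segment(expected)
party_names = elements[4]  # composite C080: one entry per 3036 occurrence

for name in party_names:
    print(len(name), name, "OK" if len(name) <= MAX_NAME_LEN else "TOO LONG")
```

Each occurrence of the name stays within the 35-character limit, which is why the full company name fits once it is spread over two fields of the composite.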
I personally use the EDIFACT directories from Truugo to check the message definitions:
the NAD segment: https://www.truugo.com/edifact/d96a/nad/
the INVOIC message https://www.truugo.com/edifact/d96a/invoic/
I have a set of queries and I am trying to get web_urls using the NYT article search API. But I am seeing that it works for q2 below but not for q1.
q1: Seattle+Jacob Vigdor+the University of Washington
q2: Seattle+Jacob Vigdor+University of Washington
If you paste the url below with your API key in the web browser, you get an empty result.
Search request for q1
api.nytimes.com/svc/search/v2/articlesearch.json?q=Seattle+Jacob%20Vigdor+the%20University%20of%20Washington&begin_date=20170626&api-key=XXXX
Empty results for q1
{"response":{"meta":{"hits":0,"time":27,"offset":0},"docs":[]},"status":"OK","copyright":"Copyright (c) 2013 The New York Times Company. All Rights Reserved."}
Instead if you paste the following in your web browser (without the article 'the' in the query) you get non-empty results
Search request for q2
api.nytimes.com/svc/search/v2/articlesearch.json?q=Seattle+Jacob%20Vigdor+University%20of%20Washington&begin_date=20170626&api-key=XXXX
Non-empty results for q2
{"response":{"meta":{"hits":1,"time":22,"offset":0},"docs":[{"web_url":"https://www.nytimes.com/aponline/2017/06/26/us/ap-us-seattle-minimum-wage.html","snippet":"Seattle's $15-an-hour minimum wage law has cost the city jobs, according to a study released Monday that contradicted another new study published last week....","lead_paragraph":"Seattle's $15-an-hour minimum wage law has cost the city jobs, according to a study released Monday that contradicted another new study published last week.","abstract":null,"print_page":null,"blog":[],"source":"AP","multimedia":[],"headline":{"main":"New Study of Seattle's $15 Minimum Wage Says It Costs Jobs","print_headline":"New Study of Seattle's $15 Minimum Wage Says It Costs Jobs"},"keywords":[],"pub_date":"2017-06-26T15:16:28+0000","document_type":"article","news_desk":"None","section_name":"U.S.","subsection_name":null,"byline":{"person":[],"original":"By THE ASSOCIATED PRESS","organization":"THE ASSOCIATED PRESS"},"type_of_material":"News","_id":"5951255195d0e02550996fb3","word_count":643,"slideshow_credits":null}]},"status":"OK","copyright":"Copyright (c) 2013 The New York Times Company. All Rights Reserved."}
Interestingly, both queries work fine on the api test page
http://developer.nytimes.com/article_search_v2.json#/Console/
Also, if you look at the article below returned by q2, you see that the query term in q1, 'the University of Washington' does occur in it and it should have returned this article.
https://www.nytimes.com/aponline/2017/06/26/us/ap-us-seattle-minimum-wage.html
I am confused about this behaviour of the API. Any ideas what's going on? Am I missing something?
Thank you for all the answers. Below I am pasting the answer I received from NYT developers.
NYT's Article Search API uses Elasticsearch. There are lots of docs online about the query syntax of Elasticsearch (it is based on Lucene).
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax
If you want articles that contain "Seattle", "Jacob Vigdor" and "University of Washington", do
"Seattle" AND "Jacob Vigdor" AND "University of Washington"
or
+"Seattle" +"Jacob Vigdor" +"University of Washington"
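A minimal sketch of building such a request in Python, using the AND form above. The https:// scheme, the helper name build_search_url and the XXXX key are placeholders or assumptions; only the endpoint path and parameter names are taken from the question.

```python
from urllib.parse import urlencode

# Endpoint path from the question; the https:// scheme is an assumption.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(phrases, begin_date, api_key):
    """Join exact phrases with AND and return a fully encoded request URL."""
    query = " AND ".join(f'"{phrase}"' for phrase in phrases)
    params = {"q": query, "begin_date": begin_date, "api-key": api_key}
    # urlencode percent-encodes the quotes and turns spaces into +.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url(
    ["Seattle", "Jacob Vigdor", "University of Washington"], "20170626", "XXXX"
)
print(url)
```

Letting urlencode do the escaping avoids hand-mixing + and %20 in the query string.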
I think you need to encode the literal + signs as %2B and the spaces as + rather than %20:
In your example,
q=Seattle+Jacob%20Vigdor+the%20University%20of%20Washington
When I submit from the page on the site, it uses %2B:
q=Seattle%2BJacob+Vigdor%2Bthe+University+of+Washington
How are you URL encoding? One way to fix it would be to replace your spaces with + before URL encoding.
Also, you may need to replace %20 with +. There are various schemes for URL encoding, so the best way would depend on how you are doing it.
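The two encodings can be reproduced with Python's standard library. This sketch only shows how standard form decoding treats each variant; how the NYT server itself interprets them is not something it can confirm.

```python
from urllib.parse import parse_qs, quote, quote_plus

raw_query = "Seattle+Jacob Vigdor+the University of Washington"

# Spaces become %20 and the literal + is left alone (as in the failing request).
as_percent20 = quote(raw_query, safe="+")

# Spaces become + and the literal + becomes %2B (as sent by the test page).
as_form = quote_plus(raw_query)

# Form decoding turns an unescaped + into a space, so the operator is lost...
decoded_percent20 = parse_qs("q=" + as_percent20)["q"][0]
# ...while %2B survives decoding as a literal +.
decoded_form = parse_qs("q=" + as_form)["q"][0]

print(as_percent20)
print(as_form)
print(decoded_percent20)
print(decoded_form)
```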
I'm trying to understand in what order, and with what precedence, the geocoding API processes the pieces of the address passed to it.
I have this example of why I'm asking the question. The correct address is:
2608 N Ocean Bv
Myrtle Beach, SC 29577
Running that into the API, absolutely no problems:
http://maps.googleapis.com/maps/api/geocode/json?address=2608+n+ocean+bv+myrtle+beach+sc+29577&sensor=false
However, take this typoed version of the address:
2608 N Ocean Bv
Mrytle Beach, NC 29577
The city is spelled wrong, and it has the wrong state. Street number, name and zip code are correct. Mrytle Beach does not exist anywhere, and not in NC.
http://maps.googleapis.com/maps/api/geocode/json?address=2608+n+ocean+bv+mrytle+beach+nc+29577&sensor=false
Google comes back with:
2608 N Ocean Bv
North Myrtle Beach, SC 29582
Now, that is a valid address. But why did Google decide that was the address I was looking for?
If you remove the incorrect state, and don't replace it with anything:
http://maps.googleapis.com/maps/api/geocode/json?address=2608+n+ocean+bv+mrytle+beach+29577&sensor=false
Google returns a corrected version of the correct address. So it seems that state trumps zip code - however, North Myrtle Beach does not exist in NC.
I'm thinking that omitting city and state eliminates most of this issue - but I'd like to understand why - if possible. Thanks.
Edit:
After some further playing around - it seems that Google looks for a city match as highest priority, then state - ignore all else. In this case:
Can't find a city called "Mrytle Beach" anywhere in the world.
Let's start in NC then and find the closest match to the street address if there is one.
Ah, here is the closest one to NC - in North Myrtle Beach.
If you change the state in my example above from NC to FL, the more southern Myrtle Beach match is closer to Florida than the more northern North Myrtle Beach address, and that is what Google returns.
I'm trying to understand the reasoning behind this. It seems that this sort of logic would be near last resort - or at least after making use of the zip code passed - which it appears it doesn't use at all.