In `readr::read_csv()`, when parsing German text, `Ü` becomes `¨1`. How can I solve it?


When parsing German text with readr::read_csv(), Ü becomes ¨1. How can I solve it? Thanks!
readr::read_csv(I("type\nBlitzangebotsgebühr\nÜbertrag"),locale = locale(encoding='ISO-8859-1'))

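Using UTF-8 fixes the problem. The string is already UTF-8, and declaring ISO-8859-1 makes readr read each UTF-8 byte as a separate Latin-1 character. (UTF-8 is also readr's default encoding, so leaving out locale works too.)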
library(readr)
read_csv(I("type\nBlitzangebotsgebühr\nÜbertrag"), locale = locale(encoding='utf-8'))
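The mismatch can be reproduced outside R. This is a small Python sketch, not readr itself. It encodes the words as UTF-8 and then decodes the bytes as ISO-8859-1 (Latin-1), which is roughly what declaring the wrong encoding does. The exact characters a console shows (such as `¨1`) may differ, because some of the resulting code points are invisible control characters.

```python
def as_if_latin1(text: str) -> str:
    # Store the text as UTF-8 bytes, then wrongly decode every single byte
    # as one ISO-8859-1 (Latin-1) character
    return text.encode("utf-8").decode("latin-1")


def round_trip_utf8(text: str) -> str:
    # Declaring the correct encoding gives the original text back
    return text.encode("utf-8").decode("utf-8")


for word in ["Blitzangebotsgebühr", "Übertrag"]:
    print(repr(as_if_latin1(word)), "vs", round_trip_utf8(word))
```

Here `ü` (two UTF-8 bytes) turns into two characters, and `Ü` turns into `Ã` followed by the control character U+009C.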

Related

Adding a Retokenize pipe while training NER model

I am currently attempting to train a NER model centered around property descriptions. I could get a fully trained model to function to my liking; however, I now want to add a retokenize pipe to the model so that I can set up the model to train other things.
From here, I am having issues getting the retokenize pipe to actually work. Here is the definition:
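from spacy.attrs import intify_attrs

def retok(doc):
    # Collect entity boundaries before the retokenizer changes token indices
    ents = [(ent.start, ent.end, ent.label) for ent in doc.ents]
    with doc.retokenize() as retok:
        string_store = doc.vocab.strings
        for start, end, label in ents:
            # Merge each entity span into a single token that keeps its entity type
            retok.merge(
                doc[start: end],
                attrs=intify_attrs({'ent_type': label}, string_store))
    return doc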
I am adding it into my training like this:
nlp.add_pipe(retok, after="ner")
and I am adding it into the Language Factories like this:
Language.factories['retok'] = lambda nlp, **cfg: retok(nlp)
The error I keep getting is "AttributeError: 'English' object has no attribute 'ents'". I assume I get this error because the parameter passed to this function is not a doc but the NLP model itself. I am not sure how to get a doc to flow into this pipe during training, and at this point I don't know where to go from here to get the pipe to work the way I want.
Any help is appreciated, thanks.

You can potentially use the built-in merge_entities pipeline component: https://spacy.io/api/pipeline-functions#merge_entities
The example copied from the docs:
texts = [t.text for t in nlp("I like David Bowie")]
assert texts == ["I", "like", "David", "Bowie"]
merge_ents = nlp.create_pipe("merge_entities")
nlp.add_pipe(merge_ents)
texts = [t.text for t in nlp("I like David Bowie")]
assert texts == ["I", "like", "David Bowie"]
If you need to customize it further, the current implementation of merge_entities (v2.2) is a good starting point:
def merge_entities(doc):
    """Merge entities into a single token.

    doc (Doc): The Doc object.
    RETURNS (Doc): The Doc object with merged entities.

    DOCS: https://spacy.io/api/pipeline-functions#merge_entities
    """
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            attrs = {"tag": ent.root.tag, "dep": ent.root.dep, "ent_type": ent.label}
            retokenizer.merge(ent, attrs=attrs)
    return doc
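To show what merging entity spans does to the token list, here is a plain-Python sketch. It does not use spaCy, and `merge_spans` is a hypothetical helper. It assumes non-overlapping `[start, end)` spans, as `doc.ents` provides, and joins tokens with a single space. spaCy's retokenizer also keeps the original whitespace and token attributes, so treat this only as an illustration of the index handling.

```python
from typing import List, Tuple


def merge_spans(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Merge each non-overlapping [start, end) span into one token."""
    merged = list(tokens)
    # Work right-to-left so merging a later span never shifts the indices
    # of spans that have not been processed yet
    for start, end in sorted(spans, reverse=True):
        merged[start:end] = [" ".join(merged[start:end])]
    return merged


print(merge_spans(["I", "like", "David", "Bowie"], [(2, 4)]))
```

Processing spans from the end is the same reason the question's `retok` collects `(start, end, label)` tuples first: token indices change once a span is merged.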
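P.S. You are passing nlp to retok() below, which is where the error is coming from:
Language.factories['retok'] = lambda nlp, **cfg: retok(nlp)
The factory is called with the Language object and should return the component itself, e.g. lambda nlp, **cfg: retok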
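The difference between calling the component and returning it can be shown with hypothetical toy classes. `ToyLanguage` and `ToyDoc` are not spaCy. They only assume the factory contract described in the answer: a factory receives the Language object and must return a callable that maps a Doc to a Doc.

```python
class ToyDoc:
    """Hypothetical stand-in for spacy.tokens.Doc."""

    def __init__(self, text):
        self.text = text
        self.ents = []  # entity spans would live here


class ToyLanguage:
    """Hypothetical stand-in for a spaCy v2 Language object."""

    def __init__(self):
        self.factories = {}
        self.pipeline = []

    def create_pipe(self, name, **cfg):
        # The registered factory is called with the Language object itself
        return self.factories[name](self, **cfg)

    def add_pipe(self, component):
        self.pipeline.append(component)

    def __call__(self, text):
        doc = ToyDoc(text)
        for component in self.pipeline:
            doc = component(doc)
        return doc


def retok(doc):
    # A component reads doc.ents, so it must receive a Doc, not a Language
    _ = list(doc.ents)
    return doc


nlp = ToyLanguage()
# Buggy: the lambda CALLS retok with the Language object
nlp.factories["retok_buggy"] = lambda nlp, **cfg: retok(nlp)
# Fixed: the lambda RETURNS the component function
nlp.factories["retok"] = lambda nlp, **cfg: retok

try:
    nlp.create_pipe("retok_buggy")
except AttributeError as err:
    print("buggy factory:", err)

nlp.add_pipe(nlp.create_pipe("retok"))
print(nlp("I like David Bowie").text)
```

The buggy factory fails with the same kind of "object has no attribute 'ents'" error as in the question, because `retok` runs immediately with the wrong argument.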
See a related question: Spacy - Save custom pipeline

Validation of dynamic text in testing

I am trying to validate a pin code in my application. I am using Katalon and I have not been able to find an answer.
The pin code that I need to validate is the same length but different each time I run the test and looks like this on my page: PIN Code: 4938475948.
How can I account for the number changing each time I run the test?
I have tried the following regular expressions:
assertEquals(
"PIN Code: [^a-z ]*([.0-9])*\\d",
selenium.getText("//*[@id='RegItemContent0']/div/table/tbody/tr/td[2]/table/tbody/tr[2]/td/div[1]/div[3]/ul/li[2]/span")
);
Note: This was coded in Selenium and converted to Katalon.

In Katalon, use a combination of WebUI.getText() and WebUI.verifyMatch() to do the same thing.
E.g.
TestObject object = new TestObject().addProperty('xpath', ConditionType.EQUALS, "//*[@id='RegItemContent0']/div/table/tbody/tr/td[2]/table/tbody/tr[2]/td/div[1]/div[3]/ul/li[2]/span")
def actualText = WebUI.getText(object)
def expectedText = '4938475948'
WebUI.verifyMatch(actualText, expectedText, true)
Also use the Groovy toInteger() or toString() methods to convert types, if needed.

Editing the example above to get this working:
TestObject object = new TestObject().addProperty('xpath', ConditionType.EQUALS, "//*[@id='RegItemContent0']/div/table/tbody/tr/td[2]/table/tbody/tr[2]/td/div[1]/div[3]/ul/li[2]/span")
def actualText = WebUI.getText(object)
def expectedText = '4938475948'
WebUI.verifyMatch(actualText, expectedText, true)
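This can be done with a variable, but in your case I recommend generating the PIN with some Java:
import java.util.concurrent.ThreadLocalRandom;
// rand.nextInt(9000000000) does not compile: 9000000000 is outside the int range, so use a long bound
long n = ThreadLocalRandom.current().nextLong(1000000000L, 10000000000L); // random 10-digit number
// this will also cover the bad PIN (above limit)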

I'd tweak your regex just a little. Since your PIN code is the same length each time, you could limit the number of digits the regex looks for and make sure the following character is whitespace (i.e. not a digit or another stray character). Lastly, pass true as the third argument so that WebUI.verifyMatch() treats the second string as a regular expression (the regex must be the second parameter).
def regexExpectedText = "PIN Code: ([0-9]){10}\\s"
TestObject pinCodeTO = new TestObject().addProperty('xpath', ConditionType.EQUALS, "//*[@id='RegItemContent0']/div/table/tbody/tr/td[2]/table/tbody/tr[2]/td/div[1]/div[3]/ul/li[2]/span")
def actualText = WebUI.getText(pinCodeTO)
WebUI.verifyMatch(actualText, regexExpectedText, true)
Hope that helps!
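The regex idea can be checked outside Katalon. This Python sketch assumes that WebUI.verifyMatch(..., true) requires the whole string to match, like Java's String.matches(). That is an assumption, not confirmed here. Under it, the trailing `\s` only matches if the text actually ends with whitespace, which getText() may trim. `PIN_EXACT` is a hypothetical variant without that requirement.

```python
import re

# Same pattern as regexExpectedText above (Groovy "\\s" is r"\s" in Python)
PIN_WITH_TRAILING_SPACE = re.compile(r"PIN Code: ([0-9]){10}\s")
# Hypothetical variant: exactly ten digits and nothing after them
PIN_EXACT = re.compile(r"PIN Code: [0-9]{10}")


def is_pin_text(text, pattern):
    # fullmatch: the whole string must match, mirroring the assumed
    # full-string semantics of verifyMatch with the regex flag set
    return pattern.fullmatch(text) is not None


for sample in ["PIN Code: 4938475948", "PIN Code: 4938475948 ", "PIN Code: 493847594"]:
    print(repr(sample),
          is_pin_text(sample, PIN_EXACT),
          is_pin_text(sample, PIN_WITH_TRAILING_SPACE))
```

Because the PIN changes on every run, matching the shape of the text (prefix plus ten digits) is what makes the check stable.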

read wsgi post data with unicode encoding

How can I read WSGI POST data with Unicode encoding? This is part of my code:
....
request_body_size = int(environ.get('CONTENT_LENGTH', 0))
req = str(environ['wsgi.input'].read(request_body_size))
and from req I read my fields.
This is what I posted:
کلمه
and this is what I read inside the Python code:
b"%DA%A9%D9%84%D9%85%D9%87"
This is a byte string, but I can't convert or read it.
I tried the encode and decode methods, but neither worked.
I use Python 3.4, WSGI and mod_wsgi (Apache 2).

I used Python's urllib module with this code, and it worked:
fm = urllib.parse.parse_qs(request_body['family'].encode().decode(),True) # return a dictionary
familyvalue = str([k for k in fm.keys()][0]) # access to first item
Is this the right way?
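A minimal WSGI sketch of the decoding steps follows. It assumes the body is application/x-www-form-urlencoded and contains a field named "family", as in the follow-up above. The `application` function is illustrative, not the asker's code. The key points are to decode the raw bytes with .decode() instead of str(), which only produces the "b'...'" representation, and to let urllib.parse.parse_qs undo the percent-encoding, which it does as UTF-8 by default.

```python
import io
from urllib.parse import parse_qs


def application(environ, start_response):
    # CONTENT_LENGTH may be missing or empty, so fall back to 0
    try:
        size = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        size = 0
    body = environ["wsgi.input"].read(size)  # raw bytes from the client
    # Decode bytes to str; str(body) would give the "b'...'" repr instead
    text = body.decode("utf-8")
    # parse_qs splits the fields and percent-decodes values as UTF-8
    fields = parse_qs(text, keep_blank_values=True)
    family = fields.get("family", [""])[0]  # "family" is an assumed field name
    payload = family.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(payload)))])
    return [payload]
```

Calling it with a body of `family=%DA%A9%D9%84%D9%85%D9%87` returns the UTF-8 bytes of کلمه.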

boost::asio and make_shared does not work?

I seem to be able to use boost::make_shared everywhere except with Boost.Asio.
example: _ioService = boost::shared_ptr<io_service>(new io_service)
if I turn this into: _ioService = boost::make_shared<io_service>()
I get all kinds of errors.
Same problem if I take:
_acceptor = boost::shared_ptr<tcp::acceptor>(new tcp::acceptor(*_ioService));
and turn it into this:
_acceptor = boost::make_shared<tcp::acceptor>(*_ioService);

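As boost::asio::ip::tcp::acceptor takes a boost::asio::io_service by non-const reference, and boost::make_shared (without C++11 variadic templates and rvalue references) passes its arguments on as const references, you need to change: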
_acceptor = boost::make_shared<tcp::acceptor>(*_ioService);
to:
_acceptor = boost::make_shared<tcp::acceptor>(boost::ref(*_ioService));

How to get an outline view in Sublime Text editor?

How do I get an outline view in Sublime Text for Windows?
The minimap is helpful, but I miss a traditional outline (a clickable list of all the functions in my code, in the order they appear, for quick navigation and orientation).
Maybe there is a plugin, add-on or similar? It would also be nice if you could briefly name the steps necessary to make it work.
There is a duplicate of this question on the Sublime Text forums.

Hit CTRL+R, or CMD+R for Mac, for the function list. This works in Sublime Text 1.3 or above.

A plugin named Outline is available in package control, try it!
https://packagecontrol.io/packages/Outline
Note: it does not work in multi rows/columns mode.
For multi-row/column layouts, use this fork:
https://github.com/vlad-wonderkidstudio/SublimeOutline

I use the fold all action. It will minimize everything to the declaration, I can see all the methods/functions, and then expand the one I'm interested in.

I briefly looked at the Sublime Text 3 API, and view.find_by_selector(selector) seems to return a list of regions.
So I guess a plugin that displays the outline/structure of your file is possible.
Such a plugin could, for example, list the classes and methods of the current file.
Note: the Function Name Display plugin could be used as inspiration for extracting class/method names, or ClassHierarchy for extracting the outline structure.

If you want to print or save the outline, Ctrl+R / Cmd+R is not very useful.
Instead, you can do a simple Find All with the regex ^[^\n]*function[^{]+{ or some variant of it that suits the language and situation you are working in.
Once you have done the Find All, you can copy and paste the result into a new document; depending on the number of functions, it should not take long to tidy up.
The answer is far from perfect, particularly when comments contain the word function (or its equivalent), but I do think it's a helpful answer.
With a very quick edit this is the result I got on what I'm working on now.
PathMaker.prototype.start = PathMaker.prototype.initiate = function(point){};
PathMaker.prototype.path = function(thePath){};
PathMaker.prototype.add = function(point){};
PathMaker.prototype.addPath = function(path){};
PathMaker.prototype.go = function(distance, angle){};
PathMaker.prototype.goE = function(distance, angle){};
PathMaker.prototype.turn = function(angle, distance){};
PathMaker.prototype.continue = function(distance, a){};
PathMaker.prototype.curve = function(angle, radiusX, radiusY){};
PathMaker.prototype.up = PathMaker.prototype.north = function(distance){};
PathMaker.prototype.down = PathMaker.prototype.south = function(distance){};
PathMaker.prototype.east = function(distance){};
PathMaker.prototype.west = function(distance){};
PathMaker.prototype.getAngle = function(point){};
PathMaker.prototype.toBezierPoints = function(PathMakerPoints, toSource){};
PathMaker.prototype.extremities = function(points){};
PathMaker.prototype.bounds = function(path){};
PathMaker.prototype.tangent = function(t, points){};
PathMaker.prototype.roundErrors = function(n, acurracy){};
PathMaker.prototype.bezierTangent = function(path, t){};
PathMaker.prototype.splitBezier = function(points, t){};
PathMaker.prototype.arc = function(start, end){};
PathMaker.prototype.getKappa = function(angle, start){};
PathMaker.prototype.circle = function(radius, start, end, x, y, reverse){};
PathMaker.prototype.ellipse = function(radiusX, radiusY, start, end, x, y , reverse/*, anchorPoint, reverse*/ ){};
PathMaker.prototype.rotateArc = function(path /*array*/ , angle){};
PathMaker.prototype.rotatePoint = function(point, origin, r){};
PathMaker.prototype.roundErrors = function(n, acurracy){};
PathMaker.prototype.rotate = function(path /*object or array*/ , R){};
PathMaker.prototype.moveTo = function(path /*object or array*/ , x, y){};
PathMaker.prototype.scale = function(path, x, y /* number X scale i.e. 1.2 for 120% */ ){};
PathMaker.prototype.reverse = function(path){};
PathMaker.prototype.pathItemPath = function(pathItem, toSource){};
PathMaker.prototype.merge = function(path){};
PathMaker.prototype.draw = function(item, properties){};
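The Find All regex from this answer can also be applied outside Sublime Text. The Python sketch below runs it with re.MULTILINE so that ^ matches at every line start. It escapes the literal `{` for clarity and appends `};` to mimic the tidied output above. The `SOURCE` snippet is a hypothetical example. Note that `[^{]+` can cross line breaks, which is one way a comment containing "function" produces a messy match.

```python
import re

# Hypothetical JavaScript input in the style of the PathMaker example
SOURCE = """\
PathMaker.prototype.add = function(point){
    this.points.push(point);
};
PathMaker.prototype.go = function(distance, angle){
    return this.turn(angle, distance);
};
"""

# The answer's regex: a line containing "function", up to the first "{"
OUTLINE_RE = re.compile(r"^[^\n]*function[^{]+\{", re.MULTILINE)


def outline(source: str) -> list:
    # Keep each signature and close it with "};" like the tidied list above
    return [m.group(0).strip() + "};" for m in OUTLINE_RE.finditer(source)]


for line in outline(SOURCE):
    print(line)
```

Writing the result to a file instead of printing it gives an outline that can be saved or printed, which Ctrl+R / Cmd+R does not offer.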
