Select specific entities to be tagged - spacy

Is it possible to only have NER tag a subset on entities. For example I may only need the date and money entities how could I accomplish that?
I’ve looked through the EntityRecognized documentation but didn’t see anything around removing entities.

It looks like it might be possible to accomplish this by retraining the NER tagger. (If you're interested in this route, check out this article that discusses your problem.)
But are you absolutely sure it's necessary though?
For instance, you could create a method that filters the results so that it only returns the entity types you are looking for.
def get_entities(doc):
for entity in doc.ents:
if entity.label_ in ["DATE","MONEY"]:
yield entity
else:
continue
Then instead of iterating over doc.ents, you can iterate over get_entities(doc) instead.
This seems like the easier way to me.

Related

sql count filtering - rails way

Suppose I have Posts and posts' Comments. I want to filter all the Posts that have more than 10 comments. I began writing something like Posts.includes(:comments).group("post.id").count("comments.id"), to obtain a hash of posts and their counts, and I can extract the information from there, but I want some one-line straightforward way to do that
Sure I can use some pure sql syntax statements, but I want it in a pure rails way. Any idea ?
Assuming the models are named in the more typical singular form of Post and Comment and have the usual association relationship, then the following should work:
Post.joins(:comments).group('posts.id').having('count(comments.id) > 10')

Scala 22param limit trying to find a workaround and still use for comprehensions instead of plain SQL in Slick

I'm working with 23 fields here, at last count. I've generally thrown my hands up with trying to count them after reducing from a 31-fielded table by using a foreign-key.
All the good links
Fundamental explanation of how to read and understand Slick's schema code provided by one very good Faiz.
On 22+ parameters...
Stefan Zeigar has been immensely helpful in the example code he's written in this discussion and also more directly linked to here on Github
The good Stefan Zeigar has also posted here on plain SQL queries
What this post is about
I think the above is enough to get me on my way to a working refactoring of my app so that CRUD is feasible. I'll update this question or ask new questions if something comes up and stagnates me. The thing is...
I miss using for comprehensions for querying. I'm talking about Slick's Query Templates
The problem I run into when I use a for comprehensions is that the table... will probably have
object Monsters extends Table[Int]("monster_table"){
// lots of column definitions
def * = id /* for a Table[Int] despite
having 21 other columns I'm not describing
in this projection/ColumnBase/??? */
}
and the * projection won't describe everything I want to return in a query.
The usual simple for comprehension Slick query template will look something like this:
def someQueryTemplate = for {
m <- Monsters
} yield m
and m will be an Int instead of the entire object I want because I declared the table to be a Table[Int] because I can't construct a mapped projection of 22 params because of all the code that needs to be generated for compiler support of class generation for each tuple and arbitrariness
So... in a nutshell:
Is there any way to use Query Templates in Slick with 22+ columns?
I stumbled on this question after answering a similar question a couple days ago. Here's a link to the question. slick error : type TupleXX is not a member of package scala (XX > 22)
To answer the question here, you are able to use nested tuples to solve the 22 column limit. When solving the problem myself, the following post was extremely helpful. https://groups.google.com/forum/#!msg/scalaquery/qjNW8P7VQJ8/ntqCkz0S4WIJ
Another option is to use the soon-to-be-released version of Slick which uses HLists to remove the column count limitation. This version of Slick can be pulled from the Slick master branch on Github. Kudos to cvogt for pointing out this option. https://github.com/slick/slick/blob/master/slick-testkit/src/main/scala/com/typesafe/slick/testkit/tests/MapperTest.scala#L249

Coldfusion ORM relationship id list

I'm wondering if there is a simple way to get an ID list of all the target objects relating to a source object in a coldfusion component using ORM?
I can see that you can do a collection mapping for one-to-many relationships, but I am using many-to-many relationships. I don't want to have to get the array of objects and then loop over it to get each id.
Is there any built in function or property that could do this?
I think something like the code sample below is a little too heavy since it is getting the whole query and then getting a single column from it.
valuelist( EntityToQuery( object.getRelationalFields() ).id )
Sometimes it doesn't make sense to use ORM, and this is the time. Use the good old <cfquery> for this!
I think ORMExecuteQuery may work for you, something like this:
result = ORMExecuteQuery("select id from Model as m where m.parent.id = :id", {id = 123});
Actual clause format depends on the relationship definition.
In a result you'll have the array of model PKs.

Rails sum on an associated record

I have a survey model that works like so:
ResponseSets have many Responses
Responses belong_to Answer
Answer model has a "value" column.
Given a ResponseSet, I'd like the sum of the Answers that are associated with each Response.
Ie, what I'd like to be able to do, (in imaginary code) is:
response_set.responses.answers.sum('value')
However, this obviously doesn't work, I need to build a query through response_set.responses, but I don't know how.
What's the SQL-fu way to tackle this in ActiveRecord?
After much trial and error I came up with this relatively simple solution, I hope this helps others in the future:
response_set.responses.joins(:answer).sum('answers.value')
To make this more convenient I just made this a method in the ResponseSet Model:
def total_value
self.responses.joins(:answer).sum('answers.value')
end
Well if you're using Rails 3.2 you can do something like:
response_set.responses.answers.pluck(:value).inject{|sum,x| sum + x }
Are the answers Integers, in the sense that you're looking to find ALL numeric answers associated with a response and literally add them all up? I think you could use map and inject for something like this, depending exactly on how your models/associations are set up..
response_set.responses.answers.map(:&value).inject(:+)
Can you post your models?

Modeling products with vastly different sets of needed-to-know information and linking them to lineitems?

I'm currently working on a site that sells products of varying types that are custom manufactured. I've got your general, standard cart schema: Order has many LineItems, LineItems have one Product, but I've run into a bit of a sticking point:
Lets say one of our products is a ball, and one of our products is a box of crayons. While people are creating their order, we end up creating items that could be represented by some psuedocode:
Ball:
attributes:
diameter: decimal
color: foreign_ref_to Colors.id
material: foreign_ref to Materials.id
CrayonBox:
attributes:
width: decimal
height: decimal
front_text: string
crayons: many_to_many with Crayon
...
Now, these are created and stored in our db before an order is made. I can pretty easily make it so that when an item is added to a cart, we get a product name and price by doing the linking from Ball or CrayonBox in my controller and generating the LineItem, but it would be nice if we could provide a full set of info for every line item.
I've thought of a few possible solutions, but none that seem ideal:
One: use an intermediary "product info" linking table, and represent different products in terms of that, so we'd have something like:
LineItem
information: many_to_many with product_information
...
ProductInformation:
lineitem: many_to_many with line_item
name: string
value: string
ProductInformation(name='color', value=$SOMECOLOR)
ProductInformation(name='color', value=$SOMEOTHERCOLOR)
...
The problem with this is that the types of data needed to be represented for each attribute of a product does not all fall under the same column type. I could represent everything with strings, but $DEITY knows I don't even come close to thinking that's a good solution.
The other solution I've thought of is having the LineItem table have a foreign key to each table that represents a Product type. Unfortunately, this means I would have to check for the existence of each foreign key in my controller. I don't like this very much at all, but I like it marginally better than stuffing every piece of data into one datatype and then dealing with all the conversion stuff outside of the DB.
One other possible solution would be to store the tablename of the product data in a column, but that can't possibly be a good thing to do, can it? I lose the capability of the db to link stuff together, and it strikes me as akin to using eval() where it's not needed -- and we all know that eval() isn't really needed very often.
I want to be able to say "give me the line item, and then the extended info for that line item", and have the correct set of information for various product types.
So, people who actually know what they're doing with database schema, what should I be doing? How should I be representing this? This seems like it would be a fairly common use case, but I haven't been able to find much info with googling -- is there a common pattern for things like this? Need more info? This can't possibly be outside of the realm of "you can use a RDBMS for this", can it?
Edit: I'm now fairly certain that what I want here is Class Table Inheritance. with an alias in my individual models to "normalize" the link followed to the "info" table for each product type. Unfortunately, the ORM I'm kinda stuck using for this (Doctrine 1.2) doesn't support Class Table Inheritance. I may be able to accomplish something similar with Doctrine's "column aggregation" inheritance, but egh. Anyone think I'm barking way up the wrong tree? I looked over EAV, and I don't think it quite fits the problem -- each set of information about different products is known, although they might be very different from product type A to product type B. The flexibility of EAV may be nice, but it seems like an abuse of the db for a problem like this.
It strikes me that this is a perfect fit for the likes of CouchDB / MongoDB which allow every 'row' to contain different attributes, yet permits indexed lookups. It should be fairly straightforward to build a hybrid structure using MySQL for the rigid relational parts and 'nosql' for the parts of varying shape.
Take a look at this discussion.
Assumptions:
You have some specific products you're selling. I.e., you know you're selling crayons, but not spatulas. The customer doesn't come to your site and try to order a product you've never heard of.
The products you're selling have a pre-existing set of attributes. I.e., crayons have color; crayon_boxes have width, height, crayons... The customer doesn't come to your site and try to specify the value for an attribute you've never heard of.
One way to do this (if you're a RBDM purist, please close your eyes now until I tell you to open them again) is to use an attribute string. So the table would be like this:
Products
+ ProductName
+ ProductAttribute
And then a sample record would be like this:
Product Name = "Crayon Box"
Product Attribute = "Height:5 inches;Width:7 inches"
With something like this, parse the name/value pairs in or out as necessary.