Searchkick Synonyms not getting mapped as expected

Searchkick Synonyms not getting mapped as expected - ruby-on-rails-5

I was trying to map VP to Vice President, CEO to Chief Executive Officer and so on. So that when my search keyword is VP I could see results with Vice President as well. Searchkick gem is what I used to achieve this.
I am having a person model like one below
class Person < ApplicationRecord
searchkick merge_mappings: true,
word_start: [:name],
text_middle: [:title],
synonyms:[['vp', 'vice president'],
['it', 'information technology'],
['ceo', 'chief executive officer'],
['cto', 'chief technology officer']]
end
After re-indexing my entire data and when I check the index metadata this is what I see
"analysis": {
"filter": {
"searchkick_synonym": {
"type": "synonym",
"synonyms": [
"vp,vicepresident",
"it,informationtechnology",
"ceo,chiefexecutive officer",
"cto,chieftechnology officer"
]
}
}
}
Why is this mapped like vicepresident without space? Is this the reason why synonyms are not working in my search query? Is there any issue in the model class I created?
NB: ElasticSearch Version: 7.6.0, SearchKick Gem: 4.3.0

The synonyms option doesn't support multi-word synonyms. In Searchkick 4.4.0+, you can use the search_synonyms option for multi-word synonyms.

Related

JSON API relationship with meta information

I have an entity contract with relationship contract_contacts that should be presented in JSON API format.
To be more clear here's the structure of my entities:
Contract
id
name
ContractContact
contract_id
contact_id
type
comment
Contact
id
name
Possible JSON API output will look like:
{
"data": {
"type": "contracts",
"id": "1",
"attributes": {
"name": "Contract 1"
},
"relationships": {
"contacts": {
"data": [
{
"type": "contract_contacts",
"id": "1"
},
{
"type": "contract_contacts",
"id": "2"
}
]
}
}
}
}
This approach is not good enough - you have to create additional resource for relation where you will store your contact and comment with type. You have to include with 2 levels deep to get you contact fields. Also in this case to create contract frontend should work with both resources:
Create contract contact and get id
Then Create contract with relationship
with id from above
The second approach is seems hacky to me because it will use meta and it's up to you how to use it. Example:
{
"data": {
"type": "contracts",
"id": "1",
"attributes": {
"name": "Contract 1"
},
"relationships": {
"contacts": {
"data": [
{
"meta": {
"comment": "comment 1",
"type": 1
},
"type": "contacts",
"id": "10"
},
{
"meta": {
"comment": "comment 2",
"type": 2
},
"type": "contacts",
"id": "11"
}
]
}
}
}
}
This approach will simplify the mess with api requests that was in previous example.
But is that correct to POST/PUT/PATCH with meta fields as they are not supposed to be changed from client (or supposed to be)? I'm confused with this part.

The relationship that you are describing is often referred to as a has-many-through relationship: A contract has a many contacts through a contract_contacts. These is defined as a relationship that links two resources through an intermediate resource.
JSON:API specification does not provide first-level support for these kind of relationship. You should instead model them through separate resources as described by you as your first option. This allows you to create, modify and delete your intermediate resource in the same way as any other resource. Doing so reduces the complexity as the intermediate resource is just another resource type as any other.
You mentioned two problems with doing so:
You have to include with 2 levels deep to get you contact fields.
This is true but shouldn't be an issue. include query parameter allows your client to sideload resources any many level deep as it needs. The response document might be a little bit bigger than it would be if the information of the intermediate resource is stored on the relationship itself but that shouldn't be relevant in production after gzip.
Also in this case to create contract frontend should work with both resources:
Create contract contact and get id
Then Create contract with relationship with id from above
This is true and a serious limitation of the current stable version of JSON:API specification (v1.0). It's not directly related to has-many-through relationships so. It's a general limitation of the specification, which does not support creating, modifying and/or deleting more than one resource with one request.
An official Atomic Operations extension is proposed for v1.1 of the specification to address that limitation. It's very likely that these one or a similar proposal will be included in the upcoming version.
It might be tempting to store the information of the intermediate model as meta data on the relationship. But doing so will introduce serious limitations which for I would strongly recommend to not take that path:
The JSON:API specification does not cover changing meta data. You would need to introduce your own specification to create or update these meta data.
Client-side libraries for the JSON:API specification do not expect such information to be available as meta data of the relationship. It's very likely that the consumers will have a hard time processing the information.
Storing information of the intermediate resource would lock you into using resource linkage to express relationship information in a resource document. You would not be able to use related resource links. These may introduce serious performance issues as resource linkage requires to always lookup the IDs of related resources in the database, which is not required if using related resource links.

Without JOINs, what is the right way to handle data in document databases?

I understand that JOINs are either not possible or frowned upon in document databases. I'm coming from a relational database background and trying to understand how to handle such scenarios.
Let's say I have an Employees collection where I store all employee related information. The following is a typical employee document:
{
"id": 1234,
"firstName": "John",
"lastName": "Smith",
"gender": "Male",
"dateOfBirth": "3/21/1967",
"emailAddresses":[
{ "email": "johnsmith#mydomain.com", "isPrimary": "true" },
{ "email": "jsmith#someotherdomain.com", "isPrimary": "false" }
]
}
Let's also say, I have a separate Projects collection where I store project data that looks something like that:
{
"id": 444,
"projectName": "My Construction Project",
"projectType": "Construction",
"projectTeam":[
{ "_id": 2345, "position": "Engineer" },
{ "_id": 1234, "position": "Project Manager" }
]
}
If I want to return a list of all my projects along with project teams, how do I handle making sure that I return all the pertinent information about individuals in the team i.e. full names, email addresses, etc?
Is it two separate queries? One for projects and the other for people whose ID's appear in the projects collection?
If so, how do I then insert the data about people i.e. full names, email addresses? Do I then do a foreach loop in my app to update the data?
If I'm relying on my application to handle populating all the pertinent data, is this not a performance hit that would offset the performance benefits of document databases such as MongoDB?
Thanks for your help.

"...how do I handle making sure that I return all the pertinent information about individuals in the team i.e. full names, email addresses, etc? Is it two separate queries?"
It is either 2 separate queries OR you denormalize into the Project document. In our applications we do the 2nd query and keep the data as normalized as possible in the documents.
It is actually NOT common to see the "_id" key anywhere but on the top-level document. Further, for collections that you are going to have millions of documents in, you save storage by keeping the keys "terse". Consider "name" rather than "projectName", "type" rather than "projectType", "pos" rather than "position". It seems trivial but it adds up. You'll also want to put an index on "team.empId" so the query "how many projects has Joe Average worked on" runs well.
{
"_id": 444,
"name": "My Construction Project",
"type": "Construction",
"team":[
{ "empId": 2345, "pos": "Engineer" },
{ "empId": 1234, "pos": "Project Manager" }
]
}
Another thing to get used to is that you don't have to write the whole document every time you want to update an individual field or, say, add a new member to the team. You can do targeted updates that uniquely identify the document but only update an individual field or array element.
db.projects.update(
{ _id : 444 },
{ $addToSet : "team" : { "empId": 666, "position": "Minion" } }
);
The 2 queries to get one thing done hurts at first, but you'll get past it.

Mongo DB is a document storage database.
It supports High Availability, and Scalability.
For returning a list of all your projects along with project team(details),
according to my understanding, you will have to run 2 queries.
Since mongoDb do not have FK constraints, we need to maintain it at the program level.
Instead of FK constraints,
1) if the data is less, then we can embed the data as a sub document.
2) rather than normalized way of designing the db, in MongoDb we need to design according to the access pattern. i.e. the way we need to query the data more likely. (However time for update is more(slow), but at the user end the performance mainly depends on read activity, which will be better than RDBMS)
The following link provides a certificate course on mongo Db, free of cost.
Mongo DB University
They also have a forum, which is pretty good.

Find relation between two entities in Freebase

I am new in Freebase and I have a simple question . I would like to use Freebase KB to find relation between two entities. For example if I have name entities "Washington" and "United States" , I would like to send a query to Freebase and get :
Location/Location/Capital or Null in the case of No relation.
Thank you very much.

If you only want to go one ply out (ie nearest neighbors), this is pretty simple to do using the reflection API if you're using the online version of Freebase. If you're using the bulk downloads, you'll need to work with whatever query engine you're using (probably SPARQL unless you converted the RDF to something else).
If you want to find the shortest path(es) regardless of who far apart they are, it becomes a graph search algorithm.
EDIT: If you only want to find capitols, you can fill in your IDs in this query:
[{
"type": "/location/administrative_division_capital_relationship",
"capital": [{
"id": null
}],
"administrative_division": [{
"id": null
}],
"limit": 1
}]
Note that for Washington, D.C., this will return null because the data isn't in Freebase.
If you need to handle arbitrary properties, you'll need to use reflection. See https://developers.google.com/freebase/mql/ch03#reflection

[Freebase]: Finding relationship between nodes

I am new to Freebase and I have been trying to find relationships between 2 nodes without success.
For example, I want to find if there is link between Lewis Hamilton(/en/lewis_hamilton) and Formula One(/en/formula_one), which there is in real life, but I can't seem to find it.
I have tried the following MQL codes, alternating IDs as well :
1)
[{
"type" : "/type/link",
"source" : { "id" : "/en/lewis_hamilton" },
"master_property" : null,
"target" : { "id" : "/en/formula_one" },
"target_value" : null
}]
2)
{
"id":"/en/lewis_hamilton",
"/type/reflect/any_master":[{
"link":null,
"name":null
}],
"/type/reflect/any_reverse":[{
"link":null,
"name":null
}],
"/type/reflect/any_value":[{
"link":null,
"value":null
}]
}
I'm also not able to use a couple of their apps that could do this because it returns "user rate limit exceeded" every time. Apps are:
http://between.freebaseapps.com
http://shortestpath.freebaseapps.com
Do you guys have any suggestions?

The queries that you gave are correct except that they only look at relationships that are one link apart. Surprisingly there isn't a path from Lewis Hamilton to Formula One in Freebase right now. If there was it might look something like this:
/en/lewis_hamilton → /type/object/type → /base/formula1/formula_1_driver
/base/formula1/formula_1_driver → /type/type/domain → /base/formula1
/base/formula1 → /freebase/domain_profile/equivalent_topic → /en/formula_one
Freebase doesn't support recursive queries so there's no good way to find these multi-link paths between topics. The apps that you tried simulate recursion by generating queries with increasingly nested subqueries. Unfortunately they are out of date and missing the proper API keys to run properly right now. Here's what those nested queries look like:
{
"id": "/en/lewis_hamilton",
"name": null,
"/type/reflect/any_master": [{
"link": {
"master_property": null,
"target": {
"id": null,
"name": null,
"/type/reflect/any_master": [{
"link": {
"master_property": null,
"target": {
"id": "//base/formula1",
"name": null
}
},
"name": null
}]
}
},
"name": null
}]
}
These sorts of queries can take a long time to run and are probably better if run locally over the Freebase data dumps.

Freebase is returning nothing but 503s right now, so it's a little difficult to experiment, but
All apps on Freebaseapps are open source, so looking at the sources for the apps you found should give you some good hints. The app directory is at https://www.freebase.com/apps (but isn't rendering right now)
All apps on Freebaseapps can be cloned with a single click. Pretty much every app written on that infrastructure stopped working when Google switched to the new API and the developers are unlikely to fix them if they haven't been looked at in years, but you can probably get the ones of interest working by a) cloning them, b) registering for an API key and c) adding that API key to cloned app.

How to get ids from Freebase given part of a name?

When I go to Freebase.com and search for bush (for example), I get suggestions like George Bush, Kate Bush, George H.W. Bush etc.
How can I get that list as IDs from the query api?
I am trying to get the a list, like this, of ids for a particular name:
[{
"id": null,
"name": "bush",
}]

You can use the search API
https://www.googleapis.com/freebase/v1/search?query=bush
or if you want to use MQL, use can use the contains operator (~=)
https://www.googleapis.com/freebase/v1/mqlread?query=[{%22name~=%22:%20%22bush%22,%20%22id%22:null}]
The contains operator does whole word matching by default. If you want partial word matches, you can add additional wild cards, but you risk running into timeouts, particularly for leading wildcards.
https://www.googleapis.com/freebase/v1/mqlread?query=[{%22name~=%22:%20%22*bush*%22,%20%22id%22:null}]

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas