Creating a CouchDB view to index whether an item in an array exists - indexing

I have the following sample documents in my CouchDB database. The original database in production has about 2M documents.
[
  {
    "_id": "someid|goes|here",
    "collected": {
      "tags": ["abc", "def", "ghi"]
    }
  },
  {
    "_id": "someid1|goes|here",
    "collected": {
      "tags": ["abc", "klm", "pqr"]
    }
  },
  {
    "_id": "someid2|goes|here",
    "collected": {
      "tags": ["efg", "hij", "klm"]
    }
  }
]
Based on my previous question here, how to search for values when the selector is an array,
I currently have an index on the collected.tags field, but the search is still taking a long time. Here is the search query I am using.
{
  "selector": {
    "collected.tags": {
      "$elemMatch": {
        "$regex": "abc"
      }
    }
  }
}
There are about 300k records matching the above condition, and the search still takes a long time. So, instead of a find/search, I want to create an indexed view for faster retrieval and lookup. I am new to CouchDB and am not sure how to set up the map function to create the indexed view.
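For reference, the JSON index mentioned above is typically created through the _index endpoint, roughly like this (a sketch; the database and index names are assumptions):
POST /mydb/_index
{
  "index": {
    "fields": ["collected.tags"]
  },
  "name": "collected-tags-json",
  "type": "json"
}
Note that Mango cannot satisfy a $regex predicate from a JSON index, which is why the find above still has to examine a large number of documents.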

I figured the map function out myself. Now all the documents are indexed and retrievals are faster.
function (doc) {
  // Only index documents that actually have the collected.tags array
  if (doc.collected && Array.isArray(doc.collected.tags) && doc.collected.tags.indexOf('abc') > -1) {
    emit(doc._id, doc);
  }
}
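For anyone else landing here: the map function lives in a design document, and the view is then queried over HTTP. A minimal sketch, assuming the database is called mydb and using made-up design-document and view names:
{
  "_id": "_design/tags",
  "views": {
    "has_abc": {
      "map": "function (doc) { if (doc.collected && Array.isArray(doc.collected.tags) && doc.collected.tags.indexOf('abc') > -1) { emit(doc._id, doc); } }"
    }
  }
}
GET /mydb/_design/tags/_view/has_abc
Emitting the whole document as the value duplicates data in the index; emitting null and querying with include_docs=true is a lighter-weight alternative.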

Related

How not to expose duplicated (normalize?) nodes via GraphQL?

Given "user has many links" (what means a link was created by a user) DB entities relations, I want to develop API to fetch links along with users so that the returned data does not contain duplicated users.
In other words, instead of this request:
query {
  links {
    id
    user {
      id
      email
    }
  }
}
that returns the following data:
{
  "data": {
    "links": [
      {
        "id": 1,
        "user": {
          "id": 2,
          "email": "user2#example.com"
        }
      },
      {
        "id": 2,
        "user": {
          "id": 2,
          "email": "user2#example.com"
        }
      }
    ]
  }
}
I want to make a request like this (note the "references" field):
query {
  links {
    id
    userId
  }
  references {
    users {
      id
      email
    }
  }
}
that returns associated users without duplicates:
{
  "data": {
    "links": [
      {
        "id": 1,
        "userId": 2
      },
      {
        "id": 2,
        "userId": 2
      }
    ],
    "references": {
      "users": [
        {
          "id": 2,
          "email": "user2#example.com"
        }
      ]
    }
  }
}
That should reduce the amount of data transferred between client and server, which gives a bit of a speed boost.
Is there a ready, common implementation of that idea in any language? (Ideally, I am looking for Ruby.)
It's not the query's or the server's role to normalize data:
there are no such possibilities in the GraphQL spec;
the server must return all requested fields within the queried [response] structure;
... but you can implement some of this yourself:
standardized (commonly used) pagination (Relay-style edges/nodes, nodes only, or better, both);
query [complexity] weights to promote this optimized querying style - a separate problem;
a reference dictionary field within the queried type, for example:
links {
  edges {
    node {
      id
      title
      url
      authorId
      # possible, but limited usage with heavy weights
      # author {
      #   id
      #   email
      # }
    }
  }
  pageInfo {
    hasNextPage
  }
  referencedUsers {
    id
    email
  }
}
where:
User has id and email props;
referencedUsers is of type [User!];
node.author is of type User (see the SDL sketch below).
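A rough SDL sketch of those types, assuming Relay-style connections (the type names are illustrative, not a ready-made schema):
type User {
  id: ID!
  email: String!
}

type Link {
  id: ID!
  title: String
  url: String
  authorId: ID!
  author: User
}

type LinkEdge {
  node: Link!
}

type PageInfo {
  hasNextPage: Boolean!
}

type LinkConnection {
  edges: [LinkEdge!]!
  pageInfo: PageInfo!
  referencedUsers: [User!]!
}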
A normalizing GraphQL client, like Apollo, can easily access cached user fields without making separate requests.
You can render (in React, say) a <User /> component inside a <Link /> component, passing node.authorId as an argument, like <User id={authorId} />. The User component can use the useQuery hook with a cache-only fetch policy to read the user props/fields.
See the Apollo docs for details. You should implement this yourself and document it to help/guide API users.
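A minimal sketch of that cache-only read, assuming Apollo Client 3; the user(id) field, the cache redirect, and the component are illustrative, not part of the original answer:
import React from 'react';
import { InMemoryCache, gql, useQuery } from '@apollo/client';

// Cache redirect: resolve Query.user(id) straight from the normalized User:<id> entry.
// Pass this cache to your ApolloClient instance.
const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        user: {
          read(_, { args, toReference }) {
            return toReference({ __typename: 'User', id: args.id });
          },
        },
      },
    },
  },
});

const USER_QUERY = gql`
  query User($id: ID!) {
    user(id: $id) {
      id
      email
    }
  }
`;

function User({ id }) {
  // cache-only: never hits the network, reads what the links query already cached
  const { data } = useQuery(USER_QUERY, {
    variables: { id },
    fetchPolicy: 'cache-only',
  });
  if (!data || !data.user) return null;
  return <span>{data.user.email}</span>;
}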

How to select a single item and get its relations in FaunaDB?

I have two collections which have the data in the following format
{
  "ref": Ref(Collection("Leads"), "267824207030650373"),
  "ts": 1591675917565000,
  "data": {
    "notes": "voicemail ",
    "source": "key-name",
    "name": "Glenn"
  }
}
{
  "ref": Ref(Collection("Sources"), "266777079541924357"),
  "ts": 1590677298970000,
  "data": {
    "key": "key-name",
    "value": "Google Ads"
  }
}
I want to be able to query the Leads collection and retrieve the corresponding Sources document in a single query.
I came up with the following query to try to use an index, but I couldn't get it to run:
Let(
  {
    data: Get(Ref(Collection('Leads'), '267824207030650373'))
  },
  {
    data: Select(['data'], Var('data')),
    source: q.Lambda('data',
      Match(Index('LeadSourceByKey'), Get(Select(['source'], Var('data'))))
    )
  }
)
Is there an easy way to retrieve the Sources document?
What you are looking for is the following query, which I have broken down into multiple steps:
Let(
  {
    // Get the Lead document
    lead: Get(Ref(Collection("Leads"), "269038063157510661")),
    // Get the source key out of the lead document
    sourceKey: Select(["data", "source"], Var("lead")),
    // Use the index to get the values via Match
    sourceValues: Paginate(Match(Index("LeadSourceValuesByKey"), Var("sourceKey")))
  },
  {
    lead: Var("lead"),
    sourceValues: Var("sourceValues")
  }
)
The result is:
{
  lead: {
    ref: Ref(Collection("Leads"), "269038063157510661"),
    ts: 1592833540970000,
    data: {
      notes: "voicemail ",
      source: "key-name",
      name: "Glenn"
    }
  },
  sourceValues: {
    data: [["key-name", "Google Ads"]]
  }
}
sourceValues is an array because you specified in your index that two items are returned (the key and the value), and an index always returns an array. Since the Match could have returned multiple entries if the relation were not one-to-one, the result is an array of arrays.
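For context, the LeadSourceValuesByKey index assumed above would look roughly like this (a sketch; the exact definition is not shown in the answer):
CreateIndex({
  name: "LeadSourceValuesByKey",
  source: Collection("Sources"),
  terms: [{ field: ["data", "key"] }],
  values: [{ field: ["data", "key"] }, { field: ["data", "value"] }]
})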
This is only one approach; you could also make the index return a reference and use Map/Get to fetch the actual document, as explained on the forum and sketched below.
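A sketch of that variant, assuming a hypothetical LeadSourceRefsByKey index that returns only references (the default when an index defines no values):
Let(
  {
    lead: Get(Ref(Collection("Leads"), "269038063157510661")),
    sourceKey: Select(["data", "source"], Var("lead"))
  },
  {
    lead: Var("lead"),
    sources: Map(
      Paginate(Match(Index("LeadSourceRefsByKey"), Var("sourceKey"))),
      Lambda("ref", Get(Var("ref")))
    )
  }
)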
However, I assume you asked the same question here. Although I applaud asking questions on Stack Overflow versus Slack or even our own forum, please do not just post the same question everywhere without linking to the others. That makes many people spend a lot of time on a question that is already answered elsewhere.
You could also change the Leads document and put the Ref of the Sources document in source:
{
  "ref": Ref(Collection("Leads"), "267824207030650373"),
  "ts": 1591675917565000,
  "data": {
    "notes": "voicemail ",
    "source": Ref(Collection("Sources"), "266777079541924357"),
    "name": "Glenn"
  }
}
{
  "ref": Ref(Collection("Sources"), "266777079541924357"),
  "ts": 1590677298970000,
  "data": {
    "key": "key-name",
    "value": "Google Ads"
  }
}
And then query this way:
Let(
  {
    lead: Select(['data'], Get(Ref(Collection('Leads'), '267824207030650373'))),
    source: Select(['source'], Var('lead'))
  },
  {
    data: Var('lead'),
    source: Select(['data'], Get(Var('source')))
  }
)
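Adopting this shape means rewriting the source field of the existing Leads documents to hold the reference, for example with Update (a sketch using the sample IDs above):
Update(
  Ref(Collection("Leads"), "267824207030650373"),
  {
    data: {
      source: Ref(Collection("Sources"), "266777079541924357")
    }
  }
)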

Elasticsearch make whole index or type not_analyzed

When I create an Elasticsearch index, I don't know which fields will be inserted with new docs, so I can't specify which fields are to be "index": "not_analyzed" at index creation time. Fortunately, I want all fields to be not_analyzed, so is there a way to make the entire index or type, meaning all created fields, not_analyzed?
As per the documentation, while creating the index, define the default or default_index analyzer to be of type keyword.
Example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "keyword"
        }
      }
    }
  }
}
You can also scope the analyzer per type at the moment, but it looks like this will be deprecated in the future (issue 8874).
However, currently you can set the default analyzer for a type in the mapping as follows:
PUT test/test_type/_mapping
{
  "test_type": {
    "analyzer": "keyword"
  }
}
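If the goal is specifically to have every dynamically added string field mapped as not_analyzed (rather than analyzed with the keyword analyzer), a dynamic template is another option on these Elasticsearch versions (a sketch; index and type names are placeholders):
PUT test
{
  "mappings": {
    "test_type": {
      "dynamic_templates": [
        {
          "strings_not_analyzed": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}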

Elasticsearch query: filter out IDs by wildcard

I'm hoping to create a query that will filter out IDs matching a wildcard. For instance, I would like to search everywhere except where the ID contains the word current. Is this possible?
Yes, it is possible using a Regexp Filter/Regexp Query. I could not figure out a way to do it directly using the complement option, hence I've used bool must_not to solve your problem for the time being. I'll refine the answer later if possible.
POST <index name>/_search
{
  "query": {
    "match_all": {}
  },
  "filter": {
    "bool": {
      "must_not": [
        {
          "regexp": {
            "ID": {
              "value": ".*current.*"
            }
          }
        }
      ]
    }
  }
}
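On newer Elasticsearch versions, where the top-level filter has been removed, the same idea can be expressed inside a bool query (a sketch; the ID field name is taken from the question):
POST <index name>/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "regexp": {
            "ID": ".*current.*"
          }
        }
      ]
    }
  }
}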

How do I sort ElasticSearch when it's empty?

Sometimes, I have nothing in the index, sometimes, I have some documents. That's just the nature of my application. When the index does contain documents, I sort by "final_score" descending. My query looks like this:
GET /_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "final_score": "desc" }
  ]
}
However, this query breaks when there are 0 documents in the index. I would have to remove the sort to make the query work.
How can I make this query work with any number of documents (0 or more)?
If the field does not exist and you ask Elasticsearch to sort by it, there is a problem.
So, create a mapping for final_score so that the query will not throw an error even when nothing has been indexed yet.
Example:
POST http://localhost:9200/index/type/_mapping
{
  "type": {
    "properties": {
      "final_score": {
        "type": "integer"
      }
    }
  }
}
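Alternatively, the sort itself can be made tolerant of a missing mapping with unmapped_type (a sketch, not part of the original answer):
GET /_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "final_score": {
        "order": "desc",
        "unmapped_type": "integer"
      }
    }
  ]
}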