Index return "data" field entirely in fauna db - faunadb

I am trying to create an index that returns entire data object of the documents in a collection;
here is the code:
CreateIndex({
name: "users_by_data",
source: Collection("users"),
values: { field: ['data'] }
})
but after creation it says: Values Not set (using ref by default)
If I specifically define fields (separately by their name), it will behave as expected, but data isn't working. the question is:
Is it impossible (e.g. for performance reasons) or am I doing it wrong?
side note: I am aware that I can do Lambda function on Paginate and achieve similar result, but this question specifically is about the Index level;

You can currently index regular values (strings, numbers, dates, etc) and you can index an array which will more or less 'unroll' the array into separate index entries. However, what you are trying, indexing an object is not possible at this point. An object (like data) will be ignored if you try to index it.
Currently, you have two options:
as you mentioned, using Map/Get at query time.
listing all values of the data object in the index since you can select specific values of an object in the index (which is however less flexible if new attributes arrive later on in the object)
We intend to support indexing of objects in the future, I can't provide an ETA yet though. There is a feature request on our forums as well that you can vote up: https://forums.fauna.com/t/object-as-terms-instead-of-scalar-s/628

You're going to want to use the Select function on the Ref you get back from the Index if you only want the data field back.
For an individual document, you can do something like this
Select( "data",
Get(
Match(
Index("yourIndexName"),
**yourIndexTerm // Could point to String/Number/FQL Ref
)
)
)
For a list of documents, you can use Paginate as you said but you can still pull the data property out of each document
Map(
Paginate(
Match(
Index("yourIndexName"),
**yourIndexTerm // Could point to String/Number/FQL Ref
)
),
Lambda("doc", Select("data", Get(Var("doc"))))
)

Related

How to querying and filtering efficiently on Fauna DB?

For example, let’s assume we have a collection with hundreds of thousands of documents of clients with 3 fields, name, monthly_salary, and age.
How can I search for documents that monthly_salary is higher than 2000 and age higher than 30?
In SQL this would be straightforward but with Fauna, I´m struggling to understand the best approach because terms of Index only work with an exact match. I see in docs that I can use the Filter function but I would need to get all documents in advance so it looks a bit counterintuitive and not performant.
Below is an example of how I can achieve it, but not sure if it’s the best approach, especially if it contains a lot of records.
Map(
Filter(
Paginate(Documents(Collection('clients'))),
Lambda(
'client',
And(
GT(Select(['data', 'monthly_salary'], Get(Var('client'))), 2000),
GT(Select(['data', 'age'], Get(Var('client'))), 30),
)
)
),
Lambda(
'filteredClients',
Get(Var('filteredClients'))
)
)
Is this correct or I´m missing some fundamental concepts about Fauna and FQL?
can anyone help?
Thanks in advance
Efficient searching is performed using Indexes. You can check out the docs for search with Indexes, and there is a "cookbook" for some different search examples.
There are two ways to use Indexes to search, and which one you use depends on if you are searching for equality (exact match) or inequality (greater than or less than, for example).
Searching for equality
If you need an exact match, then use Index terms. This is most explicit in the docs, and it is also not what your original question is about, so I am not going to dwell much here. But here is a simple example
given user documents with this shape
{
ref: Ref(Collection("User"), "1234"),
ts: 16934907826026,
data: {
name: "John Doe",
email: "jdoe#example.com,
age: 50,
monthly_salary: 3000
}
}
and an index defined like the following
CreateIndex({
name: "users_by_email",
source: Collection("User"),
terms: [ { field: ["data", "email"] } ],
unique: true // user emails are unique
})
You can search for exact matches with... the Match function!
Get(
Match(Index("user_by_email"), "jdoe#example.com")
)
Searching for inequality
Searching for inequalities is more interesting and also complicated. It requires using Index values and the Range function.
Keeping with the document above, we can create a new index
CreateIndex({
name: "users__sorted_by_monthly_salary",
source: Collection("User"),
values: [
{ field: ["data", "monthly_salary"] },
{ field: ["ref"] }
]
})
Note that I've not defined any terms in the above Index. The important thing for inequalities is again the values. We've also included the ref as a value, since we will need that later.
Now we can use Range to get all users with salary in a given range. This query will get all users with salary starting at 2000 and all above.
Paginate(
Range(
Match(Index("users__sorted_by_monthly_salary")),
[2000],
[]
)
)
Combining Indexes
For "OR" operations, use the Union function.
For "AND" operations, use the Intersection function.
Functions like Match and Range return Sets. A really important part of this is to make sure that when you "combine" Sets with functions like Intersection, that the shape of the data is the same.
Using sets with the same shape is not difficult for Indexes with no values, they default to the same single ref value.
Paginate(
Intersection(
Match(Index("user_by_age"), 50), // type is Set<Ref>
Match(Index("user_by_monthly_salary, 3000) // type is Set<Ref>
)
)
When the Sets have different shapes they need to be modified or else the Intersection will never return results
Paginate(
Intersection(
Range(
Match(Index("users__sorted_by_age")),
[30],
[]
), // type is Set<[age, Ref]>
Range(
Match(Index("users__sorted_by_monthly_salary")),
[2000],
[]
) // type is Set<[salary, Ref]>
)
)
{
data: [] // Intersection is empty
}
So how do we change the shape of the Set so they can be intersected? We can use the Join function, along with the Singleton function.
Join will run an operation over all entries in the Set. We will use that to return only a ref.
Join(
Range(Match(Index("users__sorted_by_age")), [30], []),
Lambda(["age", "ref"], Singleton(Var("ref")))
)
Altogether then:
Paginate(
Intersection(
Join(
Range(Match(Index("users__sorted_by_age")), [30], []),
Lambda(["age", "ref"], Singleton(Var("ref")))
),
Join(
Range(Match(Index("users__sorted_by_monthly_salary")), [2000], []),
Lambda(["age", "ref"], Singleton(Var("ref")))
)
)
)
tips for combining indexes
You can use additional logic to combine different indexes when different terms are provided, or search for missing fields using bindings. Lot's of cool stuff you can do.
Do check out the cook book and the Fauna forums as well for ideas.
BUT WHY!!!
It's a good question!
Consider this: Since Fauna is served as a serverless API, you get charged for each individual read and write on your documents and indexes as well as the compute time to execute your query. SQL can be much easier, but it is a much higher level language. Behind SQL sits a query planner making assumptions about how to get you your data. If it cannot do it efficiently it may default to scanning your entire table of data or otherwise performing an operation much more expensive than you might have expected.
With Fauna, YOU are the query planner. That means it is much more complicated to get started, but it also means you have fine control over the performance of you database and thus your cost.
We are working on improving the experience of defining schemas and the indexes you need, but at the moment you do have to define these queries at a low level.

Building a (process) variable in Appian using the value of another one?

As far as I understand, it is not possible in Appian to dynamically construct (process) variable names, just like you would do e.g. with bash using backticks like MY_OBJECT=pv!MY_CONS_`extract(valueOfPulldown)`. Is that correct? Is there a workaround?
I have set of Appian constants, let's call them MY_CONS_FOO, MY_CONS_BAR, MY_CONS_LALA, all of which are e.g. refering to an Appian data store entity. I would like to write an Appian expression rule which populates another variable MY_OBJECT of the same type (here: data store entity), depending e.g. of the options of a pull-down menu having the possible options stored in an array MY_CONS_OPTIONS looking as follows
FOO
BAR
LALA
I could of course build a lengthy case-structure which I have to maintain in addition to MY_CONS_OPTIONS, so I am searching for a more dynanmic approach using the extract() function depending on valueOfPulldown as the chosen value of the pulldown-menu.
Edit: Here the expression-rule (in pseudo-code) I want to avoid:
if (valueOfPulldown = 'FOO') then MY_OBJECT=pv!MY_CONS_FOO
if (valueOfPulldown = 'BAR') then MY_OBJECT=pv!MY_CONS_BAR
if (valueOfPulldown = 'LALA') then MY_OBJECT=pv!MY_CONS_LALA
The goal is to be able to change the data store entity via pulldown-menu.
This can help you find what is behind your constant.
fn!typeName(fn!typeOf(cons!YOUR_CONSTANT)).
Having in mind additional details I would do as follows:
Create separate expression that will combine details into list of Dictionary like below:
Expression results (er):
{
{dd_label: "label1", dd_value: 1, cons: "cons!YOUR_CONSTANT1" }
,{dd_label: "label2", dd_value: 2, cons: "cons!YOUR_CONSTANT2" }
}
on UI for your dropdown control use er.dd_label as choiceLabels and er.dd_value as choiceValues
when user selects value on Dropdown save dropdown value to some local variable and then use it to find your const by doing:
property( index(er, wherecontains(local!dropdownselectedvalue, tointeger(er.dd_value))), "cons")
returned value of step 3 is your constant
This might not be perfect as you still have to maintain your dictionary but you can avoid long if...else statements.
As a alternative have a look on Decisions Tables in Appian https://docs.appian.com/suite/help/21.1/Appian_Decisions.html

SHOW KEYS in Aerospike?

I'm new to Aerospike and am probably missing something fundamental, but I'm trying to see an enumeration of the Keys in a Set (I'm purposefully avoiding the word "list" because it's a datatype).
For example,
To see all the Namespaces, the docs say to use SHOW NAMESPACES
To see all the Sets, we can use SHOW SETS
If I want to see all the unique Keys in a Set ... what command can I use?
It seems like one can use client.scan() ... but that seems like a super heavy way to get just the key (since it fetches all the bin data as well).
Any recommendations are appreciated! As of right now, I'm thinking of inserting (deleting) into (from) a meta-record.
Thank you #pgupta for pointing me in the right direction.
This actually has two parts:
In order to retrieve original keys from the server, one must -- during put() calls -- set policy to save the key value server-side (otherwise, it seems only a digest/hash is stored?).
Here's an example in Python:
aerospike_client.put(key, {'bin': 'value'}, policy={'key': aerospike.POLICY_KEY_SEND})
Then (modified Aerospike's own documentation), you perform a scan and set the policy to not return the bin data. From this, you can extract the keys:
Example:
keys = []
scan = client.scan('namespace', 'set')
scan_opts = { 'concurrent': True, 'nobins': True, 'priority': aerospike.SCAN_PRIORITY_MEDIUM }
for x in (scan.results(policy=scan_opts)): keys.append(x[0][2])
The need to iterate over the result still seems a little clunky to me; I still think that using a 'master-key' Record to store a list of all the other keys will be more performant, in my case -- in this way, I can simply make one get() call to the Aerospike server to retrieve the list.
You can choose not bring the data back by setting includeBinData in ScanPolicy to false.

Cypher-Neo4j Node Single Property change to Array

Following is a Node we having in DB
P:Person { name:"xxx", skill:"Java" }
and after awhile, we would like to change the Skill to skill array, is it possible?
P:Person { name:"xxx", skill:["Java", "Javascript"] }
Which Cypher query should I use?
If you have a single skill value in skill, then just do
MATCH (p:Person)
WHERE HAS (p.skill)
SET p.skill=[p.skill]
If there are multiple values you need to convert to an array such as P:Person { name:"xxx", skill:"Java","JavaScript" } then this should work:
MATCH (p:P)
SET p.skill= split(p.skill,",")
In fact, I think your real problem here is not how to get an array property in a node, but how to store it. Your data model is wrong in my opinion, storign data as array in neo4j is not common, since you have relations to store multiple skills (in your example).
How to create your data model
With your question, I can already see that you have one User, and one User can have 1..n skills.
I guess that one day (maybe tomorrow) you will need to know which users are able to use Java, C++, PHP, and every othre skills.
So, Here you can already see that every skill should have its own node.
What is the correct model in this case?
I think that, still with only what you said in question, you should have something like this:
(:Person{name:"Foo"})-[:KNOWS]->(:Skill{name:"Bar"})
using such a data model, you can get every Skill known by a Person using this query:
MATCH (:Person{name:"Foo"})-[:KNOWS]->(skill:Skill)
RETURN skill //or skill.name if you just want the name
and you can also get every Person who knows a Skill using this:
MATCH (:Skill{name:"Bar"})<-[:KNOWS]-(person)
RETURN person //Or person.name if you just want the name
Keep in mind
Storing array values in properties should be the last option when you are using neo4j.
If a property can be found in multiple nodes, having the same value, you can create a node to store it, then you will be able to link it the other nodes using relations, and finding every node having the property X = Y will be easier.

Is it possible to add custom metadata to a Lucene field?

I've come to the point where I need to store some additional data about where a particular field comes from in my Lucene.Net index. Specifically, I want to attach a guid to certain fields of a document when the field is added to the document, and retrieve it again when I get the document from a search result.
Is this possible?
Edit:
Okay, let me clarify a bit by giving an example.
Let's say I have an object that I want to allow the user to tag with custom tags like "personal", "favorite", "some-project". I do this by adding multiple "tag" fields to the document, like so:
doc.Add( new Field( "tag", "personal" ) );
doc.Add( new Field( "tag", "favorite" ) );
The problem is I now need to record some meta data about each individual tag itself, specifically a guid representing where that tag came from (imagine it as a user id). Each tag could potentially have a different guid, so I can't simply create a "tag-guid" field (unless the order of the values is preserved---see edit 2 below). I don't need this metadata to be indexed (and in fact I'd prefer it not to be, to avoid getting hits on metadata), I just need to be able to retrieve it again from the document/field.
doc.GetFields( "tag" )[0].Metadata...
(I'm making up syntax here, but I hope my point is clear now.)
Edit 2:
Since this is a completely different question, I've posted a new question for this approach: Is the order of multi-valued fields in Lucene stable?
Okay let's try another approach... The key problem area is the indeterminacy of the multiple field values under the same field name (e.g. "tag"). If I could introduce or obtain some kind of determinacy here, I might be able to store the metadata in another field.
For example, if I could rely on the order of the values of the field never changing, I could use an index in the set of values to identify exactly which tag I am referring to.
Is there any guarantee that the order I add the values to a field will remain the same when I retrieve the document at a later time?
Depending on your search requirements for this index, this may be possible. That way you can control the order of fields. It would require updating both fields as the tag list changes of course, but the overhead may be worth it.
doc.Add(new Field("tags", "{personal}|{favorite}"));
doc.Add(new Field("tagsref", "{1234}|{12345}"));
Note: using the {} allows you to qualify your search for uniqueness where similar values exist.
Example: If values were stored as "person|personal|personage" searching for "person" would return a document that has any one of person, personal or personage. By qualifying in curly brackets like so: "{person}|{personal}|{personage}", I can search for "{person}" and be sure it won't return false positives. Of course, this assumes you don't use curly brackets in your values.
I think you're asking about payloads.
Edit: From your use case, it sounds like you have no desire to use this metadata in your search, you just want it there. (Basically, you want to use Lucene as a database system.)
So, why can't you use a binary field?
ExtraData ed = new ExtraData { Tag = "tag", Type = "personal" };
byte[] byteData = BinaryFormatter.Serialize(ed); // this isn't the correct code, but you get the point
doc.Add(new Field("myData", byteData, Field.Store.YES));
Then you can deserialize it on retrieval.