In our raven based application we are starting to experience major performance issues when the master document starts increasing in size, as it holds a lot of collections that keep growing. As such I am now planning a major data redesign that is likely to take months and I want to be sure I'm on the right track before I do so.
The current design looks like this:
Community
{
id:,
name,
//other properties
members[
{
id:,
name:,
date of birth:
//etc
},
{
//another member, this list could potentially grow to hundreds of thousands
}
],
league
[
{
id,
name,
seasons[
{...},
{
id:,
divisions
[
{
id:,
name:
matches[
{
id:,
//match details
},
{
//another match. there could be hundreds here in a big league
},
{}
As we started hitting performance issues, we began using transformers to load only what is needed, but that didn't fully solve the problem, as some of our leagues are a couple of MB on their own. The other issue is that we constantly need to check members for admin/membership rights, so the members list is always needed.
I understand I could omit the member list completely using a transformer and use an index for membership checks, but the problem of what to do when a member is added remains: the list will need to be loaded, and with an upcoming project it could grow to half a million people or more.
So my plan is to separate each entity into its own document. In the case of leagues, I will have a league document and a match document, with matches containing {leagueId, season number, division number, other match details}.
Each member will have their own document with a list of the IDs of the community documents they're a member of.
I'm just a bit worried that this design misses the whole point of a document DB and that we may as well have used SQL. Or do you think I'm on the right track with this approach?
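To make the proposed split concrete, here is a minimal TypeScript sketch of the document shapes described above. The interface and field names (MemberDoc, MatchDoc, communityIds and so on) are hypothetical and only mirror the description in the question; this is not RavenDB API code. It illustrates that a membership check touches a single small member document, and that matches are selected by reference fields instead of by walking a nested league.

```typescript
// Hypothetical document shapes for the split design; names are illustrative only.
interface MemberDoc {
  id: string;
  name: string;
  communityIds: string[]; // IDs of the community documents this member belongs to
}

interface MatchDoc {
  id: string;
  leagueId: string;
  seasonNumber: number;
  divisionNumber: number;
}

// Membership check that needs only the member's own document.
function isMemberOf(member: MemberDoc, communityId: string): boolean {
  return member.communityIds.indexOf(communityId) !== -1;
}

// Select the matches of one division by their reference fields.
function matchesInDivision(
  matches: MatchDoc[],
  leagueId: string,
  seasonNumber: number,
  divisionNumber: number
): MatchDoc[] {
  return matches.filter(
    (m: MatchDoc) =>
      m.leagueId === leagueId &&
      m.seasonNumber === seasonNumber &&
      m.divisionNumber === divisionNumber
  );
}

// Sample data (made up for the example).
const sampleMember: MemberDoc = {
  id: "members/1",
  name: "Alice",
  communityIds: ["communities/1", "communities/7"],
};

const sampleMatches: MatchDoc[] = [
  { id: "matches/1", leagueId: "leagues/1", seasonNumber: 1, divisionNumber: 1 },
  { id: "matches/2", leagueId: "leagues/1", seasonNumber: 1, divisionNumber: 2 },
  { id: "matches/3", leagueId: "leagues/2", seasonNumber: 1, divisionNumber: 1 },
];

const divisionOneMatches: MatchDoc[] = matchesInDivision(sampleMatches, "leagues/1", 1, 1);
```

In a real database the filter would be an index query rather than an in-memory scan; the sketch only shows which fields such a query would need.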
I'm working on an app where users learn about different patterns of grammar in a language. There are three collections; users and patterns are interrelated by progress, which looks like this:
Create(Collection("progress"), {
data: {
userRef: Ref(Collection("users"), userId),
patternRef: Ref(Collection("patterns"), patternId),
initiallyLearnedAt: Now(),
lastReviewedAt: Now(),
srsLevel: 1
}
})
I've learned how to do some basic Fauna queries, but now I have a somewhat more complex relational one. I want to write an FQL query (and the required indexes) to retrieve all patterns for which a given user doesn't have progress. That is, everything they haven't learned yet. How would I compose such a query?
One clarifying assumption - a progress document is created when a user starts on a particular pattern and means the user has some progress. For example, if there are ten patterns and a user has started two, there will be two documents for that user in progress.
If that assumption is valid, your question is "how can we find the other eight?"
The basic approach is:
Get all available patterns.
Get the patterns a user has worked on.
Select the difference between the two sets.
1. Get all available patterns.
This one is trivial with the built-in Documents function in FQL:
Documents(Collection("patterns"))
2. Get the patterns a user has worked on.
To get all the patterns a user has worked on, you'll want to create an index over the progress collection, as you've figured out. Your terms are what you want to search on, in this case userRef. Your values are the results you want back, in this case patternRef.
This looks like the following:
CreateIndex({
name: "patterns_by_user",
source: Collection("progress"),
terms: [
{ field: ["data", "userRef"] }
],
values: [
{ field: ["data", "patternRef"] }
],
unique: true
})
Then, to get the set of all the patterns a user has some progress against:
Match(
"patterns_by_user",
Ref(Collection("users"), userId)
)
3. Select the difference between the two sets
The FQL function Difference has the following signature:
Difference( source, diff, ... )
This means you'll want the largest set first, in this case all of the documents from the patterns collection.
If you reverse the arguments you'll get an empty set, because there are no documents in the set of patterns the user has worked on that are not also in the set of all patterns.
From the docs, the return value of Difference is:
When source is a Set Reference, a Set Reference of the items in source that are missing from diff.
This means you'll need to Paginate over the difference to get the references themselves.
Paginate(
Difference(
Documents(Collection("patterns")),
Match(
"patterns_by_user",
Ref(Collection("users"), userId)
)
)
)
From there, you can do what you need to do with the references. As an example, to retrieve all of the data for each returned pattern:
Map(
Paginate(
Difference(
Documents(Collection("patterns")),
Match(
"patterns_by_user",
Ref(Collection("users"), userId)
)
)
),
Lambda("patternRef", Get(Var("patternRef")))
)
Consolidated solution
Create the index patterns_by_user as in step two
Query the difference as in step three
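The argument order of Difference is easy to get backwards, so here is a small in-memory TypeScript model of the three steps. It is only an illustration: refs are modeled as plain strings, and the difference function is a hypothetical stand-in for the FQL function, not Fauna driver code.

```typescript
// In-memory stand-in for FQL's Difference(source, diff):
// keeps the items of `source` that do not appear in `diff`.
function difference(source: string[], diff: string[]): string[] {
  return source.filter((ref: string) => diff.indexOf(ref) === -1);
}

// Step 1: all available patterns (made-up refs).
const allPatterns: string[] = ["patterns/1", "patterns/2", "patterns/3", "patterns/4"];

// Step 2: patterns the user already has progress on.
const learnedByUser: string[] = ["patterns/2", "patterns/4"];

// Step 3: the larger set goes first.
const notYetLearned: string[] = difference(allPatterns, learnedByUser);

// Reversed arguments: nothing the user has learned is missing from the full set.
const reversedArguments: string[] = difference(learnedByUser, allPatterns);
```

Here notYetLearned holds the two patterns without progress, while reversedArguments is empty, which matches the behaviour described in step three.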
I am using the Audit.NET library to log EntityFramework actions into a database (currently everything goes into one AuditEventLogs table), where the JsonData column stores the data in the following JSON format:
{
"EventType":"MyDbContext:test_database",
"StartDate":"2021-06-24T12:11:59.4578873Z",
"EndDate":"2021-06-24T12:11:59.4862278Z",
"Duration":28,
"EntityFrameworkEvent":{
"Database":"test_database",
"Entries":[
{
"Table":"Offices",
"Name":"Office",
"Action":"Update",
"PrimaryKey":{
"Id":"40b5egc7-46ca-429b-86cb-3b0781d360c8"
},
"Changes":[
{
"ColumnName":"Address",
"OriginalValue":"test_address",
"NewValue":"test_address"
},
{
"ColumnName":"Contact",
"OriginalValue":"test_contact",
"NewValue":"test_contact"
},
{
"ColumnName":"Email",
"OriginalValue":"test_email",
"NewValue":"test_email2"
},
{
"ColumnName":"Name",
"OriginalValue":"test_name",
"NewValue":"test_name"
},
{
"ColumnName":"OfficeSector",
"OriginalValue":1,
"NewValue":1
},
{
"ColumnName":"PhoneNumber",
"OriginalValue":"test_phoneNumber",
"NewValue":"test_phoneNumber"
}
],
"ColumnValues":{
"Id":"40b5egc7-46ca-429b-86cb-3b0781d360c8",
"Address":"test_address",
"Contact":"test_contact",
"Email":"test_email2",
"Name":"test_name",
"OfficeSector":1,
"PhoneNumber":"test_phoneNumber"
},
"Valid":true
}
],
"Result":1,
"Success":true
}
}
My team and I have one main goal:
Being able to create a search page where administrators are able to tell
who changed
what did they change
when did the change happen
They can give a time period, to reduce the number of audit records, and the interesting part comes here:
There should be an input text field which should let them search in the values of the "ColumnValues" section.
The problems I encountered:
Even if I map the JSON structure into relational rows, I am unable to search in every column while keeping the search generic.
If I don't map it, I could search the JSON string with the MSSQL LIKE operator, but with a few hundred thousand records the query takes an eternity to finish, so that is probably not the way.
Keeping it generic is important, so that we don't need to modify the audit search page every time we create or modify an entity.
I only know MSSQL, but is it possible that storing the audit logs in a document oriented database like cosmosDB (or anything else, it was just an example) would solve my problem? Or can I reach the desired behaviour using relational database like MSSQL?
It looks like you're asking for an opinion; in that case, I would strongly recommend a document-oriented DB.
CosmosDB could be a great option since it supports SQL queries.
There is an extension to log to CosmosDB from Audit.NET: Audit.AzureCosmos
A sample query:
SELECT c.EventType, e.Table, e.Action, ch.ColumnName, ch.OriginalValue, ch.NewValue
FROM c
JOIN e IN c.EntityFrameworkEvent.Entries
JOIN ch IN e.Changes
WHERE ch.ColumnName = "Address" AND ch.OriginalValue = "test_address"
Here is a nice post with a lot of examples of complex SQL queries on CosmosDB
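To illustrate what a generic search over "ColumnValues" amounts to, here is a minimal TypeScript sketch that scans every value of every entry without knowing the entity's columns. The type names and the columnValuesContain helper are hypothetical; they only mirror the JSON shape from the question and run in memory, so this says nothing about how a database would execute or index such a search.

```typescript
type AuditValue = string | number | boolean | null;

// Only the parts of the audit JSON that the search needs.
interface AuditEntry {
  Table: string;
  Action: string;
  ColumnValues: { [column: string]: AuditValue };
}

interface AuditRecord {
  EventType: string;
  EntityFrameworkEvent: { Entries: AuditEntry[] };
}

// True if any value in any entry's ColumnValues contains `term` (case-insensitive).
// No column names are hard-coded, so new entities need no changes here.
function columnValuesContain(record: AuditRecord, term: string): boolean {
  const needle: string = term.toLowerCase();
  return record.EntityFrameworkEvent.Entries.some((entry: AuditEntry) =>
    Object.keys(entry.ColumnValues).some(
      (column: string) =>
        String(entry.ColumnValues[column]).toLowerCase().indexOf(needle) !== -1
    )
  );
}

// Trimmed copy of the sample event from the question.
const sampleAudit: AuditRecord = {
  EventType: "MyDbContext:test_database",
  EntityFrameworkEvent: {
    Entries: [
      {
        Table: "Offices",
        Action: "Update",
        ColumnValues: {
          Address: "test_address",
          Email: "test_email2",
          OfficeSector: 1,
        },
      },
    ],
  },
};
```

With this sample, searching for "email2" finds the record and searching for "warehouse" does not.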
Backstory:
I'm building an e-commerce web app (online store)
Now I've got to the point of choosing a database system and an appropriate design.
I got stuck developing a design for product attributes
I've been considering NoSQL (MongoDB) or SQL database systems
I need your advice and help
The problem:
When you choose a product type (e.g. table) it should show you the corresponding filters for such a type (e.g. height, material etc.). When you choose another type, say "car", it provides you with the car specific filter attributes (e.g. fuel, engine volume)
For example, on one popular online store, if you choose a data storage type you get filters for that type's attributes, such as hard drive size or connection type
Question
What approach is the best for such a problem? I described some below, but maybe you have your own thoughts in regard to it
MongoDB
Possible solution:
You can implement such a product attrs structure pretty easily.
You can create one collection with a field attrs for each product and put there whatever you want, like they suggest here (field "details"):
https://docs.mongodb.com/ecosystem/use-cases/product-catalog/#non-relational-data-model
The structure will be
Problem:
With such a solution you don't have product types at all, so you can't filter the products by type. Each product has its own arbitrary structure in the attrs field and doesn't follow any pattern
Or maybe I can somehow go with this approach?
SQL
There are solutions like a single table, where all the products are stored in one table and you end up with as many fields as there are attributes across all the products taken together.
Or you create a new table for every product type
But I won't consider these. The first is very bulky, and the second isn't very flexible and requires a dynamic schema design
Possible solution
There is one pretty flexible solution called EAV https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
Our schema would be:
EAV
Such a design could be done in MongoDB, but I'm not sure it was made for such a normalised structure
Problem
The schema is going to get really huge and really hard to query and grasp
If you choose an SQL database, take a look at PostgreSQL, which supports JSON features. You don't necessarily need to follow database normalization.
If you choose MongoDB, you need to store an attrs array with generic {key:"field", value:"value"} pairs.
{id:1, attrs:[{key: "prime", value: true}, {key:"height", value:2}, {key:"material", value:"wood"},{key:"color", "value":"brown"}]}
{id:2, attrs:[{key: "prime", value: true}, {key:"fuel", value:"gas"}, {key:"volume", "value":3}]}
{id:3, attrs:[{key: "prime", value: true}, {key:"fuel", value:"diesel"}, {key:"volume", "value":1.5}]}
Then you define Multi-key index like this:
db.collection.createIndex({"attrs.key":1, "attrs.value":1})
If you want to apply step-by-step filters, use MongoDB aggregation with the $elemMatch operator
☑ Prime
☑ Fuel
☐ Other
...
☑ Volume 3
☐ Volume 1.5
Query's representation
db.collection.aggregate([
{
$match: {
$and: [
{
attrs: {
$elemMatch: {
key: "prime",
value: true
}
}
},
{
attrs: {
$elemMatch: {
key: "fuel"
}
}
},
{
attrs: {
$elemMatch: {
key: "volume",
"value": 3
}
}
}
]
}
}
])
MongoPlayground
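Since the filters change with every checkbox the user ticks, the $match stage is usually built from the current selection rather than written by hand. Below is a minimal TypeScript sketch of that idea. The AttrFilter type and the buildMatchStage helper are hypothetical names introduced for the example; the sketch only constructs the plain query object shown above and does not call any MongoDB driver.

```typescript
type AttrValue = string | number | boolean;

// One selected filter; omit `value` to require only that the key exists.
interface AttrFilter {
  key: string;
  value?: AttrValue;
}

interface ElemMatchBody {
  key: string;
  value?: AttrValue;
}

interface ElemMatchClause {
  attrs: { $elemMatch: ElemMatchBody };
}

interface MatchStage {
  $match: { $and?: ElemMatchClause[] };
}

// Turn the selected filters into a $match stage with one $elemMatch per filter.
function buildMatchStage(filters: AttrFilter[]): MatchStage {
  if (filters.length === 0) {
    // An empty $and array is rejected by MongoDB, so match everything instead.
    return { $match: {} };
  }
  const clauses: ElemMatchClause[] = filters.map((f: AttrFilter): ElemMatchClause => {
    const body: ElemMatchBody = { key: f.key };
    if (f.value !== undefined) {
      body.value = f.value;
    }
    return { attrs: { $elemMatch: body } };
  });
  return { $match: { $and: clauses } };
}

// The selection from the checkboxes above: Prime, Fuel (any value), Volume 3.
const selectedStage: MatchStage = buildMatchStage([
  { key: "prime", value: true },
  { key: "fuel" },
  { key: "volume", value: 3 },
]);
```

The resulting object has the same shape as the hand-written stage in the query above and could be passed as the first element of the aggregation pipeline.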
Could not find this answer online, so decided to post the question then the answer.
I created a table in the capabilities.json file:
"dataRoles": [
{
"displayName": "Stakeholders",
"name": "roleIwant",
"kind": "GroupingOrMeasure"
}
...
"dataViewMappings": [
{
"table": {
"rows": {
"select": [
{
"for": {
"in": "roleIwant"
}
}
]
}
}
}
]
I realized that I could not simply set, for instance, legend data from the first category, because the first category comes from the first piece of data the user drags in, regardless of position. So if they set a bunch of different pieces of data in Power BI online, for instance, then remove one, the orders of everything get messed up. I thought the best way to settle this would be to identify the role of each column and go from there.
When you click on show Dataview, the hierarchy clearly shows:
...table->columns[0]->roles: { "roleIwant": true }
So I thought I could access it like:
...table.columns[0].roles.roleIwant
but that is not the case. I was compiling using pbiviz start from the command prompt, which gives me an error:
error TYPESCRIPT /src/visual.ts : (56,50) Property 'roleIwant' does not exist on type '{ [name: string]: boolean; }'.
Why can I not access this in this way? I was thinking because natively, roles does not contain the property roleIwant, which is true, but that shouldn't matter...
The solution is actually pretty simple. I got no 'dot' help (typing a dot after roles for suggestions), but you can use regular object properties for roles. The command for this case would be:
...table.columns[0].roles.hasOwnProperty("roleIwant")
And the functional code portion:
...
columns.forEach((column) => {
    // A column belongs to the role if the role name is a key of its roles object
    if (column.roles.hasOwnProperty("roleIwant")) {
        roleIwantData = dataview.categorical.categories[columns.indexOf(column)].values;
    }
})
If it has the property, it belongs to that role. From here, the data saved will contain the actual values of that role! The only thing I would add on here is that if a column is used for multiple roles, depending on how you code, you may want to do multiple if's to check for the different roles belonging to a column instead of if else's.
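As a possible alternative to hasOwnProperty, bracket access is valid TypeScript on an index-signature type such as { [name: string]: boolean }. The sketch below uses a hypothetical ColumnLike interface as a local stand-in for the column shape shown in the data view (it does not import the Power BI typings), and a hypothetical helper that collects the indexes of all columns bound to a role.

```typescript
// Local stand-in for the relevant part of a data view column (assumed shape).
interface ColumnLike {
  displayName: string;
  roles?: { [name: string]: boolean };
}

// Indexes of all columns that carry the given role.
function columnIndexesForRole(columns: ColumnLike[], role: string): number[] {
  const indexes: number[] = [];
  columns.forEach((column: ColumnLike, index: number) => {
    // Bracket access works on the index signature; `=== true` also skips
    // roles that are present but set to false.
    if (column.roles !== undefined && column.roles[role] === true) {
      indexes.push(index);
    }
  });
  return indexes;
}

// Made-up columns: the third one is bound to two roles at once.
const sampleColumns: ColumnLike[] = [
  { displayName: "Stakeholder", roles: { roleIwant: true } },
  { displayName: "Amount", roles: { measure: true } },
  { displayName: "Owner", roles: { roleIwant: true, legend: true } },
];

const stakeholderIndexes: number[] = columnIndexesForRole(sampleColumns, "roleIwant");
```

Unlike hasOwnProperty, the === true comparison treats a role stored as false as not assigned, so which check fits depends on how the host fills in roles. Using the forEach index also avoids the extra indexOf lookup.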
If anyone has any further advice on the topic, or a better way to do it, by all means. I searched for the error, all over for ways to access columns' roles, and got nothing, so hopefully this topic helps someone else. And sorry for the wordiness - I tend to talk a lot.
Mandatory User Filters
I am working on a tool to allow customers to apply Mandatory User Filters. When attributes such as "Year" or "Age" are loaded, each can have hundreds of elements with their own ids. The POST request to create a filter (documented here: https://developer.gooddata.com/article/lets-get-started-with-mandatory-user-filters) looks like this:
{
"userFilter": {
"content": {
"expression": "[/gdc/md/{project-id}/obj/{object-id}]=[/gdc/md/{project-id}/obj/{object-id}/elements?id={element-id}]"
},
"meta": {
"category": "userFilter",
"title": "My User Filter Name"
}
}
}
In the "expression" property, it shows how one ID can be set. What I want is to set multiple ids associated with the object-id in a single POST. For example, if a user wanted to add a filter on all of the elements in "Year" (there are 150) in the demo project, it seems odd to make 150 POST requests.
Is there a better way?
UPDATE
Tomas thank you for your help.
I am not having trouble assigning multiple userfilters to a user. I can easily apply a singular filter to a user with the method outlined in the documentation. However, this overwrites the userfilter field. What is the syntax for this?
Here is my demo POST data:
{ "userFilters":
{ "items": [
{ "user": "/gdc/account/profile/decd0b2e3077cf9c47f8cfbc32f6460e",
"userFilters":["/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808728","/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808729","/gdc/md/a1nc4jfa14wey1bnfs1vh9dljaf8ejuq/obj/808728"]
}
]
}
}
This receives a BAD REQUEST.
I'm not sure what you mean by "have multiple ids associated with the object-id" exactly, but I'll try to tell you all I know about it. :-)
If you indeed made multiple POST requests, created multiple userFilters and set them all for one user, the user wouldn't see anything at all. That's because the system combines separate userFilters using logical AND, and a Year cannot be 2013 and 2014 at the same time. So for the rest of my answer, I'll assume that you want OR instead.
There are several ways to do this. As you may have guessed by now, you can use AND/OR explicitly, using an expression like this:
[/…/obj/{object-id}]=[/…/obj/{object-id}/elements?id={element-id}] OR [/…/obj/{object-id}]=[/…/obj/{object-id}/elements?id={element-id}]
This can often be further simplified to:
[/…/obj/{object-id}] IN ( [/…/obj/{object-id}/elements?id={element-id}], [/…/obj/{object-id}/elements?id={element-id}], … )
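With 150 elements, the IN expression is easier to generate than to type. Here is a minimal TypeScript sketch of a string builder for it. The buildInExpression helper is a hypothetical name, the /gdc/md/{project-id}/obj/{object-id} path is taken from the POST body in the question, and the sketch assumes the element URIs live under the same object-id as the attribute, as the placeholders above suggest. It only builds the string; it does not call the GoodData API.

```typescript
// Build "[attribute] IN ( [element], [element], ... )" for one attribute.
function buildInExpression(
  projectId: string,
  objectId: string,
  elementIds: string[]
): string {
  if (elementIds.length === 0) {
    throw new Error("at least one element id is required");
  }
  const attributeUri: string = "/gdc/md/" + projectId + "/obj/" + objectId;
  const elementUris: string[] = elementIds.map(
    (id: string) => "[" + attributeUri + "/elements?id=" + id + "]"
  );
  return "[" + attributeUri + "] IN ( " + elementUris.join(", ") + " )";
}

// Made-up ids, for illustration only.
const yearFilterExpression: string = buildInExpression("my-project", "123", ["2013", "2014"]);
```

The result would go into the "expression" property of a single userFilter POST, replacing the many separate requests.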
If the attribute is a date (year, month, …) attribute, you could, in theory, also specify ranges using BETWEEN instead of listing all elements:
[/…/obj/{object-id}] BETWEEN [/…/obj/{object-id}/elements?id={element-id}] AND [/…/obj/{object-id}/elements?id={element-id}]
It seems, though, that this only works in metrics MAQL and is not allowed in the implementation of user filters. I have no idea why.
Also, for your own attribute like Age, you can't do that since user-defined numeric attributes aren't supported. You could, in theory, add a fact that holds the numeric value, and construct a BETWEEN filter based on that fact. It seems that this is not allowed in the implementation of user filters either. :-(
Hope this helps.