Select random N records from GraphQL query - react-native

I am building a simple quiz app that will allow a user to choose various categories and generate a 5-question quiz to test their knowledge. I have a long list of questions set up in AppSync, accessible via GraphQL. However, as that list keeps growing, it doesn't make sense for me to pull it all to the client and select randomly there.
Does GraphQL support choosing a random 5 from a query? Such that, server-side, I can select just 5 records at random?
query listAll {
  listQuestions(filter: {
    topic: {
      contains: "chocolate"
    }
  }) {
    items {
      question
      answer
    }
  }
}
I have thought about other approaches, such as randomly assigning each record a number and filtering on that, but then the selection would not be random each time.
Any ideas?

Does GraphQL support choosing a random 5 from a query?
Not directly, no. Most of the more "interesting" things you might imagine doing in an SQL query, even simpler things like "return only the first 10 records" or "has a family name of 'Jones'", aren't directly supported in GraphQL. You have to build this sort of thing out of the primitives it gives you.
Such that, server-side, I can select just 5 records at random?
Most GraphQL server implementations support resolver functions which are arbitrary code called when a field value is requested. You could write a schema like
type Query {
  listQuestions(filter: QuestionFilter, random: Int): [Question!]!
}
and get access to the arguments in the resolver function.
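As an illustration, a resolver for that schema could fetch the filtered questions and sample from them. This is a minimal sketch only; fetchQuestions is a hypothetical stand-in for whatever data source backs the field.
const resolvers = {
  Query: {
    listQuestions: async (_parent, { filter, random }) => {
      const questions = await fetchQuestions(filter); // hypothetical data-access helper
      if (!random) return questions;
      // Fisher-Yates shuffle, then take the first `random` items
      for (let i = questions.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [questions[i], questions[j]] = [questions[j], questions[i]];
      }
      return questions.slice(0, random);
    },
  },
};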
It looks like AppSync has its own resolver system. It's not obvious to me from paging through the documentation that it supports a "pick n at random" method; it seems to be mostly designed as a facade around database storage, and most databases aren't optimized for this kind of query.

David is right that writing this logic inside a resolver is the GraphQL way to do it.
If you are using AWS AppSync, you can attach a Lambda resolver to the query and put the logic that picks the random values inside the Lambda, so the random selection happens server-side and is part of the GraphQL response. This is one way of doing it.
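A rough sketch of such a Lambda, assuming a direct Lambda resolver (where AppSync passes the field arguments in event.arguments); getQuestionsByTopic is a hypothetical helper that queries the underlying table:
exports.handler = async (event) => {
  const { filter } = event.arguments;
  const questions = await getQuestionsByTopic(filter); // hypothetical data access
  // Quick shuffle; for strict uniformity use a Fisher-Yates shuffle instead
  const shuffled = [...questions].sort(() => Math.random() - 0.5);
  return shuffled.slice(0, 5);
};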

Related

Laravel - Fastest way to get sum of one column from many results?

I'm building a web app, and on this web app the user page displays the user's comment karma (it works exactly like Reddit: basically the sum of the total scores of all their comments).
So in my user model I have a function to calculate this:
public function comments()
{
    return $this->hasMany(Comment::class, 'created_by', 'id');
}

public function commentKarma()
{
    return $this->comments->where('anonymous', 0)->sum('total_score');
}
However, I'm noticing that in my dev environment (I'm using Laravel Homestead, for reference), this query runs very slowly.
For example, for this user with only 174 comments, the query takes almost 5 seconds:
select * from comments where comments.created_by = 2 and comments.created_by is not null
4.62s
I don't want to imagine how long this query would take for a user with thousands of comments. What would be the best way to speed this up?
I was thinking of maybe setting up a scheduled task that would recalculate every user's karma every hour or so, but that doesn't seem very optimal either. Thoughts?
return $this->comments->where('anonymous', 0)->sum('total_score');
If you only want the sum (meaning the comments relationship is not loaded), this is not a good approach. It will load every comment associated with $this from the database, turn each one into an Eloquent model, put them all into an Eloquent Collection, and finally compute the sum from that collection.
If the comments relationship is already loaded, then the following does not apply, and adding an index to the comments table might be the better option.
return $this->comments()->where('anonymous', 0)->sum('total_score');
If you only want the sum, use the query builder (note the () on comments) to compute it in the database. This, on the other hand, will only work if total_score is a real database column and not an accessor.
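The difference shows up in the SQL: the query-builder version lets the database compute the aggregate in one statement, roughly the following (a sketch of what Eloquent generates, not a verbatim capture):
select sum(total_score) from comments where created_by = 2 and anonymous = 0
And if the filtered lookup itself is slow, a composite index on the filtered columns is a plausible next step; a minimal migration sketch with illustrative naming:
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

Schema::table('comments', function (Blueprint $table) {
    // Covers the where(created_by, anonymous) lookup used by commentKarma()
    $table->index(['created_by', 'anonymous']);
});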

How to query multiple aggregates efficiently with DDD?

When I need to invoke some business method, I need to get all the aggregate roots related to the operation, even if the operation is as primitive as the one given below (just adding an item into a collection). What am I missing? Or is a CRUD-based approach, where you run one single query including table joins, selects, and an insert at the end, and the database engine does all the work for you, actually better in terms of performance?
In the code below I need to query a separate aggregate root (which creates another database connection and sends another select query). In real-world applications I have been querying a lot more than one single aggregate, up to 8 for a single business action. How can I improve the performance/query overhead?
Domain aggregate roots:
class Device
{
    Set<ParameterId> parameters;

    void AddParameter(Parameter parameter)
    {
        parameters.Add(parameter.Id);
    }
}

class Parameter
{
    ParameterId Id { get; }
}
Application layer:
class DeviceApplication
{
    private DeviceRepository _deviceRepo;
    private ParameterRepository _parameterRepo;

    void AddParameterToDevice(string deviceId, string parameterId)
    {
        var aParameterId = new ParameterId(parameterId);
        var aDeviceId = new DeviceId(deviceId);

        var parameter = _parameterRepo.FindById(aParameterId);
        if (parameter == null) throw new ArgumentException("Parameter not found");

        var device = _deviceRepo.FindById(aDeviceId);
        if (device == null) throw new ArgumentException("Device not found");

        device.AddParameter(parameter);
        _deviceRepo.Save(device);
    }
}
Possible solution
I've been told that you can pass just an Id of another aggregate like this:
class Device
{
    void AddParameter(ParameterId parameterId)
    {
        parameters.Add(parameterId);
    }
}
But IMO it breaks encapsulation (by explicitly pushing the term "ID" into the business language), and it does nothing to prevent passing a wrong or otherwise invalid identity (e.g. one fabricated by a user).
Also, Vaughn Vernon gives examples of application services that use the first approach (passing the whole aggregate instance).
The short answer is: don't query your aggregates at all.
An aggregate is a model that exposes behaviour, not data. Generally, it is considered a code smell to have getters on aggregates (ID is the exception). This makes querying a little tricky.
Broadly speaking, there are 2 related ways to go about solving this. There are probably more, but at least these don't break the encapsulation.
Option 1: Use domain events -
By getting your domain (aggregate roots) to emit events which illustrate the changes to internal state you can build up tables in your database specifically designed for querying. Done right you will have highly performant, denormalised queryable data, which can be linearly scaled if necessary. This makes for very simple queries. I have an example of this on this blog post: How to Build a Master-Details View when using CQRS and Event Sourcing
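As a rough sketch of that idea (all the names here are hypothetical, and the Execute helper stands in for whatever persistence you use): when the Device aggregate emits a ParameterAddedToDevice event, a subscriber projects it into a flat table built purely for querying.
using System;

class ParameterAddedToDevice
{
    public Guid DeviceId { get; set; }
    public Guid ParameterId { get; set; }
}

class DeviceParametersProjection
{
    // Called by the event dispatcher after the aggregate's transaction commits
    public void Handle(ParameterAddedToDevice e)
    {
        Execute("INSERT INTO device_parameters (device_id, parameter_id) VALUES (@d, @p)",
                e.DeviceId, e.ParameterId);
    }

    void Execute(string sql, params object[] args) { /* persistence-specific */ }
}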
Option 2: Infer query tables -
I'm a huge fan of option 1, but if you don't have an event-sourced approach you will still need to persist the state of your aggregates at some point. There are all sorts of ways to do this, but you could plug into the persistence pipeline for your aggregates a process that extracts queryable data into a read model for use with your queries.
I hope that makes sense.
If you have figured out that an RDBMS query with joins would work in this case, you probably have the wrong aggregate boundaries.
For example, why would you need to load the Parameter in order to add it to the Device? You already have the identity of that Parameter; all you need to do is add its id to the list of referenced Parameters in the Device. If you are loading it just to satisfy your ORM, you are most probably doing something wrong.
Also remember that your aggregate is the transactional boundary. You really want to complete all database operations inside one transaction and one connection.
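If the worry from the question is that a raw id might be invalid, one option is a lightweight existence check instead of a full load. A sketch under that assumption; Exists is a hypothetical repository method backed by a cheap SELECT 1:
void AddParameterToDevice(string deviceId, string parameterId)
{
    var aParameterId = new ParameterId(parameterId);
    if (!_parameterRepo.Exists(aParameterId)) // hypothetical lightweight check
        throw new ArgumentException("Unknown parameter id");

    var device = _deviceRepo.FindById(new DeviceId(deviceId));
    device.AddParameter(aParameterId); // Device stores only the id
    _deviceRepo.Save(device);
}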

Getting specific Backbone.js models from a collection without getting all models first

I'm new to Backbone.js. I'm intrigued by the idea that you can just supply a URL to a collection and then proceed to create, update, delete, and get models from that collection, and it handles all the interaction with the API.
In the small task-management sample applications and numerous demos I've seen on the web, collection.fetch() is used to pull down all models from the server and then do something with them. However, more often than not, in a real application you don't want to pull down hundreds of thousands or even millions of records by issuing a GET to the API.
Using the baked-in collection.sync method, how can I specify parameters to GET specific record sets? For example, I may want to GET records with a date of 2/1/2014, or GET records owned by a specific user id.
In this question, collection.find is used to do this, but does that still pull down all records to the client first and then "find" them, or does the collection.sync method know to pass arguments along when doing a GET to the server?
You do use fetch, but you provide options as seen in collection.fetch([options]).
So for example to obtain the one model where id is myIDvar:
collection.fetch({
    data: { id: myIDvar },
    success: function (collection, response, options) {
        // do a little dance;
    }
});
My offhand recollection is that find, findWhere and where all operate on models already in the collection, so using them alone would mean downloading everything first and filtering on the client. With fetch, I believe, the filtering takes place on the server side.
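For what it's worth, the default sync hands the data hash to jQuery.ajax, so for a GET it is serialized onto the query string; the fetch above would issue a request along these lines (URL and value are illustrative):
GET /questions?id=123
The server, of course, has to understand that parameter and filter accordingly.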
You can implement some kind of pagination on the server side and update your collection with a limited number of records. That way all your data stays in sync with the backend.
You can do it by overriding the fetch method with your own implementation, or by specifying params.
For example:
collection.fetch({data: {page: 3}})
You can also use the findWhere method here:
collection.findWhere(attributes)
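A minimal sketch of the overriding approach; the endpoint, parameter names, and page size are assumptions your server API would have to share:
var Questions = Backbone.Collection.extend({
    url: '/questions',

    // Convenience wrapper: fetch a single page of results from the server
    fetchPage: function (page) {
        return this.fetch({
            data: { page: page, per_page: 50 }, // server must understand these params
            remove: false                       // keep previously loaded pages in the collection
        });
    }
});

var questions = new Questions();
questions.fetchPage(3);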

neo4j count nodes performance on 200K nodes and 450K relations

We're developing an application based on neo4j and PHP, with about 200K nodes, where every node has a property like type='user' or type='company' to denote a specific entity of our application. We need to get the count of all nodes of a specific type in the graph.
We created an index for every entity, like users and companies, which holds the nodes of that type. So inside the users index reside 130K nodes, and the rest are in companies.
With Cypher we query like this:
START u=node:users('id:*')
RETURN count(u)
And the results are:
Returned 1 row. Query took 4080ms
The server is configured with the defaults plus a few tweaks, but 4 seconds is too slow for our needs. Consider that the database will grow by about 20K nodes a month, so we need this query to perform very well.
Is there any other way to do this, maybe with Gremlin, or with some other server plugin?
I'll cache those results, but I want to know if it is possible to tweak this.
Thanks a lot, and sorry for my poor English.
Finally, using Gremlin instead of Cypher, I found the solution:
g.getRawGraph().index().forNodes('NAME_OF_USERS_INDEX').query(
    new org.neo4j.index.lucene.QueryContext('*')
).size()
This method uses the Lucene index to get an "approximate" row count.
Thanks again to all.
Mmh, this is really about the performance of that Lucene index. If you just need this single query most of the time, why not keep an integer with the total count on some node somewhere, update it together with the index insertions, and, for good measure, refresh it with the query above every night?
You could instead keep a property on a specific node up to date with the number of such nodes, where updates are done guarded by write locks:
Transaction tx = db.beginTx();
try {
    ...
    ...
    // The explicit write lock serializes concurrent increments on countingNode,
    // so two transactions cannot both read n and both write n+1 (a lost update).
    tx.acquireWriteLock( countingNode );
    countingNode.setProperty( "user_count",
        ((Integer) countingNode.getProperty( "user_count" )) + 1 );
    tx.success();
} finally {
    tx.finish();
}
If you want the best performance, don't model your entity categories as properties on the node. Instead, do it like this:
company1-[:IS_ENTITY]->companyentity
Or, if you are using 2.0:
company1:COMPANY
The second would also allow you to automatically update your index in a separate background thread, by the way; imo one of the best new features of 2.0.
The first method should also prove more efficient, since making a "hop" in general takes less time than reading a property from a node. It does, however, require you to create a separate index for the entities.
Your queries would look like this :
v2.0:
MATCH (company:COMPANY)
RETURN count(company)
v1.9:
START entity=node:entityindex(value='company')
MATCH company-[:IS_ENTITY]->entity
RETURN count(company)

ADO.NET (WCF) Data Services Query Interceptor Hangs IIS

I have an ADO.NET Data Service that's supposed to provide read-only access to a somewhat complex database.
Logically I have table-per-type (TPT) inheritance in my data model, but the EDM doesn't implement inheritance. (A limitation of Data Services and navigation properties on derived types. STILL not fixed in .NET 4!) I can query my EDM directly (using a separate project) with a copy of the query I'm trying to run against the web service, and results are returned within 10 seconds. Disabling the query interceptors, I'm able to make the same query against the web service and results are returned similarly quickly. I can enable some of the query interceptors and the results come back slowly, up to a minute or so later. Alternatively, I can enable all the query interceptors, expand fewer of the properties on the main object I'm querying, and results are returned in a similar period of time. (I've increased some of the timeout periods.)
Up to this point, SQL Profiler indicates the slowdown is in the database. (That's a post for a different day.) But when I enable all my query interceptors and expand all the properties I'd like to have, the IIS worker process pegs the CPU for 20 minutes and a query is never even made against the database, i.e. the query never makes it past the web server. This implies to me that, yes, my implementation probably sucks, but regardless, the Data Services "tier" is having an issue it shouldn't. WCF tracing didn't reveal anything interesting to my untrained eye.
Details:
Data model: Agent->Person->Student
Student has a collection of referrals
Students and referrals are private; queries against the web service should only return "your" students and referrals. This means Person and Agent need to be filtered too. Other entities (Agent->Organization->School) can be accessed by anyone who has authenticated.
The existing security model is poorly suited to perform this type of filtering for this type of data access, the query interceptors are complicated and cause EF to generate some entertaining sql queries.
Sample Interceptor
[QueryInterceptor("Agents")]
public Expression<Func<Agent, Boolean>> OnQueryAgents()
{
    // Agent is a Person(1), Educator(2), Student(3), or Other Person(13); allow if scope permissions exist
    return ag =>
        (ag.AgentType.AgentTypeId == 1 || ag.AgentType.AgentTypeId == 2 ||
         ag.AgentType.AgentTypeId == 3 || ag.AgentType.AgentTypeId == 13) &&
        ag.Person.OrganizationPersons.Count<OrganizationPerson>(op =>
            op.Organization.ScopePermissions.Any<ScopePermission>(p =>
                p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name &&
                p.ApplicationRoleAccount.Application.ApplicationId == 124) ||
            op.Organization.HierarchyDescendents.Any<OrganizationsHierarchy>(oh =>
                oh.AncestorOrganization.ScopePermissions.Any<ScopePermission>(p =>
                    p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name &&
                    p.ApplicationRoleAccount.Application.ApplicationId == 124))) > 0;
}
The query interceptors for Person, Student, Referral are all very similar, ie they traverse multiple same/similar tables to look for ScopePermissions as above.
Sample Query
This sample query is just that, a sample, intended to illustrate to third parties how to access the data using the provided web service. I realize a production query wouldn't have that many expands. (But also remember that to get the entire object in the OOP sense I need an Agent, Person, and Student row.)
var referrals =
    (from r in service.Referrals
         .Expand("Organization/ParentOrganization")
         .Expand("Educator/Person/Agent")
         .Expand("Student/Person/Agent")
         .Expand("Student")
         .Expand("Grade")
         .Expand("ProblemBehavior")
         .Expand("Location")
         .Expand("Motivation")
         .Expand("AdminDecision")
         .Expand("OthersInvolved")
     where
         r.DateCreated >= coupledays &&
         r.DateDeleted == null
     select r);
Any suggestions or tips would be greatly appreciated, whether for fixing my current implementation or for developing a new one, with the caveats that the existing database logic can't be changed (though I can add to it) and that ultimately I need to expose a large portion of the database via a web service that limits each caller to the data they are authorized for, for the purpose of data integration with multiple outside parties. These outside parties will be performing regular batch jobs to import our data into their databases/data warehouses.
THANK YOU!!!
UPDATE: Posted this issue on MSDN, received similar feedback. http://social.msdn.microsoft.com/Forums/en-US/adodotnetdataservices/thread/1ccfc96c-dd35-4879-b36b-57e915d5e02f/
I'm just guessing here... but doing that many expands is almost never a good idea. The query will undoubtedly expand into some pretty awful SQL, which could easily cause timeouts.
Add TPT to the equation and things only get worse :(
Alex
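As one way to act on that, the big expanded query could be split into a narrow base query plus targeted follow-up loads. A hedged sketch reusing the names from the sample above; service is assumed to be the DataServiceContext, and the trade-off is extra round trips in exchange for simpler SQL per request:
var referrals =
    (from r in service.Referrals
         .Expand("Student/Person/Agent")
         .Expand("Grade")
     where r.DateCreated >= coupledays && r.DateDeleted == null
     select r).ToList();

foreach (var r in referrals)
{
    // LoadProperty issues a separate, narrow request per navigation property
    service.LoadProperty(r, "ProblemBehavior");
    service.LoadProperty(r, "Location");
}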