Paging dynamic query results - sql

Paging using LINQ can be easily done using the Skip() and Take() extensions.
I have been scratching my head for quite some time now, trying to find a good way to perform paging of a dynamic collection of entities - i.e. a collection that can change between two sequential queries.
Assume a query that without paging will return 20 objects of type MyEntity.
The following two lines will trigger two DB hits that will populate results1 and results2 with all of the objects in the dataset.
List<MyEntity> results1 = query.Take(10).ToList();
List<MyEntity> results2 = query.Skip(10).Take(10).ToList();
Now let's assume the data is dynamic, and that a new MyEntity is inserted into the DB between the two queries, in such a way that the original query will place the new entity in the first page.
In that case, results2 list will contain an entity that also exists in results1,
causing duplication of results being sent to the consumer of the query.
Assuming a record from the first page was deleted, it will result missing a record that should have originally appear on results2.
I thought about adding a Where() clause to the query that verify that the records where not retrieved on a previous page, but it seems like the wrong way to go, and it won't help with the second scenario.
I thought about keeping a record of query executions' timestamps, attaching a LastUpdatedTimestamp to each entity and filtering entities that were changed after the previous page request. That direction will fail on the third page afterwards...
How is this normally done?
Is there a way to run a query against an old snapshot of the DB?
Edit:
The background is an Asp.NET MVC WebApi service that responds to a simple GET request to retrieve the list of entities. The client retrieves a page of entities, and presents them to the user. When scrolled down, the client sends another request to the service, to retrieve another page of entities, which should be added to the first page which is already presented to the user.

One way to solve this is to keep a TimeStamp of the first query and exclude any results that have been added after that time. However this solution puts the data into a "stale" state, i.e the data is fixed at a certain point in time, which is usually a bad idea unless you have a very good reason to do it.
Another way of doing this (again depending on your application) would be to store the list of all objects in memory and then only display a certain number of results at one time.
List<Entity> _list;
public class SomeCtor()
{
_list = _db.Entities.ToList();
}
public List<Entity> GetFirst10
{
get
{
return _list.Take(10);
}
}
public List<Entity> GetSecond10
{
get
{
return _list.Skip(10).Take(10);
}
}

Related

Checking Whether Table Data Exists, Updating / Inserting Into Two Tables & Posting End Outcome

I am working on my cron system which gathers informaiton via an API call. For most, it has been fairly straight forward, but now I am faced with multiple difficulties, as the API call is dependant on who is making the API request. It runs through each users API Key and certain information will be visible/hidden to them and visaversa to the public.
There are teams, and users are part of teams. A user can stealth their move, however all information will be showed to them and their team, however this will not be visible to their oponent, however both teams share the same id and have access tothe same informaiton, just one can see more of it than the other.
Defendants Point Of View
"attacks": {
"12345`": {
"timestamp": 1645345234,
"attacker_id": "",
"attacker_team_id": "",
"defender_id": 321,
"defender_team_id": 1,
"stealthed": 1
}
}
Attackers Point Of View
"attacks": {
"12345`": {
"timestamp": 1645345234,
"attacker_id": 123,
"attacker_team_id": 2
"defender_id": 321,
"defender_team_id": 1,
"stealthed": 1,
"boosters": {
"fair_fight": 3,
"retaliation": 1,
"group_attack": 1
}
}
}
So, if the defendant's API key is first used, id 12345 will already be in the team_attacks table but will not include the attacker_id and attacker_team_id. For each insert there after, I need to check to see whether the new insert's ID already exist and has any additional information to add to the row.
Here is the part of my code that loops through the API and obtains the data, it loops through all the attacks per API Key;
else if ($category === "attacks") {
$database = new Database();
foreach($data as $attack_id => $info) {
$database->query('INSERT INTO team_attacks (attack_id, attacker_id, attacker_team_id, defender_id, defender_team_id) VALUES (:attack_id, :attacker_id, :attacker_team_id, :defender_id, :defender_team_id)');
$database->bind(':attack_id', $attack_id);
$database->bind(':attacker_id', $info["attacker_id"]);
$database->bind(':attacker_team_id', $info["attacker_team_id"]);
$database->bind(':defender_id', $info["defender_id"]);
$database->bind(':defender_team_id', $info["defender_team_id"]);
$database->execute();
}
}
I have also been submitting to the news table, and typically I have simply been submitting X new entries have been added or whatnot, however I haven't a clue if there is a way to check during the above if any new entries and any updated entries to produce two news feeds:
2 attacks have bee updated.
49 new attack information added.
For this part, I was simply counting how many is in the array, but this only works for the first ever upload, I know I cannot simply count the array length on future inserts which require additional checks.
If The attack_id Does NOT Already Exist I also need to submit the boosters into another table, for this I was adding them to an array during the above loop and then looping through them to submit those, but this also depends on the above, not simply attempting to upload for each one without any checks. Boosters will share the attack_id.
With over 1,000 teams who will potentially have at least one members join my site, I need to be as efficient as this as possible. The API will give the last 100 attacks per call and I want this to be within my cron which collects any new data every 30 seconds, so I need to sort through potentially 100,000.
In SQL, you can check conditions when inserting new data using merge:
https://en.wikipedia.org/wiki/Merge_(SQL)
Depending on the database you are using, the name and syntax of the command might be different. Common names for the command are also upsert and replace.
But: If you are seeking for high performance and almost-realtimeness, consider using a cache holding critical aggregated data instead of doing the aggregation 100'000 times per minute.
This may or may not be the "answer" you're looking for. The question(s) imply use of a single table for both teams. It's worth considering one table per team for writes to avoid write contention altogether. The two data sets could be combined at query time in order to return "team" results via the API. At scale, you could have another process calculating and storing combined team results in an API-specific cache table that serves the API request.

Include count of linked records in OData result

I've got a table "Events" with a linked table "Registrations", and I want to create an OData service that returns records from the Events table, plus the number of registrations for each event. The data will be consumed client-side in JavaScript, so I want to keep the size of the returned data down and not include all linked registration records completely.
For example:
ID Title Date Regs
1 Breakfast 01.01.01 12:00 4
2 Party 01.01.01 20:00 20
I'm building the service with ASP.NET MVC4. The tables are in an MSSQL database. I am really just getting started with OData and LINQ.
I tried using the WebAPI OData system first (using classes of EntitySetController) but was getting cryptic server errors as soon as I included the Registrations table in the entity set. ("The complex type 'Models.Registration' refers to the entity type 'Models.Event' through the property 'Event'.")
I had more success building a WCF OData system, and can request event information and information on related registrations.
However, I have no clue how to include the aggregate count information in the event result set. Do I need to create a custom entity set that will be the source for the OData service? I probably included too litte information here for finding a solution, but I don't really know where to look. Can somebody help me?
If you're willing to make an extra request per Event, you could query http://.../YourService.svc/Events(<key>)/Registrations/$count (or http://.../YourService.svc/Events(<key>)/$links/Registrations?$inlinecount=allpages if you're also using the links to the Registration entities).
Examples of both of these approaches on a public service:
http://services.odata.org/V3/OData/OData.svc/Suppliers(0)/Products/$count
http://services.odata.org/V3/OData/OData.svc/Suppliers(0)/$links/Products?$inlinecount=allpages&$format=json
I'm guessing that you'd prefer this information to come bundled together with the rest of the Events response though. It's not ideal, but you could issue a query along these lines:
http://services.odata.org/V3/OData/OData.svc/Suppliers?$format=json&$expand=Products&$select=Products/ID,*
I'm expanding Products (analogous to your Registrations) and selecting Products/ID in order to force the response to include an array that is the same size as the nested Products collection. I don't care about ID -- I just chose a piece of data that would be small. With this JSON response, your javascript client can get the length of the Products array and use that as the number of Products that are linked to the given Supplier.
(Note: to have your service support $select queries using WCF Data Services, you'll need to include this line when you initialize the service: config.DataServiceBehavior.AcceptProjectionRequests = true;)
Edit to add: The approach using $expand and $select won't be guaranteed to give you the correct count if your server does server-driving paging. In general, there isn't a simple single-response way to do what you're asking for in OData v3, but in OData v4, this will be possible with the new expand/select syntax.
i'm using oData v4 and i used this syntax :
var url = '.../odata/clients?$expand=Orders($count=true)';
// ...
a field called Orders#odata.count has been added to the response entity which contains the correct count.
and now to access the JSON property containing a dash you have to do it like this :
var ordersCount = response.value['Orders#odata.count'];
hope this helps.
Can you edit your Event model and add a RegistrationCount property? That'd be the simplest way I think
What I ended up doing was actually very simple; I created a View in SQL Server that returns the table including the registration counts. Never thought about using a view, since I've never used them before...
I used this to get the child count without returning the entities:
/Parents$expand=Children($count=true;$top=0)

Selecting specific joined record from findAll() with a hasMany() include

(I tried posting this to the CFWheels Google Group (twice), but for some reason my message never appears. Is that list moderated?)
Here's my problem: I'm working on a social networking app in CF on Wheels, not too dissimilar from the one we're all familiar with in Chris Peters's awesome tutorials. In mine, though, I'm required to display the most recent status message in the user directory. I've got a User model with hasMany("statuses") and a Status model with belongsTo("user"). So here's the code I started with:
users = model("user").findAll(include="userprofile, statuses");
This of course returns one record for every status message in the statuses table. Massive overkill. So next I try:
users = model("user").findAll(include="userprofile, statuses", group="users.id");
Getting closer, but now we're getting the first status record for each user (the lowest status.id), when I want to select for the most recent status. I think in straight SQL I would use a subquery to reorder the statuses first, but that's not available to me in the Wheels ORM. So is there another clean way to achieve this, or will I have to drag a huge query result or object the statuses into my CFML and then filter them out while I loop?
You can grab the most recent status using a calculated property:
// models/User.cfc
function init() {
property(
name="mostRecentStatusMessage",
sql="SELECT message FROM statuses WHERE userid = users.id ORDER BY createdat DESC LIMIT 1,1"
);
}
Of course, the syntax of the SELECT statement will depend on your RDBMS, but that should get you started.
The downside is that you'll need to create a calculated property for each column that you need available in your query.
The other option is to create a method in your model and write custom SQL in <cfquery> tags. That way is perfectly valid as well.
I don't know your exact DB schema, but shouldn't your findAll() look more like something such as this:
statuses = model("status").findAll(include="userprofile(user)", where="userid = users.id");
That should get all statuses from a specific user...or is it that you need it for all users? I'm finding your question a little tricky to work out. What is it you're exactly trying to get returned?

How to not get related data in Entity Framework

I have the following scenario.
I have one table in the database A that contains a lot of entries. This table is lightweight and only have a bit of data for each entry.
Another table B has a many to one relationship with A, and can contain lots of data.
I have a client and a server in my application that communicates using WCF.
My problem is that when the client calls a method on the server that should return all the A's, I get much more data than I need to. On the server I basically have a single line of code:
return entityContext.A.ToList();
My problem is that on the client, if I debug and inspect the returned collection, each element has the property B populated with all the data from the expensive table.
I only needed the basic data from the A table to show a list on the client, but ended up sending a ton of data over the wire.
So the question is, how can I make the server ignore the B table when I fetch the data to send to the client. Basically, I need something like the oposite of Include.
You need to turn off lazy loading. It can be done either through designer or in code by calling:
entityContext.ContextOptions.LazyLoadingEnabled = false;

WCF data services - Limiting related objects returned based on critera

I have an object graph consisting of a base employee object, and a set of related message objects.
I am able to return the employee objects based on search criteria on the employee properties (eg team) etc. However, if I expand on the messages, I get the full collection of messages back. I would like to be able to either take the top n messages (i.e. restrict to 10 most recent) or ideally use a date range on the message objects to limit how many are brought back.
So far I have not been able to figure out a way of doing this:
I get an error if I attempt to filter on properties on the message (&$filter=employee/message/StartDate gives an error ">No property 'StartDate' exists in type 'System.Data.Objects.DataClasses.EntityCollection`1).
Attempting to use Top on the message related object doesn't work either.
I have also tried using a WebGet extension that takes a string list of employee IDs. That works until the list gets too long, and then fails due to the URL getting too long (it might be possible to setup a paging mechanism on this approach)...
Unfortunately the UI control I am using requires the data to be in a fairly specific hierarchical shape, so I can't easily come at this from starting on the message side and working backwards.
Outside of making multiple calls does anyone know of a method to accomplish this with wcf data services?
Thanks!
M.
Looks like the only real way of doing this is in fact to reverse the direction of the query.
So instead of starting from the Employee, I go from the message side. You can filter back on the employee properties, and restrict on the Messages collection. Its not ideal, as it means iterating the collection on return to re-center it on the employee for what I am attempting to do, but it will work. The async nature of silverlight and rich client at least means while an extra iteration is required, it still appears to be reasonably fast.
Another interesting thing to note: the current version of odata/wcf data services does not support querying on properties of inherited classes, so I had to move the start/end date properties up to the base class in order to be able to restrict my search on them.
http://Site/Service.svc/Messages()?&$filter=Employee/OfficeName eq 'Toronto' and (year(StartDate) eq 2010 and month(StartDate) ge 9 )