I'm using three TChains friended together in my analysis code, with addresses set only for some branches on each chain. What is the best way for me to use TTreeCache in this scenario? Do I have to manually specify which branches to cache? I want to cache all the branches that have their addresses set, but only those.
On a separate note, I'd like to first read entries only from the first tree, and fetch entries from the other friended trees only if I decide to analyze the event further.
Question
I have a complex query that joins three tables and returns a set of rows, with each row having data from its sibling tables. How is it possible to represent this in a RESTful way?
FWIW I know there is not necessarily a "right" way to do it, but I'm interested in learning about what might be the most extensible and durable solution for this situation.
Background
In the past I've represented single tables that more or less mirror the literal structure of the url. For example, the url GET /agents/1/policies would result in a query like select * from policies where agent_id = 1.
Assumption
It seems like the url doesn't necessarily have to be so tightly coupled to the structure of the database layer. For example, if the complex query was something like:
select
agent.name as agent_name,
policy.status as policy_status,
vehicle.year as vehicle_year
from
policies as policy
join agents as agent on policy.agent_id = agent.id
join vehicles as vehicle on vehicle.policy_id = policy.id
where 1=1
and policy.status = 'active';
# outputs something like:
{ "agent_name": "steve", "policy_status": "single", "vehicle_year": "1999" }
I could represent this QUERY as a url instead of TABLES as urls. The url for this could be /vehicles, and if someone were to want to query it (by id or some other parameter like /vehicles?vehicle_color=red) I could just pass that value into a prepared statement.
Bonus Questions
Is this an antipattern?
Should my queries always be run against EXISTING tables instead of prepared statements?
Thanks for your help!
You want to step back from the database tables and queries and think about the basic resources. In your examples, these are clearly agent, customer, vehicle and policy.
Resources vs Collections
One misstep I see in your examples is that you don't separate collections from resources using plurals, which can be useful when you are dealing with searching and, logistically, for your controller routes. In your example you have:
GET /agents/1/policies
Suppose instead, that this was GET /agent/1/policies.
Now you have a clear differentiation between location of an Idempotent resource: /agent/1, and finding/searching for a collection of agents: /agents.
Following this train of thought, you start to disassociate enumerating relationships from each side of the relationship in your API, which is inherently redundant.
In your example, clearly, policies are not specifically owned by an agent. A policy should be a resource that stands on its own, identifiable via some Idempotent url using whatever ID uniquely identifies that policy for the purpose of finding it, i.e. /policy/{Id}.
Searching Collections
What this now does is let you separate out the finding of policies, through /policies, where returning only the policies for a specific Agent is but one of a number of different ways you might access that collection.
So rather than having GET /agents/1/policies you would instead find the policies associated with an agent via: GET /policies?agent=1
The expected result of this would be a collection of resource identifiers for the matching policies:
{ "policies" : ["/policy/234234", "/policy/383282"] }
How do you then get the final result?
For a given policy, you would expect a complete return of associated information, as in your query, only without the limitations of the select clause. Since what you want is a filtered version, a way to handle that would be to include filter criteria.
GET /policy/234234?filter=agentName,policyStatus,vehicleYear
With that said, this approach has pitfalls, and I question it for a number of reasons. If you look at your original list of resources, each one can be considered an object. If you are building an object graph in the client, then the complete information for a policy would instead include resource locators for all the associated resources:
{ ... Policy data + "customer": "/customer/2834", "vehicle": "/vehicle/88328", "agent": "/agent/32" }
It is the job of the client to access the data for an agent, a vehicle and a customer, and not your job to regurgitate all that data redundantly anytime you need some view of that data.
This is better because it is both RESTful and supports many of the aims of REST, such as Idempotency and caching.
It also better allows the client to cache the data for an Agent locally, and to determine whether it needs to fetch that data or can simply use the data it has already cached. In the worst case there are maybe 3 or 4 REST calls that need to be made.
Bonus questions
REST has some grey areas. You have to interpret Fielding, and for that reason there are frequently differing opinions about how to do things. While the approach of providing an api like GET /agents/1/policies to list the policies associated with an agent is frequently used, there is a point where that becomes limiting and redundant in my experience, as it requires end users to become familiar with the way you model relationships between the underlying resources.
As for your question on queries, it makes no difference how you access the underlying data and organize it, so long as you are consistent. What often happens (for performance reasons) is that the api stops returning resource identifiers and starts returning the data directly, as I illustrated previously. This is a slippery slope where you are just turning your REST api into a frontend for a bunch of queries, and at that point your API might as well just be: GET /query?filter=agent.name, policy.status, vehicle.year&from=policies&join=agents,vehicles&where=...
So I have a custom PostgreSQL query that retrieves all rows within a specified longitude/latitude radius, like so:
SELECT *,
earth_distance(ll_to_earth($1,$2), ll_to_earth(lat, lng)) as distance_metres
FROM RoomsWithUsers
WHERE earth_box(ll_to_earth($1,$2), $3) @> ll_to_earth(lat, lng)
ORDER BY distance_metres;
And in my node server, I want to be notified every time the number of rows returned by this query changes. I have looked into using a Node library such as pg-live-query, but I would much rather use pg-pubsub, which works with the existing Postgres LISTEN/NOTIFY mechanism, in order to avoid unnecessary overhead. However, as far as I understand, PostgreSQL TRIGGERs only fire on UPDATE/INSERT/DELETE operations and not on any specific queries themselves. Is there any way to accomplish what I'm trying to do?
You need to set up the right triggers that will call NOTIFY for all the clients that use LISTEN on the same channel.
It is difficult to advise on how exactly to implement your NOTIFY logic inside the triggers, because it depends on the following:
How many clients is the message intended for?
How heavy/large is the query that's being evaluated?
Can the triggers know the logic of the query to evaluate it?
Based on the answers you might consider different approaches, which include, but are not limited to, the following options and their combinations:
execute the query/view when the outcome cannot be evaluated, and cache the result
provide smart notification, if the query's outcome can be evaluated
use the payload to pass in the update details to the listeners
schedule query/view re-runs for later execution, if it is heavy
do the entire notification as a separate job
Certain scenarios can grow quite complex. For example, you may have a master client that can do the change, and multiple slaves that need to be notified. In this case the master executes the query, checks if the result has changed, and then calls a function in the PostgreSQL server to trigger notifications across all slaves.
So again, lots and lots of variations are possible, depending on specific requirements of the task at hand. In your case you do not provide enough details to offer any specific path, but the general guidelines above should help you.
Async & LISTEN/NOTIFY is the right way!
You can add trigger(s) on UPDATE/INSERT and execute your query in the body of the trigger, save the number of rows in a simple table, and call NOTIFY if the value has changed. If you need multiple parameter combinations in the query, you can create/destroy triggers from inside your program.
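A minimal sketch of that approach, with the radius query's parameters baked in (the helper table room_count_cache, the channel name rooms_changed, and the hard-coded centre point and radius are all illustrative assumptions, since a trigger cannot see the $1/$2/$3 parameters of the original query):
-- helper table holding the last known row count
CREATE TABLE room_count_cache (id int PRIMARY KEY, row_count bigint);
INSERT INTO room_count_cache VALUES (1, 0);

CREATE OR REPLACE FUNCTION notify_room_count_change() RETURNS trigger AS $$
DECLARE
  new_count bigint;
  old_count bigint;
BEGIN
  -- re-run the radius query with fixed parameters and count the rows
  SELECT count(*) INTO new_count
  FROM RoomsWithUsers
  WHERE earth_box(ll_to_earth(51.5, -0.12), 5000) @> ll_to_earth(lat, lng);

  SELECT row_count INTO old_count FROM room_count_cache WHERE id = 1;

  IF new_count IS DISTINCT FROM old_count THEN
    UPDATE room_count_cache SET row_count = new_count WHERE id = 1;
    -- payload carries the new count so listeners need not re-query
    PERFORM pg_notify('rooms_changed', new_count::text);
  END IF;

  RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

-- DELETE is included as well, since deletes also change the row count
CREATE TRIGGER rooms_with_users_count
AFTER INSERT OR UPDATE OR DELETE ON RoomsWithUsers
FOR EACH STATEMENT EXECUTE PROCEDURE notify_room_count_change();
Your node server then simply LISTENs on the rooms_changed channel via pg-pubsub.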
When I specify multiple IDs as query parameters in my conceptual search, will the results contain only those documents which refer conceptually to all of the searched IDs? Or will they contain documents that refer conceptually to any one of the IDs?
The intended behavior of passing multiple ids to the /conceptual_search query is a logical AND. So the search is trying to find documents that have relationship to all the ids listed in the query.
The OR behavior cannot be performed through a single query, but can be emulated by performing separate queries to the individual ids, followed by a merge (with proper sorting) of the results on the client side.
Right now on my (beta) site I have a table called user data which stores name, hash(password), ipaddr, sessionkey, email and message number. Now I would like the user to have a profile description, signature, location (optional) and maybe other things. Should I have this in a separate MySQL table, or should I share the table? And why?
This is called vertical partitioning, and in general it will not make much difference whichever option you go for.
One vertical partitioning method is to split fields that are updated frequently from fields that are updated rarely. Imagine a Users table in Stack Overflow's database, having User_ID, DisplayName, Location, etc... and then UpVotes, DownVotes, LastSeen, NumberOfQuestionsAsked, etc. The first set of fields changes rarely (only on profile edits), while the latter set changes frequently (on normal activity).
Another splitting method is to separate data that is accessed frequently from data that is accessed rarely.
This kind of partitioning, in some particular situations, can yield better performance.
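As a rough sketch of that kind of split (table and column names here are purely illustrative):
-- rarely-updated profile fields
CREATE TABLE users (
  user_id      INT PRIMARY KEY,
  display_name VARCHAR(100),
  location     VARCHAR(100)
);

-- frequently-updated activity fields, one-to-one with users
CREATE TABLE user_activity (
  user_id    INT PRIMARY KEY,
  up_votes   INT NOT NULL DEFAULT 0,
  down_votes INT NOT NULL DEFAULT 0,
  last_seen  DATETIME,
  FOREIGN KEY (user_id) REFERENCES users (user_id)
);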
Use one table.
Because there is no reason to separate.
IMHO, the authentication information and the profile info should be separate. This helps keep your data secure. Of course, if the trust level is high, you can go for a merged table. But the information in the table is likely to grow over time, and in the end you will have to separate them out anyway. So why create a mess now?
If the tables you are thinking of will have a one-to-one relationship, there's usually no reason to do it.
Exceptions, both of which only apply when there are many, many columns:
if there are a few columns that will always be filled and the rest almost never -- in this case, there can be some benefit from omitting empty rows in the second table, and only fetching them when needed.
if there are a few columns that will constantly be retrieved/updated and the rest almost never.
But again, this is not an optimization you should do at the beginning. If you have your query code reasonably isolated, it's not hard to do this later on.
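As a small illustration of the first exception above, assuming a users table plus an optional user_profiles table (names are just illustrative), the second table is only queried when the rarely-used columns are actually needed:
-- cheap lookup used on every request
SELECT user_id, name, email FROM users WHERE user_id = 42;

-- only issued when the profile page is actually rendered
SELECT description, signature, location
FROM user_profiles
WHERE user_id = 42;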
There are also some relevant comments on this elsewhere on StackOverflow
I think it depends on the nature, or you could say the requirements, of your application.
I prefer to keep them in different tables.
Consider an example where I need the user's email, message number and name.
If I fetch the user from a table that also holds all my profile-related data, I get all the unwanted columns in the result set. To overcome this, I can SELECT only the columns I want, but that makes my query very ugly.
Similarly, when I need all the profile data, I have to list the profile columns in the SELECT clause.
So I always suggest separating the tables wherever possible.
I have a user table, and then a number of dependent tables with a one-to-many relationship,
e.g. an email table, an address table and a groups table. (i.e. one user can have multiple email addresses, physical addresses and can be a member of many groups)
Is it better to:
Join all these tables, and process the heap of data in code,
Use something like GROUP_CONCAT and return one row, and split apart the fields in code,
Or query each table independently?
Thanks.
It really depends on how much data you have in the related tables and on how many users you're querying at a time.
Option 1 tends to be messy to deal with in code.
Option 2 tends to be messy to deal with as well, and grouping tends to be slow, especially on large datasets.
Option 3 is easiest to deal with but generates more queries overall. If your dataset is small and you're not planning to scale much beyond your current needs, it's probably the best option. It's definitely the best option if you're only trying to display one record.
There is a fourth option, however, that is a middle-of-the-road approach, which I use in my job where we deal with a very similar situation. Instead of getting the related records for each row one at a time, use IN() to get all of the related records for your result set. Then loop in your code to match them to the appropriate record for display. If you cache search queries you can cache that second query as well. It's only two queries and only one loop in the code (no parsing; use hashes to relate things by their key).
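A sketch of that fourth option, assuming users, emails and addresses tables keyed by user_id (all names illustrative):
-- 1) the main search query returns the users of interest
SELECT user_id, name FROM users WHERE name LIKE 'A%';

-- 2) one batched query per related table, using the ids from step 1
SELECT user_id, email   FROM emails    WHERE user_id IN (1, 7, 42);
SELECT user_id, address FROM addresses WHERE user_id IN (1, 7, 42);
-- the application then hashes the related rows by user_id and attaches
-- them to the matching user record for display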
Personally, assuming my table indexes were up to scratch, I'd go with a table join, get all the data out in one go, and then process that to end up with a nested data structure. This way you're playing to each system's strengths.
Generally speaking, do the most efficient query for the situation you're in. So don't create a mega query that you use in all cases. Create case specific queries that return just the information you need.
In terms of processing the results, if you use GROUP_CONCAT you have to split all the resulting values during processing. If there are extra delimiter characters in your GROUP_CONCAT'd values, this can be problematic. My preferred method is to put the GROUPed BY field into a $holder during the output loop. Compare that field to the $holder each time through and change your output accordingly.
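A sketch of the joined query that the $holder approach would iterate over, ordered by the grouped-by field so rows for the same user arrive together (table and column names assumed):
SELECT u.user_id, u.name, e.email
FROM users u
LEFT JOIN emails e ON e.user_id = u.user_id
ORDER BY u.user_id;
-- while looping over the rows, compare u.user_id to the value kept in
-- $holder; when it changes, start a new output group for the next user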