Transcoding HTTP to gRPC: Same endpoint with different parameters

I already have a working gRPC project. I'm looking to build an API on top of it so that I can make HTTP requests.
I have the following 2 types:
message FindRequest {
ModelType model_type = 1;
oneof by {
string id = 2;
string name = 3;
}
}
message GetAllRequest {
ModelType model_type = 1;
int32 page_size = 2;
oneof paging {
int32 page = 3;
bool skip_paging = 4;
}
}
And then, I would like to have those 2 endpoints:
// Get a data set by ID or name. Returns an empty data set if there is no such
// data set in the database. If there are multiple data sets with the same
// name in the database it returns the first of these data sets.
rpc Find(FindRequest) returns (DataSet){
option (google.api.http) = { get: "/datasets" };
}
// Get (a page of) all data sets of a given type. If no page size is given
// (page_size <= 0) it defaults to 100. An unset page (page <= 0) defaults to the
// first page.
rpc GetAll(GetAllRequest) returns (GetAllResponse){
option (google.api.http) = { get: "/datasets" };
}
It makes sense to me to have two endpoints with the same path that differ only in their parameters.
For instance, requesting /datasets?model-type=XXX should be mapped to the GetAll function, while requesting /datasets?model-type=XXX&name=YYY should be mapped to the Find function.
However, this doesn't work. I guess the mapping fails, so neither endpoint returns anything.
I think the mapping would work if I could force a parameter to be required, but I am using proto3, which no longer allows required fields.
So how can I have two endpoints with the same path but different parameters in proto3?
I know it works if I use different endpoint names, for example /findDatasets for FindRequest, but is that advisable with respect to API naming best practices?

The conventional way to solve this problem is to use different methods. My hunch is that it's an anti-pattern to try to differentiate using the fields in the request string.
service YourService {
  rpc FindSomething(FindSomethingRequest) returns (FindSomethingResponse) {
    option (google.api.http) = { get: "/something/find" };
  }
  rpc ListSomething(ListSomethingRequest) returns (ListSomethingResponse) {
    option (google.api.http) = { get: "/something/list" };
  }
}
message FindSomethingRequest {
  ModelType model_type = 1;
  string id = 2;
  string name = 3;
}
message ListSomethingRequest {
  int32 page_size = 2;
  // A string, so that the first request can send "".
  string page_token = 3;
}
message ListSomethingResponse {
  repeated ModelType model_types = 1;
  int32 page_size = 2;
  string next_page_token = 3;
}
I'm not sure how your underlying data is structured, but I think it's better practice to model a thing with all of its possible properties and permit leaving some unset (e.g. either id or name, or possibly both, in FindSomethingRequest) rather than creating different message types for every possible query. You model the thing, not how you interact with it.
In your implementation (!) of FindSomething, you then deal with the permutations of how users of the message may populate the fields, perhaps reporting an error such as "Either id or name is required".
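As a rough illustration of that server-side check, here is a minimal Python sketch. FindSomethingRequest is a plain dataclass standing in for the generated protobuf message, and validate_find_request and lookup_key are hypothetical helper names; preferring id when both fields are set is an assumption, not something the answer prescribes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FindSomethingRequest:
    """Plain-Python stand-in for the generated protobuf message (hypothetical)."""
    model_type: str = ""
    id: str = ""    # proto3 strings are "" when unset
    name: str = ""


def validate_find_request(req: FindSomethingRequest) -> Optional[str]:
    """Return an error message, or None if the request is usable."""
    if not req.id and not req.name:
        return "Either id or name is required"
    return None


def lookup_key(req: FindSomethingRequest) -> Tuple[str, str]:
    """Pick the field to search by; id wins when both are set (an assumption)."""
    error = validate_find_request(req)
    if error is not None:
        raise ValueError(error)
    return ("id", req.id) if req.id else ("name", req.name)
```

In a real gRPC handler the error would more likely be reported as an INVALID_ARGUMENT status than as a Python exception.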
I think ListSomething's messages could be simpler too. You request a list of ModelTypes and give a page_size and a page_token (which could be ""). It returns a list of ModelTypes, the size of the page returned (possibly less than requested) and, if there is more data, a next_page_token that you can use to make the next ListSomething request.
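The token-based flow described here could look roughly like the following Python sketch. ITEMS, list_something and list_all are hypothetical names, the token simply encodes an offset (real services usually treat it as opaque), and the default page size of 100 is borrowed from the comment in the question.

```python
from typing import List, Tuple

ITEMS = [f"item-{i}" for i in range(7)]  # hypothetical backing data


def list_something(page_size: int, page_token: str = "") -> Tuple[List[str], str]:
    """Return one page plus the token for the next page ("" when there is none)."""
    if page_size <= 0:
        page_size = 100  # default when no page size is given
    start = int(page_token) if page_token else 0
    page = ITEMS[start:start + page_size]
    next_start = start + len(page)
    next_token = str(next_start) if next_start < len(ITEMS) else ""
    return page, next_token


def list_all(page_size: int) -> List[str]:
    """Client-side loop: keep requesting until next_page_token is empty."""
    results: List[str] = []
    token = ""
    while True:
        page, token = list_something(page_size, token)
        results.extend(page)
        if not token:
            return results
```

The client never needs to know the total size; it only echoes back whatever token the previous response contained.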

How do I perform a graph query and join?

I apologize for the title, I don't exactly know how to word it. But essentially, this is a graph-type query but I know RavenDB's graph functionality will be going away so this probably needs to be solved with Javascript.
Here is the scenario:
I have a bunch of documents of different types, call them A, B, C, D. Each of these particular types of documents have some common properties. The one that I'm interested in right now is "Owner". The owner field is an ID which points to one of two other document types; it can be a Group or a User.
The Group document has a 'Members' field containing IDs, each of which points to either a User or another Group.
It's worth noting that the documents in play have custom IDs that begin with their entity type. For example Users and Groups begin with user: and group: respectively. Example IDs look like this: user:john#castleblack.com or group:the-nights-watch. This comes into play later.
What I want to be able to do is the following type of query:
"Given that I have either a group id or a user id, return all documents of type a, b, or c where the group/user id is equal to or is a descendant of the document's owner."
In other words, I need to be able to return all documents that are owned by a particular user or group either explicitly or implicitly through a hierarchy.
I've considered solving this a couple different ways with no luck. Here are the two approaches I've tried:
Using a function within a query
With Dejan's help in an email thread, I was able to devise a function that would walk its way down the ownership graph. What this attempted to do was build a flat array of IDs which represented explicit and implicit owners (i.e. root + descendants):
declare function hierarchy(doc, owners){
owners = owners || [];
while(doc != null) {
let ownerId = id(doc)
if(ownerId.startsWith('user:')) {
owners.push(ownerId);
} else if(ownerId.startsWith('group:')) {
owners.push(ownerId);
doc.Members.forEach(m => {
let owner = load(m, 'Users') || load(m, 'Groups');
owners = hierarchy(owner, owners);
});
}
}
return owners;
}
I had two issues with this. First, I don't actually know how to use this in a query. I tried to use it as part of the where clause, but apparently that's not allowed:
from #all_docs as d
where hierarchy(d) = 'group:my-group-d'
// error: method hierarchy not allowed
Second, if I tried anything in the select statement, I got an error that I had exceeded the number of allowed statements.
As a custom index
I tried the same idea through a custom index. Essentially, I tried to create an index that would produce an array of IDs using roughly the same function above, so that I could just query where my id was in that array
map('#all_docs', function(doc) {
function hierarchy(n, graph) {
while(n != null) {
let ownerId = id(n);
if(ownerId.startsWith('user:')) {
graph.push(ownerId);
return graph;
} else if(ownerId.startsWith('group:')){
graph.push(ownerId);
n.Members.forEach(g => {
let owner = load(g, 'Groups') || load(g, 'Users');
hierarchy(owner, graph);
});
return graph;
}
}
}
function distinct(value, index, self){ return self.indexOf(value) === index; }
let ownerGraph = []
if(doc.Owner) {
let owner = load(doc.Owner, 'Groups') || load(doc.Owner, 'Users');
ownerGraph = hierarchy(owner, ownerGraph).filter(distinct);
}
return { Owners: ownerGraph };
})
// error: recursion is not allowed by the javascript host
The problem with this is that I'm getting an error that recursion is not allowed.
So I'm stumped now. Am I going about this wrong? I feel like this could be a subquery of sorts or a filter by function, but I'm not sure how to do that either. Am I going to have to do this in two separate queries (i.e. two round-trips), one to get the IDs and the other to get the docs?
Update 1
I've revised my attempt at the index to the following and I'm not getting the recursion error anymore, but assuming my queries are correct, it's not returning anything
// Entity/ByOwnerGraph
map('#all_docs', function(doc) {
function walkGraph(ownerId) {
let owners = []
let idsToProcess = [ownerId]
while(idsToProcess.length > 0) {
let current = idsToProcess.shift();
if(current.startsWith('user:')){
owners.push(current);
} else if(current.startsWith('group:')) {
owners.push(current);
let group = load(current, 'Groups')
if(!group) { continue; }
idsToProcess.concat(group.Members)
}
}
return owners;
}
let owners = [];
if(doc.Owner) {
owners.concat(walkGraph(doc.Owner))
}
return { Owners: owners };
})
// query (no results)
from index Entity/ByOwnerGraph as x
where x.Owners = "group:my-group-id"
// alternate query (no results)
from index Entity/ByOwnerGraph as x
where x.Owners ALL IN ("group:my-group-id")
I still can't use this approach in a query either as I get the same error that there are too many statements.
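One thing worth checking in the Update 1 index: JavaScript's Array.prototype.concat returns a new array and leaves the original untouched, so the results of idsToProcess.concat(group.Members) and owners.concat(walkGraph(doc.Owner)) are discarded unless they are assigned back (or push(...items) is used). That alone would leave Owners empty. The Python sketch below shows the same iterative walk with in-place queue updates, plus a visited set to guard against groups that contain each other. GROUPS and walk_graph are hypothetical stand-ins for the Groups collection and the index helper, not RavenDB APIs.

```python
from collections import deque
from typing import Dict, List

# Hypothetical in-memory stand-in for the Groups collection: group id -> member ids.
GROUPS: Dict[str, List[str]] = {
    "group:the-nights-watch": ["user:john#castleblack.com", "group:rangers"],
    "group:rangers": ["user:benjen#castleblack.com", "group:the-nights-watch"],  # cycle
}


def walk_graph(owner_id: str) -> List[str]:
    """Return owner_id plus every user/group reachable through Members."""
    owners: List[str] = []
    seen = set()
    to_process = deque([owner_id])
    while to_process:
        current = to_process.popleft()
        if current in seen:
            continue  # already handled; prevents endless loops on cyclic groups
        seen.add(current)
        if current.startswith("user:"):
            owners.append(current)
        elif current.startswith("group:"):
            owners.append(current)
            # extend() mutates the queue in place, unlike JavaScript's concat()
            to_process.extend(GROUPS.get(current, []))
    return owners
```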

What are the advantages of using tf.train.SequenceExample over tf.train.Example for variable length features?

Recently I read this guide on undocumented features in TensorFlow, as I needed to pass variable length sequences as input. However, I found the protocol for tf.train.SequenceExample relatively confusing (especially due to lack of documentation), and managed to build an input pipe using tf.train.Example just fine instead.
Are there any advantages to using tf.train.SequenceExample? Using the standard example protocol when there is a dedicated one for variable length sequences seems like a cheat, but does it bear any consequence?
Here are the definitions of the Example and SequenceExample protocol buffers, and all the protos they may contain:
message BytesList { repeated bytes value = 1; }
message FloatList { repeated float value = 1 [packed = true]; }
message Int64List { repeated int64 value = 1 [packed = true]; }
message Feature {
oneof kind {
BytesList bytes_list = 1;
FloatList float_list = 2;
Int64List int64_list = 3;
}
};
message Features { map<string, Feature> feature = 1; };
message Example { Features features = 1; };
message FeatureList { repeated Feature feature = 1; };
message FeatureLists { map<string, FeatureList> feature_list = 1; };
message SequenceExample {
Features context = 1;
FeatureLists feature_lists = 2;
};
An Example contains a Features, which contains a mapping from feature name to Feature, which contains either a bytes list, or a float list or an int64 list.
A SequenceExample also contains a Features, but it also contains a FeatureLists, which contains a mapping from list name to FeatureList, which contains a list of Feature. So it can do everything an Example can do, and more. But do you really need that extra functionality? What does it do?
Since each Feature contains a list of values, a FeatureList is a list of lists. And that's the key: if you need lists of lists of values, then you need SequenceExample.
For example, if you handle text, you can represent it as one big string:
from tensorflow.train import BytesList
BytesList(value=[b"This is the first sentence. And here's another."])
Or you could represent it as a list of words and tokens:
BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b".", b"And", b"here",
b"'s", b"another", b"."])
Or you could represent each sentence separately. That's where you would need a list of lists:
from tensorflow.train import BytesList, Feature, FeatureList
s1 = BytesList(value=[b"This", b"is", b"the", b"first", b"sentence", b"."])
s2 = BytesList(value=[b"And", b"here", b"'s", b"another", b"."])
fl = FeatureList(feature=[Feature(bytes_list=s1), Feature(bytes_list=s2)])
Then create the SequenceExample:
from tensorflow.train import SequenceExample, FeatureLists
seq = SequenceExample(feature_lists=FeatureLists(feature_list={
"sentences": fl
}))
And you can serialize it and perhaps save it to a TFRecord file.
data = seq.SerializeToString()
Later, when you read the data, you can parse it using tf.io.parse_single_sequence_example().
The link you provided lists some benefits. You can see how parse_single_sequence_example is used here https://github.com/tensorflow/magenta/blob/master/magenta/common/sequence_example_lib.py
If you managed to get the data into your model with Example, it should be fine. SequenceExample just gives a little more structure to your data and some utilities for working with it.

Why can I not use Continuation when using a proxy class to access MS CRM 2013?

So I have a standard service reference proxy class for MS CRM 2013 (i.e. right-click, add reference, etc.). I then found that CRM data calls are limited to 50 results, and I wanted to get the full list of results. I found two methods; one looks more correct but doesn't seem to work. I was wondering why it doesn't, and whether there is something I'm doing incorrectly.
Basic setup and process
crmService = new CrmServiceReference.MyContext(new Uri(crmWebServicesUrl));
crmService.Credentials = System.Net.CredentialCache.DefaultCredentials;
var accountAnnotations = crmService.AccountSet.Where(a => a.AccountNumber == accountNumber).Select(a => a.Account_Annotation).FirstOrDefault();
Using Continuation (something I want to work, but looks like it doesn't)
while (accountAnnotations.Continuation != null)
{
accountAnnotations.Load(crmService.Execute<Annotation>(accountAnnotations.Continuation.NextLinkUri));
}
With that method, .Continuation is always null and accountAnnotations.Count is always 50 (but there are more than 50 records).
After struggling with .Continuation for a while, I came up with the following alternative method (but it seems "not good"):
var accountAnnotationData = accountAnnotations.ToList();
var accountAnnotationFinal = accountAnnotations.ToList();
var index = 1;
while (accountAnnotationData.Count == 50)
{
accountAnnotationData = (from a in crmService.AnnotationSet
where a.ObjectId.Id == accountAnnotationData.First().ObjectId.Id
select a).Skip(50 * index).ToList();
accountAnnotationFinal = accountAnnotationFinal.Union(accountAnnotationData).ToList();
index++;
}
So the second method seems to work, but for any number of reasons it doesn't seem like the best. Is there a reason .Continuation is always null? Is there some setup step I'm missing or some nice way to do this?
The way to get all the records from CRM is to use paging. Here is an example with a query expression, but you can also use FetchXML if you prefer:
// Query using the paging cookie.
// Define the paging attributes.
// The number of records per page to retrieve.
int fetchCount = 3;
// Initialize the page number.
int pageNumber = 1;
// Initialize the number of records.
int recordCount = 0;
// Define the condition expression for retrieving records.
ConditionExpression pagecondition = new ConditionExpression();
pagecondition.AttributeName = "address1_stateorprovince";
pagecondition.Operator = ConditionOperator.Equal;
pagecondition.Values.Add("WA");
// Define the order expression to retrieve the records.
OrderExpression order = new OrderExpression();
order.AttributeName = "name";
order.OrderType = OrderType.Ascending;
// Create the query expression and add condition.
QueryExpression pagequery = new QueryExpression();
pagequery.EntityName = "account";
pagequery.Criteria.AddCondition(pagecondition);
pagequery.Orders.Add(order);
pagequery.ColumnSet.AddColumns("name", "address1_stateorprovince", "emailaddress1", "accountid");
// Assign the pageinfo properties to the query expression.
pagequery.PageInfo = new PagingInfo();
pagequery.PageInfo.Count = fetchCount;
pagequery.PageInfo.PageNumber = pageNumber;
// The current paging cookie. When retrieving the first page,
// pagingCookie should be null.
pagequery.PageInfo.PagingCookie = null;
Console.WriteLine("#\tAccount Name\t\t\tEmail Address");
while (true)
{
    // Retrieve the page.
    EntityCollection results = _serviceProxy.RetrieveMultiple(pagequery);
    if (results.Entities != null)
    {
        // Retrieve all records from the result set.
        foreach (Account acct in results.Entities)
        {
            // Print the name before the email address to match the header.
            Console.WriteLine("{0}.\t{1}\t\t{2}",
                ++recordCount,
                acct.Name,
                acct.EMailAddress1);
        }
    }
    // Check for more records.
    if (results.MoreRecords)
    {
        // Increment the page number to retrieve the next page.
        pagequery.PageInfo.PageNumber++;
        // Set the paging cookie to the paging cookie returned from current results.
        pagequery.PageInfo.PagingCookie = results.PagingCookie;
    }
    else
    {
        // If there are no more records, exit the loop.
        break;
    }
}

SQL to Magento model understanding

Understanding Magento Models by reference of SQL:
select * from user_devices where user_id = 1
select * from user_devices where device_id = 3
How could I perform the same queries using my Magento models, e.g. getModel("module/userdevice")?
Also, how can I find the number of rows for each query?
Following questions have been answered in this thread.
How to perform a where clause ?
How to retrieve the size of the result set ?
How to retrieve the first item in the result set ?
How to paginate the result set ? (limit)
How to name the model ?
You are referring to Collections
Some references for you:
http://www.magentocommerce.com/knowledge-base/entry/magento-for-dev-part-5-magento-models-and-orm-basics
http://alanstorm.com/magento_collections
http://www.magentocommerce.com/wiki/1_-_installation_and_configuration/using_collections_in_magento
lib/Varien/Data/Collection/Db.php and lib/Varien/Data/Collection.php
So, assuming your module is set up correctly, you would use a collection to retrieve multiple objects of your model type.
Syntax for this is:
$yourCollection = Mage::getModel('module/userdevice')->getCollection();
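Magento provides some useful collection features for developers. Each addFieldToFilter call adds an AND condition, so the following matches rows that satisfy both conditions from your example; to reproduce either query on its own, apply only the corresponding filter: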
$yourCollection = Mage::getModel('module/userdevice')->getCollection()
->addFieldToFilter('user_id', 1)
->addFieldToFilter('device_id', 3);
You can get the number of objects returned:
$yourCollection->count() or simply count($yourCollection)
EDIT
To answer the question posed in the comment: "what If I do not require a collection but rather just a particular object"
This depends if you still require both conditions in the original question to be satisfied or if you know the id of the object you wish to load.
If you know the id of the object then simply:
Mage::getModel('module/userdevice')->load($objectId);
but if you wish to still load based on the two attributes:
user_id = 1
device_id = 3
then you would still use a collection but simply return the first object (assuming that at most one object can ever satisfy both conditions).
For reuse, wrap this logic in a method and place in your model:
public function loadByUserDevice($userId, $deviceId)
{
$collection = $this->getResourceCollection()
->addFieldToFilter('user_id', $userId)
->addFieldToFilter('device_id', $deviceId)
->setCurPage(1)
->setPageSize(1)
;
foreach ($collection as $obj) {
return $obj;
}
return false;
}
You would call this as follows:
$userId = 1;
$deviceId = 3;
Mage::getModel('module/userdevice')->loadByUserDevice($userId, $deviceId);
NOTE:
You could shorten the loadByUserDevice to the following, though you would not get the benefit of the false return value should no object be found:
public function loadByUserDevice($userId, $deviceId)
{
$collection = $this->getResourceCollection()
->addFieldToFilter('user_id', $userId)
->addFieldToFilter('device_id', $deviceId)
;
return $collection->getFirstItem();
}

Proper Way to Retrieve More than 128 Documents with RavenDB

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
using (IDocumentSession session = GetRavenSession())
{
return session.Query<T>().Where(whereClause).ToList();
}
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 results by default. However, you can pair it with Skip(n) to page through all of them:
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
RavenQueryStatistics stats;
do
{
    nextGroupOfPoints = session.Query<T>()
        .Statistics(out stats)
        .Where(whereClause)
        .Skip(i * ElementTakeCount + skipResults)
        .Take(ElementTakeCount)
        .ToList();
    i++;
    // Account for results the server skipped while paging.
    skipResults += stats.SkippedResults;
    points.AddRange(nextGroupOfPoints);
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
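To make the interaction between the server-side cap and Skip/Take paging concrete, here is a small Python simulation. It is only a sketch: DOCS, query and query_all are hypothetical, MAX_PAGE_SIZE mirrors the 1024 default mentioned above, and the SkippedResults bookkeeping is left out for brevity.

```python
from typing import List

MAX_PAGE_SIZE = 1024      # server-side cap, mirroring the default mentioned above
DOCS = list(range(2500))  # hypothetical stored documents


def query(skip: int, take: int) -> List[int]:
    """Fake query: the server silently clamps take to MAX_PAGE_SIZE."""
    take = min(take, MAX_PAGE_SIZE)
    return DOCS[skip:skip + take]


def query_all() -> List[int]:
    """Page with skip/take until a short page signals the end."""
    results: List[int] = []
    while True:
        page = query(skip=len(results), take=MAX_PAGE_SIZE)
        results.extend(page)
        if len(page) < MAX_PAGE_SIZE:
            return results
```

Asking for a huge page in one call, as with Take(int.MaxValue), still yields at most MAX_PAGE_SIZE items in this model, which matches the 1024 results observed in the question.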
The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have few calls issued over them.
If you are getting more than 10 of anything from the store (even fewer than the default 128) for human consumption, then something is wrong, or your problem requires different thinking than a truckload of documents coming from the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you need to perform data aggregation, create a map/reduce index that produces the aggregated data, e.g.:
Index:
from post in docs.Posts
select new { post.Author, Count = 1 }
from result in results
group result by result.Author into g
select new
{
Author = g.Key,
Count = g.Sum(x=>x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count")(x=>x.Author)();
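For readers unfamiliar with map/reduce, the index above can be read as two plain functions. The Python sketch below mirrors it on in-memory data; posts, map_posts and reduce_counts are hypothetical names used only for illustration, and RavenDB itself evaluates the index incrementally on the server rather than like this.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

posts = [  # hypothetical Posts documents
    {"Author": "alice", "Title": "A"},
    {"Author": "bob", "Title": "B"},
    {"Author": "alice", "Title": "C"},
]


def map_posts(docs: Iterable[dict]) -> List[Tuple[str, int]]:
    """Map step: emit (Author, Count = 1) for every post."""
    return [(doc["Author"], 1) for doc in docs]


def reduce_counts(mapped: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: group by Author and sum Count."""
    totals: Dict[str, int] = defaultdict(int)
    for author, count in mapped:
        totals[author] += count
    return dict(totals)
```

The client then queries the small aggregated result (one entry per author) instead of pulling every post.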
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
// Without a filter:
// var query = session.Query<User, MyUserIndex>();
// With a Where clause on an indexed field:
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
    while (enumerator.MoveNext())
    {
        var user = enumerator.Current.Document;
        // do something
    }
}
Example index:
public class MyUserIndex: AbstractIndexCreationTask<User>
{
public MyUserIndex()
{
this.Map = users =>
from u in users
select new
{
u.IsDeleted,
u.Username,
};
}
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.