Determine request Uri from WCF Data Services LINQ query for FirstOrDefault against Azure without executing it? - wcf

Problem
I would like to trace the Uri that will be generated by a LINQ query executed against a Microsoft.WindowsAzure.StorageClient.TableServiceContext object. TableServiceContext just extends System.Data.Services.Client.DataServiceContext with a couple of properties.
The issue I am having is that the query executes fine against our Azure Table Storage instance when we run the web role on a dev machine in debug mode (we are connecting to Azure storage in the cloud not using Dev Storage). I can get the resulting query Uri using Fiddler or just hovering over the statement in the debugger.
However, when we deploy the web role to Azure the query fails against the exact same Azure Table Storage source with a ResourceNotFound DataServiceClientException. We have had ResoureNotFound errors before that dealt with the behavior of FirstOrDefault() on empty tables. This is not the problem here.
As one approach to the problem, I wanted to compare the query Uri that is being generated when the web role is deployed versus when it is running on a dev machine.
Question
Does anyone know a way to get the query Uri for the query that will be sent when the FirstOrDefault() method is called. I know that you can call ToString() on the IQueryable returned from the TableServiceContext but my concern is that when FirstOrDefault() is called the Uri might be further optimized and ToString() on IQueryable might not be what is ultimately sent to the server when FirstOrDefault() is called.
If someone has another approach to the problem I am open to suggestions. It seems to be a general problem with LINQ when trying to determine what will happen when the expression tree is finally evaluated. I am open to suggestions here as well because my LINQ skills could use some improvement.
Sample Code
public void AddSomething(string ProjectID, string Username) {
TableServiceContext context = new TableServiceContext();
var qry = context.Somethings.Where(m => m.RowKey == Username
&& m.PartitionKey == ProjectID);
System.Diagnostics.Trace.TraceInformation(qry.ToString());
// ^ Here I would like to trace the Uri that will be generated
// and sent to the server when the qry.FirstOrDefault() call below is executed.
if (qry.FirstOrDefault() == null) {
// ^ This statement generates an error when the web role is running
// in the fabric
...
}
}
Edit Update and Answer
Steve provided the write answer. Our problem was as exactly described in this post which describes an issue with PartitionKey/RowKey ordering in Single Entity query which was fixed with an update to the Azure OS. This explains the discrepancy between our dev machines and when the web role was deployed to Azure.
When I indicated we had dealt with the ResourceNotFound issue before in our existence checks, we had dealt with it in two ways in our code. One way was using exception handling to deal with the ResourceNotFound error the other way was to put the RowKey first in the LINQ query (as some MS people had indicated was appropriate).
It turns out we have several places where the RowKey was first instead of using the exception handling. We will address this by refactoring our code to target .NET 4 and using the .IgnoreResourceNotFoundException = true property of theTableServiceContext .
Lesson learned (more than once): Don't depend on quirky undocumented behavior.
Aside
We were able to get the query Uri's. They did turn out to be different (as indicated they would be in the blog post). Here are the results:
Query Uri from Dev Fabric
`https://ourproject.table.core.windows.net/Somethings()?$filter=(RowKey eq 'test19#gmail.com') and (PartitionKey eq '41e0c1ae-e74d-458e-8a93-d2972d9ea53c')
Query Uri from Azure Fabric
`https://ourproject.table.core.windows.net/Somethings(RowKey='test19#gmail.com',PartitionKey='41e0c1ae-e74d-458e-8a93-d2972d9ea53c')

I can do one better... I think I know what the problem is. :)
See http://blogs.msdn.com/b/windowsazurestorage/archive/2010/07/26/how-wcf-data-service-changes-in-os-1-4-affects-windows-azure-table-clients.aspx.
Specifically, it used to be the case (in previous Guest OS builds) that if you wrote the query as you did (with the RowKey predicate before the PartitionKey predicate), it resulted in a filter query (while the reverse, PartitionKey preceding RowKey) resulted in the kind of query that raises an exception if the result set is empty.
I think the right fix for you (as indicated in the above blog post) is to set the IgnoreResourceNotFoundException to true on your context.

Related

Elastic APM show total number of SQL Queries executed on .Net Core API Endpoint

Currently have Elastic Apm setup with: app.UseAllElasticApm(Configuration); which is working correctly. I've just been trying to find a way to record exactly how many SQL Queries are run via Entity Framework for each transaction.
Ideally when viewing the Apm data in Kibana the metadata tab could just include an EntityFramework.ExecutedSqlQueriesCount.
Currently on .Net Core 2.2.3
One thing you can use is the Filter API for this.
With that you have access to all transactions and spans before they are sent to the APM Server.
You can't run through all the spans on a given transaction, so you need some tweaking - for this I use a Dictionary in my sample.
var numberOfSqlQueries = new Dictionary<string, int>();
Elastic.Apm.Agent.AddFilter((ITransaction transaction) =>
{
if (numberOfSqlQueries.ContainsKey(transaction.Id))
{
// We make an assumption here: we assume that all SQL requests on a given transaction end before the transaction ends
// this in practice means that you don't do any "fire and forget" type of query. If you do, you need to make sure
// that the numberOfSqlQueries does not leak.
transaction.Labels["NumberOfSqlQueries"] = numberOfSqlQueries[transaction.Id].ToString();
numberOfSqlQueries.Remove(transaction.Id);
}
return transaction;
});
Elastic.Apm.Agent.AddFilter((ISpan span) =>
{
// you can't relly filter whether if it's done by EF Core, or another database library
// but you have all sorts of other info like db instance, also span.subtype and span.action could be helpful to filter properly
if (span.Context.Db != null && span.Context.Db.Instance == "MyDbInstance")
{
if (numberOfSqlQueries.ContainsKey(span.TransactionId))
numberOfSqlQueries[span.TransactionId]++;
else
numberOfSqlQueries[span.TransactionId] = 1;
}
return span;
});
Couple of thing here:
I assume you don't do "fire and forget" type of queries, if you do, you need to handle those extra
The counting isn't really specific to EF Core queries, but you have info like db name, database type (mssql, etc.) - hopefully based on that you'll be able filter the queries you want.
With transaction.Labels["NumberOfSqlQueries"] we add a label to the given transction, and you'll be able to see this data on the transaction in Kibana.

Enable Impala Impersonation on Superset

Is there a way to make the logged user (on superset) to make the queries on impala?
I tried to enable the "Impersonate the logged on user" option on Databases but with no success because all the queries run on impala with superset user.
I'm trying to achieve the same! This will not completely answer this question since it does not still work but I want to share my research in order to maybe help another soul that is trying to use this instrument outside very basic use cases.
I went deep in the code and I found out that impersonation is not implemented for Impala. So you cannot achieve this from the UI. I found out this PR https://github.com/apache/superset/pull/4699 that for whatever reason was never merged into the codebase and tried to copy&paste code in my Superset version (1.1.0) but it didn't work. Adding some logs I can see that the configuration with the impersonation is updated, but then the actual Impala query is with the user I used to start the process.
As you can imagine, I am a complete noob at this. However I found out that the impersonation thing happens when you create a cursor and there is a constructor parameter in which you can pass the impersonation configuration.
I managed to correctly (at least to my understanding) implement impersonation for the SQL lab part.
In the sql_lab.py class you have to add in the execute_sql_statements method the following lines
with closing(engine.raw_connection()) as conn:
# closing the connection closes the cursor as well
cursor = conn.cursor(**database.cursor_kwargs)
where cursor_kwargs is defined in db_engine_specs/impala.py as the following
#classmethod
def get_configuration_for_impersonation(cls, uri, impersonate_user, username):
logger.info(
'Passing Impala execution_options.cursor_configuration for impersonation')
return {'execution_options': {
'cursor_configuration': {'impala.doas.user': username}}}
#classmethod
def get_cursor_configuration_for_impersonation(cls, uri, impersonate_user,
username):
logger.debug('Passing Impala cursor configuration for impersonation')
return {'configuration': {'impala.doas.user': username}}
Finally, in models/core.py you have to add the following bit in the get_sqla_engine def
params = extra.get("engine_params", {}) # that was already there just for you to find out the line
self.cursor_kwargs = self.db_engine_spec.get_cursor_configuration_for_impersonation(
str(url), self.impersonate_user, effective_username) # this is the line I added
...
params.update(self.get_encrypted_extra()) # already there
#new stuff
configuration = {}
configuration.update(
self.db_engine_spec.get_configuration_for_impersonation(
str(url),
self.impersonate_user,
effective_username))
if configuration:
params.update(configuration)
As you can see I just shamelessy pasted the code from the PR. However this kind of works only for the SQL lab as I already said. For the dashboards there is an entirely different way of querying Impala that I did not still find out.
This means that queries for the dashboards are handled in a different way and there isn't something like this
with closing(engine.raw_connection()) as conn:
# closing the connection closes the cursor as well
cursor = conn.cursor(**database.cursor_kwargs)
My gut (and debugging) feeling is that you need to first understand the sqlalchemy part and extend a new ImpalaEngine class that uses a custom cursor with the impersonation conf. Or something like that, however it is not simple (if we want to call this simple) as the sql_lab part. So, the trick is to find out where the query is executed and create a cursor with the impersonation configuration. Easy, isnt'it ?
I hope that this could shed some light to you and the others that have this issue. Let me know if you did find out another way to solve this issue, or if this comment was useful.
Update: something really useful
A colleague of mine succesfully implemented impersonation with impala without touching any superset related, but instead working directly with the impyla lib. A PR was open with the code to change. You can apply the patch directly in the impyla src used by superset. You have to edit both dbapi.py and hiveserver2.py.
As a reminder: we are still testing this and we do not know if it works with different accounts using the same superset instance.

Changing the GemFire query ResultSender batch size

I am experiencing a performance issue related to the default batch size of the query ResultSender using client/server config. I believe the default value is 100.
If I run a simple query to get keys (with some order by columns due to the PARTITION Region type), this default batch size causes too many chunks being sent back for even 1000 records. In my tests, even the total query time is only less than 100 ms, however, the app takes more than 10 seconds to process those chunks.
Reading between the lines in your problem statement, it seems you are:
Executing an OQL query on a PARTITION Region (PR).
Running the query inside a Function as recommended when executing queries on a PR.
Sending batch results (as opposed to streaming the results).
I also assume since you posted exclusively in the #spring-data-gemfire channel, that you are using Spring Data GemFire (SDG) to:
Execute the query (e.g. by using the SDG GemfireTemplate; Of course, I suppose you could also be using the GemFire Query API inside your Function directly, too)?
Implemented the server-side Function using SDG's Function annotation support?
And, are possibly (indirectly) using SDG's BatchingResultSender, as described in the documentation?
NOTE: The default batch size in SDG is 0, NOT 100. Zero means stream the results individually.
Regarding #2 & #3, your implementation might look something like the following:
#Component
class MyApplicationFunctions {
#GemfireFunction(id = "MyFunction", batchSize = "1000")
public List<SomeApplicationType> myFunction(FunctionContext functionContext) {
RegionFunctionContext regionFunctionContext =
(RegionFunctionContext) functionContext;
Region<?, ?> region = regionFunctionContext.getDataSet();
if (PartitionRegionHelper.isPartitionRegion(region)) {
region = PartitionRegionHelper.getLocalDataForContext(regionFunctionContext);
}
GemfireTemplate template = new GemfireTemplate(region);
String OQL = "...";
SelectResults<?> results = template.query(OQL); // or `template.find(OQL, args);`
List<SomeApplicationType> list = ...;
// process results, convert to SomeApplicationType, add to list
return list;
}
}
NOTE: Since you are most likely executing this Function "on Region", the FunctionContext type will actually be a RegionFunctionContext in this case.
The batchSize attribute on the SDG #GemfireFunction annotation (used for Function "implementations") allows you to control the batch size.
Of course, instead of using SDG's GemfireTemplate to execute queries, you can, of course, use the GemFire Query API directly, as mentioned above.
If you need even more fine grained control over "result sending", then you can simply "inject" the ResultSender provided by GemFire to the Function, even if the Function is implemented using SDG, as shown above. For example you can do:
#Component
class MyApplicationFunctions {
#GemfireFunction(id = "MyFunction")
public void myFunction(FunctionContext functionContext, ResultSender resultSender) {
...
SelectResults<?> results = ...;
// now process the results and use the `resultSender` directly
}
}
This allows you to "send" the results however you see fit, as required by your application.
You can batch/chunk results, stream, whatever.
Although, you should be mindful of the "receiving" side in this case!
The 1 thing that might not be apparent to the average GemFire user is that GemFire's default ResultCollector implementation collects "all" the results first before returning them to the application. This means the receiving side does not support streaming or batching/chunking of the results, allowing them to be processed immediately when the server sends the results (either streamed, batched/chunked, or otherwise).
Once again, SDG helps you out here since you can provide a custom ResultCollector on the Function "execution" (client-side), for example:
#OnRegion("SomePartitionRegion", resultCollector="myResultCollector")
interface MyApplicationFunctionExecution {
void myFunction();
}
In your Spring configuration, you would then have:
#Configuration
class ApplicationGemFireConfiguration {
#Bean
ResultCollector myResultCollector() {
return ...;
}
}
Your "custom" ResultCollector could return results as a stream, a batch/chunk at a time, etc.
In fact, I have prototyped a "streaming" ResultCollector implementation that will eventually be added to SDG, here.
Anyway, this should give you some ideas on how to handle the performance problem you seem to be experiencing. 1000 results is not a lot of data so I suspect your problem is mostly self-inflicted.
Hope this helps!
John,
Just to clarify, I use client/server topology(actually wan, but that is not important in here). My client is a spring boot web app which has kendo grid as ui. Users can filter/sort on any combination of the columns, which will be passed to the spring boot app for generating dynamic OQL and create the pagination. Till now, except for being dynamic, my OQL queries are quite straight forward. I do not want to introduce server side functions due to the complexity of our global deployment process. But I can if you think that is something I have to do.
Again, thanks for your answers.

GemFire getRegion() returns null whereas OQL query gives result

I am using Pivotal GemFire 9.0.0 with 1 Locator and 1 Server. The Server has a Region called "submissions", like below -
<gfe:replicated-region id="submissionsRegion" name="submissions"
statistics="true" template="replicateRegionTemplate">
...
</gfe:replicated-region>
I am getting Region as null when executing the following code -
Region<K, V> region = clientCache.getRegion("submissions");
Surprisingly, the same ClientCache returns all the records when I query using OQL and QueryService as shown below -
String queryString = "SELECT * FROM /submissions";
QueryService queryService = clientCache.getQueryService();
Query query = queryService.newQuery(queryString);
SelectResults results = (SelectResults) query.execute();
I am initializing my ClientCache like this -
ClientCache clientCache = new ClientCacheFactory()
.addPoolLocator("localhost", 10479)
.set("name", "MyClientCache")
.set("log-level", "error")
.create();
I am really baffled by this. Any pointer or help would be great.
You need to configure your ClientCache (either through a cache.xml or pure GemFire API) with the regions as well. Using your example:
ClientRegionFactory regionFactory = clientCache.createClientRegionFactory(ClientRegionShortcut.PROXY);
Region region = regionFactory.create("submissions");
The ClientRegionShortcut.PROXY is used just for the sake of simplicity, you should use the shortcut that meets your needs.
The OQL works as expected because you are obtaining the QueryService through the ClientCache.getQueryService() method (instead of ClientCache.getLocalQueryService()), so the query is actually executed on Server Side.
You can get more information about how to configure the Client/Server topology in
Client/Server Configuration.
Hope this helps.
Cheers.
Yes, you need to "define" the corresponding client-side Region, matching the server-side REPLICATE Region by name (i.e. "submissions"). Actually this is a requirement independent of the server Regions' DataPolicy type (e.g. REPLICATE or PARTITION).
This is necessary since not every client wants to know about or even needs have data/events from every possible server Region. Of course, this is also configurable through subscription and "Interests Registration" (with Client/Server Event Messaging, or alternatively, CQs).
Anyway, you can completely avoid the use of the GemFire API directly or even GemFire's native cache.xml (highly recommend avoiding) by using either SDG's XML namespace...
<gfe:client-cache properties-ref="gemfireProperties" ... />
<gfe:client-region id="submissions" shortcut="PROXY"/>
Or by using Spring JavaConfig with SDG's API...
#Configuration
class GemFireConfiguration {
Properties gemfireProperties() {
Properties gemfireProperties = new Properties();
gemfireProperties.setProperty("log-level", "config");
...
return gemfireProperties;
}
#Bean
ClientCacheFactoryBean gemfireCache() {
ClientCacheFactoryBean gemfireCache = new ClientCacheFactoryBean();
gemfireCache.setClose(true);
gemfireCache.setProperties(gemfireProperties());
...
return gemfireCache;
}
#Bean(name = "submissions");
ClientRegionFactoryBean submissionsRegion(GemFireCache gemfireCache) {
ClientRegionFactoryBean submissions = new ClientRegionFactoryBean();
submissions.setCache(gemfireCache);
submissions.setClose(false);
submissions.setShortcut(ClientRegionShortcut.PROXY);
...
return submissions;
}
...
}
The "submissions" Region can be wrapped with SDG's GemfireTemplate, which will handle getting the "correct" QueryService on your behalf when running queries using the find(..) method.
Of course, you may be interested in making your client "submissions" Region a CACHING_PROXY" too. Of course, you will then need to register "interests" in the keys or data of interests. CQs are the best way to do this as it uses query criteria to define the data of "interests".
CACHING_PROXY is exactly as it sounds, caching data locally in the client based on the interests policies. This also gives you the ability to use the "local" QueryService to query data locally, avoiding the network hop.
Anyway, many options here.
Cheers,
John

JayData oData request with custom headers - ROUND 2

Few month back I was working on some Odata WCF project and I had some problems with parsing custom headers for token auth (apiKey).
At that time, being quite a noob (still am!), I posted this SO question: JayData oData request with custom headers
Today I am working on a new project with Jaydata Odata server and client library and this:
application.context.prepareRequest = function (r) {
r[0].headers['apikey'] = '123456';
};
was working fine till I had to do a MERGE request. I found out that somehow MERGE request was overriding my headers so I investigated further.
It appears at first that in the oDataProvider.js (~line 617) in the _saveRest method the headers are not inherited:
request = {
requestUri: this.providerConfiguration.oDataServiceHost + '/',
headers: {
MaxDataServiceVersion: this.providerConfiguration.maxDataServiceVersion
}
};
but a few lines later we get:
this.context.prepareRequest.call(this, requestData);
which "should" call my own prepareRequest, but doesnt... Instead it still points to:
//Line 11302 jaydata.js
prepareRequest: function () { },
which of course does... nothing! Funnilly enough, when you execute a simple GET the same code supposedly on the same context instance works and points to my prepareRequest override.
I can assert with enough confidence that somehow the context between GET/MERGE is not the same instance. I cant see, however, any place where the context instance is reassigned.
Has anyone got a clue?
PS: this is NOT a CORS issue. My OPTIONS is passing fine and manually feeding the headers in oDataProvider works.
More
I followed the lead on different context instances and found something interesting. calling EntitySet.save() ends up calling the EntityContext constructor. see trace:
$data.Class.define.constructor (jaydata.js:10015)
EntityContext (VM110762:7)
Service (VM110840:8)
storeToken.factory (jaydata.js:14166)
$data.Class.define._getContextPromise (jaydata.js:13725)
$data.Class.define._getStoreContext (jaydata.js:13700)
$data.Class.define._getStoreEntitySet (jaydata.js:13756)
$data.Class.define.EntityInstanceSave (jaydata.js:13837)
$data.Entity.$data.Class.define.save (jaydata.js:9774)
(anonymous function) (app.js:162) // My save()
That explains why I get two different instances...
Hack
Replacing the prepareRequest function direcly in the class definition works, but its ugly!
for now I can cope with this:
$data.EntityContext.prototype.prepareRequest = function (r) {
r[0].headers['apikey'] = '12345';
};
This works fine as long as you only need to talk to a single endpoint.
Final word based on my experience
As much as I like JayData, it is obvious that they created a monster and its getting out of their hands (poor forum, no community, half-documented,...).
I chose JD because I was lazy and wanted to keep working with my old WCF DataService. Switching to Web API seemed wrong or too much work for me.
Also as a .net dev I liked strong typing of my entities and the ability to work with a concrete model generated from the JD tools. However, in the end, I was adding confusion. Every time my server side model changed I had to fetch the new metadata and scaffold a new entityModel.
I ended up by switching to Web Api and migrated my data service layer to Breeze. And seriously! its a breeze to work with it!
The documentation is absolutely brilliant and here on S.O you can always count on Ward or Jay Tarband to reply with a very high amount of professionalism.
In the end I realize this should probably be more a wiki than a Question.....