dask jobqueue and scheduler_file

For dask_jobqueue, is it OK to pass both an SGECluster and a scheduler_file when creating a Client?
Something like this:
client = Client(cluster, scheduler_file='shirley.json')
The reason is that I want my workers from dask_jobqueue to run on a specified IP/port. Thanks so much for the help.

Related

Enable Impala Impersonation on Superset

Is there a way to make Superset run its queries on Impala as the logged-in user?
I tried enabling the "Impersonate the logged on user" option under Databases, but without success: all the queries still run on Impala as the superset user.

I'm trying to achieve the same! This does not fully answer the question, since it still does not work completely, but I want to share my research in case it helps someone else who is trying to use this tool outside very basic use cases.
I went deep into the code and found that impersonation is not implemented for Impala, so you cannot achieve this from the UI. I found this PR, https://github.com/apache/superset/pull/4699, which for whatever reason was never merged into the codebase, and tried to copy and paste its code into my Superset version (1.1.0), but it didn't work. After adding some logs I can see that the configuration with the impersonation is updated, but the actual Impala query still runs as the user I used to start the process.
As you can imagine, I am a complete noob at this. However, I found that the impersonation happens when you create a cursor: there is a constructor parameter through which you can pass the impersonation configuration.
I managed to correctly (at least to my understanding) implement impersonation for the SQL Lab part.
In sql_lab.py, add the following lines to the execute_sql_statements method:
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
where cursor_kwargs is populated from the following class methods, added to db_engine_specs/impala.py:
@classmethod
def get_configuration_for_impersonation(cls, uri, impersonate_user, username):
    logger.info(
        'Passing Impala execution_options.cursor_configuration for impersonation')
    return {'execution_options': {
        'cursor_configuration': {'impala.doas.user': username}}}
@classmethod
def get_cursor_configuration_for_impersonation(cls, uri, impersonate_user,
                                               username):
    logger.debug('Passing Impala cursor configuration for impersonation')
    return {'configuration': {'impala.doas.user': username}}
Finally, in models/core.py, add the following to the get_sqla_engine method:
params = extra.get("engine_params", {})  # already there; shown so you can find the line
self.cursor_kwargs = self.db_engine_spec.get_cursor_configuration_for_impersonation(
    str(url), self.impersonate_user, effective_username)  # this is the line I added
...
params.update(self.get_encrypted_extra())  # already there
# new stuff
configuration = {}
configuration.update(
    self.db_engine_spec.get_configuration_for_impersonation(
        str(url),
        self.impersonate_user,
        effective_username))
if configuration:
    params.update(configuration)
As you can see, I just shamelessly pasted the code from the PR. However, as I already said, this only works for SQL Lab. Dashboards query Impala in an entirely different way that I have not figured out yet.
This means the dashboard code path has nothing like this:
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
My gut (and debugging) feeling is that you first need to understand the SQLAlchemy part and then write a new ImpalaEngine class that uses a custom cursor with the impersonation configuration, or something like that. It is not as simple (if we can call it simple) as the sql_lab part. So the trick is to find out where the query is executed and create the cursor there with the impersonation configuration. Easy, isn't it?
I hope this sheds some light for you and for others who have this issue. Let me know if you find another way to solve it, or if this comment was useful.
Update: something really useful
A colleague of mine successfully implemented impersonation with Impala without touching anything Superset-related, working directly with the impyla library instead. A PR was opened with the code to change. You can apply the patch directly to the impyla source used by Superset; you have to edit both dbapi.py and hiveserver2.py.
As a reminder: we are still testing this, and we do not know whether it works with different accounts using the same Superset instance.
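The core idea of the SQL Lab change (the impersonated user travels as a keyword argument at cursor creation) can be tried out without Superset or impyla. This is only a minimal sketch: FakeConnection, FakeCursor, cursor_kwargs_for and effective_user are hypothetical names made up for illustration, and only the 'configuration' keyword and the 'impala.doas.user' key are taken from the snippets above.

```python
from contextlib import closing


def cursor_kwargs_for(username):
    """Build the keyword arguments that the snippets above pass to conn.cursor()."""
    return {'configuration': {'impala.doas.user': username}}


class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor; it only records its configuration."""

    def __init__(self, configuration=None):
        self.configuration = configuration or {}

    def close(self):
        pass


class FakeConnection:
    """Hypothetical stand-in for the object returned by engine.raw_connection()."""

    def __init__(self):
        self.closed = False

    def cursor(self, **kwargs):
        return FakeCursor(**kwargs)

    def close(self):
        self.closed = True


def effective_user(conn, username):
    # Same shape as the sql_lab.py snippet: the kwargs are applied when the cursor is created.
    cursor = conn.cursor(**cursor_kwargs_for(username))
    return cursor.configuration.get('impala.doas.user')


with closing(FakeConnection()) as conn:
    print(effective_user(conn, 'alice'))  # prints: alice
```

Any code path that creates its cursor without these keyword arguments never sees the impersonated user, which would be consistent with the dashboards behaving differently from SQL Lab.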

Perl6 Redis stuck when ask for output

For some unknown reason (nothing shows up in the Redis log either), this piece of code hangs forever. Please help.
use v6;
use Redis;
my $redis = Redis.new("127.0.0.1:6379");
$redis.auth("xxxxxxxxx");
$redis.set("key", "value");
say $redis.get("key");
say $redis.info();
$redis.quit();

I wonder if the issue is that the Redis library is a bit old and there have been a few changes to the runtime in the intervening time.
Have you tried Redis::Async? It seems more up to date.

MediaWiki Database: Why the incredibly long response time?

I have been consolidating three databases into one using table prefixes in my MediaWiki installation. I now have three wikis using the same database, like so:
en_interwiki
de_interwiki
es_interwiki
Everything works fine from a visitor's perspective, but whenever a user wants to post a new article or commit edits, the database takes up to 35 seconds to respond. This is unacceptable.
I activated debugging like so:
# Debugging:
$wgDBerrorLog = '/var/log/mediawiki/WikiDBerror.log';
$wgShowSQLErrors = true;
$wgDebugDumpSql = true;
$wgDebugLogFile = '/var/log/mediawiki/WikiDebug.log';
$wgShowDBErrorBacktrace = true;
I am getting debug info, and it seems that pagelinks is the culprit, but I am not one hundred percent sure.
Has anyone had this issue before?
Please help me!
Best regards,
Max

I was able to fix it. In my case, memcache was configured with the wrong port. Everything is back to normal.
In case anyone uses memcache with their MediaWiki installation: be sure to use the right port on your server, or you will end up like me, with 30-second wait times.
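A quick way to check whether anything is listening on the configured port is a plain TCP connection attempt. The sketch below uses Python rather than PHP; port_is_open is a hypothetical helper, and 127.0.0.1:11211 is an assumption (memcached's conventional default address), so substitute whatever host and port your wiki is configured with.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == '__main__':
    # Assumed address: adjust to the memcache host and port in your configuration.
    print(port_is_open('127.0.0.1', 11211))
```

If this reports False for the address the wiki is configured with, the cache is unreachable from that machine, which is the kind of misconfiguration described above.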

Rails 3 - cache web service call

In my application, in the homepage action, I call a specific web service that returns JSON.
parsed = JSON.parse(open("http://myservice").read)
@history = parsed['DATA']
This data will not change more than once per 60 seconds and does not change on a per-visitor basis, so I would ideally like to cache the @history variable itself (since re-parsing will not produce a different result) and automatically invalidate it once it is more than a minute old.
I'm unsure of the best way to do this. The default Rails caching methods all seem to be more oriented towards content that needs to be manually expired. I'm sure there is a quick and easy method to do this, I just don't know what it is!

You can use the built in Rails cache for this:
@history = Rails.cache.fetch('parsed_myservice_data', :expires_in => 1.minute) do
  JSON.parse connector.get_response("http://myservice")
end
One problem with this approach arises when rebuilding the cached data takes quite a long time. If many client requests arrive during that window, each of them gets a cache miss and calls your block, resulting in a lot of duplicated effort, not to mention slow response times.
EDIT: In Rails 3.x you can pass the option :race_condition_ttl to the fetch method to avoid this problem. See the documentation for ActiveSupport::Cache::Store#fetch for details.
A good solution in earlier versions of Rails is to set up a background/cron job that runs at regular intervals, fetches and parses the data, and updates the cache.
In your controller or model:
@history = Rails.cache.fetch('parsed_myservice_data') do
  JSON.parse connector.get_response("http://myservice")
end
In your background/cron job:
Rails.cache.write('parsed_myservice_data',
  JSON.parse(connector.get_response("http://myservice")))
This way, client requests will always get fresh cached data (except for the first request, if the background/cron job hasn't run yet).
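The expire-after-a-minute behaviour that fetch with :expires_in provides is not specific to Rails. The sketch below shows the idea in Python rather than Ruby; TTLCache, its fetch method, load_history and the injected clock are hypothetical names for illustration, not Rails APIs, and the clock is faked so the expiry can be shown without waiting.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after a fixed number of seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def fetch(self, key, expires_in, compute):
        """Return the cached value for key, recomputing it once it has expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive call
        value = compute()
        self._entries[key] = (now + expires_in, value)
        return value


calls = []


def load_history():
    # Stand-in for the web service call plus JSON parsing.
    calls.append(1)
    return {'DATA': ['entry']}


fake_now = [0.0]
cache = TTLCache(clock=lambda: fake_now[0])

cache.fetch('history', 60, load_history)  # miss: calls load_history
cache.fetch('history', 60, load_history)  # hit: served from the cache
fake_now[0] = 61.0
cache.fetch('history', 60, load_history)  # expired: calls load_history again
print(len(calls))  # prints: 2
```

Like a plain fetch call, this sketch does nothing to stop several requests from missing at the same moment and all recomputing the value; that is the gap the :race_condition_ttl option and the background job are meant to close.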

I don't know of an easy Rails-y way of doing this. You might want to look into using Redis, which lets you set expiration times on the data you store in it. Depending on which Redis gem you use, it would look something like this:
@history = $redis.get('history')
if @history
  @history = JSON.parse(@history)  # Redis stores strings, so decode the cached JSON
else
  @history = JSON.parse(open("http://myservice").read)['DATA']
  $redis.set('history', @history.to_json)  # serialize before storing
  $redis.expire('history', 60)  # drop the key after 60 seconds
end
Because there is only one Redis service, this will work for all your Rails processes.

We had a similar requirement and we ended up using Squid as a forward proxy for all the webservice calls from the rails server. Squid was configured to have a cache-expiry time of 60 seconds.
http_connection_factory.rb:
class HttpConnectionFactory
  def self.connection
    AppConfig.use_forward_proxy ? Net::HTTP::Proxy(AppConfig.forward_proxy_host, AppConfig.forward_proxy_port) : Net::HTTP
  end
end
In your application's home page action, you can use the proxy instead of making the call directly.
connector = HttpConnectionFactory.connection
parsed = JSON.parse(connector.get_response("http://myservice"))
@history = parsed['DATA']
We did consider using Redis or Memcache, but we had several service calls and wanted to avoid the hassle of generating keys and sweeping them at appropriate times.
So, in our case, the forward proxy took care of all those details. Please refer to the Squid wiki for the necessary configuration parameters.

Problems starting an NServiceBus

I've created a very simple NServiceBus console application to send messages. However, I cannot start the bus: it fails with a very vague 'Object reference not set to an instance of an object' error.
Configure config = Configure.With();
config = config.DefaultBuilder();
config = config.BinarySerializer();
config = config.UnicastBus();
IStartableBus startableBus = config.CreateBus();
IBus Bus2 = startableBus.Start(); // **barf**
It's driving me mad, what am I missing? I thought the DefaultBuilder should be filling in any blanks?

Hmm, it looks like a reference to ncqrs.NserviceBus is causing it to go wrong, even though I'm not actually using it yet.

It looks like manually passing the assemblies to the With() overload did the trick. I'm not sure what was upsetting it, but that's for another day.
