python script to connect to mongodb master - pymongo

I need to write a script to deploy Databases/Collections/Indexes.
Boto gives me a list of IPs to connect to.
What is the preferred way to figure out which mongo instance from a list is primary using pymongo? Should I loop through them or is there a more elegant approach?

from pymongo import MongoClient

def find_primary(instances):
    # Connect to each instance and return the first one that reports itself as primary.
    for inst in instances:
        client = MongoClient(inst, 8000)
        if client.is_primary:
            return inst
    return None
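If the instances form a replica set, a more elegant option is to pass the whole list to MongoClient as a seed list and let the driver discover the primary itself; writes (creating collections and indexes) are then routed to the primary automatically. A minimal sketch, assuming the port 8000 from the question and a replica set named "rs0" (a placeholder):
from pymongo import MongoClient

instances = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # e.g. the IPs returned by boto
seeds = ["{}:8000".format(ip) for ip in instances]

# The replica set name below is an assumption; use your deployment's actual name.
client = MongoClient(seeds, replicaSet="rs0", serverSelectionTimeoutMS=5000)

# Force server selection so the topology is discovered, then ask for the primary.
client.admin.command("ping")
print(client.primary)  # (host, port) of the current primary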

How can I open my child database with FaunaDB Shell?

I have a FaunaDB database "RaspberryPi" with a child database "00000000790f4c7c".
How can I open the child database "00000000790f4c7c"?
I've tried to open 00000000790f4c7c and RaspberryPi/00000000790f4c7c, but both just error out:
MacBook-Air:~ takeyuki$ fauna shell RaspberryPi/00000000790f4c7c
Error: Database 'RaspberryPi/00000000790f4c7c' doesn't exist
MacBook-Air:~ takeyuki$ fauna shell 00000000790f4c7c
Error: Database '00000000790f4c7c' doesn't exist
Thank you for your kind help!
Unfortunately the shell doesn't have great support for nested databases at the moment. You can either create an endpoint for the parent, say "RaspberryPi", with an admin key and then invoke fauna shell 00000000790f4c7c; or create a key inside RaspberryPi with CreateKey({role: "server", database: "00000000790f4c7c"}) and create an endpoint with that secret; or access the child directly with fauna shell --secret XXX, where XXX is the secret from the created key.
The key (no pun intended) is that, whatever your current endpoint is, fauna shell $db tries to access a database $db nested inside the database pointed at by that endpoint. By default that's /, so fauna shell $db lands in /$db; if you have an endpoint /$parent and invoke fauna shell $child, you end up in /$parent/$child. If you leave off $db, you end up in whatever database the endpoint points at. So if you have an endpoint n components deep, you have access to that database and all its children at depth n + 1 only.
Better support for nested databases is on the roadmap, because that's not particularly ergonomic.
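For concreteness, the CreateKey route looks roughly like this, assuming your default endpoint uses an admin key; the secret is a placeholder you copy from the key that CreateKey returns:
$ fauna shell RaspberryPi
> CreateKey({ role: "server", database: "00000000790f4c7c" })
# copy the "secret" field from the returned key, then:
$ fauna shell --secret <secret-from-CreateKey>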

Idiomatic approach to conditional update of key

I'd like to use Redis to cache the most recent piece of data that a user has sent to me. However, I can't just use SET, because the user may send data out of order, I need to condition the SET based on the value of another key, e.g.:
latest_timestamp = GET "latest_timestamp:<new_data.user_id>"
if latest_timestamp < new_data.timestamp {
SET "latest_timestamp:<new_data.user_id>" new_data.timestamp
SET "latest_data:<new_data.user_id>" new_data.to_string()
}
What is the idiomatic way to handle this situation?
A server-side Lua script (see EVAL) is the idiomatic-est approach IMO.
Make sure that your code passes the full names (i.e. does all substitutions) of both keys, as well as the new timestamp and the new data as arguments. The script should look something like this:
-- KEYS[1] = latest_timestamp:<user_id>, KEYS[2] = latest_data:<user_id>
-- ARGV[1] = new timestamp, ARGV[2] = new data
local lts = tonumber(redis.call('GET', KEYS[1]))
local nts = tonumber(ARGV[1])
-- Update only when no timestamp is stored yet or the new one is more recent
if lts == nil or lts < nts then
    redis.call('SET', KEYS[1], nts)
    redis.call('SET', KEYS[2], ARGV[2])
end
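From the application side, the script just needs the two fully substituted key names plus the new timestamp and payload as arguments. A minimal sketch using redis-py; the key names and the shape of new_data are assumptions based on the question:
import redis

r = redis.Redis()

# The same script as above; register_script lets redis-py handle EVAL/EVALSHA caching.
SET_IF_NEWER = r.register_script("""
local lts = tonumber(redis.call('GET', KEYS[1]))
local nts = tonumber(ARGV[1])
if lts == nil or lts < nts then
    redis.call('SET', KEYS[1], nts)
    redis.call('SET', KEYS[2], ARGV[2])
end
""")

def cache_latest(new_data):
    # new_data is assumed to be a dict with "user_id", "timestamp" and a serializable payload
    SET_IF_NEWER(
        keys=[
            "latest_timestamp:{}".format(new_data["user_id"]),
            "latest_data:{}".format(new_data["user_id"]),
        ],
        args=[new_data["timestamp"], str(new_data)],
    )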

How to list all databases and tables in AWS Glue Catalog?

I created a Development Endpoint in the AWS Glue console and now I have access to SparkContext and SQLContext in gluepyspark console.
How can I access the catalog and list all databases and tables? The usual sqlContext.sql("show tables").show() does not work.
What might help is the CatalogConnection class, but I have no idea which package it is in. I tried importing it from awsglue.context with no success.
I spent several hours trying to find some info about the CatalogConnection class but haven't found anything (not even in the aws-glue-libs repository, https://github.com/awslabs/aws-glue-libs).
In my case I needed the table names in the Glue job script console.
In the end I used the boto3 library and retrieved the database and table names with the Glue client:
import boto3

client = boto3.client('glue', region_name='us-east-1')

responseGetDatabases = client.get_databases()
databaseList = responseGetDatabases['DatabaseList']
for databaseDict in databaseList:
    databaseName = databaseDict['Name']
    print('\ndatabaseName: ' + databaseName)

    responseGetTables = client.get_tables(DatabaseName=databaseName)
    tableList = responseGetTables['TableList']
    for tableDict in tableList:
        tableName = tableDict['Name']
        print('\n-- tableName: ' + tableName)
The important thing is to set up the region properly.
Reference:
get_databases - http://boto3.readthedocs.io/en/latest/reference/services/glue.html#Glue.Client.get_databases
get_tables - http://boto3.readthedocs.io/en/latest/reference/services/glue.html#Glue.Client.get_tables
Glue returns one page per response. If you have more than 100 tables, make sure you use NextToken to retrieve them all.
import boto3

glue_client = boto3.client('glue')

def get_glue_tables(database=None):
    next_token = ""
    while True:
        response = glue_client.get_tables(
            DatabaseName=database,
            NextToken=next_token
        )
        for table in response.get('TableList'):
            print(table.get('Name'))
        next_token = response.get('NextToken')
        if next_token is None:
            break
The boto3 api also supports pagination, so you could use the following instead:
import boto3

glue = boto3.client('glue')
paginator = glue.get_paginator('get_tables')
page_iterator = paginator.paginate(
    DatabaseName='database_name'
)
for page in page_iterator:
    print(page['TableList'])
That way you don't have to mess with while loops or the next token.
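If you want the full picture (every database and every table in it), the two paginators can be combined. A small sketch, assuming credentials and region are already configured:
import boto3

glue = boto3.client('glue')

# Walk every database in the catalog, then every table inside each database.
for db_page in glue.get_paginator('get_databases').paginate():
    for database in db_page['DatabaseList']:
        print(database['Name'])
        for table_page in glue.get_paginator('get_tables').paginate(DatabaseName=database['Name']):
            for table in table_page['TableList']:
                print('  ' + table['Name'])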

Default project id in BigQuery Java API

I am performing a query using the BigQuery Java API with the following code:
try (FileInputStream input = new FileInputStream(serviceAccountKeyFile)) {
    GoogleCredentials credentials = GoogleCredentials.fromStream(input);
    BigQuery bigQuery = BigQueryOptions.newBuilder()
            .setCredentials(credentials)
            .build()
            .getService();
    QueryRequest request = QueryRequest.of("SELECT * FROM foo.Bar");
    QueryResponse response = bigQuery.query(request);
    // Handle the response ...
}
Notice that I am using a specific service account whose key file is given by serviceAccountKeyFile.
I was expecting that the API would pick up the project_id from the key file. But it is actually picking up the project_id from the default key file referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable.
This seems like a bug to me. Is there a way to workaround the bug by setting the default project explicitly?
Yeah, that doesn't sound right at all; it does sound like a bug. I always just export the GOOGLE_APPLICATION_CREDENTIALS environment variable in our applications.
Anyway, try explicitly setting the project ID to see if it works:
BigQuery bigQuery = BigQueryOptions.newBuilder()
        .setCredentials(credentials)
        .setProjectId("project-id") // <-- try setting it here
        .build()
        .getService();
I don't believe the project is coming from GOOGLE_APPLICATION_CREDENTIALS. I suspect that the project being picked up is the gcloud default project set by gcloud init or gcloud config set project.
From my testing, BigQuery doesn't default to the project where the service account was created. I think the key is used only for authorization, and you always have to set a target project. There are a number of ways:
.setProjectId(<target-project>) in the builder
Define GOOGLE_CLOUD_PROJECT
gcloud config set project <target-project>
The query job will then be created in target-project. Of course, your service key must have access to target-project, which may or may not be the project where the key was created. That is, you can run queries on projects other than the one where your key was created, as long as the key has permission to do so.

connect to two databases in dhis.conf

I need to deploy a second instance of DHIS2 on my server. I already have the first one running very well.
The challenge is that DHIS2 uses a single configuration file, shown below, and I am not sure how to set up a connection to my second database.
Please advise.
connection.dialect = org.hibernate.dialect.PostgreSQLDialect
connection.driver_class = org.postgresql.Driver
connection.url = jdbc:postgresql:millenium
connection.username = dhis
connection.password = dhis
connection.schema = update
encryption.password = abcd
You can run your instances with separate environment variables that point to two different DHIS2_HOME paths. Each path can contain its own dhis.conf, with its own database.
For example, if you are using Tomcat to host your instances, you can use each instance's setenv.sh file to set up the DHIS2_HOME variable, as sketched below.
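A minimal sketch of what that could look like, assuming two separate Tomcat instances with hypothetical directories; each DHIS2_HOME holds its own dhis.conf:
# /opt/tomcat-dhis2-second/bin/setenv.sh  (paths are placeholders)
# This Tomcat instance gets its own DHIS2_HOME, and therefore its own dhis.conf
export DHIS2_HOME=/opt/dhis2/instance2
The dhis.conf in that second directory can then use a different connection.url (and credentials) for the second database.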