I'm trying to prototype using the SmartRedis Python client to interact with the SmartSim Orchestrator. Is it possible to launch the orchestrator without any other models in the experiment? If so, what would be the best way to do so?
It is entirely possible. A SmartSim Experiment can contain different types of 'entities', including Models, Ensembles (i.e. groups of Models), and the Orchestrator (i.e. the Redis-backed database). None of these entities, however, is required to be in the Experiment.
Here's a short script that creates an experiment which includes only a database.
from smartsim import Experiment

NUM_DB_NODES = 3

exp = Experiment("Database Only")
db = exp.create_database(db_nodes=NUM_DB_NODES)
exp.generate(db)
exp.start(db)
After this, the Orchestrator (with the number of shards specified by NUM_DB_NODES) will have been spun up. You can then connect the SmartRedis Python client using the following line:
client = smartredis.Client(db.get_address()[0], NUM_DB_NODES > 1)
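Once connected, the client can move data in and out of the Orchestrator without any Model in the Experiment. A minimal sketch, assuming SmartRedis and NumPy are installed and the database above is running; the address and tensor name here are placeholders for illustration:

```python
def cluster_flag(num_db_nodes):
    # SmartRedis needs cluster=True only when the Orchestrator is sharded
    return num_db_nodes > 1

def put_and_get_demo(address, num_db_nodes):
    # Not executed here: requires a running Orchestrator at `address`
    import numpy as np
    from smartredis import Client

    client = Client(address=address, cluster=cluster_flag(num_db_nodes))
    client.put_tensor("example_tensor", np.array([1.0, 2.0, 3.0]))
    return client.get_tensor("example_tensor")
```

Note that the cluster flag mirrors the `NUM_DB_NODES > 1` expression above: a single-shard database is a plain Redis server, while multiple shards form a Redis cluster.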
I have the configuration below and I want to write a test case for it in Ruby. I am new to Ruby and want to understand how to point Fog's configuration at its mock and use it in a test case.
class TestUploader < CarrierWave::Uploader::Base
  storage :fog

  def fog_credentials
    {
      :provider => 'google',
      :google_project => 'my project',
      :google_json_key_location => 'myCredentialFile.json'
    }
  end

  def fog_provider
    'fog/google'
  end

  def fog_directory
    '{#bucket-name}'
  end

  def store_dir
    when :File
      "#{file.getpath}/file"
    when :audio
      "#{file.getpath}/audio"
    else
      p " Invalid file "
    end
  end
end

class TestModel
  mount_uploader :images, TestUploader
end
Could someone please assist me, from configuration through writing and executing a unit test on it, with a few examples? Any help would be really appreciated.
From the test I did, I got the following sample code working with Google Cloud Storage using the Fog gem:
require "fog/google"

# Uncomment the following line if you want to use Mock
# Fog.mock!

# Bucket name
bucket = "an-existing-bucket"
# Timestamp used as sample string
test = Time.now.utc.strftime("%Y%m%d%H%M%S")

connection = Fog::Storage.new({
  :provider => "Google",
  :google_project => "your-project",
  :google_json_key_location => "path-to-key.json",
})

# Lists objects in a bucket
puts connection.list_objects(bucket)

# Creates a new object
connection.put_object(bucket, test, test)
puts "Object #{test} was created."
It works in production, but fails using mock mode with the following error:
`not_implemented': Contributions welcome! (Fog::Errors::MockNotImplemented)
It seems that put_object is not implemented in mock mode, as shown in the put_object method definition in the documentation.
Also, this is said in this GitHub issue:
Closing issue. 1.0.0 is out, and we have no more mocks for json backed objects.
Credentials
As shown in Fog's documentation, to configure Google credentials you have to set them as follows:
connection = Fog::Storage.new({
:provider => 'Google',
:google_storage_access_key_id => YOUR_SECRET_ACCESS_KEY_ID,
:google_storage_secret_access_key => YOUR_SECRET_ACCESS_KEY
})
Mock
In the GitHub - Fog::Google documentation, there is also a minimal config to integrate Fog with Carrierwave.
In order to use the Cloud Storage mock, you can use the following line:
Fog.mock!
connection = Fog::Storage.new(config_hash)
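For CarrierWave specifically, the mock can be switched on once in the test setup rather than per connection. A minimal spec_helper sketch, assuming RSpec, CarrierWave and fog-google are installed; the project name, key path and bucket are placeholders, and as noted above some Google mocks (such as put_object) may still raise Fog::Errors::MockNotImplemented:

```ruby
# spec/spec_helper.rb -- minimal sketch, not a definitive setup
require "carrierwave"
require "fog/google"

RSpec.configure do |config|
  config.before(:suite) do
    Fog.mock!  # route all Fog calls to the in-memory mock backend

    CarrierWave.configure do |cw|
      cw.fog_provider = "fog/google"
      cw.fog_credentials = {
        :provider                 => "Google",
        :google_project           => "test-project",
        :google_json_key_location => "spec/fixtures/fake-key.json"
      }
      cw.fog_directory     = "test-bucket"
      cw.enable_processing = false  # skip file processing in tests
    end
  end

  config.after(:suite) { Fog.unmock! }
end
```

With this in place, specs that exercise TestUploader run against the mock instead of the real Cloud Storage API.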
Provider Specific Resources
In the provider documentation section, you will find links to provider specific documentation and examples.
Community supported providers can get assistance by filing Github Issues on the appropriate repository.
Provider    Documentation    Examples               Support Type
Google      Documentation    fog-google-examples    Community
In order to maximize the benefits of open source, you are encouraged to submit bugs to GitHub Issues.
In this GitHub example, you can find an implementation for Google Cloud Storage.
Class List
At the RubyGems documentation for fog-google, you can find the class definitions and parameters. For example, the list_objects method:
#list_objects(bucket, options = {}) ⇒ Google::Apis::StorageV1::Objects
Lists objects in a bucket matching some criteria.
Parameters:
bucket (String) — Name of bucket to list
options (Hash) (defaults to: {}) — Optional hash of options
Options Hash (options):
:delimiter (String) — Delimiter to collapse objects under to emulate a directory-like mode
:max_results (Integer) — Maximum number of results to retrieve
:page_token (String) — Token to select a particular page of results
:prefix (String) — String that an object must begin with in order to be returned
:projection ("full", "noAcl") — Set of properties to return (defaults to “noAcl”)
:versions (Boolean) — If true, lists all versions of an object as distinct results (defaults to False)
Returns:
(Google::Apis::StorageV1::Objects)
I am creating a group of users within TFS 2013 and I want to add them to a non-default access level (e.g. the Full access level), but I noticed I am only able to do this through the web interface, by adding a TFS group under that level. I am wondering if there is a way to do this via a developer tool (command line), as everything I am doing is being done in a batch script.
Any input would be appreciated. Thanks!
Create 3 TFS server groups and add these groups to the different access levels (e.g. TFS_ACCESS_LEVEL_(NONE|STANDARD|FULL)). Then use the TFSSecurity command-line tool to add groups to these existing, mapped groups (tfssecurity /g+ TFS_ACCESS_LEVEL_NONE GroupYouWantToHaveThisAccessLevel). There is no other way to directly add people to the access levels, except perhaps through the Object Model using C#.
For the record, tfssecurity may require the project URI, which can be obtained via the API. This is easy to do in PowerShell; here is how to create a TFS group:
[psobject] $tfs = get-tfs -serverName $collection
$projectUri = ($tfs.CSS.ListAllProjects() | where { $_.Name -eq $project }).Uri
& $TFSSecurity /gc $projectUri $groupName $groupDescription /collection:$collection
Full script at TfsSecurity wrapper.
I'm working on a web project implementing some endpoints to enable CRUD operations for users. I have the flow working and I'm able to list notebooks in my sandbox account, but I can only list notes from the notebook that I chose to share publicly. Is this an API key permission issue, am I missing something, or is this supposed to happen? Any help is much appreciated. I am pasting the error below.
/Users/mac/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/evernote-thrift-1.25.1/lib/Evernote/EDAM/note_store.rb:486:in `recv_findNotesMetadata'
/Users/mac/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/evernote-thrift-1.25.1/lib/Evernote/EDAM/note_store.rb:476:in `findNotesMetadata'
/Users/mac/.rbenv/versions/2.1.0/lib/ruby/gems/2.1.0/gems/evernote_oauth-0.2.3/lib/evernote_oauth/thrift_client_delegation.rb:16:in `method_missing'
/Users/mac/Documents/rails/ms-core/app/api/secm.rb:1158:in `block (3 levels) in '
/Users/mac/Documents/rails/ms-core/app/api/secm.rb:1149:in `each'
/Users/mac/Documents/rails/ms-core/app/api/secm.rb:1149:in `block (2 levels) in '
/Users/mac/Documents/rails/ms-core/app/api/helpers.rb:378:in `return_elegant_errors'
The code used to grab notes from a notebook is:
note_store ||= client.note_store
notebooks ||= note_store.listNotebooks(token[:oauth_token])
note_filter = Evernote::EDAM::NoteStore::NoteFilter.new
notesMetadataResultSpec = Evernote::EDAM::NoteStore::NotesMetadataResultSpec.new
notebook_details = Array.new
notebookArray = Array.new
notesMetadataResultSpec.includeTitle = true

notebooks.each do |notebook|
  note_filter.notebookGuid = notebook.guid
  notes_metadata = note_store.findNotesMetadata(token[:oauth_token], note_filter, 0, 10, notesMetadataResultSpec)
  validnotes = notes_metadata.notes
  validnotes.each do |note|
    notebook_details << Array('noteTitle' => note.title, 'noteGuid' => note.guid)
  end
end
Thanks in advance.
An Evernote API key has two permission levels: http://dev.evernote.com/doc/articles/permissions.php
If your key only has Basic Access, you can ask dev support to bump it up to Full Access: http://dev.evernote.com/support/faq.php#getsupport
Otherwise, please provide a bit more detail, such as the errors you got, a code snippet, and so on.
It has been suggested in the Amazon docs (http://aws.amazon.com/dynamodb/), among other places, that you can back up your DynamoDB tables using Elastic MapReduce.
I have a general understanding of how this could work, but I couldn't find any guides or tutorials on it.
So my question is: how can I automate DynamoDB backups (using EMR)?
So far, I think I need to create a "streaming" job with a map function that reads the data from dynamodb and a reduce that writes it to S3 and I believe these could be written in Python (or java or a few other languages).
Any comments, clarifications, code samples, corrections are appreciated.
With the introduction of AWS Data Pipeline, which has a ready-made template for DynamoDB-to-S3 backup, the easiest way is to schedule a backup in Data Pipeline [link].
In case you have special needs (data transformation, very fine-grained control, ...), consider the answer by @greg.
There are some good guides for working with MapReduce and DynamoDB. I followed this one the other day and got data exporting to S3 reasonably painlessly. I think your best bet would be to create a Hive script that performs the backup task, save it in an S3 bucket, then use the AWS API for your language to programmatically spin up a new EMR job flow and complete the backup. You could set this up as a cron job.
Example of a hive script exporting data from Dynamo to S3:
CREATE EXTERNAL TABLE my_table_dynamodb (
    company_id string
    ,id string
    ,name string
    ,city string
    ,state string
    ,postal_code string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
    "dynamodb.table.name" = "my_table",
    "dynamodb.column.mapping" = "company_id:company_id,id:id,name:name,city:city,state:state,postal_code:postal_code");

CREATE EXTERNAL TABLE my_table_s3 (
    company_id string
    ,id string
    ,name string
    ,city string
    ,state string
    ,postal_code string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://yourBucket/backup_path/dynamo/my_table';

INSERT OVERWRITE TABLE my_table_s3
SELECT * FROM my_table_dynamodb;
Here is an example of a PHP script that will spin up a new EMR job flow:
$emr = new AmazonEMR();

$response = $emr->run_job_flow(
    'My Test Job',
    array(
        "TerminationProtected" => "false",
        "HadoopVersion" => "0.20.205",
        "Ec2KeyName" => "my-key",
        "KeepJobFlowAliveWhenNoSteps" => "false",
        "InstanceGroups" => array(
            array(
                "Name" => "Master Instance Group",
                "Market" => "ON_DEMAND",
                "InstanceType" => "m1.small",
                "InstanceCount" => 1,
                "InstanceRole" => "MASTER",
            ),
            array(
                "Name" => "Core Instance Group",
                "Market" => "ON_DEMAND",
                "InstanceType" => "m1.small",
                "InstanceCount" => 1,
                "InstanceRole" => "CORE",
            ),
        ),
    ),
    array(
        "Name" => "My Test Job",
        "AmiVersion" => "latest",
        "Steps" => array(
            array(
                "HadoopJarStep" => array(
                    "Args" => array(
                        "s3://us-east-1.elasticmapreduce/libs/hive/hive-script",
                        "--base-path",
                        "s3://us-east-1.elasticmapreduce/libs/hive/",
                        "--install-hive",
                        "--hive-versions",
                        "0.7.1.3",
                    ),
                    "Jar" => "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
                ),
                "Name" => "Setup Hive",
                "ActionOnFailure" => "TERMINATE_JOB_FLOW",
            ),
            array(
                "HadoopJarStep" => array(
                    "Args" => array(
                        "s3://us-east-1.elasticmapreduce/libs/hive/hive-script",
                        "--base-path",
                        "s3://us-east-1.elasticmapreduce/libs/hive/",
                        "--hive-versions",
                        "0.7.1.3",
                        "--run-hive-script",
                        "--args",
                        "-f",
                        "s3n://myBucket/hive_scripts/hive_script.hql",
                        "-d",
                        "INPUT=Var_Value1",
                        "-d",
                        "LIB=Var_Value2",
                        "-d",
                        "OUTPUT=Var_Value3",
                    ),
                    "Jar" => "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
                ),
                "Name" => "Run Hive Script",
                "ActionOnFailure" => "CANCEL_AND_WAIT",
            ),
        ),
        "LogUri" => "s3n://myBucket/logs",
    )
);
AWS Data Pipeline is costly, and the complexity of managing a templated process cannot compare to the simplicity of a CLI command you can modify and run on a schedule (using cron, TeamCity, or your CI tool of choice).
Amazon promotes Data Pipeline because they make a profit on it. I'd say it only really makes sense if you have a very large database (>3GB), where the performance improvement justifies it.
For small and medium databases (1GB or less) I'd recommend you use one of the many tools available; all three below can handle backup and restore processes from the command line:
dynamo-backup-to-s3 ==> Streaming backup/restore to S3, using NodeJS/npm
SEEK-Jobs dynamotools ==> Streaming backup/restore to S3, using Golang
dynamodump ==> Local backup/restore using Python; upload/download to S3 using aws s3 cp
Bear in mind that due to bandwidth/latency issues these will always perform better from an EC2 instance than your local network.
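The scan-and-dump approach those tools take can be sketched in a few lines around boto3 (a hypothetical helper, not any one tool's implementation; the paginated Scan handles tables larger than one response page, and the Decimal handling is needed because boto3 returns DynamoDB numbers as decimal.Decimal):

```python
import decimal
import json

def dynamo_default(obj):
    """json.dumps helper: boto3 returns DynamoDB numbers as decimal.Decimal."""
    if isinstance(obj, decimal.Decimal):
        return int(obj) if obj % 1 == 0 else float(obj)
    raise TypeError(f"cannot serialize {type(obj)!r}")

def dump_table(table, out_path):
    """Write every item of a table to a JSON-lines file.

    `table` is a boto3 Table resource, e.g.
    boto3.resource("dynamodb").Table("my_table") -- not created here,
    since that requires AWS credentials.
    """
    with open(out_path, "w") as f:
        kwargs = {}
        while True:
            page = table.scan(**kwargs)  # Scan is paginated at ~1 MB per page
            for item in page["Items"]:
                f.write(json.dumps(item, default=dynamo_default) + "\n")
            last_key = page.get("LastEvaluatedKey")
            if last_key is None:
                return
            kwargs["ExclusiveStartKey"] = last_key
```

The resulting file can then be shipped to S3 with `aws s3 cp`, much as dynamodump does.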
With the introduction of DynamoDB Streams and Lambda, you should be able to take backups and incremental backups of your DynamoDB data.
You can associate your DynamoDB Stream with a Lambda function to automatically trigger code for every data update (i.e. copy the data to another store like S3).
A Lambda function you can use to tie in with DynamoDB for incremental backups:
https://github.com/PageUpPeopleOrg/dynamodb-replicator
I've provided a detailed explanation of how you can use DynamoDB Streams, Lambda and S3 versioned buckets to create incremental backups for your data in DynamoDB on my blog:
https://www.abhayachauhan.com/category/aws/dynamodb/dynamodb-backups
Edit:
As of Dec 2017, DynamoDB has released On Demand Backups/Restores. This allows you to take backups and store them natively in DynamoDB. They can be restored to a new table.
A detailed walk through is provided here, including code to schedule them:
https://www.abhayachauhan.com/2017/12/dynamodb-scheduling-on-demand-backups
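The on-demand backup itself is a single API call, so scheduling mostly amounts to wrapping it. A hedged boto3 sketch (the table name is a placeholder), which a CloudWatch-scheduled Lambda could run:

```python
import datetime

def backup_name(table_name, now=None):
    # Unique, sortable backup name, e.g. "my-table-20171201093000"
    now = now or datetime.datetime.utcnow()
    return f"{table_name}-{now:%Y%m%d%H%M%S}"

def take_backup(table_name):
    # Not executed here: requires AWS credentials and an existing table
    import boto3
    client = boto3.client("dynamodb")
    return client.create_backup(
        TableName=table_name,
        BackupName=backup_name(table_name),
    )
```

Pruning old backups (create_backup keeps them until deleted) is left to the caller.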
HTH
You can use my simple node.js script dynamo-archive.js, which scans an entire Dynamo table and saves output to a JSON file. Then, you upload it to S3 using s3cmd.
You can use this handy dynamodump tool which is python based (uses boto) to dump the tables into JSON files. And then upload to S3 with s3cmd
I found the dynamodb-backup Lambda function to be really helpful. It took me 5 minutes to set up and can easily be configured to use a CloudWatch Schedule event (don't forget to run npm install first, though).
It's also a lot cheaper for me, coming from Data Pipeline (~$40 per month); I estimate the costs to be around 1.5 cents per month (both without S3 storage). Note that it backs up all DynamoDB tables at once by default, which can easily be adjusted within the code.
The only missing part is to be notified if the function fails, which the Data Pipeline was able to do.
AWS Data Pipeline is only available in a limited set of regions.
It took me 2 hours to debug the template.
https://docs.aws.amazon.com/general/latest/gr/rande.html#datapipeline_region
You can now back up your DynamoDB data straight to S3 natively, without using Data Pipeline or writing custom scripts. This is probably the easiest way to achieve what you want, because it does not require you to write any code or run any task/script, as it's fully managed.
Since 2020 you can export a DynamoDB table to S3 directly in the AWS UI:
https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/
You need to activate PITR (Point in Time Recovery) first. You can choose between JSON and Amazon ION format.
In the Java SDK (version 2), you can do something like this:
// first activate PITR
PointInTimeRecoverySpecification pointInTimeRecoverySpecification = PointInTimeRecoverySpecification
        .builder()
        .pointInTimeRecoveryEnabled(true)
        .build();

UpdateContinuousBackupsRequest updateContinuousBackupsRequest = UpdateContinuousBackupsRequest
        .builder()
        .tableName(myTable.getName())
        .pointInTimeRecoverySpecification(pointInTimeRecoverySpecification)
        .build();

try {
    UpdateContinuousBackupsResponse updateContinuousBackupsResponse =
            dynamoDbClient.updateContinuousBackups(updateContinuousBackupsRequest);
    // read the status inside the try block, since the response
    // does not exist if the call above threw
    String updatedPointInTimeRecoveryStatus = updateContinuousBackupsResponse
            .continuousBackupsDescription()
            .pointInTimeRecoveryDescription()
            .pointInTimeRecoveryStatus()
            .toString();
    log.info("Point in Time Recovery for Table {} activated: {}", myTable.getName(),
            updatedPointInTimeRecoveryStatus);
} catch (Exception e) {
    log.error("Point in Time Recovery Activation failed: {}", e.getMessage());
}

// ... now get the table ARN
DescribeTableRequest describeTableRequest = DescribeTableRequest
        .builder()
        .tableName(myTable.getName())
        .build();
DescribeTableResponse describeTableResponse = dynamoDbClient.describeTable(describeTableRequest);
String tableArn = describeTableResponse.table().tableArn();
String s3Bucket = "myBucketName";

// choose the format (JSON or ION)
ExportFormat exportFormat = ExportFormat.ION;
ExportTableToPointInTimeRequest exportTableToPointInTimeRequest = ExportTableToPointInTimeRequest
        .builder()
        .tableArn(tableArn)
        .s3Bucket(s3Bucket)
        .s3Prefix(myTable.getS3Prefix())
        .exportFormat(exportFormat)
        .build();
dynamoDbClient.exportTableToPointInTime(exportTableToPointInTimeRequest);
Your dynamoDbClient needs to be an instance of software.amazon.awssdk.services.dynamodb.DynamoDbClient; the DynamoDbEnhancedClient or DynamoDbEnhancedAsyncClient will not work.