I've built an application using DynamoDB Local and now I'm at the point where I want to set it up on AWS. I've gone through numerous tools but have had no success finding a way to take my local DB, set up the schema, and migrate the data into AWS.
For example, I can get the data into CSV format, but AWS has no way to recognize that. It seems I'm forced to create a Data Pipeline... Does anyone have a better way to do this?
Thanks in advance
As was mentioned earlier, DynamoDB Local is there for testing purposes. However, you can still migrate your data if you need to. One approach is to export the data in some format, like JSON or CSV, store it in S3, and then use something like Lambda or your own server to read from S3 and save it into your new DynamoDB table. As for setting up the schema, you can use the same code you used to create your local table to create the remote table via the AWS SDK.
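If you go the S3 route, a minimal boto3 sketch of the "read from S3 and save into DynamoDB" step could look like the following. The bucket, key, and table names are placeholders, and it assumes the export is a JSON array of plain item dicts:

import json
from decimal import Decimal

import boto3

# Placeholders: adjust bucket, key, and table name to your setup.
s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")

obj = s3.get_object(Bucket="my-export-bucket", Key="items.json")
# parse_float=Decimal because the DynamoDB resource API rejects Python floats.
items = json.loads(obj["Body"].read(), parse_float=Decimal)

table = dynamodb.Table("MyTable")
with table.batch_writer() as batch:  # batches the PutItem calls for you
    for item in items:
        batch.put_item(Item=item)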
You can create a standalone application that gets the list of tables from the local DynamoDB and creates them in your AWS account; after that, you can read all the data from each table and save it to the remote tables.
I'm not sure which language you're familiar with, but I'll explain some of the APIs that might help you in Java.
DynamoDB.listTables();
DynamoDB.createTable(CreateTableRequest);
Example of how to create a table using the above API:
ProvisionedThroughput provisionedThroughput = new ProvisionedThroughput(1L, 1L);
try {
    CreateTableRequest groupTableRequest = mapper.generateCreateTableRequest(Group.class); // 1
    groupTableRequest.setProvisionedThroughput(provisionedThroughput); // 2
    // groupTableRequest.getGlobalSecondaryIndexes().forEach(index -> index.setProvisionedThroughput(provisionedThroughput)); // 3
    Table groupTable = client.createTable(groupTableRequest); // 4
    groupTable.waitForActive(); // 5
} catch (ResourceInUseException e) {
    log.debug("Group table already exists");
}
1 - creates the CreateTableRequest from your mapped class
2 - sets the provisioned throughput, which will vary depending on your requirements
3 - if the table has a global secondary index you can use this line (optional)
4 - the actual table is created here
5 - the thread is blocked until the table becomes active
I didn't mention the APIs related to data access (insert, etc.); I assume you're familiar with them since you already use them with the local DynamoDB.
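If Python is also an option, the data-copy part can be sketched with boto3 along these lines. The table name "Group" and the local endpoint are placeholders; the idea is to scan the local table page by page and batch-write into the remote table of the same name:

import boto3

# Placeholders: local endpoint URL and table name.
local = boto3.resource("dynamodb", endpoint_url="http://localhost:8000")
remote = boto3.resource("dynamodb")  # uses your configured AWS credentials/region

src = local.Table("Group")
dst = remote.Table("Group")

scan_kwargs = {}
with dst.batch_writer() as batch:
    while True:
        page = src.scan(**scan_kwargs)
        for item in page["Items"]:
            batch.put_item(Item=item)
        if "LastEvaluatedKey" not in page:
            break
        # Continue the scan from where the previous page stopped.
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]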
I did a little work setting up my local dev environment. I use SAM to create the DynamoDB tables in AWS. I didn't want to do the work twice, so I ended up copying the schema from AWS to my local instance. The same approach can work the other way around.
aws dynamodb describe-table --table-name chess_lobby \
| jq '.Table' \
| jq 'del(.TableArn)' \
| jq 'del(.TableSizeBytes)' \
| jq 'del(.TableStatus)' \
| jq 'del(.TableId)' \
| jq 'del(.ItemCount)' \
| jq 'del(.CreationDateTime)' \
| jq 'del(.GlobalSecondaryIndexes[].IndexSizeBytes)' \
| jq 'del(.ProvisionedThroughput.NumberOfDecreasesToday)' \
| jq 'del(.GlobalSecondaryIndexes[].IndexStatus)' \
| jq 'del(.GlobalSecondaryIndexes[].IndexArn)' \
| jq 'del(.GlobalSecondaryIndexes[].ItemCount)' \
| jq 'del(.GlobalSecondaryIndexes[].ProvisionedThroughput.NumberOfDecreasesToday)' > chess_lobby.json
aws dynamodb create-table \
--cli-input-json file://chess_lobby.json \
--endpoint-url http://localhost:8000
The first command uses the AWS CLI describe-table capability to get the schema JSON. Then I use jq to delete all the unneeded keys, since create-table is strict with its parameter validation. Then I can use create-table to create the table in the local environment by pointing the --endpoint-url parameter at it.
To go the other way, use the --endpoint-url parameter on the first command instead to fetch your local schema, and then run create-table without the --endpoint-url parameter to create the table directly in AWS.
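If you'd rather skip jq, here is a rough boto3 sketch of the same schema copy. Table name and local endpoint are placeholders; it assumes a provisioned-capacity table and copies only the key schema, attribute definitions, throughput, and GSIs:

import boto3

aws = boto3.client("dynamodb")
local = boto3.client("dynamodb", endpoint_url="http://localhost:8000")

desc = aws.describe_table(TableName="chess_lobby")["Table"]

# Keep only the fields create_table accepts; everything else is output-only.
params = {
    "TableName": desc["TableName"],
    "KeySchema": desc["KeySchema"],
    "AttributeDefinitions": desc["AttributeDefinitions"],
    "ProvisionedThroughput": {
        "ReadCapacityUnits": desc["ProvisionedThroughput"]["ReadCapacityUnits"],
        "WriteCapacityUnits": desc["ProvisionedThroughput"]["WriteCapacityUnits"],
    },
}
if "GlobalSecondaryIndexes" in desc:
    params["GlobalSecondaryIndexes"] = [
        {
            "IndexName": gsi["IndexName"],
            "KeySchema": gsi["KeySchema"],
            "Projection": gsi["Projection"],
            "ProvisionedThroughput": {
                "ReadCapacityUnits": gsi["ProvisionedThroughput"]["ReadCapacityUnits"],
                "WriteCapacityUnits": gsi["ProvisionedThroughput"]["WriteCapacityUnits"],
            },
        }
        for gsi in desc["GlobalSecondaryIndexes"]
    ]

local.create_table(**params)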
Is there a way to create or update APIM named values through PowerShell or the Azure CLI?
I tried searching the Microsoft documentation but was not able to find anything related.
Is there any cmdlet for this? I'm trying to create new named values or update existing named values through PowerShell or the Azure CLI.
As far as some quick research told me, there is no dedicated Azure CLI command for this at the moment.
However, you can achieve the same result using the az rest command:
az rest \
--method put \
--uri https://management.azure.com/subscriptions/$CURRENT_SUBSCRIPTION/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.ApiManagement/service/$APIM/namedValues/$NAME?api-version=2019-12-01 \
--body '{"properties":{"displayName":"$DISPLAYNAME","value": "$VALUE","secret":$SECRET_BOOLEAN}}'
Replace the $CURRENT_SUBSCRIPTION, $RESOURCE_GROUP, $APIM, $NAME, $DISPLAYNAME, $VALUE, $SECRET_BOOLEAN with your values or PowerShell variables.
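If you'd rather do this from Python than from the shell, here is a sketch that calls the same ARM endpoint with requests and an Azure CLI token. All the placeholder values below need to be filled in; it assumes azure-identity and requests are installed and that you are logged in with az login:

import requests
from azure.identity import AzureCliCredential

# Placeholders to fill in.
SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
APIM = "<apim-service-name>"
NAME = "<named-value-id>"

# Reuse the Azure CLI login session to get an ARM token.
token = AzureCliCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.ApiManagement"
    f"/service/{APIM}/namedValues/{NAME}"
)
body = {"properties": {"displayName": "my-display-name", "value": "my-value", "secret": False}}

resp = requests.put(
    url,
    params={"api-version": "2019-12-01"},
    headers={"Authorization": f"Bearer {token}"},
    json=body,
)
print(resp.status_code, resp.text)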
I have tables in BigQuery which I want to export and import into Datastore.
How can I achieve that?
A table from BigQuery can be exported and imported into your Datastore as follows.
Download the jar file from https://github.com/yu-iskw/bigquery-to-datastore/releases
Then run the command
java -cp bigquery-to-datastore-bundled-0.5.1.jar com.github.yuiskw.beam.BigQuery2Datastore \
  --project=yourprojectId \
  --runner=DataflowRunner \
  --inputBigQueryDataset=datastore \
  --inputBigQueryTable=metainfo_internal_2 \
  --outputDatastoreNamespace=default \
  --outputDatastoreKind=meta_internal \
  --keyColumn=key \
  --indexedColumns=column1,column2 \
  --tempLocation=gs://gsheetbackup_live/temp \
  --gcpTempLocation=gs://gsheetlogfile_live/temp
--tempLocation and --gcpTempLocation must be valid Cloud Storage bucket URLs.
--keyColumn=key - here key is the unique field in your BigQuery table.
2020 answer:
Use GoogleCloudPlatform/DataflowTemplates, specifically BigQueryToDatastore.
# Builds the Java project and uploads an artifact to GCS
mvn compile exec:java \
-Dexec.mainClass=com.google.cloud.teleport.templates.BigQueryToDatastore \
-Dexec.cleanupDaemonThreads=false \
-Dexec.args=" \
--project=<project-id> \
--region=<region-name> \
--stagingLocation=gs://<bucket-name>/staging \
--tempLocation=gs://<bucket-name>/temp \
--templateLocation=gs://<bucket-name>/templates/<template-name>.json \
--runner=DataflowRunner"
# Uses the GCS artifact to run the transfer job
gcloud dataflow jobs run <job-name> \
--gcs-location=<template-location> \
--zone=<zone> \
--parameters "\
readQuery=SELECT * FROM <dataset>.<table>,readIdColumn=<id>,\
invalidOutputPath=gs://your-bucket/path/to/error.txt,\
datastoreWriteProjectId=<project-id>,\
datastoreWriteNamespace=<namespace>,\
datastoreWriteEntityKind=<kind>,\
errorWritePath=gs://your-bucket/path/to/errors.txt"
I hope this will get a proper user interface in the GCP Console one day! (This is already possible for Pub/Sub to BigQuery using Dataflow SQL.)
You can export BigQuery data to CSV, then import the CSV into Datastore. The first step is easy and well documented: https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_stored_in_bigquery. For the second step, there are many resources that can help you, for example:
https://groups.google.com/forum/#!topic/google-appengine/L64wByP7GAY
Import CSV into google cloud datastore
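To give an idea of the second step, here is a minimal Python sketch using the google-cloud-datastore client. The kind name, key column, and file path are placeholders; it assumes the CSV has a header row and stores every value as a string:

import csv

from google.cloud import datastore

client = datastore.Client()  # uses your default GCP project and credentials

with open("export.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Placeholder kind and key column; use your table's unique column.
        key = client.key("MyKind", row["id"])
        entity = datastore.Entity(key=key)
        entity.update(row)
        client.put(entity)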
I am working on a network security project. I need to create a private lookup table for each individual user, such that no other user can see the contents of another user's lookup table.
I created the lookup table with:
curl -k -u username:pwd https://localhost:8089/servicesNS/nobody/*appname*/data/lookup-table-files -d 'eai:data=/opt/splunk/var/run/splunk/lookup_tmp/april.csv' -d 'name=12_april_lookup.csv'
This created the 12_april_lookup.csv file inside the .../my_app/lookup/ folder. The lookup table's permission is private at this point.
But when I add some data to the lookup table with the search command below:
| makeresults | eval name="xyz" | eval token="12345"| outputlookup 12_april_lookup.csv append=True createinapp=True
the file gets created in another app's folder and ends up with global permissions. Now all users can view the file's content with:
|inputlookup 12_april_lookup.csv
You need to run the command below from the same app's search context. Because your command was run at the global app level, the file was created at the global level with global permissions. In Splunk, every app has its own search context, and the lookup file is created in the lookup folder of whichever app's context the search is run from. So make sure that for every search you run in Splunk, you are in the correct app context:
| makeresults | eval name="xyz" | eval token="12345"| outputlookup 12_april_lookup.csv append=True createinapp=True
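If a lookup has already ended up shared more widely than intended, one option (a hedged sketch only; verify the ACL endpoint against your Splunk version's REST API reference) is to set its permissions back to private over the same REST interface used above. The app name, credentials, and owner below are placeholders:

import requests

SPLUNK = "https://localhost:8089"
APP = "my_app"                  # placeholder app name
LOOKUP = "12_april_lookup.csv"

resp = requests.post(
    f"{SPLUNK}/servicesNS/nobody/{APP}/data/lookup-table-files/{LOOKUP}/acl",
    auth=("username", "pwd"),                       # placeholder credentials
    data={"sharing": "user", "owner": "username"},  # "user" sharing = private
    verify=False,                                   # same as curl -k in the example above
)
print(resp.status_code, resp.text)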
I have a dozen buckets that I would like to remove in AWS S3, all having a similar name containing bucket-to-remove and with some objects in them.
Using the UI is quite slow, is there a solution to remove all these buckets quickly using the CLI?
You could try this one-liner to delete them all at once. Remember that this is highly destructive, so I hope you know what you are doing:
for bucket in $(aws s3 ls | awk '{print $3}' | grep my-bucket-pattern); do aws s3 rb "s3://${bucket}" --force ; done
That's it. It may take a while depending on the number of buckets and their contents.
I did this
aws s3api list-buckets \
--query 'Buckets[?starts_with(Name, `bucket-pattern`) == `true`].[Name]' \
--output text | xargs -I {} aws s3 rb s3://{} --force
Then update the bucket pattern as needed.
Be careful though, this is a pretty dangerous operation.
The absolute easiest way to bulk delete S3 buckets is to not write any code at all. I use Cyberduck to browse my S3 account and delete buckets and their contents quite easily.
Using boto3 you cannot delete buckets that have objects in them, so you first need to remove the objects before deleting the bucket. The simplest solution is a short Python script such as:
import boto3

s3_client = boto3.client(
    "s3",
    aws_access_key_id="<your key id>",
    aws_secret_access_key="<your secret access key>"
)

response = s3_client.list_buckets()

for bucket in response["Buckets"]:
    # Only removes the buckets with the name you want.
    if "bucket-to-remove" in bucket["Name"]:
        s3_objects = s3_client.list_objects_v2(Bucket=bucket["Name"])
        # Deletes the objects in the bucket before deleting the bucket.
        if "Contents" in s3_objects:
            for s3_obj in s3_objects["Contents"]:
                rm_obj = s3_client.delete_object(
                    Bucket=bucket["Name"], Key=s3_obj["Key"])
                print(rm_obj)
        rm_bucket = s3_client.delete_bucket(Bucket=bucket["Name"])
        print(rm_bucket)
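One caveat with the script above: list_objects_v2 returns at most 1,000 keys per call, so buckets with more objects need pagination. A small helper along these lines (using the same s3_client) would cover that; it's a sketch, not a tested drop-in replacement:

def delete_all_objects(s3_client, bucket_name):
    """Delete every (non-versioned) object in the bucket, page by page."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            s3_client.delete_object(Bucket=bucket_name, Key=obj["Key"])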
Here is a Windows solution.
First test the filter before you delete:
aws s3 ls ^| findstr "<search here>"
and then execute
for /f "tokens=3" %a in ('<copy the correct command between the quotes>') do aws s3 rb s3://%a --force
According to the S3 docs you can remove a bucket using the CLI command aws s3 rb only if the bucket does not have versioning enabled. If that's the case you can write a simple bash script to get the bucket names and delete them one by one, like:
#!/bin/bash
# get buckets list => returns the timestamp + bucket name separated by lines
S3LS="$(aws s3 ls | grep 'bucket-name-pattern')"
# split the lines into an array. #see https://stackoverflow.com/a/13196466/6569593
oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1}
lines=( $S3LS )
IFS="$oldIFS"
for line in "${lines[@]}"
do
BUCKET_NAME=${line:20:${#line}} # remove timestamp
aws s3 rb "s3://${BUCKET_NAME}" --force
done
Be careful not to remove important buckets! I recommend echoing each bucket name before actually removing it. Also be aware that the aws s3 rb command takes a while to run, because it recursively deletes all the objects inside the bucket.
To delete all the S3 buckets in your account, use the technique below. It works well when run locally.
Step 1: Export your profile using the command below, or export your access key and secret access key locally instead.
export AWS_PROFILE=<Your-Profile-Name>
Step 2: Use the Python code below. Run it locally and all of your S3 buckets will be deleted.
import boto3

client = boto3.client('s3', region_name='us-east-2')

response = client.list_buckets()
for bucket in response['Buckets']:
    s3 = boto3.resource('s3')
    s3_bucket = s3.Bucket(bucket['Name'])
    bucket_versioning = s3.BucketVersioning(bucket['Name'])
    if bucket_versioning.status == 'Enabled':
        # Versioned buckets: delete every object version first.
        s3_bucket.object_versions.delete()
    else:
        s3_bucket.objects.all().delete()
    response = client.delete_bucket(Bucket=bucket['Name'])
If you see an error like "boto3 not found", install boto3 using pip.
I have used a Lambda function for deleting buckets with a specified prefix.
It will delete all the objects regardless of whether versioning is enabled or not.
Note that you should give appropriate S3 access to your Lambda.
import boto3

s3_client = boto3.client('s3')
s3 = boto3.resource('s3')

def lambda_handler(event, context):
    bucket_prefix = "your prefix"
    response = s3_client.list_buckets()
    for bucket in response["Buckets"]:
        # Only removes the buckets with the name you want.
        if bucket_prefix in bucket["Name"]:
            s3_bucket = s3.Bucket(bucket['Name'])
            bucket_versioning = s3.BucketVersioning(bucket['Name'])
            if bucket_versioning.status == 'Enabled':
                s3_bucket.object_versions.delete()
            else:
                s3_bucket.objects.all().delete()
            response = s3_client.delete_bucket(Bucket=bucket['Name'])
    return {
        'message': f"delete buckets with prefix {bucket_prefix} was successful"
    }
If you're using PowerShell, this will work:
Foreach($x in (aws s3api list-buckets --query 'Buckets[?starts_with(Name, `name-pattern`) == `true`].[Name]' --output text))
{
    aws s3 rb s3://$x --force
}
The best option that I've found is to use Cyberduck. You can select all the buckets in the GUI and delete them.
I have a bunch of instances running in GCE. I want to programmatically get a list of the internal IP addresses of them without logging into the instances (locally).
I know I can run:
gcloud compute instances list
But are there any flags I can pass to just get the information I want?
e.g.
gcloud compute instances list --internal-ips
or similar? Or am I going to have to dust off my sed/awk brain and parse the output?
I also know that I can get the output in JSON using --format=json, but I'm trying to do this in a bash script.
The simplest way to programmatically get a list of internal IPs (or external IPs) without a dependency on any tools other than gcloud is:
$ gcloud --format="value(networkInterfaces[0].networkIP)" compute instances list
$ gcloud --format="value(networkInterfaces[0].accessConfigs[0].natIP)" compute instances list
This uses --format=value, which also requires a projection: a list of resource keys that select resource data values. For any command you can use --format=flattened to get the full list of resource key/value pairs:
$ gcloud --format=flattened compute instances list
A few things here.
First, gcloud's default output format for listing is not guaranteed to be stable, and new columns may be added in the future. Don't script against it!
The three output modes accessible with the format flag, --format=json, --format=yaml, and --format=text, are based on key/value pairs and can be scripted against even if new fields are introduced in the future.
Two good ways to do what you want are to use JSON and the jq tool,
gcloud compute instances list --format=json \
| jq '.[].networkInterfaces[].networkIP'
or text format with grep and other line-oriented tools,
gcloud compute instances list --format=text \
| grep '^networkInterfaces\[[0-9]\+\]\.networkIP:' | sed 's/^.* //g'
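If you'd rather avoid jq, the same extraction can be sketched in Python by asking gcloud for JSON and picking out the fields (the [0] index assumes one network interface per instance):

import json
import subprocess

# Shell out to gcloud once and parse its JSON output.
out = subprocess.run(
    ["gcloud", "compute", "instances", "list", "--format=json"],
    check=True,
    capture_output=True,
    text=True,
).stdout

for instance in json.loads(out):
    print(instance["name"], instance["networkInterfaces"][0]["networkIP"])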
I hunted around and couldn't find a straight answer, probably because efficient tools weren't available when others replied to the original question. GCP constantly updates its libraries and APIs, and we can use filters and projections to extract targeted attributes.
Here I outline how to reserve an external static IP, see how its attributes are named and organised, and then export the external IP address so that I can use it in other scripts (e.g. assign it to a VM instance or authorise this IP address on a Cloud SQL instance).
Reserve a static IP in a region of your choice
gcloud compute --project=[PROJECT] addresses create [NAME] --region=[REGION]
[Informational] View the details of the regional static IP that was reserved
gcloud compute addresses describe [NAME] --region [REGION] --format=flattened
[Informational] Print just the external IP address of the reserved static IP
gcloud compute addresses describe [NAME] --region [REGION] --format='value(address)'
Extract the desired value (e.g. external IP address) as a parameter
export STATIC_IP=$(gcloud compute addresses describe [NAME] --region [REGION] --format='value(address)')
Use the exported parameter in other scripts
echo $STATIC_IP
The best approach is to have a ready-made gcloud command that you can use as and when needed.
This can be achieved using the table() format option with gcloud, as below:
gcloud compute instances list --format='table(id,name,status,zone,networkInterfaces[0].networkIP :label=Internal_IP,networkInterfaces[0].accessConfigs[0].natIP :label=External_IP)'
What does it do for you?
Gets your data in a clean format
Gives you the option to add or remove columns
Need additional columns, and want to find the column names even before running the above command?
Execute the following, which will give you the data in raw JSON format consisting of each value and its name; copy those names and add them to your table() list. :-)
gcloud compute instances list --format=json
Plus point: this is pretty much the same syntax you can tweak to fetch data for any GCP resource, including with gcloud, kubectl, etc.
As far as I know you can't filter on specific fields in the gcloud tool.
Something like this will work for a Bash script, but it still feels a bit brittle:
gcloud compute instances list --format=yaml | grep " networkIP:" | cut -c 14-100
I agree with @Christiaan. Currently there is no automated way to get only the internal IPs using the gcloud command.
You can use the following command to print the internal IPs (4th column):
gcloud compute instances list | tail -n+2 | awk '{print $4}'
or the following one if you want to have the pair <instance_name> <internal_ip> (1st and 4th column)
gcloud compute instances list | tail -n+2 | awk '{print $1, $4}'
I hope it helps.