How to detach or remove a data catalog tag from a BigQuery table using gcloud command - google-bigquery

Could anyone please share an ETL tag template in GCP Data Catalog?
I'd like to refresh a tag value with its ETL status every time a BigQuery table is updated. I'm trying to use gcloud commands to create a tag template. I need to remove the tag from the template using a gcloud command and add another tag to that template, so that I can maintain the ETL status through automation.
I am able to remove the tag manually through the UI; I need the corresponding gcloud command.

The procedure is explained in the Data Catalog documentation:
Suppose that you have the following tag template created:
gcloud data-catalog tag-templates create demo_template \
--location=us-central1 \
--display-name="Demo Tag Template" \
--field=id=source,display-name="Source of data asset",type=string,required=TRUE \
--field=id=num_rows,display-name="Number of rows in the data asset",type=double \
--field=id=has_pii,display-name="Has PII",type=bool \
--field=id=pii_type,display-name="PII type",type='enum(EMAIL_ADDRESS|US_SOCIAL_SECURITY_NUMBER|NONE)'
You need to look up the Data Catalog entry created for your BigQuery table:
ENTRY_NAME=$(gcloud data-catalog entries lookup '//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET/tables/TABLE' --format="value(name)")
Once you have the entry name you can:
Create a tag if it wasn't created on the entry yet, or if it was deleted earlier:
cat > tag_file.json << EOF
{
"source": "BigQuery",
"num_rows": 1000,
"has_pii": true,
"pii_type": "EMAIL_ADDRESS"
}
EOF
gcloud data-catalog tags create --entry=${ENTRY_NAME} --tag-template=demo_template --tag-template-location=us-central1 --tag-file=tag_file.json
The command returns (among other fields) the tag name, which can be used with update or delete.
Delete a tag from the entry:
gcloud data-catalog tags delete TAG_NAME
Update a tag, so that you don't have to delete and recreate it:
gcloud data-catalog tags update --entry=${ENTRY_NAME} --tag-template=demo_template --tag-template-location=us-central1 --tag-file=tag_file.json
If you lost your tag name, use the list command:
gcloud data-catalog tags list --entry=${ENTRY_NAME}
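Since the goal is to refresh the tag through automation, the same flow can also be scripted with the Python Data Catalog client instead of gcloud. Below is a minimal sketch only, assuming the demo_template above and placeholder PROJECT_ID/DATASET/TABLE values:
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()

# Look up the entry for the BigQuery table (same as `gcloud data-catalog entries lookup`).
resource = "//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET/tables/TABLE"
entry = client.lookup_entry(request={"linked_resource": resource})

# Delete any existing tag created from demo_template (same as `gcloud data-catalog tags delete`).
for existing in client.list_tags(parent=entry.name):
    if existing.template.endswith("/tagTemplates/demo_template"):
        client.delete_tag(name=existing.name)

# Re-create the tag with refreshed values (same as `gcloud data-catalog tags create`).
tag = datacatalog_v1.Tag()
tag.template = "projects/PROJECT_ID/locations/us-central1/tagTemplates/demo_template"
tag.fields["source"] = datacatalog_v1.TagField()
tag.fields["source"].string_value = "BigQuery"
tag.fields["num_rows"] = datacatalog_v1.TagField()
tag.fields["num_rows"].double_value = 1000
client.create_tag(parent=entry.name, tag=tag)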

Related

APIM named values create/update through powershell/cli

Is there a way to create or update APIM named values through PowerShell or the Azure CLI?
I tried searching the Microsoft documentation but was not able to find anything related.
Is there any cmdlet for this? I'm trying to create new named values or update existing named values through PowerShell or the Azure CLI.
As far as some quick research told me, there is no idiomatic Azure CLI command to do this at the moment.
However, you can achieve the same result using the az rest command:
az rest \
--method put \
--uri https://management.azure.com/subscriptions/$CURRENT_SUBSCRIPTION/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.ApiManagement/service/$APIM/namedValues/$NAME?api-version=2019-12-01 \
--body '{"properties":{"displayName":"$DISPLAYNAME","value": "$VALUE","secret":$SECRET_BOOLEAN}}'
Replace the $CURRENT_SUBSCRIPTION, $RESOURCE_GROUP, $APIM, $NAME, $DISPLAYNAME, $VALUE, $SECRET_BOOLEAN with your values or PowerShell variables.
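If you'd rather make the same ARM call from a script than from az rest, here is a rough Python sketch of the same PUT request. The names are placeholders, and the token is taken from your Azure CLI login via azure-identity; treat this as a sketch, not an official client:
import requests
from azure.identity import AzureCliCredential

# Placeholder values, mirroring the variables in the az rest call above.
subscription = "<subscription-id>"
resource_group = "<resource-group>"
apim = "<apim-service-name>"
name = "<named-value-id>"

# Reuse the token from your `az login` session.
token = AzureCliCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{subscription}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.ApiManagement"
    f"/service/{apim}/namedValues/{name}?api-version=2019-12-01"
)
body = {"properties": {"displayName": "my-display-name", "value": "my-value", "secret": False}}

resp = requests.put(url, headers={"Authorization": f"Bearer {token}"}, json=body)
resp.raise_for_status()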

Copy table structure alone in Bigquery

In Google BigQuery, is there a way to clone a table (copy the structure alone) without data?
bq cp doesn't seem to have an option to copy the structure without data.
And CREATE TABLE AS SELECT (CTAS) with a filter such as "1=2" does create the table without data, but it doesn't copy the partitioning/clustering properties.
BigQuery now supports CREATE TABLE LIKE explicitly for this purpose.
See documentation linked below:
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_table_like
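For reference, a minimal sketch (with hypothetical table names) of running that DDL through the Python BigQuery client:
from google.cloud import bigquery

client = bigquery.Client()

# CREATE TABLE ... LIKE creates a new, empty table with the same schema as the source table.
ddl = """
CREATE TABLE `mydataset.myothertable_clone`
LIKE `mydataset.myothertable`
"""
client.query(ddl).result()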
You can use DDL and LIMIT 0, but you need to express the partitioning and clustering in the query as well:
#standardSQL
CREATE TABLE mydataset.myclusteredtable
PARTITION BY DATE(timestamp)
CLUSTER BY
customer_id
AS SELECT * FROM mydataset.myothertable LIMIT 0
If you want to clone the structure of a table along with its partitioning/clustering properties, without needing to know exactly what those properties are, follow the steps below:
Step 1: Just copy your_table to a new table, let's say your_table_copy. This obviously copies the whole table, including all properties (such as descriptions, partition expiration, etc., which are easy to miss if you try to set them manually) and the data. Note: copy is a cost-free operation.
Step 2: To get rid of the data in the newly created table, run the query statement below:
SELECT * FROM `project.dataset.your_table_copy` LIMIT 0
While running the above, make sure you set project.dataset.your_table_copy as the destination table with 'Overwrite Table' as the 'Write Preference'. Note: this is also a cost-free step (because of LIMIT 0).
You can easily do both of the above steps from the Web UI, the command line, the API, or any client of your choice, whichever you are most comfortable with.
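For example, a rough sketch of the same two steps with the Python BigQuery client (table names are placeholders); this simply mirrors the Web UI flow described above:
from google.cloud import bigquery

client = bigquery.Client()
source = "project.dataset.your_table"             # placeholder names
copy_id = "project.dataset.your_table_copy"

# Step 1: copy the table, which carries over schema, partitioning, clustering, etc.
client.copy_table(source, copy_id).result()

# Step 2: overwrite the copy with an empty result set (LIMIT 0), like 'Overwrite Table' in the UI.
job_config = bigquery.QueryJobConfig(
    destination=copy_id,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(f"SELECT * FROM `{copy_id}` LIMIT 0", job_config=job_config).result()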
This is possible with the BQ CLI.
First download the schema of the existing table:
bq show --format=prettyjson project:dataset.table | jq '.schema.fields' > table.json
Then, create a new table with the provided schema and required partitioning:
bq mk \
--time_partitioning_type=DAY \
--time_partitioning_field date_field \
--require_partition_filter \
--table dataset.tablename \
table.json
See more info on bq mk options: https://cloud.google.com/bigquery/docs/tables
Install jq with your package manager, for example apt-get install jq or brew install jq.
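Alternatively, the same approach can be done without jq by reading and re-creating the table through the Python BigQuery client. A sketch with hypothetical table IDs:
from google.cloud import bigquery

client = bigquery.Client()

# Read the schema and partitioning/clustering settings from the existing table.
src = client.get_table("project.dataset.source_table")   # hypothetical IDs

# Create an empty table with the same schema and the same partitioning/clustering.
new_table = bigquery.Table("project.dataset.new_table", schema=src.schema)
new_table.time_partitioning = src.time_partitioning
new_table.clustering_fields = src.clustering_fields
new_table.require_partition_filter = src.require_partition_filter
client.create_table(new_table)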
You can use the BigQuery API to run a SELECT, as you suggested, which will return an empty result and set the partition and cluster fields.
This is an example (only partitioning is shown, but clustering works as well):
curl --request POST \
'https://www.googleapis.com/bigquery/v2/projects/myProject/jobs' \
--header 'Authorization: Bearer [YOUR_BEARER_TOKEN]' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--data '{"configuration":{"query":{"query":"SELECT * FROM `Project.dataset.audit` WHERE 1 = 2","timePartitioning":{"type":"DAY"},"destinationTable":{"datasetId":"datasetId","projectId":"projectId","tableId":"test"},"useLegacySql":false}}}' \
--compressed
Finally, I went with the Python script below to detect the schema/partitioning/clustering properties and re-create (clone) the clustered table without data. I hope we get an out-of-the-box feature from BigQuery to clone a table structure without the need for a script such as this.
import json
import subprocess  # subprocess.getstatusoutput replaces the legacy Python 2 'commands' module

BQ_EXPORT_SCHEMA = "bq show --schema --format=prettyjson %project%:%dataset%.%table% > %path_to_schema%"
BQ_SHOW_TABLE_DEF = "bq show --format=prettyjson %project%:%dataset%.%table%"
BQ_MK_TABLE = "bq mk --table --time_partitioning_type=%partition_type% %optional_time_partition_field% --clustering_fields %clustering_fields% %project%:%dataset%.%table% ./%cluster_json_file%"

def create_table_with_cluster(bq_project, bq_dataset, source_table, target_table):
    # Export the source table's schema to a local JSON file named after the table.
    cmd = BQ_EXPORT_SCHEMA.replace('%project%', bq_project)\
        .replace('%dataset%', bq_dataset)\
        .replace('%table%', source_table)\
        .replace('%path_to_schema%', source_table)
    subprocess.getstatusoutput(cmd)

    # Read the full table definition to detect partitioning and clustering properties.
    cmd = BQ_SHOW_TABLE_DEF.replace('%project%', bq_project)\
        .replace('%dataset%', bq_dataset)\
        .replace('%table%', source_table)
    (return_value, output) = subprocess.getstatusoutput(cmd)
    bq_result = json.loads(output)

    clustering_fields = bq_result["clustering"]["fields"]
    time_partitioning = bq_result["timePartitioning"]
    time_partitioning_type = time_partitioning["type"]
    time_partitioning_field = ""
    if "field" in time_partitioning:
        time_partitioning_field = "--time_partitioning_field " + time_partitioning["field"]
    clustering_fields_list = ",".join(str(x) for x in clustering_fields)

    # Create the target table with the same schema, partitioning and clustering as the source.
    cmd = BQ_MK_TABLE.replace('%project%', bq_project)\
        .replace('%dataset%', bq_dataset)\
        .replace('%table%', target_table)\
        .replace('%cluster_json_file%', source_table)\
        .replace('%clustering_fields%', clustering_fields_list)\
        .replace('%partition_type%', time_partitioning_type)\
        .replace('%optional_time_partition_field%', time_partitioning_field)
    subprocess.getstatusoutput(cmd)

create_table_with_cluster('test_project', 'test_dataset', 'source_table', 'target_table')

In splunk, how to create Private Lookup table for individual?

I am working on a network security project. I need to create a private lookup table for individual users, such that no user can see the content of another user's lookup table.
I have created the lookup table with:
curl -k -u username:pwd https://localhost:8089/servicesNS/nobody/*appname*/data/lookup-table-files -d 'eai:data=/opt/splunk/var/run/splunk/lookup_tmp/april.csv' -d 'name=12_april_lookup.csv'
This created the 12_april_lookup.csv file inside the .../my_app/lookup/ folder. The lookup table's permission is private at this point.
But when I add some data to the lookup table with the search command below:
| makeresults | eval name="xyz" | eval token="12345"| outputlookup 12_april_lookup.csv append=True createinapp=True
the file gets created in another app's folder and becomes globally shared. Now all users can view the file content with:
|inputlookup 12_april_lookup.csv
You need to run the command below from within the same app's search context.
Because the earlier search was run at the global app level, the file was created at the global level with global permissions.
In Splunk, every app has its own search context, and the lookup file is created in the lookup folder of whichever app the search is run from.
So make sure that for every search you run in Splunk, you are in the correct app context:
| makeresults | eval name="xyz" | eval token="12345"| outputlookup 12_april_lookup.csv append=True createinapp=True
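If you also want to push the lookup's permissions back to private programmatically after the outputlookup run, one option is to POST to the lookup file's acl endpoint over the REST API. This is only a sketch based on Splunk's generic .../acl endpoint pattern; the app name, credentials and lookup name below are assumptions:
import requests

SPLUNK = "https://localhost:8089"
APP = "my_app"                 # hypothetical app name
USER = "username"              # hypothetical user
LOOKUP = "12_april_lookup.csv"

# POST to the lookup file's acl endpoint to set user-level (private) sharing.
resp = requests.post(
    f"{SPLUNK}/servicesNS/nobody/{APP}/data/lookup-table-files/{LOOKUP}/acl",
    auth=(USER, "pwd"),
    data={"sharing": "user", "owner": USER},
    verify=False,  # matches the -k flag used in the curl example above
)
resp.raise_for_status()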

DynamoDB Local to DynamoDB AWS

I've built an application using DynamoDB Local and now I'm at the point where I want to set it up on AWS. I've gone through numerous tools but have had no success finding a way to take my local DB, set up the schema, and migrate the data into AWS.
For example, I can get the data into a CSV format but AWS has no way to recognize that. It seems that I'm forced to create a Data Pipeline... Does anyone have a better way to do this?
Thanks in advance
As was mentioned earlier, DynamoDB Local is there for testing purposes. However, you can still migrate your data if you need to. One approach would be to save the data in some format, such as JSON or CSV, store it in S3, and then use something like Lambda functions or your own server to read from S3 and save it into your new DynamoDB table. As for setting up the schema, you can use the same code you used to create your local table to create the remote table via the AWS SDK.
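For instance, a minimal boto3 sketch of that idea, copying every item from a table in DynamoDB Local into an already-created table of the same name in AWS (the table name, region, and local endpoint are assumptions):
import boto3

TABLE_NAME = "my_table"  # hypothetical table name

# DynamoDB Local usually listens on http://localhost:8000; the region is an assumption.
local = boto3.resource("dynamodb", endpoint_url="http://localhost:8000", region_name="us-east-1")
aws = boto3.resource("dynamodb", region_name="us-east-1")

src = local.Table(TABLE_NAME)
dst = aws.Table(TABLE_NAME)  # assumes the table already exists in AWS with the same key schema

# Scan the local table page by page and batch-write every item to AWS.
scan_kwargs = {}
with dst.batch_writer() as batch:
    while True:
        page = src.scan(**scan_kwargs)
        for item in page["Items"]:
            batch.put_item(Item=item)
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]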
You can create a standalone application to get the list of tables from the local DynamoDB and create them in your AWS account; after that, you can get all the data for each table and save it.
I'm not sure which language you are familiar with, but I'll explain some APIs in Java that might help you.
DynamoDB.listTables();
DynamoDB.createTable(CreateTableRequest);
An example of how to create a table using the above API:
ProvisionedThroughput provisionedThroughput = new ProvisionedThroughput(1L, 1L);
try {
    CreateTableRequest groupTableRequest = mapper.generateCreateTableRequest(Group.class); // 1
    groupTableRequest.setProvisionedThroughput(provisionedThroughput); // 2
    // groupTableRequest.getGlobalSecondaryIndexes().forEach(index -> index.setProvisionedThroughput(provisionedThroughput)); // 3
    Table groupTable = client.createTable(groupTableRequest); // 4
    groupTable.waitForActive(); // 5
} catch (ResourceInUseException e) {
    log.debug("Group table already exist");
}
1 - creates the CreateTableRequest from the mapped class
2 - sets the provisioned throughput, which will vary depending on your requirements
3 - if the table has a global secondary index you can use this line (optional)
4 - the actual table is created here
5 - the thread is blocked until the table becomes active
I didn't mention the APIs related to data access (insert, etc.); I assume you're familiar with them since you already use them with DynamoDB Local.
I did a little work setting up my local dev environment. I use SAM to create the DynamoDB tables in AWS. I didn't want to do the work twice, so I ended up copying the schema from AWS to my local instance. The same approach can work the other way around.
aws dynamodb describe-table --table-name chess_lobby \
| jq '.Table' \
| jq 'del(.TableArn)' \
| jq 'del(.TableSizeBytes)' \
| jq 'del(.TableStatus)' \
| jq 'del(.TableId)' \
| jq 'del(.ItemCount)' \
| jq 'del(.CreationDateTime)' \
| jq 'del(.GlobalSecondaryIndexes[].IndexSizeBytes)' \
| jq 'del(.ProvisionedThroughput.NumberOfDecreasesToday)' \
| jq 'del(.GlobalSecondaryIndexes[].IndexStatus)' \
| jq 'del(.GlobalSecondaryIndexes[].IndexArn)' \
| jq 'del(.GlobalSecondaryIndexes[].ItemCount)' \
| jq 'del(.GlobalSecondaryIndexes[].ProvisionedThroughput.NumberOfDecreasesToday)' > chess_lobby.json
aws dynamodb create-table \
--cli-input-json file://chess_lobby.json \
--endpoint-url http://localhost:8000
The top command uses the AWS CLI describe-table capability to get the schema as JSON. Then I use jq to delete all unneeded keys, since create-table is strict with its parameter validation. Then I can use create-table to create the table in the local environment by passing the --endpoint-url parameter.
You can instead use the --endpoint-url parameter on the top command to fetch your local schema, and then run create-table without the --endpoint-url parameter to create the table directly in AWS.

Using documentParser function in Teradata Aster

I'm working with Teradata Aster and am trying to parse a PDF (or HTML) file so that it is inserted into a table in the Beehive database in Aster. The entire PDF should correspond to a single row of data in the table.
This is to be done by using one of Aster's SQL-MR functions called documentParser. This produces a text file (.rtf) containing a single row, built by parsing all the chapters from the PDF file, which is then loaded into the table in Beehive.
I have been given this script that shows the use of documentParser and the other steps involved in this parsing process:
/* SHELL INSTRUCTIONS */
--transform file in b64 (change file names to your relevant file)
base64 pp.pdf>pp.b64
--prepare a loadfile
rm my_load_file.txt
-- get the content of the file
var=$(cat pp.b64)
-- put in file
echo \""pp.b64"\"","\""$var"\" >> "my_load_file.txt"
-- create staging table
act -U db_superuser -w db_superuser -d beehive -c "drop table if exists public.cf_load_file;"
act -U db_superuser -w db_superuser -d beehive -c "create dimension table public.cf_load_file(file_name varchar, content varchar);"
-- load into staging table
ncluster_loader -U db_superuser -w db_superuser -d beehive --csv --verbose public.cf_load_file my_load_file.txt
-- use document parser to load the clean text (you will need to create the table beforehand)
act -U db_superuser -w db_superuser -d beehive -c "INSERT INTO got_data.cf_got_text_data (file_name, content) SELECT * FROM documentParser (ON public.cf_load_file documentCol ('content') mode ('text'));"
--done
However, I am stuck on the last step of the script because it looks like there is no function called documentParser in the list of functions available in Aster. This is the error I get:
ERROR: function "documentparser" does not exist
I tried to search for this function several times with the command \dF, but did not get any match.
I've attached a picture (SQL-MR Document Parser) which presents the gist of what I'm trying to do.
I would appreciate any help if any one has any experience with this.
What happened is that someone told you about this function documentParser but never gave you the function archive file (documentParser.zip) to install in Aster. This function does exist, but it's not part of the official Aster Analytics Foundation (AAF). Please contact the person who gave you this information for help.
documentParser belongs to the so-called field functions that are developed and used by the Aster field team only. Not that you can't use it, but don't expect support to help you with it - only whoever gave you access to it.
If you don't have any contacts, then the next course of action I'd suggest is to go to the Aster Community Network and ask about it there.