PowerShell command to retrieve storage account resource ID and blob resource ID from Storage account V2 - azure-storage

I want to retrieve the resource IDs of all components of a Storage account using a PowerShell command:
Storage Account Resource ID
Blob service-Resource ID
File service-Resource ID
Queue service-Resource ID
Table service-Resource ID
I tried the command below for the blob service, but the resource ID is not available in the returned properties:
Get-AzStorageBlobServiceProperty -ResourceGroupName "rg-eval" -AccountName "dlseval"
How can I retrieve the resource IDs dynamically?

AFAIK, there is no direct way to get all the IDs you are looking for, but once you have the storage account resource ID, you can build the other service IDs yourself; they are fixed suffixes. Just try the code below:
# Get the storage account and take its resource ID
$account = Get-AzStorageAccount -ResourceGroupName '<Group Name>' -Name '<Account Name>'
$accountResID = $account.Id

# The child service IDs are fixed suffixes of the account ID
$blobResID  = $account.Id + "/blobServices/default"
$fileResID  = $account.Id + "/fileServices/default"
$queueResID = $account.Id + "/queueServices/default"
$tableResID = $account.Id + "/tableServices/default"
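For reference, the constructed IDs follow the fixed ARM format, so the blob service ID for the account in the question should look like /subscriptions/<subscription-id>/resourceGroups/rg-eval/providers/Microsoft.Storage/storageAccounts/dlseval/blobServices/default.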

Related

Azure Data Factory - How to find the total count of objects with more than 1 file with the same "prefix" in an ADF expression?

Let's say I have a bunch of random sample files in a Blob container which I want to copy into a data lake as .parquet using an ADF copy activity:
abc.1.txt,
abc.2.txt,
abc.3.txt,
def.1.txt,
ghi.1.txt,
xyz.1.txt,
xyz.2.txt
All abc and xyz files should be merged/appended into their respective single .parquet file, and the remaining def and ghi files should become individual .parquet files in the data lake.
I need the output to be something like:
[
    {
        "name": "abc",
        "count": 3
    },
    {
        "name": "def",
        "count": 1
    },
    {
        "name": "ghi",
        "count": 1
    },
    {
        "name": "xyz",
        "count": 2
    }
]
The pipeline flow would look something like this:
GetMetadata --> Filter if only 1 file --> Run ForEach file --> Copy activity (without merge)
GetMetadata --> SetVariable --> Filter if > 1 file --> Run ForEach file --> Copy (with merge)
However, how do I get the count() of total files with the same prefix in the Filter activity?
A quick thought:
You could get the file details in the Get Metadata activity, push them to a SQL table, do a GROUP BY there, and return the count.
Then loop over the result set in a ForEach and use an If Condition to check the count.
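If you want to sanity-check the grouping logic outside ADF first, here is a minimal Python sketch of the same prefix/count idea, using the file names from the question:
from collections import Counter

# Sample file names from the question
file_names = [
    "abc.1.txt", "abc.2.txt", "abc.3.txt",
    "def.1.txt", "ghi.1.txt",
    "xyz.1.txt", "xyz.2.txt",
]

# Group by the prefix before the first dot and count occurrences
prefix_counts = Counter(name.split(".")[0] for name in file_names)

result = [{"name": prefix, "count": count} for prefix, count in sorted(prefix_counts.items())]
print(result)
# [{'name': 'abc', 'count': 3}, {'name': 'def', 'count': 1}, {'name': 'ghi', 'count': 1}, {'name': 'xyz', 'count': 2}]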
Here is what I'm doing.
My data lake: (screenshot)
First, get the child items in a Get Metadata activity.
I'm then writing them to a string variable (optional).
From the string variable, write to a text file.
From the text file, load into an Azure SQL Server table.
Use the below query in the Script activity:
select count(1) as filecount, fileprefix
from (
    select substring(name, 1, charindex('.', name) - 1) as fileprefix, type
    from [dbo].[temptest]
    CROSS APPLY OPENJSON(json) WITH (
        name varchar(200),
        type varchar(60)
    ) as my_json_array
) a
group by fileprefix
My script output: (screenshot)
Thanks.

Is it possible to change the delimiter of the AWS Athena output file?

Here is my sample code, where I create a file in an S3 bucket using AWS Athena. The file is in CSV format by default. Is there a way to change it to a pipe delimiter?
import json
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    client = boto3.client('athena')

    # Start Query Execution
    response = client.start_query_execution(
        QueryString="""
            select * from srvgrp
            where category_code = 'ACOMNCDU'
        """,
        QueryExecutionContext={
            'Database': 'tmp_db'
        },
        ResultConfiguration={
            'OutputLocation': 's3://tmp-results/athena/'
        }
    )
    queryId = response['QueryExecutionId']
    print('Query id is :' + str(queryId))
There is a way to do that with a CTAS query.
BUT:
This is a hacky way and not what CTAS queries are meant to be used for, since it will also create a new table definition in the AWS Glue Data Catalog.
I'm not sure about performance.
CREATE TABLE "UNIQU_PREFIX__new_table"
WITH (
format = 'TEXTFILE',
external_location = 's3://tmp-results/athena/__SOMETHING_UNIQUE__',
field_delimiter = '|',
bucketed_by = ARRAY['__SOME_COLUMN__'],
bucket_count = 1
) AS
SELECT *
FROM srvgrp
WHERE category_code = 'ACOMNCDU'
Note:
It is important to set bucket_count = 1, otherwise Athena will create multiple files.
The name of the table in CREATE TABLE ... also needs to be unique, e.g. use a timestamp prefix/suffix which you can inject at Python runtime.
The external location needs to be unique as well, e.g. a timestamp prefix/suffix injected at Python runtime (see the sketch below). I would advise embedding the table name into the S3 path.
You need to include in bucketed_by only one of the columns from the SELECT.
At some point you will need to clean up the AWS Glue Data Catalog of all the table definitions that were created this way.
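A minimal sketch of that approach wired into the question's Lambda, assuming boto3 is available; the table name, S3 prefix and the bucketed_by column choice are illustrative only:
import time
import boto3

def lambda_handler(event, context):
    client = boto3.client('athena')

    # Unique suffix injected at runtime so both the table name and the
    # external location are unique per run (illustrative naming)
    suffix = time.strftime('%Y%m%d%H%M%S')
    table_name = 'tmp_pipe_export_' + suffix
    external_location = 's3://tmp-results/athena/' + table_name + '/'

    ctas_query = """
        CREATE TABLE "{table}"
        WITH (
            format = 'TEXTFILE',
            external_location = '{location}',
            field_delimiter = '|',
            bucketed_by = ARRAY['category_code'],
            bucket_count = 1
        ) AS
        SELECT * FROM srvgrp
        WHERE category_code = 'ACOMNCDU'
    """.format(table=table_name, location=external_location)

    response = client.start_query_execution(
        QueryString=ctas_query,
        QueryExecutionContext={'Database': 'tmp_db'},
        ResultConfiguration={'OutputLocation': 's3://tmp-results/athena/'}
    )
    print('Query id is :' + str(response['QueryExecutionId']))
Once the query finishes, the pipe-delimited files land under external_location, and the temporary table definition can be dropped as noted above.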

BigQuery validator malfunction

I'm using the BigQuery web UI, and sometimes the validator will display the green tick to show that all is good to go, but then the query does not execute even though the validator has approved it. Other times it just seems to keep thinking indefinitely without ever validating anything, and then I don't know how to proceed. I'm very reliant on it as I'm new to SQL, so I get stuck a lot when it malfunctions.
I tried deleting the table and recreating it, but it just gives me a blank error. Screenshot below.
Can this be caused by bad latency to US servers? I just tested my internet speed and it's looking very good, but I am in South Africa. If this is the case, what would the workaround be?
The error simply says "Cannot run query", as in the attached screenshots. The query looks like this now:
SELECT *,
CASE
WHEN STORE = 'Somerset Mall' THEN 'Somerset'
WHEN STORE = 'Pavilion 8ta Flagship' THEN 'Pavilion'
WHEN STORE = 'N1 City' THEN 'N1'
WHEN STORE = 'GALLIERIA' THEN 'Galleria'
WHEN STORE = 'KWADUKUZA' THEN 'Stanger'
WHEN STORE = 'Çape Town' THEN 'ÇBD'
WHEN STORE = 'Walmer Park' THEN 'Walmer'
WHEN STORE = 'Canal Walk' THEN 'Canal Walk'
WHEN STORE = 'Cape Gate' THEN 'Çape Gate'
WHEN STORE = 'CAVENDISH' THEN 'Cavendish'
WHEN STORE = 'Kenilworth' THEN 'Kenilworth'
WHEN STORE = 'Table View' THEN 'Table View'
WHEN STORE = 'Old Mutual Pinelands' THEN 'Old Mutual'
WHEN STORE = 'Sea Point' THEN 'Sea Point'
WHEN STORE = 'Knysna' THEN 'Knysna'
WHEN STORE = 'George' THEN 'George'
WHEN STORE = 'Mossel Bay' THEN 'Mossel Bay'
WHEN STORE = 'Hermanus' THEN 'Hermanus'
WHEN STORE = 'Mitchells Plain' THEN 'Mitchells Plain'
WHEN STORE = 'Stellenbosch' THEN 'Stellenbosch'
WHEN STORE = 'Tygervalley' THEN 'Tygervalley'
WHEN STORE = 'Worcester' THEN 'Worcester'
WHEN STORE = 'Gateway' THEN 'Gateway'
WHEN STORE = 'Musgrave' THEN 'Musgrave'
WHEN STORE = 'Pietermaritzburg' THEN 'Pietermaritzburg'
WHEN STORE = 'Richards Bay' THEN 'Richards Bay'
WHEN STORE = 'ETHEKWENI' THEN 'eThekwini'
WHEN STORE = 'Bluff' THEN 'Bluff'
WHEN STORE = 'Chatsworth' THEN 'Chatsworth'
WHEN STORE = 'Ballito' THEN 'Ballito'
WHEN STORE = 'Hemmingways 8ta Flagship' THEN 'Hemmingways'
WHEN STORE = 'Baywest' THEN 'Baywest'
WHEN STORE = 'Greenacres' THEN 'Bridge'
WHEN STORE = 'Vincent Park' THEN 'Vincent Park'
WHEN STORE = 'Bloemfontein' THEN 'Bloemfontein'
WHEN STORE = 'Welkom' THEN 'Welkom'
WHEN STORE = 'Kimberley' THEN 'Kimberley'
ELSE 'NEW QMAN STORE?'
END AS STORE_NAME
FROM `tester-253410.test1.Qman_data`
blank error
screenshot of a failed query (bottom left) while validator is green
screenshot of query beginning
I do not think that deleting the table will fix the issue.
It seems like an issue with the BigQuery UI or with your browser; try clearing your browser's cache & cookies.
My suggestion is to try the command-line tool through Cloud Shell, running interactive and batch query jobs with the CLI and setting the --dry_run flag (if set, the job is not run; a valid query returns a mostly empty response with some processing statistics, while an invalid query returns the same error it would return if it were not a dry run) to validate your queries.
For example:
bq query \
--use_legacy_sql=false \
--dry_run \
'SELECT
COUNTRY,
AIRPORT,
IATA
FROM
`project_id`.dataset.airports
LIMIT
1000'
Returning:
Query successfully validated. Assuming the tables are not modified, running this query will process 122 bytes of data.
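If you'd rather validate from code than from the shell, the BigQuery Python client supports the same dry-run check. A minimal sketch, reusing the table from the question:
from google.cloud import bigquery

client = bigquery.Client()

# Dry run: the query is parsed and validated and the bytes it would process
# are reported, but nothing is executed and nothing is billed
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

query_job = client.query(
    "SELECT STORE FROM `tester-253410.test1.Qman_data` LIMIT 10",
    job_config=job_config,
)
print("This query will process {} bytes.".format(query_job.total_bytes_processed))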

Create View that will extract metadata information about dataset and table sizes in different environments

We need to monitor table sizes in different environments.
Use the Google metadata API to get the information for a given project/environment.
We need to create a view which will provide:
1. What are all the datasets
2. What tables are in each dataset
3. Table sizes
4. Dataset size
BigQuery already has such views built in for you: INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, tables, and views.
For example, the query below returns metadata for all datasets in the default project:
SELECT * FROM INFORMATION_SCHEMA.SCHEMATA
or
for my_project
SELECT * FROM my_project.INFORMATION_SCHEMA.SCHEMATA
There are other such views for tables as well.
In addition, there are meta-tables that can be used to get more info about the tables in a given dataset: __TABLES_SUMMARY__ and __TABLES__
SELECT * FROM `project.dataset.__TABLES__`
For example:
SELECT table_id,
DATE(TIMESTAMP_MILLIS(creation_time)) AS creation_date,
DATE(TIMESTAMP_MILLIS(last_modified_time)) AS last_modified_date,
row_count,
size_bytes,
CASE
WHEN type = 1 THEN 'table'
WHEN type = 2 THEN 'view'
WHEN type = 3 THEN 'external'
ELSE '?'
END AS type,
TIMESTAMP_MILLIS(creation_time) AS creation_time,
TIMESTAMP_MILLIS(last_modified_time) AS last_modified_time,
dataset_id,
project_id
FROM `project.dataset.__TABLES__`
In order to automate the query to check every dataset in the project, instead of adding them manually with UNION ALL, you can follow the advice given by @ZinkyZinky here and create a query that generates the UNION ALL calls for every dataset.__TABLES__. I have not managed to use this solution fully automatically in BigQuery, because I don't find a way to execute a command generated as a string (which is what string_agg is creating). However, I have managed to develop the solution in Python, adding the generated string to the next query. You can find the code below. It also creates a new table and stores the results there:
from google.cloud import bigquery
client = bigquery.Client()
project_id = "wave27-sellbytel-bobeda"
# Construct a full Dataset object to send to the API.
dataset_id = "project_info"
dataset = bigquery.Dataset(".".join([project_id, dataset_id]))
dataset.location = "US"
# Send the dataset to the API for creation.
# Raises google.api_core.exceptions.Conflict if the Dataset already
# exists within the project.
dataset = client.create_dataset(dataset) # API request
print("Created dataset {}.{}".format(client.project, dataset.dataset_id))
schema = [
    bigquery.SchemaField("dataset_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("table_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("size_bytes", "INTEGER", mode="REQUIRED"),
]
table_id = "table_info"
table = bigquery.Table(".".join([project_id, dataset_id, table_id]), schema=schema)
table = client.create_table(table) # API request
print(
"Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table(table_id)
job_config.destination = table_ref
# QUERIES
# 1. Creating the UNION ALL list with the table information of each dataset
query = (
r"SELECT string_agg(concat('SELECT * from `', schema_name, '.__TABLES__` '), 'union all \n') "
r"from INFORMATION_SCHEMA.SCHEMATA"
)
query_job = client.query(query, location="US") # API request - starts the query
select_tables_from_all_datasets = ""
for row in query_job:
    select_tables_from_all_datasets += row[0]
# 2. Using the before mentioned list to create a table.
query = (
"WITH ALL__TABLES__ AS ({})"
"SELECT dataset_id, table_id, size_bytes FROM ALL__TABLES__;".format(select_tables_from_all_datasets)
)
query_job = client.query(query, location="US", job_config=job_config) # job_config configures in which table the results will be stored.
for row in query_job:
    print(row)
print('Query results loaded to table {}'.format(table_ref.path))
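Once that job finishes, the destination table project_info.table_info holds one row per table with its size_bytes, and a view on top of it (for example, one that sums size_bytes per dataset_id) gives both the per-table and per-dataset sizes the original question asks for.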

How can I get table_name, report_name and universe_name from the SDK?

I want to get table_name, report_name and universe_name from the SDK.
Is it possible with the Java SDK?
I can query the universes like this:
IInfoObjects infoObjectsUniverse2;
IInfoStore iStore2;
IEnterpriseSession es2 = null;
try {
    es2 = CrystalEnterprise.getSessionMgr().logon(user, password, CMSName, cmsAuthType);
    //session.setAttribute("enterpriseSession", es);
    iStore2 = (IInfoStore) es2.getService("", "InfoStore");
    IInfoObjects getuniv;
    String queryUniverse = "SELECT * FROM ci_appobjects WHERE SI_Kind='DSL.MetaDataFile' and SI_SPECIFIC_KIND = 'DSL.Universe'";
    getuniv = iStore2.query(queryUniverse);
You can retrieve the WI report's name (SI_NAME) and its associated universes (SI_UNIVERSE,SI_DSL_UNIVERSE) from the repository:
SELECT si_id, si_name, si_universe, si_dsl_universe FROM ci_infoobjects WHERE si_kind='Webi' and si_instance=0
Both the SI_UNIVERSE and SI_DSL_UNIVERSE properties are collections of IDs that you'll need to serialize and include in a second query to get the details about the universes:
SELECT * FROM ci_appobjects WHERE si_id IN ([serialized list of IDs])
If you just want a list of the UNV or UNX universes, use this query:
SELECT * FROM ci_appobjects WHERE si_kind IN ('Universe','DSL.Universe')
You'll need to use one of the Universe SDKs to access the universe's collection of tables and such.
You could also try the RESTful Raylight SDK, but good luck finding the documentation--all the links that I've seen are orphans.