What is a straightforward way to parse hierarchical namespaces for filenames in Azure Storage URIs using KQL?

I'm trying to create a Log Analytics Workbook for some Azure Blob Storage accounts. The accounts are ADLS Gen2 with Hierarchical Namespaces enabled. The issue is that some files are uploaded directly at the container level, while others are uploaded inside subdirectories or sub-subdirectories (hierarchical namespaces).
This makes it rather challenging to surface filenames using KQL.
Is there a simple way to grab each subdirectory, if it exists, and each filename, or do I have to come up with a custom method for each blob container?
Example:
// Define variables
let varStartTime = todatetime('2022-10-02T18:00:00Z');
let varEndTime = todatetime('2022-10-02T20:00:00Z');
let varAccountName = 'stgtest';
//
StorageBlobLogs
| where
AccountName == varAccountName
and TimeGenerated between (varStartTime .. varEndTime)
| extend FileName = todynamic(split(Uri, '/')) // herein lies the issue
| project
TimeGenerated,
AccountName,
OperationName,
FileName
| sort by TimeGenerated asc
Results:

You can combine parse_url and parse_path to break the blob path apart:
StorageBlobLogs
| sample 15
| project Uri
| extend Path = parse_path(tostring(parse_url(Uri).Path))
| evaluate bag_unpack(Path, "Path_")
You can try this out in the Azure Log Analytics demo environment.
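Applied to the original query, the end-to-end version would look something like the untested sketch below; parse_path exposes RootPath, DirectoryPath, DirectoryName, Filename and Extension, so you can project whichever pieces you need:
// Define variables
let varStartTime = todatetime('2022-10-02T18:00:00Z');
let varEndTime = todatetime('2022-10-02T20:00:00Z');
let varAccountName = 'stgtest';
//
StorageBlobLogs
| where
    AccountName == varAccountName
    and TimeGenerated between (varStartTime .. varEndTime)
// Parse the blob path once, then pull the directory and file name out of it
| extend ParsedPath = parse_path(tostring(parse_url(Uri).Path))
| extend Directory = tostring(ParsedPath.DirectoryPath),
         FileName = tostring(ParsedPath.Filename)
| project TimeGenerated, AccountName, OperationName, Directory, FileName
| sort by TimeGenerated asc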

Related

Default SQL Server Session for IntelliJ

When opening a .bdy (or .sql, .vew, ...) file in IntelliJ, it always greets me with semantic errors on almost every line. That is because it needs a DB session to check the references against. DataGrip behaves identically.
For reference:
Can I somehow set a default here per file, directory, or project, or globally?
It's in File | Settings | Languages & Frameworks | SQL Resolution Scopes. There you can specify a global project mapping to a data source/database/schema, or define a mapping for any directory or file.
In DataGrip the settings path is File | Settings | Database | SQL Resolution Scopes.

Jenkins Allure report is not showing all the results when we have multiple scenarios

I have the following scenario outline and I am generating an Allure report, but the report does not contain data for all the scenarios; it only shows the data from the last run.
It shows only the result for the | uat1_nam | Password01 | test data.
The Jenkins plugin version I am using is 2.13.6.
Scenario Outline: Find a transaction based on different criteria and view the details
Given I am login to application with user "<user id>" and password "<password>"
When I navigate to Balances & Statements -> Find a transaction
Then I assert I am on screen Balances & Statements -> Find a transaction -> Find a transaction
#UAT1
Examples:
| user id | password |
| uat1_moz | Password01 |
| uat1_nam | Password01 |
I have a similar issue.
We are running tests with the same software on Linux and Windows, and generating results into 2 separate folders.
Then we have:
allure-reports
|_ linux_report
|_ windows_report
Then we are using the following command in the Jenkinsfile:
allure([
    includeProperties: false,
    jdk: '',
    properties: [],
    reportBuildPolicy: 'ALWAYS',
    results: [[path: 'allure-reports/linux-report'], [path: 'allure-reports/windows-report']]
])
Similar to Sarath, only the results from the last run are available...
I also tried to run the CLI directly on my machine, with the same results.
allure serve allure-reports/linux-report allure-reports/windows-report
I have already found several approaches; this one is very similar to my use case, but I do not understand why it works there and not for me...
https://github.com/allure-framework/allure2/issues/1051
I also tried the following method, but the Docker container does not run properly on Linux due to permission issues, even though I am running the container from a folder where I have full permissions. The results are the same if I pass my user ID as a parameter:
https://github.com/fescobar/allure-docker-service#MULTIPLE-PROJECTS---REMOTE-REPORTS
I was able to dig deeper into the topic, and I can finally show why the data gets overwritten.
I used a very simple example to generate 2 different reports, where only allure.epic was different.
As I suspected, if we generate 2 separate reports from the same source folder, only the latest report is kept (the allure.epic name was changed in between).
If I use 2 different folders with the same code (where only the allure.epic differs), then all the data is available, stored in different Suites!
So, to make Allure treat the reports as different and build a separate classification for each OS, the tests have to live in different locations in the code. That does not fit my use case, since the same code is tested on both Linux and Windows.
Or maybe there is an option in pytest-allure to specify the root classification?
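One way to approach that from pytest would be to label every test with the operating system it ran on via allure.dynamic, so the Linux and Windows runs land under different top-level suites. A minimal sketch only (the fixture name is made up, and I have not verified that this alone stops Allure from collapsing the two runs):
# conftest.py -- hypothetical autouse fixture; requires the allure-pytest plugin
import platform

import allure
import pytest

@pytest.fixture(autouse=True)
def label_tests_with_os():
    # Attach a "parentSuite" label so each run is grouped by OS in the report
    allure.dynamic.label("parentSuite", platform.system())  # e.g. "Linux" / "Windows"
    yield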

template_searchpath gives a TemplateNotFound error in Airflow and cannot find the SQL script

I have a DAG defined like this:
tmpl_search_path = '/home/airflow/gcs/sql_requests/'

with DAG(dag_id='pipeline', default_args=default_args, template_searchpath=[tmpl_search_path]) as dag:
    create_table = bigquery_operator.BigQueryOperator(
        task_id='create_table',
        sql='create_table.sql',
        use_legacy_sql=False,
        destination_dataset_table=some_table,
    )
The create_table task calls a SQL script, create_table.sql. This SQL script is not in the DAG folder: it is in a sql_requests folder at the same level as the DAG folder.
The structure inside the GCP Composer bucket (Composer being Google's managed Airflow) is:
bucket_name
|- airflow.cfg
|- dags
|  |- pipeline.py
|  |- ...
|- sql_requests
|  |- create_table.sql
What path do I need to set for template_searchpath to reference the sql_requests folder inside the Airflow bucket on GCP?
I have tried template_searchpath=['/home/airflow/gcs/sql_requests'], template_searchpath=['../sql_requests'] and template_searchpath=['/sql_requests'], but none of these worked.
The error message I get is jinja2.exceptions.TemplateNotFound.
According to https://cloud.google.com/composer/docs/concepts/cloud-storage, it is not possible to store files needed to execute DAGs anywhere other than the dags or plugins folders:
To avoid a workflow failure, store your DAGs, plugins, and Python modules in the dags/ or plugins/ folders—even if your Python modules do not contain DAGs or plugins.
This is the reason why I had the TemplateNotFound error.
You can store files in the mounted/known paths, which are dags/ and plugins/, OR data/.
The data folder has no capacity limits, but it is easy to trip yourself up by using it to store anything that the web server needs to read, because the web server cannot access that folder (e.g. if you put SQL files in the data/ folder, you will not be able to view the rendered template in the UI, but any task that needs to access the file at run time will run just fine).
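For illustration, pointing template_searchpath at the data/ folder would look roughly like the sketch below (this assumes Composer's standard /home/airflow/gcs/data mount and the Airflow 1.x import paths used in the question; default_args and the destination table are placeholders):
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators import bigquery_operator

# Placeholder values -- replace with your own
default_args = {'start_date': datetime(2020, 1, 1)}
some_table = 'my_project.my_dataset.my_table'

# gs://<bucket>/data is mounted for workers at /home/airflow/gcs/data
tmpl_search_path = '/home/airflow/gcs/data/sql_requests/'

with DAG(dag_id='pipeline', default_args=default_args, template_searchpath=[tmpl_search_path]) as dag:
    create_table = bigquery_operator.BigQueryOperator(
        task_id='create_table',
        sql='create_table.sql',  # resolved against template_searchpath
        use_legacy_sql=False,
        destination_dataset_table=some_table,
    )
Keep the caveat above in mind: with the file under data/, the rendered template will not be viewable in the web UI, but tasks will still run.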
Alternatively, move the 'sql_requests' folder into the 'dags' folder, so that your code looks like this:
tmpl_search_path = '/home/airflow/dags/sql_requests/'

with DAG(dag_id='pipeline', default_args=default_args, template_searchpath=[tmpl_search_path]) as dag:
    create_table = bigquery_operator.BigQueryOperator(
        task_id='create_table',
        sql='create_table.sql',
        use_legacy_sql=False,
        destination_dataset_table=some_table,
    )
For me, it works!
I believe that by default the operator looks for SQL files in the DAG folder, so you could put your SQL into the folder
gs://composer-bucket-name/dags/sql/create_table.sql
And then reference it as
sql = '/sql/create_table.sql'
If that doesn't work, try it without the leading / (which I'm not sure you need)
Edit
If you want to put them in a folder at the root of the bucket, try
sql = '../sql/create_table.sql'

How to handle file inputs with changing schemas in Talend

Question: How do I continue to process files that differ substantially from a base schema and that trigger tSchemaComplianceCheck errors?
Background
Suppose I have a folder of Customer xls files called file1, file2, ..., file1000. Assume I have imported the file schema into the Talend repository, called it 6Columns, and configured the Talend job to iterate through each of the files and process them:
1-tFileInput ->2-tSchemaCompliance-6Columns -> 3-tMap ->4-FurtherProcessing
Read each excel file
Compare it to the schema 6Columns
Format the output (rename columns)
Take the collection of Customer data and process it more
While processing, I notice that the schema compliance check is generating errors (errorCode 16) which point to a number of files (200) with a different schema, 13Columns, but there isn't a way to identify these files in advance and filter them into a subjob.
How do I amend my processing to correctly integrate the files with the 13Columns schema (what is the recommended way of handling this), and how should I design the job in case other schema changes occur?
1-tFileInput ->2-tSchemaCompliance-6Columns -> 3-tMap ->4-FurtherProcessing
|
|Reject Flow (ErrorCode 16)
|Schema-13Columns
|
|-> ??
Current thinking when errorCode 16 is detected:
Option 1, parallel: take the file path of the current file and process it against 13Columns using a new tFileInput, before merging the 2 flows back into 1.
Option 2, serial: collect the list of files that triggered the error and process them after I've finished with the compliant files.
You could try something like the below:
tFileList: read your input directory
tFileInput "schema6" - tSchemaComplianceCheck: read the files against the 6-column schema
tMap_1: further processing
In the reject part:
tMap after the reject link: add a new column containing the file path that was rejected
tFlowToIterate: used to get an iterate link, which is an acceptable input for the tFileInputDelimited that follows
tFileInput: read the data with the 13-column schema. The following components are the same as in part 1.
After that, you can push your data to a tHashOutput, in order to read it further in another subjob.

How to find the size of the largest doc lib

Is there a way to find the document library whose total size (the sum of the sizes of its contained documents) is the largest in the farm?
I checked Web Analytics --> Inventory --> Storage Usage,
but it didn't provide what I need.
I posted the same question to the SharePoint community and got the answer below, which worked.
If you have access to PowerShell then use this:
#Get the Site collection
$Site = Get-SPsite "http://sharepoint.crescent.com"
#Returns a DataTable similar to "Storage Management Page" in Site settings
$DataTable = $Site.StorageManagementInformation(2,0x11,0,10)
$DataTable | Select Title, ItemCount, Size, Directory | Format-Table
Get SharePoint Library Size with PowerShell
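To cover the farm-wide part of the question, the same call can be looped over every site collection; a rough sketch only (the Size column type and the enum arguments are taken on trust from the snippet above, so test it first):
# Rough sketch: reuse the StorageManagementInformation call above for every
# site collection in the farm and keep the single largest document library.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$AllLibs = foreach ($Site in Get-SPSite -Limit All) {
    # Same arguments as above: top storage consumers per site collection
    $Site.StorageManagementInformation(2,0x11,0,10) |
        Select-Object @{n='SiteUrl';e={$Site.Url}}, Title, ItemCount, Size, Directory
    $Site.Dispose()
}

# Largest library across the whole farm
$AllLibs | Sort-Object {[long]$_.Size} -Descending | Select-Object -First 1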