KQL/Search-AzGraph join operator

I have just begun using Azure Resource Graph (Search-AzGraph) and am still learning its queries. I am running into an issue when attempting to pull key vault whitelisted IP addresses. Below is the query I am currently running:
Search-AzGraph -Query "resources
|where type == 'microsoft.keyvault/vaults'
|where properties.publicNetworkAccess == 'Enabled'
|mv-expand properties.networkAcls.ipRules
|project kvName = name, kvRule = properties.networkAcls.ipRules"
Instead of providing a list of addresses per vault, the output returns duplicated lines for the same vault. This only happens once a vault has more than a certain number of whitelisted addresses; I am not sure of the exact threshold:
kvName kvRule
------ ------
Vault1
Vault2
Vault3 {@{value=1.1.1.1/32}}
Vault4 {@{value=1.1.1.1/32}, @{value=2.2.2.2/32}}
Vault5 {@{value=1.1.1.1/32}, @{value=2.2.2.2/32}}
Vault6 {@{value=1.1.1.1/32}}
Vault7
Vault8
Vault9 <-- {@{value=1.1.1.1/32}, @{value=2.2.2.2/32}, @{value=3.3.3.3/32}, @{value=4.4.4.4/32}...}
Vault9 <-- {@{value=1.1.1.1/32}, @{value=2.2.2.2/32}, @{value=3.3.3.3/32}, @{value=4.4.4.4/32}...}
Vault9 <-- {@{value=1.1.1.1/32}, @{value=2.2.2.2/32}, @{value=3.3.3.3/32}, @{value=4.4.4.4/32}...}
Vault9 <-- {@{value=1.1.1.1/32}, @{value=2.2.2.2/32}, @{value=3.3.3.3/32}, @{value=4.4.4.4/32}...}
Vault9 <-- {@{value=1.1.1.1/32}, @{value=2.2.2.2/32}, @{value=3.3.3.3/32}, @{value=4.4.4.4/32}...}
I also tried extending the property values to see if that helped, but the format only changed to the following:
Code:
Search-AzGraph -Query "resources
|where type == 'microsoft.keyvault/vaults'
|where properties.publicNetworkAccess == 'Enabled'
|extend kvRule=parsejson(tostring(properties.networkAcls.ipRules))
|mv-expand kvRule
|project kvName = name, kvRule.value"
Output:
kvName kvRule
------ ------
Vault1
Vault2 1.1.1.1/32
Vault3 1.2.3.4/32
Vault4 5.6.7.8/32
Vault5 1.2.3.3/32
Vault6
Vault7
Vault8
Vault9 <-- 1.1.1.1/32
Vault9 <-- 2.2.2.2/32
Vault9 <-- 3.3.3.3/32
Vault9 <-- 4.4.4.4/32
I came across the join operator and attempted to apply its documented examples to my queries, but failed; the output was always similar to the one above, or I received an error.
This query produces output similar to the second example:
Search-AzGraph -Query "Resources
| join kind=leftouter (resources | where type=='microsoft.keyvault/vaults' | where properties.publicNetworkAccess == 'Enabled' | extend kvRule=parsejson(tostring(properties.networkAcls.ipRules)) | mv-expand kvRule | project id, kvName = name, kvURI = properties.vaultUri, kvRule) on id
| where type == 'microsoft.keyvault/vaults'
| project id, name, kvType = type, kvLoc = location, kvSub = subscriptionId, kvURI, kvRule= properties.networkAcls.ipRules"
I also attempted the query below, which errored out because the join key is of a dynamic type:
Search-AzGraph -Query "Resources
|where type == 'microsoft.keyvault/vaults'
|where properties.publicNetworkAccess == 'Enabled'
|extend kvRule=parsejson(tostring(properties.networkAcls.ipRules))
|project kvID = id, name, kvLoc = location, kvSub = subscriptionId, kvURI = properties.vaultUri, kvRule
| join kind=leftouter (
Resources
|where type == 'microsoft.keyvault/vaults'
|where properties.publicNetworkAccess == 'Enabled'
|project name, kvRule = tolower(id))
on kvRule
| summarize by name"
Error:
"code": "InvalidQuery",
"message": "Query is invalid. Please refer to the documentation for the Azure Resource Graph service and fix the error before retrying."
"code": "Default",
"message": "join key 'kvRule' is of a 'dynamic' type. Please use an explicit cast using extend operator in the join legs (for example, '... | extend kvRule = tostring(kvRule) | join (... | extend kvRule = tostring(kvRule)) on kvRule') as join on a 'dynamic' type is not supported."
I am struggling to understand how to make this query work. I really believe I need to use the join operator to get it right, but I do not have enough understanding of KQL/database queries to do so, and I am looking to be educated on how to perform this query correctly.
My goal is a single row per vault name, with kvRule containing the full list of whitelisted addresses (if there are any) on that one line:
kvName kvRule
------ ------
Vault1
Vault2 1.1.1.1/32, 2.2.2.2/32, 3.3.3.3/32
Vault3 1.1.1.1/32, 2.2.2.2/32, 3.3.3.3/32, 1.1.1.3/32, 2.2.2.4/32, 3.3.3.1/32

To fix the query, instead of
|extend kvRule=parsejson(tostring(properties.networkAcls.ipRules))
make it
|extend kvRule=tostring(parsejson(properties.networkAcls.ipRules))
For better filtering, you may also consider adding
| where isnotempty(kvRule)
and, for cleaner results from the join, use
kind=inner
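Putting those suggestions together, a minimal sketch of the reworked query might look like this (once the whole ipRules array is cast to a single string per vault, the mv-expand is no longer needed; drop the isnotempty filter if vaults without a whitelist should still appear):
Search-AzGraph -Query "resources
| where type == 'microsoft.keyvault/vaults'
| where properties.publicNetworkAccess == 'Enabled'
| extend kvRule = tostring(parsejson(properties.networkAcls.ipRules))
| where isnotempty(kvRule)
| project kvName = name, kvRule"
Note that kvRule then holds the rule list as one JSON string per vault rather than a bare comma-separated list; if the comma-separated form is required, one option is to mv-expand the rules and re-aggregate per vault with summarize make_list() plus strcat_array().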

Related

KQL - Joining 2 tables using Equality by Value

I am attempting to join two tables in KQL within Microsoft Defender.
These tables don't have matching columns; however, they do have matching fields.
LeftTable: EmailEvents Field: RecipientEmailAddress
RightTable: IdentityInfo Field: AccountUpn
The query I am using is as follows
EmailEvents
| where EmailDirection == "Inbound"
| where Subject == "invoice" or SenderFromAddress == "testtest@outlook.com"
| project RecipientEmailAddress, Subject, InternetMessageId, SenderFromAddress
| join kind=inner (IdentityInfo
| distinct AccountUpn, AccountDisplayName, JobTitle , Department, City, Country)
on $left.RecipientEmailAddress -- $right.AccountUpn
I am seeing the error
Semantic error
Error message
join: only column entities or equality expressions are allowed in this context.
How to resolve
Fix semantic errors in your query
Can someone assist? I am not sure where I am going wrong here.
try replacing this:
on $left.RecipientEmailAddress -- $right.AccountUpn
with this:
on $left.RecipientEmailAddress == $right.AccountUpn
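For reference, the complete query with that single change applied (everything else exactly as in the question) would be:
EmailEvents
| where EmailDirection == "Inbound"
| where Subject == "invoice" or SenderFromAddress == "testtest@outlook.com"
| project RecipientEmailAddress, Subject, InternetMessageId, SenderFromAddress
| join kind=inner (IdentityInfo
    | distinct AccountUpn, AccountDisplayName, JobTitle, Department, City, Country)
    on $left.RecipientEmailAddress == $right.AccountUpn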

How to use `sum` within `summarize` in a KQL query?

I'm working on logging an Azure Storage Account. I have a Diagnostic Setting applied and am using Log Analytics to write KQL queries.
My goal is to determine the number of GetBlob requests (OperationName) for a given fileSize (RequestBodySize).
The challenge is that I need to sum the RequestBodySize for all GetBlob operations on each file. I'm not sure how to nest sum in summarize.
Tried so far:
StorageBlobLogs
| where TimeGenerated >= ago(5h)
and AccountName == 'storageAccount'
and OperationName == 'GetBlob'
| summarize count() by Uri, fileSize = format_bytes(RequestBodySize)
| render scatterchart
Results in:
Also tried: fileSize = format_bytes(sum(RequestBodySize)) but this errored out.
Any ideas?
EDIT 1: Testing out @Yoni's solution.
Here is an example of RequestBodySize with no summarization:
When implementing the summarize query (| summarize count() by Uri, fileSize = format_bytes(RequestBodySize)), the results are 0 bytes.
Though it's clear there are multiple calls for a given Uri, the sum doesn't seem to be working.
EDIT 2:
And yeah... pays to verify the field names! There is no RequestBodySize field available, only ResponseBodySize. Using the correct value worked (imagine that!).
I need to sum the RequestBodySize for all GetBlob operations on each file
If I understood your question correctly, you could try this:
StorageBlobLogs
| where TimeGenerated >= ago(5h)
and AccountName == 'storageAccount'
and OperationName == 'GetBlob'
| summarize count(), total_size = format_bytes(sum(RequestBodySize)) by Uri
Here's an example using a dummy data set:
datatable(Url:string, ResponseBodySize:long)
[
"https://something1", 33554432,
"https://something3", 12341234,
"https://something1", 33554432,
"https://something2", 12345678,
"https://something2", 98765432,
]
| summarize count(), total_size = format_bytes(sum(ResponseBodySize)) by Url
Url                  count_  total_size
------------------   ------  ----------
https://something1   2       64 MB
https://something3   1       12 MB
https://something2   2       106 MB
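Combining this answer with the field-name correction from EDIT 2 above (the table exposes ResponseBodySize rather than RequestBodySize), the working query would presumably end up as:
StorageBlobLogs
| where TimeGenerated >= ago(5h)
    and AccountName == 'storageAccount'
    and OperationName == 'GetBlob'
| summarize count(), total_size = format_bytes(sum(ResponseBodySize)) by Uri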

Get the newest partner record for each NAST table message type?

The question is generic (not anymore after the edits...), because I understand this is a common problem for other tables as well, but I will describe my particular problem with the selection of partners for output messages.
For a given invoice, I want to get the partner linked to each message type in NAST table. There could be multiple entries for the same message type so I want the newest one based on fields ERDAT and ERUHR (date and time).
I tried to do it with subqueries, but it got very ugly; the time field in particular requires a double subquery, because you first need to get the latest date...
Then I implemented the solution below, but I don't like it and was hoping for something better:
DATA: lt_msg_type_rg TYPE RANGE OF kschl.
lt_msg_type_rg = VALUE #( FOR ls_msg_type IN me->mt_message_type
( sign = 'I' option = 'EQ' low = ls_msg_type-kschl ) ).
SELECT FROM nast AS invoice_msg_status
FIELDS invoice_msg_status~kschl AS message_type,
invoice_msg_status~parnr AS partner_num,
CONCAT( invoice_msg_status~erdat, invoice_msg_status~eruhr ) AS create_timestamp
WHERE invoice_msg_status~kappl = @c_app_invoicing
AND invoice_msg_status~objky = @me->m_invoice_num
AND invoice_msg_status~kschl IN @lt_msg_type_rg
ORDER BY create_timestamp DESCENDING
INTO TABLE @DATA(lt_msg_partner).
DATA: lt_partner_rg TYPE RANGE OF parnr.
LOOP AT lt_msg_partner ASSIGNING FIELD-SYMBOL(<lgr_msg_partner>) GROUP BY <lgr_msg_partner>-message_type.
lt_partner_rg = COND #( WHEN line_exists( lt_partner_rg[ low = <lgr_msg_partner>-partner_num ] )
THEN lt_partner_rg
ELSE VALUE #( BASE lt_partner_rg ( sign = 'I' option = 'EQ' low = <lgr_msg_partner>-partner_num ) ) ).
ENDLOOP.
Example input (skipped irrelevant fields)
+-------+-------+-------+-------+------------+-------+
| KAPPL | OBJKY | KSCHL | PARNR | ERDAT | ERUHR |
+-------+-------+-------+-------+------------+-------+
| V3 | 12345 | Z001 | 11 | 27.10.2020 | 11:00 |
| V3 | 12345 | Z001 | 12 | 27.10.2020 | 12:00 |
| V3 | 12345 | Z002 | 13 | 27.10.2020 | 11:00 |
+-------+-------+-------+-------+------------+-------+
Expected output:
[12]
[13]
Unfortunately, SQL does not provide a simple syntax for this rather common kind of selection. Solutions will always involve multiple subsequent or nested selects.
According to your description, I assume you already found the do-it-all-in-a-single-deeply-nested ABAP SQL statement, but you are not satisfied with it because readability suffers too much.
For cases like this, we often resort to ABAP-Managed Database Procedures (AMDPs). They allow decomposing complicated nested selects into a series of simple subsequent selects.
CLASS cl_read_nast DEFINITION
PUBLIC FINAL CREATE PUBLIC.
PUBLIC SECTION.
INTERFACES if_amdp_marker_hdb.
TYPES:
BEGIN OF result_row_type,
parnr TYPE char2,
END OF result_row_type.
TYPES result_table_type
TYPE STANDARD TABLE OF result_row_type
WITH EMPTY KEY.
TYPES:
BEGIN OF key_range_row_type,
kschl TYPE char4,
END OF key_range_row_type.
TYPES key_range_table_type
TYPE STANDARD TABLE OF key_range_row_type
WITH EMPTY KEY.
CLASS-METHODS select
IMPORTING
VALUE(application) TYPE char2
VALUE(invoice_number) TYPE char5
VALUE(message_types) TYPE key_range_table_type
EXPORTING
VALUE(result) TYPE result_table_type.
ENDCLASS.
CLASS cl_read_nast IMPLEMENTATION.
METHOD select
BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT
USING nast.
last_changed_dates =
select kappl, objky, kschl,
max( erdat || eruhr ) as last_changed_on
from nast
where kappl = :application
and objky = :invoice_number
and kschl in
( select kschl from :message_types )
group by kappl, objky, kschl;
last_changers =
select nast.kschl,
max( nast.parnr ) as parnr
from nast
inner join :last_changed_dates
on nast.kappl = :last_changed_dates.kappl
and nast.objky = :last_changed_dates.objky
and nast.kschl = :last_changed_dates.kschl
and nast.erdat || nast.eruhr = :last_changed_dates.last_changed_on
group by nast.kschl;
result =
select distinct parnr
from :last_changers;
ENDMETHOD.
ENDCLASS.
Verified with the following integration test:
CLASS integration_tests DEFINITION
FOR TESTING RISK LEVEL CRITICAL DURATION SHORT.
PRIVATE SECTION.
TYPES db_table_type
TYPE STANDARD TABLE OF nast
WITH EMPTY KEY.
CLASS-METHODS class_setup.
METHODS select FOR TESTING.
ENDCLASS.
CLASS integration_tests IMPLEMENTATION.
METHOD class_setup.
DATA(sample) =
VALUE db_table_type(
( kappl = 'V3' objky = '12345' kschl = 'Z001' parnr = '11' erdat = '20201027' eruhr = '1100' )
( kappl = 'V3' objky = '12345' kschl = 'Z001' parnr = '12' erdat = '20201027' eruhr = '1200' )
( kappl = 'V3' objky = '12345' kschl = 'Z002' parnr = '13' erdat = '20201027' eruhr = '1100' ) ).
MODIFY nast
FROM TABLE @sample.
COMMIT WORK AND WAIT.
ENDMETHOD.
METHOD select.
DATA(invoicing) = 'V3'.
DATA(invoice_number) = '12345'.
DATA(message_types) =
VALUE cl_read_nast=>key_range_table_type(
( kschl = 'Z001' )
( kschl = 'Z002' ) ).
cl_read_nast=>select(
EXPORTING
application = invoicing
invoice_number = invoice_number
message_types = message_types
IMPORTING
result = DATA(actual_result) ).
DATA(expected_result) =
VALUE cl_read_nast=>result_table_type(
( parnr = '12' )
( parnr = '13' ) ).
cl_abap_unit_assert=>assert_equals(
act = actual_result
exp = expected_result ).
ENDMETHOD.
ENDCLASS.
First of all, your snippet is not correct, because you are checking existence (deduplicating) only by partner number, while the same partner could potentially serve different message types; at least in the dataset on my test system I see such rows. So you should also check by message type. A grouping loop by message type combined with deduplication by partner number makes no sense, because you are stripping valid partners that occur under different message types. You need:
SELECT
....
ORDER BY message_type, create_timestamp DESCENDING
....
So your LOOP grouping can be simplified into these two lines:
DELETE ADJACENT DUPLICATES FROM lt_msg_partner COMPARING message_type.
lt_partner_rg = VALUE #( BASE lt_partner_rg
                         FOR GROUPS value_no OF <line_no> IN lt_msg_partner
                         GROUP BY ( partner_num = <line_no>-partner_num ) WITHOUT MEMBERS
                         ( sign = 'I' option = 'EQ' low = value_no-partner_num ) ).
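Putting that together with the SELECT from the question (variable names taken from there; a sketch, not tested), the simplified version would look roughly like this:
SELECT FROM nast AS invoice_msg_status
  FIELDS invoice_msg_status~kschl AS message_type,
         invoice_msg_status~parnr AS partner_num,
         CONCAT( invoice_msg_status~erdat, invoice_msg_status~eruhr ) AS create_timestamp
  WHERE invoice_msg_status~kappl = @c_app_invoicing
    AND invoice_msg_status~objky = @me->m_invoice_num
    AND invoice_msg_status~kschl IN @lt_msg_type_rg
  ORDER BY message_type, create_timestamp DESCENDING
  INTO TABLE @DATA(lt_msg_partner).

DELETE ADJACENT DUPLICATES FROM lt_msg_partner COMPARING message_type.

lt_partner_rg = VALUE #( BASE lt_partner_rg
                         FOR GROUPS value_no OF <line_no> IN lt_msg_partner
                         GROUP BY ( partner_num = <line_no>-partner_num ) WITHOUT MEMBERS
                         ( sign = 'I' option = 'EQ' low = value_no-partner_num ) ).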
As suggested in the comments to the AMDP-variant answer, this can also be done with CDS views.
First, we need a view that timestamps the data:
@AbapCatalog.sqlViewName: 'timednast'
define view timestamped_nast as select from nast {
kappl,
objky,
kschl,
parnr,
concat(erdat, eruhr) as timestamp
}
Second, because CDS' syntax doesn't allow timestamping and grouping in a single view, we need another view that calculates the latest change dates for each message type:
@AbapCatalog.sqlViewName: 'lchgnast'
define view last_changed_nast as
select from timestamped_nast {
kappl,
objky,
kschl,
max(timestamp) as last_changed_on
} group by kappl, objky, kschl
Third, we need to select the partner numbers associated with these time points:
@AbapCatalog.sqlViewName: 'lchbnast'
define view last_changers_nast as
select from last_changed_nast
inner join timestamped_nast
on timestamped_nast.kappl = last_changed_nast.kappl
and timestamped_nast.objky = last_changed_nast.objky
and timestamped_nast.kschl = last_changed_nast.kschl
and timestamped_nast.timestamp = last_changed_nast.last_changed_on
{
timestamped_nast.kappl,
timestamped_nast.objky,
timestamped_nast.kschl,
parnr
}
A SELECT on the last view, last_changers_nast, including the selection criteria on kappl, objky, and kschl, will then produce the list of latest changers.
I am not sure about the keys of the nast table. The third view assumes that there will be no two entries with exactly identical timestamps for one object. If this isn't true, the third view should add another aggregation, using max(parnr) instead of parnr.
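For illustration, a sketch of that final SELECT, reusing the host variables from the question (lt_latest_partners is just an illustrative target name):
SELECT FROM last_changers_nast
  FIELDS parnr
  WHERE kappl = @c_app_invoicing
    AND objky = @me->m_invoice_num
    AND kschl IN @lt_msg_type_rg
  INTO TABLE @DATA(lt_latest_partners). "illustrative name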

Report Builder Query Text Editor Where Clause Parentheses

My employer has switched data systems and reporting tools. We used to use Report Builder with a nicely built data model that allowed me to do some complex filtering easily. Then we used Business Objects, and though I didn't like it very much, it also let me do some complex filtering. Now we're back to Report Builder, but the data model is different, and the only filtering I seem to be able to do is a string of AND operators.
(Note: I'm self-taught on both Report Builder and Business Objects. I have minimal experience with the SQL coding language itself. Also, actual data labels have been changed in this example.)
I'm pulling from a large amount of data, so I need to filter on the query level. I first need to include data based on five criteria, like this.
| SYSTEM.REGION.REGION_STATUS_CODE = N'1'
| SYSTEM.STATE.STATE_STATUS_CODE = N'1'
AND | SYSTEM.ORDERS.DISCARDED_DATE IS NULL
| SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
| SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
Then I need to include data that fits one of two pairings, like this.
| | SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail'
| AND | SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A'
OR |
| | SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale'
| AND | SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A'
After I built my query using the query designer and switched to text mode, it gave me this.
WHERE
SYSTEM.REGION.REGION_STATUS_CODE = N'1'
AND SYSTEM.STATE.STATE_STATUS_CODE = N'1'
AND SYSTEM.ORDERS.DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail'
AND SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale'
AND SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A'
I've tried putting parentheses in, but I must have done it wrong because the query ran for ages before essentially giving me the entire database.
Anybody care to help a SQL newbie?
Presuming everything else is right, it should just be a matter of applying parentheses to get the logic right. Using slightly exaggerated whitespace to try to make it clear:
WHERE
SYSTEM.REGION.REGION_STATUS_CODE = N'1'
AND SYSTEM.STATE.STATE_STATUS_CODE = N'1'
AND SYSTEM.ORDERS.DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
AND (
(SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail')
OR
(SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale')
)
(It still may run forever, but that's more a factor of database size and indexing.)

Problem in the result after doing a select in the database

I'm having a problem with the result obtained from a SELECT in my SQLite database.
The database is already populated, and I'm running queries from my Adobe AIR application. My table has 6 columns:
id | name | email | CITY_ID | state_id | phone
When I do a select of the entire table, it returns me an array of objects.
result[30].id = 30;
result[30].name = John;
result[30].email = john@xxx.com;
result[30].city_id = 1352;
result[30].state_id = 352;
result[30].phone = xxxxxxxxx;
All the information comes back right, but the id value is incorrect (the correct value is not 30). It seems to me that I'm getting the numerical position in the result array rather than the value of the id column.
Has anyone had this problem?
UPDATE
My query is:
_selectStat = new SQLStatement();
_selectStat.addEventListener( SQLEvent.RESULT, onDataLoaded );
_selectStat.addEventListener( SQLErrorEvent.ERROR, onSqlError );
_selectStat.sqlConnection = _connection;
var sql:String = 'SELECT * FROM "main"."agencia"';
_selectStat.text = sql;
_selectStat.execute();
I'm not familiar with Adobe development or SQLite, so I'm speculating here. If 'id' is a built-in property of the result objects, then you may need to indicate that you want the column 'id' and not the property 'id'. There should be a way to do this with the Adobe application syntax or the SQLite SQL syntax. In MSSQL, brackets [] are used for this, so it would be [id] to indicate the column 'id' and not the property. There should be something similar for your environment.
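As an illustration of that idea (untested, and only a sketch based on the column names from the question), aliasing the column in the SELECT sidesteps any name collision on the result object:
SELECT id AS agencia_id, name, email, city_id, state_id, phone
FROM "main"."agencia";
The ActionScript code would then read result[n].agencia_id, which should reflect the actual column value rather than the array position.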