How would I remove specific words or text from a KQL query?

I have the following query, which exports all the data I need, but I would like the text '' removed from my final output. How would I achieve this?
| where type == "microsoft.security/assessments"
| project id = tostring(id),
Vulnerabilities = properties.metadata.description,
Severity = properties.metadata.severity,
Remediations = properties.metadata.remediationDescription
| parse kind=regex id with '/virtualMachines/' Name '/providers/'
| where isnotempty(Name)
| project Name, Severity, Vulnerabilities, Remediations

You could use replace_string() (https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/replace-string-function) to replace any substring with an empty string.
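For example, a minimal sketch that swaps the final | project for one that strips a substring. The '<p>' here is just a hypothetical placeholder for whatever text you want removed, and tostring() is applied because the metadata properties are dynamic values:
| project Name,
          Severity,
          Vulnerabilities = replace_string(tostring(Vulnerabilities), '<p>', ''),
          Remediations = replace_string(tostring(Remediations), '<p>', '')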

Related

Splunk - Add Conditional On Input

I have a Splunk dashboard. This dashboard has a Text input where the user can enter a path. After the input is entered, I would like to apply some conditional logic to the path before the search is executed. Is this possible in Splunk? Is there a way for me to take the Text input (i.e. path) and do something like:
var parameter1 = "value-a";
if (path == "/endpoint-1")
parameter1 = "value-b";
else if (path == "/endpoint-2")
parameter1 = "/endpoint-3";
// Execute search with parameter1
Thank you.
Subsearches!
Eg:
index=data [
    | makeresults count=1
    | eval path="$inputToken$"
    | eval parameter1=case(
        path="/endpoint-1", "value-b",
        path="/endpoint-2", "/endpoint-3")
    | fields parameter1
    | format ]
Subsearches run before the main search and alter that main search. After the subsearch runs, the effective main search here would be something like:
index=data parameter1="value-b"
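For reference, format renders the subsearch's result rows as a search-string fragment; with the single row above, the subsearch would expand to something like:
( ( parameter1="value-b" ) )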
Related reading to help on your subsearch journey:
https://docs.splunk.com/Documentation/Splunk/latest/SearchTutorial/Useasubsearch
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Format
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Return

Get value from JSON dimensional array in Oracle

I have the JSON below, from which I need to fetch the value of issuedIdentValue where issuedIdentType = PANCARD:
{
  "issuedIdent": [
    {"issuedIdentType":"DriversLicense","issuedIdentValue":"9797979797979797"},
    {"issuedIdentType":"SclSctyNb","issuedIdentValue":"078-01-8877"},
    {"issuedIdentType":"PANCARD","issuedIdentValue":"078-01-8877"}
  ]
}
I cannot hard-code the index value [2] in my query below, as the order of these records can change, so I want to get rid of any hard-coded index.
select json_value(
         '{"issuedIdent": [{"issuedIdentType":"DriversLicense","issuedIdentValue":"9797979797979797"},{"issuedIdentType":"SclSctyNb","issuedIdentValue":"078-01-8877"},{"issuedIdentType":"PANCARD","issuedIdentValue":"078-01-8877"}]}',
         '$.issuedIdent[2].issuedIdentValue'
       ) as output
from d1entzendev.ExternalEventLog
where eventname = 'CustomerDetailsInqSVC'
  and applname = 'digitalBANKING'
  and requid = '4fe1fa1b-abd4-47cf-834b-858332c31618';
What changes do I need to apply to the json_value function to achieve the expected result?
In Oracle 12c or higher, you can use JSON_TABLE() for this:
select value
from json_table(
       '{"issuedIdent": [{"issuedIdentType":"DriversLicense","issuedIdentValue":"9797979797979797"},{"issuedIdentType":"SclSctyNb","issuedIdentValue":"078-01-8877"},{"issuedIdentType":"PANCARD","issuedIdentValue":"078-01-8877"}]}',
       '$.issuedIdent[*]' columns (
         type  varchar(50) path '$.issuedIdentType',
         value varchar(50) path '$.issuedIdentValue'
       )
     ) t
where type = 'PANCARD'
This returns:
| VALUE |
| :---------- |
| 078-01-8877 |
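For completeness, a sketch of the same JSON_TABLE approach applied against the original table from the question; the JSON column name (payload here) is a guess, since the question only shows a string literal:
select jt.value
from d1entzendev.ExternalEventLog e,
     json_table(
       e.payload,  -- hypothetical column holding the JSON document
       '$.issuedIdent[*]' columns (
         type  varchar(50) path '$.issuedIdentType',
         value varchar(50) path '$.issuedIdentValue'
       )
     ) jt
where e.eventname = 'CustomerDetailsInqSVC'
  and e.applname = 'digitalBANKING'
  and e.requid = '4fe1fa1b-abd4-47cf-834b-858332c31618'
  and jt.type = 'PANCARD';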

HIVE-SQL_SERVER: HadoopExecutionException: Not enough columns in this line

I have a hive table with the following structure and data:
Table structure:
CREATE EXTERNAL TABLE IF NOT EXISTS db_crprcdtl.shcar_dtls (
    ID string,
    CSK string,
    BRND string,
    MKTCP string,
    AMTCMP string,
    AMTSP string,
    RLBRND string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/on/hadoop/dir/';
-------------------------------------------------------------------------------
ID   | CSK             | BRND        | MKTCP             | AMTCMP
-------------------------------------------------------------------------------
782  | flatn,grpl,mrtn | hnd,mrc,nsn | 34555,56566,66455 | 38900,59484,71450
1231 | jikl,bngr       | su,mrc,frd  | 56566,32333,45000 | 59872,35673,48933
123  | unsrvl          | tyt,frd,vlv | 25000,34789,33443 | 29892,38922,36781
I'm trying to push this data into SQL Server, but while doing so I get the following error message:
SQL Error [107090] [S0001]: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: Not enough columns in this line.
What I tried:
There's an online article where the author documents similar issues. I tried one of the suggestions there ("Looked in Excel and found two columns that had carriage returns"), but that didn't help either.
Any suggestion/help would be really appreciated. Thanks
If I understand your issue correctly, it seems that your comma-separated data is getting divided into multiple columns rather than one column on SQL Server, something like:
------------------------------
ID | CSK | BRND | MKTCP | AMTCMP
------------------------------
782 flatn grpl mrtn hnd mrc nsn 34555 56566 66455 38900 59484 71450
1231 jikl bngr su mrc frd 56566 32333 45000 59872 35673 48933
123 unsrvl tyt frd vlv 25000 34789 33443 29892 38922 36781
On Hive there are only 5 columns, and presumably the same on SQL Server (I'm assuming this, as you haven't shared that schema). If that's the case, more than 5 values are being passed per line while the schema defines only 5 columns, which is why the error is raised.
Refer to this document by Microsoft and try to create a FILE_FORMAT with FIELD_TERMINATOR = '\t',
like:
CREATE EXTERNAL FILE FORMAT <name>
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = '\t'
        -- other options:
        -- | STRING_DELIMITER = string_delimiter
        -- | FIRST_ROW = integer  -- only available in SQL DW
        -- | DATE_FORMAT = datetime_format
        -- | USE_TYPE_DEFAULT = { TRUE | FALSE }
        -- | ENCODING = { 'UTF8' | 'UTF16' }
    )
);
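For instance, a concrete sketch of how this might be wired up end to end; the format name, data source, and SQL-side types below are placeholders rather than anything given in the question:
CREATE EXTERNAL FILE FORMAT tsv_format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS ( FIELD_TERMINATOR = '\t' )
);

CREATE EXTERNAL TABLE dbo.shcar_dtls_ext (
    ID VARCHAR(50),
    CSK VARCHAR(500),
    BRND VARCHAR(500),
    MKTCP VARCHAR(500),
    AMTCMP VARCHAR(500),
    AMTSP VARCHAR(500),
    RLBRND VARCHAR(500)
)
WITH (
    LOCATION = '/on/hadoop/dir/',
    DATA_SOURCE = my_hadoop_source,  -- placeholder: your existing external data source
    FILE_FORMAT = tsv_format
);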
Hope that helps to resolve your issue :)

How do I identify problematic documents in S3 when querying data in Athena?

I have a basic Athena query like this:
SELECT *
FROM my.dataset LIMIT 10
When I try to run it I get an error message like this:
Your query has the following error(s):
HIVE_BAD_DATA: Error parsing field value for field 2: For input string: "32700.000000000004"
How do I identify the S3 document that has the invalid field?
My documents are JSON.
My table looks like this:
CREATE EXTERNAL TABLE my.data (
`id` string,
`timestamp` string,
`profile` struct<
`name`: string,
`score`: int>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'ignore.malformed.json' = 'true'
)
LOCATION 's3://my-bucket-of-data'
TBLPROPERTIES ('has_encrypted_data'='false');
Inconsistent schema
An inconsistent schema is when values in some rows are of a different data type. Let's assume that we have two JSON files:
// inside s3://path/to/bad.json
{"name":"1Patrick", "age":35}
{"name":"1Carlos", "age":"eleven"}
{"name":"1Fabiana", "age":22}
// inside s3://path/to/good.json
{"name":"2Patrick", "age":35}
{"name":"2Carlos", "age":11}
{"name":"2Fabiana", "age":22}
Then a simple query SELECT * FROM some_table will fail with
HIVE_BAD_DATA: Error parsing field value 'eleven' for field 1: For input string: "eleven"
However, we can exclude that file with a WHERE clause:
SELECT
"$PATH" AS "source_s3_file",
*
FROM some_table
WHERE "$PATH" != 's3://path/to/bad.json'
Result:
source_s3_file         | name     | age
---------------------------------------
s3://path/to/good.json | 2Patrick | 35
s3://path/to/good.json | 2Carlos  | 11
s3://path/to/good.json | 2Fabiana | 22
Of course, this is the best-case scenario, when we already know which files are bad. However, you can employ this approach to somewhat manually infer which files are good. You can also use LIKE or regexp_like to walk through multiple files at a time:
SELECT
COUNT(*)
FROM some_table
WHERE regexp_like("$PATH", 's3://path/to/go[a-z]*.json')
-- If this query doesn't fail, then those files are good.
The obvious drawback of this approach is the cost and time of executing the queries, especially if it is done file by file.
Malformed records
In the eyes of AWS Athena, good records are those formatted as a single JSON object per line:
{ "id" : 50, "name":"John" }
{ "id" : 51, "name":"Jane" }
{ "id" : 53, "name":"Jill" }
AWS Athena supports the OpenX JSON SerDe library, which can be set to evaluate malformed records as NULL by specifying
-- When you create table
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
when you create the table. Thus, the following query will reveal files with malformed records:
SELECT
DISTINCT("$PATH")
FROM "some_database"."some_table"
WHERE(
col_1 IS NULL AND
col_2 IS NULL AND
col_3 IS NULL
-- etc
)
Note: you can use just a single col_1 IS NULL if you are 100% sure that it doesn't contain empty fields other than in corrupted rows.
In general, malformed records are not that big of a deal, provided that 'ignore.malformed.json' = 'true'. For example, if a file contains:
{"name": "2Patrick","age": 35,"address": "North Street"}
{
"name": "2Carlos",
"age": 11,
"address": "Flowers Street"
}
{"name": "2Fabiana","age": 22,"address": "Main Street"}
the following query will still succeed:
SELECT
"$PATH" AS "source_s3_file",
*
FROM some_table
Result:
source_s3_file | name | age | address
-----------------------------|----------|-----|-------------
1 s3://path/to/malformed.json| 2Patrick | 35 | North Street
2 s3://path/to/malformed.json| | |
3 s3://path/to/malformed.json| | |
4 s3://path/to/malformed.json| | |
5 s3://path/to/malformed.json| | |
6 s3://path/to/malformed.json| | |
7 s3://path/to/malformed.json| 2Fabiana | 22 | Main Street
While with 'ignore.malformed.json' = 'false' (which is the default behaviour), exactly the same query will throw an error:
HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2 [character 3 line 1]

Concatenating rows in a Rails style

Suppose I want to check if some string appears as name-surname in the concatenation of the two columns name and surname of a table. How do I write valid SQL for this in a Rails style? And are these syntaxes correct?
SELECT (name + '-'+ surname) FROM table1 where (name + '-'+ surname = string)
table.select(:name+'-'+:surname).where((:name+'-'+:surname) == string)
I am not sure if I am understanding your question correctly, but I think this is what you want. For the following string variable,
string = "John - Doe"
you want to pull a record like this from the User table
id | name | surname
1 | John | Doe
If this is what you want, you can actually massage your string variable like this
parsed_string = string.split('-')
name = parsed_string[0].strip # strip to remove whitespace
sur_name = parsed_string[1].strip
Then, you can run the following code to get what you want:
users = User.where(:name => name, :surname => sur_name)
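Alternatively, if you'd rather keep the comparison in SQL, closer to your original attempt, here is a minimal sketch assuming a database with the || concatenation operator (e.g. PostgreSQL or SQLite):
users = User.where("name || '-' || surname = ?", string)
On MySQL you would use CONCAT(name, '-', surname) instead.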
Hope this answers your question.