Stream Analytics query to exclude records - azure-stream-analytics

I am successfully joining a reference data stream like this:
TenantInput AS
(
SELECT
Input.userId,
Input.tenantId,
FROM
Input
JOIN
Tenants TNTS ON Input.tenantId = TNTS.tenantId
)
Whereas TNTS is a JSON file in a storage blob:
[
{
"tenantId": "t1"
},
{
"tenantId": "t2"
}
]
This works well and the output only contains records for t1 + t2.
On a second output, I would like to have all data except for tenant t1 + t2 and so far I have not found a solution. I tried things like below, but this is not supported.
OtherTenantInput AS
(
SELECT
Input.userId,
Input.tenantId,
FROM
Input
WHERE
Input.tenanId NOT IN (SELECT * FROM TNTS)
)
Any ideas are welcome.

How about:
TenantInput AS
(
SELECT
Input.userId,
Input.tenantId,
FROM
Input
LEFT JOIN
Tenants TNTS ON Input.tenantId = TNTS.tenantId
WHERE TNTS.tenantId IS NULL
)
This will only output events from Input, where there is no tenantId can be found in TNTS.

Related

Extracting nested JSON using json_extract_array: getting null results on populated data fields?

I am attempting to extract the following configuration data from this nested json file (Image of JSON file), using this query:
SELECT
json_extract_scalar(f, '$.affiliationTypes') as affiliation_types
, json_extract_scalar(f, '$.organizationTypes') as organization_types
, json_extract_scalar(f, '$.verificationTypes') as verification_types
, json_extract_scalar(f, '$.notifierIds') as notifier_ids
, json_extract_scalar(f, '$.metadata') as metadata
, json_extract_scalar(body, '$.accountId') as account_id
FROM `my database`
left join unnest(json_extract_array(body, '$.request.config')) as f
LIMIT 100;
but am getting null results for all of the selected config data. I've even filtered on accounts/requests where I know there is config data listed still nothing. Any ideas? (many of these fields have data I've just nulled them out for this post).
AS an example, I would like for the first select statement to output: STUDENT_PART_TIME, EMPLOYEE, FACULTY, VETERAN, STUDENT_FULL_TIME, ACTIVE_DUTY
For those who dont want to open image, this is the json chunk I'm interested in:
"body": {
"accountId":""
,"activeAffiliationTypes":[]
,"affiliations":[]
,"aggregateState":"OPEN"
,"birthMonth":null
,"birthYear":null
,"customerFacingConclusiveVerificationType":null
,"customerFacingRequestedAffiliationType":""
,"dataSourceErrorCodes":null
,"dataSourceResponseHttpStatusCategory":"UNKNOWN"
,"dataSourceResponseHttpStatusCode":-1
,"emailDomain":null
,"errorCodes":[]
,"errors":[]
,"issuedRewards":[]
,"metadata":{}
,"request": {
"accountId":""
,"active":false
,"assetMap":{}
,"config": {
"affiliationTypes":["STUDENT_PART_TIME","EMPLOYEE","FACULTY","VETERAN","STUDENT_FULL_TIME","ACTIVE_DUTY"]
,"assetTypes":[]
,"consolationRewardIds":[]
,"ignoreVerifierClasses":null
,"locale":"en_US"
,"metadata":{}
,"notifierIds":null
,"organizationTypes":[]
,"rewardIds":[]
,"testMode":false
,"verificationModelClass":""
,"verificationSourceClasses":[]
,"verificationTypes":["AUTHORITATIVE"]
}
,"created":null
SELECT STRING(c)
FROM 'my database'
LEFT JOIN UNNEST(
JSON_QUERY_ARRAY(body, '$.request.config.affiliationTypes')
) c

Asking for help on correct way to us SQL with CTE to create JSON_OBJECT

The requested JSON needs to be in this form:
{
"header": {
"InstanceName": "US"
},
"erpReferenceData": {
"erpReferences": [
{
"ServiceID": "fb16e421-792b-4e9c-935b-3cea04a84507",
"ERPReferenceID": "J0000755"
},
{
"ServiceID": "7d13d907-0932-44c0-ad81-600c9b97b6e5",
"ERPReferenceID": "J0000756"
}
]
}
}
The program that I created looks like this:
dcl-s OutFile sqltype(dbclob_file);
exec sql
With x as (
select json_object(
'InstanceName' : trim(Cntry) ) objHeader
from xmlhdr
where cntry = 'US'),
y as (
select json_object(
'ServiceID' VALUE S.ServiceID,
'ERPReferenceID' VALUE I.RefCod) oOjRef
FROM IMH I
INNER JOIN GUIDS G ON G.REFCOD = I.REFCOD
INNER JOIN SERV S ON S.GUID = G.GUID
WHERE G.XMLTYPE = 'Service')
VALUES (
select json_object('header' : objHeader Format json ,
'erpReferenceData' : json_object(
'erpReferences' VALUE
JSON_ARRAYAGG(
ObjRef Format json)))
from x
LEFT OUTER JOIN y ON 1=1
Group by objHeader)
INTO :OutFile;
This is the compile error I get:
SQL0122: Position 41 Column OBJHEADER or expression in SELECT list not valid.
I am asking if this is the correct way to create this SQL statement, is there a better easier way? Any idea how to rewrite the SQL statement to make it work correctly?
The key with generating JSON or XML for that matter is to start from the inside and work your way out.
(I've simplified the raw data into just a test table...)
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
select * from elm;
Now add the next layer as a CTE the builds on the first CTE..
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
, arr (arrDta) as (values json_array (select erpRef from elm))
select * from arr;
And the next layer...
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
, arr (arrDta) as (values json_array (select erpRef from elm))
, erpReferences (refs) as ( select json_object
('erpReferences' value arrDta )
from arr)
select *
from erpReferences;
Nice thing about building with CTE's is at each step, you can see the results so far...
You can actually always go back and stick a Select * from CTE; in the middle to see what you have at some point.
Note that I'm building this in Run SQL Scripts. Once you have the statement complete, you can embed it in your RPG program.

How to group by duplicate value of nested array in Postgresql?

Previously question : How to group by duplicate value and nested the array Postgresql
Using this query :
SELECT json_build_object(
'nama_perusahaan',"a"."nama_perusahaan",
'proyek', json_agg(
json_build_object(
'no_izin',"b"."no_izin",
'kode',c.kode,
'judul_kode',d.judul
)
)
)
FROM "t_pencabutan" "a"
LEFT JOIN "t_pencabutan_non" "b" ON "a"."id_pencabutan" = "b"."id_pencabutan"
LEFT JOIN "t_pencabutan_non_b" "c" ON "b"."no_izin" = "c"."no_izin"
LEFT JOIN "t_pencabutan_non_c" "d" ON "c"."id_proyek" = "d"."id_proyek"
GROUP BY "a"."nama_perusahaan"
The result is shown below:
{
"nama_perusahaan" : "JASA FERRIE",
"proyek" :
{
"no_izin" : "26A/E/IU/PMA/D8FD",
"kode" : "14302",
"judul_kode" : "IND"
}
{
"no_izin" : "26A/E/IU/PMA/D8FD",
"kode" : "13121",
"judul_kode" : "IND B"
}
}
As you could see, the proyek have been nested, so the duplicate proyek will be grouped. Now i have to group the same value of no_izin so it will double nested array like expected result below.
{
"nama_perusahaan" : "JASA FERRIE",
"proyek" :
[{
"no_izin" : "26A/E/IU/PMA/D8FD",
"kode_list":[
{
"kode" : "14302",
"judul_kode" : "IND"
},
{
"kode" : "13121",
"judul_kode" : "IND B"
}]
}]
}
I tried to use this query:
SELECT json_build_object(
'nama_perusahaan',"a"."nama_perusahaan",
'proyek', json_agg(
json_build_object(
'no_izin',"b"."no_izin",
'kode_list',json_agg(
json_build_object(
'kode',c.kode,
'judul_kode',d.judul
)
)
)
)
)
FROM "t_pencabutan" "a"
LEFT JOIN "t_pencabutan_non" "b" ON "a"."id_pencabutan" = "b"."id_pencabutan"
LEFT JOIN "t_pencabutan_non_b" "c" ON "b"."no_izin" = "c"."no_izin"
LEFT JOIN "t_pencabutan_non_c" "d" ON "c"."id_proyek" = "d"."id_proyek"
GROUP BY "a"."nama_perusahaan", b.no_izin
but it didnt work, it gives ERROR: aggregate function calls cannot be nested LINE 6:'kode_list',json_agg(.
What could go wrong with my code ?
Disclaimer: It is very hard for us to construct a query without knowing the input data and table structure and have to handle a language we don't know. Please try to minimize your further questions (e.g. For your question it is not relevant that you need to join some tables before converting the result into a JSON output), create examples in English (handling foreign languages makes the code looking confusing and leads to spelling errors, so the probably right idea fails on writing the words wrong) and add the input data! This would help you as well: You would get an answer faster and the chance of code mistakes is much more less (because now without the data we cannot create a runnable example to check our ideas).
Creating a nested JSON structure is only possible doing it from the innermost nested object to the outermost one. So first you have to create the no_izin array in a subquery. This can be used to create the proyek object:
SELECT
json_build_object(
'nama_perusahaan',"s"."nama_perusahaan",
'proyek', json_agg(no_izin)
)
)
FROM (
SELECT
"a"."nama_perusahaan",
json_build_object(
'no_izin',
"b"."no_izin",
'kode_list',
json_agg(
json_build_object(
'kode',c.kode,
'judul_kode',d.judul
)
)
) AS no_izin
FROM "t_pencabutan" "a"
LEFT JOIN "t_pencabutan_non" "b" ON "a"."id_pencabutan" = "b"."id_pencabutan"
LEFT JOIN "t_pencabutan_non_b" "c" ON "b"."no_izin" = "c"."no_izin"
LEFT JOIN "t_pencabutan_non_c" "d" ON "c"."id_proyek" = "d"."id_proyek"
GROUP BY "c"."id_proyek", "a"."nama_perusahaan"
) AS s
GROUP BY "s"."nama_perusahaan"

BigQuery select expect double nested column

I am trying to remove a column from a BigQuery table and I've followed the instructions as stated here:
https://cloud.google.com/bigquery/docs/manually-changing-schemas#deleting_a_column_from_a_table_schema
This did not work directly as the column I'm trying to remove is nested twice in a struct. The following SO questions are relevant but none of them solve this exact case.
Single nested field:
BigQuery select * except nested column
Double nested field (solution has all fields in the schema enumerated, which is not useful for me as my schema is huge):
BigQuery: select * replace from multiple nested column
I've tried adapting the above solutions and I think I'm close but can't quite get it to work.
This one will remove the field, but returns only the nested field, not the whole table (for the examples I want to remove a.b.field_name. See the example schema at the end):
SELECT AS STRUCT * EXCEPT(a), a.* REPLACE (
(SELECT AS STRUCT a.b.* EXCEPT (field_name)) AS b
)
FROM `table`
This next attempt gives me an error: Scalar subquery produced more than one element:
WITH a_tmp AS (
SELECT AS STRUCT a.* REPLACE (
(SELECT AS STRUCT a.b.* EXCEPT (field_name)) AS b
)
FROM `table`
)
SELECT * REPLACE (
(SELECT AS STRUCT a.* FROM a_tmp) AS a
)
FROM `table`
Is there a generalised way to solve this? Or am I forced to use the enumerated solution in the 2nd link?
Example Schema:
[
{
"name": "a",
"type": "RECORD",
"fields": [
{
"name": "b",
"type": "RECORD"
"fields": [
{
"name": "field_name",
"type": "STRING"
},
{
"name": "other_field_name".
"type": "STRING"
}
]
},
]
}
]
I would like the final schema to be the same but without field_name.
Below is for BigQuery Standard SQL
#standardSQL
SELECT * REPLACE(
(SELECT AS STRUCT(SELECT AS STRUCT a.b.* EXCEPT (field_name)) b)
AS a)
FROM `project.dataset.table`
you can test, play with it using dummy data as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT STRUCT<b STRUCT<field_name STRING, other_field_name STRING>>(STRUCT('1', '2')) a
)
SELECT * REPLACE(
(SELECT AS STRUCT(SELECT AS STRUCT a.b.* EXCEPT (field_name)) b)
AS a)
FROM `project.dataset.table`

MySQL: Compare differences between two tables

Same as oracle diff: how to compare two tables? except in mysql.
Suppose I have two tables, t1 and t2 which are identical in layout but which may contain different data.
What's the best way to diff these two tables?
To be more precise, I'm trying to figure out a simple SQL query that tells me if data from one row in t1 is different from the data from the corresponding row in t2
It appears I cannot use the intersect nor minus. When I try
SELECT * FROM robot intersect SELECT * FROM tbd_robot
I get an error code:
[Error Code: 1064, SQL State: 42000] You have an error in your SQL
syntax; check the manual that corresponds to your MySQL server version
for the right syntax to use near 'SELECT * FROM tbd_robot' at line 1
Am I doing something syntactically wrong? If not, is there another query I can use?
Edit: Also, I'm querying through a free version DbVisualizer. Not sure if that might be a factor.
INTERSECT needs to be emulated in MySQL:
SELECT 'robot' AS `set`, r.*
FROM robot r
WHERE ROW(r.col1, r.col2, …) NOT IN
(
SELECT col1, col2, ...
FROM tbd_robot
)
UNION ALL
SELECT 'tbd_robot' AS `set`, t.*
FROM tbd_robot t
WHERE ROW(t.col1, t.col2, …) NOT IN
(
SELECT col1, col2, ...
FROM robot
)
You can construct the intersection manually using UNION. It's easy if you have some unique field in both tables, e.g. ID:
SELECT * FROM T1
WHERE ID NOT IN (SELECT ID FROM T2)
UNION
SELECT * FROM T2
WHERE ID NOT IN (SELECT ID FROM T1)
If you don't have a unique value, you can still expand the above code to check for all fields instead of just the ID, and use AND to connect them (e.g. ID NOT IN(...) AND OTHER_FIELD NOT IN(...) etc)
I found another solution in this link
SELECT MIN (tbl_name) AS tbl_name, PK, column_list
FROM
(
SELECT ' source_table ' as tbl_name, S.PK, S.column_list
FROM source_table AS S
UNION ALL
SELECT 'destination_table' as tbl_name, D.PK, D.column_list
FROM destination_table AS D
) AS alias_table
GROUP BY PK, column_list
HAVING COUNT(*) = 1
ORDER BY PK
select t1.user_id,t2.user_id
from t1 left join t2 ON t1.user_id = t2.user_id
and t1.username=t2.username
and t1.first_name=t2.first_name
and t1.last_name=t2.last_name
try this. This will compare your table and find all matching pairs, if any mismatch return NULL on left.
Based on Haim's answer I created a PHP code to test and display all the differences between two databases.
This will also display if a table is present in source or test databases.
You have to change with your details the <> variables content.
<?php
$User = "<DatabaseUser>";
$Pass = "<DatabasePassword>";
$SourceDB = "<SourceDatabase>";
$TestDB = "<DatabaseToTest>";
$link = new mysqli( "p:". "localhost", $User, $Pass, "" );
if ( mysqli_connect_error() ) {
die('Connect Error ('. mysqli_connect_errno() .') '. mysqli_connect_error());
}
mysqli_set_charset( $link, "utf8" );
mb_language( "uni" );
mb_internal_encoding( "UTF-8" );
$sQuery = 'SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA="'. $SourceDB .'";';
$SourceDB_Content = query( $link, $sQuery );
if ( !is_array( $SourceDB_Content) ) {
echo "Table $SourceDB cannot be accessed";
exit(0);
}
$sQuery = 'SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA="'. $TestDB .'";';
$TestDB_Content = query( $link, $sQuery );
if ( !is_array( $TestDB_Content) ) {
echo "Table $TestDB cannot be accessed";
exit(0);
}
$SourceDB_Tables = array();
foreach( $SourceDB_Content as $item ) {
$SourceDB_Tables[] = $item["TABLE_NAME"];
}
$TestDB_Tables = array();
foreach( $TestDB_Content as $item ) {
$TestDB_Tables[] = $item["TABLE_NAME"];
}
//var_dump( $SourceDB_Tables, $TestDB_Tables );
$LookupTables = array_merge( $SourceDB_Tables, $TestDB_Tables );
$NoOfDiscrepancies = 0;
echo "
<table border='1' width='100%'>
<tr>
<td>Table</td>
<td>Found in $SourceDB (". count( $SourceDB_Tables ) .")</td>
<td>Found in $TestDB (". count( $TestDB_Tables ) .")</td>
<td>Test result</td>
<tr>
";
foreach( $LookupTables as $table ) {
$FoundInSourceDB = in_array( $table, $SourceDB_Tables ) ? 1 : 0;
$FoundInTestDB = in_array( $table, $TestDB_Tables ) ? 1 : 0;
echo "
<tr>
<td>$table</td>
<td><input type='checkbox' ". ($FoundInSourceDB == 1 ? "checked" : "") ."></td>
<td><input type='checkbox' ". ($FoundInTestDB == 1 ? "checked" : "") ."></td>
<td>". compareTables( $SourceDB, $TestDB, $table ) ."</td>
</tr>
";
}
echo "
</table>
<br><br>
No of discrepancies found: $NoOfDiscrepancies
";
function query( $link, $q ) {
$result = mysqli_query( $link, $q );
$errors = mysqli_error($link);
if ( $errors > "" ) {
echo $errors;
exit(0);
}
if( $result == false ) return false;
else if ( $result === true ) return true;
else {
$rset = array();
while ( $row = mysqli_fetch_assoc( $result ) ) {
$rset[] = $row;
}
return $rset;
}
}
function compareTables( $source, $test, $table ) {
global $link;
global $NoOfDiscrepancies;
$sQuery = "
SELECT column_name,ordinal_position,data_type,column_type FROM
(
SELECT
column_name,ordinal_position,
data_type,column_type,COUNT(1) rowcount
FROM information_schema.columns
WHERE
(
(table_schema='$source' AND table_name='$table') OR
(table_schema='$test' AND table_name='$table')
)
AND table_name IN ('$table')
GROUP BY
column_name,ordinal_position,
data_type,column_type
HAVING COUNT(1)=1
) A;
";
$result = query( $link, $sQuery );
$data = "";
if( is_array( $result ) && count( $result ) > 0 ) {
$NoOfDiscrepancies++;
$data = "<table><tr><td>column_name</td><td>ordinal_position</td><td>data_type</td><td>column_type</td></tr>";
foreach( $result as $item ) {
$data .= "<tr><td>". $item["column_name"] ."</td><td>". $item["ordinal_position"] ."</td><td>". $item["data_type"] ."</td><td>". $item["column_type"] ."</td></tr>";
}
$data .= "</table>";
return $data;
}
else {
return "Checked but no discrepancies found!";
}
}
?>
Problem below, is to compare table before and after i do big update!.
If you use Linux, you can use commands as follow:
In terminal,
mysqldump -hlocalhost -uroot -p schema_name_here table_name_here > /home/ubuntu/database_dumps/dump_table_before_running_update.sql
mysqldump -hlocalhost -uroot -p schema_name_here table_name_here > /home/ubuntu/database_dumps/dump_table_after_running_update.sql
diff -uP /home/ubuntu/database_dumps/dump_some_table_after_running_update.sql /home/ubuntu/database_dumps/dump_table_before_running_update.sql > /home/ubuntu/database_dumps/diff.txt
You will need online tools for
Formatting SQL exported from the dumps,
e.g http://www.dpriver.com/pp/sqlformat.htm [Not the best I've seen]
We have diff.txt, you have to take manually the + - showing inside, which is 1 line of insert statements, that has the values.
Do diff online for the 2 lines - & + in diff.txt, past them in online diff tool
e.g https://www.diffchecker.com [you can save and share it, and has no limit on file size!]
Note: be extra careful if its sensitive/production data!
you can try The big data comparison platform in https://github.com/zhugezifang/dataCompare
this is a introduction of it
Design and practice of open source big data comparison platform
1. Background & current situation
In the process of developing large numbers, it is often encountered that data migration or upgrade, or different business parties have processed data according to their needs, but think that the data on both sides is still the same, so it will be necessary to manually compare the data. So is the data on both sides consistent? If not, what are the differences?
If there is no platform, you need to manually write some SQL scripts for comparison, and there is no evaluation standard. This is inefficient.
"Alibaba's Road to Big Data" actually mentions such a platform, but because it is not used externally, the introduction in the book is relatively simple. Based on previous work experience, a big data comparison platform was developed to assist in verifying data, named dataCompare.
Main solutions:
(1) Verify data and data comparison, which wastes great labor costs
(2) Without a set of standards, the results of verification are difficult to evaluate
(3) Automatic data verification and comparison can be achieved by interface interaction, check or low-code
[enter image description here][1]
2. Purpose
(1) Automatic data verification and comparison can be achieved by interface interaction, check or low-code.
(2) The data team's data comparison efficiency is increased by at least about 50%.
(3) A set of unified data verification scheme to meet the standard specifications of data verification and comparison
3. System architecture design
4. The current version has implemented the following functions
(1) Low-code simple configuration completes the core function of data comparison
(2) Data magnitude comparison and data consistency comparison
5. Follow-up Development Plan
(1) Discrepancy case finding
(2) Data pointer detection---- enumeration value detection, range detection, numerical detection, primary key mode detection
(3) Data comparison task is scheduled and automatically scheduled
(4) Automatically send an email report to the comparison results
6. The core code is opening in githup
https://github.com/zhugezifang/dataCompare
[enter image description here][1]
Based on Haim's answer here's a simplified example if you're looking to compare values that exist in BOTH tables, otherwise if there's a row in one table but not the other it will also return it....
Took me a couple of hours to figure out. Here's a fully tested simply query for comparing "tbl_a" and "tbl_b"
SELECT ID, col
FROM
(
SELECT
tbl_a.ID, tbl_a.col FROM tbl_a
UNION ALL
SELECT
tbl_b.ID, tbl_b.col FROM tbl_b
) t
WHERE ID IN (select ID from tbl_a) AND ID IN (select ID from tbl_b)
GROUP BY
ID, col
HAVING COUNT(*) = 1
ORDER BY ID
So you need to add the extra "where in" clause:
WHERE ID IN (select ID from tbl_a) AND ID IN (select ID from tbl_b)
Also:
For ease of reading if you want to indicate the table names you can use the following:
SELECT tbl, ID, col
FROM
(
SELECT
tbl_a.ID, tbl_a.col, "name_to_display1" as "tbl" FROM tbl_a
UNION ALL
SELECT
tbl_b.ID, tbl_b.col, "name_to_display2" as "tbl" FROM tbl_b
) t
WHERE ID IN (select ID from tbl_a) AND ID IN (select ID from tbl_b)
GROUP BY
ID, col
HAVING COUNT(*) = 1
ORDER BY ID
you can user my own developed tool
https://github.com/hardeepvicky/MySql-Schema-Compare
I tried the above answer but found that if one table has null values and the second table has values in a column then the intersect code above does not report this fact.
select p.pcn,p.period,p.account_no,p.ytd_debit,a.ytd_debit
-- select count(*) -- 157,283
from Plex.account_period_balance p -- 157,283/202207,148,998
join Azure.account_period_balance a -- 157,283/202207,148,998
on p.pcn = a.pcn
and p.period = a.period
and p.account_no = a.account_no -- 157,283
where p.period_display = a.period_display -- 157,283
and p.debit = a.debit -- 157,283
-- and p.ytd_debit = a.ytd_debit -- 148,998
-- and p.ytd_debit != a.ytd_debit -- 0