I have two tables that are populated with the same structure into 2 different tables: MST3_CURR and MST4_CURR. Accounts are populated into one of the two tables; each table holds accounts that are in a different 'state'. In order to generate a complete list of accounts the tables need to be fully joined and have the most current data for an account pulled.
There are several other tables which follow the exact same approach where I am using a UNION ALL operator without issue. However, for some reason with these two tables when I perform the UNION ALL I receive the record for account 4700121500023998 which is found in MST3_CURR, but the other accounts are in MST4_CURR and are not present in the final dataset. When I reverse the UNION ALL order and have MST4_CURR first followed by MST3_CURR the reverse is true.
WITH cchm_d_curr AS (
SELECT * FROM hcus_raw.cchm_d_mst3_curr
UNION ALL
SELECT * FROM hcus_raw.cchm_d_mst4_curr
)
SELECT chd_current_balance FROM cchm_d_curr
WHERE
chd_account_number IN (4700121500023998, 4700121500090430, 4700121500044101, 4700121500250492, 4700121500250013)
;
I have am at a loss to finding any kind of answer to this peculiar behaviour that Oracle 12c is exhibiting. Please let me know if there is information missing that would help answer my question.
Thank you.
It might be that order of columns in the queried tables is different, so reversing tables in the union leads to different columns to be filtered by "where", i.e.
select a, b
from (select A, b from t1
union all
select b, A from t2)
where a=1
returns something different from what expected from this query
select a, b
from (select A, b from t1
union all
select A, b from t2)
where a=1
I would check if columns order in tables hcus_raw.cchm_d_mst3_curr and hcus_raw.cchm_d_mst4_curr in the original question is same.
what does the following return [m3 first, then m4 first]??
{
WITH cchm_d_curr AS (
SELECT 'm3' src, m3.* FROM hcus_raw.cchm_d_mst3_curr m3
UNION ALL
SELECT 'm4' src, m4.* FROM hcus_raw.cchm_d_mst4_curr m4
)
SELECT src, chd_account_number, chd_current_balance FROM cchm_d_curr
WHERE
chd_account_number IN (4700121500023998, 4700121500090430, 4700121500044101, 4700121500250492, 4700121500250013)
;
}
Related
Have a table test.
select b from test
b is a text column and contains Apartment,Residential
The other table is a parcel table with a classification column. I'd like to use test.b to select the right classifications in the parcels table.
select * from classi where classification in(select b from test)
this returns no rows
select * from classi where classification =any(select '{'||b||'}' from test)
same story with this one
I may make a function to loop through the b column but I'm trying to find an easier solution
Test case:
create table classi as
select 'Residential'::text as classification
union
select 'Apartment'::text as classification
union
select 'Commercial'::text as classification;
create table test as
select 'Apartment,Residential'::text as b;
You don't actually need to unnest the array:
SELECT c.*
FROM classi c
JOIN test t ON c.classification = ANY (string_to_array(t.b, ','));
db<>fiddle here
The problem is that = ANY takes a set or an array, and IN takes a set or a list, and your ambiguous attempts resulted in Postgres picking the wrong variant. My formulation makes Postgres expect an array as it should.
For a detailed explanation see:
How to match elements in an array of composite type?
IN vs ANY operator in PostgreSQL
Note that my query also works for multiple rows in table test. Your demo only shows a single row, which is a corner case for a table ...
But also note that multiple rows in test may produce (additional) duplicates. You'd have to fold duplicates or switch to a different query style to get de-duplicate. Like:
SELECT c.*
FROM classi c
WHERE EXISTS (
SELECT FROM test t
WHERE c.classification = ANY (string_to_array(t.b, ','))
);
This prevents duplication from elements within a single test.b, as well as from across multiple test.b. EXISTS returns a single row from classi per definition.
The most efficient query style depends on the complete picture.
You need to first split b into an array and then get the rows. A couple of alternatives:
select * from nj.parcels p where classification = any(select unnest(string_to_array(b, ',')) from test)
select p.* from nj.parcels p
INNER JOIN (select unnest(string_to_array(b, ',')) from test) t(classification) ON t.classification = p.classification;
Essential to both is the unnest surrounding string_to_array.
SELECT
ID, PRIM_EMAIL, SEC_EMAIL, PHONE
FROM
STUDENTS.RECORDS
WHERE
ID IN (SELECT ID FROM STUDENTS.INFO WHERE ROLL_NO = '554')
UNION
SELECT NAME
FROM STUDENTS.INFO
WHERE ROLL_NO = '554';
Here Roll_No is a user inserted data so for now I have hard coded it. Basically with the help of ROLL_NO I sort the STUDENTS_INFO table from where I get the ID and based on that I try to get PRIM_EMAIL, SEC_EMAIL, PHONE from the STUDENTS.RECORDS table while matching the foreign keys of both the tables. In addition to the current result set I also want to have the prov_name column.
Any help is very much appreciated. Thank you!
I suspect that you want to put all this information on the same row, which suggests a join rather than union all:
select
r.ID,
r.PRIM_EMAIL,
r.SEC_EMAIL,
r.PHONE,
r.NAME
from STUDENTS.RECORDS r
inner join STUDENTS.INFO i ON i.ID = r.ID
where I.ROLL_NO = '554';
I think the source of your error query block has incorrect number of result columns is coming from trying to union together a table with 4 columns (id, prim_email, sec_email, phone) with 1 column (name).
From your question, I am gathering that you want a single table of id, prim_email, sec_email, phone from students.records and name from students.info.
I think the following query using CTE's might get you (partially) to your final result. You may want to refactor for optimizing performance.
with s_records as ( select * from students.records ),
s_info as ( select * from students.info ),
joined as (
select
s_records.id,
s_records.prim_email,
s_records.sec_email,
s_records.phone,
s_info.name
from s_records
left join s_info
on s_records.roll_no = s_info.roll_no
where roll_np = '554' )
select * from joined
Overall, I think that a join will be part of your solution rather than a union :-)
I want to write a big query to fetch the value in a column from multiple tables in a given dataset. But the column name in each table is different like colA, colB, colC and so on. How to accomplish this?
I have many tables in my dataset where one of the column contains web URL. However this column name is different in each table. I want to process all the URL's of all the tables.
I checked in this link How to combine multiple tables that vary slightly in columns. However it talks about limited number of column name variation and limited number of tables.
I know
SELECT
column_name
FROM
`bq-project.bq-dataset.INFORMATION_SCHEMA.COLUMNS`
group by 1
will give distinct column, but not sure how to proceed
You may create a view to translate column names.
CREATE VIEW my_dataset.aggregated_tables AS
SELECT * EXCEPT (colA), colA as url FROM table_a
UNION
SELECT * EXCEPT (colB), colB as url FROM table_b
UNION
SELECT * EXCEPT (colC), colC as url FROM table_c;
For fun, discovering what column has an URL, using JS UDFs:
CREATE TEMP FUNCTION urls(x STRING)
RETURNS STRING
LANGUAGE js AS r"""
function isURL(str) {
// https://stackoverflow.com/a/49185442/132438
return /^(?:\w+:)?\/\/([^\s\.]+\.\S{2}|localhost[\:?\d]*)\S*$/.test(str);
}
obj = JSON.parse(x);
for (var key in obj){
if(isURL(obj[key])) return(obj[key]);
}
""";
WITH table_a AS (SELECT 'https://google.com/' aa)
,table_b AS (SELECT 'http://medium.com/#hoffa' ba, 'noturl' bb)
,table_c AS (SELECT 'bigquery' ca, 'noturl' cb, 'https://twitter.com/felipehoffa' cc)
SELECT urls(x) url
FROM (
SELECT TO_JSON_STRING(t) x FROM table_a t
UNION ALL
SELECT TO_JSON_STRING(t) FROM table_b t
UNION ALL
SELECT TO_JSON_STRING(t) FROM table_c t
)
I have columns A,B in Table 1 and columns B,C in Table 2.Need to perform Union between them .
For example : select A,B from Table 1 UNION select 0,B from Table 2.
I dont need this zero to solve the column mismatch.Instead is there any other solution?
Am asking the question by providing simple example . But in my case the table structure is very large and the queries are already built.Now I need to fix this union query by replacing this zero.(due to DB2 upgrade)
Can anyone help?
For two legs A and B in a union to be union compatible it is required that:
a) A and B have the same number of columns
b) The types for each column in A is compatible with the corresponding column in B
In your query you can use null that is part of every type:
select a, b from T1
UNION
select null, b from T2
Under certain circumstances, you may have to explicitly cast null to the same type as A has (probably not in this case):
select a, b from Table 1
UNION
select cast(null as ...), b from Table 2
A column returned in a SQL result set can only have one data type.
A union or union all results in rows from the first query and the rows of the second query (in case of union they are deduplicated).
So the first column of the first query needs to match the data type of the first column of the second query.
You can check this by running a describe:
describe select a,b from t1
If you work within a GUI (JDBC Connection) you could also use
call admin_cmd('describe select a,b from t1')
So if the some column does not match you have to explicitly cast the data types.
I've been playing around with the sample on Jeff' Server blog to compare two tables to find the differences.
In my case the tables are a backup and the current data. I can get what I want with this SQL statement (simplified by removing most of the columns). I can then see the rows from each table that don't have an exact match and I can see from which table they come.
SELECT
MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
FROM
(SELECT
'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT
'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001new].[dbo].[AR_CustomerAddresses]) tmp
GROUP BY
[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
HAVING
COUNT(*) = 1
This Stack Overflow Answer gives me a much cleaner SQL query but does not tell me from which table the rows come.
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
UNION
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
INTERSECT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
I could use the first version but I have many tables that I need to compare and I think that there has to be an easy way to add the source table column to the second query. I've tried several things and googled to no avail. I suspect that maybe I'm just not searching for the correct thing since I'm sure it's been answered before.
Maybe I'm going down the wrong trail and there is a better way to compare the databases?
Could you use the following setup to accomplish your goal?
SELECT 'New not in Old' Descriptor, *
FROM
(
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
) a
UNION
SELECT 'Old not in New' Descriptor, *
FROM
(
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) b
You can't add the table name there because union, except, and intersection all compare all columns. This means you can't differentiate between them by adding the table name to the query. A group by gives you control over what columns are considered in finding duplicates so you can exclude the table name.
To help you with the large number of tables you need to compare you could write a sql query off the metadata tables that hold table names and columns and generate the sql commands dynamically off those values.
Derive one column using table names like below
SELECT MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,table_name_came
FROM
(SELECT 'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT 'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001new].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) tmp
GROUP BY [strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,table_name_came
HAVING COUNT(*) = 1