SQL Server : compare two tables with UNION and Select * plus additional label column - sql

I've been playing around with the sample on Jeff' Server blog to compare two tables to find the differences.
In my case the tables are a backup and the current data. I can get what I want with this SQL statement (simplified by removing most of the columns). I can then see the rows from each table that don't have an exact match and I can see from which table they come.
SELECT
MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
FROM
(SELECT
'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT
'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001new].[dbo].[AR_CustomerAddresses]) tmp
GROUP BY
[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
HAVING
COUNT(*) = 1
This Stack Overflow Answer gives me a much cleaner SQL query but does not tell me from which table the rows come.
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
UNION
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
INTERSECT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
I could use the first version but I have many tables that I need to compare and I think that there has to be an easy way to add the source table column to the second query. I've tried several things and googled to no avail. I suspect that maybe I'm just not searching for the correct thing since I'm sure it's been answered before.
Maybe I'm going down the wrong trail and there is a better way to compare the databases?

Could you use the following setup to accomplish your goal?
SELECT 'New not in Old' Descriptor, *
FROM
(
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
) a
UNION
SELECT 'Old not in New' Descriptor, *
FROM
(
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) b

You can't add the table name there because union, except, and intersection all compare all columns. This means you can't differentiate between them by adding the table name to the query. A group by gives you control over what columns are considered in finding duplicates so you can exclude the table name.
To help you with the large number of tables you need to compare you could write a sql query off the metadata tables that hold table names and columns and generate the sql commands dynamically off those values.

Derive one column using table names like below
SELECT MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,table_name_came
FROM
(SELECT 'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT 'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001new].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) tmp
GROUP BY [strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,table_name_came
HAVING COUNT(*) = 1

Related

UNION tables with wildcard in BigQuery

I have over 40 tables I want to append in BigQuery using standard SQL. I have already formatted them to have the exact same schema. When I try to use the '*' wildcard at the end of table name in my FROM clause, I get the following error:
Syntax error: Expected end of input but got "*" at [95:48]
I ended up manually doing a UNION DISTINCT on all my tables. Is this the best way to do this? Any help would be appreciated. Thank you!
CREATE TABLE capstone-320521.Trips.Divvy_Trips_All AS
SELECT * FROM capstone-320521.Trips.Divvy_Trips_*;
--JOIN all 2020-21 trips data
CREATE TABLE capstone-320521.Trips.Divvy_Trips_Raw_2020_2021 AS
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_04
UNION DISTINCT
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_05
UNION DISTINCT
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_06
UNION DISTINCT
Syntax error: Expected end of input but got "*"
I think the problem is in missing ticks around the table references. Try below
CREATE TABLE `capstone-320521.Trips.Divvy_Trips_All` AS
SELECT * FROM `capstone-320521.Trips.Divvy_Trips_*`
Note: The wildcard table name contains the special character (*), which means that you must enclose the wildcard table name in backtick (`) characters. See more at Enclose table names with wildcards in backticks
I am unaware of any such UNION DISTINCT syntax. If your intention is to do a union of the 3 tables and remove any duplicate records in the process, then just using UNION should suffice:
CREATE TABLE capstone-320521.Trips.Divvy_Trips_Raw_2020_2021 AS
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_04
UNION
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_05
UNION
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_06;
Note that in general it is bad practice to use SELECT * with union queries. The reason has to do with that a union between two (or more) tables is generally only valid if the two queries involved have the number and types of columns. Using SELECT * obfuscates what columns are actually being selected, and so it is preferable to always explicitly list out the columns.

How to select the nth column, and order columns' selection in BigQuery

I have this huge table upon which I apply a lot of processing (using CTEs), and I want to perform a UNION ALL on 2 particular CTEs.
SELECT *
, 0 AS orders
, 0 AS revenue
, 0 AS units
FROM secondary_prep_cte WHERE purchase_event_flag IS FALSE
UNION ALL
SELECT *
FROM results_orders_and_revenues_cte
I get a "Column 1164 in UNION ALL has incompatible types : STRING,DATE at [97:5]
Obviously I don't know the name of the column, and I'd like to debug this but I feel like I'm going to waste a lot of time if I can't pin-point which column is 1164.
I also think this is a problem of the order of columns between the CTEs, so I have 2 questions:
How do I identify the 1164th column
How do I order my columns before performing the UNION ALL
I found this similar question but it is for MSSQL, I am using BigQuery
You can get information from INFORMATION_SCHEMA.COLUMNS but you'll need to create a table or view from the CTE:
CREATE OR REPLACE VIEW `project.dataset.secondary_prep_view` as select * from (select 1 as id, "a" as name, "b" as value)
Then:
SELECT * FROM dataset.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'secondary_prep_view';

Want to concatenate column of the second query to the first query but getting errors such as "query block has incorrect number of result columns"

SELECT
ID, PRIM_EMAIL, SEC_EMAIL, PHONE
FROM
STUDENTS.RECORDS
WHERE
ID IN (SELECT ID FROM STUDENTS.INFO WHERE ROLL_NO = '554')
UNION
SELECT NAME
FROM STUDENTS.INFO
WHERE ROLL_NO = '554';
Here Roll_No is a user inserted data so for now I have hard coded it. Basically with the help of ROLL_NO I sort the STUDENTS_INFO table from where I get the ID and based on that I try to get PRIM_EMAIL, SEC_EMAIL, PHONE from the STUDENTS.RECORDS table while matching the foreign keys of both the tables. In addition to the current result set I also want to have the prov_name column.
Any help is very much appreciated. Thank you!
I suspect that you want to put all this information on the same row, which suggests a join rather than union all:
select
r.ID,
r.PRIM_EMAIL,
r.SEC_EMAIL,
r.PHONE,
r.NAME
from STUDENTS.RECORDS r
inner join STUDENTS.INFO i ON i.ID = r.ID
where I.ROLL_NO = '554';
I think the source of your error query block has incorrect number of result columns is coming from trying to union together a table with 4 columns (id, prim_email, sec_email, phone) with 1 column (name).
From your question, I am gathering that you want a single table of id, prim_email, sec_email, phone from students.records and name from students.info.
I think the following query using CTE's might get you (partially) to your final result. You may want to refactor for optimizing performance.
with s_records as ( select * from students.records ),
s_info as ( select * from students.info ),
joined as (
select
s_records.id,
s_records.prim_email,
s_records.sec_email,
s_records.phone,
s_info.name
from s_records
left join s_info
on s_records.roll_no = s_info.roll_no
where roll_np = '554' )
select * from joined
Overall, I think that a join will be part of your solution rather than a union :-)

Oracle: Query identical table in multiple schema in single line

In Oracle, is there a way to query the same, structurally identical table out of multiple schema within the same database in single line? Obviously assuming the user has permissions to access all schema, I could build a query like:
select * from schema1.SomeTable
union all
select * from schema2.SomeTable
But is it possible, given the right permissions to say something like:
select * from allSchema.SomeTable
...and bring back all rows for all the schema? And related to this, is it possible to pick which schema, such as:
select * from allSchema.SomeTable where schemaName in ('schema1','schema2')
The simplest option, as far as I can tell, is to create a VIEW (as UNION of all tables across all those users), and then SELECT FROM VIEW.
For example:
create or replace view my_view as
select 'schema_1' source_schema, id, name from schema_1.table union
select 'schema_2' source_schema, id, name from schema_2.table union
...
-- select all
select * from my_view;
-- select all that belongs to one of schemas
select * from my_view where source_schema = 'schema_1';

Alternative SQL ways of looking up multiple items of known IDs?

Is there a better solution to the problem of looking up multiple known IDs in a table:
SELECT * FROM some_table WHERE id='1001' OR id='2002' OR id='3003' OR ...
I can have several hundreds of known items. Ideas?
SELECT * FROM some_table WHERE ID IN ('1001', '1002', '1003')
and if your known IDs are coming from another table
SELECT * FROM some_table WHERE ID IN (
SELECT KnownID FROM some_other_table WHERE someCondition
)
The first (naive) option:
SELECT * FROM some_table WHERE id IN ('1001', '2002', '3003' ... )
However, we should be able to do better. IN is very bad when you have a lot of items, and you mentioned hundreds of these ids. What creates them? Where do they come from? Can you write a query that returns this list? If so:
SELECT *
FROM some_table
INNER JOIN ( your query here) filter ON some_table.id=filter.id
See Arrays and Lists in SQL Server 2005
ORs are notoriously slow in SQL.
Your question is short on specifics, but depending on your requirements and constraints I would build a look-up table with your IDs and use the EXISTS predicate:
select t.id from some_table t
where EXISTS (select * from lookup_table l where t.id = l.id)
For a fixed set of IDs you can do:
SELECT * FROM some_table WHERE id IN (1001, 2002, 3003);
For a set that changes each time, you might want to create a table to hold them and then query:
SELECT * FROM some_table WHERE id IN
(SELECT id FROM selected_ids WHERE key=123);
Another approach is to use collections - the syntax for this will depend on your DBMS.
Finally, there is always this "kludgy" approach:
SELECT * FROM some_table WHERE '|1001|2002|3003|' LIKE '%|' || id || '|%';
In Oracle, I always put the id's into a TEMPORARY TABLE to perform massive SELECT's and DML operations:
CREATE GLOBAL TEMPORARY TABLE t_temp (id INT)
SELECT *
FROM mytable
WHERE mytable.id IN
(
SELECT id
FROM t_temp
)
You can fill the temporary table in a single client-server roundtrip using Oracle collection types.
We have a similar issue in an application written for MS SQL Server 7. Although I dislike the solution used, we're not aware of anything better...
'Better' solutions exist in 2008 as far as I know, but we have Zero clients using that :)
We created a table valued user defined function that takes a comma delimited string of IDs, and returns a table of IDs. The SQL then reads reasonably well, and none of it is dynamic, but there is still the annoying double overhead:
1. Client concatenates the IDs into the string
2. SQL Server parses the string to create a table of IDs
There are lots of ways of turning '1,2,3,4,5' into a table of IDs, but the Stored Procedure which uses the function ends up looking like...
CREATE PROCEDURE my_road_to_hell #IDs AS VARCHAR(8000)
AS
BEGIN
SELECT
*
FROM
myTable
INNER JOIN
dbo.fn_split_list(#IDs) AS [IDs]
ON [IDs].id = myTable.id
END
The fastest is to put the ids in another table and JOIN
SELECT some_table.*
FROM some_table INNER JOIN some_other_table ON some_table.id = some_other_table.id
where some_other_table would have just one field (ids) and all values would be unique