UNION tables with wildcard in BigQuery - sql

I have over 40 tables I want to append in BigQuery using standard SQL. I have already formatted them to have the exact same schema. When I try to use the '*' wildcard at the end of the table name in my FROM clause, I get the following error:
Syntax error: Expected end of input but got "*" at [95:48]
I ended up manually doing a UNION DISTINCT on all my tables. Is this the best way to do this? Any help would be appreciated. Thank you!
CREATE TABLE capstone-320521.Trips.Divvy_Trips_All AS
SELECT * FROM capstone-320521.Trips.Divvy_Trips_*;
-- Combine (UNION) all 2020-21 trips data
CREATE TABLE capstone-320521.Trips.Divvy_Trips_Raw_2020_2021 AS
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_04
UNION DISTINCT
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_05
UNION DISTINCT
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_06
UNION DISTINCT
...

Syntax error: Expected end of input but got "*"
I think the problem is the missing backticks around the table reference. Try the following:
CREATE TABLE `capstone-320521.Trips.Divvy_Trips_All` AS
SELECT * FROM `capstone-320521.Trips.Divvy_Trips_*`
Note: The wildcard table name contains the special character (*), which means that you must enclose the wildcard table name in backtick (`) characters. See "Enclose table names with wildcards in backticks" in the BigQuery documentation for more.
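BigQuery's documentation describes a wildcard table as a union of all the tables that match the wildcard expression, with the matched part of each name available through the _TABLE_SUFFIX pseudo-column. Rows are not de-duplicated, so it behaves like UNION ALL rather than the UNION DISTINCT used in the question. SQLite has no wildcard tables, so the sketch below only emulates the idea: it reads table names from sqlite_master and generates the UNION ALL. The helper union_all_by_prefix, the table names, their two-column schema and the table_suffix column are all hypothetical.

```python
import sqlite3


def union_all_by_prefix(conn, prefix):
    """Build a UNION ALL over every table whose name starts with prefix.

    Hypothetical stand-in for a BigQuery wildcard table; table_suffix
    plays the role of _TABLE_SUFFIX.
    """
    names = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        if name.startswith(prefix)  # plain prefix test; LIKE would treat _ as a wildcard
    ]
    if not names:
        raise ValueError(f"no tables start with {prefix!r}")
    selects = []
    for name in names:
        suffix = name[len(prefix):].replace("'", "''")  # escape for a SQL string literal
        selects.append(f"SELECT '{suffix}' AS table_suffix, * FROM \"{name}\"")
    return "\nUNION ALL\n".join(selects)


conn = sqlite3.connect(":memory:")
# Hypothetical monthly tables that share one schema; May contains a duplicate row.
monthly = {
    "Divvy_Trips_2020_04": [(1, "Clark St")],
    "Divvy_Trips_2020_05": [(2, "Lake Shore Dr"), (2, "Lake Shore Dr")],
}
for table, table_rows in monthly.items():
    conn.execute(f'CREATE TABLE "{table}" (ride_id INTEGER, station TEXT)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', table_rows)

query = union_all_by_prefix(conn, "Divvy_Trips_")
# The combined table name deliberately does not start with the prefix,
# otherwise a rerun would pick it up as one of its own inputs.
conn.execute(f"CREATE TABLE all_trips AS {query}")
rows = conn.execute(
    "SELECT table_suffix, ride_id FROM all_trips ORDER BY table_suffix, ride_id"
).fetchall()
print(rows)  # [('2020_04', 1), ('2020_05', 2), ('2020_05', 2)]
```

The duplicate May row survives, which is the UNION ALL behaviour; adding DISTINCT would be a separate, deliberate choice.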

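UNION DISTINCT is valid BigQuery syntax; in fact, BigQuery requires an explicit ALL or DISTINCT after UNION, so a bare UNION is a syntax error there. If your intention is to combine the tables and remove duplicate records in the process, UNION DISTINCT is the right choice; if you know there are no duplicate rows to remove, UNION ALL gives the same result more cheaply because it skips the de-duplication step. Most other databases (for example SQL Server, MySQL, PostgreSQL and Oracle) accept a bare UNION, which removes duplicates just like UNION DISTINCT, so there the query would be written as: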
CREATE TABLE capstone-320521.Trips.Divvy_Trips_Raw_2020_2021 AS
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_04
UNION
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_05
UNION
SELECT * FROM capstone-320521.Trips.Divvy_Trips_2020_06;
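The difference between the de-duplicating union and UNION ALL can be checked with a tiny example. SQLite is used here only because it runs anywhere; it spells the de-duplicating form as plain UNION (it does not accept UNION DISTINCT). The table names and values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips_a (ride_id INTEGER)")
conn.execute("CREATE TABLE trips_b (ride_id INTEGER)")
conn.executemany("INSERT INTO trips_a VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO trips_b VALUES (?)", [(2,), (3,)])

# UNION removes the duplicate ride_id 2 that appears in both tables.
distinct_rows = conn.execute(
    "SELECT ride_id FROM trips_a UNION SELECT ride_id FROM trips_b ORDER BY ride_id"
).fetchall()
# UNION ALL simply concatenates the two results.
all_rows = conn.execute(
    "SELECT ride_id FROM trips_a UNION ALL SELECT ride_id FROM trips_b ORDER BY ride_id"
).fetchall()
print(distinct_rows)  # [(1,), (2,), (3,)]
print(all_rows)       # [(1,), (2,), (2,), (3,)]
```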
Note that in general it is bad practice to use SELECT * with union queries. The reason is that a union between two (or more) queries is only valid if they return the same number of columns with compatible types, and the columns are matched by position, not by name. Using SELECT * hides which columns are actually being selected and in what order, so it is preferable to always list the columns explicitly.
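To illustrate the positional matching: in the hypothetical tables below, both tables have first_name and last_name columns but declare them in a different order. SELECT * lines the values up by position, so the union silently mixes first and last names, while an explicit column list pairs them correctly. SQLite is used as a stand-in here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same columns, declared in a different order in each (hypothetical) table.
conn.execute("CREATE TABLE t1 (first_name TEXT, last_name TEXT)")
conn.execute("CREATE TABLE t2 (last_name TEXT, first_name TEXT)")
conn.execute("INSERT INTO t1 VALUES ('Ada', 'Lovelace')")
conn.execute("INSERT INTO t2 VALUES ('Hopper', 'Grace')")

# SELECT * matches columns by position, so t2's last name lands under first_name.
star = conn.execute(
    "SELECT * FROM t1 UNION ALL SELECT * FROM t2 ORDER BY 1"
).fetchall()

# Listing the columns explicitly pairs them up by meaning.
explicit = conn.execute(
    "SELECT first_name, last_name FROM t1 "
    "UNION ALL SELECT first_name, last_name FROM t2 ORDER BY 1"
).fetchall()
print(star)      # [('Ada', 'Lovelace'), ('Hopper', 'Grace')]
print(explicit)  # [('Ada', 'Lovelace'), ('Grace', 'Hopper')]
```

No error is raised in the SELECT * case, which is exactly why the mistake is easy to miss.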

Related

Column name ambiguous when trying to query multiple tables in BigQuery

I want to query multiple tables, all with the same column names in the same order, and combine the results.
SELECT SUBSTR(arrest_date, 0, 4) arrest_year, *
FROM
`OBTS.circuit11`,
`OBTS.circuit15`,
`OBTS.circuit17`,
`OBTS.circuit19`
WHERE
init_statute LIKE '%3%22%32%' OR
init_statute LIKE '%3%22%34%' OR
LOWER(init_charge_descrip) LIKE '%suspend%';
When I run this, BigQuery gives me the following error:
Column name init_statute is ambiguous at [8:3]
How do I query these tables and combine all the resulting rows into one set of results?
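I think you are looking for UNION ALL rather than CROSS JOIN (note: in BigQuery Standard SQL, a comma between tables in the FROM clause expresses a CROSS JOIN). Because every table in that cross join contributes its own init_statute column, the unqualified name is ambiguous.
So you are most likely looking for the following: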
SELECT SUBSTR(arrest_date, 0, 4) arrest_year, *
FROM (
SELECT * FROM `OBTS.circuit11` UNION ALL
SELECT * FROM `OBTS.circuit15` UNION ALL
SELECT * FROM `OBTS.circuit17` UNION ALL
SELECT * FROM `OBTS.circuit19`
)
WHERE
init_statute LIKE '%3%22%32%' OR
init_statute LIKE '%3%22%34%' OR
LOWER(init_charge_descrip) LIKE '%suspend%'
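The sketch below uses SQLite as a stand-in for BigQuery, with two cut-down, hypothetical circuit tables that hold only an init_statute column. It shows both effects: the comma join multiplies the row counts and makes the column name ambiguous, while UNION ALL stacks the rows so the filter refers to a single init_statute column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical circuit tables with identical columns.
circuits = {"circuit11": [("a",), ("b",)], "circuit15": [("c",), ("d",), ("e",)]}
for name, circuit_rows in circuits.items():
    conn.execute(f"CREATE TABLE {name} (init_statute TEXT)")
    conn.executemany(f"INSERT INTO {name} VALUES (?)", circuit_rows)

# A comma in FROM is a cross join: every row pairs with every row (2 x 3 = 6).
cross_count = conn.execute("SELECT COUNT(*) FROM circuit11, circuit15").fetchone()[0]

# Both tables contribute an init_statute column, so the bare name is ambiguous.
try:
    conn.execute("SELECT * FROM circuit11, circuit15 WHERE init_statute = 'a'")
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True

# UNION ALL stacks the rows instead (2 + 3 = 5) and the filter applies once.
stacked = conn.execute(
    "SELECT * FROM (SELECT * FROM circuit11 UNION ALL SELECT * FROM circuit15) "
    "WHERE init_statute IN ('a', 'e')"
).fetchall()
print(cross_count, ambiguous, sorted(stacked))  # 6 True [('a',), ('e',)]
```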

select TableData where ColumnData start with list of strings

The following query selects rows where the column data starts with a, b, or c. But what I am looking for is to select data that starts with one of a list of strings.
SELECT * FROM Table WHERE Name LIKE '[abc]%'
But i want something like
SELECT * FROM Table WHERE Name LIKE '[ab,ac,ad,ae]%'
Can anybody suggest the best way to select column data that starts with one of a list of strings? I don't want to use the OR operator; I specifically want to use a list of strings.
The most general (portable) solution is to combine LIKE conditions with OR:
SELECT *
FROM Table
WHERE Name LIKE 'ab%' OR Name LIKE 'ac%' OR Name LIKE 'ad%' OR Name LIKE 'ae%';
However, certain databases offer pattern-matching or regex support that you might be able to use. For example, SQL Server's LIKE supports a set of characters in square brackets, so you could write:
SELECT *
FROM Table
WHERE NAME LIKE 'a[bcde]%';
MySQL has a REGEXP operator for regular-expression matching, so you could write:
SELECT *
FROM Table
WHERE NAME REGEXP '^a[bcde]';
Oracle (REGEXP_LIKE) and PostgreSQL (the ~ operator) also support regular-expression matching.
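SQLite is a similar case: it parses X REGEXP Y but ships no implementation, and instead calls an application-defined function named regexp with the arguments (Y, X). A minimal sketch, assuming Python's sqlite3 and re modules and a hypothetical Names table:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
conn.create_function(
    "regexp", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None,
)
conn.execute("CREATE TABLE Names (Name TEXT)")
conn.executemany(
    "INSERT INTO Names VALUES (?)",
    [("abel",), ("acre",), ("afar",), ("bead",), (None,)],
)

# ^a[bcde] matches names starting with ab, ac, ad or ae.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM Names WHERE Name REGEXP '^a[bcde]' ORDER BY Name"
    )
]
print(matches)  # ['abel', 'acre']
```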
To add to Tim's answer, another approach could be to join your table with a sub-query of those values:
SELECT *
FROM mytable t
JOIN (SELECT 'ab' AS value
UNION ALL
SELECT 'ac'
UNION ALL
SELECT 'ad'
UNION ALL
SELECT 'ae') v ON t.name LIKE v.value || '%'
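One caveat with the join: a row that matches more than one value is returned once per match, which can happen when the prefixes overlap. The sketch below (SQLite, with a hypothetical mytable and an overlapping prefix list chosen on purpose) compares the join with an EXISTS filter, which returns each row at most once. Note also that || is string concatenation in Oracle, PostgreSQL and SQLite, while SQL Server uses + or CONCAT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)", [("abcd",), ("adze",), ("bead",)])

# Hypothetical prefix list; 'ab' and 'abc' overlap on purpose.
PREFIXES = "(SELECT 'ab' AS value UNION ALL SELECT 'abc' UNION ALL SELECT 'ad')"

# Plain join: 'abcd' matches both 'ab' and 'abc', so it comes back twice.
joined = conn.execute(
    f"SELECT t.name FROM mytable t JOIN {PREFIXES} v "
    "ON t.name LIKE v.value || '%' ORDER BY t.name"
).fetchall()

# EXISTS only asks whether some prefix matches, so each row appears at most once.
filtered = conn.execute(
    "SELECT t.name FROM mytable t "
    f"WHERE EXISTS (SELECT 1 FROM {PREFIXES} v WHERE t.name LIKE v.value || '%') "
    "ORDER BY t.name"
).fetchall()
print(joined)    # [('abcd',), ('abcd',), ('adze',)]
print(filtered)  # [('abcd',), ('adze',)]
```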

Oracle: Query identical table in multiple schema in single line

In Oracle, is there a way to query the same, structurally identical table from multiple schemas within the same database in a single line? Assuming the user has permission to access all the schemas, I could build a query like:
select * from schema1.SomeTable
union all
select * from schema2.SomeTable
But is it possible, given the right permissions, to say something like:
select * from allSchema.SomeTable
...and bring back all rows from all the schemas? And, related to this, is it possible to pick which schemas, such as:
select * from allSchema.SomeTable where schemaName in ('schema1','schema2')
The simplest option, as far as I can tell, is to create a view that is a UNION of the tables across all those schemas (users), and then select from the view.
For example:
create or replace view my_view as
select 'schema_1' source_schema, id, name from schema_1.SomeTable union all
select 'schema_2' source_schema, id, name from schema_2.SomeTable union all
...
-- select all
select * from my_view;
-- select all that belongs to one of schemas
select * from my_view where source_schema = 'schema_1';
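SQLite has no schemas in Oracle's sense, but attached databases are addressed the same way (schema_1.SomeTable), so they can stand in for a quick sketch of the view approach. The schema names, columns and rows are hypothetical. The view is created as TEMP because SQLite only lets temporary views reference tables in other attached databases.

```python
import sqlite3

# Hypothetical contents of the same table in two "schemas".
SCHEMAS = {"schema_1": [(1, "alpha")], "schema_2": [(1, "alpha"), (2, "beta")]}

conn = sqlite3.connect(":memory:")
# Attach one in-memory database per stand-in schema before writing any data.
for schema in SCHEMAS:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
for schema, schema_rows in SCHEMAS.items():
    conn.execute(f"CREATE TABLE {schema}.SomeTable (id INTEGER, name TEXT)")
    conn.executemany(f"INSERT INTO {schema}.SomeTable VALUES (?, ?)", schema_rows)

# UNION ALL keeps every row; the literal column records where each row came from.
conn.execute(
    """
    CREATE TEMP VIEW my_view AS
    SELECT 'schema_1' AS source_schema, id, name FROM schema_1.SomeTable
    UNION ALL
    SELECT 'schema_2' AS source_schema, id, name FROM schema_2.SomeTable
    """
)
everything = conn.execute("SELECT COUNT(*) FROM my_view").fetchone()[0]
only_first = conn.execute(
    "SELECT source_schema, id, name FROM my_view WHERE source_schema = 'schema_1'"
).fetchall()
print(everything, only_first)  # 3 [('schema_1', 1, 'alpha')]
```

The duplicate (1, 'alpha') row appears once per schema because the source_schema column differs, so the view returns all three rows.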

The number of columns in the two selected tables or queries of a Union query do not match

I am getting an error in MS Access: "The number of columns in the two selected tables or queries of a Union query do not match."
Here is my SQL query:
SELECT sale_head.suppliername AS sale_head_suppliername,
sale_head.invoiceno AS sale_head_invoiceno, sale_head.invoicedate,
sale_details.invoiceno AS sale_details_invoiceno, sale_details.suppliername AS sale_details_suppliername,
sale_details.product_code, sale_details.qty, sale_details.totalkg, sale_details.Rate, sale_details.subtotal FROM sale_head
INNER JOIN sale_details ON sale_head.[invoiceno] = sale_details.[invoiceno]
UNION ALL select 'Total', sum(sale_details.subtotal) from sale_details
WHERE (((sale_head.suppliername)='Ramkrishna Creation'));
Am I missing something? If so, please let me know.
When you union two or more queries together, each query should return the same number of columns, in the same order, with the same data types. For example:
SELECT Name,LastName,SUM(Salary) FROM table1 GROUP BY Name,LastName
UNION
SELECT Text1,Text2,SomeMoney FROM table2
is valid (assuming that Name and Text1, LastName and Text2, and SUM(Salary) and SomeMoney have the same data types), but:
SELECT Name,LastName,SUM(Salary) FROM table1 GROUP BY Name,LastName
UNION
SELECT Text1,SomeMoney FROM table2
(column count mismatch) or
SELECT Name,LastName,SUM(Salary) FROM table1 GROUP BY Name,LastName
UNION
SELECT Text1,SomeMoney,Text2 FROM table2
(data type mismatch) are not valid UNION statements.
UPDATE: My answer is based on the SQL standard definition of the UNION statement, which states:
The UNION operator is used to combine the result-set of two or more SELECT statements. Notice that each SELECT statement within the UNION must have the same number of columns. The columns must also have similar data types. Also, the columns in each SELECT statement must be in the same order.
In a UNION, both datasets must have the same number of columns, but in MS Access they don't need to have the same data type:
All queries in a UNION operation must request the same number of fields; however, the fields do not have to be of the same size or data type.
(Source: "UNION Operation (Microsoft Access SQL)" in the Microsoft documentation.)
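As a sketch of a common fix for the query in the question, a totals row can be padded with NULLs so that both SELECTs return the same number of columns. The example uses SQLite and a cut-down, hypothetical sale_details table, so it shows the technique rather than a drop-in Access query; SQLite rejects the mismatched version with an error much like the one Access reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of the sale_details table from the question.
conn.execute("CREATE TABLE sale_details (invoiceno TEXT, product_code TEXT, subtotal REAL)")
conn.executemany(
    "INSERT INTO sale_details VALUES (?, ?, ?)",
    [("INV1", "P1", 100.0), ("INV1", "P2", 50.0)],
)

# Three columns on the left, two on the right: rejected when the query is prepared.
try:
    conn.execute(
        "SELECT invoiceno, product_code, subtotal FROM sale_details "
        "UNION ALL SELECT 'Total', SUM(subtotal) FROM sale_details"
    )
    count_mismatch_rejected = False
except sqlite3.OperationalError:
    count_mismatch_rejected = True

# Padding the totals row with NULL gives both SELECTs three columns.
rows = conn.execute(
    "SELECT invoiceno, product_code, subtotal FROM sale_details "
    "UNION ALL SELECT 'Total', NULL, SUM(subtotal) FROM sale_details"
).fetchall()
total = [row for row in rows if row[0] == "Total"]
print(count_mismatch_rejected, total)  # True [('Total', None, 150.0)]
```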

SQL Server : compare two tables with UNION and Select * plus additional label column

I've been playing around with the sample on Jeff's SQL Server blog to compare two tables and find the differences.
In my case the tables are a backup and the current data. I can get what I want with this SQL statement (simplified by removing most of the columns). I can then see the rows from each table that don't have an exact match, and I can see which table they come from.
SELECT
MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
FROM
(SELECT
'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT
'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001new].[dbo].[AR_CustomerAddresses]) tmp
GROUP BY
[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
HAVING
COUNT(*) = 1
This Stack Overflow answer gives me a much cleaner SQL query, but it does not tell me which table each row comes from.
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
UNION
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
INTERSECT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
I could use the first version but I have many tables that I need to compare and I think that there has to be an easy way to add the source table column to the second query. I've tried several things and googled to no avail. I suspect that maybe I'm just not searching for the correct thing since I'm sure it's been answered before.
Maybe I'm going down the wrong trail and there is a better way to compare the databases?
Could you use the following setup to accomplish your goal?
SELECT 'New not in Old' Descriptor, *
FROM
(
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
) a
UNION
SELECT 'Old not in New' Descriptor, *
FROM
(
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) b
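The same query runs in SQLite, which supports EXCEPT and derived tables, so it can be tried on a pair of small, hypothetical tables. The label is added outside each EXCEPT, so it never takes part in the row comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical old/new copies of a cut-down customer address table.
conn.execute("CREATE TABLE old_addresses (strCustomer TEXT, strCity TEXT)")
conn.execute("CREATE TABLE new_addresses (strCustomer TEXT, strCity TEXT)")
conn.executemany("INSERT INTO old_addresses VALUES (?, ?)", [("C1", "Oslo"), ("C2", "Rome")])
conn.executemany("INSERT INTO new_addresses VALUES (?, ?)", [("C1", "Oslo"), ("C2", "Milan")])

# Each EXCEPT runs inside a derived table; the Descriptor label is attached afterwards.
diff = conn.execute(
    """
    SELECT 'New not in Old' AS Descriptor, * FROM
        (SELECT * FROM new_addresses EXCEPT SELECT * FROM old_addresses) AS a
    UNION
    SELECT 'Old not in New' AS Descriptor, * FROM
        (SELECT * FROM old_addresses EXCEPT SELECT * FROM new_addresses) AS b
    ORDER BY Descriptor
    """
).fetchall()
print(diff)  # [('New not in Old', 'C2', 'Milan'), ('Old not in New', 'C2', 'Rome')]
```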
You can't add the table name inside those queries because UNION, EXCEPT, and INTERSECT compare all columns: a column whose value differs per table would make every row from one table different from every row in the other, so nothing would ever match. A GROUP BY gives you control over which columns are considered when finding duplicates, so you can leave the table name out of the grouping.
To cope with the large number of tables you need to compare, you could query the metadata tables that hold table and column names (in SQL Server, for example, INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.COLUMNS) and generate the SQL commands dynamically from those values.
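As a rough illustration of generating the comparison from metadata, the sketch below reads table and column names from SQLite's catalog (sqlite_master and PRAGMA table_info, standing in for INFORMATION_SCHEMA) and builds the labelled EXCEPT query for every table that exists in both an "old" and a "new" database. The database aliases, table names and the build_diff_sql helper are hypothetical.

```python
import sqlite3


def build_diff_sql(conn, table, old="old", new="main"):
    """Generate a labelled EXCEPT comparison for one table present in both databases."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    cols = ", ".join(
        f'"{row[1]}"' for row in conn.execute(f'PRAGMA {old}.table_info("{table}")')
    )
    return (
        f"SELECT '{table}' AS TableName, 'New not in Old' AS Descriptor, * FROM "
        f'(SELECT {cols} FROM {new}."{table}" EXCEPT SELECT {cols} FROM {old}."{table}") '
        "UNION ALL "
        f"SELECT '{table}' AS TableName, 'Old not in New' AS Descriptor, * FROM "
        f'(SELECT {cols} FROM {old}."{table}" EXCEPT SELECT {cols} FROM {new}."{table}") '
        "ORDER BY Descriptor"
    )


conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS old")
for db in ("main", "old"):
    conn.execute(f"CREATE TABLE {db}.addresses (strCustomer TEXT, strCity TEXT)")
    conn.execute(f"CREATE TABLE {db}.phones (strCustomer TEXT, strPhone TEXT)")
conn.execute("INSERT INTO old.addresses VALUES ('C1', 'Rome')")
conn.execute("INSERT INTO main.addresses VALUES ('C1', 'Milan')")
conn.execute("INSERT INTO old.phones VALUES ('C1', '555')")
conn.execute("INSERT INTO main.phones VALUES ('C1', '555')")

# Compare only the tables that exist in both catalogs.
old_tables = {r[0] for r in conn.execute("SELECT name FROM old.sqlite_master WHERE type = 'table'")}
new_tables = {r[0] for r in conn.execute("SELECT name FROM main.sqlite_master WHERE type = 'table'")}
differences = {
    table: conn.execute(build_diff_sql(conn, table)).fetchall()
    for table in sorted(old_tables & new_tables)
}
print(differences["phones"])     # []
print(differences["addresses"])  # [('addresses', 'New not in Old', 'C1', 'Milan'), ('addresses', 'Old not in New', 'C1', 'Rome')]
```

In SQL Server the same loop would typically build the statements from INFORMATION_SCHEMA and run them with dynamic SQL; this sketch does not attempt that.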
Derive an extra column from the table name, like below:
SELECT MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,MIN(table_name_came) AS table_name_came -- aggregated, not grouped, so rows from both tables can still match
FROM
(SELECT 'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT 'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001new].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) tmp
GROUP BY [strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
HAVING COUNT(*) = 1