SQL join for Loop? - sql

I have beginner-level knowledge of SQL and I am wondering whether this is possible in SQL.
SQL query 1 >>
select distinct(id) as active_pod from schema_naming
Query 1 output >>
active_pod
DB_1
DB_2
...
DB_20
SQL query 2 >>
select * from DB_1.mapping UNION
select * from DB_2.mapping UNION
....
select * from DB_20.mapping
Due to my limited knowledge of SQL, I'm currently running query #1 first, changing DB_1, DB_2, ..., DB_20 in query #2 every time, and then running query #2.
However, I was wondering whether there's a way to do this in one query, so I don't have to manually change the DB number in query #2 and don't have to write a UNION on every line.
Something like this... (but I'm not sure what to do with the UNION):
select * from {
select distinct id from schema_naming}.user_map
It would be great if someone could shed light on this. (I'm trying to do this in Oracle SQL.)
Thank you in advance.

Are you trying to get something like this?
SELECT 'SELECT * FROM ' || active_pod || '.' || 'Mapping UNION'
FROM
(
select distinct(id) as active_pod from schema_naming
) DT;

Alternatively, use PL/SQL block:
BEGIN
  FOR i IN (SELECT 'SELECT * FROM ' || ACTIVE_POD || '.MAPPING UNION' AS QUERY
              FROM SCHEMA_NAMING) LOOP
    dbms_output.put_line(i.query);
  END LOOP;
END;
/
Your queries will appear in the output window of your IDE.
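If you want to go a step further and actually run the combined statement instead of copy-pasting the generated text, a minimal PL/SQL sketch (assuming the ID values in SCHEMA_NAMING are valid schema names and the assembled statement fits in a VARCHAR2) could build the string and open it as a ref cursor:
DECLARE
  l_sql VARCHAR2(32767);
  l_cur SYS_REFCURSOR;
BEGIN
  -- Build 'SELECT * FROM <schema>.mapping' for every distinct ID,
  -- joined with UNION ALL so no trailing keyword is left over
  FOR i IN (SELECT DISTINCT id AS active_pod FROM schema_naming) LOOP
    IF l_sql IS NOT NULL THEN
      l_sql := l_sql || ' UNION ALL ';
    END IF;
    l_sql := l_sql || 'SELECT * FROM ' || i.active_pod || '.mapping';
  END LOOP;
  -- Open the combined statement as a cursor; fetch from it here,
  -- or return it from a function to the caller
  OPEN l_cur FOR l_sql;
END;
/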

This is definitely a hack, but it might make your life easier until a better solution is proposed. Basically, use a query to generate your 2nd query; the only manual edit needed would be to remove the unnecessary UNION on the final line.
SELECT 'SELECT * FROM ' || ACTIVE_POD || '.MAPPING UNION' AS QUERY
FROM SCHEMA_NAMING
Results:
SELECT * FROM DB_1.MAPPING UNION
SELECT * FROM DB_2.MAPPING UNION
SELECT * FROM DB_3.MAPPING UNION
SELECT * FROM DB_4.MAPPING UNION
SELECT * FROM DB_5.MAPPING UNION
SELECT * FROM DB_6.MAPPING UNION
SELECT * FROM DB_7.MAPPING UNION
SELECT * FROM DB_8.MAPPING UNION
SELECT * FROM DB_9.MAPPING UNION
SELECT * FROM DB_10.MAPPING UNION
SELECT * FROM DB_11.MAPPING UNION
SELECT * FROM DB_12.MAPPING UNION
SELECT * FROM DB_13.MAPPING UNION
SELECT * FROM DB_14.MAPPING UNION
SELECT * FROM DB_15.MAPPING UNION
SELECT * FROM DB_16.MAPPING UNION
SELECT * FROM DB_17.MAPPING UNION
SELECT * FROM DB_18.MAPPING UNION
SELECT * FROM DB_19.MAPPING UNION
SELECT * FROM DB_20.MAPPING UNION

Related

Using REGEXP_LIKE within queryExecute()

I'm looking to modify an existing query to use REGEXP_LIKE but I'm falling foul of some syntax I'm not understanding properly. We currently have a CF query into an Oracle DB using the following:
result = QueryExecute("
SELECT paramOne, paramTwo FROM someTable WHERE fieldOne = :PUBLISHER
", {PUBLISHER=publisherId}, {datasource="someDB"});
which works. However, I want to modify the underlying query to be:
result = QueryExecute("
SELECT paramOne, paramTwo FROM someTable WHERE REGEXP_LIKE(fieldOne, '(^|,)(:PUBLISHER)($|,)', 'i')
", {PUBLISHER=publisherId}, {datasource="someDB"});
but it's not delivering the expected results. A few things I've noted as I try to debug...
The underlying query (without using a variable) works and has been verified in Oracle SQL.
If I go to the source code and replace :PUBLISHER with a hard-coded value, things work as expected.
I've tried escaping the ':' but that's not the answer.
I feel there's something I'm not understanding about passing variables into a REGEX expression within queryExecute(), so I would appreciate any thoughts.
Any input gratefully received,
Phil
The bind variable isn't substituted when it appears inside a string literal; the pattern matches the literal text ':publisher'. Concatenate the bind variable into the pattern instead, as the second example shows.
SQL>
with t (fieldOne) as (
select 'abc, def' from dual union all
select 'def cba' from dual union all
select ':publisher' from dual
)
select * from t where REGEXP_LIKE(fieldOne, '(^|,)(:PUBLISHER)($|,)', 'i');
FIELDONE
----------
:publisher
SQL>
with t (fieldOne) as (
select 'abc, def' from dual union all
select 'def cba' from dual union all
select ':publisher' from dual
)
select * from t where REGEXP_LIKE(fieldOne, '(^|,)(' || :PUBLISHER || ')($|,)', 'i');
FIELDONE
----------
abc, def

BigQuery: select __TABLES__ from all datasets within a project?

Using BigQuery, is there a way I can select __TABLES__ from every dataset within my project? I've tried SELECT * FROM '*.__TABLES' but that is not allowed within BigQuery. Any help would be great, thanks!
You can use this SQL query to generate the UNION query over every dataset in your project:
select string_agg(
concat("select * from `[PROJECT ID].", schema_name, ".__TABLES__` ")
, "union all \n"
)
from `[PROJECT ID]`.INFORMATION_SCHEMA.SCHEMATA;
You will have this list:
select * from `[PROJECT ID].[DATASET ID 1].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 2].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 3].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 4].__TABLES__`
...
Then put the list within this query:
SELECT
table_id
,DATE(TIMESTAMP_MILLIS(creation_time)) AS creation_date
,DATE(TIMESTAMP_MILLIS(last_modified_time)) AS last_modified_date
,row_count
,size_bytes
,round(safe_divide(size_bytes, (1000*1000)),1) as size_mb
,round(safe_divide(size_bytes, (1000*1000*1000)),2) as size_gb
,CASE
WHEN type = 1 THEN 'table'
WHEN type = 2 THEN 'view'
WHEN type = 3 THEN 'external'
ELSE '?'
END AS type
,TIMESTAMP_MILLIS(creation_time) AS creation_time
,TIMESTAMP_MILLIS(last_modified_time) AS last_modified_time
,FORMAT_TIMESTAMP("%Y-%m", TIMESTAMP_MILLIS(last_modified_time)) as last_modified_month
,dataset_id
,project_id
FROM
(
select * from `[PROJECT ID].[DATASET ID 1].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 2].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 3].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 4].__TABLES__`
)
ORDER BY dataset_id, table_id asc
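If you'd rather not paste the generated list back by hand, a variation (a sketch, assuming BigQuery scripting is available to you) is to capture the generated UNION into a variable and run it with EXECUTE IMMEDIATE:
DECLARE union_sql STRING;
-- Build the "select * from `<project>.<dataset>.__TABLES__`" list as a single string
SET union_sql = (
  SELECT STRING_AGG(
    CONCAT("select * from `[PROJECT ID].", schema_name, ".__TABLES__`"),
    " union all \n")
  FROM `[PROJECT ID]`.INFORMATION_SCHEMA.SCHEMATA
);
-- Run the combined query; add whatever derived columns you need here
EXECUTE IMMEDIATE CONCAT(
  "SELECT dataset_id, table_id, row_count, size_bytes FROM (", union_sql,
  ") ORDER BY dataset_id, table_id");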
The __TABLES__ syntax is supported only for a specific dataset and does not work across datasets.
What you can do is something like below:
#standardSQL
WITH ALL__TABLES__ AS (
SELECT * FROM `bigquery-public-data.1000_genomes.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.baseball.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.bls.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.census_bureau_usa.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.cloud_storage_geo_index.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.cms_codes.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.common_us.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.fec.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.genomics_cannabis.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.ghcn_d.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.ghcn_m.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.github_repos.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.hacker_news.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.irs_990.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.medicare.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.new_york.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.nlm_rxnorm.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.noaa_gsod.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.open_images.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.samples.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.san_francisco.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.stackoverflow.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.usa_names.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.utility_us.__TABLES__`
)
SELECT *
FROM ALL__TABLES__
In this case you need to know the list of datasets in advance, which you can easily get via the Datasets: list API or the respective bq ls command.
Please note: the above approach will work only for datasets with data in the same location. If you have datasets with data in different locations, you will need to query them in separate queries, one per location.
For example:
#standardSQL
WITH ALL_EU__TABLES__ AS (
SELECT * FROM `bigquery-public-data.common_eu.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.utility_eu.__TABLES__`
)
SELECT *
FROM ALL_EU__TABLES__
I know that you asked about doing this in BigQuery, but I wrote a Python script to get the information you are asking for; maybe it can help other coders:
Pip install:
!pip install google-cloud
!pip install google-api-python-client
!pip install oauth2client
Code:
import threading
from google.cloud import bigquery

def _worker_query(project, dataset_id, results_scan):
    # Count the tables listed in <project>.<dataset>.__TABLES__
    query_str = 'SELECT * FROM `{}.{}.__TABLES__`'.format(project, dataset_id)
    query_job = client.query(query_str)
    rows = query_job.result()
    count = 0
    for row in rows:
        count = count + 1
    results_scan.append({'dataset_id': dataset_id, 'count': count})

def main_execute():
    project = 'bigquery-public-data'
    datasets = client.list_datasets(project)
    threads_project = []
    results_scan = []
    # Scan every dataset in parallel, one thread per dataset
    for d in datasets:
        t = threading.Thread(target=_worker_query, args=(project, d.dataset_id, results_scan))
        threads_project.append(t)
        t.start()
    for t in threads_project:
        t.join()
    total_count = 0
    for result in results_scan:
        print(result)
        total_count = total_count + result['count']
    print('\n\nTOTAL TABLES: "{}"'.format(total_count))

JSON_FILE_NAME = 'sa_bq.json'
client = bigquery.Client.from_service_account_json(JSON_FILE_NAME)
main_execute()
You can extend Mikhail Berlyant's answer and automatically generate and run the SQL with a single script.
INFORMATION_SCHEMA.SCHEMATA lists all the datasets. You can use a WHILE loop to generate all of the UNION ALL statements dynamically, like so:
DECLARE schemas ARRAY<string>;
DECLARE query string;
DECLARE i INT64 DEFAULT 0;
DECLARE arrSize INT64;
SET schemas = ARRAY(select schema_name from <your_project>.INFORMATION_SCHEMA.SCHEMATA);
SET query = "SELECT * FROM (";
SET arrSize = ARRAY_LENGTH(schemas);
WHILE i < arrSize - 1 DO
SET query = CONCAT(query, "SELECT '", schemas[OFFSET(i)], "', table_ID, row_count, size_bytes from <your_project>.", schemas[OFFSET(i)], '.__TABLES__ UNION ALL ');
SET i = i + 1;
END WHILE;
SET query = CONCAT(query, "SELECT '", schemas[ORDINAL(arrSize)], "', table_ID, row_count, size_bytes from <your_project>.", schemas[ORDINAL(arrSize)], '.__TABLES__ )');
EXECUTE IMMEDIATE query;
Building on @mikhail-berlyant's nice solution above, it's now possible to take advantage of BigQuery's scripting features to automate gathering the list of datasets and retrieving table metadata. Simply replace the *_name variables to generate a view of metadata for all of your tables in a given project.
DECLARE project_name STRING;
DECLARE dataset_name STRING;
DECLARE table_name STRING;
DECLARE view_name STRING;
DECLARE generate_metadata_query_for_all_datasets STRING;
DECLARE retrieve_table_metadata STRING;
DECLARE persist_table_metadata STRING;
DECLARE create_table_metadata_view STRING;
SET project_name = "your-project";
SET dataset_name = "your-dataset";
SET table_name = "your-table";
SET view_name = "your-view";
SET generate_metadata_query_for_all_datasets = CONCAT("SELECT STRING_AGG( CONCAT(\"select * from `",project_name,".\", schema_name, \".__TABLES__` \"), \"union all \\n\" ) AS datasets FROM `",project_name,"`.INFORMATION_SCHEMA.SCHEMATA");
SET
retrieve_table_metadata = generate_metadata_query_for_all_datasets;
SET create_table_metadata_view = CONCAT(
"""
CREATE VIEW IF NOT EXISTS
`""",project_name,".",dataset_name,".",view_name,"""`
AS
SELECT
project_id
,dataset_id
,table_id
,DATE(TIMESTAMP_MILLIS(creation_time)) AS created_date
,TIMESTAMP_MILLIS(creation_time) AS created_at
,DATE(TIMESTAMP_MILLIS(last_modified_time)) AS last_modified_date
,TIMESTAMP_MILLIS(last_modified_time) AS last_modified_at
,row_count
,size_bytes
,round(safe_divide(size_bytes, (1000*1000)),1) as size_mb
,round(safe_divide(size_bytes, (1000*1000*1000)),2) as size_gb
,CASE
WHEN type = 1 THEN 'native table'
WHEN type = 2 THEN 'view'
WHEN type = 3 THEN 'external table'
ELSE 'unknown'
END AS type
FROM `""",project_name,".",dataset_name,".",table_name,"""`
ORDER BY dataset_id, table_id asc""");
EXECUTE IMMEDIATE retrieve_table_metadata INTO persist_table_metadata;
EXECUTE IMMEDIATE CONCAT("CREATE OR REPLACE TABLE `",project_name,".",dataset_name,".",table_name,"` AS (",persist_table_metadata,")");
EXECUTE IMMEDIATE create_table_metadata_view;
After that you can query your new view.
SELECT * FROM `[PROJECT ID].[DATASET ID].[VIEW NAME]`
Maybe you can use INFORMATION_SCHEMA instead of __TABLES__:
SELECT * FROM `region-us`.INFORMATION_SCHEMA.TABLES;
Just replace region-us with the region where your datasets are.
If you have more than one region, you'll need to use UNION ALL, but it's still simpler than writing a UNION for every dataset.
Or you can use a query to get all the unions all, like this:
With SelectTable AS (
  SELECT 1 AS ID, 'SELECT * FROM ' || table_schema || '.__TABLES__ UNION ALL' AS SelectColumn
  FROM `region-us`.INFORMATION_SCHEMA.TABLES
  GROUP BY table_schema
)
Select STRING_AGG(SelectColumn, '\n') FROM SelectTable
GROUP BY ID
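One caveat: INFORMATION_SCHEMA.TABLES does not carry row counts or byte sizes the way __TABLES__ does. If that metadata is what you're after, the region-level TABLE_STORAGE view (where it's available to your project) exposes it; a rough equivalent would be something like:
SELECT
  project_id,
  table_schema AS dataset_id,
  table_name AS table_id,
  total_rows,
  total_logical_bytes
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
ORDER BY table_schema, table_name;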
Slight changes, referring to @Dinh Tran's answer:
#!/bin/bash
project_name="abc-project-name"
echo -e "project_id,dataset_id,table_id,row_count,size_mb,size_gb,type,partiton,partition_expiration_days,cluster_key" > /tmp/bq_out.csv
for dataset in $(bq ls|tail -n +3); do
bq query --format=csv --use_legacy_sql=false '
SELECT
t1.project_id as project_id,
t1.dataset_id as dataset_id ,
t1.table_id as table_id,
t1.row_count as row_count,
round(safe_divide(t1.size_bytes, (1000*1000)),1) as size_mb,
round(safe_divide(t1.size_bytes, (1000*1000*1000)),2) as size_gb,
case
when t1.type = 1 then "table"
when t1.type = 2 then "view"
when t1.type = 3 then "external"
else "?"
END AS type,
case
when t2.ddl like "%PARTITION BY%" then "Yes"
else "No"
end as partiton,
REGEXP_EXTRACT(t2.ddl, r".*partition_expiration_days=([0-9-].*)") as partition_expiration_days,
REGEXP_EXTRACT(t2.ddl, r"CLUSTER BY(.*)") as cluster_key,
FROM `'"${project_name}"'.'"${dataset}"'.__TABLES__` as t1,`'"${project_name}"'.'"${dataset}"'.INFORMATION_SCHEMA.TABLES` as t2
where t1.table_id=t2.table_name' | sed "1d" >> /tmp/bq_out.csv
done
Mikhail Berlyant's answer is very good. I would like to add that there's a cleaner way in some cases.
So, if you have only one dataset, i.e. the tables are within the same dataset and they follow a naming pattern, you can query them using a wildcard table.
Let's say you want to query the noaa_gsod dataset (its tables are named gsod1929, gsod1930, ..., gsod2018, gsod2019), then simply use
FROM
`bigquery-public-data.noaa_gsod.gsod*`
This is going to match all tables in the noaa_gsod dataset that begin with the string gsod.
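If you only want a subset of the matched tables, the _TABLE_SUFFIX pseudo column lets you filter on whatever the wildcard matched, for example:
SELECT COUNT(*) AS row_count
FROM `bigquery-public-data.noaa_gsod.gsod*`
WHERE _TABLE_SUFFIX BETWEEN '2015' AND '2019';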

sql select on multiple db's

I have around 18 DBs. All these DBs have the same structure. I want to query all of them at once to get my results.
Example:
ShopA
ShopB
ShopC
These DBs all contain the table article (with the same columns).
How do I get all articles in one result with a WHERE clause?
I thought:
select *
from shopa.dbo.article
shopb.dbo.article
shopc.dbo.article
where color = 'red'
Does anyone have an idea?
Have you considered doing a UNION ALL?
So something like:
SELECT 'a' AS Shop, *
FROM shopa.dbo.article
WHERE color = 'red'
UNION ALL
SELECT 'b' AS Shop, *
FROM shopb.dbo.article
WHERE color = 'red'
UNION ALL
SELECT 'c' AS Shop, *
FROM shopc.dbo.article
WHERE color = 'red'
Or, with a CTE (if your RDBMS supports it):
;WITH allstores AS (
SELECT 'a' AS Shop, *
FROM shopa.dbo.article
UNION ALL
SELECT 'b' AS Shop, *
FROM shopb.dbo.article
UNION ALL
SELECT 'c' AS Shop, *
FROM shopc.dbo.article
)
SELECT *
FROM allstores
WHERE color = 'red'
You could use UNION.
If you can simply select the DB names, you could also use a cursor and OPENQUERY on a dynamically created string, insert the results into a temp table, and select from that; a rough sketch follows.
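A rough sketch of the dynamic-SQL idea (T-SQL, assuming SQL Server given the three-part dbo naming, and assuming the shop databases can be picked out of sys.databases by a hypothetical 'shop%' naming filter):
DECLARE @sql nvarchar(max) = N'';

-- Build one "SELECT ... FROM <db>.dbo.article" branch per shop database
SELECT @sql = @sql
    + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
    + N'SELECT ''' + name + N''' AS Shop, * FROM ' + QUOTENAME(name) + N'.dbo.article'
FROM sys.databases
WHERE name LIKE N'shop%';  -- adjust to match your actual DB names

-- Wrap it so the WHERE clause only has to be written once
SET @sql = N'SELECT * FROM (' + @sql + N') AS allshops WHERE color = ''red'';';
EXEC sys.sp_executesql @sql;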
You can create a view which is populated from your SELECT, like this:
CREATE VIEW view_name AS
SELECT * FROM shopa.dbo.article
UNION
SELECT * FROM shopb.dbo.article
UNION
SELECT * FROM shopc.dbo.article
Then you can run a query against the view:
Select * from view_name
where color = 'red'
Then, if you want to write another query with another condition, you don't have to write another big query with UNIONs or other code.
You can just write a query against the view.

Using UNION with Sequel

I want to define an SQL command like this:
SELECT * FROM WOMEN
UNION
SELECT * FROM MEN
I tried to define this with the following code sequence in Ruby + Sequel:
require 'sequel'
DB = Sequel::Database.new()
sel = DB[:women].union(DB[:men])
puts sel.sql
The result is (I pretty-printed it):
SELECT * FROM (
SELECT * FROM `women`
UNION
SELECT * FROM `men`
) AS 't1'
There is an additional (superfluous?) SELECT.
If I define multiple UNIONs, as in this code sample,
sel = DB[:women].union(DB[:men]).union(DB[:girls]).union(DB[:boys])
puts sel.sql
I get more superfluous SELECTs.
SELECT * FROM (
SELECT * FROM (
SELECT * FROM (
SELECT * FROM `women`
UNION
SELECT * FROM `men`
) AS 't1'
UNION
SELECT * FROM `girls`
) AS 't1'
UNION
SELECT * FROM `boys`
) AS 't1'
I haven't detected any problem with it so far; the results seem to be the same.
My questions:
Is there a reason for the additional SELECTs (besides Sequel-internal procedures)?
Can I avoid the extra SELECTs?
Could these additional SELECTs cause problems (any performance issues)?
The reason for the extra SELECTs is so code like DB[:girls].union(DB[:boys]).where(:some_column=>1) operates properly. You can use DB[:girls].union(DB[:boys], :from_self=>false) to not wrap it in the extra SELECTs, as mentioned in the documentation.
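For reference, with :from_self=>false the generated SQL should come out flat, presumably something like:
SELECT * FROM `women` UNION SELECT * FROM `men`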

How can I treat a UNION query as a sub query

I have a set of tables that are logically one table, split into pieces for performance reasons. I need to write a query that effectively joins all the tables together so I can use a single WHERE clause on the result. I have successfully used a UNION on the result of applying the WHERE clause to each subtable explicitly, as in the following:
SELECT * FROM FRED_1 WHERE CHARLIE = 42
UNION
SELECT * FROM FRED_2 WHERE CHARLIE = 42
UNION
SELECT * FROM FRED_3 WHERE CHARLIE = 42
but as there are ten separate subtables, updating the WHERE clause each time is a pain. What I want is something like this:
SELECT *
FROM (
SELECT * FROM FRED_1
UNION
SELECT * FROM FRED_2
UNION
SELECT * FROM FRED_3)
WHERE CHARLIE = 42
If it makes a difference, the query needs to run against a DB2 database.
Here is a more comprehensive (sanitised) version of what I need to do:
select *
from ( select * from FRD_1 union select * from FRD_2 union select * from FRD_3 ) as FRD,
( select * from REQ_1 union select * from REQ_2 union select * from REQ_3 ) as REQ,
( select * from RES_1 union select * from RES_2 union select * from RES_3 ) as RES
where FRD.KEY1 = 123456
and FRD.KEY1 = REQ.KEY1
and FRD.KEY1 = RES.KEY1
and REQ.KEY2 = RES.KEY2
NEW INFORMATION:
It looks like the problem has more to do with the number of fields in the union than anything else. If I greatly restrict the fields I can get most of the syntax variations below working. Unfortunately, restricting the fields so much means the resulting query, while potentially useful, is not giving me the result I wanted. I've managed to get an additional 3 fields from one of the tables in addition to the 2 keys. Any more than that and the query fails.
I believe you have to give a name to your subquery result. I don't know db2 so I'm taking a shot in the dark, but I know this works on several other platforms.
SELECT *
FROM (
SELECT * FROM FRED_1
UNION
SELECT * FROM FRED_2
UNION
SELECT * FROM FRED_3) AS T1
WHERE CHARLIE = 42
If the logical implementation is a single table but the physical implementation is multiple tables, then how about creating a view that defines the logical model?
CREATE VIEW VW_FRED AS
SELECT * FROM FRED_1
UNION
SELECT * FROM FRED_2
UNION
SELECT * FROM FRED_3
then it's a simple matter of
SELECT * FROM VW_FRED WHERE CHARLIE = 42
Again, I'm not familiar with db2 syntax but this gives you the general idea.
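One detail worth mentioning: if the FRED_n tables are disjoint partitions (no row appears in more than one of them), UNION ALL in the view avoids the duplicate-elimination step that plain UNION implies, e.g.:
CREATE VIEW VW_FRED AS
SELECT * FROM FRED_1
UNION ALL
SELECT * FROM FRED_2
UNION ALL
SELECT * FROM FRED_3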
with
FRD as ( select * from FRD_1 union select * from FRD_2 union select * from FRD_3 ),
REQ as ( select * from REQ_1 union select * from REQ_2 union select * from REQ_3 ),
RES as ( select * from RES_1 union select * from RES_2 union select * from RES_3 )
SELECT * from FRD, REQ, RES
WHERE FRD.KEY1 = 123456
and FRD.KEY1 = REQ.KEY1
and FRD.KEY1 = RES.KEY1
and REQ.KEY2 = RES.KEY2
I'm not familiar with DB2 syntax but why aren't you doing this as an INNER JOIN or LEFT JOIN?
SELECT *
FROM FRED_1
INNER JOIN FRED_2
ON FRED_1.Charlie = FRED_2.Charlie
INNER JOIN FRED_3
ON FRED_1.Charlie = FRED_3.Charlie
WHERE FRED_1.Charlie = 42
If the values don't exist in FRED_2 or FRED_3 then use a LEFT/OUTER JOIN. I'm assuming that FRED_1 is a master table, and if a record exists then it will be in this table.
maybe:
SELECT * FROM
(select * from FRD_1
union
select * from FRD_2
union
select * from FRD_3) FRD
INNER JOIN (select * from REQ_1 union select * from REQ_2 union select * from REQ_3) REQ
on FRD.KEY1 = REQ.KEY1
INNER JOIN (select * from RES_1 union select * from RES_2 union select * from RES_3) RES
on FRD.KEY1 = RES.KEY1
WHERE FRD.KEY1 = 123456 and REQ.KEY2 = RES.KEY2