Multiple Linked Servers in one select statement with one where clause, possible? - sql

Got a tricky one today (Might even just be me):
I have 8 linked SQL Server 2012 servers configured on my main SQL Server, and I need to create views so that I can filter the combined results of their tables with a single WHERE clause. Currently I use UNION, because they all have the same table structures.
Currently my solution looks as follows:
SELECT * FROM [LinkedServer_1].[dbo].[Table] where value = 'xxx'
UNION
SELECT * FROM [LinkedServer_2].[dbo].[Table] where value = 'xxx'
UNION
SELECT * FROM [LinkedServer_3].[dbo].[Table] where value = 'xxx'
UNION
SELECT * FROM [LinkedServer_4].[dbo].[Table] where value = 'xxx'
UNION
SELECT * FROM [LinkedServer_5].[dbo].[Table] where value = 'xxx'
UNION
SELECT * FROM [LinkedServer_6].[dbo].[Table] where value = 'xxx'
UNION
SELECT * FROM [LinkedServer_7].[dbo].[Table] where value = 'xxx'
UNION
SELECT * FROM [LinkedServer_8].[dbo].[Table] where value = 'xxx'
As you can see, this is becoming quite ugly because I need a SELECT statement and a WHERE clause for each linked server. Is there a simpler way of doing this?
Appreciate the feedback.
Brakkie101

Instead of views, you can use an inline table-valued function (essentially a view with parameters). It will not save the initial effort of creating the queries, but it could save some work in the future:
CREATE FUNCTION [dbo].[fn_LinkedServer] (@value NVARCHAR(128))
RETURNS TABLE
AS
RETURN
(
SELECT * FROM [LinkedServer_1].[dbo].[Table] where value = @value
UNION
SELECT * FROM [LinkedServer_2].[dbo].[Table] where value = @value
UNION
SELECT * FROM [LinkedServer_3].[dbo].[Table] where value = @value
UNION
SELECT * FROM [LinkedServer_4].[dbo].[Table] where value = @value
UNION
SELECT * FROM [LinkedServer_5].[dbo].[Table] where value = @value
UNION
SELECT * FROM [LinkedServer_6].[dbo].[Table] where value = @value
UNION
SELECT * FROM [LinkedServer_7].[dbo].[Table] where value = @value
UNION
SELECT * FROM [LinkedServer_8].[dbo].[Table] where value = @value
);
Also, if possible, use UNION ALL instead of UNION: UNION removes duplicate rows across the servers, which adds an extra (and often unnecessary) de-duplication step.
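With the function in place, all eight servers can be queried with a single parameterised call (a minimal sketch, assuming the function above was created as dbo.fn_LinkedServer):
SELECT * FROM [dbo].[fn_LinkedServer](N'xxx');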

Related

SQL - Run Select Statement Based On Where Query

Hi, I want to create a query which does the following: when the parameter 25 is selected, it only runs part A of the query; if any other number is selected, it runs both the Table A and Table B SELECT queries.
Example Below:
DECLARE @Type varchar (200)
select * from
(SELECT sort_code FROM dbo.Test1
WHERE FUNDING_YEAR = 26)
union
(SELECT sort_code FROM dbo.Test2
WHERE FUNDING_YEAR = 26)
Where case when @Type = 25 then select * from table2 else table 1
You just need to reference the variable in the WHERE clause
SELECT *
FROM TableA
WHERE @Type = 25
UNION
SELECT *
FROM TableB
The query above will always select everything in TableB and will only select everything in TableA when the variable is equal to 25.
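The same pattern, oriented to the requirement in the question (a rough sketch reusing the table and column names from the question's example; it assumes @Type decides whether dbo.Test2 should contribute rows at all):
DECLARE @Type varchar(200) = '25'
SELECT sort_code FROM dbo.Test1 WHERE FUNDING_YEAR = 26
UNION
-- this branch returns no rows unless something other than 25 was supplied
SELECT sort_code FROM dbo.Test2 WHERE FUNDING_YEAR = 26 AND @Type <> '25'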
Since you are using SSRS, what I would do is write the query to return all of the rows and then apply a filter in the SSRS report when the parameter is 25. I wouldn't pass a parameter value to the SQL side unless it greatly reduces the run time of the query.
(I would have put this in a comment.)

SQL Conditional Union Based On Parameter

I have this query which takes in 3 parameters, StrParameter1 being the one which is key to what I'm trying to accomplish.
SELECT ai.strName,
ai.strId,
ai.lngBKey,
ci.strFormattedId,
ci.lngAKey,
ts.dtmPeriod,
ts.strT2Type,
ts.strImpact,
ts.curAmount,
ts.curBalance
FROM TBlCInfo ci,
tblAInfo ai,
tblTStage ts
WHERE ci.lngAKey = ai.lngAKey
AND ai.lngBKey = ts.lngBKey
AND ts.lngVersion = 0
AND ts.blnReversed = 0
AND ai.strType = @StrParameter1
AND ts.dtmPeriod >= @DtmParameter2
AND ts.dtmPeriod <= @DtmParameter3
I'd like to union this query with another, but only if StrParameter1 equals let's say, "AAAA". Otherwise, I only want the top portion of the query to fire. There's like 30 other circumstances where I would not need the union, but only 1 where it's needed.
Query 1
If StrParameter1 = 'AAAA' Then
UNION
QUERY 2
You can use the pattern below to do what you are looking for:
declare @param varchar(4) = 'AAAA'
select *
from MyTable
union
select *
from MyOtherTable
where @param = 'AAAA'
In this case, if @param is 'AAAA' the union will take place, and if it is not, the second query will not return any rows.
So this way you don't need to change your queries.

BigQuery select __TABLES__ from all tables within project?

Using BigQuery, is there a way I can select __TABLES__ from every dataset within my project? I've tried SELECT * FROM '*.__TABLES' but that is not allowed within BigQuery. Any help would be great, thanks!
You can use this SQL query to generate the list of datasets in your project:
select string_agg(
concat("select * from `[PROJECT ID].", schema_name, ".__TABLES__` ")
, "union all \n"
)
from `[PROJECT ID]`.INFORMATION_SCHEMA.SCHEMATA;
You will have this list:
select * from `[PROJECT ID].[DATASET ID 1].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 2].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 3].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 4].__TABLES__`
...
Then put the list within this query:
SELECT
table_id
,DATE(TIMESTAMP_MILLIS(creation_time)) AS creation_date
,DATE(TIMESTAMP_MILLIS(last_modified_time)) AS last_modified_date
,row_count
,size_bytes
,round(safe_divide(size_bytes, (1000*1000)),1) as size_mb
,round(safe_divide(size_bytes, (1000*1000*1000)),2) as size_gb
,CASE
WHEN type = 1 THEN 'table'
WHEN type = 2 THEN 'view'
WHEN type = 3 THEN 'external'
ELSE '?'
END AS type
,TIMESTAMP_MILLIS(creation_time) AS creation_time
,TIMESTAMP_MILLIS(last_modified_time) AS last_modified_time
,FORMAT_TIMESTAMP("%Y-%m", TIMESTAMP_MILLIS(last_modified_time)) as last_modified_month
,dataset_id
,project_id
FROM
(
select * from `[PROJECT ID].[DATASET ID 1].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 2].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 3].__TABLES__` union all
select * from `[PROJECT ID].[DATASET ID 4].__TABLES__`
)
ORDER BY dataset_id, table_id asc
The __TABLES__ syntax is supported only for a specific dataset and does not work across datasets.
What you can do is something like the below:
#standardSQL
WITH ALL__TABLES__ AS (
SELECT * FROM `bigquery-public-data.1000_genomes.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.baseball.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.bls.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.census_bureau_usa.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.cloud_storage_geo_index.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.cms_codes.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.common_us.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.fec.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.genomics_cannabis.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.ghcn_d.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.ghcn_m.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.github_repos.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.hacker_news.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.irs_990.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.medicare.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.new_york.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.nlm_rxnorm.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.noaa_gsod.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.open_images.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.samples.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.san_francisco.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.stackoverflow.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.usa_names.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.utility_us.__TABLES__`
)
SELECT *
FROM ALL__TABLES__
In this case you need to know the list of datasets in advance, which you can easily get via the Datasets: list API or with the respective bq ls command.
Please note: the above approach will only work for datasets whose data is in the same location. If you have datasets with data in different locations, you will need to query them in separate queries.
For example:
#standardSQL
WITH ALL_EU__TABLES__ AS (
SELECT * FROM `bigquery-public-data.common_eu.__TABLES__` UNION ALL
SELECT * FROM `bigquery-public-data.utility_eu.__TABLES__`
)
SELECT *
FROM ALL_EU__TABLES__
I know you asked about BigQuery, but I wrote a Python script to get the information you are asking for; maybe it can help other coders:
Pip install:
!pip install google-cloud
!pip install google-api-python-client
!pip install oauth2client
Code:
import threading
from google.cloud import bigquery

def _worker_query(project, dataset_id, results_scan):
    # Count the tables listed in <project>.<dataset>.__TABLES__
    query_str = 'SELECT * FROM `{}.{}.__TABLES__`'.format(project, dataset_id)
    query_job = client.query(query_str)
    rows = query_job.result()
    count = 0
    for row in rows:
        count = count + 1
    results_scan.append({'dataset_id': dataset_id, 'count': count})

def main_execute():
    project = 'bigquery-public-data'
    datasets = client.list_datasets(project)
    threads_project = []
    results_scan = []
    # One thread per dataset, each appending its table count to results_scan
    for d in datasets:
        t = threading.Thread(target=_worker_query, args=(project, d.dataset_id, results_scan))
        threads_project.append(t)
        t.start()
    for t in threads_project:
        t.join()
    total_count = 0
    for result in results_scan:
        print(result)
        total_count = total_count + result['count']
    print('\n\nTOTAL TABLES: "{}"'.format(total_count))

JSON_FILE_NAME = 'sa_bq.json'
client = bigquery.Client.from_service_account_json(JSON_FILE_NAME)
main_execute()
You can extend Mikhail Berlyant's answer and automatically generate the SQL with a single query.
The INFORMATION_SCHEMA.SCHEMATA view lists all the datasets. You can use a WHILE loop to generate all of the UNION ALL statements dynamically, like so:
DECLARE schemas ARRAY<string>;
DECLARE query string;
DECLARE i INT64 DEFAULT 0;
DECLARE arrSize INT64;
SET schemas = ARRAY(select schema_name from <your_project>.INFORMATION_SCHEMA.SCHEMATA);
SET query = "SELECT * FROM (";
SET arrSize = ARRAY_LENGTH(schemas);
WHILE i < arrSize - 1 DO
SET query = CONCAT(query, "SELECT '", schemas[OFFSET(i)], "', table_ID, row_count, size_bytes FROM `<your_project>.", schemas[OFFSET(i)], ".__TABLES__` UNION ALL ");
SET i = i + 1;
END WHILE;
SET query = CONCAT(query, "SELECT '", schemas[ORDINAL(arrSize)], "', table_ID, row_count, size_bytes FROM `<your_project>.", schemas[ORDINAL(arrSize)], ".__TABLES__` )");
EXECUTE IMMEDIATE query;
Building on @mikhail-berlyant's nice solution above, it's now possible to take advantage of BigQuery's scripting features to automate gathering the list of datasets and retrieving the table metadata. Simply replace the *_name variables to generate a view of metadata for all of the tables in a given project.
DECLARE project_name STRING;
DECLARE dataset_name STRING;
DECLARE table_name STRING;
DECLARE view_name STRING;
DECLARE generate_metadata_query_for_all_datasets STRING;
DECLARE retrieve_table_metadata STRING;
DECLARE persist_table_metadata STRING;
DECLARE create_table_metadata_view STRING;
SET project_name = "your-project";
SET dataset_name = "your-dataset";
SET table_name = "your-table";
SET view_name = "your-view";
SET generate_metadata_query_for_all_datasets = CONCAT("SELECT STRING_AGG( CONCAT(\"select * from `",project_name,".\", schema_name, \".__TABLES__` \"), \"union all \\n\" ) AS datasets FROM `",project_name,"`.INFORMATION_SCHEMA.SCHEMATA");
SET
retrieve_table_metadata = generate_metadata_query_for_all_datasets;
SET create_table_metadata_view = CONCAT(
"""
CREATE VIEW IF NOT EXISTS
`""",project_name,".",dataset_name,".",view_name,"""`
AS
SELECT
project_id
,dataset_id
,table_id
,DATE(TIMESTAMP_MILLIS(creation_time)) AS created_date
,TIMESTAMP_MILLIS(creation_time) AS created_at
,DATE(TIMESTAMP_MILLIS(last_modified_time)) AS last_modified_date
,TIMESTAMP_MILLIS(last_modified_time) AS last_modified_at
,row_count
,size_bytes
,round(safe_divide(size_bytes, (1000*1000)),1) as size_mb
,round(safe_divide(size_bytes, (1000*1000*1000)),2) as size_gb
,CASE
WHEN type = 1 THEN 'native table'
WHEN type = 2 THEN 'view'
WHEN type = 3 THEN 'external table'
ELSE 'unknown'
END AS type
FROM `""",project_name,".",dataset_name,".",table_name,"""`
ORDER BY dataset_id, table_id asc""");
EXECUTE IMMEDIATE retrieve_table_metadata INTO persist_table_metadata;
EXECUTE IMMEDIATE CONCAT("CREATE OR REPLACE TABLE `",project_name,".",dataset_name,".",table_name,"` AS (",persist_table_metadata,")");
EXECUTE IMMEDIATE create_table_metadata_view;
After that you can query your new view.
SELECT * FROM `[PROJECT ID].[DATASET ID].[VIEW NAME]`
Maybe you can use INFORMATION_SCHEMA instead of __TABLES__:
SELECT * FROM `region-us`.INFORMATION_SCHEMA.TABLES;
Just replace region-us with the region where your datasets live.
If you have more than one region, you'll need a UNION ALL per region, but that's still simpler than writing a UNION for every dataset.
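For instance, a minimal sketch combining two regions (the region qualifiers here are only examples; adjust them to wherever your datasets actually are):
SELECT * FROM `region-us`.INFORMATION_SCHEMA.TABLES
UNION ALL
SELECT * FROM `region-eu`.INFORMATION_SCHEMA.TABLES;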
Or you can use a query to generate all the UNION ALLs, like this:
With SelectTable AS (
SELECT 1 AS ID, 'SELECT * FROM ' || table_schema || '.__TABLES__ UNION ALL' AS SelectColumn FROM `region-us`.INFORMATION_SCHEMA.TABLES
GROUP BY table_schema
)
Select STRING_AGG(SelectColumn,'\n') FROM SelectTable
GROUP BY ID
Slight changes referring to @Dinh Tran's answer:
#!/bin/bash
project_name="abc-project-name"
echo -e "project_id,dataset_id,table_id,row_count,size_mb,size_gb,type,partiton,partition_expiration_days,cluster_key" > /tmp/bq_out.csv
for dataset in $(bq ls|tail -n +3); do
bq query --format=csv --use_legacy_sql=false '
SELECT
t1.project_id as project_id,
t1.dataset_id as dataset_id ,
t1.table_id as table_id,
t1.row_count as row_count,
round(safe_divide(t1.size_bytes, (1000*1000)),1) as size_mb,
round(safe_divide(t1.size_bytes, (1000*1000*1000)),2) as size_gb,
case
when t1.type = 1 then "table"
when t1.type = 2 then "view"
when t1.type = 3 then "external"
else "?"
END AS type,
case
when t2.ddl like "%PARTITION BY%" then "Yes"
else "No"
end as `partition`,
REGEXP_EXTRACT(t2.ddl, r".*partition_expiration_days=([0-9-].*)") as partition_expiration_days,
REGEXP_EXTRACT(t2.ddl, r"CLUSTER BY(.*)") as cluster_key,
FROM `'"${project_name}"'.'"${dataset}"'.__TABLES__` as t1,`'"${project_name}"'.'"${dataset}"'.INFORMATION_SCHEMA.TABLES` as t2
where t1.table_id=t2.table_name' | sed "1d" >> /tmp/bq_out.csv
done
Mikhail Berlyant's answer is very good. I would like to add that there is a cleaner approach for some cases.
If you have only one dataset and its tables follow a naming pattern, you can query them using a wildcard table.
Let's say you want to query the noaa_gsod dataset (its tables are named gsod1929, gsod1930, ..., gsod2019); then simply use
FROM
`bigquery-public-data.noaa_gsod.gsod*`
This is going to match all tables in the noaa_gsod dataset that begin with the string gsod.
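A slightly fuller sketch (again assuming the public noaa_gsod dataset), restricting which matched tables are scanned via _TABLE_SUFFIX:
SELECT COUNT(*) AS row_total
FROM `bigquery-public-data.noaa_gsod.gsod*`
WHERE _TABLE_SUFFIX BETWEEN '2018' AND '2019'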

sql select on multiple db's

I have around 18 databases. All these databases have the same structure, and I want to query them all at once to get my results.
Example:
ShopA
ShopB
ShopC
Each of these databases has the article table (with the same structure).
How do I get all articles in one result with a WHERE?
I thought:
select *
from shopa.dbo.article
shopb.dbo.article
shopc.dbo.article
where color = 'red'
Does anyone have an idea?
Have you considered doing a UNION ALL?
So something like:
SELECT 'a' AS Shop, *
FROM shopa.dbo.article
WHERE color = 'red'
UNION ALL
SELECT 'b' AS Shop, *
FROM shopb.dbo.article
WHERE color = 'red'
UNION ALL
SELECT 'c' AS Shop, *
FROM shopc.dbo.article
WHERE color = 'red'
Or, with a CTE (if your RDBMS supports it):
;WITH allstores AS (
SELECT 'a' AS Shop, *
FROM shopa.dbo.article
UNION ALL
SELECT 'b' AS Shop, *
FROM shopb.dbo.article
UNION ALL
SELECT 'c' AS Shop, *
FROM shopc.dbo.article
)
SELECT *
FROM allstores
WHERE color = 'red'
You could use UNION.
If you can select the database names, you could also use a cursor with OPENQUERY over a dynamically created string, insert the results into a temp table, and select from that.
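A rough sketch of that idea using plain dynamic SQL instead of a cursor (the shop database names come from the question; adjust the list, the filter, and the execution step as needed):
DECLARE @sql nvarchar(max) = N'';

-- build one UNION ALL statement over every shop database that has the article table
SELECT @sql = @sql
    + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
    + N'SELECT ''' + name + N''' AS shop_db, * FROM '
    + QUOTENAME(name) + N'.dbo.article WHERE color = ''red'''
FROM sys.databases
WHERE name IN (N'ShopA', N'ShopB', N'ShopC');

-- run it directly, or INSERT ... EXEC the result into a temp table first
EXEC sp_executesql @sql;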
You can create a view which is populated from your SELECT, like this:
CREATE VIEW view_name AS
SELECT * FROM shopa.dbo.article
UNION
SELECT * FROM shopb.dbo.article
UNION
SELECT * FROM shopc.dbo.article
Then you can run a query against the view:
Select * from view_name
where color = 'red'
Then if you want another query with a different condition, you don't have to write the big UNION again; you can just query the view.
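For instance, with a different filter (a trivial sketch; 'blue' is just an example value):
Select * from view_name
where color = 'blue'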

Using UNION with Sequel

I want to define an SQL command like this:
SELECT * FROM WOMAN
UNION
SELECT * FROM MEN
I tried to define this with the following code sequence in Ruby + Sequel:
require 'sequel'
DB = Sequel::Database.new()
sel = DB[:women].union(DB[:men])
puts sel.sql
The result is (I pretty-printed it):
SELECT * FROM (
SELECT * FROM `women`
UNION
SELECT * FROM `men`
) AS 't1'
There is an additional (superfluous?) SELECT.
If I chain multiple UNIONs, as in this code sample,
sel = DB[:women].union(DB[:men]).union(DB[:girls]).union(DB[:boys])
puts sel.sql
I get more superfluous SELECTs.
SELECT * FROM (
SELECT * FROM (
SELECT * FROM (
SELECT * FROM `women`
UNION
SELECT * FROM `men`
) AS 't1'
UNION
SELECT * FROM `girls`
) AS 't1'
UNION
SELECT * FROM `boys`
) AS 't1'
I have not detected any problem with it so far; the results seem to be the same.
My questions:
Is there a reason for the additional SELECTs (besides Sequel-internal procedures)?
Can I avoid them?
Can these additional SELECTs cause problems (any performance issues)?
The reason for the extra SELECTs is so code like DB[:girls].union(DB[:boys]).where(:some_column=>1) operates properly. You can use DB[:girls].union(DB[:boys], :from_self=>false) to not wrap it in the extra SELECTs, as mentioned in the documentation.
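For reference, the :from_self=>false form should generate the flat union written by hand (a sketch of the expected SQL, not verified against a live connection):
SELECT * FROM `women` UNION SELECT * FROM `men`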