Oracle SQL: single SELECT request for multiple owners

I would like your advice on the best method to use.
From a single Oracle server, we have 3 different owners (schemas) that contain exactly the same tables/data structures. Essentially, the owners allow us to separate the data by administrative region.
Currently, when I want to do a SQL query on the entire data set (regions), I have to do 3 separate queries:
select * from owner1.Table1 union all
select * from owner2.Table1 union all
select * from owner3.Table1
In this simple example, there are no issues, but if the query is complex, it quickly becomes very difficult to maintain.
So, is there a more efficient way to make only one global query instead of 3? I suppose it's possible to do it via a PL/SQL script, or dynamic SQL, but I don't know...
Basically, I would like to be able to write something like this (where owners would stand for the names of my 3 owners):
select * from owners.Table1
Building views that combine the data of all 3 owners is not an option (there would be too many views to maintain).
Thanks

In this simple example, there are no issues, but if the query is complex, it quickly becomes very difficult to maintain.
So, is there a more efficient way to make only one global query instead of 3?
Use a subquery factoring clause (a.k.a. a CTE) to combine the tables with the simple UNION ALL queries, and then perform the complex query on the combined result (rather than trying to perform it on the individual tables):
WITH subquery_name AS (
select * from owner1.Table1 union all
select * from owner2.Table1 union all
select * from owner3.Table1
)
SELECT <your complex query>
FROM subquery_name;
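For example, a report that counts rows per region across all three owners could be written like this (the region label column is illustrative, not from the original question):
WITH all_regions AS (
    SELECT 'owner1' AS region, t.* FROM owner1.Table1 t UNION ALL
    SELECT 'owner2' AS region, t.* FROM owner2.Table1 t UNION ALL
    SELECT 'owner3' AS region, t.* FROM owner3.Table1 t
)
SELECT region, COUNT(*) AS row_count
FROM all_regions
GROUP BY region
ORDER BY region;
Tagging each UNION ALL branch with its owner also preserves the region information once the rows are combined, which a plain UNION ALL otherwise loses.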

Related

Count vs select query performance in PostgreSQL

We have a PostgreSQL DB table with lots of data. I wonder which type of query is faster / has better performance, and why?
select * from table
select count (*) from table
While both queries iterate over the entire table, the first query will generally perform much worse at the application level than the second one. This is because the select * query needs to send the entire table across the network to the application executing the query. The count(*) query, on the other hand, only needs to send back a single integer count.
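If you want to verify this on your own table, EXPLAIN ANALYZE shows the server-side cost of each (big_table is a placeholder name):
EXPLAIN ANALYZE SELECT * FROM big_table;
EXPLAIN ANALYZE SELECT count(*) FROM big_table;
Note that EXPLAIN ANALYZE executes the query but discards the output, so the network transfer cost described above is not included in its timings; PostgreSQL may also satisfy count(*) with a cheaper index-only scan where one is available.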

Any resources for this SQL filtering?

I have 100 tables, each with a size on the order of a few tenths of a GB. The schema of each table is the following:
A: string | B: string | C: string
In each table I would like to retain only the rows for which the (B, C) appears at least 10 times in a concatenation of all 100 tables. Is there any efficient way to achieve this?
This is a very vague question, and leaving out your DBMS doesn't help either, as SQL comes in different dialects.
But first, you would have to combine all of the tables - there may be a faster way of doing this, but without knowing which flavor of SQL you are using it is hard to tell.
Something like this will work:
SELECT * FROM table_1
UNION ALL
SELECT * FROM table_2
...
UNION ALL
SELECT * FROM table_100
(Use UNION ALL rather than plain UNION here: UNION removes duplicate rows, which would throw off the (B, C) counts in the next step.)
Once you have all of the data (say the combined result is saved as aggregated_tables), you do something like this:
WITH tables_with_counts AS (
    SELECT
        A,
        B,
        C,
        COUNT(1) OVER (PARTITION BY B, C) AS bc_count
    FROM
        aggregated_tables
)
SELECT
    A,
    B,
    C
FROM
    tables_with_counts
WHERE
    bc_count >= 10
Here is my take:
Step 1: Aggregate all tables into one. It would be bulky, but if you are using an Oracle database I think it shouldn't be an issue.
Step 2: Create MD5 checksum hash values for the B, C columns like below:
SELECT APEX_ITEM.MD5_CHECKSUM(B,C) md5_cks,
       A, B, C
FROM aggregated_tables
Step 3: Take a count based on the checksum values and retain the rows where the count is at least 10.
Step 4: Get rid of duplicate data using RANK() or DENSE_RANK() in a delete statement.
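APEX_ITEM.MD5_CHECKSUM is only available when APEX is installed, so here is a minimal sketch of steps 3 and 4 using Oracle's built-in STANDARD_HASH instead, assuming the combined table is named aggregated_tables as above:
WITH checksummed AS (
    -- the '|' separator avoids collisions such as ('ab','c') vs ('a','bc')
    SELECT STANDARD_HASH(B || '|' || C, 'MD5') AS md5_cks, A, B, C
    FROM aggregated_tables
),
counted AS (
    SELECT A, B, C,
           COUNT(*) OVER (PARTITION BY md5_cks) AS cks_count,           -- step 3
           ROW_NUMBER() OVER (PARTITION BY md5_cks, A ORDER BY A) AS rn -- step 4
    FROM checksummed
)
SELECT A, B, C
FROM counted
WHERE cks_count >= 10 -- keep (B, C) pairs seen at least 10 times
  AND rn = 1;         -- drop exact duplicate rows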
The short answer, which I'm sure that you don't want to hear, is "no." In the context of relational databases there is no efficient query to merge 100 tables.
It is not all bad news though. If it were just one table (let's say it was named "combined", just to have a concrete example), you could use an elegant piece of SQL with window functions:
select A, B, C
from (
    select A, B, C, count(1) over (partition by B, C) as counts
    from combined
) counted
where counts >= 10;
Option 1. So the question is how to get a "combined" table so that the snippet above works. If we stick with ANSI (standard) SQL, you could use UNION ALL and collect it into a WITH clause to keep things neat.
Here is an example:
with
combined as (
    select * from table_1
    union all
    select * from table_2),
counted as (
    select
        A, B, C,
        count(1) over (partition by B, C) as counts
    from
        combined)
select A, B, C from counted where counts >= 10;
I only included 2 tables, but the real query would extend that up to table_100. That's a lot of typing and not very efficient with the programmer's time. Also, UNIONs and UNION ALLs are notoriously poor performers for databases, so this is not efficient in terms of system resources or time, either. Personally I would not do it this way, but it is an answer.
Option 2. There are other options which do not exactly match your question, but may be helpful to know. Any time you are tempted to create multiple tables with exactly the same schema, you will be better off creating a single table with multiple partitions; see MySQL, Postgres, SQL Server, Oracle, Hive. Every database platform has its own syntax for partitioning tables, but they are all similar. With this approach, each of the original tables becomes a single partition of the combined table, and the original table name is a really good candidate for the string value in the partition identifier (partition column).
If you are able to stuff all of your 100 tables into 100 partitions of one table then you can run the first query after all. The advantage is that the database can optimize that query because all modern databases are optimized to manage partitioned queries.
In addition, adding a partition to a table is really no more trouble than creating a new table instead, but supporting and maintaining one table is a lot less trouble than 100 tables.
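As a sketch, a list-partitioned version of the combined table could look like this in Oracle (the table name and column types are illustrative, based only on the schema in the question):
CREATE TABLE combined (
    source_table VARCHAR2(30), -- which original table the row came from
    A VARCHAR2(4000),
    B VARCHAR2(4000),
    C VARCHAR2(4000)
)
PARTITION BY LIST (source_table) (
    PARTITION p_table_1 VALUES ('table_1'),
    PARTITION p_table_2 VALUES ('table_2')
    -- ... one partition per original table, up to table_100
);
The window-function query from earlier then runs over the whole table at once, while a WHERE source_table = 'table_1' filter is pruned down to a single partition.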
A third option, since you tagged "big data", is to use a big data engine like Spark with Spark SQL. This would be objectively best, because you can load a dataframe with the 100 combined tables very efficiently in Spark, and the SQL after that is not much different from the relational database SQL we have been considering. That's somewhat out of scope here, but worth considering. If you submit a more specific question, specifically for Spark, we could go into more detail.

Use EXPLAIN in snowflake but on the results of another query. Using it on each row of the result

So I want to get the tables used by the queries in the query_history table, by running EXPLAIN on each query_text in query_history. I know that I can give the SQL to EXPLAIN and it will give me the tables used, but I need to do this over many different queries and return it all at once.
Basically I have 3 queries in the query_history table as follows:
select * from a.table1
select * from a.table2
select * from a.table3
And I need to do something like this:
SELECT "objects"
FROM TABLE(EXPLAIN_JSON(SYSTEM$EXPLAIN_PLAN_JSON(query_history.query_text)))
WHERE "operation"='TableScan';
This would return:
a.table1
a.table2
a.table3
I cannot do this in a programming language because it is going to be put into a BI tool that can only use SQL.

Select from multiple tables, including new tables that will be created

My simple query is:
SELECT MAX(DATETIME)
FROM gss.dbo.contacts_23
WHERE Identification = ''
GROUP BY Identification
How can I select MAX(DATETIME) across 25 tables, plus any new tables that get created?
These tables are named like contacts_22, contacts_25, contacts_29, contacts_36, .. and the new ones.
I tried to think along these lines:
use UNION, but then what about the new tables that will be created next time?
select all tables whose names start with 'contacts_', to fetch these tables as well as the ones created next time:
SELECT TABLE_NAME
FROM GSS.INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME like 'contacts_%'
but its result can't be used directly in a FROM clause.
How can I do that, i.e. GET MAX(DATE) FROM .... AND NEW TABLES?
I'm using SQL Server 2012
Thanks in advance
The data model is horrible, and there is no clean way of achieving what you want.
The first way is to create a view which unions all your contacts_ tables, and then to run your query on that view.
create view contacts_view as
select * from contacts_20
union all
select * from contacts_21
...
(UNION ALL is enough here, and avoids the duplicate-elimination cost of plain UNION.)
This is the simplest solution, but only works if you can be certain that you're not creating more tables which need to be included in the view.
The alternative is to use dynamic SQL. This would use your query on the schema database to construct a SQL statement, which you then execute. This can get moderately complex, and is generally hard to test and debug, but does allow new tables to be automatically included in the results.
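A minimal sketch of that dynamic SQL approach for SQL Server 2012 (it assumes every contacts_ table has the DATETIME column used in the question; STRING_AGG is avoided because it requires SQL Server 2017+):
DECLARE @sql NVARCHAR(MAX) = N'';

-- Build one UNION ALL branch per contacts_ table currently in the schema
SELECT @sql = @sql
            + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
            + N'SELECT MAX([DATETIME]) AS max_dt FROM gss.dbo.' + QUOTENAME(TABLE_NAME)
FROM GSS.INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE 'contacts[_]%'; -- [_] escapes the LIKE wildcard meaning of _

SET @sql = N'SELECT MAX(max_dt) AS overall_max FROM (' + @sql + N') t;';
EXEC sp_executesql @sql;
Because the statement is rebuilt from INFORMATION_SCHEMA.TABLES on every run, tables created later are picked up automatically.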

Trying to pull from multiple tables and multiple columns within tables

I have tried this query to pull up multiple tables and columns; it runs but comes back blank.
select * from onshore.contracting where code between 18789 and 18798;
select * from onshore.safety_incident where code between 18789 and 18798;
For your immediate problem, the following SQL will work if you really want all the data:
select * from onshore.contracting where code between 18789 and 18798;
select * from onshore.safety_incident where code between 18789 and 18798;
... and so on.
These tables probably have different columns so you need a separate select statement for each one.
If you are going to do more with SQL then it really would be worthwhile to learn it. There is a free resource here: https://www.w3schools.com/sql; my contribution is at http://www.thedatastudio.net, and there are many others. It is a bit dangerous to use SQL without understanding it.