Merge SQL scripts for row condition and table merging - sql

I have these following SQL codes to do clean-up tasks:
SELECT the first n-rows from the table that satisfies a condition and put them in a new table. Note that the [source].var='1' varies for different tables.
SELECT TOP n * INTO tablen
FROM source
WHERE [source].var='1';
# concrete example
SELECT TOP n * INTO table1
FROM source
WHERE [source].var1='1';
SELECT TOP n * INTO table2
FROM source
WHERE [source].var2='1';
SELECT TOP n * INTO table3
FROM source
WHERE [source].var3='1';
SELECT TOP n * INTO table4
FROM source
WHERE [source].var4='1';
SELECT TOP n * INTO table5
FROM source
WHERE [source].var5='1';
After making n-tables from the first step, I concatenate them using a query.
# code2
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
UNION ALL
SELECT * FROM table3
UNION ALL
SELECT * FROM table4
UNION ALL
SELECT * FROM table5
Finally I put the results of number two into a new table so I can use it.
SELECT *
INTO dest
FROM code2
Does anyone know how to put these set of tedious tasks into one SQL query so that I don't have to repeat 15 times?

Ill try to answer this, but this is my first answer so go easy on me. First I'm not sure if the source table is the same for each "TOP" row query. If it is then you dont need tables, 1-5. Unless of course you want the 5 temp tables for some reason...and I assume you know what you're doing with UNION ALL.
Just query the original table inline:
select a.* into dest
from (select TOP n *
from source
where source.var1='1'
UNION ALL
select TOP n *
from source
where source.var2='1'
UNION ALL
select TOP n *
from source
where source.var3='1'
UNION ALL
select TOP n *
from source
where source.var4='1'
UNION ALL
select TOP n *
from source
where source.var5='1'
) a

Related

How to Union Multiple Tables with same column names on Google Big Query?

I am trying to create a union with 3 tables that have the same column names. However, the queries tested seems not to be working.
The first query that I have used is the following:
Select * FROM table1
UNION ALL
SELECT * FROM table2
SELECT * FROM table3
UNION ALL
The second query used is the following:
SELECT *
FROM
(select * from table_1),
(select * from table_2),
(select * from table_3)
Both of them are not working for me. Please someone can help me with this?
This should work if columns are identical:
SELECT * FROM table1 UNION ALL
SELECT * FROM table2 UNION ALL
SELECT * FROM table3

Is there a SQL function to expand table?

I vaguely remember there being a function that does this, but I think I may be going crazy.
Say I have a datatable, call it table1. It has three columns: column1, column2, column3. The query
SELECT * FROM table1
returns all rows/columns from table1. Isn't there some type of EXPAND function that allows me to duplicate that result? For example, if I want to duplicate everything from the SELECT * FROM table1 query three times, I can do something like EXPAND(3) ?
In BigQuery, I would recommend a CROSS JOIN:
SELECT t1.*
FROM table1 CROSS JOIN
(SELECT 1 as n UNION ALL SELECT 2 UNION ALL SELECT 3) n;
This can get cumbersome for lots of copies, but you can simplify this by generating the numbers:
SELECT t1.*
FROM table1 CROSS JOIN
UNNEST(GENERATE_ARRAY(1, 3)) n
This creates an array with three elements and unnests it into rows.
In both these cases, you can include n in the SELECT to distinguish the copies.
Below is for BigQuery Standard SQL
I think below is close enough to what "got you crazy" o)
#standardSQL
SELECT copy.*
FROM `project.dataset.tabel1` t, UNNEST(FN.EXPAND(t, 3)) copy
To be able to do so, you can leverage recently announced support for persistent standard SQL UDFs, namely - you need to create FN.EXPAND() function as in below example (note: you need to have FN dataset in your project - or use existing dataset in which case you should use YOUR_DATASET.EXPAND() reference
#standardSQL
CREATE FUNCTION FN.EXPAND(s ANY TYPE, dups INT64) AS (
ARRAY (
SELECT s FROM UNNEST(GENERATE_ARRAY(1, dups))
)
);
Finally, if you don't want to create persistent UDF - you can use temp UDF as in below example
#standardSQL
CREATE TEMP FUNCTION EXPAND(s ANY TYPE, dups INT64) AS ( ARRAY(
SELECT s FROM UNNEST(GENERATE_ARRAY(1, dups))
));
SELECT copy.*
FROM `project.dataset.tabel1` t, UNNEST(EXPAND(t, 3)) copy
if you want a cartesian product (all the combination on a row ) you could use
SELECT a.*, b.*, c.*
FROM table1 a
CROSS JOIN table1 b
CROSS JOIN table1 c
if you want the same rows repeated you can use UNION ALL
SELECT *
FROM table1
UNION ALL
SELECT *
FROM table1
UNION ALL
SELECT *
FROM table1
Use union all
Select * from table1
Union all
Select * from table1
Union all
Select * from table1
Union all
Select * from table1
For reuse purposes can embed this code in a procedure like
Create Procedure
expandTable(tablename
varchar2(50))
As
Select * from table1
Union all
Select * from table1
Union all
Select * from table1
Union all
Select * from table1
End
/

Select limited rows from multiple tables

So this is fairly common knowledge to select rows from multiple tables and stack the results on top of each other:
SELECT * FROM table1
UNION
SELECT * FROM table2
UNION
...
However, if I want only a limited number of rows from each table, then how should I write it?
SELECT * FROM table1 LIMIT 2
UNION
SELECT * FROM table2 LIMIT 2
UNION
...
Clearly doesn't work.
Note that in my case, I have 51 tables, all with the same exact columns.
could be work this way
( SELECT * FROM table1 LIMIT 2 )
UNION
( SELECT * FROM table2 LIMIT 2 )
UNION
...

SELECT TOP ... FROM UNION

What is the best way to SELECT TOP N records from UNION of 2 queries?
I can't do
SELECT TOP N ... FROM
(SELECT ... FROM Table1
UNION
SELECT ... FROM Table2)
because both queries return huge results I need every bit of optimization possible and would like to avoid returning everything. For the same reason I cannot insert results into #TEMP table first either.
I can't use SET ROWCOUNT N either because I may need to group results and this command will limit number of grouped rows, and not underlying row selections.
Any other ideas? Thanks!
Use the Top keyword for inner queries also:
SELECT TOP N ... FROM
(SELECT TOP N... FROM Table1
UNION
SELECT TOP N... FROM Table2) as result
You could try the following. It uses a CTE which would give you the results of both queries first and then you would select your TOP N from the CTE.
WITH table_cte (
(
[col1],
[col2],
...
)
AS
(
SELECT *
FROM table1
UNION ALL
SELECT *
FROM table2
)
SELECT TOP 1000 * FROM table_cte

UNION ALL query: "Too Many Fields Defined"

I'm trying to get a UNION of 3 tables, each of which have 97 fields. I've tried the following:
select * from table1
union all
select * from table2
union all
select * from table3
This gives me an error message:
Too many fields defined.
I also tried explicitly selecting all the field names from the first table (ellipses added for brevity):
select [field1],[field2]...[field97] from table1
union all
select * from table2
union all
select * from table3
It works fine when I only UNION two tables like this:
select * from table1
union all
select * from table2
I shouldn't end up with more than 97 fields as a result of this query; the two-table UNION only has 97. So why am I getting Too many fields with 3 tables?
EDIT: As RichardTheKiwi notes below, Access is summing up the field count of each SELECT query in the UNION chain, which means that my 3 tables exceed the 255 field maximum. So instead, I need to write the query like this:
select * from table1
union all
select * from
(select * from table2
union all
select * from table3)
which works fine.
It appears that the number of fields being tracked (limit 255) is counted against ALL parts of the UNION ALL. So 3 x 97 = 291, which is in excess. You could probably create a query as a UNION all of 2 parts, then another query with that and the 3rd part.
I had two tables with 173 fields each (2 x 173 > 255!). So I had to resort to splitting the tables in half (keeping the primary key in both), before using the UNION statement and reassembling the resulting output tables using a JOIN.
select u1.*, u2.*
from (
select [field1_PKID],[field2],...,[field110]
from table1
union all
select [field1_PKID],[field2],...,[field110]
from table2
) as u1
inner join (
select [field1_PKID],[field111],...,[field173]
from table1
union all
select [field1_PKID],[field111],...,[field173]
from table2
) as u2
on [u1].[field1_PKID] = [u2].[field2_PKID]
Perhaps if your 3 tables have duplicate records you can go with UNION instead of UNION ALL which may reduce the number of fields to be tracked. Because UNION will always serve the business purpose which removes duplicates. In that case your query will be like following,
select * from table1
union
select * from table2
union
select * from table3;