Weird results from: Create table .. as select from - sql

Could it be that the following query gives weird results (without errors):
CREATE TABLE MY_TABLE
AS (
SELECT COL_1, COL2
FROM EXISTING_TABLE_1
UNION
SELECT COL_1, COL2
FROM EXISTING_TABLE_2
WHERE key_id NOT IN (
SELECT key_id
FROM(
SELECT key_id
FROM EXISTING_TABLE_3
UNION
SELECT key_id
FROM EXISTING_TABLE_4
)A
)
) WITH DATA
When I run similar code with real table names and data, the new table ends up with, for example, 250K records. When I run only the SELECT part (everything between the brackets), I get more than 300K records.
Is CREATE TABLE ... AS ( SELECT ... ) WITH DATA known for problems like this?
FYI: I don't get any errors. I only noticed this later, while doing analysis.

Related

BigQuery: Use COUNT as LIMIT

I want to select everything from mytable1 and combine that with just as many rows from mytable2. In my case mytable1 always has fewer rows than mytable2, and I want the final table to be a 50-50 mix of data from each table. The following code expresses what I want logically, but it is not valid syntax:
(SELECT * FROM `mytable1`)
UNION ALL (
SELECT * FROM `mytable2`
LIMIT (SELECT COUNT(*) FROM `mytable1`)
)
It fails with:
Syntax error: Expected "#" or integer literal or keyword CAST but got "(" at [3:1]
I am using standard SQL in BigQuery.
The docs state that the LIMIT clause accepts only literal or parameter values. I think you can number the rows of the second table with ROW_NUMBER() and limit based on that:
SELECT col1, col2, col3
FROM mytable1
UNION ALL
SELECT col1, col2, col3
FROM (
SELECT col1, col2, col3, ROW_NUMBER() OVER () AS rn
FROM mytable2
) AS x
WHERE x.rn <= (SELECT COUNT(*) FROM mytable1)
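The ROW_NUMBER() approach above can be tried locally. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in for BigQuery (an assumption: it needs SQLite 3.25 or later for window functions, and the sample data is made up). Which rows of mytable2 are kept is not specified, because OVER () has no ordering.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable1 (col1 INTEGER);
    CREATE TABLE mytable2 (col1 INTEGER);
    INSERT INTO mytable1 VALUES (1), (2), (3);
    INSERT INTO mytable2 VALUES (10), (20), (30), (40), (50);
""")

# Number the rows of mytable2 and keep only as many as mytable1 has.
rows = conn.execute("""
    SELECT col1 FROM mytable1
    UNION ALL
    SELECT col1
    FROM (
        SELECT col1, ROW_NUMBER() OVER () AS rn
        FROM mytable2
    ) AS x
    WHERE x.rn <= (SELECT COUNT(*) FROM mytable1)
""").fetchall()

# Values below 10 came from mytable1, the rest from mytable2.
from_t1 = [r[0] for r in rows if r[0] < 10]
from_t2 = [r[0] for r in rows if r[0] >= 10]
print(len(from_t1), len(from_t2))  # prints "3 3"
```

If a specific subset of mytable2 is wanted (for example a random or most-recent one), an ORDER BY inside OVER (...) would be the place to express it.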
Each SELECT statement within a UNION must have the same number of columns.
The columns must also have similar data types.
The columns in each SELECT statement must also be in the same order.
If your mytable1 has fewer columns than mytable2, you have to select the same number of columns from both:
select col1,col2,col3,'' as col4 from mytable1 -- pad the missing column with an aliased constant
union all
select col1,col2,col3,col4 from mytable2

Inserting unique rows into a table where not exist

I am using postgres 8.4.
I am merging several tables into one. There are duplicates both within and across tables. The new table will have a unique constraint. I have inserted the first table into the new big table without trouble, but when trying to add the second table I get an error. I have tried:
INSERT INTO big_table(id, col1, col2)
SELECT DISTINCT ON (id)
id,
col1,
col2
FROM table2
WHERE NOT EXISTS(
SELECT id, col1, col2
FROM big_table
WHERE(big_table.id = table2.id))
I get the following error:
invalid reference to FROM-clause entry for table "big_table" LINE
13: ...big_table WHERE(table2.id = big_table.id))
HINT: There is an entry for table "big_tweets", but it cannot be
referenced from this part of the query.
I think it might have something to do with the fact that big_table changes, but I'm not sure how else to exclude rows that already exist in the table.
This does not address your error directly, but you could instead UNION all the tables before creating the big table, which removes the duplicates:
CREATE TABLE big_table AS
SELECT id, col1, col2 FROM Table1
UNION
SELECT id, col1, col2 FROM Table2
....
UNION
SELECT id, col1, col2 FROM TableN
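One caveat worth checking before relying on the UNION: it removes only rows that are identical in every selected column. The sketch below uses in-memory SQLite with made-up data (both are assumptions, not the asker's setup) to show that two rows sharing an id but differing in col2 both survive, which would still violate a unique constraint on id.

```python
import sqlite3

# Hypothetical sample data; SQLite is used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER, col1 TEXT, col2 TEXT);
    CREATE TABLE Table2 (id INTEGER, col1 TEXT, col2 TEXT);
    INSERT INTO Table1 VALUES (1, 'a', 'x'), (2, 'b', 'y');
    INSERT INTO Table2 VALUES (1, 'a', 'x'), (2, 'b', 'DIFFERENT');
    CREATE TABLE big_table AS
        SELECT id, col1, col2 FROM Table1
        UNION
        SELECT id, col1, col2 FROM Table2;
""")

merged = conn.execute(
    "SELECT id, col1, col2 FROM big_table ORDER BY id, col2"
).fetchall()
ids = [r[0] for r in merged]

# id 1 was an exact duplicate and appears once; id 2 appears twice.
print(merged)
```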
You can also use a CTE to solve the self-reference problem:
WITH cte as (
SELECT DISTINCT ON (id)
id,
col1,
col2
FROM table2
WHERE NOT EXISTS(
SELECT id, col1, col2
FROM big_table
WHERE(big_table.id = table2.id))
)
INSERT INTO big_table
SELECT *
FROM cte
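The overall pattern (deduplicate the source, then insert only ids that are not yet in the target) can be sketched as follows. This is an illustration in SQLite, not PostgreSQL 8.4: SQLite has no DISTINCT ON, so ROW_NUMBER() with PARTITION BY is swapped in to pick one row per id, and the sample data is invented.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_table (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (id INTEGER, col1 TEXT, col2 TEXT);
    INSERT INTO big_table VALUES (1, 'a', 'x');
    INSERT INTO table2 VALUES
        (1, 'a', 'x'), (2, 'b', 'y'), (2, 'b', 'z'), (3, 'c', 'w');
""")

# Keep one row per id from table2, then skip ids already in big_table.
conn.execute("""
    INSERT INTO big_table (id, col1, col2)
    SELECT id, col1, col2
    FROM (
        SELECT id, col1, col2,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY col1, col2) AS rn
        FROM table2
    ) AS t
    WHERE t.rn = 1
      AND NOT EXISTS (SELECT 1 FROM big_table AS b WHERE b.id = t.id)
""")

result = conn.execute(
    "SELECT id, col1, col2 FROM big_table ORDER BY id"
).fetchall()
print(result)  # [(1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'w')]
```

Aliasing both sides (t and b) keeps the correlated reference unambiguous, which is the part of the original query that the error message complains about.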

SELECT INTO with HSQLDB

I am trying to create a new table from the result of a select. This works fine with SQL Server:
SELECT * INTO newTable FROM (SELECT col1, col2, col3 FROM oldTable) x;
Now, I want to achieve the exact same thing with HSQLDB (Version 2.2). I have tried several forms like
SELECT * INTO newTable FROM (SELECT col1, col2, col3 FROM oldTable);
SELECT INTO newTable FROM SELECT col1, col2, col3 FROM oldTable;
CREATE TABLE newTable AS SELECT col1, col2, col3 FROM oldTable;
All these variants result in some form of syntax error. How can I create a table from a select with HSQLDB?
The manual has an example for this:
CREATE TABLE t (a, b, c) AS (SELECT * FROM atable) WITH DATA
HSQLDB requires parentheses around the select (unlike all other DBMS) and it also requires the WITH DATA clause.
I found a much easier way to do this:
select * into t_bckp FROM t;
It's interesting.

union all with queries that have a different number of columns

I've run into a case where a sqlite query I'm expecting to return an error is actually succeeding and I was wondering if anyone could point out why this query is valid.
CREATE TABLE test_table(
k INTEGER,
v INTEGER
);
INSERT INTO test_table( k, v ) VALUES( 4, 5 );
SELECT * FROM(
SELECT * FROM(
SELECT k, v FROM test_table WHERE 1 = 0
)
UNION ALL
SELECT * FROM(
SELECT rowid, k, v FROM test_table
)
)
I would think that unioning two selects which have a different number of columns would return an error. If I remove the outermost SELECT * then I receive the error I'm expecting: SELECTs to the left and right of UNION ALL do not have the same number of result columns.
The answer to this seems to be straightforward: Yes, this is a quirk.
I'd like to demonstrate this with a short example. But beforehand, let's consult the documentation:
Two or more simple SELECT statements may be connected together to form
a compound SELECT using the UNION, UNION ALL, INTERSECT or EXCEPT
operator.
In a compound SELECT, all the constituent SELECTs must
return the same number of result columns.
So the documentation says very clearly that the two SELECTs must provide the same number of columns. However, as you said, wrapping the compound in an outer SELECT strangely avoids this 'limitation'.
Example 1
SELECT * FROM(
SELECT k, v FROM test_table
UNION ALL
SELECT k, v,rowid FROM test_table
);
Result:
k|v
4|5
4|5
The third column, rowid, is simply omitted, as pointed out in the comments.
Example 2
We are only switching the order of the two select statements.
SELECT * FROM(
SELECT k, v, rowid FROM test_table
UNION ALL
SELECT k, v FROM test_table
);
Result
k|v|rowid
4|5|1
4|5|
Now SQLite does not omit the column but adds a null value instead.
Conclusion
This brings me to the conclusion that SQLite simply handles UNION ALL differently when it is processed as a subquery.
PS: If you use plain UNION, it fails in every scenario.
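To see how a particular SQLite build behaves, both forms can be run side by side. This is a small sketch using Python's bundled sqlite3 module; try_query is a hypothetical helper introduced here. The direct compound is expected to be rejected, while the result of the wrapped form may depend on the SQLite version, so the code reports it rather than assuming an outcome.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_table (k INTEGER, v INTEGER);
    INSERT INTO test_table (k, v) VALUES (4, 5);
""")

def try_query(sql):
    """Return ('ok', rows) on success or ('error', message) on failure."""
    try:
        return ("ok", conn.execute(sql).fetchall())
    except sqlite3.Error as exc:
        return ("error", str(exc))

# Compound SELECT with mismatched column counts, no outer SELECT.
direct = try_query("""
    SELECT k, v FROM test_table
    UNION ALL
    SELECT k, v, rowid FROM test_table
""")

# The same mismatch hidden behind SELECT * subqueries, as in the question.
wrapped = try_query("""
    SELECT * FROM (
        SELECT * FROM (SELECT k, v FROM test_table)
        UNION ALL
        SELECT * FROM (SELECT k, v, rowid FROM test_table)
    )
""")

print("SQLite version:", sqlite3.sqlite_version)
print("direct :", direct)
print("wrapped:", wrapped)
```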
UNION ALL will return the results with null values in the extra columns.
A basic UNION will fail because UNION without the ALL has to have the same number of columns from both tables.
So:
SELECT column1, column2 FROM table a
UNION ALL
SELECT column1, column2, column3 FROM table b
returns 3 columns with nulls in column 3.
and:
SELECT column1, column2 FROM table a
UNION
SELECT column1, column2, column3 FROM table b
should fail because the number of columns does not match.
In conclusion you could add a blank column to the UNION so that you are selecting 3 columns from each table and it would still work.
EX:
SELECT column1, column2, '' AS column3 FROM table a
UNION
SELECT column1, column2, column3 FROM table b
If your second query has fewer columns, you can do this:
select col1, col2, col3, col4, col5
from table A
union all
select col1, col2, col3, col4, NULL as col5
from table B
Instead of NULL, you can also use a string constant, for example 'KPI' as col5.
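The padding technique can be checked with a short sketch. It uses in-memory SQLite and hypothetical tables table_a and table_b (three and two columns respectively); the missing column is filled with NULL so both sides of the UNION ALL return the same number of columns.

```python
import sqlite3

# Hypothetical tables: table_b lacks col3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE table_b (col1 INTEGER, col2 TEXT);
    INSERT INTO table_a VALUES (1, 'a', 'x');
    INSERT INTO table_b VALUES (2, 'b');
""")

# Pad the shorter SELECT with an aliased NULL so the column counts match.
padded = conn.execute("""
    SELECT col1, col2, col3 FROM table_a
    UNION ALL
    SELECT col1, col2, NULL AS col3 FROM table_b
    ORDER BY col1
""").fetchall()
print(padded)  # [(1, 'a', 'x'), (2, 'b', None)]
```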

Add Identity column to a view in SQL Server 2008

This is my view:
Create View [MyView] as
(
Select col1, col2, col3 From Table1
Union All
Select col1, col2, col3 From Table2
)
I need to add a new column named Id, and it must be unique, so I am thinking of adding it as an identity column. The view returns a large amount of data, so I need an approach that performs well. Since the view combines two SELECT queries with UNION ALL, this might be complicated. What do you suggest?
Use the ROW_NUMBER() function in SQL Server 2008.
Create View [MyView] as
SELECT ROW_NUMBER() OVER( ORDER BY col1 ) AS id, col1, col2, col3
FROM(
Select col1, col2, col3 From Table1
Union All
Select col1, col2, col3 From Table2 ) AS MyResults
GO
The view is just a stored query that does not contain the data itself, so you cannot add a stable ID. If you need an id for other purposes, such as paging, you can do something like this:
create view MyView as
(
select row_number() over ( order by col1) as ID, col1 from (
Select col1 From Table1
Union All
Select col1 From Table2
) a
)
There is no guarantee that the rows returned by a query using ROW_NUMBER() will be ordered exactly the same with each execution unless the following conditions are true:
Values of the partitioned column are unique. [partitions are parent-child, like a boss has 3 employees][ignore]
Values of the ORDER BY columns are unique. [if column 1 is unique, row_number should be stable]
Combinations of values of the partition column and ORDER BY columns are unique. [if you need 10 columns in your order by to get unique... go for it to make row_number stable]
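The uniqueness condition can be illustrated with a small sketch. It uses in-memory SQLite (3.25 or later) rather than SQL Server 2008, and the tables and rows are made up. Here col1 alone is not unique across the two tables, so the ORDER BY uses the combination (col1, col2), which is unique in this sample data and therefore yields the same numbering on every read.

```python
import sqlite3

# SQLite stands in for SQL Server; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (col1 INTEGER, col2 TEXT);
    CREATE TABLE Table2 (col1 INTEGER, col2 TEXT);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO Table2 VALUES (1, 'c'), (3, 'd');

    -- (col1, col2) is unique across both tables, so the numbering is stable.
    CREATE VIEW MyView AS
    SELECT ROW_NUMBER() OVER (ORDER BY col1, col2) AS id, col1, col2
    FROM (
        SELECT col1, col2 FROM Table1
        UNION ALL
        SELECT col1, col2 FROM Table2
    ) AS a;
""")

query = "SELECT id, col1, col2 FROM MyView ORDER BY id"
first = conn.execute(query).fetchall()
second = conn.execute(query).fetchall()
print(first)  # [(1, 1, 'a'), (2, 1, 'c'), (3, 2, 'b'), (4, 3, 'd')]
```

Note that the ids are still recomputed on every read, so inserting a row into either table can shift the ids of the rows that sort after it.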
There is a secondary issue here, because this is a view. ORDER BY does not always work in views (a long-standing SQL Server limitation). Ignoring the row_number() for a second:
create view MyView as
(
select top 10000000 col1 -- or: top 99.9999999 percent
from (
Select col1 From Table1
Union All
Select col1 From Table2
) a order by col1
)
Using "row_number() over ( order by col1) as ID" is very expensive.
This way is much more efficient in cost:
Create View [MyView] as
(
Select ID = isnull(cast(newid() as varchar(40)), '')
, col1
, col2
, col3
From Table1
Union All
Select ID = isnull(cast(newid() as varchar(40)), '')
, col1
, col2
, col3
From Table2
)
Use ROW_NUMBER() with "order by (select null)". This will be less expensive and will still give you the result.
Create View [MyView] as
SELECT ROW_NUMBER() over (order by (select null)) as id, *
FROM(
Select col1, col2, col3 From Table1
Union All
Select col1, col2, col3 From Table2 ) R
GO