BigQuery: Use COUNT as LIMIT - sql

I want to select everything from mytable1 and combine it with just as many rows from mytable2. In my case mytable1 always has fewer rows than mytable2, and I want the final table to be a 50-50 mix of data from each table. The following code seems to express what I want logically, but it isn't valid syntax:
(SELECT * FROM `mytable1`)
UNION ALL (
SELECT * FROM `mytable2`
LIMIT (SELECT COUNT(*) FROM `mytable1`)
)
It fails with:
Syntax error: Expected "#" or integer literal or keyword CAST but got "(" at [3:1]
I'm using standard SQL in BigQuery.

The docs state that the LIMIT clause accepts only literal or parameter values. Instead, you can number the rows from the second table with ROW_NUMBER() and filter on that number:
SELECT col1, col2, col3
FROM mytable1
UNION ALL
SELECT col1, col2, col3
FROM (
SELECT col1, col2, col3, ROW_NUMBER() OVER () AS rn
FROM mytable2
) AS x
WHERE x.rn <= (SELECT COUNT(*) FROM mytable1)
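To see the row-numbering trick work outside BigQuery, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in (window functions need SQLite 3.25 or later), and the helper name balanced_union and the table contents are made up for illustration. ROW_NUMBER() OVER () with an empty window numbers rows in an arbitrary order, so which rows of mytable2 are kept is not guaranteed; put an ORDER BY inside OVER () if you need a specific or reproducible selection.

```python
import sqlite3

def balanced_union(rows1, rows2):
    """Return all rows of mytable1 plus the same number of rows from mytable2."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("mytable1", rows1), ("mytable2", rows2)):
        conn.execute(f"CREATE TABLE {name} (col1 TEXT, col2 INTEGER, col3 INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    query = """
        SELECT col1, col2, col3
        FROM mytable1
        UNION ALL
        SELECT col1, col2, col3
        FROM (
            -- number every row of mytable2 so we can filter instead of using LIMIT
            SELECT col1, col2, col3, ROW_NUMBER() OVER () AS rn
            FROM mytable2
        ) AS x
        WHERE x.rn <= (SELECT COUNT(*) FROM mytable1)
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

t1 = [("a", 1, 1), ("b", 2, 2)]
t2 = [("x", i, i) for i in range(10)]
mixed = balanced_union(t1, t2)
print(len(mixed))  # 4: both rows of mytable1 plus 2 rows of mytable2
```

Because the filter is an ordinary WHERE on a computed column, the count can come from a subquery, which is exactly what LIMIT does not allow.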

Each SELECT statement within a UNION must have the same number of columns.
The columns must also have similar (compatible) data types.
The columns in each SELECT statement must also be in the same order.
If mytable1 has fewer columns than mytable2, you have to select the same number of columns from each, padding the missing ones with a placeholder:
select col1,col2,col3,NULL as col4 from mytable1 -- placeholder for the column mytable1 lacks
union all
select col1,col2,col3,col4 from mytable2
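A small sketch of the column-count rule, using SQLite via Python's sqlite3 as a stand-in for BigQuery; the schemas and rows are hypothetical. It shows that an unpadded UNION ALL is rejected and that a NULL placeholder column fixes it. NULL is used rather than '' so the placeholder is compatible with any type of col4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas: mytable1 lacks col4, mytable2 has it.
conn.execute("CREATE TABLE mytable1 (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.execute("CREATE TABLE mytable2 (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 TEXT)")
conn.execute("INSERT INTO mytable1 VALUES (1, 2, 3)")
conn.execute("INSERT INTO mytable2 VALUES (4, 5, 6, 'x')")

# Without padding, the two SELECTs have different column counts and are rejected.
try:
    conn.execute("SELECT col1, col2, col3 FROM mytable1 "
                 "UNION ALL SELECT col1, col2, col3, col4 FROM mytable2")
    mismatch_rejected = False
except sqlite3.Error:
    mismatch_rejected = True

# With a NULL placeholder the column lists line up.
rows = conn.execute("""
    SELECT col1, col2, col3, NULL AS col4 FROM mytable1  -- placeholder column
    UNION ALL
    SELECT col1, col2, col3, col4 FROM mytable2
""").fetchall()
print(sorted(rows, key=lambda r: r[0]))  # [(1, 2, 3, None), (4, 5, 6, 'x')]
```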

Related

Distinct Values but get all the columns

In SSMS we have a view that returns distinct rows using the following statement:
Create View [VIEWNAME] As
Select distinct [Col1],[Col2], Max(TimeDate) as TimeDate
from [Table]
Group By [Col1],[Col2]
I want the view to return the column [Col3] from the table as well.
I tried the following, but unfortunately it didn't work:
Select Distinct on [Col1],[Col2] * from [Table]
Error: Incorrect syntax near 'on'.
I also tried:
select [Col1],[Col2],[Col3],Max[TimeDate] from [Table]
Group by [Col1],[Col2],[Col3],[TimeDate]
Error: Column TABLE.Col3 is invalid in the select list because it is not contained in either an aggregated function or GROUP BY clause.
Below is a sample of the original table.
[Image: original table sample, not available]
Desired result:
[Image: desired table, not available]
Thanks.
In PostgreSQL you can use DISTINCT ON:
select distinct on (col1, col2) *
from the_table
order by col1, col2, timedate desc;
Since you are using SSMS, I guess your RDBMS is SQL Server. The query below is the SQL Server equivalent of PostgreSQL's DISTINCT ON ():
select col1, col2, col3, timedate
from
(
select *, row_number() over (partition by col1,col2 order by timedate desc) as rn
from the_table
) as t
where rn = 1;
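The question's sample images are not available, so here is a minimal sketch with made-up rows that runs the same ROW_NUMBER() query against SQLite (3.25 or later) via Python's sqlite3. It keeps the latest row per (col1, col2) together with its col3, which GROUP BY alone cannot do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (col1 TEXT, col2 TEXT, col3 INTEGER, timedate TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        ("A", "X", 10, "2024-01-01"),
        ("A", "X", 20, "2024-03-01"),  # latest row for (A, X)
        ("B", "Y", 30, "2024-02-01"),  # only row for (B, Y)
    ],
)
latest = conn.execute("""
    SELECT col1, col2, col3, timedate
    FROM (
        -- rank rows inside each (col1, col2) group, newest first
        SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY timedate DESC) AS rn
        FROM the_table
    ) AS t
    WHERE rn = 1
    ORDER BY col1, col2
""").fetchall()
print(latest)  # [('A', 'X', 20, '2024-03-01'), ('B', 'Y', 30, '2024-02-01')]
```

The same SELECT (without the final ORDER BY) can be used as the body of the SQL Server view.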

How to express "either the single resulting record or NULL", without an inner-query LIMIT?

Consider the following query:
SELECT (SELECT MIN(col1) FROM table1) = 7;
Assuming col1 is non-NULLable, this will yield either true or false, or NULL when table1 is empty.
But now suppose I have:
SELECT (
SELECT
FIRST_VALUE (col2) OVER (
ORDER BY col1
) AS col2_for_first_col1
FROM table1
) = 7;
(and assume col2 is also non-NULLable for simplicity.)
If there is a unique col2 value for the lowest col1 value, or the table is empty, then this works just like before. But if there are multiple col2 values for the lowest col1, I'm going to get a query runtime error.
My question: What is a short, elegant way to get NULL from this last query also in the case of multiple inner-query results? I could of course duplicate it and check the count, but I would rather avoid that.
Important caveat: I'm using MonetDB, and it doesn't seem to support ORDER BY ... LIMIT 1 on inner queries.
Without the MonetDB limitation, you would want something like:
SELECT (SELECT col2
FROM table1
ORDER BY col1
LIMIT 1
) = 7;
With the limitation, you can use a window function differently:
SELECT (SELECT col2
FROM (SELECT col2, ROW_NUMBER() OVER (ORDER BY col1) as seqnum
FROM table1
) t
WHERE seqnum = 1
) = 7;
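The ROW_NUMBER() query always returns at most one row, but when several rows tie on the lowest col1 it silently picks one of them instead of yielding NULL. If NULL is what you want in that case, one possible extension adds MIN/MAX window aggregates over the rows sharing the same col1 and returns col2 only when they agree. This is a sketch, untested on MonetDB; it runs against SQLite (3.25 or later) through Python's sqlite3, and the helper name first_col2_equals and the sample rows are illustrative.

```python
import sqlite3

def first_col2_equals(rows, target):
    """Evaluate the question's comparison, yielding NULL (None) on an ambiguous tie."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 INTEGER NOT NULL, col2 INTEGER NOT NULL)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    query = """
        SELECT (SELECT CASE WHEN lo = hi THEN col2 END  -- NULL unless the tied rows agree
                FROM (SELECT col2,
                             ROW_NUMBER() OVER (ORDER BY col1) AS seqnum,
                             MIN(col2) OVER (PARTITION BY col1) AS lo,
                             MAX(col2) OVER (PARTITION BY col1) AS hi
                      FROM table1) AS t
                WHERE seqnum = 1) = ?
    """
    (result,) = conn.execute(query, (target,)).fetchone()
    conn.close()
    return result

print(first_col2_equals([(1, 7), (2, 3)], 7))          # 1 (true)
print(first_col2_equals([(1, 7), (1, 8), (2, 3)], 7))  # None: tie on the lowest col1
print(first_col2_equals([], 7))                        # None: empty table
```

SQLite reports booleans as 1 and 0, so "true" shows up as 1 here.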

Generate Unique ID On a Select in DB2

I have a select that looks like this:
SELECT * FROM (SELECT DISTINCT COL1, COL2, COL3
FROM view a WHERE conditions ....
) QUERY
WHERE CONDITIONS... LIMIT 20 OFFSET 0
I'm executing this from Java, and I need the query to return a unique ID for each row.
So I tried:
SELECT TRIM(CHAR(HEX(GENERATE_UNIQUE()))) AS GUID, QUERY.* FROM (SELECT DISTINCT COL1, COL2, COL3
FROM view a WHERE conditions ....
) QUERY
WHERE CONDITIONS... LIMIT 20 OFFSET 0
This returns an error saying I can't use this function in that position.
If I try:
SELECT * FROM (SELECT DISTINCT TRIM(CHAR(HEX(GENERATE_UNIQUE()))) AS GUID, COL1, COL2, COL3
FROM view a WHERE conditions ....
) QUERY
WHERE CONDITIONS... LIMIT 20 OFFSET 0
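I get duplicated rows, as if the query ran without DISTINCT: GENERATE_UNIQUE() produces a different value for every row, so no two rows are identical any more and DISTINCT removes nothing.
Does anyone know a way to do it?
I don't know the DB2 version (I have tried all the solutions from "How to check db2 version").
If a numeric ID is acceptable, how about just using ROW_NUMBER():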
SELECT CAST(ROW_NUMBER() OVER (ORDER BY COL1, COL2, COL3) as VARCHAR(255)) as unique_id,
QUERY.*
FROM (SELECT DISTINCT COL1, COL2, COL3
FROM view a
WHERE conditions ....
) QUERY
WHERE CONDITIONS...
LIMIT 20 OFFSET 0
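A minimal sketch of why ROW_NUMBER() avoids the duplicate problem: it is computed after the DISTINCT subquery has already collapsed duplicates. It runs against SQLite (3.25 or later) via Python's sqlite3 rather than DB2; the table v stands in for the view, and the alias q replaces QUERY. An ORDER BY is added before LIMIT so the page is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v (col1 TEXT, col2 TEXT, col3 TEXT)")  # stand-in for the view
conn.executemany("INSERT INTO v VALUES (?, ?, ?)",
                 [("a", "b", "c"), ("a", "b", "c"), ("d", "e", "f")])

rows = conn.execute("""
    SELECT CAST(ROW_NUMBER() OVER (ORDER BY col1, col2, col3) AS VARCHAR(255)) AS unique_id,
           q.*
    FROM (SELECT DISTINCT col1, col2, col3 FROM v) AS q  -- duplicates removed first
    ORDER BY q.col1, q.col2, q.col3
    LIMIT 20 OFFSET 0
""").fetchall()
print(rows)  # [('1', 'a', 'b', 'c'), ('2', 'd', 'e', 'f')]
```

The IDs are unique only within one execution of the query; they are not stable identifiers across runs.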

SQL: How to combine count results from multiple tables into multiple columns

I have two tables with the same structure, but all I really want out of them is a distinct count of a single column from each, returned as one multi-column result set.
I keep getting syntax errors, and so far I haven't managed to write something that both makes it through the parser and delivers the data I want. I'm trying to figure out if this is a SQL problem (possible, given that I'm using a website's implementation rather than native MySQL/SQL Server/Oracle) or a me problem (much more likely).
So what I want is for two unrelated tables (with no primary keys) to return a COUNT(DISTINCT column) each in a single result. I have tried a couple of different approaches:
select 1,2 FROM
(SELECT COUNT(DISTINCT col1) as 1 from table1),
(SELECT COUNT(DISTINCT col2) as 2 from table2)
and
SELECT *
FROM (
SELECT COUNT(DISTINCT col1) AS 1
FROM table1
)
CROSS JOIN (
SELECT COUNT(DISTINCT col2) AS 2
FROM table2
)
I have also tried four or five variations using UNION / UNION ALL, to no avail. I'm curious what someone more schooled in SQL might suggest. Thanks.
Depending on which RDBMS you are using, some syntax details may differ. In ANSI SQL, your query would be:
select col1, col2 FROM
(SELECT COUNT(DISTINCT col1) as col1 from table1) as tab1,
(SELECT COUNT(DISTINCT col2) as col2 from table2) as tab2
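You have to give every subquery in the FROM clause an alias (most databases require it). Also, name your columns with words rather than numbers; it is easier to understand. An unquoted number is not a valid alias in ANSI SQL, because regular identifiers must start with a letter.
Alternatively, you can skip the derived tables (and their aliases) and use scalar subqueries in the SELECT list: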
-- For MySQL, PostgreSQL, SQL Server
select (SELECT COUNT(DISTINCT col1)
from table1) as col1,
(SELECT COUNT(DISTINCT col2)
from table2) as col2
-- For Oracle (select from the dummy table dual)
select (SELECT COUNT(DISTINCT col1)
from table1) as col1,
(SELECT COUNT(DISTINCT col2)
from table2) as col2
from dual
-- For DB2 (select from the dummy table sysibm.sysdummy1)
select (SELECT COUNT(DISTINCT col1)
from table1) as col1,
(SELECT COUNT(DISTINCT col2)
from table2) as col2
from sysibm.sysdummy1
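Side note: you can use a number as an alias if you surround it with double quotes. This is ANSI SQL and works in most databases; in MySQL it requires the ANSI_QUOTES SQL mode, otherwise "1" in the outer SELECT is read as a string literal: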
select "1", "2" FROM
(SELECT COUNT(DISTINCT col1) as "1" from table1) a, --don't forget the table alias
(SELECT COUNT(DISTINCT col2) as "2" from table2) b
MySQL also allows backticks:
select `1`, `2` FROM
(SELECT COUNT(DISTINCT col1) as `1` from table1) a,
(SELECT COUNT(DISTINCT col2) as `2` from table2) b
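The alternatives above can be compared side by side in a small sketch that runs against SQLite via Python's sqlite3; the sample rows are made up. SQLite, like MySQL, accepts a SELECT without FROM, and it resolves double-quoted names such as "1" to columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER)")
conn.execute("CREATE TABLE table2 (col2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (1,), (2,)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("a",), ("b",), ("c",), ("c",)])

# Derived tables in FROM, each with its own alias (an implicit cross join of two 1-row results).
from_style = conn.execute("""
    SELECT tab1.cnt1, tab2.cnt2
    FROM (SELECT COUNT(DISTINCT col1) AS cnt1 FROM table1) AS tab1,
         (SELECT COUNT(DISTINCT col2) AS cnt2 FROM table2) AS tab2
""").fetchone()

# Scalar subqueries in the SELECT list; no FROM clause needed here.
scalar_style = conn.execute("""
    SELECT (SELECT COUNT(DISTINCT col1) FROM table1) AS cnt1,
           (SELECT COUNT(DISTINCT col2) FROM table2) AS cnt2
""").fetchone()

# Numeric aliases work when double-quoted.
quoted_style = conn.execute("""
    SELECT "1", "2"
    FROM (SELECT COUNT(DISTINCT col1) AS "1" FROM table1) a,
         (SELECT COUNT(DISTINCT col2) AS "2" FROM table2) b
""").fetchone()

print(from_style, scalar_style, quoted_style)  # (2, 3) (2, 3) (2, 3)
```

Since each subquery returns exactly one row, the cross join cannot multiply rows, so all three forms give the same single-row result.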

Multiple rows match, but I only want one?

Sometimes I want to perform a join where I take the largest value of one column. To do this I have to use MAX() and GROUP BY, which prevents me from retrieving the other columns from the row that had the maximum (because they are not contained in the GROUP BY or in an aggregate function).
To fix this, I join the max value back onto the original data source to get the other columns. However, my problem is that this sometimes returns more than one row.
So far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 AND tab1.Col2 = tab2.Col2
If the above query returns three rows (all matching the largest Col2 value), I have a bit of a headache.
If there were an extra column, Col3, and among the rows returned by the above query I only wanted the one with, say, the minimum Col3 value, how would I do this?
If you are using SQL Server 2005 or later, you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
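As I understand it, it could be something like this:
SELECT * FROM
(SELECT Col1, Max(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2, Col3 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 AND tab1.Col2 = tab2.Col2
AND tab2.Col3 = (SELECT MIN(t.Col3) FROM Table t WHERE t.Col1 = tab2.Col1 AND t.Col2 = tab2.Col2)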
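A minimal sketch of the join-back approach with the tie broken on Col3: the join matches on both Col1 and Col2, and a correlated subquery takes the minimum Col3 per (Col1, Col2) group rather than over the whole table. It runs against SQLite via Python's sqlite3; the table name tbl (used because Table is a reserved word) and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Col1 TEXT, Col2 INTEGER, Col3 INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    ("A", 5, 9), ("A", 5, 2), ("A", 5, 7),  # three rows tie on the max Col2 = 5
    ("A", 1, 0),
    ("B", 8, 4),
])
rows = conn.execute("""
    SELECT tab2.Col1, tab2.Col2, tab2.Col3
    FROM (SELECT Col1, MAX(Col2) AS Col2 FROM tbl GROUP BY Col1) AS tab1
    JOIN (SELECT Col1, Col2, Col3 FROM tbl) AS tab2
      ON tab1.Col1 = tab2.Col1
     AND tab1.Col2 = tab2.Col2
     -- keep only the smallest Col3 among the rows tied on the max Col2
     AND tab2.Col3 = (SELECT MIN(t3.Col3) FROM tbl AS t3
                      WHERE t3.Col1 = tab2.Col1 AND t3.Col2 = tab2.Col2)
    ORDER BY tab2.Col1
""").fetchall()
print(rows)  # [('A', 5, 2), ('B', 8, 4)]
```

If two rows tie on both Col2 and Col3, this still returns both of them; the ROW_NUMBER() approach returns exactly one row per Col1 regardless.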
Assuming you are using SQL Server 2005 or later, you can make use of window functions here. I have chosen ROW_NUMBER(), but it is not the only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to the GROUP BY in your subquery, and the ORDER BY determines the order in which the rows are numbered. Here, Col2 DESC starts with the highest value of Col2 (equivalent to your MAX). To also break ties on the smallest Col3, as the question asks, add it as a second sort key: ORDER BY Col2 DESC, Col3 ASC.
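The Col3 tie-break can be checked with a small sketch that runs the same CTE pattern against SQLite (3.25 or later) via Python's sqlite3. The table name tbl, the CTE name ranked, and the rows are made up; SQLite identifiers are case-insensitive, so the CTE is deliberately not named T next to a table named t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Col1 TEXT, Col2 INTEGER, Col3 INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    ("A", 5, 9), ("A", 5, 2), ("A", 5, 7), ("A", 1, 0),
    ("B", 8, 4),
])
rows = conn.execute("""
    WITH ranked AS (
        SELECT *,
               -- highest Col2 first; among ties, smallest Col3 first
               ROW_NUMBER() OVER (PARTITION BY Col1
                                  ORDER BY Col2 DESC, Col3 ASC) AS RowNumber
        FROM tbl
    )
    SELECT Col1, Col2, Col3 FROM ranked WHERE RowNumber = 1 ORDER BY Col1
""").fetchall()
print(rows)  # [('A', 5, 2), ('B', 8, 4)]
```

ROW_NUMBER() never assigns the same number twice within a partition, so exactly one row per Col1 comes back; if rows tie on every sort key, which one wins is arbitrary.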