BigQuery Subtract Counts of Two Tables? - google-bigquery

In MySQL I can do SELECT (SELECT COUNT(*) FROM table1) - (SELECT COUNT(*) FROM table2) to get the difference in counts between two tables. When I try this in BigQuery, I get: Subselect not allowed in SELECT clause. How do I run a query like this in BigQuery?

2019 update:
The original question syntax is now supported with #standardSQL
SELECT (SELECT COUNT(*) c FROM `publicdata.samples.natality`)
- (SELECT COUNT(*) c FROM `publicdata.samples.shakespeare`)
Original answer (legacy SQL): subselects are not supported inside the SELECT clause in legacy SQL, so I would use a CROSS JOIN for this specific query:
SELECT a.c - b.c
FROM
(SELECT COUNT(*) c FROM [publicdata:samples.natality]) a
CROSS JOIN
(SELECT COUNT(*) c FROM [publicdata:samples.shakespeare]) b
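Both forms above can be checked against each other on a small local database. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for BigQuery, and the tables `t1` and `t2` and the helper `count_difference` are hypothetical names made up for the illustration.

```python
import sqlite3

def count_difference(conn, table_a, table_b):
    """Return COUNT(*) of table_a minus COUNT(*) of table_b, computed two ways."""
    # Table names are interpolated directly, so only pass trusted identifiers.
    scalar_sql = (
        f"SELECT (SELECT COUNT(*) FROM {table_a}) "
        f"- (SELECT COUNT(*) FROM {table_b})"
    )
    cross_sql = (
        f"SELECT a.c - b.c "
        f"FROM (SELECT COUNT(*) AS c FROM {table_a}) AS a "
        f"CROSS JOIN (SELECT COUNT(*) AS c FROM {table_b}) AS b"
    )
    scalar = conn.execute(scalar_sql).fetchone()[0]
    cross = conn.execute(cross_sql).fetchone()[0]
    return scalar, cross

# Hypothetical sample data: 5 rows in t1, 2 rows in t2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (x INTEGER)")
conn.execute("CREATE TABLE t2 (x INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(5)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(i,) for i in range(2)])

print(count_difference(conn, "t1", "t2"))
```

Each derived table in the CROSS JOIN form has exactly one row, so the join produces a single row and the subtraction gives the same number as the scalar-subquery form.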

Related

Written a subquery that can return more than one field without using the Exists

The query below is supposed to pull records for fields with the max date.
I am getting an error
You have written a subquery that can return more than one field without using EXISTS reserved word in the Main query's FROM clause. Revise the SELECT statement of the subquery to request only one column.
Code:
SELECT *
FROM TableName
WHERE (((([Project_Name], [Date])) IN (SELECT Project_Name, MAX(Date)
FROM TableName
GROUP BY Project)));
You're probably thinking of a nested subquery used as a table, like the one below:
select a.*, b.col1, b.col2
from FirstTable A
join (Select Id, firstcolumn as col1, secondcolumn as col2
from SecondTable) B on b.ID = a.ID
It works pretty much like a regular join, except that you are joining to a subquery. Hope that helps:
SELECT A.*
FROM TableName A
INNER JOIN (select Project_Name, max(Date) MaxDate
from TableName
group by Project_Name) B
ON A.[Project_Name] = B.[Project_Name]
AND A.[Date] = B.MaxDate
A version using EXISTS() looks like this:
SELECT *
FROM TableName AS A
WHERE EXISTS(
SELECT * FROM (
SELECT B.Project_Name, MAX( B.Date ) AS MaxDate
FROM TableName AS B
GROUP BY B.Project_Name ) AS C
WHERE C.Project_Name = A.Project_Name AND C.MaxDate = A.Date
);
Although I have the feeling this will have poorer performance than a JOIN because the GROUP BY statement might have to be executed for each record and each call to the EXISTS() function...
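To compare the JOIN and EXISTS versions, here is a small sketch that runs both on the same data and checks that they return the same rows. It uses SQLite through Python's sqlite3 module rather than Access, so the bracket quoting is dropped; the sample rows and the `Note` column are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (Project_Name TEXT, Date TEXT, Note TEXT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?)",
    [
        ("alpha", "2020-01-01", "old"),
        ("alpha", "2020-03-01", "new"),
        ("beta", "2020-02-15", "only"),
    ],
)

# Join each row to the per-project maximum date.
JOIN_SQL = """
SELECT A.Project_Name, A.Date, A.Note
FROM TableName AS A
INNER JOIN (
    SELECT Project_Name, MAX(Date) AS MaxDate
    FROM TableName
    GROUP BY Project_Name
) AS B
  ON A.Project_Name = B.Project_Name AND A.Date = B.MaxDate
ORDER BY A.Project_Name
"""

# Same filter expressed with EXISTS over the grouped subquery.
EXISTS_SQL = """
SELECT A.Project_Name, A.Date, A.Note
FROM TableName AS A
WHERE EXISTS (
    SELECT 1
    FROM (
        SELECT B.Project_Name, MAX(B.Date) AS MaxDate
        FROM TableName AS B
        GROUP BY B.Project_Name
    ) AS C
    WHERE C.Project_Name = A.Project_Name AND C.MaxDate = A.Date
)
ORDER BY A.Project_Name
"""

join_rows = conn.execute(JOIN_SQL).fetchall()
exists_rows = conn.execute(EXISTS_SQL).fetchall()
print(join_rows)
print(join_rows == exists_rows)
```

This only shows that the two queries agree on the result; it says nothing about which one performs better in Access.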

Get the distinct values for a column in four tables in SQL Server 2008, but it is very slow

I need to get the distinct values for a column in each of four tables in SQL Server 2008.
Each table has about 8 columns and 80,000 rows. The column values are int, varchar, or double.
The column I am querying is an int.
SELECT COUNT(distinct a.id) as a_num_distinc_id,
COUNT(distinct b.id) as b_num_distinc_id,
COUNT(distinct c.id) as c_num_distinc_id,
COUNT(distinct d.id) as d_num_distinc_id
FROM table1 as a, table2 as b,
table3 as c, table4 as d
If I get the distinct values for the column in each table one by one, it runs fast. But if I run them together, it runs very slowly, taking more than 20 minutes.
Why? Thanks!
UPDATE -------------------------------------------------
I have solved the above problem with your answers.
Now I have a new one, which is related to the original question but different.
I have a very large table with 1 billion rows and 12 columns, which are int, double, and varchar.
I need to know the distinct values for each column.
I use
SELECT COUNT(distinct a.id) as num_dist_id
FROM my_large_table as a
It is very slow.
Is there a better way to do that?
You are doing a humongous cross join on all the tables. Simple rule: Never use a comma in the from clause.
You can get what you want with nested subqueries in the select clause:
SELECT (select COUNT(distinct a.id) from table1 a) as a_num_distinc_id,
(select COUNT(distinct b.id) from table2 b) as b_num_distinc_id,
(select COUNT(distinct c.id) from table3 c) as c_num_distinc_id,
(select COUNT(distinct d.id) from table4 d) as d_num_distinc_id;
Because when you run them together, you're creating a Cartesian product of all the values in all the tables.
Try
select
(select COUNT(distinct a.id) From table1 a) as a_num_distinc_id,
(select COUNT(distinct b.id) From table2 b) as b_num_distinc_id,
(select COUNT(distinct c.id) From table3 c) as c_num_distinc_id,
(select COUNT(distinct d.id) From table4 d) as d_num_distinc_id
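The size of the Cartesian product is easy to see on toy data. The sketch below uses SQLite through Python's sqlite3 module instead of SQL Server, with four tiny tables whose row counts (3, 4, 5, 6) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical row counts; the question's tables have about 80,000 rows each.
sizes = {"table1": 3, "table2": 4, "table3": 5, "table4": 6}
for name, n in sizes.items():
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in range(n)])

# Comma-separated FROM clause: every row is paired with every other row.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM table1 a, table2 b, table3 c, table4 d"
).fetchone()[0]

# Scalar subqueries: each table is scanned once, independently.
distinct_counts = conn.execute(
    """
    SELECT (SELECT COUNT(DISTINCT a.id) FROM table1 a),
           (SELECT COUNT(DISTINCT b.id) FROM table2 b),
           (SELECT COUNT(DISTINCT c.id) FROM table3 c),
           (SELECT COUNT(DISTINCT d.id) FROM table4 d)
    """
).fetchone()

print(joined_rows)      # product of the four row counts
print(distinct_counts)  # one distinct count per table
```

With 80,000 rows per table the same product is 80,000^4, roughly 4 x 10^19 intermediate rows, which is why the combined query takes so long.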

SQL Server ROW_NUMBER Left Join + when you don't know column names

I'm writing a page that builds a query (for non-database users), runs it, and returns the results to them.
I am using row_number to handle custom pagination.
How do I do a left join and a row_number in a subquery when I don't know the specific columns I need to return? I tried to use *, but I get the error:
The column '' was specified multiple times
Here is the query I tried:
SELECT * FROM
(SELECT ROW_NUMBER() OVER (ORDER BY Test) AS ROW_NUMBER, *
FROM table1 a
LEFT JOIN table2 b
ON a.ID = b.ID) x
WHERE ROW_NUMBER BETWEEN 1 AND 50
Your query is going to fail in SQL Server regardless of the row_number() call. The * returns all columns, including a.id and b.id. These both have the same name. This is fine for a query, but for a subquery, all columns need distinct names.
You can use row_number() for an arbitrary ordering by using a "subquery with constant" in the order by clause:
SELECT * FROM
(SELECT ROW_NUMBER() OVER (ORDER BY (select NULL)) AS ROW_NUMBER, *
FROM table1 a
LEFT JOIN table2 b
ON a.ID = b.ID) x
WHERE ROW_NUMBER BETWEEN 1 AND 50 ;
This removes the dependency on the underlying column name (assuming none are named ROW_NUMBER).
Try this SQL. It should work:
SELECT * FROM
(SELECT ROW_NUMBER() OVER (ORDER BY a.Test) AS ROW_NUMBER, a.*,b.*
FROM table1 a
LEFT JOIN table2 b
ON a.ID = b.ID) x
WHERE ROW_NUMBER BETWEEN 1 AND 50

LEFT JOIN Filter

If I have two tables - Table_A and Table_B - and if I am using LEFT JOIN to join them, how can I filter only those rows from Table_B which joined with the rows in the Table_A more than once?
DB flavor: Teradata
If I'm not mistaken Teradata supports window functions, so this might work:
select *
from (
select a.*,
b.*,
count(*) over (partition by a.MyCol) as cnt
from Table_A a
left join Table_B b ON a.MyCol = b.MyCol
where ... -- Conditions
) t
where cnt > 1
(not tested)
Here is a Teradata-specific version of your accepted answer:
select a.*,
b.*
from Table_A a
left join Table_B b
ON a.MyCol = b.MyCol
where ... -- Conditions
QUALIFY count(*) over (partition by a.MyCol) > 1
Note that QUALIFY is a Teradata extension to the ANSI standard (and a handy one at that).
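For engines without QUALIFY, the derived-table form of the windowed count can be tried locally. This is a sketch on SQLite (window functions need SQLite 3.25 or later) via Python's sqlite3 module, not Teradata; the `Payload` column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_A (MyCol INTEGER)")
conn.execute("CREATE TABLE Table_B (MyCol INTEGER, Payload TEXT)")
conn.executemany("INSERT INTO Table_A VALUES (?)", [(1,), (2,), (3,)])
# MyCol = 1 matches twice, MyCol = 2 once, MyCol = 3 not at all.
conn.executemany(
    "INSERT INTO Table_B VALUES (?, ?)", [(1, "x"), (1, "y"), (2, "z")]
)

MULTI_MATCH_SQL = """
SELECT a_col, payload
FROM (
    SELECT a.MyCol   AS a_col,
           b.Payload AS payload,
           COUNT(*) OVER (PARTITION BY a.MyCol) AS cnt
    FROM Table_A AS a
    LEFT JOIN Table_B AS b ON a.MyCol = b.MyCol
) AS t
WHERE cnt > 1
ORDER BY a_col, payload
"""

multi_rows = conn.execute(MULTI_MATCH_SQL).fetchall()
print(multi_rows)
```

Note that the count is taken per `a.MyCol` value, so if the same value appeared more than once in Table_A that would also push the count above 1.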
Maybe this will help you:
1) You can use an INNER JOIN.
2) You can also check that the joined row is not null or blank.
Select a.*,b.* from Table_A a
left join Table_B b on condition
HAVING COUNT(DISTINCT a.value)>1
Make the necessary edits and check.

How to select records from a Table that has a certain number of rows in a related table in SQL Server?

Not quite sure how to ask this, but I have two tables in a one-to-many relationship, and I need to select all records in the "1" table that have fewer than three records in the "many" table.
select b.foreignkey,count(b.foreignkey) as bidcount
from b
where b.foreignkey in (select a.id from a) and bidcount< 3
group by b.foreignkey
I know this doesn't work at all, but I am at a loss as to how to do this.
In the end I need to select all the records from the "a" table based on this criterion. Sorry if that is confusing!
Just using your code, not tested:
SELECT
b.foreignkey,
count(b.foreignkey) as bidcount
FROM
b
WHERE
b.foreignkey IN (SELECT a.id FROM a)
GROUP BY
b.foreignkey
HAVING
count(b.foreignkey) < 3
Try this:
SELECT t1.id,COUNT(t2.parentId)
FROM table1 as t1
INNER JOIN table2 as t2
ON t1.id = t2.parentId
GROUP BY t1.id
HAVING COUNT(t2.parentId) < 3
You didn't mention which version of SQL Server you're using - if you're on SQL Server 2005 or newer, you could use this CTE (Common Table Expression):
;WITH ChildRows AS
(
SELECT A.Id, COUNT(b.Id) AS 'BCount'
FROM
dbo.TableA A
INNER JOIN
dbo.TableB B ON B.TableAId = A.Id
GROUP BY A.Id -- needed because A.Id is selected next to COUNT()
)
SELECT A.*, R.BCount
FROM dbo.TableA A
INNER JOIN ChildRows R ON A.Id = R.Id
The inner SELECT lists the Id columns from TableA and the count of the child rows associated with those (using the INNER JOIN to TableB) - and the outer SELECT just builds on top of that result set and shows all fields from table A (and the count from the B table)
If you want to return all fields of your (1) table in one query, I suggest you consider using CROSS APPLY:
SELECT t1.* FROM table_1 t1
CROSS APPLY (SELECT COUNT(*) cnt FROM Table_Many t2 WHERE t2.fk = t1.pk) a
where a.cnt < 3
In some particular cases, depending on your indexes and database structure, this query may run 4 times faster than the GROUP BY method.
You posted this question under SQL Server; my answer is for Oracle (I don't know whether it will run in SQL Server as well).
It is as follows:
select [desired column list] from
(select b.*, count(*) over (partition by b.foreignkey) c_1
from b
where b.foreignkey in (select a.id from a) )
where c_1 < 3 ;
I hope it works in SQL Server as well.
If not, please let me know so I can update it.
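One difference between these approaches shows up with parents that have no child rows at all: an INNER JOIN with HAVING drops them, while a LEFT JOIN (or a correlated count) keeps them with a count of 0. Whether such rows should count as "less than three" is an assumption about what the question wants. The sketch below uses SQLite through Python's sqlite3 module with made-up data in tables `a` and `b`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE b (foreignkey INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
# Parent 1 has three children, parent 2 has one, parent 3 has none.
conn.executemany("INSERT INTO b VALUES (?)", [(1,), (1,), (1,), (2,)])

# INNER JOIN: parents without children never reach the GROUP BY.
inner_rows = conn.execute(
    """
    SELECT a.id, COUNT(b.foreignkey) AS bidcount
    FROM a
    INNER JOIN b ON b.foreignkey = a.id
    GROUP BY a.id
    HAVING COUNT(b.foreignkey) < 3
    ORDER BY a.id
    """
).fetchall()

# LEFT JOIN: childless parents survive with a count of 0.
left_rows = conn.execute(
    """
    SELECT a.id, COUNT(b.foreignkey) AS bidcount
    FROM a
    LEFT JOIN b ON b.foreignkey = a.id
    GROUP BY a.id
    HAVING COUNT(b.foreignkey) < 3
    ORDER BY a.id
    """
).fetchall()

print(inner_rows)
print(left_rows)
```

COUNT(b.foreignkey) counts only non-NULL values, which is what makes the childless parent come out as 0 rather than 1 in the LEFT JOIN version.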