Using Google BigQuery's Comma as UNION ALL with IN clause - sql

I am attempting to perform the following query:
SELECT
author, link_id, COUNT(link_id) as cnt
FROM
[fh-bigquery:reddit_comments.2015_12],
[fh-bigquery:reddit_comments.2015_11]
WHERE link_id IN (
SELECT posts.name
FROM [fh-bigquery:reddit_posts.full_corpus_201512] AS posts
WHERE posts.subreddit = 'politics'
ORDER BY posts.created_utc DESC
LIMIT 300
)
GROUP BY author, link_id
ORDER BY author
I receive this error message upon execution: JOIN (including semi-join) and UNION ALL (comma, date range) may not be combined in a single SELECT statement. Either move the UNION ALL to an inner query or the JOIN to an outer query.
Removing one of the comments tables makes the query work, but I can't figure out how BigQuery's comma-as-UNION ALL works. I've tried moving the union to an inner query, but I still get the same error.

The error came from my misunderstanding of "move the UNION ALL to an inner query". To resolve it, I had to wrap the two tables in a basic SELECT * FROM ... subquery. The working query is as follows:
SELECT
author, link_id, COUNT(link_id) as cnt
FROM (
SELECT *
FROM
[fh-bigquery:reddit_comments.2015_12],
[fh-bigquery:reddit_comments.2015_11]
)
WHERE link_id IN (
SELECT posts.name
FROM [fh-bigquery:reddit_posts.full_corpus_201512] AS posts
WHERE posts.subreddit = 'politics'
ORDER BY posts.created_utc DESC
LIMIT 300
)
GROUP BY author, link_id
ORDER BY author
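The same shape can be tried locally. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) in place of BigQuery, with hypothetical table names and sample rows. SQLite has no comma-as-union syntax, so the inner query spells out UNION ALL explicitly. The point is only that the union lives in the inner query and the IN semi-join is applied outside it.

```python
import sqlite3

# In-memory database with hypothetical stand-ins for the reddit tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments_2015_11 (author TEXT, link_id TEXT);
    CREATE TABLE comments_2015_12 (author TEXT, link_id TEXT);
    CREATE TABLE posts (name TEXT, subreddit TEXT, created_utc INTEGER);
    INSERT INTO comments_2015_11 VALUES ('alice', 't3_a'), ('bob', 't3_x');
    INSERT INTO comments_2015_12 VALUES ('alice', 't3_a'), ('carol', 't3_b');
    INSERT INTO posts VALUES ('t3_a', 'politics', 2),
                             ('t3_b', 'politics', 1),
                             ('t3_x', 'aww', 3);
""")

rows = conn.execute("""
    SELECT author, link_id, COUNT(link_id) AS cnt
    FROM (
        -- The union is isolated in the inner query ...
        SELECT author, link_id FROM comments_2015_12
        UNION ALL
        SELECT author, link_id FROM comments_2015_11
    )
    -- ... and the semi-join (IN) is applied to its result.
    WHERE link_id IN (
        SELECT name
        FROM posts
        WHERE subreddit = 'politics'
        ORDER BY created_utc DESC
        LIMIT 300
    )
    GROUP BY author, link_id
    ORDER BY author
""").fetchall()

print(rows)
```

With this sample data, bob's comment is filtered out because its post is not in the politics subreddit, and alice's two comments on the same post are counted together.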

Related

Order By in Union Nested selects

I have two really long selects that are both nested
(SELECT MAX(u.Username) as Identification,max(cht.SentOn) NewestMessage
from Chats cht
JOIN(some other select as u that has u.username))
union
(select Max(GC.Identification)Identification,Min(cht.SentOn) NewestMessage
from Chats cht
join(some other select as GC that has GC.Identification))
How can I order these (both queries combined into one table results) by NewestMessage which is a type of datetime?
I would slightly alter your query to:
SELECT Identification, NewestMessage
FROM
(
SELECT MAX(u.Username) AS Identification, MAX(cht.SentOn) AS NewestMessage
FROM Chats cht
INNER JOIN (some other select as u that has u.username)
UNION -- maybe UNION ALL if you don't mind duplicates?
SELECT MAX(GC.Identification), MIN(cht.SentOn)
FROM Chats cht
INNER JOIN (some other select as GC that has GC.Identification)
) t
ORDER BY NewestMessage
I am basically replacing your union of tuples with a subquery on a union query, then using an ORDER BY on the column you want. Note that aliases in the second half of the union query are not necessary, and in fact will be ignored by SQL Server.
Add an ORDER BY clause after the last SELECT, using the column name from the first SELECT.
See my example here: the ORDER BY TeamId at the end refers to TeamId, which comes from the first SELECT.
SELECT TeamId, TeamName
FROM [dbo].[Teams]
union
SELECT PlayerID, FirstName
FROM [dbo].[Players]
order by TeamId asc
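To see the subquery form of this in action, here is a minimal sketch, assuming SQLite via Python's sqlite3 module rather than SQL Server. The tables direct_chats and group_chats and their rows are hypothetical stand-ins for the elided nested selects. Column names come from the first SELECT of the union, and the outer ORDER BY sorts the combined result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the two nested selects in the question.
    CREATE TABLE direct_chats (username TEXT, sent_on TEXT);
    CREATE TABLE group_chats (identification TEXT, sent_on TEXT);
    INSERT INTO direct_chats VALUES ('zed', '2020-01-05'), ('amy', '2020-01-03');
    INSERT INTO group_chats VALUES ('team', '2020-01-01'), ('team', '2020-01-04');
""")

union_rows = conn.execute("""
    SELECT Identification, NewestMessage
    FROM (
        SELECT MAX(username) AS Identification, MAX(sent_on) AS NewestMessage
        FROM direct_chats
        UNION
        -- No aliases needed here: names are taken from the first SELECT.
        SELECT MAX(identification), MIN(sent_on)
        FROM group_chats
    ) AS t
    ORDER BY NewestMessage
""").fetchall()

print(union_rows)
```

The dates are stored as ISO 8601 strings here, which sort correctly as text; a real datetime column would sort the same way.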

How to work around "correlated subqueries that reference other tables" errors without using JOIN

I am working with the BigQuery public dataset bigquery-public-data.austin_crime.crime. My goal is an output with three columns: the description (of the crime), the count of crimes with that description, and the top district for that particular description (crime).
I am able to get the first two columns with this query.
select
a.description,
count(*) as district_count
from `bigquery-public-data.austin_crime.crime` a
group by description order by district_count desc
I was hoping to get everything in one query, so I tried the following to add a third column showing the top district for each description (crime):
select
a.description,
count(*) as district_count,
(
select district from
( select
district, rank() over(order by COUNT(*) desc) as rank
FROM `bigquery-public-data.austin_crime.crime`
where description = a.description
group by district
) where rank = 1
) as top_District
from `bigquery-public-data.austin_crime.crime` a
group by description
order by district_count desc
The error I am getting is: "Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN."
I think I can do that with joins. Does anyone have a better solution, possibly one that does not use a join?
Below is for BigQuery Standard SQL
#standardSQL
SELECT description,
ANY_VALUE(district_count) AS district_count,
STRING_AGG(district ORDER BY cnt DESC LIMIT 1) AS top_district
FROM (
SELECT description, district,
COUNT(1) OVER(PARTITION BY description) AS district_count,
COUNT(1) OVER(PARTITION BY description, district) AS cnt
FROM `bigquery-public-data.austin_crime.crime`
)
GROUP BY description
-- ORDER BY district_count DESC
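The same idea (count per description and district, then keep the top district per description, with no correlated subquery and no join) can be checked on a small sample. This is a minimal sketch, assuming SQLite 3.25 or later via Python's sqlite3 module and a hypothetical crime table. SQLite does not support STRING_AGG(... ORDER BY ... LIMIT 1), so ROW_NUMBER() is swapped in to pick the top district; it is a variant of the query above, not a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample standing in for austin_crime.crime.
    CREATE TABLE crime (description TEXT, district TEXT);
    INSERT INTO crime VALUES
        ('THEFT', 'A'), ('THEFT', 'A'), ('THEFT', 'B'),
        ('BURGLARY', 'C'),
        ('BURGLARY', 'D'), ('BURGLARY', 'D'), ('BURGLARY', 'D');
""")

top_rows = conn.execute("""
    WITH per_district AS (
        -- One row per (description, district) with its crime count.
        SELECT description, district, COUNT(*) AS cnt
        FROM crime
        GROUP BY description, district
    ),
    ranked AS (
        SELECT description, district,
               -- Total crimes for the description across all districts.
               SUM(cnt) OVER (PARTITION BY description) AS district_count,
               -- Rank districts within a description; ties broken by name.
               ROW_NUMBER() OVER (
                   PARTITION BY description ORDER BY cnt DESC, district
               ) AS rn
        FROM per_district
    )
    SELECT description, district_count, district AS top_district
    FROM ranked
    WHERE rn = 1
    ORDER BY district_count DESC
""").fetchall()

print(top_rows)
```

Breaking ties by district name is an arbitrary choice made here so the output is deterministic; the original query leaves tie order unspecified.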

SQL Oracle Find Max of count

I have this table called item:
| PERSON_id | ITEM_id |
|-----------|---------|
| CP2       | A03     |
| CP2       | A02     |
| HB3       | A02     |
| BW4       | A01     |
I need an SQL statement that would output the person with the most Items. Not really sure where to start either.
I advise you to use an inner query for this purpose. The inner query contains the GROUP BY and ORDER BY clauses, and the outer query selects the first row, which belongs to the person with the most items.
SELECT * FROM
(
SELECT PERSON_ID, COUNT(*) FROM TABLE1
GROUP BY PERSON_ID
ORDER BY 2 DESC
)
WHERE ROWNUM = 1
Here is the SQL Fiddle link: http://sqlfiddle.com/#!4/4c4228/5
Locating the maximum of an aggregated column requires more than a single calculation, so here you can use a "common table expression" (cte) to hold the result and then re-use that result in a where clause:
with cte as (
select
person_id
, count(item_id) count_items
from mytable
group by
person_id
)
select
*
from cte
where count_items = (select max(count_items) from cte)
Note: if more than one person shares the same maximum count, this query returns more than one row.
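The CTE approach can be run against the sample rows from the question. This is a minimal sketch, assuming SQLite via Python's sqlite3 module instead of Oracle; the table name mytable follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (person_id TEXT, item_id TEXT);
    INSERT INTO mytable VALUES
        ('CP2', 'A03'), ('CP2', 'A02'), ('HB3', 'A02'), ('BW4', 'A01');
""")

max_rows = conn.execute("""
    WITH cte AS (
        -- Step 1: count items per person.
        SELECT person_id, COUNT(item_id) AS count_items
        FROM mytable
        GROUP BY person_id
    )
    -- Step 2: keep every person whose count equals the maximum.
    SELECT person_id, count_items
    FROM cte
    WHERE count_items = (SELECT MAX(count_items) FROM cte)
""").fetchall()

print(max_rows)
```

Unlike the ROWNUM = 1 version, this keeps all people tied for the maximum rather than an arbitrary one of them.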

Get the first instance of a row using MS Access

EDITED:
I have this query wherein I want to SELECT the first instance of a record from the table petTable.
SELECT id,
pet_ID,
FIRST(petName),
First(Description)
FROM petTable
GROUP BY pet_ID;
The problem is I have huge number of records and this query is too slow. I discovered that GROUP BY slows down the query. Do you have any idea that could make this query faster? or better, a query wherein I don't need to use GROUP BY?
"The problem is I have huge number of records and this query is too slow. I discovered that GROUP BY slows down the query. Do you have any idea that could make this query faster?"
Add an index on pet_ID, then create and test this query:
SELECT pet_ID, Min(id) AS MinOfid
FROM petTable
GROUP BY pet_ID;
Once you have that query working, you can join it back to the original table --- then it will select only the original rows which match based on id and you can retrieve the other fields you want from those matching rows.
SELECT pt.id, pt.pet_ID, pt.petName, pt.Description
FROM
petTable AS pt
INNER JOIN
(
SELECT pet_ID, Min(id) AS MinOfid
FROM petTable
GROUP BY pet_ID
) AS sub
ON pt.id = sub.MinOfid;
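The index-plus-join pattern can be tried on a small sample. This is a minimal sketch, assuming SQLite via Python's sqlite3 module rather than MS Access; the index name and the sample rows are hypothetical, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE petTable (
        id INTEGER PRIMARY KEY,
        pet_ID INTEGER,
        petName TEXT,
        Description TEXT
    );
    -- Index on the grouping column, as suggested above (name is made up).
    CREATE INDEX idx_petTable_pet_ID ON petTable (pet_ID);
    INSERT INTO petTable VALUES
        (1, 10, 'Rex', 'dog'),
        (2, 10, 'Rex II', 'dog again'),
        (3, 20, 'Tom', 'cat');
""")

first_rows = conn.execute("""
    SELECT pt.id, pt.pet_ID, pt.petName, pt.Description
    FROM petTable AS pt
    INNER JOIN (
        -- Smallest id per pet_ID identifies the "first" row of each group.
        SELECT pet_ID, MIN(id) AS MinOfid
        FROM petTable
        GROUP BY pet_ID
    ) AS sub
    ON pt.id = sub.MinOfid
    ORDER BY pt.id
""").fetchall()

print(first_rows)
```

Here "first" is taken to mean the row with the lowest id within each pet_ID, which is the interpretation the answer above uses.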
Your query could be changed to:
SELECT ID, pet_ID, petName, Description
FROM petTable
WHERE ID IN
(SELECT Min(ID) As MinID FROM petTable GROUP BY pet_ID);
Or use the TOP clause:
SELECT petTable.pet_ID, petTable.petName, petTable.[description]
FROM petTable
WHERE petTable.ID IN
(SELECT TOP 1 ID
FROM petTable AS tmpTbl
WHERE tmpTbl.pet_ID = petTable.pet_ID
ORDER BY tmpTbl.ID ASC)
ORDER BY petTable.pet_ID, petTable.petName, petTable.[description];

How to count distinct values in SQL union?

I can select distinct values from two different columns, but I do not know how to count them.
My guess is that I should use an alias, but I can't figure out how to write the statement correctly.
$sql = "SELECT DISTINCT author FROM comics WHERE author NOT IN
( SELECT email FROM bans ) UNION
SELECT DISTINCT email FROM users WHERE email NOT IN
( SELECT email FROM bans ) ";
Edit1: I know that I can use mysql_num_rows() in PHP, but I think that takes too much processing.
You could wrap the query in a subquery:
select count(distinct author)
from (
SELECT author
FROM comics
WHERE author NOT IN ( SELECT email FROM bans )
UNION ALL
SELECT email
FROM users
WHERE email NOT IN ( SELECT email FROM bans )
) as SubQueryAlias
There were two distincts in your query, and union filters out duplicates. I removed all three (the non-distinct union is union all) and moved the distinctness to the outer query with count(distinct author).
You can always do SELECT COUNT(*) FROM (SELECT DISTINCT...) x and just copy that UNION into the second SELECT (more precisely, it's called an anonymous view).
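Both suggestions can be compared side by side. This is a minimal sketch, assuming SQLite via Python's sqlite3 module instead of MySQL, with hypothetical sample rows. It runs COUNT(DISTINCT ...) over a UNION ALL and COUNT(*) over a deduplicating UNION, which should agree as long as the unioned column contains no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comics (author TEXT);
    CREATE TABLE users (email TEXT);
    CREATE TABLE bans (email TEXT);
    INSERT INTO comics VALUES ('a@x'), ('b@x'), ('b@x'), ('spam@x');
    INSERT INTO users VALUES ('b@x'), ('c@x'), ('spam@x');
    INSERT INTO bans VALUES ('spam@x');
""")

# Variant 1: keep duplicates in the union, deduplicate in the outer COUNT.
count_outer = conn.execute("""
    SELECT COUNT(DISTINCT author)
    FROM (
        SELECT author FROM comics
        WHERE author NOT IN (SELECT email FROM bans)
        UNION ALL
        SELECT email FROM users
        WHERE email NOT IN (SELECT email FROM bans)
    ) AS SubQueryAlias
""").fetchone()[0]

# Variant 2: let UNION remove duplicates, then count the remaining rows.
count_union = conn.execute("""
    SELECT COUNT(*)
    FROM (
        SELECT author FROM comics
        WHERE author NOT IN (SELECT email FROM bans)
        UNION
        SELECT email FROM users
        WHERE email NOT IN (SELECT email FROM bans)
    ) AS x
""").fetchone()[0]

print(count_outer, count_union)
```

With this data, the banned address is excluded and the three remaining distinct addresses are counted once each by both variants.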