Perform Simple Group By in Google Big Query - sql

I have a very simple query on Google BigQuery that keeps returning an error:
Grouping by expressions of type STRUCT is not allowed
I am simply trying to select a list of emails from two locations, union them in one CTE, and count how often each email occurs in that CTE to identify duplicates.
This should be very easy. What am I missing?
with a as (select properties.email as email, 'loc1' as tag from `loc1.contacts`),
b as (select properties.email as email, 'loc2' as tag from `loc2.contacts`),
c as (
select * from a
union all
select * from b
)
select email, count(email) from c group by 1
sample data:
email/tag
bob@email.com/loc1
bob@email.com/loc2
expected results:
email/count
bob@email.com/2

It looks like I needed to add .value to get the actual value of the email field. properties.email on its own is a STRUCT, which is why GROUP BY rejected it. The following query worked as desired:
with a as (select properties.email.value as email, 'loc1' as tag from `loc1.contacts`),
b as (select properties.email.value as email, 'loc2' as tag from `loc2.contacts`),
c as (
select * from a
union all
select * from b
)
select email, count(email) from c group by 1
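The union-and-count pattern itself can be tried locally. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for BigQuery, and SQLite has no STRUCT type, so a plain email column plays the role of properties.email.value. The table names loc1_contacts and loc2_contacts and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the two BigQuery datasets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loc1_contacts (email TEXT);
    CREATE TABLE loc2_contacts (email TEXT);
    INSERT INTO loc1_contacts VALUES ('bob@email.com'), ('amy@email.com');
    INSERT INTO loc2_contacts VALUES ('bob@email.com');
""")

# Same shape as the query above: tag each source, union, then count per email.
rows = conn.execute("""
    WITH a AS (SELECT email, 'loc1' AS tag FROM loc1_contacts),
         b AS (SELECT email, 'loc2' AS tag FROM loc2_contacts),
         c AS (SELECT * FROM a UNION ALL SELECT * FROM b)
    SELECT email, COUNT(email) AS cnt
    FROM c
    GROUP BY email
    ORDER BY email
""").fetchall()

# Emails that appear in more than one row are the duplicates.
duplicates = [email for email, cnt in rows if cnt > 1]
print(rows)
print(duplicates)
```

Grouping only works here because email is a scalar column; in BigQuery the equivalent scalar is the .value field inside the STRUCT.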

Related

Selecting TOP 1 Columns where duplicate exists and selecting all where no duplicate exists

Given the list of Names, Accounts and Positions I am trying to:
Select the first position where there is more than one record with the same Name and Account.
If there is only one record with that Name and Account, select its details.
My current query looks like the following:
SELECT *
FROM CTE cte1
JOIN
(
SELECT Name, OppName FROM CTE GROUP BY Name, OppName HAVING COUNT(Name)>1
) as cte2
on cte2.Name = cte1.Name and cte2.OppName = cte1.OppName
ORDER BY cte1.OppName, cte1.Name
I have not posted the rest of the CTE query as it is way too long.
However, this only returns the results where the Name and Account are the same and the Positions are different.
For example, if Oera worked at Christie's as a Sales Analyst and a Developer, it would only select the record where Oera worked at Christie's as a Developer.
How do I modify this query accordingly?
Are you looking for something like this?
SELECT *
FROM CTE AS cte1
JOIN
(
-- A window aggregate needs OVER (...); DISTINCT keeps one row per Name/OppName pair
SELECT DISTINCT Name, OppName, COUNT(Name) OVER (PARTITION BY Name, OppName) AS cnt
FROM CTE
) AS cte2
ON cte2.Name = cte1.Name AND cte2.OppName = cte1.OppName
WHERE cnt > 1
ORDER BY cte1.OppName, cte1.Name
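If the goal is exactly one row per Name and Account (the first position when there are several, the only row otherwise), a different technique, ROW_NUMBER(), may be simpler than a join. This is a sketch under assumptions rather than the answer's method: it runs on SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later), the people table and its rows are hypothetical, and "first" is taken to mean alphabetically first Position because the question does not say how positions are ordered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (Name TEXT, OppName TEXT, Position TEXT);
    INSERT INTO people VALUES
        ('Oera', 'Christie''s', 'Sales Analyst'),
        ('Oera', 'Christie''s', 'Developer'),
        ('Mina', 'Acme', 'Manager');
""")

# Number the rows inside each (Name, OppName) group and keep only row 1.
# Groups with a single row keep that row; larger groups keep their first.
rows = conn.execute("""
    SELECT Name, OppName, Position
    FROM (
        SELECT Name, OppName, Position,
               ROW_NUMBER() OVER (
                   PARTITION BY Name, OppName
                   ORDER BY Position      -- assumed definition of "first"
               ) AS rn
        FROM people
    ) AS ranked
    WHERE rn = 1
    ORDER BY OppName, Name
""").fetchall()
print(rows)
```

Swapping the ORDER BY inside the OVER clause for a date or id column would change which position counts as first.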

Case statement to determine if I should union

I currently want to do some sort of conditional union. Given the following example:
SELECT age, name
FROM users
UNION
SELECT 25 AS age, 'Betty' AS name
Say I wanted to union the second statement only if the row count of 'users' was >= 2, and otherwise not union the two.
In summary, I want to append a row to the result only if the table has 2 or more rows.
You could use an ugly hack something like this, but I think Tim's answer is better:
SELECT age, name
FROM users
UNION ALL
SELECT 25, 'Betty'
WHERE (SELECT COUNT(*) FROM users) > 1;
If it's in a stored procedure you could use IF...ELSE:
IF (SELECT COUNT(*) FROM users) < 2
BEGIN
SELECT age, name
FROM users
END
ELSE
SELECT age, name
FROM users
UNION ALL
SELECT 25 AS age, 'Betty' AS name
Otherwise you could try something like this:
SELECT age, name
FROM users
UNION ALL
SELECT TOP 1 25 AS age, 'Betty' AS name
FROM users
WHERE (SELECT COUNT(*) FROM users) >= 2
Note that I've used UNION ALL since it doesn't seem that you want to eliminate duplicates.
Played around here: http://sqlfiddle.com/#!6/a7540/2323/0
Edit: I prefer Zohar's approach to my second one. So if you can use IF...ELSE, prefer that; otherwise use WHERE (SELECT COUNT(*) FROM users) > 1 on a SELECT without a table.
Something like the following should work:
SELECT age, name
FROM users
UNION ALL
SELECT age, name
FROM (SELECT 25 AS age, 'Betty' AS name) x
CROSS APPLY (SELECT COUNT(*) FROM users) y(cnt)
WHERE y.cnt >= 2
The second part of the UNION ALL returns no rows if the users table has fewer than 2 records.
SELECT age
, name
FROM users
UNION
SELECT 25 As age
, 'Betty' As name
WHERE EXISTS (
SELECT Count(*)
FROM users
HAVING Count(*) >= 2
)
;
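The table-less WHERE variant can be checked against both cases (fewer than two rows, two or more rows). This is a minimal sketch that assumes SQLite through Python's sqlite3 rather than SQL Server; the run_with helper and the sample users are hypothetical names introduced only for this illustration.

```python
import sqlite3

# The extra row is appended only when the users table has at least 2 rows.
QUERY = """
    SELECT age, name FROM users
    UNION ALL
    SELECT 25 AS age, 'Betty' AS name
    WHERE (SELECT COUNT(*) FROM users) >= 2
"""

def run_with(users):
    """Build a fresh in-memory users table and run the conditional union."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (age INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    return conn.execute(QUERY).fetchall()

one_user = run_with([(30, "Al")])
two_users = run_with([(30, "Al"), (41, "Cy")])
print(one_user)
print(sorted(two_users))  # sorted because UNION ALL guarantees no row order
```

With a single user the second branch contributes nothing; with two users the Betty row is appended.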

Can we use join with in same table while using group by function?

For instance, I have a table with columns below:
pk_id,address,first_name,last_name
and I have a query like this to display the first and last names that are duplicated:
select first_name,last_name
from table
group by first_name,last_name
having count(*)>1;
The above query only returns the first and last names, but I also want to display the pk_id and address tied to these duplicate first and last names.
Can we use a join on the same table to do this? Please help!
A simple way of doing this is to build a view with the pk_id and the count of duplicates. Once you have it, it is only a matter of using a JOIN on the base table and a filter that keeps only the rows having a duplicate:
SELECT T.*
FROM T
JOIN (SELECT "pk_id",
COUNT(*) OVER(PARTITION BY "first_name", "last_name") cnt
FROM T) V
ON T."pk_id" = V."pk_id"
WHERE cnt > 1
See http://sqlfiddle.com/#!4/3ecd0/9
You have to call it from an outer query, like this:
select * from table
where first_name||last_name in
(select first_name||last_name from
(select first_name, last_name, count( * )
from table
group by first_name,last_name
having count( * ) > 1
)
)
Note: you may not need to concatenate the two fields, but I haven't tested that.
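One thing worth checking with the concatenation approach is that two different name pairs can concatenate to the same string. The sketch below, which assumes SQLite via Python's sqlite3 and a hypothetical people table, compares the concatenated IN test with a column-by-column join on the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (pk_id INTEGER PRIMARY KEY,
                         first_name TEXT, last_name TEXT);
    -- Row 3 is a different name pair that also concatenates to 'AnnLee'.
    INSERT INTO people VALUES
        (1, 'Ann', 'Lee'),
        (2, 'Ann', 'Lee'),
        (3, 'AnnL', 'ee');
""")

# Matching on the concatenated string.
concat_ids = [r[0] for r in conn.execute("""
    SELECT pk_id FROM people
    WHERE first_name || last_name IN (
        SELECT first_name || last_name
        FROM people
        GROUP BY first_name, last_name
        HAVING COUNT(*) > 1
    )
    ORDER BY pk_id
""")]

# Matching each column separately through a join.
join_ids = [r[0] for r in conn.execute("""
    SELECT p.pk_id
    FROM people p
    JOIN (
        SELECT first_name, last_name
        FROM people
        GROUP BY first_name, last_name
        HAVING COUNT(*) > 1
    ) d ON d.first_name = p.first_name AND d.last_name = p.last_name
    ORDER BY p.pk_id
""")]
print(concat_ids, join_ids)
```

On this data the concatenated test also picks up row 3, while the join returns only the two genuine duplicates, so comparing the columns separately is the safer choice.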
with
my_duplicates as
(
select
first_name,
last_name
from
my_table
group by
first_name,
last_name
having
count(*) > 1
)
select
bb.pk_id,
bb.address,
bb.first_name,
bb.last_name
from
my_duplicates aa
join my_table bb on
(
aa.first_name = bb.first_name
and
aa.last_name = bb.last_name
)
order by
bb.last_name,
bb.first_name,
bb.pk_id

How can I add a "custom" row to the top of a select result set?

I can select and pull out a list of records by using a select statement like so with t-sql:
select * from [dbo].[testTable];
But how can I add in a "custom" row to the top of the result set?
For example, if the result set was:
John john@email.com
Max max@domain.com
I want to add a row, which is not from the table, to the result set so that it looks like so:
Name Email
John john@email.com
Max max@domain.com
The reason I want to do this is that I'm going to export the result to a CSV file through sqlcmd, and I want that "custom" row to serve as the header row.
This is the safe way to do this:
select name, email
from ((select 'name' as name, 'email' as email, 1 as which
) union all
(select name, email, 2 as which from [dbo].[testTable]
)
) t
order by which;
In practice, union all will work:
select 'name' as name, 'email' as email
union all
select name, email from [dbo].[testTable]
However, I cannot find documentation that guarantees that the first subquery is completed before the second. The underlying operator in SQL Server does have this behavior (or at least it did in SQL Server 2008 when I last investigated it).
SELECT name, email FROM (
SELECT 'Name' AS Name, 'Email' AS Email, 1 AS o
UNION ALL
SELECT name, email, 2 AS o FROM testTable
) t
ORDER BY o, name
The o column is added to order the result sets of the UNION so that you ensure the first result set appears on top.
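SELECT * FROM
(SELECT 'Name' AS name, 'Email' AS email, 1 AS [rank]
UNION ALL
-- the second branch needs a FROM clause to read the table rows
SELECT name, email, 3 AS [rank] FROM [dbo].[testTable]) a
ORDER BY a.[rank], a.name
Try the above.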
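The sort-key idea shared by these answers can be exercised end to end. This is a sketch only: it assumes SQLite through Python's sqlite3 instead of SQL Server and sqlcmd, and the sort_key column name and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testTable (name TEXT, email TEXT);
    INSERT INTO testTable VALUES
        ('Max', 'max@domain.com'),
        ('John', 'john@email.com');
""")

# The literal header row gets sort_key 1 and the table rows get 2, so an
# explicit ORDER BY puts the header first regardless of UNION ALL's order.
rows = conn.execute("""
    SELECT name, email
    FROM (
        SELECT 'Name' AS name, 'Email' AS email, 1 AS sort_key
        UNION ALL
        SELECT name, email, 2 AS sort_key FROM testTable
    ) AS t
    ORDER BY sort_key, name
""").fetchall()

for name, email in rows:
    print(f"{name},{email}")
```

Selecting only name and email in the outer query keeps the helper column out of the exported file.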

How to count distinct values in SQL union?

I can select distinct values from two different columns, but I do not know how to count them.
My guess is that I should use an alias, but I can't figure out how to write the statement correctly.
$sql = "SELECT DISTINCT author FROM comics WHERE author NOT IN
( SELECT email FROM bans ) UNION
SELECT DISTINCT email FROM users WHERE email NOT IN
( SELECT email FROM bans ) ";
Edit1: I know that I can use mysql_num_rows() in PHP, but I think that takes too much processing.
You could wrap the query in a subquery:
select count(distinct author)
from (
SELECT author
FROM comics
WHERE author NOT IN ( SELECT email FROM bans )
UNION ALL
SELECT email
FROM users
WHERE email NOT IN ( SELECT email FROM bans )
) as SubQueryAlias
There were two distincts in your query, and union filters out duplicates. I removed all three (the non-distinct union is union all) and moved the distinctness to the outer query with count(distinct author).
You can always do SELECT COUNT(*) FROM (SELECT DISTINCT...) x and just copy that UNION into the inner SELECT (such a subquery in the FROM clause is usually called a derived table or inline view).
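Both suggestions should give the same number, which is easy to confirm on a small data set. The following sketch assumes SQLite via Python's sqlite3 in place of MySQL and PHP; the sample addresses and the variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comics (author TEXT);
    CREATE TABLE users (email TEXT);
    CREATE TABLE bans (email TEXT);
    INSERT INTO comics VALUES ('a@x.com'), ('a@x.com'), ('b@x.com');
    INSERT INTO users VALUES ('b@x.com'), ('c@x.com'), ('d@x.com');
    INSERT INTO bans VALUES ('d@x.com');
""")

# Variant 1: UNION ALL inside, COUNT(DISTINCT ...) outside.
count_distinct = conn.execute("""
    SELECT COUNT(DISTINCT author)
    FROM (
        SELECT author FROM comics
        WHERE author NOT IN (SELECT email FROM bans)
        UNION ALL
        SELECT email FROM users
        WHERE email NOT IN (SELECT email FROM bans)
    ) AS combined
""").fetchone()[0]

# Variant 2: UNION removes duplicates inside, COUNT(*) outside.
count_union = conn.execute("""
    SELECT COUNT(*)
    FROM (
        SELECT author FROM comics
        WHERE author NOT IN (SELECT email FROM bans)
        UNION
        SELECT email FROM users
        WHERE email NOT IN (SELECT email FROM bans)
    ) AS x
""").fetchone()[0]
print(count_distinct, count_union)
```

The banned address is excluded and the address shared by both tables is counted once in either variant.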