SQL Server GROUP BY troubles! - sql

I'm getting a frustrating error in one of my SQL Server 2008 queries. It parses fine, but crashes when I try to execute. The error I get is the following:
Msg 8120, Level 16, State 1, Line 4
Column 'customertraffic_return.company' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
SELECT *
FROM (SELECT ctr.sp_id AS spid,
Substring(ctr.company, 1, 20) AS company,
cci.email_address AS tech_email,
CASE
WHEN rating IS NULL THEN 'unknown'
ELSE rating
END AS rating
FROM customer_contactinfo cci
INNER JOIN customertraffic_return ctr
ON ctr.sp_id = cci.sp_id
WHERE cci.email_address <> ''
AND cci.email_address NOT LIKE '%hotmail%'
AND cci.email_address IS NOT NULL
AND ( region LIKE 'Europe%'
OR region LIKE 'Asia%' )
AND SERVICE IN ( '1', '2' )
AND ( rating IN ( 'Premiere', 'Standard', 'unknown' )
OR rating IS NULL )
AND msgcount >= 5000
GROUP BY ctr.sp_id,
cci.email_address) AS a
WHERE spid NOT IN (SELECT spid
FROM customer_exclude)
GROUP BY spid,
tech_email

Well, the error is pretty clear, no??
You're selecting those columns in your inner SELECT:
spid
company
tech_email
rating
and you're grouping by only two of those (GROUP BY ctr.sp_id, cci.email_address).
Either you need to group by all four of them (GROUP BY ctr.sp_id, cci.email_address, company, rating), or you need to apply an aggregate function (SUM, AVG, MIN, MAX) to the other two columns (company and rating).
Or maybe using a GROUP BY here is totally the wrong way to do it. What is it you're really trying to do here?
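To make the two options concrete, here is a minimal sketch run against an in-memory SQLite database from Python. SQLite is only a stand-in (an assumption for illustration; it is more permissive than SQL Server and would not raise Msg 8120 itself), and the table `t` and its rows are made up.

```python
import sqlite3

# Hypothetical sample data; not the asker's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (sp_id INTEGER, email TEXT, company TEXT, rating TEXT);
    INSERT INTO t VALUES
        (1, 'a@x.com', 'Acme', 'Premiere'),
        (1, 'a@x.com', 'Acme', NULL),
        (2, 'b@y.com', 'Bolt', 'Standard');
""")

# Option 1: every non-aggregated select expression also appears in GROUP BY.
group_all = conn.execute("""
    SELECT sp_id, email, company, COALESCE(rating, 'unknown') AS rating
    FROM t
    GROUP BY sp_id, email, company, COALESCE(rating, 'unknown')
""").fetchall()

# Option 2: keep the two-column grouping and aggregate the other columns.
aggregated = conn.execute("""
    SELECT sp_id, email,
           MAX(company) AS company,
           COALESCE(MAX(rating), 'unknown') AS rating
    FROM t
    GROUP BY sp_id, email
    ORDER BY sp_id
""").fetchall()

print(len(group_all))  # one row per distinct (sp_id, email, company, rating)
print(aggregated)      # one row per (sp_id, email)
```

With this sample data, option 1 returns three rows because the NULL rating forms its own group, while option 2 collapses to one row per sp_id/email pair. Which one is right depends on what the query is meant to report.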

The inner query:
SELECT ctr.sp_id AS spid,
Substring(ctr.company, 1, 20) AS company,
cci.email_address AS tech_email,
CASE
WHEN rating IS NULL THEN 'unknown'
ELSE rating
END AS rating
FROM customer_contactinfo cci
INNER JOIN customertraffic_return ctr
ON ctr.sp_id = cci.sp_id
WHERE cci.email_address <> ''
AND cci.email_address NOT LIKE '%hotmail%'
AND cci.email_address IS NOT NULL
AND ( region LIKE 'Europe%'
OR region LIKE 'Asia%' )
AND SERVICE IN ( '1', '2' )
AND ( rating IN ( 'Premiere', 'Standard', 'unknown' )
OR rating IS NULL )
AND msgcount >= 5000
GROUP BY ctr.sp_id,
cci.email_address
has 4 non-aggregate things in the select (sp_id, company, email_address, rating) and you only group on two of them, so it is throwing an error on the first one it sees.
So you either need to not group by any of them or group by all of them.

I suggest replacing the * with a fully specified column list.

You can either group by all selected columns or use the other columns (those not in the GROUP BY clause) in an aggregate function (like SUM).
You cannot: select a,b,c from bla group by a,b
But you can: select a,b,sum(c) from bla group by a,b

Related

How to write a BigQuery query that produces the count of the unique transactions and the combination of column names populated

I’m trying to write a query in BigQuery that produces the count of the unique transactions and the combination of column names populated.
I have a table with these columns:
TRAN CODE | Full Name | Given Name | Surname | DOB | Phone
The result set I’m after is:
TRAN CODE | UNIQUE TRANSACTIONS | NAME OF POPULATED COLUMNS
A | 3 | Full Name
A | 4 | Full Name,Phone
B | 5 | Given Name,Surname
B | 10 | Given Name,Surname,DOB,Phone
The result set shows that for TRAN CODE A
3 distinct customers provided Full Name
4 distinct customers provided Full Name and Phone #
For TRAN CODE B
5 distinct customers provided Given Name and Surname
10 distinct customers provided Given Name, Surname, DOB, Phone #
Currently to produce my results I’m doing it manually.
I tried using ARRAY_AGG but couldn’t get it working.
Any advice would be appreciated.
Thank you.
I think you want something like this:
select tran_code,
array_to_string(array[case when full_name is not null then 'full_name' end,
case when given_name is not null then 'given_name' end,
case when surname is not null then 'surname' end,
case when dob is not null then 'dob' end,
case when phone is not null then 'phone' end
], ','),
count(*)
from t
group by 1, 2
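The same idea (build a label of the populated columns per row, then group on it) can be tried locally. This is only a rough sketch using SQLite from Python, not BigQuery: SQLite has no arrays, so string concatenation plus RTRIM stands in for array_to_string, and the table `t` and its rows are invented.

```python
import sqlite3

# Hypothetical sample rows; column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (tran_code TEXT, full_name TEXT, given_name TEXT,
                    surname TEXT, dob TEXT, phone TEXT);
    INSERT INTO t VALUES
        ('A', 'Ann', NULL, NULL, NULL, NULL),
        ('A', 'Bob', NULL, NULL, NULL, '555'),
        ('A', 'Cy',  NULL, NULL, NULL, '556'),
        ('B', NULL, 'Di', 'Lee', NULL, NULL);
""")

rows = conn.execute("""
    SELECT tran_code, populated, COUNT(*) AS n
    FROM (
        SELECT tran_code,
               -- Append 'name,' for each populated column, then trim the last comma.
               RTRIM(
                   CASE WHEN full_name  IS NOT NULL THEN 'full_name,'  ELSE '' END ||
                   CASE WHEN given_name IS NOT NULL THEN 'given_name,' ELSE '' END ||
                   CASE WHEN surname    IS NOT NULL THEN 'surname,'    ELSE '' END ||
                   CASE WHEN dob        IS NOT NULL THEN 'dob,'        ELSE '' END ||
                   CASE WHEN phone      IS NOT NULL THEN 'phone,'      ELSE '' END,
                   ',') AS populated
        FROM t
    )
    GROUP BY tran_code, populated
    ORDER BY tran_code, populated
""").fetchall()

print(rows)
```

Note that COUNT(*) counts rows, not distinct customers. If a customer can appear more than once, a COUNT(DISTINCT ...) on whatever identifies the customer would be needed instead.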
Consider the approach below. It has no dependency on column names other than TRAN_CODE, so it is quite generic!
select TRAN_CODE,
count(distinct POPULATED_VALUES) as UNIQUE_TRANSACTIONS,
POPULATED_COLUMNS
from (
select TRAN_CODE,
( select as struct
string_agg(col, ', ' order by offset) POPULATED_COLUMNS,
string_agg(val order by offset) POPULATED_VALUES,
string_agg(cast(offset as string) order by offset) pos
from unnest(regexp_extract_all(to_json_string(t), r'"([^"]+?)":')) col with offset
join unnest(regexp_extract_all(to_json_string(t), r'"[^"]+?":("[^"]+?"|null)')) val with offset
using(offset)
where val != 'null'
and col != 'TRAN_CODE'
).*
from `project.dataset.table` t
)
group by TRAN_CODE, POPULATED_COLUMNS
order by TRAN_CODE, any_value(pos)
@Gordon Linoff's solution is the best, but an alternative would be to do the following:
SELECT
TRAN_CODE,
COUNT(TRAN_ROW) AS unique_transactions,
populated_columns
FROM (
SELECT
TRAN_CODE,
TRAN_ROW,
# COUNT(value) AS unique_transactions,
STRING_AGG(field, ",") AS populated_columns
FROM (
SELECT
* EXCEPT(DOB),
CAST(DOB AS STRING ) AS DOB,
ROW_NUMBER() OVER () AS TRAN_ROW
FROM
sample) UNPIVOT(value FOR field IN (Full_name,
Given_name,
Surname,
DOB,
Phone))
GROUP BY
TRAN_CODE,
TRAN_ROW )
GROUP BY
TRAN_CODE,
populated_columns
But this should be more expensive...

Count Joins from Multiple Tables

For reference, I am using Postgres 9.2.23.
I have several tables where one table (user_group) is related to some other tables (e.g. posts, group_invites, and a few others). There is also a groups table, but it doesn't hold any data that I need for the purposes of these queries.
Table user_group:
fk_user_group_id, fk_user_id, fk_group_id, fk_invite_id, user_status, ...
Table message:
pk_message_id, fk_user_id, fk_group_id, child_message_id, ...
Table group_prospective_user:
pk_prospective_user_id, fk_group_id, ...
I want to get some statistics for each of the related tables for a list of specified group ids if the user is a member of the group.
Right now I do this with one query for each related table, eg:
select
"public"."user_group"."fk_group_id" as "groupId",
count(case
when (
"public"."message"."child_message_id" is null
and "public"."message"."pk_message_id" is not null
) then "public"."message"."pk_message_id"
end) as "numDiscussions",
count("public"."message"."pk_message_id") as "numDiscussionPosts"
from "public"."user_group"
left outer join "public"."message"
on "public"."message"."fk_group_id" = "public"."user_group"."fk_group_id"
where (
"public"."user_group"."fk_group_id" in (
1, 11, 23, 530, 1070
)
and "public"."user_group"."role" in (
'ADMINISTRATOR', 'MODERATOR', 'MEMBER'
)
and "public"."user_group"."fk_user_id" = 17517
)
group by "public"."user_group"."fk_group_id"
And for invites:
select
"public"."user_group"."fk_group_id" as "groupId",
count(case
when "public"."prospective_user"."status" = 1 then "public"."prospective_user"."pk_prospective_user_id"
end) as "numInviteesExternal"
from "public"."user_group"
left outer join "public"."prospective_user"
on "public"."prospective_user"."fk_group_id" = "public"."user_group"."fk_group_id"
where (
"public"."user_group"."fk_group_id" in (
1, 11, 23, 530, 6176
)
and "public"."user_group"."role" in (
'ADMINISTRATOR', 'MODERATOR', 'MEMBER'
)
and "public"."user_group"."fk_user_id" = 17517
)
group by "public"."user_group"."fk_group_id"
The query to count the number of group invites is very similar to the first query. Only the COUNT(CASE WHEN ...) expression and the JOIN ... ON clause change.
Each of the queries to these tables has the same related logic for checking the groups of which the current user is an active member. Is there an efficient way to merge multiple similar queries like this into a single query?
I tried using multiple LEFT JOINs with select count distinct, but that ran into performance issues on groups with both lots of messages, and lots of invites. Is there a way to easily/efficiently do this with, say, a subquery?
The answer from user @Parfait was the most scalable solution I could find. I based my queries on this tutorial: https://www.sqlteam.com/articles/using-derived-tables-to-calculate-aggregate-values.
While this isn't perfect, and results in a bunch of subqueries running, it does get me all the data at once, and with a single trip to the DB.
It ended up like this:
select
"groups"."groupId",
coalesce(
"members"."member_count",
0
) as "numActiveMembers",
coalesce(
"members"."invitee_count",
0
) as "numInviteesInternal",
coalesce(
"discussions"."discussions_count",
0
) as "numDiscussions",
coalesce(
"discussions"."posts_count",
0
) as "numDiscussionPosts"
from (
select "public"."user_group"."fk_group_id" as "groupId"
from "public"."user_group"
where (
"public"."user_group"."fk_group_id" in (
1, 2, 3, 4, 5
)
and "public"."user_group"."role" = 'ADMINISTRATOR'
and "public"."user_group"."fk_user_id" = 123
)
group by "public"."user_group"."fk_group_id"
) as "groups"
left outer join (
select
"public"."user_group"."fk_group_id" as "members_group_id",
count(distinct case
when "public"."user_group"."role" in (
'ADMINISTRATOR', 'MODERATOR', 'MEMBER'
) then "public"."user_group"."pk_user_group_id"
end) as "member_count",
count(distinct case
when "public"."user_group"."role" = 'INVITEE' then "public"."user_group"."pk_user_group_id"
end) as "invitee_count"
from "public"."user_group"
group by "public"."user_group"."fk_group_id"
) as "members"
on "members_group_id" = "groupId"
left outer join (
select
"public"."message"."fk_group_id" as "discussions_group_id",
count(case
when (
"public"."message"."child_message_id" is null
and "public"."message"."pk_message_id" is not null
) then "public"."message"."pk_message_id"
end) as "discussions_count",
count("public"."message"."pk_message_id") as "posts_count"
from "public"."message"
group by "public"."message"."fk_group_id"
) as "discussions"
on "discussions_group_id" = "groupId"
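For anyone wondering why the derived tables help, here is a small sketch of the difference, run on an in-memory SQLite database from Python rather than Postgres (an assumption for illustration). The simplified tables and rows are made up; only the shape of the two queries matters.

```python
import sqlite3

# Hypothetical, simplified schema: one group with 3 messages and 2 invites,
# and one group with no child rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_group (group_id INTEGER);
    CREATE TABLE message (id INTEGER, group_id INTEGER);
    CREATE TABLE invite (id INTEGER, group_id INTEGER);
    INSERT INTO user_group VALUES (1), (2);
    INSERT INTO message VALUES (10, 1), (11, 1), (12, 1);
    INSERT INTO invite VALUES (20, 1), (21, 1);
""")

# Joining both child tables directly multiplies rows (3 messages x 2 invites),
# so plain counts are inflated unless COUNT(DISTINCT ...) is used.
fan_out = conn.execute("""
    SELECT g.group_id, COUNT(m.id), COUNT(i.id)
    FROM user_group g
    LEFT JOIN message m ON m.group_id = g.group_id
    LEFT JOIN invite  i ON i.group_id = g.group_id
    GROUP BY g.group_id
    ORDER BY g.group_id
""").fetchall()

# Aggregating each child table first yields at most one row per group
# per derived table, so the outer joins cannot multiply rows.
pre_aggregated = conn.execute("""
    SELECT g.group_id,
           COALESCE(m.n, 0) AS num_messages,
           COALESCE(i.n, 0) AS num_invites
    FROM user_group g
    LEFT JOIN (SELECT group_id, COUNT(*) AS n
               FROM message GROUP BY group_id) m
           ON m.group_id = g.group_id
    LEFT JOIN (SELECT group_id, COUNT(*) AS n
               FROM invite GROUP BY group_id) i
           ON i.group_id = g.group_id
    ORDER BY g.group_id
""").fetchall()

print(fan_out)         # [(1, 6, 6), (2, 0, 0)]
print(pre_aggregated)  # [(1, 3, 2), (2, 0, 0)]
```

The direct double join builds 6 intermediate rows for group 1, which is presumably where the COUNT(DISTINCT ...) version got slow on large groups. The pre-aggregated version scans each child table once and joins on already-collapsed rows.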

I want to reduce my SQL Query on big Query

I want to fetch data from a BigQuery database, but I get this error:
The query is too large. The maximum query length is 256.000K characters, including comments and white space characters.
Below is the part of the query that I repeat 21 times:
WITH data AS
(
SELECT
IFNULL(department, 'UNKNOWN_DEPARTMENT') AS dept,
'C7s'
AS campus,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1542565800000 AND 1543170599999) AS taskCount_0,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1542565800000 AND 1543170599999
AND IF (task.deadline.currentEscalationLevel NOT IN
(
'ESC_ACKNOWLEDGEMENT'
)
, task.deadline.currentEscalationLevel, 'NOT_ESCALATED') NOT IN
(
'NOT_ESCALATED'
)
) AS escCount_0,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541961000000 AND 1542565799999) AS taskCount_1,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541961000000 AND 1542565799999
AND IF (task.deadline.currentEscalationLevel NOT IN
(
'ESC_ACKNOWLEDGEMENT'
)
, task.deadline.currentEscalationLevel, 'NOT_ESCALATED') NOT IN
(
'NOT_ESCALATED'
)
) AS escCount_1,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541356200000 AND 1541960999999) AS taskCount_2,
COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541356200000 AND 1541960999999
AND IF (task.deadline.currentEscalationLevel NOT IN
(
'ESC_ACKNOWLEDGEMENT'
)
, task.deadline.currentEscalationLevel, 'NOT_ESCALATED') NOT IN
(
'NOT_ESCALATED'
)
) AS escCount_2
FROM
`nsimplbigquery.TaskManagement.C7s_*`
WHERE
_TABLE_SUFFIX IN
(
'2018_47_11',
'2018_45_11',
'2018_46_11'
)
AND IFNULL(department, 'UNKNOWN_DEPARTMENT') IN
(
'ENGG_AND_MAINT_DEPARTMENT',
'FNB_DEPARTMENT',
'TELECOM_DEPARTMENT',
'IT_DEPARTMENT',
'BILLING_AND_INSURANCE',
'HOUSEKEEPING_DEPARTMENT'
)
AND task.taskRaised.raisedAt.milliSeconds BETWEEN 1541356200000 AND 1543170599999
GROUP BY
dept
)
,
mainQuery AS
(
SELECT
dept,
campus,
SUM(taskCount_0) AS taskCount_0,
SUM(escCount_0) AS escCount_0,
CAST(SAFE_DIVIDE(SUM(escCount_0), SUM(taskCount_0)) * 10000 AS INT64) AS escPerc_0,
SUM(taskCount_1) AS taskCount_1,
SUM(escCount_1) AS escCount_1,
CAST(SAFE_DIVIDE(SUM(escCount_1), SUM(taskCount_1)) * 10000 AS INT64) AS escPerc_1,
SUM(taskCount_2) AS taskCount_2,
SUM(escCount_2) AS escCount_2,
CAST(SAFE_DIVIDE(SUM(escCount_2), SUM(taskCount_2)) * 10000 AS INT64) AS escPerc_2
FROM
data
GROUP BY
ROLLUP (campus, dept)
)
SELECT
dept,
campus,
taskCount_0,
escCount_0,
escPerc_0,
taskCount_1,
escCount_1,
escPerc_1,
taskCount_2,
escCount_2,
escPerc_2
FROM
mainQuery
WHERE
campus IS NOT NULL
ORDER BY
CASE
WHEN
dept IS NULL
THEN
1
ELSE
0
END
ASC, dept ASC, campus ASC;
This is the query that I repeat many times because I have many IDs. In each copy, C7s is replaced with one of the following IDs:
C7z,
C7u,
H0B,
IDp,
ITR,
C7i,
C7j,
C7k,
C7l,
C7m,
C7o,
C71,
C7t,
F6qZ,
C7w,
GIui,
Fs,
C70,
C7p,
C7r
In the query above I highlighted the table name nsimplbigquery.TaskManagement.C7s_*
In the next copy of the query the table name changes, for example to
nsimplbigquery.TaskManagement.C7z_*
Instead of repeating your whole SELECT statement 21 times, use the approach below. You will have 3x21=63 entries in the list for _TABLE_SUFFIX, but you will be able to get around your issue with query length:
FROM `nsimplbigquery.TaskManagement.*`
WHERE _TABLE_SUFFIX IN (
'C7s_2018_47_11',
'C7s_2018_45_11',
'C7s_2018_46_11',
'C7z_2018_47_11',
'C7z_2018_45_11',
'C7z_2018_46_11',
'C7u_2018_47_11',
'C7u_2018_45_11',
'C7u_2018_46_11',
...
...
...
'C7r_2018_47_11',
'C7r_2018_45_11',
'C7r_2018_46_11'
)
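Typing 63 suffixes by hand is error-prone, so the list could be generated instead. The following is a rough Python sketch, assuming the IDs and week suffixes quoted in the question are the complete set. The variable names are mine, and plain string formatting is only reasonable here because the values are fixed literals, not user input.

```python
# Campus ids taken from the question: C7s plus the 20 ids listed after it.
campus_ids = [
    "C7s", "C7z", "C7u", "H0B", "IDp", "ITR", "C7i", "C7j", "C7k", "C7l",
    "C7m", "C7o", "C71", "C7t", "F6qZ", "C7w", "GIui", "Fs", "C70", "C7p",
    "C7r",
]
# Week suffixes from the question's original _TABLE_SUFFIX list.
weeks = ["2018_47_11", "2018_45_11", "2018_46_11"]

# One suffix per (campus, week) pair: 21 x 3 = 63 entries.
suffixes = [f"{campus}_{week}" for campus in campus_ids for week in weeks]

# Join with commas between entries only, so there is no trailing comma.
in_list = ",\n".join(f"  '{s}'" for s in suffixes)
where_clause = f"WHERE _TABLE_SUFFIX IN (\n{in_list}\n)"

print(len(suffixes))
print(where_clause)
```

One thing to watch: the original query hard-codes 'C7s' AS campus. With a single wildcard query, the campus value would presumably have to be derived from _TABLE_SUFFIX (and added to the GROUP BY) instead of being a literal.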

'databasename.d.first_column_in_the_table' isn't in GROUP BY

I always get the same error whenever I run this query.
I tried using a simple query to test whether something is wrong in the query.
I also noticed that the error happens when I'm using the view in my query.
CREATE OR REPLACE VIEW v_vss_car_wash AS
SELECT
max(r.id) AS id
,d.id AS dealer_id
,d.dealer_code
,d.dealer_name
,count(*) AS total_respondents
,sum(car_washed) AS car_washed -- Car Washed
,count(*) - sum(car_washed) AS car_unwashed -- Car Unwashed
,sum(IF (car_washed AND car_satisfied, 1, 0)) AS car_satisfied
,sum(IF (car_washed AND NOT car_satisfied, 1, 0)) AS car_unsatisfied
,MONTH(r.create_date) AS create_month
,YEAR(r.create_date) AS create_year
FROM t_vss_survey_response r
LEFT JOIN t_vss_dealer d ON (r.dealer_id = d.id)
WHERE survey_code = "ASS"
GROUP BY dealer_code, YEAR(r.create_date), MONTH(r.create_date);
This is the (very simple) query I'm using, but I always get the same error:
select a.dealer_name, a.dealer_code
from v_vss_car_wash a
And this is the error message:
Caused by: org.eclipse.birt.data.engine.odaconsumer.OdaDataException: Cannot get the result set metadata.
org.eclipse.birt.report.data.oda.jdbc.JDBCException: SQL statement does not return a ResultSet object.
SQL error #1:'crmsdbdev.d.dealer_name' isn't in GROUP BY
;
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: 'crmsdbdev.d.dealer_name' isn't in GROUP BY
Just add the appropriate columns to the group by:
CREATE OR REPLACE VIEW v_vss_car_wash AS
SELECT
max(r.id) AS id
,d.id AS dealer_id
,d.dealer_code
,d.dealer_name
,count(*) AS total_respondents
,sum(car_washed) AS car_washed -- Car Washed
,count(*) - sum(car_washed) AS car_unwashed -- Car Unwashed
,sum(IF (car_washed AND car_satisfied, 1, 0)) AS car_satisfied
,sum(IF (car_washed AND NOT car_satisfied, 1, 0)) AS car_unsatisfied
,MONTH(r.create_date) AS create_month
,YEAR(r.create_date) AS create_year
FROM t_vss_survey_response r
LEFT JOIN t_vss_dealer d ON (r.dealer_id = d.id)
WHERE survey_code = "ASS"
GROUP BY d.id, d.dealer_code, d.dealer_name,
YEAR(r.create_date), MONTH(r.create_date);

SQL group by issue when i try to get some info

I can not figure it out:
I have a table called ImportantaRecords with fields like market, zip5, MHI, MHV, TheTable and I want to group all the records by zip5 that have TheTable = 'mg'… I tried this :
select a.Market,a.zip5,count(a.zip5),a.MHI,a.MHV,a.TheTable from
(select * from ImportantaRecords where TheTable = 'mg') a
group by a.Zip5
but it gives me the classic "not an aggregate function" error,
and then I tried this:
select Market,zip5,count(zip5),MHI,MHV,TheTable from ImportantaRecords where TheTable = 'mg'
group by Zip5
and the same thing…
any help ?
You did not state what database you are using but if you are getting an error about columns not being in an aggregate function, then you might need to add the columns not in an aggregate function to the GROUP BY:
select Market,
zip5,
count(zip5),
MHI,
MHV,
TheTable
from ImportantaRecords
where TheTable = 'mg'
group by Market, Zip5, MHI, MHV, TheTable;
If grouping by the additional columns alters the result that you are expecting, then you could use a subquery to get the result:
select i1.Market,
i1.zip5,
i2.Total,
i1.MHI,
i1.MHV,
i1.TheTable
from ImportantaRecords i1
inner join
(
select zip5, count(*) Total
from ImportantaRecords
where TheTable = 'mg'
group by zip5
) i2
on i1.zip5 = i2.zip5
where i1.TheTable = 'mg'
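To see how the two queries differ, here is a small sketch on an in-memory SQLite database from Python. SQLite is just a convenient stand-in since the question does not name the database, and the sample rows are invented.

```python
import sqlite3

# Hypothetical rows: zip 10001 has two 'mg' records with different MHI/MHV.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ImportantaRecords
        (Market TEXT, zip5 TEXT, MHI INTEGER, MHV INTEGER, TheTable TEXT);
    INSERT INTO ImportantaRecords VALUES
        ('North', '10001', 50, 300, 'mg'),
        ('North', '10001', 55, 310, 'mg'),
        ('South', '20002', 40, 200, 'mg'),
        ('South', '20002', 40, 200, 'xx');
""")

# Grouping by every selected column: rows that differ in MHI/MHV
# land in separate groups, so the per-zip count is split up.
by_all = conn.execute("""
    SELECT zip5, MHI, MHV, COUNT(zip5)
    FROM ImportantaRecords
    WHERE TheTable = 'mg'
    GROUP BY Market, zip5, MHI, MHV, TheTable
    ORDER BY zip5, MHI
""").fetchall()

# Joining to a per-zip subquery: detail rows are kept and each one
# carries the total for its zip5.
with_total = conn.execute("""
    SELECT i1.zip5, i1.MHI, i1.MHV, i2.Total
    FROM ImportantaRecords i1
    INNER JOIN (
        SELECT zip5, COUNT(*) AS Total
        FROM ImportantaRecords
        WHERE TheTable = 'mg'
        GROUP BY zip5
    ) i2 ON i1.zip5 = i2.zip5
    WHERE i1.TheTable = 'mg'
    ORDER BY i1.zip5, i1.MHI
""").fetchall()

print(by_all)      # [('10001', 50, 300, 1), ('10001', 55, 310, 1), ('20002', 40, 200, 1)]
print(with_total)  # [('10001', 50, 300, 2), ('10001', 55, 310, 2), ('20002', 40, 200, 1)]
```

So the first form only gives a true per-zip count when Market, MHI, and MHV are constant within a zip5. The second form returns one row per record with the zip-level total attached.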