SQL grouping/counting on a string split function

My original query is this:
select people, count(*)
from table
group by people
But some rows list several people in the people column, so this aggregation does not give pure counts for A, B, and C; it also counts each combination separately:
A 10
B 5
A, B 1
A, C 2
C 15
A, B, C 3
etc.
This works to get the full list of individuals in legacy SQL:
select split(people,",") as person
from table
But I cannot use the group by on it
select split(people,",") as person, count(*)
from table
group by person
gives the error
Cannot group by an aggregate.
I feel like the solution is a subquery, somehow, but I'm not sure how to execute it

Try wrapping it in an outer query:
select person, count(*)
from(
select split(people,",") as person
from table
) t
group by person
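As a rough cross-check of what the split-then-group query is meant to return, the same logic can be sketched in plain Python. The sample rows are hypothetical and simply mirror the counts listed in the question; the function name is made up for illustration. Note that values such as "A, B" leave a leading space after splitting on the comma, so the sketch strips it, which the SQL version may also need.

```python
from collections import Counter

# Hypothetical sample: (value of the people column, number of rows holding it),
# mirroring the counts shown in the question.
SAMPLE = [("A", 10), ("B", 5), ("A, B", 1), ("A, C", 2), ("C", 15), ("A, B, C", 3)]


def count_individuals(rows):
    """Split each comma-separated people value, then count rows per person."""
    counts = Counter()
    for people, n_rows in rows:
        for person in people.split(","):
            # Strip the space left after each comma so " B" and "B" match.
            counts[person.strip()] += n_rows
    return dict(counts)


per_person = count_individuals(SAMPLE)
print(per_person)
```

With this sample, A is counted in every row that mentions it, whether alone or in a combination.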

Related

How to select * in addition to group by?

Consider a PostgreSQL table with fields a-z
a, b, c ... z
-------------
5, 6, 2 ... 9
5, 6, 3 ... 1
I'd like to do a group on fields a,b and keep only records where b was maximum.
SELECT a, max(b) as b, c, d, e ... z
FROM table
GROUP BY a, b
This works fine, but it's annoying to have to type out all the values in SELECT. I'd much rather do something like
SELECT max(b) as b, *
FROM TABLE
But doing so gives error
[42803] ERROR: column "table.id" must appear in the GROUP BY clause or be used in an aggregate function.
Any idea how to avoid having to type all the column names in a lengthy table when doing a groupby operation?
You can use rank():
select t.*
from (select t.*, rank() over (partition by a order by b desc) as seqnum
from t
) t
where seqnum = 1;
Actually, in Postgres, the fastest method is usually distinct on:
select distinct on (a) t.*
from t
order by a, b desc;
With an index on (a, b desc) this should be the fastest method.
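The rank() approach above can be tried out without a Postgres server. The following is a minimal sketch using Python's built-in sqlite3 module, assuming the bundled SQLite is 3.25 or newer (window function support); the three-column table and its sample values are hypothetical. SQLite has no DISTINCT ON, so only the rank() variant is shown.

```python
import sqlite3


def top_rows_per_group(rows):
    """Return full rows whose b is the maximum within each value of a."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT *
        FROM (SELECT t.*,
                     rank() OVER (PARTITION BY a ORDER BY b DESC) AS seqnum
              FROM t) AS ranked
        WHERE seqnum = 1
        ORDER BY a, c
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical data: two rows tie for the maximum b within a = 5.
print(top_rows_per_group([(5, 6, 2), (5, 6, 3), (5, 4, 9), (7, 1, 0)]))
```

Because rank() gives tied rows the same rank, both rows with the maximum b survive; row_number() would keep exactly one per group instead. The extra seqnum column also appears in the output when selecting *.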
Gordon Linoff's answer put me on the right track, namely using distinct on. This works in postgres
SELECT DISTINCT ON (a, b) *
FROM table
ORDER BY a, b DESC
Basically, DISTINCT ON keeps the first row of each distinct (a, b) combination in the ORDER BY order, so the sort direction decides which row is kept. Actually surprised this works...

Why does select count(*) give a different result than select?

For the following query, I get 2000. But when I run the query inside the outer SELECT COUNT(*) on its own, it returns 1100 records. Why is that?
The actual query had to be truncated because Stack Overflow complained about too much code.
SELECT COUNT(*)
FROM (
SELECT Max(sequence) Sequence,
Max(keycode) Keycode,
Min(dfo) DFO,
segmentid
FROM ( SomeTable2)
UNION ALL
SELECT TOP 1000 *
FROM SomeTable2
)
AS C
SELECT COUNT(*) counts every row, including repeated ones, which is why you get more results; a count with DISTINCT only gives you the number of values without repetitions. Example: suppose your table holds the records a, b, c, d, e, d. SELECT COUNT(*) returns 6, because d is counted twice, whereas a DISTINCT count returns 5, because the repeated d is counted only once.
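The six-record example from this answer can be reproduced with a small sketch using Python's built-in sqlite3 module. The table name letters and its single column are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-column table holding the records a, b, c, d, e, d.
conn.execute("CREATE TABLE letters (letter TEXT)")
conn.executemany("INSERT INTO letters VALUES (?)", [(ch,) for ch in "abcded"])

# COUNT(*) counts every row, duplicates included.
total = conn.execute("SELECT COUNT(*) FROM letters").fetchone()[0]
# COUNT(DISTINCT ...) counts each value once.
distinct = conn.execute("SELECT COUNT(DISTINCT letter) FROM letters").fetchone()[0]
conn.close()

print(total, distinct)
```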

concatenating columns from multiple unrelated 1-row resultsets (with group by)

I found this very similar question here and all I would like to do in addition to this is group by date. There is a date column present in each sub-query.
concatenating columns from multiple unrelated 1-row resultsets
Each sub-query contains the exact same range of dates. I have tried putting GROUP BY in each sub-query, and in the outer query, but I can't seem to produce one row per date that combines the sub-queries. I am using this query in Hive, but I believe any ANSI SQL will work here. Don't quote me on that though. My scenario seems to be a minor variation on the answers found in the link I posted, however I can't seem to make it work.
Here is one query posted at the link I attached above:
select A, B, C, D
from ( SELECT SUM(A) as A, SUM(B) as B FROM X ) as U
CROSS JOIN ( SELECT SUM(C) as C, SUM(D) as D FROM Y ) as V
How do I add a GROUP BY to this when each sub-query has a date column? Or is there a better way to achieve the same result?
Is this what you want?
select u.dte, A, B, C, D
from (select dte, SUM(A) as A, SUM(B) as B
from X
group by dte
) u join
(select dte, SUM(C) as C, SUM(D) as D
from Y
group by dte
) v
on u.dte = v.dte;
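To see what this join of per-date aggregates produces, here is a minimal sketch using Python's built-in sqlite3 module rather than Hive. The column types and the sample rows are assumptions; the table and column names follow the answer.

```python
import sqlite3


def combine_by_date(x_rows, y_rows):
    """Aggregate X and Y per date, then join the two results on the date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE X (dte TEXT, A INTEGER, B INTEGER)")
    conn.execute("CREATE TABLE Y (dte TEXT, C INTEGER, D INTEGER)")
    conn.executemany("INSERT INTO X VALUES (?, ?, ?)", x_rows)
    conn.executemany("INSERT INTO Y VALUES (?, ?, ?)", y_rows)
    query = """
        SELECT u.dte, u.sum_a, u.sum_b, v.sum_c, v.sum_d
        FROM (SELECT dte, SUM(A) AS sum_a, SUM(B) AS sum_b
              FROM X GROUP BY dte) AS u
        JOIN (SELECT dte, SUM(C) AS sum_c, SUM(D) AS sum_d
              FROM Y GROUP BY dte) AS v
          ON u.dte = v.dte
        ORDER BY u.dte
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data covering the same two dates in both tables.
x = [("2020-01-01", 1, 2), ("2020-01-01", 3, 4), ("2020-01-02", 5, 6)]
y = [("2020-01-01", 10, 20), ("2020-01-02", 30, 40), ("2020-01-02", 1, 1)]
print(combine_by_date(x, y))
```

An inner join drops any date that is missing from one side; since the question states both sub-queries cover the same dates, that should not matter here.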

Group Count in SQL

I am looking for a way to display a table where a set of multiple attributes appear more than one time.
For example, suppose I had a table, Tbl1 with attributes A, B, C, D, E
How do I make a query such that it only shows rows where A, B, C appear more than once (as in the same A, B, C as a group), but D and E may or may not be different?
My attempt:
SELECT *
FROM Tbl1
WHERE COUNT(A, B, C) > 1
and I get an error: "group function is not allowed here"
The reason is that you cannot use an aggregate function in the WHERE part of an SQL statement.
SELECT columns
FROM tables
WHERE condition
Here the condition refers to a single row of the table.
What you want is HAVING:
SELECT columns
FROM tables
GROUP BY columns
HAVING condition
The condition after HAVING is evaluated after the grouping, so there you can use aggregate functions like COUNT or SUM.
Use the GROUP BY clause (SQL Server: http://msdn.microsoft.com/en-us/library/ms177673.aspx, MySQL: http://www.tutorialspoint.com/mysql/mysql-group-by-clause.htm).
Within each group, you'll want to get the count of rows in that group (using COUNT(*)) and then use a HAVING clause to filter on that count. HAVING is like a WHERE clause for GROUP BY. It filters on the results of the grouping, and can make reference to the grouped columns (in this case, A, B and C), or any aggregates (in this case, COUNT(*)).
Here's what your query could look like. Note that you can only include columns in the SELECT field list that are mentioned in the GROUP BY or that are contained in aggregate functions such as COUNT() and MAX(). MySQL will let you get away with putting other columns in, but SQL Server will give you an error. It's best to follow this rule even if the database allows it.
SELECT A,
B,
C,
COUNT(*) AS GroupCount
FROM Tbl1
GROUP BY A, B, C
HAVING COUNT(*) > 1
If you want the full rows where this is true, then you can use a derived table:
SELECT *
FROM Tbl1
JOIN (
SELECT A,
B,
C,
COUNT(*) AS GroupCount
FROM Tbl1
GROUP BY A, B, C
HAVING COUNT(*) > 1
) AS duplicates
ON duplicates.A = Tbl1.A AND
duplicates.B = Tbl1.B AND
duplicates.C = Tbl1.C
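The derived-table query can be exercised with a short sketch using Python's built-in sqlite3 module. The sample rows are hypothetical, and the columns are listed explicitly here so that the output shape is predictable.

```python
import sqlite3


def duplicated_rows(rows):
    """Return full rows whose (A, B, C) combination occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Tbl1 (A, B, C, D, E)")
    conn.executemany("INSERT INTO Tbl1 VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT Tbl1.A, Tbl1.B, Tbl1.C, Tbl1.D, Tbl1.E, duplicates.GroupCount
        FROM Tbl1
        JOIN (SELECT A, B, C, COUNT(*) AS GroupCount
              FROM Tbl1
              GROUP BY A, B, C
              HAVING COUNT(*) > 1) AS duplicates
          ON duplicates.A = Tbl1.A
         AND duplicates.B = Tbl1.B
         AND duplicates.C = Tbl1.C
        ORDER BY Tbl1.A, Tbl1.D
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical data: (1, 1, 1) appears twice with different D and E values.
sample = [(1, 1, 1, "x", "p"), (1, 1, 1, "y", "q"), (2, 2, 2, "z", "r")]
print(duplicated_rows(sample))
```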

Get row count including column values in sql server

I need to get the row count of a query, and also get the query's columns in one single query. The count should be a part of the result's columns (It should be the same for all rows, since it's the total).
for example, if I do this:
select count(1) from table
I can have the total number of rows.
If I do this:
select a,b,c from table
I'll get the column's values for the query.
What I need is to get the count and the column values in one query, in an efficient way.
For example:
select Count(1), a,b,c from table
with no group by, since I want the total.
The only way I've found is to do a temp table (using variables), insert the query's result, then count, then returning the join of both. But if the result gets thousands of records, that wouldn't be very efficient.
Any ideas?
@Jim H is almost right, but chooses the wrong ranking function:
create table #T (ID int)
insert into #T (ID)
select 1 union all
select 2 union all
select 3
select ID,COUNT(*) OVER (PARTITION BY 1) as RowCnt from #T
drop table #T
Results:
ID RowCnt
1 3
2 3
3 3
Partitioning by a constant makes it count over the whole resultset.
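The same idea can be sketched outside SQL Server with Python's built-in sqlite3 module, assuming the bundled SQLite is 3.25 or newer. This sketch uses an empty OVER () clause, which likewise treats the whole result set as a single partition; the function name is made up for illustration.

```python
import sqlite3


def rows_with_total(ids):
    """Return each ID together with the total row count of the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (ID INTEGER)")
    conn.executemany("INSERT INTO T VALUES (?)", [(i,) for i in ids])
    # An empty OVER () makes the window span every row of the result.
    query = "SELECT ID, COUNT(*) OVER () AS RowCnt FROM T ORDER BY ID"
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(rows_with_total([1, 2, 3]))
```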
Using CROSS JOIN:
SELECT a.*, b.numRows
FROM YOUR_TABLE a
CROSS JOIN (SELECT COUNT(*) AS numRows
FROM YOUR_TABLE) b
Look at the Ranking functions of SQL Server.
SELECT ROW_NUMBER() OVER (ORDER BY a) AS 'RowNumber', a, b, c
FROM table;
You could do it like this:
SELECT x.total, a, b, c
FROM
table
JOIN (SELECT total = COUNT(*) FROM table) AS x ON 1=1
which will return the total number of records in the first column, followed by fields a,b & c