How do I use T-SQL Group By - sql

I know I need to have (although I don't know why) a GROUP BY clause on the end of a SQL query that uses any aggregate functions like count, sum, avg, etc:
SELECT count(userID), userName
FROM users
GROUP BY userName
When else would GROUP BY be useful, and what are the performance ramifications?

To retrieve the number of widgets from each widget category that has more than 5 widgets, you could do this:
SELECT WidgetCategory, count(*)
FROM Widgets
GROUP BY WidgetCategory
HAVING count(*) > 5
The "having" clause is something people often forget about, instead opting to retrieve all their data to the client and iterating through it there.
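To see the difference in practice, here is a minimal sketch that runs the same HAVING query against an in-memory SQLite database from Python. The sample rows and the WidgetID column are made up for illustration, and SQLite stands in for SQL Server here, so treat it as a demonstration of the idea rather than T-SQL-specific behaviour.

```python
import sqlite3

# Hypothetical sample data: "gears" has 6 widgets, "bolts" has only 2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Widgets (WidgetID INTEGER PRIMARY KEY, WidgetCategory TEXT)"
)
conn.executemany(
    "INSERT INTO Widgets (WidgetCategory) VALUES (?)",
    [("gears",)] * 6 + [("bolts",)] * 2,
)

# WHERE filters rows before grouping; HAVING filters the groups afterwards,
# so only categories with more than 5 widgets ever reach the client.
big_categories = conn.execute(
    """
    SELECT WidgetCategory, COUNT(*) AS WidgetCount
    FROM Widgets
    GROUP BY WidgetCategory
    HAVING COUNT(*) > 5
    """
).fetchall()

print(big_categories)  # [('gears', 6)]
```

The filtering happens inside the database, so the "bolts" group is never sent to the client at all.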

GROUP BY is similar to DISTINCT in that it groups multiple records into one.
This example, borrowed from http://www.devguru.com/technologies/t-sql/7080.asp, lists distinct products in the Products table.
SELECT Product FROM Products GROUP BY Product
Product
-------------
Desktop
Laptop
Mouse
Network Card
Hard Drive
Software
Book
Accessory
The advantage of GROUP BY over DISTINCT is that it gives you finer control when combined with a HAVING clause.
SELECT Product, count(Product) as ProdCnt
FROM Products
GROUP BY Product
HAVING count(Product) > 2
Product ProdCnt
--------------------
Desktop 10
Laptop 5
Mouse 3
Network Card 9
Software 6
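As a rough check of the comparison above, this sketch runs DISTINCT, a bare GROUP BY, and GROUP BY with HAVING side by side in SQLite via Python. The three-product sample is invented and much smaller than the table in the borrowed example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Product TEXT)")
# Hypothetical rows: 3 desktops, 3 laptops, 1 book.
conn.executemany(
    "INSERT INTO Products VALUES (?)",
    [("Desktop",)] * 3 + [("Laptop",)] * 3 + [("Book",)],
)

# DISTINCT and a bare GROUP BY collapse duplicates to the same set of rows.
distinct_rows = conn.execute(
    "SELECT DISTINCT Product FROM Products ORDER BY Product"
).fetchall()
grouped_rows = conn.execute(
    "SELECT Product FROM Products GROUP BY Product ORDER BY Product"
).fetchall()

# Only GROUP BY can additionally filter on a per-group aggregate.
frequent = conn.execute(
    """
    SELECT Product, COUNT(Product) AS ProdCnt
    FROM Products
    GROUP BY Product
    HAVING COUNT(Product) > 2
    ORDER BY Product
    """
).fetchall()

print(distinct_rows == grouped_rows)  # True
print(frequent)  # [('Desktop', 3), ('Laptop', 3)]
```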

GROUP BY generally forces the entire input set to be processed before any records are returned, since the engine typically has to sort or hash the rows to form the groups.
For that reason, be careful about using GROUP BY in a subquery over a large set.

Counting the number of times tags are used might be a good example:
SELECT TagName, COUNT(*) AS TimesUsed
FROM Tags
GROUP BY TagName
ORDER BY TimesUsed
If you simply want the distinct tag values, I would prefer to use the DISTINCT keyword.
SELECT DISTINCT TagName
FROM Tags
ORDER BY TagName ASC

GROUP BY also helps when you want to generate a report that averages or sums a bunch of data. You can GROUP BY the Department ID and then SUM all the sales revenue, or AVG the count of sales for each month.
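A small sketch of such a report, run in SQLite from Python. The Sales table, its columns (DepartmentID, SaleMonth, Revenue) and the rows are all assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (DepartmentID INTEGER, SaleMonth TEXT, Revenue REAL)")
# Hypothetical rows: department 1 has three sales over two months, department 2 has one.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        (1, "2021-01", 100.0),
        (1, "2021-01", 50.0),
        (1, "2021-02", 200.0),
        (2, "2021-01", 40.0),
    ],
)

# Total revenue per department.
revenue_by_dept = conn.execute(
    """
    SELECT DepartmentID, SUM(Revenue) AS TotalRevenue
    FROM Sales
    GROUP BY DepartmentID
    ORDER BY DepartmentID
    """
).fetchall()

# Average number of sales per month: count per (department, month) first,
# then average those counts per department.
avg_sales_per_month = conn.execute(
    """
    SELECT DepartmentID, AVG(SalesInMonth) AS AvgSalesPerMonth
    FROM (SELECT DepartmentID, SaleMonth, COUNT(*) AS SalesInMonth
          FROM Sales
          GROUP BY DepartmentID, SaleMonth) AS monthly
    GROUP BY DepartmentID
    ORDER BY DepartmentID
    """
).fetchall()

print(revenue_by_dept)      # [(1, 350.0), (2, 40.0)]
print(avg_sales_per_month)  # [(1, 1.5), (2, 1.0)]
```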

Related

How can I show the top rows that cover 50% of the total visitors in SQL

Problem: I want to fetch a set of data from a table related to search keyword metrics. I want to fetch only the keywords that cover 50% of the total unique visitors. The overall code is given below -
SELECT se_keyword
,COUNT(DISTINCT visitor_id) AS Distinct_Visitors
FROM search_table
WHERE DATE >= '20210207'
GROUP BY se_keyword
ORDER BY Distinct_Visitors DESC
This shows every search keyword with its number of unique visitors. But I want to show only the top keywords, ranked by unique visitors, that together cover 50% of the total unique visitors.
This is a tricky problem. One method is the following:
1. Reduce the data to one row per user and keyword (not necessary if there are no duplicates).
2. Calculate a running count of distinct users using count(distinct) as a window function.
3. Filter for the conditions you want.
Here is what the logic looks like:
select distinct ku.keyword, ku.running_num_users
from (select ku.*,
             count(distinct userid) over (order by num_users desc) as running_num_users,
             count(distinct userid) over () as num_users_overall
      from (select keyword, userid,
                   count(*) over (partition by keyword) as num_users
            from t
            group by keyword, userid
           ) ku
     ) ku
where running_num_users <= 0.5 * num_users_overall;
Note that not all databases support count(distinct) as a window function. There are simple workarounds, however.
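One possible workaround, sketched here in SQLite (which does not accept DISTINCT inside a window aggregate) and run from Python: rank the keywords, find the best-ranked keyword each user appears under, and count how many users have been "first seen" up to each rank. This is an assumption about what such a workaround could look like, not necessarily the one the answer had in mind. It breaks ties between equally popular keywords by name, needs SQLite 3.25+ for ROW_NUMBER, and uses invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (keyword TEXT, userid INTEGER)")
# Hypothetical sample: 10 distinct users spread over six keywords.
rows = (
    [("a", u) for u in (1, 2, 3, 4)]
    + [("b", u) for u in (3, 4, 5)]
    + [("c", u) for u in (6, 7)]
    + [("d", 8), ("e", 9), ("f", 10)]
)
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

query = """
WITH kw AS (              -- distinct users per keyword
    SELECT keyword, COUNT(DISTINCT userid) AS num_users
    FROM t
    GROUP BY keyword
),
ranked AS (               -- most popular keyword first; ties broken by name
    SELECT keyword, num_users,
           ROW_NUMBER() OVER (ORDER BY num_users DESC, keyword) AS rn
    FROM kw
),
first_seen AS (           -- best-ranked keyword each user appears under
    SELECT t.userid, MIN(r.rn) AS first_rn
    FROM t
    JOIN ranked r ON r.keyword = t.keyword
    GROUP BY t.userid
)
SELECT r.keyword, r.num_users,
       (SELECT COUNT(*) FROM first_seen f
        WHERE f.first_rn <= r.rn) AS running_num_users
FROM ranked r
WHERE (SELECT COUNT(*) FROM first_seen f WHERE f.first_rn <= r.rn)
      <= 0.5 * (SELECT COUNT(DISTINCT userid) FROM t)
ORDER BY r.rn
"""
top_keywords = conn.execute(query).fetchall()

print(top_keywords)  # [('a', 4, 4), ('b', 3, 5)]
```

Keyword "b" has three users, but two of them were already counted under "a", so the running distinct count only grows from 4 to 5, which is exactly half of the 10 users.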

ORDER BY an aggregated column in Report Builder 3.0

In Report Builder 3.0, I retrieved some items and counted them using a Count aggregate. Now I want to order them from highest to lowest. How do I use ORDER BY on the aggregated column? The picture below shows the column that I want to ORDER BY; it is ticked.
Pic
The code is very simple, as shown below:
SELECT DISTINCT act_id, NameOfAct
FROM Acts
Your picture indicates you also want a Total row at the bottom:
SELECT
COALESCE(NameOfAct,'Total') NameOfAct,
COUNT(DISTINCT act_id) c
FROM Acts
GROUP BY ROLLUP(NameOfAct)
ORDER BY
CASE WHEN NameOfAct is null THEN 1 ELSE 0 END,
c DESC;
Result of example data:
NameOfAct count
-------------- -------
Act_B 3
Act_A 2
Act_Z 1
Total 6
Try it with example rows at: http://sqlfiddle.com/#!18/dbd6c/2
I looked at the Pic. So you might have duplicate acts with the same name. And you want to know the number of acts that have the same unique name.
You might want to group the results by name:
GROUP BY NameOfAct
And include the act names and their counts in the query results:
SELECT NameOfAct, COUNT(*) AS ActCount
(Since the act_id column is not included in the groups, you need to omit it in the SELECT. The DISTINCT is also not necessary anymore, since all groups are unique already.)
Finally, you can sort the data (probably descending to get the acts with the largest count on top):
ORDER BY ActCount DESC
Your complete query would become something like this:
SELECT NameOfAct, COUNT(*) AS ActCount
FROM Acts
GROUP BY NameOfAct
ORDER BY ActCount DESC
Edit:
By the way, you use field "act_id" in your SELECT clause. That's somewhat confusing. If you want to know counts, you want to look at either the complete table data or group the table data into smaller groups (with the GROUP BY clause). Then you can use aggregate functions to get more information about those groups (or the whole table), like counts, average values, minima, maxima...
Single record information, like an act's ID in your case, is typically not important if you want to use statistic/aggregate methods on grouped data. Suppose your query returns an act name which is used 10 times. Then you have 10 records in your table, each with a unique act_id, but with the same name.
If you need just one act_id that represents each group / act name (and assuming act_id is an autonumbering field), you might include the latest / largest act_id value in the query using the MAX aggregate function:
SELECT NameOfAct, COUNT(*) AS ActCount, MAX(act_id) AS LatestActId
(The rest of the query remains the same.)
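Putting those pieces together, here is a sketch of the full query run in SQLite from Python. The rows are invented so that the counts match the example result shown earlier (Act_B 3, Act_A 2, Act_Z 1), and act_id is assumed to be an auto-incrementing key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Acts (act_id INTEGER PRIMARY KEY, NameOfAct TEXT)")
# Hypothetical rows; act_id is assigned 1..6 in insertion order.
conn.executemany(
    "INSERT INTO Acts (NameOfAct) VALUES (?)",
    [("Act_A",), ("Act_B",), ("Act_B",), ("Act_A",), ("Act_Z",), ("Act_B",)],
)

# One row per act name, with its count and the largest (latest) act_id,
# sorted so the most frequent name comes first.
act_counts = conn.execute(
    """
    SELECT NameOfAct, COUNT(*) AS ActCount, MAX(act_id) AS LatestActId
    FROM Acts
    GROUP BY NameOfAct
    ORDER BY ActCount DESC
    """
).fetchall()

print(act_counts)  # [('Act_B', 3, 6), ('Act_A', 2, 4), ('Act_Z', 1, 5)]
```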

Count(), max(), min() functions definition with many selects

Let's say we have a view/table hotel(hotel_n, hotel_name, room_n, price). I want to find the cheapest room. I tried GROUP BY room_n, but I want the hotel's name (hotel_name) to be shown in the output without grouping by it.
So, as an amateur with SQL (Oracle 11g), I began with
select hotel_n, room_n, min(price)
from hotel
group by room_n;
but it shows the error: ORA-00979: not a GROUP BY expression. I know I have to type group by room_n, hotel_n, but I want the hotel_n to be seen in the table that I make without grouping by it!
Any ideas? thank you very much!
Aggregate functions are useful to show, well, aggregate information per group of rows. If you want to get a specific row from a group of rows in relation to the other group members (e.g., the cheapest room per room_n), you'd probably need an analytic function, such as rank:
SELECT hotel_n, hotel_name, room_n, price
FROM (SELECT hotel_n, hotel_name, room_n, price,
             RANK() OVER (PARTITION BY room_n ORDER BY price ASC) rk
      FROM hotel) t
WHERE rk = 1
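For a quick sanity check of the RANK approach, here is a sketch that runs it in SQLite from Python instead of Oracle 11g (window functions need SQLite 3.25+). The hotel names, room numbers and prices are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hotel (hotel_n INTEGER, hotel_name TEXT, room_n INTEGER, price REAL)"
)
# Hypothetical rows: two hotels offering the same two room numbers.
conn.executemany(
    "INSERT INTO hotel VALUES (?, ?, ?, ?)",
    [
        (1, "Alpha", 101, 80.0),
        (2, "Beta", 101, 60.0),
        (1, "Alpha", 102, 120.0),
        (2, "Beta", 102, 150.0),
    ],
)

# Rank the rows within each room_n by price and keep only rank 1, so the
# hotel columns come along without being part of any GROUP BY.
cheapest = conn.execute(
    """
    SELECT hotel_n, hotel_name, room_n, price
    FROM (SELECT hotel_n, hotel_name, room_n, price,
                 RANK() OVER (PARTITION BY room_n ORDER BY price ASC) AS rk
          FROM hotel) t
    WHERE rk = 1
    ORDER BY room_n
    """
).fetchall()

print(cheapest)  # [(2, 'Beta', 101, 60.0), (1, 'Alpha', 102, 120.0)]
```

Note that RANK returns every row tied for the lowest price; ROW_NUMBER would keep exactly one row per room_n instead.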

Can peewee nest SELECT queries such that the outer query selects on an aggregate of the inner query?

I'm using peewee2.1 with python3.3 and an sqlite3.7 database.
I want to perform certain SELECT queries in which:
1. I first select some aggregate (count, sum), grouping by some id column; then
2. I select from the results of (1), aggregating over its aggregate. Specifically, I want to count the number of rows in (1) that have each aggregated value.
My database has an 'Event' table with 1 record per event, and a 'Ticket' table with 1..N tickets per event. Each ticket record contains the event's id as a foreign key. Each ticket also contains a 'seats' column that specifies the number of seats purchased. (A "ticket" is really best thought of as a purchase transaction for 1 or more seats at the event.)
Below are two examples of working SQLite queries of this sort that give me the desired results:
SELECT ev_tix, count(1) AS ev_tix_n FROM
(SELECT count(1) AS ev_tix FROM ticket GROUP BY event_id)
GROUP BY ev_tix
SELECT seat_tot, count(1) AS seat_tot_n FROM
(SELECT sum(seats) AS seat_tot FROM ticket GROUP BY event_id)
GROUP BY seat_tot
But using Peewee, I don't know how to select on the inner query's aggregate (count or sum) when specifying the outer query. I can of course specify an alias for that aggregate, but it seems I can't use that alias in the outer query.
I know that Peewee has a mechanism for executing "raw" SQL queries, and I've used that workaround successfully. But I'd like to understand if / how these queries can be done using Peewee directly.
I posted the same question on the peewee-orm Google group. Charles Leifer responded promptly with both an answer and new commits to the peewee master. So although I'm answering my own question, obviously all credit goes to him.
You can see that thread here: https://groups.google.com/forum/#!topic/peewee-orm/FSHhd9lZvUE
But here's the essential part, which I've copied from Charles' response to my post:
I've added a couple commits to master which should make your queries possible (https://github.com/coleifer/peewee/commit/22ce07c43cbf3c7cf871326fc22177cc1e5f8345).
Here is the syntax, roughly, for your first example:
SELECT ev_tix, count(1) AS ev_tix_n FROM
(SELECT count(1) AS ev_tix FROM ticket GROUP BY event_id)
GROUP BY ev_tix
ev_tix = SQL('ev_tix') # the name of the alias.
(Ticket
.select(ev_tix, fn.count(ev_tix).alias('ev_tix_n'))
.from_(
Ticket.select(fn.count(Ticket.id).alias('ev_tix')).group_by(Ticket.event))
.group_by(ev_tix))
This yields the following SQL:
SELECT ev_tix, count(ev_tix) AS ev_tix_n FROM (SELECT Count(t2."id")
AS ev_tix FROM "ticket" AS t2 GROUP BY t2."event_id")
GROUP BY ev_tix
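To make the expected result of these nested aggregates concrete, here is a sketch that runs both raw queries from the question with the standard sqlite3 module, without peewee. The ticket rows are invented, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket (id INTEGER PRIMARY KEY, event_id INTEGER, seats INTEGER)"
)
# Hypothetical rows: events 1 and 2 have two tickets each, event 3 has one;
# every event happens to total 5 seats.
conn.executemany(
    "INSERT INTO ticket (event_id, seats) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 1), (2, 4), (3, 5)],
)

# How many events have each ticket count?
tickets_histogram = conn.execute(
    """
    SELECT ev_tix, count(1) AS ev_tix_n FROM
      (SELECT count(1) AS ev_tix FROM ticket GROUP BY event_id)
    GROUP BY ev_tix
    ORDER BY ev_tix
    """
).fetchall()

# How many events have each seat total?
seats_histogram = conn.execute(
    """
    SELECT seat_tot, count(1) AS seat_tot_n FROM
      (SELECT sum(seats) AS seat_tot FROM ticket GROUP BY event_id)
    GROUP BY seat_tot
    ORDER BY seat_tot
    """
).fetchall()

print(tickets_histogram)  # [(1, 1), (2, 2)]
print(seats_histogram)    # [(5, 3)]
```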

SQL grouping results in a select

My SQL table "offers" contains offers users make for products (product_ID, customer_ID, offer).
In my admin page, I want to list the products for which at least one offer exists and show the total offers existing for it.
For example,
PRODUCT #324 Total offers: 42
PRODUCT #99 Total offers: 1
etc.
My guess would be to combine a
SELECT DISTINCT product_ID FROM offers...
And in a second query, to SELECT COUNT(*) FROM offers WHERE product_ID=...
Is it the most efficient way to achieve this, or is there a way to make it inside a single query?
You can do this in one query which will get the count by grouping by the product_id:
SELECT product_ID, COUNT(*)
FROM offers
GROUP BY product_ID
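As a rough illustration of how the single query feeds the admin page, this sketch runs it in SQLite from Python and formats the rows like the example output in the question. The offers rows and the totals alias are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offers (product_ID INTEGER, customer_ID INTEGER, offer REAL)")
# Hypothetical rows: three offers for product 324, one for product 99.
conn.executemany(
    "INSERT INTO offers VALUES (?, ?, ?)",
    [(324, 1, 10.0), (324, 2, 12.5), (324, 3, 9.0), (99, 1, 5.0)],
)

# One round trip: products without offers never appear, because there are
# no rows for them to form a group.
totals = conn.execute(
    """
    SELECT product_ID, COUNT(*) AS totals
    FROM offers
    GROUP BY product_ID
    ORDER BY totals DESC
    """
).fetchall()

lines = [f"PRODUCT #{product_id} Total offers: {count}" for product_id, count in totals]
print("\n".join(lines))
# PRODUCT #324 Total offers: 3
# PRODUCT #99 Total offers: 1
```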
As bluefeet already answered, you can achieve this in a single query by using GROUP BY.
Another thing worth mentioning is ORDER BY:
select
product_id as id,
count(*) as totals
from
t
group by product_id
order by totals;
This sorts the results by the offer totals; to sort by product instead, use ORDER BY product_id.