How do I select those records where the group by clause returns 2 or more? - sql

I'd like to return a list of items of only those that have two or more in the group:
select count(item_id) from items group by type_id;
Specifically, I'd like to know the values of item_id when the count(item_id) == 2.

You're asking for something that's not particularly possible without a subquery.
Basically, you want to list all values in a column while aggregating on that same column. You can't do this. Aggregating on a column makes it impossible to list all of the individual values from that column.
What you can do is find all type_id values which have an item_id count equal to 2, then select all item_ids from records matching those type_id values:
SELECT item_id
FROM items
WHERE type_id IN (
SELECT type_id
FROM items
GROUP BY type_id
HAVING COUNT(item_id) = 2
)
This is best expressed using a join rather than a WHERE IN clause, but the idea is the same no matter how you approach it. You may also want to select distinct item_ids in which case you'll need the DISTINCT keyword before item_id in the outer query.
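If you want to try both forms locally, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data: type 1 has two items, type 2 has one, type 3 has three.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER, type_id INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(10, 1), (11, 1), (20, 2), (30, 3), (31, 3), (32, 3)],
)

# Subquery form: keep items whose type_id has exactly two items.
in_rows = conn.execute(
    """
    SELECT item_id
    FROM items
    WHERE type_id IN (
        SELECT type_id FROM items GROUP BY type_id HAVING COUNT(item_id) = 2
    )
    ORDER BY item_id
    """
).fetchall()

# Join form: the same grouped query, joined back to the base table.
join_rows = conn.execute(
    """
    SELECT i.item_id
    FROM items AS i
    JOIN (
        SELECT type_id FROM items GROUP BY type_id HAVING COUNT(item_id) = 2
    ) AS g ON g.type_id = i.type_id
    ORDER BY i.item_id
    """
).fetchall()

print(in_rows)
print(join_rows)
```

Both queries return the same item ids here. For the "two or more" case in the title, change `= 2` to `>= 2` in the HAVING clause.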

If your SQL dialect includes GROUP_CONCAT(), that could be used to generate a list of items without the inner query. However, the results differ; the inner query returns one item id per row, where GROUP_CONCAT() returns multiple ids as a string.
SELECT type_id, GROUP_CONCAT(item_id), COUNT(item_id) as number
FROM items
GROUP BY type_id
HAVING number = 2
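To see the difference in result shape, here is a small sketch run against SQLite (which also has GROUP_CONCAT) through Python's sqlite3 module. The rows are invented, and the HAVING clause repeats the aggregate rather than using the alias, which is the more portable spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER, type_id INTEGER)")
# Hypothetical rows: type 1 has two items, type 2 has one.
conn.executemany("INSERT INTO items VALUES (?, ?)", [(10, 1), (11, 1), (20, 2)])

rows = conn.execute(
    """
    SELECT type_id, GROUP_CONCAT(item_id) AS ids, COUNT(item_id) AS number
    FROM items
    GROUP BY type_id
    HAVING COUNT(item_id) = 2
    """
).fetchall()

# The ids arrive as one comma-separated string per group; split them to
# recover individual values. Concatenation order is not guaranteed.
type_id, ids, number = rows[0]
item_ids = sorted(int(x) for x in ids.split(","))
print(type_id, item_ids, number)
```

You get one row per type_id, so the client has to split the string to get back to individual ids.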

Try this sql query:
select count(item_id) from items group by type_id having count(item_id)=2;

Related

Why does adding GROUP BY cause a seemingly unrelated error?

The following code works fine:
SELECT name, (SELECT count(item_id) FROM bids WHERE item_id = items.id)
FROM items;
However, when I add a GROUP BY clause:
SELECT name, (SELECT count(item_id) FROM bids WHERE item_id = items.id)
FROM items
GROUP BY name;
I get ERROR: subquery uses ungrouped column "items.id" from outer query
Can anyone tell me why this is happening? Thanks!
If you GROUP BY name then any other columns you select from items must have an aggregate function applied. That's what GROUP BY means.
In your case, you are using another column from items -- id -- in a correlated scalar subquery. That's not an aggregate function, and id is not in the GROUP BY clause, so you get an error.
You could instead GROUP BY name, id. That should give you the same results as the first query, and is probably pointless.
If you actually have multiple rows in items with the same value for name, and you want to group the results of the scalar subquery for those values, you need to specify how to group them. Perhaps you want the total of the subquery results for each value of name. If so, I think you could do:
SELECT name, SUM((SELECT count(item_id) FROM bids WHERE item_id = items.id))
FROM items
GROUP BY name;
(I'm not positive about the specific syntax as I don't have a Postgres instance to test against.)
A clearer way to express it might be:
SELECT name, SUM(bid_count)
FROM (
SELECT name, (SELECT count(item_id) FROM bids WHERE item_id = items.id) AS bid_count
FROM items
) AS item_bids
GROUP BY name
Join the tables then perform the GROUP BY:
select i.name, count(b.item_id)
from items i
inner join bids b
on b.item_id = i.id
group by i.name
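The derived-table and join forms are not quite interchangeable, which a quick sketch can show. This uses SQLite via Python's sqlite3 module rather than Postgres, so it won't reproduce the original error (SQLite is more lenient about ungrouped columns); the data and the `item_bids` alias are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER, name TEXT);
    CREATE TABLE bids (item_id INTEGER);
    -- Hypothetical data: two items share the name 'lamp'; 'desk' has no bids.
    INSERT INTO items VALUES (1, 'lamp'), (2, 'lamp'), (3, 'desk');
    INSERT INTO bids VALUES (1), (1), (2);
    """
)

# Derived-table form: count bids per item, then sum per name.
summed = conn.execute(
    """
    SELECT name, SUM(bid_count)
    FROM (
        SELECT name,
               (SELECT COUNT(item_id) FROM bids WHERE item_id = items.id) AS bid_count
        FROM items
    ) AS item_bids
    GROUP BY name
    ORDER BY name
    """
).fetchall()

# Inner-join form: names whose items have no bids drop out of the result.
joined = conn.execute(
    """
    SELECT i.name, COUNT(b.item_id)
    FROM items AS i
    INNER JOIN bids AS b ON b.item_id = i.id
    GROUP BY i.name
    ORDER BY i.name
    """
).fetchall()

print(summed)
print(joined)
```

If names with zero bids should still appear, a LEFT JOIN in place of the INNER JOIN keeps them with a count of 0.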

how can I select rows that column does NOT have more than 1 value?

I am very new to SQL and I am wondering how to solve this issue. For example, my table looks as follows:
As you see in the table, item_id 1 appears in both city_id 1 and 2, and so does item_id 4, but I want to get all the items that appear in only one city_id.
In this example, these would be item_id 2 (appearing only in city_id 2) and item_id 3 (appearing in city_id 1).
Use aggregation on item_id and count distinct values of city_id. The having clause can be used to filter on aggregates.
select item_id from mytable group by item_id having count(distinct city_id) = 1
You can use the following query:
SELECT item_id
FROM table_name
GROUP BY item_id
HAVING COUNT(DISTINCT city_id) = 1
In case you want to see the city_id too, you can use this query:
SELECT item_id, MIN(city_id) AS city_id
FROM example
GROUP BY item_id
HAVING COUNT(DISTINCT city_id) = 1
Since there is only one city_id you can use MIN or MAX to get the id.
You want all the id where they have only one distinct city:
SELECT item_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
It works by counting all the different values that city_id has for the same item_id. For those item ids where they repeat a lot, but the city_id is always the same the count of unique values in the city id is 1, and we can look for these using a HAVING clause. "Having" is like a where clause that runs after a GROUP BY operation is completed. It is the conceptual equivalent of this:
SELECT item_id
FROM
(
SELECT item_id, count(distinct city_id) as cdci
FROM table
GROUP BY item_id
) x
WHERE cdci = 1
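A quick way to convince yourself the two forms agree is to run both against the same data. This is a sketch using Python's sqlite3 module; the rows are reconstructed from the description in the question (items 1 and 4 in two cities, items 2 and 3 in one), and `mytable` is a placeholder name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (item_id INTEGER, city_id INTEGER)")
# Hypothetical rows: items 1 and 4 appear in two cities, items 2 and 3 in one.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 2), (3, 1), (4, 1), (4, 2)],
)

# HAVING filters on the aggregate directly.
having_rows = conn.execute(
    "SELECT item_id FROM mytable GROUP BY item_id "
    "HAVING COUNT(DISTINCT city_id) = 1 ORDER BY item_id"
).fetchall()

# Equivalent: compute the aggregate in a subquery, then filter with WHERE.
where_rows = conn.execute(
    """
    SELECT item_id
    FROM (
        SELECT item_id, COUNT(DISTINCT city_id) AS cdci
        FROM mytable
        GROUP BY item_id
    ) AS x
    WHERE cdci = 1
    ORDER BY item_id
    """
).fetchall()

print(having_rows)
print(where_rows)
```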
If you want the city id too you can either get the MAX city (because in this case there is only one city so it's safe to do):
SELECT item_id, MAX(city_id) as city_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
or you could join this query back to the item table as a subquery:
SELECT t.*
FROM
(
SELECT item_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
) x
INNER JOIN
table t
ON x.item_id = t.item_id
This technique is the more general process for performing a group by that finds some particular set of rows, then bringing in the rest of the data from those rows. You can't always stick every other column you want in a MAX because it will mix row data up, and you can't put the extra columns in your group by because that will subdivide what you're grouping on, giving the wrong results. Doing the group as a subquery and joining it back is a typical way to get all the row data when you have to group it to find which rows are interesting.
In your case this form of query will bring back all the duplicated rows (whereas the group by/max won't). If you don't want the duplicate rows you can make the top line SELECT DISTINCT t.*, but don't make a habit of slapping DISTINCT in to get rid of duplicated rows; if your tables don't have duplicates to start with but suddenly after you wrote a JOIN you got duplicated rows, google for what a Cartesian product is in database queries and how to prevent it.
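Here is a small sketch of the difference between the join-back and the group by/max forms, using Python's sqlite3 module. The `note` column and all of the rows are invented to stand in for "the rest of the data"; item 2 deliberately appears twice in the same city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (item_id INTEGER, city_id INTEGER, note TEXT)")
# Hypothetical rows: item 1 spans two cities; item 2 repeats within one city.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 2, "b"), (2, 2, "c"), (2, 2, "d"), (3, 1, "e")],
)

# Join-back form: every original row of the qualifying items comes back.
joined_back = conn.execute(
    """
    SELECT t.*
    FROM (
        SELECT item_id
        FROM mytable
        GROUP BY item_id
        HAVING COUNT(DISTINCT city_id) = 1
    ) AS x
    INNER JOIN mytable AS t ON x.item_id = t.item_id
    ORDER BY t.item_id, t.note
    """
).fetchall()

# Group by/max form: exactly one row per qualifying item.
collapsed = conn.execute(
    """
    SELECT item_id, MAX(city_id) AS city_id
    FROM mytable
    GROUP BY item_id
    HAVING COUNT(DISTINCT city_id) = 1
    ORDER BY item_id
    """
).fetchall()

print(joined_back)
print(collapsed)
```

The join-back result keeps both rows for item 2 along with their `note` values, while the max form collapses them into one.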
You just need a group by on item id with having
Select item_id
from table
group by item_id
having count(distinct city_id) = 1
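Also, if you want the result to keep most of the input rows, use rank() instead. The rank has to be computed in a subquery, because a window alias cannot be referenced in the WHERE clause of the same SELECT:
Select item_id, city_id
From (
Select item_id, city_id, rank() over (partition by item_id order by city_id) rn
From table
) x
Where rn = 1;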

How to select all columns and count from a table?

I'm trying to select all columns in table top_teams_team as well as get a count of values for the hash_value column. The sql statement here is partially working in that it returns two columns, hash_value and total. I still want it to give me all the columns of the table as well.
select hash_value, count(hash_value) as total
from top_teams_team
group by hash_value
In the sql statement below, it gives me all the columns, but there are duplicates hash_value being displayed which isn't what I want. I tried putting distinct keyword in but it wasn't working correctly or maybe I'm not putting it in the right place.
select *
from top_teams_team
inner join (
select hash_value, count(hash_value) as total
from top_teams_team
group by hash_value
) q
on q.hash_value = top_teams_team.hash_value
A combination of a window function with DISTINCT ON might do what you are looking for:
SELECT DISTINCT ON (hash_value)
*, COUNT(*) OVER (PARTITION BY hash_value) AS total_rows
FROM top_teams_team
-- ORDER BY hash_value, ???
;
DISTINCT ON is applied after the window function, so Postgres first counts rows per distinct hash_value before picking the first row per group (incl. that count).
The query picks an arbitrary row from each group. If you want a specific one, add ORDER BY expressions accordingly.
This is not "a count of values for the hash_value column" but a count of rows per distinct hash_value. I guess that's what you meant.
Detailed explanation:
Best way to get result count before LIMIT was applied
Select first row in each GROUP BY group?
Depending on undisclosed information there may be (much) faster query styles ...
Optimize GROUP BY query to retrieve latest row per user
I am assuming that you are getting duplicate columns when you say: "but there are duplicates hash_value being displayed"
select q.hash_value, q.total, ttt.field1, ttt.field2, ttt.field3
from top_teams_team ttt
join (
select hash_value, count(hash_value) as total
from top_teams_team
group by hash_value
) q
on q.hash_value = ttt.hash_value
Try using COUNT as an analytic function:
SELECT *, COUNT(*) OVER (PARTITION BY hash_value) total
FROM top_teams_team;
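For a quick local check of the window-function approach, here is a sketch using Python's sqlite3 module (it needs SQLite 3.25 or newer for window functions). SQLite has no DISTINCT ON, so the one-row-per-hash_value variant uses ROW_NUMBER() as a stand-in; the `id` column and the rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE top_teams_team (id INTEGER, hash_value TEXT)")
# Hypothetical rows: hash 'a' occurs twice, hash 'b' once.
conn.executemany(
    "INSERT INTO top_teams_team VALUES (?, ?)", [(1, "a"), (2, "a"), (3, "b")]
)

# Every row, each annotated with the row count of its hash_value group.
all_rows = conn.execute(
    """
    SELECT *, COUNT(*) OVER (PARTITION BY hash_value) AS total
    FROM top_teams_team
    ORDER BY id
    """
).fetchall()

# One row per hash_value: number the rows in each group and keep the first.
one_per_hash = conn.execute(
    """
    SELECT id, hash_value, total
    FROM (
        SELECT *,
               COUNT(*) OVER (PARTITION BY hash_value) AS total,
               ROW_NUMBER() OVER (PARTITION BY hash_value ORDER BY id) AS rn
        FROM top_teams_team
    ) AS t
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(all_rows)
print(one_per_hash)
```

The ORDER BY inside ROW_NUMBER() decides which row survives per group, which plays the same role as the ORDER BY you would add to the DISTINCT ON query.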

count the total number of column field appeared more than once in database

I am trying to run the query to get the total number of repetitions (appeared more than once) for one column called "abc" . I am trying this but not able to achieve.
select COUNT(SELECT DISTINCT card_no, COUNT(*) AS cnt )
please help, thanks in advance.
For Example below is the column :
cards
123
456
123
Result:
Count
1
As 123 appeared more than once.
You want the number of distinct values in the column that are repeated at least once, is that right?
SELECT COUNT(dupes)
FROM (SELECT card_no AS dupes, COUNT(*) cnt FROM table_name
GROUP BY card_no HAVING COUNT(*) > 1) A
Edit for explanation.
The inner query SELECT card_no AS dupes, COUNT(*) cnt FROM table_name GROUP BY card_no HAVING COUNT(*) > 1 returns only those values that are repeated in your table. The aliases on the columns are necessary because it's a subquery. You can run this query independently of the outer query to see what results it returns.
You have to have the group by on any field that you don't want to aggregate when you're aggregating other fields (e.g. performing a count of records), and the HAVING part is to filter out anything that isn't duplicated (i.e. has a count of 1). HAVING is the way to apply filtering on aggregated fields that you can't have in a WHERE.
The outer query SELECT COUNT(dupes)... is merely counting the number of card_no values returned by the inner query. Since these are grouped, it gives the number of distinct values that are duplicated.
A at the end there sets up an alias for the subquery so that it can be referenced like it's an actual table elsewhere in the query. This is necessary for any subquery in the FROM clause of another query. Effectively the select in the outer query reads SELECT COUNT(A.dupes)... and without the alias A there would be no way to qualify where the dupes field is being referenced from (even though in this case it's implied).
It's also worth noting that the field COUNT(*) cnt isn't required in the SELECT part of the subquery as it isn't being used anywhere else in the query. It will work just as effectively without it, as long as you still have the GROUP BY and HAVING clauses.
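The example from the question can be checked with a short sketch using Python's sqlite3 module. `table_name` is the placeholder from the query above, and the three rows are the ones given in the question; the unused cnt column is left out of the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (card_no INTEGER)")
# Rows from the question: 123 appears twice, 456 once.
conn.executemany("INSERT INTO table_name VALUES (?)", [(123,), (456,), (123,)])

# Inner query: the values that occur more than once.
dupes = conn.execute(
    "SELECT card_no FROM table_name GROUP BY card_no HAVING COUNT(*) > 1"
).fetchall()

# Outer query: how many such values there are.
dupe_count = conn.execute(
    """
    SELECT COUNT(dupes)
    FROM (
        SELECT card_no AS dupes
        FROM table_name
        GROUP BY card_no
        HAVING COUNT(*) > 1
    ) A
    """
).fetchone()[0]

print(dupes, dupe_count)
```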
SELECT
card_no, COUNT(*) AS "Occurrences"
FROM
YourTable
GROUP BY card_no
HAVING
COUNT(*) > 1

BigQuery how to group by after flattening a collection of tables over timerange

I'm trying to do the following:
combine tables over a timerange using FROM TABLE_DATE_RANGE
FLATTEN that set of data
GROUP BY ColumnX
SELECT ColumnX, SUM(ColumnY), SUM(ColumnZ) over only unique ColumnX values.
here's the gist of my query:
SELECT
r.ColumnX
,SUM(r.ColumnY)
,SUM(r.ColumnZ)
FROM
(
SELECT *
FROM FLATTEN(
(
SELECT
ColumnX
,ColumnY
,ColumnZ
FROM TABLE_DATE_RANGE(projectx.events_,
TIMESTAMP('2015-09-01'), TIMESTAMP('2015-09-08'))), my_funky_object
)
WHERE ColumnY > 10
) r
GROUP BY
r.ColumnX
The problem is that I get far more rows than there are unique values of ColumnX. So I took a step back and output only the GROUP BY count of ColumnX in order to debug, and I get what looks like an intermediate result.
What is happening and how do I ensure that my outer select only aggregates over unique values of ColumnX?
You're getting the count of each distinct value of ColumnX, but you're only showing the count, not the value.
If your goal is to get an accurate count for the number of distinct values, try something like this:
SELECT
COUNT(*) ct
FROM (
SELECT
1
FROM
... rest of your query ...
GROUP BY r.ColumnX
)
That inner query will give you exactly one row (each with the value 1) for each distinct value of ColumnX. The outer select statement will count the number of such rows.
Another alternative is to use EXACT_COUNT_DISTINCT to get the exact number of distinct ColumnX values. That's simpler but less scalable than using GROUP BY.
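The equivalence between the two counting approaches can be sketched outside BigQuery. This uses SQLite through Python's sqlite3 module purely as a stand-in: there is no TABLE_DATE_RANGE or FLATTEN here, COUNT(DISTINCT ...) plays the role of EXACT_COUNT_DISTINCT, and the `events` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ColumnX TEXT, ColumnY INTEGER)")
# Hypothetical rows: 'a' passes the filter twice, 'b' once, 'c' not at all.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 11), ("a", 12), ("b", 20), ("c", 5)],
)

# One row per distinct ColumnX from the inner query, counted by the outer one.
grouped_count = conn.execute(
    """
    SELECT COUNT(*) AS ct
    FROM (
        SELECT 1
        FROM events
        WHERE ColumnY > 10
        GROUP BY ColumnX
    )
    """
).fetchone()[0]

# Direct distinct count over the same filtered rows.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT ColumnX) FROM events WHERE ColumnY > 10"
).fetchone()[0]

print(grouped_count, distinct_count)
```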