SQL to count table column

I have a single table as shown below:
ID | Name | Category
--------------------
1 | Cat | Mammal
2 | Dog | Mammal
3 | Pea | Vegetable
4 | Snake | Reptile
I would like an SQL query that lists each item together with the number of items in its category, i.e.:
Name | Count of Category
-------------------------
Cat | 2
Dog | 2
Pea | 1
Snake | 1
Edit 1: I am using PostgreSQL.

If your DBMS supports window functions (PostgreSQL does), you can use COUNT as a window function and put Category in the PARTITION BY, so that each row gets the count of rows in its category:
SELECT Name, COUNT(*) OVER (PARTITION BY Category) AS "Count of Category"
FROM T

If possible, I would also prefer a window function as D-Shih showed. If your DBMS doesn't support window functions, you can count each category in a subquery and join the result back to the table, something like this:
SELECT name, c "Count of Category" FROM
yourtable y JOIN
(SELECT category, COUNT(category) c FROM yourtable GROUP BY category) sub
ON y.category = sub.category;
This produces the same result.
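To check that both answers really return the same counts, here is a small sketch that runs them against an in-memory copy of the question's table using Python's built-in `sqlite3` module. It assumes SQLite 3.25 or newer (the first version with window functions); the table name `T` comes from the first answer, and the alias `cnt` is only for illustration.

```python
import sqlite3

# In-memory copy of the example table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, Name TEXT, Category TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, "Cat", "Mammal"), (2, "Dog", "Mammal"),
     (3, "Pea", "Vegetable"), (4, "Snake", "Reptile")],
)

# Approach 1: COUNT(*) used as a window function, partitioned by Category.
window_rows = conn.execute(
    "SELECT Name, COUNT(*) OVER (PARTITION BY Category) AS cnt "
    "FROM T ORDER BY ID"
).fetchall()

# Approach 2: aggregate per category in a subquery, then join back to each row.
join_rows = conn.execute(
    "SELECT y.Name, sub.c AS cnt "
    "FROM T y JOIN (SELECT Category, COUNT(Category) AS c "
    "               FROM T GROUP BY Category) sub "
    "ON y.Category = sub.Category ORDER BY y.ID"
).fetchall()

print(window_rows)             # [('Cat', 2), ('Dog', 2), ('Pea', 1), ('Snake', 1)]
print(join_rows == window_rows)  # True
```

The window version reads the table once; the join version is the portable fallback for engines without window functions.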

Related

Counting SQLite rows that might match multiple times in a single query

I have a SQLite table which has a column containing categories that each row may fall into. Each row has a unique ID, but may fall into zero, one, or more categories, for example:
|-------+-------|
| name | cats |
|-------+-------|
| xyzzy | a b c |
| plugh | b |
| quux | |
| quuux | a c |
|-------+-------|
I'd like to obtain counts of how many items are in each category. In other words, output like this:
|------------+-------|
| categories | total |
|------------+-------|
| a | 2 |
| b | 2 |
| c | 2 |
| none | 1 |
|------------+-------|
I tried to use a CASE expression like this:
select case
when cats like '%a%' then 'a'
when cats like '%b%' then 'b'
when cats like '%c%' then 'c'
else 'none'
end as categories,
count(*) as total
from test
group by categories
But the problem is that CASE returns only the first matching branch, so each row is counted once, under its first category, and it can't handle rows with multiple categories. I get this output instead:
|------------+-------|
| categories | total |
|------------+-------|
| a | 2 |
| b | 1 |
| none | 1 |
|------------+-------|
One possibility is to use one UNION ALL branch per category:
select 'a' as categories, count(*) as total
from test
where cats like '%a%'
union all
select 'b', count(*)
from test
where cats like '%b%'
union all
...
But this seems really ugly and the opposite of DRY.
Is there a better way?
Fix your data structure! You should have a table with one row per (name, category) pair:
create table nameCategories (
name varchar(255),
category varchar(255)
);
Then your query would be easy:
select category, count(*)
from namecategories
group by category;
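A minimal sketch of the normalized design, using Python's built-in `sqlite3` with the question's data. One consequence worth seeing: an item with no categories (quux) simply has no rows in `nameCategories`, so the "none" count needs a list of all items. The `items` table below is a hypothetical addition for that purpose, not something from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (name, category) pair, as the answer recommends.
conn.execute("CREATE TABLE nameCategories (name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO nameCategories VALUES (?, ?)",
    [("xyzzy", "a"), ("xyzzy", "b"), ("xyzzy", "c"),
     ("plugh", "b"), ("quuux", "a"), ("quuux", "c")],
)

# The per-category count becomes a plain GROUP BY.
counts = conn.execute(
    "SELECT category, COUNT(*) FROM nameCategories "
    "GROUP BY category ORDER BY category"
).fetchall()

# Hypothetical table of all items, used only to find items with no category.
conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)",
                 [("xyzzy",), ("plugh",), ("quux",), ("quuux",)])
none_count = conn.execute(
    "SELECT COUNT(*) FROM items i "
    "LEFT JOIN nameCategories nc ON nc.name = i.name "
    "WHERE nc.name IS NULL"
).fetchone()[0]

print(counts)      # [('a', 2), ('b', 2), ('c', 2)]
print(none_count)  # 1
```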
Why is your data structure bad? Here are some reasons:
A column should contain a single value.
SQL has pretty lousy string functionality.
Queries that search inside a string with LIKE '%...%' cannot use an index, so they cannot be optimized.
SQL has a great data structure for storing lists. It is called a table, not a string.
With that in mind, here is one brute force method for doing what you want:
with categories as (
select 'a' as category union all
select 'b' union all
. . .
)
select c.category, count(t.name) as total
from categories c left join
test t
on ' ' || t.cats || ' ' like '% ' || c.category || ' %'
group by c.category;
If you already have a table of valid categories, then the CTE is not needed.
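Here is a runnable sketch of the brute-force join against the original `test` table, using Python's `sqlite3`. The CTE spells out categories a, b and c explicitly, and the extra `UNION ALL` branch for rows with an empty or NULL `cats` string is an assumption added to produce the question's "none" row; it is not part of the answer's query. Padding both sides with spaces makes `'% a %'` match whole words only. In SQLite, `||` binds tighter than `LIKE`, so no extra parentheses are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT, cats TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [("xyzzy", "a b c"), ("plugh", "b"), ("quux", ""), ("quuux", "a c")],
)

query = """
WITH categories AS (
    SELECT 'a' AS category UNION ALL
    SELECT 'b' UNION ALL
    SELECT 'c'
)
SELECT c.category, COUNT(t.name) AS total
FROM categories c
LEFT JOIN test t
    ON ' ' || t.cats || ' ' LIKE '% ' || c.category || ' %'
GROUP BY c.category
UNION ALL
-- assumed extra branch: rows whose cats string is empty or NULL
SELECT 'none', COUNT(*) FROM test WHERE TRIM(COALESCE(cats, '')) = ''
"""
rows = conn.execute(query).fetchall()
print(dict(rows))  # {'a': 2, 'b': 2, 'c': 2, 'none': 1}
```

`COUNT(t.name)` counts only matched rows, so a category with no items would show 0 instead of disappearing.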

Closest match per group?

I have a table that looks similar to this:
Motor MotorType CalibrationValueX CalibrationValueY
A Car 1.2343 2.33343
B Boat 1.2455 2.55434
B1 Boat 1.4554 2.11211
C Car 1.4323 4.56555
D Car 1.533 4.6666
... (500 entries in total)
In my SQL query, I am trying to find the average of CalibrationValueY where CalibrationValueX is a certain value:
SELECT MotorType, AVG(CalibrationValueY) FROM MotorTable
WHERE CalibrationValueX = 1.23333
GROUP BY MotorType
This will not return anything, since there is not a CalibrationValueX value that equals exactly 1.23333.
I am able to find the closest match for a single motor type with:
SELECT TOP 1 CalibrationValueY, CalibrationValueX, MotorType, Motor FROM MotorTable
WHERE MotorType = 'Car' ORDER BY ABS(CalibrationValueX - 1.23333)
However, I can't get it to work with GROUP BY.
How can I group by MotorType and, when searching for CalibrationValueX = 1.23333, get the closest row in each group, like this:
A Car 1.2343 2.33343
B Boat 1.2455 2.55434
Use ROW_NUMBER with PARTITION BY to get the equivalent of TOP 1 for each group:
with cte as (
SELECT MotorType, CalibrationValueX, CalibrationValueY,
ROW_NUMBER() over (partition by MotorType order by abs(CalibrationValueX - 1.23333)) rn
from MotorTable
)
SELECT *
from cte
where rn = 1
OUTPUT
| MotorType | CalibrationValueX | CalibrationValueY | rn |
|-----------|-------------------|-------------------|----|
| Boat | 1.2455 | 2.55434 | 1 |
| Car | 1.2343 | 2.33343 | 1 |
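The same ROW_NUMBER technique can be tried locally with Python's `sqlite3` (SQLite 3.25+ assumed for window functions). This sketch loads only the five sample rows from the question, passes the target value as a bound parameter instead of a literal, and also selects `Motor`, an illustrative addition so the output matches the layout the asker wanted.

```python
import sqlite3

TARGET = 1.23333  # the CalibrationValueX being searched for

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MotorTable (Motor TEXT, MotorType TEXT, "
    "CalibrationValueX REAL, CalibrationValueY REAL)"
)
conn.executemany(
    "INSERT INTO MotorTable VALUES (?, ?, ?, ?)",
    [("A", "Car", 1.2343, 2.33343), ("B", "Boat", 1.2455, 2.55434),
     ("B1", "Boat", 1.4554, 2.11211), ("C", "Car", 1.4323, 4.56555),
     ("D", "Car", 1.533, 4.6666)],
)

# Rank rows inside each MotorType by distance from TARGET,
# then keep only the closest one (rn = 1) per group.
query = """
WITH cte AS (
    SELECT Motor, MotorType, CalibrationValueX, CalibrationValueY,
           ROW_NUMBER() OVER (
               PARTITION BY MotorType
               ORDER BY ABS(CalibrationValueX - ?)
           ) AS rn
    FROM MotorTable
)
SELECT Motor, MotorType, CalibrationValueX, CalibrationValueY
FROM cte
WHERE rn = 1
ORDER BY Motor
"""
rows = conn.execute(query, (TARGET,)).fetchall()
print(rows)  # [('A', 'Car', 1.2343, 2.33343), ('B', 'Boat', 1.2455, 2.55434)]
```

If two rows are equally close, ROW_NUMBER picks one of them arbitrarily; add a second ORDER BY key inside OVER (...) if a deterministic choice matters.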

Sybase SQL Select Distinct Based on Multiple Columns with an ID

I'm trying to query a Sybase server to get examples of the different types of data we hold, for testing purposes.
I have a table that looks like the one below (abstracted):
Animals table:
id | type | breed | name
------------------------------------
1 | dog | german shepard | Bernie
2 | dog | german shepard | James
3 | dog | husky | Laura
4 | cat | british blue | Mr Fluffles
5 | cat | other | Laserchild
6 | cat | british blue | Sleepy head
7 | fish | goldfish | Goldie
As I mentioned, I want one example of each (type, breed) combination, so for the above table I would like a result set like this (in reality I just want the IDs):
id | type | breed
---------------------------
1 | dog | german shepard
3 | dog | husky
4 | cat | british blue
5 | cat | other
7 | fish | goldfish
I've tried multiple combinations of queries, such as the ones below, but they are either invalid SQL (for Sybase) or return incorrect results:
SELECT id, DISTINCT ON type, breed FROM animals
SELECT id, DISTINCT(type, breed) FROM animals
SELECT id FROM animals GROUP BY type, breed
I've found other questions, such as SELECT DISTINCT on one column, but they only deal with one column.
Do you have any idea how to implement this query?
You can use the aggregate function max or min on the id column. It returns exactly one id per (type, breed) group; min(id) gives the ids in the expected output above.
select max(Id), type, breed
from animals
group by type, breed
EDIT:
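Other ways to do it:
With HAVING and an aggregate function (this relies on Sybase allowing non-grouped columns such as id in the select list and HAVING; many other databases reject it):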
select id, type, breed
from animals
group by type, breed
having id = max(Id)
With HAVING and a correlated subquery:
select id, type, breed
from animals a1
group by type, breed
having id = (
select max(id)
from animals a2
where a2.type = a1.type
and a2.breed = a1.breed
)
Try this and let me know if it works:
select breed, max(id) as id, max(type) as type
from animals
group by breed
The choice of max() here is arbitrary; you could use min() instead. max() returns the largest value in each group, min() the smallest.
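A quick way to see the difference between min and max is to run the GROUP BY answer on the question's sample rows. This sketch uses Python's `sqlite3` rather than Sybase, so it only demonstrates the standard GROUP BY form that both engines accept; the HAVING variants above depend on Sybase-specific behaviour and are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (id INTEGER, type TEXT, breed TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?, ?)",
    [(1, "dog", "german shepard", "Bernie"), (2, "dog", "german shepard", "James"),
     (3, "dog", "husky", "Laura"), (4, "cat", "british blue", "Mr Fluffles"),
     (5, "cat", "other", "Laserchild"), (6, "cat", "british blue", "Sleepy head"),
     (7, "fish", "goldfish", "Goldie")],
)

# One representative id per (type, breed); MIN keeps the lowest id in each group.
min_rows = conn.execute(
    "SELECT MIN(id), type, breed FROM animals GROUP BY type, breed ORDER BY 1"
).fetchall()

# MAX keeps the highest id instead; same groups, different representatives.
max_ids = [r[0] for r in conn.execute(
    "SELECT MAX(id) FROM animals GROUP BY type, breed ORDER BY 1"
)]

print([r[0] for r in min_rows])  # [1, 3, 4, 5, 7]
print(max_ids)                   # [2, 3, 5, 6, 7]
```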

How to limit the amount of SQL data that I want to display?

Example of what I have in my database:
sub-category | item
-----------------------
Fruit | Orange
Fruit | Apple
Fruit | Pineapple
Fruit | Cherry
Vegetable | Potato
Vegetable | Celery
Vegetable | Ginger
Vegetable | Carrot
Drinks | Coffee
Drinks | Tea
Drinks | Coke
Drinks | Pepsi
I tried to use the following code, but it only displays 1 item per category instead of displaying all the items:
SELECT SubCategory, Item
FROM ItemList
GROUP BY SubCategory
ORDER BY item DESC
What I get is:
sub-category | item
-----------------------
Fruit | Apple
Vegetable | Carrot
Drinks | Coke
What I want is the following (example with a display limit of 2):
sub-category | item
-----------------------
Fruit | Apple
Fruit | Cherry
Vegetable | Carrot
Vegetable | Celery
Drinks | Coke
Drinks | Coffee
Well, many SQL implementations (MySQL, PostgreSQL and SQLite, for example) support LIMIT n at the end of a query, where n is the number of rows you want returned. I don't know how you formatted your data, but adding that to the end of your query should limit the output.
You cannot use the GROUP BY clause, because it collapses each group into a single row. You can, however, use ORDER BY with more than one column. For example:
SELECT * FROM ItemList ORDER BY SubCategory DESC, Item DESC LIMIT 5;
Note that LIMIT caps the total number of rows, not the number per category, so if you want the top 5 from each category, LIMIT alone won't do it.
*Someone has changed the original question; it is completely different from before, so my answer may no longer be useful.*
I recommend adding an integer ID column as the primary key.
Then you can use this query, which keeps a row only if at most 5 rows in its category have an id less than or equal to its own:
select t1.*
from yourtable t1
where 5 >= (select count(t2.id)
from yourtable t2
where t1.category= t2.category
and t1.id >= t2.id
)
I hope this works on SQLite. I have tested it on MySQL.
In SQL Server you can number the rows within each SubCategory with ROW_NUMBER() and then filter on that number:
SELECT
SubCategory,
Item
FROM
(
SELECT
SubCategory,
Item,
ROW_NUMBER() OVER(PARTITION BY SubCategory ORDER BY Item) AS seq
FROM
ItemList
) AS t
WHERE seq < 3
http://msdn.microsoft.com/en-us/library/ms189461.aspx
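Both top-N-per-group answers can be compared side by side with Python's `sqlite3` (SQLite 3.25+ assumed for ROW_NUMBER). The `id` column is the integer primary key the correlated-subquery answer recommends adding; here it simply follows insertion order, which is an assumption. The window version orders by `Item` alphabetically, while the subquery version keeps the first N rows by `id`, so they pick different items.

```python
import sqlite3

N = 2  # rows to keep per SubCategory

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ItemList (id INTEGER PRIMARY KEY, SubCategory TEXT, Item TEXT)"
)
conn.executemany(
    "INSERT INTO ItemList (SubCategory, Item) VALUES (?, ?)",
    [("Fruit", "Orange"), ("Fruit", "Apple"), ("Fruit", "Pineapple"),
     ("Fruit", "Cherry"), ("Vegetable", "Potato"), ("Vegetable", "Celery"),
     ("Vegetable", "Ginger"), ("Vegetable", "Carrot"), ("Drinks", "Coffee"),
     ("Drinks", "Tea"), ("Drinks", "Coke"), ("Drinks", "Pepsi")],
)

# Window-function version: number items alphabetically within each group.
top_by_window = conn.execute("""
    SELECT SubCategory, Item FROM (
        SELECT SubCategory, Item,
               ROW_NUMBER() OVER (PARTITION BY SubCategory ORDER BY Item) AS seq
        FROM ItemList
    ) AS t
    WHERE seq <= ?
    ORDER BY SubCategory, Item
""", (N,)).fetchall()

# Correlated-subquery version: keep a row if at most N rows in its group
# have an id less than or equal to its own (i.e. the first N by id).
first_by_id = conn.execute("""
    SELECT t1.SubCategory, t1.Item FROM ItemList t1
    WHERE ? >= (SELECT COUNT(t2.id) FROM ItemList t2
                WHERE t1.SubCategory = t2.SubCategory AND t1.id >= t2.id)
    ORDER BY t1.SubCategory, t1.id
""", (N,)).fetchall()

print(top_by_window)
print(first_by_id)
```

The subquery form works on engines without window functions, but it rescans the group for every row, so the window form usually scales better.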

Confusing SELECT statement

First I will show you example tables that my issue pertains to, then I will ask the question.
[my_fruits]
fruit_name | fruit_id | fruit_owner | fruit_timestamp
----------------------------------------------------------------
Banana | 3 | Timmy | 3/4/11
Banana | 3 | Timmy | 4/1/11
Banana | 8 | Timmy | 5/2/11
Apple | 4 | Timmy | 2/1/11
Apple | 4 | Roger | 3/4/11
Now I want to run a query that selects only the fruit_name, fruit_id, and fruit_owner values. I want exactly one row per fruit, chosen by the latest timestamp. For example, the ideal query on this table would return:
[my_fruits]
fruit_name | fruit_id | fruit_owner |
----------------------------------------------
Banana | 8 | Timmy |
Apple | 4 | Roger |
I tried the query:
select max(my_fruits.fruit_name) keep
(dense_rank last order by my_fruits.fruit_timestamp) fruit_name,
my_fruits.fruit_id, my_fruits.fruit_owner
from my_fruits
group by my_fruits.fruit_id, my_fruits.fruit_owner
The issue is that this returns one row per distinct (fruit_id, fruit_owner) pair rather than one row per fruit.
For Oracle 9i+, use:
SELECT x.fruit_name,
x.fruit_id,
x.fruit_owner
FROM (SELECT mf.fruit_name,
mf.fruit_id,
mf.fruit_owner,
ROW_NUMBER() OVER (PARTITION BY mf.fruit_name
ORDER BY mf.fruit_timestamp DESC) AS rn
FROM my_fruits mf) x
WHERE x.rn = 1
Most databases support joining the table to a derived table (inline view):
SELECT x.fruit_name,
x.fruit_id,
x.fruit_owner
FROM my_fruits x
JOIN (SELECT t.fruit_name,
MAX(t.fruit_timestamp) AS max_ts
FROM my_fruits t
GROUP BY t.fruit_name) y ON y.fruit_name = x.fruit_name
AND y.max_ts = x.fruit_timestamp
However, this returns duplicates if two or more rows for the same fruit_name share the maximum timestamp.
If you want one row per fruit name, you have to group by fruit_name.
select fruit_name,
max(my_fruits.fruit_id) keep
(dense_rank last order by my_fruits.fruit_timestamp) fruit_id,
max(my_fruits.fruit_owner) keep
(dense_rank last order by my_fruits.fruit_timestamp) fruit_owner
from my_fruits
group by my_fruits.fruit_name
How you want to deal with tie-breaks is a separate issue.
Try a subquery:
select a.fruit_name, a.fruit_id, a.fruit_owner
from my_fruits a
where a.fruit_timestamp =
(select max(b.fruit_timestamp)
from my_fruits b
where b.fruit_name = a.fruit_name)
I would do it by first finding the (fruit_name, fruit_timestamp) pairs that interest you, i.e. the latest timestamp per fruit, and then joining that "table" with the actual fruit table to retrieve the other values.
SELECT fruit_and_max_t.fruit_name,
my_fruits.fruit_id,
my_fruits.fruit_owner
FROM my_fruits,
( SELECT fruit_name, MAX(fruit_timestamp) AS max_timestamp
FROM my_fruits
GROUP BY fruit_name) fruit_and_max_t
WHERE fruit_and_max_t.max_timestamp = my_fruits.fruit_timestamp
AND fruit_and_max_t.fruit_name = my_fruits.fruit_name
This assumes that no two rows share the same (fruit_name, fruit_timestamp) value, i.e. that this pair acts as a unique key.
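The latest-row-per-group pattern (ROW_NUMBER ordered by timestamp descending) can be checked with Python's `sqlite3`. This is a sketch, not the Oracle query itself: SQLite has no DATE type, so the timestamps are stored as ISO-8601 strings, which sort chronologically. That conversion assumes the question's dates such as 3/4/11 are month/day/year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_fruits (fruit_name TEXT, fruit_id INTEGER, "
    "fruit_owner TEXT, fruit_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO my_fruits VALUES (?, ?, ?, ?)",
    [("Banana", 3, "Timmy", "2011-03-04"),
     ("Banana", 3, "Timmy", "2011-04-01"),
     ("Banana", 8, "Timmy", "2011-05-02"),
     ("Apple", 4, "Timmy", "2011-02-01"),
     ("Apple", 4, "Roger", "2011-03-04")],
)

# Number each fruit's rows from newest to oldest and keep the newest (rn = 1).
query = """
SELECT fruit_name, fruit_id, fruit_owner FROM (
    SELECT fruit_name, fruit_id, fruit_owner,
           ROW_NUMBER() OVER (PARTITION BY fruit_name
                              ORDER BY fruit_timestamp DESC) AS rn
    FROM my_fruits
) x
WHERE rn = 1
ORDER BY fruit_name DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Banana', 8, 'Timmy'), ('Apple', 4, 'Roger')]
```

Unlike the join on MAX(fruit_timestamp), ROW_NUMBER always returns exactly one row per fruit, even when timestamps tie; which tied row wins is arbitrary unless another ORDER BY key is added.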