Grouping by two values in same table

I have a table in the following format:
Ship_type | userid | Message
None of these columns is unique.
I want to count how many unique user IDs belong to each ship type, and thus find out which ship type is the most popular.
Example:
Ship_type | userid| Message
-------------- ------- ----------
Sailboat | 34241 | hello
Sailboat | 34241 | hi
Sailboat | 34241 | I'm on a boat!
Fishingvessel | 31245 | yo
Fishingvessel | 98435 | hi there
Here we see that there are two different fishing vessels (two distinct user IDs) and one sailboat.
If I do the following query:
select ship_type, count(ship_type) FROM db1.MessageType5 GROUP BY ship_type ORDER BY count(ship_type) ASC;
I get
Sailboat | 3
Fishingvessel | 2
which is wrong - as it counts the number of messages belonging to each ship_type.
Desired result:
Fishingvessel | 2
Sailboat | 1

You have to COUNT DISTINCT user ids (and ORDER BY ... DESC if you want the provided result):
SELECT ship_type, COUNT(DISTINCT userid) as cnt
FROM db1.MessageType5
GROUP BY ship_type
ORDER BY cnt DESC
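The difference between the two counts can be checked with a small sketch. It uses Python's built-in sqlite3 module and an in-memory table as a stand-in for db1.MessageType5 (both are assumptions made for illustration; the original database engine is not stated), and it selects both counts side by side:

```python
import sqlite3

# In-memory SQLite table standing in for db1.MessageType5 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MessageType5 (ship_type TEXT, userid INTEGER, message TEXT)"
)
conn.executemany(
    "INSERT INTO MessageType5 VALUES (?, ?, ?)",
    [
        ("Sailboat", 34241, "hello"),
        ("Sailboat", 34241, "hi"),
        ("Sailboat", 34241, "I'm on a boat!"),
        ("Fishingvessel", 31245, "yo"),
        ("Fishingvessel", 98435, "hi there"),
    ],
)

# COUNT(ship_type) counts rows (messages) per group;
# COUNT(DISTINCT userid) counts the distinct users per group.
rows = conn.execute(
    """
    SELECT ship_type,
           COUNT(ship_type)       AS messages,
           COUNT(DISTINCT userid) AS cnt
    FROM MessageType5
    GROUP BY ship_type
    ORDER BY cnt DESC
    """
).fetchall()

for row in rows:
    print(row)
```

With the sample data from the question, the Fishingvessel row comes first with two distinct users, while Sailboat has three messages but only one user.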

Related

How do I list the values being counted?

I use an aggregate function to count the distinct values for the most frequent ID (let's say the count is 5). I now want to list the distinct values that were counted in a column, and I'm struggling with how to do that. Is it even possible? I'm using PostgreSQL.
SELECT IDs,
COUNT(DISTINCT people) AS num_people
FROM class
GROUP BY IDs
ORDER BY COUNT(DISTINCT people) desc
LIMIT 1
Current Sample Result:
-------------------------------------
| **IDs** | **num_people** |
-------------------------------------
| Aabbcc | 5 |
-------------------------------------
I want this result with a new column at the end. (It could be separate rows too; it does not have to be all in one row, but that would be ideal.)
-----------------------------------------------------------------------
| **IDs** | **num_people** | **people_listed** |
-----------------------------------------------------------------------
| Aabbcc | 5 | Coco, Riley, Allan, Betty, Cici |
-----------------------------------------------------------------------

You could use the aggregate function ARRAY_AGG or STRING_AGG for that:
SELECT IDs,
COUNT(DISTINCT people) AS num_people,
STRING_AGG(DISTINCT people, ', ') AS people_listed
FROM class
GROUP BY IDs
ORDER BY COUNT(DISTINCT people) desc
LIMIT 1
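As a rough, runnable illustration of the same idea, the sketch below uses Python's sqlite3 module. SQLite has no STRING_AGG(DISTINCT ..., ', '), so GROUP_CONCAT(DISTINCT people) is swapped in here; it always joins with a plain comma and does not guarantee any order. The names come from the question's desired output, while the duplicate row and the second ID ("Ddeeff") are hypothetical test data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (IDs TEXT, people TEXT)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?)",
    [
        ("Aabbcc", "Coco"),
        ("Aabbcc", "Riley"),
        ("Aabbcc", "Allan"),
        ("Aabbcc", "Betty"),
        ("Aabbcc", "Cici"),
        ("Aabbcc", "Coco"),   # hypothetical duplicate, counted once
        ("Ddeeff", "Coco"),   # hypothetical second ID with fewer people
        ("Ddeeff", "Riley"),
    ],
)

# GROUP_CONCAT(DISTINCT ...) plays the role of STRING_AGG(DISTINCT ..., ', ')
# in this SQLite stand-in; the separator is fixed to "," in that form.
row = conn.execute(
    """
    SELECT IDs,
           COUNT(DISTINCT people)        AS num_people,
           GROUP_CONCAT(DISTINCT people) AS people_listed
    FROM class
    GROUP BY IDs
    ORDER BY num_people DESC
    LIMIT 1
    """
).fetchone()

ids, num_people, people_listed = row
print(ids, num_people, sorted(people_listed.split(",")))
```

In PostgreSQL itself, the STRING_AGG query from the answer is the form to use; this sketch only shows that the list and the count come from the same group.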

How to get the frequency in postgresql

I have a table (table_1) like this (I have simplified it)
code | description | season
----------+------------------+--------------
500 | info 1 | fall
500 | info 4 | fall
500 | info 8 | fall
500 | info 1 | winter
300 | info 1 | spring
400 | info 1 | fall
And I want a table like below, where I have the frequency of codes in each season
season | Number of Unique Codes
----------+------------------------
fall | 2
winter | 1
spring | 1
So far I have this:
SELECT
season,
count(DISTINCT code) AS "Number of Unique Codes"
FROM table_1
WHERE code IS NOT NULL
GROUP BY season
ORDER BY code desc;
However, I am running into a few issues.

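Your error is in the ORDER BY: code is neither in the GROUP BY nor inside an aggregate, so PostgreSQL cannot sort the grouped rows by it. Change your ORDER BY to sort by the alias you created: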
SELECT
season,
count(distinct code) AS "Number of Unique Codes"
FROM table_1
WHERE code IS NOT NULL
GROUP BY season
ORDER BY "Number of Unique Codes" DESC;
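A minimal sketch of the corrected query, run against the sample data with Python's sqlite3 module (an assumption made so the example is self-contained). Note that SQLite is more lenient than PostgreSQL and may accept the original ORDER BY code, so this only demonstrates the corrected form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (code INTEGER, description TEXT, season TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [
        (500, "info 1", "fall"),
        (500, "info 4", "fall"),
        (500, "info 8", "fall"),
        (500, "info 1", "winter"),
        (300, "info 1", "spring"),
        (400, "info 1", "fall"),
    ],
)

# Sort by the aggregate's alias rather than by the ungrouped column "code".
rows = conn.execute(
    """
    SELECT season,
           COUNT(DISTINCT code) AS "Number of Unique Codes"
    FROM table_1
    WHERE code IS NOT NULL
    GROUP BY season
    ORDER BY "Number of Unique Codes" DESC
    """
).fetchall()

for season, unique_codes in rows:
    print(season, unique_codes)
```

The fall row comes first with two unique codes; winter and spring tie at one each, and their relative order is not defined unless a second sort key is added.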

More efficient way to query shortest string value associated with each value in another column in Hive QL

I have a table in Hive containing store names, order IDs, and User IDs (as well as some other columns including item ID). There is a row in the table for every item purchased (so there can be more than one row per order if the order contains multiple items). Order IDs are unique within a store, but not across stores. A single order can have more than one user ID associated with it.
I'm trying to write a query that will return a list of all stores and order IDs and the shortest user ID associated with each order.
So, for example, if the data looks like this:
STORE | ORDERID | USERID | ITEMID
------+---------+--------+-------
| a | 1 | bill | abc |
| a | 1 | susan | def |
| a | 2 | jane | abc |
| b | 1 | scott | ghi |
| b | 1 | tony | jkl |
Then the output would look like this:
STORE | ORDERID | USERID
------+---------+-------
a | 1 | bill
a | 2 | jane
b | 1 | tony
I've written a query that will do this, but I feel like there must be a more efficient way to go about it. Does anybody know a better way to produce these results?
This is what I have so far:
select
users.store, users.orderid, users.userid
from
(select
store, orderid, userid, length(userid) as len
from
sales) users
join
(select distinct
store, orderid,
min(length(userid)) over (partition by store, orderid) as len
from
sales) len on users.store = len.store
and users.orderid = len.orderid
and users.len = len.len

This should work for you; it produces the result with a single SELECT and no self-join:
select distinct
store, orderid,
first_value(userid) over(partition by store, orderid order by length(userid) asc) f_val
from
sales;
The result will be:
store orderid f_val
a 1 bill
a 2 jane
b 1 tony
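The first_value approach can be tried out with the sketch below. It assumes Python's sqlite3 module built against SQLite 3.25 or later (needed for window functions) rather than Hive, and it adds an ORDER BY on the outer query only to make the printed order deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store TEXT, orderid INTEGER, userid TEXT, itemid TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("a", 1, "bill", "abc"),
        ("a", 1, "susan", "def"),
        ("a", 2, "jane", "abc"),
        ("b", 1, "scott", "ghi"),
        ("b", 1, "tony", "jkl"),
    ],
)

# first_value() picks the userid of the first row in each (store, orderid)
# partition when the partition is ordered by userid length; DISTINCT then
# collapses the repeated rows of each order into one.
rows = conn.execute(
    """
    SELECT DISTINCT
           store, orderid,
           first_value(userid) OVER (PARTITION BY store, orderid
                                     ORDER BY length(userid) ASC) AS f_val
    FROM sales
    ORDER BY store, orderid
    """
).fetchall()

for row in rows:
    print(row)
```

If two user IDs in the same order have the same length, which one comes first is arbitrary; adding userid as a second sort key inside the window would make the choice stable.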

Probably rank() is the best way:
select s.*
from (select s.*, rank() over (partition by store, orderid order by length(userid)) as seqnum
from sales s
) s
where seqnum = 1;
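One thing to keep in mind with rank() is that tied rows all receive rank 1, so an order whose shortest user IDs have equal length returns more than one row; row_number() returns exactly one. The sketch below compares the two, assuming Python's sqlite3 module (SQLite 3.25 or later) instead of Hive. Store "c" with users "ann" and "bob" is hypothetical data added to create a tie:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store TEXT, orderid INTEGER, userid TEXT, itemid TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("a", 1, "bill", "abc"),
        ("a", 1, "susan", "def"),
        ("a", 2, "jane", "abc"),
        ("b", 1, "scott", "ghi"),
        ("b", 1, "tony", "jkl"),
        ("c", 1, "ann", "mno"),  # hypothetical tie: both IDs have length 3
        ("c", 1, "bob", "pqr"),
    ],
)

# Same query shape as the answer; only the window function name varies.
QUERY = """
SELECT store, orderid, userid
FROM (SELECT s.*,
             {fn}() OVER (PARTITION BY store, orderid
                          ORDER BY length(userid)) AS seqnum
      FROM sales s) ranked
WHERE seqnum = 1
ORDER BY store, orderid, userid
"""

with_rank = conn.execute(QUERY.format(fn="rank")).fetchall()
with_row_number = conn.execute(QUERY.format(fn="row_number")).fetchall()

print("rank():      ", with_rank)
print("row_number():", with_row_number)
```

Which of the two to use depends on whether ties should be reported or collapsed to a single, arbitrarily chosen row.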

How to return unique rows having count() of multiple columns = 1 using group by?

So here is my situation:
____________________________________________
| idnumber | name | sectiongroup |
--------------------------------------------
| 123 | Joe | one |
| 123 | Barry | two |
| 1234 | Laura | one |
| 1234 | LauraCopyCat | one |
--------------------------------------------
I am trying to build a query which will return any unique (i.e. COUNT(idnumber) = 1) id numbers in a given sectiongroup. So if you are in sectiongroup number one and no one else in your sectiongroup has the same ID number as you, then I want your idnumber. If someone in group two happens to have the same idnumber, that is okay; I still want your idnumber.
For example, Barry and Joe have the same id number but they are in separate sectiongroups, so I want to return their idnumbers. However, Laura and LauraCopyCat have the SAME sectiongroup, so I do NOT want their idnumbers to be returned. So far I have the following:
SELECT idnumber
FROM namestable
GROUP BY idnumber, sectiongroup
HAVING(COUNT(idnumber) = 1)
Is there a way to add sectiongroup into the COUNT()=1 condition?

Just use COUNT(*) to avoid confusion. This will count the number of records in the particular group. Remember, a group consists of the unique combinations of values in the fields specified in your GROUP BY statement.
SELECT idnumber
FROM namestable
GROUP BY idnumber, sectiongroup
HAVING COUNT(*) = 1
Note that this will return duplicate idnumbers if you have records that share an id but belong to different sectiongroups. To remove the duplicates, just change SELECT to SELECT DISTINCT.
Tested here: http://sqlfiddle.com/#!9/b0a50c/3
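The effect of adding DISTINCT can be seen in the following sketch, which loads the sample rows into an in-memory SQLite table through Python's sqlite3 module (the engine is an assumption; the queries are the ones from the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE namestable (idnumber INTEGER, name TEXT, sectiongroup TEXT)"
)
conn.executemany(
    "INSERT INTO namestable VALUES (?, ?, ?)",
    [
        (123, "Joe", "one"),
        (123, "Barry", "two"),
        (1234, "Laura", "one"),
        (1234, "LauraCopyCat", "one"),
    ],
)

# One output row per (idnumber, sectiongroup) group that holds a single record.
plain = [r[0] for r in conn.execute(
    """
    SELECT idnumber
    FROM namestable
    GROUP BY idnumber, sectiongroup
    HAVING COUNT(*) = 1
    """
)]

# DISTINCT collapses ids that qualify in more than one sectiongroup.
distinct = [r[0] for r in conn.execute(
    """
    SELECT DISTINCT idnumber
    FROM namestable
    GROUP BY idnumber, sectiongroup
    HAVING COUNT(*) = 1
    """
)]

print(plain)
print(distinct)
```

Id 123 qualifies once in group one and once in group two, so it appears twice without DISTINCT; id 1234 never qualifies because both of its rows share a sectiongroup.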

MIN() Function in SQL

Need help with Min Function in SQL
I have a table as shown below.
+------------+-------+-------+
| Date_ | Name | Score |
+------------+-------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/05 | Jones | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/06 | James | 3 |
| 2012/07/07 | Hugo | 1 |
| 2012/07/07 | Jack | 1 |
| 2012/07/07 | Jim | 2 |
+------------+-------+-------+
I would like to get the output like below
+------------+------+-------+
| Date_ | Name | Score |
+------------+------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/07 | Hugo | 1 |
+------------+------+-------+
When I use the MIN() function with just the Date_ and Score columns, I get the lowest score for each date, which is what I want. I don't care which row is returned if there is a tie in the score for the same date. The trouble starts when I also want the Name column in the output. I tried a few variations of SQL (e.g. MIN with a correlated subquery) but had no luck getting the output shown above. Can anyone help, please?
Query is as follows
SELECT DISTINCT
A.Name, A.Date_, A.Score
FROM TestTable AS A
INNER JOIN (SELECT Date_,MIN(Score) AS MinScore
FROM TestTable
GROUP BY Date_) AS B
ON (A.Score = B.MinScore) AND (A.Date_ = B.Date_);

Use this solution:
SELECT a.date_, MIN(name) AS name, a.score
FROM tbl a
INNER JOIN
(
SELECT date_, MIN(score) AS minscore
FROM tbl
GROUP BY date_
) b ON a.date_ = b.date_ AND a.score = b.minscore
GROUP BY a.date_, a.score
This will get the minimum score per date in the INNER JOIN subselect, which we use to join to the main table. Once we join the subselect, we will only have dates with names having the minimum score (with ties being displayed).
Since we only want one name per date, we then group by date and score and pick a single name with MIN(name).
To display the name column, we must apply an aggregate function to it, because it is not part of the GROUP BY on the date and score columns; otherwise the query will not work. (MAX() on that column would work just as well.)
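The join-then-group approach described above can be checked against the question's data with a short sketch. It assumes Python's sqlite3 module and the table name tbl from the answer, and adds an ORDER BY only so the printed rows come out in date order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (date_ TEXT, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("2012/07/05", "Jack", 1),
        ("2012/07/05", "Jones", 1),
        ("2012/07/06", "Jill", 2),
        ("2012/07/06", "James", 3),
        ("2012/07/07", "Hugo", 1),
        ("2012/07/07", "Jack", 1),
        ("2012/07/07", "Jim", 2),
    ],
)

# The subselect finds the minimum score per date; the join keeps only the
# rows that reach it; MIN(name) then settles ties to one name per date.
rows = conn.execute(
    """
    SELECT a.date_, MIN(a.name) AS name, a.score
    FROM tbl a
    INNER JOIN (
        SELECT date_, MIN(score) AS minscore
        FROM tbl
        GROUP BY date_
    ) b ON a.date_ = b.date_ AND a.score = b.minscore
    GROUP BY a.date_, a.score
    ORDER BY a.date_
    """
).fetchall()

for row in rows:
    print(row)
```

On 2012/07/05 and 2012/07/07 two names tie for the minimum score, and MIN(name) picks the alphabetically first one (Jack and Hugo), which matches the desired output in the question.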

Please learn about the GROUP BY functionality of RDBMS.
SELECT Date_,Name,MIN(Score)
FROM T
GROUP BY Name
This assumes that each name and each date appears only once, and it only works in MySQL.
To make it work on other RDBMSs, you need to apply an aggregate function such as MAX or MIN to the Date_ column.

SELECT T.Name, T.Date_, MIN(T.Score) as Score FROM T
GROUP BY T.Date_

Edit: This answer is not correct, as pointed out by JNK in the comments.
SELECT Date_,MAX(Name),MIN(Score)
FROM T
GROUP BY Date_
Here I am using MAX(Name); it will pick one name if two names are found with the same score.
This will find the minimum score for each day (no duplicates), scored by any player. A name that starts with Z will be picked over a name that starts with A.
Edit: Fixed by removing group by name
