Count() how many times a name shows up in a table with the rest of info - sql

I have read about the count() function on various websites, but I still cannot make this work.
I made a small table with (id, name, last name, age) and I need to retrieve all columns plus a new one. In this new column I want to display how many times a name shows up or repeats itself in the table.
I have run some tests, but I can only retrieve the name column together with the count column; I haven't been able to retrieve all the data from the table.
Currently I have this
select a.n_showsup, p.*
from [test1].[dbo].[person] p,
(select count(*) n_showsup
from [test1].[dbo].[person])a
This returns all the data, but the n_showsup column just contains the total number of rows. I know this is because I'm missing a GROUP BY, but when I add GROUP BY name it returns a lot of records. This is an example of what I need:

You can use window functions, if your RDBMS supports them:
select t.*, count(*) over(partition by name) n_showsup
from mytable t
Alternatively, you can join the table with an aggregation query that counts the number of occurrences of each name:
select t.*, x.n_showsup
from mytable t
inner join (select name, count(*) n_showsup from mytable group by name) x
on x.name = t.name
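As a quick check that the two queries above return the same thing, here is a minimal sketch that runs both against an in-memory SQLite database from Python. The sample rows and the snake_case column names are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (the first release with window functions).

```python
import sqlite3

# Hypothetical sample data; columns loosely follow the question (id, name, last name, age).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, last_name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [(1, "Ana", "Lopez", 30), (2, "Ben", "Smith", 41),
     (3, "Ana", "Diaz", 25), (4, "Ana", "Ruiz", 52)],
)

# Window function: every row is kept, and the count is computed per name.
window_rows = conn.execute("""
    SELECT p.*, COUNT(*) OVER (PARTITION BY name) AS n_showsup
    FROM person p
    ORDER BY p.id
""").fetchall()

# Join against an aggregate subquery that has one row per name.
join_rows = conn.execute("""
    SELECT p.*, x.n_showsup
    FROM person p
    INNER JOIN (SELECT name, COUNT(*) AS n_showsup FROM person GROUP BY name) x
        ON x.name = p.name
    ORDER BY p.id
""").fetchall()

for row in window_rows:
    print(row)
```

Both result sets contain all four original rows, each with the per-name count appended as the last column.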

While the window function approach (@GMB's answer) is the right way to go, thinking through this from a subquery approach (like you were headed towards) would look something like:
select p.*, a.n_showsup
from [test1].[dbo].[person] p
INNER JOIN (
select name, count(*) n_showsup
from [test1].[dbo].[person]
GROUP BY name
) a ON p.name = a.name
This is VERY close to what you had. The difference is that the subquery is grouped by name (so we get a count per name), and that name is then used as the join criterion in the ON clause of the INNER JOIN.
You should really never ever use a comma in your FROM clause. Instead use a JOIN.

Related

Why does adding GROUP BY cause a seemingly unrelated error?

The following code works fine:
SELECT name, (SELECT count(item_id) FROM bids WHERE item_id = items.id)
FROM items;
However, when I add
SELECT name, (SELECT count(item_id) FROM bids WHERE item_id = items.id)
FROM items
GROUP BY name;
I get ERROR: subquery uses ungrouped column "items.id" from outer query
Can anyone tell me why this is happening? Thanks!
If you GROUP BY name then any other columns you select from items must have an aggregate function applied. That's what GROUP BY means.
In your case, you are using another column from items -- id -- in a correlated scalar subquery. That's not an aggregate function, and id is not in the GROUP BY clause, so you get an error.
You could instead GROUP BY name, id. That should give you the same results as the first query, and is probably pointless.
If you actually have multiple rows in items with the same value for name, and you want to group the results of the scalar subquery for those values, you need to specify how to group them. Perhaps you want the total of the subquery results for each value of name. If so, I think you could do:
SELECT name, SUM((SELECT count(item_id) FROM bids WHERE item_id = items.id))
FROM items
GROUP BY name;
(I'm not positive about the specific syntax as I don't have a Postgres instance to test against.)
A clearer way to express it might be:
SELECT name, SUM(bid_count)
FROM (
SELECT name, (SELECT count(item_id) FROM bids WHERE item_id = items.id) AS bid_count
FROM items
) AS item_bids
GROUP BY name
Join the tables then perform the GROUP BY:
select i.name, count(b.item_id)
from items i
inner join bids b
on b.item_id = i.id
group by i.name
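One difference between the join and the scalar-subquery versions is worth checking: an INNER JOIN drops items that have no bids at all, while the subquery version reports them with a count of 0. The sketch below illustrates this with made-up rows in an in-memory SQLite database (SQLite rather than Postgres, so it will not reproduce the original error message); a LEFT JOIN is shown as one way to keep the zero-bid names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bids (item_id INTEGER);
    -- Hypothetical data: two items named 'lamp', and a 'chair' with no bids.
    INSERT INTO items VALUES (1, 'lamp'), (2, 'lamp'), (3, 'chair');
    INSERT INTO bids VALUES (1), (1), (2);
""")

# Join first, then group: names without any bid disappear.
inner_rows = conn.execute("""
    SELECT i.name, COUNT(b.item_id)
    FROM items i
    INNER JOIN bids b ON b.item_id = i.id
    GROUP BY i.name
    ORDER BY i.name
""").fetchall()

# LEFT JOIN keeps every item; COUNT(b.item_id) ignores the NULLs it produces.
left_rows = conn.execute("""
    SELECT i.name, COUNT(b.item_id)
    FROM items i
    LEFT JOIN bids b ON b.item_id = i.id
    GROUP BY i.name
    ORDER BY i.name
""").fetchall()

# Scalar subquery per item, summed per name in an outer query.
subquery_rows = conn.execute("""
    SELECT name, SUM(bid_count)
    FROM (
        SELECT name,
               (SELECT COUNT(item_id) FROM bids WHERE item_id = items.id) AS bid_count
        FROM items
    ) AS item_bids
    GROUP BY name
    ORDER BY name
""").fetchall()

print(inner_rows, left_rows, subquery_rows)
```

With this data the LEFT JOIN and the subquery versions agree, and only the INNER JOIN omits 'chair'.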

SQL Server Count field values without merge

How do I create a COUNT column to count the repetitive values?
And I want to keep the table EXACTLY as below but add the last column (count_id).
The values at the left come from a JOIN so they are "equal".
Thanks! (I tried a lot)
You just want count(*) as a window function:
select t.*,
count(*) over (partition by id, name, department) as count_id
from t;
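To illustrate why the window function keeps the table "without merge" while GROUP BY does not, here is a small sketch using SQLite from Python. The rows are invented, and it assumes SQLite 3.25 or newer for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, name TEXT, department TEXT);
    -- Hypothetical rows: the first three are identical, as if produced by a join.
    INSERT INTO t VALUES (1, 'Ann', 'HR'), (1, 'Ann', 'HR'), (1, 'Ann', 'HR'), (2, 'Bob', 'IT');
""")

# GROUP BY collapses the repeated rows into one row per group.
grouped = conn.execute("""
    SELECT id, name, department, COUNT(*) AS count_id
    FROM t
    GROUP BY id, name, department
    ORDER BY id
""").fetchall()

# The window function leaves every row in place and repeats the count on each.
windowed = conn.execute("""
    SELECT t.*, COUNT(*) OVER (PARTITION BY id, name, department) AS count_id
    FROM t
    ORDER BY id
""").fetchall()

print(len(grouped), len(windowed))
```

The grouped query returns two rows and the windowed query returns all four, each carrying the size of its group.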

How do we find frequency of one column based off two other columns in SQL?

I'm relatively new to working with SQL and wasn't able to find any past threads to solve my question. I have three columns in a table, columns being name, customer, and location. I'd like to add an additional column determining which location is most frequent, based off name and customer (first two columns).
I have included a photo of an example where name-Jane customer-BEC in my created column would be "Texas", as that has 2 occurrences as opposed to one for California. Would there be any way to implement this?
If you want 'Texas' on all four rows:
select t.Name, t.Customer, t.Location,
(select t2.location
from table1 t2
where t2.name = t.name
group by name, location
order by count(*) desc
fetch first 1 row only
) as most_frequent_location
from table1 t ;
You can also do this with analytic functions:
select t.Name, t.Customer, t.Location,
max(location) keep (dense_rank first order by location_count desc) over (partition by name) most_frequent_location
from (select t.*,
count(*) over (partition by name, customer, location) as location_count
from table1 t
) t;
Both of these versions put 'Texas' in all four rows. However, each can be tweaked with minimal effort to put 'California' in the row for ARC.
In Oracle, you can use the aggregate function stats_mode() to compute the most frequently occurring value in a group.
Unfortunately it is not implemented as a window function. So one option uses an aggregate subquery, and then a join with the original table:
select t.*, s.top_location
from mytable t
inner join (
select name, customer, stats_mode(location) top_location
from mytable
group by name, customer
) s on s.name = t.name and s.customer = t.customer
You could also use a correlated subquery:
select
t.*,
(
select stats_mode(t1.location)
from mytable t1
where t1.name = t.name and t1.customer = t.customer
) top_location
from mytable t
This is more a question about understanding the concepts of a relational database. If you want that information, you would not put it in an additional column. It is calculated data over multiple columns, so why would you store it in the table itself? It is complex to code, and it would also be very expensive for the database (imagine all the rows you would have to calculate that value for if someone inserted a million rows).
Instead, you can do one of the following:
Calculate it at runtime, as shown in the other answers.
If you want to make it more persistent, you could embed the query above in a view.
If you want to physically store the info, you could use a materialized view.
There is plenty of documentation on these three options in the official Oracle documentation.
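The view option can be sketched as follows, using SQLite from Python rather than Oracle. The view name, the sample rows, and the alphabetical tie-break are all assumptions made for illustration. SQLite has no materialized views, so only the plain view is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT, customer TEXT, location TEXT);
    INSERT INTO table1 VALUES
        ('Jane', 'BEC', 'Texas'),
        ('Jane', 'BEC', 'California'),
        ('Jane', 'BEC', 'Texas');

    -- Hypothetical view: the derived column is computed whenever the view is queried.
    CREATE VIEW table1_with_top_location AS
    SELECT t.name, t.customer, t.location,
           (SELECT t2.location
            FROM table1 t2
            WHERE t2.name = t.name AND t2.customer = t.customer
            GROUP BY t2.location
            ORDER BY COUNT(*) DESC, t2.location  -- alphabetical tie-break (assumption)
            LIMIT 1) AS most_frequent_location
    FROM table1 t;
""")

query = "SELECT DISTINCT most_frequent_location FROM table1_with_top_location"
before = conn.execute(query).fetchall()

# New rows change the answer without any update to stored data.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [("Jane", "BEC", "California")] * 2)
after = conn.execute(query).fetchall()

print(before, after)
```

Because nothing is stored, the inserts need no extra maintenance step: the view reports 'Texas' before them and 'California' afterwards.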
Your first step is to construct a query that determines the most frequent location, which is as simple as:
select Name, Customer, Location, count(*)
from table1
group by Name, Customer, Location
This isn't immediately useful, but the logic can be used in row_number(), which gives you a unique id for each row returned. In the query below, I'm ordering by count(*) in descending order so that the most frequent occurrence has the value 1.
Note that row_number() assigns 1 to exactly one row per partition, even when several locations tie for the highest count.
So, now we have
select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1 tb_
group by Name, Customer, Location
The final step puts it all together:
select tab.*, tb_.Location most_freq_location
from table1 tab
inner join
(select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1
group by Name, Customer, Location) tb_
on tb_.Name = tab.Name
and tb_.Customer = tab.Customer
and freq_name_cust = 1
You can see how it all works in this Fiddle where I deliberately inserted rows with the same frequency for California and Texas for one of the customers for illustration purposes.
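The tie behaviour can be sketched with SQLite from Python. The rows below are invented so that one customer has a tie, and the counts are computed in a derived table first (a small restructuring of the query above, not the exact form from the answer). Adding location as a secondary sort key is an assumption that makes row_number() deterministic, and rank() is shown for contrast because it keeps every tied location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT, customer TEXT, location TEXT);
    -- Hypothetical rows: BEC has a clear winner, ARC has a tie.
    INSERT INTO table1 VALUES
        ('Jane', 'BEC', 'Texas'),
        ('Jane', 'BEC', 'California'),
        ('Jane', 'BEC', 'Texas'),
        ('Jane', 'ARC', 'California'),
        ('Jane', 'ARC', 'Texas');
""")


def winners(window_call):
    """Return the (name, customer, location) rows whose window position is 1."""
    sql = f"""
        SELECT name, customer, location
        FROM (SELECT c.*, {window_call} AS pos
              FROM (SELECT name, customer, location, COUNT(*) AS cnt
                    FROM table1
                    GROUP BY name, customer, location) AS c) AS ranked
        WHERE pos = 1
        ORDER BY customer, location
    """
    return conn.execute(sql).fetchall()


# row_number(): exactly one winner per (name, customer); ties broken by location.
row_number_winners = winners(
    "ROW_NUMBER() OVER (PARTITION BY name, customer ORDER BY cnt DESC, location)"
)
# rank(): every location tied for first place is returned.
rank_winners = winners("RANK() OVER (PARTITION BY name, customer ORDER BY cnt DESC)")

print(row_number_winners)
print(rank_winners)
```

Which of the two is appropriate depends on whether the join back to the original table should produce exactly one most frequent location per group or one row per tied location.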

Listagg + Count in Select duplicates

I'm writing up a query and cannot seem to get over this hurdle.
I am using both LISTAGG and COUNT (side-by-side) in it and whenever I do so, the ListAgg will duplicate when count is more than 1. Moreover, it adds more into the count when the ListAgg is more than one. They're each messing with each other, and I want to know how to keep them within the same query, but keep duplicates from appearing in the ListAgg while finding only the correct amount of instances for the Count.
I've tried using DISTINCT and various groupings, but to no avail.
Here is my (simplified) SQL:
SELECT DISTINCT /*+PARALLEL */ ID, NAME, LISTAGG(USERID, ';'), COUNT(MAIN_DATA)
FROM MAIN m
JOIN USERS u on m.pk1 = u.main_pk1
WHERE MAIN_DATA like '%keyword%'
GROUP BY ID, NAME
which yields something similar to this:
ID|NAME|USERID|MAIN_DATA
1|Hello|Jim|1
2|Hi|Arthur;Arthur;Arthur|3
3|Bonjour|Jane;Jane;Jim;Jim|4
When ID 2 should only have Arthur once, and there are only 2 instances of the keyword in ID 3, not 4. How can I achieve this?
Unfortunately, LISTAGG() doesn't support DISTINCT in Oracle versions before 19c.
To remove duplicates, you need a subquery:
SELECT ID, NAME, LISTAGG(USERID, ';') WITHIN GROUP (ORDER BY USERID), SUM(cnt)
FROM (SELECT ID, NAME, USERID, COUNT(*) as cnt
FROM MAIN m JOIN
USERS u
ON m.pk1 = u.main_pk1
WHERE m.MAIN_DATA like '%keyword%'
GROUP BY ID, NAME, USERID
) mu
GROUP BY ID, NAME;
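The following sketch runs the same two-level aggregation in SQLite from Python, with group_concat() standing in for LISTAGG(). The table layout and rows are assumptions: each MAIN row is taken to join to several USERS rows, which is one way to get the duplicated output in the question. Under that assumption the user list is deduplicated, but SUM(cnt) still equals the number of joined rows, so a separate COUNT(DISTINCT m.pk1) is shown for counting matching MAIN rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data modelled on the question's output.
    CREATE TABLE main (pk1 INTEGER PRIMARY KEY, id INTEGER, name TEXT, main_data TEXT);
    CREATE TABLE users (main_pk1 INTEGER, userid TEXT);
    INSERT INTO main VALUES
        (10, 2, 'Hi', 'a keyword'), (11, 2, 'Hi', 'keyword b'), (12, 2, 'Hi', 'keyword c'),
        (20, 3, 'Bonjour', 'keyword x'), (21, 3, 'Bonjour', 'y keyword');
    INSERT INTO users VALUES
        (10, 'Arthur'), (11, 'Arthur'), (12, 'Arthur'),
        (20, 'Jane'), (20, 'Jim'), (21, 'Jane'), (21, 'Jim');
""")

# Inner query: one row per (id, name, userid); outer query concatenates the users.
dedup_rows = conn.execute("""
    SELECT id, name, group_concat(userid, ';') AS userids, SUM(cnt) AS total
    FROM (SELECT m.id, m.name, u.userid, COUNT(*) AS cnt
          FROM main m
          JOIN users u ON m.pk1 = u.main_pk1
          WHERE m.main_data LIKE '%keyword%'
          GROUP BY m.id, m.name, u.userid) mu
    GROUP BY id, name
    ORDER BY id
""").fetchall()

# Counting distinct MAIN rows avoids the multiplication caused by the join.
distinct_counts = conn.execute("""
    SELECT m.id, COUNT(DISTINCT m.pk1)
    FROM main m
    JOIN users u ON m.pk1 = u.main_pk1
    WHERE m.main_data LIKE '%keyword%'
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

print(dedup_rows)
print(distinct_counts)
```

If the real data is shaped differently (for example, one USERS row per MAIN row), SUM(cnt) may already be the desired number, so it is worth checking against a few known IDs.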

Hide column from a left join select

Struggling a little on this, hopefully I can get some answers here. I am trying to remove a column that appears in the results of my SQL command. I don't want 'player_id' to show up twice (the second occurrence comes from the join statement below). I tried manipulating the join statement, but every time I do, it fails to fetch the data.
select * from org_members
left join (select player_id, LISTAGG(ship_name, ', ') WITHIN GROUP
(order by ship_name) as Ships
from member_ships
group by player_id) ships
on org_members.player_id=ships.player_id
Just as an FYI, running this through APEX on oracle 11gXE.
EDIT: here is the output
If you don't want player_id, then the safest thing is to list the columns that you do want.
However, there is a short-hand if you want to use *:
select *
from org_members om left join
(select player_id,
LISTAGG(ship_name, ', ') WITHIN GROUP (order by ship_name) as Ships
from member_ships
group by player_id
) ships
using (player_id);
The using clause does exactly what you want.
In your case, though, I would still be explicit about the columns:
select om.*, ships.ships
from org_members om left join
(select player_id,
LISTAGG(ship_name, ', ') WITHIN GROUP (order by ship_name) as Ships
from member_ships
group by player_id
) ships
using (player_id);
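The effect of USING on the column list can be checked with a small sketch in SQLite from Python, which handles select * with USING the same way in this respect. The handle column and the sample rows are made up, and group_concat() stands in for LISTAGG().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; only player_id and ship_name come from the question.
    CREATE TABLE org_members (player_id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE member_ships (player_id INTEGER, ship_name TEXT);
    INSERT INTO org_members VALUES (1, 'ace'), (2, 'nova');
    INSERT INTO member_ships VALUES (1, 'Aurora'), (1, 'Cutlass');
""")

ships_subquery = """
    (SELECT player_id, group_concat(ship_name, ', ') AS ships
     FROM member_ships
     GROUP BY player_id) ships
"""

# ON keeps both copies of the join column in the SELECT * output.
on_cursor = conn.execute(
    f"SELECT * FROM org_members LEFT JOIN {ships_subquery} "
    "ON org_members.player_id = ships.player_id"
)
on_columns = [col[0] for col in on_cursor.description]

# USING merges the join column, so it is listed only once.
using_cursor = conn.execute(
    f"SELECT * FROM org_members LEFT JOIN {ships_subquery} USING (player_id)"
)
using_columns = [col[0] for col in using_cursor.description]
using_rows = using_cursor.fetchall()

print(on_columns)
print(using_columns)
```

The LEFT JOIN still returns members without ships; only the duplicated join column is removed from the output.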
It's hard to be sure without seeing your current output, but you could select only the columns of org_members, as below. A column shouldn't appear twice unless you select it twice.
select org_members.* from org_members
Per your comment, in that case you should manually specify the column names instead of using * like
select org_members.player_id,.....,ships.name,....