How to get a count via a subselect - sql

I'm trying to write an SQL which gets the number of rows in a table which has the same value as a specific column.
In this case, the table has a column called 'title'. For each row I return, along with the value of other columns for the row, I want to get the number of rows in the table which have the same value as that row's 'title' value.
For now, the best I have is:
select firstname, lastname, city, state, title, (select count from myTable where title = title);
Obviously, all this gives me is the number of rows in the table in my subselect.
How do I get the right side of the title = to refer to the value of the column's title?
Thanks for any help, anyone.

this is called "correlated subquery". It goes like this:
select firstname, lastname, title,
(select count(1) from table where title=a.title) title_count
from table a;

This is not tested, but I think that it should work for what you are trying to do
WITH titles AS (
SELECT
Title
, COUNT(*) AS Occurences
FROM myTable
GROUP BY Title
)
SELECT
t1.FirstName
, t1.LastName
, t1.City
, t1.state
, t1.title
, titles.Occurences
FROM myTable AS t1
INNER JOIN titles ON t1.Title = titles.Title

Most databases support the window/analytic functions. If you are using one of these (SQL Server, Oracle, Postgres, for example), you can do this as:
select t.firstname, t.lastname, t.title, t.city, t.state,
count(*) over (partition by title) as numwithtitle
from t

Related

How do we find frequency of one column based off two other columns in SQL?

I'm relatively new to working with SQL and wasn't able to find any past threads to solve my question. I have three columns in a table, columns being name, customer, and location. I'd like to add an additional column determining which location is most frequent, based off name and customer (first two columns).
I have included a photo of an example where name-Jane customer-BEC in my created column would be "Texas" as that has 2 occurrences as opposed to one for California. Would there be anyway to implement this?
If you want 'Texas' on all four rows:
select t.Name, t.Customer, t.Location,
(select t2.location
from table1 t2
where t2.name = t.name
group by name, location
order by count(*) desc
fetch first 1 row only
) as most_frequent_location
from table1 t ;
You can also do this with analytic functions:
select t.Name, t.Customer, t.Location,
max(location) keep (dense_rank first order by location_count desc) over (partition by name) most_frequent_location
from (select t.*,
count(*) over (partition by name, customer, location) as location_count
from table1 t
) t;
Here is a db<>fiddle.
Both of these version put 'Texas' in all four rows. However, each can be tweaks with minimal effort to put 'California' in the row for ARC.
In Oracle, you can use aggregate function stats_mode() to compute the most occuring value in a group.
Unfortunately it is not implemented as a window function. So one option uses an aggregate subquery, and then a join with the original table:
select t.*, s.top_location
from mytable t
inner join (
select name, customer, stats_mode(location) top_location
from mytable
group by name, customer
) s where s.name = t.name and s.customer = t.customer
You could also use a correlated subquery:
select
t.*,
(
select stats_mode(t1.location)
from mytable t1
where t1.name = t.name and t1.customer = t.customer
) top_location
from mytable t
This is more a question about understanding the concepts of a relational database. If you want that information, you would not put that in an additional column. It is calculated data over multiple columns - why would you store that in the table itself ? It is complex to code and it would also be very expensive for the database (imagine all the rows you have to calculate that value for if someone inserted a million rows)
Instead you can do one of the following
Calculate it at runtime, as shown in the other answers
if you want to make it more persisent, you could embed that query above in a view
if you want to physically store the info, you could use a materialized view
Plenty of documentation on those 3 options in the official oracle documentation
Your first step is to construct a query that determines the most frequent location, which is as simple as:
select Name, Customer, Location, count(*)
from table1
group by Name, Customer, Location
This isn't immediately useful, but the logic can be used in row_number(), which gives you a unique id for each row returned. In the query below, I'm ordering by count(*) in descending order so that the most frequent occurrence has the value 1.
Note that row_number() returns '1' to only one row.
So, now we have
select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1 tb_
group by Name, Customer, Location
The final step puts it all together:
select tab.*, tb_.Location most_freq_location
from table1 tab
inner join
(select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1
group by Name, Customer, Location) tb_
on tb_.Name = tab.Name
and tb_.Customer = tab.Customer
and freq_name_cust = 1
You can see how it all works in this Fiddle where I deliberately inserted rows with the same frequency for California and Texas for one of the customers for illustration purposes.

SQL: find most common values for specific members in column 1

I have the following SQL related question:
Let us assume I have the following simple data table:
I would like to identify the most common street address and place it in column 3:
I think this should be fairly straight-forward using COUNT? Not quite sure how to go about it though. Any help is greatly appreciated
Regards
This is a very long method that I just wrote. It only lists the most frequent address. You have to get these values and insert them into the table. See if it works for you:
select * from
(select d.company, count(d.address) as final, c.maxcount,d.address
from dbo.test d inner join
(select a.company,max(a.add_count) as maxcount from
(select company,address,count(address) as add_count from dbo.test group by company,address)a
group by a.company) c
on (d.company = c.company)
group by d.company,c.maxcount,d.address)e
where e.maxcount=e.final
Here is a query in standard SQL. It first counts records per company and address, then ranks them per company giving the most often occurring address rank #1. Then it only keeps those best ranked address records, joins with the table again and shows the results.
select
mytable.company,
mytable.address,
ranked.address as most_common_address
from mytable
join
(
select
company,
address,
row_number() over (partition by company oder by cnt desc) as rn
from
(
select
company,
address,
count(*) over (partition by company, address) as cnt
from mytable
) counted
) ranked on ranked.rn = 1
and ranked.company = mytable.company
and ranked.address = mytable.address;
This select statement will give you the most frequent occurrence. Let us call this A.
SELECT `value`,
COUNT(`value`) AS `value_occurrence`
FROM `my_table`
GROUP BY `value`
ORDER BY `value_occurrence` DESC
LIMIT 1;
To INSERT this into your table,
INSERT INTO db (col1, col2, col3) VALUES (val1, val2, A)
Note that you want that whole select statment for A!
You don't mention your DBMS. Here is a solution for Oracle.
select
company,
address,
(
select stats_mode(address)
from mytable this_company_only
where this_company_only.company = mytable.company
) as most_common_address
from mytable;
This looks a bit clumsy, because STATS_MODE is only available as an aggregate function, not as an analytic window function.

Get the first instance of a row using MS Access

EDITED:
I have this query wherein I want to SELECT the first instance of a record from the table petTable.
SELECT id,
pet_ID,
FIRST(petName),
First(Description)
FROM petTable
GROUP BY pet_ID;
The problem is I have huge number of records and this query is too slow. I discovered that GROUP BY slows down the query. Do you have any idea that could make this query faster? or better, a query wherein I don't need to use GROUP BY?
"The problem is I have huge number of records and this query is too slow. I discovered that GROUP BY slows down the query. Do you have any idea that could make this query faster?"
And an index on pet_ID, then create and test this query:
SELECT pet_ID, Min(id) AS MinOfid
FROM petTable
GROUP BY pet_ID;
Once you have that query working, you can join it back to the original table --- then it will select only the original rows which match based on id and you can retrieve the other fields you want from those matching rows.
SELECT pt.id, pt.pet_ID, pt.petName, pt.Description
FROM
petTable AS pt
INNER JOIN
(
SELECT pet_ID, Min(id) AS MinOfid
FROM petTable
GROUP BY pet_ID
) AS sub
ON pt.id = sub.MinOfid;
Your Query could change as,
SELECT ID, pet_ID, petName, Description
FROM petTable
WHERE ID IN
(SELECT Min(ID) As MinID FROM petTable GROUP BY pet_ID);
Or use the TOP clause,
SELECT petTable.petID, petTable.petName, petTable.[description]
FROM petTable
WHERE petTable.ID IN
(SELECT TOP 1 ID
FROM petTable AS tmpTbl
WHERE tmpTbl.petID = petTable.petID
ORDER BY tmpTbl.petID DESC)
ORDER BY petTable.petID, petTable.petName, petTable.[description];

Find duplicated rows that are not exactly same

Can i select all rows that have same column value (for example SSN field) but display them all separably. ?
I've searched for this answer but they all have "count(*) and group by" section that demands the rows to be exactly same.
Try This:
SELECT A, B FROM MyTable
WHERE A IN
(
SELECT A FROM MyTable GROUP BY A HAVING COUNT(*)>1
)
I have done with SQL server. But hope this is what you need
Here is another approach, which only references the table once, using an analytic function instead of a subquery to get the duplicate counts It might be faster; it also might not, depending on the particular data.
SELECT * FROM (
SELECT col1, col2, col3, ssn, COUNT(*) OVER (PARTITION BY ssn) ssn_dup_count
)
WHERE ssn_dup_count > 1
ORDER BY ssn_dup_count DESC
SELECT
*
FROM
MyTable
WHERE
EXISTS
(
SELECT
NULL
FROM
MyTable MT
WHERE
MyTable.SameColumnName = MT.SameColumnName
AND MyTable.DifferentColumnName <> MT.DifferentColumnName)
This will fetch the required data and show them in order so that we can see the grouped data together.
SELECT * FROM TABLENAME
WHERE SSN IN
(
SELECT SSN FROM TABLENAMEGROUP BY SSN HAVING COUNT(SSN)>1
)
ORDER BY SSN
Here SSN is the column names fro which similar value check is done.

Selecting COUNT(*) with DISTINCT

In SQL Server 2005 I have a table cm_production that lists all the code that's been put into production. The table has a ticket_number, program_type, program_name and push_number along with some other columns.
GOAL: Count all the DISTINCT program names by program type and push number.
What I have so far is:
DECLARE #push_number INT;
SET #push_number = [HERE_ADD_NUMBER];
SELECT DISTINCT COUNT(*) AS Count, program_type AS [Type]
FROM cm_production
WHERE push_number=#push_number
GROUP BY program_type
This gets me partway there, but it's counting all the program names, not the distinct ones (which I don't expect it to do in that query). I guess I just can't wrap my head around how to tell it to count only the distinct program names without selecting them. Or something.
Count all the DISTINCT program names by program type and push number
SELECT COUNT(DISTINCT program_name) AS Count,
program_type AS [Type]
FROM cm_production
WHERE push_number=#push_number
GROUP BY program_type
DISTINCT COUNT(*) will return a row for each unique count. What you want is COUNT(DISTINCT <expression>): evaluates expression for each row in a group and returns the number of unique, non-null values.
I needed to get the number of occurrences of each distinct value. The column contained Region info.
The simple SQL query I ended up with was:
SELECT Region, count(*)
FROM item
WHERE Region is not null
GROUP BY Region
Which would give me a list like, say:
Region, count
Denmark, 4
Sweden, 1
USA, 10
You have to create a derived table for the distinct columns and then query the count from that table:
SELECT COUNT(*)
FROM (SELECT DISTINCT column1,column2
FROM tablename
WHERE condition ) as dt
Here dt is a derived table.
SELECT COUNT(DISTINCT program_name) AS Count, program_type AS [Type]
FROM cm_production
WHERE push_number=#push_number
GROUP BY program_type
try this:
SELECT
COUNT(program_name) AS [Count],program_type AS [Type]
FROM (SELECT DISTINCT program_name,program_type
FROM cm_production
WHERE push_number=#push_number
) dt
GROUP BY program_type
You can try the following query.
SELECT column1,COUNT(*) AS Count
FROM tablename where createddate >= '2022-07-01'::date group by column1
This is a good example where you want to get count of Pincode which stored in the last of address field
SELECT DISTINCT
RIGHT (address, 6),
count(*) AS count
FROM
datafile
WHERE
address IS NOT NULL
GROUP BY
RIGHT (address, 6)