How to group by so duplicates are not counted - sql

I have this query :
select resorti.resort_id
,resorti.resort
,hoteli.resort_id
,hoteli.hotel_id
,hoteli.hotel
from resorti
inner join
hoteli on resorti.resort_id = hoteli.resort_id
How do I change this query so that the name of the resort is listed only once if there are many resorts with the same name ?
Edit: I altered the query. Here are the actual results :
Here are the desired results:

You can group by the resort info but you'll lose the detail you want on the hotels.
Truly you are better suppressing the "duplicate" resort entry in you application layer.
You could generate row number within each group in a derived table and then query that suppressing the resort when row number <> 1, but the syntax would depend on your rdbms.
Edit:
against my better judgement, here's how you could do it in sql sever. same is possible without tsql's row_number or a cte, it's just more concise.
create table resorti (resort_id INT, resort VARCHAR(50))
create table hoteli (hotel_id INT, resort_id INT, hotel VARCHAR(50))
insert into resorti values (1,'resort_1'),(2,'resort_2')
insert into hoteli values (1,1,'hotel_1a'),(2,1,'hotel_1b'), (3,2,'hotel_2a'),(4,2,'hotel_2b')
;with cte as (
SELECT *
,ROW_NUMBER() OVER (PARTITION BY resort_id ORDER BY hotel_id) rn
FROM hoteli
)
select case when rn=1 then resorti.resort_id end resort_id
,case when rn=1 then resorti.resort end resort
,resorti.resort
,cte.hotel_id
,cte.hotel
from resorti
inner join
cte on resorti.resort_id = cte.resort_id

Please give more information of your database... If you group by resort and there are two rows with same name but different resort_id then a group is only possible if you don't select the resort_id. Or your second selected column of table resorti is calculated by one of these functions: SUM, AVG, MAX, MIN, and COUNT
Try something like this
SELECT
resorti.resort,
hoteli.hotel_id,
hoteli.hotel
FROM resorti
INNER JOIN hoteli ON resorti.resort_id = hoteli.resort_id
GROUP BY resorti.resort

Related

SQL: find most common values for specific members in column 1

I have the following SQL related question:
Let us assume I have the following simple data table:
I would like to identify the most common street address and place it in column 3:
I think this should be fairly straight-forward using COUNT? Not quite sure how to go about it though. Any help is greatly appreciated
Regards
This is a very long method that I just wrote. It only lists the most frequent address. You have to get these values and insert them into the table. See if it works for you:
select * from
(select d.company, count(d.address) as final, c.maxcount,d.address
from dbo.test d inner join
(select a.company,max(a.add_count) as maxcount from
(select company,address,count(address) as add_count from dbo.test group by company,address)a
group by a.company) c
on (d.company = c.company)
group by d.company,c.maxcount,d.address)e
where e.maxcount=e.final
Here is a query in standard SQL. It first counts records per company and address, then ranks them per company giving the most often occurring address rank #1. Then it only keeps those best ranked address records, joins with the table again and shows the results.
select
mytable.company,
mytable.address,
ranked.address as most_common_address
from mytable
join
(
select
company,
address,
row_number() over (partition by company oder by cnt desc) as rn
from
(
select
company,
address,
count(*) over (partition by company, address) as cnt
from mytable
) counted
) ranked on ranked.rn = 1
and ranked.company = mytable.company
and ranked.address = mytable.address;
This select statement will give you the most frequent occurrence. Let us call this A.
SELECT `value`,
COUNT(`value`) AS `value_occurrence`
FROM `my_table`
GROUP BY `value`
ORDER BY `value_occurrence` DESC
LIMIT 1;
To INSERT this into your table,
INSERT INTO db (col1, col2, col3) VALUES (val1, val2, A)
Note that you want that whole select statment for A!
You don't mention your DBMS. Here is a solution for Oracle.
select
company,
address,
(
select stats_mode(address)
from mytable this_company_only
where this_company_only.company = mytable.company
) as most_common_address
from mytable;
This looks a bit clumsy, because STATS_MODE is only available as an aggregate function, not as an analytic window function.

Retrieving most recent data in SQL

Total disclosure: I'm a SQL beginner.
I have a data set of certain accounting and governance metrics for US companies. It has about 15 columns and roughly 18 million rows. Each row is a unique combination of company, date and metric being measured. The columns include certain identifiers like isin number, ticker symbol, etc, the date the metric was released, the metric description, and the metric itself.
What I'm trying to do is write a query that will yield the NEWEST values for a certain metric for all companies. In my hopeless search over the past few days I've come to think that the GROUP BY clause may be what I'm looking for. However, it doesn't seem to do exactly what I need. I've got it working with just 2 columns: isin number (company identifier), and date. In other words, I can spit out a list that shows the most recent date for each company, but I'm not sure how to add more columns to this, how to specify what metric to look at.
Any guidance would be appreciated, even if it's just pointing me in the right direction towards what kind of commands I should be looking into.
Thanks!
EDIT: Wow. Thanks for the quick and thorough replies. And point taken on the clarity and example data sets/starting query. Update: I think I have it working. Here's what I used:
SELECT a1.["id_isin_number"], a1.["metric_description"], a1.["date_period_ends"], a1.["company_metric_value"], a2.maxdate
FROM [AGR Metrics].[dbo].[Audit_Integrity_Metric_Data_File_NA Original_0] a1
INNER JOIN (
SELECT a2.["id_isin_number"], MAX(a2.["date_period_ends"]) AS maxdate
FROM [AGR Metrics].[dbo].[Audit_Integrity_Metric_Data_File_NA Original_0] a2
GROUP BY a2.["id_isin_number"]
) a2
ON a1.["date_period_ends"] = a2.maxdate
AND a1.["id_isin_number"] = a2.["id_isin_number"]
WHERE a1.["metric_description"] = '"Litigation: Class Action"'
I'm looking over the responses now to make sure I'm doing this as efficiently as possible.
You can use the ROW_NUMBER() function for this (if using SQL Server 2005 or newer):
SELECT *
FROM (SELECT *,ROW_NUMBER() OVER(PARTITION BY isin ORDER BY [date] DESC) AS RowRank
FROM YourTable
)sub
WHERE RowRank = 1
Just list out the fields you want in place of * if you don't want them all returned.
The ROW_NUMBER() function adds a number to each row, PARTITION BY is optional and is used to define a group for which numbering will start over at 1, in this case, you want the most recent for each value of isin so we PARTITION BY that. ORDER BY is required and defines the order of the numbering, in this case by date.
Your current query can also be used, but the ROW_NUMBER() method is simpler and more efficient:
SELECT a.*
FROM YourTable a
JOIN (SELECT isin, MAX([date])
FROM YourTable
GROUP BY isin
)b
ON a.isin = b.isin
AND a.[date] = b.[date]
Well as you quote the date the metric was released , So you can use it to sort your table using Order By .
This is a very basic example which can be used to simply sort data and selecting top 1 value.
Please refer This
CREATE TABLE trialOne (
Id INT NULL,
NAME VARCHAR(50) NULL,
[Date] DATETIME NULL
)
SELECT * FROM dbo.ETProgram
INSERT INTO trialone VALUES(1,'john','2009-01-06 11:39:51.827')
INSERT INTO trialone VALUES(2,'joseph','2010-01-06' )
INSERT INTO trialone VALUES(3,'Ajay','2009-05-06' )
INSERT INTO trialone VALUES(4,'Dave','2009-11-06' )
INSERT INTO trialone VALUES(5,'jonny','2004-01-06')
INSERT INTO trialone VALUES(6,'sunny','2005-01-06')
INSERT INTO trialone VALUES(7,'elle','2013-01-06' )
INSERT INTO trialone VALUES(8,'mac','2012-01-06' )
INSERT INTO trialone VALUES(8,'Sam','2008-01-06' )
INSERT INTO trialone VALUES(10,'xxxxx','2013-08-06')
SELECT TOP(1)name FROM trialone ORDER BY Date DESC

SQL Server: Select all but the first result in query

We have an issue with two tables, let's call them Item and ItemStatuses. We track each change to the status, so there is a start date and end date in the ItemStatuses table. We track the current status by looking for the one with an end date that is null.
Through an error in the system the newest status was added multiple times to a number of items. I need to select all but the first status for each item. I have the following query which gives me all the open statuses. I was trying this route because I figured I could use the row number to skip the first one, but there are multiple Items in these sets, so I need to skip the first status for each item. I think I'm pretty close with my query, but I'm not sure what I need to do.
SELECT ID, rn = ROW_NUMBER() OVER (ORDER BY ItemID)
FROM ItemStatuses WHERE ID IN
(
SELECT
s.ID
FROM Items as i
INNER JOIN ItemStatuses AS s ON
i.ID = s.ItemID AND
s.EndDate IS NULL
GROUP BY i.ID
HAVING COUNT(i.ID) > 1
)
To illustrate how to update all but the first status of your table:
declare #itemstatuses table (id int, Enddate datetime, theStatus int)
insert into #itemstatuses
values (1,getdate()-3,1),(1,getdate()-2,2),(1,getdate()-1,3),
(2,getdate()-3,2),(2,getdate()-2,2),(2,getdate()-1,99),
(3,getdate(),1)
select 'before',* from #itemStatuses
;with sorted
as (
select [r] = row_number()over(partition by id order by Enddate), *
from #ItemStatuses
)
update sorted
set theStatus = 100
where r>1
select 'after',* from #itemStatuses
I would simplify your SQL query since this can become overly complicated and expensive.
Then use your server sided language to perform any filtering or condition.

sql query - filtering duplicate values to create report

I am trying to list all the duplicate records in a table. This table does not have a Primary Key and has been specifically created only for creating a report to list out duplicates. It comprises of both unique and duplicate values.
The query I have so far is:
SELECT [OfficeCD]
,[NewID]
,[Year]
,[Type]
FROM [Test].[dbo].[Duplicates]
GROUP BY [OfficeCD]
,[NewID]
,[Year]
,[Type]
HAVING COUNT(*) > 1
This works right and gives me all the duplicates - that is the number of times it occurs.
But I want to display all the values in my report of all the columns. How can I do that without querying for each record separately?
For example:
Each table has 10 fields and [NewID] is the field which is occuring multiple times.I need to create a report with all the data in all the fields where newID has been duplicated.
Please help.
Thank you.
You need a subquery:
SELECT * FROM yourtable
WHERE NewID IN (
SELECT NewID FROM yourtable
GROUP BY OfficeCD,NewID,Year,Type
HAVING Count(*)>1
)
Additionally you might want to check your tags: You tagged mysql, but the Syntax lets me think you mean sql-server
Try this:
SELECT * FROM [Duplicates] WHERE NewID IN
(
SELECT [NewID] FROM [Duplicates] GROUP BY [NewID] HAVING COUNT(*) > 1
)
select d.*
from Duplicates d
inner join (
select NewID
from Duplicates
group by NewID
having COUNT(*) > 1
) dd on d.NewID = dd.NewID

How to find max value and its associated field values in SQL?

Say I have a list of student names and their marks. I want to find out the highest mark and the student, how can I write one select statement to do that?
Assuming you mean marks rather than remarks, use:
select name, mark
from students
where mark = (
select max(mark)
from students
)
This will generally result in a fairly efficient query. The subquery should be executed once only (unless your DBMS is brain-dead) and the result fed into the second query. You may want to ensure that you have an index on the mark column.
If you don't want to use a subquery:
SELECT name, remark
FROM students
ORDER BY remark DESC
LIMIT 1
select name, remarks
from student
where remarks =(select max(remarks) from student)
If you are using a database that supports windowing,
SELECT name, mark FROM
(SELECT name, mark, rank() AS rk
FROM student_marks OVER (ORDER BY mark DESC)
) AS subqry
WHERE subqry.rk=1;
This probably does not run as fast as the mark=(SELECT MAX(mark)... style query, but it would be worth checking out.
In SQL Server:
SELECT TOP 1 WITH TIES *
FROM Students
ORDER BY Mark DESC
This will return all the students that have the highest mark, whether there is just one of them or more than one. If you want only one row, drop the WITH TIES specifier. (But the actual row is not guaranteed to be always the same then.)
You can create view and join it with original table:
V1
select id , Max(columName)
from t1
group by id
select * from t1
where t1.id = V1.id and t1.columName = V1.columName
this is right if you need Max Values with related info
I recently had a need for something "kind of similar" to this post and wanted to share a technique. Say you have an Order and OrderDetail table, and you want to return info from the Order table along with the product name associated with the highest priced detail row. Here's a way to pull that off without subtables, RANK, etc.. The key is to create and aggregate that combined the key and value from the detailed table and then just max on that and substring out the value you want.
create table CustOrder(ID int)
create table CustOrderDetail(OrderID int, Price money, ProdName varchar(20))
insert into CustOrder(ID) values(1)
insert into CustOrderDetail(OrderID,Price,ProdName) values(1,10,'AAA')
insert into CustOrderDetail(OrderID,Price,ProdName) values(1,50,'BBB')
insert into CustOrderDetail(OrderID,Price,ProdName) values(1,10,'CCC')
select
o.ID,
JoinAggregate=max(convert(varchar,od.price)+'*'+od.prodName),
maxProd=
SUBSTRING(
max(convert(varchar,od.price)+'*'+od.prodName)
,CHARINDEX('*',max(convert(varchar,od.price)+'*'+convert(varchar,od.prodName))
)+1,9999)
from
CustOrder o
inner join CustOrderDetail od on od.orderID = o.ID
group by
o.ID