Filter on specific columns and return all columns - sql

I am trying to left join two tables and retrieve all columns from table one but remove duplicates based on a set of columns.
SELECT A.*, B.impact
FROM #Site_one AS A WITH (NOLOCK)
LEFT JOIN #Progress AS B With (NOLOCK)
ON lower(A.site_code) = lower(B.site_code)
GROUP BY A.date, A.operationid, A.worklocation, A.siteid, A.alias
This does not work, because every column in A that is not in the GROUP BY clause must either be aggregated or be added to the GROUP BY clause. The issue is that I do not want to filter on those columns and do not want them aggregated.
Is there a way to select all columns in A and the impact column in B and still be able to filter out duplicates on the columns specified in the group by clause?
Any pointers/help would be greatly appreciated.

"...and still be able to filter out duplicates on the columns specified in the group by clause"
But, how does the database really know which rows to throw away? Suppose you have:
Person
John, 42, Stockbroker
John, 36, Train driver
John, 58, Retired
John, 58, Metalworker
And you think "I wanna dedupe those based on the name":
SELECT * FROM person GROUP BY name
So which three Johns should the DB throw away?
It cannot decide this for you; you have to write the query to make it clear what you want to keep or throw away.
You could MAX everything:
SELECT name, MAX(age), MAX(job) FROM person GROUP BY name
That'll work.. but it gives you a John that never existed in the original data:
John, 58, Train driver
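To see that "Frankenstein" row come out of a real engine, here is a small sketch using Python's built-in sqlite3 module. The table layout (name, age, job) is my assumption based on the sample rows above, and SQLite is only a stand-in for whatever database you actually use.

```python
import sqlite3

# Assumed schema for the sample data: person(name, age, job).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER, job TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("John", 42, "Stockbroker"),
        ("John", 36, "Train driver"),
        ("John", 58, "Retired"),
        ("John", 58, "Metalworker"),
    ],
)

# Each MAX is evaluated on its own, so the values can come from different rows.
merged = conn.execute(
    "SELECT name, MAX(age), MAX(job) FROM person GROUP BY name"
).fetchall()
original = conn.execute("SELECT name, age, job FROM person").fetchall()

print(merged)                 # the single "deduplicated" row
print(merged[0] in original)  # whether that row ever existed in the table
```

The age comes from one row and the job from another, which is exactly why aggregating every column is rarely what you want when deduplicating.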
You could say "I'll only keep the person with the max age":
SELECT p.*
FROM
person p
INNER JOIN (SELECT name, max(age) as maxage FROM person GROUP BY name) maxp
ON p.name = maxp.name AND p.age = maxp.maxage
.. but there are two people with the same max age.
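A quick sketch of that tie, again with Python's sqlite3 module and an assumed person(name, age, job) table. I added an ORDER BY on job only so the output order is stable.

```python
import sqlite3

# Assumed schema for the sample data: person(name, age, job).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER, job TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("John", 42, "Stockbroker"),
        ("John", 36, "Train driver"),
        ("John", 58, "Retired"),
        ("John", 58, "Metalworker"),
    ],
)

# Join each person to the maximum age recorded for their name.
max_age_rows = conn.execute(
    """
    SELECT p.name, p.age, p.job
    FROM person p
    INNER JOIN (SELECT name, MAX(age) AS maxage FROM person GROUP BY name) maxp
        ON p.name = maxp.name AND p.age = maxp.maxage
    ORDER BY p.job
    """
).fetchall()

print(max_age_rows)  # both 58 year old Johns survive the join
```

So the join keeps only real rows, but it does not guarantee one row per name.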
Your DB might have a row number analytic, which is nice:
SELECT *, row_number() over(PARTITION BY name ORDER BY age DESC) rn
FROM person
One of your 58 year old Johns will get row number 1 - can't be sure which one, but you could then discard all the rows with an rn > 1:
WITH x as (
SELECT *, row_number() over(PARTITION BY name ORDER BY age DESC) rn
FROM person
)
SELECT name, age, job
INTO newtable
FROM x
WHERE rn = 1
..but what if you discarded the wrong John...
You're going to have to go and think about this some more, and exactly specify what to throw away...
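Once you have decided which row wins, the row number approach maps onto the original question fairly directly. Below is a rough sketch, not a drop-in answer: it runs on SQLite (3.25 or later for window functions) via Python, the temp tables are replaced by plain tables site_one and progress, and the updated_at column used as the tie-breaker is purely hypothetical. Swap in whatever column really tells you which duplicate to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout; updated_at is a hypothetical "which duplicate wins" column.
conn.execute(
    "CREATE TABLE site_one (date TEXT, operationid INTEGER, worklocation TEXT, "
    "siteid INTEGER, alias TEXT, site_code TEXT, updated_at TEXT)"
)
conn.execute("CREATE TABLE progress (site_code TEXT, impact TEXT)")
conn.executemany(
    "INSERT INTO site_one VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2024-01-01", 1, "north", 10, "alpha", "S1", "2024-01-02"),
        ("2024-01-01", 1, "north", 10, "alpha", "s1b", "2024-01-05"),  # duplicate key
        ("2024-01-01", 2, "south", 20, "beta", "S2", "2024-01-03"),
    ],
)
conn.executemany(
    "INSERT INTO progress VALUES (?, ?)", [("s1", "high"), ("S1B", "low")]
)

deduped = conn.execute(
    """
    WITH ranked AS (
        SELECT a.*, b.impact,
               ROW_NUMBER() OVER (
                   PARTITION BY a.date, a.operationid, a.worklocation,
                                a.siteid, a.alias
                   -- explicit ordering: newest first, site_code breaks ties
                   ORDER BY a.updated_at DESC, a.site_code
               ) AS rn
        FROM site_one AS a
        LEFT JOIN progress AS b
            ON lower(a.site_code) = lower(b.site_code)
    )
    SELECT date, operationid, worklocation, siteid, alias,
           site_code, updated_at, impact
    FROM ranked
    WHERE rn = 1
    ORDER BY operationid
    """
).fetchall()

for row in deduped:
    print(row)
```

The PARTITION BY list plays the role of the GROUP BY in the question, and the ORDER BY inside OVER is where you spell out which duplicate to keep. All columns of A come through untouched because nothing is aggregated.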

Related

Rank() Over Partition By to rank over a table column to give me the older records

This is what I have: a query that retrieves all the persons that have a duplicated email address. I believe rank over partition by should solve my problem (the filter is the email address).
SELECT a.Id, a.EmailAddress,a.UntilDate,a.CreatedOn,a.UserId
INTO #GetEmployeesWithDuplicateEmails
FROM Employee a
INNER JOIN (SELECT
Employee.EmailAddress as EmailAddress
FROM Employee
GROUP BY Employee.EmailAddress
HAVING count(Employee.EmailAddress) > 1
) b
ON a.EmailAddress = b.EmailAddress
ORDER BY a.Id
This is the output of the query: [image: Query Result]
What I want: the first query retrieves the users that have duplicated email addresses. I want to keep the most recent record for each email, provided the duplicates belong to the same UserId. For example, if there are 5 duplicated emails and all 5 belong to the same UserId, I want to keep the newest record based on the CreatedOn field; the other 4 will be updated. I wanted to use rank over partition by, but feel free to suggest a better approach. Here goes:
SELECT #GetEmployeesWithDuplicateEmails.*,
RANK() OVER (
PARTITION BY #GetEmployeesWithDuplicateEmails.CreatedOn
ORDER BY #GetEmployeesWithDuplicateEmails.CreatedOn DESC) createdon_rank
INTO #TableValuesToDelete
FROM #GetEmployeesWithDuplicateEmails
INNER JOIN
(
(SELECT #GetEmployeesWithDuplicateEmails.[EmailAddress]
FROM #GetEmployeesWithDuplicateEmails
GROUP BY #GetEmployeesWithDuplicateEmails.[EmailAddress])
) as temp2 ON #GetEmployeesWithDuplicateEmails.[EmailAddress]=temp2.[EmailAddress]
update
#TableValuesToUpdate
SET
#TableValuesToUpdate.EmployedUntilDate=getDate()
WHERE
created_rank > 1
In short, I want to retain the most recent record for each email when the duplicates belong to the same UserId, based on the CreatedOn field.
Update: I just updated my Partition By query, but it still can't rank the displayed values.
If you want email addresses that belong to multiple users, you can use:
select e.*
from Employee e
where exists (select 1
from Employee e2
where e2.EmailAddress = e.EmailAddress and
e2.id <> e.id -- or however you identify the same employee
)
order by e.EmailAddress;
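If the goal is instead what the question describes (keep the newest row per email and UserId, and close out the older ones), a row number partitioned by both columns is one way to get there. This is only a sketch: it runs on SQLite through Python rather than SQL Server, it uses a fixed date where the question uses getDate(), and the Id tie-breaker is my assumption for rows with identical CreatedOn values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Id INTEGER PRIMARY KEY, EmailAddress TEXT, "
    "UntilDate TEXT, CreatedOn TEXT, UserId INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a@x.com", None, "2020-01-01", 100),
        (2, "a@x.com", None, "2021-01-01", 100),
        (3, "a@x.com", None, "2022-01-01", 100),  # newest for user 100
        (4, "b@x.com", None, "2020-06-01", 200),  # no duplicate
        (5, "a@x.com", None, "2019-01-01", 300),  # same email, other user
    ],
)

# Rank rows inside each (EmailAddress, UserId) group, newest first,
# then stamp UntilDate on everything except the newest row.
conn.execute(
    """
    UPDATE Employee
    SET UntilDate = ?
    WHERE Id IN (
        SELECT Id
        FROM (
            SELECT Id,
                   ROW_NUMBER() OVER (
                       PARTITION BY EmailAddress, UserId
                       ORDER BY CreatedOn DESC, Id DESC
                   ) AS rn
            FROM Employee
        ) AS ranked
        WHERE rn > 1
    )
    """,
    ("2024-01-01",),  # fixed value standing in for the current date
)

closed_ids = [
    row[0]
    for row in conn.execute(
        "SELECT Id FROM Employee WHERE UntilDate IS NOT NULL ORDER BY Id"
    )
]
print(closed_ids)
```

Note that the partition is on EmailAddress and UserId, not on CreatedOn. Partitioning by the same column you order by puts almost every row in its own group, so every row ranks 1.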

Count() how many times a name shows up in a table with the rest of info

I have read in various websites about the count() function but I still cannot make this work.
I made a small table with (id, name, last name, age) and I need to retrieve all columns plus a new one. In this new column I want to display how many times a name shows up or repeats itself in the table.
I have run some tests and can retrieve the name column together with the count column, but I haven't been able to retrieve all the data from the table.
Currently I have this
select a.n_showsup, p.*
from [test1].[dbo].[person] p,
(select count(*) n_showsup
from [test1].[dbo].[person])a
This gives me all data on output but on the column n_showsup it gives me just the number of rows, now I know this is because I'm missing a GROUP BY but then when I write group by NAME it shows me a lot of records. This is an example of what I need:
You can use window functions, if your RDBMS supports them:
select t.*, count(*) over(partition by name) n_showsup
from mytable t
Alternatively, you can join the table with an aggregation query that counts the number of occurrences of each name:
select t.*, x.n_showsup
from mytable t
inner join (select name, count(*) n_showsup from mytable group by name) x
on x.name = t.name
While the window function approach (@GMB's answer) is the right way to go, thinking through this from a subquery approach (like you were headed towards) would look something like:
select p.*, a.n_showsup
from [test1].[dbo].[person] p
INNER JOIN (
select name, count(*) n_showsup
from [test1].[dbo].[person]
GROUP BY name
) a ON p.name = a.name
This is VERY close to what you had, the difference is that we are grouping that subquery by name (so we get a count by name) and we can use that in the join criteria which we do with the ON clause on that INNER JOIN.
You should really never ever use a comma in your FROM clause. Instead use a JOIN.
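If you want to convince yourself that the window function and the join give the same result, here is a small check using Python's sqlite3 module (SQLite 3.25 or later). The person table and its sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample table matching the (id, name, last name, age) description.
conn.execute(
    "CREATE TABLE person (id INTEGER, name TEXT, last_name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [(1, "Ana", "Lopez", 30), (2, "Ana", "Diaz", 25), (3, "Luis", "Mora", 40)],
)

# Window function: count rows sharing the same name, without collapsing rows.
window_rows = conn.execute(
    """
    SELECT p.*, COUNT(*) OVER (PARTITION BY name) AS n_showsup
    FROM person p
    ORDER BY p.id
    """
).fetchall()

# Join against a per-name aggregate.
join_rows = conn.execute(
    """
    SELECT p.*, a.n_showsup
    FROM person p
    INNER JOIN (
        SELECT name, COUNT(*) AS n_showsup FROM person GROUP BY name
    ) a ON p.name = a.name
    ORDER BY p.id
    """
).fetchall()

print(window_rows)
print(window_rows == join_rows)
```

Both queries keep every row of the table and simply attach the per-name count to it.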

SQL: Get the first value

I have two tables:
patients(ID, Firstname, Lastname, ...)
records(ID, Date, Time, Version)
I want to (inner) join these tables, so I have the records with patient data, but in the column for Version I want always the first value that was recorded for the patient (so with the minimum of date and time dependent on the patient (id)). I tried with subquery but HANA doesn't allow ORDER-BY or LIMIT clause in subqueries.
How can I implement this with SQL? (HANA SQL)
Kind regards and thanks in advance.
HANA supports window functions, so you can join against a derived table that picks the first version:
select p.*, r.id, r.date, r.time, r.version
from patients p
join (
select id, date, time, version, patient_id,
row_number() over (partition by patient_id order by version) as rn
from records
) r on p.id = r.patient_id and r.rn = 1
The above assumes that the records table has a column patient_id that contains the id of the row in the patients table to which that record belongs.
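If "first" should mean the earliest date and time, as the question describes, rather than the lowest version value, the same pattern works with a different ORDER BY in the window. A sketch under assumptions: it runs on SQLite via Python instead of HANA, the patient_id column is assumed as above, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, firstname TEXT, lastname TEXT)")
# patient_id is an assumed foreign key, not listed in the question.
conn.execute(
    "CREATE TABLE records (id INTEGER, patient_id INTEGER, date TEXT, "
    "time TEXT, version INTEGER)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Bob", "Ray")],
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    [
        (10, 1, "2023-01-05", "09:00", 1),
        (11, 1, "2023-01-02", "14:00", 2),
        (12, 1, "2023-01-02", "08:00", 3),  # earliest record for patient 1
        (13, 2, "2023-02-01", "10:00", 1),
    ],
)

first_records = conn.execute(
    """
    SELECT p.id, p.firstname, r.id, r.date, r.time, r.version
    FROM patients p
    JOIN (
        SELECT id, patient_id, date, time, version,
               ROW_NUMBER() OVER (
                   PARTITION BY patient_id
                   ORDER BY date, time      -- earliest recording first
               ) AS rn
        FROM records
    ) r ON p.id = r.patient_id AND r.rn = 1
    ORDER BY p.id
    """
).fetchall()

for row in first_records:
    print(row)
```

In the sample data the earliest record for patient 1 carries version 3, so ordering by version and ordering by date and time pick different rows. Which one is right depends on what "first" means in your data.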

How to do the max count part in SQL?

I was told to find out which occupation has the greatest number of patients with conditionID=MC8.
I don't know how to do the "greatest" part.
Here my code right now
SELECT occupation
FROM Patient
WHERE EXISTS
(SELECT PatientID FROM PatientMedcon
Where conditionID='MC8')
GROUP BY occupation
HAVING count(occupation) = (Select max(occupation)
From Patient)
You should approach these types of queries using regular joins and then add additional factors. The following gets the count of patients for each occupation with that condition:
SELECT occupation, COUNT(*)
FROM Patient p JOIN
PatientMedcon pm
ON p.PatientId = pm.PatientId and
pm.conditionId = 'MC8'
GROUP BY occupation
ORDER BY COUNT(*) DESC;
If you want the top row, that depends on the database. It might be select top 1, limit 1 at the end, fetch first 1 rows only at the end, or even something else.
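As one concrete instance of the "top row" variants, here is the query with LIMIT 1, which is the SQLite spelling, run through Python's sqlite3 module. The sample patients are invented; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patient (PatientId INTEGER, occupation TEXT)")
conn.execute("CREATE TABLE PatientMedcon (PatientId INTEGER, conditionId TEXT)")
conn.executemany(
    "INSERT INTO Patient VALUES (?, ?)",
    [(1, "nurse"), (2, "nurse"), (3, "teacher"), (4, "teacher"), (5, "driver")],
)
conn.executemany(
    "INSERT INTO PatientMedcon VALUES (?, ?)",
    [(1, "MC8"), (2, "MC8"), (3, "MC8"), (4, "MC2"), (5, "MC8")],
)

# Count MC8 patients per occupation and keep only the largest group.
top = conn.execute(
    """
    SELECT occupation, COUNT(*) AS n
    FROM Patient p
    JOIN PatientMedcon pm
        ON p.PatientId = pm.PatientId
       AND pm.conditionId = 'MC8'
    GROUP BY occupation
    ORDER BY COUNT(*) DESC
    LIMIT 1
    """
).fetchone()

print(top)
```

Keep in mind that LIMIT 1 (or TOP 1) returns a single row even when two occupations tie for first place, so it only fits if an arbitrary winner among ties is acceptable.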

How to find the highest populated instance in a column in SQL

So I have a table (person), that contains columns such as persons name, age, eye-color, favorite movie.
How do I find the most popular eye color(s), returning just the eye color (not the count), using SQL (Microsoft Access), without using TOP, as there might be multiple colours with the same count?
Thank you
SELECT
EyeColor
FROM
Person
GROUP BY
EyeColor
HAVING
COUNT(*) = (
SELECT MAX(i.EyeColorCount) FROM (
SELECT COUNT(*) AS EyeColorCount FROM Person GROUP BY EyeColor
) AS i
)
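A small check that the HAVING version really does return every colour tied for first place. It runs on SQLite via Python rather than Access, and the Person rows are invented, so treat it as a sketch of the logic rather than proof that Access accepts the same syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Name TEXT, EyeColor TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [
        ("Ann", "brown"),
        ("Bob", "brown"),
        ("Cid", "blue"),
        ("Dee", "blue"),
        ("Eve", "green"),
    ],
)

# Keep every eye colour whose count equals the largest per-colour count.
most_common = [
    row[0]
    for row in conn.execute(
        """
        SELECT EyeColor
        FROM Person
        GROUP BY EyeColor
        HAVING COUNT(*) = (
            SELECT MAX(i.EyeColorCount) FROM (
                SELECT COUNT(*) AS EyeColorCount FROM Person GROUP BY EyeColor
            ) AS i
        )
        ORDER BY EyeColor
        """
    )
]

print(most_common)
```

Brown and blue both appear twice, so both come back, and the count itself is not part of the output.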
In Access, I think you need something on the lines of:
SELECT First(t.Eyecolor) AS FirstOfEyeColor
FROM (SELECT p.EyeColor, Count(p.EyeColor) AS C
FROM Person p
GROUP BY p.EyeColor
ORDER BY Count(p.EyeColor) DESC) AS t;