SQL statements without group by - sql

If I wanted to find all values in a table that occur more than twice without using group by, how would I do that? I understand how to do this with group by and was curious how to do it without group by (EDIT: could you do this with join?).
For example, if I had last names in a certain zip code, and I wanted to find entries with this last name more than twice, how would I do this without group by in SQL statements?
I tried
select name, count() from population order by name asc having count() > 2;
but that doesn't do what I want it to. Any suggestions?

Since this is tagged only as sql, it seems a general solution is wanted. Window functions have been part of the standard since the SQL:2003 revision, so it should be fair to say that this can be solved with them:
SELECT name FROM (
  SELECT
    -- rn is used as the alias because RANK is a reserved word in some databases
    ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS rn,
    name
  FROM population
) s
WHERE rn = 3
Anyway, the fact that it is possible to solve this without a GROUP BY doesn't mean that you should :)
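To make the window-function idea concrete, here is a minimal sketch that runs it through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The sample rows are invented for illustration; only the population table and name column come from the question.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (name TEXT)")
conn.executemany(
    "INSERT INTO population (name) VALUES (?)",
    [("John",), ("John",), ("John",), ("Ram",), ("Kelly",), ("Kelly",)],
)

# A name only reaches row number 3 if it occurs at least three times,
# so filtering on rn = 3 returns each such name exactly once.
query = """
SELECT name FROM (
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS rn
    FROM population
) AS s
WHERE rn = 3
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

Filtering on rn = 3 rather than rn > 2 avoids needing a DISTINCT, since each qualifying name has exactly one third row.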

A correlated subquery can work for you. Please check.
Assuming the data set given below
Id Zip Lastname
--- ----- --------
101 12345 John
102 12345 John
103 12345 John
104 12345 Ram
105 12345 Kelly
106 12345 Kelly
107 45678 Krishna
108 45678 Krishna
109 45678 Krishna
110 45678 David
111 45678 David
Query
select * from test.population pop1
where 2 < (select count(*) from test.population pop2
           where pop1.Lastname = pop2.Lastname and pop1.Zip = pop2.Zip)
The output of the above query is
Id Zip Lastname
--- ------ --------
101 12345 John
102 12345 John
103 12345 John
107 45678 Krishna
108 45678 Krishna
109 45678 Krishna
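As a rough check of the correlated subquery, the sketch below loads the same rows into an in-memory SQLite database from Python. The second query is not from the answer: it is one possible way to address the "could you do this with join?" part of the question, by self-joining three times on increasing Id so that a result exists only when three distinct rows share a last name and zip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (Id INTEGER, Zip TEXT, Lastname TEXT)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?)",
    [
        (101, "12345", "John"), (102, "12345", "John"), (103, "12345", "John"),
        (104, "12345", "Ram"), (105, "12345", "Kelly"), (106, "12345", "Kelly"),
        (107, "45678", "Krishna"), (108, "45678", "Krishna"),
        (109, "45678", "Krishna"), (110, "45678", "David"), (111, "45678", "David"),
    ],
)

# Correlated subquery: keep a row if more than two rows share its name and zip.
correlated = """
SELECT Id FROM population pop1
WHERE 2 < (SELECT COUNT(*) FROM population pop2
           WHERE pop1.Lastname = pop2.Lastname AND pop1.Zip = pop2.Zip)
ORDER BY Id
"""
ids = [row[0] for row in conn.execute(correlated)]

# Join-only variant (an assumption, not part of the answer above):
# a < b < c on Id forces three distinct rows in the same group.
self_join = """
SELECT DISTINCT a.Zip, a.Lastname
FROM population a
JOIN population b ON b.Lastname = a.Lastname AND b.Zip = a.Zip AND b.Id > a.Id
JOIN population c ON c.Lastname = a.Lastname AND c.Zip = a.Zip AND c.Id > b.Id
ORDER BY a.Zip, a.Lastname
"""
groups = conn.execute(self_join).fetchall()
print(ids, groups)
```

The join variant relies on a unique Id column to tell the rows apart; without one, the correlated subquery is the safer choice.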

Related

how to pull all records in hive based on another column

If this is what my table looks like below:
my_id my_words my_people my_number
100 need more info? Jim 1
100 now Mary 2
100 what's that? Jim 3
101 okay now Jim 1
101 sounds good Mary 2
102 still hungry? Jim 1
102 now I'm thirsty though Mary 2
102 I don't understand Jim 3
102 no I'm not hungry Mary 4
103 are you there? Jim 1
103 I don't know Mary 2
103 That's okay Jim 3
How can I get this output?
my_id my_words my_people my_number
100 need more info? Jim 1
100 now Mary 2
100 what's that? Jim 3
102 still hungry? Jim 1
102 Now I'm thirsty though Mary 2
102 I don't understand Jim 3
Right now I have: SELECT my_id, my_words, my_people, my_number from table where my_people="Mary" AND lower(my_words) like 'now%';
But I don't only want to return those rows, I also want to return Jim's comment right before and right after Mary's (before/after based on my_number column)
Maybe this is unrelated, but ultimately, I'm going to want this in Excel with this format:
my_id Jim_words Mary_words Jim_next_words
100 need more info? now what's that?
102 still hungry? now I'm thirsty though I don't understand
Could you please try the code below? The comments explain each step.
WITH cte AS (
  SELECT my_id, my_words, my_people, my_number,
         row_number() OVER (PARTITION BY my_id ORDER BY my_number) AS rn -- unique row number within each my_id
  FROM table
)
SELECT DISTINCT
  mytab.my_id, chosentab.my_words jims_words, mytab.my_people, mytab.my_number,
  CASE WHEN mytab.rn + 1 = chosentab.rn THEN chosentab.my_words END jims_words_after_marys_now,
  CASE WHEN mytab.rn - 1 = chosentab.rn THEN chosentab.my_words END jims_words_before_marys_now
FROM
  cte mytab,
  cte chosentab
WHERE
  mytab.my_id = chosentab.my_id
  AND lower(mytab.my_words) LIKE 'now%'    -- mytab is the row where Mary said "now"
  AND (mytab.rn + 1 = chosentab.rn         -- Jim's row right after Mary's "now"
       OR mytab.rn - 1 = chosentab.rn)     -- Jim's row right before Mary's "now"; OR, since one row cannot be both
I created the SQL based on our discussion. Could you please validate it and let me know if it worked?
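An alternative sketch, not taken from the answer above, uses LAG and LEAD to pull the previous and next comment onto Mary's row, which gives the one-row-per-id layout wanted for Excel directly. It is run here through Python's sqlite3 (SQLite 3.25+) as a stand-in for Hive, which also provides LAG and LEAD; the table name conversation is hypothetical because the question only calls it "table".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conversation "
    "(my_id INTEGER, my_words TEXT, my_people TEXT, my_number INTEGER)"
)
conn.executemany(
    "INSERT INTO conversation VALUES (?, ?, ?, ?)",
    [
        (100, "need more info?", "Jim", 1),
        (100, "now", "Mary", 2),
        (100, "what's that?", "Jim", 3),
        (101, "okay now", "Jim", 1),
        (101, "sounds good", "Mary", 2),
        (102, "still hungry?", "Jim", 1),
        (102, "now I'm thirsty though", "Mary", 2),
        (102, "I don't understand", "Jim", 3),
        (102, "no I'm not hungry", "Mary", 4),
        (103, "are you there?", "Jim", 1),
        (103, "I don't know", "Mary", 2),
        (103, "That's okay", "Jim", 3),
    ],
)

# LAG/LEAD attach the neighbouring comments (by my_number) to every row;
# the outer filter then keeps only Mary's rows that start with "now".
query = """
WITH ctx AS (
    SELECT my_id, my_words, my_people,
           LAG(my_words)  OVER (PARTITION BY my_id ORDER BY my_number) AS prev_words,
           LEAD(my_words) OVER (PARTITION BY my_id ORDER BY my_number) AS next_words
    FROM conversation
)
SELECT my_id,
       prev_words AS Jim_words,
       my_words   AS Mary_words,
       next_words AS Jim_next_words
FROM ctx
WHERE my_people = 'Mary' AND lower(my_words) LIKE 'now%'
ORDER BY my_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This assumes Jim and Mary strictly alternate, as in the sample data; if they do not, the neighbouring row may not be Jim's.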

Self JOIN to find the parent detail which matches with the row data -

I am trying to query in MS SQL and I can not resolve it. I have a table employees:
Id Name Surname FatherName MotherName WifeName Pincode isChild
-- ------- ------- ---------- ---------- -------- ------- -------
1 John Green James Sue null 101011 1
2 Michael Sloan Barry Lilly null 101011 1
3 Sally Green Andrew Molly Jemi 101011 1
4 Barry Sloan Soul Paul Lilly 101011 0
5 James Green Ned White Sue 101011 0
I want a query that selects the rows whose name and wife name match the father name and mother name of a child row. For example, for id=1, John's father name (James) and mother name (Sue) match id 5, which has James as the name and Sue as the wife name. So my query should return (this is my expected result):
Id Name Surname FatherName MotherName WifeName Pincode isChild
-- ------- ------- ---------- ---------- -------- ------- -------
5 James Green Ned White Sue 101011 0
4 Barry Sloan Soul Paul Lilly 101011 0
I tried the query below, but it checks for James only. How can I change my query so that it checks all the names and returns the expected result?
select * FROM employees
where first_name like '%James%'
and wife_name like '%Sue%'
and pincode=101011;
Any tips on this will be really helpful. I am new to joins, need help on writing self join to get the result.
…
select *
from thetable as p -- the parent/father
where exists -- with one child at least
(
select *
from thetable as c
where c.fathername = p.name
and c.mothername = p.wifename
-- lastname?
)
Too long for a comment, but also not intended as a slam against what you are working with. Please take as constructive criticism.
The table design is VERY POOR, and it should be corrected before you get too deep into whatever you are working on. A more typical design has a single table of people. There are a couple of ways to model the relationships. One is to add two IDs to each person's record: FatherID and MotherID. The child row then joins directly to its parents by ID instead of matching on hard-coded strings. Take a surname like Smith or Jones: many people named "John Smith" may exist, and although the chance of also matching a wife's name of Sue, Mary or whatever else is lower, it can still lead to multiple possibilities. Yes, you are adding a PIN, but even a computer can generate a random pin of 1234.
With the IDs, there is NO ambiguity about who the relationship is with.
If the data were slightly altered to something like
Id Name Surname FatherID MotherID SpouseID
-- ------- ------- ---------- ---------- --------
1 John Green 5 6 null
2 Michael Sloan 4 3 null
3 Lilly Sloan null null 4
4 Barry Sloan null null 3
5 James Green 9 10 6
6 Sue Green 7 8 5
7 Bill Jones null null 8
8 Martha Jones null null 7
9 Brian Green null null 10
10 Beth Smith-Green null null 9
So, in this modified example, you can see right away that ID#1 John Green has a father (ID#5), James, and a mother (ID#6), Sue. James, in turn, is the child of father (ID#9) Brian and mother (ID#10) Beth. The example goes up to the grandparent level: James and Sue are each also children of their respective parents, with Sue's parents carrying the Jones surname.
For Michael Sloan, the parents are #4 Barry and #3 Lilly.
I additionally added a spouse ID. This prevents redundant copies of people's names all over the table. You can then query by the child's parent IDs instead of relying on a hopeful name LIKE guess.
So, even though this does not solve your relatively simple query, fixing the underlying foundation of your database and its relations will, long-term, ease your querying in the future.
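To illustrate what querying the ID-based design could look like, here is a small sketch using Python's sqlite3. The table name people is an assumption (the answer only shows the columns), and the rows are the ones from the modified example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (Id INTEGER PRIMARY KEY, Name TEXT, Surname TEXT, "
    "FatherID INTEGER, MotherID INTEGER, SpouseID INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "John", "Green", 5, 6, None),
        (2, "Michael", "Sloan", 4, 3, None),
        (3, "Lilly", "Sloan", None, None, 4),
        (4, "Barry", "Sloan", None, None, 3),
        (5, "James", "Green", 9, 10, 6),
        (6, "Sue", "Green", 7, 8, 5),
        (7, "Bill", "Jones", None, None, 8),
        (8, "Martha", "Jones", None, None, 7),
        (9, "Brian", "Green", None, None, 10),
        (10, "Beth", "Smith-Green", None, None, 9),
    ],
)

# Each child row joins to exactly one father and one mother by ID,
# so there is no name matching and no ambiguity.
query = """
SELECT c.Name AS child, f.Name AS father, m.Name AS mother
FROM people c
JOIN people f ON f.Id = c.FatherID
JOIN people m ON m.Id = c.MotherID
ORDER BY c.Id
"""
families = conn.execute(query).fetchall()
for family in families:
    print(family)
```

People with no recorded parents simply drop out of the inner joins; a LEFT JOIN would keep them with NULL parents instead.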
Try this:
SELECT
  T2.*
FROM employees T1                           -- T1 is the child row
JOIN employees T2 ON T2.Name = T1.FatherName -- T2 is the matching parent row
  AND T2.WifeName = T1.MotherName
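As a quick check of the self join against the sample data from the question, the sketch below runs it in an in-memory SQLite database from Python. The DISTINCT and the ORDER BY are additions of mine: DISTINCT keeps a parent with several children from appearing more than once, and the ordering just makes the output match the expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (Id INTEGER, Name TEXT, Surname TEXT, FatherName TEXT, "
    "MotherName TEXT, WifeName TEXT, Pincode INTEGER, isChild INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "John", "Green", "James", "Sue", None, 101011, 1),
        (2, "Michael", "Sloan", "Barry", "Lilly", None, 101011, 1),
        (3, "Sally", "Green", "Andrew", "Molly", "Jemi", 101011, 1),
        (4, "Barry", "Sloan", "Soul", "Paul", "Lilly", 101011, 0),
        (5, "James", "Green", "Ned", "White", "Sue", 101011, 0),
    ],
)

# c is the child row, p is the parent row whose name and wife name
# match the child's father name and mother name.
query = """
SELECT DISTINCT p.*
FROM employees c
JOIN employees p ON p.Name = c.FatherName
                AND p.WifeName = c.MotherName
ORDER BY p.Id DESC
"""
parents = conn.execute(query).fetchall()
parent_ids = [row[0] for row in parents]
print(parent_ids)
```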

SQL Server group by? [duplicate]

I'm not sure how to word my question so perhaps an example would be best. I'm looking for a function or statement that would produce the following result from a single table. For each name, return the row with largest id.
ID NAME ADDRESS
1 JOHN DOE 123 FAKE ST.
2 JOHN DOE 321 MAIN ST.
3 JOHN DOE 333 2ND AVE.
4 MARY JANE 222 1ST. AVE
5 MARY JANE 444 POPLAR ST.
6 SUZY JO 999 8TH AVE.
DESIRED RESULT
3 JOHN DOE 333 2ND AVE.
5 MARY JANE 444 POPLAR ST.
6 SUZY JO 999 8TH AVE.
One option is to use the row_number window function, which assigns a row number to each row of the result set. The over clause defines the grouping and ordering: here you partition by (group on) the name field and order by the id field descending. Finally, filter on rn = 1, which returns the row with the largest id in each group.
select *
from (
select *, row_number() over (partition by name order by id desc) rn
from yourtable
) t
where rn = 1
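The same query can be tried against the sample rows with Python's sqlite3 (SQLite 3.25+ for window functions), used here as a stand-in for SQL Server. The table name yourtable comes from the answer; the explicit column list and the ORDER BY are additions so that the helper rn column is left out of the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1, "JOHN DOE", "123 FAKE ST."),
        (2, "JOHN DOE", "321 MAIN ST."),
        (3, "JOHN DOE", "333 2ND AVE."),
        (4, "MARY JANE", "222 1ST. AVE"),
        (5, "MARY JANE", "444 POPLAR ST."),
        (6, "SUZY JO", "999 8TH AVE."),
    ],
)

# Number the rows of each name from the largest id downwards,
# then keep only the first row of every partition.
query = """
SELECT id, name, address
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
    FROM yourtable
) AS t
WHERE rn = 1
ORDER BY id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```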

Eliminate rows with names that are slightly different

I have a PostgreSQL table with a UUID, first name (fname) and phone:
uuid fname phone
1 JOHN 111
2 john 111
3 John 111
4 JOHN JAMES 111
5 Charles 222
6 Peter 222
7 James 222
8 Jimmy 222
9 Fred 333
10 Fred 333
11 Greg 333
I would like to keep only the phone + firstname groups in which at least two names are similar. In this case, for example, I would like to keep phone 111 with one of its names, and phone 333 with the name that repeats (Fred). Phone 222 would be eliminated because none of its names are similar.
The result data would be
fname phone
John 111
Fred 333
The problem I am having is with names that are similar but not identical: either one has extra names (as in John and John James) or it was mistyped (as in John and Jonh). I have tried the following:
SELECT
m1.phone,
m1.fname,
m1.uuid
FROM
master as m1
JOIN master as m2 on m1.uuid = m2.uuid
WHERE
m1.phone = m2.phone
and m1.fname ILIKE m2.fname
ORDER BY 1
The definition of similarity is a bit vague, but this works for the data you have in the question:
select m.*
from master m
where exists (select 1
from master m2
where m2.phone = m.phone and m2.uuid <> m.uuid and
(m.fname ilike '%' || m2.fname || '%' or
m2.fname ilike '%' || m.fname || '%'
)
);
Name matching is a complicated task and not well suited to SQL. However, you might want to look into Levenshtein distance and other string similarity metrics if this is a problem that you are facing.
Note: This keeps all names that match. If you want only one row per phone, you can use distinct on.
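For a quick experiment with the exists-based query, here is a sketch using Python's sqlite3. It is only an approximation of the PostgreSQL version: SQLite has no ILIKE, but its LIKE is case-insensitive for ASCII characters, so LIKE is used in its place. Integer ids stand in for the UUIDs, as in the question's sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master (uuid INTEGER, fname TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO master VALUES (?, ?, ?)",
    [
        (1, "JOHN", "111"), (2, "john", "111"), (3, "John", "111"),
        (4, "JOHN JAMES", "111"), (5, "Charles", "222"), (6, "Peter", "222"),
        (7, "James", "222"), (8, "Jimmy", "222"), (9, "Fred", "333"),
        (10, "Fred", "333"), (11, "Greg", "333"),
    ],
)

# Keep a row if another row with the same phone has a name that
# contains this name, or is contained in it (case-insensitively).
query = """
SELECT m.uuid
FROM master m
WHERE EXISTS (SELECT 1
              FROM master m2
              WHERE m2.phone = m.phone AND m2.uuid <> m.uuid
                AND (m.fname LIKE '%' || m2.fname || '%'
                     OR m2.fname LIKE '%' || m.fname || '%'))
ORDER BY m.uuid
"""
kept = [row[0] for row in conn.execute(query)]
print(kept)
```

Substring matching handles "John" versus "John James" but not a typo such as "Jonh"; that case needs an edit-distance style comparison.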

sql "group by" same PersonID, different PersonNames. Eliminate duplicates

I have a (rather dirty) datasource (excel) that looks like this:
ID | Name | Subject | Grade
123 | Smith, Joe R. | MATH | 2.0
123 | Smith, Joe Rodriguez | FRENCH | 3.0
234 | Doe, Mary Jane D.| BIOLOGY | 2.5
234 | Doe, Mary Jane Dawson| CHEMISTRY | 2.5
234 | Doe, Mary Jane | FRENCH | 3.5
My application's output should look like this:
Smith, Joe R.
123
MATH | 2.0
FRENCH | 3.0
So basically I want to do query (just for the ID/Person parent 'container') something like:
SELECT DISTINCT ID, Name FROM MyTable
or
SELECT ID, Name FROM MyTable GROUP BY ID
Of course both of the above are invalid and won't work.
I would like to 'combine' the same ID's and ignore/truncate the other records with the same ID/different Name (because we all know they're the same person since ID is our identifier and clearly it's just a typo/dirty data).
Can this be done by a single SELECT query?
If you don't really care which value shows up in the name field, use MAX() or MIN():
SELECT ID,
MAX(Name) AS Name
FROM [YourTable]
GROUP BY ID
Here's a working example to play with: https://data.stackexchange.com/stackoverflow/q/116699/
You can find the MIN or MAX Value of Name
SELECT ID, Max(Name)
FROM MyTable
GROUP BY ID
SELECT A.ID, A.NAME, T.Subject, T.Grade
FROM (SELECT ID, MIN(NAME) AS NAME
FROM MyTable
GROUP BY ID) A
LEFT JOIN MyTable T on A.ID = T.ID
Will give you something like
123 Smith, Joe R. MATH 2.0
123 Smith, Joe R. FRENCH 3.0
234 Doe, Mary Jane BIOLOGY 2.5
234 Doe, Mary Jane CHEMISTRY 2.5
234 Doe, Mary Jane FRENCH 3.5
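The join-back approach can be checked against the sample data with Python's sqlite3, as sketched below. The ORDER BY is an addition to make the output deterministic, so the rows come out sorted by subject within each ID rather than in the order listed above. Which spelling MIN picks depends on the collation; with SQLite's default binary comparison it is the one shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Name TEXT, Subject TEXT, Grade REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (123, "Smith, Joe R.", "MATH", 2.0),
        (123, "Smith, Joe Rodriguez", "FRENCH", 3.0),
        (234, "Doe, Mary Jane D.", "BIOLOGY", 2.5),
        (234, "Doe, Mary Jane Dawson", "CHEMISTRY", 2.5),
        (234, "Doe, Mary Jane", "FRENCH", 3.5),
    ],
)

# The derived table picks one name per ID; joining back on ID
# attaches that single name to every subject row of the person.
query = """
SELECT A.ID, A.Name, T.Subject, T.Grade
FROM (SELECT ID, MIN(Name) AS Name
      FROM MyTable
      GROUP BY ID) A
LEFT JOIN MyTable T ON A.ID = T.ID
ORDER BY A.ID, T.Subject
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```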
If you don't care which name you keep, you can use a MAX() or MIN() aggregate to pick just one name:
SELECT ID, MAX(Name) as Name
FROM MyTable GROUP BY ID