How do I write a query to identify names(possibly including non-English names) that have similar sounds? Soundex does not seem to handle non-English names well.
The code should be able to identify that for example the following(or most of them) are names with similar sounds?
Helena - Elena
Violet - Viola
Beatrix - Beatrice
Madeline - Madeleine (ma-duh-LINE vs ma-duh-LEN)
Alice - Elise
Madeline - Adeline
Kristen - Kirsten
Lily - Millie
Charlotte - Scarlett
Zara / Lara / Sara / Mara
Elena - Alana
Emily - Emmeline
Amelia - Amalia
Stella - Bella - Ella
Isabel - Isabeau
Holly - Hallie
Laura - Lara
Fiona - Finola
Louise - Eloise
Cara - Clara
Susanna vs Susannah
Nora vs Norah
Talia vs Tahlia vs Thalia
Catherine vs Katherine
Cecilia vs Cecelia
Lucy vs Lucie
Vivian vs Vivien
Lillian vs Lilian
Gwendolen vs Gwendolyn
Sofia vs Sophia
Isabel vs Isobel vs Isabelle
Seraphina vs Serafina
Juliet vs Juliette
Annabel vs Annabelle
Emily vs Emilie
Elisabeth vs Elizabeth
...and non-English names too.
Would it help by using algorithm like Levenshtein Distance to compare the similarity between two sequences?
https://en.wikipedia.org/wiki/Levenshtein_distance
Particularly in Oracle, you can use utl_match.
For example:
--Find closest names based on UTL_MATCH.EDIT_DISTANCE.
with names as
(
--Names data.
select column_value name
from table(sys.odcivarchar2list('Adeline','Alana','Alice','Amalia','Amelia','Annabel',
'Annabelle','Beatrice','Beatrix','Bella','Cara','Catherine','Cecelia','Cecilia',
'Charlotte','Clara','Elena','Elisabeth','Elise','Elizabeth','Ella','Eloise','Emilie',
'Emily','Emmeline','Finola','Fiona','Gwendolen','Gwendolyn','Hallie','Helena','Holly',
'Isabeau','Isabel','Isabelle','Isobel','Juliet','Juliette','Katherine','Kirsten',
'Kristen','Lara','Laura','Lilian','Lillian','Lily','Louise','Lucie','Lucy',
'Madeleine','Madeline','Mara','Millie','Nora','Norah','Sara','Scarlett','Serafina',
'Seraphina','Sofia','Sophia','Stella','Susanna','Susannah','Tahlia','Talia','Thalia',
'Viola','Violet','Vivian','Vivien','Zara'))
)
--Name with the closest matches.
select name1, edit_distance, listagg(name2, ',') within group (order by name2) names
from
(
--Compare strings.
select names1.name name1, names2.name name2
,utl_match.edit_distance(names1.name, names2.name) edit_distance
,min(utl_match.edit_distance(names1.name, names2.name))
over (partition by names1.name) min_edit_distance
from names names1
cross join names names2
--This cross join could get expensive. It may help to add conditions here to
--filter out obvious non-matches. For example, maybe throw out rows where the
--string length is vastly different?
where names1.name <> names2.name
order by 1, 3, 2
)
where edit_distance = min_edit_distance
group by name1, edit_distance
order by 1;
Results:
NAME1 EDIT_DISTANCE NAMES
----- ------------- -----
Adeline 2 Madeline
Alana 2 Clara,Elena
Alice 2 Elise
Amalia 1 Amelia
Amelia 1 Amalia
Annabel 2 Annabelle
Annabelle 2 Annabel
Beatrice 2 Beatrix
Beatrix 2 Beatrice
Bella 2 Ella,Stella
Cara 1 Clara,Lara,Mara,Sara,Zara
Catherine 1 Katherine
Cecelia 1 Cecilia
Cecilia 1 Cecelia
Charlotte 4 Scarlett
Clara 1 Cara
Elena 2 Alana,Ella,Helena
Elisabeth 1 Elizabeth
Elise 1 Eloise
Elizabeth 1 Elisabeth
Ella 2 Bella,Elena
Eloise 1 Elise
Emilie 2 Emily
Emily 2 Emilie,Lily
Emmeline 3 Adeline,Emilie,Madeline
Finola 2 Fiona,Viola
Fiona 2 Finola,Viola
Gwendolen 1 Gwendolyn
Gwendolyn 1 Gwendolen
Hallie 2 Millie
Helena 2 Elena
Holly 3 Bella,Ella,Emily,Hallie,Lily
Isabeau 2 Isabel
Isabel 1 Isobel
Isabelle 2 Isabel
Isobel 1 Isabel
Juliet 2 Juliette
Juliette 2 Juliet
Katherine 1 Catherine
Kirsten 2 Kristen
Kristen 2 Kirsten
Lara 1 Cara,Laura,Mara,Sara,Zara
Laura 1 Lara
Lilian 1 Lillian
Lillian 1 Lilian
Lily 2 Emily,Lucy
Louise 3 Elise,Eloise,Lucie
Lucie 2 Lucy
Lucy 2 Lily,Lucie
Madeleine 1 Madeline
Madeline 1 Madeleine
Mara 1 Cara,Lara,Sara,Zara
Millie 2 Hallie
Nora 1 Norah
Norah 1 Nora
Sara 1 Cara,Lara,Mara,Zara
Scarlett 4 Charlotte
Serafina 2 Seraphina
Seraphina 2 Serafina
Sofia 2 Sophia
Sophia 2 Sofia
Stella 2 Bella
Susanna 1 Susannah
Susannah 1 Susanna
Tahlia 1 Talia
Talia 1 Tahlia,Thalia
Thalia 1 Talia
Viola 2 Finola,Fiona,Violet
Violet 2 Viola
Vivian 1 Vivien
Vivien 1 Vivian
Zara 1 Cara,Lara,Mara,Sara
Related
I have 2 Tables (History and Responsible). They need to be JOINED based on Service Date.
History Table:
Id
ServiceDate
Hours
ClientId
ClientName
1
2021-10-15
3
123
Tom Holland
2
2021-10-25
5
123
Tom Holland
3
2022-01-14
2
123
Tom Holland
Responsible Table:
2999-12-31 means Responsible has no end date (current)
ClientId
ClientName
ResponsibleId
ResponsibleName
ResponsibleStartDate
ResponsibleEndtDate
123
Tom Holland
77
Thomas Anderson
2020-09-17
2021-10-17
123
Tom Holland
88
Tom Cruise
2021-10-18
2999-12-31
123
Tom Holland
99
Sten Lee
2022-01-07
2999-12-31
My code produces multiple rows, because 2022-01-14 Service date falls under multiple date ranges from Responsible Table:
SELECT h.Id,
h.ServiceDate,
h.Hours,
h.ClientId,
h.ClientName,
r.ResponsibleName
FROM History AS h
LEFT JOIN Responsible AS r
ON (h.ClientId = r.ClientId AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)
The output of the query above is:
Id
ServiceDate
Hours
ClientId
ClientName
ResponsibleName
1
2021-10-15
3
123
Tom Holland
Thomas Anderson
2
2021-10-25
5
123
Tom Holland
Tom Cruise
3
2022-01-14
2
123
Tom Holland
Tom Cruise
3
2022-01-14
2
123
Tom Holland
Sten Lee
Technically, output is correct (because 2022-01-14 is between 2021-10-18 - 2999-12-31 as well between 2022-01-07 - 2999-12-31), but not what I need.
I would like to know if possible to achieve 2 outputs:
1) If Service Date falls in multiple date ranges from Responsible Table, Responsible Should be the person who's ResponsibleStartDate is closer to the ServiceDate:
Id
ServiceDate
Hours
ClientId
ClientName
ResponsibleName
1
2021-10-15
3
123
Tom Holland
Thomas Anderson
2
2021-10-25
5
123
Tom Holland
Tom Cruise
3
2022-01-14
2
123
Tom Holland
Sten Lee
2) Keep all rows, if Service Date falls in multiple date ranges from Responsible Table, but split Hours evenly between Responsible:
Id
ServiceDate
Hours
ClientId
ClientName
ResponsibleName
1
2021-10-15
3
123
Tom Holland
Thomas Anderson
2
2021-10-25
5
123
Tom Holland
Tom Cruise
3
2022-01-14
1
123
Tom Holland
Tom Cruise
3
2022-01-14
1
123
Tom Holland
Sten Lee
First one, we can use a window function to apply a row number, based on how close to ServiceDate the ResponsibleStartDate is, then we can just pick the first row per h.Id. If there is a tie we can break it by picking something that will give us deterministic order, e.g. ORDER BY {DATEDIFF expression}, ResponsibleName.
;WITH cte AS
(
SELECT h.Id,
h.ServiceDate,
h.Hours,
h.ClientId,
h.ClientName,
r.ResponsibleName,
RankOrderedByProximityToServiceDate = ROW_NUMBER() OVER
(PARTITION BY h.Id
ORDER BY ABS(DATEDIFF(DAY, ResponsibleStartDate, ServiceDate)))
FROM dbo.History AS h
LEFT JOIN dbo.Responsible AS r
ON (h.ClientId = r.ClientId
AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate)
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM cte WHERE RankOrderedByProximityToServiceDate = 1;
Output:
Id
ServiceDate
Hours
ClientId
ClientName
ResponsibleName
1
2021-10-15
3
123
Tom Holland
Thomas Anderson
2
2021-10-25
5
123
Tom Holland
Tom Cruise
3
2022-01-14
2
123
Tom Holland
Sten Lee
Second one doesn't require a CTE, we can simply divide the Hours in h by the number of rows that exist for that h.Id, then limit it to 2 decimal places:
SELECT h.Id,
h.ServiceDate,
Hours = CONVERT(decimal(11,2),
h.Hours * 1.0
/ COUNT(h.Id) OVER (PARTITION BY h.Id)),
h.ClientId,
h.ClientName,
r.ResponsibleName
FROM dbo.History AS h
LEFT JOIN dbo.Responsible AS r
ON (h.ClientId = r.ClientId
AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate);
Output:
Id
ServiceDate
Hours
ClientId
ClientName
ResponsibleName
1
2021-10-15
3.00
123
Tom Holland
Thomas Anderson
2
2021-10-25
5.00
123
Tom Holland
Tom Cruise
3
2022-01-14
1.00
123
Tom Holland
Tom Cruise
3
2022-01-14
1.00
123
Tom Holland
Sten Lee
Both demonstrated in this db<>fiddle.
My attempt at part 1 - it doesn't work if there's more than one Responsible as of the same start date.
WITH
"all_services" AS (
SELECT
h.Id,
h.ServiceDate,
h.Hours,
h.ClientId,
h.ClientName,
r.ResponsibleName,
r.ResponsibleStartDate
FROM History AS h
LEFT JOIN Responsible AS r
ON h.ClientId = r.ClientId
AND h.ServiceDate BETWEEN r.ResponsibleStartDate AND r.ResponsibleEndtDate
),
"most_recent_key" AS (
SELECT
ServiceDate,
ClientId,
MAX(ResponsibleStartDate) AS "ResponsibleStartDate"
FROM all_services
GROUP BY ServiceDate, ClientId
)
SELECT Id, ServiceDate, Hours, ClientId, ClientName, ResponsibleName
FROM all_services
INNER JOIN most_recent_key
USING (ServiceDate, ClientId, ResponsibleStartDate)
Posting it anyway as a contrast to Aaron's better solution as a learning point for myself.
I have the following dataset:
A
B
C
1
John
2018-08-14
1
John
2018-08-20
1
John
2018-09-03
2
John
2018-11-13
2
John
2018-12-11
2
John
2018-12-12
1
John
2020-01-20
1
John
2020-01-21
3
John
2021-03-02
3
John
2021-03-03
1
John
2020-05-10
1
John
2020-05-12
And I would like to have the following result:
A
B
C
1
John
2018-08-14
2
John
2018-11-13
1
John
2020-01-20
3
John
2021-03-02
1
John
2020-05-10
If I group by A, B the 1st row and the third just concatenate which is coherent. How could I create another columns to still use a group by and have the result I want.
If you have another ideas than mine, please explain it !
I tried to use some first, last, rank, dense_rank without success.
Use lag(). Looks like B is a function of A in your data. So checking lag(A) will suffice.
select A,B,C
from (
select *, case when lag(A) over(order by C) = A then 0 else 1 end startFlag
from mytable
) t
where startFlag = 1
order by C
I'd like to add a new second column to a 'teams' table which is representative of premier league (UK) football rankings. At the moment the table just contains the names of each football team.
The column will be called 'Played' and it will list the number of games each team has played. I'd like to calculate this number (integer data type) from a separate table called 'games', which records a historic log of games fixtures. This would probably include using SQL's native 'COUNT' function.
I have tried to use a function to help me do this, but currently it is inserting all values as '0'
CREATE FUNCTION [dbo].[GetPlayed](#Team VARCHAR)
RETURNS INT
BEGIN
RETURN(SELECT COUNT(*)
FROM games
WHERE games.Home = #Team OR games.Away = #Team);
END;
ALTER TABLE teams
ADD Played AS GetPlayed(teams.Team)
The tables:
teams:
```Team
Arsenal
Bournemouth
Burnley
Chelsea
Crystal Palace
Everton
Hull City
Leicester City
Liverpool
Manchester City
Manchester United
Middlesbrough
Southampton
Stoke City
Sunderland
Swansea City
Tottenham Hotspur
Wat"For"d
West Bromwich Albion
West Ham United
```
games:
gameID Home HomeScore Away AwayScore GameDate
4 Arsenal 2 Chelsea 0 2018-05-26
5 Arsenal 5 Bournemouth 0 2018-04-22
6 Arsenal 1 Leicester City 1 2018-03-15
7 Bournemouth 5 Liverpool 0 2018-04-22
8 Burnley 5 Bournemouth 0 2018-04-22
9 Burnley 1 Swansea City 2 2017-11-22
10 Stoke City 0 Burnley 0 2018-01-08
11 Chelsea 1 Middlesborough 2 2017-11-22
12 Southampton 0 Chelsea 0 2018-01-01
13 Crystal Palace 1 Everton 2 2018-03-26
14 Manchester United 4 Crystal Palace 0 2018-06-01
15 Crystal Palace 0 Southampton 1 2018-04-16
16 Everton 1 Hull City 2 2017-11-20
17 Manchester City 4 Everton 0 2017-11-20
18 Hull City 0 Burnley 0 2018-06-01
19 Sunderland 2 Hull City 0 2018-06-15
20 Leicester City 3 Tottenham Hotspur 1 2017-09-20
21 Swansea City 2 Leicester City 5 2018-02-15
22 Sunderland 0 Leicester City 1 2018-01-29
23 Liverpool 3 Tottenham Hotspur 0 2018-02-28
24 Stoke City 1 Liverpool 2 2017-09-19
25 Manchester City 2 Manchester United 4 2018-05-02
26 Middlesborough 1 Southampton 1 2018-02-08
27 Stoke City 2 Middlesborough 2 2017-08-19
28 Swansea City 0 Manchester United 5 2018-06-27
29 Sunderland 1 Tottenham Hotspur 2 2017-09-01
Any help would be much appreciated!
Thanks, Rob
VARCHAR without size defaults to 1 char, you need to change your function declaration
CREATE FUNCTION [dbo].[GetPlayed](#Team VARCHAR(32))
.....
Without size your parameter #Team will receive just the first letter of your passed team value and, of course, the WHERE statement is unable to find any result in your games table
Below is a sample of a much larger dataframe.
Fare Cabin Pclass Ticket Name
257 86.5000 B77 1 110152 Cherry, Miss. Gladys
759 86.5000 B77 1 110152 Rothes, the Countess. of (Lucy Noel Martha Dye...
504 86.5000 B79 1 110152 Maioni, Miss. Roberta
262 79.6500 E67 1 110413 Taussig, Mr. Emil
558 79.6500 E67 1 110413 Taussig, Mrs. Emil (Tillie Mandelbaum)
585 79.6500 NaN 1 110413 Taussig, Miss. Ruth
475 52.0000 A14 1 110465 Clifford, Mr. George Quincy
110 52.0000 C110 1 110465 Porter, Mr. Walter Chamberlain
335 26.0000 C106 1 110469 Maguire, Mr. John Edward
158 26.5500 D22 1 110489 Borebank, Mr. John James
430 26.5500 C52 1 110564 Bjornstrom-Steffansson, Mr. Mauritz Hakan
236 75.2500 D37 1 110813 Warren, Mr. Frank Manley
366 75.2500 D37 1 110813 Warren, Mrs. Frank Manley (Anna Sophia Atkinson)
191 26.0000 NaN 1 111163 Salomon, Mr. Abraham L
170 33.5000 B19 1 111240 Van der hoef, Mr. Wyckoff
462 38.5000 E63 1 111320 Gee, Mr. Arthur H
329 57.9792 Nan 1 111361 Hippach, Miss. Jean Gertrude
523 57.9792 B18 1 111361 Hippach, Mrs. Louis Albert (Ida Sophia Fischer)
If I want to iterate the filling of missing values of "Cabin" for people who are missing "Cabin" values, with someone else's "Cabin" values, only if
the someone else (the one who has a cabin value) has the same last name and also are in the vicinity of oneself( as in one above or one below them) .
So in the dataframe above, [Tassuig, Miss.Ruth]'s Cabin value of "Nan" would be replaced with that of [Tassuig, Mrs.Emil]'s cabin value [E67] who is one above herself because both conditions are met. (Same last name and in the vicinity)
And [Hippach, Miss. Jean Gertrude]'s missing cabin value would be replaced with
[ Hippach, Mrs. Louis Albert (Ida Sophia Fischer)]'s Cabin value of [B18].
I tried to think of iteration but this is as far as I got
for x in df.Name.str.split(',')[x][0] ==df.Name.str.split(',')[x+1][0]:
if df.Cabin[x] or df.Cabin[x+1] == np.nan:
df.Cabin.replace(np.nan,
I want to make sure the np.nan value is replaced with a True value and not np.nan. Couldn't figure out how to do that.
Thanks.
Starting with your DataFrame
print(df)
Fare Cabin Pclass Ticket \
0 86.5000 B77 1 110152
1 86.5000 B77 1 110152
2 86.5000 B79 1 110152
3 79.6500 E67 1 110413
4 79.6500 E67 1 110413
5 79.6500 NaN 1 110413
6 52.0000 A14 1 110465
7 52.0000 C110 1 110465
8 26.0000 C106 1 110469
9 26.5500 D22 1 110489
10 26.5500 C52 1 110564
11 75.2500 D37 1 110813
12 75.2500 D37 1 110813
13 26.0000 NaN 1 111163
14 33.5000 B19 1 111240
15 38.5000 E63 1 111320
16 57.9792 NaN 1 111361
17 57.9792 B18 1 111361
Name
0 Cherry, Miss. Gladys
1 Rothes, the Countess. of (Lucy Noel Martha Dye...
2 Maioni, Miss. Roberta
3 Taussig, Mr. Emil
4 Taussig, Mrs. Emil (Tillie Mandelbaum)
5 Taussig, Miss. Ruth
6 Clifford, Mr. George Quincy
7 Porter, Mr. Walter Chamberlain
8 Maguire, Mr. John Edward
9 Borebank, Mr. John James
10 Bjornstrom-Steffansson, Mr. Mauritz Hakan
11 Warren, Mr. Frank Manley
12 Warren, Mrs. Frank Manley (Anna Sophia Atkinson)
13 Salomon, Mr. Abraham L
14 Van der hoef, Mr. Wyckoff
15 Gee, Mr. Arthur H
16 Hippach, Miss. Jean Gertrude
17 Hippach, Mrs. Louis Albert (Ida Sophia Fischer)
Creating a new column/series with just the LastName. Note, might be a better way to do this with pandas str methods, but I couldn't get anything to work
df['LastName'] = df['Name'].map(lambda x : x[:x.find(',')])
Then we leverage Pandas' shift and boolean indexing to see if the passenger above has the same last name (ie the Taussig case)
filter = (df['Cabin'].isnull()) & (df['LastName'] == df['LastName'].shift())
df.loc[filter,'Cabin'] = df['Cabin'].shift()
and then the passenger below by passing a -1 to shift() (ie the Hippach case)
filter = (df['Cabin'].isnull()) & (df['LastName'] == df['LastName'].shift(-1))
df.loc[filter,'Cabin'] = df['Cabin'].shift(-1)
print(df)
Fare Cabin Pclass Ticket \
0 86.5000 B77 1 110152
1 86.5000 B77 1 110152
2 86.5000 B79 1 110152
3 79.6500 E67 1 110413
4 79.6500 E67 1 110413
5 79.6500 E67 1 110413
6 52.0000 A14 1 110465
7 52.0000 C110 1 110465
8 26.0000 C106 1 110469
9 26.5500 D22 1 110489
10 26.5500 C52 1 110564
11 75.2500 D37 1 110813
12 75.2500 D37 1 110813
13 26.0000 NaN 1 111163
14 33.5000 B19 1 111240
15 38.5000 E63 1 111320
16 57.9792 B18 1 111361
17 57.9792 B18 1 111361
Name LastName
0 Cherry, Miss. Gladys Cherry
1 Rothes, the Countess. of (Lucy Noel Martha Dye... Rothes
2 Maioni, Miss. Roberta Maioni
3 Taussig, Mr. Emil Taussig
4 Taussig, Mrs. Emil (Tillie Mandelbaum) Taussig
5 Taussig, Miss. Ruth Taussig
6 Clifford, Mr. George Quincy Clifford
7 Porter, Mr. Walter Chamberlain Porter
8 Maguire, Mr. John Edward Maguire
9 Borebank, Mr. John James Borebank
10 Bjornstrom-Steffansson, Mr. Mauritz Hakan Bjornstrom-Steffansson
11 Warren, Mr. Frank Manley Warren
12 Warren, Mrs. Frank Manley (Anna Sophia Atkinson) Warren
13 Salomon, Mr. Abraham L Salomon
14 Van der hoef, Mr. Wyckoff Van der hoef
15 Gee, Mr. Arthur H Gee
16 Hippach, Miss. Jean Gertrude Hippach
17 Hippach, Mrs. Louis Albert (Ida Sophia Fischer) Hippach
groupby + fillna
# back fills, then forward fills
def bffill(x):
return x.bfill().ffill()
# group by last name
df['Cabin'] = df.groupby(df.Name.str.split(',').str[0]).Cabin.apply(bffill)
df
I have database table that I am after some SQL for (Which is defeating me so far!)
Imagine there are 192 Athletic Clubs who all take part in 12 Track Meets per season.
So that is 2304 individual performances per season (for example in the 100Metres)
I would like to find the top 48 (unique) individual performances from the table, these 48 athletes are then going to take part in the end of season World Championships.
So imagine the 2 fastest times are both set by "John Smith", but he can only be entered once in the world champs. So i would then look for the next fastest time not set by "John Smith"... so on and so until I have 48 unique athletes..
hope that makes sense.
thanks in advance if anyone can help
PS
I did have a nice screen shot created that would explain it much better. but as a newish user i cannot post images.
I'll try a copy and paste version instead...
ID AthleteName AthleteID Time
1 Josh Lewis 3 11.99
2 Joe Dundee 4 11.31
3 Mark Danes 5 13.44
4 Josh Lewis 3 13.12
5 John Smith 1 11.12
6 John Smith 1 12.18
7 John Smith 1 11.22
8 Adam Bennett 6 11.33
9 Ronny Bower 7 12.88
10 John Smith 1 13.49
11 Adam Bennett 6 12.55
12 Mark Danes 5 12.12
13 Carl Tompkins 2 13.11
14 Joe Dundee 4 11.28
15 Ronny Bower 7 12.14
16 Carl Tompkin 2 11.88
17 Nigel Downs 8 14.14
18 Nigel Downs 8 12.19
Top 4 unique individual performances
1 John Smith 1 11.12
3 Joe Dundee 4 11.28
5 Adam Bennett 6 11.33
6 Carl Tompkins 2 11.88
Basically something like this:
select top 48 *
from (
select athleteId,min(time) as bestTime
from theRaces
where raceId = '123' -- e.g., 123=100 meters
group by athleteId
) x
order by bestTime
try this --
select x.ID, x.AthleteName , x.AthleteID , x.Time
(
select rownum tr_count,v.AthleteID AthleteID, v.AthleteName AthleteName, v.Time Time,v.id id
from
(
select
tr1.AthleteName AthleteName, tr1.Time time,min(tr1.id) id, tr1.AthleteID AthleteID
from theRaces tr1
where time =
(select min(time) from theRaces tr2 where tr2.athleteId = tr1.athleteId)
group by tr1.AthleteName, tr1.AthleteID, tr1.Time
having tr1.Time = ( select min(tr2.time) from theRaces tr2 where tr1.AthleteID =tr2.AthleteID)
order by tr1.time
) v
) x
where x.tr_count < 48