Select longest string for each user - sql

I have a table like this:
Clients Cities
1 NY
1 NY | WDC | LA
1 NY | WDC
2 LA
So I have duplicate clients with different cities (not in order, and with a different length on each line). What I want is to display, for each client, the longest Cities string. So I should get something like this:
Clients Cities
1 NY | WDC | LA
2 LA
I am a beginner in SQL (I use Spark SQL, but it's mostly the same thing), so can you please tell me how to solve this problem?
Thanks!

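You can use max(). Note that max() compares strings alphabetically, not by length; it returns the longest list here only because each shorter list is a prefix of the longest one: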
select client, max(cities)
from t
group by client;
Then you should fix your data model, so you are not storing lists of cities in a string. That is not a good way to store the data in a relational database.
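Because max() picks the alphabetically greatest string, it only coincides with the longest one when the shorter values are prefixes of it. The sketch below uses Python's built-in sqlite3 module as a stand-in for Spark SQL (assumptions: a table named t with columns client and cities, plus a hypothetical client 3 added to show where max() diverges). It ranks each client's rows by LENGTH() with ROW_NUMBER() and keeps the first; the same window-function query should carry over to Spark SQL, but treat that as untested.

```python
import sqlite3

# Rows from the question, plus a hypothetical client 3 whose longest
# list is NOT the alphabetically greatest one.
rows = [
    (1, "NY"),
    (1, "NY | WDC | LA"),
    (1, "NY | WDC"),
    (2, "LA"),
    (3, "SF"),       # hypothetical
    (3, "LA | NY"),  # hypothetical
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (client INTEGER, cities TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# max() compares strings alphabetically.
by_max = conn.execute(
    "SELECT client, MAX(cities) FROM t GROUP BY client ORDER BY client"
).fetchall()

# Rank each client's rows by string length and keep the longest.
# Window functions need SQLite 3.25 or newer.
by_length = conn.execute(
    """
    SELECT client, cities
    FROM (
        SELECT client, cities,
               ROW_NUMBER() OVER (PARTITION BY client
                                  ORDER BY LENGTH(cities) DESC) AS rn
        FROM t
    ) AS ranked
    WHERE rn = 1
    ORDER BY client
    """
).fetchall()

print(by_max)     # [(1, 'NY | WDC | LA'), (2, 'LA'), (3, 'SF')]
print(by_length)  # [(1, 'NY | WDC | LA'), (2, 'LA'), (3, 'LA | NY')]
```

If two rows for the same client tie on length, ROW_NUMBER() keeps an arbitrary one of them; add a second ORDER BY key if you need a deterministic choice.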

I think you should handle that query (in MySQL) with a SELECT DISTINCT statement.
Since the table contains many duplicate values, I hope that will make it work!
For instance,
SELECT DISTINCT city_name FROM cities;
Take it from there; this is a hint to lead you to the answer you want.

Related

SQL different null values in different rows

I have a quick question regarding writing a SQL query to obtain a complete entry from two or more entries where the data is missing in different columns.
This is the example, suppose I have this table:
Client Id | Name | Email
1234 | John | (null)
1244 | (null) | john#example.com
Would it be possible to write a query that would return the following?
Client Id | Name | Email
1234 | John | john#example.com
I am finding this particularly hard because these are two entries in the same table.
I apologize if this is trivial; I am still studying SQL, and I wasn't able to come up with a solution. I've tried looking online, but I suppose I couldn't phrase the question the right way, and I couldn't find the answer I was after.
Many thanks in advance for the help!
Yes, but actually no.
It is possible to write a query that works with your example data.
But only under the assumption that the first part of the email is always equal to the name.
SELECT clients.id, clients.name, bclients.email
FROM clients
JOIN clients bclients
  ON upper(clients.name) = upper(substring(bclients.email from 0 for position('#' in bclients.email)));
Explanation:
We join the table onto itself to get the information into one row.
To do this, we first find the position of the '#' in the email, then take the substring that starts at position 0 and is that many characters long. Because string positions start at 1 (as in PostgreSQL), starting at 0 makes the substring stop just before the '#': for john#example.com, position returns 5 and the substring is 'john'.
To avoid case problems, the name and the substring are converted to uppercase for the comparison (lowercase would work the same).
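A minimal sketch of the same self-join, run with Python's sqlite3 module on the question's two rows (assumptions: columns named id, name and email; SQLite spells the functions substr/instr and takes a start of 1 with a length of position minus 1, which yields the same 'john' prefix):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [(1234, "John", None), (1244, None, "john#example.com")],
)

# Self-join: compare each row's name with the part of another row's
# email before the '#'. NULL names or emails never compare equal,
# so the incomplete rows drop out on their own.
query = """
SELECT c.id, c.name, b.email
FROM clients AS c
JOIN clients AS b
  ON UPPER(c.name) = UPPER(SUBSTR(b.email, 1, INSTR(b.email, '#') - 1))
"""
result = conn.execute(query).fetchall()
print(result)  # [(1234, 'John', 'john#example.com')]
```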
The design is flawed
How can a client have multiple ids and different kinds of information about the same user at the same time?
I think you want to split the table into clients and users, so that a user can have multiple clients.
I recommend that you read about database normalization, as it gives you the knowledge you need for successful database design.

SQL Combine null rows with non null

Due to the way a particular table is structured, I need to do something a little strange in SQL, and I can't find a 'simple' way to do it.
Table
Name | Place | Amount
Chris | Scotland | NULL
Chris | NULL | £1
Amy | England | NULL
Amy | NULL | £5
Output
Name | Place | Amount
Chris | Scotland | £1
Amy | England | £5
What I am trying to do is shown above: the NULL values are essentially ignored and the rows are 'grouped' up by Name.
I have this working using FOR XML; however, it is incredibly slow. Is there a smarter way to do this?
This is where MAX works, because aggregate functions ignore NULLs:
select
Name
,Place = Max(Place)
,Amount = Max(Amount)
from
YourTable
group by
Name
Naturally, if a name has more than one non-NULL Place (or Amount), MAX returns only the greatest one, so you may get unexpected results.
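A quick check of the MAX approach with Python's sqlite3 module (assumptions: the table is called YourTable and Amount is stored as text such as '£1'; SQLite does not accept the T-SQL `Place = Max(Place)` alias form, so the sketch uses `AS`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Name TEXT, Place TEXT, Amount TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        ("Chris", "Scotland", None),
        ("Chris", None, "£1"),
        ("Amy", "England", None),
        ("Amy", None, "£5"),
    ],
)

# MAX skips NULLs, so each group keeps its single non-NULL value per column.
rows = conn.execute(
    """
    SELECT Name, MAX(Place) AS Place, MAX(Amount) AS Amount
    FROM YourTable
    GROUP BY Name
    ORDER BY Name
    """
).fetchall()
print(rows)  # [('Amy', 'England', '£5'), ('Chris', 'Scotland', '£1')]
```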

SQL: Select distinct based on regular expression

Basically, I'm dealing with a horribly set up table that I'd love to rebuild, but am not sure I can at this point.
So, the table is of addresses, and it has a ton of similar entries for the same address. But there are sometimes slight variations in the address (e.g., a room number is tacked on IN THE SAME COLUMN, ugh).
Like this:
id | place_name | place_street
1 | Place Name One | 1001 Mercury Blvd
2 | Place Name Two | 2388 Jupiter Street
3 | Place Name One | 1001 Mercury Blvd, Suite A
4 | Place Name, One | 1001 Mercury Boulevard
5 | Place Nam Two | 2388 Jupiter Street, Rm 101
What I would like to do in SQL (this is MS SQL Server), if possible, is run a query like:
SELECT DISTINCT place_name, place_street where [the first 4 letters of the place_name are the same] && [the first 4 characters of the place_street are the same].
to, I guess at this point, get:
Plac | 1001
Plac | 2388
Basically, then I can figure out what are the main addresses I have to break out into another table to normalize this, because the rest are just slight derivations.
I hope that makes sense.
I've done some research and I see people using regular expressions in SQL, but a lot of them seem to be using C scripts or something. Do I have to write regex functions and save them into the SQL Server before executing any regular expressions?
Any direction on whether I can just write them in SQL or if I have another step to go through would be great.
Or on how to approach this problem.
Thanks in advance!
Use the SQL function LEFT:
SELECT DISTINCT LEFT(place_name, 4), LEFT(place_street, 4)
FROM AddressTable -- replace with your table name
I don't think you need regular expressions to get the results you describe. You just want to cut each column down to its first four characters and group by the results, which effectively gives you distinct values.
SELECT left(place_name, 4), left(place_street, 4), count(*)
FROM AddressTable
GROUP BY left(place_name, 4), left(place_street, 4)
The count(*) column isn't necessary, but it gives you some idea of which values might have the most (possibly) duplicate address rows in common.
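Running the GROUP BY answer against the question's sample rows shows the two expected groups. This sketch uses Python's sqlite3 module; SQLite has no LEFT(), so `substr(col, 1, 4)` stands in for it, and the table name AddressTable is taken from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AddressTable (id INTEGER, place_name TEXT, place_street TEXT)"
)
conn.executemany(
    "INSERT INTO AddressTable VALUES (?, ?, ?)",
    [
        (1, "Place Name One", "1001 Mercury Blvd"),
        (2, "Place Name Two", "2388 Jupiter Street"),
        (3, "Place Name One", "1001 Mercury Blvd, Suite A"),
        (4, "Place Name, One", "1001 Mercury Boulevard"),
        (5, "Place Nam Two", "2388 Jupiter Street, Rm 101"),
    ],
)

# substr(col, 1, 4) returns the first four characters, like T-SQL LEFT(col, 4).
groups = conn.execute(
    """
    SELECT substr(place_name, 1, 4), substr(place_street, 1, 4), COUNT(*)
    FROM AddressTable
    GROUP BY substr(place_name, 1, 4), substr(place_street, 1, 4)
    ORDER BY 2
    """
).fetchall()
print(groups)  # [('Plac', '1001', 3), ('Plac', '2388', 2)]
```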
I would recommend looking into fuzzy search operations in SQL Server. They can match the results much better than comparing fixed four-character prefixes. Searching for "sql server fuzzy search" is a good place to start.
Assuming at least SQL Server 2005 for the CTE:
;with cteCommonAddresses as (
select left(place_name, 4) as LeftName, left(place_street,4) as LeftStreet
from Address
group by left(place_name, 4), left(place_street,4)
having count(*) > 1
)
select a.id, a.place_name, a.place_street
from cteCommonAddresses c
inner join Address a
on c.LeftName = left(a.place_name,4)
and c.LeftStreet = left(a.place_street,4)
order by a.place_name, a.place_street, a.id
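The CTE version can be tried the same way. In this sketch (Python's sqlite3 module, with `substr` again standing in for LEFT) a hypothetical sixth address that shares its prefix with no other row is added, to show that `HAVING count(*) > 1` filters it out while every row in a shared group comes back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Address (id INTEGER, place_name TEXT, place_street TEXT)")
conn.executemany(
    "INSERT INTO Address VALUES (?, ?, ?)",
    [
        (1, "Place Name One", "1001 Mercury Blvd"),
        (2, "Place Name Two", "2388 Jupiter Street"),
        (3, "Place Name One", "1001 Mercury Blvd, Suite A"),
        (4, "Place Name, One", "1001 Mercury Boulevard"),
        (5, "Place Nam Two", "2388 Jupiter Street, Rm 101"),
        (6, "Other Place", "500 Saturn Ave"),  # hypothetical one-off address
    ],
)

# Prefix pairs that occur more than once, then every row matching one of them.
rows = conn.execute(
    """
    WITH cteCommonAddresses AS (
        SELECT substr(place_name, 1, 4) AS LeftName,
               substr(place_street, 1, 4) AS LeftStreet
        FROM Address
        GROUP BY substr(place_name, 1, 4), substr(place_street, 1, 4)
        HAVING COUNT(*) > 1
    )
    SELECT a.id, a.place_name, a.place_street
    FROM cteCommonAddresses c
    INNER JOIN Address a
        ON c.LeftName = substr(a.place_name, 1, 4)
       AND c.LeftStreet = substr(a.place_street, 1, 4)
    ORDER BY a.place_name, a.place_street, a.id
    """
).fetchall()
print(sorted(r[0] for r in rows))  # [1, 2, 3, 4, 5]
```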

Vendor agnostic SQL to concatenate field values across records

I have the following DB Schema :-
Data is ...
Location Table
1. New York
2. London
3. Tokyo
4. Melbourne
OtherNames Table (aka Aliases)
1. NYC
1. New York City
4. Home
3. Foo
3. PewPew
What I'm trying to do, as SQL, is get the following results :-
ID, Name, Name + Aliases
eg.
1 | New York | new york nyc new york city
2 | London | NULL
3 | Tokyo | tokyo foo pewpew
4 | Melbourne | melbourne home
I'm not sure how to get that LAST column.
It's as if I want a subquery that COALESCEs the OtherName.Name field, per Location row...?
It's related to a previous question of mine, but that question didn't give me the results I was after (I didn't ask the right question before :P).
NOTE: I'm after a TSQL / Non server specific answer. So please don't suggest GROUP_CONCAT();
SQL isn't suited to this kind of operation (1NF violation and all that), therefore the various workarounds in SQL will be vendor-specific. If you want something vendor-independent then use something that will consume vanilla SQL (rather than generate it) e.g. a report writer or 3GL application ;)
If you're using SQL Server 2005 onwards, I personally like the XPATH approach
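Following the suggestion to let a 3GL application do the concatenation, here is a minimal Python sketch: the database runs only a plain LEFT JOIN, and the grouping and string building happen in the application. Assumptions: the real schema is not shown in the question, so the table and column names (Location.id, Location.name, OtherNames.location_id, OtherNames.name) are hypothetical, and sqlite3 stands in for whatever database you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Location (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE OtherNames (location_id INTEGER, name TEXT);
    INSERT INTO Location VALUES (1, 'New York'), (2, 'London'),
                                (3, 'Tokyo'), (4, 'Melbourne');
    INSERT INTO OtherNames VALUES (1, 'NYC'), (1, 'New York City'),
                                  (4, 'Home'), (3, 'Foo'), (3, 'PewPew');
    """
)

# Vanilla SQL only: one row per (location, alias) pair; the alias is NULL
# for locations that have none.
pairs = conn.execute(
    """
    SELECT l.id, l.name, o.name
    FROM Location l
    LEFT JOIN OtherNames o ON o.location_id = l.id
    ORDER BY l.id, o.name
    """
).fetchall()

# Group the aliases per location in application code.
grouped = {}
for loc_id, loc_name, alias in pairs:
    entry = grouped.setdefault(loc_id, [loc_name, []])
    if alias is not None:
        entry[1].append(alias)

# Build the "Name + Aliases" column; None when there are no aliases.
results = []
for loc_id, (loc_name, aliases) in grouped.items():
    combined = " ".join([loc_name] + aliases).lower() if aliases else None
    results.append((loc_id, loc_name, combined))

for row in results:
    print(row)
```

Only the SELECT depends on the database, so the same loop works unchanged against any driver that follows Python's DB-API.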

Getting distinct rows based on a certain field from a database in Django

I need to construct a query in Django, and I'm wondering if this is somehow possible (it may be really obvious but I'm missing it...).
I have a normal query Model.objects.filter(x=True)[:5] which can return results like this:
FirstName LastName Country
Bob Jones UK
Bill Thompson UK
David Smith USA
I need to only grab rows which are distinct based on the Country field, something like Model.objects.filter(x=True).distinct('Country')[:5] would be ideal but that's not possible with Django.
The rows I want the query to grab ultimately are:
FirstName LastName Country
Bob Jones UK
David Smith USA
I also need the query to use the same ordering as set in the model's Meta class (i.e., I can't override the ordering in any way).
How would I go about doing this?
Thanks a lot.
I haven't tested this, but it seems to me a dict should do the job, although ordering could be off then:
d = {}
for x in Model.objects.all():
    d[x.country] = x  # a later row overwrites an earlier one with the same country
records_with_distinct_countries = d.values()
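A variant of the dict idea that keeps the first row seen for each country, in iteration order, and stops after five rows might look like the sketch below. The helper name first_per_country is hypothetical; it accepts any iterable of objects with a country attribute, so passing a QuerySet such as Model.objects.filter(x=True) would keep the model's Meta ordering. It is demonstrated here on plain stand-in objects, not on a real Django model, and it may read many rows before finding five distinct countries.

```python
from collections import namedtuple


def first_per_country(rows, limit=5):
    """Return up to `limit` rows, keeping the first row seen for each country."""
    seen = set()
    picked = []
    for row in rows:
        if row.country in seen:
            continue  # a row for this country was already kept
        seen.add(row.country)
        picked.append(row)
        if len(picked) >= limit:
            break
    return picked


# Stand-in rows from the question; with Django you would pass a QuerySet.
Person = namedtuple("Person", "first_name last_name country")
people = [
    Person("Bob", "Jones", "UK"),
    Person("Bill", "Thompson", "UK"),
    Person("David", "Smith", "USA"),
]
print([p.first_name for p in first_per_country(people)])  # ['Bob', 'David']
```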
countries = [f.country for f in Model.objects.all()]
for c in countries:
    # filter() returns an empty QuerySet rather than raising DoesNotExist
    print(Model.objects.filter(country=c))
I think @skrobul is on the right track, but a little bit off.
I don't think you'll be able to do this with a single query, because the distinct() method adds the SELECT DISTINCT modifier to the query, which acts on the entire row. You'll likely have to create a list of countries and then return limited QuerySets based on iterating that list.
Something like this:
maxrows = 5
countries = set(x.country for x in Model.objects.all())
rows = []
count = 0
for c in countries:
    if count >= maxrows:
        break
    try:
        # Indexing an empty QuerySet raises IndexError, not DoesNotExist
        rows.append(Model.objects.filter(country=c)[0])
        count += 1
    except IndexError:
        pass
This is a very generic example, but it gives the intended result.
Can you post the raw SQL that returns what you want from the source database? I have a hunch that the actual problem here is the query/data structure...