Vendor agnostic SQL to concatenate field values across records - sql

I have the following DB Schema :-
Data is ...
Location Table
1. New York
2. London
3. Tokyo
4. Melbourne
OtherNames Table (aka Aliases)
1. NYC
1. New York City
4. Home
3. Foo
3. PewPew
What I'm trying to do, as SQL, is get the following results :-
ID, Name, Name + Aliases
eg.
1 | New York | new york nyc new york city
2 | London | NULL
3 | Tokyo | tokyo foo pewpew
4 | Melbourne | melbourne home
I'm not sure how to get that LAST column.
It's like I want to have a SubQuery which COALESCE's the OtherName.Name field, per Location row... ?
It's related to a previous question I have .. but my previous question doesn't give me the proper results I was after (I didn't ask the right question, before :P)
NOTE: I'm after a TSQL / Non server specific answer. So please don't suggest GROUP_CONCAT();

SQL isn't suited to this kind of operation (1NF violation and all that), therefore the various workarounds in SQL will be vendor-specific. If you want something vendor-independent then use something that will consume vanilla SQL (rather than generate it) e.g. a report writer or 3GL application ;)
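One way to read the suggestion above: keep the SQL to a plain LEFT JOIN and let the consuming application build the concatenated column. The following is only a rough sketch in Python, with in-memory SQLite standing in for whatever database is actually in use. The schema is not shown in the question, so the column names `LocationID` and `AliasID` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Location (ID INTEGER PRIMARY KEY, Name TEXT);
    -- AliasID and LocationID are assumed column names (schema not shown in the question)
    CREATE TABLE OtherNames (AliasID INTEGER PRIMARY KEY, LocationID INTEGER, Name TEXT);
    INSERT INTO Location (ID, Name) VALUES
        (1, 'New York'), (2, 'London'), (3, 'Tokyo'), (4, 'Melbourne');
    INSERT INTO OtherNames (LocationID, Name) VALUES
        (1, 'NYC'), (1, 'New York City'), (4, 'Home'), (3, 'Foo'), (3, 'PewPew');
""")

# Vanilla SQL only: one row per (location, alias) pair, NULL alias when there is none.
rows = conn.execute("""
    SELECT l.ID, l.Name, o.Name
    FROM Location l
    LEFT JOIN OtherNames o ON o.LocationID = l.ID
    ORDER BY l.ID, o.AliasID
""").fetchall()

# The concatenation happens in the application, not in SQL.
grouped = {}
for loc_id, name, alias in rows:
    entry = grouped.setdefault(loc_id, (name, []))
    if alias is not None:
        entry[1].append(alias)

def combined(name, aliases):
    """Lower-cased 'name + aliases', or None when there are no aliases."""
    if not aliases:
        return None
    return " ".join([name] + aliases).lower()

report = [(loc_id, name, combined(name, aliases))
          for loc_id, (name, aliases) in sorted(grouped.items())]

for line in report:
    print(line)
```

The query itself uses nothing vendor-specific; only the grouping loop would need rewriting for a different client language or a report writer.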

If you're using SQL Server 2005 onwards, I personally like the XPATH approach (FOR XML PATH).

Related

Select longest string for each user

I have a table like this :
Clients Cities
1 NY
1 NY | WDC | LA
1 NY | WDC
2 LA
So, I have duplicate clients with different cities (not in order, but with different length at each line). What I want is to display for each user the longest cities string. So, I should get something like this :
Clients Cities
1 NY | WDC | LA
2 LA
I am a beginner in SQL (I use Spark SQL, but it's mainly the same thing), so can you please tell me how I can fix this problem?
Thanks !
You can use max():
select client, max(cities)
from t
group by client;
Then you should fix your data model, so you are not storing lists of cities in a string. That is not a good way to store the data in a relational database.
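Note that max() on a string column picks the value that sorts last, not the longest one. It gives the wanted row for the sample data because each shorter string is a prefix of the longest. If that cannot be relied on, comparing lengths is safer. A small sketch using Python with in-memory SQLite as a stand-in for Spark SQL; client 3 is made-up data added only to show where the two approaches differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (client INTEGER, cities TEXT);
    INSERT INTO t VALUES
        (1, 'NY'),
        (1, 'NY | WDC | LA'),
        (1, 'NY | WDC'),
        (2, 'LA'),
        (3, 'WDC'),        -- hypothetical rows, not from the question
        (3, 'LA | NY');
""")

# max() compares strings alphabetically, so client 3 gets 'WDC'.
by_max = conn.execute("""
    SELECT client, MAX(cities)
    FROM t
    GROUP BY client
    ORDER BY client
""").fetchall()

# Keep the row whose length equals the per-client maximum length.
# If two strings tie on length, both rows are returned.
longest = conn.execute("""
    SELECT t.client, t.cities
    FROM t
    JOIN (SELECT client, MAX(LENGTH(cities)) AS max_len
          FROM t
          GROUP BY client) m
      ON m.client = t.client
     AND LENGTH(t.cities) = m.max_len
    ORDER BY t.client
""").fetchall()

print(by_max)
print(longest)
```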
I think you should handle that query (in MySQL) with a SELECT DISTINCT statement, since the table contains many duplicate values. I hope that makes it work!
For instance,
SELECT DISTINCT city_name FROM cities;
And continue from there. This is my hint to lead you to the desired answer.

SQL Combine null rows with non null

Due to the way a particular table is written, I need to do something a little strange in SQL, and I can't find a 'simple' way to do it.
Table
Name   Place     Amount
Chris  Scotland  NULL
Chris  NULL      £1
Amy    England   NULL
Amy    NULL      £5
Output
Chris Scotland £1
Amy England £5
What I am trying to do is above, so the null rows are essentially ignored and 'grouped' up based on the Name
I have this working using FOR XML, but it is incredibly slow. Is there a smarter way to do this?
This is where MAX would work
select
Name
,Place = Max(Place)
,Amount = Max(Amount)
from
YourTable
group by
Name
Naturally, if you have more than one occurrence of a place for a given name, you may get unexpected results.
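MAX works here because aggregate functions skip NULLs, so each group is left with its single non-NULL value. A quick way to check this against the sample data, sketched in Python with in-memory SQLite (assuming the blank cells really are NULL rather than empty strings, and using the portable AS alias form instead of the T-SQL `Place = Max(Place)` style):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (Name TEXT, Place TEXT, Amount TEXT);
    INSERT INTO YourTable VALUES
        ('Chris', 'Scotland', NULL),
        ('Chris', NULL,       '£1'),
        ('Amy',   'England',  NULL),
        ('Amy',   NULL,       '£5');
""")

# MAX ignores NULLs, so each Name collapses to its one non-NULL Place and Amount.
combined = conn.execute("""
    SELECT Name, MAX(Place) AS Place, MAX(Amount) AS Amount
    FROM YourTable
    GROUP BY Name
    ORDER BY Name
""").fetchall()

for row in combined:
    print(row)
```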

Updating a database column based on its similarity to another database column

I have a database table (Customers) with the following columns:
ID
FIRST_NAME
MIDDLE_INIT
LAST_NAME
FULL_NAME
I also have a database table (ENG) with the following columns:
ID
ENG_NAME
I want to replace all of the ENG.ENG_NAME entries with a FULL_NAME entry from the CUSTOMERS table
Here is the problem.
The ENG_NAME was hand-jammed through a web form and so has no consistency. For instance, one row might contain "Robin Hood", another "Hood, Robin L", and another "Robin L Hood".
I want to search the entries in the CUSTOMERS table, find a close match, then replace the ENG.ENG_NAME with the CUSTOMERS.FULL_NAME.
Example:
ENG table          CUSTOMERS table
ID  ENG_NAME       ID  FULL_NAME       FIRST_NAME  MIDDLE_INIT  LAST_NAME
=================  ==========================================================
1   Hood,Robin     1   Robin L Hood    Robin       L            Hood
2   Rob Hood       2   Maid M Marion   Maid        M            Marion
3   Marion M       3   Friar F Tuck    Friar       F            Tuck
4   Rob Garza      4   Robert A Garza  Robert      A            Garza
Based on the data above, I would want ENG_NAME columns to be replaced like this:
ENG table
ID ENG_NAME
====================
1 Robin L Hood
2 Robin L Hood
3 Maid M Marion
4 Robert A Garza
Any thoughts on how to do this?
Thanks
This is not going to be a simple task. I would start by finding a good C# (or any .NET) algorithm that detects similar string portions.
Then look at compiling C# code into SQL stored procedures and invoking that code from SQL Server. The CLR code can then write the results to a table for you to analyze and do whatever you want with.
For More: CLR SQL Server User-Defined Function
I would do it in .NET using Levenshtein distance.
Start at a distance of 1. You are going to have some ties, and you need to decide how to resolve them.
Then move to 2, 3, 4...
You could do it in CLR, but how are you going to deal with ties? And you are going to have ties. How are you going to decide when it is not a match at all?
And I would put it in new column so you have a history of original data
Or a FK reference to customers table
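To make the Levenshtein idea concrete, here is a rough sketch in Python rather than .NET. The function names, the `max_distance` cutoff, and the rule "treat a tie as no match" are all assumptions for illustration; the answers above leave those decisions open.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def best_match(name, candidates, max_distance):
    """Return the closest candidate, or None if it is too far away or tied."""
    scored = sorted((levenshtein(name.lower(), c.lower()), c) for c in candidates)
    if not scored:
        return None
    best_distance, best = scored[0]
    if best_distance > max_distance:
        return None  # nothing close enough: leave for manual review
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # tie: ambiguous, leave for manual review
    return best

CUSTOMERS = ["Robin L Hood", "Maid M Marion", "Friar F Tuck", "Robert A Garza"]

print(best_match("Robin Hood", CUSTOMERS, max_distance=3))
print(best_match("Rob Garza", CUSTOMERS, max_distance=3))
```

With a cutoff of 3, "Robin Hood" resolves to "Robin L Hood" (distance 2), while "Rob Garza" is rejected because "Robert A Garza" is 5 edits away. Raw edit distance also copes badly with reordered names such as "Hood,Robin", which is part of why this is not a simple task.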

Check if a value exists in the child-parent tree

I'm creating a simple directory listing page where you can specify what kind of thing you want to list in the directory, e.g. a person or a company.
Each user has a UserTypeID and there is a dbo.UserType lookup table. The dbo.UserType lookup table is like this:
UserTypeID | UserTypeParentID | Name
1          | NULL             | Person
2          | NULL             | Company
3          | 2                | IT
4          | 3                | Accounting Software
In the dbo.Users table we have records like this:
UserID | UserTypeID | Name
1      | 1          | Jenny Smith
2      | 1          | Malcolm Brown
3      | 2          | Wall Mart
4      | 3          | Microsoft
5      | 4          | Sage
My SQL (so far) is very simple: (excuse the pseudo-code style)
DECLARE @UserTypeID int
SELECT
*
FROM
dbo.Users u
INNER JOIN
dbo.UserType ut
ON ut.UserTypeID = u.UserTypeID
WHERE
ut.UserTypeID = @UserTypeID
The problem here is that when people want to search for companies they will enter '2' as the UserTypeID. But neither Microsoft nor Sage will show up, because their UserTypeIDs are 3 and 4 respectively. It's the final UserTypeParentID which tells me that they're both companies.
How could I rewrite the SQL to return records where UserTypeID = @UserTypeID or where the final UserTypeParentID is also equal to @UserTypeID? Or am I going about this the wrong way?
Schema Change
I would suggest breaking this schema down a little further to make your queries and your life simpler. With the current schema you will end up writing a recursive query every time you want to get the simplest data from your Users table, and trust me, you don't want to do this to yourself.
I would break down the schema of these tables as follows:
dbo.Users
UserID | UserName
1 | Jenny
2 | Microsoft
3 | Sage
dbo.UserTypes_Type
TypeID | TypeName
1 | Person
2 | IT
3 | Company
4 | Accounting Software
dbo.UserTypes
UserID | TypeID
1 | 1
2 | 2
2 | 3
3 | 2
3 | 3
3 | 4
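With the three tables above, finding every company becomes a flat join with no recursion, since each user carries one row per type it belongs to. A minimal sketch in Python with in-memory SQLite (the `dbo.` prefix is dropped because SQLite has no such schema, and the helper name `users_of_type` is made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, UserName TEXT);
    CREATE TABLE UserTypes_Type (TypeID INTEGER PRIMARY KEY, TypeName TEXT);
    CREATE TABLE UserTypes (UserID INTEGER, TypeID INTEGER);
    INSERT INTO Users VALUES (1, 'Jenny'), (2, 'Microsoft'), (3, 'Sage');
    INSERT INTO UserTypes_Type VALUES
        (1, 'Person'), (2, 'IT'), (3, 'Company'), (4, 'Accounting Software');
    INSERT INTO UserTypes VALUES (1, 1), (2, 2), (2, 3), (3, 2), (3, 3), (3, 4);
""")

def users_of_type(type_id):
    """Names of all users tagged with the given TypeID."""
    rows = conn.execute("""
        SELECT u.UserName
        FROM Users u
        JOIN UserTypes ut ON ut.UserID = u.UserID
        WHERE ut.TypeID = ?
        ORDER BY u.UserID
    """, (type_id,)).fetchall()
    return [name for (name,) in rows]

print(users_of_type(3))  # TypeID 3 is 'Company' in the table above
```

The trade-off is that every user must be tagged with each ancestor type when it is inserted, so the hierarchy is maintained on write rather than reconstructed on read.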
You say that you are "creating" this - excellent because you have the opportunity to reconsider your whole approach.
Dealing with hierarchical data in a relational database is problematic because it is not designed for it - the model you choose to represent it will have a huge impact on the performance and ease of construction of your queries.
You have opted for an Adjacency List model, which is great for inserts (and deletes) but a bugger for selects, because the query has to effectively reconstruct the hierarchy path. By the way, an Adjacency List is the model almost everyone goes for on their first attempt.
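For reference, this is roughly what "reconstructing the hierarchy" looks like if you stay with the existing adjacency list tables: a recursive common table expression that starts at the requested type and walks down to its descendants. It is sketched here in Python with in-memory SQLite, which needs the `WITH RECURSIVE` spelling; SQL Server uses plain `WITH` for the same construct. The `dbo.` prefix is dropped and the helper name `users_in_tree` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserType (UserTypeID INTEGER PRIMARY KEY,
                           UserTypeParentID INTEGER,
                           Name TEXT);
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, UserTypeID INTEGER, Name TEXT);
    INSERT INTO UserType VALUES
        (1, NULL, 'Person'), (2, NULL, 'Company'),
        (3, 2, 'IT'), (4, 3, 'Accounting Software');
    INSERT INTO Users VALUES
        (1, 1, 'Jenny Smith'), (2, 1, 'Malcolm Brown'), (3, 2, 'Wall Mart'),
        (4, 3, 'Microsoft'), (5, 4, 'Sage');
""")

QUERY = """
    WITH RECURSIVE TypeTree(UserTypeID) AS (
        -- anchor: the type the user asked for
        SELECT UserTypeID FROM UserType WHERE UserTypeID = ?
        UNION ALL
        -- recursive step: any type whose parent is already in the tree
        SELECT child.UserTypeID
        FROM UserType child
        JOIN TypeTree parent ON child.UserTypeParentID = parent.UserTypeID
    )
    SELECT u.Name
    FROM Users u
    JOIN TypeTree t ON t.UserTypeID = u.UserTypeID
    ORDER BY u.UserID
"""

def users_in_tree(user_type_id):
    """Users whose type is the given one or any descendant of it."""
    return [name for (name,) in conn.execute(QUERY, (user_type_id,))]

print(users_in_tree(2))  # everything under 'Company'
```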
Everything is a trade-off, so you should decide which queries will be most common: selects (and updates) or inserts (and deletes). See this question for starters. Also, since SQL Server 2008 there is a native hierarchyid data type (see this), which may be of assistance.
Of course, you could store your data in an XML file (in SQL Server or not) which is designed for hierarchical data.

Crystal reports - missing fields

I'm using Crystal Reports 10 linked to an Excel document. I would like to pull the Dinner field, but also pull Country and CompanyName for the rows that don't have them; the rows are linked via BookingRef. Example below. I've tried sub-reports and suppressing unwanted fields but can't get it right. Also, I can't make changes in the Excel document, as it's 1000+ records exported from an online system weekly.
Id  BookingRef  Country  CompanyName  Surname  Forname  Dinner
1   001         UK       Company1     John     Andrews
2   001                               Mary     Jane     1
3   001                               Tom      Andrews  1
4   002         Germany  Company2     Lee      Jones
5   003         Germany  Company3     Peter    Lee      1
6   003                               Sofie    Lee      1
OK I am not sure I understand the full extent of your problem but let's start with the Country and Company name and see if I can get you moving forward. Instead of putting the Country field directly on the report you could use a formula field and do something like this:
IF {#BookingRef} = "001" Then
"UK"
Else IF {#BookingRef} = "002" Then
"Germany"
Else
"Unnamed"
Now you just put the formula field where the country field used to be and it will put the right country in based on the BookingRef code. This, however, is only practical if you are working with a small number of Country / Company Names, or possibly a big list that never changes, although I would caution against the latter.
The other thing you could do is create a table in any database that holds the BookingRef, Company and Country values, link the BookingRef fields from both "databases" and then just drop the fields on your report.
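The lookup-table idea can be tried outside Crystal first. The sketch below uses Python with in-memory SQLite and, as an assumption on my part, builds the lookup from the exported rows themselves (the one row per BookingRef that carries Country and CompanyName) instead of maintaining it by hand. The table names `Bookings` and `BookingLookup` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bookings (Id INTEGER, BookingRef TEXT, Country TEXT,
                           CompanyName TEXT, Surname TEXT, Forname TEXT,
                           Dinner INTEGER);
    INSERT INTO Bookings VALUES
        (1, '001', 'UK',      'Company1', 'John',  'Andrews', NULL),
        (2, '001', NULL,      NULL,       'Mary',  'Jane',    1),
        (3, '001', NULL,      NULL,       'Tom',   'Andrews', 1),
        (4, '002', 'Germany', 'Company2', 'Lee',   'Jones',   NULL),
        (5, '003', 'Germany', 'Company3', 'Peter', 'Lee',     1),
        (6, '003', NULL,      NULL,       'Sofie', 'Lee',     1);

    -- One row per BookingRef; MAX skips the NULLs on the rows without a company.
    CREATE TABLE BookingLookup AS
        SELECT BookingRef,
               MAX(Country)     AS Country,
               MAX(CompanyName) AS CompanyName
        FROM Bookings
        GROUP BY BookingRef;
""")

# Link the two tables on BookingRef and keep only the dinner rows.
dinner_rows = conn.execute("""
    SELECT b.Id, b.BookingRef, k.Country, k.CompanyName, b.Surname, b.Forname
    FROM Bookings b
    JOIN BookingLookup k ON k.BookingRef = b.BookingRef
    WHERE b.Dinner = 1
    ORDER BY b.Id
""").fetchall()

for row in dinner_rows:
    print(row)
```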
If I am missing the point of your question please be real specific about what it is you are trying to accomplish and what is and is not working in your current solution.