Separating multiple values in one column in MS SQL - sql

I have a field in an application that allows a user to select multiple values. When I query this field in the DB, if multiple values were selected the result gets displayed as one long word. There are no commas or space between the multiple selected values. Is there any way those values can be split by a comma?
Here’s my query:
SELECT HO.Value
FROM HAssessment ha
INNER JOIN HObservation HO
ON HO.AssessmentiD = ha.AssessmentID
AND HO.Patient_Oid = 2255231
WHERE Ho.FindingAbbr = 'A_R_CardHx'
------------------------------------------------
Result:
AnginaArrhythmiaCADCChest Pain
-------------------------
I would like to see:
Angina, Arrhythmia, CADC, Chest Pain
------------------------------------------
Help!

There's no easy solution to this.
The most expedient would be writing a string splitting function. From your sample data, it seems the values are all concatenated together without any separators. This means you'll have to come up with a list of all possible values (hopefully this is a query from some symptoms table...) and parse each one out from the string. This will be complex and painful.
A straightforward way to do this would be to test each valid symptom value to see whether it's contained within HObservation.Value, stuff all the found values together, and return the result. Note that this will perform very poorly.
Here's an example in TSQL. You'd be much better off doing this at the application layer, though, or better yet, normalizing your database (see below for more on that).
declare @symptoms table (symptom varchar(100))
insert into @symptoms (symptom)
values ('Angina'),('Arrhythmia'),('CADC'),('Chest Pain')
declare @value varchar(100)
set @value = 'AnginaArrhythmiaCADCChest Pain'
declare @result varchar(100)
-- build ', Angina, Arrhythmia, ...' then strip the leading ', ' (2 characters)
select @result = stuff((
SELECT ', ' + s.symptom
FROM @symptoms s
WHERE patindex('%' + s.symptom + '%',@value) > 0
FOR XML PATH('')),1,2,'')
select @result
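Since the recommendation is to do this at the application layer, here is a minimal Python sketch of that idea. It assumes the full list of valid values is known up front (the `symptoms` list is taken from the sample data) and that a greedy longest-match scan is unambiguous for that list. `split_concatenated` is a hypothetical helper name, not part of any library.

```python
def split_concatenated(value, known_values):
    """Split a separator-less string into known values, trying longer values first."""
    candidates = sorted(known_values, key=len, reverse=True)
    found, pos = [], 0
    while pos < len(value):
        # first (longest) known value that starts at the current position
        match = next((c for c in candidates if value.startswith(c, pos)), None)
        if match is None:
            raise ValueError(f"unrecognized text at position {pos}: {value[pos:]!r}")
        found.append(match)
        pos += len(match)
    return found


symptoms = ["Angina", "Arrhythmia", "CADC", "Chest Pain"]  # assumed complete list
result = ", ".join(split_concatenated("AnginaArrhythmiaCADCChest Pain", symptoms))
print(result)  # Angina, Arrhythmia, CADC, Chest Pain
```

Unlike the PATINDEX version, this keeps the values in the order they appear in the string and fails loudly on text that is not in the list. A greedy scan can still pick the wrong split if one valid value is a prefix of another combination, so treat it as a starting point.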
The real answer is to restructure your database. Put each distinct item found in HObservation.Value (this means Angina, Arrhythmia, etc. as separate rows) into some other table if such a table doesn't exist already. I'll call this table Symptom. Then create a lookup table to link HObservation with Symptom. Then drop the HObservation.Value column entirely. Do the splitting work at the application level, and make multiple inserts into the lookup table.
Example, based on sample data from your question:
HObservation
------------
ID Value
1 AnginaArrhythmiaCADC
Becomes:
HObservation
------------
ID
1
Symptom
-------
ID Value
1 Angina
2 Arrhythmia
3 CADC
HObservationSymptom
-------------------
ID HObservationID SymptomID
1 1 1
2 1 2
3 1 3
Note that if this is a production system (or you want to preserve the existing data for some other reason), you'll still have to write code to do the string splitting.
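As a rough illustration of the normalized layout, the sketch below builds the three tables in an in-memory SQLite database from Python and reads the symptoms back as a comma-separated string. SQLite stands in for SQL Server here, and the column types and constraints are assumptions; only the table and column names come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HObservation (ID INTEGER PRIMARY KEY);
CREATE TABLE Symptom (ID INTEGER PRIMARY KEY, Value TEXT NOT NULL UNIQUE);
CREATE TABLE HObservationSymptom (
    ID INTEGER PRIMARY KEY,
    HObservationID INTEGER NOT NULL REFERENCES HObservation(ID),
    SymptomID INTEGER NOT NULL REFERENCES Symptom(ID),
    UNIQUE (HObservationID, SymptomID)
);
INSERT INTO HObservation (ID) VALUES (1);
INSERT INTO Symptom (ID, Value) VALUES (1, 'Angina'), (2, 'Arrhythmia'), (3, 'CADC');
INSERT INTO HObservationSymptom (HObservationID, SymptomID) VALUES (1, 1), (1, 2), (1, 3);
""")

# One row per symptom; the display string is assembled in application code.
rows = conn.execute(
    """
    SELECT s.Value
    FROM HObservationSymptom hs
    JOIN Symptom s ON s.ID = hs.SymptomID
    WHERE hs.HObservationID = ?
    ORDER BY s.ID
    """,
    (1,),
).fetchall()
display = ", ".join(value for (value,) in rows)
print(display)  # Angina, Arrhythmia, CADC
```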

Related

SQL query to SELECT and replace only one column value with out defining the other columns

I need to do a SELECT, but replace one column's value. The table has 25 columns, and I want to keep this readable without listing every column just to replace one column with a value from another table. Here is what I did, which does work:
SELECT *
INTO #temp_grouping
FROM [ae_p_phs_e]
WHERE [template_id] = '1010'
AND [status_code] = 'OPEN'
AND [shop] = 'SP-STEAM'
-- select row from the temp table for inserting
UPDATE #temp_grouping
SET
[description] = [source_data].[description]
FROM
[ae_a_asset_e] AS [source_data]
WHERE
[source_data].[asset_tag] = [#temp_grouping].[asset_tag]
AND [source_data].[multitenant_id] = [#temp_grouping].[multitenant_id]
SELECT *
FROM #temp_grouping
--drop the temp table
DROP TABLE #temp_grouping
But what are other ways to do this same thing?
SAMPLE
TABLE A
-------------------------------------------------
|col1 | col2 | description | .... | nthColumn|
-------------------------------------------------
TABLE B
-------------------------------------------------
|col1 | col2 | description | .... | nthColumn|
-------------------------------------------------
EXAMPLE Data return on the first TABLE A
1,2017-026221,001,BAD description,..... VERY LAST
1,2017-026221,002,BAD description,..... VERY LAST
1,2017-026221,003,BAD description,..... VERY LAST
1,2017-026221,004,BAD description,..... VERY LAST
1,2017-026221,005,BAD description,..... VERY LAST
EXAMPLE Data return on the first TABLE B
1,null,XX1,GOOD description,..... VERY LAST
1,null,XX2,GOOD description,..... VERY LAST
1,null,XX3,GOOD description,..... VERY LAST
1,null,XX4,GOOD description,..... VERY LAST
1,null,XX5,GOOD description,..... VERY LAST
EXAMPLE RETURN Data return, basically first TABLE A with one value on TABLE B
1,2017-026221,001,GOOD description,..... VERY LAST
1,2017-026221,002,GOOD description,..... VERY LAST
1,2017-026221,003,GOOD description,..... VERY LAST
1,2017-026221,004,GOOD description,..... VERY LAST
1,2017-026221,005,GOOD description,..... VERY LAST
Assume
Script Table As is a way to auto-fill the column list, but it does not fit the need. A solution should not do the opposite of what the question asks, which, as stated in the title, is to not list the columns.
A solution that requires GRANTs to read the system tables is not allowed in many cases.
Solution as of yet
It is starting to sound like the answer is that there may not be another way to do what I did without listing all the columns. If that turns out not to be the case, I'll remove this section.
The only way to do this is with a dynamic query.
declare @query nvarchar(max) = ''
declare @temp_grouping nvarchar(max) = 'col1, col2, col3'
set @query = 'SELECT '+@temp_grouping+' FROM [ae_p_phs_e] where [template_id] = ''1010'''
print @query
exec sp_executesql @query
I am afraid your comment didn't really clarify what I'm trying to figure out, but let me go ahead and suggest a solution based on what I'm guessing you are trying to do. Actually, I have two solutions, based on two guesses:
You want to write a SELECT from a table with a lot of columns, so you want to avoid typing out all the column names for no other reason than that it would be a lot of typing. But you can't do a simple SELECT * because there is one column that you want to replace in the output with a column from another table.
Solution: Right click on the table in the SSMS Object Explorer pane, and choose Script Table As > SELECT To > New Query Window.
You will get a new query window with the SELECT query written out, with all the column names written out for you. No typing required. Then just modify that query with the JOIN you want, and find the column you want to replace in the SELECT list and replace it with the column from the JOINed table. This produces the SELECT query you want on an adhoc basis.
You have a more fluid situation where you don't necessarily know, for whatever reason, what the columns are going to be, so therefore you can't hard-code a column list, but you know that if a certain column appears, you want to replace it with a different one.
Solution: use a dynamic sql query, like the one in Rainman's answer. Instead of typing out the column list, like you mention in your comment to his answer, you can dynamically generate that list by querying the system tables to get all the columns belonging to the table you are interested in.
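To make the "generate the column list dynamically" idea concrete, here is a minimal Python sketch that uses SQLite as a stand-in for SQL Server. It reads the column names from the cursor metadata of an empty `SELECT *` rather than from system tables, then swaps one column for the joined table's version. `build_select` is a hypothetical helper, and the reduced table definitions and sample rows are assumptions made up for the demo.

```python
import sqlite3


def build_select(conn, table, join_table, replaced, join_cols):
    """Build a SELECT that lists every column of `table` but takes `replaced` from `join_table`."""
    # An empty result set still exposes the column names via cursor.description.
    cols = [d[0] for d in conn.execute(f'SELECT * FROM "{table}" LIMIT 0').description]
    select_list = ", ".join(
        f'b."{c}"' if c == replaced else f'a."{c}"' for c in cols
    )
    on = " AND ".join(f'a."{c}" = b."{c}"' for c in join_cols)
    return f'SELECT {select_list} FROM "{table}" a JOIN "{join_table}" b ON {on}'


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ae_p_phs_e (asset_tag TEXT, multitenant_id INTEGER, description TEXT, shop TEXT);
CREATE TABLE ae_a_asset_e (asset_tag TEXT, multitenant_id INTEGER, description TEXT);
INSERT INTO ae_p_phs_e VALUES ('001', 1, 'BAD description', 'SP-STEAM');
INSERT INTO ae_a_asset_e VALUES ('001', 1, 'GOOD description');
""")

sql = build_select(conn, "ae_p_phs_e", "ae_a_asset_e", "description",
                   ["asset_tag", "multitenant_id"])
rows = conn.execute(sql).fetchall()
print(rows)  # [('001', 1, 'GOOD description', 'SP-STEAM')]
```

The table and column names are interpolated into the SQL text, so they must come from trusted code, never from user input.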

DB Design: how to indicate if a column data is public or private

I'm designing a database and I have a question about how to make private some user data.
I have a user table, with name, city, birthday, biography, etc. And the user can make some data private (other users cannot see that data).
First, I thought to add columns to the table to indicate if a column is private or not. For example:
User
-------
user_id | name | city | cityIsPrivate | birthday | birthdayIsPrivate
---------+------+------+---------------+----------+------------------
Or, another approach is to add a varchar column to indicate which columns are private:
User
-------
user_id | name | city | birthday | privateColumns
---------+------+------+----------+---------------
And this privateColumns will have this: "city:NO; birthday:YES".
The user table will only have three columns that can be private or public. I will only have to add three columns more to the table.
Any advice?
Do not move data into a separate table if you are going to have to join to it every time you query the main table.
Boolean (or equivalent) columns to indicate privacy for every column on which a privacy setting can be applied:
is very simple to add.
takes up practically no space.
is much quicker to query.
shows that the privacy settings are an intrinsic part of the user data.
removes unnecessary complexity from the schema.
The facts that these relate to other columns and that they represent a single kind of logical object are just a distraction. Go for simplicity, performance, and logical correctness.
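A minimal sketch of the flag-per-column approach, written in Python against an in-memory SQLite database (a stand-in for whatever database is actually used). The column names come from the question; the column types, the sample row, and the `public_profile` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (
    user_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT,
    cityIsPrivate INTEGER NOT NULL DEFAULT 0,
    birthday TEXT,
    birthdayIsPrivate INTEGER NOT NULL DEFAULT 0
);
INSERT INTO "User" VALUES (1, 'Ann', 'Oslo', 1, '1990-01-01', 0);
""")


def public_profile(conn, user_id):
    """Return (name, city, birthday) as other users may see it; private fields become None."""
    return conn.execute(
        """
        SELECT name,
               CASE WHEN cityIsPrivate = 1 THEN NULL ELSE city END,
               CASE WHEN birthdayIsPrivate = 1 THEN NULL ELSE birthday END
        FROM "User"
        WHERE user_id = ?
        """,
        (user_id,),
    ).fetchone()


profile = public_profile(conn, 1)
print(profile)  # ('Ann', None, '1990-01-01')
```

No join and no dynamic SQL are needed, which is the simplicity argument made above.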
Move the list of your private columns to a separate table with three fields, like:
UserId |ColumnName |IsPrivate
-----------------------------
Then you can join your queries with that table and get the proper result set for each user, and at the same time remain free to change the columns of your user table.
If your User table is not expected to change, it is better to move the list of your private columns to a separate table with a proper structure, like this:
UserId |cityIsPrivate |birthdayIsPrivate
----------------------------------------
Then you can join your user table with this table in a single query and get the result set you need.
But don't put it in the same table. The first approach brings redundancy into your database structure. In your second case you would not be able to run SELECT queries filtered by IsPrivate criteria.
You can have a separate table (UserPrivateFields, for example) listing user ID's along with fields they have elected to make private, like so:
UserID | PrivateField
-------+-------------
1 | Name
1 | City
2 | Birthday
When you're running the procedure grabbing the user info to be pulled by the person requesting the info, you can then build a query like this (assume the desired user's UserID is passed into the proc):
CREATE TABLE #PublicUserFields (Publicfield varchar(50))
INSERT INTO #PublicUserFields (Publicfield)
SELECT Publicfield
FROM userPublicfields
WHERE userid = @userid
DECLARE @sql NVARCHAR(MAX) = N'SELECT '
WHILE EXISTS (SELECT 1 FROM #PublicUserFields)
BEGIN
DECLARE @Publicfield VARCHAR(50) =
(SELECT TOP 1 Publicfield FROM #PublicUserFields)
SET @sql = @sql + @Publicfield
IF (SELECT COUNT(*) FROM #PublicUserFields) > 1
BEGIN
SET @sql = @sql + ', '
END
DELETE FROM #PublicUserFields
WHERE Publicfield = @Publicfield
END
SET @sql = @sql + ' FROM MainTable WHERE userID = @userID'
-- pass @userid in as a parameter; a plain EXEC(@sql) cannot see the outer variable
-- (the INT type is an assumption based on the sample IDs)
EXEC sp_executesql @sql, N'@userID INT', @userID = @userid
The reason I'm bothering with dynamic SQL is that the names of your public fields can't be joined directly to the column names of the main table with this setup - they can only be selected or joined to other records with the same string value. You could maybe get around this by joining to sys.columns and doing interesting things with the object_id of the columns, but that doesn't seem much easier than this approach.
This makes sense IF the users can all dynamically set which fields they want to be viewable by other people. If the private fields are known and static, you may just want to separate the two categories and tighten down the permissions on read-access on the private table.

change ID number to smooth out duplicates in a table

I have run into this problem that I'm trying to solve: Every day I import new records into a table that have an ID number.
Most of them are new (have never been seen in the system before) but some are coming in again. What I need to do is to append an alpha to the end of the ID number if the number is found in the archive, but only if the data in the row is different from the data in the archive, and this needs to be done sequentially, IE, if 12345 is seen a 2nd time with different data, I change it to 12345A, and if 12345 is seen again, and is again different, I need to change it to 12345B, etc.
Originally I tried using a while loop that would put all the 'seen again' records in a temp table, and then assign A the first time, then delete those, assign B to what's left, delete those, etc., until the temp table was empty, but that hasn't worked out.
Alternately, I've been thinking of trying subqueries as in:
update table
set IDNO= (select max idno from archive) plus 1
Any suggestions?
How about this as an idea? Mind you, this is basically pseudocode so adjust as you see fit.
With "src" as the table that all the data will ultimately be inserted into, and "TMP" as your temporary table.. and this is presuming that the ID column in TMP is a double.
do
update tmp set id = id + 0.01 where id in (select id from src);
until no_rows_changed;
alter table TMP change id into id varchar(255);
update TMP set id = concat(int(id), chr((id - int(id)) * 100 + 64));
insert into SRC select * from tmp;
What happens when you get to 12345Z?
Anyway, change the table structure slightly, here's the recipe:
Drop any indices on ID.
Split ID (apparently varchar) into ID_Num (long int) and ID_Alpha (varchar, not null). Make the default value for ID_Alpha an empty string ('').
So, 12345B (varchar) becomes 12345 (long int) and 'B' (varchar), etc.
Create a unique, ideally clustered, index on columns ID_Num and ID_Alpha.
Make this the primary key. Or, if you must, use an auto-incrementing integer as a pseudo primary key.
Now, when adding new data, finding duplicate ID numbers is trivial and the last ID_Alpha can be obtained with a simple max() operation.
Resolving duplicate ID's should now be an easier task, using either a while loop or a cursor (if you must).
But it should also be possible to avoid the "Row by agonizing row" (RBAR) approach and use a set-based one. A few days of reading Jeff Moden articles should give you ideas in that regard.
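The suffix logic itself is easy to get wrong, so here is a small Python sketch of it, kept separate from any SQL. It assumes the ID has already been split into a numeric part and an alpha part as described above, and it answers the "12345Z" question by rolling over to 'AA' (spreadsheet-column style), which is an assumption rather than anything the question specifies. `next_suffix` and `assign_id` are hypothetical names.

```python
def next_suffix(suffix):
    """'' -> 'A', 'A' -> 'B', ..., 'Z' -> 'AA', 'AZ' -> 'BA'."""
    n = 0
    for ch in suffix:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    n += 1
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("A") + rem) + out
    return out


def assign_id(id_num, row_data, archive):
    """Return the ID to store for an incoming row.

    `archive` maps id_num -> list of (suffix, row_data) in the order they were added.
    Identical data reuses the existing ID; different data gets the next suffix.
    """
    versions = archive.setdefault(id_num, [])
    for suffix, data in versions:
        if data == row_data:
            return f"{id_num}{suffix}"
    suffix = next_suffix(versions[-1][0]) if versions else ""
    versions.append((suffix, row_data))
    return f"{id_num}{suffix}"


archive = {}
ids = [assign_id(12345, data, archive) for data in ["first", "second", "third", "second"]]
print(ids)  # ['12345', '12345A', '12345B', '12345A']
```

Note that once suffixes reach two letters, a plain max() on the alpha column no longer returns the latest one ('Z' sorts after 'AA'), so the latest suffix would have to be picked by length first and then by value.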
Here is my final solution:
update a
set IDnum=b.IDnum
from tempimporttable A inner join
(select * from archivetable
where IDnum in
(select max(IDnum) from archivetable
where IDnum in
(select IDnum from tempimporttable)
group by left(IDnum,7)
)
) b
on b.IDnum like a.IDnum + '%'
WHERE
*row from tempimport table = row from archive table*
to set incoming rows to the same IDnum as old rows, and then
update a
set patient_account_number = case
when len((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)))= 7 then a.IDnum + 'A'
else left(a.IDnum,7) + char(ascii(right((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)),1))+1)
end
from tempimporttable a
where not exists ( *select rows from archive table* )
I don't know if anyone wants to delve too far into this, but I appreciate constructive criticism...

Adding N number of dynamic columns in sql query

I have a table which is called datarecords which contains 7 fixed columns that are always required in select query. A user can add as many custom columns they want. I am storing this information in a table called datacolumn and the values are stored in another table called datavalue.
Now I want to create a query that brings back the 7 fixed columns from datarecords, then adds the custom columns and their values from these tables, since each data record has corresponding values in the datavalue table.
You can try to PIVOT the custom attributes from rows into columns, but you'll find that even with support for PIVOT in Microsoft SQL Server, you need to know the attributes in advance of writing the query, and the query code needs to specify all the attributes. There's no way in SQL to ask for all the custom attributes to magically fill as many columns as necessary.
You can retrieve an arbitrary number of custom attributes only by fetching them row by row, as they are stored in the database. Then write application code to loop over the results. If you want, you can write a class to map the multiple rows of custom attributes into fields of an object in your application.
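The application-side loop can be quite small. The Python sketch below assumes the query returns one `(record_id, attribute_name, value)` tuple per custom attribute; that row shape, the sample rows, and the `fold_attributes` name are illustrative assumptions, not the asker's actual schema.

```python
from collections import defaultdict


def fold_attributes(rows):
    """Group (record_id, attribute_name, value) rows into one dict per record."""
    records = defaultdict(dict)
    for record_id, name, value in rows:
        records[record_id][name] = value
    return dict(records)


# Rows as they might come back from the attribute-value table, fetched row by row.
rows = [
    (1, "color", "red"),
    (1, "size", "L"),
    (2, "color", "blue"),
]
records = fold_attributes(rows)
print(records)  # {1: {'color': 'red', 'size': 'L'}, 2: {'color': 'blue'}}
```

Records simply lack the keys for attributes they do not have, so no column list needs to be known when the query is written.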
It's awkward and inelegant to query non-relational data using SQL. This is because SQL is designed to assume each logical entity of the same type has a fixed number of columns, and that you know the columns before you write the query. If your entity has variable attributes, it can't be stored as a relation, by definition.
Many people try to extend this using the design you're using, but they find it's hard to manage and doesn't scale well. This design is usually called the Entity-Attribute-Value model, or key-value pairs. For more details on the pitfalls of the EAV design, see my book SQL Antipatterns.
If you need to support custom attributes, here are a few alternatives:
Store all the custom attributes together in a BLOB, with some internal structure to delimit field names and values (Serialized LOB). You can optionally create inverted indexes to help you look up rows where a given field has a given value (see How FriendFeed Uses MySQL).
Use a document-oriented database such as MongoDB or Solr for the dynamic data.
Use ALTER TABLE to add conventional columns to the table when users need custom attributes. This means you either need to enforce the same set of custom attributes for all users, or else store all users' custom attributes and hope your table doesn't get too wide (Single Table Inheritance), or create a separate table per user, either for all columns (Concrete Table Inheritance) or for just the custom columns (Class Table Inheritance).
EDIT: See note at bottom for more detail.
I am facing the same problem, and I found a solution that is slow. Maybe someone else has a solution for speeding up my findings. In my code, I have a table with three columns: Col1, Col2, Col3. Col1 is my record ID. Col2 is the name of my dynamic columns. Col3 is the value at that column. So if I wanted to represent a record with ID 1, two columns 2 and 3, and values at those columns: 4 and 5, I would have the following:
Col1, Col2, Col3
1, 2, 4
1, 3, 5
Then we pivot over column 2 and select the MAX (or MIN or AVG, doesn't matter since col2 and col3 combinations are unique) col3 in the pivot. In order to accomplish the pivot with a variable number of columns, we use dynamic SQL generation to generate our SQL. This works well for small input data (I believe the derived table inside the FROM clause of the dynamic SQL). Once your dataset gets large, the aggregate function starts taking a long time to execute. A very long time. It looks like this starts at around 1000 rows, so maybe there's a hint or another method that makes this shorter.
As a note, since the values for Col2 and Col3 map 1:1, I also tried dynamically generating a SELECT statement like the following:
SELECT Col1,
MAX(CASE WHEN Col2 = '4' THEN Col3 END) [4],
MAX(CASE WHEN Col2 = '5' THEN Col3 END) [5],
MAX(CASE WHEN Col2 = '6' THEN Col3 END) [6] -- ... these were dynamically generated
FROM #example
GROUP BY Col1
This was just as slow for my dataset. Your mileage may vary. Here is a full example of how this works written for SQL Server (2005+ should run this).
--DROP TABLE #example
CREATE TABLE #example
(
Col1 INT,
Col2 INT,
Col3 INT
)
INSERT INTO #example VALUES (2,4,10)
INSERT INTO #example VALUES (2,5,20)
INSERT INTO #example VALUES (2,6,30)
INSERT INTO #example VALUES (2,7,40)
INSERT INTO #example VALUES (2,8,50)
INSERT INTO #example VALUES (3,4,11)
INSERT INTO #example VALUES (3,5,22)
INSERT INTO #example VALUES (3,6,33)
INSERT INTO #example VALUES (3,7,44)
INSERT INTO #example VALUES (3,8,55)
DECLARE @columns VARCHAR(100)
SET @columns = ''
SELECT @columns = @columns + '[' + CAST(Col2 AS VARCHAR(10)) + '],'
FROM (SELECT DISTINCT Col2 FROM #Example) a
SELECT @columns = SUBSTRING(@columns, 0, LEN(@columns) )
DECLARE @dsql NVARCHAR(MAX)
SET @dsql = '
select Col1, ' + @columns + '
from
(select Col1, Col2, Col3 FROM #example e) a
PIVOT
(
MAX(Col3)
FOR Col2 IN (' + @columns + ')
) p'
print @dsql
EXEC sp_executesql @dsql
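For readers who want to try the same generate-then-execute pattern outside SQL Server, here is a hedged Python sketch against an in-memory SQLite database. SQLite has no PIVOT operator, so the generated statement uses the MAX(CASE ...) form instead; the table is a regular table named `example` rather than a temp table, and only a subset of the sample rows is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (Col1 INTEGER, Col2 INTEGER, Col3 INTEGER)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?)",
    [(2, 4, 10), (2, 5, 20), (2, 6, 30), (3, 4, 11), (3, 5, 22), (3, 6, 33)],
)

# Step 1: discover the attribute values that will become columns.
attrs = [r[0] for r in conn.execute("SELECT DISTINCT Col2 FROM example ORDER BY Col2")]

# Step 2: generate one MAX(CASE ...) expression per attribute.
# int() guards the interpolated value, since it becomes part of the SQL text.
select_list = ", ".join(
    f'MAX(CASE WHEN Col2 = {int(a)} THEN Col3 END) AS "{int(a)}"' for a in attrs
)
sql = f"SELECT Col1, {select_list} FROM example GROUP BY Col1 ORDER BY Col1"

# Step 3: run the generated statement.
pivoted = conn.execute(sql).fetchall()
print(pivoted)  # [(2, 10, 20, 30), (3, 11, 22, 33)]
```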
EDIT: Because of the unique situation in which I am doing this, I managed to get my speed-up using two tables (one with the entities and another with the attribute-value pairs), and creating a clustered index on the attribute-value pairs which includes all columns (ID, Attribute, Value). I recommend you solve this a different way if you need fast inserts, large numbers of columns, many data rows, etc. I have some known certainties about the size and growth rates of my data, and my solution is suited to my scope.
There are many other solutions which are better suited to solve this problem. For example, if you need fast inserts and single-record reads (or slow reads don't matter) you should consider packing an XML string into a field and serializing/deserializing in the database consumer. If you need ultra-fast writes, ultra-fast reads, and data columns are very rarely added then you may consider altering your table. This is a bad solution in most cases, but may fit some problems. If you have columns that change frequently enough, but you also need fast reads and writes are not an issue, then my solution may work for you up to a certain dataset size.

SQL query select from table and group on other column

I'm phrasing the question title poorly as I'm not sure what to call what I'm trying to do but it really should be simple.
I've a link / join table with two ID columns. I want to run a check before saving new rows to the table.
The user can save attributes through a webpage but I need to check that the same combination doesn't exist before saving it. With one record it's easy as obviously you just check if that attributeId is already in the table, if it is don't allow them to save it again.
However, if the user chooses a combination of that attribute and another one then they should be allowed to save it.
Here's an image of what I mean:
So if a user now tried to save an attribute with ID of 1 it will stop them, but I need it to also stop them if they tried ID's of 1, 10 so long as both 1 and 10 had the same productAttributeId.
I'm confusing this in my explanation but I'm hoping the image will clarify what I need to do.
This should be simple so I presume I'm missing something.
If I understand the question properly, you want to prevent the combination of AttributeId and ProductAttributeId from being reused. If that's the case, simply make them a combined primary key, which is by nature UNIQUE.
If that's not feasible, create a stored procedure that runs a query against the join for instances of the AttributeId. If the query returns 0 instances, insert the row.
Here's some light code to present the idea (may need to be modified to work with your database):
-- SELECT COUNT(...) always returns one row, so @@ROWCOUNT cannot be used here; test with EXISTS instead
IF NOT EXISTS (SELECT 1 FROM MyJoinTable WHERE AttributeId = @RequestedID)
BEGIN
INSERT INTO MyJoinTable ...
END
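The combined-primary-key suggestion can be tried out quickly with a Python sketch against in-memory SQLite (a stand-in for the real database). The table name follows the snippet above; the column types and the `try_insert` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE MyJoinTable (
        AttributeId INTEGER NOT NULL,
        ProductAttributeId INTEGER NOT NULL,
        PRIMARY KEY (AttributeId, ProductAttributeId)
    )
    """
)


def try_insert(conn, attribute_id, product_attribute_id):
    """Insert the pair; return False if the same pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO MyJoinTable VALUES (?, ?)",
                (attribute_id, product_attribute_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert(conn, 1, 1)      # True: new pair
duplicate = try_insert(conn, 1, 1)  # False: rejected by the primary key
other = try_insert(conn, 1, 2)      # True: same attribute, different productAttributeId
```

Letting the key reject the duplicate also avoids the race between a separate existence check and the insert.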
You can control your inserts via a stored procedure. My understanding is that
users can select a combination of Attributes, such as
just 1
1 and 10 together
1,4,5,10 (4 attributes)
These need to enter the table as a single "batch" against a (new?) productAttributeId
So if (1,10) was chosen, this needs to be blocked because 1-2 and 10-2 already exist.
What I suggest
The stored procedure should take the attributes as a single list, e.g. '1,2,3' (comma separated, no spaces, just integers)
You can then use a string splitting UDF or an inline XML trick (as shown below) to break it into rows of a derived table.
Test table
create table attrib (attributeid int, productattributeid int)
insert attrib select 1,1
insert attrib select 1,2
insert attrib select 10,2
Here I use a variable, but you can incorporate as a SP input param
declare @t nvarchar(max) set @t = '1,2,10'
select top(1)
t.productattributeid,
count(t.productattributeid) count_attrib,
count(*) over () count_input
from (select convert(xml,'<a>' + replace(@t,',','</a><a>') + '</a>') x) x
cross apply x.x.nodes('a') n(c)
cross apply (select n.c.value('.','int')) a(attributeid)
left join attrib t on t.attributeid = a.attributeid
group by t.productattributeid
order by count_attrib desc
Output
productattributeid count_attrib count_input
2 2 3
The 1st column gives you the productattributeid that has the most matches
The 2nd column gives you how many attributes were matched using the same productattributeid
The 3rd column is how many attributes exist in the input
If you compare the last 2 columns and the counts
match - you can use the productattributeid to attach to the product which has all these attributes
don't match - then you need to do an insert to create a new combination
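The same comparison can be sketched in application code. The Python below groups the existing rows by productattributeid and looks for a group whose attribute set equals the requested set exactly; treating "same combination" as an exact set match is an assumption about the requirement, and `find_matching_combination` is a hypothetical helper. The `existing` rows mirror the test table above.

```python
from collections import defaultdict


def find_matching_combination(rows, requested):
    """Return the productattributeid whose attribute set equals `requested`, or None."""
    groups = defaultdict(set)
    for attribute_id, product_attribute_id in rows:
        groups[product_attribute_id].add(attribute_id)
    wanted = set(requested)
    for product_attribute_id, attrs in sorted(groups.items()):
        if attrs == wanted:
            return product_attribute_id
    return None


existing = [(1, 1), (1, 2), (10, 2)]  # (attributeid, productattributeid)

print(find_matching_combination(existing, [1]))         # 1
print(find_matching_combination(existing, [1, 10]))     # 2
print(find_matching_combination(existing, [1, 2, 10]))  # None
```

A result of None means the combination is new and can be inserted under a new productattributeid; any other result means the save should be blocked or the existing id reused.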