SQL Server : GROUP CONCAT with DISTINCT is sorting natural data input - sql

I have a similar situation. I start with a table in which one column holds comma-delimited data loaded from another source. I need to remove a section from the end of each value, so I split the data and trim each piece with the code below. (I added the ID column later so that I could sort, and I added WITH SCHEMABINDING later in order to add an XML index, but nothing works. I can remove this ... and the ID column, but I do not see any difference one way or the other):
ALTER VIEW [dbo].[vw_Routing]
WITH SCHEMABINDING
AS
SELECT TOP 99.9999 PERCENT
    ROW_NUMBER() OVER (ORDER BY CableID) - 1 AS ID,
    CableID AS [CableID],
    -- keep only the first 13 characters of each list element
    SUBSTRING(m.n.value('.[1]', 'varchar(8000)'), 1, 13) AS Routing
FROM
    (SELECT
        CableID,
        -- turn 'a,b,c' into <XMLRoot><RowData>a</RowData>...</XMLRoot>
        CAST('<XMLRoot><RowData>' + REPLACE([RouteNodeList], ',', '</RowData><RowData>') + '</RowData></XMLRoot>' AS xml) AS x
    FROM
        [dbo].[Cables]) t
CROSS APPLY
    x.nodes('/XMLRoot/RowData') m (n)
ORDER BY
    ID
Now I need to concatenate the rows of the Routing column back into a single comma-separated value per CableID. I have the code working except that it reorders my data; I must keep the data in the order it was input into the table, as it is cable routing information. I must also remove duplicates. I use the following code. The SELECT DISTINCT removes the duplicates but reorders the data. The SELECT without DISTINCT keeps the correct order but does NOT remove the duplicates:
Substring(
    (
        SELECT DISTINCT ',' + x3.Routing AS [text()] --This DISTINCT reorders the routes once concatenated.
        --SELECT ',' + x3.Routing AS [text()] --This without the DISTINCT does not remove duplicates.
        From vw_Routing x3
        Where x3.CableID = c.CableId
        For XML PATH ('')
    ), 2, 1000) [Routing],
I tried the code you gave above and it provided the same results with the DISTINCT reordering the data but without DISTINCT not removing the duplicates.

Perhaps GROUP BY with ORDER BY will work:
stuff((select ',' + x3.Routing AS [text()]  -- no DISTINCT here; the GROUP BY removes the duplicates
       from vw_Routing x3
       where x3.CableID = c.CableId
       group by x3.Routing
       order by min(x3.id)  -- order each route by its lowest id
       for XML PATH ('')
      ), 1, 1, '') as [Routing],
I also replaced the SUBSTRING() with STUFF(). The latter is more standard for this operation.
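A note on why ordering by MIN(id) may still reorder the routes: the view numbers its rows with ROW_NUMBER() OVER (ORDER BY CableID), so rows that share a CableID are numbered in no guaranteed order, and ID need not reflect a node's position in the original list. Below is a minimal sketch of an order-preserving de-duplication. It assumes SQL Server 2022 or Azure SQL Database (where STRING_SPLIT accepts the enable_ordinal argument) and uses a hypothetical table variable @Cables with made-up sample values standing in for dbo.Cables:

```sql
-- Hypothetical stand-in for dbo.Cables; the sample route values are invented.
DECLARE @Cables TABLE (CableID int, RouteNodeList varchar(8000));
INSERT INTO @Cables (CableID, RouteNodeList)
VALUES (1, 'TRAY-B-000001/x,TRAY-A-000002/y,TRAY-B-000001/z');

WITH Parts AS
(
    -- Split each list and keep the element's position (ordinal) within it.
    SELECT c.CableID,
           SUBSTRING(s.value, 1, 13) AS Routing,
           s.ordinal
    FROM @Cables AS c
    CROSS APPLY STRING_SPLIT(c.RouteNodeList, ',', 1) AS s
),
FirstSeen AS
(
    -- One row per distinct route, remembering where it first appeared.
    SELECT CableID, Routing, MIN(ordinal) AS FirstPos
    FROM Parts
    GROUP BY CableID, Routing
)
SELECT CableID,
       STRING_AGG(Routing, ',') WITHIN GROUP (ORDER BY FirstPos) AS Routing
FROM FirstSeen
GROUP BY CableID;
-- For the sample row this yields TRAY-B-000001,TRAY-A-000002 (input order, no duplicates).
```

The same GROUP BY ... ORDER BY MIN(...) idea from the answer above applies; the only change is that the sort key is the element's position in the list rather than a row number assigned per CableID.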

To https://stackoverflow.com/users/1144035/gordon-linoff
Unfortunately, that did not work. It gave me the same result as my select statement; that is, no dups but reordering data.
HOWEVER, I found the correct answer earlier today:
I figured it out finally!! I still have to implement it within the other code and add the new Cable Area code, but the hard part is over!!!!!
I am going to post the following to the forums so that they know not to work on it .... I was writing this to send to my friend for his help, but I figured it out myself before I sent it.
I started with raw, comma separated data in the records of a table … the data is from another source. I had to remove some of the information from each value, so I used the following code to split it up and manipulate it:
Code1
Once that was done, I had to put the manipulated data back in the same form in the same order and with no duplicates. So I needed a SELECT DISTINCT. When I used the commented out SELECT DISTINCT below, it removed duplicates but it changed the order of the data which I could not have as it is Cable Tray Routing Data. When I took out the SELECT DISTINCT, it kept correct order, but left duplicates.
Because I was using XML PATH, I had to change this code …… to this code so that I could use SELECT DISTINCT to remove the duplicates: Code2 and Code3


Related

Loop through table and update a specific column

I have the following table:
Id   Category
-----------------
1    some thing
2    value
This table contains a lot of rows and what I'm trying to do is to update all the Category values to change every first letter to caps. For example, some thing should be Some Thing.
At the moment this is what I have:
UPDATE MyTable
SET Category = (SELECT UPPER(LEFT(Category,1))+LOWER(SUBSTRING(Category,2,LEN(Category))) FROM MyTable WHERE Id = 1)
WHERE Id = 1;
But there are two problems. The first is the capitalization itself: my expression only works for single-word values (hello => Hello, but hello world => Hello world). The second is that I would need to run this query X times, following the WHERE Id = X logic. So my question is: how can I update X rows? I was thinking of a cursor, but I don't have much experience with them.
Here is a fiddle to play with.
You can split the words apart, apply the capitalization, then munge the words back together. No, you shouldn't be worrying about subqueries and Id because you should always approach updating a set of rows as a set-based operation and not one row at a time.
;WITH cte AS
(
    SELECT Id, NewCat = STRING_AGG(CONCAT(
               UPPER(LEFT(value,1)),
               SUBSTRING(value,2,57)), ' ')
           WITHIN GROUP (ORDER BY CHARINDEX(value, Category))
    FROM
    (
        SELECT t.Id, t.Category, s.value
        FROM dbo.MyTable AS t
        CROSS APPLY STRING_SPLIT(Category, ' ') AS s
    ) AS x GROUP BY Id
)
UPDATE t
SET t.Category = cte.NewCat
FROM dbo.MyTable AS t
INNER JOIN cte ON t.Id = cte.Id;
This assumes your category doesn't have non-consecutive duplicates within it; for example, bora frickin bora would get messed up (meanwhile bora bora frickin would be fine). It also assumes a case-insensitive collation (which could be catered to if necessary).
In Azure SQL Database you can use the new enable_ordinal argument to STRING_SPLIT() but, for now, you'll have to rely on hacks like CHARINDEX().
Updated db<>fiddle (thank you for the head start!)
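For anyone on a platform where enable_ordinal is available (Azure SQL Database as noted above; SQL Server 2022 is assumed here as well), a sketch of the same update without the CHARINDEX() hack might look like this. The table variable @MyTable and its sample rows are hypothetical stand-ins for dbo.MyTable:

```sql
-- Hypothetical stand-in for dbo.MyTable, including the awkward 'bora frickin bora' case.
DECLARE @MyTable TABLE (Id int PRIMARY KEY, Category varchar(100));
INSERT INTO @MyTable (Id, Category)
VALUES (1, 'some thing'), (2, 'value'), (3, 'bora frickin bora');

WITH cte AS
(
    SELECT t.Id,
           NewCat = STRING_AGG(CONCAT(
                        UPPER(LEFT(s.value, 1)),
                        SUBSTRING(s.value, 2, 8000)), ' ')
                    WITHIN GROUP (ORDER BY s.ordinal)  -- word position, not CHARINDEX
    FROM @MyTable AS t
    CROSS APPLY STRING_SPLIT(t.Category, ' ', 1) AS s  -- third argument enables the ordinal column
    GROUP BY t.Id
)
UPDATE t
SET t.Category = cte.NewCat
FROM @MyTable AS t
INNER JOIN cte ON t.Id = cte.Id;

SELECT Id, Category FROM @MyTable ORDER BY Id;
-- Some Thing / Value / Bora Frickin Bora
```

Because the words are ordered by their ordinal rather than by where they are first found in the string, repeated words no longer get shuffled, and the collation no longer matters for the ordering.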

How do i find max combination from given result string in SQL

Here is the output.
ID Stack
-----------------------------------
123 307290,303665,307285
123 307290,307285,303424,303665
123 307290,307285,303800,303665
123 307061,307290
I want only the last three rows in the output. The reason is that all three numbers in the Stack column of line 1 also appear in the Stack column of lines 2 and 3, so I don't need line 1.
Lines 2, 3 and 4 are different, so I want those lines in my result.
I have tried doing it with row_number() and charindex but I'm not getting the proper result.
Thank you.
All the comments telling you to change your database's structure are right! You really should avoid comma-separated values. This breaks 1NF and will be a pain in the neck forever.
The result of the second CTE might be used to shift all data into a new 1:n related structure.
Something like this?
DECLARE @tbl TABLE(ID INT,Stack VARCHAR(100));
INSERT INTO @tbl VALUES
 (123,'307290,303665,307285')
,(123,'307290,307285,303424,303665')
,(123,'307290,307285,303800,303665')
,(123,'307061,307290');
WITH Splitted AS
(
    SELECT ID
          ,Stack
          ,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowIndex
          ,CAST('<x>' + REPLACE(Stack,',','</x><x>') + '</x>' AS XML) Casted
    FROM @tbl
)
,DerivedDistinctValues AS
(
    SELECT DISTINCT
           ID
          ,Stack
          ,RowIndex
          ,StackNr.value('.','int') AS Nr
    FROM Splitted
    CROSS APPLY Casted.nodes('/x') AS A(StackNr)
)
SELECT ddv1.ID
      ,ddv1.Stack
FROM DerivedDistinctValues AS ddv1
FULL OUTER JOIN DerivedDistinctValues AS ddv2 ON ddv1.RowIndex<>ddv2.RowIndex
                                             AND ddv1.Nr=ddv2.Nr
WHERE ddv2.ID IS NULL
GROUP BY ddv1.ID,ddv1.Stack
This will be slow, especially with larger data sets.
Some explanation:
The first CTE transforms the CSV numbers into <x>307290</x><x>303665</x>... This can be cast to XML, which makes it possible to generate a derived table returning all the numbers as rows. This happens in the second CTE, which calls the XML method .nodes().
The last query does a full outer join, each row with every other row. Every row that has at least one number without a counterpart in another row is kept.
But I assume that this might not work in every situation (e.g. circular data).

Count the occurences of all individual values in a multivalued field in SQL Server

Features Impressive
A,B,C
D,C
A,D
B,C,D
This is a column in my database that contains multiple values that come from a combobox.
I want to count the number of occurrences of each value in this column so that I can generate a bar chart out of this reflecting how many people liked the specific features.
Output I want is
A- 2
B- 2
C- 3
D- 3
Please help me with this SQL query.
You have a very poor design. You should be storing individual values in a separate row in a junction table -- one row per whatever and value.
Given the data structure, here is a method to do what you want -- assuming that you have a list of allowed values:
select av.feature, count(t.features)
from AllowedValues av left join
     tables t
     on ',' + t.features + ',' like '%,' + av.feature + ',%'
group by av.feature;
If you don't have an explicit list of features, you can create one using a CTE, something like:
with AllowedValues as (
      select 'A' as feature union all
      . . .
     )
The performance of this query will be lousy. And, there is really no way to make it better without fixing the data structure.
So, I repeat. You should fix the data structure and use a junction table instead of storing a list as a string. In SQL, tables are for storing lists. Strings are for, well, storing strings.
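As a rough illustration of the junction-table design recommended here, the sketch below stores one row per response and feature. The table and column names (dbo.SurveyResponse, dbo.SurveyResponseFeature, ResponseId, Feature) are hypothetical and not taken from the question; the sample rows mirror the four lists shown in it:

```sql
-- Hypothetical schema: one row per (response, feature) instead of a comma-separated string.
CREATE TABLE dbo.SurveyResponse
(
    ResponseId int NOT NULL PRIMARY KEY
);

CREATE TABLE dbo.SurveyResponseFeature
(
    ResponseId int         NOT NULL REFERENCES dbo.SurveyResponse (ResponseId),
    Feature    varchar(50) NOT NULL,
    PRIMARY KEY (ResponseId, Feature)
);

INSERT INTO dbo.SurveyResponse (ResponseId) VALUES (1), (2), (3), (4);

INSERT INTO dbo.SurveyResponseFeature (ResponseId, Feature) VALUES
    (1, 'A'), (1, 'B'), (1, 'C'),
    (2, 'D'), (2, 'C'),
    (3, 'A'), (3, 'D'),
    (4, 'B'), (4, 'C'), (4, 'D');

-- Counting occurrences becomes a plain GROUP BY.
SELECT Feature, COUNT(*) AS Votes
FROM dbo.SurveyResponseFeature
GROUP BY Feature
ORDER BY Feature;
-- A 2, B 2, C 3, D 3
```

With this layout the count needs no string matching at all, and an index on Feature can support it.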
As mentioned by others, this really is poor design; you should never store comma-separated values in a single column.
Use a split function to split the comma-separated values into individual rows, then count the individual rows. Something like this:
;With CTE as
(
SELECT Split.a.value('.', 'VARCHAR(100)') SP_COL
FROM (SELECT Cast ('<M>' + Replace(feature, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM [table]) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)
)
Select SP_COL,COUNT(1) as [COUNT]
FROM CTE
Group By SP_COL
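On SQL Server 2016 or later (database compatibility level 130 and up), the built-in STRING_SPLIT function can take the place of the XML split. A minimal sketch, using a hypothetical table variable @Survey loaded with the sample rows from the question:

```sql
-- Hypothetical stand-in for the real table; rows copied from the question.
DECLARE @Survey TABLE (Features varchar(100));
INSERT INTO @Survey (Features)
VALUES ('A,B,C'), ('D,C'), ('A,D'), ('B,C,D');

-- One row per list element, then a plain count per element.
SELECT s.value AS Feature, COUNT(*) AS Votes
FROM @Survey AS t
CROSS APPLY STRING_SPLIT(t.Features, ',') AS s
GROUP BY s.value
ORDER BY s.value;
-- A 2, B 2, C 3, D 3
```

This only avoids the XML cast; it does not change the underlying design problem described above.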

SQL mass string manipulation

I'm working with an oracle DB and need to manipulate a string column within it. The column contains multiple email addresses in this format:
jgooooll#gmail.com;dhookep#gmail.com;amoore#outlook.com
What I want to do is take out anything that does not have '#gmail.com' at the end (in this example amoore#outlook.com would be removed). However, amoore#outlook.com may be the first email in the next row of the column, so there is no real fixed format; the only rule is that each address is separated by a semicolon.
Is there any way of implementing this through one command that runs through every row in the column and removes anything that's not #gmail.com? I'm not really sure if this kind of processing is possible in SQL. Just looking for your thoughts!!
Thanks a lot you guys. Look forward to hearing from you!
Applicable to Oracle 11g (11.2) onward only, because the listagg function is supported only from 11.2 onward. If you are using 10.1 up to 11.1, you can write your own string aggregate function or take this one.
with T1 as (
select 1 id, 'jhd#jk.com;jgooooll#gmail.com;dhookep#gmail.com;amoore#outlook.com' emails from dual union all
select 2 id, 'jhd#jk.com;jgooooll#gmail.com;dhookep#gmail.com;amoore#outlook.com' emails from dual
)
select id
, listagg(email, ';') within group(order by id) emails
from (select id
, regexp_substr(emails,'[^;]+', 1, rn) email
from t1
cross join (select rownum rn
from(select max (regexp_count(emails, '[^;]+')) ml
from t1
)
connect by level <= ml
)
)
where email like '%#gmail.com%'
group by id
Id Emails
--------------------------------------
1 dhookep#gmail.com;jgooooll#gmail.com
2 dhookep#gmail.com;jgooooll#gmail.com
Here is a Demo
This answer is actually for SQL Server, as that is what I know. That being said, perhaps having an example of how to do it in one system will give you an idea of how to do it in yours. Or maybe there is a way to convert the code into the same type of thing in Oracle.
First, the thought process: In SQL Server combining the FOR XML PATH and STUFF functionality allows you to make a comma separated list. I'm adding a WHERE Split.SplitValue LIKE ... clause into this to filter it to only gmail addresses. I'm cross applying this whole thing to the main table, and that turns it into a filtered email list. You could then further filter the main table to run this on a more targeted set of rows.
Second, the SQL Server implementation:
SELECT
*
FROM #Table Base
CROSS APPLY
(
SELECT
STUFF(
(SELECT
';' + Split.SplitValue AS [text()]
FROM dbo.fUtility_Split(Base.Emails, ';') Split
WHERE Split.SplitValue LIKE '%#gmail.com'
FOR XML PATH (''))
, 1, 1, '') Emails
) FilteredEmails
EDIT: I forgot to mention that this answer requires you have some sort of function to split a string column based on a separator value. If you don't have that already, then google for it. There are tons of examples.

Merging of fields using xml path in sql server, display comma where NULL

I have a table in which two or more different dates are listed for a single id. I want to merge all the dates for a single id. Example code is as below.
create table number(id nvarchar(255), billdate nvarchar(255))
insert into number(id,billdate) values ('56465','12/10/2011'),('56465','02/11/2011'),
('46462','12/09/2009'),('46462','12/06/2010'),('32169','12/22/2009'),
('32169','12/31/2011'),('86835','12/10/2010'),('86835','22-Jan-2010'),
('65641',''),('65641','12-Aug-2009'),('22458','25-Aug-2007'),('22458','')
For merging the rows I am using xml path as below
select Main.id,LEFT(Main.billdate,nullif(LEN(Main.billdate)-1,-1)) as "billdate"
from (select distinct ST2.id,(SELECT ST1.billdate + ',' AS [text()]
from NUMBER ST1 where ST1.id=ST2.id ORDER BY ST1.id FOR XML PATH (''))billdate
from NUMBER ST2)[Main]
It works perfectly for this sample data, but the problem is that I have a huge amount of data, and when I apply this XML PATH code a comma is not displayed if a date is NULL, as for the id 65641. It's important for me to display a comma in the place of NULL. Where am I going wrong? Can anyone suggest why it's not displaying a comma in the place of NULL?
I'm not sure I perfectly understand you, since the putatively NULL value for 65641 is actually a blank. To treat NULL values like blanks, you can use this:
select Main.id,LEFT(Main.billdate,nullif(LEN(Main.billdate)-1,-1)) as "billdate"
from
(
select distinct ST2.id,
(
SELECT ISNULL(ST1.billdate + ',', ',') AS [text()]
from NUMBER ST1
where ST1.id=ST2.id
ORDER BY ST1.id
FOR XML PATH ('')
) billdate
from NUMBER ST2
)[Main]
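To see why the ISNULL wrapper matters: with the default CONCAT_NULL_YIELDS_NULL setting, NULL + ',' evaluates to NULL, and FOR XML PATH emits nothing for a NULL value, so the comma disappears along with the date. A small sketch, using a hypothetical table variable @d with one real date and one NULL:

```sql
-- Hypothetical two-row sample: one date and one NULL for the same id.
DECLARE @d TABLE (id nvarchar(255), billdate nvarchar(255));
INSERT INTO @d (id, billdate)
VALUES (N'65641', N'12-Aug-2009'), (N'65641', NULL);

SELECT
    -- NULL + ',' is NULL, and FOR XML PATH skips NULLs: only one comma comes out.
    (SELECT billdate + ',' AS [text()]
     FROM @d
     FOR XML PATH ('')) AS WithoutIsNull,
    -- ISNULL turns the NULL row into a bare comma: two commas come out.
    (SELECT ISNULL(billdate + ',', ',') AS [text()]
     FROM @d
     FOR XML PATH ('')) AS WithIsNull;
```

The first column contains a single comma and the second contains two; the relative position of the date is not guaranteed here because the subqueries have no ORDER BY.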
The other issue you might be having is that if there is only a single blank/NULL value for a given id, you don't get even a single comma for it. This is happening because a single blank value only generates a single comma, which is then stripped off by your LEFT statement. You can make it leave single commas alone by changing it like so:
select Main.id,LEFT(Main.billdate,nullif(LEN(Main.billdate)-CASE WHEN LEN(Main.billdate) = 1 THEN 0 ELSE 1 END,-1)) as "billdate"
from
(
select distinct ST2.id,
(
SELECT ISNULL(ST1.billdate + ',', ',') AS [text()]
from NUMBER ST1
where ST1.id=ST2.id
ORDER BY ST1.id
FOR XML PATH ('')
) billdate
from NUMBER ST2
)[Main]
You still have issues, one of which is that you have no explicit ordering of dates, but I hope that covers the problems that you have. If not, clarify and I'll attempt to help some more.