Compress data on a key word in sql server

I have a test table like this-
Field
A
B
C
END
D
E
F
END
G
H
I
END
I want to compress this data on key word "END" in this format-
Field
A|B|C
D|E|F
G|H|I
I tried using Monarch Pro but could not get the desired results, and I can't think of a way to start on this in SQL. Please help.

This might help.
DECLARE @WORD VARCHAR(300)
SELECT @WORD = COALESCE(@WORD + '|','') + Field FROM [YourTable]
SELECT @WORD = REPLACE(@WORD, 'END', '$')
SELECT @WORD Field INTO #A
;WITH c(FieldOutput, Field) as (
select CAST(LEFT(Field, CHARINDEX('$',Field+'$')-2) AS VARCHAR(100)),
STUFF(Field, 1, CHARINDEX('$',Field+'$')+1, '')
from #A
where ISNULL(Field, '') <> ''
union all
select CAST(LEFT(Field, CHARINDEX('$',Field+'$')-2) AS VARCHAR(100)),
STUFF(Field, 1, CHARINDEX('$',Field+'$')+1, '')
from c
where ISNULL(Field, '') <> ''
)
select FieldOutput AS Field
from c
EDIT
Created a fiddle to test this out.

High level approach:
Use a cursor to step through the table row by row.
Append to a temporary variable until the row reads 'END' then write the contents of the temp variable to a row on a different table.
Iterate through until you reach the end of the table.
There is probably a more elegant, non-cursor based way of doing this, but this will get the job done.
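The loop described above can be sketched outside the database. This is a minimal Python illustration of the logic only, assuming the rows have already been fetched in table order into a list; the function name and parameters are hypothetical, and a T-SQL cursor would follow the same steps.

```python
def compress_on_keyword(rows, keyword="END", sep="|"):
    """Join consecutive values into one string per group, closing a group at each keyword row."""
    groups = []
    current = []  # plays the role of the temporary variable
    for value in rows:
        if value == keyword:
            # keyword row reached: write out the accumulated values and start over
            groups.append(sep.join(current))
            current = []
        else:
            current.append(value)
    if current:
        # trailing values that were never closed by a keyword row
        groups.append(sep.join(current))
    return groups


rows = ["A", "B", "C", "END", "D", "E", "F", "END", "G", "H", "I", "END"]
print(compress_on_keyword(rows))  # ['A|B|C', 'D|E|F', 'G|H|I']
```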

Related

Efficient way to merge alternating values from two columns into one column in SQL Server

I have two columns in a table. I want to merge them into a single column, but the merge should be done taking alternate characters from each columns.
For example:
Column A --> value (1,2,3)
Column B --> value (A,B,C)
Required result - (1,A,2,B,3,C)
It should be done without loops.

You need to make use of the UNION and get a little creative with how you choose to alternate. My solution ended up looking like this.
SELECT ColumnA
FROM Table
WHERE ColumnA%2=1
UNION
SELECT ColumnB
FROM TABLE
WHERE ColumnA%2=0
If you have an ID/PK column that could just as easily be used, I just didn't want to assume anything about your table.
EDIT:
If your table contains duplicates that you wish to keep, use UNION ALL instead of UNION

Try this:
SELECT [value]
FROM [Table]
UNPIVOT
(
[value] FOR [Column] IN ([Column_A], [Column_B])
) UNPVT

If you have SQL Server 2017 or higher (the first version that includes STRING_AGG) you can use:
SELECT QUOTENAME(STRING_AGG (cast(a as varchar(1)) + ',' + b, ','), '()')
FROM test;
In older versions, depending on how much data you have in your tables you can also try:
SELECT QUOTENAME(STUFF(
(SELECT ',' + cast(a as varchar(1)) + ',' + b
FROM test
FOR XML PATH('')), 1, 1,''), '()')
Here you can try a sample
http://sqlfiddle.com/#!18/6c9af/5

with data as (
select *, row_number() over (order by colA) as rn
from t
)
select rn,
case rn % 2 when 1 then colA else colB end as alternating
from data;

The following SQL uses an undocumented aggregate concatenation technique. This is described in Inside Microsoft SQL Server 2008 T-SQL Programming on page 33.
declare @x varchar(max) = '';
declare @t table (a varchar(10), b varchar(10));
insert into @t values (1,'A'), (2,'B'),(3,'C');
select @x = @x + a + ',' + b + ','
from @t;
select '(' + LEFT(@x, LEN(@x) - 1) + ')';

Select statement that concatenates the first character after every '/' character in a column

So I am trying to write a query which, among other things, brings back the first character in a varchar field, then returns the first character that appears after each / character throughout the rest of the field.
The field I am referring to will contain a group of last names, separated by a '/'. For example: Fischer-Costello/Korbell/Morrison/Pearson
For the above example, I would want my select statement to return: FKMP.
So far, I have only been able to get my code to return the first character + the first character after the FIRST (and only the first) '/' character.
So for the above example input, my select statement would return: FK
Here is the code that I have written so far:
select rp.CONTACT_ID, ra.TRADE_REP, c.FIRST_NAME, c.LAST_NAME,
UPPER(LEFT(FIRST_NAME, 1)) + SUBSTRING(c.first_name,CHARINDEX('/',c.first_name)+1,1) as al_1,
UPPER(LEFT(LAST_NAME, 1)) + SUBSTRING(c.LAST_name,CHARINDEX('/',c.LAST_name)+1,1) as al_2
from dbo.REP_ALIAS ra
inner join dbo.REP_PROFILE rp on rp.CONTACT_ID = ra.CONTACT_ID
inner join dbo.CONTACT c on rp.CONTACT_ID = c.CONTACT_ID
where
rp.CRD_NUMBER is null and
ra.TRADE_REP like '%DNK%' and
(c.LAST_NAME like '%/%' or c.FIRST_NAME like '%/%') and
ra.TRADE_FIRM in
(
'xxxxxxx',
'xxxxxxx'
)
If you read the code, it's obvious that I am attempting to perform the same concatenation on the first_name column as well. However, I realize that a solution which will work for the Last_name column (used in my example), will also work for the first_name column.
Thank you.

Some default values:
DECLARE @List VARCHAR(50) = 'Fischer-Costello/Korbell/Morrison/Pearson'
DECLARE @SplitOn CHAR(1) = '/'
This part just splits the string into a list:
DECLARE @RtnValue table
(
Id int identity(1,1),
Value nvarchar(4000)
)
While (Charindex(@SplitOn, @List)>0)
Begin
Insert Into @RtnValue (value)
Select
Value = ltrim(rtrim(Substring(@List,1,Charindex(@SplitOn,@List)-1)))
Set @List = Substring(@List,Charindex(@SplitOn,@List)+len(@SplitOn+',')-1,len(@List))
End
Insert Into @RtnValue (Value)
Select Value = ltrim(rtrim(@List))
Now let's grab the first character of each name and stuff it back into a single value:
SELECT STUFF((SELECT SUBSTRING(VALUE,1,1) FROM @RtnValue FOR XML PATH('')),1,0,'') AS Value
Outputs:
Value
FKMP
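For comparison, the same split-then-take-the-first-character logic can be sketched in Python. This only illustrates the idea and does not replace the T-SQL above; the function name is hypothetical, and uppercasing each initial is an assumption carried over from the UPPER call in the question.

```python
def initials(names, sep="/"):
    """Return the first character of each separator-delimited part, uppercased."""
    # part.strip()[:1] yields '' for an empty part instead of raising an error
    return "".join(part.strip()[:1].upper() for part in names.split(sep))


print(initials("Fischer-Costello/Korbell/Morrison/Pearson"))  # FKMP
```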

Here is another way to do this that would be a lot faster than looping. What you need is a set-based splitter. Jeff Moden at SQL Server Central has an excellent one. Here is a link to the article: http://www.sqlservercentral.com/articles/Tally+Table/72993/
You have to sign up for an account to view it, but it is free, and the logic in that article will change the way you look at data. You might also be able to find his code posted if you search for DelimitedSplit8K.
At any rate, here is how you could implement this type of splitter.
declare @Table table(ID int identity, SomeValue varchar(50))
insert @Table
select 'Fischer-Costello/Korbell/Morrison/Pearson'
select ID, STUFF((select '' + left(x.Item, 1)
from @Table t2
cross apply dbo.DelimitedSplit8K(SomeValue, '/') x
where t2.ID = t1.ID
for xml path('')), 1, 0 , '') as MyResult
from @Table t1
group by t1.ID

Subquery results in comma separated format

I am trying to write a sub-query, that stores all the results in a single column separated by a comma. My code looks something like this
SELECT column1,
column2,
CourseRequests=(SELECT INNERCourseRequests =
COALESCE(CASE
WHEN innercourserequests
= '' THEN
crse_name
ELSE innercourserequests
+ ',' +
crse_name
END, '')
FROM tor_studentcrserequest SCR
WHERE SCR.stud_pk = MS.tt_stud_pk
AND SCR.delt_flag = 0),
column4
FROM tbl_mainstudent MS
When I try to execute the stored procedure, I get an error saying Invalid column name 'INNERCourseRequests'.
What is the correct way to do this?
TSR is a reference to table from the outer column
EDIT: I changed it to:
CourseRequests=(SELECT INNERCourseRequests =
COALESCE(case when @INNERCourseRequests='' THEN CRSE_NAME ELSE
@INNERCourseRequests+','+CRSE_NAME end,'')
However, now I'm getting an error saying the subquery returned more than 1 result, which is expected.

You can use FOR XML along with a few REPLACEs as shown here:
SELECT column1,
column2,
CourseRequests=COALESCE(
REPLACE(REPLACE(REPLACE((
SELECT crse_name
FROM (
SELECT 1, 22, 'first', 0
UNION ALL SELECT 2, 22, 'second', 1
UNION ALL SELECT 3, 22, 'third', 0
UNION ALL SELECT 4, 555, 'first', 1
) SCR (id, stud_pk, crse_name, delt_flag)
WHERE SCR.stud_pk = MS.tt_stud_pk
AND SCR.delt_flag = 0
FOR XML PATH('')
),'</crse_name><crse_name>', ','),
'</crse_name>', ''), -- remove end tag
'<crse_name>', ''), -- remove beginning tag
''), -- optional COALESCE to ensure no NULLs
column4
FROM (
SELECT 1, 'a', 'b', '2014-01-01'
UNION ALL SELECT 22, 'd', 'e', '2014-02-02'
) MS (tt_stud_pk, column1, column2, column4)
Output:
column1 column2 CourseRequests column4
a b 2014-01-01
d e first,third 2014-02-02
Explanation:
The FOR XML PATH('') flattens the result of the sub-query to be:
<crse_name>first</crse_name><crse_name>third</crse_name>
The first REPLACE converts just the end-tag/beginning-tag combinations that are only found between values (i.e. where the commas go)
The second REPLACE removes the ending tag (can't be done before the first REPLACE)
The third REPLACE removes the beginning tag (can't be done before the first REPLACE)
Note:
There might be a slightly more elegant way to do the XML stuff so you don't need all of the REPLACEs, but not sure and this does work.

I'm pretty sure you can't do this with a single query, and I'm not entirely certain the tactic I've
come up with is a legitimate tactic--meaning, if it is undocumented, a future version of SQL might
not support this. With that said:
Start with the following:
DECLARE @List varchar(max)
SELECT @List = isnull(@List + ', ', '') + crse_name
from tor_studentcrserequest
where stud_pk = <TestValue>
and delt_flag = 0
PRINT @List
This will generate a comma-delimited list of all crse_name values from the tor_studentcrserequest table for a single stud_pk.
Next, turn it into a function:
DROP FUNCTION phkTest
GO
CREATE FUNCTION phkTest (@stud_pk int) -- Change datatype, if not int
RETURNS varchar(max)
AS
BEGIN
DECLARE @List varchar(max)
SELECT @List = isnull(@List + ', ', '') + crse_name
from tor_studentcrserequest
where stud_pk = @stud_pk
and delt_flag = 0
RETURN @List
END
GO
(Add a second parameter for delt_flag, if that might vary somehow)
And add that to a query:
SELECT distinct tt_stud_pk, dbo.phkTest(tt_stud_pk)
from tbl_mainstudent
(I wrote all this using one of my tables, then cut-and-paste your table/columns in, so there may be some syntax issues to deal with.)
There may be ways to improve performance for big tables (OUTER APPLY, select distinct before calling the function, and so forth), and it's entirely likely that this might best be done via procedural code by whatever's querying the data in the first place.
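As a rough illustration of that last suggestion, here is a Python sketch of building the comma-separated list in the calling code. It assumes the rows of tor_studentcrserequest have already been fetched as (stud_pk, crse_name, delt_flag) tuples; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def course_lists(requests):
    """Map each stud_pk to a comma-separated string of its non-deleted course names."""
    by_student = defaultdict(list)
    for stud_pk, crse_name, delt_flag in requests:
        if delt_flag == 0:  # same filter as delt_flag = 0 in the queries above
            by_student[stud_pk].append(crse_name)
    return {pk: ",".join(names) for pk, names in by_student.items()}


requests = [
    (22, "first", 0),
    (22, "second", 1),
    (22, "third", 0),
    (555, "first", 1),
]
print(course_lists(requests))  # {22: 'first,third'}
```

Students with no remaining rows are simply absent from the result, so the caller would supply an empty string for them when joining back to the main student rows.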

replace value in varchar(max) field with join

I have a table that contains text field with placeholders. Something like this:
Row Notes
1. This is some notes ##placeholder130## this ##myPlaceholder##, #oneMore#. End.
2. Second row...just a ##test#.
(This table contains about 1-5k rows on average. Average number of placeholders in one row is 5-15).
Now, I have a lookup table that looks like this:
Name Value
placeholder130 Dog
myPlaceholder Cat
oneMore Cow
test Horse
(Lookup table will contain anywhere from 10k to 100k records)
I need to find the fastest way to join those placeholders from strings to a lookup table and replace with value. So, my result should look like this (1st row):
This is some notes Dog this Cat, Cow. End.
What I came up with was to split each row into multiple for each placeholder and then join it to lookup table and then concat records back to original row with new values, but it takes around 10-30 seconds on average.

You could try to split the string using a numbers table and rebuild it with for xml path.
select (
select coalesce(L.Value, T.Value)
from Numbers as N
cross apply (select substring(Notes.notes, N.Number, charindex('##', Notes.notes + '##', N.Number) - N.Number)) as T(Value)
left outer join Lookup as L
on L.Name = T.Value
where N.Number <= len(notes) and
substring('##' + notes, Number, 2) = '##'
order by N.Number
for xml path(''), type
).value('text()[1]', 'varchar(max)')
from Notes
SQL Fiddle
I borrowed the string splitting from this blog post by Aaron Bertrand

SQL Server is not very fast with string manipulation, so this is probably best done client-side. Have the client load the entire lookup table, and replace the placeholders in the notes as they arrive.
Having said that, it can of course be done in SQL. Here's a solution with a recursive CTE. It performs one lookup per recursion step:
; with Repl as
(
select row_number() over (order by l.name) rn
, Name
, Value
from Lookup l
)
, Recurse as
(
select Notes
, 0 as rn
from Notes
union all
select replace(Notes, '##' + l.name + '##', l.value)
, r.rn + 1
from Recurse r
join Repl l
on l.rn = r.rn + 1
)
select *
from Recurse
where rn =
(
select count(*)
from Lookup
)
option (maxrecursion 0)
Example at SQL Fiddle.
Another option is a while loop to keep replacing lookups until no more are found:
declare @notes table (notes varchar(max))
insert @notes
select Notes
from Notes
while 1=1
begin
update n
set Notes = replace(n.Notes, '##' + l.name + '##', l.value)
from @notes n
outer apply
(
select top 1 Name
, Value
from Lookup l
where n.Notes like '%##' + l.name + '##%'
) l
where l.name is not null
if @@rowcount = 0
break
end
select *
from @notes
Example at SQL Fiddle.
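The client-side option mentioned at the start of this answer could look roughly like the following Python sketch. It assumes every placeholder has the form ##Name## and that the lookup table has been loaded into a dictionary; the function name and the sample data are hypothetical.

```python
import re

# A placeholder is a run of word characters wrapped in double hashes.
TOKEN = re.compile(r"##(\w+)##")


def replace_placeholders(note, lookup):
    """Replace each ##Name## with lookup[Name]; unknown names are left unchanged."""
    return TOKEN.sub(lambda m: lookup.get(m.group(1), m.group(0)), note)


lookup = {"placeholder130": "Dog", "myPlaceholder": "Cat", "oneMore": "Cow", "test": "Horse"}
note = "This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End."
print(replace_placeholders(note, lookup))  # This is some notes Dog this Cat, Cow. End.
```

Each note is scanned once and every placeholder costs one dictionary lookup, so the size of the lookup table has little effect on the time per note.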

I second the comment that T-SQL is just not suited for this operation, but if you must do it in the database, here is an example using a function to manage the multiple replace statements.
Since you have a relatively small number of tokens in each note (5-15) and a very large number of tokens (10k-100k) my function first extracts tokens from the input as potential tokens and uses that set to join to your lookup (dbo.Token below). It was far too much work to look for an occurrence of any of your tokens in each note.
I did a bit of perf testing using 50k tokens and 5k notes and this function runs really well, completing in <2 seconds (on my laptop). Please report back how this strategy performs for you.
note: In your example data the token format was not consistent (##_#, ##_##, #_#), I am guessing this was simply a typo and assume all tokens take the form of ##TokenName##.
--setup
if object_id('dbo.[Lookup]') is not null
drop table dbo.[Lookup];
go
if object_id('dbo.fn_ReplaceLookups') is not null
drop function dbo.fn_ReplaceLookups;
go
create table dbo.[Lookup] (LookupName varchar(100) primary key, LookupValue varchar(100));
insert into dbo.[Lookup]
select '##placeholder130##','Dog' union all
select '##myPlaceholder##','Cat' union all
select '##oneMore##','Cow' union all
select '##test##','Horse';
go
create function [dbo].[fn_ReplaceLookups](@input varchar(max))
returns varchar(max)
as
begin
declare @xml xml;
select @xml = cast(('<r><i>'+replace(@input,'##' ,'</i><i>')+'</i></r>') as xml);
--extract the potential tokens
declare @LookupsInString table (LookupName varchar(100) primary key);
insert into @LookupsInString
select distinct '##'+v+'##'
from ( select [v] = r.n.value('(./text())[1]', 'varchar(100)'),
[r] = row_number() over (order by n)
from @xml.nodes('r/i') r(n)
)d(v,r)
where r%2=0;
--tokenize the input
select @input = replace(@input, l.LookupName, l.LookupValue)
from dbo.[Lookup] l
join @LookupsInString lis on
l.LookupName = lis.LookupName;
return @input;
end
go
return
--usage
declare @Notes table ([Id] int primary key, notes varchar(100));
insert into @Notes
select 1, 'This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End.' union all
select 2, 'Second row...just a ##test##.';
select *,
dbo.fn_ReplaceLookups(notes)
from @Notes;
Returns:
Tokenized
--------------------------------------------------------
This is some notes Dog this Cat, Cow. End.
Second row...just a Horse.

Try this
;WITH CTE (org, calc, [Notes], [level]) AS
(
SELECT [Notes], [Notes], CONVERT(varchar(MAX),[Notes]), 0 FROM PlaceholderTable
UNION ALL
SELECT CTE.org, CTE.[Notes],
CONVERT(varchar(MAX), REPLACE(CTE.[Notes],'##' + T.[Name] + '##', T.[Value])), CTE.[level] + 1
FROM CTE
INNER JOIN LookupTable T ON CTE.[Notes] LIKE '%##' + T.[Name] + '##%'
)
SELECT DISTINCT org, [Notes], level FROM CTE
WHERE [level] = (SELECT MAX(level) FROM CTE c WHERE CTE.org = c.org)
SQL FIDDLE DEMO
Check the below devioblog post for reference
devioblog post

To get speed, you can preprocess the note templates into a more efficient form. This will be a sequence of fragments, with each ending in a substitution. The substitution might be NULL for the last fragment.
Notes
Id FragSeq Text SubsId
1 1 'This is some notes ' 1
1 2 ' this ' 2
1 3 ', ' 3
1 4 '. End.' null
2 1 'Second row...just a ' 4
2 2 '.' null
Subs
Id Name Value
1 'placeholder130' 'Dog'
2 'myPlaceholder' 'Cat'
3 'oneMore' 'Cow'
4 'test' 'Horse'
Now we can do the substitutions with a simple join.
SELECT Notes.Text + COALESCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq
This produces a list of fragments with substitutions complete. I am not an MS SQL user, but in most dialects of SQL you can concatenate these fragments in a variable quite easily:
DECLARE @Note VARCHAR(8000)
SELECT @Note = COALESCE(@Note, '') + Notes.Text + COALESCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq
Pre-processing a note template into fragments will be straightforward using the string splitting techniques of other posts.
Unfortunately I'm not at a location where I can test this, but it ought to work fine.
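The preprocessing step can be sketched in Python as follows. This is a minimal illustration, assuming placeholders always look like ##Name##; it stores the placeholder name on each fragment rather than a numeric SubsId, and both function names are hypothetical.

```python
import re


def to_fragments(note):
    """Split a note into (FragSeq, Text, Name) tuples; Name is None on the last fragment."""
    # re.split with one capturing group returns: text, name, text, name, ..., text
    parts = re.split(r"##(\w+)##", note)
    fragments = []
    for seq, i in enumerate(range(0, len(parts), 2), start=1):
        name = parts[i + 1] if i + 1 < len(parts) else None
        fragments.append((seq, parts[i], name))
    return fragments


def render(fragments, subs):
    """Rebuild the note by appending each fragment's substitution, if it has one."""
    return "".join(
        text + (subs.get(name, "") if name is not None else "")
        for _, text, name in fragments
    )


frags = to_fragments("Second row...just a ##test##.")
print(frags)                             # [(1, 'Second row...just a ', 'test'), (2, '.', None)]
print(render(frags, {"test": "Horse"}))  # Second row...just a Horse.
```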

I really don't know how this will perform with 10k+ lookups.
How does plain old dynamic SQL perform?
DECLARE @sqlCommand NVARCHAR(MAX)
SELECT @sqlCommand = N'PlaceholderTable.[Notes]'
SELECT @sqlCommand = 'REPLACE( ' + @sqlCommand +
', ''##' + LookupTable.[Name] + '##'', ''' +
LookupTable.[Value] + ''')'
FROM LookupTable
SELECT @sqlCommand = 'SELECT *, ' + @sqlCommand + ' FROM PlaceholderTable'
EXECUTE sp_executesql @sqlCommand
Fiddle demo

And now for some recursive CTE.
If your indexes are correctly set up, this one should be very fast or very slow. SQL Server always surprises me with performance extremes when it comes to the r-CTE...
;WITH T AS (
SELECT
Row,
StartIdx = 1, -- 1 as first starting index
EndIdx = CAST(patindex('%##%', Notes) as int), -- first ending index
Result = substring(Notes, 1, patindex('%##%', Notes) - 1)
-- (first) temp result bounded by indexes
FROM PlaceholderTable -- **this is your source table**
UNION ALL
SELECT
pt.Row,
StartIdx = newstartidx, -- starting index (calculated in calc1)
EndIdx = EndIdx + CAST(newendidx as int) + 1, -- ending index (calculated in calc4 + total offset)
Result = Result + CAST(ISNULL(newtokensub, newtoken) as nvarchar(max))
-- temp result taken from subquery or original
FROM
T
JOIN PlaceholderTable pt -- **this is your source table**
ON pt.Row = T.Row
CROSS APPLY(
SELECT newstartidx = EndIdx + 2 -- new starting index moved by 2 from last end ('##')
) calc1
CROSS APPLY(
SELECT newtxt = substring(pt.Notes, newstartidx, len(pt.Notes))
-- current piece of txt we work on
) calc2
CROSS APPLY(
SELECT patidx = patindex('%##%', newtxt) -- current index of '##'
) calc3
CROSS APPLY(
SELECT newendidx = CASE
WHEN patidx = 0 THEN len(newtxt) + 1
ELSE patidx END -- if last piece of txt, end with its length
) calc4
CROSS APPLY(
SELECT newtoken = substring(pt.Notes, newstartidx, newendidx - 1)
-- get the new token
) calc5
OUTER APPLY(
SELECT newtokensub = Value
FROM LookupTable
WHERE Name = newtoken -- substitute the token if you can find it in **your lookup table**
) calc6
WHERE newstartidx + len(newtxt) - 1 <= len(pt.Notes)
-- do this while {new starting index} + {length of txt we work on} exceeds total length
)
,lastProcessed AS (
SELECT
Row,
Result,
rn = row_number() over(partition by Row order by StartIdx desc)
FROM T
) -- enumerate all (including intermediate) results
SELECT *
FROM lastProcessed
WHERE rn = 1 -- filter out intermediate results (display only last ones)

Push all rows into single row and column

I have the following query:
SELECT
'' + CONVERT(VARCHAR(MAX),c.ClientId) + ','
FROM [dbo].[tblClient] c
This returns 17,000 + rows. Is there a way to make all these rows return as 1 value? For example:
6A7A24CD-061C-4653-9790-882D90F81E1D,0980722E-6E96-4498-B3BB-BFB4CA60EAC6,etc etc etc.
I am trying to use this as a parameter for testing.

Does this work for you?
DECLARE @v VARCHAR(MAX)
SELECT @v = ''
SELECT
@v = @v + CONVERT(VARCHAR(MAX),c.ClientId) + ','
FROM [dbo].[tblClient] c
WHERE c.ClientId IS NOT NULL
SELECT @v
Note: Just be aware that if you add an ORDER BY that it is not guaranteed to sort it, in that case use xml path as shown in Remus' answer
See also: Concatenate Values From Multiple Rows Into One Column Ordered

The article covers a number of techniques at your disposal: Concatenating Row Values in Transact-SQL. My favorite technique is the black-box XML method:
SELECT cast(c.ClientId as varchar(20)) + ','
FROM [dbo].[tblClient] c
for xml path(''), type;
