SSIS transformation (almost like a pivot) - sql

I have the following data coming in to SSIS
Set Value
--- -------
1 One
1 Two
1 Three
2 Four
2 Five
2 Six
I want to transform it to read
Set ValueList
--- -------
1 One, Two, Three
2 Four, Five, Six
How do I do this in SSIS?

I used the Script Component to do the string concatenation across rows
// Assumes the input is sorted by TAGSId and that Output0 is an asynchronous output.
string TagId = null;
string TagList = "";
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
string currentId = Row.TAGSId.ToString();
if (currentId == TagId)
{
// Same group: append to the running list.
TagList += ", " + Row.TAG;
}
else
{
// New group: start a new output row and reset the list.
Output0Buffer.AddRow();
TagId = currentId;
TagList = Row.TAG.ToString();
}
// Overwrite the current output row with the latest state of the group.
Output0Buffer.TagId = int.Parse(TagId);
Output0Buffer.TagList = TagList;
Output0Buffer.TagLength = TagList.Length;
//variable used in subsequent queries
this.Variables.TagList = TagList;
}
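The same row-by-row idea can be sketched outside SSIS. The Python below is only an illustration, assuming the rows arrive sorted by the key; the function name concat_by_key and the sample rows are hypothetical and are not part of any SSIS API.

```python
from itertools import groupby
from operator import itemgetter


def concat_by_key(rows, sep=", "):
    """Collapse (key, value) rows, already sorted by key, into (key, joined values)."""
    result = []
    # groupby only merges adjacent rows, which is why the input must be sorted.
    for key, group in groupby(rows, key=itemgetter(0)):
        result.append((key, sep.join(value for _, value in group)))
    return result


rows = [(1, "One"), (1, "Two"), (1, "Three"), (2, "Four"), (2, "Five"), (2, "Six")]
for set_id, value_list in concat_by_key(rows):
    print(set_id, value_list)
```

Joining with a separator, rather than appending a comma after every value, avoids the trailing comma.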

There is a pivot task in the data flow transformations. You could try it, but I'll warn you that we have been less than happy with its implementation.
Alternatively, you could use the data flow to put the data into a staging table and pivot using SQL, or do the pivot in the SQL you use to create the incoming data source. If you want to do it in SQL code, this might help:
select 1 as Item
into #test
union select 2
union select 3
union select 4
union select 5
-- STUFF removes the two leading characters (', ') from the concatenated list
select STUFF((SELECT ', ' + cast(Item as nvarchar(10))
FROM #test
FOR XML PATH('')), 1, 2, '')
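The staging-table route can be tried end to end with a small sketch. This one uses Python's built-in sqlite3 module and SQLite's group_concat aggregate instead of SQL Server's STUFF / FOR XML PATH, so it only illustrates the idea; the table name staging and its columns are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (set_id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [(1, "One"), (1, "Two"), (1, "Three"), (2, "Four"), (2, "Five"), (2, "Six")],
)
# group_concat is SQLite's string aggregate; SQLite does not guarantee
# the order of the items inside each concatenated list.
pivoted = conn.execute(
    "SELECT set_id, group_concat(value, ', ') "
    "FROM staging GROUP BY set_id ORDER BY set_id"
).fetchall()
for set_id, value_list in pivoted:
    print(set_id, value_list)
conn.close()
```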

Related

Entity Framework cast selected fields to varchar and concat them

I want to run the query below in Entity Framework:
select
cast(p_min as varchar) + '' + cast(p_max as varchar)
from
user_behave_fact
where
beef_dairy_stat = 'True' and param_id = 2
group by
p_min,p_max
go
Since you have not mentioned a language, I am writing code in C#.
Try this:
using (var dbContext = new DatabaseContext())
{
var output = (
from fact in dbContext.user_behave_facts
where fact.beef_dairy_stat == "True" && fact.param_id == 2
group fact by new {fact.p_min, fact.p_max} into grp
select new
{
ColName = grp.Key.p_min.ToString() + " " + grp.Key.p_max.ToString()
}
).ToList();
}
.ToList() can be changed according to your expectations

Converting comma-separated value to in subquery

I have following query :
Set @OrderStatuses = 'Cancelled,Complete'
Select *
From TableName
Where Status in (@OrderStatuses)
The status list is received externally, and the query returns no rows because what actually needs to run is:
Select *
From TableName
Where Status in ('Cancelled', 'Complete');
@OrderStatuses can contain any number of comma-separated values.
One method is dynamic SQL:
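declare @sql nvarchar(max);
set @sql = 'Select * from TableName Where Status in (@list)';
-- wrap each value in single quotes: Cancelled,Complete -> 'Cancelled','Complete'
set @sql = replace(@sql, '@list', '''' + replace(@OrderStatuses, ',', ''',''') + '''');
exec sp_executesql @sql;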
Note: You cannot pass a list in as a parameter.
You can also use like:
Select *
from TableName
Where ',' + @OrderStatuses + ',' like '%,' + Status + ',%';
However, this cannot use an index for the comparison.
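Why both sides are wrapped in commas is easier to see in a tiny sketch. The Python below mirrors the LIKE comparison; the function name in_csv_list is hypothetical.

```python
def in_csv_list(status, csv_list):
    """Mirror of: ',' + list + ',' LIKE '%,' + status + ',%'."""
    # The surrounding commas make every item, including the first and last,
    # delimited on both sides, so partial matches are rejected.
    return f",{status}," in f",{csv_list},"


statuses = "Cancelled,Complete"
print(in_csv_list("Complete", statuses))  # whole item
print(in_csv_list("Comp", statuses))      # only a prefix of an item
print("Comp" in statuses)                 # naive substring test
```

Without the delimiters, a status such as Comp would wrongly match Complete, which is what the last line demonstrates.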
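You will need a split-string function (split_strings below is assumed to be a user-defined table-valued function that returns one row per item) and then filter on its output:
;with cte
as
(
select * from split_strings(@OrderStatuses,',')
)
select * from TableName where Status in (select item from cte)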
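From client code, the split can also happen before the query is sent, with one bound parameter per value. The sketch below uses Python's sqlite3 module purely for illustration; the orders table and the function name select_by_statuses are assumptions, and placeholder syntax differs between database drivers.

```python
import sqlite3


def select_by_statuses(conn, csv_statuses):
    """Return rows whose status appears in a comma-separated list."""
    statuses = [s.strip() for s in csv_statuses.split(",") if s.strip()]
    if not statuses:
        return []
    # One "?" per value: the values are bound as parameters and never
    # concatenated into the SQL text.
    placeholders = ", ".join("?" for _ in statuses)
    sql = f"SELECT id, status FROM orders WHERE status IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, statuses).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "Cancelled"), (2, "Pending"), (3, "Complete")],
)
matches = select_by_statuses(conn, "Cancelled,Complete")
print(matches)
```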
You must add the list as a Table-Valued Parameter
Here are two methods I use to convert any IEnumerable list of items (in this case, integers) into a Table Valued Parameter. You have to have created/defined a User Defined type (UDT) for the resultant table in your database. In the example below the UDT is named dbo.keyIds and is created (once) with this SQL:
CREATE TYPE [dbo].[KeyIds]
AS TABLE(pkId int NOT NULL, PRIMARY KEY CLUSTERED
( [pkId] ASC) WITH (IGNORE_DUP_KEY = OFF)
)
The C# code is:
public class DbParamList : List<IDbDataParameter>
{
public void AddSQLTableParm<T>(
string parmName, IEnumerable<T> values)
{
var parm = new SqlParameter(parmName, CreateDataTable(values))
{
SqlDbType = SqlDbType.Structured,
TypeName = "dbo.keyIds"
};
Add(parm);
}
internal static DataTable CreateDataTable<T>(IEnumerable<T> values)
{
var dt = new DataTable();
var props = typeof (T).GetProperties();
if (props.Length > 0)
{
foreach (var col in props)
dt.Columns.Add(col.Name, col.PropertyType);
foreach (var id in values)
{
var newRow = dt.NewRow();
foreach (var prop in id.GetType().GetProperties())
newRow[prop.Name] = prop.GetValue(id, null);
dt.Rows.Add(newRow);
}
}
else
{
dt.Columns.Add("ids");
foreach (var id in values) dt.Rows.Add(id);
}
return dt;
}
}

Hive combine column values based upon condition

I was wondering if it is possible to combine column values based upon a condition. Let me explain...
Let's say my data looks like this
Id name offset
1 Jan 100
2 Janssen 104
3 Klaas 150
4 Jan 160
5 Janssen 164
And my output should be this
Id fullname offsets
1 Jan Janssen [ 100, 160 ]
I would like to combine the name values from two rows when the offsets of the two rows are no more than 1 character apart.
My question is whether this type of data manipulation is possible in Hive and, if it is, could someone share some code and an explanation?
Please be gentle, but this little piece of code returns somewhat what I want...
ArrayList<String> persons = new ArrayList<String>();
// write your code here
String _previous = "";
//Sample output form entities.txt
//USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,10660
//USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,10685
File file = new File("entities.txt");
try {
//
// Create a new Scanner object which will read the data
// from the file passed in. To check if there are more
// line to read from it we check by calling the
// scanner.hasNextLine() method. We then read line one
// by one till all line is read.
//
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
if(_previous == null || _previous.isEmpty())
_previous = scanner.nextLine();
String _current = scanner.nextLine();
//Compare the lines, if there offset is = 1
int x = Integer.parseInt(_previous.split(",")[3]) + Integer.parseInt(_previous.split(",")[4]);
int y = Integer.parseInt(_current.split(",")[4]);
if(y-x == 1){
persons.add(_previous.split(",")[1] + " " + _current.split(",")[1]);
if(scanner.hasNextLine()){
_current = scanner.nextLine();
}
}else{
persons.add(_previous.split(",")[1]);
}
_previous = _current;
}
} catch (Exception e) {
e.printStackTrace();
}
for(String person : persons){
System.out.println(person);
}
Running it on this sample data
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Richard,PERSON,7,2732
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,2740
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,2756
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,3093
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,3195
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,3220
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,10660
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,10685
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,10858
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,11063
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Ken,PERSON,3,11186
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,11234
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,17073
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,17095
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Stephanie,PERSON,9,17330
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Putt,PERSON,4,17340
Which produces this output
Richard Marottoli
Marottoli
Marottoli
Marottoli
Berkowitz
Berkowitz
Marottoli
Lea
Lea
Ken
Marottoli
Berkowitz
Lea
Stephanie Putt
Kind regards
Load the table using the create table statement below:
drop table if exists default.stack;
create external table default.stack
(junk string,
name string,
cat string,
len int,
off int
)
ROW FORMAT DELIMITED
FIELDS terminated by ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'hdfs://nameservice1/....';
Use the query below to get your desired output.
select max(name), off from (
select CASE when b.name is not null then
concat(b.name," ",a.name)
else
a.name
end as name
,Case WHEN b.off1 is not null
then b.off1
else a.off
end as off
from default.stack a
left outer join (select name
,len+off+ 1 as off
,off as off1
from default.stack) b
on a.off = b.off ) a
group by off
order by off;
I have tested this and it generates your desired result.
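The join condition in the query above (previous offset + previous length + 1 equals the next offset) can also be sketched procedurally. The Python below is an illustration only: the function name merge_adjacent is hypothetical, the rows are a subset of the sample data, and like the query it merges pairs only, not longer runs.

```python
def merge_adjacent(entities):
    """Merge two consecutive (name, length, offset) rows separated by one character."""
    merged = []
    i = 0
    while i < len(entities):
        name, length, offset = entities[i]
        # The next entity is adjacent when it starts right after this one
        # plus a single separating character.
        if i + 1 < len(entities) and entities[i + 1][2] == offset + length + 1:
            merged.append((f"{name} {entities[i + 1][0]}", offset))
            i += 2
        else:
            merged.append((name, offset))
            i += 1
    return merged


sample = [
    ("Richard", 7, 2732),
    ("Marottoli", 9, 2740),
    ("Marottoli", 9, 2756),
    ("Stephanie", 9, 17330),
    ("Putt", 4, 17340),
]
for full_name, offset in merge_adjacent(sample):
    print(full_name, offset)
```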

Aggregate replace in SQL Server?

What I'm trying to achieve is to make dynamic a series of replacements that have to be performed on a certain field. (To make things even easier, I in fact want to remove data, so I'll always be replacing with an empty string.)
Say that sometimes I will have to do just one replacement:
... REPLACE(myField, stringToRemove, '')
Sometimes, I will need two replacements:
... REPLACE(REPLACE(myField, stringToRemove, ''), anotherStringToRemove, '')
However, I need to make this dynamic and I do not know in advance how many of those values I'll have, and so, how many replacements (removals) I'll have to do.
I tried searching for aggregate string manipulation functions and, of course, there's none. I also know that this can be achieved through a CLR aggregate function but I don't have the possibility of using it.
Any ideas?
You can setup a table variable with FromValue and ToValue and use a while loop to do the replacements.
-- Table to replace in
declare @T table
(
Value varchar(50)
)
insert into @T values
('first second third'),
('first second third')
-- Table with strings to replace
declare @Rep table
(
ID int identity primary key,
FromValue varchar(50),
ToValue varchar(50)
)
insert into @Rep values
('second', 'fourth'),
('third', 'fifth')
declare @ID int
select @ID = max(ID)
from @Rep
while @ID > 0
begin
update @T
set Value = replace(Value, FromValue, ToValue)
from @Rep
where ID = @ID
set @ID -= 1
end
select *
from @T
Result:
Value
-------------------
first fourth fifth
first fourth fifth
If you only want to query the values you can do something like this.
;with C as
(
select 0 as ID,
Value,
0 as Lvl
from @T
union all
select R.ID,
cast(replace(C.Value, R.FromValue, R.ToValue) as varchar(50)),
Lvl + 1
from @Rep as R
inner join C
on C.ID + 1 = R.ID
)
select top 1 with ties Value
from C
order by Lvl desc
Once you implement the CLR aggregate function below, you can do:
SELECT dbo.ReplaceAgg(t.[text], w.badword, w.goodword) -- call CLR aggregate function
FROM [Texts] t CROSS JOIN BadWords w
GROUP BY t.[text]
CLR aggregate function in C#
/// <summary>
/// Applies a series of regex-replace operations to the same string.
/// For example:
/// SELECT dbo.ReplaceAgg(t.[text], w.badpattern, '...')
/// FROM [Texts] t CROSS JOIN BadPatterns w
/// GROUP BY t.[text]
/// </summary>
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined,
IsInvariantToDuplicates = true, IsInvariantToOrder = false,
IsInvariantToNulls = true, MaxByteSize = -1)]
public class RegexReplaceAgg : IBinarySerialize
{
private string str;
private string needle;
private string replacement;
public void Init()
{
str = null;
needle = null;
replacement = null;
}
public void Accumulate(SqlString haystack, SqlString needle, SqlString replacement)
{
// Null values are excluded from aggregate.
if (needle.IsNull) return;
if (replacement.IsNull) return;
if (haystack.IsNull) return;
str = str ?? haystack.Value;
this.needle = needle.Value;
this.replacement = replacement.Value;
str = Regex.Replace(str, this.needle, this.replacement, RegexOptions.Compiled | RegexOptions.CultureInvariant);
}
public void Merge(RegexReplaceAgg group)
{
Accumulate(group.Terminate(), new SqlString(needle), new SqlString(replacement));
}
public SqlString Terminate() => new SqlString(str);
public void Read(BinaryReader r)
{
str = r.ReadString();
needle = r.ReadString();
replacement = r.ReadString();
}
public void Write(BinaryWriter w)
{
w.Write(str);
w.Write(needle);
w.Write(replacement);
}
}
You might have to write a scalar function to which you pass the original string, and enough information for it to know which strings to remove, and have it loop through them and return the result of the set of replacements.
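Whichever variant is used (loop, recursive CTE, aggregate or scalar function), the underlying operation is a fold: each replacement is applied to the result of the previous one. A minimal Python sketch, with the hypothetical name apply_replacements:

```python
from functools import reduce


def apply_replacements(text, pairs):
    """Apply (from_value, to_value) replacements to text, in the order given."""
    # reduce threads the partially replaced string through every pair,
    # which is equivalent to nesting REPLACE(REPLACE(...)) calls.
    return reduce(lambda acc, pair: acc.replace(pair[0], pair[1]), pairs, text)


print(apply_replacements("first second third", [("second", "fourth"), ("third", "fifth")]))
# Removal is the special case where every to_value is the empty string.
print(apply_replacements("first second third", [("second ", ""), ("third", "")]))
```

The order of the pairs can matter when one replacement produces or destroys text that another one looks for.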

LINQ to SQL C# COALESCE

Given the following table:
Length | Width | Color | ID
===========================
18 | 18 | blue | 1
---------------------------
12 | 12 | red | 1
---------------------------
I want to produce a single column/row:
SIZES
=================
18 x 18, 12 x 12,
I can do this in SQL as follows:
DECLARE @SIZES VARCHAR(8000)
SELECT @SIZES = COALESCE(@SIZES, '') + Convert(varchar(80), [Length]) + ' x ' +
Convert(varchar(80), [Width]) + ', '
FROM table
where ID = 1
GROUP BY [Length], [Width]
ORDER BY [Length], [Width]
SELECT SIZES = @SIZES
But I cannot figure out how to do this in LINQ.
The closest I got was:
from t in table
where id == 1
group t by new {
t.Length,
t.Width
} into g
orderby g.Key.Length, g.Key.Width
select new {
SIZES = (Convert.ToInt32(g.Key.Length) + " x " +
Convert.ToInt32(g.Key.Width) + ", ")
}
Which produces one column and two rows:
SIZES
========
18 x 18,
12 x 12,
The converts are unimportant to the problem; the columns are defined as floats, though all values are integers. The key is the COALESCE function: I cannot figure out how to do that in LINQ.
Try ?? (null coalesce operator) like:
t.Length ?? 0
I don't think LINQ to SQL supports this T-SQL trick. The COALESCE isn't really the issue (as Mehrdad points out the equivalent in C# is ??) -- it's the fact that SQL Server aggregates each result via string concatenation into the variable @SIZES. AFAIK LINQ to SQL can't construct this type of query.
This will yield your desired result, but the string concatenation is performed on your side, not on the SQL server side. That probably doesn't matter.
var query =
from t in table
where id == 1
group t by new {
t.Length,
t.Width
} into g
orderby g.Key.Length, g.Key.Width
select new {
SIZES = (Convert.ToInt32(g.Key.Length) + " x " +
Convert.ToInt32(g.Key.Width) + ", ")
};
var result = string.Join(string.Empty, query.Select(r => r.SIZES).ToArray());
I would just return the int sizes from SQL and do the string building client-side:
var query =
from t in table
where id == 1
group t by new {
t.Length,
t.Width
} into g
orderby g.Key.Length, g.Key.Width
select g.Key;
var sizeStrings = from s in query.AsEnumerable()
select string.Format("{0} x {1}", s.Length, s.Width);
var result = string.Join(", ", sizeStrings.ToArray());
You could use the .Aggregate function, like so:
(from t in table
where t.ID == 1
group t by new {
t.Length,
t.Width
} into g
orderby g.Key.Length, g.Key.Width
select g.Key)
.AsEnumerable() // Aggregate runs client-side, on plain strings
.Select(k => Convert.ToInt32(k.Length) + " x " + Convert.ToInt32(k.Width) + ", ")
.Aggregate((x, y) => x + y)
This should kick out a single string, like you want.
Aggregate just internally maintains the exact same variable you had defined in the SQL, just implicitly.
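To make the "implicit variable" concrete, here is a small sketch in Python, where functools.reduce plays the role of Aggregate. The sample sizes are taken from the question; everything else is illustrative.

```python
from functools import reduce

# (Length, Width) pairs as floats, already grouped and ordered.
sizes = [(12.0, 12.0), (18.0, 18.0)]
parts = [f"{int(length)} x {int(width)}, " for length, width in sizes]

# The accumulator "acc" is the running string, just like the SQL variable.
folded = reduce(lambda acc, part: acc + part, parts, "")

# Joining gives the same string without an explicit accumulator.
joined = "".join(parts)

print(folded)
```

For long lists, a join is usually the better choice than repeated concatenation, which is the same trade-off as string.Join versus Aggregate in C#.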