.NET Convert the contents of a DataTable to another DataTable with a different schema

I have a program where the user will have the option to map a variety of data sources with unpredictable column schemas. So they might have a SQL Server database with 10 fields or an Excel file with 20 - the names can be anything, and the data types of the fields can be a mixture of text, numeric, dates, etc.
The user then has to provide a mapping of what each field means. So column 4 is a "LocName", column 2 is a "LocX", column 1 is a "LocDate", etc. The names and data types that the user is presented as options to map to are well defined by a DataSet DataTable (XSD schema file).
For example, if the source contains data formatted like this:
User Column 1: "LocationDate" of type string
User Column 2: "XCoord" of type string
User Column 3: "YCoord" of type string
User Column 4: "LocationName" of type int
and the user provides a mapping that translates it to this for the application's required DataTable:
Application Column "LocName" of type string = Column **4** of user table
Application Column "LocX" of type double = Column **2** of user table
Application Column "LocY" of type double = Column **3** of user table
Application Column "LocDate" of type datetime = Column **1** of user table
I have routines that connect to the source and pull out the data for a user query in "raw" format as a DataTable - so it takes the schema of the source.
My question is: what is the best way to "transform" the data from the raw DataTable to the required application DataTable, bearing in mind that this projection has to account for type conversions?
A foreach would obviously work, but that seems like brute force since it would have to account for the data types on every loop, for each row. Is there a "slick" way to do it with LINQ or ADO.NET?

I would normally do this with a SELECT that "looks like" the destination table, but with data from the source table. You would also apply the data conversions as required:
Select
  Cast(LocationName As varChar(...)) As LocName
, Cast(XCoord As float) As LocX
, ...
From SourceTable
Hard to describe in a simple answer. What I've done in the past is issue an "empty" query like "Select * From sourcetable Where 1=0", which returns no rows but makes all the columns and their types available in the result set. You can cycle through the ADO column objects to get the type of each. You can then use that info to dynamically build a real SQL statement with the conversions.
You still have a lot of logic to decide the conversions, but they all happen as you're building the statement, not as the table is being read. You still have to say in code "if the source column is Integer and the destination column is character, then I want to generate ', Cast (<column> As Varchar)' into the select".
When you finish building the text of the select, you have a select you can run in ADO to get the rows, and the actual move becomes a simple read/write with the fields coming in just as you want them. You can also use that select for an "Insert Into ... Select ...".
Hope this makes sense. The concept is easier to do than to describe.
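To make the idea concrete, here is a minimal T-SQL sketch of the same inspect-then-build step done server-side, using INFORMATION_SCHEMA.COLUMNS instead of the ADO column objects. The table name SourceTable and the type rules in the CASE are hypothetical placeholders; the real logic would map each source column to its destination type from the XSD.
-- Sketch: build a converting SELECT from the source schema (names are placeholders)
DECLARE @sql varchar(4000)
SET @sql = ''

SELECT @sql = @sql + ', '
    + CASE WHEN DATA_TYPE IN ('int', 'bigint', 'float', 'decimal')
           THEN 'Cast(' + COLUMN_NAME + ' As varchar(50)) As ' + COLUMN_NAME
           ELSE COLUMN_NAME
      END
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'SourceTable'
ORDER BY ORDINAL_POSITION

-- Drop the leading ', ', finish the statement, and run it
SET @sql = 'Select ' + STUFF(@sql, 1, 2, '') + ' From SourceTable'
EXEC (@sql)
All the type decisions happen once, while building the text; the rows then stream through without per-row type logic.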

Related

Updating a table in database 1 from a table in database 2 gives a data conversion error

I have two tables in different databases. The table to update is jomast, in the database where the SQL code lives. The second table is in the Scheduling_Data database, as dbo.testtable.
Additionally, I know that once I get the data from the source table into the correct format, I will need to use wildcards, as the incoming data is stated as, e.g., 32, while the receiving table has the data as a varchar(10) showing as, e.g., 00031-0000.
So I can do one of two things: either cast the change in the code listed below, at the join, or create an additional column in the source testtable and write code to copy the incoming column into another column, changing the format from 31 to 00031-000.
Here is my code, which is erroring out with an 8114 message: not able to convert varchar to float.
update jomast
set frel_dt = T2.releasedate
from Scheduling_Data.dbo.testtable as T2
where cast (jomast.fjobno as varchar(20)) = T2.job
I realized that my cast was in the wrong location. Moved it to the T2 data and it worked. Would like help on the other part of this post though.
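For reference, a sketch of the corrected statement as described, with the conversion moved to the T2 side (the varchar(20) target here is an assumption; use whatever type matches fjobno):
update jomast
set frel_dt = T2.releasedate
from Scheduling_Data.dbo.testtable as T2
where jomast.fjobno = cast(T2.job as varchar(20))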

How to select LRAW from DB Table?

I have the following code:
SELECT S~CLUSTD AS ZZCLUSTD
INTO CORRESPONDING FIELDS OF TABLE @lt_viqmel_iflos
FROM viqmel AS v
LEFT OUTER JOIN stxl AS S
ON s~tdobject = @lv_qmel
AND s~tdname = v~qmnum
The select statement generates the following short dump:
Only the prefixed length field can be used to read from the LRAW field or
LCHR field S~CLUSTD.
Internal table lt_viqmel_iflos is of type viqmel_iflos (a DB view which contains DB table QMEL), to which I appended ZZCLUSTD of type char200.
The problem is that I cannot make ZZCLUSTD type LRAW in QMEL, because I get the following error:
Field ZZCLUSTD does not have a preceding length field of type INT4
So my only option (that I am aware of) remains to select into char200 the first 200 characters of LRAW.
Is this even possible?
Or is there another way to select LRAW data?
I found info about the topic, but unfortunately I can't adapt it to my scenario: read LRAW data
In fact, there are two questions here.
The first one is the activation error of table QMEL:
Field ZZCLUSTD does not have a preceding length field of type INT4
A DDIC table column of type LCHR or LRAW must always be immediately preceded by a column of type INT2 or INT4 (although the message mentions only INT4).
The second question is about how to read such a field. Both columns must always be read at the same time, and the INT2/INT4 column must be "read before" the LCHR/LRAW column. The only reference I could find to explain this restriction is in the note 302788 - LCHR/LRAW fields in logical cluster tables.
The INT2 column of the STXL table is named CLUSTR, so the following code works:
TYPES: BEGIN OF ty_viqmel_iflos,
         clustr TYPE stxl-clustr, "INT2
         zzclustd TYPE stxl-clustd, "LCHR
       END OF ty_viqmel_iflos.
DATA lt_viqmel_iflos TYPE TABLE OF ty_viqmel_iflos.
SELECT S~CLUSTR, S~CLUSTD AS ZZCLUSTD
  INTO CORRESPONDING FIELDS OF TABLE @lt_viqmel_iflos
  FROM viqmel AS v
  INNER JOIN stxl AS S
    ON s~tdname = v~qmnum
  UP TO 100 ROWS.
NB: there is some confusion in your question, where you refer to both CLUSTD from STXL and ZZCLUSTD from QMEL; I don't understand exactly what you are trying to achieve.
NB: if you want to read the texts from the table STXL, there's another solution by calling the function module READ_TEXT_TABLE, or READ_MULTIPLE_TEXTS if you prefer. They were made available by the note 2261311. In case you don't have or can't install these function modules, you may try this gist which does the same thing. It also contains a reference to another discussion.
NB: for information, and to be more precise, LRAW contains bytes, not characters; for data clusters (the case of STXL), these bytes correspond to arbitrary values (characters in the case of STXL) compressed with the statement EXPORT, to be decompressed with IMPORT.

Query a database based on result of query from another database

I am using SSIS in VS 2013.
I need to get a list of IDs from one database, and with that list of IDs I want to query another database, i.e. SELECT ... FROM MySecondDB WHERE ID IN ({list of IDs from MyFirstDB}).
There are three methods to achieve this:
1st method - Using Lookup Transformation
First you have to add a Lookup Transformation like @TheEsisia answered, but there are more requirements:
In the Lookup you have to write the query that contains the ID list (e.g. SELECT ID FROM MyFirstDB WHERE ...)
You have to select at least one column from the lookup table
This will not filter rows, but it will add values from the second table
To filter rows WHERE ID IN ({list of IDs from MyFirstDB}), you have to do some work with the Lookup's error output. There are two ways:
Set the error handling to Ignore Row so that the added (lookup) column values will be NULL, then add a Conditional Split that filters out rows having NULL values.
Assuming that you have chosen col1 as the lookup column, you would use a similar expression:
ISNULL([col1]) == False
Or you can set the error handling to Redirect Row, so all non-matching rows are sent to the error output, which you simply don't use; the data is filtered that way.
The disadvantage of this method is that all data is loaded and filtered during execution.
Also, if working over a network, the filtering is done on the local machine after all data has been loaded into memory (with the 2nd method it happens on the server).
2nd method - Using Script Task
To avoid loading all the data, you can use a workaround. You can achieve this using a Script Task (the answer is written in VB.NET).
Assuming that the connection manager name is TestAdo, "Select [ID] FROM dbo.MyTable" is the query to get the list of IDs, and User::MyVariableList is the variable where you want to store the list of IDs:
Note: This code will read the connection from the connection manager
Public Sub Main()
    Dim lst As New Collections.Generic.List(Of String)
    Dim myADONETConnection As SqlClient.SqlConnection
    myADONETConnection = _
        DirectCast(Dts.Connections("TestAdo").AcquireConnection(Dts.Transaction), _
        SqlClient.SqlConnection)

    If myADONETConnection.State = ConnectionState.Closed Then
        myADONETConnection.Open()
    End If

    Dim myADONETCommand As New SqlClient.SqlCommand("Select [ID] FROM dbo.MyTable", myADONETConnection)
    Dim dr As SqlClient.SqlDataReader
    dr = myADONETCommand.ExecuteReader

    While dr.Read
        lst.Add(dr(0).ToString)
    End While

    Dts.Variables.Item("User::MyVariableList").Value = "SELECT ... FROM ... WHERE ID IN(" & String.Join(",", lst) & ")"
    Dts.TaskResult = ScriptResults.Success
End Sub
And User::MyVariableList should then be used as the source (SQL command from a variable).
3rd method - Using Execute Sql Task
Similar to the second method, but this builds the IN clause using an Execute SQL Task and then uses the whole query as the OLE DB Source:
Just add an Execute SQL Task before the DataFlow Task
Set the ResultSet property to Single row
Map the result to User::MyVariableList in the Result Set tab
Use the following SQL command
DECLARE @str AS VARCHAR(4000)
SET @str = ''
SELECT @str = @str + CAST([ID] AS VARCHAR(255)) + ','
FROM dbo.MyTable
SET @str = 'SELECT * FROM MySecondDB WHERE ID IN (' + SUBSTRING(@str,1,LEN(@str) - 1) + ')'
SELECT @str
If the column has a string data type, you should add quotation marks before and after the values, as below:
SELECT @str = @str + '''' + CAST([ID] AS VARCHAR(255)) + ''','
FROM dbo.MyTable
Make sure that you have set the DataFlow Task Delay Validation property to True
This is a classic case for using a Lookup Transformation. First, use an OLE DB Source to get data from the first database. Then, use a Lookup Transformation to filter this data set based on the ID values from the second data set. Here are the steps for using a Lookup Transformation:
In the General tab, select Full cache, OLE DB connection manager, and Redirect rows to no match output, as shown in the following picture. Notice that using Full cache provides great performance for your package.
General Setting
In the Connection tab, use OLE DB Connection Manager to connect to your second server. Then, you can either directly select the data-set with ID values or (as is shown in the picture below) you can use SQL code to select the IDs from the filtering data-set.
Connection:
Go to the Columns tab and select the ID columns from both datasets. For each record from your first data set, it will check whether its ID is in the Available Lookup Columns. If it is, the row goes to the Match output; otherwise it goes to the No Match output.
Match ID columns:
Click OK to close the Lookup editor. Then you need to select the Lookup Match Output.
Match Output:
The "best" answer depends on data volumes and source systems involved.
Many of the other answers propose building out a list of values based on clever concatenation within SQL Server. That doesn't work so well if the referenced system is Oracle, MySQL, DB2, Informix, Postgres, etc. There may be an equivalent concept, but there might not be.
For best performance, you need to filter against the second db before any of those rows ever hit the data flow. That means adding a filtering condition, as the others have suggested, to your source query. The challenge with this approach is that your query is going to be limited by some practical bounds that I don't remember. Ten, one hundred, a thousand values in your where clause is probably fine. A lakh, a million - probably not so much.
In the cases where you have large volumes of values to filter against the source table, it can make sense to create a table on that server and truncate and reload that table (execute sql task + data flow). This allows you to have all of the data local and then you can index the filter table and let the database engine do what it's really good at.
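A minimal sketch of that staging pattern, with hypothetical names (dbo.FilterIDs as the filter table, dbo.MySecondTable as the table being filtered):
-- One-time setup on the second server
CREATE TABLE dbo.FilterIDs (ID int NOT NULL PRIMARY KEY);

-- Each run: an Execute SQL Task empties the table...
TRUNCATE TABLE dbo.FilterIDs;

-- ...a data flow loads the IDs into it, and the source query becomes a plain join
SELECT s.*
FROM dbo.MySecondTable AS s
JOIN dbo.FilterIDs AS f ON f.ID = s.ID;
The database engine can then use the index on the filter table instead of parsing a giant IN list.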
But, you say the source database is some custom solution that you can't make tables in. You can look at the above approach with temporary tables; within SSIS you just need to mark the connection as persisted by setting the connection manager's RetainSameConnection property to True. I don't much care for temporary tables with SSIS, as debugging them is a nightmare I'd not wish upon my mortal enemy.
If you're still reading, we've identified why filtering in the source system might not be "doable", even if it will provide the best performance.
Now we're stuck with purely SSIS solutions. To get the best performance, do not select the table name in the drop-down - unless you absolutely need every column. Also, pay attention to your data types. Pulling LOBs (XML, text, image, (n)varchar(max), varbinary(max)) into the data flow is a recipe for bad performance.
The default suggestion is to use a Lookup Component to filter the data within the data flow, as long as your source system supports an OLE DB provider (or you can coerce the data into a Cache Connection Manager).
If you can't use a Lookup component for some reason, then you can explicitly sort your data in your source systems, mark your source components as such, and then use a Merge Join of type Inner Join in the data flow to only bring in matched data.
However, be aware that sorts in source systems are going to be sorted according to native rules. I ran into a situation where SQL Server was sorting based on the default ASCII sort and my DB2 instance, running on zOS, provided an EBCDIC sort. Which was great when my domain was only integers but went to hell in a handbasket when the keys became alphanumeric (AAA, A2B, and AZZ will sort differently based on this).
Finally, excluding the final paragraph, the above assumes you have integers. If you're performing string matching, you get an extra level of ugliness because different components may or may not perform a case sensitive match (sorting with case sensitive systems can also be a factor).
I would first create a String variable, e.g. SQL_Select, at the scope of the Package. Then I would assign it a value using an Execute SQL Task against the 1st database. The ResultSet property on the General page should be set to Single row. Add an entry to the Result Set tab to assign it to your variable.
The SQL Statement used needs to be designed to return the required SELECT statement for your 2nd database, in a single row of text. An example is shown below:
SELECT
    'SELECT * from MySecondDB WHERE ID IN ( '
    + STUFF((
        SELECT TOP 5
            ' , ''' + [name] + ''''
        FROM dbo.spt_values
        FOR XML PATH(''), TYPE
      ).value('(./text())[1]', 'VARCHAR(4000)'), 1, 3, '')
    + ' ) '
    AS SQL_Select
Remove the TOP 5 and replace [name] and dbo.spt_values with your column and table names.
Then you can use the variable SQL_Select in a downstream task e.g. an OLE DB Source against database 2. OLE DB Sources and OLE DB Command Tasks both let you specify a Variable as the SQL Statement source.
You could add a linked server between the two servers. The SQL commands would be something like this:
EXEC sp_addlinkedserver @server='SRV' --or any name you want
EXEC sp_addlinkedsrvlogin 'SRV', 'false', null, 'username', 'password'
SELECT * FROM SRV.CatalogNameInSecondDB.dbo.SecondDBTableName s
INNER JOIN FirstDBTableName f on s.ID = f.ID
WHERE f.ID IN (list of values)
EXEC sp_dropserver 'SRV', 'droplogins'

When is the type of a column in a SQL query result determined?

When performing a select query against a database, the returned result will have columns of a certain type.
If you perform a simple query like
select name as FirstName
from database
then the type of the resulting FirstName column will be that of database.name.
If you perform a query like
select age*income
from database
then the resulting data type will be that of the return value from the age*income expression.
What happens when you use something like
select try_convert(float, mycolumn)
from database
where database.mycolumn has type nvarchar? I assume that the resulting column has type float, which is decided by the return type of try_convert.
But consider this example
select coalesce(try_convert(float, mycolumn), mycolumn)
from database
which should give a column with the values of mycolumn unchanged if try_convert fails, but mycolumn as a float when/if that is possible.
Is this determination made as the first row is handled?
Or will the type always be determined by the function called independently of the data in the rows?
Is it possible to conditionally perform a conversion?
I would like to convert to float in the case where this is possible for all rows and leave unchanged in case it fails for any row.
Update 1
It seems that the answer to the first part of the question is that the column type is determined by the expression at compile time, which means that you cannot have a dynamic column type depending on the data.
I see two workarounds for this:
Option 1
For each column, count the number of non-NULL rows of try_convert(float, mycolumn); if this number is 0 then do not perform the conversion. This will of course read the rows many times and might be inefficient.
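A minimal sketch of that check, assuming a hypothetical table dbo.SourceData with a column mycolumn:
DECLARE @converted int

-- COUNT(expr) counts only non-NULL results, i.e. rows that convert
SELECT @converted = COUNT(try_convert(float, mycolumn))
FROM dbo.SourceData

-- Pick the column expression for the dynamic pivot accordingly
DECLARE @expr varchar(200)
SET @expr = CASE WHEN @converted = 0
                 THEN 'mycolumn'
                 ELSE 'try_convert(float, mycolumn)'
            END
Given the data described below (no "mixed" columns), a zero count means a pure text column.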
Option 2
Simply repeat all columns: once without conversion and once with conversion, and then use the interesting one.
One could also perform another select statement where only columns with non-null values are included.
Background
I have a dynamically generated pivot table with many (~200) columns, of which some have string values and others have numbers.
I would like to cast all columns as float where this is possible and leave the other columns unchanged (or cast as nvarchar).
The data
The data is mostly NULL values, with some columns having text strings and other columns having numbers. There are no columns with "mixed" content.
The types are determined at compile time, not at execution. try_convert(float, ...) knows the exact type at parse/compile time, because float here is a keyword, not a value. For expressions like COALESCE(foo, bar), the type is similarly determined at compile time, following the rules of data type precedence already linked.
When you build your dynamic pivot you'll have to know the result type, using the same inference rules the SQL parser/compiler uses. I understand some rules are counter-intuitive; when in doubt, test it out.
For the detail oriented: some expression types can be determined at parse time, e.g. N'foo'. But most have to be resolved at compile time, when the names of tables and columns are bound to actual objects in the database, because only then are the types discovered.
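A quick way to test the COALESCE case from the question (the table variable below is just a stand-in): because float outranks nvarchar in data type precedence, the whole expression is typed float at compile time, so the nvarchar fallback is itself implicitly converted at runtime and fails on non-numeric text.
DECLARE @t TABLE (mycolumn nvarchar(50))
INSERT INTO @t VALUES (N'1.5'), (N'abc')

-- Typed float at compile time; fails at runtime on 'abc':
-- "Error converting data type nvarchar to float."
SELECT coalesce(try_convert(float, mycolumn), mycolumn)
FROM @t
In other words, you cannot get a column that is "float where possible, nvarchar otherwise"; the type is fixed before any row is read.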

For SSRS in Visual Studio 2008, how can I make a variable accept multiple values as well as a single value?

I'm making a report with Visual Studio 2008, pulling the info from a table in a database. Let's say I just want to show the Type of employee (which is represented by an int, say 1-10) and their Name. Then my dataset would be this query:
SELECT Type, Name
FROM Employees
WHERE Type = @Type
This is SSRS, so I do not need to declare or set the variable (correct me if I'm wrong). When I run the report, it will give me an option to type in an int for the Type, and it will create the corresponding report. My question is: how can I set it so that I can type 1,2,3 in for the Type and get a report with those types of employees? That is, the int variable should be able to accept a list of ints or just one int. Essentially the "resulting query" from that example would look like this:
SELECT Type, Name
FROM Employees
WHERE Type = 1 OR Type = 2 OR Type = 3
On the left side in the Report Data window, under the Parameters folder, right-click the variable and hit Parameter Properties, then check 'Allow Multiple Values' and select the correct data type in the dropdown. I'm not sure why it decides to make a dropdown when you run the report, and you have to enter the values each on their own line instead of separated by commas, but it seems to work fine. Then just change the WHERE clause in your dataset from WHERE Type = @Type to WHERE Type IN (@Type). The parentheses are required - they are part of the IN syntax.
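The dataset query from the question then becomes:
SELECT Type, Name
FROM Employees
WHERE Type IN (@Type)
At run time, SSRS expands the multi-value parameter into a comma-separated list inside the IN clause.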
Also, if you create a separate dataset that presents certain values, you can have those show up instead of having to type them out. For example, create a dataset that contains this query:
SELECT DISTINCT Type
FROM Employees
ORDER BY Type
This will create a list of distinct values for type that you can check/uncheck. You can make it more complex obviously as well.