I have an XML file and I want to check it and output only those records where sourceColumn is not equal to targetColumn.
Do you have any idea how I can do this in Pentaho?
It doesn't have to be a ready-made solution, but at least a concept of how I could approach it.
Example data in file:
<Column
sourceColumn="Column1"
targetColumn="COLUMN1"
/>
<Column
sourceColumn="Column2"
targetColumn=
/>
I'm using concat tasks in an Ant macrodef to generate DDL files. Part of the value of a few of the properties is getting duplicated in the resulting DDL.
This duplication is only observed when generated from the concat tasks.
I've tried 1) using dashes instead of underscores, 2) using ${property-name} instead of #{property-name}, 3) using an echo task instead of a concat task, 4) switching from Ant 1.9.3 to 1.10.5, and 5) searching online.
Property getting set in ant script
<property name="SCHEMA_ID" value="REPLACE_SCHEMA_ID" />
Attribute being set in macrodef
<attribute name="schema-id" default="${SCHEMA_ID}" />
Concat task
<concat destfile="#{dest-dir}/#{spname}.ddl">
SET CURRENT SCHEMA = '#{schema-id}'
####
SET CURRENT SQLID = '#{sql-id}'
####
</concat>
Output line in the ddl file
SET CURRENT SCHEMA = 'REPLACE_REPLACE_SCHEMA_ID'
I would expect the Output line in the ddl file to be:
SET CURRENT SCHEMA = 'REPLACE_SCHEMA_ID'
As far as I can tell, there's a bug when using echo or concat (at least in a macrodef): if the name of a property matches part of that property's value, the part of the value that doesn't match the name gets duplicated.
<property name="SCHEMA_ID" value="REPLACE_SCHEMA_ID" /> becomes REPLACE_REPLACE_SCHEMA_ID
<property name="SCHEMA_ID" value="#SCHEMA_ID#" /> becomes ##SCHEMA_ID##
but
<property name="SCHEMA_ID" value="#schema_id#" /> becomes #schema_id#
Strange behavior, and I'm open to being proven wrong, but this is what I came up with.
I have an Excel file with a list of items: column A - ID, column B - Name.
For example (2 lines):
Line 1: A - 12000; B - "Name of the first item"
Line 2: A - 12001; B - "Name of the second item"
I need to go through all the lines and, for each one, create a file named ID.xml.
For the above example I want to have 2 files in the output folder:
12000.xml
<?xml version="1.0" encoding="utf-8"?>
<item>
<property key="ID" value="12000"/>
<property key="name" value="Name of the first item"/>
</item>
12001.xml
<?xml version="1.0" encoding="utf-8"?>
<item>
<property key="ID" value="12001"/>
<property key="name" value="Name of the second item"/>
</item>
How can I achieve this with the Pentaho Kettle ETL tool?
Any help is appreciated.
If the XML structure is as simple as you put here, the simplest way is to just build your XML in a Javascript step, generate the filename as well, and then use the Text file output step, with the "accept filename from previous step" box checked.
This will output each row of data in a separate file.
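A minimal sketch of what that Javascript step (e.g. a Modified Java Script Value step) could contain, assuming the input fields are named ID and Name as in the example and that C:/output/ is the target folder (both assumptions, adjust to your own layout):
// Build the XML document for this row; ID and Name come from the Excel input step
var xml = '<?xml version="1.0" encoding="utf-8"?>\n'
        + '<item>\n'
        + '  <property key="ID" value="' + ID + '"/>\n'
        + '  <property key="name" value="' + Name + '"/>\n'
        + '</item>';
// Build the output filename from the ID; the folder is just an example
var filename = 'C:/output/' + ID + '.xml';
Add xml and filename as output fields of the step, point the Text file output step at the filename field, and write the xml field as its only content (no header, no separator). If Name can contain characters like & or ", you'd want to escape them before building the XML.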
If your structure is more complex than that, then you'll probably need to use several Add XML steps together with some XML Joins.
There is an XML Join sample in PDI's samples folder.
When importing a database from an Azure bacpac file to a local SQL Server 2016 instance, I'm getting the following error.
Error SQL72014: .Net SqlClient Data Provider: Msg 102, Level 15, State 1, Line 1 Incorrect syntax near 'EXTERNAL'.
Error SQL72045: Script execution error. The executed script: CREATE EXTERNAL DATA SOURCE [BoxDataSrc]
WITH (
TYPE = RDBMS,
LOCATION = N'MYAZUREServer.database.windows.net',
DATABASE_NAME = N'MyAzureDb',
CREDENTIAL = [SQL_Credential]
);
(Microsoft.SqlServer.Dac)
I ran into this same issue today. Since "WITH(TYPE = RDBMS)" is only applicable to Azure SQL DB, we get the error when attempting to import the bacpac into SQL Server 2017 on-premise. I did find a solution thanks to this article:
https://blogs.msdn.microsoft.com/azuresqldbsupport/2017/08/16/editing-a-bacpac-file/
The relevant steps rewritten here:
Make a copy of the bacpac file (for safety in case of errors).
Change the file extension to zip, then decompress it into a folder. Surprisingly, a bacpac is actually just a zip file, not something proprietary and hard to get into.
Find the model.xml file and edit it to remove the section that looks like this:
<Element Type="SqlExternalDataSource" Name="[BoxDataSrc]">
<Property Name="DataSourceType" Value="1" />
<Property Name="Location" Value="MYAZUREServer.database.windows.net" />
<Property Name="DatabaseName" Value="MyAzureDb" />
<Relationship Name="Credential">
<Entry>
<References Name="[SQL_Credential]" />
</Entry>
</Relationship>
</Element>
If you have multiple external data sources of this type, you will probably need to repeat step 3 for each one. I only had one.
Save and close model.xml.
Now you need to re-generate the checksum for model.xml so that the bacpac doesn't think it was tampered with (since you just tampered with it). Create a PowerShell file named computeHash.ps1 and put this code into it.
# Prompt for the path to the extracted model.xml
$modelXmlPath = Read-Host "model.xml file path"
$hasher = [System.Security.Cryptography.HashAlgorithm]::Create("System.Security.Cryptography.SHA256CryptoServiceProvider")
# Hash the file contents
$fileStream = New-Object System.IO.FileStream -ArgumentList @($modelXmlPath, [System.IO.FileMode]::Open)
$hash = $hasher.ComputeHash($fileStream)
# Convert the hash bytes to an uppercase hex string
$hashString = ""
Foreach ($b in $hash) { $hashString += $b.ToString("X2") }
$fileStream.Close()
$hashString
Run the PowerShell script and give it the filepath to your unzipped and edited model.xml file. It will return a checksum value.
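As an aside, on PowerShell 4.0 or later you can get the same SHA256 checksum with the built-in Get-FileHash cmdlet (the path below is just an example):
# Returns the hash as an uppercase hex string, the same format used in Origin.xml
(Get-FileHash -Algorithm SHA256 -Path 'C:\temp\bacpac\model.xml').Hash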
Copy the checksum value, then open up Origin.xml and replace the existing checksum, toward the bottom on the line that looks like this:
<Checksum Uri="/model.xml">9EA0F06B282D4F42955C78A98822A31AA0ED0225CB131B8759379055A482D01F</Checksum>
Save and close Origin.xml, then select all the files and put them into a new zip file and rename the extension to bacpac.
Now you can use this new bacpac to import the database without getting the error. It worked for me, it could work for you, too.
As per @SQLDoug's answer, this can happen if your Azure SQL database has External Tables (i.e. linked tables to other databases). You can check for these in SSMS's Object Explorer.
Addendum to accepted answer
If you delete those external tables' data sources you'll also need to delete the SqlExternalTable elements in the model.xml file that were using those data sources; they'll look something like this:
<Element Type="SqlExternalTable" Name="[dbo].[DeliveryMethodsRestored]">
<Property Name="ExternalSchemaName" Value="dbo" />
<Property Name="ExternalObjectName" Value="DeliveryMethods" />
<Property Name="IsAnsiNullsOn" Value="True" />
<Property Name="IsQuotedIdentifierOn" Value="False" />
<Relationship Name="Columns">
<Entry>
<Element Type="SqlSimpleColumn" Name="[dbo].[DeliveryMethodsRestored].[DeliveryMethodId]">
<Property Name="IsNullable" Value="False" />
<Relationship Name="TypeSpecifier">
<Entry>
SNIP....
</Element>
If you do a search for 'SqlExternalTable' in model.xml you'll find them all easily.
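For example, from PowerShell (the path is just an example), this lists every matching line with its line number:
# Find all SqlExternalTable elements in the extracted model.xml
Select-String -Path 'C:\temp\bacpac\model.xml' -Pattern 'SqlExternalTable'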
Alternative approach to solving this issue
Rather than correcting the bacpac after downloading it, the other way to deal with this is simply to remove the external tables before creating the bacpac i.e.:
Restore a copy of your database to a separate database
Delete the External Tables in the restored copy
Delete the External Data Sources in the restored copy
Create the bacpac from that restored copy
Delete the copy database
This approach has the advantage that you aren't creating the bacpac from the live database, which apparently 'can cause the exported table data to be inconsistent because, unlike SQL Server's physical backup/restore, exports do not guarantee transactional consistency'.
If that's something you're likely to do a lot, you could probably write scripts to automate most of the above steps.
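A minimal T-SQL sketch of steps 2 and 3, using the object names from the error and XML above purely as placeholders (adjust to your own objects):
-- Run this against the restored copy, never the live database.
-- Step 2: drop the external tables that point at the external data source.
DROP EXTERNAL TABLE [dbo].[DeliveryMethodsRestored];
-- Step 3: drop the external data source and, if nothing else uses it, its scoped credential.
DROP EXTERNAL DATA SOURCE [BoxDataSrc];
DROP DATABASE SCOPED CREDENTIAL [SQL_Credential];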
Same error code, but a different error.
Could not import package.
Warning SQL72012: The object [PreProd_Data] exists in the target, but it will not be dropped even though you selected the 'Generate drop statements for objects that are in the target database but that are not in the source' check box.
Warning SQL72012: The object [PreProd_Log] exists in the target, but it will not be dropped even though you selected the 'Generate drop statements for objects that are in the target database but that are not in the source' check box.
Error SQL72014: .Net SqlClient Data Provider: Msg 102, Level 15, State 1, Line 5 Incorrect syntax near 'OPTIMIZE_FOR_AD_HOC_WORKLOADS'.
Error SQL72045: Script execution error. The executed script:
IF EXISTS (SELECT 1
FROM [master].[dbo].[sysdatabases]
WHERE [name] = N'$(DatabaseName)')
BEGIN
ALTER DATABASE SCOPED CONFIGURATION SET OPTIMIZE_FOR_AD_HOC_WORKLOADS = ON;
END
Solution
This blog post explains how to edit model.xml to remove the Relationship entry for OPTIMIZE_FOR_AD_HOC_WORKLOADS, which is not recognized by a SQL Server 2017 instance.
https://blogs.msdn.microsoft.com/azuresqldbsupport/2017/08/16/editing-a-bacpac-file/
The relevant steps rewritten here:
Make a copy of the bacpac file (for safety in case of errors).
Change the file extension to zip, then decompress it into a folder. Surprisingly, a bacpac is actually just a zip file, not something proprietary and hard to get into.
Find the model.xml file and edit it to remove the section that looks like this:
<Relationship Name="GenericDatabaseScopedConfigurationOptions">
<Entry>
<References Name="[OPTIMIZE_FOR_AD_HOC_WORKLOADS]" />
</Entry>
</Relationship>
Also remove the following block from model.xml:
<Element Type="SqlGenericDatabaseScopedConfigurationOptions" Name="[OPTIMIZE_FOR_AD_HOC_WORKLOADS]">
<Property Name="GenericValueType" Value="2" />
<Property Name="GenericValue" Value="ON" />
</Element>
Save and close model.xml.
Now you need to re-generate the checksum for model.xml so that the bacpac doesn't think it was tampered with (since you just tampered with it). Create a PowerShell file named computeHash.ps1 and put this code into it.
Run the PowerShell script and give it the filepath to your unzipped and edited model.xml file. It will return a checksum value.
Copy the checksum value, then open up Origin.xml and replace the existing checksum.
Save and close Origin.xml, then select all the files and put them into a new zip file and rename the extension to bacpac.
Now the bacpac file is ready to import; this worked for me.
Thanks.
Elastic Database queries are supported only on Azure SQL Database v12 or later, not on a local server.
https://msdn.microsoft.com/en-us/library/dn935022.aspx
I got the same error code (SQL72045) when importing a bacpac even though we had deleted the external data sources in Azure that we used to sync data with. It turned out that there was a leftover procedure, "TransferDo", with a reference to a SCOPED CREDENTIAL for another database. After we removed the procedure, the import worked fine.
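If you suspect something similar, a quick (hypothetical) query like this, run before exporting, can help find module definitions that still reference external data sources or scoped credentials:
-- List procedures/functions whose definition mentions external data sources or scoped credentials
SELECT OBJECT_SCHEMA_NAME(m.object_id) AS schema_name,
       OBJECT_NAME(m.object_id) AS object_name
FROM sys.sql_modules AS m
WHERE m.definition LIKE '%EXTERNAL DATA SOURCE%'
   OR m.definition LIKE '%SCOPED CREDENTIAL%';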
I am using Liquibase to load data into my MySQL database like this:
<loadUpdateData encoding="UTF-8"
primaryKey="pk_id"
file="config/liquibase/site.csv"
separator=";"
tableName="site">
<column name="site" type="STRING"/>
</loadUpdateData>
How can I force Liquibase to execute this task each time I run my application (in case site.csv has changed)? My problem is that once Liquibase has executed the changeSet, it won't execute it again.
If you only want to run it when the CSV file has changed, add runOnChange="true" as an attribute to the changeSet.
<changeSet id="42" author="arthur" runOnChange="true">
<loadUpdateData>
...
</loadUpdateData>
</changeSet>
If you always want to run it, use runAlways="true" instead.
See the manual for more details:
http://www.liquibase.org/documentation/changeset.html
I need to import multiple txt files with the same names and the same schema into the same table in a SQL Server 2008 database. The problem I have is that they are all in different directories:
TEST
  201304
    sample1.txt
    sample2.txt
  201305
    sample1.txt
    sample2.txt
  201306
    sample1.txt
    sample2.txt
Is there any way in SSIS that I can set this up?
Yes. You will want to use a Foreach File Container and then check the Traverse Subfolder option.
Edit
Apparently my answer wasn't cromulent enough, so please accept this working code which illustrates what my brief original answer stated.
Source data
I created 3 folders as described above to contain files sample1.txt and sample2.txt
C:\>MKDIR SSISDATA\SO\TEST\201304
C:\>MKDIR SSISDATA\SO\TEST\201305
C:\>MKDIR SSISDATA\SO\TEST\201306
The contents of the file are below. Each version of the file in each folder has the ID value incremented along with the text values altered to prove it has picked up the new file.
ID,value
1,ABC
Package generation
This part assumes you have BIDS Helper installed. It is not required for the solution, but it provides a common framework that future readers could use to reproduce it.
I created a BIML file with the following content. Even though I have the table create step in there, I needed to have that run on the target server prior to generating the package.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<!-- Create a basic flat file source definition -->
<FileFormats>
<FlatFileFormat
Name="FFFSrc"
CodePage="1252"
RowDelimiter="CRLF"
IsUnicode="false"
FlatFileType="Delimited"
ColumnNamesInFirstDataRow="true"
>
<Columns>
<Column
Name="ID"
DataType="Int32"
Delimiter=","
ColumnType="Delimited"
/>
<Column
Name="value"
DataType="AnsiString"
Delimiter="CRLF"
InputLength="20"
MaximumWidth="20"
Length="20"
CodePage="1252"
ColumnType="Delimited"
/>
</Columns>
</FlatFileFormat>
</FileFormats>
<!-- Create a connection that uses the flat file format defined above-->
<Connections>
<FlatFileConnection
Name="FFSrc"
FileFormat="FFFSrc"
FilePath="C:\ssisdata\so\TEST\201306\sample1.txt"
DelayValidation="true"
/>
<OleDbConnection
Name="tempdb"
ConnectionString="Data Source=localhost\dev2012;Initial Catalog=tempdb;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"
/>
</Connections>
<!-- Create a package to illustrate how to apply an expression on the Connection Manager -->
<Packages>
<Package
Name="so_19957451"
ConstraintMode="Linear"
>
<Connections>
<Connection ConnectionName="tempdb"/>
<Connection ConnectionName="FFSrc">
<Expressions>
<!-- Assign a variable to the ConnectionString property.
The syntax for this is ConnectionManagerName.Property -->
<Expression PropertyName="FFSrc.ConnectionString">#[User::CurrentFileName]</Expression>
</Expressions>
</Connection>
</Connections>
<!-- Create a single variable that points to the current file -->
<Variables>
<Variable Name="CurrentFileName" DataType="String">C:\ssisdata\so\TEST\201306\sample1.txt</Variable>
<Variable Name="FileMask" DataType="String">*.txt</Variable>
<Variable Name="SourceFolder" DataType="String">C:\ssisdata\so\TEST</Variable>
<Variable Name="RowCountInput" DataType="Int32">0</Variable>
<Variable Name="TargetTable" DataType="String">[dbo].[so_19957451]</Variable>
</Variables>
<!-- Add a foreach file enumerator. Use the above -->
<Tasks>
<ExecuteSQL
Name="SQL Create Table"
ConnectionName="tempdb">
<DirectInput>
IF NOT EXISTS (SELECT * FROM sys.tables T WHERE T.name = 'so_19957451' and T.schema_id = schema_id('dbo'))
BEGIN
CREATE TABLE dbo.so_19957451(ID int NOT NULL, value varchar(20) NOT NULL);
END
</DirectInput>
</ExecuteSQL>
<ForEachFileLoop
Name="FELC Consume files"
FileSpecification="*.csv"
ProcessSubfolders="true"
RetrieveFileNameFormat="FullyQualified"
Folder="C:\"
ConstraintMode="Linear"
>
<!-- Define the expressions to make the input folder and the file mask
driven by variable values -->
<Expressions>
<Expression PropertyName="Directory">#[User::SourceFolder]</Expression>
<Expression PropertyName="FileSpec">#[User::FileMask]</Expression>
</Expressions>
<VariableMappings>
<!-- Notice that we use the convention of User.Variable name here -->
<VariableMapping
Name="0"
VariableName="User.CurrentFileName"
/>
</VariableMappings>
<Tasks>
<Dataflow Name="DFT Import file" DelayValidation="true">
<Transformations>
<FlatFileSource Name="FFS Sample" ConnectionName="FFSrc"/>
<RowCount Name="RC Source" VariableName="User.RowCountInput"/>
<OleDbDestination
Name="OLE_DST"
ConnectionName="tempdb">
<TableFromVariableOutput VariableName="User.TargetTable"/>
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</ForEachFileLoop>
</Tasks>
</Package>
</Packages>
</Biml>
Right click on the biml file and select Generate SSIS Package. At this point, you should have a package named so_19957451 added to your current SSIS project.
Package configuration
There's no need for any configuration because it's already been done via BIML but moar screenshots make for better answers.
This is the basic package
Here are my variables
Configuration of the Foreach Loop, as called out in the MSDN article, along with my note about selecting Traverse subfolders
Assign the value generated per loop to the variable CurrentFileName
The flat file connection manager has an expression applied to its ConnectionString property to ensure it uses the variable User::CurrentFileName. This changes the source file on each iteration of the loop.
Execution results
The results from the database match the output from the package execution:
Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201304\sample1.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201304\sample1.txt" has ended.
Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201304\sample2.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201304\sample2.txt" has ended.
Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201305\sample1.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201305\sample1.txt" has ended.
Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201305\sample2.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201305\sample2.txt" has ended.
Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201306\sample1.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201306\sample1.txt" has ended.
Information: 0x402090DC at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201306\sample2.txt" has started.
Information: 0x402090DD at DFT Import file, FFS Sample [2]: The processing of file "C:\ssisdata\so\TEST\201306\sample2.txt" has ended.