I'm using SQL Server Data Tools in Microsoft SQL Server 2012 to load data from the staging area into the data warehouse. During the ETL process, I use a Lookup Transformation to get the dimension key from the lookup table into my fact table. My issue is that when I use Full Cache in the Lookup Transformation, all the rows go to the No Match Output. When I use Partial Cache or No Cache, all the rows go to the Match Output, as they are supposed to. I'm really confused and don't understand what's going on here. I really need some help.
Thanks,
Dan
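If you are looking up on a VARCHAR or NVARCHAR column then, as billinkc has suggested, values that differ only in case (Dan vs. dan) will not match. With Full Cache, the comparison is done in memory by SSIS and is case-sensitive. With No Cache (and on cache misses with Partial Cache), the match is done by a query against the database, where the column's collation, often case-insensitive, decides the result. Try adding a Derived Column of UPPER(SourceColumn), use a query in the Lookup Transformation such as SELECT UPPER(MatchingColumn), LookedupKey FROM LookupTable, and match on the upper-cased columns.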
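To make the difference concrete, here is a small Python model, not SSIS itself: a plain dict stands in for the Full Cache (exact, case-sensitive match), and an in-memory SQLite table whose column uses the NOCASE collation stands in for a database with a case-insensitive collation. The sample values and the function names full_cache_lookup, database_lookup and upper_lookup are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical reference (dimension) rows and incoming source values.
REFERENCE_ROWS = [("Dan", 1), ("Ann", 2)]
SOURCE_VALUES = ["dan", "ANN", "Bob"]


def full_cache_lookup(values, reference):
    """Model of Full Cache: an in-memory, case-sensitive exact match."""
    cache = dict(reference)
    matched = {v: cache[v] for v in values if v in cache}
    unmatched = [v for v in values if v not in cache]
    return matched, unmatched


def database_lookup(values, reference):
    """Model of No Cache: each value is matched by a query in the database.
    SQLite's NOCASE collation stands in for a case-insensitive collation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lookup_table (name TEXT COLLATE NOCASE, dim_key INTEGER)")
    conn.executemany("INSERT INTO lookup_table VALUES (?, ?)", reference)
    matched, unmatched = {}, []
    for v in values:
        row = conn.execute(
            "SELECT dim_key FROM lookup_table WHERE name = ?", (v,)
        ).fetchone()
        if row is not None:
            matched[v] = row[0]
        else:
            unmatched.append(v)
    conn.close()
    return matched, unmatched


def upper_lookup(values, reference):
    """The suggested fix: UPPER() on both sides, then the same in-memory match."""
    upper_reference = [(name.upper(), key) for name, key in reference]
    return full_cache_lookup([v.upper() for v in values], upper_reference)


print(full_cache_lookup(SOURCE_VALUES, REFERENCE_ROWS))  # ({}, ['dan', 'ANN', 'Bob'])
print(database_lookup(SOURCE_VALUES, REFERENCE_ROWS))    # ({'dan': 1, 'ANN': 2}, ['Bob'])
print(upper_lookup(SOURCE_VALUES, REFERENCE_ROWS))       # ({'DAN': 1, 'ANN': 2}, ['BOB'])
```

The same rows that all fall into "no match" under the in-memory comparison are found by the collation-aware query, and normalizing both sides with UPPER brings the in-memory comparison back in line.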
Related
I have to copy data from one table to another; the tables are held in two different databases within Azure. I did a quick search for answers to this, and the query seems fairly straightforward, i.e.:
INSERT INTO table1 (make, model, type, serial)
SELECT the_make, the_model, the_type, ref_no
FROM database2.dbo.table2
I encountered issues because I'm using Azure.
Msg 40515, Level 15, State 1, Line 16 Reference to database and/or server name in 'database2.dbo.table2' is not supported in this version of SQL Server.
The above issue led me to the Cross-Database Queries articles. My requirements are a little more complicated than some of the scenarios provided and I need some help in making it work.
I also need to convert some columns, such as reg_no, which is a string, to int and then copy the value into the serial column.
My question is: what is the best way to write a script that lets me reference both databases without errors, copy the data and convert the columns at the same time? I tried the simple approach of exporting the data and importing it, editing the column mappings, but it didn't work well and caused problems all over the place.
Any guidance is appreciated on this.
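A minimal sketch of the "read from one database, convert, write to the other" approach, written in Python with two separate sqlite3 connections standing in for the two Azure databases (an assumption for illustration; real Azure SQL databases would need their own driver and connection strings). Table and column names come from the question; the question's query uses ref_no while the text mentions reg_no, so ref_no is assumed here. The helpers to_int_or_none and copy_table2_to_table1 are hypothetical. Values that cannot be parsed as integers become NULL, which is similar in spirit to T-SQL TRY_CAST(ref_no AS int).

```python
import sqlite3


def to_int_or_none(value):
    """Convert a string such as ' 0042 ' to 42; return None if it is not an integer."""
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return None


def copy_table2_to_table1(source, target):
    """Read table2 from one database, convert ref_no, and insert into table1 in another."""
    rows = source.execute(
        "SELECT the_make, the_model, the_type, ref_no FROM table2"
    ).fetchall()
    converted = [
        (make, model, kind, to_int_or_none(ref)) for make, model, kind, ref in rows
    ]
    with target:  # commits on success, rolls back if an insert fails
        target.executemany(
            "INSERT INTO table1 (make, model, type, serial) VALUES (?, ?, ?, ?)",
            converted,
        )
    return len(converted)
```

Because the read and the write go through two independent connections, neither query needs a database-qualified name, which is the part Azure SQL Database rejects with Msg 40515.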
You're getting this error because there's no linked server by default. You'll need to add one in order to access the other database server. Here's a link that explains how to do it:
https://www.sqlshack.com/create-linked-server-azure-sql-database/
As for the transformation, it depends on many factors, e.g. the number of rows, how often it runs, etc.
Usually the best alternative is an external ETL tool such as SSIS or Azure Data Factory, because you can schedule its execution and check the status of each run.
@GregGalloway was able to answer the question I should have asked. I am adding a more concise question here, while keeping the original lengthy text.
How do I use a table valued function as the query for a partition, when the function is in separate database from my fact and referenced dimensions?
Overview: I am building an SSAS multidimensional cube from a single fact table in our application's data warehouse, and I want to use the result set of a table-valued function as my fact table's partition query. We are using SQL Server (and SSAS) 2014.
Condition: For each environment (Dev, Tst, Prd) there are two separate databases on the same server, one for the application data warehouse [DW_App] and the other for custom objects [DW_Custom]. I cannot create any objects in [DW_App], but I have a lot of freedom in [DW_Custom].
Background info: I have not been able to find much information on using a TVF and partitions in this way. My thinking is that it will help streamline future development by giving me a single place to update the SQL if/when I modify the fact table.
So in testing out my crazy idea of using a TVF as the query for my partitions I have run into a bit of a conundrum. I am able to use my TVF when I explicitly state the Database in my FROM clause.
SELECT * FROM [DW_Custom].[dbo].[CubePartition](@StartDate, @EndDate)
However, that will not work, because the cube will be deployed to multiple environments before production, and it needs to point to a different database in each. So I tried adding a new data source, pointing my partition query at the new data source, and removing the database name, i.e.:
SELECT * FROM [dbo].[CubePartition](@StartDate, @EndDate)
I get the following error:
The SQL syntax is not valid. The relational database returned the following error message: Deferred prepare could not be completed. Invalid object name 'dbo.CubePartition'
If I click through this error and the subsequent warnings that the cube will not be able to process, I am able to build and deploy the cube. However, I cannot process it, because I get an error that one of my dimensions does not exist.
Looking at the generated query, it is clear that it queries my dimensions as well as the fact table, and those do not exist in [DW_Custom], which explains that error perfectly.
So I guess 2 questions:
Is it possible to query another DB (on the same server) from inside of an SSAS partition query?
If not, is there any way I can use a variable as the database name in the query and update that variable based on the project configuration (Dev, Tst, Prd)?
Bonus question: Is the reason I cannot find much about doing it this way that it is an obviously bad idea I am overlooking, and if so, why?
How about creating a second SSAS Data Source pointing to the DW_Custom database (or whatever it's called in the particular environment you're deploying to)? Then when you deploy from Dev to Prod, you need only change that connection string. When you create your partitions, specify the DW_Custom data source and then the query without a database name:
SELECT * FROM [dbo].[CubePartition](@StartDate, @EndDate)
As long as the query plan for that table-valued function is efficient compared to a plain SELECT statement, then I don't see a problem with that.
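One practical payoff of routing every partition through the same function is that each partition's query differs only in its date arguments. The Python sketch below generates one query per calendar month in the same form as the answer's query; the monthly partitioning scheme, the ISO date literals, and the assumption that the function treats the second argument as an exclusive upper bound are all hypothetical and should be matched to how CubePartition is actually written.

```python
from datetime import date


def next_month(d):
    """First day of the month after d."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def partition_queries(first, last):
    """One query per calendar month from first's month through last's month.
    No database name appears, so the partition's data source picks the database."""
    queries = []
    start = date(first.year, first.month, 1)
    while start <= last:
        end = next_month(start)  # assumed to be an exclusive upper bound
        queries.append(
            f"SELECT * FROM [dbo].[CubePartition]('{start.isoformat()}', '{end.isoformat()}')"
        )
        start = end
    return queries


for q in partition_queries(date(2014, 11, 1), date(2015, 1, 15)):
    print(q)
```

If the fact query changes later, only the function body in DW_Custom needs updating; the partition definitions stay as generated.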
I need to write a database table data to a text file with some transformation.
There are two steps available to retrieve data from a table, namely Table Input and Database Join. I don't see much difference between them except the "Outer join?" option (correct me if I have misunderstood). So which would be better to use?
Environment:
Database: Oracle
Pentaho Spoon: 5.3.* (Community Edition)
Thanks in advance.
The Table Input step in PDI is used to read data from your database tables. The query is executed once and returns the result set. Check the wiki.
Database Join works slightly differently. It lets you execute your query based on the data received from the previous step: for every row coming in from the previous step, the incoming values are substituted into the query's parameters and the query is executed. Check the wiki.
The choice of using the above steps clearly depends on your requirement.
If you need to fetch the data set from a database table, you should use the Table Input Step - The best choice.
If you need to run the query in the database for every incoming row to fetch the result, use Database Join - The best choice.
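The difference in execution counts can be sketched in Python with an in-memory SQLite database standing in for Oracle; this models the two steps' behavior and is not PDI code. The customers table, the customer_id field and both function names are hypothetical. The outer_join flag mirrors the "Outer join?" option: without it, incoming rows whose query returns nothing are dropped; with it, they pass through with an empty result. For simplicity the sketch keeps only the first returned row per incoming row.

```python
import sqlite3


def make_db():
    """A tiny stand-in database with one lookup table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        """
    )
    return conn


def table_input(conn):
    """Like Table Input: one query, executed once, returning the whole result set."""
    return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()


def database_join(conn, incoming_rows, outer_join=False):
    """Like Database Join: a parameterized query executed once per incoming row."""
    out, executions = [], 0
    for row in incoming_rows:
        executions += 1
        match = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (row["customer_id"],)
        ).fetchone()
        if match is not None:
            out.append({**row, "name": match[0]})
        elif outer_join:
            out.append({**row, "name": None})
    return out, executions
```

With N incoming rows, the join-style step issues N queries, which is why Table Input is the better fit when you simply need to read a table to a text file.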
Hope it helps :)
How can I get the 'No Match Output' from a Lookup Transformation in SSIS 2005?
My past experience is with SSIS 2008, and I know that in 2008 this is an option on every Lookup transformation, but I have found that in SSIS 2005 it is not. Is there a reasonable workaround for this, without having to use 'Ignore Failure' in error handling?
The package I am working with needs to incrementally update data within a table. Currently it wipes the table clean and repopulates the data on each run. This doesn't work well on the application side, so the package needs to be altered to only update records that have changed or add new records to the table. My plan was to compare the ODS table to the DM table using the Lookup transformation, and I need the no-match output from the lookup to determine the changes.
Thanks
KJ
The appropriate workaround is to use the Ignore Failure option for SSIS 2005. That was updated in 2008 to add the No Match Output option.
No Match Output is not available in SSIS 2005; you need to use the Ignore Failure option. The Lookup transformation changed drastically in 2008 and 2008 R2.
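With Ignore Failure, unmatched rows are not redirected; they continue down the main path with NULL in the looked-up columns, and a Conditional Split on an expression such as ISNULL(LookedUpKey) then separates new rows from existing ones. The Python sketch below models that two-step pattern with plain lists and dicts; the ODS/DM sample data and the function names are hypothetical. One caveat: if the looked-up column can legitimately be NULL in the reference table, the split cannot tell a miss from a match, so split on a column that is never NULL.

```python
def lookup_ignore_failure(rows, reference, key_column, result_column):
    """Model of a Lookup set to Ignore Failure: every row continues,
    and unmatched rows get None (NULL) in the looked-up column."""
    return [{**row, result_column: reference.get(row[key_column])} for row in rows]


def conditional_split(rows, result_column):
    """Model of a Conditional Split on ISNULL(result_column)."""
    new_rows = [r for r in rows if r[result_column] is None]
    existing_rows = [r for r in rows if r[result_column] is not None]
    return new_rows, existing_rows


ods_rows = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}]  # hypothetical source rows
dm_keys = {1: 101}                                          # hypothetical DM key map
looked_up = lookup_ignore_failure(ods_rows, dm_keys, "id", "dm_key")
new_rows, existing_rows = conditional_split(looked_up, "dm_key")
print(new_rows)       # [{'id': 2, 'val': 'b', 'dm_key': None}]
print(existing_rows)  # [{'id': 1, 'val': 'a', 'dm_key': 101}]
```

The "new" branch plays the role of the 2008 No Match Output (insert these), and the "existing" branch can go on to a change check before updating.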
I would like to copy parts of an Oracle DB to a SQL Server DB. I need to move the data because the Oracle box is being decommissioned. I only need the data for reference purposes, so I don't need indexes, stored procedures, constraints, etc. All I need is the data.
I have a link to the Oracle DB in SQL Server. I have tested the following query, which seemed to work just fine:
select
*
into
NewTableName
from
linkedserver.OracleTable
I was wondering if there are any potential issues with using this approach?
Using SSIS (SQL Server Integration Services) may be a good alternative, especially if your table names are the same on both servers. Use the Import and Export Wizard; it should create the destination tables for you and let you edit any mappings.
The only issue I see is that you will of course need to execute that for each and every table you need. Glad you are decommissioning the Oracle server :-). Otherwise, if you are not concerned with indexes or any of the existing sprocs, I don't see any issue with what you are doing.
The SELECT INTO approach could be very slow if the tables are large. In that case, consider writing Pro*C or using FastReader: http://www.wisdomforce.com/products-FastReader.html
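For large tables, a common alternative to one huge statement is to stream rows in fixed-size batches. The sketch below uses only DB-API 2.0 calls (cursor, execute, fetchmany, executemany, commit), so in principle it works with any compliant driver pair; here two sqlite3 in-memory databases stand in for Oracle and SQL Server, and the function name and the batch size of 1000 are hypothetical choices. The insert statement must use the target driver's own placeholder style (its paramstyle), which differs between drivers.

```python
import sqlite3


def copy_in_batches(source, target, select_sql, insert_sql, batch_size=1000):
    """Stream rows from source to target in batches instead of one huge statement."""
    src_cur = source.cursor()
    tgt_cur = target.cursor()
    src_cur.execute(select_sql)
    copied = 0
    while True:
        batch = src_cur.fetchmany(batch_size)
        if not batch:
            break
        tgt_cur.executemany(insert_sql, batch)
        target.commit()  # commit per batch so a failure does not lose all progress
        copied += len(batch)
    return copied
```

Committing per batch keeps transactions small; the trade-off is that a failure part-way leaves a partially copied table, so rerunning means truncating the target first.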
A faster and easier approach might be to use the Data Transformation Services, depending on the number of objects you're trying to copy over.