Pentaho ETL: Database Join vs Table Input

I need to write database table data to a text file with some transformations.
There are two steps available to retrieve the data from a table, namely Table Input and Database Join. I don't see much difference between them except the "outer join?" option (correct me if I have misunderstood). So which would be better to use?
Environment:
Database: Oracle
Pentaho Spoon: 5.3.* (Community Edition)
Thanks in advance.

The Table Input step in PDI is used to read data from your database tables. The query is executed once and returns the whole result set. Check the wiki.
The Database Join step works slightly differently: it executes its query based on the data received from the previous step. For every incoming row, the parameters in this step's query are substituted and the query is executed again. Check the wiki.
The choice between the two steps depends on your requirement.
If you need to fetch a data set from a database table in one go, Table Input is the best choice.
If you need to run the query against the database once for every incoming row, Database Join is the best choice; see the sketch below.
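To make the difference concrete, here is a minimal sketch (the table and field names are made up for illustration):

    -- Table Input: executed once, streams the whole result set.
    SELECT order_id, customer_id, amount
    FROM   orders;

    -- Database Join: executed once per incoming row; each ? placeholder
    -- is replaced by a parameter field coming from the previous step.
    SELECT customer_name, region
    FROM   customers
    WHERE  customer_id = ?;

Because the Database Join query fires once per row, it can get expensive on large streams; prefer Table Input when a single query can do the job.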
Hope it helps :)

Related

IBM SPSS: How to import a custom SQL database query

I am looking to see whether it is possible to import a custom SQL query (written in SSMS) into SPSS (Statistical Package for the Social Sciences). I want to build syntax that runs this query as my new dataset, which I can then carry into my scripted analysis. I see the basic capability to query a single table from SQL Server, but I would like a query that joins many tables. I anticipate the query will be fairly complex, with many joins and perhaps some data transformations.
Has anybody had experience with, or a solution to, this situation?
I know I could materialize the query as a table that SPSS connects to, but my data changes daily, so I would need a job in another application to refresh that table before my SPSS syntax pulls it. I would like to eliminate that first step by having the query that grabs the data right at the beginning of my syntax.
Ultimately I am looking to build out my SPSS syntax and schedule it in the Production Facility to run daily.

Is there any way to get the names of the first, second, and third tables in PostgreSQL?

I'm gathering information about a database by executing time-based SQL injection attacks (in a lab environment). I have discovered the current database user and the current database name, but I don't know how to get the names of the first, second, third [,...] tables in that database. Is there any way to do this?
I'm working with PostgreSQL, but if you know a way in another DBMS, please share it; I would be very grateful!
To list all tables in the current database interactively, you can run the \dt command in psql. From inside a query, which is what an injection payload needs, the same information is available from the system catalogs, e.g. information_schema.tables or pg_catalog.pg_tables.
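A minimal sketch of walking the table names one at a time (the public schema is an assumption; adjust to your target):

    -- Returns the first user table name; bump OFFSET to 1, 2, ...
    -- to fetch the second, third, ... table.
    SELECT table_name
    FROM   information_schema.tables
    WHERE  table_schema = 'public'
    ORDER  BY table_name
    LIMIT  1 OFFSET 0;

In a time-based blind scenario you would embed this as a subquery and extract the name character by character with conditional delays, rather than reading the result directly.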

Is it possible to output a detailed query result file?

I am new to working with SQL and need to know whether it is possible to produce a detailed query result file. I know you can get an output file, but it only contains info like "1 row(s) affected"; I need detailed info like:
"added row ID, Name, Surname; 1, John, Adams".
This is not a built-in feature of SQL Server at this time. If you need this level of insight into your database changes, you could look at temporal tables or implement a custom logging solution (such as adding Created/Modified columns to the table so that you can query the data to see when rows were created or changed).
It's hard to say what your options are without knowing the version of SQL Server you're using and what level of control you have over how the data gets into the system, but these are at least a couple of options.
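A minimal sketch of the Created/Modified-column approach (the table and column names are hypothetical):

    -- Track when each row was created so additions can be reported later.
    CREATE TABLE dbo.Person (
        ID         INT IDENTITY(1,1) PRIMARY KEY,
        Name       NVARCHAR(50) NOT NULL,
        Surname    NVARCHAR(50) NOT NULL,
        CreatedAt  DATETIME2    NOT NULL DEFAULT SYSUTCDATETIME(),
        ModifiedAt DATETIME2    NULL
    );

    -- Report the rows added in the last day, with their actual values.
    SELECT ID, Name, Surname
    FROM   dbo.Person
    WHERE  CreatedAt >= DATEADD(DAY, -1, SYSUTCDATETIME());

This gives you the "added row ID, Name, Surname; 1, John, Adams" level of detail by querying for it after the fact, rather than expecting it in the engine's output messages.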

Adding new data to the end of a table using Power Query

I've got a query that pulls data from SQL Server 2012 into Excel using Power Query. However, when I change the date range in my query, I'd like to pull the new data into my table and store it below the previously pulled data without deleting the old rows. So far all I've found is the Refresh button, which reruns my query with the new dates but replaces the old data. Is there a way to accomplish this? I'm using it to build an automated QA testing program that will compare this period to the last. Thank you.
This sounds like an incremental load. If your table doesn't exceed about 1.1 million rows, you can use the technique described here: http://www.thebiccountant.com/2016/02/09/how-to-create-a-load-history-or-load-log-in-power-query-or-power-bi/

SQL Server 2012 SSIS Lookup Cache not working

I'm using SQL Server Data Tools in Microsoft SQL Server 2012 to load data from staging into the data warehouse. During the ETL process, I use the Lookup transformation to get the dimension key from the lookup table into my fact table. My issue is that when I use Full cache in the Lookup transformation, all the rows go to the no-match output. When I use Partial cache or No cache, all the rows go to the match output, as they are supposed to. I'm really confused and don't understand what's going on here. I really need some help.
Thanks,
Dan
If you are looking up based on a VARCHAR or NVARCHAR field then, as billinkc has suggested, values in different cases (Dan vs dan) will produce a no-match. That fits your symptom: with Full cache the comparison happens inside the SSIS data flow, which is case-sensitive, whereas Partial cache and No cache push the comparison down to SQL Server, where a typical case-insensitive collation lets it match. Try adding a Derived Column of UPPER(SourceColumn), change the query in the Lookup transformation to select UPPER(MatchingColumn) together with LookedupKey from LookupTable, and match on those; see the sketch below.
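A minimal sketch of the case-normalized lookup query (LookupTable, MatchingColumn, and LookedupKey are the placeholder names from above):

    -- Upper-case the match column so the case-sensitive Full-cache
    -- comparison in the SSIS data flow still finds a match.
    SELECT UPPER(MatchingColumn) AS MatchingColumn,
           LookedupKey
    FROM   LookupTable;

Pair this with a Derived Column that computes UPPER(SourceColumn) on the incoming rows, and map the two upper-cased columns in the Lookup transformation.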