Pentaho ETL : Data from 'Table input' to 'Table output'


Scenario:
Generate rows -> Table input -> Delay row -> Table output.
Generate rows (4 copies):
- Generate 10 rows.
- Pass the field 'value' with value 1.
Table input (4 copies):
- Run for each row and use the value 1 (as where 1 = ?, so it has no effect).
- Insert data from previous step.
- Get the row count of the table my_table (select count(*) from...).
- Field = val_count.
Delay row (4 copies):
- 1 second delay.
Table output (1 copy):
- Insert val_count into the same table, i.e. my_table.
- Commit row = 1.
- Truncate table.
Database:
Oracle
Once the transformation finishes, my_table contains only the value 0 (40 zeros in total). Why does Table input not return the actual row count after the first round of execution (rounds 2 to 10)? Or what mistake did I make in this design?
Pentaho : Kettle - Spoon General Availability Release - 5.3.0.0-213
Java : jdk1.8.0_51 (64)
Os : Windows 8.1 (64)
Oracle : Oracle Database 11g Express Edition Release 11.2.0.2.0 - Production
More info added after analysis:
Of the four links into Table output, I removed the delay from one, and the result was somewhat closer to what I expected. I then removed all the delays and got the expected result, but I cannot understand the reason.

Related

DB2 LUW table locked while reading

I have a range-partitioned table in my database. It is partitioned by a date column, transaction_date, with one partition per month.
Now my problem is this. When I run an SQL statement to read data from the table:
select col1,col2 from mytable where ID=1
the table is very large, so the statement takes a long time to finish.
However, another ETL job inserts (appends) data into the table at the same time, and the insert operation cannot start until the read SQL finishes.
Any suggestions on how I can avoid this issue while reading data? Also, are there any official IBM documents regarding this problem?
** EDIT 1:
$ db2level
DB21085I This instance or install (instance name, where applicable:
"db2inst1") uses "64" bits and Db2 code release "SQL11011" with level
identifier "0202010F".
Informational tokens are "DB2 v11.1.1.1", "s1610100100",
"DYN1610100100AMD64", and Fix Pack "1".
Product is installed at "/opt/ibm/db2/v11.1".
$ db2set -all
[i] DB2COMM=TCPIP
[i] DB2AUTOSTART=TRUE
[i] DB2OPTIONS=+c
[g] DB2FCMCOMM=TCPIP4
[g] DB2SYSTEM=<server hostname>
[g] DB2INSTDEF=db2inst1
** EDIT 2:
For the select and load SQL statements, I am not specifying any isolation level.
The ETL job is an IBM DataStage job. Its insert is a bulk load append operation that inserts data into a pre-existing range.

You may use the MON_LOCKWAITS administrative view to check what's happening during such a lock situation. You may optionally format a lock with the MON_FORMAT_LOCK_NAME function to get more details on this as well.
SELECT
W.*
--, F.NAME, F.VALUE
FROM SYSIBMADM.MON_LOCKWAITS W
--, TABLE(MON_FORMAT_LOCK_NAME(W.LOCK_NAME)) F
--WHERE W.REQ_APPLICATION_HANDLE = XXX -- if you know the holder's handle to reduce the amount of data returned
ORDER BY W.LOCK_NAME
;

ORA-01792: maximum number of columns in a table or view is 1000 error while using WITH in sql

I have a query :
WITH abc AS
(
(SELECT SRC_DATA.*,
(SELECT MAX(DECODE(OBJ.AUD_ACTION_FLAG,'D',OBJ.OUPDATE_COUNT,OBJ.NUPDATE_COUNT))
FROM SMARTTRIAL_ODR_LANDING.AUD_TRIAL_DESIGN OBJ
WHERE OBJ.AUD_DATE_CHANGED BETWEEN TO_DATE('01-JAN-1900') AND (SRC_DATA.AUD_DATE_CHANGED)
AND DECODE(OBJ.AUD_ACTION_FLAG,'D',OBJ.OTRIAL_NO,OBJ.NTRIAL_NO)= DECODE(SRC_DATA.AUD_ACTION_FLAG,'D',SRC_DATA.OTRIAL_NO,SRC_DATA.NTRIAL_NO)
AND OBJ.AUD_ACTION_FLAG <> 'D'
) UPDATE_COUNT,
/***Multiple select statement like above with many other look up tables like AUD_TRIAL_DESIGN ****/
FROM SMARTTRIAL_ODR_LANDING.AUD_TRIAL SRC_DATA /***AUD_TRIAL is the base table***/
),
WITH def AS
(SELECT OBJ_DATA .*,
/***Similar statement as mentioned in above block and lookup table is AUD_OBJECTIVE***/
FROM SMARTTRIAL_ODR_LANDING.AUD_TRIAL_OBJECTIVE OBJ_DATA /***AUD_TRIAL_OBJECTIVE is the base table***/
)
----Query to select columns-----
FROM abc
LEFT JOIN def
LEFT JOIN xyz ON (column from def = column from xyz)
For a query with a similar structure that I wrote, the following error is returned:
ORA-01792: maximum number of columns in a table or view is 1000
01792. 00000 - "maximum number of columns in a table or view is 1000"
*Cause: An attempt was made to create a table or view with more than 1000
columns, or to add more columns to a table or view which pushes
it over the maximum allowable limit of 1000. Note that unused
columns in the table are counted toward the 1000 column limit.
*Action: If the error is a result of a CREATE command, then reduce the
number of columns in the command and resubmit. If the error is
a result of an ALTER TABLE command, then there are two options:
1) If the table contained unused columns, remove them by executing
ALTER TABLE DROP UNUSED COLUMNS before adding new columns;
2) Reduce the number of columns in the command and resubmit.
Could anyone please suggest a solution?

We had a similar problem. Here is an excerpt from the SR:
Creating view generates ORA-01792 maximum number of columns in a table or view is 1000
We have a new application that has a view that contains 35 columns. However, when creating it, it errors out stating that there are over 1000 columns, which is false. I will attach the view definition.
Here is what Oracle said (and it did fix the problem):
Bug 19893041 : ORA-01792 HAPPEN WHEN UPDATE TO 12.1.0.2
closed as dup of
Bug 19509982 : DISABLE FIX FOR RAISING ORA-1792 BY DEFAULT.
Solution:
A. SQL> alter system set "_fix_control"='17376322:OFF';
Or
B. Apply patch 19509982
(no conflicts found with the attached opatch)
That may be the same issue you're encountering.

How could retrieve the specific row from HIVE table?

I have a HIVE table with 200 rows and 50 columns.
I could write a Java program that reads the input file line by line, incrementing a line counter, and prints the row once the counter reaches the 10th row from the top.
Instead of writing a Java program, is there any way to retrieve the 10th row from the table using a HIVE query?

Do something like this :
select * from (select * from tableName limit 10) as tb1 limit 1;
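It is meant to give you the 10th row. Note, however, that the outer limit 1 keeps the first of the 10 rows returned by the inner query, and that without an ORDER BY Hive does not guarantee any particular row order.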
PS : Checked on non-partitioned managed table
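One way to make "the 10th row" well defined is to number the rows under an explicit ordering and filter on that number. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Hive, and the table demo_rows and its columns id and payload are hypothetical names that do not come from the question. Hive also provides ROW_NUMBER() OVER (ORDER BY ...), so the inner query should carry over with the literal 10 in place of the ? placeholder, but treat that as an assumption to verify on your Hive version.

```python
import sqlite3

# Hypothetical demo table standing in for the 200-row Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_rows (id INTEGER, payload TEXT)")
# Insert in reverse order to show that physical insert order does not matter.
conn.executemany(
    "INSERT INTO demo_rows VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(200, 0, -1)],
)

# Number the rows under an explicit ORDER BY, then keep only row n.
NTH_ROW_SQL = """
SELECT id, payload
FROM (
    SELECT id, payload, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM demo_rows
) numbered
WHERE rn = ?
"""


def nth_row(connection, n):
    """Return the n-th row (1-based) when sorted by id, or None if absent."""
    return connection.execute(NTH_ROW_SQL, (n,)).fetchone()


tenth = nth_row(conn, 10)
print(tenth)  # (10, 'row-10')
```

The ordering column is the important design choice here: the result is only as meaningful as the column you sort on.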

Migration of Data in Oracle DB using SQL

I have an issue with migration of data in Oracle DB during some release upgrades.
Case:
Table X in release 1 has three columns.
The same table X in release 2 has five columns (two were added in release 2).
The same table in release 3 has five columns, as in release 2.
Upgrade paths include Release 1 to Release 3 and Release 2 to Release 3.
I need an Oracle SQL query that copies data from a TMP table, where I have stored the data temporarily (this has to be done), to the actual table in both cases, based on the number of columns in TMP.
Below is the query I tried, but it isn't working.
insert into USER.X values
(CASE (select count(*) from all_tab_columns where table_name='TMP')
WHEN '3' THEN (select USER.TMP.*, null NEWCOL1 from USER.TMP, null NEWCOL2 from USER.TMP)
WHEN '5' THEN (select USER.TMP.* from USER.TMP)
END
);
Please help in this regard and if there is a better way of doing the same please let me know.

Edit:
There are multiple problems in your logic.
- You cannot determine the number of values in an insert statement at runtime. You have to determine it before creating the insert statement.
- CASE returns only one value. Return more than that and you will get the error "too many values".
So you should:
- Create a stored procedure.
- Use IF/ELSE to build the insert statement based on the column count.
- Execute it with EXECUTE IMMEDIATE.
Previous response:
The first problem in your query is that select count(*) from all_tab_columns where table_name='TMP' returns an integer, whereas in the CASE you are comparing it to '3' and '5' as varchar. So, assuming that the rest of your query returns its result correctly, try replacing '3' and '5' with 3 and 5.
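The idea of deciding the statement shape first and only then executing it can be sketched outside the database as well. The example below is a minimal illustration, not the stored procedure itself: it uses Python's sqlite3 module, where PRAGMA table_info plays the role of all_tab_columns, and the column names c1, c2, c3, newcol1 and newcol2 as well as the helper build_copy_statement are hypothetical.

```python
import sqlite3


def build_copy_statement(conn, tmp_table, target_table, target_width):
    """Build an INSERT that pads missing trailing columns with NULL."""
    # PRAGMA table_info is SQLite's stand-in for querying all_tab_columns.
    # Table names are interpolated directly, so pass trusted names only.
    tmp_cols = [row[1] for row in conn.execute(f"PRAGMA table_info({tmp_table})")]
    if not tmp_cols or len(tmp_cols) > target_width:
        raise ValueError(f"unexpected column count in {tmp_table}: {len(tmp_cols)}")
    padding = ["NULL"] * (target_width - len(tmp_cols))
    select_list = ", ".join(tmp_cols + padding)
    return f"INSERT INTO {target_table} SELECT {select_list} FROM {tmp_table}"


# Release 1 -> release 3 case: TMP has three columns, X has five.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (c1, c2, c3, newcol1, newcol2)")
conn.execute("CREATE TABLE TMP (c1, c2, c3)")
conn.execute("INSERT INTO TMP VALUES (1, 'a', 'b')")

statement = build_copy_statement(conn, "TMP", "X", target_width=5)
conn.execute(statement)  # the counterpart of EXECUTE IMMEDIATE
migrated = conn.execute("SELECT * FROM X").fetchall()
print(statement)
print(migrated)
```

When TMP already has five columns, the padding list is empty and the same function produces a plain column-for-column copy, so both upgrade paths go through one code path.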

How to proceed to the next task only if no records exist for a given query?

I have the following piece of SQL that will check if any duplicate records exist. How can I check to see if no records are returned? I'm using this in an SSIS package. I only want it to proceed to the next step if no records exist, otherwise error.
SELECT Number
, COUNT(Number) AS DuplicateCheckresult
FROM [TelephoneNumberManagement].[dbo].[Number]
GROUP BY Number
HAVING COUNT(Number) > 1

The following example, created using SSIS 2008 R2 with a SQL Server 2008 R2 backend, illustrates how you can achieve your requirement in an SSIS package.
Create a table named dbo.Phone and populate it with a couple of records that would return duplicate results.
CREATE TABLE [dbo].[Phone](
[Number] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO dbo.Phone (Number) VALUES
(1234567890),
(1234567890);
GO
You need to slightly modify your query so that it returns the number of duplicated values instead of the duplicate rows. This query returns only one value (a scalar), which is zero if no duplicates are found and non-zero otherwise. This is the query we will use in the SSIS package's Execute SQL Task.
SELECT COUNT(Number) AS Duplicates
FROM
(
SELECT Number
, COUNT(Number) AS NumberCount
FROM dbo.Phone
GROUP BY Number
HAVING COUNT(Number) > 1
) T1
On the SSIS package, create a variable named DuplicatesCount of data type Int32.
On the SSIS package, create an OLE DB Connection manager to connect to the SQL Server database. I have named it as SQLServer.
On the Control Flow tab of the SSIS package, place an Execute SQL Task and configure it as follows. The task should accept a single row value and assign it to the newly created variable. Set the ResultSet to Single row. Set the Connection to SQLServer and the SQLStatement to SELECT COUNT(Number) AS Duplicates FROM (SELECT Number, COUNT(Number) AS NumberCount FROM dbo.Phone GROUP BY Number HAVING COUNT(Number) > 1) T1.
On the Result Set section, click on the Add button and set the Result Name to 0. Assign the variable User::DuplicatesCount to the result name. Then click OK.
Place another task after the Execute SQL Task. I have chosen a Foreach Loop Container for this sample. Connect the Execute SQL Task to it.
Now, the requirement is that if there are no duplicates, which means the output value of the query in the Execute SQL Task is zero, then the package should proceed to the Foreach Loop Container. Otherwise, the package should not proceed to the Foreach Loop Container. To achieve this, we need to add an expression to the precedence constraint (the green arrow between the tasks).
Right-click on the precedence constraint and select Edit...
On the Precedence Constraint Editor, select Expression from the Evaluation operation dropdown. Set the expression to @[User::DuplicatesCount] == 0 in order to check that the variable DuplicatesCount contains the value zero. Value zero means that there were no duplicates in the table dbo.Phone. Test the expression to verify that the syntax is correct. Click OK to close the verification message. Click OK to close the Precedence Constraint Editor.
On the Control Flow, the precedence constraint will now be denoted with fx, which indicates that a constraint/expression is in place.
Let's check the rows in the table dbo.Phone. The value 1234567890 exists twice. It means that there are duplicate rows and the Foreach Loop Container shouldn't execute.
Let's execute the package. You can notice that the Execute SQL Task executed successfully but the package didn't proceed to the Foreach Loop Container. That's because the variable DuplicatesCount contains a value of 1 and we had written a condition that the value must be zero to proceed to the Foreach Loop Container.
Let's delete the rows from the table dbo.Phone and populate it with non-duplicate rows using the following script.
TRUNCATE TABLE dbo.Phone;
INSERT INTO dbo.Phone (Number) VALUES
(1234567890),
(0987654321);
Now the table contains two distinct values.
If we execute the package, it will proceed to the Foreach Loop Container because there are no duplicate rows in the table dbo.Phone.
Hope that helps.
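The scalar query and the zero check from this walkthrough can be tried outside SSIS. The following is a rough sketch using Python's sqlite3 module instead of SQL Server, so the table is simply named Phone (no dbo schema), and the helper names duplicates_count and may_proceed are hypothetical. It only mirrors the logic of the Execute SQL Task and the precedence constraint, not SSIS itself.

```python
import sqlite3

# Same shape as the query used in the Execute SQL Task.
DUPLICATES_SQL = """
SELECT COUNT(Number) AS Duplicates
FROM (
    SELECT Number, COUNT(Number) AS NumberCount
    FROM Phone
    GROUP BY Number
    HAVING COUNT(Number) > 1
) T1
"""


def duplicates_count(conn):
    """Return how many distinct numbers occur more than once."""
    return conn.execute(DUPLICATES_SQL).fetchone()[0]


def may_proceed(conn):
    """Mirror the precedence constraint: continue only when the count is zero."""
    return duplicates_count(conn) == 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Phone (Number INTEGER NOT NULL)")

# First run: the same number twice, so one duplicated value is reported.
conn.executemany("INSERT INTO Phone (Number) VALUES (?)",
                 [(1234567890,), (1234567890,)])
with_duplicates = duplicates_count(conn)
blocked = not may_proceed(conn)

# Second run: two distinct numbers, so the count drops to zero.
conn.execute("DELETE FROM Phone")
conn.executemany("INSERT INTO Phone (Number) VALUES (?)",
                 [(1234567890,), (987654321,)])
without_duplicates = duplicates_count(conn)

print(with_duplicates, blocked, without_duplicates, may_proceed(conn))
```

Note that the count is the number of duplicated values, not the number of surplus rows, which is why two identical rows yield 1.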

What you need to do is work with @@ROWCOUNT, but how you do it depends on your data flows. Have a look at the discussion "Using Row Count In SSIS", which points out how to do it with either one or two data flows.
