How to count the rows of a table in Pentaho ETL? I didn't find the button that does the job

I want to know how to count the rows of a table in Pentaho, using either a method or a button.
I couldn't find the button that does the job.
Thanks in advance

You can use a Table Input step in a transformation that runs a query returning the row count, something like SELECT COUNT(*) AS numrows FROM table.
Alternatively, there is a job entry specifically for this, named "Evaluate rows number in a table".
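The query that the Table Input step would run can be tried outside Pentaho to check what it returns. This is a minimal sketch, assuming Python's built-in sqlite3 module with an in-memory database as a stand-in for the real source database; the helper `count_rows` and the sample table are hypothetical.

```python
import sqlite3

def count_rows(conn: sqlite3.Connection, table: str) -> int:
    """Return the row count of `table` with SELECT COUNT(*) AS numrows."""
    # Table names cannot be bound as query parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    (numrows,) = conn.execute(f"SELECT COUNT(*) AS numrows FROM {quoted}").fetchone()
    return numrows

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    print(count_rows(conn, "customers"))  # prints 3
```

The query yields a single row with a single `numrows` column, which is what a downstream step would receive.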

Related

SQL to identify duplicate columns in a table with hundreds of columns

I have 250+ columns in the customer table. Per my process, there should be only one row per customer; however, I've found a few customers who have more than one entry in the table.
After running DISTINCT on the entire table for that customer, it still returns two rows. I suspect one of the columns may have a trailing space or junk from the source tables, resulting in two rows of the same information.
select distinct * from ( select * from customer_table where customer = '123' ) a;
The above query returns two rows. Looking at the results with the naked eye, there is no difference in any column.
I could identify which column is causing the duplicates by running a DISTINCT query for each column, but that would be a very manual task for 250+ columns.
This sounds like a very dumb question, but I'm kind of stuck here. Please suggest a better way to identify this if you have one. Thank you.
Solving this one-time issue with SQL is too much effort. Simply copy-paste the rows into Excel, transpose the data into columns, and use a simple formula like "if a==b then 1 else 0".
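As an alternative to the Excel copy-paste, a short script can do the column-by-column comparison for all 250+ columns at once. This is a sketch under assumptions: Python's sqlite3 stands in for the real database, the helper `differing_columns` and the sample rows are hypothetical, and `customer_table` / `customer` follow the question. It reports every column whose values are not identical across the rows for one key, using `repr()` so trailing spaces become visible.

```python
import sqlite3

def quote_ident(name: str) -> str:
    # Identifiers cannot be bound as parameters, so quote them instead.
    return '"' + name.replace('"', '""') + '"'

def differing_columns(conn, table, key_col, key_value):
    """Return {column: set of repr(value)} for each column whose values
    differ across the rows that share key_value."""
    cur = conn.execute(
        f"SELECT * FROM {quote_ident(table)} WHERE {quote_ident(key_col)} = ?",
        (key_value,),
    )
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    diffs = {}
    for i, col in enumerate(columns):
        values = {row[i] for row in rows}
        if len(values) > 1:
            # repr() makes trailing spaces and control characters visible.
            diffs[col] = {repr(v) for v in values}
    return diffs

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_table (customer TEXT, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customer_table VALUES (?, ?, ?)",
        [("123", "Ann", "Paris"), ("123", "Ann", "Paris ")],
    )
    print(differing_columns(conn, "customer_table", "customer", "123"))
```

With the sample rows, only `city` is reported, and its two reprs show the trailing space.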

Push data from one table to another table

I have 3 tables, as in the picture below. When a row in the Conditional table reaches its ExpireDate, the database must move it to the Unconditional table (all in SQL code). How can I do this?
Maybe the easiest way is to make a job that runs daily with:
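-- copy the rows that expire today
insert into unconditional_table
select *
from conditional_table a
where a.expiration_date = trunc(sysdate, 'dd');
-- then delete them so they are moved, not duplicated
delete from conditional_table a
where a.expiration_date = trunc(sysdate, 'dd');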
When the condition is not met, no data is transferred.
You want to move records from one table to another.
When do you want this to happen?
You can set up a job that checks the dates and moves the records to the other table at the beginning of every day.
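Doing the copy and the delete inside one transaction keeps a row from ending up in both tables, or in neither, if the job fails halfway. This is a minimal sketch, assuming Python's sqlite3 as a stand-in database, ISO-formatted date strings in `expiration_date`, and identical column layouts in both tables (which `INSERT ... SELECT *` requires); `move_expired` is a hypothetical helper, and scheduling it daily is left to whatever job scheduler you use.

```python
import sqlite3

def move_expired(conn: sqlite3.Connection, today: str) -> int:
    """Copy rows expiring on `today` into unconditional_table, then delete them
    from conditional_table, inside a single transaction. Returns rows moved."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO unconditional_table "
            "SELECT * FROM conditional_table WHERE expiration_date = ?",
            (today,),
        )
        cur = conn.execute(
            "DELETE FROM conditional_table WHERE expiration_date = ?", (today,)
        )
    return cur.rowcount

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE conditional_table (code TEXT, expiration_date TEXT)")
    conn.execute("CREATE TABLE unconditional_table (code TEXT, expiration_date TEXT)")
    conn.executemany(
        "INSERT INTO conditional_table VALUES (?, ?)",
        [("A", "2024-01-01"), ("B", "2024-02-01")],
    )
    print(move_expired(conn, "2024-01-01"))  # prints 1
```

In a daily job, `today` could come from `datetime.date.today().isoformat()`.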

Create table of differences

I have a sheet of data on which I have run several macros to identify differences. I am now looking to create a matrix table that shows the number of differences per column per department.
This is how my sheet looks after it has identified differences:
[DifferencesSheet] http://imgur.com/na6nvNH
And this is what I want to get to:
[FinalSheet] http://imgur.com/i6W60m7
I currently have image 1, which is a table of highlighted differences, and I need to create a matrix with departments on the y-axis, column headers along the x-axis, and the number of differences per column in the cells.
I'm not sure if I can use a pivot table, as the data is always changing.
Any advice will help, thanks.
A chart would be more suitable, as you have two manipulations.
Use MS Query:
SELECT outTab.Department, SUM(outTab.DepartmentCode), SUM(outTab.Sales)
FROM
(SELECT
S1.Department,
Iif(S1.DepartmentCode=S2.DepartmentCode,0,1) as DepartmentCode,
Iif(S1.Sales=S2.Sales,0,1) as Sales
FROM [Sales$] as S1
INNER JOIN [Compare$] as S2 ON S1.ID = S2.ID) AS outTab
GROUP BY outTab.Department
You need to add the remaining columns to the query above; I added the first two, DepartmentCode and Sales. To refresh, you only need to right-click and hit Refresh.
How to create an MS Query in Excel?
Two ways:
Go to Data->From Other Sources->From Microsoft Query
Download my SQL AddIn (free and open source), input the output range of the query (F1), input the SQL, and hit OK
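The same per-department aggregation can be checked outside Excel. This sketch assumes Python's sqlite3 with two hypothetical tables, `Sales` and `Compare`, standing in for the `[Sales$]` and `[Compare$]` sheets, and replaces the Access-style `Iif` with the standard `CASE WHEN`; the sample rows are made up for illustration.

```python
import sqlite3

# Standard-SQL version of the MS Query above: CASE WHEN replaces Iif().
QUERY = """
SELECT outTab.Department,
       SUM(outTab.DepartmentCode) AS DepartmentCode,
       SUM(outTab.Sales) AS Sales
FROM (
    SELECT S1.Department,
           CASE WHEN S1.DepartmentCode = S2.DepartmentCode THEN 0 ELSE 1 END AS DepartmentCode,
           CASE WHEN S1.Sales = S2.Sales THEN 0 ELSE 1 END AS Sales
    FROM Sales AS S1
    INNER JOIN Compare AS S2 ON S1.ID = S2.ID
) AS outTab
GROUP BY outTab.Department
ORDER BY outTab.Department
"""

def load_sample(conn: sqlite3.Connection) -> None:
    """Create the hypothetical Sales/Compare tables with a few made-up rows."""
    for name in ("Sales", "Compare"):
        conn.execute(
            f"CREATE TABLE {name} (ID INTEGER, Department TEXT, DepartmentCode TEXT, Sales INTEGER)"
        )
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)",
                     [(1, "HR", "H01", 100), (2, "HR", "H01", 200), (3, "IT", "I01", 300)])
    conn.executemany("INSERT INTO Compare VALUES (?, ?, ?, ?)",
                     [(1, "HR", "H01", 100), (2, "HR", "H02", 250), (3, "IT", "I01", 999)])

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    for row in conn.execute(QUERY):
        print(row)  # prints ('HR', 1, 1) then ('IT', 0, 1)
```

Each row of the result is one department, and each numeric column counts the mismatches for that sheet column, which matches the desired matrix layout.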

Row count based on a list of tables in Pentaho

I am using a Table Input step to retrieve a list of owners and tables from Oracle's ALL_TABLES. I then want to pass them to another step that, for each of these owner.table entries, performs a SELECT COUNT(*) FROM owner.table.
For the final result, I want something like:
OWNER - TABLE - COUNT
How could I do that in Pentaho?
Thanks in advance!
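The loop the question describes (read a list of tables, then run one SELECT COUNT(*) per entry) can be sketched outside Pentaho to show the shape of the output. This is a swapped-in illustration, not a Pentaho transformation: Python's sqlite3 plays the database, `sqlite_master` stands in for Oracle's ALL_TABLES, and every table is reported under the schema `main` because SQLite has no owners; `table_row_counts` is a hypothetical helper.

```python
import sqlite3

def quote_ident(name: str) -> str:
    # Table names cannot be bound as parameters, so quote the identifier instead.
    return '"' + name.replace('"', '""') + '"'

def table_row_counts(conn: sqlite3.Connection):
    """Return (owner, table, count) tuples, one COUNT(*) query per listed table."""
    # Step 1: the list of tables (sqlite_master plays the role of ALL_TABLES here).
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Step 2: for each entry, run SELECT COUNT(*) FROM owner.table.
    results = []
    for (name,) in tables:
        (count,) = conn.execute(
            f"SELECT COUNT(*) FROM main.{quote_ident(name)}"
        ).fetchone()
        results.append(("main", name, count))  # "main" stands in for the owner
    return results

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER)")
    conn.execute("CREATE TABLE customers (id INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,)])
    for owner, table, count in table_row_counts(conn):
        print(owner, "-", table, "-", count)
    # prints "main - customers - 0", then "main - orders - 2"
```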

How can I change row information in a Query?

I'm using Postgres and I'd like to know how to change row values within a query. Let's say I have a column called Numbers with rows 1, 2, 3, 4, 5. How could I change the values in those rows? For example, if I want the query to display 1, 1, 1, 1, 5, how would I write a query in which each row is changed to 1 unless it's 5? Again, I only want to change it within the query; I'm not trying to do an UPDATE. I realize how newbish this is on my part, but I couldn't find this on Google.
SELECT
CASE WHEN Numbers <> 5 THEN 1 ELSE Numbers END AS Numbers
FROM your_table -- "table" is a reserved word in PostgreSQL; use the real table name
See section 9.12, Conditional Expressions, of the PostgreSQL documentation.
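The CASE expression is standard SQL, so its effect can be shown with any SQL engine. This sketch assumes Python's sqlite3 instead of PostgreSQL, a hypothetical table `my_table`, and SQLite's `rowid` for ordering; it shows that only the query output changes while the stored rows stay as they were.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (Numbers INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?)", [(n,) for n in range(1, 6)])

# CASE rewrites the value only in the result set; no UPDATE is performed.
shown = [r[0] for r in conn.execute(
    "SELECT CASE WHEN Numbers <> 5 THEN 1 ELSE Numbers END AS Numbers "
    "FROM my_table ORDER BY rowid"
)]
stored = [r[0] for r in conn.execute("SELECT Numbers FROM my_table ORDER BY rowid")]

print(shown)   # prints [1, 1, 1, 1, 5]
print(stored)  # prints [1, 2, 3, 4, 5]
```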