PDI Kettle - How to Normalize Advanced Structure? - pentaho

I have 7 columns of data in a MySQL database. Year1 belongs to Revenue1, Year2 to Revenue2 and Year3 to Revenue3. I know how to handle this in SQL, but not in PDI. Can anyone describe how to do it?
MySQL table structure:
+--------+-------+-------+-------+----------+----------+----------+
| Ticker | Year1 | Year2 | Year3 | Revenue1 | Revenue2 | Revenue3 |
+--------+-------+-------+-------+----------+----------+----------+
| ABC    | 2010  | 2011  | 2012  | 250000   | 500000   | 1000000  |
+--------+-------+-------+-------+----------+----------+----------+
Desired normalized output from PDI:
+------------+------+-----------+---------+
| Ticker | Year | Keyfigure | Value |
+------------+------+-----------+---------+
| ABC        | 2010 | Revenue   | 250000  |
| ABC        | 2011 | Revenue   | 500000  |
| ABC        | 2012 | Revenue   | 1000000 |
+------------+------+-----------+---------+

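Have you tried using the Row Normaliser step? Give Year1 and Revenue1 the same type value (for example 1), Year2 and Revenue2 type 2, and Year3 and Revenue3 type 3, and map the Year fields to a new field Year and the Revenue fields to a new field Value. Fields that share a type value end up on the same output row; the constant Keyfigure column can be added afterwards, for example with an Add constants step.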
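Turning one wide row into one row per year is a wide-to-long (normalising) reshape. The Python sketch below mimics that kind of field-to-type mapping so you can see which input value lands in which output row. `FIELD_MAP` and `normalise` are hypothetical names for illustration, not PDI API, and the constant Keyfigure column stands in for a separate step.

```python
# Hypothetical mapping in the spirit of a Row Normaliser configuration:
# each (input field, type, new field) triple says which output row ("type")
# and which output column ("new field") an input value goes to.
FIELD_MAP = [
    ("Year1", "1", "Year"), ("Revenue1", "1", "Value"),
    ("Year2", "2", "Year"), ("Revenue2", "2", "Value"),
    ("Year3", "3", "Year"), ("Revenue3", "3", "Value"),
]


def normalise(row, key_field="Ticker", keyfigure="Revenue"):
    """Turn one wide row into one narrow row per type value."""
    out = {}
    for field, type_value, new_field in FIELD_MAP:
        # Fields sharing a type value are collected into the same output row.
        target = out.setdefault(type_value, {key_field: row[key_field]})
        target[new_field] = row[field]
    result = []
    for type_value in sorted(out):
        narrow = out[type_value]
        # Constant column, analogous to adding it in a separate step.
        narrow["Keyfigure"] = keyfigure
        result.append(narrow)
    return result


wide = {"Ticker": "ABC", "Year1": 2010, "Year2": 2011, "Year3": 2012,
        "Revenue1": 250000, "Revenue2": 500000, "Revenue3": 1000000}

for r in normalise(wide):
    print(r["Ticker"], r["Year"], r["Keyfigure"], r["Value"])
```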

Related

Outer join multiple tables keeping all rows in common columns

I'm quite new to SQL - hope you can help:
I have several tables that all have 3 columns in common: ObjNo, Date (year-month) and Product.
Each table has one other column that represents an economic value (sales, count, netsales, plan, ...).
I need to join all tables on the 3 common columns. The result must have one row for each existing combination of the 3 common columns. Not every combination exists in every table.
If I do full outer joins, I get ObjNo, Date, etc. once per table, but I only need them once.
How can I achieve this?
tblCount
+-------+--------+---------+-------+
| ObjNo | Date   | Product | count |
+-------+--------+---------+-------+
| 1     | 201601 | Snacks  | 22    |
| 2     | 201602 | Coffee  | 23    |
| 4     | 201605 | Tea     | 30    |
+-------+--------+---------+-------+
tblSalesPlan
+-------+--------+---------+-----------+
| ObjNo | Date   | Product | salesplan |
+-------+--------+---------+-----------+
| 1     | 201601 | Beer    | 2000      |
| 2     | 201602 | Snacks  | 2000      |
| 5     | 201605 | Tea     | 2000      |
+-------+--------+---------+-----------+
tblSales
+-------+--------+---------+-------+
| ObjNo | Date   | Product | Sales |
+-------+--------+---------+-------+
| 1     | 201601 | Beer    | 1000  |
| 2     | 201602 | Coffee  | 2000  |
| 3     | 201603 | Tea     | 3000  |
+-------+--------+---------+-------+
Thx
Devon
It sounds like you're using SELECT * FROM... which is giving you every field from every table. You probably only want to get the values from one table, so you should be explicit about which fields you want to include in the results.
If you're not sure which table is going to have a record for each case (i.e. there is not guaranteed to be a record in any particular table) you can use the COALESCE function to get the first non-null value in each case.
SELECT COALESCE(tbl1.ObjNo, tbl2.ObjNo, tbl3.ObjNo) AS ObjNo, ....
tbl1.Sales, tbl2.Count, tbl3.Netsales
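If your database does not support FULL OUTER JOIN (MySQL, for one, does not), a common alternative is to build the set of all existing key combinations with UNION and LEFT JOIN each table to it. The shared columns then come from the key set, so they appear only once. The sketch below runs this against an in-memory SQLite database via Python's built-in sqlite3 module, loaded with the question's sample data; on MySQL the WITH clause needs 8.0 or later, otherwise use a derived table in FROM instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCount     (ObjNo INTEGER, Date TEXT, Product TEXT, count INTEGER);
CREATE TABLE tblSalesPlan (ObjNo INTEGER, Date TEXT, Product TEXT, salesplan INTEGER);
CREATE TABLE tblSales     (ObjNo INTEGER, Date TEXT, Product TEXT, Sales INTEGER);
INSERT INTO tblCount VALUES (1,'201601','Snacks',22),(2,'201602','Coffee',23),(4,'201605','Tea',30);
INSERT INTO tblSalesPlan VALUES (1,'201601','Beer',2000),(2,'201602','Snacks',2000),(5,'201605','Tea',2000);
INSERT INTO tblSales VALUES (1,'201601','Beer',1000),(2,'201602','Coffee',2000),(3,'201603','Tea',3000);
""")

query = """
WITH keys AS (
    SELECT ObjNo, Date, Product FROM tblCount
    UNION                      -- UNION (not UNION ALL) drops duplicate key combinations
    SELECT ObjNo, Date, Product FROM tblSalesPlan
    UNION
    SELECT ObjNo, Date, Product FROM tblSales
)
SELECT k.ObjNo, k.Date, k.Product, c.count, p.salesplan, s.Sales
FROM keys AS k
LEFT JOIN tblCount     AS c ON c.ObjNo = k.ObjNo AND c.Date = k.Date AND c.Product = k.Product
LEFT JOIN tblSalesPlan AS p ON p.ObjNo = k.ObjNo AND p.Date = k.Date AND p.Product = k.Product
LEFT JOIN tblSales     AS s ON s.ObjNo = k.ObjNo AND s.Date = k.Date AND s.Product = k.Product
ORDER BY k.ObjNo, k.Date, k.Product
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # values missing from a table come back as None (SQL NULL)
```

Wrap a value column in COALESCE(c.count, 0) if you prefer zeros to NULLs.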

MSAccess Query: Generate two fields

I have an MS Access view that generates this result:
+-------+------------+-------+---------+--------+-------+
| Id | Date | Kind | Initial | Final | Total |
+-------+------------+-------+---------+--------+-------+
| 334AB | 01/04/2017 | Red | 199725 | 199789 | 64 |
| 334AB | 01/04/2017 | Green | 199789 | 199799 | 10 |
| 107AE | 01/04/2017 | Red | 73978 | 74074 | 96 |
| 107AE | 02/04/2017 | Green | 74074 | 74248 | 174 |
+-------+------------+-------+---------+--------+-------+
Generated with:
Group by ID, Date and Kind
Initial: Min(startKm)
Final: Max(endKm)
Total: Sum(Distance)
This is the query:
SELECT street.Id, street.Date, IIf(IsNull([agev]), Kind, Min(street.Initial) AS Iniziali, Max(street.Final) AS Finali, Sum(street.Distance) AS Total
FROM street
GROUP BY street.Id, street.Date, Kind
ORDER BY street.Date;
What I need is this result:
+-------+------------+---------+--------+----------+------------+-------+
| Id | Date | Initial | Final | TotalRed | TotalGreen | Total |
+-------+------------+---------+--------+----------+------------+-------+
| 334AB | 01/04/2017 | 199725 | 199799 | 64 | 10 | 74 |
| 107AE | 01/04/2017 | 73978 | 74074 | 96 | 0 | 96 |
| 107AE | 02/04/2017 | 74074 | 74248 | 0 | 174 | 174 |
+-------+------------+---------+--------+----------+------------+-------+
where Initial is the lowest Initial km recorded by that Id on that day,
and Final is the highest Final km recorded by that Id on that day.
What do you suggest?
thanks
Group by Id and Date only and use a conditional sum for each Kind, like this:
SELECT street.Id
,street.Date
,Min(street.Initial) AS Iniziali
,Max(street.Final) AS Finali
,SUM(IIF(street.Kind = 'Red',street.Distance,0)) AS TotalRed
,SUM(IIF(street.Kind = 'Green',street.Distance,0)) AS TotalGreen
,Sum(street.Distance) AS Total
FROM street
GROUP BY street.Id
,street.Date
ORDER BY street.Date;
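To check the conditional-aggregation idea outside Access, here is a sketch using Python's built-in sqlite3 module. It treats each row of the question's first result table as one row of the street table (an assumption for illustration) and uses CASE WHEN, the standard-SQL counterpart of IIF, because SQLite is the engine here; Access itself expects IIF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE street (Id TEXT, Date TEXT, Kind TEXT, "
    "Initial INTEGER, Final INTEGER, Distance INTEGER)"
)
# Each row of the question's first result table, treated as one street row.
conn.executemany(
    "INSERT INTO street VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("334AB", "01/04/2017", "Red", 199725, 199789, 64),
        ("334AB", "01/04/2017", "Green", 199789, 199799, 10),
        ("107AE", "01/04/2017", "Red", 73978, 74074, 96),
        ("107AE", "02/04/2017", "Green", 74074, 74248, 174),
    ],
)

query = """
SELECT Id, Date,
       MIN(Initial) AS Iniziali,
       MAX(Final)   AS Finali,
       -- CASE plays the role of IIF: add Distance only for one Kind
       SUM(CASE WHEN Kind = 'Red'   THEN Distance ELSE 0 END) AS TotalRed,
       SUM(CASE WHEN Kind = 'Green' THEN Distance ELSE 0 END) AS TotalGreen,
       SUM(Distance) AS Total
FROM street
GROUP BY Id, Date            -- Kind is no longer a grouping column
ORDER BY Date, Id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```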

How to remove 'Grand Total' from Excel when checking SSAS cube data

The Rank is also computed over the Grand Total row of the column (it gets rank 1). How can I remove that?
Rank is a calculation.
+---------------------+------------+------+
| Row Labels | Change | Rank |
+---------------------+------------+------+
| ADDERALL XR | 20,236.00 | 7 |
| ATOMOXETINE | 11,448.00 | 9 |
| BIPHENTIN | 87,007.00 | 4 |
| CONCERTA | 151,397.00 | 3 |
| CONCERTA | | 11 |
| DEXEDRINE | 2,065.00 | 10 |
| GENERIC ATOMOXETINE | 17,778.00 | 8 |
| INTUNIV XR | | 12 |
| METHYLPHENIDATE ER | 21,969.00 | 6 |
| METHYPHENIDATE ER | | 13 |
| RITALIN IR | 40,826.00 | 5 |
| RITALIN SR | -19,238.00 | 14 |
| STRATTERA | -19,555.00 | 15 |
| VYVANSE | 220,762.00 | 2 |
| Grand Total | 534,695.00 | 1 |
+---------------------+------------+------+
In the Excel sheet, right-click the area where the cube data is populated, click PivotTable Options, go to the Totals & Filters tab, and clear the "Show grand totals for columns" option; the Grand Total row will disappear. This removes totals for all columns, so wherever you still need a total you will have to add it yourself with an explicit formula.

How to get table name when using TABLE_DATE_RANGE

I would like to get daily statistics using TABLE_DATE_RANGE like this:
Select count(*), tableName
FROM
(TABLE_DATE_RANGE(appengine_logs.appengine_googleapis_com_request_log_,
DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'), CURRENT_TIMESTAMP()))
group by tableName
Is there any way to get a table name when using TABLE_DATE_RANGE?
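TABLE_DATE_RANGE(prefix, timestamp1, timestamp2) reads daily tables named <prefix><YYYYMMDD> whose dates fall in the range, so each day's data lives in a differently named table. The Python sketch below lists the names such a range would cover. It assumes both end days are included and uses a fixed date in place of CURRENT_TIMESTAMP(); `daily_table_names` is a hypothetical helper, not a BigQuery function.

```python
from datetime import date, timedelta

PREFIX = "appengine_googleapis_com_request_log_"


def daily_table_names(prefix, start, end):
    """List <prefix><YYYYMMDD> names for every day from start to end, inclusive."""
    names = []
    day = start
    while day <= end:
        names.append(prefix + day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return names


today = date(2017, 4, 8)  # assumed "current" date for a reproducible example
names = daily_table_names(PREFIX, today - timedelta(days=7), today)
print(names[0], names[-1], len(names))
```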
You need to query your dataset with a metadata query.
SELECT * FROM publicdata:samples.__TABLES__
WHERE MSEC_TO_TIMESTAMP(creation_time) < DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')
This returns:
+-----+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+
| Row | project_id | dataset_id | table_id        | creation_time | last_modified_time | row_count | size_bytes   | type |
+-----+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+
| 1   | publicdata | samples    | github_nested   | 1348782587310 | 1348782587310      | 2541639   | 1694950811   | 1    |
| 2   | publicdata | samples    | github_timeline | 1335915950690 | 1335915950690      | 6219749   | 3801936185   | 1    |
| 3   | publicdata | samples    | gsod            | 1335916040125 | 1413937987846      | 114420316 | 17290009238  | 1    |
| 4   | publicdata | samples    | natality        | 1335916045005 | 1413925598038      | 137826763 | 23562717384  | 1    |
| 5   | publicdata | samples    | shakespeare     | 1335916045099 | 1413926827257      | 164656    | 6432064      | 1    |
| 6   | publicdata | samples    | trigrams        | 1335916127449 | 1335916127449      | 68051509  | 277168458677 | 1    |
| 7   | publicdata | samples    | wikipedia       | 1335916132870 | 1423520879902      | 313797035 | 38324173849  | 1    |
+-----+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+
You can add conditions to the WHERE clause to restrict the result to certain tables, for example
WHERE table_id CONTAINS "wiki"
or, with a regular expression, WHERE REGEXP_MATCH(table_id, r"^foo[\d]{3,5}")
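The creation_time and last_modified_time values are milliseconds since the Unix epoch, which is why the query wraps them in MSEC_TO_TIMESTAMP. The Python sketch below applies the same kinds of filters client-side to a few of the rows above, assuming you already have the metadata rows as tuples. `re.search` with a `^`-anchored pattern stands in for REGEXP_MATCH, the `^github_` pattern is only an example, and the fixed `now` value is an assumption that keeps the result reproducible.

```python
import re
from datetime import datetime, timedelta, timezone

# (table_id, creation_time in ms since the Unix epoch), copied from the output above.
tables = [
    ("github_nested", 1348782587310),
    ("github_timeline", 1335915950690),
    ("shakespeare", 1335916045099),
    ("wikipedia", 1335916132870),
]


def msec_to_timestamp(ms):
    """Rough equivalent of MSEC_TO_TIMESTAMP: epoch milliseconds -> UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Fixed "current time" (assumption) so the example is reproducible.
now = datetime(2017, 4, 1, tzinfo=timezone.utc)
cutoff = now - timedelta(days=7)

older_than_week = [t for t, ms in tables if msec_to_timestamp(ms) < cutoff]
contains_wiki = [t for t, ms in tables if "wiki" in t]               # like CONTAINS "wiki"
github_tables = [t for t, ms in tables if re.search(r"^github_", t)]  # like REGEXP_MATCH

print(older_than_week, contains_wiki, github_tables)
```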

Hiding inside group columns from other columns that don't have values

I'm working on a report with a matrix. How do I get the date columns that sit outside the column group to appear next to the related value column inside the matrix?
For example, it is set up like this:
| HiredDt | TermDt | [Type] | LicDt | MedDt |
---------------------------------------------------------------------------------
ID | [HiredDt] | [TermDt] | SUM([Count_of_Type]) | [LicDt] | [MedDt] |
---------------------------------------------------------------------------------
And it looks like this:
| HiredDt | TermDt | Lic | Med | App | LicDt | MedDt |
----------------------------------------------------------------------------------------
1 | 1/31/12 | 1/31/14 | 1 | 1 | 12 | 6/1/15 | 9/1/14 |
2 | 2/19/12 | 9/18/14 | 1 | 1 | 12 | 3/2/15 | 9/1/14 |
But when I use an inside grouping to place each date next to its associated document type, I get:
| HiredDt | TermDt | Lic | | | Med | | | App | | |
----------------------------------------------------------------------------------------------------------------------------
1 | 1/31/12 | 1/31/14 | 1 | 6/1/15 | | 1 | | 9/1/2014 | 12 | | |
2 | 2/19/12 | 9/18/14 | 1 | 3/2/15 | | 1 | | 9/1/2014 | 12 | | |
What I'm trying to get is this:
| HiredDt | TermDt | Lic | LicDt | Med | MedDt | App |
--------------------------------------------------------------------------------------
1 | 1/31/12 | 1/31/14 | 1 | 6/1/15 | 1 | 9/1/14 | 12 |
2 | 2/19/12 | 9/18/14 | 1 | 3/2/15 | 1 | 9/1/14 | 12 |
Is this possible?
I would right-click on the cell you have labelled SUM([Count_of_Type]) and choose Insert Column - Inside Group - Right.
In that new cell, I would set the expression to: =Max(Fields!LicDt.Value)