I have the following PDI transformation:
Table input --> Select values --> Microsoft Excel writer
This transformation fetches the previous week's data from the database, and I'd like to save the Excel file with the week number in the filename. How can I insert the previous week (e.g. 2021W40) into the filename for the Microsoft Excel writer step?
I think the best solution is to pass the previous week to the transformation as a parameter, calculated in a previous transformation with whatever method you prefer:
You can calculate the previous week with SQL in a Table input step, or with date functions in a Modified Java Script Value step, whichever suits you best. Then use the Set Variables step to set the value, pass that value to the following transformation, and use it when you fill in the file name.
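For reference, here is a minimal sketch of the week calculation itself, written in Python outside PDI just to show the logic. It assumes ISO-8601 week numbering and the 2021W40 format from the question; the function name `previous_week_label` is made up for illustration.

```python
from datetime import date, timedelta


def previous_week_label(today: date) -> str:
    """Return the ISO week before `today`'s week, formatted like 2021W40."""
    # Going back 7 days always lands in the previous ISO week.
    iso_year, iso_week, _ = (today - timedelta(days=7)).isocalendar()
    # Use the ISO year, not today.year, so early January maps to the
    # last week of the previous year.
    return f"{iso_year}W{iso_week:02d}"


print(previous_week_label(date(2021, 10, 13)))
```

Whatever step does the calculation, the year rollover is the part worth checking: the week number and the year have to come from the same date.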
I am loading data from an Excel file into a SQL table using SSIS (VS2013).
How do I extract the column names into a temp table?
In the attached image, there is a "2017 Marketing Sales - Actual" column in the first row. I want to extract the YYYY from the column name, pass that value to a variable, and update the table field with the YYYY info.
Can anyone help me with this?
In your Excel Connection Manager, uncheck the "First row has column names" check box. This should allow you to access that first row. You'll need to set up a Data Flow Task using this Excel Connection Manager, followed by a Derived Column to extract the left 4 characters from that first row. Unless you somehow limit the rest of the Excel data source, you'll probably also get a lot of extraneous rows importing into your destination, so you might need to do some cleanup to get it down to just that year.
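The string logic behind that derived column is small. Here is a sketch in Python (not SSIS expression syntax), assuming the header always begins with a four-digit year as in the example; `extract_year` is a hypothetical helper name.

```python
import re


def extract_year(header: str) -> int:
    """Pull a leading four-digit year out of a header cell."""
    match = re.match(r"\s*(\d{4})\b", header)
    if match is None:
        # Fail loudly rather than store a wrong year.
        raise ValueError(f"no leading year in header: {header!r}")
    return int(match.group(1))


print(extract_year("2017 Marketing Sales - Actual"))
```

Validating that the first four characters really are digits is the main difference from a plain "left 4 characters" cut.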
I am creating a transformation in Pentaho DI to extract data from Google Analytics. In "Query Definition" I need to set the start date and end date to yesterday and today. I understand this can be done by creating two variables, e.g. ${today} and ${yesterday}. However, I don't know how to make these change values dynamically on every run. Any idea how to do this?
Thanks,
I can think of an easy way to do this. The first thing to know is that you can't declare and use variables in the same transformation. I would suggest approaching the problem in the following way:
Create a transformation before this one, say "set variables transformation". In this transformation you will set the variables.
You can use the Get System Info step to get today's and yesterday's dates for the variables. Use the Copy rows to result step to pass these rows to the next transformation.
In the next transformation, which will be the one in your screenshot, use the Get Variables step and use these variables in your input step. Alternatively, you can use the Get rows from result step.
You don't need to worry about the dates now, because dates will be generated and your variables get the values dynamically.
You can check this article if you want to learn more about how to pass the values from one transformation to another:
https://anotherreeshu.wordpress.com/2014/12/23/using-copy-rows-to-result-in-pentaho-data-integration/
Hope it helps!
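To make the "generated on every run" part concrete, here is a small Python sketch of the two values, computed outside PDI. The yyyy-MM-dd format and the function name `query_dates` are assumptions for illustration; use whatever format your query definition expects.

```python
from datetime import date, timedelta


def query_dates(run_date: date) -> dict:
    """Return the two dates for a run, formatted as yyyy-MM-dd strings."""
    return {
        "yesterday": (run_date - timedelta(days=1)).isoformat(),
        "today": run_date.isoformat(),
    }


# Computed fresh on every run, so nothing is hard-coded.
print(query_dates(date.today()))
```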
For that, you have to use a job. Add the first transformation and, inside it, use the Get System Info step to get today's and yesterday's dates, then link it to the Set Variables step. Set the scope of the variables to parent job.
In the second transformation, use the **Get Variables** step.
It took me a while to solve this myself and the way I ended up doing it is as follows:
I created a transformation (called 'set formatted_today variable') that contains two objects.
The first is a 'table input' object with a query like:
select to_char(current_timestamp, 'YYYY-MM-DD-HH-MI') as formatted_today
The output of my 'table input' goes to a 'set variables' object. You can use the 'get fields' button to wire the fields you've named in your query to the variable you want to set. In this case, my field is called 'formatted_today' and so is my variable.
In my main job, I have a 'set session variables' object that creates my 'formatted_today' variable.
Immediately after it, I call my 'set formatted_today variable' transformation
Anywhere I need this variable I insert ${formatted_today} in the text
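If it helps to see the two pieces side by side, here is a rough Python sketch of the formatting and of the ${formatted_today} substitution. It only imitates the behaviour outside PDI; the file name `export_${formatted_today}.csv` and the helper names are made up for illustration.

```python
from datetime import datetime
from string import Template


def format_today(now: datetime) -> str:
    # %I is the 12-hour clock, mirroring HH in the to_char mask above
    # (in PostgreSQL and Oracle, HH24 would be the 24-hour variant).
    return now.strftime("%Y-%m-%d-%I-%M")


def expand(text: str, variables: dict) -> str:
    # string.Template happens to use the same ${name} placeholder syntax.
    return Template(text).substitute(variables)


variables = {"formatted_today": format_today(datetime(2021, 10, 13, 15, 7))}
print(expand("export_${formatted_today}.csv", variables))
```

Note that with a 12-hour mask a morning run and an afternoon run can produce the same string, so a 24-hour mask may be safer if the value ends up in file names.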
I have just started learning Pentaho Spoon and have run into a problem. I need to transform the data from an xls file and load it into a database. The problem is that my input file looks like this: table-description
I cannot find out how to solve two problems:
For my next step I need to save not only the table itself (range A8:D11), but also the date (cell A5). When I try to do this in Pentaho with the Microsoft Excel Input step, it works only when I select cell A8 as the start row, but then the date is not saved.
In the Microsoft Excel Input step I must always select a start row in order to generate a table and use it in the next steps, and I must do it manually, i.e. state that my table starts at cell A8. In my case I cannot always say for sure that the table starts at cell A8. I only know that the start cell is the cell in column A with the value "Date". The Microsoft Excel Input step will be the first step in my transformation because I must get the data before I can change it, which is why I think I cannot put a JavaScript step before it.
I have not found a solution to these two problems and I do not know whether one is possible. I will be grateful for any help.
I am not sure what you mean by converting an Excel file to a database, but if you can convert the xls to csv and read that file, then you can work out from which row you need to keep the data. Basically, you can use a simple Filter rows step to find the row where the value matches the column name. I hope this helps.
Use two Microsoft Excel Input steps. One step reads the table (A8:D11). The other step reads the date (A5). Then merge the two streams, for example using a Join Rows (cartesian product) step
Read everything. Then use a Javascript step with two script tabs. For one of the tabs: Right-click and choose Set start script. Code : var start = 0; The other tab should be kept as a transformation script. Pseudocode: if(FieldA equals "Date") {start = 1;}. Now you will have an additional field in the stream called start. If start equals 0, then you know that your tabular data hasn't started yet, and you can filter out the row.
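The start-flag idea can be sketched outside PDI like this (Python, not the JavaScript step itself). The sample rows and the name `flag_table_rows` are invented for illustration; only the "Date" marker in column A comes from the question.

```python
def flag_table_rows(rows):
    """Yield (row, start) pairs; start flips to 1 at the "Date" header row."""
    start = 0  # plays the role of the start script: initialised once
    for row in rows:
        # Per-row logic: once column A equals "Date", the flag stays set.
        if row and row[0] == "Date":
            start = 1
        yield row, start


sheet = [
    ["Report"],
    ["2021-10-04"],
    ["Date", "Item", "Qty"],
    ["2021-10-01", "apples", 3],
]
# Keep only rows at or after the header, as a filter step would.
table = [row for row, start in flag_table_rows(sheet) if start == 1]
print(table)
```

The header row itself carries start = 1, so it still has to be dropped or treated as column names downstream.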
I am currently entering data into a SQL Server database using SSIS. The plan is for it to do this each week but the day that it happens may differ depending on when the data will be pushed through.
I use SSIS to grab data from an Excel worksheet and enter each row into the database (about 150 rows per week). The only thing all the rows have in common is the date. I want to add a date to each of the rows on the day they get pushed through. Because the push day may differ, I can't use the current date; instead, I want to use one week after the previous date entered.
But because there are about 150 rows, I don't know how to achieve this. It would be nice if I could set this up in SQL Server so that every time a new set of rows is entered, it adds 7 days to the date of the previous set of rows. But I would also be happy to do this in SSIS.
Does anyone have any clue how to achieve this? Alternatively, I don't mind doing this in C# either.
Here's one way to do what you want:
Create a column for tracking the data entry date in your target table.
Add an Execute SQL Task before the Data Flow Task. This task will retrieve the latest data entry date + 7 days. The query should be something like:
select dateadd(day,7,max(trackdate)) from targettable
Assign the SQL result to a package variable.
Add a Derived Column Transformation between your Source and Destination components in the Data Flow Task. Create a dummy column to hold the tracking date and assign the variable to it.
When you map the Excel columns to the table in the Data Flow Task, map the dummy column created earlier to the tracking date column. Now when you write the data to the DB, your tracking column will have the desired date.
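The steps above boil down to the following logic, shown here as a Python sketch (the question mentions C# is also acceptable; this is only to illustrate the flow). The sample dates, the row contents and the name `next_track_date` are assumptions for illustration.

```python
from datetime import date, timedelta


def next_track_date(existing_dates):
    """Return the latest tracking date plus 7 days, or None for an empty table."""
    if not existing_dates:
        return None  # MAX() over no rows is NULL in SQL as well
    return max(existing_dates) + timedelta(days=7)


previous = [date(2021, 9, 27), date(2021, 10, 4)]
new_rows = [{"value": "a"}, {"value": "b"}]
stamp = next_track_date(previous)
# Every row in the batch gets the same derived value.
stamped = [{**row, "trackdate": stamp} for row in new_rows]
print(stamped)
```

The empty-table case matters for the very first load: the query returns NULL there, so the package needs a seed date for that run.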
I am using Kettle Spoon for transformation.
How do I supply a fixed input date with 'Get System Info'? I see options for selecting yesterday, a month ago, etc., but I want to set a fixed date manually, such as '2012-12-14'.
I get a csv from a 'text file output' step in my transformation. The output columns are, say, A, B, C, D, E. I want to keep only A, B, D, E.
How do I filter the 'text file output' and select only the desired columns to get my data into the final table?
Thank you in advance.
1) Use a Select values step right after the "Get system info" step. In the Meta-data tab choose the field, use type Date and choose the desired format mask (yyyy-MM-dd).
2) If you need to filter columns, i.e. drop some columns from the output, again use a Select values step; if you need to filter rows based on the values contained in a field/column, use a Filter rows step.
I'm guessing you want to add hard coded dates, rather than reformat existing dates. If that's the case, just use an Add Constants step. Set the column type to Date. If you need to do it as a source step you can use a Data Grid or Generate Rows step.
If you want to remove columns from a text file output, you can use a Select Values step as @andtorg said, but you can also simply remove the columns from the Fields tab of the Text File Output step.
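Either way, the effect on the file is the same. As a quick illustration outside PDI (Python, with made-up sample data and a hypothetical helper `select_columns`), dropping column C looks like this:

```python
import csv
import io


def select_columns(csv_text: str, keep: list) -> str:
    """Return csv_text with only the `keep` columns, in the given order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops fields not listed in `keep`.
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


sample = "A,B,C,D,E\n1,2,3,4,5\n"
print(select_columns(sample, ["A", "B", "D", "E"]))
```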
Use Modified Javascript Value
Add a Modified Java Script Value step in PDI and choose whatever format you want.