Im new to the pentaho suite and its automation functionality. i have files that come in on a daily basis and two columns need to be put in place. I have figured out how to add the columns but now i am stuck on the automation side of things. The filename is constant but it has a datestamp at the end. EG: LEAVER_REPORT_NEW_20110623.csv. The file will always be in the same directory. How do i go about using Pentaho data integration to solve this issue? ive tried get files but that doesnt seem to work.
create a variable in a previous transform which contains 20110623 (easy with a get system info step to get the date, and then a select values step to format to string, then a set variables step)
then change the filename of the textfile input to use:
LEAVER_REPORT_NEW_${variablename}.csv
Related
The column contains dates in the following format:
dd/mm/yyyy
I've tried DATE, TIMESTAMP etc but whatever I do, I can't seem to upload the file.
In the classic UI you can click on the Load Tablebutton and follow the dialogs to upload your file. It is a bit hard to find. Click on Databases right to the big snowflake icon. Then select a database and a table. Then you should the the button. In the wizard there will be a step for defining the 'File Format'. There, you have to scroll down to define the date format. (see Classic snowflake UI)
Without the classic UI you have to install SnowSQL on your device first (https://docs.snowflake.com/en/user-guide/snowsql-install-config.html).
Start SnowSQL and apply the following steps:
Use the database where to upload the file to. You need various privileges for creating a stage, a fileformat, and a table. E.g. USE TEST_DB
Create the fileformat you want to use for uploading your csv file. E.g.
CREATE FILE FORMAT "TEST_DB"."PUBLIC".MY_FILE_FORMAT TYPE = 'CSV' DATE_FORMAT = 'dd/mm/yyyy';
Create a stage using this fileformat
CREATE STAGE MY_STAGE file_format = "TEST_DB"."PUBLIC".MY_FILE_FORMAT;
Now you can put your file to this stage
PUT file://<file_path>/file.csv #MY_STAGE;
You can check the upload with
SELECT d.$1, ..., d.$N FROM #MY_STAGE/file.csv d;
Then, create your table.
Copy the content from your stage to your table. If you want to transform your data at this point, you have to use an inner select. If not then the following command is enough.
COPY INTO mycsvtable from #MY_STAGE/file.csv;
You can find documentation for configuring the fileupload at https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html
You can find documentation for configuring the stage at https://docs.snowflake.com/en/sql-reference/sql/create-stage.html
You can find documentation for copying the staged file into a table https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
I recommend that you upload your file with disabled automatically date detection OR that your initial table has a string column instead of a date column. IMHO, it is easier to transform your upload afterwards using the try_to_date function. Using this function it is much easier to handle possible parsing errors.
e.g. SELECT try_to_date(date_column, 'dd/mm/yyyy') as MY_DATE, IFNULL(MY_DATE, date_column) as MY_DATE_NOT_PARSABLE FROM upload;
You see that it is pretty much to do for loading a simple CSV file to Snowflake. It becomes even more complicated when you take into account that every step can cause some specific failures and that your file might contain erroneous lines. This is why my team and I are working at Datameer to make these types of tasks easier. We aim for a simple drag and drop solution that does most of the work for you. We would be happy if you would try it out here: https://www.datameer.com/upload-csv-to-snowflake/
I'm trying to find a specific string on my database. I'm currently using FlameRobin to open the FDB file, but this software doesn't seems to have a properly feature for this task.
I tried the following SQL query but i didn't work:
SELECT
*
FROM
*
WHERE
* LIKE '126278'
After all, what is the best solution to do that? Thanks in advance.
You can't do such thing. But you can convert your FDB file to a text file like CSV so you can search for your string in all the tables/files at the same time.
1. Download a database converter
First step you need a software to convert you databse file. I recommend using Full Convert to do it. Just get the free trial and download it. It is really easy to use and it will export each table in a different CSV file.
2. Find your string in multiple files at the same time
For that task you can use the Find in files feature of Notepad++ to search the string in all CSV files located at the same folder.
3. Open the desired table on FlameRobin
When Notepad++ highlight the string, it shows in what file it is located and the number of the line. Full Convert saves each CSV with the same name as the original table, so you can find it easily whatever database manager software you are using.
Here is Firebird documentation: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25.html
You need to read about
Stored Procedures of "selectable" kind,
execute statement command, including for execute statement variant
system tables, having "relation" in names.
Then in your SP you do enumerate all the tables, then you do enumerate all the columns in those tables, then for every of them you run a usual
select 'tablename', 'columnname', columnname
from tablename
where columnname containing '12345'
over every field of every table.
But practically speaking, it most probably would be better to avoid SQL commands and just to extract ALL the database into a long SQL script and open that script in Notepad (or any other text editor) and there search for the string you need.
I have a transformation that is successfully writing the first row to the log file.
However the same transformation is not writing the first row to a text file.
The text file remains blank.
Does anyone know why this may be?
edited - only focusing on the applications to run and set pm variable transformations, as the other transformations are replications of set pm variable but for different fields
It looks like your Set Variables step is distributing its rows over the two follow-up steps in a round-robin way, which is the default setting in PDI.
Right-click the Set Variables step and under Data Movement, select Copy. That will send all rows to BOTH steps. You should see a documents icon on the hops then.
I'm trying to use oracle external tables to load flat files into a database but I'm having a bit of an issue with the location clause. The files we receive are appended with several pieces of information including the date so I was hoping to use wildcards in the location clause but it doesn't look like I'm able to.
I think I'm right in assuming I'm unable to use wildcards, does anyone have a suggestion on how I can accomplish this without writing large amounts of code per external table?
Current thoughts:
The only way I can think of doing it at the moment is to have a shell watcher script and parameter table. User can specify: input directory, file mask, external table etc. Then when a file is found in the directory, the shell script generates a list of files found with the file mask. For each file found issue a alter table command to change the location on the given external table to that file and launch the rest of the pl/sql associated with that file. This can be repeated for each file found with the file mask. I guess the benefit to this is I could also add the date to the end of the log and bad files after each run.
I'll post the solution I went with in the end which appears to be the only way.
I have a file watcher than looks for files in a given input dir with a certain file mask. The lookup table also includes the name of the external table. I then simply issue an alter table on the external table with the list of new file names.
For me this wasn't much of an issue as I'm already using shell for most of the file watching and file manipulation. Hopefully this saves someone searching for ages for a solution.
I have files abc.xlsx, 1234.xlsx, and xyz.xlsx in some folder. My requirement is to develop a transformation where the Microsoft Excel Input in PDI (Pentaho Data Integration) should only pick the file based on the output of a sql query. If the output query is abc.xlsx. Microsoft Excel Input should pick of abc.xlsx for further processing. How do I achieve this? Would really appreciate your help. Thanks.
Transformations in Kettle run asynchronously, so you're probably looking into needing a job for this.
Files to create
Create a transformation that performs the SQL query you're looking for and populates a variable based on the result
Create a transformation that pulls data from the Excel file, using the variable populated as the filename
Create a job that executes the first transformation, then steps into the second transformation
Jobs run sequentially, so it will execute the first transformation, perform the query, get the result, and set a variable. Variables need to be set and retrieved in different transformations because of their asynchronous nature. This is the reason for the second transformation; the job won't step into the second transformation until the first one is done running (therefore, not until the variable is populated).
This is all assuming you only want to run the transformation once, expecting a single result from the query. If you want to loop it, pulling data from a set, then setup is a little bit different.
The Excel input step has a "accept filenames from previous step" option. You can have a table input build the full path of the file you want to read (or you somehow build it later knowing the base dir and the short filename), pass the filename to the excel input, tick that box and specify the step and the field you want to use for the filename.