How to reorder the columns of a CSV input with fixed columns in Pentaho

Scenario:
I have created a transformation to load data into a table from a CSV file, and the CSV file has the following columns:
Customer_Id
Company_Id
Employee_Name
But the user may supply an input file with the columns in a different (random) order, for example:
Employee_Name
Company_Id
Customer_Id
So, if I load a file whose columns are in a random order, will Kettle load the correct values into each column based on the column names?

Using ETL Metadata Injection, you can use a template transformation that either normalizes the data or stores it in your database.
Then you just need to send the correct data to that transformation. You can read the header line from the CSV and use the Row Normaliser step to convert it to the format used by ETL Metadata Injection.
I have included a quick example: csv_inject on Dropbox. If you build something like this and run it from a parent that executes it once per CSV file, it should work.
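To make the header-reading part concrete, here is a plain JavaScript sketch (not a PDI step) that turns the single header line into one record per column, similar in spirit to what the Row Normaliser step is used for here. The function name headerToFieldRows and the { fieldName, position } record shape are hypothetical, and the sketch assumes a header without quoted commas.

```javascript
// Turns one CSV header line into one record per column.
// Hypothetical helper: the { fieldName, position } shape is chosen for this sketch.
// Assumes a plain comma-separated header with no quoted commas.
function headerToFieldRows(headerLine) {
  return headerLine
    .split(",")
    .map(function (name, index) {
      return { fieldName: name.trim(), position: index };
    });
}

const headerRows = headerToFieldRows("Employee_Name,Company_Id,Customer_Id");
headerRows.forEach(function (r) {
  console.log(r.fieldName + " -> column " + r.position);
});
// Employee_Name -> column 0
// Company_Id -> column 1
// Customer_Id -> column 2
```

Each record then tells the template which source column feeds which target field, whatever order the file uses.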

Ooh, that's some nasty JavaScript!
The way to do this is with metadata injection. Look at the samples, but basically you need a template transformation that reads the file and writes it back out. You then use a parent transformation to figure out the headings, configure that template, and then execute it.
There are samples in the PDI samples folder; also take a look at the "figuring out file format" example in Matt Casters' blueprints project on GitHub.

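You could try something like this as the script of a Modified Java Script Value step:
// Assumes the input step reads the header line as an ordinary row,
// so the first row contains the column names.
var seen;     // script variables keep their values between rows in this step
trans_Status = CONTINUE_TRANSFORMATION;
var col_names = ['Customer_Id', 'Company_Id', 'Employee_Name'];
var col_pos;  // position of each wanted column in this file
if (!seen) {
  // First line: find where each wanted column is, then skip the header row
  trans_Status = SKIP_TRANSFORMATION;
  seen = 1;
  col_pos = [-1, -1, -1];
  for (var i = 0; i < col_names.length; i++) {
    for (var j = 0; j < row.length; j++) {
      if (row[j] == col_names[i]) {
        col_pos[i] = j;
        break;
      }
    }
    if (col_pos[i] === -1) {
      // A required column is missing: log it and stop the transformation
      writeToLog("e", "Cannot find " + col_names[i]);
      trans_Status = ERROR_TRANSFORMATION;
      break;
    }
  }
}
// Read the values by the positions found in the header, not by fixed order
var Customer_Id = row[col_pos[0]];
var Company_Id = row[col_pos[1]];
var Employee_Name = row[col_pos[2]];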
Here is the .ktr I tried: csv_reorder.ktr
Edit: here are the test CSV files:
1.csv:
Customer_Id,Company_Id,Employee_Name
cust1,comp1,emp1
2.csv:
Employee_Name,Company_Id,Customer_Id
emp2,comp2,cust2
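The same look-up-by-header-name idea, outside Pentaho, as a small Node.js sketch that can be run against the two test files above. The function name reorderCsv is hypothetical, and it assumes simple CSV (comma separators, no quoted fields).

```javascript
// Reorders CSV columns into the order given by `wanted`, locating each column
// by its header name. Assumes no quoted fields; accepts "\n" or "\r\n" line ends.
function reorderCsv(text, wanted) {
  const lines = text.split(/\r?\n/).filter(function (l) { return l.length > 0; });
  const header = lines[0].split(",");
  // Position of each wanted column in this particular file
  const positions = wanted.map(function (name) {
    const pos = header.indexOf(name);
    if (pos === -1) {
      throw new Error("Cannot find " + name); // same check as the script above
    }
    return pos;
  });
  const out = [wanted.join(",")];
  for (const line of lines.slice(1)) {
    const values = line.split(",");
    out.push(positions.map(function (p) { return values[p]; }).join(","));
  }
  return out.join("\n");
}

const wantedCols = ["Customer_Id", "Company_Id", "Employee_Name"];
console.log(reorderCsv("Employee_Name,Company_Id,Customer_Id\nemp2,comp2,cust2\n", wantedCols));
// Customer_Id,Company_Id,Employee_Name
// cust2,comp2,emp2
```

Both 1.csv and 2.csv come out with the same column order, which is what the target table load needs.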

Assuming rejecting the input file is not an option, you basically have four solutions:
1. Reorder the fields in an external editor (don't use Excel if the file contains dates).
2. Use code within your transformation to detect the column headers and reorder the file.
3. Use metadata injection, as proposed by bolav.
4. Create a job. This needs to:
   a. load the file into a temporary database table;
   b. use an SQL SELECT that lists the fields in the required order (the column order comes from the SELECT list; ORDER BY only sorts the rows);
   c. output the file in the correct order.

Related

Incomplete data upload to SQL server using perl script

I have a script that reads a two-column CSV, which I generate from the current library of files I am accessing. I chomp the file into an array, then upload the CSV element by element to an SQL table using the code below:
# Empty the table before reloading it
my $e_sth = $trans->prepare("delete from $table where 1 = 1");
$e_sth->execute();
my $u_sql = qq{
    insert into $table (KTM, SUBSITE) values
    (?, ?)
};
my $u_sth = $trans->prepare($u_sql);
# @list holds the chomped lines of the CSV file
foreach my $rec (@list) {
    my ($klf, $subsite) = split(",", $rec);
    # Upload one record through the prepared statement
    $u_sth->execute($klf, $subsite);
}
$trans->commit();
The problem is always with the last line of the file, so I'm sure that will tip some of you off to the issue right away. I am lost, though: I assumed the foreach loop would prevent something like this from happening. The number of characters truncated from the last line also varies. Both columns are defined as CHAR(100), with the KTM column as the primary key.
Any help or guidance is appreciated. Thanks!
EDIT: It looks like I might be reaching some sort of data limit for the table. I'll look into expanding the allocated size of the table.
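One way to narrow this down is to check the records before they are inserted. The sketch below is a hypothetical diagnostic (findSuspectRecords is not from the question), written in Node.js: it splits each record the way the Perl code does and reports values longer than the CHAR(100) limit, records that don't have exactly two fields, and values with stray leading or trailing whitespace such as a "\r". It does not claim to identify the actual cause.

```javascript
// Hypothetical diagnostic: flag records that may not load cleanly into two
// CHAR(100) columns. The 100-character limit comes from the question.
const MAX_LEN = 100;

function findSuspectRecords(records) {
  const problems = [];
  records.forEach(function (rec, index) {
    const parts = rec.split(",");
    if (parts.length !== 2) {
      problems.push(index + ": expected 2 fields, got " + parts.length);
    }
    // Only the first two fields are used, like my ($klf, $subsite) in Perl
    parts.slice(0, 2).forEach(function (value) {
      if (value.length > MAX_LEN) {
        problems.push(index + ": value longer than " + MAX_LEN + " characters");
      }
      if (value !== value.trim()) {
        problems.push(index + ": leading/trailing whitespace in " + JSON.stringify(value));
      }
    });
  });
  return problems;
}

console.log(findSuspectRecords(["KTM1,siteA", "KTM2,siteB\r"]));
```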

Pentaho JsonInput GET fields

I'm trying to use PDI to read data from an API (JSON). For now I'm simply trying to use the JSON Input step to get a few specific fields, but the Get Fields button on the input step gives me:
ERROR (version 8.3.0.0-371, build 8.3.0.0-371 from 2019-06-11 11.09.08 by buildguy) : Index 1 out of bounds for length 1
All the steps execute fine and produce data; it's just the JSON Input step that won't give me the fields option! I've tried the Text File Output and JSON Output steps, and both write valid JSON, so I don't know what's going on.
PS. this is my first time using PDI
ISSUE 2:
It looks like PDI uses Jayway for its JSONPath parsing, so I've been using https://jsonpath.herokuapp.com/ with the Jayway implementation selected, which gives me the path I expect. When I put that path into the 'Fields' of the JSON Input dialog, I only get the FIRST value matching that path instead of every match, and I can't figure out why. I assume it has something to do with PDI's row-based view of things, but I don't know how to make it understand that this is JSON and return all values that match that path.
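For reference, this is what "every instance" means for a deep-scan path such as $..name: each match becomes one element of a list, and in PDI each element would then have to become its own row. The sketch below is a plain recursive walk in Node.js (collectAll is a hypothetical name), not Jayway's or PDI's implementation.

```javascript
// Collects every value stored under `key` anywhere in a parsed JSON document,
// the way a deep-scan JSONPath like $..name returns all matches as a list.
function collectAll(node, key, found) {
  found = found || [];
  if (Array.isArray(node)) {
    node.forEach(function (item) { collectAll(item, key, found); });
  } else if (node !== null && typeof node === "object") {
    Object.keys(node).forEach(function (k) {
      if (k === key) {
        found.push(node[k]);
      }
      collectAll(node[k], key, found); // keep searching below this level
    });
  }
  return found;
}

const doc = JSON.parse('{"items":[{"name":"a"},{"name":"b"}],"meta":{"name":"c"}}');
console.log(collectAll(doc, "name")); // [ 'a', 'b', 'c' ]
```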
UPDATE 1:
I've been looking at this https://forums.pentaho.com/threads/135882-Parsing-JSON-data-without-knowing-field-names/ it seems like this Modified Java Script Value step might be the way to go. Will continue testing.
UPDATE 2
OK, I used the Modified Java Script Value step as posted above, along with a select fields step, and was finally able to get the keys:
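// Parse the JSON text held in the incoming field mydata
var obj = JSON.parse(mydata);
var keys = Object.keys(obj);
// Emit one new row per top-level key
for (var i = 0; i < keys.length; i++) {
  // Copy of the current input row, resized to the output row layout
  var outRow = createRowCopy(getOutputRowMeta().size());
  // The new field comes right after the input fields
  var idx = getInputRowMeta().size();
  outRow[idx++] = keys[i];
  putRow(outRow);
}
// Skip the original input row; only the rows passed to putRow continue
trans_Status = SKIP_TRANSFORMATION;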
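The same one-row-per-key idea as a standalone Node.js sketch, extended to carry the value next to each key (nested values are serialized back to JSON text). The function name keysToRows and the [key, value] row shape are assumptions for illustration; inside PDI the rows would still be emitted with putRow as above.

```javascript
// One output "row" per top-level key, as [key, value].
// Objects and arrays are turned back into JSON text so every value is a scalar.
function keysToRows(jsonText) {
  const obj = JSON.parse(jsonText);
  return Object.keys(obj).map(function (key) {
    const value = obj[key];
    const isNested = typeof value === "object" && value !== null;
    return [key, isNested ? JSON.stringify(value) : value];
  });
}

console.log(keysToRows('{"id": 7, "tags": ["x", "y"]}'));
```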

oxid import old data SQL

I have an old OXID version. I exported my old SEO data from the table "oxseo" to get the keywords and description for each article. Now I want to import these fields into the new version of my shop. My articles are already there, but the SEO data is not.
My first idea was to collect all the data I need from a CSV export of my old data.
For example, my output array could look something like:
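$oxid = array();
$keywords = array();
$desc = array();
foreach ($line as $i => $l) {
    // $current_keyword, $current_description and $current_oxid stand for
    // the values read from row $l of the old CSV export
    $keywords[$i] = $current_keyword;
    $desc[$i] = $current_description;
    $oxid[$i] = $current_oxid;
}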
So let's just assume I already have my filled arrays.
If I check the OXIDs, they are still the same: picking a random OXID from my exported CSV and looking for it in my new DB shows me the correct article.
My first thought was to look in oxobject2seodata. I know the data for the articles is stored there, but I can't find a way to connect the two, since the "oxid" from the old version is not the same as the objectId in the new version. In oxarticles, however, there is no "objectId".
Thank you in advance for any hints and tips.
The field OXID in oxarticles table should match the field OXOBJECTID in oxobject2seodata table.
SELECT oa.OXID, o2s.* from oxobject2seodata o2s, oxarticles oa WHERE o2s.OXOBJECTID = oa.OXID AND oa.OXID = '[OXID-of-article]';
-- or
SELECT o2s.* from oxobject2seodata o2s WHERE o2s.OXOBJECTID = '[OXID-of-article]';

SSIS filename - file count

I'm currently creating a flat file export for one of our clients. I've managed to get the file into the format they want, and now I'm looking for the easiest way to create a dynamic file name. I've got the date and the path as variables, but they also want a count in the file name. For example:
File name 1: TDY_11-02-2013_{1}_T1.txt, where the number in {} is the count. So next week's file would be TDY_17-02-2013_{2}_T1.txt.
I can't see an easy way of doing this! Any ideas?
EDIT:
In my first answer, I thought you meant the count of rows returned by a query. My bad!
There are two ways to achieve this. You could loop through the destination folder, select the latest file by date, read its number and increase it by 1, which sounds like a lot of trouble. Why not keep a simple log table in the database with the last execution date and ID, and compose your file name from the last row of that table?
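A sketch of the log-table idea in Node.js: take the last logged ID, add 1, and build the name in the dd-MM-yyyy format of the example file names. The log table itself, the { lastId } row shape and the function name nextFileName are hypothetical; in SSIS the same logic would typically live in an expression or Script Task.

```javascript
// Builds the next file name from the last row of a (hypothetical) log table.
// lastLogRow is { lastId: number } or null when nothing has been logged yet.
function pad2(n) {
  return String(n).padStart(2, "0");
}

function nextFileName(lastLogRow, runDate) {
  const next = lastLogRow ? lastLogRow.lastId + 1 : 1; // first run starts at 1
  const date = pad2(runDate.getDate()) + "-" + pad2(runDate.getMonth() + 1) + "-" + runDate.getFullYear();
  return "TDY_" + date + "_{" + next + "}_T1.txt";
}

console.log(nextFileName(null, new Date(2013, 1, 11)));          // TDY_11-02-2013_{1}_T1.txt
console.log(nextFileName({ lastId: 1 }, new Date(2013, 1, 17))); // TDY_17-02-2013_{2}_T1.txt
```

After a successful export, a new row with the run date and the new ID would be written to the log table so the next run continues from it.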
Where exactly is your problem?
You can make a dynamic file name using expressions.
For the count, you can use a Row Count transformation inside your data flow to assign the result to a variable, and then use that variable in your expression.
Use a Script Task to get the number inside the curly braces of the existing file names and store it in a variable.
Create a variable (FileNo, of type Int32) that stores the number for the file.
Script Task code (C#):
// Needs: using System.IO; and using System.Text.RegularExpressions;
string location = @"D:\";
/* Better: get the path from the connection manager instead of
   hard coding it like D: above, for example:
string flatFileConn =
    (string)Dts.Connections["Yourfile"].AcquireConnection(null);
*/
// Matches the digits between literal braces, e.g. "{12}"
string pattern = @"\{(\d+)\}";
int highest = 0;
foreach (string s in Directory.GetFiles(location, "*.txt"))
{
    string name = Path.GetFileNameWithoutExtension(s);
    Match match = Regex.Match(name, pattern);
    if (match.Success)
    {
        // Groups[1] is the number without the braces; keep the highest one
        highest = Math.Max(highest, int.Parse(match.Groups[1].Value));
    }
}
// Next file number: one more than the highest existing number (1 if none)
Dts.Variables["User::FileNo"].Value = highest + 1;
Once you have the value, use it in the file name expression of the connection manager:
@[User::FilePath] + @[User::FileName]
+ "_{" + (DT_STR,10,1252) @[User::FileNo] + "}_T1.txt"
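The folder-scanning logic from the Script Task above, as a Node.js sketch that is easy to test: it takes a list of file names (in practice something like fs.readdirSync on the destination folder), extracts the number between the braces, and returns one more than the highest number found. The function name nextFileNumber is hypothetical.

```javascript
// Returns the next count: highest number found inside {...} plus 1, or 1 if none.
function nextFileNumber(fileNames) {
  let highest = 0;
  for (const name of fileNames) {
    const match = /\{(\d+)\}/.exec(name); // digits between literal braces
    if (match) {
      highest = Math.max(highest, parseInt(match[1], 10));
    }
  }
  return highest + 1;
}

console.log(nextFileNumber(["TDY_11-02-2013_{1}_T1.txt", "TDY_17-02-2013_{2}_T1.txt"])); // 3
console.log(nextFileNumber([])); // 1
```

Taking the maximum rather than the last match matters because directory listings are not guaranteed to come back in creation order.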

Write extracted data to a file using jmeter

I am using JMeter v2.5.
I need to get data from the test's responses, which I am extracting with a Regular Expression Extractor. How do I store this extracted data in a file?
Just solved a similar problem. After getting the data using a regular expression extractor, add a BeanShell PostProcessor element. Use the code below to write the variables to a file:
name = vars.get("name");
email = vars.get("email");
log.info(email); // if you want to log something to the jmeter.log file
// Pass true to append to an existing file;
// leave out the second argument to overwrite it
f = new FileOutputStream("/my/file/path/result.csv", true);
p = new PrintStream(f);
p.println(name + "," + email);
// Closing the PrintStream also closes the underlying FileOutputStream
p.close();
import org.apache.jmeter.services.FileServer;
// Base directory of the test plan; can be used to build the output path
String path = FileServer.getFileServer().getBaseDir();
name1 = vars.get("user_Name_value");
name2 = vars.get("UserId_value");
// true appends to the existing file; leave it out to overwrite the file
f = new FileOutputStream("E:/csvfile/result.csv", true);
p = new PrintStream(f);
p.println(name1 + "," + name2);
p.close();
This worked for me; I hope it works for you too.
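Both scripts join the values with a bare comma, so a value that itself contains a comma or quote would break the CSV columns. The Node.js sketch below shows the same append-a-line idea with CSV quoting (embedded quotes doubled, as in RFC 4180). It is a standalone illustration, not JMeter code; the helper names and the result-sketch.csv path are hypothetical.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Quotes a value only when it contains a comma, quote or line break,
// doubling any embedded quotes.
function csvField(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Appends one line, like new FileOutputStream(path, true) in the scripts above.
function appendCsvLine(file, values) {
  fs.appendFileSync(file, values.map(csvField).join(",") + "\n");
}

const out = path.join(os.tmpdir(), "result-sketch.csv");
fs.writeFileSync(out, ""); // start with an empty file for the demo
appendCsvLine(out, ["Alice", "alice@example.com"]);
appendCsvLine(out, ["Smith, Bob", "bob@example.com"]);
console.log(fs.readFileSync(out, "utf8"));
```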
If you just want to write the extracted variables to the CSV results file, add the variables you want to user.properties:
sample_variables=name,email
As per the documentation:
https://jmeter.apache.org/usermanual/properties_reference.html#results_file_config
They will be appended as the last columns of the CSV results file.
You have a couple of options:
You can tally the results by adding an Aggregate Report listener to your thread group => Add => Listener => Aggregate Report
You can get raw results by adding a Simple Data Writer listener to your thread group => Add => Listener => Simple Data Writer
Hope this helps
You can use the Flexible File Writer plugin (https://jmeter-plugins.org/wiki/FlexibleFileWriter/) with sample variables set up, or together with a Dummy Sampler.
Either way, Flexible File Writer is good for writing data to a file.