Pentaho: how to use metadata injection in the Text File Input step?

I am new to Pentaho and need some more information from experts. I have a transformation where I need to read a file and then change its data. The problem is that the number of columns in the file can change every time (the file might have 10 columns this month and 12 the next). I have read that I must use a "Metadata Injection" step for such cases, but I cannot find a good example of it. Can someone explain, or show with a simple example, how Metadata Injection generally works?
Thank you for your help.

Related

Creation of External Tables - Automation in Oracle

I was wondering if you could help me with the following question and give me some guidelines. Before you tell me that it is more of a Google question, I would like to say that I am new to this topic, which makes googling a bit hard.
Question:
I was wondering if it is possible to automate the creation of an external table by reading the first row of a txt file. I was reading something about UTL_FILE, but I am not sure it is the correct approach. I am interested in this because I sometimes have txt files with 50+ columns, and manually creating such an external table would take quite a while.
Any help or guidelines would be appreciated!
Thank you in advance!
You could create an external table that reads each whole line into one long VARCHAR2 column. Then SELECT from this table and read the first line, and you get a string with the column names. Embed this string in a 'CREATE ...' statement and execute it.
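A minimal sketch of that approach in PL/SQL, assuming a comma-separated header row, a directory object named DATA_DIR, and a file called data.txt (all three names are placeholders):

-- One-column external table that exposes each line of the file as raw text.
CREATE TABLE ext_raw_lines (
  line VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS (line CHAR(4000))
  )
  LOCATION ('data.txt')
);

DECLARE
  v_header VARCHAR2(4000);
  v_sql    VARCHAR2(32767);
BEGIN
  -- Read only the header row.
  SELECT line INTO v_header FROM ext_raw_lines WHERE ROWNUM = 1;

  -- Turn "COL1,COL2,COL3" into "COL1 VARCHAR2(4000), COL2 VARCHAR2(4000), ..."
  -- (every column is created as VARCHAR2 here; refine the types as needed).
  v_sql := 'CREATE TABLE ext_data ('
        || REPLACE(v_header, ',', ' VARCHAR2(4000), ') || ' VARCHAR2(4000)'
        || ') ORGANIZATION EXTERNAL ('
        || ' TYPE ORACLE_LOADER DEFAULT DIRECTORY data_dir'
        || ' ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE SKIP 1'
        || ' FIELDS TERMINATED BY '','')'
        || ' LOCATION (''data.txt''))';
  EXECUTE IMMEDIATE v_sql;
END;
/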

Clean unstructured place names into a structured format

I have around 300k rows of unstructured data, as in the screen below. I'm trying to use Google Refine or OpenRefine to clean this up, but I'm unable to find a proper way to do it. I'm new to this tool, and anyone's help would be greatly appreciated. Also, this tool is quite slow at processing 300k records; whenever I try something out, it takes a long time to process and give an output.
Or please suggest any other open-source tools and techniques to do this.
As Owen said in the comments, your question is probably too broad and cannot receive an acceptable answer. We can only provide you with a general procedure to follow.
In OpenRefine, you'll need to create a column based on the messy column and apply transformations to delete unwanted characters. You'll have to use regular expressions. But for that, it's necessary to be able to identify patterns. It's not clear to me why the "ST" of "Nat.secu ST." is important, but not the "US" in "Massy Intertech US", nor the "36" in "Plowk 36" (Google doesn't know this word, so I'm not sure it is an organisation name).
On the basis of your fifteen lines, however, we can distinguish some clear patterns. For example, it looks like you'll have to remove the tokens (character sequences without spaces) at the end of the string that contain a #. For that, the GREL formula in OpenRefine could look like this:
value.trim().replace(/\b\w+#\w+\b$/,'')
Here is a screencast if it's not clear to you.
But sometimes a company name may contain a #, in which case you will need to create more complex rules. For example, remove the token only if the string contains more than two words.
if(value.split(' ').length() > 2, value.replace(/\b\w+#\w+\b$/, ''), value)
And so on for the other patterns that you'll find (for example, any number sequence at the end that contains more than four digits and one - between them), as in the sketch below.
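A hedged sketch for that last pattern (the quantifiers are assumptions; here the trailing token must contain at least five digits split by a single hyphen):
value.trim().replace(/\b\d{2,}-\d{3,}$/, '')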
Feel free to check out the OpenRefine documentation in case of doubt.

.Net Parsing Fixed Width Data... From a Concatenated, Single, Fixed-Width Column

I was bored and looking at old code that runs like molasses on a cold day. I found a group of tables in our accounting system - each with 500,000 records of ~20 data points - that use a single column of concatenated, fixed-width values instead of separate columns. (Fixing the tables isn't an option.) An old .NET ETL project grabs all the records, does a bunch of substrings on each record to set an object's corresponding attributes, then sends the object to merge with production data via a stored proc.
The way it is working is fine. It works. And, to be perfectly honest, I doubt I'll be given the go-ahead to fix it even if I come up with a better solution, but I was curious to see if anyone knew of a better way of doing this, because it's not entirely unlikely that I'll face a situation like this in the future.
I was thinking that if there were a way to use the TextFieldParser to parse a static string instead of a file/stream, that might be a valid idea. Or, instead, I could write the entire table to a text file and then use the TextFieldParser to send data to the sproc. http://www.dotnetperls.com/textfieldparser does show that TextFieldParser is quite a bit faster than Split, which I would assume is comparable to the string manipulation our project is currently doing with Substring. So there may be something to that idea.
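For what it's worth, TextFieldParser has a constructor overload that takes any TextReader, so a StringReader over the concatenated value avoids the temp file entirely. A minimal C# sketch (the field widths and the sample record are made up for illustration):

using System;
using System.IO;
using Microsoft.VisualBasic.FileIO; // reference Microsoft.VisualBasic.dll

class FixedWidthDemo
{
    static void Main()
    {
        // Hypothetical record: three fixed-width fields of 10, 5, and 8 characters.
        string record = "JOHN SMITH0004220150901";

        using (var parser = new TextFieldParser(new StringReader(record)))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(10, 5, 8); // widths are illustrative only

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                Console.WriteLine(string.Join("|", fields)); // JOHN SMITH|00042|20150901
            }
        }
    }
}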
Or perhaps the whole old project should be dumped for a shiny new SSIS project. Would it also have to write the records to a flat file before importing into SQL? Or can it import directly from the table?
Thank you in advance!

Solr 5.3 implementation processes docs but doesn't return results

I recently set up a local instance of Solr 5.3 in an effort to get it going for my company. As an initial test case I've set up a Data Import Handler (DIH) that indexes PDFs stored within a file directory. When I execute the full import in the admin tool, the DIH processes all the files within the directory, and I'm able to run a general query (*:*) that returns all indexed fields for every record in the index.
When I switch to a specific query using a word definitely contained within the files, however, Solr returns no results. What connection am I not making here?
I can provide excerpts from the schema, solrconfig, and custom data config if needed, but I don't want to oversaturate this post.
The answer I came up with involved a simple newbie mistake combined with something I wasn't anticipating.
1) First, I didn't have my field set to indexed="true". I set that. Yeesh, it stinks being new to this!
2) I needed to make a change to solrconfig.xml for the core in question. Thanks to this article, I was able to determine that I needed to add a default field in the /select requestHandler. Uncommenting the relevant line in solrconfig and changing the field name did the trick; I no longer need to supply the field name in df to return results.
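For reference, a sketch of both changes; the field name "text" is an assumption, not necessarily your schema:
In schema.xml, the searched field must be indexed:
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
In solrconfig.xml, the /select handler can carry a default field:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="rows">10</str>
    <str name="df">text</str>
  </lst>
</requestHandler>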
My carryover question for anyone coming across this question in the future is whether this latter point is the proper way to go about using default fields. I see that the default field setting in schema.xml is deprecated (or heading in that direction) in 5.3.0. So is it all right to define df in solrconfig instead?

How to execute a SQL query in JScript/JScript.NET

First of all, sorry for my English; this is not my native language.
I want to execute a SQL query in a script to get some data. I don't know if it's possible and, if so, how to do it. To summarize:
The script adds a button in M3 Smart Office (an ERP). I have already done that.
When I select a row in an M3 function (like an article or a client), I want to take its ID (and some other data) and send it to a website.
There are a lot of functions in M3. In each function there are some fields that contain data. One of them contains the ID of the object (an article, a client, ...). What I want to do is get this ID. The problem is that the field containing the ID doesn't have the same name in every function. So I have two solutions:
Do a lot of if/else if, like "if it's this function, take this field". But if I (or somebody else) want to add a function/field combination later, I (or somebody else ;) ) need to do that in the script. It's not practical.
Create a SQL table which contains all the function/field combinations. Then, in the script, I run a SQL query and get all the data that the script needs.
So that's the situation. Maybe you have ideas for doing this differently (without SQL), and I'll take them!
Please see this in-depth tutorial from the 4guysfromrolla site:
Server-Side JScript Objects
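In the spirit of that tutorial, here is a minimal JScript sketch using classic ADO. The connection string and the lookup table/column names (FunctionFieldMap, FunctionName, FieldName) are hypothetical, as is the M3 function name used in the query:

// Open a connection to the database that holds the function/field lookup table.
var conn = new ActiveXObject("ADODB.Connection");
conn.Open("Provider=SQLOLEDB;Data Source=MYSERVER;Initial Catalog=MyDb;Integrated Security=SSPI;");

// Look up which field holds the ID for the current function (name is illustrative).
var rs = conn.Execute("SELECT FieldName FROM FunctionFieldMap WHERE FunctionName = 'MMS001'");
var fieldName = null;
if (!rs.EOF) {
    fieldName = rs.Fields("FieldName").Value; // e.g. the name of the field that stores the ID
}

// Always release ADO objects when done.
rs.Close();
conn.Close();

In JScript.NET inside Smart Office you could achieve the same with ADO.NET (System.Data.SqlClient) instead of COM ADO; the lookup-table idea stays the same either way.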