I have 2 different tables (one with 2 fields, one with 3 fields) from the same Access file, and I want to use them in a JavaScript step, but it gives me a hop error.
I tried the join steps, but one multiplies the rows and the others ask me for a foreign key.
I would like to use the data from the 2 different tables in the JavaScript step.
It is totally possible for two hops to target the same step (a JavaScript step in your case), and you do not have to do anything special to "union" the flows.
Except that the two flows must be strictly similar: columns in the same order, with the same names and the same types. You can use two Select values steps for that.
Something you hate when you develop and love when you maintain.
I want to design a query that basically does a mass amount of Not Like "*x*" conditions, except all of the things I would not like the query to contain are in another column.
I know I can do this one at a time by just using the criteria and specifying Not Like "*x*", but I have no idea how to do a Not Like against a whole column of data.
So, the long version is that I have a bunch of cameras hosted on several different servers on a corporate network. Each of these cameras is on the same subnet, and everything but the last octet of the IP address matches the server. I have already created a field in a query that trims off the last octet of the IP, so I now basically have a pre-made IP range of where the cameras could possibly be. However, I do not have an exact inventory of each of the cameras, and there's not really a fast way to get one.
I have a list of issues that I'm working on and I've noticed some of the cameras coming up in the list of issues (basically a table that includes a bunch of IP addresses). I'd like to remove all possible instances of the cameras from appearing in the report.
I've seen designs where people have been able to compare like columns, but I want to do the opposite: I want to build a query that excludes anything like what's in the camera column.
For the sake of this, I'll call the query where I have the camera ranges Camera Ranges and the field Camera Range.
Is there a way I can accomplish this?
I'm open to designing a query or even changing up my table to make it easier to do the query.
Similar to the answer I provided here: rather than using a negative selection, in which you test whether the value held by a record is not like any record in another dataset, the easier approach is to match those records which are like the dataset and return the records with no match.
To accomplish this, you can use a left join coupled with an is null condition in the where clause, for example:
select
    MainData.*
from
    MainData
    left join ExclusionData
        on MainData.TargetField like ExclusionData.Pattern
where
    ExclusionData.Pattern is null
Or, if the pattern field does not already contain wildcard operators:
select
    MainData.*
from
    MainData
    left join ExclusionData
        on MainData.TargetField like '*' & ExclusionData.Pattern & '*'
where
    ExclusionData.Pattern is null
Note that MS Access will not be able to represent such calculated joins in the Query Designer, but the JET database engine used by MS Access will still be able to interpret and execute the valid SQL.
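Applied to the camera scenario in the question, and assuming the issue list lives in a table called Issues with an IPAddress field (both placeholder names; Camera Ranges and Camera Range come from the question), the trimmed IP prefix only needs a trailing wildcard:

select
    Issues.*
from
    Issues
    left join [Camera Ranges]
        on Issues.IPAddress like [Camera Ranges].[Camera Range] & '*'
where
    [Camera Ranges].[Camera Range] is null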
I have two datasets. In dataset one, column A has the IDs and column B has the data I need to cluster and edit using the various available algorithms. Dataset two again has the IDs in its first column and the data in the next. I need to reconcile data from dataset one against data from the second dataset only. What I have done so far is merge the two into one dataset, but then OpenRefine gives me mixed results, i.e. messy data that exists only in dataset two, which is not what I want in the current phase.
I have also investigated Reconcile-csv, but without success in achieving the desired result. Any ideas?
An alternative approach to using the reconciliation approach described by Ettore is to use algorithms similar to the 'key collision' clustering algorithms to create shared keys between the two data sets and then use this to do lookups between the data sets using the 'cross' function.
As an example, for Column B in each data set you could use 'Add column based on this column' with the GREL:
value.fingerprint()
This creates the same key as is used by the "Fingerprint" clustering method. Let's call the new column 'Column C'.
You can then look up between the two projects using the following GREL in Dataset 2:
cells["Column C"].cross("Dataset 1","Column C")
If the values in both Dataset 1 and Dataset 2 would have clustered together under the fingerprint method, then the lookup between the projects will work.
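To actually copy the matched data across, the same expression can be extended to pull a value out of the first matching row. For example, assuming the data you want sits in 'Column B' of Dataset 1 (a placeholder name; in real use you should also guard against rows with no match):

cells["Column C"].cross("Dataset 1","Column C")[0].cells["Column B"].value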
You can also use the phonetic keying algorithms to create match keys in Column C if that works better. What you can't do using this method (as far as I know) is the equivalent of the Nearest Neighbour matching - you'd have to have a reconciliation service with fuzzy matching of some kind, or merge the two data sets, to achieve this.
Owen
Reconcile-CSV is a very good tool, but not very user-friendly. As an alternative you can use the free Fuzzy Lookup Add-In for Excel. It's very easy to use, as evidenced by this screencast. One constraint: the two tables to be reconciled must be in Excel table format (select the range and press Ctrl+L).
And here is the same procedure with reconcile-csv (the GREL formula used is cell.recon.best.name and comes from here).
I have the following table schema:
+------+----------+----------+
| chn  | INTEGER  | NULLABLE |
+------+----------+----------+
| size | STRING   | NULLABLE |
+------+----------+----------+
| char | REPEATED | NULLABLE |
+------+----------+----------+
| ped  | INTEGER  | NULLABLE |
+------+----------+----------+
When I click on 'preview' in the Google BigQuery Web UI, I get the following result:
But when I query my table, I get this result:
It seems like "preview" is interpreting my repeated field as an array; I would like to get the same result in a query, to limit the number of rows.
I did try unchecking "Use Legacy SQL", which gave me the same result as the preview, but the problem is that on my table the same query takes ~1.0 sec to execute with "Use Legacy SQL" checked and ~12 seconds when it's unchecked.
I am looking for speed here, so unfortunately giving up Legacy SQL is not an option...
Is there another way to render my repeated field like it does in the "preview" ?
Thanks for the help :)
In legacy SQL, BigQuery flattens the result of queries by default. This means two things:
1. All child fields of RECORD fields are propagated to the top level, with their names changed from record.subrecord.leaf to record_subrecord_leaf. Parent records are removed from the schema.
2. All repeated fields are converted to fields of optional mode, with each repeated value expanded into its own row. (As a side note, this step is very similar to the FLATTEN function exposed in legacy SQL.)
What you see here is a product of #2. Each repeated value is becoming its own row (as you can see by the row count on the left-hand side in your two images) and the values from the other columns are, well, repeated for each new row.
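For reference, the FLATTEN function mentioned above is invoked explicitly like this in legacy SQL (a sketch; the table name is a placeholder, flattening the repeated char field from the question's schema):

select chn, size, char, ped
from flatten([mydataset.mytable], char)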
You can prevent this behavior and receive "unflattened results" in a couple ways.
Using standard SQL, as you note in your original question. All standard SQL queries return unflattened results.
While using legacy SQL, setting the flattenResults parameter to false. This also requires specifying a destination table and setting allowLargeResults to true. These can be found in the Show Options panel beneath the query editor if you want to set them within the UI (see the sketch below for a command-line equivalent). Mikhail has some good suggestions for managing the temporary-ness of destination tables if you aren't interested in keeping them around.
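If you would rather set these options outside the UI, the bq command-line tool exposes the same parameters. A sketch with placeholder table names; the flag spellings are as I recall them, so treat them as assumptions to verify:

bq query \
    --destination_table=temp.unflattened_result \
    --allow_large_results \
    --noflatten_results \
    'select chn, size, char, ped from [mydataset.mytable]'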
I should note that there are a number of corner cases with legacy SQL with flattenResults set to false which might trip you up if you start writing more complex queries. A prominent example is that you can't output more than one independently repeated field in query results using legacy SQL, but you can output multiple with standard SQL. These issues are unlikely to be resolved in legacy SQL, and going forward we're suggesting people use standard SQL when they run into them.
If you could provide more details about your much slower query using standard SQL (e.g. job ID in legacy SQL, job ID in standard SQL, for comparison), I, and the rest of the BigQuery team, would be very interested in investigating further.
Is there another way to render my repeated field like it does in the "preview"?
To see the original, non-flattened output in the Web UI for legacy SQL, I set the respective options (click Show Options) to actually write the output to a table, with Allow Large Results checked and Flatten Results unchecked.
This not only saves the result into a table but also shows the result the same way the preview does (because it is in fact a preview of that table). To make sure the table gets removed afterwards, I have a "dedicated" dataset (temp) with a default expiration of 1 day (or 1 hour, depending on how aggressively you want to clean up your junk), so you don't need to worry about those tables: they get deleted automatically for you. Worth noting: this was quite a common pattern for us, and having to set the extra options every time was tedious, so we ended up with our own custom UI that does all of this for the user automatically.
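For reference, that kind of self-cleaning dataset can be created from the bq command-line tool, with the default expiration given in seconds (a sketch; verify the flag spelling for your CLI version):

bq mk --dataset --default_table_expiration=86400 temp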
What you see is called flattening.
By default the UI flattens the query output, and there is currently no option to show query results the way you want. To produce unflattened results you must write to a table, but that's a different thing.
How many different kinds of steps in Pentaho can accept more than one input stream, such as "Merge Join" and "Stream Lookup"?
What's the typical user scenario for them?
Can any script-related steps, like JavaScript or UDJC, accept more than one stream as input? E.g. using one stream as the data source and another as a filter condition?
Thank you all.
All the steps under "Joins" and "Lookup". Joins work just like table joins; a lookup uses one stream as the source dataset and another as a "translation" dictionary. This is what I know.
Answers to the 3 questions, in order:
All the steps available in the "Joins" and "Lookup" sections will accept two streams (I haven't tried with 3 streams). Some filter steps like Java Filter will also accept more than one stream.
The typical use scenario is to get data from one or more streams and then apply your business logic to it. There is no specific example I can explain at the moment.
As per my knowledge, you cannot use more than one stream with different layouts in the JavaScript step. For example, when I try to stream two columns with different names (Input 1 has column "a" and Input 2 has column "b"), I get a layout error.
You can get rid of this error by giving the columns in both input streams the same name.
Hope this helps :)
I'm currently working on improving my database to make room for growth. As it stands, different users have different 'permissions' to areas of the website. Some users have permissions to multiple areas of the website.
I'd like some feedback if I'm doing this in the most efficient way:
tblUsers:

usrID  usrFirst  usrLast  phone  // etc....
1      John      Doe
2      Jane      Smith
3      Bill      Jones

tblAreas:

id  name
1   Marketing
2   Support
3   Human Resources
4   Media Relations

tblPermissions:

id  usrID  areaID
1   1      2
2   1      4
3   2      1
4   3      3
Right now, for each "area", I have a separate directory. However, I'd like to consolidate all of these directories into one main directory, and then redirect users on login to their appropriate 'area' based upon their permissions.
Does it sound like I'm doing this correctly? I've never created a multi-layered site with different permissions and different groups of people, thus, I'm certainly open to learning more on how to do this correctly.
Thanks very much!
The general design is ok. The issues that pop out on me relate to naming.
SQL doesn't need Hungarian notation; it's generally considered unnecessary / bad (tblUsers -> users).
I wouldn't prefix table-names to column-names ...
... except for column "id" which should always include your table name (i.e. areaId)
Your "first" and "last" column don't make sense (hint: firstName)
I'd rename tblPermissions -> userAreas
Depending on your programming language and database, I'd also recommend using underscore instead of capitalization for your table/column-names.
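Putting those suggestions together, the schema might end up looking something like this (a sketch; the column types are illustrative):

create table users (
    user_id    integer primary key,
    first_name varchar(50),
    last_name  varchar(50),
    phone      varchar(20)
);

create table areas (
    area_id integer primary key,
    name    varchar(50)
);

create table user_areas (
    user_id integer references users (user_id),
    area_id integer references areas (area_id),
    primary key (user_id, area_id)
);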
As for using separate directories for different groups, I'd advise against it. Have the security-checks in your code instead of your directory layout.
Reasoning:
What happens when somebody decides that support is also allowed to do some marketing stuff? Should you change your code, or add a record into your database?
Or what if you have overlapping actions?
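With the permissions in a table, both of those cases are handled by adding or removing rows rather than changing code. The runtime check itself is a single lookup against the schema above, for example:

select count(*) as allowed
from tblPermissions
where usrID = 1 and areaID = 2;

A non-zero count means user 1 (John) may enter area 2 (Support).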
#brianpeiris: A couple of things come to mind:
No need for column aliases in JOINs
Makes it easier to search through code ("foo_id" gives less results than "id")
You can write JOIN USING (foo_id) instead of JOIN ON (foo.id = bar.foo_id).
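(JOIN ... USING is supported by engines such as MySQL and PostgreSQL; the table and column names below are illustrative.)

-- key named after its table: the join condition is implicit
select * from users join user_areas using (user_id);

-- key named plain "id": the condition must be spelled out
select * from users join user_areas on users.id = user_areas.user_id;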
The schema looks fine.
I would suggest that you put access control in the controller and base it off of the URL path, so that you are not coding it into every section.
Yes, this seems like it is addressing your need perfectly from the database side.
The challenge will be using the data as simply and declaratively as possible. Where is the right place to declare what "area" you are in? Does each page do this, is there a function that calculates it, or can your controllers do it, as someone suggests? The second part is evaluating the current user against this. Ideally you end up with a single function like security_check_for_area(4) that does it all.