ELKI PAM clustering

ELKI PAM clustering - k-means

I'm using ELKI for the first time and I'm having a problem understanding it's structure. It seems that I need to do a lot to produce some results.
How can I perform PAM K-medoid clustering with custom distance measure?

Implement your distance function (see this tutorial)
The section "Basic distance function" is a minimal working example. Add the elki.jar, create this class, run the project in eclipse, and you should be able to choose this distance function in PAM.
Make sure it has either
a public, parameterless constructor (easiest), or
a public static class Parameterizer (see documentation)
so that it can be instantiated and configured by the GUI.
To make it appear in the UI dropdown, ensure it is either
in a folder on your classpath (development mode - just don't build a jar - easiest), or
in a .jar, and listed in the META-INF/elki/<interfacename> service file
or type the class name. But usually there is a mistake in step 2 if it doesn't show up in the dropdown; and the class cannot be instantiated.
Run PAM and choose your distance function in the dropdown.
The most common mistake is that people don't have a constructor the UI could use.
The constructor must be public and parameterless, or you need to add a parameterizer to specify the parameters.

Related

Enforcing API boundaries at the Module (Distribution?) level

How do I structure Raku code so that certain symbols are public within the the library I am writing, but not public to users of the library? (I'm saying "library" to avoid the terms "distribution" and "module", which the docs sometimes use in overlapping ways. But if there's a more precise term that I should be using, please let me know.)
I understand how to control privacy within a single file. For example, I might have a file Foo.rakumod with the following contents:
unit module Foo;
sub private($priv) { #`[do internal stuff] }
our sub public($input) is export { #`[ code that calls &private ] }
With this setup, &public is part of my library's public API, but &private isn't – I can call it within Foo, but my users cannot.
How do I maintain this separation if &private gets large enough that I want to split it off into its own file? If I move &private into Bar.rakumod, then I will need to give it our (i.e., package) scope and export it from the Bar module in order to be able to use it from Foo. But doing so in the same way I exported &public from Foo would result in users of my library being able to use Foo and call &private – exactly the outcome I am trying to avoid. How do maintain &private's privacy?
(I looked into enforcing privacy by listing Foo as a module that my distribution provides in my META6.json file. But from the documentation, my understanding is that provides controls what modules package managers like zef install by default but do not actually control the privacy of the code. Is that correct?)
[EDIT: The first few responses I've gotten make me wonder whether I am running into something of an XY problem. I thought I was asking about something in the "easy things should be easy" category. I'm coming at the issue of enforcing API boundaries from a Rust background, where the common practice is to make modules public within a crate (or just to their parent module) – so that was the X I asked about. But if there's a better/different way to enforce API boundaries in Raku, I'd also be interested in that solution (since that's the Y I really care about)]

I will need to give it our (i.e., package) scope and export it from the Bar module
The first step is not necessary. The export mechanism works just as well on lexically scoped subs too, and means they are only available to modules that import them. Since there is no implicit re-export, the module user would have to explicitly use the module containing the implementation details to have them in reach. (As an aside, personally, I pretty much never use our scope for subs in my modules, and rely entirely on exporting. However, I see why one might decide to make them available under a fully qualified name too.)
It's also possible to use export tags for the internal things (is export(:INTERNAL), and then use My::Module::Internals :INTERNAL) to provide an even stronger hint to the module user that they're voiding the warranty. At the end of the day, no matter what the language offers, somebody sufficiently determined to re-use internals will find a way (even if it's copy-paste from your module). Raku is, generally, designed with more of a focus on making it easy for folks to do the right thing than to make it impossible to "wrong" things if they really want to, because sometimes that wrong thing is still less wrong than the alternatives.

Off the bat, there's very little you can't do, as long as you're in control of the meta-object protocol. Anything that's syntactically possible, you could in principle do it using a specific kind of method, or class, declared using that. For instance, you could have a private-class which would be visible only to members of the same namespace (to the level that you would design). There's Metamodel::Trusting which defines, for a particular entity, who it does trust (please bear in mind that this is part of the implementation, not spec, and then subject to change).
A less scalable way would be to use trusts. The new, private modules would need to be classes and issue a trusts X for every class that would access it. That could include classes belonging to the same distribution... or not, that's up to you to decide. It's that Metamodel class above who supplies this trait, so using it directly might give you a greater level of control (with a lower level of programming)

There is no way to enforce this 100%, as others have said. Raku simply provides the user with too much flexibility for you to be able to perfectly hide implementation details externally while still sharing them between files internally.
However, you can get pretty close with a structure like the following:
# in Foo.rakumod
use Bar;
unit module Foo;
sub public($input) is export { #`[ code that calls &private ] }
# In Bar.rakumod
unit module Bar;
sub private($priv) is export is implementation-detail {
unless callframe(1).code.?package.^name eq 'Foo' {
die '&private is a private function. Please use the public API in Foo.' }
#`[do internal stuff]
}
This function will work normally when called from a function declared in the mainline of Foo, but will throw an exception if called from elsewhere. (Of course, the user can catch the exception; if you want to prevent that, you could exit instead – but then a determined user could overwrite the &*EXIT handler! As I said, Raku gives users a lot of flexibility).
Unfortunately, the code above has a runtime cost and is fairly verbose. And, if you want to call &private from more locations, it would get even more verbose. So it is likely better to keep private functions in the same file the majority of the time – but this option exists for when the need arises.

How can I have a "private" Erlang module?

I prefer working with files that are less than 1000 lines long, so am thinking of breaking up some Erlang modules into more bite-sized pieces.
Is there a way of doing this without expanding the public API of my library?
What I mean is, any time there is a module, any user can do module:func_exported_from_the_module. The only way to really have something be private that I know of is to not export it from any module (and even then holes can be poked).
So if there is technically no way to accomplish what I'm looking for, is there a convention?
For example, there are no private methods in Python classes, but the convention is to use a leading _ in _my_private_method to mark it as private.
I accept that the answer may be, "no, you must have 4K LOC files."

The closest thing to a convention is to use edoc tags, like #private and #hidden.
From the docs:
#hidden
Marks the function so that it will not appear in the
documentation (even if "private" documentation is generated). Useful
for debug/test functions, etc. The content can be used as a comment;
it is ignored by EDoc.
#private
Marks the function as private (i.e., not part of the public
interface), so that it will not appear in the normal documentation.
(If "private" documentation is generated, the function will be
included.) Only useful for exported functions, e.g. entry points for
spawn. (Non-exported functions are always "private".) The content can
be used as a comment; it is ignored by EDoc.

Please note that this answer started as a comment to #legoscia's answer
Different visibilities for different methods is not currently supported.
The current convention, if you want to call it that way, is to have one (or several) 'facade' my_lib.erl module(s) that export the public API of your library/application. Calling any internal module of the library is playing with fire and should be avoided (call them at your own risk).
There are some very nice features in the BEAM VM that rely on being able to call exported functions from any module, such as
Callbacks (funs/anonymous funs), MFA, erlang:apply/3: The calling code does not need to know anything about the library, just that it's something that needs to be called
Behaviours such as gen_server need the previous point to work
Hot reloading: You can upgrade the bytecode of any module without stopping the VM. The code server inside the VM maintains at most two versions of the bytecode for any module, redirecting external calls (those with the Module:) to the most recent version and the internal calls to the current version. That's why you may see some ?MODULE: calls in long-running servers, to be able to upgrade the code
You'd be able to argue that these points'd be available with more fine-grained BEAM-oriented visibility levels, true. But I don't think it would solve anything that's not solved with the facade modules, and it'd complicate other parts of the VM/code a great deal.
Bonus
Something similar applies to records and opaque types, records only exist at compile time, and opaque types only at dialyzer time. Nothing stops you from accessing their internals anywhere, but you'll only find problems if you go that way:
You insert a new field in the record, suddenly, all your {record_name,...} = break
You use a library that returns an opaque_adt(), you know that it's a list and use like so. The library is upgraded to include the size of the list, so now opaque_adt() is a tuple() and chaos ensues

Only those functions that are specified in the -export attribute are visible to other modules i.e "public" functions. All other functions are private. If you have specified -compile(export_all) only then all functions in module are visible outside. It is not recommended to use -compile(export_all).

I don't know of any existing convention for Erlang, but why not adopt the Python convention? Let's say that "library-private" functions are prefixed with an underscore. You'll need to quote function names with single quotes for that to work:
-module(bar).
-export(['_my_private_function'/0]).
'_my_private_function'() ->
foo.
Then you can call it as:
> bar:'_my_private_function'().
foo
To me, that communicates clearly that I shouldn't be calling that function unless I know what I'm doing. (and probably not even then)

Lithium: How do I change the location Connections' and similar classes look for adapters

I've been trying to get Connections to use a customised adapter located at app/extensions/data/source/database/adapter/. I thought extending the Connections class and replacing
protected static $_adapters = 'data.source';
with
protected static $_adapters = 'adapter.extension.data.source';
and changing the connections class used at the top of app/config/bootstrap/connections.php to use app\extensions\data\Connections;
would be enough to get it started. However this just leads to a load of errors where the code is still trying to use the original Connections class.
Is there a simple way to achieve this, or do I have to recreate the entire set of classes from lithium/data in extensions with rewritten class references?
EDIT:
Turns out I was going about this the wrong way. After following Nate Abele's advice, Libraries::path('adapter') showed me where to correctly place the MySql.php file I'm trying to override ;-)

For dealing with how named classes (i.e. services, in the abstract) are located, you want to take a look at the Libraries class, specifically the paths() method, which allows you to define how class paths are looked up.
You can also look at the associated definitions, like locate() and $_paths, to give you an idea of what the default configuration looks like.
Finally, note that the Connections class is 'special' since it defines one path dynamically, based on the supplied configuration: http://li3.me/docs/api/lithium/1.0.x/lithium/data/Connections::_class()
This should help you reconfigure how your classes are organized without extending/overriding anything. Generally you shouldn't need to do that unless you need some drastically different behavior.

Getting lazy instance via kernel (Ninject)

I am using Ninject in substitution of MEF and I was wondering if it's possible to get lazy instances via standard kernel methods and not via [inject] .
I need this since when building up my application's menu I have to pass all particular view models and then if the user is enabled on that function to add it to the menu
Thanks

Sure thing, you can inject a Lazy<T> and the value will only be instanciated when you access Lazy<T>.Value.
You can also inject a Func<T> and use it to create T whenever you like (with the func, every call creates a new instance).
Of course you can also do IResolutionRoot.Get<Lazy<T>>() or IResolutionRoot.Get<Func<T>>(), but usually that's a sign of bad design (service locator), so use constructor injection when it's feasible.
EDIT: When is the "enabling of the user" happening? Is it a one time thing? What is being displayed before and after?
There might be other/better designs to achieve this but it's hard to say with that little information.

unit tests - white box vs. black box strategies

I found, that when I writing unit tests, especially for methods who do not return the value, I mostly write tests in white box testing manner. I could use reflection to read private data to check is it in the proper state after method execution, etc...
this approach has a lot of limitation, most important of which is
You need to change your tests if you rework method, even is API stay
the same
It's wrong from information hiding (encapsulation) point of view -
tests is a good documentation for our code, so person who will read
it could get some unnecessary info about implementation
But, if method do not return a value and operate with private data, so it's start's very hard (almost impossible) to test like with a black-box testing paradigm.
So, any ideas for a good solution in that problem?

White box testing means that you necessarily have to pull some of the wiring out on the table to hook up your instruments. Stuff I've found helpful:
1) One monolithic sequence of code, that I inherited and didn't want to rewrite, I was able to instrument by putting a state class variable into, and then setting the state as each step passed. Then I tested with different data and matched up the expected state with the actual state.
2) Create mocks for any method calls of your method under test. Check to see that the mock was called as expected.
3) Make needed properties into protected instead of private, and create a sub-class that I actually tested. The sub-class allowed me to inspect the state.

I could use reflection to read private data to check is it in the proper state after method execution
This can really be a great problem for maintenance of your test suite
in .Net instead you could use internal access modifier, so you could use the InternalsVisibleToAttribute in your class library to make your internal types visible to your unit test project.
The internal keyword is an access modifier for types and type members. Internal types or members are accessible only within files in the same assembly
This will not resolve every testing difficulty, but can help
Reference

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

ELKI PAM clustering - k-means

I'm using ELKI for the first time and I'm having a problem understanding it's structure. It seems that I need to do a lot to produce some results. How can I perform PAM K-medoid clustering with custom distance measure?

Related

Enforcing API boundaries at the Module (Distribution?) level

How can I have a "private" Erlang module?

Lithium: How do I change the location Connections' and similar classes look for adapters

Getting lazy instance via kernel (Ninject)

unit tests - white box vs. black box strategies

Categories

Resources