Inferring topics with mallet, using the saved topic state - text-mining

I've used the following command to generate a topic model from some documents:
bin/mallet train-topics --input topic-input.mallet --num-topics 100 --output-state topic-state.gz
I have not, however, used the --output-model option to generate a serialized topic trainer object. Is there any way I can use the state file to infer topics for new documents? Training is slow, and it'll take a couple of days for me to retrain, if I have to create the serialized model from scratch.

We did not use the command line tools shipped with mallet, we just use the mallet api to create the serialized model for inferences of the new document. Two point need special notice:
You need serialize out the pipes you used just after you finish the training (For my case, it is SerialPipes)
And of cause the model need also to be serialized after you finish the training(For my case, it is ParallelTopicModel)
Please check with the java doc:
http://mallet.cs.umass.edu/api/cc/mallet/pipe/SerialPipes.html
http://mallet.cs.umass.edu/api/cc/mallet/topics/ParallelTopicModel.html

Restoring a model from the state file appears to be a new feature in mallet 2.0.7 according to the release notes.
Ability to restore models from gzipped "state" files. From the new
TopicTrainer, use the --input-state [filename] argument. Note that you
can manually edit this file. Any token with topic set to -1 will be
immediately resampled upon loading.

If you mean you want to see how new documents fit into a previously trained topic model, then I'm afraid there is no simple command you can use to do it right.
The class cc.mallet.topics.LDA in mallet 2.0.7's source code provides such a utility, try to understand it and use it in your program.
P.S., If my memory serves, there is some problem with the implementation of the function in that class:
public void addDocuments(InstanceList additionalDocuments,
int numIterations, int showTopicsInterval,
int outputModelInterval, String outputModelFilename,
Randoms r)
You have to rewrite it.

Related

How do I launch a SmartSim orchestrator without models?

I'm trying to prototype using the SmartRedis Python client to interact with the SmartSim Orchestrator. Is it possible to launch the orchestrator without any other models in the experiment? If so, what would be the best way to do so?
It is entirely possible to do that. A SmartSim Experiment can contain different types of 'entities' including Models, Ensembles (i.e. groups of Models), and Orchestrator (i.e. the Redis-backed database). None of these entities, however, are 'required' to be in the Experiment.
Here's a short script that creates an experiment which includes only a database.
from SmartSim import Experiment
NUM_DB_NODES = 3
exp = Experiment("Database Only")
db = exp.create_database(db_nodes=NUM_DB_NODES)
exp.generate(db)
exp.start(db)
After this, the Orchestrator (with the number of shards specified by NUM_DB_NODES) will have been spunup. You can then connect the Python client using the following line:
client = smartredis.Client(db.get_address()[0],NUM_DB_NODES>1)

Jmeter Beanshell: Accessing global lists of data

I'm using Jmeter to design a test that requires data to be randomly read from text files. To save memory, I have set up a "setUp Thread Group" with a BeanShell PreProcessor with the following:
//Imports
import org.apache.commons.io.FileUtils;
//Read data files
List items = FileUtils.readLines(new File(vars.get("DataFolder") + "/items.txt"));
//Store for future use
props.put("items", items);
I then attempt to read this in my other thread groups and am trying to access a random line in my text files with something like this:
(props.get("items")).get(new Random().nextInt((props.get("items")).size()))
However, this throws a "Typed variable declaration" error and I think it's because the get() method returns an object and I'm trying to invoke size() on it, since it's really a List. I'm not sure what to do here. My ultimate goal is to define some lists of data once to be used globally in my test so my tests don't have to store this data themselves.
Does anyone have any thoughts as to what might be wrong?
EDIT
I've also tried defining the variables in the setUp thread group as follows:
bsh.shared.items = items;
And then using them as this:
(bsh.shared.items).get(new Random().nextInt((bsh.shared.items).size()))
But that fails with the error "Method size() not found in class'bsh.Primitive'".
You were very close, just add casting to List so the interpreter will know what's the expected object:
log.info(((List)props.get("items")).get(new Random().nextInt((props.get("items")).size())));
Be aware that since JMeter 3.1 it is recommended to use Groovy for any form of scripting as:
Groovy performance is much better
Groovy supports more modern Java features while with Beanshell you're stuck at Java 5 level
Groovy has a plenty of JDK enhancements, i.e. File.readLines() function
So kindly find Groovy solution below:
In the first Thread Group:
props.put('items', new File(vars.get('DataFolder') + '/items.txt').readLines()
In the second Thread Group:
def items = props.get('items')
def randomLine = items.get(new Random().nextInt(items.size))

Tensorflow: checkpoints simple load

I have a checkpoint file:
checkpoint-20001 checkpoint-20001.meta
how do I extract variables from this space, without having to load the previous model and starting session etc.
I want to do something like
cp = load(checkpoint-20001)
cp.var_a
It's not documented, but you can inspect the contents of a checkpoint from Python using the class tf.train.NewCheckpointReader.
Here's a test case that uses it, so you can see how the class works.
https://github.com/tensorflow/tensorflow/blob/861644c0bcae5d56f7b3f439696eefa6df8580ec/tensorflow/python/training/saver_test.py#L1203
Since it isn't a documented class, its API may change in the future.

JSR 352 : How do you write to a MVS Dataset from a Java Batch program?

I need to write to a non-VSAM dataset in the mainframe. I know that we need to use the ZFile library to do it and I found how to do it here
I am running my Java batch job in the WebSphere Liberty on zOS. How do I specify the dataset? Can I directly give the DataSet a name like this?
dsnFile = new ZFile("X.Y.Z", "wb,type=record,noseek");
I am able to write it to a text file on the server itself using Java's File Writers but I don't know how to access a mvs dataset.
I am relatively new to the world of zOS and mainframe.
It sounds like you might be asking more generally how to use the ZFile API on WebSphere Liberty on z/OS.
Have you tried something like:
String pdsName = ZFile.getSlashSlashQuotedDSN("X.Y.Z");
ZFile zfile = new ZFile(pdsName , ...options...)
As far as batch-specific use cases, you might obviously have to differentiate between writing to a new file that's created for the first time on an original execution, as opposed to appending to an already-existing one on a restart.
You also might find some useful snipopets in this doctorbatch.io repo, along with the original link you posted.
For reference, I'll copy/paste from the ZFile Javadoc:
ZFile dd = new ZFile("//DD:MYDD", "r");
Opens the DD namee MYDD for reading
ZFile dsn = new ZFile("//'SYS1.HELP(ACCOUNT)'", "rt");
Opens the member ACCOUNT from the PDS SYS1.HELP for reading text records
ZFile dsn = new ZFile("//SEQ", "wb,type=record,recfm=fb,lrecl=80,noseek");
Opens the data set {MVS_USER}.SEQ for sequential binary writing. Note that ",noseek" should be specified with "type=record" if access is sequential, since performance is greatly improved.
One final note, another couple useful ZFile helper methods are: bpxwdyn() and getFullyQualifiedDSN().

Magento - Trouble with setting up Model Read Adapter

I was following through on Alan Storm's tutorial on Magento's Model and ORM basics and I've run into a bit of a problem. When I get to the portion where I load from the Model for the first time I get this error "Fatal error: Call to a member function load() on a non-object...". I've reset everything already and tried again from scratch but I still get the same problem. My code looks like this:
$params = $this->getRequest()->getParams();
$blogpost = Mage::getModel('weblog/blogpost');
var_dump($blogpost);
echo("Loading the blogpost with an ID of ".$params['id']);
$blogpost->load($params['id']);
As you can see I dumped the contents of $blogpost and it shows that it is just a boolean false. My guess is that there's either a problem with the connection to the database or, for some reason, the code for Mage::getModel() didn't get installed correctly.
EDIT - Adding Code
There's so many that I just decided to pastebin them lol
app/code/local/Ahathaway/Weblog/controllers/IndexController.php
app/code/local/Ahathaway/Weblog/etc/config.xml
app/code/local/Ahathaway/Weblog/Model/Blogpost.php
app/etc/modules/Ahathaway_Weblog.xml
Your Model/Blogpost.php file should actually be Model/Mysql4/Blogpost.php, and you are missing the real Model/Blogpost.php.
My guess is that Mage cannot find your model class. Double check the module/model name and also verify if the model is in a correct place in the filesystem (it should be in app/code/local/Weblog/Model/Blogpost.php).
You also need to check if your config.xml correctly defines your model classes. It would be best if you could past your config.xml and your model class...
A quick glance reveals you're missing the model resource. Go back to the section around the following code example
File: app/code/local/Alanstormdotcom/Weblog/Model/Mysql4/Blogpost.php
class Alanstormdotcom_Weblog_Model_Mysql4_Blogpost extends Mage_Core_Model_Mysql4_Abstract{
protected function _construct()
{
$this->_init('weblog/blogpost', 'blogpost_id');
}
}