XGBoost4J Synchronization issue? - xgboost

We are using XGBoost4J for ML predictions. We expose the predictor as a RESTful web service so that various components within the platform can call it, e.g. to find the product category tree from product titles and descriptions.
Here is a simplified version of how we implemented it.
// This is done in the initialize method; for every model there is one singleton Booster object loaded.
class Predictor {
private Booster xgboost;
// init is called during service initialization, when the Predictor is injected
public void init(final String modelFile, final Integer numThreads) throws IOException {
if (!(new File(modelFile).exists())) {
throw new IOException("Modelfile " + modelFile + " does not exist");
}
// we use a util class Params to handle parameters as example
final Iterable<Entry<String, Object>> param = new Params() {
{
put("nthread", numThreads);
}
};
xgboost = new Booster(param, modelFile);
}
//Predict method
public String predict(final String predictionString){
final String dummyLabel = "-1";
final String x_n = dummyLabel + "\t" + x_n_libsvm_idxStr;
final DataLoader.CSRSparseData spData = XGboostSparseData.format(x_n);
final DMatrix x_n_dmatrix = new DMatrix(spData.rowHeaders,
spData.colIndex, spData.data, DMatrix.SparseType.CSR);
final float[][] predict = xgboost.predict(x_n_dmatrix);
// Then there is conversion logic of predict to predicted model result which returns predictions
String prediction = getPrediction(predict);
return prediction;
}
}
The Predictor class above is a singleton injected into the web service's Service class, so every request thread calls:
service.predict(predictionString);
The problem appears in the Tomcat container when multiple concurrent threads call the predict method, because the Booster method underneath is synchronized:
private synchronized float[][] pred(DMatrix data, boolean outPutMargin, long treeLimit, boolean predLeaf) throws XGBoostError {
byte optionMask = 0;
if(outPutMargin) {
optionMask = 1;
}
if(predLeaf) {
optionMask = 2;
}
float[][] rawPredicts = new float[1][];
ErrorHandle.checkCall(XgboostJNI.XGBoosterPredict(this.handle, data.getHandle(), optionMask, treeLimit, rawPredicts));
int row = (int)data.rowNum();
int col = rawPredicts[0].length / row;
float[][] predicts = new float[row][col];
for(int i = 0; i < rawPredicts[0].length; ++i) {
int r = i / col;
int c = i % col;
predicts[r][c] = rawPredicts[0][i];
}
return predicts;
}
Because of the synchronized method, threads wait on the lock, and as a result the web service does not scale.
We tried removing synchronized from the XGBoost4J source code and recompiling the jar, but it crashes within the first 1-2 minutes. The heap dump shows the crash at the line below, during the native call into XgboostJNI:
ErrorHandle.checkCall(XgboostJNI.XGBoosterPredict(this.handle, data.getHandle(), optionMask, treeLimit, rawPredicts));
Does anyone know a better way of using XGBoost4J for a highly scalable web service in Java?

You could use PMML via JPMML-XGBoost (https://github.com/jpmml/jpmml-xgboost); see https://github.com/jpmml/jpmml-xgboost/issues/7#issuecomment-250965282
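A different option from the PMML route, if you want to stay on XGBoost4J, is to keep the synchronized method but stop sharing one Booster: load the model several times and hand each request its own instance from a pool. Below is a minimal sketch of such a pool in plain Java. ModelPool and withModel are hypothetical names, and the sketch assumes that separate Booster instances do not share native state, which you would need to verify for your XGBoost4J version.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

/** Hypothetical pool that gives each caller exclusive use of one model instance. */
final class ModelPool<M> {
    private final BlockingQueue<M> idle;

    ModelPool(List<M> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("pool needs at least one instance");
        }
        // Capacity equals the number of instances, so returning one never overflows.
        this.idle = new ArrayBlockingQueue<>(instances.size(), false, instances);
    }

    /** Borrows an instance, runs the task on it, and always puts the instance back. */
    <R> R withModel(Function<M, R> task) {
        M model;
        try {
            model = idle.take(); // blocks only while every instance is busy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a model", e);
        }
        try {
            return task.apply(model);
        } finally {
            idle.add(model);
        }
    }
}
```

In the Predictor above, M would be Booster and the task would wrap the xgboost.predict(x_n_dmatrix) call. The pool size trades memory (one loaded model per instance) against the number of predictions that can run at once.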

Hibernate Search manual indexing throws "org.hibernate.TransientObjectException: The instance was not associated with this session"

I use Hibernate Search 5.11 in my Spring Boot 2 application to provide full-text search.
This library requires documents to be indexed.
While my app is running, I manually re-index the data of an indexed entity (MyEntity.class) every five minutes (for a specific reason related to my server context).
MyEntity.class has a property attachedFiles, which is a HashSet populated through a @OneToMany join with lazy loading enabled:
@OneToMany(mappedBy = "myEntity", cascade = CascadeType.ALL, orphanRemoval = true)
private Set<AttachedFile> attachedFiles = new HashSet<>();
I wrote the required indexing process, but an exception is thrown on "fullTextSession.index(result);" when the attachedFiles property of a given entity contains one or more items:
org.hibernate.TransientObjectException: The instance was not associated with this session
The debug mode indicates a message like "Unable to load [...]" on entity hashset value in this case.
And if the HashSet is empty (not null, only empty), no exception is thrown.
My indexing method :
private void indexDocumentsByEntityIds(List<Long> ids) {
final int BATCH_SIZE = 128;
Session session = entityManager.unwrap(Session.class);
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
CriteriaBuilder builder = session.getCriteriaBuilder();
CriteriaQuery<MyEntity> criteria = builder.createQuery(MyEntity.class);
Root<MyEntity> root = criteria.from(MyEntity.class);
criteria.select(root).where(root.get("id").in(ids));
TypedQuery<MyEntity> query = fullTextSession.createQuery(criteria);
List<MyEntity> results = query.getResultList();
int index = 0;
for (MyEntity result : results) {
index++;
try {
fullTextSession.index(result); //index each element
if (index % BATCH_SIZE == 0 || index == ids.size()) {
fullTextSession.flushToIndexes(); //apply changes to indexes
fullTextSession.clear(); //free memory since the queue is processed
}
} catch (TransientObjectException toEx) {
LOGGER.info(toEx.getMessage());
throw toEx;
}
}
}
Does someone have an idea ?
Thanks !
This is probably caused by the "clear" call you have in your loop.
In essence, what you're doing is:
1. load all entities to reindex into the session
2. index one batch of entities
3. remove all entities from the session (fullTextSession.clear())
4. try to index the next batch of entities, even though they are not in the session anymore
What you need to do is to only load each batch of entities after the session clearing, so that you're sure they are still in the session when you index them.
There's an example of how to do this in the documentation, using a scroll and an appropriate batch size: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#search-batchindex-flushtoindexes
Alternatively, you can just split your ID list in smaller lists of 128 elements, and for each of these lists, run a query to get the corresponding entities, reindex all these 128 entities, then flush and clear.
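The control flow of that alternative can be sketched in plain Java as follows. This is only an outline under assumptions: loadBatch, indexOne and flushAndClear are hypothetical callbacks standing in for the criteria query, fullTextSession.index(...) and flushToIndexes() followed by clear(). The point is that each batch is loaded only after the previous one has been cleared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class BatchReindexSketch {
    /**
     * Loads, indexes, then flushes and clears one batch of ids at a time.
     * Returns the number of entities that were indexed.
     */
    static <I, E> int reindexInBatches(List<I> ids, int batchSize,
                                       Function<List<I>, List<E>> loadBatch,
                                       Consumer<E> indexOne,
                                       Runnable flushAndClear) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int indexed = 0;
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            // Load only the entities of this batch, so they are attached when indexed.
            List<E> entities = loadBatch.apply(new ArrayList<>(ids.subList(from, to)));
            for (E entity : entities) {
                indexOne.accept(entity);
                indexed++;
            }
            // Flush and clear after the whole batch, even if some ids matched no entity.
            flushAndClear.run();
        }
        return indexed;
    }
}
```

Flushing after the inner loop rather than on a counter means a batch is still flushed when the query returns fewer entities than ids.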
Thanks for the explanations @yrodiere, they helped me a lot!
I chose your alternative solution:
Alternatively, you can just split your ID list in smaller lists of 128 elements, and for each of these lists, run a query to get the corresponding entities, reindex all these 128 entities, then flush and clear.
...and everything works perfectly!
Well spotted!
See the solution code below:
private List<List<Object>> splitList(List<Object> list, int subListSize) {
List<List<Object>> splittedList = new ArrayList<>();
if (!CollectionUtils.isEmpty(list)) {
int i = 0;
int nbItems = list.size();
while (i < nbItems) {
int maxLastSubListIndex = i + subListSize;
int lastSubListIndex = (maxLastSubListIndex > nbItems) ? nbItems : maxLastSubListIndex;
List<Object> subList = list.subList(i, lastSubListIndex);
splittedList.add(subList);
i = lastSubListIndex;
}
}
return splittedList;
}
private void indexDocumentsByEntityIds(Class<Object> clazz, String entityIdPropertyName, List<Object> ids) {
Session session = entityManager.unwrap(Session.class);
List<List<Object>> splittedIdsLists = splitList(ids, 128);
for (List<Object> splittedIds : splittedIdsLists) {
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
Transaction transaction = fullTextSession.beginTransaction();
CriteriaBuilder builder = session.getCriteriaBuilder();
CriteriaQuery<Object> criteria = builder.createQuery(clazz);
Root<Object> root = criteria.from(clazz);
criteria.select(root).where(root.get(entityIdPropertyName).in(splittedIds));
TypedQuery<Object> query = fullTextSession.createQuery(criteria);
List<Object> results = query.getResultList();
int index = 0;
for (Object result : results) {
index++;
try {
fullTextSession.index(result); //index each element
if (index == splittedIds.size()) {
fullTextSession.flushToIndexes(); //apply changes to indexes
fullTextSession.clear(); //free memory since the queue is processed
}
} catch (TransientObjectException toEx) {
LOGGER.info(toEx.getMessage());
throw toEx;
}
}
transaction.commit();
}
}

Small AS2 OOP Issue

I'm working with ActionScript 2 (not ready to upgrade yet, although that's irrelevant to the problem) but I'm having trouble with OOP and classes.
I've got a "Tool" class, written like so:
class com.Tool {
public var self:MovieClip;
private static var Type:String;
function Tool(T:String, X:Number, Y:Number) {
Type = T;
self = _root.createEmptyMovieClip("obj"+_root.getNextHighestDepth(), _root.getNextHighestDepth());
self._x = X;
self._y = Y;
self.width = 36;
self.height = 36;
self.onRollOver = function() {
trace(Type);
}
}
}
I create 3 of them in the main script like so:
var toolPan:Tool = new Tool("pan", 0, 0);
var toolSquare:Tool = new Tool("square", 0, 38);
var toolLine:Tool = new Tool("line", 0, 76);
It all works great, except the onRollOver. It's supposed to output the unique "Type" string, but it always outputs "line" (the last Type Tool created) regardless which one I roll over.
Needless to say, I'm still a beginner to all this. But it seems like they're all sharing the same variable :/ How do I make these variables unique to each object created?
Thank you very much!
It's because Type is declared static, so its value is shared by all instances of the class. Remove the static modifier and it should work:
private var Type:String;
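To see the difference in isolation, here is a small sketch in Java rather than ActionScript (static fields behave the same way in this respect). SharedTool and OwnTool are made-up names for illustration only.

```java
/** Static field: one storage slot shared by every instance. */
final class SharedTool {
    static String type;

    SharedTool(String t) {
        type = t; // overwrites the value seen by all earlier instances
    }

    String describe() {
        return type;
    }
}

/** Instance field: each object keeps its own value. */
final class OwnTool {
    private final String type;

    OwnTool(String t) {
        this.type = t;
    }

    String describe() {
        return type;
    }
}
```

Creating "pan", "square" and "line" with SharedTool leaves every instance reporting the last value assigned, while the OwnTool instances each report the value they were constructed with.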

How to calculate square root in sqlite

I need to calculate a Euclidean distance in a SQLite database.
Does anyone know how to calculate square roots in SQLite besides writing and loading a dynamic library for math functions?
I am close to resorting to the fast inverse square root algorithm (http://en.wikipedia.org/wiki/Fast_inverse_square_root), though it might turn into more fun than I need right now.
As a side note, it'd be great to figure out how to do power (which is the generalized question, and makes for cleaner code than multiplying a number by itself).
Thanks,
Simone
Well, I have a semi-answer.
Yes, it involves a 3rd party, but you don't have to write it yourself: did you check the last extension on this page?
It includes several math functions, and amongst them is sqrt().
Warning: this answer depends on the programming language; in my case, C#.
User-defined SQLite functions were a pain for me to implement. After a long search I finally got one working in my C# code. The main function looks like this:
[SQLiteFunction(Arguments = 1, FuncType = FunctionType.Scalar, Name = "Sqrt")]
class Sqrt : SQLiteFunction
{
public override object Invoke(object[] args)
{
return Math.Sqrt(Double.Parse(args[0].ToString()));
}
}
Registration of custom function:
SQLiteFunction.RegisterFunction(typeof(Sqrt));
And using in select:
SQLiteCommand com = new SQLiteCommand("select sqrt(10.42)", connection);
You can download full example here: http://db.tt/qzeNXwso
Or, if you only want to view the code (or go through all parts of it), I paste below a full working example that calculates a square root in a SQLite database, because working code for this is very hard to find. To create and run this example, follow these 6 steps:
Create new project (my name is Sqrt)
Include SQLite reference to your project: Solution Explorer -> References (right click: Add reference) -> Assemblies - Extensions - System.Data.SQLite (check) -> OK
Open App.config and replace its contents with this (without this step you may get a mixed mode assembly error):
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
<startup useLegacyV2RuntimeActivationPolicy="true">
<supportedRuntime version="v4.0"/>
</startup>
</configuration>
Replace your Form1.Designer.cs with this code:
namespace Sqrt
{
partial class Form1
{
/// <summary>
/// Required designer variable.
/// </summary>
private System.ComponentModel.IContainer components = null;
/// <summary>
/// Clean up any resources being used.
/// </summary>
/// <param name="disposing">true if managed resources should be disposed; otherwise, false.</param>
protected override void Dispose(bool disposing)
{
if (disposing && (components != null))
{
components.Dispose();
}
base.Dispose(disposing);
}
#region Windows Form Designer generated code
/// <summary>
/// Required method for Designer support - do not modify
/// the contents of this method with the code editor.
/// </summary>
private void InitializeComponent()
{
this.txb_Input = new System.Windows.Forms.TextBox();
this.txb_Output = new System.Windows.Forms.TextBox();
this.label1 = new System.Windows.Forms.Label();
this.label2 = new System.Windows.Forms.Label();
this.btn_Calcualte = new System.Windows.Forms.Button();
this.SuspendLayout();
//
// txb_Input
//
this.txb_Input.Location = new System.Drawing.Point(131, 12);
this.txb_Input.Name = "txb_Input";
this.txb_Input.Size = new System.Drawing.Size(201, 20);
this.txb_Input.TabIndex = 0;
//
// txb_Output
//
this.txb_Output.BackColor = System.Drawing.Color.WhiteSmoke;
this.txb_Output.Location = new System.Drawing.Point(131, 38);
this.txb_Output.Name = "txb_Output";
this.txb_Output.ReadOnly = true;
this.txb_Output.Size = new System.Drawing.Size(201, 20);
this.txb_Output.TabIndex = 0;
//
// label1
//
this.label1.AutoSize = true;
this.label1.Location = new System.Drawing.Point(12, 15);
this.label1.Name = "label1";
this.label1.Size = new System.Drawing.Size(31, 13);
this.label1.TabIndex = 1;
this.label1.Text = "Input";
//
// label2
//
this.label2.AutoSize = true;
this.label2.Location = new System.Drawing.Point(12, 41);
this.label2.Name = "label2";
this.label2.Size = new System.Drawing.Size(39, 13);
this.label2.TabIndex = 1;
this.label2.Text = "Output";
//
// btn_Calcualte
//
this.btn_Calcualte.Location = new System.Drawing.Point(257, 64);
this.btn_Calcualte.Name = "btn_Calcualte";
this.btn_Calcualte.Size = new System.Drawing.Size(75, 23);
this.btn_Calcualte.TabIndex = 2;
this.btn_Calcualte.Text = "Calculate";
this.btn_Calcualte.UseVisualStyleBackColor = true;
//
// Form1
//
this.AutoScaleDimensions = new System.Drawing.SizeF(6F, 13F);
this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
this.ClientSize = new System.Drawing.Size(344, 98);
this.Controls.Add(this.btn_Calcualte);
this.Controls.Add(this.label2);
this.Controls.Add(this.label1);
this.Controls.Add(this.txb_Output);
this.Controls.Add(this.txb_Input);
this.Name = "Form1";
this.Text = "Root square example";
this.ResumeLayout(false);
this.PerformLayout();
}
#endregion
private System.Windows.Forms.TextBox txb_Input;
private System.Windows.Forms.TextBox txb_Output;
private System.Windows.Forms.Label label1;
private System.Windows.Forms.Label label2;
private System.Windows.Forms.Button btn_Calcualte;
}
}
Open Form1.cs (code) and replace code with this:
using System;
using System.Data.SQLite;
using System.Windows.Forms;
namespace Sqrt
{
// definition of custom sqlite function
[SQLiteFunction(Arguments = 1, FuncType = FunctionType.Scalar, Name = "Sqrt")]
class Sqrt : SQLiteFunction
{
public override object Invoke(object[] args)
{
return Math.Sqrt(Double.Parse(args[0].ToString())); // return result of math sqrt function
}
}
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
this.btn_Calcualte.Click += new System.EventHandler(this.btn_Calcualte_Click);
}
private void btn_Calcualte_Click(object sender, EventArgs e)
{
if (txb_Input.Text.Length == 0)
return;
try { SQLiteConnection.CreateFile(AppDomain.CurrentDomain.BaseDirectory + "test.s3db"); }
catch { }
SQLiteConnection con = new SQLiteConnection("Data Source=test.s3db");
SQLiteFunction.RegisterFunction(typeof(Sqrt)); // register custom function
con.Open();
SQLiteCommand com = new SQLiteCommand("select sqrt(" + txb_Input.Text.Replace(',', '.') + ")", con); // select result
string res = com.ExecuteScalar().ToString();
txb_Output.Text = res;
}
}
}
Run, try and enjoy.
This is an approximation of sqrt for numbers under 10000. It can be extended for arbitrary numbers, and can be extended to arbitrary precision as needed. This kind of tabular interpolation is what happens in most fast implementations anyway:
case when weight >= 1 and weight<=10 then 1+0.240253073*(weight-1)
when weight>=10 and weight<=100 then 3.16227766+0.075974693*(weight-10)
when weight>=100 and weight<=1000 then 10+0.024025307*(weight-100)
else 31.6227766+0.007597469 *(weight-1000) end
And there's the curious fact that each factor you use in such a power-of-10 square root interpolation table is 0.316227766 times the previous one - so you can make this work for an arbitrarily large number, or even stuff a table full of these values to make it work for any number. (Could that lead to some compression here?)
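Here is a rough sketch in Java of how that observation extends the table to any x >= 1: walk up the powers of 10 and divide the slope by sqrt(10) at each step. SqrtInterpolation is a made-up name, and this is plain linear interpolation between the exact values at powers of 10, so it underestimates inside each interval (by my arithmetic, about 15% at worst).

```java
final class SqrtInterpolation {
    private static final double SQRT_10 = Math.sqrt(10.0);

    /** Piecewise-linear approximation of sqrt(x) with breakpoints at powers of 10. */
    static double approxSqrt(double x) {
        if (x < 1.0) {
            throw new IllegalArgumentException("this sketch only handles x >= 1");
        }
        double lo = 1.0;                      // lower breakpoint, 10^k
        double sqrtLo = 1.0;                  // exact sqrt of the lower breakpoint
        double slope = (SQRT_10 - 1.0) / 9.0; // 0.240253073..., slope on [1, 10]
        while (x > lo * 10.0) {
            lo *= 10.0;
            sqrtLo *= SQRT_10;
            slope /= SQRT_10;                 // each slope is 0.316227766 times the previous
        }
        return sqrtLo + slope * (x - lo);
    }
}
```

The result is exact (up to rounding) at 1, 10, 100 and so on; adding more breakpoints per decade would tighten the error in between.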
Or this cute one for log10 of integers, using the length function (an interpolation table might work better here, but I like that log10 and length() are similar, and that this works for any integer, with no interpolation needed):
((length(x)+length(x*2)+length(x*3)
+length(x*4)+length(x*5))/5.0)-1.0
A better math head than I can probably come up with better and denser approximations. Considering that most sqrt functions in c use approximations anyway - this is a pretty good solution.
This is the only native way of doing it.
As far as I know - you can't do that using only core functions.
Here are the lists of native functions: Core functions and Aggregate functions.
To solve your problem, you can write your own UDF (user-defined function) as illustrated HERE
Only if math functions are not available... and really only in desperation because this isn't gonna be fast...
-- bisect to find the square root to any tolerance desired
with
input(n) as (select 500), --input
sqrt(lo, hi, guess, n, i) as (
select 1, n, n/2, n, 0 from input
union all
select case when guess*guess < n then guess else lo end,
case when guess*guess < n then hi else guess end,
case when guess*guess < n then (hi+guess)/2.0 else (lo+guess)/2.0 end,
n ,
i +1 from sqrt
where abs(guess*guess - n) > 0.0001), -- tolerance
sqrt_out(x, n) as (select guess, n from sqrt order by sqrt.i desc limit 1)
select * from sqrt_out
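For comparison, the same bisection idea written procedurally in Java. This is a sketch, not a translation of the query: BisectSqrt is a made-up name, the lower bound starts at 0 instead of 1 so that inputs below 1 also work, and an iteration cap is added as a safety net.

```java
final class BisectSqrt {
    /** Bisects until guess*guess is within the given tolerance of n. */
    static double sqrt(double n, double tolerance) {
        if (n < 0 || tolerance <= 0) {
            throw new IllegalArgumentException("need n >= 0 and tolerance > 0");
        }
        double lo = 0.0;
        double hi = Math.max(1.0, n); // the root of n always lies in [0, max(1, n)]
        double guess = (lo + hi) / 2.0;
        // The cap guards against tolerances smaller than floating point can resolve.
        for (int i = 0; i < 200 && Math.abs(guess * guess - n) > tolerance; i++) {
            if (guess * guess < n) {
                lo = guess;
            } else {
                hi = guess;
            }
            guess = (lo + hi) / 2.0;
        }
        return guess;
    }
}
```

Each step halves the search interval, which is why this is slow compared with a built-in sqrt.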
2021-03-12 (3.35.0)
Added built-in SQL math functions(). (Requires the -DSQLITE_ENABLE_MATH_FUNCTIONS compile-time option.)
Built-In Mathematical SQL Functions
sqrt(X) Return the square root of X. NULL is returned if X is negative.

Insert 1000000 documents into RavenDB

I want to insert 1000000 documents into RavenDB.
class Program
{
private static string serverName;
private static string databaseName;
private static DocumentStore documentstore;
private static IDocumentSession _session;
static void Main(string[] args)
{
Console.WriteLine("Start...");
serverName = ConfigurationManager.AppSettings["ServerName"];
databaseName = ConfigurationManager.AppSettings["Database"];
documentstore = new DocumentStore { Url = serverName };
documentstore.Initialize();
Console.WriteLine("Initial Databse...");
_session = documentstore.OpenSession(databaseName);
for (int i = 0; i < 1000000; i++)
{
var person = new Person()
{
Fname = "Meysam" + i,
Lname = " Savameri" + i,
Bdate = DateTime.Now,
Salary = 6001 + i,
Address = "BITS provides one foreground and three background priority levels that" +
"you can use to prioritize transBfer jobs. Higher priority jobs preempt"+
"lower priority jobs. Jobs at the same priority level share transfer time,"+
"which prevents a large job from blocking small jobs in the transfer"+
"queue. Lower priority jobs do not receive transfer time until all the "+
"higher priority jobs are complete or in an error state. Background"+
"transfers are optimal because BITS uses idle network bandwidth to"+
"transfer the files. BITS increases or decreases the rate at which files "+
"are transferred based on the amount of idle network bandwidth that is"+
"available. If a network application begins to consume more bandwidth,"+
"BITS decreases its transfer rate to preserve the user's interactive"+
"experience. BITS supports multiple foreground jobs and one background"+
"transfer job at the same time.",
Email = "Meysam" + i + "@hotmail.com",
};
_session.Store(person);
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine("Count:" + i);
Console.ForegroundColor = ConsoleColor.White;
}
Console.WriteLine("Commit...");
_session.SaveChanges();
documentstore.Dispose();
_session.Dispose();
Console.WriteLine("Complete...");
Console.ReadLine();
}
}
But the session doesn't save the changes; I get an error:
An unhandled exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll
A document session is intended to handle a small number of requests. Instead, experiment with inserting in batches of 1024. After that, dispose the session and create a new one. The reason you get an OutOfMemoryException is because the document session caches all constituent objects to provide a unit of work, which is why you should dispose of the session after inserting a batch.
A neat way to do this is with a Batch LINQ extension (for example, the one in MoreLINQ):
foreach (var batch in Enumerable.Range(1, 1000000)
.Select(i => new Person { /* set properties */ })
.Batch(1024))
{
using (var session = documentstore.OpenSession())
{
foreach (var person in batch)
{
session.Store(person);
}
session.SaveChanges();
}
}
The implementations of both Enumerable.Range and Batch are lazy and don't keep all the objects in memory.
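To illustrate what "lazy" means here, below is a sketch in Java (not C#, and not MoreLINQ's actual implementation) of a batching iterator. LazyBatches is a made-up name. Only one chunk is materialised at a time, so earlier chunks can be garbage-collected once the caller is done with them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class LazyBatches {
    /** Wraps a source iterator so that it yields lists of at most `size` elements. */
    static <T> Iterator<List<T>> batch(Iterator<T> source, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public List<T> next() {
                if (!source.hasNext()) {
                    throw new NoSuchElementException();
                }
                // Pull elements from the source only when a chunk is requested.
                List<T> chunk = new ArrayList<>(size);
                while (chunk.size() < size && source.hasNext()) {
                    chunk.add(source.next());
                }
                return chunk;
            }
        };
    }
}
```

With a lazily generated source, peak memory is then bounded by one chunk rather than by the full million objects.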
RavenDB also has a bulk API that does a similar thing without the need for additional LINQ extensions:
using (var bulkInsert = store.BulkInsert())
{
for (int i = 0; i < 1000 * 1000; i++)
{
bulkInsert.Store(new User
{
Name = "Users #" + i
});
}
}
Note that .SaveChanges() isn't called explicitly; the documents are saved either when a batch size is reached (it can be defined in BulkInsert() if needed) or when the bulkInsert is disposed of.
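The flush-on-batch-size-or-dispose behaviour described above can be pictured with a small Java sketch. This is not RavenDB's implementation; BufferedBulkWriter and its sink callback are hypothetical stand-ins for the bulk operation and the call that sends a batch to the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers items and hands them to the sink in batches; closing sends the remainder. */
final class BufferedBulkWriter<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink; // hypothetical: sends one batch to the server
    private final List<T> buffer = new ArrayList<>();

    BufferedBulkWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void store(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush(); // batch size reached
        }
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // pass a copy so the buffer can be reused
        buffer.clear();
    }

    @Override
    public void close() {
        flush(); // whatever is left goes out when the writer is disposed
    }
}
```

Used in a try-with-resources block, the caller never flushes explicitly, and memory use stays bounded by the batch size.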

How do I mock this value using Rhino Mocks

Here is the method I'm trying to test:
public override void CalculateReductionOnYield()
{
log.LogEnter();
if (illus.RpFundStreams.Count <= 0)
{
throw new InvalidDataException("No regular premium fund streams which are required in order to calculate reduction on yield");
}
// Add the individual ReductionOnYield classes to the collection.
foreach (RegularPremiumFundStream fs in illus.RpFundStreams)
{
foreach (int i in ReductionOnYieldMonths)
{
ReductionOnYield roy = new ReductionOnYield(i);
roy.FundStream = fs;
ReductionsOnYield.Add(roy);
}
foreach (ReductionOnYield redOnYield in ReductionsOnYield)
{
if (redOnYield.Month == 0 || illus.RegularPremiumInPlanCurrency == 0M)
{
redOnYield.Reduction = 0M;
}
else
{
double[] regPremiums = new double[redOnYield.Month + 1];
for (int i = 1; i <= redOnYield.Month; i++)
{
regPremiums[i - 1] = Convert.ToDouble(-1*redOnYield.FundStream.FundStreamMonths[i].ValRegularPremium);
}
regPremiums[redOnYield.Month] = Convert.ToDouble(redOnYield.FundStream.GetFundStreamValue(redOnYield.Month));
redOnYield.Reduction = Convert.ToDecimal(Math.Pow((1 + Financial.IRR(ref regPremiums, 0.001D)), 12) - 1);
}
}
}
}
How do I mock all the required classes to test the value of redOnYield.Reduction and make sure that it is working properly?
e.g. how do I mock redOnYield.FundStream.GetFundStreamValue(redOnYield.Month) and redOnYield.FundStream.FundStreamMonths[i].ValRegularPremium ?
Is this a valid test? Or am I going about this the wrong way?
Without more info on your objects it's hard to say, but you want something like:
var fundStream = MockRepository.GenerateStub<TFundStream>();
fundStream.Stub(f => f.GetFundStreamValue(60)).Return(220000M);
var redOnYield = MockRepository.GenerateStub<TRedOnYield>();
redOnYield.Stub(r => r.FundStream).Return(fundStream);
redOnYield is an object returned from iterating ReductionsOnYield. I don't see where this is coming from. If we assume it's a virtual property, then you'll want to create a collection of mock ReductionOnYield objects and stub out ReductionsOnYield to return your mocked collection (or, to make it easier to test, have CalculateReductionOnYield accept an IEnumerable and operate on that collection).
Once you get the ReductionsOnYield issue resolved, Andrew's response of stubbing out the properties will get you where you want to be. Of course, this assumes that FundStream is virtual (so it can be mocked/stubbed) as well as RegularPremiumFundStream's GetFundStreamValue and FundStreamMonths.
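To show the shape such a test can take once the dependency is an interface that can be stubbed, here is a sketch in Java rather than C# and without a mocking library. FundStream, cashFlows and annualise are hypothetical names. The sketch covers only building the cash flow array and the final compounding step, not the IRR calculation, and it assumes the rate being compounded is a monthly one.

```java
final class CashFlowSketch {
    /** Hypothetical stand-in for the fund stream dependency. */
    interface FundStream {
        double regularPremium(int month);  // role of FundStreamMonths[i].ValRegularPremium
        double fundStreamValue(int month); // role of GetFundStreamValue(month)
    }

    /** Premiums for months 1..months as outflows, then the fund value as the final inflow. */
    static double[] cashFlows(FundStream fs, int months) {
        double[] flows = new double[months + 1];
        for (int i = 1; i <= months; i++) {
            flows[i - 1] = -1 * fs.regularPremium(i);
        }
        flows[months] = fs.fundStreamValue(months);
        return flows;
    }

    /** Compounds a per-month rate over 12 months, as in Math.Pow(1 + irr, 12) - 1. */
    static double annualise(double monthlyRate) {
        return Math.pow(1 + monthlyRate, 12) - 1;
    }
}
```

A test can then pass in a hand-written stub that returns fixed values and assert on the resulting array, which keeps the calculation separate from how the fund stream is loaded.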