Weighted sampling sequences to train a neural network in CNTK

I am training an LSTM on weighted sequences with CNTK. I started from the following example on language understanding: https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_202_Language_Understanding.ipynb
To train the network, they produce a CNTK Text Format (CTF) file like this:
19 |S0 178:1 |# BOS |S1 14:1 |# flight |S2 128:1 |# O
19 |S0 770:1 |# show |S2 128:1 |# O
19 |S0 429:1 |# flights |S2 128:1 |# O
19 |S0 444:1 |# from |S2 128:1 |# O
I have a weight associated with each sequence. Therefore, I generated the following output:
19 |weight 10 |S0 178:1 |# BOS |S1 14:1 |# flight |S2 128:1 |# O
19 |weight 10 |S0 770:1 |# show |S2 128:1 |# O
19 |weight 10 |S0 429:1 |# flights |S2 128:1 |# O
19 |weight 10 |S0 444:1 |# from |S2 128:1 |# O
I want to take the weight into account when training the network. One possible way to do this is to modify the loss function as follows: multiply the cross entropy by the weight of the instance.
def create_criterion_function(model):
    labels = Placeholder(name='labels')
    weight = Placeholder(name='weight')
    ce = weight * cross_entropy_with_softmax(model, labels)
    errs = classification_error(model, labels)
    return combine([ce, errs])  # (features, labels, weight) -> (loss, metric)
However, when I have many sequences, the network does not seem to learn. I have been told that this is a case of catastrophic forgetting:
Catastrophic forgetting (also: catastrophic interference) is a term, often used in connectionist literature, to describe a common problem with many traditional artificial neural network models. It refers to the catastrophic loss of previously learned responses, whenever an attempt is made to train the network with a single new (additional) response.
Another solution might be to sample the instances in the minibatch according to the weight: sequences with a higher weight should appear more often in the minibatches. Is there a way to do this in CNTK?
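For what it's worth, outside of CNTK I could approximate this by oversampling sequences in proportion to their weight when generating the CTF file. A rough sketch with NumPy (sample_sequences and its arguments are just illustrative names, not part of the tutorial):
import numpy as np

def sample_sequences(sequences, weights, n_draws, seed=0):
    # sequences: one entry per CTF sequence (the lines sharing a sequence id);
    # weights: the matching per-sequence weights
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    p /= p.sum()  # turn the weights into sampling probabilities
    idx = rng.choice(len(sequences), size=n_draws, replace=True, p=p)
    return [sequences[i] for i in idx]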

I think your approach is correct. However, note that scaling up the objective by 10 will also scale up your gradient by 10 for these samples. Before looking into catastrophic forgetting, I would first try reducing your learning rate by a factor of 10, so as to bring the gradient steps back into the same range as before.
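To make the scaling argument concrete (back-of-the-envelope; base_lr below is a made-up value, not one from the tutorial):
# Scaling the per-sequence loss by a weight w also scales its gradient by w,
# so an SGD step becomes lr * (w * g) instead of lr * g. Dividing the learning
# rate by the typical weight restores the original step size.
w = 10.0            # weight used in the CTF file above
base_lr = 0.02      # hypothetical learning rate from the unweighted setup
adjusted_lr = base_lr / w
print(adjusted_lr)  # 0.002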

Related

How to select half precision (BFLOAT16 vs FLOAT16) for your trained model?

How will you decide which precision works best for your inference model? Both BF16 and FP16 take two bytes, but they use a different number of bits for the fraction and the exponent.
The ranges will be different, but I am trying to understand why one would choose one over the other.
Thank you
|--------+------+----------+----------|
| Format | Bits | Exponent | Fraction |
|--------+------+----------+----------|
| FP32   |   32 |        8 |       23 |
| FP16   |   16 |        5 |       10 |
| BF16   |   16 |        8 |        7 |
|--------+------+----------+----------|
Range
bfloat16: ~1.18e-38 … ~3.40e38, with about 3 significant decimal digits of precision.
float16: ~5.96e-8 (smallest subnormal; smallest normal is ~6.10e-5) … 65504, with about 4 significant decimal digits of precision.
bfloat16 is generally easier to use, because it works as a drop-in replacement for float32. If your code doesn't create nan/inf numbers or turn a non-0 into a 0 with float32, then it shouldn't do it with bfloat16 either, roughly speaking. So, if your hardware supports it, I'd pick that.
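If you want to see the range difference directly, here is a quick check with PyTorch, which exposes both dtypes (exact printed values depend on rounding):
import torch

# float16 has a 5-bit exponent, so anything above 65504 overflows to inf
print(torch.tensor(70000.0, dtype=torch.float16))   # tensor(inf, dtype=torch.float16)

# bfloat16 keeps float32's 8-bit exponent: no overflow, but only ~3 digits survive
print(torch.tensor(70000.0, dtype=torch.bfloat16))  # tensor(70144., dtype=torch.bfloat16)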
Check out AMP if you choose float16.

Why do we consider time complexity as O(Logn) if a matrix is powered by a constant value?

I am trying to think about it, and I don't get it.
If the matrix size is 1x1, then raising it to a power n is O(n).
If the matrix size is 2x2, then raising it to the power 2 goes like this:
|a b|   |a b|
|c d| X |c d|
=> aa + bc, ab + bd, ca + dc, cb + dd.
Multiplication was done 8 times, addition 4 times.
So one 2x2 matrix multiplication takes 8 MULTs and 4 ADDs, 12 calculations overall.
I don't see the connection to O(log n)?
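For reference, the O(log n) usually refers to the number of matrix multiplications when the n-th power is computed by repeated squaring; each multiplication of a fixed-size matrix (like the 12 operations counted above for 2x2) is O(1). A sketch, assuming NumPy and a non-negative integer exponent:
import numpy as np

def mat_pow(M, n):
    # Raise the square matrix M to the integer power n by repeated squaring.
    # n is halved on every iteration, so only O(log n) matrix multiplications
    # are performed, each at constant cost for a fixed matrix size.
    result = np.eye(M.shape[0], dtype=M.dtype)
    while n > 0:
        if n & 1:          # current bit of the exponent is set
            result = result @ M
        M = M @ M          # square the base
        n >>= 1            # move to the next bit
    return result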

How to process a pandas DataFrame in parallel for each unique value in a column?

I'm looking for ideas to optimize my function. I have limited knowledge on multiprocessing so just looking for someone to point me in the right direction!
So, I have a pandas DataFrame with the following format
--------------------------------------------------
| Date       | Portfolio | Classification | P&L  |
--------------------------------------------------
| 2016-01-01 | A         | Class_1        |  100 |
| 2016-01-02 | A         | Class_2        |  200 |
| ...        | ...       | ...            |  ... |
| 2019-10-31 | A         | Class_700      | -200 |
--------------------------------------------------
All I need to get from this is a P&L attribution that basically explains how each unique Class generated P&L since its inception, and I have another function to do that...
Currently my logic is:
overall_results1, overall_results2 = [], []
for unique_class in df['Classification'].unique():
    class_df = df[df['Classification'] == unique_class].copy()
    result1, result2 = call_attribution_function(class_df)
    overall_results1.append(result1)
    overall_results2.append(result2)
This gets the job done but is obviously very slow. In the real scenario, I have over 700 unique classifications.
However, none of the classifications depend on each other. Basically all 700 could be processed in parallel, which would significantly improve performance.
Any ideas on how this can be achieved?
I've seen a few examples with joblib, but I didn't find much direction on slicing DataFrames and parallelizing functions with multiple return variables.
Any pointers highly appreciated! Thanks in advance!
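For reference, a rough joblib sketch along those lines (call_attribution_function is the function from the question and is assumed to be picklable):
from joblib import Parallel, delayed

# one sub-DataFrame per classification; a single groupby avoids 700 boolean-mask scans
groups = [grp.copy() for _, grp in df.groupby('Classification')]

# run the per-class attribution in parallel worker processes; each call
# returns a (result1, result2) pair, which we unzip afterwards
pairs = Parallel(n_jobs=-1)(
    delayed(call_attribution_function)(grp) for grp in groups
)
overall_results1, overall_results2 = map(list, zip(*pairs))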

How to use LinearRegression across groups in DataFrame?

Let us say my spark DataFrame (DF) looks like
id | age | earnings | health
----------------------------
 1 |  34 |       65 |      8
 2 |  65 |       12 |      4
 2 |  20 |        7 |     10
 1 |  40 |       75 |      7
 . |  .. |       .. |     ..
and I would like to group the DF, apply a function (say linear
regression which depends on multiple columns - two columns in this case -
of aggregated DF) on each aggregated DF and get output like
id | intercept| slope
----------------------
1 | ? | ?
2 | ? | ?
from sklearn.linear_model import LinearRegression

lr_object = LinearRegression()

def linear_regression(ith_DF):
    # Note: for me it is necessary that ith_DF should contain all
    # data within this function scope, so that I can apply any
    # function that needs all data in ith_DF
    X = [[i.earnings] for i in ith_DF.select("earnings").rdd.collect()]  # sklearn expects a 2-D X
    y = [i.health for i in ith_DF.select("health").rdd.collect()]
    lr_object.fit(X, y)
    return lr_object.intercept_, lr_object.coef_[0]

coefficient_collector = []

# following iteration is not possible in spark as 'GroupedData'
# object is not iterable, please consider it as pseudo code
for ith_df in df.groupby("id"):
    c, m = linear_regression(ith_df)
    coefficient_collector.append((float(c), float(m)))

model_df = spark.createDataFrame(coefficient_collector, ["intercept", "slope"])
model_df.show()
I think this can be done since Spark 2.3 using pandas UDFs. In fact, there is an example of fitting grouped regressions in the announcement of pandas UDFs here:
Introducing Pandas UDF for Python
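For reference, a sketch of that grouped pandas UDF approach (Spark 2.3/2.4-style GROUPED_MAP API; the schema and column names follow the question, with the id included so each fit can be matched to its group):
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import StructType, StructField, LongType, DoubleType
from sklearn.linear_model import LinearRegression

schema = StructType([
    StructField("id", LongType()),
    StructField("intercept", DoubleType()),
    StructField("slope", DoubleType()),
])

@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
def fit_group(pdf):
    # pdf is a pandas DataFrame holding every row for one id
    lr = LinearRegression().fit(pdf[["earnings"]], pdf["health"])
    return pd.DataFrame({"id": [pdf["id"].iloc[0]],
                         "intercept": [float(lr.intercept_)],
                         "slope": [float(lr.coef_[0])]})

model_df = df.groupby("id").apply(fit_group)
model_df.show()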
What I'd do is filter the main DataFrame into smaller DataFrames and do the processing, say a linear regression, on each.
You can then execute the linear regressions in parallel (on separate threads using the same SparkSession, which is thread-safe), with the main DataFrame cached.
That should give you the full power of Spark.
p.s. My limited understanding of that part of Spark makes me think that a very similar approach is used for grid search-based model selection in Spark MLlib and also TensorFrames which is "Experimental TensorFlow binding for Scala and Apache Spark".

Date Join Query with Calculated Fields

I'm creating an Access 2010 database to replace an old Paradox one. Just now getting to queries, and there is no hiding that I am new to SQL.
What I am trying to do is set up a query to be used by a graph. The graph's Y axis is to be a simple percentage passed, and the X axis is a certain day. The graph will be created on form load and subsequent new records entered with a date range of "Between Date() And Date()-30" (30 days, rolling).
The database I'm working with can have multiple inspections per day with multiple passes and multiple fails. Each inspection is a separate record.
For instance, on 11/26/2012 there were 7 inspections done; 5 passed and 2 failed, a 71% ((5/7)*100%) acceptance. The "11/26/2012" and "71%" represent a data point on the graph. On 11/27/2012 there were 8 inspections done; 4 passed and 4 failed, a 50% acceptance. Etc.
Here is an example of a query with fields "Date" and "Disposition" of date range "11/26/2012 - 11/27/2012:"
SELECT Inspection.Date, Inspection.Disposition
FROM Inspection
WHERE (((Inspection.Date) Between #11/26/2012# And #11/27/2012#) AND ((Inspection.Disposition)="PASS" Or (Inspection.Disposition)="FAIL"));
Date | Disposition
11/26/2012 | PASS
11/26/2012 | FAIL
11/26/2012 | FAIL
11/26/2012 | PASS
11/26/2012 | PASS
11/26/2012 | PASS
11/26/2012 | PASS
11/27/2012 | PASS
11/27/2012 | PASS
11/27/2012 | FAIL
11/27/2012 | PASS
11/27/2012 | FAIL
11/27/2012 | PASS
11/27/2012 | FAIL
11/27/2012 | FAIL
*NOTE - The date field is of type "Date," and the Disposition field is of type "Text." There are days where no inspections are done, and these days are not to show up on the graph. The inspection disposition can also be listed as "NA," which refers to another type of inspection not to be graphed.
Here is the layout I want to create in another query (again, for brevity, only 2 days in range):
Date | # Insp | # Passed | # Failed | % Acceptance
11/26/2012 | 7 | 5 | 2 | 71
11/27/2012 | 8 | 4 | 4 | 50
What I think needs to be done is some type of join on the record dates themselves and "calculated fields" in the rest of the query results. The problem is
that I haven't found out how to "flatten" the records by date AND maintain a count of the number of inspections and the number passed/failed all in one query. Do I need multiple layered queries for this? I prefer not to store any of the queries as tables as the only use of these numbers is in graphical form.
I was thinking of making new columns in the database to get around the "Disposition" field being Textual by assigning a PASS "1" and a FAIL "0," but this seems like a cop-out. There has to be a way to make this work in SQL, just I haven't found applicable examples.
Thanks for your help! Any input or suggestions are appreciated! Example databases with forms, queries, and graphs are also helpful!
You could group by Date, and then use aggregates like sum and count to calculate statistics for that group:
select Date
     , count(*) as [# Insp]
     , sum(iif(Disposition = 'PASS', 1, 0)) as [# Passed]
     , sum(iif(Disposition = 'FAIL', 1, 0)) as [# Failed]
     , 100.0 * sum(iif(Disposition = 'PASS', 1, 0)) / count(*) as [% Acceptance]
from YourTable
where Disposition in ('PASS', 'FAIL')
group by Date