What is the difference between a point process and a time-series model such as T-LSTM?

Can someone please explain this?
I know that a point process is used to model the influence of historical events on future events, and it seems that time-series models can achieve this as well.
So what is the difference between them?

Related

Pyomo: passing information from one optimisation to another

I've been slowly building up an energy dispatch model using Pyomo, which now has most of the dispatch constraints that I want included.
Now comes what I think may be the tricky bit. To avoid the problem of perfect foresight, I want to optimise one day at a time. When I optimise each subsequent day, the model will be passed new information from the previous day's result: the load point and whether each unit is committed. At the moment my model just runs each day independently and churns out a JSON file with the outputs from each day.
I think I now want to modify the workflow so that:
I optimise for day 'd'
Grab selected outputs from the optimisation of 'd'
These outputs become initial condition for 'd+1'
Optimise 'd+1'
Subsequently, I will add look-aheads into the code, but I think that will be straightforward once I have cracked the above.
I think I could work this out by writing some code that processes the outputs from 'd' outside of Pyomo, creates a new set of inputs for 'd+1' and then goes back to Pyomo to optimise 'd+1', but that feels like a cumbersome solution that might be quite slow. Is anyone able to point me towards an example or guidance on how I might tackle this in a more efficient way?
I wrote a little something to help post-process all the parameters and variables of a Pyomo model after solving.
This may help you easily collect all your outputs and feed them in as inputs for your next iteration.
Good luck.
https://github.com/judejeh/PyomoSolverWrapper
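For what it's worth, here is a minimal sketch of the rolling-horizon loop itself in plain Pyomo (not using the wrapper above). The `build_day_model` function and the `UNITS`, `commit` and `load_point` names are hypothetical placeholders for your own model components:

```python
import pyomo.environ as pyo

def solve_rolling_horizon(days, build_day_model, solver_name="cbc"):
    """Solve one day at a time, feeding each day's results into the next.

    build_day_model(day, initial_state) is assumed to return a ConcreteModel
    whose parameters are initialised from initial_state; the builder and the
    state keys used here are illustrative placeholders.
    """
    solver = pyo.SolverFactory(solver_name)
    initial_state = {}           # nothing to pass in on the first day
    results_by_day = {}

    for day in days:
        model = build_day_model(day, initial_state)
        solver.solve(model, tee=False)

        # Grab selected outputs from day d ...
        results_by_day[day] = {
            "commit": {u: pyo.value(model.commit[u]) for u in model.UNITS},
            "load_point": {u: pyo.value(model.load_point[u]) for u in model.UNITS},
        }
        # ... and hand them over as the initial conditions for day d+1.
        initial_state = results_by_day[day]

    return results_by_day
```

If rebuilding the model every day turns out to be slow, the same loop can instead keep a single model and update `pyo.Param(..., mutable=True)` values in place between solves.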

Neural Network: Convert HTML Table into JSON data

I'm kinda new to neural networks and just started learning to code them by trying some examples.
Two weeks ago I was searching for an interesting challenge and I found one. But I'm about to give up because it seems to be too hard for me... Still, I was curious to know whether any of you are able to solve this.
The Problem: Assume there are ".htm" files that contain tables about the same topic, but the table structure isn't the same for every file. For example: we have a lot of ".htm" files containing information about teacher substitutions per day per school. Because the structure of those ".htm" files isn't the same for every file, it would be hard to program a parser that could extract the data from those tables. So my thought was that this is a task for a neural network.
First Question: Is this a task a neural network can/should handle, or am I mistaken about that?
Because a neural network seemed to me to fit this kind of challenge, I tried to think of an input. I came up with two options:
First Input Option: Take the HTML code (only from the body tag) as a string and convert it to a tensor.
Second Input Option: Convert the HTML tables into images (via Canvas, maybe) and feed this input to the DNN through Conv2D layers.
Second Question: Are those options any good? Do you have a better solution for this?
After that I wanted to figure out how I would make a DNN output this heavily dynamic data for me. My thought was to convert my desired JSON output into tensors and feed them to the DNN while training, and for every prediction I would expect the DNN to return a tensor that is convertible into a JSON output...
Third Question: Is it even possible to get such detailed output from a DNN? And if yes: do you think the output would be suitable for this task?
Last Question: Assuming all my assumptions are correct, wouldn't training this DNN take forever? Let's say you have an RTX 2080 Ti for it. What would you guess?
I guess that's it. I hope I can learn a lot from you guys!
(I'm sorry about my bad English - it's not my native language)
Addition:
Here is a more in-depth example. Let's say we have a ".htm" file that looks like this:
The task would be to get all the relevant information from this table. For example:
All students from class "9c" don't have lessons in their 6th hour due to a cancellation.
1) This is not a particularly suitable problem for a neural network, as your domain is structured data with clear dependencies. Tree-based ML algorithms tend to show much better results on such problems.
2) Both of your input choices are very unstructured; learning from such data would be nearly impossible. There are clear ways to give the model more knowledge. For example, you have the same data in different formats, and only the structure differs. That means the model needs to learn a mapping from one structure to another; it doesn't need to know the data itself. Hence, words can be tokenized with unique identifiers to remove unnecessary information, the .htm data can be parsed into a tree (as can the JSON), and there are different ways to represent such graph structures for use in an ML model (a small sketch of the tokenization step follows below).
3) It seems that the only adequate output option is a sequence of identifiers pointing to unique entities from the text. The whole problem is then similar to Seq2Seq, which is best solved by RNNs with an encoder-decoder architecture (sketched at the end of this answer).
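To make the tokenization/tree-parsing idea in 2) concrete, here is a minimal sketch using only the Python standard library; the table content and the id scheme are made up for illustration:

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

def tokenize(rows):
    """Map every distinct cell string to an integer id (vocabulary building)."""
    vocab = {}
    ids = [[vocab.setdefault(cell, len(vocab)) for cell in row] for row in rows]
    return ids, vocab

parser = TableParser()
parser.feed("<table><tr><th>Class</th><th>Hour</th></tr>"
            "<tr><td>9c</td><td>6</td></tr></table>")
ids, vocab = tokenize(parser.rows)
print(parser.rows, ids, vocab)
```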
I believe that, if there is enough data and the .htm files don't have a huge amount of noise, the task can be completed. Training time depends hugely on the selected model and its complexity, as well as on the diversity of the initial data.
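And a minimal skeleton of the Seq2Seq encoder-decoder mentioned in 3), in Keras: token ids from the parsed .htm go into the encoder, and the decoder emits the ids of the target JSON tokens (trained with teacher forcing). The vocabulary size and layer widths are arbitrary placeholders:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, emb_dim, units = 1000, 64, 128   # placeholder sizes

# Encoder: a sequence of input-table token ids -> final LSTM state.
enc_in = layers.Input(shape=(None,), dtype="int32")
enc_emb = layers.Embedding(vocab_size, emb_dim)(enc_in)
_, state_h, state_c = layers.LSTM(units, return_state=True)(enc_emb)

# Decoder: emits a sequence of output (JSON) token ids, teacher-forced at train time.
dec_in = layers.Input(shape=(None,), dtype="int32")
dec_emb = layers.Embedding(vocab_size, emb_dim)(dec_in)
dec_out, _, _ = layers.LSTM(units, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = layers.Dense(vocab_size)(dec_out)

model = Model([enc_in, dec_in], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.summary()
```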

Can .tflite capture tf.hub.text_embedding_column() processes?

Just a general question here, no reproducible example, but I thought this might be the right place anyway since it's very software-specific.
I am building a model which I want to convert to .tflite. It relies on tf.hub.text_embedding_column() for feature generation. When I convert to .tflite, will this be captured such that the resulting model will take raw text as input rather than a sparse vector representation?
Would be good to know just generally before I invest too much time in this approach. Thanks in advance!
Currently I don't imagine this would work, as we do not support enough string ops to implement that. One approach would be to do this handling through a custom op, but implementing this custom op would require domain knowledge and would mitigate the ease-of-use advantage of using TF Hub in the first place.
There is some interest in defining a set of hub operators that are verified to work well with tflite, but this is not yet ready.

What are the types of problems TensorFlow can help solve? [closed]

The TensorFlow home page describes its purpose as 'a software library for numerical computation'. Looking through the sample problems it looks like a problem is always formulated as follows:
1) Input
2) Model parameters
3) Desired output
Given some training data for 1) and 3), 2) can be computed.
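For concreteness, here is a minimal sketch of that formulation in TensorFlow 2; the inputs and desired outputs are made-up toy data, and gradient descent recovers the two model parameters:

```python
import tensorflow as tf

# 1) Input and 3) desired output: toy data generated from y = 3x + 2 plus noise.
x = tf.random.normal([100, 1])
y = 3.0 * x + 2.0 + 0.1 * tf.random.normal([100, 1])

# 2) Model parameters: the unknowns that TensorFlow computes from 1) and 3).
w = tf.Variable(0.0)
b = tf.Variable(0.0)

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
for _ in range(200):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x + b - y))
    grads = tape.gradient(loss, [w, b])
    optimizer.apply_gradients(zip(grads, [w, b]))

print(w.numpy(), b.numpy())   # should approach 3.0 and 2.0
```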
I can see how this can be used to create bots, self-driving cars, image classifiers etc.
Given the broad definition of 'numerical computation', am I missing a class of other problems this can be used for? Can this be used for, say, more classical numerical computations such as the airflow around an aircraft or deformation of a structure under stress? Do you have any examples of how these classical problems would have to be formulated to fit the form above?
A nice discussion on what artificial neural networks could do: the fact that our brain is a neural network might imply that eventually an artificial neural network will be able to do the same tasks.
Some more examples of artificial neural networks used today: music creation, image-based location, PageRank, Google Voice, stock trade prediction, NASA star classification, traffic management.
Some fields I know of but do not have a good reference for:
optical quantum mechanics test set-up generator
medical diagnosis, reference only about safety
The Sharp LogiCook microwave oven, wiki, NASA mention
I think there are many millions of "problems" that can be solved with an ANN; deciding on the data representation (input, output) will be a challenge for some of these. Some useful and useless examples I have been thinking about:
a home thermostat that learns your wishes for certain weather types
bakery production prediction
recognizing Go stones on a board and mapping their locations
a personal activity guesser that turns on the appropriate device
recognizing a person based on their mouse movements
Given the right data and network these examples will work.
Dad has a PC controlling the heating system back home; I trained a network based on his 10 years of heating data (outside temp, inside temp, humidity, etc.), but unfortunately I am not allowed to hook it up.
My aunt and uncle have a bakery; based on 6 years of sales data I trained a network predicting how many breads and buns they should make. It showed me how important the correct inputs are: first I used the day of the year, but when I switched to the day of the week I saw a 15% increase in accuracy.
Currently I am working on a network that will detect a Go board in a given image and map all 361 locations, telling me whether a black stone, a white stone, or no stone is present.
Two examples that showed me how much information can be stored in a single neuron, and different ways to represent data:
Image example, neuron example (unfortunately you have to train both examples yourself, so give them a little time).
On to your example: airflow around an aircraft.
I know next to nothing about airflow calculations, and my attempt would be a really huge 3D input layer where you can "draw" an airplane along with the direction and speed of the airflow.
It might work, but it would require a tremendous amount of computing power; somebody who knows more about this specific topic probably knows a more abstract way of representing the data, resulting in a more manageable network.
This NASA paper talks about a neural network for calculating airflow around a wing. Unfortunately I do not understand what kind of input they use; maybe it is clearer to you.

Amazon EC2 vs PiCloud [closed]

We are students trying to handle a dataset of about 140 million records and to run a few machine learning algorithms on it. We are newbies to cloud solutions and Mahout implementations. Currently we have the data set up in a PostgreSQL database, but the implementation doesn't scale and read/write operations seem extremely slow even after numerous rounds of performance tuning. Hence we are planning to move to cloud-based services.
We have explored a few possible alternatives.
Amazon cloud-based services (Mahout implementation)
PiCloud with scikit-learn (we were planning to use the HDF5 format with NumPy)
Please recommend any other alternatives if any.
Here are our questions:
Which would yield better results (turnaround time) and be more cost-effective? Please mention any other alternatives.
If we set up Amazon services, what data format should we use? If we use DynamoDB, will the cost shoot up?
Thanks
It depends on the nature of the machine learning problem you want to solve. I would recommend that you first subsample your dataset to something that fits in memory (e.g. 100k samples with a few hundred non-zero features per sample, assuming a sparse representation).
Then try a couple of machine learning algorithms in scikit-learn that scale to large numbers of samples:
SGDClassifier or MultinomialNB if you want to do supervised classification (if you have categorical labels to predict in your dataset)
SGDRegressor if you want to do supervised regression (if you have a continuous target variable to predict)
MiniBatchKMeans clustering to do unsupervised clustering (but then there is no objective way to quantify the quality of the resulting clusters by default).
...
Perform grid search to find the optimal values of the hyperparameters of the model (e.g. the regularizer alpha and the number of passes n_iter for SGDClassifier) and evaluate the performance using cross-validation.
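For example, a minimal sketch of that grid search, assuming a feature matrix X and labels y already loaded in memory (the toy data below is a stand-in; note that recent scikit-learn releases call the number-of-passes parameter max_iter rather than n_iter):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in for the subsampled dataset; replace with your own X, y.
X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)

param_grid = {
    "alpha": [1e-5, 1e-4, 1e-3],   # regularizer strength
    "max_iter": [5, 20, 50],       # number of passes (n_iter in older releases)
}
search = GridSearchCV(SGDClassifier(random_state=0), param_grid, cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```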
Once done, retry with a 2x larger dataset (still fitting in memory) and see if it improves your predictive accuracy significantly. If it doesn't, then don't waste your time trying to parallelize this on a cluster to run it on the full dataset, as that won't yield any better results.
If it does, what you could do is shard the data into pieces, put a slice of the data on each node, learn an SGDClassifier or SGDRegressor model on each node independently with PiCloud, collect back the weights (coef_ and intercept_), and then compute the average weights to build the final linear model and evaluate it on some held-out slice of your dataset.
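A rough sketch of that shard-and-average scheme (the two in-memory shards below stand in for the per-node slices; on PiCloud each fit would run as a separate job):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Two shards standing in for the slices that would live on separate nodes.
X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
shards = [(X[:10_000], y[:10_000]), (X[10_000:], y[10_000:])]

# Train one model per shard (on a cluster, each fit runs independently).
models = [SGDClassifier(random_state=0).fit(Xs, ys) for Xs, ys in shards]

# Average coef_ and intercept_ to build the final linear model.
averaged = SGDClassifier()
averaged.coef_ = np.mean([m.coef_ for m in models], axis=0)
averaged.intercept_ = np.mean([m.intercept_ for m in models], axis=0)
averaged.classes_ = models[0].classes_

print(averaged.score(X, y))   # ideally evaluate on a held-out slice instead
```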
To learn more about the error analysis, have a look at how to plot learning curves:
http://digitheadslabnotebook.blogspot.fr/2011/12/practical-advice-for-applying-machine.html
https://gist.github.com/1540431
http://jakevdp.github.com/tutorial/astronomy/practical.html#bias-variance-over-fitting-and-under-fitting
PiCloud is built on top of AWS, so either way you're going to be using Amazon at the end of the day. The question is how much infrastructure you'll have to write yourself to get everything wired together. PiCloud gives some free usage to put it through its paces, so you might give it a shot initially. I haven't used it myself, but clearly they're trying to provide ease of deployment for machine-learning-type applications.
It seems like you are after results rather than building a cloud project, so I would look into using one of Amazon's other services besides straight EC2, or another offering like PiCloud or Heroku that can take care of the bootstrapping.
AWS has a program in place for supporting educational users, so you might want to do some research into that program.
You should take a look at Numba if you are looking for some NumPy speed-ups:
https://github.com/numba/numba
Doesn't solve your cloud scaling issue, but may reduce time to compute.
I just made a comparison between PiCloud and Amazon EC2 that might be helpful.