Types of algorithms to use to predict item inventory stock - dataframe

I want to predict stock by analyzing stock movement.
[Image: stock movement]
STOCK:
[Image: stock table]
Which steps should I take to start analyzing the data using machine learning?
Which ML and DL algorithms should I use?
Thanks a lot.
I need:
The steps to start analyzing data using machine learning and deep learning.
The types of ML and DL algorithms to use.

Mohamed:
Your question requires a long and broad answer. I'll try to provide some steps and references for you to explore further.
First of all, you need the right data; the predictions will only be as good as your data. So you need economic data, market trends, and company-specific events.
Then you need to follow the standard steps:
Collect and Preprocess Data: You'll need to gather stock data such as daily closing prices, trading volumes, and other relevant financial information. Preprocessing the data involves cleaning and transforming the data so that it can be used for modeling.
Feature Engineering: This involves creating new features from the existing data that can be used as inputs for the ML/DL models. For example, you can calculate technical indicators such as moving averages, relative strength index (RSI), and Bollinger Bands (see the sketch after this list).
Split the Data: You'll need to split the data into training, validation, and testing sets. The training set is used to train the ML/DL models, the validation set is used to fine-tune the models, and the testing set is used to evaluate the performance of the models.
Select an Algorithm: There are various ML algorithms that can be used for stock price prediction. Some popular algorithms include:
ML Algorithms: Linear Regression, Random Forest, XGBoost, Support Vector Machines (SVM), etc.
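A minimal sketch of steps 2-4, assuming a pandas DataFrame df with a close price column from step 1 (the indicator windows and the Random Forest choice are only illustrations, not recommendations):

from sklearn.ensemble import RandomForestRegressor

# df: pandas DataFrame with at least a "close" column (assumed from step 1)
# Feature engineering: simple technical indicators (window sizes are arbitrary)
df["ma_10"] = df["close"].rolling(10).mean()      # 10-day moving average
delta = df["close"].diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
df["rsi_14"] = 100 - 100 / (1 + gain / loss)      # 14-day RSI
df["target"] = df["close"].shift(-1)              # next day's price as the target
df = df.dropna()

# Split chronologically: never shuffle time series
X, y = df[["ma_10", "rsi_14"]], df["target"]
cut = int(len(df) * 0.8)
X_train, X_test, y_train, y_test = X[:cut], X[cut:], y[:cut], y[cut:]

# Train and evaluate one of the candidate ML algorithms
model = RandomForestRegressor(n_estimators=200)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))                # R^2 on the held-out period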
Hope that helps.
Reference Book: https://learning.oreilly.com/api/v1/continue/9781800560796/
Machine Learning Engineering with MLflow - Chapter 2

Does knowledge distillation have an ensemble effect?

I don't know much about knowledge distillation.
I have one question.
I have a model showing 99% performance (10-class image classification), but I can't use a bigger model because I have to keep inference time down.
Would training with knowledge distillation from another, bigger model give an ensemble effect?
-------option-------
Alternatively, let me know if there's any other way to improve performance.
The technical answer is no. KD is a different technique from ensembling.
But they are related in the sense that KD was originally proposed to distill larger models, and the authors specifically cite ensemble models as the type of larger model they experimented on.
Net net, give KD a try with your big model as the teacher to see if you can keep a lot of its performance at the size of the smaller model. I have empirically found that you can retain 75%-80% of the power of a 5x larger model after distilling it down to the smaller one.
From the abstract of the KD paper:
A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
https://arxiv.org/abs/1503.02531
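If you want to try it, here is a minimal sketch of the standard distillation loss from the paper (PyTorch; the temperature T and the mixing weight alpha are hyperparameters you would tune):

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target loss: KL divergence between the softened teacher and
    # student distributions, scaled by T^2 as in Hinton et al. (2015)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target loss: ordinary cross-entropy against the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard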

AutoML: select the model manually

I have been looking for a while for the best pipeline to do some classification using AutoML. I want to know if it is possible to select the model manually and then just optimize its hyperparameters. For example, I want to optimize only the SVM's hyperparameters and don't care about other models.
You can optimize only the selected model in MLJAR AutoML. It is an open-source AutoML framework with the code available on GitHub: https://github.com/mljar/mljar-supervised
The example code will look like:
from supervised.automl import AutoML

automl = AutoML(algorithms=["Xgboost"], mode="Compete")
automl.fit(X, y)
The above code will tune only the Xgboost algorithm. The Compete mode is used because it performs the most extensive tuning; MLJAR AutoML can work in three modes: Explain, Perform, and Compete. The algorithms available in MLJAR AutoML are: Baseline, Linear, Random Forest, Extra Trees, Decision Tree, Neural Networks, Nearest Neighbors, Xgboost, LightGBM, and CatBoost.
I'm the author of MLJAR AutoML, and I'll be happy to help you set it up and run it.

Strange algorithm selection when using Azure AutoML with XGBoostClassifier on categorical data

I have a data model consisting only of categorical features and a categorical label.
So when I build that model manually in XGBoost, I would basically transform the features into binary columns (using LabelEncoder and OneHotEncoder) and the label into classes using LabelEncoder. I would then run a multiclass classification (multi:softmax).
I tried that with my dataset and ended up with an accuracy of around 0.4 (unfortunately I can't share the dataset due to confidentiality).
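Roughly, the manual setup looked like this (a minimal sketch with toy stand-in data, using pandas get_dummies in place of OneHotEncoder for brevity; the real columns are confidential):

import pandas as pd
import xgboost as xgb
from sklearn.preprocessing import LabelEncoder

# Toy stand-in data; the real feature names are confidential
df = pd.DataFrame({
    "color": ["red", "blue", "green", "red", "blue", "green"],
    "shape": ["round", "square", "round", "square", "round", "square"],
    "label": ["a", "b", "c", "a", "b", "c"],
})

X = pd.get_dummies(df[["color", "shape"]]).astype(int)  # features as binary columns
y = LabelEncoder().fit_transform(df["label"])           # label encoded into classes

model = xgb.XGBClassifier(objective="multi:softmax")
model.fit(X, y)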
Now, if I run the same dataset through Azure AutoML, I end up with an accuracy of around 0.85 in the best experiment. What is really interesting is that AutoML uses SparseNormalizer and XGBoostClassifier with reg:logistic as the objective.
So if I interpret this correctly, Azure ML just normalizes the data (somehow, from categorical data?) and then executes a logistic regression? Is this even possible / does it make sense with categorical data?
Thanks in advance.
TL;DR You're right that normalization doesn't make sense for training gradient-boosted decision trees (GBDTs) on categorical data, but it won't have an adverse impact either. AutoML is an automated framework for modeling: in exchange for fine-grained control, you get ease of use. It is still worth verifying that AutoML is receiving data with the columns properly encoded as categorical.
Think of an AutoML model as effectively a sklearn Pipeline, which is a bundled set of pre-processing steps along with a predictive Estimator. AutoML will attempt to sample from a large swath of pre-configured Pipelines such that the most accurate Pipeline will be discovered. As the docs say:
In every automated machine learning experiment, your data is automatically scaled or normalized to help algorithms perform well. During model training, one of the following scaling or normalization techniques will be applied to each model.
To see this, you can call .named_steps on your fitted model. Also check out fitted_model.get_featurization_summary().
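For example (assuming a completed AutoMLRun called automl_run; the variable names are illustrative):

# Retrieve the best child run and its fitted sklearn-style pipeline
best_run, fitted_model = automl_run.get_output()
print(fitted_model.named_steps)                   # pre-processing steps + estimator
print(fitted_model.get_featurization_summary())   # how each column was featurized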
I especially empathize with your concern w.r.t. how LightGBM (Microsoft's GBDT implementation) is leveraged by AutoML. LightGBM accepts categorical columns directly and, instead of one-hot encoding them, partitions their values into two subsets whenever it splits on them. Despite this, AutoML pre-processes away the categorical columns by one-hot encoding, scaling, and/or normalization, so this unique categorical handling is never utilized in AutoML.
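For contrast, a quick sketch of LightGBM's native categorical handling (toy data; min_child_samples is lowered only so the tiny example can split):

import lightgbm as lgb
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "green", "red", "blue", "green"],
    "y": [0, 1, 2, 0, 1, 2],
})
df["color"] = df["color"].astype("category")  # mark the column as categorical

# No one-hot encoding: LightGBM bins the category values into two subsets per split
model = lgb.LGBMClassifier(min_child_samples=1)
model.fit(df[["color"]], df["y"])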
If you're interested in "manual" ML in Azure ML, I highly suggest looking into Estimators and Azure ML Pipelines.

Testing Unsupervised Machine Learning Algorithms

All over the Internet I can see applications of supervised and unsupervised machine learning algorithms, but no one is talking about maintaining the quality of machine learning apps.
My recent analysis of how to test unsupervised machine learning algorithms brought up this point:
1) Cross-validation testing: the dataset is divided into equal folds (parts); all folds except one are used as the training dataset, and the held-out fold is later used as the test dataset, with a few more options around how the test and training datasets are used.
Are there more effective ways of testing unsupervised ML algorithms where the output is uncertain?
Depending on the type of algorithm (and the chosen distance) you used, you can still check whether the between-group variance and the within-group variance are changing a lot.
If your algorithm is still as good as when you built it, the between-group and within-group variances should not change much. If the between-group variance shrinks (or the within-group variance grows), it means the groups are not as well separated by your algorithm as before.
The second thing you can try is to keep some observations (that you know were well classified) and check whether they are still in the same group once you retrain your algorithm. If not, it doesn't necessarily mean your algorithm is wrong, but you can raise an alert in this case and look deeper.
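As a minimal sketch of the first check (plain NumPy; what counts as "changing a lot" is a threshold you have to pick yourself):

import numpy as np

def variance_report(X, labels):
    # Decompose total variance into within-group and between-group parts
    overall_mean = X.mean(axis=0)
    within = between = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        within += ((members - centroid) ** 2).sum()
        between += len(members) * ((centroid - overall_mean) ** 2).sum()
    n = len(X)
    return within / n, between / n

# Compare the report on a baseline window against fresh data over time;
# a shrinking between-group share suggests the groups are separating less well.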

Dataset for training text classifier

I'm new to data mining and I am trying to build a classifier that can classify student thesis abstracts into a predefined set of categories under the area of Computer Science, e.g. Machine Learning, Image Processing, etc.
I do not have enough classified abstracts to use as a training dataset, so could you please direct me to a dataset that can be used for this particular purpose?
You can use the DBLP data (downloadable from http://dblp.uni-trier.de/xml/) to generate a list of publications. Based on the conference/journal, you can derive your classes, e.g. JMLR is always Machine Learning.
You can acquire the abstracts using:
https://github.com/arc12/Text-Mining-Weak-Signals/blob/master/Abstract%20Acquisition%20Scripts/DBLP%20XML%20fetch%20abstracts%20.pl
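A rough sketch of the venue-to-class labeling idea (the venue map here is made up for illustration, and the real dblp.xml needs its DTD file alongside it so the XML character entities resolve):

import xml.etree.ElementTree as ET

# Hypothetical mapping from venue keyword to class label
VENUE_TO_CLASS = {"JMLR": "Machine Learning", "CVPR": "Image Processing"}

def iter_labeled_titles(path="dblp.xml"):
    # Stream-parse the very large file instead of loading it whole
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag in ("article", "inproceedings"):
            venue = elem.findtext("journal") or elem.findtext("booktitle") or ""
            for key, label in VENUE_TO_CLASS.items():
                if key in venue:
                    yield elem.findtext("title"), label
            elem.clear()  # free memory as we go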