How to predict election turnout using previous data? - Bayesian

I am trying to predict election turnout based on previous (historical) data, without advanced machine learning. How can I use Bayes' theorem to predict each candidate's chance of winning the election?
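One common starting point is a Beta-Binomial model: put a prior on a candidate's vote share, update it with historical counts via Bayes' theorem, and read the win probability off the posterior. A minimal sketch, where the two-candidate setup and all counts are illustrative assumptions:

```python
# A minimal Beta-Binomial sketch: estimate the probability that candidate A
# wins a two-candidate race. The historical counts below are made up for
# illustration.
import numpy as np

rng = np.random.default_rng(0)

votes_a = 5200       # hypothetical historical votes for candidate A
votes_total = 10000  # hypothetical total votes cast

# Beta(1, 1) is a uniform prior over candidate A's true vote share p.
alpha_prior, beta_prior = 1.0, 1.0

# Bayes' theorem with a Binomial likelihood gives a Beta posterior:
# p | data ~ Beta(alpha_prior + votes_a, beta_prior + votes_total - votes_a)
alpha_post = alpha_prior + votes_a
beta_post = beta_prior + (votes_total - votes_a)

# P(A wins) = P(p > 0.5), estimated by sampling from the posterior.
samples = rng.beta(alpha_post, beta_post, size=100_000)
print("P(candidate A wins) =", round(float((samples > 0.5).mean()), 3))
```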

Related

Types of algorithms used to predict item inventory stock

I want to predict stock levels by analyzing stock movement.
[image: stock movement]
[image: stock levels]
I need:
1. The steps to start analyzing the data using machine learning and deep learning.
2. The types of ML and DL algorithms to use.
Thanks a lot.
Mohamed:
Your question requires a long and broad answer. I'll try to provide some steps and references for you to explore further.
First of all, you need the right data; the prediction will only be as good as your data. For stocks that means economic data, market trends, and company-specific events.
Then you need to follow the standard steps (a minimal sketch tying them together follows the list):
Collect and Preprocess Data: You'll need to gather stock data such as daily closing prices, trading volumes, and other relevant financial information. Preprocessing the data involves cleaning and transforming the data so that it can be used for modeling.
Feature Engineering: This involves creating new features from the existing data that can be used as inputs for the ML/DL models. For example, you can calculate technical indicators such as moving averages, relative strength index (RSI), and Bollinger Bands.
Split the Data: You'll need to split the data into training, validation, and testing sets. The training set is used to train the ML/DL models, the validation set is used to fine-tune the models, and the testing set is used to evaluate the performance of the models.
Select an Algorithm: There are various ML algorithms that can be used for stock price prediction. Some popular algorithms include:
ML Algorithms: Linear Regression, Random Forest, XGBoost, Support Vector Machines (SVM), etc.
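A minimal sketch of steps 1-4. The file name "stock.csv" and the columns "date", "close", and "volume" are hypothetical; this is an illustration of the workflow, not a production pipeline.

```python
# Minimal sketch of the preprocess -> engineer -> split -> fit workflow.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Step 1: collect and preprocess.
df = pd.read_csv("stock.csv", parse_dates=["date"]).sort_values("date")

# Step 2: feature engineering - a 20-day moving average and a simple 14-day RSI.
df["ma_20"] = df["close"].rolling(20).mean()
delta = df["close"].diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
df["rsi_14"] = 100 - 100 / (1 + gain / loss)

df["target"] = df["close"].shift(-1)   # predict the next day's close
df = df.dropna()
features = ["close", "volume", "ma_20", "rsi_14"]

# Step 3: chronological train/validation/test split (never shuffle a time series).
n = len(df)
train = df.iloc[: int(n * 0.7)]
valid = df.iloc[int(n * 0.7) : int(n * 0.85)]
test = df.iloc[int(n * 0.85) :]

# Step 4: fit one of the listed algorithms (Random Forest here).
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(train[features], train["target"])
print("validation MAE:", mean_absolute_error(valid["target"], model.predict(valid[features])))
print("test MAE:", mean_absolute_error(test["target"], model.predict(test[features])))
```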
Hope that helps.
Reference: Machine Learning Engineering with MLflow, Chapter 2 - https://learning.oreilly.com/api/v1/continue/9781800560796/

Recurrent Neural Models with Multiple Individuals/IDs

I'm fairly new to the world of machine learning and am trying to make some predictions on fantasy sports performance in baseball on any given night. Given the sequential nature of the data, recurrent neural networks seemed like a good starting point.
I understand the basic principles of RNNs, but what isn't clear to me is how to incorporate multiple time series from different individuals into a single model. For instance, we have performance for 2000 players across each of their careers, and hence 2000 distinct time series. In order to make use of an RNN, would I have to build a model for each player separately, or is it possible/better to pass a player's ID into the model as a feature?
If the latter is possible, I'm still unsure how this would work mechanically, because players have time series of different lengths, and we would have observations from many time series at any particular point in time.
Some references/examples/advice would be very helpful.
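One way to make the ID-as-feature option concrete is a learned embedding per player, concatenated to each timestep's features, with careers padded to a common length and packed so the RNN ignores the padding. A minimal PyTorch sketch, where all names, shapes, and sizes are illustrative assumptions:

```python
# Single model over many players: embed the player ID, repeat the embedding
# across timesteps, and handle variable-length careers with packing.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class PlayerRNN(nn.Module):
    def __init__(self, n_players, n_features, emb_dim=16, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_players, emb_dim)   # one learned vector per player ID
        self.lstm = nn.LSTM(n_features + emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)                # e.g. next-game fantasy score

    def forward(self, player_ids, seqs, lengths):
        # player_ids: (batch,), seqs: (batch, max_len, n_features)
        emb = self.embed(player_ids)                          # (batch, emb_dim)
        emb = emb.unsqueeze(1).expand(-1, seqs.size(1), -1)   # repeat per timestep
        x = torch.cat([seqs, emb], dim=-1)
        packed = pack_padded_sequence(x, lengths, batch_first=True,
                                      enforce_sorted=False)   # skip the padding
        _, (h, _) = self.lstm(packed)
        return self.head(h[-1])                               # (batch, 1)

# Toy batch: 3 players with careers of different lengths, 5 features per game.
model = PlayerRNN(n_players=2000, n_features=5)
ids = torch.tensor([0, 17, 42])
seqs = torch.zeros(3, 10, 5)            # padded to the longest career in the batch
lengths = torch.tensor([10, 7, 3])      # true sequence lengths
print(model(ids, seqs, lengths).shape)  # torch.Size([3, 1])
```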

Dynamically find the right number of timesteps for an LSTM RNN predicting time series

I'm fairly new to the world of NNs, so my question may be extremely stupid; sorry in advance:
I'm working on a predictive autoscaler, trying to predict the workload of different, unknown applications using only historical workload data about each application.
One of my predictors is an LSTM RNN. To make the RNN predict the values in question, I have to define the timestep, that is, the number of lags I feed into the RNN to predict the future value (I hope I used the right terms here). A lot of tutorials and literature seem to set the timestep to a value that seems pretty arbitrary to me. My question can be divided into two subquestions:
1. Given that I don't know the time series during implementation: is there any way to compute this value other than trying different values and comparing the confidence of the predictions?
2. How does this value influence what the RNN learns about the time series?
I sadly lack any intuition about what this value influences. To give an example of my confusion:
Suppose I have a time series with yearly seasonality, but I decide to feed in only a week of data to make the next prediction: is the network able to learn this yearly seasonality? Part of me says no, because it can't learn that the partial correlation between the timestamp in question and the lag 365 days earlier is very high; it simply does not have that data. Or can it, because it has seen the data from a year ago during training, learned that fairly similar pattern, and simply applies it now (which I guess is more likely to be right)?
Is my assumption right that taking too many timesteps into the equation makes the network overfit?
Can you please help me get a vague understanding of what this parameter influences in the grand scheme of things, and what properties of a time series should influence my choice of this value?
Thank you so much and stay healthy :)
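For subquestion 1, one common heuristic is to inspect the autocorrelation function of the series and size the lookback window to cover the dominant seasonal lag. A minimal sketch, using a synthetic daily series with yearly seasonality as an assumed stand-in for real workload data:

```python
# Estimate the dominant seasonal lag from the autocorrelation function,
# then size the LSTM lookback window to cover it.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(0)
t = np.arange(3 * 365)
# Synthetic daily workload: yearly seasonality plus noise.
series = 10 + 5 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 1, t.size)

autocorr = acf(series, nlags=400)
# Skip the short-range decay and find the first big seasonal peak.
seasonal_lag = 30 + int(np.argmax(autocorr[30:]))
print("dominant seasonal lag:", seasonal_lag)  # ~365 for this series

# A timestep (lookback window) shorter than this lag means no single
# input window ever contains the seasonal lag, so the network can only
# pick the yearly pattern up indirectly, as discussed in the question.
```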

Why do machine learning algorithms focus on speed and not accuracy?

I study ML, and I see that most of the time the focus of the algorithms is run time and not accuracy: reducing features, taking samples from the data set, using approximations, and so on.
I'm not sure why this is the focus, since once I have trained my model I don't need to train it again if its accuracy is high enough. Whether training takes 1 hour or 10 days does not really matter, because I do it only once, and my goal is to predict my outcomes as well as I can (minimum loss).
If I train a model to distinguish between cats and dogs, I want it to be as accurate as possible, not as fast as possible, since once I have trained this model I don't need to train any more models.
I can understand why models that depend on fast-changing data need this focus on speed, but for models trained once I don't understand why the focus is on speed.
Speed is a relative term, and accuracy is also relative, depending on the difficulty of the task. Currently the goal is to achieve human-like performance in applications at reasonable cost, because this can replace human labor and cut costs.
From what I have seen in reading papers, people usually focus on accuracy first to produce something that works. Then they do ablation studies - studies where pieces of the model are removed or modified - to achieve the same performance with less time or memory.
The field is very experimentally driven. There really isn't much of a theory that states why CNNs work so well, other than that they can model any function given non-linear activation functions (https://en.wikipedia.org/wiki/Universal_approximation_theorem). There have been some recent efforts to explain why they work well. One I recall is MobileNetV2: Inverted Residuals and Linear Bottlenecks. The explanation of embedding data into a low-dimensional space without losing information might be worth reading.

Testing Unsupervised Machine Learning Algorithms

All over the Internet I can see applications of supervised and unsupervised machine learning algorithms, but no one is talking about maintaining the quality of machine learning apps.
A recent analysis of how to test unsupervised machine learning algorithms brought up these points:
1) Cross-validation testing: the dataset is divided into equal folds (parts); all folds except one are used as the training dataset, and the remaining fold is later used as the test dataset. There are a few more options around using test and training datasets.
Are there more effective ways of testing unsupervised ML algorithms where the output is uncertain?
Depending on the type of algorithm (and the chosen distance) you used, you can still check whether the between-group variance and the within-group variance are changing a lot.
If your algorithm is still as good as when you built it, the between-group and within-group variances should not change that much. If the between-group variance shrinks (or the within-group variance grows), it means the groups are not separated as well by your algorithm as before.
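A minimal sketch of this variance check, using k-means and the between/within variance decomposition; the data, the model, and the alert threshold are illustrative assumptions:

```python
# Monitor cluster separation: compare the between-group share of total
# variance at training time against its value on fresh production data.
import numpy as np
from sklearn.cluster import KMeans

def variance_ratio(X, labels):
    """Between-group variance divided by total variance (higher = better separated)."""
    overall_mean = X.mean(axis=0)
    between = sum(
        (labels == k).sum() * np.sum((X[labels == k].mean(axis=0) - overall_mean) ** 2)
        for k in np.unique(labels)
    )
    total = np.sum((X - overall_mean) ** 2)
    return between / total

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, (500, 4))        # stand-in for the training data
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
baseline = variance_ratio(X_train, km.labels_)

# Later, on fresh production data: alert if separation degrades a lot.
X_new = rng.normal(0, 1.5, (500, 4))
current = variance_ratio(X_new, km.predict(X_new))
if abs(current - baseline) > 0.1:           # threshold is an assumption
    print(f"alert: separation changed ({baseline:.2f} -> {current:.2f})")
```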
The second thing you can try is to keep some observations (that you know were well classified) and check whether they are still in the same group once you retrain your algorithm. If they are not, it doesn't mean your algorithm is wrong, but you can raise an alert in this case to look deeper.
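A sketch of this second check, continuing the previous example (it reuses X_train, km, and X_new); the adjusted Rand index and the threshold are one possible choice, not the only one:

```python
# Keep reference points whose assignment you trust, retrain, and alert
# if their grouping changes.
from sklearn.metrics import adjusted_rand_score

ref_points = X_train[:20]              # observations known to be well classified
ref_labels = km.predict(ref_points)

km_retrained = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X_new)
new_labels = km_retrained.predict(ref_points)

# Cluster indices are arbitrary after retraining, so compare groupings
# rather than raw labels: points that shared a cluster should still share one.
if adjusted_rand_score(ref_labels, new_labels) < 0.9:  # threshold is an assumption
    print("alert: reference observations changed groups after retraining")
```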