Intrusion Detection Dataset - System

First of all, I hope this question will not be deleted for being off-topic; I could not find a better website to post it on.
I'm working on an intrusion detection project. From my research, most intrusion detection datasets (KDD, DARPA, CDX, ISCX...) each have their own format (arff, tcpdump, dump, csv...). I want to convert the datasets from dump and tcpdump to ARFF (if you have a better idea for bringing the datasets into the same format, I'll be thankful). What is the best way to do this?
My last question: what is the best intrusion detection system that can analyse heterogeneous dataset formats and give me the detection rate for every attack?

First, the KDDCup dataset is based on the traffic collected in the DARPA experiments. The DARPA dataset isn't labeled and contains only the traffic collected during those experiments, which is why it is in tcpdump format.
So KDDCup is derived from DARPA. Its authors analysed all of the traffic (packets) and selected features that help algorithms classify traffic as normal or anomalous. When labeled data is available like this, the approach is called "offline learning".
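As for getting to ARFF: once per-connection features have been extracted from the packet captures, writing the ARFF file itself is the easy part. Below is a minimal sketch, assuming the features are already available as Python dictionaries. The attribute names (duration, protocol_type, src_bytes, label) and the helper to_arff are hypothetical choices for illustration, not part of any of the datasets' tooling; the extraction from tcpdump is not shown.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attributes are declared as {value1,value2,...}
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        # One comma-separated line per instance, in attribute order
        lines.append(",".join(str(row[name]) for name, _ in attributes))
    return "\n".join(lines) + "\n"


# Hypothetical per-connection features (assumed, for illustration only)
attributes = [
    ("duration", "numeric"),
    ("protocol_type", ["tcp", "udp", "icmp"]),
    ("src_bytes", "numeric"),
    ("label", ["normal", "anomaly"]),
]
rows = [
    {"duration": 0, "protocol_type": "tcp", "src_bytes": 181, "label": "normal"},
    {"duration": 2, "protocol_type": "udp", "src_bytes": 0, "label": "anomaly"},
]

arff_text = to_arff("connections", attributes, rows)
print(arff_text)
```

The same function can be fed from a CSV reader, so CSV-based datasets can be brought into the same format with the same attribute list.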

Neural Network: Convert HTML Table into JSON data

I'm kinda new to Neural Networks and just started to learn coding them by trying some examples.
Two weeks ago I was searching for an interesting challenge and I found one. But I'm about to give up because it seems to be too hard for me... Still, I was curious whether any of you is able to solve it.
The Problem: Assume there are ".htm" files that contain tables about the same topic, but the table structure isn't the same in every file. For example: we have a lot of ".htm" files containing information about teacher substitutions per day per school. Because the structure of those files differs from file to file, it would be hard to program a parser that could extract the data from those tables. So my thought was that this is a task for a Neural Network.
First Question: Is this a task a Neural Network can/should handle, or am I mistaken?
Because a Neural Network seemed to fit this kind of challenge, I tried to think of an input. I came up with two options:
First Input Option: Take the HTML code (only from the body tag) as a string and convert it to a Tensor.
Second Input Option: Convert the HTML Tables into Images (via Canvas maybe) and feed this input to the DNN through Conv2D-Layers.
Second Question: Are those Options any good? Do you have any better solution to this?
After that I wanted to figure out how I would make a DNN output this heavily dynamic data. My thought was to convert my desired JSON output into Tensors and feed them to the DNN during training; for every prediction I would expect the DNN to return a Tensor that is convertible into JSON output...
Third Question: Is it even possible to get such a detailed output from a DNN? And if yes: do you think the output would be suitable for this task?
Last Question: Assuming all my assumptions are correct, wouldn't training this DNN take forever? Let's say you have an RTX 2080 Ti for it. What would you guess?
I guess that's it. I hope I can learn a lot from you guys!
(I'm sorry about my bad English - it's not my native language)
Addition:
Here is a more in-depth example. Let's say we have a ".htm" file that looks like this:
The task would be to get all the relevant information from this table. For example:
All Students from Class "9c" don't have lessons in their 6th hour due to cancellation.
1) This is not a particularly suitable problem for a Neural Network, as your domain is structured data with clear dependencies inside. Tree-based ML algorithms tend to show much better results on such problems.
2) Both of your input choices are very unstructured. Learning from such data would be nearly impossible. There are clear ways to give the model more knowledge. For example, you have the same data in different formats; the only difference is the structure. This means the model needs to learn a mapping from one structure to another and doesn't need to know the data itself. Hence, words can be tokenized into unique identifiers to remove unnecessary information. The htm data can be parsed into a tree, as can the JSON. There are then different ways to represent graph structures that can be used in an ML model.
3) It seems that the only adequate option for the output is a sequence of identifiers pointing to unique entities from the text. The whole problem is then similar to Seq2Seq, which is best solved by RNNs with an encoder-decoder architecture.
I believe that, if there is enough data and the htm files don't contain a huge amount of noise, the task can be completed. Training time depends heavily on the selected model and its complexity, as well as on the diversity of the initial data.
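To make point 2) a bit more concrete, here is a rough sketch of the preprocessing idea: parse the table structure with the standard library and replace each cell text with a unique identifier, so that a model only sees structure plus ids. The class TableParser, the function tokenize, and the sample table are my own illustrative assumptions (the sample is modelled on the "9c" example from the question), and the sketch only handles simple, non-nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def tokenize(rows):
    """Map each distinct cell text to an integer id, in order of appearance."""
    vocab = {}
    ids = [[vocab.setdefault(cell, len(vocab)) for cell in row] for row in rows]
    return ids, vocab


# Hypothetical sample table, for illustration only
html = (
    "<table>"
    "<tr><th>Class</th><th>Hour</th><th>Note</th></tr>"
    "<tr><td>9c</td><td>6</td><td>cancelled</td></tr>"
    "</table>"
)

parser = TableParser()
parser.feed(html)
token_rows, vocab = tokenize(parser.rows)
print(parser.rows)
print(token_rows)
```

The target JSON can be tokenized with the same vocabulary, so that input and output sequences refer to the same identifiers.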

Is there a way to partition a tf.Dataset with TensorFlow’s Dataset API?

I checked the docs but could not find a method for it. I want to do cross-validation, so I kind of need it.
Note that I'm not asking how to split a tensor; I know that TensorFlow provides an API for that, and it has been answered in another question. I'm asking how to partition a tf.Dataset (which is an abstraction).
You could either:
1) Use the shard transformation to partition the dataset into multiple "shards". Note that for best performance, sharding should be applied to the data sources (e.g. filenames).
2) As of TensorFlow 1.12, you can also use the window transformation to build a dataset of datasets.
I am afraid you cannot. The dataset API is a way to efficiently stream inputs to your net at run time. It is not a set of tools to manipulate datasets as a whole; in that regard it might be a bit of a misnomer.
Also, even if you could, this would probably be a bad idea. You would rather have the train/test split done once and for all:
It lets you review those sets offline.
If the split is done each time you run an experiment, there is a risk that samples start swapping sets unless you are extremely careful (e.g. when you add more data to your existing dataset).
See also a related question about how to split a set into training & testing in tensorflow.
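One way to get a split that stays put when data is added is to derive each sample's set from a hash of a stable identifier instead of from its position in the dataset. This is only a sketch of that idea, not something the dataset API provides: the function assign_split, the "sample-N" ids, and the 20% test fraction are all assumptions made for illustration.

```python
import hashlib


def assign_split(sample_id, test_fraction=0.2):
    """Deterministically assign a sample to "train" or "test" from its id."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1)
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_fraction else "train"


# Hypothetical sample ids; in practice these could be file names
ids = [f"sample-{i}" for i in range(1000)]
first = {i: assign_split(i) for i in ids}

# Later, more data is added to the existing dataset
more_ids = ids + [f"sample-{i}" for i in range(1000, 1500)]
second = {i: assign_split(i) for i in more_ids}

# No previously assigned sample has changed sets
moved = [i for i in ids if first[i] != second[i]]
print(len(moved))
```

The resulting lists of ids can be written to disk once and reviewed offline, and the input pipeline then simply reads the files belonging to each set.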

IPA (International Phonetic Alphabet) Transcription with Tensorflow

I'm looking into designing a software platform that will aid linguists and anthropologists in their study of previously unstudied languages. Statistics show that around 1,000 languages exist that have never been studied by a person outside of their respective speaker groups.
My goal is to utilize TensorFlow to make a platform that will allow linguists to study and document these languages more efficiently, and to help them create written systems for the ones that don't have a written system already. One of their current methods of accomplishing such a task is three-fold: 1) recording a native speaker conversing in the language, 2) listening to that recording and trying to transcribe it into the IPA, 3) analyzing, from the phonetics, the phonemics and phonotactics of the language to eventually create a written system for the speakers.
My proposed platform would cut that research time down from a minimum of a year to a maximum of six months. Before I start, I have some questions...
What would be required to train TensorFlow to transcribe live audio into the IPA? Has this already been done? If so, how would I utilize a previous solution for this project? Is a project like this even possible with TensorFlow? If not, what would you recommend using instead?
My apologies for the magnitude of this question. I don't have much experience in the realm of machine learning, as I am just beginning the research process for this project. Any help is appreciated!
I guess I will take a first shot at answering this. Since the question is pretty general, my answer will have to be pretty general as well.
What would be required. At the very least you would have to have a large dataset of pre-transcribed data. Ideally this would be a large amount of spoken language audio mapped to characters in the phonetic alphabet, so the system could learn the sound of individual characters rather than whole transcribed words. If such a dataset doesn't exist, a less granular dataset could be used, mapping single words to their transcriptions. Then you would need a model, that is, the actual neural network architecture implemented in code. And lastly you would need some computing resources. This is not something you can train casually: you would either have to buy some time in a cloud-based machine learning framework (like Google Cloud ML) or build a fairly expensive machine to train at home.
Has this been done? I don't know. I don't think so. There have been published papers reporting various degrees of success at training systems to transcribe speech. Here is one, for example: http://deeplearning.stanford.edu/lexfree/lexfree.pdf. Since the alphabet you want to transcribe to is specifically designed to capture the way words sound rather than just write down the words, it seems you might have more success at training such a model.
Is it possible with TensorFlow. Yes, most likely. TensorFlow is well suited for implementing most modern deep learning architectures. Unless you end up designing some really weird and very original model for this purpose, TensorFlow should work just fine.
Edit: after some thought on part 1, you would have to use a dataset mapping spoken words to their transcriptions, since I expect that a sound pronounced in isolation would differ from the same sound used within a word.
This has actually been done, albeit in PyTorch, by a group at CMU: https://github.com/xinjli/allosaurus

Analysis of Fitbit walking and sleeping data

I'm participating in a small data analysis competition at our school.
We use Fitbit wearable devices, which the contest host lends to each participant.
For the 2 months of the contest, participants walk and sleep with this small device 24/7, letting it gather data such as step count, heart rate (bpm), etc.
We need to solve some problems based on the participants' data. For example: show the relation between rainy days and the participants' workout rate using a chart.
I think the purpose of the problem is this: because of rain, many participants are expected to stay at home; can you show that cause and effect numerically?
I'm now studying the Python libraries numpy and pandas with IPython Notebook, but I still have no idea how to solve these problems.
Could you recommend some projects or sites to use as references? I'm really eager to win this competition. :(
Lastly, sorry for my poor English.
Thank you.
That's a fun project. I'm working on something kind of similar.
Here's what you need to do:
Learn the Fitbit API and stream the data from the Fitbit accelerometer and gyroscope. If you can combine this with heart rate data, great. The more types of data you have, the more effective your algorithm will be. You can store this data in a simple csv file (streaming the accel/gyro data at 50Hz is recommended). Or set up a web server and store it in a database for easy access.
Learn how to use pandas and scikit-learn.
[optional but recommended]: Learn matplotlib so you can graph your data and get a feel for how it looks.
Load the data into pandas and create features on the data, notably using 1-2 second sliding window analysis with 50% overlap. Good features include (for all three of Accel X, Y, Z): max, min, standard deviation, root mean square, root sum square and tilt. Polynomials will help.
Since this is a supervised classification problem, you will need to create some labelled data, so do this manually (state 1 = rainy day, state 2 = non-rainy day) and then train a classification algorithm. I would recommend a random forest.
Test on held-out data that the model has not seen during training, and don't forget to use cross-validation.
Voila, you now have a highly accurate model and will win the competition. Plus you've learned about a bunch of really cool Python and machine learning stuff.
For more tutorials on how all this stuff works, I'd highly recommend the Kaggle tutorial projects
BONUS: If you want to take it to a new level, you can start adding smoothers on top of your classifier, for example by using a Hidden Markov Model as explained in this talk
BONUS 2: Go get a PhD in Human Activity Recognition.
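To illustrate the sliding-window step from the list above, here is a minimal sketch in plain Python (no pandas, to keep it dependency-free). The function window_features and the synthetic accel_x signal are assumptions for illustration; real data would come from the device export. With 50Hz data, a 1-second window is 50 samples, and 50% overlap means the window advances by 25 samples each time.

```python
import math
import statistics


def window_features(samples, window, overlap=0.5):
    """Compute simple per-window features over a 1-D signal."""
    step = max(1, int(window * (1 - overlap)))
    features = []
    # Only full windows are used; a trailing partial window is dropped
    for start in range(0, len(samples) - window + 1, step):
        chunk = samples[start:start + window]
        features.append({
            "max": max(chunk),
            "min": min(chunk),
            "std": statistics.pstdev(chunk),
            "rms": math.sqrt(sum(v * v for v in chunk) / window),
        })
    return features


# Synthetic stand-in for 4 seconds of one accelerometer axis at 50Hz
accel_x = [math.sin(i / 5) for i in range(200)]

feats = window_features(accel_x, window=50)
print(len(feats))
print(feats[0])
```

Each dictionary becomes one row of the feature table; the rows from the X, Y and Z axes can be joined side by side and passed to the classifier together with the manual labels.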

What are the types of problems TensorFlow can help solve? [closed]

The TensorFlow home page describes its purpose as 'a software library for numerical computation'. Looking through the sample problems, it looks like a problem is always formulated as follows:
1) Input
2) Model parameters
3) Desired output
Given some training data for 1) and 3), 2) can be computed.
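To show what I mean by this formulation, here is a tiny sketch without TensorFlow: the inputs and desired outputs are known, and the model parameters are found by gradient descent on the mean squared error. The data (points on the line y = 2x + 1), the learning rate and the number of steps are arbitrary values I picked for illustration.

```python
# Known inputs and desired outputs (here generated from y = 2x + 1)
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = [1.0, 3.0, 5.0, 7.0, 9.0]

# Unknown model parameters of the model y = w * x + b
w, b = 0.0, 0.0
learning_rate = 0.05
n = len(inputs)

for _ in range(2000):
    # Gradients of the mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(inputs, targets)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(inputs, targets)) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))
```

As far as I understand, the library automates the gradient computation and scales this loop to much larger models, but the shape of the problem stays the same.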
I can see how this can be used to create bots, self-driving cars, image classifiers etc.
Given the broad definition of 'numerical computation', am I missing a class of other problems this can be used for? Can this be used for, say, more classical numerical computations such as the airflow around an aircraft or deformation of a structure under stress? Do you have any examples of how these classical problems would have to be formulated to fit the form above?
There is a nice discussion on what artificial neural networks could do. The fact that our brain is a neural network might imply that eventually an artificial neural network will be able to do the same tasks.
Some more examples of artificial neural networks used today: music creation, image-based location, page rank, Google Voice, stock trade predictions, NASA star classification, traffic management.
Some fields I know of but do not have a good reference for:
optical quantum mechanics test set-up generator
medical diagnosis, reference only about safety
The Sharp LogiCook microwave oven, wiki, nasa mention
I think there are many millions of "problems" that can be solved with an ANN; deciding on the data representation (input, output) will be a challenge for some of these. Some useful and useless examples I have been thinking about:
home thermostat that learns your wishes with certain weather types.
bakery production prediction
recognize go-stones on a board and map their locations
personal activity guesser that turns on the appropriate device.
recognize person based on mouse movement
Given the right data and network these examples will work.
Dad has a PC controlling the heating system back home. I trained a network based on his 10 years of heating data (outside temp, inside temp, humidity etc.); unfortunately I am not allowed to hook it up.
My aunt and uncle have a bakery. Based on 6 years of sales data, I trained a network predicting how many breads and buns they should make. It showed me how important the correct inputs are: at first I used the day of the year, but when I switched to the day of the week I saw a 15% increase in accuracy.
Currently I am working on a network that will detect a go board in a given image and map all 361 locations, telling me whether there is a black stone, a white stone, or no stone present.
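For the bakery example, the change of input representation can be sketched roughly like this. It is not the code I used; the function day_features and the one-hot encoding of the weekday are assumptions for illustration. The point is that two Mondays a week apart get different day-of-year values but an identical day-of-week input.

```python
from datetime import date


def day_features(d):
    """Return both candidate input representations for a calendar date."""
    one_hot = [0] * 7
    one_hot[d.weekday()] = 1  # Monday is index 0, Sunday is index 6
    return {
        "day_of_year": d.timetuple().tm_yday,
        "day_of_week": one_hot,
    }


first_monday = day_features(date(2024, 1, 1))
second_monday = day_features(date(2024, 1, 8))

print(first_monday)
print(second_monday)
```

With the day of the year the network has to discover the weekly pattern on its own, while the day of the week hands it over directly.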
Two examples that showed me how much information can be stored in a single neuron and of different ways to represent data:
Image example, neuron example (unfortunately you have to train both examples yourself so give them a little time.)
On to your example airflow around an aircraft.
I know next to nothing about airflow calculations; my attempt would be a really huge 3D input layer where you can "draw" an airplane and the direction and speed of the airflow.
It might work, but it would require a tremendous amount of computation power. Somebody who knows more about this specific topic probably knows a more abstract way of representing the data, resulting in a more manageable network.
This NASA paper talks about a neural network for calculating airflow around a wing. Unfortunately I do not understand what kind of input they use; maybe it is clearer to you.