# README 

### Cesar dataset and code

The cesar_data dataset consists of an extract of the AWS honeypot logs cited below, enriched with dummy SOC qualifications. The added qualifications are _not_ real and do not describe observations made during the original AWS honeypot experiment. The goal is to demonstrate the manual, iterative approach to feature engineering and model training in a SOC context, independently of the actual data.
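
The README does not document the layout of cesar_data. As a minimal sketch of how dummy qualifications could sit alongside log entries, the Python snippet below counts entries per qualification. It assumes cesar_data is delimited text with a header row and that the label lives in a column named qualification; that column name, the delimiter, the helper count_qualifications and the sample rows are all hypothetical, not taken from the dataset.

```python
import csv
import io
from collections import Counter


def count_qualifications(lines, label_column="qualification", delimiter=","):
    """Count log entries per (dummy) SOC qualification.

    Assumes delimited text with a header row; "qualification" is a
    hypothetical column name, not one documented for cesar_data.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    # Reading fieldnames consumes the header row; None means empty input.
    if reader.fieldnames is None or label_column not in reader.fieldnames:
        raise KeyError("no column named {!r} in the header".format(label_column))
    return Counter(row[label_column] for row in reader)


if __name__ == "__main__":
    # Made-up rows in the assumed layout (documentation IP range), not real data.
    sample = io.StringIO(
        u"timestamp,src_ip,dst_port,qualification\n"
        u"2013-03-03 21:53:59,192.0.2.10,1433,scan\n"
        u"2013-03-03 21:57:01,192.0.2.20,5060,benign\n"
        u"2013-03-03 21:58:10,192.0.2.10,1433,scan\n"
    )
    print(count_qualifications(sample).most_common())
```

On the real file, pass an open file handle instead of the sample, after adjusting label_column and delimiter to the actual layout.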

The *.ipynb files are Jupyter notebooks, to be opened with jupyter-notebook. They are written in Python 2.7 and illustrate (1) the data description phase of the Machine Learning workflow and (2) the iterative training of models.

### Requirements

The code has been tested on Ubuntu Linux 18.04 (kernel 4.15.0-20-generic) with the following library versions:

* pandas 0.23.0
* numpy 1.14.3
* matplotlib 2.2.2
* ggplot 0.11.5
* sklearn 0.0 (the sklearn package on PyPI, a placeholder that installs scikit-learn)
* seaborn 0.8.1
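
To compare a local environment with the versions listed above, a small Python check can import each library and read its __version__ attribute. This is a sketch, not part of the repository; the names TESTED_VERSIONS and check_versions are hypothetical. The sklearn entry deliberately has no expected version, for the reason given in the code comment.

```python
from __future__ import print_function

import importlib

# Versions listed in this README. "sklearn 0.0" is the version of the
# sklearn placeholder package on PyPI; `import sklearn` reports the
# scikit-learn version instead, so no expected value is set for it.
TESTED_VERSIONS = {
    "pandas": "0.23.0",
    "numpy": "1.14.3",
    "matplotlib": "2.2.2",
    "ggplot": "0.11.5",
    "sklearn": None,
    "seaborn": "0.8.1",
}


def check_versions(expected):
    """Return {module: (expected, installed, ok)} for each entry.

    installed is "not importable" when the import fails (old libraries
    such as ggplot may break against newer dependencies) and "unknown"
    when the module has no __version__. ok is True when no version is
    expected, or when the installed version matches exactly.
    """
    report = {}
    for name, wanted in sorted(expected.items()):
        try:
            module = importlib.import_module(name)
        except Exception:
            report[name] = (wanted, "not importable", False)
            continue
        installed = str(getattr(module, "__version__", "unknown"))
        ok = wanted is None or installed == wanted
        report[name] = (wanted, installed, ok)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in sorted(check_versions(TESTED_VERSIONS).items()):
        print("{:<11} tested {:<8} installed {:<15} {}".format(
            name, wanted or "-", installed, "OK" if ok else "DIFFERENT"))
```

An exact string match is used on purpose: the point is to spot any drift from the tested setup, not to decide whether a newer version is compatible.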

### Usage
Copy the files 1_Description_des_donnees.ipynb, 2_Apprentissage.ipynb and cesar_data into the same directory.

Then, run:

> jupyter-notebook 1_Description_des_donnees.ipynb

> jupyter-notebook 2_Apprentissage.ipynb
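
Since the usage steps require the three files in one directory, a quick pre-flight check can confirm the directory is complete before launching Jupyter. A minimal Python sketch follows; REQUIRED_FILES and missing_files are hypothetical names, and cesar_data is accepted whether it is a file or a directory, since the README does not say which.

```python
from __future__ import print_function

import os

# File names taken from the Usage section of this README.
REQUIRED_FILES = (
    "1_Description_des_donnees.ipynb",
    "2_Apprentissage.ipynb",
    "cesar_data",
)


def missing_files(directory="."):
    """Return the required entries that are absent from directory."""
    return [name for name in REQUIRED_FILES
            if not os.path.exists(os.path.join(directory, name))]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing in {}: {}".format(os.path.abspath("."), ", ".join(missing)))
    else:
        print("All files found; start with: jupyter-notebook 1_Description_des_donnees.ipynb")
```

Run it from the directory where the files were copied; it only reports what is missing and does not start Jupyter itself.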


### Acknowledgements

AWS honeypot data from the Data-Driven Security dataset collection, by Jay Jacobs & Bob Rudis: http://datadrivensecurity.info/blog/pages/dds-dataset-collection.html