# README

### Cesar dataset and code

The `cesar_data` dataset consists of an extract of the AWS honeypot logs cited below, enriched with dummy SOC qualifications. The added qualifications are _not_ real and do not describe observations made during the original AWS honeypot experiment. The goal is to demonstrate the manual, iterative approach to feature engineering and model training in a SOC context, regardless of the data.

The `*.ipynb` files are meant to be opened with `jupyter-notebook`. They are written in Python 2.7 and illustrate (1) the description phase of machine learning and (2) the iterative training of models.

### Requirements

The code has been tested on Ubuntu Linux 18.04 (kernel 4.15.0-20-generic) with the following libraries and versions:

* pandas 0.23.0
* numpy 1.14.3
* matplotlib 2.2.2
* ggplot 0.11.5
* sklearn 0.0
* seaborn 0.8.1

### Usage

Copy the files `1_Description_des_donnees.ipynb`, `2_Apprentissage.ipynb` and `cesar_data` into the same directory. Then run the notebooks in order:

```
jupyter-notebook 1_Description_des_donnees.ipynb
jupyter-notebook 2_Apprentissage.ipynb
```

### Acknowledgements

* http://datadrivensecurity.info/blog/pages/dds-dataset-collection.html (Jay Jacobs & Bob Rudis)
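### Checking the library versions (optional sketch)

The snippet below is a minimal sketch, not part of the original project, that reports which of the tested library versions are installed in the current environment. The names `EXPECTED_VERSIONS` and `installed_versions` are hypothetical. It assumes each package exposes a `__version__` attribute; note that `sklearn 0.0` appears to be the version of the PyPI `sklearn` shim package rather than of scikit-learn itself, so the imported `sklearn` module will normally report a different version. The code uses only syntax valid in both Python 2.7 and Python 3.

```python
import importlib

# Versions listed in the Requirements section, keyed by import name.
# Assumption: "sklearn 0.0" is the PyPI shim package, so the scikit-learn
# version actually imported is expected to differ; it is reported, not enforced.
EXPECTED_VERSIONS = {
    "pandas": "0.23.0",
    "numpy": "1.14.3",
    "matplotlib": "2.2.2",
    "ggplot": "0.11.5",
    "sklearn": "0.0",
    "seaborn": "0.8.1",
}


def installed_versions(expected):
    """Map each module name to (expected_version, installed_version_or_None).

    None means the module is missing or fails to import; "unknown" means it
    imports but has no __version__ attribute.
    """
    report = {}
    for name, wanted in sorted(expected.items()):
        try:
            module = importlib.import_module(name)
            found = getattr(module, "__version__", "unknown")
        except Exception:
            # Old packages such as ggplot may fail to import with newer pandas.
            found = None
        report[name] = (wanted, found)
    return report


if __name__ == "__main__":
    for name, (wanted, found) in sorted(installed_versions(EXPECTED_VERSIONS).items()):
        if found is None:
            status = "missing or not importable"
        elif found == wanted:
            status = "OK"
        else:
            status = "differs"
        print("%-10s expected %-7s installed %-10s %s"
              % (name, wanted, found if found else "-", status))
```

A version mismatch does not necessarily break the notebooks; it only means the environment differs from the one the code was tested on.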
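### Checking the file layout (optional sketch)

The notebooks expect `1_Description_des_donnees.ipynb`, `2_Apprentissage.ipynb` and `cesar_data` to sit in the same directory. The sketch below, which is hypothetical and not shipped with the project, lists whichever of these are missing from a given directory before you launch `jupyter-notebook`. The README does not say whether `cesar_data` is a single file or a folder, so the check only tests that the path exists.

```python
import os

# File names taken from the Usage section; all must sit in the same directory.
REQUIRED_FILES = [
    "1_Description_des_donnees.ipynb",
    "2_Apprentissage.ipynb",
    "cesar_data",  # may be a file or a folder: only existence is checked
]


def missing_files(directory="."):
    """Return the required entries that are absent from `directory`, in order."""
    return [name for name in REQUIRED_FILES
            if not os.path.exists(os.path.join(directory, name))]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing from this directory: %s" % ", ".join(missing))
    else:
        print("All files present; open notebook 1, then notebook 2.")
```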