Posts

Showing posts from February, 2020

Git Pipelines

Git has become the de facto standard for version control. This has given rise to many vendors hosting Git repositories. Each vendor provides Git functionality such as branching, pull requests and project membership. There is now growing competition among them to provide Continuous Integration / Continuous Delivery (CI/CD) services. One feature that extends version control beyond just hosting source repositories is pipelines. Pipelines are an extensible suite of tools to build, test and deploy source code. Even data hosting sites like Kaggle now support pipelines. This article provides a brief summary of some pipeline features from three popular Git hosting sites: GitLab, Bitbucket and GitHub. This article was written in GitLab flavoured Markdown and rendered to HTML using pandoc. This provides a version-controlled project that can be used to show features for each Git repository. The fea...

Anomaly Detection and Change Point Detection - Reproduced

Anomalies are patterns in the data that do not conform to a well-defined notion of normal behavior. Techniques used to detect anomalies typically require training before they are used on new data. Here we will reproduce the results from Oana Niculaescu's article in XRDS, Applying Data Science for Anomaly and Change Point Detection. This article was generated using a Jupyter Notebook. The notebook is available here.

Detecting Changes

The CUSUM algorithm is used to test for anomalies. This requires two parameters: threshold and drift. But how do you choose values for these parameters? Gustafsson (2000) provides this recipe: Start with a very large threshold. Choose drift to be one half of the expected change, or adjust drift such that g = 0 more than 50% of the time. Then set the threshold so the required number of false alarms (this can be done automatically) or delay for detection is obtained...
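To make the roles of threshold and drift concrete, here is a minimal sketch of a two-sided CUSUM detector in Python. It is not the code from the notebook: the function name `cusum` is hypothetical, and running the test on successive differences of the signal is an assumption based on one common formulation. The variables `g_pos` and `g_neg` play the part of the test statistic g in the recipe above.

```python
def cusum(x, threshold, drift):
    """Two-sided CUSUM on successive differences (illustrative sketch).

    Returns the indices at which an alarm is raised.
    """
    g_pos = 0.0  # cumulative evidence for an upward change
    g_neg = 0.0  # cumulative evidence for a downward change
    alarms = []
    for i in range(1, len(x)):
        step = x[i] - x[i - 1]
        # Subtracting drift pulls g back toward zero, so slow
        # wander in the signal does not accumulate into an alarm.
        g_pos = max(0.0, g_pos + step - drift)
        g_neg = max(0.0, g_neg - step - drift)
        if g_pos > threshold or g_neg > threshold:
            alarms.append(i)
            g_pos, g_neg = 0.0, 0.0  # restart the test after an alarm
    return alarms


# Made-up signal: flat at 0, then a jump to 5 at index 10.
signal = [0.0] * 10 + [5.0] * 10
print(cusum(signal, threshold=2.0, drift=0.5))
```

In this sketch a larger drift makes g sit at zero more often, and a larger threshold trades fewer false alarms for a longer detection delay, which is the trade-off the recipe tunes.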