The Problem
In this series, I will chronicle my journey working with a marketing attribution team that faced a common challenge in the data science world. They had several marketing attribution models written as a series of R scripts hosted on GitHub, and they ran those scripts manually on MacBooks every week. The time had come to move these operations to the cloud, specifically AWS. This will not be a deep dive into R programming or model development, but rather into the process of deploying models and running them in a cloud-native way on AWS.
The requirements for this migration were clear:
- Keep costs low
- Avoid long-running infrastructure
- Automate everything, including deployment of the code hosted on GitHub
In this series you will learn how I packaged an R codebase hosted on GitHub into a fully serverless, cloud-native environment in AWS.
The Challenge
To address these requirements, I needed to implement several key cloud-native patterns:
- Serverless: Run R scripts in a completely serverless environment to avoid managing infrastructure.
- Event-Driven Architecture: Execute processes (models) at the right time, driven by schedules or system events (a minimal scheduling sketch follows this list).
- Continuous Integration and Deployment (CI/CD): Automate code management and deployment processes.
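To make the event-driven pattern concrete, here is a minimal sketch, using boto3 and Amazon EventBridge, of scheduling a weekly Fargate task. Every name, ARN, subnet, and the cron expression are illustrative placeholders rather than details from the actual project.

```python
import boto3

events = boto3.client("events")

# Fire every Monday at 06:00 UTC (placeholder schedule).
events.put_rule(
    Name="weekly-attribution-run",
    ScheduleExpression="cron(0 6 ? * MON *)",
    State="ENABLED",
)

# Point the rule at an ECS Fargate task definition (all ARNs are placeholders).
events.put_targets(
    Rule="weekly-attribution-run",
    Targets=[{
        "Id": "attribution-model-task",
        "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/models",
        "RoleArn": "arn:aws:iam::123456789012:role/events-ecs-invoke",
        "EcsParameters": {
            "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/attribution:1",
            "TaskCount": 1,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["subnet-0123456789abcdef0"],
                    "AssignPublicIp": "ENABLED",
                }
            },
        },
    }],
)
```

The appeal of this pattern is that nothing runs between executions: EventBridge launches the task on schedule, the container does its work, and the compute disappears.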
What’s Coming in This Series
- Docker: I will review how I packaged the process into a set of layered Docker components, why I took this approach, and the benefits this architecture delivered.
- AWS ECS Fargate: Learn how I arrived at completely serverless compute infrastructure in AWS using ECS Fargate.
- Code management and CI/CD: I will cover how I used GitHub Actions to trigger Docker builds and push the images to Amazon Elastic Container Registry (ECR); a sketch of such a workflow follows this list.
- QuickSight: We will get into Amazon QuickSight and how I used it to visualize model output!
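As a preview of the CI/CD post, here is a sketch, not the project's actual workflow, of a GitHub Actions pipeline that builds a Docker image and pushes it to ECR on every push to main. The secret names, region, and repository name are assumptions for illustration.

```yaml
# Hypothetical workflow: build the image and push it to ECR on pushes to main.
name: build-and-push
on:
  push:
    branches: [main]

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Credentials come from repository secrets (placeholder names).
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Log in to Amazon ECR
        id: ecr
        uses: aws-actions/amazon-ecr-login@v2

      # "attribution-models" is a placeholder ECR repository name.
      - name: Build and push image
        run: |
          docker build -t ${{ steps.ecr.outputs.registry }}/attribution-models:latest .
          docker push ${{ steps.ecr.outputs.registry }}/attribution-models:latest
```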
Join Me on This Journey!
Whether you are a data scientist, DevOps engineer, or cloud engineer looking for a real-world case study in building an end-to-end, cloud-native data analytics solution in AWS, this series has something for you.
In the next post, I will cover Docker in detail: how I approached Dockerizing the R code and scripts, the anatomy of a Dockerfile, and how I built a layer-based architecture in Docker to serve as the heart of the data science model runtime.
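As a teaser, here is a minimal sketch of what a layered Dockerfile for an R model runtime can look like; the rocker base image, package list, and script paths are assumptions for illustration, not the project's actual file.

```dockerfile
# Layer 1: a pinned R base image (rocker publishes versioned R images).
FROM rocker/r-ver:4.3.2

# Layer 2: R package dependencies; Docker caches this layer until the list changes.
RUN R -e "install.packages(c('dplyr', 'lubridate'), repos = 'https://cloud.r-project.org')"

# Layer 3: the model scripts; edits here rebuild only this thin final layer.
COPY scripts/ /opt/models/

# Run a (hypothetical) model script when the container starts.
CMD ["Rscript", "/opt/models/run_attribution.R"]
```

Ordering the layers from least to most frequently changed is what keeps rebuilds fast: the base image and package installs stay cached while only the script layer is rebuilt.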
Stay tuned, and let's dive deep into this project at the intersection of data science, cloud computing, and marketing tech!