Ruthger Righart is a self-employed data scientist from France (greater Geneva area). His passion is mining business value from data.
AWS Elastic Beanstalk (EB) is an Amazon cloud service for deploying and hosting web applications such as dashboards. These applications can be consulted from any place, at any time, and are secure, robust and reliable. In this blog post, I will discuss three powerful features of EB that are key to developing satisfying data projects: extensibility, flexibility, and continuous deployment.
Many software packages are excellent at creating dashboards and specialize in exactly that. However, a project often involves data science tasks that these packages cannot handle. If these tasks are separate components of a pipeline, such as data cleaning, data restructuring, forecasting or customer segmentation, it may be better to have a single platform that can handle and automate all of these steps. AWS offers various services that can be used together with EB in a single platform. For example, an e-commerce business would want to load the newest sales records at regular intervals, pre-process these data, and update sales statistics in order to adapt supply and forecast sales for the near future. Some services that can be used alongside EB are Amazon S3 for simple storage, RDS or Redshift for databases, and Amazon Forecast for predicting future values of time series. In the end, this extensibility saves the cost of switching between different platforms and makes it possible to automate the individual steps.
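To make the e-commerce example a bit more concrete, here is a minimal sketch of the pre-processing and statistics step in Python. It assumes the sales records arrive as CSV text with the hypothetical columns `date` and `amount`; fetching the file (for example from S3) and scheduling the job are left out.

```python
import csv
import io
from collections import defaultdict


def daily_revenue(csv_text):
    """Clean raw sales rows and return the total revenue per day."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # pre-processing: skip rows with a missing or malformed amount
        day = (row.get("date") or "").strip()
        if not day or amount < 0:
            continue  # skip rows without a date or with a negative amount
        totals[day] += amount
    return dict(sorted(totals.items()))


# Made-up sample data, only for illustration.
SAMPLE = """date,amount
2024-01-01,10.50
2024-01-01,4.50
2024-01-02,abc
2024-01-02,20
"""

if __name__ == "__main__":
    print(daily_revenue(SAMPLE))
```

In a real pipeline the resulting statistics would be written back to storage or a database so that the dashboard can read them.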
Data are not static: they grow, sometimes every minute. EB can scale to larger data streams by using instances with more CPU or memory. Using a load balancer, it distributes traffic across the different instances. Instances are virtual machines that come in different capacities, so you can choose the ones that match your processing demands.
EB is also flexible in the programming languages it supports, including Python, Go, Java and Ruby. There are multiple options for deploying applications as well: the online console (GUI), a command line interface, or continuous deployment (see below).
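As an illustration of what gets deployed, here is a minimal sketch of a Python web application. It assumes the default convention of EB's Python platform, which looks for a WSGI callable named `application` in a file called `application.py` (this can be changed in the environment configuration). The greeting text is made up.

```python
# application.py
def application(environ, start_response):
    """Minimal WSGI app: answer every request with a plain-text greeting."""
    body = b"Hello from Elastic Beanstalk\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Local test server only; on EB the platform's own WSGI server runs the app.
    from wsgiref.simple_server import make_server

    with make_server("", 8000, application) as server:
        server.serve_forever()
```

For a Dash dashboard, the same idea should apply by exposing the underlying Flask server, for example `application = app.server`.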
Continuous deployment allows a team to build, test and deliver improvements to code. Developers can work on the same project and smoothly integrate all their work using, for example, GitHub. GitHub can be used to control code versions and to automate the deployment to EB. A deployment to EB takes only a couple of minutes and can be monitored in AWS CodePipeline. New scripts that are pushed from GitHub to EB are listed in the CodePipeline history, and the dashboard is updated.
Another advantage is that Docker can be integrated. Docker packages the application in a container together with all the required packages, for example the Python libraries NumPy and pandas, or Dash for making dashboards. Package versions are pinned to make sure that everyone in the development team runs compatible versions. This prevents time-consuming bugs caused by version differences.
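The idea of pinning can be checked from Python itself. The sketch below compares a set of pins with what is actually installed, using the standard library's `importlib.metadata`. The function name and the version numbers are hypothetical and serve only as an illustration.

```python
from importlib import metadata


def find_mismatches(pins, installed_version=metadata.version):
    """Return {package: (pinned, installed)} for every package that deviates from its pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = installed_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Hypothetical pins for illustration; use the versions your project was tested with.
PINS = {"numpy": "1.26.4", "pandas": "2.2.2", "dash": "2.17.0"}

if __name__ == "__main__":
    for name, (pinned, installed) in find_mismatches(PINS).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```

Inside a Docker image built from pinned requirements, this check should report nothing; on a developer laptop it quickly shows which packages have drifted.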
EB has insightful graphs for monitoring the health status of your dashboard application. It will, for example, display how much memory and CPU are used. Email alerts can be sent to warn you that something is wrong. If errors or bugs occur, the log files will tell you where.
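The health status can also be read programmatically. The following is a minimal sketch using the `describe_environment_health` call of the AWS SDK for Python (boto3). It assumes that enhanced health reporting is enabled for the environment and that AWS credentials are configured; the helper function and the environment name are hypothetical.

```python
def summarize_health(client, environment_name):
    """Return a one-line health summary for an Elastic Beanstalk environment."""
    response = client.describe_environment_health(
        EnvironmentName=environment_name,
        AttributeNames=["HealthStatus", "Color", "Causes"],
    )
    status = response.get("HealthStatus", "Unknown")
    color = response.get("Color", "Grey")
    causes = response.get("Causes", [])
    summary = f"{environment_name}: {status} ({color})"
    if causes:
        summary += " - " + "; ".join(causes)
    return summary


if __name__ == "__main__":
    import boto3  # third-party AWS SDK; needs configured credentials and region

    eb = boto3.client("elasticbeanstalk")
    print(summarize_health(eb, "my-dashboard-env"))  # hypothetical environment name
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before pointing it at a real environment.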
Of course, like any tool, EB has some downsides: AWS has a lot of features, and this may feel a bit overwhelming in the beginning. The learning curve may be steep at the start, but the investment pays off in the end, as you will be able to scale, extend and improve your data projects.
Do you have any questions or need help setting up your AWS cloud computing? Feel free to contact me. EB demo requests are welcome as well.
Email: firstname.lastname@example.org | email@example.com
Images on this site were kindly provided by https://unsplash.com/ .