Juan Tejeria is a Lead DevOps Engineer at X-Team. He lives in Uruguay with his wife and two-year-old son. In this interview, he talks about his day-to-day life, the technologies he most enjoys working with, and some of the challenges he faces as a DevOps Engineer.
What are you currently working on? Walk me through a day-in-the-life.
I'm currently working on Beachbody's digital transformation. It started around eighteen months ago and we've been working on migrating each and every piece of infrastructure to what we think should be ideal for 2020 and beyond. I'm a Technical Lead on the DevOps team there.
My day starts early with my two-year-old waking me up. On regular days, I have about two to three hours before I start working, during which I have breakfast with my kid and my wife before I do anything else. Sometimes, though, I wake up and immediately need to take care of something urgent at Beachbody (considering they have teams around the world).
But if it's a regular day, I prepare my maté and start my day by checking Jira to see if anything has been added with an urgent flag. Sometimes we need to debug our servers to try and figure out some anomalies that happened during night shifts.
After my initial scan of how the infrastructure and services are behaving, I start working on my regular tasks, which are all meant to help multiple teams deliver scalable solutions to our client's customers. Two examples of such tasks are:
- Designing and implementing full CI/CD pipelines for a new service.
- Creating PoCs with new AWS features and services to see how they could complement the stack of our client.
As a DevOps Engineer, my day-to-day changes often, because we always try to keep moving forward and get things done. But we also want to support developers whenever they need us, so they can have better automation for their setups and deployments.
What's the technology you most like working with? Why?
I really enjoy working with AWS because I can do anything I need with it. It's well-documented and there are plenty of use cases that make it easy to adopt and implement new features or services.
In particular, the DevOps team uses an AWS service called EKS. Working with Kubernetes is great, because it allows us to define cloud-native applications and quickly deploy them into the cluster.
Apart from AWS, we also use a lot of Serverless. It helps us automate and define our services and resources.
Last but not least, I use Python for my scripting. It helps me automate and process certain tasks. For some tasks, I use Bash, but I always try Python first.
Tell me about the most challenging bug you faced this year.
It's been a busy year at Beachbody, in particular because of a big initiative that was released in April this year. Whenever one of X-Team's large clients has a big initiative, we always try to implement full infrastructure for any branch a developer creates.
In this case, that meant a serverless deployment generating over 500 AWS resources, including over a hundred different Lambda functions and all the SQS, S3 Events, and different databases required for all of it to function properly.
It was challenging to set everything up in such a way that 30+ developers could have their active branch available as a service while giving different QA teams the ability to test each independent story. All this while maintaining a full CI/CD pipeline with automation tests and deployments to stable environments.
But we managed! We created pipelines with unique domains that were capable of doing this, as well as automated teardown once a developer finished a story. We used GitHub events, Travis CI for our build process, Harness to help with our CD, and Serverless as the framework to get it all done.
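As a rough illustration of the per-branch idea, and not Beachbody's actual code, the sketch below turns a Git branch name into a Serverless stage name and a unique domain, then builds the deploy and teardown commands for that stage. The helper names, the `example.com` base domain, and the 20-character limit are assumptions made for the example; `serverless deploy --stage` and `serverless remove --stage` are standard Serverless Framework CLI commands.

```python
import re


def stage_name(branch: str, max_len: int = 20) -> str:
    """Turn a Git branch name into a lowercase, hyphen-separated stage name.

    The length limit is an arbitrary assumption to keep resource names short.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    # Fall back to "dev" if the branch name contained no usable characters.
    return slug[:max_len].rstrip("-") or "dev"


def branch_domain(branch: str, base_domain: str = "example.com") -> str:
    """Build a unique hostname for the branch (base domain is hypothetical)."""
    return f"{stage_name(branch)}.{base_domain}"


def deploy_command(branch: str) -> list[str]:
    """Command a CI job could run when a branch is pushed."""
    return ["serverless", "deploy", "--stage", stage_name(branch)]


def teardown_command(branch: str) -> list[str]:
    """Command a CI job could run when a branch is merged or deleted."""
    return ["serverless", "remove", "--stage", stage_name(branch)]


if __name__ == "__main__":
    branch = "feature/Login"
    print(branch_domain(branch))
    print(" ".join(deploy_command(branch)))
    print(" ".join(teardown_command(branch)))
```

A CI job triggered by a GitHub event could pass these command lists to `subprocess.run`, so every branch gets its own isolated stage and the same naming rule drives both creation and cleanup.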
Apart from that, we also had to design multiple CloudFormation templates for all the dependencies that we didn't want to handle in the Serverless definition (such as databases or S3 buckets).
It was a really big project and it involved different parts of the company that all had to synchronize for this major release. There were lots of data migrations from MySQL on RDS to DynamoDB, along with moving photos and files from the old to the new infrastructure.
So that was the most challenging project, but I'd be remiss not to mention one "small" bug that challenged me to my limits. One of Beachbody's projects consumed from a Kinesis stream that held valuable data for BI that was used in stakeholder reports. All of a sudden, it stopped working. We couldn't figure out why. I hadn't worked with Kinesis before, so it was all new to me.
After learning about Kinesis, I increased the retention policy of the data that lives on the stream, so that we'd have enough time to fix the issue without losing information. That saved us, because it took four days to catch up with the data and fix this pernicious bug.
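For readers unfamiliar with the retention setting, here is a minimal sketch of how it can be raised with boto3. It is not the code used at Beachbody: the function name, the stream name, and the 168-hour target are assumptions for illustration. The two client calls, `describe_stream_summary` and `increase_stream_retention_period`, are part of the boto3 Kinesis client, and a stream keeps records for 24 hours by default.

```python
def extend_retention(kinesis_client, stream_name: str, hours: int = 168) -> int:
    """Raise a stream's retention period if it is below `hours`.

    `kinesis_client` is expected to behave like boto3.client("kinesis").
    Returns the retention period (in hours) in effect after the call.
    """
    summary = kinesis_client.describe_stream_summary(StreamName=stream_name)
    current = summary["StreamDescriptionSummary"]["RetentionPeriodHours"]

    # The increase call only makes sense when the target is higher
    # than the current value, so skip it otherwise.
    if hours > current:
        kinesis_client.increase_stream_retention_period(
            StreamName=stream_name,
            RetentionPeriodHours=hours,
        )
        return hours
    return current


# Example usage (requires boto3 and AWS credentials; stream name is hypothetical):
#   import boto3
#   extend_retention(boto3.client("kinesis"), "bi-events", hours=168)
```

Raising retention does not fix a stalled consumer, but it keeps unread records on the stream longer, which buys time to debug before data expires.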
The problem was a mix of the memory on those old instances and how its consumer metrics were configured, plus some resharding on our Kinesis stream itself. We were sending way more information to the stream than when it was first created (years ago) and it had reached its limits. Not easy to squash, that one!
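To make "reached its limits" concrete: in provisioned mode, each Kinesis shard accepts up to 1,000 records per second and up to 1 MB per second of writes. The sketch below estimates how many shards a given ingest rate needs. The function name and the sample figures are illustrative assumptions, not numbers from Beachbody's stream.

```python
import math

# Documented per-shard write limits for provisioned Kinesis data streams.
RECORDS_PER_SEC_PER_SHARD = 1000
MB_PER_SEC_PER_SHARD = 1.0


def shards_needed(records_per_sec: float, mb_per_sec: float) -> int:
    """Estimate the minimum shard count for a given write load.

    Whichever limit (record count or throughput) is hit first
    determines the answer; a stream always has at least one shard.
    """
    by_records = math.ceil(records_per_sec / RECORDS_PER_SEC_PER_SHARD)
    by_throughput = math.ceil(mb_per_sec / MB_PER_SEC_PER_SHARD)
    return max(1, by_records, by_throughput)


if __name__ == "__main__":
    # Hypothetical load: 2,500 records/s totalling 1.5 MB/s.
    print(shards_needed(2500, 1.5))
```

A stream sized for the traffic of several years ago can fall below this estimate as volume grows, at which point writes get throttled until the stream is resharded.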
What were some of the biggest lessons you've learned this year?
I learnt most about Serverless. I already knew some things about Docker, Kubernetes, and deployments with ECS or EKS containers, but I'd never done such setups with Serverless. It was really great to spin up and define more than 500 AWS resources at once with the API, plus all the events and infrastructure required for it to work.
Outside of the tech itself, I learned a lot about processes in general. Having to support more than a hundred developers with different projects and requirements made me realize how important it is to have clear processes that you can replicate and that people can adopt easily.
Before joining X-Team, most of my experience came from start-ups where I built projects from scratch without needing to support many people. It's very different working for a large multinational client.
As a DevOps engineer who is part of a larger team, you always have to try and help out the developers you serve. But you also need to think about how you can automate that help, so you can focus on solving the next problem.
Wise words. Thank you so much for your time, Juan.