Machine Learning / Data Pipeline Engineer

About this job

Compensation: $140k - 200k
Location options: Remote
Job type: Full-time


python, ai, machinelearning, bigdata

Job description

About Us

Vorstella is an AI platform that automatically manages large-scale distributed systems like Cassandra, Hadoop, Spark and Kafka for large companies. Founded by ex-DataStax engineers, we’ve designed some of the largest distributed system deployments in the world, and we wanted to make this technology accessible to everyone. You shouldn’t need 3 years of experience to feel comfortable running a new system at scale. We take the guesswork out of using new technology and let you focus on building your applications.

Who we're looking for

We’re looking for someone who is probably 60% engineer, 40% machine learning. A little more street fighter, a little less ivory tower. Someone who can write production-quality code and solve engineering problems, and who knows enough ML to find good-enough solutions. Someone who’s creative and can solve problems without always reaching for the ML hammer. Sometimes we use rules, sometimes we use ML, sometimes we need to ask the user better questions.

What you’ll be working on

You’ll be working on the machine learning pipeline and models. We’ve got multiple signals, both synthesized and raw, being fed into root-cause analysis, database tuning algorithms and cost optimization. These models feed data to the UI/API, which presents the next best action to the end user. We’re always looking for a solution that gets us to good outcomes as quickly as possible. Sometimes it’s basic, sometimes we’re pushing beyond the boundaries of what’s published.

Our stack

Our deployment target is Docker and Kubernetes. On the frontend we use React/Redux. Back-end services are written in Go, with the machine learning code written in Python. Our continuous integration system is CircleCI, and we use GitHub for all our code. We’re multi-cloud with deployments currently in Google and AWS.


  • Writing code that manages some of the largest distributed systems in the world. We work with customers that have hundreds of thousands of servers.
  • You’ll be a senior member of the team, with strong input over large swathes of infrastructure as well as product ownership for core products and features.
  • Writing code and working with a team that’s pushing the cutting edge of what’s possible. You’ll be working with some of the world’s experts in optimization and distributed systems.
  • The ability to publish and contribute to multiple popular open source projects.

Skills & requirements

  • Strong social, verbal and written communication skills. We’re currently a 100% remote team; most of engineering is spread across the southwest US.
  • Experience in a dynamic work environment with a bias for action.
  • A love of distributed systems, data, and learning new things.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.