

Senior Site Reliability Engineer



Software Engineering
Posted on Wednesday, May 31, 2023

At Talkdesk, we are courageous innovators focused on helping organizations around the world create better customer experiences. Our AI-powered cloud contact center solutions optimize our customers’ most critical customer service processes. We are recognized as a Contact Center as a Service (CCaaS) leader by influential research organizations including Gartner. With $498 million in total funding, a valuation of more than $10 billion, and a ranking of #8 on the Forbes Cloud 100 list, now is the time to be part of the Talkdesk legacy and help accelerate our success in a new decade of transformational growth.

We champion an inclusive and diverse culture representative of the communities in which we live and serve. We also give back to our community by volunteering our time, supporting non-profits, and minimizing our global footprint.

The SRE team at Talkdesk is responsible for building, running, and maintaining the components that serve as the infrastructure foundation for the rest of Talkdesk, with an automation-first mindset, while ensuring the high availability and reliability of those components. It also partners with product engineering teams to help make their services more performant, scalable, observable, and reliable.

At Talkdesk we believe in a “you build it, you own it” philosophy, where every engineering team is responsible for the software it builds and deploys. To support this, SREs also play a critical role in ensuring that teams have the tools, practices, and expertise to make that happen in a blame-free environment.

As a Talkdesk SRE you will work with a large, complex, distributed infrastructure that spans multiple regions and cloud providers and uses a number of leading-edge technologies. In this role you will:

Be responsible for:

  • Design, build, harden, and maintain the core infrastructure used by all of Talkdesk’s engineering teams
  • Automate every aspect of our infrastructure to remove as much human intervention as possible
  • Participate in design reviews of new features, products or infrastructure, in order to guarantee resilience and high availability
  • Make sure the infrastructure is running smoothly by using observability tools and proactively identifying issues
  • Develop effective tooling, alerts, and processes that allow engineers to maintain and support their production workloads
  • Contribute to and promote the use of protocols that foster production readiness and operational excellence
  • Participate in on-call rotation for the supported infrastructure alongside all the other engineering teams
  • Partner with product engineering teams to debug production outages, write incident postmortems, and carry out action items that improve service resilience
  • Drive and contribute to discussions on the evolution and growth of Talkdesk’s infrastructure

Need to have:

  • Experience supporting production systems
  • Extensive hands-on experience working with AWS
  • Good knowledge of Linux/Unix systems
  • Strong programming skills in at least one scripting language (e.g. Bash, Python)
  • Experience with CloudFormation, Terraform, or other infrastructure-as-code languages/tools
  • Experience supporting messaging systems such as RabbitMQ or Kafka
  • Extensive experience supporting data stores such as MongoDB, PostgreSQL, MySQL, Redis, Cassandra, or Elasticsearch
  • Experience with configuration management software such as Ansible
  • Experience with monitoring tools like Datadog, New Relic, Grafana, or similar
  • An understanding of the importance of observability, of the most critical metrics, and of how to measure them
  • Ability to identify time-consuming or error-prone manual tasks for which it makes sense to create tooling and automation
  • Ability to understand large-scale complex systems from a reliability & availability perspective
  • Ability to debug complex issues and identify root causes of instability in a large-scale distributed system
  • A software development mindset and the ability to apply it to infrastructure management
  • Critical thinking about problems and a solution-focused attitude

Be valued for:

  • Experience with technologies such as Docker, Consul, Vault, Jenkins, Concourse, Prometheus, Nexus
  • Experience with encryption technologies such as GoPass, ACM, KMS, and hashing
  • Experience with other cloud providers such as Google Cloud Platform (GCP) or Microsoft Azure
  • Experience with Java or other JVM-based development languages

The Talkdesk story hinges on empathy and acceptance. It is the shared goal among all Talkdeskers to empower a new kind of customer hero through our innovative software solution, and we firmly believe that the best path to success for our mission is inclusivity, diversity, and genuine acceptance. To that end, we will hire, promote, work alongside, cheer for, bond with, and warmly welcome into the Talkdesk family all persons without regard to ethnic and racial identity, indigenous heritage, national origin, religion, gender, gender identity, gender expression, sexual orientation, age, disability, marital status, veteran status, genetic information, or any other legally protected status.