At Talkdesk, we are courageous innovators focused on helping organizations around the world create better customer experiences. Our AI-powered cloud contact center solutions optimize our customers’ most critical customer service processes. We are recognized as a Contact Center as a Service (CCaaS) leader by influential research organizations including Gartner. With $498 million in total funding, a valuation of more than $10 billion, and a ranking of #8 on the Forbes Cloud 100 list, now is the time to be part of the Talkdesk legacy and help accelerate our success in a new decade of transformational growth.

We champion an inclusive and diverse culture representative of the communities in which we live and serve. We also give back to our community by volunteering our time, supporting non-profits, and minimizing our global footprint.

The SRE team at Talkdesk is responsible for building, running, and maintaining the components that serve as the infrastructure foundation for the rest of Talkdesk, with an automation-first mindset, while ensuring the high availability and reliability of those components. It also partners with other product engineering teams to help make their services more performant, scalable, observable, and reliable.

At Talkdesk we believe in a “you build it, you own it” philosophy, where every engineering team is responsible for the software it builds and deploys. To support this, SREs also play a critical role in ensuring that teams have the tools, practices, and expertise to make that happen in a blame-free environment.

As a Talkdesk SRE, you will work with a large, complex distributed infrastructure that spans multiple regions and cloud providers and uses a number of leading-edge technologies. To do that, you will:

Be responsible for:

Design, build, harden, and maintain the core infrastructure used by all of Talkdesk’s engineering teams
Automate every aspect of our infrastructure to remove as much human intervention as possible
Participate in design reviews of new features, products or infrastructure, in order to guarantee resilience and high availability
Make sure the infrastructure is running smoothly by using observability tools and proactively identifying issues
Develop effective tooling, alerts, and processes that allow engineers to maintain and support their production workloads
Contribute to and spread the use of protocols that promote production readiness and operational excellence
Participate in on-call rotation for the supported infrastructure alongside all the other engineering teams
Partner with product engineering teams to debug production outages, write incident postmortems, and carry out action items that improve service resilience
Drive and contribute to discussions on the evolution and growth of Talkdesk’s infrastructure

Need to have:

Experience supporting production systems
Extensive hands-on experience working with AWS
Good knowledge of Linux/Unix systems
Strong programming skills in at least one scripting language (e.g. Bash, Python)
Experience with CloudFormation, Terraform, or other infrastructure-as-code languages/tools
Experience supporting messaging systems such as RabbitMQ or Kafka
Extensive experience supporting data stores such as MongoDB, PostgreSQL, MySQL, Redis, Cassandra, or Elasticsearch
Experience with configuration management software such as Ansible
Experience with monitoring tools such as Datadog, New Relic, Grafana, or similar
An understanding of the importance of observability, of the most critical metrics, and of how to measure them
Ability to identify time-consuming or error-prone manual tasks for which it makes sense to create tooling and automation
Ability to understand large-scale complex systems from a reliability & availability perspective
Ability to debug complex issues and identify root causes of instability in a large-scale distributed system
A software development mindset and the ability to apply it to infrastructure management
Critical thinking about problems and a solution-focused approach

Be valued for:

Experience with technologies such as Docker, Consul, Vault, Jenkins, Concourse, Prometheus, Nexus
Experience with encryption technologies such as GoPass, ACM, KMS, and hashing
Experience with other cloud providers such as Google Cloud Platform (GCP) or Microsoft Azure
Experience with Java or other JVM-based development languages

The Talkdesk story hinges on empathy and acceptance. It is the shared goal among all Talkdeskers to empower a new kind of customer hero through our innovative software solution, and we firmly believe that the best path to success for our mission is inclusivity, diversity, and genuine acceptance. To that end, we will hire, promote, work alongside, cheer for, bond with, and warmly welcome into the Talkdesk family all persons without regard to ethnic and racial identity, indigenous heritage, national origin, religion, gender, gender identity, gender expression, sexual orientation, age, disability, marital status, veteran status, genetic information, or any other legally protected status.
