Blockchain – Technical Operations, Remote (LATAM), ID #00127
The Technical Operations team is responsible for the stability, reliability, and performance of our production systems. As a Technical Operations Engineer, you will maintain, troubleshoot, and improve those systems, with a special emphasis on complex Web3 technologies. The position calls for a blend of Web3 expertise, technical skill, analytical ability, and a collaborative approach to ensure operational excellence.
Time zone: US Eastern Time (ET)
What You'll Do
- Chain Operations and Research: Lead the research, deployment, and operational management of new blockchain networks, ensuring efficient launch processes, thorough testing, and ongoing optimization of chain performance and reliability.
- Advanced Web3 Support: Tackle complex Web3 issues, including conducting thorough post-mortems, providing technical direction in customer calls, and leading remediation efforts during incidents.
- System Monitoring and Analytical Skills: Use advanced analytical and dashboard skills, including proficiency in tools like Grafana or Datadog, to monitor system performance and health.
- Process and SLA/SLO Management: Define and enforce stringent SLA/SLO objectives and ensure the organization meets them, contributing to the overall health and reliability of our platform services.
- Hands-On Problem Solving: Innovate to solve complex challenges quickly and efficiently, while being disciplined in task follow-through to minimize technical debt and implement preventive measures.
- Collaboration and Support: Work closely with the L1 Support team and other cross-functional teams, fostering a collaborative environment to maintain system efficiency.
What You'll Bring
- Proven Experience: At least 5 years in Technical Operations, SRE, or a similar role, with a deep understanding of Linux/Unix systems.
- Deep Blockchain / Web3 Expertise: Ability to handle complex Web3-related issues with proficiency, including troubleshooting JSON-RPC responses, analyzing validator logs, and working directly with chain foundations to improve network performance.
- DevOps Experience: Proficiency in automation and configuration management tools (e.g., Ansible, Terraform, Consul) and in programming languages such as Python, Go, or JavaScript. Familiarity with container technologies such as Docker and Kubernetes.
- System Optimization: Skilled in system optimization, including benchmarking using in-house tools, cost analysis and optimization, and system-level tuning by comparing various cloud providers, hardware configurations, and kernel parameters.
- Analytical and Dashboard Proficiency: Demonstrated expertise in using tools like Grafana or Datadog for detailed system analysis and monitoring, essential for proactive system management and data-driven decision-making.
- SLA/SLO and Incident Management Expertise: Proven ability in defining and adhering to SLA/SLO objectives, coupled with efficient incident management using tools like PagerDuty, ensuring operational reliability and customer satisfaction.
- Proactive and Hands-On Approach: A proactive mindset with a hands-on approach to problem-solving, capable of innovating under pressure and committed to reducing risks and technical debt.
- Communication and Collaboration: Excellent communication skills, with the ability to collaborate effectively across teams and with various stakeholders.
- Personal Attributes: High energy, resilient, with a can-do mentality and a strong work ethic. Integrity, honesty, and maturity are key, along with a commitment to continuous improvement and a self-starter attitude.
Bonus
- Knowledge of database systems such as ScyllaDB, Redis, and Postgres
- Experience with WAF optimization and alerting, particularly with Cloudflare
- Familiarity with modern web hosting technologies, including lambda functions and caching strategies