About this role
• Extensive hands-on experience and tool-building ability in one or more of the following areas: CI/CD, observability, and configuration management.
• Experience with service mesh (Istio/Linkerd), chaos engineering, and multi-cloud architecture.
Responsibilities
• Be familiar with mainstream cloud services (AWS/Azure), and use infrastructure as code to automate the management and maintenance of secure, compliant, and highly available production and development/test environments.
• Be able to build a self-managed data center and set up an internal Tee environment.
• Build and improve a full-stack monitoring, logging, alerting, and tracing system that spans the infrastructure, application, and business layers (for example, Prometheus, Grafana, ELK, Jaeger); proactively detect and resolve potential problems; and ensure SLAs are met.
• As the core hub of the project, efficiently coordinate the R&D, design, marketing, operations, and security teams to keep information and goals aligned, and take responsibility for the final product results and user experience.
• Design and optimize the CI/CD pipeline to automate the path from code submission to secure release.
• Provide efficient, self-service build, test, deployment, and debugging toolchains for the R&D team to enhance overall delivery efficiency.
• Implement strict identity and access control, network isolation, key management (such as HSM/KMS integration), and security scanning.
• Optimize resource costs, and design and rehearse disaster recovery and backup/restore plans across availability zones and regions.
Qualifications
• More than 5 years of experience in DevOps, SRE, or cloud computing-related fields, including managing large-scale distributed systems.
• Proficient in the services and best practices of at least one mainstream cloud platform (AWS, GCP, Azure).
• Proficient in containerization (Docker) and orchestration (Kubernetes), with extensive hands-on experience in production environments.
• Proficient in one or more common CI/CD tools (such as Jenkins or GitHub Actions).
• Skilled in at least one programming or scripting language (such as Go, Python, or Shell), and able to independently develop operations tools and automation scripts.
• Possess excellent documentation skills, cross-team collaboration ability, and a strong sense of responsibility.