The IP4G Site Reliability Engineer is a key member of the team, responsible for ensuring the reliability, scalability, and performance of IP4G workloads. This role combines expertise in IBM Power Systems, cloud infrastructure, and site reliability engineering principles to design, implement, and maintain resilient and efficient solutions for clients. The engineer collaborates closely with cross-functional teams to monitor, optimize, and automate IP4G, striving for continuous improvement and operational excellence.
Job responsibilities:
* Design, implement, and maintain highly available and resilient architectures for IP4G workloads on Google Cloud Platform, leveraging fault-tolerant designs and redundancy strategies.
* Monitor system performance, availability, and reliability metrics to proactively identify and address potential issues before they impact service uptime or performance.
* Implement disaster recovery solutions and failover mechanisms to ensure business continuity and minimize service disruptions.
* Optimize IP4G workloads for performance, scalability, and cost-efficiency in the Google Cloud environment, leveraging auto-scaling, load balancing, and caching strategies.
* Conduct capacity planning exercises and performance tuning activities to ensure optimal resource utilization and performance of IP4G systems and applications.
* Collaborate with cloud architects and DevOps teams to implement CI/CD pipelines and automation workflows for seamless deployment and scaling of IP4G workloads.
* Respond to and resolve critical incidents impacting the availability or performance of IP4G systems and applications on Google Cloud, following established incident response procedures and SLAs.
* Document incident response procedures, post-mortem reports, and lessons learned to improve incident management processes and enhance system reliability.
* Develop automation scripts and infrastructure as code (IaC) templates to automate routine tasks, streamline deployment processes, and improve operational efficiency.
* Continuously evaluate and adopt emerging technologies and best practices in automation and DevOps to enhance the reliability and scalability of IBM Power environments.
* Implement comprehensive monitoring and alerting solutions for IP4G workloads on Google Cloud, utilizing monitoring tools such as Stackdriver, Prometheus, and Grafana.
* Define and configure alerting thresholds, notifications, and escalation policies to ensure timely detection of and response to anomalous behavior or performance degradation.
Technical skills:
* Excellent verbal and written communication skills.
* Ethical and critical thinking.
* Excellent interpersonal and customer service skills.
* Excellent organizational skills and attention to detail.
* Excellent time management skills with a proven ability to meet deadlines.
* Strong analytical and problem-solving skills.
* Strong supervisory and leadership skills.
* Ability to prioritize tasks and to delegate them when appropriate.
* Ability to function well in a fast-paced and at times stressful environment.
* Proficient with Microsoft Office Suite or related software.
* Strong knowledge of IBM Power architecture, AIX/Linux operating systems, virtualization technologies (e.g., PowerVM), and storage solutions (e.g., IBM Spectrum Storage).
* Proficiency in cloud monitoring and observability tools such as Stackdriver, Prometheus, Grafana, and the ELK Stack.
Education:
* Bachelor’s degree in computer science, information technology, or a related field.
* Certification in IBM Power Systems (e.g., IBM Certified System Administrator - AIX, IBM Certified Technical Sales Specialist) and Google Cloud Platform (e.g., Google Cloud Certified - Professional Cloud Architect) is highly preferred.
* Proven experience designing, implementing, and supporting IBM Power workloads in cloud environments, with a focus on reliability, scalability, and performance optimization.
What we offer
Each employee has a chance to see the impact of their work. You can make a real contribution to the success of the company.
Several activities are organized throughout the year, such as weekly sports sessions, team-building events, monthly drinks, and much more.
Healthcare, dental, life insurance, savings fund, Christmas bonus, grocery bonus, annual bonus.
Save on commute
Paid office parking.
Sport activity
Join your colleagues in various sports activities in the area.
Discount programs
Medical-related discounts.
Located in the heart of Puebla, with views of the Popocatépetl volcano, and with restaurants and amenities close by.
PTO
Vacation, sick, holiday, and paid leave.
Sponsored events
Eat & drink
Enjoy a kitchen stocked with coffee and snacks at low cost.