Job req number: 88247
Time type: Full time
Job title: IT Specialist – OpenStack Site Reliability Engineer (SRE)
Purpose of the job / overall responsibility:
A Site Reliability Engineer (SRE) is responsible for maintaining the reliability of infrastructure environments and ensuring that software applications keep running smoothly after deployments and changes. The SRE combines software engineering and systems administration to ensure the scalability, performance, and reliability of large-scale, cloud-based applications and infrastructure.
Success criteria / KPI:
* Uptime percentage of OpenStack services.
* Mean time to recovery (MTTR) for incidents.
* Performance metrics (e.g., response time, throughput).
* Security vulnerabilities identified and mitigated.
* Successful backup and recovery tests.
* Documentation completeness and accuracy.
* Stakeholder satisfaction with communication and collaboration.
* Number of successful change implementations.
Key tasks:
1. Automation and Infrastructure as Code (IaC):
o Develop and maintain automation scripts for deployment, configuration, and management of OpenStack components.
o Use tools like Ansible to manage infrastructure as code.
o Implement CI/CD pipelines to automate the deployment and testing of OpenStack updates and configurations.
2. System reliability and availability:
o Implement and maintain monitoring systems to ensure the health and performance of the OpenStack environment.
o Respond to and resolve incidents quickly to minimize downtime and service disruptions.
3. Performance optimization:
o Continuously monitor and optimize the performance of OpenStack services.
o Forecast resource needs and plan for scaling the infrastructure to meet demand.
o Conduct load testing to identify bottlenecks and optimize system performance.
4. Security and compliance:
o Implement and enforce security best practices to protect the OpenStack environment.
o Regularly scan for and mitigate security vulnerabilities.
5. Backup and disaster recovery:
o Develop and implement backup strategies to protect data and ensure quick recovery in case of failures.
o Create and maintain disaster recovery plans to minimize downtime and data loss.
o Regularly test backup and disaster recovery processes to ensure their effectiveness.
6. Documentation and knowledge sharing:
o Maintain up-to-date documentation for all OpenStack configurations, processes, and procedures.
o Share knowledge and best practices with the team and other stakeholders.
7. Collaboration and communication:
o Work closely with development, operations, and other teams to ensure smooth integration and operation of OpenStack.
o Communicate effectively with stakeholders about system status, incidents, and planned maintenance.
o Gather and incorporate feedback from users and stakeholders to improve the OpenStack environment.
8. Continuous improvement:
o Conduct post-mortem analyses after incidents to identify root causes and implement preventive measures.
o Evaluate and integrate new technologies and tools to improve the OpenStack environment.
9. Change management:
o Manage and implement change requests in a controlled manner to minimize risks.
o Develop and maintain rollback plans for changes to ensure quick recovery in case of issues.
o Thoroughly test changes in a staging environment before deploying them to production.
Staff responsibility:
* May supervise junior engineers or interns.
* May lead small project teams or task forces.
Professional qualifications:
* Bachelor’s degree in computer science, engineering, or a related field.
* Proven experience in site reliability engineering or a similar role.
* Strong knowledge of OpenStack and cloud technologies.
* Proficiency in automation tools (Ansible).
Personal qualifications:
* Strong attention to detail.
* Ability to work under pressure and handle multiple tasks simultaneously.
* Proactive and self-motivated.
* Excellent teamwork and interpersonal skills.
* Strong commitment to continuous learning and improvement.
DSV – Global Transport and Logistics
DSV is a dynamic workplace that fosters inclusivity and diversity. We conduct our business with integrity, respecting different cultures and the dignity and rights of individuals. When you join DSV, you are working for one of the very best performing companies in the transport and logistics industry. You’ll join a talented team of approximately 75,000 employees in over 80 countries, working passionately to deliver great customer experiences and high-quality services. DSV aspires to lead the way towards a more sustainable future for our industry and is committed to trading on nature’s terms.
We promote collaboration and transparency and strive to attract, motivate and retain talented people in a culture of respect. If you are driven, talented and wish to be part of a progressive and versatile organisation, we’ll support you in achieving your potential and advancing your career.
Visit dsv.com and follow us on LinkedIn, Facebook and Twitter.