A major IT company operating across Latin America is hiring due to growth:
Spark EKS Engineer
Job description: We are seeking a Spark lead focused on operations to administer and scale our multi-petabyte Spark EKS clusters and their related services.
This role focuses primarily on provisioning, ongoing capacity planning, monitoring, and management of the Spark EKS platform running on AWS, and on improving the performance of the applications and middleware that run on this platform.
Key qualifications:
Well versed in AWS (EMR, S3, and other AWS services and dashboards), at least at AWS administrator level.
Preferred: AWS certification for EMR/EKS cluster management.
Responsible for maintaining large-scale (1000+ nodes) production Spark clusters.
Point of contact for all Spark-related issues coming from application teams and internal clusters; responsible for troubleshooting and recommendations for Spark and MR jobs. Should be able to use existing logs to debug issues.
Responsible for implementation and ongoing administration of Spark, Flink, and Trino infrastructure, including monitoring, tuning, and troubleshooting.
Improve scalability, service reliability, capacity, and performance of the cluster and the applications running in it.
Triage production issues when they occur, together with other operations and engineering teams.
Conduct ongoing maintenance across our large-scale deployments around the world.
Write automation code for managing large big data clusters.
Participate in the on-call rotation.
Hands-on experience troubleshooting incidents: formulating theories, testing hypotheses, and narrowing down possibilities to find the root cause.
Deep understanding of the Spark ecosystem.
Hands-on experience managing production clusters (Hadoop, Spark).
Strong development/automation skills. Must be very comfortable reading and writing Python code and scripts.
At least 5 years of Spark experience in large-scale, multi-tenant production clusters (1000+ instances).
Tools-first mindset: you build tools for yourself and others to increase efficiency and to make hard or repetitive tasks easy and quick.
Experience with configuration management and automation.
Organized and focused on building, improving, resolving, and delivering.
Advanced conversational English is essential (will be evaluated).
Job type: Mostly remote
Location: Guadalajara (GDL), Monterrey, or Mexico City (any of these cities)
Salary: $100,000 gross.
Benefits: Excellent benefits.