*Position summary*:
Our team is looking for a *data engineer* with a diverse background in data integration to join the Data Services team.
Some of our data is small, some is very large (1 trillion+ rows); some is structured, some is not.
Our data comes in all kinds of sizes, shapes, and formats:
traditional RDBMSs like PostgreSQL, Oracle, and SQL Server; MPP databases like StarRocks, Vertica, Snowflake, and Google BigQuery; and NoSQL stores like MongoDB and Elasticsearch, to name a few.
We are looking for individuals who can design solutions to any data problem using the different types of databases and technologies supported within our team.
We use MPP databases to analyze billions of rows in seconds.
We use Spark and Iceberg, in batch or streaming mode, to handle whatever the data needs are.
We also use Trino to query all these different types of data without moving them around.
Besides a competitive compensation package, you'll be working with a great group of technologists interested in finding the right database and the right technology for the job, in a culture that encourages innovation.
If you're ready to step up and take on new technical challenges at a well-respected company, this is a unique opportunity for you.
*Responsibilities*:
- Implement ETL/ELT processes using various tools and programming languages (Scala, Python) against our MPP databases: StarRocks, Vertica, and Snowflake
- Work with the Hadoop team to optimize Hive and Iceberg tables
- Contribute to the existing data lake and data warehouse initiative using Hive, Spark, Iceberg, and Presto/Trino
- Analyze business requirements, then design and implement the required data models
*Qualifications (must have)*:
- BA/BS in computer science or a related field
- 1+ years of experience with MPP databases such as StarRocks, Vertica, or Snowflake
- 3+ years of experience with RDBMSs such as Oracle, MSSQL, or PostgreSQL
- Programming background in Scala, Python, Java, or C/C++
- Strong in any of the major Linux distributions: RHEL, CentOS, or Fedora
- Experience working in both OLAP and OLTP environments
- Experience working on-prem, not just in cloud environments
*Desired (nice to have)*:
- Experience with Elasticsearch or the ELK stack
- Working knowledge of streaming technologies such as Kafka
- Working knowledge of orchestration tools such as Oozie and Airflow
- Experience with Spark: PySpark, SparkSQL, Spark Streaming, etc.
- Experience using ETL tools such as Informatica, Talend, and/or Pentaho
- Understanding of healthcare data
- Data analyst or business intelligence experience would be a plus
Job type: Full-time, permanent
Salary: $30,000.00 - $40,000.00 per month
Schedule:
- Monday to Friday
Experience:
- Snowflake: 1 year (required)
- Vertica: 1 year (required)
- Oracle DB: 3 years (required)
- PostgreSQL: 3 years (required)
- MSSQL: 3 years (required)
- Python: 3 years (required)
- Linux: 3 years (required)
Language:
- English (required)
Work location: Hybrid remote in 45050, Zapopan, Jal.