Overview 

At Segment, we believe companies should be able to send their data wherever they want, whenever they want, with no fuss. Unfortunately, most product managers, analysts, and marketers spend too much time searching for the data they need, while engineers are stuck integrating the tools they want to use. Segment standardizes and streamlines data infrastructure with a single platform that collects, unifies, and sends data to hundreds of business tools with the flip of a switch. That way, our customers can focus on building amazing products and personalized messages for their customers, letting us take care of the complexities of processing their customer data reliably at scale. We’re in the running to power the entire customer data ecosystem, and we need the best people to take the market. 
 
This is a unique opportunity to be a founding member of Segment’s internal Data Engineering team. Data Engineering will enable Segment to derive insights about our customers and product usage effectively and efficiently, and it is the backbone of all data-driven decisions we make to move the business forward.
 
As a Data Engineer and a key internal customer of the Segment product, you’ll have the unique opportunity to provide the Product and Engineering teams with feedback that will help shape the product’s future.

What you’ll do:

  • Design, build, and launch highly efficient and reliable data pipelines
  • Execute on our data lake strategy
  • Work with stakeholders, including the Analytics, Product, Operations, and Design teams, to assist with data-related technical issues and support their data infrastructure needs
  • Optimize compute and storage resources to make data available to the rest of the company
  • Own and optimize Segment’s internal AWS data infrastructure so that the data warehouse and the data pipelines you build run effectively and efficiently
  • Build backend services for internal application teams to consume
  • Build tools and automation for schema evolution and schema inference on unstructured data

You’re a great fit if…

  • You have 3+ years of experience with big data ecosystems and tools such as Spark, Hive, and Hadoop
  • You have 3+ years of experience with relational SQL and NoSQL databases, including Postgres, MySQL, and MongoDB
  • You have 3+ years of experience with data pipeline and workflow management tools such as Airflow, Azkaban, and Luigi
  • You have 3+ years of experience with AWS cloud services: EC2, EMR (Spark and Hive), RDS, Redshift, Redshift Spectrum, Athena, and AWS Glue
  • You have 3+ years of experience with data serialization formats such as JSON, Avro, Parquet, and ORC
  • You have 3+ years of experience with object-oriented or functional languages such as Python, Go, Java, C++, and Scala
  • You have 3+ years of experience with infrastructure services such as Docker, ECS, and Kubernetes
  • You have worked with configuration management and continuous integration tools such as Terraform and CircleCI
  • You have advanced SQL skills, including query authoring, and working familiarity with a variety of relational databases and database optimization techniques
  • You have the ability to write abstracted, reusable code components
  • You have strong communication skills
  • You have a BS or MS degree in Computer Science or a related technical field

Bonus points: 

  • Experience with stream-processing systems such as Storm and Spark Streaming
  • AWS Certified DevOps Engineer certification
  • AWS Certified Big Data – Specialty certification
 
This role requires working full-time in our San Francisco office.

Segment is an equal opportunity employer. We believe that everyone should receive equal consideration and treatment. Recruitment, hiring, placements, transfers, and promotions will be based on qualifications for the positions being filled, regardless of sex, gender identity, race, religious creed, color, national origin, ancestry, age, physical disability, pregnancy, mental disability, or medical condition.