Data Engineering Internship



Data Science
San Jose, CA, USA
Posted on Friday, October 20, 2023

Vectra® is the leader in AI-driven threat detection and response for hybrid and multi-cloud enterprises.

The Vectra AI Platform delivers integrated signal across public cloud, SaaS, identity, and data center networks in a single platform. Powered by patented Attack Signal Intelligence, it empowers security teams to rapidly prioritize, investigate, and respond to the most advanced cyber-attacks. With 35 patents in AI-driven threat detection and the most vendor references in MITRE D3FEND, organizations worldwide rely on the Vectra AI Platform to move at the speed and scale of hybrid attackers. For more information, visit

Position Overview

Detecting attackers in real time requires robust data pipelines that enable machine learning and statistical techniques. As an intern on the Data Engineering team, you will help transform rich network traffic data and cloud log data into meaningful features, and develop data systems for collecting algorithm telemetry. You will build pipelines and tools for both on-prem and cloud deployments, collaborating with Data Scientists and Software Engineers along the way.


  • Work with the Data Engineers on the team to improve existing features and develop new ones, enabling Data Scientists to access data in ways previously unavailable
  • Possible projects include
    • Building a converter that writes data to Parquet format and catalogs it using AWS Glue
    • Performing ETL on existing data to restructure time series data into a more accessible format
    • Automating the flow of network captures through a process that converts them into metadata and loads the result into Spark


  • Required
    • Working towards a BS or MS in Computer Science or related field
    • Strong programming skills with experience in Python, C++, or Java
    • Linux proficiency and shell scripting
  • Desirable
    • Experience with Docker, Kubernetes, or another container orchestration tool
    • Experience working with AWS or GCP offerings
    • Experience with a source control system, preferably Git
    • Familiarity with Hadoop, MapReduce, Spark, and distributed computing
    • Understanding of data pipeline architectures (e.g. Lambda, Kappa)
    • Hands-on database experience (MySQL, MongoDB, CouchDB, Elasticsearch, etc.)
    • Knowledge of real-time data pipelines (e.g. Kafka and Spark Streaming)
    • Experience with continuous integration and deployment workflows

Vectra provides a comprehensive total rewards package that supports the financial, physical, mental, and overall health of our employees and their families. Compensation includes competitive base pay, incentive plan eligibility, and participation in the employee equity plan (stock options). Specific benefits vary by location but commonly include health care insurance, income protection / life insurance, access to retirement savings plans, behavioral & emotional wellness services, generous time away from work, and a comprehensive employee recognition program.

Vectra is committed to creating a diverse environment and is proud to be an equal opportunity employer.

We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status.