About this role

We are seeking a Member of Technical Staff to help advance the evaluation and development of frontier coding agents. Sitting at the intersection of AI research, software engineering, and model evaluation, you will design the benchmarks, methodologies, and data systems that shape how next-generation coding models are measured and improved.

Skills

LLMs, Coding Evaluation, AI Evaluation, ML Systems

Key responsibilities

Design and own evaluation frameworks for coding agents, including benchmark specifications, scoring methodologies, rubrics, and quality standards.
Lead end-to-end research initiatives focused on measuring and improving coding model performance across diverse software engineering tasks.
Develop high-quality datasets, golden examples, and evaluation protocols that enable reliable assessment of frontier coding systems.
Analyze model behavior and failure modes, identifying systematic weaknesses and translating findings into actionable improvements for training and evaluation.
Build tooling and infrastructure that support large-scale experimentation, data generation, review workflows, and evaluation pipelines.
Establish best practices for coding-agent assessment, ensuring methodological rigor, reproducibility, and measurement quality.
Partner closely with researchers, engineers, and applied AI teams to design experiments and evaluate emerging model capabilities.
Contribute to technical reports, benchmark studies, and client-facing research initiatives that communicate model performance and insights.

Required skills & qualifications

Strong software engineering background with expertise in Python, C++, or comparable programming languages.
3+ years of experience in software engineering, machine learning, AI research, evaluation, or related technical disciplines.
Experience designing, reviewing, or validating technical assessments, benchmarks, coding tasks, or evaluation methodologies.
Familiarity with large language models, coding agents, reinforcement learning, model evaluation, or related AI systems.
Proven ability to build tooling, automate workflows, and improve technical processes through systematic experimentation.
Strong analytical skills with the ability to investigate model behavior and derive insights from complex technical systems.
Excellent written and verbal communication skills, including the ability to clearly articulate technical findings to diverse audiences.
Comfortable operating in fast-moving research environments with significant ambiguity and evolving priorities.

Preferred qualifications

Experience working on frontier AI systems, coding agents, or model evaluation research.
Deep interest in understanding how data, evaluations, and feedback mechanisms influence model capabilities.
Track record of independently driving ambiguous technical or research projects from conception to execution.
Experience designing benchmarks or datasets for machine learning systems at scale.
Familiarity with agentic workflows, tool use, reinforcement learning, or post-training methodologies.
Publications, open-source contributions, or demonstrated technical leadership in AI, machine learning, or software engineering.

Member of Technical Staff, Coding Research
