Citi · 20 hours ago
Wealth - Lead - GenAI Testing and Evaluation Framework - Vice President
Citi is seeking an innovative and detail-oriented professional to lead the development and management of the Generative AI (GenAI) testing and evaluation framework. The role focuses on creating methodologies to optimize GenAI models' performance, emphasizing prompt engineering and evaluation, while collaborating with various teams to integrate testing frameworks seamlessly into the development lifecycle.
Banking · Finance · Financial Services
Responsibilities
Design and implement a comprehensive testing and evaluation framework for GenAI model outputs
Develop standards and patterns for assessing the quality and "goodness" of prompts across diverse use cases
Create iterative processes for testing and refining prompts to optimize model outputs
Establish criteria for evaluating prompt performance, including accuracy, completeness, relevance, coherence, and alignment with desired outcomes
Experiment with prompt structures to identify optimal configurations for various business applications
Develop and document best practices for prompt design and refinement
Work closely with tech partners, engineers, and product teams to ensure testing frameworks integrate seamlessly into the development lifecycle
Partner with stakeholders to understand business requirements and tailor testing methodologies to address specific needs
Provide actionable insights and recommendations to improve model performance based on evaluation results
Identify and implement tools for automating the testing and evaluation process
Develop dashboards and reporting mechanisms to monitor prompt and model performance metrics
Stay updated on emerging tools and techniques in AI testing and integrate them into the framework
Establish feedback loops to iteratively improve testing methodologies and evaluation standards
Establish a process for ongoing monitoring of prompts once they are in production
Monitor industry trends and advancements in Generative AI to ensure the framework remains cutting-edge
Advocate for a culture of experimentation and continuous learning within the organization
Qualifications
Required
Expertise in Generative AI and natural language processing (NLP) models
Strong proficiency in prompt engineering and familiarity with frameworks for AI evaluation
Hands-on experience with AI tools, libraries, and cloud platforms
Strong problem-solving skills and ability to derive actionable insights from complex data
Attention to detail with a focus on precision and accuracy in evaluation
Deep understanding of AI/ML testing methodologies and best practices
Proficiency in programming languages like Python and experience with relevant libraries (e.g., PyTorch, TensorFlow)
Passion for exploring new methodologies to improve AI evaluation frameworks
Creativity in designing experiments and testing approaches
Excellent communication skills to convey technical concepts to diverse audiences
Ability to work collaboratively across cross-functional teams and influence stakeholders
Comfortable working in a fast-paced, dynamic environment
Willingness to learn and adapt to new tools, technologies, and methodologies
Benefits
Medical, dental & vision coverage
401(k)
Life, accident, and disability insurance
Wellness programs
Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays
Company
Citi
Citi's mission is to serve as a trusted partner to our clients by responsibly providing financial services that enable growth and economic progress.
H1B Sponsorship
Citi has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (1386)
2024 (849)
2023 (1375)
2022 (1117)
2021 (876)
2020 (901)
Funding
Current Stage
Late Stage
Company data provided by Crunchbase