AI Safety Research Intern-2 jobs in United States

Centific · 2 months ago

AI Safety Research Intern-2

Centific is a frontier AI data foundry that empowers clients with safe, scalable AI deployment. The AI Safety Research Intern will focus on advancing AI safety, designing and evaluating attack and defense strategies for LLM jailbreaks, and contributing to the platform's security guarantees through high-impact experiments.

Analytics · Artificial Intelligence (AI) · Database · Information Technology · Retail Technology
H1B Sponsor Likely

Responsibilities

Advance AI Safety: Design, implement, and evaluate attack and defense strategies for LLM jailbreaks (prompt injection, obfuscation, narrative red teaming)
Evaluate AI Behavior: Analyze and simulate human-AI interaction patterns to uncover behavioral vulnerabilities, social engineering risks, and over-defensive vs. permissive response tradeoffs
Agentic AI Security: Prototype workflows for multi-agent safety (e.g., agent self-checks, regulatory compliance, defense chains) that span perception, reasoning, and action
Benchmark & Harden LLMs: Create reproducible evaluation protocols/KPIs for safety, over-defensiveness, adversarial resilience, and defense effectiveness across diverse models (including the latest benchmarks and real-world exploit scenarios)
Deploy and Monitor: Package research into robust, monitorable AI services using modern stacks (Kubernetes, Docker, Ray, FastAPI); integrate safety telemetry, anomaly detection, and continuous red-teaming
Jailbreaking Analysis: Systematically red-team advanced LLMs (GPT-4o, GPT-5, LLaMA, Mistral, Gemma, etc.), uncovering novel exploits and defense gaps
Multi-turn Obfuscation Defense: Implement context-aware, multi-turn attack detection and guardrail mechanisms, including countermeasures for obfuscated prompts (e.g., StringJoin, narrative exploits)
Agent Self-Regulation: Develop agentic architectures for autonomous self-checking and self-correction, minimizing risk in complex, multi-agent environments
Human-Centered Safety: Study human behavior models in adversarial contexts—how users probe, trick, or manipulate LLMs, and how defenses can adapt without excessive over-defensiveness

Qualifications

Python · PyTorch · AI Safety Research · Adversarial ML · Multi-agent architectures · Kubernetes · Docker · FastAPI · Human-AI interaction · Red-teaming · Benchmarking · GitHub

Required

Ph.D. student in CS/EE/ML/Security (or a related field); actively publishing in AI Safety, NLP robustness, or adversarial ML (ACL, NeurIPS, Black Hat, IEEE S&P, etc.)
Strong Python and PyTorch/JAX skills; comfort with toolkits for language models, benchmarking, and simulation
Demonstrated research in at least one of: LLM jailbreak attacks/defense, agentic AI safety, human-AI interaction vulnerabilities
Proven ability to go from concept → code → experiment → result, with rigorous tracking and ablation studies

Preferred

Experience in adversarial prompt engineering and jailbreak detection (narrative, obfuscated, and sequential attacks)
Prior work on multi-agent architectures or robust defense strategies for LLMs
Familiarity with red-teaming, synthetic behavioral data, and regulatory safety standards
Scalable training and deployment: Ray, distributed evaluation, CI/telemetry for defense protocols
Public code artifacts (GitHub) and first-author publications or strong open-source impact

Company

Centific

Zero-distance innovation for GenAI creators and industries. By expertly engineering platforms and curating multimodal, multilingual data, we empower the ‘Magnificent Seven’ and enterprise clients with safe, scalable AI deployment. We are a team of over 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers.

H1B Sponsorship

Centific has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Trends of Total Sponsorships
2025 (10)
2024 (22)
2023 (14)

Funding

Current Stage: Late Stage
Total Funding: $60M
Key Investors: Granite Asia
2025-06-24 · Series A · $60M

Leadership Team

Vasudevan Sundarababu
Chief Data and AI Officer
Company data provided by Crunchbase.