USAJOBS · 1 day ago
Platform/Site Reliability Engineer, GS-2210-14, FPL GS-14 (MP)
USAJOBS is seeking a Platform/Site Reliability Engineer to join the Federal Student Aid (FSA) team within the Department of Education. The role involves leading the design and development of cloud platforms and reliability systems, enhancing the technical foundation of FSA’s applications, and collaborating with cross-functional partners to ensure system reliability and modernization.
Consulting · Government · Human Resources · Information Technology · Internet · Staffing Agency
Responsibilities
Serve as an advisor to the IOG Director and Chief of the Network Support Division, acting as a network architect and engineer to develop and implement solutions across cloud and on-premises environments, while designing reusable platform services, container environments, identity integrations, networking patterns, and infrastructure components
Provide input to design and technical documentation, review final deliverables, and ensure adherence to the enterprise network operations engineering framework through leadership, while serving as a principal-level expert in platform engineering, cloud architecture, Site Reliability Engineering (SRE) practices, and infrastructure automation
Engage with technology leaders, business partners, and contractors to ensure operational requirements and needs are met, while clearly communicating technical concepts to non-technical stakeholders and producing platform standards, design documents, and technical evaluations
Evaluate system security plans and procedures, manage and direct office support contractors, address IT compliance issues, and oversee project planning and updates, while designing and maintaining continuous integration/continuous delivery (CI/CD) pipelines to support automated testing, deployment, change control, and compliance validation
Drive network engineering direction and response for CISA Binding Operational Directives (BODs) impacting data center operations, developing plans and processes to strengthen security, while implementing secure cloud configurations, identity and access management (IAM) models, encryption, and zero-trust architectural patterns
Qualifications
Required
Education cannot be substituted for experience for this position and grade level
Candidates must possess any combination of the following certifications from a recognized professional organization at the time of hire and acceptance of the position: Information Technology Infrastructure Library (ITIL) v4, Project Management Professional (PMP), AWS Certified Advanced Networking, Certified Information Systems Security Professional (CISSP), Certified Cloud Security Professional (CCSP), F5 Networks Certified Technology Specialist
One year of experience in either federal or non-federal service that is equivalent to at least a GS-13 performing two (2) out of three (3) of the following duties or work assignments: Experience leading the design and deployment of scalable cloud platforms using Infrastructure as Code (IaC), CI/CD, containers, and automated security controls to accelerate engineering delivery and ensure compliance; Experience enhancing reliability and observability for distributed systems, including SLO/SLA development, incident response, root-cause analysis, telemetry workflows, and/or performance/automation improvements; Experience translating platform and reliability engineering concepts into clear documentation, technical standards, and architecture guidance for non-technical audiences, and influencing engineering practices across multiple teams
You must possess IT-related experience (paid or unpaid) and/or completion of specific, intensive training (e.g., IT certification) demonstrating each of the four competencies listed below: Attention to Detail – Is thorough when performing work and conscientious about attending to detail; Customer Service – Works with clients and customers to assess their needs, provide information or assistance, resolve problems, or satisfy expectations; Oral Communication – Expresses information effectively to individuals or groups, taking into account the audience and nature of the information; Problem Solving – Identifies problems; determines accuracy and relevance of information; uses sound judgment to generate and evaluate alternatives, and to make recommendations
Skill in applying systems engineering and Site Reliability Engineering (SRE) concepts to ensure reliability, performance, scalability, security, and maintainability across complex, multi-cloud environments
Knowledge of platform and reliability engineering principles and the ability to apply them through real-world implementation, debugging, optimization, and modernization of cloud environments
Skill in computer engineering cloud automation, observability tooling, testing frameworks, and continuous integration/continuous delivery (CI/CD) pipelines, including telemetry, logging, alerting, and distributed tracing
Ability to leverage modern cloud, data, and security technologies to design, test, and deploy resilient platform and reliability systems that support mission-critical applications
Must be a US Citizen
You may be required to serve a one-year probationary period
Must complete a Background Investigation and Fingerprint check
You must meet all qualification requirements within 30 days of the closing date of this vacancy announcement
Benefits
Excused leave for Parent/Teacher Conferences (3 hours)
Excused leave for annual health screenings (4 hours)
Matching leave for community volunteer service
Alternative work schedules
Recruitment incentive
Relocation incentive
Student Loan Repayment Program
Company
USAJOBS
USAJOBS enables federal job seekers to access job opportunities across hundreds of federal agencies and organizations.
Funding
Current Stage
Late Stage
Company data provided by Crunchbase