Jobs via Dice · 11 hours ago
Site Reliability Engineer Connect WMS
Dice is the leading career destination for tech experts at every stage of their careers. Our client, SDVS Technologies LLC, is seeking a Site Reliability Engineer to monitor and resolve operational issues affecting WMS functions, debug application code, and improve system observability. The role involves collaboration with various teams to enhance performance and ensure seamless operations.
Computer Software
Responsibilities
Operational Issue Investigation and Quick Resolution
Monitor and respond to operational issues affecting WMS functions (e.g., receiving, shipping, inventory)
Analyze system logs, error reports, and transaction flows to identify anomalies or failures
Work closely with Level 1 support and warehouse operations teams to understand incident symptoms and timelines
Execute quick resolutions by using extended user rights, database interventions, or WMS configuration changes
Code-Level Debugging
Debug application code, workflows, customizations, and interfaces to identify bugs or performance bottlenecks
Collaborate with WMS QA team to reproduce issues in test environments and trace through application workflows to isolate root causes
Collaborate with Product/Development teams to propose, implement, and test code fixes
Real-Time System Monitoring
Use tools like Datadog or internal diagnostics to monitor WMS behavior
Proactively set up or refine alerts for failure patterns (e.g., inventory mismatches, interface timeouts, RF disconnects)
Improve observability by suggesting and implementing better logging practices and metric coverage
Interface Troubleshooting
Investigate communication failures between WMS and other Products (e.g., LinOS, Link, EDI, Easymetrics)
Troubleshoot integration issues between the WMS and external systems (e.g., DevOps, DCOps)
Provide software-side support during integration testing, mainly remotely and occasionally on-site
Incident Management & Escalation
Participate in on-call rotations or site support shifts for time-sensitive incidents
Coordinate with operations, IT, and engineering during critical events to ensure fast resolution
Document incidents thoroughly, including root causes, fixes, and follow-up actions
Post-Incident Review & Continuous Improvement
Contribute to postmortem analysis for high-impact incidents
Recommend and implement configuration changes or process improvements to prevent repeated issues
Update or create playbooks and troubleshooting guides for known WMS issues
Internal Tooling and Automation
Develop scripts or queries (e.g., SQL) to streamline log analysis, system diagnostics, or data validation
Propose internal utilities to detect edge-case failures or performance degradations early
Support development of internal test tooling and simulations for recurring business scenarios
Cross-Functional Collaboration
Work with Product/Development teams to escalate and fix production bugs
Collaborate with QA teams to validate fixes or reproduce intermittent issues
Partner with implementation teams to train staff on WMS behavior and provide escalation support
Company
Jobs via Dice
Welcome to Jobs via Dice, the go-to destination for discovering the tech jobs you want.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase