This job has closed.

Jobs via Dice · 11 hours ago

Site Reliability Engineer Connect WMS

Houston, TX

Full-time

Hybrid

Mid Level

Dice is the leading career destination for tech experts at every stage of their careers. Our client, SDVS Technologies LLC, is seeking a Site Reliability Engineer to monitor and resolve operational issues affecting warehouse management system (WMS) functions, debug application code, and improve system observability. The role involves working with operations, QA, and product/development teams to improve performance and keep warehouse operations running smoothly.

Computer Software

Responsibilities

Operational Issue Investigation and Quick Resolution

Monitor and respond to operational issues affecting WMS functions (e.g., receiving, shipping, inventory)

Analyze system logs, error reports, and transaction flows to identify anomalies or failures

Work closely with Level 1 support and warehouse operation teams to understand incident symptoms and timelines

Resolve issues quickly using extended user rights, database interventions, or WMS configuration changes

Code-Level Debugging

Debug application code, workflows, customizations, and interfaces to identify bugs or performance bottlenecks

Collaborate with the WMS QA team to reproduce issues in test environments and trace through application workflows to isolate root causes

Collaborate with Product/Development teams to propose, implement, and test code fixes

Real-Time System Monitoring

Use tools such as Datadog or internal diagnostics to monitor WMS behavior

Proactively set up or refine alerts for failure patterns (e.g., inventory mismatches, interface timeouts, RF disconnects)

Improve observability by suggesting and implementing better logging practices and metric coverage

Interface Troubleshooting

Investigate communication failures between the WMS and other products (e.g., LinOS, Link, EDI, Easymetrics)

Troubleshoot integration issues between the WMS and external systems (e.g., DevOps, DCOps)

Provide software-side support during integration testing, mostly remotely and occasionally on-site

Incident Management & Escalation

Participate in on-call rotations or site support shifts for time-sensitive incidents

Coordinate with operations, IT, and engineering during critical events to ensure fast resolution

Document incidents thoroughly, including root causes, fixes, and follow-up actions

Post-Incident Review & Continuous Improvement

Contribute to postmortem analysis for high-impact incidents

Recommend and implement configuration changes or process improvements to prevent recurring issues

Update or create playbooks and troubleshooting guides for known WMS issues

Internal Tooling and Automation

Develop scripts or queries (e.g., SQL) to streamline log analysis, system diagnostics, or data validation

Propose internal utilities to detect edge-case failures or performance degradations early

Support development of internal test tooling and simulations for recurring business scenarios

Cross-Functional Collaboration

Work with Product/Development teams to escalate and fix production bugs

Collaborate with QA teams to validate fixes or reproduce intermittent issues

Partner with implementation teams to train staff on WMS behavior and provide escalation support

Qualifications

WMS functions

Code debugging

Real-time monitoring

SQL

Incident management

Post-incident review

Automation scripting

Cross-functional collaboration

Required