Alldus · 4 days ago
Software Engineer - Tech Lead - Scalable Data and Analytics Systems
Alldus is looking for exceptional software engineers to design and scale next-generation data processing and analytics platforms. The role involves architecting and optimizing pipelines and services that handle billions of records daily, enabling real-time transactions and analytical insights.
Responsibilities
Design and implement highly scalable OLTP systems for real-time workloads and OLAP systems for complex analytical queries across massive datasets
Build, optimize, and maintain large-scale batch and streaming data pipelines using frameworks and query engines such as Apache Spark, Flink, Presto/Trino, or Kafka Streams
Optimize systems for low-latency queries, high-throughput ingestion, and interactive analytics, sustaining performance as data volumes grow to petabyte scale
Develop and integrate with modern storage and compute platforms (e.g., Snowflake, BigQuery, Redshift, Cassandra, HDFS, Delta Lake, Iceberg) to support hybrid analytical and transactional workloads
Ensure high availability, reliability, and robust monitoring across distributed compute and storage clusters with automated failover and recovery
Work closely with data scientists, ML engineers, and product teams to build unified, secure, and cost-efficient data platforms
Qualifications
Required
Strong proficiency in Java, Scala, Python, or Go, with demonstrated experience building distributed back-end systems
Deep understanding of database internals, query optimization, indexing, and ACID vs. eventual consistency trade-offs
Hands-on experience with big data frameworks (e.g., Spark, Flink, Kafka) and distributed SQL engines (e.g., Presto, Trino, Hive, Impala)
Expertise in designing OLAP/OLTP architectures for scale and high concurrency
Solid foundation in distributed systems, concurrency, parallelism, and caching techniques
Preferred
Experience with HTAP (Hybrid Transactional/Analytical Processing) systems or real-time analytics platforms
Familiarity with data lakehouse architectures and with file and table formats such as Parquet, ORC, Delta Lake, Iceberg, and Hudi
Knowledge of containerized deployments (Docker, Kubernetes) and cloud-native data architectures (AWS, GCP, Azure)
Background in query engine development or contributions to open-source OLAP/OLTP frameworks