About Us

Angel One Limited (formerly known as Angel Broking Limited) is a fintech company providing broking services, a margin trading facility, loans against shares, research services, depository services, investment education and financial products distribution to its clients. We are on a mission to become the No. 1 fintech organization in India. With more than 11 million registered clients and over 28 million app downloads, we are onboarding an average of about 400K new clients every month. We are working to build personalized financial journeys for our clients via a single app, powered by new-age engineering tech and Machine Learning.

We are a group of self-driven, motivated individuals who enjoy taking ownership and believe in providing the best value for money to investors through innovative products and investment strategies. We apply and amplify design thinking in our products and solutions. We have a flat structure, with ample opportunity to showcase your talent and a growth path for engineers to the very top. We are remote-first, with people spread across the country. Come onboard and become a part of this epic journey! We are aggressively hiring Engineers, Product Managers and Data Science rockstars across India. Check out our careers section!

About The Role

Job Description: Staff Site Reliability Engineer (Staff SRE)

Key Responsibilities

Leadership & Strategy

  • Lead site reliability engineering efforts, establishing best practices and guiding the team in their implementation.
  • Collaborate with cross-functional teams to ensure SRE practices align with business objectives.
  • Drive the strategic vision for system and network reliability and performance.

Systems and Network Management

  • Oversee the design, implementation, and maintenance of systems and network infrastructure to ensure high availability, reliability, and performance.
  • Manage system configuration, monitoring, and performance tuning to optimize overall system performance.
  • Implement and maintain robust backup, recovery, and disaster recovery strategies for both systems and networks.

Cloud Infrastructure Management

  • Leverage AWS services to build and maintain scalable and resilient infrastructure.
  • Design, deploy, and manage AWS-based solutions to ensure optimal performance and cost-efficiency.
  • Implement best practices for cloud security and compliance.

Automation & Tooling

  • Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention.
  • Implement Infrastructure as Code (IaC) practices to manage infrastructure efficiently and consistently.

Incident Management

  • Lead incident response efforts for system and network-related issues, ensuring quick resolution and minimal impact on business operations.
  • Conduct root cause analysis and implement preventative measures to avoid future incidents.
  • Establish and enforce incident response protocols to ensure swift and effective resolution.

Monitoring & Metrics

  • Design and implement comprehensive monitoring solutions to track the health and performance of systems and network infrastructure.
  • Develop and maintain metrics and dashboards to provide visibility into operations and performance.
  • Utilize data-driven insights to proactively identify and address potential issues.

Capacity Planning & Optimization

  • Perform thorough capacity planning to ensure systems and network infrastructure can scale to meet future demands.
  • Optimize configurations and resource allocation to enhance performance and efficiency.
  • Continuously evaluate and improve performance to support business growth.

Security & Compliance

  • Ensure systems and network infrastructure comply with security standards and regulatory requirements.
  • Implement security best practices to protect data integrity and confidentiality.
  • Conduct regular security audits and risk assessments.

Technical Skills

  • Expertise in networking technologies, protocols, and concepts (TCP/IP, DNS, load balancing, firewalls, etc.).
  • Strong knowledge of AWS services and cloud-native architecture (EC2, S3, VPC, Lambda, RDS, etc.).
  • Proficiency in systems administration (Linux/Unix) and automation tools (Python, Bash, Terraform, Ansible, etc.).
  • Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.).

Desired Outcomes

  • Achieve and maintain 99.99% system and network uptime.
  • Implement automation solutions that reduce manual intervention by 50%.
  • Decrease incident response time by 40%.
  • Improve system and network performance metrics by 30% within the first year.
  • Ensure compliance with all relevant security and regulatory standards.

Working With Us

We strive to help millions of Indians make informed investment decisions. We are constantly in search of the brightest, most talented individuals to join our world-class team. We give each of our teammates the freedom to ideate, innovate, express, solve and create customer experiences through #Fintech & #ConsumerTech. With our continuous learning interventions and upskilling, we bring out the best in each of our teammates.

Benefits

  • We have a “work-from-anywhere” policy.
  • We have flexible work timings and fast-track promotions for all our employees.
  • Our employees enjoy highly competitive pay structures. One of the best!
  • Angel One has been certified as a “Great Place To Work” six times.
  • Long- and short-term incentives reward our team members for their efforts.
  • Enjoy benefits such as annual leave, Mediclaim insurance, variable pay plans, an internet allowance and many more.