Software Engineer II

Microsoft, Hyderabad, Telangana, India

Job Description

"Unlock the power of AI-driven SRE and join the Azure SRE Agent Platform team at Microsoft, where you'll design and build cutting-edge systems that transform the way organizations detect, diagnose, and mitigate production issues."

As a Software Engineer II in Microsoft's CoreAI division, you'll be part of a high-performing team that develops and runs AI agents as a service, helping customers maintain system reliability and uptime.

With a focus on quality, safety, security, enterprise scale, and real-world impact, our agents are 'virtual SRE teammates' that continuously watch systems, investigate problems, and recommend or perform fixes.

Why you should learn this:

The demand for AI-driven SRE solutions is skyrocketing, with a projected growth rate of 35% in the next 5 years, driven by the increasing need for organizations to maintain high system reliability and uptime.

Expected Salary: $140,000 - $200,000 per year, depending on location and experience

How it works:

  • Design and improve core capabilities that shape agent behavior, including tool design, planning and execution loops, orchestration, evaluation, and safety guardrails.
  • Build operational foundations that make agentic systems dependable, including monitoring, logging, and alerting.
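To make "planning and execution loops" and "safety guardrails" concrete, here is a minimal sketch in Python. It is not Microsoft's implementation. The tool names (`check_cpu`, `restart_service`), the `Step` type, and the approval flag are all hypothetical, chosen only to show how a loop can run read-only steps freely while blocking state-changing ones until a human approves.

```python
from dataclasses import dataclass


@dataclass
class Step:
    tool: str      # name of the tool the plan wants to call
    argument: str  # single argument passed to that tool


# Hypothetical tools for illustration only; a real agent would call real systems.
def check_cpu(host):
    return f"cpu on {host}: 97%"


def restart_service(host):
    return f"restarted service on {host}"


TOOLS = {"check_cpu": check_cpu, "restart_service": restart_service}
READ_ONLY_TOOLS = {"check_cpu"}  # assumption: only these are safe without approval


def run_plan(plan, approve_writes=False):
    """Execute each step in order, applying two simple guardrails."""
    log = []
    for step in plan:
        # Guardrail 1: refuse tools that are not registered.
        if step.tool not in TOOLS:
            log.append((step.tool, "rejected: unknown tool"))
            continue
        # Guardrail 2: state-changing tools need explicit approval.
        if step.tool not in READ_ONLY_TOOLS and not approve_writes:
            log.append((step.tool, "blocked: needs approval"))
            continue
        log.append((step.tool, TOOLS[step.tool](step.argument)))
    return log
```

Calling `run_plan([Step("check_cpu", "web-1"), Step("restart_service", "web-1")])` runs the diagnostic step and blocks the restart. Passing `approve_writes=True` lets both run. This mirrors the "recommend or perform fixes" split described in the posting.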

Core Concepts to Master

1. Agent Architecture

Understand the design principles and patterns for building scalable, fault-tolerant, and secure AI-driven SRE systems, including microservices architecture, containerization, and service mesh.

2. Observability and Monitoring

Learn how to design and implement comprehensive observability and monitoring strategies for agentic systems, including logging, metrics, and tracing.
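As a rough illustration of how logging, metrics, and tracing fit together, the sketch below wraps an operation in a Python context manager. It uses only the standard library. `METRICS` and `LOG_LINES` are in-memory stand-ins (an assumption) for a real metrics backend and log sink, and the `traced` helper is a hypothetical name, not part of any library.

```python
import json
import time
from collections import Counter
from contextlib import contextmanager

# In-memory stand-ins (assumptions) for a metrics backend and a log sink.
METRICS = Counter()
LOG_LINES = []


@contextmanager
def traced(operation, trace_id):
    """Time one operation, then record a counter and a structured log line."""
    start = time.perf_counter()
    status = "ok"
    try:
        yield
    except Exception:
        status = "error"
        raise  # re-raise so callers still see the failure
    finally:
        duration_ms = (time.perf_counter() - start) * 1000.0
        # Metric: one counter per operation and outcome.
        METRICS[f"{operation}.{status}"] += 1
        # Log: a JSON line carrying the trace id, so related events can be joined.
        LOG_LINES.append(json.dumps({
            "trace_id": trace_id,
            "operation": operation,
            "status": status,
            "duration_ms": round(duration_ms, 3),
        }))
```

Wrapping a step as `with traced("diagnose", "trace-123"): ...` increments `METRICS["diagnose.ok"]` on success or `METRICS["diagnose.error"]` on failure, and appends one JSON log line either way. Production systems would more likely use a dedicated telemetry library, but the three signals play the same roles.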

3. Machine Learning and AI

Master the application of machine learning and AI techniques for building predictive models, anomaly detection, and root cause analysis in SRE systems.
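One simple starting point for anomaly detection is a rolling z-score, sketched below in Python. This is a basic statistical baseline, not a machine learning model and not the method the team uses. The function name, the window size, and the threshold are illustrative assumptions.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value is far from the mean of the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)  # sample standard deviation, needs window >= 2
        if sigma == 0:
            # Flat history: treat any change at all as anomalous (a design choice).
            if values[i] != mu:
                anomalies.append(i)
            continue
        z_score = abs(values[i] - mu) / sigma
        if z_score > threshold:
            anomalies.append(i)
    return anomalies
```

For a latency series such as `[100, 102, 99, 101, 100, 103, 250, 101]`, the call `find_anomalies(...)` returns `[6]`, the index of the spike to 250. One known weakness of this approach is that a spike inflates the standard deviation of later windows, which can hide follow-up anomalies. Robust statistics such as the median are a common remedy.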

Interview Questions (Beginner)

  • What do you know about AI-driven SRE, and how do you think it can be applied in real-world scenarios?
  • Can you explain the difference between a microservices architecture and a monolithic architecture?
  • How would you approach designing a monitoring and logging strategy for a complex system?

Job Overview

Company: Microsoft
Employment Type: Full-time
Location: Hyderabad, Telangana, India
Experience Level: Fresher

Interview Questions (Advanced)

  • Design a scalable and fault-tolerant architecture for an AI-driven SRE system, including containerization and service mesh.
  • Implement a machine learning model for anomaly detection in a production system, and explain how you would deploy and monitor it.
  • Explain how you would approach debugging a complex issue in an agentic system, including tools and techniques you would use.