We are seeking an experienced Cloud AIOps Architect to lead the design and implementation of advanced AI-driven operational systems across multi-cloud and hybrid cloud environments. This role demands a blend of technical expertise, innovation, and leadership to develop scalable solutions for complex IT systems with a focus on automation, machine learning, and operational efficiency.
Responsibilities
Architect and design the AIOps solution leveraging AWS, Azure, and cloud-agnostic services, ensuring portability and scalability
Develop an end-to-end automated machine learning (ML) pipeline covering data ingestion, DataOps, model training, and inference across multi-cloud environments
Design hybrid architectures leveraging cloud-native services like Amazon SageMaker, Azure Machine Learning, and Kubernetes for development, model deployment, and orchestration
Design and implement ChatOps integration, allowing users to interface with the platform through Slack, Microsoft Teams, or similar communication platforms
Leverage Jupyter Notebooks in AWS SageMaker, Azure Machine Learning Studio, or cloud-agnostic environments to create model prototypes and experiment with datasets
Lead the design of classification models and other ML models using AWS SageMaker training jobs, Azure ML training jobs, or open-source tools running in containers on Kubernetes
Implement automated rule management systems using Python in containers deployed to AWS ECS/EKS, Azure AKS, or Kubernetes for cloud-agnostic solutions
Architect the integration of ChatOps backend services using Python containers running in AWS ECS/EKS, Azure AKS, or Kubernetes for real-time interactions and updates
Oversee the continuous deployment and retraining of models based on updated data and feedback loops, ensuring models remain efficient and adaptive
Design platform-agnostic solutions to ensure that the system can be ported across different cloud environments or run in hybrid clouds (on-premises and cloud)
Requirements
13+ years of overall experience, with 7+ years in AIOps, Cloud Architecture, or DevOps roles
Hands-on experience with AWS services such as SageMaker, S3, Glue, Kinesis, ECS, EKS
Strong experience with Azure services such as Azure Machine Learning, Blob Storage, Azure Event Hubs, Azure AKS
Hands-on experience working on the design, development, and deployment of contact centre solutions at scale
Proficiency in container orchestration (e.g., Kubernetes) and experience with multi-cloud environments
Experience with machine learning model training, deployment, and data management across cloud-native and cloud-agnostic environments
Expertise in implementing ChatOps solutions on platforms such as Microsoft Teams and Slack, and in integrating them with AIOps automation
Familiarity with data lake architectures, data pipelines, and inference pipelines using event-driven architectures
Strong programming skills in Python for rule management, automation, and integration with cloud services
Experience with Kafka, and with Azure DevOps and AWS DevOps tooling for CI/CD pipelines
We offer
Opportunity to work on technical challenges that may have an impact across geographies
Vast opportunities for self-development: online university, global knowledge sharing, and learning through external certifications
Opportunity to share your ideas on international platforms
Sponsored Tech Talks & Hackathons
Unlimited access to LinkedIn Learning solutions
Possibility to relocate to any EPAM office for short- and long-term projects
Focused individual development
Benefits package:
Health benefits
Retirement benefits
Paid time off
Flexible benefits
Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Technical Architect
Employment Type: Full time