Job Description
Description
1. Coverage requirement: 24/7, with continuous coverage through Saturday and Sunday each week (working days Thursday-Monday or Friday-Tuesday; to be decided later).
2. Candidates should preferably be based in the Mumbai office rather than other locations, as this significantly speeds up onboarding and knowledge transfer.
3. Starting date: as soon as possible, preferably the 1st of November.
4. Technical requirements:
Job Duties and Responsibilities:
- Maintain and monitor our environments in a 24/7 rotation system
- Develop and improve our monitoring systems, automate repetitive tasks
- Cooperate with international teams
- Identify and address performance challenges
- Document and communicate progress on resolving issues
Must Have:
- Working knowledge of Linux and networking
- Track record of improving and maintaining monitoring tools (Icinga, Zabbix, Prometheus, Grafana, OpenTSDB)
- Incident management skills: must be able to own, collaborate on, and resolve large-scale incidents under time pressure
- Troubleshooting skills to hunt down the root causes of issues, and persistence in preventing them from happening again
- Experience handling large numbers of diverse systems with configuration management systems such as Puppet, Ansible, or Terraform
- Knowledge of both self-hosted and cloud environments (preferably the Google Cloud Platform)
- Ability to work effectively in a globally distributed team structure
- Good English skills (B2+) to effectively communicate about technical matters
Nice to Have:
- Coding experience in Python/Golang/Perl/Ruby
- GCP certificate, CCNA certificate, RHCE or equivalent
- Experience using CI/CD tools
- Operational knowledge of the ELK stack
Qualifications
Secondary Skills : L1 Support
Nice to Have (additional):
- Kubernetes knowledge
Job Classification
Industry: IT-Software, Software Services
Functional Area: IT Software - Application Programming, Maintenance,
Role Category: Admin/Maintenance/Security/Datawarehousing
Role: Admin/Maintenance/Security/Datawarehousing
Employment Type: Full time
Education
Under Graduation: Any Graduate in Any Specialization
Post Graduation: Post Graduation Not Required
Doctorate: Doctorate Not Required, Any Doctorate in Any Specialization
Contact Details:
Company: Xoriant Solutions
Location(s): Mumbai
Keyskills:
Networking
Linux
Coding
Configuration management
Incident management
Perl
Troubleshooting
CCNA
Operations
RHCE