Job Summary:
The HPC Engineer will be responsible for supporting the existing HPC setup and must have considerable knowledge of HPC, storage, operating systems, and networking. Strong communication skills are required, including the ability to interact with customers.
Technical qualifications for this role are:
Must have 3+ years of relevant experience providing HPC and storage technical support to end-user customers.
Good experience in HPC system administration.
Good experience in HPC project implementation and technical customer support.
Hands-on experience with commercial or open-source cluster deployment/management software, e.g. Bright Cluster Manager, ROCKS, xCAT.
Good knowledge of installing and configuring Linux services (e.g. NIS, LDAP, NFS, SAMBA).
Good experience with job schedulers (e.g. PBS Pro, Torque, Slurm).
Good experience with parallel file systems (GPFS, Lustre).
Hands-on experience with HPC interconnects: InfiniBand, Ethernet, and Fibre Channel (FC).
Intermediate experience compiling, running, and optimizing HPC applications such as NAMD, LAMMPS, Quantum ESPRESSO (QE), VASP, GROMACS, MATLAB, and ANSYS.
Benchmarking experience: IOR and IOzone for storage I/O; HPL on CPU and GPU.
Solid experience in storage implementation and administration (DAS, NAS, and SAN).
Excellent troubleshooting skills.
Ability to document and automate system administration tasks and procedures.
Responsibilities for this role include but are not limited to:
Provide post-implementation onsite support to resolve customers' technical issues with HPC components, including storage, parallel file systems (PFS), and applications.
Monitor the HPC and storage environment daily.
Handle customer tickets and keep customers regularly updated on case status.
Drive all open cases to resolution as quickly as possible.
Coordinate and follow up with OEMs or ISVs to resolve issues.
Develop innovative, customized solutions to meet customers' business needs.
Develop positive and trustworthy relationships with customers.
Multitask and manage competing priorities, ensuring all objectives are accomplished.
Professional Certifications:
RHCSA or RHCE certification
Any other HPC or storage certification is a plus.
Key skills: Linux system administration, cluster operations, HPC engineering, Nagios, Red Hat Linux, data center management, Slurm scheduler, RHCE, disk management, Linux server administration, Bright Cluster Manager, HPC (high-performance computing), cluster management, Samba
Locuz Enterprise Solutions Ltd.
About LOCUZ: Convergence is not just in our logo. Convergence is our credo. We started in 2000, with a belief that the world of Technology Infrastructure will converge. Now, we see convergence happening inside the datacenter of every enterprise. We are a trusted ...