John Rankin
Altoona, WI
📞 (715) 296-0693
✉️ rankintoday@gmail.com
Summary
Software Development Manager and Senior HPC Engineer with 10+ years of experience building, automating, and operating large-scale high-performance computing systems. Deep expertise in HPC networking, cluster management, DevOps automation, and end-to-end system testing. Proven leader supporting mission-critical and exascale environments, driving quality, scalability, and operational efficiency.
Education
University of Wisconsin – Eau Claire
Bachelor of Science, Computer Science
Technical Skills
Languages & Automation
- Python, Bash, Ansible
- Java, JavaScript, Node.js
- REST APIs, Jenkins
Systems & Platforms
- Linux System Administration
- Redfish, Lustre, MySQL
Operating Systems & Tools
- RHEL, Rocky Linux, SLES
- Git, VS Code
- JIRA, Confluence, Grafana, IPMI
Cluster & Cloud Technologies
- HPE Performance Cluster Manager (HPCM)
- Bright Cluster Manager
- Kubernetes, OpenStack
- Cray HPC
Professional Experience
Hewlett Packard Enterprise (HPE) — Remote
Software Development Manager – HPC Networking
June 2022 – Present
- Own and lead development of all Switch CLI configuration utilities for the Slingshot networking product line.
- Own lab R&D, test, and development infrastructure supporting 30+ internal HPC systems.
- Partner with Product Management to plan initiatives, prioritize work, and align engineering resources with organizational goals.
- Lead DevOps and infrastructure automation efforts to provision bare-metal hardware efficiently, significantly reducing manual effort and turnaround time.
- Lead a team responsible for the Slingshot product testing suite, enabling continuous 24/7 end-to-end testing across:
  - Network bring-up and configuration
  - High availability
  - Dragonfly and Fat Tree topologies
  - Traffic validation, LAG/LACP, LLDP, and critical networking features
- Designed and drove development of an interactive Switch CLI integrated with API frameworks to improve usability and reduce R&D support burden.
- Own the R&D HPC system configuration tool used by internal stakeholders and customers, reducing cost and complexity for new system deployments and upgrades.
- Implement monitoring and observability enhancements including fabric counters, node metrics, telemetry pipelines, fault detection, and anomaly detection.
- Build, deploy, and maintain HPE Cray supercomputers, including compute nodes, service nodes, Slingshot and InfiniBand fabrics, storage, and management services.
- Provision compute node images using cluster management frameworks, handling drivers, kernel modules, OS configuration, and packages.
- Troubleshoot complex, multi-layer issues spanning networking (RDMA, congestion, routing), OS/kernel, firmware, cabling, and performance regressions.
- Support exascale and marquee customers, leading incident response, root-cause analysis, and cross-team coordination.
Hewlett Packard Enterprise (HPE) — Remote
Senior Software Engineer – HPC Networking
August 2020 – June 2022
- Developed features for a microservices-based REST application, including packet forwarding from fabric managers to database services.
- Built internal tooling for system configuration, cable validation, fabric bring-up, and overall network health assessment.
- Maintained software deployments across standalone and Kubernetes environments, validating containers for each release.
- Created an internal documentation platform enabling collaborative Markdown contributions for Slingshot installation and troubleshooting guides used by customers.
- Designed and implemented a modular Slackbot integrating with Jenkins and internal systems to report build status and manage lab system reservations.
- Maintained Slingshot DNS infrastructure within Kubernetes environments.
Cray / Hewlett Packard Enterprise — Chippewa Falls, WI
Senior Software Engineer – HPC System Management, Configuration & Test
May 2014 – August 2020
- Designed and implemented end-to-end HPC cluster automation, from bare-metal installation through full software configuration using Python, Bash, Ansible, DHCP, and PXE.
- Reduced full system bring-up time from one week to approximately four hours through automation improvements.
- Defined installation and test processes for Bright Cluster Manager in large, distributed environments.
- Designed and implemented a hardware test suite for HPC systems.
- Developed tools to analyze hardware performance and efficiency.
- Provided system configuration guidance and validation based on customer and internal requirements.
- Supported one-off hardware and software configurations under aggressive delivery timelines.
- Collaborated with Cray R&D on major platforms including Shasta (Kubernetes) and Urika-GX (OpenStack), aligning manufacturing requirements with engineering expectations.
- Troubleshot early hardware and software failures during new platform bring-up.
- Independently designed and built a custom HPC cluster management and testing solution capable of operating on 64+ cabinets / 1,000+ nodes, preventing shipment delays and avoiding significant financial penalties.
JB Systems LLC — Eau Claire, WI
Web Developer
September 2013 – May 2014
- Developed and maintained multiple web applications using the CodeIgniter PHP MVC framework.
- Designed and implemented MySQL database solutions.
- Delivered new features rapidly based on evolving client requirements.
Liberty Mutual Insurance — Wausau, WI
Software Developer Intern (SharePoint)
May 2013 – August 2013
- Built an internal web application from the ground up using SharePoint, C#, JavaScript, jQuery, and Knockout.js.
- Created deployment documentation and maintained internal wikis.
- Developed a SharePoint 2013 App proof of concept.
Menards Inc. — Eau Claire, WI
Software Developer Intern
September 2012 – February 2013
- Contributed to a Spring MVC Java web application used in a commercial product.
- Implemented backend and frontend features in a production environment.