
Kalpak Burgul

SoC Performance Architect
Bengaluru

Summary

HPC enthusiast eager to work on performance tuning of parallel computing architectures and to contribute toward exascale computing.

Experienced in performance analysis at SoC interfaces and fine-tuning of HPC/ML workloads.

Interested in power-on ecosystems; have successfully collaborated on the power-on of multiple projects while driving performance and power (PnP) activities.

Overview

7 years of professional experience
6 years of post-secondary education

Work History

SoC Performance Architect

Intel Technologies India Pvt. Ltd.
03.2024 - Current

Developing performance models and characterizing latency-bandwidth profiles of SoC interfaces on Intel Xeon servers.

Currently exploring CXL memory performance profiling and identifying gaps with respect to cores, cache-coherent interconnects, and DRAM/CXL memory balance.

Performance and Power - Post Silicon

Intel Technologies India Pvt. Ltd.
08.2019 - 02.2024
  • Built an SoC performance debug methodology for Intel GPU Max Series 1100 (code-named Ponte Vecchio) to meet performance goals derived from architectural targets.
  • Defined performance measurement procedures and profiling/performance debug flows; identified performance gaps and their root causes; and devised mitigation strategies, including hardware and software workarounds, to minimize performance loss.
  • Applied this methodology across multiple SoC interfaces:
  • Xe-link: multi-GPU interface using SerDes for high-bandwidth connectivity, scaling from 2- to 8-GPU connectivity models. Tuned for peak bandwidth and provided inputs to align the software stack with the hardware configuration.
  • Intra-die connectivity: tile-to-tile interface to achieve better scaling of memory and compute per GPU.
  • GPU peer-to-peer connectivity using a PCIe switch to scale across multi-GPU nodes.
  • PCIe interface: identified causes of performance gaps and tuned performance to meet architectural targets.
  • Implemented multi-GPU-to-socket PCIe performance measurements, characterizing blade-level memory limitations across Xeon servers and the performance impact of UPI on the socket-to-socket interface.
  • Tuned HBM 2.0 memory performance toward architectural targets, improving GPU local memory performance through various memory latency and timing parameters.
  • Performed performance tuning and debug of HPC and ML workloads, including microarchitecture-level compute-rate measurements and hardware performance debug, to root-cause software and hardware bugs and propose optimizations that improve performance across workloads.

Firmware Engineer LTE

Intel Technologies India Pvt. Ltd.
07.2018 - 08.2019
  • Developed test cases for component verification of physical-layer firmware in the LTE protocol stack and collaborated on debug for modem simulation bring-up.
  • Developed test cases for component-level functional checkout and debugging on Polaris modem silicon.

Firmware Validation Engineer, Intern

Intel Technologies India Pvt. Ltd.
07.2017 - 04.2018
  • Supported unit-level functional validation and debug of modem firmware against 3GPP standards during power-on.
  • Supported configuration of the base station simulator and 3G and 4G signal strength measurements.


Education

M.Tech - Embedded Systems

Vellore Institute of Technology
Vellore, TN
06.2016 - 04.2018

Bachelor of Engineering - Electronics And Telecommunication

Walchand Institute of Technology
Solapur
06.2010 - 05.2014

Skills

C programming

Python programming

Linux kernel programming

Technical skills

GPGPU and SoC architecture level performance analysis and optimization

Memory architecture and HBM performance optimization at the hardware and software levels

PCIe architecture and performance optimization

Xe-link/NVLink debug and optimization

SoC-level power management and optimization using Pmax, Psys, and RAPL

Software

OpenMP and MPI

OpenCL

Thesis Project

Implemented a real-time scheduling algorithm in the kernel of a non-real-time OS to reduce latency for real-time systems. The modified kernel achieved a latency improvement of approximately 50 to 70 ns.

https://www.ijitee.org/wp-content/uploads/papers/v8i11/K25320981119.pdf

Accomplishments

Received a DRA for excellent planning and execution of performance tuning for discrete server GPU workloads.

