
AI-Empowered Thermal Modeling and Run-Time Management for Manycore Processor and Chiplet Designs

 

Principal Investigators

Graduate Students

Current Students

  • Jincong Lu
  • Subed Lamichhane

Graduate Students (graduated)

  • Sheriff Sadiqbatcha 
  • Shuyuan Yu
  • Wentian Jin
  • Jinwei Zhang
  • Mohammadamir Kavousi
  • Yibo Liu
  • Liang Chen (post-doc, SJTU)
  • Han Zhou (First job: Synopsys)

Industry Liaisons


Funding

  1. National Science Foundation CISE CCF Core Small program (CCF-1816361), "SHF: Small: Data-Driven Thermal Monitoring and Run-Time Management for Manycore Processor and Chiplet Designs", $500,000, Oct. 1, 2021 to Sept. 30, 2024, single PI.

Project Goals

This project seeks to develop a new generation of data-driven, very fast thermal modeling and monitoring techniques, together with smart run-time thermal, power, and reliability management techniques, for commercial multi/many-core processors and chiplet designs by harnessing the latest advances in deep learning and numerical methods. The project capitalizes on the unique thermal IR imaging system at VSCLAB to measure commercial multi/many-core processors and future chiplet-based system-in-package designs.

This project also leads to the first open-source "Thermal map database for commercial CPU/GPU/TPU multi/many core processors".

 

Project Highlights 


ThermGAN results
Performance of the proposed ThermGAN and comparison on an Intel i7-8650U CPU (4 cores)

 

ThermTransformer highlight
Results of the proposed ThermTransformer and comparison on an AMD R7-4800U 4-core CPU

Publications

 

Journal publications

  1. S. Sadiqbatcha, J. Zhang, H. Zhao, H. Amrouch, J. Henkel, S. X.-D. Tan, “Post-silicon heat-source identification and machine-learning-based thermal modeling using infrared thermal imaging”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), July 2020, 10.1109/TCAD.2020.3007541
  2. S. Sadiqbatcha, J. Zhang, H. Amrouch and S. X.-D. Tan, “Real-time full-chip thermal tracking: a post-silicon, machine learning perspective”, IEEE Transactions on Computers (TC), June 2021, 10.1109/TC.2021.3086112
  3. J. Zhang, S. Sadiqbatcha, M. O’Dea, H. Amrouch and S. X.-D. Tan, “Full-chip power density and thermal map characterization for commercial microprocessors under heat sink cooling”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 41, no. 5, pp. 1453-1466, May 2022, 10.1109/TCAD.2021.3088081
  4. L. Chen, S. Sadiqbatcha, H. Amrouch and S. X.-D. Tan, “Electrothermal simulation and optimal design of thermoelectric cooler using analytic approach”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 41, no. 9, pp. 3066-3077, 2022, 10.1109/TCAD.2021.3120533
  5. J. Zhang, S. Sadiqbatcha, L. Chen, C. Thi, S. Sachdeva, H. Amrouch and S. X.-D. Tan, “Hot-spot aware thermoelectric array based cooling for multicore processors”, Integration, vol. 89, pp. 73-82, 2023, https://doi.org/10.1016/j.vlsi.2022.11.006
  6. J. Zhang, S. Sadiqbatcha and S. X.-D. Tan, “Hot-Trim: thermal and reliability management for commercial multi-core processors considering workload dependent hot spots”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 42, no. 7, pp. 2290-2302, July 2023, doi: 10.1109/TCAD.2022.3216552
  7. L. Chen, W. Jin, J. Zhang and S. X.-D. Tan, “Thermoelectric cooler modeling and optimization via surrogate modeling using implicit physics-constrained neural networks”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 42, no. 11, Nov. 2023, 10.1109/TCAD.2023.3269385

Conference publications

  1. S. Sadiqbatcha, H. Zhao, H. Amrouch, J. Henkel and S. X.-D. Tan, “Hot spot identification and system parameterized thermal modeling for multi-core processors through infrared thermal imaging”, Proc. Design, Automation and Test in Europe (DATE’19), Florence, Italy, March 2019.
  2. Z. Sun, H. Zhou, and S. X.-D. Tan, “Dynamic reliability management for multi-core processor based on deep reinforcement learning”, International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD’19), Lausanne, Switzerland, July 2019.
  3. S. Sadiqbatcha, Y. Zhao, J. Zhang, H. Amrouch, J. Henkel and S. X.-D. Tan, “Machine learning based online full-chip heatmap estimation”, Proc. Asia South Pacific Design Automation Conference (ASP-DAC’20), Beijing, China, Jan. 2020. (35% acceptance rate)
  4. J. Zhang, S. Sadiqbatcha, W. Jin and S. X.-D. Tan, “Accurate power density map estimation for commercial multi-core microprocessors”, Proc. Design, Automation and Test in Europe (DATE’20), Grenoble, France, March 2020. (26% acceptance rate)
  5. S. Yu, H. Zhou, H. Amrouch, J. Henkel, S. X.-D. Tan, “Run-time accuracy reconfigurable stochastic computing for dynamic reliability and power management: work-in-progress”, Proc. International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES’20), ESWeek 2020, Sept. 2020.
  6. W. Jin, S. Sadiqbatcha, J. Zhang and S. X.-D. Tan, “Full-chip thermal map estimation for multi-core commercial CPUs with generative adversarial learning”, Proc. IEEE/ACM International Conf. on Computer-Aided Design (ICCAD’20), San Diego, CA, Nov. 2020. (invited), https://doi.org/10.1145/3400302.3415764
  7. J. Zhang, S. Sadiqbatcha, Y. Gao, M. O’Dea, N. Yu, and S. X.-D. Tan, “HAT-DRL: Hotspot-Aware Task Mapping for Lifetime Improvement of Multicore System using Deep Reinforcement Learning”, Proc. 2nd IEEE/ACM Workshop on Machine Learning for CAD (MLCAD’20), Virtual Event, Nov. 2020, doi: 10.1145/3380446.3430623.
  8. L. Chen, W. Jin and S. X.-D. Tan, “Fast thermal analysis for chiplet design based on graph convolution networks”, Proc. Asia South Pacific Design Automation Conference (ASP-DAC’22), virtual, Jan. 2022. (invited), doi: 10.1109/ASP-DAC52403.2022.9712583
  9. J. Lu, J. Zhang, W. Jin, S. Sachdeva and S. X.-D. Tan, “Learning based spatial power characterization and full-chip power estimation for commercial TPUs”, Proc. Asia South Pacific Design Automation Conference (ASP-DAC’23), Japan, Jan. 2023. (invited), https://doi.org/10.1145/3566097.3568347
  10. J. Lu, J. Zhang and S. X.-D. Tan, “Real-time thermal map estimation for AMD multi-core CPUs using transformer”, Proc. IEEE/ACM International Conf. on Computer-Aided Design (ICCAD’23), San Francisco, CA, Nov. 2023, 10.1109/ICCAD57390.2023.10323817
  11. L. Chen, J. Lu, W. Jin and S. X.-D. Tan, “Fast full-chip parametric thermal analysis based on enhanced physics enforced neural networks”, Proc. IEEE/ACM International Conf. on Computer-Aided Design (ICCAD’23), San Francisco, CA, Nov. 2023, 10.1109/ICCAD57390.2023.10323696
  12. J. Lu and S. X.-D. Tan, “Thermal map dataset for commercial multi/many core CPU/GPU/TPU”, Proc. of 6th ACM/IEEE International Symposium on Machine Learning for CAD (MLCAD’24), Snowbird, Utah, Sept. 2024, https://doi.org/10.1145/3670474.36859