Latest Posts
For my posts before 2022, please visit https://forrestbao.blogspot.com
-
How to do `datasets.Dataset.map()` with `rank` on multiple GPUs
I just spent 2 hours following the official example of HuggingFace’s `datasets` library to do `datasets.Dataset.map()` with `rank` on multiple GPUs. It did not work, as many others have complained on the Internet: datasets issue #6186, datasets PR #6415, and datasets PR #6550. Many wanted complete working code.
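For context, the pattern in question looks roughly like the sketch below. This is only a minimal sketch, assuming a placeholder `torch.nn.Linear` model, a hypothetical dataset name, and a made-up `features` column; the essential pieces are `with_rank=True`, `num_proc` set to the number of GPUs, and the `spawn` start method. The full post has the details of what did and did not work.
```python
import torch
from multiprocess import set_start_method  # `datasets` uses multiprocess, not the stdlib module
from datasets import load_dataset

# Placeholder model; substitute whatever torch.nn.Module you actually run.
model = torch.nn.Linear(4, 2)

def gpu_computation(batch, rank):
    # Each worker process receives a rank; pin it to one GPU.
    device = f"cuda:{(rank or 0) % torch.cuda.device_count()}"
    model.to(device)
    x = torch.tensor(batch["features"], device=device)  # "features" is a made-up column
    with torch.no_grad():
        batch["logits"] = model(x).cpu().tolist()
    return batch

if __name__ == "__main__":
    set_start_method("spawn")  # CUDA contexts do not survive fork
    ds = load_dataset("some/dataset", split="train")  # hypothetical dataset name
    ds = ds.map(
        gpu_computation,
        batched=True,
        with_rank=True,                      # pass the worker's rank into the function
        num_proc=torch.cuda.device_count(),  # one worker per GPU
    )
```
-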
The unscalable Jupyter - Part I: scrolling up and down
(This is the first post of my series of thoughts on the limitations of Jupyter notebooks, the tool that every AI/ML/Data person uses.)
-
The post-2010 solution for connecting to WiFi from the command line on Linux via nmcli
First, forget about solutions using `iwconfig` (which does not support WPA) or `wpa_supplicant`. They are outdated.
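For a taste of nmcli, the sketch below scans for networks and connects to one; the SSID and password are placeholders, and the Python `subprocess` wrapper is only there for illustration.
```python
import subprocess

# nmcli talks to NetworkManager, which handles WPA/WPA2 for you.
# List the WiFi networks the wireless device can see.
subprocess.run(["nmcli", "device", "wifi", "list"], check=True)

# Connect to one of them; "MyNetwork" and "MyPassword" are placeholders.
subprocess.run(
    ["nmcli", "device", "wifi", "connect", "MyNetwork", "password", "MyPassword"],
    check=True,
)
```
-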
How to communicate with a Jupyter server
I have begun my journey to hack into Jupyter. The first step is to understand how to send code to a Jupyter kernel and get the results back.
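The message flow is the key idea: code goes out as an execute request on the kernel's shell channel, and outputs come back on the IOPub channel. Below is a minimal sketch of that flow using `jupyter_client` against a locally started kernel (rather than a remote Jupyter server), assuming a `python3` kernel is installed.
```python
from queue import Empty
from jupyter_client import KernelManager

# Start a local kernel and connect a blocking client to its channels.
km = KernelManager(kernel_name="python3")
km.start_kernel()
kc = km.client()
kc.start_channels()
kc.wait_for_ready(timeout=30)

msg_id = kc.execute("1 + 1")  # execute_request on the shell channel

# Read results off the IOPub channel until the kernel goes idle.
while True:
    try:
        msg = kc.get_iopub_msg(timeout=5)
    except Empty:
        break
    if msg["parent_header"].get("msg_id") != msg_id:
        continue  # traffic from some other request
    if msg["msg_type"] == "execute_result":
        print(msg["content"]["data"]["text/plain"])  # -> "2"
    elif msg["msg_type"] == "status" and msg["content"]["execution_state"] == "idle":
        break

kc.stop_channels()
km.shutdown_kernel()
```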
-
Question generation and its evaluation
General info
- The input of a question generator can contain the answer (e.g., a span in the input document) or not. The former is answer-aware while the latter is answer-agnostic, or without answer supervision. The answer-agnostic case is similar to summarization.
- Shallow vs. deep QG: A question is shallow, also called low cognitive demand, if its answer can be found in one sentence and/or is explicitly given. Recently, the focus has shifted to multi-hop QG, where the answer can only be obtained by inference over multiple sentences; such questions are considered high cognitive demand (HCD) or deep.
- The output of a question generator can be sentences or multiple-choice questions. In the multiple-choice case, a key challenge is generating good distractors.
-
Finetuning GPT-style Language Models in Lit-GPT
A friend recommended to me Lit-GPT, a library from Lightning AI, for finetuning GPT-style/generative/causal language models. I gave it a try. This blog post is about my experience.
-
Resources on Parameter-efficient Fine-Tuning (PEFT) of Language Models
Blogs & review papers
- https://lightning.ai/pages/community/article/understanding-llama-adapters/ (I highly recommend this blog post, which has good graphics and code to explain the PEFT methods)
- Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
-
Defining commands and titles for a VSCode extension/plug-in
I recently needed to access the commands and titles of VSCode plug-ins/extensions.
-
Jupyter is not for debugging (even in VSCode)
Jupyter is cool. It takes advantage of scripting/interpreting: the variables, functions, and types/classes that you define remain in memory for you to access handily. Jupyter wraps a notebook layer around that experience.
-
Multi-GPU Motherboards
Z690 motherboards for multi-GPU workstations
-
Demystifying PCIe lanes in GPU computing for Deep Learning: Forgetting about AMD Threadrippers and Intel X-series
- I am building a multi-GPU workstation of my own.
- I previously owned an Intel i9 10980XE CPU.
- And I am here to tell you that the 48 PCIe lanes of Intel X-series CPUs and the 128 lanes of AMD Threadripper CPUs are overkill for multi-GPU computers, even though the GPUs can be most effective with 16 PCIe lanes each.
- Just get a regular desktop CPU from Intel or AMD (Intel preferred), paired with a proper chipset.
-
The case against buying a multi-GPU workstation from a vendor
I have a few RTX 3090 GPU cards and a few students working with me. I used to give one 3090 card to each student for running his own experiments on his own computer. Recently, we decided to pool those cards together into one workstation. The solutions from vendors failed to make me happy, so I decided to buy a CPU and a motherboard that can host multiple 3090 cards.