Research opportunities for passionate UW students
Students frequently contact me about projects, so this article outlines the conditions and opportunities for UW students, that is, students currently enrolled in undergraduate or graduate studies at Madison.
The information below aims to provide clarity on the expectations, background, and skills for engaging in research projects under my supervision. While my approach and insights continue to evolve, my experience supervising BSc and MSc students over the past several years has given me a solid understanding of what works best.
If you are passionate about machine learning and eager to contribute to cutting-edge research, please read on to determine if this aligns with your academic and professional goals.
A. Do you work with BSc/MSc students?
B. What background should I have?
C. Beyond technical skills, what are your expectations?
D. How does the project evolve?
E. How do I get in touch to start a project?
F. Alright, I am interested. What should the email contain?
G. Indicative Ideas and Directions
A. Do you work with BSc/MSc students?
Yes, I frequently work with BSc and MSc students on research projects. Below is an indicative list of full papers accepted at top-tier conferences such as NeurIPS, ICML, and ICLR, each completed with BSc/MSc students (in reverse chronological order):
- Going beyond compositional generalization, DDPMs can produce zero-shot interpolation, ICML’24. With Justin.
- Learning to Remove Cuts in Integer Linear Programming, ICML’24. With Pol.
- Multilinear Operator Networks, ICLR’24. With Yixin.
- Generalization of Scaled Deep ResNets in the Mean-Field Regime, ICLR’24. With Yihang.
- Maximum Independent Set: Self-Training through Dynamic Programming, NeurIPS’23. With Lorenzo and Lars.
- Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization), NeurIPS’22. With Zhenyu.
- Sound and Complete Verification of Polynomial Networks, NeurIPS’22. With Elias.
- Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study, NeurIPS’22. With Yongtao and Zhenyu.
- The Spectral Bias of Polynomial Neural Networks, ICLR’22. With Moulik.
I have thoroughly enjoyed working with each of these students, and I believe they have found the experience rewarding as well.
Note that the list above is not frequently updated. Please check my publications or Google Scholar for the most up-to-date list.
B. What background should I have?
Each project and student is unique. While curiosity and a willingness to learn quickly are more important than individual grades or specific courses, I have found that the students I collaborate best with typically have the following background:
- Strong Analytical Thinking: This often translates to having completed multiple Math courses, but extends beyond that. For example, the Algorithms course CS 577 is highly recommended.
- ML Courses: Familiarity with the content of UW’s excellent ML and Math courses is beneficial. I am particularly fond of courses such as ECE/CS 532 and ECE 524; at a minimum, an introductory course such as CS 540 is helpful.
- Time to Learn: Research requires time to think and iterate. This is not a 1-hour-a-week alternative to an easy course. If you wish to work with me, please ensure you have the time and willingness to learn.
C. Beyond technical skills, what are your expectations?
While technical skills are essential, they are not the only significant skills required. Based on past experience, here is a partial (and imperfect) list of non-technical skills that are crucial:
- Curiosity: This is the most important skill in my opinion. Although most projects start with an end goal, we often drive the project forward guided by the student’s curiosity to explore a particular technique or component. Zora Neale Hurston summarized this best: “Research is formalized curiosity. It is poking and prying with a purpose.”
- Proactive Attitude: This is as important as curiosity. My weekly schedule is full of obligations, including teaching, committees, university service, service to the ML community, and organizing ML events. I therefore encourage students to be proactive and to move quickly from ideation to testing. Can we test this (crazy) idea today?
- Communication: Effective communication is key. You will lead in conducting experiments and deriving proofs, so your intuition on what to check next is important. Clear communication of your results and intuition is crucial for understanding and debating the next steps.
- Creativity: Research is not like undergraduate courses where there is a known solution to assignments. It requires creativity and thinking outside the box. Expect a curiosity-driven path where you will need to think and act creatively.
D. How does the project evolve?
Every project is unique in terms of its technical content, but here is the general outlook:
- If you have a project you are passionate about, that is a great starting point. Focusing on what excites you will likely result in interesting outcomes and fast progress.
- We will meet weekly, either in a reading group or one-on-one, to discuss your progress. Developing intuition about what to try next or identifying the current limiting factor is crucial.
- I prefer progress-oriented projects: we track how far the project has advanced rather than the number of hours worked. This lets us address the most limiting factors quickly.
E. How do I get in touch to start a project?
- If you have studied this entire article and it resonates with you, that is a very positive sign.
- Due to the high volume of emails I receive, I recommend sending a brief but targeted email explaining why you would like to work with me and which expectations excite you the most; see the next question (F) for what the email should contain.
- If you are taking one of my courses this semester, it might be worthwhile to wait until the end of the semester to start the project. This way, you will gain experience from the course and we can interact during the course to see if you are interested in my perspective.
F. Alright, I am interested. What should the email contain?
Please ensure you have read the previous question (E). If you want to get in touch, follow the guidelines below before sending the email. This applies only to currently enrolled UW-Madison students.
Title: [Project] Interest in UW Project - [Your Name]
Content:
- Year of study: 1st, 2nd, 3rd, …
- Home department: Math/CS/ECE/…
- Timeline for project start: ASAP, in 1 month, next semester, etc.
- Independent study: Yes/No. [An independent study is often the best way to conduct your project]
- [Optional] Project idea: … [If you have a concrete idea, please mention it. If not, consult some indicative ideas below]
Once I receive your information, I will likely respond with a few preliminary steps to assess alignment in terms of research interests and working style. If I do not respond within a week, please do not be disheartened. I receive many emails, and sometimes I might miss an email from a very strong student.
G. Indicative Ideas and Directions
There are numerous exciting research ideas, but here are a few that particularly intrigue me at the moment:
- Training Dynamics of Multilinear Networks
Our ICLR’24 paper demonstrated that multilinear networks can perform on par with vision transformers on standard image recognition benchmarks, which are traditionally used for prototyping architectures. This raises several intriguing questions: How do the training dynamics of these networks differ from standard transformers? The premise of these networks is built upon low-rank assumptions on the polynomial expansion, but the effect of the rank on the training dynamics remains understudied.
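To make the low-rank premise concrete, here is a minimal NumPy sketch of a single degree-2 multilinear block. It is an illustration only, not the architecture from the ICLR’24 paper: the names `A`, `B`, `C`, `rank`, and both helper functions are assumptions introduced for this example. The block multiplies two linear projections elementwise (a Hadamard product) and shows that this equals a full quadratic form whose third-order weight tensor is constrained to a rank-`rank` factorization.

```python
import numpy as np

def multilinear_layer(x, A, B, C):
    """Degree-2 block: elementwise product of two linear projections, then a linear map."""
    return C @ ((A @ x) * (B @ x))

def full_quadratic(x, W):
    """The same kind of map with an explicit weight tensor W[o, i, j]."""
    return np.einsum("oij,i,j->o", W, x, x)

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
d_in, rank, d_out = 4, 3, 2
A = rng.standard_normal((rank, d_in))
B = rng.standard_normal((rank, d_in))
C = rng.standard_normal((d_out, rank))
x = rng.standard_normal(d_in)

# Build the full tensor implied by the factors:
# W[o, i, j] = sum_r C[o, r] * A[r, i] * B[r, j]
W = np.einsum("or,ri,rj->oij", C, A, B)

# Both computations agree; the factored form never materializes W.
print(np.allclose(multilinear_layer(x, A, B, C), full_quadratic(x, W)))

# Parameter counts of the factored and the unconstrained form.
print(A.size + B.size + C.size, W.size)
```

In this toy setting, `rank` is the knob whose effect on training dynamics the question above refers to: a smaller `rank` restricts the set of quadratic maps the block can represent, while a larger one approaches the unconstrained tensor.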
- Synthesizing Out-of-Distribution Examples with Generative Models
Generative models have recently shown impressive results in both images and text. Trained on large-scale datasets, the quality and quantity of synthesized examples are remarkable. However, a key question arises: How much of this success can be attributed to the architecture or learning objectives of the model? Our ICML’24 work demonstrated that interpolation between extreme attributes can be performed by existing diffusion models under specific learning objectives. But how does this extend to out-of-distribution examples?
If you have any further questions or feedback on the article, please feel free to get in touch. I am happy to refine my answers if they would be beneficial to a broader audience.