First Semester at CMU!
Hey there! In this post, I will talk about my experience in my very first semester as a Master's in CS student at Carnegie Mellon University (one of the top-ranked CS schools in the world). The article will be an ill-organized dump of all that happened in this crazy semester, from moving to the US for the first time, to choosing the right courses, to the internship search. This article will not only help me retrospect on what went right and what could have gone better, but also help any new students who want that “senior’s” advice on what to do in the first semester :)
Moving to the US
This was my very first time moving out of my country, India, and that too with 46 kg of luggage and a whole load of dreams on my shoulders. No one in my family had previously done a Master's, let alone done it abroad. While I was super excited for this new opportunity, there was also this lurking fear and anxiety from a lot of “what-ifs”. I knew that I was headed to one of the best universities in the world for CS, but I had given up a very good job offer in India, and was going thousands of miles away from my family and loved ones. It’s not always about the logic, but also about the emotions. And emotions do overpower you sometimes. Anywaysss, that’s a lecture for another day. Coming back to the moving-in story…
To make things smooth, we (my flatmates Daksh, Amit, and I) had sorted out housing by May, and had also finished all of the tedious visa formalities well in time. We flew Etihad Airways from Delhi -> Abu Dhabi -> Boston -> Pittsburgh. We arrived a couple of weeks before classes started to make sure we settled in well and figured out any remaining formalities. We collected our IDs from CMU to avail the free buses, and then went around Pittsburgh exploring the city and doing a lot of shopping xD. We had also started cooking, since we were damn sure we weren’t going to be eating out every day, to have mercy on both our gut and our wallet. Meal prep came in real handy, and we definitely got better at cooking through the semester. We would meal-prep 6-7 dishes on weekends, and the food would last through the coming week, which meant we didn’t need to cook daily (not that we had the energy for it anyways 😅). There was a lot of culture shock for me, especially the roads. Where I come from, there is constant honking of vehicles throughout the day and no concept of lanes or traffic signals. Seeing people drive in such a disciplined manner, and constantly spotting Corvettes, Porsches, and Mercs, I was quickly in awe of the road infra here. I plan on practicing driving as much as possible this winter break, so I can go out for some sweet road trips next semester 🛣.
Courses and their reviews
So, moving on to stuff of actual substance. The MS in CS program at CMU requires 108 qualifying units, which translates to 9 standard 12-unit courses. Most of my seniors and our academic advisor suggested taking only 3 courses in the first semester, since it’d help us settle into the workload here and also set time aside for the internship search. Below, I explain the reasoning behind taking each course, along with a short review of what I liked and what didn’t work for me so much.
Distributed Systems (15640)
So, setting some context here: when I got admitted to the MSCS program at CMU, my broad aim was to specialize in ML Systems, whatever that meant. Upon doing some “research”, I found out that ML Systems is actually very broad, and I needed a good grasp of the basics of both ML and Systems before building on that with advanced specialized courses. Taking Dist Sys in the first semester not only satisfied my Systems breadth requirement, but also helped me prepare for internship interviews in case I applied to SWE roles. The course was very breadth-heavy, with a lot of focus on the programming projects/assignments. We had 4 projects in the course, all of them in the Go programming language:
- Implement a key-value database server in Go that handles various operations on the key-value store.
- Implement the Live Sequence Protocol (LSP), a custom protocol providing reliable communication with simple client and server APIs on top of UDP. Then, implement a distributed Bitcoin miner on top of LSP.
- Implement the Raft consensus algorithm in Go, which replicates logs across multiple servers to ensure fault tolerance.
- Implement parts of the backend storage and actor system to achieve consistency in a multiplayer online game.
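To give a flavor of the first project, here is a toy sketch of the core idea in Python (the actual project was a networked server written in Go, and all names here are hypothetical):

```python
import threading

class KVStore:
    """Minimal in-memory key-value store with thread-safe operations.
    A toy stand-in for the course project, which exposed these
    operations over a network API rather than as an in-process class."""

    def __init__(self):
        self._lock = threading.Lock()  # serialize concurrent operations
        self._data = {}

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def delete(self, key):
        with self._lock:
            return self._data.pop(key, None)

store = KVStore()
store.put("user:1", "alice")
print(store.get("user:1"))  # alice
store.delete("user:1")
print(store.get("user:1"))  # None
```

The real server also had to handle many concurrent clients, which is where the lock (and, in Go, channels and goroutines) starts to matter.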
The course covered a lot of topics that you must know if you’re ever going to work with systems involving more than 5 servers or more than 100 users. The course website is linked here, and the slides should be available to the public: 15-440/640 Distributed Systems. I wish I had more time to go through the reading materials in detail, especially the Tanenbaum book. The lectures could definitely have been better, but I found that compensated for by the projects. There were written homeworks too, which served as nice revision material before the midterms. I can confidently say that this course will help you significantly, not just from a learning perspective but also in impressing your interviewers. I clearly remember an interview with a fast-growing startup where they were quite impressed by my knowledge of both ML and Systems; Dist Sys had a huge role to play in my basic grasp of Systems jargon. This course will help you know what terms you need to Google or what resources to refer to when you find yourself designing a scalable system. I got an A+ in this course, despite having somewhat bombed the second midterm. You can score nearly full marks on all projects, homeworks, and participation if you are diligent enough, and these form a hefty chunk of the grade weightage.
Advanced Introduction to ML (PhD) (10715)
This was the course I was most confused about while choosing, just because of the sheer number of Intro to ML courses CMU has to offer: 10601, 10701, and 10715. 715 was supposed to be the most theoretical and advanced, aimed mainly at PhD students in the MLD. I chose 715 over 701 as I already had some background in ML and didn’t want to learn the same topics over again. 715 had bits of learning theory in the first half, and I had heard very good things about the instructor, Nihar Shah. Fun fact: I ran into Nihar at the Abu Dhabi airport while coming to the US.
Coming to the contents, the course started from absolute zero, building intuition about Machine Learning, SVMs, and Learning Theory in the first half, then focusing on recent developments in ML in the second half, covering topics like Neural Networks, State-Space Models, Transformers, RL, and Diffusion Models. The thing I liked most about this course was that Nihar taught in the traditional way, using chalk and blackboard. I don’t know if there’s scientific evidence to back this, but I find myself learning and focusing a whole lot better in this medium, compared to the common slideshow-based teaching. I took notes actively throughout the class, and that definitely helped me retain a lot of information. From an interview perspective, it helped me in all of the ML-related interviews I had, and it definitely helped me analyze ML algorithms with a more intuitive eye. My complaint with all the Intro to ML classes during my undergrad, and with MOOCs, was that they felt a lot like information being thrown at you, rather than something developed intuitively. This course is like 3blue1brown’s series on ML, developed into a full course by one of the best ML departments in the world. The audience was mostly MSML and PhD students, with only a sprinkle of MSCS students (including me), since we were told the course was too theory-heavy; the other courses I took complemented it with their focus on assignments. Here is an old course website: Fall’23 offering. I got an A in this course, thanks to a good comeback in midterm 2 after having bombed midterm 1 heavily. The assignments were a cakewalk, aimed only at giving students some hands-on practice, but I wouldn’t say they are sufficient to make you proficient in writing PyTorch code, as a lot of it is already implemented for you.
I heard bad reviews about the teaching in 10701 (Fall’24) from my peers, but that course was definitely more programming- and assignment-heavy than 715.
ML with Large Datasets (10605)
I had planned to take ML with Large Datasets sometime during my time at CMU, and schedule conflicts meant that the first semester was not a bad time for it. This course could also be useful from a recruitment perspective, given the rising interest in ML Engineers who know how to work with humongous amounts of data. I really enjoyed the teaching in this course by Prof. Virginia Smith. I was fairly active during class interactions, and learnt a lot about the nitty-gritty of how things change when we go from toy benchmark datasets like MNIST or CIFAR to the actual big datasets used in industry. You can no longer run a lot of algorithms, and you fall back on making reasonable assumptions to approximate the results. The latter half of the course covered more modern topics like distributed training of models, low-rank approximations, Federated Learning, etc. I felt that the course was a good mix of depth and breadth. The assignments were on the easier side, but definitely not trivial. There was also a miniproject where we built multiple model pruning algorithms to retain model performance while reducing parameters by more than 90% in a CNN. One of the assignments also required us to do hyperparameter tuning on a GPT-2 model implemented on top of Andrej Karpathy’s llm.c project. I went for the Hyperband approach, which required writing significant infrastructure code to checkpoint and resume training for a large number of configurations. Though my method didn’t yield the best performance in the class (because I didn’t consider the right LR thresholds :-’) ), it was a super learning experience!
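To illustrate the core subroutine behind Hyperband, here is a minimal successive-halving sketch in Python. This is only a toy: the `fake_train` objective and all names are hypothetical stand-ins for the real assignment, which checkpointed and resumed actual GPT-2 training runs.

```python
def successive_halving(configs, train_step, budget=1, eta=2):
    """One bracket of successive halving, the idea at the heart of
    Hyperband: give every config a small training budget, keep the
    best 1/eta fraction, and multiply the budget for the survivors.
    `train_step(cfg, budget)` stands in for "resume training cfg for
    `budget` units and return validation loss"."""
    while len(configs) > 1:
        # score every surviving config at the current budget
        scores = [(train_step(cfg, budget), cfg) for cfg in configs]
        scores.sort(key=lambda s: s[0])  # lower loss is better
        configs = [cfg for _, cfg in scores[: max(1, len(configs) // eta)]]
        budget *= eta  # survivors earn more training time
    return configs[0]

# Toy objective: pretend the "loss" is just the distance of lr from 3e-4.
def fake_train(cfg, budget):
    return abs(cfg["lr"] - 3e-4)

candidates = [{"lr": lr} for lr in (1e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1)]
best = successive_halving(candidates, fake_train)
print(best)  # {'lr': 0.0003}
```

The checkpointing machinery I mentioned is exactly what makes `train_step` cheap to call repeatedly on the same config: each call resumes from the last saved state instead of training from scratch.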
Special Topics: Data Privacy, Memorization and Copyright in GenAI (10799)
This was a totally unplanned course that I happened to learn about from a random poster in the ML Department building. It was taught by Pratyush Maini, a PhD student at MLD CMU advised by Zico Kolter and Zachary Lipton. I had seen some of Pratyush’s work before coming to CMU and was very excited by the prospect of this course. And I wasn’t disappointed. The course was gamified to a large extent, dividing the class into red and blue teams and having them tackle core issues around GenAI memorization and copyright. The course began with a lecture comparing GenAI with the printing press, and how both share a lot of issues while also promising significant positive change in the world. Before coming to CMU, I was always interested in the security aspects of ML models, especially LLMs and Diffusion Models. And this course was exactly what I was looking for. We studied Differential Privacy, data memorization in ML models and its mitigation, unlearning techniques, watermarking, etc. We also got to interact with leading legal experts representing OpenAI in cases related to copyright infringement by ML models, and understand their perspective on the entire situation. The class was small, with fewer than 10 students, which meant we interacted with each other and the instructor far more than usual, and the discussions were awesome!
The assignments were also super cool! First, we red-teamed and blue-teamed a popular Diffusion Model: the red team tried to generate (copyrighted) Pokemon characters, while the blue team tried to prevent their generation. It was a great learning exercise and showed us that this problem has a lot more nuance than we first thought. The second assignment was to implement a watermarking scheme from scratch in a Diffusion Model, while other teams would try to attack the watermark. Again, a great hands-on exercise. Since this was the first offering of the course, the material and the assignments were not very polished, but I personally didn’t mind, given that it led to even more discussion in the classroom. Here is the course website: 10-799 CMU.
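To show the flavor of watermarking in miniature, here is a toy spread-spectrum-style scheme in Python: embed a key-seeded pseudorandom pattern into a signal and detect it by correlation. This is not the scheme from the assignment (which watermarked a Diffusion Model’s outputs); it is just a hypothetical minimal illustration of the embed/detect idea.

```python
import numpy as np

def embed(signal, key, strength=0.1):
    """Add a key-seeded pseudorandom +/-1 pattern to the signal."""
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=signal.shape)
    return signal + strength * pattern

def detect(signal, key, strength=0.1, threshold=0.5):
    """Correlate with the same key's pattern: a watermarked signal
    correlates near `strength`, an unrelated one near zero."""
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=signal.shape)
    score = float(np.mean(signal * pattern))
    return score > threshold * strength

rng = np.random.default_rng(0)
clean = rng.normal(size=10_000)
marked = embed(clean, key=42)
print(detect(marked, key=42))  # True
print(detect(clean, key=42))   # False (with very high probability)
```

The attacks in class were essentially about breaking this correlation without destroying the content, and the defenses about making the pattern survive those transformations.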
Internship Search
Another very important aspect of this first semester was the internship hunt for Summer 2025. It was especially important as I don’t plan to do a PhD after my MS, so securing a good internship is kinda important. I was looking for ML-aligned roles, as I had already done a SWE internship at Rubrik. My main aim was to do an MLE/ML-SDE kind of internship this summer and see how I like it; this will help me make a good choice for my full-time role. Now, the way recruitment worked during my undergrad was that all of it was handled on-campus: companies would conduct coding tests for all students together and then hold interviews. Then a matching would be done between each student’s preferred employers and the students shortlisted by those employers. This was so much easier, as we didn’t have to connect with folks on LinkedIn to ask for referrals, keep searching for openings, or apply to hundreds of jobs.
But in the US, the recruitment process was almost entirely driven by the individual. The university name definitely helped. There was an on-campus career fair, but I wouldn’t say it was too useful; I got a couple of interviews from startups, but for big companies, fairs don’t really help. I started by applying selectively to only MLE roles (around August), but seeing that I was barely getting any OAs or interviews, I started applying to QR and SWE roles at trading firms as well. The thing that was by far the most useful was seeking referrals from seniors of CMU/IITG, or even approaching folks on LinkedIn. Almost all of the interviews I got were from companies I applied to with a referral. I know this can be tiresome, but trust me, it really helps. So start early, approach people on LinkedIn or from your alumni groups (most of them are very kind and willing to help), and do Leetcode! If you haven’t done Data Structures and Algorithms before, make sure you spend time on the Blind 75 or Neetcode 150 lists on Leetcode. After multiple grueling interviews at big tech firms, trading firms, and a Bay Area startup, I will be interning at Apple (Cupertino) in Summer 2025.
Research Activities
Extra Activities and Fun
There weren’t a whole lot of extracurriculars I did here, especially compared to my involvement in clubs during undergrad. However, this was mostly part of the plan: I had done enough extra-curricular and leadership activities during my undergrad, and since I am paying quite a steep fee to attend CMU, I want to make the most of it academically and technically. That being said, I was part of the CMU Explorers group, since I’ve always been into hiking and visiting new places. But the CMUX group is more focused on climbing activities, which I don’t have the physical build or general inclination for. I made a trip to New York City to meet a friend from IITG who was in the US on a work trip. If you asked the sixth-grader version of myself, he would’ve never thought he’d get to visit NYC someday. Sitting on the Brooklyn Bridge at sunset, I was really grateful for everything in my life that led to this. I am lucky, and I intend to exploit that to the fullest! We also tried ice-skating in Central Park, and after sucking at it for the first hour, I started having real fun. I may continue skating here in Pittsburgh at Schenley Park, though I do have commitment issues with everything except relationships :)
Plans for the next semester