James Green: Research is being challenged by access to High-Performance Computing, data sets, and new technologies

James Green is a distinguished professor in the Department of Systems and Computer Engineering at Carleton University in Ottawa, Canada. His research focuses on bioinformatics, proteomics, and the prediction of protein structure, function, interaction, and post-translational modification, among other areas. Computing for Humanity (CFH) had the great pleasure of interviewing him as part of our series with researchers who are using our High-Performance Computing resources.

His passion as a machine learning researcher looking for answers to new challenges is contagious. Much of his research is interdisciplinary, and he has learned that the success of a project hinges on finding the right collaborator, one who is willing to stick their neck out and work with somebody with complementary expertise. His view that everyone in the lab is an endless learner is part of his success as a mentor.

We talked about his career, his challenges, and the projects that would not have been possible without Computing for Humanity. He offers stellar advice for researchers, and he is a great example of hard work and determination for anyone who wants to be more successful in their research career.

At what moment in your life did you decide to become a researcher?

In my undergraduate studies, I took co-op and tried a variety of jobs. I found that the research-focused jobs were the most exciting and rewarding. In one job, I was trying to model oil spills floating in the ocean, and that was fantastic. This idea that I could use my software skills to advance research on a practical and important problem really inspired me to develop those research and software skills.

What is your proudest moment as a researcher?

I think I've entered the point in my career where I am most proud of my students’ accomplishments. When one of my graduate students makes a discovery, publishes a paper, earns an award or scholarship, that gives me enormous satisfaction.

I'm proud of the work that we do in the lab, and my job is as much that of a mentor as that of a researcher, so I try to foster research excellence in the lab.

What is the most significant challenge that research scientists face today?

Now I'm working in the area of machine learning for biomedical informatics so one of the challenges that we face is access to data. It’s a complex process to start collecting clinical data from patients. You have to build meaningful collaborations with not only medical doctors but also nursing staff and clinical engineers in order to get valuable data. When considering both research ethics approval and Health Canada approval to test a new medical device, and then start recruiting patients, it takes about two to three years to get a complete data set. Only once I have the data can I start analyzing it, extracting value from it, getting publications from it. It's a little hard to get the funding for the project upfront to enable this type of blue-sky research. The research papers will come, three years later.

Working in machine learning, there is all this hype and excitement over emerging technologies, such as large language models. It's difficult for me to keep up with what's real. Where is the state of the art actually at? How good is it? Is it safely applicable to actual problems yet? Can we trust it?

When I speak with clinicians, they have seen all these great claims that, you know, ChatGPT can do anything, and now they want to use it to cure cancer. Well, that's far too large a problem, right?

The large language models aren't actually at the stage where we can necessarily trust what they generate. So again, separating the hype from the reality is a challenge and I guess that's really the responsibility of people such as myself working in the field.

And because technology changes every day, how do you stay up to date with emerging scientific research trends?

No one can read all the new papers every day. Luckily, I spend a lot of my time interacting with very bright graduate students. Each one of them has their niche of expertise, and they do a lot of reading, helping me discover the most exciting new papers. In addition to the research literature, I also try to get a sense of how emerging technologies are being portrayed to non-experts in the media.

How do you plan and prioritize your research projects?

I am looking for opportunities to collect new and unique data tailored to research problems where there are opportunities for real impact.

Data that enables real machine learning at scale. Often, we're trying to predict things that are very rare, such as rare diseases or rare conditions. That's a much more difficult thing to tackle with machine learning.

If you just apply a machine learning algorithm to a data set where only 1% of the people have the disease, then the machine learning algorithm will very quickly provide you with a model that's almost perfect: It simply says that everybody's healthy.

And so, judged by its performance metrics, it is almost always right. And of course, it's perfectly useless, right? So those types of problems are interesting: trying to force the model to do the harder problem of finding the “needle in the haystack”.
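Editor's illustration: the effect Professor Green describes can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not code from his lab. The 1,000-person cohort, the `evaluate` helper, and the choice of recall and balanced accuracy as the complementary metrics are all hypothetical choices made for illustration; only the 1% prevalence comes from the interview.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, recall, specificity, and balanced accuracy.

    Labels are 1 for "has the disease" and 0 for "healthy".
    (Hypothetical helper written for this illustration.)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sick, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sick, missed

    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Assumed cohort: 1,000 people, 1% of whom have the disease.
y_true = [1] * 10 + [0] * 990

# The trivial "model" that says everybody is healthy.
always_healthy = [0] * len(y_true)

metrics = evaluate(y_true, always_healthy)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the trivial model scores 99% accuracy while its recall is zero, meaning it finds none of the ten sick patients. Metrics that weigh the rare class separately, such as recall or balanced accuracy, expose the failure that plain accuracy hides.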

When did you first hear about our charity, Computing for Humanity?

Roy Chartier, the founder of your organization, reached out to me back in 2020. He explained that, at the time, your mission was to take donated high-performance computing equipment, find a host willing to provide power, network, and space, and then connect with researchers looking for high-performance computing capacity.

It was at that point where our research interests overlapped, when our own lab had also pivoted to focusing on COVID research.

That's when we worked with your organization for several months to leverage the compute resources that you made available to us.

Without the help of our charity, which projects would not have become a reality?

I'll say the principal project that benefited from the compute resources was the exploration of all possible protein interactions between SARS-CoV-2 and human proteins. We were predicting hundreds of thousands of possible inter-species protein combinations, and such an analysis would simply not be possible without high-performance computing. We do have access to some high-performance computing through the national consortia. However, the basic allocation is often insufficient, and if you want a special allocation, you have to apply a year in advance. I didn't know COVID was coming a year in advance...

So, being able to turn to Computing for Humanity, which could very rapidly make significant resources available, allowed us to respond very quickly to an emerging challenge.

The results of this research, which Computing for Humanity supported with high-performance computing resources, are described in the paper “Multi-schema computational prediction of the comprehensive SARS-CoV-2 vs. human interactome”.

What are your long-term career goals?

Keep having fun doing interesting and impactful research and mentoring. I've got a lab full of really great and enthusiastic graduate students that I'm happy to work with. I'm working with a number of clinical partners across biomedical informatics and I'm happy to develop machine learning to try to solve those problems and achieve impact.

What is it like to work with researchers from different backgrounds?

What I find important for successful interdisciplinary collaboration is patience and humility.

The trick is to find the right collaborator that's willing to stick their neck out and work with somebody with complementary expertise. One must set their ego aside and admit that they have very little expertise in their collaborator’s domain.

Over time, I teach them a little bit about machine learning: What is possible? What approaches might be relevant? What kind of training and testing data are required? How should we measure success in the problem? Conversely, they take the time to teach me about the clinical application area. What is the problem that they want solved? How do we achieve impact? How should we measure clinical impact, not just getting a paper?

Have you experienced any delay when you connect with our network?

Not that I recall.

Would you like to share a few words with researchers who would like to access computing time with CFH?

We enjoyed working with your organization because the people that we were interfacing with were equally enthusiastic about research and about solving these important problems; it wasn't just a technician doing their job. These are passionate volunteers and experts who shared our excitement for discovery and were happy to get us going, whatever it took. So that was a positive experience.
