
Date:

Time: 1:00pm ET (11:00am MT)

Link to recording

Passcode: 



Attendees

Attendees: Mark Lacy

Regrets:

Notified:







Agenda/Notes

Presentation from Adele's data science students.

Team: Arnav Boppudi, Ryan Lipps, Noah McIntire, Kaleigh O’Hara, Brendan Puglisi; Faculty mentor: Antonios Mamalakis

Title: Optimizing the ALMA Research Proposal Process with Machine Learning

Abstract: Every year, astronomers from around the world submit research proposals to the Atacama Large Millimeter/submillimeter Array (ALMA), the largest radio telescope array in the world. The aim of the current work is to streamline the proposal process for astronomers submitting projects to ALMA by suggesting frequency ranges that may be relevant to their research, based on the text of their proposal. We introduce a pipeline of supervised and unsupervised machine learning models, each using a different representation of the title and abstract of an incoming proposal. First, a logistic regression model filters out proposed projects that are not expected to need a specific technical setup. Second, if a technical setup is deemed necessary, the pipeline assigns the incoming project to one of 50 "similar project" groups, defined by topics generated with Latent Dirichlet Allocation (LDA). Third, for each of the 50 "similar project" groups, we apply Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to mine patterns in the measurements ("areas of interest") made by previous projects. In parallel with the topic modeling and HDBSCAN mining, a Multinomial Naive Bayes classifier predicts the frequency band in which a project is expected to make measurements, where a band is one of the broad frequency ranges defined by ALMA's technical limitations. Finally, we offer researchers the list of mined "areas of interest", filtered by the band predicted by the Multinomial Naive Bayes classifier. Given a proposed project's title and abstract, the pipeline thus generates several recommended "areas of interest" that the proposer should consider measuring in.
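The last two stages of the pipeline in the abstract (band prediction with Multinomial Naive Bayes, then filtering the mined "areas of interest" by the predicted band) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the team's implementation: the tokenizer, the toy training proposals, the representation of an "area of interest" as a (low, high) frequency interval in GHz, and the helper name `filter_areas` are all hypothetical. The band edges used are the approximate ALMA Band 3 (84-116 GHz) and Band 7 (275-373 GHz) ranges.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, trim punctuation.
    words = (w.strip(".,;:()\"'").lower() for w in text.split())
    return [w for w in words if w]


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_posteriors(self, doc):
        # Unnormalized log P(class | doc); words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)  # log prior
            denom = self.total_words[c] + self.alpha * v
            for t in tokens:
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_posteriors(doc)
        return max(scores, key=scores.get)


def filter_areas(areas, band_ranges, band):
    # Hypothetical helper: keep (low, high) GHz intervals lying entirely inside the band.
    lo, hi = band_ranges[band]
    return [(a, b) for a, b in areas if lo <= a and b <= hi]


# Approximate ALMA band edges in GHz (check the ALMA Technical Handbook).
BAND_RANGES = {"Band 3": (84.0, 116.0), "Band 7": (275.0, 373.0)}

# Toy, made-up training proposals, each labeled with a band.
train_docs = [
    "CO 1-0 emission in nearby galaxies",
    "dense gas tracers HCN in galaxies",
    "dust continuum in protoplanetary disks",
    "submillimeter dust polarization in disks",
]
train_labels = ["Band 3", "Band 3", "Band 7", "Band 7"]

model = MultinomialNB().fit(train_docs, train_labels)

# Stand-in for intervals mined by HDBSCAN for the proposal's topic group.
mined_areas = [(86.0, 90.0), (112.0, 116.0), (330.0, 346.0), (200.0, 240.0)]

band = model.predict("Dust in protoplanetary disks")
print(band, filter_areas(mined_areas, BAND_RANGES, band))  # prints: Band 7 [(330.0, 346.0)]
```

In the pipeline as described, the band prediction runs alongside the LDA and HDBSCAN stages, which would supply the mined intervals; here they are hard-coded. Working in log space avoids numerical underflow when a proposal contains many words.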

GPUs for Charlottesville (and for general use in Socorro)?

Meeting: "Modeling of Interferometric Data" workshop, Charlottesville, May 28-31