Student Research and Internship Projects

Project 1: Information management for large-scale interdisciplinary research programs.

Large-scale interdisciplinary science programs generate and use vast amounts of data and information. Open-access repositories have been developed to help projects organize, manage, and share that data and information. In this project, students will review and compare several repositories, examining their options, features, and analytics, and will build a prototype to determine the best option for information and data management, organization, and dissemination.

Project 2: Serverless computing for science. 

Serverless computing is an easily scalable, cost-effective model of cloud infrastructure that lets researchers and enterprises adopt cloud services while focusing their time and resources on writing, deploying, and optimizing code, without the burden of provisioning or managing server instances. In this project, students will learn how to architect and deploy proof-of-concept cloud solutions that meet the data analysis needs of science. Students will build infrastructure and deploy services such as Amazon Redshift and similar offerings.

Project 3: Science Communication and Social Media

Science communication via social media plays an important role in facilitating public understanding of science at NSF major facilities. Student interns will learn about science communication messaging, science storytelling, social media strategies, platform analytics monitoring, and audience engagement. The expertise in social media and communication strategy that interns develop will be valuable for future careers in science communication, and in public relations or advertising in other fields and industries.

Project 4: Reproducing Machine Learning Workflows

This project explores reproducibility and workflows: can you reproduce machine learning workflows on the Chameleon cloud? Students will learn about clouds, workflows, and the development of reproducible artifacts. They will create Jupyter notebooks and run machine learning workflows that, for example, examine lung images or detect proper and improper use of face masks. Students' work will be published in an online archive and will be citable by others.

Project 5: Image processing workflows: Harnessing edge to cloud cyberinfrastructure.

Modern applications often require faster response times and more stringent privacy constraints. To meet such requirements, edge computing has emerged as a useful method of computation that complements traditional cloud computing. The goal of this project is to analyze the execution of scientific workflows that incorporate image processing and machine learning components when they run on edge-cloud infrastructures. Students will 1) review existing publications on scientific workflows, edge computing, and computer vision at the edge, 2) recreate findings from the literature, and 3) gain a good understanding of the work being done in this space.

Project 6: Scalable cloud solution for real-time data acquisition. 

NSF Major Facilities such as NEON and SAGE/GAGE collect data in real time, and that data must then be preprocessed. Scalable, dynamic cloud services such as AWS Lambda can help automate data acquisition and preprocessing. A student engaged in this project will be exposed to public cloud infrastructures such as AWS, Azure, and Google Cloud, and will learn how public cloud services facilitate scalable and dynamic data acquisition, preprocessing, storage, and sharing.
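
As a rough illustration of the kind of preprocessing step such a service might run, the sketch below follows the handler(event, context) convention that AWS Lambda uses for Python functions. The event layout (a "records" list of sensor readings) and the cleaning rule (drop readings with a missing or non-numeric value) are assumptions made for this example and do not describe any facility's actual data format.

```python
import json


def preprocess(record):
    """Return a cleaned reading, or None if the value is missing or not numeric.

    The "sensor" and "value" field names are hypothetical.
    """
    try:
        value = float(record.get("value"))
    except (TypeError, ValueError):
        return None
    return {"sensor": record.get("sensor"), "value": value}


def handler(event, context):
    """Lambda-style entry point: clean every record found in the incoming event."""
    cleaned = []
    for record in event.get("records", []):
        result = preprocess(record)
        if result is not None:
            cleaned.append(result)
    body = {"kept": len(cleaned), "records": cleaned}
    return {"statusCode": 200, "body": json.dumps(body)}
```

Because the handler is an ordinary function, it can be exercised locally with a hand-built event before any cloud deployment, for example handler({"records": [{"sensor": "s1", "value": "1.5"}]}, None).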

Project 7: FAIR data principles in practice.

One well-known issue within the research data community is the impedance mismatch between the systems used to capture and analyze digital research data and the systems used to archive it for storage and reuse. Not only are different machines and software used for these two tasks, but storage and reuse systems are also optimized for different needs, such as cost and longevity. One promise of the “Findable, Accessible, Interoperable, Reusable” (FAIR) data movement is to enable machine agents to assist humans in connecting data with the systems that capture and store it. Given the rise of assistive agents in commercial applications, such as Microsoft's GitHub Copilot for computer code generation through controlled natural language, can these principles be applied to create “assistive research agents” that lower the technical expertise researchers need to generate archival-quality data artifacts? The student will explore how commercial equipment, tools, and software can be adapted to serve the needs of scholarship with respect to data transport and archiving.

Project 8: Data quality assurance for major facilities. 

Understanding which data has failed, or has been marked suspicious by, quality assurance checks on instrument data is vital to determining the cause of the problem. In this project, students will learn the processes used for quality assurance checks on instrument data, examine the quality assurance results, compile basic statistics on the failed and suspicious data, describe the metadata of the data examined, determine the percentage of the data that has been tested, and present the results graphically. Familiarity with file access, databases, a programming language (Python or Java), and Jupyter would be beneficial for this project.
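
The basic statistics described above could be computed along the following lines. This is a minimal sketch: the flag values "pass", "fail", and "suspect", and the use of None for data that has not been tested, are assumptions made for illustration and not the labels of any particular facility.

```python
from collections import Counter


def summarize_qa(flags):
    """Summarize quality assurance flags for a batch of measurements.

    flags: one entry per measurement; None means the measurement was not tested.
    """
    total = len(flags)
    tested = [flag for flag in flags if flag is not None]
    counts = Counter(tested)
    # Measurements that failed or were marked suspicious (hypothetical labels).
    flagged = counts["fail"] + counts["suspect"]
    return {
        "total": total,
        "tested": len(tested),
        "pct_tested": 100.0 * len(tested) / total if total else 0.0,
        "counts": dict(counts),
        "pct_flagged_of_tested": 100.0 * flagged / len(tested) if tested else 0.0,
    }
```

The resulting dictionary can be passed to a plotting library in a Jupyter notebook to represent the results graphically.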

Project 9: Enabling cloud resources for the climate modeling community.

The purpose of this project is to simplify use of the cloud for the climate-modeling community by separating the vendor-specific back-end cyberinfrastructure from a custom front-end API that provisions resources automatically. The primary outcome is for the student to learn how to use cloud capabilities, tie multiple capabilities together, and provide services to the user community. Another potential outcome is the successful creation of a cloud-facing API. No prior knowledge is required; the complexity of the tasks will be adapted to the student's skills and experience. However, an understanding of cloud APIs and containers would be beneficial.
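
One possible way to picture the separation between a front-end API and vendor-specific back ends is sketched below. Every name here (ResourceRequest, CloudBackend, InMemoryBackend, provision) is hypothetical and chosen only for this example. A real back end would call a vendor's SDK where the in-memory stand-in simply records the request.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRequest:
    """Vendor-neutral description of what a modeler asks for."""
    cores: int
    memory_gb: int


class CloudBackend(ABC):
    """Interface that each vendor-specific back end would implement."""

    @abstractmethod
    def create_instance(self, request: ResourceRequest) -> str:
        """Provision a resource and return its identifier."""


class InMemoryBackend(CloudBackend):
    """Stand-in back end for local experiments; it makes no cloud calls."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.created = []

    def create_instance(self, request: ResourceRequest) -> str:
        instance_id = f"{self.prefix}-{len(self.created) + 1}"
        self.created.append((instance_id, request))
        return instance_id


def provision(backend: CloudBackend, cores: int, memory_gb: int) -> str:
    """Front-end entry point: validate the request, then delegate to the back end."""
    if cores < 1 or memory_gb < 1:
        raise ValueError("cores and memory_gb must be positive")
    return backend.create_instance(ResourceRequest(cores, memory_gb))
```

With this shape, user-facing code calls only provision, so swapping one vendor for another means supplying a different CloudBackend implementation.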

Project 10: Cloud Application Testing

This project would test application code across a variety of cloud-based hardware in an automated fashion in order to evaluate new hardware, software, and code changes. The motivation is that hardware is evolving rapidly, and significant work lies ahead in evaluating programming paradigms, compiler support for those paradigms, and the capabilities of the hardware itself. A cyberinfrastructure tool that can be pointed at a code base, run tests across several configurations, and collect and summarize the results would be immensely helpful.

This project would involve understanding how to abstract the specifics of the application code away from the infrastructure that runs it (focus: cyberinfrastructure code design); the requirements for deploying to different environments (focus: cloud APIs); monitoring, error checking, and shutdown; and an end-user focus on delivering the desired results from the cyberinfrastructure tool. This would enable developers to get rapid feedback on how their code changes affect performance. The expected outcome would be a successful run of a small configuration, with more configurations (such as GPU systems and different compilers) to follow.
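
The idea of running one piece of code across several configurations and summarizing the results can be sketched as follows. This is only an illustration under stated assumptions: run_matrix, summarize, and the axis names in the usage note are hypothetical, and run_case stands in for whatever would actually deploy and execute the application on a given configuration.

```python
import itertools
import time


def run_matrix(run_case, axes):
    """Call run_case once for every combination of the configuration axes.

    axes: dict mapping an axis name to the list of values to try.
    run_case: callable taking one configuration dict; it raises on failure.
    """
    names = sorted(axes)
    results = []
    for values in itertools.product(*(axes[name] for name in names)):
        config = dict(zip(names, values))
        start = time.perf_counter()
        try:
            run_case(config)
            status = "ok"
        except Exception as exc:  # record the error and continue with the next case
            status = f"error: {exc}"
        elapsed = time.perf_counter() - start
        results.append({"config": config, "status": status, "seconds": elapsed})
    return results


def summarize(results):
    """Count how many configurations succeeded and how many failed."""
    ok = sum(1 for result in results if result["status"] == "ok")
    return {"total": len(results), "ok": ok, "failed": len(results) - ok}
```

For example, axes = {"hardware": ["cpu", "gpu"], "compiler": ["a", "b"]} would produce four runs, and the per-run timings give developers a first view of how a code change affects performance on each configuration.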