Spring 2024 Colloquia
Unless otherwise noted, Spring 2024 colloquia will be held on Thursdays at 12:45 pm in Stanley Thomas 316. All colloquia will be available for in-person attendance as well as remote attendance via Zoom. Current Tulane faculty, staff, and students are encouraged to attend in person. Zoom details will be provided via the announcement listserv, or you may email email@example.com to request the corresponding link. If you would like to receive notifications about upcoming seminars, you can subscribe to the announcement listserv.
Nanoinformatics: Rational Design of Biocompatible Nanomaterials by Nanostructure Annotation, Machine Learning and Data Platform Development
Hao Zhu | Tulane University School of Medicine
Abstract: The use of nanomaterials has grown substantially over the past decade. Traditional discovery of biocompatible nanomaterials is expensive and time-consuming. Computational modeling methods are therefore in high demand for designing new nanomaterials. Here, we report several novel nanoinformatics techniques that build large virtual nanomaterial libraries to investigate their biological activities/properties and guide the design of new nanomaterials. The key to these approaches is to simulate and annotate complex nanostructures. Then, we can quantify nanostructures by calculating various nanodescriptors, and develop relevant predictive models. Based on the predictions of the resulting models, we can virtually tune the target activities/properties of new nanomaterials to the desired values. All the curated nanomaterial data, modeling approaches, and resulting models are being shared publicly via an informatics web portal. Therefore, the new nanoinformatics technique, which was developed based on machine learning and data science, paves the path for a new generation of nanomodeling and can be easily applied to designing biocompatible nanomaterials with multiple desired bioactivities.
About the Speaker: Dr. Zhu is a Professor of Biomedical Informatics and Genomics at the Tulane University Medical School. His major research interest is to use cheminformatics tools to develop predictive models. The resulting models can be used to directly predict chemical efficacy and toxicity based on public big data and molecular structure information. His current research interests also include data-driven modeling, artificial intelligence algorithm development, and computer-aided nanomedicine design. He is the Principal Investigator (PI) of several prestigious research grants (NIH R01, U01, R15, NSF, etc.) totaling over 8 million dollars. Dr. Zhu is author/co-author of 90 peer-reviewed journal articles and 10 book chapters with over 6800 citations (h-index of 47). His research has been recognized with several awards, such as the Rutgers Chancellor’s Award for Outstanding Research and Creative Activity, Society of Toxicology Best Paper of the Year (two times, 2021 and 2023), National Institute of Environmental Health Sciences (NIEHS) Extramural Paper of the Month (three times, 2019, 2020 and 2022) and Drug Discovery Today top citation paper of the year (2018).
Towards User Experience Enhancements for Image Processing
Cody Licorish | Computer Science PhD Student, Tulane University
This talk will be held via ZOOM ONLY on Thursday, January 25th, at 12:00 p.m. Please note the special time for this event. Zoom details will be provided via the announcement listserv, or you may email firstname.lastname@example.org to request the corresponding link.
Abstract: Image segmentation, the process of calculating boundaries between distinct parts of images, is used in diverse fields such as art and medicine. Commonly used segmentation algorithms can suffer from artifacts, high runtime cost, and input waste. The works presented here seek to alleviate some of these problems, while using more robust algorithms to enable new interaction modes with existing techniques. Included are projects dealing with non-pixel-specific Live-wire, Graph Cuts with inset image support, and zoom scalar field generation for enhanced image navigation. Each of these projects expands the interaction capabilities of its respective existing technique. The presented works aim to improve user experience when working with various image editing tasks, and discussion will include future directions of these projects.
Building Web Services for Geospatial and Environmental Non-Image Raster Data
Elias Ioup | Naval Research Laboratory Center for Geospatial Sciences
This talk will be held on Thursday, February 22nd, at 11:00 a.m. Please note the special time for this event.
Abstract: Geospatial and environmental data is often stored as a raster, a grid of cells with an associated value. While images are the raster products people are most familiar with, there are a number of other non-image data products stored as rasters such as elevation, weather model forecasts, observed sea temperature, and land cover classification. Web-based delivery of this data, especially for interactive Web Services, presents several challenges including extremely large data sets, varying data resolution across the coverage area, temporal variation, data interpolation, and data visualization. Solutions to these challenges will be presented in the context of software applications developed by the Naval Research Laboratory’s Center for Geospatial Sciences for use in applications such as web-based mapping systems, atmospheric and ocean analysis systems, navigation systems, and mission planners.
About the Speaker: Dr. Elias Ioup is a Computer Scientist and head of Geospatial Computing at the Naval Research Laboratory located at Stennis Space Center in Mississippi. He earned his Ph.D. from the University of New Orleans in 2011. Dr. Ioup's research interests include geospatial and environmental data management, large data processing, web services, tactical computing architectures, cloud computing architectures, and decision aids. In addition to leading a research group at NRL, Dr. Ioup provides technical direction to multiple research programs at the Office of Naval Research on topics of AI/ML and Decision Sciences. He is co-author of the book "Tile-Based Geospatial Information Systems: Principles and Practices."
A Behavioral Study on Human Decision-Making over Complex Domain
Andrea Martin | University of West Florida
Abstract: It is common for decision-makers to face choices among multiple options that have several attributes. Such choices can be influenced by a number of factors, including the context in which they are presented. For instance, in the realm of consumer preferences, the context of available products can play a significant role in shaping and even reversing preferences. Studying preferences and human decisions with AI techniques aids in better understanding and modeling the complexities of decision-making. This knowledge can be applied in a variety of settings, from marketing and advertising to policy-making and beyond. Understanding how context shapes preferences and decisions is essential for creating effective decision-making models and ultimately improving outcomes for individuals and society as a whole. This work presents a comprehensive behavioral study aimed at understanding the dynamics of human decision-making across complex domains. The study leverages the combination of AI techniques, a cognitive model and behavioral effects in decision-making to observe how the context, particularly the availability of options, influences preferences and decisions. We employed various models that incorporate a modified Recurrent Neural Network (RNN) architecture to simulate participant choice distributions. A significant finding of our research was that participants tend to pay more attention to expert recommendations, such as those offered by doctors, when making decisions. Our study provides the foundation for future research in modelling complex decision-making processes and can also be used to inform the design of decision-making systems in fields like healthcare, policy-making, and marketing.
About the Speaker: Andrea Martin works as a Data Scientist, and her research has focused on the interaction of AI and cognitive psychology. She earned her undergraduate degree in Electrical Engineering and Psychology from the University of New Orleans. She later pursued her PhD in computer science at Tulane University and progressed to complete her doctoral degree in Intelligent Systems and Robotics at the University of West Florida. Andrea has spent the last two years working in the AI department at Intuit, where she specializes in the customer behavior space. Outside of her work, Andrea enjoys spending time with her husband and 3-year-old daughter as well as practicing her favorite hobbies, including reading, walking, and yoga.
Android Analysis and Applications
Raina Samuel | Colgate University
Abstract: The Android mobile system is home to millions of apps that offer a wide range of functionalities. Users rely on Android apps in various facets of daily life, including critical, e.g., medical, settings. Generally, users trust that apps perform their stated purpose safely and accurately. However, despite the platform's efforts to maintain a safe environment, apps routinely manage to evade scrutiny. This talk will address various revealed weaknesses: lapses in device authentication schemes, deceptive practices such as apps covering their traces, as well as behavioral and descriptive inaccuracies in medical apps.
About the Speaker: Raina Samuel received her PhD in Information Systems from the New Jersey Institute of Technology. Her research interests focus on smartphone security and reliability and user privacy. Currently she is a Visiting Assistant Professor at Colgate University.
Victor Bankston | Tulane University
About the Speaker: TBA
Mergesort: How Does It Work? Algorithm and Implementation Demonstration
Gabriel Silva de Oliveira | North Carolina State University
Abstract: Understanding how fundamental algorithms work is one way to gain and improve the logical thinking necessary for programming and other Computer Science subjects. It is also one of the first steps in learning how to write your own algorithms. Both skills are valuable for jobs in Computer Science and related fields. In this lecture, we will see how the Mergesort algorithm works through a recursive approach, and how it can be implemented using Python. Like other programming languages, Python has built-in sorting functions. However, in order to know how to use those functions correctly, it is crucial to understand how the algorithms beneath those functions work.
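For readers who want a preview of the topic, the following is a minimal sketch of a recursive Mergesort in Python. It is not taken from the speaker's lecture materials; the function names `merge` and `merge_sort` are illustrative choices, and the version shown returns a new list rather than sorting in place.

```python
def merge(left, right):
    """Merge two already-sorted lists into a single sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Using <= keeps equal elements in their original order (stable).
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these slices is non-empty; append the leftovers.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a new sorted list built by recursively sorting each half."""
    if len(items) <= 1:
        # Base case: a list of zero or one element is already sorted.
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```

Calling `merge_sort([5, 2, 4, 7, 1, 3, 2, 6])` should give the same result as Python's built-in `sorted` on that list.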
About the Speaker: My name is Gabriel Silva de Oliveira, and I am a CS Ed Ph.D. candidate at North Carolina State University. My undergraduate major was not Computer Science, which made graduate courses a challenge. While being a teaching assistant, and later an instructor, I watched many undergraduate students with inadequate prior computing experience struggle just like I did. From those experiences, I decided to use my PhD to study and research Computer Science education, so I could create an environment where all students are welcome and can receive the help they need. My goal as a professor is to make all students feel empowered and included. Currently, my research focuses on finding ways to automatically identify struggling students in situations where professors do not have the time and/or resources to review each student's work, particularly in large courses. In the future, I hope to use education to aid students from all backgrounds in achieving equal opportunities.
Fall 2023 Colloquia
Droplet: A First Step in Autonomous Underwater Construction
Sam Lensgraf | Dartmouth College
Abstract: I will present Droplet, the first free-floating autonomous underwater construction system. Droplet builds mortarless interlocking cement block structures weighing up to 100 kg (75 kg in water), the heaviest structures built by a single free-floating robot. Droplet is the first construction robot to apply dynamic buoyancy adjustments to transport construction materials efficiently.
The underwater domain places unique challenges on construction robots: turbidity and currents make precise positioning challenging. To overcome these challenges, we take the perspective of co-design. We consider the robot and its building materials as a unit which works together to achieve construction. Through this perspective, we allow robust underwater assembly while limiting the complexity of any single component. Droplet uses a novel one degree-of-freedom manipulator that allows compliant, error-correcting grasps of cement blocks. The cement blocks are designed to correct placement error by passively sliding together.
About the Speaker: I am a Ph.D. student at Dartmouth College where I work on autonomous underwater construction. I’m interested in applying computational techniques to solve fabrication, engineering and construction problems. I’ve developed the first autonomous underwater construction robot and developed algorithms for analyzing the stability of large systems of loosely connected building blocks. I’ve also worked on developing planning algorithms that optimize fabrication plans to speed up 3D printing on standard hardware; this work received the Best Automation Paper award at IEEE ICRA 2016. My work on underwater construction is supported by an NSF GRFP fellowship.
I graduated from Tulane in 2015 with a BS in Math and Computer Science. After undergrad, I spent several years making schedule optimization algorithms in the moving industry. I enjoy applying an engineering mindset to build robust, large-scale prototypes of novel ideas.
Similarity Measures for Geometric Graphs
Sushovan Majhi | George Washington University
Abstract: Many applications in pattern recognition represent patterns as a geometric graph. A geometric graph is a combinatorial graph, endowed with a geometry that is inherited from its embedding in a Euclidean space. Formulation of a meaningful measure of (dis-)similarity in both the combinatorial and geometric structures of two such geometric graphs is a challenging problem. The geometric graph distance (GGD) is a naturally arising similarity measure, which was developed and studied recently. Alongside some of the intriguing theoretical properties of the GGD, we discuss the NP-hardness of its computation. Due to the computational challenges, the distance measure proves an impractical choice for applications. As a computationally tractable alternative, the Graph Mover’s Distance (GMD) has very recently been proposed. This cubic-time computable distance measure demonstrates extremely promising empirical evidence in practical applications. In this talk, we further discuss the compatibility of the two measures and some relevant questions that are still open.
About the Speaker: Sushovan Majhi is currently a visiting assistant professor for the data science program at George Washington University. He obtained his Ph.D. in mathematics at Tulane University. Afterward, he spent two years as a postdoctoral researcher at the University of California, Berkeley. Sushovan's research interest lies in the mathematical foundations of data science, with a particular focus on the applications of algebraic topology and computational geometry. Sushovan also maintains a keen interest in teaching. His teaching interests span a broad spectrum of fields—including foundations of data science, statistics, mathematics, and computer science.
How Safe Is the Web? Analyzing the Robustness of Prevalent Social Engineering Defense Mechanisms via Honeypots
Bhupendra Acharya | CISPA Helmholtz Center for Information Security, Saarbrucken, Germany
This talk will be held online only on Tuesday, November 7th at 12:45 p.m. Central Time. Please note the special weekday for this event. Zoom details will be provided via the announcement listserv, or you may email email@example.com to request the corresponding link.
Abstract: Most cybersecurity attacks begin with a social engineering attack component that exploits human fallibilities. Hence, it is very important to study the prevailing defense mechanisms against such attacks. Unfortunately, not much is known about the effectiveness of these defense mechanisms. This talk attempts to fill this knowledge gap by adopting a twofold approach that conducts a holistic analysis of social engineering attacks via the deployment of novel Honeypots. In the first fold, the talk focuses on phishing attacks, which remain a predominant class of social engineering attacks despite two decades of their existence. Entities such as Google and Microsoft deploy enormous Anti-Phishing Entity systems (APEs) to enable automatic and manual visits to billions of candidate phishing websites globally. The talk presents a novel, low-cost framework named PhishPrint to evaluate several flaws of 22 companies that enable attackers to easily deploy evasive phishing sites that can blindside them. In the second fold, the talk focuses on emerging social engineering attacks and their defense mechanisms by choosing cryptocurrency scams that run rampant on social media networks such as Twitter, Instagram, Telegram, and WhatsApp. To evaluate the effectiveness of these defenses, the author presents HoneyTweet, a novel framework that posts baiting tweets on Twitter to lure scammers. HoneyTweet further reveals the scammer’s payment profile via direct message engagement and linkage of scammers across multiple social media platforms. The talk presents multiple evaluation frameworks that can be used for continuous measurement of social engineering defense systems and aid in building defenses against any weaknesses found.
About the Speaker: Bhupendra Acharya received his Ph.D. from the University of New Orleans (2018-2022). Currently, he is a postdoctoral researcher at CISPA Helmholtz Center for Information Security, Saarbrucken, Germany. He works with Professor Thorsten Holz and mentors several graduate and undergraduate students at SysSec Lab at CISPA. His research focus is on web and network security. In particular, his areas of interest lie in conducting hands-on security and privacy measurements related to web security crawlers, advertisement ecosystems, browser fingerprinting attacks, cryptocurrency scams, and other in-the-wild social engineering attacks. He is also interested in developing robust defenses against such attack vectors. Prior to joining as a Ph.D. student at UNO, he worked for eight years in several industries (Amazon, Microsoft, and Raima) in the areas of software design, development, and assurances.
Inclusive Design of Creative Technology
Willie Payne | The University of North Carolina at Chapel Hill
This talk will be held at the Howard-Tilton Memorial Library, Room 430 (4th Floor) on Tuesday, November 7th at 2:00 p.m. Please note the special venue, weekday, and time for this event. Zoom details will be provided via the announcement listserv, or you may email firstname.lastname@example.org to request the corresponding link.
Abstract: The entry barrier for novices to make compelling music and art with technology has never been lower. Yet, creative technologies can prevent access among individuals with diverse abilities, interests, and values, perpetuating cycles of exclusion. For example, music notation and production environments use highly visual interfaces that limit engagement among Blind and Visually Impaired (BVI) people.
To advance arts and computing education, I develop interactive systems that leverage multiple modalities (e.g., audio, tactile, visual) and machine learning (e.g., pose detection). I use an iterative, inclusive approach to research: I conduct formative studies, co-design prototypes with educators and learners, and deploy creative systems in learning environments that culminate in original art and performance. In this talk, I will discuss three recent projects that follow this approach - 1) Cyclops eye-gaze synthesizer, 2) danceON movement-based programming, and 3) Fil Laptop Orchestra (FiLOrk). I will conclude by reflecting on recurring challenges and outlining future plans to broaden participation in both the arts and computing.
About the Speaker: Dr. William (Willie) Payne is an Assistant Professor of User Experience and Design at the UNC School of Information and Library Science (SILS). Dr. Payne studies how technology can facilitate creative expression and open pathways for individuals to express themselves on their own terms. Across his research, he uses participatory methods to co-design and deploy novel systems with community partners. For example, he developed the creative coding environment danceON with STEM from Dance, and the music notation software SoundCells with blind musicians at FMDG Music School. Dr. Payne publishes in Human-Computer Interaction (HCI) venues including CHI, ASSETS, NIME, and MOCO. His work may be found at https://williepayne.com.
Interdisciplinary Project Presentations
Cristian Garces, Erika Leal, and Kun Liu | Computer Science PhD Students, Tulane University
This event will be held on Wednesday, November 8th, from 12:00 p.m. to 2:00 p.m. in Gibson Hall 400A. Please note the special weekday, time, and venue for this event. Zoom details will be provided via the announcement listserv, or you may email email@example.com to request the corresponding link.
An Improved Ground Truth for Indirect Call Prediction and Potential Security Applications
Abstract: In binary analysis, accurately predicting the target (or set of targets) of an indirect call site is a challenging task. This is because an operand of a call instruction is not known until the program reaches that instruction (call eax) at runtime. In this presentation, we will briefly discuss how previous traditional methods attempted to solve this challenge. In addition, we will provide context on why current state-of-the-art machine learning approaches lack sufficient information to accurately train their models. Lastly, we will discuss how this model can assist in bolstering security measures by means of software debloating.
Leveraging Hardware Counters for Efficient Classification of Binary Packers
Abstract: The detection and classification of packers serves as a fundamental approach in the study of malware unpacking. In our research, we employ Hardware Performance Counters (HPCs) as features for our classification process. Hardware Performance Counters are integrated into a processor’s Performance Monitoring Unit. The advantages of utilizing hardware performance counters include low overhead access and obviating the need for source code. By selecting hardware features relevant to the unpacking process, we train machine learning classifiers in a supervised learning manner. In our study, we investigated the use of hardware performance counters for the classification purpose of binary packers. Our findings indicate that when configured to alleviate nondeterminism, Hardware Performance Counters possess the capability and potential to classify prevalent packers that are readily accessible.
Apply Deep Learning in Binary Indirect Call Analysis
Abstract: The analysis of binary code involves the intricate examination and understanding of machine-level instructions encoded in binary form. Traditionally, this process encompasses various techniques, including disassembly, control flow analysis, and data flow analysis, to reveal the underlying functionality, structure, and potential security vulnerabilities of a program. In the evolving landscape of technology, the increasing maturity of deep learning techniques has introduced a new dimension to binary code analysis. My presentation will primarily focus on the integration of deep learning methodologies into binary code analysis, exploring their effectiveness and applications in enhancing our understanding of binary code structures and functionalities.
Interdisciplinary Project Presentations
Xin Hu, Kiran Shrestha, and Ziyu Zhou | Computer Science PhD Students, Tulane University
This event will be held on Friday, November 10th, from 12:00 p.m. to 2:30 p.m. in Gibson Hall 400A. Please note the special weekday, time, and venue for this event. Zoom details will be provided via the announcement listserv, or you may email firstname.lastname@example.org to request the corresponding link.
Weakly-Supervised Temporal Action Localization with Multi-Modal Plateau Transformers
Abstract: Weakly-Supervised Temporal Action Localization (WS-TAL) aims to jointly localize and classify action segments in untrimmed videos with only video-level annotations. To leverage video-level annotations, most existing methods adopt the multiple-instance learning paradigm where frame-/snippet-level action predictions are first produced and then aggregated to form a video-level prediction. Although there have been attempts to improve snippet-level predictions by modeling temporal relationships, we argue that those implementations have not sufficiently exploited such information. In this project, we propose Multi-Modal Plateau Transformers (M2PT) for WS-TAL by simultaneously exploiting temporal relationships among snippets, complementary information across data modalities, and temporal coherence among consecutive snippets. Specifically, M2PT explores a dual-Transformer architecture for RGB and optical flow modalities, which models intra-modality temporal relationships with a self-attention mechanism and inter-modality temporal relationships with a cross-attention mechanism. To capture temporal coherence, i.e., that consecutive snippets should be assigned the same action, M2PT deploys a Plateau model to refine the temporal localization of action segments.
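As a rough illustration of the multiple-instance learning step mentioned in this abstract, the sketch below aggregates snippet-level class scores into a single video-level score per class. The abstract does not say which aggregation is used, so the top-k mean shown here is an assumption, and the function name `video_level_scores` and the parameter `k` are hypothetical; this is not the M2PT implementation.

```python
def video_level_scores(snippet_scores, k=2):
    """Aggregate per-snippet class scores into one score per class.

    snippet_scores: list of snippets, each a list with one score per class.
    k: number of highest-scoring snippets averaged per class (assumed rule).
    """
    if not snippet_scores:
        raise ValueError("need at least one snippet")
    num_classes = len(snippet_scores[0])
    k = min(k, len(snippet_scores))
    video_scores = []
    for c in range(num_classes):
        # Rank the snippets by their score for class c, highest first.
        ranked = sorted((s[c] for s in snippet_scores), reverse=True)
        # Average the k strongest snippets for this class.
        video_scores.append(sum(ranked[:k]) / k)
    return video_scores
```

Only the video-level output would be compared with the video-level label during training, which is what makes the supervision weak: no snippet is ever labeled directly.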
Weakly-Supervised Fracture Detection in X-ray Material Tomography
Abstract: Automatic fracture detection is an important tool, given its ability to identify fracture regions and monitor fracture changes quickly and accurately. The challenge, however, lies in the lack of detailed fracture annotations, as pixel-level annotation of high-resolution images is time-consuming and expensive, especially with complex fracture textures. This problem underscores the importance of leveraging existing models, such as pre-trained AI models, for fracture detection. We can consider crack detection as an anomaly detection problem where the crack regions are the anomalies. While PADIM has previously been applied in an unsupervised setting for anomaly detection on the MVTec dataset, its performance drops substantially on crack images, both because separate normal and anomalous classes are unavailable and because crack structures are complex. To overcome the label shortage, we plan to use the Segment Anything Model (SAM) to generate pseudo masks for crack regions using points generated via the Fast Fourier Transform (FFT). Using image thresholding on FFT images, we generate points lying on the crack regions, which are then used as SAM prompts to generate crack masks. The goal is to filter the embedding features for normal classes in this modified PADIM using pseudo masks to build a normal representation of crack images and thus use it to detect the cracks as outliers.
An Interpretable Cross-Attentive Multi-modal MRI Fusion Framework for Schizophrenia Diagnosis
Abstract: Both functional and structural magnetic resonance imaging (fMRI and sMRI) are widely explored for computer-assisted diagnosis of mental disorders, such as schizophrenia. However, combining the valuable information from these two modalities is challenging due to their inherent heterogeneity. Many existing methods fall short of fully capturing the interaction between these modalities, often resorting to a simple concatenation of latent features. In this project, we propose a novel multi-modal transformer-based fusion framework, which not only leverages intra-modal information but also delves into the inter-modal relationships between fMRI and sMRI to overcome the overfitting issue. We subsequently assess its performance in diagnosing schizophrenia and explicitly identify key biomarkers, including disease-related brain regions. These identified biomarkers are aligned with previous research findings, underscoring their significance in schizophrenia diagnosis.
Computer Aided Proofs for the VC-Dimension of Art Gallery Variants
Zhongxiu Yang | University of Texas at San Antonio
This talk will be held on Tuesday, November 14th in ST 316 at 12:45 p.m. Please note the special weekday for this event.
Abstract: Visibility problems are fundamental to computational geometry, and many versions of geometric set cover where coverage is based on visibility have been considered. In most settings, points can see “infinitely far” so long as visibility is not “blocked” by some obstacle. In many applications, this may be an unreasonable assumption. We consider a limited visibility variant of the art gallery problem where each point has a sight radius ρ. We show that the VC-dimension of a limited visibility terrain is exactly 7, and the VC-dimension of limited visibility on the boundary of a simple polygon is exactly 8. To do that, we give the lower bound construction, and we prove the upper bound by showing several key structural lemmas and then using a computer program to show that there is no permutation of the points and the viewpoints that satisfies all of the visibility requirements. We also show the VC-dimension of the art gallery problem using half guards in various settings. Finally, we start from Knuth’s axioms to improve the upper bound of the VC-dimension of visibility of a simple polygon where points can be placed inside the polygon.
About the Speaker: Zhongxiu Yang was born in Hangzhou, China. He is a Ph.D. candidate at the University of Texas at San Antonio. He obtained his B.S. at CSU Fullerton in 2016 and M.S. at CSU East Bay in 2018. During his time at UT San Antonio, he served as an instructor for C programming, discrete math, and algorithms. He has been working in the San Antonio Geometry Algorithms (SAGA) research lab since 2018; his research interests include algorithms and computational geometry, specifically the variants of the art gallery problem.
Video Analysis from a Spatio-Temporal Perspective
Yu-Ke Li | Tulane University
In this talk, I will briefly introduce a few topics of my past works, e.g., agent activity forecasting and trajectory analysis.
Agent activity forecasting: We aim to forecast both upcoming actions and paths of all agents in a scene based on their past activities, which can be jointly formulated by a probabilistic model over time. Learning this model is challenging because: 1) it has a large number of time-dependent variables that must scale with the forecast horizon and the number of agents; 2) distribution functions have to contain multiple modes in order to capture the spatio-temporal complexities of each agent’s activities. To address these challenges, we put forth a novel Energy-based Learning approach for Multi-Agent activity forecasting (ELMA) to estimate this complex objective via maximum log-likelihood estimation. Specifically, by sampling from a sequence of factorized marginalized multi-modal distributions, ELMA efficiently generates the most likely future actions. Moreover, using graph-based representations, ELMA also explicitly resolves the spatio-temporal dependencies of all agents’ activities in a single pass. Our experiments on two large-scale datasets show that ELMA outperforms recent leading studies by a clear margin.
Trajectory analysis: In order to predict a pedestrian’s trajectory in a crowd accurately, one has to take into account her/his underlying socio-temporal interactions with other pedestrians consistently. Unlike existing work that represents the relevant information separately, partially, or implicitly, we propose a complete representation for it to be fully and explicitly captured and analyzed. In particular, we introduce a Directed Acyclic Graph-based structure, which we term Socio-Temporal Graph (STG), to explicitly capture pair-wise socio-temporal interactions among a group of people across both space and time. Our model is built on a time-varying generative process, whose latent variables determine the structure of the STGs. We design an attention-based model named STGformer that affords an end-to-end pipeline to learn the structure of the STGs for trajectory prediction. Our solution achieves overall state-of-the-art prediction accuracy on two large-scale benchmark datasets. Our analysis shows that a person’s past trajectory is critical for predicting another person’s future path. Our model learns this relationship with a strong notion of socio-temporal localities. Statistics show that utilizing this information explicitly for prediction yields a noticeable performance gain with respect to the trajectory-only approaches.
About the Speaker: Dr. Yuke obtained his joint Ph.D. from Wuhan University, China, and Telecom Paris, France, in 2018. He is currently a Research Scientist working with Prof. Carola Wenk at Tulane University.
Towards Developing Machines That Understand Multi-modal Medical Records
Edward Choi | Kim Jaechul Graduate School of AI, KAIST
This talk will be held on Wednesday, November 13th, in ST 302 at 4:30 p.m. Please note the special weekday, venue, and time for this event.
Abstract: It is a non-trivial task for humans to understand medical records and obtain meaningful insight from them, as they are large in volume, complex in structure, and include multiple modalities. Machines that understand medical records hold great promise, as they can assist providers in practice, perform large-scale analysis, or even design research experiments such as clinical trials. However, this seems far from reality despite the revolutionary advances in AI technologies over the last decade, including the recent large language models (LLMs). In this talk, I will first establish question answering (QA) as a means of testing machines' understanding of medical records, and describe some of the recently developed QA datasets. Then I will talk about what role I think LLMs can play in the big picture. Lastly, I will introduce how we can develop clinically fine-tuned LLMs without access to medical records, opening biomedical AI research to a larger research community.
About the Speaker: Edward Choi is an assistant professor at Kim Jaechul Graduate School of AI, KAIST. He received his PhD from Georgia Tech, under the supervision of Prof. Jimeng Sun, focusing on interpretable deep learning methods for electronic health records (EHRs). Prior to joining KAIST, he worked on developing and analyzing deep learning models for medical prediction at Google Brain and Google Health. His current research interests include general prediction frameworks for EHRs, multi-modal learning, medical data generation, question answering for medical records, and clinical LLMs. For more information, please see his Google Scholar page.
Towards Reinforcement Learning for Real-time and Dynamic Robotic Tasks
Josiah Hanna | University of Wisconsin - Madison
Abstract: Recent years have seen a surge of interest in reinforcement learning (RL) as a powerful method for enabling AI agents to learn how to take actions to achieve the goals set by their designers. In robotics, RL should be a natural choice for developing high-performing controllers, yet a number of challenges prevent its application, especially when robust behavior is critical. In this talk, I will start by briefly describing my lab’s work on enabling RL in the real-time, dynamic domain of robot soccer. This work will then motivate a deeper dive into recent work that addresses two of the central challenges encountered when deploying RL in such domains: data efficiency and safe deployment of learned behaviors. In the first part of the talk, I will introduce a new RL algorithm that increases the data efficiency of the widely used policy gradient class of reinforcement learning algorithms. The key novelty of this approach is a technique for controlling the learning agent’s data distribution to improve the accuracy of gradient estimation. In the second part of the talk, I will describe recent work on predicting how well an untested learned behavior will perform without actually deploying it.
About the Speaker: Josiah Hanna is an assistant professor in the Computer Sciences Department at the University of Wisconsin-Madison. He received his Ph.D. from the Computer Science Department at the University of Texas at Austin. Prior to attending UT Austin, he completed his B.S. in computer science and mathematics at the University of Kentucky. Before joining UW-Madison, he was a post-doc at the University of Edinburgh and also spent time at FiveAI working on autonomous driving. Josiah is a recipient of the NSF Graduate Research Fellowship and the IBM Ph.D. Fellowship. His research interests lie in artificial intelligence and machine learning, seeking to develop algorithms that allow autonomous agents to learn (efficiently) from their experience. In particular, he studies reinforcement learning and methods to make reinforcement learning more broadly applicable to real-world domains.