- 1980 BSE Computer Science/Systems Engineering, University of Pennsylvania (with numerous honors)
- 1981 MSE Computer Science/Artificial Intelligence, University of Pennsylvania
- 1982 MS Experimental and Developmental Cognitive Psychology, Carnegie Mellon University
- 1985 Ph.D. Cognitive and Developmental Psychology, Carnegie Mellon University
(supported by an IBM Graduate Fellowship in Computer Science)
Experience and Entrepreneurship
- Since Apr. 2018: Co-Founder and Director of Research, xCures, which is reinventing oncology through virtual trials that continuously learn from all patients, on all treatments, all the time. My work focuses on clinical problem solving in oncology, including novel bioNLP techniques, AI-based genomic pathway analysis, and Global Cumulative Treatment Analysis.
- Cancer Commons Knowledge Pipeline: Symbolic biocomputing services in support of cancer research and clinical problem solving. Current services include a controlled natural language understanding service, an NCI Thesaurus ontology server, and "TrEx", the Treatment Explorer treatment ranking and evidence service. (Sweetnam et al. Prototyping a precision oncology 3.0 rapid learning platform. BMC Bioinformatics (2018) 19:341.)
- GCTA in PO3.0 (Global Cumulative Treatment Analysis in Precision Oncology 3.0): A highly efficient approach to clinical trials that combines n-of-1 and adaptive Bayesian methods.
- Since 2000: Consulting Professor, Symbolic Systems Program, Stanford. Current courses: Symbolic Systems 245: Interaction Analysis, in which we focus on human learning about, use of, and interaction with complex systems, such as computer-based devices. Other courses taught: SSP216/BIOMEDINFO216A: Symbolic Biocomputing. Recent projects include:
- Efficient and Ethical Science (with xCures). [See Shrager, J, Shapiro, M, Hoos, W (in press)]
- The Scientist as User (with Paul Fuoss and Teddy Rendhal at SLAC/LCLS, the Stanford Linear Accelerator Center/Linac Coherent Light Source, and Devangi Vivrekar, Stanford class of 2018).
- SCADS2015 (with Lang Chen, Tanya Evans, and Vinod Menon, at Stanford): A new computational model of human mathematical development based on modern machine learning and systems neuroscience.
- 2008-2012: Co-Founder and Chief Technology Officer, CollabRx, Inc., originally a startup, now a public company (NASDAQ:CLRX), engaged in omics-based personalized biomedicine, esp. in cancer. (See the Historical Major Projects section, below.)
- 2012-2018: Director of Research, Cancer Commons. My work with Cancer Commons focused on clinical problem solving in oncology, including novel bioNLP techniques, AI-based genomic pathway analysis, and Global Cumulative Treatment Analysis.
- 2007-2016: Senior Research Fellow, CommerceNet. My work with CommerceNet focused on the virtualization of science, esp. in drug discovery, and on technologies related to online privacy. Both CollabRx and Cancer Commons were founded with the support and guidance of CommerceNet.
- 1997-2000: Director of Engineering, Afferent Systems, Inc., a startup (acquired by MDL) that developed AI-based software to support drug discovery via high-throughput robotic chemistry and protodrug screening. (I was the first employee of Afferent, second only to the founder.) (See the Historical Major Projects section, below; also: J Shrager (2001a) High throughput discovery: Search and interpretation on the path to new drugs. In K. Crowley, et al. (Eds.) Design for Science. Hillsdale, NJ: Lawrence Erlbaum. 325-348 [pdf].)
- 2012-2013: Senior Computer Scientist, The AURA Project, The AI Center, SRI International.
- 1999-2007: Visiting Scholar in Marine Molecular Phyto-Microbiology at the Carnegie Institution of Washington, Department of Plant Biology (at Stanford). Our lab studied how marine and freshwater phytoplankton and algae, such as Cyanobacteria and Chlamydomonas, adapt to environmental change, both metabolically and evolutionarily. We combined laboratory molecular biology with large-scale DNA microarray technology and computational models of the biochemistry of these organisms, and of the dynamics of evolution and adaptation. We developed novel AI/ML-based methods for genome assembly, and for microarray and biological pathway analysis.
- 1999-2006: Senior Scientist, Institute for the Study of Learning and Expertise (ISLE). I headed ISLE's research on computational discovery of metabolic and regulatory models by combining knowledge with data, such as microarray RNA expression data. We also worked in other scientific domains, for example extending and refining an existing model of the Earth ecosystem based on ground and satellite measurements of spatio-temporal variables.
- 1992-1997: Visiting Faculty in Cognitive and Developmental Neuroscience and Functional NeuroImaging at the University of Pittsburgh and Carnegie Mellon University. I directed research teams that combined EEG, functional MRI, and computational modeling in the study of learning and development. We studied brain development in neonates, children, and adults engaged in complex learning, and developed simulations of learning and development.
- 1985-1994 and 2003-2005: Research Scientist, Xerox Palo Alto Research Center (PARC). I developed intelligent information access and automated discovery systems, adaptive interface technologies, and computational simulations of complex systems. I headed a team that built one of the world's first intelligent web search engines, and another that applied nonlinear mathematics to cognitive problems, producing analytical solutions to fundamental problems of learning and memory. In my later (2003-5) visit I developed knowledge-based support mechanisms for collaborative analytical activity, such as that carried out by teams of scientists.
- 1976-1980: Manager and Lead Programmer, The APL Project, University of Pennsylvania. (See the Historical Major Projects section, below.)
- I have lectured and advised students in Psychology, Cognitive Science, Computer Science, and Biocomputing at Stanford, UC Santa Cruz, U.Penn, CMU, U.Oregon, U.Alaska, U.Pitt, and elsewhere on topics including bioinformatics and computational biology, computational models of learning and development, functional neuroimaging, artificial intelligence, statistics, experimental psychology, and mathematical modeling. I have participated in numerous NSF and NIH peer review panels, and reviewed for numerous journals and conferences.
Historical Major Projects
- METALS=STEAM+Logic (with Su Su): An elementary curriculum that infuses logic into all aspects of STEAM, leading to early experience in algorithms and programming.
- Cancer Commons Knowledge Pipeline: Symbolic biocomputing services in support of cancer research and clinical problem solving. Current services include a controlled natural language understanding service, an NCI Thesaurus ontology server, and "TrEx", the Treatment Explorer treatment ranking and evidence service. Thousands of lines of Lisp code.
- GCTA in PO3.0 (Global Cumulative Treatment Analysis in Precision Oncology 3.0): A highly efficient approach to clinical trials that combines n-of-1 and adaptive Bayesian methods. Includes a multi-agent simulation of science, written primarily in Lisp.
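To make "adaptive Bayesian" concrete, here is a toy sketch in Lisp of one standard allocation rule, Thompson sampling over Beta posteriors. GCTA's actual machinery is much richer than this; the function names and data layout are mine, purely for illustration.

    ;; Toy Thompson sampling over Beta posteriors (illustrative only;
    ;; not GCTA's actual code). Each arm keeps (successes . failures);
    ;; we draw one sample from each arm's posterior and assign the
    ;; next patient to the arm with the largest draw.

    (defun sample-beta (successes failures)
      "Draw from Beta(successes+1, failures+1) via the order-statistic
    trick: the a-th smallest of a+b-1 uniforms is Beta(a, b)."
      (let* ((a (1+ successes))
             (b (1+ failures))
             (u (sort (loop repeat (+ a b -1) collect (random 1.0)) #'<)))
        (nth (1- a) u)))

    (defun choose-treatment (arms)
      "ARMS is an alist of (name successes . failures); return the name
    of the arm whose posterior draw is largest."
      (car (reduce (lambda (x y) (if (> (cdr x) (cdr y)) x y))
                   (mapcar (lambda (arm)
                             (cons (first arm)
                                   (sample-beta (cadr arm) (cddr arm))))
                           arms))))

    ;; E.g., with 7/10 responses on A and 2/8 on B, the next assignment
    ;; is probabilistically biased toward A:
    ;; (choose-treatment '(("A" 7 . 3) ("B" 2 . 6)))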
- CTRACS (the Cancer Treatment Rationale Archiving and Communication System): A Cancer Commons project, funded by DARPA among others. Its overall goal is to provide AI-based decision support services to Molecular Tumor Boards, that is, to oncologists engaged in clinical reasoning and problem solving about targeted therapies for advanced cancer patients. The overall CTRACS effort comprises many technologies and subprojects. The ones that I lead are: (a) encoding case data from oncologists' written summaries (essentially an NLP project), and (b) simulation of Global Cumulative Treatment Analysis.
- The Eliza Genealogy Project -- Seeking to reconstruct and preserve the origins and history of Eliza and its (her?) descendants.
- CollabRx projects: At CollabRx, Inc., where I was co-founder and CTO, I managed and participated in the development of numerous complex software products. Some of these were never deployed to real users, but many were. I did some, though not nearly all, of the hands-on programming, and managed a team of terrific engineers who did most of the work.
Other CollabRx-related projects (that should someday be filled out with descriptions):
- MMMP Drug Ranker: See Mocellin, S., Shrager, J. et al. (2010) Targeted Therapy Database (TTD): a model to match patient's molecular profile with current knowledge on cancer biology. PLoS ONE, 5(8): e11965. doi:10.1371/journal.pone.0011965. [open access link]
- Melanoma Molecular Disease Model: [See PLoS ONE Pub.]
- Targeted Therapy Finder
- Virtual BioTech
- CACHE and the Bayes Community model: Scientists constantly layer inferences upon inferences; indeed, there is often very little true ground data, but a great many layers of interpretation and inference. CACHE (the Collaborative Analysis of Competing Hypotheses Environment) supports multi-staged analysis by a community of scientists (who may be widely separated in space and time) through a novel hybrid of decision matrices, Bayes networks, and semantic nets. In CACHE, one analyst's hypotheses can be another analyst's evidence, creating a provenance network that is both locally manipulated and globally coherent. Complex problems are naturally broken down by the CACHE user community into interacting sub-problems, and the same problem may be analyzed in different ways by different members of the community. In this method, called a "Bayes Community", individual analysts explicitly work on a small part of a larger problem while, at the same time, implicitly building an analysis of the problem as a whole. CACHE builds the provenance relations automatically, requiring no additional effort from the analysts. Built on KnowOS technology, CACHE's smart back end makes results available when and where they are relevant through an integration of knowledge bases and natural language understanding technologies. Each piece of evidence is processed by PARC's XLE language technologies, connected via WordNet and a knowledge backbone (semantic net) containing both general and specific ontologies. Evidence thus connected can be semantically searched and ranked with respect to its relevance to ongoing problem solving. Since all of the evidence and hypotheses are linked into the semantic network, the evidence and hypotheses forming the matrix under development may be used to contextualize the search. (CACHE/Bayes Community white papers are available upon request. They are in the process of being cleared by PARC and various government agencies for publication.)
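To make the provenance idea concrete, here is a tiny Lisp sketch (my illustrative reconstruction, not CACHE's actual API): each claim records the claims it was inferred from, so one analyst's hypothesis can serve as another's evidence while the full inference chain remains walkable.

    ;; Hypothetical sketch of the CACHE provenance idea (not its real
    ;; API): a claim node records its supports, so hypotheses can be
    ;; reused as evidence and the inference chain can be walked back.

    (defstruct claim
      label      ; short statement of the evidence or hypothesis
      analyst    ; who asserted it
      supports)  ; list of CLAIM nodes this one was inferred from

    (defun show-provenance (claim &optional (depth 0))
      "Print CLAIM and the full inference chain beneath it."
      (format t "~a~a [~a]~%"
              (make-string (* 2 depth) :initial-element #\Space)
              (claim-label claim) (claim-analyst claim))
      (dolist (s (claim-supports claim))
        (show-provenance s (1+ depth))))

    ;; One analyst's hypothesis becomes another's evidence:
    (let* ((e  (make-claim :label "band at 50kD on the gel" :analyst "ana"))
           (h1 (make-claim :label "protein X is expressed" :analyst "ana"
                           :supports (list e)))
           (h2 (make-claim :label "pathway P is active" :analyst "ben"
                           :supports (list h1))))
      (show-provenance h2))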
- KnowOS: Whereas Unix and Windows are operating systems for ASCII strings and files, and there are operating systems for tables (such as Oracle), a Knowledge Operating System (or "KnowOS") is an operating system for "knowledge", that is: graphs of objects! KnowOS is a generalization of BioBike (see below). [ILC2007 paper (pdf)] [www.knowos.org].
- BioBike (formerly BioLingua): I conceived of the idea, and led the team of engineers and biologists, in the development of a web-based programmable biological knowledge base, called BioBike, which went to live public alpha testing in August 2003. BioBike is built on top of BioLisp, which is, in turn, built on top of Lisp. It is a complete knowledge-based computational biology resource, enabling biologists to manipulate biological knowledge and data, and providing a platform for computer scientists working on methods in computational biology to develop their methods and deploy them immediately to working biologists. If you'd like to experiment with the BioBike Multi-Cyano programmable knowledge base, drop me an email and I'll be happy to give you an account. [www.biobike.org].
- Between 1999 and about 2004 I was the project lead and principal engineer for a number of large Computational Biology projects at the Carnegie Institution of Washington, Department of Plant Biology, and the Institute for the Study of Learning and Expertise. These were funded variously by the Carnegie, NSF, NASA, and others. They included:
- Chlamydomonas cDNA sequencing and microarray project: We identified over 9000 of Chlamydomonas' 12,000-15,000 genes (on 17 chromosomes). I designed and implemented all of the algorithms, protocols, and robot programs for this work, including a novel gene assembly protocol, complete 'semantic' annotation analysis, and pathway modeling. Z-D Zhang, C-W Chiung, and I designed and built the first set of arrays.
- Semi-autonomous qualitative and quantitative reasoning and discovery in regulatory and metabolic networks (esp. using microarray data). Here is an overview of our work in this area.
- Autonomous "cyclostatic" methods for the analysis of the dynamic effects of environmental stress (esp. light and nutrient stressors) on algae and Cyanobacteria. These methods include simulation of diurnal light regimes at any latitude, automated data collection and analysis, and integration of data gathered from long-term time-course experiments with microarray regulatory data describing the expression response of the stressed organisms.
- Biological natural language processing (with James Evans, Stanford). We analyzed a database of papers relating to plant molecular biology to produce a network analysis of relationships between laboratories, biological concepts, techniques, and organisms.
- Formalization of biological process semantics. Laura McIntosh and I developed a formal semantics for principal biological concepts, such as regulation and inhibition, and applied these semantics to meta-analysis of biological knowledge bases, esp. KEGG, BioCyc, and the TRANSPATH family of knowledge bases.
- A multi-organism pathway knowledge base (based upon SRI's EcoCyc knowledge base) for photosynthetic bacteria, and novel reasoning and discovery methods that integrate pathway knowledge with microarray expression data.
- Novel method for combining nucleotide or amino acid sequence data with organismic/phylogenetic information to form phylogenetic trees, including reprogramming the Phylip PROTPARS program, and working out the underlying statistical basis for the method. (MS in preparation.)
- Novel statistical methodologies for analysis of Cyanobacteria environmental stress microarray experiments.
- BioLisp: I co-founded, and was webmaster and editor of, BioLisp.org, a site dedicated to intelligent applications in BioComputing. Between about 1999 and 2003 I developed a series of tools for biological Artificial Intelligence (in Lisp), and an associated curriculum, collectively called "Introduction to Intelligent Computational Biology". These eventually merged to create BioBike, and the development of these tools continued in that context.
- Afferent: Between 1997 and 2000 I was the first employee and director of engineering for Afferent Systems, Inc. (acquired by MDL in 2000). Afferent developed Artificial Intelligence-based software tools for combinatorial drug discovery, including: robot control, virtual chemistry, large-scale scientific database management, and analytic tools. I designed and wrote the entire database and object-model substructure for all Afferent products, including ODBC integration, and the entire analytical chemistry module of the Afferent flagship product. Our software was used by most of the world's largest pharmaceutical companies. I also did all of Afferent's user-support engineering, regularly interacting with biologists and chemists to adapt our products to their needs.
- Mnemotheque: In 1999 and 2000 my sister, Monique, and I developed an explorable multimedia record of our mom's family's life in France during the Holocaust. A large part of my mother's side of my family, including my grandparents, was killed by the Nazis. The system, called "Mnemotheque", includes photographs, audio, music, and digital video interviews with my mom and aunts. It uses Spreading Activation to enable the user to explore the space of these mementos through conceptual links, in a way analogous to the way that human memory links one concept to another. Mnemotheque was presented in Paris at two public forums, including one that took place at the Centre Pompidou et Musée National d'Art Moderne. I did all the engineering for the system, which was written entirely in Java 2 with JMF. My sister was the media editor and producer, and played the largest role in conceiving the project. She was assisted by a small team of students in Paris.
- The Adaptive Place Advisor: Around 1997/8 I consulted for Daimler Research in Palo Alto, where Pat Langley and I developed what was, as far as I know, the first conversational adaptive interface. Working with Pat and Afsaneh Haddadi, I built an Eliza-like conversational program that knew a bunch about where you happened to be (presumably driving -- this is Daimler, after all!). It knew about restaurants and stores and gas stations and such, and would carry on a conversation (of sorts) regarding, for example, where you wanted to eat lunch. The "adaptive" in the title referred to its tendency to "tune in" to your preferences. For example, if you selected Chinese for lunch a bunch, it would tend to suggest Chinese restaurants, if there were any nearby at the time. The Adaptive Place Advisor predated similar tools, such as Siri, Apple's voice-activated personal assistant, by more than a decade.
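The adaptive part can be sketched in a few lines of Lisp (a toy illustration of the idea, not the actual Place Advisor code, which also handled the dialog itself): each accepted suggestion increases that category's weight, and suggestions are sampled in proportion to the weights.

    ;; Toy sketch of the adaptive preference mechanism (illustrative;
    ;; not the actual Place Advisor code). Accepted suggestions grow a
    ;; category's weight; suggestions are sampled proportionally.

    (defvar *cuisine-weights*
      (list (cons "chinese" 1.0) (cons "italian" 1.0) (cons "diner" 1.0)))

    (defun accept-suggestion (cuisine)
      "Reinforce a cuisine the user actually chose."
      (incf (cdr (assoc cuisine *cuisine-weights* :test #'string=))))

    (defun suggest ()
      "Sample a cuisine with probability proportional to its weight."
      (let ((r (random (reduce #'+ *cuisine-weights* :key #'cdr))))
        (dolist (pair *cuisine-weights*)
          (when (minusp (decf r (cdr pair)))
            (return (car pair))))))

    ;; After (accept-suggestion "chinese") a few times, (suggest)
    ;; returns "chinese" more and more often.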
- fMRI: Between 1995 and 1997, at the University of Pittsburgh Learning Research and Development Center, my colleagues and I developed one of the first integrated data-capture and analysis systems for functional neuroimaging (fMRI). I developed the statistical techniques, and built the top-level user interface and the multi-media database, which combined image data, analytical results, and arbitrary notes, and which produced HTML "electronic notebooks" and reports. I also managed the early in-magnet testing of the product. The product was marketed by MRI Devices and used in laboratories worldwide.
- Around 1995, Mark Johnson and I, at CMU, developed a neural network model of how the cortex of the mammalian brain, which begins in a relatively undifferentiated state, differentiates into functional systems and subsystems with a particular organization. We introduced the concept of a "wave of growth", which has since been independently discovered by laboratory neuroscientists. Our model became the centerpiece of an important new theory of brain development.
- Aphrodite: Between 1990 and 1995, at Xerox PARC, I developed an intelligent search assistant for a large distributed multimedia database (what we call "the web" these days). The system, called Aphrodite, learned how to conduct searches based upon observation of users' search activity, and could offer search guidance for new tasks.
- PCOND: Around 1985 I invented a probabilistic programming extension to Lisp called PCOND (Probabilistic CONDitional -- COND being the conditional form in Lisp). You would program exactly as with COND, but the compiler would expand it into a complex morass of machinery that ran every branch (with optional specified priors) and recorded the statistics of the results returned from the PCOND. This turns out to have been an early version of probabilistic programming, which is now getting to be a big deal, with a DARPA program, even! Of course, in the 30-some years since I proposed this, folks have become way more sophisticated about how to make it work correctly and efficiently.
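Here is a minimal modern reconstruction of the idea as a Common Lisp macro (a sketch of my own, not the original PCOND source; the clause syntax and names are illustrative): every clause whose test succeeds is run, each result is recorded in a statistics table, and one branch's value is returned, sampled according to the optional priors.

    ;; Toy reconstruction of the PCOND idea (not the original code).
    ;; Clauses look like COND clauses -- (TEST FORM) -- with an
    ;; optional trailing prior weight: (TEST FORM PRIOR). Every true
    ;; branch runs and is recorded; one value is returned, sampled
    ;; according to the priors.

    (defvar *pcond-stats* (make-hash-table)
      "Maps a clause index to the list of values it has returned.")

    (defmacro pcond (&rest clauses)
      (let ((hits (gensym "HITS")))
        `(let ((,hits '()))
           ,@(loop for (test form prior) in clauses
                   for i from 0
                   collect `(when ,test
                              (let ((val ,form))
                                (push val (gethash ,i *pcond-stats*))
                                (push (cons ,(or prior 1.0) val) ,hits))))
           ;; Sample one of the recorded values by prior weight.
           (when ,hits
             (let ((r (random (reduce #'+ ,hits :key #'car))))
               (dolist (hit ,hits (cdar ,hits))
                 (when (minusp (decf r (car hit)))
                   (return (cdr hit)))))))))

    ;; E.g.: (pcond ((> x 0) (do-positive-thing x) 0.7)
    ;;              (t       (do-other-thing x)    0.3))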
- IE - The Instructionless Experimenter: Between 1981 and 1985 I developed IE, the Instructionless Experimenter, one of the first autonomous discovery systems. IE conducted experiments on complex domains and formed theories about them, using a mental model reformulation mechanism called View Application.
- In 1983/84 and then again in 1997/98 Bob Siegler and I developed two important cognitive simulations of how children learn and select strategies. These heterogeneous simulations (including both symbolic and sub-symbolic mechanisms) succeeded in closely modeling children's use and discovery of arithmetic strategies.
- Wizard: Around 1980 Tim Finin and I developed the world's first Computer Wizard, called (oddly enough) "Wizard", a program that observes a user's behavior and offers assistance and advice. This work led (indirectly) to the present-day wizards found in many computer systems. (Here is some amusing press coverage that appeared when the Wizard paper was presented at AAAI in 1982.)
- The APL Project: Between 1978 and 1981 I was lead programmer and project manager for The APL Project at the University of Pennsylvania, where we developed and supported Univac's APL interpreter, which consisted of hundreds of thousands of lines of IBM 360/370 Assembly Language. I was responsible for all aspects of the project, including writing or rewriting most of the APL interpreter, interaction with Univac business units, and support of world-wide users. I had managerial responsibility for five programmers (including students).
Early Works (just for fun)
- TSOS Mail, Bboard, and Help: Around 1978 I developed a number of user-level utilities for the Univac 90/70 series computer (under the TSOS operating system) at the University of Pennsylvania's Moore School of Electrical Engineering, including that community's first email program and one of the first community-wide electronic bulletin boards in existence (called "#BULL"). Ira Winston and I also built a user-level help system for that community which, until that time, had no such thing.
- Filer: Around 1977 I wrote "Filer", a regular-expression-like meta-command processor (a sort of combination of Unix sed and xargs, but with a significantly simpler interface) for the Univac TSOS operating system. Filer quickly became among the most useful and often-used programs for that operating system. I still think that this is the most useful program I ever wrote, and its equivalent does not yet exist in the world of Windows, Mac, or Unix (although, as above, you can do this sort of thing in Unix, it's clunky). Here is an imperfect but usable C++ version of filer for Unix.
- Simplex Chemical Process Optimization: In high school, around 1976, I worked with Dr. John Myers of the Villanova University Dept. of Chemical Engineering to optimize a complex chemical manufacturing process, described by about 10 rather complex equations. I used the Nelder-Mead simplex optimization algorithm (which I wrote from scratch) in Fortran on a PDP-8.
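For flavor, here is a compact sketch of Nelder-Mead in Lisp rather than the original Fortran (a simplified variant using inside contraction only; the example function and starting simplex are, of course, just illustrations):

    ;; Compact Nelder-Mead sketch (simplified: inside contraction
    ;; only; a modern reconstruction, not the original Fortran).
    ;; F maps a list of coordinates to a number; SIMPLEX is a list
    ;; of n+1 such points.

    (defun vec+ (a b) (mapcar #'+ a b))
    (defun vec- (a b) (mapcar #'- a b))
    (defun vec* (s a) (mapcar (lambda (x) (* s x)) a))

    (defun centroid (points)
      (vec* (/ 1.0 (length points)) (reduce #'vec+ points)))

    (defun nelder-mead (f simplex &key (iters 200))
      (setf simplex (copy-list simplex))  ; don't mutate the caller's list
      (dotimes (i iters (first (sort simplex #'< :key f)))
        (setf simplex (sort simplex #'< :key f))
        (let* ((best (first simplex))
               (worst (car (last simplex)))
               (others (butlast simplex))
               (c (centroid others))
               (xr (vec+ c (vec- c worst))))                    ; reflect
          (cond ((< (funcall f xr) (funcall f best))            ; expand
                 (let ((xe (vec+ c (vec* 2.0 (vec- c worst)))))
                   (setf simplex
                         (append others
                                 (list (if (< (funcall f xe) (funcall f xr))
                                           xe xr))))))
                ((< (funcall f xr) (funcall f (car (last others))))
                 (setf simplex (append others (list xr))))      ; accept
                (t                                              ; contract
                 (let ((xc (vec+ c (vec* 0.5 (vec- worst c)))))
                   (if (< (funcall f xc) (funcall f worst))
                       (setf simplex (append others (list xc)))
                       (setf simplex                            ; shrink
                             (mapcar (lambda (p)
                                       (vec+ best (vec* 0.5 (vec- p best))))
                                     simplex)))))))))

    ;; E.g., minimizing (x-3)^2 + (y+1)^2 from a unit simplex:
    ;; (nelder-mead (lambda (p) (+ (expt (- (first p) 3) 2)
    ;;                             (expt (+ (second p) 1) 2)))
    ;;              '((0 0) (1 0) (0 1)))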
- EDT: Around 1974-5 I wrote a "programmable" text editor for the PDP-8, in BASIC, called EDT. Although it was a line editor (as most were at that time!), you could write simple TECO-like macros, though they were pretty clunky. (TECO's macros were pretty clunky too!)
- MIMIC: Around 1973 Eric Jacobs and I, at Haverford Junior High School, developed an interpreter for a programming language of our own design, called MIMIC (the Machine Independent Mathematical Instructional Code), in Fortran IV and some assembly on an IBM 1130 computer. MIMIC was essentially a simplified version of BASIC (although we didn't know BASIC at the time, only Fortran and assembly!), and was the only interpreted programming language that I know of that ran on the popular IBM 1130 platform. The interpreter was thousands of punch cards in length, filled several of those old punch-card boxes, and had to be all loaded into the 1130 in order to use the MIMIC language. (I'm certain that this is why MIMIC didn't catch on! :-)
- LOGIX: The first serious computer program that I remember writing was called "LOGIX", around 1972. It was a simple natural language interpreter and theorem prover, written in Fortran IV on the IBM 1130. You could enter statements like: "All men are mortal" and "Socrates is a man", and LOGIX would compute the logical consequents and report them in natural language. At the time, being only 13 (in an era when newborns weren't learning to program!), I didn't know what Theorem Proving, Natural Language Processing, or AI were!
- Eliza: My first published work was a BASIC version of Eliza that appeared in Creative Computing (now defunct) in the July/August issue of 1977. (I actually wrote the program in 1973!) See it here: [pdf] The original Eliza was written by Joseph Weizenbaum in the mid-'60s in Lisp. (Technically, it was written in SLIP by Weizenbaum and translated to Lisp by Bernie Cossell of BBN.) Although I saw a copy of the Lisp code much later, my translation was done without that benefit, and was, to put it gently, "conceptual", and pretty awful. Regardless, 1977 was the dawning of "The Age of The Personal Computer", and few folks had (or knew) Lisp, but everyone had (and knew!) Basic. So my version of Eliza was hugely influential -- probably the most influential thing I've ever done. Hundreds of knock-offs appeared, some as late as 2015 (!), and it was translated into many other programming languages. In fact, my version has even been translated back into Lisp! Some AI researchers and engineers to this day credit this version of Eliza with their introduction to the "magic" of AI. Now, with the emergence of Siri, Apple's voice-activated personal assistant, people all over the blog-o-sphere are recalling my Eliza, and comparing it to Siri. Some have even hosted conversations between the two. Much more information on the history of Eliza can be found at The Eliza Genealogy Project.
For more information, send electronic mail to email@example.com