The logic of domains: where do they come from and what do they do?
Part III of the series. The work of David Ribes and colleagues offers insights into how domains are organized and how the term has its roots in computing and policy circles.
Over the last few years, David Ribes and his colleagues at the University of Washington have produced some of the best and most searching studies of the relationship between “data science” and the “domains” (see the list at the end of this post). The methods in these papers range from the historical [1] and the autobiographical and ethnographic [2, 3] to discourse analysis [4]. All are about academic data-driven science, often in the realm of the physical sciences (rather than, say, academic data-driven biological science or corporate data science).
The top-line finding that gets most cited is about “prospecting” [4]: that the logic of data science leads to a kind of hunger to stretch the data scientific method to as many fields as possible. I will get to the “prospecting” finding in a subsequent post, along with [5], which is a rebuttal to this finding.
But even before we get to “prospecting,” there is a wealth of observations in the papers about the history of the term “domain,” its involvement with the development of software and AI, and how it keeps manifesting itself through the long 20th century.
Section 1 will describe what a domain feels like for a computer scientist or for a so-called domain expert.
Section 2 will describe the history of domains, from the term’s use in early AI research, then in software engineering, and then as a term to organize the computational sciences.
1. What is a domain?
“How I learned what a domain was” [2] is, in my view, the best introduction to the idea of domains. Ribes starts off by recounting his fieldwork in 2002 on the NSF-funded GEON project, which aimed to develop a “cyberinfrastructure for the geosciences.” In the early 2000s, the geosciences, like many other sciences, faced the data deluge. All of the data they collected could now be digitized, and with the capacities of computers increasing massively, new mathematical and statistical techniques could be brought to bear on this massive trove of digital data. But this data needed to be stored, processed, and shared among the many varieties of geoscientists, who also had to be trained in the techniques needed to mine and use it (from statistics to advanced programming and visualization).
GEON, then, was a project that brought together computer scientists and the many varieties of geoscientists (geologists, paleobotanists, geophysicists, etc.) to create a computing infrastructure (hence “cyberinfrastructure”) that would allow this data to be used and shared in as frictionless a way as possible.
It is here, in this setting, in his very first meeting, that Ribes first heard the term “domain” invoked and, as an ethnographer, realized that this term could really help him understand the social dynamics of his research site.
How was the term “domain” used? At first, Ribes thought that it was just a term to group different kinds of expertise. But once he asked a computer scientist what his domain was and received a blank stare in return,1 he realized that the term was an index for certain kinds of expertise but not others.
Thus,
To cut to the chase, most often calling something a domain, or someone a ‘domain scientist’ or ‘domain expert’ is something that computer, information or data scientists call some other group.
There are two things to unpack in this definition. First, “domain” is a category that points outward from certain technological fields to others. It is only computer scientists or programmers or information scientists or data scientists who call the other experts they work with “domain experts.” It only refers to “some form of worldly expertise or specialization other than data, information or computer science” (my emphasis).
Second, the computer scientists themselves are “domain independent.” While they are not entirely comfortable with this label (it can seem hubristic and impractical), they take seriously the idea that their job is to build a set of techniques that can be ported from domain to domain. One of the goals of computer scientists or programmers is to make the techniques relevant for domain X also relevant to domain Y.
It’s worth noting something else here. To be domain independent is not just about achieving a level of abstraction in your analysis. For instance, if you look at the websites of the American Sociological Association or the American Anthropological Association, you will see that they have many sections: religion, culture, science, technology, social movements. One could argue that these are “domains” to which sociologists or anthropologists apply their principles and theories, although this is not a term that they use at all.
But the computer scientists have a fundamentally different relation to their “domains” than the sociologists or anthropologists. According to Ribes, the whole point of characterizing something as a “domain,” alongside a set of other domains or subdomains, was to then build software tools that would allow data and code and concepts to travel across the boundaries of these domains. The domain and its associated experts are “downstream benefactors” of these tools. This is simply not true of sociologists or anthropologists, whose generalizable, domain-nonspecific concepts travel across different domains without any explicit downstream benefactors beyond other sociologists or anthropologists.
As Ribes puts it, “Domain differences are articulated and captured for the purpose of developing tools that intermediate those differences.” In other words, a boundary is being created by computer scientists to either demarcate a domain (geosciences) or to separate two domains from each other (geosciences from brain imaging). But the goal of the boundary is to create techniques that can work in either domain or to find a way to get a technique (say, a protocol for storing data) that was used in geosciences to be adapted for brain imaging. Ribes calls this “boundary work for its crossing.”
There is another interesting and potentially significant characteristic of domains: they are recursive and fractal [3]. This is best illustrated using one of Ribes’ examples:
For instance, GEON was “cyberinfrastructure for the geosciences,” but the term geoscience spanned broader than any university disciplinary department, with GEON tackling fields as diverse as geology, seismology, or paleobotany. In parlance, each of these was referred to as a domain as well and, as such, demonstrated undesirable difficulties in collaborating or sharing data with one another. The term domain can refer to a broad swath, top-level category, and its nested constituents. [3, p. 15]
So, not only is “geosciences” a domain for computer scientists seeking to build an online infrastructure for geoscientists, some of the subfields of the geosciences such as “geology, seismology, or paleobotany” are domains as well.
In practice, this means that the computer scientists building software tools for geoscientists are seeking to connect many different types of geoscientists who were presumably having trouble communicating with each other (meaning they did not share techniques or data with one another). On the other hand, a computer scientist who is only building infrastructure for geology might not consider paleobotany her domain.
In fact, Ribes points out that “geosciences” itself is a term that seems to have been made up by the computer scientists because it does not index to, and indeed, is “broader than any university disciplinary department.” Domains, in other words, are constructed by computer scientists to solve problems that they (and downstream others) deem serious enough to need solving.
2. Where do domains come from?
So far, we have learned that the term “domain” is only used by computer scientists and other information specialists and that it is fractal. But why do we have this terminology in the first place?
In “The Logic of Domains,” Ribes, Andrew Hoffman, Steven Slota, and Geoffrey Bowker highlight three different “figurations” in which the term domain played a role. By “figurations,” they mean something like a “style of reasoning”: a way of thinking promulgated by a specific group of experts, in relation to particular technologies and institutions, trying to solve a particular problem. The three sites of the use of “domain” are: early AI, software engineering, and science policy.
The authors go to great lengths to tell us that they are not telling the history of the term “domain” but I’m going to read their paper as a history anyway.
The first figuration in which “domain” is an important component comes from the incipient field of Artificial Intelligence. There were two paradigms of AI in its first three decades: general problem-solver-type systems based on heuristic search, and specific “expert systems” based on knowledge engineering. In both paradigms, AI builders made a distinction between a universal reasoning component that would be part of all systems and the specific knowledge components that an AI system needed to actually work in its specific context (i.e., “domain”). For example, the idea was that a problem-solving system for medical diagnosis would use its universal reasoning component to reason generally, but to make actual diagnostic claims, it would also draw on a more specific body of knowledge—i.e., “domain knowledge”—about medicine (about symptoms, medicines, tests, and so on).
Reconstructing the knowledge representation of the domain was an especially important activity for expert systems researchers, more so than for researchers in heuristic search.2 Expert systems researchers both invented—and debated—a variety of techniques, which they called knowledge elicitation or knowledge engineering, to extract domain knowledge from a particular site. These ranged from interviewing experts to conducting walkthroughs and interruption studies.
The position of this boundary between domain-independent mechanisms and domain-specific knowledge, and the question of which component mattered more for producing general-level intelligence (heuristic search researchers thought it was universal reasoning, while expert systems researchers thought it was domain knowledge), were objects of much debate in the first few decades of AI.
However, it was only with the take-up of expert systems research that there was an explicit discussion of the possibilities and limits of knowledge engineering. The general consensus is that expert systems “failed,” but a better interpretation is that, in an era of PCs and mass-produced software, they simply became ubiquitous: they stopped being called “expert systems” and became regular software programs that people could draw on to do specific tasks. (Where they actually failed was in living up to the web of hype that had been spun around them, namely that they could function just as well as human experts.)
As a result, the techniques of knowledge engineering moved into the software design process, the second figuration, where they gained new meanings and valences. Software engineers were concerned with building software for different types of work; their goal was not to build intelligent programs that could work across contexts (like the AI researchers) but to allow software components to be reused across contexts. Software engineers thus focused more on “activities” in a domain than on “knowledge.” Just like AI researchers, they utilized a wide variety of methods to render a domain legible to software intervention, which they labeled “domain analysis.” Domain analysis is done not just at the point of production of software but at every point of software deployment, as software companies use a variety of techniques, including managing client expectations, to make their software components usable in multiple contexts.
From here, the term “domain” percolated into its third figuration in the late 1990s and 2000s: in science policy and the discourse around the organization of science. These were unprecedented times: computer power had exploded, “data”—natural and artificial—was everywhere, and there was a huge interest in building infrastructures for the particular sciences (this is the origin of Ribes’ first fieldwork site dedicated to building “cyberinfrastructure for the geosciences”).
But what was the place of the infrastructure builders and maintainers—who tended to be programmers and computer scientists—in these data-driven sciences? Previously, the distinction that was often applied was the one between the “pure” and “applied” computing sciences. But policy-makers (which is to say, elite scientists and computer scientists and others running organizations like the NSF) argued that it would be best if computing researchers worked with other established experts in “application domains” as “intellectual equals.” The best computing research, then, was that undertaken in close collaboration with a domain. This way, it would both benefit concrete stakeholders and produce a body of domain-independent knowledge and techniques that could be applied to other domains.
Unlike the case of AI and software engineering, where the participants had vigorously debated the boundary between domain knowledge and domain-independent technique, the shift to domain-based policy talk was mostly unreflective. Yet, funding agencies’ adoption of the logic of domains made it a fertile ground for “technology development efforts that serve to facilitate the analysis, representation and intermediation of domains.”
3. Discussion
So what does this history tell us in terms of the relationship between domain experts and data/computing experts?
First, it should be clear that the logic of domains is not the same thing as the sciences searching for generalizable mechanisms. As I mentioned in Section 1, one can think of many of the social sciences as seeking generalizable mechanisms that apply to specific contexts. So sociology is the study of social mechanisms that might be used to explain the prevalence of religious beliefs or the success of a social movement. But what makes the logic of domains different is that computer scientists invent domains so that they can then build software tools that allow data and code and concepts to travel across the boundaries of those domains.
This framework means that computer scientists or programmers can have multiple, conflicting, and simultaneous attitudes towards a domain. A domain expert can be an equal collaborator, an unreflecting user, a client, as well as the beneficiary of universal techniques.
This also explains to some extent why the term “domain” is used in a recursive fashion. So the umbrella term “biology” can be cast as a domain, but so can the much more specific field of “genomic epidemiology.” A data-driven molecular biologist, not native to the field of genomic epidemiology, who is interested in building software tools for that field, is much more likely to call it a domain and a genomic epidemiologist a domain expert. But that same person might want to be called a “biologist” by his fellow molecular biologists. When speakers use the term “domain,” it signifies their relationship to the particular traditions or contexts they work in and the task they are engaged in.
Last but not least, the distinction between the domain-nonspecific and the domain-specific expert has changed with time. As Ribes writes in [3],
During my fieldwork in the 2000s, the category “domain scientist” was quite definitive: in my study of GEON, saying domain scientist was equivalent to saying geoscientist, and everyone else was a computer scientist or technologist (except me—more on this below). Today the distinction is not nearly so stark. Many domain scientists identify as being data scientists: a data scientist may be a geoscientist.
This seems important to me and raises other questions. If a data scientist is just a geoscientist who works with high-volume data and writes programs to parse it, what work is the term “data scientist” doing anyway? But Ribes argues that “the term domain still does work, perhaps more work, by defining or constituting a terrain between or beyond individual domains as the ultimate target for data science.”
This last point sets up the “prospecting” paper [4] very nicely: “prospecting” means reconfiguring the structure of domains and discovering and inventing problems within and between domains that can be solved by programmers and computer scientists. We will get into this more in a subsequent post (along with a rebuttal).
Papers discussed or mentioned in this post:
[1] Ribes, David, Andrew S. Hoffman, Steven C. Slota, and Geoffrey C. Bowker. "The logic of domains." Social Studies of Science 49, no. 3 (2019): 281-309.
[2] Ribes, David. "How I learned what a domain was." Proceedings of the ACM on Human-Computer Interaction 3, no. CSCW (2019): 1-12.
[3] Ribes, David. "STS, meet data science, once again." Science, Technology, & Human Values 44, no. 3 (2019): 514-539.
[4] Slota, Stephen C., Andrew S. Hoffman, David Ribes, and Geoffrey C. Bowker. "Prospecting (in) the data sciences." Big Data & Society 7, no. 1 (2020): 2053951720906849.
[5] Tanweer, Anissa, and James Steinhoff. "Academic data science: Transdisciplinary and extradisciplinary visions." Social Studies of Science 54, no. 1 (2024): 133-160.
1. For the computer scientist, “domain” was an index for the other experts he worked with on the GEON project, although, when pressed, he admitted that if topics inside computer science could be considered “domains,” then his domain was “knowledge representation.”
2. Ed Feigenbaum, a pioneer of expert systems research, famously argued that “the problem-solving power exhibited by an intelligent agent’s performance is primarily the consequence of its knowledge base, and only secondarily, a consequence of the inference method employed.”