Why You Should Care That Artificial Intelligence Can Lie (Part 2)



If you haven’t already, please read Part 1 of “Why You Should Care That Artificial Intelligence Can Lie” before continuing, because I’m picking up in this post where I left off in the last one.


Again, the concern here is whether or not robots (AI systems) can lie. As I demonstrated in Part 1, there are many good reasons to believe that they can, though we’ll need to consider the phenomenon of robot “dissimulation” more carefully to determine whether it is (or isn’t) structurally and meaningfully equivalent to the human phenomenon of “lying.” My argument in Part 1 was that the poker-playing AIs Libratus and Pluribus– and perhaps also the Facebook bots “Bob” and “Alice”– represent significant progress(?) in AI capabilities and the rapid approach of machines toward closer approximations of what we call human “consciousness.”


The potential capacity of intelligent machines to dissimulate or “lie” is worrisome, first, because that capacity is presumed by most humans to involve uniquely human (cognitive, emotional, motivational, or “spiritual”) powers. If it were possible for a robot to lie about being a robot, for example, not only would that seriously disturb our understanding of the so-called “unique” capacities of human consciousness, but it might also pose a serious threat to human existence as we now know it. (See my discussion of CAPTCHA systems in Part 1. Also check out Kelly Pendergrast‘s amazing essay about CAPTCHA, “Who Goes There,” published in Real Life since I wrote Part 1.)


This robot-capable-of-lying-about-being-a-robot (henceforth, RoLABAR), if it came to be, would likely signal the inception of what is now commonly referred to as the “technological singularity,” a moment at which the speed of technological advances would be so rapid, and their impact so deep, that “human affairs, as we currently know them, could not continue” (as John von Neumann famously predicted in an informed hunch later echoed by Ray Kurzweil). RoLABAR would also likely be the catalyst for an “intelligence explosion”– anticipated by I.J. Good in 1965 but unprecedented in the history of our Universe so far– as machines that were not only intelligent but also conscious would presumably be capable of creating new intelligences, much as human intelligence created intelligent machines, only the AGI/RoLABAR would do it faster, more efficiently, and with access to vastly more resources.


RoLABAR is not a current capability of extant AI (as far as we know), but there are many indications that its arrival is imminent. Before diving head-first into that possibly-dystopian morass, though, let’s first take a step back and consider more carefully the relationship between consciousness and the capacity to lie. We’re going to take an unusual philosophical route in what follows, one that does not traverse the well-worn terrain of traditional analytic Philosophy of Mind scholarship– somewhat myopically focused, as it is, on functionalist and psychophysical discourses– but rather passes through the exotic countryside of Jean-Paul Sartre and Jacques Derrida.





Though there is (unfortunately) nothing about robots, there is a lot about lying in Jean-Paul Sartre’s Being and Nothingness. In fact, both the practice and the structure of lying are essential to Sartre’s articulation of human being and consciousness. 


Sartre’s most explicit treatment of lying is in his section on “bad faith” (mauvaise foi), a phenomenon that he initially defines as “a lie to oneself.” Objects and non-human animals are, on Sartre’s account, beings-in-themselves, entirely determined by their facticity. Their being is “in-itself” by virtue of being fully and completely “given” in the fact of the being-in-itself’s very existence. Humans, on the other hand, are beings-for-themselves, capable of transcending the givenness of their situation and freely choosing for themselves what they will be. The en-soi/pour-soi distinction sets the scene for Sartre’s later analysis of the unique ontological “metastability” of human being, which (in Sartre’s formulation) is always being-what-it-is in the mode of not-being-it or– in what amounts to the same thing, structurally– not-being-what-it-is in the mode of being-it.


The examples that Sartre provides of “patterns of bad faith” are several and storied– the woman on the date, the garçon de café, the sad person who “makes an appointment to meet with his sadness later,” and the awkward conversation between the “champion of sincerity” and the “homosexual”– but each case presents a strikingly identical account of the structure of bad faith as a lie to oneself. (Regrettably, Sartre’s example involving the “champion of sincerity” assumes that all homosexuals are pederasts, but I think this example still stands up as a solid structural example of bad faith even when that pernicious assumption is removed.) These cases are structurally identical inasmuch as, in each, we see a human being telling themselves the same lie, one that denies the structure of their being-for-itself as both a facticity and a transcendence. 


[Interestingly, although the antecedent of the bad faith lie can take one of two forms– either “I am entirely determined by my facticity and incapable of transcending the givenness of my situation.” or “I am a pure freedom, unencumbered by the givenness of my situation, and capable of completely transcending its limitations.”– the consequent of the bad faith lie takes only one form, on Sartre’s account, namely, “I am not responsible.”]


Sartre’s articulation of these “patterns of bad faith” echoes his claim, early on in Being and Nothingness, that “nothingness lies coiled in the heart of being– like a worm.” Human being, human consciousness, and human existence are, on this account, entirely shot through with negativity, negation, and nihilation. The capacity to lie is not a bug, but a feature, of the ontological structure of a pour-soi. It involves the being-for-itself’s capacity to consider itself otherwise, to think of itself as being (potentially) something other than what it (actually) is.


Although Sartre is primarily concerned with articulating the ontology of human being, we could easily translate his account– I would argue, without loss– into epistemological terms, to describe “mind,” “consciousness,” or (my preferred lexical domain) “intelligence.” Retranslating Sartre in this way, one can see that what distinguishes mind/consciousness/intelligence (pour-soi) from non-conscious or unintelligent objects (en-soi) is a “doubling” capacity: the ability to recognize oneself as (a) being and then to consider oneself as being otherwise. This ability to “consider oneself otherwise” necessarily involves (imaginatively, speculatively, or practically) negating the factical givenness of one’s existence and then positing a transcendent alternative. It requires a consciousness capable of considering its being/existence both as it (really) is and as it (really) is not (but could be).


It seems to me that this is precisely what the AI systems Libratus and Pluribus have demonstrated the capacity to do. One might object that these AI systems were “coded” to lie, but one would be wrong. Libratus/Pluribus were not coded to lie; they are machine learning systems, trained through self-play, that were designed to learn how to play, and to win, at poker… which just so happens to be a game in which lying/bluffing is a necessary skill to learn for successful play.


Libratus/Pluribus were no more “coded” to lie than you and I are. They learned how to do so for (arguably) the same reasons, and by virtue of the same cognitive mechanisms, that we learn how to lie.
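
To make “learned, not coded” concrete, here is a minimal sketch of counterfactual regret minimization (CFR), the family of self-play algorithms on which Libratus and Pluribus were built, applied to Kuhn poker, a solved three-card toy game. The example and all of the names in it are mine, and the real systems add abstraction and real-time search far beyond this sketch, but the essential point survives miniaturization: the word “bluff” appears nowhere below.

```python
# A minimal sketch, not either bot's actual codebase: counterfactual regret
# minimization (CFR) applied to Kuhn poker. All names here are mine.
# Note that nothing below encodes bluffing.
import random
from collections import defaultdict

ACTIONS = "pb"  # p = pass/check/fold, b = bet/call

class Node:
    """Regret and strategy accumulators for one information set."""
    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def strategy(self, reach):
        """Current strategy via regret matching; accumulates the average."""
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        strat = [p / total for p in positive] if total > 0 else [0.5, 0.5]
        for i in range(2):
            self.strategy_sum[i] += reach * strat[i]
        return strat

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]

nodes = defaultdict(Node)

def cfr(cards, history, p0, p1):
    """Expected utility for the player to act; updates regrets on the way."""
    player = len(history) % 2
    opponent = 1 - player
    if len(history) > 1:  # terminal payoffs, from the acting player's view
        higher = cards[player] > cards[opponent]
        if history[-1] == "p":
            if history == "pp":            # check-check: showdown for 1 chip
                return 1 if higher else -1
            return 1                        # opponent folded to a bet
        if history[-2:] == "bb":            # bet-call: showdown for 2 chips
            return 2 if higher else -2
    info_set = str(cards[player]) + history
    strat = nodes[info_set].strategy(p0 if player == 0 else p1)
    utils, node_util = [0.0, 0.0], 0.0
    for i, action in enumerate(ACTIONS):
        if player == 0:
            utils[i] = -cfr(cards, history + action, p0 * strat[i], p1)
        else:
            utils[i] = -cfr(cards, history + action, p0, p1 * strat[i])
        node_util += strat[i] * utils[i]
    for i in range(2):  # regret is weighted by the opponent's reach probability
        nodes[info_set].regret_sum[i] += (p1 if player == 0 else p0) * (utils[i] - node_util)
    return node_util

deck = [0, 1, 2]  # 0 = Jack, 1 = Queen, 2 = King
for _ in range(100_000):
    random.shuffle(deck)
    cfr(deck[:2], "", 1.0, 1.0)

# Info set "0" = first to act while holding the worst card (the Jack).
print("P(bet | Jack, first to act):", nodes["0"].average_strategy()[1])
```

The printed probability lands at a positive value (in equilibrium, anywhere up to 1/3): the system bets its worst card at a measurable frequency because fold equity is worth chips– which is to say, it learned to bluff because bluffing wins.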


Before turning to Derrida, allow me a brief interlude about what distinguishes poker bluffing as a unique capability of both human and machine minds:




Because poker is fundamentally a game of incomplete information, intentional deception is an absolutely requisite part of successful play. Combined with well-honed mathematical and strategic proficiency (often referred to in poker as GTO, or “game theory optimal,” play), the ability to successfully bluff is what distinguishes “skilled” poker players from those who play poker as a game of “chance.” As any skilled poker player will tell you: if you sit down at a table “hoping to get good cards,” you woefully misunderstand the game you are playing. What I’m arguing herein is that the ability to bluff is what distinguishes poker-playing AIs like Libratus and Pluribus from AIs that have successfully mastered other strategic games like Jeopardy, chess, or Go.


I group chess in with Jeopardy and Go with some hesitation here, because people do sometimes refer to the phenomenon of “laying traps” in chess as “chess bluffing.” Although the chess bluffer’s game play very closely resembles the poker bluffer’s game play, I would argue that these are categorically different phenomena. There isn’t room to articulate a full argument for the difference here, but I base the distinction primarily on one critical difference in the games being played: chess is a game of complete information, while poker is a game of incomplete information.
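
The difference can even be given numbers in the smallest nontrivial case. In Kuhn poker– the solved three-card toy game used in the code sketch above, my example and not a game either bot played– the first player’s game-theory-optimal strategies form a one-parameter family: bet the worst card (the Jack) with probability α for any α between 0 and 1/3, bet the best card (the King) with probability 3α, and never bet the Queen. The second player’s optimal response is unique, and it requires a bluff: facing a check, she must bet the worthless Jack exactly one third of the time. Whatever α one picks, the value of the game to the first player is −1/18. Bluffing at a positive frequency, in other words, is not a stylistic flourish of optimal play in an incomplete-information game; it is constitutive of it. Nothing comparable holds for chess: with complete information, a “trap” sits in plain view on the board and succeeds only if the opponent fails to calculate, not because anything was concealed.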


Now, back to philosophy….




Some four decades later, Jacques Derrida– ever the closeted Sartrean, despite his protests– translated what amounts to Sartre’s bad faith phenomenon into the lexicon of mind/consciousness in his essay “How To Avoid Speaking.” Derrida writes (emphasis added):

A conscious being is a being capable of lying, of not presenting in speech that of which it yet has an articulated representation; a being that can avoid speaking. But in order to be able to lie, a second and already mediated possibility, it is first and more essentially necessary to be able to keep for (and say to) oneself what one already knows. To keep something to oneself is the most incredible and thought-provoking power. But this keeping-for-oneself– this dissimulation for which it is already necessary to be multiple and to differ from oneself– also presupposes the space of a promised speech, that is to say, a trace to which the affirmation is not symmetrical.

Derrida goes on to consider the structural and autodeconstructive “lies”– “traps” (to use the language of chess) or “bluffs” (to use the language of poker)– involved in the articulation of “negative theology,” or speaking of what is (the Divine) exclusively in terms of what it is not. Secrets and lies are both involved, on Derrida’s account, in any conscious capacity to “avoid speaking” the truth about what one knows to be true. I would argue, further, that secrets and lies are, for Derrida, fundamental components of any information system that involves conscious agents, though “information system” is decidedly not Derrida’s mise en scène. He prefers “text.”

“Secrets,” and their structural correspondent “lies,” are manifest as necessary epiphenomena of all information systems that involve conscious agents, with or without the voluntaristic intervention of so-called “free agents” themselves. This is because, on Derrida’s account, any information system of communicable meanings, qua “system of communicable meanings”– which includes not only physical book-bound “texts” or other writings composed of iterable graphemes, but also any spoken language of iterable phonemes, or socio-political system of iterable subject positions, or economy of iterable values, or justice system of iterable laws, or (in today’s terms) digital systems of iterable bits and bytes– is necessarily context-dependent for its meaning(s). All these systems have margins; all of those margins are re-writable. Any iterable, context-dependent meaning system is an “open system,” which leaves open the possibility of decontextualization and recontextualization– of (in Derrida’s formulation) “traces” and “remainders,” excesses and deficiencies, interpretations that reify or reinforce the integrity of the contextual structure and interpretations that undermine (or auto-deconstruct) those same contextual structures.

In short, we ought not overlook the insight revealed in Derrida’s passage above from “How To Avoid Speaking,” which straightforwardly claims that lying (being capable of speaking of what is in terms of what it is not) and secret-keeping (understanding how to “avoid speaking”) are defining capacities of consciousness.


Mainstream philosophy of mind/consciousness really ought not overlook deconstruction, which Derrida develops throughout his corpus and which so carefully and brilliantly articulates the manner in which all systems of iterable meaning autodeconstruct– that is, the manner in which all systems of iterable meaning produce dissimulation as an epiphenomenon. Any conscious (or “free”) agent participating in a system of iterable meaning will be defined, qua “conscious participant in that system,” as not only capable of dissimulating, but conscious of its capacity to dissimulate.

Derrida goes on to ask (immediately following the passage excerpted above): how are we to ascertain absolute dissimulation? In context, he clearly means for the answer to this question to address “all the boundaries between consciousness and the unconscious, as between man and animal and an enormous system of oppositions.”  But it’s that last clause– “and an enormous system of oppositions”— that brings us back around to considering the possibility of another form of consciousness that is neither human, nor animal, nor the imagined “unconscious” part of the consciousness humans or non-human animals may possess. 



And now we’ve come full circle. If AI systems like Libratus/Pluribus have learned how to dissimulate– in a way that is practically indistinguishable from human “lying”– then we are well within striking distance of RoLABAR.

Contemporary mainstream/analytic Philosophy of Mind has focused, in a somewhat myopic and prejudicially humanist manner, on the psychophysical properties and capacities of “consciousness,” setting as its default limit-case the possibility (or non-possibility) of reverse-engineering the human brain.  Even the brief and summary exposition of Sartre and Derrida above demonstrates that, when it comes to determining what may qualify as “consciousness,” the set of questions that currently occupy mainstream/analytic Philosophy of Mind scholarship may be woefully inadequate for assessing at least one, arguably essential, double-capacity of human consciousness as we know it: namely, the ability of a consciousness to concurrently understand itself in both its positive (extant) and negative (potential) being. 


[Caveat: This is a gross overgeneralization, of course, but when one looks at debates in contemporary Philosophy of Mind research, one finds that they are dominated by (1) a preoccupation with the so-called (17thC) “mind-body” problem, (2) theories that defend mind-body dualism (psychophysical parallelism, functionalism, occasionalism, interactionism, property dualism, dual aspect theory, or experiential dualism), which are largely self-referential, (3) the occasional monistic theory of consciousness, which almost never references Hegel (or any philosopher before 1920), and which often gets filed under the generic Philosophy of Mind category “Mysterianism,” (4) social scientific accounts derived largely from behavioral psychology, cog-sci functionalism, or some other version of “non-reductive” physicalism, and finally (5) “enactivist” theories, which propose an alternative to the traditional mind-body dualism by emphasizing the interaction of the (physical) brain with its (mindful) environment, but which tend to run into problems accounting for the distinction, if any, between the physical and non-physical operations of mind. The last of these (5, “enactivist accounts”) is perhaps the only field of contemporary Philosophy of Mind research that regularly takes into account the contributions of phenomenological and existential philosophers, though the focus of that sub-field is almost exclusively on Merleau-Ponty.

Despite the limitations of these approaches, I think contemporary analytic Philosophy of Mind is a valuable, fascinating, and generative field of analysis and, to the extent that it has served as a catalyst for real-world advances in cognitive and neuroscientific research, it has been a tremendously beneficial resource to the philosophical community. Nevertheless, as all philosophers know, the value of asking the right questions far exceeds that of providing the right answers.]

My argument is that, when trying to assess what constitutes mind or consciousness, we would be well-advised to refocus our attention on a different capacity than those traditionally considered by Philosophy of Mind scholars, namely, the capacity to lie, to misrepresent, to dissimulate, to keep secrets, to remain silent, or (in Derrida’s formulation) to “avoid speaking.”  


Because this is a current capability of AI. Moreover, I’m not convinced one can identify a meaningful difference between AI lying and human lying that does not assume some kind of human mind-magic.


We now live in a world in which the development of AI capabilities is progressing so rapidly, and its impact is so deep, that the emergence of something like RoLABAR is not just a figure of sci-fi imagination anymore, but rather a very real, very imminent possibility. We must reorient our philosophical questions about the mind to more appropriately assess what counts as “consciousness” (or “agency”) in a broader “functionalist” sense that is not reductively psychophysical.  


If we neglect to do so, we will not only become the 21st C. case-study for Sartre’s “patterns of bad faith,” but we will no doubt also be in danger of not recognizing AGI/RoLABAR when it arrives on the scene. 
