A dinger heard around the world: Racial slur or rocky evidence?

During a broadcast of a Colorado Rockies baseball game on Sunday night, viewers could hear a fan yelling what sounded like the n-word while Lewis Brinson, a black outfielder for the Miami Marlins, was at bat. The game's commentator immediately apologized to the audience, calling the language "horrible." The Rockies later issued a statement expressing "disgust" at the language and promising that any fan found using such language would be banned from attending games.

Later, some viewers pointed out that the man in the video appears to be yelling and waving toward the Rockies' mascot, named "Dinger," not toward the batter's box where Brinson was hitting. Others remain skeptical that the fan was actually calling for the mascot.

These responses rest on two completely different interpretations of what happened: either the man said a racial slur, or he said the name of a mascot. To some, this is an unacceptable instance of racism. To others, it is a clear case of the media creating a story out of nothing.

On Monday morning, Steve Staeger, a local reporter, interviewed the man, who said he had not used a slur and had been trying to call the mascot over for a photo with his family.

With video of the incident readily available, how can this event be so confusing and divisive?

Why these words sound so similar

The n-word and "Dinger" are so phonetically similar that it is very easy to perceive one as the other.

Both words have three consonants and two vowels. Their pronunciations differ only in the first and second consonants, and in each position the two sounds are closely related.

The first consonant in both words is a voiced alveolar. Voiced means that the vocal folds vibrate while the sound is produced,1 and alveolar means that the tip of the tongue touches or comes close to the ridge just behind the upper front teeth.

Position of the tongue for a voiced alveolar consonant, such as D or N. (Image source)

The difference between the D of "Dinger" and the N of the n-word is that D is a voiced alveolar stop, whereas N is a voiced alveolar nasal. Stop means that the airflow is completely blocked and then released, and nasal means that the air escapes through the nose.

The second consonant in both words is a voiced velar. Velar means that the back of the tongue is raised against the soft palate, at the back of the roof of the mouth. The ng in "Dinger" and the g in the n-word are also nearly identical.

Position of the tongue for a voiced velar consonant, such as "NG" or "G." (Image source)

The difference, again, is that one is a nasal and the other is a stop, only this time the order is flipped: "Dinger" goes from a stop to a nasal, while the n-word goes from a nasal to a stop.

Although these words have drastically different meanings, they sound extremely similar. If they were crayon colors in the 120-count box, they'd sit right next to each other. That is why people are still arguing on the internet about what the man was saying, and it is why the announcer immediately apologized, even if the man was actually saying "Dinger."

But this isn't the only reason the Rockies incident was confusing.

The nature of video

Despite the existence of audio and video recordings of the incident, people have reached different conclusions about what happened.

Although video recordings are often considered indisputable evidence, they aren't immune to framing. What happened before and after the video was taken (or before and after it was clipped), as well as what was happening outside the literal rectangular frame, leaves room for ambiguity and interpretation.

There is a lot of context missing from this (and every) video. For example, casual viewers on the internet might not know the name of the Colorado Rockies' mascot. And even if you did know the mascot's name, the mascot is just out of frame in most of the widely shared clips, so it might not even occur to you.

In her book On Photography, philosopher Susan Sontag discusses the flaws in treating images as reality or as incontrovertible evidence. To Sontag, photos and videos are inherently incomplete, detached representations of a moment. Because of this, she argues, they should not be seen as incontrovertible evidence. She writes:

The ultimate wisdom of the photographic image is to say: “There is the surface. Now think — or rather feel, intuit — what is beyond it, what the reality must be like if it looks this way.” Photographs, which cannot themselves explain anything, are inexhaustible invitations to deduction, speculation, and fantasy…Strictly speaking, one never understands anything from a photograph.

Whenever we watch a video or see a photo, it often feels like a substitute for being there and witnessing the event in person. But it's important to remember that, since videos and photos are inherently framed, we are drawing conclusions from imperfect information. Even if we had been there in person, our limited perceptions would still frame our experience.

Because we interpret videos and photos within our individual worldviews, our understanding of them feels like obvious truth. We often see racist videos on the internet, so when a video of a person yelling a word so similar to a racial slur appears in our feed, it's easy to accept it as the full story: it feels true. And since we don't have unlimited time or resources to research the validity of every video we see, we often run with that unexamined gut instinct.

Just because our understanding of videos and photos isn't complete doesn't mean our interpretation is unimportant. But when we remember that photos and videos are a small slice of a much more complex reality, it's easier to understand how another rational person can view the same media and interpret it so differently.


Because of the phonetic similarity between the n-word and "Dinger," and the inherent framing of video, people quickly reached very different conclusions about what had happened at the Colorado Rockies game. On top of the phonetic and photographic ambiguity, racism is a deeply emotional topic, which lends itself to quick, passionate responses.

When we encounter something that stirs our emotions, like racism, our brains immediately place the new information within our pre-existing worldview to create a comprehensible story. This often leads us to draw incomplete conclusions based on what feels intuitive rather than on all of the available information.

Keeping in mind that our immediate conclusions are not objective truth can help us take in a wider range of information and, hopefully, better understand the world around us.

Further reading

Here is the longest clip of the video we could find:

See this thread for some information from an interview with the attendee:

1 For a demonstration of this, put two fingers on your Adam's apple and pronounce S (a voiceless phoneme) and then Z (a voiced phoneme). You will feel the vocal folds vibrate with Z, but not with S.

What do you think? Give us feedback by emailing info@narrativesproject.com. If you're enjoying our analyses, follow us on Twitter, Instagram, and Facebook for more.
Anna Tyger, Shaun Cammack, and Sofia Sedergren-Booker August 9, 2021