
English as a Framework: A Linguistic API for the Age of AI

Eva Arsirova

As artificial intelligence increasingly bridges the gap between human language and machine logic, a pressing question emerges: is English becoming the new "programming language" of our age? This post explores the analytic nature of English and its potential as a semantic interface for AI systems. Through a linguistics-informed lens, we investigate how English functions like a framework, comparable to code libraries and runtime environments, and the implications this carries for language models, prompt engineering, and digital communication.

Language has always been the most powerful tool for human cognition, culture, and connection. But in the new AI era, it is also becoming the interface between humans and machines. As large language models (LLMs) grow in complexity and capacity, the burden of clear and effective communication increasingly falls on the language we use. LLMs are being asked to process human language as if it were code, but natural language (especially English) is far more ambiguous, contextual, and culturally encoded than formal languages like Python or SQL. Among the world's languages, though, English - analytic, flexible, and globally adopted - has emerged as the de facto medium for instructing and interpreting machine learning systems. In this role, English is not just a communication tool but a linguistic framework: a layered system that, much like code, processes, structures, and executes meaning.

English: An Analytic Language with a Structural Advantage

From a typological perspective, English is classified as a moderately analytic language, relying primarily on word order and function words (like prepositions, articles, and auxiliary verbs) rather than inflection or complex morphology. This contrasts with synthetic languages (e.g., Latin, Ukrainian, Arabic, Finnish), which encode grammatical relationships through extensive inflection and word-formation rules.

Why does this matter for AI? The reliance on structure over form makes English comparatively well suited to machine parsing and natural language processing (NLP). It aligns with computational logic that depends on clear, predictable rules rather than fluid, inflected meaning. However, even though English is structurally a strong candidate to serve as a framework, other aspects of the language make this framework far more complex.
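To make the structural point concrete, here is a minimal sketch (assuming spaCy and its small English model en_core_web_sm are installed) showing how a parser can recover grammatical roles from word order alone:

```python
# English marks grammatical roles by position: swapping word order
# swaps subject and object without changing any word forms.
import spacy

nlp = spacy.load("en_core_web_sm")

for sentence in ["The dog chased the cat.", "The cat chased the dog."]:
    doc = nlp(sentence)
    roles = {tok.dep_: tok.text for tok in doc if tok.dep_ in ("nsubj", "dobj")}
    print(sentence, "->", roles)
```

The same two nouns appear in both sentences, but the nsubj and dobj labels flip with word order - exactly the kind of positional signal a synthetic language would instead bury in case endings.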

Binary Beginnings and the Evolution Toward Natural Language

At the foundation of computation lies binary code - a minimalist language designed for machines to process electrical signals through 0s and 1s. As programming evolved, intermediate layers - from assembly to Python to natural language prompts - were developed to make human-computer interaction more intuitive.
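A toy illustration of that layering, working upward from bits (the values in the comments are what Python actually prints):

```python
# The same message viewed at three levels of abstraction.
message = "Hi"

# Layer 0: what the machine ultimately stores - bits.
print(" ".join(format(b, "08b") for b in message.encode("utf-8")))
# 01001000 01101001

# Layer 1: a numeric encoding humans can still inspect.
print(list(message.encode("utf-8")))  # [72, 105]

# Layer 2: the high-level symbol we actually think in.
print(message)  # Hi
```

A natural language prompt such as "Say hi" sits one layer higher still: English that must be compiled down, step by step, toward those bits.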

Now, AI models represent a new middle layer: systems trained to translate human language (often English) into machine-executable logic. But this introduces a linguistic dilemma: are these models trained by people with a deep understanding of language's full complexity - its denotations, connotations, and contextual nuance?
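As a sketch of that middle layer, here is a hedged example using the OpenAI Python client to turn an English request into SQL; the model name, prompt wording, and customers table schema are all illustrative assumptions, and any instruction-tuned LLM could fill the same role:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

request = "Show me every customer who signed up last month."
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[
        {"role": "system",
         "content": "Translate the user's request into a single SQL query "
                    "against the table customers(id, name, signup_date)."},
        {"role": "user", "content": request},
    ],
)
print(response.choices[0].message.content)
```

English goes in and machine-executable logic comes out - with every ambiguity in the English request (which month? whose time zone?) silently resolved by the model.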

Denotation and Connotation: A Dual-Meaning Dilemma

To unpack this, we must revisit two key linguistic definitions:

  • Denotative meaning: The objective, direct definition of a word.
  • Connotative meaning: The emotional, cultural, or subjective associations attached to a word.

A model might recognize that “home” denotes a residence. But does it understand that to many, it connotes warmth, family, or even trauma?
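One way to see the gap is to sketch, purely hypothetically, what a connotation-aware lexical entry would have to store; nothing like this structure exists in standard training data:

```python
from dataclasses import dataclass, field

@dataclass
class LexicalEntry:
    word: str
    denotation: str                               # the dictionary definition
    connotations: dict[str, list[str]] = field(default_factory=dict)
    # connotations vary by community, register, and individual experience

home = LexicalEntry(
    word="home",
    denotation="a place of residence",
    connotations={
        "common": ["warmth", "family", "safety"],
        "speaker-dependent": ["trauma", "loss"],
    },
)
print(home.denotation)    # stable across speakers
print(home.connotations)  # anything but stable
```

The denotation field could be filled from a dictionary; the connotations field has no single authoritative source, which is precisely the problem.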

Dictionaries as a Data Source for Training AI Models: The Limits of Dictionaries

English dictionaries like Merriam-Webster contain hundreds of thousands of entries, yet they prioritize denotative clarity over connotative depth. While slang, regionalisms, and jargon may enter dictionaries eventually, there is often a significant lag between linguistic reality and lexicographic recognition.

| Dictionary Type | Approx. Entries | Coverage Notes |
| --- | --- | --- |
| Merriam-Webster Collegiate | ~225,000 | Core modern vocabulary |
| Merriam-Webster Unabridged | ~470,000+ | Includes archaic, technical, and rare terms |
| Merriam-Webster Online | 500,000+ | Continuously updated |

With over 1 million definitions, English averages approximately 2.2 meanings per word. However, high-frequency words like set, run, or go can have dozens, even hundreds, of distinct senses, which significantly complicates training data for NLP systems and LLMs alike.
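That polysemy is easy to probe with NLTK's WordNet interface (assuming nltk is installed and the wordnet corpus has been fetched via nltk.download("wordnet")); WordNet's sense counts run lower than a full dictionary's, but the skew toward common words is the same:

```python
from nltk.corpus import wordnet as wn

for word in ["set", "run", "go", "home"]:
    senses = wn.synsets(word)
    print(f"{word}: {len(senses)} senses, e.g. {senses[0].definition()!r}")
```

Every one of those senses competes for the same token during training, so a model must lean entirely on context to pick the right one.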

English as a Framework: A Linguistic API

When viewed through the lens of computational systems, English reveals a layered architecture, where each linguistic component maps to a corresponding coding construct and LLM function, forming a semantic framework much like a programming API:

| Linguistic Layer | Coding Equivalent | LLM Equivalent | Function |
| --- | --- | --- | --- |
| Phonology & Prosody | UI/UX Layer | Limited (more so in speech models like Whisper) | Sound, tone, emotional delivery |
| Morphology | Object Construction | Handled implicitly via tokenization | Word formation (prefixes, suffixes) |
| Syntax | Compiler | Captured statistically through token co-occurrence | Sentence structure rules |
| Semantics | Interpreter | Represented through embeddings in high-dimensional space | Meaning based on context |
| Pragmatics | Runtime Environment | Modeled via instruction tuning (e.g. ChatGPT as assistant) | Social context and intent |
| Lexicon | Standard Library | Embedded in token vocabulary and training corpus | Vocabulary and usage |
| Logic | Conditionals | Emerges from training (e.g. "if X then Y") | Reasoning and flow control |
| Discourse | API Integration | Supported through long-context modeling (attention windows, coherence) | Multi-sentence coherence |

Each word, phrase, or sentence in English passes through this multi-layered engine to generate meaning, mirroring how code compiles, interprets, and executes.
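The analogy can be made concrete as a schematic pipeline - not how any real LLM is built, just the table above expressed as code, with every layer name and rule illustrative:

```python
def tokenize(text):              # Morphology ~ object construction
    return text.lower().rstrip(".!?").split()

def parse(tokens):               # Syntax ~ compiler
    return {"tokens": tokens, "length": len(tokens)}

def interpret(tree, context):    # Semantics ~ interpreter
    tree["context"] = context
    return tree

def apply_pragmatics(meaning):   # Pragmatics ~ runtime environment
    polite = "please" in meaning["tokens"]
    meaning["intent"] = "request" if polite else "statement"
    return meaning

def understand(text, context=None):
    """English 'executes' through stacked layers, like code through a toolchain."""
    return apply_pragmatics(interpret(parse(tokenize(text)), context))

print(understand("Please close the window.", context="office chat"))
```

Each function stands in for one layer of the table; a real model blurs these stages together inside its weights, but the order of operations is a useful mental model.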

The Risk of Miscommunication in Natural Language Interfaces

While coding languages are strictly structured, living human languages are not. English is inherently:

  • Ambiguous
  • Context-dependent
  • Culturally variable
  • Rapidly evolving

This means “data in” is messy, and “data out” from AI can reflect or amplify that messiness, especially when models are trained without nuanced linguistic oversight. For instance, the difference between “slim” and “skinny” may seem minor, but their connotative load varies widely by culture, speaker identity, and context.

| Word | Denotation | Connotation |
| --- | --- | --- |
| Home | Place of residence | Safety, warmth, family |
| Cheap | Low price | Inferior, low quality |
| Slim | Thin | Healthy, attractive |
| Skinny | Thin | Unhealthy, weak |
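How much of that connotative contrast survives in a model's vector space? A hedged probe with sentence embeddings (this assumes the sentence-transformers package and the all-MiniLM-L6-v2 model; scores will vary by model) suggests the two near-synonyms land very close together:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
a, b = model.encode(["She looks slim.", "She looks skinny."])
print(f"cosine similarity: {util.cos_sim(a, b).item():.3f}")
```

A high similarity score means the model treats the pair as near-interchangeable, flattening exactly the connotative difference the table records.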

The Case for Linguists in AI

Just as engineers design hardware and data scientists refine algorithms, linguists must shape the language layer. Those involved in training, tagging, localizing, and deploying language models should deeply understand:

  • Regional variation
  • Pragmatic intention
  • Semantic nuance
  • Connotative diversity

Without this expertise, we risk building systems that overfit to surface meaning, failing to capture the subtle human dimensions of language.

Conclusion: Are We Overengineering?

We are rapidly moving toward a world where language is both the tool and the interface. But with every abstraction - each translation from prompt to model to machine - we introduce potential semantic loss.

The solution isn’t to reduce complexity but to honor it. That means involving linguists in model design, investing in broader lexicons and corpora, and treating English not as a monolithic code, but as a living, breathing framework with millions of edge cases.

“Who can say where the road goes? Only time.” - Enya

But one thing is clear: if we expect AI to understand us fully, we must first understand our own language systems better.
