Thursday, 2 October 2025

Surprising Truths About the Future of Databases

Introduction: The Database is Not What You Think It Is

For decades, the easiest metaphor for a database has been the digital filing cabinet—or perhaps a super-powered spreadsheet. It's a place where we neatly store structured information, organise it into tables, and retrieve it when needed. This model has been the backbone of modern IT, a passive utility for holding the clean, orderly data that powers our applications.

That traditional view is being completely upended. Driven by the explosive demands of AI, big data, and cloud computing, the database is undergoing a radical, counter-intuitive transformation. The database is no longer a passive container; it has become an active, intelligent, and highly specialised engine at the core of our digital infrastructure. This article will reveal the most surprising truths about the modern database landscape and what they mean for the future of technology.

"NoSQL" Doesn't Mean "No to SQL"

One of the biggest misconceptions in the database world revolves around the term "NoSQL." It's often interpreted as a wholesale rejection of SQL (Structured Query Language), the standard for relational databases. However, the term originally stood for "Not only SQL," signifying a move to embrace database models beyond the traditional relational structure, not to abandon SQL entirely.

While early NoSQL databases diverged from SQL to achieve massive scalability and flexibility, the industry is now seeing a powerful convergence. As noted in Database Trends and Applications, SQL is "back and possibly more vital than ever." Its declarative power and widespread familiarity are too valuable to discard: SQL is backed by decades of tooling, a massive global talent pool, and a proven ability to handle complex queries. Modern systems increasingly blend the horizontal scalability of NoSQL architectures with SQL's expressive query power, giving developers the best of both worlds.

Pure Blockchains Are Actually Terrible Databases

Blockchain is often hailed as a revolutionary new type of database, promising unparalleled security and decentralisation. While its cryptographic linking of records creates an immutable ledger, a pure blockchain is poorly suited to the demands of a general-purpose database. The two are architected to solve entirely different problems: a traditional database is optimised for rapid, flexible querying, while a pure blockchain is optimised for decentralised, trustless verification.

A Wikipedia entry on the topic offers a stark assessment: compared with a traditional database, a pure blockchain has essentially no querying ability, and doubling the number of nodes quadruples network traffic with no improvement in throughput, latency, or capacity.

The practical solution is not to replace databases with blockchains, but to augment them. The emerging concept of a "blockchain-based database" involves taking a traditional, high-performance database and integrating blockchain features like data immutability, integrity assurance, and decentralised control. This hybrid approach delivers cryptographic trust without sacrificing essential database functionality.
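
To make "cryptographic linking of records" concrete, here is a minimal Python sketch of an append-only, hash-chained log, the core idea these hybrid systems borrow from blockchains. It is a toy model for illustration, not any particular product's implementation.

    import hashlib
    import json

    def record_hash(record: dict, prev_hash: str) -> str:
        # Hash the record together with the previous entry's hash,
        # chaining every entry to all the entries before it.
        payload = json.dumps(record, sort_keys=True) + prev_hash
        return hashlib.sha256(payload.encode()).hexdigest()

    class HashChainedLog:
        def __init__(self):
            self.entries = []  # list of (record, hash) pairs

        def append(self, record: dict) -> None:
            prev = self.entries[-1][1] if self.entries else "0" * 64
            self.entries.append((record, record_hash(record, prev)))

        def verify(self) -> bool:
            # Recompute every hash; any tampering breaks the chain.
            prev = "0" * 64
            for record, stored in self.entries:
                if record_hash(record, prev) != stored:
                    return False
                prev = stored
            return True

    log = HashChainedLog()
    log.append({"account": "A", "amount": 100})
    log.append({"account": "B", "amount": -40})
    print(log.verify())                    # True
    log.entries[0][0]["amount"] = 999      # tamper with an old record
    print(log.verify())                    # False: tampering detected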

The "One Database to Rule Them All" Era Is Over

For a long period, relational database management systems (RDBMS) dominated almost all large-scale data processing applications. This "one-size-fits-all" model created a landscape where a single type of database was forced to handle every kind of workload, from transaction processing to analytics.

That era is definitively over. A 2025 trends report highlights the "demise of general-purpose legacy systems" and the corresponding "rise of specialised engines." Instead of a single, monolithic system, we are moving toward a diverse ecosystem of databases, each engineered to solve a specific problem with maximum efficiency. Examples of this specialisation include:

  • Graph databases: Built for highly connected data where relationships are key, like mapping a professional network on LinkedIn or detecting complex fraud rings.
  • Time-series databases: Optimised for data where every point has a timestamp, essential for tracking millions of IoT sensor readings or the fluctuating price of a stock second-by-second.
  • Vector databases: Designed to store mathematical representations (vectors) of data, powering the 'semantic search' in modern AI applications, allowing you to search by meaning, not just keywords.

Your Next Database Might Live Inside Your App

When we think of a database, we typically envision a separate server that applications connect to over a network. However, a powerful and surprisingly common architecture flips this model on its head: the embedded database. An embedded database is tightly integrated with the application it serves, running as an in-process library rather than a standalone server.

This approach is more prevalent than you might think. SQLite, an embedded database, is the most widely deployed SQL database engine in the world, running inside countless operating systems, web browsers, and mobile applications. The trend is accelerating with new, high-performance embedded engines like DuckDB, which is described as being "ideal for local data analysis...without the need for a server." The payoff is complex data processing directly on client devices, which reduces infrastructure complexity, eliminates network latency, and enables robust offline operation.
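
As a minimal sketch of that in-process model, the snippet below runs an analytical query with DuckDB's Python package; there is no server to start, and the CSV filename is a placeholder.

    import duckdb  # pip install duckdb; the engine runs inside this process

    con = duckdb.connect()  # in-memory database; pass a filename to persist

    # DuckDB can query a CSV file directly, with no separate load step.
    rows = con.execute("""
        SELECT product, SUM(amount) AS total
        FROM 'sales.csv'              -- placeholder file
        GROUP BY product
        ORDER BY total DESC
    """).fetchall()
    print(rows)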

Databases Are Becoming Active Partners in AI

The relationship between AI and databases is evolving from a simple one—where the database just stores training data—to a deeply integrated partnership. Two major trends are driving this shift.

The first is the rise of databases built specifically for AI workloads. Vector Databases are a prime example, designed to store and query high-dimensional vector embeddings. These systems are a critical component for implementing Retrieval-Augmented Generation (RAG), a technique that allows Large Language Models to pull in domain-specific information and provide more accurate, context-aware responses.
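
A compact sketch of the RAG pattern follows. The embed() and llm() callables and the vector_db object are hypothetical stand-ins for whatever embedding model, LLM, and vector store an application actually uses; only the shape of the pipeline is the point.

    def answer(question, vector_db, llm, embed, k=3):
        # Retrieval step: ask the vector store for the k passages whose
        # embeddings sit closest to the question's embedding.
        passages = vector_db.search(embed(question), k=k)

        # Generation step: ground the model in the retrieved context.
        prompt = ("Answer using only this context:\n"
                  + "\n".join(passages)
                  + "\n\nQuestion: " + question)
        return llm(prompt)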

The second, even more profound trend is embedding AI capabilities directly into the database itself. Systems like MindsDB allow developers to "leverage AI models using SQL" from within their database. Instead of moving massive datasets to a separate AI platform for processing, developers can bring machine learning models to the data. This in-situ processing is more efficient, more secure, and dramatically simplifies the architecture for building AI-powered applications.

You Can Now "Branch" Your Database Like Code

In software development, version control systems like Git revolutionised collaboration. Developers can create isolated "branches" of the codebase to work on new features without interfering with the stable, production version. This proven, powerful workflow is now becoming available for databases.

Database platforms like NeonDB are bringing branching capabilities to data management. According to the technology blog Budibase, developers can "check out new branches which will take a snapshot of the data and structure at that point in time." This lets them experiment with schema changes, test new features against a production-like data set, and validate everything in complete isolation. Once the changes are approved, the new structure can be safely merged back into the production database. The result is development and testing that is dramatically safer, faster, and more collaborative, which translates directly into business agility: less risk of data-related outages and faster time-to-market for data-dependent features.
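
The mechanics are easiest to see in miniature. The toy Python model below is not Neon's implementation (production platforms use copy-on-write storage rather than full copies), but it captures the workflow: a branch starts as a snapshot, changes stay isolated, and an approved branch is merged back.

    import copy

    class ToyDatabase:
        def __init__(self, tables=None):
            self.tables = tables or {}

        def branch(self):
            # A branch begins life as a snapshot of structure and data.
            return ToyDatabase(copy.deepcopy(self.tables))

        def merge_from(self, other):
            # Promote the approved branch's state into production.
            self.tables = copy.deepcopy(other.tables)

    prod = ToyDatabase({"users": [{"id": 1, "name": "Ada"}]})
    dev = prod.branch()
    dev.tables["users"][0]["email"] = None          # experimental change
    assert "email" not in prod.tables["users"][0]   # production untouched
    prod.merge_from(dev)                            # merge once validated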

The Future is "Cloud-Native," Not Just "In the Cloud"

For years, moving to the cloud simply meant taking a traditional, on-premise database and running it on a cloud provider's virtual server. This "in the cloud" approach offered some benefits but failed to capitalise on the unique architecture of the cloud itself. The new strategic imperative is not just being in the cloud, but being cloud-native.

Cloud-native databases—like Snowflake, Databricks, FaunaDB, and NeonDB—are built from the ground up to leverage the fundamental properties of cloud infrastructure. They are designed for distributed processing, dynamic scalability, and high resiliency, separating compute from storage to allow each to scale independently. This architectural shift away from monolithic legacy systems is a primary driver in the modern data landscape, enabling organisations to handle massive analytical workloads and fluctuating demand with unprecedented efficiency and cost-effectiveness.

Conclusion: A New Era of Data

The database has fundamentally evolved. It is no longer a passive, monolithic utility for simple storage but a diverse ecosystem of active, intelligent, and highly specialised tools designed for the unique demands of the modern data landscape. From cloud-native platforms that scale globally to embedded engines that bring analytics to the edge, the very definition of a database is expanding.

This shift marks a new era where data infrastructure is purpose-built for the task at hand. As every piece of our digital world gets its own specialised database, what new innovations will become possible when data is no longer forced to fit in a one-size-fits-all box?

A summary of Databases

During a recent conversation I started to think about the various types of databases now available. What types are there? What are they used for? In a later article I will explore developments in databases.

Relational Database Management Systems (RDBMS)

Relational databases model data using rows and columns organised into a series of tables. This architecture became dominant in the 1980s. The design involves splitting data into a set of normalised tables, or relations, which aims to ensure that each elementary "fact" is stored only once, thereby simplifying update operations and helping to maintain consistency. The vast majority of these databases use Structured Query Language (SQL) for querying and writing data. Compared to non-relational databases, RDBMSs typically provide strong consistency (also known as immediate consistency).

Examples of Use: Relational systems dominate large-scale data processing applications. Specific implementations like PostgreSQL are often utilised for global mission-critical applications, including the .org and .info domain name registries and the systems of many large companies and financial institutions.
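
As a small illustration of the normalised, SQL-queried model, the sketch below uses the psycopg2 driver (one common Python option) against a local PostgreSQL; the connection string is a placeholder. Each fact is stored once, and a declarative join reassembles the facts at query time.

    import psycopg2  # pip install psycopg2-binary

    conn = psycopg2.connect("dbname=demo user=postgres")  # placeholder DSN
    cur = conn.cursor()

    # Normalisation: customers in one table, orders in another,
    # linked by a foreign key so each fact is stored exactly once.
    cur.execute("""
        CREATE TABLE customers (id SERIAL PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id SERIAL PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            total NUMERIC NOT NULL
        );
    """)
    cur.execute("INSERT INTO customers (name) VALUES ('Ada') RETURNING id;")
    customer_id = cur.fetchone()[0]
    cur.execute("INSERT INTO orders (customer_id, total) VALUES (%s, %s);",
                (customer_id, 42.50))
    conn.commit()

    cur.execute("""
        SELECT c.name, o.total
        FROM customers c JOIN orders o ON o.customer_id = c.id;
    """)
    print(cur.fetchall())  # [('Ada', Decimal('42.50'))]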

Non-Relational (NoSQL) Databases

The term NoSQL originally meant "Not only SQL" or "non-relational," referring to a class of databases that diverge from the traditional table-based structure of relational databases. NoSQL databases typically use a single, simpler data structure—such as key–value pairs, graphs, or documents—and do not require a fixed schema. This flexible design allows them to scale easily for large, often unstructured datasets. NoSQL systems are generally designed to scale horizontally across clusters of machines and often prioritise speed and availability over strict consistency, frequently employing eventual consistency. NoSQL databases can be broadly categorised into four types: key-value stores, document databases, wide-column stores, and graph databases.

Examples of Use: NoSQL databases are popular for big data and real-time web applications. Their emergence was spurred by the scaling demands of Web 2.0 companies, such as social media platforms.

Key–Value Store

Key–value (KV) stores utilise the associative array (also known as a map or dictionary) as their fundamental data model, representing data as a collection of key–value pairs in which each key is unique. This is one of the simplest non-trivial data models. Storage is schema-less: the value associated with a key is typically a primitive data type or an object marshalled by the programming language. Some key–value stores keep keys ordered, enabling efficient retrieval of specific key ranges.

Examples of Use: Notable examples of key–value stores include Redis, Amazon DynamoDB, and LMDB.
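
A minimal sketch with the redis Python client (assuming a Redis server on localhost) shows how little model there is: unique keys mapped to values.

    import redis  # pip install redis

    r = redis.Redis(host="localhost", port=6379)

    r.set("session:42", "alice")
    print(r.get("session:42"))           # b'alice'

    # Values can carry a time-to-live, a common session-store pattern.
    r.set("session:43", "bob", ex=3600)  # expires in one hour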

Document Database

The core concept of a document store, one of the main categories of NoSQL database, is the "document," which encapsulates and encodes data using standard formats like JSON and XML or binary forms like BSON. Each document is addressed by a unique key. Unlike rows in a relational table, documents within a collection do not all need to have the same fields.

Examples of Use: Document databases are useful in applications where information is naturally viewed as a collection of documents with varying structure, such as scientific articles, patents, tax filings, or personnel records. MongoDB is an example of a distributed NoSQL database that stores data as BSON documents.
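
A short sketch with the pymongo client (assuming a local MongoDB instance); note that the two documents share a collection but not a schema.

    from pymongo import MongoClient  # pip install pymongo

    client = MongoClient("mongodb://localhost:27017")
    articles = client.demo.articles

    # Documents in one collection need not share the same fields.
    articles.insert_one({"title": "CRISPR", "tags": ["biology"], "year": 2025})
    articles.insert_one({"title": "Databases", "author": "anon"})

    print(articles.find_one({"tags": "biology"})["title"])  # CRISPR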

Graph Database

A graph database is a type of NoSQL database designed specifically to manage data whose relationships are best modelled as a graph structure composed of nodes, edges, and properties. These databases allow complex relationship queries to be performed efficiently.

Examples of Use: Graph databases are ideal for representing complex relationships such as social relations, road maps, network topologies, and public transport links. Notable examples include Neo4j and OrientDB.
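
A brief sketch with the official neo4j Python driver and a Cypher query (the connection details are placeholders); relationship queries traverse edges directly, with no join tables.

    from neo4j import GraphDatabase  # pip install neo4j

    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))  # placeholders

    with driver.session() as session:
        # Nodes and edges are first-class: two people and a FOLLOWS edge.
        session.run("CREATE (:Person {name: 'Ada'})-[:FOLLOWS]->"
                    "(:Person {name: 'Alan'})")

        result = session.run(
            "MATCH (a:Person)-[:FOLLOWS]->(b:Person) RETURN a.name, b.name")
        for record in result:
            print(record["a.name"], "follows", record["b.name"])

    driver.close()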

Wide-Column Store

The wide-column store is one of the four main categories of NoSQL databases. These systems organise data into rows whose columns can vary from row to row and are grouped into column families, a layout suited to very large, sparse datasets distributed across many machines.

Examples of Use: Specific wide-column store implementations include Bigtable, Cassandra, and HBase.
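
A sketch with the DataStax cassandra-driver (assuming a local Cassandra node). CQL looks like SQL, but the primary key below does double duty: rows are partitioned across the cluster by sensor_id and ordered within a partition by timestamp.

    from cassandra.cluster import Cluster  # pip install cassandra-driver

    session = Cluster(["127.0.0.1"]).connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS demo WITH replication =
            {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    session.execute("""
        CREATE TABLE IF NOT EXISTS demo.readings (
            sensor_id text, ts timestamp, value double,
            PRIMARY KEY (sensor_id, ts)
        )
    """)
    session.execute(
        "INSERT INTO demo.readings (sensor_id, ts, value) "
        "VALUES (%s, toTimestamp(now()), %s)", ("s1", 21.5))

    for row in session.execute(
            "SELECT * FROM demo.readings WHERE sensor_id = %s", ("s1",)):
        print(row.sensor_id, row.ts, row.value)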

Embedded Database

An embedded database system is a Database Management System (DBMS) that is tightly integrated with application software: it is embedded within the application rather than operating as a standalone system. The DBMS is generally hidden from the application's end-users, and the system requires little or no ongoing maintenance. This category encompasses databases using various architectures (client-server or in-process) and models (relational, object-oriented, etc.).

Examples of Use: The most widely deployed SQL database engine globally, SQLite, is an embedded database, utilised in operating systems like Android, iOS, and Windows 10, as well as web browsers like Chromium. Informix Dynamic Server (IDS) is used in deeply embedded scenarios such as point of sale applications, financial transaction processing systems, and IP telephony call-processing systems.
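
Because SQLite ships in Python's standard library, the embedded model needs no installation at all; the whole database below is one local file, and the engine runs inside the application's process.

    import sqlite3  # standard library; no server to install or manage

    conn = sqlite3.connect("app.db")  # the entire database is this file
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO notes (body) VALUES (?)",
                 ("hello, embedded world",))
    conn.commit()
    print(conn.execute("SELECT body FROM notes").fetchall())
    conn.close()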

In-Memory Database

An in-memory database primarily resides in the main memory of the computer, although it is typically backed up by non-volatile storage. The primary advantage of this approach is increased speed, as main memory access is faster than disk access.

Examples of Use: In-memory databases are deployed in situations where a critical response time is required, such as in telecommunications network equipment. solidDB is a hybrid in-memory relational database often used as an embedded system database in network software and telecommunications equipment, designed to handle tens of thousands of transactions per second with microsecond response times.
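
SQLite's in-memory mode gives a quick feel for the trade-off: the same engine with no disk I/O, but the data lives only as long as the process, which is why production in-memory databases pair main-memory speed with non-volatile backup.

    import sqlite3

    conn = sqlite3.connect(":memory:")  # no disk I/O: fast but volatile
    conn.execute("CREATE TABLE cache (k TEXT PRIMARY KEY, v TEXT)")
    conn.execute("INSERT INTO cache VALUES ('a', '1')")
    print(conn.execute("SELECT v FROM cache WHERE k = 'a'").fetchone())  # ('1',)
    conn.close()  # the data is gone when the connection closes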

Vector Database

A vector database (also called a vector store or vector search engine) is a specialised system that stores vectors (fixed-length lists of numbers) using the vector space model. These vectors are high-dimensional mathematical representations of data, computed with machine learning methods such as deep neural networks. Approximate nearest-neighbour algorithms let the database be searched with a query vector, retrieving the records that are the closest semantic match.

Examples of Use: Vector databases are used for similarity search, semantic search, multi-modal search, recommendation engines, and in conjunction with Large Language Models (LLMs). They are frequently used to implement Retrieval-Augmented Generation (RAG), a method for improving the domain-specific responses of LLMs. Implementations include Postgres with pgvector, MongoDB Atlas, and Apache Cassandra.
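
The core query is a nearest-neighbour search. The sketch below does it by brute force with cosine similarity over toy three-dimensional vectors; production vector databases replace the linear scan with approximate nearest-neighbour indexes (such as HNSW) and work with embeddings of hundreds of dimensions.

    import numpy as np

    def nearest(query, vectors, k=3):
        # Cosine similarity between the query and every stored vector.
        v = np.asarray(vectors)
        q = np.asarray(query)
        sims = (v @ q) / (np.linalg.norm(v, axis=1) * np.linalg.norm(q))
        return np.argsort(-sims)[:k]  # indices of the k closest records

    store = [[0.9, 0.1, 0.0],   # toy embeddings
             [0.0, 1.0, 0.2],
             [0.8, 0.2, 0.1]]
    print(nearest([1.0, 0.0, 0.0], store, k=2))  # [0 2]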

Blockchain-based Database

A blockchain-based database is a form of distributed database that combines traditional database features with distributed database concepts. Data is recorded and transacted through a Database Interface (or Compute Interface) supported by multiple layers of blockchains. The resulting database is an encrypted and immutable ledger that is open to everyone. The aim is to augment the features of SQL and NoSQL databases with blockchain properties, such as data immutability, transaction traceability, Byzantine fault tolerance, integrity assurance, and decentralised control.

Examples of Use: The Oracle DBMS currently implements support for this blockchain-based database model.

Immutable/Ledger Database (e.g., immudb)

An immutable database, such as immudb (which is also a ledger database), is designed for performance and tamper protection, ensuring that data can only be appended and never altered or deleted. It provides cryptographic verification of data integrity for every transaction and supports both SQL and Key/Value insertion. These systems offer an alternative to complex blockchain solutions.

Examples of Use: Immutable databases can be used to store every update made to sensitive database fields, such as credit card or bank account data, in an existing application database. They are also suitable for storing tamper-proof log streams, such as audit logs, and generating data change reports for compliance officers and auditors. A French financial services company successfully migrated from AWS QLDB (Quantum Ledger Database) to immudb.

Thursday, 25 September 2025

CRISPR: Genome Editing

CRISPR: The Tiny Tool Changing Everything

Every now and then, a scientific breakthrough comes along that feels like it’s been plucked from science fiction. For the past decade, that breakthrough has been CRISPR.

At its core, CRISPR is a gene-editing tool—short for Clustered Regularly Interspaced Short Palindromic Repeats (yes, a mouthful). It started out as a defence system in bacteria, but scientists figured out how to re-purpose it for editing DNA. Imagine having a pair of molecular scissors that can snip, tweak, or rewrite the genetic code. That’s CRISPR. And it’s powerful enough that its discoverers, Emmanuelle Charpentier and Jennifer Doudna, were awarded the Nobel Prize in Chemistry back in 2020.

How Does CRISPR Actually Work?

Think of CRISPR as a tag-team:

  • A guide RNA, like the GPS coordinates, tells the system where to go.
  • The Cas9 protein, acting as scissors, makes the cut in the DNA.

Once the cut is made, the cell rushes in to repair it. That repair step is where the magic happens—scientists can use it to silence genes, fix mutations, or even swap out one DNA “letter” for another. There’s also a version of CRISPR that doesn’t cut at all but instead turns genes on or off, like flipping a switch.

From the Lab to Real-Life Cures

This isn’t just theory—it’s already saving lives.

  • Blood disorders: In the U.S. and U.K., doctors can now treat sickle cell disease and β-thalassaemia with Casgevy, the first CRISPR therapy. Patients’ bone marrow cells are edited so they start producing healthy fetal haemoglobin again. Early results are stunning—many patients no longer suffer from the painful episodes that once defined their lives.
  • Cancer: Doctors are experimenting with using CRISPR to reprogram a patient’s immune cells so they can better recognise and attack tumours.
  • Antidotes: Believe it or not, CRISPR even helped uncover a potential antidote to the deadly death cap mushroom. By screening how the toxin affects human cells, researchers found a way to block it.

Beyond Medicine

CRISPR isn’t stopping at healthcare.

  • Controlling pests: With “gene drives,” scientists could spread traits through entire populations of mosquitoes to wipe out malaria—or even tackle invasive species like cane toads.
  • Organ transplants: Pigs are being genetically edited so their organs can be safely transplanted into humans. This could someday ease the shortage of donor organs.
  • Science at turbo speed: Tools like Google’s AlphaFold, which predicts protein structures, are teaming up with CRISPR to speed up everything from drug discovery to climate research.

The Big Questions

Of course, with great power comes… a lot of responsibility. DNA editing isn’t foolproof, and mistakes can cause unintended side effects. The ethics of editing genes that can be passed on to future generations remain hotly debated.

There’s also the financial reality: treatments like Casgevy can cost upwards of $2 million per patient. Who gets access, and who doesn’t?

And here’s something that really gives me pause: CRISPR isn’t locked up in high-tech government labs. The tools are so accessible that bio-hackers working in DIY community labs—or even small setups in a garage—could, in theory, experiment with gene editing. That raises uncomfortable questions about safety, oversight, and how to prevent misuse while still encouraging innovation.

Is China becoming an innovation hub?

Dispelling the Myth that China Does Not Innovate

The idea that China is merely a "copycat" nation, lacking originality, is increasingly outdated. While its economic rise did involve adapting foreign technologies, this narrative ignores both a long history of invention and a present-day transformation into a global innovation leader.

Historically, China produced world-changing inventions such as paper-making, printing, gunpowder, and the compass—the "Four Great Inventions." These were not isolated breakthroughs but part of a sustained tradition of technological advancement. Other contributions, including the wheelbarrow, umbrella, abacus, and cast iron, highlight a culture of ingenuity that predates modern globalisation.

In recent decades, China has shifted from imitation to independent innovation, driven by a deliberate, government-led strategy for technological self-reliance. The country has massively increased research and development (R&D) funding, now second only to the United States, and employs more researchers than the U.S. and EU combined. Its vast pipeline of STEM graduates further strengthens this capacity. Resources are concentrated in strategic sectors such as AI, biotechnology, and quantum computing, producing rapid advances and global leadership.

This investment is reflected in outcomes. According to the World Intellectual Property Organization (WIPO), China leads the world in patent applications, particularly in emerging technologies like generative AI and 5G. Innovations are visible in high-speed rail, drone technology (with DJI dominating the global market), renewable energy, and commercial applications such as facial recognition payments and dockless bike-sharing. These represent not just adaptation but new forms of large-scale technological deployment.

China’s rise is not simply about catching up but, in many areas, surpassing competitors. Its state-directed model creates a "national innovation chain" that accelerates development and commercialisation. Nowhere is this clearer than in the electric vehicle (EV) industry. While Western firms focused on combustion engines, Chinese policymakers prioritised EVs, pairing investment with supportive policies. The result: China leads the world in EV production, sales, and battery manufacturing—critical for the green energy transition.

Other sectors show similar momentum. In AI, a coordinated national strategy enables Chinese firms to advance quickly. In 5G, China has deployed more base stations than the rest of the world combined, laying the foundation for smart cities and industrial automation.

In summary, the perception of China as a technological imitator is incomplete. With deep historical roots in invention and a modern, state-driven strategy focused on critical sectors, China has emerged as an innovation powerhouse. Its dominance in EVs, AI, 5G, and renewable energy, coupled with world-leading patent activity and rapid commercialisation, demonstrates a distinct innovation ecosystem. At "China speed," the country is not only reshaping industries but also redefining the global technological landscape.
