Elasticsearch 101: Part 1
An Introduction to the Basics of Search and Indexing with Elasticsearch
Welcome to Hello Engineer, your weekly guide to becoming a better software engineer! No fluff - pure engineering insights.
Intro
A lot of system design problems boil down to one thing: finding stuff. You’ve got a huge pile of "things" and need a smart way to get the right one when you ask for it. Sure, databases like Postgres (with full-text search) can handle that up to a point. But when things get serious, think more scale, more features, you’ll want a tool built just for the job. That’s where Elasticsearch shines.
Elasticsearch is one of the most popular search engines out there. It's fast, it’s flexible, and it’s built to handle stuff like sorting, filtering, ranking, and all the other bells and whistles that make modern search actually useful.
Basics
Let’s start with the basics. If you're working with Elasticsearch, there are a few key terms you’ll hear all the time: documents, indices, mappings, and fields.
Documents
Think of documents as the core pieces of data you’re storing and searching. Don’t let the term throw you off: it doesn’t have to be a blog post or a PDF. In Elasticsearch, a document is just a JSON object.
Here’s an example: imagine you’re building a recipe app. Each recipe can be a document, like this:
{
  "id": "RECIPE001",
  "title": "Spaghetti Carbonara",
  "chef": "Mario Rossi",
  "ingredients": ["spaghetti", "eggs", "pancetta", "parmesan", "pepper"],
  "prepTimeMinutes": 25,
  "createdAt": "2025-03-12T10:00:00.000Z"
}
This is what Elasticsearch will index and allow you to search through, by title, chef, ingredients, or whatever else you include.
Indices
An index in Elasticsearch is basically a collection of documents. Think of it like a table in a traditional database: each document in that index has a unique ID and a bunch of fields (just key-value pairs) that hold the data you want to search.
For example, in our recipe app, we might have an index called recipes, where each document is a different recipe.
Just a quick note: the word index can be a bit confusing. In general tech terms, “index” often means a data structure that makes lookups faster (like in SQL). Here in Elasticsearch, we’re talking about the top-level container that holds documents.
You could have multiple indices, one for recipes, one for chefs, another for user reviews, and so on, whatever makes sense for your app. And when you search, you’re querying one or more of these indices to find what you need.
Mappings and Fields
Okay, so you’ve got your documents, and you’ve got your index. But how does Elasticsearch know what kind of data it’s dealing with? That’s where mappings come in.
A mapping is like a blueprint or schema for your index. It tells Elasticsearch, “Hey, this field is a number, that one’s a date, and this one over here? Yeah, treat it like plain ol’ text.”
For example, let’s say we’re storing recipes. Here’s a simple mapping:
{
  "properties": {
    "id": { "type": "keyword" },
    "title": { "type": "text" },
    "chef": { "type": "text" },
    "rating": { "type": "float" },
    "createdAt": { "type": "date" }
  }
}
Let’s break that down:
id is a keyword, which means it’s not broken up. Elasticsearch treats it as a single value, perfect for exact matches.
title and chef are text, so they get analyzed and split up into terms. Useful for full-text search (like searching for "chocolate").
rating is a float, so we can sort or filter by it.
createdAt is a date, which means we can do date-range queries and sorting.
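To make the keyword vs. text difference a bit more concrete, here is a rough Python sketch. The analyze_text function is a simplified stand-in of my own (lowercase, then split on anything that isn't a letter or digit); the real default analyzer follows Unicode rules and is configurable, so treat this as an illustration only.

```python
import re

def analyze_text(value):
    """Simplified stand-in for a text analyzer: lowercase, then split into word-like terms."""
    return re.findall(r"[a-z0-9]+", value.lower())

def analyze_keyword(value):
    """A keyword field is not analyzed: the whole value is kept as a single term."""
    return [value]

print(analyze_text("Spaghetti Carbonara"))   # ['spaghetti', 'carbonara']
print(analyze_keyword("RECIPE001"))          # ['RECIPE001']
```

That is why a search for "carbonara" can find the recipe by title, while the id only matches when you supply the full value.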
Elasticsearch is super flexible: you can even use nested objects, arrays, geolocation fields, or fancy stuff like embeddings for AI-powered search. But don’t worry, you don’t need all of that to get started.
Tip:
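Only index fields you actually need for searching, filtering, or sorting. Indexing extra fields “just in case” makes your index bigger and slows down writes. If you’re only searching on 2 out of 10 fields, don’t spend disk space and indexing time on the other 8! If you still want them to come back with the document, you can map them with "index": false so they are kept in the document but not indexed for search.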
Mapping is one of the best tools you have to control search behavior and performance, so get to know it well!
Let’s walk through a series of operations to create an index, store some data, and perform a search to get a feel for the essential functionality.
Elasticsearch offers a clean and friendly REST API to handle these operations. You can use tools like curl, Postman, or Kibana’s Dev Tools interface to interact with your cluster. While Elasticsearch also has many client libraries and GUI tools, the REST API is where everything starts.
This step-by-step flow gives you a taste of what working with Elasticsearch actually looks like.
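The snippets in this article use the console syntax (method, path, JSON body). If you'd rather poke at the API from code, here is a minimal Python sketch using only the standard library. The base URL is an assumption (a local, unsecured development node on port 9200), and build_request is a helper name I made up; the function only builds the request so you can inspect it before sending.

```python
import json
import urllib.request

# Assumption: a local development node without authentication. Adjust for your cluster.
BASE_URL = "http://localhost:9200"

def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )

request = build_request(
    "GET", "/recipes/_search", {"query": {"match": {"title": "carbonara"}}}
)
print(request.get_method(), request.full_url)
# To actually send it against a running node: urllib.request.urlopen(request)
```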
Use Cases
Creating an Index
Imagine you’re starting a library and need to set up shelves. Creating an index is like that. If you don’t specify anything, Elasticsearch applies default settings, but you can add your own instructions, like how many shelves (primary shards) you want and how many backup copies of each shelf (replicas) there should be. Here's a quick example:
PUT /library
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}
Setting a Mapping
Now, you want to tell Elasticsearch exactly how to handle each piece of information about a book (each field). A "title," an "author," and a "price" need to be treated differently, right? Let’s be clear about the types before we start putting in books. Here's what that could look like:
PUT /library/_mapping
{
  "properties": {
    "title": { "type": "text" },
    "author": { "type": "keyword" },
    "price": { "type": "float" },
    "publish_date": { "type": "date" }
  }
}
Now Elasticsearch knows that your books will have titles, authors, prices, and publish dates. You can add more complex stuff, like reviews, later if needed.
Adding Documents
Now, let’s fill up our shelves! To add books, you just make a quick POST request. For instance, let's add a book:
POST /library/_doc
{
  "title": "The Alchemist",
  "author": "Paulo Coelho",
  "price": 15.99,
  "publish_date": "1988-05-01"
}
When Elasticsearch accepts this, it generates an ID for the book (like a library catalog number) and returns it in the _id field of the response, so you can always retrieve or update the book later.
Updating Documents
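Let's say you want to raise the price of "The Alchemist" because it's now a bestseller. You can do that with a PUT request that targets the book by its ID. The example below assumes the ID is 1; substitute the _id you got back when you added the book. Note that a PUT like this replaces the whole document, so you send every field, not just the new price: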
PUT /library/_doc/1
{
  "title": "The Alchemist",
  "author": "Paulo Coelho",
  "price": 19.99,
  "publish_date": "1988-05-01"
}
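But beware! If someone else is also updating the same book at the same time, you could accidentally overwrite their changes. Elasticsearch helps you avoid that with optimistic concurrency control: every document carries a version number (plus a sequence number and primary term in recent releases), and you can make a write conditional on them with the if_seq_no and if_primary_term parameters, so a write based on a stale copy is rejected instead of silently winning.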
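Here is a toy Python sketch of that idea, so you can see why a conditional write protects you. It is an in-memory stand-in, not how Elasticsearch is implemented: TinyStore, VersionConflict, and the single seq_no counter are all names and simplifications I'm assuming for illustration (recent Elasticsearch releases expose the real check through the if_seq_no and if_primary_term request parameters).

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale copy of the document."""

class TinyStore:
    """In-memory toy that mimics conditional writes; not Elasticsearch's implementation."""

    def __init__(self):
        self._docs = {}      # doc_id -> (seq_no, document)
        self._next_seq = 0

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, document, if_seq_no=None):
        current = self._docs.get(doc_id)
        # Reject the write if the caller's copy is older than what is stored.
        if if_seq_no is not None and (current is None or current[0] != if_seq_no):
            raise VersionConflict(f"{doc_id}: expected seq_no {if_seq_no}")
        seq_no = self._next_seq
        self._next_seq += 1
        self._docs[doc_id] = (seq_no, document)
        return seq_no

store = TinyStore()
seq = store.put("1", {"title": "The Alchemist", "price": 15.99})

# Two editors read the same copy (same seq_no), then both try to write.
store.put("1", {"title": "The Alchemist", "price": 19.99}, if_seq_no=seq)  # succeeds
try:
    store.put("1", {"title": "The Alchemist", "price": 12.99}, if_seq_no=seq)
except VersionConflict as err:
    print("rejected:", err)  # the second write was based on stale data
```

The second editor has to re-read the document and retry, which is exactly the point: nobody's change gets silently overwritten.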
Search
Now that we have books in the library, let’s search for them! If you’re looking for books by Paulo Coelho, just use a simple search like this:
GET /library/_search
{
  "query": {
    "match": {
      "author": "Paulo Coelho"
    }
  }
}
Boom! You’ll get all the books by this author. What if you only want the ones that cost $20 or less? No problem!
GET /library/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "author": "Paulo Coelho" } },
        { "range": { "price": { "lte": 20 } } }
      ]
    }
  }
}
Now you're getting books that are by Paulo Coelho and cost $20 or less.
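If the bool/must structure feels abstract, here is a small Python sketch of the logic it expresses: every clause has to match. The sample books and their prices are made up, and the helper names (match, range_lte, bool_must) are mine. Since author is a keyword field, match is modeled as an exact comparison here.

```python
# Illustrative data only; prices are invented.
books = [
    {"title": "The Alchemist", "author": "Paulo Coelho", "price": 15.99},
    {"title": "Brida", "author": "Paulo Coelho", "price": 24.50},
    {"title": "Siddhartha", "author": "Hermann Hesse", "price": 9.99},
]

def match(field, value):
    """Exact comparison, which is how match behaves on a keyword field such as author."""
    return lambda doc: doc.get(field) == value

def range_lte(field, limit):
    """True when the field exists and is less than or equal to the limit."""
    return lambda doc: field in doc and doc[field] <= limit

def bool_must(*clauses):
    """A document matches only if every clause matches."""
    return lambda doc: all(clause(doc) for clause in clauses)

query = bool_must(match("author", "Paulo Coelho"), range_lte("price", 20))
hits = [book["title"] for book in books if query(book)]
print(hits)  # ['The Alchemist']
```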
Sort
But wait, we’re not done! Sometimes you want to see the books in a certain order. Maybe you want to see the cheapest books first or the newest ones last. Sorting is simple:
Sort by price (ascending):
GET /library/_search
{
  "sort": [
    { "price": "asc" }
  ],
  "query": {
    "match_all": {}
  }
}
Sort by price and then by publish date (descending):
GET /library/_search
{
  "sort": [
    { "price": "asc" },
    { "publish_date": "desc" }
  ],
  "query": {
    "match_all": {}
  }
}
This sorts first by price, and then if two books have the same price, it sorts them by publish date, starting from the most recent.
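The same ordering can be sketched in plain Python, which may help if multi-key sorting is new to you. The three books below are placeholder data. Because Python's sort is stable, sorting by the tie-breaker first and the primary key second gives "price ascending, then publish date descending".

```python
# Placeholder data for illustration.
books = [
    {"title": "A", "price": 15.99, "publish_date": "1988-05-01"},
    {"title": "B", "price": 9.99, "publish_date": "2001-03-15"},
    {"title": "C", "price": 15.99, "publish_date": "2010-09-30"},
]

# Step 1: tie-breaker (newest first). ISO dates sort correctly as strings.
by_date_desc = sorted(books, key=lambda b: b["publish_date"], reverse=True)
# Step 2: primary key (cheapest first). Stability keeps the date order within equal prices.
ordered = sorted(by_date_desc, key=lambda b: b["price"])

print([b["title"] for b in ordered])  # ['B', 'C', 'A']
```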
Nested Searches
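Let’s add a little complexity! Imagine you have a list of reviews for each book, stored in a reviews field that is mapped with the nested type (the nested query only works on nested fields). You can search through these reviews to find books with the best ratings. Check this out: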
GET /library/_search
{
  "query": {
    "nested": {
      "path": "reviews",
      "query": {
        "bool": {
          "must": [
            { "match": { "reviews.comment": "life-changing" } },
            { "range": { "reviews.rating": { "gte": 4 } } }
          ]
        }
      }
    }
  }
}
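This searches for books with at least one review that both matches "life-changing" in its comment and has a rating of 4 or more stars. Because the query is nested, the two conditions must hold within the same review, not just somewhere in the book’s reviews. Talk about precision! (One caveat: match analyzes its input, so with the default analyzer it matches the terms "life" or "changing"; use match_phrase if you need the exact phrase.)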
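Why does the nested part matter? Without it, the values of an array of objects are effectively flattened, so a condition on the comment and a condition on the rating can be satisfied by two different reviews. The Python sketch below imitates both behaviors on one made-up book; the function names are mine, and the substring check is a simplification of real text matching.

```python
# Made-up book: the glowing comment and the high rating belong to different reviews.
book = {
    "title": "Example Book",
    "reviews": [
        {"comment": "life-changing", "rating": 2},
        {"comment": "a bit slow", "rating": 5},
    ],
}

def flattened_match(doc):
    """Without nested mapping: each condition is checked against all reviews independently."""
    comments = [r["comment"] for r in doc["reviews"]]
    ratings = [r["rating"] for r in doc["reviews"]]
    return any("life-changing" in c for c in comments) and any(r >= 4 for r in ratings)

def nested_match(doc):
    """With nested mapping: both conditions must hold within the same review."""
    return any(
        "life-changing" in r["comment"] and r["rating"] >= 4 for r in doc["reviews"]
    )

print(flattened_match(book))  # True  (a false positive)
print(nested_match(book))     # False (no single review satisfies both)
```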
How Elasticsearch Works
So, you’re already using Elasticsearch, but ever wonder what’s going on behind the scenes? How is it making everything so fast and smooth? Here’s the scoop:
At the core, Elasticsearch is powered by Lucene, a super-efficient search library. Think of Lucene as the engine that does all the heavy lifting, and Elasticsearch is the conductor making sure everything runs in sync.
What’s Happening in Elasticsearch?
Cluster: This is like a team of servers working together. It’s like a library, but bigger—think multiple libraries in different cities working as one.
Node: Each server in the cluster is a node. So, every node is like a librarian with their own little section of the library.
Index: This is where your data (like documents or books) is stored. It’s the bookshelf where Elasticsearch keeps everything organized.
Shards and Replicas: To make sure everything is fast and safe, Elasticsearch splits each index into shards, each holding a slice of the index’s documents. These shards are spread across different servers (nodes). And just like having backup copies of your favorite book, replicas are extra copies of each shard, kept on a different node than the original, to keep your data safe and available.
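How does a document end up on a particular shard? Roughly, a hash of its routing value (the document ID by default) is taken modulo the number of primary shards. The Python sketch below shows the idea under loud assumptions: crc32 stands in for the real hash function, the actual formula has extra details, and pick_shard is a name I made up.

```python
import zlib

def pick_shard(doc_id, number_of_shards):
    """Map a document ID to a primary shard. crc32 is a stand-in for the real hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards

NUMBER_OF_SHARDS = 2  # same value as in the index settings example
for doc_id in ["book-1", "book-2", "book-3", "book-4"]:
    print(doc_id, "-> shard", pick_shard(doc_id, NUMBER_OF_SHARDS))
```

The same ID always lands on the same shard, which is how a later lookup or update finds the document. It is also one reason the number of primary shards is chosen when the index is created: change the divisor and existing documents would map to different shards.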
Indexing: How Elasticsearch Organizes Your Stuff
When you add data to Elasticsearch, it doesn’t just throw it into a random pile. It indexes it, which means it organizes everything neatly to make searching super quick. This is done with something called an inverted index.
Here’s an easy way to think about it: imagine a book index. You look up a word, and it tells you all the pages where that word appears. Elasticsearch does this for every single word in your data. So when you search for something, it doesn’t have to read everything, it just checks the inverted index to find exactly what you need, super fast.
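A tiny inverted index is easy to build yourself, which makes the idea stick. The Python sketch below is a simplification (the tokenizer just lowercases and splits on non-alphanumeric characters, and the three titles are sample data); Lucene's real structures are far more compact and also store things like positions and frequencies.

```python
import re
from collections import defaultdict

# Sample documents: ID -> text
docs = {
    1: "The Alchemist",
    2: "The Pilgrimage",
    3: "The Winner Stands Alone",
}

def tokenize(text):
    """Simplified analysis: lowercase and split into word-like terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

index = build_inverted_index(docs)
print(sorted(index["the"]))        # [1, 2, 3]
print(sorted(index["alchemist"]))  # [1]
```

Looking up a term is a single dictionary access, no matter how many documents there are, and that is the whole trick.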
Searching: How Elasticsearch Finds What You Want
When you search, Elasticsearch doesn’t go through every document one by one. Here’s how it works:
Your Search: You type something like, “Find books by Paulo Coelho,” and Elasticsearch figures out where that data is stored.
Distributed Search: Since Elasticsearch is spread out over multiple servers (nodes), it sends the search request to all the right places.
Lucene Does the Work: Each server uses Lucene to find the documents that match your search.
Results: Elasticsearch collects the results from all the servers and sends them back to you.
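The four steps above are a classic scatter-gather pattern, sketched here in Python. Everything in it is hypothetical: the shard names, document IDs, and scores are invented, and search_shard just reads from a dictionary where a real shard would run a Lucene search.

```python
import heapq

# Hypothetical per-shard hits as (score, doc_id) pairs.
shard_results = {
    "shard-0": [(2.1, "book-7"), (1.4, "book-2")],
    "shard-1": [(3.0, "book-5"), (0.9, "book-9")],
}

def search_shard(shard_name, size):
    """Stand-in for the per-shard search: return that shard's top `size` hits."""
    return heapq.nlargest(size, shard_results[shard_name])

def coordinate(size):
    """Fan the query out to every shard, then merge partial results into a global top `size`."""
    partial = [hit for shard in shard_results for hit in search_shard(shard, size)]
    return heapq.nlargest(size, partial)

print(coordinate(size=3))  # [(3.0, 'book-5'), (2.1, 'book-7'), (1.4, 'book-2')]
```

Each shard only has to return its own best hits, so the node coordinating the search merges a handful of short lists instead of looking at every document.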
Wrapping Up!
I hope you’ve got a good grasp of how Elasticsearch works so far! In the next part, we'll dive even deeper into its inner workings, exploring the cluster architecture, and much more.
If you have any questions or suggestions, leave a comment.
See you next week with more exciting content!