Semantic Search through the Complete Wikipedia with Weaviate’s GraphQL API

Weaviate uses GraphQL to provide user-friendly data interaction. Weaviate is an open-source vector search engine, and all searches (e.g. semantic, contextual) are done via its GraphQL API. We’ve put a lot of thought into the design of the GraphQL API, which results in good user and developer experience. In this talk, I will take you along in the journey of how our GraphQL implementation was shaped according to user needs and software requirements, and show a demo of the current design for Weaviate. The demo will show how Weaviate’s GraphQL design enables semantic (vector) search in combination with scalar search through unstructured data. Machine learning models are used in the background, but with the current GraphQL design, users without a technical background can query the vector database easily.

Weaviate has a modular architecture, so users can connect various machine learning models on top of the vector database. Examples are the newly released Question Answering module and the Named Entity Recognition module. Modules can extend the GraphQL schema dynamically, to query the new features intuitively.

This presentation contains a demo where we will query the complete Wikipedia, conduct semantic search queries and more. All through Weaviate’s GraphQL API. No prior knowledge is required.


Weaviate is a vector search engine or database that utilizes a GraphQL API to manage and query data. It's designed to handle unstructured data by converting it into vector representations through machine learning models, allowing for more contextual and nuanced search capabilities beyond simple keyword matching.

Weaviate uses machine learning models to convert data objects into vector representations. Each object is processed through the model to obtain a vector that represents its information in a multi-dimensional space. These vectors allow Weaviate to perform searches based on semantic meanings rather than just matching keywords.

The GraphQL API in Weaviate is designed to facilitate complex queries on the data stored within the database. It allows users to perform operations like 'Get', 'Explore', and 'Aggregate' to retrieve, search through vector space, and summarize data, respectively, using a flexible and powerful query language.

While the demo focuses on text data, Weaviate can handle various types of data, including images and videos. Different machine learning models can vectorize these data types, which makes it a versatile tool for many use cases.

The core functions of Weaviate's GraphQL API include 'Get', 'Explore', and 'Aggregate'. 'Get' is used to retrieve specific items from the dataset, 'Explore' helps in searching through the complete vector space, and 'Aggregate' is used for summarizing data, such as counting objects.
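As a rough illustration of the three core functions, the sketch below writes one query string per function and wraps it in the JSON body a GraphQL endpoint typically accepts over HTTP. The class name Article, the property title, and the search phrase are assumptions for illustration, and the nearText argument assumes a text vectorizer module is enabled. Nothing is sent over the network here.

```python
import json

# One query string per core function. "Article" and "title" are
# hypothetical names chosen for illustration, not taken from the talk.
QUERIES = {
    # Get: retrieve specific items of a class.
    "Get": "{ Get { Article(limit: 3) { title } } }",
    # Explore: search the whole vector space, regardless of class.
    "Explore": (
        '{ Explore(nearText: {concepts: ["wine for seafood"]}) '
        "{ beacon certainty className } }"
    ),
    # Aggregate: summarize data, here by counting objects.
    "Aggregate": "{ Aggregate { Article { meta { count } } } }",
}


def graphql_payload(function):
    """Wrap the query for a core function in a JSON request body."""
    return json.dumps({"query": QUERIES[function]})


for name in QUERIES:
    print(name, "->", graphql_payload(name))
```

The payload could then be posted to the instance's GraphQL endpoint with any HTTP client.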

To start using Weaviate and its GraphQL API, you can visit the SeMI Technologies website (semi.technology), open the Developer section, and follow the installation guide. There are also customization options available to tailor the setup to specific needs.

Weaviate offers benefits such as handling unstructured data, providing semantic search capabilities through vector representations, and supporting complex queries with its GraphQL API. This makes it suitable for searching and managing large datasets with nuanced and contextual search requirements.

The Wikipedia dataset used in Weaviate's demo is open source and available on GitHub. It contains over 11 million articles, 27 million paragraphs, and 125 million cross-references, and can be used for testing and exploring Weaviate's capabilities.

Bob van Luijt
17 min
10 Dec, 2021


Weaviate is a database and search engine that uses a GraphQL API. It supports various machine learning models for data vectorization and search. The core functions of Weaviate are get, explore, and aggregate, which allow users to query and search through the data set. Weaviate provides fast and accurate results, allowing users to find anything in the dataset. The GraphQL API in Weaviate can be used for querying specific data and establishing graph relations.

We will talk about our database, search engine Weaviate, and its GraphQL API. We will use a demo data set, the complete Wikipedia, to demonstrate how to query it. We will provide context on vector search engines, discuss the design of the GraphQL API, and give a demo of the API on the data set. Lastly, we will show you how to start Weaviate with its GraphQL API.

So hello everybody. Thank you for taking the time to listen to this talk. We are going to talk about a few things. So first of all, we're going to talk about our database, our search engine, Weaviate, and we're going to use a demo data set, which is the complete Wikipedia to show how you can query it, and most importantly of course, we're going to talk about the GraphQL API that it has.

So Weaviate is a vector search engine or database, it has a GraphQL API, and we're going to use the demo data set of the complete Wikipedia to show it to you. So first I will give a little bit of context about what a vector search engine is, so that you understand what we're talking about if it's new to you. Then we will look at the design of the GraphQL API. Then we'll go into a demo of the API on the data set. And last but not least, I'll show you how you can get started with Weaviate and its GraphQL API yourself.

So again, thanks for listening. So first of all, what is Weaviate and what is a vector search engine? So at the core, we're dealing with the problem of unstructured data. If you have ever used a database or a search engine, then you know that the data that you're storing, for example text, can only be found if you use the right keywords. So for example, in a traditional search engine, if you search this data object for "wine for seafood", you will probably not find it, because except for the key here, nowhere in the data do you find the word wine. The word for is not in there either, and seafood is not in there either. But using a vector search engine, if you search for "wine for seafood", it would actually find the data object. And the reason why it's able to do that is because every data object that you add to the search engine is run through a machine learning model. The machine learning model creates vector representations, and that's what you use to search through the database.

Now if this is new to you, then let me give you a little bit of context so that you know what's happening there. So, most machine learning models output vectors. And the easiest way to think about vectors is as coordinates. So, for example, our first model had 300 dimensions and you had all these kinds of words in there. So the bulbs here represent words like meat, chicken, fish, etc. What happens if you add a new data object, for example the Chardonnay, is that all these individual words that you see here highlighted in green are found in the vector space and they're placed in that same vector space. And what you can do is give a unique centroid position to that data object. So, now you can say that in the vector space the data object, in this case the Chardonnay, sits exactly here in the middle of where all these words sit. So now, if you search for wine related to seafood or those kinds of things, you will actually be able to find that data object. It is not a 100% match, but it's an approximation of what you're searching for. But in a bit, you will see what the value of this actually is. So, as you see here, we have the class Wine with the property Covey Run 2005 Chardonnay. It might be related to a beacon, and it might have certain vector weights.
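The centroid idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Weaviate's implementation: the three-dimensional word vectors are made-up values (the model mentioned in the talk had 300 dimensions), and the object names and word lists are hypothetical. Each object gets the average of its word vectors as its position, and a query is matched to the object whose centroid is closest by cosine similarity.

```python
import math

# Made-up 3-dimensional word vectors; a real model would provide these.
WORD_VECTORS = {
    "wine": [0.9, 0.1, 0.0],
    "seafood": [0.7, 0.6, 0.1],
    "fish": [0.6, 0.7, 0.1],
    "chardonnay": [1.0, 0.2, 0.0],
    "car": [0.0, 0.1, 1.0],
    "engine": [0.1, 0.0, 0.9],
}


def centroid(words):
    """Average the vectors of the known words into a single position."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("none of the words are in the vector space")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical data objects; neither contains the query words themselves.
objects = {
    "Chardonnay description": centroid(["chardonnay", "fish"]),
    "Car manual": centroid(["car", "engine"]),
}

query = centroid(["wine", "seafood"])
best = max(objects, key=lambda name: cosine_similarity(query, objects[name]))
print(best)
```

Even though the words wine and seafood never appear in the Chardonnay object, its centroid lies close to the query, which is the approximate matching described above.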

We will discuss the data object structure in Weaviate and the database's role in storing objects for vector search and filtering. Weaviate supports various machine learning models for data vectorization and search. The architecture includes modules like text2vec and Q&A, running on your infrastructure. Weaviate's core contains these modules, along with a persistence layer for storing vectors and an API for data search. We will focus on the GraphQL API and its design, which we chose over other options. The design involves classes, properties, and graph-like data models with additional properties for searching.

So this is what the data object looks like when you store it in a Weaviate instance. To help you work with this, we have the database, which you see in the middle, to store your objects, to do vector search, and to do filtering. But of course, there are many, many machine learning models that you can use to either vectorize the data or search through the data.

The demo that I'm going to give today is purely focusing on text. However, you could also do this for images or videos or any other data type. If you go a little bit deeper under the hood, you see how that works from an architectural point of view. So for example, we have text2vec modules or we have Q&A modules. They often run on a GPU. That's all running on your infrastructure.

These modules sit in the Weaviate core, then there's a persistence layer that takes care of storing the vectors, searching through the vectors, and storing the data objects. But most important, there is an API on top of it. Of course, what we're going to focus on today is the GraphQL API and how you can leverage it to search through your data.

First, before we do that, I want to talk a little bit about the design of the GraphQL API, because you have to know that when we created the database, we didn't have an interface yet. We had to choose which language we would use to query data. Will we just have a pure RESTful API? Will we adopt some kind of query language? Will we invent something of our own? Then we decided that the best for us was actually to use GraphQL. This is, in a tiny nutshell, our design. At the top, you have a core function within Weaviate. We'll look at that in a bit. You have a class that you can add your data to. A class can be anything. Whatever data you have, for example, if you have documents, you can just have a class Document. If you have products, you can have a class Product. Then you have the properties. A property can also be anything. So, for example, if we stay with the class Product, then you might have the property name or the property price. You can, of course, make a cross-reference. Hence, it's a graph-like data model. Then we have these _additional properties. Those are properties that you get as part of searching for classes. But those are baked into the modules or into Weaviate itself.
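The layering described here (core function, class, properties, cross-references, _additional) can be sketched as follows. This is a minimal illustration, not the project's own code: the Product class, its properties, the Manufacturer reference, and the helper functions are all hypothetical, and the sketch assumes the convention that class names are capitalized while primitive data types are lowercase.

```python
# A hypothetical class definition shaped like a schema entry.
PRODUCT = {
    "class": "Product",
    "properties": [
        {"name": "name", "dataType": ["string"]},
        {"name": "price", "dataType": ["number"]},
        # A dataType naming another class makes this a cross-reference,
        # which is what gives the data model its graph-like shape.
        {"name": "madeBy", "dataType": ["Manufacturer"]},
    ],
}


def is_cross_reference(prop):
    """Assumes class names are capitalized and primitive types are not."""
    return prop["dataType"][0][0].isupper()


def get_query(class_def, additional=("id",)):
    """Nest core function > class > scalar properties > _additional."""
    scalars = [
        p["name"] for p in class_def["properties"] if not is_cross_reference(p)
    ]
    fields = scalars + ["_additional { " + " ".join(additional) + " }"]
    return "{ Get { " + class_def["class"] + " { " + " ".join(fields) + " } } }"


references = [p["name"] for p in PRODUCT["properties"] if is_cross_reference(p)]
print(references)
print(get_query(PRODUCT))
```

The query string mirrors the design from top to bottom: the core function on the outside, then the class, then its properties, with the _additional fields alongside them.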

