In this workshop, we will see how to adopt Orama, a powerful full-text search engine written entirely in JavaScript, to make search available wherever JavaScript runs. We will learn when, how, and why deploying it on a serverless function can be a great idea, and when it is better to keep it directly in the browser. Forget APIs and complex configurations: Orama makes it easy to integrate search into projects of any scale.
JavaScript-based full-text search with Orama everywhere
AI Generated Video Summary
Orama Search Workshop is a learn-as-you-type workshop that introduces Orama, an open-source full-text search engine written in TypeScript. The workshop covers topics such as ES Modules, project structure, using pnpm for monorepos, creating a new Orama schema, importing and inserting data, performing searches, and filtering search results. Orama is recommended for those who need a full-text search engine that can easily scale and run on CDNs. The workshop also discusses future plans, deployment strategies, and community support through Slack.
1. Introduction to Orama Search Workshop
I'm Michele, cofounder of Orama Search and author of Orama, an open source library. This is a learn-as-you-type workshop, so open your IDE and keep your webcam open. Feel free to ask questions or write them in the chat. The only requirement is Node 16 or higher. Node 20 was released recently. Orama is a full-text search engine written in TypeScript.
So I guess we can start. Before we start, I need to introduce myself, I guess. I'm Michele, and I am the cofounder of Orama Search, and I am the author and lead maintainer of Orama, which is an open source library that we'll see in just a second.
Before we start, I want you to know that this is a learn-as-you-type workshop, so now is the time for you to open your Visual Studio Code, IntelliJ, whatever IDE you use — this is the time to open it. And I'd also like to ask if you can keep your webcam open. Yeah, thank you. Thank you to the people who are opening it. I think it's better, and I can see if you're in trouble. I'm sure that is not going to be the case.
So, feel free to unmute yourself and ask questions if you prefer, or write them in the chat. So, let me admit more people. So, there is just one requirement for this workshop, and it's Node greater than or equal to 16. This is actually the only requirement. I would urge you to update to at least this version if you're running on Node 14. If you're running on Node 12, please abandon it as soon as possible, because it's not maintained anymore, and go to at least Node 16. Yesterday, Node 20 was released, just for you to know. I spotted a couple of little bugs, but it's so promising that I would also encourage you, as soon as the workshop ends, to test it out. There was a lot of work from my former team at NearForm involved in this release, and I'm very proud — they did an amazing job.
What is Orama? Orama is a full-text search engine written entirely in TypeScript. At the beginning, it was called Lyra. Then we had a couple of problems with a Google codec which shared the name, and there were many companies using Lyra as a name. When we founded a company to support the open source development... No problem. Don't worry. You were at the Node Congress, right? Sure. I remember you. I remember you. That's good to meet you again. Yeah, same thing.
2. Introduction to Orama Workshop
We founded a company to support the open source development of the Orama library. Orama means to see in Greek. JavaScript can be very fast. The workshop aims to install and test Orama, create a database, and build a Fastify server to query the database. It's a learn-as-you-type workshop.
Yeah. Okay. I was telling, we eventually decided to found a company to support the open source development of this library. We opted for the name Orama, which in Greek means to see. Having a full text search engine means searching through things and seeing stuff. It's pretty important, I guess, for this task.
So the important thing to notice, if you haven't followed the talk at Node Congress, is that JavaScript can be very, very fast. And I will let you test this statement on your machine in just one second. The purpose of the workshop is to install Orama, test it, create a new database, search through it, and, by the end, create a Fastify server to query the database. And I will show you how, no problem.
So as I was saying, this is a learn-as-you-type workshop, so you will be able to follow simple steps to come to a solution for the problem. Before we start, please, if you want, go on GitHub and star Orama — this helps us a lot. We were about to make a new release, and maybe we will while doing this workshop, because I have other colleagues working on it right now. But it's pretty stable as of now, so it's not a real problem. Before we start, you can either reply in the chat or unmute yourself: how many of you are confident with ECMAScript modules?
3. Introduction to ES Modules and Project Structure
In the old days of Node.js, there was no module system; you had to use require to import functions from one file into another. TC39 standardized the module system as ECMAScript Modules (ESM), which use the import keyword instead of require. Starting from Node 16 or 18, you can run native ESM and no longer need require. This brings benefits like top-level await, for using asynchronous functions without wrapping them in an async function. Let's structure our project by creating a new folder called Orama and initializing a new project. I recommend using pnpm, which keeps a local cache to avoid downloading the same package multiple times.
Feel free to write in the chat if you're not. Just for me to know because we will be using them and that's maybe a good opportunity for everyone to start switching from CommonJS to ECMAScript modules. We will see how in just a second.
Maybe, Michele, there are people that don't know what you're saying — what ES modules and CommonJS modules are. Yeah, we'll see in just a second. So, basically, in the old days of Node.js there was no module system. So, if you had two different files and you had to import one function from a file into another, you used require. So, you have a function, you require the function — let's say, const { create } = require('@orama/orama'). So, you import the create function, right? Then, TC39 standardized the module system under the name of ESM, ECMAScript Modules. So, instead of requiring stuff, you use the import keyword. At the beginning, you used to use Babel, for example, to transpile it — to transform import into require. Nowadays, starting from Node 16 or even 18, I can't really remember, but we will see in just one second, you are able to run native ECMAScript Modules, so you are not required anymore to use require — you can just use import. And this comes with a lot of benefits. Yeah, require is not required anymore. That's correct, and you get other benefits, such as top-level await: if you want to use await, you don't have to wrap it inside an async function, you can just use it in your normal codebase as a top-level await. Then, of course, inside synchronous functions you still won't be able to run awaitable code, but we will see that in just one second.
So, first thing I'd like to do — this is how we structure our project. So, basically, I will ask you to create a new folder called Orama, enter the folder, and initialize a new project. I'm currently using pnpm. If you haven't used it yet, I highly recommend it, because it's really well made, and you can install it — I will type the command in the chat. pnpm. I've got the autocorrect working against me right now. If you don't have pnpm installed on your machine, I highly recommend it. The main advantage of pnpm over Yarn or npm is that it keeps a local cache, so you don't download the same package twice. So, if you have multiple projects using, let's say, Fastify or Orama, you're not downloading it twice — you're always using the same version that you have saved on your machine. So, I highly recommend using it because it saves a lot of time and makes your development process faster.
4. Using pnpm for Monorepos
I highly recommend using pnpm for monorepos. It has a global cache for the whole machine and allows multiple projects to share the same installation. Follow these steps: run pnpm init, add Orama with pnpm add @orama/orama, and enable the ECMAScript module system by adding "type": "module" to the package.json file.
So, I highly recommend using this because it saves a lot of time and makes your development process faster. So, after you do that... Sorry, stupid question. Is that not the same as yarn --prefer-offline? I haven't used that, but I guess it is, yes. It's just enabled by default, which makes sense. Yeah, it's enabled by default. Exactly, exactly. But if you're using a monorepo, for example, pnpm works way better, in my opinion. I'm very comfortable with it when it comes to monorepos. If you look at the Orama code base, it actually uses pnpm for that reason.
Sorry — does pnpm have a global cache for the whole machine, or for each repo? Actually both. If you have different projects and one project uses Orama and another project uses Orama, you install it only once, and it's basically linked for both projects. So there is one unique installation for both — if the version is the same, of course.
So, I will give you five minutes to follow these steps. pnpm init will initialize a new project with a package.json file. You will add Orama with pnpm add @orama/orama, and then you have to go into the package.json file and add one line: "type": "module". This is going to enable the ECMAScript module system, and it's gonna blow your mind because it's going to be very, very fun. If you haven't used it already, that's my favorite way of writing JavaScript nowadays. All right. I guess we can proceed.
5. Preparing the Dataset
I've prepared a dataset. Download the big JSON file from the provided URL, name it flipkart.json, and place it in the root of your project. You have five minutes for this task.
Great. So, I've prepared a dataset. This is a big JSON file that is basically going to be the data source for our database. I would ask you to download it from the URL you see on the slide and put it in the root of your project. It's going to be called flipkart.json — that's the name of the file. I will give you five minutes for that, because there might be some network issues depending on where you live. Like, I live far from Milan, in Italy — we have very bad internet conditions. So I might be the one taking a bit more time to download stuff.
6. Creating a New Orama Schema
To create a new Orama schema, create an index.js file and define the properties — product name, description, and price — with their respective types. Use import instead of require, and run the file with Node.js to check it. Use the create method to define the schema and its properties. You can use nested objects within the schema; nested objects are properly nested in Orama.
Now you have to create a new Orama schema. So create an index.js file, and create a new Orama schema to support the following properties: product name, description, price. Of course, every property has a different type: name and description will be of type string, and price will be of type number — these are JavaScript numbers. You can run the file with Node.js (node index.js) to check that it's working properly. Remember, you're now using ECMAScript modules, so you can use import instead of require.
I will give you, yeah, 10 minutes. It's maybe a bit too much, so I will check in less than five minutes. If you have any doubts about how it works — you should see my pointer on the screen, right? — you can go to docs.oramasearch.com/usage/create, and this will show you how to properly create a schema.
Maybe another question? Do we have to create separate schemas for different items? For example, here I can see the schema keyword directly, but for the product, should I create a product and, under product, all the attributes of the product, or is it a flat schema? The requirement is to create a schema, so you have to use the create method. The create method accepts an object, and there is a property called schema, as you can see in the slide. Inside the schema property, you should create three more properties — name, description, and price — and assign them the correct types.
Okay, so I don't know Orama — I'm just discovering it. Yeah, no problem. If we have to organize the schema a little bit, will it be just one flat object for the whole schema, or can you split it? Yeah, you can also use nested objects, but at the end of the day it must be just one object — you cannot have multiple schemas. You need to have one schema, but you can have nested properties. If you open this URL that you see here in magenta, you will see the full documentation for this method.
Okay, thank you. No problem. I have a question: are nested objects properly nested? Like, for example, in Elastic, if you write a normal object, it doesn't link all the fields together — it treats them like separate... Like flat. Yeah, like flat. Okay — no, this is treated as nested. There will be cases where you want to access nested properties, so you just use, let's say, I don't know, property.anotherproperty as a property name.
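To make the schema discussion concrete, here is a sketch of the two shapes (flat and nested). The matchesSchema helper below is a hypothetical illustration — a tiny type-checker to show how a document lines up with a schema — and is not part of Orama or its internal validation.

```javascript
// The flat schema the exercise asks for:
const schema = {
  name: 'string',
  description: 'string',
  price: 'number',
};

// A nested variant is also allowed; nested fields are later addressed
// with dot notation, e.g. 'product.name':
const nestedSchema = {
  product: {
    name: 'string',
    price: 'number',
  },
};

// Hypothetical helper (not Orama's API): check that a document's
// property types line up with the schema, recursing into nested objects.
function matchesSchema(doc, schema) {
  return Object.entries(schema).every(([key, type]) =>
    typeof type === 'object'
      ? matchesSchema(doc[key] ?? {}, type)
      : typeof doc[key] === type
  );
}
```

With Orama itself, as described above, the schema object is what you pass to the create method — roughly `const db = await create({ schema })`, per the docs page shown on the slide.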
7. Creating Schema and Importing Data
Nested properties in the schema are supported. To feed the database, you can use the insert or insertMultiple methods. Import the data from flipkart.json using ES modules with assert { type: 'json' }. Console.log the data to see the mismatch between the schema and the data.
But it's actually nested, so you can have multiple nested properties and it's not going to be a problem. Okay. So, this is the solution I have in mind for this problem. Let me know if you did something similar — and if you did something different, I would love to learn what. I would also ask you to make it similar to what you see on the screen right now, because it's going to be useful for what we're going to do in the next step.
Good, can we proceed? Well, I just put a product prefix to keep the schema flat. I was wondering, if I have something else that is called name, I will have a conflict. So, I just put product name, product price, product description. That's essentially the same. Yeah, yeah. No problem. It may even be beneficial for your next step. We will see why.
So, next step, 10 minutes. Now, to feed your database, you can use two different methods: insert or insertMultiple. The difference is that insert is an async function that takes the database as a first argument and a single document as a second argument. insertMultiple takes the database as a first argument and an array of documents as a second argument. At this point, you will have to import the data from flipkart.json, and if you have never used ES modules, this is how you do it — look at line two in this image: import data from flipkart.json, and you have to put assert { type: 'json' }. That way, Node.js knows that it has to treat this data as JSON: it's not going to be a string, it's basically converting the JSON into a JavaScript object. You will notice a problem, though. One thing I recommend you do is console.log the data, because you will notice that you created a schema with name, price, description, but the data has product name and product price instead of name and price.
8. Inserting Data and Querying Collection Size
If you try to insert the data as is, it's going to work, but you won't be able to search through it. The flipkart.json file has the data in a different format than the schema, so you need to convert the documents before inserting them. Orama is a semi-schema-less database: it allows you to insert any kind of data, but you can only search through the data specified in the schema. To query the number of items in the collection, you can console.log the db constant and check the docs size property, or use the count method.
So, if you try to insert the data as is, it's going to work, but you won't be able to search through it. I will let you know why in the next step. So, try to feed the database, and choose whether you want insert or insertMultiple to insert the data into the database. I will let you know what I chose, and why, in the next step.
You can also try to run the code, of course, with node index.js to make sure it works correctly. So, I tried that. I don't have any feedback in the console regarding the insertion of the data, whether it has been inserted or not. Is there a way to have a verbose mode or something like that? No, that's correct — you're not supposed to have any problem with the data insertion. The only problem we will notice, in the next step, is that if you try to search for data, you're not going to find anything. And that's because the flipkart.json file has the data in the format of, for example, product_name and product_price, but we created the schema with name and price. So you have to convert all the documents into that format before inserting them into the database — that's part of the task. Now that you brought this up, maybe it's worth explaining why. Basically, Orama is a semi-schema-less database, meaning that, as you noticed, it allows you to insert the data as you prefer — any kind of data you want. But it will only allow you to search through the data that is part of the schema. So the documents can have, let's say, rating, for example, but you won't be able to filter by rating if you don't specify that inside the schema. That's why I wanted you to test this.
Michele, is there a method to query the number of items in the collection? Or an approximation? So we have... give me one second, I'm going to tell you how to do this. If you console.log the db constant, you should have a docs size property that tells you how many documents you have. It should be 20,000. There is also count, I guess. Or count, yeah, yeah. Sorry. Oh, and count returns a promise. Crazy. Okay.
9. Using Count and Insert Functions
You can import count and use it to count the entries. A participant who inserted both the original and the renamed JSON obtained 40,000 entries; insertion always works, but the search will not work on the mismatched documents. There may also be a size property on the db, though the count function is recommended — check the chat for it. My personal solution uses data.map to rename properties before inserting them into the database. When dealing with a large number of documents, use the insertMultiple function to prevent the event loop from freezing. For a smaller number of documents, the insert function can be used.
Yeah. You can import count — I'm putting it in the chat — and use it like... oh wait, I hate this autocorrect. Okay. Michele, one question: I inserted the JSON, and also the modified JSON — not as is, but parsed with the mapping to the correct property names. In both cases, I obtained 40,000 entries. That's correct. Right. It's just the search which is not going to work. Yes, exactly. Okay. Correct.
You said there's — I mean, the count function, sure — but you said there's a property of the db that is .size? It should be .size. I can't remember from memory, but there should be one. I do recommend you use the count function anyway. I put it in the chat. Yeah, okay, thanks. No problem. So, we can proceed.
This is my personal solution. I basically just use data.map and rename the properties that I want before inserting them into the database. That's functional code. Yeah, pretty functional — I get your joke, because we discussed it at the Node Congress. The reason why I recommend insertMultiple when there are many documents: if there are, I don't know, 10 documents or 100 documents, then you can just use the insert method. But insertMultiple will basically create batches — by default it inserts 500 documents at a time, unblocks the event loop, and then inserts another 500 documents. This is pretty important because it prevents the event loop from freezing. So it's pretty important that you use the insertMultiple function if you have to insert many documents. If you have one document or, I don't know, two or three, you can just use the insert function.
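The two steps just described can be sketched in plain JavaScript — rename the Flipkart fields to match the schema, then split the documents into batches of 500 the way insertMultiple does. The batching helper is a hypothetical illustration of the idea, not Orama's actual implementation:

```javascript
// Step 1: rename product_name / product_price to match the schema
// (name, description, price) before insertion.
const raw = [
  { product_name: 'Sofa Cover', product_price: 12.5, description: 'Cotton sofa cover' },
];

const docs = raw.map(({ product_name, product_price, ...rest }) => ({
  name: product_name,
  price: product_price,
  ...rest,
}));

// Step 2: what insertMultiple does conceptually — split the documents
// into batches (500 by default) so the event loop gets unblocked between
// batches instead of freezing for the whole 20,000-document run.
function toBatches(items, size = 500) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

With the real API, you would then await insertMultiple with the database and the renamed docs array, rather than batching by hand.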
10. Performing a Search and Limiting Properties
Now it's time to perform a search. Import the search function from Orama and experiment with different search terms. The default configuration searches all properties, but you can limit the search to specific properties using the properties option. Currently, only text search is supported, though you can convert numbers to strings for searching. Feel free to refer to the documentation for more information.
This is pretty important especially if you run on the front end, so if you have a browser running this, this is pretty important.
So now it's time to perform a search. You now may want to import the search function from @orama/orama, and you can try experimenting with different search terms, like Chevrolet Camaro or white shoes, and you can also try pagination by using the limit and offset search properties.
So this is the configuration you're going to find by default in Orama. The only mandatory property is term — this is the search query. If you search, for example, Chevrolet Camaro, it's going to go through all the records and find the ones containing both Chevrolet and Camaro. By default it limits the results to 10, and the offset is 0.
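As an illustration of those defaults, here is a naive stand-in search over a plain array — hypothetical code showing the shape of term, limit, and offset (plus the properties restriction discussed shortly), not Orama's actual index or scoring:

```javascript
// Naive stand-in: keep documents containing every term, then paginate.
function naiveSearch(docs, { term, limit = 10, offset = 0, properties }) {
  const tokens = term.toLowerCase().split(/\s+/).filter(Boolean);
  // Concatenate the searched fields of a document into one string.
  const text = (doc) =>
    (properties ?? Object.keys(doc))
      .map((p) => String(doc[p]))
      .join(' ')
      .toLowerCase();
  const hits = docs.filter((doc) => tokens.every((t) => text(doc).includes(t)));
  // limit/offset pagination, mirroring the defaults described above.
  return { count: hits.length, hits: hits.slice(offset, offset + limit) };
}
```

The real search call takes the database as its first argument and an options object like this one as its second, and does proper tokenization and BM25 ranking rather than substring matching.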
So you've got 10 minutes, starting from now, to experiment with different search methods. Sorry, I said search methods, but I meant search terms, actually. And if you have any doubts about how it works, this is the documentation link: docs.oramasearch.com. Feel free to go there and experiment.
Okay, it will search the term in all of name, description, and price? Yeah, exactly — it will search in all the properties that you stated inside the schema. You can limit which properties you want to search in by using the properties option. It accepts an array of properties. Like, I don't know, to search in title only, you're gonna pass properties — I'm gonna write it to you in the chat. So if you put in properties an array containing only title, it's going to search in the title property only. Of course, you can add more properties; by default, it searches through everything. Good question.
This is crazy fast. That's good to hear. Can you search for numbers, or is it only text search? Right now it's only text. You can convert your numbers into strings — then it's going to work. Otherwise, as you will see later, you can filter by number. Okay, yeah. That works. Okay. Awesome. I will give you five more minutes so you can experiment more. Also, feel free to go to the documentation and see what other search properties we have. So, by default the search is not case sensitive? No, it's never case sensitive, actually. Oh, I tried to restrict the search to the price property.
11. Understanding Orama's Search Algorithm
The score in Orama's search algorithm, BM25, is based on the frequency of search terms in documents. It prevents spammy results and provides accurate results. When searching for numbers as strings, the search term is split into separate terms: the dot is a special character that splits terms, but you can customize this behavior by implementing your own tokenizing function.
It's not possible. Yeah, because it's a number. And how do you filter? What is the function? I will show you in the next slide.
Yeah, yeah, yeah. It's the next exercise. Maybe we can already go to the next exercise. Just one question, please: what does the score mean? Is it the occurrence of the term within the document? The score? Okay. So we use an algorithm called — and I'm going to write it in the chat — BM25. This is the same algorithm used by Elastic, for example. It basically takes into consideration the search term. Let's say you're searching for white shoes, for example. It knows every single document that contains the term white and the term shoes, and based on the frequency of these terms across all the documents, it assigns a score to each document you're searching through. The higher the score, the better the result. It also prevents spam: if you're in e-commerce, for example, and you want to sell your shoes very fast, you might create a description writing shoes 100 times, right? With BM25, that is actually going to penalize your result. So it keeps the balance between spammy and non-spammy stuff, and it tends to give you the most accurate result possible.
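The spam-penalizing behavior just described falls out of the BM25 term-scoring formula itself. Below is a simplified, self-contained sketch of the per-term score, using the standard BM25 shape with the usual k1 and b constants — an illustration of the math, not Orama's actual code:

```javascript
// Simplified BM25 score for one term in one document.
//   tf: how often the term appears in this document
//   df: in how many documents the term appears at all
//   totalDocs: number of documents in the index
//   docLen / avgLen: this document's length vs the average length
function bm25Score(tf, df, totalDocs, docLen, avgLen, k1 = 1.2, b = 0.75) {
  // Rare terms get a higher inverse document frequency (IDF).
  const idf = Math.log(1 + (totalDocs - df + 0.5) / (df + 0.5));
  // Length normalization: long documents are discounted.
  const norm = 1 - b + b * (docLen / avgLen);
  // The tf part saturates: repeating a term does not grow the score linearly.
  return idf * (tf * (k1 + 1)) / (tf + k1 * norm);
}
```

Note how repeating a term 100 times does not score 100× higher — the term-frequency part saturates, which is exactly why the "shoes written 100 times" description gets penalized rather than boosted.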
Yeah, I have one question. I searched a query like this, and I don't know how to interpret the results. What do you get? So I get this, and, I mean, there is no 19.95 there, so I'm wondering, how does the search work? Yeah, so it basically splits the search term — in your case 19.95 as a string — into 19 and 95 as two separate search terms. You see that in your description you have Adreno 1905, and 19 is a prefix of 1905. So you're going to find this result as well, even though it's not what you're looking for. This is one trade-off you have to make when searching for numbers, for example. Is there any — so the dot is a special character? Yes, it gets split on the dot. You can customize this behavior, if you want, and I can give you the documentation for that in just one second. Let me find it. You can write your own tokenizer — I'm putting this in the chat. Okay. So maybe this is not the correct behavior for you, and you can implement your own tokenizing function and make it work as you prefer.
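The splitting behavior described above can be sketched with a hypothetical tokenizer that, like the default, breaks on non-alphanumeric characters — so "19.95" becomes the two terms "19" and "95". (The real tokenizer is more sophisticated, and, as noted, can be replaced with your own.)

```javascript
// Split on anything that is not a letter or digit, drop empty pieces.
function tokenize(input) {
  return input.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}
```

Since matching also works on prefixes, the term "19" then matches "Adreno 1905" — "1905" starts with "19" — which is the surprising hit in the question above.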
12. Filtering Search Results and Data Persistence
You can filter the search results by prices within a specific range, or by products where the price is greater than or less than a specific number. The documentation provides guidance on how to use filters when searching. Orama does not support arrays in the schema, so you can join them into a single string. The database is in-memory, but there is a plugin called Data Persistence that allows you to export data and load it from disk. Complex queries with OR or AND statements are not supported at the moment.
Alright, I would proceed to the next slide. This is, for example, my whole code. So, term: sofa cover, for example, and I get the results for all the sofa covers. Now, given that Lucia was wondering how to filter by number, this is how you do it. I would like you to experiment with the search function and make sure it returns, first of all, prices within a specific range, then all the products where the price is greater than a specific number, or less than a specific number. And I'm going to leave just the documentation — I will put this in the chat. You should be able to do that by following the documentation only. So you can search where the price is greater than, greater than or equal, less than, less than or equal, equal to, or between two values. So, following the documentation for filters... Does it support arrays in the schema? No, it does not support arrays. So if you have arrays, you can join them into a single string — yeah, that's typically what you want to do. Maybe one question: actually, this is an in-memory database. Is there any other option than in-memory? What do you mean? I'm not sure. For example, if I stop inserting and I rerun the script, I don't have any data in the database. So it's just in-memory so far. Yes, it's all in-memory. So if you rerun the script, you have to re-index the data, that's correct. There is a plugin that we have — I'm going to send it to you in the chat — called Data Persistence. You can basically export the data, save it to a file, and load the data from disk instead of re-indexing everything from scratch every time. Okay. Okay, thank you. Can you do complex queries? Like, I don't know, an OR statement in the WHERE, or is it only... yeah. So right now we don't support explicit OR or AND operators — everything is combined with AND, basically.
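To make the filter semantics concrete, here is a plain-JavaScript sketch of the numeric operators listed above (gt, gte, lt, lte, between) applied to an in-memory array — an illustration of what a where clause expresses, not Orama's implementation:

```javascript
// Comparison operators for numeric filters.
const ops = {
  gt: (v, x) => v > x,
  gte: (v, x) => v >= x,
  lt: (v, x) => v < x,
  lte: (v, x) => v <= x,
  between: (v, [lo, hi]) => v >= lo && v <= hi,
};

// Keep only the documents whose properties satisfy every condition.
// Conditions combine with AND, matching what is said above.
function applyWhere(docs, where) {
  return docs.filter((doc) =>
    Object.entries(where).every(([prop, conds]) =>
      Object.entries(conds).every(([op, x]) => ops[op](doc[prop], x))
    )
  );
}
```

In Orama, an equivalent condition object goes into the search options alongside the term, filtering the full-text hits by the numeric properties declared in the schema.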
13. Orama vs Postgres or MySQL and Workflow
We will support more advanced queries in the future. Orama is a better alternative when you need a full-text search engine and want to scale easily: you can load your data to a CDN and let Orama run there, which is cheap and easy to maintain. Running Orama on a CDN directly is recommended for better performance. We will create a global deployment platform to make it easy to run Orama on CDNs. For a marketplace, updating the stock once a day, during the night, is sufficient.
We will support it in the future, though — we are waiting to see if there's interest in doing that. Nice, thanks. The biggest problem we noticed is that, you know, this is a search engine, and we don't want people to use it as a leader database. We think that having, like, Postgres or MySQL is a better alternative, and that's why we are not pushing a lot into solutions where you can, let's say, join stuff or run hardcore Boolean queries. But yeah, one simple AND or OR query — that will be supported in the future.
Nice, thanks. Michele, now that you mention it, when would I prefer Orama over Postgres or MySQL? Oh, they are very different. If you need, you know, a strong database with frequent writes and frequent reads, you definitely want Postgres. I mean, at Orama we are using Postgres as a leader database ourselves, because if you want to keep user data, for example, it's perfect and scales well. I think you should use Orama only in the case where, first of all, you want to run full-text search for your product. Postgres works great, but it's not a full-text search engine — it has extensions to work as such, but it's not optimized for that. And when you scale, for example, Postgres is going to be a bit more costly and difficult to maintain, while Orama is basically either running in your browser, so you have no maintenance to do, or you can run it on CDN edge networks and make it very, very cheap and easy to maintain. That's what we push for.
So let me ask, how can I envision my workflow? I have a marketplace, right? I have my database with all my data, and then what am I going to do — load that data to the CDN and let Orama run there on the data? Yes, yes. It runs directly on the CDN, correct. And in the browser — it will only work in the browser if I'm sending all the data to the user, right? Yeah, the problem is that the JSON you downloaded is 15 megabytes, so it might be a bit too much for the browser. So I recommend you run it on a CDN directly, which is faster. This is part of what we're going to create as a company: we will be building a global deployment platform where you can run search on a CDN. Deploying Orama on a CDN is not super easy, so we will make this automatic and easy for you. Otherwise, if you have the skills, you could do it already with the open source version: you can basically create, I don't know, an AWS Lambda running at the edge, or Cloudflare Workers, etc. You can choose what you prefer and use them to create low-latency, very fast search experiences directly on CDNs.
Okay. And now the last question. So imagine I am working for a marketplace, for some company that is selling stuff, and I know that to update the stock it's enough to update it only once a day, right? Once during the night. Yep.
14. Using Lambda for JSON Search
I have a solution that involves using a Lambda function to load a JSON file and perform the search. If you're interested, join our Slack channel for more information and support.
What I would do is: I have my Lambda, right? So I would try the most naive approach ever. I will upload the Lambda together with a JSON file, and then let the Lambda load the JSON and search. No, that would be too slow, right? Because I have to load it first. Yeah, exactly. That's one of the secret parts that allows us to run on CDN networks. If you're interested in that, I've got nothing to sell right now, meaning I'm not really interested in selling, but I'm very interested in having feedback. So join our Slack channel, I'm putting the link here in the chat. You will find a slide at the end of the workshop anyway. If you join the Slack channel, make sure to write to me and I will put you in contact with my colleagues so we can help you do this.
15. Creating Fastify Server for Orama Search
There is a solution for the filters. They currently support numeric and boolean properties, and a more straightforward exact match, similar to keywords in Elastic, is expected. The task is to create a Fastify server that accepts GET requests at the path /search and returns the result of an Orama search for the term query parameter. You can follow the solution to integrate this with a Fastify server: install Fastify, create and insert data into the database, initialize Fastify, create a route, and perform the search.
Okay, but there is a solution. Yes, there is one, yeah. Okay. And is it planned for the filters — right now it says they support numeric and boolean properties — to have something like the keywords in Elastic, where you don't need a faceted search? Something more straightforward, an exact match. Yes. Yeah. Yes, definitely. I have a colleague who is about to release exactly this today. Okay, nice. You are on point. Absolutely, yes.
All right, I guess we can proceed then. This is my solution: greater than, less than, and in between. I guess you all had similar solutions for your queries. So we're getting close to the last one. Let me know if you want to do it or if you want to skip directly to the Q&A, in which case I can just show you the solution. That's your choice.
So the task would be — and it's taking like 30 minutes, so I guess we don't have that much time — to install Fastify and create a Fastify server that accepts GET requests at the path /search. Make sure that the Orama database is loaded before accepting any Fastify requests. For every GET request at /search, take the term query parameter and return the result of an Orama search for that term. If you agree, I will show you the solution, so you can follow it and see how to integrate this with a Fastify server. How does that sound? Agreed? All right, so I'll be sharing. We are saving some time so we can have a little Q&A, and I guess that's better for everyone. So this is how I typically do it. First of all, of course, you install Fastify and import it. You create the database, normalize the data, insert the data, then you initialize Fastify and create a route. You get the term; if the term is missing you just reply with an error, otherwise you perform the search and send the results. If you want to copy this code, I will give you 10 minutes starting from now to copy it and try to run it.
16. Deploying and Future Plans
The Lambda part is tricky, so join the Slack channel to discuss it and see how we do it. The platform will be public soon, since eventually the company needs revenue; it will be a paid CDN cloud service, or you will be able to use your own CDN. Typo tolerance is supported through a property called tolerance: by default it's zero, but with tolerance set to one, a search that contains one typo will still find the result.
I mean, Michele, this is the solution, but I don't know how to deploy it. Yeah, this is the solution. I decided to share it because we don't have much time.
Okay, no, no, I'm not criticizing you. It's just that this one I understand, because I could have multiple servers, for example one in every AWS region or something, I don't know, and then I would restart the server once a night with the new data. That's how I understand it.
The Lambda part is tricky. It is, it is. I would recommend you join the Slack channel so we can discuss it and I can show you how we do it. I can't do it publicly yet.
It's fine. When are you going to make it public? Yeah, I don't have the roadmap right now, but it's gonna be soon because eventually we need some money to run the company. Yes, makes sense.
And then it's going to be a paid CDN cloud service, right? Yeah, or the idea would be also you can use your own CDN and you will see how, but it's still in the works. We will need some beta users though. So if you join the Slack channel, I'll be glad to put you as one of the beta users. And because I already told this to Lucia, I will put this in the Slack. This is the pull request my colleague made right now for string filters. Just because you asked.
Nice, thanks. No problem. He opened this like six minutes ago. Is there any plan to support some algorithm? Like, I used to use SolrCloud, and they have some algorithms like Snowball, I don't remember exactly how it works. But it allows you to search for terms with a certain precision, and to find documents even if you make a typo, for example.
Yeah, we actually have it. It's a property called tolerance. Okay. By default it's zero. But if you search for a term with tolerance set to one and you commit one typo, it's going to find the result anyway.
17. Competition and Funding
If you set tolerance to two, a search with two typos will still find the result. We have competition in the market, such as Algolia, MeiliSearch, and Elasticsearch, but we provide a different way of searching to people with different needs. We started this company to solve our own problems, and maybe help others with the same problems. We are bootstrapped and currently not running on venture capital.
If you set tolerance to two and you commit two typos, it's going to find the result anyway. Of course, it's going to be slower, but it works and it's already supported.
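To make the tolerance idea concrete: typo tolerance is typically based on edit (Levenshtein) distance, where a query term matches an indexed word if it can be turned into it with at most `tolerance` single-character edits. The following is a self-contained illustration of the concept, not Orama's actual implementation:

```javascript
// Classic dynamic-programming Levenshtein distance: the minimum number
// of single-character insertions, deletions, or substitutions
// needed to turn string `a` into string `b`.
function editDistance(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  )
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      dp[i][j] = Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    }
  }
  return dp[a.length][b.length]
}

// A query term matches an indexed word when the distance
// stays within the configured tolerance.
function matches(indexedWord, queryTerm, tolerance = 0) {
  return editDistance(indexedWord, queryTerm) <= tolerance
}

console.log(matches('sofa', 'sofa'))    // true  (exact match, tolerance 0)
console.log(matches('sofa', 'sofs', 1)) // true  (one typo, tolerance 1)
console.log(matches('sofa', 'sfs', 1))  // false (two edits needed)
```

With a tolerance of one, a one-typo query like `sofs` still reaches the indexed word `sofa`; with the default tolerance of zero, only exact matches survive.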
Okay, great. Yeah. Michele, while we're waiting, who is your competition? You mean in the market? Yes. I would say mainly, I don't know, Algolia and MeiliSearch. Elasticsearch is not really a competitor to me, because they do a lot of great work in log management, for example; they're very good at that. But Algolia and MeiliSearch are also very good at what they do. We just provide a different way of searching to people that have different needs. So I don't like calling them competitors: they have different products and so do we. That's really it. But they are all good. I mean, if you followed my talk at Node Congress, I was saying that I actually started this company because I wanted to learn more, and of course I had some personal issues with other systems. But that's me, you know, it's my problem, not theirs. So I wanted to solve my problems mainly, and maybe other people have the same problems. That's really it.
And what is the name of the other one? Is that Algolia? And the other one you mentioned is MeiliSearch? MeiliSearch, I'm gonna write it down. This one is also very good. It's written in Rust, it's very fast. I would have never ever guessed this spelling.
Have you been running on venture capital until now? No, not yet. We are bootstrapped. What's the meaning of bootstrapped in this context? We just use our own money. So right now we don't have any VC, for many reasons, but that's it for now. Maybe in the future, but I'm not the person making those decisions. Our CEO is the person making these kinds of decisions, of course, so let's see what he decides. I'm the CTO, so I'm the person that breaks the code rather than the company, which is good. Alright, have you been able to run this code? Have you? Awesome.
Q&A Session and Community Support
Leave us a star on GitHub, and scan the QR code or go to bit.ly/am-orama to find all the links: Slack channel, Instagram, Twitter. The Slack community, with hundreds of members, can provide support for whatever you need. The workshop closes with a Q&A session.
Awesome. So I guess that's really it. I wanted to have a little Q&A session with you. I would really appreciate it if you could leave us a star on GitHub; that can really help us. And if you scan this QR code and go to bit.ly/am-orama, you will find all the links: Slack channel, Instagram, Twitter. Of course, follow us everywhere and join the Slack channel, this is pretty important. If you join the Slack channel, we will be able to provide you support. There is an entire community with hundreds of members that can provide support for whatever you need, so I highly recommend you join. And yeah, before thanking you all, I would also like to have a kind of Q&A. So if you've got any questions, please go ahead.
Open Source, Indexing, and Database Management
Orama is an open-source project available on GitHub. The indexing process involves tokenizing and stemming the input. The data is stored in data structures such as Radix tree, AVL tree, and hashmap. For multiple languages, it is recommended to create separate databases. Updating the database involves re-indexing everything, which is the best approach for JavaScript-based databases.
Is Orama open source? Yeah, yeah, it is open source. If you go to github.com/oramasearch/orama, you will find the code base. I'm putting the link in the chat. You're crazy.
So is it a challenge to monetize an open source project? Yeah, hopefully.
How does the indexing work again? What part of the indexing? Yeah, good question. Anything in general; I'm guessing when it comes to nested objects, because you said you don't flatten them? Yeah, so internally we actually do flatten them, but we allow you to express your documents by nesting the properties, which is very convenient in many different situations. And this doesn't give any disadvantage to the end user: you can basically reason about the document as if it weren't flattened, which is good. So, the process for indexing: we tokenize the input, of course. If you want, we can also apply stemming; we support 26 languages out of the box. So, for example, run, running, runner all get stemmed to run, and we can keep the meaning of a word rather than the word itself, if you want. This is something you can enable. And then we put everything inside a very big radix tree for strings, an AVL tree for numbers, and a hashmap for booleans. These are the data structures currently involved in the process.
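To make the flattening step concrete, here is a self-contained sketch of how a nested document can be flattened to dot-separated keys before indexing, while the user keeps writing nested documents. This illustrates the idea only; it is not Orama's internal code:

```javascript
// Recursively flatten a nested document into dot-separated keys,
// so { author: { name: 'Ada' } } becomes { 'author.name': 'Ada' }.
function flattenDocument(doc, prefix = '') {
  const flat = {}
  for (const [key, value] of Object.entries(doc)) {
    const path = prefix ? `${prefix}.${key}` : key
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      // Nested object: recurse, extending the key path.
      Object.assign(flat, flattenDocument(value, path))
    } else {
      // Leaf value: store it under its full dotted path.
      flat[path] = value
    }
  }
  return flat
}

const flat = flattenDocument({
  title: 'Clean Code',
  author: { name: 'Robert', surname: 'Martin' },
  meta: { rating: 4 },
})
console.log(flat)
// { title: 'Clean Code', 'author.name': 'Robert',
//   'author.surname': 'Martin', 'meta.rating': 4 }
```

Each flattened key can then be routed to the right index by its schema type: strings to the radix tree, numbers to the AVL tree, booleans to the hashmap.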
And for languages, I just saw that the DB has a language property. So you support 26 languages, but in a single database you would only expect to have data in one language, right? Yeah, if you need more languages, I would recommend you create more databases, one per language: one database for English, one for Spanish, one for Italian, one for French, and so on. The biggest difference between us and Elastic, for example, is that Elastic allows you to put everything inside one big instance, but that grows in size, right? So you have to maintain large servers, etc. We recommend you create different databases so that they are cheaper, smaller, faster, and overall better.
How do you manage updating your database? Any tips for that? Yeah, so there are the update and remove functions; you can update or remove documents. But I would recommend re-indexing everything every time, which sounds expensive in terms of time, but that's actually the best way to deal with JavaScript-based databases. So if you have a lot of changes, if your data changes every second, use Elastic or Algolia or MeiliSearch; they are amazing at this.
Optimization and Conclusion
Orama is optimized for frequent reads but infrequent writes. Join the Slack channel for further questions and contact. Thank you all for joining.
Orama is optimized for frequent reads but infrequent writes. Yes. Okay, makes sense. Thank you. No problem.
So if you don't have any other questions, I guess that's all. Please join the Slack channel so that we can keep in contact. And if you have any other questions, I'll be there for answering them. Thank you all for joining.