JavaScript-based full-text search with Orama everywhere


In this workshop, we will see how to adopt Orama, a powerful full-text search engine written entirely in JavaScript, to make search available wherever JavaScript runs. We will learn when, how, and why deploying it on a serverless function could be a great idea, and when it would be better to keep it directly in the browser. Forget APIs, complex configurations, and so on: Orama will make it easy to integrate search into projects of any scale.



Transcription


So I guess we can start. Before we start, though, I need to introduce myself. I'm Michele, I am the co-founder of Orama Search, and I am the author and lead maintainer of Orama, an open source library that we will see in just a second. Before we start, I want you to know that this is a learn-as-you-type workshop, so now is the time to open Visual Studio Code, IntelliJ, or whatever IDE you use. I'd also like to ask if you can keep your webcam on. Yeah, thank you to the people who are turning it on. I think it's better, because I can see from your faces if you're in trouble, but I'm sure that is not going to be the case. Feel free to unmute yourself and ask questions if you prefer, or write them in the chat. So let me admit more people.
There is just one requirement for this workshop, and it's Node greater than or equal to 16. This is actually the only requirement. I would prompt you to update to at least this version if you're running on Node 14. And if you're running on Node 12, please abandon it as soon as possible, because it's not maintained anymore, and go at least to Node 16. Yesterday Node 20 was released, just for you to know. I spotted a couple of little bugs, but it's so promising that I would also prompt you to test it out as soon as the workshop ends. There was a lot of work involved from my former team at NearForm on this release and I'm very proud of them. They did an amazing job.
So, what is Orama? Orama is a full-text search engine written entirely in TypeScript. At the beginning it was called Lyra. Then we had a couple of problems with a Google codec that shared the name, and there were many companies using Lyra as a name, so when we founded a company to support the open source development... Um, Alberto, no problem if you leave early, don't worry. You were at Node Congress, right?
Sure, I went to talk to you.
Yeah, yeah, I remember you. It's good to meet you again. OK, as I was saying, we eventually decided to found a company to support the open source development of this library, and we opted for the name Orama, which in Greek means "to see". You know, having a full-text search engine means searching through things and seeing stuff, so it's pretty important for this task, I guess. The important thing to notice, if you haven't followed the talk at Node Congress, is that JavaScript can be very, very fast, and I will let you test this on your machine in just one second. So the purpose of the workshop is to install Orama, test it, create a new database, search through it, and by the end of the workshop create a Fastify server to query the database. I will show you how, no problem. As I was saying, this is a learn-as-you-type workshop, so you will be able to follow simple steps to come to a solution for the problem. Before we start, if you want, please go on GitHub and star Orama; this helps us a lot. We were about to make a new release, and maybe we will do it during this workshop, because I've got colleagues working on it right now, but it's pretty stable as of now, so it's not a real problem. Before we start, you can either tell me in the chat or unmute yourself and reply: how many of you are confident with ECMAScript modules? OK, feel free to write in the chat if you're not, just for me to know, because we will be using them, and that's maybe a good opportunity for everyone to start switching from CommonJS to ECMAScript modules. We will see how in just a second.
Maybe there are people who don't know what you're talking about. What are ES modules and CommonJS modules?
Yeah, we'll see in just a second. So basically, in the old days of Node.js there was no standard module system.
So if you had two different files and you had to import one function from one file into another, you used require. You have a function and you require it, let's say const create equals require Orama, so you import the create function, right? Then TC39 standardized the module system under the name ESM, so ECMAScript modules. Instead of requiring stuff, you use the import keyword. At the beginning you used something like Babel, for example, to transpile it, so to transform import into require. Nowadays, starting from Node 16 or even 18, I can't really remember, but we will see in just one second, you are able to run native ECMAScript modules, so you're not required to use require anymore. You can just use import, and this comes with a lot of benefits.
So require is not required anymore.
That's correct. Yeah, you have other benefits such as top-level await. If you want to use await, you don't have to wrap it inside an async function; you can just use it in your normal code as a top-level await. Then of course inside a synchronous function you still won't be able to await asynchronous code, but we will see that in just one second.
First thing I'd like to do: this is how we structure our project. I will ask you to create a new folder called orama, enter the folder, and initialize a new project. I'm currently using pnpm. If you haven't used it yet, I highly recommend it because it's really well made. You can install it with a command that I will type in the chat. pnpm... I've got the autocorrect working against me right now.
If you don't have pnpm installed on your machine, I highly recommend it. The main advantage of pnpm over Yarn or npm is that it keeps a local cache, so you don't download the same package twice. If you have multiple projects using, let's say, Fastify or Orama, you're not downloading it twice; you're always using the version that you have saved on your machine. I highly recommend it because it saves a lot of time and makes your development process faster. So after you do that...
Sorry, stupid question: is that not the same as yarn --prefer-offline?
Um, I haven't used that, but I guess it is, yes.
This is enabled by default, which makes it...
Yeah, it's enabled by default, exactly. And if you're using a monorepo, for example, pnpm works way better in my opinion. I'm very comfortable with it when it comes to monorepos, and if you look at the Orama codebase, it uses pnpm for that reason.
Sorry, does pnpm have a global cache for the whole machine or one for each repo?
Actually both. Say you have different projects, and one project uses Orama and another project uses Orama: you install it only once and you link it from both projects, so there is one unique installation for both, if the version is the same, of course.
So I will give you 5 minutes to follow these steps. pnpm init will initialize a new project with a package.json file. You will add Orama, so @orama/orama, and then you have to go into the package.json file and add one line: you have to add type module. This is going to enable the ECMAScript module system for your project, and it's going to blow your mind, because it's going to be very, very fun if you haven't used it already. That's my favorite way of writing JavaScript nowadays. Right, I guess we can proceed. Great, so I've prepared a data set.
This is a big JSON file that is going to be the data source for our database. I would ask you to download it from the URL you see on the slide and place it in the root of your project. It's going to be called flipkart.json. I will give you 5 minutes for that because there might be some network issues depending on where you live. I live far from Milan in Italy and we have very bad Internet conditions, so I might be the one taking a bit more time to download stuff.
Now you have to create an index.js file, and in it a new Orama schema to support the following properties: product name, description, and price. Of course every property has a type, so name and description will be of type string and price will be of type number. These are JavaScript numbers. You can run the file with node index.js to check that it's working properly, and remember you're now using ECMAScript modules, so you can use import instead of require. I will give you 10 minutes. It's maybe a bit too much, so I will check in in less than 5 minutes. If you have any doubts on how it works, you should see my pointer on the screen, right? You can go to docs.oramasearch.com/usage/create and this will show you how to properly create a schema.
Maybe another question. Do we have to create a separate schema for different items? For example, here I can see directly schema and word, but for the product should I create a product, and under product all the attributes of the product, or is it a flat schema?
So the requirement is to create a schema, so you have to use the create method. The create method accepts an object, and there is a property called schema, as you can see in the slide.
Inside the schema property you should create three more properties, name, description and price, and assign them the correct types.
OK. I don't know the tool, Orama, I'm just discovering it. If we have to organize the schema a little bit, will it be just one flat object for the whole schema, or can you split it?
Yeah, you can also use nested objects, but at the end of the day it must be just one object, so you cannot have multiple schemas. You need to have one schema, but you can have nested properties. If you open the URL that you see here in magenta, you will see the documentation for this method.
OK, thank you.
No problem.
I have a question: are nested objects properly nested? For example in Elastic, usually if you write a normal object, it doesn't link all the fields together. It treats it like separate flat fields.
Yeah, like flat. OK, no, this is treated as nested. There will be cases where you want to access nested properties, so you just use, let's say, property dot another property as a property name. But it's actually nested, so you can have multiple nested properties and it's not going to be a problem.
So this is the solution I had in mind for this problem. Let me know if you did something similar, and if you did something different, I would love to learn what. I would also ask you to make it similar to what you see on the screen right now, because it's going to be useful for what we're going to do in the next step. All good, can we proceed?
I just put a product prefix to keep the schema flat, because I was wondering whether I'd have a conflict if I had something else called name. So I have just put product name, product price, product description. But it's essentially the same.
Yeah, no problem. It may actually be beneficial for your next step. We will see why. So next step, 10 minutes. Now, to feed your database, you can use two different methods.
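
Going back to the nested-schema question above, here is a minimal sketch of how a nested schema can be read as dotted property names. The helper flattenSchema and the seller property are hypothetical and only for illustration; they are not part of Orama's API, and this is not necessarily how Orama handles schemas internally.

```javascript
// Hypothetical helper (NOT an Orama function): turns a nested schema into
// a flat map of dotted paths, e.g. { seller: { name: 'string' } } -> { 'seller.name': 'string' }.
function flattenSchema(schema, prefix = '') {
  const paths = {}
  for (const [key, value] of Object.entries(schema)) {
    const path = prefix ? `${prefix}.${key}` : key
    if (typeof value === 'object' && value !== null) {
      // Nested object: recurse and merge the dotted paths it produces
      Object.assign(paths, flattenSchema(value, path))
    } else {
      // Leaf: the value is the type name ('string', 'number', ...)
      paths[path] = value
    }
  }
  return paths
}

// The three properties from the exercise plus one illustrative nested property
const exampleSchema = {
  name: 'string',
  description: 'string',
  price: 'number',
  seller: { name: 'string' } // assumption: not part of the workshop data
}

console.log(flattenSchema(exampleSchema))
```

Note that a nested seller.name does not clash with the top-level name, which is the conflict the participant was worried about when choosing a flat schema with prefixes.
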
You can use insert or insertMultiple. The difference, of course, is that insert is an async function that takes the database as the first argument and a single document as the second argument, while insertMultiple takes the database as the first argument and an array of documents as the second argument. At this point you will have to import the data from flipkart.json, and if you have never used ES modules, this is how you do it. Look at line 2 in this image: import data from flipkart.json, and you have to put assert type JSON so that Node.js knows it has to treat this data as JSON. So this is not going to be a string anymore; this is going to be a JavaScript object, so it's basically converting the JSON into a JavaScript object.
You will notice a problem, though. One thing I'd recommend you do is console.log the data, because you will notice that you created a schema with name, price, and description, but the data has product name and product price instead of just name and price. So if you try to insert the data as is, it is going to work, but you won't be able to search through it, and I will let you know why in the next step. So try to feed the database, and choose whether you want to use insert or insertMultiple to insert the data. I will let you know what I chose and why in the next step. You can also try to run the code, of course, with node index.js to make sure it works correctly.
So I tried that, and I don't have any feedback in the console regarding the insertion of the data, whether it has been inserted or not. Is there a way to have a verbose mode or something like that?
No, that's correct. You're not supposed to have any problem with the data insertion. The only problem we will notice is in the next step: if you try to search for data, you're not going to find anything. And that's because the flipkart.json file
has the data in the format of, for example, product underscore name and product underscore price, but we created a schema with name and price, so you have to convert all the documents into that format before inserting them into the database.
OK. So that's part of the task.
Now that you've brought this up, maybe it's worth explaining why. Orama is a semi-schemaless database, meaning that, as you noticed, it allows you to insert any kind of data you want, but it will only allow you to search through the data that is part of the schema. So the documents can have, let's say, a rating, for example, but you won't be able to filter by rating if you don't specify it inside the schema. That's why I wanted you to test this.
Michele, is there a method to query the number of items in the collection?
Um, give me one second. I'm going to tell you how to do this. If you try to console.log the db, so the db constant, you should have a doc size property that tells you how many documents you have. It should be 20,000.
There is also count, I guess?
Yeah, sorry, count.
Oh, and count returns a promise. Crazy.
Yeah. You can import count, I'm putting it in the chat, and use it with await... Oh, I hate this autocorrect.
OK, quick question. I inserted the JSON as is and also the modified JSON, not with parsing but with the mapping to the correct property names. In both cases I obtained 40,000 entries. That's correct, right? It's just the search which is not going to work?
Yes, exactly. Correct.
You mentioned the count function, sure, but you also said there's a property of the db that is dot size?
It should be dot doc size or something like that. I can't remember off the top of my head, but there should be one. I do recommend you use the count function anyway. I put it in the chat.
Yeah, OK, thanks.
No problem.
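
The conversion step described above can be sketched as follows. The sample record and the helper normalizeProduct are illustrative assumptions: the transcript only says the file uses names like product_name and product_price, so the exact field names and value types in flipkart.json may differ.

```javascript
// Illustrative sample standing in for flipkart.json; field names are assumed
// from what is said in the session, the values are made up.
const rawProducts = [
  { product_name: 'Sofa cover', description: 'Cotton sofa cover', product_price: '999', rating: '4.2' }
]

// Hypothetical helper: rename the fields so they match the schema
// (name, description, price) before inserting the documents.
function normalizeProduct(raw) {
  return {
    name: raw.product_name,
    description: raw.description,
    // Assumption: the price may arrive as a string, so coerce it to a number
    // because the schema declares price as 'number'.
    price: Number(raw.product_price)
  }
}

const normalizedProducts = rawProducts.map(normalizeProduct)
console.log(normalizedProducts)
```

Fields that are not in the schema, such as rating here, are simply dropped by this mapping; they could also be kept on the document, but according to the session they would not be searchable or filterable unless declared in the schema.
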
So we can proceed. This is my personal solution. I basically just use data.map, and I rename the properties that I want before inserting them into the database.
Yeah, pretty functional.
I get your joke, because we discussed this at Node Congress. The reason why I recommend you use insertMultiple: if there are, I don't know, 10 documents or 100 documents, then you can just use the insert method, but insertMultiple will create batches, so by default it will insert 500 documents at a time, unblock the event loop, and then insert another 500 documents. This is pretty important because it prevents the event loop from freezing, so make sure you use the insertMultiple function if you have to insert many documents. If you have one document, or two or three, you can just use the insert function. This is especially important if you run on the front end, so if you have a browser running this.
Now it's time to perform a search. You may now want to import the search function from @orama/orama. You can try experimenting with different search terms like Chevrolet Camaro or white shoes, and you can also try pagination by using the limit and offset search properties. This is the configuration you're going to find by default in Orama. The only mandatory property is term, so this is the search query, right? You search for Chevrolet Camaro, for example, and it's going to go through all the records and find the ones containing both Chevrolet and Camaro. By default it limits the results to 10, and the offset is zero. So you've got 10 minutes starting from now to experiment with different search methods. Sorry, I said search methods, but I meant search terms.
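
To make the batching point about insertMultiple concrete, here is a minimal sketch of the general idea: process a fixed number of items, yield to the event loop, then continue. The helpers chunk and insertInBatches and the stand-in array are hypothetical; this is not Orama's actual implementation.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  const batches = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Insert items batch by batch. `insertOne` is any async function that stores
// a single item (an assumption standing in for a real insert call).
async function insertInBatches(items, insertOne, batchSize = 500) {
  for (const batch of chunk(items, batchSize)) {
    for (const item of batch) {
      await insertOne(item)
    }
    // Yield to the event loop so timers, I/O, or UI work can run between batches
    await new Promise((resolve) => setTimeout(resolve, 0))
  }
}

// Usage with a plain array as a stand-in "database"
const demoStore = []
const demoDocs = Array.from({ length: 1200 }, (_, i) => ({ id: i }))
insertInBatches(demoDocs, async (doc) => { demoStore.push(doc) })
  .then(() => console.log(`inserted ${demoStore.length} documents`))
```

The setTimeout between batches is what keeps a browser tab or a Node.js server responsive during a large import, at the cost of a slightly longer total insertion time.
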
And if you have any doubts on how it works, this is the documentation link: docs.oramasearch.com. Feel free to go there and experiment.
OK, so it will search the term in all of name, description, and price?
Yeah, exactly. It will search in all the properties that you stated inside the schema. You can limit which properties you want to search in by using the properties property. It accepts an array of properties, so, I don't know, to search just in title you pass properties... I'm going to write it in the chat. If you set properties to an array containing only title, it is going to search in the title property only. Of course you can add more properties, and by default it searches through everything. Good question.
This is crazy fast.
That's good to hear.
Can you search for numbers, or is it text search only?
Right now it's only text. You can convert your numbers into strings and it's going to work. Otherwise, as you will see later, you can filter by number.
OK, yeah. That works. Awesome.
I will give you five more minutes so you can experiment more. Also, feel free to go to the documentation and see what other search properties we have.
So by default, the search is not case sensitive?
No, it's never case sensitive.
Oh, I tried to restrict the search to the price property. It's not possible.
Yeah, because it's a number.
Interesting. And how do you filter? What is the function?
I will show you in the next slide. It's the next exercise. Maybe we can already go to the next exercise.
Just one question, please. What does the score mean? Is it the occurrence of the term within the document?
OK, so we use an algorithm called, and I'm going to write it in the chat, BM25. This is the same algorithm used by Elastic, for example, and it basically takes the search term into consideration. So let's say you're searching for white shoes, for example.
It basically knows every single document that contains the term white and the term shoes, and based on the frequency of these terms across all the documents it assigns a score to each document that you're searching through, so the higher the score, the better the result. It also prevents abuse: if you are in e-commerce, for example, and you want to sell your shoes very fast, you might create a description writing shoes 100 times, right? If you use BM25, this is actually going to penalize your result, so it keeps the balance between spammy and non-spammy stuff and it tends to give you the most accurate result possible.
Yeah, I have one question. I ran a search query like this, and I don't know how to interpret the results.
What do you get?
So I get this, and I mean, there is no 19.95 there, so I'm wondering how the search works.
Yeah, so it basically splits the search term. In your case 19.95, as a string, is split into 19 and 95 as two separate search terms. You see that in your description you have, I don't know, 1905, and 19 is a prefix of 1905, so you're going to find this result as well, even though it's not what you're looking for. This is one trade-off you have to make when searching for numbers, for example.
So the dot is a special character?
Yes, it gets split on the dot. You can customize this behavior if you want, and I can give you the documentation for that in just one second. Let me find it. You can write your own tokenizer, and I'm putting this into the chat. So maybe this is not the correct behavior for you, and you can implement your own tokenizing function and make it work as you like.
Alright, I would proceed to the next slide. This is, for example, my full code. So the term is sofa cover, for example, and I get the results for all the sofa covers. Now, given that Lucia was wondering how we filter the data, this is how you do it.
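
Going back to the 19.95 example, the behaviour described there can be reproduced with a toy tokenizer. This is a simplified sketch, not Orama's tokenizer: it is ASCII-only, the function names are hypothetical, and the real tokenizer interface should be taken from the documentation mentioned in the session.

```javascript
// Toy tokenizer: lowercase, then split on anything that is not a letter or digit.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0)
}

// Returns the query tokens that are a prefix of at least one document token.
function prefixMatches(searchTerm, documentText) {
  const docTokens = tokenize(documentText)
  return tokenize(searchTerm).filter((queryToken) =>
    docTokens.some((docToken) => docToken.startsWith(queryToken))
  )
}

console.log(tokenize('19.95')) // [ '19', '95' ]
console.log(prefixMatches('19.95', 'Vintage design from 1905')) // [ '19' ]

// One possible custom tokenizer: keep decimal numbers such as 19.95 in one piece.
function tokenizeKeepingDecimals(text) {
  return text.toLowerCase().match(/\d+(?:\.\d+)?|[a-z]+/g) ?? []
}

console.log(tokenizeKeepingDecimals('Only 19.95 today')) // [ 'only', '19.95', 'today' ]
```

With the second tokenizer, 19.95 stays a single token, so it is no longer a prefix match for 1905. That is the kind of trade-off a custom tokenizing function lets you decide for yourself.
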
So I would like you to experiment with the search function and make sure it returns, first of all, all the products with a price within a specific range, or all the products where the price is greater than a specific number or less than a specific number, and I'm going to leave you just the documentation. I will put this in the chat. You should be able to do that by following the documentation alone, so you can search where the price is greater than, greater than or equal to, less than, less than or equal to, equal to, or between two values. So follow the documentation for filters.
Does it support arrays in the schema?
No, it does not support arrays. So if you have arrays, you can join them into a single string, and that's typically what you want to do.
Maybe one question. This is an in-memory database. Is there any option other than in memory?
What do you mean?
For example, if I stop inserting and I rerun the script, I don't have any data in the database, so it's just in memory so far.
Yes, it's in memory only. So if you rerun the script, you have to reindex the data. That's correct. There is a plugin that we have, and I'm going to send it to you in the chat, called data persistence, so you can export the data, save it to a file, and then load the data from disk instead of reindexing everything from scratch every time.
OK, thank you.
Can you do complex queries, like, I don't know, an or statement in the where, or is it only and?
Yeah, so right now we don't support choosing between or and and; it's all and. We will support it in the future, though; we are waiting to see if there's interest in doing that.
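
The filter semantics discussed above (comparison operators on numbers, every condition combined with and) can be illustrated with a small stand-alone sketch. The operator keys gt, gte, lt, lte, eq, and between, the inclusive bounds of between, and the helper matchesWhere are assumptions made for illustration; check the filters documentation for the exact syntax Orama accepts.

```javascript
// Assumed operator names, one per comparison listed in the session.
const numberOperators = {
  gt: (value, n) => value > n,
  gte: (value, n) => value >= n,
  lt: (value, n) => value < n,
  lte: (value, n) => value <= n,
  eq: (value, n) => value === n,
  // Assumption: both bounds are inclusive
  between: (value, [min, max]) => value >= min && value <= max
}

// A document matches only if EVERY condition on EVERY property holds (AND semantics).
function matchesWhere(doc, where) {
  return Object.entries(where).every(([property, conditions]) =>
    Object.entries(conditions).every(([operator, operand]) =>
      numberOperators[operator](doc[property], operand)
    )
  )
}

const sampleProducts = [
  { name: 'Sofa cover', price: 999 },
  { name: 'White shoes', price: 2499 },
  { name: 'Phone case', price: 199 }
]

// Products priced between 500 and 3000
console.log(sampleProducts.filter((p) => matchesWhere(p, { price: { between: [500, 3000] } })))
// Products priced above 200 AND below 1000
console.log(sampleProducts.filter((p) => matchesWhere(p, { price: { gt: 200, lt: 1000 } })))
```

Because every condition must hold, there is no way to express an or in this sketch, which mirrors the limitation mentioned in the answer.
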
Nice, thanks.
Because the biggest problem we noticed is that, you know, this is a search engine and we don't want people to use it as a primary database, so we think that having Postgres or MySQL is a better alternative for that. That's why we are not pushing a lot into solutions where you can, let's say, join stuff or run hardcore Boolean queries. But yeah, one simple and or one simple or query, that will be supported in the future.
Nice, thanks.
Michele, now that you mention it: when would I prefer Orama over Postgres or MySQL?
Oh, they are very different. If you need a strong database with frequent writes and reads, you definitely want Postgres. At Orama we are using Postgres as a primary database ourselves, because if you want to keep user data, for example, it's perfect and it scales well. I think you should use Orama only in the case where, first of all, you want to run full-text search for your product. Postgres works, but it's not a full-text search engine. It has extensions to work as such, but it's not optimized to do that. And when you scale, for example, Postgres is going to be a bit more costly and more difficult to maintain, while Orama is either running in your browser, so you have no maintenance to do, or you can run it on CDNs and edge networks and make it very, very cheap and easy to maintain. That's what we push for.
So let me ask how I can envision my workflow. I have a marketplace, right? I have my database with all my data. And then what am I going to do? Am I going to load that data to the CDN and let Orama run there on the data?
Yes, it runs directly on the CDN, correct.
And in the browser, I mean, it will only work in the browser if I'm sending all the data to the user, right?
Yeah, the problem is that the JSON you downloaded is 15 megabytes, so it might be a bit too much for the browser. So I recommend you run it on a CDN directly, which is faster. This is part of what we're going to create as a company. We will be creating a global deployment platform where you can run search on a CDN. Deploying Orama on a CDN is not super easy, so we will make this automatic and easy for you. Otherwise, if you have the skills, you could do it already with the open source version: you can create, I don't know, an AWS Lambda to run on the edge, or Cloudflare Workers, etc. You can choose what you prefer and use them to create low-latency and very fast search experiences directly on CDNs.
OK, and one last question. Imagine I am working for a marketplace, right, for some company that is selling stuff, and they want to update the stock. It's enough to update it only once every day, right? Once every night. What I would do is, I have my Lambda, right? Most stupid approach ever: I would upload the Lambda together with a JSON, and then I would let the Lambda load the JSON and search. No, that would be too slow, right? Because I have to load it first.
Yeah, exactly. That's one of the secret parts that allows us to run on the cloud networks. If you're interested in that, I've got nothing to sell right now, meaning I'm not really interested in selling, but I'm very interested in having feedback. So if you join our Slack channel, and I'm putting the link here in the chat, you will find it on a slide at the end of the workshop anyway, make sure to write to me and I will put you in contact with my colleagues so we can help you do this.
OK, but there is a solution?
Yes, there is one.
OK. And is it in the future for the filters? Now it says it supports numerical and Boolean properties.
Is it expected to have something like keywords in Elastic, where you don't need a faceted search? It should be more straightforward, an exact match.
Yes, definitely. I've got one colleague who is about to release this today.
OK, nice.
You were on point, absolutely. Alright, I guess we can proceed then. This is my solution: greater than, less than, between. I guess you all had similar solutions for your queries. So we're getting close to the last exercise. This is the biggest one, and let me know if you want to do it or if you want to skip directly to the Q&A; I can also just show you the solution and answer your questions in the Q&A. That's your choice. It takes about 30 minutes, so I guess we don't have that much time. The task would be: install Fastify, create a Fastify server that accepts GET requests at the path /search, and make sure that the Orama database is loaded before accepting any Fastify requests. For every GET request at /search, take the term query parameter and return the result of an Orama query for that term. If you agree, I will show you the solution so you can follow it and see how to integrate this with a Fastify server. How does it sound?
Agreed.
Alright, so I'll be sharing it. We're saving some time so we can have a little Q&A, and I guess that's better for everyone. So this is how I typically do that. First of all, of course, you install Fastify and you import it. You create the database, normalize the data, insert the data, then you initialize Fastify and create a route. You get the term; if the term is missing, you just reply with an error, and otherwise you perform the search and send the results. If you want to copy this code, I will give you 10 minutes starting from now to copy the code and try to run it.
I mean, this is the solution that I would know how to deploy.
Yeah, this is the solution. I decided to share this solution because we don't have much time.
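
The route logic described above (read the term, reply with an error if it is missing, otherwise search and send the results) can be sketched independently of any web framework. The names readTerm, handleSearch, and fakeSearch, the 400 status code, and the shape of the fake result are assumptions for illustration; this is not the code shown on the slide, and wiring it into Fastify and Orama is left out.

```javascript
// Extract a usable search term from a parsed query string object, or null if absent.
function readTerm(query) {
  const term = typeof query.term === 'string' ? query.term.trim() : ''
  return term.length > 0 ? term : null
}

// Framework-agnostic handler: `runSearch` is any async function that takes a term
// and resolves to search results (in the workshop it would wrap the Orama search).
async function handleSearch(query, runSearch) {
  const term = readTerm(query)
  if (term === null) {
    // Assumption: a missing term is reported as a 400 Bad Request
    return { statusCode: 400, body: { error: 'Missing "term" query parameter' } }
  }
  const results = await runSearch(term)
  return { statusCode: 200, body: results }
}

// Stand-in search function so the sketch runs without any dependency
const fakeSearch = async (term) => ({ count: 1, hits: [{ document: { name: `${term} (demo)` } }] })

handleSearch({ term: 'sofa cover' }, fakeSearch).then((response) => console.log(response))
handleSearch({}, fakeSearch).then((response) => console.log(response))
```

In a real server the database would be created and filled before the server starts listening, so that no request arrives while the index is still empty, which is the requirement stated in the exercise.
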
OK, no, I'm not criticizing you. This one I understand, because I could have multiple servers, for example one in every AWS region or something, and then I would restart the server once a night with the new data. That much I understand. The Lambda part is tricky.
It is. I would recommend you join the Slack channel so we can discuss it and I can show you how we do it. I can't do this publicly yet.
It's fine. When are you going to make it public?
Yeah, I don't have the roadmap right now, but it's going to be soon, because eventually we need, you know, some money to run the company.
Yes. And then it's going to be a paid CDN cloud service, right?
Yeah, or the idea would be that you can also use your own CDN, and you will see how, but it's still in the works. We will need some beta users, though, so if you join the Slack channel, I'll be glad to add you as one of the beta users. And because I already told this to Lucia, I will put this in the Slack: this is the pull request my colleague made just now for string filters. Just because you asked.
Nice, thanks.
No problem. He opened this like 6 minutes ago.
Is there any plan to support some algorithm like... I used to use Solr, or SolrCloud, I don't remember, and they have some kind of algorithm like Snowball, I don't remember exactly what it's called. It allows you to search for terms with a certain precision and to find documents even if you make a typo, for example.
Yeah, we actually have it. It's a property called tolerance. By default it is zero. But if you search for a term, I don't know, sofa, and set tolerance to one, then if you make one typo it's going to find the result anyway. If you set tolerance to two and you make two typos, it's going to find the result anyway. Of course it's going to be slower, but it's working and it's already supported.
OK, great.
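
One common way to count typos is edit distance (Levenshtein distance): the number of single-character insertions, deletions, or substitutions needed to turn one word into another. The sketch below shows that idea; whether Orama measures tolerance exactly this way is an assumption, and editDistance and withinTolerance are hypothetical names.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows in memory.
function editDistance(a, b) {
  // previous[j] = distance between the processed prefix of `a` and the first j chars of `b`
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const current = [i]
    for (let j = 1; j <= b.length; j++) {
      const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)
      current[j] = Math.min(
        previous[j] + 1, // delete a character from `a`
        current[j - 1] + 1, // insert a character into `a`
        substitution // replace (or keep) a character
      )
    }
    previous = current
  }
  return previous[b.length]
}

// A candidate word matches if it is at most `tolerance` edits away from the term.
function withinTolerance(term, candidate, tolerance = 0) {
  return editDistance(term.toLowerCase(), candidate.toLowerCase()) <= tolerance
}

console.log(withinTolerance('sofa', 'soffa', 0)) // false: exact match required
console.log(withinTolerance('sofa', 'soffa', 1)) // true: one extra letter is tolerated
```

Every extra unit of tolerance widens the set of candidate words that must be considered, which is consistent with the remark that tolerant searches are slower.
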
Michele, while we're waiting, what is your competition?
You mean in the market?
Yes.
Um, I would say mainly, I don't know, Algolia and Meilisearch. Elasticsearch is not really a competitor to me because they do a great job in log management, for example; they're very good at this. But Algolia and Meilisearch are also very good at what they do. We just provide a different way of searching to people who have different needs, so I don't like calling them competitors. They just have different products and so do we. That's really it. But they are all good. If you followed my talk at Node Congress, I was saying that I actually started this company because I wanted to learn more, and of course I had some issues with the other systems, but it's me, you know; it's my problem, not theirs. So I mainly wanted to solve my own problems, and maybe other people have the same problems. That's really it.
And what is the name of the other one? There's Algolia, and the other one you mentioned is Meilisearch?
Meilisearch, I'm going to write it in the chat. This is also very good. It's written in Rust, very fast.
I would never have guessed the spelling.
Have you been running on venture capital until now?
No, not yet. We are bootstrapped.
What's the meaning of bootstrapped in this context?
We just use our own money. So right now we don't have any VC, for many reasons, but that's it for now. Maybe in the future, but I'm not the person deciding; our CEO is the person making this kind of decision, of course, so let's see what he decides. I'm the CTO, so I'm, you know, the person that breaks the code rather than the company, which is good.
Alright, have you been able to run this code? Awesome. So I guess that's really it. I wanted to have a little Q&A session with you. I would really appreciate it if you could leave us a star on GitHub. That can really help us.
If you scan this QR code and go to bit.ly.amorama you will find all the links to, I don't know, the Slack channel, Instagram, Twitter. Of course, follow us everywhere and join the Slack channel. It is pretty important to join the Slack channel: we will be able to provide you support there. There is an entire community with hundreds of members that can provide support for whatever you need. So I highly recommend you join it. And yeah, before thanking you all, I would also like to have a kind of Q&A. So if you have any questions, please go ahead.
Is it open source? Orama? Yeah, yeah, it is open source. So if you go to github.com/oramasearch/orama, you will find the code base. I'm putting the link, not into the description, but into the chat. Be crazy. So there's a challenge to monetize an open source project. Yeah, hopefully.
How does the indexing work again? What part of the indexing? Yeah, good question. Anything in general? I'm guessing when it comes to, like, nested objects and so on. I don't know, because you said you don't flatten them. No, I don't really know. Yeah, so internally we actually flatten them, but we allow you to express your documents by nesting the properties, which is very convenient in many different situations. This also doesn't bring any disadvantage to the end user, so you can basically reason about it as if it wasn't flattened, which is good. So the process for indexing: we basically tokenize the input. If you want, we can also stem the input, so we apply stemming. We support 26 languages out of the box. So for example, run, running, runner: they all get stemmed to run. So we can keep the meaning of a word rather than the word itself, if you want, and this is something you can enable. And then we put everything inside a very big radix tree for strings, an AVL tree for numbers, and a HashMap for Booleans.
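The indexing steps just described (flatten nested properties, tokenize, optionally stem, then store) can be sketched roughly as follows. Everything here is a simplified assumption for illustration: `flatten`, `tokenize`, `toyStem` and `indexDocument` are hypothetical names, the stemmer is a toy suffix stripper for a handful of English words, the tokenizer only handles ASCII letters and digits, and a plain object stands in for the radix tree. It is not Orama's code.

```typescript
type Doc = { [key: string]: string | number | boolean | Doc };
type Flat = Record<string, string | number | boolean>;

// Flatten nested properties into dot-separated paths: { a: { b: 1 } } becomes { "a.b": 1 }.
function flatten(doc: Doc, prefix = "", out: Flat = {}): Flat {
  for (const key of Object.keys(doc)) {
    const value = doc[key];
    const path = prefix ? prefix + "." + key : key;
    if (typeof value === "object") {
      flatten(value, path, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

// Lowercase, then split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Toy stemmer: strips a few English suffixes. Real stemmers are far more careful.
function toyStem(token: string): string {
  for (const suffix of ["ning", "ner", "ing", "s"]) {
    const hasSuffix = token.slice(-suffix.length) === suffix;
    if (hasSuffix && token.length > suffix.length + 2) {
      return token.slice(0, token.length - suffix.length);
    }
  }
  return token;
}

// Inverted index keyed by "path:stem"; a plain object stands in for the radix tree.
function indexDocument(index: Record<string, number[]>, id: number, doc: Doc): void {
  const flat = flatten(doc);
  for (const path of Object.keys(flat)) {
    const value = flat[path];
    if (typeof value !== "string") continue; // numbers and booleans would go to other structures
    for (const token of tokenize(value)) {
      const key = path + ":" + toyStem(token);
      if (!index[key]) index[key] = [];
      if (index[key].indexOf(id) === -1) index[key].push(id);
    }
  }
}

const demoIndex: Record<string, number[]> = {};
indexDocument(demoIndex, 1, { title: "Running shoes", meta: { tags: "runner gear", rating: 5 } });
```

In this sketch the nested `meta.tags` field is indexed under its flattened path, while the caller still writes the document in its nested form, which mirrors the point that flattening stays invisible to the user.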
So these are the data structures that are currently involved in the process.
And for languages: I just saw that the DB has language as a property. You said you support 26 languages, but in a single database you would only expect to have data in one language, right? Yeah, so if you need more languages, I would recommend you create more databases for different languages: one database, let's say, for English, one for Spanish, one for Italian, one for French. So you can have multiple databases. That's the biggest difference between us and Elastic, for example: Elastic allows you to put everything inside one big instance, but these will grow in size, right, so you have to maintain large servers, etc. We recommend you create different databases so that they are cheaper, smaller, faster and overall better. That makes sense then, yeah.
How do you manage when you have to update your database? Do you have any tips for that? Yeah, so there is the update function and the remove function. You can update or remove documents, but I would recommend re-indexing everything every time, which sounds expensive in terms of time, but that's actually the best way to deal with JavaScript-based databases. So if you have a lot of frequent changes, like if your data changes every second, I recommend you use Elastic or Algolia or Meilisearch. They are amazing at this. Orama is optimized for frequent reads but infrequent writes. Yes, OK, makes sense. Thank you. No problem.
So if you don't have any other questions, I guess that's all. Please join the Slack channel so that we can keep in contact. And if you have any other questions, please ask. Then thank you all for your time and it's been a real
49 min
20 Apr, 2023
