X-Team has a growing number of AI experts in its midst, and Senior Fullstack Engineer Wojciech Jaszczak is one of them. He builds AI projects and recently launched childbook.ai, where the power of AI is used to create unique children's books.
In this interview, we discuss how he came up with the idea, what technologies he used to build everything out, and what plans he has for the product's future.
First things first. What's childbook.ai? Give me the elevator pitch.
With the help of childbook.ai, anyone can create personalized children's stories. Give a short description of the story and its characters, and our AI will create a personalized and unique story with beautiful illustrations and voice-overs. It takes less than five minutes.
Sounds amazing! How did you come up with the idea?
We came up with the idea when my wife, a self-taught frontend developer, and I took part in an AI hackathon. We knew that AI could generate both text and images, and wanted to create something useful and fun. Personalized children's stories sounded like a great idea.
We had previously built a product using Stable Diffusion and knew it could generate high-quality images if instructed properly. We also knew that GPT-3 could generate text. So we combined both to see how far we could get.
The MVP was built in just a weekend and was fully functional. We've been working on it ever since, and the quality of its stories and illustrations has greatly improved already.
So what technologies does it currently run on?
We were able to deliver the MVP in two days because of the technologies we chose:
- DALL-E for the illustrations.
- Stable Diffusion to post-process the illustrations.
- GPT-3 to generate the text.
- tRPC to quickly iterate on both backend and frontend. For those unfamiliar, tRPC lets you write a typed backend that's used like a TanStack
- Redis with BullMQ to queue jobs.
- PostgreSQL to store all the stories.
- AWS S3 to store the images.
- Stripe to handle payments.
- Tailwind CSS and React Headless UI for components like modals and tooltips. Today I'd probably opt for shadcn/ui or Mantine instead.
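To illustrate the job-queue item in the list above, here is a minimal, library-free sketch of how generating one book could be split into queued jobs. It is an in-memory stand-in and does not use or imitate BullMQ's API; the job names, the `InMemoryQueue` class, and the story ID are all hypothetical.

```typescript
// In-memory stand-in for a job queue. This is NOT BullMQ's API.
type JobName = "text" | "illustration" | "voiceOver"; // hypothetical job kinds

interface Job {
  id: number;
  name: JobName;
  storyId: string;
}

class InMemoryQueue {
  private jobs: Job[] = [];
  private nextId = 1;

  // Add a job to the back of the queue and return it.
  add(name: JobName, storyId: string): Job {
    const job: Job = { id: this.nextId++, name, storyId };
    this.jobs.push(job);
    return job;
  }

  // Process jobs in first-in, first-out order until the queue is empty.
  drain(handler: (job: Job) => void): number {
    let processed = 0;
    let job = this.jobs.shift();
    while (job !== undefined) {
      handler(job);
      processed++;
      job = this.jobs.shift();
    }
    return processed;
  }
}

const queue = new InMemoryQueue();
queue.add("text", "story-1");
queue.add("illustration", "story-1");
queue.add("voiceOver", "story-1");

// Record the order in which jobs were handled.
const order: string[] = [];
const processedCount = queue.drain((job) => {
  order.push(`${job.name}:${job.storyId}`);
});
```

A real queue backed by Redis would add persistence, retries, and concurrent workers, which this sketch leaves out.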
What was the hardest part about building it?
The hardest part was making sure the illustrations looked good. We had to experiment a lot with the parameters of Stable Diffusion and DALL-E. We also had to experiment with the text generation to make the stories coherent and interesting.
AI is very unpredictable, and prompts that give very good results for some stories can give very bad results for others. We had to experiment a lot to find the right prompts that would work for most stories.
We still have a lot of work ahead of us to improve the quality of the stories and illustrations. We are working on it every day. When we started, there was no API for ChatGPT. We had to work with GPT-3, which had a lot of limitations. Now, we have access to GPT-4, which is a huge step forward in terms of consistency. It actually returns what you'd expect now.
Tech-wise, because we use TypeScript across the whole stack, we didn't have many bugs and could iterate really quickly. It was a very pleasant experience that I'd recommend to anyone starting a new project.
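A small sketch may help show why sharing TypeScript across the stack catches bugs early. It is library-free and is not tRPC's API; the `createStory` function, its fields, and the sample request are hypothetical. The point is only that the "frontend" types are derived from the "backend" function instead of being declared a second time.

```typescript
// ---- "Backend" side (hypothetical procedure) ----
interface CreateStoryInput {
  description: string;
  characters: string[];
}

interface CreateStoryOutput {
  storyId: string;
  pageCount: number;
}

function createStory(input: CreateStoryInput): CreateStoryOutput {
  // Build a simple ID from the character names; a real backend would persist the story.
  const slug = input.characters.join("-").toLowerCase();
  return { storyId: `story-${slug}`, pageCount: 12 };
}

// ---- "Frontend" side ----
// These types are derived from the backend function, not re-declared,
// so a change to createStory's signature surfaces here at compile time.
type Input = Parameters<typeof createStory>[0];
type Output = ReturnType<typeof createStory>;

function describeStory(result: Output): string {
  return `${result.storyId} has ${result.pageCount} pages`;
}

const request: Input = {
  description: "A fox learns to share",
  characters: ["Fox", "Owl"],
};

const summary = describeStory(createStory(request));
```

If the backend renamed `pageCount`, `describeStory` would stop compiling, which is the kind of error that otherwise only shows up at runtime.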
How are you currently marketing it?
We aren't, at the moment. We have around a hundred and fifty daily users from organic traffic. We're currently building out a few big features. Once those are done, we'll market the product and aim for more users.
But, in terms of usage, we've already generated over 2,500 books. We have a good number of subscribers and are growing at around 10% week-on-week.
Tell me about those big features.
Firstly, we're working on a DreamBooth implementation. DreamBooth allows you to train AI from a set of photos, which it then uses in generated illustrations. So our subscribers could upload photos of their friends or family and create a story with them illustrated. A perfect gift for any occasion.
We also want to give our subscribers the ability to generate physical copies of their books. We already get the illustrations in high-quality 4K resolution, a prerequisite for print. So once we get the formatting done, we want to make this work too.
And we want our books to have more pages. Currently it's limited to twelve pages to ensure the highest quality. We want to increase that limit, but without sacrificing quality. So we're working on that, too.
Keeping quality consistent across many pages must be a difficult task. How do you currently do it?
We generate the base images with DALL-E (SDXL). Then we post-process them twice to align the styling and the visuals. We do this because we found that entering too much into the prompt causes the image to lose details, so we just generate the image a few times instead.
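The generate-then-refine flow described above can be sketched as a short pipeline. This is an illustration under assumptions, not childbook.ai's implementation: the `Image` shape, the `Generate` and `PostProcess` function types, and the stub functions are hypothetical, and no real image model is called.

```typescript
// Hypothetical image record: the prompt it came from and the passes applied so far.
interface Image {
  prompt: string;
  passes: string[];
}

// Hypothetical signatures for the two kinds of steps.
type Generate = (prompt: string) => Promise<Image>;
type PostProcess = (image: Image, instruction: string) => Promise<Image>;

// Generate a base image from a short prompt, then apply each
// post-processing instruction in order as a separate pass.
async function illustratePage(
  prompt: string,
  generate: Generate,
  postProcess: PostProcess,
  instructions: string[],
): Promise<Image> {
  let image = await generate(prompt);
  for (const instruction of instructions) {
    image = await postProcess(image, instruction);
  }
  return image;
}

// Stubs standing in for real model calls, so the sketch runs on its own.
const fakeGenerate: Generate = async (prompt) => ({ prompt, passes: [] });
const fakePostProcess: PostProcess = async (image, instruction) => ({
  ...image,
  passes: [...image.passes, instruction],
});

// One short base prompt, followed by two passes: styling, then visuals.
const page = illustratePage("a fox in a forest", fakeGenerate, fakePostProcess, [
  "align styling",
  "align visuals",
]);
```

Keeping the base prompt short and moving the styling into separate passes mirrors the reasoning in the answer: each step gets a small instruction instead of one overloaded prompt.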
What's your end goal for childbook.ai? The dream scenario?
There are many ways we can see this product grow. I can imagine a future where childbook.ai is synonymous with personalized children's stories. I can also see a future where we expand into fantasy stories (as just one example).
We could also create more curated stories, where AI is only the copilot instead of the main driver. It'd be a way to release the creativity of people who may not have the artistic skills, but who have the imagination to create something beautiful.
If you enjoyed this article and are interested in how your company could benefit from AI, don't hesitate to contact X-Team. We have several software engineers like Wojciech who are deeply experienced with AI, and who would love to build awesome software to help you reach your business goals.