Automated vocabulary mining
Upload content.
Get a deck.
Export to Anki.
Drop in a PDF, video, or audio file. Saikutsu extracts vocabulary, builds cards from real sentences, and exports them to Anki. Minutes, not hours.
We're building this in the open with early testers. Beta access is free — we're looking for people who mine regularly and want to help shape the tool.
青豆は、その風景を黙って見つめていた。
scenery; landscape; view
The problem
Mining shouldn't take longer than reading.
Look up a word, copy the sentence, format the card, add it to Anki. Repeat 50 times. By then, you've forgotten what you were reading.
30–45 min
spent per session creating cards manually. Most people give up before finishing.
5 tools
PDF reader, dictionary, spreadsheet, Anki, tokenizer. For podcasts, add a transcription tool on top.
0 flow
Context switching between reading and card creation destroys immersion entirely.
Features
Everything you need to mine smarter.
Any input, one pipeline
PDF, video, or audio — novels, podcasts, anime, lectures. We extract or transcribe, tokenize, and build your deck.
Cards in context
Every card shows the target word bolded inside a real sentence from your content. Flip for the definition and reading. Context-first learning.
All words + i+1 decks
Get two decks: one with every unique word, and one with only i+1 sentences — where you know every word except the target. The fastest path to comprehension.
Smart deduplication
Already know a word? We skip it. Every new deck checks your entire collection so you never create the same card twice.
Frequency ranking
Words are ranked by how often they appear in your content and how common they are in the language overall. Learn high-value words first.
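One way to combine the two signals, sketched in Python under our own assumptions: sort words by how often they occur in the uploaded content, then break ties with a language-wide frequency rank. The `rank_words` helper, the `corpus_rank` table, and this tie-breaking scheme are hypothetical; they illustrate the idea, not Saikutsu's actual weighting.

```python
from collections import Counter

def rank_words(doc_lemmas: list[str], corpus_rank: dict[str, int]) -> list[str]:
    """Order unique lemmas by in-document count (descending), then by
    language-wide rank (ascending, 1 = most common word).
    Lemmas missing from corpus_rank sort after ranked ones on ties."""
    counts = Counter(doc_lemmas)
    return sorted(
        counts,
        key=lambda w: (-counts[w], corpus_rank.get(w, float("inf"))),
    )

# Hypothetical data: lemmas from one document and a made-up frequency list.
doc = ["casa", "hablar", "casa", "perro", "hablar", "quijote"]
corpus = {"hablar": 90, "casa": 150, "perro": 900}
print(rank_words(doc, corpus))  # ['hablar', 'casa', 'perro', 'quijote']
```

Sorting on a tuple keeps the primary signal (your content) in charge, while the general frequency list only decides between words that appear equally often.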
Deep Anki integration
Import existing Anki decks — we skip words you already know. Export new decks as .apkg with sentences, readings, and definitions. No duplicates, ever.
Subtitle mining
Search for TV shows and anime, then extract vocabulary directly from subtitle files. Jimaku for Japanese, OpenSubtitles for other languages — build decks from content you're actually watching.
How it works
Three steps. That's it.
Upload your content
Drag and drop a PDF, video, or audio file: novels, podcast episodes, anime, lectures. Supported languages are Japanese, Spanish, French, German, Italian, and Portuguese.
We do the mining
Transcription, tokenization, dictionary lookups, frequency analysis, i+1 sentence detection, deduplication against your existing cards. Usually under 5 minutes.
Export to Anki
Your deck is ready. Browse cards, edit anything you want, then export as .apkg and import straight into Anki. Mining in minutes instead of hours.
Early access
We're looking for early testers.
Saikutsu is in early beta. We're building this for the immersion learning community, and we want your input on what to build next and how pricing should work.
What beta testers get
- ✓ Free access to everything for the entire beta period. No limits.
- ✓ Direct input on features, priorities, and how pricing gets structured.
- ✓ Founding member pricing — when we launch, beta testers get grandfathered into the best tier.
Compatibility
Built for Anki users.
Saikutsu handles the mining. Anki handles the reviewing. Your other tools still fit right in.
Made for Anki
Import your existing Anki decks so we know what you've already learned — we'll skip those words when mining new content. When you're done, export your new cards as .apkg and import them straight into Anki.
Cards include the word in context (with the target word bolded), definitions, and readings. Ready to study the moment you import.
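Anki note fields are stored as HTML, so one plausible way to produce the bolded-in-context front is to escape the sentence and wrap the target word in `<b>` tags. This is a sketch under that assumption, not Saikutsu's exporter: the `make_card` helper and the `Front`/`Back` field names are hypothetical, and writing the actual .apkg file is left out.

```python
import html

def make_card(sentence: str, target: str, reading: str, definition: str) -> dict[str, str]:
    """Build front/back fields for one context card.
    The sentence is HTML-escaped and the first occurrence of the
    target word is wrapped in <b>...</b> (hypothetical field names)."""
    escaped = html.escape(sentence)
    word = html.escape(target)
    front = escaped.replace(word, f"<b>{word}</b>", 1)
    back = f"{html.escape(reading)}<br>{html.escape(definition)}"
    return {"Front": front, "Back": back}

card = make_card("青豆は、その風景を黙って見つめていた。", "風景", "ふうけい", "scenery; landscape; view")
print(card["Front"])  # 青豆は、その<b>風景</b>を黙って見つめていた。
```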
Works alongside Yomitan
Yomitan is great for what it does: hover over a word while you're reading, get an instant definition, and mine it on the spot. That's real-time, one-at-a-time mining, and it's perfect for immersion reading.
Saikutsu is for everything else. Drop an entire PDF, a podcast episode, or a movie file and extract hundreds of vocabulary words at once. Batch mining and real-time mining solve different problems, so use both.
Works alongside Jimaku
Saikutsu searches Jimaku's subtitle library to find Japanese subtitles for the shows you're watching. Select an episode, and we tokenize the subtitle text, look up definitions, and generate vocabulary cards automatically.
Jimaku handles hosting and indexing Japanese subtitles — we handle turning them into Anki-ready decks. For non-Japanese languages, we pull from OpenSubtitles instead.
FAQ
Common questions.
Which languages are supported?
We currently support Japanese, Spanish, French, German, Italian, and Portuguese. All languages are available on every plan, including Free. We're actively adding more.
How does tokenization work for different languages?
For Japanese, we use MeCab-compatible tokenization with full dictionary lookup. For European languages (Spanish, French, German, Italian, and Portuguese), we use lemmatization to reduce each word to its base dictionary form, so conjugated verbs, plural nouns, and inflected adjectives all map back to the word you actually need to learn. No more creating separate cards for "hablar", "hablando", and "hablo".
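To make the grouping step concrete, here is a minimal Python sketch of collapsing inflected forms into one entry per dictionary word. The `LEMMAS` table is a hypothetical stand-in for a real lemmatizer (which would also use part-of-speech context); only the collapse-to-one-card idea is illustrated.

```python
# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"hablo": "hablar", "hablando": "hablar", "habló": "hablar", "casas": "casa"}

def lemmatize(token: str) -> str:
    """Return the dictionary form; tokens not in the table map to themselves."""
    t = token.lower()
    return LEMMAS.get(t, t)

def unique_lemmas(tokens: list[str]) -> list[str]:
    """One entry per dictionary form, in order of first appearance."""
    return list(dict.fromkeys(lemmatize(t) for t in tokens))

print(unique_lemmas(["Hablo", "hablando", "hablar", "casas", "casa"]))  # ['hablar', 'casa']
```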
I already use Anki. Why would I use this?
Saikutsu isn't replacing Anki — it's the fastest way to get cards into it. Upload a PDF or video, and we'll extract vocabulary, find example sentences, look up definitions, and export a ready-to-import .apkg. You keep studying in Anki. We just handle the mining.
What formats are supported?
Video: MP4, MKV, WebM, MOV. Audio: MP3, M4A, WAV, OGG, FLAC. Documents: PDF with selectable text. Files over 25 MB are automatically split into chunks for transcription.
What's an i+1 sentence?
A sentence where you know every word except one. It's an ideal learning context: you understand enough to infer the new word's meaning, and there's exactly one new word to acquire. We find these automatically across your content.
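The i+1 check itself reduces to a set difference. A minimal Python sketch, assuming each sentence has already been tokenized and lemmatized and that `known` holds the lemmas from your imported decks (the `find_i_plus_one` helper and its inputs are hypothetical):

```python
def find_i_plus_one(sentences: list[list[str]], known: set[str]) -> list[tuple[str, int]]:
    """Return (target_word, sentence_index) for every sentence that contains
    exactly one distinct lemma not in `known`."""
    hits = []
    for i, lemmas in enumerate(sentences):
        unknown = set(lemmas) - known
        if len(unknown) == 1:
            hits.append((unknown.pop(), i))
    return hits

known = {"yo", "hablar", "con", "mi", "perro"}
sentences = [
    ["yo", "hablar", "con", "mi", "perro"],   # nothing new: skipped
    ["yo", "hablar", "con", "mi", "vecino"],  # exactly one new word: i+1
    ["mi", "vecino", "tener", "un", "gato"],  # four new words: skipped
]
print(find_i_plus_one(sentences, known))  # [('vecino', 1)]
```

Because the comparison uses distinct lemmas, a sentence that repeats the same unknown word twice still counts as i+1.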
What happens with words I already know?
We check every word against your entire card collection across all decks. Import your existing Anki decks first, and we'll skip those words automatically when mining new content. No duplicates, no wasted cards.
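Deduplication against an imported collection can be sketched as a normalized set lookup. In this Python sketch the `new_words` and `normalize` helpers are hypothetical, and the normalization (trim whitespace, case-fold) is our assumption about what counts as "the same word"; real matching would likely also compare lemmas and strip HTML from Anki fields.

```python
def normalize(word: str) -> str:
    # Assumption: imported Anki fields often differ only in case or stray whitespace.
    return word.strip().casefold()

def new_words(candidates: list[str], existing: list[str]) -> list[str]:
    """Keep candidates not already in `existing`, also dropping repeats
    within this batch, while preserving first-seen order."""
    known = {normalize(w) for w in existing}
    fresh = []
    for w in candidates:
        key = normalize(w)
        if key not in known:
            known.add(key)  # prevents a second card for the same word in this batch
            fresh.append(w)
    return fresh

existing = ["Casa", " perro"]
print(new_words(["casa", "gato", "perro", "gato", "vecino"], existing))  # ['gato', 'vecino']
```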
Can I edit cards before exporting?
Yes. Browse and edit any card in your deck — change definitions, add notes, delete words you don't want. Export when you're satisfied.
Can I use scanned PDFs?
Not yet. We need digital text to extract vocabulary, so if your PDF has selectable text, you're good. We're working on OCR support for scanned PDFs, but the recognition quality isn't good enough yet.
Found a bug?
Email kamdyn@saikutsu.com or visit our contact page. We're a small team building this because we use it ourselves.
Built with
Saikutsu is built on top of several open-source projects and community-maintained data sources. We're grateful for their work.
Help us build the mining tool you actually want.
We're a small team from the immersion learning community. We built Saikutsu because we needed it ourselves — and now we're looking for early users to help us get it right. Beta access is free.