Automated vocabulary mining

Upload content.
Get a deck.
Start studying.

Drop any PDF, video, or audio file. Saikutsu extracts vocabulary, builds cloze cards from real sentences, and schedules reviews with FSRS. Minutes, not hours.

採掘 — 日本語の森 ep.42
Card 47 of 312 | Deck: 日本語の森

青豆は、その________を黙って見つめていた。

Aomame silently gazed at that ________.

FSRS v5 — Good: 4.2d ███████▁▁▁ 70% retention
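For the curious, the sample card above can be approximated in a few lines. This is a minimal sketch, not Saikutsu's actual code: the `ClozeCard` type and `make_cloze` function are hypothetical names, and it assumes the target word appears in the sentence exactly as written.

```python
from dataclasses import dataclass

BLANK = "_" * 8  # same width as the blank on the sample card


@dataclass
class ClozeCard:
    front: str  # sentence with the target word blanked out
    back: str   # the hidden word


def make_cloze(sentence: str, target: str) -> ClozeCard:
    """Blank out the first occurrence of `target` in `sentence`."""
    if target not in sentence:
        raise ValueError(f"{target!r} does not occur in the sentence")
    return ClozeCard(front=sentence.replace(target, BLANK, 1), back=target)


card = make_cloze("I read the novel on the train.", "novel")
print(card.front)  # I read the ________ on the train.
print(card.back)   # novel
```

A real pipeline would more likely blank the word by token position than by string search, so that conjugated forms are matched correctly.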

The problem

Mining shouldn't take longer than reading.

Look up a word, copy the sentence, format the card, add it to Anki. Repeat 50 times. By then, you've forgotten what you were reading.

30–45 min

spent per session creating cards manually. Most people give up before finishing.

5 tools

PDF reader, dictionary, spreadsheet, Anki, tokenizer. For podcasts, add even more.

0 flow

Context switching between reading and card creation destroys immersion entirely.

Features

Everything you need to mine smarter.

Any input, one pipeline

PDF, video, or audio — novels, podcasts, anime, lectures. We extract or transcribe, tokenize, and build your deck.

FSRS v5 scheduling

State-of-the-art spaced repetition built in. 20–40% fewer reviews than SM-2 for the same retention. No Anki setup required.

Word/definition + i+1 cards

Choose your card type. Word/definition for straight vocabulary. i+1 for sentences with exactly one unknown word — the fastest path to comprehension.

Smart deduplication

Already know a word? We skip it. Every new deck checks your entire collection so you never study the same word twice.
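As an illustration of the idea, deduplication against a whole collection can be sketched as a set lookup. The function names, the normalization rule (trim and case-fold), and the toy decks are all assumptions made for this example, not a description of the real implementation.

```python
def normalize(word: str) -> str:
    """Trim and case-fold so 'Casa ' and 'casa' count as the same word."""
    return word.strip().casefold()


def skip_known(candidates: list[str], collection: dict[str, list[str]]) -> list[str]:
    """Return candidates found in no deck, keeping order and dropping repeats."""
    known = {normalize(w) for words in collection.values() for w in words}
    fresh = []
    for word in candidates:
        key = normalize(word)
        if key and key not in known:
            known.add(key)  # a word mined twice in one upload is kept only once
            fresh.append(word)
    return fresh


collection = {"Deck A": ["casa", "perro"], "Deck B": ["libro"]}
print(skip_known(["Casa", "gato", "libro", "gato", "mesa"], collection))
# ['gato', 'mesa']
```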

Frequency ranking

Words are ranked by how often they appear in your document and how common they are in the language overall. Learn high-value words first.
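How the two frequencies are combined is not stated here, so the following is only one plausible sketch: sort by in-document count first and break ties with a language-wide rank (1 = most common). The `rank_words` function and the toy rank table are hypothetical.

```python
from collections import Counter


def rank_words(tokens: list[str], language_rank: dict[str, int]) -> list[str]:
    """Order words by in-document count, then by language-wide rank."""
    counts = Counter(tokens)
    unranked = len(language_rank) + 1  # words missing from the table sort last
    return sorted(
        counts,
        key=lambda w: (-counts[w], language_rank.get(w, unranked), w),
    )


tokens = ["gato", "casa", "gato", "mesa", "casa", "zarzamora"]
language_rank = {"casa": 1, "gato": 2, "mesa": 3}  # made-up ranks for illustration
print(rank_words(tokens, language_rank))
# ['casa', 'gato', 'mesa', 'zarzamora']
```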

Deep Anki integration

Import existing decks — we skip words you know. Export new decks as .apkg. No duplicates, ever.

How it works

Three steps. That's it.

01

Upload your content

Drag and drop any PDF, video, or audio file: novels, podcast episodes, anime, lectures. Supported languages are Japanese, Spanish, French, German, Italian, and Portuguese.

podcast_ep42.mp3 18.7 MB

02

We do the mining

We run transcription, tokenization, dictionary lookups, frequency analysis, i+1 sentence detection, and deduplication against your existing cards. Processing usually takes under 5 minutes.

Audio transcribed
Tokenized — 312 words (47 already known, skipped)
Found 83 i+1 sentences
Building cards...

03

Start studying

Your deck is ready. Review in-app with FSRS scheduling, or export to Anki. Start learning in minutes instead of hours.

312 cards ready
Start Review →

Pricing

Free to start. Upgrade when you need to.

Free

$0/mo

  • 30 reviews / day
  • 3 uploads / month
  • 5 decks
  • 1 Anki export / month
  • Anki deck import

Get Started

Pro

$10/mo

or $60/yr

  • Unlimited reviews
  • Unlimited uploads
  • Unlimited decks
  • Unlimited exports

Start 7-Day Trial

FAQ

Common questions.

Which languages are supported?

Japanese is our primary focus, with full tokenization and dictionary support. Spanish, French, German, Italian, and Portuguese are supported with stemming-based tokenization. More coming.

I already use Anki. Why would I switch?

You don't have to. Import your existing decks and we'll skip words you already know. Export new decks as .apkg. No duplicates. Use both tools together.

What's FSRS?

Free Spaced Repetition Scheduler — a modern, open-source algorithm more efficient than SM-2. It learns your memory patterns and optimizes review timing. 20–40% fewer reviews for the same retention.
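To make the scheduling idea concrete, here is a small sketch of the forgetting curve as published for FSRS-4.5 and FSRS-5, to the best of our understanding. It covers only the curve and the interval derived from it. The part that learns from your review history (how stability changes after each answer) is left out, and nothing here reflects Saikutsu's own parameters or defaults.

```python
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that recall probability is 0.9 when elapsed == stability


def retrievability(elapsed_days: float, stability: float) -> float:
    """Estimated probability of recalling a card after `elapsed_days`."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


def next_interval(stability: float, desired_retention: float = 0.9) -> float:
    """Days to wait until recall probability falls to `desired_retention`."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)


print(round(retrievability(10, 10), 3))  # 0.9
print(round(next_interval(10, 0.9), 2))  # 10.0
print(round(next_interval(10, 0.8), 2))  # 23.98
```

Lowering the desired retention stretches the interval, which is the trade-off between fewer reviews and how much you remember.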

What formats are supported?

Video: MP4, MKV, WebM, MOV. Audio: MP3, M4A, WAV, OGG, FLAC. Documents: PDF with selectable text. Files over 25 MB are automatically chunked for transcription.
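How the chunking is done is not described here, so treat the following as a rough sketch under stated assumptions: the 25 MB limit is read as decimal megabytes, the bitrate is roughly constant, and the file is cut into equal time spans. The `plan_chunks` name and the durations in the example are hypothetical.

```python
import math

LIMIT_BYTES = 25 * 1000 * 1000  # assumption: "25 MB" means decimal megabytes


def plan_chunks(
    size_bytes: int, duration_s: float, limit: int = LIMIT_BYTES
) -> list[tuple[float, float]]:
    """Split a recording into equal (start, end) spans in seconds.

    Each span is expected to fit under `limit` if the bitrate is roughly constant.
    """
    n = max(1, math.ceil(size_bytes / limit))
    step = duration_s / n
    return [(i * step, (i + 1) * step) for i in range(n)]


print(plan_chunks(18_700_000, 1200.0))       # [(0.0, 1200.0)]
print(len(plan_chunks(60_000_000, 3600.0)))  # 3
```

A production splitter would probably cut at pauses in speech rather than at fixed times, so that words are not split across chunks.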

What's an i+1 sentence?

A sentence where you know every word except one. It's the ideal context for learning — enough comprehension to guess meaning from context, with exactly one new word to acquire. We find these automatically across your content.
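The definition translates almost directly into code. This is a minimal sketch that assumes sentences are already tokenized into dictionary forms and that your known words are available as a set. The function name and sample data are hypothetical.

```python
def find_i_plus_one(
    sentences: list[list[str]], known: set[str]
) -> list[tuple[list[str], str]]:
    """Return (sentence, new_word) pairs for sentences with exactly one unknown word."""
    hits = []
    for tokens in sentences:
        unknown = {t for t in tokens if t not in known}
        if len(unknown) == 1:
            hits.append((tokens, next(iter(unknown))))
    return hits


known = {"the", "cat", "sat", "on", "a", "mat"}
sentences = [
    ["the", "cat", "sat"],                       # nothing new: skipped
    ["the", "cat", "sat", "on", "a", "sofa"],    # one new word: i+1
    ["a", "sofa", "creaked"],                    # two new words: skipped
]
for tokens, word in find_i_plus_one(sentences, known):
    print(" ".join(tokens), "->", word)
# the cat sat on a sofa -> sofa
```

This version counts distinct unknown words, so a sentence that repeats the same new word twice still qualifies.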

What happens with words I already know?

We check every word against your entire card collection across all decks. Known words are skipped automatically. No duplicates, no wasted reviews.

How accurate is the tokenization?

Japanese uses MeCab-compatible tokenization with dictionary-form extraction, so you get clean vocabulary words rather than conjugated forms. The European languages use stemming.

Can I use scanned PDFs?

Not yet. We need digital text to extract vocabulary. If your PDF has selectable text, you're good. Alternatively, upload an audio recording of the text instead.

Found a bug?

Email kamdyn@saikutsu.com or visit our contact page. We're a small team building this because we use it ourselves.

Ready to automate your mining?

Upload your first file and see the difference. Free to start, no credit card required.