June 28, 2026

My AI Workflow for Saving Book Notes

My Kindle highlights were collecting digital dust — until I used Python, Claude, and some AI prompts to turn them into study guides.

There’s something deeply satisfying about sinking into an intriguing nonfiction book. I love learning new ideas and highlighting passages, both for entertainment and to become a better person. I love finishing these books feeling like I've absorbed something real.

And then, almost without fail, I love forgetting 80% of it within six months.

I know I'm not alone here. The gap between taking in knowledge and retaining it is one of the most frustrating parts of self-directed learning. I read the book, highlight the good parts, and even think of some ways I can use this knowledge in my life. Then when someone asks me six months later what I took away from it, all that knowledge has conveniently left my head. I mumble something about "there was a good chapter on estimation" and hope they don't ask more.

So I did what most programmers do. I ruminated for a while, took an angry walk to blow off some steam, and then made something to help solve it. In this case, I built a pipeline that takes my Kindle highlights from raw markup to polished, study-guide-quality notes. The most important part: AI is the bridge between "collecting notes" and "making readable and focused notes" with minimal work and friction on my side.

Here's exactly how it works, including the code you can steal.

The Workflow at a Glance

The pipeline has five stages to go from Kindle highlights to reliable study notes:

  1. Highlight with intent: tag passages in my Kindle app with a color key based on each passage's role.
  2. Export from Kindle: grab the raw HTML file.
  3. Preprocess: run the HTML through a Python script that converts it into structured Markdown with semantic prefixes.
  4. Generate the study notes: run an AI prompt that transforms the Markdown into a polished, 1,000+ word study guide.
  5. Reinforce (optional): run a second AI prompt that turns key concepts into spaced-repetition flashcards.

The output looks like this: a 14-section, organized summary of The Clean Coder by Robert C. Martin that I can revisit any time without re-reading 200 pages.

Let's walk through each stage.

Stage 1: Highlighting with Intent

Most people highlight books in one color. That's fine for "this seems important," but it gives an AI (or your future self) zero information about why you highlighted it. The passage could be a core argument, a supporting example, or something you want to quote verbatim.

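I use all four of the Kindle app's highlight colors, each mapped to a specific semantic role: orange for the most important points, pink for supporting context, blue for passages I want to quote directly, and yellow for general highlights with no special role.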

This system doesn't interrupt the flow of reading — I’m just reaching for a different color instead of the same one. It encodes enough structure that the AI can later make intelligent decisions about what matters most.

Kindle lets you attach text notes to any highlight, and I use this to capture stray thoughts, connections, or potential ideas. Those notes get merged into the relevant highlight during preprocessing so nothing gets lost.

Stage 2: Exporting from Kindle

Kindle stores your highlights and notes in a notebook file accessible from the Kindle app. The raw, exported HTML output looks something like this:

<div class="sectionHeading">1. Professionalism</div>
<div class="noteHeading">
Highlight(<span class="highlight_orange">orange</span>)
</div>
<div class="noteText">
Professionalism is a loaded term. Certainly it is a badge of
honor and pride, but it is also a marker of responsibility
and accountability.
</div>

This is parseable but also noisy, and is exactly the kind of tedious data-munging task I love writing scripts for.
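To show how little it takes to pull structure out of that excerpt, here is a minimal sketch using only Python's standard-library `html.parser`. It is not the script from the repository (which uses BeautifulSoup); the class name `KindleExportParser` and the helper `parse_export` are hypothetical, and the only CSS classes it knows about are the three that appear in the excerpt above.

```python
from html.parser import HTMLParser

WATCHED_CLASSES = ("sectionHeading", "noteHeading", "noteText")


class KindleExportParser(HTMLParser):
    """Collect chapter/color/text records from a Kindle notebook export."""

    def __init__(self):
        super().__init__()
        self.records = []      # one dict per highlight found so far
        self._chapter = None   # text of the most recent sectionHeading
        self._color = None     # color named in the most recent noteHeading
        self._current = None   # class of the div currently being read
        self._buffer = []      # text fragments collected inside that div

    def handle_starttag(self, tag, attrs):
        css = dict(attrs).get("class") or ""
        if tag == "div" and css in WATCHED_CLASSES:
            self._current = css
            self._buffer = []
            if css == "noteHeading":
                self._color = None  # do not carry a color over from the last entry
        elif tag == "span" and css.startswith("highlight_"):
            self._color = css.split("_", 1)[1]

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "div" or self._current is None:
            return
        # Collapse the hard line breaks inside the export into single spaces.
        text = " ".join("".join(self._buffer).split())
        if self._current == "sectionHeading":
            self._chapter = text
        elif self._current == "noteText":
            self.records.append(
                {"chapter": self._chapter, "color": self._color, "text": text}
            )
        self._current = None


def parse_export(html):
    """Hypothetical helper: feed an export string and return its records."""
    parser = KindleExportParser()
    parser.feed(html)
    parser.close()
    return parser.records


SAMPLE = """
<div class="sectionHeading">1. Professionalism</div>
<div class="noteHeading">
Highlight(<span class="highlight_orange">orange</span>)
</div>
<div class="noteText">
Professionalism is a loaded term. Certainly it is a badge of
honor and pride, but it is also a marker of responsibility
and accountability.
</div>
"""

records = parse_export(SAMPLE)
print(records[0]["chapter"], records[0]["color"])
```

Real exports contain more than this excerpt shows (title, authors, user notes), so treat this as a starting point rather than a complete parser.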

One unexpected benefit of these exports: Kindle limits how much of a book you can export as highlights (usually a percentage cap). At first this annoyed me, but now I see it as curation enforcement. I have to highlight only what genuinely matters, not every third paragraph.

Stage 3: The Preprocessing Script

I originally wrote this step as a Ruby script (240 lines across four small classes) using Nokogiri to parse the HTML, extract elements by CSS class, and map color classes to semantic prefixes. I've since ported it to Python, because Python increasingly dominates the AI/LLM ecosystem and this tool sits squarely in that world.

The script does exactly four things:

  1. Parse the Kindle HTML and extract the book title, author(s), chapter headings, and every highlight/note pair.
  2. Map highlight colors to semantic prefixes: IMPORTANT: for orange, CONTEXT: for pink, DIRECT QUOTE: for blue, and none for yellow.
  3. Attach user-written notes to the highlight that precedes them. A note I jotted down about a passage gets merged into that passage's bullet point.
  4. Output clean Markdown with ## CHAPTER: headers and bullet-pointed highlights.
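Steps 2 through 4 can be sketched in a few lines, assuming the highlights have already been parsed into dictionaries. This is an illustration, not the repository's code: the `kind`, `chapter`, `color`, and `text` keys are hypothetical, the "(Note: ...)" merge format is my own guess, the sample records are placeholders rather than real highlights, and title/author extraction is left out.

```python
# Color-to-prefix mapping as described in step 2; yellow gets no prefix.
COLOR_PREFIXES = {
    "orange": "IMPORTANT: ",
    "pink": "CONTEXT: ",
    "blue": "DIRECT QUOTE: ",
    "yellow": "",
}


def merge_notes(records):
    """Fold each user note into the highlight that precedes it (step 3)."""
    merged = []
    for record in records:
        if record["kind"] == "note" and merged:
            # Assumed merge format: append the note in parentheses.
            merged[-1]["text"] += f' (Note: {record["text"]})'
        else:
            # Highlights, and any note with nothing before it, stay as bullets.
            merged.append(dict(record))
    return merged


def to_markdown(records):
    """Render records as chapter headers plus bullet points (step 4)."""
    lines = []
    chapter = None
    for record in merge_notes(records):
        if record["chapter"] != chapter:
            chapter = record["chapter"]
            if lines:
                lines.append("")  # blank line between chapters
            lines.extend([f"## {chapter}", ""])
        prefix = COLOR_PREFIXES.get(record.get("color"), "")
        lines.append(f"* {prefix}{record['text']}")
    return "\n".join(lines)


# Placeholder records, not real highlights from the book.
sample_records = [
    {"kind": "highlight", "chapter": "2. Saying No", "color": "orange",
     "text": "First placeholder passage."},
    {"kind": "note", "chapter": "2. Saying No", "color": None,
     "text": "Placeholder note."},
    {"kind": "highlight", "chapter": "2. Saying No", "color": "yellow",
     "text": "Second placeholder passage."},
]

markdown = to_markdown(sample_records)
print(markdown)
```

Keeping the prefixes in a single dictionary means changing the color scheme later is a one-line edit.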

Here's what a highlight from a later chapter looks like after it’s run through this script:

## 2. Saying No

* IMPORTANT: The most important time to say no is when the stakes
are highest. The higher the stakes, the more valuable no becomes.

The full Python script is available on GitHub — four files, no dependencies beyond BeautifulSoup. Drop your Kindle HTML exports into a notes/ directory and run make run. This will process them all and put them in the markdown/ directory.
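The Makefile isn't reproduced in this post, so the following is only a guess at the shape of a batch step like `make run`: read every HTML export in one directory and write a Markdown file with the same name into another. The function name `convert_all` and the `convert` callback (any function that takes HTML text and returns Markdown text) are hypothetical.

```python
from pathlib import Path


def convert_all(source_dir, target_dir, convert):
    """Apply `convert` to each .html file in source_dir and write .md files."""
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    written = []
    for html_path in sorted(source.glob("*.html")):
        markdown = convert(html_path.read_text(encoding="utf-8"))
        out_path = target / f"{html_path.stem}.md"
        out_path.write_text(markdown, encoding="utf-8")
        written.append(out_path)
    return written


if __name__ == "__main__":
    # Stand-in converter for demonstration; swap in a real HTML-to-Markdown function.
    for path in convert_all("notes", "markdown", lambda html: html):
        print(f"wrote {path}")
```

Passing the converter in as an argument keeps the file handling separate from the parsing, which makes each half easier to test on its own.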

Stage 4: The AI Prompt

This is where the magic happens.

The processed Markdown gets fed to an AI model with a structured prompt. I use Claude because I've found it emphasizes analytical detail and is less prone to dropping information. The full prompt lives here as a Notion page that I keep updated, but the high-level principles are:

The cost for all this? Pennies per book, if anything. Even a dense 300-page nonfiction book with hundreds of highlights costs a fraction of a dollar to process through Claude's API. And if that’s still too much, you can run it through a free model like Gemini.
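If you want to sanity-check the "pennies" claim against your own notes, a back-of-the-envelope estimate looks something like the sketch below. Everything numeric here is an assumption: the four-characters-per-token and 1.3-tokens-per-word figures are rough rules of thumb for English text, and the prices are placeholders, not current rates for any provider. Check your provider's pricing page and plug in real numbers.

```python
def estimate_cost(markdown, expected_output_words,
                  input_price_per_mtok, output_price_per_mtok,
                  chars_per_token=4.0, tokens_per_word=1.3):
    """Rough dollar cost of one study-guide request.

    Prices are in dollars per million tokens. The token ratios are
    heuristics, so treat the result as an order-of-magnitude figure.
    """
    input_tokens = len(markdown) / chars_per_token
    output_tokens = expected_output_words * tokens_per_word
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Placeholder inputs: 60,000 characters of notes, a 1,500-word guide,
# and made-up prices of $3 (input) and $15 (output) per million tokens.
notes = "x" * 60_000
cost = estimate_cost(notes, 1_500, input_price_per_mtok=3.0,
                     output_price_per_mtok=15.0)
print(f"about ${cost:.2f}")  # about $0.07 with these placeholder numbers
```

The estimate ignores the length of the prompt itself, which adds a little to the input side.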

Stage 5: The Output

The notes for “The Clean Coder” are organized into thematic sections with prose that flows naturally. Each section synthesizes multiple highlights into coherent arguments, with bolded key concepts, callout blocks, and block quotes.

What I love about this format is that it's genuinely usable. Six months after reading the book, I can open this page and in 90 seconds refresh myself on the major arguments. I can search for "TDD" or "estimation" and land exactly where I need. It's not a replacement for the book, and if I need a serious refresh then I’ll reread the section. But it's a strong exoskeleton for my long-term memory.

All processed notes live in my Notion book database — searchable, filterable, and linked to related projects and goals. The AI output gets saved directly to the corresponding book page using a Notion automation. So the entire pipeline from "finished reading" to "study guide ready" is one script run and one AI call.

Fun fact: when this script was completed, I exported the highlights for over a dozen books I'd read over the last few years. The highlights were all yellow, but the resulting notes were still informative and concise.

Bonus: Flashcards for Spaced Reinforcement

Notes are great for reference, but for ideas I want truly internalized, I learned I need to go one step further. That requires converting the notes into a format that’s easier for small, routine reviews: flashcards.

To do this, I run the notes through a second AI prompt that converts them into Q&A flashcards. The specific output is a CSV file that can be imported into AlgoApp, a spaced-repetition app that schedules card reviews at algorithmically optimized intervals. This means all my book notes can be imported and reviewed at any time, and the app keeps track of my strong and weak points.
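If the model hands back question/answer pairs rather than finished CSV, Python's `csv` module handles the quoting for you. This is a minimal sketch: the two-column layout and the `question`/`answer` header names are assumptions (the import format AlgoApp expects isn't specified here), and the sample cards are placeholders.

```python
import csv
import io


def flashcards_to_csv(cards):
    """Serialize (question, answer) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["question", "answer"])  # assumed column names
    for question, answer in cards:
        writer.writerow([question.strip(), answer.strip()])
    return buffer.getvalue()


# Placeholder cards; commas and quotes are escaped by the csv module.
cards = [
    ("What does the orange highlight color mean?", "An important point"),
    ('Which prefix marks a "verbatim" passage, if any?', "DIRECT QUOTE:"),
]

csv_text = flashcards_to_csv(cards)
print(csv_text)
```

Using `csv.writer` instead of joining strings by hand avoids broken rows when a question or answer contains a comma or a quotation mark.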

The best part? Each deck can be reviewed on its own, or with a collection of other decks. So I can make a collection for programming, one for personal development, one for communication, and more. I can reinforce my knowledge on any topic at any point.

Between the study guide notes and the spaced repetition cards, I've built a system where a book I read two years ago will actively contribute to what I know today with minimal friction and effort.

The Script’s Honest Limitations

This has worked well for me so far, but of course it’s not perfect. Here's what I've learned about the edge cases:

In my opinion, the trade-off — pennies and 10 minutes of review for a permanent, searchable knowledge base of every book I read — is absolutely worth it.

Steal This Workflow

Everything I've described is available for you to adapt as you see fit.

The whole thing costs pennies per book (at most), takes maybe 15 minutes of human attention to review the AI output, and produces a permanent, searchable knowledge base of everything I've read. I think this is one of the best current uses for AI: as a scaffold and accelerant for existing knowledge and strengths.

If you're a big reader like me, these tools are for you.

Footnotes

  1. Or if I'm 'lucky,' 60% in eight months.
  2. Loud techno music optional but highly recommended.
  3. For the record, I only used a heart emoji since there is no pink circle emoji for some reason. And yes, I have lodged a formal complaint.
  4. And by 'magic,' I mean 'prompt engineering that took several iterations to get right.'