Extract Text from PDF Online
Extract all text from a PDF file and save it as a plain text file.
Extracting text from a PDF is useful when you need to copy the content into a word processor, run it through a spell checker, search it with command-line tools, feed it into a script or language model, or simply store the words without the PDF structure. Not all PDFs are equal, though. PDFs exported from text-based sources such as Word files or InDesign layouts contain embedded text that can be extracted cleanly. Scanned documents saved as PDFs are essentially page images and contain no extractable text unless an OCR text layer was added; those need an OCR tool first. Dockitt extracts text from searchable PDFs directly in your browser using PDF.js.
How to use
- Click 'Choose PDF' and select the PDF file you want to extract text from.
- Click 'Extract Text' and wait while each page is processed.
- Download the TXT file. All text from all pages is combined into a single file with page markers.
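As a rough illustration of the last step, the sketch below shows one way per-page text could be combined into a single string with page markers. The function name joinPages and the exact blank-line spacing are assumptions made for this example; only the '--- Page N ---' marker format comes from this page, and this is not necessarily how Dockitt builds the file.

```typescript
// Hypothetical helper (not Dockitt's actual code): combine the text of each
// page into one string, putting a "--- Page N ---" marker before every page.
function joinPages(pageTexts: string[]): string {
  const sections = pageTexts.map(
    (text, index) => `--- Page ${index + 1} ---\n${text.trim()}`
  );
  // Assumed layout: one blank line between pages, trailing newline at the end.
  return sections.join("\n\n") + "\n";
}

const combinedText = joinPages(["First page text.", "Second page text."]);
console.log(combinedText);
// --- Page 1 ---
// First page text.
//
// --- Page 2 ---
// Second page text.
```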
FAQ
Why does the extracted text look garbled or out of order?
A PDF does not necessarily store text in reading order. It stores text fragments positioned on a page, and the order in which they are stored depends on how the PDF was created. PDFs with complex multi-column layouts, tables, or non-standard text flow may therefore produce out-of-order text when extracted. This is a limitation of the PDF format, not a bug in the tool. For structured data such as tables, a dedicated PDF table extractor is more appropriate.
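To see why position-based ordering breaks down on multi-column pages, consider a minimal sketch of a naive reading-order heuristic. The Fragment shape, the toLines function, and the tolerance value are assumptions made for this illustration; they are not the PDF.js data model or Dockitt's method. Coordinates follow the usual PDF convention of an origin at the bottom-left with y growing upward.

```typescript
// Assumed fragment shape for illustration: a string plus the position of its
// left edge in page coordinates (origin bottom-left, y grows upward).
interface Fragment {
  text: string;
  x: number;
  y: number;
}

// Naive heuristic: group fragments whose y values are within a tolerance into
// the same line, order lines top to bottom, then order each line left to right.
function toLines(fragments: Fragment[], yTolerance: number = 2): string[] {
  const sorted = fragments.slice().sort((a, b) => b.y - a.y || a.x - b.x);
  const lines: { y: number; frags: Fragment[] }[] = [];
  for (const frag of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current.y - frag.y) <= yTolerance) {
      current.frags.push(frag);
    } else {
      lines.push({ y: frag.y, frags: [frag] });
    }
  }
  return lines.map((line) =>
    line.frags
      .sort((a, b) => a.x - b.x)
      .map((f) => f.text)
      .join(" ")
  );
}

// A two-column page: the heuristic merges both columns into each line.
const twoColumnPage: Fragment[] = [
  { text: "Left line 1", x: 50, y: 700 },
  { text: "Left line 2", x: 50, y: 680 },
  { text: "Right line 1", x: 300, y: 700 },
  { text: "Right line 2", x: 300, y: 680 },
];
console.log(toLines(twoColumnPage));
// [ 'Left line 1 Right line 1', 'Left line 2 Right line 2' ]
```

A reader would expect the whole left column followed by the whole right column, but nothing in the fragment positions alone says where one column ends and the next begins.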
The output file is empty. What went wrong?
If the output is empty, the PDF is most likely a scanned document: paper pages that were scanned and saved as page images inside a PDF. Scanned PDFs contain pictures of text, not actual text characters. To extract text from one, you need an OCR (Optical Character Recognition) tool that reads the image and converts it to text. This tool only works with PDFs that contain embedded text.
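If you process many files in a script, a quick check like the sketch below can flag outputs that probably came from scanned PDFs. The function name hasExtractableText is hypothetical, and the check assumes the output may still contain '--- Page N ---' marker lines even when no text was found, which this page does not confirm.

```typescript
// Hypothetical check: returns false when the extracted output contains nothing
// except whitespace and (assumed) "--- Page N ---" marker lines.
function hasExtractableText(extracted: string): boolean {
  const withoutMarkers = extracted.replace(/^--- Page \d+ ---$/gm, "");
  return withoutMarkers.trim().length > 0;
}

console.log(hasExtractableText("--- Page 1 ---\n\n--- Page 2 ---\n")); // false
console.log(hasExtractableText("--- Page 1 ---\nHello world\n")); // true
```

A false result only suggests that the file needs OCR; it does not prove that the PDF is a scan.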
How are the pages separated in the output file?
Each page is preceded by a separator line in the format '--- Page N ---', where N is the page number. This makes it easy to find content from a specific page in the output file. You can search for '--- Page' in any text editor to jump between page boundaries.
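Because the markers follow a fixed pattern, the output is also easy to split back into pages in a script. The following is a minimal sketch; the names splitByPage and PageText are hypothetical, and it assumes each marker sits alone on its own line.

```typescript
// Hypothetical result shape: the page number and the text found under it.
interface PageText {
  page: number;
  text: string;
}

// Split the extracted output into pages using the "--- Page N ---" markers.
// Assumes each marker occupies a whole line; text before the first marker is ignored.
function splitByPage(extracted: string): PageText[] {
  const marker = /^--- Page (\d+) ---$/;
  const pages: PageText[] = [];
  let current: { page: number; lines: string[] } | null = null;

  for (const line of extracted.split(/\r?\n/)) {
    const match = marker.exec(line);
    if (match) {
      // A new marker closes the previous page and starts the next one.
      if (current) {
        pages.push({ page: current.page, text: current.lines.join("\n").trim() });
      }
      current = { page: Number(match[1]), lines: [] };
    } else if (current) {
      current.lines.push(line);
    }
  }
  if (current) {
    pages.push({ page: current.page, text: current.lines.join("\n").trim() });
  }
  return pages;
}

const sampleOutput = "--- Page 1 ---\nHello\nworld\n\n--- Page 2 ---\nBye\n";
console.log(splitByPage(sampleOutput));
// [ { page: 1, text: 'Hello\nworld' }, { page: 2, text: 'Bye' } ]
```

Note that a line in the document body that happens to match the marker pattern exactly would be treated as a page boundary.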
Does the tool preserve formatting like bold, italics, or headings?
No. Plain text does not support rich formatting, so all text is extracted as flat, unformatted characters. Bold, italics, font size differences, and any other visual formatting are not preserved in the TXT output. If you need to preserve formatting, consider converting the PDF to a Word document or working in a PDF editor instead.
Can I extract text from a password-protected PDF?
No. The PDF must be unlocked before text can be extracted. If your PDF is password-protected, use the Dockitt Unlock PDF tool to remove the password first, then run the unlocked file through the text extraction tool.
Is my document content kept private?
Yes. Text extraction is done entirely in your browser using PDF.js. The content of your PDF is never sent to any server. The file is processed locally on your device, and the extracted text file is created and downloaded directly in your browser without any data leaving your machine.