Luca Puzzoni

aka k0d14k.

Paper bin

by k0d14k

Paper bin

CTF: Angstrom 2019

Category: misc

Difficulty: Easy-Medium

Description


defund accidentally deleted all of his math papers! Help recover them from his computer’s raw data.

address : https://2019.angstromctf.com/challenges

hint : File carving

Solution


In this challenge We have to restore defund’s pdfs from a dump.

As the hint says We can perform a File carving over the dump to get these pdfs.

I’ve tried with two different tools, binwalk and foremost. Binwalk dumps too much files so I prefered use foremost to have a cleaner output.

┌──(kali㉿kali)-[~/…/AngstromCTF/2019/misc/paperBin]
└─$ foremost paper_bin.dat
Processing: paper_bin.dat
|*|
                                                                                                                                                                                                                                              
┌──(kali㉿kali)-[~/…/AngstromCTF/2019/misc/paperBin]
└─$ cd output  
                                                                                                                                                                                                                                              
┌──(kali㉿kali)-[~/…/2019/misc/paperBin/output]
└─$ ll
total 8
-rwxrwx--- 1 root vboxsf 1607 Nov 24 10:17 audit.txt
drwxrwx--- 1 root vboxsf 4096 Nov 24 10:17 pdf
                                                                                                                                                                                                                                              
┌──(kali㉿kali)-[~/…/2019/misc/paperBin/output]
└─$

As We can see foremost has extract an audit.txt file and a pdf directory

audit.txt contains many util information.

Foremost version 1.5.7 by Jesse Kornblum, Kris Kendall, and Nick Mikus
Audit File

Foremost started at Thu Nov 24 10:17:43 2022
Invocation: foremost paper_bin.dat 
Output directory: /home/kali/Documents/vm_shared/ctf/AngstromCTF/2019/misc/paperBin/output
Configuration file: /etc/foremost.conf
------------------------------------------------------------------
File: paper_bin.dat
Start: Thu Nov 24 10:17:43 2022
Length: 6 MB (7123824 bytes)
 
Num      Name (bs=512)         Size      File Offset     Comment 

0:      00000000.pdf         341 KB             222      
1:      00000688.pdf         346 KB          352478      
2:      00001384.pdf         363 KB          708830      
3:      00002112.pdf         353 KB         1081566      
4:      00002824.pdf         333 KB         1446110      
5:      00003496.pdf         370 KB         1790174      
6:      00004240.pdf         349 KB         2171102      
7:      00004944.pdf         388 KB         2531550      
8:      00005728.pdf         356 KB         2932958      
9:      00006448.pdf         318 KB         3301598      
10:     00007088.pdf         411 KB         3629278      
11:     00007912.pdf         351 KB         4051166      
12:     00008616.pdf         317 KB         4411614      
13:     00009256.pdf         319 KB         4739294      
14:     00009896.pdf         339 KB         5066974      
15:     00010576.pdf         324 KB         5415134      
16:     00011232.pdf         321 KB         5751006      
17:     00011880.pdf         293 KB         6082782       (PDF is Linearized)
18:     00012472.pdf         374 KB         6385886      
19:     00013224.pdf         344 KB         6770910      
Finish: Thu Nov 24 10:17:43 2022

20 FILES EXTRACTED

pdf:= 20
------------------------------------------------------------------

Foremost finished at Thu Nov 24 10:17:43 2022

mh what does it means that a PDF is linearized?

From a google research I’ve discovered as follow:

Sometimes called “fast web view,” linearization is a special way of saving a PDF file that 
organizes its internal components to make them easier to read when the file is streamed over a 
network connection. While a standard, non-linearized PDF stores information associated with each 
page across the entire file, linearized PDFs use an object tree format to consolidate page elements
in an ordered, page by page basis.

So in this 00011880.pdf file we have a linear structure that maybe can be readed by a human.

Let’s try to open it with bless.

The file’s structure doesn’t appear human readable but, at the end of the file, I found this:

Screenshot from 2022-11-24 10-24-04.png

This pdf file is produced by pdfTeX that’s a TeX plugin and TeX is a technology used to format texts with a simple sintax. Mh very cool, Are We able to convert this pdf into a TeX file?

Of course, just we need to install pdftotext tool (read Links and tools).

┌──(kali㉿kali)-[~/…/2019/misc/paperBin/output]
└─$ pdftotext 
pdftotext version 22.08.0
Copyright 2005-2022 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>             : first page to convert
  -l <int>             : last page to convert
  -r <fp>              : resolution, in DPI (default is 72)
  -x <int>             : x-coordinate of the crop area top left corner
  -y <int>             : y-coordinate of the crop area top left corner
  -W <int>             : width of crop area in pixels (default is 0)
  -H <int>             : height of crop area in pixels (default is 0)
  -layout              : maintain original physical layout
  -fixed <fp>          : assume fixed-pitch (or tabular) text
  -raw                 : keep strings in content stream order
  -nodiag              : discard diagonal text
  -htmlmeta            : generate a simple HTML file, including the meta information
  -tsv                 : generate a simple TSV file, including the meta information for bounding boxes
  -enc <string>        : output text encoding name
  -listenc             : list available encodings
  -eol <string>        : output end-of-line convention (unix, dos, or mac)
  -nopgbrk             : don't insert page breaks between pages
  -bbox                : output bounding box for each word and page size to html. Sets -htmlmeta
  -bbox-layout         : like -bbox but with extra layout bounding box data.  Sets -htmlmeta
  -cropbox             : use the crop box rather than media box
  -colspacing <fp>     : how much spacing we allow after a word before considering adjacent text to be a new column, as a fraction of the font size (default is 0.7, old releases had a 0.3 default)
  -opw <string>        : owner password (for encrypted files)
  -upw <string>        : user password (for encrypted files)
  -q                   : don't print any messages or errors
  -v                   : print copyright and version info
  -h                   : print usage information
  -help                : print usage information
  --help               : print usage information
  -?                   : print usage information
                                                                                                                                                                                                                                              
┌──(kali㉿kali)-[~/…/2019/misc/paperBin/output]
└─$ pdftotext pdf/00011880.pdf 
                                                                                                                                                                                                                                              
┌──(kali㉿kali)-[~/…/2019/misc/paperBin/output]
└─$ cat pdf/*.txt | grep 'actf'
actf{proof by triviality}

Remember to replace spaces with _.

Flag : actf{proof_by_triviality}

Links and tools


pdftotext : https://www.makeuseof.com/how-to-convert-linux-pdf-file-to-text-document/