Conversation

Notices

jonny (jonny@social.coop)'s status on Wednesday, 26-Jan-2022 16:44:59 UTC jonny

More fun publisher surveillance:
Elsevier embeds a hash in the PDF metadata that is *unique for each time a PDF is downloaded*. The attachment is a diff between the metadata from two downloads of the same paper. Combined with access timestamps, they can uniquely identify the source of any shared PDF.
In conversation Wednesday, 26-Jan-2022 16:44:59 UTC from social.coop permalink
Attachments
1. [A list of metadata for a PDF. The important fields are two "Unknown:<long random character string>" entries, color-coded to indicate that they changed between versions.]
  https://social-coop-media.ams3.cdn.digitaloceanspaces.com/media_attachments/files/107/685/726/594/579/059/original/053de9506bb007c6.jpg
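
The comparison described in the post can be sketched in a few lines of Python. This is only an illustration under assumptions: the metadata is taken to be already extracted into plain dictionaries (for example by a metadata tool or a PDF library, which is outside this sketch), and the field names and values below are hypothetical placeholders, not the ones Elsevier actually uses.

```python
from typing import Dict, Optional, Tuple

Metadata = Dict[str, str]


def diff_metadata(
    first: Metadata, second: Metadata
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every field whose value differs between two metadata dumps.

    A field missing from one dump is reported with None on that side.
    """
    changed = {}
    for key in sorted(set(first) | set(second)):
        a, b = first.get(key), second.get(key)
        if a != b:
            changed[key] = (a, b)
    return changed


# Hypothetical metadata from two downloads of the same paper.
download_a = {"Title": "Example Paper", "Unknown": "aaaa1111"}
download_b = {"Title": "Example Paper", "Unknown": "bbbb2222"}

for field, (a, b) in diff_metadata(download_a, download_b).items():
    print(f"{field}: {a} != {b}")
```

Any field that shows up in this diff for two copies of the same paper is a candidate per-download identifier, since the content itself has not changed.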
- Seachaint :verified: (seachaint@hackers.town)'s status on Wednesday, 26-Jan-2022 16:45:02 UTC Seachaint :verified:
  in reply to
  - Beckett
  @beckett @jonny "PDFparanoia" was a project for exactly this - to strip identifying watermarks and metadata from shared academic PDFs. But it fell victim to the Python 2 to 3 transition and the mess of the PDF libraries in particular, and then fell to bitrot. Would be nice to see it brought back to health.
  
  In conversation Wednesday, 26-Jan-2022 16:45:02 UTC permalink
  
- Beckett (beckett@social.coop)'s status on Wednesday, 26-Jan-2022 16:45:07 UTC Beckett
  in reply to
  
  @jonny I do not have any IT skills, but if I did I’d love to write a script to remove metadata from PDFs. Adobe has them wrapped up pretty well.
  
  In conversation Wednesday, 26-Jan-2022 16:45:07 UTC permalink
