More fun publisher surveillance:
Elsevier embeds a hash in the PDF metadata that is *unique for each time a PDF is downloaded*. This is a diff between the metadata from two downloads of the same paper. Combined with access timestamps, they can uniquely identify the source of any shared PDFs.
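For anyone who wants to reproduce that kind of comparison, here is a minimal sketch using only the Python standard library. It assumes the metadata in question sits in an uncompressed XMP packet, which is common but not guaranteed, and it does not look at the Info dictionary or at compressed object streams. The function names and the `<ex:hash>` tag in the usage note are hypothetical, not the field the publisher actually uses.

```python
import difflib
import re

# XMP packets are wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
XMP_PACKET = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)


def extract_xmp(pdf_bytes: bytes) -> list[str]:
    """Return the non-empty lines of every uncompressed XMP packet in the file."""
    lines = []
    for match in XMP_PACKET.finditer(pdf_bytes):
        text = match.group(0).decode("utf-8", errors="replace")
        lines.extend(line.strip() for line in text.splitlines() if line.strip())
    return lines


def diff_metadata(a: bytes, b: bytes) -> list[str]:
    """Return only the metadata lines that differ between two copies of a PDF."""
    diff = list(difflib.unified_diff(extract_xmp(a), extract_xmp(b), lineterm="", n=0))
    # Skip the two "---"/"+++" header lines, then keep added and removed lines.
    return [d for d in diff[2:] if d.startswith(("+", "-"))]
```

Reading two downloads of the same paper with `open(path, "rb").read()` and passing both to `diff_metadata` would list any per-download field as a `-`/`+` pair, for example a line such as `<ex:hash>...</ex:hash>` whose value changes between copies.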
-
jonny (jonny@social.coop)'s status on Wednesday, 26-Jan-2022 16:44:59 UTC jonny
-
Seachaint :verified: (seachaint@hackers.town)'s status on Wednesday, 26-Jan-2022 16:45:02 UTC Seachaint :verified: @beckett @jonny "PDFparanoia" was a project for exactly this - to strip identifying watermarks and metadata from shared academic PDFs. But it fell victim to the Python 2 to 3 transition and the mess of the PDF libraries in particular, and then fell to bitrot. Would be nice to see it brought back to health.
-
Beckett (beckett@social.coop)'s status on Wednesday, 26-Jan-2022 16:45:07 UTC Beckett @jonny I do not have any IT skills, but if I did, I'd love to write a script to remove metadata from PDFs. Adobe has them wrapped up pretty well.
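A very rough idea of what such a script could look like, as a sketch and not a replacement for a tool like PDFparanoia: it overwrites the body of any uncompressed XMP packet with spaces of the same length, so the byte offsets in the file's cross-reference table stay valid. This is an assumption-heavy shortcut. It leaves the Info dictionary, compressed or encrypted metadata, and any watermark in the page content untouched, and the blanked packet is no longer valid XMP, so strict validators may complain. The name `blank_xmp` is hypothetical.

```python
import re

# Capture the packet header, the body, and the trailer separately.
XMP_BODY = re.compile(
    rb"(<\?xpacket begin=.*?\?>)(.*?)(<\?xpacket end=.*?\?>)", re.DOTALL
)


def blank_xmp(pdf_bytes: bytes) -> bytes:
    """Replace each XMP packet body with spaces, keeping the file length unchanged."""

    def _blank(match: re.Match) -> bytes:
        header, body, trailer = match.group(1), match.group(2), match.group(3)
        return header + b" " * len(body) + trailer

    return XMP_BODY.sub(_blank, pdf_bytes)
```

Something like `open("clean.pdf", "wb").write(blank_xmp(open("paper.pdf", "rb").read()))` would write the blanked copy. Checking the result with a metadata viewer before sharing is still advisable, since identifiers can live in places this sketch never looks.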