I just developed and published a script to clear your pict-rs object storage from potential CSAM.

db0 · 10 months ago

I just developed and published a script to clear your pict-rs object storage from potential CSAM.

@dan@upvote.au · 10 months ago

> match the hashes

It’s more than just basic hash matching, because it has to catch content even if it’s been resized, cropped, reduced in quality (lower JPEG quality with more artifacts), had its colour balance changed, etc.
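To illustrate why exact hash matching falls short, here's a minimal Python sketch using SHA-256 from the standard library (chosen purely for illustration). Flipping a single bit of the input gives a completely unrelated digest, so a resized or recompressed copy of a known image would never match its stored hash. The byte string is a stand-in, not real image data.

```python
import hashlib


def exact_hash(data: bytes) -> str:
    """Cryptographic digest: any change to the input yields an unrelated output."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for an image file's bytes (hypothetical data, not a real image).
original = bytes(range(256)) * 4

# Flip one bit: a far smaller change than any resize or re-encode would make.
altered = bytearray(original)
altered[100] ^= 0x01
altered = bytes(altered)

print(exact_hash(original) == exact_hash(altered))  # False
# Count how many of the 256 digest bits differ between the two files.
print(bin(int(exact_hash(original), 16) ^ int(exact_hash(altered), 16)).count("1"))
```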

@crunchpaste@lemmy.dbzer0.com · 10 months ago

Well, we have hashing algorithms that do exactly that, like pHash.
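For reference, here's a minimal pHash-style sketch in Python (standard library only). It follows the usual perceptual-hash recipe: take a small grayscale image, run a 2-D DCT, keep the lowest-frequency 8×8 block, and set each bit by comparing a coefficient with the block's median; two hashes are then compared by Hamming distance. It assumes the image has already been decoded, converted to grayscale and shrunk to 32×32 (not shown), and it is not guaranteed to be bit-compatible with the pHash library or any other implementation.

```python
import math
import random
from statistics import median


def dct_1d(values):
    """Unnormalised DCT-II of a 1-D sequence (O(n^2), fine for 32 samples)."""
    n = len(values)
    return [
        sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, x in enumerate(values))
        for k in range(n)
    ]


def dct_2d(matrix):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def phash(pixels, hash_size=8):
    """pHash-style hash of a square grayscale image given as rows of 0-255 values.

    Assumes the image was already converted to grayscale and shrunk to 32x32.
    """
    coeffs = dct_2d(pixels)
    # Keep only the lowest frequencies (top-left block); they survive resizing
    # and recompression far better than fine detail does.
    low = [coeffs[r][c] for r in range(hash_size) for c in range(hash_size)]
    threshold = median(low)
    bits = 0
    for value in low:
        bits = (bits << 1) | int(value > threshold)
    return bits


def hamming(a, b):
    """Number of differing bits; a small distance means visually similar images."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[rng.randint(0, 200) for _ in range(32)] for _ in range(32)]
    brighter = [[p + 20 for p in row] for row in img]  # same image, brightened
    other = [[rng.randint(0, 200) for _ in range(32)] for _ in range(32)]  # unrelated image
    print(hamming(phash(img), phash(brighter)))
    print(hamming(phash(img), phash(other)))
```

Because a uniform brightness shift only moves the DC (0,0) coefficient of the DCT, the brightened copy should hash almost identically, while the unrelated image should land far away. Which distance counts as a match is a tuning decision; no particular threshold is implied here.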

@dan@upvote.au · 10 months ago

Definitely. A lot of the good algorithms used by big services are proprietary though, unfortunately.

@crunchpaste@lemmy.dbzer0.com · 10 months ago

Can you point me to some of them? I’m quite interested in visual hashing.

@dan@upvote.au · 10 months ago

Microsoft’s PhotoDNA is probably the most well-known. Every major service that has user-generated content uses it. Last I checked, it wasn’t open-source. It was built for detecting CSAM, but it’s really just a general-purpose similarity hashing algorithm.

Meta has open-sourced some of its algorithms (PDQ for photos and TMK+PDQF for videos): https://about.fb.com/news/2019/08/open-source-photo-video-matching/

Google has CSAI Match for hash-matching videos and the Content Safety API for classifying new content, but both are proprietary.

db0 · 10 months ago

There are better approaches than hashing. To compare images, I calculate the “distance” between their tensors. This can match images even when compression artifacts are present or the images have been slightly altered.
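A minimal sketch of the distance idea, assuming each image has already been turned into a fixed-length feature vector (tensor) by some model. The model, the choice of cosine distance, the `closest_match` helper and its 0.1 threshold are all hypothetical illustrations, not taken from the lemmy-safety script.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for vectors pointing the same way, 1.0 for orthogonal ones."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_a * norm_b)


def closest_match(query, known, threshold=0.1):
    """Return (label, distance) of the nearest known vector within threshold, or None.

    `known` maps a label to a precomputed embedding; `threshold` is a
    hypothetical tuning value, not one used by any particular tool.
    """
    best = None
    for label, vector in known.items():
        distance = cosine_distance(query, vector)
        if distance <= threshold and (best is None or distance < best[1]):
            best = (label, distance)
    return best


if __name__ == "__main__":
    # Toy 3-D vectors; real embeddings have hundreds of dimensions.
    known = {"flagged-1": [1.0, 0.0, 0.0], "flagged-2": [0.0, 1.0, 0.0]}
    print(closest_match([0.9, 0.1, 0.0], known))  # nearest is flagged-1
    print(closest_match([0.0, 0.0, 1.0], known))  # None
```

Unlike a bit-level hash, a small edit to the image only nudges its vector, so the distance grows gradually instead of jumping to "no match".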

@FriendlyBeagleDog@lemmy.blahaj.zone · 10 months ago

Ah, of course - that’s unfortunate, but thanks for the pointer.

GitHub - Haidra-Org/lemmy-safety: A script that goes through a lemmy pict-rs object storage and tries to prevent illegal or unethical content