Matty-kun on Nostr:
I think this would be a little more accurate. Querying several million objects doesn't actually take as long as I expected.
pleroma=# SELECT count(*) FROM objects WHERE split_part(data->>'actor', '/', 3) IN ('nicecrew.digital') AND LOWER(data->>'content') LIKE ANY (ARRAY['%nigga%', '%niggas%', '%nigger%', '%niggers%']);
count
--------
129890
(1 row)
This works out to approximately 3.8 percent of posts on NCD containing some variation of the slur. Note that this is not a count of total occurrences (a post that repeats the word four times counts once in this query, not four times); it is the share of objects that contain at least one of those words.
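For reference, the match count and the percentage can be pulled in one pass instead of dividing by hand. This is only a sketch: it assumes the denominator is every row in objects whose actor is on that host (the post does not say what total was used), and '%term%' is a placeholder standing in for the pattern array from the query above.

```sql
-- Sketch: matching objects and their share of all objects for one instance.
-- Assumption: "total" is every object whose actor host is nicecrew.digital.
-- '%term%' is a placeholder; substitute the ARRAY of patterns from the query above.
SELECT
    count(*) FILTER (
        WHERE lower(data->>'content') LIKE ANY (ARRAY['%term%'])
    ) AS matching,
    count(*) AS total,
    round(
        100.0 * count(*) FILTER (
            WHERE lower(data->>'content') LIKE ANY (ARRAY['%term%'])
        ) / NULLIF(count(*), 0),  -- NULLIF avoids division by zero on an empty set
        1
    ) AS percent
FROM objects
WHERE split_part(data->>'actor', '/', 3) = 'nicecrew.digital';
```

Rows with no content field never match the LIKE but still count toward total, so the percentage here may come out slightly lower than one computed against posts only. Also, the plural patterns in the original array are redundant: anything they match is already matched by the singular patterns, so dropping them should not change the count.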