Anyone can set up a Lemmy instance, write a small script or bot to find and follow all the communities on all the instances in the Fediverse, and store all that data. It's not even hard: maybe a day of work for a proof of concept if you start from zero. (Then you have to figure out how to scale it properly, how to detect that you're being defederated, and how to switch domains to start over without the defederations. Maybe a week's worth of effort.)
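For illustration, here's a rough sketch of just the discovery part in Python. It doesn't cover following or storing. `discover` and `fetch_instance` are made-up names. A real bot would back `fetch_instance` with HTTP calls to each instance's API, and I'm not reproducing those here, so treat that shape as an assumption. The crawl itself is a plain breadth-first search: start from a few seed instances, record their communities, and queue any peer instances you haven't seen yet.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical signature: given an instance domain, return (communities hosted
# there, peer instance domains it knows about). Injected so the crawl logic
# can run without any network access.
FetchFn = Callable[[str], Tuple[Iterable[str], Iterable[str]]]


def discover(seeds: Iterable[str], fetch_instance: FetchFn,
             max_instances: int = 10_000) -> Dict[str, List[str]]:
    """Breadth-first crawl over instances, collecting community names per instance."""
    queue = deque(dict.fromkeys(seeds))  # dedupe seeds, keep their order
    seen = set(queue)
    communities: Dict[str, List[str]] = {}
    while queue and len(communities) < max_instances:
        domain = queue.popleft()
        try:
            local, peers = fetch_instance(domain)
        except Exception:
            # Unreachable or blocking instance: skip it and keep crawling.
            continue
        communities[domain] = sorted(local)
        for peer in peers:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return communities


if __name__ == "__main__":
    # Fake network standing in for real API calls (all domains hypothetical).
    fake = {
        "a.example": (["linux", "rust"], ["b.example", "c.example"]),
        "b.example": (["memes"], ["a.example"]),
        "c.example": (["news"], ["d.example"]),  # d.example is "down"
    }

    def fake_fetch(domain: str):
        if domain not in fake:
            raise ConnectionError(domain)
        return fake[domain]

    print(discover(["a.example"], fake_fetch))
```

The `max_instances` cap and the skip-on-error behavior are just choices for the sketch. Detecting defederation properly is the harder part mentioned above, and it isn't attempted here.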
Threads would be way overkill to achieve this goal. You don't need any users. You don't want any users. Just your one account that follows everything.
Edit: Or you could just set up a web crawler, like the ones Google Search uses, to find and store all the data you're looking for. You don't necessarily need to be federated or use ActivityPub.
While Matrix is federated, it uses its own protocol, not ActivityPub like the others you mentioned.