Hello,
This is a very good question. The previous version of our pair database (before 2015) was actually sector-based (600k pairs). The problem was that some very good pairs were missing from the database because the two equities belonged to different sectors. There was still a strong fundamental relationship between the two firms, but the sector classification did not capture it.
So, to cover pairs like this, we decided to brute-force the whole equity space in the current version of the database.
We are aware that this is not ideal in other respects; for example, it produces a lot of spurious correlation and cointegration. We are open to suggestions here.
Kind regards,
Karel