I have ortholog predictions between multiple species (A-B, B-C, A-C), and now I want to join all the data together to see which genes are shared among all the species, which are shared among subsets, and which are unique to a single species. I used the merge.data.frames command to merge the data frames, and for the most part things look as expected:


81    sr10108      BN887_04349    SPSC_02689
82    sr10109      BN887_04348    NA
83    sr10112.2    BN887_04345    SPSC_02690

In some cases, though, rows do not merge as expected and I get a result like this:

941    BN887_01039    SPSC_05463    sr10904
942    BN887_01040    SPSC_05465    NA
943    BN887_01040    NA            sr10908

BN887_01040 was predicted to be orthologous to both SPSC_05465 and sr10908, but those two were not predicted to be orthologous to each other. I think it is reasonable to assume that they are: if A = B and A = C, then it should follow that B = C. However, I do not know how to merge such rows so that there is only one entry per gene name.

How can I join rows like that so that they display as "BN887_01040    SPSC_05465   sr10908"? I have tried variations of the merge command itself but none seem to have solved this problem.
In the end I just used a different programme which identifies orthologs in multiple species at the same time.
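
For anyone who still wants to collapse such rows directly, one possible approach is to treat every row as a set of links between gene IDs and then group all IDs that are connected, directly or indirectly. Below is a minimal sketch in Python (not R), using only the standard library. It assumes the transitivity described above holds, that the merged table is available as a list of tuples with one column per species and None in place of NA, and that several genes from the same species landing in one group should be joined with ";". The function name collapse_rows is hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def collapse_rows(rows):
    """Merge rows that share any gene ID into one row per connected group.

    rows: list of equal-length tuples, one column per species, None for NA.
    Returns a sorted list of tuples with the same number of columns.
    """
    if not rows:
        return []

    parent = {}  # union-find forest over (column, gene) nodes

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    n_cols = len(rows[0])
    for row in rows:
        # Tag each gene with its column so IDs from different species never clash.
        present = [(col, gene) for col, gene in enumerate(row) if gene is not None]
        if present:
            find(present[0])  # register genes that appear alone in a row
        for node in present[1:]:
            union(present[0], node)

    # Collect the genes of each connected group, column by column.
    groups = defaultdict(lambda: [set() for _ in range(n_cols)])
    for node in list(parent):
        col, gene = node
        groups[find(node)][col].add(gene)

    merged = [
        tuple(";".join(sorted(genes)) if genes else None for genes in cols)
        for cols in groups.values()
    ]
    return sorted(merged, key=lambda r: tuple(cell or "" for cell in r))


rows = [
    ("BN887_01039", "SPSC_05463", "sr10904"),
    ("BN887_01040", "SPSC_05465", None),
    ("BN887_01040", None, "sr10908"),
]
for merged_row in collapse_rows(rows):
    print(*merged_row, sep="    ")
```

With the three example rows, rows 942 and 943 end up in the same group because they share BN887_01040, so they come out as a single row. Note that this also chains longer links together (A-B, B-C, C-D become one group), which may or may not be what you want for your data.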

