I have pairwise ortholog predictions between multiple species (A-B, B-C, A-C), and now I want to join all the data together to see which genes are shared among all the species, which are shared among subsets, and which are unique to a single species. I used R's merge command (merge.data.frame) to merge the data frames together, and for the most part things look as expected:
81 sr10108 BN887_04349 SPSC_02689
82 sr10109 BN887_04348 NA
83 sr10112.2 BN887_04345 SPSC_02690
In some cases, though, rows do not merge as expected, and I get a result like this:
941 BN887_01039 SPSC_05463 sr10904
942 BN887_01040 SPSC_05465 NA
943 BN887_01040 NA sr10908
BN887_01040 has been predicted to be orthologous to both SPSC_05465 and sr10908, but SPSC_05465 and sr10908 weren't predicted to be orthologous to each other. I think it should be fine to assume they are: if A = B and A = C, then it must follow that B = C. However, I do not know how to merge rows like that so that there is only one entry per gene name.
How can I join rows like that so that they display as "BN887_01040 SPSC_05465 sr10908"? I have tried variations of the merge command itself but none seem to have solved this problem.
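One possible way to collapse such rows can be sketched in base R. This is only an illustration under assumptions: the data frame name `merged`, the column names `B`, `C` and `A`, and the helper `first_non_na` are all hypothetical and not taken from the actual data. The idea is to group rows by the shared gene and keep the first non-NA value found in each of the other columns.

```r
# Toy data mirroring rows 941-943 above; the column names are hypothetical.
merged <- data.frame(
  B = c("BN887_01039", "BN887_01040", "BN887_01040"),
  C = c("SPSC_05463", "SPSC_05465", NA),
  A = c("sr10904", NA, "sr10908"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA value of a vector, or NA if there is none.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 0) x[1] else NA_character_
}

# Group rows by the shared gene in column B and collapse the other columns.
collapsed <- aggregate(
  merged[c("C", "A")],
  by = list(B = merged$B),
  FUN = first_non_na
)

print(collapsed)
```

Note that this sketch keys on a single column only, and it keeps just the first partner if a gene has several distinct non-NA partners in the same column, so one-to-many predictions would be silently dropped. Rows with NA in the grouping column are also omitted by aggregate. Groups that are linked only through the other columns would need a more general approach, such as treating the pairwise predictions as a graph and taking its connected components.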