Coffea arabica cv. Red Bourbon Genome v0.9

Coffea arabica is a polyploid species that carries four copies of each of the eleven chromosomes typical of the genus Coffea, for a total of 44 (2n = 4x = 44). More precisely, it is an allotetraploid: its genome arose from a hybridization between two diploid species, Coffea canephora and Coffea eugenioides, which left arabica with 44 chromosomes, twice the number found in either parent.


This genome sequence was derived from a Coffea arabica plant of the Red Bourbon variety.

Sequencing and assembly

The genome was sequenced with Illumina technology at the Istituto di Genomica Applicata in Udine, Italy. Given the inherent complexity of a tetraploid genome, it was sequenced using a hierarchical (clone-by-clone) approach rather than the more common whole-genome shotgun approach.

Key numbers and facts

  • 36,864 genomic fragments were cloned into bacterial artificial chromosomes (BACs) and sequenced in 96 pools of 384 clones
  • 488 billion base pairs were produced, corresponding to 132 genome equivalents
  • The genome size was estimated to be 1.3 Gb, based on a k-mer analysis
  • 96 independent assemblies were generated, using the software programs ABySS and SSPACE, and then merged into a single multi-FASTA file
  • The sequence contains 1.51 billion base pairs, divided into 164,254 scaffold sequences
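
The figures above can be checked against a downloaded copy of the assembly. The following is a minimal sketch, assuming the scaffolds are stored in an uncompressed multi-FASTA file; the file name in the usage note and the function names are hypothetical, and N50 is a common assembly statistic added here for illustration, not a value reported on this page.

```python
from typing import Dict, Iterable, List


def scaffold_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length in bases of every record in a multi-FASTA stream."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that records of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Count scaffolds and bases, for comparison with the figures listed above."""
    lengths = scaffold_lengths(lines)
    return {"scaffolds": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

With a hypothetical file name, `summarize(open("red_bourbon_scaffolds.fasta"))` should report a scaffold count and base total close to the 164,254 scaffolds and 1.51 billion base pairs listed above, which works out to roughly 9.2 kb per scaffold on average. The pooling figures are also self-consistent: 96 pools of 384 clones account for all 36,864 BACs.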

User Acknowledgement

The coffee (C. arabica) genome, produced by an Italian partnership led by illycaffè and Lavazza, is made available to advance research on a non-profit basis.

To respect the rights of the data producers and contributors, you acknowledge that by downloading the genome scaffolds and annotation files below, you agree to the following principles:

  • The data as accessed are pre-competitive and not patentable.
  • You will use the data in compliance with all applicable statutes, regulations, and guidelines for scientific research and publication.
  • You will cite this article, as follows:

Scalabrin, S., Toniutti, L., Di Gaspero, G. et al. A single polyploidization event at the origin of the tetraploid genome of Coffea arabica is responsible for the extremely low genetic variation in wild and cultivated germplasm. Sci Rep 10, 4642 (2020).

You also acknowledge that the data providers (the Italian partnership led by illycaffè and Lavazza):

  • Make no representations, assume no responsibility and extend no warranties of any kind, either expressed or implied, that the use of the data will not infringe any patent, copyright, trademark, or other proprietary rights
  • Assume no responsibility for the correctness, completeness, quality and reliability of the information and the results that can be obtained using the data, and assume no responsibility for any damages resulting from the use of data
  • Are not responsible for the handling of the data by third parties, in particular following unauthorized access to the networks and systems of World Coffee Research