A pipeline design for downloading and analysing promoter sequences in Solanum lycopersicum

  • Alejandro Pistilli Facultad de Ciencias Agrarias, Universidad Nacional de Rosario (UNR)
  • Guillermo R. Pratta Instituto de Investigaciones en Ciencias Agrarias de Rosario, CONICET, UNR. Cátedra de Genética, Facultad de Ciencias Agrarias. UNR.
  • Laura Angelone Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas, Rosario. Facultad de Ciencias Exactas, Ingeniería y Agrimensura, UNR.
  • Débora P. Arce Instituto de Investigaciones en Ciencias Agrarias de Rosario, CONICET, UNR. Grupo de Análisis, Desarrollos e Investigaciones Biomédicas, Facultad Regional San Nicolás, Universidad Tecnológica Nacional.

Abstract

A pipeline architecture is implemented to automate the download of gene promoter sequences from the tomato (Solanum lycopersicum) genome annotated in the Sol Genomics Network. The output promoter sequences can be analyzed with the MEME and TOMTOM programs. The code is available at www.github.com/lalebot/pip-prom-tom, and Git is used as the version control software. Python threads, regular expressions, and SQLite databases are combined to reduce sequence download time and optimize the use of computing resources. The methodology presented in this work is potentially applicable to other biological fields.
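
As an illustration of how threads, regular expressions, and SQLite can be combined in this way, the following minimal sketch extracts gene identifiers from free text, fetches the uncached ones in parallel, and stores the results in a database. It is not the code from the repository: the function names, the table layout, and the `fake_fetch` stand-in (which replaces any real download) are assumptions made for the example, and the identifier pattern assumes the usual `Solyc01g005000`-style naming.

```python
import re
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Assumed identifier shape: "Solyc", 2-digit chromosome, "g", 6 digits,
# optionally followed by version suffixes such as ".2.1".
GENE_ID = re.compile(r"Solyc\d{2}g\d{6}(?:\.\d+)*")


def fake_fetch(gene_id):
    """Hypothetical stand-in for a real download; returns a FASTA-like record."""
    return f">{gene_id}\nACGTACGT"


def download_promoters(text, conn, fetch=fake_fetch, workers=4):
    """Fetch promoters for gene IDs found in text, skipping IDs already stored.

    Returns the list of IDs that were newly fetched.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS promoter (gene_id TEXT PRIMARY KEY, fasta TEXT)"
    )
    # Regular expression step: collect unique identifiers from the input.
    ids = sorted(set(GENE_ID.findall(text)))
    # SQLite step: skip anything already in the database.
    cached = {row[0] for row in conn.execute("SELECT gene_id FROM promoter")}
    todo = [g for g in ids if g not in cached]
    # Thread step: only the I/O-bound fetches run in worker threads;
    # all database writes stay in the calling thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        records = list(pool.map(fetch, todo))
    conn.executemany("INSERT INTO promoter VALUES (?, ?)", zip(todo, records))
    conn.commit()
    return todo


conn = sqlite3.connect(":memory:")
sample = "Solyc01g005000 and Solyc02g123456.1, then Solyc01g005000 again"
first_run = download_promoters(sample, conn)
second_run = download_promoters(sample, conn)  # everything is cached now
```

Keeping the database writes in the calling thread avoids sharing one SQLite connection across threads, while the stored rows let a repeated run skip sequences that were already downloaded.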
Published: 2018-08-30
Section: Scientific articles