Convert Organism Names to Acronyms
Examples of sequence formats accepted in the input file:
[Source of the sequence: GenBank]
>gnl|REF_E.coli|c3122:1-379 3-deoxy-7-phosphoheptulonate synthase [Escherichia coli CFT073]
MCYRYVILAEDQLSQTSINRIAIMQKDALNNVHITDEQVLMTPEQLKAAFPLSLQQEAQIADSRKTISDI
IAGRDPRLLVVCGPCSIHDPETALEYARRFKALAAEVSDSLYLVMRVYFEKPRTTVGWKGLINDPHMDGS
FDVEAGLQIARKLLLELVNMGLPLATEALDPNSPQYLGDLFSWSAIGARTTESQTHREMASGLSMPVGFK
NGTDGSLATAINAMRAAAQPHRFVGINQAGQVALLQTQGNPDGHVILRGGKAPNYSPADVAQCEKEMEQA
GLRPSLMVDCSHGNSNKDYRRQPAVAESVVAQIKDGNRSIIGLMIESNIHEGNQSSEQPRSEMKYGVSVT
DACISWEMTDALLREIHQDLNGQLTARVA
[Source of the sequence: the SEED]
>fig|83333.1.peg.2568 [Escherichia coli K12] [2-keto-3-deoxy-D-arabino-heptulosonate-7-phosphate synthase]
MQKDALNNVHITDEQVLMTPEQLKAAFPLSLQQEAQIADSRKSISDIIAGRDPRLLVVCG
PCSIHDPETALEYARRFKALAAEVSDSLYLVMRVYFEKPRTTVGWKGLINDPHMDGSFDV
EAGLQIARKLLLELVNMGLPLATEALDPNSPQYLGDLFSWSAIGARTTESQTHREMASGL
SMPVGFKNGTDGSLATAINAMRAAAQPHRFVGINQAGQVALLQTQGNPDGHVILRGGKAP
NYSPADVAQCEKEMEQAGLRPSLMVDCSHGNSNKDYRRQPAVAESVVAQIKDGNRSIIGL
MIESNIHEGNQSSEQPRSEMKYGVSVTDACISWEMTDALLREIHQDLNGQLTARVA
OUTPUT AFTER CONVERSION:
>Ecol_Fa gnl|REF_E.coli|c3122:1-379 3-deoxy-7-phosphoheptulonate synthase [Escherichia coli CFT073]
MCYRYVILAEDQLSQTSINRIAIMQKDALNNVHITDEQVLMTPEQLKAAFPLSLQQEAQIADSRKTISDI
IAGRDPRLLVVCGPCSIHDPETALEYARRFKALAAEVSDSLYLVMRVYFEKPRTTVGWKGLINDPHMDGS
FDVEAGLQIARKLLLELVNMGLPLATEALDPNSPQYLGDLFSWSAIGARTTESQTHREMASGLSMPVGFK
NGTDGSLATAINAMRAAAQPHRFVGINQAGQVALLQTQGNPDGHVILRGGKAPNYSPADVAQCEKEMEQA
GLRPSLMVDCSHGNSNKDYRRQPAVAESVVAQIKDGNRSIIGLMIESNIHEGNQSSEQPRSEMKYGVSVT
DACISWEMTDALLREIHQDLNGQLTARVA
>Ecol_Ab_2568 fig|83333.1.peg.2568 [Escherichia coli K12] [2-keto-3-deoxy-D-arabino-heptulosonate-7-phosph
MQKDALNNVHITDEQVLMTPEQLKAAFPLSLQQEAQIADSRKSISDIIAGRDPRLLVVCG
PCSIHDPETALEYARRFKALAAEVSDSLYLVMRVYFEKPRTTVGWKGLINDPHMDGSFDV
EAGLQIARKLLLELVNMGLPLATEALDPNSPQYLGDLFSWSAIGARTTESQTHREMASGL
SMPVGFKNGTDGSLATAINAMRAAAQPHRFVGINQAGQVALLQTQGNPDGHVILRGGKAP
NYSPADVAQCEKEMEQAGLRPSLMVDCSHGNSNKDYRRQPAVAESVVAQIKDGNRSIIGL
MIESNIHEGNQSSEQPRSEMKYGVSVTDACISWEMTDALLREIHQDLNGQLTARVA
EXPLANATION:
Xxxx or Xxxxnum -- the species acronym; it is unique to a species and will not change.
A,B,C,......... -- the UPPERCASE letters designate a strain; combined with the species acronym, they form
an identifier that is unique at the strain level (e.g., 'Ecol_A' always stands for
E. coli K12 and 'Ecol_F' for E. coli CFT073).
a,b,c,......... -- the lowercase letters designate different copies of a protein (paralogs) within
the same strain.
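
The naming scheme above can be taken apart again with a short script. The following is a minimal sketch in Python, not the converter's own code: the function name parse_acronym, the Acronym record, and the regular expression are hypothetical and are inferred only from the two converted headers shown above. In particular, the trailing numeric suffix (as in 'Ecol_Ab_2568') is not described in the explanation; the sketch simply assumes it is an optional run of digits after a second underscore.

```python
import re
from typing import NamedTuple, Optional


class Acronym(NamedTuple):
    """Hypothetical record for the parts of a converted identifier."""
    species: str           # e.g. 'Ecol' (Xxxx or Xxxxnum)
    strain: str            # UPPERCASE letter(s), e.g. 'F'
    copy: str              # lowercase paralog letter(s), '' if absent
    suffix: Optional[str]  # trailing digits such as '2568'; meaning assumed, not documented


# Assumed layout: Xxxx[num]_<STRAIN><copy>[_<digits>]
_PATTERN = re.compile(r"([A-Z][a-z]{3}\d*)_([A-Z]+)([a-z]*)(?:_(\d+))?")


def parse_acronym(header: str) -> Acronym:
    """Split the first token of a converted FASTA header into its parts."""
    parts = header.lstrip(">").split(None, 1)
    if not parts:
        raise ValueError("empty header")
    match = _PATTERN.fullmatch(parts[0])
    if match is None:
        raise ValueError(f"not a recognised acronym: {parts[0]!r}")
    return Acronym(match.group(1), match.group(2), match.group(3), match.group(4))


if __name__ == "__main__":
    print(parse_acronym(">Ecol_Fa gnl|REF_E.coli|c3122:1-379"))
    print(parse_acronym(">Ecol_Ab_2568 fig|83333.1.peg.2568"))
```

Under these assumptions, 'Ecol_Fa' splits into species 'Ecol', strain 'F', and copy 'a', while 'Ecol_Ab_2568' splits into species 'Ecol', strain 'A', copy 'b', and suffix '2568'. Headers that have not been converted (those starting with 'gnl|' or 'fig|') raise a ValueError.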