[Bioperl-l] fasta header replace

Discussion:

odclerck

2010-08-27 07:44:14 UTC

Hi,
Was wondering if someone had an easy script available that converts the
headers of FASTA sequences to values stored in a separate text file.

100825-30_I01_CF_CentralAmerica1_A1_psbAF.ab1 1012, 1000 bases, 0 checksum.

I can filter out the position on the plate e.g. "A1" easily but would like
to replace this with the name of the strain stored in a different text file,
e.g. "A1_D1222".

Realize this sounds pretty basic to most of you, but I'm pretty new at
scripting.
Olivier

--
View this message in context: http://old.nabble.com/fasta-header-replace-tp29550202p29550202.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

Frank Schwach

2010-08-30 15:11:06 UTC

Hi Olivier,

Do you know how to read a file and build a hash from the contents? This
is what you will need to do,
e.g. if your file is
A1 Strain_A
A2 Strain_A
A3 Strain_B

then you can do something like:

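# open the mapping file for reading ('<'; '>' would truncate it)
open (my $infile, '<', $infile_path) or die "Cannot open $infile_path: $!";
my %well2strain;
while (<$infile>){
    # capture the well (e.g. A1) and the strain name; skip lines that don't match
    my ($well, $strain) = ($_=~/^([A-Z]\d+)\s+(\w+)/) or next;
    $well2strain{$well}=$strain;
}
close $infile;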

You can then use the values of the hash to set the sequence ID as you
parse the FASTA file. The BioPerl SeqIO howto gives details about how to
read and write the FASTA file
(http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples).
You can change the id of a sequence object with
$some_seq_object->id( 'my new ID');

See http://doc.bioperl.org/releases/bioperl-1.0/Bio/Seq.html for details.
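
To tie the pieces together, here is a minimal sketch of how the lookup hash and Bio::SeqIO could be combined. It is not a tested production script. The sub names (read_well_map, new_id_for, relabel_fasta) and the file names in the usage line are made up for illustration. It assumes a 96-well plate (rows A-H, columns 1-12) with the well appearing as an underscore-delimited field such as "_A1_" in the ID, and that the new ID should be the well joined to the strain name, e.g. "A1_D1222". Adjust the pattern if your headers differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "WELL STRAIN" lines (e.g. "A1 D1222") into a hash keyed by well.
sub read_well_map {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my %well2strain;
    while (my $line = <$fh>) {
        # Skip lines that do not look like "WELL STRAIN".
        my ($well, $strain) = $line =~ /^([A-Z]\d+)\s+(\w+)/ or next;
        $well2strain{$well} = $strain;
    }
    close $fh;
    return \%well2strain;
}

# Build the new ID for one sequence ID. Returns undef if no well is found
# in the ID or the well has no entry in the map.
# ASSUMPTION: the well is an underscore-delimited field with a row letter
# A-H and a 1-2 digit column (so "I01" is not mistaken for a well).
sub new_id_for {
    my ($id, $well2strain) = @_;
    my ($well) = $id =~ /_([A-H]\d{1,2})_/ or return;
    return unless exists $well2strain->{$well};
    return $well . '_' . $well2strain->{$well};
}

# Copy a FASTA file, replacing each ID that can be mapped. Needs BioPerl.
sub relabel_fasta {
    my ($fasta_in, $fasta_out, $well2strain) = @_;
    require Bio::SeqIO;    # loaded only when FASTA files are processed
    my $in  = Bio::SeqIO->new(-file => $fasta_in,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$fasta_out", -format => 'fasta');
    while (my $seq = $in->next_seq) {
        my $new_id = new_id_for($seq->id, $well2strain);
        $seq->id($new_id) if defined $new_id;   # unmapped IDs are left as-is
        $out->write_seq($seq);
    }
}

# Usage (hypothetical file names): perl relabel.pl wells.txt in.fasta out.fasta
if (@ARGV == 3) {
    my ($map_path, $fasta_in, $fasta_out) = @ARGV;
    relabel_fasta($fasta_in, $fasta_out, read_well_map($map_path));
}
```

Note that the FASTA parser treats everything up to the first whitespace as the ID and the rest of the header line ("1012, 1000 bases, 0 checksum.") as the description, so only the ID part is changed and the description is written out unchanged.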

Hope that helps to get you started.

Frank
