I'm new to Python, so please bear with me.
I can't figure out how to make this script work properly:
genome = open('refT.txt', 'r')
The data file is a reference genome with a bunch of contigs (2 million):
Contig_01
TGCAGGTAAAAAACTGTCACCTGCTGGT
Contig_02
TGCAGGTCTTCCCACTTTATGATCCCTTA
Contig_03
TGCAGTGTGTCACTGGCCAAGCCCAGCGC
Contig_04
TGCAGTGAGCAGACCCCAAAGGGAACCAT
Contig_05
TGCAGTAAGGGTAAGATTTGCTTGACCTA
A second file, opened as cont_list, holds the list of contigs to extract from the dataset listed above:
Contig_01
Contig_02
Contig_03
Contig_05
My disappointing script:
for line in cont_list:
    if not line in genome.readline():
        continue
    a = ...
    data_out = open('output', 'a')
    data_out.write("%s" % s)
    data_out.close()
input(...)
The script successfully writes the first three contigs to the output file, but for some reason it does not seem able to skip "Contig_04", which is not in the list, and move on to "Contig_05".
I may seem lazy for just posting code -_-
For this small task, I would first write a Python generator that yields a tuple (contig, genome):
def pair(file_obj):
    for line in file_obj:
        yield line, next(file_obj)
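To see what the generator hands back, here is a minimal sketch run on an in-memory copy of the first two sample contigs. It assumes the file alternates a name line with a sequence line; io.StringIO is only a stand-in for the real file, and the .strip() calls are my addition to drop the trailing newlines.

```python
import io

def pair(file_obj):
    # Each iteration consumes two lines: the contig name, then its sequence.
    for line in file_obj:
        yield line, next(file_obj)

# Stand-in for the real genome file (assumption: names and sequences alternate).
sample = io.StringIO(
    "Contig_01\nTGCAGGTAAAAAACTGTCACCTGCTGGT\n"
    "Contig_02\nTGCAGGTCTTCCCACTTTATGATCCCTTA\n"
)

# Strip the newline that every line read from a file carries.
pairs = [(name.strip(), seq.strip()) for name, seq in pair(sample)]
print(pairs)
```

Note that if the file ends on a name line with no sequence after it, next(file_obj) has nothing to return and the generator fails, so this only suits files with an even number of lines.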
Now, I will use it to get the desired elements:
wanted = {'Contig_01', 'Contig_02', 'Contig_03', 'Contig_05'}
with open('filename') as fin:
    pairs = pair(fin)
    while wanted:
        p = next(pairs)
        name = p[0].strip()  # drop the trailing newline before comparing
        if name in wanted:
            # write to the output file, store in a list or dict, ...
            wanted.remove(name)
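Putting the pieces together, something like the following might work as a complete version. It is an untested-on-real-data sketch: extract_contigs is a name I made up, the in-memory io.StringIO objects stand in for the genome file and the output file, and it assumes names and sequences sit on alternating lines.

```python
import io

def pair(file_obj):
    # Yield (name_line, sequence_line) for every two lines of the file.
    for line in file_obj:
        yield line, next(file_obj)

def extract_contigs(genome_file, wanted, out_file):
    """Write every wanted contig (name and sequence) to out_file.

    Returns the set of wanted names that were never found.
    """
    remaining = set(wanted)  # work on a copy so the caller's set is untouched
    for name, seq in pair(genome_file):
        key = name.strip()  # lines read from a file keep their newline
        if key in remaining:
            out_file.write(name + seq)
            remaining.remove(key)
            if not remaining:
                break  # everything found, no need to scan the rest
    return remaining

# Stand-ins for the real files, using the sample data from the question.
genome = io.StringIO(
    "Contig_01\nTGCAGGTAAAAAACTGTCACCTGCTGGT\n"
    "Contig_02\nTGCAGGTCTTCCCACTTTATGATCCCTTA\n"
    "Contig_03\nTGCAGTGTGTCACTGGCCAAGCCCAGCGC\n"
    "Contig_04\nTGCAGTGAGCAGACCCCAAAGGGAACCAT\n"
    "Contig_05\nTGCAGTAAGGGTAAGATTTGCTTGACCTA\n"
)
out = io.StringIO()
missing = extract_contigs(
    genome, {"Contig_01", "Contig_02", "Contig_03", "Contig_05"}, out
)
print(out.getvalue())
```

With real files you would pass the objects returned by open() instead. The genome is read once from top to bottom, so memory use stays flat even with 2 million contigs, and the returned set tells you which requested names were not present.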