I need to create an input file for a program, which is beyond my programming skills, so I would appreciate any help you can offer.
I have many text files. For each sample, a file contains a ">" followed by the sample name, a line break, and then the data as 0s and 1s.
The data looks like this (it is really huge):
>SampleName_ZN189A
01000001000000000000100011100000000111000000001000
00110000100000000000010000000000001100000010000000
00110000000000001110000010010011111000000100010000
00000110000001000000010100000000010000001000001110
>SampleName_ZN189B
00110000001101000001011100000000000000000000010001
00010000000000000010010000000000100100000001000000
00000000000000000000000010000000000010111010000000
01000110000000110000001010010000001111110101000000
Note: there is a line break after every 50 characters.
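To make the format concrete, here is a minimal Python sketch of how such a file could be read. The function name parse_samples is made up for illustration, and it assumes that the line breaks inside the data are only layout, so they are dropped and positions count 0/1 characters only.

```python
def parse_samples(text):
    """Return a list of (name, data) pairs from '>'-headed records.

    Assumption: line breaks inside the data are layout only, so they are
    removed and each sample's data becomes one continuous string.
    """
    samples = []
    name, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                samples.append((name, "".join(parts)))
            name, parts = line[1:].strip(), []
        elif name is not None:
            parts.append(line)
    if name is not None:
        samples.append((name, "".join(parts)))
    return samples


example = """>SampleName_ZN189A
0100000100
0000000000
>SampleName_ZN189B
0011000000
"""
for name, data in parse_samples(example):
    print(name, len(data))
# SampleName_ZN189A 20
# SampleName_ZN189B 10
```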
What I need to do:
Cut out the first 2000 characters of the data for each sample in my file and save them in a file named after the window number. For example, if the input file is named Testfile_1.txt, the result should look like this (here I have cut out only the first 50 characters of the data):
>SampleName_ZN189A
01000001000000000000100011100000000111000000001000
>SampleName_ZN189B
00110000001101000001011100000000000000000000010001
This file should be named: Testfile_1_window1.txt
The second window covers characters 1500 to 3500 and should be saved as Testfile_1_window2.txt, the third window covers characters 3000 to 5000 and should be saved as Testfile_1_window3.txt, and so on. If the last window would be shorter than 2000 characters, those characters should be added to the last full window instead.
In other words, the windows are 2000 characters long and consecutive windows overlap by 500 characters.
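The window arithmetic could be sketched in Python roughly as follows. The function name window_bounds is hypothetical, positions are 0-based with an exclusive end, and the handling of a short tail (it is merged into the last full window) is my reading of the requirement, not something confirmed.

```python
def window_bounds(length, size=2000, step=1500):
    """Return (start, end) pairs, 0-based and end-exclusive, covering `length` characters.

    Assumption: a tail shorter than `size` is merged into the last full window.
    """
    if length <= 0:
        return []
    # Every start position at which a full window of `size` characters still fits.
    bounds = [(s, s + size) for s in range(0, length - size + 1, step)]
    if not bounds:
        # The data is shorter than one window: keep it as a single window.
        return [(0, length)]
    last_start, last_end = bounds[-1]
    if last_end < length:
        # Leftover characters: stretch the last full window to the end.
        bounds[-1] = (last_start, length)
    return bounds


print(window_bounds(5000))  # [(0, 2000), (1500, 3500), (3000, 5000)]
print(window_bounds(5200))  # [(0, 2000), (1500, 3500), (3000, 5200)]
```

With a step of 1500 and a size of 2000, each window shares its last 500 characters with the start of the next one.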
Thanks in advance.
Note 2:
If you think this problem can be solved, please post your answer using either Perl or Python.
In Perl, you could do it like this. It may not look efficient, but it works because the operating system will cache the file.
use strict;
use warnings;

local $/ = '>';    # read one '>'-delimited record at a time
open(my $fh, '<', 'filename') or die "Cannot open input: $!";
while (my $chunk = <$fh>) {
    chomp($chunk);    # drop the trailing '>'
    # Split off the sample name; this also skips the empty record before the first '>'.
    next unless $chunk =~ s!^(.+?)\n+!!s;
    my $samplename = $1;
    $chunk =~ tr/\n//d;    # remove line breaks so offsets count data characters only
    ### The number of windows should really be calculated; I'm currently setting it to 50
    for (my $i = 0; $i < 50; $i++) {
        last if $i * 1500 >= length $chunk;    ## quit when there is no data left
        my $data = substr($chunk, $i * 1500, 2000);
        my $filename = "testfile_" . $samplename . "_window" . $i . ".txt";
        open(my $ofh, '>', $filename) or die "$filename: $!";
        print $ofh ">$samplename\n$data\n";
        close($ofh);
    }
}
close($fh);
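For those who prefer Python, here is a rough sketch of the task as described in the question: one output file per window, each containing every sample. All function names are made up for illustration. It assumes that every sample has the same data length (the shortest one is used otherwise), that a short tail is merged into the last full window, and that output lines are wrapped at 50 characters like the input.

```python
from pathlib import Path


def read_samples(path):
    """Read (name, data) pairs from a '>'-headed file, joining the data lines."""
    samples, name, parts = [], None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                samples.append((name, "".join(parts)))
            name, parts = line[1:].strip(), []
        elif line and name is not None:
            parts.append(line)
    if name is not None:
        samples.append((name, "".join(parts)))
    return samples


def window_bounds(length, size, step):
    """0-based, end-exclusive windows; a short tail is merged into the last full one."""
    if length <= 0:
        return []
    bounds = [(s, s + size) for s in range(0, length - size + 1, step)]
    if not bounds:
        return [(0, length)]
    if bounds[-1][1] < length:
        bounds[-1] = (bounds[-1][0], length)
    return bounds


def wrap(data, width):
    """Break a string into lines of at most `width` characters."""
    return "\n".join(data[i:i + width] for i in range(0, len(data), width))


def write_windows(path, size=2000, step=1500, width=50):
    """Write <stem>_window<k><suffix> files next to `path`; return their paths."""
    path = Path(path)
    samples = read_samples(path)
    if not samples:
        return []
    # Assumption: all samples are equally long; fall back to the shortest one.
    length = min(len(data) for _, data in samples)
    written = []
    for k, (start, end) in enumerate(window_bounds(length, size, step), start=1):
        out = path.with_name(f"{path.stem}_window{k}{path.suffix}")
        with out.open("w") as fh:
            for name, data in samples:
                fh.write(f">{name}\n{wrap(data[start:end], width)}\n")
        written.append(out)
    return written


# Example call (the file name is taken from the question):
# write_windows("Testfile_1.txt")
```

The whole file is read into memory here, which keeps the code short. For very large inputs a streaming approach may be a better fit.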