Log and temporary files used by MIRA 3

Bastien Chevreux

MIRA Version 3.4.1.1

Document revision $Id$

Table of Contents

1. Introduction
2. The files
2.1. mira_error_reads_invalid
2.2. mira_info_reads_tooshort
2.3. mira_int_alignextends_preassembly1.0.txt
2.4. mira_int_clippings.0.txt
2.5. mira_int_posmatch_megahubs_pass.X.lst
2.6. mira_int_posmatch_multicopystat_preassembly.0.txt
2.7. mira_int_posmatch_rawhashhits_pass.X.lst
2.8. mira_int_skimmarknastyrepeats_hist_pass.X.lst
2.9. mira_int_skimmarknastyrepeats_nastyseq_pass.X.lst
2.10. mira_int_vectorclip_pass.X.txt
2.11. miratmp.ads_pass.X.forward and miratmp.ads_pass.X.complement
2.12. miratmp.ads_pass.X.reject
2.13. miratmp.noqualities
2.14. miratmp.usedids
2.15. mira_readpoolinfo.lst
 

The amount of entropy in the universe is constant - except when it increases.

 
 --Solomon Short

1.  Introduction

The tmp directory used by mira (usually <projectname>_d_tmp) may contain a number of files with information that could be useful for purposes other than the assembly itself. This guide gives a short overview.

Note: This guide is probably the least complete and most out-of-date, as it is updated only very infrequently. If in doubt, ask on the MIRA talk mailing list.

Warning: The format of these files may change over time, although I try very hard to keep changes to a minimum.

Note: Remember that mira has two options that control whether log and temporary files get deleted: [-OUT:rtd] removes the complete tmp directory after an assembly, while [-OUT:rrot] removes only those log and temporary files which are no longer needed for the continuation of the assembly. Setting both options to no will keep all log and temporary files.

2.  The files

2.1.  mira_error_reads_invalid

A simple list of those reads that were invalid (no sequence or similar problems).

2.2.  mira_info_reads_tooshort

A simple list of those reads that were discarded because the unclipped sequence was shorter than the minimum defined by [-AS:mrl].

2.3.  mira_int_alignextends_preassembly1.0.txt

If read extension is used ([-DP:ure]), this file contains the read name and the number of bases by which the right clipping was extended.

2.4.  mira_int_clippings.0.txt

If any of the [-CL:] options leads to the clipping of a read, this file will tell when, which clipping, which read and by how much (or to where) the clippings were set.

2.5.  mira_int_posmatch_megahubs_pass.X.lst

Note: replace the X by the pass of mira. Should any read be categorised as a megahub during the all-against-all search (SKIM3), this file will tell you which ones.
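
Since the X stands for the pass number, a shell glob can pick up the files of all passes at once. The following is only a minimal sketch: the project name `myproject` is hypothetical, and the placeholder files created at the top merely stand in for the output of a real mira run so that the loop can be tried without one.

```shell
# Stand-in for a real mira tmp directory; "myproject" is a hypothetical name.
workdir=$(mktemp -d)
tmpdir="$workdir/myproject_d_tmp"
mkdir -p "$tmpdir"
# Placeholder files so the sketch runs without mira; their content is made up.
for pass in 1 2 3; do
    echo "placeholder" > "$tmpdir/mira_int_posmatch_megahubs_pass.$pass.lst"
done

# The glob takes the place of the X: one match per pass that wrote a file.
found=0
for f in "$tmpdir"/mira_int_posmatch_megahubs_pass.*.lst; do
    [ -e "$f" ] || continue      # the glob matched nothing
    pass=${f##*_pass.}           # strip everything up to "_pass."
    pass=${pass%.lst}            # strip the ".lst" suffix
    echo "pass $pass: $f"
    found=$((found + 1))
done
echo "files found: $found"
```

The same pattern should work for the other per-pass files described in this guide by swapping the file name stem.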

2.6.  mira_int_posmatch_multicopystat_preassembly.0.txt

After the initial all-against-all search (SKIM3), this file tells you how many other reads each read overlaps with. Furthermore, reads that have more overlaps than expected are tagged with "mc" (multicopy).

2.7.  mira_int_posmatch_rawhashhits_pass.X.lst

Note: replace the X by the pass of mira. Similar to mira_int_posmatch_multicopystat_preassembly.0.txt, this counts the hash hits of each read to other reads. This time, however, the counts are given per pass.

2.8.  mira_int_skimmarknastyrepeats_hist_pass.X.lst

Note: replace the X by the pass of mira. Only written if [-SK:mnr] is set to yes. This file contains a histogram of hash occurrences encountered by SKIM3.

2.9.  mira_int_skimmarknastyrepeats_nastyseq_pass.X.lst

Note: replace the X by the pass of mira. Only written if [-SK:mnr] is set to yes. One of the more interesting files if you want to know which repetitive sequences cause the assembly to be really difficult: for each masked part of a read, the masked sequence is shown here.

E.g.

	U13a04h11.t1    TATATATATATATATATATATATA
	U13a05b01.t1    TATATATATATATATATATATATA
	U13a05c07.t1    AAAAAAAAAAAAAAA
	U13a05e12.t1    CTCTCTCTCTCTCTCTCTCTCTCTCTCTC
      

Simple repeats like the ones shown above will certainly show up there, but a few other sequences (e.g. SINEs and LINEs in eukaryotes) will also appear.

Nifty thing to try out if you want to have a more compressed overview: sort and unify by the second column.

	sort -k 2 -u mira_int_skimmarknastyrepeats_nastyseq_pass.X.lst
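
A variation on the same idea is to count how often each masked sequence occurs instead of only listing it once. This is a minimal sketch that uses the four example lines shown above as sample data and assumes the two-column layout of that excerpt (read name, whitespace, masked sequence).

```shell
# Sample data copied from the excerpt above; with real data, point "lst" at
# mira_int_skimmarknastyrepeats_nastyseq_pass.X.lst instead.
lst=$(mktemp)
cat > "$lst" <<'EOF'
U13a04h11.t1    TATATATATATATATATATATATA
U13a05b01.t1    TATATATATATATATATATATATA
U13a05c07.t1    AAAAAAAAAAAAAAA
U13a05e12.t1    CTCTCTCTCTCTCTCTCTCTCTCTCTCTC
EOF

# Count how often each masked sequence (second column) occurs,
# most frequent first.
counts=$(awk '{print $2}' "$lst" | sort | uniq -c | sort -k1,1nr)
echo "$counts"
rm -f "$lst"
```

Printing the first column instead of the second gives the number of masked stretches per read.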
      

2.10.  mira_int_vectorclip_pass.X.txt

Note: replace the X by the pass of mira. Only written if [-CL:pvlc] is set to yes. Tells you where possible sequencing vector (or adaptor) leftovers were found and clipped (or not clipped).

2.11.  miratmp.ads_pass.X.forward and miratmp.ads_pass.X.complement

Note: replace the X by the pass of mira. Which read aligns with Smith-Waterman against which other read, 'forward-forward' and 'forward-complement'.

2.12.  miratmp.ads_pass.X.reject

Note: replace the X by the pass of mira. Which possible read overlaps failed the Smith-Waterman alignment check.

2.13.  miratmp.noqualities

Which reads went completely without qualities into the assembly.

2.14.  miratmp.usedids

Which reads effectively went into the assembly (after clipping etc.).
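
One possible use is to find out which reads did not make it into the assembly by comparing this file with a list of all read names. The sketch below rests on assumptions that are not confirmed by this guide: that miratmp.usedids holds one read name per line, and that a full list of read names is available in a file here called all_reads.txt (a hypothetical name). All file contents in the example are made up.

```shell
# Hypothetical inputs: both files are ASSUMED to hold one read name per line.
work=$(mktemp -d)
printf '%s\n' readA readB readC readD > "$work/all_reads.txt"   # made-up full read list
printf '%s\n' readA readC > "$work/miratmp.usedids"             # made-up content

# comm needs sorted input.
sort "$work/all_reads.txt"   > "$work/all.sorted"
sort "$work/miratmp.usedids" > "$work/used.sorted"

# comm -23 prints lines that occur only in the first file:
# reads that are in the full list but not among the used ones.
dropped=$(comm -23 "$work/all.sorted" "$work/used.sorted")
echo "$dropped"
rm -rf "$work"
```

If the real file turns out to carry more than the read name on each line, extract the name column first (for example with awk) before sorting.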

2.15.  mira_readpoolinfo.lst