“The amount of entropy in the universe is constant - except when it increases.”
--Solomon Short
The tmp directory used by mira (usually <projectname>_d_tmp) may contain a number of files with information that could be of interest for uses other than the assembly itself. This guide gives a short overview.
Note: This guide is probably the least complete and most out-of-date, as it is updated only very infrequently. If in doubt, ask on the MIRA talk mailing list.
Warning: Please note that the format of these files may change over time, although I try very hard to keep changes to a minimum.
Note: Remember that mira has two options that control whether log and temporary files get deleted: while [-OUT:rtd] removes the complete tmp directory after an assembly, [-OUT:rrot] removes only those log and temporary files which are no longer needed for the continuation of the assembly. Setting both options to no will keep all log and temporary files.
A simple list of those reads that were invalid (no sequence or similar problems).
A simple list of those reads that were sorted out because the unclipped sequence was too short as defined by [-AS:mrl].
If read extension is used ([-DP:ure]), this file contains the read name and the number of bases by which the right clipping was extended.
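As a rough sketch of how such a file could be summarised: the function below counts the extended reads and adds up the extension lengths. It assumes, without confirmation from this guide, that each line consists of the read name followed by the number of bases, separated by whitespace. The function name `sum_extensions` and the file path in the usage line are hypothetical.

```shell
# Summarise a read-extension log: number of extended reads and total bases.
# ASSUMPTION: whitespace-separated lines of the form <read name> <number of bases>.
sum_extensions() {
    awk 'NF >= 2 { n++; total += $2 }
         END { printf "%d reads, %d bases\n", n, total }' "$1"
}

# Usage (hypothetical path, substitute the actual file in your tmp directory):
# sum_extensions path/to/read_extension_file
```

If the actual layout differs (for example additional columns), the field number in the awk program would need adjusting.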
If any of the [-CL:] options leads to the clipping of a read, this file will tell when, which clipping, which read and by how much (or to where) the clippings were set.
Note: replace the X by the pass of mira. Should any read be categorised as megahub during the all-against-all search (SKIM3), this file will tell you which.
After the initial all-against-all search (SKIM3), this file tells you how many other reads each read overlaps with. Furthermore, reads that have more overlaps than expected are tagged with "mc" (multicopy).
Note: replace the X by the pass of mira. Similar to mira_int_posmatch_multicopystat_preassembly.0.txt, this counts the hash hits of each read to other reads, but this time per pass.
Note: replace the X by the pass of mira. Only written if [-SK:mnr] is set to yes. This file contains a histogram of hash occurrences encountered by SKIM3.
Note: replace the X by the pass of mira. Only written if [-SK:mnr] is set to yes. One of the more interesting files if you want to know which repetitive sequences make the assembly really difficult: for each masked part of a read, the masked sequence is shown here.
E.g.
U13a04h11.t1 TATATATATATATATATATATATA
U13a05b01.t1 TATATATATATATATATATATATA
U13a05c07.t1 AAAAAAAAAAAAAAA
U13a05e12.t1 CTCTCTCTCTCTCTCTCTCTCTCTCTCTC
Simple repeats like the ones shown above will certainly pop up there, but a few other sequences (e.g. SINEs and LINEs in eukaryotes) will also appear.
Nifty thing to try out if you want a more compressed overview: sort by the second column and keep only one line per distinct sequence.
sort -k 2 -u mira_int_skimmarknastyrepeats_nastyseq_pass.X.lst
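A related sketch, assuming each line holds a read name followed by the masked sequence, separated by whitespace: instead of dropping duplicates, count how often each masked sequence occurs and list the most frequent ones first. The function name `count_masked_seqs` is made up for illustration.

```shell
# Count how often each masked sequence (second column) occurs,
# most frequent first.
# ASSUMPTION: whitespace-separated lines of the form <read name> <masked sequence>.
count_masked_seqs() {
    awk 'NF >= 2 { print $2 }' "$1" | sort | uniq -c | sort -k1,1nr
}

# Usage (replace the X by the pass of mira):
# count_masked_seqs mira_int_skimmarknastyrepeats_nastyseq_pass.X.lst
```

Each output line starts with the count, followed by the sequence, which gives a quick impression of which repeats dominate the masking.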
Note: replace the X by the pass of mira. Only written if [-CL:pvlc] is set to yes. Tells you where possible sequencing vector (or adaptor) leftovers were found and clipped (or not clipped).
Note: replace the X by the pass of mira. Lists which read aligns with which other read in the Smith-Waterman check, both 'forward-forward' and 'forward-complement'.
Note: replace the X by the pass of mira. Which possible read overlaps failed the Smith-Waterman alignment check.