Frequently asked questions

Bastien Chevreux

MIRA Version 3.4.1.1

Document revision $Id$

Table of Contents

1. Assembly quality
1.1. What is the effect of uniform read distribution (-AS:urd)?
1.2. There are too many contig debris when using uniform read distribution, how do I filter for "good" contigs?
1.3. When finishing, which places should I have a look at?
2. 454 data
2.1. What do I need SFFs for?
2.2. What's sff_extract and where do I get it?
2.3. Do I need the sfftools from the Roche software package?
2.4. Combining SFFs
2.5. Adaptors and paired-end linker sequences
2.6. What do I get in paired-end sequencing?
2.7. Sequencing protocol
2.8. Filtering sequences by length and re-assembly
3. Solexa / Illumina data
3.1. Can I see deletions?
3.2. Can I see insertions?
3.3. De-novo assembly with Solexa data
4. Hybrid assemblies
4.1. What are hybrid assemblies?
4.2. What differences are there in hybrid assembly strategies?
5. Masking
5.1. Should I mask?
5.2. How can I apply custom masking?
6. Miscellaneous
6.1. What are megahubs?
6.2. Passes and loops
6.3. Debris
6.4. Log and temporary files: more info on what happened during the assembly
6.4.1. Sequence clipping after load
7. Platforms and Compiling
7.1. Windows
 

Every question defines its own answer. Except perhaps 'Why a duck?'

 
 --Solomon Short

This list is a collection of frequently asked questions and answers regarding different aspects of the MIRA assembler.

[Note]Note
This document needs to be overhauled.

1.  Assembly quality

1.1. Test question 1
1.2. Test question 2

1.1.

Test question 1

Test answer 1

1.2.

Test question 2

Test answer 2

1.1.  What is the effect of uniform read distribution (-AS:urd)?

	I have a project which I once started quite normally via 
	"--job=denovo,genome,accurate,454"
	and once with explicitly switching off the uniform read distribution
	"--job=denovo,genome,accurate,454 -AS:urd=no"
	I get less contigs in the second case and I wonder if that is not better.
	Can you please explain?
      

Since 2.9.24x1, MIRA has a feature called "uniform read distribution" which is normally switched on. This feature reduces overcompression of repeats during the contig building phase and makes sure that, e.g., a rRNA stretch which is present 10 times in a bacterium will also be present approximately 10 times in your result files.

It works a bit like this: under the assumption that reads in a project are uniformly distributed across the genome, MIRA will enforce an average coverage and temporarily reject reads from a contig when this average coverage multiplied by a safety factor is reached at a given site.

It's generally a very useful tool disentangle repeats, but has some slight secondary effects: rejection of otherwise perfectly good reads. The assumption of read distribution uniformity is the big problem we have here: of course it's not really valid. You sometimes have less, and sometimes more than "the average" coverage. Furthermore, the new sequencing technologies - 454 perhaps but especially the microreads from Solexa & probably also SOLiD - show that you also have a skew towards the site of replication origin.

One example: let's assume the average coverage of your project is 8 and by chance at one place you have 17 (non-repetitive) reads, then the following happens:

$p$= parameter of -AS:urdsip

Pass 1 to $p-1$: MIRA happily assembles everything together and calculates a number of different things, amongst them an average coverage of ~8. At the end of pass '$p-1$', it will announce this average coverage as first estimate to the assembly process.

Pass $p$: MIRA has still assembled everything together, but at the end of each pass the contig self-checking algorithms now include an "average coverage check". They'll invariably find the 17 reads stacked and decide (looking at the -AS:urdct parameter which I now assume to be 2) that 17 is larger than 2*8 and that this very well may be a repeat. The reads get flagged as possible repeats.

Pass $p+1$ to end: the "possibly repetitive" reads get a much tougher treatment in MIRA. Amongst other things, when building the contig, the contig now looks that "possibly repetitive" reads do not overstack by an average coverage multiplied by a safety value (-AS:urdcm) which I'll assume in this example to be 1.5. So, at a certain point, say when read 14 or 15 of that possible repeat want to be aligned to the contig at this given place, the contig will just flatly refuse and tell the assembler to please find another place for them, be it in this contig that is built or any other that will follow. Of course, if the assembler cannot comply, the reads 14 to 17 will end up as contiglet (contig debris, if you want) or if it was only one read that got rejected like this, it will end up as singlet or in the debris file.

Tough luck. I do have ideas on how to reintegrate those reads at the and of an assembly, but I had deferred doing this as in every case I had looked up, adding those reads to the contigs wouldn't have changed anything ... there's already enough coverage. What I do in those cases is simply filter away the contiglets (defined as being of small size and having an average coverage below the average coverage of the project / 3 (or 2.5)) from a project.

1.2.  There are too many contig debris when using uniform read distribution, how do I filter for "good" contigs?

	When using uniform read distribution there are too many contig with low
	coverage which I don't want to integrate by hand in the finishing process. How
	do I filter for "good" contigs?
      

OK, let's get rid of the cruft. It's easy, really: you just need to look up one number, take two decisions and then launch a command.

The first decision you need to take is on the minimum average coverage the contigs you want to keep should have. Have a look at the file *_info_assembly.txt which is in the info directory after assembly. In the "Large contigs" section, there's a "Coverage assessment" subsection. It looks a bit like this:

	...
	Coverage assessment:
	--------------------
	Max coverage (total): 43
	Max coverage
	Sanger: 0
	454:    43
	Solexa: 0
	Solid:  0
	Avg. total coverage (size ≥ 5000): 22.30
	Avg. coverage (contig size ≥ 5000)
	Sanger: 0.00
	454:    22.05
	Solexa: 0.00
	Solid:  0.00
	...
      

This project was obviously a 454 only project, and the average coverage for it is ~22. This number was estimated by MIRA by taking only contigs of at least 5Kb into account, which for sure left out everything which could be categorised as debris. It's a pretty solid number.

Now, depending on how much time you want to invest performing some manual polishing, you should extract contigs which have at least the following fraction of the average coverage:

  • 2/3 if a quick and "good enough" is what you want and you don't want to do some manual polishing. In this example, that would be around 14 or 15.

  • 1/2 if you want to have a "quick look" and eventually perform some contig joins. In this example the number would be 11.

  • 1/3 if you want quite accurate and for sure not loose any possible repeat. That would be 7 or 8 in this example.

The second decision you need to take is on the minimum length your contigs should have. This decision is a bit dependent on the sequencing technology you used (the read length). The following are some rules of thumb:

  • Sanger: 1000 to 2000

  • 454 GS20: 500

  • 454 FLX: 1000

  • 454 Titanium: 1500

Let's assume we decide for an average coverage of 11 and a minimum length of 1000 bases. Now you can filter your project with convert_project

	convert_project -f caf -t caf -x 1000 -y 14 sourcefile.caf filtered.caf
      

1.3.  When finishing, which places should I have a look at?

	I would like to find those places where MIRA wasn't sure and give it a quick
	shot. Where do I need to search?
      

Search for the following tags in gap4 or any other finishing program for finding places of importance (in this order).

  • IUPc

  • UNSc

  • SRMc

  • WRMc

  • STMU (only hybrid assemblies)

  • STMS (only hybrid assemblies)

2.  454 data

2.1. What are little boys made of?
2.2. What are little girls made of?

2.1.

What are little boys made of?

Snips and snails and puppy dog tails.

2.2.

What are little girls made of?

Sugar and spice and everything nice.

2.1.  What do I need SFFs for?

	I need the .sff files for MIRA to load ...
      

Nope, you don't, but it's a common misconception. MIRA does not load SFF files, it loads FASTA, FASTA qualities, FASTQ, XML, CAF, EXP and PHD. The reason why one should start from the SFF is: those files can be used to create a XML file in TRACEINFO format. This XML contains the absolutely vital information regarding clipping information of the 454 adaptors (the sequencing vector of 454, if you want).

For 454 projects, MIRA will then load the FASTA, FASTA quality and the corresponding XML. Or from CAF, if you have your data in CAF format.

2.2.  What's sff_extract and where do I get it?

	How do I extract the sequence, quality and other values from SFFs?
      

Use the sff_extract script from Jose Blanca at the University of Valencia to extract everything you need from the SFF files (sequence, qualities and ancillary information). The home of sff_extract is: http://bioinf.comav.upv.es/sff_extract/index.html but I am thankful to Jose for giving permission to distribute the script in the MIRA 3rd party package (separate download).

2.3.  Do I need the sfftools from the Roche software package?

No, not anymore. Use the sff_extract script to extract your reads. Though the Roche sfftools package containes a few additional utilities which could be useful.

2.4.  Combining SFFs

	I am trying to use MIRA to assemble reads obtained with the 454 technology
	but I can't combine my sff files since I have two files obtained with GS20
	system and 2 others obtained with the GS-FLX system. Since they use
	different cycles (42 and 100) I can't use the sfffile to combine both.
      

You do not need to combine SFFs before translating them into something MIRA (or other software tools) understands. Use sff_extract which extracts data from the SFF files and combines this into input files.

2.5.  Adaptors and paired-end linker sequences

	I have no idea about the adaptor and the linker sequences, could you send me
	the sequences please?
      

Here are the sequences as filed by 454 in their patent application:

	>AdaptorA
	CTGAGACAGGGAGGGAACAGATGGGACACGCAGGGATGAGATGG
	>AdaptorB
	CTGAGACACGCAACAGGGGATAGGCAAGGCACACAGGGGATAGG
      

However, looking through some earlier project data I had, I also retrieved the following (by simply making a consensus of sequences that did not match the target genome anymore):

	>5prime454adaptor???
	GCCTCCCTCGCGCCATCAGATCGTAGGCACCTGAAA
	>3prime454adaptor???
	GCCTTGCCAGCCCGCTCAGATTGATGGTGCCTACAG
      

Go figure, I have absolutely no idea where these come from as they also do not comply to the "tcag" ending the adaptors should have.

I currently know one linker sequence (454/Roche also calls it spacer for GS20 and FLX paired-end sequencing:

	>flxlinker
	GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC
      

For Titanium data using standard Roche protocol, you need to screen for two linker sequences:

	>titlinker1
	TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG
	>titlinker2
	CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA
      

[Warning]Warning
Some sequencing labs modify the adaptor sequences for tagging and similar things. Ask your sequencing provider for the exact adaptor and/or linker sequences.

2.6.  What do I get in paired-end sequencing?

	Another question I have is does the read pair sequences have further
	adaptors/vectors in the forward and reverse strands?
      

Like for normal 454 reads - the normal A and B adaptors can be present in paired-end reads. That theory this could could look like this:

A-Adaptor - DNA1 - Linker - DNA2 - B-Adaptor.

It's possible that one of the two DNA fragments is *very* short or is missing completely, then one has something like this:

A-Adaptor - DNA1 - Linker - B-Adaptor

or

A-Adaptor - Linker - DNA2 - B-Adaptor

And then there are all intermediate possibilities with the read not having one of the two adaptors (or both). Though it appears that the majority of reads will contain the following:

DNA1 - Linker - DNA2

There is one caveat: according to current paired-end protocols, the sequences will NOT have the direction

	---> Linker <--- 
      

as one might expect when being used to Sanger Sequencing, but rather in this direction

	<--- Linker --->
      

2.7.  Sequencing protocol

	Is there a way I can find out which protocol was used?
      

Yes. The best thing to do is obviously to ask your sequencing provider.

If this is - for whatever reason - not possible, this list might help.

Are the sequences ~100-110 bases long? It's GS20.

Are the sequences ~220-250 bases long? It's FLX.

Are the sequences ~350-450 bases long? It's Titanium.

Do the sequences contain a linker (GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC)? It's a paired end protocol.

If the sequences left and right of the linker are ~29bp, it's the old short paired end (SPET, also it's most probably from a GS20). If longer, it's long paired-end (LPET, from a FLX).

2.8.  Filtering sequences by length and re-assembly

I have two datasets of ~500K sequences each and the sequencing company
already did an assembly (using MIRA) on the basecalled and fully processed
reads (using of course the accompanying *qual file). Do you suggest that I
should redo the assembly after filtering out sequences being shorter than a
certain length (eg those that are <200bp)? In other words, am I taking into
account low quality sequences if I do the assembly the way the sequencing
company did it (fully processed reads + quality files)?
      

I don't think that filtering out "shorter" reads will bring much positive improvement. If the sequencing company used the standard Roche/454 pipeline, the cut-offs for quality are already quite good, remaining sequences should be, even when being < 200bp, not of bad quality, simply a bit shorter.

Worse, you might even introduce a bias when filtering out short sequences: chemistry and library construction being what they are (rather imprecise and sometimes problematic), some parts of DNA/RNA yield smaller sequences per se ... and filtering those out might not be the best move.

You might consider doing an assembly if the company used a rather old version of MIRA (<3.0.0 for sure, perhaps also <3.0.5).

3.  Solexa / Illumina data

3.1.  Can I see deletions?

	Suppose you ran the genome of a strain that had one or more large
	deletions. Would it be clear from the data that a deletion had occurred?
      

In the question above, I assume you'd compare your strain X to a strain Ref and that X had deletions compared to Ref. Furthermore, I base my answer on data sets I have seen, which presently were 36 and 76 mers, paired and unpaired.

Yes, this would be clear. And it's a piece of cake with MIRA.

Short deletions (1 to 10 bases): they'll be tagged SROc or WRMc. General rule: deletions of up to 10 to 12% of the length of your read should be found and tagged without problem by MIRA, above that it may or may not, depending a bit on coverage, indel distribution and luck.

Long deletions (longer than read length): they'll be tagged with MCVc tag by MIRA ins the consensus. Additionally, when looking at the FASTA files when running the CAF result through convert_project: long stretches of sequences without coverage (the @ sign in the FASTAs) of X show missing genomic DNA.

3.2.  Can I see insertions?

	Suppose you ran the genome of a strain X that had a plasmid missing from the
	reference sequence. Alternatively, suppose you ran a strain that had picked
	up a prophage or mobile element lacking in the reference. Would that
	situation be clear from the data?
      

Short insertions (1 to 10 bases): they'll be tagged SROc or WRMc. General rule: deletions of up to 10 to 12% of the length of your read should be found and tagged without problem by MIRA, above that it may or may not, depending a bit on coverage, indel distribution and luck.

Long insertions: it's a bit more work than for deletions. But if you ran a de-novo assembly on all reads not mapped against your reference sequence, chances are good you'd get good chunks of the additional DNA put together

Once the Solexa paired-end protocol is completely rolled out and used on a regular base, you would even be able to place the additional element into the genome (approximately).

3.3.  De-novo assembly with Solexa data

	Any chance you could assemble de-novo the sequence of a from just the Solexa
	data?
      

[Warning]Warning
Highly opinionated answer ahead, your mileage may vary.

Allow me to make a clear statement on this: maybe.

But the result would probably be nothing I would call a good assembly. If you used anything below 76mers, I'm highly sceptical towards the idea of de-novo assembly with Solexa (or ABI SOLiD) reads that are in the 30 to 50bp range. They're really too short for that, even paired end won't help you much (especially if you have library sizes of just 200 or 500bp). Yes, there are papers describing different draft assemblers (SHARCGS, EDENA, Velvet, Euler and others), but at the moment the results are less than thrilling to me.

If a sequencing provider came to me with N50 numbers for an assembled genome in the 5-8 Kb range, I'd laugh him in the face. Or weep. I wouldn't dare to call this even 'draft'. I'd just call it junk.

On the other hand, this could be enough for some purposes like, e.g., getting a quick overview on the genetic baggage of a bug. Just don't expect a finished genome.

4.  Hybrid assemblies

4.1.  What are hybrid assemblies?

Hybrid assemblies are assemblies where one used more than one sequencing technology. E.g.: Sanger and 454, or 454 and Solexa, or Sanger and Solexa etc.pp

4.2.  What differences are there in hybrid assembly strategies?

Basically, one can choose two routes: multi-step or all-in-one-go.

Multi-steps means: to assemble reads from one sequencing technology (ideally the one from the shorter tech like, e.g., Solexa), fragment the resulting contigs into pseudo-reads of the longer tech and assemble these with the real reads from the longer tech (like, e.g., 454). The advantage of this approach is that it will be probably quite faster than the all-in-one-go approach. The disadvantage is that you loose a lot of information when using only consensus sequence of the shorter read technology for the final assembly.

All-in-one-go means: use all reads in one single assembly. The advantage of this is that the resulting alignment will be made of true reads with a maximum of information contained to allow a really good finishing. The disadvantage is that the assembly will take longer and will need more RAM.

5.  Masking

5.1.  Should I mask?

	In EST projects, do you think that the highly repetitive option will get rid
	of the repetitive sequences without going to the step of repeat masking?
      

For eukaryotes, yes. Please also consult the -SK:mnr option.

Remember: you still MUST have sequencing vectors and adaptors clipped! In EST sequences the poly-A tails should be also clipped (or let mira do it.

For prokaryotes, I´m a big fan of having a first look at unmasked data. Just try to start MIRA without masking the data. After something like 30 minutes, the all-vs-all comparison algorithm should be through with a first comparison round. grep the log for the term "megahub" ... if it doesn't appear, you probably don't need to mask repeats

5.2.  How can I apply custom masking?

	I want to mask away some sequences in my input. How do I do that?
      

First, if you want to have Sanger sequencing vectors (or 454 adaptor sequences) "masked", please note that you should rather use ancillary data files (CAF, XML or EXP) and use the sequencing or quality clip options there.

Second, please make sure you have read and understood the documentation for all -CL parameters in the main manual, but especially -CL:mbc:mbcgs:mbcmfg:mbcmeg as you might want to switch it on or off or set different values depending on your pipeline and on your sequencing technology.

You can without problem mix your normal repeat masking pipeline with the FASTA or EXP input for MIRA, as long as you mask and not clip the sequence.

An example:

	>E09238ARF0
	tcag GTGTCAGTGTTGACTGTAAAAAAAAAGTACGTATGGACTGCATGTGCATGTCATGGTACGTGTCA
	GTCAGTACAAAAAAAAAAAAAAAAAAAAGTACGT tgctgacgcacatgatcgtagc
      

(spaces inserted just as visual helper in the example sequence, they would not occur in the real stuff)

The XML will contain the following clippings: left clip = 4 (clipping away the "tcag" which are the last four bases of the adaptor used by Roche) right clip= ~90 (clipping away the "tgctgac..." lower case sequence on the right side of the sequence above.

Now, on the FASTA file that was generated with reads_sff.py or with the Roche sff* tools, you can let run, e.g., a repeat masker. The result could look like this:

	>E09238ARF0
	tcag XXXXXXXXX TTGACTGTAAAAAAAAAGTACGTATGGACTGCATGTGCATGTCATGGTACGTGTCA
	GTCAGTACAAAAAAAAAAAAAAAAAAAAGTACGT tgctgacgcacatgatcgtagc
      

The part with the Xs was masked away by your repeat masker. Now, when MIRA loads the FASTA, it will first apply the clippings from the XML file (they're still the same). Then, if the option to clip away masked areas of a read (-CL:mbc, which is normally on for EST projects), it will search for the stretches of X and internally also put clips to the sequence. In the example above, only the following sequence would remain as "working sequence" (the clipped parts would still be present, but not used for any computation.

	>E09238ARF0
	...............TTGACTGTAAAAAAAAAGTACGTATGGACTGCATGTGCATGTCATGGTACGTGTCA
	GTCAGTACAAAAAAAAAAAAAAAAAAAAGTACGT........................
      

Here you can also see the reason why your filters should mask and not clip the sequence. If you change the length of the sequence, the clips in the XML would not be correct anymore, wrong clippings would be made, wrong sequence reconstructed, chaos ensues and the world would ultimately end. Or something.

IMPORTANT! It might be that you do not want MIRA to merge the masked part of your sequence with a left or right clip, but that you want to keep it something like DNA - masked part - DNA. In this case, consult the manual for the -CL:mbc switch, either switch it off or set adequate options for the boundaries and gap sizes.

Now, if you look at the sequence above, you will see two possible poly-A tails ... at least the real poly-A tail should be masked else you will get megahubs with all the other reads having the poly-A tail.

You have two possibilities: you mask yourself with an own program or you let MIRA do the job (-CL:cpat, which should normally be on for EST projects but I forgot to set the correct switch in the versions prior to 2.9.26x3, so you need to set it manually for 454 EST projects there).

IMPORTANT! Never ever at all use two poly-A tail masker (an own and the one from MIRA): you would risk to mask too much. Example: assume the above read you masked with a poly-A masker. The result would very probably look like this:

	>E09238ARF0
	tcag XXXXXXXXX TTGACTGTAAAAAAAAAGTACGTATGGACTGCATGTGCATGTCATGGTACGTGTCA
	GTCAGTAC XXXXXXXXXXXXXXXXXXXX GTACGT tgctgacgcacatgatcgtagc
      

And MIRA would internally make the following out of it after loading:

	>E09238ARF0
	...............TTGACTGTAAAAAAAAAGTACGTATGGACTGCATGTGCATGTCATGGTACGTGTCA
	GTCAGTAC..................................................
      

and then apply the internal poly-A tail masker:

	>E09238ARF0
	...............TTGACTGT................................................
	..........................................................
      

You'd be left with ... well, a fragment of your sequence.

6.  Miscellaneous

6.1.  What are megahubs?

	I looked in the log file and that term "megahub" you told me about appears
	pretty much everywhere. First of all, what does it mean?
      

Megahub is the internal term for MIRA that the read is massively repetitive with respect to the other reads of the projects, i.e., a read that is a megahub connects to an insane number of other reads.

This is a clear sign that something is wrong. Or that you have a quite repetitive eukaryote. But most of the time it's sequencing vectors (Sanger), A and B adaptors or paired-end linkers (454), unmasked poly-A signals (EST) or non-normalised EST libraries which contain high amounts of housekeeping genes (always the same or nearly the same).

Countermeasures to take are:

  • set clips for the sequencing vectors (Sanger) or Adaptors (454) either in the XML or EXP files

  • for ESTs, mask poly-A in your input data (or let MIRA do it with the -CL:cpat parameter)

  • only after the above steps have been made, use the -SK:mnr switch to let mira automatically mask nasty repeats, adjust the threshold with -SK:rt

  • if everything else fails, filter out or mask sequences yourself in the input data that come from housekeeping genes or nasty repeats.

6.2.  Passes and loops

	While processing some contigs with repeats i get
	"Accepting probably misassembled contig because of too many iterations."
	What is this?
      

That's quite normal in the first few passes of an assembly. During each pass (-AS:nop), contigs get built one by one. After a contig has been finished, it checks itself whether it can find misassemblies due to repeats (and marks these internally). If no misassembly, perfect, build next contig. But if yes, the contig requests immediate re-assembly of itself.

But this can happen only a limited number of times (governed by -AS:rbl). If there are still misassemblies, the contig is stored away anyway ... chances are good that in the next full pass of the assembler, enough knowledge has been gained top correctly place the reads.

So, you need to worry only if these messages still appear during the last pass. The positions that cause this are marked with "SRMc" tags in the assemblies (CAF, ACE in the result dir; and some files in the info dir).

6.3.  Debris

	What are the debris composed of?
      

  • sequences too short (after trimming)

  • megahubs

  • sequences almost completely masked by the nasty repeat masker ([-SK:mnr])

  • singlets, i.e., reads that after an assembly pass did not align into any contig (or where rejected from every contig).

  • sequences that form a contig with less reads than defined by [-AS:mrpc]

6.4.  Log and temporary files: more info on what happened during the assembly

	I do not understand why ... happened. Is there a way to find out?
      

Yes. The tmp directory contains, beside temporary data, a number of log files with more or less readable information. While development versions of MIRA keep this directory after finishing, production versions normally delete this directory after an assembly. To keep the logs and temporary file also in production versions, use "-OUT:rtd=no".

As MIRA also tries to save as much disk space as possible, some logs and temporary files are rotated (which means that old logs and tmps get deleted). To switch off this behaviour, use "-OUT:rrot=no". Beware, the size of the tmp directory will increase, sometimes dramatically so.

6.4.1.  Sequence clipping after load

How MIRA clipped the reads after loading them can be found in the file mira_int_clippings.0.txt. The entries look like this:

	  load:  minleft. U13a01d05.t1    Left: 11         -> 30
	

Interpret this as: after loading, the read "U13a01d05.t1" had a left clipping of eleven. The "minleft" clipping option of MIRA did not like it and set it to 30.

	  load:  bad seq. gnl|ti|1133527649       Shortened by 89 New right: 484
	

Interpret this as: after loading, the read "gnl|ti|1133527649" was checked with the "bad sequence search" clipping algorithm which determined that there apparently is something dubious, so it shortened the read by 89 bases, setting the new right clip to position 484.

7.  Platforms and Compiling

7.1.  Windows

	Also, is MIRA be available on a windows platform?
      

As a matter of fact: it was and may be again. While I haven't done it myself, according to reports I got compiling MIRA 2.9.3* in a Cygwin environment was actually painless. But since then BOOST and multi-threading has been included and I am not sure whether it is still as easy.

I'd be thankful for reports :-)