Marker Assisted Selection in Wheat - HOME USDA National Institute of Food and Agriculture The Borlaug Global Rust Initiative Marker Assisted Selection in Wheat

Data processing protocols

Submit a genetic map to GrainGenes

Once you finish a genetic map and submit it for publication it is a good idea, sometimes it is even mandatory, to deposit it in a public repository. Having a map in a web-accesible database has several advantages because the map is not just an static object as it would be in a publication. Instead, it can be updated and gets linked to other pieces of information allowing users to extract more information from the map. Among these advantages we can mention,

  • The map will be accesible to a wider audience. A user can reach a map directly or it can be the result of a general, less specific search.
  • Information on the map will be linked to other maps or data resources. Every feature on a map will have a hyperlink to an individual feature report, which in turn, has links to the molecular markers involved, other maps or probe information. As an example, visit this map in GrainGenes and follow some of the hyperlinks associated to it.
  • Any map can be inspected side by side with one or more maps, facilitating the comparison of loci organization.

The choice of the WheatCAP project for depositing its maps is GrainGenes ( This site hosts a large collection of maps for wheat and other Triticeae and Avena. To submit a map to GrainGenes you need to format your data following a format defined in this template.

The template in detail

The MS-Excel template has five sheets:

  • Introduction. General information and GrainGenes contacts. You do not need to edit anything here
  • Map_data. This section contains your map metadata. That is, authors, parents of the cross, bibliographical references and comments.
  • Locus. This is a table which shows the alleles for every locus you scored across the mapping population..
  • Probe. Add here probe information for probes not recorded yet in GrainGenes.
  • Library. Normally not used. It holds data about clones libraries.

Let's take a closer look at the two worksheets you will be working with most of the time.

The Map_data worksheet

Data are organized side by side. A cell on the "B" column holds the title of a item and the contiguous cell on the "C" column contains its description. Let's use as an example the Kofa x UC113 map (you can get the Excel file for this map here). This is a cross between two durum wheats. The first five fields are:

Map_Data Durum wheat, Kofa x UC1113
Species Triticum turgidum ssp. Durum
Female_parent Kofa
Male_parent UC1113
Type Genetic
Map_units cM (Kosambi)

The first row, Map_Data, is a title for the map. The next three fields, Species, Female_parent and Male_parent, are self-explanatory. The Type field is for indicating whether the map is genetic, physical, cytogenetic, etc. Finally, under Map_units remember to record what mapping function you used, Kosambi, Haldane, etc.

You can add one more Reference fields. In our example, there are two, the first is a "classical" bibliographical record and the second is a link to the document object identifier (DOI, assigned by the journal which published the paper. If you have more than one paper, technical report or web link that are relevant for your map, you can add more Reference records.

Reference QTL analysis of pasta quality using a composite microsatellite - SNP map of durum wheat. TAG Theoretical and Applied Genetics, 2008, 117(8):1361-1377
Reference doi:10.1007/s00122-008-0869-1

The next piece of information is one or more Contact rows, one for each person that can be contacted for questions about the map. The last rows on this worksheet are reserved for one or more Remarks fields. You should use at least one to indicate the scoring of the alleles, and it is advisable to add one more Remark with a short description of the mapping population.

The locus worksheet

This is a table that has as many rows as loci analyzed and five columns. The contents of the cells on the first column, Map, is the same for all loci, a description of the cross that created the mapping population. The cells on the second colum, Probe, contain the probe names that were used to score the Locus shown on the third column. Note that the locus designation includes the chromosome. Then comes the Position column showing the map position in cM.

Each cell on the Scoring column holds a text string whose length correspond to the number of lines in the mapping population plus the two parents. The observed allele is indicated using the coding schema described in the Remarks field of the Map_Data worksheet. For example, in the following extract of the locus table, the first locus shown, Xbarc148-1A, is missing in the first line (3), the second line has the allele from Kofa (1), the third has the allele from UC1113 (0) and so on. The same three lines have the Kofa allele (1) for the second locus, Xwg983-1A.

Probe Locus Position Scoring
BARC148 Xbarc148-1A 47.5 310101000001
WG983 Xwg983-1A 73.2 111113000310
BARC83 Xbarc83-1A 75.2 111111000010
WMS135 Xgwm135-1A 76.4 111101000010
WMC312 Xwmc312-1A 89.3 111101010010
CFA2129 Xcfa2129-1A 91.6 111101010010
DuPw38 Xdupw38-1A 113.3 001010010110
CFA2147 Xcfa2147-1A 118.0 000010000110

One problem with this way of formatting allele data is that people tend to order their spreadsheets for routine work with one locus per row and one column per recombinant line. Furthermore, letters are used to indicate alleles instead of numbers. One way to convert from one format to the other is to copy the cells to a word processor (something basic like Windows' Notepad works better). Then you have to remove extra blank spaces, replace letters with numbers and add an apostrophe at the beginning of each line before copying back to MS-Excel. If you forget the apostrophe ( ' ), the leading zeros of the scorings will be trimmed:

00000101100 becomes 101100 because it is read as a number.
'00000101100 keeps the leading zeros because it is interpreted as text.

Another option is to work with the built-in function CONCATENATE() or add a new function to MS-Excel to handle this situation.

Using MultiCat

The CONCATENATE() function that comes with Excel can only combine text from up to 30 cells. This means that if you have a hundred and more lines, you have to use this function several times to get a combined result for all of them. Nick Crowley, from Stephen Baenziger's lab, brought to our attention a user defined function (UDF) for MS-Excel, called MultiCat(), that concantenates as many cell as needed. It is more useful for formatting your mapping data according to the GrainGenes format than the CONCATENATE() function. The code for this UDF was originally posted in, and may be freely distributed under the Gnu GPL License.

Use of the MultiCat() is as simple as any other built-in Excel function, and adding its code to your worksheets should take a few minutes. The hardest part, at least in my installation of MS-Excel was understanding the security policy for macros, which includes UDFs. If you have MS-Excel configured with its default settings, the macro security is set to high or very high. This means that only those macros that come with a digital signature ca be executed on your computer. Even if you are the author of the code, you are not allowed to execute it.

At least in MS-Office 2003 you can add the code for MultiCat() into a new module. When you activate the function selector in a worksheet (Shift-F3), you will be able to see and select the function, but then an error will show up, that refers to a naming problem and never to a security issue. So, the steps to have MultiCat() running are this: first change your macro security settings, add a new module, and finally insert MultiCat() in any cell of your worksheet.

Changing security settings in MS-Excel:
  1. Open MS-Excel with an empty worksheet.
  2. Go to Tools > Options > Security
  3. Click on the button labeles "Macro security..."
  4. Select "Medium"
  5. Close MS-Excel

Note: Now, every time you open a worksheet that contains macros, you will be asked whether you want to continue or not. Please, be careful, sometimes computer viruses travel in the form of MS-Office macros, so do not open a file with macros if you do not know its origins.

Note: We strongly recommend to consult your network or computer security staff about changing your security settings in MS-Excel and be sure that your antivirus is running and updated.

Inserting a new module
  1. Open your xls worksheet containing you mapping data
  2. Go to Tools > Macros > Visual Basic Editor or just press Alt-F11
  3. The Visual Basic Editor pops up, go to insert > module
  4. A new window shows up. Copy and paste this code into that window
	'Purpose: Concatenate all cells in a range    
	'Inputs:  rRng - range to be concatenated    
	'         sDelimiter - optional delimiter    
	'            to insert between cell Texts    
	'Returns: concatenated string    
	Public Function MultiCat( _          
	       ByRef rRng As Excel.Range, _
		   Optional ByVal sDelim As String = "") As String
       Dim rCell As Range  
	   For Each rCell In rRng     
	      MultiCat = MultiCat & sDelim & rCell.Text       
	   Next rCell       
	   MultiCat = Mid(MultiCat, Len(sDelim) + 1)    
	 End Function  

   5. Close the module and the editor

Using MultiCat()
  1. Let's say that the data for a marker fill cells from B2 to EK2
  2. Enter in cell EL2, or any other blank cell you like: MultiCat(B2:EK2)
  3. Now you have to copy the new long string into the worksheet you will use for submission. Remember #using Paste Special with values only. Otherwise, you would copy the formula with a wrong reference.

Submit your map

Once your map file is ready you can submit it to the e-mail address on the first worksheet of the GrainGenes template file. If you are a WheatCAP collaborator, submit your map or questions about map preparation to Marcelo Soria.