
getMetsFromKEGG

PURPOSE

getMetsFromKEGG

SYNOPSIS

function model=getMetsFromKEGG(keggPath)

DESCRIPTION

 getMetsFromKEGG
   Retrieves information on all metabolites stored in the KEGG database

   keggPath    this function reads data from a local FTP dump of the KEGG
               database. keggPath is the path to the root of the database

   model       a model structure generated from the database. The following
               fields are filled
               id:             'KEGG'
               description:    'Automatically generated from KEGG database'
               mets:           KEGG compound ids
               metNames:       Compound name. Only the first name will be
                               saved if there are several synonyms
               metMiriams:     The ChEBI id if one is available, otherwise
                               the KEGG compound id
               inchis:         InChI string for the metabolite
               metFormulas:    The chemical composition of the metabolite.
                               This will only be loaded if there is no InChI 
                               string
   If the file keggMets.mat exists in the 'kegg' subdirectory of the folder
   containing this function, it is loaded instead of parsing the KEGG
   files. If it does not exist, it is saved there after the KEGG files have
   been parsed. To rebuild the model structure from a newer version of
   KEGG, remove the keggMets.mat file first.
               
   Usage: model=getMetsFromKEGG(keggPath)

   Rasmus Agren, 2013-08-01
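
The caching behaviour described above is a load-or-build pattern. A minimal, self-contained sketch of that pattern follows. The cache file name keggMetsDemo.mat, its location in tempdir, and the stand-in model struct are assumptions made for illustration; they are not part of RAVEN, and the real function parses the KEGG files where this sketch builds a one-element struct.

```matlab
% Hypothetical cache location, chosen so the sketch does not touch RAVEN files
cacheFile = fullfile(tempdir,'keggMetsDemo.mat');
if exist(cacheFile,'file'), delete(cacheFile); end  % start from a clean state

for pass = 1:2
    if exist(cacheFile,'file')
        % Cache hit: reuse the previously saved structure
        s = load(cacheFile,'model');
        model = s.model;
        source = 'cache';
    else
        % Cache miss: build the structure (stand-in for parsing) and save it
        model = struct('id','KEGG','mets',{{'C00001'}});
        save(cacheFile,'model');
        source = 'parsed';
    end
    fprintf('pass %d: model came from %s\n', pass, source);
end

% Deleting the cache forces a rebuild on the next run
delete(cacheFile);
```

The first pass builds and saves the structure and the second pass loads it, which is why a stale cache has to be deleted before a newer KEGG dump is picked up.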

CROSS-REFERENCE INFORMATION

This function calls: This function is called by:

SOURCE CODE

function model=getMetsFromKEGG(keggPath)
% getMetsFromKEGG
%   Retrieves information on all metabolites stored in the KEGG database
%
%   keggPath    this function reads data from a local FTP dump of the KEGG
%               database. keggPath is the path to the root of the database
%
%   model       a model structure generated from the database. The following
%               fields are filled
%               id:             'KEGG'
%               description:    'Automatically generated from KEGG database'
%               mets:           KEGG compound ids
%               metNames:       Compound name. Only the first name will be
%                               saved if there are several synonyms
%               metMiriams:     The ChEBI id if one is available, otherwise
%                               the KEGG compound id
%               inchis:         InChI string for the metabolite
%               metFormulas:    The chemical composition of the metabolite.
%                               This will only be loaded if there is no InChI
%                               string
%   If the file keggMets.mat exists in the 'kegg' subdirectory of the folder
%   containing this function, it is loaded instead of parsing the KEGG
%   files. If it does not exist, it is saved there after the KEGG files have
%   been parsed. To rebuild the model structure from a newer version of
%   KEGG, remove the keggMets.mat file first.
%
%   Usage: model=getMetsFromKEGG(keggPath)
%
%   Rasmus Agren, 2013-08-01
%

%NOTE: This is how one entry looks in the file

% ENTRY       C00001                      Compound
% NAME        H2O;
%             Water
% FORMULA     H2O
% MASS        18.0106
% REMARK      Same as: D00001
% REACTION    R00001 R00002 R00004 R00005 R00009 R00010 R00011 R00017
%             R00022 R00024 R00026 R00028 R00036 R00041 R00044 R00045
% ENZYME      1.1.1.160
% DBLINKS     PubChem: 7435
%             ChEBI: 29110

%Then a lot of info about the positions of the atoms and so on. It is not
%certain that each metabolite follows this structure exactly

%The file is not tab-delimited. Instead each label is 12 characters
%(except for '///')

%Check if the metabolites have been parsed before and saved. If so, load
%the model.
[ST, I]=dbstack('-completenames');
ravenPath=fileparts(ST(I).file);
metsFile=fullfile(ravenPath,'kegg','keggMets.mat');
if exist(metsFile, 'file')
    %Pass the path as an argument so that any '%' in it is printed literally
    fprintf('NOTE: Importing KEGG metabolites from %s.\n',strrep(metsFile,'\','/'));
    load(metsFile,'model');
else
    %Download the required files from KEGG if they don't exist in the directory
    downloadKEGG(keggPath);

    %Add new functionality in the order specified in models
    model.id='KEGG';
    model.description='Automatically generated from KEGG database';

    %Preallocate memory for 20000 metabolites
    model.mets=cell(20000,1);
    model.metNames=cell(20000,1);
    model.metFormulas=cell(20000,1);
    model.metMiriams=cell(20000,1);

    %First load information on metabolite ID, metabolite name, composition, and
    %ChEBI
    fid = fopen(fullfile(keggPath,'compound'), 'r');

    %Keeps track of how many metabolites have been added
    metCounter=0;

    %Loop through the file
    while true
        %Get the next line
        tline = fgetl(fid);

        %Abort at end of file
        if ~ischar(tline)
            break;
        end

        %Skip lines that are too short to hold a 12-character label, such
        %as the '///' entry separator
        if numel(tline)<12
            continue;
        end

        %Check if it's a new metabolite
        if strcmp(tline(1:12),'ENTRY       ')
            metCounter=metCounter+1;

            %Add empty strings where there should be such
            model.metNames{metCounter}='';
            model.metFormulas{metCounter}='';

            %Add compound ID (always 6 characters)
            model.mets{metCounter}=tline(13:18);
        end

        %Add name. Only the line carrying the NAME label is read, so only
        %the first name is kept
        if strcmp(tline(1:12),'NAME        ')
            %If there are synonyms, then the last character is ';'
            if strcmp(tline(end),';')
                model.metNames{metCounter}=tline(13:end-1);
            else
                model.metNames{metCounter}=tline(13:end);
            end
        end

        %Add composition
        if strcmp(tline(1:12),'FORMULA     ')
            model.metFormulas{metCounter}=tline(13:end);
        end

        %Add ChEBI id
        if numel(tline)>19
            if strcmp(tline(1:19),'            ChEBI: ')
                chebiID=tline(20:end); %This is because there is sometimes more than one ChEBI id

                %Only load the first id for now
                s=strfind(chebiID,' ');
                if any(s)
                    chebiID=chebiID(1:s(1)-1);
                end
                miriamStruct.name{1}='obo.chebi:CHEBI';
                miriamStruct.value{1}=chebiID;
                model.metMiriams{metCounter}=miriamStruct;
            end
        end
    end

    %Close the file
    fclose(fid);

    %If too much space was allocated, shrink the model
    model.mets=model.mets(1:metCounter);
    model.metNames=model.metNames(1:metCounter);
    model.metFormulas=model.metFormulas(1:metCounter);
    model.metMiriams=model.metMiriams(1:metCounter);

    %If there was no ChEBI id found, add the KEGG id as metMiriams
    for i=1:numel(model.mets)
        if ~isstruct(model.metMiriams{i})
            miriamStruct.name{1}='kegg.compound';
            miriamStruct.value{1}=model.mets{i};
            model.metMiriams{i}=miriamStruct;
        end
    end

    %Then load the InChI strings from another file. Not all metabolites will be
    %present in the list

    inchIDs=cell(numel(model.mets),1);
    inchis=cell(numel(model.mets),1);

    %The format is metID*tab*string

    fid = fopen(fullfile(keggPath,'compound.inchi'), 'r');

    %Loop through the file
    counter=1;
    while true
        %Get the next line
        tline = fgetl(fid);

        %Abort at end of file
        if ~ischar(tline)
            break;
        end

        %Get the ID (characters 1-6) and the InChI. Character 7 is the tab
        %and characters 8-13 are skipped (presumably the 'InChI=' prefix)
        inchIDs{counter}=tline(1:6);
        inchis{counter}=tline(14:end);
        counter=counter+1;
    end

    %Close the file
    fclose(fid);

    inchIDs=inchIDs(1:counter-1);
    inchis=inchis(1:counter-1);

    %Find the metabolites that had InChI strings and add them to the model
    [a, b]=ismember(inchIDs,model.mets);

    %If there were mets with InChIs but that were not in the list
    if ~all(a)
        dispEM('Not all metabolites with InChI strings were found in the original list');
    end

    %Only use the matched entries, since b is 0 for ids that were not found
    model.inchis=cell(numel(model.mets),1);
    model.inchis(:)={''};
    model.inchis(b(a))=inchis(a);

    %Remove composition if InChI was found
    model.metFormulas(b(a))={''};

    %Saves the model
    save(metsFile,'model');
end
end
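
The fixed-width parsing in the listing can be tried out in isolation. The sketch below runs the same idea over the example entry from the source comments, held in a cell array instead of being read from a file. The variable names entryLines and entry are made up for this illustration. Unlike the listing, the sketch trims the 12-character label before comparing it, and it also accepts a ChEBI id that sits on the DBLINKS line itself; both are assumptions about what is convenient, not a description of RAVEN's behaviour.

```matlab
% One compound entry, taken from the example in the source comments
entryLines = {
    'ENTRY       C00001                      Compound'
    'NAME        H2O;'
    '            Water'
    'FORMULA     H2O'
    'DBLINKS     PubChem: 7435'
    '            ChEBI: 29110'
    '///'};

entry = struct('id','','name','','formula','','chebi','');
for k = 1:numel(entryLines)
    tline = entryLines{k};

    % Lines shorter than the label column (such as '///') carry no data
    if numel(tline) < 12
        continue;
    end

    label = strtrim(tline(1:12));    % fixed-width label column
    value = strtrim(tline(13:end));  % everything after the label

    if strcmp(label,'ENTRY')
        entry.id = value(1:6);                   % compound ids are 6 characters
    elseif strcmp(label,'NAME')
        entry.name = regexprep(value,';$','');   % drop the synonym separator
    elseif strcmp(label,'FORMULA')
        entry.formula = value;
    end

    % Pick up the first ChEBI id, whichever DBLINKS line it is on
    tok = regexp(value,'^ChEBI:\s*(\S+)','tokens','once');
    if ~isempty(tok)
        entry.chebi = tok{1};
    end
end

disp(entry);
```

Continuation lines such as the one holding 'Water' have an empty label, so the synonym is ignored and only the first name is kept, matching the documented behaviour of metNames.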

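
The last step of the listing maps InChI strings onto the metabolite list with ismember. The sketch below shows that mapping on a small made-up data set. The ids in mets, the three tab-separated lines, and the id C99999 are illustrative assumptions and were not read from a KEGG dump. The sketch splits each line at the tab and keeps the full InChI string, whereas the listing uses fixed character positions.

```matlab
% Illustrative metabolite ids standing in for model.mets
mets = {'C00001';'C00002';'C00014'};

% Illustrative lines in the metID<tab>string layout; C99999 is a made-up id
rawLines = {
    sprintf('C00001\tInChI=1S/H2O/h1H2')
    sprintf('C00014\tInChI=1S/H3N/h1H3')
    sprintf('C99999\tInChI=1S/CH4/h1H4')};

n = numel(rawLines);
inchIDs = cell(n,1);
inchis = cell(n,1);
for k = 1:n
    parts = strsplit(rawLines{k}, sprintf('\t'));  % split at the tab
    inchIDs{k} = parts{1};
    inchis{k} = parts{2};
end

% found(k) is true if inchIDs{k} is in mets; pos(k) is its index, or 0
[found, pos] = ismember(inchIDs, mets);

% Start with empty strings and fill in only the matched metabolites
modelInchis = repmat({''}, numel(mets), 1);
modelInchis(pos(found)) = inchis(found);

disp(modelInchis);
```

Indexing with pos(found) instead of pos avoids a zero index when an id in the InChI file has no counterpart in the metabolite list.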
Generated on Mon 06-Jan-2014 14:58:12 by m2html © 2005