getRxnsFromKEGG Retrieves information on all reactions stored in KEGG database keggPath this function reads data from a local FTP dump of the KEGG database. keggPath is the pathway to the root of the database keepUndefinedStoich include reactions in the form n A <=> n+1 A. These will be dealt with as two separate metabolites (opt, default true) keepIncomplete include reactions which have been labelled as "incomplete", "erroneous" or "unclear" (opt, default true) keepGeneral include reactions which have been labelled as "general reaction". These are reactions on the form "an aldehyde <=> an alcohol", and are therefore unsuited for modelling purposes. Note that not all reactions have this type of annotation, and the script will therefore not be able to remove all such reactions (opt, default false) model a model structure generated from the database. The following fields are filled id: 'KEGG' description: 'Automatically generated from KEGG database' rxns: KEGG reaction ids rxnNames: Name for each reaction entry mets: KEGG compound ids. If the equations use stoichiometry such as ID(n+1) then the whole expression is saved as the id eccodes: Corresponding ec-number if available rxnMiriams: Contains reaction specific information such as KO id and pathways that the reaction is associated to S: Stoichiometric matrix lb: -1000 for all reactions ub: 1000 for all reactions rev: 1 for reversible and 0 for irreversible. For reactions present in pathway maps the reversibility is taken from there b: 0 for all metabolites Reactions on the form A <=> A + B will not be loaded. If the file keggRxns.mat is in the RAVEN directory it will be loaded instead of parsing of the KEGG files. If it does not exist it will be saved after parsing of the KEGG files. In general, you should remove the keggRxns.mat file if you want to rebuild the model structure from a newer version of KEGG. Usage: model=getRxnsFromKEGG(keggPath,keepUndefinedStoich,keepIncomplete,keepGeneral) Rasmus Agren, 2012-12-16
0001 function model=getRxnsFromKEGG(keggPath,keepUndefinedStoich,keepIncomplete, keepGeneral) 0002 % getRxnsFromKEGG 0003 % Retrieves information on all reactions stored in KEGG database 0004 % 0005 % keggPath this function reads data from a local FTP dump of 0006 % the KEGG database. keggPath is the pathway to the 0007 % root of the database 0008 % keepUndefinedStoich include reactions in the form n A <=> n+1 A. These 0009 % will be dealt with as two separate metabolites 0010 % (opt, default true) 0011 % keepIncomplete include reactions which have been labelled as 0012 % "incomplete", "erroneous" or "unclear" (opt, 0013 % default true) 0014 % keepGeneral include reactions which have been labelled as 0015 % "general reaction". These are reactions on the form 0016 % "an aldehyde <=> an alcohol", and are therefore 0017 % unsuited for modelling purposes. Note that not all 0018 % reactions have this type of annotation, and the 0019 % script will therefore not be able to remove all 0020 % such reactions (opt, default false) 0021 % 0022 % model a model structure generated from the database. The following 0023 % fields are filled 0024 % id: 'KEGG' 0025 % description: 'Automatically generated from KEGG database' 0026 % rxns: KEGG reaction ids 0027 % rxnNames: Name for each reaction entry 0028 % mets: KEGG compound ids. If the equations use 0029 % stoichiometry such as ID(n+1) then the whole 0030 % expression is saved as the id 0031 % eccodes: Corresponding ec-number if available 0032 % rxnMiriams: Contains reaction specific information such as 0033 % KO id and pathways that the reaction is 0034 % associated to 0035 % S: Stoichiometric matrix 0036 % lb: -1000 for all reactions 0037 % ub: 1000 for all reactions 0038 % rev: 1 for reversible and 0 for irreversible. For 0039 % reactions present in pathway maps the reversibility 0040 % is taken from there 0041 % b: 0 for all metabolites 0042 % 0043 % Reactions on the form A <=> A + B will not be loaded. If the file 0044 % keggRxns.mat is in the RAVEN directory it will be loaded instead of 0045 % parsing of the KEGG files. If it does not exist it will be saved after 0046 % parsing of the KEGG files. In general, you should remove the 0047 % keggRxns.mat file if you want to rebuild the model structure from a 0048 % newer version of KEGG. 0049 % 0050 % Usage: model=getRxnsFromKEGG(keggPath,keepUndefinedStoich,keepIncomplete,keepGeneral) 0051 % 0052 % Rasmus Agren, 2012-12-16 0053 % 0054 0055 %NOTE: This is how one entry looks in the file 0056 0057 % ENTRY R00001 Reaction 0058 % NAME Polyphosphate polyphosphohydrolase 0059 % DEFINITION Polyphosphate + n H2O <=> (n+1) Oligophosphate 0060 % EQUATION C00890 + n C00001 <=> (n+1) C02174 0061 % ENZYME 3.6.1.10 0062 % /// 0063 0064 %The file is not tab-delimited. Instead each label is 12 characters 0065 %(except for '///') 0066 0067 if nargin<2 0068 keepUndefinedStoich=true; 0069 end 0070 if nargin<3 0071 keepIncomplete=true; 0072 end 0073 if nargin<4 0074 keepGeneral=true; 0075 end 0076 0077 %Check if the reactions have been parsed before and saved. If so, load the 0078 %model. 0079 [ST I]=dbstack('-completenames'); 0080 ravenPath=fileparts(ST(I).file); 0081 rxnsFile=fullfile(ravenPath,'kegg','keggRxns.mat'); 0082 if exist(rxnsFile, 'file') 0083 fprintf(['NOTE: Importing KEGG reactions from ' strrep(rxnsFile,'\','/') '.\n']); 0084 load(rxnsFile); 0085 else 0086 %Download required files from KEGG if it doesn't exist in the directory 0087 downloadKEGG(keggPath); 0088 0089 %Add new functionality in the order specified in models 0090 model.id='KEGG'; 0091 model.description='Automatically generated from KEGG database'; 0092 0093 %Preallocate memory for 10000 reactions 0094 model.rxns=cell(10000,1); 0095 model.rxnNames=cell(10000,1); 0096 model.eccodes=cell(10000,1); 0097 model.subSystems=cell(10000,1); 0098 model.rxnMiriams=cell(10000,1); 0099 equations=cell(10000,1); %Temporarily stores the equations 0100 isIncomplete=false(10000,1); 0101 isGeneral=false(10000,1); 0102 0103 %First load information on reaction ID, reaction name, KO, pathway, and ec-number 0104 fid = fopen(fullfile(keggPath,'reaction'), 'r'); 0105 0106 %Keeps track of how many reactions that have been added 0107 rxnCounter=0; 0108 0109 %Loop through the file 0110 orthology=false; 0111 pathway=false; 0112 while 1 0113 %Get the next line 0114 tline = fgetl(fid); 0115 0116 %Abort at end of file 0117 if ~ischar(tline) 0118 break; 0119 end 0120 0121 %Skip '///' 0122 if numel(tline)<12 0123 continue; 0124 end 0125 0126 %Check if it's a new reaction 0127 if strcmp(tline(1:12),'ENTRY ') 0128 rxnCounter=rxnCounter+1; 0129 0130 %Add empty strings where there should be such 0131 model.rxnNames{rxnCounter}=''; 0132 model.eccodes{rxnCounter}=''; 0133 model.subSystems{rxnCounter}=''; 0134 equations{rxnCounter}=''; 0135 0136 %Add reaction ID (always 6 characters) 0137 model.rxns{rxnCounter}=tline(13:18); 0138 orthology=false; 0139 pathway=false; 0140 end 0141 0142 %Add name 0143 if strcmp(tline(1:12),'NAME ') 0144 model.rxnNames{rxnCounter}=tline(13:end); 0145 end 0146 0147 %Add whether the comment includes "incomplete", "erroneous" or "unclear" 0148 if strcmp(tline(1:12),'COMMENT ') 0149 %Read all text until '///' or 'RPAIR' 0150 commentText=tline(13:end); 0151 while 1 0152 tline = fgetl(fid); 0153 if ~strcmp(tline(1:3),'///') && ~strcmp(tline(1:3),'RPA') && ~strcmp(tline(1:3),'ENZ') 0154 commentText=strcat(commentText,' ',tline); 0155 else 0156 break; 0157 end 0158 end 0159 upperLine=upper(commentText); 0160 if any(strfind(upperLine,'INCOMPLETE')) || any(strfind(upperLine,'ERRONEOUS')) || any(strfind(upperLine,'UNCLEAR')) 0161 isIncomplete(rxnCounter)=true; 0162 end 0163 if any(strfind(upperLine,'GENERAL REACTION')==1) %It should start this way 0164 isGeneral(rxnCounter)=true; 0165 end 0166 0167 %Go to next iteration if it is '///' 0168 if numel(tline)<12 0169 continue; 0170 end 0171 end 0172 0173 %Add ec-number 0174 if strcmp(tline(1:12),'ENZYME ') 0175 model.eccodes{rxnCounter}=tline(13:end); 0176 end 0177 if numel(tline)>8 0178 if strcmp(tline(1:9),'REFERENCE') 0179 pathway=false; 0180 orthology=false; 0181 end 0182 end 0183 0184 %Add KO-ids 0185 if numel(tline)>16 0186 if strcmp(tline(1:16),'ORTHOLOGY KO: ') || strcmp(tline(1:16),' KO: ') || strcmp(tline(1:12),'ORTHOLOGY ') || orthology==true 0187 pathway=false; 0188 %Check if KO has been added already (each reaction may belong to 0189 %several) 0190 if isstruct(model.rxnMiriams{rxnCounter}) 0191 addToIndex=numel(model.rxnMiriams{rxnCounter}.name)+1; 0192 else 0193 addToIndex=1; 0194 end 0195 0196 tempStruct=model.rxnMiriams{rxnCounter}; 0197 tempStruct.name{addToIndex,1}='urn:miriam:kegg.ko'; %WARNING: This is not a real MIRIAM identifier 0198 if strcmp(tline(13:16),'KO:') %This is in the old version 0199 tempStruct.value{addToIndex,1}=tline(17:22); 0200 else 0201 if strcmp(tline(1:12),'ORTHOLOGY ') 0202 %This means that it found one KO in the new format and that 0203 %subsequent lines might be other KOs 0204 orthology=true; 0205 end 0206 tempStruct.value{addToIndex,1}=tline(13:18); 0207 end 0208 model.rxnMiriams{rxnCounter}=tempStruct; 0209 end 0210 end 0211 0212 %Add pathways 0213 if numel(tline)>18 0214 if strcmp(tline(1:18),'PATHWAY PATH: ') || strcmp(tline(1:18),' PATH: ') || strcmp(tline(1:12),'PATHWAY ') || pathway==true 0215 %Check if annotation has been added already 0216 if isstruct(model.rxnMiriams{rxnCounter}) 0217 addToIndex=numel(model.rxnMiriams{rxnCounter}.name)+1; 0218 else 0219 addToIndex=1; 0220 end 0221 0222 tempStruct=model.rxnMiriams{rxnCounter}; 0223 tempStruct.name{addToIndex,1}='urn:miriam:kegg.pathway'; 0224 %If it's the old version 0225 if strcmp(tline(14:17),'PATH:') 0226 tempStruct.value{addToIndex,1}=tline(19:25); 0227 else 0228 %If it's the new version 0229 tempStruct.value{addToIndex,1}=tline(13:19); 0230 pathway=true; 0231 end 0232 0233 %Don't do this if the pathway is rn01100 (Metabolic pathways) 0234 if ~strcmp('rn01100',tempStruct.value{addToIndex,1}) 0235 model.rxnMiriams{rxnCounter}=tempStruct; 0236 0237 %Also save the subSystems entry as being the first path found 0238 if ~any(model.subSystems{rxnCounter}) 0239 if strcmp(tline(14:17),'PATH:') 0240 model.subSystems{rxnCounter}=tline(28:end); 0241 else 0242 model.subSystems{rxnCounter}=tline(22:end); 0243 end 0244 end 0245 end 0246 end 0247 end 0248 end 0249 0250 %Close the file 0251 fclose(fid); 0252 0253 %This is done here since the the indexes won't match since some reactions 0254 %are removed along the way 0255 isIncomplete=model.rxns(isIncomplete); 0256 isGeneral=model.rxns(isGeneral); 0257 0258 %If too much space was allocated, shrink the model 0259 model.rxns=model.rxns(1:rxnCounter); 0260 model.rxnNames=model.rxnNames(1:rxnCounter); 0261 model.eccodes=model.eccodes(1:rxnCounter); 0262 equations=equations(1:rxnCounter); 0263 model.rxnMiriams=model.rxnMiriams(1:rxnCounter); 0264 model.subSystems=model.subSystems(1:rxnCounter); 0265 0266 %Then load the equations from another file. This is because the equations 0267 %are easier to retrieve from there 0268 0269 %The format is rxnID: equation 0270 %The reactions should have been loaded in the exact same order 0271 fid = fopen(fullfile(keggPath,'reaction.lst'), 'r'); 0272 0273 %Loop through the file 0274 for i=1:rxnCounter 0275 %Get the next line 0276 tline = fgetl(fid); 0277 0278 equations{i}=tline(9:end); 0279 end 0280 0281 %Close the file 0282 fclose(fid); 0283 0284 %Construct the S matrix and list of metabolites 0285 [S mets badRxns]=constructS(equations); 0286 model.S=S; 0287 model.mets=mets; 0288 0289 %There is some limited evidence for directionality in 0290 %reaction_mapformula.lst. The information there concerns how the reactions 0291 %are drawn in the KEGG maps. If a reaction is irreversible in the same 0292 %direction for all maps, then I consider is irreversible, otherwise 0293 %reversible. Also, not all reactions are present in the maps, so not all 0294 %will have directionality. They will be considered to be reversible. 0295 0296 %The format is R00005: 00330: C01010 => C00011 0297 %Generate a reversibility structure with the fields: 0298 %rxns: reaction ids 0299 %product: one met id that is a product. This is because the reactions 0300 %might be written in another direction compared to in the reactions.lst 0301 %file 0302 %rev: 1 if reversible, otherwise 0 0303 reversibility.rxns={}; 0304 reversibility.product={}; 0305 reversibility.rev=[]; 0306 0307 fid = fopen(fullfile(keggPath,'reaction_mapformula.lst'), 'r'); 0308 while 1 0309 %Get the next line 0310 tline = fgetl(fid); 0311 0312 %Abort at end of file 0313 if ~ischar(tline) 0314 break; 0315 end 0316 0317 rxn=tline(1:6); 0318 prod=tline(end-5:end); 0319 rev=any(strfind(tline,'<=>')); 0320 if isempty(reversibility.rxns) 0321 reversibility.rxns{1}=rxn; 0322 reversibility.product{1}=prod; 0323 reversibility.rev(1)=rev; 0324 else 0325 %Check if the reaction was added before. It's an ordered list, so only 0326 %check the last element 0327 if strcmp(reversibility.rxns(end),rxn) 0328 %If it's reversible in the new reaction or reversible in the old reaction 0329 %then set (keep) to be reversible 0330 if rev==1 || reversibility.rev(end)==1 0331 reversibility.rev(end)=1; 0332 else 0333 %This means that the reaction was already loaded, that it was 0334 %irreversible before and irreversible in the new reaction. 0335 %However, it could be that they are written in diferent 0336 %directions. If the product differ, then set to be reversible. 0337 %This assumes that the reactions are written with the same 0338 %metabolite as the last one if they are in the same direction. 0339 if ~strcmp(prod,reversibility.product(end)) 0340 reversibility.rev(end)=1; 0341 end 0342 end 0343 else 0344 reversibility.rxns=[reversibility.rxns;rxn]; 0345 reversibility.product=[reversibility.product;prod]; 0346 reversibility.rev=[reversibility.rev;rev]; 0347 end 0348 end 0349 end 0350 fclose(fid); 0351 0352 %Update the reversibility 0353 model.rev=ones(rxnCounter,1); 0354 %Match the reaction ids 0355 irrevIDs=find(reversibility.rev==0); 0356 [crap I]=ismember(reversibility.rxns(irrevIDs),model.rxns); 0357 [crap prodMetIDs]=ismember(reversibility.product(irrevIDs),model.mets); 0358 model.rev(I)=0; 0359 0360 %See if the reactions are written in the same order in model.S 0361 linearInd=sub2ind(size(model.S), prodMetIDs, I); 0362 changeOrder=I(model.S(linearInd)<0); 0363 model.S(:,changeOrder)=model.S(:,changeOrder).*-1; %Change the order of these reactions 0364 0365 %Add some stuff to get a correct model structure 0366 model.ub=ones(rxnCounter,1)*1000; 0367 model.lb=model.rev*-1000; 0368 model.c=zeros(rxnCounter,1); 0369 model.b=zeros(numel(model.mets),1); 0370 model=removeRxns(model,badRxns,true,true); 0371 0372 %Save the model structure 0373 save(rxnsFile,'model','isGeneral','isIncomplete'); 0374 end 0375 0376 %Delete reaction which are labeled as "incomplete", "erroneous", "unclear" 0377 %or "general reaction" (depending on settings. 0378 if keepGeneral==false 0379 model=removeRxns(model,intersect(isGeneral,model.rxns),true,true); 0380 end 0381 if keepIncomplete==false 0382 model=removeRxns(model,intersect(isIncomplete,model.rxns),true,true); 0383 end 0384 0385 %Delete reactions involving undefined stoichiometry. These metabolites have 0386 %an ID containing the letter "n" or "m" 0387 if keepUndefinedStoich==false 0388 I=cellfun(@any,strfind(model.mets,'n')) | cellfun(@any,strfind(model.mets,'m')); 0389 [crap J]=find(model.S(I,:)); 0390 model=removeRxns(model,J,true,true); 0391 end 0392 end