Home > RAVEN > importExcelModel.m

importExcelModel

PURPOSE ^

importExcelModel

SYNOPSIS ^

function model=importExcelModel(fileName,removeExcMets,printWarnings,ignoreErrors)

DESCRIPTION ^

 importExcelModel
   Imports a constraint-based model from a Excel file

   fileName      a Microsoft Excel file to import
   removeExcMets true if exchange metabolites should be removed. This is
                 needed to be able to run simulations, but it could also 
                 be done using simplifyModel at a later stage (opt,
                 default true)
   printWarnings true if warnings should be printed (opt, default true)
   ignoreErrors  true if errors should be ignored. See below for details
                 (opt, default false)

   model
       description      description of model contents
       id               model ID
       rxns             reaction ids
       mets             metabolite ids
       S                stoichiometric matrix
       lb               lower bounds
       ub               upper bounds
       rev              reversibility vector
       c                objective coefficients
       b                equality constraints for the metabolite equations
       comps            compartment ids
       compNames        compartment names
       compOutside      the id (as in comps) for the compartment
                        surrounding each of the compartments
       compMiriams      structure with MIRIAM information about the
                        compartments
       rxnNames         reaction description
       rxnComps         compartments for reactions
       grRules          reaction to gene rules in text form
       rxnGeneMat       reaction-to-gene mapping in sparse matrix form
       subSystems       subsystem name for each reaction
       eccodes          EC-codes for the reactions
       rxnMiriams       structure with MIRIAM information about the reactions
       genes            list of all genes
       geneComps        compartments for reactions
       geneMiriams      structure with MIRIAM information about the genes
       metNames         metabolite description
       metComps         compartments for metabolites
       inchis           InChI-codes for metabolites
       metFormulas      metabolite chemical formula
       metMiriams       structure with MIRIAM information about the metabolites
       unconstrained    true if the metabolite is an exchange metabolite

   Loads models in the RAVEN Toolbox Excel format. A number of consistency
   checks are performed in order to ensure that the model is valid. These
   can be ignored by putting ignoreErrors to true. However, this is highly
   advised against, as it can result in errors in simulations or other
   functionalities. The RAVEN Toolbox is made to function only on consistent
   models, and the only checks performed are when the model is imported.

   NOTE: Most errors are checked for by checkModelStruct, but some
   are checked for in this function as well. Those are ones which relate
   to missing model elements and so on, and which would make it impossible
   to construct the model structure. Those errors cannot be ignored by
   setting ignoreErrors to true.

   Usage: model=importExcelModel(fileName,removeExcMets,printWarnings,ignoreErrors)

   Rasmus Agren, 2014-01-06

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function model=importExcelModel(fileName,removeExcMets,printWarnings,ignoreErrors)
0002 % importExcelModel
0003 %   Imports a constraint-based model from a Excel file
0004 %
0005 %   fileName      a Microsoft Excel file to import
0006 %   removeExcMets true if exchange metabolites should be removed. This is
0007 %                 needed to be able to run simulations, but it could also
0008 %                 be done using simplifyModel at a later stage (opt,
0009 %                 default true)
0010 %   printWarnings true if warnings should be printed (opt, default true)
0011 %   ignoreErrors  true if errors should be ignored. See below for details
0012 %                 (opt, default false)
0013 %
0014 %   model
0015 %       description      description of model contents
0016 %       id               model ID
0017 %       rxns             reaction ids
0018 %       mets             metabolite ids
0019 %       S                stoichiometric matrix
0020 %       lb               lower bounds
0021 %       ub               upper bounds
0022 %       rev              reversibility vector
0023 %       c                objective coefficients
0024 %       b                equality constraints for the metabolite equations
0025 %       comps            compartment ids
0026 %       compNames        compartment names
0027 %       compOutside      the id (as in comps) for the compartment
0028 %                        surrounding each of the compartments
0029 %       compMiriams      structure with MIRIAM information about the
0030 %                        compartments
0031 %       rxnNames         reaction description
0032 %       rxnComps         compartments for reactions
0033 %       grRules          reaction to gene rules in text form
0034 %       rxnGeneMat       reaction-to-gene mapping in sparse matrix form
0035 %       subSystems       subsystem name for each reaction
0036 %       eccodes          EC-codes for the reactions
0037 %       rxnMiriams       structure with MIRIAM information about the reactions
0038 %       genes            list of all genes
0039 %       geneComps        compartments for reactions
0040 %       geneMiriams      structure with MIRIAM information about the genes
0041 %       metNames         metabolite description
0042 %       metComps         compartments for metabolites
0043 %       inchis           InChI-codes for metabolites
0044 %       metFormulas      metabolite chemical formula
0045 %       metMiriams       structure with MIRIAM information about the metabolites
0046 %       unconstrained    true if the metabolite is an exchange metabolite
0047 %
0048 %   Loads models in the RAVEN Toolbox Excel format. A number of consistency
0049 %   checks are performed in order to ensure that the model is valid. These
0050 %   can be ignored by putting ignoreErrors to true. However, this is highly
0051 %   advised against, as it can result in errors in simulations or other
0052 %   functionalities. The RAVEN Toolbox is made to function only on consistent
0053 %   models, and the only checks performed are when the model is imported.
0054 %
0055 %   NOTE: Most errors are checked for by checkModelStruct, but some
0056 %   are checked for in this function as well. Those are ones which relate
0057 %   to missing model elements and so on, and which would make it impossible
0058 %   to construct the model structure. Those errors cannot be ignored by
0059 %   setting ignoreErrors to true.
0060 %
0061 %   Usage: model=importExcelModel(fileName,removeExcMets,printWarnings,ignoreErrors)
0062 %
0063 %   Rasmus Agren, 2014-01-06
0064 %
0065 
0066 %Adds the required classes to the Java path
0067 [ST, I]=dbstack('-completenames');
0068 ravenPath=fileparts(ST(I).file);
0069 poiPATH=fullfile(ravenPath,'software','apache-poi');
0070 javaaddpath(fullfile(poiPATH,'dom4j-1.6.1.jar'));
0071 javaaddpath(fullfile(poiPATH,'poi-3.8-20120326.jar'));
0072 javaaddpath(fullfile(poiPATH,'poi-ooxml-3.8-20120326.jar'));
0073 javaaddpath(fullfile(poiPATH,'poi-ooxml-schemas-3.8-20120326.jar'));
0074 javaaddpath(fullfile(poiPATH,'xmlbeans-2.3.0.jar'));
0075 
0076 %Import required classes from Apache POI
0077 import org.apache.poi.ss.usermodel.*;
0078 import org.apache.poi.hssf.usermodel.*;
0079 import org.apache.poi.xssf.usermodel.*;
0080 import org.apache.poi.ss.usermodel.*;
0081 import org.apache.poi.ss.util.*;
0082 import java.io.FileInputStream;
0083 
0084 if nargin<2
0085     removeExcMets=true;
0086 end
0087 if nargin<3
0088     printWarnings=true;
0089 end
0090 if nargin<4
0091     ignoreErrors=false;
0092 end
0093 
0094 %This is to match the order of the fields to those you get from importing
0095 %from SBML
0096 model=[];
0097 model.id=[];
0098 model.description=[];
0099 model.annotation=[];
0100 %Default bounds if not defined
0101 model.annotation.defaultLB=-1000;
0102 model.annotation.defaultUB=1000;
0103 model.rxns={};
0104 model.mets={};
0105 model.S=[];
0106 model.lb=[];
0107 model.ub=[];
0108 model.rev=[];
0109 model.c=[];
0110 model.b=[];
0111 model.comps={};
0112 model.compNames={};
0113 model.compOutside={};
0114 model.compMiriams={};
0115 model.rxnNames={};
0116 model.rxnComps={}; %Will be double later
0117 model.grRules={};
0118 model.rxnGeneMat=[];
0119 model.subSystems={};
0120 model.eccodes={};
0121 model.rxnMiriams={};
0122 model.genes={};
0123 model.geneComps={}; %Will be double later
0124 model.geneMiriams={};
0125 model.metNames={};
0126 model.metComps=[];
0127 model.inchis={};
0128 model.metFormulas={};
0129 model.metMiriams={};
0130 model.unconstrained=[];
0131     
0132 %Check if the file exists
0133 if ~exist(fileName,'file')
0134     dispEM('The Excel file could not be found');
0135 end
0136 
0137 %Opens the workbook. The input stream is needed since it will otherwise
0138 %keep the file open
0139 is=FileInputStream(getFullPath(fileName));
0140 workbook=WorkbookFactory.create(is);
0141 is.close();
0142 
0143 [raw, flag]=loadSheet(workbook,'MODEL');
0144 
0145 if flag<0
0146     if printWarnings==true
0147         dispEM('Could not load the MODEL sheet',false);
0148     end
0149     model.id='UNKNOWN';
0150     model.description='No model details available';
0151 else
0152     raw=cleanImported(raw);
0153 
0154     %It is assumed that the first line is labels and that the second one is
0155     %info
0156     allLabels={'ID';'DESCRIPTION';'DEFAULT LOWER';'DEFAULT UPPER';'CONTACT GIVEN NAME';'CONTACT FAMILY NAME';'CONTACT EMAIL';'ORGANIZATION';'TAXONOMY';'NOTES'};
0157 
0158     %Map to new captions
0159     raw(1,:)=upper(raw(1,:));
0160     raw(1,:)=strrep(raw(1,:),'MODELID','ID');
0161     raw(1,:)=strrep(raw(1,:),'MODELNAME','DESCRIPTION');
0162 
0163     %Loop through the labels
0164     [I, J]=ismember(raw(1,:),allLabels);
0165     I=find(I);
0166     for i=1:numel(I)
0167         switch J(I(i))
0168             case 1
0169                 if any(raw{I(i),2})
0170                     model.id=toStr(raw{2,I(i)}); %Should be string already
0171                 else
0172                     dispEM('No model ID supplied');
0173                 end
0174             case 2
0175                 if any(raw{2,I(i)})
0176                     model.description=toStr(raw{2,I(i)}); %Should be string already
0177                 else
0178                     dispEM('No model name supplied');
0179                 end
0180             case 3
0181                 if ~isempty(raw{2,I(i)})
0182                     try
0183                         model.annotation.defaultLB=toDouble(raw{2,I(i)},NaN);
0184                     catch
0185                         dispEM('DEFAULT LOWER must be numeric');
0186                     end
0187                 else
0188                     if printWarnings==true
0189                         fprintf('NOTE: DEFAULT LOWER not supplied. Uses -1000\n');
0190                     end
0191                     model.annotation.defaultLB=-1000;
0192                 end
0193             case 4
0194                 if ~isempty(raw{2,I(i)})
0195                     try
0196                         model.annotation.defaultUB=toDouble(raw{2,I(i)},NaN);
0197                     catch
0198                         dispEM('DEFAULT UPPER must be numeric');
0199                     end
0200                 else
0201                     if printWarnings==true
0202                         fprintf('NOTE: DEFAULT UPPER not supplied. Uses 1000\n');
0203                     end
0204                     model.annotation.defaultUB=1000;
0205                 end
0206             case 5
0207                 if any(raw{2,I(i)})
0208                     model.annotation.givenName=toStr(raw{2,I(i)}); %Should be string already
0209                 end
0210             case 6
0211                 if any(raw{2,I(i)})
0212                     model.annotation.familyName=toStr(raw{2,I(i)}); %Should be string already
0213                 end
0214             case 7
0215                 if any(raw{2,I(i)})
0216                     model.annotation.email=toStr(raw{2,I(i)}); %Should be string already
0217                 end
0218             case 8
0219                 if any(raw{2,I(i)})
0220                     model.annotation.organization=toStr(raw{2,I(i)}); %Should be string already
0221                 end    
0222             case 9
0223                 if any(raw{2,I(i)})
0224                     model.annotation.taxonomy=toStr(raw{2,I(i)}); %Should be string already
0225                 end 
0226             case 10
0227                 if any(raw{2,I(i)})
0228                     model.annotation.note=toStr(raw{2,I(i)}); %Should be string already
0229                 end     
0230         end  
0231     end
0232 end
0233 
0234 %Get compartment information
0235 [raw, flag]=loadSheet(workbook,'COMPS');
0236 
0237 if flag<0
0238     if printWarnings==true
0239         dispEM('Could not load the COMPS sheet. All elements will be assigned to a compartment "s" for "System"',false);
0240     end
0241     model.comps={'s'};
0242     model.compNames={'System'};
0243 else
0244     raw=cleanImported(raw);
0245 
0246     %Map to new captions
0247     raw(1,:)=upper(raw(1,:));
0248     raw(1,:)=strrep(raw(1,:),'COMPABBREV','ABBREVIATION');
0249     raw(1,:)=strrep(raw(1,:),'COMPNAME','NAME');
0250 
0251     allLabels={'ABBREVIATION';'NAME';'INSIDE';'MIRIAM'};
0252 
0253     %Loop through the labels
0254     [I, J]=ismember(raw(1,:),allLabels);
0255     I=find(I);
0256     for i=1:numel(I)
0257         switch J(I(i))
0258             case 1
0259                 model.comps=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0260             case 2
0261                 model.compNames=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0262             case 3
0263                 model.compOutside=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0264             case 4
0265                 model.compMiriams=raw(2:end,I(i));
0266         end
0267     end
0268 
0269     %Check that necessary fields are loaded
0270     if isempty(model.comps)
0271         dispEM('There must be a column named ABBREVIATION in the COMPS sheet');   
0272     end
0273     if isempty(model.compNames)
0274         model.compNames=model.comps; 
0275         if printWarnings==true
0276             dispEM('There is no column named NAME in the COMPS sheet. ABBREVIATION will be used as name',false);
0277         end
0278     end
0279 
0280     model.compMiriams=parseMiriam(model.compMiriams);
0281 end
0282 
0283 %Get all the genes and info about them
0284 [raw, flag]=loadSheet(workbook,'GENES');
0285 
0286 if flag<0
0287     if printWarnings==true
0288         dispEM('There is no spreadsheet named GENES',false)
0289     end
0290 else
0291     raw=cleanImported(raw);
0292 
0293     %Map to new captions
0294     raw(1,:)=upper(raw(1,:));
0295     raw(1,:)=strrep(raw(1,:),'GENE NAME','NAME');
0296 
0297     allLabels={'NAME';'MIRIAM';'COMPARTMENT'};
0298     
0299     %Loop through the labels
0300     [I, J]=ismember(upper(raw(1,:)),allLabels);
0301     I=find(I);
0302     foundGenes=false;
0303     for i=1:numel(I)
0304         switch J(I(i))
0305             case 1
0306                 model.genes=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0307                 foundGenes=true;
0308             case 2
0309                 model.geneMiriams=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0310             case 3
0311                 model.geneComps=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0312         end
0313     end
0314 
0315     if foundGenes==false
0316         dispEM('There must be a column named NAME in the GENES sheet');
0317     end
0318         
0319     %Its ok if all of them are empty
0320     if all(cellfun(@isempty,model.geneComps))
0321         model.geneComps=[];
0322     end
0323 
0324     %Check that geneName contain only strings and no empty strings
0325     if ~iscellstr(model.genes)
0326         dispEM('All gene names have to be strings');
0327     else
0328         if any(strcmp('',model.genes))
0329             dispEM('There can be no empty strings in gene names');
0330         end
0331     end
0332 
0333     %Check that geneComp contains only strings and no empty string
0334     if ~isempty(model.geneComps)
0335         if ~iscellstr(model.geneComps)
0336             dispEM('All gene compartments have to be strings');
0337         else
0338             if any(strcmp('',model.geneComps))
0339                 dispEM('There can be no empty strings in gene compartments');
0340             end
0341         end
0342         [I, model.geneComps]=ismember(model.geneComps,model.comps);
0343         dispEM('The following genes have compartment abbreviations which could not be found:',true,model.genes(~I));
0344     end
0345 end
0346 
0347 model.geneMiriams=parseMiriam(model.geneMiriams);
0348 
0349 %Loads the reaction data
0350 [raw, flag]=loadSheet(workbook,'RXNS');
0351 
0352 if flag<0
0353     dispEM('Could not load the RXNS sheet');
0354 end
0355 
0356 raw=cleanImported(raw);
0357 
0358 %Map to new captions
0359 raw(1,:)=upper(raw(1,:));
0360 raw(1,:)=strrep(raw(1,:),'RXNID','ID');
0361 
0362 allLabels={'ID';'NAME';'EQUATION';'EC-NUMBER';'GENE ASSOCIATION';'LOWER BOUND';'UPPER BOUND';'OBJECTIVE';'COMPARTMENT';'SUBSYSTEM';'REPLACEMENT ID';'MIRIAM'};
0363 equations={};
0364 reactionReplacement={};
0365 
0366 %Loop through the labels
0367 [I, J]=ismember(raw(1,:),allLabels);
0368 I=find(I);
0369 for i=1:numel(I)
0370     switch J(I(i))
0371         case 1
0372            model.rxns=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0373         case 2
0374            model.rxnNames=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0375         case 3
0376            equations=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0377         case 4
0378            model.eccodes=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0379         case 5
0380            model.grRules=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0381         case 6
0382            try
0383                model.lb=cellfun(@(x) toDouble(x,NaN),raw(2:end,I(i)));
0384            catch
0385                dispEM('The lower bounds must be numerical values'); 
0386            end
0387         case 7
0388            try
0389                model.ub=cellfun(@(x) toDouble(x,NaN),raw(2:end,I(i)));
0390            catch
0391                dispEM('The upper bounds must be numerical values'); 
0392            end
0393         case 8
0394            try
0395                model.c=cellfun(@(x) toDouble(x,0),raw(2:end,I(i)));
0396            catch
0397                dispEM('The objective coefficients must be numerical values'); 
0398            end
0399         case 9
0400             model.rxnComps=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0401         case 10
0402             model.subSystems=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0403         case 11
0404             reactionReplacement=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);    
0405         case 12
0406             model.rxnMiriams=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0407     end
0408 end
0409 
0410 %Check that all necessary reaction info has been loaded
0411 if isempty(equations)
0412      dispEM('There must be a column named EQUATION in the RXNS sheet');   
0413 end
0414 if isempty(model.rxns)
0415      if printWarnings==true
0416         dispEM('There is no column named ID in the RXNS sheet. The reactions will be named as "r1", "r2"...',false);
0417      end
0418      I=num2cell((1:numel(equations))');
0419      model.rxns=strcat('r',cellfun(@num2str,I,'UniformOutput',false));
0420 end
0421 
0422 %Check if some other stuff is loaded and populate with default values
0423 %otherwise
0424 if isempty(model.rxnNames)
0425    model.rxnNames=cell(numel(model.rxns),1);
0426    model.rxnNames(:)={''};
0427    if printWarnings==true
0428        dispEM('There is no column named NAME in the RXNS sheet. Empty strings will be used as reaction names',false);
0429    end
0430 end
0431 if isempty(model.lb)
0432    %This is not set here since the reversibility isn't known yet
0433    model.lb=nan(numel(model.rxns),1);
0434    if printWarnings==true
0435         dispEM('There is no column named LOWER BOUND in the RXNS sheet. Default bounds will be used',false);
0436    end
0437 end
0438 if isempty(model.ub)
0439    model.ub=nan(numel(model.rxns),1);
0440    if printWarnings==true
0441         dispEM('There is no column named UPPER BOUND in the RXNS sheet. Default bounds will be used',false);
0442    end
0443 end
0444 if isempty(model.c)
0445    model.c=zeros(numel(model.rxns),1);
0446    if printWarnings==true
0447         dispEM('There is no column named OBJECTIVE in the RXNS sheet',false);
0448    end
0449 end
0450 
0451 %Either all reactions must have a compartment string or none of them.
0452 %Check if it's only empty and if so return it to []
0453 if ~isempty(model.rxnComps)
0454     if all(cellfun(@isempty,model.rxnComps))
0455         model.rxnComps=[];
0456     end
0457 end
0458 
0459 %Construct the rxnMiriams structure
0460 model.rxnMiriams=parseMiriam(model.rxnMiriams);
0461 
0462 %Replace the reaction IDs for those IDs that have a corresponding
0463 %replacement name.
0464 I=cellfun(@any,reactionReplacement);
0465 model.rxns(I)=reactionReplacement(I);
0466 
0467 %Check that there are no empty strings in reactionIDs or equations
0468 if any(strcmp('',model.rxns))
0469     dispEM('There are empty reaction IDs'); 
0470 end
0471 
0472 if any(strcmp('',equations))
0473     dispEM('There are empty equations'); 
0474 end
0475 
0476 if ~isempty(model.rxnComps)
0477     if any(strcmp('',model.rxnComps))
0478         dispEM('Either all reactions must have an associated compartment string or none of them'); 
0479     end
0480 end
0481     
0482 %Check gene association for each reaction and populate rxnGeneMat
0483 if ~isempty(model.genes)
0484     model.rxnGeneMat=zeros(numel(model.rxns),numel(model.genes));
0485 end
0486 if ~isempty(model.grRules)
0487     for i=1:length(model.rxns)    
0488        %Check that all gene associations have a match in the gene list
0489        if ~isempty(model.grRules{i})
0490            indexes=strfind(model.grRules{i},':'); %Genes are separated by ":" for AND and ";" for OR
0491            indexes=unique([indexes strfind(model.grRules{i},';')]);
0492            if isempty(indexes)
0493                %See if you have a match
0494                I=find(strcmp(model.grRules{i},model.genes));
0495                if isempty(I)
0496                    dispEM(['The gene association in reaction ' model.rxns{i} ' (' model.grRules{i} ') is not present in the gene list']);
0497                end
0498                model.rxnGeneMat(i,I)=1;
0499            else
0500                temp=[0 indexes numel(model.grRules{i})+1];
0501                for j=1:numel(indexes)+1;
0502                    %The reaction has several associated genes
0503                    geneName=model.grRules{i}(temp(j)+1:temp(j+1)-1);
0504                    I=find(strcmp(geneName,model.genes));
0505                    if isempty(I)
0506                        dispEM(['The gene association in reaction ' model.rxns{i} ' (' geneName ') is not present in the gene list']);
0507                    end
0508                    model.rxnGeneMat(i,I)=1;
0509                end
0510            end
0511             %In order to adhere to the COBRA standards it should be like
0512             %this:
0513             %-If only one gene then no parentheses
0514             %-If only "and" or only "or" there should only be one set of
0515             %parentheses
0516             %-If both "and" and "or", then split on "or". This is not
0517             %complete, but it's the type of relationship supported by the
0518             %Excel formulation
0519             aSign=strfind(model.grRules{i},':');
0520             oSign=strfind(model.grRules{i},';');
0521             if isempty(aSign) && isempty(oSign)
0522                 model.grRules{i}=model.grRules{i};
0523             else
0524                 if isempty(aSign)
0525                     model.grRules{i}=['(' strrep(model.grRules{i},';',' or ') ')'];
0526                 else
0527                     if isempty(oSign)
0528                         model.grRules{i}=['(' strrep(model.grRules{i},':',' and ') ')'];
0529                     else
0530                         model.grRules{i}=['((' strrep(model.grRules{i},';',') or (') '))'];
0531                         model.grRules{i}=strrep(model.grRules{i},':',' and ');
0532                     end
0533                 end
0534             end
0535        end
0536     end
0537 end
0538 model.rxnGeneMat=sparse(model.rxnGeneMat);
0539 
0540 %Check that the compartment for each reaction can be found
0541 if ~isempty(model.rxnComps)
0542     [I, model.rxnComps]=ismember(model.rxnComps,model.comps);
0543     dispEM('The following reactions have compartment abbreviations which could not be found:',true,model.rxns(~I));
0544 end
0545 
0546 %Get all the metabolites and info about them
0547 [raw, flag]=loadSheet(workbook,'METS');
0548 
0549 if flag<0
0550     if printWarnings==true
0551         dispEM('There is no spreadsheet named METS. The metabolites will be named "m1", "m2"... and assigned to the first compartment',false);
0552     end
0553     %Parse the equations to find out how many metabolites there are
0554     metsForParsing=parseRxnEqu(equations);
0555     I=num2cell((1:numel(metsForParsing))');
0556     model.mets=strcat('m',cellfun(@num2str,I,'UniformOutput',false));
0557     model.metComps=ones(numel(model.mets),1);
0558     model.unconstrained=zeros(numel(model.mets),1);
0559     model.metNames=metsForParsing;
0560 else
0561     raw=cleanImported(raw);            
0562 
0563     %Map to new captions
0564     raw(1,:)=upper(raw(1,:));
0565     raw(1,:)=strrep(raw(1,:),'METID','ID');
0566     raw(1,:)=strrep(raw(1,:),'METNAME','NAME');
0567 
0568     allLabels={'ID';'NAME';'UNCONSTRAINED';'MIRIAM';'COMPOSITION';'INCHI';'COMPARTMENT';'REPLACEMENT ID'};
0569 
0570     %Load the metabolite information
0571     metReplacement={};
0572 
0573     %Loop through the labels
0574     [I, J]=ismember(raw(1,:),allLabels);
0575     I=find(I);
0576     for i=1:numel(I)
0577         switch J(I(i))
0578             case 1
0579                model.mets=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0580             case 2
0581                model.metNames=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0582             case 3
0583                model.unconstrained=cellfun(@boolToDouble,raw(2:end,I(i)));
0584                
0585                %NaN is returned if the values couldn't be parsed
0586                dispEM('The UNCONSTRAINED property for the following metabolites must be "true"/"false", 1/0, TRUE/FALSE or not set:',true,model.mets(isnan(model.unconstrained)));
0587             case 4
0588                model.metMiriams=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0589             case 5
0590                model.metFormulas=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0591             case 6
0592                model.inchis=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0593             case 7
0594                 model.metComps=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0595 
0596                 %Check that all metabolites have compartments defined
0597                 if any(strcmp('',model.metComps))
0598                     dispEM('All metabolites must have an associated compartment string'); 
0599                 end
0600             case 8
0601                metReplacement=cellfun(@toStr,raw(2:end,I(i)),'UniformOutput',false);
0602         end
0603     end
0604 
0605     %Check that necessary fields are loaded (METID)
0606     if isempty(model.mets)
0607          dispEM('There must be a column named ID in the METS sheet');   
0608     end
0609 
0610     %Check that some other stuff is loaded and use default values otherwise
0611     if isempty(model.metNames)
0612        model.metNames=cell(numel(model.mets),1);
0613        if printWarnings==true
0614             dispEM('There is no column named NAME in the METS sheet. ID will be used as name',false);
0615        end
0616     end
0617     if isempty(model.unconstrained)
0618        model.unconstrained=zeros(numel(model.mets),1);
0619        if printWarnings==true
0620             dispEM('There is no column named UNCONSTRAINED in the METS sheet. All metabolites will be constrained',false);
0621        end
0622     end
0623 
0624     if isempty(model.metComps)
0625        model.metComps=cell(numel(model.mets),1);
0626        model.metComps(:)=model.comps(1);
0627        if printWarnings==true
0628             dispEM('There is no column named COMPARTMENT in the METS sheet. All metabolites will be assigned to the first compartment in COMPS. Note that RAVEN makes extensive use of metabolite names and compartments. Some features will therefore not function correctly if metabolite compartments are not correctly assigned',false);
0629        end
0630     end
0631 
0632     %The composition should be loaded from InChIs when available
0633     I=find(~cellfun(@isempty,model.inchis));
0634     for i=1:numel(I)
0635         S=regexp(model.inchis(I(i)),'/','split');
0636         S=S{1};
0637         if numel(S)>=2
0638             %Don't copy if it doesn't look good
0639             model.metFormulas(I(i))=S(2);
0640         end
0641     end
0642 
0643     %Check that the compartment for each metabolite can be found. Also convert
0644     %from id to index
0645     [I, model.metComps]=ismember(model.metComps,model.comps);
0646     dispEM('The following metabolites have compartment abbreviations which could not be found:',true,model.mets(~I));
0647     
0648     %Check that the model.mets vector is unique. The problem is that
0649     %the checkModelStruct cannot check for that since only the metReplacements
0650     %(if used) end up in the model structure
0651     I=false(numel(model.mets),1);
0652     [J, K]=unique(model.mets);
0653     if numel(J)~=numel(model.mets)
0654         L=1:numel(model.mets);
0655         L(K)=[];
0656         I(L)=true;
0657     end
0658     dispEM('The following metabolites are duplicates:',~ignoreErrors,model.mets(I));
0659 
0660     %Check that there are no metabolite IDs which are numbers. This would
0661     %give errors when parsing the equations
0662     I=cellfun(@str2double,model.mets);
0663     dispEM('The following metabolites have names which cannot be distinguished from numbers:',~ignoreErrors,model.mets(~isnan(I)));
0664     I=cellfun(@str2double,metReplacement);
0665     dispEM('The following metabolites have names which cannot be distinguished from numbers:',~ignoreErrors,metReplacement(~isnan(I)));
0666     
0667     %Replace the metabolite IDs for those IDs that have a corresponding
0668     %replacement metabolite. This is not used for matching, but will be checked
0669     %for consistency with SBML naming conventions
0670     metsForParsing=model.mets; %This is because the equations are written with this
0671     I=cellfun(@any,metReplacement);
0672     model.mets(I)=metReplacement(I);
0673     
0674     %If the metabolite name isn't set, replace it with the metabolite id
0675     I=~cellfun(@any,model.metNames);
0676     model.metNames(I)=model.mets(I);
0677 
0678     %Construct the metMiriams structure
0679     model.metMiriams=parseMiriam(model.metMiriams);
0680 end
0681 
0682 %Everything seems fine with the metabolite IDs, compartments, genes, and
0683 %reactions
0684 
0685 %Parse the equations
0686 [model.S, mets, badRxns, model.rev]=constructS(equations,metsForParsing,model.rxns);
0687 model.rev=model.rev*1; %Typecast to double
0688 
0689 %Add default constraints
0690 model.lb(isnan(model.lb))=model.annotation.defaultLB.*model.rev(isnan(model.lb));
0691 model.ub(isnan(model.ub))=model.annotation.defaultUB;
0692 
0693 %Reorder the S matrix so that it fits with the metabolite list in the
0694 %structure
0695 [crap, I]=ismember(mets,metsForParsing);
0696 model.S=model.S(I,:);
0697 
0698 %Print warnings about the reactions which contain the same metabolite as
0699 %both reactants and products
0700 dispEM('The following reactions have metabolites which are present more than once. Only the net reactions will be exported:',false,model.rxns(badRxns));
0701 
0702 model.b=zeros(numel(model.mets),1);
0703 
0704 %Remove unused fields
0705 if isempty(model.compOutside)
0706     model=rmfield(model,'compOutside');
0707 end
0708 if isempty(model.compMiriams)
0709     model=rmfield(model,'compMiriams');
0710 end
0711 if isempty(model.rxnComps)
0712     model=rmfield(model,'rxnComps');
0713 end
0714 if isempty(model.grRules)
0715     model=rmfield(model,'grRules');
0716 end
0717 if isempty(model.rxnGeneMat)
0718     model=rmfield(model,'rxnGeneMat');
0719 end
0720 if isempty(model.subSystems)
0721     model=rmfield(model,'subSystems');
0722 end
0723 if isempty(model.eccodes)
0724     model=rmfield(model,'eccodes');
0725 end
0726 if isempty(model.rxnMiriams)
0727     model=rmfield(model,'rxnMiriams');
0728 end
0729 if isempty(model.genes)
0730     model=rmfield(model,'genes');
0731 end
0732 if isempty(model.geneComps)
0733     model=rmfield(model,'geneComps');
0734 end
0735 if isempty(model.geneMiriams)
0736     model=rmfield(model,'geneMiriams');
0737 end
0738 if isempty(model.inchis)
0739     model=rmfield(model,'inchis');
0740 end
0741 if isempty(model.metFormulas)
0742     model=rmfield(model,'metFormulas');
0743 end
0744 if isempty(model.metMiriams)
0745     model=rmfield(model,'metMiriams');
0746 end
0747 
0748 %The model structure has now been reconstructed but it can still contain
0749 %many types of errors. The checkModelConsistency function is used to make
0750 %sure that naming and mapping of stuff looks good
0751 checkModelStruct(model,~ignoreErrors);
0752 
0753 if removeExcMets==true
0754     model=simplifyModel(model);
0755 end
0756 end
0757 
0758 %Cleans up the structure that is imported from using xlsread
0759 function raw=cleanImported(raw)
0760     %First check that it's a cell array. If a sheet is completely empty,
0761     %then raw=NaN
0762     if iscell(raw)
0763         %Clear cells which contain only white spaces or NaN. This could
0764         %happen if you accidentally inserted a space for example. I don't
0765         %know how NaN could occur after switching to Apache POI, but I
0766         %clear it to be sure
0767         whites=cellfun(@wrapperWS,raw);
0768         raw(whites)={[]};
0769         
0770         %Remove columns that don't have string headers. If you cut and paste
0771         %a lot in the sheet there tends to be columns that are empty
0772         I=cellfun(@isstr,raw(1,:));
0773         raw=raw(:,I);
0774         
0775         %Find the rows that are not commented. This corresponds to the
0776         %first row and the ones which are empty in the first column
0777         
0778         keepers=cellfun(@isempty,raw(:,1));
0779         keepers(1)=true;
0780         raw=raw(keepers,:);
0781 
0782         %Check if there are any rows that are all empty. This could happen if
0783         %it reads too far or if the user has inserted them for layout reasons.
0784         %Remove any such rows.
0785         I=~all(cellfun(@isempty,raw),2);
0786         raw=raw(I,:);
0787     else
0788         raw={[]};
0789     end
0790     
0791     %Checks if something is all white spaces or NaN
0792     function I=wrapperWS(A)
0793         if isnan(A)
0794             I=true;
0795         else
0796             %isstrprop gives an error if boolean
0797             if islogical(A)
0798                 I=false;
0799             else
0800                 %I have to make this check, since otherwise it will pick up
0801                 %on doubles for which the ASCII representation is a white
0802                 %space character
0803                 if isnumeric(A)
0804                     I=false;
0805                 else
0806                     I=all(isstrprop(A,'wspace'));
0807                 end
0808             end
0809         end
0810     end
0811 end
0812 function miriamStruct=parseMiriam(strings,miriamStruct)
0813 %Gets the names and values of Miriam-string. Nothing fancy at all, just to
0814 %prevent using the same code for metabolites, genes, and reactions. The
0815 %function also allows for supplying a miriamStruct and the info will then
0816 %be added
0817 
0818 if nargin<2
0819     miriamStruct=cell(numel(strings),1);
0820 end
0821 for i=1:numel(strings)
0822     if any(strings{i})
0823         %A Miriam string can be several ids separated by ";". Each id is
0824         %"name(..:..):value"
0825         I=regexp(strings{i},';','split');
0826         if isfield(miriamStruct{i},'name')
0827             startIndex=numel(miriamStruct{i}.name);
0828             miriamStruct{i}.name=[miriamStruct{i}.name;cell(numel(I),1)];
0829             miriamStruct{i}.value=[miriamStruct{i}.value;cell(numel(I),1)];
0830         else
0831             startIndex=0;
0832             miriamStruct{i}.name=cell(numel(I),1);
0833             miriamStruct{i}.value=cell(numel(I),1);
0834         end
0835         
0836         for j=1:numel(I)
0837             index=max(strfind(I{j},':'));
0838             if any(index)
0839                 miriamStruct{i}.name{startIndex+j}=I{j}(1:index-1);
0840                 miriamStruct{i}.value{startIndex+j}=I{j}(index+1:end);
0841             else
0842                 dispEM(['"' I{j} '" is not a valid MIRIAM string. The format must be "identifier:value"']);
0843             end
0844         end
0845     end
0846 end
0847 end
0848 
0849 %Loads a sheet into a cell matrix using the Java library Apache POI.
0850 
0851 %flag       -1 if the sheet couldn't be found
0852 %           0 if everything worked well
0853 function [raw, flag]=loadSheet(workbook, sheet)      
0854     flag=0;
0855     raw={};
0856     
0857     sh=workbook.getSheet(sheet);
0858     if isempty(sh)
0859         flag=-1;
0860         return;
0861     end
0862     
0863     lastRow=sh.getLastRowNum();
0864     wasEmpty=false(lastRow+1,1);
0865     raw=cell(lastRow+1,0); %Allocate space for the cell array. The number of columns isn't know yet, as it's saved row-by-row
0866     for i=0:lastRow
0867         row=sh.getRow(i);
0868         %Sometimes the last rows only contain formatting (or some other
0869         %weird Excel thing). Ignore such empty rows. Note the +1 to deal with that Matlab indexing
0870         %starts at 1
0871         if isempty(row)
0872             wasEmpty(i+1)=true;
0873             continue;
0874         end
0875         lastCol=row.getLastCellNum();
0876    
0877         %Adjust the size of the cell array if needed
0878         if (lastCol+1)>size(raw,2)
0879             raw=[raw cell(lastRow+1,lastCol+1-size(raw,2))];
0880         end
0881         
0882         %Loop over the columns
0883         for j=0:lastCol
0884            c=row.getCell(j,row.RETURN_BLANK_AS_NULL);
0885            if ~isempty(c)
0886                %Then decide how to save it depending on the type
0887                switch c.getCellType()
0888                    case c.CELL_TYPE_STRING
0889                         raw{i+1,j+1}=char(c.getRichStringCellValue().getString());
0890                    case c.CELL_TYPE_NUMERIC
0891                         raw{i+1,j+1}=c.getNumericCellValue();
0892                    case c.CELL_TYPE_BOOLEAN
0893                         raw{i+1,j+1}=c.getBooleanCellValue();
0894                end
0895            end
0896         end
0897     end
0898     
0899     %Remove empty rows
0900     raw(wasEmpty,:)=[];
0901 end
0902 
0903 %For converting a value to string. This is used instead of num2str because
0904 %I want to convert empty cells to {''}.
0905 function y=toStr(x)
0906     %x can be empty, numerical, string or boolean. It cannot be NaN.
0907     %Boolean values will be converted to '1'/'0'
0908     if isempty(x)
0909         y='';
0910     else
0911         y=num2str(x);
0912     end
0913 end
0914 
0915 %For converting to numeric. This is used instead of str2num because I want
0916 %to be able to choose what empty values should be mapped to.
0917 %
0918 % default the value to use for empty input
0919 function y=toDouble(x,default)
0920     if isempty(x) %Note that this catches '' as well
0921         y=default;
0922     else
0923         if isnumeric(x)
0924             y=x;
0925         else
0926             y=str2double(x);
0927             
0928             %This happens if the input couldn't be converted.
0929             %Note that the input itself cannot be NaN since it was fixed in
0930             %clean imported
0931             if isnan(y)
0932                 dispEM(['Cannot convert the string "' x '" to double']);
0933             end
0934         end
0935     end
0936 end
0937 
0938 %For converting boolean (the UNCONSTRAINED field) to double (the
0939 %model.unconstrained field)
0940 function y=boolToDouble(x)
0941     if isempty(x)
0942         y=0;
0943         return;
0944     end
0945     if islogical(x)
0946         y=x*1; %Typecast to double
0947         return;
0948     end
0949     if isnumeric(x)
0950        if x~=0
0951            y=1;
0952            return;
0953        else
0954            y=0;
0955            return;
0956        end
0957     end
0958     if ischar(x)
0959        if strcmpi(x,'TRUE')
0960            y=1;
0961            return;
0962        end
0963        if strcmpi(x,'FALSE')
0964            y=0;
0965            return;
0966        end
0967     end
0968     y=NaN; %This means that the input couldn't be parsed
0969 end

Generated on Mon 06-Jan-2014 14:58:12 by m2html © 2005