gene-quantification.eu


Roadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers

Richard Simon

Oncologists need improved tools for selecting treatments for individual patients. The development of therapeutically relevant prognostic markers has traditionally been slowed by poor study design, inconsistent findings, and lack of proper validation studies. Microarray expression profiling provides an exciting new technology for relating tumor gene expression to patient outcome, but it also provides increased challenges for translating initial research findings into robust diagnostics that benefit patients and physicians in therapeutic decision making. This article attempts to clarify some of the misconceptions about the development and validation of multigene expression signature classifiers and highlights the steps needed to move genomic signatures into clinical application as therapeutically relevant and robust diagnostics.
Information downloaded from jco.ascopubs.org and provided by SWETS SUBSCRIPTION SERVICE for Bayerische Staatsbibliothek on March 4, 2008 from 194.95.59.195. Copyright 2005 by the American Society of Clinical Oncology. All rights reserved.

INTRODUCTION
Oncologists need improved tools for selecting treatments for individual patients. Most cancer treatments benefit only a minority of the patients to whom they are administered. Being able to predict which patients are most likely to benefit would not only save patients from unnecessary toxicity and inconvenience, but might facilitate their receiving drugs that are more likely to help them. In addition, the current overtreatment of patients results in major expense for individuals and society, an expense that may not [...] provided an exciting new technology for attempting to identify classifiers for tailoring treatments to patients. To date, however, [...] been widely adopted into oncology practice and very few are close to achieving such status. Development of biomarker classifiers [...] and sufficiently validated for broad clinical application is difficult, and more difficult for expression signature classifiers. The field of microarray expression profiling is [...] and excessive skepticism. In this article, I will attempt to clarify some of the misconceptions about the development and validation of multigene expression signature classifiers and highlight the steps needed to move genomic signatures into clinical application as therapeutically relevant and robust diagnostics.

WHY ARE SO FEW PROGNOSTIC FACTORS USED IN ONCOLOGY?
Although there is a large literature on prognostic factors for cancer patients, very few such factors are used in clinical practice. Prognostic factors are unlikely to be used unless they are therapeutically relevant, [...] studies are conducted using a convenience sample of patients for whom tissue is available, but the cohort is often far too heterogeneous with regard to stage and treatment to support therapeutically relevant conclusions. Additional problems in the prognostic marker literature derive from the fact [...]
markers and prognostic models, but do not test prespecified models using independent data. Clinical drug trials are generally prospective, with patient selection criteria, primary end point, hypotheses, and analysis plan specified in advance in a written protocol. The consumers of clinical trial reports have been educated to be skeptical of data dredging to find something "statistically significant" to report in clinical trials. They are skeptical of analyses with multiple end points or multiple subsets, knowing that the chances of erroneous conclusions increase rapidly once one leaves the context of a focused, single-hypothesis clinical trial. Prognostic marker studies are generally performed with no written protocol, no eligibility criteria, no primary end point or hypotheses and no defined analysis plan. The analysis often includes numerous analyses of different end points and patient subsets. The problem is not just that the studies are for developing prognostic markers rather than validating previously specified markers, but that even as developmental studies the planning and analysis [...]

Another feature that has hindered the use of prognostic markers in medical practice is the lack of studies demonstrating the reproducibility of results for assaying markers either between laboratories, between samples of the same tissue specimen, or between times and readers [...]

Many of these problems apply to studies of prognostic classifiers on gene expression profiles. Some of the problems are even more formidable. Because of the number of genes available for analysis, microarray data can be a veritable fountain of false findings unless a structured approach to model development and validation [...]

Some of the key steps in obtaining a classifier that is ready for "prime time" are listed in Table 1. These steps are discussed in the following sections. We have already discussed the importance of developing the classifier for a specific therapeutic decision problem and using cases relevant to that decision context. That is of key importance. There are, however, some well-defined therapeutic decision contexts where even accurate, reproducible, and well-validated classifiers are unlikely to be used widely. For example, consider the treatment of patients with advanced disease treated with a potentially curative treatment. A classifier for predicting the patients unlikely to respond to that therapy may not be widely used if there is no good alternative treatment. The classifier would have to have a very high negative predictive value in order to justify withholding a potentially curative therapy. It is important to evaluate carefully the context of therapeutic decision making if one wants to develop a classifier that has a sufficiently great chance of having clinical impact to warrant the large expense and time commitment required to achieve the other parts of Table 1.

Table 1. Key Steps in Development and Validation of Therapeutically Relevant Genomic Classifiers
- Develop classifier for addressing a specific important therapeutic decision
  - Patients are sufficiently homogeneous and receiving uniform treatment so that results are therapeutically relevant
  - Treatment options and costs of mis-classification are such that a [...]
- Perform internal validation of classifier to assess whether it appears sufficiently accurate relative to standard prognostic factors that it is [...]
- Translate classifier to platform that would be used for broad clinical [...]
- Demonstrate that the classifier is reproducible
- Independent validation of the completely specified classifier on a [...]

WHAT IS A MULTIGENE CLASSIFIER?
A multigene expression signature classifier is a function that provides a classification of a tumor based on the expression levels of the component genes. The classes are often good-risk or poor-risk, but classifiers can be defined to distinguish any set of classes for which a training set of cases exists for each class. The term "classifier" is somewhat over-restrictive because a multigene biomarker can be a function that provides a continuous risk score rather than a class identifier. Here we will use the term "classifier" however, because for validation purposes it is usually important that cutoff thresholds of a risk score be defined [...]

Some people prefer the phrase "multigene biomarker" to "multigene classifier." This can lead to serious misunderstandings, however. A completely defined classifier can be used to select patients and stratify patients for therapy, and the clinical effectiveness of the classifier can potentially be validated. Specifying only the genes involved does not enable one to structure prospective clinical validation experiments in which patients are assigned or stratified in prospectively well-defined ways. Hence, one is forever correlating expression of individual genes against outcomes, but never evaluating the use of a defined diagnostic classifier that can be applied to patients. The gene sets identified as associated with outcome tend to be unstable because gene groups are correlated by co-regulation and the stringent criteria used for identifying differentially expressed genes result in reduced statistical power for gene selection. It is often much easier to develop a classifier that performs accurately than it is to identify exactly the [...]

The components of expression signature classifiers need not be valid biomarkers in the sense of the US Food and Drug Administration [3]. Those criteria require that the role of the biomarker be mechanistically understood and accepted as markers of disease activity. Such criteria are relevant for biomarkers used as surrogate end points but not for the components of expression
signatures used for tailoring treatments. It is, of course, desirable to understand the mechanistic relationship of the components of an expression signature, but the classifier can be validated without such understanding and clear biologic interpretation may be more difficult to achieve [...]

The concept of "validation" has been problematic for the development of traditional disease biomarkers. Much of the confusion derives from attempting to define validation in an absolute sense. A much more pragmatic and productive approach is to focus on validation for a specified purpose. For example, an expression signature should be developed for the purpose of predicting outcome for a well-defined set of patients who receive a well-defined therapy. The signature classifier would be developed using data from such patients and would be validated for an independent set of such patients. The developmental study would identify the genes to be included in the classifier, usually by screening a much larger set of genes to find those whose expression is most correlated with outcome. The developmental study would also combine the genes into a completely specified classifier that can be used and potentially validated in a subsequent study. The validation does not consist of seeing whether the same genes are prognostic in the subsequent study. The validation should be focused on addressing whether the application of the previously defined classifier to a new set of patients results in clinical benefit. This is discussed further in a [...]

DEVELOPING A GENOMIC CLASSIFIER

What Kinds of Classifiers Are Most Useful?
Many algorithms have been used effectively with DNA microarray data for class prediction. A linear discriminant is a function of the form

$$l(\underline{x}) = \sum_{i \in F} w_i x_i$$

where $x_i$ denotes the expression measurement for the $i$th gene, $w_i$ is the weight given to that gene, and the summation is over the set $F$ of features (genes) selected for inclusion in the classifier. For a two-class problem, there is a threshold value $c$ that must be defined; a sample with expression profile defined by a vector $\underline{x}$ of values is predicted to be in class 1 or class 2 depending on whether $l(\underline{x})$ as computed from the equation is less than or greater than $c$.

Many kinds of classifiers used in the literature have the form shown in the preceding equation. They differ with regard to how the weights are determined. These classifiers include Fisher's linear discriminant analysis and diagonal discriminant analysis [5], the compound covariate predictor of Radmacher et al [6], the weighted voting method of Golub et al [7], support vector machines with inner product kernel [8], perceptrons [9], and the naïve Bayes classifier for [...]

When the number of genes (p) is greater than the number of cases (n), perfect separation of a training set is always possible with a linear classifier. In fact, there are an infinite number of linear classifiers that achieve perfect separation. That suggests that there may not be sufficient information in most datasets to effectively utilize nonlinear classifiers. Although complex nonlinear classifiers are popular, there is very little evidence that they perform any better than simpler methods.

In the study of Dudoit et al [5], the simplest methods, diagonal linear discriminant analysis and nearest-neighbor classification, performed as well as or better than the more complex methods. Nearest-neighbor classification is based on a distance function $d(\underline{x}, \underline{y})$, which measures the distance between the expression profiles $\underline{x}$ and $\underline{y}$ of two samples. The distance function utilizes only the genes in the selected set of genes $F$. To classify a sample with expression profile $\underline{y}$, compute $d(\underline{x}, \underline{y})$ for each sample $\underline{x}$ in the training set. The predicted class of $\underline{y}$ is the class of the sample in the training set that is closest to $\underline{y}$ with regard to the distance function.

Paik et al [11] used linear classifiers for predicting recurrence risk of patients with primary breast cancer. Paik et al identified 16 genes for inclusion in the classifier. These included five proliferation genes, four genes related to estrogen metabolism, two Her2 genes, two genes related to tissue invasion, and three other genes. These genes were selected on the basis of their correlation with recurrence in a training set of data. The classifier was based on computing the average expression level for each gene group and then a weighted average of the gene group-specific averages. The genes not in the proliferation, estrogen, Her2 or invasion groups were taken as members of singleton groups. The weights were determined to optimize prediction on the training set. The final components of the classifier determined based on the training set were two cutpoints for the weighted sum of gene expression in order to define groups with a low risk, intermediate risk, and high risk [...]

How Many Genes Should Be Included in the Classifier?
Most classifiers do not use all of the genes whose expression is measured. Consequently, one step in developing a classifier is determining which genes to include; this is called feature selection. Using all of the genes means that all of the genes would have to be measured in the future for classification of new patients. That is particularly problematic if the classifier is going to be converted to a real-time reverse transcriptase polymerase chain reaction (RT-PCR) platform. Also, the number of genes that are actually differentially expressed between the classes (ie, "informative
genes") is usually small compared to the number of genes that are not differentially expressed ("noise genes"). Including too many noise genes can dilute the influence of the informative genes and reduce the accuracy of prediction. It also makes interpretation and future use of the [...]

It is sometimes possible to distinguish very different cell types based on expression levels of a small number of genes. Even if such genes are not known a priori, they can be identified if they are very differentially expressed in the two cell types. This is often not the case for more difficult classification problems however. For these problems there may be a dozen or more differentially expressed genes, but the fold differences in expression may not be large and it may be difficult to identify these genes from among the thousands of noise genes. Omitting informative genes from a classifier has a greater deleterious effect on classification accuracy than does inclusion of noise genes, so long as the number of noise genes included is not too great. Consequently, in many cases accurate classifiers can be developed, but it is more difficult to develop such classifiers based on a very small number of genes.

INTERNAL VALIDATION OF A CLASSIFIER IN DEVELOPMENTAL STUDIES
It is useful to divide genomic classifier studies into developmental studies and validation studies. Developmental studies are the ones that first develop the classifiers and are analogous to phase II clinical trials. They should include an indication of whether the genomic classifier is promising and worthy of phase III evaluation. There are special problems in evaluating whether a genomic classifier is promising based on a developmental study, however. The difficulty derives from the fact that the number of candidate genes available for use in the classifier is much larger than the number of cases available for analysis. In such situations, it is always possible to find classifiers that accurately classify the data on which they were developed even if there is no relationship between expression of any of the genes and outcome [6]. Consequently, even in developmental studies, some kind of validation on data not used for developing the model is necessary. This internal validation is usually accomplished either by splitting the data into two portions, one used for training the model and the other for testing the model, or some form of cross validation based on repeated model development and testing on random data partitions. This internal validation should not, however, be confused with the kind of external validation of the classifier in a setting simulating broad [...]

Split-Sample Validation
The most straightforward method of estimating the accuracy of future prediction is the split-sample validation method of partitioning the set of samples into a training set and a test set. Rosenwald et al [12] used this approach successfully in their international study of prognostic prediction for large B cell lymphoma. They used two thirds of their samples as a training set. Multiple kinds of predictors were studied on the training set. When the collaborators of that study agreed on a single fully specified prediction model, they accessed the test set for the first time. On the test set there was no adjustment of the model or fitting of parameters. They merely used the samples in the test set to evaluate the predictions of the model that was completely specified using only the training data. In addition to estimating the overall error rate on the test set, one can also estimate other important operating characteristics of the test such as sensitivity, specificity, positive and negative predictive values.

The split-sample method is often used with so few samples in the test set, however, that the validation is almost meaningless. One can evaluate the adequacy of the size of the test set by computing the statistical significance of the classification error rate on the test set or by computing a confidence interval for the test-set error rate. Since the test set is separate from the training set, the number of errors on the test set has a binomial distribution.

Michiels et al [13] suggested that multiple training-test partitions be used, rather than just one. The split sample approach is mostly useful, however, when one does not have a well-defined algorithm for developing the classifier. When there is a single training set-test set partition, one can perform numerous unplanned analyses on the training set to develop a classifier and then test that classifier on the test set. With multiple training-test partitions however, that type of flexible approach to model development cannot be used. If one has an algorithm for classifier development, it is generally better to use one of the cross validation or bootstrap resampling approaches to estimating error rate because the split sample approach does not provide as efficient a use of the available data [14]. Some of the conclusions of Michiels et al about the inaccuracy of published expression profiles may be artifacts of their [...]

Cross Validation
Cross validation is an alternative to the split sample method of estimating prediction accuracy [6]. Molinaro et al [14] describe and evaluate many variants of cross-validation and bootstrap resampling for classification problems where the number of candidate predictors vastly exceeds the number of cases. For illustration we will describe leave-one-out cross validation (LOOCV). LOOCV starts like split-sample cross validation in forming a training set of samples and a test set. With LOOCV, however, the test set consists of only a single sample; the rest of
the samples are placed in the training set. The sample in the test set is placed aside and not utilized at all in the development of the class prediction model. Using only the training set, the informative genes are selected and the parameters of the model are fit to the data. Let us call M1 the model developed with sample 1 in the test set. When this model is fully developed, it is used to predict the class of sample 1. This prediction is made using the expression profile of sample 1, but obviously without using knowledge of the true class of sample 1. This predicted class is compared to the true class label of sample 1. If they disagree, then the prediction is in error. Then a new training set-test set partition is created. This time sample 2 is placed in the test set and all of the other samples, including sample 1, are placed in the training set. A new model is constructed from scratch using the samples in the new training set. Call this model M2. Although the same algorithm for gene selection and parameter estimation is used, since model M2 is constructed from scratch on the new training set, it will in general not contain exactly the same gene set as M1. After creating M2, it is applied to the expression profile of sample 2, which was omitted. If this predicted class does not agree with the true class label of the second sample, then the prediction is in error. The process is repeated leaving each of the n biologically independent samples out of the training set, one at a time. During the steps, n different models are created and each one is used to predict the class of the omitted sample. The number of prediction errors is totaled and reported as the leave-one-out cross-validated estimate of the prediction error.

At the end of the LOOCV procedure, you have constructed n different models. They were constructed in order only to estimate the prediction error associated with the type of model constructed. The model that would be used for future predictions is one constructed using all n samples. That is the best model for future prediction and the one that should be reported in the publication. The cross-validated error rate is an estimate of the error rate to be expected in use of this model for future samples, assuming that the relationship between class and expression profile is the same for future samples as for the currently available samples. With two classes, one can use a similar approach to obtain cross-validated estimates of the sensitivity, specificity, and the negative and positive predictive values of the classification procedure. One could even estimate an entire receiver operating characteristics curve.

The cross-validated prediction error is an estimate of the prediction error associated with application of the algorithm for model building to the entire dataset. A commonly used invalid estimate is called the re-substitution estimate. You use all the samples to develop a model. Then you predict the class of each sample using that model. The predicted class labels are compared to the true class labels and the errors are totaled.

Simon et al [15] performed a simulation to examine the bias in estimated error rates for class prediction. Two types of LOOCV were studied: one with removal of the left-out specimen before selection of differentially expressed genes and one with removal of the left-out specimen before computation of gene weights and the prediction rule but after gene selection. They also computed the re-substitution estimate of the error rate. In a simulated dataset, 20 gene expression profiles of length 6,000 were randomly generated from the same distribution. Ten profiles were arbitrarily assigned to class 1 and the other 10 to class 2, creating an artificial separation of the profiles into two classes. Since no true underlying difference exists between the two classes, class prediction will perform no better than a random guess for future biologically independent samples. Hence, the estimated error rates for simulated data sets should be centered around 0.5 (ie, 10 misclassifications of 20).

Figure 1 shows the observed number of misclassifications resulting from each level of cross validation for 2,000 simulated data sets. It is well known that the re-substitution estimate of error is biased for small data sets and the simulation confirms this, with an astounding 98.2% of the simulated data sets resulting in zero misclassifications even though no true underlying difference exists between the two groups. Moreover, the maximum number of misclassified profiles using the resubstitution method was [...]

[Figure 1 image not reproduced. Recovered panel title: "Cross validation: none (resubstitution method)". x-axis: No. of Misclassifications (0 to 20); y-axis: Proportion of Simulated Data Sets.]

Fig 1. The effect of various levels of cross validation on the estimated error rate of a predictor. Two thousand datasets were simulated as described in the text. Class labels were arbitrarily assigned to the specimens within each dataset, and so poor classification accuracy is expected. Class prediction was performed on each dataset as described in the supplemental information, varying the level of leave-one-out cross validation used in prediction. Vertical bars indicate the proportion of simulated data sets (of 2,000) resulting in a given number of misclassifications for a specified cross-validation strategy. Reprinted from Simon et al [15].

Cross validating the prediction rule after selection of differentially expressed genes from the full data set does little to correct the bias of the re-substitution estimator: 90.2% of simulated data sets still result in zero misclassifications. It is not until gene selection is also subjected to cross validation that we observe results in line with our
expectation: the median number of misclassiﬁed proﬁles ple, Rosenwald et al12 developed a classiﬁer of outcome for jumps to 11, although the range is large (0 to 20).
patients with advanced diffuse large B cell lymphoma The simulation results underscore the importance receiving CHOP chemotherapy. The International Prog- of cross validating all steps of predictor construction in nostic Index (IPI) is easily measured and prognostically estimating the error rate. A study of breast cancer also important for such patients, however, and so it was impor- illustrates the point: van’t Veer et al16 predicted clinical out- tant for Rosenwald et al to address whether their classiﬁer come of patients with axillary node-negative breast cancer (metastatic disease within 5 years v disease free at 5 years) The most effective way of addressing whether a classi- from gene expression proﬁles. The investigators controlled ﬁer adds predictive accuracy to a standard classiﬁcation sys- the number of misclassiﬁed recurrent cases (ie, the sensitiv- tem is to examine outcome for the new system within the ity of the test) in both situations, so here we focus attention levels of the standard system. This was the approach used by on the difference in estimated error rates for the disease-free Rosenwald et al12 for data in their separate test set. This is cases. Partial and complete cross validation resulted in esti- illustrated in Figure 2. The spread of the outcome survival mated error rates of 27% (12 of 44) and 41% (18 of 44), curves for the classes deﬁned by the new expression classi- respectively. The improperly cross-validated method results ﬁer within levels of the IPI indicate the extent to which the in a seriously biased underestimate of the error rate, prob- new system adds classiﬁcation accuracy. When the classiﬁer ably largely due to overﬁtting the predictor to the speciﬁc has been completely determined on a training set of data, dataset. 
Other examples of incorrect use of cross validation then the statistical signiﬁcance of the contribution of the are described by Ambroise and McLachlan.17 There are new classiﬁer to the standard IPI can be computed easily numerous articles in the most prominent journals, written from a log-rank test using the test-set data.
by both biologists and methodologists, that make claims Measuring whether a classiﬁer adds predictive accu- for gene expression classiﬁers and for new classiﬁcation racy when there is not a separate test set is more difﬁcult.
algorithms, which are invalid because they have cross [...]

It is important to compute the statistical significance of the cross-validated estimate of classification error. This determines the probability of obtaining a cross-validated classification error as small as actually achieved if there were no relationship between the expression data and class identifiers. A flexible method for computing this statistical significance was described by Radmacher et al.6 It involves randomly permuting the class identifiers among the patients and then recalculating the cross-validated classification error for the permuted data. This is done a large number of times to generate the null distribution of the cross-validated prediction error. If the value of the cross-validated error obtained for the real data lies far enough in the tail of this null distribution, then the results are statistically significant. This method of computing statistical significance of cross-validated error rate for a wide variety of classifier functions is implemented in the BRB-ArrayTools software (National Cancer Institute, Bethesda, MD).18 Statistical significance, however, does not imply that the prediction accuracy is sufficient for the test to have clinical value.

DOES THE CLASSIFIER PERFORM BETTER THAN STANDARD PROGNOSTIC FACTORS?

Even if a classifier is developed for a set of patients sufficiently homogeneous and uniformly treated to be therapeutically relevant, it may be important to evaluate whether the classifier predicts more accurately than do standard prognostic factors or adds predictive accuracy to that provided by standard prognostic factors. For example, [...]

Curves such as those shown in Figure 2 can be constructed using the predicted class of each case as determined by cross validation. The separation of the survival curves within levels of the standard prognostic factor is still a valid measure of the independent contribution of the expression classifier, but the statistical significance of the contribution can no longer be determined by computing a log-rank test of the separation in survival curves. The standard log-rank test is not valid because the classes were not determined independently of the data. The cross-validation process induces a dependence among cases that invalidates the standard statistical analysis. The statistical significance of the independent contribution of the new classifier can be determined using more complex permutation [...]

Several important publications have attempted to determine the relative importance of an expression classifier and standard prognostic factors by using standard multivariate statistical models, such as the logistic model for binary response data and the proportional hazards model for survival data. The models often include standard prognostic factors and the predicted class of a case based on a cross-validation analysis.16 Statistical significance and CIs for the regression coefficients corresponding to each factor are then computed using the usual formulas. This kind of analysis is problematic, however.20 There is also a more fundamental problem with this kind of analysis. The value of an expression based classifier is determined by its prediction accuracy. Consequently, the analysis should emphasize estimating prediction accuracy, not the size of regression coefficients, in additive multivariate models.21

Information downloaded from jco.ascopubs.org and provided by SWETS SUBSCRIPTION SERVICE for Bayerische Staatsbibliothek on March 4, 2008 from 194.95.59.195. Copyright 2005 by the American Society of Clinical Oncology. All rights reserved.
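As an illustration of the permutation procedure of Radmacher et al described in this section, the following is a minimal Python sketch; it is not the BRB-ArrayTools implementation. It assumes a binary class label, a nearest-centroid classifier, and leave-one-out cross validation. The function names `loocv_error` and `permutation_p_value`, the simulated data, and the number of permutations are hypothetical choices made for the example. No gene selection is performed here; if it were, it would have to be repeated inside every cross-validation fold.

```python
import numpy as np


def loocv_error(X, y):
    """Leave-one-out error rate of a nearest-centroid classifier (labels 0/1)."""
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i              # hold out case i
        X_train, y_train = X[keep], y[keep]
        c0 = X_train[y_train == 0].mean(axis=0)
        c1 = X_train[y_train == 1].mean(axis=0)
        # Predict the class whose centroid is closer to the held-out case.
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        wrong += int(pred != y[i])
    return wrong / n


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Permutation significance of the cross-validated error.

    Class labels are shuffled among cases and the whole cross-validation
    is rerun each time, giving a null distribution of the error rate.
    """
    rng = np.random.default_rng(seed)
    observed = loocv_error(X, y)
    null = np.array([loocv_error(X, rng.permutation(y)) for _ in range(n_perm)])
    # Proportion of permuted errors at least as small as the observed one.
    p = (1 + np.sum(null <= observed)) / (n_perm + 1)
    return observed, p


# Simulated data (hypothetical): 20 cases, 50 genes, the first 10 informative.
rng = np.random.default_rng(1)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 50))
X[y == 1, :10] += 3.0

observed, p_value = permutation_p_value(X, y)
print(f"cross-validated error = {observed:.2f}, permutation p = {p_value:.3f}")
```

A small p-value here says only that the cross-validated error is unlikely under no association between expression and class; as noted above, it does not show that the accuracy is high enough to be clinically useful.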
[Figure 2: three survival-curve panels (A, B, C); vertical axis, Probability of Survival.]

Fig 2. Survival curves for diffuse large-B-cell lymphoma patients by gene expression classifier stratified by three levels of International Prognostic Index (IPI) score: (A) IPI scores 0-1; (B) IPI scores 2-3; (C) IPI scores 4-5. Four prognostic classes were defined based on gene expression risk score. Graphs show survival curves for patients with risk score below the median (quartiles 1 and 2) versus patients with risk score above the median (quartiles 3 and 4). Reprinted from Rosenwald et al.12

TRANSLATION OF PLATFORMS AND DEMONSTRATING ASSAY REPRODUCIBILITY

The power of microarray expression profiling lies in the parallel measurement of expression levels for thousands of genes. This is useful for screening genes to find those that should be included in a classifier, but it is rarely necessary to measure expression for hundreds or thousands of genes in application of the classifier to subsequent cases. There are considerable challenges with microarray expression profiling of formalin-fixed paraffin-embedded (FFPE) tissue. With appropriately designed primers, however, RT-PCR can be performed on FFPE tissue.22 Consequently, the developmental strategy of screening the genome using microarrays and then developing genomic classifiers based on a limited number of genes whose expression is measured using RT-PCR on FFPE tissue is [...]

Whether the classifier is based on DNA microarray analysis or on RT-PCR analysis, it is important that the assay be standardized and that evaluations of reproducibility be conducted. The study by Dobbin et al23 demonstrated that microarray protocols using Affymetrix arrays could be sufficiently standardized to achieve good inter- and intra-laboratory reproducibility. Achieving such reproducibility requires standardization of protocols and standardization of platform and reagents, however. One of the challenges in moving genomic classifiers to the clinic is the conduct of such studies. If a genomic classifier is used for identifying a patient population for which an experimental drug is shown to be effective, the drug sponsor has a financial incentive to adequately standardize and validate the classifier so that the classifier can be approved as a diagnostic test. In using genomic classifiers with commercially available therapy, however, it is not clear whether anyone has sufficient incentive to do the laborious but necessary studies needed to standardize and validate the reproducibility of the assay for measuring the classifier.

INDEPENDENT VALIDATION OF GENOMIC CLASSIFIERS

Although studies that develop classifiers often report a seemingly impressive accuracy for predicting outcome, there is abundant reason to demand external validation based on truly independent data. We refer to this as external validation because it is based on independent data external to the study used to develop the classifier. The analysis of high-dimensional gene expression data is complex and there are many examples of serious errors in internal estimates of accuracy included in publications in the best journals. There are also potential biases in internal estimates of accuracy based on tissue handling and assay reagent differences between cases and controls or responders and nonresponders. Developmental studies also often utilize patients selected in a manner that may not be representative of the diversity of patients to whom the classifier would be applied if it were adopted for broad clinical use. Developmental studies also often have the assay performed in one research laboratory based on archived specimens and this may not reflect the sources of assay variability likely to be encountered in broad practice.24

Often the initial study in which the classifier is developed will not be large enough to estimate the positive and negative predictive values of the test with sufficient
precision to determine whether the test has real clinical utility. It is important that the intended clinical use of the classifier be carefully considered in planning the external validation study so that these performance characteristics [...]

The objective of external validation is to determine whether use of a completely specified diagnostic classifier for therapeutic decision making in a defined clinical context results in patient benefit. The objective is not to repeat the developmental study and see if the same genes are prognostic or if the same classifier is obtained. An independent validation study could be a prospective clinical trial in which patients are randomly assigned either to treatment assignment without use of the classifier or to treatment assignment with the aid of the classifier. Often, however, this design will be inefficient and require a huge sample size because many or most of the patients will receive the same treatment either way they are assigned. For example, consider women with lymph node-negative, estrogen receptor (ER)-positive breast cancers. Approximately one third of such patients might be expected to be classified as low risk for recurrence based on the Oncotype DX expression signature-based risk score.11 If one wants to test the strategy of withholding cytotoxic chemotherapy from the subset of patients classified as low risk, it would be inefficient to randomly assign all of the node-negative, ER-positive patients. If one randomly assigns all the patients and performs the assay on only the half assigned to have classifier-based therapy, then the two randomization groups must be compared overall, although two thirds of the patients receive the same treatment in both arms. A more efficient alternative is to perform the assay up front for all patients, and then randomly assign only those classified as low risk. Those patients would be assigned to receive either tamoxifen alone or tamoxifen plus cytotoxic chemotherapy. If the low-risk patients do not benefit from cytotoxic chemotherapy, then the genomic classifier is clinically useful because it enables chemotherapy to be withheld from patients who otherwise would have received it.

Randomly assigning only the patients classified as low risk is more efficient than assigning all of the patients, but it still would require many patients. It is a therapeutic equivalence trial in the sense that finding no difference in outcome changes clinical practice; consequently it is important to be able to detect small differences. Since the expected recurrence rate is so low, it would take many patients to detect a difference between the treatment arms. But if the recurrence rate is as low as predicted by the classifier, then the benefit of chemotherapy is necessarily extremely small. Consequently, an alternative design for external validation is a single-arm study in which the patients classified as low risk are treated with tamoxifen alone. If, with long follow-up, these patients have a very low recurrence rate, then the classifier is considered validated for providing clinical benefit because it enabled the identification of patients whose prognosis was so good with tamoxifen monotherapy that they could be spared the toxicity, inconvenience and expense of chemotherapy. This was the approach used by Paik et al11 for validation of the Oncotype DX classifier for patients with node-negative, ER-positive breast cancer. The genes that seemed prognostic were initially identified based on published microarray studies. Primers for measuring expression of those genes using RT-PCR of FFPE tissue were developed and a classifier was developed based on archived tissue from National Surgical Adjuvant Breast and Bowel (NSABP) studies. The completely prespecified classifier was then tested on 668 patients from NSABP B-14 who received tamoxifen alone as systemic therapy. Fifty-one percent of the assayed patients fell into the low-risk group. They had a distant recurrence rate at 10 years of 6.8% (95% CI, 4.0% to 9.6%). Much higher rates of distant recurrence were seen in the intermediate- and high-risk groups of the classifier (14.3% and 30.5%, respectively).

One might argue that treatment determination using a genomic classifier for women with stage I ER-positive breast cancer should not be compared with the strategy of administering to all such women tamoxifen plus chemotherapy, because there are practice guidelines available based on tumor size and age that withhold chemotherapy from some patients. Nevertheless, it would still be inefficient to randomly assign women to genomic classifier-determined therapy or nongenomic practice guidelines-determined therapy in which the genomic classifier is measured only on the women randomly assigned to its use. Most of the women will probably receive the same treatment in whichever arm they are assigned to. It is much more efficient to perform the assay for measuring the genomic classifier, and then randomly assign only the women for whom the two treatment strategies differ. The current plan for independently validating the classifier developed by van’t Veer et al16 for women with primary breast cancer utilizes this design strategy.

Phase III clinical trials generally attempt to utilize an intervention in a manner that it might be used if adopted in broad clinical practice. For evaluating a diagnostic classifier, a multicenter clinical trial provides the challenges of distributed tissue handling and real time assay performance that would be met in general use. The assays might be performed in multiple laboratories and cannot be batched in time with a single set of reagents as might be done in a retrospective study. Consequently, the prospective clinical trial is the gold standard for external validation [...]

External validation based on a new prospective clinical trial will require a long follow-up time for low-risk patients, however. In such circumstances it can be useful to conduct a prospectively planned validation using patients
treated in a previously conducted prospective multicenter clinical trial if archived tumor specimens are available for the vast majority of patients. The validation study should be prospectively planned with at least as much detail and rigor as for prospective accrual of new patients. Although assaying procedures probably cannot be distributed over time in the same way as for newly accrued patients, assay reproducibility studies should be conducted to demonstrate that the assay has been standardized and quality controlled sufficiently so that such sources of variation are negligible. A written protocol should be developed to ensure that the study is planned prospectively to evaluate the clinical benefit of a completely specified genomic classifier for a defined therapeutic decision in a defined population in a hypothesis testing manner as it would for a prospective clinical trial. The study of Paik et al11 of the Oncotype DX classifier for women with node-negative, ER-positive breast cancer is an example of careful prospective planning of an independent validation [...]

USE OF GENOMIC CLASSIFIERS IN NEW DRUG DEVELOPMENT

The objective of validation of a genomic classifier differs somewhat for existing therapy compared to an experimental therapy. With existing therapy, the emphasis should be on validation of the clinical benefit of using the classifier. With an experimental therapy, however, the emphasis should be on demonstrating effectiveness of the drug in a population identified by the classifier as being more likely to benefit. Simon and Maitournam25 demonstrated that use of a genomic classifier for focusing a clinical trial in this manner can result in a dramatic reduction in required sample size, depending on the sensitivity and specificity of the classifier for identifying such patients. Not only can such targeting provide a huge improvement in efficiency in phase III development, it also provides an increased therapeutic ratio of benefit to toxicity and results in a greater proportion of treated patients who benefit.

Developing a genomic classifier that identifies which patients are likely to benefit, for use in targeting phase III trials, may require larger phase II studies. This depends on the type of drug being developed. For example, if the drug is an inhibitor of a kinase mutated in cancer, then there is a natural diagnostic and no genome-wide screening is needed. Similarly, in the comparison of trastuzumab plus chemotherapy to chemotherapy alone in chemotherapy-naïve and -refractory metastatic breast cancer patients,26,27 cases with less than a 2+ level of expression of the Her2/neu protein were excluded. In the development of gefitinib, had the phosphorylation domain of the EGFR gene been sequenced in responders and nonresponders on phase II trials of non-small-cell lung cancer patients, mutation status could have been used in focusing the phase III trials.28,29 For many molecularly targeted drugs, however, the appropriate assay for selecting patients is not known, and development of a classifier based on comparing expression profiles for phase II responders versus phase II nonresponders may be the best approach. In such instances, one may not have sufficient confidence in the genomic classifier developed in phase II to use it for excluding patients in phase III trials. It may be better in this case to accept all conventionally eligible patients, and use the classifier to define a single subset analysis for the patients predicted to be most responsive to the new drug. The overall null hypothesis for all randomly assigned patients is tested at the 0.04 significance level. The remaining 0.01 of the usual 5% false-positive rate is reserved for testing the new treatment in the subset predicted by the classifier to be responsive.

This analysis strategy provides sponsors an incentive for developing genomic classifiers for targeting therapy in a manner that does not unduly deprive them of the possibility of broad labeling indications when justified by [...]

CONCLUSIONS

Oncologists need improved tools for selecting treatments for individual patients. The genomic technologies available today are sufficient to develop such tools. There is not broad understanding of the steps needed to translate research findings of correlations between gene expression and prognosis into robust diagnostics validated to be of clinical benefit. This article has attempted to identify some of the major steps needed for such translation. Many of these steps are neither easy nor cheap. For therapeutic decision settings of sufficient importance, attention should be devoted to establishing a means of funding and expeditiously carrying out these steps.

Author’s Disclosures of Potential Conflicts of Interest

The authors indicated no potential conflicts of interest.
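The efficiency argument in the section on independent validation, that randomizing all patients dilutes the treatment difference because two thirds of them receive the same treatment in both arms, can be made concrete with a rough sample-size calculation. The sketch below is illustrative only. The recurrence rates, the 80% power, the two-sided 0.05 level, and the helper name `n_per_arm` are assumptions that do not come from this article, and a simple two-proportion normal approximation stands in for the time-to-event and equivalence analyses an actual trial would use.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_a, p_b, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect p_a versus p_b
    (two-sided test, normal approximation, unpooled variance)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return ceil(z ** 2 * variance / (p_a - p_b) ** 2)


# Hypothetical inputs, chosen only for illustration.
frac_low_risk = 1 / 3    # share classified as low risk (fraction quoted in the text)
rec_low_tam = 0.07       # assumed recurrence, low risk, tamoxifen alone
rec_low_chemo = 0.04     # assumed recurrence, low risk, tamoxifen plus chemotherapy
rec_other = 0.20         # assumed recurrence in all other patients (same treatment in both arms)

# Design 1: randomize everyone and compare the arms overall.
# Only the low-risk third is treated differently, so the difference is diluted.
arm_classifier = frac_low_risk * rec_low_tam + (1 - frac_low_risk) * rec_other
arm_standard = frac_low_risk * rec_low_chemo + (1 - frac_low_risk) * rec_other
n_randomize_all = n_per_arm(arm_standard, arm_classifier)

# Design 2: assay everyone up front and randomize only the low-risk patients.
n_randomize_low_risk = n_per_arm(rec_low_chemo, rec_low_tam)
n_screened_per_arm = ceil(n_randomize_low_risk / frac_low_risk)

print("randomize all, per arm:", n_randomize_all)
print("randomize low risk only, per arm:", n_randomize_low_risk)
print("screened to find them, per arm:", n_screened_per_arm)
```

Under these assumed rates the all-patients design needs many times more randomized patients per arm than the design that randomizes only the low-risk group, and it remains the larger study even after counting the patients who must be assayed to find the low-risk group.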
REFERENCES
1. Simon R, Altman DG: Statistical aspects of prognostic factor studies in oncology. Br J [...]
2. Simon RM, Korn EL, McShane LM, et al: Design and analysis of DNA microarray [...]
3. FDA: Draft guidance for industry: Pharmacogenomics data submission. Rockville, MD, [...]
4. Pusztai L, Hess KR: Clinical trial design [...]
5. Dudoit S, Fridlyand J, Speed TP: Comparison of discrimination methods for classification of tumors using gene expression [...]
6. Radmacher MD, McShane LM, Simon R: A paradigm for class prediction using gene expression profiles. J Comput Biol 9:505-511, [...]
7. Golub TR, Slonim DK, Tamayo P, et al: Molecular classification of cancer: Class discovery and class prediction by gene expression [...]
8. Ramaswamy S, Tamayo P, Rifkin R, et al: Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA [...]
9. Khan J, Wei JS, Ringner M, et al: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7:673-679, [...]
10. Hand DJ, Yu K: Idiot’s Bayes: Not so stupid after all? Int Stat Rev 69:385-398, 2001
11. Paik S, Shak S, Tang G, et al: A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med [...]
12. Rosenwald A, Wright G, Chan WC, et al: The use of molecular profiling to predict survival [...]
13. Michiels S, Koscielny S, Hill C: Prediction of cancer outcome with microarrays: A multiple random validation strategy. The Lancet 365:488- [...]
14. Molinaro AM, Simon R, Pfeiffer RM: Prediction error estimation: A comparison of resampling methods. Bioinformatics 2005 (in [...]
15. Simon R, Radmacher MD, Dobbin K, et al: Pitfalls in the analysis of DNA microarray data: Class prediction methods. J Natl Cancer Inst [...]
16. van’t Veer LJ, Dai H, Vijver MJVD, et al: Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530-536, [...]
17. Ambroise C, McLachlan GJ: Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci U S A [...]
18. Simon R, Lam AP: BRB-ArrayTools (Version 3.3). Bethesda MD, Biometric Research Branch, National Cancer Institute, http://linus [...]
19. Vasselli J, Shih JH, Iyengar SR, et al: Predicting survival in patients with metastatic kidney cancer by gene expression profiling in the primary tumor. Proc Natl Acad Sci U S A 100: [...]
20. Lusa L, McShane LM, Radmacher MD, et al: Appropriateness of inference procedures based on within-sample validation for assessing gene expression microarray-based prognostic classifier performance. (Submitted for publication [...]
21. Kattan MW: Judging new markers by their ability to improve predictive accuracy. J Natl [...]
22. Cronin M, Pho M, Dutta D, et al: Measurement of gene expression in archival paraffin-embedded tissues: development and performance of a 92-gene reverse transcriptase-polymerase chain reaction assay. Am J Pathol [...]
23. Dobbin [...] oligonucleotide microarrays. Clin Cancer Res 11: [...]
24. Simon R: When is a genomic classifier ready for prime time? Nat Clin Pract Oncology [...]
25. Simon R, Maitournam A: Evaluating the efficiency of targeted designs for randomized clinical trials. Clin Cancer Res 10:6759-6763, [...]
26. Baselga J: Herceptin alone or in combination with chemotherapy in the treatment of HER2-positive metastatic breast cancer: Pivotal [...]
27. Eiermann [...]
28. Lynch TJ, Bell DW, Sordella R, et al: Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N Engl J [...]
29. Paez JG, Janne PA, Lee JC, et al: EGFR mutations in lung cancer: Correlation with clinical response to gefitinib therapy. Science 304:1497- [...]

Source: http://gene-quantification.eu/simon-developing-classifiers-2005.pdf
