{"id":485297,"date":"2024-01-13T18:07:33","date_gmt":"2024-01-13T23:07:33","guid":{"rendered":"https:\/\/platohealth.ai\/drowning-in-data-a-data-science-primer-for-a-translational-scientist\/"},"modified":"2024-01-13T18:10:27","modified_gmt":"2024-01-13T23:10:27","slug":"drowning-in-data-a-data-science-primer-for-a-translational-scientist","status":"publish","type":"post","link":"https:\/\/platohealth.ai\/drowning-in-data-a-data-science-primer-for-a-translational-scientist\/","title":{"rendered":"Drowning in Data: A Data Science Primer for a Translational Scientist","gt_translate_keys":[{"key":"rendered","format":"text"}]},"content":{"rendered":"
In 1945 the volume of human knowledge doubled every 25 years. Now, that number is 12 hours [1]. With our collective computational power rapidly increasing, the vast amount of available data, and our ability to assimilate it, have seeded unprecedentedly fertile ground for innovation. Healthtech companies are sprouting from data-rich soil at exponential rates. Cell-free DNA companies, once a rarity, are becoming ubiquitous. The genomics landscape, once dominated by the few, is being inundated by a slew of competitors. Grandiose claims of being able to diagnose 50 different cancers from a single blood sample, or use AI to best dermatologists, radiologists, pathologists, etc., are being made at alarming rates. Accordingly, it\u2019s imperative to know how to assess these claims as fact or fiction, particularly when such claimants may employ \u201cstatistical misdirection\u201d. In this addition to \u201cThe Insider\u2019s Guide to Translational Medicine\u201d we disarm perpetrators of statistical warfare of their greatest weapons, statistics themselves. To do so we introduce a novel BASIS acronym for analyzing data underlying AI models and new products. Moreover, we introduce a unique harm \/ inherency \/ plan \/ solvency \/ disadvantage paradigm for developing and assessing business plans, grants, healthtech, genomics companies, etc. We provide a use case for implementation of these thought constructs to assess new entrants in melanoma detection. Ultimately, I intend to leave you with a rigorous approach to discriminate the good from the bad, and everything in between, in healthtech, multiomics, etc.<\/p>\n
<\/p>\n
You are an investor and CEO of Biopharmatrend.com Investment Group, BPTIG, and are considering four companies in early melanoma detection to invest in: DermTech, Melanoma AI, Skin Analytics, and Cancer Canines.<\/p>\n
DermTech<\/a> <\/strong>(DMTK), a company listed on the NASDAQ stock exchange, talks to you about a disruptive technology they developed called the “Smart Sticker” (figure 1).<\/p>\n Figure 1: DermTech\u2019s melanoma detection Smart Sticker.<\/strong><\/p>\n They claim that, rather than having to biopsy a patient’s skin to determine if a mole is melanoma or not, a physician can simply place the Smart Sticker on the patient’s skin [2]. The sticker entails a proprietary technology allowing for the capture of skin cells and subsequent processing at DermTech\u2019s central facility (figure 2). The mole is identified as malignant or not by their Pigmented Lesion Assay (PLA), which tests the corresponding cells for expression of two genes, long intergenic nonprotein coding RNA 518 (LINC00518) and preferentially expressed antigen in melanoma (PRAME). DermTech is very quick to inform you they have a National Comprehensive Cancer Network (NCCN) guidelines category 2B designation [3], and a >99% negative predictive value, 91% sensitivity, 69% specificity, and AUC of 0.9 for melanoma detection [4].<\/p>\n Figure 2: Skin cells being obtained using the Smart Sticker. The lesion of interest is marked with a circle and only the corresponding skin cells are tested.<\/strong><\/p>\n They emphasize their test is non-invasive and will not leave a patient scarred, in contrast to shave, punch, and excisional biopsies, the predominant techniques dermatologists use to sample moles. As of May 27, 2022, DermTech had a stock price of 7.14 with a market cap of 214 million dollars [5]. 
You are considering making a sizeable stock purchase, particularly as DermTech\u2019s stock price was inexplicably as high as 79.76 on February 19, 2021 (figure 3).<\/p>\n Figure 3: DermTech (DMTK) stock price.<\/strong><\/p>\n Melanoma AI <\/strong>is an MIT outfit that contends they built a deep convolutional neural network (DCNN) that allows patients to detect if they have melanoma through wide-field imaging of their skin at home or their primary care physician\u2019s office (figure 4). The model is predicated largely on the \u201cugly duckling\u201d concept that preferentially identifies skin lesions that look very different from others on the patient\u2019s body [6].<\/p>\n Figure 4: Wide-field DCNN in classification of suspected pigmented lesions (SPL in red) from others. Wide image pictures are taken of a patient\u2019s skin at their primary care physician\u2019s office and analyzed via a DCNN (top panel). Ugly duckling lesion analysis is performed and suspicious lesions are identified (bottom panel).<\/strong><\/p>\n They reference a paper published in Science Translational Medicine on February 27, 2021, where they had a sensitivity of 90.3%, specificity of 89.9%, accuracy of 86.56%, and AUC of 0.97 for differentiating \u201csuspicious pigmented lesions (SPLs)\u201d from others. They cite numerous articles in various journals celebrating their DCNN algorithm. They are a private company currently looking for series A funding, and are asking for 10 million dollars at a 100-million-dollar valuation.<\/p>\n Skin Analytics<\/strong><\/a>, like Melanoma AI, believes they have the best AI-based neural network that allows people to use their cell phones with dermascopes to determine if moles are melanoma or not (figure 5). They cite a paper they published in Dermatology, October 16, 2019, and state the UK NHS presently uses their AI technology, Deep Ensemble for the Recognition of Malignancy (DERM), for melanoma detection [7]. 
They report an AUC of 0.901, 99% negative predictive value, 100% sensitivity, and 64.8% specificity for determining if a lesion is melanoma or not using images obtained from an iPhone 6s attached to a dermascope. They reveal a series A funding round of 9 million dollars with 16 investors involved. They are presently asking for 10 million dollars at a 100-million-dollar valuation in Series B funding.<\/p>\n Figure 5: Cell phone with dermascope examination of a nevus using Skin Analytics DERM.<\/strong><\/p>\n Cancer Canines <\/strong>trains dogs to use their extraordinary olfactory sense to determine if moles are malignant or not (figure 6). They contend that melanomas emit volatile organic compounds (VOCs), small odorant molecules that evaporate at normal temperatures and pressures, and that dogs have the keen olfactory senses required to detect them (9-10).<\/p>\n Figure 6: Cancer Canines secret weapon, Bolt, who is actually one of my rescue dogs.<\/strong><\/p>\n During their pitch to you, they report a case of a 43-year-old female who presented with a mole on her central back that was present since childhood and changed over the last few years (figure 7).<\/p>\n Figure 7: A 43 year old female with a suspicious mole.<\/strong><\/p>\n Her 2-year-old rescue dog frequently sniffed the mole and would get agitated and try to scratch and bite it. This prompted the woman to go to the dermatologist, who determined the mole was a stage 3B malignant melanoma (figure 8).<\/p>\n Figure 8: Biopsy revealed malignant melanoma.<\/strong><\/p>\n After surgical removal of the lesion the woman reported her dog no longer became agitated when it sniffed her back. She is still alive 7 years later. Based on this, Cancer Canines conducted studies demonstrating dogs they trained to detect melanoma correctly identified melanomas 45% of the time. They propose placing cancer canines in kiosks in cities throughout the world to help facilitate early detection of melanoma. 
They are asking for 3 million dollars at a 10-million-dollar valuation.<\/p>\n You, as the CEO of BPTIG, are very impressed. The numbers being thrown around are stupendous and you firmly believe you’ve stumbled on disruptive technology that might replace dermatologists to some extent. Indeed, you are acutely aware numerous studies showed AI performed better than dermatologists at diagnosing skin maladies. Moreover, there is a profound shortage of dermatologists in the U.S.A. [12]. You have been told this is due to the American Dermatology Association ensuring the number of available residency spots doesn’t increase significantly. In addition, you noticed many dermatologists appear to be running cosmetic shops that focus on administering Botox, Restylane, collagen, laser treatments, etc., rather than caring for melanoma.<\/p>\n After daydreaming about being a dermatologist you realize you digressed a bit. You snap back to reality and reassess your excitement about the four proposals before you. Specifically, you recall reading an editorial written by Basem Goueli MD, PhD, MBA, where he insisted that \u201cwhen you’re confused about accuracy, remember to be precise and specific, while not being too negative<\/em><\/strong>.\u201d You look on biopharmatrend.com to revisit what he wrote and see:<\/p>\n In the world of fast money many disciplines are prone to hyperbole. Pharmaceutical companies boast of phase 1 data that barely provide an incremental improvement over the status quo. They celebrate “me too” drugs that even politicians are beginning to speak out against [12]. Physicians tout “best doctor” awards they’ve “won” that have little to do with ability or talent, and are either glorified popularity contests or outright paid for by the physician [13]; I have had three offers this year to buy a top doctor award (figure 9).<\/p>\n Figure 9: Modern Healthcare exposes Top Doctor Awards. 
<\/strong><\/p>\n The American Board of Internal Medicine (ABIM) claims they are the authority on ensuring physicians are current in their field, but many questions they ask on their certification examinations are incredibly esoteric and largely clinically irrelevant, and physicians can actually use uptodate.com<\/a> DURING<\/strong> their tests to look for answers.<\/p>\n For every Jeff Bezos or Bill Gates, there is an Elizabeth Holmes or Bernie Madoff trying to grift you out of your money with big promises and flowery rhetoric. Every day we see announcements of companies getting tens of millions of dollars in investor money and we struggle to understand why. Indeed, I recently saw a company that got 29 million dollars in series C funding and their platform DOESN\u2019T PRESENTLY WORK.<\/em><\/strong><\/p>\n Investors from 2020-2022 seem to have been particularly prone to smoke and mirrors emanating from healthtech companies. Unsurprisingly, many individuals flocked to the sector to try and profit off unsuspecting but well-meaning investors who truly want to help society while making money at the same time (figure 10).<\/p>\n Figure 10: Well-meaning investors need to be on Healthtech fraud alert at all times. <\/strong><\/p>\n I composed this article to assist all of the well-meaning individuals in this world, whether they be investors, health care companies, pharmaceutical companies, etc., as they seek to help patients, and\/or are bombarded by claims of superiority and disruption. The arsenal of choice for the claimants is often statistics, and we will do what we can to disarm the perpetrators of their greatest weapons.<\/p>\n It has been mesmerizing to see the explosion in data science over the last decade. A simple tour of LinkedIn, Upwork, Fiverr, etc. suggests nearly everyone on the planet is a data scientist. Accordingly, as you can’t put the toothpaste back in the tube, the data science revolution is here to stay. 
Allow me to state for the record, this is a tremendous development and will propel society forward. I\u2019m a firm believer everyone should develop some familiarity with data science (see supplemental information for my favorite way to do this). At the very least everyone needs to be facile with very important statistical concepts.<\/p>\n Suppose for a moment I tell you that my superpower is I can predict if a mole is melanoma or not with 93% accuracy, 93% negative predictive value (NPV, not to be confused with net present value), a 7% misclassification rate, and 100% specificity. I also tell you I\u2019ve already convinced STARK industries to give me 2 million dollars at a 10-million-dollar valuation based on these claims. You are skeptical, but are interested in seeing me in action.<\/p>\n Having done your homework you know that only 1 in 10,000 moles becomes melanoma, and only approximately 7% of \u201csuspicious pigmented lesions (SPLs)\u201d that are biopsied are malignant melanoma. You manage to acquire 100 pictures of SPLs, labeled as melanoma or benign based on associated biopsies.<\/p>\n You invite me to your mansion in Medina, Washington (figure 11), to observe me in action. I arrive by boat in one of my patented three-piece suits with an air of unprecedented self-confidence (figure 12).<\/p>\n Figure 11: Your mansion in Medina, Washington [14].<\/strong><\/p>\n I wave to you with feigned indifference, as though I\u2019ve seen houses like yours countless times before, and Tony Stark just gave me 2 million dollars in investment. Hiding behind my sunglasses, I am admittedly in bewildered awe of what I\u2019m seeing, and think to myself, \u201cI could fit a lot of homeless people and orphans in this house.\u201d<\/p>\n Figure 12: Model reenactment of the suit I arrive in at your mansion [15]. <\/strong><\/p>\n After walking uphill to get to the entry door I stroll into the foyer. 
I\u2019m immediately struck by how technologically advanced the facility is and wonder why everything says Microsoft. Ultimately, I arrive in an amphitheater-like room and sit in the most comfortable movie chair I\u2019ve ever seen (figure 13).<\/p>\n Figure 13: The amphitheater in your mansion where you try to expose me as a fraud [16]. <\/strong><\/p>\n I nearly fall asleep the minute I recline in the chair, so you remind me why I\u2019m there. You immediately turn off the lights in the room and begin projecting images from the patient data set you acquired.<\/p>\n In line with the observation that approximately 7% of suspected pigmented lesions biopsied are melanoma [17], the data set contains pictures of 93 benign lesions and 7 melanomas. As the first image is projected on the screen I ask if we can eat dinner prior to my predictions. You say, \u201cno\u201d.<\/p>\n With the first image on the screen, I stand up from the chair and begin assessing the movie-sized picture. I apply the ABCDE (asymmetry, border, color, diameter, evolution) logic for evaluating moles taught to me in medical school, while staring intensely at the picture (figure 14). After five minutes I tell you the mole is benign.<\/p>\n Figure 14: The ABCDEs of melanoma [18]. <\/strong><\/p>\n Knowing the answers, you note that I am correct in my assessment. However, you\u2019re aware the data set is markedly imbalanced with only 7% of the images being melanoma. Blind luck could have allowed me to guess the first SPL.<\/p>\n You quickly project the second SPL picture on the screen. Again, you watch as I put my hands on my face and stare intensely at the picture. You then see me walk around feigning use of my superpower. After five minutes I tell you SPL #2 is negative. 
Again, you note I\u2019m correct.<\/p>\n With a list showing the order the SPL pictures will be presented, and whether they are malignant or not, you note the first melanoma picture is #9.<\/p>\n About an hour passes between picture #1 and #8. You can hear my stomach grumbling, but refuse to let me eat. You have watched me do my \u201cmelanoma dance\u201d repeatedly, only to say after five minutes that every picture was not melanoma. So far, I\u2019m 8 for 8.<\/p>\n You start to feel excitement when picture 9 appears. Indeed, you have been eagerly anticipating this moment. You study me carefully as I walk around supremely confident in my ability. Ultimately, after completing the routine you\u2019ve witnessed 8 times before, I tell you the SPL on the screen is benign.<\/p>\n \u201cEureka!\u201d, now you have me. You knew something was amiss, and you must be right. Yet, being a very intelligent person, you acknowledge I guessed 8 of 9 SPLs correctly, and whatever system I\u2019m using could have merit. You are unprepared to label me a grifter until you\u2019ve seen more. You glance at your sheet with the labeled SPLs and are frustrated the next melanoma isn\u2019t until picture 27. You calculate in your head that to get through the next 91 images at 5 minutes each will take over 7.5 hours (455 minutes).<\/p>\n You reassess how it is I got the appointment with you, given how busy you are. You recall it was a friend of yours who knew me that asked you to meet with me, and you obliged them. You are painfully regretting that decision at this moment. Still, you haven\u2019t proven I\u2019m a fraud yet.<\/p>\n With your stomach screaming at you and the audible bowel sounds emanating from me you decide it\u2019s time for dinner. You watch me savor every bite of the filet mignon in front of me. You note that I\u2019ve taken every liberty offered, including a second helping of the entr\u00e9e, and you know I\u2019m going to ask for seconds on dessert. 
Yet, you\u2019re too far invested in the night to not play out the string.<\/p>\n After watching me clearly stretch out dinner for 2 hours you welcome me back to the amphitheater. I ask to use the restroom, where I spend another 15 minutes, before returning to the amphitheater. You\u2019re curious what I was doing, but dare not ask.<\/p>\n SPL #10 projects on the screen and I\u2019m right back to doing my \u201cmelanoma dance\u201d. You look at your watch and note that it\u2019s already 8 PM, and resign yourself to the fact you will be there until 3:30 AM before being able to definitively call me a fake.<\/p>\n Hours later SPL #27 projects on the screen. By this point you feel you\u2019ve seen enough SPLs to last a lifetime. You\u2019re starting to believe you might even be able to tell the difference, that is until you see SPL #27, which looks very similar to many others that were benign. You are very careful not to change any mannerisms while watching my SPL #27 melanoma dance, as you don\u2019t want to give anything away.<\/p>\n \u201cBenign\u201d, you hear me say, and you smirk ever so slightly. As we agreed before we started you are not to reveal if my predictions are correct or not. I try to read your eyes, but you, like Lady Gaga, have a very good poker face. You note that I\u2019ve been correct 25 of 27 times, but the 2 times I was wrong the SPLs were melanoma.<\/p>\n You\u2019ve never quit anything in your life, and even though we\u2019re nearing your bedtime, you are committed to seeing \u201cthe grift\u201d play out. The next melanoma is image 43.<\/p>\n SPL #43 appears and you hear me say, \u201cbenign\u201d again. At this point you think you\u2019re truly on to me because 43 SPLs have been shown and I guessed they were all benign. 
Although I\u2019ve been right 40 times, I\u2019ve been wrong the only three times it truly mattered.<\/p>\n The clock now reads 3:30 AM and you are barely conscious (figure 15).<\/p>\n Figure 15: A baby reenacting you at 3:30 AM during our meeting [19].<\/strong><\/p>\n You\u2019re astounded as you look at me because I am wide awake (figure 16). You remember me telling you I work over 100 hours a week and almost never sleep. You thought it was hyperbole, but now you\u2019re starting to believe me.<\/p>\n Figure 16: A child reenacting me at 3:30 AM during our meeting [20].<\/strong><\/p>\n SPL #100 appears on the screen and in a barely audible voice you ask me if I think it\u2019s malignant or benign. I tell you it\u2019s malignant. \u201cWait, what did you say?\u201d, you ask me as it\u2019s the first time I said the lesion was malignant. I tell you I said it was malignant, but I was just trying to jolt you awake. I then confirm I feel the lesion is benign. <\/p>\n You were already annoyed, but now you\u2019re irate. You wasted a day on my charade to appease your friend, and are infuriated thinking I might be having fun at your expense. You are thankful we\u2019re nearly done as you assess the final tally. Of the 100 SPLs shown, like I said I would be, I was 93% accurate in my designation of them as malignant or benign. My specificity was 100%, my negative predictive value was 93%, and my misclassification rate was only 7%. And then, as you rush me down the stairs and onto the boat at the dock where I arrived, it dawns on you the game I was playing all along.<\/p>\n Benjamin Disraeli once wrote, “There are three types of lies: lies, damn lies, and statistics”. 
Ron DeLegge II penned, \u201c99 percent of all statistics only tell 49 percent of the story.\u201d Finally, Andrew Lang stated, \u201cMost people use statistics like a drunk man uses a lamppost; more for support than illumination\u201d.<\/p>\n The morning after our interaction at your house you reflect on the initial conversation we had. You realize that I very carefully spoke of accuracy, negative predictive value, misclassification rate, and specificity, but you don\u2019t remember me mentioning precision, recall, F1 score, or area under the curve (AUC). I used the lamppost as support, but not for illumination. And with that, you realize you can hate the player, but you have to respect the game. You break it down as follows.<\/p>\n Initially, when developing prediction models, such as whether an SPL is melanoma or not, or what a stock price will be tomorrow, one can use regression or classification. Regression is chosen when the output variable is continuous, such as stock price, age, salary, height, etc. Classification is used when the output variable is categorical, such as melanoma versus benign, winning versus losing, etc. Accordingly, my melanoma detection superpower is a classification model.<\/p>\n When evaluating the efficacy of classification models, we generate a confusion matrix that relates actual values to predicted values. In our aforementioned melanoma dance, I called all 100 SPLs benign, which yields 93 true negatives, 7 false negatives, 0 true positives, and 0 false positives (figure 17):<\/p>\n Figure 17: Confusion matrix for the experiment testing my melanoma detection superpower.<\/strong><\/p>\n With the confusion matrix in hand, we can now obtain important metrics of model efficacy, illustrated in figure 18.<\/p>\n Figure 18: Confusion matrix measures of critical importance.<\/strong><\/p>\n Accuracy:<\/strong> Refers to the overall percentage of times our predictions matched actual outcomes. 
It is calculated as the correct predictions (true positives plus true negatives) divided by all predictions, with my superpower being 93% accurate (figure 19).<\/p>\n Figure 19: Equation for accuracy.<\/strong><\/p>\n Misclassification Rate:<\/strong> Sometimes referred to as the error rate, the misclassification rate refers to the overall percentage of times our predictions didn\u2019t match actual outcomes. It is calculated as the incorrect predictions (false positives plus false negatives) divided by all predictions (figure 20), with my superpower having a 7% misclassification rate.<\/p>\n Figure 20: Equation for Misclassification Rate.<\/strong><\/p>\n Precision:<\/strong> Commonly referred to as the positive predictive value, precision refers to the percentage of times a predicted positive in our model was a true positive. It is calculated as the true positives divided by all predicted positives (true positives plus false positives) (figure 21). In our extreme example where no positive (melanoma) predictions were made, the precision is undefined because the true positive and false positive counts are both zero, leaving zero in the denominator.<\/p>\n Figure 21: Equation for Positive Predictive Value.<\/strong><\/p>\n Recall:<\/strong> Frequently called the sensitivity, and more rarely the true positive rate, recall refers to how often we detect the event of interest when it is actually present (e.g. properly identifying patients with a rare disease in a large sample population). In our case, sensitivity refers to how often we identified a mole as melanoma when it was. It is calculated as the true positives divided by all actual positives (true positives plus false negatives) (figure 22), and is essentially the melanomas (positives) identified by the model divided by all the melanomas present. In our extreme case the recall was 0%, as I failed to identify any of the seven melanomas in your sample set.<\/p>\n Figure 22: Equation for Recall.<\/strong><\/p>\n Specificity:<\/strong> Sometimes referred to as the true negative rate, specificity refers to how often we identify something as negative that is actually negative. In this case it refers to how often we conclude an SPL wasn\u2019t melanoma when it wasn\u2019t melanoma. 
My superpower had 100% specificity based on our calculations, largely because I said every mole was negative (figure 23).<\/p>\n Figure 23: Equation for Specificity.<\/strong><\/p>\n Negative Predictive Value:<\/strong> Often used in statistical subterfuge, the negative predictive value refers to how often something was negative when we said it was negative. It is calculated as the true negatives divided by all predicted negatives (true negatives plus false negatives). My superpower, with 93 true negatives among 100 negative calls, had a negative predictive value of 93% (figure 24).<\/p>\n Figure 24: Equation for Negative Predictive Value.<\/strong><\/p>\n False Positive Rate:<\/strong> Very useful in determining the ROC and AUC (see below), the false positive rate refers to the rate at which we falsely predicted something would be positive when it was actually negative; how often I said a mole was melanoma when it wasn\u2019t. It is calculated as the false positives divided by all actual negatives (false positives plus true negatives), and can also be determined by subtracting specificity from 1. In our extreme example, the melanoma superpower has a false positive rate of 0%, due to my saying none of the SPLs were melanoma (figure 25).<\/p>\n Figure 25: Equation for False Positive Rate.<\/strong><\/p>\n False Negative Rate:<\/strong> The counterpart of the false positive rate is the false negative rate, otherwise known as the miss rate. It refers to the rate at which we falsely predicted something would be negative when it was actually positive. It is calculated as the false negatives divided by all actual positives (false negatives plus true positives), and can also be determined by subtracting recall from 1. In our extreme example, the melanoma superpower has a miss rate (false negative rate) of 100%, as I was wrong with all 7 melanomas (figure 26).<\/p>\n Figure 26: Equation for False Negative Rate.<\/strong><\/p>\n F1 Score:<\/strong> Otherwise known as the F-score or F-measure, the F1 score is an oft-used statistical measure that combines precision and recall as their harmonic mean. Ranging from 0 (worst) to 1 (best), you may hear a model\u2019s efficacy described by its F1 score. 
In our extreme example, the F1 score is undefined because no positive predictions (melanoma) were made, so precision itself is undefined (figure 27).<\/p>\n Figure 27: Equation for F1 Score.<\/strong><\/p>\n Receiver Operating Characteristic (ROC) curve: <\/strong>The ROC curve is used to summarily describe a model\u2019s recall (sensitivity or true positive rate) and false positive rate (FPR), calculated as 1 minus specificity (figure 28). To generate the ROC curve, recall is plotted on the y-axis, and FPR on the x-axis, as the threshold for calling a lesion positive is varied. In our model the false positive rate was 0 and the recall was 0, showing that my superhuman power was futile.<\/p>\n Figure 28: ROC Curve [21].<\/strong><\/p>\n Area Under The Curve (AUC): <\/strong>Area under the curve, depicted in figure 29, is perhaps the most commonly used measure to summarize specificity and sensitivity (recall or true positive rate). It is the area under the ROC curve, which plots the true positive rate (sensitivity or recall) against the false positive rate (1 minus specificity). The largest AUC that can be obtained is 1, which corresponds with a perfect model. An AUC of 0.5 describes a classification model no better than chance (figure 30).<\/p>\n Figure 29: AUC Curve.<\/strong><\/p>\n Figure 30: AUC Curves [22].<\/strong><\/p>\n The day after the melanoma dance marathon that left you exhausted, you feel satisfied that you exposed my ruse. You acknowledge that I never lied to you and that my accuracy, negative predictive value, and specificity were 93%, 93%, and 100%, respectively, and my misclassification rate was only 7%. Nonetheless, you marvel at how awful my superpower really is. You note that the recall (sensitivity or true positive rate) was 0% and my precision was so bad it couldn\u2019t be defined.<\/p>\n As you reflect further on your interaction with me, you explore why my specificity, negative predictive value, and accuracy were so high, and my misclassification rate so low, even though it\u2019s clear I have no melanoma detection ability. 
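To see where those numbers come from, here is a minimal Python sketch that pushes the tallies from the evening (0 true positives, 0 false positives, 7 false negatives, 93 true negatives) through the standard confusion matrix formulas. The function and variable names are hypothetical, chosen for illustration only, and a metric with a zero denominator is reported as undefined rather than forced to a number.

```python
def safe_div(num, den):
    # Return num / den, or None when the denominator is zero (metric undefined).
    return num / den if den else None


def classification_metrics(tp, fp, fn, tn):
    # Standard confusion matrix measures for a binary classifier.
    total = tp + fp + fn + tn
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    if precision is None or recall is None or precision + recall == 0:
        f1 = None  # harmonic mean cannot be computed
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {
        'accuracy': safe_div(tp + tn, total),
        'misclassification_rate': safe_div(fp + fn, total),
        'precision': precision,
        'recall': recall,
        'specificity': safe_div(tn, tn + fp),
        'negative_predictive_value': safe_div(tn, tn + fn),
        'false_positive_rate': safe_div(fp, fp + tn),
        'false_negative_rate': safe_div(fn, fn + tp),
        'f1': f1,
    }


# The "superpower": every one of the 100 SPLs is called benign.
always_benign = classification_metrics(tp=0, fp=0, fn=7, tn=93)

for name, value in always_benign.items():
    label = 'undefined' if value is None else format(value, '.0%')
    print(name + ': ' + label)
```

Run as written, the flattering measures (accuracy, specificity, negative predictive value) come out high, while recall is 0% and precision and F1 are reported as undefined.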
You immediately recognize this was due to the striking imbalance in the data set you used to test my abilities.<\/p>\n You recall that of the 100 images you showed me, only 7 were melanoma. Therefore, there were far more true negatives (benign lesions) than true positives (melanomas). Accordingly, measures such as the specificity, accuracy, and negative predictive value, where the true negative count is in the numerator, will be high by default, and the misclassification rate will be correspondingly low (figure 31).<\/p>\n Figure 31: Statistical measures affected by imbalanced data sets with predominant true-negatives.<\/strong><\/p>\n Ultimately, you conclude that when evaluating models where there is profound data imbalance favoring true negatives, it\u2019s essential to inquire about precision and recall, which emphasize identification of true positives (melanoma), the quantity in the numerator of the corresponding equations (figure 32). In fact, you vow to reflexively ask about precision and recall when a company approaches you for investment in the future.<\/p>\n Figure 32: Statistical measures used when there is data imbalance involving large numbers of true negatives.<\/strong><\/p>\n When Confused about Accuracy, Remember to be Precise and Specific, but not too Negative about AUC and F1, <\/strong>is a mnemonic you develop to help you address classification-based products in the future.<\/p>\n Something is bothering you. The next morning you call Tony Stark, a brilliant man and head of STARK industries, to inquire why he invested in my superpower. When you ask Tony why he invested 2 million dollars in me at a 10-million-dollar valuation, he tells you he doesn\u2019t know who I am. After talking more, he realizes that Happy, his colleague, must have agreed to fund my project without talking to him because, apparently, he was off gallivanting with people named Peter Parker, Dr. Stephen Strange, and Peter Quill. He tells you some outlandish story of someone named Thanos, and you roll your eyes. 
You exchange pleasantries and hang up the phone. After all, you now have four melanoma detection companies asking for an investment from you.<\/p>\n We live in a world where people often make claims and then prove them later. The truth frequently feels subjective, and it\u2019s very common for politicians to employ the strategy of telling a lie so frequently that people believe it\u2019s true. It\u2019s important to have a rigorous construct to assess claims made by individuals, companies of all types, etc.<\/p>\n The clinical world, particularly in cancer, is undergoing tremendous upheaval with the advent of cellular therapies, multiomics, cell free DNA based tests, AI\/ML platforms, etc. Recently, Paige had their prostate pathology AI FDA approved as an adjunct method pathologists can use to determine if patients have prostate cancer. Numerous AI based platforms contend they can read MRIs, CTs, PET CTs, etc., better than radiologists, and they appear to be correct (23-24). Indeed, as we discuss later, you are now faced with two AI companies claiming they can distinguish melanoma lesions from benign ones better than dermatologists.<\/p>\n The classification confusion matrix and associated measures we discussed will be immeasurably useful to you in evaluating claims made by healthtech, genomics, and pharmaceutical companies in the future. In the very next article of this series, we will apply the knowledge you gain here to unabashedly dissect claims made by cell free DNA companies that they can adequately screen patients for cancer with a simple blood test. Foremost among these is GRAIL, a company purchased by Illumina, that contends their Galleri blood test can identify if patients have one of 50 different cancers long before they would become clinically detectable with conventional imaging, blood tests, etc. [25]. 
To this end, the little bit we discussed above will go a long way in enabling you to separate fact from fiction in a hyperbolic world.<\/p>\n To assist you in determining which, if any, of the four melanoma detection companies you should invest in I ask you to channel your inner Hermione Granger.<\/p>\n Figure 33: Harry Potter talking to Hermione Granger [26].<\/strong><\/p>\n To this end, in Harry Potter, Deathly Hallows Part 1, when Harry refers to Hermione as brilliant, she replies, \u201cI’m highly logical, which allows me to look past extraneous detail and perceive clearly that which others overlook” (figure 33).<\/p>\n There are multiple rubrics one can use to consider investing in a business or even starting a business. Business schools have entire classes on how to build a business plan, and how to analyze a potential investment. I’ve taken several of them, and they were excellent. In fact, I will draw from some of them in future articles in this series. However, the simplest construct I’ve ever seen for determining if a venture is worth pursuing or not is something I learned when I was 12 years old, and is in line with channeling your inner Hermione.<\/p>\n When I started high school at Apple Valley High School in Minnesota at the age of 12, I immediately joined the policy debate team. In policy debate there are two teams, affirmative and negative, that debate a resolution (figure 34).<\/p>\n Figure 34: Policy debate round with affirmative and negative teams [27].<\/strong><\/p>\n The affirmative team supports the resolution with a plan they devised, and tries to convince the judge of the competition they have the best position and to implement their plan. 
In contrast, the negative team contends the resolution is false and\/or the plan will not succeed, and compels the judge to reject the affirmative plan and award them the round.<\/p>\n In my four years in high school competing in policy debate the resolutions included:<\/p>\n 1988-1989:<\/em><\/strong> Resolved: That the federal government should implement a comprehensive program to guarantee retirement security for United States citizens over age 65<\/p>\n 1989-1990:<\/em><\/strong> Resolved: That the federal government should adopt a nationwide policy to decrease overcrowding in prisons and jails in the United States.<\/p>\n 1990-1991:<\/em><\/strong> Resolved: That the United States government should significantly increase space exploration beyond the Earth’s mesosphere.<\/p>\n 1991-1992:<\/em><\/strong> Resolved: That the federal government should significantly increase social services to homeless individuals in the United States.<\/p>\n In a typical debate competition year, one participates in hundreds of rounds, alternating between affirming the resolution with a conceived plan, and negating it. Teams are assigned a position at the beginning of the round having prepared to argue either position prior to that.<\/p>\n The affirmative team presents a plan to support the resolution and to address the problem underlying it. The basic construct (figure 34) used to affirm the resolution has stayed with me to this day and is a facile way to assess investments, develop proposals, write grants, etc.<\/p>\n Affirmative Plans are comprised of four primary sections (figure 35):<\/p>\n Figure 35: Simple Harms\/Inherency\/Plan\/Solvency\/Disadvantage Rubric for assessing business plans, grants, etc.<\/strong><\/p>\n Harm<\/strong> is as it sounds and refers to present problems, and their impact on humanity, that must be addressed. 
For example, when debating the utility of space exploration, we cited climate change and the impending destruction of Earth as a harm that must be addressed.<\/p>\n Inherency<\/strong> refers to the status quo, including all options that are presently, or will soon be, available to address the harm. Specifically, the affirmative team presents the current state of affairs in the context of the problems, and explains there is no available option to remedy the harm.<\/p>\n Plan<\/strong> is self-explanatory and entails a nuanced description of how the affirmative team intends to resolve the harms they presented in the context of an inadequate status quo, as stated in inherency. The plan section is very detailed, and includes funding, rollout, infrastructure, etc. It is largely beyond the scope of this article, but we will focus on it more in the future.<\/p>\n Solvency<\/strong> entails the proof presented by the affirmative team that their plan is solvent and will adequately resolve the harm in a way the status quo doesn\u2019t, based on inherency.<\/p>\n After the affirmative presents their proposal, comprised of the aforementioned four sections, the negative team presents their rebuttal. They can choose to attack any, or all, of harm, inherency, plan, or solvency to demonstrate the affirmative plan shouldn\u2019t be adopted. In addition, and this is critical, they can present disadvantages. Disadvantages refer to bad outcomes the negative team feels will arise from the affirmative team’s plan. For example, a negative team may contend that if the affirmative team’s plan of discovering aliens on another planet succeeds, the aliens would be alerted to our presence and destroy Earth. 
Ultimately, if the negative team pokes holes in the affirmative team’s harm, inherency, plan, or solvency arguments, and\/or shows the disadvantages of the affirmative plan outweigh the harm it addresses, the affirmative team\u2019s plan is rejected and the negative team wins the round.<\/p>\n When employing the harm, inherency, plan, solvency, disadvantage rubric, you will reject the plan (model) proposed to you if:<\/p>\n 1: Any of the Harm\/Inherency\/Plan\/Solvency conditions aren’t met. <\/strong><\/p>\n 2: The disadvantages outweigh the harms remedied by the proposed plan. <\/strong><\/p>\n Simply put, if you are creating worse problems than you\u2019re solving, that\u2019s an issue.<\/p>\n If you take a moment to consider harm, inherency, plan, solvency, and disadvantages, you will realize they constitute a simple, yet effective, thought construct to analyze many things in life, including business plans, grants, investments, etc. We will use this construct here, and in many future articles in this series.<\/p>\n Let’s apply the aforementioned rubric to build another one, this time incorporating methods of assessing the solvency of a proposal. 
We will use this in a regimented and easily extrapolated fashion, when considering our melanoma investment conundrum, as illustrated in figure 36.<\/p>\n Figure 36: Comprehensive rubric for assessing investments, business proposals, etc.<\/strong><\/p>\n In this case all four companies, DermTech, Melanoma AI, Skin Analytics, and Cancer Canines, are acting as the affirmative team supporting the resolution with various proposed plans they want you to affirm as the judge.<\/p>\n The resolution itself is: Resolved: That the medical field should adopt a new method of melanoma detection.<\/strong><\/p>\n From a hypothesis testing perspective, consider the resolution the alternative hypothesis, with the null hypothesis being that the current methods of detecting melanoma are adequate.<\/p>\n You, as the investor, must be cynical and do what you can to poke holes in the affirmative team’s case. You represent the null hypothesis, that current methods of melanoma detection are adequate. You need substantial proof the resolution is correct and the affirmative plan is viable to agree to fund the affirmative team.<\/p>\n Another way to think of this is that, whereas in the criminal justice system people are presumed innocent until they are found guilty, the affirmative team looking to obtain an investment from you is guilty until proven innocent. The default is that the statements they are making are wrong until they can prove otherwise. They must prove the alternative hypothesis and show you the null hypothesis, not making an investment, needs to be rejected.<\/p>\n As the round begins you watch the affirmative teams, DermTech, Melanoma AI, Skin Analytics, and Cancer Canines, present their cases, including harm, inherency, plan and solvency. They all have the same harm arguments. They rightfully contend that 96,480 patients were newly diagnosed with melanoma, and 7,230 died of the disease in 2019 (28-29). 
Over 90% of patients with melanomas caught early, when the cancer has a thickness of less than 1 mm, are alive 5 years after diagnosis. However, this number sharply declines with increasing depth or spread of the disease, as only 62% of stage 3, and 18% of stage 4, melanoma patients are alive at 5 years. Moreover, the financial cost to society of not detecting melanoma early is substantial as there is a 20-fold increase in treatment cost from early to late-stage melanoma. Indeed, there are drugs we use in melanoma, such as ipilimumab, that cost up to 150,000 dollars for a four-dose treatment course. Other immunotherapies used in the disease, such as Keytruda, Opdivo, and Tecentriq, can cost as much as 120,000 dollars per year per patient. The companies reference the cost to society of human work hours lost, lives lost, etc. They speak of the emotional toll melanoma takes on the patient and loved ones, and how devastating the disease remains. They smartly point out that, even though we’ve made significant progress in stage 4 melanoma, with the approval of BRAF and MEK inhibitors, TVEC, and immunotherapy, including PD1, PD-L1, CTLA4, and LAG3 inhibitors, we remain abysmal at saving metastatic melanoma patients who ultimately progress on these therapies.<\/p>\n After completing their harm arguments, all four companies present the inherency facet of their case. They reference a status quo that is wholly inadequate at melanoma screening and detecting melanoma early in people throughout the world. They correctly point out the protracted shortage of dermatologists and how one can’t get an appointment with a dermatologist in many places in this country for many months. The teams report that current melanoma screening is largely restricted to skin exams, performed by dermatologists, during which pictures of lesions may or may not be taken. 
They state that the ABCDE (asymmetry, border, color, diameter, evolution) rubric for determining if a mole is cancerous or not is crude and imperfect. They smartly point out that dermatologists, in some studies, have a sensitivity and specificity of 71% and 59%, respectively, for identifying if a suspicious pigmented lesion is malignant or benign [30]. Moreover, one can see diagnostic discordance as high as 38% between dermatopathologists evaluating patient samples under the microscope [31].<\/p>\n After you hear the teams\u2019 harm and inherency arguments, each team presents their plans, including the underlying methodology, marketing and manufacturing plan, technology, funding, etc.<\/p>\n DermTech shows you the Smart Sticker they’ve invented. They go over the cost of the sticker, the margin, the dermatologist network they’ve established to use it, the procedure behind it, etc.<\/p>\n Melanoma AI shows you the program they’re using. They tell you it involves a deep convolutional neural network (DCNN), but refuse to show you the exact configuration of it because they contend it\u2019s proprietary.<\/p>\n Skin Analytics presents their specific detection method, but also refuses to disclose the configuration of their AI neural network. They do, however, tell you they feel theirs is better than the one created by the MIT group. They are also displeased as they correctly point out that Melanoma AI didn\u2019t cite their work in the Science Translational Medicine paper they published. Having read that paper, you are acutely aware Melanoma AI very much intentionally omitted that Skin Analytics had a melanoma AI that was published two years before theirs and is very visibly being employed by the NHS in the UK.<\/p>\n Cancer Canines brings out Bella, an 11-year-old rescue miniature pinscher\/chihuahua mix (figure 37), who is formally trained in detecting melanoma. She is docile and kind and continuously sniffing you. 
You worry because she is fixated on a mole on your forearm, but try not to let it distract you.<\/p>\n Figure 37: Bella (left), my other rescue dog, pictured with my twin boys, and Bolt.<\/strong><\/p>\n After you hear the detailed plans of the four groups before you, much of which was not discussed here, each company now shows you the supporting evidence for their claims in the solvency section of their plans.<\/p>\n DermTech tells you their test has a negative predictive value (NPV) of 99%, sensitivity of 91%, specificity of 69% and AUC of 0.9, without disfiguring the patient with biopsies [32].<\/p>\n Melanoma AI contends they have an NPV of 99%, sensitivity of 90%, specificity of 90%, and AUC of 0.97 [33].<\/p>\n Skin Analytics states they have an NPV of 99%, sensitivity of 100%, specificity of 65%, AUC of 0.95, and that the UK NHS has used them for the last two years [34].<\/p>\n Cancer Canines reports a 45% accuracy rate in melanoma detection [35].<\/p>\n You are very impressed by the claims made by the four companies and feel they have solid arguments. However, you hired Cancer Clarity to serve as the negative team in support of the null hypothesis, and want to hear what they have to say.<\/p>\n For the sake of this exercise, we will say that I, Basem Goueli, represent the Cancer Clarity team in this regard. I refer to the rubric above.<\/p>\n As I stand up to present my argument, I very quickly tell you that I grant to all four teams their harm arguments. I, as a cancer doctor who has published in melanoma and currently cares for several stage 4 melanoma patients, have no disagreement there is a need for improved melanoma screening and that melanoma can be a devastating disease.<\/p>\n Next, in assessing the teams\u2019 inherency arguments, I have no counterargument for the notion that the status quo is inadequate.<\/p>\n In terms of each group\u2019s plan I initially forego teasing out all the details regarding cost, rollout, margins, etc. 
Indeed, if I can prove these four companies have products that don’t adequately solve the problem they intend to, it doesn’t matter how great their plan is. Accordingly, I home in on the solvency aspect of each company\u2019s proposal.<\/p>\n Having prepared for the competition against all four companies I come armed with ample research I did in the weeks prior to the tournament.<\/p>\n We begin our solvency assessment by developing a deep understanding of our use case. We define true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):<\/p>\n We now determine what we must get right with our model, and what we can afford to miss. In melanoma detection what we absolutely cannot do is tell a patient they don\u2019t have melanoma or an SPL when they do. Their melanoma would go untreated with potentially catastrophic consequences. Therefore, the most important facet of our confusion matrix is the FALSE NEGATIVES (FNs). Of course, false positive results are not ideal as patients will be subjected to biopsies they don\u2019t require. However, the physician will not treat for melanoma solely based on the model\u2019s prediction. The requirement for biopsy confirmation will prevent patients from being harmed by unnecessary surgeries or treatments, beyond the biopsy itself.<\/p>\n Figure 38: BASIS acronym for assessing data.<\/strong><\/p>\n It’s now time to assess the data underlying each company\u2019s solvency claims. To do this we consider another rubric we will use, the BASIS acronym (figure 38). Indeed, we will be using this BASIS paradigm to assess genomics, healthtech, and pharmaceutical companies in the future. 
Specifically, as companies, especially AI\/ML based ones, are entirely reliant on data to generate their core technology, we will use the BASIS acronym to assess the data they used.<\/p>\n Denzel Washington, when asked about the media, once said, \u201cif you don\u2019t watch the news you\u2019re uninformed, if you do, you\u2019re misinformed<\/strong>\u201d<\/em>. Some would say that if you don\u2019t pay attention to statistics you\u2019re uninformed, but if you do, you\u2019re misinformed. To ensure one is not misinformed, one must assess the data utilized by the company of interest with a regimented framework.<\/p>\n The BASIS acronym (figure 39), depicted below, is a way to evaluate data underlying models and products that are generated.<\/p>\n Figure 39: BASIS acronym for assessing data.<\/strong><\/p>\n Bias:<\/strong> With cell-free DNA, healthtech, etc., we are witnessing individuals make grandiose claims based on studies they conducted on various data sets. It\u2019s imperative investors, healthcare practitioners, etc., proceed with the null hypothesis the tests are inherently flawed. The alternative hypothesis, where the burden of proof is on the company of interest, is that the test can do what they say it can. One immediate question that must be asked is whether the data used is representative of the target population. We will discuss this in the context of cell-free DNA in the next article of this series, but let\u2019s apply this to our melanoma conundrum.<\/p>\n Every company in melanoma detection requires a labeled data set that designates skin lesions as malignant or benign to train and validate their prediction model. Questions related to bias in melanoma training sets include, but are not limited to:<\/p>\n Figure 41: Appearance of different types of melanoma [37].<\/strong><\/p>\n Figure 42: Melanoma can appear in multiple shapes, colors, and sizes [38].<\/strong><\/p>\n Amount:<\/strong> The “A” of BASIS stands for amount. 
This is relatively self-explanatory, but the amount of data used in generating the melanoma model is critical. If one is truly discriminating between melanoma and benign lesions, it\u2019s imperative the model sees all permutations of melanoma and benign, so as to not make a mistake. Obviously, the larger the amount of data available the more likely this can happen. Importantly, there is no magic number as to how much data is required to generate a model.<\/p>\n Source<\/strong>: The first “S” of BASIS pertains to the source of the data. Again, this is self-explanatory, but a thorough assessment of the sourcing of data used to generate a model is critical. Imagine if the data used by the AI was labeled by a group of children deciding if they felt a mole was malignant or benign, without biopsy confirmation. Any model based on this is likely to be deadly wrong. More realistically, as it pertains to our melanoma discussion, an evaluation of the pathologists designating a mole as malignant or benign in the labeled data set is necessary, particularly as studies have shown up to a 38% discordance rate among dermatopathologists in melanoma diagnoses. Importantly, it\u2019s usually preferred to obtain data from multiple reliable sources, as opposed to one. There may be something about the way a site takes pictures of lesions, identifies patients to be assessed, etc., that can cause the model to fail in real-world settings involving numerous sites.<\/p>\n Imbalance:<\/strong> The “I” of BASIS refers to imbalanced data sets. We discussed this in the melanoma dance above, but if there is tremendous imbalance in the data many of the common statistical measures are largely useless. As above, in a situation where there are a lot of true negatives, precision, recall, F1 score, and AUC are much more important than accuracy, specificity, and negative predictive value. 
In contrast, if the data imbalance leans towards a preponderance of true positives, specificity and negative predictive value will be critical, and accuracy, sensitivity, and positive predictive value less so.<\/p>\n Data imbalance is so unbelievably important to recognize because most real-world use cases are affected by it. Consider our melanoma example. Experts estimate that fewer than one in 10,000 moles will become melanoma. Therefore, if we look at 10,000 moles in the real-world setting, only ONE will be a melanoma. Consider the confusion matrix here for a moment, assuming we get everything correct (figure 43).<\/p>\n Figure 43: Real world melanoma Confusion Matrix<\/strong><\/p>\n Here, there is only 1 true positive out of 10,000 samples. Let\u2019s say we call everything negative and we miss the one cancerous mole. Our negative predictive value would be 99.99%, our accuracy 99.99%, and our specificity 100%. Our recall would be telling and would be 0, just as above. The imbalance in the melanoma example we discussed above was already pronounced, but the real-world melanoma case is even worse.<\/p>\n Standardization:<\/strong> The last “S” in our BASIS acronym is standardization. Data generally isn\u2019t provided in nice packages. It\u2019s often messy and unstandardized. This is problematic as it\u2019s difficult to develop a model on unstandardized data. In this case one can envision a scenario where different institutions may label the pictures differently, use different cameras, take pictures of moles at different angles, or different pathologists may have different diagnostic approaches to melanoma.<\/p>\n Although we will apply the BASIS algorithm to the four companies we\u2019re evaluating, it can be applied to essentially every use case. 
Of note, assessment of data quality is intrinsic to the BASIS paradigm.<\/p>\n With the BASIS algorithm in hand, as the negative team, I now go on the attack and assess the remainder of each company\u2019s Solvency in succession, per the rubric in figure 36.<\/p>\n DermTech claims their Smart Sticker for melanoma detection has a >99% negative predictive value, sensitivity of 91%, specificity of 69%, and AUC of 0.9. The authors of the paper publishing these results placed tremendous emphasis on negative predictive value and feel it\u2019s the most important measure of the assay. However, they are incorrect in this assertion because of the negative imbalance associated with melanoma we discussed previously. Overall, only 1 in 10,000 moles is melanoma. Only 0.8% of SPLs biopsied are melanoma [39]. Therefore, the true negative rate in the numerator of the negative predictive value equation is high and the NPV is high by default. Accordingly, the measures we are very interested in are the precision (positive predictive value), and the recall (sensitivity). A sensitivity of 91% is excellent; precision isn\u2019t given. However, the AUC of 0.9 suggests the precision is good. Now, it\u2019s time to assess the BASIS of the data DermTech used to assess their model.<\/p>\n Data presented:<\/strong> DermTech validated their Smart Sticker on 398 pigmented lesions, selected by dermatologists, which were tested with their PLA assay using their sticker, and subsequently biopsied. Final diagnosis of the lesion based on biopsy was determined by 3 experienced dermatopathologists who determined if a lesion was melanoma or not. Of the 398 pigmented lesions, 87 were melanomas, and 311 were non-melanomas. Of the 87 melanomas, 57 were superficial spreading, 13 lentigo maligna, and 15 hybrid. 
There were no nodular melanomas.<\/p>\n Figure 44: Breakdowns of melanoma subtypes in DermTech study.<\/strong><\/p>\n The median thickness of the melanomas was 0.4 mm, and the set included melanomas in situ that were not invasive and cannot spread, suggesting the company was focused on very early melanomas (figure 44). Samples were obtained from 28 institutions throughout the world. Most lesions were acquired from patients with white skin, and were well distributed between the head and neck, extremities and torso. Patients with lesions on the soles of the feet, hands, and mucous membranes were excluded from the study. Lesions had to be greater than 4 mm in diameter, and patients couldn\u2019t have ongoing skin conditions, or recent skin related treatment. Males and females were well represented in the patient population.<\/p>\n BASIS of the data:<\/strong><\/p>\n Bias<\/strong>: The sample set is relatively diverse as it contained atypical nevi, melanoma in situ, and invasive melanoma. It had lesions from various locations and the final biopsy diagnosis was made by three independent dermatopathologists, who incidentally couldn\u2019t come to a consensus on 11% of samples.<\/p>\n The sample set didn\u2019t include a sufficient number of non-white patients to be generalizable to them. It also didn\u2019t include patients with ongoing skin disorders or skin treatments, nodular melanomas, lesions less than 4 mm in diameter, or lesions on the hands, soles of the feet, or mucous membranes.<\/p>\n Ultimately, we conclude the sample set was reasonably unbiased (diverse), but the aforementioned issues are present.<\/p>\n Amount<\/strong>: The validation sample size was 398 and included 87 melanomas. Undoubtedly, one would like more testing even though the p-value for the study was less than 0.05. 
However, the study is decent, given how difficult it is to obtain the necessary samples.<\/p>\n Source<\/strong>: Samples were obtained from 28 different institutions throughout the world from reputable dermatologists who identified the lesions to be tested. Interestingly, 14% of samples obtained had to be discarded because they didn\u2019t have sufficient skin cells to run the PLA test, presumably due to user error. Nonetheless, it appears the source of the DermTech data was good.<\/p>\n Imbalance<\/strong>: As aforementioned, we are vehemently focused on data imbalance in melanoma detection studies. 87 (21.9%) of the 398 samples tested were melanoma. Given that 1 in 10,000 moles are melanomas and only 0.8% of suspicious pigmented lesions are melanoma, this data set is more balanced than others in melanoma.<\/p>\n Standardization<\/strong>: Investigators had a regimented protocol around sample acquisition and the procedures were highly standardized, allowing for institutional cross comparison.<\/p>\n Ultimately, having completed the BASIS assessment, although I have some arguments we can pose, I feel DermTech\u2019s underlying data set was adequate.<\/p>\n Moving on to the final facets of the DermTech solvency analysis, we define the confusion matrix measures.<\/p>\n Recall that DermTech\u2019s technology relies on testing skin cells removed from a patient\u2019s skin lesion for the expression of two genes, LINC00518 and PRAME, that are preferentially expressed in melanoma (figure 45).<\/p>\n Figure 45: Preferential Expression of LINC00518 and PRAME in melanoma evolution (40<\/a>).<\/strong><\/p>\n Figure 46 demonstrates that 66 of the 87 melanoma in their validation set expressed both genes, 13 expressed 1 of 2, and 5 none. This enabled them to make the predictions depicted in figure 46, resulting in 91% sensitivity and 69% specificity. 
The negative predictive value was calculated at 99% on 7% prevalence.<\/p>\n Figure 46: Preferential expression of LINC00518 and PRAME in samples tested with Smart Sticker.<\/strong><\/p>\n DermTech reports an AUC of 0.9 in the validation set (figure 47).<\/p>\n Figure 47: AUC for DermTech Smart Sticker.<\/strong><\/p>\n Ultimately, given the inherent negative data imbalance in melanoma detection, and that 78% of the samples tested were not melanoma, we very quickly home in on precision and recall (sensitivity). Sensitivity is excellent at 91%. Precision, although not specifically given, appears to be decent as well; I didn\u2019t take the time to calculate it. Interestingly, the false positive rate is a bit high with a specificity of 69%. However, we are willing to live with this as our major concern is false negatives, nicely depicted in sensitivity, not false positives, which are reflected in specificity. We can live with patients being over-biopsied, but not them being under-diagnosed.<\/strong><\/p>\n Overall, after completing your solvency assessment of DermTech you are fairly impressed.<\/p>\n Melanoma AI is predicated on a deep convolutional neural network (DCNN) that can identify suspicious pigmented lesions (SPLs) from wide images. They contend a primary care physician could take a picture of the patient\u2019s entire back and their AI will identify lesions of concern. Thereafter, the lesions would be formally assessed and biopsied by a specialist.<\/p>\n Melanoma AI utilized 38,283 pictures taken from publicly available images, including those acquired via web scraping, and 133 patients from Hospital Gregorio Maranon in Madrid, Spain<\/a>. The pictures were taken with a variety of consumer grade cameras, largely without dermoscopy. 
The pictures were classified by 3 board-certified dermatologists, who confirmed if labels assigned to them on the internet or at the Spanish hospital were accurate.<\/p>\n The picture distribution included six categories (figure 48):<\/p>\n The majority of the SPL pictures were of white skin. They were not independently validated as being biopsy proven melanoma or not. The majority of them were from anonymous sources, so the underlying characteristics of the patients whose lesions were captured is largely unknown.<\/p>\n Figure 48: Different picture classes used in Melanoma AI generation.<\/strong><\/p>\n BASIS of the data:<\/strong><\/p>\n Bias<\/strong>: There is concern the majority of the SPL pictures were of white skin. However, there seems to be sufficient diversity in the lesions assessed, including 14 non-malignant pigmented lesions, basal cell carcinoma, squamous cell carcinoma, and all stages of melanoma. The pictures were obtained from innumerable sources, including Hospital Gregorio Maranon in Madrid, Spain. It\u2019s unclear how many SPL pictures were from women or men, but it does appear the picture sample for the NSPL-type A, NSPL-type B and SPLs was sufficiently diverse<\/p>\n Amount<\/strong>: The number of pictures used is sizeable, but it\u2019s important to note that 10.6% of them were SPLs, and 7.6% were melanoma. Nonetheless, the amount of skin lesion pictures used is impressive.<\/p>\n Source<\/strong>: This is likely the biggest concern regarding the data used. Initially, there are studies showing that many skin lesion pictures online are mislabeled as malignant or otherwise. In addition, as there is no biopsy data available for the pictures used, it\u2019s unclear if the SPLs were actually malignant or not, even though they appeared to be by three independent dermatologists. 
This is a critical issue as one can imagine a scenario where the model was trained on lesions that were misclassified, thereby making it error prone in real-world applications.<\/p>\n Imbalance<\/strong>: As above, only 10.6% of the pictures used are SPLs, and 7.6% were melanoma. However, this is reasonable given the nature of melanoma previously discussed.<\/p>\n Standardization<\/strong>: The nature of the data retrieval, including web scraping of numerous sites, resulted in data that was highly non-standardized. However, this is advantageous when training an AI model, and isn\u2019t a significant issue in this study.<\/p>\n After numerous iterations of their DCNN, Melanoma AI settled on a configuration that demonstrated 99% negative predictive value, 90.3% sensitivity, 89.9% specificity, 86.56% accuracy, and AUC of 0.97 in distinguishing SPLs from other pictures. On its face, that\u2019s pretty impressive. However, it cannot be overstated that the pictures used didn\u2019t have corresponding biopsy reports confirming whether they were malignant or not. It\u2019s conceivable SPLs identified as melanoma based on appearance were not actually melanoma, which confounds the model.<\/p>\n Overall, as the use case for Melanoma AI is the acquisition of wide-angle pictures of patients in primary care clinics to identify SPLs, the data presented here is intriguing.<\/p>\n Skin Analytics, not to be outdone by the MIT group leading Melanoma AI, generated their Deep Ensemble for Recognition of Malignancy (DERM) AI algorithm for analyzing dermoscopic images taken by smartphone and digital single-lens reflex cameras. They correctly state their AI is currently in use by the UK NHS to help identify SPLs in the clinic.<\/p>\n The primary data Skin Analytics cites comes from a study reported in Dermatology in 2019. A total of 731 images, comprising 51 biopsy proven melanomas, 109 biopsy proven NON-melanoma lesions, and 571 non-biopsied skin lesions, were used in the study. 
Pictures were taken at 7 UK hospitals, with 3 different cameras. The sample included 279 women and 484 white patients. The patients had fair skin tones (Fitzpatrick 1-3). Most of the melanomas were superficial spreading (54%) or lentigo maligna (31%).<\/p>\n BASIS of the data:<\/strong><\/p>\n Bias<\/strong>: Most of the samples were of fair-skinned white patients. They were obtained at 7 different centers and the sex distribution was reasonable.<\/p>\n Amount<\/strong>: In contrast to the Melanoma AI data set, this data set had only 55 melanoma pictures but, importantly, they were biopsy-proven, unlike those used by Melanoma AI. The overall data set included 731 images.<\/p>\n Source<\/strong>: The data came from 7 reputable UK institutions and was acquired by trained professionals.<\/p>\n Imbalance<\/strong>: The data set of 731 images included 55 melanomas (7.5%), revealing significant imbalance, but this is expected with this use case.<\/p>\n Standardization<\/strong>: The data acquired was highly standardized.<\/p>\n The model distinguished melanoma from non-melanoma lesions with an AUC of 0.95 for all lesions (0.906 for biopsied lesions), 100% sensitivity, and 64.8% specificity using an iPhone 6s. AUC was 0.87 for biopsied lesions using the DSLR camera, and 0.86 using a Galaxy S6. Negative predictive value for all devices was 99%.<\/p>\n Overall, the data presented by Skin Analytics is intriguing as the AUC is reasonable. There are issues with the amount of data available, use of only three Fitzpatrick skin types, etc., but there is merit in the DERM AI. Indeed, this is why the UK NHS adopted it.<\/p>\n Cancer Canines contends they\u2019ve trained dogs to identify the presence of volatile organic compounds (VOCs), such as dimethyl disulfide and dimethyl trisulfide, that can distinguish melanoma lesions from non-melanoma lesions. They cite studies demonstrating these substances are emitted by melanoma, but not by benign lesions. 
As canine olfaction can identify odorant molecules at up to 1 part per trillion (an order of magnitude more sensitive than the currently available gas chromatography\u2013mass spectrometry technologies) and can detect the production of new VOCs or altered VOC ratios due to pathologic processes, Cancer Canines believes their dogs can be used in early detection of melanoma.<\/p>\n<\/p>\n
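The negative predictive values quoted above for Melanoma AI and DERM depend on how common the target lesion is in the tested population, not only on sensitivity and specificity. The sketch below is one way to check such a claim with Bayes' rule. The function name `predictive_values` is illustrative, and the choice to treat the 10.6% SPL share of the Melanoma AI data set as the prevalence is an assumption, as is the 1% prevalence used for comparison; neither comes from the companies themselves.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Melanoma AI figures quoted in the text: 90.3% sensitivity, 89.9% specificity.
# ASSUMPTION: prevalence is set to the 10.6% SPL share of the training pictures.
ppv, npv = predictive_values(0.903, 0.899, 0.106)
print(f"Data set prevalence: PPV = {ppv:.1%}, NPV = {npv:.1%}")

# HYPOTHETICAL: the same test applied where only 1% of lesions are suspicious.
ppv_low, npv_low = predictive_values(0.903, 0.899, 0.01)
print(f"1% prevalence:       PPV = {ppv_low:.1%}, NPV = {npv_low:.1%}")
```

Under these assumptions the NPV comes out just under 99%, in line with the reported figure, while the positive predictive value is far lower and falls further as prevalence drops. That is why a high NPV on an imbalanced data set should be read alongside the prevalence it was measured at.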
Lies, damn lies, and statistics<\/h3>\n
Setting The Scene<\/h3>\n
The Lamppost<\/h3>\n
Using the Lamppost to Illuminate<\/h3>\n
Remember to be Precise<\/h3>\n
A nagging feeling<\/h3>\n
Drowning in Data<\/h3>\n
Channeling your inner Hermione Granger<\/h3>\n
The Harm \/ Inherency \/ Plan \/ Solvency \/ Disadvantages Rubric<\/h3>\n
Putting the rubric to work<\/h3>\n
An investment to be made?<\/h3>\n
The Affirmative Team<\/h3>\n
The negative team<\/h3>\n
The BASIS of the data<\/h3>\n
The true debate begins<\/h3>\n
DermTech<\/h4>\n
Melanoma AI<\/h4>\n
Skin Analytics<\/h4>\n
Cancer Canines<\/h4>\n