{"id":485297,"date":"2024-01-13T18:07:33","date_gmt":"2024-01-13T23:07:33","guid":{"rendered":"https:\/\/platohealth.ai\/drowning-in-data-a-data-science-primer-for-a-translational-scientist\/"},"modified":"2024-01-13T18:10:27","modified_gmt":"2024-01-13T23:10:27","slug":"drowning-in-data-a-data-science-primer-for-a-translational-scientist","status":"publish","type":"post","link":"https:\/\/platohealth.ai\/drowning-in-data-a-data-science-primer-for-a-translational-scientist\/","title":{"rendered":"Drowning in Data: A Data Science Primer for a Translational Scientist","gt_translate_keys":[{"key":"rendered","format":"text"}]},"content":{"rendered":"

In 1945 the volume of human knowledge doubled every 25 years. Now, it doubles every 12 hours [1]. With our collective computational power rapidly increasing, vast amounts of data, and our ability to assimilate them, have seeded unprecedentedly fertile ground for innovation. Healthtech companies are rapidly sprouting from data-ridden soil at exponential rates. Cell-free DNA companies, once a rarity, are becoming ubiquitous. The genomics landscape, once dominated by the few, is being inundated by a slew of competitors. Grandiose claims of being able to diagnose 50 different cancers from a single blood sample, or use AI to best dermatologists, radiologists, pathologists, etc., are being made at alarming rates. Accordingly, it\u2019s imperative to know how to assess these claims as fact or fiction, particularly when such claimants may employ \u201cstatistical misdirection\u201d. In this addition to \u201cThe Insider\u2019s Guide to Translational Medicine\u201d we disarm perpetrators of statistical warfare of their greatest weapons, statistics themselves. To do so we introduce a novel BASIS acronym for analyzing data underlying AI models and new products. Moreover, we introduce a unique harm \/ inherency \/ plan \/ solvency \/ disadvantage paradigm for developing and assessing business plans, grants, healthtech, genomics companies, etc. We provide a use case for implementation of these thought constructs to assess new entrants in melanoma detection. Ultimately, I intend to leave you with a rigorous approach to discriminate the good from the bad, and everything in between, in healthtech, multiomics, etc.<\/p>\n

<\/p>\n

Image credit: Adrienn Harto<\/strong><\/h6>\n

Putting You to Work<\/h3>\n

You are an investor and CEO of Biopharmatrend.com Investment Group, BPTIG, and are considering four companies in early melanoma detection to invest in: DermTech, Melanoma AI, Skin Analytics, and Cancer Canines.<\/p>\n

DermTech<\/a> <\/strong>(DMTK), a company listed on the NASDAQ stock exchange, talks to you about a disruptive technology they developed called the “Smart Sticker” (figure 1).<\/p>\n

<\/p>\n

Figure 1: DermTech\u2019s melanoma detection Smart Sticker.<\/strong><\/p>\n

They claim that, rather than have to biopsy a patient's skin to determine if a mole is melanoma or not, a physician can simply place the Smart Sticker on the patient's skin [2]. The sticker entails a proprietary technology allowing for the capture of skin cells and subsequent processing at DermTech\u2019s central facility (figure 2). The mole is identified as malignant or not by their Pigmented Lesion Assay (PLA), which tests the corresponding cells for expression of two genes: long intergenic non-protein coding RNA 518 (LINC00518) and preferentially expressed antigen in melanoma (PRAME). DermTech is very quick to inform you they have a National Comprehensive Cancer Network (NCCN) guidelines category 2B designation [3], and a >99% negative predictive value, 91% sensitivity, 69% specificity, and AUC of 0.9 for melanoma detection [4].<\/p>\n
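Because a negative predictive value depends on how common melanoma is among the lesions tested, and not only on sensitivity and specificity, it helps to recompute it at a few prevalences. The sketch below is illustrative only: the function name predictive_values and the three prevalence values are assumptions made for this example, not figures from DermTech, while the 91% sensitivity and 69% specificity are the ones quoted in the pitch.<\/p>\n

```python
def predictive_values(sensitivity, specificity, prevalence):
    '''Return (PPV, NPV) for a test applied at a given disease prevalence.'''
    # Expected fraction of all tested lesions falling in each cell.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    ppv = tp / (tp + fp)  # share of positive calls that are truly melanoma
    npv = tn / (tn + fn)  # share of negative calls that are truly benign
    return ppv, npv


# Sensitivity and specificity as quoted in the pitch.
# The prevalences are hypothetical values chosen for illustration.
for prevalence in (0.01, 0.07, 0.30):
    ppv, npv = predictive_values(0.91, 0.69, prevalence)
    print(f'prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}')
```

Under these assumptions, a 7% prevalence does yield an NPV just above 99%, but the same test has a positive predictive value of only about 18%, and the NPV slips below 95% if 30% of the tested lesions are melanoma.<\/p>\n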

<\/p>\n

Figure 2: Skin cells being obtained using the Smart Sticker. The lesion of interest is marked with a circle and only the corresponding skin cells are tested.<\/strong><\/p>\n

They emphasize their test is non-invasive and will not leave a patient scarred, in contrast to shave, punch, and excisional biopsies, the predominant techniques dermatologists use to sample moles. As of May 27, 2022, DermTech had a stock price of 7.14 with a market cap of 214 million dollars [5]. You are considering making a sizeable stock purchase, particularly as DermTech\u2019s stock price was inexplicably as high as 79.76 on February 19, 2021 (figure 3).<\/p>\n

<\/p>\n

Figure 3: DermTech (DMTK) stock price.<\/strong><\/p>\n

Melanoma AI <\/strong>is an MIT outfit that contends they built a deep convolutional neural network (DCNN) that allows patients to detect if they have melanoma through wide-field imaging of their skin at home or their primary care physician\u2019s office (figure 4). The model is predicated largely on the \u201cugly duckling\u201d concept that preferentially identifies skin lesions that look very different from others on the patient\u2019s body [6].<\/p>\n

<\/p>\n

Figure 4: Wide-field DCNN in classification of suspected pigmented lesions (SPL in red) from others. Wide image pictures are taken of a patient\u2019s skin at their primary care physician\u2019s office and analyzed via a DCNN (top panel). Ugly duckling lesion analysis is performed and suspicious lesions are identified (bottom panel).<\/strong><\/p>\n

They reference a paper published in Science Translational Medicine on February 27, 2021, where they had a sensitivity of 90.3%, specificity of 89.9%, accuracy of 86.56%, and AUC of 0.97 for differentiating \u201csuspicious pigmented lesions (SPLs)\u201d from others. They cite numerous articles in various journals celebrating their DCNN algorithm. They are a private company currently looking for series A funding, and are asking for 10 million dollars at a 100-million-dollar valuation.<\/p>\n

Skin Analytics<\/strong><\/a>, like Melanoma AI, believes they have the best AI-based neural network that allows people to use their cell phones with dermatoscopes to determine if moles are melanoma or not (figure 5). They cite a paper they published in Dermatology, October 16, 2019, and state the UK NHS presently uses their AI technology, Deep Ensemble for the Recognition of Malignancy (DERM), for melanoma detection [7]. They report an AUC of 0.901, 99% negative predictive value, 100% sensitivity, and 64.8% specificity for determining if a lesion is melanoma or not using images obtained from an iPhone 6s attached to a dermatoscope. They reveal a series A funding round of 9 million dollars with 16 investors involved. They are presently asking for 10 million dollars at a 100-million-dollar valuation in Series B funding.<\/p>\n

<\/p>\n

Figure 5: Cell phone with dermatoscope examination of a nevus using Skin Analytics DERM.<\/strong><\/p>\n

Cancer Canines <\/strong>trains dogs to use their extraordinary olfactory sense to determine if moles are malignant or not (figure 6). They contend that melanomas emit volatile organic compounds (VOCs), small odorant molecules that evaporate at normal temperatures and pressures, and that dogs have the keen olfactory senses required to detect them [9-10].<\/p>\n

<\/p>\n

Figure 6: Cancer Canines\u2019 secret weapon, Bolt, who is actually one of my rescue dogs.<\/strong><\/p>\n

During their pitch to you, they report a case of a 43-year-old female who presented with a mole on her central back that was present since childhood and changed over the last few years (figure 7).<\/p>\n

<\/p>\n

Figure 7: A 43-year-old female with a suspicious mole.<\/strong><\/p>\n

Her 2-year-old rescue dog frequently sniffed the mole and would get agitated and try to scratch and bite it. This prompted the woman to go to the dermatologist, who determined the mole was a stage 3B malignant melanoma (figure 8).<\/p>\n

<\/p>\n

Figure 8: Biopsy revealed malignant melanoma.<\/strong><\/p>\n

After surgical removal of the lesion the woman reported her dog no longer became agitated when it sniffed her back. She is still alive 7 years later. Based on this, Cancer Canines conducted studies demonstrating dogs they trained to detect melanoma correctly identified melanomas 45% of the time. They propose placing cancer canines in kiosks in cities throughout the world to help facilitate early detection of melanoma. They are asking for 3 million dollars at a 10-million-dollar valuation.<\/p>\n

Lies, damn lies, and statistics<\/h3>\n

You, as the CEO of BPTIG, are very impressed. The numbers being thrown around are stupendous and you firmly believe you've stumbled on disruptive technology that might replace dermatologists to some extent. Indeed, you are astutely aware numerous studies showed AI performed better than dermatologists at diagnosing skin maladies. Moreover, there is a profound shortage of dermatologists in the U.S.A. [12]. You have been told this is due to the American Dermatology Association ensuring the number of available residency spots doesn't increase significantly. In addition, you noticed many dermatologists appear to be running cosmetic shops that focus on administering Botox, Restylane, collagen, laser treatments, etc., rather than caring for melanoma.<\/p>\n

After daydreaming about being a dermatologist you realize you digressed a bit. You snap back to reality and reassess your excitement about the four proposals before you. Specifically, you recall reading an editorial written by Basem Goueli MD, PhD, MBA, where he insisted that \u201cwhen you’re confused about accuracy, remember to be precise and specific, while not being too negative<\/em><\/strong>.\u201d You look on biopharmatrend.com to revisit what he wrote and see:<\/p>\n

In the world of fast money many disciplines are prone to hyperbole. Pharmaceutical companies boast of phase 1 data that barely provides an incremental improvement over the status quo. They celebrate “me too” drugs that even politicians are beginning to speak out against [12]. Physicians tout “best doctor” awards they’ve “won” that have little to do with ability or talent, and are either glorified popularity contests or outright paid for by the physician [13]; I have had three offers this year to buy a top doctor award (figure 9).<\/p>\n

<\/p>\n

Figure 9: Modern Healthcare exposes Top Doctor Awards. <\/strong><\/p>\n

The American Board of Internal Medicine (ABIM) claims they are the authority on ensuring physicians are current in their field, but many questions they ask on their certification examinations are incredibly esoteric and largely clinically irrelevant, and physicians can actually use uptodate.com<\/a> DURING<\/strong> their tests to look for answers.<\/p>\n

For every Jeff Bezos or Bill Gates, there is an Elizabeth Holmes or Bernie Madoff trying to grift you out of your money with big promises and flowery rhetoric. Every day we see announcements of companies getting tens of millions of dollars in investor money and we struggle to understand why. Indeed, I recently saw a company that got 29 million dollars in series C funding and their platform DOESN\u2019T PRESENTLY WORK.<\/em><\/strong><\/p>\n

Investors from 2020-2022 seem to have been particularly prone to smoke and mirrors emanating from healthtech companies. Unsurprisingly, many individuals flocked to the sector to try to profit off unsuspecting but well-meaning investors who truly want to help society while making money at the same time (figure 10).<\/p>\n

<\/p>\n

Figure 10: Well-meaning investors need to be on Healthtech fraud alert at all times. <\/strong><\/p>\n

I composed this article to assist all of the well-meaning individuals in this world, whether it be investors, health care companies, pharmaceuticals, etc., as they seek to help patients, and\/or are bombarded by claims of superiority and disruption. The arsenal of choice for the claimants is often statistics, and we will do what we can to disarm the perpetrators of their greatest weapons.<\/p>\n

Setting The Scene<\/h3>\n

It has been mesmerizing to see the explosion in data science over the last decade. A simple tour on LinkedIn, Upwork, Fiverr, etc. suggests nearly everyone on the planet is a data scientist. Accordingly, as you can't put the toothpaste back in the tube, the data science revolution is here to stay. Allow me to state for the record, this is a tremendous development and will propel society forward. I\u2019m a firm believer everyone should develop some familiarity with data science (see supplemental information for my favorite way to do this). At the very least everyone needs to be facile with very important statistical concepts.<\/p>\n

Suppose for a moment I tell you that my superpower is I can predict if a mole is melanoma or not with 93% accuracy, 100% negative predictive value (NPV, not to be mistaken with net present value), a 7% misclassification rate, and 100% specificity. I also tell you I\u2019ve already convinced Stark Industries to give me 2 million dollars at a 10-million-dollar valuation based on these claims. You are skeptical, but are interested in seeing me in action.<\/p>\n

Having done your homework you know that only 1 in 10,000 moles become melanoma, and only approximately 7% of \u201csuspicious pigmented lesions (SPLs)\u201d that are biopsied are malignant melanoma. You manage to acquire 100 pictures of SPLs, labeled as melanoma or benign based on associated biopsies.<\/p>\n

You invite me to your mansion in Medina, Washington (figure 11), to observe me in action. I arrive by boat in one of my patented three-piece suits with an air of unprecedented self-confidence (figure 12).<\/p>\n

<\/p>\n

Figure 11: Your mansion in Medina, Washington [14].<\/strong><\/p>\n

I wave to you with mixed indifference, as though I\u2019ve seen houses like yours countless times before, and Tony Stark just gave me 2 million dollars in investment. Hiding behind my sunglasses, I am admittedly in bewildered awe of what I\u2019m seeing, and think to myself, \u201cI could fit a lot of homeless people and orphans in this house.\u201d<\/p>\n

<\/p>\n

Figure 12: Model reenactment of the suit I arrive in at your mansion [15]. <\/strong><\/p>\n

After walking uphill to get to the entry door I stroll into the foyer. I\u2019m immediately struck by how technologically advanced the facility is and wonder why everything says Microsoft. Ultimately, I arrive in an amphitheater-like room and sit in the most comfortable movie chair I\u2019ve ever seen (figure 13).<\/p>\n

<\/p>\n

Figure 13: The amphitheater in your mansion where you try to expose me as a fraud [16]. <\/strong><\/p>\n

I nearly fall asleep the minute I recline in the chair, and you remind me why I\u2019m there. You immediately turn off the lights in the room and begin projecting images from the patient data set you acquired.<\/p>\n

In line with the observation that approximately 7% of suspicious pigmented lesions biopsied are melanoma [17], the data set contains pictures of 93 benign lesions and 7 melanomas. As the first image is projected on the screen I ask if we can eat dinner prior to my predictions. You say, \u201cno\u201d.<\/p>\n

With the first image on the screen, I stand up from the chair and begin assessing the movie sized picture. I apply the ABCDE (asymmetry, border, color, diameter, evolution) logic for evaluating moles taught to me in medical school, while staring intensely at the picture (figure 14). After five minutes I tell you the mole is benign.<\/p>\n

<\/p>\n

Figure 14: The ABCDEs of melanoma [18]. <\/strong><\/p>\n

Knowing the answers, you note that I am correct in my assessment. However, you\u2019re aware the data set is markedly imbalanced with only 7% of the images being melanoma. Blind luck could have allowed me to guess the first SPL.<\/p>\n

You quickly project the second SPL picture on the screen. Again, you watch as I put my hands on my face and stare intensely at the picture. You then see me walk around feigning use of my superpower. After five minutes I tell you SPL #2 is negative. Again, you note I\u2019m correct.<\/p>\n

With a list showing the order the SPL pictures will be presented, and whether they are malignant or not, you note the first melanoma picture is #9.<\/p>\n

About an hour passes between picture #1 and #8. You can hear my stomach grumbling, but refuse to let me eat. You have watched me do my \u201cmelanoma dance\u201d repeatedly, only to say after five minutes that every picture was not melanoma. So far, I\u2019m 8 for 8.<\/p>\n

You start to feel excitement when picture 9 appears. Indeed, you have been eagerly anticipating this moment. You study me carefully as I walk around supremely confident in my ability. Ultimately, after completing the routine you\u2019ve witnessed 8 times before, I tell you the SPL on the screen is benign.<\/p>\n

\u201cEureka!\u201d Now you have me. You knew something was amiss, and you must be right. Yet, being a very intelligent person, you acknowledge I guessed 8 of 9 SPLs correctly, and whatever system I\u2019m using could have merit. You are unprepared to label me a grifter until you\u2019ve seen more. You glance at your sheet with the labeled SPLs and are frustrated the next melanoma isn\u2019t until picture 27. You calculate in your head that to get through the remaining 91 images at 5 minutes each will take over 7.5 hours (455 minutes).<\/p>\n

You reassess how it is I got the appointment with you, given how busy you are. You recall it was a friend of yours who knew me that asked you to meet with me, and you obliged them. You are painfully regretting that decision at this moment. Still, you haven\u2019t proven I\u2019m a fraud yet.<\/p>\n

With your stomach screaming at you and the audible bowel sounds emanating from me you decide it\u2019s time for dinner. You watch me savor every bite of the filet mignon in front of me. You note that I\u2019ve taken every liberty offered, including a second helping of the entr\u00e9e, and you know I\u2019m going to ask for seconds on dessert. Yet, you\u2019re too far invested in the night to not play out the string.<\/p>\n

After watching me clearly stretch out dinner for 2 hours you welcome me back to the amphitheater. I ask to use the restroom, where I spend another 15 minutes, before returning to the amphitheater. You\u2019re curious what I was doing, but dare not ask.<\/p>\n

SPL #10 projects on the screen and I\u2019m right back at doing my \u201cmelanoma dance\u201d. You look at your watch and note that it\u2019s already 8 PM, and resign yourself to the fact you will be there until 3:30 AM before being able to definitively call me a fake.<\/p>\n

Hours later SPL #27 projects on the screen. By this point you feel you\u2019ve seen enough SPLs to last a lifetime. You\u2019re starting to believe you might even be able to tell the difference, that is, until you see SPL #27, which looks very similar to many others that were benign. You are very careful not to change any mannerisms while watching my SPL #27 melanoma dance, as you don\u2019t want to give anything away.<\/p>\n

\u201cBenign\u201d, you hear me say, and you smirk ever so slightly. As we agreed before we started you are not to reveal if my predictions are correct or not. I try to read your eyes, but you, like Lady Gaga, have a very good poker face. You note that I\u2019ve been correct 25 of 27 times, but the 2 times I was wrong the SPLs were melanoma.<\/p>\n

You\u2019ve never quit anything in your life, and even though we\u2019re nearing your bed time, you are committed to seeing \u201cthe grift\u201d play out. The next melanoma is image 43.<\/p>\n

SPL #43 appears and you hear me say, \u201cbenign\u201d again. At this point you think you\u2019re truly on to me because 43 SPLs have been shown and I guessed they were all benign. Although I\u2019ve been right 40 times, I\u2019ve been wrong the only three times it truly mattered.<\/p>\n

The clock now reads 3:30 AM and you are barely conscious (figure 15).<\/p>\n

<\/p>\n

Figure 15: A baby reenacting you at 3:30 AM during our meeting [19].<\/strong><\/p>\n

You\u2019re astounded as you look at me because I am wide awake (figure 16). You remember me telling you I work over 100 hours a week and almost never sleep. You thought it was hyperbole, but now you\u2019re starting to believe me.<\/p>\n

 <\/strong><\/p>\n

Figure 16: A child reenacting me at 3:30 AM during our meeting [20].<\/strong><\/p>\n

SPL #100 appears on the screen and in a barely audible voice you ask me if I think it\u2019s malignant or benign. I tell you it\u2019s malignant. \u201cWait, what did you say?\u201d, you ask me as it\u2019s the first time I said the lesion was malignant. I tell you I said it was malignant, but I was just trying to jolt you awake. I then confirm I feel the lesion is benign.<\/p>\n

You were already annoyed, but now you\u2019re irate. You wasted a day on my charade to appease your friend, and are infuriated thinking I might be having fun at your expense. You are thankful we\u2019re nearly done as you assess the final tally. Of the 100 SPLs shown, like I said I would be, I was 93% accurate in my designation of them as malignant or benign. My negative predictive value and specificity were 100%, and my misclassification rate was only 7%. And then, as you rush me down the stairs and onto the boat at the dock where I arrived, it dawns on you the game I was playing all along.<\/p>\n

The Lamppost<\/h3>\n

Benjamin Disraeli once wrote, “There are three types of lies \u2014 lies, damn lies, and statistics”. Ron DeLegge II penned, \u201c99 percent of all statistics only tell 49 percent of the story.\u201d Finally, Andrew Lang stated, \u201cMost people use statistics like a drunk man uses a lamppost; more for support than illumination\u201d.<\/p>\n

The morning after our interaction at your house you reflect on the initial conversation we had. You realize that I very carefully spoke of accuracy, negative predictive value, misclassification rate, and specificity, but you don\u2019t remember me mentioning precision, recall, error rate, F1 score, or area under the curve (AUC). I used the lamppost as support, but not for illumination. And with that, you realize you can hate the player, but you have to respect the game. You break it down as follows.<\/p>\n

Initially, when developing prediction models, such as whether an SPL is melanoma or not, or what a stock price will be tomorrow, one can use regression or classification. Regression is chosen when the output variable is continuous, such as stock price, age, salary, height, etc. Classification is used when the output variable is categorical, such as melanoma versus benign, winning versus losing, etc. Accordingly, my melanoma detection superpower is a classification model.<\/p>\n

When evaluating the efficacy of classification models, we generate a confusion matrix that relates actual values to predicted values. In our aforementioned melanoma dance (figure 17):<\/p>\n

<\/p>\n

Figure 17: Confusion matrix for the experiment testing my melanoma detection superpower.<\/strong><\/p>\n
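The confusion matrix for the experiment can be tallied directly from the story: 7 melanomas, 93 benign lesions, and a prediction of benign for every image. The following is a minimal sketch, not the author\u2019s own code; the helper names confusion_counts and safe_ratio are hypothetical, and treating melanoma as the positive class is an assumption of this example. Under that convention the negative predictive value works out to 93 true negatives out of 100 benign calls, or 0.93, rather than the 100% quoted in the story, so that one figure is worth checking against whichever convention is intended.<\/p>\n

```python
# Ground truth from the story: 7 melanomas and 93 benign lesions.
# The order of the images does not affect the counts.
actual = ['melanoma'] * 7 + ['benign'] * 93
# The 'superpower': every lesion is called benign.
predicted = ['benign'] * 100


def confusion_counts(actual, predicted, positive='melanoma'):
    '''Count true/false positives and negatives for a binary classifier.'''
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    '''Return numerator / denominator, or None when the ratio is undefined.'''
    return numerator / denominator if denominator else None


tp, fp, tn, fn = confusion_counts(actual, predicted)
total = tp + fp + tn + fn

metrics = {
    'accuracy': safe_ratio(tp + tn, total),
    'misclassification_rate': safe_ratio(fp + fn, total),
    'specificity': safe_ratio(tn, tn + fp),
    'sensitivity': safe_ratio(tp, tp + fn),  # also called recall
    'precision': safe_ratio(tp, tp + fp),    # undefined: no positive calls
    'npv': safe_ratio(tn, tn + fn),
}

print('TP, FP, TN, FN =', (tp, fp, tn, fn))
for name, value in metrics.items():
    print(name, value)
```

The metrics left out of the pitch are the revealing ones here: sensitivity is 0 because none of the 7 melanomas was caught, and precision cannot be computed at all because no lesion was ever called malignant.<\/p>\n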