The Study of Machine Learning Models in Predicting the Intention of Adolescents to Smoke Cigarettes

Four hundred eighty thousand deaths are recorded every year in the US alone due to smoking. Research also shows that life expectancy for smokers is at least ten years shorter than for non-smokers. In recent years, e-cigarette use has been increasing rapidly among adolescents. E-cigarettes sometimes act as a gateway to cigarette use, which carries severe health risks. This makes it important to predict the probability that an adolescent will smoke cigarettes in the future.

Seung Joon Nam, Han Min Kim, Thomas Kang, and Cheol Young Park have discussed this in their research paper titled “The Study of Machine Learning Models in Predicting the Intention of Adolescents to Smoke Cigarettes”, which forms the basis of the following text.

Importance of this research

Smoking can cause cancer (notably lung cancer), cardiovascular and metabolic diseases, respiratory diseases, and perinatal diseases. These conditions can be fatal. If an ML algorithm can predict the probability that an individual will smoke cigarettes, efforts can be made to educate that individual about the ill effects of smoking before the habit forms. Such counseling could help adolescents stay away from cigarettes altogether, so accurate prediction can help save lives.

Research Objective

The main aims of this research are to:

  • Find the best-fitting model to predict an individual's intention to smoke
  • Create a website to help prevent e-cigarette use among adolescents

About the Research 

The researchers evaluated five models (Decision Tree, Gaussian Naive Bayes, Logistic Regression, Random Forest, and Gradient Boosting) to determine which one most accurately predicts the intention of adolescents to smoke cigarettes.
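A comparison of this kind can be sketched with scikit-learn. The example below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` rather than the 2018 NYTS data, default hyperparameters, and a simple hold-out split. The helper name `compare_models` is hypothetical, and none of this is taken from the researchers' own code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for survey responses (assumption: NOT the NYTS data).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The five model families named in the article, with default settings.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gaussian NB": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each model on the training split and return {name: test accuracy}."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


scores = compare_models(models, X_train, y_train, X_test, y_test)

# Print the models from most to least accurate on the held-out split.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

The ranking printed on synthetic data says nothing about the ranking on real survey data; the sketch only shows the shape of the comparison.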

Research Result

In their experiments, the researchers found Gradient Boosting to be the most accurate model for predicting future smoking intention. They have also published a website in support of an anti-smoking campaign for teenagers.


While healthcare has made a lot of progress in recent years, the number of medical conditions affecting teenagers has only increased. Often, these diseases are of our own making, caused by our lifestyle choices. This research paper attempts to identify adolescents who are more likely to smoke and to help them make informed, healthier lifestyle choices. In the words of the researchers:

E-cigarette use has increased among adolescents. This is a worldwide problem, because it has been stated in many researches mentioned in the introduction that e-cigarette use can cause future use of cigarettes. Since e-cigarette is a recent rising issue, there is little research done on this topic, compared to smoking cigarettes. Even among the researches done, there is a lack of researches implementing prediction models, which are more practical in preventing adolescents from using (e-)cigarettes. Thus, we researched using the 2018 NYTS data and developed multiple prediction models to predict a adolescents intention to smoke cigarette. The most accurate prediction model was Gradient Boosting Classifier with an overall accuracy of 93%. This model was applied in the website we designed to allow the public to input their information in respect to tobacco products, including e-cigarette, cigarette, and cigar. With this information, the algorithm can predict the respondees probability of future of smoking. This will help the public become more aware about certain factors in their lives and be attentive about their drug use or how their environment can affect their intention to smoke cigarettes. Further research could include a wider range of ages, since our research is mainly focused on adolescents rather than adults. In order to improve the accuracy of the prediction model, it is essential to increase the amount of data or choose better, more fitting, variables.

Source: Seung Joon Nam, Han Min Kim, Thomas Kang and Cheol Young Park’s “The Study of Machine Learning Models in Predicting the Intention of Adolescents to Smoke Cigarettes” 
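The website described in the quotation takes a respondent's answers and returns a probability of future smoking. A minimal sketch of that step, assuming a scikit-learn `GradientBoostingClassifier` and its `predict_proba` method, might look like the following. The feature names in `FEATURES`, the helper `smoking_probability`, and the synthetic training data are all hypothetical placeholders; the variables actually used by the researchers are not listed in this article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature names, used only for illustration.
FEATURES = ["used_e_cigarette", "used_cigar", "lives_with_smoker", "age"]

# Synthetic training data (assumption: NOT the NYTS data).
X, y = make_classification(
    n_samples=500,
    n_features=len(FEATURES),
    n_informative=3,
    n_redundant=0,
    random_state=0,
)
model = GradientBoostingClassifier(random_state=0).fit(X, y)


def smoking_probability(model, answers):
    """Return the model's estimated probability of the positive class."""
    # Order the answers to match the column order used during training.
    row = [[answers[name] for name in FEATURES]]
    # predict_proba returns one row per input with one column per class.
    return float(model.predict_proba(row)[0, 1])


answers = {
    "used_e_cigarette": 1.0,
    "used_cigar": 0.0,
    "lives_with_smoker": 1.0,
    "age": 0.5,
}
p = smoking_probability(model, answers)
print(f"Estimated probability: {p:.2f}")
```

Returning a probability rather than a yes/no label lets a site show a graded risk, which fits the awareness goal the researchers describe.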
