ABSTRACT
Purpose
Polycystic ovary syndrome (PCOS) is a serious health condition, affecting 5-10% of women of reproductive age, and is associated with obesity, irregular periods, infertility, and hirsutism. PCOS may significantly impact quality of life and requires accurate information for proper management. Social media has become an important source where many women seek health information. This study evaluated YouTube and Instagram reels content related to PCOS. The search terms “PCOS,” “PCOS and menstrual irregularity,” and “PCOS and hair growth” were used. Identified content was assessed for quality, educational value, and number of views. The aim was to determine whether these platforms provide reliable, high-quality educational information about PCOS.
Methods
Content was categorized into four groups according to who created it: physicians; healthcare institutions; non-physician healthcare professionals; and personal accounts. For each YouTube and Instagram video, the following variables were recorded: duration, resolution, likes, comments, upload date, Global Quality score (GQS), modified DISCERN (mDISCERN), and engagement rate. This categorization allowed comparisons between professional and non-professional content producers.
Results
Content created by physicians and healthcare institutions demonstrated significantly higher GQS and mDISCERN scores than that produced by other groups. Physicians reached smaller audiences but generated higher engagement, while personal accounts attracted more views and interactions overall. These findings highlight the dual nature of social media as both a valuable source of information and a potential channel for misinformation.
Conclusion
Content from physicians and healthcare institutions was more accurate and reliable, whereas personal accounts gained greater popularity. This study demonstrated that popularity does not necessarily reflect quality. Medical information on social media should prioritize accuracy and reliability in order to genuinely benefit users and reduce the risk of misinformation.
INTRODUCTION
Polycystic ovary syndrome (PCOS) is a serious health condition, affecting 5-10% of women of reproductive age, and is associated with obesity, impaired glucose tolerance, irregular menstrual cycles, infertility, and hirsutism.1 YouTube is a popular platform where users can easily access video content and share these videos.2 Visitors can upload videos, like or dislike content, and express their opinions through comments. Although YouTube can be used as a source of medical information, videos on the platform are not peer-reviewed. Moreover, videos that do not meet educational content criteria are ranked according to factors such as popularity, view counts, and comments.3 Instagram is a platform where users can post photos and short videos, which can be liked, commented on, and shared. While Instagram can serve as a source of medical information, posts on the platform are also not peer-reviewed.4 In this study, YouTube videos and Instagram reels were searched for using the keywords “polycystic ovary syndrome,” “PCOS,” “PCOS and irregular menstruation,” and “PCOS and hirsutism”. Identified content was evaluated in the order presented to users by the platform algorithms. A total of 50 YouTube videos and 50 Instagram reels were analyzed. Content was categorized into four groups according to the creating source: physicians; healthcare institutions; non-physician healthcare professionals; and personal accounts.
This study evaluated the educational quality, reliability, and popularity of PCOS-related YouTube videos and Instagram reels. The aim was to examine the content quality of PCOS-related videos on YouTube and Instagram and assess the extent to which viewers could access accurate information from these videos. To the best of our knowledge, this is the first study to analyze and evaluate both Instagram and YouTube content on PCOS in the manner described.
METHODS
All data used in this study were publicly available and did not require special access for collection. Therefore, no permission was required from an ethics committee, YouTube or Instagram to conduct this study. In March 2025, the most viewed English-language content was identified using the keywords “polycystic ovary syndrome,” “PCOS,” “PCOS and irregular menstruation,” and “PCOS and hirsutism.” The selected content was independently evaluated by two obstetricians and gynecologists in a double-blind design, and a third obstetrician’s opinion was sought in cases of disagreement. Interobserver agreement was assessed using Cohen’s kappa statistic. Videos that were non-English, produced for advertising purposes, or explicitly intended to manipulate viewers were excluded. Data collection was performed during a single time period (March 2025).
Given the lack of sufficient published data regarding potential seasonal fluctuations in PCOS-related social media content, a fixed time window was used to enhance reproducibility. This approach was adopted to minimize the effects of algorithmic changes and potential seasonal variations. For the purposes of the study, new accounts were created specifically for research on the respective platforms. YouTube and Instagram content was accessed via web browsers on a computer, and all evaluations were conducted using the browsers’ incognito mode. This method was chosen to minimize the risk of selection bias caused by platform algorithms that provide personalized content recommendations.
Content was evaluated starting from the first page of search results on YouTube and Instagram search engines. For each YouTube video, the following variables were recorded: video length, video resolution, number of views, number of likes, number of comments, upload date, Global Quality score (GQS), modified DISCERN (mDISCERN) score, and engagement rate. For a description of the GQS and mDISCERN scoring system and validation, see below. Account size (followers/subscribers) and account age were deliberately excluded from analysis to better reflect the real user experience. In this study, content encountered according to the ranking provided by the platforms’ algorithms was assessed, and analysis was performed based on absolute engagement values and quality metrics. All quantitative data were manually recorded by the researchers for each video or reel, using pre-defined standardized forms to ensure consistency.
Videos with durations between 1 and 10 minutes were included. The number of dislikes for YouTube videos could not be recorded because YouTube hides dislike counts. The study initially planned to calculate a video power index (VPI)4 to assess video impact, using the formula [likes / (likes + dislikes)] × 100. However, the VPI could not be calculated because dislike counts are not officially shared by YouTube with third parties. For each video, likes/day, comments/day, and views/day were calculated by dividing the total likes, comments, and views by the number of days elapsed since publication. In addition, the like-to-view ratio was calculated as an indicator of viewer satisfaction, reflecting the proportion of viewers who liked the content, independent of total views.
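The per-day rates and the like-to-view ratio described above can be sketched as follows. This is a minimal illustration, not the authors' procedure (values were recorded manually on standardized forms); the function name, field names, and figures are hypothetical, and clamping the elapsed time to at least one day is an assumption of this sketch.

```python
from datetime import date

def per_day_metrics(views, likes, comments, upload_date, collection_date):
    """Return views/day, likes/day, comments/day, and the like-to-view ratio."""
    # Days elapsed since publication. Clamped to 1 (an assumption of this
    # sketch) so that content uploaded on the collection day does not
    # cause a division by zero.
    days = max((collection_date - upload_date).days, 1)
    return {
        "views_per_day": views / days,
        "likes_per_day": likes / days,
        "comments_per_day": comments / days,
        # Proportion of viewers who liked the content.
        "like_to_view_ratio": likes / views if views else 0.0,
    }

# Hypothetical video, 10 days old at the time of data collection.
metrics = per_day_metrics(
    views=10_000, likes=250, comments=30,
    upload_date=date(2025, 3, 1), collection_date=date(2025, 3, 11),
)
```

The same division by days since posting applies to the DMs/day metric used for Instagram reels.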
From Instagram posts, only reels were included in the study. Instagram reels are short, easily consumable videos with a maximum length of three minutes. For each reel, fluency, audio and video quality, number of likes, number of comments, number of times the reel was shared via direct message (DM), and the posting date were recorded. Reels were evaluated using the GQS and interaction rate. For each Instagram reel, likes/day, comments/day, and DMs/day were calculated. The DMs/day metric represents the frequency of organic sharing of the content via DM, calculated by dividing the total number of DMs by the number of days since posting. Both YouTube videos and Instagram reels were also scored using the mDISCERN tool.
Modified DISCERN Scoring
The DISCERN tool is a standardized scoring system used to evaluate the quality of medical content.5 The mDISCERN is a simplified, five-question version of the original DISCERN instrument, which has been used in multiple previous studies.6-8 For mDISCERN scoring, five “yes” or “no” questions were asked. A “yes” response was scored as 1 and a “no” as 0, resulting in a total score out of 5. The five questions were:
1. Is the aim clear, concise, and understandable?
2. Are the sources of information reliable? (Are cited references or video content derived from valid studies?)
3. Is the information presented balanced and unbiased? (Is there any reference to alternative treatment options?)
4. Are additional sources of information listed?
5. Does the video address areas of uncertainty?
Interpretation of the mDISCERN Scores
• 1-2 points: Low quality. Material is insufficient in terms of reliability and information presentation and is not suitable for educational or patient information purposes.
• 3-4 points: Moderate quality. Material contains some important information but has deficiencies and imbalances; it may be partially useful.
• 5 points: High quality. Material presents reliable and balanced information and is highly suitable for education and patient information.
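The scoring and interpretation rules above can be expressed as a short sketch. The function names are hypothetical. Because the interpretation bands in the text start at 1 point, a total of 0 is returned here as "unclassified" rather than assigned to a band; that handling is an assumption of this sketch.

```python
def mdiscern_score(answers):
    """Sum five yes/no answers (True = "yes" = 1 point, False = "no" = 0)."""
    if len(answers) != 5:
        raise ValueError("mDISCERN expects exactly five answers")
    return sum(1 for answer in answers if answer)

def mdiscern_band(score):
    """Map a total score to the interpretation bands listed above."""
    if score == 5:
        return "high"
    if 3 <= score <= 4:
        return "moderate"
    if 1 <= score <= 2:
        return "low"
    # A total of 0 has no band in the text; it is flagged, not guessed.
    return "unclassified"

# Hypothetical rating: questions 1, 2, and 4 answered "yes".
score = mdiscern_score([True, True, False, True, False])
band = mdiscern_band(score)
```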
Global Quality Score
Videos were also evaluated using the GQS, a five-point scale assessing the overall quality of video content.9 GQS evaluates the educational value of the content based on five core criteria.8
GQS Scoring Table (Text Version)
• Score 1: Very poor quality - very low quality, poor flow, most information missing, not useful for education.
• Score 2: Poor quality - limited usefulness; only some information is available, most key points not addressed.
• Score 3: Moderate quality and flow - somewhat useful but important topics are missing; flow is insufficient.
• Score 4: Good quality and flow - useful, as most key topics are addressed.
• Score 5: Excellent quality and flow - highly useful, covering all important topics comprehensively.
Interaction Index
The interaction index is a holistic measure of content engagement over time, allowing performance to be evaluated not only by total interactions but also in relation to time.
Formula (Text Version)
Interaction index = (number of likes + number of comments + number of shares) ÷ number of days.
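As a worked illustration of the formula above, a minimal sketch follows. The function name and the input figures are hypothetical, and rejecting a non-positive day count is an assumption of this sketch.

```python
def interaction_index(likes, comments, shares, days_since_posting):
    """(likes + comments + shares) divided by the days since posting."""
    if days_since_posting <= 0:
        raise ValueError("days_since_posting must be positive")
    return (likes + comments + shares) / days_since_posting

# Hypothetical reel: 1,000 total interactions over 100 days.
index = interaction_index(likes=900, comments=60, shares=40,
                          days_since_posting=100)
```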
Engagement Rate
Engagement rate is a key performance metric indicating the extent of interaction between content and viewers. This ratio typically includes likes, comments, and sometimes negative feedback (dislikes) relative to total views. However, since YouTube no longer publicly displays dislike counts, standard calculation methods have been adjusted.
In this study, the engagement rate was calculated using only likes and comments relative to total views. This provides a usable and comparable measure of overall viewer interaction.
Formula (Text Version)
Engagement rate = [(number of likes + number of comments) ÷ total views] × 100
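The engagement rate formula above can be sketched as follows. The function name and the figures are hypothetical; dislikes are omitted, as in the adjusted formula.

```python
def engagement_rate(likes, comments, views):
    """Likes plus comments as a percentage of total views."""
    if views <= 0:
        raise ValueError("views must be positive")
    return 100 * (likes + comments) / views

# Hypothetical video: 500 interactions on 10,000 views.
rate = engagement_rate(likes=450, comments=50, views=10_000)
```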
Videos were categorized into four main groups according to their source:
1. Videos produced on behalf of healthcare institutions
2. Videos produced by physicians
3. Videos produced by non-physician healthcare professionals
4. Personal videos produced by individuals who are not healthcare professionals
YouTube videos were excluded from the study if they were longer than 10 minutes or shorter than 1 minute, if the language of the content was not English or English was insufficiently understandable, if video quality was below 480p, if the video contained advertisements, or if there was a mismatch between the title and the content. Videos included in the study were English-language content longer than 1 minute and shorter than 10 minutes, with video quality above 480p, matching title and content, and without advertising. Videos below 480p reduce educational value, as medical visuals, diagrams, or written materials may not be clearly visible. This criterion ensures that visual content can be properly evaluated. Videos shorter than 1 minute cannot adequately address complex topics such as PCOS, while videos longer than 10 minutes do not reflect typical social media consumption habits, considering user attention spans.
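The YouTube eligibility criteria above can be summarized in a short screening sketch. The class, field names, and example values are hypothetical. Treating exactly 1 minute, exactly 10 minutes, and exactly 480p as eligible is an assumption, since the text does not state how boundary values were handled.

```python
from dataclasses import dataclass

@dataclass
class Video:
    duration_s: int              # length in seconds
    height_p: int                # vertical resolution, e.g. 720 for 720p
    is_understandable_english: bool
    has_advertising: bool
    title_matches_content: bool

def exclusion_reasons(video):
    """Return the list of criteria a video fails (empty list = eligible)."""
    reasons = []
    if video.duration_s < 60:
        reasons.append("shorter than 1 minute")
    if video.duration_s > 600:
        reasons.append("longer than 10 minutes")
    if video.height_p < 480:
        reasons.append("resolution below 480p")
    if not video.is_understandable_english:
        reasons.append("not in understandable English")
    if video.has_advertising:
        reasons.append("contains advertising")
    if not video.title_matches_content:
        reasons.append("title-content mismatch")
    return reasons

# Hypothetical examples: one eligible video, one failing two criteria.
eligible = exclusion_reasons(Video(300, 720, True, False, True))
rejected = exclusion_reasons(Video(700, 360, True, False, True))
```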
Instagram reels were excluded if the language was not English or the English was insufficiently understandable, or if they contained advertising. Only English-language reels without advertising were included. No time restriction was applied for Instagram reels.
A total of 114 YouTube videos were initially reviewed by the researchers (Figure 1). Based on inclusion and exclusion criteria, 61 videos were excluded for being longer than 10 minutes, 1 video for being shorter than 1 minute, and 2 videos for containing advertisements. Consequently, 50 videos were included for evaluation.
For comparisons between groups, the Shapiro-Wilk test indicated that the data were not normally distributed. Therefore, the non-parametric Kruskal-Wallis test was used.
Statistical Analysis
Statistical analyses were performed using IBM SPSS Statistics version 25.0 (IBM Corp., Armonk, NY, USA). In group comparisons using the Kruskal-Wallis test, p<0.05 was considered statistically significant, p<0.001 was considered highly significant, and p≥0.05 was considered not significant.
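For readers unfamiliar with the Kruskal-Wallis test, the following minimal sketch computes the tie-corrected H statistic in plain Python. It illustrates the technique only and is not the SPSS procedure used in this study; the function name and the sample scores are hypothetical. Under the null hypothesis, H is approximately chi-square distributed with (number of groups − 1) degrees of freedom, which is how a p value would be obtained.

```python
from collections import Counter

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(x for group in groups for x in group)
    n = len(pooled)
    # Assign each distinct value the mean of the ranks it occupies,
    # so tied observations share an average rank.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    sum_term = sum(
        sum(rank_of[x] for x in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n * (n + 1)) * sum_term - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over tied value counts t.
    correction = 1 - sum(t**3 - t for t in Counter(pooled).values()) / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction

# Hypothetical scores for three groups (not study data).
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```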
A total of 55 Instagram reels were reviewed. Based on the inclusion and exclusion criteria, two reels were excluded for containing advertising, two for insufficiently understandable language, and one for a title-content mismatch. Consequently, 50 Instagram reels were evaluated (Figure 2).
RESULTS
Interobserver reliability analysis demonstrated high agreement. For GQS scores, κ=0.847 (p<0.001), and for mDISCERN scores, κ=0.724 (p<0.001). These values correspond to “almost perfect” and “substantial” levels of agreement, respectively.
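To illustrate how Cohen's kappa is derived from two raters' scores, a minimal sketch follows. It computes the unweighted statistic; whether a weighted variant was used in this study is not stated, so that choice is an assumption. The function name and the ratings are hypothetical and are not study data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over score categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

# Hypothetical scores from two raters on four items.
kappa = cohens_kappa([1, 1, 2, 2], [1, 1, 2, 1])
```

In this example the raters agree on three of four items (observed agreement 0.75) against a chance agreement of 0.5, which gives κ = 0.5.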
Of the YouTube videos included in the study, 18 were posted on personal accounts, three by non-physician healthcare workers, 16 by doctors, and 13 by healthcare institutions. The GQS, mDISCERN, engagement rate, views/day, likes/day, likes/views, and average duration values are shown in Table 1. Content created by doctors demonstrated the best performance in terms of quality and engagement, although their like rate was low. The overall quality of doctors’ content was notably high. Since the mDISCERN score is a measure of the reliability and accuracy of content, it indicates that the videos produced by doctors provide reliable and accurate information. Personal accounts were found to be highly popular in terms of views and likes; however, their reliability scores were low. Compared to other categories, personal accounts ranked lower in terms of content quality and accuracy. This suggests that personal accounts provide less reliable information or are more oriented toward entertainment content. Healthcare organizations maintained high content reliability, but their engagement and viewership rates were low. A substantial difference was observed between non-physician healthcare professionals and personal accounts; while personal accounts received far more views and engagement, the content of healthcare professionals received fewer views and interactions. The very low interaction rate of non-physician healthcare professionals compared to other groups may indicate that their content has difficulty establishing a strong connection with the audience or that the level of audience interest is low.
Of the Instagram reels included in the study, 21 were posted by doctors, eight by non-physician healthcare professionals, 17 by personal accounts, and four by healthcare organizations.
On both platforms, some personal accounts were found to present themselves with unrealistic, false titles such as “PCOS coach,” thereby misleading viewers. This observation was not a variable systematically analyzed within the scope of our study, but rather a qualitative finding noted during the content evaluation process. The average GQS, mDISCERN score, likes per day, DMs per day, and comments per day of Instagram reels posted by doctors, non-physician healthcare workers, personal accounts, and healthcare institutions are shown in Table 2. Instagram reels produced by physicians had high GQS and mDISCERN scores and were evaluated as high-quality and reliable. These scores indicate that the content was strong, both esthetically and in terms of informational accuracy. Content produced by non-physician healthcare professionals demonstrated lower reliability compared to physicians, although it generally provided an adequate level of information.
Content from personal accounts received lower GQS and mDISCERN scores relative to other categories, reflecting that personal accounts typically offer more entertainment-oriented or subjective content. The engagement rate for personal accounts was moderate; although not high, the content was still viewed and interacted with by audiences.
Content from healthcare institutions exhibited both reliability and quality, indicating that these accounts provide professional and scientifically accurate material. Physicians and healthcare institutions achieved the highest reliability scores. However, while healthcare institutions received lower engagement, physicians achieved higher engagement and more direct interaction with viewers through messages.
Non-physician healthcare professionals exhibited notable engagement, particularly in DM and comment counts, although their content reliability was lower than that of physicians. Personal accounts stood out with high numbers of likes and comment counts but lower reliability scores. Overall, personal accounts appear to generate more engaging content, yet the content quality is generally more subjective.
The results of the Kruskal-Wallis test are presented in Table 3 and Table 4, which contain analyses of YouTube and Instagram content, respectively.
YouTube is a platform where content tends to be longer and more detailed, allowing reliable sources such as physicians and healthcare institutions to demonstrate stronger credibility. In contrast, Instagram emphasizes shorter content, where the reliability of content producers, particularly personal accounts and non-physician healthcare professionals, is generally lower.
Due to Instagram’s focus on rapid and visually-oriented content, engagement rates may be higher. Personal accounts and non-physician healthcare professionals received notable engagement on this platform, whereas more institutional content, such as that produced by healthcare institutions, tends to have lower engagement rates.
On YouTube, engagement rates are generally lower compared to Instagram, as viewers prefer longer-duration content. However, the depth of content on YouTube tends to sustain engagement over a longer period.
DISCUSSION
The internet is widely used as a source of health information.10 YouTube and Instagram are commonly used social media platforms that individuals also consult for informational purposes. However, on these platforms, content may spread rapidly regardless of its accuracy, posing a risk for the dissemination of misinformation. Considering the increasing frequency of online searches regarding PCOS over time, the reliability of content on these platforms has become particularly important.11 The aim of this study was to examine this issue.
In a study by Mahajan et al.,12 titled “Educational quality and content of YouTube videos on diabetic macular edema” (International Ophthalmology), the findings aligned with our results, showing that content produced by physicians and healthcare institutions was of higher quality compared to content from other producers. Similarly, in the present study, content produced by physicians and healthcare institutions had significantly higher GQS and mDISCERN scores than that produced by other groups. Regarding YouTube content, videos produced by healthcare institutions had an average GQS of 3.3, exceeding the overall mean of 2.84, while physicians ranked second with an average of 2.87. For mDISCERN scores, physicians ranked first with an average of 4.0, and healthcare institutions ranked second with an average of 3.7. These findings were supported by significant differences, indicating that physicians and healthcare institutions produce more reliable and higher-quality content. Post hoc analyses confirmed these differences, showing that physicians significantly outperformed personal accounts in content quality metrics on both YouTube and Instagram platforms (p<0.05 for all comparisons), while personal accounts achieved higher engagement rates (Tables 5 and 6).
In terms of views per day and likes per day, personal accounts had the highest average counts. Although these differences approached statistical significance (p=0.06-0.07), they were not definitive; nevertheless, personal accounts appear to be more successful in reaching content consumers. In terms of like-to-view ratios, physicians had higher average values, though this difference was again not significant (p=0.44). This suggests that, although the audience reached by physicians was smaller, engagement within this audience may be higher.
In terms of video duration, healthcare institutions produced shorter videos, whereas physicians produced longer videos. A similar trend was observed in the analysis of Instagram reels. Healthcare institutions had the highest scores in GQS (3.37) and mDISCERN (4.62), and these differences were significant. Physicians ranked second, with average GQS and mDISCERN scores of 3.35 and 4.19, respectively. This suggests that physicians and healthcare institutions also produced higher-quality content on Instagram.
Although healthcare institutions had the highest average likes per day, this difference was not significant (p=0.26); therefore, this difference should be interpreted cautiously. When evaluating DM/day and comments/day, personal accounts and non-physician healthcare workers had higher average engagement, with the difference for comments/day being significant (p=0.04). This suggests that content produced by these two groups may have greater potential for sharing and discussion. Although physicians’ DM/day values were above the average, their likes/day and comments/day were lower. A possible reason for this was that their content was informative but did not sufficiently attract users in terms of visual or emotional engagement.
Similarly, in a study on contraceptive implants by Sütcüoğlu and Güler,13 the quality and reliability of social media videos were evaluated using GQS and mDISCERN scores, showing that content created by healthcare professionals was of higher quality. In the present study, physicians and healthcare institutions also produced higher-quality content, whereas personal accounts achieved higher views and engagement. This finding demonstrates that, on social media platforms, information quality should be assessed independently of popularity. In particular, health-related content should be evaluated using systematic quality-based criteria, rather than relying solely on engagement metrics.
The limitations of this study include the restricted number of content items, low variability between groups, and the tendency of social media platform algorithms to prioritize engagement metrics over content quality. These findings highlight that healthcare professionals analyzing social media should consider not only statistical data but also the algorithmic promotion mechanisms of the platforms. Another limitation was that the GQS and mDISCERN scoring systems were originally developed for traditional-format videos. However, their core evaluation criteria (source reliability, information balance, and clarity of purpose) are format-neutral and applicable to short-format content. From the patient perspective, the accuracy and reliability of health information are independent of content duration.
Study Limitations
As noted above, the GQS and mDISCERN scoring systems were originally developed for traditional-format videos. Future research may address this methodological gap by developing scoring systems optimized for short-format social media content. Furthermore, the applied duration (1-10 minutes) and resolution (>480p) criteria may have excluded potentially valuable educational content. Specifically, longer, detailed educational videos or lower-resolution but content-rich materials were excluded from evaluation. Future studies may consider applying these criteria more flexibly.
CONCLUSION
Overall, content produced by physicians and healthcare institutions scored higher in information accuracy on both platforms, whereas personal accounts achieved higher views and engagement. This underscores the importance of evaluating the source of content, as certain material, presented using terms such as “PCOS coach”, may create the impression of professional medical authority for viewers. For medical content on platforms such as YouTube and Instagram to be beneficial, both users and content creators should pay careful attention to the accuracy and reliability of the information presented.