Practical Considerations for Fine-Tuning BERT-Based Language Models in Health Research: Lessons from Classifying Anti-Vaccine Posts on Social Media
For those transitioning from traditional machine learning to LLMs, it is crucial to understand how choices in data collection and training-including keyword selection, fine-tuning data quantity, stopping conditions, collapsing categories, and labeling conflicts-can significantly impact model performance, especially with smaller datasets.