Online searches have been used to study health-related behaviours in close to real time, including to identify disease outbreaks. However, many models have been criticized for ignoring whether such search activity is related to actual disease. Here we propose a methodology to disentangle online search behaviour driven by actual disease from behaviour prompted by other motives, such as media-driven curiosity or general information seeking. In particular, we take advantage of current and past pandemics to identify distinct search patterns. Information seeking is known to become less common as a pandemic progresses, so we argue that selecting search terms at the worst moment of the pandemic, when media hype is highest, helps to distinguish searches associated with the disease from those prompted by media exposure. We use Google Trends and apply this methodology to two pandemic respiratory infectious diseases: 2009-H1N1 (in the United States) and COVID-19 (in Spain). We found that search terms cluster into three groups: one more closely associated with cases (C1), another highly correlated with media reports (C2), and a noisier third (C3, not shown). We observed the same pattern for both diseases, showing that differences in online search patterns can be identified and that they are consistent over time and across countries. We then tested whether the search terms in each cluster could nowcast seasonal influenza and the second wave of COVID-19 in Spain. Using both Random Forest and Linear Regression, we show that, contrary to common belief, less data can be better, and that prior clustering and manual selection of search terms can improve model performance. Our system is flexible and general enough to be applied to other diseases, to different phases (e.g. seasonal events), and to human activities that spread on networks.
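As a rough illustration of separating case-associated from media-associated search terms, the sketch below labels each term by whether its time series correlates more strongly with case counts or with media reports. It is a simplified stand-in, not the clustering procedure used in this work: the Pearson-correlation rule, the 0.5 threshold, the function names `pearson` and `label_terms`, and the toy series are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (sd_x * sd_y)


def label_terms(searches, cases, media, threshold=0.5):
    """Assign each search term to C1 (case-like), C2 (media-like) or C3 (noisy).

    The threshold is an arbitrary illustrative value, not one from the study.
    """
    labels = {}
    for term, series in searches.items():
        r_cases = pearson(series, cases)
        r_media = pearson(series, media)
        if max(r_cases, r_media) < threshold:
            labels[term] = "C3"  # weakly related to both signals
        elif r_cases >= r_media:
            labels[term] = "C1"  # tracks reported cases
        else:
            labels[term] = "C2"  # tracks media coverage
    return labels


# Toy weekly series (hypothetical numbers, for illustration only).
cases = [1, 2, 4, 8, 6, 3]
media = [9, 7, 5, 3, 2, 1]
searches = {
    "term_a": [2, 4, 8, 16, 12, 6],   # proportional to cases
    "term_b": [18, 14, 10, 6, 4, 2],  # proportional to media coverage
    "term_c": [5, 1, 5, 1, 5, 1],     # alternating, unrelated to either
}

labels = label_terms(searches, cases, media)
```

In practice the search series would come from Google Trends exports and the grouping from a proper clustering step; the point here is only that terms can be compared against two reference signals rather than one.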