How we have identified and analysed trends in the Knowledge Base
The trend identification method is based on a mixed-methods design that combines qualitative and quantitative research approaches. The first subsection explains how we applied data mining techniques in the trend identification process. The second subsection shows how we described the tendency of each trend in the Knowledge Base, based on specific Web of Science search results.
Trend identification on Twitter and Web of Science through data mining
To gain a broader understanding of relevant trends in the public sector, we used data mining techniques to identify trend frequencies in the Web of Science database and on Twitter.
Because of the large amount of data they hold, the Web of Science database and the social network Twitter are well suited to efficient data mining.
With 1.4 billion indexed references and over 20,000 journals, Web of Science is one of the major multidisciplinary collections of academic literature. In addition, Web of Science allows search results to be exported for further analysis. With 330 million monthly active users, Twitter is one of the most widely used social networking sites in the world. It is especially relevant here because its users include many professionals from politics as well as from the public and private sectors.
To identify relevant trends, we performed a keyword search on both platforms using the following query: (“policy” OR “policies”) AND (“big data” OR “open data” OR “data analytics”) AND (“public”). We obtained more than 2,700 related tweets from Twitter and roughly 4,600 related records from Web of Science (WOS). In a data mining process using the statistical programming language R, the words in these texts were reduced to their stems, which yielded ranked frequencies of two-word combinations.
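The stemming and two-word-combination ranking described above can be sketched roughly as follows. This is a minimal Python illustration, not the original R pipeline: the stop-word list and the crude plural-stripping function are hypothetical stand-ins for a real stemmer, chosen only to make the steps visible. Adjacent stems are joined with an underscore, matching the form of terms such as "change_public".

```python
import re
from collections import Counter

# Hypothetical stop-word list; the stop words used in the original R analysis are not documented.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "are", "with"}


def crude_stem(word: str) -> str:
    """Very rough plural normaliser standing in for a real stemmer (assumption)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"  # e.g. policies -> policy
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]  # e.g. analytics -> analytic
    return word


def rank_bigrams(texts: list[str]) -> list[tuple[str, int]]:
    """Rank stemmed two-word combinations by frequency across all texts."""
    counts = Counter()
    for text in texts:
        # Lower-case, keep alphabetic tokens, drop stop words, then stem.
        tokens = re.findall(r"[a-z]+", text.lower())
        stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
        # Join each pair of adjacent stems with "_" and count it.
        counts.update(f"{a}_{b}" for a, b in zip(stems, stems[1:]))
    return counts.most_common()
```

Each tweet text or exported WOS record (for example its title and abstract) would be passed in as one string; the returned list is ordered from the most to the least frequent combination.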
Notably, health appears to be the domain most affected by data analytics innovations: it forms the first two-word combination in the ranking that contains none of the terms used in the search query.
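Finding the first combination without query terms amounts to filtering the ranking while preserving its order. The sketch below assumes bigrams are stored as "word_word" strings and that the query terms have been stemmed the same way as the texts; the stemmed term set is an illustrative assumption, not taken from the original analysis.

```python
# Stemmed query terms (assumption: plural forms reduced as in a crude stemmer).
QUERY_TERMS = {"policy", "big", "data", "open", "analytic", "public"}


def without_query_terms(
    ranking: list[tuple[str, int]], query_terms: set[str] = QUERY_TERMS
) -> list[tuple[str, int]]:
    """Keep only bigrams in which neither word is a query term, preserving rank order."""
    return [
        (bigram, count)
        for bigram, count in ranking
        if not set(bigram.split("_")) & query_terms
    ]
```

The first element of the filtered list is then the highest-ranked combination that was not simply produced by the search terms themselves.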
Both the Web of Science and the Twitter queries show that the terms “health” and “social media” rank very highly. However, most of the extracted terms form a long tail of varied terms with only one or two mentions each.
In the Twitter ranking, the combination “change_public” also appears near the top. It points to the ongoing transformation of the public sector with respect to data analytics strategies.
Furthermore, a considerable number of tweets relate to the term “dehumanizing”, which indicates a controversial discussion about the use of data analytics technologies in the public sector.
Trend Tendencies in the BPC Knowledge Base
We also identified trends through qualitative expert interviews and desk research. To derive the tendency of each trend, we ran a corresponding trend query in the Web of Science database. The search results were exported and visualised using the open-source data analytics software “RStudio”. The figures in the trend items of the Knowledge Base show relative frequencies for the period from 2009 to 2018.
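The text does not state what the relative frequencies are relative to. The sketch below assumes one plausible reading, namely the share of a query's records in the 2009 to 2018 window that were published in each year; normalising by the total number of WOS records per year would be an equally valid alternative. The publication years would come from the exported records (for example the PY field in WOS tagged exports).

```python
from collections import Counter


def relative_frequencies(years: list[int], start: int = 2009, end: int = 2018) -> dict[int, float]:
    """Share of a query's records (within [start, end]) published in each year.

    Assumption: frequencies are relative to the query's own total in the window.
    Years outside the window are ignored; years without records get 0.0.
    """
    in_window = [y for y in years if start <= y <= end]
    counts = Counter(in_window)
    total = len(in_window)
    return {y: (counts[y] / total if total else 0.0) for y in range(start, end + 1)}
```

Expressing each series as shares rather than raw counts makes trends of very different overall size comparable in the same figure.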
To provide a specific view of public sector trends, we then refined the results by applying the public-sector-related WOS filter categories. All refined trend queries are referred to as “public sector relevant”. To add a big data perspective, we combined the general trend query with the term “Big Data”, using the trend query (“Trend” AND “Big Data”), where “Trend” stands for the respective trend term, to obtain the big data relevance of each trend.
As a result, each trend in the Knowledge Base is represented by three lines: the general trend tendency (solid line), the public sector relevance (dotted line) and the big data relevance (dashed line), as shown in the following figure.
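The three query variants behind each trend figure can be sketched as plain search strings. This assumes the WOS advanced-search field tags TS= (topic) and WC= (Web of Science category); expressing the public sector refinement as a WC= clause is an assumption, since the text describes applying the WOS filter categories instead. The category tuple is a hypothetical example, because the actual public-sector-related categories are not listed.

```python
# Hypothetical example; the public-sector-related WOS categories actually used are not listed.
PUBLIC_SECTOR_CATEGORIES = ("Public Administration",)


def trend_queries(trend: str) -> dict[str, str]:
    """Return illustrative WOS advanced-search strings for the three series of one trend."""
    topic = f'TS=("{trend}")'
    categories = " OR ".join(f'"{c}"' for c in PUBLIC_SECTOR_CATEGORIES)
    return {
        "general": topic,  # solid line: general trend tendency
        "public_sector": f"{topic} AND WC=({categories})",  # dotted line: public sector relevance
        "big_data": f'TS=("{trend}" AND "Big Data")',  # dashed line: big data relevance
    }
```

Keeping the three variants in one place makes it easy to check that every trend is queried in the same way before its exported results are plotted.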
Distribution across categories
The Web of Science (WOS) database provides an extensive catalogue of filter categories. To depict how the records of each trend are distributed across WOS categories, we clustered these categories into a manageable catalogue of 36 categories.
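Clustering WOS categories and computing a trend's distribution over the clusters can be sketched with a simple lookup table. The mapping below is a hypothetical excerpt using a few real WOS category names; the actual 36 clusters are not listed in the text. As a further assumption, each record counts at most once per cluster, and categories without a mapping fall into "Other".

```python
from collections import Counter

# Hypothetical excerpt of a category-to-cluster mapping (the real 36 clusters are not listed).
CLUSTER_OF = {
    "Public Administration": "Public Administration & Policy",
    "Political Science": "Public Administration & Policy",
    "Computer Science, Information Systems": "Computer Science",
    "Computer Science, Artificial Intelligence": "Computer Science",
    "Health Care Sciences & Services": "Health",
    "Public, Environmental & Occupational Health": "Health",
}


def cluster_distribution(records: list[list[str]]) -> dict[str, float]:
    """Share of records assigned to each cluster, most frequent first.

    Each record is given as its list of WOS category names. A record can belong
    to several clusters, so the shares may sum to more than 1.
    """
    counts = Counter()
    for categories in records:
        # Count each cluster at most once per record.
        counts.update({CLUSTER_OF.get(c, "Other") for c in categories})
    n = len(records)
    return {cluster: count / n for cluster, count in counts.most_common()} if n else {}
```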