A machine learning framework for social business intelligence incorporating big data


Rapidly rising consumer expectations and demands have created an urgent need for competitive strategies to develop successful products and services. The ability to harness ever-increasing amounts of business-related data makes it possible to identify optimal marketing strategies [1]. In this context, ‘Big Data’ has a significant impact on the Business Intelligence domain. In particular, ‘Big Data’ is generated from various sources, including large volumes of metadata (e.g. trust, security, and privacy) that imbue business data with additional semantics, the adoption of social media, the digitalization of business artifacts (e.g. documents, reports, and receipts), and sensor networks (e.g. smart sensors in credit cards). Understanding and analyzing the semantics of big data is therefore a goal of enterprises today. However, distilling information from ‘Big Data’ on the World Wide Web and the Internet of Things, such as forums and social media, remains challenging because these data, which consist of images, textual content, and videos, are unstructured [2]. Verifying the credibility of users and their shared content is also challenging because their real identities cannot be confirmed [3]. Both challenges must be tackled in order to extract business or marketing-related information from ‘Big Data’.

Challenge 1: Unstructured nature of Big Data

With the rapid increase in unstructured data, traditional data warehouses are no longer the only reliable data source for business intelligence applications. According to the International Data Corporation, unstructured data accounts for 80% of the total data in organizations, and the amount of unstructured data is expected to increase by 60% per year over the next few years [2]. However, business intelligence applications mostly focus on structured data, and data-mining-based decision support has mainly relied on day-to-day operational databases and structured external data sources [4]. Consequently, it is essential to review the tools used to collect, transfer, store, and analyse massive amounts of unstructured data [5]. Because capturing external data to contextualise data analysis operations is a time-consuming and complex task, such tools will bring large benefits to current Business Intelligence environments [6].

Challenge 2: Trustworthiness of ‘Big Data’

Trust in social media refers to the credibility of users and of the content they post or share in a particular domain. Understanding social trust is essential for improving the analysis process and for mining credible information from social media data. Users who are trustworthy in one domain may not be equally trustworthy in other domains [7]. As a vast volume of data is exchanged within social media ecosystems, data trustworthiness is a vital issue, especially for personal data [8]. Trustworthiness matters in the social media context because social media offers rich resources for market analysis, for listening to the ‘Voice of the Customer’, and for sentiment analysis that feeds business intelligence applications [9]. Therefore, it is necessary to infer trustworthiness from different sources such as social media, news agencies, and web logs.

To tackle these two challenges, a machine learning framework with the following two components is proposed:

Tackling Challenge 1: Deep neural network

Because the volume and dimensionality of “big data” are huge, unstructured data is usually scaled down by extracting a few significant features. Information can then be extracted from these features using machine learning methods such as neural networks [10]. For example, when surveying popular products in the market, a few features are captured from the products, and customer opinions are learned from the captured features [11]. The limitation of this approach is that important information can be lost because only a few features are used for analysis. Recent techniques based on deep neural networks have begun to achieve state-of-the-art performance in many problem domains [12]. In the proposed approach, the whole set of unstructured data is fed into the deep neural networks, so that no information is discarded by a prior feature-selection step. The deep neural networks will be developed to extract information from unstructured social media data, and the trade-off between learning capability and model complexity will be studied [13].
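
One way to make the trade-off between learning capability and model complexity concrete is to count the trainable parameters of a fully connected network. The sketch below is illustrative only: the function name mlp_parameter_count and the layer sizes (10 hand-picked features versus a 10,000-dimensional raw input) are assumptions chosen for the example, not values from this project.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights and biases in a fully connected feed-forward network.

    layer_sizes lists the number of units per layer, input first, output last.
    Each pair of consecutive layers contributes n_in * n_out weights
    plus n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical shallow model trained on a few extracted features.
few_features = mlp_parameter_count([10, 32, 1])

# Hypothetical deeper model fed with the full, high-dimensional input.
full_input = mlp_parameter_count([10000, 512, 128, 1])

print(few_features)  # 385
print(full_input)    # 5186305
```

Under these assumed sizes, feeding the full input raises the parameter count by roughly four orders of magnitude, which is the kind of complexity cost that has to be weighed against the information retained.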

Tackling Challenge 2: Fuzzy regression

Evaluating user trustworthiness is subjective and is therefore inherently fuzzy, that is, imprecise, uncertain, and vague [14]. Business intelligence models based on fuzzy regression have been developed to derive knowledge from fuzzy data [15, 16]. However, state-of-the-art fuzzy regression methods were developed for small-volume or low-dimensional data and do not address big data streams from social media. A novel fuzzy regression method will be integrated with the leveraging method [17] and parallel sampling [18], which sample a small subset of the full dataset and perform the intended computations using that subset as a surrogate for the full dataset. A more reliable business intelligence model can be developed once user trustworthiness in big data is addressed.
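
The subsampling idea can be sketched as follows. This is a minimal illustration in the spirit of leverage-based subsampling, not the proposed method: it uses ordinary (crisp) weighted least squares with a single predictor in place of fuzzy regression, the data are synthetic, and all function names are hypothetical. Rows are drawn with probability proportional to their leverage score, and the regression is then fitted on the subsample with weights inverse to the sampling probabilities.

```python
import random


def leverage_scores(xs):
    """Leverage h_i of each point for a model with intercept and one slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1.0 / n + (x - mean_x) ** 2 / sxx for x in xs]


def weighted_least_squares(xs, ys, ws):
    """Fit y = a + b*x by minimising sum of w_i * (y_i - a - b*x_i)^2."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return my - b * mx, b


def leveraged_fit(xs, ys, subsample_size, rng):
    """Sample rows proportionally to leverage, then fit on the subsample."""
    h = leverage_scores(xs)
    total = sum(h)  # equals 2 for an intercept plus one slope
    probs = [hi / total for hi in h]
    idx = rng.choices(range(len(xs)), weights=probs, k=subsample_size)
    sub_x = [xs[i] for i in idx]
    sub_y = [ys[i] for i in idx]
    sub_w = [1.0 / probs[i] for i in idx]  # inverse-probability weights
    return weighted_least_squares(sub_x, sub_y, sub_w)


# Synthetic data (assumed for illustration): y = 2 + 3x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(20000)]
ys = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# Fit on 500 sampled rows instead of all 20,000.
intercept, slope = leveraged_fit(xs, ys, 500, rng)
print(intercept, slope)
```

The surrogate fit uses only 2.5% of the rows here; how well such a scheme carries over to fuzzy regression on social media streams is exactly what the proposed research would need to establish.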

The aim of this research is to develop a framework for social business intelligence. Domain-based trust, unstructured data mining, and machine learning will be incorporated to obtain a better understanding of social big data, thereby providing new insights that will benefit the business intelligence domain. The proposed framework will be validated by a case study that investigates consumer scenarios in purchasing electronic products, and it will be developed using social media data about such purchases. ASM Pacific Technology Ltd., a manufacturing company that supports a recently awarded priming grant [19], will provide the core data to validate the effectiveness of the machine learning framework.


This research is significant and will contribute to the following practical but currently unresolved issues. Firstly, the proposed framework will include machine learning techniques to infer trustworthy data, semantically enrich textual data, and articulate structure from unstructured social data. This will contribute to current machine learning techniques for social data analytics. Secondly, a survey conducted by MIT in collaboration with Deloitte studied the adoption of Social Business (SB) in enterprises and how it can boost business processes [20]. Seventy percent of respondents believed that SB will change how organizations work. However, more than half of those surveyed believed that SB adoption was still in its infancy. Thirdly, a Deloitte survey [21] emphasizes the importance of Big Data in the business domain: around 75% of respondents believe that adopting Big Data will benefit their business strategies, and ninety-six percent consider data analytics an added value for their businesses over the coming three years. Lastly, the proposed machine learning framework will provide comprehensive Social Big Data Analytics. Organizations can use the framework and further develop more sophisticated systems to improve their internal business processes.

Anticipated Outcomes

This research will construct the following three main artifacts. Firstly, the development of a novel trustworthiness inference module for social big data will contribute to the design theory of trust inference and evaluation methods. Secondly, the development of machine learning in Social Business Intelligence will enrich textual data semantically and articulate structure from unstructured social big data. This approach can be used as a guideline for other approaches to enriching textual data in order to enhance semantic analysis processes. Lastly, building a machine learning mechanism will contribute to Social Business Intelligence frameworks, tools, and techniques.


  1. Hao, T. and X. Zhao, Financial management based decision making in the big data era. Computer Modelling and New Technologies, 2014. 18(11): p. 908-913.
  2. Gantz, J. and D. Reinsel, The digital universe decade - are you ready? External publication of IDC (Analyse the Future) information and data, 2010: p. 1-16.
  3. Lehikoinen, J. and V. Koistinen, In Big Data We Trust? Interactions, 2014. September-October: p. 38-41.
  4. Wongthongtham, P. and B.A. Salih. Ontology and Trust based Data Warehouse in New Generation of Business Intelligence. in IEEE International Conference on Industrial Informatics. 2015. Cambridge, UK: IEEE.
  5. Slavakis, K., G.B. Giannakis, and G. Mateos, Modeling and optimization for big data analytics, in IEEE Signal Processing Magazine. September 2014.
  6. Manuel Pérez-Martínez, J., et al., Contextualizing data warehouses with documents. Decision Support Systems, 2008. 45(1): p. 77-94.
  7. Abu-Salih, B., et al., An Approach for Time-aware Domain-based Analysis of Users Trustworthiness in Big Social Data. International Journal of Big Data, 2016. 2(1).
  8. Passant, A., et al. Enabling trust and privacy on the social web. in W3C workshop on the future of social networking. 2009.
  9. Berlanga, R., et al., Towards a Semantic Data Infrastructure for Social Business Intelligence. New Trends in Databases and Information Systems, 2014: p. 319-327.
  10. Kwong, C.K., T.C. Wong, and K.Y. Chan, A methodology of generating customer satisfaction models for new product development using a neuro-fuzzy approach. Expert Systems with Applications, 2009. 36(8): p. 11262-11270.
  11. Chan, K.Y., et al., An intelligent fuzzy regression approach for affective product design that captures nonlinearity and fuzziness. Journal of Engineering Design, 2011. 22(8): p. 523-542.
  12. Stuhlsatz, A., J. Lippel, and T. Zielke, Feature extraction with deep neural networks by a generalized discriminant analysis. IEEE Transactions on Neural Networks and Learning Systems, 2012. 23(4): p. 596-608.
  13. Bianchini, M. and F. Scarselli, On the complexity of neural network classifiers: a comparison between shallow and deep architectures. IEEE Transactions on Neural Networks and Learning Systems, 2014. 25(8): p. 1553-1565.
  14. Zadeh, L.A., The concept of a linguistic variable and its application to approximate reasoning – I. Information Sciences, 1975. 8: p. 199-249.
  15. Chan, K.Y., et al., A stepwise based fuzzy regression procedure for developing customer preference models in new product development. IEEE Transactions on Fuzzy Systems, 2015. 23(5): p. 1728-1745.
  16. Chan, K.Y. and U. Engelke, Varying Spread Fuzzy Regression for Affective Quality Estimation. IEEE Transactions on Fuzzy Systems, 2016.
  17. Ma, P. and X. Sun, Leveraging for big data regression. Computational Statistics, 2015. 7: p. 70-76.
  18. He, Q., et al., Parallel sampling from big data with uncertainty distribution. Fuzzy Sets and Systems, 2015. 258: p. 117-133.
  19. Chan, K.Y., Electronic packaging technology for electronic products. 2016, Priming Grant from Academy of Technology and Engineering (Partnered International Firm: ASM Pacific Technology Ltd; it has 700 R&D staff;
  20. Kiron, D., et al., Social business: Shifting out of first gear. MIT Sloan Management Review Research Report, 2013.
  21. Deloitte, The Analytics Advantage: We’re just getting started. 2013.