April 25, 2018

Big data solutions are complex and relatively new to most developers. Despite that, many enterprises are diving deep into big data software development, because its value for the business is hard to overestimate. Here you will find an explanation of big data application architecture and a cost breakdown for a sample big data (BD) development project.

Highly valuable information is hidden in huge volumes of data, and more and more businesses are aware of its importance. Systems were conceived and created to store information so that we can analyze the data and base decisions on it. Business intelligence and data warehousing have proved the importance of data analysis in decision making based on structured data. Traditional databases were designed to store structured data and are queried using SQL. Data warehousing and business intelligence were the pioneers of data exploitation and data mining, and data consulting services have provided tools to improve decision making.

In the meantime, the cost of server hosting has decreased significantly, so more data can be saved. Data has become more and more complex as it has grown exponentially rather than linearly. Today, structured data represents only 15% of the data produced every day; the remaining 85% is unstructured, such as music and movies. One study found that more data is now produced in two days than from the dawn of humanity until 2003. Since SQL can’t query unstructured data, other ways of querying it were developed, and NoSQL databases emerged.


FROM PROCESSING STRUCTURED DATA TO PROCESSING UNSTRUCTURED DATA

Even a very large data warehouse is limited to structured and semi-structured data. BI consists of a set of tools and techniques for collecting, cleaning, and enriching structured or semi-structured data and storing it in different forms of SQL-based, multidimensional databases. The data is therefore managed in standardized formats to facilitate access to information and speed up processing. Business intelligence aims to produce KPIs for analyzing past and present data in order to predict the future by extrapolation and to act on these forecasts.

[Figure: structured and unstructured data in big data software solutions]

Big data was born to handle this large amount of data, and especially the huge amount of unstructured data. It has become very popular, and in 2018 taming it is more relevant than ever. Traditional data consulting services have evolved and integrated big data into their offerings, becoming big data consulting services. Let’s dig a little further to better understand what big data is, why it has become a trend, what a big data solution is, and especially what implementing one could cost.

WHERE DO WE STAND IN TERMS OF BIG DATA IN 2018?

Technology has evolved so fast that we now measure stored data in petabytes, whereas a few decades ago a megabyte seemed like a revolution. Today, thanks to the IoT (Internet of Things), there are more and more interconnected objects; estimates put the number at over 10 billion to date.

A study published by McKinsey in 2014 reported that companies whose managers and executives have access to adequate analysis of big data obtain an ROI of 131% compared to their competitors. This gap is likely to be larger today, and it keeps growing as teams become more familiar with big data, better trained, and more experienced in implementing big data solutions.

Data comes from diverse sources: social media, which produce huge quantities of data (personal information, link sharing, blogging, etc.), and applications (geolocation, application logging, etc.).

Several applications of big data have been developed and implemented, and AI has become an ally of big data.

WHERE AND HOW ARE BIG DATA SOLUTIONS USED?

The recorded data relates to many domains, so almost any business can find the information it wants. The field of use of big data solutions is currently quite large and includes:

  • Telecom (often referred to as big data solutions for telecom)
  • Media and Entertainment
  • Finance
  • Government
  • Retailing
  • Health
  • Energy

And many other business and industry domains.

[Figure: where big data solutions are used in real life]

Companies and businesses use big data to:

  • Help in decision making: real-time, accurate data supports informed decisions based on facts rather than gut feeling
  • Help with customer segmentation
  • Offer a fully customized service to customers
  • Manage stock and predict product needs, including which products are best kept in stock
  • Help in understanding customers’ needs, behaviors, and habits
  • Better manage customer relationships by offering more customer-centric products
  • Help in fraud prediction
  • Help improve business processes
  • Help identify and remove performance bottlenecks proactively, optimize resource utilization, and reduce costs
  • Reduce billing errors, verify eligibility, detect and prevent fraudulent claims, and speed up revenue recovery

Big data can be used in several different ways.

WHAT ABOUT REAL-WORLD BIG DATA SOLUTIONS?

Big data finds its strength in analytics and offers a very powerful tool for decision making. Now that we better understand what a big data solution is, we may wonder how big data really operates beyond the theory, and what its real-world uses are.

Media (video, audio, etc.) represents an important part of the data stored to date, so it is not surprising that one real-world use is the marketing of video content. Netflix, which has millions of customers, is a good example of the use of big data solutions.

Customer management: when offering content to its customers, Netflix records their behavior in order to offer them personalized programs and content:

  • What they watch
  • The time at which they watched the videos
  • How the video was watched (continuously or with interruptions, completely or partially, whether it was looped and how many times it was seen, etc.)
  • How they interact with the video
  • What feedback they gave, if any
  • How they reacted at the end of the video

Telecom is one of the fields that exploits big data the most today; big data solutions for telecom are essential for:

  • customer management (how to acquire and retain customers): behavioral analysis using data science, proactive identification of potential “churners” using a predictive model, targeted marketing (tools for customer segmentation and profiling that suggest areas of improvement), personalized plans
  • operation management: call routing optimization (using IVR or direct routing to customer services, agent-specific routing depending on the end user for a better user experience, etc.), capture of all calls and events (CDR) in real time, capacity optimization by forecasting demand, marketing based on geolocation (data centers are set up on a regional basis, allowing real-time analysis)
  • network and infrastructure management: proactive maintenance, failure prediction through anomaly detection
  • security management: fraud detection and prevention, secure payment, data protection

Adopting a data-driven approach increases customer interaction and improves customer satisfaction.

WHICH TECHNOLOGIES ARE NOW USED FOR BIG DATA SOLUTIONS?

Big data involves collecting huge and complex data sets, and processing them requires specific tools and applications. Part of the complexity lies in the fact that the data volume is very large and mostly unstructured.

The first challenge is to find the best way to restructure the data, and the chosen big data software solutions should support this restructuring. The second challenge lies in data storage: choosing the most suitable databases to optimize storage for future analysis.

What are the current trends in big data software solutions?

  • Apache Hadoop: Apache Hadoop is a reference among big data frameworks; it offers a solution where traditional data exploitation tools are very limited in tackling unstructured data. Hadoop is a free, open source framework used to store and process data. Thanks to its data replication, Hadoop provides high availability. Hadoop implements the MapReduce programming model, which allows large-scale data processing by distributing the computation across several nodes. Hadoop is a very mature big data solution that has been used by many companies, and it supports agile big data development.
  • AWS (Amazon Web Services) is Amazon’s cloud platform; in big data projects it is commonly used to store large amounts of data online and make them accessible to hosted applications. AWS has many advantages such as security, reliability, flexibility, and scalability. AWS is now very popular among companies, which puts skills in this technology in high demand.
  • Cloudera offers centralized data storage that makes data analysis easier; the required software is delivered in a single package, which considerably reduces installation time.
  • NoSQL databases: since we are dealing with all types of data, the database should cope with the absence of structure. What sets this kind of database apart is that no fixed schema is required to store data. Popular NoSQL databases include Apache Cassandra, Oracle NoSQL, HBase, MongoDB, and Amazon SimpleDB.
  • Tableau is an analytics tool that turns data into insights.
  • Talend Big Data Integration: a big data software solution with graphical tools and components that generate code, allowing a big data software development team to work with other big data technologies such as Apache Hadoop and NoSQL databases.
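
To make the MapReduce idea mentioned for Hadoop more concrete, here is a minimal sketch in plain Python. It does not use Hadoop itself: the function names (map_phase, shuffle, reduce_phase, word_count) are our own illustrative choices, and everything runs in a single process, whereas Hadoop would distribute the map and reduce steps across several nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # On a real cluster each line (or file block) would be mapped on a
    # different node; here everything runs sequentially in one process.
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big value", "data drives decisions"]
    print(word_count(sample))
```

The point of the split into map, shuffle, and reduce is that the map and reduce functions only ever see one line or one key at a time, which is what makes the computation easy to spread over many machines.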

STRUCTURE AND ARCHITECTURE OF BIG DATA APPLICATIONS

We have seen that there are several kinds of big data solutions, big data solutions for telecom among them. It is hard to generalize the architecture of big data applications, or how the data itself is structured in them. A very simple example of a BD solution is given in the scheme below to give a general understanding of how the different modules are connected and integrated to do what big data is supposed to do: provide users with reports and analytics based on the huge mass of information that this kind of software receives from various sources.

[Figure: example scheme of a big data solution architecture]

In this section, let’s take a simple example: an application of big data in a car rental service. When deciding to scale a car rental service, we may face many challenges:

  • Reliability of partners (car owners, other businesses)
  • The success of the scaling attempt
  • Control over users
  • Competitiveness
  • Reliability of the renter
  • Personalized offers for customers based on their habits, their budget, etc.

By taking advantage of big data solutions, these challenges can be tackled intelligently with a real-time solution:

  • Management of historical data regardless of its provenance
  • Processing times that allow real-time analysis and a real-time response
  • Solutions that can scale quickly and in a straightforward way
  • A solution providing fast delivery and almost immediate input/output
  • A solution able to handle huge amounts of data with good performance
  • A solution focused on the business
  • Cost savings and even a better ROI
  • Leverage of IT resources
  • Enterprise-ready solution
  • Complete production environment
  • Rate management
  • Loss prevention
  • Customer satisfaction evaluation
  • Accident management

The solution includes the following components and layers:

  • Sources of big data: the tasks in this component consist of understanding where the data comes from. At this point, data scientists clarify their needs in order to identify the data required for their analysis. The structures and formats of the data needed may vary considerably from one analysis to another, and even within one axis of analysis there may be a wide variety of structures and formats. It is important to understand the speed at which data arrives and to evaluate its volume in order to predict the amount of data each analysis will process. In this layer, we should also understand any restrictions on the data that analysts may need.
  • Data acquisition layer: here, data is acquired from the previously identified sources. An ETL-like transformation occurs: data is extracted from its sources, transformed to fit the format needed for analysis, and then loaded into the Hadoop Distributed File System (HDFS) or a traditional RDBMS. After this first processing, further processing may also occur.
  • Analysis layer: once the data is identified and processed, analysis can begin. Depending on the analysis to be done, the data source may be accessed directly. This is where decision support is developed, producing analytics with the most suitable tools and algorithms.
  • Consumption layer: in this layer, the analysis is consumed; results can be displayed in applications or fed into other business processes.
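
The four layers above can be sketched as a tiny pipeline. This is only an illustration under our own assumptions: the Rental record, its fields, and the sample data are hypothetical, and plain Python functions stand in for what would be HDFS, an ETL tool, and an analytics engine in a real project.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rental:
    renter_id: str
    days: int
    daily_rate: float


def source_layer() -> list[dict]:
    """Sources: raw records as they might arrive, as untyped strings."""
    return [
        {"renter": "r1", "days": "3", "rate": "40.0"},
        {"renter": "r2", "days": "5", "rate": "35.5"},
        {"renter": "r1", "days": "2", "rate": "42.0"},
        {"renter": "r3", "days": None, "rate": "30.0"},  # incomplete record
    ]


def acquisition_layer(raw: list[dict]) -> list[Rental]:
    """Acquisition: ETL-like step that extracts, cleans, and converts records."""
    clean = []
    for row in raw:
        if row["days"] is None:
            continue  # drop records that cannot be analyzed
        clean.append(Rental(row["renter"], int(row["days"]), float(row["rate"])))
    return clean


def analysis_layer(rentals: list[Rental]) -> dict[str, float]:
    """Analysis: derive simple metrics from the cleaned data."""
    return {
        "rentals": len(rentals),
        "average_days": mean(r.days for r in rentals),
        "revenue": sum(r.days * r.daily_rate for r in rentals),
    }


def consumption_layer(metrics: dict[str, float]) -> str:
    """Consumption: render the analysis for a report or another process."""
    return ", ".join(f"{name}={value:.2f}" for name, value in metrics.items())


if __name__ == "__main__":
    print(consumption_layer(analysis_layer(acquisition_layer(source_layer()))))
```

Keeping each layer behind its own function mirrors the architecture: the analysis code never sees raw, unclean records, and the consumption code never needs to know where the data came from.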

 

HOW MUCH DOES IT COST TO BUILD A BIG DATA SOFTWARE SOLUTION?

Big data applications are somewhat different from traditional applications. They are data-driven: they take advantage of huge amounts of data and work on extracting the valuable information underneath. The development process is also quite different, but successful traditional and big data projects share at least one component: the methodology used for implementation. The Agile methodology is more relevant than ever, and agile big data development is a must for a successful implementation.

A good strategy is needed when building big data solutions. Not every strategy will work, as we are addressing a very specific area. Big data strategy consulting can help companies avoid investing in solutions and infrastructure that do not really meet their needs.

In the process of building a big data application, the following approaches are used:

  • Development of analytic strategies to identify the hidden advantages in large amounts of data and become more competitive
  • Development and deployment of analytics tools to identify possible bottlenecks and performance or production issues
  • Development and deployment of applications that support decision making

The process can involve many more activities than those listed above, due to the inherent complexity of big data.

Big data consulting services are not as numerous as traditional software development consulting services. Developers with big data skills are currently scarce; not every developer has the necessary qualifications for a big data job, and many are not yet familiar with the complexities of deploying, managing, and tuning these applications. Fortunately, the infrastructure is getting simpler, so the focus can shift to building the applications that matter to companies. Big data consulting provides these skills and lets companies focus on their core business, so that they can benefit from big data without having to commit their own staff to setting up a big data solution.

Big data applications can be built on top of existing platforms and systems. Building them involves the following steps:

  • Design and infrastructure: designing the architecture of the big data application to be implemented. The design is based on several assumptions.
  • Hardware and network configuration, and implementation: once the application to be built is designed, hardware and software configuration is done.
  • System development, integration, training: the big data experts who will build the application are trained so that they better understand the company’s needs and goals and can implement the best solution. Application development then starts, following an agile big data development methodology. Starting from an assumption, lines of code are written; each piece of data can lead to new assumptions, which lead to new code. It is really important to work in very small increments, developing one assumption at a time: the assumption is analyzed and developed into actionable business information. Being agile matters because learning from feedback is a key aspect and advantage of big data. Feedback helps evaluate whether the direction taken was the right one or whether it should be adjusted in the following iteration.
  • Testing of the integrated system and launch in the production environment: the output developed from the assumption is tested and, if the tests pass, pushed to the production environment.
  • Infrastructure improvement: based on the iterations completed and the integration with the live system, we can work on making the infrastructure more reliable.

The following estimate does not take into account the infrastructure and hardware needed to implement big data. We should also remember that costs may change as the data evolves.

The following estimate considers the cost of building a big data application that addresses the following goals:

  • Improvement of partners’ reliability: acting on data about car owners and business partners.
  • The success of the scaling attempt: how to take advantage of data to extend the scope of the car rental business and reach a wider customer base.
  • Competitiveness: analyze data to be more competitive, identify strengths and weaknesses.
  • Renters’ reliability: define and measure reliability for safer operations. Identify the KPIs that define renter reliability, retrieve them from the data, and analyze them to identify the causes of potential failures, etc.
  • Personalized offers for customers based on their habits, their budget, etc.: understand how to tailor services that best meet the needs and budget of users.
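
As an illustration of the renter reliability goal, the sketch below computes two possible KPIs per renter. The KPIs themselves (late-return rate and damage rate), the RentalRecord fields, and the function name are our own assumptions; a real project would first agree on which indicators actually define reliability.

```python
from dataclasses import dataclass


@dataclass
class RentalRecord:
    renter_id: str
    returned_late: bool
    damage_reported: bool


def reliability_kpis(records: list[RentalRecord]) -> dict[str, dict[str, float]]:
    """Compute per-renter late-return and damage rates (hypothetical KPIs)."""
    totals: dict[str, list[int]] = {}
    for rec in records:
        # counts are [rentals, late returns, damage reports]
        counts = totals.setdefault(rec.renter_id, [0, 0, 0])
        counts[0] += 1
        counts[1] += int(rec.returned_late)
        counts[2] += int(rec.damage_reported)
    return {
        renter: {"late_rate": late / n, "damage_rate": damaged / n}
        for renter, (n, late, damaged) in totals.items()
    }


if __name__ == "__main__":
    history = [
        RentalRecord("r1", returned_late=False, damage_reported=False),
        RentalRecord("r1", returned_late=True, damage_reported=False),
        RentalRecord("r2", returned_late=False, damage_reported=True),
    ]
    for renter, kpis in reliability_kpis(history).items():
        print(renter, kpis)
```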

 

Work on the goals above is done in six-month iterations. Let’s refer to the first one as the big data car rental iteration. The following estimate covers the aforementioned goals within a six-month scope.

Since this is the first setup, the share of delivery time for each phase is as follows:

Phase | Time estimation
Architecture design | 10%
Hardware and software configuration | 10%
System development and integration | 50%
Testing and launch | 15%
Maintenance | 15%
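
To see what these shares mean in calendar time, they can be converted into months of a six-month iteration. The snippet below is a small worked calculation; the function and constant names are ours, and it assumes the percentages apply to the whole six-month scope.

```python
ITERATION_MONTHS = 6  # length of one iteration, as stated in the article

# Share of delivery time per phase, in percent.
PHASE_SHARE = {
    "Architecture design": 10,
    "Hardware and software configuration": 10,
    "System development and integration": 50,
    "Testing and launch": 15,
    "Maintenance": 15,
}


def phase_months(shares=PHASE_SHARE, months=ITERATION_MONTHS) -> dict[str, float]:
    """Convert percentage shares into months of calendar time."""
    if sum(shares.values()) != 100:
        raise ValueError("phase shares must add up to 100%")
    return {phase: months * share / 100 for phase, share in shares.items()}


if __name__ == "__main__":
    for phase, duration in phase_months().items():
        print(f"{phase}: {duration:.1f} months")
```

With these figures, system development and integration takes three of the six months.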

From these processes, we can identify the following roles involved in building a big data application:

  • Computer analysts
  • Business analysts
  • Developer engineers
  • Software developers
  • Database analysts
  • Data administrators
  • Mathematicians
  • Statisticians
  • Data analysts
  • System administrators
  • ETL developers

The big data software development team can be composed in several ways from the roles mentioned above.

Phase | Developer engineers / software engineers | Database analysts / DBA | Data analysts | Big data PM | Sys admin
Total | 40% | 10% | 15% | 20% | 15%
Architecture design | - | 3% | - | 2% | 5%
Hardware and software configuration | - | 2% | - | 2% | 5%
System development and integration | 20% | 2% | 10% | 10% | -
Testing and launch | 10% | 1% | 3% | 3% | -
Maintenance | 10% | 2% | 2% | 3% | 5%
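
The allocation above can be checked with a few lines of Python. Note the assumption: some phases list fewer than five values, so the sketch places the empty cells (marked 0) where each role's column adds up to the Total row; the names ROLES, STATED_TOTALS, ALLOCATION, and role_totals are ours.

```python
ROLES = ("Developers", "Database analysts", "Data analysts", "Big data PM", "Sys admin")

# Stated totals per role, in percent of the overall effort.
STATED_TOTALS = dict(zip(ROLES, (40, 10, 15, 20, 15)))

# Per-phase shares in percent; 0 marks a cell assumed to be empty in the table.
ALLOCATION = {
    "Architecture design": (0, 3, 0, 2, 5),
    "Hardware and software configuration": (0, 2, 0, 2, 5),
    "System development and integration": (20, 2, 10, 10, 0),
    "Testing and launch": (10, 1, 3, 3, 0),
    "Maintenance": (10, 2, 2, 3, 5),
}


def role_totals(allocation: dict[str, tuple[int, ...]]) -> dict[str, int]:
    """Sum each role's share over all phases."""
    columns = zip(*allocation.values())
    return {role: sum(column) for role, column in zip(ROLES, columns)}


if __name__ == "__main__":
    computed = role_totals(ALLOCATION)
    for role in ROLES:
        print(f"{role}: computed {computed[role]}%, stated {STATED_TOTALS[role]}%")
```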

Developing the first iteration for the goals listed previously requires:

  • 4 system administrators working full time for 1 month, charging $70/hour
  • 4 developer engineers working full time for 6 months, charging $80/hour
  • 4 database analysts and/or DBAs working full time for 1 month, charging $90/hour
  • 2 data analysts working full time for 6 months, charging $65/hour
  • 1 big data project manager working full time for 6 months, charging $100/hour
  • 1 QA tester working full time for 6 months, charging $70/hour

The big data module for the car rental service will cost approximately $698 000.

THE COST BREAKDOWN TABLE WITH THE COMPARISON OF THE PROJECT BUDGET IN THREE DIFFERENT LOCATIONS

The table below shows the hourly rate of each specialist and the overall cost when hiring from a third party, such as a software development company with big data expertise. It also compares how much developing the big data software solution would cost in the US, in Western Europe, and in an offshore outsourcing location, Eastern Europe in our case. The average cost of services is calculated from open sources such as Glassdoor and from rate cards provided by several companies.

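Specialist | Number of devs | Months of engagement | US hourly rate ($) | Western Europe hourly rate ($) | Eastern Europe hourly rate ($)
Total cost | - | - | $697 600 | $508 800 | $252 160
System administrator | 4 | 1 | 70 | 50 | 25
Software engineer | 4 | 6 | 80 | 60 | 30
Database analyst | 4 | 1 | 90 | 70 | 33
Data analyst | 2 | 6 | 65 | 45 | 27
Project manager | 1 | 6 | 100 | 65 | 30
QA engineer | 1 | 6 | 70 | 55 | 20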
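
The totals in the table can be reproduced with a short calculation. One assumption is needed: the article does not state how many hours a full-time month contains, but 160 billable hours per month (40 hours a week for 4 weeks) yields exactly the three totals shown. The names HOURS_PER_MONTH, TEAM, REGIONS, and total_costs are ours.

```python
HOURS_PER_MONTH = 160  # assumption: 40 hours/week x 4 weeks, not stated in the article

REGIONS = ("US", "Western Europe", "Eastern Europe")

# (specialist, headcount, months, US rate, Western Europe rate, Eastern Europe rate)
# Rates are in dollars per hour, taken from the table above.
TEAM = [
    ("System administrator", 4, 1, 70, 50, 25),
    ("Software engineer", 4, 6, 80, 60, 30),
    ("Database analyst", 4, 1, 90, 70, 33),
    ("Data analyst", 2, 6, 65, 45, 27),
    ("Project manager", 1, 6, 100, 65, 30),
    ("QA engineer", 1, 6, 70, 55, 20),
]


def total_costs(team=TEAM, hours_per_month=HOURS_PER_MONTH) -> dict[str, int]:
    """Total project cost per region: headcount x months x hours x hourly rate."""
    totals = dict.fromkeys(REGIONS, 0)
    for _name, headcount, months, *rates in team:
        hours = headcount * months * hours_per_month
        for region, rate in zip(REGIONS, rates):
            totals[region] += hours * rate
    return totals


if __name__ == "__main__":
    for region, cost in total_costs().items():
        print(f"{region}: ${cost:,}")
```

Changing hours_per_month, the headcount, or the months of engagement in TEAM is a quick way to see how sensitive the budget is to each parameter; software engineers account for the largest share in all three locations.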

If we pursue big data exploration with an additional iteration, we repeat the steps from system development to infrastructure improvement; infrastructure design and hardware configuration cost nothing, or very little, once they are first set up.

There is no limit to building big data solutions; it is an ongoing process in which every output leads to another axis of improvement.

CONCLUSION

Today we are facing a rapid rise and evolution of technologies, and companies may sometimes feel lost in front of the vast and unknown possibilities. Many big data consulting services are available to help them analyze and choose the big data tools that best fit their business. These solutions are tailored so that companies can adapt their services to the fast-changing demands of often very versatile customers.

Today, the IoT makes information more readily available: information that used to take companies years to acquire is now within reach, which opens up more and more opportunities. Working with the right approach and adopting the right solution is key to business success.

Because of this complexity, we advise starting small when implementing big data for the first time, and then running as many iterations as needed to reach the expected outcome. A gradual implementation enables the people involved to master the technologies and the business, and to become more comfortable with the tools they are using. It is important to surround yourself with experts to develop the big data applications that best fit the business.

What is your big data strategy for the upcoming years? Do you plan to use BD and the advantages it offers you and your business? Share your thoughts and opinions in the comment section below. Need a quote for the development cost of your big data solution, or help with building it? Contact us via the form on the website or by email and we will help you with that.