Logical Layers for Big Data Solutions
Logical layers offer a way to organize components that perform specific functions. These layers are purely logical; the functions supporting each layer do not necessarily run on separate machines or in separate processes. A Big Data solution typically consists of the following logical layers:
1. Big Data Sources
2. Data Alteration (massaging) and Storage Layer
3. Analytics Layer
4. Usage Layer
Big Data Sources: Consider all of the data available for analysis, coming from all sources. Ask the data scientists in your organization to articulate the data needed to perform the kind of analysis you need. The data varies in format and origin:
Format- Structured, semi-structured, or unstructured.
Velocity and volume of data- The speed at which data arrives and the rate at which it is delivered vary by data source.
Collection point- The location where the data is collected, either directly or through a data provisioning program, in real time or in batch mode. Data may come from a primary source, such as weather conditions, or from a secondary source, such as a media-sponsored weather channel.
Location of data sources- Data sources may be located within or outside the organization. Identify data to which you have limited access, as access to data affects the scope of data available for analysis.
Data alteration and storage layer: This layer is responsible for taking data from the sources and, where necessary, converting it into a format suited to how the data will be analyzed. For example, an image may need to be transformed before it can be stored in a Hadoop Distributed File System (HDFS) store or a Relational Database Management System (RDBMS) warehouse for further processing. Compliance regulations and governance policies dictate the appropriate storage for different data types.
Analytics layer: The analytics layer reads the data digested (collated) by the data alteration and storage layer. In some cases, the analytics layer accesses data directly from the data source. Designing the analytics layer requires careful forethought and planning. Decisions must be made about how to manage the following tasks:
Generate the desired analytics
Get insights from the data
Find the desired entities
Locate the data sources that can provide data about those entities
Understand the algorithms and tools needed to perform the analytics
Usage layer: This layer consumes the output provided by the analytics layer. Consumers may be visualization applications, humans, business processes, or services. Visualizing the results of the analytics layer can be challenging. Sometimes it helps to look at what competitors in similar markets are doing.
Each layer contains multiple component types, which are described below.
Big Data Sources
This layer contains all the necessary data sources that provide the insights needed to solve business problems. The data is structured, semi-structured and unstructured and comes from many sources:
1. Enterprise legacy systems- These enterprise applications supply the data from which the analyses and insights the business needs are derived:
Customer Relationship Management Systems
Billing Operations
Mainframe Applications
Enterprise Resource Planning
Web Application Development
Web applications and other data sources augment the data that the enterprise owns. These applications can expose data using customized protocols and mechanisms.
2. Data Management System (DMS)- A DMS stores legal data, processes, policies, and a variety of other kinds of documents:
Microsoft Excel spreadsheets
Microsoft Word documents
These documents can be converted into structured data that can be used for analysis. Document data can be exposed as domain entities, or the data alteration and storage layer can transform it into domain entities.
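For illustration, here is a minimal Python sketch of turning loosely named document rows into a domain entity; the Customer entity and the column names are hypothetical assumptions, not part of any prescribed schema:

```python
# A minimal sketch of converting semi-structured document rows into a
# domain entity. The Customer dataclass and field names are hypothetical.
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    name: str
    region: str

def row_to_entity(row: dict) -> Customer:
    # Normalize loosely named spreadsheet columns into a fixed schema.
    return Customer(
        customer_id=str(row.get("ID") or row.get("CustomerId", "")).strip(),
        name=str(row.get("Name", "")).strip(),
        region=str(row.get("Region", "unknown")).strip().lower(),
    )

rows = [{"ID": " C-001 ", "Name": "Acme Corp", "Region": "EMEA"}]
print([row_to_entity(r) for r in rows])
```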
3. Data stores- Data stores include enterprise data warehouses, operational databases, and transactional databases. This data is usually structured and can be used directly or easily transformed to meet requirements. Whether this data is stored in a distributed file system depends on the context.
4. Smart Devices- Smart devices are capable of capturing, processing, and transmitting information over a wide range of protocols and formats. Examples include smart phones, smart meters, and medical devices. These devices can be used to perform various kinds of analysis. More often than not, smart devices support real-time analytics, but the information originating from them can also be analyzed in batch.
5. Aggregated Data Providers- These providers own or acquire data and expose it through specific filters, in sophisticated formats, and at desired frequencies. Massive volumes of data arrive every day, in different formats, at different velocities, and through a variety of data providers, sensors, and existing enterprises.
6. Other Data Sources- A great deal of data comes from automated sources:
Geographic information:
Maps
Area details
Location details
Mine details
Human-generated content:
Social media
Blogs
Online information
Sensor data:
Environmental: Weather, rainfall, humidity, light
Electrical: Current, voltage, energy, and so on
Navigation devices
Ionizing radiation, subatomic particles, etc.
Proximity, presence, etc.
Position, angle, displacement, distance, speed, acceleration
Sound, acoustic vibration, etc.
Automotive, transportation, etc.
Heat, temperature, etc.
Optics, light, imaging, visibility
Chemical
Pressure
Flow, Fluid, Velocity
Force, Density Levels, etc.
Other Data from Sensor Vendors
Data Alteration and Storage Layer
Because incoming data can have different characteristics, the components in the data alteration and storage layer must be able to read data at various frequencies, in various formats and sizes, and over a variety of communication channels:
Data Fetching- Fetches data from the various data sources and either sends it to the data collation component or stores it in a specified location. This component must be smart enough to choose whether and where to store incoming data, and to determine whether the data should be altered before storage or can be sent directly to the business analytics layer.
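A minimal Python sketch of this routing decision follows; the routing rules and record fields are illustrative assumptions, not a prescribed design:

```python
# A minimal sketch of the routing decision described above: incoming
# records are either stored as-is, altered first, or passed straight to
# analytics. The rules and record fields are illustrative assumptions.
def route(record: dict) -> str:
    if record.get("format") == "binary":
        return "alter-then-store"     # e.g., images need conversion first
    if record.get("latency") == "real-time":
        return "direct-to-analytics"  # skip storage for time-critical data
    return "store"                    # default: land in the data store

for rec in [{"format": "binary"}, {"latency": "real-time"}, {"format": "csv"}]:
    print(rec, "->", route(rec))
```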
Data Collation- Responsible for modifying the data into the desired format for analytical purposes. This component can have simple transformation logic or complex statistical algorithms to transform the source data. The analysis engine will determine the specific data format required. The main challenge is to accommodate unstructured data formats such as images, audio, video and other binary formats.
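As a toy illustration of such transformation logic, the following sketch collates records arriving as JSON or CSV into one common format; the field names and source formats are assumptions:

```python
# A minimal collation sketch: records arriving as CSV lines or JSON
# strings are normalized into one common dict that a downstream
# analytics engine could consume. Field names are assumptions.
import csv
import io
import json

def collate(raw: str) -> dict:
    if raw.lstrip().startswith("{"):
        rec = json.loads(raw)                  # JSON source
    else:
        reader = csv.reader(io.StringIO(raw))  # CSV source
        ts, sensor, value = next(reader)
        rec = {"ts": ts, "sensor": sensor, "value": value}
    rec["value"] = float(rec["value"])         # unify types
    return rec

print(collate('{"ts": "2024-01-01T00:00:00", "sensor": "s1", "value": "3.2"}'))
print(collate("2024-01-01T00:01:00,s2,4.7"))
```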
Distributed Data Store- Responsible for storing data from the data sources. Typically, several storage options are available in this layer, such as Distributed File Storage (DFS), cloud storage, structured data sources, NoSQL stores, and others.
Analytics Layer
This is the layer that extracts business insights from the data:
Entity Recognition- Responsible for identifying and populating contextual entities. This is a complex task that requires efficient, high-performance processes. The data collation component should complement this entity recognition component by altering the data into the desired format. The analytics engine needs contextual entities to perform the analysis.
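A simple illustration of entity recognition follows, using regular expressions; production systems typically rely on trained NLP models, and the entity types and patterns shown are assumptions:

```python
# A minimal entity-recognition sketch using regular expressions; the
# entity types and patterns are illustrative assumptions.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "order_id": re.compile(r"\bORD-\d{6}\b"),
}

def extract_entities(text: str) -> dict:
    # Return every match for each named entity pattern.
    return {name: pat.findall(text) for name, pat in PATTERNS.items()}

print(extract_entities("Order ORD-123456 was reported by jane@example.com"))
```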
Analytics Engine- Uses other components (specifically, entity identification, model management, and analytics algorithms) to process and perform analytics. Analysis engines can have a variety of different workflows, algorithms, and tools that support parallel processing.
Model Management- Responsible for maintaining the various statistical models, validating and testing those models, and improving their accuracy through continuous training. The model management component then promotes these models, which can be used by the entity recognition or analytics engine components.
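The promote-after-validation idea can be sketched as follows; the accuracy metric, threshold, registry shape, and toy model are illustrative assumptions:

```python
# A minimal sketch of promote-after-validation: a candidate model is
# promoted only if it beats an accuracy threshold on held-out data.
registry = {"production": None}

def validate(model, test_set) -> float:
    # Fraction of test examples the model labels correctly.
    correct = sum(1 for x, y in test_set if model(x) == y)
    return correct / len(test_set)

def promote_if_better(model, test_set, threshold=0.9) -> float:
    accuracy = validate(model, test_set)
    if accuracy >= threshold:
        registry["production"] = model  # promote for downstream use
    return accuracy

# Toy model: classify a reading as "high" when it exceeds 10.
toy_model = lambda x: "high" if x > 10 else "low"
test_set = [(12, "high"), (3, "low"), (15, "high"), (8, "low")]
print(promote_if_better(toy_model, test_set), registry["production"] is toy_model)
```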
Usage Layer
This layer uses business insights obtained from analytic applications. The results of the analytics are used by individual users within the organization and by entities external to the organization, such as customers, suppliers, partners, and providers. This insight can be used to target customers with product marketing messages. For example, with insights from analytics, a company can use customer preference data and location awareness to provide personalized marketing messages to customers as they pass through aisles or stores.
The insights can be used to detect fraud, intercepting transactions in real time and correlating them with views built using data already stored in the organization. When fraudulent transactions occur, customers can be informed of the potential for fraud so that corrective actions can be taken in a timely manner.
In addition, business processes can be triggered based on the analysis completed in the data alteration layer. Automated steps can be initiated, as sketched below: for example, the process to create a new order can be triggered automatically if a customer accepts a marketing message, and a block on credit card use can be triggered if a customer reports fraud.
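Here is a minimal sketch of such event-driven triggering; the event names and handlers are hypothetical, and a real system would invoke BPEL processes or APIs rather than local functions:

```python
# A minimal sketch of triggering downstream processes from analytics
# events. Event names and handlers are assumptions for illustration.
def create_order(event: dict) -> None:
    print(f"creating order for customer {event['customer_id']}")

def block_card(event: dict) -> None:
    print(f"blocking card for customer {event['customer_id']}")

HANDLERS = {"offer_accepted": create_order, "fraud_reported": block_card}

def on_event(event: dict) -> None:
    # Dispatch each analytics event to its business-process handler.
    handler = HANDLERS.get(event["type"])
    if handler:
        handler(event)

on_event({"type": "offer_accepted", "customer_id": "C-001"})
on_event({"type": "fraud_reported", "customer_id": "C-002"})
```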
The output of the analytics can also be used by the recommendation engine, which matches customers with their favorite products. The recommendation engine analyzes the available information and provides personalized and real-time recommendations.
The usage layer also provides internal users with the ability to understand, find, and navigate federated information inside and outside the organization. For internal users, the ability to build reports and dashboards enables business stakeholders to make informed decisions and design appropriate strategies. To improve operational effectiveness, real-time business alerts can be generated from the data, and operational KPIs can be monitored:
Transaction Interceptor- This component intercepts high-volume transactions in real time and converts them into a format the analytics layer can readily understand, so that real-time analytics can run on the incoming data. The transaction interceptor should be able to integrate with and process data from a variety of sources, such as sensors, smart meters, microphones, cameras, GPS devices, ATMs, and image scanners. Various kinds of adapters and APIs can be used to connect to the data sources. Various accelerators can also be used to simplify development, such as real-time optimization and streaming analytics, video analytics, accelerators for banking, insurance, retail, telecom, and public transportation, social media analytics, and sentiment analysis.
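A minimal sketch of the interception step follows; the input format and the toy flagging rule are assumptions:

```python
# A minimal transaction-interceptor sketch: high-volume raw events are
# converted into a compact form for real-time analysis. The source
# format and the flagging rule are illustrative assumptions.
from typing import Iterable, Iterator

def intercept(transactions: Iterable[dict]) -> Iterator[dict]:
    for txn in transactions:
        amount = float(txn["amount"])
        yield {
            "account": txn["account"],
            "amount": amount,
            "suspicious": amount > 10_000,  # toy rule, not a real fraud model
        }

stream = [{"account": "A1", "amount": "250.00"},
          {"account": "A2", "amount": "25000.00"}]
for event in intercept(stream):
    print(event)
```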
Business Process Management Processes- Insights from the analytics layer are available to Business Process Execution Language (BPEL) processes, APIs, or other business processes to further capture business value by automating the functionality of upstream and downstream IT applications, people, and processes.
Real-time monitoring- Data derived from analytics can be used to generate real-time alerts. Alerts can be sent to interested users and devices such as smart phones and tablets. Data insights generated from the analytics component can be used to define and monitor key performance indicators to determine operational effectiveness. Real-time data can be exposed to business users in the form of dashboards from a variety of sources to monitor the health of the system or to measure the effectiveness of marketing campaigns.
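For example, KPI monitoring with threshold alerts might be sketched as follows; the KPI names and thresholds are assumptions:

```python
# A minimal sketch of KPI monitoring with real-time alerts: each KPI
# has a threshold, and breaches produce an alert message.
THRESHOLDS = {"error_rate": 0.05, "avg_latency_ms": 500}

def check_kpis(metrics: dict) -> list:
    # Compare each reported metric against its configured threshold.
    return [f"ALERT: {kpi}={value} exceeds {THRESHOLDS[kpi]}"
            for kpi, value in metrics.items()
            if kpi in THRESHOLDS and value > THRESHOLDS[kpi]]

print(check_kpis({"error_rate": 0.08, "avg_latency_ms": 320}))
```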
Reporting Engine- The ability to generate reports similar to traditional BI reports is critical. Users can create ad hoc reports, scheduled reports, or self-service queries and analytics based on insights gained from the analytics layer.
Recommendation Engine- Based on analytics from the analytics layer, the recommendation engine delivers real-time, relevant, and personalized recommendations to shoppers, increasing conversions and average value per order in e-commerce transactions. The engine processes available information in real-time and responds dynamically to each user, based on the user's real-time activity, registered customer information stored in the CRM system, and the social profiles of non-registered customers.
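As a toy illustration of one underlying idea, the following sketch recommends items that co-occur in past orders; real engines combine real-time activity, CRM data, and social profiles as described above, and the sample orders here are fabricated purely for illustration:

```python
# A minimal co-occurrence recommendation sketch: items bought together
# in past orders are recommended alongside each other.
from collections import Counter
from itertools import combinations

orders = [{"milk", "bread"}, {"milk", "butter"}, {"bread", "butter", "milk"}]

co_occurrence = Counter()
for order in orders:
    for a, b in combinations(sorted(order), 2):
        co_occurrence[(a, b)] += 1  # count the pair in both directions
        co_occurrence[(b, a)] += 1

def recommend(item: str, k: int = 2) -> list:
    scores = Counter({b: n for (a, b), n in co_occurrence.items() if a == item})
    return [other for other, _ in scores.most_common(k)]

print(recommend("milk"))
```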
Visualization and Discovery- Data can be navigated across a wide variety of federated data sources both inside and outside the enterprise. Data may have different content and formats, and all data (structured, semi-structured and unstructured) can be combined for visualization and made available to users. This capability enables organizations to combine their traditional enterprise content, contained in enterprise content management systems and data warehouses, with new social content, such as tweets and blog posts, into a single user interface.
Vertical Layer
The vertical layers encompass aspects that affect every component of the logical layers (the Big Data sources, data alteration and storage, analytics, and usage layers):
Information Integration
Big Data Governance
System Management
Quality of Service
Information Integration
Big Data applications fetch data from a variety of data origins, providers, and data sources and store it in data storage systems such as HDFS, NoSQL, and MongoDB. This vertical layer is used by the various components responsible for connecting to data sources (e.g., data fetching, data collation, model management, and transaction interceptors). Integrating information from data sources with different characteristics (e.g., protocols and connectivity) requires high-quality connectors and adapters. Accelerators are available to connect to most known and widely used sources, including social media adapters and weather data adapters. Various components can also use this layer to store information in, and retrieve information from, Big Data stores in order to process it. Most Big Data stores provide services and APIs for storing and retrieving this information.
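The connector/adapter idea can be sketched as a common read interface implemented once per source; the adapter classes below are hypothetical:

```python
# A minimal sketch of the connector/adapter idea: each data source gets
# an adapter with a common read interface, so other components can stay
# source-agnostic. The adapter classes shown are hypothetical.
import csv
from abc import ABC, abstractmethod
from typing import Iterator

class SourceAdapter(ABC):
    @abstractmethod
    def read(self) -> Iterator[dict]: ...

class CsvFileAdapter(SourceAdapter):
    def __init__(self, path: str):
        self.path = path
    def read(self) -> Iterator[dict]:
        with open(self.path, newline="") as f:
            yield from csv.DictReader(f)

class InMemoryAdapter(SourceAdapter):
    def __init__(self, records: list):
        self.records = records
    def read(self) -> Iterator[dict]:
        yield from self.records

# Downstream code iterates records without knowing the source type.
for rec in InMemoryAdapter([{"id": 1}]).read():
    print(rec)
```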
Big Data Governance
Data governance involves defining guidelines to help organizations make the right decisions about data. Big data governance helps in dealing with the complexity, volume and variety of data coming into the organization or from external sources. Strong guidelines and processes are needed to monitor, structure, store and protect data as it comes into the enterprise for processing, storage, analysis and purging or archiving.
Big data governance encompasses other factors in addition to the normal data governance considerations:
1. Managing high volumes of data in a variety of formats.
2. Continuously training and managing the statistical models needed to preprocess unstructured data and run analytics. Remember, training is an important step when dealing with unstructured data.
3. Setting policies and compliance regulations for external data regarding its retention and use.
4. Defining data archiving and purging policies (a sketch of such a policy check follows this list).
5. Creating policies on how to replicate data across various systems.
6. Setting data encryption policies.
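Here is a minimal sketch of the archiving and purging policy check mentioned in item 4; the data classes, retention windows, and default are assumptions:

```python
# A minimal retention-policy sketch: each data class gets a retention
# window, after which records are purged. Windows are assumptions.
from datetime import datetime, timedelta, timezone

RETENTION = {"transaction": timedelta(days=365), "sensor": timedelta(days=30)}

def disposition(record: dict, now: datetime) -> str:
    # Records older than their class's retention window are purged.
    age = now - record["created"]
    limit = RETENTION.get(record["class"], timedelta(days=90))  # default window
    return "purge" if age > limit else "retain"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
rec = {"class": "sensor", "created": datetime(2024, 1, 1, tzinfo=timezone.utc)}
print(disposition(rec, now))
```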
Quality of Service Layer
This layer is responsible for defining data quality, policies around privacy and security, the frequency of data, the size of data per crawl, and data filters:
Data Quality
1. Completeness in identifying all the necessary data elements
2. Timeliness of data, with an acceptable level of freshness
3. Accuracy of data, validated according to the data accuracy rules
4. Adherence to a common language (data tuples expressed in simple business language)
5. Consistency of data from multiple systems, validated according to the data consistency rules
6. Technical conformance with data specifications and information architecture guidelines
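A few of these rules can be sketched as simple record checks; the field names, freshness window, and range rule are illustrative assumptions:

```python
# A minimal data-quality sketch applying completeness, timeliness, and
# accuracy rules to a record. Fields and limits are assumptions.
from datetime import datetime, timedelta

REQUIRED = ["id", "value", "updated"]

def quality_issues(rec: dict, now: datetime) -> list:
    issues = [f"missing field: {f}" for f in REQUIRED if f not in rec]
    if "updated" in rec and now - rec["updated"] > timedelta(hours=24):
        issues.append("stale: older than 24h")   # timeliness rule
    if "value" in rec and not (0 <= rec["value"] <= 100):
        issues.append("out of range: value")     # accuracy rule
    return issues

now = datetime(2024, 6, 1, 12, 0)
print(quality_issues({"id": 1, "value": 150, "updated": now}, now))
```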
Strategies around privacy and security
Strategies are needed to protect sensitive data. Data obtained from external organizations and providers may contain sensitive information (such as the contact information of Facebook users or product pricing information). Data can originate in different regions and countries and must be treated accordingly. Decisions must be made about data masking and about how such data is stored. Consider the following data access strategies:
A. Data Availability
B. Data Criticality
C. Data Authenticity
D. Data Sharing and Distribution
E. Data Storage and Retention, including questions such as whether external data can be stored. If data can be stored, how long can it be stored? What types of data can be stored?
F. Data provisioning constraints (policy, technical, and regional)
G. Social media terms of use
Frequency of data
What is the frequency with which fresh data is provided? Is it on-demand, continuous or offline?
Size of data crawled
This attribute helps define the data that can be crawled and the size of the data that can be used after each crawl.
Filter
Standard filters remove unwanted data and noise, leaving only the data needed for analysis.
Systems management
Systems management is critical to big data because it involves many systems across enterprise clusters and boundaries. Monitoring the health of the entire big data ecosystem includes:
A. Managing system logs, virtual machines, applications, and other devices
B. Correlating the various logs to help investigate and monitor specific scenarios
C. Monitoring real-time alerts and notifications
D. Using real-time dashboards that display a wide range of parameters
E. Producing reports and detailed analyses about the system
F. Setting and adhering to service level agreements
G. Managing storage and capacity
H. Archiving and managing archive retrieval
I. Performing system recovery, cluster management, and network management
J. Managing policies
Concluding remarks
For developers, the layers offer a way to categorize the functions a Big Data solution must perform and suggest an organization for the code that must address those functions. For business users who want to gain insights from Big Data, however, it often helps to think in terms of Big Data requirements and scope. Atomic patterns, which address the mechanisms for accessing, processing, storing, and consuming Big Data, give business users a way to address those requirements and scope. The next article describes atomic patterns for this purpose.