Big Data Training Course Introduction: What a Big Data Course Covers

The following course introduction gives an easy-to-understand overview of each stage for zero-basics big data engineers, so that readers can better understand the big data learning program. The framework of the course is Kodo Big Data's Zero-Basics Big Data Engineer program.

I. Stage 1: Static Web Page Fundamentals (HTML+CSS)

1. Difficulty: one star

2. Class hours (technical knowledge points + stage project tasks + comprehensive ability)

3. The main technologies include: common HTML tags; common CSS layout, styling, and positioning; and methods for designing and producing static pages.

4. Described as follows:

From a technical standpoint, the code used in this stage is simple, easy to learn, and easy to understand. From a curriculum standpoint, although our focus is big data, the early stages need to exercise programming skills and thinking. Based on years of development experience and analysis by our teaching project managers, the technology on the current market that best satisfies these two points, in terms of comprehension and mastery, is J2EE, and J2EE cannot be separated from page technology. So the focus of our first stage is page technology, using the market-mainstream HTML + CSS.

II. Stage 2: JavaSE + JavaWeb

1. Difficulty: two stars

2. Class hours (technical knowledge points + stage project tasks + comprehensive ability)

3. The main technologies include: Java basic syntax, Java object orientation (classes, objects, encapsulation, inheritance, polymorphism, abstract classes, interfaces, common classes, inner classes, common modifiers, etc.), exceptions, collections, files, IO, MySQL (basic SQL statement manipulation, multi-table queries, subqueries, stored procedures, transactions, distributed transactions), JDBC, threads, reflection, socket programming, enumerations, generics, design patterns

4. Described as follows:

Known as the Java fundamentals stage, it moves from shallow to deep technical points, analyzes the modules of real business projects, and covers the design and implementation of a variety of storage methods. This is the most important of the first four stages, because all later stages build on it, and it is also the stage most tightly tied to big data. It is also learners' first exposure to team development, producing a real project with a front end and a back end (a combined application of Stage 1 and Stage 2 technologies).
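
As a taste of the topics listed above, here is a minimal, self-contained Java sketch (the class names are invented for illustration) that combines encapsulation, inheritance, polymorphism, and a generic collection:

    // Sketch of Stage 2 topics: encapsulation, inheritance, polymorphism,
    // and generic collections. All names are illustrative.
    import java.util.ArrayList;
    import java.util.List;

    abstract class Employee {
        private final String name;              // encapsulated field

        Employee(String name) { this.name = name; }

        String getName() { return name; }

        abstract double monthlyPay();           // subclasses supply the behavior
    }

    class Engineer extends Employee {
        private final double annualSalary;

        Engineer(String name, double annualSalary) {
            super(name);
            this.annualSalary = annualSalary;
        }

        @Override
        double monthlyPay() { return annualSalary / 12; }
    }

    public class PayrollDemo {
        public static void main(String[] args) {
            List<Employee> staff = new ArrayList<>();   // generics in action
            staff.add(new Engineer("Alice", 120_000));
            for (Employee e : staff) {                  // polymorphic dispatch
                System.out.printf("%s earns %.2f per month%n",
                                  e.getName(), e.monthlyPay());
            }
        }
    }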

III. Stage 3: Front-End Framework

1. Difficulty: two stars

2. Class hours (technical knowledge points + stage project tasks + comprehensive ability): 64 hours

3. The main technologies include: Java, jQuery, combined use of annotations and reflection, XML and XML parsing with dom4j, JAXB, JDK 8.0 new features, SVN, Maven, EasyUI

4. Described as follows:

Building on the first two stages, static pages become dynamic, and the content of our web pages becomes much richer. Of course, looking at the market for personnel, there are professional front-end designers; the goal we designed for this stage is that front-end technology can more intuitively exercise a learner's thinking and design ability. At the same time, we also integrate the advanced features of the second stage into this stage, which takes learners to the next level.
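
Of the technologies listed above, XML parsing with dom4j is the easiest to show in a few lines. The following sketch assumes the dom4j library is on the classpath; the file name and element names are invented for illustration:

    // Minimal dom4j sketch: parse an XML file and walk its elements.
    import java.io.File;
    import org.dom4j.Document;
    import org.dom4j.Element;
    import org.dom4j.io.SAXReader;

    public class XmlDemo {
        public static void main(String[] args) throws Exception {
            SAXReader reader = new SAXReader();
            Document doc = reader.read(new File("students.xml")); // hypothetical input
            Element root = doc.getRootElement();
            for (Object o : root.elements("student")) {  // cast keeps this compatible
                Element student = (Element) o;           // with older dom4j versions
                System.out.println(student.attributeValue("name")
                        + " -> " + student.elementText("score"));
            }
        }
    }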

IV. Stage 4: Enterprise-Level Development Frameworks

1. Difficulty: three stars

2. Class hours (technical knowledge points + stage project tasks + comprehensive ability)

3. The main technologies include: Hibernate, Spring, SpringMVC, log4j and slf4j integration, myBatis, Struts2, Shiro, Redis, the Activiti process engine, the Nutch crawler, Lucene, WebService CXF, Tomcat clustering and hot standby, MySQL read-write separation

4. Described as follows:

If the entire Java course is compared to a pastry shop, then the first three stages let you open a Wu Dalang bun shop (purely handmade, and a lot of trouble), while learning the frameworks lets you open a Starbucks (high-tech equipment that saves time and effort). Judging from the job requirements for J2EE development engineers, the technologies used in this stage are must-haves, and the course we teach goes beyond the market (the market mainstream is the three major frameworks; we teach seven framework technologies), driven by real business projects. Requirements documents, outline design, detailed design, source code, testing, deployment, and installation manuals will all be explained.
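
To make the "saves time and effort" point concrete, here is a minimal Spring dependency-injection sketch (it assumes spring-context, version 4 or later, on the classpath; all class names are invented). The container constructs and wires the object, replacing the plumbing you would otherwise write by hand:

    // Minimal Spring sketch: the container builds the bean; calling code
    // just asks for it. Names are illustrative.
    import org.springframework.context.annotation.AnnotationConfigApplicationContext;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    class GreetingService {
        String greet(String name) { return "Hello, " + name; }
    }

    @Configuration
    class AppConfig {
        @Bean
        GreetingService greetingService() { return new GreetingService(); }
    }

    public class SpringDemo {
        public static void main(String[] args) {
            try (AnnotationConfigApplicationContext ctx =
                     new AnnotationConfigApplicationContext(AppConfig.class)) {
                GreetingService svc = ctx.getBean(GreetingService.class);
                System.out.println(svc.greet("Kodo")); // the container supplied svc
            }
        }
    }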

V. Stage 5: Getting to Know Big Data

1. Difficulty: three stars

2. Class hours (technical knowledge points + stage project tasks + comprehensive ability)

3. The main technologies include: big data overview (what big data is, application scenarios, how to learn big data, the concept and installation of virtual machines, and so on), common Linux commands (file management, system management, disk management), Linux shell programming (shell variables, loop control, applications), Hadoop introduction (Hadoop components, standalone environment, directory structure, HDFS interface, MR interface, simple shell, Java access to Hadoop), HDFS (introduction, shell, use of the IDEA development tool, building a fully distributed cluster), MapReduce applications (intermediate computation process, operating MapReduce from Java, program execution, log monitoring), Hadoop advanced applications (introduction to the YARN framework, configuration items and optimization, introduction to CDH, environment setup), extensions (MAP-side optimization, COMBINER usage, TOP K, SQOOP export, snapshots of VM virtual machines, permission-management commands, AWK and SED commands)

4. Described as follows:

This stage is designed to give newcomers a big-picture concept of big data. How so? After studying Java in the earlier courses, you can understand how a program runs on a single machine. Now, what about big data? Big data means running programs on a large-scale cluster of machines to process data. Big data is, of course, about processing data, so likewise, data storage changes from single-machine storage to massive cluster storage across many machines.

(You ask what a cluster is? Well, say I have a big pot of rice. I can finish it alone, but it takes a long time, so I call in some people. One person alone is just a person; what do you call it when there are many people? A crowd, isn't it?)

So big data can be roughly divided into big data storage and big data processing. In this stage, then, our course covers the de facto standard of big data: HADOOP. And big data does not run on the familiar Windows 7 or Windows 10 but on the most widely used system in this space: LINUX.
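
As a concrete example of "running a program on a cluster", here is a compact sketch of the classic WordCount job in the Hadoop MapReduce Java API (it assumes hadoop-client on the classpath; input and output paths come from the command line):

    // WordCount: the mapper emits (word, 1); the reducer sums the counts.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        ctx.write(word, ONE);           // emit (word, 1)
                    }
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));   // total per word
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class);     // the COMBINER listed above
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }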

VI. Stage 6: Big Data Databases

1. Difficulty: four stars

2. Class hours (technical knowledge points + stage project tasks + comprehensive ability)

3. The main technologies include: Hive introduction (introduction to Hive, Hive usage scenarios, environment setup, architecture, working mechanism), Hive shell programming (table creation, query statements, partitioning and bucketing, index management and views), Hive advanced applications (DISTINCT implementation, GROUP BY, JOIN, SQL transformation principles, Java programming, configuration and optimization), HBase introduction, HBase shell programming (DDL, DML, Java operations for table creation, queries, compression, filters), HBase modules in detail (REGION, HREGION SERVER, HMASTER, introduction to ZOOKEEPER, ZOOKEEPER configuration, HBase and ZooKeeper integration), HBASE advanced features (read and write processes, data model, schema design, read and write hotspots, optimization and configuration)

4. Described as follows:

This stage is designed to help you understand how big data handles large-scale data, simplifying and shortening the time it takes us to write programs while improving read speed.

How is it simplified? In the previous stage, if you needed to do complex business correlation and data mining, writing your own MR programs was very cumbersome. So in this stage we introduce HIVE, the data warehouse of big data. Note the keyword: data warehouse. I know you're about to ask, so let me say up front that a data warehouse, used for data mining and analysis, is usually a very large data center. The data is stored in ORACLE, DB2, and other large databases, which are normally used for real-time online business.

In short, analyzing data on top of a data warehouse is relatively slow. But the convenience is that as long as you are familiar with SQL, it is relatively simple to learn, and HIVE is exactly such a tool: an SQL query tool built on big data. This stage also includes HBASE, the database within big data. Wait, didn't we just learn a data "warehouse" called HIVE? HIVE is based on MR, so its queries are quite slow; HBASE, built on big data, can query data in real time. One is mainly for analysis, the other mainly for queries.
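
To give a flavor of HBASE's real-time query side, here is a minimal sketch using the HBase Java client (it assumes hbase-client on the classpath and a reachable cluster; the table and column names are invented):

    // Write one cell to HBase, then read it back immediately.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn =
                     ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("user_profile"))) {
                // Write: row key, column family, qualifier, value
                Put put = new Put(Bytes.toBytes("u001"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                              Bytes.toBytes("Alice"));
                table.put(put);

                // Read the same cell back in real time
                Result result = table.get(new Get(Bytes.toBytes("u001")));
                byte[] name = result.getValue(Bytes.toBytes("info"),
                                              Bytes.toBytes("name"));
                System.out.println(Bytes.toString(name));
            }
        }
    }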

VII. Stage 7: Real-Time Data Collection

1. Difficulty: four stars

2. Class hours (technical knowledge points + stage project tasks + comprehensive ability)

3. The main technologies include: Flume log collection, KAFKA introduction (message queues, application scenarios, cluster setup), KAFKA in detail (partitions, topics, receivers, senders, ZOOKEEPER integration, shell development, shell debugging), KAFKA advanced usage (Java development, main configurations, optimization projects), data visualization (introduction to graphs and charts, classification of CHARTS tools, bar charts and pie charts, 3D charts and maps), STORM introduction (design ideas, application scenarios, processing procedure, cluster installation), STORM development (STORM MVN development, writing STORM native programs), STORM advanced (Java development, main configurations, optimization projects), KAFKA asynchronous sending and batch-sending timing, KAFKA global message ordering, STORM multi-concurrency optimization

4. Described as follows:

In the previous stages, the data sources were pre-existing large-scale datasets, so the results of data processing and analysis had a certain delay: the data processed was usually the previous day's data.

Example scenarios: website hotlink protection, customer account anomalies, real-time credit. In these scenarios, is analysis based on the previous day's data too late? It is. So in this stage we introduce real-time data collection and analysis, mainly including: FLUME for real-time data collection, supporting a very wide range of collection sources; KAFKA for receiving and sending data; and STORM for real-time data processing, at the level of seconds.
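
As a flavor of the KAFKA "sending" side, here is a minimal producer sketch (it assumes kafka-clients on the classpath; the broker address and topic name are invented):

    // Send one event to a Kafka topic asynchronously.
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Asynchronous send; the callback reports partition and offset.
                producer.send(
                    new ProducerRecord<>("click-events", "u001", "page_view"),
                    (metadata, e) -> {
                        if (e == null) {
                            System.out.printf("sent to partition %d, offset %d%n",
                                              metadata.partition(), metadata.offset());
                        }
                    });
            } // close() flushes any pending records
        }
    }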

VIII. Stage 8: SPARK Data Analysis

1. Difficulty: five stars

2. Class hours (technical knowledge points + stage project tasks + comprehensive ability)

3. The main technologies include: SCALA introduction (data types, operators, control statements, basic functions), SCALA advanced (data structures, classes, objects, traits, pattern matching, regular expressions), SCALA advanced usage (higher-order functions, curried functions, partial functions, tail recursion, built-in higher-order functions, etc.), SPARK introduction (environment setup, infrastructure, operation modes), Spark datasets and the programming model, SPARK SQL, SPARK advanced (DATA FRAME, DATASET, SPARK STREAMING principles, sources supported by SPARK STREAMING, integration with KAFKA and SOCKET, programming model), SPARK advanced programming (Spark-GraphX, Spark-MLlib machine learning), SPARK advanced applications (system architecture, main configurations and performance optimization, failure and stage recovery), the SPARK ML KMEANS algorithm, and the advanced SCALA feature of implicit conversions

4. Described as follows:

Again, let's start from the earlier stages, mainly the HADOOP stage. HADOOP's MR-based analysis of large-scale datasets is relatively slow, including for machine learning and artificial intelligence, and it is not suited to iterative computation. SPARK is the replacement product for MR-style analysis. How does it replace it? Chiefly through their operating mechanisms: HADOOP analyzes based on disk storage, while SPARK analyzes based on memory. If that means little to you, here is a picture: it's like taking the train from Beijing to Shanghai, where MR is the slow green train and SPARK is the high-speed rail or maglev. SPARK is developed in the SCALA language, so naturally it supports SCALA best, which is why the course teaches the SCALA development language first.
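
For a sense of how compact the SPARK version of word count is compared with the MapReduce version in Stage 5, here is a minimal sketch using SPARK's Java API, version 2.x (SPARK also ships Scala and Python APIs, and the course itself uses SCALA; spark-core is assumed on the classpath and the input path is invented):

    // Word count on Spark: the whole pipeline fits in a few chained calls,
    // and intermediate results stay in memory.
    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("word-count")
                    .setMaster("local[*]");                // local mode for the demo
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                JavaRDD<String> lines = sc.textFile("input.txt"); // hypothetical file
                JavaPairRDD<String, Integer> counts = lines
                        .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                        .mapToPair(word -> new Tuple2<>(word, 1))
                        .reduceByKey(Integer::sum);        // aggregation in memory
                counts.collect().forEach(t ->
                        System.out.println(t._1 + ": " + t._2));
            }
        }
    }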

In terms of design, the Kodo Big Data course basically provides full coverage of the market's technical job requirements. And it does not simply cover job requirements: the course itself, from front to back, is one complete big data project process, each link leading to the next.

For example, from historical data storage and analysis (HADOOP, HIVE, HBASE) to real-time data storage (FLUME, KAFKA) and analysis (STORM, SPARK), these components depend on each other just as they do in real projects.