70-475 Study Guide: Designing and Implementing Big Data Analytics Solutions

The 70-475 Designing and Implementing Big Data Analytics Solutions exam is one of the Azure exams that count toward your Microsoft Certified Solutions Associate (MCSA) Cloud Platform certification, which requires passing two exams from the list below. I created this 70-475 study guide to help you find study materials and ace the exam. I will share both free and paid options, whether books, video training, or simply links to articles and blog posts. I will not share any dumps, as those are against the Microsoft Terms of Service, and using dumps decreases the value of our certifications.

Certification Path

Exam Name | Link to Exam on Microsoft Learning | Study Guide

Choose two exams from those below:

70-532 Developing Microsoft Azure Solutions | Developing Microsoft Azure Solutions | 70-532 Study Guide: Developing Microsoft Azure Solutions
70-533 Implementing Microsoft Azure Infrastructure Solutions | Implementing Microsoft Azure Infrastructure Solutions | 70-533 Study Guide: Implementing Microsoft Azure Infrastructure Solutions
70-534 Architecting Microsoft Azure Solutions | Architecting Microsoft Azure Solutions | 70-534 Study Guide: Architecting Microsoft Azure Solutions
70-473 Designing and Implementing Cloud Data Platform Solutions | Designing and Implementing Cloud Data Platform Solutions | 70-473 Study Guide: Designing and Implementing Cloud Data Platform Solutions
70-475 Designing and Implementing Big Data Analytics Solutions | Designing and Implementing Big Data Analytics Solutions | 70-475 Study Guide: Designing and Implementing Big Data Analytics Solutions

Books

Mastering Azure Analytics: Architecting in the Cloud with Azure Data Lake, HDInsight, and Spark
Microsoft Azure has over 20 platform-as-a-service (PaaS) offerings that can act in support of a big data analytics solution. So which one is right for your project? This practical book helps you understand the breadth of Azure services by organizing them into a reference framework you can use when crafting your own big data analytics solution.

Video Training for the exam

NOTE: Pluralsight is a paid resource, unlike Channel 9 and Microsoft Virtual Academy, which are free. I find the quality higher as well, thanks to the quality checks every course goes through, and the instructors are among the best in the industry. Each Pluralsight course links to a free trial, so you can decide for yourself whether a subscription is worth paying for; the 10-day free trial should be enough to watch all of these courses for free.

Understanding Machine Learning
Need a short, clear introduction to machine learning? Watch this.

Getting Started with Azure Machine Learning
Machine learning helps predict the weather, route you around traffic jams, and display personalized ads on your web pages. In this course, you will learn how to use Azure Machine Learning to create, deploy, and maintain predictive solutions.

How to Think About Machine Learning Algorithms
If you don’t know the question, you probably won’t get the answer right. This course is all about asking the right machine learning questions, modeling real-world situations as one of several well understood machine learning problems.

Cert Exam Prep: Exam 70-475: Big Data and Analytics Solutions
This Certification Exam Prep session is designed for people experienced with Big Data and Data Analytics who are interested in taking the 70-475 exam. Attendees can expect to review the topics covered in this exam in a fast-paced format and to pick up some valuable test-taking techniques. They leave with an understanding of how Microsoft certification works, which key topics the exam covers, and an exhaustive look at resources for final exam preparation. The session is led by a Microsoft Certified Trainer (MCT) experienced in delivering sessions on these topics.

Articles / Blog Posts per objective (In Progress)

Design big data batch processing and interactive solutions (25–30%)

  • Ingest data for batch and interactive processing
    • Ingest from cloud-born or on-premises data, store data in Microsoft Azure Data Lake, store data in Azure BLOB Storage, perform a one-time bulk data transfer, perform routine small writes on a continuous basis
  • Design and provision compute clusters
    • Select compute cluster type, estimate cluster size based on workload
  • Design for data security
    • Protect personally identifiable information (PII) data in Azure, encrypt and mask data, implement role-based security
  • Design for batch processing
    • Select appropriate language and tool, identify formats, define metadata, configure output
  • Design interactive queries for big data
    • Provision Spark cluster, set the right resources in Spark cluster, execute queries using Spark SQL, select the right data format (Parquet), cache data in memory (make sure cluster is of the right size), visualize using business intelligence (BI) tools (for example, Power BI, Tableau), select the right tool for business analysis

Design big data real-time processing solutions (25–30%)

  • Ingest data for real-time processing
    • Select data ingestion technology, design partitioning scheme, design row key of event tables in HBase
  • Design and provision compute resources
    • Select streaming technology in Azure, select real-time event processing technology, select real-time event storage technology, select streaming units, configure cluster size, assign appropriate resources for Spark clusters, assign appropriate resources for HBase clusters, utilize Visual Studio to write and debug Storm topologies
  • Design for Lambda architecture
    • Identify application of Lambda architecture, utilize streaming data to draw business insights in real time, utilize streaming data to show trends in data in real time, utilize streaming data and convert into batch data to get historical view, design such that batch data doesn’t introduce latency, utilize batch data for deeper data analysis
  • Design for real-time processing
    • Design for latency and throughput, design reference data streams, design business logic, design visualization output
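To make the "design row key of event tables in HBase" objective a bit more concrete, here is a minimal sketch of one common pattern: a salt prefix to spread writes across regions, followed by a reversed timestamp so the newest events of a device sort first. Everything in it is my own assumption for illustration (the `make_row_key` helper, the bucket count, the `|` separator, the device IDs); it is not an HBase API and not something the exam objectives prescribe.

```python
import hashlib

# Assumed upper bound on epoch milliseconds, so reversed timestamps keep a fixed width.
MAX_MILLIS = 10**13 - 1
# Hypothetical number of salt buckets; in practice this relates to the region count.
NUM_BUCKETS = 16


def make_row_key(device_id: str, event_millis: int) -> str:
    """Build a salted row key with a reversed, zero-padded timestamp."""
    # A stable hash of the device ID picks the bucket, so all events of one
    # device share a prefix while different devices spread across buckets.
    digest = hashlib.md5(device_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % NUM_BUCKETS
    # Subtracting from a fixed maximum makes newer events sort first
    # under lexicographic (byte-order) comparison.
    reversed_ts = MAX_MILLIS - event_millis
    return f"{bucket:02d}|{device_id}|{reversed_ts:013d}"


older = make_row_key("sensor-1", 1_500_000_000_000)
newer = make_row_key("sensor-1", 1_500_000_060_000)
print(newer < older)  # the newer event sorts ahead of the older one
```

The trade-off to keep in mind: a salt avoids hot-spotting on sequential keys, but a scan across all devices then has to fan out over every bucket.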

Design Machine Learning solutions (20–25%)

  • Create and manage experiments
    • Create, manage, and share workspaces; create training experiment; select template experiment from Machine Learning gallery
  • Determine when to pre-process or train inside Machine Learning Studio
    • Select model type based on desired algorithm, select technique based on data size
  • Select input/output types
    • Select appropriate SQL parameters, select BLOB storage parameters, identify data sources, select HiveQL queries
  • Apply custom processing steps with R and Python
    • Visualize custom graphs, estimate custom algorithms, select custom parameters, interact with datasets through notebooks (Jupyter Notebook)
  • Publish web services
    • Operationalize Azure Machine Learning models, operationalize Spark models using Azure Machine Learning, operationalize custom models

Operationalize end-to-end cloud analytics solutions (25–30%)

  • Create a data factory
    • Identify data sources, identify and provision data processing infrastructure, utilize Visual Studio to design and deploy pipelines
  • Orchestrate data processing activities in a data-driven workflow
    • Leverage data-slicing concepts, identify data dependencies and chaining multiple activities, model complex schedules based on data dependencies, provision and run data pipelines
  • Monitor and manage the data factory
    • Identify failures and root causes, create alerts for specified conditions, perform a restatement
  • Move, transform, and analyze data
    • Leverage Pig, Hive, MapReduce for data processing; copy data between on-premises and cloud; copy data between cloud data sources; leverage stored procedures; leverage Machine Learning batch execution for scoring, retraining, and update resource; extend the data factory with custom processing steps; load data into a relational store, visualize using Power BI
  • Design a deployment strategy for an end-to-end solution
    • Leverage PowerShell for deployment, automate deployment programmatically
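If the "data-slicing concepts" objective feels abstract, this small sketch may help. It cuts a pipeline's active period into fixed, non-overlapping time windows, which is roughly how I picture Data Factory slices. The `time_slices` function and the dates are hypothetical and written only for illustration; this is plain Python, not the Data Factory SDK or its JSON schema.

```python
from datetime import datetime, timedelta


def time_slices(start: datetime, end: datetime, interval: timedelta):
    """Split the half-open period [start, end) into consecutive windows."""
    slices = []
    slice_start = start
    while slice_start < end:
        # The last window is clipped so it never runs past the period end.
        slice_end = min(slice_start + interval, end)
        slices.append((slice_start, slice_end))
        slice_start = slice_end
    return slices


# Hypothetical three-hour active period processed in hourly slices.
period_start = datetime(2017, 1, 1, 0, 0)
period_end = datetime(2017, 1, 1, 3, 0)
for window_start, window_end in time_slices(period_start, period_end, timedelta(hours=1)):
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Thinking in slices is also what makes a "restatement" manageable: you rerun only the windows whose input data changed or whose processing failed, rather than the whole period.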

Additional Tips

I think the best thing you can do after reading this guide, or even while reading it, is to open a free Azure trial (or a few), play with these features, and follow the tutorials. Do that and you shouldn’t have any problems with the exam!

Did I miss any cool links in this guide? Let me know in the comments!

Follow me on Social Media and Share this article with your friends!

Leave a comment, and don’t forget to like the Absolute SharePoint Blog page on Facebook and follow me on Twitter for the latest news and technical articles on SharePoint. I am also a Pluralsight author, and you can view all the courses I created on my author page.