
Refine your data science skills with the heavy armory of tools provided by Julia

**About This Video**

- Learn to use the machine learning algorithms in Julia to make better decisions and take smarter actions in real time without human intervention
- Get to grips with the specialized packages in Julia and leverage its performance capabilities to create efficient programs
- Create your own modules and contribute to the Julia package system

**In Detail**

Julia is an easy, fast, open source language that, if written well, performs nearly as well as low-level languages such as C and Fortran. Its design is a dance between specialization and abstraction, providing high machine performance without sacrificing human convenience. Julia is a fresh approach to technical computing, combining expertise from diverse fields of computational and computer science.

This video course walks you through all the steps involved in applying the Julia ecosystem to your own data science projects. We start with the basics and show you how to design and implement some of the general purpose features of Julia. Are fast development and fast execution possible at the same time? Julia provides the best of both worlds with its wide range of types, and our course covers this in depth. By the end of the course, you will be able to write organized, readable code using Lisp-style macros and modules.

The course demonstrates the power of the DataFrames package to manage, organize, and analyze data. It enables you to work with data from various sources, perform statistical calculations on them, and visualize their relationships in different kinds of plots through live demonstrations.

*Julia for Data Science* takes you from zero to hero, leaving you with the know-how required to apply

- Getting Comfortable with the Basic Structures in Julia
  - **The Course Overview** (2:42): This video provides an overview of the entire course.
  - **Installing a Julia Working Environment** (5:13): We install Julia together with one of the common development environments available.
    - Install Julia and work with the REPL
    - Install three possible working environments: text editor (Sublime Text), notebook interface (Jupyter/IJulia notebook), and cloud (JuliaBox)
    - Check out the pros and cons of each working environment
  - **Working with Variables and Basic Types** (8:08): Program data needs to be stored efficiently and in an easy-to-use form.
    - Learn how to store primitive data like numbers, true/false values, and ranges
    - Learn how to store primitive data like strings
    - Learn how to store a number of data items in arrays and how to work with these
  - **Controlling the Flow** (5:18): This video deals with the problem of how to control the order of execution in Julia code and what to do when errors occur.
    - The problem is stated
    - The solutions for conditional and repetitive code are detailed
    - The solution for how to handle errors in running code is shown
  - **Using Functions** (8:36): Julia code is much less performant and readable when it is not subdivided into functions.
    - We will look at the different ways to create functions
    - We will explore the versatility of functions
    - We will also measure the performance of functions versus global code, which shows that functions solve the problem
  - **Using Tuples, Sets, and Dictionaries** (5:54): Arrays can only be accessed by index and all the elements have to be of the same type. We want more flexible data structures; in particular, we also want to store and retrieve data by keys.
    - Tuples can contain elements of diverse types
    - Sets have unique elements
    - Dictionaries are accessible by key; their use is demonstrated in an integrated example
  - **Working with Matrices for Data Storage and Calculations** (8:26): Data is often presented in the form of a matrix. We need to know how to work with matrices in order to work on data.
    - Learn how to make a matrix
    - Know what we can do with matrices
    - Apply useful data functions on matrices

- Diving Deeper into Julia
  - **Using Types and Parameterized Methods** (6:43): The aim of the video is to show you the importance of using types and parameterized methods in writing generic and performant code.
    - Get some more background knowledge on using types and defining our own types
    - Show how to use general types in functions
    - Demonstrate multiple dispatch and show how this enhances performance
  - **Optimizing Your Code by Using and Writing Macros** (7:12): Coding is often a repetitive task. Shorten your code, make it more elegant, and avoid repetition by making and using macros.
    - Learn how macros are possible in Julia
    - Build a few simple examples of macros
    - Show how macros work and how you can use other built-in macros
  - **Organizing Your Code in Modules** (6:26): To build a Julia package, we need a way to structure it, because a package can contain multiple files and different packages can define functions with the same name, which would conflict.
    - Introduce a module to structure a project
    - Learn how to export definitions to make them available outside of the module
    - Work with `using` and `import` to make module definitions known in the current context
  - **Working with the Package Ecosystem** (6:19): Functionality that you need in your project is often already written and exists as a package. How do we search for, install, and work with these packages?
    - Searching for packages
    - Installing and maintaining packages
    - Working with your installed packages

- Working with Data in Julia
  - **Reading and Writing Data Files and Julia Data** (7:42): In order to process data, we need to get it out of its data sources and into our Julia program.
    - We first learn how to do this with CSV files
    - We then extend this to work with HDF5 and JSON formats
    - Finally, we also work with the Julia way to store and read data in the JLD format
  - **Using DataArrays and DataFrames** (7:42): Working with tabular data in matrices is possible, but not very convenient. The DataFrame offers us a more convenient data structure for data science purposes.
    - We need a way to deal with 'Not Available' values; this is handled by a DataArray
    - DataFrames are composed of DataArrays. We explore different ways to construct a DataFrame
    - We learn how to view parts of a DataFrame and how to get important info about its structure
  - **The Power of DataFrames** (6:37): What possibilities does a DataFrame offer for data manipulation?
    - We will review and enhance ways to access data items in rows and columns
    - You will learn how to extract data based on conditions and group data by features whilst applying functions to them
    - Finally, you will learn how to stack and sort data
  - **Interacting with Relational Databases Like SQL Server** (7:21): Relational databases are an important data source. How can we work from Julia with the data in these data sources?
    - We use SQL Server as a prototype for relational databases. We import the iris data and configure an ODBC string
    - You will learn to work with the ODBC connection
    - You will learn how to work with data in a SQLite database
  - **Interacting with NoSQL Databases Like MongoDB** (6:25): In certain situations, data is better stored in NoSQL databases. Julia can work with a number of these through specialized packages, among them Mongo and Redis.
    - Explain when NoSQL can be important and what the possibilities are
    - Show how to work with MongoDB from Julia
    - Show how to work with Redis from Julia

- Statistics with Julia
  - **Exploring and Understanding a Dataset Statistically** (6:39): We need to calculate various statistical numbers to get insight into a dataset. How can we do this with Julia?
    - How to calculate extrema and quartiles
    - How to calculate histograms and countmaps
    - How to calculate correlations
  - **An Overview of the Plotting Techniques in Julia** (3:03): Data must be visualized graphically to get better insight into it. What possibilities does Julia offer in this area?
    - Demonstrate the use of the Winston package for basic graphics
    - Demonstrate the use of the Gadfly package for data visualization
    - Understand the use of the PyPlot package, which has the most comprehensive graphical capabilities
  - **Visualizing Data with Scatterplots, Histograms, and Box Plots** (4:25): Scatterplots, histograms, and box plots are some of the basic tools of the data scientist. We investigate our iris data by using each of them in turn.
    - Learn how to make scatterplots in Julia and see their usefulness
    - Learn how to make histograms in Julia
    - Learn how to make box plots in Julia
  - **Distributions and Hypothesis Testing** (5:35): In statistical investigations, we need to be able to define distributions, cluster data into groups, and test hypotheses.
    - Learn how to work with the Distributions package
    - Learn how to work with the KernelDensity package
    - Learn how to work with the HypothesisTests package
  - **Interfacing with R** (4:25): Many useful libraries written in R are not yet implemented in Julia. Can we use these R libraries from Julia code?
    - Read R data files into Julia for processing
    - Call into R code with the RCall package
    - Learn how to use RCall

- Machine Learning Techniques with Julia
  - **Basic Machine Learning Techniques** (6:16): Data must be prepared before machine learning algorithms can be applied. Furthermore, applying an algorithm follows a specific cycle, which we will review here. The MLBase package will be used in this section.
    - We will highlight some data-preprocessing techniques
    - We will walk through a typical algorithm application cycle
    - We will discuss some techniques to validate how well a model performs
  - **Classification Using Decision Trees and Rules** (7:01): Data often needs to be classified into groups; the Decision Tree is one of the basic algorithms to do that.
    - The principles of the Decision Tree algorithm are highlighted
    - A pruned tree classifier is applied, and we see how well it performs through cross validation
    - Adaptive boosting and random decision forests are then applied to further improve the model
  - **Training and Testing a Decision Tree Model** (3:59): In a realistic setting, a model is first trained, and then tested.
    - We divide the data so that we have training and testing datasets
    - We build the model and apply it
    - We calculate a number of different measures as well as visualizations to verify the accuracy of our model
  - **Applying a Generalized Linear Model with GLM** (6:18): To obtain better linear regression models, and to be able to work with more independent variables, we need more generalized linear modeling.
    - Use and apply the GLM package to get better linear regression
    - Use and apply the GLMNet package to get better linear regression for more independent variables
  - **Working with Support Vector Machines** (7:12): We need a better classification algorithm than Decision Trees for more complex data, as in pattern recognition. The Support Vector Machine was developed for these tasks.
    - Get a general overview of SVM
    - We apply SVM to the iris dataset in a naive way, obtaining an accuracy of 93 percent
    - SVM is applied again with a larger and more randomized training set, reaching an accuracy of 97 percent

- Data Science
- Machine Learning
- Statistical Modeling