Julia for Data Science

Refine your data science skills with the heavy armory of tools provided by Julia


About This Video

  • Learn to use the machine learning algorithms in Julia to make better decisions and take smarter actions in real time without human intervention
  • Get to grips with the specialized packages in Julia and leverage its performance capabilities to create efficient programs
  • Create your own modules and contribute to the Julia package system

In Detail

Julia is an easy-to-learn, fast, open source language that, when written well, performs nearly as well as low-level languages such as C and Fortran. Its design is a dance between specialization and abstraction, providing high machine performance without sacrificing human convenience. Julia is a fresh approach to technical computing, combining expertise from diverse fields of computational and computer science.

This video course walks you through all the steps involved in applying the Julia ecosystem to your own data science projects. We start with the basics and show you how to design and implement some of the general-purpose features of Julia. Are fast development and fast execution possible at the same time? Julia provides the best of both worlds with its wide range of types, and our course covers this in depth. By the end of the course, you will know how to write Lisp-style macros and modules that keep your code organized and readable.

The course demonstrates the power of the DataFrames package to manage, organize, and analyze data. Through live demonstrations, it shows you how to work with data from various sources, perform statistical calculations on it, and visualize relationships in different kinds of plots.

Julia for Data Science takes you from zero to hero, leaving you with the know-how required to apply


  • Getting Comfortable with the Basic Structures in Julia
    The Course Overview
    This video provides an overview of the entire course.
    Installing a Julia Working Environment
    We are going to install Julia together with one of the common development environments. • Install Julia and work with the REPL • Install one of three possible working environments: a text editor (Sublime Text), a notebook interface (Jupyter with IJulia), or the cloud (JuliaBox) • Check out the pros and cons of each working environment
    Working with Variables and Basic Types
    Program data needs to be stored efficiently and in an easy-to-use form. • Learn how to store primitive data like numbers, true/false values, and ranges • Learn how to store text data as strings • Learn how to store a number of data items in arrays and how to work with them
    Controlling the Flow
    This video shows how to control the order of execution in Julia code and what to do when errors occur. • The problem is stated • The solutions for conditional and repetitive code are detailed • The solution for handling errors in running code is shown
    Using Functions
    Julia code is much less performant and readable when it is not subdivided into functions. • We will look at the different ways to create functions • We will explore the versatility of functions • We will also measure the performance of functions versus global code to confirm the improvement
    Using Tuples, Sets, and Dictionaries
    Arrays can only be accessed by index, and they work best when all elements are of the same type. We want more flexible data structures; in particular, we also want to store and retrieve data by keys. • Tuples can contain elements of diverse types • Sets have unique elements • Dictionaries are accessible by key; their use is demonstrated in an integrated example
    Working with Matrices for Data Storage and Calculations
    Data is often presented in the form of a matrix. We need to know how to work with matrices in order to work on data. • Learn how to make a matrix • Know what we can do with matrices • Apply useful data functions on matrices
  • Diving Deeper into Julia
    Using Types and Parameterized Methods
    The aim of the video is to show you the importance of using types and parameterized methods in writing generic and performant code. • Get more background knowledge on using types and defining your own types • See how to use general types in functions • See multiple dispatch demonstrated and how it enhances performance
    Optimizing Your Code by Using and Writing Macros
    Coding is often a repetitive task. Shorten your code, make it more elegant, and avoid repetition by writing and using macros. • Learn how macros are possible in Julia • Build a few simple examples of macros • See how macros work and how you can use the built-in macros
    Organizing Your Code in Modules
    To build a Julia package we need a way to structure it, because a package can contain multiple files and different packages can define functions with the same name that would conflict. • Introduce a module to structure a project • Learn how to export definitions to make them available outside the module • Work with using and import to make module definitions known in the current context
    Working with the Package Ecosystem
    Functionality that you need in your project is often already written and exists as a package. How do we search for, install, and work with these packages? • Searching for packages • Installing and maintaining packages • Working with your installed packages
  • Working with Data in Julia
    Reading and Writing Data Files and Julia Data
    In order to process data, we need to get it out of its data sources and into our Julia program. • We first learn how to do this with CSV files • We then extend this to work with HDF5 and JSON formats • Finally, we also work with the Julia way to store and read data in the JLD format
    Using DataArrays and DataFrames
    Working with tabular data in matrices is possible, but not very convenient. The DataFrame offers us a more convenient data structure for data science purposes. • We need a way to deal with ‘Not Available’ values, which the DataArray provides • DataFrames are composed of DataArrays. We explore different ways to construct a DataFrame • We learn how to view parts of a DataFrame and how to get important info about its structure
    The Power of DataFrames
    What possibilities does the DataFrame offer for data manipulation? • We will review and enhance ways to access data items in rows and columns • You will learn how to extract data based on conditions and group data by features whilst applying functions to them • Finally, you will learn how to stack and sort data
    Interacting with Relational Databases Like SQL Server
    Relational databases are an important data source. How can we work with their data from Julia? • We use SQL Server as a prototype for relational databases. We import the iris data and configure an ODBC connection string • You will learn to work with the ODBC connection • You will learn how to work with data in a SQLite database
    Interacting with NoSQL Databases Like MongoDB
    In certain situations data is better stored in NoSQL databases. Julia can work with a number of these through specialized packages, among them packages for MongoDB and Redis. • Explain when NoSQL can be important and what the possibilities are • Show how to work with MongoDB from Julia • Show how to work with Redis from Julia
  • Statistics with Julia
    Exploring and Understanding a Dataset Statistically
    We need to calculate various summary statistics to get insight into a dataset. How can we do this with Julia? • How to calculate extrema and quartiles • How to calculate histograms and countmaps • How to calculate correlations
    An Overview of the Plotting Techniques in Julia
    Data must be visualized graphically to get better insight into it. What possibilities does Julia offer in this area? • Demonstrate the use of the Winston package for basic graphics • Demonstrate the use of the Gadfly package for data visualization • Understand the use of the PyPlot package, which offers the most comprehensive graphical capabilities
    Visualizing Data with Scatterplots, Histograms, and Box Plots
    Scatterplots, histograms, and box plots are some of the basic tools of the data scientist. We investigate our iris data by using each of them in turn. • Learn how to make scatterplots in Julia and see how useful they are • Learn how to make histograms in Julia • Learn how to make box plots in Julia
    Distributions and Hypothesis Testing
    In statistical investigations, we need to be able to define distributions, cluster data into groups, and test hypotheses. • Learn how to work with the Distributions package • Learn how to work with the KernelDensity package • Learn how to work with the HypothesisTests package
    Interfacing with R
    Many useful libraries written in R are not yet implemented in Julia. Can we use these R libraries from Julia code? • Read R data files into Julia for processing • Call into R code with the RCall package • Learn how to use RCall
  • Machine Learning Techniques with Julia
    Basic Machine Learning Techniques
    Data must be prepared before machine learning algorithms can be applied. Furthermore, applying an algorithm follows a specific cycle, which we will review here. The MLBase package will be used in this section. • We will highlight some data-preprocessing techniques • We will walk through a typical algorithm application cycle • We will discuss some techniques to validate how well a model performs
    Classification Using Decision Trees and Rules
    Data often needs to be classified into groups; the decision tree is one of the basic algorithms for doing that. • The principles of the decision tree algorithm are highlighted • A pruned tree classifier is applied, and we see how well it performs through cross-validation • Adaptive boosting and random decision forests are then applied to further improve the model
    Training and Testing a Decision Tree Model
    In a realistic setting, a model is first trained, and then tested. • We divide the data so that we have training and testing datasets • We build the model and apply it • We calculate a number of different measures as well as visualizations to verify the accuracy of our model
    Applying a Generalized Linear Model with GLM
    To obtain better linear regression models and to work with more independent variables, we need generalized linear modeling. • Use and apply the GLM package to get better linear regression • Use and apply the GLMNet package to get better linear regression for more independent variables
    Working with Support Vector Machines
    For more complex data, such as in pattern recognition, we need a better classification algorithm than decision trees. The Support Vector Machine (SVM) was developed for these tasks. • Get a general overview of SVM • We apply SVM to the iris dataset in a naive way, obtaining an accuracy of 93 percent • SVM is applied again with a larger and more randomized training set, reaching an accuracy of 97 percent


  • Machine Learning
  • Data Science
  • Statistical Modeling