Where to Learn Technical Skills

This document links to a number of resources (all available for free online) for learning the essential technical skills for working in CAFRI. Resources for more specialized skills may be located on the relevant project pages.

The aim here is to provide a highly opinionated list of resources, rather than to collect all available resources on a given topic. If you find a book or paper particularly useful for learning a skill, feel free to add it to these lists – but ask if any of the other items can be removed at the same time.

R

Our lab primarily uses R for our analysis work. You’re welcome to use other languages, but you’ll probably be on your own for problem solving and debugging. Collaborating with other lab members will probably require knowing some R.

The resources below provide a good introduction to R, plus some pointers to resources for more advanced usage.

Getting Started

  • R for Data Science (“R4DS”) is hands down the best book for people trying to learn R. Use this before any other resource. Go in order and code through all the examples. It’ll take a month or two, but it’ll get you to a place where you can use R for actual, honest-to-goodness analyses.

More Advanced

  • Advanced R is a great book for understanding R at a deeper level. Each chapter stands alone in a way R4DS doesn’t quite, making it a great reference book. Don’t be scared by the word “Advanced”; if you’ve finished R4DS and want to know more about how R works, you’re ready for this book.
  • Geocomputation with R is great for learning how to use R for geospatial data. It requires you to have a little bit of knowledge of R and a little bit of knowledge of spatial analytics.

Code Style

Git

All of our code lives on GitHub, which is a platform that hosts what are known as “git repositories”. git is a version control system, helping you “track changes” in a number of text files in a way that lets you easily undo mistakes, collaborate with tons of other people, and never need to name a file “file_final_v2_edits.docx” ever again. It’s an incredibly powerful tool and absolutely core to how we work.

Unfortunately, git is also a massive pain to learn at first. To quote Jenny Bryan:

“A lone ranger, working on a single computer, can benefit from adopting version control. But not nearly enough to justify the pain of installation and workflow upheaval. There are much easier ways to get versioned back ups of files, if that’s all you’re worried about.

“In my opinion, for new users, the pros of Git only outweigh the cons when you consider the overhead of working with other people, including your future self. And who among us does not need to do that? In a Git-based workflow, you document and, optionally, expose your work as you go. Communication and collaboration are the killer apps of version control. Git’s model of file management can feel uncomfortably rigid, but it enables the distribution of files across different people, computers, and time.

“This has an implication for selecting your first Git projects: you will enjoy the most gain for your pain if you pick a project that involves sharing rapidly evolving files with others. It is tempting to pick a quiet, private project. But if you do, you may never find the benefits of formal version control compelling enough to cement the new habit.”

If you’re going to work with lab members, you’re going to need to know git. A good way to get started is to read Jenny Bryan’s book Happy Git and GitHub for the useR, and then very quickly start collaborating with other people in the lab. The best way to learn git is to use it, so starting to collaborate right away will speed you through the worst of the learning curve. And remember, you can always ask for help when you get stuck.
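
To make the “track changes” and “undo mistakes” ideas from this section a little more concrete, here is a minimal sketch of a local git workflow. It is only an illustration, not the lab’s actual workflow: the file name notes.txt, the commit messages, and the user name and email are all made-up placeholders, and everything happens in a throwaway temporary directory. Pushing to GitHub is not shown.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so no real project is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Turn the directory into a git repository.
git init --quiet

# Placeholder identity (hypothetical values) so commits work on a fresh machine.
git config user.name "Example User"
git config user.email "user@example.com"

# Create a file and record its first version as a commit.
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add first draft of notes"

# Overwrite the file with a change we regret.
echo "a mistake" > notes.txt

# See which files differ from the last commit, then undo the change.
git status --short
git checkout -- notes.txt

# Make an intentional change and record it as a second commit.
echo "second line" >> notes.txt
git commit --quiet -am "Add second line to notes"

# Show the history, one line per commit, newest first.
git log --oneline
```

Each commit is a saved version you can return to later, which is what replaces the “file_final_v2_edits.docx” style of naming.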

Using the Terminal (Shell)

The terminal is a command-line user interface for operating your computer. In other words, instead of pointing and clicking around to make things happen on your computer, you can type text-based commands. While you may be able to avoid the terminal when writing code locally, a working knowledge of how to operate in the terminal will improve your understanding and ability to work with your computer, your code, and other installed software. It’s also fundamental if you want to work with any machines or servers without graphical user interfaces (e.g. our AWS compute instances). A terminal will look something like this:
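
As a rough illustration of what typing text-based commands looks like, here is a short sketch of a terminal session using a handful of common commands. The directory and file names (analysis, trees.csv, and so on) are made up for the example, and everything happens inside a temporary directory so nothing on your machine is changed.

```shell
#!/bin/sh
set -eu

# Start in a throwaway directory so no real files are affected.
scratch="$(mktemp -d)"
cd "$scratch"

pwd                                  # print the current working directory
mkdir analysis                       # make a new directory
cd analysis                          # move into it

echo "plot_id,height" > trees.csv    # create a file (">" overwrites)
echo "1,12.5" >> trees.csv           # add a line (">>" appends)

ls                                   # list the files in this directory
cat trees.csv                        # print the contents of a file
wc -l trees.csv                      # count the lines in a file

cp trees.csv trees_copy.csv          # copy a file
mv trees_copy.csv backup.csv         # rename (move) a file
rm backup.csv                        # delete a file (there is no recycle bin)

cd ..                                # move back up one directory
```

These are the point-and-click actions of a file browser (open a folder, create, copy, rename, and delete files) expressed as typed commands.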

If you use a Mac, or a Linux-based operating system, you should be able to simply open the ‘terminal’ application. If you use a Windows operating system, follow the instructions in the Git section above to install ‘Git for Windows’, which will install a terminal along with git.

A solid guide to the most basic terminal commands and beyond can be found here. And of course, you can always ask a lab member for help if you get stuck.

Machine Learning

A lot of our lab’s projects use machine learning techniques to build predictive models. We primarily focus on applications for these techniques – we’re not generally developing new methods – which means that you don’t really need a deep understanding of linear algebra or similar. However, understanding how to implement and interpret these models will be extremely helpful. This section links to some resources for that; the focus is on highlighting the most useful resources, not all resources available under the sun.

Books

  • Hands-on Machine Learning with R (“HOML”) is a fantastic book for getting up and running with standard machine learning techniques. This is an extremely application-focused book, with examples in R throughout.
  • An Introduction to Statistical Learning is the gold standard machine learning textbook. It’s more mathematical than HOML, but is still a fantastic resource for people who need to use machine learning methods, not invent them.

Courses

Papers

These papers provide useful ways to think about machine learning.