Strings For Quartz Font In R Studio For Mac

  1. Understand what R and RStudio are
  2. Develop the good habit of working with scripts
  3. Learn to import data in R
  4. Learn to manipulate R objects like vectors and data frames
  5. Make a simple plot

In our first tutorial we will begin to explore 'R' as a tool to analyse and visualise data.

R is a statistical programming language that has rapidly gained popularity in many scientific fields. It was developed by Ross Ihaka and Robert Gentleman as an open source implementation of the 'S' programming language. (Next time you need a fun fact, you can say 'Did you know that S came before R?') R is also the name of the software that uses this language for statistical computing. With a huge online support community and dedicated packages that provide extra functionality for virtually any application and field of study, there's hardly anything you can't do in R.

If you already know your way around statistical software like Minitab or SPSS, the main difference is that R has no graphical user interface, which means there are no buttons to click and no dropdown menus. R can be run entirely by typing commands into a text interface (welcome to the Matrix!). This may seem a little daunting, but it also means a whole lot more flexibility, as you are not relying on a pre-determined toolkit for your analyses.

Thanks for joining us on your learning journey. Like with any language, there is a learning curve (trust me, I'm learning German at the moment), but we will take it step by step, and in no time you will be coding your own analyses and graphs!

If you need any more convincing, why are we using R and not one of the many other statistical packages like MATLAB, Minitab, or even Microsoft Excel? Well, R is great because:

  • R is free and open source, and always will be! Anybody can use the code and see exactly how it works.
  • Because R is a programming language rather than a graphical interface, the user can easily save scripts as small text files for use in the future, or share them with collaborators.
  • R has a very active and helpful online community - normally a quick search is all it takes to find that somebody has already solved the problem you're having. You can start with our page with useful links!

As we said before, R itself does not have a graphical interface, but most people interact with R through graphical platforms that provide extra functionality. We will be using a program called RStudio as a graphical front-end to R, so that we can access our scripts and data, find help, and preview plots and outputs all in one place.

You can download R from CRAN (The Comprehensive R Archive Network). Select the link appropriate for your operating system.

Then, download RStudio from the RStudio website (select the free open source desktop version).

If you are using a Mac, in addition to R and RStudio, you need to download XQuartz (available here).

Open RStudio. Click on 'File/New File/R script'.

You will now see a window like the one above. You can type code directly into the console on the lower left (doesn't mean that you should*!). Pressing enter at the end of the line runs the code (try typing 2 + 2 and running it now). You can (should!) also write your code in the script file in the top left window. To run a line of code from your script, press Ctrl+R on Windows or Cmd+Enter on a Mac. On newer Windows computers, the default shortcut is Ctrl+Enter. The environment window gives you an overview of your current workspace**. You will see the data you have imported, objects you have created, functions you have defined, etc. Finally, the last panel has multiple tabs and will preview your plot and allow you to navigate around folders and look at the packages you currently have installed and loaded.

*A note about scripts (We love scripts!): Remember that if you enter code directly into the console, it will not be saved by R: it runs and disappears (although you can access your last few operations by hitting the 'up' key on your keyboard). Instead, by typing your code into a script file, you are creating a reproducible record of your analysis. Writing your code in a script is similar to writing an essay in Word: it saves your progress and you can always pick up where you left off, or make some changes to it. (Remember to click Save (Ctrl+S) often, so that you actually save your script!)

When writing a script, it's useful to add comments to describe what you are doing by inserting a hashtag # in front of a line of text. R will see anything that begins with # as text instead of code, so it will not try to run it, but the text will provide valuable information about the code for whoever is reading your script (including future you!). Like with any piece of writing, scripts benefit from structure and clarity: we will learn more about proper coding etiquette in a later tutorial.

**A quicker note about the workspace: The workspace will have everything you have used in a session floating around your computer memory. When you exit, R will ask you if you want to save the current workspace. You almost never need to, and it's best to click no and start with a clear slate every time. (DO make sure you save your script though!!)

Begin to write in your script

For now, start by recording who is writing, the date, and the main goal - in our case, determining how many species from different taxa have been recorded in Edinburgh. Here's an example, which you can copy, paste and edit into your new script:
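A header of this kind might look like the sketch below. The title, author name and date are placeholders to swap for your own details; every line starts with #, so R treats it all as comments.

```r
# Intro to R - exploring Edinburgh's biodiversity      (placeholder title)
# Goal: determine how many species from different taxa
#       have been recorded in Edinburgh
# Written by Your Name, DD/MM/YYYY                     (placeholders)
```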

The next few lines of code usually load the packages you will be needing for your analysis. A package is a bundle of commands that can be loaded into R to provide extra functionality. For example, you might load a package for formatting data, or for making maps. (Or for making graphs with cats on them, or whatever floats your boat… As we said before, there's virtually nothing you cannot do!)

To install a package, type install.packages('package-name'). You only need to install packages once, so in this case you can type directly in the console box, rather than saving the line in your script and re-installing the package every time.

Once installed, you just need to load the packages using library(package-name). Today we will be using the dplyr package to provide extra commands for formatting and manipulating data. (You will learn more about the powerful features of dplyr in a later tutorial).
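As a rough sketch of those two steps for dplyr: the requireNamespace() check is an addition here (not part of the steps described above) that skips the download when the package is already installed.

```r
# Install once. This is safe to run in the console rather than in the script.
# The guard below is an optional extra: it only installs when dplyr is missing.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load the package at the start of every R session in which you need it.
library(dplyr)
```

Note that the package name is quoted in install.packages() but can be left unquoted in library().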

The next lines of code should define your working directory. This is a folder on your computer where R will look for data, save your plots, etc. To make your workflow easier, it is good practice to save everything related to one project in the same place, as it will save you a lot of time typing up computer paths or hunting for files that got saved R-knows-where. For instance, you could save your script and all the data for this tutorial in a folder called 'Intro_to_R'. (It is good practice to avoid spaces in file names as it can sometimes confuse R.) For bigger projects, consider having a root folder with the name of the project (e.g. 'My_PhD') as your working directory, and other folders nested within to separate data, scripts, images, etc. (e.g. My_PhD/Chapter_1/data, My_PhD/Chapter_1/plots, My_PhD/Chapter_2/data, etc.)

To find out where your working directory is now, run the code getwd(). If you want to change it, you can use setwd(). Set your working directory to the folder you just downloaded from GitHub:

Watch out! Note that on a Windows computer, a copied-and-pasted file path will have backslashes separating the folders ('C:\folder\data'), but the filepath you enter into R should use forward slashes ('C:/folder/data').
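Checking and changing the working directory could look like the sketch below. So that it runs anywhere, it uses a hypothetical 'Intro_to_R' folder created inside R's temporary directory; in practice you would pass setwd() the path of the folder you downloaded, written with forward slashes.

```r
# Where is R looking for files right now?
old_wd <- getwd()
old_wd

# Hypothetical project folder, created in a temporary location for this demo.
project_dir <- file.path(tempdir(), "Intro_to_R")
dir.create(project_dir, showWarnings = FALSE)

# Point R at the project folder and confirm the change.
setwd(project_dir)
getwd()

# Restore the previous working directory (only needed for this demo).
setwd(old_wd)
```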

Practice is the best way to learn any new language, so let's jump straight in and do some of our own statistical analysis using a publicly available dataset of occurrence records for many animal, plant and fungi species. We downloaded the records for 2000-2016 (from the NBN Gateway) and saved them as edidiv.csv. First, you will need to download the data.

Follow the link, click on 'Download Zip', and save and unzip the folder somewhere on your computer. (Never heard of GitHub? Don't worry, we will cover it in a later tutorial. For now, it's simply the website where you can download our course material from.)

You can find all the files needed to complete this tutorial in this GitHub repository.

Click on Code and then Download zip. Remember to unzip the files before you start working with them in RStudio.

Now that you have the data saved on your computer, let's import it! In RStudio, you can either click on the Import dataset button and navigate to where you have saved your file, or use the read.csv() command. If you use the button, a window will pop up previewing your data. Make sure that next to Heading you have selected Yes (this tells R to treat the first row of your data as the column names) and click Import. In the console, you will see the code for your import, which includes the file path - it's a good idea to copy this code into your script, so that for future reference you know where your dataset came from.

R works best with .csv (comma separated values) files. If you entered your data in Excel, you would need to click on Save as and select csv as the file extension. When entering data in Excel, don't put any spaces in your column names, as they will confuse R later (e.g. go for something like height_meters rather than height (m)). Some computers save .csv files with semicolons ;, not commas , as the separators. This usually happens when English is not the first or only language on your computer. If your files are separated by semicolons, use read.csv2 instead of read.csv, or alternatively use the argument 'sep' (for separator) in the read.csv function: read.csv('your-file-path', sep = ';').
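A minimal sketch of the import step follows. The commented line shows what the call would look like for this tutorial, assuming edidiv.csv sits in your working directory. The rest uses small made-up files written to a temporary folder so the example runs on its own; the taxonName and year column names are assumptions for illustration.

```r
# For the tutorial data (assuming edidiv.csv is in the working directory):
# edidiv <- read.csv("edidiv.csv")

# Self-contained stand-in: write a tiny comma-separated file, then import it.
toy_file <- file.path(tempdir(), "toy_records.csv")
writeLines(c("taxonName,taxonGroup,year",
             "Carabus nemoralis,Beetle,2004",
             "Erithacus rubecula,Bird,2010",
             "Turdus merula,Bird,2012"),
           toy_file)
toy_records <- read.csv(toy_file)  # the first row becomes the column names
toy_records

# The same kind of file saved with semicolons as separators.
toy_file2 <- file.path(tempdir(), "toy_records_semicolon.csv")
writeLines(c("taxonName;taxonGroup;year",
             "Carabus nemoralis;Beetle;2004"),
           toy_file2)
toy_semicolon  <- read.csv(toy_file2, sep = ";")  # set the separator explicitly
toy_semicolon2 <- read.csv2(toy_file2)            # shortcut for semicolon files
```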

Remember to save your script once in a while! If you haven't saved it already, why not save it in the same directory as the rest of the tutorial file, and give it a meaningful name.

A note about objects: R is an object-based language - this means that the data you import, and any values you create later, are stored in objects that you name. The arrow <- in the code above is how you assign objects. Here, we assigned our csv file to the object edidiv. We could just as easily have called it mydata or hello or biodiversity_recorded_around_Edinburgh_Scotland, but it's best to choose a unique, informative, and short name. In the top right window of RStudio, you can see the names of any objects currently loaded into R. See your edidiv object?

When you import your data into R, it will most likely become an object called a data frame. A data frame is like a table, or spreadsheet - it has rows and columns with the different variables and observations you have loaded. But more on that later!

A really important step is to check that your data was imported without any mistakes. It's good practice to always run this code and check the output in the console - do you see any missing values, do the numbers/names make sense? If you go straight into analysis, you risk later finding out that R didn't read your data correctly and having to re-do it, or worse, analysing wrong data without noticing. To preview more than just the few first lines, you can also click on the object in your Environment panel, and it will show up as a spreadsheet in a new tab next to your open script. Large files may not display entirely, so keep in mind you could be missing rows or columns.

str(object.name) is a great command that shows the structure of your data. So often, analyses in R go wrong because R decides that a variable is a certain type of data that it is not. For instance, you might have four study groups that you simply called '1, 2, 3, 4', and while you know that it should be a categorical grouping variable (i.e. a factor), R might decide that this column contains numeric (numbers) or integer (whole number) data. If your study groups were called 'one, two, three, four', R might decide it's a character variable (words or strings of words), which will not get you far if you want to compare means among groups. Bottom line: always check your data structure!
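The checks described above might look like this in practice. The data frame here is a small made-up stand-in for edidiv (the taxonName and year columns are assumed names), so that the commands can be run without the downloaded file.

```r
# Toy stand-in for the imported edidiv data frame (values are made up).
edidiv <- data.frame(
  taxonName  = c("Carabus nemoralis", "Erithacus rubecula",
                 "Turdus merula", "Xanthoria parietina"),
  taxonGroup = c("Beetle", "Bird", "Bird", "Lichen"),
  year       = c(2004, 2010, 2012, 2015),
  stringsAsFactors = FALSE
)

head(edidiv)     # the first few rows
tail(edidiv)     # the last few rows
str(edidiv)      # the type of each column (character, numeric, factor, ...)
dim(edidiv)      # number of rows and columns
summary(edidiv)  # a quick summary of every column
```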

You'll notice the taxonGroup variable shows as a character variable, but it should be a factor (categorical variable), so we'll force it to be one. When you want to access just one column of a data frame, you append the variable name to the object name with a dollar sign $. This syntax lets you see, modify, and/or reassign this variable.

In that last line of code, the as.factor() function turns whatever values you put inside into a factor (here, we specified we wanted to transform the character values in the taxonGroup column from the edidiv object). However, if you were to run just the bit of code on the right side of the arrow, it would work that one time, but would not modify the data stored in the object. By assigning with the arrow the output of the function to the variable, the original edidiv$taxonGroup in fact gets overwritten: the transformation is stored in the object. Try again to run class(edidiv$taxonGroup) - what do you notice?
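The difference between running as.factor() on its own and assigning its result can be seen in this sketch, which again uses a made-up stand-in for edidiv.

```r
# Toy stand-in for edidiv with taxonGroup stored as text.
edidiv <- data.frame(
  taxonGroup = c("Beetle", "Bird", "Bird", "Lichen"),
  stringsAsFactors = FALSE
)

head(edidiv$taxonGroup)   # look at one column with $
class(edidiv$taxonGroup)  # "character"

# Running only the right-hand side prints a factor but stores nothing.
as.factor(edidiv$taxonGroup)
class(edidiv$taxonGroup)  # still "character"

# Assigning the result back overwrites the column in the data frame.
edidiv$taxonGroup <- as.factor(edidiv$taxonGroup)
class(edidiv$taxonGroup)  # "factor"
```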

Our edidiv object has occurrence records of various species collected in Edinburgh from 2000 to 2016. To explore Edinburgh's biodiversity, we will create a graph showing how many species were recorded in each taxonomic group. You could calculate species richness in Excel, but that has several disadvantages, especially when working with large datasets like ours - you have no record of what you clicked on, how you sorted the data and what you copied/deleted - mistakes can slip by without you noticing. In R, on the other hand, you have your script, so you can go back and check all the steps in your analysis.

Species richness is simply the total number of different species in a given place or group. To know how many bird, plant, mammal, etc. species we have in Edinburgh, we first need to split edidiv into multiple objects, each containing rows for only one taxonomic group. We do this with the useful filter() function from the dplyr package.

You need to do these steps for ALL of the taxa in the data; here we have given examples for the first two. If you see an error saying R can't find the object Beetle or similar, chances are you haven't installed or loaded the dplyr package. Go back and install it using install.packages('dplyr') and then load it using library(dplyr).
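Splitting the data by taxonomic group could be sketched as follows. The data frame is a made-up stand-in for edidiv, the taxonName column name is an assumption, and the group labels 'Beetle' and 'Bird' are assumed spellings that should be checked against the levels in your own data.

```r
library(dplyr)

# Toy stand-in for edidiv (values are made up).
edidiv <- data.frame(
  taxonName  = c("Carabus nemoralis", "Carabus nemoralis",
                 "Pterostichus madidus", "Erithacus rubecula",
                 "Turdus merula", "Turdus merula", "Xanthoria parietina"),
  taxonGroup = c("Beetle", "Beetle", "Beetle",
                 "Bird", "Bird", "Bird", "Lichen")
)

# Keep only the rows belonging to one taxonomic group at a time.
Beetle <- filter(edidiv, taxonGroup == "Beetle")
Bird   <- filter(edidiv, taxonGroup == "Bird")

Beetle
Bird
```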

Once you have created objects for each taxon, we can calculate species richness, i.e. the number of different species in each group. For this, we will nest two functions together: unique(), which identifies different species, and length(), which counts them. You can try them separately in the console and see what they return!

If you type a (or however you named your count variables) in the console, what does it return? What does it mean? It should represent the number of distinct beetle species in the record.
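Here is a small sketch of the nesting, using a made-up Beetle object with three records of two species. The taxonName column name is an assumption for illustration.

```r
# Toy Beetle object: three records, but only two different species.
Beetle <- data.frame(
  taxonName = c("Carabus nemoralis", "Carabus nemoralis",
                "Pterostichus madidus")
)

unique(Beetle$taxonName)  # each species listed once
length(Beetle$taxonName)  # 3: counts every record, duplicates included

# Nested: count the distinct species, i.e. the species richness.
a <- length(unique(Beetle$taxonName))
a                         # 2
```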

Again, calculate species richness for the other taxa in the dataset. You're probably noticing this is quite repetitive and using a lot of copying and pasting! That's not particularly efficient - in future tutorials we will learn how to use more of dplyr's functions and achieve the same result with way less code! You will be able to do everything you just did in ONE line (promise!).

Now that we have species richness for each taxon, we can combine all those values in a vector. A vector is another type of R object that stores values. As opposed to a data frame, which has two dimensions (rows and columns), a vector only has one. When you call a column of a data frame like we did earlier with edidiv$taxonGroup, you are essentially producing a vector - but you can also create them from scratch.

We do this using the c() function (c stands for concatenate, or chain if that makes it easier to remember). We can also add labels with the names() function, so that the values are not coming out of the blue.
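Combining the counts into a labelled vector might look like the sketch below. It covers only three taxa, the counts are made up, and the letter h for lichens is an arbitrary choice; in the tutorial you would include one value per taxon.

```r
# Toy richness values (made up); in practice these come from length(unique()).
a <- 2  # beetles
b <- 2  # birds
h <- 1  # lichens

# Chain the values into one vector.
biodiv <- c(a, b, h)

# Attach a label to each value, in the same order as the values.
names(biodiv) <- c("Beetle",
                   "Bird",
                   "Lichen")

biodiv
```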

Notice:

  • The spaces in front of and behind <- and after , are added to make it easier to read the code.
  • All the labels have been indented on a new line - otherwise the line of code gets very long and hard to read.
  • Take care to check that you are matching your vector values and labels correctly - you wouldn't want to label the number of beetles as lichen species! The good thing about keeping a script is that we can go back and check that we have indeed assigned the number of beetle species to a. Even better practice would have been to give more meaningful names to our objects, such as beetle_sp, bird_sp, etc.
  • If you highlight a bracket ) with your mouse, RStudio will highlight its matching one in your code. Missing brackets, especially when you start nesting functions like we did earlier with length(unique()), are one of the most common sources of frustration and error when you start coding!

We can now visualise species richness with the barplot() function. Plots appear in the bottom right window in RStudio.
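In its simplest form the call needs nothing but the named vector. The sketch below uses made-up values for three taxa.

```r
# Toy named vector of species richness (values are made up).
biodiv <- c(Beetle = 2, Bird = 2, Lichen = 1)

# Draw the bars; the vector's names become the bar labels.
# barplot() also returns the x positions of the bar midpoints.
bar_midpoints <- barplot(biodiv)
```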

Ta-daaaa! But there are a few things not quite right that we should fix - there are no axis titles, not all column labels are visible, and the value for plant species (n = 521) exceeds the highest value on the y axis, so we need to extend it. The great thing about R is that you don't need to come up with all the code on your own - you can use the help() function and see what arguments you need to add in. Look through the help output, what code do you need to add in?
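One possible set of fixes is sketched below. The plant value (521) comes from the text above, while the other two counts and the axis titles are made up for illustration. The arguments used (xlab, ylab, ylim, cex.names, cex.axis, cex.lab) are all described in the barplot help page.

```r
# help(barplot)  # opens the help page listing the available arguments

# Toy richness vector: 521 plants as in the text, other values made up.
biodiv <- c(Beetle = 40, Bird = 90, Flowering.Plants = 521)

bar_midpoints <- barplot(
  biodiv,
  xlab = "Taxa",               # x axis title
  ylab = "Number of species",  # y axis title
  ylim = c(0, 600),            # extend the y axis beyond the tallest bar
  cex.names = 1.5,             # size of the bar labels
  cex.axis = 1.5,              # size of the axis tick labels
  cex.lab = 1.5                # size of the axis titles
)
```

If some bar labels are still hidden, making the plot window wider or lowering cex.names usually helps.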

We also want to save our plot. To do this, click Export in the Plots window. If you don't change the directory, the file will be saved in your working directory. You can adjust the dimensions to get the bar chart to look how you like it, and you should also add in a meaningful file name - Rplot01.png won't be helpful when you try to find the file later.

You can also save your file by wrapping the code in the png() and dev.off() functions, which respectively open and shut down the plotting device.
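A sketch of that wrapping is shown below, with made-up values. It saves to R's temporary folder so it runs anywhere; in your own script you would give a file name such as 'barplot.png' and it would land in your working directory. The width and height, in pixels, are arbitrary choices.

```r
# Toy richness vector (values are made up).
biodiv <- c(Beetle = 2, Bird = 2, Lichen = 1)

# Temporary location used for this demo only.
plot_file <- file.path(tempdir(), "barplot.png")

png(plot_file, width = 1600, height = 600)  # open the png plotting device
barplot(biodiv,
        xlab = "Taxa",
        ylab = "Number of species",
        ylim = c(0, 3))
dev.off()                                   # close the device to write the file

file.exists(plot_file)
```

Nothing appears in the Plots window while the png device is open, because the plot is being drawn into the file instead.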

Figure 1. Species richness of several taxa in Edinburgh. Records are based on data from the NBN Gateway during the period 2000-2016.

In the last section we created vectors, i.e. a series of values, each with a label. This object type is suitable when dealing with just one set of values. Often, however, you will have more than one variable and multiple data types - e.g. some continuous, some categorical. In those cases, we use data frame objects. Data frames are tables of values: they have a two-dimensional structure with rows and columns, where each column can have a different data type. For instance, a column called 'Wingspan' would have numeric values measured on different birds (21.3, 182.1, 25.1, 8.9), and a column 'Species' would have character values with the names of the species ('House sparrow', 'Golden eagle', 'Eurasian kingfisher', 'Ruby-throated hummingbird'). Another possible data format is a matrix - a matrix can have several rows of data as well (e.g. you can combine vectors into a matrix), but all the values must be of the same type (for instance, all numeric), and all the columns must have the same number of rows.

A note on good housekeeping: ALWAYS keep a copy of your raw data as you first collected it. The beauty of manipulating a file in an R script is that the modifications live in the script, not in the data. For Photoshop-savvy people, it's like adding layers to an image: you're not altering the original photo, just creating new things on top of it. That said, if you wrote a long piece of code to tidy up a large dataset and get it ready to analyse, you may not want to re-run the whole script every time you need to access the clean data. It's therefore a good idea to save your shiny new object as a new csv file that you can load, ready-to-go, with just one command. We will now create a data frame with our species richness data, and then save it using write.csv().

We will use the data.frame() function, but first we will create an object that contains the names of all the taxa (one column) and another object with all the values for the species richness of each taxon (another column).
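Those steps could be sketched as follows. The object names (taxa, taxa_f, richness, biodata), the three taxa and their counts are illustrative choices. The file is written to a temporary folder so the example runs anywhere, and row.names = FALSE is an optional extra that leaves R's row numbers out of the file.

```r
# One object holding the taxon names, stored as a factor...
taxa   <- c("Beetle", "Bird", "Lichen")
taxa_f <- factor(taxa)

# ...and another holding the richness of each taxon, in the same order.
richness <- c(2, 2, 1)  # made-up values

# Combine the two into a data frame with two columns.
biodata <- data.frame(taxa_f, richness)
str(biodata)

# Save the data frame as a csv file (temporary location for this demo).
out_file <- file.path(tempdir(), "biodata.csv")
write.csv(biodata, file = out_file, row.names = FALSE)

# Load it back with a single command.
read.csv(out_file)
```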

If we want to create and save a barplot using the data frame, we need to slightly change the code - because data frames can contain multiple variables, we need to tell R exactly which one we want it to plot. Like before, we can specify columns from a data frame using $:
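A sketch of the data frame version follows, using a made-up biodata object with assumed column names taxa_f and richness. The heights go in as biodata$richness, and the labels are supplied separately through names.arg.

```r
# Toy data frame of species richness (values are made up).
biodata <- data.frame(
  taxa_f   = factor(c("Beetle", "Bird", "Lichen")),
  richness = c(2, 2, 1)
)

# Temporary location used for this demo only.
plot_file <- file.path(tempdir(), "barplot2.png")

png(plot_file, width = 1600, height = 600)
barplot(biodata$richness,                          # the column to plot
        names.arg = as.character(biodata$taxa_f),  # labels under the bars
        xlab = "Taxa",
        ylab = "Number of species",
        ylim = c(0, 3))
dev.off()
```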

In this tutorial, we found out how many species from a range of taxa have been recorded in Edinburgh. We hope you enjoyed your introduction to R and RStudio - the best is yet to come! Keen to make more graphs? Check out our Data Visualisation tutorial!

For common problems in R and how to solve them, as well as places where you can find help, check out our second tutorial on troubleshooting and how to find help online. Feeling ready to go one step further? Learn how to format and manipulate data in a tidy and efficient way with our tidyr and dplyr tutorial.

  1. You are familiar with the RStudio interface
  2. You can create and annotate a script file
  3. You can import your own datasets into RStudio
  4. You can check and explore data
  5. You can make simple figures

Still with us? Well done! If you're completely new to R, don't worry if you don't grasp quite everything just yet. Go over the sections you found difficult with a fresh eye later, or check our resources to get up to speed with certain concepts.

If you've already caught the coding bug, we have a challenge for you that builds on what we have learned today.

Here are (fictional) values of the wingspan (in cm) measured on four different species of birds. Can you produce a bar plot of the mean wingspan for each species and save it to your computer? (What could the function for calculating the mean be? Think simple)

bird_sp        wingspan
sparrow        22
kingfisher     26
eagle          195
hummingbird    8
sparrow        24
kingfisher     23
eagle          201
hummingbird    9
sparrow        21
kingfisher     25
eagle          185
hummingbird    9

Solution

Don't peek until you've tried! Here we suggest a solution; note that yours could be different and also work! The object names and the look of your plot will probably be different and that's totally ok - as long as the values themselves are correct.

Ready? Click this line to view the solution
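One possible solution, sketched with the wingspan values from the challenge. The object names, file name and axis limits are assumptions, and yours may well differ.

```r
# Mean wingspan (cm) for each species, from the three measurements given
sparrow     <- mean(c(22, 24, 21))
kingfisher  <- mean(c(26, 23, 25))
eagle       <- mean(c(195, 201, 185))
hummingbird <- mean(c(8, 9, 9))

# Chain the four means together into a labelled vector
wingspan <- c(sparrow, kingfisher, eagle, hummingbird)
names(wingspan) <- c("sparrow", "kingfisher", "eagle", "hummingbird")

# Save the bar plot to a file
plot_file <- file.path(tempdir(), "wingspan_plot.png")
png(plot_file, width = 800, height = 600)
barplot(wingspan, xlab = "Bird species", ylab = "Mean wingspan (cm)",
        ylim = c(0, 200))
dev.off()
```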

And the final plot would look something like this:


Doing this tutorial as part of our Data Science for Ecologists and Environmental Scientists online course?

This tutorial is part of the Stats from Scratch stream from our online course. Go to the stream page to find out about the other tutorials part of this stream!

If you have already signed up for our course and you are ready to take the quiz, go to our quiz centre. Note that you need to sign up first before you can take the quiz. If you haven't heard about the course before and want to learn more about it, check out the course page.

Glossary:

To recap, here are a few important terms we learned in this lesson:

  • argument: an element of a function, either essential or optional, that informs or alters how the function works. For instance, it can be a file path where the function should import from or save to: file = 'file-path'. It can modify the colours in a plot: col = 'blue'. You can always find which arguments are taken by a function by typing ?function-name into the command line.
  • class: the type of data contained in a variable: usually character (text/words), numeric (numbers), integer (whole numbers), or factor (grouping values, useful when you have multiple observations for sites or treatments in your data).
  • command: a chunk of code that performs an action, typically contains one or more functions. You run a command by pressing 'Run' or using a keyboard shortcut like Cmd+Enter, Ctrl+Enter or Ctrl+R.
  • comment: a bit of text in a script that starts with a hashtag # and isn't read as a command. Comments make your code readable to other people: use them to create sections in your script and to annotate each step of your analysis.
  • console: the window where you can type code directly in the command line (2+2 followed by Enter will return 4), and where the outputs of commands you run will show.
  • data frame: a type of R object which consists of many rows and columns; think Excel spreadsheet. Usually the columns are different variables (e.g. age, colour, weight, wingspan), and rows are observations of these variables (e.g. for bird1, bird2, bird3).
  • csv file: a type of file commonly used to import data in R, where the values of different variables are compressed together (a string, or line of values per row) and separated only by commas (indicating columns). R can also accept Excel (.xlsx) files, but we do not recommend it as formatting errors are harder to avoid.
  • function: code that performs an action, and really how you do anything in R. Usually takes an input, does something to it, and returns an output (an object, a test result, a file, a plot). There are functions for importing, converting, and manipulating data, for performing specific calculations (can you guess what min(10,15,5) and max(10,15,5) would return?), making graphs, and more.
  • object: the building blocks of R. If R was a spoken language, functions would be verbs (actions) and objects would be nouns (the subjects or, well, objects of these actions!). Objects are called by typing their name without quotation marks. Objects store data, and can take different forms. The most common objects are data frames and vectors, but there are many more, such as lists and matrices.
  • package: a bundle of functions that provide functionality to R. Many packages come automatically with R, others you can download for specific needs.
  • script: Similar to a text editor, this is where you write and save your code for future reference. It contains a mix of code and comments and is saved as a simple text file that you can easily share so that anyone can reproduce your work.
  • vector: a type of R object with one dimension: it stores a line of values which can be character, numeric, etc.
  • working directory: the folder on your computer linked to your current R session, where you import data from and save files to. You set it at the beginning of your session with the setwd() function.
  • workspace: this is your virtual working environment, which contains all the functions of the packages you have loaded, the data you have imported, the objects you have created, and so on. It's usually best to start a work session with a clear workspace.

Stay up to date and learn about our newest resources by following us on Twitter! We would love to hear your feedback; please fill out our survey! Contact us with any questions on ourcodingclub@gmail.com

here. Obviously for many users RStudio offers a superior way to work, and as far as I know works with all of the material presented in these tutorials and with all the packages included. (However, some of the miscellaneous scripts at the bottom of the LabDSV R page will cause RStudio to crash.) Personally, I'm an old-school guy, and I'm not completely sold on RStudio. As a Linux user, it solves problems I don't have and it encourages (if not enforces) a way of working I don't appreciate. Nonetheless, since so many people find it useful I'll contribute some thoughts on best practices using RStudio with LabDSV.

If you just want advice on best practices in RStudio, skip down to here and avoid my rant.

The Good

Projects

One of the excellent things RStudio does is encourage users to define 'projects' and work within the projects. This keeps all the related data and work together in a single workspace and separates it from other projects you might be working on. Windows users, in particular, that start R by clicking on the icon on their desktop often end up with much (or all) of their work in a single workspace, commingling work from many potentially different projects. R of course has facilities to work with separate projects natively, but users enamored of the desktop often do not use them to full extent. So, RStudio projects are a good idea. You will certainly want a 'project' for the labs presented in LabDSV, and you might want to separate the labs into separate projects like 'species modeling', 'ordination', and 'clustering.'

Script Editor

RStudio encourages users to work from a script file rather than directly at the console. In addition, it provides a text editor for those script files that is much superior to generic Wordpad. (Word obviously should never be used to write R scripts.) As a bonus, the editor can be made to mimic vi, but that's perhaps moot to most Windows and even Mac users.

Tool Tips

When you use the console or script editor window RStudio pops up the command-line version of tool-tips, providing the list of arguments for any function you start to enter. This can be really helpful for commands you're less familiar with and makes it unnecessary to start up the help file just to get the command arguments. Bravo for this innovation.

The Bad

Panel Format

By default, RStudio starts up with a four-panel configuration packed into a single frame. This packaging is very Windowsesque and in my mind impractical. The distribution (and number) of window types within the frame is configurable, and each of the windows may have tabs allowing you to select exactly what will appear in the window. However, even if you have a large, high resolution monitor the four panel arrangement tries to pack too much into too small a space and results in undesirable compression of information.

Command Redirection

Even if you work in the console, RStudio traps and redirects many simple R commands. In particular, I dislike the way it crams R help files into a tiny little window. Normally, ?command will produce a formatted output to the console that is easily negotiated with the arrow keys. If you want to stretch out, help.start() pops up the help file system in your browser with full hot-link capability. It's the same browser with the same capability you're already familiar with, and you can shrink it or pop it behind your R session to get it out of the way when you don't need it. It's a vastly superior solution to the RStudio box, but RStudio captures the help.start() command and redirects it to the little box. Bummer.

Wasted Space

Some of the little windows are of limited utility and waste space, e.g. the Environment, History, Connections window. The information it provides is normally easily printed to the console (if and when you want it) except that now RStudio traps those function calls. For example, history(100) will call up the last 100 commands entered at the console which you can browse and manage with your arrow keys. You can easily cut-and-paste commands from your history into the console. Instead, in RStudio it redirects the output to the little Environment, History, Connections box. Bummer.

The Environment, History, Connections window provides the Environment tab which gives a summary of objects in your workspace. It's more detail than the ls() command provides, but not nearly as much information as str() provides. Clicking on the blue arrow button will provide the str() output, however. Maybe it's helpful, but ls() and str() in the console provide the same information without taking up real estate on your screen unless you specifically want to see something.
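For readers who have not used them, a short sketch of the console equivalents. The two example objects are invented for illustration; ls.str() is a standard utility that combines both views.

```r
# Two example objects (names and values are arbitrary)
counts <- c(3, 7, 12)
sites  <- data.frame(site = c("A", "B"), elevation = c(1200, 1850))

ls()          # just the names of the objects in the workspace
str(counts)   # compact structure of one object
str(sites)

# ls.str() lists every object together with its structure
ls.str()
```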

Data File Import

Importing data into R seems to be a significant problem for many users. Admittedly the plethora of optional arguments for read.table() and the profusion of read.whatever() functions makes things a little funky. You might think that this is one area where a GUI could improve things. Unfortunately, this appears not to be the case.

In RStudio there are at least two ways to import data: (1) using the File menu and selecting 'Import Dataset', or (2) clicking on a file in the File tab in the 'Files, Plots, Etc' panel. Unfortunately (maybe deliberately?) they do things differently. The File menu approach first pops up a list of import format options. Notably, it offers a choice between 'From Text (base)...' and 'From Text (readr)...'. This is important because base and readr differ significantly in the data formatting they support. If you choose 'From Text (base)' you get a file chooser. Selecting a file opens the import GUI with options to the left, a file previewer to the top right, and a data.frame previewer to the bottom right. This part is nice. Unfortunately, it doesn't offer a code preview of the R function that will ultimately be called to do the import. After clicking on 'Import' you can see in the console that it uses read.table(). Importantly, it offers options for choosing or setting row.names and a box to click for 'stringsAsFactors'. It follows the read.table() protocol that if the first row has one fewer entry than the rest of the lines then the first line is assumed to be a header with column names. It offers a fairly limited set of separators (e.g. not including |), but is otherwise fairly flexible and functional.
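The read.table() header convention described above can be tried directly in the console. This is a minimal sketch with an invented pipe-delimited file; the file contents and the object name veg are assumptions.

```r
# Write a small pipe-delimited file whose first line has one fewer entry
# than the data lines (contents invented for illustration)
tmp <- tempfile(fileext = ".txt")
writeLines(c("elev|slope",
             "plot1|1200|5",
             "plot2|1850|12"), tmp)

# With header left unspecified, read.table() notices the short first line,
# treats it as column names, and uses the first column as row names
veg <- read.table(tmp, sep = "|")

rownames(veg)   # "plot1" "plot2"
colnames(veg)   # "elev" "slope"
```

Note that the console has no trouble with the | separator that the GUI does not offer.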

The second option (clicking on a file name in the File panel) is quite different. First, it only offers the option to import files with a limited number of file extensions (maybe just .csv?). Files with .dat or .txt extensions cannot be imported. Second, even if package readr is not loaded, its first choice is to load readr and then use read_csv(). For example, for a file called test.csv, it provides a 'Code Preview' that looks like

    library(readr)
    test <- read_csv('test.csv')
    View(test)

There doesn't seem to be an option to choose (base) read.table(), but you can edit the Code Preview and change whatever you like. To the lower left is a panel with options to select. This time, the Delimiter dropdown menu offers 'other' and you can specify '|' for example, which automatically changes the Code Preview to use read_delim() instead of read_csv(). Unfortunately, read_delim() seems incapable of reading files where the first line has fewer entries than the rest of the lines. The previewer gives no warning (although you can see the headings are misaligned if you look). When you click 'Import', however, the console fills with error messages ending with

In rbind(names(probs), probs_f) : number of columns of result is not a multiple of vector length (arg 1)

That's a fairly cryptic error message for a parsing failure, but the material that comes before that is somewhat more helpful. Curiously, despite the copious error messages it does not abort the import and happily reads the data into the wrong columns, pruning off the last column of data instead of manufacturing a column heading for the last column. As far as I know, this is standard behavior for readr functions, but it's disturbing.

Since read_delim() is a readr function, it returns a tibble instead of a data.frame whether that's what you want or not. I STRONGLY advise against using tibbles in LabDSV; row.names are critical and tibbles often mangle them. There is obviously more to say, but I'll leave it for the moment.

If you're really stubborn, you can edit the code in the Code Preview window to do what you want, but every time you touch something in the options panel RStudio will rewrite the Code Preview window to its liking. Frankly, it's much easier to use the console.

The Ugly

Ridiculous Graphics

RStudio totally hoses the outstanding graphics capabilities of R. By default, in RStudio graphics get plotted to a little window in a box that is much too small to allow for a decent font or resolution. It's cramped and ugly. Worse yet, it decides on the X and Y limits depending on the size and shape of that little box. There is an option to 'zoom' the plot, which produces another re-sizable graphics window. The window is re-sizable, but it's not possible to specify an actual dimension or font point size so that it's difficult to simulate a graphic you would include in a lab report or publication. ?RStudioGD offers no help.

Much worse, from the perspective of LabDSV, while you can resize it (with 'automatic' font and glyph size re-scaling) you cannot interact with the zoomed window; you still have to precisely identify points in the tiny window. You can (and generally have to) resize the box (at the expense of the console and script editor) to actually see what you're doing. Equally bad, it doesn't render the points as you draw (just pops up a stupid blue bubble), and you can't see what you have drawn until you click 'finish' or 'ESC'. It's ridiculously easy to make mistakes, and if you do, all you can do is start over. RStudio users should avoid the RStudioGD graphics device like the plague.

Worse yet, in my mind, it only offers you a single window. It stores all of your plots and you can recall them later if you want, but you can't put two plots up side-by-side to compare. You can do par(mfrow=c(1,2)) to get two plots side-by-side, but now their aspect ratios are squished and ridiculous until you resize the box again at the expense of the console and script editor windows. It's difficult to switch back and forth between them and interact with them. It's lame.
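For reference, a minimal sketch of the par(mfrow) layout drawn to a device with an explicit size, so the panel shapes do not depend on the size of a docked plot pane. The file name, dimensions and the two toy plots are assumptions.

```r
# Draw to a file device with an explicit size: 10 x 5 inches gives each
# of the two panels a roughly square area
out <- file.path(tempdir(), "side_by_side.pdf")
pdf(out, width = 10, height = 5, pointsize = 12)

old_par <- par(mfrow = c(1, 2))   # 1 row, 2 columns; note the c()
plot(1:10, (1:10)^2, main = "Quadratic")
plot(1:10, sqrt(1:10), main = "Square root")
par(old_par)                      # restore the previous layout

dev.off()
```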

I routinely have two or three graphics windows open with specified sizes, font sizes and families. I can pop them to the front or back with the window manager and put them side-by-side or atop each other easily. It's beautiful.

The Script Editor

As I noted above, providing a built-in script editor is arguably a good idea, especially if you're stuck using Windows. If I had to use someone else's Windows computer that lacked a decent text editor I would undoubtedly be delighted to have the one inside RStudio.

While in exploratory mode I generally use R directly from the console. Encouraging users to write reproducible scripts is a good idea as their work matures. However, I insist 'It's only a script if you source it.' Instead, what I observe is that students enter all their interactive coding into the script window, as opposed to the console, and then highlight specific lines and click on 'Run' to get it transferred down to the console. It's not just that it's inefficient (and it is, just type the command in the freakin console), it leads to horrible habits. The script window code gives the impression of a specific order of execution, but when you just highlight random lines and click on 'Run' you have no real order of operation. You could easily highlight the same line twice and get different results because you have executed other commands in the meantime. Now arguably this is user error and not RStudio's fault, but it should be impossible to execute specific commands without re-executing all lines in the script that precede that line. So, instead of a script, at best the script editor becomes a closet of R code, and at worst a junk drawer of R code. It's dangerous.
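To illustrate what sourcing a script means in practice, here is a small sketch. The script contents and variable names are invented, and running it in a fresh environment is one optional way to make sure nothing left over in the workspace affects the result.

```r
# Write a tiny script to a temporary file (contents are illustrative)
script_file <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)",
             "x_mean <- mean(x)",
             "x_scaled <- x / x_mean"), script_file)

# source() runs every line in order, top to bottom; using a new
# environment means the result depends only on the script itself
env <- new.env()
source(script_file, local = env)

env$x_scaled   # 0.5 1.0 1.5
```

Run this way, the order of execution is always the order of the lines in the file, which is exactly the guarantee that highlighting lines and clicking 'Run' does not give.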

The Insufferable

Window Manager

RStudio insists on managing your windows according to its own rules. On my computer, to cut-and-paste I use the first and second mouse buttons with no need to use the keyboard. In RStudio to transfer code from the console to the script window I have to use the heinous CTRL-C/CTRL-V mantra. Worse, on my computer I use 'focus follows mouse.' I.e., whatever window my mouse cursor is in gets focus and I don't have to click in the window to get the old one to let go and get the new one to pay attention. Having to click in a window before you can type in it is like having to punch somebody in the nose before you can talk to them; it's violent and unnecessary. And it's infuriating when you are constantly typing in the wrong window after deliberately moving the cursor to the window you want. Students have heard many an expletive from me when I have to work on Windows, and RStudio does its damnedest to convert my Linux machine to Windows.

Command Redirection

In R I can generally execute the system() command to get to the shell interpreter and execute miscellaneous commands. In RStudio I can sometimes do that, i.e. system('ls -l') works and gives a list of files from my current directory in the console. On the other hand, something as simple as system('more filename.txt') crashes RStudio and you lose all your work. It's unforgivable. There is a separate 'terminal' tab in the console window that allows interaction with the system, but it's another unnecessary nuisance in using RStudio.

Script Editor and fix()

In R you can enter fix(function_name) and pop up an editing session with the editor of your choice. When you are done, you exit the editor and it saves a copy of your function into your current workspace. If you are comfortable in your editor it's a Godsend. In RStudio, if you enter fix(function_name) it pops up a ridiculous little postage stamp of an editor and completely ignores your choice of editor. Even though I specified vi emulation in the script editor the fix() editor ignores that and gives me some primitive editing functions. To further the aggravation, the window is not re-sizable, but rather offers a horizontal scrolling bar if your text is wider than some small number of characters. Give me a break!

Instead, you have to use the script editor (even though you're writing a function, not a script). When you're done, however, you can't just enter the exit command and save the function to your workspace. You can write it to a file with the 'Source on Save' box ticked, or you can highlight all the lines and click on the Run button. However, that saves the whole thing to your console one line at a time, scrolling off anything else you were interested in and clogging up your .Rhistory file unnecessarily. It's lame.

Best Practices in RStudio

If you find that RStudio provides a better way to interact with R then by all means make use of it. However, I strongly suggest the following:

  • Do NOT import data using the RStudio file-based data import tool. It will hose your data and make your life miserable. The drop-down menu route allows read.table() and is semi-functional, but certainly no better than the console.
  • Take your mouse, grab the vertical bar separating the script window and console from the other two windows and push the bar all the way to the right to eliminate the 'Environment, History, Connections' and 'Files, Plots, etc' windows. You can always get them back if you want, but get them out of the way.
  • Click in the console window, and enter x11() to pop up a floating graphics window to plot to. If you're on a Mac, enter quartz(). As I noted above, you can specify a specific window size and font point size and family if you prefer. Enter the command more than once if you want to do side-by-side comparisons or have multiple plots visible at the same time. The windows will get numbered (starting with 2) and you can specify which window is the current device with dev.set(2) for example.
  • If you're not actually writing a script intended to be run from the first line to the last, enter your commands directly in the console, not the script editor. Don't worry, you have permission to enter text there; it doesn't belong to RStudio exclusively. The order of operation is saved to the .Rhistory file so you know exactly what you have done. Later, you can easily edit the .Rhistory file into a script if desired.
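A sketch of the floating-window suggestion above. The helper name open_plot_window is hypothetical, and the sizes are arbitrary. The device numbers you get may differ if another device is already open, so the sketch looks them up with dev.list() rather than assuming them.

```r
# Open a floating graphics window with an explicit size (in inches) and
# font point size; quartz() on a Mac, x11() elsewhere
open_plot_window <- function(width = 7, height = 5, pointsize = 12) {
  if (Sys.info()[["sysname"]] == "Darwin") {
    quartz(width = width, height = height, pointsize = pointsize)
  } else {
    x11(width = width, height = height, pointsize = pointsize)
  }
}

# Only try to open windows in an interactive session with a display
if (interactive()) {
  open_plot_window()                        # first window
  open_plot_window(width = 5, height = 5)   # second window
  open_devices <- dev.list()                # which devices are open
  dev.set(open_devices[1])                  # make the first one current
  plot(1:10, main = "Drawn in the first window")
}
```

Both x11() and quartz() also accept a family argument if you want to set the font family for a window.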