R AnalyticFlow First Guide

Version 1.0.6

Nov 25, 2011


Chapter 1. Introduction

About R AnalyticFlow

R AnalyticFlow lets you carry out a wide variety of data analyses by drawing flowcharts. Each process in a flowchart is written in the R language, a powerful programming language for data analysis. When you run a flowchart, it is transformed into an R script and executed by the R system. Once a flowchart is drawn it is easy to run, so you can effectively share data analysis processes even with people who do not use R.

As of November 2009, more than 2,000 additional packages for R are available, implementing state-of-the-art data analysis methods for various domains. In addition, R AnalyticFlow supports conversion between R scripts and flowcharts. R AnalyticFlow makes the great data analysis engine R more useful and convenient.

About this document

This document teaches you how to use R AnalyticFlow in a tutorial style. First you install the software by following the instructions here. Then you learn the basic skills needed to conduct data analysis with R AnalyticFlow.

Chapter 2. Installation of preview edition

Installation on Windows

This section explains how to install R AnalyticFlow on Windows. Follow these instructions to install the software properly.

Note

All installation steps should be performed with administrator privileges.

System Requirements

  • OS: Windows (XP SP3 recommended. May run on 7 / Vista.)

  • R (2.5.1 to 2.14.x, 32-bit)

Download

The installer for Windows can be obtained from the R AnalyticFlow website.

Install R

R is required for R AnalyticFlow. To obtain R, visit a CRAN (Comprehensive R Archive Network) mirror near you.

Install R AnalyticFlow

Note

If you have an earlier version of R AnalyticFlow on your system, please uninstall it before proceeding.

Run the R AnalyticFlow installer and follow its instructions to install the software.

After the installation, a desktop icon is created. Double-click the icon to start R AnalyticFlow. If the installation is successful, two windows will appear after the initial configuration.

If you encounter an error or the program does not run, uninstall all the software installed here and reinstall it in the order above with administrator privileges.

Installation on Linux

This section explains how to install R AnalyticFlow on Linux. Follow these instructions to install the software properly.

Note

All installation steps should be performed with administrator privileges. This document assumes that you are in sudoers, that is, you can use the sudo command to act as the superuser.

System Requirements

  • Java SE Development Kit (Version 6)

  • R-2.5.1 or higher

  • R add-on packages (codetools and rJava)

As of November 2009, the software has been briefly tested on the following distributions. The descriptions in this document are based on them.

  • Fedora 8

  • Ubuntu 7.10

Note

This document is written to be useful; however, some parts might not apply to your environment. We would appreciate it if you could share information on the adjustments needed for particular systems.

Download

Download the tar.gz archive from the download page of the R AnalyticFlow website.

Install Java

The Sun Microsystems Java SE Development Kit (Version 6) is required. The JDK is available from the Sun Microsystems website. The Linux installation process is explained in the installation notes by Sun Microsystems.

Ubuntu (or other Debian-based distributions)

The following command is available for Ubuntu (sources.list may need to be edited).

sudo aptitude install sun-java6-jdk

Install R

R-2.5.1 or higher is required for R AnalyticFlow. To obtain R, visit a CRAN (Comprehensive R Archive Network) mirror near you.

Ubuntu (or other Debian-based distributions)

The following command is available (sources.list may need to be edited).

sudo aptitude install r-base

Fedora

The following command is available (the Fedora package is named R).

sudo yum install R

R configuration

Execute the following command from console:

sudo R CMD javareconf

If this does not work, JAVA_HOME may need to be specified, as in the following example:

sudo R CMD javareconf JAVA_HOME=/usr/java/default

Install R add-on packages

Run R from console:

sudo R

Execute the following from the R console:

install.packages(c("codetools", "rJava"))
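
After the packages are installed, a quick check like the one below can confirm that R meets the stated requirements. It is only a sketch: the names `required`, `missing_pkgs` and `r_ok` are illustrative and are not part of R AnalyticFlow, and the check only looks for the packages without loading them.

```r
# Minimum R version and add-on packages listed under System Requirements.
r_ok <- getRversion() >= "2.5.1"
required <- c("codetools", "rJava")

# installed.packages() returns a matrix whose row names are package names.
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(required, installed)

if (!r_ok) {
  message("R is older than 2.5.1: ", R.version.string)
}
if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If a package is reported as missing, run the install.packages command above again and read its messages for the cause.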

Run R AnalyticFlow

First extract the tar.gz archive:

tar xzvf RAnalyticFlow_Linux_1.0.6.tar.gz

This creates a directory named "RAnalyticFlow_1.0.6". Execute the following command in that directory to run R AnalyticFlow. If the installation is successful, two windows will appear after the initial configuration.

./rflow &

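Finally, make a symbolic link in a directory on the path. Use an absolute path as the link target so that the link resolves correctly. For example, from within the extracted directory:

sudo ln -s "$(pwd)/rflow" /usr/local/bin/rflow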

Troubleshooting

Character corruption occurs in the R console

Set the environment variable LANG to a locale with UTF-8 encoding. To use Japanese, for example, type the following:

export LANG=ja_JP.UTF-8

In the same way, use zh_CN.UTF-8 for simplified Chinese, and so on.
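
To see which locale R actually picked up, the following lines can be typed into the R console. This is a small sketch using base R functions only; `utf8_locale` is an illustrative variable name.

```r
# Value of LANG as seen by the R process ("" if it is not set).
Sys.getenv("LANG")

# Character-type locale currently in effect.
Sys.getlocale("LC_CTYPE")

# l10n_info() reports whether the current locale uses UTF-8.
utf8_locale <- isTRUE(l10n_info()[["UTF-8"]])
utf8_locale
```

If `utf8_locale` is FALSE, the LANG setting has probably not reached the R session, for example because it was exported in a different shell.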

Some characters are displayed as ◻ in the R console

The fonts might not be installed properly. The following examples use the Japanese Sazanami fonts. In Debian, type

sudo aptitude install ttf-sazanami-gothic
sudo aptitude install ttf-sazanami-mincho

to install the Sazanami fonts.

In Fedora 8,

mkdir /usr/java/default/jre/lib/fonts/fallback
cd /usr/java/default/jre/lib/fonts/fallback
ln -s /etc/X11/fontpath.d/sazanami-fonts-gothic/sazanami-gothic-ttf

to make the JRE use the Sazanami font.

Installation on Mac OS X

This section explains how to install R AnalyticFlow on Mac OS X. Follow these instructions to install the software properly.

Note

All installation steps should be performed with administrator privileges.

System Requirements

  • Java SE Runtime Environment Version 6

  • R-2.5.1 or higher

  • R add-on packages (codetools and rJava)

Note

As of December 2009, a quick validation has been done with the following environment: Mac OS X 10.5 (Leopard), Java SE 6 (64-bit) and R-2.10.1 (R-2.10.1.pkg, Three-way universal binary).

Download

Download the zip file from the download page of the R AnalyticFlow website.

Install Java

Apple Java (SE 6) is required. The JDK is available from the Apple website.

Configure Java

R AnalyticFlow should be run under Java SE 6. Configure Java with the Java Preferences utility (under /Applications/Utilities/).

Install R

R-2.5.1 or higher is required for R AnalyticFlow. To obtain R, visit a CRAN (Comprehensive R Archive Network) mirror near you.

Install R add-on packages

Execute the following from the R console:

install.packages(c("codetools", "rJava"))

Run R AnalyticFlow

Double-click the zip file to expand it. Then drag and drop the expanded application icon into an appropriate folder (e.g. Applications) to complete the installation.

Double-click the application icon to run the program. If the installation is successful, two windows will appear after the initial configuration.

Setup

Launch R AnalyticFlow

Windows

To launch the software, double-click the desktop icon or the icon of an R AnalyticFlow document (extension .rflow). Alternatively, go to "Start" > "All Programs" > "R AnalyticFlow" and click "R AnalyticFlow".

Linux

Execute the following command:

rflow

To open an R AnalyticFlow document (.rflow) on start-up, specify the file name as follows:

rflow BostonAnalysis.rflow &

Then the file "BostonAnalysis.rflow" is opened.

Mac OS X

Double-click the application icon or the icon of an R AnalyticFlow document (extension .rflow).

Initial settings

The following settings are required for each user the first time the software is launched.

R Environment (Windows only)

Set the R home directory that this software uses (e.g., C:\Program Files\R\R-2.10.0).
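
If you are unsure where R is installed, R itself can report its home directory. The following lines, typed into any R console, are a small sketch of that check; `r_home` is just an illustrative variable name.

```r
# R.home() returns the directory in which the running R is installed.
r_home <- normalizePath(R.home())
r_home

# The version string helps confirm that this is the R you intend to use.
R.version.string
```

The path printed for `r_home` is the kind of value expected in this setting.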

Initial Working Directory

Set the default working directory for a new project. When the software is launched without an existing .rflow file, this directory is used as the working directory. If the specified directory does not exist, it will be created.

With this option on, you will be asked "Copy sample files to the working directory?" after clicking on the "OK" button. If you click on "Yes" (recommended), sample files will be copied to the default working directory specified here.

Update checking

Turn the update check on or off. If this option is enabled, the software automatically checks for updates when launched. An Internet connection is required for this function.

Language

Select the language of the user interface.

Screen layout

When R AnalyticFlow is launched, two windows appear as follows (the images were captured on Windows XP and may differ on other operating systems).

Main window

In this window you draw and run an analysis flow.

Console window

This window displays output from R. You can also input R commands directly in this window.

Quit R AnalyticFlow

To quit R AnalyticFlow, click on "File" > "Exit" from the main window menu, or click on the close button on the main window. A confirmation dialogue will appear if there is an unsaved change in the current flow.

Chapter 3. Tutorial

Introduction

About this tutorial

In this tutorial you will learn how to conduct data analysis using R AnalyticFlow. In this section you will learn the basic concept of R AnalyticFlow. Then in the next section you will learn basic operations of the software through illustrative examples.

About analysis flow

In R AnalyticFlow you conduct data analysis by drawing an analysis flow. Let us draw an analysis flow represented by the following short R script:

data(iris)
plot(iris[, 1:4], col = as.integer(iris$Species) + 1)
boxplot(Petal.Length ~ Species, data = iris, col = 3, main = "Petal.Length")

You do not need to understand the details of this script. In short, it performs the following steps in order:

  1. Loading data

  2. Drawing a scatterplot

  3. Drawing a boxplot

An analysis flow is a graphical representation of such a process. This example can be described as follows:

A graphic symbol (disk icon / triangle) is called a node, and an arrow between nodes is called an edge. The combination of nodes and edges represents the order in which processes are executed.

In R AnalyticFlow you draw such an analysis flow and conduct data analysis by executing processes according to the flow. An analysis flow makes it easier to get an overview of the data analysis process, which helps knowledge sharing go smoothly.

Basic operation

Launching R AnalyticFlow

Install R AnalyticFlow following Chapter 2, Installation of preview edition. See the section called “Setup” for initial setup, and launch the program.

Create a node

First, read the sample data for analysis. This tutorial uses the "iris" data set from the R sample data sets. It is a famous data set of measurements of iris flowers. For details, see Fisher (1936) or type help(iris) in the R console.

Basically, all processes in R AnalyticFlow are described by creating nodes. Here, create a node to read a data set. Click "Node" > "Create Simple Node" in the menu.

A Simple node represents a process which can be expressed by a single R expression. Into "Code", input

data(iris)

as follows:

Click on "OK", then a node is created on the flow area.

Execute a node

Creating a node does not execute its process. To execute the process described by a node, right-click the node and click "Run". Now run the data node we created. You will see the following output in the console window:

> data(iris)
> 

When you run a single node, the R code described in the node is sent to the console and executed. Here you can see that data(iris) has been executed and R is waiting for the next input.

Execute from console

R code can be executed directly from the console window. To execute the function head to look into the data, input as follows:

head(iris)

Press the Enter key to execute the code. The first few rows are displayed, so you can see that this data set has four quantitative variables (the length and width of the sepals and petals) and one qualitative variable (the iris species). Direct execution from the console is useful for quick checks of data like this.
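
Other quick checks can be run from the console in the same way. The lines below are a short sketch using only base R functions on the iris data; `col_types` is an illustrative variable name.

```r
data(iris)

# Number of rows and columns in the data set.
dim(iris)

# Compact overview of each variable and its type.
str(iris)

# Class of each column: the four measurements are numeric,
# and Species is a factor (a qualitative variable).
col_types <- sapply(iris, class)
col_types

# The species names, in the order R stores them.
levels(iris$Species)
```

Checks like these are usually tried in the console only and not kept as nodes in the flow.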

Tip

You can also create a simple node from the console by pressing the Control key (Command key on Mac) and the Enter key together after inputting code. You can experiment by trial and error in the console and keep only what is necessary in a flow.

Draw an analysis flow

Next, return to the main window to add another node. Click on a blank space in the flow area, and you will see the node created earlier as follows:

The brackets indicate that this node is the last one that has been executed. Click on this node to select it.

If a new node is created when another node is selected, an edge (arrow) is automatically drawn from the selected node to a new node. Click on "Node" > "Create Simple Node" from the menu, and input the following code:

plot(iris[, 1:4], col = as.integer(iris$Species) + 1)

An edge is drawn automatically, resulting in the flow as follows:

Run a flow

When you run a node in a flow, all nodes in the path are executed in order from the root node (the first node in the path). If there is a node with brackets in the execution path, execution starts from the node following the bracketed node. Right-click the plot node we created and run it. The graphics window displays a figure as follows:

A scatterplot matrix of the four quantitative variables is drawn. The colors of the points indicate Species, which suggests that the iris species can be discriminated well if these quantitative variables are used effectively.
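
The col argument in the plot call is what ties each point to its species. The sketch below unpacks that expression step by step; `species_code`, `point_col` and `col_key` are illustrative names introduced here.

```r
data(iris)

# A factor is stored as integer codes; as.integer() exposes them.
# The codes follow the order of levels(iris$Species).
species_code <- as.integer(iris$Species)

# Adding 1 shifts the codes to colour indices 2, 3 and 4,
# skipping colour 1 of the palette.
point_col <- species_code + 1

# One row per species, showing which colour index it receives.
col_key <- unique(data.frame(Species = iris$Species, col = point_col))
col_key

# The actual colours behind those indices in the current palette.
palette()[2:4]
```

The exact colours depend on the palette of your R version, but each species always gets its own index.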

Tip

Now look at the console window. You can see that only the plot function was executed, following the head function we executed earlier.

> data(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> plot(iris[, 1:4], col = as.integer(iris$Species) + 1)
> 

This is because the data node is bracketed (already executed), so there is no need to execute the earlier part of the path again. If you want to execute all the nodes in the path, use "Clear and Run" instead of "Run". It clears all the objects in the workspace and executes all the nodes on the path, from the root node to the node on which it is run.
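
In plain R terms, the difference between the two commands can be pictured roughly as below. This is a sketch of the idea only, based on the description above and not on the software's actual implementation; the environment `ws` is a stand-in for the workspace.

```r
# A separate environment plays the role of the workspace in this sketch.
ws <- new.env()

# "Run": the data node was executed earlier, so its object is still there
# and only the remaining step of the path needs to be executed.
data(iris, envir = ws)
had_data <- exists("iris", envir = ws, inherits = FALSE)

# "Clear and Run": first remove every object from the workspace ...
rm(list = ls(ws), envir = ws)
n_after_clear <- length(ls(ws))

# ... then execute the whole path again, starting from the root node.
data(iris, envir = ws)
plot(ws$iris[, 1:4], col = as.integer(ws$iris$Species) + 1)
```

Starting from a cleared workspace is a simple way to confirm that a flow does not depend on objects left over from earlier experiments.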

Edge operation

The scatterplot suggests that Petal.Length varies widely according to Species.

Let us draw a boxplot to examine this relationship more closely. Select the plot node and create a simple node with the following code:

boxplot(Petal.Length ~ Species, data = iris, col = 3, main = "Petal.Length")

Then the flow becomes as follows:

Since the boxplot node comes after the plot node in this flow, they will be executed in this order. This order is natural for exploratory analysis; however, it is unnecessary when you want to see each result separately.

Therefore we rearrange this flow so that the boxplot node comes directly after the data node, in the same way as the plot node does. There are several ways to do this; the easiest is simply to draw a new edge from the data node to the boxplot node.

First click on the data node (the source of the new edge) to select it:

Next middle-click (or Alt + click) on the boxplot node (the destination of the new edge).

Then a new edge is drawn, replacing the existing edge. To make the flow easier to read, drag the boxplot node to reposition it:

Now the edge has been replaced. The plot node (the last node executed) no longer comes before the boxplot node, so when the flow is run on the boxplot node, execution starts from the data node.

Save a flow

Finally save the flow we have drawn. Click on "File" > "Save As" on the menu to save the current flow. A saved flow can be loaded by clicking on "File" > "Open".

Next steps

So far, you have learned the basics of using R AnalyticFlow. Now you can create and use analysis flows for various types of analyses.

R AnalyticFlow has many other functions which are not covered in this tutorial. The following sources are available to learn about them.

Self-learning example

Self-learning example files are available to learn various functions of R AnalyticFlow. These samples are placed in the "Tutorial" directory under the default working directory, or in the "sample/Tutorial" directory under the installation directory.

Some nodes have comments. To see a comment, hover the mouse pointer over the node.

NodeExample.rflow

In this flow you can learn two basic types of nodes, and the relationships between nodes and icons.

BoxExample.rflow

In this flow you can learn about the box functions. A box is a special node that can contain a subflow (a part of a flow). With box functions, complex flows can be organized and simplified.

CacheExample.rflow

In this flow you can learn about the cache function. If cache is set on a node, computational results are automatically saved on the first run, and on later runs the saved results are loaded instead of being recomputed. Once you have run a time-consuming part of the analysis, you can smoothly continue with the remaining analysis.
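
The idea behind caching can be illustrated in plain R as follows. This is a minimal sketch, assuming results are stored with saveRDS and readRDS; it is not how R AnalyticFlow implements its cache, and `cached`, `slow_step` and `cache_file` are hypothetical names.

```r
# Hypothetical helper: compute a result once, then reuse the saved copy.
cached <- function(file, compute) {
  if (file.exists(file)) {
    readRDS(file)            # later runs: load the saved result
  } else {
    result <- compute()      # first run: do the real work
    saveRDS(result, file)    # and save it for next time
    result
  }
}

# A stand-in for a time-consuming analysis step.
slow_step <- function() {
  Sys.sleep(1)
  colMeans(iris[, 1:4])
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached(cache_file, slow_step)   # computes and saves
second <- cached(cache_file, slow_step)   # loads from the cache file
identical(first, second)
```

The second call skips the one-second wait, which is the effect the cache function provides for long-running nodes.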

Sample analyses

The following sample analyses are available. These samples are placed in the "SampleAnalysis" directory under the default working directory, or in the "sample/SampleAnalysis" directory under the installation directory.

IrisAnalysis.rflow

An analysis of the iris data used in this tutorial. This sample contains more detailed analyses, including creating a decision tree model to predict iris species and validating the prediction error of the model.
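
To give a feel for this kind of analysis, here is a minimal sketch of fitting and validating a decision tree on the iris data. It assumes the rpart package, which is normally distributed with R; the sample flow may use a different package, split or settings, and the names `test_idx`, `fit`, `pred` and `test_error` are illustrative.

```r
library(rpart)   # assumption: rpart is used here for the decision tree

data(iris)
set.seed(1)      # make the random split reproducible

# Hold out 50 of the 150 rows for validation.
test_idx <- sample(nrow(iris), 50)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Fit a classification tree predicting Species from the four measurements.
fit <- rpart(Species ~ ., data = train, method = "class")

# Predict the species of the held-out rows and compute the error rate.
pred <- predict(fit, newdata = test, type = "class")
test_error <- mean(pred != test$Species)
test_error
```

Each of these steps would typically become its own node when the analysis is drawn as a flow.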

An R script for the same analysis is placed in the "script" directory under the sample directory, as "IrisAnalysis.R".

BostonAnalysis.rflow

This is a sample analysis of the Boston housing data (Harrison and Rubinfeld (1978)).

This is a more practical example with a data mining framework. It includes the following analyses:

  • Exploratory data analysis

  • Transformation of variables

  • Dividing data for training and testing

  • Training and validating a predictive model

  • Writing the result of prediction to a file
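
The steps listed above can be sketched in plain R as follows. This is an illustration under stated assumptions, not the content of the sample flow: it takes the Boston data from the MASS package, uses a log transformation and a linear model chosen only for brevity, and writes to a temporary file. All variable names are illustrative.

```r
library(MASS)    # assumption: the Boston data frame from MASS is used

boston <- Boston

# Transformation of variables: crime rate on a log scale.
boston$log_crim <- log(boston$crim)

# Dividing data for training and testing (70% / 30%).
set.seed(1)
train_idx <- sample(nrow(boston), round(0.7 * nrow(boston)))
train <- boston[train_idx, ]
test  <- boston[-train_idx, ]

# Training a predictive model for the median house value (medv).
fit <- lm(medv ~ lstat + rm + log_crim, data = train)

# Validating it on the held-out data with the root mean squared error.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$medv - pred)^2))
rmse

# Writing the result of prediction to a file.
out_file <- file.path(tempdir(), "boston_prediction.csv")
write.csv(data.frame(actual = test$medv, predicted = pred),
          out_file, row.names = FALSE)
```

Opening BostonAnalysis.rflow shows how the sample itself arranges the corresponding steps as nodes.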

This example includes features that could not be explained in this tutorial, for example using a "complex node" (see the "Transform" node in the flow), writing data to a file, or leaving comments on a node (you can see a comment by resting the mouse pointer on the "Sampling" node).

Chapter 4. Conclusion

In this document you have learned the basics of using R AnalyticFlow. If you are familiar with R, you can create a new flow for your own analysis by clicking "File" > "New" in the menu. You can also transform existing R scripts into a flow with "File" > "Import from Source". Use R AnalyticFlow for various types of analyses.

R AnalyticFlow makes it more efficient to analyze data using R. In addition, this software is not only for R experts. By using R AnalyticFlow, users at any level can easily share and reuse data analysis processes created by R experts.

We hope that R AnalyticFlow improves data analysis by making it easier to communicate and share analysis processes.

References

[Fisher (1936)] The use of multiple measurements in taxonomic problems. R.A. Fisher. 1936. Annals of Eugenics. 7, Part II. 179–188.

[Harrison and Rubinfeld (1978)] Hedonic prices and the demand for clean air. D. Harrison and D.L. Rubinfeld. 1978. J. Environ. Economics and Management. 81–102.
