Version 1.0.6
Copyright © 2007, 2008, 2009, 2010, 2011 Ef-prime, Inc.
Nov 25, 2011
Table of Contents

R AnalyticFlow lets you carry out a wide variety of data analyses by drawing flowcharts. Each process in a flowchart is written in the R language, a powerful programming language for data analysis. "Running" a flowchart transforms it into an R script, which is then executed by the R system. Once a flowchart is drawn it is easy to run, so you can share data analysis processes effectively, even with people who do not use R.
As of November 2009, more than 2,000 additional packages were available for R, implementing state-of-the-art data analysis methods for various domains. In addition, conversion between R scripts and flowcharts is supported. R AnalyticFlow makes the great data analysis engine R more useful and convenient.
This document explains how to install R AnalyticFlow on Windows. Follow these instructions to install the software properly.
Note
All installation steps should be performed with administrator privileges.
OS: Windows (XP SP3 recommended; may run on Vista and 7)
R: version 2.5.1 to 2.14.x (32-bit)
The installer for Windows can be obtained from the R AnalyticFlow website.
R is required for R AnalyticFlow. To obtain R, visit a CRAN (The Comprehensive R Archive Network) mirror near you.
Note
If you have an earlier version of R AnalyticFlow on your system, please uninstall it before proceeding.
Run the R AnalyticFlow installer and follow its instructions to install the software.
After installation, a desktop icon is created. Double-click the icon to start R AnalyticFlow.
If the installation is successful, two windows appear after the initial configuration.
If you encounter an error, or the program does not run, uninstall all the software installed here and reinstall it in the above order with administrator privileges.
This document explains how to install R AnalyticFlow on Linux. Follow these instructions to install the software properly.
Note
All installation steps should be performed with administrator privileges.
This document assumes that you are listed in sudoers, so that you can use the sudo command to act as the superuser.
Java SE Development Kit (Version 6)
R-2.5.1 or higher
R add-on packages (codetools and rJava)
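Before installing anything, it can help to see which prerequisites are already present. The following is a minimal sketch, assuming that R and Java are started with the commands R and java; the helper name missing_commands is hypothetical. It does not check the codetools and rJava packages.

```sh
# Print each named command that cannot be found on the PATH.
missing_commands() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
    fi
  done
}

# R and java are the command names assumed here for the two prerequisites.
missing=$(missing_commands R java)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "R and java are both available"
fi
```

Any command reported as missing still needs to be installed by following the steps in this document.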
As of November 2009, a quick validation has been done with the following distributions. Descriptions in this document are based on them.
Fedora 8
Ubuntu 7.10
Note
This document is written to be useful; however, some parts might not apply to your environment. We would appreciate it if you could share information on the adjustments needed for particular systems.
Download the tar.gz archive from the download page of the R AnalyticFlow website.
The Sun Microsystems Java SE Development Kit (Version 6) is required. The JDK is available from the Sun Microsystems website. The Linux installation process is explained in the installation notes by Sun Microsystems.
R-2.5.1 or higher is required for R AnalyticFlow. To obtain R, visit a CRAN (The Comprehensive R Archive Network) mirror near you.
On Debian or Ubuntu, R can be installed with the following command (sources.list may need to be edited).
sudo aptitude install r-base
Execute the following command from console:
sudo R CMD javareconf
If this does not work, JAVA_HOME may need to be specified, as in the following example:
sudo R CMD javareconf JAVA_HOME=/usr/java/default
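If you are unsure which directory to pass as JAVA_HOME, a quick check such as the following may help. This is only a heuristic sketch: it assumes that a usable Java installation contains an executable bin/java, and the helper name looks_like_java_home is hypothetical.

```sh
# Succeed if the given directory looks like a Java installation,
# i.e. it contains an executable bin/java.
looks_like_java_home() {
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}

candidate=/usr/java/default   # path taken from the example above
if looks_like_java_home "$candidate"; then
  echo "Found bin/java; this path could be passed as JAVA_HOME=$candidate"
else
  echo "No bin/java under $candidate; adjust the path for your system"
fi
```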
Run R from console:
sudo R
Execute the following from the R console:
install.packages(c("codetools", "rJava"))
First extract the tar.gz archive:
tar xzvf RAnalyticFlow_Linux_1.0.6.tar.gz
This creates a directory named "RAnalyticFlow_1.0.6". Execute the following command in that directory to run R AnalyticFlow. If the installation is successful, two windows appear after the initial configuration.
./rflow &
Finally, make a symbolic link in a directory on the PATH. For example, from inside the extracted directory:
sudo ln -s "$(pwd)/rflow" /usr/local/bin/rflow
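A symbolic link with a relative target is resolved relative to the directory that contains the link, not the directory where ln was run, so an absolute target is the safer choice here. The sketch below shows one way to build it; link_launcher is a hypothetical helper, and whether the launcher works when started through a link is assumed from the step above.

```sh
# Create <dest_dir>/rflow as a symbolic link to <install_dir>/rflow,
# resolving <install_dir> to an absolute path first.
link_launcher() {
  launcher_dir=$(cd -- "$1" >/dev/null && pwd) || return 1
  ln -s "$launcher_dir/rflow" "$2/rflow"
}

# Example (writing to /usr/local/bin usually requires sudo):
#   link_launcher RAnalyticFlow_1.0.6 /usr/local/bin
```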
Set the environment variable LANG to a locale with UTF-8 encoding. For example, to use Japanese fonts, type:
export LANG=ja_JP.UTF-8
In the same way, use zh_CN.UTF-8 for simplified Chinese, and so on.
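Setting LANG has an effect only if the locale is actually installed. As a rough sketch, the list printed by locale -a can be searched for the wanted name; locale -a often spells the encoding as utf8, so both spellings are normalized before comparing. The helper names normalize_locale and locale_listed are hypothetical.

```sh
# Lower-case a locale name and drop hyphens, so that
# ja_JP.UTF-8 and ja_JP.utf8 compare as equal.
normalize_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Read a list of locale names on standard input and succeed
# if the wanted locale ($1) is among them.
locale_listed() {
  wanted=$(normalize_locale "$1")
  while IFS= read -r line; do
    if [ "$(normalize_locale "$line")" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

if locale -a 2>/dev/null | locale_listed ja_JP.UTF-8; then
  export LANG=ja_JP.UTF-8
else
  echo "ja_JP.UTF-8 is not installed; LANG left unchanged"
fi
```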
The required fonts might not be installed. The following examples use the Japanese Sazanami fonts. On Debian, type
sudo aptitude install ttf-sazanami-gothic
sudo aptitude install ttf-sazanami-mincho
to install the Sazanami fonts.
On Fedora 8, type
mkdir /usr/java/default/jre/lib/fonts/fallback
cd /usr/java/default/jre/lib/fonts/fallback
ln -s /etc/X11/fontpath.d/sazanami-fonts-gothic/sazanami-gothic-ttf
to make the JRE use the Sazanami fonts.
This document explains how to install R AnalyticFlow on Mac OS X. Follow these instructions to install the software properly.
Note
All installation steps should be performed with administrator privileges.
Java SE Runtime Environment Version 6
R-2.5.1 or higher
R add-on packages (codetools and rJava)
Note
As of December 2009, a quick validation has been done with the following environment: Mac OS X 10.5 (Leopard), Java SE 6 (64-bit) and R-2.10.1 (R-2.10.1.pkg, Three-way universal binary).
Download the zip file from the download page of the R AnalyticFlow website.
Apple Java (SE 6) is required. The JDK is available from the Apple website.
R AnalyticFlow should be run under Java SE 6. Configure Java with the Java Preferences utility (under /Applications/Utilities/).
R-2.5.1 or higher is required for R AnalyticFlow. To obtain R, visit a CRAN (The Comprehensive R Archive Network) mirror near you.
Execute the following from the R console:
install.packages(c("codetools", "rJava"))
Double-click the zip file to expand it.
Then drag and drop the expanded application icon to an appropriate folder (e.g. Applications) to complete the installation.
Double-click the application icon to start the program. If the installation is successful, two windows appear after the initial configuration.
To launch the software, double-click the desktop icon or the icon of an R AnalyticFlow document (a file with the .rflow extension). Alternatively, choose "Start" > "All Programs" > "R AnalyticFlow" and click "R AnalyticFlow".
Execute the following command:
rflow
To open an R AnalyticFlow document (.rflow) on start-up, specify the file name as follows:
rflow BostonAnalysis.rflow &
Then the file "BostonAnalysis.rflow" is opened.
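When launching from a script, it may be worth checking the document before passing it to rflow. The following is a minimal sketch; is_rflow_document is a hypothetical helper, and the check covers only the file name and existence, not the file contents.

```sh
# Succeed only for an existing regular file whose name ends in .rflow.
is_rflow_document() {
  case "$1" in
    *.rflow) [ -f "$1" ] ;;
    *) return 1 ;;
  esac
}

doc=BostonAnalysis.rflow   # file name from the example above
if is_rflow_document "$doc" && command -v rflow >/dev/null 2>&1; then
  rflow "$doc" &
else
  echo "Cannot open $doc: file missing, not an .rflow file, or rflow not on PATH" >&2
fi
```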
Each user needs to make the following settings the first time the software is launched.
Set the R home directory that this software uses (e.g., C:\Program Files\R\R-2.10.0).
Set the default working directory for a new project. When the software is launched without an existing .rflow file, this directory is used as the working directory. If the specified directory does not exist, it is created.
If this option is on, you are asked "Copy sample files to the working directory?" after clicking the "OK" button. If you click "Yes" (recommended), the sample files are copied to the default working directory specified here.
Turn the update check on or off. If this option is enabled, the software automatically checks for updates when it is launched. An Internet connection is required for this function.
When R AnalyticFlow is launched, two windows appear as follows (the images were captured on Windows XP and will look different on other operating systems).
In this tutorial you will learn how to conduct data analysis using R AnalyticFlow. In this section you will learn the basic concept of R AnalyticFlow. Then in the next section you will learn basic operations of the software through illustrative examples.
In R AnalyticFlow you conduct data analysis by drawing an analysis flow. Let us draw an analysis flow represented by the following short R script:
data(iris)
plot(iris[, 1:4], col = as.integer(iris$Species) + 1)
boxplot(Petal.Length ~ Species, data = iris, col = 3, main = "Petal.Length")
You do not need to understand the details of this script. In summary, it performs the following steps in order:
Loading data
Drawing a scatterplot
Drawing a boxplot
An analysis flow is a graphical representation of such a process. This example can be described as follows:

A graphic symbol (disk icon / triangle) is called a node, and an arrow between nodes is called an edge. The combination of nodes and edges represents the order in which processes are executed.
In R AnalyticFlow you draw such an analysis flow and conduct data analysis by executing processes according to the flow. An analysis flow gives an overview of the data analysis process, which makes it easier to share knowledge.
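For comparison, the three-step script from this section can also be run directly, without a flow. The sketch below writes it to a file and runs it with Rscript if that command is available; the scratch directory and the file name iris_flow.R are arbitrary choices made for this illustration.

```sh
# Hypothetical scratch directory; any writable location would do.
workdir=$(mktemp -d)
script="$workdir/iris_flow.R"

# Write the three steps (load data, scatterplot, boxplot) as one R script.
printf '%s\n' \
  'data(iris)' \
  'plot(iris[, 1:4], col = as.integer(iris$Species) + 1)' \
  'boxplot(Petal.Length ~ Species, data = iris, col = 3, main = "Petal.Length")' \
  > "$script"

# Run it only if Rscript is on the PATH. In a non-interactive session the
# plots are written to Rplots.pdf in the current directory.
if command -v Rscript >/dev/null 2>&1; then
  (cd "$workdir" && Rscript iris_flow.R) || echo "Rscript reported an error" >&2
else
  echo "Rscript not found; script left at $script"
fi
```

Run this way, the steps always execute together from the top, whereas a flow lets you run up to any chosen node.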
Install R AnalyticFlow following Chapter 2, Installation of preview edition. See the section called “Setup” for initial setup, and launch the program.
First, read the sample data for analysis.
This tutorial uses the "iris" data set from the R sample data sets.
This is a famous data set of measurements of iris flowers. For details, see Fisher (1936) or
type help(iris) in the R console.
In R AnalyticFlow, essentially every process is described by creating a node. Here, create a node to read a data set. Click on "Node" > "Create Simple Node" from the menu.

A Simple node represents a process which can be expressed by a single R expression. Into "Code", input
data(iris)
as follows:

Click on "OK", then a node is created on the flow area.

Creating a node does not execute its process.
To execute the process described by a node, right-click on the node and click on "Run".
Now run the data node we created.
The console window then shows the following output:
> data(iris)
>
Running a single node sends the R code in that node to the console, where it is executed.
Here you can see that data(iris) has been executed and R is waiting for the next input.
R code can also be executed directly from the console window.
To look into the data with the head function, type head(iris) in the console:

Press the Enter key to execute the code. The first few rows are then displayed, showing that the data have four quantitative variables (length and width of the petals and sepals) and one qualitative variable (species of iris). For a quick check of the data like this, running code directly from the console is useful.
Tip
You can also create a simple node from the console by pressing the Control key (Command key on Mac) and the Enter key together after typing the code. This lets you experiment by trial and error in the console and keep only what is needed in the flow.
Next, return to the main window to add another node. Click on a blank part of the flow area, and the node created earlier appears as follows:

The brackets indicate that this node is the last one that has been executed. Click the node to select it.

If a new node is created when another node is selected, an edge (arrow) is automatically drawn from the selected node to a new node. Click on "Node" > "Create Simple Node" from the menu, and input the following code:
plot(iris[, 1:4], col = as.integer(iris$Species) + 1)
An edge is drawn automatically, resulting in the flow as follows:

When you run a node in a flow, all nodes on its path are executed in order, starting from the root node (the first node on the path).
If the execution path contains a node with brackets, execution starts from the node after the bracketed node.
Right-click the plot node we created and run it.
The graphics window displays the following figure:

A scatterplot matrix of the four quantitative variables is drawn. The colors of the points indicate Species, which suggests that the iris species can be discriminated well if these quantitative variables are used effectively.
Tip
Now look at the console window. You can see that only the plot function was executed,
following the head function we ran earlier.
> data(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> plot(iris[, 1:4], col = as.integer(iris$Species) + 1)
>
This is because the data node is bracketed (already executed), so there is no need to execute the earlier part of the path again.
If you want to execute all the nodes on the path, use "Clear and Run" instead of "Run".
It clears all objects in the workspace and executes all nodes on the path, from the root node to the node being run.
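The rule for choosing which nodes "Run" executes can be sketched as a small shell function. This only illustrates the rule as described in this tutorial for a single linear path; it is not how R AnalyticFlow is implemented, and the function name nodes_to_run and the node names are hypothetical.

```sh
# Print the nodes that would be executed on a linear path.
# $1 is the bracketed (last executed) node, or "" if there is none;
# the remaining arguments are the nodes on the path, root first,
# ending with the node being run.
nodes_to_run() {
  bracketed=$1
  shift
  skipping=0
  # Skip ahead only if the bracketed node is actually on this path.
  for node in "$@"; do
    if [ "$node" = "$bracketed" ]; then
      skipping=1
    fi
  done
  for node in "$@"; do
    if [ "$skipping" -eq 1 ]; then
      if [ "$node" = "$bracketed" ]; then
        skipping=0
      fi
      continue
    fi
    printf '%s\n' "$node"
  done
  return 0
}

nodes_to_run data data plot   # "Run" on plot after data was executed: prints plot
nodes_to_run "" data plot     # "Clear and Run" on plot: prints data, then plot
```

When the bracketed node is not on the path at all, every node on the path is printed, so execution starts again from the root node.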
The scatterplot suggests that Petal.Length varies widely according to Species.
Let us draw a boxplot to examine this relation more closely.
Select the plot node and create a simple node with the following code:
boxplot(Petal.Length ~ Species, data = iris, col = 3, main = "Petal.Length")
Then the flow becomes as follows:

Since the boxplot node comes after the plot node in this flow,
the two are executed in this order. This is natural for exploratory analysis,
but it is unnecessary when you want to see each result separately.
We therefore rearrange the flow so that the boxplot node directly follows the data node,
just as the plot node does.
There are several ways to do this; the easiest is simply to draw a new edge
from the data node to the boxplot node.
First click on the data node (the source of the new edge) to select it:

Next middle-click (or Alt + click) on the boxplot node (the destination of the new edge).

A new edge is then drawn, replacing the existing edge.
To make the flow easier to read, drag the boxplot node to reposition it:

The edge replacement is now complete.
The plot node (the last node executed) no longer comes before the boxplot node,
so when the flow is run on the boxplot node, execution starts from the data node.
You have now learned the basic usage of R AnalyticFlow and can create and use analysis flows for various types of analyses.
R AnalyticFlow has many other functions that are not covered in this tutorial. The following resources are available for learning about them.
Self-learning example files are available for learning the various functions of R AnalyticFlow.
These samples are in the "Tutorial" directory under the default
working directory, or in the "sample/Tutorial" directory under the installation directory.
Some nodes have comments. To see a comment, rest the mouse pointer on the node.
In this flow you can learn two basic types of nodes, and the relationships between nodes and icons.

In this flow you can learn about the box functions. A box is a special node that can contain a subflow (part of a flow). With box functions, complex flows can be organized and simplified.

In this flow you can learn about the cache function. If the cache is set on a node, its computational results are saved automatically on the first run and loaded on subsequent runs instead of being recomputed. Once you have run a time-consuming part of the analysis, you can continue with the remaining analysis smoothly.
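The idea behind such a cache can be illustrated outside R AnalyticFlow with a few lines of shell. This is only a conceptual sketch of "save on first run, load afterwards"; it says nothing about how R AnalyticFlow stores its cached results, and cached_run and slow_step are hypothetical names.

```sh
# Run a command and keep its output in a cache file; on later calls,
# print the cached output instead of running the command again.
cached_run() {
  cache_file=$1
  shift
  if [ ! -f "$cache_file" ]; then
    # First run: compute, and keep the result only if the command succeeded.
    if "$@" > "$cache_file.tmp"; then
      mv "$cache_file.tmp" "$cache_file"
    else
      rm -f "$cache_file.tmp"
      return 1
    fi
  fi
  cat "$cache_file"
}

# Stand-in for a time-consuming analysis step.
slow_step() {
  sleep 1
  echo "model fitted"
}

cache_dir=$(mktemp -d)
cached_run "$cache_dir/fit.out" slow_step   # computes, then prints the result
cached_run "$cache_dir/fit.out" slow_step   # prints the saved result at once
```

Deleting the cache file forces the step to be computed again on the next call.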

The following sample analyses are available.
These samples are in the "SampleAnalysis" directory under the default
working directory, or in the "sample/SampleAnalysis" directory under the installation directory.
An analysis of iris data which we used in this tutorial. This sample contains more detailed analyses; it includes creating a decision tree model to predict iris species, and validating the prediction error of the model.

An R script for the same analysis is in the "script" directory
under the sample directory, as "IrisAnalysis.R".
This is a sample analysis of the Boston housing data (Harrison and Rubinfeld (1978)).

This is a more practical example with a data mining framework. It includes the following analyses:
Exploratory data analysis
Transformation of variables
Dividing data for training and testing
Training and validating a predictive model
Writing the result of prediction to a file
This example includes functions that could not be covered in this tutorial, for example using a "complex node" (see the "Transform" node in the flow), writing data to a file, and leaving comments on a node (you can see one by resting the mouse pointer on the "Sampling" node).
In this document you have learned the basic usage of R AnalyticFlow. If you are familiar with R, you can create a new flow for your own analysis by clicking on "File" > "New" in the menu. You can also transform existing R scripts into a flow with "File" > "Import from Source". Use R AnalyticFlow for various types of analyses.
R AnalyticFlow makes it more efficient to analyze data using R. In addition, this software is not only for R experts. With R AnalyticFlow, users at any level can easily share and reuse the data analysis processes created by R experts.
We hope that R AnalyticFlow improves data analysis by making it easier to communicate and share analysis processes.



