Home My Page Projects Code Snippets Project Openings diderot
Summary Activity Tracker Tasks SCM

SCM Repository

[diderot] Annotation of /trunk/doc/whitepaper/design.tex
ViewVC logotype

Annotation of /trunk/doc/whitepaper/design.tex

Parent Directory Parent Directory | Revision Log Revision Log


Revision 233 - (view) (download) (as text)

1 : jhr 233 %!TEX root = paper.tex
2 :     %
3 :    
4 :     \section{A DSL for image analysis}
5 :     \label{sec:design}
6 :    
7 :     One of the major contributions of this research will be the design of a domain-specific
8 :     language (DSL), which we call \lang{}, for image-analysis algorithms.
9 :     The goals of this design are as follows:
10 :     \begin{enumerate}
11 :     \item
12 :     Abstract away from the discrete nature of the image data and present the programmer
13 :     with a continuous model of the image.
14 :     \item
15 :     Support the programming of image-analysis algorithms using the mathematical
16 :     properties of the continuous data and its derived attributes.
17 :     \item
18 :     Expose the inherent parallelism of the algorithms, while abstracting away from the
19 :     details of how computations are mapped onto parallel hardware.
20 :     \end{enumerate}%
21 :    
22 :     In the remainder of this section, we describe our preliminary thoughts about what the programming
23 :     model of this language will be.
24 :     It is important to understand that this discussion represents very preliminary ideas
25 :     about the design space.
26 :     In particular, there is a tradeoff between the flexibility provided by lower-level
27 :     general-purpose language constructs and the expressiveness advantages of more abstract
28 :     higher-level mechanisms.
29 :     In the discussion below, we have biased the design toward more general-purpose
30 :     constructs, but with the expectation that the design will get higher level as
31 :     it evolves.
32 :    
33 :     \subsection{Design sketch}
34 :     In this section we sketch a preliminary design for \lang{} using the examples from
35 :     the previous section as motivation.
36 :    
37 :     A \lang{} program has three logical parts.
38 :     The first are global declarations of the datasets and fields that the
39 :     analysis algorithm computes over, as well as other global variables that
40 :     control the analysis.
41 :     The datasets and global variables (but not the derived fields) are the
42 :     inputs to the program.
43 :     The second part defines \emph{actors}, which are independent computational
44 :     agents that define the algorithm.
45 :     The third part specifies the coordination of the actors --- how they interact and
46 :     what the global termination conditions are.
47 :    
48 :     \subsubsection{Types}
49 :     \lang{} is a statically-typed language.
50 :     It has the usual scalar types, including booleans, integer types of various sizes,
51 :     and single and double-precision floating-point values.
52 :     Similar to graphics-shader languages~\cite{opengl-shading:3ed} and
53 :     OpenCL~\cite{opencl-spec-v1}, it also includes small-dimension vector and matrix types.
54 :     In addition to these basic computational types, \lang{} also includes \emph{datasets},
55 :     \emph{fields}, and \emph{actor} types, which are discussed below.
56 :     Our initial design does not include user-defined data structures, but we expect to
57 :     add them at some point in the language evolution.
58 :    
59 :     \subsubsection{Datasets}
60 :     An image-analysis program begins with the image data that is to be analyzed.
61 :     In \lang{}, the image data is represented by a dataset, which is a
62 :     multi-dimensional array of scalars, vectors, or tensors.
63 :     These data sets come from scientific or medical imaging instruments.
64 :     We provide a simple form for declaring a data set.
65 :     For example, the following declares \texttt{data} to be a three-dimensional
66 :     array of floating-point values.
67 :     \begin{quote}\begin{lstlisting}
68 :     dataset float[][][] data;
69 :     \end{lstlisting}\end{quote}%
70 :     Datasets also come with their own coordinate system, which defines how
71 :     integer indices map to world coordinates.
72 :    
73 :     Datasets are not limited to arrays of scalars; they may also have vector or
74 :     tensor elements.
75 :     On disk, datasets are stored using the NRRD (Nearly Raw Raster Data) format~\cite{nrrd},
76 :     which supports multidimensional arrays of multidimensional data, such as those produced
77 :     by modern imaging instruments.
78 :    
79 :     \subsubsection{Fields and kernels}
80 :     As discussed in \secref{sec:image}, we are interested in image analysis algorithms that
81 :     work with the continuous field that is reconstructed from discrete data.
82 :     Our DSL supports these algorithms by including continuous fields as a primitive notion.
83 :     Fields can be defined by applying a convolution kernel to a dataset.
84 :     For example, here we define a three-dimensional field generated from the dataset \text{data}
85 :     using a convolution kernel named ``\texttt{cubic}''
86 :     \begin{quote}\begin{lstlisting}
87 :     field F = cubic (data);
88 :     \end{lstlisting}\end{quote}%
89 :     The dimension of the field is inferred from the dimension of the dataset.
90 :     In our initial design, we expect to provide a predefined set of convolution kernels
91 :     (essentially those already supported by Teem~\cite{teem}).
92 :    
93 :     The main operation of fields is the \emph{probe}, which extracts the value at some
94 :     location.
95 :     This operation is similar to indexing an array, except that the indices are
96 :     floating-point values.
97 :     For example,
98 :     \begin{quote}\begin{lstlisting}
99 :     F@(x, y, z)
100 :     \end{lstlisting}\end{quote}%
101 :     evaluates to the scalar value of the field \texttt{F} at the coordinate $(\mathtt{x},\mathtt{y},\mathtt{z})$.
102 :     We can also define new fields by applying operators to them.
103 :     For example,
104 :     \begin{quote}\begin{lstlisting}
105 :     (D (D F)) @ v
106 :     \end{lstlisting}\end{quote}%
107 :     probes the Hessian field derived by taking the second derivative of \texttt{F} at the
108 :     location specified by the vector \texttt{v}.
109 :     An important feature of the design is that operations, such as $\mathrm{FA}$
110 :     (fractional anisotropy of a tensor), are treated as
111 :     higher-order operators that map fields to fields (\ie{}, a field of tensors to a field of
112 :     scalars).
113 :     % implies other fields:
114 :     % D F --- gradient of F (has type float[3])
115 :     % D (D F) --- Hessian of F (has type float[3][3])
116 :     %
117 :     % QUESTION: is the domain of a field always in the interval [0..1] (for each dimension)
118 :     % or should we allow the programmer to specify the domain?
119 :     % ANSWER: domain is part of nrrd file format.
120 :    
121 :     \subsection{Other global definitions}
122 :     In addition to datasets and fields, a \lang{} program may have other global
123 :     definitions.
124 :     These include auxiliary functions, such as the color composition and transfer
125 :     functions for direct-volume rendering (\secref{sec:dvr}),
126 :     and program inputs, which are denoted using the \kw{in} keyword:
127 :     \begin{quote}\begin{lstlisting}
128 :     in float timeStep; // simulation timestep
129 :     \end{lstlisting}\end{quote}%
130 :     Other examples of globals include constants, such as the matrix that maps
131 :     pixel positions to image-plane positions in direct-volume rendering.
132 :     Note that global variables are immutable; \lang{} does not have an explicit notion
133 :     of global state.
134 :    
135 :     \subsubsection{Actors}
136 :     Actors are an abstraction of the computations that the image analysis performs.
137 :     Examples include the rays in direct-volume rendering (\secref{sec:dvr}),
138 :     the fibers in fiber tractography (\secref{sec:fibers}), and the
139 :     particles in a particle system (\secref{sec:particles}).
140 :     A given program may have multiple types of actors, but we expect that most algorithms
141 :     will involve only a single actor definition.
142 :    
143 :     An actor definition is similar to a class declaration in an object-oriented
144 :     language, and includes declarations of its local state, initialization,
145 :     and an \kw{update} method.
146 :     The \kw{update} methods of actors are written using a simple C-style language that is
147 :     similar to the graphics shader language GLSL~\cite{opengl-shading:3ed}.
148 :     For example, the direct volume rendering algorithm (Algorithm~\ref{alg:volrend})
149 :     is implemented as the following actor definition:
150 :     \begin{quote}\begin{lstlisting}
151 :     actor Ray (vec2i p)
152 :     {
153 :     // map pixel location to 3D position in image plane
154 :     vec3f pos = ToImagePlane * ((float)p.x, (float)p.y, 1.0);
155 :     // ray direction
156 :     vec3f dir = normalize(pos - eyePosition);
157 :     float t = 0.0;
158 :     vec4f color = initialColor;
159 :    
160 :     update () {
161 :     vec3f newPos = pos + t*dir;
162 :     if (newPos.z < farClip) {
163 :     color = compose(color, transferFn(F@newPos));
164 :     if (color.a == 1.0)
165 :     stable; // color is opaque, so we are done
166 :     else
167 :     t += timeStep;
168 :     }
169 :     else
170 :     stable;
171 :     }
172 :     }
173 :     \end{lstlisting}\end{quote}%
174 :     In this example, the \emph{instance variables} \texttt{pos}, \texttt{dir},
175 :     \texttt{t}, and \texttt{color} comprise the local state of an actor.
176 :     The \kw{update} method evolves this state until the actor reaches an
177 :     \emph{stable} state (denoted by executing the \kw{stable} statement).
178 :     An actor may also declare itself dead using the \kw{die} statement and actors may
179 :     create new actors using the \kw{new} statement.
180 :    
181 :     For some algorithms, such as those described in \secref{sec:particles}, it is
182 :     necessary for actors to examine the state of nearby actors.
183 :     This behavior has two aspects: the definition of an actor's \emph{neighborhood},
184 :     which is defined declaratively in the global-coordination part of the program (see below),
185 :     and the computation on neighbors.
186 :     An actor's neighborhood can be represented as array of actors that are passed
187 :     as a parameter to the \kw{update} method.
188 :     The \kw{update} method can query, but not modify, the instance variables of these actors.
189 :     % actor states: active, stable, dead
190 :     % actors can create new actors
191 :    
192 :     While the actor \kw{update} methods have some similarity to shader-language programs
193 :     and OpenCL kernels, there are some important differences.
194 :     First, actors compute over the continuous fields defined in the program\footnote{
195 :     Shader programs treat texture data as continuous, but the interpolation mechanisms
196 :     used in graphics applications are not suitable for image-analysis problems.
197 :     } instead of discrete data.
198 :     Second, actors can interact with other actors.
199 :     Third, actors can die or become stable.
200 :     Lastly, actors can create new actors.
201 :    
202 :     % note: system could have multiple kinds of actors?
203 :    
204 :     % QUESTION: how are actor interactions specified?
205 :    
206 :     \subsubsection{Global coordination}
207 :     The last part of a \lang{} specification is the definition of the global properties of the
208 :     algorithm.
209 :     These include the initial set of actors, which for the direct volume rendering
210 :     application might be defined using a set-comprehension notation
211 :     \begin{quote}\begin{lstlisting}
212 :     initially { { Ray (vec2i(i, j)) | i in 0..1023 } | j in 0..1023 };
213 :     \end{lstlisting}\end{quote}%
214 :     The global properties also include global termination conditions,
215 :     such a limit on the number of iterations or a predicate defined on a
216 :     global property, such as global energy, and a mechanism for reading out
217 :     the result of the computation from the local actor states.
218 :     Lastly, global coordination includes properties, such as the neighbors
219 :     of a particle, that define inter-agent interactions.
220 :    
221 :     \subsubsection{Semantics}
222 :    
223 :     The semantics of a \lang{} program is based on an iterative model that is similar
224 :     to bulk-synchronous parallelism.
225 :     Informally, the state of an actor is represented by a triple: $\langle{}a, \sigma, S\rangle{}$,
226 :     where $a$ is a unique actor name, $\sigma$ is a mapping from instance variables to values,
227 :     and $S$ is the actor's status, which is either $\mathbf{ACTIVE}$ or $\mathbf{STABLE}$.
228 :     The state of a program $\mathcal{P}$ is then defined to be a set of actor states.
229 :     The \kw{update} method is then modeled as a curried function that takes the program state
230 :     and an actor name and produces a set of actor states.
231 :     The \kw{update} method returns a set, since the actor may die, in which case the result is the
232 :     empty set, or may create new actors, which will also be included in the resulting set.
233 :     A single iteration of the program can then be defined by
234 :     \begin{displaymath}
235 :     \mathcal{P}_{i+1}
236 :     = \mathrm{Stable}(\mathcal{P}_i) \cup \left( \bigcup_{\langle{}a,\sigma,S\rangle{} \in \mathrm{Active}(\mathcal{P}_i)}{U\,\mathcal{P}_i\,a} \right)
237 :     \end{displaymath}%
238 :     where $U$ is the function denoted by the \kw{update} method and
239 :     \begin{eqnarray*}
240 :     \mathrm{Active}(\mathcal{P}) & = &
241 :     \{ \langle{}a,\sigma, \mathbf{ACTIVE}\rangle{} | \langle{}a, \sigma, \mathbf{ACTIVE}\rangle \in \mathcal{P} \} \\
242 :     \mathrm{Stable}(\mathcal{P}) & = &
243 :     \{ \langle{}a,\sigma, \mathbf{STABLE}\rangle{} | \langle{}a, \sigma, \mathbf{STABLE}\rangle \in \mathcal{P} \}
244 :     \end{eqnarray*}%
245 :     Note that this rule has the effect of discarding the dead actors.
246 :     A program has reached a terminal state when there are no active actors left.
247 :    
248 :     Global coordination is modeled by a program state to program state function
249 :     that is applied between iterations.
250 :     At this time, we also check for global termination conditions.
251 :    
252 :     Because the \kw{update} method is given the program state $\mathcal{P}_i$ to
253 :     compute the actor's $i+1$ state, each actor's update is independent of the others (even if
254 :     they depend on a neighboring actor's state).
255 :     This property makes the program amenable for efficient parallel execution.

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0