Home My Page Projects Code Snippets Project Openings diderot
Summary Activity Tracker Tasks SCM

SCM Repository

[diderot] Annotation of /trunk/doc/whitepaper/design.tex
ViewVC logotype

Annotation of /trunk/doc/whitepaper/design.tex

Parent Directory Parent Directory | Revision Log Revision Log


Revision 317 - (view) (download) (as text)

1 : jhr 233 %!TEX root = paper.tex
2 :     %
3 :    
4 :     \section{A DSL for image analysis}
5 :     \label{sec:design}
6 :    
7 :     One of the major contributions of this research will be the design of a domain-specific
8 :     language (DSL), which we call \lang{}, for image-analysis algorithms.
9 :     The goals of this design are as follows:
10 :     \begin{enumerate}
11 :     \item
12 :     Abstract away from the discrete nature of the image data and present the programmer
13 :     with a continuous model of the image.
14 :     \item
15 :     Support the programming of image-analysis algorithms using the mathematical
16 :     properties of the continuous data and its derived attributes.
17 :     \item
18 :     Expose the inherent parallelism of the algorithms, while abstracting away from the
19 :     details of how computations are mapped onto parallel hardware.
20 :     \end{enumerate}%
21 :    
22 :     In the remainder of this section, we describe our preliminary thoughts about what the programming
23 :     model of this language will be.
24 :     It is important to understand that this discussion represents very preliminary ideas
25 :     about the design space.
26 :     In particular, there is a tradeoff between the flexibility provided by lower-level
27 :     general-purpose language constructs and the expressiveness advantages of more abstract
28 :     higher-level mechanisms.
29 :     In the discussion below, we have biased the design toward more general-purpose
30 :     constructs, but with the expectation that the design will get higher level as
31 :     it evolves.
32 :    
33 : jhr 317 \subsection{Diderot by example}
34 :    
35 :     \begin{figure}[t]
36 :     \begin{quote}
37 :     \input{vr-phong-a}
38 :     \end{quote}%
39 :     \end{figure}%
40 :     \begin{figure}[p]
41 :     \begin{quote}
42 :     \input{vr-phong-b}
43 :     \end{quote}%
44 :     \end{figure}%
45 :    
46 : jhr 233 \subsection{Design sketch}
47 :     In this section we sketch a preliminary design for \lang{} using the examples from
48 :     the previous section as motivation.
49 :    
50 :     A \lang{} program has three logical parts.
51 :     The first are global declarations of the datasets and fields that the
52 :     analysis algorithm computes over, as well as other global variables that
53 :     control the analysis.
54 :     The datasets and global variables (but not the derived fields) are the
55 :     inputs to the program.
56 :     The second part defines \emph{actors}, which are independent computational
57 :     agents that define the algorithm.
58 :     The third part specifies the coordination of the actors --- how they interact and
59 :     what the global termination conditions are.
60 :    
61 :     \subsubsection{Types}
62 :     \lang{} is a statically-typed language.
63 :     It has the usual scalar types, including booleans, integer types of various sizes,
64 :     and single and double-precision floating-point values.
65 :     Similar to graphics-shader languages~\cite{opengl-shading:3ed} and
66 :     OpenCL~\cite{opencl-spec-v1}, it also includes small-dimension vector and matrix types.
67 :     In addition to these basic computational types, \lang{} also includes \emph{datasets},
68 :     \emph{fields}, and \emph{actor} types, which are discussed below.
69 :     Our initial design does not include user-defined data structures, but we expect to
70 :     add them at some point in the language evolution.
71 :    
72 :     \subsubsection{Datasets}
73 :     An image-analysis program begins with the image data that is to be analyzed.
74 :     In \lang{}, the image data is represented by a dataset, which is a
75 :     multi-dimensional array of scalars, vectors, or tensors.
76 :     These data sets come from scientific or medical imaging instruments.
77 :     We provide a simple form for declaring a data set.
78 :     For example, the following declares \texttt{data} to be a three-dimensional
79 :     array of floating-point values.
80 :     \begin{quote}\begin{lstlisting}
81 :     dataset float[][][] data;
82 :     \end{lstlisting}\end{quote}%
83 :     Datasets also come with their own coordinate system, which defines how
84 :     integer indices map to world coordinates.
85 :    
86 :     Datasets are not limited to arrays of scalars; they may also have vector or
87 :     tensor elements.
88 :     On disk, datasets are stored using the NRRD (Nearly Raw Raster Data) format~\cite{nrrd},
89 :     which supports multidimensional arrays of multidimensional data, such as those produced
90 :     by modern imaging instruments.
91 :    
92 :     \subsubsection{Fields and kernels}
93 :     As discussed in \secref{sec:image}, we are interested in image analysis algorithms that
94 :     work with the continuous field that is reconstructed from discrete data.
95 :     Our DSL supports these algorithms by including continuous fields as a primitive notion.
96 :     Fields can be defined by applying a convolution kernel to a dataset.
97 :     For example, here we define a three-dimensional field generated from the dataset \text{data}
98 :     using a convolution kernel named ``\texttt{cubic}''
99 :     \begin{quote}\begin{lstlisting}
100 :     field F = cubic (data);
101 :     \end{lstlisting}\end{quote}%
102 :     The dimension of the field is inferred from the dimension of the dataset.
103 :     In our initial design, we expect to provide a predefined set of convolution kernels
104 :     (essentially those already supported by Teem~\cite{teem}).
105 :    
106 :     The main operation of fields is the \emph{probe}, which extracts the value at some
107 :     location.
108 :     This operation is similar to indexing an array, except that the indices are
109 :     floating-point values.
110 :     For example,
111 :     \begin{quote}\begin{lstlisting}
112 :     F@(x, y, z)
113 :     \end{lstlisting}\end{quote}%
114 :     evaluates to the scalar value of the field \texttt{F} at the coordinate $(\mathtt{x},\mathtt{y},\mathtt{z})$.
115 :     We can also define new fields by applying operators to them.
116 :     For example,
117 :     \begin{quote}\begin{lstlisting}
118 :     (D (D F)) @ v
119 :     \end{lstlisting}\end{quote}%
120 :     probes the Hessian field derived by taking the second derivative of \texttt{F} at the
121 :     location specified by the vector \texttt{v}.
122 :     An important feature of the design is that operations, such as $\mathrm{FA}$
123 :     (fractional anisotropy of a tensor), are treated as
124 :     higher-order operators that map fields to fields (\ie{}, a field of tensors to a field of
125 :     scalars).
126 :     % implies other fields:
127 :     % D F --- gradient of F (has type float[3])
128 :     % D (D F) --- Hessian of F (has type float[3][3])
129 :     %
130 :     % QUESTION: is the domain of a field always in the interval [0..1] (for each dimension)
131 :     % or should we allow the programmer to specify the domain?
132 :     % ANSWER: domain is part of nrrd file format.
133 :    
134 :     \subsection{Other global definitions}
135 :     In addition to datasets and fields, a \lang{} program may have other global
136 :     definitions.
137 :     These include auxiliary functions, such as the color composition and transfer
138 :     functions for direct-volume rendering (\secref{sec:dvr}),
139 :     and program inputs, which are denoted using the \kw{in} keyword:
140 :     \begin{quote}\begin{lstlisting}
141 :     in float timeStep; // simulation timestep
142 :     \end{lstlisting}\end{quote}%
143 :     Other examples of globals include constants, such as the matrix that maps
144 :     pixel positions to image-plane positions in direct-volume rendering.
145 :     Note that global variables are immutable; \lang{} does not have an explicit notion
146 :     of global state.
147 :    
148 :     \subsubsection{Actors}
149 :     Actors are an abstraction of the computations that the image analysis performs.
150 :     Examples include the rays in direct-volume rendering (\secref{sec:dvr}),
151 :     the fibers in fiber tractography (\secref{sec:fibers}), and the
152 :     particles in a particle system (\secref{sec:particles}).
153 :     A given program may have multiple types of actors, but we expect that most algorithms
154 :     will involve only a single actor definition.
155 :    
156 :     An actor definition is similar to a class declaration in an object-oriented
157 :     language, and includes declarations of its local state, initialization,
158 :     and an \kw{update} method.
159 :     The \kw{update} methods of actors are written using a simple C-style language that is
160 :     similar to the graphics shader language GLSL~\cite{opengl-shading:3ed}.
161 :     For example, the direct volume rendering algorithm (Algorithm~\ref{alg:volrend})
162 :     is implemented as the following actor definition:
163 :     \begin{quote}\begin{lstlisting}
164 :     actor Ray (vec2i p)
165 :     {
166 :     // map pixel location to 3D position in image plane
167 :     vec3f pos = ToImagePlane * ((float)p.x, (float)p.y, 1.0);
168 :     // ray direction
169 :     vec3f dir = normalize(pos - eyePosition);
170 :     float t = 0.0;
171 :     vec4f color = initialColor;
172 :    
173 :     update () {
174 :     vec3f newPos = pos + t*dir;
175 :     if (newPos.z < farClip) {
176 :     color = compose(color, transferFn(F@newPos));
177 :     if (color.a == 1.0)
178 :     stable; // color is opaque, so we are done
179 :     else
180 :     t += timeStep;
181 :     }
182 :     else
183 :     stable;
184 :     }
185 :     }
186 :     \end{lstlisting}\end{quote}%
187 :     In this example, the \emph{instance variables} \texttt{pos}, \texttt{dir},
188 :     \texttt{t}, and \texttt{color} comprise the local state of an actor.
189 :     The \kw{update} method evolves this state until the actor reaches an
190 :     \emph{stable} state (denoted by executing the \kw{stable} statement).
191 :     An actor may also declare itself dead using the \kw{die} statement and actors may
192 :     create new actors using the \kw{new} statement.
193 :    
194 :     For some algorithms, such as those described in \secref{sec:particles}, it is
195 :     necessary for actors to examine the state of nearby actors.
196 :     This behavior has two aspects: the definition of an actor's \emph{neighborhood},
197 :     which is defined declaratively in the global-coordination part of the program (see below),
198 :     and the computation on neighbors.
199 :     An actor's neighborhood can be represented as array of actors that are passed
200 :     as a parameter to the \kw{update} method.
201 :     The \kw{update} method can query, but not modify, the instance variables of these actors.
202 :     % actor states: active, stable, dead
203 :     % actors can create new actors
204 :    
205 :     While the actor \kw{update} methods have some similarity to shader-language programs
206 :     and OpenCL kernels, there are some important differences.
207 :     First, actors compute over the continuous fields defined in the program\footnote{
208 :     Shader programs treat texture data as continuous, but the interpolation mechanisms
209 :     used in graphics applications are not suitable for image-analysis problems.
210 :     } instead of discrete data.
211 :     Second, actors can interact with other actors.
212 :     Third, actors can die or become stable.
213 :     Lastly, actors can create new actors.
214 :    
215 :     % note: system could have multiple kinds of actors?
216 :    
217 :     % QUESTION: how are actor interactions specified?
218 :    
219 :     \subsubsection{Global coordination}
220 :     The last part of a \lang{} specification is the definition of the global properties of the
221 :     algorithm.
222 :     These include the initial set of actors, which for the direct volume rendering
223 :     application might be defined using a set-comprehension notation
224 :     \begin{quote}\begin{lstlisting}
225 :     initially { { Ray (vec2i(i, j)) | i in 0..1023 } | j in 0..1023 };
226 :     \end{lstlisting}\end{quote}%
227 :     The global properties also include global termination conditions,
228 :     such a limit on the number of iterations or a predicate defined on a
229 :     global property, such as global energy, and a mechanism for reading out
230 :     the result of the computation from the local actor states.
231 :     Lastly, global coordination includes properties, such as the neighbors
232 :     of a particle, that define inter-agent interactions.
233 :    
234 :     \subsubsection{Semantics}
235 :    
236 :     The semantics of a \lang{} program is based on an iterative model that is similar
237 :     to bulk-synchronous parallelism.
238 :     Informally, the state of an actor is represented by a triple: $\langle{}a, \sigma, S\rangle{}$,
239 :     where $a$ is a unique actor name, $\sigma$ is a mapping from instance variables to values,
240 :     and $S$ is the actor's status, which is either $\mathbf{ACTIVE}$ or $\mathbf{STABLE}$.
241 :     The state of a program $\mathcal{P}$ is then defined to be a set of actor states.
242 :     The \kw{update} method is then modeled as a curried function that takes the program state
243 :     and an actor name and produces a set of actor states.
244 :     The \kw{update} method returns a set, since the actor may die, in which case the result is the
245 :     empty set, or may create new actors, which will also be included in the resulting set.
246 :     A single iteration of the program can then be defined by
247 :     \begin{displaymath}
248 :     \mathcal{P}_{i+1}
249 :     = \mathrm{Stable}(\mathcal{P}_i) \cup \left( \bigcup_{\langle{}a,\sigma,S\rangle{} \in \mathrm{Active}(\mathcal{P}_i)}{U\,\mathcal{P}_i\,a} \right)
250 :     \end{displaymath}%
251 :     where $U$ is the function denoted by the \kw{update} method and
252 :     \begin{eqnarray*}
253 :     \mathrm{Active}(\mathcal{P}) & = &
254 :     \{ \langle{}a,\sigma, \mathbf{ACTIVE}\rangle{} | \langle{}a, \sigma, \mathbf{ACTIVE}\rangle \in \mathcal{P} \} \\
255 :     \mathrm{Stable}(\mathcal{P}) & = &
256 :     \{ \langle{}a,\sigma, \mathbf{STABLE}\rangle{} | \langle{}a, \sigma, \mathbf{STABLE}\rangle \in \mathcal{P} \}
257 :     \end{eqnarray*}%
258 :     Note that this rule has the effect of discarding the dead actors.
259 :     A program has reached a terminal state when there are no active actors left.
260 :    
261 :     Global coordination is modeled by a program state to program state function
262 :     that is applied between iterations.
263 :     At this time, we also check for global termination conditions.
264 :    
265 :     Because the \kw{update} method is given the program state $\mathcal{P}_i$ to
266 :     compute the actor's $i+1$ state, each actor's update is independent of the others (even if
267 :     they depend on a neighboring actor's state).
268 :     This property makes the program amenable for efficient parallel execution.

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0