Home My Page Projects Code Snippets Project Openings diderot

# SCM Repository

[diderot] Annotation of /trunk/doc/whitepaper/design.tex
 [diderot] / trunk / doc / whitepaper / design.tex

# Annotation of /trunk/doc/whitepaper/design.tex

 1 : jhr 233 %!TEX root = paper.tex 2 : % 3 : 4 : \section{A DSL for image analysis} 5 : \label{sec:design} 6 : 7 : One of the major contributions of this research will be the design of a domain-specific 8 : language (DSL), which we call \lang{}, for image-analysis algorithms. 9 : The goals of this design are as follows: 10 : \begin{enumerate} 11 : \item 12 : Abstract away from the discrete nature of the image data and present the programmer 13 : with a continuous model of the image. 14 : \item 15 : Support the programming of image-analysis algorithms using the mathematical 16 : properties of the continuous data and its derived attributes. 17 : \item 18 : Expose the inherent parallelism of the algorithms, while abstracting away from the 19 : details of how computations are mapped onto parallel hardware. 20 : \end{enumerate}% 21 : 22 : In the remainder of this section, we describe our preliminary thoughts about what the programming 23 : model of this language will be. 24 : It is important to understand that this discussion represents very preliminary ideas 25 : about the design space. 26 : In particular, there is a tradeoff between the flexibility provided by lower-level 27 : general-purpose language constructs and the expressiveness advantages of more abstract 28 : higher-level mechanisms. 29 : In the discussion below, we have biased the design toward more general-purpose 30 : constructs, but with the expectation that the design will get higher level as 31 : it evolves. 32 : 33 : jhr 317 \subsection{Diderot by example} 34 : 35 : \begin{figure}[t] 36 : \begin{quote} 37 : \input{vr-phong-a} 38 : \end{quote}% 39 : \end{figure}% 40 : \begin{figure}[p] 41 : \begin{quote} 42 : \input{vr-phong-b} 43 : \end{quote}% 44 : \end{figure}% 45 : 46 : jhr 233 \subsection{Design sketch} 47 : In this section we sketch a preliminary design for \lang{} using the examples from 48 : the previous section as motivation. 49 : 50 : A \lang{} program has three logical parts. 51 : The first are global declarations of the datasets and fields that the 52 : analysis algorithm computes over, as well as other global variables that 53 : control the analysis. 54 : The datasets and global variables (but not the derived fields) are the 55 : inputs to the program. 56 : The second part defines \emph{actors}, which are independent computational 57 : agents that define the algorithm. 58 : The third part specifies the coordination of the actors --- how they interact and 59 : what the global termination conditions are. 60 : 61 : \subsubsection{Types} 62 : \lang{} is a statically-typed language. 63 : It has the usual scalar types, including booleans, integer types of various sizes, 64 : and single and double-precision floating-point values. 65 : Similar to graphics-shader languages~\cite{opengl-shading:3ed} and 66 : OpenCL~\cite{opencl-spec-v1}, it also includes small-dimension vector and matrix types. 67 : In addition to these basic computational types, \lang{} also includes \emph{datasets}, 68 : \emph{fields}, and \emph{actor} types, which are discussed below. 69 : Our initial design does not include user-defined data structures, but we expect to 70 : add them at some point in the language evolution. 71 : 72 : \subsubsection{Datasets} 73 : An image-analysis program begins with the image data that is to be analyzed. 74 : In \lang{}, the image data is represented by a dataset, which is a 75 : multi-dimensional array of scalars, vectors, or tensors. 76 : These data sets come from scientific or medical imaging instruments. 77 : We provide a simple form for declaring a data set. 78 : For example, the following declares \texttt{data} to be a three-dimensional 79 : array of floating-point values. 80 : \begin{quote}\begin{lstlisting} 81 : dataset float[][][] data; 82 : \end{lstlisting}\end{quote}% 83 : Datasets also come with their own coordinate system, which defines how 84 : integer indices map to world coordinates. 85 : 86 : Datasets are not limited to arrays of scalars; they may also have vector or 87 : tensor elements. 88 : On disk, datasets are stored using the NRRD (Nearly Raw Raster Data) format~\cite{nrrd}, 89 : which supports multidimensional arrays of multidimensional data, such as those produced 90 : by modern imaging instruments. 91 : 92 : \subsubsection{Fields and kernels} 93 : As discussed in \secref{sec:image}, we are interested in image analysis algorithms that 94 : work with the continuous field that is reconstructed from discrete data. 95 : Our DSL supports these algorithms by including continuous fields as a primitive notion. 96 : Fields can be defined by applying a convolution kernel to a dataset. 97 : For example, here we define a three-dimensional field generated from the dataset \text{data} 98 : using a convolution kernel named \texttt{cubic}'' 99 : \begin{quote}\begin{lstlisting} 100 : field F = cubic (data); 101 : \end{lstlisting}\end{quote}% 102 : The dimension of the field is inferred from the dimension of the dataset. 103 : In our initial design, we expect to provide a predefined set of convolution kernels 104 : (essentially those already supported by Teem~\cite{teem}). 105 : 106 : The main operation of fields is the \emph{probe}, which extracts the value at some 107 : location. 108 : This operation is similar to indexing an array, except that the indices are 109 : floating-point values. 110 : For example, 111 : \begin{quote}\begin{lstlisting} 112 : F@(x, y, z) 113 : \end{lstlisting}\end{quote}% 114 : evaluates to the scalar value of the field \texttt{F} at the coordinate $(\mathtt{x},\mathtt{y},\mathtt{z})$. 115 : We can also define new fields by applying operators to them. 116 : For example, 117 : \begin{quote}\begin{lstlisting} 118 : (D (D F)) @ v 119 : \end{lstlisting}\end{quote}% 120 : probes the Hessian field derived by taking the second derivative of \texttt{F} at the 121 : location specified by the vector \texttt{v}. 122 : An important feature of the design is that operations, such as $\mathrm{FA}$ 123 : (fractional anisotropy of a tensor), are treated as 124 : higher-order operators that map fields to fields (\ie{}, a field of tensors to a field of 125 : scalars). 126 : % implies other fields: 127 : % D F --- gradient of F (has type float[3]) 128 : % D (D F) --- Hessian of F (has type float[3][3]) 129 : % 130 : % QUESTION: is the domain of a field always in the interval [0..1] (for each dimension) 131 : % or should we allow the programmer to specify the domain? 132 : % ANSWER: domain is part of nrrd file format. 133 : 134 : \subsection{Other global definitions} 135 : In addition to datasets and fields, a \lang{} program may have other global 136 : definitions. 137 : These include auxiliary functions, such as the color composition and transfer 138 : functions for direct-volume rendering (\secref{sec:dvr}), 139 : and program inputs, which are denoted using the \kw{in} keyword: 140 : \begin{quote}\begin{lstlisting} 141 : in float timeStep; // simulation timestep 142 : \end{lstlisting}\end{quote}% 143 : Other examples of globals include constants, such as the matrix that maps 144 : pixel positions to image-plane positions in direct-volume rendering. 145 : Note that global variables are immutable; \lang{} does not have an explicit notion 146 : of global state. 147 : 148 : \subsubsection{Actors} 149 : Actors are an abstraction of the computations that the image analysis performs. 150 : Examples include the rays in direct-volume rendering (\secref{sec:dvr}), 151 : the fibers in fiber tractography (\secref{sec:fibers}), and the 152 : particles in a particle system (\secref{sec:particles}). 153 : A given program may have multiple types of actors, but we expect that most algorithms 154 : will involve only a single actor definition. 155 : 156 : An actor definition is similar to a class declaration in an object-oriented 157 : language, and includes declarations of its local state, initialization, 158 : and an \kw{update} method. 159 : The \kw{update} methods of actors are written using a simple C-style language that is 160 : similar to the graphics shader language GLSL~\cite{opengl-shading:3ed}. 161 : For example, the direct volume rendering algorithm (Algorithm~\ref{alg:volrend}) 162 : is implemented as the following actor definition: 163 : \begin{quote}\begin{lstlisting} 164 : actor Ray (vec2i p) 165 : { 166 : // map pixel location to 3D position in image plane 167 : vec3f pos = ToImagePlane * ((float)p.x, (float)p.y, 1.0); 168 : // ray direction 169 : vec3f dir = normalize(pos - eyePosition); 170 : float t = 0.0; 171 : vec4f color = initialColor; 172 : 173 : update () { 174 : vec3f newPos = pos + t*dir; 175 : if (newPos.z < farClip) { 176 : color = compose(color, transferFn(F@newPos)); 177 : if (color.a == 1.0) 178 : stable; // color is opaque, so we are done 179 : else 180 : t += timeStep; 181 : } 182 : else 183 : stable; 184 : } 185 : } 186 : \end{lstlisting}\end{quote}% 187 : In this example, the \emph{instance variables} \texttt{pos}, \texttt{dir}, 188 : \texttt{t}, and \texttt{color} comprise the local state of an actor. 189 : The \kw{update} method evolves this state until the actor reaches an 190 : \emph{stable} state (denoted by executing the \kw{stable} statement). 191 : An actor may also declare itself dead using the \kw{die} statement and actors may 192 : create new actors using the \kw{new} statement. 193 : 194 : For some algorithms, such as those described in \secref{sec:particles}, it is 195 : necessary for actors to examine the state of nearby actors. 196 : This behavior has two aspects: the definition of an actor's \emph{neighborhood}, 197 : which is defined declaratively in the global-coordination part of the program (see below), 198 : and the computation on neighbors. 199 : An actor's neighborhood can be represented as array of actors that are passed 200 : as a parameter to the \kw{update} method. 201 : The \kw{update} method can query, but not modify, the instance variables of these actors. 202 : % actor states: active, stable, dead 203 : % actors can create new actors 204 : 205 : While the actor \kw{update} methods have some similarity to shader-language programs 206 : and OpenCL kernels, there are some important differences. 207 : First, actors compute over the continuous fields defined in the program\footnote{ 208 : Shader programs treat texture data as continuous, but the interpolation mechanisms 209 : used in graphics applications are not suitable for image-analysis problems. 210 : } instead of discrete data. 211 : Second, actors can interact with other actors. 212 : Third, actors can die or become stable. 213 : Lastly, actors can create new actors. 214 : 215 : % note: system could have multiple kinds of actors? 216 : 217 : % QUESTION: how are actor interactions specified? 218 : 219 : \subsubsection{Global coordination} 220 : The last part of a \lang{} specification is the definition of the global properties of the 221 : algorithm. 222 : These include the initial set of actors, which for the direct volume rendering 223 : application might be defined using a set-comprehension notation 224 : \begin{quote}\begin{lstlisting} 225 : initially { { Ray (vec2i(i, j)) | i in 0..1023 } | j in 0..1023 }; 226 : \end{lstlisting}\end{quote}% 227 : The global properties also include global termination conditions, 228 : such a limit on the number of iterations or a predicate defined on a 229 : global property, such as global energy, and a mechanism for reading out 230 : the result of the computation from the local actor states. 231 : Lastly, global coordination includes properties, such as the neighbors 232 : of a particle, that define inter-agent interactions. 233 : 234 : \subsubsection{Semantics} 235 : 236 : The semantics of a \lang{} program is based on an iterative model that is similar 237 : to bulk-synchronous parallelism. 238 : Informally, the state of an actor is represented by a triple: $\langle{}a, \sigma, S\rangle{}$, 239 : where $a$ is a unique actor name, $\sigma$ is a mapping from instance variables to values, 240 : and $S$ is the actor's status, which is either $\mathbf{ACTIVE}$ or $\mathbf{STABLE}$. 241 : The state of a program $\mathcal{P}$ is then defined to be a set of actor states. 242 : The \kw{update} method is then modeled as a curried function that takes the program state 243 : and an actor name and produces a set of actor states. 244 : The \kw{update} method returns a set, since the actor may die, in which case the result is the 245 : empty set, or may create new actors, which will also be included in the resulting set. 246 : A single iteration of the program can then be defined by 247 : \begin{displaymath} 248 : \mathcal{P}_{i+1} 249 : = \mathrm{Stable}(\mathcal{P}_i) \cup \left( \bigcup_{\langle{}a,\sigma,S\rangle{} \in \mathrm{Active}(\mathcal{P}_i)}{U\,\mathcal{P}_i\,a} \right) 250 : \end{displaymath}% 251 : where $U$ is the function denoted by the \kw{update} method and 252 : \begin{eqnarray*} 253 : \mathrm{Active}(\mathcal{P}) & = & 254 : \{ \langle{}a,\sigma, \mathbf{ACTIVE}\rangle{} | \langle{}a, \sigma, \mathbf{ACTIVE}\rangle \in \mathcal{P} \} \\ 255 : \mathrm{Stable}(\mathcal{P}) & = & 256 : \{ \langle{}a,\sigma, \mathbf{STABLE}\rangle{} | \langle{}a, \sigma, \mathbf{STABLE}\rangle \in \mathcal{P} \} 257 : \end{eqnarray*}% 258 : Note that this rule has the effect of discarding the dead actors. 259 : A program has reached a terminal state when there are no active actors left. 260 : 261 : Global coordination is modeled by a program state to program state function 262 : that is applied between iterations. 263 : At this time, we also check for global termination conditions. 264 : 265 : Because the \kw{update} method is given the program state $\mathcal{P}_i$ to 266 : compute the actor's $i+1$ state, each actor's update is independent of the others (even if 267 : they depend on a neighboring actor's state). 268 : This property makes the program amenable for efficient parallel execution.