1 : 
jhr 
233 
%!TEX root = paper.tex

2 : 


%

3 : 



4 : 


\section{A DSL for image analysis}

5 : 


\label{sec:design}

6 : 



7 : 


One of the major contributions of this research will be the design of a domainspecific

8 : 


language (DSL), which we call \lang{}, for imageanalysis algorithms.

9 : 


The goals of this design are as follows:

10 : 


\begin{enumerate}

11 : 


\item

12 : 


Abstract away from the discrete nature of the image data and present the programmer

13 : 


with a continuous model of the image.

14 : 


\item

15 : 


Support the programming of imageanalysis algorithms using the mathematical

16 : 


properties of the continuous data and its derived attributes.

17 : 


\item

18 : 


Expose the inherent parallelism of the algorithms, while abstracting away from the

19 : 


details of how computations are mapped onto parallel hardware.

20 : 


\end{enumerate}%

21 : 



22 : 


In the remainder of this section, we describe our preliminary thoughts about what the programming

23 : 


model of this language will be.

24 : 


It is important to understand that this discussion represents very preliminary ideas

25 : 


about the design space.

26 : 


In particular, there is a tradeoff between the flexibility provided by lowerlevel

27 : 


generalpurpose language constructs and the expressiveness advantages of more abstract

28 : 


higherlevel mechanisms.

29 : 


In the discussion below, we have biased the design toward more generalpurpose

30 : 


constructs, but with the expectation that the design will get higher level as

31 : 


it evolves.

32 : 



33 : 
jhr 
317 
\subsection{Diderot by example}

34 : 



35 : 


\begin{figure}[t]

36 : 


\begin{quote}

37 : 


\input{vrphonga}

38 : 


\end{quote}%

39 : 


\end{figure}%

40 : 


\begin{figure}[p]

41 : 


\begin{quote}

42 : 


\input{vrphongb}

43 : 


\end{quote}%

44 : 


\end{figure}%

45 : 



46 : 
jhr 
233 
\subsection{Design sketch}

47 : 


In this section we sketch a preliminary design for \lang{} using the examples from

48 : 


the previous section as motivation.

49 : 



50 : 


A \lang{} program has three logical parts.

51 : 


The first are global declarations of the datasets and fields that the

52 : 


analysis algorithm computes over, as well as other global variables that

53 : 


control the analysis.

54 : 


The datasets and global variables (but not the derived fields) are the

55 : 


inputs to the program.

56 : 


The second part defines \emph{actors}, which are independent computational

57 : 


agents that define the algorithm.

58 : 


The third part specifies the coordination of the actors  how they interact and

59 : 


what the global termination conditions are.

60 : 



61 : 


\subsubsection{Types}

62 : 


\lang{} is a staticallytyped language.

63 : 


It has the usual scalar types, including booleans, integer types of various sizes,

64 : 


and single and doubleprecision floatingpoint values.

65 : 


Similar to graphicsshader languages~\cite{openglshading:3ed} and

66 : 


OpenCL~\cite{openclspecv1}, it also includes smalldimension vector and matrix types.

67 : 


In addition to these basic computational types, \lang{} also includes \emph{datasets},

68 : 


\emph{fields}, and \emph{actor} types, which are discussed below.

69 : 


Our initial design does not include userdefined data structures, but we expect to

70 : 


add them at some point in the language evolution.

71 : 



72 : 


\subsubsection{Datasets}

73 : 


An imageanalysis program begins with the image data that is to be analyzed.

74 : 


In \lang{}, the image data is represented by a dataset, which is a

75 : 


multidimensional array of scalars, vectors, or tensors.

76 : 


These data sets come from scientific or medical imaging instruments.

77 : 


We provide a simple form for declaring a data set.

78 : 


For example, the following declares \texttt{data} to be a threedimensional

79 : 


array of floatingpoint values.

80 : 


\begin{quote}\begin{lstlisting}

81 : 


dataset float[][][] data;

82 : 


\end{lstlisting}\end{quote}%

83 : 


Datasets also come with their own coordinate system, which defines how

84 : 


integer indices map to world coordinates.

85 : 



86 : 


Datasets are not limited to arrays of scalars; they may also have vector or

87 : 


tensor elements.

88 : 


On disk, datasets are stored using the NRRD (Nearly Raw Raster Data) format~\cite{nrrd},

89 : 


which supports multidimensional arrays of multidimensional data, such as those produced

90 : 


by modern imaging instruments.

91 : 



92 : 


\subsubsection{Fields and kernels}

93 : 


As discussed in \secref{sec:image}, we are interested in image analysis algorithms that

94 : 


work with the continuous field that is reconstructed from discrete data.

95 : 


Our DSL supports these algorithms by including continuous fields as a primitive notion.

96 : 


Fields can be defined by applying a convolution kernel to a dataset.

97 : 


For example, here we define a threedimensional field generated from the dataset \text{data}

98 : 


using a convolution kernel named ``\texttt{cubic}''

99 : 


\begin{quote}\begin{lstlisting}

100 : 


field F = cubic (data);

101 : 


\end{lstlisting}\end{quote}%

102 : 


The dimension of the field is inferred from the dimension of the dataset.

103 : 


In our initial design, we expect to provide a predefined set of convolution kernels

104 : 


(essentially those already supported by Teem~\cite{teem}).

105 : 



106 : 


The main operation of fields is the \emph{probe}, which extracts the value at some

107 : 


location.

108 : 


This operation is similar to indexing an array, except that the indices are

109 : 


floatingpoint values.

110 : 


For example,

111 : 


\begin{quote}\begin{lstlisting}

112 : 


F@(x, y, z)

113 : 


\end{lstlisting}\end{quote}%

114 : 


evaluates to the scalar value of the field \texttt{F} at the coordinate $(\mathtt{x},\mathtt{y},\mathtt{z})$.

115 : 


We can also define new fields by applying operators to them.

116 : 


For example,

117 : 


\begin{quote}\begin{lstlisting}

118 : 


(D (D F)) @ v

119 : 


\end{lstlisting}\end{quote}%

120 : 


probes the Hessian field derived by taking the second derivative of \texttt{F} at the

121 : 


location specified by the vector \texttt{v}.

122 : 


An important feature of the design is that operations, such as $\mathrm{FA}$

123 : 


(fractional anisotropy of a tensor), are treated as

124 : 


higherorder operators that map fields to fields (\ie{}, a field of tensors to a field of

125 : 


scalars).

126 : 


% implies other fields:

127 : 


% D F  gradient of F (has type float[3])

128 : 


% D (D F)  Hessian of F (has type float[3][3])

129 : 


%

130 : 


% QUESTION: is the domain of a field always in the interval [0..1] (for each dimension)

131 : 


% or should we allow the programmer to specify the domain?

132 : 


% ANSWER: domain is part of nrrd file format.

133 : 



134 : 


\subsection{Other global definitions}

135 : 


In addition to datasets and fields, a \lang{} program may have other global

136 : 


definitions.

137 : 


These include auxiliary functions, such as the color composition and transfer

138 : 


functions for directvolume rendering (\secref{sec:dvr}),

139 : 


and program inputs, which are denoted using the \kw{in} keyword:

140 : 


\begin{quote}\begin{lstlisting}

141 : 


in float timeStep; // simulation timestep

142 : 


\end{lstlisting}\end{quote}%

143 : 


Other examples of globals include constants, such as the matrix that maps

144 : 


pixel positions to imageplane positions in directvolume rendering.

145 : 


Note that global variables are immutable; \lang{} does not have an explicit notion

146 : 


of global state.

147 : 



148 : 


\subsubsection{Actors}

149 : 


Actors are an abstraction of the computations that the image analysis performs.

150 : 


Examples include the rays in directvolume rendering (\secref{sec:dvr}),

151 : 


the fibers in fiber tractography (\secref{sec:fibers}), and the

152 : 


particles in a particle system (\secref{sec:particles}).

153 : 


A given program may have multiple types of actors, but we expect that most algorithms

154 : 


will involve only a single actor definition.

155 : 



156 : 


An actor definition is similar to a class declaration in an objectoriented

157 : 


language, and includes declarations of its local state, initialization,

158 : 


and an \kw{update} method.

159 : 


The \kw{update} methods of actors are written using a simple Cstyle language that is

160 : 


similar to the graphics shader language GLSL~\cite{openglshading:3ed}.

161 : 


For example, the direct volume rendering algorithm (Algorithm~\ref{alg:volrend})

162 : 


is implemented as the following actor definition:

163 : 


\begin{quote}\begin{lstlisting}

164 : 


actor Ray (vec2i p)

165 : 


{

166 : 


// map pixel location to 3D position in image plane

167 : 


vec3f pos = ToImagePlane * ((float)p.x, (float)p.y, 1.0);

168 : 


// ray direction

169 : 


vec3f dir = normalize(pos  eyePosition);

170 : 


float t = 0.0;

171 : 


vec4f color = initialColor;

172 : 



173 : 


update () {

174 : 


vec3f newPos = pos + t*dir;

175 : 


if (newPos.z < farClip) {

176 : 


color = compose(color, transferFn(F@newPos));

177 : 


if (color.a == 1.0)

178 : 


stable; // color is opaque, so we are done

179 : 


else

180 : 


t += timeStep;

181 : 


}

182 : 


else

183 : 


stable;

184 : 


}

185 : 


}

186 : 


\end{lstlisting}\end{quote}%

187 : 


In this example, the \emph{instance variables} \texttt{pos}, \texttt{dir},

188 : 


\texttt{t}, and \texttt{color} comprise the local state of an actor.

189 : 


The \kw{update} method evolves this state until the actor reaches an

190 : 


\emph{stable} state (denoted by executing the \kw{stable} statement).

191 : 


An actor may also declare itself dead using the \kw{die} statement and actors may

192 : 


create new actors using the \kw{new} statement.

193 : 



194 : 


For some algorithms, such as those described in \secref{sec:particles}, it is

195 : 


necessary for actors to examine the state of nearby actors.

196 : 


This behavior has two aspects: the definition of an actor's \emph{neighborhood},

197 : 


which is defined declaratively in the globalcoordination part of the program (see below),

198 : 


and the computation on neighbors.

199 : 


An actor's neighborhood can be represented as array of actors that are passed

200 : 


as a parameter to the \kw{update} method.

201 : 


The \kw{update} method can query, but not modify, the instance variables of these actors.

202 : 


% actor states: active, stable, dead

203 : 


% actors can create new actors

204 : 



205 : 


While the actor \kw{update} methods have some similarity to shaderlanguage programs

206 : 


and OpenCL kernels, there are some important differences.

207 : 


First, actors compute over the continuous fields defined in the program\footnote{

208 : 


Shader programs treat texture data as continuous, but the interpolation mechanisms

209 : 


used in graphics applications are not suitable for imageanalysis problems.

210 : 


} instead of discrete data.

211 : 


Second, actors can interact with other actors.

212 : 


Third, actors can die or become stable.

213 : 


Lastly, actors can create new actors.

214 : 



215 : 


% note: system could have multiple kinds of actors?

216 : 



217 : 


% QUESTION: how are actor interactions specified?

218 : 



219 : 


\subsubsection{Global coordination}

220 : 


The last part of a \lang{} specification is the definition of the global properties of the

221 : 


algorithm.

222 : 


These include the initial set of actors, which for the direct volume rendering

223 : 


application might be defined using a setcomprehension notation

224 : 


\begin{quote}\begin{lstlisting}

225 : 


initially { { Ray (vec2i(i, j))  i in 0..1023 }  j in 0..1023 };

226 : 


\end{lstlisting}\end{quote}%

227 : 


The global properties also include global termination conditions,

228 : 


such a limit on the number of iterations or a predicate defined on a

229 : 


global property, such as global energy, and a mechanism for reading out

230 : 


the result of the computation from the local actor states.

231 : 


Lastly, global coordination includes properties, such as the neighbors

232 : 


of a particle, that define interagent interactions.

233 : 



234 : 


\subsubsection{Semantics}

235 : 



236 : 


The semantics of a \lang{} program is based on an iterative model that is similar

237 : 


to bulksynchronous parallelism.

238 : 


Informally, the state of an actor is represented by a triple: $\langle{}a, \sigma, S\rangle{}$,

239 : 


where $a$ is a unique actor name, $\sigma$ is a mapping from instance variables to values,

240 : 


and $S$ is the actor's status, which is either $\mathbf{ACTIVE}$ or $\mathbf{STABLE}$.

241 : 


The state of a program $\mathcal{P}$ is then defined to be a set of actor states.

242 : 


The \kw{update} method is then modeled as a curried function that takes the program state

243 : 


and an actor name and produces a set of actor states.

244 : 


The \kw{update} method returns a set, since the actor may die, in which case the result is the

245 : 


empty set, or may create new actors, which will also be included in the resulting set.

246 : 


A single iteration of the program can then be defined by

247 : 


\begin{displaymath}

248 : 


\mathcal{P}_{i+1}

249 : 


= \mathrm{Stable}(\mathcal{P}_i) \cup \left( \bigcup_{\langle{}a,\sigma,S\rangle{} \in \mathrm{Active}(\mathcal{P}_i)}{U\,\mathcal{P}_i\,a} \right)

250 : 


\end{displaymath}%

251 : 


where $U$ is the function denoted by the \kw{update} method and

252 : 


\begin{eqnarray*}

253 : 


\mathrm{Active}(\mathcal{P}) & = &

254 : 


\{ \langle{}a,\sigma, \mathbf{ACTIVE}\rangle{}  \langle{}a, \sigma, \mathbf{ACTIVE}\rangle \in \mathcal{P} \} \\

255 : 


\mathrm{Stable}(\mathcal{P}) & = &

256 : 


\{ \langle{}a,\sigma, \mathbf{STABLE}\rangle{}  \langle{}a, \sigma, \mathbf{STABLE}\rangle \in \mathcal{P} \}

257 : 


\end{eqnarray*}%

258 : 


Note that this rule has the effect of discarding the dead actors.

259 : 


A program has reached a terminal state when there are no active actors left.

260 : 



261 : 


Global coordination is modeled by a program state to program state function

262 : 


that is applied between iterations.

263 : 


At this time, we also check for global termination conditions.

264 : 



265 : 


Because the \kw{update} method is given the program state $\mathcal{P}_i$ to

266 : 


compute the actor's $i+1$ state, each actor's update is independent of the others (even if

267 : 


they depend on a neighboring actor's state).

268 : 


This property makes the program amenable for efficient parallel execution.
