Hierarchical Penumbra Casting
2005; Wiley; Volume 24, Issue 3; Language: English
10.1111/j.1467-8659.2005.00856.x
ISSN 1467-8659
Abstract: This paper describes a new algorithm for computing physically-based soft shadows from area light sources. Our technique is a generalization of irregular shadow maps [AL04, JMB04], and like most previous methods, represents arbitrary light sources as a set of light samples. Our algorithm can be seen as a transpose of ray tracing: ray tracing finds the blocker triangles using a hierarchical spatial subdivision, and therefore scales well with respect to the number of triangles. Our method uses two hierarchies simultaneously to quickly find the rays that are blocked by a given triangle, and thus scales well with respect to the output resolution and the number of light samples. As our algorithm processes one triangle at a time, there is no need to build a spatial subdivision for the scene geometry. This is particularly convenient for procedural geometry or for models that are too large to fit into physical memory.

Irregular shadow maps first determine the visible surfaces from the point of view of the camera. Then the visible samples (x,y,z), i.e. the shadow receiver points, are transformed to the image plane of the light source, producing sampling points (x′,y′) and the corresponding light-space depth values z′. The (x′,y′) correspond exactly to the intersections of shadow rays and the image plane of the light source. Finally, the irregularly spaced points (x′,y′) are stored into a BSP tree and used as sampling points when the scene is rasterized from the light source. This gives the shadow information exactly at the receiver points, yielding the same result as casting shadow rays. Reflections and semi-transparent shadow receivers are handled using a deep frame buffer, i.e., by storing and processing multiple surfaces per output pixel.

Irregular shadow maps could be directly applied to area lights by executing the algorithm separately for all light samples. However, such a method would suffer from several weaknesses.
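As a concrete illustration of the light-space transformation at the heart of irregular shadow maps, the sketch below projects world-space receiver points into a light's image plane. The function and matrix names are our own illustrative assumptions, not the paper's code.

```python
import numpy as np

def to_light_space(points_world, light_view_proj):
    """Project world-space receiver points (x, y, z) into the light's
    image plane, yielding irregular sample positions (x', y') and the
    corresponding light-space depths z'."""
    n = points_world.shape[0]
    homogeneous = np.hstack([points_world, np.ones((n, 1))])
    clip = homogeneous @ light_view_proj.T      # apply the light's view-projection
    ndc = clip[:, :3] / clip[:, 3:4]            # perspective divide
    return ndc[:, :2], ndc[:, 2]                # (x', y') and z'

# With an identity matrix the "light" simply passes the points through:
xy, z = to_light_space(np.array([[0.0, 0.0, -2.0], [1.0, 1.0, -3.0]]), np.eye(4))
```

The resulting (x′,y′) are exactly the irregular sampling points that would be stored in the BSP tree.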
All operations (projection to the image plane of a light source, building the BSP, processing the triangles) would have to be repeated for every light sample. Furthermore, the same pattern of light samples would have to be used for every pixel, resulting in visible banding. We build the hierarchy for the receiver points in object space, thus avoiding the projection altogether (Section 3.1). We follow the naming convention of Chin and Feiner [CF92]: the penumbra volume of a triangle contains all regions of space that can be either fully (umbra) or partially (penumbra) in shadow with respect to the triangle and an area light source. We construct a static three-level hierarchy for the light samples, and for each triangle, we traverse both the light sample hierarchy and the receiver point hierarchy simultaneously. In practice, this is done by intersecting the penumbra volumes of the triangle with the receiver point hierarchy (Section 3.2). This allows us to quickly reject the receiver points that are not shadowed by the given triangle. The output-sensitivity of the algorithm can be increased with a number of optimizations (Section 3.3). Additionally, the pattern of light samples can be selected separately for each pixel, thus converting banding to less visible noise (Section 3.4). We analyze the performance and scalability of our algorithm by comparing it against a production-quality ray tracer in Section 4. Discussion and future work, along with a rough complexity analysis, are given in Section 5.

This section concentrates on algorithms that create physically-based soft shadows from area light sources. Additionally, a vast amount of literature exists for generating hard shadows from point light sources [WPF90], as well as for non-physical [RSC87] and real-time approximation of soft shadows [HLHS03].

Stochastic ray tracing
Stochastic ray tracing algorithms compute shadows by sampling an area light source using shadow rays [CPC84].
In order to get smooth and temporally coherent penumbra regions, hundreds of shadow rays are often needed for each point to be shaded. The intersection tests can be accelerated by employing variants of the shadow cache [HG86], and the distribution of the samples can be improved by using importance sampling [SWZ96]. With a hierarchical spatial subdivision, the complexity of ray tracing is logarithmic with respect to the number of triangles. However, the complexity remains linear with respect to the output resolution and the number of light samples.

Tracing thick rays
Amanatides [Ama84] parameterizes rays with a spread angle using cones. The technique can approximate soft shadows by tracing a single cone from a surface point to an area light source. Amanatides uses a heuristic approximation for modelling the occlusion inside a cone. Heckbert and Hanrahan [HH84] describe a related technique of beam tracing in polygonal environments. Occlusion is correctly taken into account by clipping the beam with the occluding geometry. In highly tessellated scenes, the beam geometry quickly becomes prohibitively complex, and the performance degrades due to lost coherence. Pencil tracing [STN87] is a similar technique, where a pencil is a set of rays in the vicinity of a given ray. It handles refractions more accurately than beam tracing, and also provides error tolerance analysis in an elegant manner. Ghazanfarpour and Hasenfratz [GH98] describe a variant of beam tracing that does not clip the beam geometry, but instead subdivides the beam recursively until each sub-beam is either fully lit or fully occluded, or a specified subdivision limit is reached. Conceptually, this technique extends ray tracing to process the light samples hierarchically (i.e. sub-linearly), whereas the scalability with respect to output resolution remains linear.
Techniques related to shadow volumes
Nishita and Nakamae [NN83] use two shadow volumes [Cro77] for identifying the parts of the scene that lie within the penumbra. Soft shadow computations are performed only for polygons that intersect the penumbra. Silhouette edges of the shadow casters are projected onto the light source, and clipped to its borders. Finally, irradiance is computed using an exact analytic formula. Shadow casters must be decomposed into sets of convex polyhedra, which limits the practicality of the approach. Chin and Feiner [CF92] construct separate BSP trees for the scene, for the umbra volume, and for the outer penumbra volume. Shadow receivers are then classified into three regions: fully lit, umbra, and penumbra. An analytic shadow term is computed by traversing the BSP tree of the scene and clipping away the occluded parts of the polygonal light source. Our method bears some similarity to this technique in the sense that we also utilize penumbra volumes. Tanaka and Takahashi [TT97] propose culling methods for efficiently determining the set of objects that can affect the shadowing of a given point.

View-dependence and discretization
The visibility skeleton [DDP97] finds and stores all visibility events that cause discontinuities in visibility or shading. Illumination due to an area light source can be accurately computed for any point in the scene, but unfortunately the preprocessing and storage requirements are substantial. Discontinuity meshing [Hec92, LTG92] essentially subdivides receiver geometry along shadow boundaries. Back projection algorithms [DF94, SG94] track visibility events to build a data structure for efficiently determining the visible parts of a light source. These techniques have trouble scaling to complex scenes, and the algorithms are also prone to numerical inaccuracies. When computing soft shadows for a particular viewpoint instead of the whole scene (e.g.
radiosity [CW93]), it is unnecessary to consider the entire scene geometry as shadow receivers. Instead, the visible receiver points can be computed in advance [Cro77, Hei91]; this greatly simplifies the subsequent shadow computations. Furthermore, the receiver points can be organized hierarchically in order to exploit the coherence in hard shadow computations [AL04, JMB04, AAM04]. We exploit this observation as a part of our algorithm.

Miscellaneous techniques
Soler and Sillion [SS98] approximate soft shadows using convolution, and present a hierarchical algorithm that drives the approximation error below a threshold value. Agrawala et al. [ARHM00] present an image-based soft shadow algorithm that uses layered attenuation maps for fast approximations. A coherent ray tracer is used for generating higher-quality images. Bala et al. [BWG03] approximate soft shadows by computing the shadowed illumination at a sparse set of points, and then filtering the output image by taking into account important discontinuities such as shadow boundaries. Parker et al. [PSS98] render soft shadows at interactive rates in a parallel ray tracer by using only a single sample per pixel and "soft-edged" objects. Their algorithm is very fast, but not physically-based. Assarsson and Akenine-Möller [AAM03] describe an approximate soft shadow volume algorithm, which offers real-time performance in simple scenes. Two gross approximations are made: the assumption that the silhouette of an object is constant from all receiver points, and a heuristic occluder fusion method.

BRDFs
Analytic determination of irradiance from an area light source is possible only in a few special cases. If the visible parts of the light source are polygons and the emission function is constant over the source, the area integral may be converted into an integral over the boundaries of the polygon by using Stokes' theorem.
Nishita and Nakamae [NN85] compute direct illumination analytically for diffuse BRDFs, and Arvo [Arv95] describes how to handle receiver BRDFs consisting of a linear combination of Phong lobes. Arbitrary receiver BRDFs can be handled using integration. The emission function is sampled at a number of random points in the visible parts of the light source, and a weighted average of the samples gives an estimate of the illumination.

In this section, we present our shadow computation algorithm in detail. First, we define the terms and symbols in Section 3.1 and proceed by giving a simplified version of the algorithm in Section 3.2. Next, we discuss a number of important optimizations in Section 3.3. Finally, extensions to the algorithm are presented in Section 3.4.

Let us consider a planar area light source L. We define the light source using a planar, convex bounding polygon ΔL and a set of light samples i that lie inside the bounding polygon. For now, we assume that the same set of light samples is used throughout the computation. In reality, this would cause banding artifacts, and a solution is presented later in Section 3.4. In order to avoid processing each light sample separately, we group the light samples spatially into light sample groups, denoted Gk, each of which contains a number of light samples i. For each Gk, we also compute a convex bounding polygon Δ that lies on the plane of the light source. This creates a three-level hierarchy for the light samples: on the top level is the entire light source L, on the middle level are the light sample groups Gk, and on the bottom level are the individual light samples i. Since we process each shadow-casting blocker triangle separately, we need to consider only a single blocker triangle T at a time. We construct penumbra volumes for the two top levels of the light sample hierarchy, namely for the entire light source L and for the light sample groups Gk.
In addition, we construct hard shadow volumes for the light samples i. These volumes are illustrated in Figure 1. The penumbra volume defined by the blocker triangle T and the bounding polygon ΔL of the light source is denoted VL. This penumbra volume encloses all points that may be at least partially shadowed by T. The penumbra volumes defined by T and the bounding polygons Δ of light sample groups Gk are denoted V. Thus, penumbra volume V encloses all points where T may cast shadows from light samples in group Gk. Finally, the hard shadow volumes defined by T and a light sample i are denoted Vi. A hard shadow volume Vi thus encloses exactly the points that are in shadow from light sample i.

Figure 1: The volumes associated with the three levels of the light sample hierarchy. In this illustration, the light source consists of 16 light samples grouped into four light sample groups. (a) The main penumbra volume VL is defined by the bounding polygon ΔL of the light source and the blocker triangle T. (b) For each light sample group Gk, a penumbra volume V is constructed based on the bounding polygon Δ of the light sample group and the blocker triangle T. (c) For each light sample i, a hard shadow volume Vi is constructed, based on the location of i and the blocker triangle T.

The points in the scene that need shadow information are called receiver points, and denoted rj. Thus, the purpose of the algorithm is to determine which i→rj relations are blocked by blocker triangles. The receiver points are placed into a bounding volume hierarchy, where axis-aligned boxes are used as the bounding volumes. A node in the hierarchy is denoted N, and its bounding box BN. The symbols are summarized in Table 1.
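The three-level light sample hierarchy can be sketched as a small data structure. The class names and the regular (non-jittered) grid below are illustrative assumptions; they merely mirror the 16-samples-in-four-groups configuration of Figure 1.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LightSampleGroup:          # middle level: a group Gk
    samples: np.ndarray          # bottom level: individual light samples, shape (m, 3)
    bounding_polygon: np.ndarray # convex polygon on the light plane, shape (4, 3)

@dataclass
class AreaLight:                 # top level: the light source L
    bounding_polygon: np.ndarray # the light's bounding polygon ΔL, shape (4, 3)
    groups: list                 # list of LightSampleGroup

def make_grid_light(side=4, group_side=2):
    """Unit-square light on z = 0: side*side samples partitioned into
    (side / group_side)**2 spatially coherent groups."""
    groups = []
    for gy in range(0, side, group_side):
        for gx in range(0, side, group_side):
            pts = np.array([[(gx + dx + 0.5) / side, (gy + dy + 0.5) / side, 0.0]
                            for dy in range(group_side) for dx in range(group_side)])
            lo, hi = pts.min(axis=0), pts.max(axis=0)
            poly = np.array([[lo[0], lo[1], 0.0], [hi[0], lo[1], 0.0],
                             [hi[0], hi[1], 0.0], [lo[0], hi[1], 0.0]])
            groups.append(LightSampleGroup(pts, poly))
    light_poly = np.array([[0., 0., 0.], [1., 0., 0.], [1., 1., 0.], [0., 1., 0.]])
    return AreaLight(light_poly, groups)

light = make_grid_light()   # 16 samples in 4 groups, as in Figure 1
```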
Rendering the scene with shadows computed using our algorithm consists of the following steps:

1. Rasterize the scene and store the depth values of visible surfaces
2. Using the depth values, construct receiver points rj
3. Construct the receiver point hierarchy
4. Construct the light sample groups Gk
5. Process all blocker triangles
6. Perform shading

First, the scene is rasterized from the viewpoint of the camera, and the depth values of visible surfaces are stored into a deep depth buffer. The visible samples are then transformed into world space, producing receiver points rj. A bounding volume hierarchy is constructed for the receiver points rj, and a bit mask is allocated for each rj. These bit masks are used for storing the visibility status of every i→rj relation. The bit masks are initialized to all zeros, indicating that all i→rj relations are initially unblocked. We build the bounding volume hierarchy by recursively dividing the receiver points into two groups. As long as there are more than 1K points in a node, we simply split the bounding box of the node into two equal halves. When there are fewer than 1K points, we find the split position that minimizes the total surface area of the resulting child nodes. Next, the light sample groups Gk are constructed so that each Gk contains a number of nearby light samples. Constructing the groups is trivial if the light samples are positioned according to a jittered grid distribution. Otherwise, a heuristic grouping algorithm may be applied. After the groups are constructed, a convex bounding polygon Δ is determined for each group Gk. The next step is processing all blocker triangles in order to find the blocked i→rj relations. A pseudocode version of the basic algorithm is given in Figure 2. We now examine the algorithm in detail.

Figure 2: Pseudocode of the basic algorithm. A detailed explanation of the algorithm is given in Section 3.2.
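The two-mode hierarchy construction described above (spatial-median splits for large nodes, surface-area-minimizing splits for small ones) might be sketched as follows. The 1K threshold comes from the text and the 16-point leaf size from Section 4; the class and helper names are our own.

```python
import numpy as np

LARGE = 1024   # the "1K" threshold from the text
LEAF = 16      # receiver points per leaf node (the value used in Section 4)

class Node:
    def __init__(self, lo, hi, points=None, left=None, right=None):
        self.lo, self.hi = lo, hi                      # bounding box B_N
        self.points, self.left, self.right = points, left, right

def surface_area(pts):
    d = pts.max(axis=0) - pts.min(axis=0)
    return 2.0 * (d[0] * d[1] + d[1] * d[2] + d[0] * d[2])

def build(points):
    lo, hi = points.min(axis=0), points.max(axis=0)
    if len(points) <= LEAF:
        return Node(lo, hi, points=points)
    axis = int(np.argmax(hi - lo))                     # split along the longest axis
    pts = points[points[:, axis].argsort()]
    if len(pts) > LARGE:
        # spatial median: split the node's box into two equal halves
        k = int(np.searchsorted(pts[:, axis], 0.5 * (lo[axis] + hi[axis])))
        k = min(max(k, 1), len(pts) - 1)               # keep both halves non-empty
    else:
        # exhaustive sweep minimizing the total child surface area
        # (quadratic, but only ever run on nodes with fewer than 1K points)
        best, k = float("inf"), len(pts) // 2
        for c in range(1, len(pts)):
            cost = surface_area(pts[:c]) + surface_area(pts[c:])
            if cost < best:
                best, k = cost, c
    return Node(lo, hi, left=build(pts[:k]), right=build(pts[k:]))

def leaf_count(node):
    return (len(node.points) if node.points is not None
            else leaf_count(node.left) + leaf_count(node.right))

root = build(np.random.default_rng(0).random((100, 3)))
```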
The continue keyword is borrowed from the C language; it causes the next iteration of the nearest enclosing for loop to be started immediately.

Processing a blocker triangle T begins by constructing the penumbra volume VL defined by T and the bounding polygon ΔL of the entire light source (Line 1), as well as the penumbra volumes V defined by T and the bounding polygons Δ of the light sample groups (Line 2). A penumbra volume consists of the separating planes between the defining polygon Δ and the blocker triangle T. These planes are found by enumerating every vertex-edge pair between Δ and T, and selecting the planes that have all vertices of Δ on one side and all vertices of T on the other side. The number of separating planes is at most equal to the total number of edges in Δ and T. In addition, if the plane of T does not intersect Δ, the plane of T is added to cap the penumbra volume. The hard shadow volumes Vi are then constructed for each light sample i (Line 3). These always consist of four planes: three defined by i and the edges of T, and the plane of T. The volumes are illustrated in Figure 1 and used extensively in the following discussion. During the recursive hierarchy traversal, we maintain a list of active light sample groups. A light sample group Gk is active at node N if triangle T may block at least some of the relations i→rj, where i∈Gk and rj is under node N. This is possible only if the penumbra volume V intersects the bounding box BN at least partially. Initially, all light sample groups are active (Line 4). Traversing the hierarchy begins from the root node (Line 5). When entering hierarchy node N, we first test if the bounding box BN of the node intersects penumbra volume VL. If not, the traversal is terminated (Line 6), since triangle T cannot cast shadows onto any receiver point under node N.
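The vertex-edge enumeration of separating planes described above can be sketched as below. The function names and tolerances are our own, duplicates are not removed, and the capping plane of T is omitted; this is a sketch of the selection criterion, not a full penumbra volume builder.

```python
import numpy as np

def plane_from(p, q, r):
    """Plane n·x + d = 0 through three points, or None if degenerate."""
    n = np.cross(q - p, r - p)
    norm = np.linalg.norm(n)
    if norm < 1e-12:
        return None
    n = n / norm
    return n, -float(np.dot(n, p))

def separating_planes(poly, tri, eps=1e-9):
    """Planes through a vertex-edge pair that have all of `poly` on one side
    and all of `tri` on the other. Accepted planes are oriented with `poly`
    on the non-negative side."""
    def edges(v):
        return [(v[i], v[(i + 1) % len(v)]) for i in range(len(v))]
    candidates = [(a, b, c) for a, b in edges(tri) for c in poly]
    candidates += [(a, b, c) for a, b in edges(poly) for c in tri]
    planes = []
    for a, b, c in candidates:
        pl = plane_from(a, b, c)
        if pl is None:
            continue
        n, d = pl
        side_poly = poly @ n + d
        side_tri = tri @ n + d
        if (side_poly >= -eps).all() and (side_tri <= eps).all():
            planes.append((n, d))
        elif (side_poly <= eps).all() and (side_tri >= -eps).all():
            planes.append((-n, -d))      # flip so poly is on the positive side
    return planes

# A square light at z = 1 above a small triangle at z = 0:
poly = np.array([[-1., -1., 1.], [1., -1., 1.], [1., 1., 1.], [-1., 1., 1.]])
tri = np.array([[-0.2, -0.2, 0.], [0.2, -0.2, 0.], [0., 0.2, 0.]])
planes = separating_planes(poly, tri)
```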
Otherwise, we construct a tighter list of active light sample groups so that only the groups Gk whose penumbra volume V intersects BN are included (Lines 7–10). Again, if there is no intersection, triangle T cannot cast shadows from light sample group Gk onto any receiver point under node N. If no active groups remain, the traversal is terminated (Line 11). If node N is not a leaf node, the traversal continues with the tighter list of active light sample groups (Lines 13 and 14). Otherwise, the receiver points rj in the node are tested for potential shadowing by triangle T. We immediately skip further processing of the rj that are outside penumbra volume VL (Line 17). Then, the light sample groups Gk whose corresponding penumbra volume V does not contain rj are skipped (Line 19). Also, the relations that are already blocked by earlier triangles need no further consideration (Line 21). For the remaining light samples i, we test if triangle T blocks relation i→rj by testing if rj is inside the hard shadow volume Vi (Line 22). Since Vi always has four planes, this test involves four dot products, and these can be executed in parallel with Intel SSE SIMD instructions, as is done in our implementation. If the test indicates that rj is inside Vi, the relation is marked as blocked in the bit mask associated with rj (Line 23). After all blocker triangles have been processed, the bit masks of the receiver points tell which light samples i are visible to which receiver points rj. The final step is to perform shading using the computed shadow information. One way of accomplishing this is to rasterize the scene again from the viewpoint of the camera, and fetch the shadow information that corresponds to the visible samples. An alternative implementation might store all the data needed for shading into a deep frame buffer during the first pass. There are a number of straightforward optimizations that fit naturally into the basic algorithm discussed above.
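The four-plane hard shadow volume and its point-in-volume test can be sketched as follows. The plane orientations and helper names are our own assumptions; a production version would evaluate all four dot products in parallel with SIMD, as the text notes.

```python
import numpy as np

def hard_shadow_volume(light_sample, tri):
    """Four planes of a hard shadow volume Vi: one through each triangle
    edge and the light sample, plus the plane of the triangle itself.
    Planes are oriented so the shadowed region is their non-negative side."""
    planes = []
    centroid = tri.mean(axis=0)
    for k in range(3):
        a, b = tri[k], tri[(k + 1) % 3]
        n = np.cross(b - a, light_sample - a)
        if np.dot(n, centroid - a) < 0:        # the shadow frustum lies on the
            n = -n                             # same side as the centroid
        planes.append((n, -np.dot(n, a)))
    n = np.cross(tri[1] - tri[0], tri[2] - tri[0])
    if np.dot(n, light_sample - tri[0]) > 0:   # face the triangle plane away
        n = -n                                 # from the light sample
    planes.append((n, -np.dot(n, tri[0])))
    return planes

def in_hard_shadow(point, planes):
    """The four-dot-product point-in-volume test (serial here)."""
    return all(np.dot(n, point) + d >= 0.0 for n, d in planes)

light = np.array([0., 0., 1.])
tri = np.array([[-1., -1., 0.], [1., -1., 0.], [0., 1., 0.]])
planes = hard_shadow_volume(light, tri)
```

A point directly below the triangle is classified as shadowed, while points beside the frustum or between the light and the triangle are not.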
The aim of these optimizations is to make the algorithm behave in a more output-sensitive fashion, both in terms of computation time and memory consumption. There are five main ways of accomplishing this. First, redundant processing of already blocked i→rj relations can often be terminated early in the traversal. Second, the number of bounding box vs. penumbra volume plane tests can be reduced so that, e.g., when the traversal has proceeded to a node that is completely inside a penumbra volume, further tests can be safely omitted in the child nodes. Third, constructing the penumbra volumes V for the light sample groups and the hard shadow volumes Vi can be postponed until they are needed, and omitted altogether in many situations. Fourth, the memory consumption can be reduced by on-demand allocation of the bit masks for rj. Finally, the efficiency of the aforementioned optimizations can be further enhanced by coarsely sorting the blocker triangles according to their occlusion power. In this section, we explain each of these optimizations in detail. A pseudocode version of the optimized algorithm is given in Figure 3. In the pseudocode, each added line is tagged with the respective optimization by a letter A–D. To save space, only a single pseudocode listing with all the optimizations is given, rather than showing the evolution of the algorithm after each optimization. We recommend investigating the entire pseudocode as a whole only after all optimizations have been covered.

Figure 3: Pseudocode of the optimized algorithm. A detailed explanation of the algorithm is given in Section 3.2. Letters A–D after the line numbers refer to the corresponding optimizations in Section 3.3.

A. Umbra bits
The most important optimization involves storing aggregate shadow information into the hierarchy nodes and receiver points. We add group-umbra and full-umbra bits to each hierarchy node N as well as to every receiver point rj.
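A minimal sketch of how such umbra bits could be packed and aggregated; the bit layout and group count are our own assumptions, not the paper's data layout.

```python
NUM_GROUPS = 8                        # illustrative number of light sample groups
ALL_GROUPS = (1 << NUM_GROUPS) - 1    # all group-umbra bits set

def combine_umbra_bits(children_bits):
    """A node's group-umbra bits are the Boolean AND of its children's bits:
    bit k stays set only if every receiver point below the node is fully
    blocked with respect to group Gk. Returns (group_bits, full_umbra),
    where full_umbra is set when all group bits are set."""
    bits = ALL_GROUPS
    for b in children_bits:
        bits &= b
    return bits, bits == ALL_GROUPS
```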
Whenever all relations i→rj, where i∈Gk, become blocked for some rj, the group-umbra bit k of rj is set (Line 72 in Figure 3). In addition, when all group-umbra bits are set in rj, we set the full-umbra bit of rj (Line 74). The umbra bits of a leaf node are computed by performing a Boolean AND operation over the corresponding bits of all receiver points rj in the node (Lines 77–80), and the umbra bits of an internal node are computed similarly by performing a Boolean AND operation over the corresponding bits of its two child nodes (Lines 50–54). The umbra bits are used for terminating the traversal and for pruning the list of active light sample groups. If node N has its full-umbra bit set, the traversal can be immediately terminated (Line 35), since all receiver points under it are already in shadow. Similarly, if node N has group-umbra bit k set, the light sample group Gk needs no further consideration in the child nodes of N, as all relations i→rj, where i∈Gk and rj is under N, are known to be blocked (Line 39). The same observations apply for avoiding the processing of receiver points rj (Lines 57 and 66). Since the receiver points with full-umbra bits set need no further processing, we rebuild the receiver point hierarchy from scratch when a large enough fraction of receiver points are in umbra (Line 29). This ensures that the hierarchy stays relatively balanced throughout the computation, and also decreases the average height of the hierarchy, i.e., the number of traversal steps from the root node to the child nodes.

B. Active plane sets
The basic algorithm performs many redundant box-vs-plane tests when traversing the hierarchy. Recalling that all penumbra volumes consist of a set of planes, we can skip the majority of these tests by associating active plane sets with the penumbra volumes. This is a commonly used technique in, e.g., hierarchical view frustum culling [BEW*98]. If bounding box BN is completely on the positive side, i.e.
"inside", of a plane, so are the bounding boxes of all nodes under it. In this case, the plane can be marked inactive and safely ignored while traversing the sub tree under N. In particular, when BN is completely inside a penumbra volume, the associated active plane set becomes empty. Active plane sets PL and P are associated with penumbra volumes VL and V, respectively. The special token ALL refers to a set where all planes are active (Lines 32, 33 and 44). PL is passed as a parameter in recursion calls (Lines 34, 48 and 49), and P are coupled with the corresponding groups Gk in the active-groups list (Lines 32, 38, 42, 44 and 65). We modify the box-vs-plane intersection test (function INTERSECTS in the pseudocode) so that it takes the active plane set of the penumbra volume as input and produces a new plane set as its output (Line 36: PL⇒PL′, Line 41: P⇒P′). The POINT-IN-VOLUME tests (Lines 58 and 67) are also modified to consider only the active planes. C. Lazy penumbra volume and hard shadow volume con struction With the above optimizations, the processing of redundant blocker triangles that cast shadows to already shadowed areas is very efficient. With such triangles, most of the time is spent in the construction of penumbra volumes V and hard shadow volumes V i. This can be avoided by postponing the construction of these volumes until they are actually needed. Therefore, instead of constructing these volumes before traversing the hierarchy, we merely note that they have not been constructed yet (Line 31). As long as the volumes are not available, we do not prune the list of active light sample groups by box-vs-volume tests (Lines 40 and 44). If we encounter a receiver point where a previously unblocked, i→rj relation may become blocked, we construct the volumes (Lines 60–64). D. On-demand bit mask allocation It is quite uncommon that all receiver points rj are simultaneously in penumbra. 
In most situations, it is possible to conserve memory by allocating the bit mask for rj only when one is needed. Initially, all receiver points are fully lit and thus need no bit masks to store the status of the blocked i→rj relations. When the hierarchy traversal finds an rj that is potentially occluded by a blocker triangle, an empty bit mask is allocated if the receiver point does not already have one (Line 59). In addition, since we store the full-umbra bits in rj (Line 74), we may deallocate the bit mask if the receiver point has all i→rj relations blocked (Line 75).

Coarse blocker sorting
The overall efficiency of the above optimizations can be increased by sorting the blockers coarsely according to their estimated occlusion power. This is beneficial, since it is favorable to find large shadow regions early in the process, thus eliminating the need for processing the affected receiver points later. Especially with receiver point hierarchy rebuilding (Line 29), finding large shadowed regions early reduces the size of the entire problem for the remaining blocker triangles. We perform the sorting by bucketing the blocker triangles according to their surface areas. Other criteria might yield a more efficient ordering, but we did not investigate this further. Bucketing is a linear-time operation that can be executed entirely out-of-core by using separate files for storing the sorted triangles. We first loop through all blocker triangles and construct a histogram of their areas. We then assign an area range to each bucket and create an empty file for each bucket. Next, we loop through the blocker triangles again, streaming the blocker triangles into the bucket files. By processing the bucket files in order from largest to smallest triangles, we obtain a coarse sorting of the blocker triangles in linear time.

In this section, we consider four extensions to the algorithm.
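Returning to the coarse blocker sorting described above, a minimal in-core sketch follows, with in-memory lists standing in for the out-of-core bucket files; the function names and bucket count are our own.

```python
import numpy as np

def triangle_area(tri):
    return 0.5 * np.linalg.norm(np.cross(tri[1] - tri[0], tri[2] - tri[0]))

def bucket_by_area(triangles, num_buckets=4):
    """Linear-time coarse sort: bucket triangles by surface area, then emit
    the buckets from largest to smallest area range."""
    areas = [triangle_area(t) for t in triangles]
    lo, hi = min(areas), max(areas)
    span = max(hi - lo, 1e-12)             # guard against identical areas
    buckets = [[] for _ in range(num_buckets)]
    for t, a in zip(triangles, areas):
        b = min(int((a - lo) / span * num_buckets), num_buckets - 1)
        buckets[b].append(t)
    return [t for bucket in reversed(buckets) for t in bucket]

# Right triangles with legs of length 1, 2 and 4 (areas 0.5, 2 and 8):
tris = [np.array([[0., 0., 0.], [s, 0., 0.], [0., s, 0.]]) for s in (1.0, 2.0, 4.0)]
ordered = bucket_by_area(tris)
```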
These include a method for suppressing the banding artifacts caused by using the same set of light samples for every receiver point, support for alpha matte textures, adaptive antialiasing, and support for volumetric light sources.

Multiple sets of light samples
The most important extension involves using multiple sets of light samples for suppressing the banding artifacts. The optimal solution would be using a different light sample set for every receiver point, as is done by a stochastic ray tracer. This can be approximated by using a limited number of distinct light sample sets. In our test scenes, eight light sample sets were enough for suppressing the banding artifacts. Supporting multiple light sample sets is straightforward and requires only three modifications to the algorithm. First, the bounding polygons Δ of the light sample groups must be constructed so that they bound the light samples of the respective group in all light sample sets. This ensures that the penumbra volumes V, defined by Δ and T, contain all points that may be shadowed by T regardless of which light sample set is used. Second, the hard shadow volumes Vi must be constructed separately for each light sample set. Finally, the same light sample set must always be used for the same receiver point rj. This is accomplished by assigning a light sample set to each rj and using it consistently in the POINT-IN-VOLUME test (Line 70).

Alpha matte textures
In order to support textures with an alpha matte, i.e. with both opaque and transparent texels, a texture fetch needs to be performed in addition to the POINT-IN-VOLUME test on Line 70. We have not implemented this in our prototype implementation, and all polygons are considered opaque in our test scenes. The texture must be sampled at the intersection of the i→rj ray and the blocker triangle.
Because of the additional cost of computing the texture sampling location and performing the texture fetch, it is probably best to process the alpha matte textured blocker triangles only after all opaque blocker triangles have been processed. This can be achieved by placing the alpha matte textured triangles in separate buckets during the coarse sorting of the blocker triangles.

Adaptive antialiasing
Since our algorithm requires that all receiver points rj are known before processing the blocker triangles, adaptive antialiasing cannot be performed in a single pass. Instead, a feasible approach is to first render the image with one sample per pixel, then determine the pixels that need antialiasing, and perform a second pass with multiple samples taken at these pixels.

Volumetric light sources
Our approach is not limited to planar light sources. Volumetric light sources can be supported simply by replacing the bounding polygons of the light source and the light sample groups with convex bounding volumes. The planes of the penumbra volumes can then be computed exactly as with bounding polygons, since the same method works for convex polygonal volumes as well.

We compared our algorithm against a commercial ray tracer (Mental Ray 3.2) in three test scenes, shown in Figure 4. The simple GRIDS scene has relatively few triangles, but the penumbra regions are very large. The FLOWERS scene features procedurally generated foliage consisting of a large number of triangles. In SPONZA, a low-polygon mesh is combined with dense procedural grass and two relatively low-polygon trees. All test scenes are illuminated with a single rectangular area light.

Figure 4: The test scenes used for measuring the shadow computation speed.

We performed five renderings for all test scenes. Three of the renderings used 256 light samples at resolutions 640 × 480, 1280 × 960 and 2560 × 1920.
In order to measure the scalability of our method with respect to the number of light samples, a 1280 × 960 rendering was made with 1024 light samples. Finally, the sensitivity of our algorithm with respect to the size of the area light source was measured by performing a 1280 × 960 rendering with the surface area of the light source quadrupled, using 256 light samples. Only the time taken by shadow computation was accounted for. For the comparison method, this was accomplished by performing the renderings both with and without shadows, and calculating the difference in the rendering times. Based on initial tests, we chose to use 32 light samples per light sample group and 16 receiver points in the leaf nodes of the receiver point hierarchy. The light samples were distributed according to a jittered grid distribution, which enabled straightforward grouping. In addition, we used 8 distinct light sample sets for suppressing the banding artifacts. Our prototype implementation considers only the geometry of the scene, and thus cannot handle materials with alpha matte texture. Therefore, we changed the materials of all test scenes to opaque white for the renderings with the comparison method, thereby making the results comparable. Of our test scenes, only FLOWERS features a small number of alpha matte textured surfaces. The images in Figure 4 are rendered with original materials that show the structure of the scenes better than the black-and-white renderings produced by the benchmark renderings. All tests were run on a laptop with 1.6 GHz Intel Pentium M processor and 1.5 GB of memory. The results of the benchmark renderings are shown in Table 2. We see that our algorithm scales much better than the comparison method with respect to the output resolution. This can be expected because of the hierarchical processing of the receiver points rj. The scalability with respect to the number of samples used for sampling the light source is also good. 
The performance of our algorithm degrades when the size of the area light source is increased, due to the enlarged penumbra volumes and the consequently diminished efficiency of hierarchical culling. Since we effectively compute the shadows analytically until processing the receiver points rj, the amount of redundant work grows when the light samples i are located sparsely. The memory usage of our algorithm remains at an acceptable level, even though the storage cost of the receiver points is quite high, especially at the highest output resolutions.

Conclusions

We have shown in several tests that the performance of our algorithm compares favorably to tracing shadow rays, establishing that tracing shadow rays is not the only realistic option for computing physically-based soft shadows. The proposed method scales well with respect to the number of light samples and the output resolution. Because the entire scene geometry does not need to be considered at any point during the rendering, the method is particularly well suited for rendering soft shadows from procedural geometry or very complex scenes. Since no spatial hierarchy is needed for the scene geometry, our algorithm can handle dynamic scenes without the overhead of rebuilding such a hierarchy for every frame of an animation sequence.

Discussion and future work

In this section we discuss the algorithm's limitations, potential improvements and extensions, as well as some thoughts on future work.

The computational complexity of our algorithm differs fundamentally from that of ray tracing. Denoting the number of receiver points, light samples and triangles by R, L and T, respectively, we may compare the computational complexity of the two algorithms. Under the very crude assumption that every i→rj relation is blocked by a separate triangle, ray tracing has an execution time of complexity O(RL log T), whereas the respective complexity of our algorithm is O(T log R log L), assuming a light sample hierarchy with log L levels.
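To make the asymptotic difference concrete, the two cost models can be compared for hypothetical scene parameters. Constant factors and output-sensitivity are ignored, so the numbers are illustrative only, not predicted running times.

```python
import math

def raytrace_cost(R, L, T):
    """O(R L log T): one acceleration-structure descent per shadow ray."""
    return R * L * math.log2(T)

def penumbra_cost(R, L, T):
    """O(T log R log L): per triangle, hierarchical descents over the
    receiver points and over the light samples."""
    return T * math.log2(R) * math.log2(L)

# Hypothetical case: 1280x960 receiver points, 256 light samples, 1M triangles.
R, L, T = 1280 * 960, 256, 10**6
ratio = raytrace_cost(R, L, T) / penumbra_cost(R, L, T)
# Ray tracing's cost grows linearly in R and L; this method's grows
# linearly only in T, with R and L contributing logarithmic factors.
```

The transposed roles of R, L and T are visible directly in the two formulas: the quantity that is linear in one model appears only inside a logarithm in the other.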
It is, however, uncertain whether a full light sample hierarchy (instead of a three-level hierarchy) would be beneficial in practice. Shadow caching for ray tracing and the optimizations of our algorithm (Section 3.3) affect the output-sensitivity of both algorithms substantially, making a theoretical analysis more complicated. In addition, it is usually the case that most triangles block more than one i→rj relation, and that not all relations become blocked. Both of these facts benefit our algorithm more than ray tracing. In particular, our algorithm performs practically no work for completely lit receiver points. The memory consumption of ray tracing with a hierarchical acceleration structure is O(T), while for our algorithm it is O(RL), again the exact transpose of ray tracing. Because of the different execution time and memory consumption complexities, it is always possible to construct cases where one algorithm performs better than the other.

The algorithm could be directly extended to handle semi-transparent shadow casters, but that would unfortunately require 24 times more mask memory, i.e., an RGB value instead of a single bit for each i→rj relation. We plan to attack this problem by augmenting each relation with a bit that indicates whether or not the relation is blocked by semi-transparent blockers. The relations blocked only by semi-transparent blockers could then be treated in a second shadow rendering pass, e.g., by processing a certain maximum number of semi-transparent relations at a time or by using shadow rays.

As observed in Section 4, increasing the size of the area light source decreases the performance of our algorithm. This undesirable phenomenon could perhaps be tackled by extending the method of maintaining active light sample groups during the hierarchy traversal to individual light samples, and choosing the appropriate granularity adaptively.
The good scalability of our algorithm with respect to the number of light samples partially alleviates the problem already, since more samples are generally needed for larger area light sources [ARBJ03].

It is possible to extend the algorithm to process multiple blocker triangles simultaneously. This would complicate the geometry of the penumbra volumes, but would also reduce the number of hierarchy traversals and the number of receiver points that are temporarily classified as being in penumbra. This might improve the performance in many scenes.

The algorithm has two desirable characteristics that might make it feasible for hardware implementation in the future: the processing is performed one triangle at a time, and no spatial hierarchy of the scene is required.

Acknowledgements

We thank Jaakko Lehtinen, Janne Kontkanen and Lauri Savioja for fruitful discussions. The original Sponza Atrium model is by Marko Dabrovic, RNA studio, www.rna.hr. This research was funded by the National Technology Agency of Finland, Bitboys, Hybrid Graphics, Nokia and Remedy Entertainment. Timo Aila was partially supported by ATI.