Saturday, July 6, 2024

GraphRAG: New tool for complex data discovery now on GitHub

Earlier this year, we introduced GraphRAG, a graph-based approach to retrieval-augmented generation (RAG) that enables question-answering over private or previously unseen datasets. Today, we’re pleased to announce that GraphRAG is now available on GitHub, offering more structured information retrieval and comprehensive response generation than naive RAG approaches. The GraphRAG code repository is complemented by a solution accelerator, providing an easy-to-use API experience hosted on Azure that can be deployed code-free in a few clicks.

GraphRAG uses a large language model (LLM) to automate the extraction of a rich knowledge graph from any collection of text documents. One of the most exciting features of this graph-based data index is its ability to report on the semantic structure of the data prior to any user queries. It does this by detecting “communities” of densely connected nodes in a hierarchical fashion, partitioning the graph at multiple levels from high-level themes to low-level topics, as illustrated in Figure 1. Using an LLM to summarize each of these communities creates a hierarchical summary of the data, providing an overview of a dataset without needing to know which questions to ask in advance. Each community serves as the basis of a community summary that describes its entities and their relationships.

Advantages of community summaries for “global questions”

In a recent preprint, we explore how these community summaries can also help answer global questions—which address the entire dataset rather than focusing on specific chunks of text—where naive RAG approaches based on vector search fall short. For example, consider the question “What are the main themes in the dataset?” This is a reasonable starting point, but one where naive RAG will always give misleading answers. This is because it generates answers from chunks of text semantically similar to the question, not necessarily from the subset of input texts needed to answer it.

However, if a question addresses the entire dataset, all input texts should be considered. Since naive RAG only considers the top-most similar chunks of input text, it fails. Even worse, it will match the question against chunks of text that are superficially similar to that question, resulting in misleading answers. Community summaries help answer such global questions because the graph index of entity and relationship descriptions has already considered all input texts in its construction. Therefore, we can use a map-reduce approach for question answering that retains all relevant content from the global data context:

  1. Group community reports up to the LLM context window size. 
  2. Map the question across each group to create community answers. 
  3. Reduce all relevant community answers into a final global answer (a minimal sketch of this flow follows the list).
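Below is a minimal sketch of this map-reduce flow. The CommunityReport record, the ask() LLM call, the rough token estimate, and the token budget are all hypothetical stand-ins for illustration; they are not part of the GraphRAG codebase or its API.

import java.util.ArrayList;
import java.util.List;

public class GlobalQuerySketch {

    // Hypothetical container for a community summary produced by the graph index.
    record CommunityReport(String summary) {}

    // Hypothetical LLM call: returns the model's answer to a prompt.
    static String ask(String prompt) { return "..."; }

    static String answerGlobalQuestion(String question,
                                       List<CommunityReport> reports,
                                       int tokenBudget) {
        // 1. Group community reports up to the LLM context window size
        //    (very rough estimate: ~4 characters per token).
        List<String> groups = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (CommunityReport r : reports) {
            if (current.length() > 0
                    && (current.length() + r.summary().length()) / 4 > tokenBudget) {
                groups.add(current.toString());
                current = new StringBuilder();
            }
            current.append(r.summary()).append("\n\n");
        }
        if (current.length() > 0) groups.add(current.toString());

        // 2. Map: answer the question against each group to create community answers.
        List<String> communityAnswers = new ArrayList<>();
        for (String group : groups) {
            communityAnswers.add(ask(
                "Using only these community reports:\n" + group
                + "\nAnswer the question: " + question));
        }

        // 3. Reduce: combine all relevant community answers into a final global answer.
        return ask(
            "Combine these partial answers into a single answer to \"" + question + "\":\n"
            + String.join("\n---\n", communityAnswers));
    }
}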

Evaluation and results 

To evaluate this approach against naive RAG and hierarchical source-text summarization, we used the LLM GPT-4 to generate a diverse set of activity-centered sense-making questions from short descriptions of two datasets: podcast transcripts and news articles. We then selected three metrics for head-to-head comparison of generated answers, as evaluated by an LLM judge: comprehensiveness (covers all aspects in detail), diversity (provides different perspectives), and empowerment (supports informed decision making).

The results show that GraphRAG, when using community summaries at any level of the community hierarchy, outperforms naive RAG on comprehensiveness and diversity (~70–80% win rate). GraphRAG using intermediate- and low-level community summaries also performed better than source text summarization on these metrics at lower token costs (~20–70% token use per query). Performance was competitive with hierarchical source text summarization for the highest-level communities at substantially lower token costs (~2–3% token use per query). This is shown in Figure 2.

Question: Which public figures are repeatedly mentioned across various entertainment articles?

Naive RAG response: Public figures who are repeatedly mentioned across various entertainment articles include <Public figures 1, 2, 3, and 4>. These individuals have been highlighted for various reasons, ranging from their professional achievements to their personal lives. <Public figure 1> [...] <Public figure 2> [...] <Public figure 3> [...] <Public figure 4> [...] These figures are frequently covered due to their high-profile status and the public’s interest in their careers and personal lives. Their activities, whether in music, sports, or personal relationships, have significant cultural and economic impacts, as evidenced by the media coverage and public reactions.

GraphRAG response: The entertainment industry is vast and diverse, encompassing film, television, music, sports, and digital media. Certain public figures stand out due to their significant contributions and influence across these sectors. The following summary highlights key individuals who are repeatedly mentioned in various entertainment articles, reflecting their impact and presence within the industry. Actors and Directors [...20 figures...] Public Figures in Controversy [...3 figures...] Musicians and Executives [...5 figures...] Athletes and Coaches [...7 figures...] Influencers and Entrepreneurs [...3 figures...] The repeated mention of these figures in entertainment articles signifies their ongoing relevance and the public’s interest in their work. Their influence spans across various aspects of entertainment, from shaping cultural narratives in film and television to driving trends in music and digital media. These individuals not only contribute to their respective fields but also influence the broader cultural landscape, often becoming central figures in social discussions and public discourse.

LLM evaluation: Comprehensiveness: Winner = GraphRAG; Diversity: Winner = GraphRAG; Empowerment: Winner = GraphRAG.
    

Research insights and future directions 

Through the initial research cycle, we demonstrated that LLMs can successfully derive rich knowledge graphs from unstructured text inputs, and these graphs can support a new class of global queries for which (a) naive RAG cannot generate appropriate responses, and (b) hierarchical source text summarization is prohibitively expensive per query. The overall suitability of GraphRAG for any given use case, however, depends on whether the benefits of structured knowledge representations, readymade community summaries, and support for global queries outweigh the upfront costs of graph index construction.

We’re currently exploring various approaches to reduce these costs while maintaining response quality. Our latest work on automatically tuning LLM extraction prompts to the problem domain is an example of how we are reducing the upfront effort required to customize these prompts, enumerate entity types, create few-shot examples, and so on. To enable evaluation of GraphRAG with minimal upfront indexing costs, we’re also investigating NLP-based approaches to approximating the knowledge graph and community summaries that would be generated by a full indexing process. Our goal is to ensure that, whatever the constraints of the deployment context, there is a GraphRAG configuration that can accommodate these constraints while still delivering exceptional response quality.

  • DOWNLOAD GraphRAG accelerator

By making GraphRAG and a solution accelerator publicly available, we aim to make graph-based RAG approaches more accessible for users and use cases where it’s critical to understand data at a global level. We encourage community feedback and suggestions on both the code repository and solution accelerator as we work together to enable the next generation of RAG experiences.

Acknowledgements

Joshua Bradley, Christine Caggiano, Mónica Carvajal, Alex Chao, Newman Cheng, Ed Clark, Ben Cutler, Andres Morales Esquivel, Nathan Evans, Alonso Guevara Fernández, Amber Hoak, Kate Lytvynets, Gaudy Blanco Meneses, Apurva Mody, Robert Ness, Gabriel Nieves-Ponce, Douglas Orbaker, Richard Ortega, Rodrigo Racanicci, Billie Rinaldi, Katy Smith, Sarah Smith, Shane Solomon, Dayenne Souza, David Tittsworth, Chris Trevino, Derek Worthen

JEP Draft. AOT linked classes

Make classes of an application appear in a fully loaded and linked state, instantly when the VM starts, by extending the Java virtual machine’s AOT (ahead-of-time) snapshot capability. This capability, called Cached Data Storage (CDS), will monitor a simple training run performed at build time and cache the classes needed by the application in an AOT format that is instantly available on subsequent production runs, as those same classes are required. The larger the application, the greater the benefit from caching such data.

Goals

  • Give instant access to classes needed by a running application by means of a snapshot of decisions made during a training run. Such instant access will improve application startup time.

  • Simplify future enhancements to the snapshot technology, by stabilizing addresses of AOT loaded classes. This is necessary groundwork for further improvements to startup and warmup performance.

Both of these improvements are to CDS, HotSpot’s technology for organizing AOT snapshots. They indirectly serve the larger goals of Project Leyden, which include better startup, warmup, and footprint for Java applications.

Non-Goals

  • It is not a goal to improve the existing AOT workflows, as based on -Xshare:… and similar options; improvements to usability are left for possible future work.

  • It is not a goal to cache compiler output from training runs (that is, AOT compilation); that is left for possible future work.

  • It is not a goal to blur the distinction between training and production runs, as visible to users of today’s AOT workflows via -Xshare:…. Improved “auto-training” workflows are left for possible future work.

  • It is not a goal to expand already-existing AOT support for user-defined class loaders; that is left for possible future work.

Success Metrics

  • Measurable startup time improvements, due to AOT offloading of loading and linking.

  • AOT class configurations that are a stable basis for future improvements to startup, warmup, and footprint, as part of Project Leyden.

See below for how we measure startup and why it differs from other metrics.

Motivation

The Java Platform is attractive for developing server applications because it combines a safe language, a vast choice of libraries, and the reliable Java VM that silently optimizes applications for peak performance. With the help of tools such as Maven and Gradle, application developers routinely rely on dozens or hundreds of libraries, and frequently upgrade them to get new functionality, performance improvements, and security fixes. With the help of test frameworks such as JUnit, developers readily validate that their application’s behavior does not change when the libraries underneath evolve.

While these technologies make creating applications more efficient, they do not solve an old complaint: that Java applications are slow to start. It is time-consuming for the VM to scan hundreds of JAR files on disk, load thousands of class files into memory, and link them together so that they can use each other’s APIs. During startup, the VM also executes initialization code in many classes, and this code might create many objects, or perform I/O-bound tasks such as opening logs or configuration files. Such startup activities are required by the structure of the application, and even though they only need to be done once, their execution always delays the moment when the application can perform useful work.

Even when startup is complete and the application is capable of serving requests, it will take some time before the VM can optimize the application for peak performance. This period, known as warmup, needs the application to be run under load so that the VM can detect which code runs most frequently and is worth optimizing most heavily. Unfortunately, the VM’s analysis of the running application is made harder by the fact that hundreds of popular libraries use reflection to examine the application’s configuration at run time, e.g., the presence of @Bean annotations for Spring and @Path annotations for Jersey. The wide use of such reflective techniques, on top of the sheer number of classes in many applications, means that the VM cannot predict ahead of time which code will run, let alone run most frequently, despite having all the class files at its disposal. Rather, it must wait and see which classes are worth optimizing the most. As a result, the warmup period, which might be milliseconds for a small mathematical application with no reflection, might be seconds (or even minutes) for a complex business application that relies on reflection-heavy libraries for XML processing, persistence, logging, and so on.

Better application startup through AOT

Note: This document employs the very useful adjectives JIT and AOT in their simple sense, as acronyms for just in time and ahead of time. Although the term JIT, used as a noun, has also come to be shorthand for just in time compiler in the Java ecosystem, this shorthand is not used here. In fact, no changes to compilation of any sort are proposed here.

The VM includes a feature, Cached Data Storage (CDS), that can improve startup for most applications, with a modest ahead-of-time investment of effort. Most startup activities are consequences of the structure of the application, structure which is the same for all runs of that application. Therefore, CDS can rely on the observation that mostly the same classes are loaded from the class path every time the application runs, even if the exact set of classes is hard to predict correctly. In short, class loading that happened before will likely happen again: Applications repeat themselves.

To exploit this observation, CDS uses ahead-of-time (or AOT) execution. Before deploying the application in production, you run it with a sample workload that gives the VM a sneak peek at how execution of the application unfolds, which classes are actually loaded from disk, and how they link to each other. During this ahead-of-time training run, the VM takes a snapshot of the class definitions in its own memory. This includes the fields in each class, and the code in their method bodies, and their interrelationships. Later, in production, the snapshot serves as a cache: the VM looks first for the definition of a class in the snapshot, from where it can be loaded extremely quickly, and only then scans the class path and loads class files from disk.

The cache of loaded classes has high fidelity: it includes classes loaded due to reflection, both in the application and in libraries. In production, however, the application may need to serve requests by running code from classes not loaded in the AOT training run and therefore not stored in the cache. If this happens, the VM gracefully falls back to searching the class path in the traditional way, which is lazily just in time (or JIT). In other words, configuring the application AOT to improve startup does not restrict the use of advanced reflective configuration techniques by libraries at run time, because AOT does not conflict with JIT. On the contrary, CDS snapshots preserve the ability, much appreciated by developers, to extend applications and libraries by simply placing new JARs on the class path. The purpose of AOT processing is only to streamline startup, never disrupt it.

The unobtrusive nature of the VM’s use of CDS is illustrated by the fact that, since 2017, every Java runtime has an AOT cache of over 1000 definitions for core JDK classes, created when the JDK is built from source code.

As for the sample workload performed by the AOT training run, a typical scenario is to run the application’s own test suite: its unit tests, and integration tests if available. Running tests will exercise code throughout the application and its dependencies, triggering the class loading that CDS caches for use in production. The exact set of classes that are cached will be far smaller than the entire set of classes on the class path, because an individual application rarely uses all the functionality of the libraries it depends on (and the libraries they depend on, and so on).

Improving AOT

The benefit of using these AOT techniques is that they permit one-time decisions about class loading to be made truly once, ahead of time, rather than being repeated (as JIT decisions) every time the application is run. If an AOT training run generates a snapshot, it contains decisions shifted from run time to build time. The more and richer the decisions recorded in the snapshot at build time, the greater the opportunity for the VM to optimize startup (and, in future, warmup).

A most attractive enhancement to CDS is to have it record not only decisions about which classes to load, but also the interrelationships that link them together, so they can access each other’s APIs (that is, their fields and methods). Suppose code somewhere in the application performs new Person() and then assigns 42 to the age field in the new object. At run time, the VM reads the definition of class Person so it can calculate the size of a memory block to allocate for the new object. It then calculates where the value 42 should be stored within the block so that other code which accesses the age field can find the value. In fact, each class that uses the API of Person must make its own copy of such sizes and locations. This interaction between a pair of classes is a one-time bookkeeping task called linking.

Linking traditionally happens at run time (just in time), but it would be worthwhile to do it at build time (ahead of time). Every class that refers to age in a Person object will cause the VM to calculate the location of age again. Similarly, code that calls the startWork method in class Person will cause the VM to calculate the address of that method’s code in memory. Although this location is the same no matter which code calls the method, each class must perform its own one-time bookkeeping to locate it. (Readers interested in linking will find a full explanation in Chapter 5 of the Java VM Specification.) Moving all this bookkeeping from just in time to ahead of time makes startup happen faster.
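To make the bookkeeping concrete, here is a small illustrative sketch (the Person and Employer classes are hypothetical, not taken from any JEP or JDK source); the comments mark the accesses whose sizes, offsets, and addresses the VM resolves during linking:

class Person {
    int age;
    void startWork() { /* ... */ }
}

class Employer {
    Person hire() {
        Person p = new Person();   // linking resolves the size of a Person instance
        p.age = 42;                // linking resolves the offset of the age field
        p.startWork();             // linking resolves where startWork's code lives
        return p;
    }
}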

To record decisions about linking in an AOT training run, the VM would still snapshot its definitions of loaded classes, but augment them with the memory locations it has calculated for their fields and methods. Later, in production, the VM would use this enhanced cache to access fields and call methods more quickly. This will improve application performance overall, but especially at startup when other VM optimizations are not usually applicable.

The enhanced cache would have high fidelity: It would include the memory locations of fields and methods accessed by reflection, and as defined in classes loaded either statically or reflectively. Thus, if a library uses reflection to access the fields of the application at startup (a common idiom for, e.g., test frameworks), the VM could skip reflective bookkeeping and give the library direct access to the fields in memory. Classes “spun up” reflectively (e.g., for lambdas) would be present in the cache alongside the more usual classes on the class path. The enhanced cache is also unobtrusive: if the class path at run time differs from the class path at build time, then the fields and methods in the cache might no longer be relevant, but the VM will detect this and gracefully fall back to linking fields and methods on the run-time class path in the traditional way.

A springboard to faster warmup

The benefit of enhancing CDS to cache more decisions is that, in production, it appears as if all the classes in the application and its dependencies are fully loaded and linked and ready to run – even before the main method has run. The entire codebase is stable in memory, even if the precise flow of execution from one class to another is hard to predict, because of reflection or other complexities. Having an application that is stable at startup is a springboard for further developments that improve warmup, not just startup. Future JEPs may use CDS to optimize warmup by recording AOT profiling data or even AOT code in the AOT training run, allowing more efficient compilation and execution earlier in the production run. Ultimately, just as a whole suite of AOT loaded and linked classes can be ready to use at startup, a whole suite of AOT code could be used directly in the production run, from the very first moment.

Description

The VM includes a feature, Cached Data Storage (CDS), that can improve startup for most applications. CDS tracks how classes are used in a run of some application, and creates a cache that makes the classes load faster the next time the same application is run.

The overall design of CDS has already been described more fully in the section above.

In JDK (FIXME)99, CDS will create an enhanced snapshot file that caches data about how code in each class uses the fields and methods of other classes. As with earlier versions of CDS, this cache is created during an ahead-of-time run of the application called a training run. Later, when the application runs in production, the VM uses the cache to load and link all the code of the application, including its dependencies, in one go at startup. In effect, many costs of bringing up the application are shifted to build time, away from run time.

Using CDS

CDS has been built into the HotSpot VM since JDK 6. It is enabled with command line options to the java launcher:

  • First, run your application with a sample workload so that CDS can prepare a cache of class definitions. (This is myapp.classlist in the example below.)
  • Second, complete the processing of the cache by building a CDS archive. (This is myapp.cds below.)
  • Finally, every time you run the application from now on, specify the CDS archive you built, so startup can go faster.

Here are examples of those three steps, with command lines broken up for easier reading:

$ # Run sample workload and record some decisions:
$ java -Xshare:off -XX:DumpLoadedClassList=myapp.classlist \
    -cp myapp.jar com.example.MyApp \
    < MyAppSampleInput.txt

$ # Cache the recorded decisions in a CDS archive:
$ java -Xshare:dump -XX:SharedClassListFile=myapp.classlist \
    -XX:SharedArchiveFile=myapp.cds \
    -cp myapp.jar

$ # Use cached decisions to accelerate startup:
$ java -XX:SharedArchiveFile=myapp.cds \
    -cp myapp.jar com.example.MyApp \
    < RealWorkLoadInput.txt

The production run uses the -XX:SharedArchiveFile option to activate the CDS technology, pointing the VM at the archive file. As one would expect from a cache-based optimization, the behavior of the production run would not be changed by the presence or absence of that one option, except for performance.

In JDK (FIXME)99, the first and third steps are unchanged. Only the second step, which builds the CDS archive, is changed. An extra command line option triggers creation of the enhanced cache:

$ # Cache even more recorded decisions in a CDS archive:
$ java -Xshare:dump -XX:SharedClassListFile=myapp.classlist \
    -XX:SharedArchiveFile=myapp.cds \
    -XX:+AOTLoadedClasses \
    -cp myapp.jar

As an example, a tiny “Hello, world” application starts in:

  • 0.64 seconds without the use of CDS;
  • 0.24 seconds with CDS in JDK 23; and
  • (FIXME)999 seconds with the enhanced cache.

The effect of the -XX:+AOTLoadedClasses flag is twofold:

  1. It puts a new attribute into the CDS snapshot that instructs the VM to bring cached classes into a loaded state immediately on startup. This ensures that classes used by the application (as discovered by the AOT training run) are immediately available. This is a change from the previous behavior of loading classes on demand (that is, just in time) from a pre-parsed state in CDS.

  2. It instructs CDS to preset some linkages between the classes, so that in their immediately loaded state, they are also fully linked up to access APIs (fields and methods) of other immediately loaded classes. This bypasses more kinds of bookkeeping during startup.

Thus, -XX:+AOTLoadedClasses shifts just-in-time loading and linking activity to a cache of AOT loaded classes, saving more startup work than before.

Implementation Notes

As noted above, Java applications can be reliably and easily composed from a huge menu of libraries and frameworks, and can be configured for testing and deployment easily, with little ceremony. Programmers enjoy fast development cycles, easy observability, and a powerful set of tools. The foundation for all this is a pair of Java’s “super powers”: separate compilation and dynamic linking. Classes can be managed and inspected in isolation, as classfiles. When they are composed by dynamic linking, their integrity is protected by the VM, and yet the VM also gives them high-performance access to each other’s API points. (Such API points include fields and methods, accessed both reflectively and directly via bytecode.) Crucially, the configuration of an application arises naturally from the classes presented at run time, as they connect to each other; there is no “linking ceremony” required, at build time, to exhaustively define the application configuration. Most of the mechanical steps of Java application configuration happen on the fly, invisibly to the programmer.

This works, in part, because Java, despite being statically typed, is a highly dynamic language: Loading, linking, machine code generation, and storage reclamation are some of the dynamic behaviors. All of this dynamism, while it provides great flexibility to the programmer, comes at a low-level cost. Each execution of the application must repeat the same work over and over, each time finding the right classfile bytes for a given class name, or the right addresses of methods or fields, or the right runtime support data, or the right machine code to optimize the application. This repetition is necessary in today’s Java VMs, as long as they perform most of their operations lazily, just in time. Dynamism allows computed decisions to be deferred until the last moment; dynamism allows loading and linking and optimization to be organized as just-in-time operations, maximizing flexibility.

When deploying an application, many of these dynamically computed decisions have stabilized and can be expected to have the same result as previous runs. Such stability does not cancel dynamism. If an application in production decides to perform a new behavior not previously expected, the VM can respond dynamically to the change, perhaps loading some new classes, perhaps discarding some previously optimized code and data, perhaps reoptimizing. Only the smallest and simplest Java applications are immune to such unpredicted behavior, but just-in-time processing, allowed by dynamism, covers all the possibilities in every application.

The overall set of configuration and optimization decisions made by an application (with the VM that runs it) is thus predictable, in many cases. The specification of the Java VM allows much freedom to schedule decisions, however dynamically they are requested. An unpredicted decision must always be handled as a just-in-time service, but a predictable one can also be handled ahead of time. In many cases, it is straightforward to provide AOT resources, serving them up without delay to the application, whenever it needs them. The information required to make this shift from JIT to AOT is prediction: foreknowledge of the decisions made to configure or optimize the application. The predictions do not need to be 100% accurate, as long as there is a way to recover from misprediction. Often, the most direct way to make these predictions is to perform a training run of the application and observe the decisions made during that run. Assuming similar future runs will make similar decisions, the VM can prepare, ahead of time, to execute them for the next run. This is the basis for the CDS technology.

One might object that the Java VM Specification states that classes must be loaded just in time, at their first usage. Moving just-in-time loading and linking back to VM startup would seem to be an invalid optimization if loading or linking had significant side effects visible to application logic. Indeed, where significant side effects are visible, the AOT optimizations of CDS will be disabled, and the application will fall back to JIT class loading with some loss of startup performance. In the very common case, when significant loading side effects are not visible, the VM is free to pre-emptively load classes. To do so it appeals to an as-if rule of optimization: Despite the VM’s invisible AOT loading, the application observes class loading as if the VM did the loading work at the exact moment of request, but unaccountably fast. So from the application’s point of view, AOT loading behaves the same as JIT (i.e., just in time) loading, except for speed.

Such as-if rules are routine in VM technology: For example, JIT-compiled code runs “as if” the interpreter were running it, only it runs faster, and the GC allows the application unlimited allocations “as if” memory were infinite.

Another common VM technique is called speculative optimization, which happens when the VM acts as though some fact is true, while also having fallback paths to compensate for speculation failure - that is, if the supposed true fact turns out to be false after all. In production runs, the VM can speculate that previous decisions, recorded during a training run by CDS, will still hold. If application code in production turns out to need a different set of classes, the VM can easily detect the new requirement, and process the new classes just in time, in the traditional fully dynamic way, as if CDS had never been involved.

Changes to CDS

Today’s CDS is able to speculate class loading decisions, based on an AOT training run. For each classfile it selects, it saves away a pre-parsed (or “pickled”) internal form, as an independently loadable asset within the CDS archive file. When the VM starts, although all CDS assets are immediately available in VM memory, they are not yet usable as classes, nor can they be linked together.

When the Java application eventually gets around to requesting a CDS class for the first time, the VM permanently makes the pre-parsed form “live” and associates the class name with the live metadata. Only at that point can it be linked to other loaded classes. This can be viewed as a partial AOT, partial JIT implementation of class loading.

Building an archive with -XX:+AOTLoadedClasses causes the VM itself to initiate AOT loading, in a very early period before the application’s main method starts to run. This early period is sometimes called the premain phase of execution. At this time, both loading and linking happen quickly, from CDS assets already present in VM memory and pre-formatted for easy adoption as live metadata. Because of the way assets are brought into VM memory from the CDS archive, they have stable and predictable memory locations. This stability in turn allows them to be pre-formatted in an already-linked state, with direct references to each other. Very specifically, the enhanced pre-formatting affects the constant pool entries in each class asset; they can be populated with resolved locations and sizes of fields, methods, and other classes, as long as those entities are also present in AOT loaded classes.

Thus, these AOT loading and linking activities happen more quickly, compared to classes which are processed piecemeal by just-in-time loading and linking. But by an appeal to an “as-if” optimization, the loading and linking may also be viewed as happening just in time, on demand by the application. The only evidence of the shift from JIT to AOT is indirect, perhaps from a change in file system activity, or from log messages emitted by the VM.

In the future, the presence in VM memory of many application classes, at predictable (“stabilized”) addresses, will be a springboard for further enhancements to CDS. Additional kinds of VM data, such as method profiles and compiled code, can be stored as new assets in the CDS archive, pre-formatted so as to link directly to whatever classes, methods, and fields they need.

It is instructive to compare the CDS optimization of class loading and linking to the VM’s processing of class initialization, which is defined as the execution of per-class static initializers. Developers are much more aware of (and reliant on) the JIT order of class initialization, because initialization perturbs program logic through side effects, which the developer sometimes makes use of. But CDS does not shift initialization of the developer’s classes.
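To illustrate the distinction (a minimal sketch with hypothetical classes, assuming the behavior described above): even if Config were AOT loaded and linked from a CDS archive, its static initializer would still run only at first use, so its side effects keep their just-in-time timing.

class Config {
    static {
        // Side effect visible to program logic; CDS does not shift this earlier.
        System.out.println("Config initialized");
    }
    // Not a compile-time constant, so reading it triggers class initialization.
    static final String MODE = System.getProperty("app.mode", "default");
}

public class InitTimingDemo {
    public static void main(String[] args) {
        System.out.println("main started");   // prints before Config's initializer runs
        System.out.println(Config.MODE);      // first use of Config initializes it here
    }
}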

Consistency between training and production

The CDS implementation already enforces, and will continue to enforce, suitable rules ensuring consistency between training runs and production runs, so that both runs are processing the same application. This is the fine print in the contract which promises that the “same” application will run faster in production.

To begin with, the training and production runs must use the same JDK release and must be on the same hardware platform. It is not possible to use a CDS archive that was built on a different JDK release or on a different hardware platform: The VM detects the mismatch, issues a warning, and ignores the CDS archive. (This is an example of fallback from speculation failure.) You can demand that CDS be used with -Xshare:on, in which case the VM will not attempt a fallback; it will report the configuration mismatch and exit with an error.

The training and production runs must also have consistent class paths. The production run may specify extra class path entries, appended to the end; otherwise, the class paths must be identical. A training run must use only JAR-based class paths; directory-based class paths cannot be checked for consistency, since directory contents may change concurrently with execution of the application. In the production run, classes from class path entries not present in the CDS archive will be loaded just in time, as usual.

Perhaps surprisingly, the training and production runs may run different main classes, as long as they are drawn from the same class path.

The two runs may have different environmental settings, such as Java properties. If an environmental setting is internally significant to the JDK, and it differs between training and production, it is up to the VM and JDK code to choose which setting to honor, or whether to discard the CDS archive altogether.

If present, the use of -m or --module options must be consistent across training and production runs. Various other configuration options are not supported by CDS and will be rejected by the training run, so that a CDS archive cannot be created at all; they include --limit-modules, --patch-module, and --upgrade-module-path.

As a general principle, if a training run (and subsequent dump command) generates a CDS archive, that CDS archive will produce a correct execution of the production run, or else it will be ignored, followed by a differently ordered (but still correct) execution of the production run. A complete description of consistency requirements is beyond the scope of this document.

How we measure time

Although startup and warmup are similar concepts, to measure them properly, one must understand their distinction. For practical purposes, they are defined in terms of some particular application performing a repeatable workload, such as a request server. Startup time is how long the VM takes to load and execute enough code in the JDK, in libraries on the class path, and in the application, so that the application can start to serve requests. Warmup time is how long the VM takes to optimize a running application so that it serves requests with peak performance. Warmup usually consumes more resources (time and memory) than startup.

In more detail, startup is a series of one-time setup tasks, while warmup is a continuing optimization. During startup, the VM and application load, link, and initialize classes, and configure other resources such as Java objects. An application warms up over time, first as the VM selectively compiles bytecode from class files to machine code, and then as the VM tracks “hot spots” in application code and reoptimizes their machine code. Besides code generation, the VM tunes certain ergonomic settings during warmup.

Warmup and startup overlap during the milliseconds after the application launches. And both activities can trail off into an indefinite future: An application can run for seconds or minutes and suddenly perform new startup activities because it accepts a new kind of request. The VM can also work for a long time optimizing the application, eventually (after seconds or minutes) reaching a steady state with peak performance. Even then, if a new kind of request suddenly arrives, the VM may have to re-enter warmup activities to accommodate new code paths. Both startup and warmup tasks can be addressed by AOT or JIT techniques, whether speculative or not, and usually all of the above. Thus, startup and warmup are distinct sets of activities, and each deserves its own attention when assessing and improving VM technology.

In the big picture, startup and warmup are not the only important measures of quality. In carrying out its duties, an application should consume moderate amounts of time and space, delivering good throughput (time per workload unit) and footprint (working memory size). Of course, it should also be correct (producing the right answers) and stable (predictable execution, without crashes or any other misbehavior). Throughput, correctness, and stability have always been core values within the Java ecosystem. Project Leyden is making a fresh focus on improving startup, warmup, and footprint, by shifting selected computations to new points in time, either earlier (ahead of time, AOT) or later (just in time, JIT). Within that big picture, this work is about AOT optimizations to improve startup, and eventually warmup.

Each deployed application will need its own specific definition of what constitutes one repetition of its repeatable workload; this could be a service request, or an integration test, or a benchmark, or a stress test, or some other “omnibus test” of many parts of the application. The first repetition loads and initializes all relevant classes and application data structures, while subsequent repetitions spur the VM to optimize the application, eventually reaching peak performance. In the setting of such an application and its repeatable workload, warmup can be measured as the time to reach a given fraction (such as 95%) of the eventual peak throughput, while startup can be measured as the time to bring the first workload repetition up to some application-specific “ready point”, or else to the end of the first repetition of the workload.
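A minimal measurement sketch under these definitions is shown below. The workload() body, the repetition count, and the 95% threshold are illustrative assumptions; they are not prescribed by this JEP.

public class StartupWarmupHarness {

    // One repetition of the application's repeatable workload (hypothetical).
    static void workload() { /* serve one request, run one integration test, etc. */ }

    public static void main(String[] args) {
        final int reps = 1_000;
        long launch = System.nanoTime();

        long[] elapsedNanos = new long[reps];    // time since launch after each repetition
        double[] throughput = new double[reps];  // repetitions per second, per repetition

        for (int i = 0; i < reps; i++) {
            long t0 = System.nanoTime();
            workload();
            long t1 = System.nanoTime();
            elapsedNanos[i] = t1 - launch;
            throughput[i] = 1e9 / (t1 - t0);
        }

        // Startup: time to complete the first repetition of the workload.
        System.out.printf("startup: %d ms%n", elapsedNanos[0] / 1_000_000);

        // Warmup: time until throughput first reaches 95% of the eventual peak.
        double peak = 0.0;
        for (double t : throughput) peak = Math.max(peak, t);
        for (int i = 0; i < reps; i++) {
            if (throughput[i] >= 0.95 * peak) {
                System.out.printf("warmup: %d ms%n", elapsedNanos[i] / 1_000_000);
                break;
            }
        }
    }
}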

Future work

The AOT loading of classes is subject to certain limits, which may be mitigated in future work:

  • Since -XX:+AOTLoadedClasses changes the order of class loading, agents which attempt to perform instrumentation actions on classfile loading (and on related events such as linking and preparation) may observe the different order. An agent attached after startup may observe that some classes are already loaded and linked, however early it was attached. Although such behavior is within specification, it may be unexpected to the agent.

  • Because of the previous limitation, some attempts to transform classfile bytecode may fail, specifically those that are sensitive to class file load order, as the loading shifts from JIT to AOT. If an agent performs dynamic instrumentation on classes already loaded, it will not be impacted by AOT loading. That is because AOT loaded classes are much the same as just-in-time loaded classes, and as such are equally reconfigurable.

  • As a fallback, if a VM is configured both with a CDS archive containing AOT loaded classes, and with an agent that requires notification of all class load events (including the earliest ones), then the VM will favor the agent by ignoring the CDS archive, with some loss of performance.

  • An AOT loaded class remains present in the VM, even if the application (as the result of its dynamic behavior) does not actually request loading of that particular class. Such a class is not subject to class unloading. Therefore it will use up memory footprint, where it would not if it were loaded just in time.

  • User-defined class loaders will not participate in AOT loading activities. This is because at present there is no technique for tracking the identity of a user-defined class loader across both training and production runs. The effect of this limitation is to load such classes just in time, giving them reduced performance. The present work is thought to provide groundwork necessary to overcome this limitation, by first stabilizing those classes which define the user-defined class loaders.

  • Defining a class using MethodHandles.Lookup::defineClass() is an irreversible decision if the class is named. Such calls will result in a LinkageError with a message about attempted duplicate class definition, if the affected named class was also loaded AOT in the CDS archive. This is a standard response to an attempt to define the same class name twice (see the sketch after this list).

  • The only way to make a training run at present is to have the application process some representative workload. It should run at least through startup, and must then exit, to trigger creation of the CDS archive. Possible future work on AOT workflows may add new tools to help the programmer more flexibly define and evaluate such training runs and workloads.
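The duplicate-definition response mentioned above can be observed even without CDS: the following sketch (a hypothetical demo class, not from the JDK) defines an already-loaded class a second time in the same loader, which raises a LinkageError; an AOT loaded class in the CDS archive would trigger the same response.

import java.io.InputStream;
import java.lang.invoke.MethodHandles;

public class DuplicateDefineDemo {
    public static void main(String[] args) throws Exception {
        // Read the classfile bytes of a class that is already loaded.
        byte[] bytes;
        try (InputStream in = DuplicateDefineDemo.class
                .getResourceAsStream("DuplicateDefineDemo.class")) {
            bytes = in.readAllBytes();
        }
        try {
            // Defining the same named class twice in one loader is rejected.
            MethodHandles.lookup().defineClass(bytes);
        } catch (LinkageError expected) {
            System.out.println("Duplicate definition rejected: " + expected);
        }
    }
}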

Testing

  • We will create new unit test cases that cover specific behavior of the -XX:+AOTLoadedClasses option.

  • The -XX:+AOTLoadedClasses option is independent of existing CDS features. Therefore, we can run existing CDS test cases with this option explicitly enabled. Such test cases should still pass.

Risks and Assumptions

We assume, for most applications and frameworks that want to take advantage of the shifting afforded by -XX:+AOTLoadedClasses, that the corresponding constraint of not being able to request conflicting configurations when deploying to production is an acceptable tradeoff. For example, incompatible class paths or module system settings can prevent use of the CDS archive. Such restrictions may be softened by future work.

Note that CDS supports just-in-time (JIT) loading of classes with user-defined class loaders, allowing users to dynamically configure part of their class loading activity, even while loading other classes ahead of time (AOT), but the fullest benefits come only from AOT loading. Conversations with users suggest that they are willing to accept fixed class paths (with their built-in class loaders), and to use specialized class loaders only when more flexibility is absolutely required.


CES 2024 was all about interoperability beyond the smart home / In between all the flashy new monitors, electric vehicle prototypes, and palm-scanning door locks shown off at CES this year, there was a trend linking the less eye-catching announcements.

Last year, you couldn’t mention CES without bringing up Matter. It was a pivotal year for the smart home standard, as big names like Samsung, GE, and Amazon promised better interoperability between their devices and a world of sensors, appliances, and accessories. But that promise largely started and ended with smart home tech.

This year, things were a little different at CES: the idea of making products work nicely across ecosystems bled into other areas of the showcase and rippled across a range of different devices — even putting rivals on the same page to better serve users.

Google, for instance, revealed several updates to Android that show a clear push toward interoperability. One of Google’s biggest updates was to Nearby Share, the Android equivalent of AirDrop that lets users share files with other devices that are close by. Instead of going it alone, Google announced that it’s combined Nearby Share with Samsung’s own take on the feature, called Quick Share. The newly merged sharing system will adopt Samsung’s Quick Share label and bring the “best” of both companies’ “sharing solutions together into a singular cross-Android solution,” according to Google. That should make it easier to share files across both Samsung and Pixel devices.

Semiconductor Recycling: Addressing E-Waste Challenges

The increasing demand for electronic devices, from smartphones to electric cars, has ...