What MuleSoft Architects Should Know to Successfully Leverage Anypoint DataGraph
APIs have revolutionized how we write applications and have made the composable enterprise possible. If your organization has embraced API-led architecture, you're likely starting to uncover a different kind of problem from the one presented by monolithic applications: the proliferation of APIs across multiple systems.
According to the 2022 Connectivity Benchmark Report, the number of systems in the average enterprise continues to climb, with respondents indicating they now have, on average, 273 integrated systems. It's difficult to realize the ideal of an API network - reusability - when you are overwhelmed by the number of APIs available. The challenge is amplified when you only need a subset of the information an API exposes.
Last year, MuleSoft released a tool that solves these problems - Anypoint DataGraph. DataGraph provides a SaaS GraphQL endpoint that lets you combine data and actions from multiple APIs into a single call.
It's a tool that can make a Mule architect's life easier, but one you may not be aware of. Let's explore DataGraph, including its optimal use case, potential mistakes to avoid, and best practices for leveraging this powerful addition to the Anypoint toolset.
The Basics of DataGraph
DataGraph is MuleSoft's answer for architects struggling to combine data from a broad number of systems. It provides a single, unified schema, offering a visual representation of your API network across many data sources. As described by MuleSoft when they announced the tool, “This allows developers to reuse those APIs at once — without writing multiple API requests and custom code to parse through long responses.”
DataGraph automatically imports and parses RAML and other API specification formats to discover the schemas and actions available, reducing the manual complexity of creating the graph. This also makes it easy to update the graph when the underlying APIs change.
This is the evolution of reusable APIs. DataGraph essentially creates a "Super API" layer that sits on top of your System or Process APIs, laying the foundation for reusable queries and mutations on the underlying data without changing the existing APIs or developing new ones.
DataGraph’s Optimal Use Case
Like many Mule organizations, you likely have multiple System APIs that you query separately to integrate their data. This results in multiple requests from each consuming application, returning the data you're looking for along with a lot of data you don't need.
DataGraph connects the schemas from these APIs, making it possible to retrieve a response with a single call and receive only the data you need, structured for your unique requirements. DataGraph also shows the relationships between the schemas, so new developers can rapidly understand how the primary IDs of each data type connect.
Let's look at an example - perhaps you have a requirement to retrieve a list of clients, their active orders, and their representative's contact information.
In a traditional API-led system, you’d have to:
- Make a call to the Clients API to get a list of clients
- Make a call to the Orders API to get their orders
- Make a call to the Client Reps API to get the reps’ info
That's three separate calls that must be executed in the proper order and that inevitably return a plethora of data, only a small portion of which is needed. From there, the data must be massaged into a usable format that correlates the three responses and restructures them accordingly.
DataGraph reduces these three calls to a single call. Clients, their active orders, and their reps' information are returned in a single payload. The response structure mirrors the GraphQL query, so the data comes back already pruned and combined into a neatly defined type created by someone who understands how the data relates.
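As a sketch, the combined request might look like the single GraphQL query below. The type and field names (clients, activeOrders, representative, and so on) are hypothetical; yours would come from your own unified schema.

```graphql
# One request replaces three API calls. Only the fields selected
# here are returned - nothing extra to prune or correlate.
# All names are illustrative, not DataGraph built-ins.
query ClientsWithOrdersAndReps {
  clients {
    id
    name
    activeOrders {
      orderId
      status
      total
    }
    representative {
      name
      email
      phone
    }
  }
}
```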
These characteristics mean that DataGraph is faster to develop against, less prone to data transformation bugs, and imposes less overhead on the consuming application. This is a huge win for developers across the board and is the next step to treating APIs as composable building blocks for an interconnected enterprise.
Potential DataGraph Mistakes to Avoid
Like any tool, DataGraph can be misused in ways that keep you from realizing its full potential. Because of the complexity involved - or, rather, the abstraction from complexity - creating the relationships between the data should be left to a Mule architect.
Data Governance
DataGraph provides permissions for accessing and viewing the information within the tool itself and for connecting to the underlying APIs. But this must all be manually configured by the user, or the enterprise security team, and it lacks granularity. You cannot prevent a developer from seeing the relations between the APIs they're working on and APIs they are not. For example, a developer working with order data will also be able to see the data relations into client account systems. Data governance will be essential since you'll be connecting data from many sources.
Versioning
Data in DataGraph is highly interconnected, so versioning issues can occur. You can provide a specific version of an underlying source API, but you must ensure that version's schema does not conflict with the other schemas. Collaboration can also be an issue if more than one person updates the schema simultaneously. Because DataGraph lacks version control, any changes should go through a review process to prevent unintended consequences.
Field Naming
Naming fields in a unified schema is an art. A data architect should own this, ensuring that names meet your established naming standards and are clear without being overly verbose.
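For illustration, here is a hypothetical fragment of a unified schema showing the difference that naming makes; the types and fields are invented for this sketch.

```graphql
# Hypothetical schema fragment - all names are illustrative.
type Client {
  id: ID!
  name: String!          # clear and concise
  activeOrders: [Order]  # says exactly what is returned
  # Avoid names like "clientActiveOrderRecordList" - the extra
  # verbosity adds nothing once the field lives on the Client type.
}
```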
Managing Design Constraints
DataGraph is a great tool, but it has constraints that must be considered when designing queries. For example, the amount of returned data must be managed: responses can be no larger than 5 MB and can return no more than 100 fields at once. Another consideration is process run time. A long-running process attached to the schema - such as one that must verify inserts with a downstream system - is problematic because the tool imposes a 5-second timeout on each underlying API call. This can cause issues if, for example, your Client Rep data is stored on a third-party service that is slow to respond, or if there is a Lambda function with a long cold start in your call chain.
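In practice, designing within these limits mostly means selecting only the fields you need and capping result sets where the underlying API supports it. A sketch follows; the `limit` argument is hypothetical and would only exist if the source API exposes a paging parameter.

```graphql
# Hypothetical query shaped around the limits: a capped result set
# and a narrow field selection to stay under the response-size cap.
query RecentOrdersSummary {
  clients(limit: 25) {
    id
    activeOrders {
      orderId
      status  # omit large nested objects you don't need
    }
  }
}
```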
Data Relations
How you connect data within your DataGraph schema is up to you - there is nothing to prevent you from connecting two pieces of data that are not actually related. This can become a big problem without proper management, as DataGraph has no test suite; it relies on the consumers of the API to validate that the returned data is correct. Again, this is why the high-level view of your MuleSoft and data architects is needed to design DataGraph queries and mutations.
DataGraph Best Practices
While some of the possible mistakes with DataGraph can be concerning, the good news is that following a few best practices eliminates the pitfalls outlined above.
DataGraph's optimal use case is creating small, simple queries and mutations on highly interconnected data across a variety of systems. That requires thoughtful design and a focus on clarity in the schemas to ensure a well-thought-out model that connects everything together.
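"Small and simple" applies to mutations as much as queries. A minimal sketch, assuming a hypothetical assignRepresentative mutation in the unified schema, might look like this:

```graphql
# Hypothetical mutation kept deliberately small: one action against
# one underlying API, selecting only the fields needed to confirm
# the change. Names and IDs are illustrative.
mutation AssignRepresentative {
  assignRepresentative(clientId: "C-1001", repId: "R-42") {
    id
    representative {
      name
    }
  }
}
```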
We’ve already said this, but someone with a high-level overview of the data and how it relates from a business perspective should be designing the schemas - most likely a data architect.
Architects should also ensure the RAML specifications for all the underlying APIs are clear and complete so that the schemas imported into DataGraph are fully correct; any parameters or fields missing from a specification will not be available in the graph.
Another best practice is to ensure that data connected within the unified schema does not expose sensitive information, because access to the schema is broad rather than granular. If you need pieces of the schema to be more tightly controlled, ensure that authentication headers for the underlying API are passed through by the client. This allows you to expose all the relations between the data while still requiring proper access to each underlying API before the sensitive data itself can be retrieved.
Finally, you'll want to emphasize reusability, as DataGraph is the next step for your application network. Once you have created a unified schema, you can create reusable combinations of actions across these APIs. This lets you expose common business logic (e.g., querying clients and their representatives) as components that higher-level APIs can consume.
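One concrete way to package such a combination, assuming the same hypothetical schema as the earlier sketches, is a GraphQL fragment that consuming teams can share across their queries:

```graphql
# A fragment captures the "client plus representative" shape once,
# so multiple queries reuse the same selection. Names are illustrative.
fragment ClientWithRep on Client {
  id
  name
  representative {
    name
    email
  }
}

query ClientDirectory {
  clients {
    ...ClientWithRep
  }
}
```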
Conclusion
Above all, DataGraph should be used to promote the visibility and reusability of your System APIs. These queries and mutations aren't single-purpose or single-use; instead, they're a way to combine APIs into a valuable set of components with minimal overhead, allowing you to grow your API library while keeping the relations between the underlying data clear. This not only encourages the creation of small, single-purpose System and Process APIs, but also allows common actions on that data to be exposed for reuse.
If you'd like to learn more about DataGraph, and how it applies to your existing ecosystem, Big Compass would love to chat with you. Helping clients discover the tools that get the most out of their MuleSoft investment is why we do what we do.