Syncing Rutter Data to Your Database
Customers who leverage large quantities of data to drive their business decisions may pull Rutter data and store it themselves on their own cloud services (a database on AWS, GCP, Azure, etc.). This workflow enables them to run complex calculations and analyses at scale much more effectively.
Requirements
Database
Given that the commerce and accounting models include multiple relations that you might want to take advantage of in your queries, we’d recommend using a relational database like MySQL or Postgres.
Be ready for large amounts of data from some connections. We’ve seen merchants with over a million orders in a year, which can fill tables quickly, so we recommend planning for the growth you expect.
Server
To fetch Rutter data, most of our customers rely on a queue and worker system. When a new connection has its data ready, you can add the connection to a queue (such as Amazon SQS or RabbitMQ), and workers running on any basic server can then consume messages from the queue and start the data fetching process.
This worker and queue system also helps with any incremental updates you want to do.
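As a minimal sketch of that pattern (assuming Amazon SQS via boto3; the queue URL, message shape, and ingest_connection helper are all hypothetical), a worker loop might look like:

import json

import boto3  # assuming AWS SQS; any queue system works the same way

QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/rutter-connections'  # hypothetical
sqs = boto3.client('sqs')


def ingest_connection(access_token):
    # Placeholder for the data fetching process described below.
    ...


def worker_loop():
    while True:
        # Long-poll for messages announcing that a connection's data is ready.
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
        for message in resp.get('Messages', []):
            body = json.loads(message['Body'])
            ingest_connection(body['access_token'])
            # Delete the message only after ingestion succeeds, so failures are retried.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message['ReceiptHandle'])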
Fetching Rutter Data
Rutter offers a REST API to fetch data. A request to an endpoint that lists objects will return the first page of data, with a next_cursor field that points to the next page. In order to fetch all data, you should continue to make requests with each next_cursor until it is null. You can also filter data by date. Endpoints may offer created_at_min and created_at_max (or updated_at_min and updated_at_max) query parameters, which accept UNIX timestamps in milliseconds and limit your results to data within those ranges. Providing both a minimum and a maximum allows you to query data within a window of time.
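For instance, the millisecond bounds for a single UTC day (the values hardcoded in the script below) can be computed like this:

import datetime

day = datetime.datetime(2022, 5, 25, tzinfo=datetime.timezone.utc)
created_at_min = int(day.timestamp() * 1000)  # 1653436800000 (May 25th 00:00:00 UTC)
created_at_max = int((day + datetime.timedelta(days=1)).timestamp() * 1000) - 1000  # 1653523199000 (23:59:59 UTC)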
Below is an example Python script that pulls all orders created on May 25th, 2022 (UTC):
import base64

import requests

# client_id and client_secret can be found at https://dashboard.rutterapi.com/dashboard
CLIENT_ID = ''
CLIENT_SECRET = ''
# access token can be found at https://dashboard.rutterapi.com/connections
ACCESS_TOKEN = ''

# created_at_min/created_at_max bound May 25th, 2022 (UTC) as millisecond timestamps.
url = ('https://production.rutterapi.com/versioned/orders?access_token=' + ACCESS_TOKEN
       + '&created_at_min=1653436800000&created_at_max=1653523199000')

headers = {
    'Accept': 'application/json',
    'Authorization': 'Basic %s' % base64.b64encode((CLIENT_ID + ':' + CLIENT_SECRET).encode()).decode(),
    'X-Rutter-Version': '2024-08-31',
}


def main():
    print('Fetching data...')
    filtered_orders = []

    # Fetch the first page of orders.
    response = requests.get(url, headers=headers).json()
    filtered_orders.extend(response['orders'])
    next_page = response['next_cursor']

    # Follow next_cursor until it is null (None) to fetch every page.
    while next_page:
        new_response = requests.get(url + '&cursor=' + next_page, headers=headers).json()
        filtered_orders.extend(new_response['orders'])
        next_page = new_response['next_cursor']

    print('Fetched %d orders' % len(filtered_orders))
    return filtered_orders


if __name__ == '__main__':
    main()
Querying windows of time also allows you to parallelize your data pulls from Rutter and speed up your fetch. For example, to fetch the last year of data for a large connection, you can have 12 concurrent workers each ingest one month of data.
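A minimal sketch of that idea (the fetch_orders helper is hypothetical; it would contain the same pagination loop as main() above, with the window passed as query parameters):

import calendar
import concurrent.futures
import datetime


def month_window_ms(year, month):
    # Millisecond bounds for one calendar month (UTC).
    start = datetime.datetime(year, month, 1, tzinfo=datetime.timezone.utc)
    last_day = calendar.monthrange(year, month)[1]
    end = datetime.datetime(year, month, last_day, 23, 59, 59, tzinfo=datetime.timezone.utc)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


def fetch_orders(created_at_min, created_at_max):
    # Hypothetical: run the pagination loop from main() with this window applied.
    return []


# 12 workers, each ingesting one month of 2022.
with concurrent.futures.ThreadPoolExecutor(max_workers=12) as pool:
    futures = [pool.submit(fetch_orders, *month_window_ms(2022, m)) for m in range(1, 13)]
    all_orders = [order for future in futures for order in future.result()]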
Storing Rutter Data
We’ve seen some customers break up Rutter’s data schema into two parts: required data (data used in your underwriting model) and non-required data (data you don’t use right now but might want to store). Required data can be enforced as part of a table schema, while non-required data can be placed in a JSON or JSONB column. This way, changes to non-required data don’t break your data ingestion process.
Important fields to note when persisting Rutter data include:
- The id field is a Rutter-generated id that is unique within each endpoint. A unique constraint should be created on the id and endpoint (or object type, e.g. bills) to prevent duplicate data in your datastore as rows are updated. In some cases, the same id can exist across different endpoints.
- The platform_id field is a unique identifier that comes from the external platform. This can be used to reference the row directly in the external platform and is only unique within each connection.
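Putting those pieces together, one possible sketch (the table layout and the use of Postgres/psycopg2 are assumptions, not a prescribed schema): required fields become typed columns, everything else lands in a JSONB column, and a unique constraint on (endpoint, id) prevents duplicates.

import psycopg2  # assuming Postgres; any relational database works
from psycopg2.extras import Json

SCHEMA = """
CREATE TABLE IF NOT EXISTS rutter_objects (
    endpoint    TEXT NOT NULL,   -- object type, e.g. 'orders' or 'bills'
    id          TEXT NOT NULL,   -- Rutter-generated id, unique within an endpoint
    platform_id TEXT,            -- identifier on the external platform
    raw         JSONB NOT NULL,  -- non-required data; changes here won't break ingestion
    UNIQUE (endpoint, id)        -- the same id can exist across different endpoints
);
"""


def upsert(conn, endpoint, obj):
    # Insert or update a Rutter object, keyed on (endpoint, id).
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO rutter_objects (endpoint, id, platform_id, raw)
            VALUES (%s, %s, %s, %s)
            ON CONFLICT (endpoint, id) DO UPDATE
                SET platform_id = EXCLUDED.platform_id, raw = EXCLUDED.raw
            """,
            (endpoint, obj['id'], obj.get('platform_id'), Json(obj)),
        )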
Updating Rutter Data
New data is ingested into Rutter's system regularly, and existing data also changes over time as actions are taken on the external platform, so Rutter strongly recommends polling for updates regularly to keep your data in sync with Rutter's system. This can be achieved by creating a cron job that runs at a regular interval (based on your company’s needs for data freshness).
Accounting
Accounting endpoints, prefixed with /accounting, return objects with a field named last_synced_at, which specifies the timestamp at which the object was last synced from the external platform. The query parameter last_synced_at_min expects a UNIX timestamp in milliseconds and will fetch data greater than or equal to that timestamp. To ensure the data is returned in ascending order, the sort query parameter can be provided with the value last_synced_at ASC. To ingest data without gaps, it is recommended to use both the last_synced_at_min and sort parameters in your request to begin paginating from your last ingestion checkpoint. To recap, the steps to ingest data may look like the following (a sketch follows the list):
1. Check for the last timestamp the ingestion script started.
2. Query a list endpoint, passing that timestamp as the last_synced_at_min parameter and last_synced_at ASC as the sort parameter.
3. Store the response data and extract the next_cursor.
4. Continue steps 2 and 3 while next_cursor is not null.
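A minimal sketch of those steps (reusing ACCESS_TOKEN and headers from the script above; the store helper, the checkpoint value, and the assumption that the response key matches the endpoint name are all hypothetical):

import requests

checkpoint_ms = 1653436800000  # hypothetical: loaded from wherever you persist checkpoints


def store(objects):
    # Hypothetical persistence helper (e.g. the upsert sketch above).
    ...


def sync_incremental(path, response_key, extra_params):
    # Paginate a Rutter list endpoint from a checkpoint, following next_cursor.
    params = {'access_token': ACCESS_TOKEN, **extra_params}
    while True:
        response = requests.get(
            'https://production.rutterapi.com/versioned/' + path,
            headers=headers, params=params,
        ).json()
        store(response[response_key])
        cursor = response['next_cursor']
        if not cursor:  # next_cursor is null on the last page
            return
        params['cursor'] = cursor


# Accounting: resume from the last checkpoint, oldest first, to avoid gaps.
sync_incremental('accounting/bills', 'bills', {
    'last_synced_at_min': checkpoint_ms,
    'sort': 'last_synced_at ASC',
})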
When data is deleted from the underlying platform, Rutter may sync those deletions and remove those objects from its system. Data that has been deleted in Rutter's system will no longer be exposed via the GET methods. To query for which objects have been deleted, you can refer to Monitoring Deleted Objects.
Commerce
Commerce endpoints have an updated_at field, which specifies the timestamp when the object was last updated on the external platform. This field comes directly from the external platform and can be used to filter for rows that have been updated since the last time your ingestion script ran. The steps to ingest data are similar to the ones listed above for accounting, but with the updated_at_min parameter instead. Data is also sorted by updated_at DESC by default, so the steps to ingest data may look like the following (see the variant after the list):
1. Check for the last timestamp the ingestion script started.
2. Query a list endpoint, passing that timestamp as the updated_at_min parameter.
3. Store the response data and extract the next_cursor.
4. Continue steps 2 and 3 while next_cursor is not null.
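Using the same hypothetical helper from the accounting sketch above, the commerce variant only swaps the filter parameter:

# Commerce: same loop, filtering by updated_at_min instead.
sync_incremental('orders', 'orders', {'updated_at_min': checkpoint_ms})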
Webhooks
Customers may also leverage Rutter’s webhooks to be notified of updates to Rutter data. For more information, please see the Webhooks Quickstart page. Please note that it is not recommended to rely solely on webhooks to ingest data from Rutter.