Syncing Rutter Data to Your Database
Customers who leverage large quantities of data to drive their business decisions may pull Rutter data and store it themselves on their own cloud services (a database on AWS, GCP, Azure, etc.). This workflow enables them to run complex calculations and analyses at scale much more effectively.
Requirements
Database
Given that the commerce and accounting models include multiple relations that you might want to take advantage of in your queries, we’d recommend using a relational database like MySQL or Postgres.
Be ready for large amounts of data from some connections. We’ve seen merchants with over a million orders in a year, which can fill tables quickly, so we recommend planning for the growth you expect.
Server
To fetch Rutter data, most of our customers rely on a queue and worker system. When a new connection has its data ready, you can add the connection to a queue (such as Amazon SQS or RabbitMQ), and workers running on any basic server can then consume messages from the queue and start the data fetching process.
This worker and queue system also helps with any incremental updates you want to do.
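As a minimal sketch of that pattern (assuming Amazon SQS via boto3; the queue URL, message shape, and ingest_connection helper are all hypothetical), a worker loop might look like:

import json

import boto3  # assuming AWS SQS; any queue system works the same way

QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/rutter-connections'  # hypothetical
sqs = boto3.client('sqs')


def ingest_connection(access_token):
    # Placeholder for the data fetching process described below.
    ...


def worker_loop():
    while True:
        # Long-poll for messages announcing that a connection's data is ready.
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
        for message in resp.get('Messages', []):
            body = json.loads(message['Body'])
            ingest_connection(body['access_token'])
            # Delete the message only after ingestion succeeds, so failures are retried.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message['ReceiptHandle'])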
Fetching Rutter Data
Rutter offers a REST API to fetch data. A request to an endpoint that lists objects will return the first page of data, with a next_cursor field that points to the next page. In order to fetch all data, you should continue to make requests with each next_cursor until it is null. You can also filter data by date. Endpoints may offer created_at_min and created_at_max (or updated_at_min and updated_at_max) query parameters, which accept UNIX timestamps in milliseconds and limit your results to data within those ranges. Providing both a minimum and a maximum allows you to query data within a window of time.
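For instance, the millisecond bounds for a single UTC day (the values hardcoded in the script below) can be computed like this:

import datetime

day = datetime.datetime(2022, 5, 25, tzinfo=datetime.timezone.utc)
created_at_min = int(day.timestamp() * 1000)  # 1653436800000 (May 25th 00:00:00 UTC)
created_at_max = int((day + datetime.timedelta(days=1)).timestamp() * 1000) - 1000  # 1653523199000 (23:59:59 UTC)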
Below is an example Python script that pulls all orders created on May 25th, 2022 (UTC):
import base64

import requests

# client_id and client_secret can be found at https://dashboard.rutterapi.com/dashboard
CLIENT_ID = ''
CLIENT_SECRET = ''
# access token can be found at https://dashboard.rutterapi.com/connections
ACCESS_TOKEN = ''

# created_at_min/created_at_max bound May 25th, 2022 (UTC) as millisecond timestamps.
url = ('https://production.rutterapi.com/versioned/orders?access_token=' + ACCESS_TOKEN
       + '&created_at_min=1653436800000&created_at_max=1653523199000')

headers = {
    'Accept': 'application/json',
    'Authorization': 'Basic %s' % base64.b64encode((CLIENT_ID + ':' + CLIENT_SECRET).encode()).decode(),
    'X-Rutter-Version': '2024-08-31',
}


def main():
    print('Fetching data...')
    filtered_orders = []

    # Fetch the first page of orders.
    response = requests.get(url, headers=headers).json()
    filtered_orders.extend(response['orders'])
    next_page = response['next_cursor']

    # Follow next_cursor until it is null (None) to fetch every page.
    while next_page:
        new_response = requests.get(url + '&cursor=' + next_page, headers=headers).json()
        filtered_orders.extend(new_response['orders'])
        next_page = new_response['next_cursor']

    print('Fetched %d orders' % len(filtered_orders))
    return filtered_orders


if __name__ == '__main__':
    main()
Querying windows of time also allows you to parallelize your data pulls from Rutter and speed up your fetch. For example, to fetch the last year of data for a large connection, you can have 12 concurrent workers each ingest one month of data.
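A minimal sketch of that idea (the fetch_orders helper is hypothetical; it would contain the same pagination loop as main() above, with the window passed as query parameters):

import calendar
import concurrent.futures
import datetime


def month_window_ms(year, month):
    # Millisecond bounds for one calendar month (UTC).
    start = datetime.datetime(year, month, 1, tzinfo=datetime.timezone.utc)
    last_day = calendar.monthrange(year, month)[1]
    end = datetime.datetime(year, month, last_day, 23, 59, 59, tzinfo=datetime.timezone.utc)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


def fetch_orders(created_at_min, created_at_max):
    # Hypothetical: run the pagination loop from main() with this window applied.
    return []


# 12 workers, each ingesting one month of 2022.
with concurrent.futures.ThreadPoolExecutor(max_workers=12) as pool:
    futures = [pool.submit(fetch_orders, *month_window_ms(2022, m)) for m in range(1, 13)]
    all_orders = [order for future in futures for order in future.result()]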
Storing Rutter Data
We’ve seen some customers break up Rutter’s data schema into two parts: required data (data used in your underwriting model) and non-required data (data you don’t use right now but might want to store). Required data can be enforced as part of a table schema, while non-required data can be placed in a JSON or JSONB column. This way, changes to non-required data don’t break your data ingestion process.
Important fields to note when persisting Rutter data include:
- The id field is a Rutter-generated id that is unique within each endpoint. A unique constraint should be created on the id and endpoint (or object type, e.g. bills) to prevent duplicate data in your datastore as rows are updated. In some cases, the same id can exist across different endpoints.
- The platform_id field is a unique identifier that comes from the external platform. This can be used to reference the row directly in the external platform and is only unique within each connection.
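Putting those pieces together, one possible sketch (the table layout and the use of Postgres/psycopg2 are assumptions, not a prescribed schema): required fields become typed columns, everything else lands in a JSONB column, and a unique constraint on (endpoint, id) prevents duplicates.

import psycopg2  # assuming Postgres; any relational database works
from psycopg2.extras import Json

SCHEMA = """
CREATE TABLE IF NOT EXISTS rutter_objects (
    endpoint    TEXT NOT NULL,   -- object type, e.g. 'orders' or 'bills'
    id          TEXT NOT NULL,   -- Rutter-generated id, unique within an endpoint
    platform_id TEXT,            -- identifier on the external platform
    raw         JSONB NOT NULL,  -- non-required data; changes here won't break ingestion
    UNIQUE (endpoint, id)        -- the same id can exist across different endpoints
);
"""


def upsert(conn, endpoint, obj):
    # Insert or update a Rutter object, keyed on (endpoint, id).
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO rutter_objects (endpoint, id, platform_id, raw)
            VALUES (%s, %s, %s, %s)
            ON CONFLICT (endpoint, id) DO UPDATE
                SET platform_id = EXCLUDED.platform_id, raw = EXCLUDED.raw
            """,
            (endpoint, obj['id'], obj.get('platform_id'), Json(obj)),
        )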
Updating Rutter Data
New data is ingested into Rutter's system regularly, and existing data also changes over time as actions are taken on the external platform, so Rutter strongly recommends polling for updates regularly to keep your data in sync with Rutter's system. This can be achieved by creating a cron job that runs at a regular interval (based on your company’s needs for data freshness).
Accounting
Accounting endpoints, prefixed with /accounting, return objects with a field named last_synced_at, which specifies the timestamp at which the object was last synced from the external platform. The query parameter last_synced_at_min expects a UNIX timestamp in milliseconds and will fetch data greater than or equal to that timestamp. To ensure the data is returned in ascending order, the sort query parameter can be provided with the value last_synced_at ASC. To ingest data without gaps, it is recommended to use both the last_synced_at_min and sort parameters in your request to begin paginating from your last ingestion checkpoint. To recap, the steps to ingest data may look like the following (a sketch follows the list):
1. Check for the last timestamp the ingestion script started.
2. Query a list endpoint, passing that timestamp as the last_synced_at_min parameter and last_synced_at ASC as the sort parameter.
3. Store the response data and extract the next_cursor.
4. Continue steps 2 and 3 while next_cursor is not null.
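A minimal sketch of those steps (reusing ACCESS_TOKEN and headers from the script above; the store helper, the checkpoint value, and the assumption that the response key matches the endpoint name are all hypothetical):

import requests

checkpoint_ms = 1653436800000  # hypothetical: loaded from wherever you persist checkpoints


def store(objects):
    # Hypothetical persistence helper (e.g. the upsert sketch above).
    ...


def sync_incremental(path, response_key, extra_params):
    # Paginate a Rutter list endpoint from a checkpoint, following next_cursor.
    params = {'access_token': ACCESS_TOKEN, **extra_params}
    while True:
        response = requests.get(
            'https://production.rutterapi.com/versioned/' + path,
            headers=headers, params=params,
        ).json()
        store(response[response_key])
        cursor = response['next_cursor']
        if not cursor:  # next_cursor is null on the last page
            return
        params['cursor'] = cursor


# Accounting: resume from the last checkpoint, oldest first, to avoid gaps.
sync_incremental('accounting/bills', 'bills', {
    'last_synced_at_min': checkpoint_ms,
    'sort': 'last_synced_at ASC',
})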
When data is deleted from the underlying platform, Rutter may sync those deletions and remove those objects from its system. Data that has been deleted in Rutter's system will no longer be exposed via the GET methods. To query for which objects have been deleted, you can refer to Monitoring Deleted Objects.
Commerce
Commerce endpoints have an updated_at field, which specifies the timestamp when the object was last updated on the external platform. This field comes directly from the external platform and can be used to filter for rows that have been updated since the last time your ingestion script ran. The steps to ingest data are similar to the ones listed above for accounting, but with the updated_at_min parameter instead. Data is also sorted by updated_at DESC by default, so the steps to ingest data may look like the following (see the variant after the list):
1. Check for the last timestamp the ingestion script started.
2. Query a list endpoint, passing that timestamp as the updated_at_min parameter.
3. Store the response data and extract the next_cursor.
4. Continue steps 2 and 3 while next_cursor is not null.
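Using the same hypothetical helper from the accounting sketch above, the commerce variant only swaps the filter parameter:

# Commerce: same loop, filtering by updated_at_min instead.
sync_incremental('orders', 'orders', {'updated_at_min': checkpoint_ms})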
Webhooks
Customers may also leverage Rutter’s webhooks to be notified of updates to Rutter data. For more information, please see the Webhooks Quickstart page. Please note that it is not recommended to rely solely on webhooks to ingest data from Rutter.