Exporting raw data via the API
The exportData API exports raw data to downloadable files. Data is split into files of roughly 256 MB each and is not necessarily ordered. Each file contains one line per session, with each session JSON-encoded. Export files are automatically deleted 24 hours after the export occurs. Data becomes available to export every 2 to 4 hours, and only for complete sessions. You cannot export data that became available more than 60 days ago. You may export data at most 24 times per day; exports with invalid arguments do not count toward this limit.
To construct an API call to export raw data, you need at least the following:
- APP_ID_KEY - Can be found in the keys & settings of your app.
- EXPORT_KEY - Can be found in the keys & settings of your app.
- startDate - First date in the range to export, in PDT/PST (format: YYYYmmdd).
And your API call will be constructed to look like the following:
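As a minimal sketch of such a call, the snippet below assembles an exportData URL from the three required values. The base endpoint, the `apiVersion` value, and the query parameter names `appId`, `clientKey`, and `action` are assumptions that do not come from this article; check them against your Leanplum account before relying on them. `build_export_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed base endpoint; confirm it for your Leanplum account.
API_BASE = "https://api.leanplum.com/api"


def build_export_url(app_id, export_key, start_date, **optional):
    """Return an exportData URL for the given keys and start date.

    The parameter names appId, clientKey, apiVersion and action are
    assumptions; startDate and the optional arguments come from this article.
    """
    params = {
        "appId": app_id,          # APP_ID_KEY
        "clientKey": export_key,  # EXPORT_KEY
        "apiVersion": "1.0.6",    # assumed version string
        "action": "exportData",
        "startDate": start_date,  # YYYYmmdd
    }
    # Add only the optional arguments that were actually supplied.
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{API_BASE}?{urlencode(params)}"


print(build_export_url("APP_ID_KEY", "EXPORT_KEY", "20141118", endDate="20141119"))
```

Passing the optional arguments as keyword arguments keeps the required values explicit while letting any argument from the table below be appended unchanged.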
The following arguments are optional:
|Parameter||Type||Description|
|endDate||String||Last date in range to include in PDT/PST (format: YYYYmmdd). Defaults to startDate if not provided. Example: 20141118|
|startTime||Number||First time (when data became available) to include (seconds since midnight UTC on January 1, 1970). If not provided, accepts all times before endTime, or all times if endTime is also not provided. The main use is to set this to the last time you exported data to only get the new data since your last export.|
|endTime||Number||Last time (when data became available) to include (seconds since midnight UTC on January 1, 1970). If not provided, accepts all times after startTime, or all times if startTime is also not provided.|
|callbackUrl||String||URL to POST a response to when the export completes. The response is the response format of getExportResults.|
|exportFormat||String||The format to export data. Can be either json or csv. Default: json.|
|s3BucketName||String||The name of an AWS S3 bucket to copy exported files to.|
|s3AccessId||String||The AWS Access ID used to authenticate to S3. Required if s3BucketName is set.|
|s3AccessKey||String||The AWS Secret Access Key used to authenticate to S3. Required if s3BucketName is set.|
|s3ObjectPrefix||String||An optional prefix of files to write to S3. Example: dirname/ to write files to a directory within the S3 bucket.|
|compressData||Boolean||An option to compress the data. Only works when uploading to S3. If set to true, the files will be compressed using gzip before being uploaded.|
The response includes the following field:
|jobId||String||The job ID of the pending export job, if data matching the supplied arguments is available.|
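The constraints on the S3 arguments listed above (the access ID and secret key are required once s3BucketName is set, and compressData only applies to S3 uploads) can be checked before a request is sent. The following is a sketch only; `check_s3_options` is a hypothetical helper, not part of the Leanplum API.

```python
def check_s3_options(args):
    """Return a list of problems with the S3-related arguments in `args`."""
    problems = []
    if args.get("s3BucketName"):
        # Both credentials are required once a bucket is named.
        for key in ("s3AccessId", "s3AccessKey"):
            if not args.get(key):
                problems.append(f"{key} is required when s3BucketName is set")
    elif args.get("compressData"):
        # Compression only applies to files uploaded to S3.
        problems.append("compressData only works when uploading to S3")
    return problems


print(check_s3_options({"s3BucketName": "my-bucket", "s3AccessId": "ID"}))
```

Catching these problems locally saves a round trip to the API.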
To then get the files with the jobId, the call would be constructed in the following manner:
Repeat the getExportResults call periodically (e.g. every minute) until the state is either FAILED or FINISHED. If the job has failed, retry it. If the job has finished, the files can be consumed from the provided URIs.
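The polling described above can be sketched as a small loop. To avoid assuming anything about the HTTP request or the response layout, the hypothetical `wait_for_export` helper below takes a caller-supplied `get_state` function that performs one getExportResults call and returns the job state as a string; the poll limit is likewise an assumption.

```python
import time


def wait_for_export(get_state, interval_seconds=60, max_polls=120, sleep=time.sleep):
    """Poll until the job state is FINISHED or FAILED, then return that state.

    get_state: function that performs one getExportResults call and
    returns the current job state as a string.
    """
    for _ in range(max_polls):
        state = get_state()
        if state in ("FINISHED", "FAILED"):
            return state
        sleep(interval_seconds)  # wait before asking again
    raise TimeoutError("export job did not reach FINISHED or FAILED")


# Example with a stand-in for the real API call.
states = iter(["PENDING", "RUNNING", "FINISHED"])
print(wait_for_export(lambda: next(states), sleep=lambda seconds: None))  # FINISHED
```

A caller would retry the export when the returned state is FAILED and download the files when it is FINISHED.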
The resulting files contain data about each user session, such as the device, the events triggered, and userAttributes. The files are in either JSON or CSV format, depending on the exportFormat argument.
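Because a JSON export file holds one JSON-encoded session per line, it can be read line by line without loading the whole file into memory. This is a minimal sketch; the `sessionId` field in the sample data is made up for illustration and is not taken from the actual export schema.

```python
import io
import json


def iter_sessions(lines):
    """Yield one decoded session per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an opened export file; the field name is hypothetical.
sample = io.StringIO('{"sessionId": 1}\n{"sessionId": 2}\n')
sessions = list(iter_sessions(sample))
print(len(sessions))  # 2
```

With a real export, pass an opened file object in place of the `io.StringIO` stand-in.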
To automate this process with a Python Script, see more here.
A historical export is, for example, an export of all session data for a specific historical period. This is usually an ad hoc export. Data can be exported for up to 60 days in the past.
Parameters startDate and endDate are used to specify the historical period. All currently available sessions that started between 00:00:00 PST on startDate and 23:59:59 PST on endDate will be exported.
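As an illustration, the date arguments for a historical export can be formatted and sanity-checked as follows. `historical_range` is a hypothetical helper, and its 60-day check is only an approximation: the actual limit applies to when the data became available, and `today` should be the current date in Pacific time.

```python
from datetime import date, timedelta

MAX_AGE_DAYS = 60  # export window stated in this article


def historical_range(start, end, today):
    """Return startDate/endDate strings (YYYYmmdd) for a historical export."""
    if end < start:
        raise ValueError("endDate must not be before startDate")
    if start < today - timedelta(days=MAX_AGE_DAYS):
        raise ValueError("startDate is more than 60 days in the past")
    return {
        "startDate": start.strftime("%Y%m%d"),
        "endDate": end.strftime("%Y%m%d"),
    }


print(historical_range(date(2019, 4, 1), date(2019, 4, 30), today=date(2019, 5, 8)))
```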
Export session data continuously as it becomes available in Leanplum.
We strongly recommend using auto-export for this use case instead of relying on the exportData API.
Auto-export can be configured by following this article.
To implement this use case with the Leanplum API, you need to maintain a timestamp for the last successful export. Make the exportData requests periodically (e.g. every 2 hours or daily). Make a subsequent exportData request only once the previous export request has reached the FINISHED or FAILED status; otherwise, some data may be exported twice.
The exportData parameters should be configured as follows:
|Parameter||Value||Example value (assuming now = 2019-05-08 10:00:00 UTC)||Notes|
|startDate||now -8 days in PST||20190501||To capture offline sessions|
|endDate||now +2 days in PST||20190510|
|startTime||the endTime of the last successful export job||1557302400||This value should be maintained (remembered) and updated|
|endTime||now (epoch seconds)||1557309600||To capture scheduled sends|
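The parameter table above can be turned into a small helper. This is a sketch under stated assumptions: `continuous_export_params` is a hypothetical name, and PST is modeled as a fixed UTC-8 offset that ignores daylight saving time; swap in `zoneinfo.ZoneInfo("America/Los_Angeles")` if the dates should follow PDT.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-8 offset for PST; daylight saving time is ignored.
PST = timezone(timedelta(hours=-8))


def continuous_export_params(now_utc, last_end_time, days_back=8, days_ahead=2):
    """Build exportData date/time arguments for a continuous export.

    now_utc: timezone-aware datetime for the current moment.
    last_end_time: endTime (epoch seconds) of the last successful export.
    """
    now_pst = now_utc.astimezone(PST)
    return {
        "startDate": (now_pst - timedelta(days=days_back)).strftime("%Y%m%d"),
        "endDate": (now_pst + timedelta(days=days_ahead)).strftime("%Y%m%d"),
        "startTime": last_end_time,
        "endTime": int(now_utc.timestamp()),
    }


now = datetime(2019, 5, 8, 10, 0, 0, tzinfo=timezone.utc)
print(continuous_export_params(now, last_end_time=1557302400))
```

Once a job finishes successfully, store its endTime and pass it as `last_end_time` on the next run so that only newly available data is exported.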
Wait for job completion
|status||a job transitions through the PENDING and RUNNING states, after which it is either FINISHED or FAILED|
|numSessions||number of exported sessions|
|numBytes||size of exported data|
|files||list of URIs of exported files|
Repeat this call periodically (e.g. every minute) until the state is either FAILED or FINISHED. Retry the job if its status is FAILED. If the job status is FINISHED, you can consume the files from the provided URIs.