Data Import
Viafoura supports importing your existing user and comment data into the audience engagement platform. Viafoura stores its comment data in a unique way that is optimized for quick access and data resiliency. To facilitate this, clients will be required to provide their data in two standardized JavaScript Object Notation (JSON) formats, which are described in this document. Our system then parses this data and inserts it into your Viafoura instance.
The data required for import falls into two groups: users, and containers with their comments and likes. Users are fairly straightforward. Containers are the entities where comments relating to the same topic reside, such as the comments on a webpage, and hold those comments nested inside them. Likes (and dislikes) are user reactions to comments.
Getting Started
After reading this document in its entirety, please provide us with sample files containing approximately 10,000 comments (in their containers) and the associated users for a test import. We will load this data into a test environment to validate the data format. After verification, we will provide you with additional instructions for uploading the full data set to be imported.
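If your full data set is already in the formats described later in this document, one way to cut a consistent test sample is sketched below. The file names (users.json, comments.json, and the _sample outputs) are hypothetical; the point is that every sub referenced by the sampled containers must also appear in the sampled user file.

```python
import json

TARGET_COMMENTS = 10_000  # approximate size requested for the test import

def count_comments(comments):
    """Count a comment list including nested replies."""
    return sum(1 + count_comments(c.get("comments", [])) for c in comments)

def collect_subs(comments, subs):
    """Collect every user sub referenced by comments, replies, and likes."""
    for c in comments:
        subs.add(c["sub"])
        for like in c.get("likes", []):
            subs.add(like["sub"])
        collect_subs(c.get("comments", []), subs)

with open("comments.json", encoding="utf-8") as f:
    containers = json.load(f)["containers"]
with open("users.json", encoding="utf-8") as f:
    users = json.load(f)["users"]

sample_containers, subs, total = [], set(), 0
for container in containers:
    sample_containers.append(container)
    collect_subs(container["comments"], subs)
    total += count_comments(container["comments"])
    if total >= TARGET_COMMENTS:
        break

# Only ship the users that the sampled containers actually reference.
sample_users = [u for u in users if u["sub"] in subs]

with open("users_sample.json", "w", encoding="utf-8") as f:
    json.dump({"users": sample_users}, f, ensure_ascii=False)
with open("comments_sample.json", "w", encoding="utf-8") as f:
    json.dump({"containers": sample_containers}, f, ensure_ascii=False)
```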
JSON Format Requirements
All files must be valid JSON formatted.
Two types of files must be supplied:
- User Data File. Defines all of the users referenced by the second file, as an array of user objects.
- Container, Comments, and Likes Data File. Defines the containers that hold comments, the comments themselves, and the users who have liked each comment. Likes are optional.
Due to the anticipated size of these files, we recommend that you do not "pretty format" them, as doing so significantly increases file size.
Standard JSON formatting rules apply. For example, to have a multi-line string value, you would need to use “\n” as a new-line character.
If a mandatory field is missing in a record, the record will not be imported.
Optional data items can be omitted from the JSON file. Do not include an optional item with a value of null; remove the key instead, because null is not the same as not providing a value (see the sketch at the end of this section). Also keep in mind the notes on individual fields below, as our import process may supply a default value for an omitted item.
HTML markup, both tags and entities, is imported as-is; in other words, no translation is performed. This behaviour simplifies using our API for non-web purposes, such as in mobile applications or when sending data in an email. It is therefore highly recommended that HTML tags be removed before data is written to the JSON files; otherwise the tags will show up as literal text instead of providing formatting.
Date-Times must conform to ISO-8601 formats for a timestamp consisting of a date and time.
Examples:
- 2019-12-03T10:15:30
- 2019-12-03T10:15:30+01:00
- 2019-12-03T10:15:30+01:00[Europe/Paris]
Timestamps without an offset are treated as UTC values; offsets are interpreted relative to UTC.
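To illustrate the requirements above, here is a minimal sketch of preparing one record the way this document asks: optional keys are dropped rather than written as null, HTML tags are stripped from free text, timestamps are ISO-8601, and the JSON is written compactly. The strip_tags regex and the sample values are assumptions for illustration, not part of Viafoura's API.

```python
import json
import re
from datetime import datetime, timezone

TAG_RE = re.compile(r"<[^>]+>")  # naive tag stripper; adapt to your own markup

def strip_tags(text):
    """Remove HTML tags so they are not imported as literal text."""
    return TAG_RE.sub("", text)

def iso_utc(dt):
    """Render a datetime as an ISO-8601 timestamp; naive values are treated as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.isoformat()

# Hypothetical user record; None marks an optional value you do not have.
user = {
    "sub": "daffy_duck",
    "name": "Daffy Duck",
    "email": None,  # unknown, so the key must be omitted, not written as null
    "created_at": iso_utc(datetime(2020, 10, 3, 10, 15, 30)),
    "updated_at": iso_utc(datetime(2020, 11, 9, 10, 15, 30)),
    "originating_provider": "cookie",
    "originating_provider_id": "daffy_duck",
}
user = {k: v for k, v in user.items() if v is not None}

# Comment bodies should carry plain text, not markup.
body = strip_tags("<p>I think it is important that all ducks <b>quack</b></p>")

# separators=(",", ":") keeps the output compact instead of pretty-printed.
print(json.dumps({"users": [user]}, ensure_ascii=False, separators=(",", ":")))
print(json.dumps({"content": body}, ensure_ascii=False, separators=(",", ":")))
```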
Organizing Your Data
Large files (over 1 GB) may be split into multiple files, though each file must be a complete file satisfying the requirements laid out in this document (you cannot break a file in two at an arbitrary position, as the second file would be missing its parent information).
Each file must be valid JSON on its own.
The user data files are the first files to be imported and have no dependencies on other files. Container, comments, and likes files will refer to users in these files.
Comment data refers to user data: every user referenced by a comment or like must already have been supplied in an earlier user data file.
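Assuming your users can be held in (or batched into) a list, the sketch below shows one way to split them into several files that are each complete, valid documents of the form {"users": [...]}; the chunk size and file naming are arbitrary choices. The same approach applies to splitting a large containers file.

```python
import json

CHUNK_SIZE = 100_000  # users per output file; tune so each file stays well under 1 GB

def write_user_chunks(users, prefix="users_part"):
    """Write users as several self-contained {"users": [...]} documents."""
    for index in range(0, len(users), CHUNK_SIZE):
        chunk = users[index:index + CHUNK_SIZE]
        path = f"{prefix}_{index // CHUNK_SIZE + 1:03d}.json"
        with open(path, "w", encoding="utf-8") as f:
            # Each file is valid JSON on its own, not an arbitrary byte-level split.
            json.dump({"users": chunk}, f, ensure_ascii=False, separators=(",", ":"))

# Example: write_user_chunks(all_users), where all_users is your full user list.
```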
Users at Viafoura
Site groups and Syndication
A Viafoura client's sites may be organized into one or more site groups. A site group allows users and content to be shared between multiple sites, enabling content syndication. If you wish to enable syndication across a site group in the future, the users you provide must be the same across all of those sites. In particular, the "sub" field for a user (described in detail later) must be the same across the sites.
The users for a site group can be supplied in multiple files, but they should all represent the same pool of users.
It is also possible to have multiple site groups if that better suits your needs, and in that case, please group your Users files appropriately.
If you have any questions, please contact Viafoura support.
Social Providers and Authentication
Viafoura supports integration with many external authentication sources. Broadly, authentication sources can be broken down into three categories:
- First Party users: These are users who exist only inside the Viafoura system, without an external login mechanism. These users log in using an email address.
- Second Party users: These users exist at the authentication provider. This may be an authentication provider that you host, or a login vendor such as LoginRadius, Gigya, or Janrain.
- Third Party users: These users log in via an authentication provider, but ultimately identify themselves by logging in elsewhere. An example would be a Google or Facebook user logging in via LoginRadius or Gigya.
It is possible to import a mix of all three user types.
If you wish to import First Party users, only an email address needs to be provided. We do not support importing passwords, so these users must request a password reset email when logging in for the first time after the migration. You should communicate this to your users.
For Second Party users, we currently support LoginRadius, Gigya, Janrain, and custom cookie logins. With a custom cookie login, we receive a token on your webpage, pass it to our servers, and our servers pass it to your servers for validation. If your server accepts the token, it must return a JSON object containing a unique user ID (sub), a display name (name), and optionally an email address. Please contact support for more information about cookie login.
For Third Party users, we support Apple, Facebook, Google, LinkedIn, Microsoft, Twitter, and Yahoo.
OpenID Connect (OIDC)
Viafoura supports OpenID Connect integration. Viafoura needs the well-known URL of the ID provider as well as your client ID. After these two are set up in Viafoura's admin panel, we are able to accept ID tokens and use them for authentication.
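Since the imported sub must match the value we later receive from your identity provider, it can help to inspect an ID token before building the user file. The following sketch decodes the payload segment of a JWT-formatted ID token without verifying its signature, purely to read the sub claim; it assumes your provider issues standard JWTs.

```python
import base64
import json

def jwt_claims(id_token):
    """Decode the middle (payload) segment of a JWT without verifying the signature."""
    payload = id_token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))

# id_token = "<paste an ID token issued by your provider here>"
# print(jwt_claims(id_token)["sub"])  # must match the user's sub value in users.json
```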
OpenID Connect Federation
Viafoura may implement OpenID Connect Federation once the specification is out of the draft stage.
User Data Format
Our user import format is based on OpenID Connect standard claims format.
If there are any errors, missing fields, or inconsistencies in the supplied JSON, the records will not be imported.
Field name | Required | Description |
---|---|---|
sub | Yes | A string between 1 and 250 characters long uniquely identifying a user in your system. The sub value should be unique across the site group. Note: This field is NOT case sensitive. If you are using a site group (or syndication), this field must be the same across all sites in the site group for this user. NOTE: This value should match the exact value sent to us from your auth platform in your OIDC token UID value or Cookie Token UID value. |
originating_provider | Yes | Specify the provider for the user. This is not about where the user data is coming from, but about which user login integration will be used after the import. Must be one of the following values: email (for First Party users), cookie (for custom cookie login), oidc, janrain, loginradius, apple, microsoft, yahoo (Live and MSN should also be provided as "microsoft"). |
originating_provider_id | Yes | Specify the user's unique identifier at the provider. For First Party users, the value is the email address. For custom cookie login, the value is the sub itself, unless clients need to use a different value. This value should contain precisely the identifier and nothing more. In particular, it shouldn't be a URL. |
name | Yes | The display name of the user, a string between 1 and 250 characters long. If the name is not available, the email address may be used to form one; for example, user@example.com could become "User" and user.name@example.com could become "User Name". Otherwise provide a default value like "Not Provided". |
email | Yes, for First Party users | The user's email address, used for sending notifications, password resets, and so on. The email value should be unique across the site group. The value cannot exceed 250 characters. The value MUST conform to the enhanced version of the RFC 5322 section 3.4.1 addr-spec syntax that is based on Sandeep V. Tamhankar's email validation method. For other provider types this field is strongly recommended but not required, and may be omitted if not known for a user; if not provided, no value will be assigned. |
email_verified | Specify only if email is supplied | A boolean true/false indicating if the email address is verified or not. If the email is provided, this field must be provided as well. |
created_at | Yes | ISO 8601 formatted date-time stamp. |
updated_at | Yes | ISO 8601 formatted date-time stamp. It must be equal to or greater than created_at. |
ban_type | No | A string indicating the type of ban: none (same as not providing this field), no_posting (the user cannot post), no_login (the user cannot log in), no_login_hide_posts (the user cannot log in and all of their content is hidden), ghost (the user can log in and post, but their content is hidden from other users). If this field is not supplied but ban_expiry is provided, a default type of no_login will be used. |
ban_expiry | No | Specify an ISO 8601 formatted date-time stamp indicating the ban expiry time. If a ban_type is supplied and is not none, and ban_expiry is not supplied, the ban will be permanent. |
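The name fallback described in the table (deriving a display name from the email's local part, or using a default such as "Not Provided") could be implemented roughly as follows; the exact splitting rules are an assumption for illustration, not a specification of Viafoura's behaviour.

```python
import re

def display_name(name, email):
    """Pick a display name, falling back to the email local part, then a default."""
    if name and name.strip():
        return name.strip()[:250]
    if email:
        local = email.split("@", 1)[0]
        # Treat dots, underscores, and dashes as word separators: "user.name" -> "User Name"
        words = [w for w in re.split(r"[._\-]+", local) if w]
        if words:
            return " ".join(w.capitalize() for w in words)[:250]
    return "Not Provided"

assert display_name(None, "user.name@example.com") == "User Name"
assert display_name("  ", None) == "Not Provided"
assert display_name("Daffy Duck", "anything@example.com") == "Daffy Duck"
```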
Example of User Data file
Filename: users.json
(pretty-formatted for clarity only)
{
"users":[
{
"sub":"daffy_duck",
"name":"Daffy Duck",
"email":"[email protected]",
"email_verified":true,
"created_at":"2020-10-03T10:15:30",
"updated_at":"2020-11-09T10:15:30",
"originating_provider":"email",
"originating_provider_id":"[email protected]",
"ban_type":"none"
},
{
"sub":"clark kent",
"name":"Clark Kent",
"email":"[email protected]",
"email_verified":false,
"created_at":"2020-10-03T10:16:30",
"updated_at":"2020-11-09T10:19:35",
"originating_provider":"facebook",
"originating_provider_id":"123456789ABC",
"ban_type":"no_posting",
"ban_expiry":"2021-10-03T10:16:30"
},
{
"sub":"freddy_104",
"name":"Fred Van der Vleet",
"created_at":"2020-10-13T11:16:30",
"updated_at":"2020-10-13T11:16:30",
"originating_provider":"cookie",
"originating_provider_id":"freddy_104"
},
{
"sub":"newbie",
"name":"Larry Cableguy",
"created_at":"2020-10-13T11:17:30",
"updated_at":"2020-10-13T11:17:30",
"originating_provider":"oidc",
"originating_provider_id":"my_oidc_id"
}
]
}
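Because sub is compared case-insensitively and both sub and email should be unique across the site group, a quick pre-flight check over all of your user files can catch collisions before the import. A rough sketch, with the file list as a placeholder:

```python
import json

def find_duplicates(paths):
    """Report sub values (case-insensitive) and emails that appear more than once."""
    seen_subs, seen_emails = {}, {}
    problems = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for user in json.load(f)["users"]:
                sub_key = user["sub"].lower()  # sub is not case sensitive
                if sub_key in seen_subs:
                    problems.append(f"duplicate sub {user['sub']!r} in {path} and {seen_subs[sub_key]}")
                else:
                    seen_subs[sub_key] = path
                email = user.get("email")
                if email:
                    # Comparing emails case-insensitively is a conservative assumption.
                    email_key = email.lower()
                    if email_key in seen_emails:
                        problems.append(f"duplicate email {email!r} in {path} and {seen_emails[email_key]}")
                    else:
                        seen_emails[email_key] = path
    return problems

for problem in find_duplicates(["users.json"]):  # list every user file in the site group here
    print(problem)
```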
Containers, Comments and Likes Data Format
Likes are nested in comments, and comments are nested in containers. As such, each object type will be described separately.
If there are any errors, missing fields, or inconsistencies in the supplied JSON, the records will not be imported.
Containers Data Format
Field name | Required | Description |
---|---|---|
id | Yes | A string uniquely identifying a container in your system. This must match the vf-container-id supplied to a vf-conversations widget, or the vf:container_id HTML meta tag. Between 1 and 200 characters long. See content container documentation. |
url | Yes | The canonical URL of a page where the container appears. Used to generate links back to a page from our tools. This should match the vf:url or og:url meta tag on the canonical URL where the content will be displayed. Containers may be displayed on multiple pages as part of Content Syndication. |
title | Yes | The title of your container. This should match the first occurrence of the following tags in the HTML: vf:title meta tag, og:title meta tag, HTML title tag. Between 0 and 2048 characters. If not provided, this field will default to an empty string, "". |
description | No | The description of your container. This should match the first occurrence of the following tags in the HTML: vf:description meta tag, og:description meta tag, description meta tag. Between 0 and 2048 characters. If not provided, this field will default to an empty string, "" |
created_at | Yes | ISO 8601 formatted date-time stamp. If not provided, this field will default to a date-time stamp representing the time of import. |
updated_at | Yes | ISO 8601 formatted date-time stamp. Must be equal to or greater than created_at, if specified. If unavailable, it will be set to the same value as created_at. |
comments | Yes | A JSON list of comments as defined below. It may be an empty list ("[]"). |
Comments Data Format
Field Name | Required | Description |
---|---|---|
id | Yes | A string between 1 and 250 characters long uniquely identifying a comment in your system. |
sub | Yes | A string between 1 and 250 characters long matching the sub field of a user object in the User Data file(s). |
content | Yes | Comment body. Should not be encoded in any way: no HTML, entities, double encoding, and so on, with the exception of standard JSON escaping. See the JSON Format Requirements section above. Blank comments will be ignored. |
created_at | Yes | ISO 8601 formatted date-time stamp. Must be equal to or greater than the parent object's created_at. Must be equal to or greater than the referenced user (sub) object's created_at. |
updated_at | Yes | ISO 8601 formatted date-time stamp. Must be equal to or greater than created_at, if specified. If unavailable, it will be set to the same value as created_at. |
status | Yes | The status of the comment. Must be one of: visible (shown to the public), disabled (not shown), spam (not shown and marked as spam), or awaiting_moderation (still waiting to be approved by a moderator). A default value of visible is used if a value is not provided. |
likes | Yes | A JSON list of like objects. It may be empty. See below for the format. |
comments | Yes | A JSON list of replies in the same format. It may be an empty list ("[]"). Replies may themselves have replies; nesting can be arbitrarily deep. |
Likes Data Format
Note that it is invalid to provide more than one like object with the same sub for a comment.
Field Name | Required | Description |
---|---|---|
sub | Yes | A string between 1 and 250 characters long matching the sub field of a user object in the User Data file(s). |
created_at | Yes | ISO 8601 formatted date-time stamp. Must be equal to or greater than the parent object's created_at. Must be equal to or greater than the referenced user (sub) object's created_at. |
updated_at | Yes | ISO 8601 formatted date-time stamp. Must be equal to or greater than created_at. |
status | Yes | A value of like or dislike. |
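If your existing system stores comments as flat rows with a parent reference, they need to be folded into the nested shape described above. The sketch below assumes a hypothetical flat layout (container_id, parent_id, and a separate likes table) and produces the documented structure:

```python
import json

# Hypothetical flat export: every row knows its container and, for replies, its parent comment.
flat_comments = [
    {"id": "45678", "container_id": "12345", "parent_id": None,
     "sub": "daffy_duck", "content": "Quack", "created_at": "2020-10-13T11:16:30",
     "updated_at": "2020-10-13T11:16:30", "status": "visible"},
    {"id": "4343", "container_id": "12345", "parent_id": "45678",
     "sub": "mrsmith", "content": "totally agreed!", "created_at": "2020-11-13T11:16:30",
     "updated_at": "2020-11-13T11:16:30", "status": "visible"},
]
flat_likes = [
    {"comment_id": "4343", "sub": "sherlock_holmes", "status": "like",
     "created_at": "2020-11-23T10:16:30", "updated_at": "2020-11-23T10:16:30"},
]
containers = {"12345": {"id": "12345", "url": "https://example.com/comics/batman/batman.html",
                        "title": "This is the Batman", "created_at": "2020-10-13T11:15:30",
                        "updated_at": "2020-10-13T11:15:30", "comments": []}}

# Index comments by id, attach likes, then hang replies under their parents.
nodes = {}
for row in flat_comments:
    nodes[row["id"]] = {k: row[k] for k in
                        ("id", "sub", "content", "created_at", "updated_at", "status")}
    nodes[row["id"]]["likes"] = []
    nodes[row["id"]]["comments"] = []
for like in flat_likes:
    nodes[like["comment_id"]]["likes"].append(
        {k: like[k] for k in ("sub", "status", "created_at", "updated_at")})
for row in flat_comments:
    if row["parent_id"]:
        nodes[row["parent_id"]]["comments"].append(nodes[row["id"]])
    else:
        containers[row["container_id"]]["comments"].append(nodes[row["id"]])

print(json.dumps({"containers": list(containers.values())},
                 ensure_ascii=False, separators=(",", ":")))
```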
Example of Containers, Comments and Likes Data File
Filename: comments.json
(pretty-formatted for clarity only)
{
"containers": [
{
"id": "12345",
"": "https://example.com/comics/batman/batman.html",
"title": "This is the \"fantastic\" Batman",
"description": "The fantastic Batman",
"created_at": "2020-10-13T11:17:30",
"updated_at": "2020-10-13T11:18:30",
"comments": [
{
"id": "45678",
"sub": "daffy_duck",
"content": "I think it is important that all ducks quack and repel water",
"created_at": "2020-10-13T11:16:30",
"updated_at": "2020-10-14T11:16:30",
"status": "enabled",
"likes": [],
"comments": [
{
"id": "4343",
"sub": "mrsmith",
"content": "totally agreed!",
"created_at": "2020-11-13T11:16:30",
"updated_at": "2020-11-14T11:16:30",
"status": "enabled",
"likes": [
{
"sub": "sherlock_holmes",
"status": "like",
"created_at": "2020-10-23T10:16:30",
"updated_at": "2020-10-23T10:16:55"
}
],
"comments": []
}
]
},
{
"id": "45943",
"sub": "clark kent",
"content": "If you are superman, you can both quack, repel water and fly",
"created_at": "2020-10-15T11:16:31",
"updated_at": "2020-10-15T11:17:31",
"status": "enabled",
"likes": [
{
"sub": "freddy_104",
"status": "dislike",
"created_at": "2020-10-23T10:16:32",
"updated_at": "2020-10-23T10:16:32"
},
{
"sub": "sherlock_holmes",
"status": "like",
"created_at": "2020-10-25T10:17:30",
"updated_at": "2020-10-25T10:17:30"
}
],
"comments": []
}
]
},
{
"id": "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d",
"sub": "clark kent",
"path": "https://example.com/comics/robin/robin.html",
"title": "I love Batman AND Robin",
"created_at": "2020-10-13T11:16:35",
"updated_at": "2020-10-13T11:16:40",
"likes": [],
"comments": [
{
"id": "0000eb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d",
"sub": "bob the third",
"content": "I call \"fowl\" on this stupid discussion",
"created_at": "2020-10-15T11:17:31",
"updated_at": "2020-10-15T11:17:31",
"status": "spam",
"likes": [],
"comments": []
}
]
}
]
}
Client-Side Validation Instructions For User Data
The client-side validation process is not 100% accurate, but it drastically reduces the time and manual labour required, for both you and us, to resolve errors caused by incorrectly crafted data files. Please follow these steps to set up your environment for client-side validation.
- Download & install node & npm
- Install the following libraries:
ajv-cli library -> npm install -g ajv-cli
ajv-keywords library -> npm install -g ajv-keywords
*ajv-formats library -> npm install -g ajv-formats
Download user schema
- It is suggested that you initially run this validation on a small subset of data to ensure that the data written to the users.json file is in the correct format. Note that this validation only checks data formats and does not perform any validation of relationships between different fields.
- Run the validation against the user schema and user data. The following commands assume that the schema file user_schema.json and your user data file users.json are located in the directory where you run the commands.
Commands to run the validation (the second command adds --verbose for more detailed output):
ajv -s user_schema.json -d users.json --strict=false -c ajv-formats -c ajv-keywords --all-errors --errors=json > output.json 2>&1
ajv -s user_schema.json -d users.json --strict=false -c ajv-formats -c ajv-keywords --verbose --all-errors --errors=json > output.json 2>&1
If the above commands seem to hang, try removing the “-c ajv-keywords” and re-run.
If the above commands complete with a message like this:
error: Cannot create a string longer than 0x1fffffe8 characters
then the file to validate is too big and the validator cannot verify it in one piece; split it into smaller, self-contained files (as described in Organizing Your Data) and validate each one.
Check the produced output.json file for errors.
Client-Side Validation Instructions For Comment Data
As mentioned in the section “Client-Side Validation Instructions For User Data”, download and install node & npm, ajv-cli, ajv-keywords, ajv-formats.
Download the comments/likes schema (comment_schema.json).
Run the validation against the comment schema and comment data. The following commands assume that the schema file comment_schema.json and your comment data file comments.json are located in the directory where you run the commands:
ajv -s comment_schema.json -d comments.json --strict=false -c ajv-formats -c ajv-keywords --all-errors --errors=json > output.json 2>&1
ajv -s comment_schema.json -d comments.json --strict=false -c ajv-formats -c ajv-keywords --verbose --all-errors --errors=json > output.json 2>&1
If the above commands seem to hang, try removing the “-c ajv-keywords” and re-run.
Check the produced output.json file for errors.
Known differences between client-side and server-side validator
Users:
While the server-side validator uses Apache Commons' email validator, which goes beyond the RFC 5322 addr-spec syntax, the client-side validator uses RFC 5322 alone. Some unusual addresses would therefore be evaluated as valid by the client-side validator while being rejected by the server-side validator; users with such addresses would be ignored by the importer during the actual import process.
Certain value comparisons for dates are only validated by the server-side validator; for example, updated_at must not be earlier than created_at, and ban_expiry must be later than created_at. These cases are reported during the dry run that we perform before the actual run, and you would need to fix the issues and resend the updated files.
Containers/Comments/Likes:
We suggest you initially run this validation on a small subset of data to ensure that the data written to the comments.json file is in the correct format. Note that this validation only checks data formats; it does not validate relationships between different fields, or between users.json and comments.json.
Value comparisons for dates as mentioned above also apply here.
There are no semantic checks on the content property of comments to identify whether HTML tags are present. The sketch below illustrates some of these supplementary checks.
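Some of the relationship checks that the JSON schema cannot express can still be scripted on your side before sending the files. The sketch below covers the date orderings mentioned above and a crude scan for HTML tags in comment bodies; it assumes each record uses a consistent timestamp style, and it is a supplement to, not a replacement for, the dry run we perform.

```python
import json
import re
from datetime import datetime

TAG_RE = re.compile(r"<[^>]+>")

def parse(ts):
    """Parse the ISO-8601 timestamps used in the import files."""
    return datetime.fromisoformat(ts)

def check_users(path):
    problems = []
    with open(path, encoding="utf-8") as f:
        for user in json.load(f)["users"]:
            if parse(user["updated_at"]) < parse(user["created_at"]):
                problems.append(f"user {user['sub']}: updated_at before created_at")
            if "ban_expiry" in user and parse(user["ban_expiry"]) < parse(user["created_at"]):
                problems.append(f"user {user['sub']}: ban_expiry before created_at")
    return problems

def check_comments(comments, problems):
    for c in comments:
        if parse(c["updated_at"]) < parse(c["created_at"]):
            problems.append(f"comment {c['id']}: updated_at before created_at")
        if TAG_RE.search(c["content"]):
            problems.append(f"comment {c['id']}: content appears to contain HTML tags")
        check_comments(c["comments"], problems)

def check_containers(path):
    problems = []
    with open(path, encoding="utf-8") as f:
        for container in json.load(f)["containers"]:
            check_comments(container["comments"], problems)
    return problems

for issue in check_users("users.json") + check_containers("comments.json"):
    print(issue)
```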
Common Errors When Importing Data
- The originating_provider values should be submitted according to their descriptions. Providing incorrect values delays the process and may require you to modify and resubmit the files.
- Optional values shouldn't be set to null; remove them from the file instead. This both fixes the problem semantically and reduces the size of the import file.
- Character length constraints should be followed. For example, a user name’s length should not exceed 250 characters.
- The name field can be set to a value like "Not Provided" if a real name is not available. Specifying null or an empty string is not enough.
- The updated_at dates should come after the created_at dates.
- If the email field is provided, email_verified must be provided as well. The email_verified property should not be provided if the email property is not specified.
- The content value for comments should not include HTML tags.
- Specifying the same container id with different sets of comments in different places within the import file causes the importer to use update mode instead of insert mode. This is recoverable on our side, but it complicates the import file, reduces importer performance, and increases the file size due to duplicate values. Merging these records into one entry per container fixes all of these issues at once (see the sketch after this list).
- A message indicating that the file is too big means the validator cannot use this file. We recommend keeping your files under 100MB each.
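A sketch of merging duplicate container entries into a single record per id, assuming the duplicates share the same metadata and differ only in which comments they carry:

```python
import json

def merge_containers(path_in, path_out):
    """Collapse repeated container ids into one entry, concatenating their comments."""
    with open(path_in, encoding="utf-8") as f:
        containers = json.load(f)["containers"]
    merged = {}
    for container in containers:
        if container["id"] in merged:
            merged[container["id"]]["comments"].extend(container["comments"])
        else:
            merged[container["id"]] = container
    with open(path_out, "w", encoding="utf-8") as f:
        json.dump({"containers": list(merged.values())}, f,
                  ensure_ascii=False, separators=(",", ":"))

merge_containers("comments.json", "comments_merged.json")
```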