Data Export

Overview

This document describes the files shared in Raw Data Export, which exports messages published
by users on Viafoura widgets, along with other information such as users and likes, on a daily
basis. The files are generated at approximately 6am UTC every day.

There are three main files shared by default:

  • comments_content
  • user_information
  • container_ids

There are a few optional files that we can provide upon request:

  • moderation_assessments
  • user_assessments
  • poll_actions
  • engage_time

This document is intended to give more context about the data contained in each column of
those files, and to provide information on how to access the data and connect information
across files.

Getting Access

We offer two standard options to enable access to raw data export in our S3 bucket:

  • S3 access via client IAM role: If the client provides us with an AWS IAM role, we
    can grant that role access to the client-specific S3 bucket in our data export account.
    Specifically, the role would have permissions to list objects in the bucket
    ("s3:ListBucket") and download each object ("s3:GetObject").
  • SFTP: The client can access the bucket via SFTP, in which case we require a
    public key from the machine(s) that will access it via SFTP.

More information will be shared with the client once everything has been set up on our end.
In most cases, daily files are stored in the "RawData" folder and historical data is
shared in the "HistoricalData" folder.

File Specifications

Daily files are generated from the raw data events collected on the previous day, without
advanced processing. We recommend that clients create tables to store historical
data and update records as new data comes in, because 1) some information can change over time,
such as username, and 2) some identifiers can appear on different days in different files,
making it hard to connect data across files using just one day’s data.

In most of the files, each line is an action, uniquely identified by the event_uuid field. Actions capture interactions performed by actors, and those actors might be users, moderators, applications and APIs (among others). "Users" refers to registered users unless stated otherwise.
All timestamps are in UTC.

Note that not all data events are captured, so not all information is available in the files.
For example, an updated username only shows up in the user_information file if the user’s logged-in status changes.

Files are plain text extractions with the following characteristics:
● Format: CSV
● Delimiter: ","
● Text qualifier: """
● Contains header: True
● Encryption: None
● File name: Each file name has a prefix based on the file creation time in UTC, in the
format "YYYYMMDD_HHMMSS_" (for example, "20231005_060126_")
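As an illustration of the naming convention above, the following Python sketch reads the creation time from a file name and sorts files oldest first. The part of the file name after the prefix ("comments_content.csv") is an assumption made for the example; only the timestamp prefix is described in this document.

```python
from datetime import datetime, timezone

PREFIX_LENGTH = len("YYYYMMDD_HHMMSS")  # 15 characters before the trailing "_"

def export_created_at(file_name: str) -> datetime:
    """Return the UTC creation time encoded in the file name prefix."""
    stamp = file_name[:PREFIX_LENGTH]
    return datetime.strptime(stamp, "%Y%m%d_%H%M%S").replace(tzinfo=timezone.utc)

# Hypothetical file names; only the prefix format comes from this document.
names = [
    "20231006_060102_comments_content.csv",
    "20231005_060126_comments_content.csv",
]
oldest_first = sorted(names, key=export_created_at)
```

Processing files in this order is useful when later files should overwrite earlier values, such as an updated username.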

Comments Content

This file contains all messages published by users, whether they are the main thread comment
(the message that started the sequence of replies) or a reply (a message linked to a main
comment), as well as other actions performed on the comments.

Fields information

Name | Description | Format
content_uuid | Id of the content item that was interacted on | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
container_uuid | Id of the widget where the action took place | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
payload_parent_uuid | The UUID of the post to which this is a reply, or the container_uuid if it is top level | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
actor_uuid | ID of the user or actor who caused the interaction. This could be a User, a Moderator or an internal ID | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
event_uuid | Unique identifier of the event | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
site_name | Name of the site (or property) | String, ex. https://www.somepage.com
message_type | Defines the type of message where the action took place. | String. Main posts have these values: livecomment_post, chat_message, liveblog_post, livereview_post. Reply posts have these values: reply_to_livecomment_post, reply_to_chat_message, reply_to_liveblog_post, reply_to_livereview_post
action | Action that generated the entry. | String. There are several possible values and they are displayed in the Action table below.
payload_metadata_origin_url | Display the entire URL | String, ex. https://www.somepage.com
timestamp | Timestamp of the event that results in an entry | Timestamp, ex. '2020-10-13 13:28:06.340'
payload_content | The text published by the user | String, ex. 'Best wishes.'

List of actions available (not all may be in use)

Action | Meaning
pinned | when a post is pinned to the top of the thread
updated | when a post is edited
disliked | when a post is disliked or down voted by a user
created | when a post is created
picked | when a post has been selected, as an editor's pick
unliked | when a like to a post is reverted
disabled | when a message is removed from view and general public cannot see it anymore
unpicked | when a post has been removed, as an editor's pick
deleted | when a user has voluntarily deleted their own post
liked | when a post is liked
undisliked | when a dislike to a post is reverted
flags_cleared | when a moderator clears flags of a post
flagged | when a user flags a post
spammed | when a post is marked as spam
visible | when a post is visible to the general public (barring mutes, ghost bans)
unpinned | when a post is unpinned from top of a thread
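Because each row is a single action, totals such as likes per post have to be derived by replaying the actions. The sketch below is one possible way to do that in Python, assuming that "unliked" cancels one earlier "liked" and "undisliked" cancels one earlier "disliked". The sample data uses shortened IDs for readability, and the totals only reflect the events present in the rows supplied.

```python
import csv
import io
from collections import Counter

# Contribution of each action to a post's running like/dislike totals.
LIKE_DELTA = {"liked": 1, "unliked": -1}
DISLIKE_DELTA = {"disliked": 1, "undisliked": -1}

def tally_reactions(rows):
    """Return (likes, dislikes) counters keyed by content_uuid."""
    likes, dislikes = Counter(), Counter()
    for row in rows:
        action = row["action"]
        if action in LIKE_DELTA:
            likes[row["content_uuid"]] += LIKE_DELTA[action]
        elif action in DISLIKE_DELTA:
            dislikes[row["content_uuid"]] += DISLIKE_DELTA[action]
    return likes, dislikes

# Illustrative sample with only the two columns used here.
sample = io.StringIO(
    "content_uuid,action\n"
    "c1,liked\n"
    "c1,liked\n"
    "c1,unliked\n"
    "c1,disliked\n"
    "c2,created\n"
)
likes, dislikes = tally_reactions(csv.DictReader(sample))
```

A total computed from a single day's file can be negative if the original "liked" event was recorded on an earlier day, which is one reason to accumulate actions across all historical files.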

User Information

This file provides additional information about the users that interacted with Viafoura widgets,
including username and third party id if available.

Fields information

Name | Description | Format
user_id | Id of the user that made the action | Big Int, ex. 8081300019356
actor_uuid | ID of the user or actor who caused the interaction. This could be a User, a Moderator or an internal ID | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
third_party_id | This is the actor ID when user creates an account using third party services, such as Google, Facebook, LoginRadius | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
username | Displays the username chosen by the user | String, ex. 'Some Name'

Container IDs

This file lists containers created or updated on a given day. A container is where comments
can be posted.

Fields information

Name | Description | Format
container_uuid | ID of the widget | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
site_name | Name of the site (or property) | String, ex. https://www.somepage.com
payload_container_id | Id of the widget in a different format | Big Int, ex. 8081300019356

Moderation Assessments

This file provides information about moderation assessments on comments.

Fields information

Name | Description | Format
site_name | Name of the site (or property) | String, ex. https://www.somepage.com
assessment | Outcome of assessment. | String. Possible values: approved, deferred, rejected
assessment_type | Type of assessment. | String. Possible values: content_moderation, spam_detection, flag_moderation
content_container_uuid | UUID of container the content was in (livechat id, page id etc) | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
content_source_type | Type of content being moderated (see message_type in Comments Content) | String
entity_uuid | UUID of content being moderated | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
provider | The entity performing this assessment. | String. Possible values: human moderations have these values: console, human. Auto moderations have these values: automod_service, automod (deprecated), keepcon (deprecated). Auto spam detection has this value: spam_service
provider_decision | The decision made by the assessment service before settings were applied to it | String, ex. 'approved'
section_uuid | Unique site identifier | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
tags | Tags attached to the assessment | String, ex. '[{"confidence":0.000000000000000e+00, "status":"confirmed", "tag":"personal_attack"}]'
timestamp | Timestamp of the event that results in an entry | Timestamp, ex. '2020-10-13 13:28:06.340'
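The example value shown for the tags field looks like a JSON array stored as a string. Assuming that is always the case, a sketch such as the following could decode it in Python. Treating a blank value as "no tags" is an assumption, not documented behavior.

```python
import json

def parse_tags(raw: str):
    """Decode the tags column; return an empty list for blank values (assumption)."""
    return json.loads(raw) if raw and raw.strip() else []

# The example value from the table above.
raw = '[{"confidence":0.000000000000000e+00, "status":"confirmed", "tag":"personal_attack"}]'
tags = parse_tags(raw)

# Names of the tags whose status is "confirmed".
confirmed = [t["tag"] for t in tags if t["status"] == "confirmed"]
```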

User Moderations

This file provides information about user moderations including user bans, avatar moderations
and username moderations.

Fields information

Name | Description | Format
site_name | Name of the site (or property) | String
timestamp | Timestamp of the event that results in an entry | Timestamp
event_type | Event type. | String. Possible values: ban.user (user bans), user.moderate (avatar and username moderations)
content_type | The type of content to which interaction is related. | String. Possible values: username, avatar
actor_uuid | ID of the user or actor who caused the interaction. This could be a Moderator or an internal ID | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
event_uuid | Unique identifier of the event | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
interaction_status | Status of username/avatar | String. Possible values: approved, rejected

Poll Actions

This file provides information about management of polls and engagements in polls.

Fields information

Name | Description | Format
timestamp | Timestamp of the event that results in an entry | Timestamp, ex. '2020-10-13 13:28:06.340'
site_name | Name of the site (or property) | String, ex. https://www.somepage.com
container_uuid | UUID of container the content was in (livechat id, page id etc) | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
poll_uuid | Unique poll identifier | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
poll_title | Poll title, only available with management event_type | String
user_id | ID of the user or actor who caused the interaction. This could be a registered or anonymous User, a Moderator or an internal ID | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
event_uuid | Unique identifier of the event | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
event_type | Event type. | String. Possible values: engagement (votes), management
event_action | Event action. | String. Possible values when event_type is 'engagement': vote. Possible values when event_type is 'management': publish, close, delete
voter_picked_option_uuid | Poll option picked by voter, only available when event_type is 'engagement' | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
poll_options | Array containing poll options (option order, text, uuid and votes received), only available when event_type is 'management' | Array, ex. [ { "option": "Monday", "optionUuid": "00000000-0000-4000-8000-0368f9a80495", "order": 1, "votes": 0 }, { "option": "Tuesday", "optionUuid": "00000000-0000-4000-8000-0368f9a80496", "order": 2, "votes": 0 } ]
poll_close_timestamp | Timestamp of when poll should end (if applicable), only available when event_type is 'management' | Timestamp, ex. '2020-10-13 13:28:06'
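Since option text is only available on management events and the picked option is only available on engagement events, reporting votes by option text requires combining the two. The Python sketch below shows one way to do that. It assumes poll_options is a JSON array with the "option" and "optionUuid" keys seen in the example, and it uses shortened IDs in the sample rows.

```python
import json
from collections import Counter

def option_labels(poll_options_raw):
    """Map option UUID to option text from a management row's poll_options."""
    return {o["optionUuid"]: o["option"] for o in json.loads(poll_options_raw)}

def count_votes(rows):
    """Count 'vote' engagement events per (poll_uuid, option UUID)."""
    votes = Counter()
    for row in rows:
        if row["event_type"] == "engagement" and row["event_action"] == "vote":
            votes[(row["poll_uuid"], row["voter_picked_option_uuid"])] += 1
    return votes

# Illustrative rows containing only the columns used here.
rows = [
    {"event_type": "management", "event_action": "publish", "poll_uuid": "p1",
     "voter_picked_option_uuid": "",
     "poll_options": '[{"option": "Monday", "optionUuid": "o1", "order": 1, "votes": 0},'
                     ' {"option": "Tuesday", "optionUuid": "o2", "order": 2, "votes": 0}]'},
    {"event_type": "engagement", "event_action": "vote", "poll_uuid": "p1",
     "voter_picked_option_uuid": "o2", "poll_options": ""},
    {"event_type": "engagement", "event_action": "vote", "poll_uuid": "p1",
     "voter_picked_option_uuid": "o2", "poll_options": ""},
    {"event_type": "engagement", "event_action": "vote", "poll_uuid": "p1",
     "voter_picked_option_uuid": "o1", "poll_options": ""},
]

labels = option_labels(rows[0]["poll_options"])
votes = count_votes(rows)
votes_by_text = {labels[opt]: n for (poll, opt), n in votes.items() if poll == "p1"}
```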

Engage Time

This file provides information about users' time on site and time in comments.

Fields information

Name | Description | Format
day | Date of event | Timestamp, ex. '2020-10-13'
site_name | Name of the site (or property) | String, ex. https://www.somepage.com
actor_uuid | ID of the user or actor who caused the interaction. This could be a User, a Moderator or an internal ID | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
view_uniqueid | First party tracking cookie ID, unique identifier of the anonymous user who caused the interaction | UUID (Universally unique identifier), ex. 00000000-0000-4000-8000-0368f9a80495
time_on_page | Time user spent on site (in milliseconds) | Big Int, ex. 141192
time_in_comments | Time user spent in commenting widgets (in milliseconds) | Big Int, ex. 141192
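Both time fields are in milliseconds, so they usually need converting before reporting. The following Python sketch converts one row to seconds and computes the ratio of comment time to page time. Treating blank values as zero is an assumption, and this document does not state whether time in comments is always a subset of time on page, so the ratio is only indicative.

```python
def engagement_summary(row):
    """Convert one engage_time row from milliseconds to seconds and compute
    the ratio of time in commenting widgets to time on page."""
    on_page_ms = int(row["time_on_page"] or 0)        # blank treated as 0 (assumption)
    in_comments_ms = int(row["time_in_comments"] or 0)
    ratio = in_comments_ms / on_page_ms if on_page_ms else None
    return {
        "seconds_on_page": on_page_ms / 1000,
        "seconds_in_comments": in_comments_ms / 1000,
        "comment_ratio": ratio,
    }

# Values arrive as strings when read with a CSV parser.
summary = engagement_summary({"time_on_page": "141192", "time_in_comments": "70596"})
```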

Entity Relationship Diagram

The Entity Relationship Diagram (ERD) shows the relationships between the files.

[Figure: Entity Relationship Diagram of the files]

Frequently Asked Questions

How can we resolve user id to email addresses?

Both user_id and actor_uuid are internal to Viafoura. To map them to users in your system, you can use username or third_party_id in the user_information files, if available. We can also add a column for email address if required (sharing sensitive PII data should be avoided in most cases).

How do I generate a public key for data access through SFTP?

Please follow the instructions at
https://docs.aws.amazon.com/transfer/latest/userguide/key-management.html#sshkeygen.
You need to share the public key with us and use the private key for data access.

How do I read multi-line comments?

The files we share are .csv files, so the file should be opened/read as .csv in order to be displayed or processed correctly. For example, you could open the file with Microsoft Excel, or programmatically read the file using CSV parsers.
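As a sketch of the programmatic route, Python's built-in csv module keeps a line break inside a quoted field as part of that field rather than starting a new record. The sample below uses the delimiter and text qualifier listed under File Specifications. It assumes that quotes inside a comment are escaped by doubling them, which is the usual CSV convention but is not stated in this document.

```python
import csv
import io

# Two records; the line break after "Best wishes." sits inside a quoted
# payload_content value and does not start a new record.
sample = (
    'event_uuid,payload_content\n'
    'e1,"Best wishes.\nSee you soon, ""friend""."\n'
    'e2,"Single line"\n'
)

# newline="" leaves line breaks untouched so the csv module can handle them.
stream = io.StringIO(sample, newline="")
rows = list(csv.DictReader(stream, delimiter=",", quotechar='"'))
```

When reading from disk, the file can be opened the same way with open(path, newline=""). The file encoding is not specified in this document, so it may need to be set explicitly.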

Why do old articles show up in recent container_ids files?

Clients need to add comments/conversations code on a page for it to appear (see
https://documentation.viafoura.com/docs/new-conversations#step-2-add-the-conversations-code-to-your-page).
Comment containers could be missing if the code has not been deployed on a page. Container creation for comments is purely controlled by the client and we capture container creation events instantly. An old article can therefore appear in a recent container_ids file if the code was only recently deployed on its page.

Why do some actor_uuid values appear in the comments_content file but not in the user_information file on the same day?

The files are created based on different events as not all data fields are available in all the events, and they serve different purposes. The user_information file is only intended to provide additional user information such as username. If desired, one can create a table to store the latest user information per actor_uuid from all historical user_information files and use the table to look up user information.
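A lookup table of this kind could be sketched with Python's built-in sqlite3 module as shown below. The table name, the extra source_file column and the sample values are illustrative assumptions. Files should be loaded oldest first so that the most recent row for each actor_uuid wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent table
conn.execute(
    """CREATE TABLE users (
           actor_uuid TEXT PRIMARY KEY,
           user_id TEXT,
           third_party_id TEXT,
           username TEXT,
           source_file TEXT
       )"""
)

def load_user_file(conn, source_file, rows):
    """Insert one row per actor_uuid, replacing any earlier row for that actor."""
    conn.executemany(
        "INSERT OR REPLACE INTO users VALUES "
        "(:actor_uuid, :user_id, :third_party_id, :username, :source_file)",
        [{**row, "source_file": source_file} for row in rows],
    )
    conn.commit()

# Two days of illustrative data for the same actor (shortened IDs).
load_user_file(conn, "day_1", [
    {"actor_uuid": "a1", "user_id": "1", "third_party_id": "", "username": "Old Name"},
])
load_user_file(conn, "day_2", [
    {"actor_uuid": "a1", "user_id": "1", "third_party_id": "", "username": "New Name"},
])

latest = conn.execute(
    "SELECT username FROM users WHERE actor_uuid = ?", ("a1",)
).fetchone()[0]
```

Note that INSERT OR REPLACE overwrites the whole row, so a later row with a blank field would blank out an earlier value. Whether that is desirable depends on how the table is used.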

Why do some container_uuid values appear in the comments_content file but not in the container_ids file on the same day?

The files are created based on different events. Each day’s container_ids file only contains information on newly created or updated containers. One should be able to find the container_uuid if looking at all historical container_ids files as a whole. For example, if a user created a comment on 11 October in a comment container created on 10 October, the container’s id would appear in the comments content file for 11 October and container_ids file for 10 October.
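The 10 October and 11 October example can be reproduced with a small Python sketch. The function names and shortened IDs are illustrative. The idea is to build one index from all historical container_ids files and then check which container_uuid values in a comments file are still unresolved.

```python
def build_container_index(daily_container_rows):
    """Merge container_ids rows from many days (oldest first) into one lookup."""
    index = {}
    for rows in daily_container_rows:
        for row in rows:
            index[row["container_uuid"]] = row  # later days overwrite earlier ones
    return index

def unresolved_containers(comment_rows, index):
    """Return container_uuid values used by comments but missing from the index."""
    return {row["container_uuid"] for row in comment_rows} - set(index)

containers_oct_10 = [{"container_uuid": "c1", "site_name": "https://www.somepage.com"}]
containers_oct_11 = [{"container_uuid": "c2", "site_name": "https://www.somepage.com"}]
comments_oct_11 = [{"container_uuid": "c1", "action": "created"}]

# Using only the same day's container_ids file leaves c1 unresolved.
same_day_only = unresolved_containers(
    comments_oct_11, build_container_index([containers_oct_11])
)
# Using all historical files resolves it.
with_history = unresolved_containers(
    comments_oct_11, build_container_index([containers_oct_10, containers_oct_11])
)
```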

Why is there a difference between the number of comments from the file and the number on the website?

This could be attributed to a few different factors. First, it depends on how the number of comments is extracted from the files. Second, a comment could go through a few different moderation processes, and its visibility could change depending on both content status (created and awaiting moderation/visible/disabled/spammed/deleted) and user status (deleted/banned). User status could affect the visibility of both the user’s comments and replies to these comments. User status has never been considered in reporting the number of comments for analytics purposes, because it could change with time and counting the number of comments created is enough as a measure of user engagement.
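One way to count created comments from the comments content file is sketched below in Python. It counts rows whose action is "created" and separates main posts from replies using the "reply_to_" prefix of message_type. This is an illustrative counting rule, not necessarily the one used for the number shown on the website.

```python
from collections import Counter

def count_created(rows):
    """Count 'created' actions, split into main posts and replies."""
    counts = Counter()
    for row in rows:
        if row["action"] != "created":
            continue  # likes, flags, visibility changes etc. are not new comments
        kind = "reply" if row["message_type"].startswith("reply_to_") else "main"
        counts[kind] += 1
    return counts

# Illustrative rows containing only the columns used here.
rows = [
    {"action": "created", "message_type": "livecomment_post"},
    {"action": "created", "message_type": "reply_to_livecomment_post"},
    {"action": "liked", "message_type": "livecomment_post"},
    {"action": "created", "message_type": "chat_message"},
]
counts = count_created(rows)
```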