Data Export

Overview

This document describes the files shared in the Raw Data Export, which exports messages
published by users on Viafoura widgets, along with other information such as users and likes,
on a daily basis. The files are generated at approximately 6am UTC every day.

There are three main files shared by default:

  • comments_content
  • user_information
  • container_ids

There are a few optional files that we can provide upon request:

  • moderation_assessments
  • user_assessments
  • poll_actions
  • engage_time

This document is intended to give more context about the data contained in each column of
those files, and to provide information on how to access the data and connect information
across files.

Getting Access

We offer two standard options to enable access to raw data export in our S3 bucket:

  • S3 access via client IAM role: If the client provides us with an AWS IAM role, we
    can allow that role access to a client-specific S3 bucket in our data export account.
    Specifically, the role would have permissions to list objects in the bucket
    ("s3:ListBucket") and download each object ("s3:GetObject").
  • SFTP: The client can access the bucket via SFTP, in which case we require a
    public key from the machine(s) that will connect via SFTP.

More information would be shared with the client once everything has been set up on our end.
In most cases, daily files would be stored in the "RawData" folder and historical data would be
shared in the "HistoricalData" folder.
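
As an illustration of the two permissions named above, the following sketch builds an IAM policy document that grants only those actions. The bucket name `example-client-export` is a hypothetical placeholder, and where such a policy is attached (on the client role, on the bucket, or both) is an assumption not covered by this document; the actual details are shared once setup is complete.

```python
import json

def read_only_export_policy(bucket_name: str) -> dict:
    """Build an IAM policy document allowing list and download on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # s3:ListBucket applies to the bucket itself
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {   # s3:GetObject applies to the objects inside the bucket
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }

# Hypothetical bucket name, for illustration only.
policy = read_only_export_policy("example-client-export")
print(json.dumps(policy, indent=2))
```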

File Specifications

Daily files are generated from raw data events collected on the previous day, without
advanced processing. We recommend that clients create tables to store historical data and
update records as new data comes in, because 1) some information, such as username, can
change over time, and 2) some identifiers can appear on different days in different files,
making it hard to connect data across files using just one day’s data.

In most of the files, each line is an action, uniquely identified by the event_uuid field. Actions capture interactions performed by actors, and those actors might be users, moderators, applications or APIs (among others). Users refer to registered users unless stated otherwise.
All timestamps are in UTC.

Note that not all data events are captured, so not all information will be available in the files.
For example, an updated username only shows up in the user_information file if the user’s logged-in status changes.

Files are plain text extractions with the following characteristics:
● Format: CSV
● Delimiter: ","
● Text qualifier: """
● Contains header: True
● Encryption: None
● File name: Each file name would have a prefix based on file creation time in UTC in the
format of "YYYYMMDD_HHMMSS_" like "20231005_060126_"
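
The file name prefix can be turned back into a UTC timestamp, which is handy for ordering daily files. The sketch below assumes only the documented "YYYYMMDD_HHMMSS_" prefix; the rest of the file name used in the example is hypothetical.

```python
from datetime import datetime, timezone

PREFIX_LENGTH = len("YYYYMMDD_HHMMSS")  # 15 characters before the trailing "_"

def export_created_at(file_name: str) -> datetime:
    """Parse the UTC creation time from the 'YYYYMMDD_HHMMSS_' file name prefix."""
    stamp = file_name[:PREFIX_LENGTH]
    return datetime.strptime(stamp, "%Y%m%d_%H%M%S").replace(tzinfo=timezone.utc)

# The part after the prefix is a hypothetical example, not a documented name.
created = export_created_at("20231005_060126_comments_content.csv")
print(created.isoformat())  # 2023-10-05T06:01:26+00:00
```

Because the prefix is fixed-width and ordered from year down to second, sorting file names alphabetically also sorts them chronologically.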

Comments Content

This file contains all messages published by users, whether they are the main thread comment
(the message that started the sequence of replies) or a reply (a message linked to a main
comment), as well as other actions performed on the comments.

Fields information

Name

Description

Format

content_uuid

Id of the content item that was interacted on

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

container_uuid

Id of the widget where the action took place

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

payload_parent_uuid

The UUID of the post to which this is a reply, or the container_uuid if it is top level

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

actor_uuid

ID of the user or actor who caused the interaction. This
could be a User, a Moderator or an internal ID

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

event_uuid

Unique identifier of the event

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

site_name

Name of the site (or property)

String
ex. https://www.somepage.com

message_type

Defines the type of message where the action took place.

String

Main posts have these values:
livecomment_post
chat_message
liveblog_post
livereview_post

Reply posts have these values:
reply_to_livecomment_post
reply_to_chat_message
reply_to_liveblog_post
reply_to_livereview_post

action

Action that generated the entry.

String

There are several possible values and they are displayed in the Action table below.

payload_metadata_origin_url

Display the entire URL

String
ex. https://www.somepage.com

timestamp

Timestamp of the event that results in an entry

Timestamp
'2020-10-13 13:28:06.340'

payload_content

The text published by the user

String
ex. 'Best wishes.'
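
Since payload_parent_uuid equals container_uuid for top-level posts, threads can be rebuilt by comparing the two fields. The following is a minimal sketch using hypothetical rows with shortened IDs; it only looks at "created" actions, on the assumption that those rows carry the parent reference.

```python
from collections import defaultdict

def is_top_level(row: dict) -> bool:
    """A post is top level when its parent is the container itself."""
    return row["payload_parent_uuid"] == row["container_uuid"]

def replies_by_parent(rows):
    """Group reply content_uuids under the post they reply to."""
    children = defaultdict(set)
    for row in rows:
        if row["action"] == "created" and not is_top_level(row):
            children[row["payload_parent_uuid"]].add(row["content_uuid"])
    return children

# Hypothetical rows with shortened IDs for readability.
rows = [
    {"content_uuid": "c1", "container_uuid": "w1", "payload_parent_uuid": "w1", "action": "created"},
    {"content_uuid": "c2", "container_uuid": "w1", "payload_parent_uuid": "c1", "action": "created"},
    {"content_uuid": "c3", "container_uuid": "w1", "payload_parent_uuid": "c1", "action": "created"},
]
threads = replies_by_parent(rows)
print(sorted(threads["c1"]))  # ['c2', 'c3']
```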

List of actions available (not all may be in use)

Action         Meaning
pinned         when a post is pinned to the top of the thread
updated        when a post is edited
disliked       when a post is disliked or down voted by a user
created        when a post is created
picked         when a post has been selected, as an editor's pick
unliked        when a like to a post is reverted
disabled       when a message is removed from view and general public cannot see it anymore
unpicked       when a post has been removed, as an editor's pick
deleted        when a user has voluntarily deleted their own post
liked          when a post is liked
undisliked     when a dislike to a post is reverted
flags_cleared  when a moderator clears flags of a post
flagged        when a user flags a post
spammed        when a post is marked as spam
visible        when a post is visible to the general public (barring mutes, ghost bans)
unpinned       when a post is unpinned from top of a thread
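
Several of the actions listed above come in pairs that cancel each other out (liked/unliked, disliked/undisliked). One way to use them, sketched here with hypothetical rows, is to keep a running tally per content_uuid. Because not all data events are captured, the result should be read as an approximation rather than an exact count.

```python
from collections import Counter

# Each action adds to or subtracts from a running tally for its post.
LIKE_DELTAS = {"liked": 1, "unliked": -1}
DISLIKE_DELTAS = {"disliked": 1, "undisliked": -1}

def net_reactions(rows):
    """Return (likes, dislikes) Counters keyed by content_uuid."""
    likes, dislikes = Counter(), Counter()
    for row in rows:
        action = row["action"]
        if action in LIKE_DELTAS:
            likes[row["content_uuid"]] += LIKE_DELTAS[action]
        elif action in DISLIKE_DELTAS:
            dislikes[row["content_uuid"]] += DISLIKE_DELTAS[action]
    return likes, dislikes

# Hypothetical rows with shortened IDs.
rows = [
    {"content_uuid": "c1", "action": "liked"},
    {"content_uuid": "c1", "action": "liked"},
    {"content_uuid": "c1", "action": "unliked"},
    {"content_uuid": "c1", "action": "disliked"},
    {"content_uuid": "c2", "action": "created"},
]
likes, dislikes = net_reactions(rows)
print(likes["c1"], dislikes["c1"])  # 1 1
```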

User Information

This file provides additional information about the users that interacted with Viafoura widgets,
including username and third party id if available.

Fields information

Name

Description

Format

user_id

Id of the user that made the action

Big Int
ex. 8081300019356

actor_uuid

ID of the user or actor who caused the interaction. This could be a User, a Moderator or an internal ID

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

third_party_id

This is the actor ID when a user creates an account using a third-party
service, such as Google, Facebook or LoginRadius

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

username

Displays the username chosen by the user

String
'Some Name'

Container IDs

This file lists containers created or updated on a given day. A container is where comments
could be posted.

Fields information

Name

Description

Format

container_uuid

ID of the widget

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

site_name

Name of the site (or property)

String
ex. https://www.somepage.com

payload_container_id

Id of the widget in a different
format

Big Int
ex. 8081300019356

Moderation Assessments

This file provides information about moderation assessments on comments.

Fields information

Name

Description

Format

site_name

Name of the site (or property)

String
ex. https://www.somepage.com

assessment

Outcome of assessment.

String

Possible Values:
approved
deferred
rejected

assessment_type

Type of assessment.

String

Possible Values:
content_moderation
spam_detection
flag_moderation

content_container_uuid

UUID of container the content
was in (livechat id, page id etc)

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

content_source_type

Type of content being
moderated (see message_type
in Comments Content)

String

entity_uuid

UUID of content being
moderated

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

provider

The entity performing this
assessment.

String

Possible Values:
Human moderations have these
values:
console
human

Auto moderations have these
values:
automod_service
automod (deprecated)
keepcon (deprecated)

Auto spam detection has this
value:
spam_service

provider_decision

The decision made by the
assessment service before
settings were applied to it

String
ex. 'approved'

section_uuid

Unique site identifier

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

tags

Tags attached to the assessment

String

timestamp

Timestamp of the event that
results in an entry

Timestamp
'2020-10-13 13:28:06.340'

User Moderations

This file provides information about user moderations including user bans, avatar moderations
and username moderations.

Fields information

Name

Description

Format

site_name

Name of the site (or property)

String
ex. https://www.somepage.com

timestamp

Timestamp of the event that results in an entry

Timestamp
'2020-10-13 13:28:06.340'

event_type

Event type.

String

Possible Values:
ban.user (user bans)
user.moderate (avatar and
username moderations)

content_type

The type of content to which
interaction is related.

String

Possible Values:
username
avatar

actor_uuid

ID of the user or actor who caused the interaction. This
could be a Moderator or an internal ID

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

event_uuid

Unique identifier of the event

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

interaction_status

Status of username/avatar

String

Possible Values:
approved
rejected

Poll Actions

This file provides information about management of polls and engagements in polls.

Fields information

Name

Description

Format

timestamp

Timestamp of the event that results in an entry

Timestamp
'2020-10-13 13:28:06.340'

site_name

Name of the site (or property)

String
ex. https://www.somepage.com

container_uuid

UUID of container the content
was in (livechat id, page id etc)

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

poll_uuid

Unique poll identifier

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

poll_title

Poll title, only available with management event_type

String

user_id

ID of the user or actor who caused the interaction. This could be a registered or anonymous User, a Moderator or an internal ID

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

event_uuid

Unique identifier of the event

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

event_type

Event type.

String

Possible Values:
engagement (votes)
management

event_action

Event action.

String

Possible Values:
Possible values when
event_type is 'engagement':
vote

Possible values when
event_type is 'management':
publish
close
delete

voter_picked_option_uuid

Poll option picked by voter, only
available when event_type is
'engagement'

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

poll_options

Array containing poll options (option order, text, uuid and votes received), only available when event_type is 'management'

Array

[
  {
    "option": "Monday",
    "optionUuid": "00000000-0000-4000-8000-0368f9a80495",
    "order": 1,
    "votes": 0
  },
  {
    "option": "Tuesday",
    "optionUuid": "00000000-0000-4000-8000-0368f9a80496",
    "order": 2,
    "votes": 0
  }
]

poll_close_timestamp

Timestamp of when poll should end (if applicable), only available when event_type is 'management'

Timestamp
'2020-10-13 13:28:06'
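
To connect a vote to the option text, the poll_options array from a 'management' row can be matched against voter_picked_option_uuid from an 'engagement' row. This sketch assumes the column holds a JSON-encoded string with the key names shown in the example (option, optionUuid, order, votes); the actual encoding in the file may differ.

```python
import json

def option_text_by_uuid(poll_options: str) -> dict:
    """Map each optionUuid to its option text, in display order."""
    options = sorted(json.loads(poll_options), key=lambda o: o["order"])
    return {o["optionUuid"]: o["option"] for o in options}

# Sample value modelled on the poll_options example above.
poll_options = json.dumps([
    {"option": "Tuesday", "optionUuid": "00000000-0000-4000-8000-0368f9a80496", "order": 2, "votes": 0},
    {"option": "Monday", "optionUuid": "00000000-0000-4000-8000-0368f9a80495", "order": 1, "votes": 0},
])
lookup = option_text_by_uuid(poll_options)
picked = "00000000-0000-4000-8000-0368f9a80496"  # a voter_picked_option_uuid value
print(lookup[picked])  # Tuesday
```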

Engage Time

This file provides information about users' time on site and time in comments.

Fields information

Name

Description

Format

day

Date of event

Timestamp
'2020-10-13'

site_name

Name of the site (or property)

String
ex. https://www.somepage.com

actor_uuid

ID of the user or actor who caused the interaction. This could be a User, a Moderator or an internal ID

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

view_uniqueid

First party tracking cookie ID, unique identifier of the anonymous user who caused the interaction

UUID (Universally unique
identifier)
ex. 00000000-0000-4000-8000-0368f9a80495

time_on_page

Time user spent on site (in milliseconds)

Big Int
ex. 141192

time_in_comments

Time user spent in commenting
widgets (in milliseconds)

Big Int
ex. 141192
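
Both time fields are in milliseconds, so they usually need converting before reporting. The sketch below sums them per day and actor_uuid and converts to seconds. The "comments_share" ratio is an illustrative metric, not one defined by this document, and the rows are hypothetical.

```python
from collections import defaultdict

def engagement_summary(rows):
    """Sum time per (day, actor_uuid) and report the share spent in comments."""
    totals = defaultdict(lambda: [0, 0])  # [time_on_page_ms, time_in_comments_ms]
    for row in rows:
        key = (row["day"], row["actor_uuid"])
        totals[key][0] += int(row["time_on_page"])      # CSV values arrive as strings
        totals[key][1] += int(row["time_in_comments"])
    summary = {}
    for key, (page_ms, comments_ms) in totals.items():
        summary[key] = {
            "seconds_on_page": page_ms / 1000,
            "seconds_in_comments": comments_ms / 1000,
            "comments_share": comments_ms / page_ms if page_ms else 0.0,
        }
    return summary

# Hypothetical rows with shortened IDs.
rows = [
    {"day": "2020-10-13", "actor_uuid": "a1", "time_on_page": "100000", "time_in_comments": "25000"},
    {"day": "2020-10-13", "actor_uuid": "a1", "time_on_page": "100000", "time_in_comments": "75000"},
]
summary = engagement_summary(rows)
print(summary[("2020-10-13", "a1")]["seconds_on_page"])  # 200.0
```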

Entity Relationship Diagram

The Entity Relationship Diagram (ERD) shows how the files relate to one another.

Frequently Asked Questions

How can we resolve user id to email addresses?

Both user_id and actor_uuid are internal to Viafoura. To map to users in your system, you can use username or third_party_id in user information files if available. We can also add a column for email address if required (sharing sensitive PII data should be avoided in most cases).

How do I generate a public key for data access through SFTP?

Please follow instructions in
https://docs.aws.amazon.com/transfer/latest/userguide/key-management.html#sshkeygen.
You need to share the public key with us and use the private key for data access.

How do I read multi-line comments?

The files we share are .csv files, so the file should be opened/read as .csv in order to be displayed or processed correctly. For example, you could open the file with Microsoft Excel, or programmatically read the file using CSV parsers.
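
As a minimal sketch of the programmatic route, Python's standard csv module keeps a quoted field together even when it spans several lines. The sample below is an in-memory stand-in for a real file, with hypothetical values.

```python
import csv
import io

# A small in-memory sample; the quoted field in the first row contains a line break.
sample = 'event_uuid,payload_content\r\ne1,"First line\nSecond line"\r\ne2,Best wishes.\r\n'

# With a real file, open it with newline="" so the csv module handles
# line breaks itself, e.g. open(path, newline="").
reader = csv.DictReader(io.StringIO(sample, newline=""), delimiter=",", quotechar='"')
rows = list(reader)

print(len(rows))  # 2
print(repr(rows[0]["payload_content"]))  # 'First line\nSecond line'
```

Reading the same text line by line would instead split the first comment into two broken records.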

Why do old articles show up in recent container_ids files?

Clients need to add comments/conversations code on a page for it to appear (see
https://documentation.viafoura.com/docs/new-conversations#step-2-add-the-conversations-code-to-your-page).
Comment containers could be missing if the code has not been deployed on a page. Container creation for comments is purely controlled by the client and we capture container creation events instantly.

Why do some actor_uuid appear in comments content file but not in user_information file on the same day?

The files are created based on different events as not all data fields are available in all the events, and they serve different purposes. The user_information file is only intended to provide additional user information such as username. If desired, one can create a table to store the latest user information per actor_uuid from all historical user_information files and use the table to look up user information.
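
The lookup table described above could be built roughly as follows. This sketch assumes rows have already been read from each daily user_information file, and relies on the file name prefix to decide which record is newer. File names and values are hypothetical.

```python
def latest_user_info(rows_by_file: dict) -> dict:
    """Keep the most recent user_information row seen for each actor_uuid."""
    latest = {}
    # The 'YYYYMMDD_HHMMSS_' prefix makes file names sort chronologically,
    # so rows from later files overwrite rows from earlier ones.
    for file_name in sorted(rows_by_file):
        for row in rows_by_file[file_name]:
            latest[row["actor_uuid"]] = row
    return latest

# Hypothetical file names and rows with shortened IDs.
rows_by_file = {
    "20231006_060110_user_information.csv": [
        {"actor_uuid": "a1", "username": "New Name"},
    ],
    "20231005_060126_user_information.csv": [
        {"actor_uuid": "a1", "username": "Old Name"},
        {"actor_uuid": "a2", "username": "Some Name"},
    ],
}
users = latest_user_info(rows_by_file)
print(users["a1"]["username"])  # New Name
```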

Why do some container_uuid appear in comments_content file but not in container_ids file on the same day?

The files are created based on different events. Each day’s container_ids file only contains information on newly created or updated containers. One should be able to find the container_uuid if looking at all historical container_ids files as a whole. For example, if a user created a comment on 11 October in a comment container created on 10 October, the container’s id would appear in the comments content file for 11 October and container_ids file for 10 October.
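
The 10 and 11 October example can be reproduced with a small check that compares container_uuid values in a comments file against one or more container_ids files. The data structures and shortened IDs here are hypothetical.

```python
def unresolved_containers(comment_rows, container_rows_by_day):
    """Return container_uuids seen in comments but absent from the given container_ids files."""
    known = set()
    for rows in container_rows_by_day.values():
        known.update(row["container_uuid"] for row in rows)
    return {row["container_uuid"] for row in comment_rows} - known

# Hypothetical data: container w1 was created on 10 October, commented on 11 October.
container_rows_by_day = {
    "2023-10-10": [{"container_uuid": "w1"}],
    "2023-10-11": [{"container_uuid": "w2"}],
}
comments_oct_11 = [{"container_uuid": "w1"}]

same_day_only = unresolved_containers(
    comments_oct_11, {"2023-10-11": container_rows_by_day["2023-10-11"]}
)
all_history = unresolved_containers(comments_oct_11, container_rows_by_day)
print(same_day_only, all_history)  # {'w1'} set()
```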

Why is there a difference between the number of comments from the file and the number on the website?

This could be attributed to a few different factors. First of all, it depends on how the number of comments is extracted from the files. Secondly, a comment could go through a few different moderation processes and its visibility could change depending on both content status (created and awaiting moderation/visible/disabled/spammed/deleted) and user status (deleted/banned). User status could affect the visibility of both the user’s comments and replies to these comments. User status has never been considered when reporting the number of comments for analytics purposes, because it can change over time and counting the number of comments created is sufficient as a measure of user engagement.
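
To see how the two numbers can diverge, the sketch below counts comments in two ways from comments_content rows: posts with a "created" action, and posts whose most recent status action is "visible". The choice of status actions and the reliance on timestamps sorting as fixed-width text are assumptions, and user status (deleted/banned) is not modelled. The rows are hypothetical.

```python
STATUS_ACTIONS = {"created", "visible", "disabled", "spammed", "deleted"}

def count_created_comments(rows) -> int:
    """Count distinct posts that have a 'created' action."""
    return len({row["content_uuid"] for row in rows if row["action"] == "created"})

def latest_status(rows) -> dict:
    """Return the most recent status-related action per content_uuid."""
    latest = {}
    # Timestamps like '2020-10-13 13:28:06.340' sort correctly as plain text.
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        if row["action"] in STATUS_ACTIONS:
            latest[row["content_uuid"]] = row["action"]
    return latest

# Hypothetical rows with shortened IDs.
rows = [
    {"content_uuid": "c1", "action": "created", "timestamp": "2020-10-13 13:28:06.340"},
    {"content_uuid": "c1", "action": "visible", "timestamp": "2020-10-13 13:28:07.100"},
    {"content_uuid": "c2", "action": "created", "timestamp": "2020-10-13 14:00:00.000"},
    {"content_uuid": "c2", "action": "disabled", "timestamp": "2020-10-13 14:05:00.000"},
    {"content_uuid": "c1", "action": "liked", "timestamp": "2020-10-13 15:00:00.000"},
]
created_count = count_created_comments(rows)
status = latest_status(rows)
visible_count = sum(1 for action in status.values() if action == "visible")
print(created_count, visible_count)  # 2 1
```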