Notes on “Sociophysics, an Introduction”

Sociophysics (Parongama Sen, Bikas K. Chakrabarti – 2013)

These are my notes as I was reading the book, which I found to be a very good overview with good detail that didn’t get in the way of the narrative. The references are stellar. When I found an appropriate paper mentioned in the text, I’ve included it as a link, usually with an accompanying abstract.

I read the book to support the model I’m working on for my PhD on trustworthy news. I’ve been doing agent-based simulations since the ’90s, when I was working on my Master’s thesis on The Coevolution of Weapons and Aggression. I certainly feel as though it has helped update my awareness of progress in the field since that effort, back when the term sociophysics didn’t even exist.

  • Chapter 2: Basic features of social systems and modelling
    • Minority Opinion Spreading in Random Geometry
      • Abstract: The dynamics of spreading of the minority opinion in public debates (a reform proposal, a behavior change, a military retaliation) is studied using a diffusion reaction model. People move by discrete step on a landscape of random geometry shaped by social life (offices, houses, bars, and restaurants). A perfect world is considered with no advantage to the minority. A one person-one argument principle is applied to determine locally individual mind changes. In case of equality, a collective doubt is evoked which in turn favors the Status Quo. Starting from a large in favor of the proposal initial majority, repeated random size local discussions are found to drive the majority reversal along the minority hostile view. Total opinion refusal is completed within few days. Recent national collective issues are revisited. The model may apply to rumor and fear propagation.
      • Clustering coefficient (video)
        CC = 0
        numNodes = 0
        for(i = 0 to max)
        	for(j = 0 to max)
        		n = node(i,j)
        		k = n.numNeighbors()
        		a = n.numLinksBetweenNeighbors()
        		if(k > 1)
        			CC += 2*a / (k*(k-1))	// local clustering coefficient of n
        		numNodes++
        CC = CC/numNodes
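A runnable version of the pseudocode above, as a minimal Python sketch (the graph and node labels are illustrative): the average of each node’s local clustering coefficient, with adjacency stored as node -> set of neighbors.

```python
def clustering_coefficient(adj):
    """Average local clustering coefficient; adj maps node -> set of neighbors."""
    total = 0.0
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue  # local CC is taken as 0 for degree < 2
        # count the links that actually exist between pairs of neighbors
        links = sum(1 for a in neighbors for b in neighbors
                    if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

# a triangle (1-2-3) with a pendant node 4 hanging off node 3
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
```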
      • Clustering coefficient ordering: random -> small world -> regular
      • To build a scale-free network, AL Barabási, R Albert in Emergence of scaling in random networks start with a small random network and incrementally add nodes where the probability of connecting a new node with existing nodes is proportional to how many connections the current nodes have.
        for(i = 0 to desired)
        	n = createNewNode()
        	totalLinks = countAllLinks()
        	for(j = 0 to network.numNodes)
        		curNode = getNode(j)
        		links = curNode.numLinks()
        		probability = links/totalLinks	// preferential attachment
        		curNode.addNeighbor(n, probability)
        	network.addNode(n)
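A runnable sketch of the same scheme, assuming the common variant where each new node attaches exactly m links; keeping a degree-weighted list of link endpoints makes degree-proportional choice cheap (a standard trick, not something from the book):

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a scale-free network to n nodes, m links per new node."""
    rng = random.Random(seed)
    edges = []
    pool = []  # each node appears once per link endpoint => degree-weighted
    core = range(m + 1)  # small fully connected starting network
    for i in core:
        for j in core:
            if i < j:
                edges.append((i, j))
                pool += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))  # degree-proportional choice
        for t in targets:
            edges.append((new, t))
            pool += [new, t]
    return edges

net = barabasi_albert(50, 2, seed=1)
```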
      • Does node aging matter in this model?
      • Null Models For Social Networks (for comparison and testing)
      • Downloaded the following from the references section to my Group Polarization folder
      • A bubble could be an example of a strong community [pg 17]; I would need to figure out a way of establishing in- and out-links in knowledge space
      • Benchmark networks to test community detection algorithms [pg 17]. Artificially generated and the Zachary Karate club
      • I appear to be working with (maybe?) class ‘C’ social networks, where links connect people indirectly [pg 19]. Covered in Chapter 7 – Of Flocks, Flows and Transports
      • Page 25 discusses Marian Boguña et al Models of Social Networks based on Social Distance Attachment which uses the concept of social distance. A set of quantities (e.g. profession, religion, location) are used and the social distance between two individuals is the difference in the quantities.
      • More state-space simulation from page 28: Spin-glass-like Dynamics of Social Networks. Digging around uncovered her thesis: Information and Entropy in Neural Networks and Interacting Systems. From the abstract:
        • Like neural networks, large ensembles of similar units that interact also need a generalization of classical information-theoretic concepts. We extend the concept of Shannon entropy in a novel way, which may be relevant when we have such interacting systems, and show how it differs from Shannon entropy and other generalizations, such as Tsallis entropy.
      • Mean Field Approximation – In physics and probability theory, mean field theory (MFT also known as self-consistent field theory) studies the behavior of large and complex stochastic models by studying a simpler model. Such models consider a large number of small individual components which interact with each other. The effect of all the other individuals on any given individual is approximated by a single averaged effect, thus reducing a many-body problem to a one-body problem.
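As an illustration (a standard textbook example, not from this passage): in the mean-field Ising model, each spin’s z neighbors are replaced by their average magnetization m, which gives the self-consistency condition m = tanh(βJzm), solvable by fixed-point iteration.

```python
import math

def mean_field_magnetization(beta, J=1.0, z=4, iters=1000):
    """Iterate the mean-field self-consistency condition m = tanh(beta*J*z*m)."""
    m = 0.5  # small ordered initial guess
    for _ in range(iters):
        m = math.tanh(beta * J * z * m)
    return m
```

Below the mean-field critical point beta*J*z = 1 the only solution is m = 0 (disorder); above it a nonzero magnetization (consensus, in opinion-model language) appears.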
    • Chapter 3: Opinion formation in a society
    • Chapter 4: Social choices and popularity – skimmed, not appropriate
    • Chapter 5: Crowd-avoiding dynamical phenomena – skimmed, not appropriate
    • Chapter 6: Social phenomena on complex networks
      • Claudio Castellano (Google Scholar)
      • Loops of nodes behave differently from trees. What to do about that? I think loops drive the echo chamber process? It is, after all, feedback.
      • There is also a ‘freezing’ issue, where a stable state is reached where two cliques containing different states are lightly connected, but not enough that the neighbors in one clique can be convinced to change their opinion [Fig. 6.2, pg 135]
      • Residual Energy: The difference between the actual energy and the known energy of the perfectly-ordered ground state (full consensus).
      • Dynamical Processes on Complex Networks. Got the Kindle edition so now I can search! Interesting section: 10.6 Coevolution of opinions and network
      • Similar chapter in this book – Social Phenomena on coevolutionary networks [pg 166]. One of the interesting things here is the use of the iterated prisoner’s dilemma. On a network, the agents typically calculate and aggregate payoff and imitate the strategy of the neighbor with the best payoff. In the coevolutionary model, an agent can cut off the link to a defector with a probability. This seems a bit like polarization, where the group severs ties with entities with sufficiently divergent views (and individuals leave when the group becomes too extreme)
      • Coevolution of agents and networks: Opinion spreading and community disconnection Abstract: We study a stochastic model for the coevolution of a process of opinion formation in a population of agents and the network which underlies their interaction. Interaction links can break when agents fail to reach an opinion agreement. The structure of the network and the distribution of opinions over the population evolve towards a state where the population is divided into disconnected communities whose agents share the same opinion. The statistical properties of this final state vary considerably as the model parameters are changed. Community sizes and their internal connectivity are the quantities used to characterize such variations.
      • Opinion and community formation in coevolving networks (Gerardo Iñiguez González)
        • Abstract: In human societies opinion formation is mediated by social interactions, consequently taking place on a network of relationships and at the same time influencing the structure of the network and its evolution. To investigate this coevolution of opinions and social interaction structure we develop a dynamic agent-based network model, by taking into account short range interactions like discussions between individuals, long range interactions like a sense for overall mood modulated by the attitudes of individuals, and external field corresponding to outside influence. Moreover, individual biases can be naturally taken into account. In addition the model includes the opinion dependent link-rewiring scheme to describe network topology coevolution with a slower time scale than that of the opinion formation. With this model comprehensive numerical simulations and mean field calculations have been carried out and they show the importance of the separation between fast and slow time scales resulting in the network to organize as well-connected small communities of agents with the same opinion.
        • Citing paper: Effects of deception in social networks (Gerardo Iñiguez González)<— Important???
          • Abstract: Honesty plays a crucial role in any situation where organisms exchange information or resources. Dishonesty can thus be expected to have damaging effects on social coherence if agents cannot trust the information or goods they receive. However, a distinction is often drawn between prosocial lies (‘white’ lies) and antisocial lying (i.e. deception for personal gain), with the former being considered much less destructive than the latter. We use an agent-based model to show that antisocial lying causes social networks to become increasingly fragmented. Antisocial dishonesty thus places strong constraints on the size and cohesion of social communities, providing a major hurdle that organisms have to overcome (e.g. by evolving counter-deception strategies) in order to evolve large, socially cohesive communities. In contrast, white lies can prove to be beneficial in smoothing the flow of interactions and facilitating a larger, more integrated network. Our results demonstrate that these group-level effects can arise as emergent properties of interactions at the dyadic level. The balance between prosocial and antisocial lies may set constraints on the structure of social networks, and hence the shape of society as a whole.
      • Section 6.5: Is it really a small world? Searching post Milgram
        • In the introduction to this section [page 168], the authors say a very interesting thing: “Although the network may have the small world property, searches are usually done locally: the individual may not know the global structure of the network that would help them find the shortest path to the target node”. I think that they are talking about social networks explicitly here, but the same concept applies to an information network. This is a network description of the information horizon problem. You can’t find what you can’t see, at least in a broad outline.
        • Also this: “Searching can be regarded as a learning process; repeating the search several times can avoid infinite loops and lead to better solutions”
        • 6.5.8 Funneling properties.
            • The funneling capability of a node can be defined as the fraction of successful dynamic paths through it when the target is fixed and the source is varied. Two thoughts: First, this seems to be a measurement of centrality. Second, large, vague nodes are needed for ‘laundering’ information into misinformation or conspiracy theory.
            • Consider four agents whose characteristics can vary between (0, 1).
              • Agent 1 has two color intensities: R=0.1, G= 0.7
              • Agent 2 has one color and two note volumes R=0.3, A=0.2, F=0.6
              • Agent 3 also has one color and two note volumes B=0.4, D=1, E=0.2
              • Agent 4 has three notes A=0.3, D=0.4, E=0.5
            • Let’s assume that funneling is not required if agents share a color or note. This means that A4 can get to A1 through A2, but A3 has to get to A1 via A4 and then A2. In a matrix this looks like:
                    R    G    B    A    D    E    F
          Agent1   0.1  0.7
          Agent2   0.3            0.2            0.6
          Agent3             0.4       1.0  0.2
          Agent4                  0.3  0.4  0.5
            • But if we add the hypernyms Color and Notes, we can get funneling. I am summing the colors and notes to give a sense of the agent’s ‘projection’ into the larger, more general space. I think the ‘size’ of a funnel is the number of items that go into it times the range of each item. So Color would have a range of (0, 3) and Notes would have a range of (0, 4), since I’m not including B, C, and G here:
                    R    G    B    A    D    E    F    Color  Notes
          Agent1   0.1  0.7                             0.8
          Agent2   0.3            0.2            0.6    0.3    0.8
          Agent3             0.4       1.0  0.2         0.4    1.2
          Agent4                  0.3  0.4  0.5                1.2
            • Now agents 2 and 3 can get to each other through either Color or Notes in two hops, and Agents 1 and 4 can reach each other by going through each of the funnels.
            • There should be a cost to using a funnel, though. You lose the information about which color or which note. Intuitively, a series of steps over non-funnel links should be somehow more specific than the same number of steps through a funnel.
            • Practical uses would be a way to detect poorly reasoned conclusions, as long as the beginning and end of the train of thought could be identified.
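The two matrices above can be sketched in code. Attribute values are taken from the example; `linked` treats any shared attribute (including a hypernym) as a direct link:

```python
agents = {
    1: {'R': 0.1, 'G': 0.7},
    2: {'R': 0.3, 'A': 0.2, 'F': 0.6},
    3: {'B': 0.4, 'D': 1.0, 'E': 0.2},
    4: {'A': 0.3, 'D': 0.4, 'E': 0.5},
}
COLORS = {'R', 'G', 'B'}
NOTES = {'A', 'D', 'E', 'F'}

def linked(a, b):
    """Direct link if the two agents share any attribute."""
    return bool(set(agents[a]) & set(agents[b]))

def add_hypernyms():
    """Project each agent into the Color and Notes funnels by summing."""
    for attrs in agents.values():
        c = sum(v for k, v in attrs.items() if k in COLORS)
        n = sum(v for k, v in attrs.items() if k in NOTES)
        if c:
            attrs['Color'] = c
        if n:
            attrs['Notes'] = n
```

Before `add_hypernyms()`, agents 2 and 3 share nothing; afterward they meet in both funnels, while agents 1 and 4 still have no direct link and must route through an agent that sits in both funnels.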
      • Knowing a network by walking on it: emergence of scaling (Alexei Vázquez) Looks like an interesting guy with a wide range of publications.
    • Chapter 7: Of Flocks, Flows and Transports [page 179]
      • Boids (Flocks, herds and schools: A distributed behavioral model – Craig Reynolds):
        • Try to avoid collisions with other boids (repulsion)
        • Attempt to match velocity with neighboring boids
        • Attempt to stay close to nearby boids
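A minimal 2-D sketch of those three rules; the weights and neighbor radius are illustrative guesses of mine, not Reynolds’ values:

```python
def step(boids, r=5.0, w_sep=0.05, w_ali=0.05, w_coh=0.01):
    """boids: list of [x, y, vx, vy]; returns updated copies after one tick."""
    out = []
    for i, (x, y, vx, vy) in enumerate(boids):
        nbrs = [b for j, b in enumerate(boids) if j != i
                and (b[0] - x) ** 2 + (b[1] - y) ** 2 < r * r]
        if nbrs:
            cx = sum(b[0] for b in nbrs) / len(nbrs)  # neighborhood centroid
            cy = sum(b[1] for b in nbrs) / len(nbrs)
            ax = sum(b[2] for b in nbrs) / len(nbrs)  # average velocity
            ay = sum(b[3] for b in nbrs) / len(nbrs)
            # cohesion toward centroid, alignment toward mean velocity,
            # separation away from each neighbor
            vx += w_coh * (cx - x) + w_ali * (ax - vx) + w_sep * sum(x - b[0] for b in nbrs)
            vy += w_coh * (cy - y) + w_ali * (ay - vy) + w_sep * sum(y - b[1] for b in nbrs)
        out.append([x + vx, y + vy, vx, vy])
    return out
```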
      • If the collision avoidance is taken out and the number of dimensions increased, then this could be the model. Rather than the flock converging around a position, look at the distances between the individuals using DBSCAN and cluster.
      • Density and noise need to be independent variables and saved on runs. This would also be true in information space. You can have high organization in high density, low noise states. Thinking about that, this also implies one of the emergent properties of an information bubble is the low noise. Even though the environment may be very noisy, the bubble isn’t.
      • As with the other social models, individuals can have weight. That way the flock can have leaders and followers. (See Misinformed leaders lose influence over pigeon flocks to inform the model)
      • Also, I like the idea of a social network being built from belief proximity, which raises the cost for switching to another flock, even if they are nearby. It could be that once a social network forms that anti-belief repulsion starts to play a role.
      • Another component to include would be a Lévy flight (truncated?). That could account for cases where a leader makes a big jump and then the crowd follows, with some ejection of those who can’t/won’t keep up.
      • Power law distribution of weight and max step size in the creation of the population
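A truncated Lévy step could be drawn by inverse-transform sampling of a bounded power law p(l) ∝ l^(−μ); the exponent and cutoffs below are placeholders, not values from the notes:

```python
import random

def levy_step(mu=2.0, l_min=1.0, l_max=100.0, rng=random):
    """Sample a step length from a power law truncated to [l_min, l_max]."""
    u = rng.random()
    a = l_min ** (1.0 - mu)
    b = l_max ** (1.0 - mu)
    # inverse CDF of p(l) ~ l**(-mu) restricted to [l_min, l_max]
    return (a + u * (b - a)) ** (1.0 / (1.0 - mu))

steps = [levy_step() for _ in range(1000)]
```

The same sampler, applied once at setup, would also serve for the power-law distribution of agent weights and maximum step sizes.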
      • Thomas Schelling (Another Herbert Simon type) Segregation Model
      • Phase diagram of a Schelling segregation model (L Gauvin, J Vannimenus, JP Nadal – The European Physical Journal B, 2009). I’m beginning to think that the model could be a combination of a flocking and segregation model. That could be really interesting. I also seem to get nothing when I do a Scholar search on “flocking and segregation agent simulation”
        • Satisfaction criteria – when the number of unlike agents is less than a fixed proportion F. As F gets larger there is an abrupt transition to a segregated state.
        • Definition of segregation coefficient – the weighted average (normalized) of all cluster sizes averaged over all configurations. When only two clusters survive, n(c) = N/2
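A toy version of the satisfaction criterion on a 1-D ring; the window size, F, and the pairwise-swap rule are my simplifications, not the Gauvin et al. model:

```python
import random

def unsatisfied(grid, i, F, w=2):
    """True when the fraction of unlike agents in the window is at least F."""
    n = len(grid)
    nbrs = [grid[(i + d) % n] for d in range(-w, w + 1) if d != 0]
    unlike = sum(1 for t in nbrs if t != grid[i])
    return unlike / len(nbrs) >= F

def schelling_step(grid, F, rng):
    """Pair up unsatisfied agents at random and swap their positions."""
    bad = [i for i in range(len(grid)) if unsatisfied(grid, i, F)]
    rng.shuffle(bad)
    for i, j in zip(bad[::2], bad[1::2]):
        grid[i], grid[j] = grid[j], grid[i]
    return grid
```

Swaps conserve the two populations, so sweeping F and watching cluster sizes is where the abrupt transition to segregation would show up.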
      • Migration in a small world: A network approach to modeling immigration processes (B Fotouhi, MG Rabbat – Communication, Control, and Computing, 2012)
    • Chapter 8: Endnote [page 202]
      • Frustration in Complexity (Philippe Binder, 2008) – The common thread between all complex systems may not be cooperation but rather the irresolvable coexistence of opposing tendencies.
      • Definition of consensus in an opinion model – the emergence of long-range order.
      • Looking for phase changes from heterogeneous to homogeneous or clustered states is important. Identifying which parameters are causal, and at what values, is considered a publishable result. Canonical types of transitions, such as the percolation threshold, are discussed in the appendices.

Trustworthy News Model Assumptions


  • 12.13.16: Initial post
  • 12.16.16: Added reference to proposal and explicitly discussed explorer and exploiter types.

A web version of my Google Docs dissertation proposal is here. Blame them for the formatting issues. The section this is building on is Section 5.3.1. A standalone description of this task is here.

The first part of my dissertation work is to develop an agent-based simulation that exhibits information bubble/antibubble behavior. Using Sen and Chakrabarti’s Sociophysics as my guide, I’m working up the specifics of the model. My framework is an application (JavaFX, because that’s what I’m using at work these days). It’s basically an empty framework with a trivial model that allows clustering based on similar attributes such as color.

Going forward, I need to clarify and defend the model, so I’m going to be listing the components here.

Agent assumptions

  • Agents get their information from global sources (news media). They have equal access, but visibility is restricted
    • Agents are Explorers or Exploiters (Which may be made up of Confirmers and Avoiders)
    • Agents have ‘budgets’ that they can allocate
    • Finding sources has a cost. Sources from the social network have a lower cost to access
    • Keeping a source is cheaper than getting a new one
    • For explorers, the cost of getting a new source is lower than it is for exploiters.
    • The ‘belief’ as a set of ‘statements’ appears to be valid
    • The collection of statements and the associated values creates a position in an n-dimensional Hilbert space of information. Position and velocity should be calculable.
    • Start at one dimension to reproduce prior opinion models
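A sketch of the last two assumptions; the class and method names are placeholders of mine, not from the proposal:

```python
class Agent:
    def __init__(self, belief):
        self.belief = list(belief)            # position in statement space
        self.velocity = [0.0] * len(belief)   # change per tick

    def update(self, new_belief):
        """Record the new position and the displacement since last tick."""
        self.velocity = [b - a for a, b in zip(self.belief, new_belief)]
        self.belief = list(new_belief)

a = Agent([0.2, 0.5])   # one agent, two statements
a.update([0.3, 0.4])    # velocity is now approximately [0.1, -0.1]
```

Setting the belief vector to length one reduces this to the prior one-dimensional opinion models.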

Network assumptions

  • There are two items that we are looking for.
    • The first is the network configuration over time. What nodes do agents connect to for their information?
    • The second is the content of that information. For that, we’ll probably need some dimensionality reduction, such as NMF (look for a post on implementing this later). This is where we look for echo chambers of information, as opposed to the agents participating in them
  • Adjustable to include scale-free, small world, and null configurations
  • What about loops? Feedback could be interesting, since a small group that is semi-isolated could form into a very loud bubble that could lower the cost of finding information. So a notion of volume might be needed that emerges from a set of agreeing agents. This could be attraction, though I think I like an economic approach more?
  • There is also a ‘freezing’ issue, where a stable state is reached where two cliques containing different states are lightly connected, but not enough that the neighbors in one clique can be convinced to change their opinion [Fig. 6.2, pg 135]
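For the dimensionality-reduction step mentioned above, a minimal NMF via Lee–Seung multiplicative updates; in practice a library implementation (e.g. scikit-learn’s NMF) would be the sensible choice, so this is just a sketch:

```python
import numpy as np

def nmf(V, k, iters=300, eps=1e-9, seed=0):
    """Factor nonnegative V (n x m) into W (n x k) @ H (k x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update component rows
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update agent weights
    return W, H

V = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy agent-content matrix
W, H = nmf(V, 2)
```

Rows of H would be the candidate “echo chamber” content components; rows of W say how strongly each agent participates in them.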


  • Residual Energy: The difference between the actual energy and the known energy of the perfectly-ordered ground state (full consensus).
  • Deviation from null network.
  • Clustering as per community detection (Girvan et al.)

Implementation details

  • Able to be run multiple times with the same configuration but different seed
  • Outputs to… something. MySQL or Excel, probably
  • Visualization using t-SNE? Description plus Java implementation is here:

More to come as the model fleshes out.


Natural Algorithms and Influence Systems


Bernard Chazelle (acm) (google)

Eugene Higgins Professor of Computer Science
Princeton University

If you’re going to remember one(?) thing:

Think of influence systems as a brand of networks that perpetually rewire themselves emergently – a broad family of multiagent models arising in social dynamics. The idea is to reach beyond numerical simulation to analyze the structure and interactions of biological systems. This article focuses on how algorithmic ideas can enrich our understanding of nature. Examples of influence systems include the Ising model, neural nets, Bayesian social learning, protein-protein interaction networks, population dynamics, etc.


By “natural algorithms,” I mean the myriad of algorithmic processes evolved by nature over millions of years. Just as differential equations have given us the tools to explain much of the physical world, so natural algorithms will help us model the living world and make sense of it.

Natural algorithms are quickly becoming the language of choice to model biological and social processes. And so algorithms, broadly construed, are both science and engineering.

Instead of identical particles subject to the same forces, the life sciences feature autonomous agents, each one with its own idea of what laws to obey. It is a long way, scientifically speaking, from planets orbiting the sun in orderly fashion to unruly slime molds farming bacterial crops.

What is complexity? Such is the appeal of the word “complexity” that it comes in at least four distinct flavors.

    • Semantic: What is hard to understand. For 99.99% of mankind, complex means complicated.
    • Epistemological: What is hard to predict. An independent notion altogether: complex chaotic systems can be simple to understand while complicated mechanisms can be easy to predict.
    • Instrumental: What is hard to compute, the province of theoretical computer science.
    • Linguistic: What is hard to describe. Physics has low descriptive complexity; that’s part of its magic. By contrast, merely specifying a natural algorithm may require an arbitrarily large number of variables to model the diversity present in the system. To capture this type of complexity is a distinctive feature of natural algorithms.

A few words about our agent-based approach. Consider the diffusion of pollen particles suspended in water. A Eulerian approach to this process seeks a differential equation for the concentration c(x, t) of particles at any point x and time t. There are no agents, just density functions evolving over time [18]. An alternative approach, called Lagrangian, would track the movement of all the individual particles and water molecules by appealing to Newton’s laws. Given the sheer number of agents, this line of attack crashes against a wall of intractability. One way around it is to pick a single imaginary self-propelled agent and have it jiggle about randomly in a Brownian motion. This agent models a typical pollen particle – typical in the “ergodic” sense that its time evolution mimics the space distribution of countless particles caught on film in a snapshot. Scaling plays a key role: our pollen particles indeed can be observed only on a time scale far larger than the molecular bumps causing the jiggling. Luckily, Brownian motion is scale-free, meaning that it can be observed at any scale. As we shall see, the ability to express a dynamical process at different scales is an important feature of influence systems.
The strength of the Eulerian approach is its privileged access to an advanced theory of calculus. Its weakness lies in two commitments: global behavior is implied by infinitesimal changes; and every point is subject to identical laws. While largely true in physics, these assumptions break down in the living world, where diversity, heterogeneity, and autonomy prevail. Alas, the Lagrangian answer, agent-based modeling, itself suffers from a serious handicap: the lack of a theory of natural algorithms.

The Google Similarity Distance


Rudi L. Cilibrasi (acm)

Paul M.B. Vitanyi (acm) (google citations)


If you’re going to remember one(?) thing: 

Semantic cognition using algorithms appears to be possible.

Running code is available for download at


A way of using search engine results to compute a semantic relationship between any two (n?) items. It basically uses Information Distance / Kolmogorov Complexity to determine similarity. From the paper:

While the theory we propose is rather intricate, the resulting method is simple enough. We give an example: At the time of doing the experiment, a Google search for “horse”, returned 46,700,000 hits. The number of hits for the search term “rider” was 12,200,000. Searching for the pages where both “horse” and “rider” occur gave 2,630,000 hits, and Google indexed 8,058,044,651 web pages. Using these numbers in the main formula (III.3) we derive below, with N = 8,058,044,651, this yields a Normalized Google Distance between the terms “horse” and “rider” as follows:

NGD(horse, rider) ≈ 0.443.

In the sequel of the paper we argue that the NGD is a normed semantic distance between the terms in question, usually (but not always, see below) in between 0 (identical) and 1 (unrelated), in the cognitive space invoked by the usage of the terms on the world-wide-web as filtered by Google.
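The worked example checks out; a direct transcription of the NGD formula with the counts quoted above (any log base cancels, so natural logs are used):

```python
import math

def ngd(fx, fy, fxy, N):
    """Normalized Google Distance from hit counts fx, fy, fxy and index size N."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(N) - min(lx, ly))

print(round(ngd(46_700_000, 12_200_000, 2_630_000, 8_058_044_651), 3))  # 0.443
```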

This really sounds like a usable model of cognition. For example:

For us, the Google semantics of a word or phrase consists of the set of web pages returned by the query concerned. Note that this can mean that terms with different meaning have the same semantics, and that opposites like ”true” and ”false” often have a similar semantics. Thus, we just discover associations between terms, suggesting a likely relationship.

Trustworthy By Design


CSCW 2014 • The Office

Bran Knowles – Lancaster University, Lancaster (acm)
Mike Harding – Lancaster University, Lancaster (acm)
Lynne Blair – Lancaster University, Lancaster (acm)
Nigel Davies – Lancaster University, Lancaster (acm)
James Hannon – InTouch Ltd (acm)
Mark Rouncefield – Lancaster University, Lancaster (acm)
John Walden – InTouch Ltd (acm)


An ethnographic study of gully cleaners (‘Gangers’) and an examination of an app that supports their work and its management. Computer-mediated communication allows for the creation of a trusting environment that supports more flexibility and higher productivity, as opposed to the enforcement of rules, an essentially political approach. Very nicely written and easy to read.

Things to Remember

A key challenge is to collect, at the outset, accurate data in which organizations can place significant trust. This trusted data, as we will show, links with a range of other important qualities to do with successful completion and management of work, and can change for the better the way business process is organized.

We adopt a definition of trust fitting with [D. Gambetta, ‘Can We Trust Trust?’, in Trust: Making and Breaking Cooperative Relations], i.e. trust is a subjective assessment of reliability, and we explore trust specifically as it relates to ‘trusted data’ in the context of developing quality systems.

In our fieldwork studies we paid attention to the social process of trust production, to specify the social mechanisms which generate trust and to examine and document the various ways in which trust is woven into the fabric of everyday organizational life as part of the taken-for-granted moral order [15].

What seems especially notable in highway maintenance work, an observably physically demanding job, is the sheer amount of paperwork involved. Documents seem to enable trust in part by creating a ‘stratified trace’ of the orderliness of activities. A document provides history, in this case of a stretch of road; but that historical record is only trustworthy as a result of lower level instantiations of trust: trust in location, trust in the represented order of events, and so on. A document further acts as a coordinating device, and trust placed in that document can be translated into an appropriate organizational formulation.

Workers can add ‘frequency prediction’ information to an asset: when a gully is collected/cleaned, the user can select from a drop-down menu how soon they think it will need to be cleaned again. (Prediction is a great way to develop trust!)

We contend there is an important distinction between accurate data and trusted data. While accuracy is a prerequisite of trusted data, it does not guarantee it; data can be accurate and still not be trusted.

We believe that a similar approach, i.e. trustworthy by design, is required for building trusted data-gathering systems. In ‘The Mechanics of Trust: A Framework for Research and Design,’ Riegelsberger et al [33] argue that: “If we are to realize the potential of new technologies for enabling new forms of interactions without these undesirable consequences, trust and the conditions that affect it must become a core concern of systems development. The role of systems designers and researchers is thus not one of solely increasing the functionality and usability of the systems that are used to transact or communicate, but to design them in such a way that they support trustworthy action and — based on that — well-placed trust.”


Eight key principles:
1) Security: Trusted data capture must necessarily be underpinned by a secure infrastructure, e.g. it must include measures that ensure tamper resistance.
2) Performance: Systems—both devices and web portals—must be quick and easy to use in order to encourage users to amend data as necessary.
3) Provenance: The system must enable users to trace the source of any data capture and amendment activity in a way that aids verification of the validity of that activity.
4) Translucency: Users must be able to see all relevant data that would help them undertake their work, but no more.
5) Flexibility: The system must allow users to adjust data when the device is unable to deliver accuracy, e.g. if an obstruction prevents the user from positioning the device over the asset.
6) Value to users: The system must be designed to deliver value to the user—as opposed to a model that treats users as ‘dumb sensors’—to ensure they benefit from producing accurate data.
7) Empowerment: The system must bring people ‘into the loop’ and engage their knowledge and intelligence toward a shared goal, such as increasing the quality and ease of work.
8) Competence: In empowering people and giving them responsibility, the system must build in assurances that users will succeed, e.g. facilitating the submission of all necessary data.

…little attention has been paid to the quality of data captured by mobile workers. If this data is inaccurate or untrustworthy, serious consequences can ensue. In this paper we study a system targeted at mobile workers in the highways sector that is deliberately designed to increase the accuracy and trustworthiness of the data collected.

This paper explores the elements of design that enable accurate, and above this, trusted data collection in this domain, with a view toward applying these more generally to other mobile data capture domains.

Trusted data is also critical in a wide range of domains such as health, policing, environmental monitoring, surveying and disaster management where inaccurate or untrustworthy data from the field can have serious consequences.

We use trust as an analytical lens for reflecting on lessons learned from our experience in creating a successful mobile asset collection system, and from this, develop several principles of successful system design that can be applied to a range of domains.

In recent years there has been an increased interest in smartphones for data capture, though many of these studies focus on data capture for purposes of crowdsourcing and participatory sensing [6, 20], experience capture [41], feedback [12], etc., in which accuracy and trust play a less critical role.

In the process, it is possible to identify the important role of trust in such organizational work: how people ‘perform’ trust, how it is instantiated in various paper and electronic documents, and how it enters into everyday work through aspects of planning, coordination and awareness.

Our exploration of issues of trust in road maintenance began by identifying several different stakeholders or parties (or ‘users’) between whom and for whom issues of trust arise.

The (ongoing) fieldwork reported here was carried out by two ethnographers at seven different sites in the UK over the course of a year and amounted to approximately 70 interviews (of differing durations) and approximately three months' worth of observation. Interviews were transcribed and fieldnotes typed up and examined for broad, recurring themes which acted not as precursors to the development of theory but as broad requirements for design.

Like ‘the boy who cried wolf’, contractors may over time be less inclined to take seriously the deadlines set by the council, potentially increasing the chance of instances of unreliability with more damaging consequences to accrued trust. In contrast, were the council to trust contractors, efficiency would result as a natural consequence of empowering people who are best able to make decisions, thereby catalyzing a virtuous circle of trust.

A mobile data capture system intended to increase trust between various parties:

    • Supporting the identification and persistent storage of evidence.
    • Understanding and graphing the complexity of interdependent processes and relationships to deliver on organizational assertions.
    • Providing trust warnings in the form of data visualizations and inline user interfaces to involved stakeholders.
    • Visualizing the impact of individuals’ actions throughout the system and process.

Gangers work to a cyclical cleaning regimen, which they follow blindly. While this approach does enable thorough, systematic cleaning of the council’s gullies, it is highly inefficient, since many gullies do not actually require cleaning. Furthermore, and in terms of trust, while it is easy to fall short of expectations, it is not particularly easy to exceed expectations.

The ethnographic fieldwork we conducted for gullies was intended not only to reveal further insight into trust, but also, more fundamentally, to capture requirements for our system design.

In addition to position, users capture a range of data about the state of the gully including reports of any damage and a photograph of the current state of the gully.

This support for reinspections, which is the principal advance from second to third generation asset collection, enables users to recollect that same asset again and again, building a history of data against that asset. Users can also change any data that is incorrect by doing a recollection. They do not have to create a new inspection every time; data is pulled forward — some of which is immutable, some mutable, some blank, as appropriate.
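The pull-forward behaviour described above can be sketched roughly as follows. This is a minimal illustration only; the field names and the classification of fields as immutable, mutable, or per-visit are my assumptions, not the paper's actual schema:

```python
# Sketch of "recollection" pull-forward: immutable fields carry forward and
# are locked, mutable fields carry forward as editable defaults, and
# per-visit fields always start blank. Field names are hypothetical.
IMMUTABLE = {"asset_id", "location"}      # identity: carried forward, locked
MUTABLE = {"condition", "damage_notes"}   # carried forward, editable
PER_VISIT = {"photo", "inspection_date"}  # must be captured afresh each visit

def start_recollection(previous: dict) -> dict:
    """Build the form state for a new inspection of an existing asset."""
    form = {}
    for field in IMMUTABLE | MUTABLE:
        form[field] = previous.get(field)   # pulled forward from last visit
    for field in PER_VISIT:
        form[field] = None                  # blank until recaptured
    return form

last = {"asset_id": "G-1042", "location": (53.48, -2.24),
        "condition": "silted", "damage_notes": "",
        "photo": "img_881.jpg", "inspection_date": "2013-04-02"}
form = start_recollection(last)
# form carries the asset's identity and prior state forward,
# while the photo and inspection date start blank for this visit
```

The point of the split is that each recollection adds to the asset's history without forcing the user to re-enter identity data, while evidence fields like the photograph are always fresh.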

The system provides a sophisticated set of filters to enable viewers to hide irrelevant data from view. This combination of filtering and access to all the past data on local assets can transform the way highways maintenance operations are performed. In particular, it provides empowerment to the users, allowing them to be in control of their work and to choose how to carry it out, e.g. inspectors can choose to see all the gullies that they did not have time to check last time.

Some of this success can be put down to the extensive ethnographic research done up front and careful planning to attend to the needs highlighted by this research through iterative ethnographic/development cycles throughout, and the use of photographs to document, to provide a trustable record of, their work. Much of this success, however, we put down to serendipity.

We begin with a brief discussion of the affordances of our mobile data capture system and explore how these affect empowerment. Next we explore additional characteristics that contribute to system success. Finally, we conclude by discussing how all of these factors contribute toward trusted data.

Some of the affordances Inspections capitalizes on include the following:

    • ‘Wizarding’: Collection and recollection both force the user to enter all mandatory data before they are able to move on.
    • Pinpointing location
    • Filtering data: Users can not only view relevant data, but also hide irrelevant data.
    • Gathering asset histories: The system ties data collection to a specific asset, and users can continue to add information to this same asset over time.

As [44] notes, introducing a system into an organization entails a complex emergent dynamic between the system and what that system enables users within the organization to do. “These factors,” the authors write, “go beyond basic functionality, dialogue and representations of a technology and encompass organization culture, changes in organizations, users’ identity and power differences and their emotional, symbolic and functional values related to the technology”.

We argue that Inspections makes possible several further organizational-level affordances: 1) it empowers crews to engage their creative intelligence; 2) it empowers the organization as a whole to shift to an intelligent management model; and 3) this empowerment is enabled by — and in turn perpetuates — the fostering of trust throughout the different levels of the organization.

At the organizational level, researchers differentiate between ‘empowering organizations’ and ‘empowered organizations’: the former serves to foster psychological empowerment for individuals within that organization or otherwise influenced by that organization [47]; whereas the latter (‘empowered’) are those that “influence the larger systems of which they are a part” [29], increasing their own effectiveness in achieving goals.

Inspections makes use of crews’ previously untapped knowledge by asking them to predict the frequency of cleaning required for each gully. But further, it enables them to act on this creatively and independently, giving them the power to craft their maintenance schedules in accordance with their expertise. Rather than feeling like robots or slaves to inflexible routines, workers can feel competent and engaged.

Inspections, on the other hand, enables the organization to increase efficiency through strategic means while enabling the workforce to operate at a safe, healthy, realistic pace — with the added benefit that this is conducive to data accuracy. Ultimately, this has revolutionized these organizations, freeing them from a blind, cyclical cleaning regimen and enabling them to proactively target the gullies that are likely to cause unsafe road conditions.

Inspections forces users to fill in all mandatory data fields, but because the device (and the UI decisions we have made) make doing so a very quick process, users are able to complete a collection or recollection quickly, knowing that they can recollect again just as quickly if they need to make corrections.

The benefit of these metadata traces is that they enable those that manage the data (whether or not they actively manage personnel) to determine the data’s validity. We are also aware of the closer association between provenance and trust (e.g. [1, 2, 24]), which indicates that ‘provenance’ is in greater alignment with our intended design ambitions.

Integrating On-demand Fact-checking with Public Dialogue


CSCW 2014 – Mobilizing for Action

Travis Kriplean – University of Washington (acm)
Caitlin Bonnar – University of Washington (acm)
Alan Borning – University of Washington (acm)
Bo Kinney – Seattle Public Library (acm)
Brian Gill – Seattle Pacific University (acm) <- stats guy.


A description of the development and use of a Value Sensitive Design approach to a referendum fact-checking website. Users developed pro and con lists and could request a fact check, which was done by librarians at the Seattle Public Library.

Things to Remember

  • “Our goal is to enhance public dialogue by providing authoritative information from a trusted third party.”
  • “Crowdsourcing systems can benefit from reintroducing professionals and institutions that have been up to now omitted. As socio-technical architects, we should be combining old and new to create entirely new possibilities.”
  • The journalistic fact-checking frame did not map smoothly to librarian reference practice.
  • Clever use of simulation statistics to compensate for comment drop off as election approached.
  • Librarians are very trustworthy.
  • This is a second or third generation site, and is still incorporating lessons learned.
  • LVG – Living Voters Guide


We explore the design space for introducing authoritative information into public dialogue, with the goal of supporting constructive rather than confrontational discourse.

We also present a specific design and realization of an archetypal sociotechnical system of this kind, namely an on-demand fact-checking service integrated into a crowdsourced voters guide powered by deliberating citizens.

Public deliberation is challenging, requiring communicators to consider tradeoffs, listen to others, seek common ground, and be open to change given evidence.

A few interfaces such as Opinion-Space [13] and ConsiderIt [29] have demonstrated that it is possible to create lightweight communication interfaces that encourage constructive interactions when discussing difficult issues.

We describe a new approach for helping discussants decide which factual claims to trust. Specifically, we designed and deployed a fact-checking service staffed by professional librarians and integrated into a crowdsourced voters guide.

It also serves as an archetype of a more general class of systems that integrate the contributions of professionals and established institutions into what Benkler [4] calls “commons-based peer production.”

One key design element is that the fact-checks are performed at the behest of discussion participants, rather than being imposed from outside.

To help explore design alternatives and evaluate this work, we turn to Value Sensitive Design (VSD), a methodology that accounts for human values in a principled and systematic way throughout the design process [6, 18]. As with prior work involving VSD in the civic realm [5], we distinguish between stakeholder values and explicitly supported values. Stakeholder values are important to some but not necessarily all of the stakeholders, and may even conflict with each other.

Explicitly supported values, on the other hand, guide designers’ choices in the creation of the system: here, the values are democratic deliberation, respect, listening, fairness, and civility.

Most commonly used communication interfaces, especially online comment boards, implicitly support liberal individualist and communitarian values through the particular interface mechanisms they provide.

Because communicative behaviors are context sensitive and can be altered by interface design [35, 38, 44], we believe we can design lightweight interfaces that gently nudge people toward finding common ground and avoiding flame wars.

In ConsiderIt, participants are first invited to create a pro/con list that captures their most important considerations about the issue. This encourages users to think through tradeoffs — a list with only pros or cons is a structural nudge to consider both sides. Second, ConsiderIt encourages listening to other users by enabling users to adopt into their own pro/con lists the pros and cons contributed by others.

Finally, a per-point discussion facility was added in the 2011 deployment to help participants drill down into a particular pro/con point and have a focused conversation about it. We call special attention to this functionality because one component of our evaluation is an examination of how the focused discussion progressed before and after factchecks.

We have run the LVG for the past three elections in Washington State, with 30,000 unique visitors from over 200 Washington cities using LVG for nearly ten minutes on average. Our analysis of behavioral data has revealed a high degree of deliberative activity; for example, 41.4% of all submitted pro/con lists included both pros and cons [17, 28, 30]. Moreover, the tone of the community discussion has been civil: although CityClub actively monitored the site for hate speech and personal attacks, fewer than ten out of a total of 424 comments have been removed over the three deployments.

Participants have difficulty understanding what information to trust. Content analysis of the pro/con points in 2010 found that around 60% contained a verifiable statement of fact, such as a claim about what the ballot measure would implement, or included a reference to numerical data from an external source. Anyone can state a claim, but how do others know whether that claim is accurate?

Suitable primary sources are often unavailable, and most deliberating bodies do not have the ability to commission a report from a dedicated organization like the Congressional Budget Office.

Fact-checkers produce an evaluation of verifiable claims made in public statements through investigation of primary and secondary sources. A fact-check usually includes a restatement of the claim being investigated, some context to the statement, a report detailing the results of the investigation, and a summative evaluation of the veracity of the claim (e.g., Politifact’s “Truth-O-Meter” ranging from “True” to “Pants-on-Fire”).

Establishing the legitimacy of fact-checks can be challenging because the format of a fact-check juxtaposes the claim and the fact-check.

Those whose prior beliefs are threatened by the result of the fact-check are psychologically prone to dismiss counter-attitudinal information and delegitimate the source of the challenging information [31, 33, 34, 46], sometimes even strengthening belief in the misinformation [37].

Another approach is to synthesize information available in reliable secondary sources. This differs from fact-checking in that (1) the investigation does not create new interpretations of original sources and (2) the report does not explicitly rate the veracity of the claims.

One of the main roles of librarians is to help patrons find the information they seek amidst a sometimes overwhelming amount of source material. Librarians assess the content of resources for accuracy and relevance, determine the authority of these resources, and identify any bias or point of view in the resource [2].

Establishing trust in the results of these crowdsourced efforts is often a challenge [7, 9], but can be accomplished to some extent by transparency of process [27].

Authoritative information is usually published and promoted by dedicated entities. For example, fact-checking is often provided as a stand-alone service, as with Snopes, Politifact, and

Professionals facilitating a discussion can shepherd authoritative information directly into a discussion.

The ALA’s Code of Ethics [1], emphasizes the role of libraries in a democracy: “In a political system grounded in an informed citizenry, we are members of a profession explicitly committed to intellectual freedom and the freedom of access to information. We have a special obligation to ensure the free flow of information and ideas to present and future generations.”

The specific guidelines that the librarians settled on were: (1) We will not conduct in-depth legal or financial analysis, but we will point users to research that has already been conducted; (2) We will not evaluate the merits of value or opinion statements, but we will evaluate their factual components; (3) We will not evaluate the likelihood of hypothetical statements, but we will evaluate their factual components.

For this work, we call out the following direct stakeholders: (1) users of the Living Voters Guide, (2) authors of points that are fact-checked and (3) the reference librarians.

The primary design tension we faced was enabling LVG users to easily get a sense of which factual claims in the pro/con points were accurate (in support of deliberative values), while not making the fact-checking a confrontational, negative experience that would discourage contributors from participating again (or at all).

The service was on-demand: any registered user could request a fact-check, submitted with a brief description of what he or she wanted to have checked.

By relying on LVG participants themselves to initiate a fact-check, we hypothesized that the degree of confrontation with an authority would be diffused. Further, we hoped that requests would come from supporters of the point to be checked, not just opponents — for example, a supporter (or even the author) might request a check as a way of bolstering the point’s credibility.

Each fact-check comprised (1) a restatement of each factual claim in the pro or con, (2) a brief research report for each claim, and (3) an evaluation of the accuracy of each claim.

We settled on a simple scheme of “accurate,” “unverifiable,” and “questionable.” Each of the evaluation categories was accompanied by a symbol representing the result: a checkmark for accurate, an ellipsis for unverifiable, and a question mark for questionable.

For each fact-checked pro or con, an icon representing the most negative evaluation was shown at the bottom. When a user hovered over the icon, a fact-check summary was shown (Figure 1).
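The “most negative evaluation” display logic is simple to sketch. The severity ordering below is my assumption; the paper only specifies the three categories and their icons:

```python
# Choose the summary icon for a fact-checked point: each claim in the point
# gets an evaluation, and the point shows the icon of the most negative one.
# The severity ranking (accurate < unverifiable < questionable) is assumed.
SEVERITY = {"accurate": 0, "unverifiable": 1, "questionable": 2}
ICON = {"accurate": "✓", "unverifiable": "…", "questionable": "?"}

def point_icon(claim_evaluations):
    """Return the icon for the most negative per-claim evaluation."""
    worst = max(claim_evaluations, key=SEVERITY.__getitem__)
    return ICON[worst]

point_icon(["accurate", "accurate"])      # checkmark: all claims accurate
point_icon(["accurate", "unverifiable"])  # ellipsis: one claim unverifiable
point_icon(["questionable", "accurate"])  # question mark dominates
```

Surfacing the worst case at a glance means a single questionable claim flags the whole point, which fits the design goal of letting readers quickly judge which points need a closer look.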

The full fact-check was presented immediately below the text of the point when users drilled into the point details and discussion page

Every fact-check request generated a notification e-mail to the librarians. Librarians then logged into a custom fact-checking dashboard. The dashboard listed each point where a user had requested a fact-check, showing the fact-check status (e.g. completed), which librarian was taking responsibility for it, and the result of the fact-check (if applicable). Librarians could claim responsibility for new requests, and then conduct the fact-check.

The fact-checking page enabled librarians to restate each factual claim made in a pro or con as a researchable and verifiable question, and then answer the research question.

The fact-checking team decided that the librarians should always identify as many researchable, verifiable questions as a pro or con point contained, even if the request was only for a very specific claim to be checked.

Every fact-check was a collaboration between at least two librarians. One librarian would write the initial fact-check, which was then reviewed by a second librarian before being published.

This communication facilitated learning and coherence for the service, and also drove functionality changes during the early stage of the pilot.

After each fact-check was published, a notification e-mail was sent out to the requester of the fact-check, the author of the pro/con point, and any other interested party (such as people who were participating in a discussion on that point).

In an informal analysis, librarians found that approximately half of all submitted pros and cons in 2011 contained claims checkable by their criteria.

Not all pros and cons are examined with the same degree of scrutiny by other users. For example, some are poorly worded or even incoherent. Because of ConsiderIt’s PointRank algorithm [29], these points rapidly fall to the bottom and are only seen by the most persistent users.

One factor mitigating this possibility is a structural resiliency in ConsiderIt that disincentivizes political gaming of the fact checking service: if a strong advocate decides to request fact checks of all the points he or she disagrees with, the claims could be evaluated as accurate and end up bolstering the opposing side’s case. The more likely risk in overloading the librarians stems from pranksters operating without a political agenda. This could be handled by triaging requests based on the user requesting the fact-check.

One reason for the large number of “unverifiable” claims is that, for the political domain, what one might think are straightforward factual questions turn out to be more nuanced on closer examination. Another reason was SPL’s policy about not doing legal research;

Users generally agreed with the librarian’s analysis and found value in it (8.6% strongly agreed, 65.7% agreed, 25.7% neutral, 0% disagreed, 0% strongly disagreed). People who were fact-checked felt that librarians generally assessed their points in a fair manner (62.5% “fair,” 0% “unfair,” 37.5% “neither”).

Users did express a desire for better communication with the librarians.

The extensive positive press coverage that the service received also suggests that the legitimacy of LVG increased. For example, a Seattle Times column [21] praised the librarian’s contribution to the LVG, stating that “there’s something refreshing in such a scientific, humble approach to information.”

To conduct the permutation test, Monte Carlo simulation was used to repeatedly randomly reassign the 47 fact-checks and their original timestamps to 47 of the 294 points. In the randomization process, each fact-check was assigned at random to a point which had at least one view prior to the timestamp at which the original request for the fact-check was submitted.
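The permutation test described above can be sketched as follows. This is a minimal reconstruction under stated assumptions: the data structures and the `statistic` callback are hypothetical, and the paper's actual test statistic is not reproduced here:

```python
# Monte Carlo permutation test sketch: repeatedly reassign the fact-checks
# (keeping their original timestamps) to randomly chosen eligible points.
# A point is eligible for a fact-check only if it had at least one view
# before that fact-check's original request timestamp.
import random

def permutation_test(observed_stat, fact_checks, points, statistic,
                     n_iter=10000):
    """Estimate a one-sided p-value for `observed_stat`.

    fact_checks: list of (fc_id, request_time) pairs
    points: dict mapping point_id -> time of the point's first view
    statistic: function(assignment dict) -> float, computed per permutation
    """
    count = 0
    for _ in range(n_iter):
        assignment = {}
        used = set()
        for fc_id, t in fact_checks:
            # Eligible points were viewed before the fact-check was requested
            eligible = [p for p, first_view in points.items()
                        if first_view < t and p not in used]
            chosen = random.choice(eligible)
            used.add(chosen)
            assignment[fc_id] = chosen
        if statistic(assignment) >= observed_stat:
            count += 1
    return count / n_iter
```

The eligibility constraint matters: without it, fact-checks could be assigned to points nobody had seen yet, inflating the apparent effect of a fact-check on subsequent discussion.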

Some librarians raised concerns about the ability of the library to maintain their reputation as a neutral institution.

The lack of a communication mechanism also prevented librarians from knowing how their fact-checks were received by the author of the point, the requester(s) of the fact-check, and anyone reading the fact-check, leading to a general disconnect between librarian and user.

Librarians felt able to provide authoritative, relevant information to the public. They felt that this project was not only a good way to showcase the skills that they possess in terms of providing answers to complex questions, but also a way to reach a wider audience.

Users welcomed the librarians’ contributions, even those whose statements were challenged. Our perspective is that correcting widely held, deep misperceptions is something that cannot be quickly fixed (e.g., with a link to a Snopes article), but is rather a long process of engagement that requires a constructive environment [32].

The journalistic fact-checking frame did not map smoothly to librarian reference practice. Librarianship is rooted in guiding people to sources rather than evaluating claims. Discomfort stepping outside these bounds was magnified by the lack of opportunities for librarians to communicate with users to clarify the request and the original point. This points to an evolution of our approach to introducing authoritative information into public dialogue that we call interactive fact-seeking.