Setting the Bar

I was hopeful that the kerfuffle around DA-RT (Data Access and Research Transparency) would be a boon for discussions about the philosophy of social science. After all, at the core of DA-RT is defining a minimal threshold for what constitutes evidence for an empirical claim. I have been disappointed.

Many years ago (1978 to be specific) I took a required graduate course at Indiana University on Scope and Methods taught by John Gillespie. That course covered elementary philosophy of science and made forays into standards for the social sciences. That course continues to haunt me, forcing me to think about what standards I value in my own work and the work of others. I have spent a good part of my career at Rice raising the same issues with graduate students, hopefully forcing them to consider what standards they use when evaluating their own work. Sadly, I wish I had a clear set of principles, but I don’t. Whenever I meet a philosopher of science, I immediately ask what they are reading and who they think is making inroads. I haven’t been bowled over.

I thought Jeff Isaac’s editorial (and subsequent blog posts) might provide me a principled discussion of what constitutes a knowledge claim. After all, here is someone exercising editorial discretion to weigh in on evidentiary standards. I remain disappointed. Here is the crux of his claims:

  • Interpretivism needs a “safe space.” Much of Jeff’s discussion on this point retells the story of the Perestroika movement in political science and worries that its energy has been dissipated. In part he fears that there is a “resurgent neo-positivism” that is taking over publication outlets in the discipline. While this is a useful commentary on the history of science, it hardly addresses fundamentals.
  • DA-RT solves no central problem of concern to the social sciences. As he notes, no one objects to transparency. Why, then, mandate rules that serve as a stalking horse for methodological rigor? Yet we need standards in order to judge our own work and the work of others. Even if there is no imminent danger of wholesale fraud in the discipline, we still owe it to ourselves to articulate what we value.
  • Jeff contends, and this comes out most forcefully in a blog post, that “a discipline that was serious about public relevance, credibility, and accessibility would be less and not more obsessed with methodological purity.” Herein lies a semi-normative claim that methodological infatuation leads to irrelevance, lack of credibility and inaccessibility. However, it seems that many discussions about evidentiary standards are precisely about credible claims and making the basis of those claims more accessible.
  • The “standard method of hypothesis-testing” should not become the normative standard for the discipline. This is the most promising direction, and Jeff’s set of standards (which, as he notes, is very impressionistic) involves research “that center on ideas that are interesting and important.” Three points are noteworthy here.

First, the conception of the “standard method” is one that has been challenged since the 1950s and is hardly ascendant. Of course, Jeff is really concerned with evolving quantitative norms and worried that a one-size-fits-all view of such norms will disadvantage qualitative work. Elsewhere I’ve commented that this is a false distinction. Neither interpretative nor game theoretic work gets a free pass when making causal or outcome claims.

The second point is the implication that quantitative methods adhering to evidentiary standards somehow cannot be interesting and important. This seems odd given that Jeff’s former colleague, Elinor Ostrom, spent decades tackling key questions about climate change and the self-governance of common-pool resources in a very public manner, all while relying on methodological rigor.

The third point is that there are no boundaries on which ideas are interesting and important. This is akin to “knowing it when I see it.” Of course Jeff argues that an Editor calls on the expertise of reviewers and pushes them to think “outside their comfort zones.” But, like any Editor, Jeff knows how easy it is to stack the deck when selecting reviewers and he also knows how editorial judgment can easily overlook aspects of reviews that do not fit one’s own views.

Overall, the arguments have more to do with staking a political position than with tackling fundamental normative claims. In today’s public environment, where citizens and public leaders alike dismiss evidence-based claims, we need to seriously consider the standards we set for our colleagues and ourselves. I keep hoping for someone better than I to come along and articulate a set of normative principles by which I can evaluate good work.

Perhaps Feyerabend’s nihilism is good enough for the social sciences. Yet we make claims about our findings and project what they mean for setting public policy. Policy makers, however, remain skeptical of what we claim or sometimes belittle our findings because they don’t line up with their own opinions. Letting a thousand flowers bloom is not going to make the discipline relevant to policy makers.

DA-RT, TOP and Rolling Back Transparency

I am more than a little dismayed by efforts to roll back transparency and openness in political science. The “movement” began in mid-August with emails to editors of political science journals that had signed on to DA-RT (Data Access and Research Transparency) from the Executive Council of the Interpretive Methodologies and Methods Conference Group. This has been followed by a petition, issued this month, to delay DA-RT implementation. Of course, who the petition is aimed at and what it demands are open questions.

Personally, I am inclined to sign the DA-RT delay petition because DA-RT does not go far enough. In June 2015, I joined with people from across the social sciences in proposing a set of guidelines for Transparency and Openness Promotion (TOP). The TOP guidelines detail best practices and are aimed at journals in the social sciences. These guidelines focus on quantitative analysis, computational analysis and formal theory. Because qualitative research involves more complicated issues, TOP has left it for the future and for input from the community.

I find it puzzling that there is resistance to making it clear how one reaches a conclusion. Suppose I naively divide research into two types: interpretative and empirical. Both make claims and should be taken seriously by scholars. Both should be held to high standards. Interpretative research often derives conclusions from impressions gleaned from listening, immersing, reading and carrying out thought experiments. Those conclusions are valuable for providing insight into complex processes. A published (peer reviewed) article provides a complete chain of reasoning so that a reader can reconstruct the author’s logic – or at least it should. In this sense I see little difference between a carefully crafted hermeneutic article and a game theoretic article. Both offer insight and the evidence for the conclusion is embedded in the article. Given that the chain of reasoning in the article is the “evidence” for the conclusion, it would be absurd to mandate stockpiling impressionistic data in some data warehouse.

What I am calling empirical work has a different set of problems. I acknowledge that such work focuses heavily on measurement and instrumentation that is socially constructed. Research communities build around their favorite tools and methods and, as such, instantiate norms about how information is collected and processed. Those communities appeal to TOP (or DA-RT) for standards by which to judge empirical claims. I see little harm in making certain that when someone offers an empirical claim, I am given the basis on which that claim rests. Being transparent about the process by which data are collected, manipulated, processed and interpreted is critical for me to draw any conclusion about the merit of the finding. Note that both interpretative and empirical research (as I have naively labeled them) interpret their data. The difference is that the latter can more easily hide behind a wall of research decisions and statistical manipulations that are skipped past in an article. This material deserves to be in the public domain and subject to scrutiny. An empirical article rarely presents the same chain of logic that I can read in an interpretative article.

Two points stand out in the resistance to TOP or DA-RT. First is the issue of privileging empirical work. I agree that there is some danger here. If Journals adopt TOP (or DA-RT) and insist that empirical work live up to those standards, this may deter some authors from submitting their work to those Journals. This does not mean that authors working in the interpretative tradition should be fearful. Neither DA-RT nor TOP mandates data archiving (see the useful discussion in Political Science Replication). As I note above, it would be ridiculous to insist on this. However, “beliefs” about the motives of Editors are a common barrier to publication. When I edited AJPS I was occasionally asked why more interpretative work was not published. The simple answer was that not much was ever sent my way. I treated such work just like any other. If a manuscript looked important, I tried to find the very best reviewers to give me advice. Alas, rejection rates for general journals are very high, no matter the flavor of research. The barriers to entry are largely in the author’s head.

Second, there is the sticky problem of replication. Many critics of DA-RT complain that replication is difficult, if not impossible. The claim is that this is especially true for interpretative work where the information collected is unique. I have sympathy for that position. While it might be nice to see field notes, etc., I am less concerned with checking to see if a researcher has “made it all up” than with learning how the researcher did the work. Again, the interpretative tradition is usually pretty good with detailing how conclusions were reached.

I am also less interested in seeing a “manipulated” data set so that I can get the same results as the author (though as the recent AJPS policy shows, this can be useful in ensuring that the record is clear). I would much rather see the steps that the author took to reach a conclusion. For empirical work this generally means a clearly defined protocol, the instrumentation strategy and the code used for the analysis.
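
To make this concrete, here is a minimal sketch of the kind of analysis file I have in mind; the file names, variables and model below are hypothetical placeholders, not anyone’s actual study. The point is simply that every data decision appears as an explicit, rerunnable step rather than happening off-stage.

```python
# A minimal sketch of a transparent analysis script (all names hypothetical).
# Every data-handling decision is an explicit, commented step a reader can rerun.

import pandas as pd
import statsmodels.formula.api as smf

raw = pd.read_csv("survey_raw.csv")  # data as collected, never edited by hand

# Each processing decision is documented in code.
clean = raw.dropna(subset=["turnout", "age", "education"])  # drop cases missing key variables
clean = clean[clean["age"] >= 18]                           # restrict to eligible voters

# The model reported in the (hypothetical) paper.
model = smf.logit("turnout ~ age + education", data=clean).fit()
print(model.summary())

clean.to_csv("survey_analysis.csv", index=False)  # archive the analysis data set
```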

I am interested in a researcher providing as much information as possible about how claims were reached. This would allow me, in principle, to see whether I could reach similar conclusions. The stronger the claim, the more I want to know just how robust it might be. To do so, I need to see how the work was done. All good science is about elucidating the process by which one reaches a conclusion.

In the end I hope the discipline continues to stand up for science. I certainly hope that the move to delay DA-RT reflects the community deciding it has clearer standards in mind. If not, then I’m afraid the movement is about fighting for a piece of the pie.

Transparency, Openness and Replication

It is ironic that I am writing this post today. On May 19 Don Green asked that an article he recently co-authored in Science be retracted. The article purported to show that minimal contact with an out-group member (in this case, someone noting that he was gay) had a long-term effect on attitudes. As it turns out, the data appear to be a complete fabrication (see the analysis by Broockman, Kalla and Aronow). The irony stems from the fact that I have been sending letters to editors of political science journals, asking them to commit to the Transparency and Openness Promotion (TOP) guidelines. These guidelines make recommendations about the permanent housing of data and code, about what should be reported concerning the research design and analytic tools, and about the pre-registration of studies. Don Green is a signatory to the letter and was instrumental in pushing forward many of the standards.

The furor over the LaCour and Green retraction (and the recent rulings on the Montana field experiments) has forced me to think a bit more sharply about ethics. There are four lessons to be learned here.

First, science works. Will Moore makes this point quite nicely.  If someone has a new and interesting finding, it should never be taken as the last word. Science requires skepticism. While I teach my students that they should be their own worst critic, this is not enough. The process of peer review (as much as it might be disparaged) provides some opportunity for skepticism. The most important source of skepticism, however, should come from the research community. A finding, especially one that is novel, needs to be replicated (more on that below). Andrew Gelman makes this point on the Monkey Cage. We should be cautious when we see a finding that stands out. It should attract research attention and be “stress-tested” by others. The positive outcome of many different researchers focusing on a problem is that it allows us to calibrate the value of a finding and it should deter misconduct.

Second, the complete fabrication of findings is a rare event. There have been few instances of outright fraud in political science. This is not so much because of close monitoring by the community, nor because of the threat of deterrence. It seems that most of us do a good job of transmitting ethics to our students. We stress the importance of scientific integrity. I suspect that this case will serve as a cautionary tale. Michael LaCour had a promising career ahead of him. I’ve seen him present several papers and I thought all of them were innovative and tackled hugely important questions. Now, however, I do not trust anything I have seen or heard. My guess is that his career is destroyed. While we stress that our students adopt ethical norms of scientific integrity, it is equally important to enforce those norms when they are violated. I assume that will happen in this case.

This case also raises the question of the role of LaCour’s co-author, and of his advisors, in monitoring the work. All of us who have co-authors trust what they have done. But at the same time, co-authors also serve as an important check on our work. I know that my co-authors constantly question what I have done and ask for additional tests to ensure that a finding is robust. I do the same when I see something produced by a co-author over which I had no direct involvement. This is a useful check on findings. Of course, it will not prevent outright fraud. In a different vein, students are apprentices. Our role as advisors is to closely supervise their work. Whether this role is sufficient to prevent outright fraud is an open question.

Third, there is enormous value in replication. These days there is little incentive to replicate findings, but it is important. The team of Broockman and Kalla were attempting an extension of the LaCour and Green piece. And why not? The field experiment the latter designed seemed a good starting point for additional research. This quasi-replication quickly revealed some of the problems with the LaCour and Green study. The Broockman, Kalla and Aronow discussion has proven to be persuasive. I worry, however, that this spurs replications that resemble “gotcha” journalism. I certainly encourage replications and extensions that demonstrate the implausibility of a finding. However, I also hope that corroborating replications and extensions get their due. We need to encourage replications that allow us to assess our stock of knowledge. The Journal of Experimental Political Science openly welcomes replications, and the same is true of the new Journal of the Economic Science Association. There is a movement afoot among experimental economists to focus on major papers published in the past year and subject the key findings to replication. Psychology has mounted “The Reproducibility Project,” in which 270 authors are conducting replications of 100 different studies. While this may be easier for those of us who use lab experiments, we should address this issue more generally.

Fourth, the incentives for scholars are a bit perverse. Getting a paper published in Science or Nature is a big hit. Getting media attention for a novel finding is valuable for transmitting our findings (but see the retraction by This American Life). We put an enormous amount of pressure on junior faculty to produce in the Big 3. By doing so, we ignore how important it is for junior faculty (and for senior faculty as well) to build a research program that tackles and answers questions through sustained work. Incentivizing big “hits” reduces sustained work in a subfield. Of course, major Journals (I’ll capitalize this so that you know I’m referring to general journals in disciplines) are often accused of sensationalizing science. The Journals are thought to prioritize novel findings. This is true. While Editor at AJPS I wanted articles that pushed the frontiers of subfields and challenged the conventional wisdom. My view is that the Journals are part of a conversation about science and not the repository for accepted “truth.” Articles published in top Journals ought to challenge the community and spur further research.

In the end, I am not depressed by this episode. It is spurring me to push journals to adopt standards that ensure transparency, openness and reproducibility. It also revives my confidence in the discipline and the care that many scholars take with their work.

The Science of Politics

I have an opportunity to design and teach a MOOC (massive open on-line course). It will be entitled “Introduction to the Science of Politics” and is intended for freshmen entering college. I want to teach the essentials of what every freshman should know about political science before taking one of my courses. So what is the “canon” of political science? What should every undergraduate know before entering our mid-level courses?

A MOOC is not just a videotape of a talking head and some powerpoints. I’ve seen some very good courses offered on Coursera and edX. My course will last only four weeks with between 60 and 90 minutes of on-line content each week. I know enough about this type of pedagogy to plan on presenting concepts in 4-7 minute modules. I will have plenty of support at Rice for carrying out the course.

The hard part, of course, is considering the content of the course. This has made me think about what the discipline of political science has to say to the broader public. Here is what I have in mind so far.

Coordination problems. When people have shared preferences but there are multiple equilibria, they face a coordination problem. Leadership is one mechanism for solving coordination problems that is directly relevant to politics.
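
As a rough illustration (the payoffs are hypothetical, not drawn from any particular application), a simple two-player matching game makes the problem concrete: both players want to coordinate, yet there are two pure-strategy equilibria, so shared preferences alone do not tell them which one to pick.

```python
# A minimal coordination game sketch with made-up payoffs.
payoffs = {
    # (row choice, column choice): (row payoff, column payoff)
    ("A", "A"): (2, 2),
    ("A", "B"): (0, 0),
    ("B", "A"): (0, 0),
    ("B", "B"): (1, 1),
}

def is_nash(row, col):
    """True if neither player can gain by unilaterally deviating."""
    r, c = payoffs[(row, col)]
    best_row = max(payoffs[(alt, col)][0] for alt in ("A", "B"))
    best_col = max(payoffs[(row, alt)][1] for alt in ("A", "B"))
    return r >= best_row and c >= best_col

equilibria = [cell for cell in payoffs if is_nash(*cell)]
print(equilibria)  # [('A', 'A'), ('B', 'B')] -- two equilibria, hence a coordination problem
```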

Collective Action problems. The provision of public goods and the resolution of commons dilemmas have the same underpinnings. Here private interests diverge from group interests, leading to free riding. Political science has had a good deal to say concerning these problems.
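
A stylized public goods game, again with made-up numbers, shows why free riding emerges: contributing is best for the group, but each individual does better by keeping the endowment.

```python
# A minimal public goods sketch (hypothetical numbers).
# Contributions are multiplied and shared equally; because the private return
# to contributing (MULTIPLIER / N) is below 1, keeping the endowment is
# individually better even though full contribution is best for the group.

N, ENDOWMENT, MULTIPLIER = 4, 10, 1.6  # MULTIPLIER / N = 0.4 < 1

def payoff(my_contribution, others_contributions):
    pot = (my_contribution + sum(others_contributions)) * MULTIPLIER
    return (ENDOWMENT - my_contribution) + pot / N

others = [10, 10, 10]          # everyone else contributes fully
print(payoff(10, others))      # 16.0 if I also contribute
print(payoff(0, others))       # 22.0 if I free ride
print(payoff(0, [0, 0, 0]))    # 10.0 if everyone free rides
```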

Collective Choice problems. What happens when individuals have heterogeneous preferences, but a choice has to be made that applies to all? This is the crux of politics. It speaks not only to democracies, but also to oligarchies and dictatorships. In the end, institutional rules matter for outcomes.
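
A toy example (the preference profile is invented purely for illustration) drives the point home: the same ballots produce different winners under plurality rule and under the Borda count, so the rule, not just the preferences, determines the collective choice.

```python
# A minimal sketch showing that the aggregation rule matters.
from collections import Counter

ballots = (
    [["A", "B", "C"]] * 4 +   # 4 voters rank A > B > C
    [["C", "B", "A"]] * 3 +   # 3 voters rank C > B > A
    [["B", "C", "A"]] * 2     # 2 voters rank B > C > A
)

# Plurality: count first-place votes only.
plurality = Counter(b[0] for b in ballots).most_common(1)[0][0]

# Borda count: 2 points for first place, 1 for second, 0 for third.
borda = Counter()
for b in ballots:
    for points, candidate in zip((2, 1, 0), b):
        borda[candidate] += points

print(plurality)                    # 'A' wins under plurality
print(borda.most_common(1)[0][0])   # 'B' wins under the Borda count
```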

Principal/Agent problems. When an agent enjoys an information advantage, the principal is put in a weakened position. This provides core insights for Bureaucratic/Legislative/Executive dilemmas. It also goes to the heart of the representational relationship. At the core is understanding the difficulty a Principal faces in getting an Agent to act on her behalf. Obviously the problem is compounded with many principals and/or many agents.
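
A bare-bones numerical sketch, with hypothetical payoffs, captures the dilemma: if effort is unobservable and the principal can only offer a flat wage, the agent's best response is to shirk, leaving the principal worse off.

```python
# A minimal principal-agent sketch with hypothetical numbers.
WAGE = 5
EFFORT_COST = {"shirk": 0, "work": 3}
EXPECTED_OUTPUT = {"shirk": 4, "work": 12}   # value produced for the principal

def agent_utility(effort, wage):
    return wage - EFFORT_COST[effort]

def principal_profit(effort, wage):
    return EXPECTED_OUTPUT[effort] - wage

best_for_agent = max(EFFORT_COST, key=lambda e: agent_utility(e, WAGE))
print(best_for_agent)                          # 'shirk' -- flat pay rewards low effort
print(principal_profit("work", WAGE))          # 7, what the principal would like
print(principal_profit(best_for_agent, WAGE))  # -1, what the principal actually gets
```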

Inter-group Conflict. This strikes me as a separate problem that is endemic to humans (and most other social animals). We easily develop strong in-group/out-group biases. We often use those biases to coordinate around killing one another (or otherwise subjugating out-groups). This poses a puzzle about when violence can be triggered – whether it is inter-state or intra-state conflict.

I need to do some thinking. In order to get at each of these topics noted above, I’ll have to introduce basic building blocks (utility, preferences, choice spaces, etc.). At the same time I know I’m leaving a lot out.

What is your list of things you would like your Freshmen to know before they enter your course? Obviously I am being provocative and I am staking out a very specific view of Political Science. Still, I am interested in what you might add to my list. What is the “canon?”

Research and Accessibility

A lot of our best basic research seems esoteric and is rarely approachable for those outside our own specialization. But this need not be the case. Some disciplines are excellent at promoting their work and getting the word out. Consider the search for the Higgs boson and the hoopla when it was found. Most of us don’t know what the Higgs boson is or why it matters (much less be able to see it). Yet we all know it is important and that its discovery was a remarkable scientific achievement. The physics community did a great job making their work accessible.

How can political scientists make their work more accessible? The question is how to balance the rigor of our science with making clear to non-specialists what we found and why it is important. Rather than complaining that we never make the effort, I thought I would try my hand at short, cartoonish interpretations of articles that I have recently read and liked. My first effort focuses on a forthcoming paper in the American Journal of Political Science by Kris Kanthak and Jon Woon entitled “Women Don’t Run? Election Aversion and Candidate Entry.” I liked this paper the first time I heard it presented and it has only gotten better. You can see my take on it on YouTube under my channel Politricks.

I am going to try to do more of these over time. Who knows if they will get much attention. However, I see it as breaking out of the usual mold in which we write papers, cite the work and try to teach it to our students. Perhaps this will inspire others.

Others who have done similar work in the social sciences have inspired me. The first I remember seeing was featured in The Monkey Cage. That cartoon was remarkable for being short and exactly on the mark: the article it translated was a dense piece of formal theory, yet the cartoon captured it perfectly. More recently I was impressed by a very short animation that neatly illustrates a problem in decision theory regarding queuing. It is immediately understandable because we have all been there.

When I teach an Introduction to American Government class, I often use this to explain the problems inherent in “first past the post” electoral systems. While a little long, it is clear and the students get it quickly.

There are plenty of other examples and I’ll post things I like as I find them.

Publishing, the Gender Gap and the American Journal of Political Science

Over a year ago I wrote a short piece  concerning whether women were getting a fair shake at the AJPS. I thought so and I reported some statistics that reflected that opinion. However, I thought I could do better than simply report the percentage of published articles with women as authors or co-authors. What was glaringly absent was a benchmark metric. I did not even know what proportion of women authors submitted manuscripts to AJPS. I decided to rectify this (and my colleague Ashley Leeds urged me to stick with it).

Compiling a list of all manuscripts that were submitted while I was editor was easy. This information can easily be retrieved from the electronic manuscript manager that I used. However, getting information about characteristics of the corresponding author is nearly impossible. For co-authors, it is impossible. No information is collected about the gender, race, age or other characteristics of authors. The same problem is true with reviewers. I downloaded all of the manuscript data and all of the reviewers tied to each manuscript. I then had two research assistants code each author, co-author and reviewer for gender. Altogether there were 2,835 unique manuscripts that arrived at the AJPS offices from January 1, 2010 through December 31, 2013. There were a total of 5,064 authors. Of course, these are not unique authors, since more than a few authors submitted multiple manuscripts over the four years I was with the journal. On the reviewer side there were a total of 10,984 reviewers initially solicited. Of that set, 6,158 completed their review.

Authors.

In the Monkey Cage post I noted that 19.8% of AJPS articles had a woman as lead author and 34.8% of AJPS articles had at least one woman as an author. The latter percentage accounts for articles that are co-authored. The problem with these counts is that there is no useful denominator. They merely reflect percentages relative to all published articles and do not take into account the percentage of articles submitted by women.

I now have the distributions for manuscripts submitted to AJPS. It turns out that women are publishing at about the same rate at which they submit. While I do not have data on lead authors, if I look at solo-authored articles, women submitted 21.4% of them – slightly more than the 19.8% of AJPS articles with a female lead author. Of course, these are not exactly comparable. On the other hand, 31.96% of submitted articles had at least one female author, while 34.8% of accepted articles had at least one female author. In a sense my earlier count was not far off.
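
For anyone who wants to run this comparison on their own journal's data, the calculation is straightforward. The sketch below assumes a hypothetical file in which each manuscript has been coded for whether any author is female and whether it was ultimately accepted; the file and column names are placeholders, not my actual data set.

```python
# A minimal sketch comparing the female-author share of submissions with the
# female-author share of acceptances (hypothetical file and column names).
import pandas as pd

# One row per manuscript: "any_female_author" and "accepted" coded 1/0.
ms = pd.read_csv("ajps_manuscripts_coded.csv")

submitted_share = ms["any_female_author"].mean()
accepted_share = ms.loc[ms["accepted"] == 1, "any_female_author"].mean()

print(f"Submissions with a female author: {submitted_share:.1%}")  # ~32% in the text
print(f"Acceptances with a female author: {accepted_share:.1%}")   # ~35% in the text
```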

The table below looks at basic decisions: desk rejections, declines on first review, and manuscripts invited back or accepted. As a rule, on my first reading I tended to blind myself to the author(s). Apparently I desk rejected manuscripts with only male authors more frequently than manuscripts with at least one female author (about 5 percentage points more). This evens out following review, with all-male-authored manuscripts being declined just over 50 percent of the time, while manuscripts with a female author were declined almost 55 percent of the time. There is no appreciable difference in R&Rs or first accepts between the two groups.

Editorial Decisions

                           Male authors only      At least one female author
Desk Reject                38.31%  (739)          33.22%  (301)
Reviewed and Declined      50.29%  (970)          54.75%  (496)
R&R and/or Accept          11.41%  (220)          11.76%  (109)

It appears that once manuscripts enter the review process the probability of receiving an R&R or acceptance is not correlated with the sex of the author. The only thing that is certain is that if you do not send in a manuscript, you will not get published.

Reviewers.

I worked hard to eliminate biases at the initial stage of review (whether to desk reject a manuscript). It could be that biases emerged at the second critical stage, when reviewers are assigned to manuscripts.

A large number of reviewers were initially contacted. Of the 10,984 reviewers, 24.14 percent were women (2,652). This is slightly more than the 21.25% of authors who were female (1,076). In part this may be due to the fact that I asked my Editorial Assistants to add to our reviewer database and expand beyond the usual suspects.

These are aggregate numbers and count the same reviewer multiple times. I tried very hard not to call on the same reviewers more than twice a year, but there is the possibility that females were used disproportionately. Of the 5,133 unique reviewers used, 25.85% were female. This is above the proportion of women submitting manuscripts to the AJPS.

It may be that females are more conscientious, so I called on them more often. However, this does not seem to be the case. Male reviewers completed their review 55.3% of the time, compared with 54.5% for female reviewers.

It appears that, at the margin, I called on women to review at a rate disproportionate to their presence among submitting authors. The percentage differences are not large. I do not have data to indicate whether this was part of my deliberate outreach to junior faculty and advanced graduate students.

The Bottom Line

Journals should be transparent in what they do. A start is providing these kinds of data. They allow the community to check on any biases that may creep into decisions. Editorial boards and interested communities have every incentive to monitor the decisions by Editors.

These data are also useful for Editors for double-checking what they are doing during the course of their tenure. I wish that I had done this in the middle of my tenure rather than after I stopped being Editor.

These data are very hard to collect. I used a lot of student coding time to pull them together. Associations and Editors should press their electronic manuscript managers to add a handful of required fields for authors and co-authors. The burden on those submitting a manuscript should be minimal. Getting reviewers to enter additional information, however, may be difficult. I review for a lot of different journals, and I am positive that I have failed to enter much personal information – I’m usually overwhelmed with other things that need to be done, and finishing and submitting a review is about all I’m interested in doing. I doubt that I am alone in this feeling. Despite my reluctance, I see that such information is useful and I will change my bad habits.

Comcast Hell, Epilogue

A week ago Comcast returned to bury the “buried” cable line. This escapade began on July 24, when I eagerly signed up for internet service.

After much discussion and consultation, the “super-supervisors” agreed that burying the cable would be very difficult and that the line would be strung along our current power lines. Parts of the stringing would be difficult, but I agreed to clear a lot of tree branches from around the lines. This was something I should have done a while ago. The aerial crew was to show up the following day.

The crew showed up and immediately said it would be impossible to string the wire. They wanted another crew to come out and bury the wire. I simply said they needed to work this out with their “super-supervisor.” After much gnashing of teeth and delay, they set to work. The crew did a great job and everything works!

Like everyone with cable internet, I get isolated, short, spotty slow-downs in service. But I can finally work efficiently from home. It has been nice to have bandwidth that rivals the lower half of internet speed in the world.

The lesson I have learned in dealing with a large corporation and its many subcontractors is that social media is valuable. My complaining on Twitter was picked up. It put me ahead of the queue of people relying on the usual phone-tree approach. Why social media should be so powerful in getting a response is puzzling. My sense is that consumer dissatisfaction is far more widespread than what appears on social media. Fixing the problem by having employees cruise social media is like sticking a finger in a leaking dike.

The good news is that I now have service. I can finally vent about lousy service when it happens. So far I have nothing to complain about.