Catalog of visualization tools


There are a lot of visualization-related tools out there. Here’s a simple categorized collection of what’s available, with a focus on the free and open source stuff.

This site features a curated selection of data visualization tools meant to bridge the gap between programmers/statisticians and the general public by only highlighting free/freemium, responsive and relatively simple-to-learn technologies for displaying both basic and complex, multivariate datasets. It leans heavily toward open-source software and plugins, rather than enterprise, expensive B.I. solutions.

I found some broken links, and the descriptions need a little editing, but it’s a good place to start.

Also, if you’re just starting out with visualization, you might find all the resources a bit overwhelming. If that’s the case, don’t fret. You don’t have to learn how to use all of them. Let your desired outcomes guide you. Here’s what I use.


Weekly Initial Unemployment Claims decrease to 234,000



The DOL reported:

In the week ending January 14, the advance figure for seasonally adjusted initial claims was 234,000, a decrease of 15,000 from the previous week’s revised level. The previous week’s level was revised up by 2,000 from 247,000 to 249,000. The 4-week moving average was 246,750, a decrease of 10,250 from the previous week’s revised average. This is the lowest level for this average since November 3, 1973 when it was 244,000. The previous week’s average was revised up by 500 from 256,500 to 257,000.

There were no special factors impacting this week’s initial claims.
[emphasis added]

The previous week was revised up.

The following graph shows the 4-week moving average of weekly claims since 1971.


The dashed line on the graph is the current 4-week average. The four-week average of weekly unemployment claims decreased to 246,750.

This was below the consensus forecast. This is the lowest level for the four-week average since 1973 (and the population is much larger now).

The low level of claims suggests relatively few layoffs.



Counting is hard, especially when you don’t have theories


(This article was originally published at Big Data, Plainly Spoken (aka Numbers Rule Your World), and syndicated at StatsBlogs.)

In the previous post, I diagnosed one data issue with the IMDB dataset found on Kaggle. On average, the third-party face-recognition software undercounted the number of people on movie posters by 50%.

It turns out that counting the number of people on movie posters is a subjective activity. Reasonable people can disagree about the number of heads on some of those posters.

For example, here is a larger view of the Better Luck Tomorrow poster I showed yesterday:

[Image: Better Luck Tomorrow poster]

By my count, there are six people on this poster. But notice the row of photos below the red title: someone could argue that there are more than six people on this poster. (Regardless, the algorithm is still completely wrong in this case, as it counted only one head.)

So one of the “rules” I followed when counting heads is to count only those people to whom the poster’s designer is drawing attention. Under this rule, I ignore the row of photos below the red title. Also by this rule, if a poster contains a main character and his or her shadow, I count the person only once. If the poster contains a number of people in the background, such as generic soldiers on a battlefield, I do not count them.

Another rule I used is to count the back or side of a person even if I could not see his or her face, provided that the person is a main character of the movie. For example, the following Rocky Balboa poster has one person on it.

[Image: Rocky Balboa poster]

(For comparison: the algorithm counted zero heads.)

***

From the distribution of the number of heads predicted by the algorithm, I learned that some posters supposedly have dozens of people on them. So I pulled out these outliers and looked at them.
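
(For concreteness, here is a hypothetical sketch of that step in R. The data frame and column names follow the Kaggle IMDB 5000 file, movie_metadata.csv, and its facenumber_in_poster column; adjust to whatever copy of the data you have.)

```r
# Hypothetical sketch: inspect the distribution of predicted head counts
# and pull out the outlier posters.
imdb <- read.csv("movie_metadata.csv")   # Kaggle IMDB 5000 file (assumed name)

table(imdb$facenumber_in_poster)         # distribution of heads per poster

# Posters the algorithm claims have 20+ people on them
outliers <- subset(imdb, facenumber_in_poster >= 20)
outliers[, c("movie_title", "facenumber_in_poster")]
```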

This poster of The Master (2012) is said to contain 31 people.

[Image: poster for The Master (2012)]

On a closer look, this is a tessellation of a triangle of faces. Should that count as three people or lots of people? As the color fades off at the sides of the poster, should we count those barely visible faces?

Counting is harder than it seems.

***

The discussion above leads to an important issue in building models. The analyst must have some working theory about how X is related to Y. If it is believed that the number of faces on the movie poster affects movie-goers' enthusiasm, then that theory guides us to count certain people but not others.

***

If one were to keep pushing on the rationale of using this face count data, one inevitably arrives at a dead end. Here are the top results from a Google Image Search on “The Master 2012 poster”:

[Image: Google Image Search results for “The Master 2012 poster”]

Well, every movie is supported by a variety of posters. The bigger the movie, the bigger the marketing budget, and the more numerous the posters. There are two key observations from the above:

The blue tessellation is one of the key designs used for this movie. Within this design framework, some posters contain only three heads, some maybe a dozen, and some (like the one shown on IMDB) many dozens.

Further, there are at least three other design concepts, completely different from the IMDB poster, each showing a different number of people!

Going back to the theory that movie-goers respond to the poster design (in particular, the number of people on the poster), the analyst now realizes that he or she has a huge hole in the dataset. Which of these posters did the movie-goer see? Does IMDB even know which poster was seen most often?

Thus, not only are the counts subjective and imprecise, but it is not even clear that we are analyzing the right posters.

***

Once I led the students down this path, almost everyone decided to drop this variable from the dataset.


Elements of a successful #openscience #rstats workshop


(This article was first published on R – christopher lortie, and kindly contributed to R-bloggers)

What makes an open science workshop effective or successful*?

Over the last 15 years, I have had the good fortune to participate in workshops as a student and sometimes as an instructor. Consistently, there were beneficial discovery experiences, and at times, some of the processes highlighted have been transformative. Last year, I participated in Software Carpentry at UCSB and Software Carpentry at YorkU, and in the past I attended (in part) workshops such as Open Science for Synthesis. Several of us are now deciding what to attend as students in 2017. I have been wondering about the potential efficacy of the workshop model and why workshops seem to be so relatively effective. I propose that the answer is expectations. Here is a set of brief lists of observations from workshops that led me to this conclusion.

*Note: I define a workshop as effective or successful when it provides me with something practical that I did not have before the workshop. Practical outcomes can include tools, ideas, workflows, insights, or novel viewpoints from discussion. Anything that helps me do better open science. Efficacy for me is relative to learning by myself (i.e. through reading, watching webinars, or struggling with code or data), asking for help from others, taking an online course (that I always give up on), or attending a scientific conference.

Delivery elements of an open science training workshop

  1. Lectures
  2. Tutorials
  3. Demonstrations
  4. Q & A sessions
  5. Hands-on exercises
  6. Webinars or group viewing of recorded vignettes.

Summary of expectations from this list: a workshop will offer me content in more than one way, unlike a more traditional course offering. I can ask questions right there on the spot about content and get an answer.

Content elements of an open science training workshop

  1. Data and code
  2. Slide decks
  3. Advanced discussion
  4. Experts that can address basic and advanced queries
  5. A curated list of additional resources
  6. Opinions from the experts on the ‘best’ way to do something
  7. A list of problems or questions that need to be addressed or solved, both routinely and in specific contexts, when doing science
  8. A toolkit in some form associated with the specific focus of the workshop.

Summary of expectations from this list: the best, most useful content is curated. It is contemporary, and it would be a challenge for me to find this out on my own.

Pedagogical elements of an open science training workshop

  1. Organized to reflect authentic challenges
  2. Uses problem-based learning
  3. Content is very contemporary
  4. Very light on lecture and heavy on practical application
  5. Reasonably small groups
  6. Will include team science and networks to learn and solve problems
  7. Short duration, high intensity
  8. Will use an open science tool for discussion and collective note taking
  9. Will be organized by major concepts such as data & meta-data, workflows, code, data repositories OR will be organized around a central problem or theme, and we will work together through the steps to solve a problem
  10. There will be a specific, quantifiable outcome for the participants (i.e. we will learn how to do or use a specific set of tools for future work).

Summary of expectations from this list: the training and learning experience will emulate a scientific working group that has convened to solve a problem. In this case, the problem is how we can all get better at doing a certain set of scientific activities, as opposed to, say, whether a group can aggregate and summarize a global alpine dataset. These collaborative problem-solving models need not be mutually exclusive.

Higher-order expectations that summarize all these open science workshop elements

  1. Experts, curated content, and contemporary tools.
  2. Everyone is focussed exclusively on the workshop, i.e. we all try to put our lives on hold to teach and learn together rapidly for a short time.
  3. Experiences are authentic and focus on problem solving.
  4. I will have to work at trying things, but the slope of the learning curve/climb will be mediated by the workshop process.
  5. There will be some, but not too much, lecturing to give me the big picture highlights of why I need to know/use a specific concept or tool.


Randy Hunt on design at Etsy


The O’Reilly Design Podcast: Collaborating with engineering, hiring for humility, and the code debate.

In this week’s Design Podcast, I sit down with Randy Hunt, VP of design at Etsy. We talk about the culture at Etsy, why it’s important to understand the materials you are designing with, and why humility is your most important skill.


Philly Fed: Manufacturing Activity Continued to Improve in January



Earlier from the Philly Fed: Manufacturing Activity Continued to Improve in January

Economic conditions continued to improve in January, according to the firms responding to this month’s Manufacturing Business Outlook Survey. The indexes for general activity, new orders, and employment were all positive this month and increased from their readings last month. Manufacturers have generally grown more optimistic in their forecasts over the past two months. The future indexes for growth over the next six months, including employment, continued to improve this month.

The index for current manufacturing activity in the region increased from a revised reading of 19.7 in December to 23.6 this month. … The general activity index has remained positive for six consecutive months, and the activity index reading was the highest since November 2014.

Firms reported an increase in manufacturing employment this month. … The current employment index improved 9 points, registering its second consecutive positive reading.

Here is a graph comparing the regional Fed surveys and the ISM manufacturing index:

[Graph: Fed Manufacturing Surveys and ISM PMI]

The New York and Philly Fed surveys are averaged together (yellow, through January), and five Fed surveys are averaged (blue, through December), including New York, Philly, Richmond, Dallas and Kansas City. The Institute for Supply Management (ISM) PMI (red) is through December (right axis).

It seems likely the ISM manufacturing index will show faster expansion again in January.



Knoxville, TN: R for Text Analysis Workshop


(This article was originally published at r4stats.com, and syndicated at StatsBlogs.)

The Knoxville R Users Group is presenting a workshop on text analysis using R by Bob Muenchen. The workshop is free and open to the public. You can join the group at https://www.meetup.com/Knoxville-R-Users-Group. A description of the workshop follows.


R for Text Analysis

When analyzing text using R, it’s hard to know where to begin. There are 37 packages available, and there is quite a lot of overlap in what they can do. This workshop will demonstrate three popular approaches: dictionary-based content analysis, latent semantic analysis, and latent Dirichlet allocation. We will spend much of the time on the data preparation steps that are important to all text analysis methods, including data acquisition, word stemming/lemmatization, removal of punctuation and other special characters, phrase discovery, tokenization, and so on. While the examples will focus on the automated extraction of topics from the text files, we will also briefly cover the analysis of sentiment (e.g. how positive is customer feedback?) and style (who wrote this? Are they telling the truth?).

The results of each text analysis approach will be the topics found, and a numerical measure of each topic in each document. We will then merge that with numeric data and do analyses combining both types of data.

The R packages used include quanteda, lsa, topicmodels, tidytext, and wordcloud, with brief coverage of tm and SnowballC. While the workshop will not be hands-on due to time constraints, the programs and data files will be available afterwards.
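
To give a flavor of the approach, here is a minimal sketch of that kind of pipeline using quanteda and topicmodels. This is not the workshop’s actual code; the sample texts are invented, and function names may differ slightly across package versions:

```r
library(quanteda)      # tokenization, stemming, document-feature matrix
library(topicmodels)   # latent Dirichlet allocation

texts <- c(d1 = "The customer service was excellent and very fast.",
           d2 = "The product broke after two days; service was slow.",
           d3 = "Fast shipping, excellent product, happy customer.")

toks <- tokens(texts, remove_punct = TRUE)            # tokenize, drop punctuation
toks <- tokens_remove(tokens_tolower(toks), stopwords("en"))
toks <- tokens_wordstem(toks)                         # crude stemming

dtm <- dfm(toks)                                      # document-feature matrix
lda <- LDA(convert(dtm, to = "topicmodels"), k = 2)   # fit a 2-topic LDA
terms(lda, 5)                                         # top 5 stems per topic
```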

Where: University of Tennessee Humanities and Social Sciences Building, room 201. If the group gets too large, the location may move and a notice will be sent to everyone who RSVPs on Meetup.com or who registers at the UT workshop site below. You can also verify the location the day before via email with Bob at [email protected]

When: 9:05-12:05 Friday 1/27/17

Prerequisite: R language basics

Members of UT Community register at: http://workshop.utk.edu under Researcher Focused

Members of other groups please RSVP on your respective sites so I can bring enough handouts.


Are women chess players intimidated by male opponents?



The “Elo” rating system is most famous for ranking chess players, but it has now spread to many other sports and games.

Elo works like this: when you start out in competitive chess, the federation assigns you an arbitrary rating — either a standard starting rating (which I think is 1200), or one based on an estimate of your skill. Your rating then changes as you play.

What I gather from Wikipedia is that “master” starts at a rating of about 2300, and “grandmaster” around 2500. To get from the original 1200 up to the 2300 level, you just start winning games. Every game you play, your rating is adjusted up or down, depending on whether you win, lose, or draw. The amount of the adjustment depends on the difference in skill between you and your opponent: Elo estimates your odds of winning from your rating and your opponent’s rating, and the loser “pays” points to the winner. So, the better your opponents, the more points you get for defeating them.

The rating is an estimate of your skill, a “true talent level” for chess. It’s calibrated so that every 400-point difference between players is an odds ratio of 10. So, when a 1900-rated player, “Ann”, faces a 1500-rated player, “Bob,” her odds of winning are 10:1 (.909). That means that if the underdog, Bob, wins, he’ll get 10 times as many points as Ann will get if she wins.

How many points, exactly? That’s set by the chess federation in an attempt to get the ratings to converge on talent, and on the “400-point rule,” as quickly and accurately as possible. The idea is that the less information you have about the players, the more points you adjust by, because each result carries more weight towards your best estimate of talent.

For players below “expert,” the adjustment is 32 times the difference from expectation. For expert players, the adjustment is only 24 points per win, and, at the master level and above, it’s 16 points per win.

If Bob happens to beat Ann, he won 1.00 games when the expectation was that he’d win only 0.09. So, Bob exceeded expectations by 0.91 wins. Multiply by 32, and you get 29 points. That means Bob’s rating jumps from 1500 to 1529, while Ann’s drops from 1900 to 1871.

If Ann had won, she’d claim 3 points from Bob, so she’d be at 1903 and Bob would wind up at 1497.
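
To make the arithmetic concrete, here is a minimal sketch of the update rule in R (my own illustration, not any federation’s official implementation), reproducing the Ann-and-Bob numbers:

```r
# Expected score for player A against player B under the 400-point rule
elo_expected <- function(r_a, r_b) 1 / (1 + 10 ^ ((r_b - r_a) / 400))

# One-game update: score_a is 1 for a win, 0.5 for a draw, 0 for a loss
elo_update <- function(r_a, r_b, score_a, k = 32) {
  delta <- k * (score_a - elo_expected(r_a, r_b))
  c(new_a = r_a + delta, new_b = r_b - delta)
}

elo_expected(1900, 1500)    # Ann's win probability: ~0.909
elo_update(1500, 1900, 1)   # Bob's upset: ~1529 and ~1871
elo_update(1900, 1500, 1)   # Ann wins as expected: ~1903 and ~1497
```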

FiveThirtyEight recently started using Elo for their NFL and NBA ratings. It’s also used by my Scrabble app, and the world pinball rankings, and other such things. I haven’t looked it up, but I’d be surprised if it weren’t used for other games, too, like Backgammon and Go.

——-

For the record, I’m not an expert on Elo, by any means … I got most of my understanding from Wikipedia, and other internet sources. And, a couple of days ago, Tango posted a link to an excellent article by Adam Dorhauer that explains it very well.

Despite my lack of expertise, it seems to me that these properties of Elo are clearly the case:

1. Elo ratings are only applicable to the particular game they’re calculated from. If you’re an 1800 at chess, and I’m a 1600 at Scrabble, we have no idea which one of us would win at either game.

2. The range of Elo ratings varies between games, depending on the range of talent of the competitors, but also on the amount of luck inherent to the sport. If the best team in the NBA is (say) an 8:1 favorite against the worst team in the league, it must be rated 361 Elo points better. (That’s because 10 to the power of (361/400) equals 8.) But if the best team in MLB is only a 2:1 favorite, it has to be rated only 120 points better.

Elo is an estimate of odds of winning. It doesn’t follow, then, that an 1800 rating in one sport is comparable to an 1800 rating in another sport. I’m a better pinball player than I am a Scrabble player, but my Scrabble rating is higher than my pinball rating. That’s because underdogs are more likely to win at pinball. I have a chance of beating the best pinball player in the world in a single game, but I’d have no chance at all against a world-class Scrabble player.

In other words: the more luck inherent in the game, the tighter the range (smaller the standard deviation) of Elo ratings.
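
In code, the conversion between an odds ratio and a rating gap is one line each way. A quick sketch that checks the numbers above:

```r
# The "400-point rule": every 400 points of rating gap is a 10:1 odds ratio
odds_to_gap <- function(odds) 400 * log10(odds)
gap_to_odds <- function(gap) 10 ^ (gap / 400)

odds_to_gap(8)   # ~361 points: an 8:1 NBA favorite
odds_to_gap(2)   # ~120 points: a 2:1 MLB favorite
```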

3. Elo ratings are only applicable within the particular group that they’re applied to.

Last March, before the NCAA basketball tournament, FiveThirtyEight had Villanova with an Elo rating of 2045. Right now, they have the NBA’s Golden State Warriors with a rating of 1761.

Does that mean that Villanova was actually a better basketball team than Golden State? No, of course not. Villanova’s rating is relative to its NCAA competition, and Golden State’s rating is relative to its NBA competition.

If you took the ratings at face value, without realizing that, you’d be projecting Villanova as 5:1 favorites over Golden State. In reality, of course, if they faced each other, Villanova would get annihilated.

——–

OK, this brings me to a study I found on the web (hat tip here). It claims that women do worse in chess games played against men than in games against women of equal skill. The hypothesis is that women’s play suffers because they find men intimidating and threatening.

(For instance: “Girls just don’t have the brains to play chess,” (male) grandmaster Nigel Short said in 2015.)

In an article about the paper, co-author Maria Cubel writes:


“These results are thus compatible with the theory of stereotype threat, which argues that when a group suffers from a negative stereotype, the anxiety experienced trying to avoid that stereotype, or just being aware of it, increases the probability of confirming the stereotype.

“As indicated above, expert chess is a strongly male-stereotyped environment. … Expert women chess players are highly professional. They have reached a high level of mastery and they have selected themselves into a clearly male-dominated field. If we find gender interaction effects in this very selective sample, it seems reasonable to expect larger gender differences in the whole population.”


Well, “stereotype threat” might be real, but I would argue that you don’t actually have evidence of it in this chess data. I don’t think the results actually mean what the authors claim they mean.

——-

The authors examined a large database of chess results, and selected all players with a rating of at least 2000 (expert level) who played at least one game against an opponent of each of the two sexes.

After their regressions, the authors report:

“These results indicate that players earn, on average and ceteris paribus, about 0.04 fewer points [4 percentage points of win probability] when playing against a man as compared to when their opponent is a woman. Or conversely, men earn 0.04 points more when facing a female opponent than when facing a male opponent. This is a sizable effect, comparable to women playing with a 30 Elo point handicap when facing male opponents.”


The authors did control for Elo rating, of course. That was especially important because the women were, on average, less skilled than the men. The average male player in the study was rated at 2410, while the average female was only 2294. That’s a huge difference: if the average man played the average woman, the 116-point spread suggests the man would have a .661 winning percentage — roughly, 2:1 odds in favor of the man.
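
That spread is easy to verify with the elo_expected helper from the sketch above:

```r
p <- elo_expected(2410, 2294)   # average man vs. average woman
p                               # ~0.661
p / (1 - p)                     # ~1.95, i.e., roughly 2:1 odds
```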

Also, there were many more same-sex matches in the database than intersex matches. There are two reasons for that. First, many tournaments are organized by ranking; since there are many more men, proportionally, in the higher ranks, they wind up playing each other more often. Second, and probably more important, there are many women-only tournaments and competitions.

——-

So, now we see the obvious problem with the study, why it doesn’t show what the authors think it shows.

It’s the Villanova/Golden State situation, just better hidden.

The men and women have different levels of ability — and, for the most part, their ratings are based on play within their own group.

That means the men’s and women’s Elo ratings aren’t comparable, for exactly the same reason an NCAA Elo rating isn’t comparable to an NBA Elo rating. The women’s ratings are based more on their performance relative to the [less strong] women, and the men’s ratings more on their performance relative to the [stronger] men.

Of course, the bias isn’t as severe in the chess case as in the basketball case, because the women do play matches against men (while Villanova, of course, never plays against NBA teams). Still, both groups played predominantly within their own sex — the women 61 percent against other women, and the men 87 percent against other men.

So, clearly, there’s still substantial bias. The Elo ratings are only perfectly commensurable if the entire pool can be assumed to have faced a roughly equal caliber of competition. A smattering of intersex play isn’t enough.

Villanova and Golden State would still have incompatible Elos even if they played, say, one out of every five games against each other. Because, then, for the rest of their games, Villanova would go play teams that are 1500 against NCAA competition, and Golden State would go play teams that are 1500 against NBA competition, and Villanova would have a much easier time of it.

——

Having said that … if you have enough intersex games, the ratings should still work themselves out.

Because, the way Elo works, points can neither be created nor destroyed. If women play only women, and men play only men, on average, they’ll keep all the ratings points they started with, as a group. But if the men play even occasional games against the women, they’ll slowly scoop up ratings points from the women’s side to the men’s side. All that matters is *how many* of those games are played, not *what proportion*. The male-male and female-female games don’t make a huge difference, no matter how many there are.

The way Elo works, overrated players “leak” points to underrated players. No matter how wrong the ratings are to start, play enough games, and you’ll have enough “leaks” for the ratings to all converge on accuracy.

Even if 99% of women’s games are against other women, eventually, with enough games played, that 1% can add up to as many points as necessary, transferred from the women to the men, to make things work out.

——

So, do we have enough games, enough “leaks”, to get rid of the bias?

Suppose both groups, the men and the women, started out at 1200. But the men were better. They should have been 1250, and the women should have been 1150. The woman/woman games and man/man games will keep both averages at 1200, so we can ignore those. But the woman/man games will start “leaking” ratings points to the men’s side.

Are there enough woman/man games in the database that the men could unbias the women’s ratings by capturing enough of their ratings points?

In the sample, there were 5,695 games by those women experts (rating 2000+) who played at least one man. Of those games, 61 percent were woman against woman. That leaves 2,221 games where expert women played (expert or inexpert) men.

By a similar calculation, there were 2,800 games where expert men played (expert or inexpert) women.

There’s probably lots of overlap in those two sets of games, where an expert man played an expert woman. Let’s assume the overlap is 1,500 games, so we’ll reduce the total to 3,500.

How much leakage do we get in 3,500 games?

Suppose the men really are exactly 116 points better in talent than the women, like their ratings indicate — which would be the case if the leakage did, in fact, take care of all the bias.

Now, consider what would have happened if there were no leakage. If each sex played only within itself, the women would be overrated by 116 points (since they’d have equal average ratings, but the men would be 116 points more talented).

Now, introduce intersex games. The first time a woman played a man, she’d be the true underdog by 116 points. Her male opponent would have a .661 true win probability, but would be treated by Elo as if he had only .500. So, the male group would gain .161 wins in expectation on that game. At 24 points per win (the expert-level adjustment), that’s 3.9 points.

After that game, the sum of ratings on the woman’s side drops by 3.9 points, so now, the women won’t be quite as overrated, and the advantage to the men will drop. But, to be conservative, let’s just keep it at 3.9 points all the way through the set of 3,500 games. Let’s even round it to 4 points.

Four points of leakage, multiplied by 3,500 games, is 14,000 ratings points moving from the women’s side to the men’s side.

There were about 2,000 male players in the study, and 500 female players. Let’s ignore their non-expert opponents, and assume all the leakage came from these 2,500 players.

That means the average female player would have (legitimately) lost 28 points due to leakage (14,000 divided by 500). The average male player would gain 7 points (14,000 divided by 2000).

So, that much leakage would have cut the male/female ratings bias by 35 points.

But, since we started the process with a known 116 points of bias, we’re left with 81 points still remaining! Even with such a large database of games, there aren’t enough male/female games to get rid of more than 30 percent of the Elo bias caused by unbalanced opponents.
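
For the record, here is that back-of-the-envelope calculation in one place, a sketch under the same assumptions as the text (3,500 intersex games, a flat 4 points of leakage per game, 500 women, 2,000 men, and the elo_expected helper from earlier):

```r
games    <- 3500                                   # assumed man-vs-woman games
per_game <- 24 * (elo_expected(2410, 2294) - 0.5)  # ~3.9 points, rounded to 4

leak <- 4 * games         # 14,000 rating points moving to the men's side
leak / 500                # ~28 points lost by the average woman
leak / 2000               # ~7 points gained by the average man
116 - (leak / 500 + leak / 2000)   # ~81 points of bias left over
```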

If the true bias should be 81 points, why did the study find only 30?

Because the sample of games in the study isn’t a complete set of all the games that went into every player’s rating. For one thing, it’s just the results of major tournaments, the ones that were significant enough to appear in “The Week in Chess,” the publication from which the authors compiled their data. For another, the authors used only 18 months’ worth of data, but most of these expert players have been playing chess for years.

If we included all the games that all the players ever played, would that be enough to get rid of the bias? We can’t tell, because we don’t know the number of intersex games in the players’ full careers.

We can say hypothetically, though. If the average expert played three times as many games as logged in this 18-month sample, that still wouldn’t be enough — it would cover only 105 of the 116 points. Actually, it would be a lot less, because once the ratings start to become accurate, the rate of correction decelerates. By the time half the bias is covered, the remaining bias corrects at only 2 points per between-sex game, rather than 4.

Maybe we can do this with a geometric argument. The data in the sample reduced the bias from 116 to 81, which is 70 percent of the original. So, a second set of data would reduce the bias to 57 points. A third set would reduce it to 40 points. And a fourth set would reduce it to 28 points, which is about what the study found.
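
A quick sketch of that geometric decay, assuming each equal-sized batch of games preserves roughly 70 percent of the remaining bias:

```r
bias <- 116
for (batch in 1:4) {
  bias <- bias * 0.70    # each batch removes ~30% of the remaining bias
  cat(round(bias), "")   # prints: 81 57 40 28
}
```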

So, if every player in this study actually had four times as many man vs. woman games as were in this database, that would *still* not be enough to reduce the bias below what was found in the study.

And, again, that’s conservative. It assumes the same players in all four samples. In real life, new players come in all the time, and if the new males tend to be better than the new females, that would start the bias all over again.

——-

So, I can’t prove, mathematically, that the 30-point discrepancy the authors found is an expected artifact of the way the rating system works. I can only show why it should be strongly suspected.

It’s a combination of the fact that, for whatever reason, the men are stronger players than the women, and, again for whatever reason, there are many fewer male-female games than you need for the kind of balanced schedule that would make the ratings comparable.

And while we can’t say for sure that this is the cause, we CAN say — almost prove — that this is exactly the kind of bias that happens, mathematically, unless you have enough male-female games to wash it out.

I think the burden is on the authors of the study to show that there’s enough data outside their sample to wash out that inherent bias, before introducing alternative hypotheses. Because we *know* this specific effect exists, has positive sign, depends on data that’s not given in the study, and could plausibly be exactly the size of the observed effect!

(Assuming I got all this right. As always, I might have screwed up.)

——-

So I think there’s a very strong case that what this study found is just a form of this “NCAA vs. NBA” bias. It’s an effect that must exist — it’s just the size that we don’t know. But intuitive arguments suggest the size is plausibly pretty close to what the study found.

So it’s probably not that women perform worse against men of equal talent. It’s that women perform worse against men of equal ratings.


Peter Backus, Maria Cubel, Matej Guid, Santiago Sánchez-Pagés, Enrique López Mañas: Gender, Competition and Performance: Evidence from Real Tournaments.

