Science and Technology links (August 11th, 2017)


It looks like the Java programming language might finally get in-language support for vector instructions. These instructions are supported by modern processors and can multiply processing speed, but they often require different algorithms. Language designers have often ignored vector instructions, relying instead on optimizing compilers to “auto-vectorize” the code. Optimizing compilers are very good, but experience shows that language support for performance features pays off.

It seems that armies worldwide are toying with the idea of smart full-face military helmets. These helmets are bulletproof, offer night vision and heat detection, and provide augmented reality. Sounds like something out of Star Wars.

Whey is a common byproduct of milk processing. It is cheap. Bodybuilders use whey protein supplements to grow bigger muscles. It seems that supplementing with whey proteins might be a good strategy for all of us, young and old. Personally, I remain somewhat cautious. We don’t know the long-term effects of loading up on proteins. It is a popular thing to do, but that does not make it safe.

Parkinson’s is a terrible disease that currently cannot be stopped or even slowed. A diabetes drug called exenatide appeared to have halted the progression of the disease in a small trial. If this could be verified, it would be a historic event.


ggvis Exercises (Part-2)


(This article was first published on R-exercises, and kindly contributed to R-bloggers)

INTRODUCTION

The ggvis package is used to make interactive data visualizations. The fact that it combines shiny’s reactive programming model with dplyr’s grammar of data transformation makes it a useful tool for data scientists.

The package allows us to implement features like interactivity, but on the other hand, every interactive ggvis plot must be connected to a running R session.
Before proceeding, please follow our short tutorial.

Look at the examples given and try to understand the logic behind them. Then try to solve the exercises below using R, without looking at the answers. Finally, check the solutions to verify your answers.
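
As a quick reminder of the basic pattern before you start, here is a minimal sketch using the built-in mtcars data (rather than Cars93, so the answers are not given away): a data frame is piped into ggvis() and layers are added to it.

library(ggvis)

# Scatterplot: pipe a data frame into ggvis(), then add a points layer
mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points()

# Histogram of a single variable
mtcars %>%
  ggvis(~mpg) %>%
  layer_histograms()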

Exercise 1

Create a list which will include the variables “Horsepower” and “MPG.city” of the “Cars93” data set and make a scatterplot. HINT: Use ggvis() and layer_points().

Exercise 2

Add a slider to the scatterplot of Exercise 1 that sets the point size from 10 to 100. HINT: Use input_slider().

Learn more about using ggvis in the online course R: Complete Data Visualization Solutions. In this course you will learn how to:

  • Work extensively with the ggvis package and its functionality
  • Learn what visualizations exist for your specific use case
  • And much more

Exercise 3

Add a slider to the scatterplot of Exercise 1 that sets the point opacity from 0 to 1. HINT: Use input_slider().

Exercise 4

Create a histogram of the variable “Horsepower” of the “Cars93” data set. HINT: Use layer_histograms().

Exercise 5

Set the width and the center of the histogram bins you just created to 10.

Exercise 6

Add 2 sliders to the histogram you just created, one for width and the other for center with values from 0 to 10 and set the step to 1. HINT: Use input_slider().

Exercise 7

Add the labels “Width” and “Center” to the two sliders respectively. HINT: Use label.

Exercise 8

Create a scatterplot of the variables “Horsepower” and “MPG.city” of the “Cars93” dataset with size = 10 and opacity = 0.5.

Exercise 9

Add to the scatterplot you just created a function which will set the size with the left and right keyboard controls. HINT: Use left_right().

Exercise 10

Add interactivity to the scatterplot you just created using a function that shows the value of the “Horsepower” when you “mouseover” a certain point. HINT: Use add_tooltip().
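
For orientation only, here is a minimal, hedged sketch of how the interactive helpers mentioned in the hints are typically wired up, again using the built-in mtcars data so that the Cars93 solutions stay unspoiled. Remember that each of these plots needs a running R session to stay interactive.

library(ggvis)

# A slider controlling point size (from 10 to 100), with a label
mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points(size := input_slider(10, 100, label = "Size"))

# The left and right arrow keys controlling point size
key_size <- left_right(10, 100, value = 50)
mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points(size := key_size)

# A tooltip showing a value on mouseover
mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points() %>%
  add_tooltip(function(df) df$wt, "hover")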


Optimizing polynomial hash functions (Java vs. Swift)


In software, hash functions are ubiquitous. They map arbitrary pieces of data (strings, arrays, …) to fixed-length integers. They are the key ingredient of hash tables which are how we most commonly implement maps between keys and values (e.g., between someone’s name and someone’s phone number).

A couple of years ago, I pointed out that you could almost double the speed of the default hash functions in Java with a tiny bit of effort. I find it remarkable that you can double the performance of standard ubiquitous functions so easily.

Richard Startin showed that this remains true today with Java 9. I used String hashing as an example; Richard makes the same demonstration using the Arrays.hashCode function, but the idea is the same.

You might object that, maybe, the performance of hash functions is irrelevant. That might be true for your application, but it gives you a hint as to how much you can speed up your software by tweaking your code.

In any case, I decided to use this experiment as a comparison point with Apple’s Swift language. Swift is the default language when building iOS applications whereas Java is the default language when building Android applications… and, in this sense, they are competitors.

I am not, for example, trying to determine whether Swift is better than Java. This sort of question is meaningless. However, I am trying to gain some perspective on the problem.

Whereas Java offers Arrays.hashCode as a way to hash arrays, I believe that Swift flat out abstains from helping. If you need to hash arrays, you have to roll your own function.

So let us write something in Swift that is equivalent to Java, a simple polynomial hash function:

func simpleHash(_ array : [Int]) -> Int {
  var hash = 0
  for x in array {
    hash = hash &* 31 &+ x
  }
  return hash
}

There are ampersands everywhere because Swift crashes on signed overflows. So if you have a 64-bit system and you write (1<<63)*2, then your program halts. This is viewed as being safer. You need to prefix your operators with the ampersand to keep the code running.

We can “unroll” the loop, that is, process the data in blocks of four values. You can expect larger blocks to provide faster performance, albeit with diminishing returns.

Of course, if you are working with tiny arrays, this optimization is useless, but in such cases, you probably do not care too much about the performance of the hash function.

The code looks a bit more complicated, but what we have done is not sophisticated:

func unrolledHash(_ array : [Int]) -> Int {
  var hash = 0
  let l = array.count/4*4
  for i in stride(from:0,to:l,by:4) {
    hash = hash &* 31  &* 31  &* 31  &* 31  
           &+ array[i]  &* 31  &* 31  &* 31 
           &+ array[i + 1]  &* 31  &* 31    
           &+ array[i + 2]  &* 31  
           &+ array[i + 3]
  }
  for i in stride(from:l,to:array.count,by:1) {
      hash = hash &* 31 &+ array[i]
  }
  return hash
}

I have designed a little benchmark that hashes a large array using Swift 3.0. I ran it on a Linux box with a recent processor (Intel Skylake) running at 3.4 GHz. As in the Java experiments, the unrolled hash function is nearly twice as fast:

simple hash 0.9 ns/element
unrolled hash 0.5 ns/element

In the unrolled case, we are using about 1.7 CPU cycles per element value (0.5 ns × 3.4 GHz), against about 3 CPU cycles (0.9 ns × 3.4 GHz) in the simple case.

Swift 4.0 is around the corner, so I tried running the benchmark with a pre-release version of Swift 4.0; the performance difference remained.

As usual, my code is available. It should run under Linux and macOS. It might possibly run under Windows if you are the adventurous type.

So, at a glance, Swift does not differ too much from Java: the relative performance gap between hand-tuned hash functions and naive hash functions is the same.

Obviously, it might be interesting to extend these investigations beyond Java and Swift. My current guess is that the gap will mostly remain. That is, I conjecture that while some optimizing compilers will be able to unroll the loop, none of them will do as well as the simple manual unrolling. In effect, I conjecture Java and Swift are not being particularly dumb.


BLS: Unemployment Rates Unchanged in 46 states in July, Two States at New Series Lows



From the BLS: Regional and State Employment and Unemployment Summary

Unemployment rates were higher in July in 3 states, lower in 1 state, and stable in 46 states and the District of Columbia, the U.S. Bureau of Labor Statistics reported today. Twenty-seven states had jobless rate decreases from a year earlier and 23 states and the District had little or no change. The national unemployment rate, 4.3 percent, was little changed from June but was 0.6 percentage point lower than in July 2016.

North Dakota and Colorado had the lowest unemployment rates in July, 2.2 percent and 2.4 percent, respectively. The rates in North Dakota (2.2 percent) and Tennessee (3.4 percent) set new series lows. (All state series begin in 1976.) Alaska had the highest jobless rate, 7.0 percent.
emphasis added

State Unemployment graph.

This graph shows the current unemployment rate for each state (red), and the max during the recession (blue). All states are well below the maximum unemployment rate for the recession.

The size of the blue bar indicates the amount of improvement. The yellow squares are the lowest unemployment rate per state since 1976.

Ten states have reached new all-time lows since the end of the 2007 recession. These ten states are: Arkansas, California, Colorado, Maine, Mississippi, North Dakota, Oregon, Tennessee, Washington, and Wisconsin.

The states are ranked by the highest current unemployment rate. Alaska, at 7.0%, had the highest state unemployment rate.

The second graph shows the number of states (and D.C.) with unemployment rates at or above certain levels since January 2006. At the worst of the employment recession, there were 11 states with an unemployment rate at or above 11% (red).

Currently one state has an unemployment rate at or above 7% (light blue); only two states and D.C. are at or above 6% (dark blue). The states are Alaska (7.0%) and New Mexico (6.3%). D.C. is at 6.4%.



Bubbling up is lowering empathy at a civilization scale


Computer networks are a fantastic invention. When they came into my life, I remember spending hours, sometimes days, arguing with people I violently disagreed with. At first glance, it looks like a complete waste of time, but experience has taught me that it is tremendously precious. Not because it changes minds, but because it keeps minds open.

My friend Seb Paquet invented the concept of ridiculously easy group formation. Seb observed in the early 2000s that it was now possible to create “groups” using web tools with very little effort. I have always been suspicious of groups. I prefer open networks to tightly organized groups. Still, I initially viewed Seb’s observation positively. People organizing more easily ought to foster a more collaborative society.

Maybe it did, at least initially, have very positive outcomes, but I am now very concerned with one particularly vicious side-effect: people organize along “party lines” for conflict. Instead of arguing with people we disagree with, instead of seeking common ground, we “unfriend” them. In effect, we are creating cognitive bubbles of like-minded individuals around us. I believe that this might lower empathy at a civilization scale.

I don’t want to bring everything back to Donald Trump, but he is an obligatory reference. It is not that Trump is so interesting to me, but what he reveals is important. A week before the election, a student of mine asked me if Trump could get elected. I categorically said no. Trump could not get elected. How could he be, given that all the smart people I knew predicted he wouldn’t?

Then Trump was elected.

You can call it an accident if you want, but I don’t think that is a fair characterization. What this exposed to me is that I had bubbled up too much. I was no longer able to see the world in a sufficiently clear manner to make reasonable predictions. I should have been able to anticipate Trump’s election. I wasn’t. I failed.

I can be excused because I am Canadian. But the US is our powerful neighbor so I ought to know what is going on over there.

So what did I do? I decided to subscribe to blogs and podcasts of people who do support Trump. Scott Adams, the cartoonist behind Dilbert, is first on my list. I have been following Scott Adams and he has exposed me to a whole other way to view the world. At least as far as Trump is concerned, Scott Adams has a much better predictive record than anyone else I know.

It is not that Scott is right and other people are wrong. It would be nice if we could divide things up into right and wrong, black and white. It is never that simple, even when you think it should be.

But I don’t think that’s what most people who did not see the election of Trump coming did. I suspect that many of them just doubled down on the bubbling. They started to actively seek out anyone who might not toe the party line and to exclude them from their view. So we got a more divided world.

We have lots of experience with such political bubbles. Fascism, Soviet-era Russia and Mao’s China are glaring examples of what this can lead to in the most extreme cases: brutal violence. We don’t want that. It is hell on Earth.

And it may be reassuring to think that it is your side that will crush the other side, but that’s very dangerous thinking. The very people who are “on your side” may soon either turn against you or hurt you indirectly. That’s what history taught us. Very few people did well under Mao.

We are all tempted by virtue signaling. Given any topic, we look around in our group and tend toward what is perceived as the “moral thing”. It is a powerful force. Sadly, it is also what makes fascism possible in the first place. It is at the core of our inner Nazi.

I believe that, inadvertently maybe, we have built tools (software and laws) that favor virtue signaling and bubbling up over growing empathy for diverse points of view. We create groups and disconnected networks, and get stuck in a repeating feedback loop. Our positions become fossils, unable to evolve or change once set.

A friend of mine once remarked how surprisingly difficult it can be to host a blog where people come to vehemently disagree. And that is something we are losing. Walled gardens with algorithmically selected posts are replacing blogs. The bloggers who remain often close their comment feeds, or close them to “the other side”. We are pushing the debates to the edges, down into the YouTube comments, where all we get is toxicity.

I am not arguing for a renewal of blogging, but I am arguing that “bubbling up” should become pejorative. We should not celebrate people who cater to an audience sharing a given dogma. We should look at these people with suspicion. And, yes, maybe we need to make it easier, through technology, for people to find diverse points of view. So, on a given issue, instead of presenting to users only whatever is most likely to please them, we should present them with a spectrum of views.

And we definitely need to stop needlessly characterizing individuals. It’s not just the name calling per se that should stop, but the clustering of individuals into the acceptable and the unacceptable. Identity politics should go.

It is hard to take this path; it is much easier to continue with virtue signaling and bubbles. But I think that most of us don’t feel that we really belong in these closed groups in the first place. Most of us have nuanced points of view. Most of us don’t like to divide the world in two. Most of us are open and undecided. Most of us are interested in talking with people who have different points of view. We cherish it.

It takes social engineering to make the worst in us come out. The Nazi regime had to forcefully close down Jewish stores because Germans, even under Nazi rule, liked to do business with Jews. If we take force out of the equation, people do get along, even when they disagree.

Final quotes:

  • “He who knows only his own side of the case knows little of that. His reasons may be good, and no one may have been able to refute them. But if he is equally unable to refute the reasons on the opposite side, if he does not so much as know what they are, he has no ground for preferring either opinion… Nor is it enough that he should hear the opinions of adversaries from his own teachers, presented as they state them, and accompanied by what they offer as refutations. He must be able to hear them from persons who actually believe them…he must know them in their most plausible and persuasive form.” (John Stuart Mill)

  • Confronting, hearing, and countering offensive speech we disagree with is a skill. And one that should be considered a core requirement at any school worth its salt.

    In that regard, recent incidents suggest that colleges are fundamentally failing their students in imparting these skills. In just the past few weeks, from one campus to another and another and another, liberal students have silenced conservative speakers with violence, outrage, and threats. This collection of heckler’s vetoes is the farthest thing from a victory for the progressive causes these students champion.

    These incidents have not shut down a single bad idea. To the contrary, they’ve given their opponents’ ideas credence by adding the power of martyrdom. When you choose censorship as your substantive argument, you lose the debate.

    (Lee Rowland, Senior Staff Attorney, ACLU)


Starting a Rmarkdown Blog with Bookdown + Hugo + Github


(This article was first published on R – Tales of R, and kindly contributed to R-bloggers)

Finally, after 24 hours of failed attempts, I could get my favourite Hugo theme up and running with RStudio and Blogdown.

All the steps I followed are detailed in my new Blogdown entry, which is also a GitHub repo.

After exploring some alternatives, like Shirin’s (with Jekyll) and Amber Thomas’s advice (which involved Git skills beyond my basic abilities), I was able to install Yihui’s hugo-lithium-theme in a new repository.

However, I wanted to explore other blog templates hosted on GitHub, like:

The first three themes are currently linked in the blogdown documentation as being the simplest and easiest to set up for inexperienced blog programmers, but I hope the list will grow in the following months. For those who are willing to experiment, the complete list is here.

Finally I chose the hugo-tranquilpeak theme, by Thibaud Leprêtre, for which I mostly followed Tyler Clavelle’s entry on the topic. This approach turned out to be easy and good, given some conditions:

  • Contrary to Yihui Xie’s advice, I chose github.io to host my blog, instead of Netlify (I love my desktop integration with GitHub, so it was interesting for me not to move to another service for my static content).
  • On my machine, I installed Blogdown & Hugo using RStudio (v 1.1.336).
  • On GitHub, it was easier for me to host the blog directly in my main GitHub Pages repository (always named [USERNAME].github.io), in the master branch, following Tyler’s tutorial.
  • I have basic knowledge of HTML, CSS and JavaScript, so I didn’t mind tinkering around with the theme.
  • My custom styles didn’t involve theme rebuilding. At this moment they’re simple cosmetic tricks.

The steps I followed were:

Git & GitHub repos

  • Setting up a GitHub repo with the name [USERNAME].github.io (in my case aurora-mareviv.github.io). See this and this.
  • Create a git repo in your machine:
    • Create manually a new directory called [USERNAME].github.io.
    • Run in the terminal (Windows users have to install git first):
    cd /Git/[USERNAME].github.io # your path may be different
    
    git init # initiates repo in the directory
    git remote add origin https://github.com/[USERNAME]/[USERNAME].github.io # connects git local repo to remote Github repo
    
    git pull origin master # in case you have LICENSE and Readme.md files in the GitHub repo, they're downloaded
  • For now, your repo is ready. We will now focus on creating & customising our Blogdown site.

RStudio and blogdown

  • We will open RStudio (v 1.1.336, development version as of today).
    • First, you may need to install Blogdown in R:
    install.packages("blogdown")
    • In RStudio, select the Menu > File > New Project following the lower half of these instructions. The wizard for setting up a Hugo Blogdown project may not yet be available in your RStudio version (though probably not for much longer); a console alternative is sketched after the screenshots below.

Screenshots: creating the new project, selecting the Hugo Blogdown format, selecting the Hugo Blogdown theme; finally, a config.toml file appears.
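
If the wizard is not available in your RStudio version, the site can also be created from the R console. Here is a minimal sketch; note that the GitHub path kakawait/hugo-tranquilpeak-theme is my assumption for Thibaud Leprêtre’s theme, so substitute whichever theme you actually chose:

# Install Hugo from R (only needed once)
blogdown::install_hugo()

# Create a new site in the project directory, downloading a theme from GitHub.
# NOTE: the theme path below is an assumption; replace it with your chosen theme.
blogdown::new_site(theme = "kakawait/hugo-tranquilpeak-theme")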


Customising paths and styles

Before we build and serve our site, we need to tweak a couple of things in advance if we want to smoothly deploy our blog to GitHub Pages.

Modify config.toml file

To integrate with GitHub Pages, these are the essential modifications at the top of our config.toml file:

  • We need to set up the base URL to the “root” of the web page (https://[USERNAME].github.io/ in this case)
  • By default, the web page is published in the “public” folder. We need it to be published in the root of the repository, to match the structure of the GitHub master branch:
baseurl = "/"
publishDir = "."
  • Other useful global settings:
ignoreFiles = ["\\.Rmd$", "\\.Rmarkdown$", "_files$", "_cache$"]
enableEmoji = true

Images & styling paths

We can revisit the config.toml file to make changes to the default settings.

The logo that appears in the corner must be in the root folder. To modify it in the config.toml:

picture = "logo.png" # the path to the logo

The cover (background) image must be located in /themes/hugo-tranquilpeak-theme/static/images. To modify it in the config.toml:

coverImage = "myimage.jpg"

We want some custom CSS and JS. We need to locate them in /static/css and in /static/js respectively.

# Custom CSS. Put here your custom CSS files. They are loaded after the theme CSS;
# they have to be referred from static root. Example
customCSS = ["css/my-style.css"]

# Custom JS. Put here your custom JS files. They are loaded after the theme JS;
# they have to be referred from static root. Example
customJS = ["js/myjs.js"]

Custom css

We can add arbitrary classes to our css file (see above).

Since I started writing with Bootstrap, I miss it a lot. This theme already has Bootstrap classes, so I brought in some others I didn’t find in the theme (they’re available for .md files, but currently not for .Rmd).

Here is my custom css file to date:

/* @import url('https://maxcdn.bootstrapcdn.com/bootswatch/3.3.7/cosmo/bootstrap.min.css'); may conflict with default theme*/
@import url('https://fonts.googleapis.com/icon?family=Material+Icons'); /*google icons*/
@import url('https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css'); /*font awesome icons*/

.input-lg {
  font-size: 30px;
}
.input {
  font-size: 20px;
}
.font-sm {
    font-size: 0.7em;
}
.texttt {
  font-family: monospace;
}
.alert {
  padding: 15px;
  margin-bottom: 20px;
  border: 1px solid transparent;
  border-radius: 4px;
}
.alert-success {
  color: #3c763d;
  background-color: #dff0d8;
  border-color: #d6e9c6;
}
.alert-danger,
.alert-error {
  color: #b94a48;
  background-color: #f2dede;
  border-color: #eed3d7;
}
.alert-info {
  color: #3a87ad;
  background-color: #d9edf7;
  border-color: #bce8f1;
}
.alert-gray {
  background-color: #f2f3f2;
  border-color: #f2f3f2;
}

/* style for printing */
@media print {
  .noPrint {
    display: none;
  }
}

/*link formatting*/
a:link {
    color: #478ca7;
    text-decoration: none;
} 
a:visited {
    color: #478ca7;
    text-decoration: none;
}
a:hover {
    color: #82b5c9;
    text-decoration: none;
}

Also, we have font-awesome icons!

Site build with blogdown

Once our theme is ready, we can add some content, modifying or deleting the various examples we will find in /content/post.

We need to make use of Blogdown & Hugo to compile our .Rmd file and create our html post:

blogdown::build_site()
blogdown::serve_site()

In the Viewer, at the right side of the IDE, you can examine the resulting html and check whether something went wrong.

Deploying the site

Updating the local git repository

This can be done with simple git commands:

cd /Git/[USERNAME].github.io # your path to the repo may be different
git add . # indexes all files that will be added to the local repo
git commit -m "Starting my Hugo blog" # adds all files to the local repo, with a commit message

Pushing to GitHub

git push origin master # we push the changes from the local git repo to the remote repo (GitHub repo)

Just go to the page https://[USERNAME].github.io and enjoy your blog!


R code

It works just the same as in Rmarkdown: R code is compiled into html and published as static web content in a few steps. Welcome to the era of reproducible blogging!

Figure 1 uses the ggplot2 library:

library(ggplot2)
ggplot(diamonds, aes(x=carat, y=price, color=clarity)) + geom_point()

Figure 1: diamonds plot with ggplot2.

Rmd source code

You can download it from here

I, for one, welcome the new era of reproducible blogging!



On Melissa O’Neill’s PCG random number generator


Computers often need random numbers. Most times, random numbers are not actually random… in the sense that they are the output of a mathematical function that is purely deterministic. And it is not even entirely clear what “really random” would mean. It is not clear that we live in a randomized universe… it seems more likely that our universe is deterministic but that our limited access to information makes randomness a useful concept. Still, very smart people have spent a lot of time defining what random means, and it turns out that mathematical functions can be said to produce “random” outputs in a reasonable sense.

In any case, many programmers have now adopted a new random number generator called PCG, designed by Professor Melissa O’Neill from Harvey Mudd College.

What O’Neill did is quite reasonable. She asked herself whether we could produce better random number generators, wrote a paper, and published code. The result was quickly adopted by engineers worldwide.

She also submitted her paper for consideration in what I expect to be a good, well-managed journal.

Her manuscript grew lengthy over time and maybe exceeded some people’s style sensibilities; she justifies herself in this manner:

I prefer to write papers that are broadly accessible. I’d rather write a paper that can be enjoyed by people who are interested in the topic than one that can only be understood by a tiny number of experts. I don’t agree with the philosophy that the more impenetrable the paper, the better the work must be! Describing desirable qualities in detail seemed to be necessary for the paper to make sense to anyone not deeply entrenched in the field. Doing so also seemed necessary for anyone in the field who only cared about a subset of the qualities I considered desirable—I would need to convince them that the qualities they usually didn’t care about were actually valuable too.

As I pointed out, she had a real-world impact:

While attending PLDI and TRANSACT in June of 2015, I got one of the first clues that my work had had real impact. I can’t remember the talk or the paper, but someone was saying how their results had been much improved from prior work by switching to a new, better, random number generator. At the end I asked which one. It was PCG.

Meanwhile, at least one influential researcher (whose work I respect) had harsh words publicly for her result:

I’d be extremely careful before taking from granted any claim made about PCG generators. Wait at least until the paper is published, if it ever happens. (…) Several claims on the PCG site are false, or weasel words (…) You should also be precise about which generator you have in mind—the PCG general definition covers basically any generator ever conceived. (…) Note that (smartly enough) the PCG author avoids carefully to compare with xorshift128+ or xorshift1024*.

Her paper was not accepted. She put it in those terms:

What was more interesting were the ways in which the journal reviewing differed from the paper’s Internet reception. Some reviewers found my style of exposition enjoyable, but others found it too leisurely and inappropriately relaxed. (…) An additional difference from the Internet reaction was that some of the TOMS reviewers felt that what I’d done just wasn’t very mathematically sophisticated and was thus trivial/uninteresting. (…) Finally, few Internet readers had complained that the paper was too long but, as I mentioned earlier, the length of the paper was a theme throughout all the reviewing. (…) Regarding that latter point, I am, on reflection, unrepentant. I wanted to write something that was broadly accessible, and based on other feedback I succeeded.

I emailed O’Neill questions a couple of times, but she never got back to me.

So we end up with this reasonably popular random number generator, based on a paper that you can find online. As far as I can tell, the work has not been described and reviewed in a standard peer-reviewed manner. Note that though she is the inventor, nothing precludes us from studying her work and writing papers about it.

John D. Cook has been doing some work in this direction on his blog, but I think that if we believe in the importance of formal scientific publications, then we ought to cover PCG in such publications, if only to say why it is not worth consideration.

What is at stake here is whether we care for formal scientific publications. I suspect that Cook and O’Neill openly do not care. The reason you would care, fifty years ago, is that without the formal publication, you would have a hard time distributing your work. That incentive is gone. As O’Neill points out, her work is receiving citations, and she has significant real-world impact.

At least in software, there has long been a relatively close relationship between engineering and academic publications. These do not live in entirely separate worlds. I do not have a good sense as to whether they are moving apart. I think that they might be. Aside from hot topics like deep learning, I wonder whether the academic publications are growing ever less relevant to practice.
