Today's Riddler features a secretary problem, where one measures N random variables sequentially until one deems the current variable to be the largest of the whole sample. The classical secretary problem has a counter-intuitive solution in which one first observes N/e random variables without making any decision, and then, and only then, picks the first subsequent outcome larger than the largest of that first group. The added information in the current riddle is that the iid random variables are uniform on {1,…,M}, which calls for a modification of the algorithm: for instance, when the current draw is M itself, one should obviously stop.
The approach I devised is clearly suboptimal: pick the currently observed value if the (conditional) probability that it is the largest exceeds the probability that a subsequent draw will be larger. This translates into the following R code:
M=100 #maximum value
N=10 #total number of draws
hyprob=function(m){
# m is the sequence of draws so far; the body of the test below was
# truncated in the original post, so it is reconstructed here from the
# rule described above: return the probability that the current draw
# is the overall maximum of the N iid uniform draws
n=length(m);mmax=max(m)
if (m[n]<mmax) return(0)
(mmax/M)^(N-n)}

which produces a winning rate of around 62% when N=10 and M=100, hence much better than the expected performance of the classical secretary algorithm, whose winning frequency is 1/e.
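As a cross-check of the claimed winning rate, here is a minimal standalone simulation of the stopping rule described above (in Python; the 0.5 decision threshold and the (mmax/M)^(N-n) survival probability are my reading of the rule, not the original code):

```python
import random

M, N = 100, 10  # maximum value, total number of draws

def play(rng):
    """Play one game with the stop-when-probably-largest rule."""
    x = [rng.randint(1, M) for _ in range(N)]
    for n in range(1, N + 1):
        mmax = max(x[:n])
        # probability the current running maximum survives the
        # N-n remaining iid uniform draws on {1, ..., M}
        p = (mmax / M) ** (N - n)
        if x[n - 1] == mmax and p > 0.5:
            return x[n - 1] == max(x)  # did we stop on the true maximum?
    return x[-1] == max(x)             # forced to take the last draw

rng = random.Random(42)
rate = sum(play(rng) for _ in range(20000)) / 20000
print(round(rate, 3))  # noticeably above the 1/e ~ 0.37 of the classical rule
```

The simulated frequency lands in the same region as the ~62% reported in the post, well above the 1/e baseline.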
Filed under: Kids, R Tagged: mathematical puzzle, R, secretary problem, stopping rule, The Riddler
To leave a comment for the author, please follow the link and comment on their blog: R – Xi'an's Og.
Open Source Mail Delivery, Superhuman AI, Probabilistic Graphical Models, and Golden Ages
- Postal — A fully featured open source mail delivery platform for incoming & outgoing e-mail, like SendGrid but open source. I enjoyed this comment on Hacker News, where the commenter talks about turning a $1K/mo mail bill into $4/mo by running their own mail infrastructure. (Downside: you would need to get yourself familiar with SMTP, postfix, SPF/DKIM, mx-validation, blacklists, etc. And by “familiar,” I mean “learn it to the core.”)
- The Myth of a Superhuman AI (Kevin Kelly) — he makes a good argument that buried in this scenario of a takeover of superhuman artificial intelligence are five assumptions that, when examined closely, are not based on any evidence. These claims might be true in the future, but there is no evidence to date to support them.
- Probabilistic Graphical Models — CS228 course notes turned into a concise introductory course […]. This course starts by introducing probabilistic graphical models from the very basics and concludes by explaining from first principles the variational auto-encoder, an important probabilistic model that is also one of the most influential recent results in deep learning.
- Watch It While It Lasts: Our Golden Age of Television — The Parisian golden age [of art] emerged out of the collapse of a system that penalized artistic innovation. For most of the 19th century, the Académie des Beaux-Arts, a state-sanctioned institution, dominated the production and consumption of French art. A jury of academicians decided which paintings were exhibited at the Salon, the main forum for collectors to view new work. The academy set strict rules on artistic expression, and preferred idealized scenes from classical mythology to anything resembling contemporary life. For the most part, the art that resulted was staid and predictable, painted by skilled but anonymous technicians. It sure doesn’t feel like we’re in a golden age of technology innovation, and I sure recognize a lot of the VC horde mentality in the Académie description.
Do you remember that scene in Moneyball when the scouts are all talking about the intangibles regarding a potential player and the Brad Pitt character appears to be having intestinal cramps? That’s what a lot of these meetings are like. Of course nobody is spittin’ tobacco juice into plastic cups. But you get the picture. A Real Estate Director thinks that Penn Square Mall has the “potential” to flourish given the changes that are happening next door at J. Jill. Somebody in Finance is completely against the mall … “Oklahoma City is not an aspirational market” … even though the Finance Director has never been to Oklahoma. An EVP responsible for stores doesn’t want to get hit over the head by the Board of Directors for opening yet another store that never meets the sales projections authored by the Real Estate Team.
The big topic in 2017 is store closures. Y’all recall my omnichannel arguments of 2010 – 2014 … arguments suggesting that store closures were the logical outcome of a strategy of creating channel sameness in an effort to compete against Amazon. Turns out the customer doesn’t “demand” a one-brand approach to channels. Turns out the customer demands more from Amazon!
There’s no better place to apply your forecasting chops than in forecasting what happens to a trade area when a store closes. It’s work that is nearly impossible to get right. Each market behaves just a bit differently from any comparable market. If you are off by 10% or 15%, you might close a store that is actually profitable.
Next week, we’ll talk about the approach I use. The approach is not fundamentally different than the approach smart catalogers used 5-10 years ago to dramatically scale back unprofitable pages to online buyers. Most important – the approach is FUN! You get to see outcomes that are uniquely different than you’d expect. And you’ll be armed with the ammo to be like Brad Pitt heading into a meeting with scouts on Moneyball!!
“Forecasting outcomes are the sum of all analytics and marketing knowledge possessed by your company.”
Just a few lines of code to visualize the results of the first round of the French presidential election. The idea is to produce a fairly minimalist map, with circles centered on the centroids of the départements. We start by downloading the data for the base map, a 7z file on the IGN website.
download.file("https://wxs-telechargement.ign.fr/oikr5jryiph0iwhw36053ptm/telechargement/inspire/GEOFLA_THEME-DEPARTEMENTS_2016$GEOFLA_2-2_DEPARTEMENT_SHP_LAMB93_FXX_2016-06-28/file/GEOFLA_2-2_DEPARTEMENT_SHP_LAMB93_FXX_2016-06-28.7z",destfile = "dpt.7z")
This file contains information about the centroids:
library(maptools)
library(maps)
departements<-readShapeSpatial("DEPARTEMENT.SHP")
plot(departements)
points(departements@data$X_CENTROID,departements@data$Y_CENTROID,pch=19,col="red")
Since this does not work very well, we will redo it by hand, for instance for Ille-et-Vilaine,
pos=which(departements@data[,"CODE_DEPT"]==35)
Poly_35=departements[pos,]
plot(departements)
plot(Poly_35,col="yellow",add=TRUE)
departements@data[pos,c("X_CENTROID","Y_CENTROID")]
points(departements@data[pos,c("X_CENTROID","Y_CENTROID")],pch=19,col="red")
library(rgeos)
(ctd=gCentroid(Poly_35,byid=TRUE))
points(ctd,pch=19,col="blue")
Since this works much better, we will use these centroids.
ctd=as.data.frame(gCentroid(departements,byid=TRUE))
plot(departements)
points(ctd,pch=19,col="blue")
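gCentroid returns the true geometric centroid of each polygon, which is why it behaves better here than the precomputed attributes. For reference, the underlying computation is the standard shoelace-based centroid formula; a minimal sketch (in Python, with a hand-made square, not the rgeos implementation):

```python
def polygon_centroid(pts):
    """Centroid of a simple (non-self-intersecting) polygon.

    pts: list of (x, y) vertices in order; the polygon is closed
    implicitly. Uses the standard shoelace-based formula.
    """
    a = cx = cy = 0.0
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return cx / (6 * a), cy / (6 * a)

# unit square: centroid is (0.5, 0.5)
print(polygon_centroid([(0, 0), (1, 0), (1, 1), (0, 1)]))  # → (0.5, 0.5)
```

For a convex shape this agrees with intuition; for the concave coastal départements it can differ noticeably from the average of the vertices, which is why the attribute-based points above looked off.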
Now we need the election results, per département. We can scrape the website of the Ministry of the Interior. There is one page per département, so looping over them is easy. However, the URL requires the region code, and since I am a bit lazy, instead of building a lookup table we simply try all region codes until one works. The idea is to fetch the number of votes obtained by a given candidate.
candidat="M. Emmanuel MACRON"
library(XML)
voix=function(no=35){
testurl=FALSE
i=1
vect_reg=c("084","027","053","024","094","044","032","028","075","076","052",
"093","011","001","002","003","004")
region=NA
while((testurl==FALSE)&(i<=length(vect_reg))){
reg=vect_reg[i]
nodpt=paste("0",no,sep="")
# if(!is.na(as.numeric(no))){if(as.numeric(no)<10) nodpt=paste("00",no,sep="")}
url=paste("http://elections.interieur.gouv.fr/presidentielle-2017/",reg,"/",nodpt,"/index.html",sep="")
test=try(htmlParse(url),silent =TRUE)
if(!inherits(test, "try-error")){testurl=TRUE
region=reg}
i=i+1
}
tabs=readHTMLTable(test)   # (reconstructed: this line was truncated in the original)
tab=tabs[[2]]
nb=tab[tab[,1]==candidat,"Voix"]   # (reconstructed: select the "Voix" entry for the candidate)
a<-unlist(strsplit(as.character(nb)," "))
as.numeric(paste(a,collapse=""))}
We can then test:
> voix(35)
[1] 84648
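The region-probing trick inside voix() is worth isolating: try candidate codes until one page parses, and keep the first that works. A minimal language-agnostic sketch (in Python, with a stub fetch function standing in for htmlParse; all names here are illustrative):

```python
def first_working(candidates, fetch):
    """Return (candidate, result) for the first candidate fetch() accepts.

    fetch() is expected to raise on failure, mirroring R's
    try(htmlParse(url)) followed by the inherits(..., "try-error") check.
    """
    for c in candidates:
        try:
            return c, fetch(c)
        except Exception:
            continue
    return None, None

# stub standing in for the ministry pages: only region "053"
# "exists" for this hypothetical département
def fake_fetch(reg):
    if reg != "053":
        raise IOError("404")
    return "parsed page for region " + reg

regions = ["084", "027", "053", "024"]
reg, page = first_working(regions, fake_fetch)
print(reg)  # → 053
```

The cost is a few wasted HTTP requests per département, which is the price of skipping the lookup table.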
Since this seems to work, we do it for all départements:
liste_dpt=departements@data$CODE_DEPT
nbvoix=Vectorize(voix)(liste_dpt)
We can then visualize the results on a map:
plot(departements,border=NA)
points(ctd,pch=19,col=rgb(1,0,0,.25),cex=nbvoix/50000)
And we can also try another candidate,
candidat="Mme Marine LE PEN"
and we obtain the following map:
plot(departements,border=NA)
points(ctd,pch=19,col=rgb(0,0,1,.25),cex=nbvoix/50000)
From the Atlanta Fed: GDPNow
The final GDPNow model forecast for real GDP growth (seasonally adjusted annual rate) in the first quarter of 2017 is 0.2 percent on April 27, down from 0.5 percent on April 18. The forecast of first-quarter real consumer spending growth fell from 0.3 percent to 0.1 percent after yesterday’s annual retail trade revision by the U.S. Census Bureau. The forecast of the contribution of inventory investment to first-quarter growth declined from -0.76 percentage points to -1.11 percentage points after this morning’s advance reports on durable manufacturing and wholesale and retail inventories from the Census Bureau. The forecast of real equipment investment growth increased from 5.5 percent to 6.6 percent after the durable manufacturing report and the incorporation of previously published data on light truck sales to businesses from the U.S. Bureau of Economic Analysis.
emphasis added
From the NY Fed Nowcasting Report
The FRBNY Staff Nowcast stands at 2.7% for 2017:Q1 and 2.1% for 2017:Q2.
Mixed news from this week’s data releases left the nowcast for Q1 and Q2 essentially unchanged.
Friday:
• At 8:30 AM ET, Gross Domestic Product, 1st quarter 2017 (Advance estimate). The consensus is that real GDP increased 1.1% annualized in Q1.
• At 9:45 AM, Chicago Purchasing Managers Index for April. The consensus is for a reading of 56.5, down from 57.7 in March.
• At 10:00 AM, University of Michigan’s Consumer sentiment index (final for April). The consensus is for a reading of 98.0, unchanged from the preliminary reading of 98.0.
Twitter reminded me that there’s #NTTS2017 going on, Eurostat’s biennial scientific conference on New Techniques and Technologies for Statistics (NTTS).
The algorithms (understood in a broad sense as ‘a set of rules that precisely defines a sequence of operations’) used in collecting, analyzing and disseminating data will be changing; manual work will, or must, be replaced by automation and robots. But the core role of being a trusted source of data-based and (in all operations) transparently produced information serving professional decision making will remain.
Official statistics can keep this role as long as they:
- are known,
- are noted for their veracity,
- are consulted,
and with all this can play their role.
Filed under: 09 Stat.Office / Organization Tagged: algorithm, automation, official statistics
Can we recover an image by learning a deep regression map from pixels (x,y) to colors (r,g,b)?
Yes, we can.
The idea is to use a deep learning (DL) solution to perform a deep regression, learning a mapping between pixel locations and RGB colors, with the goal of generating an image one pixel at a time. This means that if the dimensions of the target image are X-by-Y, the network must be evaluated X*Y times; if the image is 100-by-100, that makes 10,000 (pixel, color) training samples.
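As a sanity check on that arithmetic, here is a tiny standalone illustration (plain Python; the 2-by-3 "image" is made up) showing that an X-by-Y image yields exactly X*Y (pixel-location, color) training pairs:

```python
X, Y = 2, 3
# a made-up 2x3 RGB "image": each pixel holds an (r, g, b) tuple
im = [[(i, j, 0) for j in range(Y)] for i in range(X)]

# one (x, y) -> (r, g, b) pair per pixel
xs = [(float(i), float(j)) for i in range(X) for j in range(Y)]
ys = [im[i][j] for i in range(X) for j in range(Y)]

print(len(xs))  # → 6, i.e. X*Y samples for a 2-by-3 image
```

The same enumeration, scaled up, produces the 30,447 samples reported for the Mona Lisa image later in the post.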
Keras with TensorFlow as a backend creates a working model for this project. Pythonistas rejoice: the Keras functional API is a great abstraction over Theano and TensorFlow (Keras could support new DL frameworks later) that helps define complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.
The Sequential model is probably a better choice to implement such a network, but it helps to start with something surprisingly simple.
Using the Model class:
- A layer instance is callable (on a tensor), and it returns a tensor.
- Input tensor(s) and output tensor(s) can then be used to define a Model.
- Such a model can be trained just like Keras Sequential models.
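The three bullet points above can be imitated in a few lines of plain numpy, which makes the pattern concrete (a toy re-implementation for illustration only, not Keras code; ToyDense is a made-up name):

```python
import numpy as np

class ToyDense:
    """Toy dense layer: a layer instance is callable on an array and
    returns an array, mimicking the Keras functional style."""
    def __init__(self, units, input_dim, seed=0):
        rng = np.random.RandomState(seed)
        self.W = rng.uniform(-0.05, 0.05, size=(input_dim, units))
        self.b = np.zeros(units)

    def __call__(self, x):
        return x @ self.W + self.b

inputs = np.array([[0.0, 1.0]])            # one 2-d input "tensor"
hidden = ToyDense(5, input_dim=2)(inputs)  # callable on a tensor...
outputs = ToyDense(3, input_dim=5)(hidden) # ...and chainable into a graph
print(outputs.shape)  # → (1, 3)
```

In real Keras, the `Input`/`Model` pair wires such callables into a trainable graph; the toy version only shows why chaining layer calls reads so naturally.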
However, first you need to be able to run TensorFlow on your computer.
Install Docker and run Tensorflow Notebook image on your machine
The best way to run TensorFlow is to use a Docker container. There’s full documentation on installing Docker at docker.com, but in a few words, the steps are:
- Go to docs.docker.com in your browser.
- Step one of the instructions sends you to download Docker.
- Run that downloaded file to install Docker.
- At the end of the install process a whale in the top status bar indicates that Docker is running, and accessible from a terminal.
- Click the whale to get Preferences and other options.
- Open a command-line terminal and run a few Docker commands to verify that Docker is working as expected; for example, docker version checks that you have the latest release installed.
- Once Docker is installed, you can download the image which allows you to run Tensorflow on your computer.
- In a terminal run: docker pull 3blades/tensorflow-notebook
- MacOS & Linux: Run the deep learning image on your system:
docker run -it -p 8888:8888 -p 6006:6006 -v /$(pwd):/notebooks 3blades/tensorflow-notebook
- Windows: Run the deep learning image on your system:
docker run -it -p 8888:8888 -p 6006:6006 -v C:/your/folder:/notebooks 3blades/tensorflow-notebook
- Once you have completed these steps, you can check the installation by starting your web browser and introducing this URL:
http://localhost:8888
We are now ready to paint the Mona Lisa using Deep Regression from pixels to RGB.
Let’s get started!
The famous Mona Lisa painting is our target image:
import matplotlib.image as mpimg
import matplotlib.pylab as plt
import numpy as np
%matplotlib inline
im = mpimg.imread("data/monalisa.jpg")
plt.imshow(im)
plt.show()
im.shape
Our training dataset will be composed of pixel locations as input and pixel values as output:
X_train = []
Y_train = []
for i in range(im.shape[0]):
    for j in range(im.shape[1]):
        X_train.append([float(i),float(j)])
        Y_train.append(im[i][j])
X_train = np.array(X_train)
Y_train = np.array(Y_train)
print 'Samples:', X_train.shape[0]
print '(x,y):', X_train[0], '\n', '(r,g,b):', Y_train[0]
Samples: 30447
(x,y): [ 0.  0.]
(r,g,b): [ 85 105 116]
Let’s now build our sequential model
import keras
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.optimizers import Adam, RMSprop, Nadam
# Model architecture
model = Sequential()
model.add(Dense(500, input_dim=2, init='uniform'))
model.add(Activation('relu'))
model.add(Dense(500, init='uniform'))
model.add(Activation('relu'))
model.add(Dense(500, init='uniform'))
model.add(Activation('relu'))
model.add(Dense(500, init='uniform'))
model.add(Activation('relu'))
model.add(Dense(500, init='uniform'))
model.add(Activation('relu'))
model.add(Dense(3, init='uniform'))
model.add(Activation('linear'))
model.summary()
# Compile model
model.compile(loss='mean_squared_error',
optimizer=Nadam(),
metrics=['accuracy'])
# Why use NAdam Optimizer?
# Much like Adam is essentially RMSprop with momentum, Nadam is Adam RMSprop with Nesterov momentum.
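To unpack that comment: Nesterov momentum evaluates the gradient at a look-ahead point, x + mu*v, instead of at the current iterate. A minimal sketch on a one-dimensional quadratic (plain Python; the learning rate and momentum values here are arbitrary choices, not anything taken from this post's model):

```python
def nesterov_minimize(grad, x0, lr=0.1, mu=0.9, steps=200):
    """Gradient descent with Nesterov momentum on a scalar function."""
    x, v = x0, 0.0
    for _ in range(steps):
        # key difference vs plain momentum: gradient is taken at the
        # look-ahead position x + mu*v, not at x itself
        g = grad(x + mu * v)
        v = mu * v - lr * g
        x = x + v
    return x

# f(x) = (x - 3)^2, gradient 2*(x - 3), minimum at x = 3
x_star = nesterov_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_star, 3))  # → 3.0
```

Nadam folds this look-ahead correction into Adam's adaptive per-parameter step sizes, which is why it often converges a little faster than plain Adam.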
Our output:
____________________________________________________________________________
Layer (type)                 Output Shape    Param #    Connected to
============================================================================
dense_1 (Dense)              (None, 500)     1500       dense_input_1[0][0]
activation_1 (Activation)    (None, 500)     0          dense_1[0][0]
dense_2 (Dense)              (None, 500)     250500     activation_1[0][0]
activation_2 (Activation)    (None, 500)     0          dense_2[0][0]
dense_3 (Dense)              (None, 500)     250500     activation_2[0][0]
activation_3 (Activation)    (None, 500)     0          dense_3[0][0]
dense_4 (Dense)              (None, 500)     250500     activation_3[0][0]
activation_4 (Activation)    (None, 500)     0          dense_4[0][0]
dense_5 (Dense)              (None, 500)     250500     activation_4[0][0]
activation_5 (Activation)    (None, 500)     0          dense_5[0][0]
dense_6 (Dense)              (None, 3)       1503       activation_5[0][0]
activation_6 (Activation)    (None, 3)       0          dense_6[0][0]
============================================================================
Total params: 1005003
Let’s now train our model for 1000 epochs with a batch size of 500.
# use this cell to find the best model architecture
history = model.fit(X_train, Y_train, nb_epoch=1000, shuffle=True, verbose=1, batch_size=500)
Y = model.predict(X_train, batch_size=10000)
k = 0
im_out = im.copy()  # a real copy; im[:] would only be a view of the original array
for i in range(im.shape[0]):
    for j in range(im.shape[1]):
        im_out[i,j] = Y[k]
        k += 1
print "Mona Lisa by DL"
plt.imshow(im_out)
plt.show()
Give it time to run: on a 4GB laptop, it can take up to 3 hours to get a result.
Epoch 997/1000
30447/30447 [==============================] - 12s - loss: 231.1333 - acc: 0.9138
Epoch 998/1000
30447/30447 [==============================] - 12s - loss: 213.8869 - acc: 0.9170
Epoch 999/1000
30447/30447 [==============================] - 12s - loss: 215.9076 - acc: 0.9130
Epoch 1000/1000
30447/30447 [==============================] - 12s - loss: 217.6785 - acc: 0.9154
And here is our result: the Mona Lisa painted with Keras, using TensorFlow as the backend.
plt.imshow(im_out)
plt.show()
Let’s now plot our model accuracy
# summarize history for accuracy
plt.plot(history.history['acc'], 'b')
plt.title('Model Accuracy')
plt.xlabel('epoch')
plt.show()
And what about our model loss?
# summarize history for loss
plt.plot(history.history['loss'], 'r')
plt.title('Model Loss')
plt.xlabel('epoch')
plt.show()
The post Learning to Paint The Mona Lisa With Neural Networks appeared first on 3Blades.