Lawler: Early Read on Existing Home Sales in July

By | ai, bigdata, machinelearning


From housing economist Tom Lawler:

Based on publicly-available local realtor/MLS reports from across the country released through today, I project that US existing home sales as estimated by the National Association of Realtors ran at a seasonally adjusted annual rate of 5.38 million in July, down 2.5% from June’s preliminary pace and up 0.9% from last July’s seasonally adjusted pace.

On the inventory front, local realtor/MLS data suggest that the YOY decline in active listings last month was slightly less than in June, and I forecast that the NAR’s existing home inventory estimate for July will be 1.98 million, up 1.0% from June’s preliminary estimate and down 6.2% from last July’s estimate.

Finally, local realtor/MLS data suggest that the NAR’s estimate of the median existing home sales price last month was up 6.1% from a year earlier.

CR Note: The NAR is scheduled to release existing home sales for July on Thursday, August 24th. The early consensus forecast is for sales of 5.56 million SAAR (take the under!).


Source link

New Features for the BigML Predict App for Zapier

By | machinelearning

Thanks to the feedback provided by early adopters, our BigML app for Zapier has been acquiring useful new features, including improved support for additional ML algorithms and dynamic resource selection. Support for new ML algorithms: the first version of our BigML Predict app only included support for models, ensembles, and logistic regressions. Now, you can […]
Source link

Distributed TensorFlow and the hidden layers of engineering work

By | machinelearning, TensorFlow

With all the buzz around Machine Learning as of late, it’s no surprise that companies are starting to experiment with their own ML models, and a lot of them are choosing TensorFlow. Because TensorFlow is open source, you can run it locally to quickly create prototypes and deploy fail-fast experiments that help you get your proof-of-concept working at a small scale. Then, when you’re ready, you can take TensorFlow, your data, and the same code and push it up into Google Cloud to take advantage of multiple CPUs, GPUs or soon even some TPUs.

When you get to the point where you’re ready to take your ML work to the next level, you will have to make some choices about how to set up your infrastructure. In general, many of these choices will impact how much time you spend on operational engineering work versus ML engineering work. To help, we’ve published a pair of solution tutorials to show you how you can create and run a distributed TensorFlow cluster on Google Compute Engine and run the same code to train the same model on Google Cloud Machine Learning Engine. The solutions train a model on the MNIST dataset, which isn’t necessarily the most exciting example to work with, but does allow us to emphasize the engineering aspects of the solutions.

We’ve already talked about the open-source nature of TensorFlow, which lets you run it on your laptop, on a server in your private data center, or even on a Raspberry Pi. TensorFlow can also run in a distributed cluster, allowing you to divide your training workloads across multiple machines, which can save you a significant amount of time waiting for results. The first solution shows you how to set up a group of Compute Engine instances running TensorFlow, as in Figure 1, by creating a reusable custom image and executing an initialization script with Cloud Shell. There are quite a few steps involved in creating the environment and getting it to function properly. Even though they aren’t complex, they are operational engineering steps that take time away from your actual ML development.
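Once the instances are up, each TensorFlow process needs to know the cluster layout and its own role in it. A minimal sketch of building the TF_CONFIG environment variable that distributed TensorFlow reads at startup (the hostnames and the one-parameter-server topology here are illustrative placeholders, not the exact layout the solution's scripts produce):

```python
import json
import os

# Hypothetical instance names; in the solution these would be the
# Compute Engine instances created when you launch the cluster.
CLUSTER = {
    "ps": ["ps-0:2222"],                          # parameter server(s)
    "master": ["master-0:2222"],                  # chief worker
    "worker": ["worker-0:2222", "worker-1:2222"],  # additional workers
}

def make_tf_config(task_type, task_index):
    """Build the TF_CONFIG JSON describing the cluster and this task's role."""
    return json.dumps({
        "cluster": CLUSTER,
        "task": {"type": task_type, "index": task_index},
    })

# Each process exports its own TF_CONFIG before starting training.
os.environ["TF_CONFIG"] = make_tf_config("worker", 0)
```

The point of the sketch is the bookkeeping: every process sees the same cluster map but a different task entry, and assembling this consistently across machines is exactly the kind of operational step the managed service removes.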

Figure 1. A distributed TensorFlow cluster on Google Compute Engine.

The second solution uses the same code with Cloud ML Engine, and with one command you’ll automatically provision the compute resources needed to train your model. This solution also delves into some of the general details of neural networks and distributed training. It also gives you a chance to try out TensorBoard to visualize your training and resulting model as seen in Figure 2. The time you save provisioning compute resources can be spent analyzing your ML work more deeply.

Figure 2. Visualizing the training result with TensorBoard.

Regardless of how you train your model, the whole point is to use it to make predictions. Traditionally, this is where most of the engineering work has to be done. If you want to build a web service to run your predictions, at a minimum you’ll have to provision, configure, and secure web servers, load balancers, and monitoring agents, and create some kind of versioning process. In both of these solutions, you’ll use the Cloud ML Engine prediction service to offload all of those operational tasks and host your model in a reliable, scalable, and secure environment. Once you set up your model for predictions, you’ll quickly spin up a Cloud Datalab instance and download a simple notebook to execute and test the predictions. In this notebook you’ll draw a number with your mouse or trackpad, as in Figure 3, which gets converted to the image matrix format that matches the MNIST data. The notebook then sends your image to your new prediction API and tells you which number it detected, as in Figure 4.
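The conversion step the notebook performs can be sketched in a few lines: MNIST models expect a 28x28 grayscale image flattened to 784 floats in [0, 1]. This is a hypothetical stand-in for the notebook's actual code, and the request-body key names at the end are assumptions that depend on how the model was exported:

```python
def to_mnist_vector(pixels, max_value=255.0):
    """Flatten a 28x28 grid of 0-255 grayscale values and scale each to [0, 1]."""
    if len(pixels) != 28 or any(len(row) != 28 for row in pixels):
        raise ValueError("expected a 28x28 pixel grid")
    return [p / max_value for row in pixels for p in row]

# A blank canvas with one dark pixel, standing in for the drawn digit.
canvas = [[0] * 28 for _ in range(28)]
canvas[10][14] = 255

vector = to_mnist_vector(canvas)

# The prediction request then wraps this vector, e.g.
#   {"instances": [{"image": vector}]}
# where the "image" key must match the input name in the exported model.
```

Getting this normalization exactly right matters: if the model was trained on [0, 1] values and the client sends raw 0-255 pixels, predictions will silently degrade.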

Figure 3.
Figure 4.

This brings up one last, critical point about the engineering effort required to host your model for predictions, one not expanded upon deeply in these solutions but something Cloud ML Engine and Cloud Dataflow can easily address for you. When working with pre-built machine learning models on standard datasets, it is easy to lose track of the fact that model training, deployment, and prediction usually sit at the end of a series of data pipelines. In the real world, it’s unlikely that your datasets will be pristine and collected specifically for the purpose of learning from the data.

Rather, you’ll usually have to preprocess the data before you can feed it into your TensorFlow model. Common preprocessing steps include de-duplication, scaling/transforming data values, creating vocabularies, and handling unusual situations. The TensorFlow model is then trained on the clean, processed data.
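The preprocessing steps above can be sketched as one small function. The record fields, the de-duplication key, and the min-max scaling rule are all hypothetical choices for illustration; a real pipeline would pick these per dataset:

```python
def preprocess(records):
    """De-duplicate, build a vocabulary, and scale values, in that order."""
    # 1. De-duplicate exact repeats while preserving order.
    seen, unique = set(), []
    for r in records:
        key = (r["user"], r["value"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 2. Build a vocabulary mapping categorical values to integer ids.
    vocab = {}
    for r in unique:
        vocab.setdefault(r["user"], len(vocab))

    # 3. Min-max scale numeric values to [0, 1], guarding the constant case.
    values = [r["value"] for r in unique]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    clean = [
        {"user_id": vocab[r["user"]], "value": (r["value"] - lo) / span}
        for r in unique
    ]
    return clean, vocab

rows = [
    {"user": "a", "value": 10.0},
    {"user": "b", "value": 30.0},
    {"user": "a", "value": 10.0},  # exact duplicate, dropped in step 1
]
clean, vocab = preprocess(rows)
```

Note that the vocabulary and the scaling bounds are artifacts of training-time data; they have to be saved, because prediction-time inputs must be mapped with the same ones.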

At prediction time, however, the client sends the same kind of raw data, while your TensorFlow model has been trained on de-duplicated, transformed, cleaned-up data with specific vocabulary mappings. Because your prediction infrastructure might not be written in Python, there is significant engineering work in building libraries that carry out these tasks with exacting consistency in whatever language or system you use. Often the preprocessing done before training ends up inconsistent with the preprocessing done before prediction, and even the smallest inconsistency can make your predictions behave poorly or unexpectedly. By using Cloud Dataflow for the preprocessing and Cloud ML Engine for the predictions, you can minimize or completely avoid this extra engineering work, because Cloud Dataflow can apply the same preprocessing transformation code to both historical data during training and real-time data during prediction.
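The principle boils down to sharing one transform between the two paths. A minimal sketch, with made-up feature names and a made-up vocabulary, of the pattern that avoids training/serving skew:

```python
# Built once during training and saved alongside the model.
VOCAB = {"red": 0, "green": 1, "blue": 2}

def transform(raw):
    """Map a raw record to model features; used verbatim in both paths."""
    return {
        "color_id": VOCAB.get(raw["color"], len(VOCAB)),  # out-of-vocab bucket
        "size": raw["size_cm"] / 100.0,                   # same scaling everywhere
    }

# Training time: applied to every historical record.
train_features = [transform({"color": "red", "size_cm": 42.0})]

# Prediction time: the same function, applied to a live request.
request_features = transform({"color": "purple", "size_cm": 10.0})
```

If the transform instead gets reimplemented in another language for serving, every divergence, down to how out-of-vocabulary values are bucketed, becomes a source of silent prediction errors; running one transformation codebase in both pipelines is what Cloud Dataflow enables here.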

Summary 

Developing new machine learning models is getting easier as TensorFlow adds new APIs and abstraction layers and allows you to run it wherever you want. Cloud Machine Learning Engine is powered by TensorFlow so you aren’t locked into a proprietary managed service, and we’ll even show you how to build your own TensorFlow cluster on Compute Engine if you want. But we think that you might want to spend less time on the engineering work needed to set up your training and prediction environments, and more time tuning, analyzing and perfecting your model. With Cloud Machine Learning Engine, Cloud Datalab, and Cloud Dataflow you can optimize your time. Offload the operational engineering work to us, quickly and easily analyze and visualize your data, and build preprocessing pipelines that are reusable for training and prediction.



Source link

How to Use the Fitted Mixed Model to Calculate Predicted Values

By | ai, bigdata, machinelearning


In this video I will answer a question from a recent webinar, Random Intercept and Random Slope Models.
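As a quick illustration of the calculation the title describes: in a random intercept and random slope model, a group-specific predicted value combines the fixed effects with that group's random effects. The coefficient values below are made up for illustration, not taken from the webinar:

```python
# Fixed effects from the fitted model: overall intercept and slope.
beta0, beta1 = 2.0, 0.5

# Random effects for one group j: deviations from the fixed effects.
u0j, u1j = -0.3, 0.1

def predict(x, include_random=True):
    """y-hat = (beta0 + u0j) + (beta1 + u1j) * x for group j;
    with include_random=False you get the population-average prediction."""
    if include_random:
        return (beta0 + u0j) + (beta1 + u1j) * x
    return beta0 + beta1 * x

subject_pred = predict(4.0)                          # group-specific
population_pred = predict(4.0, include_random=False)  # population average
```

The gap between the two predictions is exactly the contribution of the group's random intercept and slope.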

We are answering questions here because we had over 500 people live on the webinar, so we didn’t have time to get through them all.

If you missed the webinar live, this and the other questions in this series may make more sense if you watch that first. It was part of our free webinar series, The Craft of Statistical Analysis, and you can sign up to get the free recording, handout, and data set at this link:

http://TheCraftofStatisticalAnalysis.com/random-intercept-and-random-slope-models

Interested in learning more? Check out our free webinar recording,
Random Intercept and Random Slope Models: An Introduction to Mixed Models.


Source link

It is somewhat paradoxical that good stories tend to be anomalous, given that when it comes to statistical data, we generally want what is typical, not what is surprising. Our resolution of this paradox is . . .

By | ai, bigdata, machinelearning

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

From a blog comment a few years ago regarding an article by Robert Kosara:

As Thomas and I discuss in our paper [When Do Stories Work? Evidence and Illustration in the Social Sciences], it is somewhat paradoxical that good stories tend to be anomalous, given that when it comes to statistical data, we generally want what is typical, not what is surprising. Our resolution of this paradox is that stories should not generally be viewed as direct evidence for learning about the world, but rather they should be considered as tools for probing our understanding. Hence the importance (and attraction) of stories that are anomalous, which make us say, in the famous words attributed to Isaac Asimov, “not ‘Eureka’ but ‘hmm . . . that’s funny . . .’”

The post It is somewhat paradoxical that good stories tend to be anomalous, given that when it comes to statistical data, we generally want what is typical, not what is surprising. Our resolution of this paradox is . . . appeared first on Statistical Modeling, Causal Inference, and Social Science.






Source link


Five Challenges of Analyzing Internet of Things (IoT) Data

By | iot


The analysis of Internet of Things (IoT) data is quickly becoming a mainstream activity. I’ve written about the Analytics of Things (AoT) before (some examples here, here, and here). For this blog, I’m going to focus on a few unique challenges that you’ll most likely encounter as you move to take IoT data into the AoT realm.

Challenge 1: The Deceptive Simplicity of IoT Data

With many historical data sources, such as transactional data, it was often quite an effort to gather the source data required for analysis. It was necessary to identify what information was available, how it was formatted, and also to reconcile data from different sources that often contained similar information, but had inconsistencies in how it was provided. Ironically, this is one area where IoT sensor data can seem deceptively simple compared to many other sources.

Most sensors spit out data in a simple format. There is a timestamp, a measure identifier (temperature, pressure, etc.), and then a value. For example, at 4:59 pm the temperature is 95 degrees. The good news is that this makes ingesting raw sensor data fairly straightforward in terms of the coding logic required. So, you can fairly quickly go from a raw feed to a dataset …
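A minimal sketch of ingesting that three-part format, assuming a hypothetical comma-separated wire format with an ISO-8601 timestamp (real sensors vary widely in encoding):

```python
from datetime import datetime

def parse_reading(line):
    """Parse a 'timestamp,measure,value' record into a typed dict."""
    ts, measure, value = line.strip().split(",")
    return {
        "timestamp": datetime.fromisoformat(ts),
        "measure": measure,
        "value": float(value),
    }

reading = parse_reading("2017-08-15T16:59:00,temperature,95")
```

The parsing really is this simple; as the rest of the article argues, the deceptive part is everything that comes after ingestion.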

Read More on Datafloq

Source link

Enterprise IoT: A Definitive Handbook

By | iot, machinelearning

The fourth edition of the Enterprise IoT book is out now. It is the most comprehensive guide to understanding Enterprise IoT and to implementing IoT applications using IoT cloud offerings from Microsoft, IBM, Amazon, GE Predix and open source software. For complete details about the book, visit http://enterpriseiotbook.com

“Internet of Things is a vision where every object in the world has the potential to connect to the Internet and provide their data so as to derive actionable insights on its own or through other connected objects”

The object can be anything: a vehicle, machinery, an airport, a city, people, a phone, or even a shoe. From a connected vehicle solution you can understand driver behaviour and vehicle usage patterns; from a connected machines solution you can determine when machines need servicing; from a connected airport solution you can understand many things, such as how long a passenger needs to wait for check-in and security, and, from an operating perspective, optimize passenger movement and ensure the right equipment is available at the right time for quick serviceability; and from a connected footwear solution you can understand how far you have run, so your app can automatically purchase a new pair of shoes based on the remaining shoe life.

As we can see, it’s not just about connectivity, but about using the connected data, in the context of your application or of other connected solutions, to derive insights that couldn’t be uncovered before. Today we are seeing data (both structured and unstructured) growing by leaps and bounds through mediums like blogs, social media, and transactional systems. With the advent of IoT, you will see a large volume of raw data emitted by devices like sensors. Such a huge and complex set of data, if not attended to, can go wasted, and the opportunity to build a smart environment around us is lost.

While focusing on this web of complexity, the real benefit of IoT is often lost, along with the most important question: how to get started. In this book, our focus is to provide a clear vision of the Internet of Things and everything you should know to get started applying and building Enterprise IoT applications in any industry. The concepts in the book are applicable across industries. To date, it has been difficult to find a single perspective on what an Enterprise IoT stack actually means, and our intent is to provide an applicability guide that can be taken as a reference for building any IoT application. In the course of the book, we describe some of the key components of the Internet of Things through our Enterprise IoT stack, look at how to incrementally apply IoT transformations to build connected products in various industries, and, at the end, cover the technical strategy and how to build IoT applications using IoT cloud offerings from Microsoft, IBM, Amazon and Predix, and even how to build one using open source technologies. To summarize, the book covers the following:

• A detailed overview of the key components of the Internet of Things and the most comprehensive view of an Enterprise IoT stack.
• How to apply IoT in the context of real-world applications, with detailed use cases on manufacturing, automotive, and home automation.
• The technical strategy and how to implement IoT applications using Microsoft, IBM, Amazon and Predix IoT offerings and various open source technologies, mapped to our Enterprise IoT stack.
• Bonus chapters on Cognitive IoT, Cognitive IoT architecture, and Blockchain.

$29.99