A Few Comments on September Existing Home Sales

By | ai, bigdata, machinelearning

Earlier: NAR: “Existing-Home Sales Inch 0.7 Percent Higher in September”

First, as usual, housing economist Tom Lawler’s estimate was much closer to the NAR report than the consensus. So the slight month-to-month increase in reported sales, in September, was no surprise for CR readers.

My view is a sales rate of 5.39 million is solid. In fact, I’d consider any existing home sales rate in the 5 to 5.5 million range solid based on the normal historical turnover of the existing stock. As always, it is important to remember that new home sales are more important for jobs and the economy than existing home sales. Since existing sales are existing stock, the only direct contribution to GDP is the broker’s commission. There is usually some additional spending with an existing home purchase – new furniture, etc. – but overall the economic impact is small compared to a new home sale.

Inventory is still very low and falling year-over-year (down 6.4% year-over-year in September). Inventory has declined year-over-year for 28 consecutive months. I started the year expecting inventory would be increasing year-over-year by the end of 2017. That now seems unlikely.

However, this was the smallest year-over-year decline this year, and inventory could bottom this year. Inventory is a key metric to watch. More inventory would probably mean smaller price increases, and less inventory somewhat larger price increases.

The following graph shows existing home sales Not Seasonally Adjusted (NSA).

[Graph: Existing Home Sales, Not Seasonally Adjusted (NSA)]

Sales NSA in September (465,000, red column) were below sales in September 2016 (486,000, NSA) and sales in September 2015 (471,000).
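The size of those gaps can be checked with a few lines of R. This is only a quick sketch using the three NSA figures quoted above (in thousands); the variable names are illustrative.

```r
# NSA existing home sales for September, in thousands, as quoted above
sales_nsa <- c("2015" = 471, "2016" = 486, "2017" = 465)

# Percent change of September 2017 relative to each earlier September
earlier <- sales_nsa[c("2015", "2016")]
pct_change <- 100 * (sales_nsa[["2017"]] - earlier) / earlier

print(round(pct_change, 1))
```

On these figures, September 2017 was about 1.3% below September 2015 and about 4.3% below September 2016.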

Sales NSA are now slowing seasonally, and sales NSA will be lower in Q4.


Qualitative Research in R

By | ai, bigdata, machinelearning

(This article was first published on R Programming – DataScience+, and kindly contributed to R-bloggers)

In the last two posts, I’ve focused purely on statistical topics – one-way ANOVA and dealing with multicollinearity in R. In this post, I’ll deviate from the purely statistical topics and highlight some aspects of qualitative research. More specifically, I’ll show you the procedure for text mining and for visualizing the results of the text analysis as a word cloud.
Some typical uses of text mining are listed below:
• Marketing managers often use text mining to study the needs and complaints of their customers;
• Politicians and journalists use text mining to critically analyze the lectures delivered by opposition leaders;
• Social media experts use this technique to collect, analyze and share user posts, comments, etc.;
• Social science researchers use text mining to analyse qualitative data.

What is Text mining?

It is a method that enables us to highlight the most frequently used keywords in a paragraph of text or in a collection of several text documents.

What is Word Cloud?

It is the visual representation of text data, especially the keywords in the text documents.
R has very simple and straightforward approaches for text mining and creating word clouds.

The text mining package (tm) will be used for mining the text, and the word cloud generator package (wordcloud) will be used for visualizing the keywords as a word cloud.

Preparing the Text Documents:

As the starting point of qualitative research, you need to create the text file. Here I’ve used the lectures delivered by the great Indian Hindu monk Swami Vivekananda at the first World’s Parliament of Religions, held from 11 to 27 September 1893. Only two lectures, the opening and closing addresses, will be used.
Both lectures are saved in a single text file (chicago).

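Loading the Required Packages:

# Packages needed by the functions used in this post
library(tm)            # Corpus(), tm_map(), TermDocumentMatrix()
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal() colour palettes

Importing the text file:

The text file (chicago) is imported with the following R code, which opens a file-chooser dialog: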

text <- readLines(file.choose())
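file.choose() only works in an interactive session. For a script, the path can be passed to readLines() directly. The sketch below is an illustration only: it writes two placeholder lines to a temporary file that stands in for the chicago file, so the file name and contents are assumptions.

```r
# Hypothetical stand-in for the chicago text file: two placeholder lines
path <- tempfile(fileext = ".txt")
writeLines(c("placeholder first line of the lecture",
             "placeholder second line of the lecture"), path)

# readLines() returns a character vector with one element per line
text <- readLines(path, encoding = "UTF-8")
print(length(text))
```

Each element of text later becomes one document in the corpus, so a file with many short lines yields many small documents.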

Build the data as a corpus:

The ‘text’ object will now be loaded as a corpus, that is, a collection of documents containing (natural language) text. The Corpus() function from the text mining (tm) package will be used for this purpose.
The R code for building the corpus is given below:

docs <- Corpus(VectorSource(text))

Next, use the inspect() function from the tm package to display detailed information about the corpus.
The R code for inspecting the text is given below:

inspect(docs)

The output is omitted here for reasons of space.

Text transformation:

After inspecting the corpus, some text transformation is required to replace special characters in the text with spaces. To do this, use the ‘tm_map()’ function.
The R code for transforming the text is given below:

toSpace <- content_transformer(function (x , pattern ) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")
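The effect of toSpace can be seen without a corpus, because it is a thin wrapper around gsub(). The sketch below applies the same three patterns to one made-up string; to_space and sample_line are illustrative names, not part of tm. The pipe must be written as "\\|" because an unescaped | is the regular-expression alternation operator, and a single backslash is not a valid escape in an R string.

```r
# Plain-function version of the transformer: replace every match with a space
to_space <- function(x, pattern) gsub(pattern, " ", x)

# Made-up input containing the three special characters handled above
sample_line <- "faith/reason | contact@example.org"

cleaned <- to_space(sample_line, "/")
cleaned <- to_space(cleaned, "@")
cleaned <- to_space(cleaned, "\\|")  # escaped pipe: a literal "|"

print(cleaned)
```

Each special character is replaced by exactly one space, so the string keeps its length. The extra spaces are squeezed out later by stripWhitespace.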

Text Cleaning:

After replacing the special characters, the next step is to convert the text to lower case, remove numbers and punctuation, remove common stopwords such as ‘the’ and ‘we’, and strip unnecessary white space. Stopwords are removed because they are so common in a language that their information value is near zero. The same ‘tm_map()’ function is used for this step.
The R code for cleaning the text, with short explanatory comments, is given below:

# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove common English stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove your own stop words,
# specified as a character vector (lower case, because the text was lower-cased above)
docs <- tm_map(docs, removeWords, c("i", "my"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
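To see what the cleaning steps above do to a single line, here is a rough base-R equivalent applied to one made-up sentence. It is a sketch only: the three-word stop_words vector is an illustrative stand-in for stopwords("english"), and the regular expressions approximate the tm functions rather than reproduce them exactly.

```r
line <- "In 1893, the Parliament of Religions met in Chicago!"
stop_words <- c("in", "the", "of")  # tiny illustrative list, not tm's full list

cleaned <- tolower(line)                        # lower case
cleaned <- gsub("[[:digit:]]+", "", cleaned)    # remove numbers
stop_pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
cleaned <- gsub(stop_pattern, "", cleaned, perl = TRUE)  # remove whole stop words
cleaned <- gsub("[[:punct:]]+", "", cleaned)    # remove punctuation
cleaned <- trimws(gsub("[[:space:]]+", " ", cleaned))    # squeeze white space

print(cleaned)
```

The order matters: stop words are matched after lower-casing, so a capitalised entry in the stop-word list would never match.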

Building a document matrix:

A document matrix is a table of how often each word occurs in the given text, that is, the frequency distribution of its words.
The TermDocumentMatrix() function from the ‘tm’ package builds this frequency table.
The R code is given below:

docs_matrix <- TermDocumentMatrix(docs)
m <- as.matrix(docs_matrix)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)

Have a look at the document matrix for the top ten keywords:

head(d, 10)
               word freq
religions religions    7
world         world    6
earth         earth    6
become       become    6
hindu         hindu    5
religion   religion    5
thanks       thanks    5
different different    4
men             men    4
proud         proud    4
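The same kind of table can be built by hand for a toy input, which may make the structure of d easier to follow. This is a base-R sketch on three made-up, already-cleaned documents, not the lecture text; cleaned_docs and d_small are illustrative names.

```r
# Three made-up documents that have already been cleaned
cleaned_docs <- c("religions world earth",
                  "world religions become",
                  "religions thanks")

# Split into words and count how often each one occurs overall
tokens <- unlist(strsplit(cleaned_docs, "[[:space:]]+"))
counts <- sort(table(tokens), decreasing = TRUE)

# Same two-column layout as d: the word and its frequency
d_small <- data.frame(word = names(counts), freq = as.vector(counts))
print(d_small)
```

One difference to keep in mind: by default TermDocumentMatrix() drops words shorter than three characters, which this hand-rolled count does not.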

Generate the Word cloud:

Finally, the frequency table of the words (document matrix) will be visualized graphically by plotting in a word cloud with the help of the following R code.

wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words=200, random.order=FALSE, rot.per=0.35, 
          colors=brewer.pal(8, "Dark2"))

You can also use barplot to plot the frequencies of the keywords using the following R code:

barplot(d[1:10,]$freq, las = 2, names.arg = d[1:10,]$word,
        col ="lightblue", main ="Most commonly used words",
        ylab = "Word frequencies", xlab="Keywords")


The word cloud shows that “religions”, “earth”, “world”, “hindu”, “one”, etc. are the most frequently used words in the lectures delivered by Swamiji at the World’s Parliament of Religions in Chicago.


    Pleased to be in the list of top 30 influencers for #IoT for 2017 along with Amazon Bosch Cisco Forrester and Gartner ..

    By | ai, bigdata, machinelearning

    About 4 years ago, I suggested to Oxford University that we should create a course solely on the algorithmic (#Datascience and #AI) aspects of the Internet of Things. I am grateful that they accepted this obscure (and complex!) idea, creating the now industry-recognised Data Science for Internet of Things course.

    Here, we work on complex and pioneering aspects of AI, Data Science and IoT (for instance systems engineering for AI/IoT).
    Special thanks to Peter Holland and Adrian Stokes at Oxford University.
    The list was created by Munich Re, one of the largest reinsurance and industrial IoT companies in the world (Twitter feed: @relayr_iot).
    Great to see IoT friends Alexandra, Ronald Van Loon, Boris Adryan and Rob Van Kranenberg also on the list.


    Q3 GDP Forecasts

    By | ai, bigdata, machinelearning

    From the Atlanta Fed: GDPNow

    The GDPNow model forecast for real GDP growth (seasonally adjusted annual rate) in the third quarter of 2017 is 2.7 percent on October 18, unchanged from October 13. The forecast of third-quarter real residential investment growth inched down from -4.1 percent to -4.3 percent after this morning’s new residential construction release from the U.S. Census Bureau.
    emphasis added

    From the NY Fed Nowcasting Report

    The New York Fed Staff Nowcast stands at 1.5% for 2017:Q3 and 2.6% for 2017:Q4

    From Merrill Lynch:

    We revise up our 3Q GDP forecast to 3.0% marking to market with our tracking estimate.

    CR Note: The BEA is scheduled to release the advance estimate for Q3 GDP next week. Based on the August report, PCE looks sluggish in Q3 (mid-month method at 1.7%).


    An Updated History of R

    By | ai, bigdata, machinelearning

    (This article was first published on Revolutions, and kindly contributed to R-bloggers)

    Here's a refresher on the history of the R project:

    • 1992: R development begins as a research project in Auckland, NZ by Robert Gentleman and Ross Ihaka 
    • 1993: First binary versions of R published at Statlib [see update, below]
    • 1995: R first distributed as open-source software, under GPL2 license
    • 1997: R core group formed
    • 1997: CRAN founded (by Kurt Hornik and Fritz Leisch)
    • 1999: The R website founded
    • 2000: R 1.0.0 released (February 29) 
    • 2001: R News founded (later to become the R Journal)
    • 2003: R Foundation founded
    • 2004: First UseR! conference (in Vienna)
    • 2004: R 2.0.0 released
    • 2009: First edition of the R Journal
    • 2013: R 3.0.0 released
    • 2015: R Consortium founded, with R Foundation participation
    • 2016: New R logo adopted

    I've added some additional dates gleaned from the r-announce mailing list archives and a 1998 paper on the history of R written by co-founder Ross Ihaka.

    According to the paper, “R began as an experiment in trying to use the methods of Lisp implementors to build a small testbed which could be used to trial some ideas on how a statistical environment might be built.” It all started when

    … Robert Gentleman and I became colleagues at The University of Auckland. We both had an interest in statistical computing and saw a common need for a better software environment in our Macintosh teaching laboratory. We saw no suitable commercial environment and we began to experiment to see what might be involved in developing one ourselves.

    The paper provides fascinating insights into the beginnings of the R project, and the similarities and differences between it and the S language that preceded it. It's also interesting to see the future goals of the R project as envisioned back in 1998: “to produce a free implementation of something 'close to' version 3 of the S language”; “development of an integrated user interface”; “to get substantial use out of R for statistical work and teaching”. I think it's fair to say that in all those areas, especially the last, the R project has succeeded beyond measure.

    Ross Ihaka: R : Past and Future History (PDF) (via Jesse Maegan)

    Update: The first public announcement about R appears to have been a post by Ross Ihaka to the s-news mailing list on August 4, 1993. This pre-dates the online archives, but I reproduce it below (with some minor formatting) for the record. (Grateful thanks to reader SK for providing this email from personal archives.)

    Date: Wed, 4 Aug 93 14:01:31 NZS
    From: Ross Ihaka
    To: [email protected]
    Subject: Re: Is  S  available for a Macintosh personal computer?

    Joseph B Kruskal writes:

    If anyone knows of an S available for a Macintosh computer,
    I would be pleased to hear about it. 

    About a year ago Robert Gentleman and I considered the problem of obtaining decent statistical software for our undergraduate Macintosh lab.  After considering the options, we decided that the most satisfactory alternative was to write our own.  We started by writing a small lisp interpreter.  Next we expanded its data structures with atomic vector types and altered its evaluation semantics to include lazy evaluation of closure arguments and argument binding by tag as well as order.  Finally we added some syntactic sugar to make it look somewhat like S.  We call the result "R".

    R is not ready for release at this point, but we are committed to having it ready by March next year (when we start teaching with it).

    Because there is likely to be some interest in it we are going to put some (SPARC/SGI/Macintosh) binaries on Statlib later this week so people can give it a test drive.

    I'll send out a short note when the binaries are available.



            (Robert will be at the ASA meetings in S.F. next week ...)

    Some Notes About R

    1. We have tried to make R small. We use mark/sweep garbage collection and reference counting to keep memory demands low. Our primary target platform is the Macintosh LC-II with 2-4Mb of memory.

    2. A side effect of 1) is that the interpreter seems to be fairly fast, particularly at looping and array mutation.

    3. We are trying to make R portable. We have used ANSI C and f2c (or native unix f77) with as little use of host operating system features as possible. At present we have verified portability to Unix, DOS, MacOS and we expect to be able to port easily to any system with an ANSI C Compiler.

    4. R should look familiar to S users, but some of the semantics are closer to those of Scheme.  For example, We have abandoned the idea of call-frames in favour of true lexical scoping because it provides a better way of retaining state from function call to function call.

    Ross Ihaka
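Point 4 of the notes, lexical scoping as a way of retaining state from call to call, is still visible in R today. The sketch below is a present-day illustration, not code from the 1993 email; make_counter is a hypothetical name.

```r
# A function that returns another function. The inner function keeps
# access to `count` in the environment where it was created.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # update the enclosing environment's variable
    count
  }
}

counter <- make_counter()
counter()         # first call
print(counter())  # second call: the state survived between calls
```

Each call to make_counter() creates a fresh environment, so separate counters do not share state.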


    Yellen: “A Challenging Decade and a Question for the Future”

    By | ai, bigdata, machinelearning

    From Fed Chair Janet Yellen: A Challenging Decade and a Question for the Future. Excerpt:

    A Key Question for the Future

    As the financial crisis and Great Recession fade into the past and the stance of monetary policy gradually returns to normal, a natural question concerns the possible future role of the unconventional policy tools we deployed after the onset of the crisis. My colleagues on the FOMC and I believe that, whenever possible, influencing short-term interest rates by targeting the federal funds rate should be our primary tool. As I have already noted, we have a long track record using this tool to pursue our statutory goals. In contrast, we have much more limited experience with using our securities holdings for that purpose.

    Where does this assessment leave our unconventional policy tools? I believe their deployment should be considered again if our conventional tool reaches its limit–that is, when the federal funds rate has reached its effective lower bound and the U.S. economy still needs further monetary policy accommodation.

    Does this mean that it will take another Great Recession for our unconventional tools to be used again? Not necessarily. Recent studies suggest that the neutral level of the federal funds rate appears to be much lower than it was in previous decades. Indeed, most FOMC participants now assess the longer-run value of the neutral federal funds rate as only 2-3/4 percent or so, compared with around 4-1/4 percent just a few years ago. With a low neutral federal funds rate, there will typically be less scope for the FOMC to reduce short-term interest rates in response to an economic downturn, raising the possibility that we may need to resort again to enhanced forward rate guidance and asset purchases to provide needed accommodation.

    Of course, substantial uncertainty surrounds any estimates of the neutral level of short-term interest rates. In this regard, there is an important asymmetry to consider. If the neutral rate turns out to be significantly higher than we currently estimate, it is less likely that we will have to deploy our unconventional tools again. In contrast, if the neutral rate is as low as we estimate or even lower, we will be glad to have our unconventional tools in our toolkit.

    The bottom line is that we must recognize that our unconventional tools might have to be used again. If we are indeed living in a low-neutral-rate world, a significantly less severe economic downturn than the Great Recession might be sufficient to drive short-term interest rates back to their effective lower bound.


    A Few Joyful Moments of A Software Developer

    By | ai, bigdata, machinelearning

    (This article was originally published at Yihui’s Blog on Yihui Xie | 谢益辉, and syndicated at StatsBlogs.)

    Maëlle Salmon announced her “new session” (a cute newborn) on Twitter, and I was delighted and honored to see the knitr hex logo on her blanket. Welcome to the woRld, little Émile!

    [Image: knitr logo on blanket]

    That reminded me of a few other joyful moments. Occasionally, I receive thank-you notes and gifts from users, such as this card in a package:

    [Image: bookdown gift]

    and this picture in an email:

    [Image: drexel thank-you picture]

    The instructor in the above picture ran into me in early 2015 when I was looking for the classroom at UConn to teach my tutorial, and took a selfie with me. I was surprised and touched to see he put this picture on the projector and took another picture with his students a couple of months later. That was very encouraging to me.

    I also remember Scott Kostyshak kindly bought me a book as a gift a few years ago, which was the first physical gift I had received for any of my software packages. He discovered the link to my Amazon wish list, which was intentionally hidden deep in a GitHub repo at that time. I removed this link later since I found I didn’t really need more gifts. I don’t need donations, either.

    This post is just to show some random memories of mine as a software developer that have been motivating me to continue my work. Again, I’m not asking for thank-you notes or gifts or donations whatsoever. A knitr hex logo on your baby’s blanket can make me happy enough 🙂


    Please comment on the article here: Yihui’s Blog on Yihui Xie | 谢益辉

    The post A Few Joyful Moments of A Software Developer appeared first on All About Statistics.


    Because it's Friday: 30 days on a cargo ship

    By | ai, bigdata, machinelearning

    This time-lapse taken during a cargo ship's 30-day voyage from the Red Sea to Hong Kong is strangely hypnotic (via Kottke). In addition to the beautiful scenery, it also makes you appreciate the logistics behind loading and unloading a container ship! If you have a 4K monitor, be sure to watch it full-screen.

    That's all from the blog for this week. Enjoy your weekend, and we'll be back with more on Monday.
