RApiDatetime 0.0.2


(This article was first published on Thinking inside the box, and kindly contributed to R-bloggers)

Two days after the initial 0.0.1 release, a new version of RApiDatetime has just arrived on CRAN.

RApiDatetime provides six entry points for C-level functions of the R API for Date and Datetime calculations. The functions asPOSIXlt and asPOSIXct convert between the long and compact datetime representations, formatPOSIXlt and Rstrptime convert to and from character strings, and POSIXlt2D and D2POSIXlt convert between Date and POSIXlt datetime objects. These six functions are all fairly essential and useful, but not one of them was previously exported by R.
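At the R level, these entry points correspond to familiar base functions. A quick orientation sketch in plain R (my illustration; the package itself exposes the C-level equivalents to other packages):

ct <- as.POSIXct("2017-03-25 10:30:00", tz = "UTC")   # compact representation
lt <- as.POSIXlt(ct)                                  # long (component) representation
format(lt, "%Y-%m-%d %H:%M")                          # cf. formatPOSIXlt
strptime("2017-03-25 10:30", "%Y-%m-%d %H:%M")        # cf. Rstrptime
as.Date(lt)                                           # cf. POSIXlt2D
as.POSIXlt(Sys.Date())                                # cf. D2POSIXlt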

Josh Ulrich took one hard look at the package — and added the one line we needed to enable the Windows support that was missing in the initial release. We now build on all platforms supported by R and CRAN. Otherwise, I just added a NEWS file and called it a bugfix release.

Changes in RApiDatetime version 0.0.2 (2017-03-25)

  • Windows support has been added (Josh Ulrich in #1)

Changes in RApiDatetime version 0.0.1 (2017-03-23)

  • Initial release with six accessible functions

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the rapidatetime page.

For questions or comments please use the issue tracker of the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog.





BLS: Unemployment Rates “significantly lower in February in 10 states”, Arkansas and Oregon at New Lows



From the BLS: Regional and State Employment and Unemployment Summary

Unemployment rates were significantly lower in February in 10 states, higher in 1 state, and stable in 39 states and the District of Columbia, the U.S. Bureau of Labor Statistics reported today. Nine states had notable jobless rate decreases from a year earlier, and 41 states and the District had no significant change. The national unemployment rate, at 4.7 percent, was little changed from January but 0.2 percentage point lower than in February 2016.

New Hampshire had the lowest unemployment rate in February, 2.7 percent, closely followed by Hawaii and South Dakota, 2.8 percent each, and Colorado and North Dakota, 2.9 percent each. The rates in both Arkansas (3.7 percent) and Oregon (4.0 percent) set new series lows. … New Mexico had the highest jobless rate, 6.8 percent, followed by Alaska and Alabama, 6.4 percent and 6.2 percent, respectively.
(emphasis added)

[Graph: state unemployment rates. Click on graph for larger image.]

This graph shows the current unemployment rate for each state (red), and the max during the recession (blue). All states are well below the maximum unemployment rate for the recession.

The size of the blue bar indicates the amount of improvement. The yellow squares are the lowest unemployment rate per state since 1976.

The states are ranked by the highest current unemployment rate. New Mexico, at 6.8%, had the highest state unemployment rate.

[Graph: number of states at or above given unemployment rates]

The second graph shows the number of states (and D.C.) with unemployment rates at or above certain levels since January 2006. At the worst of the employment recession, there were 11 states with an unemployment rate at or above 11% (red).

Currently no state has an unemployment rate at or above 7% (light blue); only three states are at or above 6% (dark blue). The states are New Mexico (6.8%), Alaska (6.4%), and Alabama (6.2%).



Let’s accept the idea that treatment effects vary—not as something special but just as a matter of course


(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Tyler Cowen writes:

Does knowing the price lower your enjoyment of goods and services?

I [Cowen] don’t quite agree with this as stated, as the experience of enjoying a bargain can make it more pleasurable, or at least I have seen this for many people. Some in fact enjoy the bargain only, not the actual good or service. Nonetheless here is the abstract [of a recent article by Kelly Haws, Brent McFerran, and Joseph Redden]:

Prices are typically critical to consumption decisions, but can the presence of price impact enjoyment over the course of an experience? We examine the effect of price on consumers’ satisfaction over the course of consumption. We find that, compared to when no pricing information is available, the presence of prices accelerates satiation (i.e., enjoyment declines faster). . . .

I have no special thoughts on pricing and enjoyment, nor am I criticizing the paper by Haws et al., which I have not had the opportunity to read (see P.S. below).

The thing I did want to talk about was Cowen’s implicit assumption in his header that a treatment has a fixed effect. It’s clear that Cowen doesn’t believe this—the very first sentence of this post recognizes variation—so it’s not that he’s making this conceptual error. Rather, my problem here is that the whole discussion, by default, is taking place on the turf of constant effects.

The idea, I think, is that you first establish the effect and then you look for interactions. But if interactions are the entire story—as seems plausible here—that main-effect-first approach will be a disaster. Just as it was with power pose, priming, etc.
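To make this concrete, here is a small simulated example (my illustration, nothing from the paper): suppose the effect is +1 in one subgroup and -1 in the other. The main effect is then roughly zero, and the establish-the-effect-first approach finds nothing, even though the treatment matters everywhere:

set.seed(1)
n <- 1000
group <- rbinom(n, 1, 0.5)                  # subgroup indicator
treat <- rbinom(n, 1, 0.5)                  # randomized treatment
y <- ifelse(group == 1, 1, -1) * treat + rnorm(n)
coef(lm(y ~ treat))["treat"]                # "the effect": near zero
coef(lm(y ~ treat * group))["treat:group"]  # interaction: near 2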

Framing questions in terms of “the effect” can be a hard habit to break.

P.S.

I was thinking of just paying the $35.95 but, just from the very fact of knowing the price, my satiation increased and my enjoyment declined and I couldn’t bring myself to do it. In future, perhaps Elsevier can learn from its own research and try some hidden pricing: Click on this article and we’ll remove a random amount of money from your bank account! That sort of thing.






Debugging Pipelines in R with Bizarro Pipe and Eager Assignment


This is a note on debugging magrittr pipelines in R using Bizarro Pipe and eager assignment.

[Image: a moth]

Pipes in R

The magrittr R package supplies an operator called “pipe”, written as “%>%”. The pipe operator is famous partly due to its extensive use in dplyr and by dplyr users. It is roughly described as allowing one to write “sin(5)” as “5 %>% sin”, and was inspired by F#’s pipe-forward operator “|>”, which itself is defined as:

    let (|>) x f = f x

The magrittr pipe doesn’t actually perform the above substitution directly. As a consequence, “5 %>% sin” is evaluated in a different environment than “sin(5)” would be (unlike F#’s “|>”), and the actual implementation is fairly involved.

The environment change is demonstrated below:

library("dplyr")
f <- function(...) {print(parent.frame())}

f(5)
## <environment: R_GlobalEnv>

5 %>% f
## <environment: 0x...>

Pipes are like any other coding feature: if you code with it you are eventually going to have to debug with it. Exact pipe semantics and implementation details are important when debugging, as one tries to control execution sequence and examine values and environments while debugging.

A Debugging Example

Consider the following example taken from the “Chaining” section of “Introduction to dplyr”.

library("dplyr")
library("nycflights13")

flights %>%
    group_by(year, month, day) %>%
    select(arr_delay, dep_delay) %>%
    summarise(
        arr = mean(arr_delay, na.rm = TRUE),
        dep = mean(dep_delay, na.rm = TRUE)
    ) %>%
    filter(arr > 30 | dep > 30)
## Adding missing grouping variables: `year`, `month`, `day`
## Source: local data frame [49 x 5]
## Groups: year, month [11]
## 
##     year month   day      arr      dep
##    <int> <int> <int>    <dbl>    <dbl>
## 1   2013     1    16 34.24736 24.61287
## 2   2013     1    31 32.60285 28.65836
## ...

A beginning dplyr user might wonder at the meaning of the warning “Adding missing grouping variables: `year`, `month`, `day`“. Similarly, a veteran dplyr user may wonder why we bother with a dplyr::select(), as selection is implied in the following dplyr::summarise(); but this is the example code as we found it.

Using Bizarro Pipe

We can run down the cause of the warning quickly by performing the mechanical translation from a magrittr pipeline to a Bizarro pipeline. This is simply making all the first arguments explicit with “dot” and replacing the operator “%>%” with the Bizarro pipe glyph: “->.;“.
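For instance, applied to a one-liner (a tiny sketch, not part of the original example), “5 %>% sin” becomes:

5 ->.; sin(.)
## [1] -0.9589243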

We can re-run the modified code by pasting into R‘s command console and the warning now lands much nearer the cause (even when we paste or execute the entire pipeline at once):

flights ->.;
  group_by(., year, month, day) ->.;
  select(., arr_delay, dep_delay) ->.;
## Adding missing grouping variables: `year`, `month`, `day`
  summarise(.,
          arr = mean(arr_delay, na.rm = TRUE),
          dep = mean(dep_delay, na.rm = TRUE)
  ) ->.;
  filter(., arr > 30 | dep > 30)
## Source: local data frame [49 x 5]
## Groups: year, month [11]
## 
##     year month   day      arr      dep
##    <int> <int> <int>    <dbl>    <dbl>
## 1   2013     1    16 34.24736 24.61287
## 2   2013     1    31 32.60285 28.65836
## ...

We can now clearly see the warning was issued by dplyr::select() (even though we just pasted in the whole block of commands at once). This means that, despite help(select) saying “select() keeps only the variables you mention”, this example depends on the (useful) accommodation that dplyr::select() preserves grouping columns in addition to user-specified columns (though this accommodation is not made for columns specified in dplyr::arrange()).

A Caveat

To capture a value from a Bizarro pipe we must make an assignment at the end of the pipe, not the beginning. The following will not work as it would capture only the value after the first line (“flights ->.;“) and not the value at the end of the pipeline.

One must not write:


VARIABLE <- 
  flights ->.;
  group_by(., year, month, day)

To capture pipeline results we must write:


flights ->.;
  group_by(., year, month, day) -> VARIABLE

I think the right assignment is very readable if you have the discipline to only use pipe operators as line-enders, making assignments the unique lines without pipes. Also, leaving an extra line break after assignments helps with readability.

Making Things More Eager

A remaining issue: Bizarro pipe only makes the composition eager. For a data structure with additional lazy semantics (such as dplyr’s view of a remote SQL system) we would still not see the warning near the cause.

Unfortunately different dplyr backends give different warnings, so we can’t demonstrate the same warning here. We can, however, deliberately introduce an error and show how to localize errors in the presence of lazy eval data structures. In the example below I have misspelled “month” as “moth”. Notice the error is again not seen until printing, long after we finished composing the pipeline.

s <- dplyr::src_sqlite(":memory:", create = TRUE)                                 
flts <- dplyr::copy_to(s, flights)

flts ->.;
  group_by(., year, moth, day) ->.;
  select(., arr_delay, dep_delay) ->.;
  summarise(.,
          arr = mean(arr_delay, na.rm = TRUE),
          dep = mean(dep_delay, na.rm = TRUE)
          ) ->.;
  filter(., arr > 30 | dep > 30)

## Source:   query [?? x 5]
## Database: sqlite 3.11.1 [:memory:]
## Groups: year, moth

## na.rm not needed in SQL: NULL are always dropped
## na.rm not needed in SQL: NULL are always dropped
##  Error in rsqlite_send_query(conn@ptr, statement) : no such column: moth

We can try to force dplyr into eager evaluation using the eager value landing operator “replyr::`%->%`” (from the replyr package) to form the “extra eager” Bizarro glyph: “%->%.;”.


When we re-write the code in terms of the extra eager Bizarro glyph we get the following.

install.packages("replyr")
library("replyr")

flts %->%.;
  group_by(., year, moth, day) %->%.;
## Error in rsqlite_send_query(conn@ptr, statement) : no such column: moth
  select(., arr_delay, dep_delay) %->%.;
  summarise(.,
          arr = mean(arr_delay, na.rm = TRUE),
          dep = mean(dep_delay, na.rm = TRUE)
          ) %->%.;
## na.rm not needed in SQL: NULL are always dropped
## na.rm not needed in SQL: NULL are always dropped
  filter(., arr > 30 | dep > 30)
## Source:   query [?? x 5]
## Database: sqlite 3.11.1 [:memory:]

Notice we have successfully localized the error.

Nota Bene

One thing to be careful with in “dot debugging”: when a statement such as dplyr::select() errors out, the Bizarro assignment on that line does not occur (normal R exception semantics). Thus “dot” will still be carrying the value from the previous line, and a pasted block of code will continue after the failing line using this older data state found in “dot”. So you may see strange results and additional errors later in the pipeline. The debugging advice is: at most the first error message is trustworthy.
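A tiny sketch of this failure mode (hypothetical code, pasted line by line into the console so execution continues past the error):

"a" ->.;
toupper(.) ->.;
stop("boom") ->.;
## Error: boom
nchar(.)   # "." still holds "A" from the line before the error
## [1] 1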

The Trick

The trick is to train your eyes to read “->.;” or “%->%.;” as a single atomic or indivisible glyph, and not as a sequence of operators, variables, and separators. I see Bizarro pipe as a kind of strange superhero.

Conclusion

Pipes are a fun notation, and even the original magrittr package experiments with a number of interesting variations of them. I hope you add Bizarro pipe (which, it turns out, has been available in R all along, without requiring any packages!) and the extra eager Bizarro pipe to your debugging workflow.






I’m hiring!


I need a part-time remote assistant to help keep my websites up to date, among other things!

Thanks to my generous Patreon supporters, I can hire someone to help me out for 8-20 hours per month, paying $15/hr. More info and application form at this link.

Please let me know if you have any questions or if there are any problems with the form. Email me at [email protected] or tweet me at @becomingdatasci.

I look forward to reading the applications!!

