Thursday, September 10, 2015

Is the recycling getting picked up next week?

To develop my Python skills, I have been working through the Codecademy tutorials in Python for beginners. I think the best way to get comfortable with a new programming language is to work on my own projects. To that end, I wrote a simple script to solve a problem I run into with unfortunate regularity. The City of Knoxville collects trash every Monday and recycling every other Monday, and I have a terrible time remembering whether the next Monday will be a recycling pick-up day. This script tells me whether the coming Monday is a recycling day.

The conceptual flow for this script is:

1. Pull the current date from the computer

2. Calculate how many whole weeks have passed since 31 August 2015, a Monday when the recycling was picked up.

3. Determine whether an odd number of weeks has passed since that known pick-up week. If so, print a message stating that this coming Monday will be recycling day; if not, print a different message.
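The three steps above can be sketched in a few lines of Python. This is not the script in the gist; it is a minimal sketch written for Python 3 with the standard-library datetime module, assuming recycling was collected on Monday, 31 August 2015 and every second Monday after that. The names REFERENCE_MONDAY, next_monday_is_recycling and main are hypothetical.

```python
import datetime

# Assumption from the post: Monday, 31 August 2015 was a recycling day,
# and recycling is collected every other Monday after that.
REFERENCE_MONDAY = datetime.date(2015, 8, 31)


def next_monday_is_recycling(today):
    """Return True if the coming Monday is a recycling day."""
    # Step 2: whole weeks elapsed since the reference Monday.
    weeks_since = (today - REFERENCE_MONDAY).days // 7
    # Step 3: in an odd-numbered week, the coming Monday falls an even
    # number of weeks after the reference Monday, so it is a recycling day.
    return weeks_since % 2 == 1


def main():
    # Step 1: pull the current date from the computer.
    today = datetime.date.today()
    if next_monday_is_recycling(today):
        print("This coming Monday is recycling day.")
    else:
        print("No recycling this coming Monday, just trash.")


if __name__ == "__main__":
    main()
```

For example, on Thursday, 10 September 2015 (ten days, or one whole week, after the reference Monday) the sketch reports that Monday, 14 September is recycling day.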

This script uses the 'datetime' module from the Python 2.7.3 standard library. It was neat to see the differences and similarities between Python and R. Importing a module in Python ('import datetime') is a lot like loading a package in R with library(). I like the dotted syntax, as in the line 'now = datetime.datetime.now()': the first 'datetime' is the module, the second 'datetime' is a class defined in that module, and 'now()' is a method of that class that returns the current date and time.
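The module/class/method layering in 'datetime.datetime.now()' can be seen in a short Python 3 snippet. This is an illustrative sketch, not part of the original script; the variable names and the DateTime alias are hypothetical, while now(), weekday() and the datetime class are real parts of the standard library.

```python
import datetime

# The first 'datetime' is the module, the second is the datetime class it
# defines, and now() is a class method that returns the current local time.
now = datetime.datetime.now()

# weekday() counts Monday as 0 and Sunday as 6, which is handy for a
# "next Monday" calculation (0 here means today is Monday).
days_until_monday = (7 - now.weekday()) % 7
print("Current time:", now)
print("Days until Monday:", days_until_monday)

# Importing the class itself shortens the call; the alias avoids hiding
# the module name 'datetime'.
from datetime import datetime as DateTime
print(DateTime is datetime.datetime)  # True: both names refer to the same class
```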

The code for the script can be found at:

https://gist.github.com/dwalke44/3d50b824de04e294bf9d

Tuesday, September 8, 2015

Fun with RMarkdown

This document is a practice exercise in RMarkdown: generating simulated data, plotting with ggplot2, and writing functions.
Packages to include: stats, ggplot2
  1. Generating new data from given distributions
    • Normally Distributed - n = 100, mean = 0, SD = 1
    library(stats)
    library(ggplot2)
    Normaldist<- rnorm(100, 0, 1)
    qplot(Normaldist, geom = 'histogram', main = 'Normally Distributed Data')
    ## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
    • Poisson Distribution - n = 1000, mean = 50
    Poisdist<- rpois(1000, 50)
    qplot(Poisdist, geom = 'histogram', main = 'Poisson Distributed Data')
    ## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
    • Binomial Distribution - 10 replicates of 100 coin tosses - ntotal = 1000, chance of success (heads) = 0.5
    cointoss<-rbinom(10, 100, 0.5)
    trials<-1:10
    trials.success<-data.frame(trials, cointoss)
    #geom = 'col' draws bars whose heights are the values in cointoss
    cointossplot<-qplot(trials, cointoss, data = trials.success, geom = 'col', xlab = 'Trial Number', ylab = 'Number of Heads', main = 'Number of Heads per Trial (10 Trials of 100 Tosses)')
    #trials is numeric, so the default x-axis breaks do not land on every trial; state them explicitly
    cointossplot + scale_x_continuous(breaks = 1:10)
  2. Function to generate summary statistics about the Normal dataset.
    #A function that returns a table of the mean, variance, and standard deviation of the input dataset
    sum.stat1<- function(x){
      #Name the columns directly instead of assigning to a local variable called 'mean', which would mask the mean() function
      result<-data.frame(mean = mean(x), variance = var(x), standarddev = sd(x))
    
      return(result)
    }
    
    #Running the function with the normally distributed simulated data
    fxn1<-sum.stat1(Normaldist)
    fxn1
    ##        mean  variance standarddev
    ## 1 0.1409396 0.9489135   0.9741219
    #Compare the results of my function to the summary() function
    
    fxn2<-summary(Normaldist)
    fxn2
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ## -2.3670 -0.5213  0.1364  0.1409  0.8603  2.7350