Luckily, with IPython Notebook you can have the best of both worlds. IPython Notebook (specifically the rpy2 package) lets you seamlessly transfer objects between the Python and R environments. Below is a brief explanation and code snippet showing how data generated or processed in Python can be visualized using R’s ggplot. [In a hurry? Sample notebook over here.]
Step 1: Load R Kernel within IPython using rpy2 package
Previously, communication between R and IPython Notebook was handled by the rmagic extension. Most of this logic now lives in its own Python package, rpy2. You can install rpy2 using the following command: pip install rpy2 --upgrade. Once rpy2 is installed, you can initialize R within IPython Notebook using the rpy2.ipython extension, as shown below.
%load_ext rpy2.ipython
Step 2: Convert Data To Pandas Dataframe
If you already have some data available as a pandas dataframe, feel free to use that data in the next step (and skip this step). If not, let’s randomly draw 5000 points from a normal distribution using numpy and convert them to a pandas dataframe. In the next step we will pass this dataframe to R’s ggplot library and plot the density curve.
import pandas as pd
import numpy as np

data = np.random.randn(5000, 1)
df = pd.DataFrame(data, columns=["value"])
Step 3: Using %%R cell magic function
Finally, use the %%R cell magic function and pass df (the Python object pointing to the pandas dataframe) using the -i parameter. The rpy2 package makes it available within R’s environment by applying the necessary transformations. Now we can do anything with this data, including visualizing it with R’s ggplot library.
%%R -i df -w 800 -h 480 -u px
library(ggplot2)
ggplot(df) + geom_density(aes(x=value))
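Some of the important parameters that can be passed to the %%R magic function are -i (Python variables to pass into R), -o (R variables to return to Python), -w and -h (width and height of the plot) and -u (units for the width and height, e.g. px).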
Full documentation on parameters can be found over here.
Reference:
1. Revolution Analytics’ Blog On Using R With Jupyter Notebook
2. Stack Overflow
1. %run -i: Running another notebook in the context of the current Python kernel
One of the fundamental tenets of programming is to avoid duplicating code. That was one of the issues I always had with IPython Notebook. There are always a few classes/functions that you use across different notebooks. Initially I used to copy these functions into each notebook. However, with the %run magic function I finally found a solution to this problem. The %run magic function allows you to run another notebook in the context of the current Python kernel.
Assuming you defined all the common classes/functions in “common.ipynb” and you want to incorporate those in another notebook (say projectA.ipynb), then invoke the below command to make them available in projectA.ipynb.
%run -i common.ipynb
2. Progress Bars: Keep a check on your iterators.
Progress bars are a nice way to keep track of the processing time remaining. As shown below, IPython Notebook makes it pretty easy to include a nice-looking progress bar in your notebooks.
from time import sleep

from ipywidgets import FloatProgress
from IPython.display import display

f = FloatProgress(min=0, max=100)
display(f)

# Increment value of the progress bar within the iterator
for i in range(100):
    sleep(0.1)
    f.value = i + 1  # i + 1 so that the bar reaches 100 on the last step
(Yikes! So much code to get a progress bar.) If you feel like me, then you should install the tqdm package. It makes adding a progress bar with minimal code a breeze.
from time import sleep

from tqdm import trange

for i in trange(100):
    sleep(0.1)
3. Unit Testing: Make sure your functions/classes are working fine
Testing code is important, and it’s easy to include unit tests in your IPython notebook. Below is an example of how to incorporate unittest.
import unittest


# Define Person class
class Person(object):
    def __init__(self, name, age):
        self.__name = name
        self.__age = age

    @property
    def name(self):
        return self.__name

    @property
    def age(self):
        return self.__age

    def __str__(self):
        return "{} ({})".format(self.name, self.age)

    def __eq__(self, other):
        return self.name == other.name and self.age == other.age


# Define unit test
class PersonTest(unittest.TestCase):
    def test_initialization(self):
        p1 = Person("xyz", 10)
        self.assertEqual("xyz", p1.name)
        self.assertEqual(10, p1.age)

    def test_equality(self):
        p1 = Person("xyz", 10)
        p2 = Person("xyz", 10)
        self.assertEqual(p1, p2)


# Run unit test
suite = unittest.TestLoader().loadTestsFromTestCase(PersonTest)
unittest.TextTestRunner().run(suite)
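In a notebook it can also be handy to check the outcome programmatically instead of reading the text report. The following is a minimal sketch using only the standard `unittest` API; the `AdditionTest` class is a hypothetical stand-in for your own test case.

```python
import io
import unittest


class AdditionTest(unittest.TestCase):
    """Hypothetical test case, used only to illustrate the runner."""

    def test_sum(self):
        self.assertEqual(4, 2 + 2)

    def test_concat(self):
        self.assertEqual("ab", "a" + "b")


# Load every test_* method of the class into a suite.
suite = unittest.TestLoader().loadTestsFromTestCase(AdditionTest)

# Send the report to a string buffer instead of stderr.
stream = io.StringIO()
result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)

print(stream.getvalue())
print("tests run:", result.testsRun, "all passed:", result.wasSuccessful())
```

The returned result object also exposes `failures` and `errors` lists, which is useful if a later cell should only run when the tests pass.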
4. Use R’s ggplot to visualize data
Both Python and R have their own pros and cons. Luckily you can have the best of both worlds within IPython Notebook. Using the rpy2 Python package you can seamlessly transfer data/objects between the Python and R environments. Check out more about this in another blog post of mine over here.
where
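The covariance matrix for a dataset with independent features is a diagonal matrix. For a diagonal matrix we can easily show that its determinant is the product of its diagonal elements and that its inverse is again a diagonal matrix, with the reciprocals of those elements on the diagonal.
Using the above two properties of the diagonal matrix, we can show that equation 1 is essentially the same as equation 2 when the features are independent. Let’s first tackle the determinant in equation 1. Since the determinant of a diagonal matrix is equal to the product of its diagonal elements, we can rewrite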
Now let’s focus on the exponential part in equation 1. Using 3, we can show that
Now can be written as . Thus
Replacing 1 with 5 and 7 we get
Hence proved.
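The equations of this proof did not survive in the text above, so here is a sketch of the standard argument. It assumes that equation 1 is the multivariate Gaussian density with mean $\mu$ and covariance $\Sigma$, and that equation 2 is the product of univariate Gaussian densities with variances $\sigma_i^2$; the symbols are my own and may differ from the original numbering.

```latex
\begin{aligned}
p(x) &= \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
        \exp\!\left(-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right),
        \qquad \Sigma = \operatorname{diag}(\sigma_1^2,\dots,\sigma_n^2) \\
|\Sigma| &= \prod_{i=1}^{n}\sigma_i^2, \qquad
\Sigma^{-1} = \operatorname{diag}\!\left(\sigma_1^{-2},\dots,\sigma_n^{-2}\right) \\
(x-\mu)^{T}\Sigma^{-1}(x-\mu) &= \sum_{i=1}^{n}\frac{(x_i-\mu_i)^2}{\sigma_i^2} \\
p(x) &= \frac{1}{(2\pi)^{n/2}\prod_{i=1}^{n}\sigma_i}
        \exp\!\left(-\sum_{i=1}^{n}\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right)
      = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma_i}
        \exp\!\left(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right)
\end{aligned}
```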
In order to write a custom UDAF you need to extend UserDefinedAggregateFunction and define the following four methods:
- initialize: On a given node, this method is called once for each group.
- update: For a given group, Spark will call “update” for each input record of that group.
- merge: If the function supports partial aggregates, Spark might (as an optimization) compute partial results and combine them together.
- evaluate: Once all the entries for a group are exhausted, Spark will call evaluate to get the final result.
Depending on whether the function supports the combiner option or not, the order of execution can vary in the following two ways:
[Figure: order of execution if the function supports partial aggregates]
You can read more about the execution pattern in my earlier blog on custom UDAF in hive.
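The four-method lifecycle can be mimicked in plain Python to see why the result is the same with or without partial aggregation. This is only an illustrative sketch, not Spark code: the buffer layout `[sum, count]` and the two pretend partitions are assumptions made for the example.

```python
from functools import reduce


def initialize():
    # Fresh buffer for a group: [sum, count]
    return [0.0, 0]


def update(buffer, value):
    # Fold one input record into the buffer
    buffer[0] += value
    buffer[1] += 1
    return buffer


def merge(buffer1, buffer2):
    # Combine two partial aggregates of the same group
    return [buffer1[0] + buffer2[0], buffer1[1] + buffer2[1]]


def evaluate(buffer):
    # Produce the final result once the group is exhausted
    return buffer[0] / buffer[1]


# Hypothetical data: one group whose records live on two partitions
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0]]

# With partial aggregation: initialize + update per partition, then merge
partials = [reduce(update, part, initialize()) for part in partitions]
with_merge = evaluate(reduce(merge, partials))

# Without partial aggregation: a single buffer sees every record
all_records = [v for part in partitions for v in part]
without_merge = evaluate(reduce(update, all_records, initialize()))

print(with_merge, without_merge)
```

Keeping the sum and the count in the buffer (rather than a running mean) is what makes `merge` a simple element-wise addition.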
Apart from defining the above four methods, you also need to define the input, intermediate and final data types. Below is an example showing how to write a custom function that computes the mean.
package com.myudafs

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

/**
 * Created by ragrawal on 9/23/15.
 * Computes Mean
 */

// Extend UserDefinedAggregateFunction to write a custom aggregate function.
// You can also specify any constructor arguments. For instance, you
// can have CustomMean(arg1: Int, arg2: String)
class CustomMean() extends UserDefinedAggregateFunction {

  // Input Data Type Schema
  def inputSchema: StructType = StructType(Array(StructField("item", DoubleType)))

  // Intermediate Schema
  def bufferSchema = StructType(Array(
    StructField("sum", DoubleType),
    StructField("cnt", LongType)
  ))

  // Returned Data Type
  def dataType: DataType = DoubleType

  // Self-explaining
  def deterministic = true

  // This function is called whenever key changes
  def initialize(buffer: MutableAggregationBuffer) = {
    buffer(0) = 0.toDouble // set sum to zero
    buffer(1) = 0L         // set number of items to 0
  }

  // Iterate over each entry of a group
  def update(buffer: MutableAggregationBuffer, input: Row) = {
    buffer(0) = buffer.getDouble(0) + input.getDouble(0)
    buffer(1) = buffer.getLong(1) + 1
  }

  // Merge two partial aggregates
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row) = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  // Called after all the entries are exhausted.
  def evaluate(buffer: Row) = {
    buffer.getDouble(0) / buffer.getLong(1).toDouble
  }
}
Below is the code that shows how to use UDAF with dataframe.
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.functions._
import com.myudafs.CustomMean

// define UDAF
val customMean = new CustomMean()

// create test dataset
val data = (1 to 1000).map{x: Int => x match {
  case t if t <= 500 => Row("A", t.toDouble)
  case t => Row("B", t.toDouble)
}}

// create schema of the test dataset
val schema = StructType(Array(
  StructField("key", StringType),
  StructField("value", DoubleType)
))

// construct data frame
val rdd = sc.parallelize(data)
val df = sqlContext.createDataFrame(rdd, schema)

// Calculate average value for each group
df.groupBy("key").agg(
  customMean(df.col("value")).as("custom_mean"),
  avg("value").as("avg")
).show()
Output should be
key | custom_mean | avg |
---|---|---|
A | 250.5 | 250.5 |
B | 750.5 | 750.5 |
Few shortcomings of the UserDefinedAggregateFunction class
As a motivating example, assume we are given some student data containing the student’s name, subject and score, and we want to convert the numerical score into ordinal categories based on the following logic: A for a score of 80 or above, B for 60 or above, C for 35 or above, and D otherwise.
Below is the relevant python code if you are using pyspark.
# Generate Random Data
import itertools
import random

students = ['John', 'Mike', 'Matt']
subjects = ['Math', 'Sci', 'Geography', 'History']
random.seed(1)
data = []
for (student, subject) in itertools.product(students, subjects):
    data.append((student, subject, random.randint(0, 100)))

# Create Schema Object
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

schema = StructType([
    StructField("student", StringType(), nullable=False),
    StructField("subject", StringType(), nullable=False),
    StructField("score", IntegerType(), nullable=False)
])

# Create DataFrame
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)
rdd = sc.parallelize(data)
df = sqlContext.createDataFrame(rdd, schema)

# Define udf
from pyspark.sql.functions import udf


def scoreToCategory(score):
    if score >= 80:
        return 'A'
    elif score >= 60:
        return 'B'
    elif score >= 35:
        return 'C'
    else:
        return 'D'


udfScoreToCategory = udf(scoreToCategory, StringType())
df.withColumn("category", udfScoreToCategory("score")).show(10)
The first part of the code (under “Generate Random Data”) is basic Python. We are generating a random dataset that looks something like this:
student | subject | score |
---|---|---|
John | Math | 13 |
… | … | … |
Mike | Sci | 45 |
Mike | Geography | 65 |
… | … | … |
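The next two parts (under “Create Schema Object” and “Create DataFrame”) deal with constructing the dataframe. The main part of the code is under “Define udf”. We first define our function in the normal Python way and then wrap it with udf, declaring StringType as its return type, so that it can be applied to a dataframe column.
Below is a Scala example of the same: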
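Because the function wrapped by `udf` is an ordinary Python function, it can be checked at the category boundaries before it ever touches Spark. A minimal sketch, with the thresholds copied from the post and a hypothetical list of boundary scores:

```python
def score_to_category(score):
    """Same thresholds as the udf in the post: 80, 60 and 35."""
    if score >= 80:
        return 'A'
    elif score >= 60:
        return 'B'
    elif score >= 35:
        return 'C'
    return 'D'


# Scores sitting on and just below each threshold
boundary_scores = [100, 80, 79, 60, 59, 35, 34, 0]
categories = [score_to_category(s) for s in boundary_scores]

for score, category in zip(boundary_scores, categories):
    print(score, category)
```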
// Construct Dummy Data
import util.Random
import org.apache.spark.sql.Row

implicit class Crossable[X](xs: Traversable[X]) {
  def cross[Y](ys: Traversable[Y]) = for { x <- xs; y <- ys } yield (x, y)
}

val students = Seq("John", "Mike", "Matt")
val subjects = Seq("Math", "Sci", "Geography", "History")
val random = new Random(1)
val data = (students cross subjects).map{x => Row(x._1, x._2, random.nextInt(100))}.toSeq

// Create Schema Object
import org.apache.spark.sql.types.{StructType, StructField, IntegerType, StringType}

val schema = StructType(Array(
  StructField("student", StringType, nullable=false),
  StructField("subject", StringType, nullable=false),
  StructField("score", IntegerType, nullable=false)
))

// Create DataFrame
import org.apache.spark.sql.hive.HiveContext

val rdd = sc.parallelize(data)
val df = sqlContext.createDataFrame(rdd, schema)

// Define udf
import org.apache.spark.sql.functions.udf

def udfScoreToCategory = udf((score: Int) => {
  score match {
    case t if t >= 80 => "A"
    case t if t >= 60 => "B"
    case t if t >= 35 => "C"
    case _ => "D"
  }})

df.withColumn("category", udfScoreToCategory(df("score"))).show(10)
Compared to the earlier Hive version, this is much more efficient because it uses combiners (so that we can do map-side computation) and stores at most N records per key at any given time, on both the mapper and the reducer side.
import heapq


def takeOrderedByKey(self, num, sortValue=None, reverse=False):

    def init(a):
        return [a]

    def combine(agg, a):
        agg.append(a)
        return getTopN(agg)

    def merge(a, b):
        agg = a + b
        return getTopN(agg)

    def getTopN(agg):
        if reverse:
            return heapq.nlargest(num, agg, sortValue)
        else:
            return heapq.nsmallest(num, agg, sortValue)

    return self.combineByKey(init, combine, merge)


# Create some fake student dataset. The objective is to identify the top 2
# students in each class based on GPA scores.
data = [
    ('ClassA', 'Student1', 3.89), ('ClassA', 'Student2', 3.13), ('ClassA', 'Student3', 3.87),
    ('ClassB', 'Student1', 2.89), ('ClassB', 'Student2', 3.13), ('ClassB', 'Student3', 3.97)
]

# Add takeOrderedByKey function to RDD class
from pyspark.rdd import RDD
RDD.takeOrderedByKey = takeOrderedByKey

# Load dataset
rdd1 = sc.parallelize(data).map(lambda x: (x[0], x))

# extract top 2 records in each class ordered by GPA in descending order
for i in rdd1.takeOrderedByKey(2, sortValue=lambda x: x[2], reverse=True).flatMap(lambda x: x[1]).collect():
    print(i)
Output of the above program is:
('ClassB', 'Student3', 3.97)
('ClassB', 'Student2', 3.13)
('ClassA', 'Student1', 3.89)
('ClassA', 'Student3', 3.87)
The key line to understand is the return statement of takeOrderedByKey. We use the combineByKey operator to split the dataset by key and then use the heap data structure to order the input records by GPA score. You can find a good explanation of the combineByKey operator on Adam Shinn’s blog.
Finally, note that in the call to takeOrderedByKey, x in sortValue = lambda x: x[2] refers to the value of the pair RDD (rdd1) created in the “Load dataset” step.
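To see how the three functions passed to combineByKey cooperate, the operator can be imitated in memory. The `combine_by_key` helper below is a hypothetical stand-in written for this illustration (it is not Spark's implementation); it splits the records into pretend partitions, builds per-partition results, then merges them.

```python
import heapq


def combine_by_key(pairs, create_combiner, merge_value, merge_combiners, num_partitions=2):
    """Tiny in-memory stand-in for RDD.combineByKey (illustrative only)."""
    # Split the records round-robin into pretend partitions
    partitions = [pairs[i::num_partitions] for i in range(num_partitions)]
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key in acc:
                acc[key] = merge_value(acc[key], value)
            else:
                acc[key] = create_combiner(value)
        partials.append(acc)
    # Merge the per-partition results key by key
    result = {}
    for acc in partials:
        for key, agg in acc.items():
            result[key] = merge_combiners(result[key], agg) if key in result else agg
    return result


def top_n(num, sort_value):
    """Build the three functions so that no list ever grows beyond num + 1 items."""
    def keep(agg):
        return heapq.nlargest(num, agg, key=sort_value)
    return (lambda a: [a],                    # first value seen for a key
            lambda agg, a: keep(agg + [a]),   # another value in the same partition
            lambda a, b: keep(a + b))         # results from two partitions


data = [
    ('ClassA', 'Student1', 3.89), ('ClassA', 'Student2', 3.13), ('ClassA', 'Student3', 3.87),
    ('ClassB', 'Student1', 2.89), ('ClassB', 'Student2', 3.13), ('ClassB', 'Student3', 3.97),
]
pairs = [(row[0], row) for row in data]
init, combine, merge = top_n(2, sort_value=lambda row: row[2])
top2 = combine_by_key(pairs, init, combine, merge)
print(top2['ClassA'])
print(top2['ClassB'])
```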
print(3 > 2)            # True
print([3] > [2])        # True
print([2, 1] > [2])     # True
print((2, 1) > (2,))    # True
print((2, 2) > (2, 2))  # False
Below is an example of how to use the above information to sort RDD based on multiple fields and extract top N records. Basically we return a tuple as the key.
# load dataset
data = sc.parallelize(...)

# Order by Col 1 in Desc Order and then Col 0 in ascending order
topN = data.takeOrdered(10, key=lambda x: (-1 * x[1], x[0]))
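The same tuple-as-key idea can be tried without Spark using `heapq.nsmallest`. This is a small sketch on hypothetical `(name, score)` records; negating a field to reverse its order only works for numeric fields.

```python
import heapq

# Hypothetical (name, score) records
records = [('carol', 90), ('alice', 95), ('bob', 90), ('dave', 70), ('erin', 95)]

# Score descending, then name ascending: negate the numeric field
top3 = heapq.nsmallest(3, records, key=lambda x: (-x[1], x[0]))
print(top3)
```

Ties on the score are broken by the second element of the key tuple, which is exactly how Python compares tuples element by element.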
Code References:
1. takeOrdered: Note that it uses MaxHeapQ to collect elements and order them.
2. MaxHeapQ: Uses the basic Python comparison operators to organize the heap.
For simplicity, as indicated by the blue line in Fig A., let’s assume the target distribution from which you want random samples is a truncated normal distribution with a domain of -3 to 3, i.e. $x \in [-3, 3]$. Also, as indicated by the dashed black line, assume a rectangular enveloping region around the target distribution that is bounded by (-3, 0) and (3, 0.4167).
If we randomly select x and y from this enveloping region and plot these points, as shown by the red and green dots, some of them will fall inside our target distribution and some outside of it. For any random point (x, y), if it falls within the target distribution, i.e. $y < P_{target}(X=x)$, then we accept it as a valid sample point. For instance, assume the random point is (-1, 0.15). At X=-1, the probability density of the target distribution is $P_{target}(X=-1) \approx 0.24$. Since $0.15 < 0.24$, we accept (-1, 0.15) as a valid point.
Let's convert the above idea into working Python code. After that we will look at how to create this enveloping region around our target distribution.
import random
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import truncnorm

# Domain of X
xdomain = [-3, 3]


def pdf(x):
    """
    Probability density function for Random Variable X from which we want
    to sample points. Here we assume we have truncated standard normal
    distribution in the domain of -3 to 3
    """
    return truncnorm.pdf(x, xdomain[0], xdomain[1])


def random_point_within_enveloping_region():
    """
    Return a random point within the rectangular enveloping region,
    i.e. x between -3 and 3 and y between 0 and 0.4167
    """
    x = random.uniform(xdomain[0], xdomain[1])
    y = random.uniform(0, 0.4167)
    return (x, y)


# Number of sample points to sample
n = 100

# Creating two arrays to capture accepted and rejected points
accepted = []
rejected = []

# Run this loop until we got required number of valid points
while len(accepted) < n:
    # Get random point
    x, y = random_point_within_enveloping_region()

    # If y is below blue curve then accept it
    if y < pdf(x):
        accepted.append((x, y))
    # otherwise reject it.
    else:
        rejected.append((x, y))

# Plot the graph
xs = np.linspace(xdomain[0], xdomain[1], 100)
plt.plot(xs, [pdf(i) for i in xs], color='blue')  # Plot Random Variable X
plt.plot(xs, [0.4167 for i in xs], color='black', ls='dashed', lw=2)  # Plot Enveloping Region
plt.plot([p[0] for p in accepted], [p[1] for p in accepted], 'go')  # Plot Accepted Points
plt.plot([p[0] for p in rejected], [p[1] for p in rejected], 'ro')  # Plot Rejected Points
plt.show()

# Calculate expected value for the truncated standard normal distribution
approxMean = sum([p[0] for p in accepted]) / len(accepted)
print("Expected Mean = ", 0, pdf(0))
print("Approximated Mean = ", approxMean, pdf(approxMean))
print("Approximated Variance = ", sum([(p[0] - approxMean)**2 for p in accepted]) / (len(accepted) - 1))
Expected Mean = 0 0.400022258921
Approximated Mean = -0.0896272908375 0.398418781625
Approximated Variance = 1.17167227825
Above, the term “enveloping region” was used loosely to denote some distribution function. Any distribution (such as Gaussian, uniform, etc.) can be used as an enveloping distribution as long as the following condition is met:
$P_{Envelop}(X=x) \ge P_{target}(X=x)$ for all $x$ in the domain of the target distribution. (A)
That is, at every possible value of x in the domain of the target distribution, the density of the envelope distribution at x must be at least as large as that of the target distribution. For instance, for the truncated normal distribution at X=0, $P_{target}(X=0) \approx 0.4$. Thus any enveloping distribution has to be at least 0.4 at the point X = 0.
In the above example I assumed a uniform distribution ranging from -3 to 3 as the envelope distribution. For this envelope distribution, $P_{Envelop}(X=x) = 1/6 \approx 0.167$ for every x. However, if we simply use this uniform distribution as it is, then we will be violating condition A above. For instance, since the envelope distribution is a uniform distribution, at X=0 we have $P_{Envelop}(X=0) = 0.167 < P_{target}(X=0) = 0.4$.
To overcome this challenge, rejection sampling introduces a multiplier constant “M”. In the above example I used M = 2.5. The reason is that the maximum height of the target distribution is 0.4 at X = 0, while for the given envelope distribution P(X=0) = 0.167. Thus M must be at least 0.4/0.167 ≈ 2.4, which I rounded up to 2.5 (this gives the envelope height 2.5/6 ≈ 0.4167 used in the code). Based on this multiplier, we can reformulate the condition for the enveloping distribution as follows:
$M \cdot P_{Envelop}(X=x) \ge P_{target}(X=x)$ for all $x$ in the domain of the target distribution. (B)
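When the target density can be evaluated, the smallest workable multiplier can be estimated numerically. The sketch below uses only the standard library; the grid scan and the helper names are my own choices, and a grid can miss a narrow peak, so treat the result as a lower bound to round up.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


LOW, HIGH = -3.0, 3.0
# Probability mass of the standard normal that lies inside [LOW, HIGH]
mass = std_normal_cdf(HIGH) - std_normal_cdf(LOW)


def target_pdf(x):
    """Standard normal truncated to [LOW, HIGH]."""
    return std_normal_pdf(x) / mass if LOW <= x <= HIGH else 0.0


envelope_height = 1.0 / (HIGH - LOW)  # uniform density on [-3, 3]

# Scan a grid for the tallest point of the target density
grid = [LOW + i * (HIGH - LOW) / 1000 for i in range(1001)]
peak = max(target_pdf(x) for x in grid)
min_M = peak / envelope_height

print("peak of target density:", peak)
print("smallest M satisfying the condition:", min_M)
```

Any M above this value works; a larger M keeps the sampler correct but rejects more points.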
In practice, finding the maximum height of the target distribution can be challenging. Also, we only need condition B to hold at the points we actually sample. Hence in practice we start with an arbitrary M. After each sample we check that the above condition holds (see the check on height_of_enveloping_region at the top of the sampling loop in the modified code below). If it does not, we increment M and restart the sampling from the beginning.
import random
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import truncnorm

# Domain of X
xdomain = [-3, 3]

# Multiplier Constant
M = 2.0


def pdf(x):
    """
    Probability density function for Random Variable X from which we want
    to sample points. Here we assume we have truncated standard normal
    distribution in the domain of -3 to 3
    """
    return truncnorm.pdf(x, xdomain[0], xdomain[1])


def random_point_within_enveloping_region():
    """
    Return random point within the enveloping region.
    For x we will randomly sample point between -3 and 3.
    Since we are assuming uniform distribution, the height of the
    envelope distribution at any x is 1/6, scaled by M. So for y we
    randomly sample point between 0 and M * 1/6
    """
    # Randomly sample x from -3 to 3
    x = random.uniform(xdomain[0], xdomain[1])
    # probability density of any x is equal to 1/6, i.e. height of the
    # enveloping region for any X is M * 1/6.
    y = random.uniform(0, M * 1.0 / 6.0)
    return (x, y)


def height_of_enveloping_region(x):
    """Return height of enveloping region at x."""
    return M * 1.0 / 6.0


# Number of sample points to sample
n = 100

# Creating two arrays to capture accepted and rejected points
accepted = []
rejected = []

# Run this loop until we got required number of valid points
while len(accepted) < n:
    # Get random point
    x, y = random_point_within_enveloping_region()

    # If for any x the enveloping region is below the distribution from which
    # we want to sample points, increment the multiplier constant and
    # resample all the points.
    if height_of_enveloping_region(x) < pdf(x):
        print("Increasing M from {0} to {1}".format(M, M + 1))
        accepted = []
        rejected = []
        M += 1.0
        continue

    # If y is below blue curve then accept it
    if y < pdf(x):
        accepted.append((x, y))
    # otherwise reject it.
    else:
        rejected.append((x, y))

xs = np.linspace(xdomain[0], xdomain[1], 100)
plt.plot(xs, [pdf(i) for i in xs], color='blue')
plt.plot(xs, [1.0 / 6 for i in xs], color='black', ls='dashed', lw=1)
plt.plot(xs, [M * 1.0 / 6 for i in xs], color='black', ls='dashed', lw=2)
plt.plot([p[0] for p in accepted], [p[1] for p in accepted], 'go')
plt.plot([p[0] for p in rejected], [p[1] for p in rejected], 'ro')
plt.show()

# Calculate expected value for the truncated standard normal distribution
approxMean = sum([p[0] for p in accepted]) / len(accepted)
print("Expected Mean = ", 0, pdf(0))
print("Approximated Mean = ", approxMean, pdf(approxMean))
print("Approximated Variance = ", sum([(p[0] - approxMean)**2 for p in accepted]) / (len(accepted) - 1))
See example over here
Since we are throwing away all the points that fall outside of the target distribution, one might ask why we don’t directly sample y in the “random_point_within_enveloping_region” function between 0 and P_{target}(X=x) (instead of P_{Envelop}(X=x)). The problem with this approach is that x is not uniformly distributed in our target distribution. If we sample points from a truncated normal distribution, we are likely to get far more values of x near 0 than near -3. However, in “random_point_within_enveloping_region” we are using a uniform distribution to sample x, so accepting every point would simply reproduce that uniform distribution.
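A quick experiment makes the difference visible. In the sketch below (standard library only, fixed seed, sample size chosen arbitrarily), the naive scheme keeps every uniformly drawn x, so its sample variance stays near 3, the variance of a uniform distribution on [-3, 3]; rejection sampling with M = 2.5 gives a variance just under 1, as expected for the truncated standard normal.

```python
import math
import random

random.seed(0)
LOW, HIGH = -3.0, 3.0
mass = math.erf(HIGH / math.sqrt(2.0))  # P(-3 <= Z <= 3) for a standard normal


def target_pdf(x):
    """Density of the standard normal truncated to [-3, 3]."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) / mass


def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


n = 20000

# Naive scheme: y is always drawn under the curve, so every x is kept
naive = [random.uniform(LOW, HIGH) for _ in range(n)]

# Rejection sampling with envelope height M / 6, M = 2.5
M = 2.5
accepted = []
while len(accepted) < n:
    x = random.uniform(LOW, HIGH)
    y = random.uniform(0.0, M / (HIGH - LOW))
    if y < target_pdf(x):
        accepted.append(x)

naive_var = variance(naive)
rejection_var = variance(accepted)
print("naive variance:", naive_var)
print("rejection sampling variance:", rejection_var)
```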
The intuition behind the MLE approach is to find the $\theta$ that maximizes the probability of getting the observed dataset ($D$), i.e. we want to maximize $P(D \mid \theta)$. Since the observations are independent and identically distributed (iid), we can write this probability as the product of the probabilities of the individual observations.
For practical purposes (to avoid numerical underflow), rather than maximizing the above equation directly we take its log and maximize that (since log is a monotonically increasing function, it doesn’t change our max estimate).
The equation above depends on the data only through the number of heads. Now, to maximize it, we take the derivative with respect to $\theta$ and equate it to zero. This gives us the maximum likelihood estimate for $\theta$.
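The equations for this step are missing above, so here is a sketch of the standard derivation for a coin toss. The symbols are assumptions of mine: $\theta$ is the probability of heads, $N$ the number of tosses and $n_H$ the number of heads.

```latex
\begin{aligned}
P(D \mid \theta) &= \prod_{i=1}^{N} P(x_i \mid \theta) = \theta^{n_H}(1-\theta)^{N-n_H} \\
\log P(D \mid \theta) &= n_H \log\theta + (N-n_H)\log(1-\theta) \\
\frac{\partial}{\partial\theta}\log P(D \mid \theta) &= \frac{n_H}{\theta} - \frac{N-n_H}{1-\theta} = 0
\;\;\Longrightarrow\;\; \hat{\theta}_{MLE} = \frac{n_H}{N}
\end{aligned}
```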
Note that the above solution is obtained by maximizing the probability of getting the observed dataset, i.e. $P(D \mid \theta)$. However, some scholars argue that in reality what we want to maximize is the probability of $\theta$ given the observed dataset, i.e. $P(\theta \mid D)$. Using the Bayesian approach this can be expanded as $P(\theta \mid D) = P(D \mid \theta)\,P(\theta) / P(D)$.
One advantage of the above approach is that it allows us to include prior knowledge. For instance, from past experience we all know that the probability of getting heads or tails with a fair coin is 0.5. But if a coin is tossed 10 times, we rarely get exactly 5 heads and 5 tails. That means we have some kind of distribution for $\theta$ that has a mean around 0.5 and a small variance. The Bayesian approach allows us to incorporate this belief mathematically. However, the downside of the Bayesian approach is that it quickly becomes a lot more difficult to find a closed-form solution. Further, the challenges are different when trying to solve the above equation analytically and practically.
Theoretically, the challenge is computing the probability of the evidence, i.e. $P(D)$. $P(D)$ can be represented as $\int P(D \mid \theta)\,P(\theta)\,d\theta$. However, solving this integral can be impossible unless we assume a conjugate prior. This will become clearer after we solve equation 1. For now, as shown below, let’s assume that we represent our prior belief about $\theta$ using a beta distribution. The two parameters ($\alpha$, $\beta$) of the beta distribution allow us to model our belief that a fair coin has equal chances of landing heads or tails.
By representing our prior using Beta distribution we can rewrite evidence as
Solving the above integral turns out to be simple and results in a constant value. Thus we can rewrite eqn 1. as
Note that the above posterior has the same form as the prior distribution (i.e. a beta distribution) but with different parameters: the first parameter of the posterior is the prior’s $\alpha$ plus the number of heads, and the second is the prior’s $\beta$ plus the number of tails. For a given likelihood distribution (here Bernoulli), if the choice of the prior distribution (here beta) results in a posterior of the same form, then that prior is known as a conjugate prior. For the Bernoulli distribution, the conjugate prior is the beta distribution. For the mean of a Gaussian distribution with known variance, the conjugate prior is the Gaussian distribution itself. For the beta distribution, the expected value of the parameter $\theta$ is given as $E[\theta] = \alpha / (\alpha + \beta)$.
Based on equation 2 and 3, we can calculate the expected value of $\theta$ as
Let’s assume $\alpha = \beta = 2$, then $E[\theta] \approx 0.71$. Thus, while as per MLE $\theta = 0.8$, as per the Bayesian estimator $\theta \approx 0.71$. In the case of Bayesian estimation, there are two different forces that are trying to pull $\theta$ in different directions. From eqn 4 we notice that the bigger our hyperparameters ($\alpha$, $\beta$), i.e. the stronger our belief in the prior, the stronger the pull towards our prior belief. On the other hand, the bigger the sample size N, the stronger the pull towards the estimate based on the dataset. If N is sufficiently large, then the Bayesian estimate and the MLE will converge. Another difference to note is that MLE gives a point estimate (i.e. it returns one value), whereas the Bayesian approach returns a distribution. For instance, above we got a beta distribution with certain alpha and beta parameters.
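The two pulls described above can be checked numerically. This sketch assumes a dataset of 8 heads in 10 tosses, which is what the exponents in the post's later Metropolis code imply together with a Beta(2, 2) prior; the function names are my own.

```python
def mle(heads, tosses):
    """Maximum likelihood estimate: fraction of heads."""
    return heads / tosses


def posterior_mean(heads, tosses, alpha, beta):
    """Mean of the Beta(alpha + heads, beta + tails) posterior."""
    return (heads + alpha) / (tosses + alpha + beta)


# Larger samples with the same 80% share of heads: the prior's pull fades
by_sample_size = {}
for scale in (1, 10, 100, 1000):
    heads, tosses = 8 * scale, 10 * scale
    by_sample_size[tosses] = posterior_mean(heads, tosses, alpha=2, beta=2)
    print(tosses, mle(heads, tosses), by_sample_size[tosses])

# Stronger prior with the same 10 tosses: the estimate moves towards 0.5
strong_prior = posterior_mean(8, 10, alpha=20, beta=20)
print("alpha = beta = 20:", strong_prior)
```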
From a practical perspective, computing the probability of the evidence is not an issue, since it is merely a normalizing constant. The expected value of $\theta$ can be represented as
As per the Monte Carlo method, the above integral can be approximated by taking a large number of samples from the probability distribution of $\theta$ and averaging the value of the integrand at these sample points, i.e.
However, the challenge is how to sample $\theta$. Random values of $\theta$ drawn from the prior distribution might not be good representatives of the sample space represented by the posterior. To overcome this challenge there are many different sampling techniques, importance sampling and Markov chain Monte Carlo among them. Below is a Python example demonstrating one of the simplest of these, the Metropolis sampling technique. I leave the discussion of these sampling techniques for some other time, but for now below is the Python code to compute the expected value of $\theta$ for the given problem.
Below I assume that the prior follows a beta distribution with $\alpha = \beta = 2$. Thus the integral that we need to solve is
Any constant can be dropped when using Monte Carlo approach. Hence dropping Beta(2,2) from the above equation, the integral that we need to solve from the Monte Carlo approach perspective is
import random
import matplotlib.pyplot as plt


def integral(theta):
    """
    This is likelihood X prior after removing any constant terms.
    """
    return theta**9 * (1 - theta)**3


n = 100  # Number of points to sample
samples = [random.random()]  # Random starting point

for i in range(n):
    # Grab last sample point
    theta = samples[-1]

    # Create new sample by adding noise drawn from a normal distribution
    # of mean = 0 and sd = 0.1. If the new sample is outside of the
    # domain then ignore it and use existing sample
    newTheta = theta + random.normalvariate(0, 0.1)
    if newTheta < 0 or newTheta > 1:
        newTheta = theta

    # Always accept a move uphill; accept a move downhill only with
    # probability equal to the ratio of the two (unnormalized) densities
    acceptanceRatio = integral(newTheta) / integral(theta)
    if acceptanceRatio > random.random():
        samples.append(newTheta)
    else:
        samples.append(theta)

print("Estimate: ", sum(samples) / len(samples))

# Plot how sample theta varies with each iteration
ylab = list(range(len(samples)))
plt.plot(samples, ylab)
plt.title('Random Walk Visualization')
plt.xlabel('Theta Value')
plt.ylabel('Time')
plt.show()
Running the above code gave a mean of 0.64. The plot shows the $\theta$ sampled at each iteration.
Note that the above code only sampled 100 points. Typically you sample a much larger number of points and also discard the first few, known as the burn-in period, so as to remove the influence of the starting point from the estimate. If we simply set n to a much larger value, the estimate for $\theta$ is 0.71. This is the same as what we got analytically by applying the Bayesian form.
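A version with a longer chain and an explicit burn-in might look like the following. It is a sketch under my own choices: a fixed seed, 50,000 kept samples, a burn-in of 1,000 and a starting point of 0.5, none of which come from the original post.

```python
import random

random.seed(42)


def unnormalized_posterior(theta):
    """Likelihood x prior without constant terms; zero outside (0, 1)."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    return theta ** 9 * (1 - theta) ** 3


n = 50000       # samples kept for the estimate
burn_in = 1000  # initial samples discarded
theta = 0.5     # starting point inside the domain
samples = []

for _ in range(n + burn_in):
    # Propose a small random step from the current position
    proposal = theta + random.normalvariate(0, 0.1)
    ratio = unnormalized_posterior(proposal) / unnormalized_posterior(theta)
    # Uphill moves (ratio >= 1) are always accepted, downhill moves
    # with probability equal to the ratio
    if random.random() < ratio:
        theta = proposal
    samples.append(theta)

kept = samples[burn_in:]
estimate = sum(kept) / len(kept)
print("Estimate after burn-in:", estimate)
```

Proposals outside (0, 1) get a ratio of zero and are therefore rejected, which has the same effect as the domain check in the code above.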
References and Notes
One of the best resources I found on this topic is a series of tutorials by Dr. Avinash Kak. The Python code is based on Hilbert’s blog on Monte Carlo Markov Chains.
Apart from the MLE and Bayesian approaches, another frequently used approach is Maximum A Posteriori (MAP). Similar to MLE, MAP gives a point estimate, but it also allows you to incorporate prior beliefs.
Below are the three steps to get a live-preview kind of setup.
I am assuming you already have homebrew installed. If not, please find instructions over here. Once you have homebrew, installing fswatch is a breeze.
# Homebrew
$ brew install fswatch
Copy the AppleScript below and save it in your home folder as reloadChrome.sh. This script tells the Google Chrome browser to reload its active tab.
#!/bin/sh
exec <"$0" || exit; read v; read v; exec /usr/bin/osascript - "$@"; exit
tell application "Google Chrome"
    reload active tab of window 1
end tell
Make sure that the script is executable: $ chmod +x ~/reloadChrome.sh
Now inside your sphinx docs folder run the following command from terminal.
fswatch -r -0 ../myproject/ | xargs -0 -I % sh -c "make html; ~/reloadChrome.sh"
Open a Google Chrome browser and, in the active tab, open your local sphinx documentation. Once done, start updating your docstrings. Every time you save a file, fswatch will fire an event, xargs will then execute the make html command, and finally reloadChrome.sh will reload the active tab. With two screens you will have a (nearly) live-preview kind of setup.
Enjoy writing your docstrings in reStructuredText format.