admin

The Notes Of Machine Learning On Coursera (Week 9-11)
19
2018/07

Week 9 Anomaly Detection And Recommender System

Possible applications: fraud detection, detecting defective aircraft engines, and so on. We flag an example x as anomalous when its probability under the model falls below a threshold, p(x) < epsilon, to catch unusual examples or unusual behavior.

Gaussian Distribution (Normal Distribution)

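A Gaussian distribution is described by two parameters: the mean mu and the variance sigma squared. Given a data set, we average the examples to get mu, then compute sigma squared as the average of the squared deviations from mu. In machine learning we usually divide by m instead of m-1 in the variance formula; for a large training set the difference is negligible.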
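The estimation step above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function names `fit_gaussian` and `gaussian_pdf` and the sample data are assumptions made for the example.

```python
import math


def fit_gaussian(values):
    """Estimate mu and sigma squared, dividing by m (not m - 1)."""
    m = len(values)
    mu = sum(values) / m
    sigma2 = sum((v - mu) ** 2 for v in values) / m
    return mu, sigma2


def gaussian_pdf(x, mu, sigma2):
    """Density of a univariate Gaussian with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


# Hypothetical one-feature data set.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma2 = fit_gaussian(data)
print(mu, sigma2)  # 5.0 4.0
```

Values far from mu get a small density, which is what the threshold test p(x) < epsilon relies on.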

Building an Anomaly Detection System

1. When we derive the model, we multiply p(x_1), p(x_2), ..., p(x_n) together, whether or not the independence assumption actually holds.
2. Evaluating the anomaly detection system: we label an example as 1 if it is anomalous and as 0 if it is normal. Using the same normal or anomalous examples for both the cross-validation set and the test set is not recommended, though some people still do it. Because the data is very skewed, precision/recall (or the F1 score) is a better metric than accuracy. We can also use the cross-validation set to choose the parameter epsilon.
3. Anomaly Detection vs. Supervised Learning: since the examples are labelled, why not just use supervised learning? Use anomaly detection when we have only a small number of positive (anomalous) examples, or when there are many different types of anomalies, which makes it hard to learn from the positive examples. In some cases, once enough positive examples have been collected, an anomaly detection problem can shift over to a supervised learning method.
4. Choosing what features to use: if a plot shows that some feature is non-Gaussian, we can transform it first to make it more Gaussian, for example x_2 <- log(x_2 + 1). Error analysis to come up with new features: if p(x) is comparable for normal and anomalous examples, we can look at the particular anomalous example that was missed and consider whether a new feature would distinguish it and improve performance.
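The following sketch ties steps 1 and 2 together: it fits one Gaussian per feature on normal examples, multiplies the per-feature densities to get p(x), and then scans candidate thresholds on a labelled cross-validation set, keeping the epsilon with the best F1 score. All names and the tiny data set are assumptions made for illustration.

```python
import math


def fit(X):
    """Per-feature mean and variance (dividing by m) from normal examples."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2 = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, sigma2


def p(x, mu, sigma2):
    """Product of the per-feature Gaussian densities."""
    prob = 1.0
    for xj, mj, sj in zip(x, mu, sigma2):
        prob *= math.exp(-((xj - mj) ** 2) / (2 * sj)) / math.sqrt(2 * math.pi * sj)
    return prob


def f1_score(y_true, y_pred):
    tp = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 1)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def choose_epsilon(p_cv, y_cv, steps=1000):
    """Scan thresholds between min and max p(x); keep the best F1."""
    lo, hi = min(p_cv), max(p_cv)
    best_eps, best_f1 = lo, 0.0
    for k in range(1, steps):
        eps = lo + (hi - lo) * k / steps
        preds = [1 if pv < eps else 0 for pv in p_cv]
        score = f1_score(y_cv, preds)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1


# Hypothetical data: training set of normal examples, labelled CV set.
X_train = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.0], [0.8, 2.2], [1.0, 1.8]]
X_cv = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [3.0, 0.5]]
y_cv = [0, 0, 0, 1]  # 1 = anomaly

mu, sigma2 = fit(X_train)
p_cv = [p(x, mu, sigma2) for x in X_cv]
epsilon, best_f1 = choose_epsilon(p_cv, y_cv)
flags = [1 if pv < epsilon else 0 for pv in p_cv]
```

Accuracy would look good here even for a model that never flags anything, which is why the threshold is scored with F1 instead.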

Multivariate Gaussian Distribution

Here the capital Sigma is the covariance matrix (an n*n matrix), which also lets us express the correlation between x_1 and x_2. The original model corresponds to a multivariate Gaussian whose contours of p(x; mu, Sigma) are axis-aligned, which means that Sigma has zeros off the diagonal. The multivariate Gaussian model automatically captures correlations between different features in x. If the multivariate Gaussian algorithm does not perform well or Sigma is singular, a possible cause is that the features are redundant (like x_2 = x_1, or x_3 = x_4 + x_5).
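To make these points concrete, here is a small sketch restricted to two features so that the determinant and inverse of Sigma can be written in closed form. The function names are assumptions for the example. It shows that a diagonal Sigma reproduces the original product model, and that redundant features (x_2 = x_1) give a singular Sigma.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def mvn_pdf_2d(x, mu, Sigma):
    """Multivariate Gaussian density for n = 2 features."""
    (a, b), (c, d) = Sigma
    det = a * d - b * c
    if det <= 0:
        raise ValueError("Sigma is singular or not a valid covariance matrix")
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    # (x - mu)^T Sigma^{-1} (x - mu), using the closed-form 2x2 inverse.
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


# Diagonal Sigma: same value as multiplying the two univariate densities.
diag = mvn_pdf_2d([1.0, 2.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 9.0]])
prod = gaussian_pdf(1.0, 0.0, 4.0) * gaussian_pdf(2.0, 0.0, 9.0)

# Positively correlated features: a point along the correlation is more
# likely than a point at the same distance against it.
corr = [[1.0, 0.8], [0.8, 1.0]]
along = mvn_pdf_2d([1.0, 1.0], [0.0, 0.0], corr)
against = mvn_pdf_2d([1.0, -1.0], [0.0, 0.0], corr)
```

Calling `mvn_pdf_2d` with `[[1.0, 1.0], [1.0, 1.0]]`, the covariance of two identical features, raises the `ValueError` because the determinant is zero.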

Recommender System

Recommender systems are one of the major applications of machine learning, and they are a different setting in which the algorithm can learn the features automatically.
Problem formulation: movie rating. We introduce the matrix r (r(i,j) = 1 if user j has rated movie i) and the matrix y (y(i,j) is that rating) in order to recommend movies that a user has not watched yet.
Content-Based Recommendations: given features x for each movie, this algorithm learns a parameter vector theta for each user j.
Collaborative Filtering: this one learns the feature vector x of each movie, so the algorithm obtains the features by itself. From the users' preferences (theta) we infer the scores of the movies (for example, how much action and how much romance each contains). It is necessary to initialize x and theta to small random values rather than zeros to break the symmetry.
Simultaneously learning the parameters and features: instead of alternating between the two steps, we can minimize a single cost function J over x and theta at the same time.
Mean normalization: if some user has not rated any movie, we first subtract each movie's mean rating mu from its ratings, minimize J on the normalized ratings, and add mu back when predicting, so that this user is recommended movies by their average rating. Besides, if we need to merge data from different websites, we also need to perform feature scaling.
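A minimal sketch of the regularized collaborative filtering cost and of mean normalization, under the assumption that rows of Y and R are movies and columns are users. The function names and the small rating matrix are hypothetical.

```python
def mean_normalize(Y, R):
    """Subtract each movie's mean rating, computed over rated entries only."""
    Y_norm, mu = [], []
    for y_row, r_row in zip(Y, R):
        rated = [y for y, r in zip(y_row, r_row) if r == 1]
        mean = sum(rated) / len(rated) if rated else 0.0
        mu.append(mean)
        Y_norm.append([y - mean if r == 1 else 0.0 for y, r in zip(y_row, r_row)])
    return Y_norm, mu


def cofi_cost(X, Theta, Y, R, lam):
    """Cost J over movie features X and user parameters Theta together."""
    cost = 0.0
    for i, x in enumerate(X):
        for j, theta in enumerate(Theta):
            if R[i][j] == 1:  # only rated (movie, user) pairs contribute
                pred = sum(t * v for t, v in zip(theta, x))
                cost += (pred - Y[i][j]) ** 2
    reg = sum(v * v for row in X for v in row) + sum(v * v for row in Theta for v in row)
    return 0.5 * cost + 0.5 * lam * reg


def predict(x, theta, mu_i):
    """Predicted rating: theta . x plus the movie's mean added back."""
    return sum(t * v for t, v in zip(theta, x)) + mu_i


# Three movies, three users; the third user has rated nothing.
Y = [[5.0, 4.0, 0.0], [1.0, 2.0, 0.0], [4.0, 5.0, 0.0]]
R = [[1, 1, 0], [1, 1, 0], [1, 1, 0]]
Y_norm, mu = mean_normalize(Y, R)

# With no ratings, only the regularization term touches this user's theta,
# so minimizing J leaves it at zero and the prediction falls back to mu.
theta_new_user = [0.0, 0.0]
x_movie0 = [0.3, -0.2]  # arbitrary learned features, for illustration
fallback = predict(x_movie0, theta_new_user, mu[0])
```

Without mean normalization the same user would be predicted a rating of 0 for every movie, which is not a useful recommendation.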

Week 10 Large Scale Machine Learning

The MapReduce method is also applicable to training neural networks.

Stochastic Gradient Descent And Mini-Batch Gradient Descent

In stochastic gradient descent, each iteration updates the parameters using only one example. To monitor convergence, we usually plot the cost averaged over the last 1000 examples or so instead of computing J over all of the millions of examples, which is much cheaper computationally. If the curve fluctuates strongly, averaging over more examples may help; if J increases, using a smaller learning rate may help. We can also slowly decrease alpha during learning to help the parameters settle near the minimum, but this adds extra constants to tune.
With a good vectorized implementation, mini-batch gradient descent may be a better choice.
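As a rough sketch of both variants, the function below runs gradient descent for linear regression with a configurable batch size: `batch_size=1` gives stochastic gradient descent, and a larger value gives mini-batch gradient descent. It also records the cost averaged over a window of recent examples, as described above. The function name, hyperparameters and noise-free toy data are assumptions for the example, and the inner loops stand in for what a vectorized implementation would do in one step.

```python
import random


def minibatch_gd(X, y, alpha=0.1, batch_size=1, epochs=500, window=100, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    theta = [0.0] * n
    idx = list(range(len(X)))
    recent, averaged_costs = [], []
    for _ in range(epochs):
        rng.shuffle(idx)  # visit the examples in a fresh random order
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            grad = [0.0] * n
            for i in batch:
                err = sum(t * v for t, v in zip(theta, X[i])) - y[i]
                recent.append(0.5 * err * err)  # cost before the update
                for k in range(n):
                    grad[k] += err * X[i][k]
            theta = [t - alpha * g / len(batch) for t, g in zip(theta, grad)]
            if len(recent) >= window:
                averaged_costs.append(sum(recent) / len(recent))
                recent = []
    return theta, averaged_costs


# Toy data generated from y = 1 + 2x; the first feature is the intercept.
xs = [i / 19 for i in range(20)]
X = [[1.0, x] for x in xs]
y = [1.0 + 2.0 * x for x in xs]

theta_sgd, costs_sgd = minibatch_gd(X, y, batch_size=1)
theta_mb, costs_mb = minibatch_gd(X, y, batch_size=5)
```

Plotting `costs_sgd` gives the averaged curve used to judge convergence; a larger `window` smooths it at the price of slower feedback.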

Online Learning

This is performed when we are dealing with a continuous stream of data and examples instead of a fixed training set. Each example will be used only once and will then be dropped. It is equally important to select good features during online learning.
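A minimal sketch of this idea, using logistic regression updated one example at a time. The stream is simulated by a generator, and the "offered price" feature and the rule that users accept low prices are purely hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def online_update(theta, x, y, alpha=0.5):
    """One gradient step on a single example, which is then discarded."""
    h = sigmoid(sum(t * v for t, v in zip(theta, x)))
    return [t - alpha * (h - y) * v for t, v in zip(theta, x)]


def example_stream(n, seed=0):
    """Simulated stream: a user accepts (y = 1) when the price is low."""
    rng = random.Random(seed)
    for _ in range(n):
        price = rng.uniform(0.0, 1.0)
        yield [1.0, price], (1 if price < 0.5 else 0)


theta = [0.0, 0.0]
for x, y in example_stream(5000):
    theta = online_update(theta, x, y)  # nothing is stored after this step

p_low = sigmoid(theta[0] + theta[1] * 0.1)   # predicted acceptance at a low price
p_high = sigmoid(theta[0] + theta[1] * 0.9)  # predicted acceptance at a high price
```

Because no training set is kept, the model can keep adapting if user behavior drifts over time.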

Map Reduce

We can split the training set evenly, assign the parts to different computers, and then combine the results on a central server.
If a computer has more than one CPU or CPU core, we can also run the same process in parallel on a single computer.
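The split-and-combine pattern can be sketched for the gradient of linear regression. Each chunk's partial sum is computed independently (the map step) and the partial sums are then added together (the reduce step). The built-in `map` runs sequentially here; in a real setup each chunk would go to a different machine or core. All names and data are assumptions for the example.

```python
from functools import reduce


def partial_gradient(args):
    """Map step: sum of per-example gradients over one chunk."""
    theta, chunk = args
    grad = [0.0] * len(theta)
    for x, y in chunk:
        err = sum(t * v for t, v in zip(theta, x)) - y
        for k, v in enumerate(x):
            grad[k] += err * v
    return grad


def split(data, parts):
    """Split the data into at most `parts` chunks of nearly equal size."""
    size = (len(data) + parts - 1) // parts
    return [data[i:i + size] for i in range(0, len(data), size)]


def mapreduce_gradient(theta, data, parts=4):
    partials = map(partial_gradient, [(theta, chunk) for chunk in split(data, parts)])
    # Reduce step: add the partial sums, then divide by m once.
    total = reduce(lambda a, b: [u + v for u, v in zip(a, b)], partials)
    return [g / len(data) for g in total]


# Hypothetical data set of (features, target) pairs.
data = [([1.0, i / 10], 3.0 * i / 10 - 1.0) for i in range(10)]
theta = [0.5, 0.5]
combined = mapreduce_gradient(theta, data, parts=4)
single = [g / len(data) for g in partial_gradient((theta, data))]
```

The combined gradient matches the one computed on a single machine up to floating-point rounding, which is why the approach works for any algorithm whose update is a sum over the training set.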

Week 11 Photo OCR

Problem Description And Pipeline:

A "machine learning pipeline" is a system with many stages or components, several of which may use machine learning. For Photo OCR the stages are text detection, character segmentation, and character recognition.

Artificial Data:

When we need a large amount of data, there are two main ways to create artificial data: synthesize it from scratch, or create it from existing examples. When applying rotations, translations, or Gaussian noise, we must consider which distortions are meaningful for our examples.
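As a small sketch of creating new examples from an existing one, the code below shifts a grayscale image (stored as a list of rows with pixel values in [0, 1]) and adds Gaussian noise. The function names, the 3x3 image and the noise level are assumptions for illustration; whether such distortions are meaningful depends on the data.

```python
import random


def translate(image, dx, dy, fill=0.0):
    """Shift the image dx pixels right and dy pixels down, padding with fill."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = image[r][c]
    return out


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip the result back to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in image]


def augment(image, shifts, sigma=0.05, seed=0):
    """One noisy, shifted copy of the image per (dx, dy) pair."""
    rng = random.Random(seed)
    return [add_gaussian_noise(translate(image, dx, dy), sigma, rng) for dx, dy in shifts]


original = [[0.0, 1.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 1.0, 0.0]]
synthetic = augment(original, shifts=[(1, 0), (-1, 0), (0, 1)])
```

Each synthetic image keeps the label of the original, so one labelled example turns into several.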

Ceiling Analysis

We should not rush into one particular piece of work; we should do a ceiling analysis first. It helps us decide how to allocate resources, that is, which component of a machine learning pipeline deserves more effort, and avoid wasting time on work that will not significantly improve overall performance.
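The bookkeeping behind a ceiling analysis is simple enough to sketch. Starting from the baseline accuracy of the whole system, we replace one stage at a time with perfect (ground-truth) output, cumulatively, and record how much the overall accuracy improves. The stage names and percentages below are illustrative assumptions, not measured results.

```python
def ceiling_gains(baseline, stages):
    """Accuracy gained as each stage, in order, is made perfect."""
    gains, previous = [], baseline
    for name, accuracy in stages:
        gains.append((name, accuracy - previous))
        previous = accuracy
    return gains


# Overall accuracy (in percent) after giving each stage perfect output,
# cumulatively from the first stage to the last. Hypothetical numbers.
baseline = 72
stages = [
    ("text detection", 89),
    ("character segmentation", 90),
    ("character recognition", 100),
]

gains = ceiling_gains(baseline, stages)
best_stage = max(gains, key=lambda item: item[1])[0]
print(gains)       # [('text detection', 17), ('character segmentation', 1), ('character recognition', 10)]
print(best_stage)  # text detection
```

With these numbers, even a perfect character segmentation stage would add only one point, so effort is better spent elsewhere.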

Last modification:March 13th, 2019 at 07:06 pm