A Brief Look at Hadoop's Basic Sorting Algorithm (MergeSort)


2018-12-04 08:39:17

Tags: Hadoop, basic sorting algorithm, MergeSort

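I. Preface

1. Terminology:

·MS: MergeSort (merge sort)

·IS: InsertSort (insertion sort)

·QS: QuickSort (quick sort)

·Hadoop version: the analysis is based on the Version 2.7.1 source code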

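2. The idea behind MS (merge sort):

Split the array into two sub-arrays and sort each sub-array recursively. Once the sub-arrays are sorted, merge them into the corresponding index range of the destination array dest. At the top level of the recursion the merged index range covers 0 to (length-1), i.e. the whole array.

An animation is available here: https://www.jianshu.com/p/7d037c332a9d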
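The idea described above can be sketched as a textbook top-down merge sort. This is only a minimal illustration on plain ints in natural ascending order, not Hadoop's code; the class name StandardMergeSort and its helper methods are made up for this example.

```java
import java.util.Arrays;

// Hypothetical textbook merge sort, for illustration only (not Hadoop's implementation).
final class StandardMergeSort {

  // Sorts a[low, high) using tmp (same length as a) as scratch space.
  static void sort(int[] a, int[] tmp, int low, int high) {
    if (high - low < 2) {
      return; // zero or one element is already sorted
    }
    int mid = (low + high) >>> 1;
    sort(a, tmp, low, mid);   // sort the left half
    sort(a, tmp, mid, high);  // sort the right half
    merge(a, tmp, low, mid, high);
  }

  // Merges the sorted runs a[low, mid) and a[mid, high) back into a[low, high).
  static void merge(int[] a, int[] tmp, int low, int mid, int high) {
    System.arraycopy(a, low, tmp, low, high - low);
    int p = low;
    int q = mid;
    for (int i = low; i < high; i++) {
      // Take from the left run when the right run is exhausted,
      // or when the left head is not greater than the right head.
      if (q >= high || (p < mid && tmp[p] <= tmp[q])) {
        a[i] = tmp[p++];
      } else {
        a[i] = tmp[q++];
      }
    }
  }

  // Convenience wrapper: returns a sorted copy and leaves the input untouched.
  static int[] sorted(int[] input) {
    int[] a = Arrays.copyOf(input, input.length);
    sort(a, new int[a.length], 0, a.length);
    return a;
  }
}
```

Like the Hadoop code analyzed below, the sketch works on half-open ranges [low, high), so the top-level call passes 0 and the array length.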

3. Purpose:

To analyze whether Hadoop's MS differs from the standard MS, and whether it contains any optimizations.

II. Content

Source code:

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.util;

import java.util.Comparator;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.io.IntWritable;

/** An implementation of the core algorithm of MergeSort. */
@InterfaceAudience.LimitedPrivate({"MapReduce"})
@InterfaceStability.Unstable
public class MergeSort {
  //Reusable IntWritables
  IntWritable I = new IntWritable(0);
  IntWritable J = new IntWritable(0);
  
  //the comparator that the algo should use
  private Comparator<IntWritable> comparator;
  
  public MergeSort(Comparator<IntWritable> comparator) {
    this.comparator = comparator;
  }
  
  public void mergeSort(int src[], int dest[], int low, int high) {
    int length = high - low;

    // Insertion sort on smallest arrays
    if (length < 7) {
      for (int i=low; i<high; i++) {
        for (int j=i;j > low; j--) {
          I.set(dest[j-1]);
          J.set(dest[j]);
          if (comparator.compare(I, J)>0)
            swap(dest, j, j-1);
        }
      }
      return;
    }

    // Recursively sort halves of dest into src
    int mid = (low + high) >>> 1;
    mergeSort(dest, src, low, mid);
    mergeSort(dest, src, mid, high);

    I.set(src[mid-1]);
    J.set(src[mid]);
    // If list is already sorted, just copy from src to dest.  This is an
    // optimization that results in faster sorts for nearly ordered lists.
    if (comparator.compare(I, J) <= 0) {
      System.arraycopy(src, low, dest, low, length);
      return;
    }

    // Merge sorted halves (now in src) into dest
    for (int i = low, p = low, q = mid; i < high; i++) {
      if (q < high && p < mid) {
        I.set(src[p]);
        J.set(src[q]);
      }
      if (q>=high || p<mid && comparator.compare(I, J) <= 0)
        dest[i] = src[p++];
      else
        dest[i] = src[q++];
    }
  }

  private void swap(int x[], int a, int b) {
    int t = x[a];
    x[a] = x[b];
    x[b] = t;
  }
}

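1. When the length of a sub-array is less than 7, IS is used to sort it instead of recursing further. This is similar to the length < 13 threshold in Hadoop's QS.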

// Insertion sort on smallest arrays
if (length < 7) {
  for (int i=low; i<high; i++) {
    for (int j=i;j > low; j--) {
      I.set(dest[j-1]);
      J.set(dest[j]);
      if (comparator.compare(I, J)>0)
        swap(dest, j, j-1);
    }
  }
  return;
}
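To experiment with this step in isolation, the small-range insertion sort can be sketched without any Hadoop dependency. The class name SmallRangeInsertionSort is hypothetical, and a Comparator<Integer> stands in for the Comparator<IntWritable> of the original. One deliberate difference: the sketch leaves the inner loop as soon as the element is in place, whereas the Hadoop loop keeps comparing all the way down to low.

```java
import java.util.Comparator;

// Hypothetical stand-alone version of the small-range insertion sort (illustration only).
final class SmallRangeInsertionSort {

  // Sorts dest[low, high) in place according to cmp.
  static void sort(int[] dest, int low, int high, Comparator<Integer> cmp) {
    for (int i = low; i < high; i++) {
      // Sink dest[i] towards low until its left neighbour is not greater.
      // Unlike the Hadoop loop, this stops once the element is in place.
      for (int j = i; j > low && cmp.compare(dest[j - 1], dest[j]) > 0; j--) {
        int t = dest[j];
        dest[j] = dest[j - 1];
        dest[j - 1] = t;
      }
    }
  }
}
```

Only the range [low, high) is touched; elements outside it keep their positions.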

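2. The range is then split in half and each half is sorted recursively, which is the same approach as the standard MS. Note that src and dest swap roles in the recursive calls, so the sorted halves end up in src, ready to be merged into dest.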

// Recursively sort halves of dest into src
int mid = (low + high) >>> 1;
mergeSort(dest, src, low, mid);
mergeSort(dest, src, mid, high);
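The midpoint is computed with an unsigned right shift rather than a division by 2. A small sketch of why this idiom is commonly preferred: if low + high overflows int, the unsigned shift still yields the correct non-negative midpoint, while the division yields a negative index. The class and method names below are made up for the illustration; index ranges this large are unlikely in practice, but the idiom costs nothing.

```java
// Illustration of why (low + high) >>> 1 is preferred over (low + high) / 2.
final class MidpointDemo {

  // Treats the (possibly overflowed) sum as an unsigned value before halving.
  static int midUnsignedShift(int low, int high) {
    return (low + high) >>> 1;
  }

  // Signed division: gives a negative result once low + high overflows.
  static int midDivision(int low, int high) {
    return (low + high) / 2;
  }
}
```

For small indexes both forms agree. For low = 1 << 30 and high = Integer.MAX_VALUE, the shift form returns 1610612735 while the division form returns a negative value.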

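3. If the two sorted sub-arrays are already in order when concatenated, that is, the last element of the left half (src[mid-1]) is not greater than the first element of the right half (src[mid]), the range is copied straight from src to dest and the merge is skipped. This is an optimization for nearly ordered input.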

I.set(src[mid-1]);
J.set(src[mid]);
// If list is already sorted, just copy from src to dest.  This is an
// optimization that results in faster sorts for nearly ordered lists.
if (comparator.compare(I, J) <= 0) {
  System.arraycopy(src, low, dest, low, length);
  return;
}
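The shortcut can be tried on its own with a small helper. This is a sketch on plain ints under the assumption that both halves of src are already sorted; the class OrderedShortcutDemo and the method copyIfOrdered are hypothetical names, not part of Hadoop.

```java
// Hypothetical helper showing the "already ordered" shortcut on plain ints.
final class OrderedShortcutDemo {

  // Assumes src[low, mid) and src[mid, high) are each sorted.
  // If the last element of the left half is not greater than the first element
  // of the right half, the whole range is sorted: copy it to dest and return true.
  // Otherwise leave dest untouched and return false (a real merge is needed).
  static boolean copyIfOrdered(int[] src, int[] dest, int low, int mid, int high) {
    if (src[mid - 1] <= src[mid]) {
      System.arraycopy(src, low, dest, low, high - low);
      return true;
    }
    return false;
  }
}
```

When the check succeeds, a single comparison and one bulk copy replace the element-by-element merge, which is why nearly ordered input benefits.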

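4. Otherwise, the two sorted halves in src are merged: their elements are placed into the dest array one by one, in ascending order according to the comparator.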

// Merge sorted halves (now in src) into dest
for (int i = low, p = low, q = mid; i < high; i++) {
  if (q < high && p < mid) {
    I.set(src[p]);
    J.set(src[q]);
  }
  if (q>=high || p<mid && comparator.compare(I, J) <= 0)
    dest[i] = src[p++];
  else
    dest[i] = src[q++];
}
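Putting the four steps together, the routine can be tried out with a dependency-free port. This is a sketch under stated assumptions, not the Hadoop class itself: IntMergeSortSketch is a hypothetical name, Comparator<Integer> replaces Comparator<IntWritable>, and the static sort wrapper is added for convenience. The wrapper also shows a calling convention implied by the code: because the insertion sort at the leaves works on whichever array is currently dest, src and dest must hold the same data before the first call.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical dependency-free port of the routine analyzed above (experiments only).
final class IntMergeSortSketch {
  private final Comparator<Integer> cmp;

  IntMergeSortSketch(Comparator<Integer> cmp) {
    this.cmp = cmp;
  }

  // Sorts the range [low, high) into dest; src must initially equal dest.
  void mergeSort(int[] src, int[] dest, int low, int high) {
    int length = high - low;

    // Step 1: insertion sort on small ranges.
    if (length < 7) {
      for (int i = low; i < high; i++) {
        for (int j = i; j > low; j--) {
          if (cmp.compare(dest[j - 1], dest[j]) > 0) {
            int t = dest[j];
            dest[j] = dest[j - 1];
            dest[j - 1] = t;
          }
        }
      }
      return;
    }

    // Step 2: sort both halves with src and dest swapped, so results land in src.
    int mid = (low + high) >>> 1;
    mergeSort(dest, src, low, mid);
    mergeSort(dest, src, mid, high);

    // Step 3: halves already in order, so a bulk copy is enough.
    if (cmp.compare(src[mid - 1], src[mid]) <= 0) {
      System.arraycopy(src, low, dest, low, length);
      return;
    }

    // Step 4: merge the sorted halves (now in src) into dest.
    for (int i = low, p = low, q = mid; i < high; i++) {
      if (q >= high || (p < mid && cmp.compare(src[p], src[q]) <= 0)) {
        dest[i] = src[p++];
      } else {
        dest[i] = src[q++];
      }
    }
  }

  // Convenience wrapper (not in the original): both arrays start as copies of input.
  static int[] sort(int[] input, Comparator<Integer> cmp) {
    int[] dest = Arrays.copyOf(input, input.length);
    int[] src = Arrays.copyOf(input, input.length);
    new IntMergeSortSketch(cmp).mergeSort(src, dest, 0, dest.length);
    return dest;
  }
}
```

A call such as IntMergeSortSketch.sort(values, Comparator.naturalOrder()) returns a sorted copy; passing Comparator.reverseOrder() sorts in descending order. Boxing every int for the comparator is a simplification of this sketch, whereas the original reuses two IntWritable instances.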

