Satellite Data Preprocessing with Python/PCI II – Cloud Masking, Haze Removal, Terrain Derivatives – ATCOR

GEO214/409/402 Python Data Preprocessing Code – Section II

22.03.2018:  Section II: MASKING/HAZEREM/TERSETUP/ATCOR/RESAMP processing

Batch multispectral data preprocessing for RapidEye and Sentinel-2 data (L1C) in PCI Geomatica 2017. A reader for two of the sessions in GEOG214 and GEO402/GEOG409 at FSU Jena (BSc Geography / MSc Geoinformatics).

Section I: NITF file import and reprojection in a given directory:

  1. imports NITF RE L1B data to PIX files, including all available metadata
  2. reprojects the files to UTM WGS84 CC
  3. resamples the data files to low-resolution copies at 30 and 100 m
  4. reads header information and georeferencing to a text report
  5. reads simple band statistics to a text report

Section II:   Cloud masking/haze removal/terrain analysis – full processing in the Atmospheric Correction package ATCOR.

  1. Calculate a cloud, haze and water mask
  2. Remove the haze from the satellite data
  3. Calculate DEM derivatives: slope/aspect/shadow-cast layer
    1. Optional: lowpass filter the DEM
    2. slope/aspect/shadow-cast layer calculation
  4. Atmospheric correction on the haze-corrected input, calculating the local incidence angle and outputting 16-bit scaled, topographically normalized reflectance.

Section III:  R statistics integration using rpy2 and numpy

Section IV:   Integration of GRASS binaries into workflows 

Section V:     Integration of AGISOFT


1. Loading libraries and defining the locale environment

Here we load the PCI libraries, define the locale environment, and specify what information the user must provide on the CLI (data path and DEM file).

print "******************* ATCOR Processing with RE Data **************************"
# by SHese 22.3.2018 RE ATCOR processing 
# Script to loop through files in a folder and apply the full atmosphere correction workflow.
# - cloud masking 
# - hazerem uses cld/hazemasks for hazerem
# - tersetup creates the illufiles 
# - atcor with illufiles/hazerem files
# ---------------------------------------------------------------------------
# Import the required algorithms 
# ---------------------------------------------------------------------------
# usage: /full/directory/to/datafiles /fulldir/path/andfilename-DEM.pix
print "1. Importing PCI Libraries ... "

from pci.lut import *
from pci.pcimod import *
from pci.exceptions import *
from pci.ortho import *
from pci.fimport import fimport
from pci.resamp import *
from pci.masking import *
from pci.hazerem import *
from pci.tersetup import * # does not support multithreading, unfortunately
from pci.atcor import * # writes large temporary files to the tmp directory - can potentially crash
from sys import argv 

import sys # these libraries are required to extract file names from folders
import os
import fnmatch

# The following locale settings are used to ensure that Python is configured
# the same way as PCI's C/C++ code. 
import locale
locale.setlocale(locale.LC_ALL, "")
locale.setlocale(locale.LC_NUMERIC, "C")

script, InFolder, DEMFile = argv # CLI arguments: input data folder and DEM file
# usage: /full/directory/to/datafiles /fulldir/path/andfilename-DEM.pix
# ------------------------------------------------------------------------
# Set File variables
# ------------------------------------------------------------------------
print "2. Processing in folder path: " + InFolder

# InFolder = raw_input("Please enter full input folder path: ")
# The user could enter the pathname interactively - we don't want this here
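As an aside: the bare tuple unpacking of sys.argv above crashes with an unhelpful ValueError when an argument is missing. A minimal sketch of a friendlier alternative using argparse (Python 3 syntax; the argument names mirror InFolder/DEMFile above, and the example paths are invented):

```python
# Hypothetical sketch: safer CLI handling than bare sys.argv unpacking.
import argparse

def parse_args(argv):
    parser = argparse.ArgumentParser(
        description="Batch ATCOR preprocessing")
    parser.add_argument("InFolder",
                        help="full path to the folder with *ORTHO.pix files")
    parser.add_argument("DEMFile",
                        help="full path and file name of the DEM .pix file")
    return parser.parse_args(argv)

# Example invocation with made-up paths:
args = parse_args(["/data/re_scenes", "/data/dem/srtm30.pix"])
```

Missing arguments then produce a usage message instead of a traceback.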

2. Listing the input files to be processed
This section searches for the filenames we want in our input list. It's important to get the file_filter right here so that the correct input files populate the for-loop later.

# ----------------------------------------------------
# Create list of files to loop through from InFolder
# ----------------------------------------------------
print "3. Creating list of valid files ..."
# This line is a wildcard filter. In this case only ORTHO.pix files will be used
file_filter = "*ORTHO.pix" 

# This list will become populated with the full pathnames of each file in InFolder
input_files_list = [] 

# os.walk searches a directory tree and yields the names of any files it encounters
# r is the current directory
# d is the list of subdirectories within r
# f is the list of file names within r
for r, d, f in os.walk(InFolder):
    for file in fnmatch.filter(f, file_filter): # fnmatch.filter keeps only the matching .pix files
        print "4. Found valid input file: ", file
        input_files_list.append(os.path.join(r, file)) # the full path is added to input_files_list
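The walk-and-filter pattern above is plain standard-library Python and can be tried out in isolation against a throwaway directory tree (Python 3 syntax; the file names are invented):

```python
import fnmatch
import os
import tempfile

file_filter = "*ORTHO.pix"

# Build a throwaway directory tree: one matching and two non-matching files
tmp = tempfile.mkdtemp()
sub = os.path.join(tmp, "scene1")
os.makedirs(sub)
for name in ("A_ORTHO.pix", "A_ORTHO.pix.aux", "readme.txt"):
    open(os.path.join(sub, name), "w").close()

# Same walk-and-filter pattern as in the script
input_files_list = []
for r, d, f in os.walk(tmp):
    for fname in fnmatch.filter(f, file_filter):
        input_files_list.append(os.path.join(r, fname))
```

Only A_ORTHO.pix survives the filter; the .aux sidecar and the text file are dropped.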

3. Cloud Masking using MASKING
Now we start PCI's cloud masking algorithm, MASKING. The basic parameters are derived from expected TOA values that we can estimate using the c0/c1 calibration coefficients. This isn't highly accurate, but it allows masking in order to avoid overcorrection at a later stage.

for ORTHO in input_files_list:
    print "5. Beginning Cloudmasking ..."
    fili = ORTHO
    filo = ORTHO + "MASKING.pix"
    visirchn = [1,3,5,0] # RE channels: blue=1, red=3, NIR=5; SWIR set to 0 (RapidEye has no SWIR band)
    asensor = "RapidEye" # sensor name - change as appropriate for your data
    cfile = "/opt/geomatica_2017/atcor/cal/rapideye_m1/" # the path and file name of the text
    # file that contains, for each band, the calibration coefficients
    # used to transform the image values to absolute radiance values
    znangle = [] # left empty - read from the metadata
    hazecov = [15] # the default is 50% - that is usually too much!
    clthresh = []
    wuthresh = []
    srcbgd = ""
    masking(fili, srcbgd, asensor, visirchn, cfile, znangle, hazecov, clthresh, wuthresh, filo)

cfile = "/opt/geomatica_2017/atcor/cal/rapideye_m1/"
The path and file name of the text file that contains, for each band, the calibration coefficients used to transform the image values to absolute radiance values.
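The calibration itself is a simple linear model: for each band, radiance L = c0 + c1 · DN. A sketch of that relationship (the coefficient values below are invented for illustration and are not real RapidEye numbers; the real values live in the cal file referenced above):

```python
# Sketch of the linear DN-to-radiance calibration model.
# c0, c1 below are made-up illustration values, not real coefficients.

def dn_to_radiance(dn, c0, c1):
    """Radiance L = c0 + c1 * DN for one band."""
    return c0 + c1 * dn

c0, c1 = 0.0, 0.01                         # hypothetical per-band coefficients
radiance = dn_to_radiance(5000, c0, c1)    # -> 50.0
```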

4. Haze Removal using HAZEREM
With the cloud and haze masks - stored as bitmap layers - we now apply the haze correction using HAZEREM.

    # ----------------------------------------------------------------------------------
    # Use the HAZEREM algorithm with the cloud mask from MASKING to avoid overcorrection
    # ----------------------------------------------------------------------------------
    print "6. Beginning HAZEREM ..."
    fili = ORTHO # we use the original ORTHO output as input here - don't mix it up with maskfili
    fili_pan = "" # RapidEye has no panchromatic band
    srcbgd = ""
    asensor = "Rapideye" # sensor name - check the exact spelling expected here
    visirchn = [] # taken from ASENSOR automatically
    chanopt = "p,p,p,c,c" # apply haze removal (p) to bands 1-3 and copy (c) the red edge/NIR bands unchanged
    maskfili = ORTHO + "MASKING.pix"
    maskseg = [2,3,4] # hazerem expects the haze mask in segment 2, the cloud mask in 3 and the water mask in 4
    hazecov = [15] # we go as low as possible here - overcorrection should be avoided
    hazeflsz = []
    filo = ORTHO + "MASKING-HAZEREM.pix"
    filo_pan = ""
    ftype = "PIX"
    foptions = ""
    hazerem( fili, fili_pan, srcbgd, asensor, visirchn, chanopt, maskfili, maskseg, hazecov, hazeflsz, filo, filo_pan, ftype, foptions )

5. TERSETUP creates the DEM derivatives
To perform topographic normalization we need a DEM plus slope/aspect layers and illumination angles to calculate the local incidence angle. ATCOR does this on the fly, provided you supply slope and aspect - and that is what TERSETUP produces. ILLUMCAST can also calculate the local incidence angle, but it is not needed when we use TERSETUP. Note, however, that you might need an additional lowpass filtering run to reduce artefacts that often originate from edges in your DEM. A few runs with a larger AVERAGE lowpass filter are helpful then:

From PCI online help:

from pci.fme import *

file	=	'input.pix'
dbic	=	[10]	# elevation data channel
dboc	=	[11]	# same as input channel
flsz	=	[5,5]	# 5x5 filter
mask	=	[]	# process entire image
bgrange	=	[]
failvalu	=	[]
bgzero	=	''

fme( file, dbic, dboc, flsz, mask, bgrange, failvalu, bgzero )

or use an average filter:

from pci.fav import fav

file	=	'input.pix'
dbic	=	[1]	# elevation data channel
dboc	=	[6]	# output channel
flsz	=	[5,5]	# use a 5x5 filter
mask	=	[]	# bitmap mask
bgrange	=	[0]	# remove background values
failvalu	=	[]
bgzero	=	''	# default, set background to 0

fav( file, dbic, dboc, flsz, mask, bgrange, failvalu, bgzero )
    # ----------------------------------------------------------------------------------
    # Use the TERSETUP algorithm to prepare terrain derivatives: slope, aspect, and skyview rasters from an input DEM
    # ----------------------------------------------------------------------------------
    print "7. Beginning TERSETUP ..."
    filedem = DEMFile # input DEM file (e.g. ASTER GDEM or SRTM 30 m)
    dbec = [1] # input DEM expected in channel 1 here
    terfile = "tersetup.pix" # output terrain image file - we use a temp file here
    backelev = [] # left empty - the No Data value is read from the metadata
    elevref = "ELLIPS" # elevations referenced to the ellipsoid (the default is Mean Sea Level)
    elevunit = "METER" # default, METER
    elfactor = [] # default, [0.0,1.0]
    tersetup( filedem, dbec, terfile, backelev, elevref, elevunit, elfactor )

ILLUMCAST is not needed by ATCOR when TERSETUP is run – ATCOR calculates the local incidence angle on the fly from the TERSETUP output.
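For reference, the local incidence angle follows the standard formula cos(i) = cos(θs)·cos(slope) + sin(θs)·sin(slope)·cos(φs − aspect), with θs the solar zenith and φs the solar azimuth angle. A small sketch of that formula (plain Python, not PCI code):

```python
# Sketch: the standard local solar incidence angle formula evaluated
# per pixel from slope/aspect layers such as the TERSETUP output.
import math

def cos_incidence(sun_zenith, sun_azimuth, slope, aspect):
    """All angles in degrees; returns cos(i) of the local incidence angle."""
    tz, ta = math.radians(sun_zenith), math.radians(sun_azimuth)
    s, a = math.radians(slope), math.radians(aspect)
    return (math.cos(tz) * math.cos(s)
            + math.sin(tz) * math.sin(s) * math.cos(ta - a))

# On flat terrain (slope = 0) the incidence angle equals the solar zenith:
flat = cos_incidence(30.0, 180.0, 0.0, 0.0)   # cos(30 deg), approx. 0.866
```

A slope facing the sun directly (slope = solar zenith, aspect = solar azimuth) yields cos(i) = 1, i.e. maximum illumination.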

6. ATCOR – Atmospheric Correction – calculating scaled reflectance
Now we have all the input layers for ATCOR: MASKING and HAZEREM provided the cloud/haze/water masks and generated the haze-corrected dataset, and TERSETUP created the DEM derivatives for the topographic normalization.

    # ----------------------------------------------------------------------------------
    # ATCOR algorithm
    # ----------------------------------------------------------------------------------
    print "8. Beginning ATCOR ..."
    fili = ORTHO + "MASKING-HAZEREM.pix"
    dbic = [1,2,3,4,5]
    srcbgd = ""
    asensor = "Rapideye Mode1"
    cfile = "/opt/geomatica_2017/atcor/cal/rapideye_m1/" # this is Linux specific - change it for your setup
    maskfili = ORTHO + "MASKING.pix"
    terfile = "tersetup.pix"
    illufile = ""
    meanelev = [10]
    vistype = "constant,25.0" # constant visibility
    visfilo = ""
    atmdef = "Urban" # needs modification for every project
    atmcond = "summer" # needs modification for every project
    satilaz = [] # taken from the metadata by ATCOR - should be left empty
    sazangl = [] # also taken from the metadata by ATCOR
                 # note: solar zenith = 90 degrees - solar elevation
    adjacen = "ON,5"
    brdffun = []
    terrefl = [] # optionally specifies the number of iterations for the terrain reflectance correction; use 0 to switch it off
    outunits = "16bit_Reflectance" # we want 16-bit reflectance to keep it simple
    filo = ORTHO + "MASKING-HAZEREM-ATCOR.pix"
    ftype = "PIX"
    foptions = ""
    try:
        atcor( fili, dbic, srcbgd, asensor, cfile, maskfili, terfile, illufile, meanelev, vistype, visfilo,
               atmdef, atmcond, satilaz, sazangl, adjacen, brdffun, terrefl, outunits, filo, ftype, foptions )
    except PCIException, e:
        print e
    except Exception, e:
        print e
    print "   Beginning RESAMP 100m ..."
    fili = ORTHO + "MASKING-HAZEREM-ATCOR.pix"
    filo = ORTHO + "MASKING-HAZEREM-ATCOR-RESAMP100m.pix"
    dbic = [1,2,3,4,5]
    dbsl = []
    sltype = "ALL"
    ftype = "PIX"
    foptions = ""
    pxszout = [100, 100] # output with 100 m pixel size
    resample = "CUBIC" # cubic convolution method

    try:
        resamp(fili, dbic, dbsl, sltype, filo, ftype, foptions, pxszout, resample )
    except PCIException, e:
        print e
    except Exception, e:
        print e

print "END - ATCOR processing finished"

Some comments that I removed from the script for better readability (partly taken from the PCI Online Help):

ATCOR receives as input a haze-free image; the cloud, cloud-shadow, water, ice/snow and saturated-pixel masks; and the terrain derivatives and illumination map for each scene. From these it computes the visibility map and applies the atmospheric LUT to the scene, so that the atmospheric contribution to the DN of each pixel can be removed. ATCOR allows users to prepare their data for analysis such as GCP collection, segmentation, classification, or the extraction of vegetation indices. ATCOR depends heavily on the quality of the visibility map and the accuracy of the atmospheric tables (based on the MODTRAN code), as well as on the quality of the haze-free scene and terrain derivatives.

maskfili = ORTHO + "MASKING.pix"
Specifies the name of the input file that contains the haze and cloud masks produced by MASKING for the reference image. The geocoding of the specified file must match that of the input image exactly. ATCOR assumes the first bitmap found to be the haze mask and the second to be the cloud mask.

illufile = ""
Optionally specifies the name of the input illumination file that contains the illumination map and topographic cast-shadow bitmap produced by ILLUMCAST for the input image. If the illumination file is not provided, the information is computed on the fly when a terrain derivatives file (TERFILE) is provided.

adjacen = "ON,5"
Optionally specifies whether a correction for the adjacency effect is required. This parameter is composed of a keyword and an optional filter size. Available options are:
OFF: no adjacency effect correction is applied
ON,n: the adjacency correction is applied using an n×n kernel, where n is an odd number between 3 and 39
ON: the adjacency correction is applied using a 3×3 kernel
Adjacency radiation is radiation reflected from the neighborhood and scattered into the viewing direction. The effect is a result of atmospheric scattering, depends on the reflectance contrast between a target pixel and its large-scale neighborhood, and decreases with wavelength. It reduces the apparent surface contrast by decreasing the top-of-the-atmosphere radiance over bright pixels and increasing the brightness of dark pixels. The adjacency effect causes a certain amount of blurring, known as crosstalk. This effect is often noticeable at the boundary of adjacent features; for example, roads passing through a forest may appear blurred. The adjacency effect is especially important for sensors of high spatial resolution, such as RapidEye, KazEOSat-2 and SPOT, and is usually negligible for low spatial resolution sensors such as AVHRR and Landsat.
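Conceptually, the correction pushes each pixel away from its large-scale neighbourhood mean to restore the contrast the atmosphere blurred. A toy sketch of that idea (the weight q is atmosphere-dependent and the value here is invented; this is not the actual ATCOR implementation):

```python
# Conceptual sketch only: the adjacency correction nudges each pixel
# away from its neighbourhood mean. The weight q is invented.

def adjacency_correct(pixel, neighbourhood_mean, q=0.1):
    """rho_corrected = rho + q * (rho - rho_neighbourhood_mean)."""
    return pixel + q * (pixel - neighbourhood_mean)

# A bright pixel among dark neighbours gets brighter (contrast restored),
# a dark pixel in a bright neighbourhood gets darker:
bright = adjacency_correct(0.40, 0.10)   # approx. 0.43
dark = adjacency_correct(0.05, 0.30)     # approx. 0.025
```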

brdffun = []
[FUNCTION, THRSANG, LBOUND], where:

FUNCTION: a number between 0 and 4, each representing a different correction function

THRSANG: the threshold illumination angle (by default, set as the solar zenith angle)

LBOUND: indicates the lowest correction applied. By default, set to 0.25, meaning that the pixel values will be changed at most by 75%, in order to avoid overcorrection.

This parameter can receive up to five different values, which set the following correction functions:

0: no BRDF correction is applied,

1: sets the correction factor as a linear function based on the illumination angle (i), defined as cos(i)/cos(THRSANG),

2: sets the correction factor as an exponential function, defined as sqrt [cos(i)/cos(THRSANG)],

3: sets the correction factor as a linear function based on the illumination angle and exitance angles, defined as cos (i)*cos(e)/cos(THRSANG),

4: sets the correction factor as an exponential function based on the illumination angle and exitance angles, defined as sqrt[cos(i)*cos(e)/cos(THRSANG)]. This function is relevant only for tilt sensors.

When a terrain derivatives (TERFILE) file is specified, this parameter is set to 2 by default. The effect of the threshold illumination angle is scene-dependent and varies with the local topography. If the topographic correction must be increased on the steepest slopes, the following rules are recommended for the threshold angle:
if the solar zenith angle is less than 20 degrees, THRSANG = solar zenith + 20 degrees
if the solar zenith angle is between 20 and 45 degrees, THRSANG = solar zenith + 15 degrees
if the solar zenith angle is higher than 45 degrees, THRSANG = solar zenith + 10 degrees
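The five correction functions above can be written down directly. A sketch with the LBOUND clamp (illustrative only; valid for illumination angles below 90 degrees, and how PCI applies the resulting factor internally is not shown here):

```python
# Sketch of the BRDF correction-factor functions 0-4, with the LBOUND
# clamp that prevents overcorrection. i = illumination angle,
# e = exitance angle, all in degrees; valid for i, e < 90 degrees.
import math

def brdf_factor(function, i_deg, e_deg, thrsang_deg, lbound=0.25):
    ci = math.cos(math.radians(i_deg))
    ce = math.cos(math.radians(e_deg))
    ct = math.cos(math.radians(thrsang_deg))
    if function == 0:
        g = 1.0                        # no BRDF correction
    elif function == 1:
        g = ci / ct                    # linear in illumination angle
    elif function == 2:
        g = math.sqrt(ci / ct)         # exponential (square-root) variant
    elif function == 3:
        g = ci * ce / ct               # linear, with exitance angle
    elif function == 4:
        g = math.sqrt(ci * ce / ct)    # square-root variant, tilt sensors
    else:
        raise ValueError("FUNCTION must be 0..4")
    return max(g, lbound)              # never scale below LBOUND

# At i = THRSANG (nadir view, e = 0) functions 1 and 2 give factor 1.0:
f1 = brdf_factor(1, 45.0, 0.0, 45.0)   # -> 1.0
```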