Satellite Data Preprocessing with Python/PCI I (Import & Ortho)

GEO214 /409 /402 Python Data Processing Code Examples – Section I

19.3.2018

Batch multispectral data pre-processing for RapidEye data in PCI Geomatica 2017. This will be extended over the next few months – I will keep date imprints here and there to make the updates a bit more transparent. This is more or less a reader for two of the sessions in GEOG214 and GEO402/GEOG409.

Section I: NITF file import and reprojection in a given directory:

  1. imports NITF RE L1B data to PIX files, importing all available metadata as well,
  2. reprojects the files to UTM WGS84 CC,
  3. resamples to low-resolution copies of the data files at 30 and 100 m,
  4. reads header information and georeferencing to a text report,
  5. reads simple band statistics to a text report.

Section II:   Cloud masking/haze removal/terrain analysis – full processing in the atmospheric correction package ATCOR. 

  1. Calculate a cloud, haze and water mask
  2. Remove the haze from the satellite data
  3. Calculate DEM derivatives: slope/aspect/shadow-cast layer
    1. Optional: lowpass-filter the DEM
    2. slope/aspect/shadow-cast layer calculation
  4. Atmospheric correction on the haze-corrected input, calculating the local incidence angle and outputting 16-bit scaled, topographically normalized reflectance.

Section III:  R statistics integration using rpy2 and numpy

Section IV:   Integration of GRASS binaries into workflows 

Section V:     Integration of AGISOFT

 

1. Head section of a Python PCI script

Here we load the required PCI libraries, define the locale environment, and add some libraries for file input and for writing text reports.

print "******************* Import AND Reproject NTF **************************"
# bySHese 19.3.2018 RE L1b Import and Reprojection and Resamp to 30m and 100m
# Script that loops through files in a folder and does the following:
# - import NTF files to the PCIDSK (PIX) file format
# - reproject to UTM WGS84 with defined projection parameters - change these as needed
# - reduce the resolution by a factor to produce 30m and 100m low-res copies
# - write some report information to a text file
# all is done recursively on a user-supplied directory tree using "InFolder". 

# -------------------------------
# Import the required algorithms
# -------------------------------
print " Importing PCI Libraries ... "

from pci.fun import *
from pci.lut import *
from pci.pcimod import *
from pci.exceptions import * 
from pci.ortho import * #needed for ortho
from pci.fimport import fimport #needed for fimport
from pci.resamp import * #this is needed for resamp
from sys import argv
from pci.shl import * #this is for reading pci pix header info
from pci.asl import * #this is needed for asl segment listing
from pci.cdl import * #this is needed for cdl
from pci.his import * #needed for his report
from pci.nspio import Report, enableDefaultReport #to write PCI reports from PACE programs 
from pci.prorep import * #needed by prorep

import sys # These libraries are required to extract file names from folders
import os
import fnmatch

# The following locale settings are used to ensure that Python is configured
# the same way as PCI's C/C++ code. 
import locale
locale.setlocale(locale.LC_ALL, "")
locale.setlocale(locale.LC_NUMERIC, "C")

script, InFolder, DEMFile = argv
Comments are defined using a hash sign – you can also put a comment directly behind a line of code:
# comments are defined after the hash sign

2. The input section of the script

where path variables are defined – you usually want these from the user rather than hard-coded into your script. Using the following expression, for example, the user would start the script by typing:

python pythonscriptname /dir/where/the/files/are /path/to/dem.pix

to start the script and to pass the “InFolder” and “DEMFile” strings (in this example). We later use “InFolder” to build the pathnames.

script, InFolder, DEMFile = argv

 

An optional alternative would be to:

make the path definition later in the routine, where the script waits for user input – this is OK, but you have to interact and cannot batch these commands then.

InFolder = path_input("Please enter full input folder path: ") # User will enter pathname here

However, this is not very elegant, because you cannot batch-stack these Python scripts later when you want to run them from another script.
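The argv-based approach above can be sketched as a small standalone function (Python 3 syntax, outside PCI; the script and file names are made up for illustration). It unpacks the two expected arguments and fails with a usage message when they are missing, which is what makes the script batchable:

```python
def parse_args(argv):
    """Unpack 'scriptname InFolder DEMFile' command-line arguments.

    argv[0] is the script name itself; the two following entries
    are the user-supplied paths.
    """
    if len(argv) != 3:
        raise SystemExit("usage: python %s <InFolder> <DEMFile>" % argv[0])
    script, in_folder, dem_file = argv
    return in_folder, dem_file

# Simulate the call 'python import_ortho.py /data/re /dem/aster.pix'
in_folder, dem_file = parse_args(["import_ortho.py", "/data/re", "/dem/aster.pix"])
print(in_folder)  # /data/re
```

In a real run you would pass sys.argv instead of the hard-coded list.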

3. The file selection that defines which files are processed

This section recursively finds all files with a specific naming convention so that they can be processed later. It is a search routine that creates a container for the input filenames. Later we simply let the PCI algorithm that processes the data ingest the filename container we create here:

#
# ---------------------
# Set File variables
# ---------------------
print "Processing in folder path: " + InFolder

# ----------------------------------------------------
# Create list of files to loop through from InFolder
# ----------------------------------------------------
print "   Creating list of valid files ..."
# This line is a wildcard filter. In this case only xml metadata files from RE will be used
file_filter = "*metadata.xml"  # this is nitf RE L1b data specific metadata file
                               # here you can import also all metadata fields using fimport and
                               # these fields will be important later f.e. with ATCOR
# This list will become populated with the full pathnames of each file in InFolder
input_files_list = [] 

# section from Geomatica
# os.walk searches a directory tree and produces the file name of any file it encounters
# r is the main directory
# d is any subdirectory within r
# f image file names within r
for r,d,f in os.walk(InFolder):
        for file in fnmatch.filter(f, file_filter): # fnmatch.filter keeps only files matching file_filter
                print "   Found valid input file: ", file
                input_files_list.append(os.path.join(r,file)) # The file name and full path are 
                                                              # added to input_files_list
                                                              #
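The os.walk/fnmatch pattern above works independently of PCI, so you can test it on any directory. A minimal standalone sketch (Python 3 syntax; the directory and file names are invented for the demonstration):

```python
import fnmatch
import os
import tempfile

def collect_files(in_folder, file_filter):
    """Recursively collect full pathnames of files matching file_filter,
    mirroring the os.walk/fnmatch loop used above."""
    matches = []
    for r, d, f in os.walk(in_folder):
        for name in fnmatch.filter(f, file_filter):
            matches.append(os.path.join(r, name))
    return matches

# Demonstrate against a throwaway directory tree.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "scene1"))
open(os.path.join(root, "scene1", "a_metadata.xml"), "w").close()
open(os.path.join(root, "scene1", "a_image.ntf"), "w").close()

found = collect_files(root, "*metadata.xml")
print(len(found))  # 1 -- only the metadata file matches the wildcard
```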

4. The core processing section now processes the input files

and FIMPORT (in this example) will ingest all files from input_files_list. You set the various parameters first (fili, filo etc.) and then execute the function with the full listing of the variable names.

Make sure the indentation of every line matches its position in the for-loop. Python is sensitive to leading whitespace on every line!

# ----------------------------------------------------------------------------------
# FIMPORT algorithm 
# ----------------------------------------------------------------------------------
print ""
print " Entering raw input information for FIMPORT algorithm ..."
# Run FIMPORT algorithm with each file listed in input_files_list
print ""
print " Beginning the FIMPORT ..."

for xml in input_files_list:
 # Define each of the required arguments for this file
 fili = xml # input file
 filo = xml + "-FIMPORT.pix"
 dbiw = []
 poption = "AVERAGE" 
 dblayout = "BAND"

 try: fimport(fili, filo, dbiw, poption, dblayout)
 except PCIException, e: print e
 except Exception, e: print e

fili = xml takes each entry of input_files_list in turn (the filename container we created above).
Since we now have the right PIX files available, we can start the reprojection of all files.
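The try/except-per-file pattern used in the loop above is what keeps a batch run alive when a single scene fails. A standalone sketch of the same control flow, with a hypothetical stand-in function instead of pci.fimport (the file names are invented):

```python
def fimport_stub(fili, filo):
    """Hypothetical stand-in for pci.fimport: fails on one bad file."""
    if "bad" in fili:
        raise RuntimeError("cannot import " + fili)
    return filo

results, errors = [], []
for xml in ["s1_metadata.xml", "bad_metadata.xml", "s2_metadata.xml"]:
    fili = xml
    filo = xml + "-FIMPORT.pix"
    try:
        results.append(fimport_stub(fili, filo))
    except Exception as e:
        # report and continue with the next file instead of aborting the batch
        errors.append(str(e))

print(len(results), len(errors))  # 2 1
```

If you removed the try/except, the first failing scene would abort the whole batch.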

5. Reprojection of all PCIDSK files to UTM32 WGS84

Here we use ORTHO (in Geomatica version 2017). ORTHO has a slightly modified syntax compared to ORTHO2.
Since we now have different (new) input files, we repeat the input file definition for this part of the script.

# ----------------------------------------------------
# Create list of files to loop through from InFolder for UTM Reprojection
# ----------------------------------------------------
print " Creating list of valid files for ORTHO ..."
# This line is a wildcard filter. In this case only FIMPORT.pix files will be used
file_filter = "*FIMPORT.pix" 

# This list will become populated with the full pathnames of each file in InFolder
input_files_list = [] 

# os.walk searches a directory tree and produces the file name of any file it encounters
# r is the main directory
# d is any subdirectory within r
# f image file names within r
for r,d,f in os.walk(InFolder):
 for file in fnmatch.filter(f, file_filter): # fnmatch.filter keeps only files matching file_filter
  print " Found valid input file: ", file
  input_files_list.append(os.path.join(r,file)) # The file name and full path are added to input_files_list

# ----------------------------------------------------------------------------------
# Use ORTHO algorithm 
# ----------------------------------------------------------------------------------
print " Entering raw input information for Ortho algorithm ..."
# Run ORTHO algorithm with each file listed in input_files_list
print " Beginning ORTHO and RESAMP ..."

for FIMPORT in input_files_list:
 # Using the _raw variables from above, define each of the required arguments
 mfile = FIMPORT # input file
 dbic = []
 mmseg = [2]
 dbiw= []
 srcbgd = ""
 filo = FIMPORT + "-ORTHO.pix"
 ftype = "PIX"
 foptions = ""
 outbgd = [] #float
 ulx = ""
 uly = ""
 lrx = ""
 lry = ""
 edgeclip = [] #integer
 tipostrn = ""
 mapunits = "UTM54 D000" # make sure you are in the right UTM zone: e.g. Tokyo/Sendai 54, London 30, Cologne/Bonn/Hamburg 32, Berlin 33, Rome 32, Kulunda Steppe (Russia) 44
 bxpxsz = "6.5"
 bypxsz = "6.5"
 filedem = DEMFile # DEMFile is set by user on CLI 
 dbec = [1] # we expect the DEM in channel 1
 backelev = []
 elevref = "ELLIPS" # ASTER GDEM2 this height reference is WGS84 / EGM96 Geoid
 elevunit = "METER" # 
 elfactor = [0.0,1.0] # 
 proc = ""
 sampling = [1]
 resample = "CUBIC"

 try: ortho(mfile, dbic, mmseg, dbiw, srcbgd, filo, ftype, foptions, outbgd, ulx, uly, lrx, lry, edgeclip, tipostrn, mapunits, bxpxsz, bypxsz, filedem, dbec, backelev, elevref, elevunit, elfactor, proc, sampling, resample)
 except PCIException, e: print e
 except Exception, e: print e
 #

ORTHO takes some parameters that need to be tuned correctly for what you want. Some of these parameters, however, can be left at default values because they do not affect our processing right now.

It is important to get mapunits right: you need to know the correct UTM zone for your data, and D000 stands for WGS84 – change this as appropriate.
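If you are unsure of the zone, it can be derived from the scene centre longitude with the standard 6-degree formula. A small standalone sketch (Python 3 syntax; it ignores the Norway/Svalbard exception zones):

```python
def utm_zone(lon_deg):
    """Standard 6-degree UTM zone number from longitude in degrees East
    (negative = West). Does not handle the Norway/Svalbard exceptions."""
    return int((lon_deg + 180.0) // 6) + 1

print(utm_zone(140.9))   # 54 -- Sendai/Tokyo
print(utm_zone(-0.13))   # 30 -- London
print(utm_zone(6.96))    # 32 -- Cologne
```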

6. Finally we want to resample the files to low res versions

This is easily done with RESAMP.

RESAMP just takes your input file and the desired resolution in metres and resamples according to your chosen interpolation method.

        print "   Beginning RESAMP 30m ..."        
        # Run RESAMP        
        fili = FIMPORT + "-ORTHO.pix"
        filo = FIMPORT + "-ORTHO-RESAMP30m.pix"
        dbic	=	[1,2,3,4,5]
        dbsl	=	[]
        sltype	=	"ALL"
        ftype	=	"PIX"
        foptions	= ' '
        pxszout	=	[30, 30]	# output with meter resolution
        resample	=	"CUBIC"	# Cubic Convolution method
        
        try: resamp(fili, dbic, dbsl, sltype, filo, ftype, foptions, pxszout, resample)
        except PCIException, e: print e
        except Exception, e: print e
        
        print "   Beginning RESAMP 100m ..."        
        fili = FIMPORT + "-ORTHO.pix"
        filo = FIMPORT + "-ORTHO-RESAMP100m.pix"
        dbic	=	[1,2,3,4,5]
        dbsl	=	[]
        sltype	=	"ALL"
        ftype	=	"PIX"
        foptions	= ' '
        pxszout	=	[100, 100]	# output with meter resolution
        resample	=	"CUBIC"	# Cubic Convolution method

        try: resamp(fili, dbic, dbsl, sltype, filo, ftype, foptions, pxszout, resample )
        except PCIException, e: print e
        except Exception, e: print e
        print "Ortho finished", FIMPORT

RESAMP just takes the filename from the ORTHO processing; we add new extensions to the filename so that the resolution information is encoded in the file naming.
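To get a feeling for what pxszout implies for the output raster, here is a small standalone sketch of the size reduction (illustrative only; the 5000 x 5000 input size is invented, 6.5 m is the RapidEye pixel size used above):

```python
def resampled_size(cols, rows, px_in, px_out):
    """Approximate output raster dimensions after resampling from
    px_in to px_out metre pixels (what RESAMP's pxszout controls)."""
    factor = px_out / float(px_in)
    return int(round(cols / factor)), int(round(rows / factor))

print(resampled_size(5000, 5000, 6.5, 30))    # (1083, 1083)
print(resampled_size(5000, 5000, 6.5, 100))   # (325, 325)
```

So the 30 m copy shrinks each axis by a factor of roughly 4.6, and the 100 m copy by roughly 15.4.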

7. Read some header and segment / channel information to text files:

Here we use SHL, PROREP, ASL and CDL to write the information to a text file. This is comparable to the “REPORT” mode in EASI/PACE and easily appends to an existing ASCII file.

This is handy for keeping track of processing steps and parameter setup.
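The Report/enableDefaultReport mechanism essentially redirects the tools' printed output into a file. A standalone analogue with the Python 3 standard library (the shl_stub/cdl_stub functions are hypothetical stand-ins for the PCI reporting tools, which just print here):

```python
import contextlib
import io

def shl_stub(pix):
    """Hypothetical stand-in for shl: prints header info."""
    print("Header of " + pix)

def cdl_stub(pix):
    """Hypothetical stand-in for cdl: prints channel info."""
    print("Channels of " + pix)

buf = io.StringIO()
with contextlib.redirect_stdout(buf):   # analogous to enableDefaultReport(rep_file)
    shl_stub("scene-ORTHO.pix")
    cdl_stub("scene-ORTHO.pix")
# leaving the 'with' block is analogous to enableDefaultReport('term')

report_text = buf.getvalue()
print(report_text.count("\n"))  # 2 -- two report lines were captured
```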

Everything between the opening sequence:

print "Reporting now about all finished files"
        rep_file = FIMPORT + "-ORTHO-REPORT.txt"
        try: 
            Report.clear()
            enableDefaultReport(rep_file)

and the closing sequence

finally: 
            enableDefaultReport('term')  # this will close the report file

… is written to the ASCII file rep_file.

The full section just puts the reporting functions from PCI exactly between these two expressions, as in the following code:

# now lets check some of the pix data based info for the record - always handy to have - and we
        # can look into these reports if needed - 
        # to do this we need the pci.nspio module with the report framework to handle text export write
        # and close. 
        
        print "Reporting now about all finished files"
        rep_file = FIMPORT + "-ORTHO-REPORT.txt"
        try: 
            Report.clear()
            enableDefaultReport(rep_file)
        
            # first let's look into the header now
            file = FIMPORT + "-ORTHO.pix"
            print "   Reading Header of file: " + file
            shl( file)  # starts the shl bin
            
            
            # now let's read the georeferencing segment with projection information 
            print "   Reading Georef Segment of file: " + file
            dbgeo=[1]
            prorep( file, dbgeo) # starts the prorep pci bin
            
            # now let's list all segments in file
            print "   Reading Segment List of file: " + file
            ltyp="FULL"
            aslt=[]
            assn=''
            asl(file, ltyp, aslt, assn)
            
            # now checking channels and reporting in short mode
            print "   Listing all channels of file: " + file
            dbcl    =   []
            dtyp     =   "SHORT"
            cdl(file,dbcl,dtyp)      
        
        finally: 
            enableDefaultReport('term')  # this will close the report file
        print "FINISHED All"

8. Reading some basic statistics from the PCIDSK files:

If we want more statistical insight, it is useful to read some more values into the report files. This can easily be done with HIS and SPLREG. Neither is in any way comparable to statistical analysis in R, but we get some information about the raster data without the need to open (sometimes very big) raster files. This can be handy for finding out whether something went wrong in our processing workflow.
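The kind of sanity check HIS gives us is just simple descriptive statistics per band. A standalone illustration with the Python 3 standard library (the DN values are invented toy data, with one deliberate outlier):

```python
import statistics

band = [12, 15, 13, 200, 14, 13, 15]   # toy DN values for one "band"

stats = {
    "min": min(band),
    "max": max(band),
    "mean": statistics.mean(band),
    "stdev": statistics.pstdev(band),   # population standard deviation
}
print(stats["min"], stats["max"])  # 12 200 -- the outlier is easy to spot
```

A suspicious minimum or maximum in the report is often the quickest hint that an import or resampling step went wrong.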

HIS is easily integrated in front of the Report “Close” sequence in the following code:

            # finally print simple descriptive statistics into the report
            print "   Reading simple band stats of file: " + file
            dbic	=	[1,2,3,4,5]	
            gmod	=	'ON'	# show graphic mode
            cmod	=	'OFF'	# no cumulative mode
            pcmo	=	'OFF'	# no percentage mode
            nsam	=	[]	# no number of samples
            trim	=	[]	# no trimming
            hisw	=	[]	# no specific histogram window
            mask	=	[]	# process entire image
            imstat	=	[]	# parameter to receive output
            his( file, dbic, gmod, cmod, pcmo, nsam, trim, hisw, mask, imstat )

But we could also add some more useful information about the band-to-band relations by calculating SPLREG, as in the following example:

# and we add some linear band to band regressions to the report
            dbic	=	[1,2]	# input channels
            mask	=	[]	# process entire image
            imstat	=	[]
            splreg( file, dbic, mask, imstat )
            dbic	=	[2,3]	# input channels
            mask	=	[]	# process entire image
            imstat	=	[]
            splreg( file, dbic, mask, imstat )
            dbic	=	[3,4]	# input channels
            mask	=	[]	# process entire image
            imstat	=	[]
            splreg( file, dbic, mask, imstat )
            dbic	=	[4,5]	# input channels
            mask	=	[]	# process entire image
            imstat	=	[]
            splreg( file, dbic, mask, imstat )
            dbic	=	[3,5]	# input channels
            mask	=	[]	# process entire image
            imstat	=	[]
            splreg( file, dbic, mask, imstat )
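The band-to-band relation SPLREG reports is essentially an ordinary least-squares fit between two channels. A standalone sketch of the computation (Python 3 syntax; the two toy "bands" are invented and perfectly correlated, band2 = 2*band1 + 1):

```python
def linreg(x, y):
    """Ordinary least-squares slope and intercept of y against x,
    roughly the relation SPLREG reports for a channel pair."""
    n = float(len(x))
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

b1 = [1.0, 2.0, 3.0, 4.0]
b2 = [3.0, 5.0, 7.0, 9.0]
print(linreg(b1, b2))  # (2.0, 1.0)
```

For real imagery the fit will of course be noisy; a very weak relation between neighbouring spectral bands can hint at a band ordering or import problem.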

Finally, don't forget to close the reporting sequence with the following snippet:

finally: 
            enableDefaultReport('term')  # this will close the report file
        print "FINISHED All"