Satellite Data Preprocessing with Python/PCI I (Import & Ortho)
GEO214 /409 /402 Python Data Processing Code Examples – Section I
19.3.2018
Batch multispectral data pre-processing for RapidEye data in PCI Geomatica 2017. This will be extended over the next few months – I will keep a date imprint here and there to make the updates more transparent. This is more or less a reader for two of the sessions in GEOG214 and GEO402/GEOG409.
Section I: NITF file import and reprojection in a given directory:
- imports NITF RE L1B data to PIX files, also importing all available metadata
- reprojects the files to UTM WGS84 CC,
- resamples to low-resolution copies of the data files at 30 m and 100 m,
- reads header information and georeferencing to a text report, and
- reads simple band statistics to a text report
Section II: Cloud masking/haze removal/terrain Analysis – full processing in Atmosphere Correction package ATCOR.
- Calculate a cloud, haze and water mask
- Remove the haze from satellite data
- Calculate DEM derivatives: slope/aspect/shadow-cast layer
- Optional: lowpass filter the DEM
- Atmospheric correction on the haze-corrected input, calculating the local incidence angle and outputting 16-bit scaled, topographically normalized reflectance.
Section III: R statistics integration using rpy2 and numpy
Section IV: Integration of GRASS binaries into workflows
Section V: Integration of AGISOFT
1. Head section of a Python PCI script
Here we load some PCI libraries, define the locale environment, and add some specific input and text-writing libraries.
print "******************* Import AND Reproject NTF **************************"
# by S.Hese 19.3.2018
# RE L1b Import and Reprojection and Resamp to 30m and 100m
# Script loops through files in a folder and does the following:
# - import NTF files to PCIDSK (.pix) file format
# - reproject to UTM WGS84 with defined projection parameters - change as needed
# - reduce the resolution by a factor to get 30m and 100m low-res files
# - write some report information to a text file
# All is done recursively on a user-supplied directory tree ("InFolder").

# -------------------------------
# Import the required algorithms
# -------------------------------
print " Importing PCI Libraries ... "

from pci.fun import *
from pci.lut import *
from pci.pcimod import *
from pci.exceptions import *
from pci.ortho import *            # needed for ortho
from pci.fimport import fimport    # needed for fimport
from pci.resamp import *           # needed for resamp
from sys import argv
from pci.shl import *              # for reading PCI PIX header info
from pci.asl import *              # needed for asl segment listing
from pci.cdl import *              # needed for cdl
from pci.his import *              # needed for his report
from pci.nspio import Report, enableDefaultReport   # to write PCI reports from PACE programs
from pci.prorep import *           # needed by prorep
import sys

# These two libraries are required to extract file names from folders
import os
import fnmatch

# The following locale settings are used to ensure that Python is configured
# the same way as PCI's C/C++ code.
import locale
locale.setlocale(locale.LC_ALL, "")
locale.setlocale(locale.LC_NUMERIC, "C")

script, InFolder, DEMFile = argv
Comments are defined with a hash sign; you can also put a comment directly behind a line of code:
# comments are defined after the hash sign
2. The input section of the script
where path variables are defined. You usually want to get these from the user rather than hard-coding them into your script. Using the following expression, the user starts the script by typing:
python pythonscriptname /dir/where/the/files/are
to start the script and to pass the "InFolder" string (in this example; since the script unpacks script, InFolder, DEMFile, a DEM file path is expected as a second argument). We later use "InFolder" to build the pathnames.
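If the user forgets an argument, the bare argv unpacking fails with an unhelpful ValueError. A small sketch of a friendlier variant (the function name and usage string are my own, not part of the script above):

```python
def parse_args(argv):
    """Unpack the expected command line: scriptname, InFolder, DEMFile.

    Hypothetical helper - the reader's script unpacks sys.argv directly.
    """
    if len(argv) != 3:
        raise SystemExit("Usage: python %s <InFolder> <DEMFile>" % argv[0])
    script, in_folder, dem_file = argv
    return in_folder, dem_file

# parse_args(["import_ortho.py", "/data/re_scenes", "/data/dem.pix"])
# returns ("/data/re_scenes", "/data/dem.pix")
```

Call it as parse_args(sys.argv); a wrong argument count then prints a usage line instead of a traceback.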
script, InFolder, DEMFile = argv
An alternative would be to define the path later in the routine, where the script waits for user input. This works, but you have to interact with the script and cannot batch these commands then.
InFolder = path_input("Please enter full input folder path: ") # User will enter pathname here
However, this is less elegant because you cannot stack these Python scripts in a batch later when you want to run them from another script.
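If you do go the interactive route, it is worth validating the entered path before using it. A minimal sketch (my own helper, not the PCI path_input shown above; read and isdir are injectable only to make it testable):

```python
import os

def ask_for_folder(prompt, read=input, isdir=os.path.isdir):
    """Prompt repeatedly until the user enters an existing directory.

    Hypothetical helper, not part of the PCI API.
    """
    while True:
        path = read(prompt).strip()
        if isdir(path):
            return path
        print("Not a directory, try again: %s" % path)
```

Used as InFolder = ask_for_folder("Please enter full input folder path: "), it refuses non-existent directories instead of failing later in os.walk.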
3. The file selection that defines which files are processed
This section is programmed to recursively find all files with a specific naming convention so they can be processed later. It's a search routine that creates a container for the input filenames. Later we simply let the PCI algorithm that processes the data ingest the filename container we create here:
#
# ---------------------
# Set File variables
# ---------------------
print "Processing in folder path: " + InFolder

# ----------------------------------------------------
# Create list of files to loop through from InFolder
# ----------------------------------------------------
print " Creating list of valid files ..."

# This line is a wildcard filter. In this case only XML metadata files from RE will be used.
file_filter = "*metadata.xml"   # this is the NITF RE L1b data-specific metadata file
# Importing via the metadata file lets fimport pick up all metadata fields;
# these fields will be important later, e.g. with ATCOR.

# This list will be populated with the full pathnames of each file in InFolder
input_files_list = []

# section from Geomatica
# os.walk searches a directory tree and produces the file name of any file it encounters
# r is the main directory
# d is any subdirectory within r
# f image file names within r
for r, d, f in os.walk(InFolder):
    for file in fnmatch.filter(f, file_filter):   # fnmatch.filter drops non-matching files
        print " Found valid input file: ", file
        input_files_list.append(os.path.join(r, file))   # the file name and full path are
                                                         # added to input_files_list
#
4. The core processing section now processes the input files
and FIMPORT (in this example) will ingest all files from input_files_list. You set the various parameters first (fili, filo, etc.) and then execute the function with the full listing of the variable names.
Make sure the indentation matches your position in the for-loop. Python is sensitive to the whitespace at the start of every line!
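To illustrate (a hypothetical example, not from the script), indentation decides whether a statement runs once per file or once after the loop:

```python
files = ["scene_a.pix", "scene_b.pix"]

processed = []
for name in files:
    # indented one level: this runs once per file, inside the loop body
    processed.append(name.replace(".pix", "-ORTHO.pix"))

# back at the left margin: this runs once, after the loop has finished
summary = "processed %d files" % len(processed)
# summary is now "processed 2 files"
```

If the append line were not indented, Python would raise an IndentationError; if the summary line were indented, it would be recomputed for every file.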
# ----------------------------------------------------------------------------------
# FIMPORT algorithm
# ----------------------------------------------------------------------------------
print ""
print " Entering raw input information for FIMPORT algorithm ..."

# Run FIMPORT algorithm with each file listed in input_files_list
print ""
print " Beginning the FIMPORT ..."

for xml in input_files_list:
    # Define each of the required arguments
    fili = xml                       # input file
    filo = xml + "-FIMPORT.pix"      # output PCIDSK file
    dbiw = []
    poption = "AVERAGE"
    dblayout = "BAND"
    try:
        fimport(fili, filo, dbiw, poption, dblayout)
    except PCIException, e:
        print e
    except Exception, e:
        print e
fili = xml   # takes each entry of input_files_list, which holds the filenames
Since we now have the right PIX files available, we can start the reprojection of all files.
5. Reprojection of all PCIDSK files to UTM WGS84
Here we use ORTHO (in Geomatica Version 2017). ORTHO has slightly modified syntax compared to ORTHO2.
Since we now have different/new input files, we repeat the input file search for this step.
# ----------------------------------------------------
# Create list of files to loop through from InFolder for UTM reprojection
# ----------------------------------------------------
print " Creating list of valid files for ORTHO ..."

# This line is a wildcard filter. In this case only FIMPORT.pix files will be used.
file_filter = "*FIMPORT.pix"

# This list will be populated with the full pathnames of each file in InFolder
input_files_list = []

# os.walk searches a directory tree and produces the file name of any file it encounters
# r is the main directory
# d is any subdirectory within r
# f image file names within r
for r, d, f in os.walk(InFolder):
    for file in fnmatch.filter(f, file_filter):   # fnmatch.filter filters out non-.pix files
        print " Found valid input file: ", file
        input_files_list.append(os.path.join(r, file))   # the file name and full path are
                                                         # added to input_files_list

# ----------------------------------------------------------------------------------
# Use ORTHO algorithm
# ----------------------------------------------------------------------------------
print " Entering raw input information for ORTHO algorithm ..."

# Run ORTHO algorithm with each file listed in input_files_list
print " Beginning ORTHO and RESAMP ..."
for FIMPORT in input_files_list:
    # Define each of the required arguments
    mfile = FIMPORT                 # input file
    dbic = []
    mmseg = [2]
    dbiw = []
    srcbgd = ""
    filo = FIMPORT + "-ORTHO.pix"
    ftype = "PIX"
    foptions = ""
    outbgd = []                     # float
    ulx = ""
    uly = ""
    lrx = ""
    lry = ""
    edgeclip = []                   # integer
    tipostrn = ""
    mapunits = "UTM54 D000"         # make sure you are in the right UTM zone here, e.g.:
                                    # Tokyo/Sendai 54S, London 30, Koeln/Bonn/HH/Berlin 32,
                                    # Rom 32, Kulunda Steppe (Russia) 44; look up the zone
                                    # for other locations
    bxpxsz = "6.5"
    bypxsz = "6.5"
    filedem = DEMFile               # DEMFile is set by the user on the CLI
    dbec = [1]                      # we expect the DEM in channel 1
    backelev = []
    elevref = "ELLIPS"              # ASTER GDEM2 height reference is WGS84 / EGM96 geoid
    elevunit = "METER"
    elfactor = [0.0, 1.0]
    proc = ""
    sampling = [1]
    resample = "CUBIC"
    try:
        ortho(mfile, dbic, mmseg, dbiw, srcbgd, filo, ftype, foptions, outbgd,
              ulx, uly, lrx, lry, edgeclip, tipostrn, mapunits, bxpxsz, bypxsz,
              filedem, dbec, backelev, elevref, elevunit, elfactor, proc,
              sampling, resample)
    except PCIException, e:
        print e
    except Exception, e:
        print e
#
ORTHO takes quite a few parameters that need to be tuned for what you want. Some of these parameters, however, can be left at their default values because they do not affect our processing right now.
It is important to get mapunits right: you should know the correct UTM zone for your data, and D000 stands for WGS84 – change this as appropriate.
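If you know the scene's centre longitude, the standard UTM zone follows from a simple formula. A small sketch (my own helper, not a PCI function; you still write the zone into the mapunits string yourself):

```python
def utm_zone(lon):
    """Standard UTM zone number for a longitude in decimal degrees:
    zone = int((lon + 180) / 6) + 1, clamped to the valid range 1..60."""
    zone = int((lon + 180.0) // 6.0) + 1
    return min(max(zone, 1), 60)

# utm_zone(139.7) -> 54 (Tokyo), utm_zone(6.96) -> 32 (Koeln), utm_zone(-74.0) -> 18 (New York)
```

Note that some countries map slightly outside the formula's zone boundaries for national grids, so always double-check against your data provider's metadata.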
6. Finally we want to resample the files to low res versions
This is easily done with Resamp.
RESAMP just takes your input file and the desired resolution in metres and resamples with your chosen resampling interpolation method.
print " Beginning RESAMP 30m ..."

# Run RESAMP
fili = FIMPORT + "-ORTHO.pix"
filo = FIMPORT + "-ORTHO-RESAMP30m.pix"
dbic = [1, 2, 3, 4, 5]
dbsl = []
sltype = "ALL"
ftype = "PIX"
foptions = ' '
pxszout = [30, 30]        # output resolution in metres
resample = "CUBIC"        # cubic convolution method
try:
    resamp(fili, dbic, dbsl, sltype, filo, ftype, foptions, pxszout, resample)
except PCIException, e:
    print e
except Exception, e:
    print e

print " Beginning RESAMP 100m ..."

fili = FIMPORT + "-ORTHO.pix"
filo = FIMPORT + "-ORTHO-RESAMP100m.pix"
dbic = [1, 2, 3, 4, 5]
dbsl = []
sltype = "ALL"
ftype = "PIX"
foptions = ' '
pxszout = [100, 100]      # output resolution in metres
resample = "CUBIC"        # cubic convolution method
try:
    resamp(fili, dbic, dbsl, sltype, filo, ftype, foptions, pxszout, resample)
except PCIException, e:
    print e
except Exception, e:
    print e

print "Ortho finished", file
RESAMP just takes the filename from the ORTHO processing; we add new extensions to the filename so the reduced-resolution information is encoded in the file naming.
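Note that this scheme concatenates suffixes after the existing .pix extension, so names grow like scene-FIMPORT.pix-ORTHO.pix. If you prefer a single extension, a hypothetical variant using os.path.splitext would be:

```python
import os

def derived_name(src, step):
    """Insert a processing-step tag before the extension instead of
    appending after it (hypothetical alternative to the naming above)."""
    root, ext = os.path.splitext(src)
    return "%s-%s%s" % (root, step, ext)

# derived_name("scene-FIMPORT.pix", "ORTHO") -> "scene-FIMPORT-ORTHO.pix"
```

Either scheme works with the wildcard filters above as long as the filter pattern matches the chosen convention.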
7. Read some header and segment / channel information to text files:
Here we use SHL, PROREP, ASL and CDL to read the information into a text file. This is comparable to the "REPORT" mode in EASI/PACE and easily appends to an existing ASCII file.
This is handy for keeping track of processing steps and parameter setup.
Everything between the opening sequence:
print "Reporting now about all finished files"

rep_file = FIMPORT + "ORTHO-REPORT.txt"
try:
    Report.clear()
    enableDefaultReport(rep_file)
and the closing sequence
finally:
    enableDefaultReport('term')   # this will close the report file
… is written to the ASCII file rep_file.
The full section simply places the reporting programs from PCI exactly between these expressions, as in the following section:
…
# Now let's check some of the PIX-data-based info for the record - always handy to have -
# and we can look into these reports if needed.
# To do this we need the pci.nspio module with the report framework to handle
# text export, write and close.
print "Reporting now about all finished files"

rep_file = FIMPORT + "ORTHO-REPORT.txt"
try:
    Report.clear()
    enableDefaultReport(rep_file)

    # first let's look into the header
    file = FIMPORT + "-ORTHO.pix"
    print " Reading Header of file: " + file
    shl(file)                     # starts the shl bin

    # now let's read the georeferencing segment with projection information
    print " Reading Georef Segment of file: " + file
    dbgeo = [1]
    prorep(file, dbgeo)           # starts the prorep PCI bin

    # now let's list all segments in the file
    print " Reading Segment List of file: " + file
    ltyp = "FULL"
    aslt = []
    assn = ''
    asl(file, ltyp, aslt, assn)

    # now checking channels and reporting in short mode
    print " Listing all channels of file: " + file
    dbcl = []
    dtyp = "SHORT"
    cdl(file, dbcl, dtyp)
finally:
    enableDefaultReport('term')   # this will close the report file

print "FINISHED All"
8. Reading some basic statistics from the PCIDSK files:
If we want some more statistical insight, it is useful to write some more values into the report files. This can easily be done with HIS and SPLREG. Neither is in any way comparable to statistical analysis in R, but we get some information about the raster data without the need to open (sometimes very big) raster data files. This can be handy for finding out whether something went wrong in our processing workflow.
HIS is easily integrated in front of the report "close" sequence with the following code:
# finally printing simple descriptive statistics into the report
print " Reading simple band stats of file: " + file
dbic = [1, 2, 3, 4, 5]
gmod = 'ON'      # show graphic mode
cmod = 'OFF'     # no cumulative mode
pcmo = 'OFF'     # no percentage mode
nsam = []        # no number of samples
trim = []        # no trimming
hisw = []        # no specific histogram window
mask = []        # process entire image
imstat = []      # parameter to receive output
his(file, dbic, gmod, cmod, pcmo, nsam, trim, hisw, mask, imstat)
But we could also add some useful information about the band-to-band relations by calculating SPLREG, as in the following example:
# and we add some linear band-to-band regressions to the report
dbic = [1, 2]    # input channels
mask = []        # process entire image
imstat = []
splreg(file, dbic, mask, imstat)

dbic = [2, 3]
mask = []
imstat = []
splreg(file, dbic, mask, imstat)

dbic = [3, 4]
mask = []
imstat = []
splreg(file, dbic, mask, imstat)

dbic = [4, 5]
mask = []
imstat = []
splreg(file, dbic, mask, imstat)

dbic = [3, 5]
mask = []
imstat = []
splreg(file, dbic, mask, imstat)
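For reference, a linear band-to-band regression of this kind is an ordinary least-squares fit between two bands. A pure-Python sketch of the underlying computation (my own illustration, not the PCI implementation):

```python
def band_regression(x, y):
    """Fit y = a + b*x between two band value lists and return
    (intercept a, slope b, Pearson correlation r)."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return a, b, r

# band_regression([1, 2, 3, 4], [2, 4, 6, 8]) -> (0.0, 2.0, 1.0)
```

A correlation r close to 1 between adjacent spectral bands is expected for most land-cover scenes; outliers can flag import or calibration problems.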
Finally, don't forget to close the reporting sequence with the following snippet:
finally:
    enableDefaultReport('term')   # this will close the report file

print "FINISHED All"