Table of Contents

Processing Binary Data

Python and Binary Data

Many satellite data sets are distributed as binary files, often in compressed one or two byte formats. Usually software for reading the data is provided, but in some cases a simply python script can do the job better. Attached are some use cases for satellite data processing, identified by satellite or retrieval product.

GSMaP

In this use case a python script reads binary GSMaP data into well-formed arrays which then are written to a netcdf file. The GSMaP data is hourly and on a 0.1deg grid between 60N and 60S. It masks the GSMaP -99.0 missing value and converts the output to CF compliant naming and units. Here two months of data are processed. Some of the paths to the data will be depreciated.

unit_conversion = 1./3600.
gsm_out = '/work/mh0492/m219063/DYAMOND/GSMaP.v7_0.10deg.nc'
 
if (os.path.isfile(gsm_out)): 
    print (gsm_out+' exists, not overwriting')
else:
    time= pd.date_range(ym+'2016-08-01','2016-09-30', freq='1H')
    lat    = xr.Coordinate('lat', np.arange(-59.95,60.,0.1)[::-1].astype('float32'),
                           attrs={'long_name':'latitude','standard_name':'latitude','units': "degrees_north"})
    lon    = xr.Coordinate('lon', np.arange(  0.05,360,0.1).astype('float32'),
                           attrs={'long_name':'longitude','standard_name':'longitude','units': "degrees_east"})
    da     = np.ndarray(shape=(time.size,lat.size,lon.size))
 
    path = '/work/mh0492/m219063/DYAMOND/Data/GSMaP/standard/v7/hourly/2016/'
    for i in np.arange(time.size):
        gsm_in = path + time[i].strftime("%m/%d")+ '/gsmap_mvk.2016'+time[i].strftime("%m%d.%H")+'00.v7.0001.0.dat'
        if (os.path.isfile(gsm_in)):
            with open(gsm_in,'rb') as f:
                data = np.fromfile(f, dtype=np.float32, count = lon.size*lat.size)                
            d2   = pd.DataFrame(np.reshape(data,(lat.size,lon.size)), lat, lon)
            da[i,:,:] = d2.where(d2 != -99.0) * unit_conversion
        else:
            print(gsm_in)
 
    ds     = xr.Dataset({'pr': (['time', 'lat', 'lon'],  da.astype('float32'))},
                     coords={'time': time,'lat': lat,'lon': lon})
 
    ds.pr.attrs['long_name']    ='precipitation'
    ds.pr.attrs['standard_name']='precipitation_flux'
    ds.pr.attrs['units']        ='km m-2 s-1'
    ds.to_netcdf(gsm_out)  

GMI

This example works with Remote Sensing Systems single byte binary files. Here GMI 3 day averaged data files are read and their data covered to NetCDF. For lack of a better method and because this was a one-time task, I used a rather inefficient (looping over ord) to convert ascii byte characters to integers for rescaling; ifield specifies which data field to process with possible fields being SST (0), Wind – low frequency (1), Wind – high frequency (2), precipitable water (3), cloud water (4) and rain (5).

gmi_out = '/work/mh0492/m219063/DYAMOND/Data/GMI-PRW_0.25deg.nc'
xscale = np.asarray([ 0.15, 0.2, 0.2, 0.3, 0.01, 0.1])
xoffset= np.asarray([-3.  , 0. , 0. , 0. ,-0.05, 0. ])
xfact = 251
ifield = 3
unit_conversion = 1.0
 
gmi_time= pd.date_range('2016-08-01','2016-09-30', freq='1d')
lat     = xr.Coordinate('lat', np.arange(-89.875,90.,0.25)[::-1].astype('float32'),
                       attrs={'long_name':'latitude','standard_name':'latitude','units': "degrees_north"})
lon     = xr.Coordinate('lon', np.arange(  0.125,360,0.25).astype('float32'),
                       attrs={'long_name':'longitude','standard_name':'longitude','units': "degrees_east"})
da      = np.ndarray(shape=(gmi_time.size,lat.size,lon.size))
 
if (os.path.isfile(gmi_out)): 
    print (gmi_out+' exists, not overwriting')
else:
    for i in np.arange(gmi_time.size):
        gmi_in = '/work/mh0492/m219063/DYAMOND/Data/GMI/f35_2016' + gmi_time[i].strftime("%m%d")+'v8.2_d3d'
        print (gmi_in)
        with open(gmi_in,'rb') as f:
            dx = np.fromfile(f, dtype='S1',count=-1)
 
        i1 = lon.size*lat.size * ifield
        i2 = i1 + lon.size*lat.size 
        data = np.ndarray(lon.size*lat.size)
 
        for j, x in enumerate(dx[i1:i2]):
            if (len(x) != 0): data[j] = ord(x)
 
        d2        = pd.DataFrame(np.reshape(data,(lat.size,lon.size)), lat, lon)
        da[i,::-1,:] = (d2.where(d2 < xfact) * xscale[ifield] + xoffset[ifield]) * unit_conversion
 
    if (ifield == 3):
        ds        = xr.Dataset({'prw': (['time', 'lat', 'lon'],  da.astype('float32'))},
                               coords={'time': gmi_time,'lat': lat,'lon': lon})
 
        ds.prw.attrs['long_name']    ='precipitable water (vapor)'
        ds.prw.attrs['standard_name']='atmosphere_mass_content_of_water_vapor'
        ds.prw.attrs['units']        ='kg m-2'
        ds.prw.attrs['source']       ='compiled from v8.2 d3d binary data files provided by REMSS'
 
        plt.plot(ds.prw[:,:,:].mean(dim=('time','lon')))
 
        ds.to_netcdf(gmi_out)