======= Processing Binary Data =======

==== Python and Binary Data ====

Many satellite data sets are distributed as binary files, often in compressed one- or two-byte formats. Software for reading the data is usually provided, but in some cases a simple Python script does the job better. Below are some use cases for satellite data processing, identified by satellite or retrieval product.

==== GSMaP ====

In this use case a Python script reads binary GSMaP data into well-formed arrays, which are then written to a NetCDF file. The GSMaP data is hourly and on a 0.1 degree grid between 60N and 60S. The script masks the GSMaP missing value (-99.0) and converts the output to CF-compliant naming and units. Here two months of data are processed. Some of the paths to the data may become outdated.

<code python>
import os

import numpy as np
import pandas as pd
import xarray as xr

unit_conversion = 1. / 3600.  # mm/h -> kg m-2 s-1 (1 mm of water is 1 kg m-2)
gsm_out = '/work/mh0492/m219063/DYAMOND/GSMaP.v7_0.10deg.nc'

if os.path.isfile(gsm_out):
    print(gsm_out + ' exists, not overwriting')
else:
    # Hourly time axis; as written it ends at 2016-09-30 00:00
    # (use '2016-09-30 23:00' as the end to include the whole last day)
    time = pd.date_range('2016-08-01', '2016-09-30', freq='h')
    # Grid-cell centres; latitude is ordered north to south
    lat = xr.IndexVariable('lat', np.arange(-59.95, 60., 0.1)[::-1].astype('float32'),
                           attrs={'long_name': 'latitude', 'standard_name': 'latitude',
                                  'units': 'degrees_north'})
    lon = xr.IndexVariable('lon', np.arange(0.05, 360, 0.1).astype('float32'),
                           attrs={'long_name': 'longitude', 'standard_name': 'longitude',
                                  'units': 'degrees_east'})
    # Filled with NaN so that hours without an input file stay masked
    da = np.full((time.size, lat.size, lon.size), np.nan, dtype='float32')

    path = '/work/mh0492/m219063/DYAMOND/Data/GSMaP/standard/v7/hourly/2016/'
    for i, t in enumerate(time):
        gsm_in = (path + t.strftime('%m/%d') + '/gsmap_mvk.2016'
                  + t.strftime('%m%d.%H') + '00.v7.0001.0.dat')
        if os.path.isfile(gsm_in):
            # Read lat.size x lon.size float32 values from the file
            with open(gsm_in, 'rb') as f:
                data = np.fromfile(f, dtype=np.float32, count=lon.size * lat.size)
            field = data.reshape(lat.size, lon.size)
            # Replace the -99.0 missing value by NaN, then convert units
            da[i, :, :] = np.where(field != -99.0, field, np.nan) * unit_conversion
        else:
            print(gsm_in)  # report missing input files

    ds = xr.Dataset({'pr': (['time', 'lat', 'lon'], da)},
                    coords={'time': time, 'lat': lat, 'lon': lon})
    ds.pr.attrs['long_name'] = 'precipitation'
    ds.pr.attrs['standard_name'] = 'precipitation_flux'
    ds.pr.attrs['units'] = 'kg m-2 s-1'
    ds.to_netcdf(gsm_out)
</code>

==== GMI ====

This example works with Remote Sensing Systems (REMSS) single-byte binary files. Here GMI 3-day averaged data files are read and their data converted to NetCDF. For lack of a better method, and because this was a one-time task, I used a rather inefficient approach (looping over ord) to convert the byte characters to integers for rescaling. The variable ifield specifies which data field to process; the possible fields are SST (0), low-frequency wind (1), high-frequency wind (2), precipitable water (3), cloud water (4) and rain (5).

<code python>
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

gmi_out = '/work/mh0492/m219063/DYAMOND/Data/GMI-PRW_0.25deg.nc'
# Scale factor and offset for each of the six fields, indexed by ifield
xscale = np.asarray([0.15, 0.2, 0.2, 0.3, 0.01, 0.1])
xoffset = np.asarray([-3., 0., 0., 0., -0.05, 0.])
xfact = 251  # byte values of xfact and above are masked
ifield = 3
unit_conversion = 1.0

gmi_time = pd.date_range('2016-08-01', '2016-09-30', freq='D')
lat = xr.IndexVariable('lat', np.arange(-89.875, 90., 0.25)[::-1].astype('float32'),
                       attrs={'long_name': 'latitude', 'standard_name': 'latitude',
                              'units': 'degrees_north'})
lon = xr.IndexVariable('lon', np.arange(0.125, 360, 0.25).astype('float32'),
                       attrs={'long_name': 'longitude', 'standard_name': 'longitude',
                              'units': 'degrees_east'})
da = np.full((gmi_time.size, lat.size, lon.size), np.nan)

if os.path.isfile(gmi_out):
    print(gmi_out + ' exists, not overwriting')
else:
    for i, t in enumerate(gmi_time):
        gmi_in = '/work/mh0492/m219063/DYAMOND/Data/GMI/f35_2016' + t.strftime('%m%d') + 'v8.2_d3d'
        print(gmi_in)
        with open(gmi_in, 'rb') as f:
            dx = np.fromfile(f, dtype='S1', count=-1)
        # Byte range of the selected field within the file
        i1 = lon.size * lat.size * ifield
        i2 = i1 + lon.size * lat.size
        # Zero-initialised: null bytes are read as empty strings and skipped below
        data = np.zeros(lon.size * lat.size)
        for j, x in enumerate(dx[i1:i2]):
            if len(x) != 0:
                data[j] = ord(x)
        field = data.reshape(lat.size, lon.size)
        # Mask values of xfact and above, rescale, and flip the rows
        # to match the north-to-south latitude coordinate
        da[i, ::-1, :] = (np.where(field < xfact, field, np.nan) * xscale[ifield]
                          + xoffset[ifield]) * unit_conversion

    if ifield == 3:
        ds = xr.Dataset({'prw': (['time', 'lat', 'lon'], da.astype('float32'))},
                        coords={'time': gmi_time, 'lat': lat, 'lon': lon})
        ds.prw.attrs['long_name'] = 'precipitable water (vapor)'
        ds.prw.attrs['standard_name'] = 'atmosphere_mass_content_of_water_vapor'
        ds.prw.attrs['units'] = 'kg m-2'
        ds.prw.attrs['source'] = 'compiled from v8.2 d3d binary data files provided by REMSS'
        # Quick-look plot of the time and zonal mean
        plt.plot(ds.prw.mean(dim=('time', 'lon')))
        ds.to_netcdf(gmi_out)
</code>
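
The loop over ord in the GMI script can probably be avoided by reading the bytes as unsigned 8-bit integers, which NumPy decodes in a single vectorized step. The following is a minimal sketch of that idea, not the method used above: the function name decode_byte_field and its parameters are hypothetical, the input is a tiny synthetic byte string rather than a real GMI file, and the scale of 0.5 is chosen only for illustration. It assumes the same layout as the script, that is, consecutive single-byte maps of nlat x nlon values, with byte values above 250 treated as missing (equivalent to the xfact = 251 test).

```python
import numpy as np


def decode_byte_field(raw, ifield, nlat, nlon, scale, offset, max_valid=250):
    """Decode one scaled single-byte field from a stack of byte maps.

    Hypothetical helper: `raw` is any bytes-like object holding
    consecutive maps of nlat * nlon unsigned bytes each.
    """
    n = nlat * nlon
    # Interpret the selected slice of the buffer as unsigned 8-bit integers
    counts = np.frombuffer(raw, dtype=np.uint8, count=n, offset=ifield * n)
    counts = counts.reshape(nlat, nlon).astype(np.float64)
    # Byte values above max_valid become NaN; the rest are rescaled
    return np.where(counts <= max_valid, counts * scale + offset, np.nan)


# Synthetic two-field "file" on a 2 x 3 grid (not real GMI data)
raw = bytes([0, 10, 250, 251, 255, 100, 1, 2, 3, 4, 5, 6])
field = decode_byte_field(raw, ifield=0, nlat=2, nlon=3, scale=0.5, offset=0.0)
print(field)
```

For a real file, the bytes could be obtained with open(path, 'rb').read(), and the result would still need the latitude flip applied in the script above before being stored.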