earthaccess and NASA EDL
earthaccess lets us access datasets that sit behind NASA Earthdata Login (EDL). The library provides methods to generate an access token and to create an authenticated Python requests session or fsspec file accessors.
The following are simple examples of what we can do with them.
import earthaccess
auth = earthaccess.login()
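Every later call in this notebook depends on a successful login, so it can help to fail early when authentication did not succeed. The following is a minimal sketch, assuming the object returned by `earthaccess.login()` exposes a boolean `authenticated` attribute. `require_login` is a hypothetical helper for illustration and is not part of earthaccess.

```python
from types import SimpleNamespace


def require_login(auth):
    """Return auth unchanged, or raise if it is not authenticated.

    Assumes auth carries a boolean `authenticated` attribute, as the
    object returned by earthaccess.login() is expected to.
    """
    if auth is None or not getattr(auth, "authenticated", False):
        raise RuntimeError("NASA EDL login failed; check credentials and retry")
    return auth


# Stand-in object so the sketch runs without network access.
# With the real library: auth = require_login(earthaccess.login())
fake_auth = SimpleNamespace(authenticated=True)
checked = require_login(fake_auth)
```

Stopping here with a clear message avoids confusing follow-on errors in later cells that need an authenticated session.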
Data in AWS
If the data we want to access is hosted on AWS, we can use earthaccess to generate temporary S3 credentials for any of the DAACs. The call below is commented out for security reasons.
# s3_credentials = auth.get_s3_credentials("NSIDC")
These temporary S3 credentials are valid for 1 hour and can be used by third-party libraries that support S3 buckets.
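As an illustration of handing the credentials to a third-party library, the sketch below maps a credentials dictionary to the keyword arguments that `s3fs.S3FileSystem` accepts (`key`, `secret`, `token`). It assumes the dictionary uses the keys `accessKeyId`, `secretAccessKey` and `sessionToken`; verify this against the actual response. `to_s3fs_kwargs` is a hypothetical helper, and the values are placeholders.

```python
def to_s3fs_kwargs(creds):
    """Map a temporary-credentials dict to s3fs-style keyword arguments.

    Assumes the keys accessKeyId, secretAccessKey and sessionToken;
    a missing key raises KeyError rather than passing silently.
    """
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


# Placeholder values, not real credentials.
example_creds = {
    "accessKeyId": "EXAMPLE_KEY_ID",
    "secretAccessKey": "EXAMPLE_SECRET",
    "sessionToken": "EXAMPLE_TOKEN",
}
s3_kwargs = to_s3fs_kwargs(example_creds)

# With s3fs installed, the filesystem could then be created as:
# fs = s3fs.S3FileSystem(**s3_kwargs)
```

Because the credentials expire, a long-running job would need to request new ones and rebuild the filesystem object.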
HTTPS access
We can also access data over HTTPS using authenticated Python requests sessions, which carry an authorized bearer token. The advantage of these sessions is that they work with every DAAC, including data stored in S3, as long as it is accessed through HTTPS.
nsidc_url = "https://n5eil01u.ecs.nsidc.org/DP7/ATLAS/ATL06.005/2019.02.21/ATL06_20190221121851_08410203_005_01.h5"
lpcloud_url = "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220903T163129_2224611_012/EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc"
# this is a Python requests session
session = earthaccess.get_requests_https_session()
headers = {"Range": "bytes=0-100"}
r = session.get(lpcloud_url, headers=headers)
r.text
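The `Range` header above asks the server for only part of the file. HTTP byte ranges are inclusive at both ends, so `bytes=0-100` requests 101 bytes. The sketch below builds such a header from a start offset and a length; `byte_range_header` is a hypothetical helper and not part of earthaccess or requests.

```python
def byte_range_header(start, length):
    """Build an HTTP Range header for `length` bytes beginning at `start`.

    The end offset is inclusive, so it is start + length - 1.
    """
    if start < 0 or length < 1:
        raise ValueError("start must be >= 0 and length must be >= 1")
    end = start + length - 1
    return {"Range": f"bytes={start}-{end}"}


# Reproduces the header used in the cell above (101 bytes from offset 0).
headers = byte_range_header(0, 101)

# With an authenticated session it would be used as:
# r = session.get(lpcloud_url, headers=headers)
```

A server that honors the range normally answers with status 206 (Partial Content). For binary formats such as HDF5 or netCDF, `r.content` returns the raw bytes, whereas `r.text` attempts to decode them as text.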
Accessing remote files as if they were local with fsspec
fs = earthaccess.get_fsspec_https_session()
with fs.open(lpcloud_url) as f:
    data = f.read(100)
data
%%time
import xarray as xr
# earthaccess can open a list of files
files = earthaccess.open([lpcloud_url])
ds = xr.open_dataset(files[0], group="sensor_band_parameters")
ds