Friday 7 October 2016

Geo Data Science

I have always wondered why and how people establish new nomenclature in industry. The geospatial industry is not new, and neither is data science. I was one of the lucky few who got a taste of both at the same time, back in 2011-12, while I was still a trainee with Tata Consultancy Services. Although I was trained as a Java developer, I was given the opportunity to work on data mapping using Informatica. Before I even knew it was part of BI, I had one foot in GIS, maintaining the spatial repository for a GIS application. It was an accidental entry, as none of the existing team were interested in working on GIS.

Between 2012 and 2015 it was an altogether different experience. It was all about understanding sociology, political science, anthropology, history, environment and so on. This gave me a strong belief that complexity exists in problem solving: a reductionist attitude doesn't provide solutions, but clarity about the complexity could. Clarity about complexity simply means having high-resolution data over a large area. I always dreamt of working in a domain that integrates all that I have learnt. I am now one year into my master's in GIS, and we have been working on 2 projects on climate change:
  • Detecting the greening effect in the Uttarakhand region of India
  • An AOD study of the cities of India
Both of these projects are the first of their kind in terms of scale, both spatially and temporally. We use both Level 2 and Level 3 data from MODIS on the Terra satellite. The tasks in both were conceptually similar:
  • compile 15 years of temporal data at 500 m spatial resolution and 16-day temporal resolution
  • 3 parameters were used in the Uttarakhand project and 3 other parameters in the AOD project
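To give a feel for what "15 years at 16-day resolution" means in number of time steps, here is a minimal sketch in Python. It is only an illustration, not the code we ran: the year range 2001-2015 is a made-up window (the actual years are not stated here), and it assumes the 16-day composites restart on 1 January of every year, which is how I understand the MODIS 16-day products to be organized.

```python
from datetime import date, timedelta


def composite_start_dates(first_year, last_year, step_days=16):
    """List the start date of every composite period.

    Assumption: the 16-day cycle restarts on 1 January of each year.
    """
    dates = []
    for year in range(first_year, last_year + 1):
        current = date(year, 1, 1)
        while current.year == year:
            dates.append(current)
            current += timedelta(days=step_days)
    return dates


# Hypothetical 15-year window, chosen only for illustration.
dates = composite_start_dates(2001, 2015)
print(len(dates))           # number of time steps per parameter
print(dates[0], dates[-1])  # first and last composite start dates
```

Under these assumptions each parameter has a few hundred time steps per tile, which is why everything downstream had to be scripted.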
For the two projects we applied different methodologies, but we used the same data science approach, which included the following steps:
  1. Data preparation
  2. Data extraction
  3. Data visualization
  4. Data interpretation
For data preparation and data extraction we used an Ubuntu server and bash scripting. A huge amount of data was downloaded from LP DAAC. In GIS, or the geospatial domain in general, data preparation also involves:
  • creating vector data, which includes points and polygons over the study area. This included creating a point dataset of 150 cities.
  • downloading (or transferring over FTP) a huge amount of data from the servers
Data Extraction

We used ArcGIS 10.4.1 to extract the data. As we had a large number of files to process, we used Python scripting. In the Uttarakhand project we developed a series of scripts which not only performed better than the MRT (MODIS Reprojection Tool) provided by LP DAAC (USGS, NASA) but also ran on the local machine. All of this was possible by using the ArcPy package in ArcGIS for data processing. The processed data, which was in image format, was visualized using ArcGIS itself, which is geo-visualization software as well. We were also able to create a lot of classified images, which were extracted to .csv files. These images numbered in the thousands, and working with thousands of images was a huge challenge.

The .csv files generated from the classified images were fed into Tableau 9.1 (now Tableau 10) for visualization. I am happy to announce that we finally found a greening effect in the Himalayas of Uttarakhand. We had another huge task: creating bins for 3 parameters, namely NDVI, temperature, and snow extent and days. These bins were made on the basis of elevation, with a bin width of 500 meters. It was computationally heavy, as each image of a single parameter needed to be classified into 10 bins. This data, as said previously, was converted into .csv files. All of it was done using Python code in ArcGIS.
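The elevation binning can be sketched in plain Python. This is not our ArcPy code: it is a simplified stand-in that works on a list of (elevation, value) pairs instead of rasters. The 500 m bin width and the 10 bins come from the project; the base elevation of 1000 m, the function names, the CSV column names and the sample pixels are all made up for illustration.

```python
import csv
import io

BIN_WIDTH_M = 500  # bin width used in the project
N_BINS = 10        # number of elevation bins per image


def bin_index(elevation_m, base_m=0):
    """Return the 0-based elevation bin, or None if outside the binned range."""
    idx = int((elevation_m - base_m) // BIN_WIDTH_M)
    return idx if 0 <= idx < N_BINS else None


def summarize_by_bin(pixels, base_m=0):
    """pixels: iterable of (elevation_m, value). Returns one row per bin:
    (lower bound, upper bound, pixel count, mean value or None)."""
    sums = [0.0] * N_BINS
    counts = [0] * N_BINS
    for elevation_m, value in pixels:
        idx = bin_index(elevation_m, base_m)
        if idx is not None:
            sums[idx] += value
            counts[idx] += 1
    rows = []
    for i in range(N_BINS):
        lower = base_m + i * BIN_WIDTH_M
        mean = sums[i] / counts[i] if counts[i] else None
        rows.append((lower, lower + BIN_WIDTH_M, counts[i], mean))
    return rows


def rows_to_csv(rows):
    """Serialize the per-bin summary as CSV text (empty cell for an empty bin)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["elev_min_m", "elev_max_m", "pixel_count", "mean_value"])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample pixels: (elevation in meters, NDVI value).
sample = [(1200, 0.42), (1450, 0.38), (3100, 0.25), (5200, 0.10)]
rows = summarize_by_bin(sample, base_m=1000)  # hypothetical base elevation
print(rows_to_csv(rows))
```

Running something like this once per image and per parameter gives one small table per time step, which is the kind of .csv a tool like Tableau can plot over time.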
I am also happy to tell readers that there is an increase in NDVI pixels at elevations above 5000 meters in the Himalayan mountains of Uttarakhand. This is clearly visible in the temporal graphs that came out of the Tableau visualization. This gives us confidence in another aspect of data science, i.e.
Data Visualization 

As I explained, we have used
Data Preparation ---> Data Extraction ---> Data Visualization
Interpretation is our result on the greening effect: the question of whether and how these parameters are interrelated, and to what extent.

The same logic was followed while carrying out the Aerosol Optical Depth (AOD) study for more than 150 cities of India. It included:

Data Preparation
  • ArcGIS
  • FME Desktop
  • Ubuntu Server
Data Extraction
  • Python script(ArcPy)
Data Visualization
  • Tableau
  • ArcMap
Data Interpretation: We were able to show the following
  • there exists a correlation between temperature and AOD. This correlation is a geographical correlation.
  • the AOD of most Indian cities, except some South Indian cities, is greater than 0.3 for most months of the year
  • a brown cloud (AOD) exists all along the Indo-Gangetic plain, starting from Punjab and ending in West Bengal. Hence most cities in this region don't have clean air to breathe.
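The two numeric checks behind the first two points can be sketched as follows. This is only an illustration in plain Python: the 0.3 threshold is the one quoted above, but the Pearson coefficient is a stand-in chosen for the sketch (the exact correlation measure is not spelled out here), and the monthly values are invented, not our results.

```python
import math

AOD_THRESHOLD = 0.3  # threshold quoted in the findings above


def months_above(monthly_aod, threshold=AOD_THRESHOLD):
    """Count the months whose mean AOD is strictly above the threshold."""
    return sum(1 for value in monthly_aod if value > threshold)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented monthly means for one hypothetical city (January to December).
aod = [0.45, 0.40, 0.38, 0.42, 0.50, 0.55, 0.48, 0.35, 0.30, 0.36, 0.52, 0.50]
temperature_c = [14, 17, 23, 29, 33, 34, 31, 30, 29, 26, 20, 15]

print(months_above(aod))            # months with AOD above 0.3
print(pearson(temperature_c, aod))  # sign and strength of the relationship
```

A single coefficient per city says nothing about geography by itself; the geographical pattern only shows up when such values are mapped city by city.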
We are conducting similar projects, which I will write about in the coming months. From the above projects it's clear that data science methods are, and have been, part of working with geo-data or geospatial data. Geo data science is an inherent part of GIS and the geospatial world.



