Seqana Data Transfer App
At Seqana, we empower climate-positive land use managers to analyze soil organic carbon (SOC) with ease and efficiency. In this post we introduce one of our internal tools for ingesting and processing data, the Seqana data transfer app, and share the story of its development.
Soil data are central to our mission to enable data-driven carbon sequestration, which is why we created this tool to integrate data from various sources into our algorithms. Despite decades of effort to standardize soil data, many soil datasets still come in different formats and need to be harmonized. This is where our data transfer app comes in handy.
To aggregate and process these data, the app lets users ingest files in various formats and prepares them for rapid processing. A general overview of the functionality of the app can be seen in figure 1.
Flask:
To keep the app accessible to users, we made it lightweight, with a codebase that is simple to read and easy to maintain, without compromising the flexibility to scale up to a complex application in the future. Since Python is our main programming language at Seqana and we have had good experience with the web development framework Flask, we chose to use it again for our new tool.
The main functionality of the app allows users to upload data as files from a local or cloud storage directory (such as a shared drive or a cloud storage bucket) to a dedicated cloud storage location. While uploading, users provide metadata through a user input form. The preview functionality gives users a quick look at the data they are about to upload. Over the next sections, we want to give you some insights into how we developed these features and how you can use them in your SOC studies.
Transfer files from local/shared drives to cloud storage:
We implemented an abstract Storage class with three abstract methods: list_contents, get_file_binary and put_file_binary. Other classes such as LocalStorage, GoogleDriveStorage, GoogleCloudStorage and GitLabStorage inherit from Storage and implement these methods according to their specific "needs". Because every storage type exposes the same interface, a new storage type can be added or an existing one changed without touching the rest of the code. Code snippet 1 shows the simple original implementation of the abstract Storage class and its methods, while code snippet 2 shows the GoogleCloudStorage class, which inherits from Storage, with its implementation of the put_file_binary method used to upload files.
import abc


class Storage(abc.ABC):
    # inheriting from abc.ABC is what makes @abc.abstractmethod enforced
    @abc.abstractmethod
    def get_file_binary(self, file):
        pass

    @abc.abstractmethod
    def put_file_binary(self, file, path):
        pass

    @abc.abstractmethod
    def list_contents(self, path, nested=False):
        pass
Code snippet 1: Implementation of abstract class and methods for the Storage class.
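To illustrate how a concrete class can fill in this interface, here is a minimal sketch of a local-disk storage. The post names a LocalStorage class but does not show it, so the constructor argument root, the return values and the whole method bodies below are our assumptions for illustration, not the code running in the app.

```python
import abc
import io
import tempfile
from pathlib import Path


class Storage(abc.ABC):
    """Same interface as in code snippet 1."""

    @abc.abstractmethod
    def get_file_binary(self, file):
        ...

    @abc.abstractmethod
    def put_file_binary(self, file, path):
        ...

    @abc.abstractmethod
    def list_contents(self, path, nested=False):
        ...


class LocalStorage(Storage):
    """Hypothetical local-disk backend; all paths are relative to `root`."""

    def __init__(self, root):
        self.root = Path(root)

    def get_file_binary(self, file):
        # Return the file content as an in-memory binary stream.
        return io.BytesIO((self.root / file).read_bytes())

    def put_file_binary(self, file, path):
        # `file` is a binary file-like object; missing folders are created.
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        file.seek(0)
        target.write_bytes(file.read())
        return str(target)

    def list_contents(self, path, nested=False):
        # List files directly under `path`; recurse only when nested=True.
        pattern = "**/*" if nested else "*"
        return sorted(
            p.relative_to(self.root).as_posix()
            for p in (self.root / path).glob(pattern)
            if p.is_file()
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        storage = LocalStorage(tmp)
        storage.put_file_binary(io.BytesIO(b"depth,soc\n10,1.2\n"), "site_a/samples.csv")
        print(storage.list_contents("", nested=True))
```

Because Storage inherits from abc.ABC, Python refuses to instantiate any subclass that forgets one of the three methods, which catches incomplete backends early.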
import socket
from pathlib import Path


class GoogleCloudStorage(Storage):
    # CONSTRUCTOR ------------------------------------------
    def __init__(self) …

    # PUT --------------------------------------------------
    def put_file_binary(self, file, gc_storage_path):
        socket.setdefaulttimeout(300)
        # split the path into parts, bucket name first:
        # 'data/subfolder/file.txt' -> ('data', 'subfolder', 'file.txt')
        # (with a leading slash the first part would be '/', not the bucket name)
        gc_storage_path_parts = Path(gc_storage_path).parts
        # get the bucket on Google Cloud Storage to upload the file to
        gc_bucket = self.__get_connection().get_bucket(gc_storage_path_parts[0])
        logger.info("Got destination Bucket")
        # create a "file-holder" (blob) in the specified bucket;
        # the remaining parts are joined back into the object path
        blob = gc_bucket.blob(str(Path(*gc_storage_path_parts[1:])))
        # the file is uploaded via stream in 5 MiB chunks without downloading it to the local storage
        blob._CHUNK_SIZE_MULTIPLE = 5 * 1024 * 1024
        blob._chunk_size = 5 * 1024 * 1024
        # rewind=True -> bring the file pointer to the beginning before uploading
        blob.upload_from_file(file, rewind=True)
        logger.info("Uploading file to gcs is done")
        # returns a public url to the file
        return blob.public_url
Code snippet 2: Implementation of the GoogleCloudStorage class with the put_file_binary method.
We also developed a wrapper class, StorageManager, which manages the transfer of data files from source to destination. Because it only relies on the methods of the abstract Storage class, it can move files between any two storage classes, irrespective of their specific type.
Now you can simply instantiate the storage types, specify the source and destination paths of the files to be moved, and pass this information to the StorageManager, which handles the rest for you. An example script is shown in code snippet 3.
# 1. Init google drive storage
google_drive_storage = GoogleDriveStorage(cquest_metadata.gdrive_secret_payload)
# 2. Init google cloud storage
google_cloud_storage = GoogleCloudStorage(cquest_metadata.gcp_access_creds)
# 3. Init storage manager
storage_manager = StorageManager()
# 4. Move files
storage_manager.move_files_from_source_to_destination(
    google_drive_storage,
    google_cloud_storage,
    gdrive_folder_id,
    "{0}/{1}".format(
        cquest_metadata.gcp_bucket_name, result_dict["short_title"][0]
    ),
)
Code snippet 3: Example for moving data between Google Drive and a cloud storage bucket.
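The internals of StorageManager are not shown in this post. The sketch below is one way such a manager could be written using only the three Storage methods. Only the method name move_files_from_source_to_destination comes from code snippet 3; the InMemoryStorage stand-in, the parameter names and the path handling are assumptions made so that the example runs without cloud credentials. Note that this sketch copies files and leaves the source untouched.

```python
import io


class InMemoryStorage:
    """Hypothetical stand-in that follows the Storage interface of code snippet 1."""

    def __init__(self, files=None):
        self.files = dict(files or {})  # maps path -> bytes

    def get_file_binary(self, file):
        return io.BytesIO(self.files[file])

    def put_file_binary(self, file, path):
        file.seek(0)
        self.files[path] = file.read()
        return path

    def list_contents(self, path, nested=False):
        prefix = path.rstrip("/") + "/"
        found = [p for p in self.files if p.startswith(prefix)]
        if not nested:
            # keep only files directly inside `path`
            found = [p for p in found if "/" not in p[len(prefix):]]
        return sorted(found)


class StorageManager:
    """Sketch only; the real class may work differently."""

    def move_files_from_source_to_destination(
        self, source, destination, source_path, destination_path
    ):
        source_root = source_path.rstrip("/")
        destination_root = destination_path.rstrip("/")
        transferred = []
        for file_path in source.list_contents(source_path, nested=True):
            # keep the folder structure below the source path
            relative = file_path[len(source_root) + 1:]
            stream = source.get_file_binary(file_path)
            transferred.append(
                destination.put_file_binary(stream, f"{destination_root}/{relative}")
            )
        return transferred


if __name__ == "__main__":
    src = InMemoryStorage({"in/a.csv": b"1", "in/sub/b.csv": b"2"})
    dst = InMemoryStorage()
    print(StorageManager().move_files_from_source_to_destination(src, dst, "in", "bucket/study"))
```

The manager never checks which storage type it is talking to, which is the point of the shared interface: any pair of classes that implement the three methods can be combined.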
User Input Form:
The user input form for uploading data takes the form of a questionnaire, which is displayed in figure 2. The questionnaire is defined in a JSON file, which is retrieved by calling the method get_file_binary().
The parsed JSON is then passed to the questionnaire.html template with the Flask command:
render_template("questionnaire.html", table=json.dumps(questionnaire_json))
JavaScript then parses the JSON and dynamically generates the HTML content of the questionnaire page. In doing so, JavaScript relies entirely on the information passed to it in the JSON object. For example, the "question_type" field defines what type of input the HTML has to add (text, radio, checkbox, date, etc.), "question_string" defines how the question should be presented on the web page, and "question_options" defines the available options from which the user can pick the answer. This design makes it possible to extend or modify the questionnaire simply by changing the corresponding JSON file, without touching the code or redeploying the app.
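To make the mapping from JSON fields to HTML inputs concrete, here is a small sketch. In the app this step is done in JavaScript in the browser; the version below uses Python only to keep the examples in this post in one language. The three field names come from the description above, while the two example questions, the input names and the helper functions render_question and render_questionnaire are hypothetical.

```python
import html
import json

# Hypothetical questionnaire definition using the three fields described above.
questionnaire_json = [
    {
        "question_type": "text",
        "question_string": "Short title of the dataset",
        "question_options": [],
    },
    {
        "question_type": "radio",
        "question_string": "Unit of the sampling depth",
        "question_options": ["cm", "m"],
    },
]


def render_question(index, question):
    """Build the HTML for one question from its JSON description."""
    name = f"question_{index}"
    label = f"<label>{html.escape(question['question_string'])}</label>"
    question_type = question["question_type"]
    if question_type in ("radio", "checkbox"):
        # one input element per available option
        inputs = "".join(
            f'<input type="{question_type}" name="{name}" '
            f'value="{html.escape(option)}"> {html.escape(option)}'
            for option in question.get("question_options", [])
        )
    else:
        # text, date, ... map directly to the HTML input type
        inputs = f'<input type="{question_type}" name="{name}">'
    return f"<div>{label}{inputs}</div>"


def render_questionnaire(questions):
    return "\n".join(render_question(i, q) for i, q in enumerate(questions))


if __name__ == "__main__":
    # json.dumps produces the string that is handed to the template
    print(json.dumps(questionnaire_json, indent=2))
    print(render_questionnaire(questionnaire_json))
```

Adding a question then only means appending one more object to the JSON file.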
Preview File:
To help users catch mistakes before they upload, we developed the functionality to preview file content. This is implemented in JavaScript: when an input file is selected, an event listener triggers a function that converts the content of the selected file into a JSON object and then dynamically generates an HTML table for the page. Check out code snippet 4 for more details:
// FILE PREVIEWER : handle
function handleFilePreview(e) {
    // nothing to preview if no file was selected
    if (!e.target.files || e.target.files.length === 0) return;
    prevFileDiv.innerHTML = "";
    // get file
    var file = e.target.files[0];
    // use the last extension so that names like "my.data.csv" are handled too
    var fileType = file.name.split(".").pop().toLowerCase();
    var fileReader = new FileReader();
    // register the handler first, then start reading (see below)
    fileReader.onload = function() {
        var htmlString = "";
        const text = fileReader.result;
        var jsonObject = null;
        if (fileType == "csv") {
            // convert csv to json object
            jsonObject = csvJSON(text); // separately implemented function
        } else if (fileType == "xls" || fileType == "xlsx") {
            // convert xls/xlsx file binary to json object
            jsonObject = xlsJSON(text); // separately implemented function
        }
        const keys = Object.keys(jsonObject[0]);
        // create table
        htmlString += "<table class='table table-striped'>"; // <-------- START TABLE
        // code for dynamical html content generation
        htmlString += "</table>"; // <------------------------------------- END TABLE
        prevFileDiv.innerHTML = htmlString;
    };
    // read file: csv as text, xls/xlsx as binary string
    if (fileType == "csv") {
        fileReader.readAsText(file);
    } else if (fileType == "xls" || fileType == "xlsx") {
        fileReader.readAsBinaryString(file);
    }
}
Code snippet 4: JavaScript function that generates the file preview.
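The helper csvJSON and the table generation are not shown in the snippet. As an illustration of the two steps (CSV text to a list of records, records to an HTML table), here is a sketch in Python. It is an analogue for explanation only, not the JavaScript used in the app; the function names csv_to_records and records_to_html_table and the max_rows limit are hypothetical.

```python
import csv
import html
import io


def csv_to_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def records_to_html_table(records, max_rows=5):
    """Render the first `max_rows` records as an HTML table string."""
    if not records:
        return "<p>No rows to preview.</p>"
    keys = list(records[0].keys())
    header = "".join(f"<th>{html.escape(key)}</th>" for key in keys)
    rows = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(record.get(key) or ''))}</td>" for key in keys)
        + "</tr>"
        for record in records[:max_rows]
    )
    return f"<table class='table table-striped'><tr>{header}</tr>{rows}</table>"


if __name__ == "__main__":
    sample = "depth_cm,soc_percent\n10,1.2\n30,0.8\n"
    print(records_to_html_table(csv_to_records(sample)))
```

Escaping every cell with html.escape matters in a preview, since the file content comes straight from the user and would otherwise be inserted into the page as markup.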
We developed the Seqana data transfer app to assist users with studies and analyses of SOC, and we are excited about the opportunity for you to use it. At Seqana, we strive to make SOC projects as simple, efficient, and organized as possible, so that efforts to reduce atmospheric CO2 are accessible to anyone interested. For the time being, we develop and use this app as an internal tool, but we are open to sharing access and letting you use the app together with us. Please reach out if you are interested or have any questions.