
File sync protocol app

Added on 2021-06-17 18:48:45 UTC


Project on GitHub.


Background

For our Networking class we were given an assignment to develop a server and client application for exchanging and synchronising files: essentially a simplified version of Dropbox, Google Drive, Microsoft OneDrive, etc.

One group was responsible for designing the file exchange protocol. The other groups, including ours, were responsible for building applications based on that protocol. The idea was that the operating system an application was developed for would not matter: as long as the applications followed the protocol, they would always be able to communicate with each other.

The protocol

Specifications

Client and server communicate by sending requests and responses as UTF-8 text with a JSON body over a TCP connection. The server listens for incoming requests from clients and handles them. The client does not listen for requests from the server; instead, it has to ask the server itself whether anything has changed on the server side.

A request sent by a client has the following format:

VERB PROTOCOL_VERSION

JSON_BODY

Example:

GET idh14sync/1.0

{
    "filename": "ZXhhbXBsZS50eHQ="
}

The server reads the request one character at a time and detects the end of the request when the JSON body is closed. Next, the verb and protocol version are validated, and the verb is used to determine the action the server must perform. Once the server has processed a request, with or without success, it sends a response containing the result back to the client.
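The end-of-request detection described above can be sketched as follows. This is a minimal Python illustration, not our .NET code, and it rests on two assumptions: the header line ends with a newline, and a closing brace only ends the request when it lies outside a JSON string literal and brings the nesting depth back to zero. The names `read_request` and `REQUEST_VERBS` are hypothetical.

```python
import io
import json

PROTOCOL_VERSION = "idh14sync/1.0"
REQUEST_VERBS = {"LIST", "GET", "PUT", "DELETE"}


def read_request(stream):
    """Read one request from a text stream, one character at a time.

    Returns (verb, version, body). Reading stops as soon as the outermost
    JSON object is closed; braces inside string literals are ignored.
    """
    chars = []
    depth = 0           # nesting depth of { } outside string literals
    in_string = False   # currently inside a JSON string literal?
    escaped = False     # previous character was a backslash inside a string?
    while True:
        ch = stream.read(1)
        if ch == "":
            raise EOFError("stream ended before the JSON body was closed")
        chars.append(ch)
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"' and depth > 0:
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                break

    header, _, body = "".join(chars).partition("\n")
    verb, _, version = header.strip().partition(" ")
    # Validate the header before the verb is used to pick an action.
    if verb not in REQUEST_VERBS or version != PROTOCOL_VERSION:
        raise ValueError(f"invalid request header: {header.strip()!r}")
    return verb, version, json.loads(body)


# On a real socket, sock.makefile("r", encoding="utf-8") could play the role of the stream.
verb, version, body = read_request(
    io.StringIO('GET idh14sync/1.0\n\n{\n    "filename": "ZXhhbXBsZS50eHQ="\n}')
)
```

Tracking string literals matters because a filename or base64 value could in principle contain a brace; counting braces blindly would end the request too early.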

The content of a response from the server uses the following format:

VERB PROTOCOL_VERSION
 
JSON_BODY

Example:

RESPONSE idh14sync/1.0
     
{
    "status": 200,
    "filename": "ZXhhbXBsZS50eHQ=",
    "checksum": "51182c5394952e8f6c52b6efcfde64259272a439",
    "content": "c29tZSBleGFtcGxlIGNvbnRuZXQ="
}
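Since requests and responses share the same layout, a single helper can serialise both. A minimal Python sketch, assuming `\n` line endings and four-space JSON indentation; the text above does not say whether `\r\n` is expected or whether the JSON must be pretty-printed, and `format_message` is a hypothetical name.

```python
import json

PROTOCOL_VERSION = "idh14sync/1.0"


def format_message(verb: str, body: dict) -> bytes:
    """Build 'VERB PROTOCOL_VERSION', a blank line and the JSON body, as UTF-8 bytes."""
    text = f"{verb} {PROTOCOL_VERSION}\n\n{json.dumps(body, indent=4)}"
    return text.encode("utf-8")


request = format_message("GET", {"filename": "ZXhhbXBsZS50eHQ="})
response = format_message("RESPONSE", {"status": 200})
```

No trailing newline is written after the JSON, because the receiver is expected to stop reading as soon as the JSON object closes.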

Requests

The protocol supports the following request types, each of which is sent as the verb of the request:

  • LIST
  • GET
  • PUT
  • DELETE

LIST requests a list of all files on the server. This request has an empty JSON body.

Example request JSON:

{}

GET requests a specific file by its name. The filename is encoded as UTF-8 and sent in the JSON as a base64 string.

Example request JSON:

{
    "filename": "ZXhhbXBsZS50eHQ="
}
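The two-step filename encoding (UTF-8 first, then base64) can be illustrated with Python's standard `base64` module. The helper names `encode_filename` and `decode_filename` are hypothetical; the example value is the one used in the JSON above.

```python
import base64


def encode_filename(name: str) -> str:
    """UTF-8 encode the filename, then base64 encode it for the JSON body."""
    return base64.b64encode(name.encode("utf-8")).decode("ascii")


def decode_filename(value: str) -> str:
    """Reverse of encode_filename; raises ValueError on malformed input."""
    return base64.b64decode(value, validate=True).decode("utf-8")


get_body = {"filename": encode_filename("example.txt")}
# get_body == {"filename": "ZXhhbXBsZS50eHQ="}
```

Because the name is turned into UTF-8 bytes before base64 encoding, non-ASCII filenames survive the round trip unchanged.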

PUT sends a file to the server. Here, the JSON contains the following properties:

  • The base64 filename
  • The SHA-1 checksum of the file
  • The SHA-1 checksum of the file before it was modified (only if the file already existed)
  • The base64 content of the file

Example request JSON:

{
    "filename": "ZXhhbXBsZS50eHQ=",
    "checksum": "51182c5394952e8f6c52b6efcfde64259272a439",
    "original_checksum": "fbdbf2bafc9835d2267140e33a506ac424de17db",
    "content": "c29tZSBleGFtcGxlIGNvbnRuZXQ="
}
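A PUT body along these lines could be assembled as follows. This is a Python sketch under one stated assumption: for a file that did not exist before, `original_checksum` is simply left out, since the description only says it is sent "if the file already existed" and does not say whether to omit it or send `null`. The function name is hypothetical.

```python
import base64
import hashlib
from typing import Optional


def put_request_body(name: str, content: bytes, original_checksum: Optional[str] = None) -> dict:
    """Build the JSON body of a PUT request.

    original_checksum is the SHA-1 of the version the client last synced;
    it is omitted for new files (assumption, see the text above).
    """
    body = {
        "filename": base64.b64encode(name.encode("utf-8")).decode("ascii"),
        "checksum": hashlib.sha1(content).hexdigest(),  # 40 lowercase hex characters
    }
    if original_checksum is not None:
        body["original_checksum"] = original_checksum
    body["content"] = base64.b64encode(content).decode("ascii")
    return body
```

The server can compare `original_checksum` with the checksum of its current copy to detect that someone else changed the file in the meantime.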

DELETE removes a file by its name. The checksum is supplied as well, so the server can check whether the file to remove is the same version as the one the client specified. If it is not, this is a conflict (412 FileConflict) and the file is not removed.

Example request JSON:

{
    "filename": "ZXhhbXBsZS50eHQ=",
    "checksum": "51182c5394952e8f6c52b6efcfde64259272a439"
}
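On the server side, the DELETE check might look like the Python sketch below. It assumes the filename has already been base64-decoded and that all synced files live directly in one folder; the name guard against `..` and subpaths is our own addition, and `handle_delete` is a hypothetical name.

```python
import hashlib
import os


def handle_delete(root: str, name: str, checksum: str) -> int:
    """Server-side DELETE: return the status code for removing `name` from `root`."""
    if not name or os.path.basename(name) != name or name in (".", ".."):
        return 400  # reject empty names and paths that could escape the sync folder
    path = os.path.join(root, name)
    if not os.path.isfile(path):
        return 404
    with open(path, "rb") as f:
        current = hashlib.sha1(f.read()).hexdigest()
    if current != checksum.lower():
        return 412  # the client's copy is not the version currently on the server
    os.remove(path)
    return 200
```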

Responses

The protocol supports only a single response verb, "RESPONSE". The client knows which JSON format to expect from the type of request it originally sent. Every response includes a status that indicates whether the request was processed successfully and, if not, what caused the failure.

The response of a LIST request contains a list of files present on the server. Each file contains a filename and checksum.

Example response JSON:

{
    "status": 200,
    "files": [
        {
            "filename": "ZXhhbXBsZS50eHQ=",
            "checksum": "51182c5394952e8f6c52b6efcfde64259272a439"
        }            
    ]
}
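A server could build this LIST response by hashing every file in its sync folder. A minimal Python sketch, assuming a flat folder (subfolders are skipped) and sorting by name only to make the output deterministic; `list_response` is a hypothetical name.

```python
import base64
import hashlib
import os


def list_response(root: str) -> dict:
    """Build a LIST response for the files directly inside `root`."""
    files = []
    with os.scandir(root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if not entry.is_file():
                continue  # subfolders are not part of this sketch
            with open(entry.path, "rb") as f:
                checksum = hashlib.sha1(f.read()).hexdigest()
            files.append({
                "filename": base64.b64encode(entry.name.encode("utf-8")).decode("ascii"),
                "checksum": checksum,
            })
    return {"status": 200, "files": files}
```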

The response of a GET request contains the name, checksum and content of the file.

Example response JSON:

{
    "status": 200,
    "filename": "ZXhhbXBsZS50eHQ=",
    "checksum": "51182c5394952e8f6c52b6efcfde64259272a439",
    "content": "c29tZSBleGFtcGxlIGNvbnRuZXQ="
}

The response of a PUT or DELETE request, or a generic error, only contains the status and no other information.

Example response JSON:

{
    "status": 200
}

Status codes

The protocol supports the following status codes:

  • 200 (OK, when the request is processed without any issues)
  • 400 (BadRequest, when the request is in an incorrect format, for example)
  • 404 (NotFound, when a non-existing file is requested, for example)
  • 412 (FileConflict, when there is a checksum mismatch when synchronising, for example)
  • 500 (InternalServerError, when an unexpected error occurred)
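One way to keep these codes consistent in an implementation is to map errors raised while handling a request onto them in a single place. This Python sketch is an illustration only: the `FileConflict` exception and the choice of which built-in exceptions count as a bad request are our assumptions.

```python
from enum import IntEnum


class Status(IntEnum):
    OK = 200
    BAD_REQUEST = 400
    NOT_FOUND = 404
    FILE_CONFLICT = 412
    INTERNAL_SERVER_ERROR = 500


class FileConflict(Exception):
    """Hypothetical: raised by a handler when checksums do not match."""


def status_for(exc: Exception) -> Status:
    """Map an exception raised while handling a request to a protocol status."""
    if isinstance(exc, FileConflict):
        return Status.FILE_CONFLICT
    if isinstance(exc, FileNotFoundError):
        return Status.NOT_FOUND
    if isinstance(exc, (KeyError, ValueError)):
        # missing JSON field, malformed JSON, bad base64 or invalid UTF-8
        return Status.BAD_REQUEST
    return Status.INTERNAL_SERVER_ERROR


def error_response(exc: Exception) -> dict:
    # Error responses carry only the status.
    return {"status": int(status_for(exc))}
```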

Limitations

While the protocol itself was specified in great detail, a decision made in the early stages imposed significant limitations: data was transferred as JSON requests, with file content converted to base64.

The problem is that to process JSON, you first have to read the JSON content in its entirety. Afterwards, the content also has to be decoded from base64 into raw binary data. Only then can you write the file data to the file system. This requires a lot of memory, since the entire file has to be held in memory, both as base64 text and as decoded bytes, which made the protocol unsuitable for larger files. In fact, the JSON parser we used already had problems with files over 50 MB!
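A rough calculation shows how quickly this adds up. Standard padded base64 turns every 3 bytes into 4 characters, so a file of n bytes becomes 4⌈n/3⌉ characters. Taking 50 MB as 50,000,000 bytes (decimal megabytes, an assumption):

```latex
L_{\text{base64}}(n) = 4\left\lceil \frac{n}{3} \right\rceil,
\qquad
L_{\text{base64}}(50\,000\,000) = 4 \cdot 16\,666\,667 = 66\,666\,668 \ \text{characters} \approx 66.7\ \text{MB}
```

The raw JSON text, the parsed string holding the base64 content and the decoded 50 MB of bytes may all be alive at the same time, so peak memory is a multiple of the file size. If the JSON library materialises the content as a .NET string, which stores UTF-16 code units (two bytes per character), that string alone takes roughly 133 MB.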

Possible improvement

A better approach would have been to first send the headers of a request, containing the name and size of the file. The end of the headers could be marked by a special character. Everything after the headers would be the raw binary data of the file being transferred. Since the headers state the size, the receiver knows exactly how many bytes to expect, so it can read the data in chunks using a buffer and write each chunk directly to the file system.
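The proposed header-then-binary layout can be sketched in Python as below. The details are assumptions for illustration: the header is a single line of compact JSON with `filename` and `size` fields, a newline acts as the special end-of-header character, and the chunk and header size limits are arbitrary.

```python
import io
import json
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024   # bytes copied per read/write
MAX_HEADER = 4096        # upper bound on the header line


def send_file(out: BinaryIO, name: str, src: BinaryIO, size: int) -> None:
    """Write a one-line JSON header, then exactly `size` raw bytes from `src`."""
    header = {"filename": name, "size": size}
    # json.dumps without indent never emits a raw newline, so "\n" can end the header.
    out.write(json.dumps(header).encode("utf-8") + b"\n")
    remaining = size
    while remaining > 0:
        chunk = src.read(min(CHUNK_SIZE, remaining))
        if not chunk:
            raise EOFError("source ended before `size` bytes were read")
        out.write(chunk)
        remaining -= len(chunk)


def receive_file(inp: BinaryIO, dst: BinaryIO) -> dict:
    """Read the header line, then stream exactly header["size"] bytes into `dst`."""
    line = inp.readline(MAX_HEADER)
    if not line.endswith(b"\n"):
        raise ValueError("header missing, truncated or too long")
    header = json.loads(line.decode("utf-8"))
    remaining = header["size"]
    while remaining > 0:
        chunk = inp.read(min(CHUNK_SIZE, remaining))
        if not chunk:
            raise EOFError("connection closed before the whole file arrived")
        dst.write(chunk)
        remaining -= len(chunk)
    return header


# Round trip through an in-memory buffer standing in for a TCP stream.
data = bytes(range(256)) * 1000
wire = io.BytesIO()
send_file(wire, "example.bin", io.BytesIO(data), len(data))
wire.seek(0)
received = io.BytesIO()
header = receive_file(wire, received)
```

At most one chunk of file data is in memory at a time, regardless of the file size, and the body may freely contain newline bytes because it is read by length rather than by delimiter.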

The applications

Our group was responsible for building the applications that would work using the protocol the other group designed.

The applications were built using the .NET framework and are made up of server and client console applications, as well as two libraries.

Protocol library

This library contains the entire implementation of the protocol by making available the SyncServer and SyncClient classes.

The SyncServer uses a TcpListener to listen for new TCP connections started by clients. Each new connection is handled on a new thread. There, the request is read from the TCP stream according to the protocol specification, validated and converted to a request object. Depending on the verb of the request, a corresponding callback method is invoked with the request object as its argument. A server application implements these callbacks to handle the request and create a proper response. Once the request has been handled, or a (validation) error has occurred, the server serialises the response object and sends it back to the client by writing it to the TCP stream.
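The same structure can be sketched in Python, with `socketserver.ThreadingTCPServer` standing in for a TcpListener plus a thread per connection. This is an analogue of the design, not our .NET SyncServer: the `on(verb, callback)` registration API is hypothetical, callbacks receive the parsed JSON body as a plain dict, and the server is assumed to answer one request per connection.

```python
import json
import socketserver
from typing import Callable, Dict

PROTOCOL_VERSION = "idh14sync/1.0"
Callback = Callable[[dict], dict]


def read_message(rfile):
    """Read bytes until the outermost JSON object closes; return (verb, version, body)."""
    data = bytearray()
    depth, in_string, escaped = 0, False, False
    while True:
        b = rfile.read(1)
        if not b:
            raise EOFError("client closed the connection mid-request")
        data += b
        # Byte-level scanning is safe for UTF-8: multi-byte sequences never contain ASCII bytes.
        if in_string:
            if escaped:
                escaped = False
            elif b == b"\\":
                escaped = True
            elif b == b'"':
                in_string = False
        elif b == b'"' and depth > 0:
            in_string = True
        elif b == b"{":
            depth += 1
        elif b == b"}" and depth > 0:
            depth -= 1
            if depth == 0:
                break
    header, _, body = data.decode("utf-8").partition("\n")
    verb, _, version = header.strip().partition(" ")
    return verb, version, json.loads(body)


class SyncServer:
    """Thread-per-connection server that dispatches requests to callbacks by verb."""

    def __init__(self, host: str, port: int):
        self.callbacks: Dict[str, Callback] = {}
        sync_server = self

        class _ConnectionHandler(socketserver.StreamRequestHandler):
            def handle(self):
                try:
                    response = sync_server.process(self.rfile)
                except EOFError:
                    return  # the client went away; nothing to answer
                text = f"RESPONSE {PROTOCOL_VERSION}\n\n{json.dumps(response)}"
                self.wfile.write(text.encode("utf-8"))

        # ThreadingTCPServer handles every accepted connection on a new thread.
        self._tcp = socketserver.ThreadingTCPServer((host, port), _ConnectionHandler)
        self._tcp.daemon_threads = True

    @property
    def address(self):
        return self._tcp.server_address

    def on(self, verb: str, callback: Callback) -> None:
        self.callbacks[verb] = callback

    def process(self, rfile) -> dict:
        try:
            verb, version, body = read_message(rfile)
        except ValueError:  # malformed JSON or invalid UTF-8
            return {"status": 400}
        if version != PROTOCOL_VERSION or verb not in self.callbacks:
            return {"status": 400}
        try:
            return self.callbacks[verb](body)
        except Exception:
            return {"status": 500}

    def serve_forever(self) -> None:
        self._tcp.serve_forever()

    def shutdown(self) -> None:
        self._tcp.shutdown()
        self._tcp.server_close()
```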

The SyncClient contains the methods for sending data to and requesting data from the server. Each method sets up a connection to the server using a TcpClient, on the calling thread. The request data is then built according to the protocol specification and sent to the server by writing it to the TCP stream. The client then waits for the server's response, reads it and converts it to a response object. A client application can use these methods to exchange files with the server.
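A client with the same shape (one connection per call, opened on the calling thread) could look like this Python sketch. It assumes the server closes the connection after sending its single response, so the client simply reads until end of stream; the method names are hypothetical.

```python
import json
import socket
from typing import Optional

PROTOCOL_VERSION = "idh14sync/1.0"


class SyncClient:
    """Opens one TCP connection per request, on the calling thread."""

    def __init__(self, host: str, port: int, timeout: Optional[float] = 30.0):
        self.host, self.port, self.timeout = host, port, timeout

    def request(self, verb: str, body: dict) -> dict:
        payload = f"{verb} {PROTOCOL_VERSION}\n\n{json.dumps(body)}".encode("utf-8")
        with socket.create_connection((self.host, self.port), timeout=self.timeout) as sock:
            sock.sendall(payload)
            # Assumes the server closes the connection after its single response.
            chunks = []
            while True:
                chunk = sock.recv(65536)
                if not chunk:
                    break
                chunks.append(chunk)
        header, _, text = b"".join(chunks).decode("utf-8").partition("\n")
        response_verb, _, version = header.strip().partition(" ")
        if response_verb != "RESPONSE" or version != PROTOCOL_VERSION:
            raise ValueError(f"unexpected response header: {header!r}")
        return json.loads(text)

    def list_files(self) -> dict:
        return self.request("LIST", {})

    def get_file(self, filename_b64: str) -> dict:
        return self.request("GET", {"filename": filename_b64})

    def delete_file(self, filename_b64: str, checksum: str) -> dict:
        return self.request("DELETE", {"filename": filename_b64, "checksum": checksum})
```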

Shared library

This library contains utilities shared by the server and client console applications: a ChecksumManager for creating and storing checksums of synced files, and a FileManager for working with the physical files that are exchanged between client and server.
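A checksum manager in this spirit could be as small as the Python sketch below. Storing the checksums as a JSON file is our assumption, as are the method names; the article only says that checksums of synced files are created and stored.

```python
import hashlib
import json
import os
from typing import Dict, Optional


class ChecksumManager:
    """Remembers the SHA-1 of every file as of its last successful sync."""

    def __init__(self, store_path: str):
        self.store_path = store_path
        self._checksums: Dict[str, str] = {}
        if os.path.exists(store_path):
            with open(store_path, encoding="utf-8") as f:
                self._checksums = json.load(f)

    @staticmethod
    def compute(path: str, chunk_size: int = 64 * 1024) -> str:
        """Hash a file in chunks so large files are not loaded into memory at once."""
        sha1 = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                sha1.update(chunk)
        return sha1.hexdigest()

    def get(self, name: str) -> Optional[str]:
        return self._checksums.get(name)

    def store(self, name: str, checksum: str) -> None:
        self._checksums[name] = checksum
        self._save()

    def forget(self, name: str) -> None:
        self._checksums.pop(name, None)
        self._save()

    def _save(self) -> None:
        with open(self.store_path, "w", encoding="utf-8") as f:
            json.dump(self._checksums, f, indent=2)
```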

Server console application

The server console application implements the SyncServer from the Protocol library. The application runs indefinitely and uses the callbacks of the SyncServer class to process requests and to read and sync the physical files. It accepts only a single command, which shuts down the server; other than that, the server requires no user input at all.

Client console application

The client console application implements the SyncClient from the Protocol library. This application also runs indefinitely but accepts more commands from the user.

When the application starts, an initial sync is performed between client and server. This sync checks the status of all files on the client and the server and then calls the corresponding SyncClient methods to add, remove or replace files. Changes from the server are processed before changes from the client.
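One way to decide what the initial sync has to do is a three-way comparison per file: the local checksum, the checksum from the server's LIST response, and the checksum stored at the last successful sync. The Python sketch below is an assumption about how such a plan could be derived, not a description of our implementation; in particular, the article does not say how conflicts are resolved, so this sketch only reports them. Server-side changes are put first, matching the order described above.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]  # (action, filename)


def plan_sync(local: Dict[str, str], remote: Dict[str, str], synced: Dict[str, str]) -> List[Action]:
    """Decide what to do per file by comparing three checksum maps.

    local:  checksums of the files on disk now
    remote: checksums from the server's LIST response
    synced: checksums stored at the last successful sync
    """
    server_changes, client_changes, conflicts = [], [], []
    for name in sorted(local.keys() | remote.keys() | synced.keys()):
        here, there, before = local.get(name), remote.get(name), synced.get(name)
        if here == there:
            continue  # identical on both sides (or deleted on both)
        if here == before:  # only the server's copy changed
            server_changes.append(("delete_local" if there is None else "download", name))
        elif there == before:  # only the local copy changed
            client_changes.append(("delete_remote" if here is None else "upload", name))
        else:  # both sides changed since the last sync
            conflicts.append(("conflict", name))
    return server_changes + client_changes + conflicts
```

A file that exists on only one side and was never synced has no stored checksum, so it is treated as a one-sided change: uploaded if it is local, downloaded if it is on the server.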

After the initial sync a FileSystemWatcher is used to detect any changes to the file system. When changes to files are detected a new sync is performed. This sync can also be forced by typing the "sync" command. If you only want to check the status of the files without syncing, the "list" command can be used instead.
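FileSystemWatcher is a .NET class and Python's standard library has no direct equivalent, so the sketch below swaps in a different technique: periodic polling that compares snapshots of file sizes and modification times. It only illustrates the "detect a change, then sync" loop; the function names and the one-second interval are assumptions.

```python
import os
import threading
from typing import Callable, Dict, List, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # filename -> (size in bytes, mtime in ns)


def snapshot(root: str) -> Snapshot:
    """Record size and modification time of each file directly inside root."""
    result = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_file():
                st = entry.stat()
                result[entry.name] = (st.st_size, st.st_mtime_ns)
    return result


def diff(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, modified) filenames between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return added, removed, modified


def watch(root: str, on_change: Callable[[], None], stop: threading.Event, interval: float = 1.0) -> None:
    """Poll root every `interval` seconds and call on_change() when anything differs."""
    previous = snapshot(root)
    while not stop.wait(interval):
        current = snapshot(root)
        if current != previous:
            on_change()  # e.g. run the same sync as the "sync" command
            previous = current
```

Polling is simpler and portable but reacts with a delay of up to one interval; an event-based watcher like FileSystemWatcher is notified by the operating system instead.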

Wesley Donker

Software Engineer

The Netherlands