Convert Your Word Documents to Google Docs Effortlessly In Drive
Google Drive, Google Apps Script, Word Docs, Normalize Data
This tutorial uses Google Apps Script. If you need to review the Google Apps Script Primer, here it is: Take Me To The Primer
If you want to build tools to work with large document sets, you need to standardize the data. This piece of code was developed to streamline a curriculum analysis project. And, for a bit of foreshadowing (insert weird sound effect), this is one of the steps you can follow to send data to an AI.
That’s right, you can send an entire folder of documents to an AI if you can get the data into an AI-friendly format.
Step 1: In Google Drive, make two folders
Folder 1, “DocX”
Folder 2, “BackUp”: open “DocX” and make this folder inside of it
Step 2: Grab the Folder ID numbers, and copy them into a text editor like Notepad, Notepad++, TextEdit, Gedit, etc.
This image is sourced from: https://alicekeeler.com/2015/08/16/pdf-my-google-drive-folder/
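If you copy the whole folder URL instead of just the ID, a small helper can pull the ID out for you. This is only a sketch: extractFolderId is a hypothetical name, not part of the tutorial's script, and it assumes the usual https://drive.google.com/drive/folders/<ID> address format.

```javascript
// Hypothetical helper (not part of the original script).
// Assumes folder URLs look like https://drive.google.com/drive/folders/<ID>.
function extractFolderId(input) {
  var text = String(input).trim();
  // Case 1: a full folder URL, possibly followed by query parameters
  var match = text.match(/\/folders\/([A-Za-z0-9_-]+)/);
  if (match) {
    return match[1];
  }
  // Case 2: the text already looks like a bare ID
  if (/^[A-Za-z0-9_-]+$/.test(text)) {
    return text;
  }
  // Anything else is not recognized
  return null;
}
```

Paste either the full address or the bare ID into it; it returns null when it cannot recognize the input, so a typo is caught before you run the conversion.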
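Step 3: Open Apps Script, and add these services. The code in Step 4 calls Drive.Files.insert, which comes from the Drive API advanced service (v2).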
Step 4: Copy and paste the Google Apps Script code into the editor, and add your Folder IDs
// Initialize an array to store the DOCX files that were processed
var docxFiles = [];
// ID for the target folder where the Word documents are located
var targetFolderID = "YOUR FOLDER ID";
// ID for the backup folder where documents will be moved after processing
var backupFolderID = "YOUR FOLDER ID";

// Function to convert Word documents to Google Docs format and handle file backups.
// Requires the Drive advanced service (v2), which provides Drive.Files.insert.
function convert_word_to_doc() {
  // Retrieve the target and backup folders using the stored IDs
  var targetFolder = DriveApp.getFolderById(targetFolderID);
  var backupFolder = DriveApp.getFolderById(backupFolderID);
  // Get all files of type Microsoft Word in the target folder
  var files = targetFolder.getFilesByType(MimeType.MICROSOFT_WORD);
  // Loop through each file found
  while (files.hasNext()) {
    var wordFile = files.next();
    var fileName = wordFile.getName();
    var blob = wordFile.getBlob();
    var fileExtension = fileName.substring(fileName.lastIndexOf(".") + 1);
    // Skip anything that is not a DOCX file by MIME type, content type, or extension
    if (wordFile.getMimeType() != MimeType.MICROSOFT_WORD ||
        blob.getContentType() != "application/vnd.openxmlformats-officedocument.wordprocessingml.document" ||
        fileExtension.toLowerCase() != "docx") {
      continue;
    }
    // Record and log the file being processed
    docxFiles.push(wordFile);
    Logger.log(fileName);
    // Insert the file into Drive and convert it to Google Docs format
    var converted = Drive.Files.insert(Drive.newFile(), blob, {convert: true});
    // Open the newly created Google Doc and rename it to match the original file
    DocumentApp.openById(converted.id).setName(fileName);
    // Move the new Google Doc to the original file's location
    var parentFolder = wordFile.getParents().next();
    DriveApp.getFileById(converted.id).moveTo(parentFolder);
    // Move the original DOCX document to the backup folder
    wordFile.moveTo(backupFolder);
  }
  // Strip ".docx" from the names of the converted documents
  removeDocxFromString(targetFolderID);
}
/**
 * Renames all files in the specified folder by removing the ".docx" string from their names.
 *
 * @param {string} folderId - The ID of the folder to process.
 */
function removeDocxFromString(folderId) {
  // Retrieve the folder using the provided ID
  var folder = DriveApp.getFolderById(folderId);
  // Get all files within the folder
  var files = folder.getFiles();
  while (files.hasNext()) {
    var file = files.next();
    var fileName = file.getName();
    // Check if the file name contains ".docx"
    if (fileName.indexOf(".docx") > -1) {
      // Create a new file name by replacing ".docx" with an empty string
      var newFileName = fileName.replace(".docx", "");
      // Rename the file with the new name
      file.setName(newFileName);
    }
  }
}
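One thing to watch in removeDocxFromString: replace(".docx", "") removes the first ".docx" it finds anywhere in the name and is case-sensitive, so a file ending in ".DOCX" keeps its suffix. If that matters for your files, a stricter variant might look like the sketch below. stripDocxExtension is a hypothetical name, not part of the original script.

```javascript
// Hypothetical helper (not part of the original script).
// Removes a trailing ".docx" in any letter case and leaves the rest of the name alone.
function stripDocxExtension(fileName) {
  // \.docx matches the literal extension, $ anchors it to the end, i ignores case
  return fileName.replace(/\.docx$/i, "");
}
```

You could call it inside the rename loop in place of the plain replace, for example file.setName(stripDocxExtension(fileName)).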
Step 5: Load up the “DocX” folder with some Word DocX files
Step 6: Run the code using the convert_word_to_doc() function. You can also adjust this function to take the folder IDs as parameters, and eventually store them in Script Properties so the IDs are not hard-coded in the script.
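Here is one way the Script Properties idea could look. It is a sketch, not the author's implementation: the function names and the property keys TARGET_FOLDER_ID and BACKUP_FOLDER_ID are made up for illustration. The PropertiesService calls themselves are standard Apps Script.

```javascript
// Hypothetical helpers (names and property keys are assumptions, not from the original script).

// Run once by hand to store the two folder IDs in Script Properties.
function saveFolderIds(targetId, backupId) {
  PropertiesService.getScriptProperties().setProperties({
    TARGET_FOLDER_ID: targetId,
    BACKUP_FOLDER_ID: backupId
  });
}

// Read the IDs back; fail loudly if they were never saved.
function loadFolderIds() {
  var props = PropertiesService.getScriptProperties();
  var targetId = props.getProperty("TARGET_FOLDER_ID");
  var backupId = props.getProperty("BACKUP_FOLDER_ID");
  if (!targetId || !backupId) {
    throw new Error("Folder IDs are missing. Run saveFolderIds first.");
  }
  return { targetFolderID: targetId, backupFolderID: backupId };
}
```

With something like this in place, convert_word_to_doc() could start with var ids = loadFolderIds(); and use ids.targetFolderID and ids.backupFolderID instead of the two global variables.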
Step 7: Whenever you run the script, the original documents are backed up in the “BackUp” folder. It is perfectly safe to automate this process. Users can upload files to this folder, and all DocX files will be converted while the other files are left alone. You can set up a simple trigger like this:
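A trigger can also be created from code instead of the Triggers panel. The sketch below assumes an hourly schedule, which is my choice for illustration and not something the tutorial specifies; createHourlyConversionTrigger is a hypothetical name. It clears any earlier trigger for the same function first so you do not end up with duplicates.

```javascript
// Hypothetical helper (not part of the original script).
// Assumes the conversion function is named convert_word_to_doc, as in Step 4.
function createHourlyConversionTrigger() {
  var handler = "convert_word_to_doc";
  // Remove existing triggers for this handler to avoid duplicate runs
  var triggers = ScriptApp.getProjectTriggers();
  for (var i = 0; i < triggers.length; i++) {
    if (triggers[i].getHandlerFunction() === handler) {
      ScriptApp.deleteTrigger(triggers[i]);
    }
  }
  // Create a time-based trigger that fires every hour (interval is an assumption)
  ScriptApp.newTrigger(handler)
    .timeBased()
    .everyHours(1)
    .create();
}
```

Run createHourlyConversionTrigger() once from the editor; after that, the conversion runs on its own schedule.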
That’s it. This structure can be used for other types of documents, but some, like PDFs, are more complex.
Enjoy automating and normalizing data!
Copyright © Domain Seven LLC. All rights reserved.
For permissions to use or share any content behind our paywall, please email us at: tonydeprato@domain7.tech .