Jotmenow

Knowledge articles: converting Base64-encoded images to image files

Hi Ohana,
We are in the process of migrating Knowledge articles.
The source HTML files embed images as Base64 data instead of referencing image files. The Salesforce Knowledge import tool cannot parse Base64 and expects image file references.
Each HTML file has 15+ images, and there are thousands of Knowledge articles (HTML files).
I would appreciate a solution: a tool or code that converts the Base64 data to image files and updates the HTML with image references, or any workaround.
Thanks in advance,
Ajay
Timothy Gurrola
Hello,

Here's a general outline of a potential solution:

Preparing the Environment:
Before you start, make sure you have a working environment with the necessary programming tools and libraries. You might need a programming language like Python, libraries for HTML parsing and image manipulation, and access to your source HTML files.

Parsing HTML Files:
Write a script in a programming language like Python that can parse each HTML file to locate the Base64-encoded image data. You can use libraries like Beautiful Soup for HTML parsing.
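
If you want a quick, dependency-free way to see which files are affected before writing the full converter, the standard library's html.parser can locate the inline images too. This is only a minimal sketch of an alternative to Beautiful Soup; the names InlineImageFinder and find_inline_images are hypothetical.

```python
from html.parser import HTMLParser


class InlineImageFinder(HTMLParser):
    """Collects the src of every <img> whose src is a data:image URI."""

    def __init__(self):
        super().__init__()
        self.inline_sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if src.startswith("data:image/"):
            self.inline_sources.append(src)


def find_inline_images(html_text):
    """Return the data URIs of all inline images in one HTML document."""
    finder = InlineImageFinder()
    finder.feed(html_text)
    finder.close()
    return finder.inline_sources
```

Calling len(find_inline_images(text)) per file gives a rough count of how many images the conversion will have to extract.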

Converting Base64 to Images:
For each Base64-encoded image, you'll need to decode the Base64 data and save it as an image file (e.g., JPEG, PNG) in a designated directory. Python's built-in base64 module is enough for the decoding; Pillow (PIL) is only needed if you also want to validate or convert the images.
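
The decoding step on its own might look like the sketch below. It assumes the usual "data:image/<type>;base64,<payload>" form of the src value; the helper name decode_data_uri and the EXTENSIONS table are hypothetical and would need extending for your files.

```python
import base64
import binascii

# Map MIME subtypes to file extensions; unlisted subtypes are used as-is.
EXTENSIONS = {"jpeg": "jpg", "svg+xml": "svg", "x-icon": "ico"}


def decode_data_uri(src):
    """Return (extension, raw_bytes) for a Base64 image data URI, else None."""
    if not src.startswith("data:image/") or "," not in src:
        return None
    header, payload = src.split(",", 1)
    params = header[len("data:image/"):].split(";")
    if "base64" not in params[1:]:
        return None  # URL-encoded data URIs need different handling
    subtype = params[0].lower()
    try:
        # Drop whitespace and line breaks inside the payload before decoding
        raw = base64.b64decode("".join(payload.split()), validate=True)
    except binascii.Error:
        return None  # malformed Base64: leave the original src alone
    return EXTENSIONS.get(subtype, subtype), raw
```

Returning None instead of raising lets the calling loop skip a bad image and carry on with the rest of the file.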

Updating HTML Files:
As you decode each image and save it to the designated directory, update the HTML file to reference the newly saved image file instead of the Base64 data. Modify the <img> tags' src attributes to point to the new image file paths.

Handling Multiple Files:
Loop through all your HTML files, performing the parsing, decoding, and updating steps for each file.
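
If the articles sit in nested folders or mix .html and .htm extensions, a small generator can collect them. This is a sketch under that assumption; iter_html_files is a hypothetical name.

```python
from pathlib import Path


def iter_html_files(source_dir):
    """Yield every .html/.htm file under source_dir, subfolders included, in sorted order."""
    for path in sorted(Path(source_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}:
            yield path
```

Sorting makes each run process the files in the same order, which helps when comparing the output of two test runs.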

Here's a simplified example in Python:
from bs4 import BeautifulSoup
import base64
import os

source_dir = "path/to/source/html/files"
output_dir = "path/to/output"
images_dir = os.path.join(output_dir, "images")

# Create the output directories if they don't exist
os.makedirs(images_dir, exist_ok=True)

for filename in sorted(os.listdir(source_dir)):
    if not filename.endswith(".html"):
        continue

    # Assumes the source files are UTF-8; change the encoding if yours differ
    with open(os.path.join(source_dir, filename), "r", encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")

    stem = os.path.splitext(filename)[0]
    count = 0

    # Find all <img> tags whose src is an inline Base64 data URI
    for img_tag in soup.find_all("img"):
        src = img_tag.get("src", "")
        if not src.startswith("data:image/") or ";base64," not in src:
            continue
        header, base64_data = src.split(",", 1)

        # Take the file extension from the MIME type, e.g. "data:image/png;base64"
        subtype = header[len("data:image/"):].split(";")[0].lower()
        ext = {"jpeg": "jpg", "svg+xml": "svg"}.get(subtype, subtype)

        # Decode the Base64 data and save it under a name unique to this article
        count += 1
        image_name = f"{stem}_{count}.{ext}"
        with open(os.path.join(images_dir, image_name), "wb") as img_file:
            img_file.write(base64.b64decode(base64_data))

        # Point the img tag at the saved file, relative to the output HTML file
        img_tag["src"] = f"images/{image_name}"

    # Write the modified HTML to the output directory; the source file is untouched.
    # str(soup) is used instead of prettify() to avoid reflowing the markup.
    with open(os.path.join(output_dir, filename), "w", encoding="utf-8") as f:
        f.write(str(soup))

Backup and Testing:
Before running the script on all your files, make sure to create backups of your original HTML files. Test the script on a smaller subset of files to ensure it works as expected.
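
Both precautions can be scripted. The sketch below copies the source tree before conversion and, afterwards, lists any converted files that still contain inline Base64 images. The function names and the regular expression are assumptions for illustration, and the check is a rough text scan rather than a full HTML parse.

```python
import re
import shutil
from pathlib import Path

# Matches an <img> tag whose src starts with data:image/ (single or double quotes)
INLINE_IMG = re.compile(r"<img\b[^>]*\bsrc\s*=\s*[\"']data:image/", re.IGNORECASE)


def back_up(source_dir, backup_dir):
    """Copy the whole source tree; raises if backup_dir already exists, so nothing is overwritten."""
    shutil.copytree(source_dir, backup_dir)


def files_with_inline_images(output_dir):
    """Return the names of HTML files that still contain Base64 <img> data."""
    leftovers = []
    for path in sorted(Path(output_dir).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if INLINE_IMG.search(text):
            leftovers.append(path.name)
    return leftovers
```

An empty list from files_with_inline_images on the test subset is a reasonable signal that the script is ready for the full set.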

Considerations:
Keep in mind that this is a simplified example. Depending on the complexity of your HTML files and the variability in the structure of the <img> tags, you might need to fine-tune the script. Additionally, handling error cases, such as different image formats and edge cases in HTML structure, is important.
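
One such edge case is a data URI whose declared MIME type does not match the actual image. A possible safeguard, sketched below, is to pick the file extension from the leading bytes of the decoded data. The sniff_extension helper is hypothetical and covers only a handful of common formats.

```python
# Leading "magic" bytes for a few common image formats
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]


def sniff_extension(raw, default="bin"):
    """Guess a file extension from decoded image bytes; return default if unknown."""
    for magic, ext in SIGNATURES:
        if raw.startswith(magic):
            return ext
    # WebP files start with "RIFF", a 4-byte size, then "WEBP"
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "webp"
    return default
```

Files that come back with the default extension are worth logging and checking by hand rather than importing blindly.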