LAUREL BRIDGE

LaurelBridge.DCF.Examples.CharacterSetEncoding Namespace

DICOM Connectivity Framework V3.4
The CharacterSetEncoding example demonstrates how to use DCF to perform character set operations and conversions for datasets with different Specific Character Sets (0008,0005).
Classes

  Class      Description
  Program    Public class; described below.

If you work with multi-byte character sets or other non-Western locales, you may need to encode, decode, or re-encode string values that are sensitive to the Specific Character Set (0008,0005) (SCS). If you are lucky, you can avoid much of this by encoding with the ISO_IR 192 character set, also known as UTF-8, which can encode all Unicode characters. Otherwise, you may need to perform character set conversions.
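As a rough illustration of why UTF-8 sidesteps most conversion problems, the following sketch uses only the standard .NET `System.Text.Encoding` classes (not DCF) to check whether a string survives an encode/decode round trip. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of DCF: shows that UTF-8 round-trips text
// that a single-byte repertoire such as ASCII cannot represent.
public static class Utf8RoundTripSketch
{
    // Returns true when the text is unchanged after encoding and decoding.
    public static bool RoundTrips(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes) == text;
    }

    public static void Main()
    {
        string name = "ヤマダ^タロウ=山田^太郎=やまだ^たろう";

        // UTF-8 can represent every Unicode character in the name.
        Console.WriteLine(RoundTrips(name, Encoding.UTF8));

        // Encoding.ASCII substitutes '?' for characters it cannot encode,
        // so the decoded text no longer matches the original.
        Console.WriteLine(RoundTrips(name, Encoding.ASCII));
    }
}
```

The first call reports a lossless round trip and the second does not, which is the situation that forces a character set conversion when UTF-8 is not an option.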

This example demonstrates how to determine the best SCS value for a given DICOM dataset by making alterations to the dataset and re-analyzing it after each change. The progression for this example is as follows:
  1. Given an ISO_IR 6 encoded dataset, insert an ASCII patient name and view the encoding analysis.
  2. Insert a Japanese patient name, determine the best SCS, and re-encode the dataset.
  3. Add a Thai performing physician name, determine the best SCS, and re-encode the dataset.
  4. Insert Hebrew and Thai names and re-analyze the dataset using an encoding list that does not include ISO_IR 192.
  5. The final step in the program opens Notepad to view the output.

Each pass first encodes any SCS-sensitive VRs (SH, LO, PN, ST, LT, UT) to ISO_IR 192. The dataset is then re-encoded using each of the requested SCS values. The best SCS is the first one in the list that produces no encoding failures. Failing that, the SCS with the fewest encoding failures is used.
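The selection rule described above (first SCS with no failures, otherwise the one with the fewest) can be sketched with the standard .NET encodings. This is not the DCF implementation: `BestScsSketch`, `CountFailures`, and `PickBest` are hypothetical names, only three SCS values are mapped to .NET encoding names, and each UTF-16 code unit is counted separately, so the sketch assumes text within the Basic Multilingual Plane.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch of "best SCS" selection using plain .NET encodings.
public static class BestScsSketch
{
    // SCS value paired with the .NET encoding name used to approximate it.
    public static readonly (string Scs, string EncodingName)[] Candidates =
    {
        ("ISO_IR 6", "us-ascii"),
        ("ISO_IR 100", "iso-8859-1"),
        ("ISO_IR 192", "utf-8"),
    };

    // Counts the characters that the named encoding cannot represent.
    public static int CountFailures(string text, string encodingName)
    {
        Encoding strict = Encoding.GetEncoding(
            encodingName, new EncoderExceptionFallback(), new DecoderExceptionFallback());
        int failures = 0;
        foreach (char c in text)
        {
            try { strict.GetBytes(c.ToString()); }
            catch (EncoderFallbackException) { failures++; }
        }
        return failures;
    }

    // Returns the first candidate with zero failures; if none exists,
    // returns the candidate with the fewest failures (earliest wins ties).
    public static string PickBest(string text, IEnumerable<(string Scs, string EncodingName)> candidates)
    {
        string best = null;
        int fewest = int.MaxValue;
        foreach (var candidate in candidates)
        {
            int failures = CountFailures(text, candidate.EncodingName);
            if (failures < fewest)
            {
                fewest = failures;
                best = candidate.Scs;
            }
            if (failures == 0) break; // a perfect encoding ends the search
        }
        return best;
    }

    public static void Main()
    {
        Console.WriteLine(PickBest("Doe^John", Candidates));
        Console.WriteLine(PickBest("M\u00FCller^J\u00FCrgen", Candidates));
        Console.WriteLine(PickBest("山田^太郎", Candidates));
    }
}
```

Because the candidates are tried in list order, putting the most restrictive character set first means the result is the narrowest repertoire that still encodes the data cleanly.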

The final example demonstrates a case where an imperfect encoding is chosen. To re-emphasize: if you can use ISO_IR 100 for Western locales and ISO_IR 192 for all others, much of the character set encoding complexity can be avoided.
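To make the imperfect case concrete, the sketch below shows one way a character that cannot be encoded might be replaced by a hexadecimal placeholder instead of raising an error. It uses only standard .NET classes. The `&#xHHHH;` output format is an assumption modeled on XML numeric character references and is not confirmed to match what DCF emits, and `HexFallbackSketch` is a hypothetical name.

```csharp
using System;
using System.Text;

// Hypothetical sketch: substitute a hex placeholder for each character
// that the target encoding cannot represent.
public static class HexFallbackSketch
{
    public static string EncodeWithHexFallback(string text, string encodingName)
    {
        // A strict encoder throws instead of silently substituting '?'.
        Encoding strict = Encoding.GetEncoding(
            encodingName, new EncoderExceptionFallback(), new DecoderExceptionFallback());

        var result = new StringBuilder();
        foreach (char c in text)
        {
            try
            {
                strict.GetBytes(c.ToString());
                result.Append(c); // representable: keep the character as-is
            }
            catch (EncoderFallbackException)
            {
                // Assumed placeholder format; each UTF-16 code unit is handled separately.
                result.AppendFormat("&#x{0:X4};", (int)c);
            }
        }
        return result.ToString();
    }

    public static void Main()
    {
        // Hebrew letters are outside Latin-1, so they appear as placeholders.
        Console.WriteLine(EncodeWithHexFallback("Aleichem^Sholom=\u05E9\u05DC", "iso-8859-1"));
    }
}
```

Each placeholder marks one fallback, so counting them gives the kind of per-encoding failure score that the analysis step compares.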

Examples

CharacterSetEncoding Sample Code
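// Framework namespaces used by this sample. The DCF namespaces that provide
// DicomDataSet, EncodingUtils, and related types must also be imported.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.IO;
using System.Text;

public class Program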
{
    private static readonly string _outputFilePath = "CharacterSetEncodingOutputText.txt";

    /// <summary>
    /// Main
    /// </summary>
    [SuppressMessage("ReSharper", "UnusedVariable")]
    public static void Main()
    {
        try
        {
            string inputFilename = "mr-knee.dcm";

            DicomDataSet dds;
            StringBuilder outputText = new StringBuilder();
            outputText.AppendFormat("Character Set Encoding Example:{0}", Environment.NewLine);
            using (DicomFileInput dfi = new DicomFileInput(inputFilename))
            {
                dds = dfi.ReadDataSet();
                dds.ExpandStreamingModeData(true);
            }

            // For the following examples, instead of throwing, replace any character that fails to encode
            // or decode with its hexadecimal representation
            ElementOptions.Instance.EncoderFallbackStyle = EncodingFallbackStyle.XmlHex;
            ElementOptions.Instance.DecoderFallbackStyle = EncodingFallbackStyle.XmlHex;

            DicomDataSet isoIR6DataSet = InsertAsciiPersonNameToIsoIR6(outputText, dds);
            DicomDataSet japaneseDataSet = InsertJapanesePersonNameToIsoIR6(outputText, isoIR6DataSet);
            DicomDataSet mixedDataSet = InsertThaiPersonNameToJapanese(outputText, japaneseDataSet);
            DicomDataSet thaiHebrewDataSet = InsertThaiandHebrewPersonNameToJapanese(outputText, japaneseDataSet);

            WriteOutputToFile(_outputFilePath, outputText.ToString());

            Process.Start("notepad.exe", _outputFilePath);
        }
        catch (Exception e)
        {
            Console.WriteLine("Exception caught during execution: {0}", e);
        }

        if (Debugger.IsAttached)
        {
            Console.Write("Press any key to continue . . . ");
            Console.ReadKey();
        }
    }

    /// <summary>
    /// This method inserts an ASCII person name into the input ISO_IR 6 dataset. The resultant dataset does not need
    /// re-encoding, as made evident by the fact the recommended Specific Character Set to use matched the input.
    /// <para>
    /// NOTE: When the Specific Character Set is either missing or is the empty string, ISO_IR 6 is implied.
    /// </para>
    /// </summary>
    private static DicomDataSet InsertAsciiPersonNameToIsoIR6(StringBuilder outputText, DicomDataSet dds)
    {
        outputText.AppendFormat("{0}ISO_IR 6 to ISO_IR 6{0}==============================================={0}", Environment.NewLine);
        string origScs = !String.IsNullOrEmpty(dds.SpecificCharacterSet) ? dds.SpecificCharacterSet : "ISO_IR 6";
        outputText.AppendFormat("Initial Specific Character Set (0008,0005): \'{0}\'{1}", origScs, Environment.NewLine);

        // Before inserting the string element into the dataset, convert the dataset to unicode to make the insertion process trivial
        string isoIR6PersonName = "Doe^John";
        outputText.AppendFormat("Patient Name (0010,0010) to insert: {0}{1}", isoIR6PersonName, Environment.NewLine);
        dds.Insert(Tags.PatientName, isoIR6PersonName);

        // Determine the best Specific Character Set (0008,0005) to use for the updated Dicom dataset from the list of character sets.
        IList<string> isoir6Encodings = new List<string>() { null, "ISO_IR 100", "ISO_IR 166", "ISO_IR 192" };
        DicomDataSet result = EncodeDataSetWithBestScs(outputText, isoir6Encodings, dds);
        outputText.AppendFormat("Patient Name Re-encoded: {0}{1}",
            result.GetElement(Tags.PatientName).GetStringValueAt(0),
            Environment.NewLine);

        return result;
    }

    /// <summary>
    /// This method inserts a Japanese person name into the input ISO_IR 6 dataset by first converting the input dataset to ISO_IR 192 (Unicode). 
    /// The resultant dataset is then re-encoded using the recommended Specific Character Set (0008,0005) produced by analyzing each character set
    /// given in the list of character set encodings. The best specific character set value in this example was ISO 2022 IR 13\ISO 2022 IR 87, which
    /// was able to encode every character in the dataset properly.
    /// <para>
    /// NOTE: When the Specific Character Set is either missing or is the empty string, ISO_IR 6 is implied.
    /// </para>
    /// </summary>
    private static DicomDataSet InsertJapanesePersonNameToIsoIR6(StringBuilder outputText, DicomDataSet dds)
    {
        outputText.AppendFormat("{0}ISO_IR 6 to Japanese{0}==============================================={0}", Environment.NewLine);
        string origScs = !String.IsNullOrEmpty(dds.SpecificCharacterSet) ? dds.SpecificCharacterSet : "ISO_IR 6";
        outputText.AppendFormat("Initial Specific Character Set (0008,0005): \'{0}\'{1}", origScs, Environment.NewLine);

        // Before inserting the string element into the dataset, convert the dataset to unicode to make the insertion process trivial
        string japanesePersonName = "ヤマダ^タロウ=山田^太郎=やまだ^たろう";
        outputText.AppendFormat("Patient Name (0010,0010) to insert: {0}{1}", japanesePersonName, Environment.NewLine);
        dds.Insert(Tags.PatientName, japanesePersonName);

        // Determine the best Specific Character Set (0008,0005) to use for the updated Dicom dataset from the list of character sets.
        IList<string> japaneseEncodings = new List<string>() { null, "\\ISO 2022 IR 87", "ISO 2022 IR 13\\ISO 2022 IR 87", "ISO_IR 192" };
        DicomDataSet result = EncodeDataSetWithBestScs(outputText, japaneseEncodings, dds);
        outputText.AppendFormat("Patient Name Re-encoded: {0}{1}",
            result.GetElement(Tags.PatientName).GetStringValueAt(0),
            Environment.NewLine);

        return result;
    }

    /// <summary>
    /// This method inserts a Thai person name into the input Japanese dataset by first converting the input dataset to ISO_IR 192 (Unicode). 
    /// The resultant dataset is then re-encoded using the recommended Specific Character Set (0008,0005) produced by analyzing each character set
    /// given in the list of character set encodings. The best Specific Character Set in this example turns out to be ISO_IR 192, because no other
    /// specific character set value in the list could properly encode all string elements.
    /// <para>
    /// NOTE: When the Specific Character Set is either missing or is the empty string, ISO_IR 6 is implied.
    /// </para>
    /// </summary>
    private static DicomDataSet InsertThaiPersonNameToJapanese(StringBuilder outputText, DicomDataSet dds)
    {
        outputText.AppendFormat("{0}Japanese to Mixed{0}==============================================={0}", Environment.NewLine);
        string origScs = !String.IsNullOrEmpty(dds.SpecificCharacterSet) ? dds.SpecificCharacterSet : "ISO_IR 6";
        outputText.AppendFormat("Initial Specific Character Set (0008,0005): \'{0}\'{1}", origScs, Environment.NewLine);

        // Before inserting the string element into the dataset, convert the dataset to unicode to make the insertion process trivial
        string thaiPersonName = "McIntai^Thongchai=แม็คอินไตย์^ธงไชย";
        outputText.AppendFormat("Performing Physician Name (0008,1050) to insert: {0}{1}", thaiPersonName, Environment.NewLine);
        dds.Insert(Tags.PerformingPhysicianName, thaiPersonName);

        // Determine the best Specific Character Set (0008,0005) to use for the updated Dicom dataset from the list of character sets.
        IList<string> mixedEncodings = new List<string>() { null, "\\ISO 2022 IR 87", "ISO 2022 IR 13\\ISO 2022 IR 87", "ISO_IR 192" };
        DicomDataSet result = EncodeDataSetWithBestScs(outputText, mixedEncodings, dds);
        outputText.AppendFormat("Patient Name re-encoded: {0}{1}",
            result.GetElement(Tags.PatientName).GetStringValueAt(0),
            Environment.NewLine);
        outputText.AppendFormat("Performing Physician Name re-encoded: {0}{1}",
            result.GetElement(Tags.PerformingPhysicianName).GetStringValueAt(0),
            Environment.NewLine);

        return result;
    }

    /// <summary>
    /// This method inserts a Thai and a Hebrew person name into the input Japanese dataset by first converting the input dataset to ISO_IR 192 (Unicode). 
    /// The resultant dataset is then re-encoded using the recommended Specific Character Set (0008,0005) produced by analyzing each character set
    /// given in the list of character set encodings. In this case, no specific character set value was able to encode every character
    /// in the dataset properly. Therefore, the best specific character set value was the one with the fewest encoding fallbacks (problems), 
    /// which turns out to be ISO_IR 166 for Thai. Note how the Hebrew string in this encoding was not decoded properly.
    /// <para>
    /// NOTE: When the Specific Character Set is either missing or is the empty string, ISO_IR 6 is implied.
    /// </para>
    /// </summary>
    private static DicomDataSet InsertThaiandHebrewPersonNameToJapanese(StringBuilder outputText, DicomDataSet dds)
    {
        outputText.AppendFormat("{0}Mixed with No Unicode{0}==============================================={0}", Environment.NewLine);
        string origScs = !String.IsNullOrEmpty(dds.SpecificCharacterSet) ? dds.SpecificCharacterSet : "ISO_IR 6";
        outputText.AppendFormat("Initial Specific Character Set (0008,0005): \'{0}\'{1}", origScs, Environment.NewLine);

        // Before inserting the string element into the dataset, convert the dataset to unicode to make the insertion process trivial
        string hebrewPersonName = "Aleichem^Sholom==שלום^עליכם";
        string thaiPersonName = "McIntai^Thongchai=แม็คอินไตย์^ธงไชย";
        // The Hebrew name goes into Patient Name and the Thai name into Performing Physician Name.
        outputText.AppendFormat("Patient Name (0010,0010) to insert: {0}{1}", hebrewPersonName, Environment.NewLine);
        dds.Insert(Tags.PatientName, hebrewPersonName);
        outputText.AppendFormat("Performing Physician Name (0008,1050) to insert: {0}{1}", thaiPersonName, Environment.NewLine);
        dds.Insert(Tags.PerformingPhysicianName, thaiPersonName);

        // Determine the best Specific Character Set (0008,0005) to use for the updated Dicom dataset from the list of character sets.
        IList<string> mixedEncodings = new List<string>() { null, "ISO_IR 100", "ISO_IR 138", "ISO_IR 166" };
        DicomDataSet result = EncodeDataSetWithBestScs(outputText, mixedEncodings, dds);
        outputText.AppendFormat("Patient Name re-encoded: {0}{1}",
            result.GetElement(Tags.PatientName).GetStringValueAt(0),
            Environment.NewLine);
        outputText.AppendFormat("Performing Physician Name re-encoded: {0}{1}",
            result.GetElement(Tags.PerformingPhysicianName).GetStringValueAt(0),
            Environment.NewLine);

        return result;
    }

    /// <summary>
    /// This method analyzes each string element affected by the Specific Character Set (0008,0005) by encoding each string using the list of encodings.
    /// This analysis returns a score for each encoding, determining the best specific character set value based on the lowest number of fallback encodings.
    /// The Unicode dataset is then re-encoded using this best specific character set value and returned.
    /// </summary>
    /// <param name="sb">Output string builder to keep track of results</param>
    /// <param name="encodings">The list of specific character set values to test when encoding the Unicode dataset</param>
    /// <param name="dds">The Unicode dataset to encode</param>
    /// <returns>The encoded dataset using the determined best specific character set value.</returns>
    private static DicomDataSet EncodeDataSetWithBestScs(StringBuilder sb, IList<string> encodings, DicomDataSet dds)
    {
        // Encode each string element affected by the specific character set using each of the encodings given in the list, keeping track of the 
        // number of fallbacks that occur during encoding.
        EncodingResults analysisResults = EncodingUtils.AnalyzeEncodings(dds, encodings);

        // Output the results of the analysis of the encoding
        sb.AppendFormat("{0}Results from the encoding analysis:{0}", Environment.NewLine);
        foreach (EncodingScore score in analysisResults.Scores)
        {
            sb.AppendFormat("SCS Value: \'{0}\' Fallbacks: {1} Total Characters: {2}{3}", score.ScsValue, score.Fallbacks, score.TotalChars, Environment.NewLine);
        }

        sb.AppendFormat("{1}Recommended Specific Character Set (0008,0005): \'{0}\'{1}", analysisResults.Best.ScsValue, Environment.NewLine);
        string bestScsValue = analysisResults.Best.ScsValue ?? "ISO_IR 6";
        DicomDataSet reencodedDataSet = ReencodeDataSet(dds, bestScsValue);
        return reencodedDataSet;
    }

    /// <summary>
    /// Helper method that writes a given data set to the memory stream, returning the data set read from
    /// the memory stream. The purpose being to force the dataset to re-encode the necessary elements on write using 
    /// the given specific character set
    /// </summary>
    /// <param name="dds">The input data set that needs to be re-encoded</param>
    /// <param name="specificCharacterSet">The new specific character set to use to encode the dds with.</param>
    private static DicomDataSet ReencodeDataSet(DicomDataSet dds, string specificCharacterSet)
    {
        if (dds.SpecificCharacterSet != null && dds.SpecificCharacterSet.Equals(specificCharacterSet))
            return dds;

        // Clone the input dataset and change the specific character set in preparation for the re-encoding
        DicomDataSet clonedDataSet = (DicomDataSet)dds.Clone();
        clonedDataSet.Insert(Tags.SpecificCharacterSet, specificCharacterSet);

        DicomDataSet recodedDataSet;

        MemoryStream ms = new MemoryStream();
        DicomStreamWriter dsw = new DicomStreamWriter(ms, new DicomSessionSettings());
        using (DicomFileOutput dfo = new DicomFileOutput(dsw, Uids.ELE, null, null))
        {
            dfo.CreateChapter10Format = false;
            dfo.WriteDataSet(clonedDataSet);
        }

        ms = new MemoryStream(ms.ToArray());
        DicomStreamReader dsr = new DicomStreamReader(ms, new DicomSessionSettings());
        using (DicomFileInput dfi = new DicomFileInput(dsr, Uids.ELE, new DicomSessionSettings()))
        {
            recodedDataSet = dfi.ReadDataSet();
            recodedDataSet.ExpandStreamingModeData(true);
        }

        return recodedDataSet;
    }

    /// <summary>
    /// Write the given output text to disk in UTF8.
    /// </summary>
    /// <param name="outputFile">The output file path</param>
    /// <param name="outputText">The text to write</param>
    private static void WriteOutputToFile(string outputFile, string outputText)
    {
        using (StreamWriter writer = new StreamWriter(outputFile, false, Encoding.UTF8))
        {
            writer.Write(outputText);
        }
    }
}