1. Home
  2. Account and Permissions Mapping
  3. Account Mapping – Cleaning Up the Metadata CSV file

Account Mapping – Cleaning Up the Metadata CSV file

The csv files that contain the actual metadata for the migration frequently pose significant challenges due to the following issues:

  • compilation by a third party that does not completely understand the goal for creating the file
  • the data may have been collected over a long period of time, via many different systems with different formats for key fields, such as dates
  • the sheer size of the files makes it difficult to detect errors in the file

Below are some tips to help you sanitize the csv that contains the metadata before you upload, so that validation is a faster and easier process.

1: Getting Started

Open the file in Excel.  Save it as a csv (MS-DOS) file.

2: About Quotes

If you are working on a csv file in Excel, you don’t need to add quotes to any of the fields unless that field contains both a comma and a quote in the actual content.  Excel will add the quotes to delimit the fields automatically when the file is saved as a csv.  Adding quotes beyond what is essential can result in errors.  For example, if you edit a path /folder/myfile.txt to “/folder/myfile.txt”, the quotes will become part of the string, and the path will be invalid.

3: Trailing and Leading Spaces

Make sure that none of the headers contain trailing or leading spaces. While CFP preserves the spaces in the spreadsheet, this can create a lot of confusion later in the process.

4: Extra Columns

There should not be any data in columns that do not have headers.  Any data in columns that do not have headers suggests that there is an issue with the metadata for that particular row of the spreadsheet.  In many cases, an extra cell is in the affected row and that shifts all of the other data one or more cells to the right.  To check for extra columns:

  1. Go to the first cell after the last header.
  2. Click in that cell.
  3. Select “Add a Filter” in Excel
  4. If the filter dropown DOES show up in that cell, then there are unexpected long rows.
  5. if the filter dropdown DOES NOT show up in that cell, then there are no extra long rows, which is good.
  6. Investigate why you have extra data in any long rows.

5: Missing Required Fields

Using the filter we created while checking for extra columns, find each required field (Source Path and, in some cases, Source Owner) and check the filter to see if any fields are blank.

6: Check Field Types

Make sure that all number fields are numbers, date fields are dates, and date fields are all of a consistent format. String format is the most lenient format and should be used when possible.

  1. Add a temporary column to check each column’s format.
  2. Date Fields:  For a date field, create a column that contains the formula:
    “=ISERROR(DATE(DAY(A2),MONTH(A2),YEAR(A2)))”
    Where “A2” contains the data you want to validate. You’ll need to paste the formula into the remaining cells of the temp column so that all dates are validated with the formula.

    1. This function will return TRUE if the date is invalid.
    2. Filter this added column for anything that is TRUE.
  3. Number Fields:  For a number field, create a column that contains the formula:
    “=NOT(ISNUMBER(A1))”
    Where “A2” contains the data you want to validate.

    1. This function will return TRUE if the field is a string and not a number.
    2. Filter this added column for anything that is TRUE.
  4. ENUM Fields:  For an ENUM field, create a separate tab and put the list of allowed values in it. Then create a column that contains the formula:
    “=ISNA(LOOKUP(N4,Sheet1!A$1:A$2))”
    Where the first argument is the cell and the second argument is the list on the other sheet). Anywhere this is TRUE, the value is not an ENUM. Then filter this added column for anything where it is TRUE.

DELETE ALL TEMPORARY COLUMNS BEFORE UPLOADING THE CSV

7: Check for date formats

  1. When we map the CSV, we specify the format of the date, so look for dates that do not match that format. A word of caution about Excel: it will recognize several date formats as dates, and convert it on checkout.
  2. Since we looked at whether all dates were valid dates in the field type check above, then this is about making sure that the format in Excel is correct. Excel uses the date format for your locale when saving a CSV, so just make sure that is ok before doing the final save of the CSV.
Updated on April 25, 2019

Related Articles