Working With CSV Files

Connection strings

For details of connection strings to use with different versions of Excel, see ConnectionString Attribute.

Detecting data types

Any data surrounded by the text delimiter (double quotation marks [], is interpreted as text. For example, "42" is a text string, and not a number.

The OLE-DB driver for Text automatically assigns the data type of each column based on a sampling of 25 rows. This may generate issues in some cases. For example, if the 25 first rows of particular column include only numeric values, the column is now typed as numeric. If string values exist in the 26th row or beyond, either a fatal error will be generated or values will be discarded.

One possible way to solve the problem is to change the number of rows considered by the driver. Of course, this is only reliable if all rows in the CSV file are examined; but on very large files, this may have performance impacts.

To modify the number of rows Excel examines to determine the data type, use regedit to locate the following registry key:
  • For a 32-bit operating system:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text
  • For a 64-bit operating system:
    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Text

In this key, the value for MaxScanRows defines the number of rows that the driver reads to determine the data type of a column. A value of zero causes the driver to read the full worksheet to evaluate column types.

Using Schema.ini

Columns names, data types, character set, and data conversions are specified for the Text OLE-DB driver by using a schema.ini file. This file contains the definition of the columns for any text file in the current directory and overwrites all other settings.

Microsoft Windows offers an easy way to generate a default schema.ini file based on the existing text files in a directory. For details, see Editing a Schema.ini File.

A CSV example

Assume that an asset.csv file, located in the temp directory, contains the following values:
Assetname, AssetSerialNumber, AssetPrice
"First Computer", "SerialNumber1", 1000
"Second Computer", "SerialNumber2", 2000
"Third Computer", "SerialNumber3", 3000
The corresponding XML adapter file used to load the assets into the compliance database is as follows:
<Import Name="ASSET"
        Type="CSV"
        ConnectionString="Provider=Microsoft.Jet.OLEDB.4.0;
            Data Source=c:\temp;
            Extended Properties='text;HDR=Yes;FMT=CSVDelimited'"
        Query="select * from [asset.csv]">
    <Log Name="NewLog"
        Output="file"
        Loglevel="debug"
        filename="IMPORT_NAME.log.txt"
    </Log>
    <Object Name="asset"
        Type="asset"
        Output="assetoutid"
        Update="True"
        Create="True">
        <Property
            Type="shortdescription"
            Name="Description"
            Value="AssetName"
            ValueType="FieldValue"
            UseForMatching="false">
        </Property>
        <Property
            Type="serialnumber"
            Name="Serial Number"
            Value="AssetSerialNumber"
            ValueType="FieldValue"
            UseForMatching="true">
        </Property>
        <Property
            Type="purchaseprice"
            Name="Purchase Price"
            Value="AssetPrice"
            ValueType="FixedValue"
            UseForMatching="false">
        </Property>
    </Object>
</Import>

IT Asset Management (Cloud)

Current