How to read a huge CSV file in Mule


I am using Mule Studio 3.4.0 Community Edition. I have a big problem: I don't know how to parse a large CSV file arriving on an incoming File endpoint. The scenario is that I have 3 CSV files and I put the files' content into a database. When I try to load a huge file (about 144MB) I get an "OutOfMemory" exception. I thought of splitting/dividing the large CSV into smaller CSVs (I don't know if this solution is the best), or of trying to find a way to process the CSV without throwing the exception. Here is my configuration:

<file:connector name="file" autoDelete="true" streaming="true" validateConnections="true" doc:name="file"/>

<flow name="csvtofile" doc:name="csvtofile">
    <file:inbound-endpoint path="src/main/resources/inbox" moveToDirectory="src/main/resources/processed"
        responseTimeout="10000" doc:name="CSV" connector-ref="file">
        <file:filename-wildcard-filter pattern="*.csv" caseSensitive="true"/>
    </file:inbound-endpoint>
    <component class="it.aizoon.grpbuyer.AddMessageProperty" doc:name="Add message property"/>
    <choice doc:name="Choice">
        <when expression="invocation:nome_file=azienda" evaluator="header">
            <jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/companies-csv-format.xml"
                ignoreFirstRecord="true" doc:name="CSV2Azienda"/>
            <jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="insertaziende" queryTimeout="-1"
                connector-ref="jdbcconnector" doc:name="Database Azienda">
                <jdbc-ee:query key="insertaziende" value="insert aw006_azienda values (#[map-payload:aw006_id], #[map-payload:aw006_id_cliente], #[map-payload:aw006_ragione_sociale])"/>
            </jdbc-ee:outbound-endpoint>
        </when>
        <when expression="invocation:nome_file=servizi" evaluator="header">
            <jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/services-csv-format.xml"
                ignoreFirstRecord="true" doc:name="CSV2Servizi"/>
            <jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="insertservizi" queryTimeout="-1"
                connector-ref="jdbcconnector" doc:name="Database Servizi">
                <jdbc-ee:query key="insertservizi" value="insert ctrl_aemd_unb_servizi values (#[map-payload:ctrl_id_tipo_operazione], #[map-payload:ctrl_descrizione], #[map-payload:ctrl_cod_servizio])"/>
            </jdbc-ee:outbound-endpoint>
        </when>
        <when expression="invocation:nome_file=richiesta" evaluator="header">
            <jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/requests-csv-format.xml"
                ignoreFirstRecord="true" doc:name="CSV2Richiesta"/>
            <jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="insertrichieste" queryTimeout="-1"
                connector-ref="jdbcconnector" doc:name="Database Richiesta">
                <jdbc-ee:query key="insertrichieste" value="insert ctrl_aemd_unb_richiesta values (#[map-payload:ctrl_id_controller], #[map-payload:ctrl_num_rich_venditore], #[map-payload:ctrl_venditore], #[map-payload:ctrl_canale_venditore], #[map-payload:ctrl_codice_servizio], #[map-payload:ctrl_stato_avanz_servizio], #[map-payload:ctrl_data_inserimento])"/>
            </jdbc-ee:outbound-endpoint>
        </when>
    </choice>
</flow>

Please help, I don't know how to fix this problem. Thanks in advance for any kind of help.

As SteveS said, the csv-to-maps-transformer might try to load the entire file into memory before processing it. What you can try instead is to split the CSV file into smaller parts and send those parts to a VM queue to be processed individually. First, create a component to achieve the first step:

import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.mule.api.MuleEventContext;
import org.mule.api.client.MuleClient;
import org.mule.api.lifecycle.Callable;

public class CsvReader implements Callable {

    @Override
    public Object onCall(MuleEventContext eventContext) throws Exception {

        InputStream fileStream = (InputStream) eventContext.getMessage().getPayload();
        DataInputStream ds = new DataInputStream(fileStream);
        BufferedReader br = new BufferedReader(new InputStreamReader(ds));

        MuleClient muleClient = eventContext.getMuleContext().getClient();

        String line;
        while ((line = br.readLine()) != null) {
            // dispatch every CSV line as its own message to the VM queue
            muleClient.dispatch("vm://in", line, null);
        }

        fileStream.close();
        return null;
    }
}

Then, split your main flow in two:

<file:connector name="file" workDirectory="yourWorkDirPath" autoDelete="false" streaming="true"/>

<flow name="csvtofile" doc:name="Split and Dispatch">
    <file:inbound-endpoint path="inboxPath" moveToDirectory="processedPath" pollingFrequency="60000"
        doc:name="CSV" connector-ref="file">
        <file:filename-wildcard-filter pattern="*.csv" caseSensitive="true"/>
    </file:inbound-endpoint>
    <component class="it.aizoon.grpbuyer.AddMessageProperty" doc:name="Add message property"/>
    <component class="com.dgonza.CsvReader" doc:name="Split the file and dispatch every line to VM"/>
</flow>

<flow name="storeInDatabase" doc:name="Receive lines and store them in database">
    <vm:inbound-endpoint exchange-pattern="one-way" path="in" doc:name="VM"/>
    <choice>
        .
        .
        your JDBC stuff
        .
        .
    </choice>
</flow>
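One thing to watch out for: the invocation property nome_file set by AddMessageProperty is not forwarded by the component above, because dispatch() is called with null message properties, so the choice in storeInDatabase would have nothing to route on. A possible variant of the dispatch loop in CsvReader (just a sketch, assuming the property is invocation-scoped and named nome_file as in your original flow; it needs java.util.Map and java.util.HashMap):

    // read the file-type property set earlier in the flow (assumed name: nome_file)
    String nomeFile = eventContext.getMessage().getInvocationProperty("nome_file");

    Map<String, Object> properties = new HashMap<String, Object>();
    properties.put("nome_file", nomeFile);

    String line;
    while ((line = br.readLine()) != null) {
        // forward the routing property together with every dispatched line
        muleClient.dispatch("vm://in", line, properties);
    }

On the receiving side the property then shows up as an inbound property of the VM message, so the choice expressions have to read it from that scope instead of the invocation one.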

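For completeness, the "JDBC stuff" placeholder could be filled by reusing the transformers and endpoints from your original flow. The following is only a sketch of the azienda branch and relies on a few assumptions: nome_file is forwarded with each line as described above (and therefore read from the inbound scope with a MEL expression), and ignoreFirstRecord is switched off because every VM message is now a single line, which also means the header line has to be skipped somewhere else (for example in CsvReader):

<flow name="storeInDatabase" doc:name="Receive lines and store them in database">
    <vm:inbound-endpoint exchange-pattern="one-way" path="in" doc:name="VM"/>
    <choice doc:name="Choice">
        <when expression="#[message.inboundProperties['nome_file'] == 'azienda']">
            <jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/companies-csv-format.xml"
                ignoreFirstRecord="false" doc:name="CSV2Azienda"/>
            <jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="insertaziende" queryTimeout="-1"
                connector-ref="jdbcconnector" doc:name="Database Azienda">
                <jdbc-ee:query key="insertaziende" value="insert aw006_azienda values (#[map-payload:aw006_id], #[map-payload:aw006_id_cliente], #[map-payload:aw006_ragione_sociale])"/>
            </jdbc-ee:outbound-endpoint>
        </when>
        <!-- analogous <when> branches for servizi and richiesta -->
    </choice>
</flow>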
Maintain your current file-connector configuration so that streaming stays enabled. With this solution the CSV data can be processed without having to load the entire file into memory first. HTH

