Pierrick Le Gall's Blog


Wednesday 24 January 2007

Oracle to standard output in UTF-8, with Perl

I have an Oracle database containing UTF-8 data. In a Perl script, I want to extract, transform and print this data to STDOUT, the standard output. The only difference between this entry and "Oracle to file in UTF-8, with Perl" is the destination of the data, so to avoid redundancy, please take the time to read the previous entry first.
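
As a rough illustration of the output side only (not necessarily the approach the full entry takes), a Perl script can put a UTF-8 encoding layer on STDOUT so that character strings are written as UTF-8 bytes. The sample value below is hypothetical and stands in for a column fetched from Oracle and already decoded to Perl characters.

```perl
use strict;
use warnings;

# Hypothetical value; in a real script it would come from the database.
my $value = "caf\x{e9}";    # "café", written with an escape to keep this source ASCII

# From now on, every character printed to STDOUT is encoded as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');

print "$value\n";
```

Setting the layer once, before the first print, avoids "Wide character" warnings and keeps the encoding decision in a single place.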


Oracle to file in UTF-8, with Perl

I have an Oracle database containing UTF-8 data. In a Perl script, I want to extract, transform and load this data into a file.
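
A minimal sketch of the file-writing half of such a script, assuming the values have already been fetched and decoded to Perl characters. The helper name `write_utf8_file`, the file name `extract.txt` and the sample values are all hypothetical, and the full entry may well proceed differently.

```perl
use strict;
use warnings;

# Hypothetical helper: write each value on its own line, encoded as UTF-8.
sub write_utf8_file {
    my ($path, @values) = @_;
    open(my $out, '>:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    print {$out} "$_\n" for @values;
    close($out) or die "Cannot close $path: $!";
    return scalar @values;    # number of lines written
}

# Stand-in for values extracted from Oracle.
write_utf8_file('extract.txt', "caf\x{e9}", "na\x{ef}ve");
```

Declaring the encoding in the `open` call means the rest of the script can work with characters and never has to encode by hand.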


Thursday 18 January 2007

26 years old

Another candle. As expected, my 25th year was rather busy. Work-wise in particular, with 2 changes, first to Inéo media system and then to Talend. On the personal side, Erwann has been walking since the end of the summer and has a vocabulary of about ten words (which pretty much only his parents manage to understand). Erwann is going to have a little brother (or a little sister, it's a surprise) by the end of February. The news of the moment is the purchase of the apartment. We should sign the preliminary sale agreement this week, and move this summer... to the end of the street.

Pierrick, Erwann and Timon, January 2007

Monday 8 January 2007

Talend Open Studio 1.1.0 is out

TOS 1.1.0

Talend Open Studio release 1.1.0 is out, exactly 3 months after release 1.0.0. This release contains many new features and, of course, many new Perl components. The list of new features is described on Freshmeat.

To give an example, TOS can now run a job like this:

  1. retrieve email files from a remote POP3 server
  2. extract information from the email headers (such as the "From" information)
  3. count the number of emails coming from the same author, with the new aggregate functions
  4. sort the result
  5. load the result in bulk mode into a MySQL database
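
To make the middle of that job concrete, here is a hand-written Perl sketch of steps 2 to 4 only (header extraction, counting per author, sorting). It is not the code TOS generates. The function names and sample messages are hypothetical, the POP3 retrieval and MySQL bulk load are left out, and folded (multi-line) headers are not handled.

```perl
use strict;
use warnings;

# Return the value of the first "From:" header of a raw message, or undef.
sub from_header {
    my ($raw) = @_;
    # Headers end at the first blank line.
    my ($headers) = split /\r?\n\r?\n/, $raw, 2;
    return undef unless defined $headers;
    return $headers =~ /^From:[ \t]*(.*?)\s*$/mi ? $1 : undef;
}

# Count messages per author and return [author, count] pairs,
# most frequent first, ties broken alphabetically.
sub count_by_author {
    my (@messages) = @_;
    my %count;
    for my $raw (@messages) {
        my $from = from_header($raw);
        $count{$from}++ if defined $from;
    }
    return map { [ $_, $count{$_} ] }
           sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
}

# Hypothetical messages standing in for mail fetched from the POP3 server.
my @messages = (
    "From: alice\@example.org\nSubject: hello\n\nBody",
    "From: bob\@example.org\nSubject: hi\n\nBody",
    "From: alice\@example.org\nSubject: again\n\nBody",
);

printf "%s\t%d\n", @$_ for count_by_author(@messages);
```

The tab-separated output has the shape a bulk loader would typically consume, which is where step 5 would take over.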

TOS can also read XML files with standard XPath queries, or even read/write LDIF files. Duplicates can be removed from a data flow.

To write components such as tAggregateRow or tSortRow, the 1.0 code generation model needed some improvements. Indeed, when you sort a list of lines, you must read all the lines before outputting the first sorted line. This behaviour was not possible in TOS 1.0, so we've implemented a system of virtual components. A virtual component hides a set of sub-components working together. This new technical feature of the Perl code generation model opens up many possibilities for component writers.

For example, tSortRow is a virtual component hiding a tArray (filling a Perl array) and a tSortIn (sorting the array values and outputting the result). tSortIn starts its execution once tArray has finished filling the Perl array. The first and second screenshots represent the same job.
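
The two-phase idea can be sketched in plain Perl as follows. This is only an illustration of the principle, not the code generated by TOS: the subroutine names, the shared `@buffer` and the sample rows are hypothetical.

```perl
use strict;
use warnings;

my @buffer;    # storage shared by the two hidden stages

# Stage 1 (plays the role of tArray): store each incoming row.
sub array_stage {
    my ($row) = @_;
    push @buffer, $row;
    return;
}

# Stage 2 (plays the role of tSortIn): called only after stage 1 has
# seen every row; sorts the buffer and hands each row to a callback.
sub sort_in_stage {
    my ($emit) = @_;
    $emit->($_) for sort { $a->{count} <=> $b->{count} } @buffer;
    return;
}

# Hypothetical input flow.
my @input = (
    { author => 'carol', count => 3 },
    { author => 'alice', count => 1 },
    { author => 'bob',   count => 2 },
);

# Phase 1: read everything.
array_stage($_) for @input;

# Phase 2: sort, then output.
sort_in_stage(sub { print "$_[0]{author}\t$_[0]{count}\n" });
```

Seen from outside, the pair behaves like a single sorting component: rows go in one at a time, and sorted rows come out once the input is exhausted.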

Of course, there are many other new features in TOS 1.1; in this entry I wanted to focus on the Perl part of TOS.

Wednesday 3 January 2007

Whitelist generator with Talend Open Studio

I've written my first use case with Talend Open Studio: my goal is to generate a whitelist of email addresses based on the emails already accepted in my inbox. Using Talend Open Studio saved me maybe 2 or 3 hours compared to developing a Perl script from scratch. The generated Perl script is nearly as fast as one I would have written specifically for this task.
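
For comparison, the core of a from-scratch version might look like the sketch below. It is not the job built in TOS. The function name, the sample "From" values and the deliberately simple address pattern (an approximation, not a full RFC 5322 parser) are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Build a sorted list of unique, lower-cased sender addresses.
sub build_whitelist {
    my (@from_values) = @_;
    my %seen;
    for my $value (@from_values) {
        # Keep the first thing that looks like an email address.
        next unless $value =~ /([A-Za-z0-9._%+-]+\@[A-Za-z0-9.-]+\.[A-Za-z]{2,})/;
        $seen{ lc $1 } = 1;
    }
    return sort keys %seen;
}

# Hypothetical "From" header values taken from accepted messages.
my @from_values = (
    'Alice <Alice@Example.org>',
    'bob@example.org',
    '"Alice A." <alice@example.org>',
);

print "$_\n" for build_whitelist(@from_values);
```

Lower-casing before deduplication keeps the same sender from appearing twice under different capitalisations.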

TOS use case 1 screenshot