Participant %7: Team CSV, Universidad Central "Marta Abreu" de Las Villas #20

chenzimin · 2018-05-18T06:54:29Z

Created for Team CSV (@cesarsotovalero) from the Universidad Central "Marta Abreu" de Las Villas for discussions. Welcome!

monperrus · 2018-05-18T09:10:40Z

Excellent, welcome! What's your score on Dataset1?

cesarsotovalero · 2018-05-18T16:12:32Z

My current scores, using just a very naive approach based on string comparison:

Score on dataset1: 0.1236735
Score on dataset2: 0.1096176

No machine learning yet.
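For readers unfamiliar with the task, here is a minimal sketch of what a naive string-comparison baseline could look like. It assumes the CodRep setup of being given a replacement line and a source file and having to predict which line gets replaced. The function name `most_similar_line`, the use of `difflib`, and the sample Java snippet are illustrative assumptions, not the code actually used for the scores above.

```python
import difflib


def most_similar_line(new_line, file_lines):
    """Return the 1-based number of the line in file_lines closest to new_line.

    Hypothetical baseline: similarity is difflib's SequenceMatcher ratio
    on whitespace-stripped text. Ties keep the earliest line.
    """
    target = new_line.strip()
    best_number, best_ratio = 1, -1.0
    for number, line in enumerate(file_lines, start=1):
        ratio = difflib.SequenceMatcher(None, target, line.strip()).ratio()
        if ratio > best_ratio:
            best_number, best_ratio = number, ratio
    return best_number


# Made-up example input (not taken from the competition datasets).
source = [
    "int total = 0;",
    "for (int i = 0; i < n; i++) {",
    "    total += values[i];",
    "}",
    "return total;",
]
print(most_similar_line("for (int i = 0; i <= n; i++) {", source))
```

A baseline like this needs no training data, which matches the "no machine learning" remark, but it compares the new line against every line of the file, so its cost grows with file length.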

monperrus · 2018-05-21T07:08:03Z

Yes. The first 0.8 are easy to get (purely due to the data).

The remaining points are super hard.

Best score seen so far:

Dataset1: 0.114
Dataset2: 0.085

cesarsotovalero · 2018-05-29T15:06:44Z

My last scores:

Dataset	Perfect Match	Score
Dataset 1	3867	0.11842962430821
Dataset 2	9833	0.108660931336428
Dataset 3	17197	0.0753167732657934

My current approach: string matching + parse checking
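As a rough illustration of how a parse check could prune candidates, the sketch below keeps only the line numbers where substituting the new line leaves brackets balanced. The bracket-balance test is a deliberately crude stand-in for a real parser (it ignores string literals and comments), and the names `brackets_balanced` and `plausible_targets` are hypothetical. The thread does not say how the actual parse check was implemented.

```python
CLOSERS = {")": "(", "]": "[", "}": "{"}


def brackets_balanced(text):
    """Crude stand-in for a parser: check that (), [] and {} nest correctly.

    Assumption: brackets inside string literals or comments are not
    special-cased, so this can misjudge real code.
    """
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in CLOSERS:
            if not stack or stack.pop() != CLOSERS[ch]:
                return False
    return not stack


def plausible_targets(new_line, file_lines):
    """Return 1-based line numbers whose replacement keeps brackets balanced."""
    kept = []
    for index in range(len(file_lines)):
        candidate = file_lines[:index] + [new_line] + file_lines[index + 1:]
        if brackets_balanced("\n".join(candidate)):
            kept.append(index + 1)
    return kept


# Made-up example input (not taken from the competition datasets).
source = [
    "int total = 0;",
    "for (int i = 0; i < n; i++) {",
    "    total += values[i];",
    "}",
    "return total;",
]
print(plausible_targets("for (int i = 0; i <= n; i++) {", source))
```

The surviving candidates could then be ranked by string similarity, so the structural check acts as a filter rather than as the predictor itself.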

A related paper: A comparison of code similarity analysers

chenzimin · 2018-05-30T08:57:55Z

Thanks, I have updated the rankings.

monperrus · 2018-05-31T10:11:40Z

good scores, getting quite close to @tdurieux :-)

cesarsotovalero · 2018-08-21T22:00:37Z

Hi everyone, I want to give an update on my scores for the preliminary ranking:

Dataset	Perfect Match	Score
Dataset1	3900	0.1111243868013270
Dataset2	9948	0.0995737723246198
Dataset3	17438	0.0631975953292782
Dataset4	15773	0.0769219481612277

My current approach is: string matching + parse checking + decision rules + heuristics

monperrus · 2018-08-22T08:35:58Z

It seems that you beat @tdurieux!! Congrats. It's too late to be considered in the intermediate ranking, but it's really remarkable.

cesarsotovalero · 2018-08-22T08:51:36Z

Thanks @monperrus!!
However, my approach has some performance issues. For instance, it takes almost 2h for Dataset1, which is far from the performance results of @tdurieux. Also, I think the accuracy (in terms of the loss function) should be improved much more to really win the competition. I'll continue working on that.

tdurieux · 2018-08-22T09:10:37Z

Strangely, my technique is still better on Dataset 2 but worse on the others.

I still have some room for improvement, but I am very happy with the performance of my technique. It takes less than 10 min to get the results on all datasets, which helps a lot when trying new improvements.

chenzimin added the "participant" label (Participant of the CodRep-competition) on May 18, 2018
